Build successful end-to-end machine learning systems, Eugene Yan - The Data Scientist Show #15
By Daliana Liu
Summary
## Key takeaways
- **First rule: No ML initially**: The first rule of machine learning is to try solving the problem without machine learning first, using simple SQL, rules, or statistics to get 80% of the way in 10% of the time, then test customer reaction before investing in complex models. [14:33], [15:26]
- **Ask 'why' to uncover true problems**: When stakeholders requested boosting the rank of fulfilled products to reduce complaints, repeatedly asking 'why' revealed that the real issue was poor forecasting that underestimated demand, not ranking. [17:02], [18:20]
- **Build UI prototypes to excite stakeholders**: After building an image search prototype with a Flask UI where users upload images to find similar dresses, the boss and the business got excited because it felt real and responsive, securing buy-in and GPU resources. [22:18], [24:40]
- **Timebox ML projects: feasibility to production**: Iterate ML projects through feasibility analysis (2-4 weeks), proof of concept (2-3 months), then production (3-6 months), calling the project off if the data or performance doesn't meet needs after minimal investment. [38:06], [41:15]
- **Write one-pagers to anchor intent**: Before starting, write a one-pager defining intent, deliverable, success metrics, and constraints; refer back to it during long projects to avoid scope creep from feedback or shiny new ideas. [01:04:16], [01:05:02]
- **Behavioral tests encode domain sanity**: Test models with behavioral checks, such as changing smoker to non-smoker lowering lung cancer propensity, or changing male to female shifting breast cancer risk appropriately, to build trust beyond standard metrics. [51:15], [52:44]
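The behavioral-test takeaway can be made concrete as a directional check: flip one input and assert that the score moves the way domain knowledge says it should. This is a minimal sketch under stated assumptions. `directional_test` and `toy_lung_cancer_model` are hypothetical names, and the toy scorer stands in for a real trained model. It is not the test setup discussed in the episode.

```python
def directional_test(predict, record, feature, new_value, expected_direction):
    """Perturb one feature and check that the score moves the expected way.

    expected_direction is -1 if the score should fall and +1 if it should rise.
    Returns (passed, score_before, score_after).
    """
    before = predict(record)
    perturbed = dict(record)  # copy so the original record is untouched
    perturbed[feature] = new_value
    after = predict(perturbed)
    passed = (after - before) * expected_direction > 0
    return passed, before, after


def toy_lung_cancer_model(record):
    """Hypothetical hand-written scorer standing in for a trained model."""
    score = 0.05
    if record["smoker"]:
        score += 0.20
    score += 0.002 * max(record["age"] - 40, 0)
    return score


patient = {"smoker": True, "age": 60}
passed, before, after = directional_test(
    toy_lung_cancer_model, patient, "smoker", False, expected_direction=-1
)
print(passed)  # True: switching smoker -> non-smoker lowers the score
```

A model whose score does not change at all fails this check, which is the point: it catches models that look fine on aggregate metrics but ignore an input they should respond to.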
Topics Covered
- Communication trumps technical skills
- Solve without machine learning first
- Build UIs to sell ML
- Own projects end-to-end
- Writing clarifies ML thinking
Full Transcript
The most effective data scientists, the ones I cannot do without and the ones we discussed, are the people who can communicate. I was really surprised, because I thought, hey, everyone can talk, everyone can write; what is the big deal about communication?

Hello everyone, welcome to The Data Scientist Show. Today we have Eugene Yan. Eugene is an applied scientist at Amazon. He's part of the books organization, where his mission is to help customers discover and read more books.
As an applied scientist, he designs, builds, and operates machine learning systems that serve customers at scale. Previously, he led a data science team at Lazada, which was later acquired by Alibaba. In his free time, he writes and speaks about data science on his website, eugeneyan.com, which has 2,050 subscribers. Today we'll talk about his career journey and his best advice for building a successful end-to-end machine learning project. Make sure you stick around to the very end, where he shares his thoughts on career development and productivity. Welcome to the show, Eugene.
Thank you for having me here, Daliana.

So what was your career path toward working with machine learning? How did you get into the field?

Yeah, it's a pretty interesting path, and I don't recommend other people try to replicate it. I graduated with a degree in psychology. Why did I do that? Because I was really interested in how people think, how people perceive, and how that causes them to behave. But right out of school I wasn't sure what to do, so I actually joined the government as an investment analyst. About one and a half years in, I started to really miss working with data. So I did a couple of SQL courses online, and from school I had my R, statistics, and experimentation training.
I did a few interviews, and I was really grateful that IBM decided to give me a role as a data analyst. I started initially at the supply chain center, and I was also working on a bit of social media analytics. A year in, they offered me a more permanent role as part of the workforce analytics team. What I was doing there was forecasting which jobs were going to be in demand in the future; at that point in time it was things like cloud, iOS, and mobile developers.
To tie that in, we actually built an internal job recommendation engine to recommend jobs within IBM, to help people transition from more obsolete skills to some of these more recent ones. Then, two years in, Lazada happened.
Funny story: I did a Kaggle competition with a friend. We did pretty well; we were in the top three percent. Someone said, why don't you present it at a meetup? And we did. It was a young meetup, and I was happy to contribute. It turns out there were some people there from this startup called Lazada. I was presenting my solution for product classification, and Lazada, being an e-commerce startup, was really interested in that.
Someone there heard about my solution, and I got invited in for lunch. It was a dinky building, I think two or three stories, very run down, and the team was only two or three people. Again I presented how I did the product classification in the Kaggle competition, and they asked me: we have product titles in multiple languages, including Vietnamese, Indonesian, and Malaysian; how would you do it for us? So I spoke about my solution, we went for lunch, and I got an offer. At that point it was a startup, and I wasn't sure I would go. My parents and my girlfriend, now wife, were also suspicious: are you really going to leave IBM for a startup we've never heard of? At that point no one had heard of it. But I thought, let's try it, I'm still young. And for listeners out there, even if you're not young anymore, if you haven't done a startup before, I still encourage you to try it. So I did it anyway, and it turned out very well. It was super fun, I learned a lot, and we made a lot of great friends; we still have a Lazada data alumni chat group. A few years after I joined, we got acquired by Alibaba, and after that I moved on to a health tech startup, again in Southeast Asia, but I think the industry wasn't quite ready for it. Then my wife and I wanted to get out of our comfort zone, so we started looking for opportunities abroad, and one of them happened to be Amazon in Seattle. So here I am, part of a great team. The mission really attracted me, which is to help people read more, and the leadership principles really appealed to me, starting with the customer and ending with delivering results. Deliver Results was the last leadership principle; we have two more now, but previously it was the last one. So here I am.

Yeah, thanks for sharing that. I definitely want to ask more about the lessons from your career and what factors helped along the way, but
let's start with your day-to-day job right now. What do you do every day as a machine learning engineer?

Well, this differs across projects and across roles, but at least for me, the bulk of it is split between design and implementation. What does design mean? You get an ambiguous question or request, and you need to talk with the stakeholder. Let's say the stakeholder wants me to classify products. Does it need to be real time, or can it be batch? What are the different languages? What kind of input will you give me, and what kind of output do you require? What kind of performance do you require: is it top-3 accuracy or top-1 accuracy, is it recall or precision? That's really framing the problem. After you frame the problem, it's researching what other people have done.
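The framing questions above ("top-3 or top-1 accuracy? recall or precision?") map onto concrete metrics that are worth pinning down before any modeling starts. The following is a minimal pure-Python sketch, assuming the model returns a best-first list of categories per product. The category names are hypothetical.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Share of examples whose true label is among the model's top-k predictions."""
    if not true_labels or len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked prediction list per label")
    hits = sum(label in preds[:k] for preds, label in zip(ranked_predictions, true_labels))
    return hits / len(true_labels)


def precision_recall(predicted, actual, positive):
    """Precision and recall for one class, treating it as the positive label."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical product categories, ranked best-first per product.
ranked = [["dress", "skirt", "top"], ["sofa", "chair", "table"], ["toy", "book", "game"]]
labels = ["dress", "chair", "lamp"]
print(top_k_accuracy(ranked, labels, k=1))  # 1 of 3 correct at top-1
print(top_k_accuracy(ranked, labels, k=3))  # 2 of 3 correct at top-3
```

The same model can look weak at top-1 and acceptable at top-3. Agreeing on which number matters is part of framing the problem.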
Product classification is not a new topic; Walmart, Amazon, and other companies have published how they do it. After you've come across these solutions, maybe you prototype one, write a one-pager, and get everyone aligned on the metrics. That's when I recommend writing a design doc. That's what we do at Amazon; I think we call them six-pagers. You describe the context, what the deliverable is, and how you're going to solve it. You share this with people, and they give you feedback: can this be simpler, can you reuse this, have you thought about the cost, can you lower the cost, and so on. That's the design phase, and it's a big chunk of the work.

The other big chunk is the implementation phase. Your design has gone through multiple rounds of feedback, people seem pretty happy with it, and it's as good as it can get on paper. Now you have to start writing code and see how things go. That usually involves writing data pipelines in Spark, building machine learning models, and deploying them, either on SageMaker, in batch, or just running them in Spark. Once your model is live, you have to put in monitoring. For example, if it needs to be real time, you set an alarm on how many errors you get per minute, and if that exceeds a certain threshold, you get alerted. Then there are latency and throughput metrics. So that's what I mostly do for design and implementation, and of course a big part of it is communication, right?
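The per-minute error alarm described above comes down to counting errors in a trailing time window and comparing the count to a threshold. The following is a minimal sketch under stated assumptions. `ErrorRateAlarm` is a hypothetical class, timestamps are in seconds and arrive in non-decreasing order, and in practice this is usually configured in a managed monitoring service rather than written by hand.

```python
from collections import deque


class ErrorRateAlarm:
    """Alert when more than `threshold` errors occur within a trailing time window."""

    def __init__(self, threshold, window_seconds=60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._error_times = deque()  # timestamps of recent errors, oldest first

    def record_error(self, timestamp):
        """Register an error at `timestamp` (seconds); return True if the alarm should fire."""
        self._error_times.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop errors that have fallen out of the window (assumes non-decreasing timestamps).
        while self._error_times and self._error_times[0] <= cutoff:
            self._error_times.popleft()
        return len(self._error_times) > self.threshold


alarm = ErrorRateAlarm(threshold=3)
for t in [0, 10, 20, 30, 100]:
    if alarm.record_error(t):
        print(f"ALERT at t={t}s: more than 3 errors in the last minute")
```

Latency and throughput alarms follow the same pattern, with the count replaced by a percentile or a request rate over the window.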
We have daily standups within our team. Among the scientists in our larger organization, once every two weeks we go through a paper together so that we at least have shared context. Someone presents a paper they found interesting, and then we ask: could this be applicable, and what lessons can we draw from it for our work? Also, once every two weeks, or whenever there's a new project, we set aside time for design reviews. I just shared how I would write a design document and share it with the team, right?
The same goes for others: people write design documents, and I have to read them and provide feedback, so there's quite a bit of communication going on as well.

Yeah, so basically your day-to-day depends on which stage of the project cycle you're in.

Exactly. Early in the project cycle it's heavily design, and in the later part it's mostly implementation and operations.

Right. I love that you shared some tools you use, like SageMaker and Spark. Can you give us more insight into the tools you use?

Sure. For small data processing, you can't get away from SQL, so for anyone who doesn't know SQL yet or isn't very good at it, I highly recommend learning it. For big data processing, most companies I'm aware of are slowly converging on Spark, or maybe Snowflake or dbt. Within my team we use Spark a lot, and we have a nice internal setup that includes scheduling and dependency checking. For machine learning models, I tend to favor PyTorch,
and I also rely a lot on my trusty decision tree libraries, including XGBoost and LightGBM. I love those; they're really fast to train and give pretty good baselines right off the bat. But sometimes, for image or text problems, deep learning is still better. One thing I like to do after I build a model is demo it. I like to make it real for our product managers or stakeholders, so I hack something together with FastAPI, which is beautiful, or you might use Flask; I used Flask previously but have since moved on to FastAPI. Then just some simple HTML and CSS, maybe a bit of JavaScript. That makes the machine learning model very real: they can see it and interact with it, so that's a big plus. Besides that, if I need more infra than my MacBook Pro, such as more CPU cores, more RAM, or a GPU, I turn to SageMaker. SageMaker, through Jupyter notebooks, makes it easy to spin up a training instance and use a GPU or whatever infra I need. So those are the bread-and-butter tools I use. A couple of others: Jupyter, which I think 99% of the data scientists I know use, and MLflow for tracking experiments; I love that. And I think that's about it.

Cool, thanks for sharing that. Now let me ask you this: if you have a new problem, especially an unfamiliar one, how would you approach it if you want to try to solve it using machine learning?

Yeah, I think I would start by clarifying the intent.
For example, someone comes to me and says, I need to improve conversion on the website. What is your intent? Is it to sell more units, to get more new customers, or to increase revenue? Each of these requires a different kind of model. If you want to sell more units, you can lower prices or recommend lower-priced goods; you would sell more units, but you wouldn't increase revenue. If you want more new customers, that's more of an email or marketing campaign. So first you clarify the intent. After that, you discuss the desired outcome, for example, getting 10 percent more new customers or selling 10 percent more units. That defines your metric. Then there are constraints, meaning what you cannot do. If the goal is to sell more units, I might ask, can I lower revenue? And they might say, yes, you can lower revenue by five percent as long as we sell ten percent more units. So that would be it. Once I finalize this, I write it down in a one-pager, circulate it, get feedback, and get people to agree on it. Once that's done, that's our roadmap for the next two or three months, or however long the project takes: this is what we're going to try to achieve.

And this is going to sound weird from a data scientist, but what I would first try to do is solve it without machine learning. For example, if I had to build a recommendation engine, my first baseline would be a popularity baseline: recommend what was most popular yesterday. You can segment this multiple ways: what was most popular yesterday in that category, for this gender, or across everything. The across-everything list goes on the homepage, and the per-category list goes on the category page or on the item's detail page. Once I have these baselines ready, I try to improve on them with some basic machine learning, or even more advanced techniques if I have to. Once I have this prototype ready, I go into the design doc, which means writing up the design and getting feedback from everyone before I put it into production.

Yeah, and I love that you mentioned the
first rule of machine learning: try to solve it without using machine learning. Can you share more about that?

Yeah. I don't think I was the first person to say this; I've seen it multiple times, and it's also the first rule in Google's Rules of Machine Learning. What I've often seen in my previous teams is that when someone tries to solve a problem, they immediately jump to the latest and greatest, whatever was released at a conference this week or on arXiv recently. What ends up happening is that the implementation often isn't clear, and it takes a lot of time, maybe six or nine months. A very simple non-machine-learning approach, maybe just some SQL, some rules, and some statistics, can get you 80% of the way there in 10% of the time. What's great about skipping machine learning and just using a SQL statement is that you can very quickly launch something and test customers' reaction to it. Do customers really enjoy this feature? Do they really like it? And if they don't, that's great too. It could be that whatever you built without machine learning just isn't good enough, or it could be that customers don't need this feature at all; they don't even notice it. If that's the case, you've saved all the time you would have spent going down the machine learning route. So that's what I recommend.

Yeah, I totally agree with that. And
previously you mentioned that when you approach a new problem, you always ask the customer or stakeholder what their intent is. In my experience, sometimes they don't know what their intent is; they only have a vague idea of what they want. So how do you work with the business to identify and define problems suited for machine learning, especially when they don't know what they want?

I think that's a very difficult question. I can share my mindset when businesses come to me with requests: I have too many things to work on, I'm lazy, and whatever estimate I have in mind for solving the problem, I want to solve it in half the time, or even a quarter of the time.
Let me show you an example. In a previous role, a stakeholder came to us and asked: for product ranking on the website, could you increase the rank of products that are fulfilled by Lazada? It's something very similar to Prime. I immediately jumped to an assumption: of course that makes sense, we want to promote items fulfilled by our company to get more sellers to sign on. That assumption was incorrect. So I asked them why they wanted to increase the rank of products fulfilled by Lazada. They said, because products fulfilled by Lazada are delivered faster. Okay, so why do you want to increase the rank of products that are delivered faster? Again, I thought it made sense: you want to help customers get their products faster. But when I asked why they wanted products delivered faster, they said: because we get fewer complaints. That was the key thing. Because those products are delivered faster, the ops team gets less pressure from complaints. So why are these complaints happening? Not because customers aren't buying products fulfilled by Lazada, but because our forecasting algorithm back then was underestimating demand a lot. From this you can see that the right way to solve the problem is not to change the ranking algorithm to boost products fulfilled by Lazada; it's to improve our forecasting methodology. So that's how I try to work with customers to
understand whether it can be solved. Sometimes the problem cannot be solved by machine learning. Let me give you an example. Actually, I can't think of a problem that cannot be solved by machine learning right now; I think most problems can be, to some extent. Okay, I'll make up a fictional problem, and maybe it actually can be solved by machine learning. Say that in the past we had a lot of customers signing up for new accounts to get a voucher: every time you sign up for a new account, you get a new-account voucher. So they ask, can we use machine learning to solve this problem? I said, for brand-new customers and accounts we have absolutely no data at all, so how could we solve this? Then I asked them: do you have any labels, any examples of new accounts that are actually fraudulent? And they don't have that. So in this case we have no data on these new accounts and no labels for which accounts are fraudulent, so we can't solve it with machine learning.
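The alternative Eugene turns to here, a rule keyed on device ID, needs neither history nor labels. The following is a minimal sketch under stated assumptions. `assign_signup_vouchers` is a hypothetical function. Signups arrive in chronological order as `(account_id, device_id)` pairs. The first account on a device keeps the voucher and later accounts on that device do not. This illustrates the idea and is not Lazada's actual logic.

```python
from collections import defaultdict


def assign_signup_vouchers(signups):
    """Rule-based voucher eligibility: one new-account voucher per device.

    `signups` is a chronologically ordered list of (account_id, device_id) pairs.
    Returns {account_id: True if the voucher is granted, else False}.
    """
    accounts_seen_per_device = defaultdict(int)
    decisions = {}
    for account_id, device_id in signups:
        accounts_seen_per_device[device_id] += 1
        # Only the first account created on a device gets the voucher.
        decisions[account_id] = accounts_seen_per_device[device_id] == 1
    return decisions


signups = [("a1", "dev-1"), ("a2", "dev-2"), ("a3", "dev-1"), ("a4", "dev-1")]
print(assign_signup_vouchers(signups))
# {'a1': True, 'a2': True, 'a3': False, 'a4': False}
```

If the goal were to deny every account on a shared device, including the first, a batch job could count accounts per device in one pass and grant vouchers only where the count is 1. Which variant fits depends on whether vouchers are issued at signup time.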
On the other hand, you can solve this very easily in simple ways, such as with device ID. For example, if the same device ID is used to create more than one account, those accounts should not be given the new-signup voucher. That's a problem easily solved by non-machine-learning approaches, maybe just rules on device ID, whereas machine learning approaches are very difficult because you have neither the data nor the labels. So that's how I work with stakeholders to distill and frame the problem better, and to define problems more clearly for machine learning.

But there are times when it's not the stakeholders coming to you with problems; it's us going to stakeholders to get sponsorship for something we want to solve. For example, at Lazada it was trying to do something with our images, because we were not using them. Right now at Amazon, it's trying to develop our first real-time recommender that can be contextualized. Let me give you an example of how I tried to do this at Lazada, and maybe we can draw some lessons from it. Back then at Lazada, we had a lot of product images
but we were not using them. So I went to my boss, John. He's a great guy and really encouraging of his team. I said, hey John, we should try to do something with images; I can do some product classification and improve our classification models. He's a really nice guy, and he said, Eugene, I think we might not be ready for it yet; our team doesn't have the capabilities. I was undeterred, so I spent the next three or four months of weekends learning about image classification, computer vision, and transfer learning, which was a big thing back then. I did transfer learning on ResNet and VGG, and very quickly I was able to train a pretty decent image classification model. But that was not quite sexy enough. Along the way, I learned that I could take these image classification models, extract the embeddings, and build image search. At that point I was convinced that image search was sexy enough to get people to want to do this. But I was not quite done: I had to build a UI where people could upload their own images and find similar ones, and it had to be fairly responsive. That was when I first picked up Flask, learned some HTML and CSS, and built a nice prototype.
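Once images are turned into embeddings, the image search described here becomes a nearest-neighbor lookup over vectors. The following is a pure-Python sketch under stated assumptions. Embeddings are already extracted (for example, from the penultimate layer of a fine-tuned ResNet, which is not shown). `find_similar` and the three-dimensional toy vectors are hypothetical. Cosine similarity is one common choice of metric, not necessarily the one Eugene used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(query_embedding, catalog, k=3):
    """Return ids of the k catalog items whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_embedding, emb), item_id) for item_id, emb in catalog.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Toy 3-d embeddings; real image embeddings have hundreds of dimensions.
catalog = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_dress": [0.8, 0.3, 0.1],
    "sofa": [0.0, 0.2, 0.9],
}
uploaded_dress = [1.0, 0.2, 0.0]  # embedding of the user's uploaded photo
print(find_similar(uploaded_dress, catalog, k=2))  # ['red_dress', 'blue_dress']
```

The linear scan is fine for a single-machine demo like the one in the story. For a real catalog, an approximate nearest-neighbor index (FAISS is one widely used library) would typically replace it, while the ranking idea stays the same.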
Once that was done, after three or four months, I was ready. I took my laptop to show my boss: hey John, I built this pretty cool thing. You can upload an image of a dress, and I can find you similar dresses. He said, holy crap, why didn't you tell me you were doing this? I said, I told you I wanted to do this; you didn't think I could do it. Then he got really excited and brought me to demo it to the business, and the business was really excited too, because fashion and furniture are really big parts of our business, and fashion is very aesthetics-based: you want things that look similar. That was a big reason for the business to get excited, and that's how they agreed we should do this. I really wanted to do it because I wanted to help my team get GPU capabilities. At that point we didn't have GPUs, and the only way to solve image classification or image search problems properly was with GPUs. So that's how we got them.

So how would you pitch it? That was an anecdote, but I think there are two key principles. The first is not to start from, hey, we're going to use the latest and greatest technology, we're going to use GPUs, TensorFlow, the latest image classification model. Most people wouldn't understand that, and truth be told, the business wouldn't really care. Instead, start from the customer: what would the customer want, and how can we solve it for them? Usually, if you solve problems for customers by providing new features, the business benefits as well. I saw it as building something so people could find clothes, toys, or furniture more easily. The business saw it as: if we have image search or image recommendations, we'll improve conversion. So that's a win-win. The second thing that makes a big difference, and I've made this mistake before, maybe we can talk about that later, is to build a UI. Once you have a UI that's very responsive and at their fingertips, and the business can play with it, they get very excited. It feels real, even though the prototype runs on a single machine and isn't scalable at all. Compare that to: here's a chart of our image classification accuracy, here's the learning curve, here's what you can do, all on a slide or in a document. It just doesn't feel real to them. But once you have a UI, it's, oh my goodness, this is so real, I want this on the website right now! Well, it won't be right now; it'll take maybe three to six months. But at least you get them to commit: let's do this. So, two main things: get them really excited about it, and let them interact with it, which is what gets them excited.

Yeah, I really love that story. I have
had a similar experience, when our stakeholders didn't believe in the capability of machine learning and we had to quickly build a proof-of-concept model. I think it's very important to show them something tangible they can play with. Exactly like you mentioned, if I just tell them, hey, I built this machine learning model with 99% accuracy, they still don't have an intuition for the value it will bring to the business. To build this prototype, you don't need very high accuracy or other performance metrics; maybe you're at 70 or 80 percent. What matters is that you turn the machine learning model into something people can interact with. Going back to a blog post you wrote, you said a data scientist or machine learning scientist should be more end to end, and I think your story about building the UI is very inspiring. Can you talk about why you think machine learning scientists or data scientists should be more end to end?

Sure. I think the reason I wrote that post was that
I saw a Reddit thread that listed almost 10 different data science roles: the product data scientist doing the analytics, the machine learning scientist doing the machine learning, the machine learning engineer, the data engineer, the data PM, and so on. I was really confused: how would this team work? Any time you want to build a product, how many people do you need in the room? Let's say I'm an applied scientist and I want more data features. I would have to talk to the data engineer, and how long would that take? So I think it really fragments everything. First, you lose context. I've seen teams where the data scientist is not allowed to do data engineering or even feature engineering. If you're not allowed to look at the data itself, how do you know what data could be useful for your machine learning model? If you can't explore the data, how do you know what to use to create features, or whether it will work? So that's one thing: you lack the context of the upstream data. Then there's the machine learning engineer. Imagine I build a model and pass it to the machine learning engineer, and the engineer has certain constraints; say the model has to respond within 40 milliseconds. If you're not aware of this, you'll build a big-ass transformer, a BERT model that takes, I don't know, a second to respond, and you don't know how to scale it. How would he put it into production? It would never be used. So this context is really important if you want to build useful products that get used by the business and really impact customers' lives. And, like I mentioned, say we put something in production and there's a bug we need to triage. I would need the product data scientist who understands the dashboards, the data engineer who understands the pipelines, the ML engineer who knows the alarms and is probably on call, and the data scientist or machine learning scientist who actually built the model, all these people in the room. It's so difficult to coordinate schedules and to figure out how to triage and debug; it's just a headache. So I think the main reason
why data scientists should be more end to end is really ownership and accountability. Far too often, when I talk to peers who are leaders on other teams, I've seen this approach where they break up the work with nice interfaces: the data engineer passes the data to the data scientist via the data warehouse, and the data scientist passes the model to the ML engineer via some kind of interface, maybe just an R model or an R script, and the ML engineer has to convert it into Java or C++. That just doesn't work; so much gets lost in translation. I'm aware of cases where the model doesn't work in production and the problem gets passed around: who is responsible? Is it data drift, an issue for the data engineer, where the offline data and the online data don't match? Is it an issue with the machine learning model, where the train/validation split wasn't done properly and doesn't reflect production? Or is it on the machine learning engineer's side, where the code wasn't translated properly? It just gets passed back and forth, and I don't like that; I think it's an impossible way to work. A lot of people read that post and misunderstood it as me wanting all data scientists to be full stack and do everything end to end themselves. That's not the intent. I want data scientists to have more ownership. That doesn't mean doing every piece themselves. It means being responsible for everything end to end, from the start of the project to after it's delivered. Even when something goes wrong after delivery, they should feel responsible, rather than thinking, oh, I've handed it off to someone, so it's someone else's problem now.
so that's why i think it should be end to end yeah i think it's very important to differentiate ownership from doing everything on your
own right because exactly like you said i also read a blog post from chip where she said data scientists shouldn't need to know kubernetes i don't think that absolute stance
was her purpose i think what she meant was a lot of data scientists get into data science because they're passionate about finding patterns in the data and driving insights and they're not so
passionate about the engineering part you don't have to do that but i think it's also very important to remember to take ownership of the entire pipeline if
the model you developed and are passionate about fails in production you don't have to be the person to fix it but you need to go
talk to the engineers brainstorm with them be in the meeting and make sure you put on a product management or program management hat to make sure the model
you spent a lot of time building will actually be delivered and drive business value yeah i fully agree on that if you think about it the time spent to
build a machine learning model or a machine learning system is very little compared to the time spent maintaining and operating it for the next couple of years
and that's where the effort really comes in and if data scientists take more ownership that effort can be greatly reduced yeah
that's a great point so you mentioned designing building and operating a machine learning system is a big effort so who exactly do you usually
collaborate with and how do you scale yourself usually i collaborate with the downstream folks that would be product designers product managers
and engineers so product managers would share oh you know maybe we're gonna create a widget let's just say a random widget or we're gonna show reviews
and the product manager might say that hey you know for this widget we're going to show the number of stars the item has the price etc
and they might say hey you know we don't want two products that are too similar to each other because they want some kind of diversity in that so okay you take in these considerations and you try to
build it because let's say you were trying to build a recommender and someone goes to find harry potter and you know the most relevant books would be harry potter one to seven and eight
but that wouldn't meet their intent of diversity right so you might want to layer that in and engineers i work with engineers a lot because as much as i would like to do it all
myself you know setting up infra with aws cdk or setting up alarms it would take me five to ten times the amount of time that a decent engineer would take so
i work with them to try to get them to do the stuff that i'm bad at such as setting up infra or you know maybe systems that involve java that
i'm just not as effective or efficient with so mostly i work with these two groups of people so previously we talked about working with stakeholders asking the right
questions and now you're actually building a model that someone on the other side of the ui will use but
usually data scientists are several steps further away from the users than product and ui folks so how do you maintain empathy with your end users so
i realized that in my career i work in a very specific field of data science specifically b2c data science whereby i'm not a consultant apart from a
short stint at ibm i don't build things for enterprises i mostly build machine learning systems for users and i really enjoy that because it makes it feel really
meaningful and i feel there's a lot of impact one big benefit of this is i get to dogfood my own product so if i'm building recommendation systems right now i tend to browse and search
for books a lot on amazon and i get to see hey you know these recommendations are very personalized but they don't consider context or this one considers the context but doesn't take into
account my personal interests or maybe things like oh you know this book has very poor reviews why is it showing up et cetera so that's one thing also in my previous roles and even
now at amazon we do a lot of user studies they could be quantitative like we send out surveys and get a few hundred people to respond or they could be very
qualitative where a trained ux researcher is there interviewing them and we also do some really interesting stuff like maybe we develop a prototype of our widget or
of our app maybe an app redesign and we get users to try to use it for the first time while they are videoing themselves and we get to see their real expressions of course this is
really expensive so we don't do too many of these but it really gives you a sense of which features spark joy and which features confuse
users so that's another way and of course i look at it through the data right so let's say we're trying to figure out okay do
users read a book completely how many users stop reading a book which books are popular right so being from kindle i actually know reading behavior and i know maybe if the user
gives up on a book halfway maybe they're not that interested in it as opposed to finishing it and of course in other areas for example let's say on
a phone detail page in my previous role the question was on an iphone detail page should we recommend other phones
or should we recommend accessories well we can actually look at this from the data when people browse for phones do they browse other phones or do they browse accessories
at what rate and which actually leads to more purchases and based on that we can learn what users need when they're browsing for an iphone do they actually want comparisons or do
they want accessories so that's useful okay so now you understand the users when we think about launching the
product in production how do you iterate on machine learning projects i like a timeboxed approach i'm quite a fan of agile and scrum
so i take a timeboxed approach where i try to estimate hey how long will this take and then i try to halve the amount of time i need
so the iterative approach actually has three main steps the first step is a feasibility analysis so let's say someone comes to me with a problem i don't know just take
any generic problem i'll say okay let's define the problem define the metrics define the constraints and then i will go back and look at the data do we have the data to solve this so
let's take the previous example are we able to identify voucher fraud from first-time users in that case i don't have the data maybe after a week of trying to find it i would tell them hey you know we actually
don't have the data to be able to solve this problem and at that point we'll just cut our losses and say you know this problem can't be solved right now maybe you can start collecting data now and come back to it
but if after the feasibility analysis we decided hey we do have this data i would have a check-in with the stakeholders again okay we've identified that we do have this data we do have these labels that are sort of a proxy for whatever
business objective you're trying to achieve then if everything goes well we agree and i move on to the proof of concept stage so this feasibility analysis stage takes anywhere
between two to four weeks and then in the proof of concept stage i would try to use the data we have with a proper train validation split to
try to solve the problem so let's say the stakeholder wants an accuracy of 95 percent and let's say i'm only able to get to an accuracy of 85 percent maybe
every two weeks i'll check with them is 85 percent usable in production sometimes it might be you know let's just put it in production first and try to improve on it sometimes i
could get hey my accuracy is only 65 percent and it's way too far off and there's no way i would ever close the gap with the existing data so at the proof of concept stage which usually
takes about two or three months we just say i don't think this is solvable with the data to the level of performance that you require and we
might just call it off and that's okay right because the feasibility analysis takes about a month at most and the proof of concept takes about two months so we've only spent three months on it and we learned something we learned that we don't have the right
data to be able to solve it at the performance level that we need we tried it let's move on but of course if the proof of concept works out well maybe they wanted 95 percent accuracy and we've got
91 or 92 which is good enough we then move on to production which probably takes anywhere between three to six months you know hardening the code putting the code
in production with proper testing and validation setting up the infra et cetera so this is the iterative approach at any point in time after the feasibility
analysis or after the poc we can always call off the project and say that you know we're just not able to solve this with machine learning right now or with the data we have right now and that's okay
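The feasibility, proof-of-concept, production flow above boils down to a couple of go/no-go gates. Below is a minimal Python sketch of those gates, not a prescribed process: `PocResult`, `next_step`, and the two gap thresholds (ship if within 5 points of the target, check in with stakeholders if within 15) are hypothetical names and values chosen for illustration; only the 95, 91, 85 and 65 percent figures come from the conversation.

```python
from dataclasses import dataclass


@dataclass
class PocResult:
    """Hypothetical summary of where a timeboxed ML project stands."""
    has_data: bool     # feasibility: is the data available at all?
    has_labels: bool   # feasibility: are there labels that proxy the business objective?
    accuracy: float    # proof of concept: accuracy (%) on a proper validation split


def next_step(result: PocResult, target: float,
              ship_gap: float = 5.0, discuss_gap: float = 15.0) -> str:
    """Decide what to do after feasibility analysis and proof of concept.

    ship_gap and discuss_gap are illustrative assumptions, not fixed rules.
    """
    # Gate 1 (feasibility, ~2-4 weeks): no data or labels means cut losses early.
    if not (result.has_data and result.has_labels):
        return "stop_no_data"
    gap = target - result.accuracy
    # Gate 2 (proof of concept, ~2-3 months): close enough to the target.
    if gap <= ship_gap:
        return "production"
    # Somewhat short of the target: ask stakeholders whether it is usable anyway.
    if gap <= discuss_gap:
        return "check_in"
    # Far from the target with the existing data: call it off.
    return "stop_underperforming"


if __name__ == "__main__":
    for acc in (91.0, 85.0, 65.0):
        print(acc, next_step(PocResult(True, True, acc), target=95.0))
```

With a 95 percent target this prints `production` for 91, `check_in` for 85 and `stop_underperforming` for 65. Writing the decision down explicitly makes it easier to call a project off after the minimal investment instead of letting it drift.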
so that's how i would try to iterate on a problem in baby steps from the basics from the data to the machine learning and then to production yeah thanks for sharing that now
how do you measure the impact of your projects and can you share a project that had a great impact yeah i think this really depends on the definition of impact
i think for the business the definition of impact would be dollars right like maybe it would be conversion or the number of items you sold or
additional revenue from your widget so that's one for me personally my definition of impact is how i help customers or sellers so let
me give you an example in my previous role i was working on ranking products on our website and it was pretty successful i mean i had a roadmap i wanted to improve my ranking algorithm and then
yada yada yada so i built something that improved the ranking algorithm we improved conversion and revenue significantly i think by 5 to 10 or 15 percent
but that was the first item on the roadmap right to give something to the business but the next two items on the roadmap were the things that i was really interested in the next item
was how would i introduce new products to customers so new products would be things like you know sometimes new brands get onboarded onto our website or new
sellers come on and you know in e-commerce there's a long-tail effect which is something like 20 percent of sellers account for 80 percent of sales so the small sellers
can't really grow so what i wanted to do was to introduce new products in a smart way that doesn't affect conversion too much and our team found a very smart way to do that
and it was very good we improved the engagement on new products and new sellers by 5x that's 500 percent so you can think about it we are giving these new sellers traffic to help them
grow and we didn't reduce conversion at all and then the next one was improving product quality so improving product quality would be things like down-ranking very
popular products that have very low ratings or very high return rates so imagine if you are a new customer on our platform you look for a dress and you
know this dress is ranked number one and it's very popular with a lot of ratings but it's a bad dress we know that the seller actually delivers something of very poor quality not like the images and you get it and you would never come
back to our platform again right you're just so disappointed by the experience i want to prevent that from happening so what i did was i proposed that i'm going to actively down-rank poor quality
products based on statistics like high return rates low ratings high complaints or fraud where there's a mismatch between what they show on the website and what
the customer gets so we down-rank those and we cut the return rate by half oh wow if you ask me that is amazing it means
we have reduced the number of unsatisfied customers by half so that to me was very fulfilling they made a poster of that and today i still have that poster
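The conversation does not spell out how poor-quality products were down-ranked, only which signals were used (return rate, ratings, complaints). As a minimal Python sketch, assuming the existing ranker already produces a relevance score, one simple option is to multiply that score by a quality penalty. The `Product` fields, the thresholds and the halving-per-red-flag rule are hypothetical and are not the team's actual method.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Product:
    product_id: str
    relevance: float       # score from the existing ranker, higher is better
    return_rate: float     # fraction of orders returned, 0..1
    avg_rating: float      # average star rating, 1..5
    complaint_rate: float  # fraction of orders with a complaint, 0..1


def quality_penalty(p: Product,
                    max_return_rate: float = 0.2,
                    min_rating: float = 3.0,
                    max_complaint_rate: float = 0.05) -> float:
    """Multiplier in (0, 1]: each quality red flag halves the ranking score.

    All thresholds and the 0.5 factor are illustrative assumptions.
    """
    red_flags = sum([
        p.return_rate > max_return_rate,
        p.avg_rating < min_rating,
        p.complaint_rate > max_complaint_rate,
    ])
    return 0.5 ** red_flags


def rerank(products: List[Product]) -> List[Product]:
    """Order products by relevance adjusted for quality, best first."""
    return sorted(products, key=lambda p: p.relevance * quality_penalty(p), reverse=True)


if __name__ == "__main__":
    popular_but_bad = Product("dress-a", relevance=0.9, return_rate=0.45,
                              avg_rating=2.1, complaint_rate=0.10)
    decent = Product("dress-b", relevance=0.7, return_rate=0.05,
                     avg_rating=4.4, complaint_rate=0.01)
    print([p.product_id for p in rerank([popular_but_bad, decent])])
```

Here the popular but poorly rated dress drops below a less popular, well-reviewed one. A multiplicative penalty leaves the original ordering untouched among products without red flags; whether to use hard thresholds like these or a learned quality score is a design choice the conversation leaves open.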
so what i found was that what is really impactful for me is helping people so for example we
increased conversion and revenue by i don't know tens of millions but i didn't get as much of a high from that as from helping sellers
grow and from helping customers reduce their return rates so i would say impact really differs from person to person yeah so now after you
have shipped your machine learning project and made an impact if you want to keep it in production how do you
and your team operate and maintain the model there are a few things to think about i think the first and most basic thing is to have your monitoring and alarms
so think about it this way you have a model in production and all of a sudden the website is not calling the api correctly and all the calls are erroring out if you don't have a monitor
on the percentage of errors the model's output will never actually be seen by customers and you will never know so it's important to have alarms on the error rate the latency and the throughput for example all of
a sudden maybe they stopped calling your api for some reason and your throughput falls to zero or to the ones or tens you know something is wrong so these are things you should monitor and you should have alarms
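The three signals mentioned here (error rate, latency, throughput) are usually configured as alarms in whatever monitoring service you already run rather than hand-written. Purely to make the checks concrete, here is a minimal Python sketch over one window of request records; the `Request` record, the thresholds and the use of p99 latency are assumptions, with the 40 ms budget borrowed from the latency constraint mentioned earlier in the conversation.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    latency_ms: float
    is_error: bool


def check_alarms(window: List[Request],
                 max_error_rate: float = 0.01,
                 max_p99_latency_ms: float = 40.0,
                 min_requests: int = 100) -> List[str]:
    """Return the names of alarms that should fire for one monitoring window.

    Thresholds are illustrative; tune them to your own traffic and SLAs.
    """
    alarms = []
    # Throughput: a drop towards zero often means callers stopped calling the API.
    if len(window) < min_requests:
        alarms.append("throughput")
    if not window:
        return alarms
    # Error rate: e.g. the website calling the API incorrectly.
    error_rate = sum(r.is_error for r in window) / len(window)
    if error_rate > max_error_rate:
        alarms.append("error_rate")
    # Latency: 99th percentile; statistics.quantiles needs at least two points.
    latencies = [r.latency_ms for r in window]
    p99 = statistics.quantiles(latencies, n=100)[-1] if len(latencies) > 1 else latencies[0]
    if p99 > max_p99_latency_ms:
        alarms.append("latency")
    return alarms


if __name__ == "__main__":
    healthy = [Request(12.0, False)] * 500
    broken = [Request(12.0, i % 10 == 0) for i in range(500)]  # 10% errors
    print(check_alarms(healthy), check_alarms(broken), check_alarms([]))
```

A healthy window fires nothing, a window with 10 percent errors fires `error_rate`, and an empty window (nobody calling the API) fires `throughput`.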
and you should get notified if any of those metrics don't look good then in terms of data i think it's essential to make sure that your data pipelines are robust for
more mature companies where the data pipelines are pretty good it's not that big of a problem but in the past when i was at a healthcare startup we were getting data from
healthcare providers and what we didn't know but eventually realized is that they were giving us periodic data dumps but these periodic dumps were actually
handmade in the sense that they were writing the sql query from scratch every single time and the columns the data schemas would change and of course that would
break our models so what we had to do was add a pre-check which is checking the columns of the data to make sure the columns match what we expect and sometimes standards change for example i worked in
healthcare where we have something called the international classification of diseases and that changes at one point in time we were at icd 9 then icd 10 and now
there's icd 11 so standards change and you also have to be aware of that every time they upgrade to a new classification of diseases all your categorical values would change that's
when you would have to retrain your model and all your one-hot encoders or categorical encodings would have to change as well so that's the data and for the machine learning model i think a
very straightforward way to make sure that your machine learning model works in production is to hold out a small slice of the most recent data and after you train your model
just evaluate your model on that small slice of data and set some threshold so on yesterday's data is the accuracy still above 95 percent if it is just deploy it to
production if it's not just break whatever orchestrator you're using maybe an airflow pipeline or something and it will error out there and send you an email i think it's better
to have a stale model in production maybe one or two weeks old than to have an incorrect model in production that's just going haywire
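The data checks and the recent-slice gate described above can live in one validation step that runs before deployment. A minimal Python sketch, assuming a hypothetical expected schema and a model exposed as a plain `predict` function; the column names, the 1 percent unseen-category tolerance and the helper names are illustrative, while the 95 percent accuracy bar is the one from the conversation. Raising an exception is what fails the task in an orchestrator such as Airflow, so the previous (stale but sane) model stays in production.

```python
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical schema the model was trained on.
EXPECTED_COLUMNS = ["patient_id", "age", "smoker", "diagnosis_code"]


class ValidationError(Exception):
    """Raised to fail the pipeline task and keep the current model live."""


def check_schema(columns: Sequence[str]) -> None:
    """Pre-check: hand-written data dumps can add, drop, or rename columns."""
    missing = set(EXPECTED_COLUMNS) - set(columns)
    extra = set(columns) - set(EXPECTED_COLUMNS)
    if missing or extra:
        raise ValidationError(f"schema mismatch: missing={sorted(missing)} extra={sorted(extra)}")


def check_categories(values: Sequence[str], known: Set[str], max_unseen: float = 0.01) -> None:
    """Flag a jump in never-seen category values, e.g. after a coding standard upgrade.

    When this fires, the categorical encoders and the model need retraining.
    """
    unseen = sum(v not in known for v in values) / max(len(values), 1)
    if unseen > max_unseen:
        raise ValidationError(f"{unseen:.0%} of category values were not seen in training")


def check_recent_accuracy(predict: Callable[[Dict], int], rows: List[Dict],
                          labels: List[int], threshold: float = 0.95) -> float:
    """Evaluate the newly trained model on a small slice of the most recent data."""
    if not labels:
        raise ValidationError("no recent labelled data to evaluate on")
    correct = sum(predict(row) == label for row, label in zip(rows, labels))
    accuracy = correct / len(labels)
    if accuracy < threshold:
        raise ValidationError(f"accuracy {accuracy:.1%} < {threshold:.0%}; keeping the stale model")
    return accuracy
```

For example, a batch that mixes an ICD-9 style code such as `401.9` into data whose encoder only knows ICD-10 codes such as `I10` fails `check_categories`, which is the signal to rebuild the encoders and retrain rather than deploy.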
yeah thanks for sharing that i'm curious what are some failures you've had in your machine learning projects and what did you learn from them i think the one thing that i can share
maybe it's not a failure i don't know actually it's a good learning so there was once i built and deployed a product
classification model so this was at lazada where they hired me to do product classification on titles i was so excited i built it and deployed it on a prototype server
and we had a newsletter at the time so i shared it in the newsletter saying hey if you want to use this model just use this curl command
so curl is something that you need to type in the command line and what i didn't realize or actually i probably did i just didn't really care about it is that our business users the people that
really mattered the people who i really wanted to try this model were on windows machines and didn't have curl and
it was just so difficult for them to figure out how to get curl working to try this product classification model on their product titles that no one just no one
bothered so after i sent out that newsletter i was so excited i was just waiting there looking at my prototype server logs for every new request coming in and crickets
there was nothing for two hours so i realized that okay i need to put an interface on this so i went back learned a bit of flask html and css and built a
nice ui for it where you could just type the product title press enter and the top 10 or top five product classifications come out
in the next newsletter i sent it out and again i was watching my server and this time my tiny flask server actually crashed wow because there was so much traffic
and that was the point where i learned that the customer is really important and the internal business stakeholders are really important you cannot expect a customer to be typing a curl command you have to provide them with a nice interface and if you do that they will use it
and they will give you so much feedback so that was a big mistake very embarrassing and from there i learned a very valuable lesson yeah i think that's a great
lesson so if you really want people to try your machine learning prototype learn something about building a simple app with flask yeah i love that
and in your blog i also read that you talk about how it's very important to write tests in the development phase of a machine learning project but it hasn't
really been emphasized a lot can you share more about that this is something that came about in my time in healthcare so for example
let's say we are going to predict the propensity of i'm just going to take an example i know it actually happens in males as well but let's say we're going to predict
the propensity of breast cancer in males and females so in females the probability would be non-zero and in males even though males can get breast cancer as well let's assume that
males should not get breast cancer so let's say someone changes the categorical variable to male let's say we have the
exact same profile but we just change the gender from female to male and all of a sudden the propensity to get breast cancer goes up how would the user feel it just feels like oh i cannot trust this model let's
take another example propensity for lung cancer right let's take the exact same profile exact same person but we change it from smoker to non-smoker should the propensity for lung cancer go
up no it should drop right because it's not a smoker yeah so these are some of the behaviors that users expect from your model
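The smoker and age examples are directional expectation tests: change one input, hold the rest of the profile fixed, and assert which way the prediction moves. A minimal Python sketch of that pattern, assuming a hypothetical `toy_risk` function as a stand-in for a trained model (its coefficients are made up) and a hypothetical `assert_direction` helper; in a real project the same checks would call your model and run alongside the usual unit tests.

```python
from typing import Callable, Dict

Profile = Dict[str, float]


def toy_risk(profile: Profile) -> float:
    """Stand-in for a trained model: a made-up linear risk score clipped to [0, 1]."""
    score = 0.01 + 0.002 * profile["age"] + 0.15 * profile["smoker"]
    return min(max(score, 0.0), 1.0)


def assert_direction(model: Callable[[Profile], float], profile: Profile,
                     feature: str, new_value: float, expected: str) -> None:
    """Change one feature, keep everything else fixed, and check the direction of change.

    expected: "down" (must decrease) or "not_down" (must not decrease).
    """
    before = model(profile)
    after = model({**profile, feature: new_value})
    if expected == "down":
        ok = after < before
    elif expected == "not_down":
        ok = after >= before
    else:
        raise ValueError(f"unknown expectation: {expected}")
    assert ok, f"{feature} -> {new_value}: expected {expected}, got {before:.3f} -> {after:.3f}"


if __name__ == "__main__":
    smoker_aged_50 = {"age": 50.0, "smoker": 1.0}
    # Same person, smoker -> non-smoker: propensity should drop.
    assert_direction(toy_risk, smoker_aged_50, "smoker", 0.0, "down")
    # Same person, older: propensity should not drop.
    assert_direction(toy_risk, smoker_aged_50, "age", 70.0, "not_down")
    print("behavioral tests passed")
```

The female-to-male breast cancer case fits the same helper with `expected="down"` against a model that takes sex as an input. These checks complement accuracy metrics rather than replace them.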
the standard software engineering unit tests we write don't really cover this so i think it's really helpful to create these behavioral tests creating these behavioral tests can be quite taxing you're going to need
some business domain knowledge but as you come across these complaints you start to build up that domain knowledge which would be that hey you know generally
people who are older would have higher rates of cardiac arrest right so given the same profile an increase in age shouldn't lead to a lower propensity of
cardiac arrest or generally people who smoke should have a higher propensity for i don't know diabetes lung cancer etc so you can start to encode this and you can
start to test if your model behaves this way before releasing it to production so you want to make sure that your model learns the right thing and why is this very important because if you make enough of these mistakes in
production you really start to lose the customer's trust and then that's when it gets really difficult yeah i think that's very important
because sometimes when we build a machine learning model we're just focusing on the metrics and we forget about the domain knowledge and i think now
people are starting to talk more about causal inference in machine learning did you just get lucky or did your model actually learn that there's some relationship
between those variables so i think to your point about smoker versus non-smoker maybe you need to have some bayesian
rule some element in your model to think about what exactly is going on in your model instead of just treating it as a black box yeah i fully agree and i think
i got inspired by this kind of testing from an nlp paper i read i can't remember the exact name whereby you know they just took
sentences and swapped some of the words for synonyms and checked the predicted sentiment did the sentiment change even though it was just a change in synonym
or they tried to detect changes in the model's behavior when classifying occupation by changing the sentence from male to female with a difference in
gender does the predicted occupation change and it does and so that's where i got the idea of behavioral testing but i thought to apply it in my world which was mostly tabular data at least at
that point in time in healthcare so i think it's useful to have this i mean you won't be able to catch everything but it's always useful to have a good test harness so
that you're aware of what you're failing on and what you're doing well on yeah and previously you had a lot of experience working in startups and now
you work at amazon so what's the difference between working for a startup and working for a big company i think amazon is slightly
different you can think of amazon as a group of startups right so even within a team it's still very startup-ish i think there are a few key differences though at amazon the infra is really
nicely set up so in my first week at amazon we had a jupyter notebook-like interface and we had a very nice data warehouse
and a very nice data catalog within my first week at amazon i could access all the data i needed i was able to plot a graph of
user reading completion rates within the first week i had access to the data i was able to do it in a jupyter notebook interface i was able to spin up my own cluster to
crunch that data so that was really nice in a startup you probably don't have this i recall in my previous startups in order to crunch data in spark it was still mostly spark-submit we didn't have
a nice jupyter notebook interface that could help us spin up a cluster and do all the plotting so that's one thing on the flip side though and maybe i'll just give you an
example on the flip side is that startups run out of money yeah big companies can run out of money too but more so startups so
previously i was in a startup and i was in the know i was aware of our cash balance and you know it was not much and we'd
always be running out of money and with this pressure in mind sometimes you might not be so inclined to take big
bets on innovation right you would say okay a one-year bet is too long can you make it a two-month bet or a three-month bet so that could be one thing so
there are pros and cons i think that mentality is really good for customers you really test and iterate very quickly i'm a fan of the book the lean startup and i really practice that in my work
so i really like that mentality i like that when everyone has that same mentality we iterate and push very fast for customers on the other hand in big companies the infra is all set up
you have more breathing room to do a bit more research and innovation to serve customers and take bolder big bets so that i
think that's one key difference and previously we talked about roles right should we be more specialized or should we learn a little bit of everything so
if someone says i really want to build a machine learning solution from the beginning to the very end i want to build the api i want to do kubernetes
and maybe someone else says oh i just want to do my research work and create a prototype and have someone else take it to production so for those two types of people
what type or size of company would you suggest they join firstly we need both types of people we do need people who really focus on the research and the techniques to really push the field forward
but we also need generalists who are able to get it done end to end to put it in production so customers can benefit from it i would say
that in general it is the bigger companies which have innovation labs and the budget for research that allow specialists to
conduct their research but that said there are also some very well-funded startups maybe not startups anymore like openai and hugging face that also
i believe have room for such innovation on the other hand if you want to be a generalist and you really want that opportunity i think there's no better place than a startup
in the sense that you are forced to do everything end-to-end because there's really no one else to help you whereas in a bigger company you
might not be allowed to do everything end-to-end there might be a clear separation of responsibilities i know of data scientists who are not allowed to write production code i also know of data scientists who are not allowed
to do their own feature engineering or write data pipelines because you know that's the stereotype the incorrect perception that data scientists can't write production code or that they don't know how to write
efficient data pipelines and orchestrate them and they just consume too much of the cluster so if you want to be a generalist and do everything i think a smaller startup
or even a startup-like team in a big company would be more suitable for you yeah thanks for sharing that so when we talked about your
day-to-day you mentioned that at the beginning of a project you talk to stakeholders and then you write documents i feel like this is a very
important thing in our daily work but it's kind of an undervalued skill can you share more about why writing documents is important for
machine learning projects yeah i think as part of that i also want to share why i started writing i think five or six years ago i was at the phase
where i was interviewing people who were two or three steps ahead of me like chief data scientists and even ctos i was asking them you know
i guess the main question was hey you know what does it take to be an effective data scientist think of the rockstar data scientists you know what do they do very well i asked them hey do
they have phds can they conduct very good research are they able to write good code or good pipelines or do they really understand the business and generally the majority of them said
that hey you know those are really good skills that you mentioned but the most effective data scientists the ones that i cannot do without are the people who can communicate i was really surprised because i thought
hey everyone can talk everyone can write what is the big deal about communication but so many of them said that that i decided to just try it so i decided to start writing once
a month on my blog and i decided to speak at meetups and that made a big difference i realized oh my goodness so many people can speak
okay so the feedback my boss got was hey you know i just had a meeting with this data scientist and i didn't understand what they were talking about or
i just got a report and i don't understand what it means so with communication everyone can talk everyone can speak but not very many people can do it in a
way that the other party can understand so i thought that's very important and that's why i think writing is important so the question is why is writing documents important for
machine learning projects i think it is important for machine learning projects for the same reason that it is important for all projects so let's just imagine before you start a project you have some vague idea
or something in your head it's very vague it's just in your mind you haven't really defined it but when you try to write it down on paper i don't know about you but for me it's always very very difficult it's so difficult to
write it down and i was talking to alexey a mutual friend and he says you'd think it's so easy but you know to write out a sentence or even a paragraph takes half an hour and i have the same thing and what i
realized and what i read from a lot of people is that when it's in your head it's very fuzzy it's not clear you don't have a very detailed point of view but when you have to write it down you force
yourself to be very specific how much is it going to cost how many man-days is it going to take what infra are you going to need because when it's in your head it's oh i'm going to deploy this i'm going to scale this it's going to autoscale but when you write it down you have to
write out the details in the design doc and you realize that you have to work out all these details and that's really important and the thing is even after you've written the design doc
if it gets lost or no one reads it that's okay because you've already done all the thinking and all the research in your mind and if you think about machine learning or any
decently sized technical project it's going to be more than a one-person effort usually it's across multiple teams multiple functions with product with business with engineering
if it's just all in your head how are you going to share it with them and get their feedback and you know some people do powerpoint slides just bullet points oh we're gonna do this with
autoscaling and whatever but how is that gonna scale you're gonna need to give i don't know three or four presentations and maybe they can record and view it but nothing beats a solid document and that's
what i learned from amazon's document culture and i really love it i cannot do without it now and one thing that i found really useful and that i imposed
on my team in the past is that before they start a project they write a one-pager and the one-pager is very basic what is the intent what's the problem we're trying to solve what is the deliverable what will we deliver what are
the success metrics and what are the constraints so we write that down in the document and sometimes the project gets long it stretches out six months nine months and you get a
bunch of feedback and new feature requests every now and then and people forget what the original intent was and you get lost but if you write it down in a document at any point in time you can always refer to it it is your map
and it has happened to me before i've worked on long projects and after long hours or after so many changes i forget what i am supposed to do oh there's this new fancy technique yes
or there's this new data let's use this but when i forget and i refer back to my original one-pager i'm like oh this is the original intent this is the success metric i don't need to use this i don't need to do
that that feature is not required this update is unnecessary i find it very useful to guide me forward so i tend to have a one-pager
to summarize the intent so i think that's very important for machine learning projects and most technical projects in general right so writing
is not about finding beautiful words writing is basically thinking clearly i fully agree i think that one thing that data scientists are guilty of
is that they use the language of academia like they would say oh the representation is this or the kl divergence or the backpropagation or whatever but business
people don't understand this oh you know we're going to put it in an embedding space and we're going to vectorize it and we're going to try and find the nearest neighbors but you need to write it in a way that people can understand what it means is that we're just going to change this
image into a set of numbers and because numbers have distance we're going to find the nearest numbers and based on that we can find the most similar-looking images and that is a way that they can understand and
when they understand something it makes it easier for them to trust it if they don't understand something at all if it's just a black box how would they trust it how would they use it how would they
push it to users or customers and have them try it so i think writing is very important for simplifying it and making it clear for other people i understand yeah exactly
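The plain-language version of the embedding explanation above (turn each image into a set of numbers, then look for the nearest numbers) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system discussed in the conversation: it assumes the images have already been turned into embedding vectors by some image model (not shown), the four-dimensional vectors are made up, and cosine similarity is an assumed choice of "distance".

```python
import numpy as np


def nearest_neighbors(query: np.ndarray, catalog: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k catalog vectors most similar to `query` (cosine similarity)."""
    # Normalise every vector to unit length so a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    c = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    similarities = c @ q
    # Sort by descending similarity and keep the top k indices
    return np.argsort(-similarities)[:k]


# Hypothetical 4-dimensional embeddings for five dresses in the catalog
catalog = np.array([
    [0.90, 0.10, 0.00, 0.20],
    [0.10, 0.80, 0.30, 0.00],
    [0.85, 0.15, 0.05, 0.25],
    [0.00, 0.10, 0.90, 0.40],
    [0.20, 0.70, 0.40, 0.10],
])
# Hypothetical embedding of the photo a user uploads
query = np.array([0.88, 0.12, 0.02, 0.22])

print(nearest_neighbors(query, catalog, k=2))  # indices of the two closest-looking dresses
```

A production catalog would typically get its embeddings from a trained image model and, at scale, use an approximate nearest-neighbor index rather than sorting every item, but the idea a business audience needs is the one in the comments: similar-looking images end up with similar numbers.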
and writing or communicating is not just about telling people what you know it's about thinking about one who is your audience and two
how can you speak their language when you talk about precision and recall those are great if you speak to a technical audience but when you speak to the business translate them into a business
use case this is what happens when we have a recall of 90 percent give them a little story an example so that's easier for them to grasp
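One hedged way to turn a recall figure into the kind of story described here is to express it as counts. In the sketch below the 90 percent recall comes from the conversation; the 75 percent precision, the fraud-review framing, and the helper name `describe_for_business` are hypothetical, chosen only to make the arithmetic concrete.

```python
def describe_for_business(recall: float, precision: float, n_positive: int = 100) -> dict:
    """Translate recall and precision into counts a business audience can picture.

    recall    = caught / actual positives
    precision = caught / everything the model flagged
    """
    if not (0 < precision <= 1 and 0 <= recall <= 1):
        raise ValueError("precision must be in (0, 1] and recall in [0, 1]")
    caught = round(recall * n_positive)   # positives the model catches
    missed = n_positive - caught          # positives that slip through (false negatives)
    flagged = round(caught / precision)   # total items the model flags
    false_alarms = flagged - caught       # valid items flagged by mistake (false positives)
    return {"caught": caught, "missed": missed, "false_alarms": false_alarms}


# Hypothetical inputs: 90% recall (from the conversation), 75% precision (made up)
counts = describe_for_business(recall=0.9, precision=0.75)
print(
    f"out of every 100 fraudulent reviews we catch {counts['caught']} and "
    f"{counts['missed']} still reach the website; along the way we also flag "
    f"about {counts['false_alarms']} valid reviews"
)
```

With these inputs the sentence reads: out of every 100 fraudulent reviews we catch 90 and 10 still reach the website, and along the way we also flag about 30 valid reviews.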
yeah i fully agree i think it really depends on your audience in my work i mostly speak to two audiences one is the business
stakeholders so for them i try to make sure that i don't use scientific jargon and the other one is fellow scientists and in those cases using scientific jargon can be useful
because it removes the need for a lot of context we know that this word means this thing and it just makes our communication efficient yeah thanks for sharing that so when you talk to
stakeholders at the beginning of the project how do you ask the right questions that's a good question asking the right questions i think again
it's just this mindset that i have okay they've given me a problem how can i cut the problem in half
and again it just goes back to me being very lazy trying to solve the problem in as little time as possible because we always have too many problems so let me give you an example and again i'll try to draw some
principles from it there was once when a stakeholder came to us and said hey we have a problem with product reviews right now they're
manually classified can this be automated can we automate it so that's a very big fuzzy problem right okay so i started to break it down okay
automate it and i tried to see what's the minimum that i could get away with okay automate it what accuracy do you need because at the start everyone asks for accuracy okay i need 95 percent accuracy
okay 95 percent accuracy is a pretty high bar so i asked them okay which one is more painful for you is it more painful if we let a fraudulent review show up on the
website or is it more painful if a valid review is flagged as fraudulent and we lose that one review and they said certainly a fraudulent review shown on our website is more painful so if you consider
fraudulent reviews as the positive class it's not too bad for them to have false positives it's not too bad to have valid
reviews flagged as fraudulent but what's more painful for them is for fraudulent reviews to get through these false negatives so immediately i moved away from the accuracy metric to false negatives and
false positives okay for false negatives what is the amount you can bear instead of 95 percent for false negatives it was as low as 50 so then i zoomed in on false
negatives and for this there are a lot of different false negatives there are a lot of kinds of reviews that they want to flag it could be that
sometimes the review is from a competing seller posting a link to another website or even to a different product on the website that's something that we don't want to encourage it could be a customer you know thinking
that in the review they can provide their contact details they want to make a complaint they want a refund so they post their contact details in the review again we don't want that or it could just be you know spam reviews sellers posting reviews of their own
products and that would just be spam so i had to ask them which of these is more important to you and again i try to break down the problem smaller and smaller so that's how i try
to ask questions i don't know if they are the right questions but they are questions that help me reduce the scope of the problem and solve it in half the time so i think there are two ways to think about this
there are a few principles that can be useful so i think one way is to really take ownership end to end imagine that
you are your manager and your manager has multiple projects that need to get done and you don't have unlimited time to solve this problem some data scientists i know when they get a problem they're so happy about
it they love to solve it to the fullest extent they spend years on it but sometimes we don't have that luxury so that's one thing if you have that mentality of you know trying to
reduce the amount of time trying to be lazy you will try to scope it down the second thing is if you have that mentality whereby you feel a sense of ownership for the business you try to
solve the problem to the best extent for them and you know some people might just think oh 95 percent accuracy okay and just run off and solve it but if you try to put
yourself in the shoes of the business or the stakeholder what do they really need and you ask these questions again and again you get to drill down into the problem and what you build actually
solves their problem yeah thanks for sharing that and in my experience sometimes
stakeholders come to us and they want something with you know 100 percent accuracy or something that machine learning
isn't capable of providing and with those types of requests or other things that you feel you cannot support how do you
communicate and push back so i think there are a few things here one thing is that if a stakeholder requires an exceptionally high level of
performance that maybe our team or our data is not able to deliver i would never say no usually i would try a feasibility
assessment on it and that takes only two to four weeks at least we put in an effort to try to solve it and maybe we'll solve it or we go back to them and say hey this really isn't solvable we don't have the labels we don't have the data
to solve it we tried and we're just not able to do it and sometimes they would accept it so that's one issue where a stakeholder comes
to us with an exceptionally high level of performance that we just can't meet but more often than not the problem i face is too many stakeholders coming to us with
too many problems and in the past what i would typically do is just say yes and overstretch myself i'd try to please everyone and when you try
to please everyone you actually please no one but then this changed when i began to lead a team i began to manage a team and they were counting on me to turn away these problems right i
couldn't just take on everything because the team was overstretched so how i would say no how i turn down these requests is to ask them okay we are going to do this
for you but how would customers benefit how would the business benefit surprisingly the majority of the time when i send them this email it's just a two-sentence email how would the
business and customers benefit they just don't respond because they don't know they haven't thought it through they just want to do it because they're interested in it so that's one early line of defense
that's easy the second line of defense is pretty straightforward so if they have a very well-articulated business benefit if they have maybe
a pr faq in amazon terms okay we have to do it but sometimes the team is stretched so what i would do back then as a manager was pull up my excel sheet roadmap look these are
the things that we want to achieve for you by the end of this year and i know that this new thing is very high priority how would you like me to shift the priorities so i put the ownership on
them because we are serving them and they have to decide how to prioritize their projects sometimes they'd say oh you know actually these other projects are much more important let's
just put this on the backlog or sometimes they would take something in the existing roadmap off the roadmap and put the new thing in and that's very important in the sense that it allows my team not to
overstretch itself we serve our customers with our limited resources and time yeah i think that's very important if you have to say no to a request
provide evidence it's not like i don't want to help you there's something else that's more important to you at this point so going back to your career
and your journey getting into machine learning and also as you grow your career what are some factors that are important to your success i would say that the number one most
important factor in my success is luck you know back then i was coming from government i was an investment analyst i knew a bit of sql a bit of r i mean why did ibm
hire me as a data scientist maybe the only reason i could think of is because i was lucky okay maybe there was some preparation involved i learned a
bit of sql on my own i knew r and statistics from my undergraduate degree i knew a bit of machine learning i was interested in it i dressed up well for the interview i
think i spoke well and they hired me so at that point in time they hired 20 people and i was the only non-technical person the rest of the people were all engineering
or information systems or phds in biochemistry or whatever there was also one other econ person so why they hired me i can only think it's luck and then the
other thing was from ibm to lazada how did i do that i think again it's luck right i mean i did a kaggle
competition we did well and i gave a presentation at a meetup and luckily there was a startup that had the same problem and someone from it was there to watch the presentation and then they
invited me in again i think that's luck okay maybe if i had not done that meetup if i had not taken the effort to do that that would never have happened yeah
but how do you know that would happen so maybe it's what i think naval calls hustle luck you hustle enough and sometimes luck
comes and finds you maybe that's that and then of course getting hired at amazon i mean i'll share that i received multiple
offers one of the offers was from zillow from the ibuying department at that point in time my wife was like you know you're going to really love
zillow i mean she's into real estate she really likes it she's into airbnb so she likes this and i was really thinking very hard between zillow and amazon but i was really drawn to the mission of the team
in books and the leadership principles so i took this offer so again i attribute it to luck that i dodged the bullet of zillow well not really a bullet i think zillow is really awesome
but you know now they are letting go of 25 percent of their workforce and i can imagine that if i had joined that team i would be part of that so yeah i think luck and maybe just
doing a lot of things yeah i think for everyone you know there's some luck in our careers including myself but i think
by putting yourself out there you are attracting a lot of luck for example you went to the meetup you did a hackathon if you had never done that maybe someone else would have gotten lucky at that
event and also if you didn't write your blog there wouldn't be a lot of other speaking opportunities for you and we wouldn't have
met so maybe i can also share how we met i don't think you know this part i was just thinking about this i was thinking oh no if daliana asks me how i can't remember i can't remember if i reached out to her
or she reached out to me or how it happened yeah so i posted something on linkedin i don't remember what and then i saw someone comment
and it was you and you said something like oh i also wrote a blog about this so in the beginning when i saw this comment i was like who is this guy i don't like that why are you promoting your article under my post sorry
and then i was curious i clicked into the blog and okay this actually has a great point it's a great blog and then i looked at your linkedin profile
so then i was like oh i want to meet eugene and he also works at amazon so i sent you an email and suggested we do a virtual coffee chat
so that's how we met and if eugene didn't have the blog if he didn't promote his blog i would never have met him so the lesson is if you have good you know good
products if you have a good blog if you really believe in yourself promote it market it otherwise people won't be able to see it and it doesn't matter how good your
model or your blog is if you don't find a way to distribute it people won't be able to learn from you so it's not about marketing yourself if
you really have something to share with the world it's about having a way to help more people think about it that way and that will help you feel more comfortable to
share your content yeah i fully agree i really can't remember why i posted it but i hope at least my comment was relevant to what you were sharing yeah i think a big part of the
reason why i write is because i get a lot of questions and you know after a while i just figured hey you know it's easier if i just try to answer all these questions
in a blog post so a common question i get is what's the difference between a data scientist an applied scientist a research scientist and an ml engineer so i wrote a blog post about that another question i got was how was your omscs experience
and i wrote a blog post about that too so i write it there and it's very helpful for people and a big unexpected benefit of writing and putting it online
and you know sometimes randomly commenting on other people's linkedin posts and maybe getting them upset at me is that i make a lot of new friends i can't remember when daliana and i met
but since our first chat we've always had a monthly recurring one-on-one call and i've made multiple new friends online this way and that has
been very fulfilling alexey is another one goku is another one and it's just so much fun
to get to know and meet all these like-minded people online and i think chip mentioned something there was
once i wanted to interview chip for my blog on informal mentors and i reached out to her and then throughout our chat she said hey you
know actually i read some of your stuff and it's really good and that's the reason why i agreed to it and i said hey you know it's because of this post that you wrote that's why i wanted to interview you and she mentioned that writing was very
important to her she really enjoys writing and it's really useful for her online and i think that's a very big benefit of writing and sharing your ideas your content it could
start small maybe just a tweet i know one thing that daliana does very well is her linkedin content which is very helpful and helps a lot of people and
she makes a lot of new friends through that and for me i just tend to write on my site maybe once a week or actually less often now so yeah highly recommend that
yeah thanks for sharing that so what are some things you learned in your career that you never learned in school
in general i think that in most machine learning courses you learn the ins and outs of machine learning you learn how to split the data properly what the different
metrics are and you know what the machine learning techniques are and on kaggle you learn the same thing and you get a lot of fast feedback what i realize is that at work
often the most difficult part that trips people up is defining the problem so for example let me give you a problem maybe you want to catch fraud
okay is it an outlier detection problem or is it a classification problem and if it's a classification problem are we predicting if something is fraud or are we predicting if something is not fraud
depending on how you frame the problem it affects the machine learning models you use and then depending on which metrics you use it affects your actual online performance so this is the one thing that starts
right at the beginning that has an immense downstream impact on everything else you do so that's one thing that we just don't get enough practice with in online courses because it's such an open space the search
space is so huge whereas in online courses or at school your search space is heavily restricted in a good way because it gives you fast feedback you iterate fast you get a dataset you get the metric
so that's very straightforward but in the real world knowing where to find data what metrics to use how to define the problem how the customer experiences it that's very important
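A small, hypothetical illustration of how framing changes the numbers: the same ten fraud predictions score very differently depending on which class is treated as positive. The labels, predictions, and the helper functions `recall_for` and `accuracy` below are invented for the example and are not from the conversation.

```python
from __future__ import annotations


def recall_for(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Recall when `positive` is treated as the positive class: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: 10 transactions, 2 of them fraudulent
y_true = ["fraud", "fraud"] + ["ok"] * 8
# The model catches one fraud, misses the other, and flags one legitimate transaction
y_pred = ["fraud", "ok", "fraud"] + ["ok"] * 7

print("accuracy:", accuracy(y_true, y_pred))
print("recall (fraud is positive):", recall_for(y_true, y_pred, "fraud"))
print("recall (not-fraud is positive):", recall_for(y_true, y_pred, "ok"))
```

Here one model is 80 percent accurate, has 87.5 percent recall if "not fraud" is the positive class, and only 50 percent recall if "fraud" is; only the last number tells the business that half of the fraud gets through.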
another thing that i learned at work that i didn't pick up in school or in online courses is that in online courses you just write the code you need to get it working
but you can't do that at work right at work you can write the code you need to get it working you can get it deployed but deploying it is really just the
start of the project what's really important is maintaining it and upgrading and improving on it so you need to write your code and design your systems in a way that they're easily maintainable
easily operable you get observability you get notified that's monitoring easily extensible and that takes a lot of effort and that's not something that you learn at school because how is someone going
to give you feedback like hey you know this design is not extensible or this code is not maintainable because in school you probably don't do code reviews there's probably no requirement for code coverage or
linting unless you actually take a course like that but i'm not aware of any courses like that so that's one thing and i think in school
you also don't learn about the operations side of it so what i mean is if at work you use an airflow pipeline or you use spark i'm sure you have come across errors every now and then and you have
to triage it quickly debug it and move on and maybe write an error report in school you don't have that because you don't have something live that's running in
production you don't have airflow pipelines that are failing because there's a leap year you don't have sql queries that are failing because the amount of data you've accumulated
because your business has been growing suddenly exceeds the capacity you have so these are very real-world funny problems tricky problems that occur
and they're just so unthinkable that it's impossible for school or online courses to cover them comprehensively so those are a couple of things that i learned at work
but not at school yeah thanks for sharing that and what are some mistakes you made in your career well you know now that i think about it
it's generally hard for me to see anything as a mistake because i tend to always see it as a learning opportunity i guess
yeah one mistake that comes to mind is the time i sent the sample curl commands to the business and no one used it yeah i think that's one mistake i made a couple of
other mistakes i recall one now with my previous boss a really great guy i was giving a presentation to executives you know on how i would
solve a product classification problem and i went through my slides here's the literature review here's what walmart did here's what linkedin did here's the data we have the preparation we did here's how i
would do it here's the machine learning model i'll use etc etc and towards the end that's it and i cannot remember how the execs looked back then but i think they were
very kind they didn't stop me they said thank you and then they left after the presentation my boss took me aside and said that presentation really sucked i mean you talk about all this but what's in it for them what's in it for
the business he said that you should put the exec summary up front say we can solve this problem to 95 percent accuracy and then everything else goes in the appendix you just say what you need to say in maybe 5 or 15 minutes and
the rest is just if they have questions they'll ask you if they don't you just saved everyone time i think that's one big mistake that i learned from and i'm very grateful to my boss back then who just
took me aside and corrected me immediately that's one mistake i mean machine learning-wise maybe i make mistakes but they're not as
painful because i have this very lazy mentality whereby whatever i think i can do i try to do it in half the time so even if i make a mistake it's not so painful
because i haven't sunk in that much cost and i tend to fire multiple shots and maybe out of i don't know five shots one fails two fail but that's not so
painful because i still have three shots as opposed to someone who spends the entire year working on something and if that fails they have nothing for that year so yeah yeah thanks for sharing that and i wouldn't
call it lazy i think it's actually smart because sometimes it's really not productive to dive into a model and spend months to
develop something without knowing whether it will work or not why don't you just quickly put something together something that's not perfect and then exactly
use the lean startup approach in your machine learning model development cycle and then iterate fail fast and see whether it works and i think lazy is
actually the source of a lot of great inventions yeah i fully agree with that and now that you mention it i think i sort of recall where this mindset came from
this mindset of mine came about when i was mentoring three interns and these three interns were only going to be with us for three to four months and back then
my kpi for them my personal goal for them was that i wanted them to do a solid project that they could put on their resume to help them get their first job so you can imagine in three to four
months there's not a lot of room to fail so i had to very quickly iterate through multiple problem statements to make sure that i had one that i believed they had a reasonable chance of success on and work on it for the rest of
their time with us that's where it came from and it's the same thing for the team when i managed a team you can imagine it's very painful
when you're doing an appraisal with someone at the end of the year and they have nothing to show for it because they sunk so much time into a project that went nowhere and i wanted to prevent
that from happening to my team so i would always iterate quickly with them even if every project in the year failed at least there would be four learnings four lessons they tried four things and that's something to show
that's actually something admirable to show instead of having one project where they went too deep down a rabbit hole i guess maybe that's partially what guides my
thinking and also i'm generally someone who's curious about many many things and just wants to try many things i admit i'm just not someone who would dive
very very deep into natural language processing and use the latest and greatest i'll just use something that's very efficient maybe transformers via hugging face that will work let's just do it solve that
problem and move on because the way i see it there are just too many customer problems to be solved too many features i want to deliver to customers and i don't have enough time to work on them all
so i just don't have the patience to dive into something so deeply when there are so many other things that i can solve yeah exactly because there are always a million things you can improve
if you don't stop so it's important to know the baseline of your model and to know how your stakeholder would like to measure success and know when you
want to stop and you know i think we don't have to be perfectionists we can just execute iterate learn from the customer and then
i think perfection is achieved through multiple iterations it's not that you put your head down figure something out and give someone a surprise that is usually
not good yeah i agree on that i think in most real-world systems perfection is very hard to achieve and the thing that's really relevant is that it's not the
destination but the journey right and in most machine learning systems it's an ongoing journey until it gets deprecated or replaced by something new so a lot of times it's really the journey that matters there's no
destination where you just stop and if there is a destination where you stop it's probably just keeping the lights on business as usual and eventually about to be deprecated yeah i agree with that
so you write you give talks and i asked my audience what they want to ask you and they wonder how you can be so productive well actually a few people
have asked me this and it's no secret from 2017 to 2019 i was working at lazada and then a startup so
it was very intense that point in time was also when alibaba acquired us i had to fly overseas for weeks at a time to integrate lazada with alibaba so
that was very taxing and at the same time i was doing a master's in computer science at georgia tech part time and truth be told i'm not a very good
student so that took 30 to 40 hours of my time every week it was very intense i was very stressed out thankfully
my family and my girlfriend at that point in time now wife were very understanding essentially i didn't go out on weekends i didn't go out at night i didn't really have a
social life at that point in time i was just studying and working so what happened in 2020 was that i moved to seattle and started my work at amazon
and i felt it immediately all of a sudden i had so much spare time i didn't go out at night at that point in time or on weekends i was new here i had so much spare time so i was like okay
what is it that i should be doing and i thought about it okay since amazon is such a writing-heavy culture let's try to improve my writing a bit more instead of writing
monthly i'll try to write once a week and so that's where i put my time and you know with work from home and the covid lockdown everything was closed i just got more time to do that so i started pouring my time into that so i
did that for one year i learned a lot about writing and about myself but this year i haven't been writing as much i think a lot of stuff i wanted to write about that i kept inside me i've just poured out
and now i'm just reflecting what is the next step for me as a writer what should i be thinking about what should i be writing about i have a lot of personal stuff that i would like
to write about but when i write about it the audience engagement is very low for example i'm a big fan of writing i think it's really important for people but every time i write about writing no one's that interested
i also think a lot about other things like habits that i'm very interested in that take very little time like maybe investing or meditation or stoicism but you know when
i write about these personal topics the engagement is very low so that's how i found the time to be productive because of covid work from home and because i was coming out of a very intense period
yeah um i think you didn't really try to be productive you just followed your interest when you're writing uh sounds like you of course you want to share your wisdom with your audience but
you're writing for yourself yeah i don't really have wisdom uh um i so i think i think one of my posts okay let's just imagine i mean people at a point in time i was
at the point okay so at the start of it i had a mindset shift which is that i thought that i had to write perfect stuff i thought i had to write perfect stuff that was completely original but then i learned that actually i don't have to write perfect stuff and it's okay if your work is not original
because it's impossible to write original work once a week and i was so interested to share this and i shared about this um and then as part of sharing about this i was also learning about this note-taking approach called
zettelkasten and i was talking to my friend gabriel and he's like you know how can i learn about this and i wrote about it so i just wrote about it online and just shared there and for the
longest time nothing happened to it um and then someone posted it on hacker news and all of a sudden it was at the top of hacker news wow you got lucky yeah yeah yeah you put yourself out there yeah i got lucky and a lot of times
it's just um i just wasn't trying to be productive yeah i was trying to be lazy someone asked me a question i was like okay instead of answering you this question via telegram or via chat
let me write it out i can share with you the link and then if other people should ask me the same question which multiple people have i can just share the link again right um and that's how it happens
yeah so the secret to being productive is to try to be lazy i love that and you mentioned through writing you learn a lot about writing and about yourself um
could you share that yeah i think one thing that a lot of people expect is that so this is the question that when i have chats with people and they ask me hey you know i want to start
writing but i don't know what to write about um i don't know what if no one reads my writing um what should be my niche i mean
let me be brutally honest i think for the first 10 or 20 pieces that you write probably no one is going to read it even if you share the link with family and friends you probably need to beg them to read it to
get their feedback yeah so who cares who cares what you write and then the other thing is what should my niche be should i write about fitness uh investing should i write about
careers and you know a lot of times and i thought about that a lot as well and what i realized is that um you don't find your niche your niche finds you so
what i found in that one year of writing every week was that um i just write a lot and after that when i look at the statistics i look at what people resonated with i found that i had this
niche in more about machine learning systems and the different ways to uh organize them and how i would look at different literature from other companies how they organize it and summarize that so that's one thing so i
guess the lesson i learned is that you should just write there's really no need to think too much about it even if no one reads it you can just write like no one is reading it just pretend
no one's reading and just write anyway eventually people might read it maybe but the main benefit of writing is to yourself which is that you clarify your thinking and you learn a lot
along the way and that's what i learned about writing yeah um i agree with that um and also previously you mentioned you
write about investment and personal finance so why do you think that's important i think that i started investing pretty late i started investing only two years after graduation i wish i started earlier
and the 2018 uh financial crash so uh when i started investing then i started to save a lot of my income i started investing my income
and so when i wanted to leave the government to go to ibm i had to take a huge pay cut it's like 33 percent pay cut wow and at a point in time i wasn't even sure if i could make it in this field but
what gave me confidence was that hey you know i had a sizable amount of savings i had a sizeable amount not sizeable but i have decent amount in my investment portfolio that if anything
goes wrong if ibm fires me um i would have a i would still be okay and then you know moving from ibm a giant multinational company to lazada this dinky startup
again it was very um risky right a lot of people think are very risky but i knew then that hey i have this cash reserves i have this investment portfolio that's growing yeah
i just felt safe to do that and because i had this cushion it allowed me to do that yes the opportunities came but i could have said no to them and i think i would have said no to them if
uh i was too risk averse if i had no cash reserves if if i was too worried that lazada might fail and where would i get a job but because i had these cash reserves that's okay
and i think investing is really important and the key is to start early because you know compound interest is just so magical i think uh i think this year the
s&p is up what 20 percent so imagine if you have an investment and i looked at my early investments in the total index right now they're up more than
100 percent yeah um so starting early helps you feel more safe it gives you financial security which allows you to take more risk in your career and of course it allows you to retire happily
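To make the "start early" point concrete, here is a minimal compound-growth sketch. The function names and the 7% annual return are hypothetical assumptions for illustration only, not figures from the conversation or a forecast; it assumes a lump sum compounded once a year with no contributions, fees, or taxes.

```python
# Minimal compound-growth sketch. The 7% annual return used below is a
# hypothetical assumption for illustration, not a figure from the episode.
def future_value(principal: float, annual_return: float, years: int) -> float:
    """Value of a lump sum compounded once a year for `years` years."""
    return principal * (1 + annual_return) ** years


def years_to_double(annual_return: float) -> int:
    """Smallest whole number of years for a lump sum to at least double."""
    if annual_return <= 0:
        raise ValueError("annual_return must be positive")
    years, value = 0, 1.0
    while value < 2.0:
        value *= 1 + annual_return  # one more year of compounding
        years += 1
    return years


if __name__ == "__main__":
    rate = 0.07  # hypothetical 7% per year
    print(years_to_double(rate))  # 11
    # Same $10,000 lump sum: one investor starts 10 years earlier than the other.
    print(round(future_value(10_000, rate, 30)), round(future_value(10_000, rate, 20)))
```

Under these assumptions the extra ten years roughly double the ending balance, which is the effect the speakers are pointing at when they say starting early matters more than picking the perfect moment.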
yeah i agree with that i think a lot of times when we get our paycheck we just put it in a bank but i think although
we're data scientists we spend a lot of time learning and it's also important to pay attention to you know personal finance and invest like eugene said it
kind of indirectly helps you with your career once you feel more secure you can just do join a startup or maybe you want to start your own startup or even start a startup i mean if you have enough reserves that you say hey you know if i
only spend two thousand dollars a month here i have a hundred thousand in reserves that can last me 50 months right and you might take the leap and do a startup or join a startup or create
a startup of your own and because you have your personal finances sorted out it gives you the opportunity to do that right yeah
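The runway math in this exchange (reserves divided by monthly spending) fits in a few lines. This is a minimal sketch; the function name `runway_months` is hypothetical, and it assumes flat monthly spending while ignoring investment returns, inflation, and taxes.

```python
# Minimal runway sketch: how many months cash reserves last at a fixed spend.
# Assumes flat monthly spending; ignores returns, inflation, and taxes.
def runway_months(reserves: float, monthly_spend: float) -> float:
    """Return how many months `reserves` cover at a constant `monthly_spend`."""
    if monthly_spend <= 0:
        raise ValueError("monthly_spend must be positive")
    return reserves / monthly_spend


if __name__ == "__main__":
    # The example from the conversation: $100,000 in reserves, $2,000 a month.
    print(runway_months(100_000, 2_000))  # 50.0
```

Dividing rather than simulating month by month keeps it honest about what it is: a back-of-the-envelope estimate of how long you could go without income before taking a leap like joining a startup.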
and you also talk about stoicism and meditation how do those practices help you in your career or life in general yeah i think one thing
uh one problem i have is is that um my mind is a monkey mind oh yeah so am i my mind is overly anxious about everything yeah i'm overly nervous about
everything so i used to meditate uh maybe every day in the morning for 10 minutes and that didn't really help but late last year i had two weeks of vacation and i had everything all
planned i was gonna do this i was gonna play a game uh really uh immerse myself into it just take a break but it turned out that that game sucked uh within the first day i just couldn't play anymore i mean it was so bad
i had to return it so now i had two weeks of time and i had nothing to do i had nothing planned so i decided okay let's try this meditation let's give this meditation
thing a try i'm gonna try it for one hour every morning and in the past i i could not imagine wasting one hour of time not reading not writing a draft not
exercising um but for this vacation i had nothing to do so i just tried it and when i was doing it i i started to explore a lot about myself
um i explored hey you know uh why am i so anxious about this um so maybe let's let's let's take a an example let's take a topic that i explore hedonic treadmill so i examine
why am i so anxious about this why do i want to get to the next level why do i want a bigger house why do i want a bigger car for example and i realized that it's just a hedonic treadmill that once
you get something you just want more yeah and i try to tie it back to my my learning of evolutionary psychology so you know this this may or may not be correct but it's just how i try to
understand in the sense that in the past um when we are cavemen and cave woman caveman just wants more resources because if he has more resources he has more meat he has more shelter he's able
to get mates and he's able to reproduce but in today's world we don't really need that no we are able to really support ourselves and live very well and i realized oh my goodness i'm on the hedonic treadmill
and i'm so anxious about this i just want more and more do more do more do more do faster do better um then i just really didn't really enjoy my life so and it
was through that meditation process that i learned about this and since then i've continued to try to meditate uh once every hour and it's always been very enlightening uh for me
and i i really treasure that because if not i would just be carrying these problems with me throughout the day it would just be there in my head affecting me but
but if i take that one hour to just think it through um i i would resolve it and i was settled so meditation as many kinds some some of it is like you know mindful meditation when you focus on your breath it helps
you focus for me i try to i do something called i don't know what's the right term i think it's insight meditation i would try to take something that's bothering me something that comes to my mind i just keep asking myself why why why until i've sorted it out and then i can
let it go oh wow and then it just clears my mind and clears my heart and it just makes it easier to live i think yeah thanks for sharing that i think
maybe i should also try that um about evolutionary psychology i recently wrote a post about how i thought i would be happy when i get promoted to you know
level five and level six and uh then i realized after i get promoted i'll be happy for a few days and then i'll feel i'm not enough i want to go to the next level
and there's always going to be another level so i think desire is never satisfied and it basically prevents you from being happy
you know unless you you achieve that and you always gonna it's like put your happiness on hold so there's no way no level is gonna make
you happy you have to be happy um happiness is the way exactly um and i think one video that i watched that was really powerful is from this guy called justin kan he's the co-founder of twitch yeah so the
co-founder of twitch he sold twitch for a billion dollars right 970 million dollars and he was sharing that you know he was uh an advisor or partner at y combinator and he was seeing people his
friends you know like airbnb which is worth i don't know how many billions and he was thinking hey you know um i'm no worse than them i can do that too i want to build a 10 billion dollar company 100 billion company
and he tried that and it didn't work out but along the way what he learned is that hey you know this is just a never-ending thing yeah we keep wanting to achieve more and more and more and more but
what really matters on our deathbed and so he was talking a lot about the hedonic treadmill and that is now very apparent to me every time i see myself desiring something am i desiring this just because it's a
hedonic treadmill or am i desiring this because it helps people um and i really try to distinguish between that if it's just a hedonic treadmill i just try to put it aside and not let it affect me
yeah that's great insights so um before we wrap up what do you think is the future of machine learning yeah that's a very tricky question i i'm
not i'm no prophet but what i can see is that i think a lot of machine learning is going to be automated right a lot of it would be i mean companies like datarobot
sagemaker h2o they try to make it easy for you you know you provide some data you provide your labels and they just auto tune your machine learning models and they learn whatever is best
but what cannot be automated what cannot be automated is defining what metrics to use defining what data to use what label to use how to frame the problem so
i think that in five to ten years time or maybe even earlier most people wouldn't be doing machine learning as we think of it now
which is hyperparameter tuning and feature engineering a lot of it you know we would just use things like hugging face or sagemaker but what's really important what's really difficult is to try to frame the
real world problem as a data and machine learning problem to try to find the right metrics for them so i think that's what i think the future of machine learning would be like yeah so basically there are going to be
a lot of tools to empower data scientists and the machine learning engineers but there's something that's not going to change it's how you communicate with stakeholders how you
pitch your idea how you give presentations and those are the parts we mentioned earlier fully agree with that yeah and what is something in your life or in
your career you are excited about well i think right now i recently adopted a puppy oh so i'm spending a lot of time uh with her training her and bringing her on our hikes and everything yeah
she's so adorable uh that's one uh the other thing is that i think uh you know with covid a bit more under control i really like that uh seattle is
opening up so i'm uh and you know today the weather is pretty good i really like that uh going out to eat again uh socialize with friends again and you know meet people face to face like i wouldn't be able to do this
uh if we were still in lockdown and i think also i think uh one idea that daliana and i have been thinking of is you know to think about how we could have more um
more talks like this yeah in a more informal manner uh with maybe more audience participation yeah um where we just take a topic and just share our ideas and opinions and maybe have some audience
ask questions uh maybe on on a more freeform uh approach maybe clubhouse or twitter spaces et cetera so i think that's
that's something that's an idea that we'll be keen to explore so if you're interested you can follow me and eugene on twitter and uh also join our newsletter we'll
post those events uh probably in uh january 2022.
yep that's it yeah well it was really fun i feel like i learned so much more from you about your
machine learning approach and also you as a person so um thanks again for joining me on the show and thank you for having me here i really um enjoyed those questions uh they were very thought
provoking and made me think hard about it and i'm glad to be able to share whatever i know what little i know with your audience thank you