Watch Me Clean Data in Minutes with Python
By Lore So What
## Summary

## Key takeaways
- **Strip, lower, replace column headers**: Remove leading/trailing spaces with strip, convert to lowercase, and replace spaces with underscores in one line of code to fix messy headers like ' campaign ID '. [06:20], [07:30]
- **Strip dollar signs from Spend**: Replace non-numeric characters (anything that is not a digit, dot, or minus) with nothing, then use pd.to_numeric so the Spend column supports math operations. [09:22], [10:44]
- **Fuzzy map fixes channel typos**: Create a cleanup map to standardize 'tick talk' to 'tiktok' and 'google;' to 'google ads', merging inconsistent categorical values. [11:42], [14:01]
- **Boolean map unifies Active values**: Map 'yes', 'y', '1' to True and 'no', '0' to False, filling NA with False to handle mixed boolean formats. [14:16], [15:39]
- **Cap outliers at Q3 + 3*IQR**: Define the upper limit as Q3 plus three times the interquartile range, and cap massive Spend outliers instead of removing them. [21:34], [24:29]
- **Extract season from campaign name**: Use string slicing between the first and second underscore to create a new 'season' column like 'summer' from campaign names. [24:36], [25:58]
## Topics Covered
- Data Cleaning Trumps Glamour
- Raw Debugging Builds Real Skills
- Clicks Can't Exceed Impressions
- Cap Outliers at 3x IQR
- Extract Features from Messy Strings
## Full Transcript
Data analysts, data engineers, and data scientists arguably spend the majority of their time cleaning data. I definitely agree that this is not the sexiest part of these types of roles, but data cleaning remains one of the most important skills you can have in the job market. So my goal in this video is very simple: I want to show you, end to end, what a data cleaning process with Python looks like. There are a few reasons why I think this video is going to be super useful to you. First of all, we're going to use tools that are actually used in real work scenarios, not tools you find online that aren't used in the real world. Second, we're going to work with Python, which is a super important tool for data scientists, data engineers, and data analysts, and throughout the tutorial I'll also show you some Python concepts that might be useful in case you want to revisit some theoretical aspects of the language. Third, I'm going to show you a fairly raw format of data cleaning, which means this is not going to be a polished tutorial where everything goes well. I'll probably hit some errors and debug them, so you can see how I reason as a data analytics professional when I encounter errors and look for the best solution. This is also going to be very useful for complete beginners with Python as well as advanced people, because we'll start at an easy level, but toward the end of this tutorial you'll see much more advanced concepts that I use in real work scenarios. And we're going to work with business data very similar to what you'd encounter in a real workplace — not random data about sport or music, but the kind of data actually used in departments of a real company.
If we've never met before, I'm Lorenzo. I'm a data analytics lead with over 7 years of experience working at companies like AWS and Deloitte, and I'm also the founder of the Analytics and Automation Academy, a six-week program where I teach you everything you need to know in data analytics, and I follow you on a one-to-one basis until you land a job offer in this field. This is not really a video course where I only teach theoretical aspects; it's pretty much the only resource you need today to actually land a job offer in the world of data analytics. So if you're interested in learning Excel, SQL, Tableau, and Python, working together with me on four advanced projects, building a portfolio website and AI agents end to end, plus learning the domain knowledge you need in the actual field, optimizing your resume and LinkedIn, and having me prepare you for coding interviews and business-case scenarios, then definitely make sure to check the link in the video description. Applications are open right now, so definitely apply, schedule a call with me, and I'll see if you're the right fit. And now, without further ado, let's start our end-to-end data cleaning project using Python.
So, the setup of this project is pretty simple. As always, I'm in Google Colab, my favorite editor for coding in Python. It's super easy to set up: you just open Google Colab in your browser, it's all free, you don't have to install anything, and you can start coding straight away. The first thing you see here is that I'm importing pandas as pd and numpy as np — the only two packages we're going to use in this tutorial. I'm also creating a DataFrame based on a CSV file that you'll find in the video description, so make sure you load it on your laptop as well. What I'm doing is simply a pd.read_csv; the file is called marketing campaign data messy CSV, and it contains marketing data that we'll see in just a second. Then I print some initial information about the dataset: I want to see the shape, both rows and columns. Let me run this code. The result printed on screen is 2,20 rows and 12 columns.
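As a self-contained stand-in (the real pd.read_csv call targets the messy CSV from the video description; the filename and columns below are only my approximation), the setup looks roughly like this:

```python
import pandas as pd

# stand-in for: df = pd.read_csv('marketing_campaign_data_messy.csv')
# (hypothetical filename; the real file is linked in the video description)
df = pd.DataFrame({
    ' campaign ID ': [1, 2],           # note the messy header spaces
    'spend': ['$100', '250'],          # mixed dollar-sign formatting
    'channel': ['tiktok', 'tick talk'],
})
print(df.shape)  # (number of rows, number of columns)
```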
Our DataFrame is called df, so I can also show some information about it on screen so we're all clear on what kind of data we're working with. You can see we have different campaign IDs and the name of each campaign — these are all marketing campaigns this business is running. We have the start date and end date of each campaign, and the channel: a campaign can run on TikTok, Facebook, email, and so on. We have the number of impressions, the number of clicks, the amount spent on these campaigns, the conversions, whether the campaign is currently active or not, and also the clicks again and the campaign tag. So that's a lot of information about campaigns, and it definitely replicates a real business scenario, because this is information that's pretty common to find for marketing campaigns. We wanted to make sure we're working with a dataset that represents exactly what happens in a real business. Now, our goal is to make this dataset clean and ready for analysis, because it is definitely not in the right state right now. Maybe you don't see the issues at first — I'd challenge you to pause the video and check whether you spot any anomalies on screen. For example, I see that the dates sometimes have a timestamp and sometimes don't; I also see that the spend sometimes has a dollar sign and sometimes doesn't — and there's a lot more that you might not notice at first sight, but we're going to analyze it together and make sure we clean this data so it's ready for analysis. Again, data cleaning is such an important task in any data analytics work — it's the initial step we have to get right in order to progress to the more fun part of data analytics, which is actually finding insights in the dataset. So if you're ready, and this setup is clear, as always make sure to check the video description to download all the resources we're going to use in this video. If that's clear, let's get started with the project.
The way I'm going to proceed is that I've added a few sections here with the titles of the steps we'll take in this data cleaning process. The first step, for example, is cleaning the headers, and I'm going to write some Python code to clean this part of the dataset. Again, make sure you have the dataset on your screen so you can follow along. So let's start with the first thing we want to clean: the headers. Before we actually clean the data, I want to show you the problem we're trying to fix. What I'm doing here is printing df.columns converted to a list. Let me run this part. It prints the list of columns we have, and what I see is that in some cases there are spaces at the start and even at the end of the column name, like this ' campaign ID '. These are definitely things we want to clean up, because they can cause issues later in the project.
So what I want to do is take df.columns, which selects all the columns in our dataset, and reassign df.columns with all names treated as strings. Then I apply a strip, which simply removes the spaces at the start and end of each string. I also apply a lower, because I want to convert all the column names to lowercase. And then I apply a replace — just in case we have an extra space inside a name, following Google Colab's suggestion, I replace any space with an underscore. So that's three operations — strip, lower, and replace — done in only one line of code to make sure our columns are properly formatted. I'm going to print a message saying "fix applied", something like that, and then, again following Google Colab's suggestion, print the new df.columns as a list to see the new column names. So let me rerun this part.
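Those three chained string operations boil down to one line on the column index; here's a minimal sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame(columns=[' Campaign ID ', 'Start Date', ' Spend'])

# strip whitespace, lowercase, and replace inner spaces with underscores
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
print(df.columns.tolist())  # ['campaign_id', 'start_date', 'spend']
```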
Now, as you can see, the first thing printed is the original list of columns — I'm reprinting it — then, as a separator, "fix applied", and then our newly cleaned column names. For campaign ID we removed the extra spaces at the start and end of the name, and we lowercased everything so it's cleaner. All of our columns are now ready and in a clean state. For our next step, I want to check the spend column,
because, as I showed you already, there are some issues there. Again, let me show you what the problem in this column is. I'm going to define a variable called dirty_spend_mask, equal to df['spend'] converted with astype(str), and then — following Google Colab's suggestion — I select the rows where this column contains a dollar sign. Now I can also print where we find this problem: I do a print of df.loc with our dirty_spend_mask variable, then a comma, and here I just select the campaign ID and the spend column, and show only the first three rows. Let me run this part. As you can see, I have three campaign IDs where the spend has this dollar sign, which is causing a problem because we have a number with a string at the start. We want a pure number here, because maybe we want to calculate a sum or do some other mathematical operation, so we don't want the dollar sign at the start.
So what I can do to fix this — again following what Google Colab is telling me — is define a new version of the spend column. I convert it to a string, simply because I want to operate on strings, and then I say: replace any character that is not a digit, a dot, or a minus with nothing. This just makes sure we keep only the numeric characters in the string — that's exactly what this line does. Then, because I converted it to a string, I want to convert it back to a numeric variable, so I copy this part and apply pd.to_numeric to df['spend']. Now I grab the same print statement we had before, just to see how those three campaign IDs look after the cleaning we just implemented. Let me rerun this part. As you can see, we have our mask — the initial state we had before — and then, with the fix applied, the numbers are in the right format: no more dollar sign, as expected.
Now, you might have noticed — and I'm going to use this pattern a lot later in this tutorial — that I keep creating a variable called dirty_spend_mask. The reason I create this variable is that I want to identify what we're trying to clean, but I also want to be able to refer back to the initial state even after the cleaning step. That's why I create this mask variable: to see the before and after, if that makes sense. It's actually a very good common practice when you do data cleaning — you can go ahead with your cleaning process, but you also keep a record of how the data looked originally.
Now let's get on with the next step of our data cleaning process: categorical typos, where we'll apply a bit of fuzzy-matching logic. First of all, what's the problem here? I'm going to print df['channel'] — in this case I'm interested in the channel column — and its unique values. Let's run this. As you can see, in the channel column we have, for example, TikTok spelled one way but also 'tick talk', and Facebook written in different ways. So we basically have a channel column where many different values should actually be merged together. That happens quite a lot: different values that actually refer to the same thing. Obviously we want to clean this column. So how do we do it? Well, we can create a cleanup map.
Let me show you what that looks like. I type an equals sign and then curly brackets. It's a bit of a manual process, but I'm simply mapping each messy value back to the right format. First, this Facebook with a semicolon should map to the correct spelling. Then this Instagram, which is clearly wrong, becomes the right version. This next one is definitely wrong as well, so let me put it in the right format — Google Colab is helping me here. Perfect: 'google' becomes 'google ads', 'tick talk' becomes 'tiktok', and email is already in the right format, so we keep it. I'm also going to deal with the NA values: again following Google Colab's suggestion, I change the NA string into a null value using numpy. I'm happy with that. Now I need to redefine df['channel']: again following Google Colab, I replace the values in df['channel'] using our cleanup map. Then I print our usual "fix applied" message, and reprint the unique values for channel to see if we cleaned up the mess we had before. Let me run this part. As you can see, I'm reprinting the original state we had before, then "fix applied", and then our values, which now look much better than before.
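The cleanup map is just a dict passed to replace; the exact messy values below are my guesses at what's in the file, but the pattern is the same:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'channel': ['tiktok', 'tick talk', 'facebook;', 'google', 'NA']})

cleanup_map = {
    'tick talk': 'tiktok',    # typo -> canonical name
    'facebook;': 'facebook',  # stray punctuation
    'google': 'google ads',   # incomplete name
    'NA': np.nan,             # literal 'NA' string -> real null
}
df['channel'] = df['channel'].replace(cleanup_map)
print(sorted(df['channel'].dropna().unique()))
```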
Now, we can also handle the mixed booleans. Booleans are values like True and False, yes or no — those kinds of values. That's the next step we can do together. Again following Google Colab, I print what we have in the active column. As you can see, we have different values that should really all be booleans, as I mentioned: 'yes', but then 0, then 'no', True, 'yes' again. A bit of a mix, and we're pretty much going to follow the same logic as before.
So we need a bool map. I create my map for the correct booleans and open the brackets. Here, for example, 'yes' needs to become True, and then 'y' and also '1' become True. Then we have 'no' and '0' mapping to False. Perfect — I think that should clean up all the messy values in the active column. Now I redefine df['active']: again following Google Colab, I use our bool map to map the values to what we want in the final state of this column. Also, in case we have NA values, we fill them with False. Then we print our usual "fix applied" message and check the unique values of the active column again. Let me run this part. As you can see, we have the initial values of the column, then "fix applied", and now we only see True or False.
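The boolean map plus fillna step might look like this; normalizing with strip/lower first is my addition (not shown in the video) and makes the map shorter:

```python
import pandas as pd

df = pd.DataFrame({'active': ['yes', 'Y', '1', 'no', '0', None, True]})

bool_map = {'yes': True, 'y': True, '1': True, 'true': True,
            'no': False, '0': False, 'false': False}
# normalize to lowercase strings first, then map; anything unmapped becomes False
df['active'] = df['active'].astype(str).str.strip().str.lower().map(bool_map).fillna(False)
print(df['active'].tolist())  # [True, True, True, False, False, False, True]
```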
Now let's get into the date columns, because there will probably be some issues there as well. I want to print the data type of the start date column — let's follow what Google Colab is suggesting. What you can see is that our start date does not have a date data type; it's actually an object. We want to make sure this start date is an actual date, so that when we work with it, our operations won't fail and cause errors. So what we want to do is simply change the type of the start date. Again following Google Colab's suggestion, we use pandas to convert the start date into a datetime. And if there are errors — if for some reason there are values we cannot convert into a datetime — then errors='coerce' will turn that unparsable format into a Not-a-Time value. Because we also have the end date in our dataset, we do the same for the end date; let me put that in, again following Google Colab's suggestion. Then we print our "fix applied" message — let me grab it from above — and print the data type of the start date column again. If I run it: first we have object, our initial format, then "fix applied", and now the new data type of our start date column is datetime, as we wanted.
Now, the other thing I want to check is the integrity between clicks and impressions. It's not possible for clicks to be higher than impressions — it's logical: an impression is when something appears on a person's screen, and a click is when that person actually clicks something clickable in the campaign. There can't be more click events than impressions, because that would mean more clicks than the number of times something appeared on someone's screen. So here's what we want to check: I define a mask — I'll call it impossible_mask — equal to df['clicks'] greater than df['impressions']. Then, again following Google Colab, I print df.loc with the impossible mask, and for those rows I just want to see the campaign ID, impressions, and clicks, and only three rows for now. Let me run this part. And... okay, we got an error, and I think I know why. Let me print our DataFrame again. The error, I believe, is something I noticed earlier when loading the DataFrame: we actually have two columns called clicks, and I think that's causing the problem in this part of the code. So what I'm going to do is paste in this bit of code, which basically says: locate the columns that are duplicated and remove the duplicates. I run this part, then check whether the DataFrame looks okay now. As you can see, we have one clicks column and that's it — the other one is gone. Perfect, it's working as expected. So now let me try that part of the code again: this time it doesn't cause any error, and it tells me "empty DataFrame."
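Both the duplicate-column fix and the integrity check are short; a sketch with toy numbers:

```python
import pandas as pd

# toy frame with a duplicated 'clicks' column, like the one in the video
df = pd.DataFrame([[120, 100, 100], [50, 60, 60]],
                  columns=['impressions', 'clicks', 'clicks'])

df = df.loc[:, ~df.columns.duplicated()]   # keep the first of each duplicated column
impossible_mask = df['clicks'] > df['impressions']
print(df.loc[impossible_mask])             # rows where clicks exceed impressions
```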
So it seems there are no instances where clicks are higher than impressions. That's effectively a data quality check we have in place, and everything is clean in this regard. Perfect — we can move on to the next step, which is logical integrity for "time travel". What I mean by this is that, because we have a start date and an end date, I want to make sure there's a chronological order between them, with no instances where the end date is actually in the past compared to the start date.
So I'm going to follow Google Colab's suggestion here. I define time_travel_mask as end date less than start date, then use df.loc with the time travel mask to check the campaign ID, start date, and end date for the first three occurrences. Let me run it. What I can see is that this campaign, for example, has a start date on the 6th of May, while the end date is actually in the past — the 1st of May. So this is definitely a problem. Now, how do we fix it? Well, there's not really a right or wrong answer. I'm going to make an assumption: I'll assume the end date is 30 days from the start date, so I'll change the end date for these occurrences — maybe because I know campaigns usually last 30 days. That's an assumption I'm making based on my knowledge of this dataset, and that's what I'll apply for this cleaning step. So again, I do a df.loc, select our time travel mask and specifically the end date, because that's what I want to change, and then do another df.loc — following Google Colab's suggestion — taking the start date and adding a pd.Timedelta of 30 days. Now I print our usual message — let me grab it from above — "fix applied", and then print the same check as before to see whether we still have the error. Let me run this part. As you can see, we have the campaigns, then "fix applied", and now the same campaign starts on the 6th of May but ends on the 5th of June. Again, I'm adding 30 days from the start date, so this is just a demonstration that the code is working as expected.
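The 30-day assumption translates into one masked assignment with pd.Timedelta (the dates below are invented):

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': pd.to_datetime(['2024-05-06', '2024-01-01']),
    'end_date':   pd.to_datetime(['2024-05-01', '2024-01-31']),  # first row "time travels"
})

time_travel_mask = df['end_date'] < df['start_date']
# assumption from the video: campaigns usually last about 30 days
df.loc[time_travel_mask, 'end_date'] = (
    df.loc[time_travel_mask, 'start_date'] + pd.Timedelta(days=30)
)
print(df.loc[0, 'end_date'].date())  # 2024-06-05
```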
The next one is going to be a bit more advanced, but I also think it's a very interesting use case: handling the outliers we find in our initial dataset. The thing I want to show is that in the spend column we have massive outliers. There are different ways to deal with this, but one way is to use the interquartile range, which finds the middle spread of our data. To do this, I create a first variable called Q1 — Google Colab is already suggesting what I want to do — using the spend column's 25th percentile, and then Q3, which is the 75th percentile. Then I define the interquartile range, IQR, equal to Q3 minus Q1. Perfect. Now I can define an upper limit: the maximum value I'm willing to accept in the spend column. Again, this is more of an assumption — something I decide as part of this cleaning step. The way I define the upper limit is by starting from Q3 and adding three times the interquartile range. Then, as suggested by Google Colab, I define an outlier mask to check the instances where the spend is higher than the upper limit — and I'll tell you in a moment why this is the logic I'm using. I just check the campaign IDs where the spend is an outlier. Let me follow the suggestion and run this bit: it prints three instances where, yes, the spend is way higher than the upper limit I just defined. To explain the logic: by calculating the interquartile range, the idea is that I want to see the usual range for the spend value — what's considered a normal variation for it. Then I take Q3, pretty much the upper tail of the spend column, add three times that usual range, and say I don't want to see anything higher than that. These are the instances where we have those outliers. As always, I then print our "fix applied" message, and for the rows in the outlier mask I cap the spend column at the upper limit. So instead of removing these values, I'm saying: okay, that's too much of an outlier, I'll just put the upper limit there. Again, that's a decision I'm making — not necessarily the right one, just the logic I'm using to clean these outliers. Then I print the same instances we had before, and as you can see, these were the original values, and now instead of those massive values we have our upper limit, which in our case is 8,603.
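The whole IQR capping step fits in a few lines (toy spend values; the 3×IQR multiplier is the video's choice):

```python
import pandas as pd

df = pd.DataFrame({'spend': [100.0, 120.0, 130.0, 150.0, 160.0, 10000.0]})

q1 = df['spend'].quantile(0.25)
q3 = df['spend'].quantile(0.75)
iqr = q3 - q1
upper_limit = q3 + 3 * iqr                   # anything above this is an outlier

outlier_mask = df['spend'] > upper_limit
df.loc[outlier_mask, 'spend'] = upper_limit  # cap instead of dropping
print(df['spend'].max())  # 262.5
```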
Now, the other cool thing we can do as part of our data cleaning process is feature extraction, which means creating a new column based on an existing one. Let me explain exactly what I want to do here. I print the campaign name — just the first three — to show you that the campaign names look like this: Q4, then summer, then an ID. What I want to extract is the season — summer, launch, or winter — because maybe that's extra information I want to get out of the campaign name. So I define a new column called season, equal to — again following Google Colab's suggestion — the campaign name with a string operation that, although definitely hard to read, basically extracts only what sits between the first underscore and the second underscore, which is exactly this text: "summer". I print our "fix applied" message as always, and then print the same three instances, but now, alongside the campaign name, I also select season, our new column. I think we're missing brackets here and here — let me try running it. And there you go: we have the campaign name and this new column called season, where we're extracting a piece of the campaign name that we might use later in our analysis.
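The video slices between underscore positions; an equivalent and easier-to-read version uses str.split (the campaign names below are invented):

```python
import pandas as pd

df = pd.DataFrame({'campaign_name': ['Q4_summer_001', 'Q1_winter_002', 'Q2_launch_003']})

# the token between the first and second underscore is the season
df['season'] = df['campaign_name'].str.split('_').str[1]
print(df['season'].tolist())  # ['summer', 'winter', 'launch']
```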
And there you have it — I implemented nine different steps in our data cleaning process, using different techniques, assumptions, and logic. Hopefully you've seen, end to end, how I reason about data cleaning and how to actually clean your data and make it ready for analysis using Python, with techniques like extracting strings from text and handling outliers. We also saw how to work with dates, checked the integrity of our data, checked for duplicate columns, changed the data type for dates, handled mixed booleans and categorical typos, fixed numerical values, and cleaned the headers of our DataFrame. So that's a lot of stuff we implemented today — hopefully it was useful. And there you have it: a full end-to-end data cleaning project using Python. If you found at least one useful piece of information in this video, make sure to like and subscribe to my channel so I can help you even further in the next videos. And if you like what we've done in this video and want me to help you land a job offer in data analytics, then definitely check out my Analytics and Automation Academy. The link is in the video description, and I definitely encourage you to apply if you're serious about this type of career. We still have some spots open, so make sure to check it out. I'll also leave here on screen another video I made on data cleaning, but this time done with SQL.