1. Algorithms and Computation
By MIT OpenCourseWare
Summary
## Key takeaways - **Algorithms: Solve + Prove + Communicate**: The course teaches you to solve computational problems, communicate that your solutions are correct and efficient, and prove those properties rather than just coding. [01:08], [02:47] - **Problem as Input-Output Relation**: A computational problem is a set of inputs, a set of outputs, and a binary relation specifying correct outputs for each input, often defined via a checkable predicate. [03:44], [05:01] - **Birthday Match Algorithm**: Interview students in order: maintain record of birthdays, check if new student's birthday matches any in record and return pair if yes, else add to record; return none at end. [11:05], [13:11] - **Induction Proves Correctness**: Inductive hypothesis: if first k students contain a match, algorithm returns it before interviewing k+1. Base: k=0. Step: cases where first k has match (done by IH) or not (check k+1 against record). [19:16], [24:17] - **Efficiency via Asymptotic Analysis**: Measure performance by counting fundamental operations relative to input size n using Big-O, not wall-clock time, as it abstracts machine differences; polynomial good, exponential bad. [27:46], [34:05] - **Word RAM Model**: Word RAM assumes random access to byte-addressable memory, CPU operates on fixed-size words (e.g., 64 bits) in constant time for arithmetic, comparisons, reads/writes. [37:17], [42:14]
Topics Covered
- Algorithms Prioritize Proof Over Code
- Problems Are Bipartite Input-Output Relations
- Induction Proves Large-Scale Correctness
- Efficiency Means Polynomial Beats Exponential
- Word RAM Limits Scale by Bits
Full Transcript
good morning everybody my name is jason kuh i'm going to be teaching uh this class in uh introduction to algorithms with uh two other instructors here uh faculty
in the department uh eric domain and justin solomon uh they're excellent people and so uh they will be working on uh teaching
this class with me i will be teaching the first lecture and we'll have uh each of them teach one of the next two lectures
and then we'll go from there yeah so uh that's this is intro to algorithms okay so we're going to start talking about this uh course content now what is this course about
it's about algorithms right introduction algorithms really what the course is about is teaching you to solve computational problems but it's more than that it's not just
about teaching you to solve computational problems so solve goal one solve computational
problems but it's more than that it's also about communicating those solutions to others and being able to communicate that your
way of solving the problem is correct and efficient okay so it's about two more things uh
prove correctness argue efficiency and in general it's about
communication communication i can't spell by the way uh communication of these ideas and you'll find that over the course of this class you'll be doing a lot more
writing than you do in a lot of your other courses it really should maybe be a ci kind of class because you'll be doing a lot more writing than you will be coding
for sure so it's really imp of course solving the the computational problem is important but really the thing that you're getting out of this class and other theory classes
that you're not getting in other classes in this department is that we really concentrate on being able to prove that the things you're doing are correct and better than other things and being
able to communicate those ideas to others and not just to a computer right to other people convince them that it's correct okay so that's what this class is about
so what do we what do i mean when i say solve a computational problem what is a problem what is an algorithm right i know people make fun of me because i start
with this question but i mean anyone want to answer that question no what's a problem computationally no okay so it's not such a stupid
question yeah something you want to compute okay yes that's that's true right but kind of a little bit more abstractly what i'm going to think of
a computational problem being and this is where uh kind of your prerequisite and discrete mathematics should come in
right it's a problem is kind of you've got a set of inputs inputs okay maybe i have uh one two three
four five possible inputs i could have to my algorithm okay then i have a space of outputs okay outputs and maybe i don't know maybe i have more
of them than i do inputs but these are the possible outputs to my my problem and what a problem is is a binary relation between these inputs and
outputs essentially for each input i specify which of these outputs is correct right it doesn't necessarily have to be one
right if i say give me the index in an array containing the value five there could be multiple fives in that array and so any of those indices would be
correct so uh maybe this guy maps to that output and maybe this guy maps to i don't know two or three outputs
right this input goes to one two i don't know there's a there's some kind of mapping here these edges represent a binary relation it's kind of a graph a bipartite graph
between these inputs and outputs and these are specifying which of these outputs are correct for these inputs all right that's really the formal definition of
what a problem is now generally if i have a problem and a computational problem i'm not going to specify the problem to you by saying
okay for input 1 the correct answer is 0. and for input 2 the correct answer
is 0. and for input 2 the correct answer is 3 and so on and so forth that would take forever right usually what we do when defining a problem is specify some kind of predicate
saying that oh we can check if i give you an input and an output i can check whether that output is correct or not that's usually how we define
a problem is if i i'm checking for whether this index contains a five i can just go to that array look at index five and say or that the index you gave me
and see if it equals five right so usually we're putting it in terms of predicates because in general we don't really want to talk about small instances of problems so let's say
i had the problem of among the students in this classroom do any pair of you have the same birthday okay all right well probably if there's more
than 365 of you the answer is yes right right that by what pigeonhole principle right two of you must have the same birthday right but
so let's let's generalize it a little bit say that i don't know i need a bigger space of birthdays for this question to be interesting maybe i tack on the year maybe i tack on
the hour that you were born and that's a bigger space of inputs and i wouldn't necessarily expect that two of you would be born in the same year on the same day in the
same hour right that would be a little less likely in fact as long as that space is larger than something like the square of the number of you
right then i'm less likely than then than even to uh to have a pair of you that that's a kind of a birthday problem you may have
seen in uh 0-4-2 potentially right but in general i don't i don't i'm not going to mess with probability so much here
i want a deterministic algorithm right a way of checking whether two of you have the same birth time let's say okay so uh in general in this class
we're not going to concentrate on input such as is there a pair of you in this class that have the same birthday that's kind of boring right uh i i could
just say well uh i i could do a lot of different things but what we we do in this class this is first a fixed
classroom of you i want to make algorithms that are general to any classroom right to go to your recitation i want an algorithm that will apply to your recitation an algorithm that not only applies to
this classroom but also the machine learning class before you right i want an algorithm that can change its it can accept an arbitrarily sized
input right here we have a class of maybe 300 400 students right but i want my algorithm to work for a billion students right like maybe i'm trying to check if
there's a match of something in the facebook database or something like that right okay so in general uh
we are looking for general problems that have arbitrarily [Music]
sized inputs right so these these inputs could grow very large but we want a kind of a fixed
size algorithm to solve those problems okay so what is an algorithm then i really can't spell i told you okay
i didn't lie to you um so an algorithm is a little different than a problem of problem specification i can state to you what this graph
or i can tell you what this graph looks like an algorithm is really i don't know what the outputs are i don't know what these edges are but i want to i want a fixed size kind of
machine or procedure that if i give it an input it will generate an output and if it generates an output it better be one of these correct outputs
right so if i have an algorithm that takes in this input i really want it to output this output or else it's not a correct algorithm similarly if for this one it could
output any of these three outputs but if it outputs this guy for this input that would not be a correct algorithm right
and so generally what we want is a an algorithm is a function it takes inputs to outputs right an algorithm is some kind of function right that takes these inputs maps it to
a single output and that output better be correct based on our problem okay so that's what our algorithm is uh it solves the problem if it returns a
correct output for every uh problem input that is in our domain okay and the example for uh does anyone have a possible algorithm
for checking whether any two of you have the same birth time as specified before i'm going to let someone else have a try sure just ask everyone one by one
ah great so uh what your colleague has said is a great algorithm well essentially what it's going to do is i'm going to enter i'm going to put you guys in some order okay i'm going to give each of you a
number one through however many number of students there are in this class and i'm going to interview you one by one right i'm going to say what's your birthday anytime we'll write it down
right i'm going to put in in some kind of record okay and then as i keep interviewing you i'm going to find out your birthday i'm going to check the record i'm going to look
through all the birthdays in the record if i find a match right then i return yay i found a pair and i can stop otherwise if i get through the record list i
don't and i don't find a match i just stick you at the end of the record right i add you to the record and then i move on to the next person i keep doing this okay so that's that's a proposed algorithm for this birthday problem
for birthday problem what's the algorithm here
maintain a record interview students students in some order and what does interviewing a student
mean it means two things it means check if birthday in
record and if it is return a pair so return
pair otherwise add a new student to record
and then at the very end if i go through everybody and i haven't found a match yet i'm going to return that there is none right okay so that's a statement of an
algorithm that's kind of the level of description that will be looking for you in the theory parts of this theory questions that we ask you on your problem sets right it's a verbal
description in words that you know it's maybe not enough for a computer to know what to do but if you said this algorithm to any of your
your friends in this class right they would at least understand what it is that you're doing yeah does an algorithm have to be a pure function mathematical pure function in the does a
an algorithm have to be a pure function in the mathematical sense uh as in it needs to map to a single output so we're talking about kind of a
functional programming definition of uh of a of a of a function right this is um i am talking about the mathematical
i have a binary relation and this thing has an output for every input and there is exactly one output to every input that's that's the mathematical definition of
function that i'm using for when i'm defining an algorithm yeah so basically is an algorithm like uh like a plan yeah an algorithm is a
procedure that somehow i can do whatever i want but i have to take one of these inputs and i have to produce an output and at the end it better be correct right so it's just a procedure you can
think of it as like a recipe you can think of it it's just some kind of procedure right it's a a sequence of things that you should do and then at the end you will return an output
okay so here's a possible algorithm for solving this birthday problem okay now i've given you what i argue to you or i'm asserting to
you is a solution to this birthday problem and maybe you guys agree with me and maybe some of you don't right so how do i convince you that this is
correct right well if i had you know if i was just running this algorithm on say the four students in the front
row here right i could argue it pretty well to you i could go through every you know i could assign these for people
birthdays in various combinations of either their none of them have the same birthday some two of them have the same birthday i could try all possibilities and i could go through lots of different
possibilities i had to check that this algorithm returns the right answer in all such cases right but when i have i don't know 300 of you that's going to be a little bit more difficult to argue
and so if i want to argue something is correct in mathematics i want to prove something to you for some large value what kind of technique do i use to prove such things
yeah induction right and in general what what we do in this class what we do is as a computer scientist is we write a
constant sized piece of code right that can take on any arbitrarily large size input right if it's if the input can be
arbitrarily large but our code is small then that code needs to loop or recurse or repeat some of these lines of code in
order to just read that output right and so that's another way you can arrive at this conclusion that we're going to probably need to use recursion induction
and that's part of the reason why we ask you to take a course on proofs and and inductive reasoning and discrete mathematics before this class
okay so how do we prove that this thing is correct we got to use induction so how can we set up this induction what do i need for an inductive proof
sure base case we need a base case uh we need some kind of uh a predicate yeah but we need some kind of statement of a hypothesis of
something that should be maintained right and then we need to have an inductive step which basically says i take a small value of this thing i use the inductive hypothesis and i
argue it for a larger value of my well-ordered set that i'm inducting over right okay so in uh for this algorithm if we're going to try
to prove correctness what i'm going to do is i'm going to what do i want to prove for this thing that at the end of interviewing all of you that my
algorithm has either already it has returned with a pair that match or if we're in a case where there wasn't a mat it wasn't a pair somewhere in my set
that it returned none right that would be correct right so how can i generalize that concept to make it something i can induct on right
what i'm going to do is i'm going to say let's say after i've interviewed the first k students right if there was a match in those first k
students i want to be sure that i ever returned a pair right because if after i interview all of you i've maintained that property
then i'll be sure at the end of the process i will have returned a pair if one exists so here's
going to be my inductive hypothesis sis okay if first k
students contain a match algorithm returns a match before interviewing say student
k plus 1. okay so that's going to be my inductive hypothesis now if there's n students in this class right and at the end of my thing i'm trying to
interview student n plus one oh student n plus one is not there if i have maintained this then if i replace k with n then i will
have returned a match before interviewing the last student or the when i have no more students left and then this algorithm returns none
as it should right okay so this inductive hypothesis sets up a nice variable to induct on right this k i can have increasing
up to n starting at some base case so what's my base case here my base case is the easiest thing i can do sure two that's an easy thing i could do
i could check those possibilities but there's an even easier base case yeah there's an even easier base case than one zero right after interviewing
zero students i haven't done any work right certainly the first zero can't have a match right
and so this predicate this inductive hypothesis is true just because this uh initial predicate is false
right so i can say you know base case zero check definitely this predicate holds for that
okay now we get to go for the uh the meat of this thing right assume the inductive hypothesis
true for k equals say some k prime okay and we're considering k prime plus one right
then we have two cases one of the nice things about induction is that it isolates our our problem to not consider everything all at once but break it down into
a smaller interface so i can do less work at each step right so there are two cases either the first k
already had a match right in which case by our inductive hypothesis we've already returned the correct answer right the other case is
the it doesn't have a match and we interview the k plus one student the k prime plus one student if there is
a match in the first k prime plus one students then it will include k plus k prime plus 1 the student k prime plus 1 because you know otherwise there would
have been a match in in the things before it right so there are two cases if k contains
match k prime if first k contains match already returned
turn turn it by induction right else if k prime plus one students contains match
the algorithm checks all of the possibilities k prime checks cape against
against all students essentially by brute force it's a case analysis i check all of the possible possibilities right this
check if birthday is in record i haven't told you how to do that yet but if i'm able to do that i'm going to check if it's in the record if it's in the record
then there will be a match and i can return it otherwise i have uh re-established the inductive high policy
hypothesis for the k prime plus one students does that make sense guys yeah okay so that's how we prove correctness this is
a little bit more formal than we would ask you to do in this class all the time but it's definitely sufficient you know for the levels of arguments that we'll ask you to do the bar that
we're usually trying to set is if you communicated to someone else taking this class what your algorithm was they would be able to code it up and tell a stupid computer how to do that
thing okay so any any questions on induction you're going to be using it throughout this class and so if you're
unfamiliar with this line of argument then you should go review some of that that would be good okay so that's correctness being able to
communicate that the problem the algorithm we stated was correct now we want to argue that it's efficient
right what does efficiency mean efficiency just means not only how fast does this algorithm run but
how fast does it compare to other possible ways of approaching this problem right so how could we measure how fast an algorithm
runs this is kind of a silly question yeah yeah just well i mean just record the time it takes for a computer to do this thing right now
there's a problem with just re coding up an algorithm telling a computer what to do and timing how long it takes why
yeah it would depend on the size of your data set okay we expect that but there's a bigger problem there yeah it depends on the strength of your
computer right so i would expect that um you know if i had a watch calculator and i programmed it to do something
right that might take a lot longer to solve a problem than if i asked you know ibm's research computer
right to solve the same problem using the same algorithm even with the same code right because its underlying operations are much faster right how it runs is
much faster so i don't want to count how long it would take on a real machine i kind of want to abstract the time it takes the machine to do stuff out of the picture what i kind of want to say is
let's assume that each kind of fundamental operation that the computer can do takes some fixed amount of time okay how many of those kinds of fixed operations
does the algorithm need to perform to be able to solve this problem right so here we don't
don't measure time instead count kind of fundamental operations okay we'll get to what some of those
fundamental operations are in a second but the idea is we want a measure of how well an algorithm uh performs not necessarily an
implementation of that algorithm right kind of an abstract notion of how well this algorithm does and so what we're going to use to measure
time or efficiency right is something called asymptotic analysis anyone here understand what asymptotic analysis is probably since it's in
both of your prerequisites i think uh but we will go through a a formal definition of uh asymptotic notation in uh recitation tomorrow
and you'll get a lot of practice in comparing functions using an asymptotic analysis but just to give you an idea
right the idea here is we don't measure time we instead measure ops and like your colleague over here was saying before we expect uh performance
i'm going to use performance instead of time here we expect that to depend on size of
our input right if we're trying to run an algorithm to find a birthday uh in this section we expect the algorithm to run in a
shorter amount of time than if i were to run the algorithm on all of you right so we expect it to perform differently depending on the size of the input
and how differently uh is how we measure performance uh relative that input usually we use n as a variable for what the size of our input is
right but that's not always the case so for example if we have a an array that i give you an n by n ray right that we're going to say n but
what's the size of our input how much information do i need to convey to you to give you that information it's n squared right so that's the size of our input in that
that context right or if i give you a graph it's usually the number of vertices plus the number of edges that's how big how much space i would need to convey to you
that graph that information okay so you we compare how fast an algorithm is with respect to the size of the input right and if that
we'll use the asymptotic notation we have big o notation which corresponds
to upper bounds we'll have omega which corresponds to lower bounds
and we have theta which corresponds to both right this thing is tight it is bounded from above and below by a function of this form okay
now we have a couple common ways a com a couple common functions that algorithms their running time we have a couple common functions that uh relate
an algorithm's input size to its performance some some things that we saw all the time does it can anyone give me some of those say again sorry sorry so like
a function i'm not asking this question well but has anyone heard of a linear algorithm a linear time algorithm
right that's basically saying that the side the running time of my algorithm the performance of my algorithm is linear with respect to the size of my input right
yeah say it again like putting something in a list okay so that's there's there's a lot behind that question that we'll go into later
uh this week uh but that's an example of if i do it in a silly way i stick something in the middle of a list and i have to move everything that's an operation that could take
linear time right okay so uh linear time is is a type of function we've got a number of these i'm going to start with uh
this one does anyone know what this one is constant time okay basically no matter how i change the input the amount of time this running time the
the performance of my algorithm takes it doesn't really depend on that okay the next one up is something like this this is logarithmic
time okay we have theta n which is linear and log n right
sometimes we call this log linear but we usually just say n log n okay we have a quadratic running time in general
if i have a constant power up here right it's n to the c for some constant this is what we call polynomial time
right we're as long as c is some constant and this right here is what we mean by efficient in this class usually right in
other classes right when you have big data sets maybe this is efficient right but in this class generally what we mean is polynomial and as you get down this
thing these are things are more and more efficient okay there's one class i'm going to talk to you about over here which is something like uh
let's do this 2 to the theta of n right exponential time this is some constant to a function of n
that's say super linear uh that's going to be uh you know pretty bad why is it pretty bad if i were to plot some of these things
as a function of n right let's say i plot values of up to a thousand on on my n scale here okay
what does constant look like maybe this is a thousand up here too what does a constant look like looks like a line right it looks like a line over here
somewhere it could be as high as i want but eventually anything that's an increasing function will get bigger than this right and on this scale if i
use log base 2 or some reasonable small constant what does log look like well let's let's do an easier one what does linear look like
yeah this right that's what i saw a lot of you doing okay that's linear that's the kind of base that we're comparing everything
against what does log look like like this okay but on at this scale right at this scale
really it's much closer to constant than linear and actually as n gets much much larger this almost looks like a straight line it almost looks like a constant
so log is almost just as good as constant right what does exponential look like it's the exact inverse of this thing right
right it's almost an exact straight line going up right so this is crap this is really good almost anything in this region over here is
better right at least i'm gaining something uh i i'm not i'm i'm able to not go up too high relative to my input size so quadratic i don't know something like
this and n log n is something like this and log n after a long time really starts just looking linear with a a constant multiplied
in front of it right okay so these things good that thing bad okay that's what that's trying to convey all right
so how do we measure these things if if i don't know what my fundamental operations are that my computer can
can can use right so we need to define some kind of model of computation for what our computer is allowed to do in constant time in a fixed amount of
time right in general what we use in this class is is a machine called a word ram which we you know use for its theoretical
brevity all right word ram it's kind of a loaded term what do these things mean it means i have
what can some does someone know what ram means random access memory right it means that i can randomly access different places in memory
in constant time that's that's the assumption of random access memory basically what our model of a computer is is you have memory memory which is essentially just a
string of bits it's just a bunch of ones and zeros right and we have a computer like a cpu right
which is really small it can basically hold a small amount of information but it can change that information right it can operate on that information and it also has instructions to randomly
access different places in memory bring it into the cpu act on it and read it back that makes sense but in general
we don't have an address for every bit in memory every zero and one in memory we actually does anyone know how modern computers are addressed okay so so we're gonna get there
actually what a modern computer is addressed in is bytes okay collections of eight bits so there's an address i have for every eight bits in memory consecutive eight bits in memory and so
if i want to pull something in into the cpu i give it an address it'll take some chunk right and bring it into the the cpu operate it on it and spit it back okay
how big is that chunk this goes to the the answer that you were asking which or saying which is it's some sequence of some fixed number
of bits which we call a word okay a word is how how big of a chunk that the the cpu can take in from memory at a time and
operate on okay in your computers how big is that word size 64 bits right that's how much i can operate on a time
when i was growing up when i was your age okay my word size was 32 bits and that actually was a problem for my computer
because in order for me to be able to read to address in memory i need to be able to store that address in my cpu in a word
right but if i have 32 bits how many different addresses can i address i have a limitation on the memory addresses i can address right
so how many different memory addresses can i address with 32 bits 2 to the 32 right that makes sense well if you do that calculation out how big of a hard disk
can i have to access it's about four gigabytes right so when i in my day all hard drives were limited to you know being partitioned even if you
had a bigger than four gigabyte hard drive i had to partition it into these four gigabyte chunks which you know the the computer could then read on to
right that was very limiting actually that's a restriction with 64 bits what's my limitation on memory that i
can address byte addressable turns out to be something like uh 20 exabytes to put this in context
all data that google stores on their servers on all drives throughout the world it's about 10 right so we're not going to run out of this limitation
very soon right so what do we got we've got a cpu it can address memory what are the operations i can do in this cpu
i have binary operations i can compare two words in memory and i can either do you know integer
arithmetic kind of logical operations bitwise operations but we're not going to use those so much in this class
and i can read and write from an address in memory a word in constant time those are the operations that i have available to me on most cpus
some cpus give you a little bit more power but this is generally what we analyze algorithms with respect to okay but you'll notice that my cpu is only built to operate on a constant amount of
information at once generally two words in memory an operation produces a third one and i spit it out right it takes a constant amount of time to
operate on a constant amount of memory if i want to operate on a linear amount of memory end things how long is that going to take right if i just want to read everything
in that thing it's going to take me linear time right because i have to read every part of that thing okay so in general what we're going to do for the first
half of this class mostly first eight lectures anyway is talk about data structures
and it's going to be concerned about not operating on constant amount of data at a time like our cpu is uh doing but instead what it's going to do is operate
on store a large amount of data and support different operations on that data okay so if i had a record that i want to maintain to store those birthdays that we had before
i might use something like a static array right which you guys maybe are not familiar with if you have been working in python as your only programming language
okay python has a lot of really interesting data structures like a list and a set and a dictionary and all these kinds of things that are actually not in this model there's actually
a lot of code between you and the computer and it's not always clear how much time that interface is taking right and so we're going to do
starting on thursday is talk about ways of storing a non-constant amount of information to make operations on that information faster
so just before you go i just want to give you a quick overview of the class to solve an algorithms class in algorithm problem in this class we
essentially have two different strategies we can either reduce just to using the solution to a problem we know how to solve or we can design our own algorithm which
is going to be recursive in nature we're going to either put stuff in the data structure and solve a sorting problem or search in a graph and then to design a recursive algorithm
we have various design paradigms this is all in your notes but this is essentially the structure of the class we're going to spend quiz 1 the first to eight lectures on data structures
and sorting second quiz will be on shortest paths algorithms and graphs and then the last one will be on dynamic programming okay that's the end of the first lecture thanks for coming
Loading video analysis...