1. Algorithms and Computation

By MIT OpenCourseWare

Summary

## Key takeaways - **Algorithms: Solve + Prove + Communicate**: The course teaches you to solve computational problems, communicate that your solutions are correct and efficient, and prove those properties rather than just coding. [01:08], [02:47] - **Problem as Input-Output Relation**: A computational problem is a set of inputs, a set of outputs, and a binary relation specifying correct outputs for each input, often defined via a checkable predicate. [03:44], [05:01] - **Birthday Match Algorithm**: Interview students in order: maintain record of birthdays, check if new student's birthday matches any in record and return pair if yes, else add to record; return none at end. [11:05], [13:11] - **Induction Proves Correctness**: Inductive hypothesis: if first k students contain a match, algorithm returns it before interviewing k+1. Base: k=0. Step: cases where first k has match (done by IH) or not (check k+1 against record). [19:16], [24:17] - **Efficiency via Asymptotic Analysis**: Measure performance by counting fundamental operations relative to input size n using Big-O, not wall-clock time, as it abstracts machine differences; polynomial good, exponential bad. [27:46], [34:05] - **Word RAM Model**: Word RAM assumes random access to byte-addressable memory, CPU operates on fixed-size words (e.g., 64 bits) in constant time for arithmetic, comparisons, reads/writes. [37:17], [42:14]

Topics Covered

Algorithms Prioritize Proof Over Code
Problems Are Bipartite Input-Output Relations
Induction Proves Large-Scale Correctness
Efficiency Means Polynomial Beats Exponential
Word RAM Limits Scale by Bits

Full Transcript

good morning everybody my name is jason kuh i'm going to be teaching uh this class in uh introduction to algorithms with uh two other instructors here uh faculty

in the department uh eric domain and justin solomon uh they're excellent people and so uh they will be working on uh teaching

this class with me i will be teaching the first lecture and we'll have uh each of them teach one of the next two lectures

and then we'll go from there yeah so uh that's this is intro to algorithms okay so we're going to start talking about this uh course content now what is this course about

it's about algorithms right introduction algorithms really what the course is about is teaching you to solve computational problems but it's more than that it's not just

about teaching you to solve computational problems so solve goal one solve computational

problems but it's more than that it's also about communicating those solutions to others and being able to communicate that your

way of solving the problem is correct and efficient okay so it's about two more things uh

prove correctness argue efficiency and in general it's about

communication communication i can't spell by the way uh communication of these ideas and you'll find that over the course of this class you'll be doing a lot more

writing than you do in a lot of your other courses it really should maybe be a ci kind of class because you'll be doing a lot more writing than you will be coding

for sure so it's really imp of course solving the the computational problem is important but really the thing that you're getting out of this class and other theory classes

that you're not getting in other classes in this department is that we really concentrate on being able to prove that the things you're doing are correct and better than other things and being

able to communicate those ideas to others and not just to a computer right to other people convince them that it's correct okay so that's what this class is about

so what do we what do i mean when i say solve a computational problem what is a problem what is an algorithm right i know people make fun of me because i start

with this question but i mean anyone want to answer that question no what's a problem computationally no okay so it's not such a stupid

question yeah something you want to compute okay yes that's that's true right but kind of a little bit more abstractly what i'm going to think of

a computational problem being and this is where uh kind of your prerequisite and discrete mathematics should come in

right it's a problem is kind of you've got a set of inputs inputs okay maybe i have uh one two three

four five possible inputs i could have to my algorithm okay then i have a space of outputs okay outputs and maybe i don't know maybe i have more

of them than i do inputs but these are the possible outputs to my my problem and what a problem is is a binary relation between these inputs and

outputs essentially for each input i specify which of these outputs is correct right it doesn't necessarily have to be one

right if i say give me the index in an array containing the value five there could be multiple fives in that array and so any of those indices would be

correct so uh maybe this guy maps to that output and maybe this guy maps to i don't know two or three outputs

right this input goes to one two i don't know there's a there's some kind of mapping here these edges represent a binary relation it's kind of a graph a bipartite graph

between these inputs and outputs and these are specifying which of these outputs are correct for these inputs all right that's really the formal definition of

what a problem is now generally if i have a problem and a computational problem i'm not going to specify the problem to you by saying

okay for input 1 the correct answer is 0. and for input 2 the correct answer

is 0. and for input 2 the correct answer is 3 and so on and so forth that would take forever right usually what we do when defining a problem is specify some kind of predicate

saying that oh we can check if i give you an input and an output i can check whether that output is correct or not that's usually how we define

a problem is if i i'm checking for whether this index contains a five i can just go to that array look at index five and say or that the index you gave me

and see if it equals five right so usually we're putting it in terms of predicates because in general we don't really want to talk about small instances of problems so let's say

i had the problem of among the students in this classroom do any pair of you have the same birthday okay all right well probably if there's more

than 365 of you the answer is yes right right that by what pigeonhole principle right two of you must have the same birthday right but

so let's let's generalize it a little bit say that i don't know i need a bigger space of birthdays for this question to be interesting maybe i tack on the year maybe i tack on

the hour that you were born and that's a bigger space of inputs and i wouldn't necessarily expect that two of you would be born in the same year on the same day in the

same hour right that would be a little less likely in fact as long as that space is larger than something like the square of the number of you

right then i'm less likely than then than even to uh to have a pair of you that that's a kind of a birthday problem you may have

seen in uh 0-4-2 potentially right but in general i don't i don't i'm not going to mess with probability so much here

i want a deterministic algorithm right a way of checking whether two of you have the same birth time let's say okay so uh in general in this class

we're not going to concentrate on input such as is there a pair of you in this class that have the same birthday that's kind of boring right uh i i could

just say well uh i i could do a lot of different things but what we we do in this class this is first a fixed

classroom of you i want to make algorithms that are general to any classroom right to go to your recitation i want an algorithm that will apply to your recitation an algorithm that not only applies to

this classroom but also the machine learning class before you right i want an algorithm that can change its it can accept an arbitrarily sized

input right here we have a class of maybe 300 400 students right but i want my algorithm to work for a billion students right like maybe i'm trying to check if

there's a match of something in the facebook database or something like that right okay so in general uh

we are looking for general problems that have arbitrarily [Music]

sized inputs right so these these inputs could grow very large but we want a kind of a fixed

size algorithm to solve those problems okay so what is an algorithm then i really can't spell i told you okay

i didn't lie to you um so an algorithm is a little different than a problem of problem specification i can state to you what this graph

or i can tell you what this graph looks like an algorithm is really i don't know what the outputs are i don't know what these edges are but i want to i want a fixed size kind of

machine or procedure that if i give it an input it will generate an output and if it generates an output it better be one of these correct outputs

right so if i have an algorithm that takes in this input i really want it to output this output or else it's not a correct algorithm similarly if for this one it could

output any of these three outputs but if it outputs this guy for this input that would not be a correct algorithm right

and so generally what we want is a an algorithm is a function it takes inputs to outputs right an algorithm is some kind of function right that takes these inputs maps it to

a single output and that output better be correct based on our problem okay so that's what our algorithm is uh it solves the problem if it returns a

correct output for every uh problem input that is in our domain okay and the example for uh does anyone have a possible algorithm

for checking whether any two of you have the same birth time as specified before i'm going to let someone else have a try sure just ask everyone one by one

ah great so uh what your colleague has said is a great algorithm well essentially what it's going to do is i'm going to enter i'm going to put you guys in some order okay i'm going to give each of you a

number one through however many number of students there are in this class and i'm going to interview you one by one right i'm going to say what's your birthday anytime we'll write it down

right i'm going to put in in some kind of record okay and then as i keep interviewing you i'm going to find out your birthday i'm going to check the record i'm going to look

through all the birthdays in the record if i find a match right then i return yay i found a pair and i can stop otherwise if i get through the record list i

don't and i don't find a match i just stick you at the end of the record right i add you to the record and then i move on to the next person i keep doing this okay so that's that's a proposed algorithm for this birthday problem

for birthday problem what's the algorithm here

maintain a record interview students students in some order and what does interviewing a student

mean it means two things it means check if birthday in

record and if it is return a pair so return

pair otherwise add a new student to record

and then at the very end if i go through everybody and i haven't found a match yet i'm going to return that there is none right okay so that's a statement of an

algorithm that's kind of the level of description that will be looking for you in the theory parts of this theory questions that we ask you on your problem sets right it's a verbal

description in words that you know it's maybe not enough for a computer to know what to do but if you said this algorithm to any of your

your friends in this class right they would at least understand what it is that you're doing yeah does an algorithm have to be a pure function mathematical pure function in the does a

an algorithm have to be a pure function in the mathematical sense uh as in it needs to map to a single output so we're talking about kind of a

functional programming definition of uh of a of a of a function right this is um i am talking about the mathematical

i have a binary relation and this thing has an output for every input and there is exactly one output to every input that's that's the mathematical definition of

function that i'm using for when i'm defining an algorithm yeah so basically is an algorithm like uh like a plan yeah an algorithm is a

procedure that somehow i can do whatever i want but i have to take one of these inputs and i have to produce an output and at the end it better be correct right so it's just a procedure you can

think of it as like a recipe you can think of it it's just some kind of procedure right it's a a sequence of things that you should do and then at the end you will return an output

okay so here's a possible algorithm for solving this birthday problem okay now i've given you what i argue to you or i'm asserting to

you is a solution to this birthday problem and maybe you guys agree with me and maybe some of you don't right so how do i convince you that this is

correct right well if i had you know if i was just running this algorithm on say the four students in the front

row here right i could argue it pretty well to you i could go through every you know i could assign these for people

birthdays in various combinations of either their none of them have the same birthday some two of them have the same birthday i could try all possibilities and i could go through lots of different

possibilities i had to check that this algorithm returns the right answer in all such cases right but when i have i don't know 300 of you that's going to be a little bit more difficult to argue

and so if i want to argue something is correct in mathematics i want to prove something to you for some large value what kind of technique do i use to prove such things

yeah induction right and in general what what we do in this class what we do is as a computer scientist is we write a

constant sized piece of code right that can take on any arbitrarily large size input right if it's if the input can be

arbitrarily large but our code is small then that code needs to loop or recurse or repeat some of these lines of code in

order to just read that output right and so that's another way you can arrive at this conclusion that we're going to probably need to use recursion induction

and that's part of the reason why we ask you to take a course on proofs and and inductive reasoning and discrete mathematics before this class

okay so how do we prove that this thing is correct we got to use induction so how can we set up this induction what do i need for an inductive proof

sure base case we need a base case uh we need some kind of uh a predicate yeah but we need some kind of statement of a hypothesis of

something that should be maintained right and then we need to have an inductive step which basically says i take a small value of this thing i use the inductive hypothesis and i

argue it for a larger value of my well-ordered set that i'm inducting over right okay so in uh for this algorithm if we're going to try

to prove correctness what i'm going to do is i'm going to what do i want to prove for this thing that at the end of interviewing all of you that my

algorithm has either already it has returned with a pair that match or if we're in a case where there wasn't a mat it wasn't a pair somewhere in my set

that it returned none right that would be correct right so how can i generalize that concept to make it something i can induct on right

what i'm going to do is i'm going to say let's say after i've interviewed the first k students right if there was a match in those first k

students i want to be sure that i ever returned a pair right because if after i interview all of you i've maintained that property

then i'll be sure at the end of the process i will have returned a pair if one exists so here's

going to be my inductive hypothesis sis okay if first k

students contain a match algorithm returns a match before interviewing say student

k plus 1. okay so that's going to be my inductive hypothesis now if there's n students in this class right and at the end of my thing i'm trying to

interview student n plus one oh student n plus one is not there if i have maintained this then if i replace k with n then i will

have returned a match before interviewing the last student or the when i have no more students left and then this algorithm returns none

as it should right okay so this inductive hypothesis sets up a nice variable to induct on right this k i can have increasing

up to n starting at some base case so what's my base case here my base case is the easiest thing i can do sure two that's an easy thing i could do

i could check those possibilities but there's an even easier base case yeah there's an even easier base case than one zero right after interviewing

zero students i haven't done any work right certainly the first zero can't have a match right

and so this predicate this inductive hypothesis is true just because this uh initial predicate is false

right so i can say you know base case zero check definitely this predicate holds for that

okay now we get to go for the uh the meat of this thing right assume the inductive hypothesis

true for k equals say some k prime okay and we're considering k prime plus one right

then we have two cases one of the nice things about induction is that it isolates our our problem to not consider everything all at once but break it down into

a smaller interface so i can do less work at each step right so there are two cases either the first k

already had a match right in which case by our inductive hypothesis we've already returned the correct answer right the other case is

the it doesn't have a match and we interview the k plus one student the k prime plus one student if there is

a match in the first k prime plus one students then it will include k plus k prime plus 1 the student k prime plus 1 because you know otherwise there would

have been a match in in the things before it right so there are two cases if k contains

match k prime if first k contains match already returned

turn turn it by induction right else if k prime plus one students contains match

the algorithm checks all of the possibilities k prime checks cape against

against all students essentially by brute force it's a case analysis i check all of the possible possibilities right this

check if birthday is in record i haven't told you how to do that yet but if i'm able to do that i'm going to check if it's in the record if it's in the record

then there will be a match and i can return it otherwise i have uh re-established the inductive high policy

hypothesis for the k prime plus one students does that make sense guys yeah okay so that's how we prove correctness this is

a little bit more formal than we would ask you to do in this class all the time but it's definitely sufficient you know for the levels of arguments that we'll ask you to do the bar that

we're usually trying to set is if you communicated to someone else taking this class what your algorithm was they would be able to code it up and tell a stupid computer how to do that

thing okay so any any questions on induction you're going to be using it throughout this class and so if you're

unfamiliar with this line of argument then you should go review some of that that would be good okay so that's correctness being able to

communicate that the problem the algorithm we stated was correct now we want to argue that it's efficient

right what does efficiency mean efficiency just means not only how fast does this algorithm run but

how fast does it compare to other possible ways of approaching this problem right so how could we measure how fast an algorithm

runs this is kind of a silly question yeah yeah just well i mean just record the time it takes for a computer to do this thing right now

there's a problem with just re coding up an algorithm telling a computer what to do and timing how long it takes why

yeah it would depend on the size of your data set okay we expect that but there's a bigger problem there yeah it depends on the strength of your

computer right so i would expect that um you know if i had a watch calculator and i programmed it to do something

right that might take a lot longer to solve a problem than if i asked you know ibm's research computer

right to solve the same problem using the same algorithm even with the same code right because its underlying operations are much faster right how it runs is

much faster so i don't want to count how long it would take on a real machine i kind of want to abstract the time it takes the machine to do stuff out of the picture what i kind of want to say is

let's assume that each kind of fundamental operation that the computer can do takes some fixed amount of time okay how many of those kinds of fixed operations

does the algorithm need to perform to be able to solve this problem right so here we don't

don't measure time instead count kind of fundamental operations okay we'll get to what some of those

fundamental operations are in a second but the idea is we want a measure of how well an algorithm uh performs not necessarily an

implementation of that algorithm right kind of an abstract notion of how well this algorithm does and so what we're going to use to measure

time or efficiency right is something called asymptotic analysis anyone here understand what asymptotic analysis is probably since it's in

both of your prerequisites i think uh but we will go through a a formal definition of uh asymptotic notation in uh recitation tomorrow

and you'll get a lot of practice in comparing functions using an asymptotic analysis but just to give you an idea

right the idea here is we don't measure time we instead measure ops and like your colleague over here was saying before we expect uh performance

i'm going to use performance instead of time here we expect that to depend on size of

our input right if we're trying to run an algorithm to find a birthday uh in this section we expect the algorithm to run in a

shorter amount of time than if i were to run the algorithm on all of you right so we expect it to perform differently depending on the size of the input

and how differently uh is how we measure performance uh relative that input usually we use n as a variable for what the size of our input is

right but that's not always the case so for example if we have a an array that i give you an n by n ray right that we're going to say n but

what's the size of our input how much information do i need to convey to you to give you that information it's n squared right so that's the size of our input in that

that context right or if i give you a graph it's usually the number of vertices plus the number of edges that's how big how much space i would need to convey to you

that graph that information okay so you we compare how fast an algorithm is with respect to the size of the input right and if that

we'll use the asymptotic notation we have big o notation which corresponds

to upper bounds we'll have omega which corresponds to lower bounds

and we have theta which corresponds to both right this thing is tight it is bounded from above and below by a function of this form okay

now we have a couple common ways a com a couple common functions that algorithms their running time we have a couple common functions that uh relate

an algorithm's input size to its performance some some things that we saw all the time does it can anyone give me some of those say again sorry sorry so like

a function i'm not asking this question well but has anyone heard of a linear algorithm a linear time algorithm

right that's basically saying that the side the running time of my algorithm the performance of my algorithm is linear with respect to the size of my input right

yeah say it again like putting something in a list okay so that's there's there's a lot behind that question that we'll go into later

uh this week uh but that's an example of if i do it in a silly way i stick something in the middle of a list and i have to move everything that's an operation that could take

linear time right okay so uh linear time is is a type of function we've got a number of these i'm going to start with uh

this one does anyone know what this one is constant time okay basically no matter how i change the input the amount of time this running time the

the performance of my algorithm takes it doesn't really depend on that okay the next one up is something like this this is logarithmic

time okay we have theta n which is linear and log n right

sometimes we call this log linear but we usually just say n log n okay we have a quadratic running time in general

if i have a constant power up here right it's n to the c for some constant this is what we call polynomial time

right we're as long as c is some constant and this right here is what we mean by efficient in this class usually right in

other classes right when you have big data sets maybe this is efficient right but in this class generally what we mean is polynomial and as you get down this

thing these are things are more and more efficient okay there's one class i'm going to talk to you about over here which is something like uh

let's do this 2 to the theta of n right exponential time this is some constant to a function of n

that's say super linear uh that's going to be uh you know pretty bad why is it pretty bad if i were to plot some of these things

as a function of n right let's say i plot values of up to a thousand on on my n scale here okay

what does constant look like maybe this is a thousand up here too what does a constant look like looks like a line right it looks like a line over here

somewhere it could be as high as i want but eventually anything that's an increasing function will get bigger than this right and on this scale if i

use log base 2 or some reasonable small constant what does log look like well let's let's do an easier one what does linear look like

yeah this right that's what i saw a lot of you doing okay that's linear that's the kind of base that we're comparing everything

against what does log look like like this okay but on at this scale right at this scale

really it's much closer to constant than linear and actually as n gets much much larger this almost looks like a straight line it almost looks like a constant

so log is almost just as good as constant right what does exponential look like it's the exact inverse of this thing right

right it's almost an exact straight line going up right so this is crap this is really good almost anything in this region over here is

better right at least i'm gaining something uh i i'm not i'm i'm able to not go up too high relative to my input size so quadratic i don't know something like

this and n log n is something like this and log n after a long time really starts just looking linear with a a constant multiplied

in front of it right okay so these things good that thing bad okay that's what that's trying to convey all right

so how do we measure these things if if i don't know what my fundamental operations are that my computer can

can can use right so we need to define some kind of model of computation for what our computer is allowed to do in constant time in a fixed amount of

time right in general what we use in this class is is a machine called a word ram which we you know use for its theoretical

brevity all right word ram it's kind of a loaded term what do these things mean it means i have

what can some does someone know what ram means random access memory right it means that i can randomly access different places in memory

in constant time that's that's the assumption of random access memory basically what our model of a computer is is you have memory memory which is essentially just a

string of bits it's just a bunch of ones and zeros right and we have a computer like a cpu right

which is really small it can basically hold a small amount of information but it can change that information right it can operate on that information and it also has instructions to randomly

access different places in memory bring it into the cpu act on it and read it back that makes sense but in general

we don't have an address for every bit in memory every zero and one in memory we actually does anyone know how modern computers are addressed okay so so we're gonna get there

actually what a modern computer is addressed in is bytes okay collections of eight bits so there's an address i have for every eight bits in memory consecutive eight bits in memory and so

if i want to pull something in into the cpu i give it an address it'll take some chunk right and bring it into the the cpu operate it on it and spit it back okay

how big is that chunk this goes to the the answer that you were asking which or saying which is it's some sequence of some fixed number

of bits which we call a word okay a word is how how big of a chunk that the the cpu can take in from memory at a time and

operate on okay in your computers how big is that word size 64 bits right that's how much i can operate on a time

when i was growing up when i was your age okay my word size was 32 bits and that actually was a problem for my computer

because in order for me to be able to read to address in memory i need to be able to store that address in my cpu in a word

right but if i have 32 bits how many different addresses can i address i have a limitation on the memory addresses i can address right

so how many different memory addresses can i address with 32 bits 2 to the 32 right that makes sense well if you do that calculation out how big of a hard disk

can i have to access it's about four gigabytes right so when i in my day all hard drives were limited to you know being partitioned even if you

had a bigger than four gigabyte hard drive i had to partition it into these four gigabyte chunks which you know the the computer could then read on to

right that was very limiting actually that's a restriction with 64 bits what's my limitation on memory that i

can address byte addressable turns out to be something like uh 20 exabytes to put this in context

all data that google stores on their servers on all drives throughout the world it's about 10 right so we're not going to run out of this limitation

very soon right so what do we got we've got a cpu it can address memory what are the operations i can do in this cpu

i have binary operations i can compare two words in memory and i can either do you know integer

arithmetic kind of logical operations bitwise operations but we're not going to use those so much in this class

and i can read and write from an address in memory a word in constant time those are the operations that i have available to me on most cpus

some cpus give you a little bit more power but this is generally what we analyze algorithms with respect to okay but you'll notice that my cpu is only built to operate on a constant amount of

information at once generally two words in memory an operation produces a third one and i spit it out right it takes a constant amount of time to

operate on a constant amount of memory if i want to operate on a linear amount of memory end things how long is that going to take right if i just want to read everything

in that thing it's going to take me linear time right because i have to read every part of that thing okay so in general what we're going to do for the first

half of this class mostly first eight lectures anyway is talk about data structures

and it's going to be concerned about not operating on constant amount of data at a time like our cpu is uh doing but instead what it's going to do is operate

on store a large amount of data and support different operations on that data okay so if i had a record that i want to maintain to store those birthdays that we had before

i might use something like a static array right which you guys maybe are not familiar with if you have been working in python as your only programming language

okay python has a lot of really interesting data structures like a list and a set and a dictionary and all these kinds of things that are actually not in this model there's actually

a lot of code between you and the computer and it's not always clear how much time that interface is taking right and so we're going to do

starting on thursday is talk about ways of storing a non-constant amount of information to make operations on that information faster

so just before you go i just want to give you a quick overview of the class to solve an algorithms class in algorithm problem in this class we

essentially have two different strategies we can either reduce just to using the solution to a problem we know how to solve or we can design our own algorithm which

is going to be recursive in nature we're going to either put stuff in the data structure and solve a sorting problem or search in a graph and then to design a recursive algorithm

we have various design paradigms this is all in your notes but this is essentially the structure of the class we're going to spend quiz 1 the first to eight lectures on data structures

and sorting second quiz will be on shortest paths algorithms and graphs and then the last one will be on dynamic programming okay that's the end of the first lecture thanks for coming

Loading...

Loading video analysis...