PK #iAW H Unit_1.2_-_Algorithm_Analysis_and_the_RAM/01-Analyzing_Algorithms.en.srt1
00:00:00,000 --> 00:00:05,300
In this course, we're going to do a considerable amount of algorithm analysis,
2
00:00:05,300 --> 00:00:09,960
which is why we're going to do an overview of the techniques and notations that we're going to use.
3
00:00:09,960 --> 00:00:15,240
If you've already had an algorithm course before and it terms like Big O-notation or RAM
4
00:00:15,240 --> 00:00:19,360
or a worst-case running time are already meaningful to you, then you might try
5
00:00:19,360 --> 00:00:22,800
and go directly for the quizzes for the respective sections.
6
00:00:22,800 --> 00:00:24,700
Let's try it out with a little quiz.
7
00:00:24,700 --> 00:00:30,180
What I want you think about is the various factors that we need to take into account
8
00:00:30,180 --> 00:00:35,060
when we are analyzing an algorithm or more precisely, when we are analyzing the running time
9
00:00:35,060 --> 00:00:38,520
of an algorithm because that would depend on the number of factors.
10
00:00:38,520 --> 00:00:40,950
What I would like you to think about is what factors
11
00:00:40,950 --> 00:00:43,590
can determine the running time of an algorithm.
12
00:00:43,590 --> 00:00:46,640
So here are six different possibilities.
13
00:00:46,640 --> 00:00:52,830
First the size of the input, secondly the content or structure of the input,
14
00:00:52,830 --> 00:00:56,620
third the type of computer that we are using for the algorithm,
15
00:00:56,620 --> 00:01:03,280
fourth the amount of memory the computer has, five how the algorithm is implemented,
16
00:01:03,280 --> 00:01:07,260
and six the programming language that we're using to implement the algorithm.
17
00:01:07,260 --> 00:01:12,780
I want you think of each of those six aspects and then make a check mark
18
00:01:12,780 --> 99:59:59,000
where you think that this aspect can influence the running time of an algorithm.
PK
#iAsR R Q Unit_1.2_-_Algorithm_Analysis_and_the_RAM/02-Analyzing_Algorithms_Solution.en.srt1
00:00:00,000 --> 00:00:05,340
The answer here is that all of these six factors can influence the running time of an algorithm.
2
00:00:05,340 --> 00:00:10,230
So the size of the input, I think that's a rather obvious one so in the example of Alice--
3
00:00:10,230 --> 00:00:14,290
if she is running her algorithm on a very small network that takes much shorter time
4
00:00:14,290 --> 00:00:18,800
than if she is running it on the huge Python communication center network for example.
5
00:00:18,800 --> 00:00:23,740
The structure of the input can also influence the running time of an algorithm.
6
00:00:23,740 --> 00:00:29,930
So for example if the network was structured in a way that we find that it can be covered
7
00:00:29,930 --> 00:00:35,470
with one or two monitoring devices, then the algorithm could worked in a way that
8
00:00:35,470 --> 00:00:39,500
we could immediately stop and not need to look at more complex assignments.
9
00:00:39,500 --> 00:00:43,520
Finally, the type of computer that we're using that is also very obvious one.
10
00:00:43,520 --> 00:00:48,600
If you're using a computer that is much faster so say we're using a huge workstation
11
00:00:48,600 --> 00:00:53,430
instead of a super laptop, then the algorithm would run much faster.
12
00:00:53,430 --> 00:00:57,530
The amount of memory the computer has that can also be a very important factor,
13
00:00:57,530 --> 00:01:01,790
although it might not be obvious at first sight while the memory has to do with running time.
14
00:01:01,790 --> 00:01:06,720
Let's say the memory of your computer is not enough to keep all the data
15
00:01:06,720 --> 00:01:11,560
that the algorithm is using and it has to use the hard disk for example to do some of the work
16
00:01:11,560 --> 00:01:16,860
or the memory is not enough and the algorithm has to recalculate certain parts of the solution.
17
00:01:16,860 --> 00:01:21,160
Memory is also an important factor for running time.
18
00:01:21,160 --> 00:01:24,200
How the algorithm is implemented that is of course very important.
19
00:01:24,200 --> 00:01:31,180
So are you using an implementation that is very efficient or do you have unnecessary code or
20
00:01:31,180 --> 00:01:34,050
any data structures that are inefficient that can
21
00:01:34,050 --> 00:01:36,880
make a huge difference in the practice when we run an algorithm.
22
00:01:36,880 --> 00:01:41,020
And finally the programming language use that is of course a debate
23
00:01:41,020 --> 00:01:44,100
that many people like to have that it certainly a factor.
24
00:01:44,100 --> 00:01:48,260
So there are some programming languages that will make an algorithm run a lot faster
25
00:01:48,260 --> 00:01:51,270
than other programming languages.
26
00:01:51,270 --> 00:01:53,430
And so you need to think about if efficiency matters,
27
00:01:53,430 --> 00:01:56,760
they usually also use a programming language that is suited for that.
28
00:01:56,760 --> 00:02:01,180
That's quite a lot of factors to look at and actually, I think there are lots of other factors
29
00:02:01,180 --> 00:02:04,290
that would also determine the running time of an algorithm.
30
00:02:04,290 --> 00:02:08,410
So that's why when we talked about analyzing algorithms,
31
00:02:08,410 --> 00:02:12,800
we'll have to work with the number of simplifications to focus on what's really important
32
00:02:12,800 --> 99:59:59,000
and not have to take account of all these factors and all the countless others that you might think of.
PK #iAy, C Unit_1.2_-_Algorithm_Analysis_and_the_RAM/03-Simplifications.en.srt1
00:00:00,000 --> 00:00:05,790
There's three simplifications that we'll introduce to make analyzing algorithms easier for us.
2
00:00:05,790 --> 00:00:11,450
The first one is we will talk about a way that we can analyze algorithms without actually having
3
00:00:11,450 --> 00:00:15,120
to implement them, which we already that a lot of times.
4
00:00:15,120 --> 00:00:20,890
The second simplification is going to be that we're not going to consider all possible inputs
5
00:00:20,890 --> 00:00:27,100
that an algorithm can be run on, but we're going to only consider the worst possible inputs
6
00:00:27,100 --> 00:00:29,600
and I'll say more about what that means too.
7
00:00:29,600 --> 00:00:35,420
And finally, I'll introduce a notation to you called Big O-notation that will allow us to ignore
8
00:00:35,420 --> 00:00:40,110
details that for now we'll call unnecessary, so that we can really focus on the parts
9
00:00:40,110 --> 00:00:45,350
that are important for an algorithm and ignore those that we're not interested in.
10
00:00:45,350 --> 00:00:49,300
What do these simplifications actually look like or how are we going
11
00:00:49,300 --> 00:00:51,770
to go about those simplifications.
12
00:00:51,770 --> 00:00:55,650
First of all, to analyze an algorithm without implementing, we're going to consider
13
00:00:55,650 --> 00:00:59,810
a special kind of computer, which is not really a computer but a model that comes pretty close
14
00:00:59,810 --> 00:01:02,070
and that model is called the RAM.
15
00:01:02,070 --> 00:01:06,560
Looking at the worst possible inputs is a concept called worst-case running time,
16
00:01:06,560 --> 00:01:11,910
and finally to be able to ignore all unnecessary details, we'll use a type of notation
17
00:01:11,910 --> 00:01:16,280
called Big O-notation or Landau notation.
18
00:01:16,280 --> 99:59:59,000
So let's start out with our theoretical computer called the RAM.
PK #iA&. ; Unit_1.2_-_Algorithm_Analysis_and_the_RAM/04-The_Ram.en.srt1
00:00:00,000 --> 00:00:05,290
The goal of working with a model computer instead of a real computer is that
2
00:00:05,290 --> 00:00:10,270
we want to have a machine, which is as easy as possible
3
00:00:10,270 --> 00:00:14,660
but still let us capture the main aspects of a real computer.
4
00:00:14,660 --> 00:00:19,430
And the reason why you want to have a simple computer is that a real computer contains
5
00:00:19,430 --> 00:00:24,240
lots of parts that are not really interesting when we want to analyze an algorithm
6
00:00:24,240 --> 00:00:29,570
and also you might have different computers that run at different speeds
7
00:00:29,570 --> 00:00:32,400
but actually that doesn't really say something about your algorithm.
8
00:00:32,400 --> 00:00:35,350
In the next year, when the faster computer model comes along,
9
00:00:35,350 --> 00:00:38,430
you can already run your algorithm a bit faster but that doesn't really say
10
00:00:38,430 --> 00:00:41,240
that your algorithm has become better.
11
00:00:41,240 --> 00:00:48,280
So our goal here is to strip down a computer that you used to to a model that is as simple as possible
12
00:00:48,280 --> 00:00:52,300
but still has basically the same capabilities as a real computer.
13
00:00:52,300 --> 00:00:58,430
So we will do this as a quiz and what I want you to think about is what are the essential components
14
00:00:58,430 --> 00:01:01,250
for having a minimal computer model.
15
00:01:01,250 --> 00:01:07,340
And what I mean by essential is that the computer must still be able to solve problems
16
00:01:07,340 --> 00:01:13,260
to perform any computation that this machine here could but also not anymore.
17
00:01:13,260 --> 00:01:17,260
So what I would like you do is to check off each of those boxes
18
00:01:17,260 --> 00:01:23,260
if you think that this component of a computer is essential to keep in a computer model.
19
00:01:23,260 --> 00:01:28,260
So do you need a memory to store your results in?
20
00:01:28,260 --> 00:01:33,270
Do you need a graphics card? Some sort of input and output?
21
00:01:33,270 --> 00:01:38,200
Do you need a mouse? Do you need a monitor? Some sort of file system?
22
00:01:38,200 --> 00:01:42,460
An operating system? Programming capabilities? Or a CD-ROM?
23
00:01:42,460 --> 99:59:59,000
And keep in mind that we are looking for a model that is as simple as possible.
PK #iA/x0 0 D Unit_1.2_-_Algorithm_Analysis_and_the_RAM/05-The_Ram_Solution.en.srt1
00:00:00,000 --> 00:00:05,920
Now, of course, the answers to these are a little bit subjective depending on what you want to model,
2
00:00:05,920 --> 00:00:10,020
but I think that there are three components that are absolutely essential to keep.
3
00:00:10,020 --> 00:00:13,520
The first one is you need a memory for the computer because, otherwise,
4
00:00:13,520 --> 00:00:16,920
it will not really be able to perform any computations.
5
00:00:16,920 --> 00:00:20,840
The second one is you need some sort of input and some sort of output.
6
00:00:20,840 --> 00:00:25,270
It doesn't really have to be a graphical output or be displayed on the monitor,
7
00:00:25,270 --> 00:00:30,650
but you need to be able to tell the computer what the input is that it's suppose to work on
8
00:00:30,650 --> 00:00:35,570
and you need to have some way of reading the output once the computer is done.
9
00:00:35,570 --> 00:00:41,330
And you'll also need some sort of flexibility in the processing that the computer is doing,
10
00:00:41,330 --> 00:00:44,760
so we expect it to have certain programming capabilities.
11
00:00:44,760 --> 00:00:49,180
All the other components I think are not essential, so you do not necessarily need a graphic's card
12
00:00:49,180 --> 00:00:51,350
as long as you have a certain type of output.
13
00:00:51,350 --> 00:00:56,740
You certainly don't need a mouse to input anything. You don't need a monitor.
14
00:00:56,740 --> 00:01:00,470
You don't need a file system or operating system because if you really want that
15
00:01:00,470 --> 00:01:04,610
you can emulate that using the programming capabilities and you certainly also
16
00:01:04,610 --> 99:59:59,000
do not need a CD-ROM as long as you have some way of providing the input.
PK #iA8 I Unit_1.2_-_Algorithm_Analysis_and_the_RAM/06-Random_Access_Machine.en.srt1
00:00:00,000 --> 00:00:05,270
As we saw in the quiz, there are three things that our basic machine model needs to have.
2
00:00:05,270 --> 00:00:09,060
One is the memory, the second one is some sort of input/output capabilities,
3
00:00:09,060 --> 00:00:12,890
and finally, we need to have certain programming capabilities.
4
00:00:12,890 --> 00:00:16,329
There's a lot of different models that have these capabilities, but in this course,
5
00:00:16,329 --> 00:00:21,290
we're going to use a model called the RAM, and RAM stands for Random Access Machine.
6
00:00:21,290 --> 00:00:25,720
So, there are many different ways of defining a Random Access Machine, and we'll use a model
7
00:00:25,720 --> 00:00:31,000
similar to the one defined by Steven Skiena and his book, The Algorithm Design Manual.
8
00:00:31,000 --> 00:00:34,070
The first thing that the RAM has is the memory
9
00:00:34,070 --> 00:00:40,830
and that memory can of course be used for input, for output, and for holding the program
10
00:00:40,830 --> 00:00:43,560
that the RAM is running on, but as simplification,
11
00:00:43,560 --> 00:00:46,600
we're going to split the memory into three parts.
12
00:00:46,600 --> 00:00:51,790
We're just going to use this memory here for intermediate results and output,
13
00:00:51,790 --> 00:00:57,250
and we're going to have a separate memory for the input and a separate memory to hold the program.
14
00:00:57,250 --> 00:01:04,209
And, these two memories here on the left side are read only meaning that the RAM cannot modify
15
00:01:04,209 --> 00:01:08,460
the input, it can only read the input, and the RAM can also not modify the program.
16
00:01:08,460 --> 00:01:12,930
It could only read the program, and usually, when we will talk about the memory requirements
17
00:01:12,930 --> 00:01:17,850
of an algorithm, what we will be talking about is how much of this memory here the RAM is using.
18
00:01:17,850 --> 00:01:23,060
Generally, there is no limit of how much of this memory here we have
19
00:01:23,060 --> 00:01:30,550
so we always have as much as we want, but the memory is divided into single cells,
20
00:01:30,550 --> 00:01:34,780
and each of these cells has a limited capacity so each of these cells
21
00:01:34,780 --> 00:01:40,960
cannot have arbitrarily large values, but there are as many cells as the algorithm needs.
22
00:01:40,960 --> 00:01:45,620
Now, if we're running a program on the RAM, what we're mainly interested in is
23
00:01:45,620 --> 00:01:48,860
the time that this program is going to run for a given input,
24
00:01:48,860 --> 00:01:54,630
and there are basically three rules for how long the RAM requires to execute a program.
25
00:01:54,630 --> 00:02:01,950
Simple operations such as adding two numbers, multiplying them, or executing an if or for.
26
00:02:01,950 --> 00:02:04,260
Those all take one time step.
27
00:02:04,260 --> 00:02:08,490
If you have a loop in your program such as a four-loop, this will count
28
00:02:08,490 --> 00:02:12,410
not as a simple operation but count as often as it runs.
29
00:02:12,410 --> 00:02:16,390
So, if you have a loop that executes a simple operation 100 times,
30
00:02:16,390 --> 00:02:23,160
that will count as 100 time steps, and finally, you' ll also get something for free accessing memory.
31
00:02:23,160 --> 00:02:30,290
So, reading a part of the input or writing something into a memory cell here, that is free.
32
00:02:30,290 --> 00:02:33,620
That means that actually takes zero time steps.
33
00:02:33,620 --> 00:02:38,340
And these three rules give us a simple way of determining the running time of an algorithm
34
00:02:38,340 --> 00:02:42,890
or program or also of comparing the running time of two programs because all we need to do
35
00:02:42,890 --> 00:02:48,290
is count the number of time steps that we expect the RAM to execute for a given input.
36
00:02:48,290 --> 99:59:59,000
Let's practice this for a little bit.
PK #iA| G Unit_1.2_-_Algorithm_Analysis_and_the_RAM/07-Number_Of_Timesteps.en.srt1
00:00:00,000 --> 00:00:03,740
We're going to practice this analysis as a number of quizzes.
2
00:00:03,740 --> 00:00:06,880
For the first quiz, we're going to start off with two simple lines of code
3
00:00:06,880 --> 00:00:12,490
and using these three rules up here, I want you to tell me the number of time steps
4
00:00:12,490 --> 99:59:59,000
that the RAM will need to execute these 2-lines of code.
PK #iA$ P Unit_1.2_-_Algorithm_Analysis_and_the_RAM/08-Number_Of_Timesteps_Solution.en.srt1
00:00:00,000 --> 00:00:05,470
The answer is that the RAM is just going to take a single time step, one time step,
2
00:00:05,470 --> 00:00:10,460
to execute these 2-lines of code because for this line up here--this is just a memory access
3
00:00:10,460 --> 00:00:13,060
and we get memory access for free.
4
00:00:13,060 --> 00:00:19,420
That's 0 steps and down here, that is a simple operation, which takes one time step.
5
00:00:19,420 --> 99:59:59,000
So a total of one time steps.
PK #iA o o I Unit_1.2_-_Algorithm_Analysis_and_the_RAM/09-Number_Of_Timesteps_2.en.srt1
00:00:00,000 --> 00:00:02,960
So now let's try a little more challenging example.
2
00:00:02,960 --> 00:00:06,330
It's not really challenging, but it's a bit more difficult than the first one.
3
00:00:06,330 --> 00:00:11,390
So this is a 3-line code and then I want you to tell me the number of time steps
4
00:00:11,390 --> 00:00:15,060
that this code will need on the run.
5
00:00:15,060 --> 00:00:19,220
For this, I should mention that we're going to count this operation here--the while
6
00:00:19,220 --> 99:59:59,000
and the comparison of s if it's greater than 0 as a simple operation.
PK #iAόM@ @ R Unit_1.2_-_Algorithm_Analysis_and_the_RAM/10-Number_Of_Timesteps_2_Solution.en.srt1
00:00:00,000 --> 00:00:04,850
While we're recording these, actually we thought it wasn't a very challenging example
2
00:00:04,850 --> 00:00:10,430
and then we got it wrong about 4 times, but the correct answer here is that these three lines
3
00:00:10,430 --> 00:00:13,300
take 11 time steps to execute.
4
00:00:13,300 --> 00:00:19,270
The first line is free as in the first quiz and the second line is actually why we had
5
00:00:19,270 --> 00:00:22,160
to count a few times until we all got it right.
6
00:00:22,160 --> 00:00:27,870
This line here is executed 6 times--once for s=5
7
00:00:27,870 --> 00:00:32,549
and then s is decreased by 1 each time this line down here is executed.
8
00:00:32,549 --> 00:00:40,520
So, s starts out as 5, 4, 3, 2, 1 and 0, which means this line here is executed 6 times
9
00:00:40,520 --> 00:00:46,320
whereas this line down here is only executed 5 times because once s is equal to 0
10
00:00:46,320 --> 99:59:59,000
this line here is not executed anymore, which makes for a total of 11 time steps.
PK #iAjp'1
K Unit_1.2_-_Algorithm_Analysis_and_the_RAM/11-Simplifications_For_Ram.en.srt1
00:00:00,000 --> 00:00:04,150
As you can see, exactly counting the number of time steps
2
00:00:04,150 --> 00:00:07,790
even though we have a very simple model of just three rules
3
00:00:07,790 --> 00:00:13,390
and a code that doesn't even have a variable input is already quite tedious.
4
00:00:13,390 --> 00:00:17,300
We are going to introduce a number of additional simplifications
5
00:00:17,300 --> 00:00:20,450
that will give us a little more levy here so that we do not have to go
6
00:00:20,450 --> 00:00:24,770
through this exact counting process but still learns something about the algorithm.
7
00:00:24,770 --> 00:00:29,430
In general, the capabilities of the RAM closely matched what you would expect from
8
00:00:29,430 --> 00:00:33,100
your own computer but of course it is a simplification
9
00:00:33,100 --> 00:00:35,790
but nevertheless it is a rather solid model.
10
00:00:35,790 --> 00:00:41,020
It's difficult to design an algorithm that when you analyze it
11
00:00:41,020 --> 00:00:44,860
it will give you a very different idea about its performance that it would get on the RAM.
12
00:00:44,860 --> 00:00:47,870
Nevertheless, I think it's important to think about the differences
13
00:00:47,870 --> 00:00:55,080
between a RAM model and a normal computer, and we will do this as a quiz.
14
00:00:55,080 --> 00:01:01,940
Here are four main simplifications that I think the RAM is making.
15
00:01:01,940 --> 00:01:04,739
Here are four properties of the RAM.
16
00:01:04,739 --> 00:01:08,340
The first one is simple operations take only one time step.
17
00:01:08,340 --> 00:01:12,290
The second one is we assume that we have as much memory as we need.
18
00:01:12,290 --> 00:01:17,410
The third one is that memory access is considered to be free in terms of time.
19
00:01:17,410 --> 00:01:23,170
And the fourth one is that a unit of memory cannot hold an arbitrarily large number.
20
00:01:23,170 --> 00:01:27,160
And of course this is going to be subjective, but I would like you to tell me
21
00:01:27,160 --> 00:01:34,350
which of these four properties you think is realistic in the sense that it comes pretty close
22
00:01:34,350 --> 00:01:37,130
to a real computer and which of these you would consider unrealistic
23
00:01:37,130 --> 00:01:41,430
meaning of you were to run your algorithm on a real machine
24
00:01:41,430 --> 00:01:46,010
then there would be considerable differences and again the answers to these are subjective.
25
00:01:46,010 --> 99:59:59,000
If you get stuck or if you disagree with me just take next and see where our opinions differ.
PK #iA3
T Unit_1.2_-_Algorithm_Analysis_and_the_RAM/12-Simplifications_For_Ram_Solution.en.srt1
00:00:00,000 --> 00:00:03,730
In my opinion, I would say that the assumption that simple operation
2
00:00:03,730 --> 00:00:06,760
take only one-time step is rather unrealistic
3
00:00:06,760 --> 00:00:12,680
because in real computer you have lots of differences between simple operations
4
00:00:12,680 --> 00:00:17,660
and also loops such as ifs and whys it can take very different amounts of time
5
00:00:17,660 --> 00:00:19,810
depending on where and when they are executed.
6
00:00:19,810 --> 00:00:24,860
The second one we have as much memory as we need that might be debatable for some cases
7
00:00:24,860 --> 00:00:28,590
given that memory is actually not that expensive anymore,
8
00:00:28,590 --> 00:00:32,630
but there's still lots of problems where you can run out of memory and so that is still a real issue.
9
00:00:32,630 --> 00:00:35,170
I would also say this is rather unrealistic.
10
00:00:35,170 --> 00:00:39,810
Memory access being free in terms of times that is very unrealistic
11
00:00:39,810 --> 00:00:42,790
because in the real computer, you really have to watch out
12
00:00:42,790 --> 00:00:44,250
what kind of memory you're using.
13
00:00:44,250 --> 00:00:47,540
For example, if you're using register and a processor
14
00:00:47,540 --> 00:00:50,900
that is memory that almost come for free that's very fast,
15
00:00:50,900 --> 00:00:53,410
but when you have to read and write to your hard disk that is going to take very long.
16
00:00:53,410 --> 00:00:56,330
Memory access being free is really unrealistic.
17
00:00:56,330 --> 00:01:03,110
Finally, a unit of memory not being able to hold an arbitrarily large number that is very realistic
18
00:01:03,110 --> 00:01:06,860
because your computer depending on how many bits each unit of memory can hold.
19
00:01:06,860 --> 00:01:13,720
If its 32 or 64 and there is always the limit of how much you can store in a single unit of memory
20
00:01:13,720 --> 00:01:15,040
and then you have to go to the next one.
21
00:01:15,040 --> 00:01:19,790
Given that there's lots of unrealistic aspects for the RAM,
22
00:01:19,790 --> 00:01:21,760
why are we using it in this course?
23
00:01:21,760 --> 00:01:28,850
Well, there's basically two reasons. The first reason is that its a very simple model to use.
24
00:01:28,850 --> 00:01:31,840
So, if we have to consider all of those aspect as well,
25
00:01:31,840 --> 00:01:35,409
our analysis would get even more complicated than in the simple example.
26
00:01:35,409 --> 00:01:41,100
And the second thing is that when we are running or analyzing algorithms on the RAM,
27
00:01:41,100 --> 00:01:45,440
the general behavior of the algorithm, how it behaves on various inputs,
28
00:01:45,440 --> 00:01:49,070
and how the running time changes with various types of inputs
29
00:01:49,070 --> 00:01:54,920
still is rather realistic even though we're not really taking into account these aspects.
30
00:01:54,920 --> 00:01:57,950
When you have to implement an algorithm,
31
00:01:57,950 --> 00:01:59,880
which is something that we're not doing in this course
32
00:01:59,880 --> 00:02:03,230
then of course, you have to take care of these things
33
00:02:03,230 --> 00:02:08,340
and see for example if the types of operation you're using is making your algorithm
34
00:02:08,340 --> 99:59:59,000
run faster or slower or if you need to take care of memory.
PK #iA|6Y M Unit_1.2_-_Algorithm_Analysis_and_the_RAM/13-Running_Time_Vs_Structure.en.srt1
00:00:00,000 --> 00:00:05,300
As we've already seen for the algorithm that Alice was using, the running time of an algorithm
2
00:00:05,300 --> 00:00:13,430
can often very dramatically vary with the size of the input that the algorithm has given,
3
00:00:13,430 --> 00:00:19,290
but the running time can also change with the content or the structure of the input.
4
00:00:19,290 --> 00:00:21,320
And here's the simple example to show you this.
5
00:00:21,320 --> 00:00:31,490
So the algorithm takes as input a string s(0) to s(n-1), which means it's a string of length n,
6
00:00:31,490 --> 00:00:35,450
so n characters in the string and here's the algorithm.
7
00:00:35,450 --> 00:00:39,700
So the algorithm does something very simple given that string.
8
00:00:39,700 --> 00:00:44,180
It counts the number of times that the character a appears in that string.
9
00:00:44,180 --> 00:00:50,280
So it sets the counter to zero and then goes through all the characters in the string one by one.
10
00:00:50,280 --> 00:00:55,380
And if that character is equal to a then it will increase the counter.
11
00:00:55,380 --> 00:01:02,220
So as in the previous examples, we're going to take this line here as a simple operation
12
00:01:02,220 --> 00:01:05,400
meaning to take one time step each time it's executed.
13
00:01:05,400 --> 00:01:09,360
And we're also going to consider this one here as simple operation
14
00:01:09,360 --> 00:01:14,310
meaning also this whole line here is going to take one time step each time it's executed.
15
00:01:14,310 --> 00:01:20,520
Now, what I would like you to do as our next quiz is tell me the number of time steps
16
00:01:20,520 --> 00:01:24,200
this algorithm takes for a given string s.
17
00:01:24,200 --> 00:01:28,340
And to give you that answer, there are two things you need to know or two variables
18
00:01:28,340 --> 00:01:30,190
you have to take into account.
19
00:01:30,190 --> 00:01:35,270
One is n, the length of the string, and the other one as you're going to find out is a,
20
00:01:35,270 --> 00:01:40,160
and with a, we're just going to denote the number of times that
21
00:01:40,160 --> 00:01:43,300
the character a actually appears in that string.
22
00:01:43,300 --> 00:01:48,490
So your answer is going to be some formula that includes n and includes a,
23
00:01:48,490 --> 00:01:53,310
and I would like you to give me the running time by entering the coefficients in the following formula
24
00:01:53,310 --> 00:02:04,370
so it's going to be some number multiply by n plus some number multiply by a plus a constant.
25
00:02:04,370 --> 00:02:08,250
Please enter those two numbers. So not the result of the formula.
26
00:02:08,250 --> 00:02:13,310
It's the running time of this algorithm when it encounters a string of length n
27
00:02:13,310 --> 99:59:59,000
where the letter a occurs exactly a times.
PK #iA@3M M V Unit_1.2_-_Algorithm_Analysis_and_the_RAM/14-Running_Time_Vs_Structure_Solution.en.srt1
00:00:00,000 --> 00:00:04,080
There's actually two correct answers here depending on how you think about it.
2
00:00:04,080 --> 00:00:15,970
Both answers in common that it's always 2n+1a, and depending on how you count this line here,
3
00:00:15,970 --> 00:00:22,430
it's going to be either a 0 or a 1 and the reason for that is the following--
4
00:00:22,430 --> 00:00:28,200
the first line again takes 06 as in all previous examples.
5
00:00:28,200 --> 00:00:38,110
For the second line, it either takes n time steps if you assume that this for loop here goes
6
00:00:38,110 --> 00:00:43,110
exactly through each character of the string and then stops immediately or it takes n+1 loops
7
00:00:43,110 --> 00:00:46,810
if you assume that it's executed like a for loop or while loop.
8
00:00:46,810 --> 00:00:52,200
This again shows that it can be very annoying to do exact time counting.
9
00:00:52,200 --> 00:00:57,980
The next step though is always execute n times because the number of times this inter loop here
10
00:00:57,980 --> 00:01:05,530
is executed does not depend on the number of time steps it takes for this line here to end.
11
00:01:05,530 --> 00:01:10,390
And finally, the counter is always increased when the algorithm encounters an a,
12
00:01:10,390 --> 00:01:14,520
so this line here is executed exactly 8 times.
13
00:01:14,520 --> 00:01:22,340
And if you sum up all of these, you get 2n+1a, and depending on how you count this line,
14
00:01:22,340 --> 99:59:59,000
you either get a 0 or 1.
PK #iAC L Unit_1.2_-_Algorithm_Analysis_and_the_RAM/15-Best_Worst_Average_Input.en.srt1
00:00:00,000 --> 00:00:09,090
Let's assume for now that the running time of this algorithm is 2n+a+1,
2
00:00:09,090 --> 00:00:15,650
so even for this simple algorithm, the running time depends on both the size of the inputs
3
00:00:15,650 --> 00:00:20,260
to the length of the string and the structure of the input, which in this case
4
00:00:20,260 --> 00:00:23,540
is the number of times the letter a occurs.
5
00:00:23,540 --> 00:00:28,240
And this is of course very problematic because on the one hand when we get
6
00:00:28,240 --> 00:00:33,130
more complicated algorithms, the formula here will get very, very complicated.
7
00:00:33,130 --> 00:00:37,980
And the second thing is that most of the time we don't even know what kinds of strings
8
00:00:37,980 --> 00:00:44,680
the algorithm will encounter, so we cannot get rid of this variable without making any assumptions.
9
00:00:44,680 --> 00:00:49,260
We want to avoid these complications and there's actually
10
00:00:49,260 --> 00:00:52,120
three different ways you could do that.
11
00:00:52,120 --> 00:00:56,970
The first one would be to have a very optimistic view and state the running time
12
00:00:56,970 --> 00:01:00,940
for the best possible input that this algorithm could receive.
13
00:01:00,940 --> 00:01:05,160
Basically saying this is the minimum running time of the algorithm
14
00:01:05,160 --> 00:01:08,670
and state that as its overall running time.
15
00:01:08,670 --> 00:01:13,620
You could, of course, also take the opposite view and say--we're going to state the running time
16
00:01:13,620 --> 00:01:19,450
as a worst-case view, so we're going to be very pessimistic almost like there was an adversary
17
00:01:19,450 --> 00:01:24,630
who is giving us the worst possible inputs and then state that as the running time.
18
00:01:24,630 --> 00:01:29,780
And then finally we can take an in between view and say--well, the usual input for the algorithm
19
00:01:29,780 --> 00:01:31,050
is not going to be your best case.
20
00:01:31,050 --> 99:59:59,000
It's also not going to be the worst case, so let's state some kind of average.
PK #iA!Y P Unit_1.2_-_Algorithm_Analysis_and_the_RAM/16-Best_And_Worst_Running_Times.en.srt1
00:00:00,000 --> 00:00:07,220
I would like to do a little quiz with you here and have you tell me if I give this algorithm here
2
00:00:07,220 --> 00:00:13,390
the best possible input, what it's running time will be and again I would like you to state that
3
00:00:13,390 --> 00:00:18,520
the function of n, a and the constant and the same thing for the pessimistic view.
4
00:00:18,520 --> 00:00:24,670
If I give the algorithm the worst possible input that it can receive, what will be its running time.
5
00:00:24,670 --> 00:00:28,880
And I should also note that I haven't told you yet what the best and worst possible input
6
00:00:28,880 --> 99:59:59,000
for the algorithm will be but I'll have you figure that out.
PK #iA&Uφ Y Unit_1.2_-_Algorithm_Analysis_and_the_RAM/17-Best_And_Worst_Running_Times_Solution.en.srt1
00:00:00,000 --> 00:00:03,270
What would be the best possible input for the algorithm.
2
00:00:03,270 --> 00:00:08,800
The best possible input for the algorithm would be if we give it a string that does not contain
3
00:00:08,800 --> 00:00:18,870
the letter a at all because this rates at a to 0, so the overall running time will be 2n+0a+1,
4
00:00:18,870 --> 00:00:21,270
which is this constant here.
5
00:00:21,270 --> 00:00:28,660
Now, in the worst case, the string will just consist of a's and this will set a equal to n
6
00:00:28,660 --> 00:00:34,540
because each of the letters in the string will be added, and so if a=n,
7
00:00:34,540 --> 00:00:39,880
then the overall running time is 32+0a+1.
8
00:00:39,880 --> 00:00:45,270
Now, the great thing about this here is that in both cases, both with a best-case view
9
00:00:45,270 --> 00:00:49,350
and with the worst-case view, we have eliminated the dependence on a,
10
00:00:49,350 --> 00:00:54,580
so we know that no matter what string the algorithm receives, the running time will always be
11
00:00:54,580 --> 00:00:59,000
between 2 and +1 and 3 and +1.
12
00:00:59,000 --> 00:01:04,910
Now, you could also say--why didn't we ask you to analyze the average running time
13
00:01:04,910 --> 00:01:09,080
or the running time for an average input and that is simply because it's very hard
14
00:01:09,080 --> 00:01:11,290
to define the notion of average.
15
00:01:11,290 --> 00:01:16,100
Does an average string, for example, contain only half of the letters as a
16
00:01:16,100 --> 00:01:20,690
or is it at end of sentence, in which case the number of a's will be very low.
17
00:01:20,690 --> 00:01:24,020
Unless we have a very precise view of the term average,
18
00:01:24,020 --> 99:59:59,000
the running time here would be very hard to define.
PK #iAH
J Unit_1.2_-_Algorithm_Analysis_and_the_RAM/18-Which_Type_Of_Analysis.en.srt1
00:00:00,000 --> 00:00:06,490
As you just saw on the quiz, there's three kinds of view we can take with regard to running time.
2
00:00:06,490 --> 00:00:12,220
We can take a best-case view and assume that all the inputs that we're getting are the ones that
3
00:00:12,220 --> 00:00:16,930
make our algorithms run the fastest or we can take the very opposite view
4
00:00:16,930 --> 00:00:23,460
and say that the running time of our algorithm is going to be determined by the worst possible input
5
00:00:23,460 --> 00:00:29,980
that it can receive or we can define running time as the average time that our algorithm will take
6
00:00:29,980 --> 00:00:32,040
over a number of inputs.
7
00:00:32,040 --> 00:00:35,660
Which one of the three are we going to choose in this course?
8
00:00:35,660 --> 00:00:42,090
Best-case running time as you've seen in the previous quiz is often rather trivial or meaningless.
9
00:00:42,090 --> 00:00:48,700
So for example, if we use best-case running time for our algorithm that counts the number of As,
10
00:00:48,700 --> 00:00:53,370
it would only be valid for strings that contain no A at all so we're not going to take this.
11
00:00:53,370 --> 00:00:58,870
For the average-case view, actually that could be a very interesting view and practice
12
00:00:58,870 --> 00:01:03,810
because when we run the algorithm a couple of times we might not care about
13
00:01:03,810 --> 00:01:07,860
how much that algorithm runs on a single run but over many inputs.
14
00:01:07,860 --> 00:01:13,000
But as you've just seen in the quiz, it's actually quite hard to define what an average input looks like.
15
00:01:13,000 --> 00:01:16,220
And also when we do the analysis of an algorithm
16
00:01:16,220 --> 00:01:20,460
sometimes we might not even know what kinds of input the algorithm receives.
17
00:01:20,460 --> 00:01:24,540
Average-case analysis is very interesting sometimes.
18
00:01:24,540 --> 00:01:29,870
But here, it's actually not that suitable. Now, worst-case analysis.
19
00:01:29,870 --> 00:01:33,640
Well we always assume that the algorithm receives an input that
20
00:01:33,640 --> 00:01:38,450
makes it run as long as possible might seem a bit pessimistic to you.
21
00:01:38,450 --> 00:01:44,520
It's almost like we had a mean adversary who was trying to give us the worst possible inputs.
22
00:01:44,520 --> 00:01:51,000
But on the other hand, the advantage of worst-case analysis is that it offers guarantees.
23
00:01:51,000 --> 00:01:55,690
What I mean by that is when we take a worst-case view we know that our algorithm
24
00:01:55,690 --> 00:02:00,640
will not run longer than the worst-case analysis suggests no matter what happens.
25
00:02:00,640 --> 00:02:06,860
And this is actually what we're interested in in this course so we are going to use worst-case analysis.
26
00:02:06,860 --> 99:59:59,000
Meaning, we're always going to state the time of our algorithm for the worst possible input.
PK #iAB N Unit_1.2_-_Algorithm_Analysis_and_the_RAM/19-Best_And_Worst_Case_Inputs.en.srt1
00:00:00,000 --> 00:00:07,820
Worst-case running time gives us guarantees and also it sounds like it's a very big simplification,
2
00:00:07,820 --> 00:00:13,460
but the thing is as long as we're still counting the number of time steps of the algorithm exactly
3
00:00:13,460 --> 00:00:18,380
even worst-case running time can get pretty tricky and I'll give you one example for this.
4
00:00:18,380 --> 00:00:24,190
Again, we're going to consider an algorithm that takes as input a string of length n.
5
00:00:24,190 --> 00:00:30,080
What this algorithm does is given the string of length n, it counts the number of times
6
00:00:30,080 --> 00:00:34,010
that the sequence ab appears in that string.
7
00:00:34,010 --> 00:00:41,920
So it will iterate over the length of the string and if it finds an a, it will look if the next character is a b
8
00:00:41,920 --> 00:00:45,530
and if that is the case, it will increase the counter here.
9
00:00:45,530 --> 00:00:52,780
What I would like you to consider for our next quiz is what a worst case and a bust case input
10
00:00:52,780 --> 00:00:54,840
for that algorithm would look like.
11
00:00:54,840 --> 00:00:59,430
Here we have the number of potential inputs and it doesn't really matter how long those inputs are.
12
00:00:59,430 --> 00:01:04,590
It's more about their structure and if the structure would cause the algorithm to behave
13
00:01:04,590 --> 00:01:08,510
as it would in the worst case or in the best case or something else,
14
00:01:08,510 --> 00:01:14,830
and what I would like you to do is for each of those strings, tell me if that string will cause the algorithm
15
00:01:14,830 --> 00:01:20,820
to go into a worst-case running time, a best-case running time, or something else.
16
00:01:20,820 --> 00:01:25,830
For each of those four strings, I would like you to tell me if those strings caused the algorithm
17
00:01:25,830 --> 00:01:33,260
to have the best possible running time or the worst possible running time or something in between,
18
00:01:33,260 --> 00:01:35,740
and it doesn't really matter how long those strings are.
19
00:01:35,740 --> 99:59:59,000
It's the structure that I want you to look at.
PK #iAEy W Unit_1.2_-_Algorithm_Analysis_and_the_RAM/20-Best_And_Worst_Case_Inputs_Solution.en.srt1
00:00:00,000 --> 00:00:05,070
To see how any of those strings make the algorithm behave, now let's have a closer look at it.
2
00:00:05,070 --> 00:00:09,580
The first three lines here are independent of the structure of the input.
3
00:00:09,580 --> 00:00:14,200
If you look at them, the counter is set to 0, we go through a range,
4
00:00:14,200 --> 00:00:20,430
and then we look if the letter that we're currently at is an 'a'.
5
00:00:20,430 --> 00:00:23,590
Those are independent of what kind of string we're looking at.
6
00:00:23,590 --> 00:00:26,440
We'll always be executing these parts of the code.
7
00:00:26,440 --> 00:00:30,450
But for those two parts here, that's kind of different.
8
00:00:30,450 --> 00:00:33,300
This line here, which is the fourth line,
9
00:00:33,300 --> 00:00:38,030
so this line here will only be executed if we have encountered an 'a'.
10
00:00:38,030 --> 00:00:44,970
And this line here will only be executed if we have encountered an 'a', then the next character is a 'b'.
11
00:00:44,970 --> 00:00:50,100
How often these two lines here are executed is dependent on the structure of the input.
12
00:00:50,100 --> 00:00:56,180
Now, in the worst case input, we would want these two lines to get executed as often as possible
13
00:00:56,180 --> 00:01:03,050
and in a best case scenario we would want them to get executed the least possible number of times.
14
00:01:03,050 --> 00:01:10,450
What we can already say is that a best case input will cause these two lines here
15
00:01:10,450 --> 00:01:16,180
to never be called and that only happens if the string does not contain the letter 'a' at all,
16
00:01:16,180 --> 00:01:18,340
which is the one down here.
17
00:01:18,340 --> 00:01:22,000
For the other strings, let's just go through the algorithm and count
18
00:01:22,000 --> 00:01:27,200
how many times this line here is executed and how many times this line down here is executed.
19
00:01:27,200 --> 00:01:36,460
If the string looks like ababab and so on, then every time that the algorithm encounters an 'a'
20
00:01:36,460 --> 00:01:43,800
it will execute this line down here and it will also execute this line down here
21
00:01:43,800 --> 00:01:45,920
because the next letter is a 'b'.
22
00:01:45,920 --> 00:01:47,920
Now, what if it encounters 'ab'?
23
00:01:47,920 --> 00:01:51,180
It will not execute this line, and it will not execute this line.
24
00:01:51,180 --> 00:01:55,940
And then it encounters an 'a' again, so this line will be executed and this line will be executed.
25
00:01:55,940 --> 00:02:01,000
And then again those two won't be. Then we have them both again.
26
00:02:01,000 --> 00:02:04,140
Then zero times and so on.
27
00:02:04,140 --> 00:02:12,620
Basically, for every two letters that it encounters, it will execute these two lines.
28
00:02:12,620 --> 00:02:20,200
So two lines per two letters makes one extra line per letter.
29
00:02:20,200 --> 00:02:23,670
Now for the second string, what happens is this.
30
00:02:23,670 --> 00:02:27,090
The algorithm encounters an 'a' so it will execute this line
31
00:02:27,090 --> 00:02:30,760
but the next letter is not a 'b' so it will not execute that line down here.
32
00:02:30,760 --> 00:02:33,710
And then it goes on to the next letter and again it encounters an 'a'
33
00:02:33,710 --> 00:02:36,520
so this line here will be executed but not this one down here
34
00:02:36,520 --> 00:02:39,180
because the next letter again is not an 'a'.
35
00:02:39,180 --> 00:02:44,960
So as the algorithm progresses, it executes one extra line per letter just as the string above.
36
00:02:44,960 --> 00:02:51,600
Now, finally, if the algorithm encounters acacac, what will happen is this.
37
00:02:51,600 --> 00:02:56,190
First, the algorithm encounters an 'a' so it executes this line down here.
38
00:02:56,190 --> 00:03:00,350
But then it does not encounter a 'b' so it will not go into this line.
39
00:03:00,350 --> 00:03:08,120
Next letter, the algorithm encounters a 'c' so it will not execute this line down here and so on.
40
00:03:08,120 --> 00:03:14,600
I think you get the picture. So this is not one extra line or letter but it will only be 0.5.
41
00:03:14,600 --> 00:03:22,000
It already tells us that the string here is not a worst case string but it's something in between.
42
00:03:22,000 --> 00:03:26,520
Now those two strings because they both require the same amount of time
43
00:03:26,520 --> 00:03:31,200
are either both worst case or they're both something in between.
44
00:03:31,200 --> 00:03:35,000
Now, this is something where you just have to look closely at the algorithm
45
00:03:35,000 --> 00:03:42,110
to see that a worst case input is among others one that contains as many times the letters 'a' 'b'
46
00:03:42,110 --> 00:03:44,530
as possible which is exactly this string.
47
00:03:44,530 --> 00:03:48,730
So there's no worse strings to encounter for this algorithm.
48
00:03:48,730 --> 00:03:50,810
So this one is a worst-case input.
49
00:03:50,810 --> 00:03:56,170
But surprisingly also this one down here although it doesn't contain the sequence ab at all
50
00:03:56,170 --> 00:03:58,170
it's also a worst-case input.
51
00:03:58,170 --> 00:04:00,960
Even now that we have introduced a number of simplifications.
52
00:04:00,960 --> 00:04:05,260
We have introduced the RAM as a simplified computer model to count the number if time steps
53
00:04:05,260 --> 00:04:09,590
and we've also said that we're just going to look at worst case running time
54
00:04:09,590 --> 00:04:13,030
you see that the analysis of algorithms can still be quite tedious.
55
00:04:13,030 --> 00:04:17,980
It's sometimes hard to identify what exactly a worst-case input will be.
56
00:04:17,980 --> 00:04:21,360
And even if we know the worst-case input, finding the exact formula
57
00:04:21,360 --> 00:04:25,450
for the running time is very difficult, which is why we're going to introduce
58
00:04:25,450 --> 99:59:59,000
another simplification to state running time and that simplification is known as big O notation.
PK #iAW H Unit_1.2_-_Algorithm_Analysis_and_the_RAM/01-Analyzing_Algorithms.en.srtPK
#iAsR R Q Unit_1.2_-_Algorithm_Analysis_and_the_RAM/02-Analyzing_Algorithms_Solution.en.srtPK #iAy, C Unit_1.2_-_Algorithm_Analysis_and_the_RAM/03-Simplifications.en.srtPK #iA&. ; Unit_1.2_-_Algorithm_Analysis_and_the_RAM/04-The_Ram.en.srtPK #iA/x0 0 D %) Unit_1.2_-_Algorithm_Analysis_and_the_RAM/05-The_Ram_Solution.en.srtPK #iA8 I 0 Unit_1.2_-_Algorithm_Analysis_and_the_RAM/06-Random_Access_Machine.en.srtPK #iA| G