PK 0!iA U 8 Problem_Set_3/01-Formal_Grammar_Concepts_Solution.en.srt1
00:00:00,000 --> 00:00:04,000
This first question asks us to make a few statements about formal grammars.
2
00:00:04,000 --> 00:00:07,000
Let's go over a little bit of vocabulary.
3
00:00:07,000 --> 00:00:09,000
A language is simply a set of strings.
4
00:00:09,000 --> 00:00:12,000
A regular language is a type of language.
5
00:00:12,000 --> 00:00:15,000
In particular, it's a set of strings that can be represented by
6
00:00:15,000 --> 00:00:18,000
a finite-state machine or a regular expression.
7
00:00:18,000 --> 00:00:22,000
A context-free language is also a set of strings
8
00:00:22,000 --> 00:00:25,000
but can be represented by context-free grammar,
9
00:00:25,000 --> 00:00:28,000
and what these 3 questions are asking us in this problem
10
00:00:28,000 --> 00:00:33,000
is to determine the relationship between a regular language and a context-free language,
11
00:00:33,000 --> 00:00:39,000
so let's say this bubble represents the set of regular languages.
12
00:00:39,000 --> 00:00:41,000
It's a set of sets of strings.
13
00:00:41,000 --> 00:00:45,000
And let's say this blob is the set of context-free languages.
14
00:00:45,000 --> 00:00:49,000
What we want to determine is how we can connect these two.
15
00:00:49,000 --> 00:00:54,000
Is it the case that no regular languages are context-free languages and vice versa?
16
00:00:54,000 --> 00:00:57,000
Or perhaps there are some regular languages that are not context-free.
17
00:00:57,000 --> 00:01:00,000
There are some context-free languages that are not regular,
18
00:01:00,000 --> 00:01:03,000
but there are some languages that are regular and context-free.
19
00:01:03,000 --> 00:01:08,000
And the last 2 possibilities are that all regular languages are context-free and vice versa.
20
00:01:08,000 --> 00:01:12,000
This kind of looks like a Pac-Man. Nom, nom.
21
00:01:12,000 --> 00:01:14,000
As it turns out, this diagram is the correct one.
22
00:01:14,000 --> 00:01:19,000
All regular languages are context-free, but not all context-free languages are regular.
23
00:01:19,000 --> 00:01:23,000
Briefly in lecture, Wes outlined the proof that all regular languages are context-free.
24
00:01:23,000 --> 00:01:27,000
The way the proof works is that you show that given an arbitrary
25
00:01:27,000 --> 00:01:31,000
finite-state machine, you can create a context-free grammar
26
00:01:31,000 --> 00:01:34,000
that embodies it, that represents the same functionality.
27
00:01:34,000 --> 00:01:38,000
Namely, it generates the same exact set of strings,
28
00:01:38,000 --> 00:01:41,000
and to show that there are some context-free languages that are not regular,
29
00:01:41,000 --> 00:01:43,000
we just have to come up with an example.
30
00:01:43,000 --> 00:01:47,000
An example of this is this set of matching parentheses.
31
00:01:47,000 --> 00:01:51,000
As it turns out--and I'm not going to prove it here--you can't represent this language
32
00:01:51,000 --> 00:01:53,000
with a regular expression.
33
00:01:53,000 --> 00:01:56,000
That's because you can't count
34
00:01:56,000 --> 00:01:59,000
how many times you see a left parentheses
35
00:01:59,000 --> 00:02:03,000
and then require that you see exactly that many right parentheses.
36
00:02:03,000 --> 00:02:06,000
There's just no form of counting in a regular expression.
37
00:02:06,000 --> 00:02:08,000
You can't hold that much state.
38
00:02:08,000 --> 00:02:12,000
Given that review, let's go over the actual questions.
39
00:02:12,000 --> 00:02:17,000
If a language is regular, than that language is also context-free.
40
00:02:17,000 --> 00:02:21,000
This is always true because we have our Pac-Man diagram here,
41
00:02:21,000 --> 00:02:24,000
and also because you can take any regular language
42
00:02:24,000 --> 00:02:26,000
and represent it with a context-free grammar.
43
00:02:26,000 --> 00:02:30,000
You can take any finite-state machine and represent it with a context-free grammar.
44
00:02:30,000 --> 00:02:34,000
If a language L is context-free, than that language L is also regular.
45
00:02:34,000 --> 00:02:36,000
This is sometimes true.
46
00:02:36,000 --> 00:02:40,000
If we have a regular language, that language is context-free,
47
00:02:40,000 --> 00:02:42,000
and, well, it's also regular.
48
00:02:42,000 --> 00:02:46,000
We have a case where sometimes a context-free language is regular.
49
00:02:46,000 --> 00:02:49,000
The question is is this always true, and the answer is no.
50
00:02:49,000 --> 00:02:54,000
If we have a context-free language such as the matching parentheses I went over before,
51
00:02:54,000 --> 00:02:57,000
this is a language that's context-free, but it's not regular.
52
00:02:57,000 --> 00:03:00,000
Sometimes this statement is true, and sometimes it's not.
53
00:03:00,000 --> 00:03:04,000
If 2 context-free grammars, G1 and G2,
54
00:03:04,000 --> 00:03:08,000
both accept an infinite number of strings, than they accept the same strings.
55
00:03:08,000 --> 00:03:10,000
This is sometimes true.
56
00:03:10,000 --> 00:03:12,000
Let's think of a couple examples.
57
00:03:12,000 --> 00:03:14,000
Well, there's a straightforward example that shows
58
00:03:14,000 --> 00:03:17,000
that this is a statement that can be true,
59
00:03:17,000 --> 00:03:21,000
and that is when G1 and G2 are the same grammar.
60
00:03:21,000 --> 00:03:26,000
Let's say we have the language ba that Wes likes to use in the examples.
61
00:03:26,000 --> 00:03:33,000
We have baaa, baaaaa.
62
00:03:33,000 --> 00:03:35,000
All right, so we have ba.
63
00:03:35,000 --> 00:03:38,000
Let's say we have the grammar G2.
64
00:03:38,000 --> 00:03:40,000
And you know what?
65
00:03:40,000 --> 00:03:42,000
Let's make it the same exact language.
66
00:03:42,000 --> 00:03:44,000
I have 2 grammars.
67
00:03:44,000 --> 00:03:46,000
Both accept an infinite number of strings.
68
00:03:46,000 --> 00:03:50,000
They happen to be the same strings, so this statement is sometimes true.
69
00:03:50,000 --> 00:03:52,000
We didn't say G1 and G2 had to be unique.
70
00:03:52,000 --> 00:03:56,000
You could also come up with examples where G1 and G2 are different,
71
00:03:56,000 --> 00:04:02,000
but they're the same exact string, so you could do baa.
72
00:04:02,000 --> 00:04:06,000
And you'll also note I used a regular expression for my grammar,
73
00:04:06,000 --> 00:04:11,000
but all regular languages are context-free, so this is perfectly acceptable.
74
00:04:11,000 --> 00:04:14,000
Let's come up with a situation where this isn't true.
75
00:04:14,000 --> 00:04:19,000
Let's change G2 so that instead of ba
76
00:04:19,000 --> 00:04:24,000
it's fa, so we've got fa, faaa, and faa.
77
00:04:24,000 --> 00:04:26,000
I don't know, a lot of a's.
78
00:04:26,000 --> 00:04:28,000
These are both infinite, infinite grammars.
79
00:04:28,000 --> 00:04:30,000
They're regular expressions, therefore, they're context-free,
80
00:04:30,000 --> 00:04:34,000
but they don't accept the same strings, so we've come up with a case
81
00:04:34,000 --> 00:04:37,000
where this can be true, and we come up with a case where it can be false,
82
00:04:37,000 --> 99:59:59,000
so we're going to say it's sometimes true.
PK 1!iA# # 8 Problem_Set_3/02-Reading_Machine_Minds_2_Solution.en.srt1
00:00:00,000 --> 00:00:06,000
In this problem we've been asked to define a function, cfgempty,
2
00:00:06,000 --> 00:00:10,000
which takes in a representation of a context-free grammar,
3
00:00:10,000 --> 00:00:14,000
a starting state, and a list of visited states
4
00:00:14,000 --> 00:00:18,000
and determines whether or not that context-free grammar is empty.
5
00:00:18,000 --> 00:00:22,000
Just like regular languages, we define an empty grammar
6
00:00:22,000 --> 00:00:24,000
as one that accepts no strings.
7
00:00:24,000 --> 00:00:28,000
Worded the other way, the set of strings that the grammar accepts is empty.
8
00:00:28,000 --> 00:00:30,000
So what's an example?
9
00:00:30,000 --> 00:00:35,000
Say we have the grammar with symbol S and S goes to S with the terminal a.
10
00:00:35,000 --> 00:00:37,000
In order for the grammar to make a string,
11
00:00:37,000 --> 00:00:41,000
we have to be able to replace all of the non-terminals with terminals.
12
00:00:41,000 --> 00:00:47,000
In this case, you can only replace S with itself and another terminal,
13
00:00:47,000 --> 00:00:50,000
but you're never going to get to the point where the string that you're building up,
14
00:00:50,000 --> 00:00:54,000
the substitutions you make, results in a string that's only terminals.
15
00:00:54,000 --> 00:00:56,000
So we say this grammar is empty.
16
00:00:56,000 --> 00:00:59,000
We want to write a program that does this for us because, well,
17
00:00:59,000 --> 00:01:01,000
it can get pretty complicated.
18
00:01:01,000 --> 00:01:03,000
As is the case with most of these examples,
19
00:01:03,000 --> 00:01:06,000
we're going to use a recursive strategy.
20
00:01:06,000 --> 00:01:10,000
As I said before, our procedure takes in 3 inputs:
21
00:01:10,000 --> 00:01:15,000
the grammar, the symbol that we're on, and the symbols that we've visited.
22
00:01:15,000 --> 00:01:17,000
This is important so that we don't loop forever.
23
00:01:17,000 --> 00:01:21,000
An empty language is going to be one where we keep visiting the same states
24
00:01:21,000 --> 00:01:23,000
over and over again without ever finishing.
25
00:01:23,000 --> 00:01:27,000
So this is really crucial to making this function work.
26
00:01:27,000 --> 00:01:29,000
We're going to go through the grammar,
27
00:01:29,000 --> 00:01:33,000
visiting all the rules that apply and doing the rewrite,
28
00:01:33,000 --> 00:01:35,000
and we're going to have 2 base cases.
29
00:01:35,000 --> 00:01:39,000
One is that we visited this before, and this means we're going in a loop
30
00:01:39,000 --> 00:01:41,000
and so this is a dead end.
31
00:01:41,000 --> 00:01:45,000
We're going to return something that shows that this wasn't true.
32
00:01:45,000 --> 00:01:48,000
We're going to return the value None.
33
00:01:48,000 --> 00:01:51,000
That says that this grammar is indeed empty.
34
00:01:51,000 --> 00:01:55,000
If the symbol is a terminal, then the grammar isn't empty,
35
00:01:55,000 --> 00:01:57,000
or at least the grammar so far isn't empty.
36
00:01:57,000 --> 00:01:59,000
We're just going to return the terminal because that's an example
37
00:01:59,000 --> 00:02:02,000
where if you start from a terminal you get a terminal.
38
00:02:02,000 --> 00:02:06,000
It's an example where the grammar finishes, it produces a string.
39
00:02:06,000 --> 00:02:10,000
Otherwise, we're going to go through all the matching symbols,
40
00:02:10,000 --> 00:02:15,000
all the rules that apply, do the substitutions, and recurse.
41
00:02:15,000 --> 00:02:21,000
And if the recursion returns a not null value, it returns a string of terminals,
42
00:02:21,000 --> 00:02:24,000
we're going to return the string back.
43
00:02:24,000 --> 00:02:27,000
And if the grammar does accept any string,
44
00:02:27,000 --> 00:02:32,000
we're going to build it up, going back up through the recursive calls
45
00:02:32,000 --> 00:02:34,000
and return a string that is a member of this language.
46
00:02:34,000 --> 00:02:37,000
Let's go to the IDE and write it out.
47
00:02:37,000 --> 00:02:39,000
Here I've defined my function.
48
00:02:39,000 --> 00:02:42,000
As we described earlier, we've taken a grammar
49
00:02:42,000 --> 00:02:44,000
which is represented as a list of rewrite rules,
50
00:02:44,000 --> 00:02:49,000
we have the symbol that we're currently on, and we have a list of symbols that we've visited.
51
00:02:49,000 --> 00:02:53,000
If we've currently visited the symbol that we're on, we're going to return None.
52
00:02:53,000 --> 00:02:56,000
We didn't find a string that's a member of this language,
53
00:02:56,000 --> 00:03:00,000
and this is going to avoid infinite loops so that our program terminates.
54
00:03:00,000 --> 00:03:04,000
So before I go on, you might have noticed I use the built-in function any,
55
00:03:04,000 --> 00:03:07,000
which was given as a hint in the problem.
56
00:03:07,000 --> 00:03:09,000
Let's briefly go over it.
57
00:03:09,000 --> 00:03:15,000
Any takes in a list, and if any value in this list is true, any returns True.
58
00:03:15,000 --> 00:03:20,000
It only returns False if all the values in this list are false.
59
00:03:20,000 --> 00:03:22,000
Let's go back to the code.
60
00:03:22,000 --> 00:03:28,000
Going back to our list comprehension, we're going to create a list of true and false values.
61
00:03:28,000 --> 00:03:33,000
We're going to create a true value if there's a rewrite rule that applies to the symbol--
62
00:03:33,000 --> 00:03:36,000
in other words, if there's a rewrite rule where the left-hand side
63
00:03:36,000 --> 00:03:39,000
is the symbol that we're currently on.
64
00:03:39,000 --> 00:03:41,000
Otherwise, it's going to be false.
65
00:03:41,000 --> 00:03:45,000
This statement is essentially asking, "Is this symbol a terminal?"
66
00:03:45,000 --> 00:03:48,000
If no rewrite rules apply, we're going to say it's a terminal.
67
00:03:48,000 --> 00:03:52,000
And if it's a terminal, we're just going to return the current symbol we're on.
68
00:03:52,000 --> 00:03:55,000
That's the base case that we decided to do when we were sketching out this program.
69
00:03:55,000 --> 00:03:57,000
So those are our 2 base cases.
70
00:03:57,000 --> 00:04:00,000
Now we have to write the code that goes through the rewrite rules
71
00:04:00,000 --> 00:04:03,000
that apply to the non-terminal that we're on.
72
00:04:03,000 --> 00:04:08,000
We're going to update the list of visited symbols with the current visit symbol.
73
00:04:08,000 --> 00:04:15,000
Here we have a for loop that's going to go through every right-hand side of a rewrite rule
74
00:04:15,000 --> 00:04:18,000
that applies to the current symbol.
75
00:04:18,000 --> 00:04:21,000
This is going to give us every right-hand side
76
00:04:21,000 --> 00:04:24,000
where the left-hand side is the symbol.
77
00:04:24,000 --> 00:04:26,000
And the way it does that is it goes through the grammar,
78
00:04:26,000 --> 00:04:31,000
and if the left-hand side is our symbol, it's going to return the right-hand side.
79
00:04:31,000 --> 00:04:37,000
So this line asks the question, "Does every symbol in the right-hand side of our rule
80
00:04:37,000 --> 00:04:39,000
"result in a non-empty grammar?"
81
00:04:39,000 --> 00:04:43,000
And we determine that by calling cfgempty on every single symbol.
82
00:04:43,000 --> 00:04:47,000
We're using a list comprehension to create a list of true or false values
83
00:04:47,000 --> 00:04:52,000
that corresponds to whether the relative symbol is non-empty.
84
00:04:52,000 --> 00:04:54,000
It's true if it's non-empty.
85
00:04:54,000 --> 00:04:57,000
So if we're in the situation where we have a list of all true values,
86
00:04:57,000 --> 00:05:01,000
that means every symbol in the right-hand side is non-empty
87
00:05:01,000 --> 00:05:08,000
and the function all will return True in a similar way to any.
88
00:05:08,000 --> 00:05:13,000
But all, as opposed to any, returns True only if every element in the list is true.
89
00:05:13,000 --> 00:05:18,000
So if it's the case, we can get an example that proves that this context-free grammar
90
00:05:18,000 --> 00:05:20,000
isn't empty.
91
00:05:20,000 --> 00:05:24,000
So we create by going through each symbol in the right-hand side
92
00:05:24,000 --> 00:05:27,000
and building up our string and then returning it.
93
00:05:27,000 --> 00:05:30,000
If none of our rewrite rules terminated, then we're going to return None
94
00:05:30,000 --> 99:59:59,000
because we didn't find anything down this path.
PK 3!iAOj j 1 Problem_Set_3/03-Making_Ambiguity_Solution.en.srt1
00:00:00,000 --> 00:00:04,000
Here we have a grammar that represents some made-up language,
2
00:00:04,000 --> 00:00:07,000
kind of a cross between--I don't know--Python and JavaScript.
3
00:00:07,000 --> 00:00:10,000
It has a lot of keywords that are found in a lot of languages.
4
00:00:10,000 --> 00:00:12,000
Let's say we add a new rule.
5
00:00:12,000 --> 00:00:18,000
The question we need to answer is whether adding this new rule to this grammar
6
00:00:18,000 --> 00:00:20,000
creates an ambiguous grammar.
7
00:00:20,000 --> 00:00:26,000
As a reminder, an ambiguous grammar is one that given a string in that language,
8
00:00:26,000 --> 00:00:31,000
there are 2 parse trees, there are 2 ways to traverse the grammar and create that string.
9
00:00:31,000 --> 00:00:37,000
As we'll see later, this is really bad for languages that we want to interpret
10
00:00:37,000 --> 00:00:42,000
because it can result in the situation where 1 string, 1 program,
11
00:00:42,000 --> 00:00:47,000
1 HTML document has 2 valid meanings.
12
00:00:47,000 --> 00:00:49,000
And so you can't really choose one,
13
00:00:49,000 --> 00:00:51,000
and that's not what we like.
14
00:00:51,000 --> 00:00:54,000
We use these precise languages such as HTML and JavaScript
15
00:00:54,000 --> 00:00:57,000
so that we know exactly what we want to be displayed
16
00:00:57,000 --> 00:01:01,000
or the exact actions that we want to be performed. We don't want ambiguity.
17
00:01:01,000 --> 00:01:05,000
Computers don't handle ambiguity very well. Their little brains explode.
18
00:01:05,000 --> 00:01:08,000
We can sit and stare and think about this for a while,
19
00:01:08,000 --> 00:01:13,000
but I'll just give you an example string that actually shows that adding this rule
20
00:01:13,000 --> 00:01:15,000
makes the grammar ambiguous.
21
00:01:15,000 --> 00:01:18,000
That gets us a check. Let's go over that.
22
00:01:18,000 --> 00:01:22,000
Here I have the string, and what we need to do is show that there's 2 ways
23
00:01:22,000 --> 00:01:24,000
using this grammar to generate the string.
24
00:01:24,000 --> 00:01:28,000
One way is using the if-then-else construct
25
00:01:28,000 --> 00:01:35,000
where this if, this then, and this else match here, here, and here.
26
00:01:35,000 --> 00:01:42,000
That means this part of the string corresponds to this C,
27
00:01:42,000 --> 00:01:49,000
this one goes here, and this one is from the E right there. That's 1 way to do it.
28
00:01:49,000 --> 00:01:54,000
The second way is using the if-then, do these 2 symbols,
29
00:01:54,000 --> 00:01:57,000
and then the if-then-else here,
30
00:01:57,000 --> 00:02:01,000
meaning that this corresponds to that S,
31
00:02:01,000 --> 00:02:03,000
this corresponds to that E,
32
00:02:03,000 --> 00:02:06,000
and then we have an if-then-else as that S.
33
00:02:06,000 --> 00:02:10,000
So here's 2 valid ways using this grammar to generate the same string.
34
00:02:10,000 --> 00:02:13,000
This shows that the grammar is ambiguous when we add this rule.
35
00:02:13,000 --> 00:02:15,000
Let's do another.
36
00:02:15,000 --> 00:02:17,000
Let's say we add this rule.
37
00:02:17,000 --> 00:02:19,000
B goes to a B and then a B.
38
00:02:19,000 --> 00:02:21,000
This is also ambiguous.
39
00:02:21,000 --> 00:02:29,000
Say we have print 4; print 4.
40
00:02:29,000 --> 00:02:31,000
So now we have 2 ways to create this.
41
00:02:31,000 --> 00:02:33,000
One way is to start with a B.
42
00:02:33,000 --> 00:02:35,000
From there we go to S and B.
43
00:02:35,000 --> 00:02:37,000
From here we can do print 4.
44
00:02:37,000 --> 00:02:41,000
We can replace the B--oh, I forgot to add that.
45
00:02:41,000 --> 00:02:43,000
We can replace the S with the print 4.
46
00:02:43,000 --> 00:02:49,000
An alternative way is basically do something very similar
47
00:02:49,000 --> 00:02:53,000
where we start with the B and we go to B B,
48
00:02:53,000 --> 00:02:58,000
but this time we replace each B with the S;
49
00:02:58,000 --> 00:03:00,000
That's our second way.
50
00:03:00,000 --> 00:03:02,000
Here there are 2 more rules.
51
00:03:02,000 --> 00:03:04,000
Neither of these rules makes the grammar ambiguous.
52
00:03:04,000 --> 00:03:08,000
When we add the keyword int, this is the only way to get an int.
53
00:03:08,000 --> 00:03:12,000
So there's not going to be a second way to generate strings with int.
54
00:03:12,000 --> 00:03:16,000
The same basic logic applies to the parentheses as well.
55
00:03:16,000 --> 00:03:18,000
This is the only way I have parentheses,
56
00:03:18,000 --> 00:03:21,000
and there's not going to be a second way there.
57
00:03:21,000 --> 00:03:24,000
All it does is add parentheses on the outside of a statement.
58
00:03:24,000 --> 00:03:26,000
And we can do it as many times as we want,
59
00:03:26,000 --> 00:03:31,000
but given x number of parentheses, you just apply this x number of times.
60
00:03:31,000 --> 00:03:33,000
Not a whole lot of fancy stuff going here.
61
00:03:33,000 --> 00:03:38,000
The fact that they match also is kind of a key contributor to this.
62
00:03:38,000 --> 00:03:41,000
If you had a parenthesis on 1 side and you had another rule with the other side,
63
00:03:41,000 --> 00:03:47,000
then you could maybe get some ambiguity depending on how you word it.
64
00:03:47,000 --> 00:03:50,000
But that's how this is. Let's go on to the last one.
65
00:03:50,000 --> 00:03:52,000
This should look familiar.
66
00:03:52,000 --> 00:03:56,000
I believe we went over this in lecture or something very similar to this,
67
00:03:56,000 --> 00:03:59,000
and this is definitely ambiguous.
68
00:03:59,000 --> 00:04:02,000
I'll come up with 1 example.
69
00:04:02,000 --> 00:04:07,000
Say we have 1 + 2 + 3. We can do this in 2 ways.
70
00:04:07,000 --> 00:04:16,000
We have E goes to E + E goes to 1, and this eventually goes to 2 + 3.
71
00:04:16,000 --> 00:04:21,000
An alternative way is that we have the 2 on the left-hand side and then we have the 3.
72
00:04:21,000 --> 00:04:24,000
Although the meaning of a plus, the associativity of addition
73
00:04:24,000 --> 00:04:29,000
allows us to do it in any order, from the perspective of a language of strings,
74
00:04:29,000 --> 00:04:34,000
perspective of determining whether or not the context-free grammar is ambiguous,
75
00:04:34,000 --> 99:59:59,000
the meaning of a plus is irrelevant here.
PK 3!iA̅# # 6 Problem_Set_3/04-Infinite_Mind_Reading_Solution.en.srt1
00:00:00,000 --> 00:00:03,000
In this problem we're tasked with creating a function
2
00:00:03,000 --> 00:00:07,000
that determines whether or not a context-free grammar is infinite.
3
00:00:07,000 --> 00:00:11,000
We say an infinite grammar is one that accepts an infinite number of strings.
4
00:00:11,000 --> 00:00:13,000
So let's come up with an example.
5
00:00:13,000 --> 00:00:20,000
Here is a grammar that accepts strings that spell out ba--different length of a, 3 a's--
6
00:00:20,000 --> 00:00:24,000
and this can be possibly an infinite number of a's that follow the b.
7
00:00:24,000 --> 00:00:28,000
In an example of a finite grammar, I simply took the infinite grammar,
8
00:00:28,000 --> 00:00:32,000
removed the a right here, and this grammar only accepts the string b.
9
00:00:32,000 --> 00:00:37,000
Since the number of strings that this grammar accepts is finite--it's not infinite--
10
00:00:37,000 --> 00:00:39,000
it's not an infinite grammar.
11
00:00:39,000 --> 00:00:42,000
So before I go on with how we're going to approach this problem,
12
00:00:42,000 --> 00:00:46,000
I'm going to go over one of the assumptions that you were given in the problem specification.
13
00:00:46,000 --> 00:00:50,000
That assumption is that for every non-terminal in the grammar,
14
00:00:50,000 --> 00:00:54,000
it derives at least 1 non-empty--that's critical--string.
15
00:00:54,000 --> 00:00:59,000
If you see S, you can assume that there's at least 1 non-empty string that S derives.
16
00:00:59,000 --> 00:01:01,000
So how does that help us?
17
00:01:01,000 --> 00:01:04,000
The way we're going to approach this problem is that we're going to try to see
18
00:01:04,000 --> 00:01:10,000
if we can rewrite Q where Q is any non-terminal in our grammar
19
00:01:10,000 --> 00:01:16,000
in the form of some terminal x, Q again, and some terminal y
20
00:01:16,000 --> 00:01:18,000
where either x or y is non-empty--
21
00:01:18,000 --> 00:01:23,000
that is, the length of x and y together is greater than 0.
22
00:01:23,000 --> 00:01:26,000
So they both can't be empty because that's not a very helpful rule
23
00:01:26,000 --> 00:01:30,000
if Q goes to Q. That's not going to make us any progress.
24
00:01:30,000 --> 00:01:33,000
The reason that this is important is that if we have this rule
25
00:01:33,000 --> 00:01:39,000
or can rewrite Q in this form, we're back to the ba rule that we had originally.
26
00:01:39,000 --> 00:01:44,000
If we have this rule, we can build up an infinite string by simply repeating it
27
00:01:44,000 --> 00:01:48,000
over and over again and adding x's to the beginning or y's to the end.
28
00:01:48,000 --> 00:01:50,000
And so that's how we get an infinite grammar.
29
00:01:50,000 --> 00:01:55,000
The assumption comes in again because if we see this rule,
30
00:01:55,000 --> 00:02:01,000
we've assumed that Q can derive at least 1 non-empty string.
31
00:02:01,000 --> 00:02:05,000
We don't have to run what we wrote earlier in the homework, cfgempty, on Q;
32
00:02:05,000 --> 00:02:07,000
we can just assume it.
33
00:02:07,000 --> 00:02:10,000
So I don't actually even have to check if Q can eventually finish.
34
00:02:10,000 --> 00:02:15,000
If I find Q, I've found that the grammar is infinite as long as this property holds.
35
00:02:15,000 --> 00:02:17,000
So let's go over the game plan.
36
00:02:17,000 --> 00:02:20,000
I'm going to go through each non-terminal in the grammar
37
00:02:20,000 --> 00:02:22,000
because it doesn't really matter where I start.
38
00:02:22,000 --> 00:02:27,000
Any one of them is a valid way to generate an infinite grammar.
39
00:02:27,000 --> 00:02:31,000
Then I'm going to check for each of those non-terminals if it's possible
40
00:02:31,000 --> 00:02:35,000
to form something in the manner of this rewrite rule. So how do I do this part?
41
00:02:35,000 --> 00:02:38,000
I'm going to write a recursive helper function,
42
00:02:38,000 --> 00:02:40,000
and I'm going to check if that rule is of that form.
43
00:02:40,000 --> 00:02:45,000
If it's not, I'm going to recurse; if it is, I'm going to stop. Let's go to the code.
44
00:02:45,000 --> 00:02:47,000
Here I've declared my function, cfginfinite,
45
00:02:47,000 --> 00:02:50,000
where I take in a Python representation of the grammar.
46
00:02:50,000 --> 00:02:52,000
I'm going to go through every rule in the grammar--
47
00:02:52,000 --> 00:02:54,000
that's what this list comprehension does--
48
00:02:54,000 --> 00:02:58,000
and Q is going to be the left-hand side symbol in that rule.
49
00:02:58,000 --> 00:03:01,000
So this may be new to you.
50
00:03:01,000 --> 00:03:05,000
What I've done is I've declared a function within my function.
51
00:03:05,000 --> 00:03:10,000
The effect of this is that I can only call the helper function that I'm creating
52
00:03:10,000 --> 00:03:12,000
within the function cfginfinite.
53
00:03:12,000 --> 00:03:19,000
One advantage of doing this is that this function is not useful to any other code
54
00:03:19,000 --> 00:03:22,000
because I wrote it specifically for cfginfinite.
55
00:03:22,000 --> 00:03:25,000
So I can restrict access so that it's not misused in other places
56
00:03:25,000 --> 00:03:27,000
or depended on in other places.
57
00:03:27,000 --> 00:03:30,000
One disadvantage is that it kind of makes the code harder to read,
58
00:03:30,000 --> 00:03:33,000
and it makes it more difficult to understand where functions can be called
59
00:03:33,000 --> 00:03:35,000
and where they can't.
60
00:03:35,000 --> 00:03:40,000
Regardless, I'm using this helper function to determine whether or not the symbol Q
61
00:03:40,000 --> 00:03:49,000
can be rewritten in our form of x Q y where the length of x and y together is greater than 0.
62
00:03:49,000 --> 00:03:52,000
The 3 parameters are the current symbol we're at, the symbols we visited--
63
00:03:52,000 --> 00:03:55,000
we don't want to loop around; otherwise, the helper function won't finish--
64
00:03:55,000 --> 00:03:59,000
and the size of x and y in that case. It's this value.
65
00:03:59,000 --> 00:04:02,000
If I've looped around and I've already visited something,
66
00:04:02,000 --> 00:04:06,000
I'm going to return whether or not the size of x and y is greater than 0.
67
00:04:06,000 --> 00:04:11,000
If it is, we've found the rule that we're looking for and the helper will return True.
68
00:04:11,000 --> 00:04:15,000
If it's not, we're going to return False because we're not making the progress
69
00:04:15,000 --> 00:04:19,000
towards an infinite number of strings that we wanted to.
70
00:04:19,000 --> 00:04:22,000
Otherwise, we're going to have to parse through the rewrite rules
71
00:04:22,000 --> 00:04:25,000
that apply to that symbol current.
72
00:04:25,000 --> 00:04:27,000
I'm going to update my list of visited symbols,
73
00:04:27,000 --> 00:04:30,000
and then I'm going to go through each rewrite rule.
74
00:04:30,000 --> 00:04:36,000
So here, this for loop iterates through the right-hand sides of every rewrite rule
75
00:04:36,000 --> 00:04:38,000
that the current symbol applies to.
76
00:04:38,000 --> 00:04:41,000
Where the current symbol is equal to the left-hand side of that rule,
77
00:04:41,000 --> 00:04:44,000
it returns the right-hand side. Just another way to say it.
78
00:04:44,000 --> 00:04:46,000
I'm going to go through every symbol on the right-hand side,
79
00:04:46,000 --> 00:04:49,000
and I'm going to recurse on that symbol
80
00:04:49,000 --> 00:04:52,000
where the current of the recursive call is the symbol I'm on,
81
00:04:52,000 --> 00:04:57,000
updated symbols I've visited, and I'm updating the size of x and y
82
00:04:57,000 --> 00:05:03,000
that was passed in before with the length of the right-hand side minus 1.
83
00:05:03,000 --> 00:05:08,000
We do the minus 1 to exclude what we're recurring on, the Q.
84
00:05:08,000 --> 00:05:10,000
In this case, Q is symbol.
85
00:05:10,000 --> 00:05:13,000
When I do prove that Q can go back to Q,
86
00:05:13,000 --> 00:05:18,000
I can then check the size of everything else that I found before I found that loop.
87
00:05:18,000 --> 00:05:21,000
If helper is never true, then I'm just going to return False.
88
00:05:21,000 --> 00:05:24,000
So now that I've defined my helper function, I need to use it.
89
00:05:24,000 --> 00:05:29,000
For each symbol Q which this list comprehension returns,
90
00:05:29,000 --> 00:05:31,000
I'm going to try my helper function on that symbol.
91
00:05:31,000 --> 00:05:34,000
If that's ever true, I'm going to return True.
92
00:05:34,000 --> 00:05:38,000
If for every symbol I never find the helper function to be true, I'm going to return False.
93
00:05:38,000 --> 99:59:59,000
And I'm done.
PK 4!iAoa.L , , 4 Problem_Set_3/05-Detecting_Ambiguity_Solution.en.srt1
00:00:00,000 --> 00:00:03,000
In this problem we've been asked to determine
2
00:00:03,000 --> 00:00:09,000
when given a grammar and a string if that string shows that this grammar is ambiguous.
3
00:00:09,000 --> 00:00:11,000
So how are we going to do that?
4
00:00:11,000 --> 00:00:16,000
The key here is going to be in writing out derivations of the string under that grammar.
5
00:00:16,000 --> 00:00:18,000
Let me show you an example.
6
00:00:18,000 --> 00:00:20,000
Here I've written out a grammar.
7
00:00:20,000 --> 00:00:24,000
For the sake of writing out the derivations, I'm going to label each rewrite rule
8
00:00:24,000 --> 00:00:28,000
with a number, just an identifier, so I can refer to this rule as rule 1.
9
00:00:28,000 --> 00:00:33,000
Let's say my given string is the string a and b.
10
00:00:33,000 --> 00:00:38,000
What are some ways I can derive ab given this grammar?
11
00:00:38,000 --> 00:00:42,000
Starting from s, I can take rule--let's do 0.
12
00:00:42,000 --> 00:00:44,000
That takes me to P.
13
00:00:44,000 --> 00:00:48,000
From P I can go to aT using rule 2.
14
00:00:48,000 --> 00:00:51,000
From aT I just need to get b.
15
00:00:51,000 --> 00:00:54,000
And fortunately, using rule 5, T goes to b.
16
00:00:54,000 --> 00:00:58,000
So there's 1 way to derive the string ab using this grammar.
17
00:00:58,000 --> 00:01:03,000
And later on when I do solve this problem, I'm going to refer to this derivation
18
00:01:03,000 --> 00:01:09,000
in the form of a list where I took rules 0, 2, and 5 to get ab.
19
00:01:09,000 --> 00:01:13,000
Now, to determine whether or not the grammar is ambiguous
20
00:01:13,000 --> 00:01:16,000
with regard to this string, I need to try and find another derivation.
21
00:01:16,000 --> 00:01:20,000
So if I take rule 1 instead of rule 0 starting from S,
22
00:01:20,000 --> 00:01:23,000
that gives me a and Q.
23
00:01:23,000 --> 00:01:25,000
From aQ I can take rule 4.
24
00:01:25,000 --> 00:01:29,000
Rule 4 goes to b, giving me ab.
25
00:01:29,000 --> 00:01:33,000
So the derivation is 1, 4.
26
00:01:33,000 --> 00:01:38,000
So I found 2 derivations--these are 2 very different derivations--for the string ab
27
00:01:38,000 --> 00:01:41,000
under this grammar starting from S.
28
00:01:41,000 --> 00:01:45,000
That's kind of a good example, but how does that help us solve the problem?
29
00:01:45,000 --> 00:01:49,000
So, just like an earlier problem in this homework assignment,
30
00:01:49,000 --> 00:01:51,000
we get to make an important assumption.
31
00:01:51,000 --> 00:01:56,000
The important assumption I'm making is that whatever grammars we feed into our function,
32
00:01:56,000 --> 00:01:58,000
they have a finite number of derivations.
33
00:01:58,000 --> 00:02:03,000
This actually means that there's a finite number of strings in the grammars.
34
00:02:03,000 --> 00:02:07,000
So they may not be all that interesting, but nonetheless,
35
00:02:07,000 --> 00:02:10,000
determining whether or not they're ambiguous is a very cool problem.
36
00:02:10,000 --> 00:02:13,000
So what does that allow us to do?
37
00:02:13,000 --> 00:02:17,000
It grants us the ability to get a very simple solution
38
00:02:17,000 --> 00:02:19,000
for detecting whether or not the grammar is ambiguous.
39
00:02:19,000 --> 00:02:23,000
What we're going to do is simply exhaustively enumerate
40
00:02:23,000 --> 00:02:26,000
all the derivations for that grammar.
41
00:02:26,000 --> 00:02:29,000
And the way we're going to do that is take the expand function
42
00:02:29,000 --> 00:02:32,000
that we wrote at the end of the lecture unit.
43
00:02:32,000 --> 00:02:36,000
Instead of enumerating just the strings, we're going to enumerate strings
44
00:02:36,000 --> 00:02:38,000
along with their derivation.
45
00:02:38,000 --> 00:02:45,000
Once we've enumerated all the derivations, we can check more than 1 derivation for that string.
46
00:02:45,000 --> 00:02:49,000
If we have found one, then the grammar is ambiguous.
47
00:02:49,000 --> 00:02:51,000
Otherwise, it's not.
48
00:02:51,000 --> 00:02:56,000
So if you're like me, you're eager to get right to the code. Let's do that.
49
00:02:56,000 --> 00:03:02,000
As the hint described, the first thing we're going to do is write our expand function.
50
00:03:02,000 --> 00:03:05,000
This is going to be taken almost right out of the lecture.
51
00:03:05,000 --> 00:03:08,000
If you don't remember or haven't seen this syntax before,
52
00:03:08,000 --> 00:03:13,000
this says I'm going to take the first and second values out of tokens and derivation
53
00:03:13,000 --> 00:03:16,000
and put them in the tuple (tokens, derivation).
54
00:03:16,000 --> 00:03:21,000
Here I have 2 loops that are going to enumerate each token in tokens
55
00:03:21,000 --> 00:03:23,000
and each rule in the grammar.
56
00:03:23,000 --> 00:03:29,000
I'm going to access them as I want given their position or index.
57
00:03:29,000 --> 00:03:32,000
So this gives me the rule out of the grammar.
58
00:03:32,000 --> 00:03:36,000
This says if the current token applies to this rule,
59
00:03:36,000 --> 00:03:40,000
it's on the left-hand side of this rule, then I'm going to do the following.
60
00:03:40,000 --> 00:03:44,000
I'm going to yield as part of this generator function a tuple
61
00:03:44,000 --> 00:03:50,000
where the first value in the tuple is the current state of the string.
62
00:03:50,000 --> 00:03:53,000
I'm going to make 1 substitution. That's what I'm doing here.
63
00:03:53,000 --> 00:03:55,000
I'm including everything to the left of the current token,
64
00:03:55,000 --> 00:04:00,000
making the 1 substitution, and then I'm including everything to the right of the token.
65
00:04:00,000 --> 00:04:03,000
And then I'm updating the derivation with the rule I used.
66
00:04:03,000 --> 00:04:05,000
So what does that look like?
67
00:04:05,000 --> 00:04:08,000
If I go back to the example I had here, let's say I pass in S.
68
00:04:08,000 --> 00:04:12,000
I am going to go through this list and see every rule that applies to S.
69
00:04:12,000 --> 00:04:18,000
Rule 0 and rule 1 both apply to S, so I'm going to yield 2 things.
70
00:04:18,000 --> 00:04:22,000
First one is going to be P because I'm substituting S for P,
71
00:04:22,000 --> 00:04:25,000
and then I'm going to return the derivation, which is using just rule 0.
72
00:04:25,000 --> 00:04:27,000
That's going to be the first thing I yield.
73
00:04:27,000 --> 00:04:33,000
The second thing I yield is going to be the substitution for the second rule, or rule 1.
74
00:04:33,000 --> 00:04:38,000
And so what I'm going to do in the body of is ambiguous now that I have the expand
75
00:04:38,000 --> 00:04:42,000
is I'm going to go back around and expand this expression even more.
76
00:04:42,000 --> 00:04:47,000
So when I expand this expression, I'm going to get 2 things.
77
00:04:47,000 --> 00:04:50,000
The first one is going to be using rule 2 to get aT,
78
00:04:50,000 --> 00:04:54,000
and the next one is going to be using rule 3 to just get c.
79
00:04:54,000 --> 00:04:57,000
So as you can see, I'm starting to build up all these derivations.
80
00:04:57,000 --> 00:05:00,000
So what I'm going to do in is ambiguous is I'm going to keep building up
81
00:05:00,000 --> 00:05:04,000
all these derivations until I can't expand anymore,
82
00:05:04,000 --> 00:05:07,000
and then I'm going to look at all the derivations
83
00:05:07,000 --> 00:05:09,000
that yield the string I was originally looking for,
84
00:05:09,000 --> 00:05:14,000
all the derivations once fully expanded yield ab,
85
00:05:14,000 --> 00:05:17,000
and then check to see if I got there the same way.
86
00:05:17,000 --> 00:05:20,000
If I got there in 2 different ways, then the grammar is ambiguous
87
00:05:20,000 --> 00:05:24,000
under the string I was looking for. Otherwise, it's not.
88
00:05:24,000 --> 00:05:26,000
So let's go write that function.
89
00:05:26,000 --> 00:05:28,000
I've declared my function.
90
00:05:28,000 --> 00:05:32,000
I've taken a grammar, a starting location, a starting symbol,
91
00:05:32,000 --> 00:05:34,000
and the utterance that I'm looking for.
92
00:05:34,000 --> 00:05:39,000
Enumerated is going to be all the possible strings in the grammar I've built up so far.
93
00:05:39,000 --> 00:05:42,000
Since I'm just going to start with start, the first value in the tuple
94
00:05:42,000 --> 00:05:47,000
is just going to be the string start or just the symbol start--
95
00:05:47,000 --> 00:05:52,000
it's a 1-symbol string--and an empty derivation since I haven't done anything yet.
96
00:05:52,000 --> 00:05:54,000
So what do I have here?
97
00:05:54,000 --> 00:05:59,000
I'm going to take my enumerated list, try to enumerate it more,
98
00:05:59,000 --> 00:06:02,000
try to expand it using the expand function,
99
00:06:02,000 --> 00:06:06,000
and if it changes, then I'm making progress, I'm finding new strings,
100
00:06:06,000 --> 00:06:08,000
I'm expanding the derivations,
101
00:06:08,000 --> 00:06:13,000
and so I should keep going until I've gone through all the finite number of derivations.
102
00:06:13,000 --> 00:06:16,000
If this hasn't been updated, I haven't done anything new,
103
00:06:16,000 --> 00:06:18,000
the expand function didn't find anything new,
104
00:06:18,000 --> 00:06:20,000
I'm just going to break because I'm done.
105
00:06:20,000 --> 00:06:23,000
I found all the derivations I was looking for.
106
00:06:23,000 --> 00:06:29,000
So what I'm doing here is I'm going through each entry in my enumerated list.
107
00:06:29,000 --> 00:06:33,000
I'm going to try expanding that entry.
108
00:06:33,000 --> 00:06:38,000
If I found something new, then I'm going to add it to my new enumerated list.
109
00:06:38,000 --> 00:06:40,000
Otherwise, I'm not going to do anything.
110
00:06:40,000 --> 00:06:43,000
So what this is going to do is it's going to update new enumerated
111
00:06:43,000 --> 00:06:46,000
for every new thing expand finds.
112
00:06:46,000 --> 00:06:52,000
So once I've done all that, I'm going to count how many derivations I found
113
00:06:52,000 --> 00:06:54,000
for my utterance.
114
00:06:54,000 --> 00:06:59,000
And if I found more than 1, then I've shown the grammar to be ambiguous.
115
00:06:59,000 --> 00:07:01,000
I'm doing that by doing a list comprehension
116
00:07:01,000 --> 00:07:06,000
that essentially filters out everything that's not related to the utterance
117
00:07:06,000 --> 00:07:08,000
and then taking the length of that list.
118
00:07:08,000 --> 00:07:12,000
If it's greater than 1, then I've found 2 derivations. Otherwise, I haven't.
119
00:07:12,000 --> 99:59:59,000
So that's how you solve this problem.
PK 0!iA U 8 Problem_Set_3/01-Formal_Grammar_Concepts_Solution.en.srtPK 1!iA# # 8 Problem_Set_3/02-Reading_Machine_Minds_2_Solution.en.srtPK 3!iAOj j 1 A Problem_Set_3/03-Making_Ambiguity_Solution.en.srtPK 3!iA̅# # 6 <\ Problem_Set_3/04-Infinite_Mind_Reading_Solution.en.srtPK 4!iAoa.L , , 4 q Problem_Set_3/05-Detecting_Ambiguity_Solution.en.srtPK