Regular Expressions

This section will be about Regular Expressions. You will first learn what regular expressions are and how they are different from DFAs and NFAs. We will go through examples where we find the language represented by a given regular expression and where we write down a regular expression to denote a certain language. Finally, we will see the equivalence between FAs and regular expressions. You can find the notes related to this section here.

What are regular expressions?

What exactly is a regular expression? You may have already encountered them in the context of a programming class. They are a type of mathematical object that denote a certain language. They do this by establishing a “pattern” that all strings in the denoted language must obey. They are not language acceptors (like DFAs and NFAs were) nor are they language generators (which we will see is a grammar’s role). The rules for building regular expressions are recursive. This means that we determine whether a given expression is a valid regular expression by checking whether each of its sub-expressions are valid. Once we can distinguish between valid and invalid regular expressions, we can start talking about what language they describe (which I use interchangeably with denote). Given a regular expression \(r\), we will refer to its described language as \(L(r)\). To extract the language that \(r\) describes, we will also do so in a recursive manner. I discuss this and an example in the following videos.

This is the introductory video.

And this video describes how to determine the language denoted by a certain regular expression. In this case, \((a+bb)(ab)^*\).

Okay, now that we have a basic understanding of regular expressions, let’s look at an exercise where we try to find a (why not the?) regular expression that denotes a particular language. In this exercise, we find a regular expression for the language \(L = \{ w \in \{0, 1\}^* : w \text{ contains the substring } 101\}\).

More exercises for identifying the languages denoted by regular expressions as well as writing regular expressions to denote particular languages in the Exercises section.

Equivalence between regular expressions and regular languages

Now that we are a bit more familiar with regular expressions, you might be wondering what exactly is their connection to regular languages. After all, both of these terms have “regular” in them, is that just a wild coincidence? (I honestly don’t know the exact answer to that, woops.) In turns out that regular expressions are exactly equivalent to DFAs and NFAs. That is, any language that is accepted by an FA can be described by a regular expression and vice-versa! This is quite an important result because it means that to prove that a language is regular you can now either 1. Create a DFA that accepts it, 2. Create an NFA that accepts or 3. Create a regular expression that denotes it. This is great because sometimes it is easier to design an FA for a particular language and other times a regular expression is more convenient. Now, how is it possible for the family of languages denoted by regular expressions, let’s call this \(L_{REX}\), to be exactly the same as the family of language, \(L_{NFA}\), accepted by NFAs (and, by extension, DFAs)? Because every regular expression can be converted to an equivalent (N)FA and vice-versa! More in the following videos.

Converting regular expressions to NFAs.
Converting NFAs to regular expressions.

Exercises

Exercise. Given the regular expressions \(r = (0+01)(1+11)^*\), how many strings of length 3 or less are there in \(L(r)\)?

Exercise. What is the the shortest string in \(L(((a+\lambda)\emptyset^* + b + \emptyset)^*) \cap L(a(ba)^*)\)?

Exercise. Write a regular expression \(r\) such that \(L(r) = \{vwv: |v| = 2, v,w \in \{a,b\}^*\}\).

Exercise. Write a regular expression \(r\) such that \(L(r) = \{w \in \{0, 1\}^* : |w| \mod 2 = 0\}\). (I also do \(|w| \mod 2 = 1\) in the solution video.)

Exercise. Find an NFA \(M\) equivalent to the regular expression \(r = (abb)^* + a^*bb^*\). By equivalent we mean that \(L(M) = L(r)\).

Exercise. Find a regular expression that denotes the language \(L = \{ w \in \{0,1\}^* : n_0(w) \mod 2 = 0, n_1(w) \mod 2 = 0\}\).

Exercise. In the introductory video, an example of an invalid regular expression I gave had something like \((abb)^R\) in it where \(R\) was the "reversal" operator. This was indeed not a standard valid regular expression because it wasn't clear what the reversal operator applied to a regular expression would give you. This question asks you to provide a formal definition of this operator. That is, provide a (recursive) definition of the regular expression reversal operation and show how, based on your definition, \(L(r^R) = (L(r))^R\).