substring search is to find a single string in text.
pattern matching is to find one of a specified set of strings in text
a regular expression is a notation to specify a set of strings. (possibly infinite)
writing a RE is like writing a program.
it needs to understand programming model, can be easier to write than read, can be difficult to debug.
for any DFA, there exists a RE that describes the same set of strings.
for any RE, there exists a DFA that recognizes the same set of strings
DFA have no backup in text input stream, and linear time guarantee.
but to build DFA from RE is infeasible because DFA may have exponential number of states.
so we could use NFA, it is Quadratic time guarantee (linear-time typical)
NFA is for nondeterministic finite state automata
DFA. Deterministic ? easy because exactly one applicable transition.
NFA. Nondeterministic ? can be several applicable transitions;
need to select the right one!
Q. How to simulate NFA?
A. Systematically consider all possible transition sequences.
NFA representation
how to efficiently simulate an NFA?
Maintain set of all possible states that NFA could be in after reading in the first i text characters
NFA construction
for parentheses. add ε-transition edge from parentheses to next state.
RE Application
QUIZ
Q1 Challenging REs.
check answer in my github https://github.com/yixuaz/algorithm4-princeton-cos226
Q2 Exponential-size DFA
check answer in my github https://github.com/yixuaz/algorithm4-princeton-cos226
Q3 Extensions to NFA
Add to NFA.java the ability to handle multiway or, wildcard, and the + closure operator.
check answer in my github https://github.com/yixuaz/algorithm4-princeton-cos226