Regular Expression Matching

Description

Given an input string (s) and a pattern (p), implement regular expression matching with support for ‘.’ and ‘*’.

1
2
3
'.' Matches any single character.
'*' Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).

Note:

  • s could be empty and contains only lowercase letters a-z.
  • p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example :

1
2
3
4
5
Input:
s = "aab"
p = "a*b"
Output: true
Explanation: "a*" does not match the entire string "aa".

Backtracking Solution

We define search(si, pi) as the search function, which means if s[si:] is matched by p[pi:].

First we need to check if p[pi] is a wildcard: p[pi+1] == "*", and don’t forget the corner case: pi+1 < len(p)

  1. Not wildcard: Just check if s[si] and p[pi] is same character: si < len(s) and s[si] == p[pi]
    • If it is, search next level: search(si+1, pi+1)
    • If not, just return False, search ends here.
  2. Wildcard: Same as step 1, check if it’s the same character:
    • If it is, that’s a valid wildcard match. But we could still choose to match or skip this wildcard: search(si+1, pi) or search(si, pi+2)
    • If not, the only choice is to skip this wildcard: search(si, pi+2)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        def search(si: int, pi: int) -> bool:
            if pi == len(p): 
                return si == len(s)
            wildcard: bool = pi+1 < len(p) and p[pi+1] == "*"
            match: bool = si < len(s) and p[pi] in {s[si], "."}
            if wildcard:
                if match:
                    return search(si+1, pi) or search(si, pi+2)
                else:
                    return search(si, pi+2)
            else:
                if march:
                    search(si+1, pi+1)
                else:
                    return False
        return search(0, 0)

Some conditional branches could be merged by and operator.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        def search(si: int, pi: int) -> bool:
            if pi == len(p): 
                return si == len(s)
            wildcard: bool = pi+1 < len(p) and p[pi+1] == "*"
            match: bool = si < len(s) and p[pi] in {s[si], "."}
            if wildcard:
                return (match and search(si+1, pi)) or search(si, pi+2)
            else:
                return match and search(si+1, pi+1)
        return search(0, 0)
  • The code could be easily optimized using a cache to avoid duplicate calculation. See Optimize Recursion.
  • Always pay attention to corner cases in recursion.
  • Below is an example for matching aab and c*a*ab, with the dicision tree of the backtracking search path.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
graph TD;
  subgraph DecisionTree
  DROOT(Wildcard?)-->|Y|DL(char?);
  DL-->|Y|DLL(match?);
  DLL-->|Y|DLLL(si+1, pi);
  DLL-->|N|DLLR(si, pi+2);
  DL-->|N|DLR(si, pi+2);
  DROOT-->|N|DR(char?);
  DR-->|Y|DRL(si+1, pi+1);
  DR-->|N|DRR(False);
  end
  subgraph MatchExample
  ROOT(aab,c*a*ab)-->|skip c*|L(aab, a*ab);
  L-->|match a*|LL(ab,a*ab)
  L-->|skip a*|LR(aab,ab);
  LL-->|match a*|LLL(b,a*ab);
  LLL-->|skip a*|LLLL(b,ab);
  LL-->|skip a*|LLR(ab,ab);
  LR-->|match a|LRL(ab,b);
  LLR-->|match a|LLRL(b,b);
  LLRL-->|match b|LLRLL([END]);
  end

Dyamic Programming

First we can construct state transision from the backtracking method. Jut let search(si, pi) to be dp[si][pi]

1
2
3
4
dp[si][pi] <--------------- dp[si][pi+2]
    ^      \
    |       \  
dp[si+1][pi] dp[si+1][pi+1]
  • We have a (len(s) + 1, len(p) +1) matrix, the iteration order is described below.
  • Starting from dp[len(s)][len(p)-1], we need to get dp[0][0]
  • Set dp[len(s)][len(p)] = True, which indicates our backtracking terminal condition(find a match).
s\p c * a * a b
a end F
a F
b F
start T

The core logic is pretty straightforward like backtracking solution:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
class Solution:
    def isMatch(self, s: str, p: str) -> bool:
        dp: List[List[bool]] = [[False]*(len(p)+1) for _ in range(len(s)+1)]
        dp[len(s)][len(p)] = True
        for si in range(len(s), -1, -1):
            for pi in range(len(p)-1, -1, -1):
                wildcard: bool = pi+1 < len(p) and p[pi+1] == "*"
                match: bool = si < len(s) and p[pi] in {s[si], "."}
                if wildcard:
                    dp[si][pi] = (match and dp[si+1][pi]) or dp[si][pi+2]
                else:
                    dp[si][pi] = match and dp[si+1][pi+1]
        return dp[0][0]

Notes:

  1. This is similar to Distinct Subsequences