This is about calculation Levenshtein Distance
Levenshtein Distance
Check Levenshtein Distance for information.
Let $\operatorname{lev}_{a,b}(i,j)$ be the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$. $i$ and $j$ are 1-based indices. We have:
$$ \operatorname {lev}_{a,b}(i,j) = \begin{cases} \max(i,j) & \text{ if $min(i,j)=0$}, \\ \min {\begin{cases} \operatorname {lev}_{a,b}(i-1,j)+1 \\ \operatorname {lev}_{a,b}(i,j-1)+1 \\ \operatorname {lev}_{a,b}(i-1,j-1)+1_{(a_{i} \neq b_{j})} \end{cases}} &{\text{otherwise.}} \end{cases} $$
Explanation
For simplification, Let dp[i][j]
represent $\operatorname{lev}_{a,b}(i,j)$.
We have 3 ways to reach dp[i][j]
. Take a='abxy', b='abc'
as an example:
- From
dp[i-1][j]
: Havingabx
->abc
, we need one extra del step:abxy
->abcy
(dely
)->abc
- From
dp[i][j-1]
: Havingabxy
->ab
, we need one extra add step:abxy
->ab
(addc
)->abc
- From
dp[i-1][j-1]
: Havingabx
->ab
, we need one extra replace step:abxy
->aby
(replacey
toc
)->abc
- If
a[i] == b[j]
, We could skip the replace step.
DP Solution
Then we consider the corner cases. We have 0-indexing system, in which situation we can achive -1
indexing.
- If
i = -1
orj = -1
, then we need to convert an empty string to another non-empty string, by inserting, the cost is the length of the non-empty string, in the above example, wheni=-1
andj=1
, we convert''
toab
, the cost is 2,dp[i][j]
=j+1
=2
- If
i = -1
andj = -1
, we convert an empty string to another empty string,dp[-1][-1]
=0
There are two strategies to take care of these situations.
- Using a custom indexing system to take care of index overflow. We could use the dp as it was.
- Taking care of the indexing on the fly. This is less readable than method 1, personally not recommended.
- Add an extra row and column, start from (1,1) instead of (0,0), which has pros and cons compared to method 1.
We use method 1:
|
|
This method has problem to deal with empty string, in which we generate the result in index()
on the fly, which is out of the dp table itself. Try method 3:
|
|
Notes:
- word index changes from
i
,j
toi-1
,j-1
in method 3. - We may optimize the space of the code to use only two vectors, or even one.