Support Vector Machine (2.1) - Solving the dual problem for hard margin maximization
SVM: Hard Margin Maximization
In the last section we arrived at (1.8), a convex quadratic programming problem. Introducing a Lagrange multiplier $\alpha_i \ge 0$ for each constraint, (1.8) can be represented by the Lagrangian: $$ L(w,b,\alpha) = \frac{1}{2} {\Vert w \Vert}^2 + \sum\limits_{i=1}^{N} \alpha_i \left(1-y^{(i)}(w \cdot x^{(i)} + b)\right), \quad \alpha_i \ge 0 \tag{2.1}$$
The primal problem (1.8) is equivalent to $\min\limits_{w,b} \max\limits_{\alpha \ge 0} L(w,b,\alpha)$; exchanging the order of $\min$ and $\max$ gives the dual problem: $$\max\limits_{\alpha} \ \min\limits_{w,b} L(w,b,\alpha) \tag{2.2}$$ Since the primal is convex with affine constraints and, for linearly separable data, feasible, strong duality holds and the two problems share the same optimum.
Solving $\min\limits_{w,b} L(w,b,\alpha)$
For fixed $\alpha$, $L$ is convex in $w$ and $b$, so the inner minimum is attained where both gradients vanish:
$$ \nabla_w L(w,b,\alpha) = w - \sum\limits_{i=1}^{N} \alpha_i y^{(i)} x^{(i)} = 0 \\ \nabla_b L(w,b,\alpha) = -\sum\limits_{i=1}^{N} \alpha_i y^{(i)} = 0 \tag{2.3}$$
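As a quick numerical sanity check of (2.1) and (2.3), here is a minimal NumPy sketch on made-up toy values; it compares the analytic gradients against finite-difference estimates (all data and names here are illustrative assumptions, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 5, 3
X = rng.normal(size=(N, d))            # rows are the x^(i)
y = rng.choice([-1.0, 1.0], size=N)    # labels y^(i) in {-1, +1}
alpha = rng.uniform(size=N)            # multipliers alpha_i >= 0
w, b = rng.normal(size=d), 0.3

def lagrangian(w, b):
    """L(w, b, alpha) exactly as in (2.1)."""
    return 0.5 * (w @ w) + alpha @ (1.0 - y * (X @ w + b))

# Analytic gradients from (2.3); note the minus sign in grad_b.
grad_w = w - (alpha * y) @ X
grad_b = -(alpha @ y)

# Finite-difference estimates agree with the analytic values.
eps = 1e-6
fd_b = (lagrangian(w, b + eps) - lagrangian(w, b - eps)) / (2.0 * eps)
e0 = np.zeros(d); e0[0] = eps
fd_w0 = (lagrangian(w + e0, b) - lagrangian(w - e0, b)) / (2.0 * eps)
print(np.isclose(grad_b, fd_b))        # True
print(np.isclose(grad_w[0], fd_w0))    # True
```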
Substituting (2.3) back into (2.1) eliminates $w$ and $b$, leaving the dual problem:
$$ \begin{align}
& \max\limits_{\alpha} \left\lbrace - \frac{1}{2} \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} (x^{(i)} \cdot x^{(j)}) + \sum\limits_{i=1}^{N} \alpha_i \right\rbrace \\
& \text{s.t.} \quad \sum\limits_{i=1}^{N} \alpha_i y^{(i)} = 0, \quad \alpha_i \ge 0, \ i = 1, \dots, N
\end{align} \tag{2.4}$$
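In matrix form the objective of (2.4) is $-\frac{1}{2}\alpha^{\top} Q \alpha + \mathbf{1}^{\top}\alpha$, where $Q_{ij} = y^{(i)} y^{(j)} (x^{(i)} \cdot x^{(j)})$. A small sketch of building $Q$ (the helper name `dual_matrix` is my own placeholder):

```python
import numpy as np

def dual_matrix(X, y):
    """Q[i, j] = y_i * y_j * (x_i . x_j), so that the dual
    objective in (2.4) reads -(1/2) a^T Q a + sum(a)."""
    gram = X @ X.T            # pairwise inner products x_i . x_j
    return np.outer(y, y) * gram
```

$Q$ is a Gram matrix rescaled by the label signs, hence positive semidefinite, which is what makes the problem below a convex QP.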
Solving $\alpha$ for $\max\limits_{\alpha} \ \min\limits_{w,b} L(w,b,\alpha)$
Negating the objective of (2.4) converts the maximization into the equivalent minimization problem:
$$ \begin{align}
& \min\limits_{\alpha} \left\lbrace \frac{1}{2} \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} (x^{(i)} \cdot x^{(j)}) - \sum\limits_{i=1}^{N} \alpha_i \right\rbrace \\
& \text{s.t.} \quad \sum\limits_{i=1}^{N} \alpha_i y^{(i)} = 0, \quad \alpha_i \ge 0, \ i = 1, \dots, N
\end{align} \tag{2.5}$$
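Problem (2.5) is a standard convex QP. As a hedged, toy-scale sketch (not the SMO algorithm that real SVM libraries use), the snippet below hands it to SciPy's SLSQP, encoding $\sum_i \alpha_i y^{(i)} = 0$ as an equality constraint and $\alpha_i \ge 0$ as bounds; `solve_dual` is a name of my choosing:

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(X, y):
    """Solve the dual QP (2.5) for alpha. Toy-scale sketch only;
    a dedicated QP solver or SMO is the usual choice in practice."""
    N = X.shape[0]
    Q = np.outer(y, y) * (X @ X.T)     # Q[i, j] = y_i y_j (x_i . x_j)

    objective = lambda a: 0.5 * a @ Q @ a - a.sum()
    gradient = lambda a: Q @ a - 1.0

    res = minimize(
        objective,
        x0=np.zeros(N),                # alpha = 0 is a feasible start
        jac=gradient,
        bounds=[(0.0, None)] * N,      # alpha_i >= 0
        constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
        method="SLSQP",
    )
    return res.x
```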
The solution $\alpha^{*}$ must satisfy the KKT conditions: stationarity (2.3), primal feasibility $y^{(i)}(w^{*} \cdot x^{(i)} + b^{*}) \ge 1$, dual feasibility $\alpha_i^{*} \ge 0$, and complementary slackness $\alpha_i^{*} \left(1-y^{(i)}(w^{*} \cdot x^{(i)} + b^{*})\right) = 0$.
Solving for $w$ and $b$
Once $\alpha^{*}$ is solved, we have: $$ w^{*} = \sum\limits_{i=1}^{N} \alpha_i^{*} y^{(i)} x^{(i)} \\ b^{*} = y^{(j)} - \sum\limits_{i=1}^{N} \alpha_i^{*} y^{(i)} (x^{(i)} \cdot x^{(j)}) \tag{2.6}$$ where $(x^{(j)}, y^{(j)})$ is any sample with $\alpha_j^{*} > 0$; by complementary slackness it satisfies $1-y^{(j)}(w^{*} \cdot x^{(j)} + b^{*}) = 0$, i.e. it lies exactly on the margin boundary. (The expression for $b^{*}$ follows from multiplying this condition by $y^{(j)}$ and using $(y^{(j)})^2 = 1$.)
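Given $\alpha^{*}$, (2.6) translates directly into code; picking the index of the largest $\alpha_i^{*}$ is one simple way to select a point with $\alpha_j^{*} > 0$ (the helper name is hypothetical):

```python
import numpy as np

def recover_w_b(X, y, alpha):
    """Recover (w*, b*) from alpha* via (2.6)."""
    w = (alpha * y) @ X        # w* = sum_i alpha_i y_i x_i
    j = int(np.argmax(alpha))  # an index with alpha_j > 0, i.e. a support vector
    b = y[j] - X[j] @ w        # same as (2.6), since w* already sums over i
    return w, b
```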
Notice that when $\alpha_i^{*} = 0$, the sample $(x^{(i)}, y^{(i)})$ contributes nothing to $w^{*}$ and $b^{*}$. The classifier is therefore determined entirely by the few samples with $\alpha_i^{*} > 0$, which by complementary slackness satisfy $1-y^{(i)}(w^{*} \cdot x^{(i)} + b^{*}) = 0$ and so lie on the margin boundary; these samples are the support vectors.
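Putting the sketches above together on a made-up separable dataset illustrates the point: only a few $\alpha_i^{*}$ come out nonzero, and those samples sit at functional margin $1$, consistent with complementary slackness:

```python
import numpy as np

# Made-up, linearly separable toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=+2.0, size=(20, 2)),
               rng.normal(loc=-2.0, size=(20, 2))])
y = np.hstack([np.ones(20), -np.ones(20)])

alpha = solve_dual(X, y)           # from the sketch after (2.5)
w, b = recover_w_b(X, y, alpha)    # from the sketch after (2.6)

support = alpha > 1e-6             # support vectors: alpha_i > 0
print("number of support vectors:", int(support.sum()))
print("functional margins at SVs:", y[support] * (X[support] @ w + b))  # all ~ 1
print("training accuracy:", float(np.mean(np.sign(X @ w + b) == y)))
```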