
Support Vector Machine (2.2) - Solving the dual problem for soft margin maximization

Soft margin maximization for SVM

Recall from the first section that we have: $$ \begin{align} \min\limits_{w,b,\xi} \quad & \frac{1}{2} {\Vert w \Vert}^2 + C \sum\limits_{i=1}^{N}\xi_i \\ \text{s.t.} \quad & y^{(i)} \left ( w \cdot x^{(i)} + {b} \right ) + \xi_i - 1 \ge 0 \\ & \xi_i \ge 0, \quad i = 1, 2, \dots, N \tag{1.10} \end{align}$$ Applying the method of Lagrange multipliers to the objective and constraints in (1.10), with multipliers $\alpha_i \ge 0$ and $\mu_i \ge 0$, (1.10) can be represented as:


$$ \begin{align} & L(w,b,\xi,\alpha,\mu) = \frac{1}{2} {\Vert w \Vert}^2 + C \sum\limits_{i=1}^{N}\xi_i + \sum\limits_{i=1}^{N} \alpha_i (1 -\xi_i - y^{(i)}(w \cdot x^{(i)} + b)) + \sum\limits_{i=1}^{N} \mu_i(-\xi_i) \tag{3.1} \end{align}$$

Solving $\min\limits_{w,b,\xi} L(w,b,\xi,\alpha,\mu)$

$$ \nabla_w L(w,b,\xi,\alpha,\mu) = w - \sum\limits_{i=1}^{N} \alpha_i y^{(i)} x^{(i)} = 0 \\ \nabla_b L(w,b,\xi,\alpha,\mu) = \sum\limits_{i=1}^{N} \alpha_i y^{(i)} = 0 \\ \nabla_{\xi_i} L(w,b,\xi,\alpha,\mu) = C-\alpha_i - \mu_i = 0 \tag{3.2}$$

Conveniently, the first two conditions have the same form as (2.4) and (2.5) from the hard-margin case, and the third merely ties the new multipliers to $C$, so the substitution into (3.1) proceeds in the same way.
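Substituting the conditions in (3.2) back into (3.1), the $\xi$ terms cancel because their coefficient is $C - \alpha_i - \mu_i = 0$ and the $b$ term vanishes because $\sum\limits_{i=1}^{N} \alpha_i y^{(i)} = 0$, leaving a function of $\alpha$ alone:

$$ \begin{align} \min\limits_{w,b,\xi} L(w,b,\xi,\alpha,\mu) &= \frac{1}{2} \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} (x^{(i)} \cdot x^{(j)}) - \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} (x^{(i)} \cdot x^{(j)}) + \sum\limits_{i=1}^{N} \alpha_i \\ &= -\frac{1}{2} \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} (x^{(i)} \cdot x^{(j)}) + \sum\limits_{i=1}^{N} \alpha_i \end{align}$$

Maximizing this over $\alpha$ is equivalent to minimizing its negative, which gives the dual problem (3.3) below.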


Solving for $\alpha$ in $\max\limits_{\alpha} \ \min\limits_{w,b,\xi} L(w,b,\xi,\alpha,\mu)$

The dual problem for soft margin maximization is: $$ \begin{align} \min\limits_{\alpha} \quad & \frac{1}{2} \sum\limits_{i=1}^{N} \sum\limits_{j=1}^{N} \alpha_i \alpha_j y^{(i)} y^{(j)} (x^{(i)} \cdot x^{(j)}) - \sum\limits_{i=1}^{N} \alpha_i \\ \text{s.t.} \quad & \sum\limits_{i=1}^{N} \alpha_i y^{(i)} = 0, \quad \alpha_i \ge 0, \quad \mu_i \ge 0, \quad C - \alpha_i - \mu_i = 0 \tag{3.3} \end{align}$$

The constraints in (3.3) can be simplified: since $\mu_i = C - \alpha_i \ge 0$ is equivalent to $\alpha_i \le C$, the multiplier $\mu_i$ can be eliminated and the constraints reduce to:

$$ s.t. \quad \sum\limits_{i=1}^{N} \alpha_i y^{(i)} = 0, \quad 0 \le \alpha_i \le C \tag{3.4}$$
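For concreteness, here is a minimal numerical sketch that hands (3.3)-(3.4) to a generic solver. It assumes toy NumPy inputs `X` (shape $N \times d$), labels `y` in $\{-1, +1\}$, and a chosen `C`; the function name is illustrative, and SciPy's SLSQP is just one of many ways to handle the box and equality constraints.

```python
import numpy as np
from scipy.optimize import minimize

def solve_soft_margin_dual(X, y, C):
    """Solve the dual (3.3) subject to (3.4) for alpha (illustrative sketch)."""
    N = X.shape[0]
    Q = np.outer(y, y) * (X @ X.T)            # Q_ij = y_i y_j (x_i . x_j)

    def objective(alpha):                      # 1/2 alpha^T Q alpha - sum(alpha)
        return 0.5 * alpha @ Q @ alpha - alpha.sum()

    def gradient(alpha):
        return Q @ alpha - np.ones(N)

    constraints = {"type": "eq",
                   "fun": lambda a: a @ y,     # sum_i alpha_i y_i = 0
                   "jac": lambda a: y.astype(float)}
    bounds = [(0.0, C)] * N                    # 0 <= alpha_i <= C
    result = minimize(objective, np.zeros(N), jac=gradient,
                      bounds=bounds, constraints=constraints, method="SLSQP")
    return result.x                            # alpha*
```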


Solving for $w, b$

Once $\alpha^{*}$ is solved, we have: $$ w^{*} = \sum\limits_{i=1}^{N} \alpha_i^{*} y^{(i)} x^{(i)} \\ b^{*} = y^{(j)} - \sum\limits_{i=1}^{N} \alpha_i^{*} y^{(i)} (x^{(i)} \cdot x^{(j)}) \tag{3.5}$$ in which $(x^{(j)}, y^{(j)})$ is any sample satisfying $0 < \alpha_j^{*} < C$. For such a sample the KKT conditions give $\mu_j^{*} = C - \alpha_j^{*} > 0$, hence $\xi_j = 0$ and $1 - y^{(j)}(w^{*} \cdot x^{(j)} + b^{*}) = 0$, which means the point lies exactly on the margin boundary of the classification 'belt'.
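Continuing the sketch above, $w^{*}$ and $b^{*}$ can be recovered from $\alpha^{*}$ with (3.5); the tolerance used to detect $0 < \alpha_j^{*} < C$ is an illustrative choice.

```python
import numpy as np

def recover_primal(alpha, X, y, C, tol=1e-6):
    # First line of (3.5): w* = sum_i alpha_i* y_i x_i
    w = (alpha * y) @ X
    # Pick any index j with 0 < alpha_j* < C; that point lies exactly on the
    # margin boundary, so the second line of (3.5) yields b* directly.
    j = np.flatnonzero((alpha > tol) & (alpha < C - tol))[0]
    b = y[j] - np.sum(alpha * y * (X @ X[j]))
    return w, b
```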


Hinge loss function

Letting $\xi_i = [ 1 - y^{(i)} \left ( w \cdot x^{(i)} + {b} \right ) ]_+$, so that the constraints $\xi_i \ge 0$ and $y^{(i)} ( w \cdot x^{(i)} + b ) + \xi_i - 1 \ge 0$ are automatically satisfied, (1.10) can also be represented as

$$\min\limits_{w,b} \quad \sum\limits_{i=1}^N \xi_i + \lambda {\Vert w \Vert}^2 \tag{3.6}$$

With $\lambda = \frac{1}{2C}$, (3.6) is equivalent to (1.10): dividing the objective of (1.10) by the positive constant $C$ does not change the minimizer and yields $\sum\limits_{i=1}^N \xi_i + \frac{1}{2C} {\Vert w \Vert}^2$. The objective can therefore also be written in terms of the hinge loss function:

$$ \sum\limits_{i=1}^N[ 1 - y^{(i)} \left ( w \cdot x^{(i)} + {b} \right ) ]_+ + \lambda {\Vert w \Vert}^2 \tag{3.7}$$
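As a complement to the dual route, (3.7) can also be minimized directly. The sketch below runs plain full-batch subgradient descent on the hinge-loss objective; the learning rate, epoch count, and function name are illustrative assumptions, not anything prescribed above.

```python
import numpy as np

def hinge_loss_gd(X, y, lam, lr=0.01, epochs=200):
    """Minimize (3.7): sum_i [1 - y_i (w.x_i + b)]_+ + lam * ||w||^2."""
    N, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                    # samples with nonzero hinge loss
        # Subgradient of the hinge term plus gradient of the L2 penalty
        grad_w = -(y[active, None] * X[active]).sum(axis=0) + 2 * lam * w
        grad_b = -y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```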