Naive Bayes
Naive Bayes — such a classic, lively algorithm, and somehow I never got around to writing it up.
Principle
- Training set: $T = \lbrace (X^{(1)}, y_1), (X^{(2)}, y_2), \cdots, (X^{(N)}, y_N) \rbrace$, where $ X = (X_1, X_2, \cdots, X_n), y \in \lbrace c_1, c_2, \cdots, c_K \rbrace $
- Learning: under the Bayes model, the parameters to learn are $$ P(y = c_k) = \frac{\sum\limits^{N}_{i=1}I(y_i = c_k)}{N} \tag{1}$$ $$ \begin{align} P(X_j = X^{(i)}_j | y = c_k) & = \frac{P(X_j = X^{(i)}_j, y = c_k)}{P(y = c_k)} \\ & = \frac{\sum\limits^{N}_{m=1}I(X^{(m)}_j = X^{(i)}_j, y_m = c_k)} {\sum\limits^{N}_{m=1}I(y_m = c_k)} \end{align} \tag{2} $$ (note the sum runs over the training instances, indexed by $m$ to avoid clashing with the fixed instance index $i$). In addition, $$ P(X = X^{(i)} | y = c_k) = \prod\limits^{n}_{j=1} P(X_j = X^{(i)}_j | y = c_k) \tag{3}$$ where equation (3) uses the conditional independence assumption of naive Bayes.
- Prediction: for a given instance $X = A$, $$ \begin{align} P(y = c_k | X = A) &= \frac{P(X = A | y = c_k)P(y = c_k)}{P(X = A)} \\ & = \frac{P(X = A | y = c_k)P(y = c_k)}{\sum\limits_{k} P(X = A | y = c_k)P(y = c_k)} \end{align}$$ Since the denominator is the same for every $c_k$, the Bayes classifier outputs $$ y = \arg\max\limits_{c_k} P(X = A | y = c_k)P(y = c_k) \tag{4}$$
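Equations (1)-(4) can be sketched as a small counting-based classifier. This is a minimal illustration for categorical features; the `fit`/`predict` names and the toy data are my own, not from any particular library:

```python
from collections import Counter, defaultdict

def fit(X, y):
    """Estimate P(y = c_k) and P(X_j = v | y = c_k) by counting, eqs. (1)-(2)."""
    N = len(y)
    class_count = Counter(y)
    prior = {c: cnt / N for c, cnt in class_count.items()}          # eq. (1)
    cond = defaultdict(Counter)   # cond[(j, c)][v] = #instances with feature j == v and label c
    for x, c in zip(X, y):
        for j, v in enumerate(x):
            cond[(j, c)][v] += 1
    likelihood = {(j, c): {v: n / class_count[c] for v, n in ctr.items()}
                  for (j, c), ctr in cond.items()}                  # eq. (2)
    return prior, likelihood

def predict(x, prior, likelihood):
    """Classify x by arg max over c_k of P(X = x | y = c_k) P(y = c_k), eqs. (3)-(4)."""
    best, best_score = None, -1.0
    for c, p in prior.items():
        score = p
        for j, v in enumerate(x):
            # product of per-feature conditionals, eq. (3); unseen values count as 0
            score *= likelihood.get((j, c), {}).get(v, 0.0)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy data: two features, two classes
X = [(1, 'S'), (1, 'M'), (1, 'M'), (2, 'S'), (2, 'M'), (2, 'L')]
y = [-1, -1, 1, -1, 1, 1]
prior, likelihood = fit(X, y)
print(predict((2, 'M'), prior, likelihood))  # → 1
```

For the instance $(2, \text{M})$, class $1$ wins because $\tfrac12 \cdot \tfrac23 \cdot \tfrac23 > \tfrac12 \cdot \tfrac13 \cdot \tfrac13$.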
Bayesian Estimation
With few samples, it can happen that $P(X_j = X^{(i)}_j | y = c_k) = 0$ for some feature, which makes the whole product $P(X = X^{(i)} | y = c_k) = 0$ and distorts the posterior. Adding a constant to the counts, while keeping the result a valid probability distribution, fixes this:
$$ P_{\lambda}(X_j = X^{(i)}_j | y = c_k) = \frac{\sum\limits^{N}_{m=1}I(X^{(m)}_j = X^{(i)}_j, y_m = c_k) + \lambda}{\sum\limits^{N}_{m=1}I(y_m = c_k) + S_j\lambda} $$
$$ P(y = c_k) = \frac{\sum\limits^{N}_{i=1}I(y_i = c_k) + \lambda}{N + K\lambda} $$
where $S_j$ is the number of possible values of the $j$-th feature $X_j$. Typically $\lambda = 1$, which is known as Laplace smoothing.
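The two smoothed formulas transcribe directly into code. A quick sketch (toy data and function names are my own) showing that a feature value never seen in a class still gets nonzero probability:

```python
def smoothed_conditional(X, y, j, v, c, lam=1.0):
    """P_lambda(X_j = v | y = c): add lambda to the count and S_j * lambda to the denominator."""
    S_j = len({x[j] for x in X})                       # number of distinct values of feature j
    num = sum(1 for x, yi in zip(X, y) if x[j] == v and yi == c) + lam
    den = sum(1 for yi in y if yi == c) + S_j * lam
    return num / den

def smoothed_prior(y, c, K, lam=1.0):
    """P(y = c) with smoothing over K classes."""
    return (sum(1 for yi in y if yi == c) + lam) / (len(y) + K * lam)

X = [(1, 'S'), (1, 'M'), (2, 'S')]
y = [-1, -1, 1]
# Feature 2 never takes the value 'M' in class 1, so the raw estimate in eq. (2)
# would be 0; with lambda = 1 and S_j = 2 it becomes (0 + 1) / (1 + 2) = 1/3:
print(smoothed_conditional(X, y, 1, 'M', 1))
```

With $\lambda = 0$ both functions reduce to the maximum-likelihood estimates (1) and (2) above.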