
Andrew Ng ML Course Notes (5) - Programming Exercise 1

Programming Exercise 1: Gradient Descent for Linear Regression


Note: This is a simplified code example. If you are taking this class, don't copy and submit it, since it won't even work…

Step 1 - Load & Initialize Data

data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y);                 % number of training examples
X = [ones(m, 1), data(:, 1)]; % prepend a column of ones for the intercept term
theta = zeros(2, 1);          % initialize the fitting parameters

iterations = 1500;            % number of gradient descent steps
alpha = 0.01;                 % learning rate
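
As a quick sanity check (my addition, not part of the exercise template), the least-squares cost at the initial $\theta = \mathbf{0}$ can be computed directly; the exercise PDF lists the expected value as roughly 32.07 for ex1data1.txt:

% Unregularized least-squares cost J(theta) at the all-zeros starting point
J = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
fprintf('Initial cost: %f\n', J);   % approx. 32.07 on ex1data1.txt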

Step 2 - Feature Normalization

This is done before adding the column of ones to X, which represents $x_0$.

function [X_norm, mu, sigma] = featureNormalize(X)
    % Scale each feature (column of X) to zero mean and unit standard deviation.
    mu = mean(X);
    sigma = std(X);

    % Broadcasting subtracts mu from, and divides sigma into, every row
    X_norm = (X - mu) ./ sigma;

end
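
A minimal usage sketch (my addition): in the multi-variable part of the exercise the raw features are normalized first, and only then is the $x_0$ column of ones prepended, e.g.:

% Assuming ex1data2.txt holds the multi-feature data (two features, then the target)
data = load('ex1data2.txt');
X = data(:, 1:2); y = data(:, 3);
m = length(y);

[X_norm, mu, sigma] = featureNormalize(X);
X = [ones(m, 1), X_norm];   % add the x_0 column only after normalizing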

Step 3 - Gradient Descent

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)

    m = length(y);
    J_history = zeros(num_iters, 1);   % record the cost at every iteration

    for iter = 1:num_iters
        grad = X' * (X * theta - y);             % summed errors, one entry per parameter
        theta = theta - alpha * (1 / m) * grad;  % simultaneous update of all parameters
        J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
    end
end
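
Putting the pieces together (my sketch, not the official ex1.m driver script): run the solver, check that the recorded cost decreases, and use the learned $\theta$ for a prediction:

[theta, J_history] = gradientDescent(X, y, theta, alpha, iterations);

% J_history should fall monotonically when alpha is small enough
plot(1:iterations, J_history);
xlabel('Iteration'); ylabel('Cost J');

% Predict profit for a population of 35,000 (inputs are in units of 10,000)
predict1 = [1, 3.5] * theta;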

How the gradient computation works: first, $\mathbf{X}\mathbf{\theta} - \mathbf{y}$ stacks the per-example errors into a single column:

$$ \mathbf{X} \mathbf{\theta} - \mathbf{y} = \begin{bmatrix} (\mathbf{x}^{(1)})^T \\ (\mathbf{x}^{(2)})^T \\ \vdots \\ (\mathbf{x}^{(m)})^T \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \vdots \\ \theta_n \end{bmatrix} - \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix} = \begin{bmatrix} h_\theta(\mathbf{x}^{(1)}) - y^{(1)} \\ h_\theta(\mathbf{x}^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(\mathbf{x}^{(m)}) - y^{(m)} \end{bmatrix} $$

Multiplying by $\mathbf{X}^T$, whose columns are the training examples, then yields exactly the per-parameter sums from the update rule:

$$ \mathbf{X}^T (\mathbf{X} \mathbf{\theta} - \mathbf{y}) = \begin{bmatrix} \mathbf{x}^{(1)} & \mathbf{x}^{(2)} & \cdots & \mathbf{x}^{(m)} \end{bmatrix} \begin{bmatrix} h_\theta(\mathbf{x}^{(1)}) - y^{(1)} \\ h_\theta(\mathbf{x}^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(\mathbf{x}^{(m)}) - y^{(m)} \end{bmatrix} = \begin{bmatrix} \sum\limits_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)}) \, x_1^{(i)} \\ \sum\limits_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)}) \, x_2^{(i)} \\ \vdots \\ \sum\limits_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)}) \, x_n^{(i)} \end{bmatrix} $$
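
To see that the vectorized expression really equals those per-parameter sums, here is a tiny numerical check (my addition, with made-up numbers):

% 3 training examples, 2 parameters, arbitrary values
X = [1 2; 1 3; 1 4];
y = [2; 3; 5];
theta = [0.5; 0.5];

vectorized = X' * (X * theta - y);   % the one-liner used in gradientDescent

% The same quantity, written as explicit sums over examples i for each parameter j
looped = zeros(2, 1);
for j = 1:2
    for i = 1:3
        looped(j) = looped(j) + (X(i, :) * theta - y(i)) * X(i, j);
    end
end

disp(vectorized');   % -4  -14
disp(looped');       % -4  -14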