Programming Exercise 1: Gradient Descent for Linear Regression
Note: this is a simplified code example. If you are taking this class, don't copy and submit it, since it won't even work as-is…
data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y);                    % number of training examples (needed below)
X = [ones(m, 1), data(:, 1)];     % prepend a column of ones for x0
theta = zeros(2, 1);              % initialize fitting parameters
iterations = 1500;
alpha = 0.01;                     % learning rate
Feature normalization (the featureNormalize function below) is done before adding the column of ones to X, which represents $x_0$:
function [X_norm, mu, sigma] = featureNormalize(X)
  % Scale each feature (column of X) to zero mean and unit standard deviation.
  mu = mean(X);                  % 1 x n row vector of column means
  sigma = std(X);                % 1 x n row vector of column standard deviations
  X_norm = (X - mu) ./ sigma;    % broadcasting applies mu and sigma per column
end
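For example, a minimal sketch of calling it (made-up numbers, not the exercise data):

X = [2104 3; 1600 3; 2400 4];    % toy feature matrix: house size, bedrooms
[X_norm, mu, sigma] = featureNormalize(X);
mean(X_norm)    % each column now has mean ~0
std(X_norm)     % and standard deviation 1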
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);                      % number of training examples
  J_history = zeros(num_iters, 1);    % cost after each iteration
  for iter = 1:num_iters
    grad = X' * (X * theta - y);      % summed error per feature (n x 1)
    theta = theta - alpha * (1 / m) * grad;
    J_history(iter) = (1 / (2 * m)) * sum((X * theta - y) .^ 2);
  end
end
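Putting the pieces together, a minimal end-to-end sketch (the single-feature ex1data1.txt case; the normalization step is shown for completeness, using the featureNormalize above):

data = load('ex1data1.txt');
X = data(:, 1); y = data(:, 2);
m = length(y);

[X_norm, mu, sigma] = featureNormalize(X);   % normalize before adding x0
X = [ones(m, 1), X_norm];                    % prepend the column of ones

[theta, J_history] = gradientDescent(X, y, zeros(2, 1), 0.01, 1500);

% Predict for a new input, reusing the training-time mu and sigma.
x_new = (7.0 - mu) ./ sigma;
prediction = [1, x_new] * theta;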
How the vectorized gradient works:
$$
\mathbf{X}\boldsymbol{\theta} - \mathbf{y}
= \begin{bmatrix} (\mathbf{x}^{(1)})^T \\ (\mathbf{x}^{(2)})^T \\ \vdots \\ (\mathbf{x}^{(m)})^T \end{bmatrix}
\begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}
- \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(m)} \end{bmatrix}
= \begin{bmatrix} h_\theta(\mathbf{x}^{(1)}) - y^{(1)} \\ h_\theta(\mathbf{x}^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(\mathbf{x}^{(m)}) - y^{(m)} \end{bmatrix}
$$

Each row of $\mathbf{X}$ is one training example, so the $i$-th entry of $\mathbf{X}\boldsymbol{\theta}$ is $h_\theta(\mathbf{x}^{(i)}) = \boldsymbol{\theta}^T \mathbf{x}^{(i)}$.
$$
\mathbf{X}^T (\mathbf{X}\boldsymbol{\theta} - \mathbf{y})
= \begin{bmatrix} \mathbf{x}^{(1)} & \mathbf{x}^{(2)} & \cdots & \mathbf{x}^{(m)} \end{bmatrix}
\begin{bmatrix} h_\theta(\mathbf{x}^{(1)}) - y^{(1)} \\ h_\theta(\mathbf{x}^{(2)}) - y^{(2)} \\ \vdots \\ h_\theta(\mathbf{x}^{(m)}) - y^{(m)} \end{bmatrix}
= \begin{bmatrix} \sum\limits_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)})\, x_0^{(i)} \\ \sum\limits_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)})\, x_1^{(i)} \\ \vdots \\ \sum\limits_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)})\, x_n^{(i)} \end{bmatrix}
$$

The columns of $\mathbf{X}^T$ are the training examples $\mathbf{x}^{(i)}$, so the $j$-th entry of this product is exactly the sum $\sum_{i=1}^{m} (h_\theta(\mathbf{x}^{(i)}) - y^{(i)})\, x_j^{(i)}$ from the gradient descent update rule. That is why the single line X' * (X * theta - y) computes every partial derivative at once.
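To sanity-check this identity, here is a small sketch (toy values, not the exercise data) comparing the vectorized expression with the explicit per-feature sums:

% Toy check that X' * (X*theta - y) equals the per-feature sums above.
X = [1 2; 1 3; 1 4];           % m = 3 examples, columns x0 = 1 and x1
y = [2; 3; 5];
theta = [0.5; 0.5];

vectorized = X' * (X * theta - y);

explicit = zeros(size(theta));
for j = 1:length(theta)
  for i = 1:size(X, 1)
    explicit(j) = explicit(j) + (X(i, :) * theta - y(i)) * X(i, j);
  end
end

disp(max(abs(vectorized - explicit)))   % 0: the two agree exactly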