Gradient Descent Algorithm for Linear Regression

For this exercise I will use the alligator dataset from this blog post and compare the results to the coefficients that the lm() function computes there.

To understand the math underlying the gradient descent algorithm, see page 5 of these notes by Prof. Andrew Ng.
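In brief, each iteration moves every coefficient a small step against the gradient of the squared-error cost. In vectorized form (written in R’s matrix notation to match the code below), the simultaneous update is

theta := theta - (alpha / m) * t(X) %*% ((X %*% theta) - y)

where m is the number of training examples and alpha is the learning rate; this is exactly the update performed inside the loop.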

Here’s the code that automates the algorithm:

alligator = data.frame(
    lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
                 3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
    lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
                 3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

# design matrix: a column of 1s for the intercept, then the feature
X <- cbind(1, as.matrix(alligator$lnLength))
y <- as.matrix(alligator$lnWeight)

theta <- matrix(c(0, 0), nrow = 2)  # start both coefficients at 0
alpha <- 0.1                        # learning rate

# batch gradient descent: simultaneous update of all coefficients
for (i in 1:100000) {
    theta <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
}
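As a quick sanity check (my own addition, using the squared-error cost from the notes above), we can evaluate the cost once the loop finishes; if the algorithm has converged, further iterations will barely reduce it:

cost <- sum((X %*% theta - y)^2) / (2 * length(y))
cost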

Here’s the output of the gradient descent algorithm:

> theta
          [,1]
[1,] -8.476067
[2,]  3.431098

This matches the output of the lm() function.
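For reference, the equivalent lm() call on this dataset is shown below; its coefficients should agree with theta to the precision shown above:

fit <- lm(lnWeight ~ lnLength, data = alligator)
coef(fit)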

The same code works for multiple linear regression as well: it is written in a general, vectorized form, so the matrix X can hold any number of feature columns.

Let’s try it out on the iris dataset.

data(iris)

# design matrix: intercept column plus Sepal.Width, Petal.Length, Petal.Width
X <- cbind(1, as.matrix(iris[, 2:4]))
colnames(X)[1] <- "X0"

y <- as.matrix(iris$Sepal.Length)

theta <- matrix(0, nrow = ncol(X))  # one zero-initialized coefficient per column of X
alpha <- 0.01                       # a smaller learning rate for this dataset

for (i in 1:100000) {
    theta <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
}

The output from the above code:

> theta
                   [,1]
X0            1.8558851
Sepal.Width   0.6508652
Petal.Length  0.7091473
Petal.Width  -0.5565092

If we fit the same multiple linear regression with the lm() function, it looks like this:

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

The output from the lm() function is:

> fit

Call:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, 
    data = iris)

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width  
      1.8560        0.6508        0.7091       -0.5565  

Again, the outputs match, and my first machine learning algorithm seems to be working! :)
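As a closing sketch (my own packaging, not from the notes or the original post), the loop can be wrapped in a reusable function that stops early once the coefficients stop changing appreciably; the names gradient_descent and tol are my own choices:

# a minimal sketch: batch gradient descent as a reusable function
gradient_descent <- function(X, y, alpha = 0.01, max_iter = 100000, tol = 1e-10) {
    theta <- matrix(0, nrow = ncol(X))  # zero-initialized coefficients
    for (i in 1:max_iter) {
        theta_n <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
        # stop early once no coefficient moves more than tol in one step
        if (max(abs(theta_n - theta)) < tol) {
            theta <- theta_n
            break
        }
        theta <- theta_n
    }
    theta
}

theta <- gradient_descent(cbind(1, as.matrix(iris[, 2:4])),
                          as.matrix(iris$Sepal.Length))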

Sanket

 