Gradient Descent Algorithm for Linear Regression

For this exercise I will use the alligator dataset from this blog post and compare the results to the coefficients that the lm() function computes there.

To understand the math underlying the gradient descent algorithm, see page 5 of these notes by Prof. Andrew Ng.
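In brief, each iteration moves every coefficient a small step against the gradient of the squared-error cost. In vectorized form (written in R’s matrix notation to match the code below), the simultaneous update is

theta := theta - (alpha / m) * t(X) %*% ((X %*% theta) - y)

where m is the number of training examples and alpha is the learning rate; this is exactly the update performed inside the loop.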

Here’s the code that automates the algorithm:

alligator = data.frame(
    lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
                 3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
    lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
                 3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

# design matrix: a column of 1s for the intercept, then the feature
X <- cbind(1, as.matrix(alligator$lnLength))
y <- as.matrix(alligator$lnWeight)

theta <- matrix(c(0, 0), nrow = 2)  # start both coefficients at 0
alpha <- 0.1                        # learning rate

# batch gradient descent: simultaneous update of all coefficients
for (i in 1:100000) {
    theta <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
}
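As a quick sanity check (my own addition, using the squared-error cost from the notes above), we can evaluate the cost once the loop finishes; if the algorithm has converged, further iterations will barely reduce it:

cost <- sum((X %*% theta - y)^2) / (2 * length(y))
cost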

Here’s the output of the gradient descent algorithm:

> theta
          [,1]
[1,] -8.476067
[2,]  3.431098

This matches the output of the lm() function.
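For reference, the equivalent lm() call on this dataset is shown below; its coefficients should agree with theta to the precision shown above:

fit <- lm(lnWeight ~ lnLength, data = alligator)
coef(fit)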

The same code works for multiple linear regression as well: it is written in a general, vectorized form, so the matrix X can hold any number of feature columns.

Let’s try it out on the iris dataset.

data(iris)

# design matrix: intercept column plus Sepal.Width, Petal.Length, Petal.Width
X <- cbind(1, as.matrix(iris[, 2:4]))
colnames(X)[1] <- "X0"

y <- as.matrix(iris$Sepal.Length)

theta <- matrix(0, nrow = ncol(X))  # one zero-initialized coefficient per column of X
alpha <- 0.01                       # a smaller learning rate for this dataset

for (i in 1:100000) {
    theta <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
}

The output from the above code:

> theta
                   [,1]
X0            1.8558851
Sepal.Width   0.6508652
Petal.Length  0.7091473
Petal.Width  -0.5565092

If we fit the same multiple linear regression with the lm() function, it looks like this:

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

The output from the lm() function is:

> fit

Call:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, 
    data = iris)

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width  
      1.8560        0.6508        0.7091       -0.5565  

Again, the outputs match, and my first machine learning algorithm seems to be working! :)
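As a closing sketch (my own packaging, not from the notes or the original post), the loop can be wrapped in a reusable function that stops early once the coefficients stop changing appreciably; the names gradient_descent and tol are my own choices:

# a minimal sketch: batch gradient descent as a reusable function
gradient_descent <- function(X, y, alpha = 0.01, max_iter = 100000, tol = 1e-10) {
    theta <- matrix(0, nrow = ncol(X))  # zero-initialized coefficients
    for (i in 1:max_iter) {
        theta_n <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
        # stop early once no coefficient moves more than tol in one step
        if (max(abs(theta_n - theta)) < tol) {
            theta <- theta_n
            break
        }
        theta <- theta_n
    }
    theta
}

theta <- gradient_descent(cbind(1, as.matrix(iris[, 2:4])),
                          as.matrix(iris$Sepal.Length))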

Sanket

 