Gradient Descent Algorithm for Linear Regression

For this exercise I will use the data-set from this blogpost and compare the results with those of the lm() function, which is used there to calculate the linear-model coefficients.

To understand the underlying math for the gradient descent algorithm, see page 5 of these notes by Prof. Andrew Ng.
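In short, batch gradient descent repeatedly updates the coefficient vector θ according to the rule (where α is the learning rate and m the number of training examples):

```latex
\theta := \theta - \frac{\alpha}{m} X^{\top} (X\theta - y)
```

This is exactly the update computed inside the loop in the code below.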

Here’s the code that automates the algorithm:

# Alligator data: log-length and log-weight of 15 alligators
alligator <- data.frame(
    lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
                 3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
    lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
                 3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

# Design matrix X (with a leading column of 1s for the intercept)
# and response vector y
X <- as.matrix(alligator$lnLength)
X <- cbind(1, X)
y <- as.matrix(alligator$lnWeight)

# Initialise the coefficients to zero and set the learning rate
theta <- matrix(c(0, 0), nrow = 2)
alpha <- 0.1

# Batch gradient descent: each pass moves theta down the cost gradient
for (i in 1:100000) {
    theta <- theta - alpha * (t(X) %*% (X %*% theta - y)) / length(y)
}

Here’s the output of the gradient descent algorithm:

> theta
          [,1]
[1,] -8.476067
[2,]  3.431098

This matches with the output of the lm() function.
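The lm() call for this data-set is not shown above; a quick self-contained check (re-creating the data frame so the snippet runs on its own) might look like this:

```r
# Re-create the alligator data so this snippet is self-contained
alligator <- data.frame(
    lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
                 3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
    lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
                 3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)

# Ordinary least squares via lm(); the coefficients should agree with
# the gradient-descent output above (about -8.476 and 3.431)
fit <- lm(lnWeight ~ lnLength, data = alligator)
coef(fit)
```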

The same code also handles multiple linear regression: it is written in matrix form, so it works for any number of feature columns in the matrix X.

Let’s try it out on the iris data-set.

data(iris)

# Features: Sepal.Width, Petal.Length, Petal.Width (columns 2 to 4),
# plus a leading column of 1s for the intercept
X <- as.matrix(iris[, 2:4])
X <- cbind(1, X)
colnames(X)[1] <- "X0"

# Response: Sepal.Length
y <- as.matrix(iris$Sepal.Length)

# One coefficient per column of X, all initialised to zero
theta <- matrix(0, nrow = ncol(X))
alpha <- 0.01

for (i in 1:100000) {
    theta <- theta - alpha * (t(X) %*% (X %*% theta - y)) / length(y)
}
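Running a fixed 100000 iterations is simple but wasteful if the algorithm converges earlier. A sketch of the same loop with an early-stopping check (my own addition, assuming the usual squared-error cost) could look like this:

```r
data(iris)
X <- cbind(1, as.matrix(iris[, 2:4]))  # intercept column plus features
y <- as.matrix(iris$Sepal.Length)
m <- length(y)

theta <- matrix(0, nrow = ncol(X))
alpha <- 0.01

# Squared-error cost J(theta) = 1/(2m) * sum((X theta - y)^2)
cost <- function(theta) sum((X %*% theta - y)^2) / (2 * m)

prev <- cost(theta)
for (i in 1:100000) {
    theta <- theta - alpha * (t(X) %*% (X %*% theta - y)) / m
    cur <- cost(theta)
    if (abs(prev - cur) < 1e-12) break  # stop once the cost plateaus
    prev <- cur
}
```

The tolerance 1e-12 is an arbitrary choice here; a looser value stops sooner at the price of less precise coefficients.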

The output from the above code:

> theta
                   [,1]
X0            1.8558851
Sepal.Width   0.6508652
Petal.Length  0.7091473
Petal.Width  -0.5565092

Doing the same multiple linear regression with the lm() function looks like this:

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

The output from the lm() function is:

> fit

Call:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, 
    data = iris)

Coefficients:
 (Intercept)   Sepal.Width  Petal.Length   Petal.Width  
      1.8560        0.6508        0.7091       -0.5565  
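For this small problem there is also a closed-form answer: the normal equation θ = (XᵀX)⁻¹Xᵀy gives the least-squares coefficients directly, with no iteration. A sketch (self-contained, re-building X and y from iris):

```r
data(iris)
X <- cbind(1, as.matrix(iris[, 2:4]))  # intercept column plus features
y <- as.matrix(iris$Sepal.Length)

# Normal equation: solve (X'X) theta = X'y for theta.
# Using solve(A, b) avoids explicitly inverting X'X.
theta_exact <- solve(t(X) %*% X, t(X) %*% y)
```

Gradient descent scales better when X has very many columns, but the normal equation is a handy cross-check here.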

Again, the outputs match, and my first machine learning algorithm seems to be working! :)

Sanket

