Gradient Descent Algorithm for Linear Regression
For this exercise I will use the dataset from this blog post and compare my results to those of the lm() function, which is used there to calculate the linear-model coefficients.
To understand the math behind the gradient descent algorithm, see page 5 of these notes by Prof. Andrew Ng.
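In short, starting from an initial guess for the coefficient vector θ, each iteration applies the update θ := θ − (α/m) · Xᵀ(Xθ − y), where m is the number of training examples and α is the learning rate. This is exactly the line inside the loop below.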
Here’s the code that automates the algorithm:
# Log length and log weight measurements for 15 alligators, from the referenced blog post
alligator <- data.frame(
  lnLength = c(3.87, 3.61, 4.33, 3.43, 3.81, 3.83, 3.46, 3.76,
               3.50, 3.58, 4.19, 3.78, 3.71, 3.73, 3.78),
  lnWeight = c(4.87, 3.93, 6.46, 3.33, 4.38, 4.70, 3.50, 4.50,
               3.58, 3.64, 5.90, 4.43, 4.38, 4.42, 4.25)
)
# Design matrix: a column of ones for the intercept, then the predictor
X <- as.matrix(alligator$lnLength)
X <- cbind(1, X)
y <- as.matrix(alligator$lnWeight)

theta <- matrix(c(0, 0), nrow = 2)  # start both coefficients at zero
alpha <- 0.1                        # learning rate

# Batch gradient descent: step against the gradient of the squared-error cost
for (i in 1:100000) {
  theta_n <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
  theta <- theta_n
}
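The loop runs for a fixed 100,000 iterations rather than testing for convergence. One way to check that this is enough is to evaluate the squared-error cost from the notes after the run; here is a minimal sketch, assuming the X, y and theta objects defined above:

# Squared-error cost: J(theta) = (1 / (2 * m)) * sum((X %*% theta - y)^2)
cost <- function(X, y, theta) {
  m <- length(y)
  sum((X %*% theta - y)^2) / (2 * m)
}
cost(X, y, theta)  # should barely change if the loop is run longer

If the cost is still dropping noticeably with extra iterations, the run has not converged and alpha or the iteration count needs adjusting.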
Here’s the output of the gradient descent algorithm:
> theta
[,1]
[1,] -8.476067
[2,] 3.431098
This matches the output of the lm() function.
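For reference, the corresponding lm() call on the same data is a one-liner (fit_alligator is just an illustrative name):

fit_alligator <- lm(lnWeight ~ lnLength, data = alligator)

Its intercept and slope should agree with the theta above to several decimal places.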
Because the update is written in matrix form, the same code handles multiple linear regression: the matrix X simply gains one column per feature. Let's try it out on the iris dataset.
data(iris)

# Design matrix: intercept column plus Sepal.Width, Petal.Length and Petal.Width
X <- as.matrix(iris[, 2:4])
X <- cbind(1, X)
colnames(X)[1] <- "X0"
y <- as.matrix(iris$Sepal.Length)

theta <- matrix(0, nrow = ncol(X))  # one coefficient per column of X
alpha <- 0.01

for (i in 1:100000) {
  theta_n <- theta - alpha * (t(X) %*% ((X %*% theta) - y)) / length(y)
  theta <- theta_n
}
The output from the above code:
> theta
[,1]
X0 1.8558851
Sepal.Width 0.6508652
Petal.Length 0.7091473
Petal.Width -0.5565092
For comparison, here is the same multiple linear regression using the lm() function:
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)
The output from the lm() function is:
> fit
Call:
lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
data = iris)
Coefficients:
(Intercept) Sepal.Width Petal.Length Petal.Width
1.8560 0.6508 0.7091 -0.5565
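As a quick programmatic check (assuming theta from the gradient descent run above is still in the workspace), we can look at the largest absolute difference between the two sets of estimates:

max(abs(coef(fit) - theta))  # should be very close to zero

coef(fit) and theta list the coefficients in the same order, so the element-wise comparison lines up.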
Again, the outputs match, and my first machine learning algorithm seems to be working! :)
Sanket