Normalization of data  

In this post, I will cover two of the most popular techniques for normalizing data. Normalization is an important pre-processing step that improves the performance of many machine learning algorithms, particularly those that are sensitive to the scale of the features.

For this purpose we will use the popular ‘Iris’ dataset. It ships with R’s ‘datasets’ package and can be loaded with the command data(iris). Next, we copy it into another data frame called data, which will hold the normalized values as we move through the code.

Z-score normalization:
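In this method, each feature is rescaled so that it has a mean of zero and a standard deviation of one, the same location and scale as a standard normal distribution. Note that this only shifts and rescales the values; it does not change the shape of the distribution. The resulting values are commonly known as z-scores and are computed as follows, where μ is the mean of the feature and σ is its standard deviation:

$$z = \frac{x - \mu}{\sigma}$$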
Here’s the R code:

data(iris)
data <- iris
# The four numeric feature columns (the fifth column, Species, is a factor)
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
for (col in features) {
    # Subtract the column mean and divide by the sample standard deviation
    data[[col]] <- (iris[[col]] - mean(iris[[col]])) / sd(iris[[col]])
}
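
As a cross-check, base R’s scale() function does the same centering and scaling in one call. The snippet below is only a sketch: the names features, scaled, col_means and col_sds are my own and not part of the code above. It also confirms that every normalized feature ends up with a mean of (numerically) zero and a standard deviation of one.

```r
data(iris)
# Illustrative name: the four numeric feature columns of iris
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# scale() centers each column on its mean and divides by its sample sd
scaled <- as.data.frame(scale(iris[, features]))
scaled$Species <- iris$Species

# After z-score normalization each feature should have mean ~0 and sd 1
col_means <- sapply(scaled[, features], mean)
col_sds <- sapply(scaled[, features], sd)
print(round(col_means, 10))
print(col_sds)
```

Because sd() and scale() both use the sample standard deviation (dividing by n - 1), the two approaches give the same z-scores.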

Min-Max Scaling:
In its simplest form, this method scales each feature to the range between 0 and 1 using the following equation, where the minimum and maximum are taken over all values of that feature:

$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
Here’s the R code:

data(iris)
data <- iris
# The four numeric feature columns (the fifth column, Species, is a factor)
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
for (col in features) {
    # Shift by the column minimum, then divide by the column range
    data[[col]] <- (iris[[col]] - min(iris[[col]])) / (max(iris[[col]]) - min(iris[[col]]))
}
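
The same idea can be wrapped in a small reusable function. This is a minimal sketch rather than a definitive implementation: min_max is a hypothetical helper of my own, and its new_min and new_max arguments are an assumed extension that lets you rescale to a range other than 0 to 1. It stops with an error on a constant column, where the range is zero and the formula would divide by zero.

```r
# Hypothetical helper: rescale a numeric vector to [new_min, new_max]
min_max <- function(x, new_min = 0, new_max = 1) {
  rng <- range(x, na.rm = TRUE)  # rng[1] is the minimum, rng[2] the maximum
  if (rng[1] == rng[2]) {
    stop("cannot rescale a constant vector: max equals min")
  }
  new_min + (x - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min)
}

data(iris)
# Illustrative name: the four numeric feature columns of iris
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

data <- iris
# Apply the helper to every feature column, leaving Species untouched
data[features] <- lapply(iris[features], min_max)

# Each feature now runs from 0 to 1
print(sapply(data[features], range))
```

Calling min_max(x, -1, 1) instead would map the smallest value to -1 and the largest to 1.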

That’s it on normalization of data for now. We will see it applied in upcoming posts.

Sanket

 
