Normalization of data  

In this post, I will cover two of the most popular techniques for normalizing data. Normalization is an important pre-processing step that can improve the performance of many machine learning algorithms, particularly those that are sensitive to the scale of the features, such as distance-based methods.

For this purpose we will use the popular ‘Iris’ dataset, loaded from the ‘datasets’ package with the command data(iris). We then copy it into a second data frame called data, which will hold the normalized values as we move through the code.
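
Before normalizing, it can help to confirm which columns are actually numeric, since the Species column is a factor and should be left alone. A minimal sketch of that check; the variable name numeric_cols is my own choice for illustration and is not used elsewhere in the post:

```r
# Load the built-in iris data and make a working copy.
data(iris)
data <- iris

# Pick out the numeric columns; Species is a factor and is skipped.
numeric_cols <- names(data)[sapply(data, is.numeric)]
print(numeric_cols)

# Show the structure of the data frame (column types and first values).
str(data)
```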

Z-score normalization:
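In this method, each feature is rescaled to have a mean of zero and a standard deviation of one. These are the mean and variance of a standard normal distribution, although rescaling does not change the shape of the feature's distribution. The resulting values are commonly known as z-scores and are computed as follows:
$$ z = \frac{x - \mu}{\sigma} $$
where \mu is the mean of the feature and \sigma is its standard deviation (computed in the code with sd(), which uses the n - 1 denominator).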
Here’s the R code:

data(iris)
data <- iris
# Standardize each numeric column: subtract its mean, divide by its standard deviation.
# The arithmetic is vectorized, so no explicit loop over rows is needed.
for (col in c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")) {
    data[[col]] <- (iris[[col]] - mean(iris[[col]])) / sd(iris[[col]])
}
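
As a cross-check on the code above, base R's scale() function performs the same centering and scaling in one call. The sketch below is only an alternative route to the same numbers; the names num_cols and scaled are mine, chosen for illustration:

```r
data(iris)
num_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# scale() centers each column on its mean and divides by its sample
# standard deviation, i.e. the same quantity that sd() returns.
scaled <- iris
scaled[num_cols] <- as.data.frame(scale(iris[num_cols]))

# Sanity check: each column should now have mean ~0 and sd 1.
print(round(colMeans(scaled[num_cols]), 10))
print(sapply(scaled[num_cols], sd))
```

If the printed means are zero (up to rounding) and the standard deviations are one, the normalization has worked as intended.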

Min-max scaling:
In its simplest form, this method rescales each feature to the range 0 to 1 using the following equation:
$$ x' = \frac{x - \min(x)}{\max(x) - \min(x)} $$
where \min(x) and \max(x) are the smallest and largest values of the feature.
Here’s the R code:

data(iris)
data <- iris
# Rescale each numeric column to [0, 1]: subtract its minimum, divide by its range.
for (col in c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")) {
    col_min <- min(iris[[col]])
    col_max <- max(iris[[col]])
    data[[col]] <- (iris[[col]] - col_min) / (col_max - col_min)
}
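
The same idea can be wrapped in a small reusable function. The sketch below is an assumption on my part rather than part of the original code: min_max, new_min and new_max are hypothetical names, and the optional target range simply generalizes the 0 to 1 case described above.

```r
# Hypothetical helper (not from the original code): rescale a numeric
# vector to [new_min, new_max]; the defaults give the usual [0, 1] range.
min_max <- function(x, new_min = 0, new_max = 1) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # A constant column would cause a division by zero.
    stop("cannot rescale a constant vector: max equals min")
  }
  new_min + (x - rng[1]) / (rng[2] - rng[1]) * (new_max - new_min)
}

data(iris)
num_cols <- names(iris)[sapply(iris, is.numeric)]

# Apply the helper to every numeric column; Species is left untouched.
scaled <- iris
scaled[num_cols] <- lapply(iris[num_cols], min_max)

# Each column's minimum and maximum after scaling.
print(sapply(scaled[num_cols], range))
```

The guard against a constant column is a design choice: stopping with an error seemed safer to me than silently returning NaN values.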

That’s it on normalization of data for now; we will see it applied in upcoming posts.

Sanket

 
