Data Normalization and Standardization
Learn the basics of data normalization and standardization.
What is normalization?
Normalization can be defined as the process of transforming data so that it is roughly normally distributed. This is important because many statistical analysis techniques assume a normal distribution. If our data isn't normally distributed, we can apply different transformations, chosen based on how the data is skewed, to make it approximately normal, as we'll discuss below.
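As a minimal sketch of this idea, the snippet below applies a log transform to right-skewed data. The sample data and the choice of np.log1p are illustrative assumptions, not part of the lesson's own examples; other transformations (such as square root or power transforms) may suit different kinds of skew.

```python
import numpy as np
from scipy.stats import skew

# Hypothetical right-skewed sample (e.g., incomes); illustrative only.
rng = np.random.default_rng(42)
data = rng.lognormal(mean=3.0, sigma=0.8, size=1_000)

print(f"skewness before: {skew(data):.2f}")   # strongly positive

# A log transform compresses the long right tail,
# pulling the distribution closer to normal.
transformed = np.log1p(data)

print(f"skewness after:  {skew(transformed):.2f}")  # near zero
```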
Normalization is also used to describe the process of transforming data into the range 0–1 so that no feature's range dominates certain calculations. To distinguish between these two forms of normalization, we'll call the process of converting numerical data to the 0–1 range standardization or scaling (see below for how it's done). Standardization is important because features with larger ranges can dominate the results of statistical analysis, as we'll see in the labs and examples in the upcoming chapters.
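As a sketch of this second sense of the term, min-max scaling maps each feature to the 0–1 range. The two-feature array below is a made-up example chosen to show how one feature's large range would otherwise dominate.

```python
import numpy as np

# Hypothetical two-feature dataset: column 0 spans thousands,
# column 1 spans single digits, so column 0 would dominate
# distance-based calculations without scaling.
X = np.array([
    [1000.0, 2.0],
    [1500.0, 3.0],
    [2000.0, 5.0],
    [3000.0, 4.0],
])

# Min-max scaling: (x - min) / (max - min), applied per column.
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(X_scaled)  # every column now lies in [0, 1]
```

Libraries such as scikit-learn provide the same operation through sklearn.preprocessing.MinMaxScaler, which is convenient when scaling needs to be fit on training data and reapplied to new data.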