Data Science Life Cycle

The data science life cycle consists of problem understanding, data acquisition, data wrangling, data exploration, feature engineering, modeling, deployment, and monitoring. This lesson walks through each step.

Data Science life cycle

The Data Science life cycle involves the following steps:

  1. Problem Understanding: It all starts with understanding the problem at hand: the questions we are asking and the answers we hope to find in the dataset.

  2. Data Acquisition: As the name suggests, this step is about retrieving the data, with the help of data engineers where required. It also consolidates all of the data needed to answer the question or solve the problem at hand.

  3. Data Wrangling: Data wrangling is about using domain knowledge to preprocess the data. It involves looking for missing values and asking business questions, such as why they are missing. It also shapes the dataset so that it is suitable for visualization and for the later steps in the life cycle.

  4. Data Exploration: Data exploration uses visualizations and summary statistics to check whether the questions we asked at the beginning are being answered. The data analyst’s job ends here.

  5. Feature Engineering and Selection: This is a preprocessing step that comes before modeling in both Machine Learning and Deep Learning. We will look into these fields in the coming sections. Its steps are similar to those of data wrangling, with the addition of some algorithms for feature selection and transformation.

  6. Modeling: Modeling is the process that uncovers the meaning in the data. It captures the underlying trends and the data’s behavior in a model, which can then be used for predictive analytics, as described in the previous section.

  7. Deployment: After we build the model, we deploy it in an efficient and optimized manner so that real users can use it. It can be deployed in mobile applications and web applications, for example.

  8. Monitoring: After we have deployed the model, we will want to monitor it. Monitoring involves tracking how the model behaves on new data and the number of requests it receives. It also involves making changes to the analysis and starting over if required.
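
To make the steps above more concrete, here is a minimal sketch of several of them in plain Python. Everything in it is an assumption made for illustration: the toy dataset (`RAW_ROWS`), the function names, the choice of median imputation, the straight-line model, and the error threshold. Data acquisition and feature engineering are skipped because the toy data is hard-coded and has a single numeric feature. A real project would typically use dedicated libraries and far more data.

```python
from statistics import mean, median

# Hypothetical toy dataset: (hours_studied, exam_score); None marks a missing value.
RAW_ROWS = [(1.0, 52.0), (2.0, 54.0), (None, 56.0), (4.0, 58.0), (5.0, 60.0)]


def wrangle(rows):
    """Data wrangling: fill missing feature values with the median of the known ones."""
    known = [x for x, _ in rows if x is not None]
    fill = median(known)
    return [(fill if x is None else x, y) for x, y in rows]


def explore(rows):
    """Data exploration: basic summary statistics for the feature and the target."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    return {"n": len(rows), "mean_x": mean(xs), "mean_y": mean(ys)}


def fit_line(rows):
    """Modeling: ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


class MonitoredModel:
    """Deployment and monitoring: wrap the model, count requests, track error."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept
        self.requests = 0
        self.abs_errors = []

    def predict(self, x):
        # Every prediction counts as one request served by the deployed model.
        self.requests += 1
        return self.slope * x + self.intercept

    def record_outcome(self, x, actual):
        # Once the true value is known, store the absolute prediction error.
        self.abs_errors.append(abs(self.slope * x + self.intercept - actual))

    def mean_abs_error(self):
        return mean(self.abs_errors) if self.abs_errors else 0.0

    def needs_retraining(self, threshold):
        # A rising error on new data is a signal to revisit earlier steps.
        return self.mean_abs_error() > threshold


clean = wrangle(RAW_ROWS)
summary = explore(clean)
slope, intercept = fit_line(clean)
model = MonitoredModel(slope, intercept)
```

In this sketch, calling `model.predict(6.0)` serves one request, `model.record_outcome(6.0, 70.0)` logs the error once the real score is known, and `model.needs_retraining(5.0)` reports whether the average error has crossed the assumed threshold, which corresponds to the "starting over if required" part of monitoring.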