PROJECT
Create Your First Data Pipeline with a Dashboard
We’ll teach you how to create a data pipeline and an interactive data visualization in Python. We’ll begin by building a custom data pipeline with Kedro and then use hvPlot to display the findings as interactive graphs.
You will learn to:
Create the data preprocessing and data transformation pipelines
Apply multiple levels of transformation to data
Visualize data to draw conclusions
Add interactivity to visualizations
Skills
Data Science
Data Visualization
Data Manipulation
Data Pipeline Engineering
Prerequisites
Basic programming in Python
Basic knowledge of Pandas
Basic knowledge of data pipelines
Basic knowledge of plotting in Python
Technologies
Kedro
Python
hvPlot
Project Description
According to a 2022 Statista analysis, the volume of data generated, recorded, replicated, and consumed globally is projected to skyrocket between 2021 and 2025, reaching 181 zettabytes.
Modern organizations are awash in data, and turning that data into something useful requires processing and analysis. A data pipeline is the backbone of any reliable data workflow: it takes raw inputs, applies structured transformations, and produces clean outputs that can actually be used. In this project, we'll build a data pipeline in Python from scratch using Kedro, an open-source framework designed for creating modular, reproducible, and production-ready data pipelines. Rather than writing one-off scripts, we'll structure the work into reusable nodes and datasets, the way professional data engineering workflows are organized.
We'll begin with data ingestion, i.e., loading raw data into the pipeline and configuring Kedro's DataCatalog to manage inputs and outputs cleanly. From there, we'll implement data preprocessing and transformation stages as discrete pipeline nodes, learning how Kedro resolves dependencies between steps automatically and makes each stage independently rerunnable. This is what separates a real data pipeline from a notebook full of sequential cells.
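Kedro wires nodes together by matching dataset names: one node's declared output becomes another node's input, and the run order is derived from those links (in Kedro itself, nodes are declared with `kedro.pipeline.node(func, inputs=..., outputs=...)` and grouped into a `Pipeline`). The sketch below is a toy, standard-library-only illustration of that idea, not Kedro's implementation; `ToyNode`, `run_pipeline`, the node functions, and the dataset names are all hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable


class ToyNode:
    """Hypothetical stand-in for a pipeline node: a function plus named inputs/output."""

    def __init__(self, func: Callable[..., Any], inputs: list[str], output: str) -> None:
        self.func = func
        self.inputs = inputs
        self.output = output


def run_pipeline(nodes: list[ToyNode], catalog: dict[str, Any]) -> dict[str, Any]:
    """Run nodes in dependency order, reading inputs from and writing outputs to `catalog`."""
    producers = {node.output: node for node in nodes}
    # A node depends on whichever nodes produce its inputs; inputs that no node
    # produces (the raw data) must already be present in the catalog.
    graph = {
        node.output: {name for name in node.inputs if name in producers}
        for node in nodes
    }
    for output in TopologicalSorter(graph).static_order():
        node = producers[output]
        catalog[output] = node.func(*(catalog[name] for name in node.inputs))
    return catalog


# Hypothetical node functions: each is a small, independently testable step.
def drop_missing(rows: list[dict]) -> list[dict]:
    return [row for row in rows if row["temp"] is not None]


def to_float(rows: list[dict]) -> list[float]:
    return [float(row["temp"]) for row in rows]


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


# Nodes are deliberately listed out of order; the topological sort fixes the run order.
nodes = [
    ToyNode(mean, ["temps"], "mean_temp"),
    ToyNode(to_float, ["clean_rows"], "temps"),
    ToyNode(drop_missing, ["raw_weather"], "clean_rows"),
]
raw = [{"temp": "21.5"}, {"temp": None}, {"temp": "18.0"}]
result = run_pipeline(nodes, {"raw_weather": raw})
print(result["mean_temp"])  # 19.75
```

Because each step only declares the names it reads and writes, any single step can be rerun on its own by loading its inputs from the catalog, which is the property described above.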
Once the pipeline runs end-to-end, we'll shift to visualization. Using hvPlot, a high-level Python plotting library built on HoloViews, we'll build an interactive data visualization dashboard with dynamic charts, filters, zoom, and hover tooltips, capabilities that static Matplotlib figures don't provide. This is where raw pipeline outputs become interpretable: we'll explore distributions, compare categories, and surface patterns through interactive views rather than static plots.
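As a hedged sketch of the kind of hvPlot call this involves: importing `hvplot.pandas` adds a `.hvplot` accessor to pandas DataFrames, and passing `groupby=` to a plot method such as `.hvplot.kde()` adds a widget that filters the plot by that column's values. The DataFrame, its column names (`temperature`, `class`), and its values are hypothetical placeholders for pipeline output; the import is guarded so the pandas part still runs where hvPlot isn't installed.

```python
import pandas as pd

try:
    import hvplot.pandas  # noqa: F401  (registers the DataFrame.hvplot accessor)
    HAS_HVPLOT = True
except ImportError:
    HAS_HVPLOT = False

# Hypothetical pipeline output: hourly temperatures labelled with a class.
df = pd.DataFrame({
    "temperature": [12.1, 11.4, 10.9, 18.3, 17.8, 17.2],
    "class": ["cold", "cold", "cold", "warm", "warm", "warm"],
})

# Conceptually, the groupby widget shows one class's rows at a time.
per_class = {name: group["temperature"] for name, group in df.groupby("class")}
print({name: len(values) for name, values in per_class.items()})  # {'cold': 3, 'warm': 3}

if HAS_HVPLOT:
    # One KDE of temperature, with a dropdown widget to switch between classes.
    # Passing by="class" instead would overlay one curve per class.
    kde_plot = df.hvplot.kde(y="temperature", groupby="class")
```

`groupby` suits a dynamically filtered view, while `by` is the better choice when the goal is comparing classes on the same axes.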
By the end, we'll have a complete, working example of a Python data pipeline paired with an interactive dashboard: a practical foundation in both data pipeline design and data visualization that reflects how analysts and data engineers approach these problems on real teams.
Project Tasks
1
Start the Data Pipeline
Task 1: Load the Raw Data
Task 2: Create the First Node
Task 3: Create a Data Preprocessing Node
Task 4: Use the Data Catalog
Task 5: Design the Data Pipeline
Task 6: Run the Data Pipeline
2
Set Up Interactive Plotting
Task 7: Create Static Plots With Pandas
Task 8: Create Dynamic Plots With hvPlot
Task 9: Create the Dynamically Filtered KDEs Using hvPlot
3
Perform Advanced Data Manipulations in the Pipeline
Task 10: Create a Node for Data Transformation
Task 11: Modify the Data Catalog
Task 12: Run the Data Pipeline With the Newly Created Node
4
Create Enhanced Interactive Plots With hvPlot
Task 13: Load the Transformed Data
Task 14: Plot a KDE: Hourly Temperatures for Individual Classes
Task 15: Plot a KDE: Hourly Wind Speeds for Individual Classes
Congratulations!
Atabek BEKENOV
Senior Software Engineer
Pradip Pariyar
Senior Software Engineer
Renzo Scriber
Senior Software Engineer
Vasiliki Nikolaidi
Senior Software Engineer
Juan Carlos Valerio Arrieta
Senior Software Engineer
Relevant Courses
Use the following content to review prerequisites or explore specific concepts in detail.