
Build the Data Processing Pipeline

Create a new Mix project to demonstrate how GenStage can be used to build data processing pipelines.

Complex use cases may require a data processing pipeline with one or more producers, several producer-consumers in between, and a consumer stage at the end. The main principles stay the same regardless of length, so we'll start with a two-stage pipeline and demonstrate how it works.

We will build a fake service that scrapes data from web pages—normally an intensive task dependent on system resources and a reliable network connection. Our goal is to request a number of URLs to be scraped and have the data pipeline take care of the workload.
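To make the two-stage idea concrete, here is a minimal sketch of a producer and consumer pair built on GenStage's documented callbacks (`init/1`, `handle_demand/3`, `handle_events/3`). The module names `PageProducer` and `PageConsumer` are placeholders for illustration, not necessarily the ones this project will use, and the producer emits no events yet:

defmodule PageProducer do
  use GenStage

  def start_link(_args) do
    GenStage.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Returning {:producer, state} marks this stage as the start of the pipeline.
  def init(:ok), do: {:producer, :no_state}

  # Producers emit events only in response to demand from downstream stages.
  # For now we log the demand and emit nothing; scraping jobs come later.
  def handle_demand(demand, state) do
    IO.puts("Received demand for #{demand} pages")
    {:noreply, [], state}
  end
end

defmodule PageConsumer do
  use GenStage

  def start_link(_args) do
    GenStage.start_link(__MODULE__, :ok)
  end

  # Subscribing to PageProducer wires the two stages together.
  def init(:ok), do: {:consumer, :no_state, subscribe_to: [PageProducer]}

  # Consumers receive events in batches sized by the demand they issued.
  def handle_events(events, _from, state) do
    Enum.each(events, &IO.inspect/1)
    {:noreply, [], state}
  end
end

Starting both stages (for example, under the application's supervisor) would cause the consumer to subscribe and send demand, which the producer logs.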

Create our mix project

First, we'll create a new application with a supervision tree, as we've done before. We will name it scraper, since we're pretending to scrape data from web pages:

mix new scraper --sup

The application has already been created for you, so there's no need to run the command above. It generates a project named scraper. We have added gen_stage as a dependency in mix.exs:

#file path -> scraper/mix.exs
defp deps do
  [
    {:gen_stage, "~> 1.0"}
  ]
end

Then, we run the mix do deps.get command to download and compile all ...