Exercise: Stream Data Processing
Practice implementing a command-line script that analyzes a CSV data set using streams.
Problem statement
On Kaggle, you can find a lot of interesting data sets, such as the London Crime Data. You can download the data in
Did the number of crimes go up or down over the years?
What are the most dangerous areas of London?
What’s the most common crime per area?
What’s the least common crime?
Coding challenge
Write your solution code in the following code widget. We’ve already added the package.json and london_crime_by_lsoa.csv files for your convenience.
Note: For convenience, the provided london_crime_by_lsoa.csv file contains only the first 500 records of the data set.
// Write your code here
Solution
Here’s the solution to the above problem. You can run it with the following command.
node index.js london_crime_by_lsoa.csv
import { Analyzer } from './analyzer.js'

export class LeastCommon extends Analyzer {
  _transform(chunk, encoding, callback) {
    const currValue = +chunk.value
    if (currValue && !isNaN(currValue)) {
      // Reuse the borough's existing Map of category totals, or start a new one
      const currArea = this.map.get(chunk.borough) || new Map()
      const totalValue = +currArea.get(chunk.major_category) || 0
      currArea.set(chunk.major_category, currValue + totalValue)
      this.map.set(chunk.borough, currArea)
    }
    callback()
  }

  _flush(callback) {
    this.result = ""
    for (const area of this.map.keys()) {
      this.result += `===> Area: ${area}\n`
      const crimes = this.map.get(area)
      // Sort the crime categories by total, least common first
      const sortedCrimes = Array.from(crimes.entries())
        .sort((a, b) => a[1] - b[1])
      this.result += sortedCrimes.join(" | ")
      this.result += "\n\n"
    }
    callback()
  }
}
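The solution imports Analyzer from ./analyzer.js, but that file is not shown here. The sketch below is one guess at what such a base class could look like: a Transform stream in object mode that owns the map and result properties the subclass relies on. The Analyzer body, the CrimesPerYear class, and the year field are all assumptions made for illustration, not the lesson's actual code. It uses require rather than import only so that it runs as a standalone script.

```javascript
const { Transform } = require('node:stream')

// Hypothetical stand-in for ./analyzer.js, which the lesson does not show.
class Analyzer extends Transform {
  constructor (options = {}) {
    // Object mode: each chunk is one parsed CSV record, not a Buffer.
    super({ ...options, objectMode: true })
    this.map = new Map() // running totals, filled in by subclasses
    this.result = ''     // text report, built in _flush()
  }
}

// Hypothetical analyzer for the "up or down over the years" question.
// Assumes each record has `year` and `value` fields.
class CrimesPerYear extends Analyzer {
  _transform (record, encoding, callback) {
    const value = Number(record.value)
    if (!Number.isNaN(value)) {
      this.map.set(record.year, (this.map.get(record.year) || 0) + value)
    }
    callback()
  }

  _flush (callback) {
    // Sort by year so the trend reads top to bottom.
    this.result = Array.from(this.map.entries())
      .sort((a, b) => Number(a[0]) - Number(b[0]))
      .map(([year, total]) => `${year}: ${total}`)
      .join('\n')
    callback()
  }
}

// Drive the stream by hand with made-up records.
const perYear = new CrimesPerYear()
perYear.on('finish', () => console.log(perYear.result))
perYear.write({ year: '2016', value: '3' })
perYear.write({ year: '2015', value: '1' })
perYear.write({ year: '2016', value: '2' })
perYear.end()
```

Keeping the shared state in the base class means each question in the problem statement can be answered by a small subclass that overrides only _transform and _flush.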
Explanation
In this code, we implement stream data processing using pipeline and ...
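As a rough illustration of the pipeline step mentioned above, the following sketch connects a source of records to an analyzer and reads the report once the pipeline completes. It is not the lesson's index.js: the RecordsPerBorough class and the analyze function are hypothetical, and Readable.from stands in for reading and parsing the CSV file passed on the command line. It uses require rather than import only so that it runs as a standalone script.

```javascript
const { Readable, Transform } = require('node:stream')
const { pipeline } = require('node:stream/promises')

// Hypothetical analyzer: counts how many records each borough has.
class RecordsPerBorough extends Transform {
  constructor () {
    super({ objectMode: true })
    this.map = new Map()
    this.result = ''
  }

  _transform (record, encoding, callback) {
    this.map.set(record.borough, (this.map.get(record.borough) || 0) + 1)
    callback()
  }

  _flush (callback) {
    this.result = Array.from(this.map, ([borough, count]) => `${borough}: ${count}`)
      .join('\n')
    callback()
  }
}

// Stand-in source: in the exercise, the records would come from the CSV
// file after it has been read and parsed into objects.
async function analyze (records) {
  const analyzer = new RecordsPerBorough()
  // pipeline() connects the streams and rejects if any of them fails.
  await pipeline(Readable.from(records), analyzer)
  return analyzer.result
}

// Made-up records, used only to show the call.
analyze([
  { borough: 'Camden' },
  { borough: 'Hackney' },
  { borough: 'Camden' }
]).then((report) => console.log(report))
```

Compared with chaining pipe() calls by hand, pipeline reports an error from any stage in one place, which is why it suits a script that processes a file from start to finish.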