Exercise: Stream Data Processing

Practice implementing a command-line script that processes a CSV data set as a stream and analyzes its contents.

Problem statement

On Kaggle, you can find many interesting data sets, such as the London Crime Data. Download the data in CSV (comma-separated values) format and build a stream processing script that analyzes it and tries to answer the following questions:

  • Did the number of crimes go up or down over the years?

  • What are the most dangerous areas of London?

  • What’s the most common crime per area?

  • What’s the least common crime?
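
As one possible starting point for the first question, the sketch below sums the crime counts per year in an object-mode Transform stream. It assumes each chunk is an already parsed CSV record with year and value fields. The CrimesPerYear class, its summary method, and the sample records are hypothetical and only illustrate the idea. The script uses require so that it can run as a standalone file.

```javascript
const { Readable, Transform } = require('node:stream')
const { pipeline } = require('node:stream/promises')

// Hypothetical analyzer: sums the `value` column per `year`.
class CrimesPerYear extends Transform {
  constructor (options = {}) {
    // Object mode: each chunk is one parsed record, not a Buffer.
    super({ ...options, objectMode: true })
    this.totals = new Map()
  }

  _transform (record, encoding, callback) {
    const count = Number(record.value)
    if (Number.isFinite(count)) {
      const previous = this.totals.get(record.year) || 0
      this.totals.set(record.year, previous + count)
    }
    callback() // nothing is emitted per record; the report comes from _flush
  }

  // Builds one "year: total" line per year, ordered by year.
  summary () {
    return [...this.totals.entries()]
      .sort(([a], [b]) => String(a).localeCompare(String(b)))
      .map(([year, total]) => `${year}: ${total}`)
      .join('\n')
  }

  _flush (callback) {
    this.push(this.summary())
    callback()
  }
}

// Made-up records standing in for parsed CSV rows (assumed column names).
const sampleRecords = [
  { year: '2015', value: '3' },
  { year: '2016', value: '1' },
  { year: '2015', value: '2' }
]

async function main () {
  await pipeline(
    Readable.from(sampleRecords),
    new CrimesPerYear(),
    async function (source) {
      for await (const report of source) console.log(report)
    }
  )
}

main().catch(console.error)
```

Because the totals live in a Map that is updated one record at a time, memory use depends on the number of distinct years rather than on the size of the file.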


Coding challenge

Write your solution code in the following code widget. We’ve already added the package.json and london_crime_by_lsoa.csv files for you.

Note: The provided london_crime_by_lsoa.csv file contains only the first 500 records of the data set.



// Write your code here


Template code to implement stream data processing

Solution

Here’s the solution to the above problem. You can try it by running the following command:

node index.js london_crime_by_lsoa.csv

import { Analyzer } from './analyzer.js'

export class LeastCommon extends Analyzer {
  _transform(chunk, encoding, callback) {
    // Convert the record's crime count to a number; skip zero, missing, or non-numeric values
    const currValue = +chunk.value
    if (currValue && !isNaN(currValue)) {
      // Reuse the per-crime totals already collected for this borough, if any
      const currArea = this.map.get(chunk.borough) || new Map()
      const totalValue = +currArea.get(chunk.major_category) || 0
      currArea.set(chunk.major_category, currValue + totalValue)
      this.map.set(chunk.borough, currArea)
    }
    callback()
  }
  _flush(callback) {
    this.result = ""
    for (const area of this.map.keys()) {
      this.result += `===> Area: ${area}\n`
      const crimes = this.map.get(area)
      // Sort by ascending count so the least common crime comes first
      const sortedCrimes = Array.from(crimes.entries())
        .sort((a, b) => a[1] - b[1])
      this.result += sortedCrimes.join(" | ")
      this.result += "\n\n"
    }
    callback()
  }
}
Solution code to implement stream data processing
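
The solution imports Analyzer from analyzer.js, which isn't shown here. The following is a minimal sketch of what such a base class might look like, assuming it is an object-mode Transform that owns the map and result properties used by its subclasses. The TotalPerBorough subclass, the run helper, and the sample records are made up to exercise the sketch. The course's actual base class may differ.

```javascript
const { Readable, Transform } = require('node:stream')
const { pipeline } = require('node:stream/promises')

// Hypothetical stand-in for analyzer.js, which this lesson does not show.
class Analyzer extends Transform {
  constructor (options = {}) {
    super({ ...options, objectMode: true })
    this.map = new Map() // aggregation state shared with subclasses
    this.result = ''     // report text, filled in by _flush
  }
}

// Made-up subclass used only to exercise the base class.
class TotalPerBorough extends Analyzer {
  _transform (chunk, encoding, callback) {
    const value = Number(chunk.value)
    if (Number.isFinite(value)) {
      this.map.set(chunk.borough, (this.map.get(chunk.borough) || 0) + value)
    }
    callback()
  }

  _flush (callback) {
    this.result = [...this.map.entries()]
      .map(([borough, total]) => `${borough}: ${total}`)
      .join('\n')
    callback()
  }
}

async function run (records) {
  const analyzer = new TotalPerBorough()
  // Nothing is pushed downstream, so the analyzer can be the last stage.
  await pipeline(Readable.from(records), analyzer)
  return analyzer.result
}

run([
  { borough: 'Camden', value: '2' },
  { borough: 'Croydon', value: '1' },
  { borough: 'Camden', value: '4' }
]).then(console.log).catch(console.error)
```

Keeping the shared state in the base class means each analyzer only has to define how a record is aggregated and how the final report is formatted.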

Explanation

In this code, we’re implementing stream data processing on data using pipeline and ...