Working with Streams
Learn how to work with Node.js streams to process large amounts of data efficiently by reading, writing, and transforming data in chunks.
Managing data efficiently is a key aspect of many modern applications, especially when dealing with large files, real-time data, or network requests. Node.js streams provide a way to handle data piece by piece, making it possible to process large amounts of information without overloading memory.
Streams are commonly used in scenarios like:
File handling: Reading and writing large files efficiently.
Web applications: Managing incoming requests and sending responses incrementally.
Real-time applications: Enabling continuous data flow for features like live chat or data feeds.
Unlike traditional methods that require loading all data at once, streams allow us to process data as it arrives. This approach fits well with Node.js's event-driven model and helps build fast, scalable applications.
What are streams?
Streams in Node.js are abstract interfaces for working with data that is read or written sequentially. They represent a continuous flow of information, allowing data to be processed incrementally rather than all at once. This makes streams particularly useful when handling large datasets or real-time data where loading everything into memory would be impractical.
At their core, streams enable efficient, non-blocking I/O operations by breaking data into manageable chunks. This approach ensures that applications remain responsive, even when dealing with significant amounts of data.
To achieve this, streams rely on buffers to handle data in chunks. A buffer temporarily holds small pieces of data as it moves through the stream, enabling efficient, non-blocking processing. This design ensures that even large files or data streams can be processed without overwhelming the system's memory.
Node.js provides four main types of streams, each serving a specific purpose:
Readable streams: Designed for reading data in chunks from a source, such as reading lines from a file or receiving data over a network connection.
Writable streams: Allow data to be written incrementally to a destination, such as saving logs to a file or sending responses in an HTTP server.
Duplex streams: Combine the functionality of readable and writable streams, enabling data to be read and written simultaneously. They are commonly used in network communication, such as sockets.
Transform streams: A special type of duplex stream that reads data, processes or transforms it (e.g., compressing or encrypting), and then writes the transformed output.
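All four of these types are exposed by Node.js's built-in stream module. As a quick reference, a sketch of how they can be imported:

// The four core stream classes from the built-in stream module
const { Readable, Writable, Duplex, Transform } = require('stream');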
Readable streams
Readable streams allow us to read data in chunks from a source, making them efficient for handling large datasets or continuous data flows. These streams can originate from various sources, such as:
Files
HTTP requests
Network sockets
When working with a readable stream, data is read in chunks, allowing large datasets to be processed incrementally. The size of each chunk is controlled by the highWaterMark option, which specifies the maximum number of bytes to read at a time. By default, this value is 16 KB for most streams, but it can be adjusted to suit specific use cases. Readable streams operate in an event-driven manner, emitting events like:
data: Triggered when a chunk of data is available for processing.
error: Triggered if an error occurs during reading.
The following example demonstrates reading data incrementally from a file using a readable stream created with fs.createReadStream. This method processes data in chunks as it appears in the file, making it especially useful for handling large files efficiently without loading the entire content into memory.
Contents of example.txt:
Hello, Node.js learner!
This file shows how to process data chunk by chunk using streams.
Happy coding!
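Below is a minimal sketch of the readable-stream code the explanation refers to; the readableStream variable name and the log wording are illustrative assumptions, and the layout is chosen so the line references in the explanation line up:

const fs = require('fs');

// Create a readable stream with utf8 encoding and a 16-byte chunk size
const readableStream = fs.createReadStream('example.txt', { encoding: 'utf8', highWaterMark: 16 });

// Log each chunk as it becomes available
readableStream.on('data', (chunk) => {
  console.log('Chunk received:', chunk);
});

// Handle errors such as a missing or unreadable file
readableStream.on('error', (err) => {
  console.error('Error reading file:', err.message);
});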
Explanation
Line 1: The fs module is imported to access file system operations.
Line 4: The fs.createReadStream method creates a stream to read the example.txt file. The encoding option is set to utf8 to interpret the file content as a string. The highWaterMark option sets the chunk size to 16 bytes. This means that each data event will emit up to 16 bytes of content.
Lines 7–9: The data event listener is attached to the readable stream. Whenever a chunk of data becomes available, the callback function logs the chunk to the console. This event-driven mechanism ensures that the application only processes data as it arrives, avoiding unnecessary memory usage.
Lines 12–14: The error event listener handles any issues during file reading, such as file access errors or missing files. By logging the error message, we can identify and debug issues without causing the application to crash.
Writable streams
Writable streams allow us to write data incrementally to a destination, making them efficient for handling large outputs or continuous data flows. These streams are widely used in various scenarios, such as:
Writing HTTP responses in a web server.
Sending data over network sockets.
One common example is fs.createWriteStream, which is used to write data to files incrementally. The following example demonstrates how to use fs.createWriteStream to write multiple chunks of data to a file:
const fs = require('fs');

// Create a writable stream
const writableStream = fs.createWriteStream('output.txt');

// Write chunks of data
writableStream.write('Hello, ');
writableStream.write('Streams in Node.js!');
writableStream.end(); // Signal the end of writing

console.log('Data written to file successfully.');
Explanation
Line 1: Imports the fs module.
Line 4: Creates a writable stream for the file output.txt.
Lines 7–8: Writes multiple chunks of data to the writable stream using the write method.
Line 9: Signals the end of writing with the end method, ensuring all data is flushed.
Duplex streams
Duplex streams are a unique type of stream in Node.js that allow simultaneous reading and writing of data. This makes them ideal for scenarios where data flows bi-directionally, such as network communication using sockets or building custom data processors.
Unlike separate readable and writable streams, duplex streams combine both functionalities into a single object. This means that data can be read and written independently, often through distinct events and methods.
Key scenarios where duplex streams are used include:
Network communication: Duplex streams are commonly used with sockets (e.g., net.Socket) to enable bidirectional communication, such as in a chat application.
Custom data processing: We can create duplex streams for scenarios requiring both input and output, like converting data formats or implementing communication protocols.
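For illustration, a minimal sketch of a custom duplex stream follows; the echoDuplex name and its echo-back behavior are assumptions made for this example rather than a standard API:

const { Duplex } = require('stream');

// A duplex stream whose writable side simply hands data to its readable side
const echoDuplex = new Duplex({
  read(size) {
    // No-op: data is pushed from write() below
  },
  write(chunk, encoding, callback) {
    this.push(chunk); // Make the written chunk available for reading
    callback();       // Signal that the chunk has been handled
  },
  final(callback) {
    this.push(null);  // End the readable side once writing finishes
    callback();
  }
});

echoDuplex.on('data', (chunk) => console.log('Read back:', chunk.toString()));
echoDuplex.write('ping');
echoDuplex.write('pong');
echoDuplex.end();

In real bidirectional channels such as net.Socket, the two sides are independent: what we read comes from the remote peer, not from what we wrote.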
Transform streams
Transform streams are a special type of duplex stream that can read, modify, and write data simultaneously. They are often used for tasks like compressing files, encrypting data, or converting formats.
In the following example, we use Node.js's stream.Transform class to create a Transform stream that converts incoming data to uppercase before writing it out.
const { Transform } = require('stream');

// Create a Transform stream
const upperCaseTransform = new Transform({
  transform(chunk, encoding, callback) {
    // Convert the chunk to uppercase and push it to the readable (output) side
    this.push(chunk.toString().toUpperCase());
    callback();
  }
});

// Use the transform stream
upperCaseTransform.write('hello');
upperCaseTransform.write('world');
upperCaseTransform.end();

// Read transformed data
upperCaseTransform.on('data', (chunk) => {
  console.log(chunk.toString());
});
Explanation
Line 1: Import the Transform class from Node.js's stream module.
Line 4: Create a new Transform stream by instantiating the Transform class. The Transform stream is defined with a transform method passed in the options object.
Lines 5–10: Define the transform method:
Parameters:
chunk: A portion of data being processed. In this case, it's a chunk of text written to the stream.
encoding: The encoding type of the chunk (e.g., utf8). For binary streams, this parameter is ignored.
callback: A function that signals the completion of the transformation for the current chunk.
Logic:
this.push(chunk.toString().toUpperCase()): Converts the chunk to uppercase and pushes the transformed data to the readable (output) side of the stream.
callback(): Indicates the chunk transformation is complete. Without calling this, the stream will not process additional chunks.
Lines 13–14: Write data chunks ("hello" and "world") to the Transform stream using the write method.
Line 15: End the stream using end(), signaling no more data will be written.
Lines 18–20: Listen for the data event on the stream to read transformed data as it flows through. Each transformed chunk is logged to the console.
This example demonstrates how the stream.Transform class provides a powerful way to manipulate data as it flows through streams. By defining the transform method, we can implement custom logic to modify or process data on the fly, making it especially useful for tasks like data formatting, compression, or encryption.
Combining streams with piping
Streams can be connected using the .pipe() method, allowing data to flow seamlessly from one stream to another. In the example below, we connect the readable stream to the writable stream using pipe.
Contents of example.txt:
Hello, Node.js learner!
This file demonstrates moving data from one stream to another using pipes.
Happy coding!
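Below is a minimal sketch of the piping code the explanation refers to; the success message wording is an illustrative assumption, and the layout matches the line references below:

const fs = require('fs');

// Connect the readable stream for example.txt to the writable stream for output.txt
fs.createReadStream('example.txt')
  .pipe(fs.createWriteStream('output.txt'))
  .on('finish', () => {
    console.log('File copied successfully.');
  });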
Explanation
Line 1: Imports the fs module.
Lines 4–5: Uses .pipe() to connect the readable stream for example.txt to the writable stream for output.txt.
Line 7: Logs a success message once the copying process completes.
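Pipes can also be chained through a transform stream. As a brief illustration, assuming the built-in zlib module (not otherwise covered in this lesson), the following sketch compresses example.txt by piping it through a gzip transform:

const fs = require('fs');
const zlib = require('zlib');

// Chain: readable source -> gzip transform -> writable destination
fs.createReadStream('example.txt')
  .pipe(zlib.createGzip())
  .pipe(fs.createWriteStream('example.txt.gz'))
  .on('finish', () => {
    console.log('example.txt compressed to example.txt.gz');
  });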
Key takeaways
In this lesson, we’ve:
Learned the different types of streams in Node.js: readable, writable, duplex, and transform.
Explored reading and writing data in chunks using readable and writable streams.
Used transform streams to modify data during processing.
Leveraged the .pipe() method to connect streams seamlessly for efficient data flow.