Introduction
Let's study distributed data processing systems.
This chapter will examine distributed systems used to process large amounts of data that would be impossible or very inefficient to process using only a single machine.
Categories of distributed data processing systems
Distributed data processing systems can be classified into the following two main categories:
Batch processing systems
Batch processing systems collect individual data items into groups called batches and process each batch as a unit. These batches can be quite large (e.g., all items for a day), so the main goal of these systems is usually high throughput, sometimes at the cost of higher latency.
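As a rough illustration of the batch model, the following single-machine sketch groups events into one batch per day and then processes each batch as a whole. The event format `(day, value)` and the helper names `group_into_batches` and `process_batch` are assumptions made for this example; a real batch system would distribute this work across many machines.

```python
from collections import defaultdict


def group_into_batches(events):
    """Group (day, value) events into one batch per day (hypothetical format)."""
    batches = defaultdict(list)
    for day, value in events:
        batches[day].append(value)
    return dict(batches)


def process_batch(values):
    """Process a whole batch at once; here the 'processing' is just a sum."""
    return sum(values)


events = [("2024-01-01", 5), ("2024-01-01", 7), ("2024-01-02", 3)]

# No result for a day is available until that day's whole batch is processed.
daily_totals = {
    day: process_batch(values)
    for day, values in group_into_batches(events).items()
}
print(daily_totals)
```

Note that a result for a given day only exists once the full batch for that day has been collected and processed, which is where the higher latency comes from.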
Stream processing systems
Stream processing systems receive and process data continuously as a stream of data items. As a result, the main goal of these systems is usually very low latency, sometimes at the cost of decreased throughput.
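As a rough illustration of the stream model, the following single-machine sketch handles each item as soon as it arrives and emits an updated result immediately. The function name `running_totals` and the use of a plain Python iterator as the "stream" are assumptions made for this example; a real stream processing system would read from a network source or message queue.

```python
def running_totals(stream):
    """Yield an updated total after every incoming item (hypothetical example)."""
    total = 0
    for value in stream:
        total += value
        # A result is emitted per item, without waiting for more data.
        yield total


incoming = iter([5, 7, 3])  # stands in for an unbounded stream of items
results = list(running_totals(incoming))
print(results)
```

Each item produces an output right away, which keeps latency low, but handling items one by one gives up the per-batch optimizations that help throughput.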