Big Data and Apache Spark

Learn more about big data and big data processing.

In this chapter, we will look at Apache Spark, a widely used big data processing framework. In the next chapter, we will discuss a distributed database system.

What is big data?

Answering this question is a bit tricky because the definition depends heavily on context. Let’s start with a working definition.

Big data is a large amount of data that cannot be stored or processed using traditional methods.

In traditional data processing, the data is handled on one machine using simple techniques. Some amount of data is stored on that machine's disk. A program reads the data, extracts what is required, and then processes it. If the data is small and can be easily processed on an average machine using obvious techniques, we do not require any fancy specialized ...
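
The read, extract, and process steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the column names `region` and `amount`, the function name `total_sales_by_region`, and the in-memory sample standing in for a small file on disk are all hypothetical and do not come from the text.

```python
import csv
import io


def total_sales_by_region(lines):
    """Read CSV records, extract two fields, and sum amounts per region."""
    totals = {}
    for row in csv.DictReader(lines):      # read: one record at a time
        region = row["region"]             # extract: keep only what is needed
        amount = float(row["amount"])
        totals[region] = totals.get(region, 0.0) + amount  # process: aggregate
    return totals


# Hypothetical sample data standing in for a small file on the local disk.
sample = io.StringIO("region,amount\nnorth,10.5\nsouth,4.0\nnorth,2.5\n")
result = total_sales_by_region(sample)
print(result)
```

For a real file, the same function should work with an open file handle in place of the sample. Everything here runs in one process on one machine, which is only workable while the data fits comfortably on that machine.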