Untitled Masterpiece
This lesson describes the differences between the various categories of data.
Terminology
Readers new to data storage and processing will run into a lot of confusing technical jargon. Throughout the course, we’ll briefly discuss the terms commonly used in the industry so that the reader can see where Big Data fits in the broader landscape of technologies.
Data
Let’s start with data. Data (plural of datum) are distinct pieces of information. Data can exist in many different forms: numbers, text, bytes, Instagram pictures, or YouTube videos. These are all types of data that can be stored and transmitted electronically. Note that data are usually interpreted in a context. For example, a line of ASCII text containing English prose can’t be interpreted as a JPEG image, and vice versa.
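To make the point about context concrete, here is a minimal Python sketch. The byte values are arbitrary examples chosen for illustration. It reads the same two bytes once as ASCII text and once as a big-endian 16-bit integer, then checks them against the JPEG start-of-image marker (0xFF 0xD8). Finally, it shows that those JPEG marker bytes cannot be decoded as ASCII text.

```python
import struct

# Two raw bytes (an arbitrary example); on their own they carry no meaning.
raw = b"Hi"

# Interpreted as ASCII text, they form a short word.
as_text = raw.decode("ascii")

# Interpreted as a big-endian unsigned 16-bit integer, they form a number.
(as_number,) = struct.unpack(">H", raw)

# JPEG files begin with the start-of-image marker 0xFF 0xD8, so a JPEG
# decoder would not recognize these text bytes as an image.
JPEG_SOI = b"\xff\xd8"
looks_like_jpeg = raw.startswith(JPEG_SOI)

# The reverse also fails: the JPEG marker bytes are not valid ASCII.
try:
    JPEG_SOI.decode("ascii")
    jpeg_decodes_as_ascii = True
except UnicodeDecodeError:
    jpeg_decodes_as_ascii = False

print(as_text)                # Hi
print(as_number)              # 18537
print(looks_like_jpeg)        # False
print(jpeg_decodes_as_ascii)  # False
```

The bytes themselves never change. Only the interpretation applied to them does, which is why a program needs to know what kind of data it is handling.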
There are broadly three categories of data:
- Structured data: has a predefined organizational structure that makes it easy to search and analyze. The data is backed by a model that dictates each field’s type, its length, and restrictions on the values it can take. Data stored in SQL databases is structured. Structured data is usually formatted in a universally understandable and identifiable manner. In most instances, a schema formally specifies ...