Motivating Integration

Understand how entity resolution fits into the overall picture.

At this point, we are comfortable configuring an entity resolution pipeline on a single computer: extract a snapshot from a source, fit a model, resolve conflicts, and create a cross-reference table. This is great for exploration. However, if we need to scale, we must change the process.
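As a reminder of what that single-machine flow looks like, here is a minimal sketch. The `snapshot` records, the `similarity` function, and the `0.85` threshold are all hypothetical; string similarity from the standard library stands in for a fitted model, and a simple union-find stands in for conflict resolution.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical snapshot extracted from a single source system.
snapshot = [
    {"id": "a1", "name": "Acme Corp"},
    {"id": "a2", "name": "ACME Corp."},
    {"id": "b1", "name": "Globex Inc"},
]


def similarity(left, right):
    """Case-insensitive string similarity; a stand-in for a fitted model."""
    return SequenceMatcher(None, left.lower(), right.lower()).ratio()


def build_xref(records, threshold=0.85):
    """Map every source record ID to a resolved entity ID."""
    parent = {record["id"]: record["id"] for record in records}

    def find(node):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Compare all pairs; acceptable on one machine, quadratic at scale.
    for left, right in combinations(records, 2):
        if similarity(left["name"], right["name"]) >= threshold:
            root_left, root_right = find(left["id"]), find(right["id"])
            if root_left != root_right:
                # Merge clusters, keeping the smaller ID as the entity ID.
                parent[max(root_left, root_right)] = min(root_left, root_right)

    return {record["id"]: find(record["id"]) for record in records}


xref = build_xref(snapshot)
```

The all-pairs comparison is one reason this approach suits exploration better than production: the number of comparisons grows quadratically with the number of records.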

Let’s explore typical technology stacks, how entity resolution might fit into the overall picture, and why integration is essential.

Modern data stack (MDS)

Many companies have already invested in analytics infrastructure, so we don’t want to start from zero. A common example is the modern data stack (MDS): a collection of tools that typically covers data ingestion, storage, transformation, and more. Every MDS aims to enable analytics, and SQL is usually its primary transformation language.
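Because SQL is the common language of such a stack, a cross-reference table is a natural integration point: once it sits next to the other tables, any SQL transformation can join against it. The sketch below illustrates the idea under stated assumptions. It uses an in-memory SQLite database as a stand-in for a warehouse, and the `orders` and `customer_xref` tables and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a data warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: raw orders keyed by source customer ID, plus a
# cross-reference table mapping each source ID to a resolved entity ID.
conn.executescript(
    """
    CREATE TABLE orders (customer_id TEXT, amount REAL);
    CREATE TABLE customer_xref (customer_id TEXT, entity_id TEXT);
    INSERT INTO orders VALUES ('a1', 100.0), ('a2', 50.0), ('b1', 20.0);
    INSERT INTO customer_xref VALUES ('a1', 'a1'), ('a2', 'a1'), ('b1', 'b1');
    """
)

# Aggregate order amounts per resolved entity instead of per source record.
rows = conn.execute(
    """
    SELECT x.entity_id, SUM(o.amount) AS total_amount
    FROM orders AS o
    JOIN customer_xref AS x ON x.customer_id = o.customer_id
    GROUP BY x.entity_id
    ORDER BY x.entity_id
    """
).fetchall()
conn.close()
```

Here the two source records `a1` and `a2` roll up into one entity, so their orders are totaled together without changing the raw `orders` table.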
