Data Team Roles

Learn about the different roles in a data team and how they work with each other.

Data engineers never work in silos. No matter what they are working on, they will constantly interact with technical and nontechnical people. Typically, they fall under the umbrella of the data team.

Every organization has its own way of structuring a data team. There is no single correct setup because it depends on factors such as company size, data maturity, and business needs. So, rather than searching for the best solution, let's explore the roles commonly found in data teams across the industry.

Medium and large data teams usually consist of data engineers, analytics engineers, data analysts, and data scientists. Unlike software development teams, where each role has distinct responsibilities (e.g., frontend and backend), data roles overlap heavily. For example, some data scientists focus on understanding and deriving insights from data and are more similar to data analysts. Other data scientists have strong programming skills and build systems that deliver machine learning models to production, which puts them closer to machine learning engineers.

Note: If you are looking for a job in the data domain, make sure you read the job description carefully. Otherwise, you might end up getting unexpected questions in the interview.

Data engineer

The role of data engineers has evolved substantially over the years. In the early days, their primary work was to store and process data in relational databases using SQL-based tools or drag-and-drop ETL tools. Sometimes, they were also called database administrators, database developers, or business intelligence developers.

Over time, changes in technology and in how people use it have increased both the amount of data generated and the speed at which it is generated. Once data scientist was recognized as one of the most desirable jobs of the 21st century, demand surged for data engineers who could handle petabytes of data and help businesses grow in areas such as reporting, recommendation engines, and risk controls.

Annual size of the global datasphere (source: IDC Datasphere whitepaper)

In 2001, Doug Laney, then an analyst at META Group (which Gartner later acquired), described data management challenges in terms of "three Vs": volume, velocity, and variety. These later became the standard description of "big data." Growth along these three dimensions paved the way for distributed and scalable databases like Apache Cassandra and Apache HBase and real-time systems like Apache Druid and Apache Kafka. These have become favored tools for today's data engineers to efficiently store and process data. Furthermore, the emergence of the public cloud has changed how software is developed and deployed, which has given data engineers significant advantages as well.

As a result, the definition of a data engineer today is not the same as it was ten years ago. Today, a data engineer is someone who develops and maintains systems that take in raw data from various sources, transform it, and produce high-quality output that supports data consumers.
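The definition above can be illustrated with a minimal sketch of a pipeline that takes in raw data, transforms it, and produces an output for data consumers. The sample data, the function names (`extract`, `transform`, `build_output`), and the cleaning rule are all hypothetical and chosen only for illustration. Real pipelines typically read from files, APIs, or databases and run on dedicated tooling.

```python
import csv
import io

# Hypothetical raw input; in practice this would come from files, APIs, or databases.
RAW_ORDERS_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(raw_text):
    """Read raw CSV text into a list of dictionaries (all values are strings)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Drop rows with a missing amount and convert fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": float(row["amount"]),
        })
    return cleaned


def build_output(rows):
    """Produce a consumer-friendly output: total order amount per customer."""
    totals = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["amount"]
    return totals


customer_totals = build_output(transform(extract(RAW_ORDERS_CSV)))
print(customer_totals)
```

In this sketch, the record with a missing amount is dropped during the transform step, so only complete records reach the output.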

Data analyst

Data analysts are important data consumers who uncover critical insights from the data that lead to better decision-making for the organization. They need technical skills, such as Python or R and SQL, to support their analysis, and they use data visualization tools, like Tableau and Looker, to present their findings in an intuitive manner.

Another important skill of data analysts is stakeholder management. Data analysts need to communicate their findings to stakeholders, explaining the impact of the data to nontechnical audiences. This requires a combination of critical thinking, storytelling skills, and effective communication abilities. These soft skills empower data analysts to collaborate more closely with engineers and business users.

Analytics engineer

Analytics engineers are relative newcomers to the data domain. As the landscape of data and analytics has expanded rapidly in the past few years, data engineers have shifted their attention from maintaining SQL-based tools to building highly scalable and reliable data infrastructure. Meanwhile, data consumers are eager to draw more insights from a growing and constantly changing body of data.

Consequently, a communication barrier has emerged between data engineers and data consumers, creating the need for a role that sits at the intersection of the two. That's where analytics engineers come into the picture.

An analytics engineer is someone who has a blend of skills. They know enough about software engineering to write high-quality data transformation code with performance and maintainability in mind. They also understand downstream use cases well enough to ensure the data models support faster and better data-driven business decisions.
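As a rough sketch of what such transformation code might look like, the example below builds a reporting-friendly model from a raw table and runs a simple data quality check on it. It uses Python's built-in `sqlite3` module purely for illustration. The table `raw_orders`, the view `customer_revenue`, and the helper `check_not_null` are hypothetical names, and real teams typically work in a data warehouse with dedicated transformation tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical raw table, as it might arrive from a source system.
    CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL, status TEXT);
    INSERT INTO raw_orders VALUES
        (1, 'alice', 10.0, 'completed'),
        (2, 'bob', 5.0, 'cancelled'),
        (3, 'alice', 7.5, 'completed');

    -- Transformation: a model shaped for a downstream revenue report.
    CREATE VIEW customer_revenue AS
    SELECT customer, COUNT(*) AS completed_orders, SUM(amount) AS revenue
    FROM raw_orders
    WHERE status = 'completed'
    GROUP BY customer;
""")


def check_not_null(connection, table, column):
    """Simple data quality test: the column must contain no NULL values."""
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    (null_count,) = connection.execute(query).fetchone()
    return null_count == 0


model_rows = conn.execute(
    "SELECT customer, completed_orders, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
revenue_is_complete = check_not_null(conn, "customer_revenue", "revenue")
print(model_rows, revenue_is_complete)
```

The model excludes cancelled orders because the assumed downstream report only cares about completed revenue. Encoding that business rule in one tested place is the kind of decision an analytics engineer would typically own.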


Data scientist

Data scientists are another group of important data consumers. The role has been called one of the most attractive jobs of the 21st century. A data scientist is someone who makes predictions using past patterns and machine learning algorithms. They generally have a background in computer science and have the technical skills to dive deeply into the data and produce quantifiable insights.

In an ideal world, data scientists should spend 80% of their time on the top layers of the pyramid. Data engineers, machine learning engineers, and analytics engineers build solid foundations for data scientists to run advanced algorithms efficiently.

Data science hierarchy

Machine learning engineer

As the data team grows, the number of ML models built by data scientists reaches a point where the team needs a systematic way to manage their deployment. In a data team, a machine learning engineer is someone who has conceptual knowledge of machine learning and strong engineering skills to put ML models into production and monitor their performance. This practice is known as MLOps, and it applies to the entire life cycle of an ML model.
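To make the monitoring part concrete, here is a minimal sketch that compares a model's production predictions with the outcomes observed later and flags the model when accuracy falls below a threshold. The names `ModelHealth` and `monitor_model`, the sample values, and the 0.8 threshold are assumptions made for illustration. Real MLOps setups usually track many more signals, such as data drift and latency.

```python
from dataclasses import dataclass


@dataclass
class ModelHealth:
    accuracy: float
    needs_attention: bool


def monitor_model(predictions, actuals, min_accuracy=0.8):
    """Compare production predictions with observed outcomes."""
    if not predictions or len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must be non-empty and equal in length")
    correct = sum(1 for predicted, actual in zip(predictions, actuals) if predicted == actual)
    accuracy = correct / len(predictions)
    # Flag the model when accuracy falls below the agreed threshold.
    return ModelHealth(accuracy=accuracy, needs_attention=accuracy < min_accuracy)


# Hypothetical labels: three of the four predictions match the observed outcomes.
health = monitor_model([1, 0, 1, 1], [1, 0, 0, 1])
print(health)
```

A flag like `needs_attention` could then trigger an alert or a retraining job, depending on how the team manages the model life cycle.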

Note: In smaller data teams, it's common to see data engineers and data scientists manage the model life cycle themselves due to limited headcount.

There is another type of machine learning engineer who is not part of the data team. Machine learning is a broad domain that includes areas such as video processing and facial recognition. These machine learning engineers generally have specific domain knowledge and sit closer to the domain experts.

The graph summarizes the relations among different roles in a data team. Generally speaking, data scientists, data analysts, and data engineers are a data team's three main pillars. As the team grows, interdisciplinary roles like machine learning engineers and analytics engineers will be needed to bridge the gap between business and technology.

Often, before a machine learning or analytics engineer is hired, someone on the data team already has the skills to bridge that gap for a while. Data leaders should recognize the value these people bring, make sure they are not overloaded, and hire for the new roles when necessary.