Data Team Structure: Embedded or Centralized?

Learn about different data team structures and their pros and cons.

When it comes to structuring a data team within an organization, there is no one-size-fits-all approach. The right structure depends on the team's maturity level and on what works best for the organization. The figure below illustrates two common team models:

Two different data team structures

Centralized data team model

In a centralized data team, all team members work together in a single unit and receive business requests from every business unit, acting as internal consultants to the various teams. The advantages of this model include easy sharing of technical and functional knowledge and easier development of standards, which allows the data team to maintain a consistent data strategy across the organization.

However, this model can make prioritization difficult. The data team leader must maintain an excellent understanding of the company strategy to prioritize work accurately and to manage relationships with different departments (e.g., growth, finance), all of which believe their requests are critical.

Embedded data team model

In an embedded data team, data specialists work closely with the business users. This close relationship removes the barriers between technical members and business users. It also boosts productivity because there is less context switching.

But this model also has its problems. It creates silos: data engineers and analysts in different teams work in their own ways, and it is hard for the data team leader to keep track of each team's progress. This model may also limit the career growth of data engineers, because they work on a fixed set of use cases.

Pros and Cons of Two Data Team Models

Centralized team

  • Pros: It is easy to share technical and functional knowledge among data engineers, analysts, and scientists (DE/DA/DS). Best data practices and a consistent vision are maintained across the organization. It is also a feasible option for early-stage start-ups that do not yet have many engineers.

  • Cons: It is difficult to prioritize work that comes from different domains.

Embedded team

  • Pros: There is a strong relationship between engineers and the business, which results in less context switching.

  • Cons: Separated teams lead to a lack of technical consistency across the organization. Career growth for engineers can be limited, and it is hard to keep track of each team's progress.

Hybrid data team models

Both models have pros and cons, so many organizations have devised new variants to best suit their use cases.

The first variant consists of a centralized data engineering team and embedded data analysts/scientists. For analysts, moving from domain A to domain B requires a lot of context switching, and every switch costs productivity. This model borrows the main benefit of the embedded model by keeping analysts in each business unit.

At the same time, it centralizes all the data engineers, which addresses a key problem of the embedded model: inconsistency across the organization. With centralized engineers, it is easier to work on the same data infrastructure, create a coding standard, and communicate consistently with analysts. This model works well for a smaller data team of around ten people.

Centralized DE + embedded DA/DS

The second variant is a domain-based solution. Each domain has dedicated engineers and analysts, but at the same time, all data engineers form a sub-team. This variant is usually the next step after the first variant. As business units grow, centralized data engineers may struggle to prioritize requests from many domains. Embedding data engineers into product teams is therefore a strategic move: it ensures that each team consistently receives the engineering resources it needs, and it also reduces context switching for the engineers.

In parallel, engineers keep their own group, which encourages them to share technical knowledge. This setup is very similar to the Spotify model, in which a cross-functional team of business and engineering members is called a Squad, and the group of all engineers is called a Chapter.

Centralized DE + embedded DE/DA/DS

Note: Data mesh is a relatively new decentralized approach that enables domain teams to perform cross-domain data analysis independently, and it is gaining a lot of attention in the industry.

It aims to remove the bottleneck of centralized data teams by letting each domain own its data pipelines and business logic end to end and treat data as a product. Meanwhile, a data platform team maintains centralized, self-serve data infrastructure for storage, cataloging, access management, monitoring, and so on, which allows domain teams to build data products more efficiently.

Upstream stakeholders

To be successful data engineers, we must also understand our upstream stakeholders. Data engineers have two main upstream stakeholders: data architects and software engineers.

A data architect designs the blueprint for the policies, procedures, and technologies used to collect, store, and process organizational information. Data architects also serve as a bridge between the technical and nontechnical sides of the organization. Depending on the maturity level of the data team, the responsibilities of a data engineer may overlap with those of a data architect.

A software engineer builds software that generates application events and logs. This operational data is an important data source, but its format is not necessarily designed for analytics. A good data engineer should work with software engineers to understand the operational data and agree on details such as delivery frequency and data format, to avoid unexpected data that may cause downstream problems.
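To make "agreeing on the data format" more concrete, here is a minimal Python sketch of a lightweight data contract: a shared description of the fields and types that an event must carry, plus a check that data engineers could run on incoming events. The `EventContract` class, the `validate_event` function, and the `order_created` event are hypothetical illustrations and do not come from any specific tool. In practice, teams often express such agreements with schema formats like JSON Schema, Avro, or Protobuf.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical contract agreed between software engineers (producers)
# and data engineers (consumers). Names and fields are illustrative only.
@dataclass(frozen=True)
class EventContract:
    name: str
    required_fields: dict[str, type]
    max_delay_minutes: int  # agreed delivery frequency; recorded, not enforced here


def validate_event(event: dict[str, Any], contract: EventContract) -> list[str]:
    """Return a list of contract violations for one event (empty if valid)."""
    errors = []
    # Every agreed field must be present and have the agreed type.
    for field, expected_type in contract.required_fields.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(event[field]).__name__}"
            )
    # Fields nobody agreed on are flagged as unexpected data.
    for field in event:
        if field not in contract.required_fields:
            errors.append(f"unexpected field: {field}")
    return errors


# Usage example with a hypothetical "order_created" event.
order_created = EventContract(
    name="order_created",
    required_fields={"order_id": str, "amount_cents": int, "created_at": str},
    max_delay_minutes=60,
)

good = {"order_id": "A-1", "amount_cents": 1999, "created_at": "2024-01-01T12:00:00Z"}
bad = {"order_id": "A-2", "amount_cents": "19.99"}

print(validate_event(good, order_created))  # []
print(validate_event(bad, order_created))
# ['wrong type for amount_cents: expected int, got str', 'missing field: created_at']
```

The value of a check like this comes mostly from the conversation that produces the contract. Once producers and consumers have written the expectations down, a violation becomes an explicit, reportable event instead of a silent downstream failure.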

We have now seen five different data team structures: centralized, embedded, the two variants that combine them, and data mesh. When deciding on a team structure, it is crucial to consider the following points:

  • The size of the data team: A centralized team structure might be a good starting point for a small data team, but not necessarily for a large data team due to the challenge of task prioritization.

  • The ratio between decision-makers and engineers: Every data sub-team should include at least one decision-maker, such as a team lead or a project manager, to help with prioritization. If a sub-team has no such role, it is better to merge it with another data team.

  • Career growth and knowledge sharing: A centralized team structure makes it easy for engineers to gain exposure to various areas and share knowledge with other team members. But that does not mean embedded team structures cannot achieve the same. Team leads play a vital role in improving knowledge sharing among product teams, and organizing cross-team knowledge-sharing sessions is one good way to promote such learning.

Finally, a team structure is not set in stone once it has been established. The leadership team is encouraged to try out different team models and eventually find the one that works best for the company.