In pursuit of the Engineering Analyst
Why analysts in small teams are increasingly bridging the gap into Analytical Engineering
I come across a lot of misconceptions about Data Engineering, Analytics Engineering and Analysts: what each role does, and the differences between them. A lot of the literature and discussion about these roles focuses on large organisations, where large teams of each can co-exist and afford to be prescriptive about their roles. In my experience, this leaves smaller teams unsure of where the boundaries lie, whether they need an analytics engineering role, and where data analysis should start and stop.
This is a post about how smaller teams can meet that challenge. First, though, a summary of what analytics engineers do, as this is a grey area that needs defining before we focus on smaller teams. So let’s look at how things work, in general, at large organisations with more rigid structures and roles:
Section 1: In Theory…
What is analytics engineering?
Analytics engineering sits between business and technical expertise
Analytics engineers sit between data engineers, who focus on infrastructure, and data analysts, who interpret the data for the business. They therefore need to understand both the technical systems and the business needs of their stakeholders, and be ready to translate business requirements into technical data solutions. Their output is clean, analysis-ready datasets.
Analytics engineers do a lot of data modelling
Analytics engineers focus on transforming, testing, deploying, and documenting data. They are responsible for designing scalable, robust data models, making heavy use of SQL and tools like dbt for data transformations. They also use tools like Git to apply software engineering best practices such as version control and continuous integration pipelines.
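To make that concrete, here is a minimal sketch of the kind of dbt-style SQL model an analytics engineer might own. The source, table, and column names are all hypothetical, and a real model would carry tests and documentation alongside:

```sql
-- models/staging/stg_orders.sql (hypothetical dbt model)
-- Standardises a raw orders feed into an analysis-ready staging model.

with source as (

    select * from {{ source('shop', 'raw_orders') }}

),

renamed as (

    select
        id                     as order_id,
        customer_id,
        lower(trim(status))    as order_status,
        cast(order_ts as date) as order_date,
        amount_pence / 100.0   as order_amount_gbp
    from source
    where id is not null       -- drop unusable rows at the boundary

)

select * from renamed
```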
Data quality is a priority for analytics engineers
A key part of an analytics engineer’s role is ensuring data accuracy, consistency, and reliability. They implement data quality checks and resolve issues with data engineers in order to build trust in data across the business and enable accurate business decision-making.
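As an illustration, these are the sorts of checks that might run against the hypothetical model above; both queries should return zero rows when the data is healthy:

```sql
-- 1. Primary key uniqueness on the orders model
select order_id, count(*) as duplicates
from stg_orders
group by order_id
having count(*) > 1;

-- 2. Referential integrity: every order should belong to a known customer
select o.order_id
from stg_orders o
left join stg_customers c using (customer_id)
where c.customer_id is null;
```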
Analytics engineering is all about communication
Analytics engineers collaborate with stakeholders ranging from analysts and data scientists to business users, and explain technical concepts to non-technical audiences. In doing so they work to ensure alignment on data definitions and quality standards, and incorporate these into the data models.
Analytics engineering is a new and evolving field
The role of analytics engineer first emerged around 2018, driven largely by the demand for clean, reliable data. While the core responsibilities are clear, the tools and technologies are still evolving, and the role continues to develop within the wider data team structure.
How is Analytics Engineering Different to Data Engineering?
Data engineers, in short, sit much further back in the pipeline. While the role overlaps with analytics engineering in parts, data engineers build and maintain the infrastructure for data acquisition and storage, whereas analytics engineers focus on the data itself and how to transform it into analysis-ready datasets.
Data engineers are proficient in programming languages like Python and Java for interacting with APIs; they know services like AWS and Azure Event Hubs, and handle large volumes of complex data, piping it into warehouses such as Snowflake and Databricks. By contrast, analytics engineers are masters of SQL, the language of data analysts, and are skilled in data modelling and dbt. They also have a strong understanding of data warehousing, and good experience with warehouse platforms like Snowflake, Databricks, and Redshift.
Data engineers tend to collaborate with software engineers and technical integration teams and are typically less involved on the business side, with limited interaction with analysts and business users. Analytics engineers, however, need strong communication and collaboration skills, as they must bridge the gap between technical data engineers and non-technical business users.
In summary, if, as Humby says, “Data is the new oil”, then data engineers build the pipework that transports the crude from the rigs, and analytics engineers are the refineries, turning it into products that users can actually consume.
So where does this leave analysts?
Data analysts are mainly responsible for analysing data and deriving meaningful insights from it. They identify trends, patterns, and anomalies within data, translating them into actionable recommendations for decision-making.
They then need to communicate their findings effectively to business stakeholders, often delivering reports, presentations, and dashboards. They must act as storytellers for the business, explaining how data can be used to answer business questions and guide strategic decisions. Data analysts are experts in conveying technical information to non-technical audiences, and like analytics engineers they bridge the gap between data and business understanding. However, analysts are much more likely to be at the coalface, working with the business on an almost daily basis.
Data analysts use tools like SQL, Excel, and Tableau as part of their toolkit, along with tools like EasyMorph and Alteryx for exploratory analysis.
Data analysts rely on data engineers to provide access to clean and reliable data sources and work with analytics engineers to ensure that data is transformed and modelled in a way that supports their needs.
Overall, data analysts are business-oriented, focusing on solving practical business problems with data-driven insights. They work with the business to understand its needs and translate them into deliverables that support business decisions.
Section 2: In Practice…
In practice, the definitions of Data Engineering, Analytics Engineering and Analyst roles are too multi-faceted and overlapping to draw strict patterns between organisations. Moreover, many small organisations simply don’t have the luxury of multiple roles looking after distinct stages of the data pipeline. I have worked with a variety of teams at different stages of maturity in their data ecosystems, and they typically share a common theme: they don’t have analytics engineers by name.
Instead, they have data engineers who work across the data pipeline and provide data to the business in the form of a “Gold” layer of curated tables, with analysts picking it up from there. In smaller teams, perversely, these roles tend to be a lot more rigid and defined.
And it’s here that the squeeze happens and challenges can start to occur…
Data analysts can feel disempowered when data engineers, too removed from the requirements of the business, try to second-guess which curated layers are needed. Analysts raise tickets to rebuild curated layers and then grow frustrated when data engineers can’t respond with the flexibility they need.
Moreover, if data analysts have access to Bronze or Silver tables, they build their own pipelines into their own processes and reports, outside the control of the data engineering team.
This gap between the analysts and data engineers can leave the business struggling for insights, and both teams frustrated. While the arrangement can work, it relies on time and strong communication between the leadership of both teams to collaborate effectively. Data teams need collaboration frameworks, such as regular feedback loops in tools like Slack or Teams, dedicated to open communication between Data Engineering and Analytics teams.
So is there a better way?
The Case for the Engineering Analyst
Small teams have some great strengths, and one of them is how adept and flexible individuals have to become. SQL is likely a mainstay of most analysts in small teams, so they can often build and deploy data pipelines easily. As mentioned above, this often leads to frustration with data engineers who see them working outside their processes and building curated data layers. Smaller data teams need to take advantage of this and deliberately use those SQL skills to build Engineering Analysts.
Engineering Analysts can deliberately take ownership of curating datasets for the Gold layer, taking the pressure off data engineers who are stretched right across the data pipeline.
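As a sketch of what that ownership might look like, here is a hypothetical Gold-layer dataset an Engineering Analyst could curate from the Silver tables that data engineers maintain; all schema, table, and column names are illustrative:

```sql
-- gold.monthly_revenue: a business-facing mart owned by an Engineering Analyst,
-- built entirely in SQL from the curated Silver layer.
create or replace table gold.monthly_revenue as
select
    date_trunc('month', o.order_date) as revenue_month,
    c.region,
    count(distinct o.order_id)        as orders,
    sum(o.order_amount_gbp)           as revenue_gbp
from silver.orders o
join silver.customers c using (customer_id)
where o.order_status = 'complete'
group by 1, 2;
```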
However, for this to work, some clear rules of engagement need to be defined.
Everyone needs to subscribe to the fact that data analysis is iterative: curated layers will rarely cover everything, so a period of “suck it and see” is needed to build new curated datasets and check that they work.
Data analytics teams are much better placed to work iteratively with the business, using dashboards, reports, and metrics to expose these datasets as needed. While datasets are in a state of flux, the strict version control, CI/CD pipelines and other best practice provided by Data Engineering teams are of limited value, especially in small teams where each dataset likely has only one owner.
Only when datasets have been in production for a little while should SQL pipelines be handed over to Data Engineering for “gold plating”: integration with source control systems and other best practice.
Over time, data analysts should also work with the business to provide self-serve datasets, exposed via Tableau, to help take the pressure off.
Beyond this, common rules of engagement between teams typically revolve around:
Agreements on how data quality and governance are maintained when analysts curate datasets and where the responsibility for data quality ultimately sits.
Guidelines for when to involve data engineers, such as after prototypes stabilise.
Use of shared platforms to ensure analysts aren’t creating disconnected pipelines.
Conclusion: Embracing Flexibility in Data Roles for Small Teams
The evolving nature of data roles, whether for data engineers, analytics engineers, or analysts, presents smaller teams with both challenges and opportunities. Large organisations can afford rigid boundaries between these roles, but for smaller teams, success often lies in flexibility and collaboration.
By understanding the distinctions between these roles, smaller teams can leverage the strengths of their analysts and engineers to maximise business impact. The concept of the “Engineering Analyst” offers smaller teams a pragmatic way to close the gap between raw data pipelines and meaningful, actionable insights without recruiting specialist roles. It allows analysts to take on more responsibility for data transformation and modelling, reducing bottlenecks and ensuring data is readily available for decision-making.
However, this flexibility must come with clear rules of engagement. By implementing structured workflows and encouraging collaboration between technical and business teams, smaller organisations can avoid pitfalls like fragmented pipelines and poor data quality.
An aside… the Medallion Architecture
I mentioned this in passing in the article and so let’s touch on it in more detail. A medallion architecture is a data design pattern used to logically organise data within a warehouse or lakehouse. The architecture is designed to improve the quality of data incrementally as it moves through its various layers. The architecture comprises three layers: bronze, silver, and gold. Each layer represents a progressively higher level of data quality. The medallion architecture is sometimes referred to as a "multi-hop" architecture.
Understanding the layers of the Medallion Architecture
Bronze Layer: This layer serves as the entry point for data. Data is ingested in its raw format without undergoing any processing or transformation. The bronze layer retains the complete history of each dataset in its raw form, enabling the recreation of any previous state of the data system.
Silver Layer: In this layer, the raw data from the bronze layer is subjected to cleaning, filtering, and transformation to enhance its usability. It is important to note that transformations at this stage are typically light modifications, focusing on data cleansing and preparation rather than complex aggregations or enrichments. The silver layer represents a more refined version of the data that can be relied upon for downstream analytics.
Gold Layer: This layer represents the final and most refined stage of data in the medallion architecture. Data in the gold layer has been transformed into a state suitable for consumption by various downstream teams and applications, including analytics, data science, and machine learning operations. It is important to note that while all layers in the warehouse serve a purpose, the gold layer signifies data that has been processed and transformed into valuable insights.
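To ground the three layers, here is a minimal sketch of the progression in Snowflake-flavoured SQL; the schemas, tables, and columns are all illustrative assumptions, not a prescribed implementation:

```sql
-- Bronze: land the raw payload untouched, retaining full history.
create table if not exists bronze.orders_raw (
    payload   variant,                                -- raw record as received
    loaded_at timestamp default current_timestamp()   -- ingestion time
);

-- Silver: light cleaning and typing; no aggregation or enrichment yet.
create or replace view silver.orders as
select
    payload:id::string          as order_id,
    payload:customer_id::string as customer_id,
    payload:amount::number      as amount,
    payload:status::string      as order_status
from bronze.orders_raw
where payload:id is not null;

-- Gold: a business-ready aggregate for downstream consumers.
create or replace view gold.revenue_by_customer as
select
    customer_id,
    sum(amount) as lifetime_revenue
from silver.orders
where order_status = 'complete'
group by customer_id;
```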