
Data Reliability Engineering: Tackling the Data Quality Problem
By Torq Pagdin, Director, Technology (Data Engineering), Hotels.Com


It is no longer acceptable to have 'mostly' useful data; even the smallest amount of bad data can cause inaccuracies in predictive analytics.
As data engineers, we bear the brunt of any criticism, and rightly so—data scientists often bemoan the fact that much of their time is spent cleaning up data rather than building the models they are trained to build. We are the first link in a long chain, and the world of data engineering has to embrace this responsibility.
Most failures seem to go like this:
• Production Support is alerted to a failure in the middle of the night
• They apply a 'Band-aid' fix to get the application running again
• The next day they inform the dev team who own the code to assess options
• The dev team then plan the reprocessing of bad data to stop user frustration from exploding
• A permanent fix is suggested, estimated, and then put on the backlog (often never to be seen again)
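Much of this midnight churn can be avoided by failing the pipeline at an automated validation gate before bad data is ever published. The sketch below is a minimal illustration of such a gate in Python; the thresholds, file name, and key columns are hypothetical placeholders, not a description of any specific Hotels.com system.

```python
import pandas as pd

# Hypothetical thresholds; in practice these would be tuned per dataset.
MAX_NULL_FRACTION = 0.01
MIN_EXPECTED_ROWS = 10_000

def validate_batch(df: pd.DataFrame, key_columns: list[str]) -> list[str]:
    """Return a list of human-readable data-quality failures for this batch."""
    failures = []

    # Volume check: a near-empty batch usually means an upstream outage.
    if len(df) < MIN_EXPECTED_ROWS:
        failures.append(f"row count {len(df)} below minimum {MIN_EXPECTED_ROWS}")

    # Completeness check on business-critical columns.
    for col in key_columns:
        null_fraction = df[col].isna().mean()
        if null_fraction > MAX_NULL_FRACTION:
            failures.append(f"{col}: {null_fraction:.1%} nulls exceeds threshold")

    # Uniqueness check: duplicate keys silently inflate downstream aggregates.
    if df.duplicated(subset=key_columns).any():
        failures.append("duplicate rows found on key columns")

    return failures

if __name__ == "__main__":
    batch = pd.read_parquet("bookings_batch.parquet")  # hypothetical input file
    problems = validate_batch(batch, key_columns=["booking_id", "hotel_id"])
    if problems:
        # Fail loudly *before* publishing downstream, so the on-call engineer
        # is paged with a diagnosis rather than a mystery.
        raise RuntimeError("DQ gate failed: " + "; ".join(problems))
```

Raising before publication means Production Support is paged with a diagnosis attached, and users never see the bad batch at all.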
Another problem that arises from bad data quality is that feature development teams often end up spending multiple days within a sprint trying to get to the bottom of failures.
This means that published roadmap items get pushed further and further back, making the teams less efficient and causing frustration or mistrust among the stakeholders.
So, what can we do about it?
Step forward the Data Reliability Engineering team!
Data Reliability Engineering (DRE) is what you get when you treat data operations as a software engineering problem. Using the philosophy of DRE, Data Reliability Engineers are 20 per cent operators and 80 per cent developers, and they sit outside, independent of the feature teams.
This is not about being a production support team, but about being a talented and experienced development team that specialises in data pipelines across multiple technical disciplines.
The 6-step mission of DRE is:
1. To apply engineering practices to identify and correct data pipeline failures
2. To use specialist knowledge to analyse pipelines for weaknesses and potential failure points, and to fix them
3. To determine better ways of coping with failures, along with increasing automation of reprocessing functionality
4. To work with pipeline developers to advise on potential DQ issues with new designs
5. To utilise and contribute to open-source DQ software products
6. To improve the 'first to know' rate for DQ issues (a minimal sketch of this metric follows the list)
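To make point 6 concrete: the 'first to know' rate is the fraction of DQ incidents that the team's own monitoring catches before any user reports them. A minimal sketch of the metric, assuming a hypothetical incident log with a detected_by field; in practice this would be pulled from incident-management tooling.

```python
from dataclasses import dataclass

@dataclass
class Incident:
    incident_id: str
    detected_by: str  # "monitoring" if our checks caught it, "user_report" otherwise

def first_to_know_rate(incidents: list[Incident]) -> float:
    """Fraction of DQ incidents caught by our own monitoring before any user."""
    if not incidents:
        return 1.0  # no incidents at all: vacuously first to know
    caught_first = sum(1 for i in incidents if i.detected_by == "monitoring")
    return caught_first / len(incidents)

# Example log: monitoring caught three of four incidents before users did.
log = [
    Incident("INC-101", "monitoring"),
    Incident("INC-102", "monitoring"),
    Incident("INC-103", "user_report"),
    Incident("INC-104", "monitoring"),
]
print(f"First-to-know rate: {first_to_know_rate(log):.0%}")  # prints 75%
```

Tracked over time, this one number is an honest proxy for whether monitoring coverage is actually improving.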
So, the DRE team own the failure, the fix, and the message out to users. They can call in feature team developer help if specialist knowledge is required but aim to handle in-house as much as possible, thus freeing feature teams to continue with their roadmap.
OK, great...but does that mean the feature teams throw Data Quality responsibilities over the fence to DRE? Certainly not! Each team still has responsibility for its own pipeline, and DQ should be a core element of the architecture and design. The DRE team work with both feature development and Product teams to make sure that DQ is included in designs and estimates. They are also part of the sign-off process for QA/UAT—no DRE sign-off means no move to Production.
So, is DRE the complete solution to all Data Quality problems? Unfortunately not—bad data issues will always occur, because edge cases in data, in particular, are so hard to predict. However, having a dedicated engineering team for DQ shines a light on issues and provides transparency to stakeholders and data consumers, building trust among the data engineers, scientists, and analysts who depend on the accuracy of their data.