As you read this, hundreds of thousands of megabytes of new data have already been generated, and that’s a conservative estimate. Considering that every person generated roughly 1.7 megabytes of data per second in 2020 (and that the big data market is expected to be worth a reported $40.6 billion by 2023), it can be hard to wrap our brains around how much data the devices we come into contact with produce each day; “colossal” seems like a good word for the volume. For enterprises, this flood of quantifiable data creates both a tremendous opportunity and a challenge: how can they manage the sheer volume of information accurately?
One way businesses streamline the data they monitor is through data observability: keeping a close eye on the health, quality, and integrity of the data being collected. Companies adopting data observability practices on their way to storing data in the cloud often run into common challenges, such as data inaccuracies or periods of missing or partial data delivery.
As more companies make key financial and customer decisions based on data inputs, data observability is like having a premium insurance plan for that data’s accuracy and timeliness. Maven Wave recently embarked on a data-focused project with an enterprise client, and the results are presented here as a mini case study to show how data observability kinks can be overcome with the right plan.
When the client reached out to Maven Wave, they had several processes that moved data from a legacy on-premises relational database to a cloud-based data warehouse. The pipeline was mostly seamless, but one anomaly arose that required extra attention.
At one point in the data movement process, one of those processes failed and stopped loading any data at all, but the conditions around it were just right, so much so that the existing alerts and monitoring mechanisms never noticed anything was amiss. Rather than checking whether any records were actually flowing, the monitoring only verified that the process could perform its requests. So when the data failed to load, nothing was set up to flag the inconsistency.
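The gap can be illustrated with a minimal sketch. This is not the client’s actual pipeline; the table, column names, and SQLite stand-in for the warehouse are all assumptions for illustration. The point is the difference between “the load request succeeded” and “records actually arrived”:

```python
import sqlite3
from datetime import datetime

def records_loaded_since(conn, table, ts_col, since):
    """Count rows whose load timestamp is newer than `since` (a record-flow
    check, as opposed to only checking that the load request succeeded)."""
    cur = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {ts_col} > ?", (since,)
    )
    return cur.fetchone()[0]

# Simulate a run that "succeeded" (no errors) but delivered zero records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, loaded_at TEXT)")

cutoff = datetime(2024, 1, 1).isoformat()
count = records_loaded_since(conn, "orders", "loaded_at", cutoff)
if count == 0:
    print("ALERT: load reported success but delivered no records")
```

A check like this, run against the landed data itself, would have caught the silent failure that the request-level monitoring missed.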
The client and Maven Wave designed a Proof of Concept (PoC) to address the situation by observing the data once it had landed (in addition to maintaining the established processes that monitored other aspects of the data delivery pipeline).
Through the initial PoC — built on Google Cloud Platform — Maven Wave’s data engineering team designed a process to assess if a target data source was healthy or not healthy. While this PoC was built on Google Cloud, the idea is applicable across all cloud platforms and could benefit any organization with processes that move data around.
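As a rough sketch of what such a health assessment can look like (the function name, threshold, and logic here are illustrative assumptions, not Maven Wave’s implementation), a landed data source can be judged healthy when its newest record is recent enough:

```python
from datetime import datetime, timedelta

def assess_health(latest_record_ts, now, max_staleness):
    """Return 'healthy' if the most recent landed record falls within the
    allowed staleness window, otherwise 'not healthy'."""
    if latest_record_ts is None:  # nothing has ever landed
        return "not healthy"
    return "healthy" if now - latest_record_ts <= max_staleness else "not healthy"

now = datetime(2024, 1, 2, 12, 0)
print(assess_health(datetime(2024, 1, 2, 11, 30), now, timedelta(hours=1)))
print(assess_health(datetime(2024, 1, 1, 0, 0), now, timedelta(hours=1)))
```

Because the check only needs a timestamp from the target table, it works the same way on BigQuery, another cloud warehouse, or any queryable destination.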
The process Maven Wave built was able to address the challenge of gaps in data observability coverage.
During its first phase, the data observability PoC delivered tremendous benefits to the enterprise, including:
- Surfacing valuable data and insights for the enterprise
- Providing cost-efficiency (and adding significant value)
- Negating the need for time-consuming refactors, tests, and reviews because this process was implemented outside of pipeline logic
For Phase Two, Maven Wave was tasked with determining how feasible it would be to implement this process on every other data source the client was ingesting, and developed a simple viability assessment to answer that question. For a given data source, or table, to work with the process as-is, it needs at least one date/time field specific to the data itself to measure recency, and, preferably, another specific to the ingestion process (appended at ingestion time).
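The viability criteria above can be sketched in a few lines. The schema representation and tier names here are assumptions for illustration, not the actual assessment Maven Wave ran:

```python
DATETIME_TYPES = {"DATE", "DATETIME", "TIMESTAMP"}

def assess_viability(columns):
    """columns: list of (name, sql_type, role) tuples, where role is
    'data' for fields native to the record or 'ingestion' for fields
    appended at load time. A table is viable as-is if it has at least
    one data-specific date/time field; an ingestion timestamp is preferred."""
    dt_cols = [c for c in columns if c[1] in DATETIME_TYPES]
    has_data_ts = any(role == "data" for _, _, role in dt_cols)
    has_ingest_ts = any(role == "ingestion" for _, _, role in dt_cols)
    if has_data_ts and has_ingest_ts:
        return "viable (preferred)"
    if has_data_ts:
        return "viable"
    return "not viable as-is"

schema = [
    ("order_id", "INTEGER", "data"),
    ("order_date", "DATETIME", "data"),
    ("_ingested_at", "TIMESTAMP", "ingestion"),
]
print(assess_viability(schema))
```

Running a check like this against each candidate table’s schema makes it quick to see which sources the process can cover immediately and which would first need a timestamp added.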
While this PoC example was initially built with just one data source and platform in mind, it’s nearly universally applicable. Maven Wave is well suited to adapt this example to other use cases and customize data observability solutions to maintain a pipeline of healthy, insightful data, ensuring an organization’s data ingestion processes are optimized.
Ultimately, this partnership and its solution are a key example of how data observability is an important piece of the overall data collection journey. With the checks and balances that a data observability plan provides, data can be used for its intended purpose: driving informed business decisions and ensuring minimal downtime while gaining a competitive edge. In today’s data environment, there’s nothing more critical.
To learn more about data observability and find out more details about Maven Wave’s recent enterprise PoC, download Maven Wave’s eBook, “Data Observability: The Heartbeat of Healthy Data”.