The Missing Intermediary Data Layer

Before you were drowning in buzzwords like data hubs, lakes, fabric, and mesh, hands-on analysts created their own “reusable” data sources. For me, a decade ago, the way to build these datasets was to repurpose SQL queries that I could slightly “tweak” for various analytics projects. When I got into network optimization modeling, I would save datasets I had to custom-create (like transportation costs per mile, factoring in stops) to reuse in other types of analysis. The gist is that there has always been a need for an intermediate layer of meaningful datasets.

But streamlining the process of building these intermediate, reusable data assets, and monetizing them, is a new concept for many.

Let us set the monetization aspect aside, though, and focus on reusability here. The challenge of figuring out the best way to share data across the enterprise has pestered us for decades. We keep trying new things, including the most recent sociotechnical architecture of data mesh.

But at the core of all this is the need to leverage data for insights. Finance would not need transactional data for barcode movements in a warehouse. They would love to have the same data in a form that helps them draw insights. For example, operations finance analysts will be very interested in units per labor hour in a specific warehouse. This metric can be calculated from transactions, but not in a straightforward way.
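To make “not straightforward” concrete, here is a minimal sketch in Python (pandas). The scan-event and labor-shift extracts, and all table and column names, are hypothetical illustrations, not actual SAP or WMS fields:

```python
import pandas as pd

# Hypothetical transactional extracts; names are illustrative only.
scan_events = pd.DataFrame({
    "warehouse": ["WH1", "WH1", "WH2", "WH1"],
    "event_type": ["pick", "pick", "pick", "putaway"],
    "units": [12, 8, 20, 5],
    "timestamp": pd.to_datetime([
        "2024-01-08 06:15", "2024-01-08 07:40",
        "2024-01-08 06:50", "2024-01-08 08:05",
    ]),
})

labor_shifts = pd.DataFrame({
    "warehouse": ["WH1", "WH2"],
    "clock_in": pd.to_datetime(["2024-01-08 06:00", "2024-01-08 06:00"]),
    "clock_out": pd.to_datetime(["2024-01-08 14:00", "2024-01-08 14:00"]),
})

# Total units handled per warehouse. Even deciding which event types
# count as "units handled" is a judgment call hidden in this metric.
units = (scan_events[scan_events["event_type"] == "pick"]
         .groupby("warehouse")["units"].sum())

# Total labor hours per warehouse, derived from shift clock data.
labor_shifts["hours"] = (
    (labor_shifts["clock_out"] - labor_shifts["clock_in"])
    .dt.total_seconds() / 3600
)
hours = labor_shifts.groupby("warehouse")["hours"].sum()

units_per_labor_hour = (units / hours).rename("units_per_labor_hour")
print(units_per_labor_hour)
```

Nothing here is hard individually, but every analyst who needs the metric ends up re-deciding the same business rules from scratch.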

I remember writing a VBA script over a decade ago (open source was not a craze back then) to “translate” data from the ERP into a weekly warehouse metrics reporting dashboard. All in Excel. So much data was getting pumped into some workbook tabs that the file crashed almost weekly. Every Monday morning, I ran the script early, expecting the workbook to crash multiple times. I always wondered how easy life would be if there were datasets of already-“massaged” SAP data that I could leverage for the Excel-based dashboard.

Interestingly, while technology has evolved and dashboards can be built easily with no-code dashboarding solutions, the core challenge is still the same: managing the data that feeds into these dashboards.

While these dashboards do not crash like my Excel report used to, analysts still do a lot of data wrangling to make the best use of these tools. So from an analytics and data science perspective, the core challenge persists.

In machine learning, we introduced ML engineers to handle this issue (though not precisely the intermediate-data-layer issue). “Lovingly” called the plumbers of data science, this breed of helpful folks makes a significant contribution. They take all the mess of transactional data and flow it to analysts and scientists, improving data quality, structure, and meaningfulness along the way. But with due respect to all the dirty work they do, the mere fact that we need more and more ML engineers with every passing day validates that we have not yet addressed the core issue.

In very simple terms, we need an intermediary layer of data sources, derived from the underlying source systems, that can be used cross-functionally for collaboration and analytics. And this layer needs to be strategically placed in your data architecture.
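As a rough sketch of what this layer can look like in practice (reusing the hypothetical extracts from the earlier snippet, with illustrative names throughout), the idea is to encode the business rules once and publish a curated dataset that every team reuses:

```python
import pandas as pd

def build_warehouse_productivity(raw_scans: pd.DataFrame,
                                 raw_shifts: pd.DataFrame) -> pd.DataFrame:
    """Curated, reusable dataset: weekly units per labor hour by warehouse.

    The business rules (which events count, how hours are measured) live
    here once, instead of being re-implemented in every dashboard, model,
    or spreadsheet.
    """
    picks = raw_scans[raw_scans["event_type"] == "pick"].copy()
    picks["week"] = picks["timestamp"].dt.to_period("W").astype(str)
    units = picks.groupby(["warehouse", "week"])["units"].sum()

    shifts = raw_shifts.copy()
    shifts["week"] = shifts["clock_in"].dt.to_period("W").astype(str)
    shifts["hours"] = ((shifts["clock_out"] - shifts["clock_in"])
                       .dt.total_seconds() / 3600)
    hours = shifts.groupby(["warehouse", "week"])["hours"].sum()

    return (units / hours).rename("units_per_labor_hour").reset_index()

# Publish once to a shared location (path is illustrative); finance,
# operations, and data science all read the same curated table instead
# of each wrangling the raw transaction feed:
# build_warehouse_productivity(scan_events, labor_shifts).to_parquet(
#     "s3://analytics/curated/warehouse_productivity.parquet")
```

The point is less the code than the placement: the raw feed stays where it is, and the curated table becomes the shared asset that sits between source systems and consumers.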

I see companies offering products with labels like data hubs, fabrics, and mesh. Many of these do touch on collaboration and “exchangeable” datasets. But take an honest look at how organizations are actually leveraging these products.

Going back and forth between decentralization and centralization, organizations repeatedly use different products to build the same thing. At the core of our problem is not technology but the lack of a strategic way to leverage it.

I see organizations that use architectures like data mesh, meant for decentralized exchange and collaboration, to pool data in one or a few locations to be reused. This hardly differs from the decades-old approach of building data warehouses to pool transactional data. They have their hands on a sociotechnical architecture that they can tweak to fit their needs. Yet they leverage it to recreate the old way.

In my next article on this topic (later this week), I will share how the data mesh concept can be customized to build data architectures that address your unique business nuances.

