6 questions to ask before implementing Data Mesh
Thinking of Data Mesh?
For those not familiar with Data Mesh (best described here), it prescribes a decentralised model of data ownership organised into business-defined domains, and provides a set of principles that enable data product outputs to be defined and agreed between those domains in order to meet the needs of the business.
This devolution, supported by an underlying shared ‘self-service’ data platform, is at the heart of ensuring that local (domain-centric) agility can be maintained long term.
There’s no doubt that this represents a significant shift in most organisations’ data strategies, or at least it would have done 2 years ago. For many, though, it offers an apparently faster way to put data at the heart of the organisation without the pain of managing central services.
However, Data Mesh is not without its own implications and costs, which need to be managed carefully. The technologies that support the first 3 principles, as described here, are relatively mature, so the main challenges an organisation is likely to face with this approach lie in governance and organisational change. To that end I propose 6 key questions to consider in advance of setting out on a Data Mesh journey.
The answers, one way or another, are not meant to discourage an organisation that is thinking of proceeding. Hopefully one or two will prove useful in helping teams plan ahead to deal with some of the complexity.
1. Scale and Complexity: Is your organisation complicated enough?
The first and most important question is one of complexity. There is ultimately nothing wrong with a centralised approach for the right sized and shaped organisation. For organisations whose data landscape is at the lower end of the complexity scale, the burden of operating a mesh approach to data may be materially greater than that of managing some concurrent demands on a central team.
This comes down to assessing the breadth of data (the scale and variety of inputs and generated data sets) and the depth of data use cases. The monolithic approach is challenged when teams try to do too much, with too much, and it is this that adds weight to the Data Mesh argument.
NB: Others, such as Barr Moses here propose other factors and an approach to scoring.
Ultimately only those within an organisation can assert that its complexity is sufficient for Data Mesh to be an appropriate response. For many this will be based on ‘knowing it when you see it’; for others, more analysis may be required to make the case.
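For those who do need more analysis, the breadth and depth assessment above can be turned into a rough checklist. The following is only a minimal sketch: the dimensions, the `DataLandscape` fields and the threshold values are all hypothetical choices made for illustration, and it is not the scoring approach proposed by Barr Moses or anyone else.

```python
from dataclasses import dataclass


@dataclass
class DataLandscape:
    # All fields are hypothetical inputs chosen for illustration.
    source_systems: int      # breadth: number of distinct data inputs
    generated_datasets: int  # breadth: data sets the organisation produces
    use_cases: int           # depth: distinct data use cases
    data_teams: int          # teams placing demands on the central team


# Assumed thresholds; every organisation would need to set its own.
DEFAULT_THRESHOLDS = {
    "source_systems": 20,
    "generated_datasets": 50,
    "use_cases": 30,
    "data_teams": 5,
}


def complexity_score(landscape, thresholds=DEFAULT_THRESHOLDS):
    """Return the fraction of dimensions (0.0 to 1.0) that exceed their threshold."""
    exceeded = [getattr(landscape, name) > limit for name, limit in thresholds.items()]
    return sum(exceeded) / len(exceeded)


small = DataLandscape(source_systems=5, generated_datasets=10, use_cases=8, data_teams=1)
large = DataLandscape(source_systems=60, generated_datasets=200, use_cases=45, data_teams=12)

print(complexity_score(small))  # 0.0
print(complexity_score(large))  # 1.0
```

A score near 1.0 would only suggest that the case for a mesh deserves a closer look; it is no substitute for the judgement of those within the organisation.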
2. Does the shape of the business look like an ‘efficient data mesh’ right now?
In Data Mesh the domains must align with the shape of the business now or in the future in order to function as required. Any misalignment will challenge both acceptance and operation of the mesh as well as add complexity to its governance and operation.
So this raises the questions: what are the natural lines of division between business domains where such expertise and understanding could be federated in this manner? Do they divide enough to create a natural mesh, or does the business structure fit a more centralised approach?
It’s reasonable to assume that organisations of suitable complexity probably do consist of many natural business domains of an appropriate size. For some, however, years of consolidation, restructuring and centralisation may mean that data flows between them are inefficient, and any domains built up around them will only suffer from the same complexity and cost.
It’s also likely that in many cases organisations may have naturally shaped themselves around exploiting the current IT estate and as such centralised around monoliths that are on the cards to be broken down in a new way.
This makes it necessary to consider not just the technical change when approaching the mesh, but also the domain data flows and any business-level changes that may be needed to operate effectively.
3. Does your organisation have a handle on good data governance?
Data Mesh requires a paradigm shift in governance towards a federated model in which domains are trusted to operate in a devolved manner, supported by automated governance tooling. For many this means an evolution over time from a mature but likely centralised decision-making community to a federated one. For organisations that have yet to achieve a level of maturity in data governance, however, the effort of moving to a well-functioning mesh should not be underestimated, as it can only come from a combination of awareness and understanding, a well-defined approach, cross-organisational trust and devolved empowerment.
The technologies that fully underpin the ‘federated computational governance’ principle of the data mesh at an appropriate scale are not yet prevalent enough, or in some cases even in existence, to prevent this from being a people-heavy requirement in the short to medium term (in my opinion; please comment if you disagree, as I’d love to hear how others are achieving this).
This is especially the case in organisations that fall under complex regulatory or legal regimes, where the need for effective and demonstrable compliance across domains makes this strong focus even more necessary.
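To make the idea of automated governance tooling a little more concrete, here is a minimal sketch of globally agreed policies being checked against descriptors that each domain publishes for its data products. The `DataProduct` fields, the two policies and the example products are all hypothetical and exist only for illustration; real tooling would need to cover far more than this.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class DataProduct:
    # Hypothetical descriptor that each domain publishes for a data product.
    name: str
    domain: str
    owner: str = ""
    contains_personal_data: bool = False
    retention_days: Optional[int] = None


# A policy returns a violation message, or None if the product complies.
Policy = Callable[[DataProduct], Optional[str]]


def require_owner(product):
    return None if product.owner else "no named owner"


def require_retention_for_personal_data(product):
    if product.contains_personal_data and product.retention_days is None:
        return "personal data without a retention period"
    return None


# Assumed set of policies agreed across domains and applied to every product.
GLOBAL_POLICIES: List[Policy] = [require_owner, require_retention_for_personal_data]


def violations(product, policies=GLOBAL_POLICIES):
    """Run every policy against one data product and collect the failures."""
    results = (policy(product) for policy in policies)
    return [message for message in results if message is not None]


orders = DataProduct("orders", "sales", owner="sales-data-team")
customers = DataProduct("customers", "crm", contains_personal_data=True)

print(violations(orders))     # []
print(violations(customers))  # ['no named owner', 'personal data without a retention period']
```

The policies are defined once and applied everywhere, while each domain remains responsible for its own descriptors. Deciding what the policies should be, and building the trust to enforce them, remains the people-heavy part.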
4. Is the organisation’s likely Data Mesh approach flexible and agile enough itself?
Are there any short to medium term imperatives on the horizon that may affect the approach, or even the end point of the plan? Will they cause the focus to shift before the vision can be realised?
The Data Mesh is not a vision that can be realised in a short time window (assuming the organisation is complex enough, that is), so as a medium to long term journey it requires ongoing focus and commitment. Even then, its alignment with the business must be facilitated long term.
New initiatives, legislative changes, material changes to business structure and so on can send shockwaves through any organisation, and reacting to these quickly is imperative both to ensure the vision is realised and to maintain an effective Data Mesh over time. This may require, for instance, restructuring or creating domains, refactoring data products or agreeing new data contracts in short order.
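As an illustration of what agreeing a new data contract might involve at its simplest, the sketch below treats a contract as a set of agreed field names and types that a consuming domain can check records against. The `ORDER_CONTRACT` fields and the sample records are hypothetical, and real contracts typically also cover semantics, quality and service levels.

```python
# Hypothetical contract between a producing and a consuming domain:
# the agreed field names and the Python type each value must have.
ORDER_CONTRACT = {"order_id": str, "amount": float, "currency": str}


def contract_violations(record, contract):
    """List the ways in which a record fails to satisfy the contract."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in record:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(record[field_name], expected_type):
            actual = type(record[field_name]).__name__
            problems.append(f"{field_name}: expected {expected_type.__name__}, got {actual}")
    return problems


good = {"order_id": "A1", "amount": 9.5, "currency": "GBP"}
bad = {"order_id": "A1", "amount": "9.5"}

print(contract_violations(good, ORDER_CONTRACT))  # []
print(contract_violations(bad, ORDER_CONTRACT))
# ['amount: expected float, got str', 'missing field: currency']
```

Keeping contracts this explicit is one way to make a renegotiation quick when a business change forces a domain to be reshaped.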
5. How far away from ‘mesh like’ is your current data and technical architecture?
In essence this is a reflection on two things. The first is how far the journey will be from the present to the mesh, given existing data silos and the need for data to be migrated, partitioned differently and secured in line with the domains. The second is the potential for an incremental approach to implementation, by marrying up natural fault lines with current or immediate business need.
This sweet spot between accessible data and customer ask could pave the way for early incremental benefit to be delivered and for the approach to become better embedded with those making investment decisions.
More broadly this assessment should also consider the level of refactoring and the options to partition the current estate as well as any implications of ‘legacy’ tech that might affect the roadmap priorities.
NB: Dealing with legacy is probably the subject of a whole article in itself, which I’ll hopefully get to soon if there’s any interest, as in my opinion this approach requires an early focus on the physical design for data partitioning.
Finally, this is also an opportunity to review the organisation’s technical strategy in order to begin the process of embedding Data Mesh supporting technologies where necessary to make their multi-domain adoption through the shared technology platform more straightforward.
6. How “data mesh” do you really want to be, over what timescale?
By far the biggest question here, but left until last, as it’s only by factoring in the answers to the questions above that it feels reasonable to begin considering the best approach for the organisation.
In short, how much of the technical landscape will transition to the mesh, over what timescale and in which order?
Does the mesh extend to wider functions such as HR and Finance, for instance, or just to the business’ core operational activities? Does the mesh operate globally or by region? Does the mesh extend between a conglomerate’s sub-organisations?
This destination only represents half of the answer; the approach is the other half.
Logically speaking, a big bang data mesh approach is likely to be a non-starter for all but the newest start-ups with ambitions of scale. Most are likely to take an approach which centres around incremental domain adoption, either independently or in mutually beneficial groups (as cross-domain interactions may achieve the bigger wins for many organisations).
Alternatively, here, Sven Balnojan discriminates between distributed ownership of raw data and that of transformed data, which potentially supports an ‘analytics first’ approach, as the basis for transitioning operational services once the domains are established.
Finally, it is necessary to consider how an abrupt change that leads to a pause in the roadmap would affect the business. Does the destination have to be reached to avoid leaving a challenging set of technical debt behind? If so, is there another approach or focus that would minimise this if something happened that prevented the ‘full’ mesh from being realised? Can each roadmap step be a valid endpoint in itself?
Conclusion
I hope the above offers some food for thought. In the right circumstances, and with the right levels of ongoing focus, governance and agility, Data Mesh represents an opportunity for many organisations to achieve a vision for data that has become unrealistic under a centralised approach.
For many I hope an early focus on appropriateness (complexity and shape of business), governance, agility, brownfield complexity and scale of ambition can help unlock this outcome.