Data readiness for AI: Driving business value
AI is powered by data; however, that data must be of high quality to deliver the business value organisations expect from AI.
According to Gartner, 52% of AI projects never make it to production, with 39% citing data-related issues as a major barrier to AI adoption, making high-quality, well-curated data the top priority for driving business value.
Now that prebuilt AI models have made AI technologies far more accessible, organisations are shifting their focus towards ensuring their data is AI-ready. However, they face many data challenges:

- Data readiness: The availability of data, and its readiness for AI, can only be assessed against how the data will be used in a specific AI use case. For example, building a predictive maintenance algorithm and applying generative AI (GenAI) to enterprise data require very different sets of data.
- Data quality: Traditional high-quality data does not equate to AI-ready data. When data is used for analytics, for example, outliers are typically removed and the data cleansed to meet the expectations of human analysts. Yet when training an algorithm, representative data is needed, and that may include the poor-quality records that would otherwise be discarded.
- Rigid data governance: The fast-paced nature of AI requires an even more sophisticated approach to governance. Models need to be retrained and updated with new data in some cases; in others, fine-tuning or reinforcement can target specific points rather than full retraining. Traditional governance approaches, such as periodic auditing, rigid rules or manual data processes, are inadequate for modern AI systems and struggle to keep up with their speed and agility. In this LLM era, guardrails are being deployed as part of the AI operating model (a minimal illustration follows this list). These measures are critical for maintaining the integrity, safety, security and trustworthiness of AI systems, and incorporating them can minimise the risks associated with automatic content generation and ensure that AI actions are responsible and aligned with ethical standards.
- Unstructured data: Traditional data management approaches are often designed for structured data and predefined schemas. AI, on the other hand, frequently deals with unstructured data like text, images and audio, requiring more flexible and adaptable systems.
- Fragmented technology stacks: Separate tools and systems for data storage, processing and model development create silos and hinder the collaboration and scalability needed for effective data management and AI at scale. This fragmentation makes it difficult to unlock insights from data and deploy AI solutions effectively across the organisation.
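To make the idea of guardrails more concrete, the sketch below shows a deliberately simple, rule-based output check in Python. It is illustrative only: the patterns and the blocked-topic list are hypothetical, and production guardrails typically rely on dedicated safety classifiers and policy engines rather than string matching.

```python
import re

# Illustrative output guardrail: withhold responses that appear to contain
# personal data or a restricted topic before they reach the user. Real
# deployments combine many such checks as part of the AI operating model.

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
BLOCKED_TOPICS = ("credit card", "password")  # hypothetical policy list

def apply_guardrail(model_output: str) -> str:
    """Return the model output, or a refusal if it breaches a simple policy."""
    if EMAIL_PATTERN.search(model_output):
        return "Response withheld: possible personal data detected."
    if any(topic in model_output.lower() for topic in BLOCKED_TOPICS):
        return "Response withheld: restricted topic detected."
    return model_output

print(apply_guardrail("Contact me at jane.doe@example.com"))
```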
Modern data management
While traditional data management approaches have their strengths, they may not be suited to the unique demands of AI. Organisations need to adopt modern data management strategies that are flexible, scalable and capable of handling the diverse data types and ethical considerations needed to fully leverage the potential of AI. Proving that your data is AI-ready rests on three practices:
1. Align data with the use case
Every AI use case must state the data it needs for production, which depends on the AI technique being used. The following parameters will help ensure your data meets the expectations of its AI use case:
- AI techniques: Each technique has specific data requirements. For example, generative AI sets very different data requirements from a simulation model.
- Quantification & suitability: There must be enough data for the specific AI use case, for example training data spanning multiple years if there is a seasonality pattern. Synthetic data can be considered as part of a remediation plan to augment existing data if required.
- Quality: While there might be a sufficient volume of data, any data must meet the specific quality requirements of its AI use case.
- Enrichment: Annotating and labelling data can help increase the accuracy of the AI model. This is central to supporting fine-tuning and retrieval-augmented generation (RAG) in generative AI.
- Tracking: Using reliable data sources and tracking data lineage are crucial for building trust in your AI models. Transparency about your data sources allows you to identify potential biases or errors, and by understanding the journey of your data from its source to its use in AI models, you can build trust in the model's outputs and make informed decisions. Tracking data lineage also supports compliance with data privacy regulations and the responsible use of your AI models.
- Diversity: Considering a wide range of factors such as demographics, geographic location and cultural background helps build a diverse dataset. This helps AI models learn more robust patterns and make more accurate predictions, while mitigating the risk of perpetuating biases present in the training data. It is helpful to ensure the data is representative of the real-world scenarios that the AI model will encounter (a simple representativeness check is sketched below).
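As a concrete illustration of the diversity point above, the sketch below compares the demographic mix of a dataset against an assumed reference distribution. The column name, the groups and the 5-percentage-point threshold are hypothetical placeholders to be replaced with values appropriate to your use case; pandas is assumed to be available.

```python
import pandas as pd

# Illustrative only: compare the demographic mix of a training set against a
# reference distribution for the population the model will serve.

REFERENCE_SHARE = {"18-34": 0.35, "35-54": 0.40, "55+": 0.25}  # assumed target mix
MAX_GAP = 0.05  # flag groups that deviate by more than 5 percentage points

def check_representativeness(df: pd.DataFrame, column: str = "age_band") -> dict:
    """Return the share gap per group between the dataset and the reference."""
    observed = df[column].value_counts(normalize=True)
    gaps = {group: float(observed.get(group, 0.0) - share)
            for group, share in REFERENCE_SHARE.items()}
    flagged = {group: gap for group, gap in gaps.items() if abs(gap) > MAX_GAP}
    return {"gaps": gaps, "flagged": flagged}

sample = pd.DataFrame({"age_band": ["18-34"] * 70 + ["35-54"] * 20 + ["55+"] * 10})
print(check_representativeness(sample))
```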
2. Continuously qualify the data use to meet confidence requirements
Ensuring your data is AI-ready is not a one-off exercise; it is a continuous task of qualifying the data used for training, developing and running AI models in operations. MLOps principles can help establish a robust framework for this continuous qualification by addressing key areas:
- Validation & verification: Data should be regularly checked against the requirements of its associated AI use case, such as data quality, schema validation and evaluating the data against the real-world scenarios it represents. MLOps practices can help automate these checks and integrate them into your delivery pipelines.
- Performance maintenance: AI systems should continuously meet operational requirements such as response time, uptime and cost-efficiency. MLOps tools can monitor performance and trigger alerts if key metrics deviate from expected thresholds.
- Version control: Keeping track of different versions of your data, models, code and configurations allows you to roll back to previous versions when needed and supports auditing and reproducibility. MLOps platforms often provide built-in version control for all components of your AI system.
- Continuous testing: Robust testing strategies, including regression testing, can catch issues like model drift and performance degradation. MLOps can automate these tests and provide insights into model behaviour over time.
- System health: Observability metrics and monitoring tools can track the health of your AI systems, including data quality metrics, model performance metrics and infrastructure health. MLOps dashboards can visualise these metrics and provide insights into potential issues.
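To make the validation and drift checks above more tangible, here is a minimal sketch of the kind of automated test that could run in a delivery pipeline, assuming pandas and SciPy are available. The expected schema, the feature being compared and the thresholds are hypothetical and would come from the requirements of the specific AI use case.

```python
import pandas as pd
from scipy.stats import ks_2samp

# Illustrative continuous-qualification checks:
# 1) validate an incoming batch against an expected schema, and
# 2) compare a key feature's distribution with the training data to spot drift.

EXPECTED_SCHEMA = {"customer_id": "int64", "spend": "float64"}  # hypothetical
DRIFT_P_VALUE = 0.01  # below this, flag the feature as drifted

def validate_schema(batch: pd.DataFrame) -> list[str]:
    """Return a list of schema problems (an empty list means the batch passes)."""
    issues = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in batch.columns:
            issues.append(f"missing column: {col}")
        elif str(batch[col].dtype) != dtype:
            issues.append(f"{col}: expected {dtype}, got {batch[col].dtype}")
    return issues

def check_drift(training: pd.Series, live: pd.Series) -> bool:
    """Flag drift when the two samples are unlikely to share a distribution."""
    statistic, p_value = ks_2samp(training, live)
    return p_value < DRIFT_P_VALUE

train = pd.Series([10.0, 12.0, 11.5, 9.8, 10.4] * 20)
live = pd.Series([18.0, 19.5, 20.1, 17.8, 18.9] * 20)
print(validate_schema(pd.DataFrame({"customer_id": [1, 2], "spend": [10.0, 12.0]})))
print("drift detected:", check_drift(train, live))
```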
3. Govern the data in the context of its use case
Data governance for AI is not a one-size-fits-all proposition. It requires a contextual approach that adapts to the unique needs and risks of each AI use case. This means:
- Tailoring policies: Data governance policies should be dynamic so that they remain appropriate for each AI application. Consider the sensitivity of the data, the potential impact of the AI system, the validity of the data and the ethical implications, without stifling innovation or the benefits to employees and customers.
- Considering ethics: Fairness, accountability and transparency must always be embedded into your data governance framework. Ensure your AI systems are used responsibly, do not perpetuate harmful biases, and can be contested and explained.
- Adhering to regulations: AI regulations, such as the EU AI Act, are being implemented. These new regulations will add to existing regulatory and compliance requirements on data, such as the General Data Protection Regulation (GDPR). The overall impact of the various regulations will vary with the use case.
- Controlling inference and derivation: In many AI systems, the output of one model is used as the input to another. Enforcing governance requirements to track inference and derivation is an important consideration that is often overlooked, yet neglecting it creates a real risk because it amplifies the black-box effect of a model.
- Mitigating data bias: This is an AI-specific challenge, as bias in the data will be reflected in the model. For example, training data that includes only a single gender may lead to biased results when the model processes data for a different gender. Data should be governed to proactively anticipate and mitigate bias, including using adversarial datasets to test models for bias (a simple check of this kind is sketched below).
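As one example of the kind of bias test mentioned above, the sketch below compares the model's positive-outcome rate across groups in an evaluation or adversarial dataset, a simple demographic-parity style check. The column names and the acceptable gap are hypothetical and should be agreed per use case; this is only one narrow measure of bias among many.

```python
import pandas as pd

# Illustrative bias check: compare the model's positive-outcome rate across
# groups in an evaluation (or adversarial) dataset.

MAX_RATE_GAP = 0.10  # flag if positive rates differ by more than 10 points

def selection_rate_gap(results: pd.DataFrame,
                       group_col: str = "gender",
                       prediction_col: str = "approved") -> float:
    """Return the largest gap in positive prediction rate between groups."""
    rates = results.groupby(group_col)[prediction_col].mean()
    return float(rates.max() - rates.min())

evaluation = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M"],
    "approved": [1, 0, 0, 1, 1, 1],
})
gap = selection_rate_gap(evaluation)
print(f"selection-rate gap: {gap:.2f}", "- investigate" if gap > MAX_RATE_GAP else "- ok")
```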
By adopting a contextual approach to data governance, you can ensure that your AI initiatives are built on a foundation of trust, responsibility, and ethical data practices. This will help maximise the value of AI while minimising potential risks.
Modern data infrastructure
While the right data is crucial for AI, it is equally important to have the right infrastructure in place to manage and process that data effectively. For many organisations, data analytics and AI workloads have been executed on fragmented technology stacks, with separate tools and systems for data storage, processing, model development and model management. This fragmentation hinders the ability to capture the “value of repeatability” when developing AI products, and its inefficiency puts a ceiling on the business benefit that can be gained from the effort of creating AI solutions.
"Up to 80% of a data analyst's time is spent discovering and preparing data due to siloed data and poor data quality."
Recognising this challenge, hyperscalers such as Microsoft, Amazon, and Google are investing heavily in modern data analytics estates that provide unified and integrated platforms for managing the entire data lifecycle.
These data platforms offer some key advantages: a single environment for managing the entire data lifecycle, less fragmentation to work around, and a faster path from data to AI at scale.
A good example of this trend is Microsoft Fabric. This end-to-end analytics platform provides a unified environment for data engineering, data integration, data warehousing, data science and real-time analytics. By consolidating these capabilities, Microsoft Fabric helps organisations break down data silos, improve collaboration and accelerate AI initiatives.
By leveraging modern data infrastructure like Microsoft Fabric, organisations can streamline their data operations, improve data accessibility, and accelerate their AI journey. This empowers them to extract maximum value from their data and drive innovation across the business.
By applying the practical approaches above to your data, your organisation can fully realise the potential of AI and deliver the business value you expect.
If you would like to learn more about overcoming your data challenges, check out our upcoming events.