Your data strategy won’t create competitive advantage. These four moves will 

Published March 6, 2026 in Strategy • 10 min read

The value of data lies in combining and analyzing a proprietary strategic core with models that predict outcomes and prescribe responses, argues Misiek Piskorski.

Ask any executive what “data strategy” requires and you will get a predictable answer. Breaking down silos. Cleaning up legacy databases. Establishing governance. Appointing a chief data officer. Ensuring compliance. None of this is wrong, but these are maintenance activities. They keep the lights on. They do not create competitive advantage.

The organizations pulling ahead in AI implementation are playing an entirely different game. They are assembling assets that their competitors cannot replicate: a proprietary corpus of organizational knowledge, pipelines that blend internal data with external sources in novel combinations, and AI agents that process it at speeds that were unthinkable two years ago. Together, these three moves build what I call the strategic data core. Deploy models on top of it, capture where they fail, and feed the errors back in: that error data becomes a fourth data source that improves the models. With all four moves, you have a compounding advantage that widens over time.

Move 1: Build your corpus. You already have the data – you just cannot find it

Most organizations have no idea what they collectively know. Every company generates enormous volumes of unstructured knowledge: strategy documents, client proposals, research reports, internal memos, sales call transcripts, customer service logs, patent filings, regulatory submissions. This material represents decades of accumulated organizational intelligence. In most companies, it sits in disconnected drives, email archives, and the heads of long-tenured employees who may leave tomorrow.

In the age of generative AI, this unstructured knowledge has become the most strategically valuable data an organization possesses. Through retrieval-augmented generation (RAG), an organization can build AI systems that draw on its full accumulated knowledge when answering questions, drafting documents, or supporting decisions. The difference between a generic AI assistant and one that understands your company’s products, clients, competitive position, and institutional memory is the difference between a tool and a colleague.
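
To see the mechanics, here is a minimal sketch of the retrieval step behind RAG, with TF-IDF similarity standing in for the learned embeddings a production system would use. The corpus and query are invented for illustration:

```python
# Minimal RAG retrieval sketch: find the documents most relevant to a
# question, then prepend them to the model prompt. TF-IDF stands in
# for learned embeddings; the corpus is invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "2023 proposal for Acme Corp: pricing model and rollout plan.",
    "Sales call transcript: Acme raised concerns about data residency.",
    "Internal memo: lessons learned from the Northwind integration.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(corpus)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k corpus documents most similar to the question."""
    scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
    return [corpus[i] for i in scores.argsort()[::-1][:k]]

context = retrieve("What did Acme say about data residency?")
prompt = "Answer using only this context:\n" + "\n".join(context)
print(prompt)  # this prompt, plus the question, would go to the model
```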

But dumping documents into a vector database is not enough. The most sophisticated organizations are building knowledge graphs: structured representations of the entities, concepts, and relationships embedded in their unstructured data. A knowledge graph does not just store a client proposal. It maps the relationships between the client, the industry, the products, the competitors, and the regulatory constraints. GraphRAG, which combines knowledge graphs with RAG, is pushing enterprise AI accuracy toward 99% in high-stakes domains, compared to 70–80% for standard RAG.
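
As a toy illustration of the difference, the sketch below stores a few invented entities and typed relationships as a graph. A GraphRAG-style query can then expand outward from an entity rather than matching text alone:

```python
# Toy knowledge graph: invented entities as nodes, typed relationships
# as edge keys. A GraphRAG-style query expands outward from an entity
# instead of matching raw text.
import networkx as nx

g = nx.MultiDiGraph()
g.add_edge("Acme Corp", "Retail", key="operates_in")
g.add_edge("Acme Corp", "Proposal-2023-117", key="subject_of")
g.add_edge("Proposal-2023-117", "PaymentsProduct", key="covers")
g.add_edge("PaymentsProduct", "PSD2", key="regulated_by")

# One hop out from the client pulls in its industry and documents;
# further hops reach products and regulatory constraints.
for _, neighbor, relation in g.out_edges("Acme Corp", keys=True):
    print(f"Acme Corp --{relation}--> {neighbor}")
```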

LinkedIn, for example, built a knowledge graph across its customer service operations and combined it with RAG. The result was a 78% improvement in accuracy and a 29% reduction in median resolution time per issue. The system succeeded not because LinkedIn used a better language model, but because the model could draw on a structured, queryable representation of the company’s own knowledge.

Building a corpus is harder than it sounds, and not for technical reasons. Knowledge is power, and people hoard it. Departments maintain their own repositories. Critical institutional knowledge lives in the heads of senior people who have never been asked to document it. Building a corpus requires executive leadership because it requires changing incentives: mandating that client interactions get transcribed, that project learnings get documented, that institutional knowledge gets captured before people retire or leave.

The companies that do this well will have AI systems that are genuinely differentiated. Not because they use a different model, since everyone has access to the same foundation models, but because their AI draws on a proprietary body of knowledge that no one else possesses.

But once you see what you know, you also see what you are missing.

Move 2: Build the pipeline. Look outside your walls

Most companies are trying to build AI on their own data alone. It is not enough. The most interesting AI applications emerge at the intersection of internal and external data. A retailer’s sales data becomes dramatically more powerful when combined with weather patterns, foot traffic data, and local event schedules. A pharmaceutical company’s clinical data becomes more predictive when enriched with genomic databases and real-world evidence. An insurer’s claims data becomes a different asset entirely when layered with satellite imagery and IoT sensor feeds. The landscape has changed: data marketplaces like Snowflake, Databricks, and AWS Data Exchange now host thousands of ready-to-use datasets that can be plugged directly into an organization’s analytics environment. These are not obscure technical platforms. They are becoming standard infrastructure for data-forward companies.
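
In practice, "plugging in" external data often means nothing more exotic than a join on shared keys. The sketch below, with invented tables, enriches internal sales records with a licensed weather feed:

```python
# Invented example of enriching internal data with an external feed:
# internal point-of-sale records joined to a licensed weather dataset
# on shared keys (date and region).
import pandas as pd

sales = pd.DataFrame({
    "date": ["2026-03-01", "2026-03-01", "2026-03-02"],
    "region": ["NE", "SW", "NE"],
    "units_sold": [120, 95, 210],
})

weather = pd.DataFrame({  # e.g., sourced from a data marketplace
    "date": ["2026-03-01", "2026-03-02"],
    "region": ["NE", "NE"],
    "storm_warning": [False, True],
})

# The merged table gives a demand model signals the internal data
# alone could never supply.
enriched = sales.merge(weather, on=["date", "region"], how="left")
print(enriched)
```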

Walmart illustrates what becomes possible. The company’s AI systems ingest data from point-of-sale transactions across thousands of stores alongside weather forecasts, local event calendars, social media trends, and supplier lead times. The result is demand forecasting at the store and SKU level that adjusts in real time. When a storm warning is issued in a region, the system anticipates spikes in emergency supplies and triggers replenishment before the storm arrives. The results: a 16% reduction in stockouts, a 10% improvement in inventory turnover, and a 10% reduction in logistics costs. The value came not from collecting more internal data, but from combining data sources that had never been connected before.

The real frontier is data collaboration. Data clean rooms allow two or more organizations to combine and analyze their data without either party seeing the other’s raw data. International Data Corporation (IDC) predicts that, by 2028, 60% of enterprises will collaborate through private exchanges or clean rooms. And clean rooms are getting smarter: AWS recently launched the ability to generate synthetic datasets inside clean rooms, artificial data that preserves the statistical properties of real data without containing any actual records. Organizations can now train AI models without privacy exposure and simulate scenarios they have never encountered.
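
The underlying idea is simple to sketch: fit a distribution to real data, then sample fresh records that match its statistics without containing any actual row. The toy version below handles a single column; real clean-room tools model full joint distributions:

```python
# Toy version of the synthetic-data idea: fit a distribution to real
# values, then sample new records that match its statistics without
# reproducing any actual row.
import numpy as np

rng = np.random.default_rng(0)
real_claims = rng.lognormal(mean=8.0, sigma=1.2, size=10_000)  # stand-in for real data

# Fit the log-normal parameters from the "real" data...
mu, sigma = np.log(real_claims).mean(), np.log(real_claims).std()

# ...then sample synthetic claims that match the distribution.
synthetic_claims = rng.lognormal(mean=mu, sigma=sigma, size=10_000)

print(f"real mean: {real_claims.mean():,.0f}")
print(f"synthetic mean: {synthetic_claims.mean():,.0f}")
```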

So, as you think about your data strategy, ask: What data do we lack that, combined with what we already have, would create disproportionate value?

With the internal corpus and the external pipeline in place, however, the organization faces an avalanche of data in wildly different formats and quality levels. This is why the third move matters.

Move 3: Let agents clean your data

Data professionals spend most of their time cleaning, standardizing, and validating data rather than analyzing it. AI agents are changing this equation. The new generation of AI-powered tools can detect duplicates, identify anomalies, standardize formats, and flag inconsistencies across massive datasets. The real shift is deploying agents that continuously monitor and maintain data quality as an ongoing operation, not a quarterly cleanup.
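
A minimal sketch of the kind of checks such an agent might run continuously, here as a single pass over an invented table:

```python
# Invented example of rule-based quality checks an agent might run on
# a schedule: duplicates, missing values, and simple 3-sigma outliers.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Summarize basic data-quality issues in one pass."""
    numeric = df.select_dtypes("number")
    z = (numeric - numeric.mean()) / numeric.std()
    return {
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isna().sum().to_dict(),
        "outliers_by_column": (z.abs() > 3).sum().to_dict(),
    }

trades = pd.DataFrame({
    "trade_id": [101, 102, 102, 104],
    "notional": [1_000.0, 1_050.0, 1_050.0, 2_500.0],
    "desk": ["rates", "fx", "fx", None],
})
print(quality_report(trades))  # flags one duplicate row and one missing desk
```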

NatWest Markets offers a compelling example. The bank deployed AI-driven data quality tools that continuously monitor its financial data estate, automatically setting rules for completeness, accuracy, and consistency across departments. Previously, the data management team spent six months or more building pipelines to extract, check, and report on data quality. With the automated system, that setup time dropped to a third of what it had been, and data quality insights shifted from periodic manual checks to daily, on-demand reporting. As the bank’s program manager put it: “Regulators need to know we understand our data. We needed a mechanism to observe our data in the most efficient way possible.”

This inverts decades of failed data governance. The old approach was top-down: establish standards, create a chief data officer role, hope people comply. The agent-driven approach flips this: you deploy agents that enforce standards automatically, in real time, without slowing anyone down. Agents are what make the first two moves feasible at scale.

The strategic data core

These three moves build a single asset: the strategic data core. The corpus captures what the organization knows. The pipeline enriches it with what the organization lacks. The agents keep it clean, integrated, and current. Together, they create a unified, continuously maintained body of data that is proprietary, defensible, and ready for AI to use. Most organizations do not have this. They have scattered datasets, departmental repositories, and one-off integrations. A strategic data core is something fundamentally different: a deliberate, curated, organization-wide asset that compounds in value over time.

But the core is not the end. It is the foundation.

Move 4: Deploy models for predictions and prescriptions

The strategic data core exists to feed models. And models exist to do two things: predict what will happen and prescribe what to do about it. A predictive model forecasts which customers will churn, which supply routes will face disruption, or which loan applicants carry hidden risk. A prescriptive model goes further: it recommends a specific retention offer, reroutes shipments before disruption hits, or suggests adjusted loan terms that balance risk and revenue. The richer and more integrated the data core, the more accurate and actionable these outputs become.
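
The pattern is easy to see in miniature. The sketch below trains a toy churn classifier and maps its probability to a concrete action; the features, thresholds, and offers are all invented:

```python
# Toy predict-then-prescribe loop: a churn probability from a small
# logistic model is mapped to a concrete action. Features, thresholds,
# and offers are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [months_as_customer, support_tickets_last_90_days]
X = np.array([[36, 0], [24, 1], [3, 4], [6, 5], [48, 0], [2, 6]])
y = np.array([0, 0, 1, 1, 0, 1])  # 1 = churned

model = LogisticRegression().fit(X, y)

def prescribe(customer: list[float]) -> str:
    """Turn a prediction into a recommended action."""
    p = model.predict_proba([customer])[0, 1]
    if p > 0.7:
        return f"churn risk {p:.0%}: offer renewal discount"
    if p > 0.4:
        return f"churn risk {p:.0%}: schedule account review call"
    return f"churn risk {p:.0%}: no action needed"

print(prescribe([4, 5]))
```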

This is where the strategic data core earns its return. Mastercard demonstrates the principle at scale. Its Decision Intelligence Pro system uses a proprietary neural network trained on approximately 125 billion annual transactions. The model assesses relationships between merchants, cardholders, and transaction patterns to both predict which transactions are fraudulent and prescribe whether to approve, decline, or flag each one, all in under 50 milliseconds. The results: a 20% average improvement in fraud detection rates, up to 300% in specific cases, and an 85% reduction in false positives. The system works not because Mastercard uses a more sophisticated algorithm, but because that algorithm draws on rich, continuously updated, deeply integrated data across its entire global network that no manual process could replicate.

But deployment is not the final step. It is where the most important data starts to be generated.

The flywheel: Model error data

Here is the part that almost every organization gets wrong. They deploy a model, measure its aggregate performance, and move on. They forget to capture the most valuable data the organization will ever produce: the specific instances where the model was wrong and why.

When a loan officer overrides a recommendation, that override contains information about what the model missed. When a demand forecast overshoots by 30%, the gap reveals a pattern the model has not learned. When a prescriptive system recommends a pricing change and sales decline instead of rising, that outcome is direct evidence of a flaw in the model’s logic. This is model error data, and it is the fuel of the flywheel. Organizations that systematically capture it can feed corrections back into the model. Each cycle of prediction, error capture, and retraining makes the model more accurate and widens the gap with competitors.
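
Capturing this systematically requires little more than logging each prediction alongside its realized outcome. A minimal sketch, with an invented schema:

```python
# Minimal sketch of capturing model error data, assuming each
# prediction can later be joined to its realized outcome. The schema
# is invented.
from dataclasses import dataclass, field

@dataclass
class PredictionLog:
    records: list = field(default_factory=list)

    def record(self, features: dict, predicted: str, actual: str) -> None:
        self.records.append(
            {"features": features, "predicted": predicted, "actual": actual}
        )

    def errors(self) -> list:
        """The flywheel's fuel: every case where the model was wrong."""
        return [r for r in self.records if r["predicted"] != r["actual"]]

log = PredictionLog()
log.record({"region": "NE", "score": 0.82}, predicted="approve", actual="approve")
log.record({"region": "SW", "score": 0.79}, predicted="approve", actual="override_decline")

# These disagreements get labeled, analyzed, and fed into retraining.
print(log.errors())
```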

Most companies do not do this. They treat deployment as the end of the project rather than the beginning of a learning cycle. They measure whether the model “works” in aggregate but do not build the infrastructure to capture, analyze, and learn from its specific failures.

Tesla’s “Data Engine” is the most sophisticated example. Every Tesla vehicle runs its self-driving neural network in “shadow mode” even when the human is driving. When the AI’s predicted action differs from the driver’s, that discrepancy is flagged and uploaded. Tesla queries its fleet for similar scenarios, labels the data, retrains the model, and deploys the improved version back to shadow mode. When the system was misidentifying bicycles on car racks as cyclists on the road, this loop collected thousands of correctly labeled examples within days. No competitor can replicate this at Tesla’s scale, and the gap widens with every cycle.

This is what turns the strategic data core into what I call an AI Factory: not a single project, but a continuous, self-improving capability. Build the core. Deploy models. Capture the errors. Feed them back. Repeat. Organizational theory tells us that when variance is high and change outpaces planning cycles, adaptation beats optimization every time.

The executive agenda

The question is not whether these moves matter. It is where to start.

If the answers exist inside the company but nobody can find them, build your corpus. Invest in transcription, documentation, retrieval infrastructure, and knowledge graph construction.

If your AI applications feel generic and underpowered, look outward. Explore data marketplaces, identify clean room partners, investigate synthetic data, and map the external data that would most enhance your existing assets.

If every AI project stalls in the data preparation phase, deploy agents. Start with the messiest pipelines and automate the preparation work.

These three moves build the strategic data core. Once it exists, deploy models to generate predictions and prescriptions. And from day one, build the infrastructure to capture model errors. That error data is the fuel for the flywheel. Without it, your AI stays static. With it, your AI improves every time it runs.

The organizations that get this right will not just be better at AI. They will be better at learning. Because in the end, that is what data is: the raw material of organizational intelligence. The question is: Are you building the machine that uses it?

Further reading

On data clean rooms and data collaboration: IDC FutureScape 2026, “How Synthetic Data and Clean Rooms Are Redefining Secure Data Collaboration”

On knowledge graphs and GraphRAG in enterprise AI: CIO, “Knowledge Graphs: The Missing Link in Enterprise AI”

On data marketplaces: Databricks, “What Is a Data Marketplace?”

On AI-driven data cleaning: Salesforce, “How to Clean Your Data for AI Agents Without Breaking the Bank”

Authors

Misiek Piskorski

Professor of Digital Strategy, Analytics and Innovation and Dean of Executive Education

Mikołaj Jan Piskorski, who often goes by the name Misiek, is a Professor of Digital Strategy, Analytics and Innovation and the Dean of Executive Education, responsible for Custom and Open programs at IMD. Professor Piskorski is an expert on digital strategy, platform strategy, and the process of digital business transformation. He is Co-Director of the AI Strategy and Implementation program.
