

by Mariia Bulycheva, Karl Schmedders. Published June 11, 2026 in Artificial Intelligence
In boardrooms and across product roadmaps, a single question now dominates: “Could an AI agent handle this for us?” From customer service to compliance and from HR to pricing, large language models (LLMs) and “agents” are being wired into virtually every business process. Last year, Gartner predicted that by 2028 a third of enterprise software applications will embed agentic AI, and that at least 15% of day-to-day work decisions will be made autonomously (up from close to zero in 2024).
The investment push is visible in sector data: EY’s May 2025 Technology Pulse Poll found that 92% of technology leaders expected to increase AI spending over the next 12 months, and among those planning budget increases, 43% said that more than half of their current AI spending was already going to agentic technologies. Yet the optimism is tempered by caution. A June 2025 Gartner report estimated that over 40% of agentic AI projects will be canceled by 2027 because of escalating costs and uncertain business impact.
The reality behind these numbers is more nuanced than the hype suggests. Agentic systems can deliver real gains, particularly in tightly defined use cases. But many organizations discover that what is easy to prototype is much harder to scale, govern, and maintain in production. The challenge is not just whether an agent can complete a task, but whether the surrounding system remains reliable, transparent, and sustainable. In many companies, the strongest results come from carefully designed workflows in which LLMs are one component among several and not from handing entire processes to autonomous agents.
Here, we argue for a grounded approach: think workflows first, agents second. For finance leaders, especially in data-sensitive sectors such as investment banking and legal services, this shift can drastically improve cost control, transparency, and data protection.
The enthusiasm for agentic AI is understandable. When it works, an AI agent looks like a tireless junior colleague: you state a goal (“summarize this deal pipeline and draft talking points for tomorrow’s client meeting”), and the system decomposes the task into steps, calls tools and APIs, drafts content, and refines it based on feedback.
Vendors promise end-to-end automation: sales agents that orchestrate customer relationship management (CRM), email, pricing, and contract systems; finance agents that reconcile accounts, sort and process invoices, and highlight where spending differs from the plan; and legal agents that read thousands of pages of documents and flag risk.
For CFOs and general counsels, this sounds like a productivity breakthrough. It also aligns with a wider narrative: AI as a co-pilot that gradually takes on more complex decisions across the enterprise. But beneath the surface, three structural problems are emerging:
The first problem is opacity. Agentic systems link dozens of model calls and tool interactions, often in ways that are hard to reconstruct. If a conventional workflow is a recipe, an agent is a chef improvising in the kitchen. That flexibility is attractive, but it makes accountability difficult: Why did the system pull these particular documents, but not others? Which intermediate answer caused the final recommendation to be wrong? How do we audit the path when something goes wrong for a client or regulator?
These are not merely theoretical concerns. In 2025, a healthtech firm disclosed a breach affecting more than 483,000 patients after a semi-autonomous AI agent pushed confidential data into unsecured workflows. According to McKinsey, 80% of organizations report encountering risky behaviors by AI agents, including improper data exposure and unauthorized access to systems. McKinsey offers the following example: a credit data processing agent misclassifies short-term debt as income, and that error then cascades into downstream scoring and loan approval decisions. LLM hallucinations compound this. As José Parra Moyano has argued, hallucination is not a removable “bug” but a structural property of LLMs. Enterprises must assume it will always be present and design governance accordingly. In long agentic chains, even a small hallucination early in the process can propagate and be amplified, making post-hoc explanations even harder.
The second problem is cost. Agent frameworks often encourage iterative exploration: the agent tries one tool, evaluates the result, calls another, revisits the first, and so on. Each step involves another paid API call and more compute in the background, so costs accumulate with every iteration.
A developer may find this elegant, but it can be alarming for a CFO. A single user request may trigger hundreds of expensive model calls, and the exact number is hard to predict in advance. Multiplied across thousands of users, spending escalates quickly and becomes difficult to forecast. Many organizations only realize this when the first AI invoices turn out to be much higher than expected.
Flexera’s 2025 State of the Cloud report finds that 84% of organizations see managing cloud spend and its unexpected fluctuations as their top cloud challenge. This problem is getting worse as AI workloads drive an increasing number of paid model calls. Research by Andreessen Horowitz, among others, shows that AI infrastructure and model API calls are materially more expensive than typical SaaS workloads, reinforcing the idea that unconstrained usage can blow up budgets.
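To make the cost argument concrete, here is a back-of-the-envelope sketch in Python. Every figure in it (the price per call, the number of calls per request, the request volume) is a hypothetical placeholder chosen for illustration, not a measured value or a vendor price. The point is only that a fixed number of calls yields a single forecastable figure, while an exploratory agent yields a wide range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageProfile:
    """Hypothetical usage assumptions for one deployment pattern."""
    calls_per_request_min: int  # fewest model calls one request can trigger
    calls_per_request_max: int  # most model calls one request can trigger
    cost_per_call_usd: float    # assumed blended price of a single model call


def monthly_cost_range(profile: UsageProfile, requests_per_month: int) -> tuple[float, float]:
    """Return the (low, high) monthly spend implied by the profile."""
    low = profile.calls_per_request_min * profile.cost_per_call_usd * requests_per_month
    high = profile.calls_per_request_max * profile.cost_per_call_usd * requests_per_month
    return round(low, 2), round(high, 2)


# Illustrative numbers only: a fixed five-call workflow versus an exploratory
# agent that may need anywhere from 10 to 200 calls per request.
workflow = UsageProfile(calls_per_request_min=5, calls_per_request_max=5, cost_per_call_usd=0.02)
agent = UsageProfile(calls_per_request_min=10, calls_per_request_max=200, cost_per_call_usd=0.02)

requests = 50_000
print("workflow:", monthly_cost_range(workflow, requests))
print("agent:   ", monthly_cost_range(agent, requests))
```

Under these assumed inputs, the workflow produces one number a finance team can budget against, whereas the agent produces a range whose upper bound is twenty times its lower bound.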
The third problem is data exposure. Fully autonomous agents often require broad access: they must cross multiple permission boundaries within the enterprise environment (see Chart 1). In practice, that can mean access not only to internal systems and data sources but also to external APIs and execution layers that are not fully under the firm’s direct control.
For sectors such as banking, insurance, and legal services, where client confidentiality and regulatory constraints are paramount, this is a serious challenge. It is no coincidence that banks and law firms are exploring private and on-premises LLM deployments to keep sensitive data within their perimeter.

How do you tap into LLMs’ power without losing control over costs, transparency, or data? The real design challenge is not whether agents can do more, but how to give them enough autonomy to be useful without giving up control. That starts with being clear about the difference between an agent and a workflow.
An AI agent is a system that takes a high-level goal (“prepare a client briefing”) and decides on its own which tools to use, in what order, and when to stop. A workflow, by contrast, is a predefined sequence of steps that humans design up front: which systems to call on, what inputs and outputs to expect, and where approvals are needed. Agents optimize for flexibility; workflows optimize for clarity, control, and repeatability.
The goal should not be to replace workflows with agents, but to let agents and LLMs operate inside well-defined flows. That starts with designing the process as if no AI existed, and only then inserting LLM calls where language understanding truly creates value.
This means distinguishing between two layers:
Rather than asking an agent to “handle contract review”, define a structured flow (an ordered sequence of actions, each with a predefined input and output schema) and let the model handle only the genuinely ambiguous parts.
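One way to picture a “predefined input-output format” is a small schema that every model response must fit before the next step is allowed to run. The Python sketch below is an assumption-laden illustration: the `ClauseFinding` fields, the allowed risk levels, and the `parse_finding` helper are all invented here and do not come from any particular product or firm.

```python
import json
from dataclasses import dataclass

# Hypothetical vocabulary; a real firm would define its own.
ALLOWED_RISK_LEVELS = {"low", "medium", "high"}
EXPECTED_FIELDS = {"clause_type", "text", "risk_level"}


@dataclass(frozen=True)
class ClauseFinding:
    """Hypothetical output schema for one scoped extraction step."""
    clause_type: str
    text: str
    risk_level: str


def parse_finding(raw_model_output: str) -> ClauseFinding:
    """Validate a model response against the schema; raise ValueError if it does not fit."""
    try:
        data = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("model output must be a JSON object")
    if set(data) != EXPECTED_FIELDS:
        # Report both missing and unexpected keys.
        raise ValueError(f"field mismatch: {sorted(set(data) ^ EXPECTED_FIELDS)}")
    if not all(isinstance(data[key], str) for key in EXPECTED_FIELDS):
        raise ValueError("all fields must be strings")
    if data["risk_level"] not in ALLOWED_RISK_LEVELS:
        raise ValueError(f"unknown risk level: {data['risk_level']!r}")
    return ClauseFinding(**data)


finding = parse_finding(
    '{"clause_type": "termination", "text": "Either party may terminate.", "risk_level": "low"}'
)
print(finding)
```

Anything the model returns outside this shape is rejected at the boundary rather than passed downstream, which is what keeps the ambiguous part of the task contained.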
So, how does it work in practice? Consider a law firm that wants to accelerate contract review. The “fully agentic” approach might look like this:
“Read this contract, identify risks, compare it to our standard templates, and draft a recommendation memo.”
The agent then improvises: it decides which sections to read, when to search internal precedents, and how to phrase the memo. It may yield impressive demos until a hallucination or missed clause creates an unacceptable risk for an important client.
Conversely, a workflow-centric approach would:
Ingest and classify. Deterministic logic identifies contract type, parties, jurisdiction, and governing law.
Extract key clauses. A series of targeted LLM calls extracts specific elements (termination, liability caps, confidentiality, etc.) into a structured schema.
Compare against policy. Rule-based checks and, where needed, another LLM call flag deviations from the firm’s playbook.
Generate a draft summary. The model generates a concise, structured summary for the lawyer, referencing the extracted schema.
Human review and sign-off. A senior associate or partner reviews, edits, and approves before anything goes to the client.
The number of model calls is reduced and predictable, each call is scoped, and data flows are controlled. Crucially, the firm can explain to clients and regulators exactly how recommendations were produced, for example, during an audit.
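The five steps above can be sketched as a fixed pipeline. This is a minimal illustration under stated assumptions, not a production design: the `LLM` type is a hypothetical stand-in for whatever model client a firm uses, `fake_llm` returns a canned string so the example runs without any external service, and the keyword-based classifier is a placeholder for real deterministic logic.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a model client: takes a prompt, returns text.
LLM = Callable[[str], str]


@dataclass
class ReviewState:
    contract_text: str
    contract_type: str = ""
    clauses: dict[str, str] = field(default_factory=dict)
    deviations: list[str] = field(default_factory=list)
    summary: str = ""
    approved: bool = False  # only a human reviewer may set this to True
    trace: list[str] = field(default_factory=list)  # ordered record of executed steps


def ingest_and_classify(state: ReviewState) -> None:
    # Deterministic rule, no model call; a keyword check stands in for real logic.
    state.contract_type = "NDA" if "confidential" in state.contract_text.lower() else "other"
    state.trace.append("ingest_and_classify")


def extract_clauses(state: ReviewState, llm: LLM, clause_types: list[str]) -> None:
    # One narrowly scoped model call per clause type, so the call count is fixed.
    for clause_type in clause_types:
        state.clauses[clause_type] = llm(f"Extract the {clause_type} clause:\n{state.contract_text}")
        state.trace.append(f"extract_clauses:{clause_type}")


def compare_against_policy(state: ReviewState, required: list[str]) -> None:
    # Rule-based check: flag any required clause the extraction step left empty.
    state.deviations = [c for c in required if not state.clauses.get(c, "").strip()]
    state.trace.append("compare_against_policy")


def draft_summary(state: ReviewState, llm: LLM) -> None:
    # The model sees only the extracted schema, not the whole document.
    state.summary = llm(f"Summarize for a lawyer: {state.clauses}; deviations: {state.deviations}")
    state.trace.append("draft_summary")


def run_review(contract_text: str, llm: LLM) -> ReviewState:
    clause_types = ["termination", "liability cap", "confidentiality"]
    state = ReviewState(contract_text)
    ingest_and_classify(state)
    extract_clauses(state, llm, clause_types)
    compare_against_policy(state, required=clause_types)
    draft_summary(state, llm)
    state.trace.append("awaiting_human_sign_off")  # approval stays with a person
    return state


def fake_llm(prompt: str) -> str:
    """Canned response so the sketch runs without any external service."""
    return "stub answer"


result = run_review("This Confidential Agreement may be terminated by either party.", fake_llm)
print(result.trace)
```

In this sketch, each run makes exactly four model calls (three extractions and one summary), and the `trace` list records the order of steps, which is the kind of record an audit would ask for.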
A similar pattern is emerging in financial institutions experimenting with knowledge graph-based workflows, especially in activities that are information-intensive, relationship-driven, and require synthesis across many internal and external data sources. In corporate and investment banking, for example, this can support practical tasks such as preparing for a client meeting, identifying relevant precedent transactions, summarizing exposure to a sector, or finding the right internal expert for a live deal opportunity.
Instead of a free-roaming banking agent with access to everything, these institutions typically take a more structured approach, including: building a knowledge graph of entities, such as clients, deals, products, regions, and internal experts; defining a small set of canonical workflows, for example, “prepare client briefing”, “summarize exposure in sector X”, or “identify similar past deals”; and using LLMs at specific points, for instance, translating a natural language question into graph queries, summarizing retrieved information, or drafting a first version of a briefing note.
Take “prepare client briefing”. Before an important meeting, a relationship manager may need to pull together deal history, current exposure, recent client developments, market context, and relevant internal expertise. Today, that often means searching across multiple systems and documents and manually stitching the information together. In a workflow-based setup, the retrieval path is predefined, while the LLM is used only where language understanding and synthesis add value.
The same logic applies to “identify similar past deals”. In origination and structuring, precedent transactions are often the starting point for pricing, positioning, and documentation. Here too, the system does not need a fully autonomous agent improvising across the bank’s infrastructure. It needs a controlled workflow that retrieves the right information and uses the model to summarize and present it clearly.
The orchestration, i.e., which systems get queried, in what order, and with which access permissions, remains deterministic and auditable. The “intelligence” is injected where ambiguity is high and language-level reasoning is beneficial.
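A toy version of this split between deterministic orchestration and model-based synthesis might look as follows. Everything in it is invented for illustration: the in-memory `GRAPH`, the relation names, the roles in `PERMISSIONS`, and the `summarize` callable that stands in for an LLM. A real deployment would use an actual graph store and access-control system.

```python
from typing import Callable

# Toy in-memory graph; entity names and relations are invented for illustration.
GRAPH: dict[tuple[str, str], list[str]] = {
    ("Acme Corp", "HAS_DEAL"): ["2023 bond issuance", "2024 acquisition financing"],
    ("Acme Corp", "IN_SECTOR"): ["industrial machinery"],
}

# Each canonical workflow is a fixed, ordered list of relations to traverse.
WORKFLOWS: dict[str, list[str]] = {
    "prepare_client_briefing": ["HAS_DEAL", "IN_SECTOR"],
}

# Hypothetical role-based permissions on relations.
PERMISSIONS: dict[str, set[str]] = {
    "relationship_manager": {"HAS_DEAL", "IN_SECTOR"},
    "intern": {"IN_SECTOR"},
}


def run_workflow(
    name: str, client: str, role: str, summarize: Callable[[str], str]
) -> tuple[str, dict[str, list[str]], list[str]]:
    """Retrieve along the predefined path, then hand the facts to the model."""
    audit_log: list[str] = []
    facts: dict[str, list[str]] = {}
    for relation in WORKFLOWS[name]:
        if relation not in PERMISSIONS.get(role, set()):
            audit_log.append(f"DENIED {relation} for {role}")
            continue
        facts[relation] = GRAPH.get((client, relation), [])
        audit_log.append(f"READ {relation} for {client}")
    # The model is used only here, to turn retrieved facts into prose.
    briefing = summarize(f"Draft a briefing for {client} from: {facts}")
    audit_log.append("LLM summarize")
    return briefing, facts, audit_log


briefing, facts, audit_log = run_workflow(
    "prepare_client_briefing", "Acme Corp", "intern", lambda prompt: "draft briefing"
)
print(audit_log)
```

The model never chooses which systems to query. It receives only what the predefined path and the caller's permissions allow, and the audit log shows both what was read and what was refused.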
For leadership, the benefits are tangible:
Cost control: realistic estimates on model usage per workflow run.
Transparency: clear traces of which data sources and logic led to a recommendation.
Data protection: strict boundaries on which information is ever sent to an external API.
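The data-protection boundary in particular can be expressed as an explicit allowlist applied before any call leaves the firm. The sketch below assumes a hypothetical set of field names; which fields are safe to send externally is a policy decision that each firm would make for itself.

```python
# Hypothetical policy: only these fields may ever be sent to an external API.
ALLOWED_EXTERNAL_FIELDS = {"contract_type", "jurisdiction", "clause_text"}


def build_external_payload(record: dict[str, str]) -> tuple[dict[str, str], list[str]]:
    """Split a record into the payload that may leave the firm and the blocked field names."""
    payload = {key: value for key, value in record.items() if key in ALLOWED_EXTERNAL_FIELDS}
    blocked = sorted(set(record) - ALLOWED_EXTERNAL_FIELDS)
    return payload, blocked


record = {
    "contract_type": "NDA",
    "client_name": "Acme Corp",       # invented example value
    "clause_text": "Either party may terminate on notice.",
    "deal_value_usd": "25000000",     # invented example value
}
payload, blocked = build_external_payload(record)
print("sent:", sorted(payload))
print("withheld:", blocked)
```

Because the list of withheld fields is returned alongside the payload, it can be written to the same trace that documents the rest of the workflow run.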
The final piece of the puzzle is where these models run. In sectors with strong data privacy requirements (banking, legal, healthcare, and government), we see an accelerating trend toward private LLMs: models that run on-premises or in tightly controlled virtual private clouds, rather than sending sensitive prompts to public APIs.
Modern hardware has changed the economics. For many mid-sized organizations, it is now practical to operate one or two GPU servers in-house and deploy an open-source model (for example, a Llama- or Mistral-class model); fine-tune it on the firm’s own documents, templates, and taxonomies; retrain it regularly as new data arrives; and keep all prompts, outputs, and logs within their own infrastructure.
Coupled with the workflow-centric approach described above, this yields a compelling architecture:
Workflows: explicit, auditable, integrated with existing systems.
Models: powerful but constrained to specific roles in those workflows.
Deployment: under the firm’s control, aligned with its risk and compliance posture.
This is not about rejecting agents altogether. In some contexts, especially narrow, well-scoped tasks with clear tools and guardrails, agentic patterns can be extremely effective. Consider an airline customer-service agent: it may be allowed to rebook a delayed passenger, process a refund, or reroute lost baggage, but only within a tightly defined set of rules and systems. That is exactly where agentic design works best – not as open-ended autonomy, but as autonomy within boundaries. The direction of travel for many sophisticated organizations seems clear: fewer “AI wizards”, more carefully engineered flows where autonomy is selectively granted and always supervised.
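“Autonomy within boundaries” can be sketched as a policy check that sits between the agent and its tools. The tool names, the monetary limit, and the three outcomes below are assumptions made for this example and are not drawn from any real airline system.

```python
# Hypothetical policy table: tool name -> maximum amount (USD) the agent may
# commit without a human. Tools that move no money have a limit of 0.
TOOL_LIMITS_USD = {
    "rebook_passenger": 0.0,
    "process_refund": 500.0,
    "reroute_baggage": 0.0,
}


def authorize(tool_name: str, amount_usd: float = 0.0) -> str:
    """Return 'allow', 'escalate' (hand to a human), or 'deny' for a requested action."""
    if tool_name not in TOOL_LIMITS_USD:
        return "deny"  # the agent may only use tools on the list
    if amount_usd > TOOL_LIMITS_USD[tool_name]:
        return "escalate"  # within scope, but above the agent's authority
    return "allow"


print(authorize("process_refund", 120.0))
print(authorize("process_refund", 900.0))
print(authorize("cancel_all_bookings"))
```

The agent remains free to decide which permitted tool to call and when, but the set of tools and the limits on each are fixed outside the model.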
LLMs and agentic AI will continue to reshape the enterprise. Analyst forecasts and vendor roadmaps are not wrong: according to Gartner, far more software will be infused with some degree of goal-driven autonomy by the end of this decade. The strategic question for leaders is how that autonomy is introduced.
Treating agents as magical end-to-end problem solvers is tempting but unnecessarily risky. Designing workflows first and using LLMs and agents as carefully constrained components within those workflows is less glamorous but more robust.
For CFOs, risk officers, and general counsels, this approach offers a pragmatic path forward. Costs can be forecast and managed, processes can be explained to boards, auditors, and regulators, and data stays where it belongs.
The firms that succeed with AI will not be those that deploy the most agents, but those that design clear workflows and choose where intelligence belongs.


Senior Machine Learning Engineer, Intapp
Mariia Bulycheva is a Senior Machine Learning Engineer at Intapp, where she builds large-scale knowledge graphs that power AI agents and LLM-based applications. Previously, she worked on recommendation and forecasting models at Zalando and held investment banking roles at JPMorgan and Morgan Stanley.

Professor of Finance
Karl Schmedders is a Professor of Finance, with research and teaching centered on sustainability and the economics of climate change. He directs the Strategic Finance (SF) program and teaches in the Executive MBA programs. Passionate about sustainable finance, Schmedders believes that more attention needs to be paid to the social (S) and governance (G) aspects of ESG to ensure a fair transition and tackle inequality.
