
How a structured workflow can rein in AI’s rogue agents
AI excels at complex tasks, yet its effectiveness depends on strictly controlled boundaries...

by Vsevolod Shabad Published April 22, 2026 in Artificial Intelligence • 9 min read
Accountability and agency travel together. To be answerable for a decision, you must have been its author – someone who understood the context, weighed the trade-offs, and could have chosen differently. This is not a legal technicality; it is the condition that makes accountability meaningful rather than ceremonial. When organizations delegate agency to automated systems – increasingly to agentic AI, which selects, executes, and escalates without human intervention at each step – they fail to retain accountability as they shed decision-making. They hollow it out. The humans who remain nominally responsible can describe what the system did. They cannot answer for it. That distinction matters precisely when it matters most: not in routine operations, but at the moment a regulator, a parliamentary committee, or a camera asks a senior leader to explain a consequential decision made three years ago. The efficiency gains from delegation are real and materialize on schedule. What leaves quietly, and without a reporting line, is the organizational capacity to be genuinely answerable.
For a large, regulated organization, that moment arrives not as a letter from a regulator but as a camera – a parliamentary committee, a public hearing in which the institution is named before findings are formally published. Consider Danske Bank in 2018: the CEO resigned on the day the bank’s own investigation was published rather than waiting for a formal regulatory finding. Two months later, a whistleblower testified before the Danish Parliament and the European Parliament’s Special Committee on Financial Crimes. The bank had lost nearly half its market value before regulators reached their conclusions. Media amplification turns a credibility failure into a narrative; investor heuristics turn pressure into repricing, often before formal findings are available. This is the risk most boards are not pricing.
The operational case for automating complex analytical functions – contract management, procurement review, compliance monitoring – is largely sound. The question is: What erodes alongside the inefficiency being removed?
The pattern is consistent across organizations that automated early. Throughput improves. Skilled practitioners begin reviewing system recommendations rather than performing their own analysis. New entrants develop competence in using tools, but not the analytical literacy that the tools were built to replicate. Senior professionals who carry institutional memory – the commercial logic behind a non-standard clause, the regulatory trade-off that made sense three years ago – retire or are reorganized out. What remains is a function that performs well when conditions are normal and cannot explain itself when they are not.
The developmental dimension reinforces this. The analytical roles automation displaces are not simply tasks – they are the structure through which practitioners build judgment that qualifies them for senior responsibility. Judgment not developed at lower levels does not emerge at senior levels; the organization loses the shared language and institutional reference points that make senior reasoning legible. The juniors are technically proficient. The seniors are retiring. The generation in between may not fully form.
This pattern holds regardless of sector.
The standard reassurance is: We have audit trails, explainable AI, and complete documentation. This reassurance is wrong where it matters most.
Explainability is not the same as answerability. Traceability is not the same as ownership. A system can produce a complete audit trail of every input, weight, and output – and still leave open who understood what was happening and who accepted responsibility for it. Audit trails answer a documentation question. They do not touch the governance question that determines whether the organization survives scrutiny with its credibility intact.
Boards organize their risk concerns around familiar categories: cyberattack, supply chain disruption, counterparty failure. What this framing consistently underestimates is risk originating from the state itself – not routine regulatory friction, but the full application of regulatory authority to an organization that cannot adequately account for its own conduct.
No ransomware group, no sophisticated threat actor has the structural capacity to impose losses equivalent to a GDPR fine, which can reach 4% of global annual turnover. GDPR is one instance – the Digital Operational Resilience Act (DORA) and the EU Artificial Intelligence Act extend the same enforcement logic to operational resilience and algorithmic systems. Unlike a cyberattack, this exposure arrives through a process entirely contingent on whether the organization can demonstrate that it understood and controlled what it was doing.
Danske Bank illustrates how quickly that dynamic escalates. By late 2018, the bank was no longer managing a relationship with a single regulator: the Danish FSA, the Estonian FSA, the European Banking Authority, the US Department of Justice, and the European Parliament’s Special Committee on Financial Crimes were all active simultaneously. Each jurisdiction operated on its own timeline and evidentiary standards. The bank could not negotiate a settlement with one authority without creating risk in another. When the state becomes the primary concern, the organization loses control not only of the process but also of the outcome.
The objection that parliamentary hearings are rare confuses the visible endpoint with the underlying risk. Regulatory pressure escalates through levels – from routine working discussions, through formal published investigations, to public parliamentary scrutiny – and organizations move between those levels in ways that are neither random nor inevitable.
Three factors drive escalation: a genuinely serious violation; a pattern of moderate failures that accumulate into a systemic picture; or responses at lower levels that regulators read as superficial, evasive, or insufficiently serious. An organization that arrives at a routine regulatory discussion with audit trails instead of understanding, with process descriptions instead of ownership, sends a signal. That signal influences whether the next interaction is treated as routine or as one requiring escalation. Recent parliamentary scrutiny of generative AI platforms illustrates the mechanism: the regulatory response was not triggered by a single technical failure, but by a pattern of prior concerns that had already positioned those organizations for heightened attention.
The asymmetry is in exit, not entry. Regulators maintain a portfolio view – marking some organizations for heightened attention. Entry can happen through two or three interactions where responses were judged inadequate. Exit requires years of consistent demonstration, where every interaction is interpreted through a lens of prior concern. The organization does not fail catastrophically at a single hearing. It gradually signals, across dozens of routine interactions, that it does not fully understand its own decisions.

The useful diagnostic is not whether an organization has automated its processes. It is whether the organization can pass the three-year test: could a senior leader, without preparation, without documents, and under adversarial questioning, reconstruct and credibly defend the reasoning behind a significant decision made three years ago – not from system logs, but from genuine human understanding of the context, the trade-offs, and the judgment exercised?
If the honest answer is no, the organization has mispriced its regulatory and reputational exposure in a way that will not appear on any dashboard until the inquiry begins. Regulators are not conducting fact-finding exercises. They are conducting belief tests. The question is not what happened. The question is whether this organization is the kind of institution whose account of what happened can be trusted – and that determination rests on whether real humans, with real understanding, are visibly present in the room.
The Danish FSA’s decision on Danske Bank is unusually direct on what governance failure looks like in practice. The regulator found that decision-making processes were insufficiently documented – and that, as a result, the Board of Directors and Executive Board were unable to answer the FSA’s questions on a number of issues, referring instead to the need for further internal investigations. The regulator did not ask what the system had recorded. It asked whether the people responsible had understood and controlled what was happening. They could not demonstrate that they had. That is the three-year test, applied in practice.
The three-year test is a diagnostic, not a solution. Three assessments follow for the next governance cycle.
The first is to map the intersection of automation depth and accountability weight. For each significantly automated function, ask: if a decision made here were challenged in a regulatory hearing three years from now, who in the organization could explain it – and on what basis? Where that intersection is densest is where exposure is concentrated.
The second is to treat the departure of experienced practitioners as a structured governance event rather than an HR transaction. What commercial rationale, relationship history, and contextual judgment leaves with each senior professional – and where does it go? Most organizations have no honest answer. Finding one is harder than it appears.
The third is to assess the signal quality of the last three significant regulatory interactions. Did the organization arrive with understanding, or with documentation? Did senior representatives account for decisions, or describe processes? The difference is visible to regulators in the room. Whether it is visible internally – and what it implies about the next interaction – is the question worth asking before the regulator asks it first.
The efficiency gains from automation are real. The risk accumulating against them is real too, and it grows faster than the frameworks designed to manage it. The question that belongs on every board agenda is not how much can be automated. It is what the organization will say, and who will say it, when the three-year test arrives uninvited.
The views expressed are those of the author in a personal capacity and do not represent the positions of any organizations.

Principal enterprise architect and former CISO across critical-infrastructure sectors
Vsevolod Shabad FBCS has held senior technology and governance roles – as CISO, CIO, and Principal Enterprise Architect – across banking, energy, metals and mining, and telecommunications in eight countries. He conducts research at the University of Liverpool.
