
Artificial Intelligence

Focus on data governance to reap the rewards of GenAI

Published 17 December 2024 in Artificial Intelligence • 6 min read

GenAI can deliver substantial benefits, but few organizations understand the risks. They need a deliberate data governance strategy, argues IMD’s Tomoko Yokoi.

For all the excitement surrounding generative artificial intelligence (GenAI), the list of its high-profile failures is beginning to get uncomfortably long. These range from Air Canada’s run-in with a passenger who received false information from a chatbot to the now infamous lawyer who used GenAI for a court submission, only to discover that it cited legal precedents that never existed.

Many of these problems originate in failures of data governance. Data quality, in particular, is key. While business leaders don’t need to become data experts, they do have the responsibility to understand where the data comes from, the potential shortcomings of the data, and how it shapes the AI algorithm.

The dangers that lie beneath

First, there is growing concern about intellectual property (IP) infringement. The large language models (LLMs) on which GenAI tools are trained consist of huge amounts of data that are potentially protected by IP rights. There are several ongoing legal disputes in different parts of the world over whether the scraping and use of this data to train AI – and by extension, to generate its outputs – represent IP infringements.

Importantly, in many countries, it is possible to be liable for an infringement of IP rights without having knowingly or deliberately infringed. This means that users of GenAI tools need to be as alert to IP issues as developers are. The courts have yet to determine the ramifications and potential penalties for liability, but the danger is both real and significant.

Then there is the problem of ‘hallucination’, where a GenAI tool generates an output that is inaccurate. This is much more common than many organizations realize: research published in 2023 found that chatbots hallucinate in more than a quarter of the outputs they generate, and researchers analyzing chatbot responses found that almost half included factual errors.

Again, the risks here are substantial. Organizations are increasingly dependent on GenAI output, both in day-to-day operations and for longer-term strategic work. Where those outputs aren’t dependable, the organization may end up making costly mistakes.

The issue of bias also demands attention. GenAI tools produce such polished answers that it’s easy to fall into the trap of assuming they are all the products of a reasoned response. But the tools are incapable of making logical inferences; all they do is analyze the data they are given for recognizable patterns. Where that data includes bias, obvious or otherwise, the outputs from the tool will reflect it.

As a result, in fields ranging from healthcare to finance to law enforcement, GenAI tools are perpetuating gender and race discrimination, among other types of prejudice. For example, limited representation of certain socioeconomic groups in credit scoring can make it harder for those people to get access to financial services.

As organizations become more aware of issues such as IP rights, hallucinations, and bias, the natural response is to be much more selective about the data going into LLMs. But restricting access to a wide range of high-quality data – particularly free or low-cost data – undermines the quality of the LLMs’ outputs, too.

One trend here is that organizations increasingly depend on synthetic data. Their GenAI models are trained on data produced by algorithms that have learned the patterns, correlations, and statistical properties of real-world data and used them to create statistically similar synthetic data. If the real-world data is flawed, this can become a self-perpetuating issue.

Taming the monster

How, then, do organizations begin to protect themselves from GenAI risk? The answer comes back to data governance. To use GenAI safely and effectively, organizations must be much more vigilant about the data used by underlying LLMs – what it is, who it references, and what biases and inaccuracies it may contain. They also need to be more skeptical about the quality of the end product.

Most organizations lack the resources to build their own LLMs from scratch and, therefore, use a foundation model such as GPT. It’s not always easy to interrogate these models for bias and accuracy, although developers are under pressure to be more open about the processes they use.

However, while it may be difficult to evaluate opaque models, organizations can be more risk-conscious with the data they use to train the LLMs. Is the data relevant to the objectives the organization is pursuing? Is it dependable, diverse, and balanced? Does it pose IP infringement questions or expose the organization to data privacy risks?

Assessing these factors will be an ever-more important task for organizations seeking to use GenAI. Ultimately, organizations will need to decide which risks they are comfortable with – potentially a task for the governance committee – and exclude data that could take them outside these boundaries. The threshold may vary according to the use case; there is little point in exposing the organization to additional danger if a GenAI use case is of limited value.

But also, different uses of GenAI come with different concerns. Generating creative content, for example, might carry elevated IP infringement dangers while, for a customer chatbot, bias might be more of a concern.

Another piece of the puzzle for organizations to solve is the question of how to assess GenAI outputs. Even where substantial work has gone into vetting the data used by a particular tool, it will still be important to question the responses that it generates – for example, to identify hallucinations.

Training staff to spot and test for hallucinations will be important. If the workforce is going to make use of GenAI tools, people need to know how to verify outputs and how to report suspect results so the model can be fine-tuned. Guardrails should also be in place to ensure GenAI is only used for intended tasks.

The potential of GenAI is too great to ignore. But pursuing that potential without properly understanding the inherent risks – many of which are only just beginning to emerge – is dangerously short-sighted. An approach that balances responsibility with open-mindedness and ambition will give organizations the best chance of walking that fine line.

All views expressed herein are those of the author and have been specifically developed and published in accordance with the principles of academic freedom. As such, such views are not necessarily held or endorsed by TONOMUS or its affiliates.

Authors

Tomoko Yokoi

Researcher, TONOMUS Global Center for Digital and AI Transformation

Tomoko Yokoi is an IMD researcher and senior business executive with expertise in digital business transformations, women in tech, and digital innovation. With 20 years of experience in B2B and B2C industries, her insights are regularly published in outlets such as Forbes and MIT Sloan Management Review.
