NLP bias and its impact on AI

Natural Language Processing (NLP) can be divided into two broad areas: Natural Language Understanding (NLU) and Natural Language Generation (NLG). NLU is concerned with the use of computers to understand the semantic relationships between words in natural language texts, while NLG is concerned with the generation of texts that mimic the semantic complexity of natural language texts.

These tools can be applied to various real-world business problems, such as document classification and summarization, named entity extraction, machine translation, fact checking, and question answering. They can increase efficiency by reducing search time, and effectiveness by improving the relevance of results. NLP can be a highly efficient way to use computers to solve problems that traditionally only humans could handle.

NLP and speech recognition software

NLP can even assist with Automatic Speech Recognition (ASR). Since ASR aims to process natural language, it can also be understood as a part of NLP that combines NLU (utterance comprehension) and NLG (generation of natural language output as a transcription of spoken input).

If ASR and NLP are treated as distinct, NLP can still help improve the accuracy of an ASR system by complementing its acoustic model. In this case, a language model (LM) can be used to estimate the probability of a particular syllable or word sequence. This can help, for example, to distinguish homophones, i.e., words that are pronounced the same but have different meanings.
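To make the homophone case concrete, here is a minimal sketch assuming a simple bigram LM with add-one (Laplace) smoothing, trained on a tiny hypothetical corpus. Real ASR systems use far larger, usually neural, LMs, and the candidate transcriptions would come from the acoustic decoder rather than being hard-coded. The names `train_bigram`, `log_prob` and `pick_best` are illustrative, not part of any library.

```python
import math
from collections import Counter

# Tiny hypothetical training corpus; a real LM is trained on vastly more text.
CORPUS = [
    "i will write a letter",
    "please write a letter",
    "turn right at the corner",
    "you are right",
    "i will write it down",
]


def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # every token the model may have to predict
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_prob(sentence, bigrams, contexts, vocab):
    """Add-one (Laplace) smoothed bigram log-probability of a word sequence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + v))
        for prev, word in zip(tokens, tokens[1:])
    )


def pick_best(candidates, model):
    """Return the candidate transcription the LM considers most probable."""
    return max(candidates, key=lambda s: log_prob(s, *model))


model = train_bigram(CORPUS)
# Acoustically identical hypotheses that differ only in the homophone.
hypotheses = ["i will write a letter", "i will right a letter"]
print(pick_best(hypotheses, model))  # i will write a letter
```

The acoustic signal cannot separate "write" from "right", but the LM prefers "write" because "will write" and "write a" occur in the training text while "will right" and "right a" do not.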

Modern LMs can use the surrounding context words to estimate the probability of an entire sequence. However, recent publications show that the most accurate ASR systems address the problem end-to-end, i.e., the acoustic model, the LM, and the component that generates the transcription are combined into a single model. This makes it increasingly difficult to distinguish ASR from NLP.

Issues of bias in NLP and speech recognition

But there are instances of bias occurring in NLP and ASR that have the potential to derail the use of these technologies. Implementing AI with modern machine learning (ML) involves two main components: an ML model with a specific architecture and a dataset that models one or more specific tasks. Both of these parts can introduce biases.

The black-box nature of ML models can make it difficult to explain the decisions they make. Furthermore, models can overfit datasets or become overconfident and fail to generalize well to unseen examples. However, in most cases, the dataset used for training and evaluation is the main source of bias.
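Overconfidence can be quantified. One common metric is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between average confidence and actual accuracy in each bin is averaged, weighted by bin size. The sketch below assumes equal-width bins and uses hypothetical predictions; the function name and the choice of 10 bins are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean |accuracy - confidence| over equal-width confidence bins.

    confidences: predicted probability of the chosen label, each in [0, 1].
    correct: 1 if that prediction was right, else 0.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need two non-empty sequences of equal length")
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0 goes into the first bin.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(correct[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(avg_conf - accuracy)
    return ece


# Hypothetical model that is 95% sure but right only half the time.
overconfident = expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 0, 1, 0])
# Hypothetical model whose 60% confidence matches its 60% accuracy.
calibrated = expected_calibration_error([0.6] * 5, [1, 1, 1, 0, 0])
print(round(overconfident, 2), round(calibrated, 2))  # 0.45 0.0
```

A high ECE signals that the model's confidence scores should not be taken at face value, which matters whenever those scores feed into automated decisions.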

A dataset may contain inherently biased information, such as a skewed distribution of entities. Datasets that have been manually annotated by humans are particularly prone to bias, even if the annotators have been very carefully selected and have diverse backgrounds. Large corpora collected automatically from the World Wide Web, without manual annotation, still exhibit biases, e.g., due to differences in Internet access around the world, or because some languages have far more speakers, and therefore far more text online, than others.
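A first, simple check for this kind of imbalance is to count how often each value of an attribute (for example language, entity type, or demographic group) occurs in a dataset. The sketch below is a minimal illustration under stated assumptions: the attribute is already labeled per example, and the cut-off for "underrepresented" (half of a perfectly uniform share) is a hypothetical choice. It only detects imbalance in counts, not biased content within the examples.

```python
from collections import Counter


def representation_report(labels, min_share_factor=0.5):
    """Summarize how evenly a dataset covers the values of one attribute.

    labels: one attribute value per example (e.g. language or entity type).
    min_share_factor: hypothetical cut-off; a group counts as underrepresented
    if its share is below this fraction of a perfectly uniform share.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    shares = {group: n / total for group, n in counts.items()}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    cutoff = min_share_factor / len(counts)
    underrepresented = sorted(g for g, s in shares.items() if s < cutoff)
    return shares, imbalance_ratio, underrepresented


# Hypothetical corpus: 70 English, 25 German and 5 Swahili documents.
langs = ["en"] * 70 + ["de"] * 25 + ["sw"] * 5
shares, ratio, flagged = representation_report(langs)
print(shares)   # {'en': 0.7, 'de': 0.25, 'sw': 0.05}
print(ratio)    # 14.0
print(flagged)  # ['sw']
```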

The implications of NLP bias

The downside is that populations that are underrepresented in particular data sets are, at best, unable to use an AI system to help them solve the desired task and, at worst, discriminated against because of how the AI predicts outcomes.

Discrimination caused by an unfair model becomes a serious problem once AI systems are used to make potentially important decisions automatically and with limited human oversight. In addition, these problems hinder the progress and acceptance of AI because of the justified mistrust they generate. As a result, these technologies are most effective when they are used to augment, rather than replace, human input and expertise.
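One common way to keep a human in the loop is to let the system decide automatically only when it is confident, and to route everything else to a reviewer. The sketch below assumes a classifier that returns a label together with a confidence score; the `Decision` class, the `route` function and the 0.9 threshold are hypothetical choices for illustration. Such a rule only helps if the model's confidence scores are reasonably well calibrated.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    label: str
    confidence: float
    needs_review: bool


def route(label, confidence, threshold=0.9):
    """Accept a prediction automatically only if the model is confident enough.

    threshold is a hypothetical value; in practice it would be chosen per
    use case, based on the cost of errors versus the cost of human review.
    """
    return Decision(label, confidence, needs_review=confidence < threshold)


# Hypothetical model outputs for three cases.
predictions = [("approve", 0.97), ("reject", 0.62), ("approve", 0.91)]
decisions = [route(label, conf) for label, conf in predictions]
for d in decisions:
    print(d.label, "-> human review" if d.needs_review else "-> automatic")
```

Lowering the threshold automates more decisions; raising it sends more cases to people, trading efficiency for oversight.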

Overcoming and regulating bias in NLP technology

Unfortunately, there is no silver bullet to solve the problem of bias in NLP, ML, or AI in general. Instead, an important component is awareness of the problem and an ongoing commitment to developing AI solutions that improve fairness.

On the technical side, a variety of theories and methods are being actively researched and developed to improve fairness and explainability. These include, but are not limited to, measuring and reducing bias in datasets, principles for balanced training of models, strategies for dealing with inherent uncertainty during inference, and ongoing monitoring of AI decision-making.
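Two of these ideas can be sketched in a few lines. Inverse-frequency reweighting is one simple balancing strategy for training: each example is weighted by n_samples / (n_groups * group_count), the same heuristic scikit-learn uses for `class_weight="balanced"`, so every group contributes equally to the loss. For monitoring, a per-group accuracy gap can be recomputed on fresh evaluation data. Both functions and all data below are hypothetical illustrations, and neither technique removes bias on its own.

```python
from collections import Counter, defaultdict


def inverse_frequency_weights(groups):
    """Per-example weights so that every group contributes equally to the loss.

    Uses n_samples / (n_groups * group_count), the same heuristic as
    scikit-learn's class_weight="balanced".
    """
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]


def accuracy_by_group(groups, y_true, y_pred):
    """Accuracy per group plus the gap between the best and worst group."""
    hits, totals = defaultdict(int), defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        totals[g] += 1
        hits[g] += int(t == p)
    per_group = {g: hits[g] / totals[g] for g in totals}
    return per_group, max(per_group.values()) - min(per_group.values())


# Hypothetical training data where group "b" is underrepresented.
train_groups = ["a", "a", "a", "b"]
print([round(w, 3) for w in inverse_frequency_weights(train_groups)])
# [0.667, 0.667, 0.667, 2.0]

# Hypothetical evaluation batch, e.g. re-run periodically as part of monitoring.
per_group, gap = accuracy_by_group(
    ["a", "a", "b", "b"], [1, 0, 1, 0], [1, 0, 0, 0]
)
print(per_group, gap)  # {'a': 1.0, 'b': 0.5} 0.5
```

A persistent gap like the one above is exactly the signal that an underrepresented group is being served worse and that the data or model needs attention.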

The role of ethics

The relatively new field of AI ethics also plays a role in addressing NLP bias. The challenge is that AI is still a young and fast-moving field of research and application. Although it has existed for many years, only recently has its deployment become widespread. We have not yet reached the plateau of stability required to formulate and codify behaviors and norms that ensure a level playing field.

Squirro’s approach to this is threefold, and it could go a long way if followed by the wider industry: A) ongoing consciousness-raising, internally and with customers and prospects, around the issue of bias in AI modeling and AI-supported decision-making; B) calling for and contributing to industry and government working groups that establish the regulatory framework for operating AI responsibly; and C) implementing A and B, not just discussing them.

NLP is an impactful technology, with a variety of use cases that help businesses be more efficient and effective. It is so useful that the industry cannot afford to let its use be negatively affected by issues of bias. Such technologies work most effectively when they are used to augment human input and intelligence, not replace them. In addition to the above, addressing bias requires focus and industry-wide commitment to mitigate its negative impact.

Thomas Diggelmann

Thomas Diggelmann is a Machine Learning Engineer at augmented intelligence firm Squirro, which works with organizations worldwide to extract meaningful and actionable insight from the data they hold.