NLP bias and its impact on AI

Natural Language Processing (NLP) can be divided into two broad areas: Natural Language Understanding (NLU) and Natural Language Generation (NLG). NLU is concerned with using computers to understand the semantic relationships between words in natural language texts, while NLG is concerned with generating texts that mimic the semantic complexity of natural language.

These tools can be applied to various real-world business problems, such as document classification and summarization, named entity extraction, machine translation, fact checking, and question answering. They can increase efficiency by reducing search time and improve effectiveness by increasing relevance. NLP can be a highly efficient way to use computers to solve problems that traditionally could only be handled by humans.
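To make one of these applications concrete, here is a minimal sketch of named entity extraction using the open-source Hugging Face transformers library; the pre-trained pipeline and the example sentence are illustrative assumptions rather than a description of any particular product:

```python
from transformers import pipeline

# Load a default pre-trained named entity recognition pipeline.
ner = pipeline("ner", aggregation_strategy="simple")

text = "Squirro works with organizations in Zurich, London, and New York."
for entity in ner(text):
    # Each result includes the entity type, the matched text span, and a confidence score.
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 3))
```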

NLP and speech recognition software

NLP can even assist with Automatic Speech Recognition (ASR). Since ASR aims to process natural language, it can also be understood as part of NLP, combining NLU (comprehension of the utterance) and NLG (generation of natural language output as a transcription of the spoken input).

If an explicit distinction is to be made, then NLP can help improve the accuracy of the acoustic model of an ASR system. In this case, a language model (LM) can be used to estimate the probability of a particular syllable or word sequence. This can help, for example, to distinguish homophones, i.e. words that are pronounced the same but carry different meanings.
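To make the idea concrete, the sketch below shows how a language model score can separate two acoustically similar transcriptions; the bigram probabilities are invented for illustration, whereas a real LM would estimate them from a large text corpus:

```python
import math

# Hypothetical bigram probabilities; a real language model would learn these from data.
BIGRAM_P = {
    ("recognize", "speech"): 0.012,
    ("wreck", "a"): 0.0004,
    ("a", "nice"): 0.009,
    ("nice", "beach"): 0.003,
}
UNSEEN_P = 1e-6  # back-off probability for bigrams not seen in training

def sequence_logprob(words):
    """Sum the log probabilities of consecutive word pairs in a transcription."""
    return sum(math.log(BIGRAM_P.get(pair, UNSEEN_P)) for pair in zip(words, words[1:]))

# Two competing hypotheses for the same audio input.
hypotheses = [["recognize", "speech"], ["wreck", "a", "nice", "beach"]]
best = max(hypotheses, key=sequence_logprob)
print("LM prefers:", " ".join(best))  # -> "recognize speech"
```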

Modern LMs can use the surrounding context words to estimate the probability of an entire word sequence. However, recent publications show that the most accurate ASR systems address the problem end-to-end, i.e., the acoustic model is intertwined with the LM and the model that generates the output transcription. This makes it increasingly difficult to distinguish ASR from NLP.

Issues of bias in NLP and speech recognition

But there are instances of bias occurring in NLP and ASR that have the potential to derail the use of these technologies. Implementing AI with modern machine learning (ML) involves two main components: an ML model with a specific architecture and a dataset that represents one or more specific tasks. Both of these parts can introduce biases.

The black-box nature of ML models can make it difficult to explain the decisions the models make. Furthermore, models can overfit datasets or become overconfident and fail to generalize to unseen examples. In the majority of cases, however, the dataset used for training and evaluation is the culprit for introducing bias.
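A quick way to see overfitting is to compare accuracy on the training data with accuracy on held-out data; the sketch below uses scikit-learn and synthetic data purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic classification data; an unconstrained decision tree can memorize it.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower: the model does not generalize
```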

A dataset may contain inherently biased information, such as an unbalanced number of entities. Datasets that have been manually annotated by human annotators are particularly prone to bias, even if the annotators have been carefully selected and have diverse backgrounds. Even large corpora scraped from the web without supervision exhibit biases, e.g., due to differences in Internet availability around the world or differences in the number of speakers of certain languages.
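One simple, illustrative way to surface such imbalance is to count how often each group appears in the data before training; the grouping field and the toy records below are assumptions made for the example:

```python
from collections import Counter

# Hypothetical annotated examples: (text, speaker_language) pairs.
examples = [
    ("sample text 1", "en"), ("sample text 2", "en"), ("sample text 3", "en"),
    ("sample text 4", "en"), ("sample text 5", "de"), ("sample text 6", "fr"),
]

counts = Counter(language for _, language in examples)
total = sum(counts.values())
for language, n in counts.most_common():
    # Report each group's share of the dataset to highlight under-represented groups.
    print(f"{language}: {n} examples ({n / total:.0%})")
```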

The implications of NLP bias

The downside is that populations that are underrepresented in particular data sets are, at best, unable to use an AI system to help them solve the desired task and, at worst, discriminated against because of how the AI predicts outcomes.

Discrimination arising from the unfairness of an AI model becomes a serious problem once AI systems are used to make potentially important decisions automatically and with limited human oversight. These problems also hinder the progress and acceptance of AI because of the justified mistrust they generate. As a result, these technologies are most effective when they are used to augment, rather than replace, human input and expertise.

Overcoming and regulating bias in NLP technology

Unfortunately, there is no silver bullet to solve the problem of bias in NLP, ML, or AI in general. Instead, an important component is awareness of the problem and an ongoing commitment to developing AI solutions that improve fairness.

Technically, there are a variety of theories and methods that are being actively researched and developed to improve fairness and explainability. These include but are not limited to measurement and reduction of bias in datasets, principles for balanced training of models, strategies for dealing with inherent uncertainty during inference, and ongoing monitoring of AI decision-making.
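As one illustrative example of such measurement, the sketch below computes a simple demographic parity gap, i.e. the difference in positive-prediction rates between groups; the group labels and predictions are invented for the example:

```python
from collections import defaultdict

# Hypothetical model outputs: (group, predicted_positive) pairs.
predictions = [
    ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False),
]

outcomes = defaultdict(list)
for group, positive in predictions:
    outcomes[group].append(positive)

# Positive-prediction rate per group; a large gap suggests the model treats groups unequally.
positive_rate = {group: sum(vals) / len(vals) for group, vals in outcomes.items()}
gap = max(positive_rate.values()) - min(positive_rate.values())
print(positive_rate, "demographic parity gap:", round(gap, 2))
```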

The role of ethics

The emerging field of Ethics in AI also plays a role in addressing NLP bias. The challenge is that AI is still a relatively young and fast-moving field of research and application. Although it has existed for many years, its deployment has only recently become widespread. We have not yet reached the plateau of stability required to formulate and codify the behaviors and norms that ensure a level playing field.

Squirro’s approach to this is threefold, and one that could go a long way if followed by the wider industry: A) ongoing consciousness-raising, internally and with customers and prospects, around the issue of bias in AI modeling and AI-supported decision making; B) calling for and contributing to industry and government working groups that are establishing the regulatory framework needed to operate AI responsibly; and C) implementing A and B, not just discussing them.

NLP is an impactful technology, with a variety of use cases that help businesses be more efficient and effective. It is so useful that the industry cannot afford to let its use be negatively affected by issues of bias. Such technologies work most effectively when they are used to augment human input and intelligence, not replace them. In addition to the above, addressing bias requires focus and industry-wide commitment to mitigate its negative impact.

Thomas Diggelmann

Thomas Diggelmann is a Machine Learning Engineer at augmented intelligence firm Squirro, which works with organizations worldwide to extract meaningful and actionable insight from the data they hold.
