AI Data Bias, The 2024 Challenge: Solutions for Multicultural Integrity
August 13, 2024
By Liz Castells-Heard, CEO & Chief Strategy Officer, INFUSION by Castells
Generative AI (Gen AI) applied to marketing has tremendous value, and we leverage it across the board. However, its inherent biases and limitations in Multicultural/Ethnic accuracy and representation require guardrails, human contextual and deductive skills, and human involvement throughout the process.
MCM Stakes Higher Than Ever
As a Multicultural agency, we’ve seen unintentional data bias for decades in first-party Client research data, third-party media metrics or research studies, and even the Census. It stems from limited or erroneous samples, mistaken assumptions, a lack of relevant content or context, and language or cultural biases. But now that we are all using AI tools and models, the stakes are much higher. Clients are using or will use Gen AI optimization and MMX models with aggregated data sets to make major business decisions, from resource allocation, marcomm and media optimization strategies to identifying target priorities/profiles, messaging, and developing or transcreating content. Imagine the domino effect of even one flawed or biased data set.
AI’s Unchecked DEI Data Bias
Besides the limited samples, missing context or prejudice raised above, data bias can come from a lack of diversity among the people who build the models, from the data users and their interpretation, from incomplete algorithms, and/or from historical data that does not reflect current populations. For example, Amazon’s AI hiring algorithm amplified gender and ethnic bias after learning from about a decade of historical recruiting records dominated by white males; despite efforts to rectify it, the company lost confidence and abandoned the model. Imaging tools depict “attractive” or “productive” people as light-skinned, while people receiving social services are depicted as darker-skinned Black or Hispanic individuals, despite the majority of recipients being White. Even Stability AI’s image generator (Stable Diffusion XL), which is one of the best, defaults to outdated Western stereotypes. Content AI output varies in tone, style and accuracy. And most AI tools (including ChatGPT) are trained on general online content with little data cleansing. Closed systems still rely on years of first- or third-party metrics, and thus also inherit biases. And AI trained on human-developed materials not only inherits biases but amplifies them.
Addressing this requires a nuanced approach, considering diverse demographics and real-world vetting.
Spanish-Language Model Limitations
Because most AI/ML Natural Language Processing (NLP) tools are built in English-speaking countries and trained predominantly on English data, Spanish language and cultural nuances get lost in translation, even with the upcoming Large Language Models (LLMs). Consider just a few of the variables involved in transcreating English content into Spanish so that it is understood by U.S. Hispanics: 27+ country dialects, masculine/feminine grammatical gender, conditional use of Tú vs. Usted, category nomenclature that varies with context and includes English or bilingual words, and consistency in brand tonality.
The nuanced variables require multiple human factors, decades of human training and consistent human input.
Gen AI Use In Perspective
If we leverage AI’s strengths, understand its limitations, and try to mitigate the biases, then we can apply it astutely. We apply Gen AI project management tools to the execution of laborious tasks, and now publishing tools to repetitive tasks like resizing or reformatting the same content, to improve workflow productivity and speed to market. We use content tools mostly for inspiration, ideation and simple tasks like email subject headlines, but ensure copywriters vet the output for errors and edit it to sound more human. We have used NLP CAT tools for years as a starting point, with heavy proofing and editing. We use imaging tools to help with storyboards, mock-ups and social, but not for major digital, print or video campaigns; likewise, video production tools lack the high-level imaging, GFX and nuanced complexity that ads require. We use AI tools to mine data for deeper insights and implications, but are acutely aware of the drawbacks. And when it comes to data analysis, we constantly audit to catch errors and bias, and use various metrics and databases on the same subject for comparison and vetting.
The goal is to deliver a more effective “Augmented Workforce” through AI + Human partnerships, the best of both worlds.
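The practice described above of comparing several metrics or databases on the same subject can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any specific tool we use: the function name, the source names, the figures and the 10% tolerance are all hypothetical assumptions. It flags any source whose estimate strays too far from the median of all sources so that a human can review it.

```python
from statistics import median


def flag_outlier_sources(estimates, tolerance=0.10):
    """Return sources whose estimate differs from the median of all
    sources by more than `tolerance` (a relative difference).

    `estimates` maps a source name to its numeric estimate of the same
    metric. The 10% default tolerance is an illustrative assumption.
    """
    mid = median(estimates.values())
    if mid == 0:
        raise ValueError("median is zero; relative deviation is undefined")
    flagged = {}
    for source, value in estimates.items():
        deviation = abs(value - mid) / abs(mid)
        if deviation > tolerance:
            flagged[source] = round(deviation, 3)
    return flagged


# Hypothetical estimates of the same audience share from three sources.
estimates = {"source_a": 0.19, "source_b": 0.20, "source_c": 0.12}
flagged = flag_outlier_sources(estimates)
print(flagged)  # source_c is flagged for human review
```

A flag here does not mean the source is wrong. It only marks where a person with the relevant cultural and market context should look first.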
Best Practices Addressing AI Data Model Bias
A 360° view, “HOTL” models (Humans on the loop) and safeguards throughout the process can help address the complexities that require human contextual and deductive skills to mitigate biases between and within datasets:
Test algorithms in real-life settings and compare AI results with traditional methods
Gather data from diverse sources and use diverse datasets to ensure cogent MCM samples
Ensure a diverse internal team and leverage knowledgeable partners to think through different views to develop comprehensive outcomes – from the data input stage to analysis and ethics oversight
Consider ALL end users and ALL of those affected as data must relate to the whole of the target population
Do not use historical data that does not represent today’s reality, especially in diversity/MCM
Execute regular automatic and manual audits of models for fairness and retrain models with new data
Keep humans involved in every step to vet and analyze, make the decisions, review outcomes and adjust rules or parameters
Incorporating different human views, types and sources of data at each step helps ensure representation and flexibility.
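As one possible illustration of the automatic audits mentioned above, the following Python sketch checks whether each segment in a data sample is represented in proportion to the target population. Everything in it is an assumption made for illustration: the function name, the group labels, the counts, the population shares and the 0.8 minimum ratio are hypothetical, and a real audit would use current, vetted population figures.

```python
def representation_gaps(sample_counts, population_shares, min_ratio=0.8):
    """Return groups that are under-represented in a sample.

    `sample_counts` maps a group to its number of records in the sample.
    `population_shares` maps a group to its share of the target population.
    A group is flagged when (sample share / population share) falls below
    `min_ratio`. The 0.8 default is an illustrative assumption.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = sample_counts.get(group, 0) / total
        ratio = sample_share / pop_share
        if ratio < min_ratio:
            gaps[group] = round(ratio, 2)
    return gaps


# Hypothetical sample of 1,000 records versus hypothetical population shares.
sample_counts = {"group_a": 700, "group_b": 100, "group_c": 200}
population_shares = {"group_a": 0.60, "group_b": 0.20, "group_c": 0.20}
gaps = representation_gaps(sample_counts, population_shares)
print(gaps)  # group_b appears at half its population share
```

A check like this can run automatically every time a dataset is refreshed, but deciding what to do about a gap (re-sampling, re-weighting, or setting the data aside) remains a human decision.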
The Bottom Line: A Collective Responsibility
AI’s inability to fully grasp and respond to intangible human factors in decision-making, such as ethnicity, race or culture, or the proper ethics, morality, and empathy, limits its readiness to make viable decisions. While completely eliminating AI bias is unattainable, marketing leaders must ensure that AI systems are equitable and that they enhance business outcomes and human decision-making rather than perpetuate human prejudices and low MCM representation. It is our collective responsibility as an industry to foster research and establish standards that reduce AI bias.
If we can find solutions to ensure Multicultural integrity in data models and do it right, then Generative AI is the seismic change that will truly enhance our cognitive capabilities, change how we work and how people buy and interact with brands, and lead us into the new golden age of creativity and exponential business value.