Using AI to turn messy, unstructured data into insights

May 14, 2020

by Kyle Findlay – Senior Data Science Director (Innovation) / Kantar

For much of our industry’s history, market researchers have used intentional questioning to generate structured data from which to derive insights. But as technology has become ubiquitous, we’ve gained access to new sources of unstructured data, along with the resources to turn this messy data into insights.

Today we can leverage Twitter posts, Instagram images, customer feedback verbatims, YouTube videos, Alexa logs, browser search histories and much more (all with permission, of course). And, as COVID-19 hampers some of our traditional data collection channels, other sources have gained an urgent impetus.

These other sources present us with novel challenges though, so how do we gain insight from this boon of unstructured data? This is where data science and AI enter the picture. Working with our various teams around the world to figure out how to attack these problems in their myriad forms has been one of the most exciting and novel journeys of my professional career.

Let’s explore what it means to leverage AI to make sense of unstructured data. Extracting insights from unstructured data can be broken down into two parts:

What do we know about in advance that we want to quantify in the data?
How do we surface the things that we don’t know about in advance?

Measuring what we already know about

Market research has concerned itself with quantifying known variables for most of its existence. Most surveys are an attempt to quantify issues we already know about. The quintessence of this paradigm is the attribute list – a pre-defined list of themes that we ask respondents about in order to quantify those themes’ impact in the market.

In the realm of data science and AI, an attribute list is roughly akin to a taxonomy. These lists are manifestations of our insight into how a category works. It takes an expert – someone with deep knowledge of a category – to put together such a list that is comprehensive and useful. Much of our extended teams’ efforts have gone into making the implicit category knowledge of Kantar’s experts captured in such lists explicit, so that our AI can leverage that knowledge.

While we can’t scrape the entire internet like Google or access the world’s social graph like Facebook, we can collate a massive data asset in the form of brand and category taxonomies for our AI to leverage. Thus, we have built the Kantar Brain Taxonomy Tool (referred to colloquially as the “Kantar Brain”), a centralized database that pulls together all of our existing taxonomies from a variety of divisions, products, tools and data stores. When combined with the attendant AI tools that leverage this data, we are able to bring to bear the knowledge of an unprecedented global network of brand experts to quickly surface insights around known themes in social media posts, voice of the customer verbatims, chatbots, and wherever else we deal with unstructured data.
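The internals of the Kantar Brain aren’t described here, but a small example may help show what “applying a taxonomy” to unstructured text can mean in its simplest form. The sketch below assumes a taxonomy is just a mapping from theme to keywords and tags each verbatim with every theme whose keywords it contains. The TAXONOMY dictionary, its themes and keywords, and the functions tag_themes and theme_counts are all hypothetical; real systems would likely use much larger curated lists and richer matching (phrases, stemming, or trained classifiers) than exact word overlap.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy mapping each theme to trigger keywords.
# Expert-built taxonomies are far larger and more nuanced than this.
TAXONOMY = {
    "price": {"price", "expensive", "cheap", "cost"},
    "delivery": {"delivery", "late", "shipping", "courier"},
    "taste": {"taste", "flavour", "flavor", "delicious"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tag_themes(verbatim, taxonomy=TAXONOMY):
    """Return the sorted themes whose keywords appear in the verbatim."""
    tokens = set(tokenize(verbatim))
    return sorted(theme for theme, keywords in taxonomy.items()
                  if tokens & keywords)


def theme_counts(verbatims, taxonomy=TAXONOMY):
    """Count how many verbatims mention each theme at least once."""
    counts = Counter()
    for verbatim in verbatims:
        counts.update(tag_themes(verbatim, taxonomy))
    return counts


if __name__ == "__main__":
    comments = [
        "Delicious flavour but far too expensive.",
        "My delivery was late again.",
        "Great taste, fair price.",
    ]
    for comment in comments:
        print(tag_themes(comment), "<-", comment)
    print(theme_counts(comments))
```

Counting tagged themes across many verbatims is what turns a pre-defined attribute list into a quantified view of the known themes; the expert knowledge lives in the taxonomy, while the matching code stays generic.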

Surfacing what we don’t know

The flip side of the unstructured data coin is finding the things that we didn’t know about in advance, but which appear in our data: the unanticipated themes that weren’t on our radar ahead of time (like COVID-19, for example). This requires us to leverage our data assets and AI capabilities in a slightly different way.

Thankfully, our tools stand on the shoulders of giants. Everyone playing in the data science field owes a huge debt to companies such as Google, Facebook, OpenAI (see the previous post about Cyber nAIgel based on their GPT-2 model), and others who use their resources to encode huge swathes of human knowledge into models that they release to the public. By leveraging these pre-trained public models, our AI tools start with much-needed context that we couldn’t give them on our own. It’s then up to us to further ‘fine-tune’ the models to our clients’ specific contexts and to the market research paradigm in general.
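The post doesn’t detail how unanticipated themes are surfaced, and a short example can’t reproduce the pre-trained models it mentions. As a deliberately simple stand-in, the sketch below flags terms that are not in a known vocabulary and whose share of documents jumps between a baseline period and the current one, the kind of signal a theme like COVID-19 would produce. KNOWN_TERMS, STOPWORDS, doc_freq, emerging_terms, the thresholds min_docs and min_ratio, and the sample comments are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical vocabulary already covered by an existing taxonomy.
KNOWN_TERMS = {"price", "expensive", "cheap", "delivery", "late", "taste"}
# Minimal stopword list for this example; real pipelines use fuller lists.
STOPWORDS = {"a", "an", "and", "the", "to", "of", "i", "my", "so",
             "too", "was", "is", "it", "due", "because", "but"}


def doc_freq(docs):
    """Count, for each term, how many documents contain it at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(re.findall(r"[a-z0-9]+", doc.lower())))
    return df


def emerging_terms(baseline, current, known=KNOWN_TERMS,
                   min_docs=2, min_ratio=3.0):
    """Flag unknown terms whose share of documents rose sharply from the
    baseline period to the current period, highest ratio first."""
    base_df, cur_df = doc_freq(baseline), doc_freq(current)
    flagged = []
    for term, n in cur_df.items():
        if term in known or term in STOPWORDS or n < min_docs:
            continue
        # Current share n / len(current) divided by an add-one-smoothed
        # baseline share (base + 1) / (len(baseline) + 1), so terms absent
        # from the baseline do not cause a division by zero.
        ratio = (n * (len(baseline) + 1)) / (len(current) * (base_df[term] + 1))
        if ratio >= min_ratio:
            flagged.append((term, ratio))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    before = ["Delivery was late", "Too expensive", "Great taste",
              "Cheap and tasty"]
    after = ["Store closed due to lockdown",
             "Delivery late because of lockdown",
             "Lockdown so I ordered online",
             "Too expensive"]
    print(emerging_terms(before, after))
```

In these invented comments, “lockdown” appears in three of the four recent documents and none of the baseline ones, so it is flagged even though no taxonomy anticipated it. Approaches built on pre-trained models could go further, for instance by grouping related phrasing such as “quarantine” or “stay at home” that exact term counts treat as unrelated.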

All of this means that we empower our colleagues with tools that are uniquely attuned to how brands work. This helps them to, for example, quickly surface the themes of interest in their customer experience verbatims, identify consumer segments based on Instagram photos, see which themes on social media drive brand equity, algorithmically tie disparate data sources together, create smarter chatbots that automatically infer what a respondent is talking about, and much more.

Market researchers go where the data is

Data trends swell and crest. One thing that has always remained constant though is our need to translate between the language of the consumer and the paradigm of business. Kantar’s investment in data science and pursuit of AI-infused capabilities allows us to answer fundamental brand questions in an ever-more holistic manner by being data agnostic and using the best data available to us.

These investments inform all aspects of market research: sometimes questions can be answered without ever asking a question, through the clever use of existing data and AI techniques; often, though, specific questions need to be asked (but even here, AI is helping us evolve by, for example, allowing chatbots to have ever-more naturalistic conversations with respondents). The most powerful insights come from a melding of both approaches – automatic insight mining and intentional questioning.

To realize this future, we have worked hard to encode the deep knowledge and expertise of our organization into our AI capabilities so that our machines can empower our people and create valuable insights for our clients in a rapidly changing world. In the process, putting the necessary skills, infrastructure, data assets and capabilities in place has changed what it means to be a market research provider.
