
Understanding Customer Concerns in Retail: A review of LLMs and Traditional ML-based methods for Topic Modeling and Multi-label Classification

Written by Utkarsh Singh, Yuyang Gao, Zhengxiang Wang | Aug 27, 2024 3:10:21 PM

Did you know that organizations risk losing an average of 8% of their revenue due to poor Customer Experience (CX)? This significant potential loss can be attributed to the fact that over 50% of consumers either decreased or stopped spending with a business after a bad experience.

This impact is especially pronounced in Online Retail, Parcel Delivery and Department Stores, with more than 20% of customers admitting to having completely stopped their engagement after a poor experience. Moreover, the negative perception is long-lasting, with some customers avoiding companies for two years or more. These statistics highlight how vital it is to get customer service right – but we first need to understand what customers are concerned about.

Why Understanding Customer Concerns is Vital for Retail Success

Customers can share their experiences across a range of aspects, including personalization preferences, in-store or online user experience, products & services, post-sale customer care and many other functions. Companies can ethically collect qualitative data on all of these functions through surveys, feedback forms, anonymized customer support interactions, reviews and more.

Advancements in Generative AI and NLP frameworks have enabled the analysis of unstructured data to identify and address customer concerns while ensuring compliance with privacy regulations. This article reviews leading ML methods and state-of-the-art LLMs for discovering new pain points, identifying them in data from feedback or service channels and conducting root-cause exploration to mitigate CX issues.

Topic Modeling to Discover New Points of Concern

Identifying customer concerns or pain points (“concerns” and “pain points” are used interchangeably in this article), often hidden within vast amounts of customer interactions, is a foundational step in addressing and improving the overall customer experience. This can be done using topic modeling, a powerful machine learning framework that allows businesses to sift through customer interactions and identify recurring themes and concerns.

ML Methods in Topic Modeling

Topic modeling is a classic unsupervised ML task, and a variety of increasingly sophisticated methods have emerged for it over time. For retailers, these methods take customer interaction data as input and return a set of topics potentially relevant to customer pain points. By systematically analyzing how often pain points occur across the discovered topics, retailers can prioritize the most pressing issues and develop targeted solutions.

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) is a traditional topic modeling method. It is a generative probabilistic model that assumes each document is a mixture of topics assigned with varying probabilities, and each topic is a mixture of words that frequently appear together. LDA's advantages include its simplicity and a clear probabilistic framework for discovering topics across documents. It is also highly scalable and can handle large volumes of text data effectively. However, LDA struggles with short and noisy texts and requires the number of topics to be specified in advance, which can limit its applicability and effectiveness.
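As a rough illustration, the sketch below fits LDA on a handful of toy retail reviews with scikit-learn; the documents, topic count, and top-word printout are illustrative assumptions rather than a production pipeline.

```python
# A minimal sketch of LDA topic modeling with scikit-learn.
# The example reviews and the number of topics are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "My parcel arrived two weeks late and the box was damaged.",
    "Great in-store experience, but the checkout queue was too long.",
    "The website kept crashing when I tried to apply my discount code.",
]

# LDA operates on raw word counts (bag-of-words), not TF-IDF.
vectorizer = CountVectorizer(stop_words="english")
doc_term_matrix = vectorizer.fit_transform(docs)

# The number of topics must be chosen up front, one of LDA's limitations.
lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(doc_term_matrix)

# Print the top words per topic as a rough topic description.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"Topic {idx}: {', '.join(top_words)}")
```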

BERTopic

BERTopic is a neural topic modeling technique that addresses some of the limitations of LDA. It leverages transformer-based language models and a class-based Term Frequency-Inverse Document Frequency (c-TF-IDF) method to create dense and semantically coherent clusters. This approach allows for easily interpretable topics while preserving important words in the topic descriptions. BERTopic's advantages include better handling of short and noisy texts, dynamic topic discovery, and improved interpretability of the results. However, scaling BERTopic is cost-intensive because significantly more computations are required for text embedding and clustering.
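The snippet below is a minimal sketch using the open-source BERTopic library; the sample interactions, embedding model name, and hyperparameters are assumptions chosen only to make a toy example run end to end.

```python
# A minimal sketch of topic discovery with the BERTopic library.
from bertopic import BERTopic

# A real corpus of thousands of interactions is needed for meaningful topics;
# the tiny sample is duplicated here only so the toy example can run.
docs = [
    "Delivery was delayed twice and customer support never called back.",
    "Loved the product quality, but the return process was confusing.",
    "The mobile app logged me out every time I added an item to my cart.",
    "My parcel arrived damaged and the courier refused to reship it.",
    "Checkout kept failing when I tried to apply my discount code.",
    "The refund for my returned order still hasn't arrived.",
] * 20

# BERTopic embeds the documents, clusters them, and uses c-TF-IDF to
# describe each cluster with its most representative words.
topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2", min_topic_size=5)
topics, probabilities = topic_model.fit_transform(docs)

print(topic_model.get_topic_info())  # one row per discovered topic
```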

Other Advanced Techniques

When BERTopic requires extensive domain adaptation and pretrained models lack good feature representations for retailer-specific text, advanced clustering techniques such as Deep Embedded Clustering (DEC) and Curriculum Labeling can be employed to improve clustering quality; both approaches have been shown to perform well with minimal supervision. More recently, LLMs have made it possible to extract human-interpretable and highly customizable topics from documents via prompting, as in TopicGPT. However, LLM-based topic modeling can be both time- and cost-consuming at big-data scale.
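As a hedged illustration of prompting-based topic extraction in the spirit of TopicGPT (not the official pipeline), the sketch below asks an LLM to propose topic labels for a couple of interactions; the prompt wording, client usage, and model name are assumptions.

```python
# A hedged sketch of LLM-based topic extraction via prompting.
# This is not the official TopicGPT pipeline; the prompt and model name
# are assumptions for illustration only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

interactions = [
    "The courier left my parcel in the rain and refused to reship it.",
    "I was charged twice for the same order and can't reach billing support.",
]

prompt = (
    "You are analyzing retail customer interactions. "
    "Return a short, human-readable topic label for each interaction, "
    "one per line.\n\n" + "\n".join(f"- {t}" for t in interactions)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; substitute your own deployment
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)
```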

Key Takeaways

Topic modeling is a powerful tool for modeling customer pain points in the retail sector. We reviewed several ML methods for this purpose, such as LDA, BERTopic, and TopicGPT, each of which has strengths and weaknesses. Correctly applying these topic modeling methods can surface customer insights that are critical to the success of a retail business.

Building and Evaluating Classification Models for Identifying Known Issues in Customer Interaction Data

Once the space of customer pain points is defined through topic modeling, a classification model can be trained for identifying known customer issues. This section outlines some common frameworks and methods used for creating and evaluating a multi-label classification model using advanced machine learning techniques.

Model Framework

Customers often discuss multiple issues within a single interaction, making it essential for the model to recognize and classify several pain points simultaneously. By mapping each customer interaction into a vector space using a language model (an embedding model), semantically similar interactions can be identified with nearest neighbor algorithms at inference time. This approach naturally supports multi-label prediction, as the most similar known interactions may contain related yet distinct customer concerns.
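A minimal sketch of this embedding-plus-nearest-neighbors setup is shown below, assuming a sentence-transformers model and a tiny, hand-labeled reference set; the label names and texts are illustrative assumptions.

```python
# A minimal sketch of embedding-based nearest-neighbor inference for
# multi-label pain-point identification.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import NearestNeighbors

# Reference interactions with manually validated pain-point labels (toy data).
reference_texts = [
    "My order arrived late and one item was missing.",
    "The refund still hasn't reached my account after three weeks.",
    "The delivery was late and I was charged twice.",
]
reference_labels = [
    {"late_delivery", "missing_item"},
    {"refund_delay"},
    {"late_delivery", "billing_error"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")
reference_embeddings = model.encode(reference_texts, normalize_embeddings=True)

index = NearestNeighbors(n_neighbors=2, metric="cosine").fit(reference_embeddings)

# New interaction: take the union of labels from its nearest reference neighbors.
query = "Still waiting on my parcel, and my card was billed two times."
query_embedding = model.encode([query], normalize_embeddings=True)
_, neighbor_ids = index.kneighbors(query_embedding)

predicted = set().union(*(reference_labels[i] for i in neighbor_ids[0]))
print(predicted)  # expected to include late_delivery and billing_error
```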

Training and Fine-tuning the Classification Model through Contrastive Learning

A critical component of the model is a set of known customer interactions with correctly identified customer issues. Typically, this data is validated manually to ensure accuracy and serves as the reference set for training the embedding model. Fine-tuning the embedding model uses a supervised contrastive learning objective, which pulls semantically similar interactions closer together in the vector space while pushing dissimilar interactions further apart.
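The sketch below shows one way such contrastive fine-tuning could look with the sentence-transformers library, assuming pairs of interactions that describe the same pain point; the pairs and hyperparameters are illustrative assumptions, not a prescribed recipe.

```python
# A minimal sketch of contrastive fine-tuning with sentence-transformers.
# In practice, pairs come from manually validated interactions that share
# a pain point; the examples below are toy data.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

train_examples = [
    # Each pair describes the same pain point (anchor, positive).
    InputExample(texts=[
        "My parcel is a week late with no tracking updates.",
        "Delivery delayed again and the tracking page never changes.",
    ]),
    InputExample(texts=[
        "I was billed twice for one order.",
        "Duplicate charge showed up on my credit card statement.",
    ]),
]

model = SentenceTransformer("all-MiniLM-L6-v2")
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# MultipleNegativesRankingLoss treats other in-batch examples as negatives,
# pulling positives together and pushing unrelated interactions apart.
loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, loss)], epochs=1, warmup_steps=10)
```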

Performance Evaluation

Comprehensive evaluation of the model's performance is essential to ensure it accurately predicts customer issues within interactions. Given the multi-label nature of the predictions and ground truth, evaluation should consider the following metrics:

  • Top-1 Accuracy: This metric assesses whether the top-ranked issue predicted by the model matches the primary ground truth customer issue. It evaluates the model's ability to correctly identify the most critical issue.
  • Top-K Accuracy: This metric evaluates if the primary ground truth issue is among the top K predicted issues for a given interaction. It serves as a relaxed metric for assessing the model’s effectiveness in capturing relevant issues among its top predictions.

In addition to accuracy metrics, some other metrics that can be considered include Hamming Loss, which measures the fraction of incorrect labels; Subset Accuracy (Exact Match Ratio), which checks if the predicted labels exactly match the ground truth; and Precision, Recall, and F1 Score, which offer insights into model performance per label (macro-averaging) or overall (micro-averaging). These metrics collectively provide a thorough assessment of the model’s ability to identify and prioritize customer issues.
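For concreteness, the snippet below computes a few of these multi-label metrics with scikit-learn on toy indicator matrices; the label set and predictions are assumptions for illustration.

```python
# A small sketch of multi-label evaluation with scikit-learn on toy data.
import numpy as np
from sklearn.metrics import hamming_loss, accuracy_score, precision_recall_fscore_support

# Binary indicator matrices: rows = interactions, columns = known pain points.
y_true = np.array([[1, 0, 1],   # e.g. late_delivery, billing_error
                   [0, 1, 0]])  # e.g. refund_delay
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0]])

print("Hamming loss:", hamming_loss(y_true, y_pred))        # fraction of wrong labels
print("Subset accuracy:", accuracy_score(y_true, y_pred))   # exact match ratio
print("Macro P/R/F1:", precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)[:3])
```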

Future Trends and Potential Use Cases

Over 80% of companies surveyed for a study by McKinsey responded that they are investing in LLM and GenAI tools for customer care. However, the study found a rising preference for live phone interactions and digital chat services. This blend of traditional and digital preferences presents an opportunity to collect and analyze large amounts of qualitative data from customers about a company’s products and services.

Integration of AI-driven chatbots and virtual assistants in front-end customer support applications can make it easier to generate metadata about interactions. For example, these programs can identify contextual keywords and phrases to tag conversations in real time. This metadata can then be processed using SOTA NLP models to not only identify known pain points in conversations, but also discover new issues or sentiments. This intelligence can then feed into strategic reports to inform leadership about new and existing issues, as well as be used for automatic triaging of cases for faster resolutions, all in real time.
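As a toy illustration of this kind of real-time tagging (not a production design), the sketch below tags a conversation with simple keyword matching; the keyword-to-tag mapping is a made-up assumption, and a deployed system would more likely rely on an NLP model.

```python
# A toy sketch of real-time metadata tagging for a support conversation.
# The keyword-to-tag mapping is an illustrative assumption.
KEYWORD_TAGS = {
    "refund": "billing",
    "charged": "billing",
    "late": "delivery",
    "tracking": "delivery",
    "crash": "app_issue",
}

def tag_conversation(messages: list[str]) -> set[str]:
    """Return the set of tags whose keywords appear in the conversation."""
    text = " ".join(messages).lower()
    return {tag for keyword, tag in KEYWORD_TAGS.items() if keyword in text}

conversation = [
    "Hi, my order is late and the tracking page hasn't updated in days.",
    "Also, I think I was charged twice for it.",
]
print(tag_conversation(conversation))  # e.g. {'delivery', 'billing'}
```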

Challenges and Opportunities

While these are viable applications, they currently face challenges in the accuracy and reliability of LLM-generated outputs. Despite significant advancements from industry leaders such as OpenAI, Google, Anthropic, Meta and Mistral, 25% of respondents in a survey of 50+ large companies conducted by Lightspeed reported that LLM applications have yet to meet minimum performance thresholds.

For instance, traditional methods for labeling unstructured data rely heavily on manual annotation of sizeable volumes of data, which is both costly and time-intensive. Multi-LLM debating frameworks have shown promising results in reducing these manual annotation costs, but there is room for improvement in domain adaptation through fine-tuning.

Manually annotated data is then used to train BERT-based classification models for domain-specific applications. However, training these models is time-consuming and computationally expensive. For general classification tasks on unlabeled data, few-shot learning and RAG-based finetuning are popular methods among data scientists, but their performance remains subpar compared to domain-adapted models. To overcome these challenges, there is growing focus on resource-efficient LLM finetuning strategies. Techniques like Prefix-Tuning, LoRA, and LayerNorm tuning enable fine-tuning LLMs with minimal additional parameters. The use of LoRA to fine-tune models significantly enhances performance without the need for extensive computational resources. LS-LLaMA has been shown to outperform traditional models like BERT and RoBERTa in various text classification tasks by effectively utilizing latent representations for label prediction.
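As a hedged sketch of what parameter-efficient fine-tuning can look like in practice, the snippet below wraps a classification model with a LoRA adapter using the PEFT library; the base model, label count, and hyperparameters are illustrative assumptions rather than a recommended configuration.

```python
# A hedged sketch of LoRA-based parameter-efficient fine-tuning with PEFT.
# The base model, number of labels, and LoRA hyperparameters are assumptions.
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=12  # e.g. 12 known pain-point labels
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # sequence classification
    r=8,                         # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.1,
)

# Only the small LoRA adapter weights are trained; the base model stays frozen.
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```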

In conclusion, the integration of GenAI and traditional NLP techniques holds tremendous potential for revolutionizing customer care by providing deeper insights and faster resolutions. As these technologies continue to evolve, they promise to enhance customer experiences while driving operational efficiencies across industries.