Emerging retrieval-augmented generation (RAG)-powered enterprise conversational QA applications leverage massive internal knowledge bases and conversational data to improve customer service, increase employee productivity, and help make better decisions.
Generative large language models (LLMs) are instrumental in RAG and can produce human-like responses. However, LLMs are limited by context length: the time and memory cost of the attention layer grows quadratically with the number of input tokens, so models cap how much text they can attend to at once.
To address this limitation, researchers use vector databases to improve both vanilla QA and RAG performance: they store enormous amounts of text data and retrieve relevant passages efficiently. However, vector databases come with their own challenges, including high latency, high cost, the cost of re-indexing after every embedding-model upgrade, and lower reliability as a relatively young technology.
FAISS, an open-source similarity-search and indexing library, helps tackle the challenges mentioned above.
Emerging Retrieval-Augmented Generation (RAG)-powered enterprise conversational QA applications leverage massive internal knowledge bases. These QA applications can help companies to improve customer service, increase employee productivity, and make better decisions.
Apart from cost and latency, the biggest challenge is context length: LLMs have a maximum token limit that covers the input prompt, the supplied data, and the generated output (e.g., 4,096 tokens for GPT-3.5). When a document is too long, the text must be truncated to fit within the allowable context window. Fine-tuning an LLM on confidential data is a viable alternative, but it can be a costly operation and it is hard to guarantee good results.
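To make the token-budget constraint concrete, here is a minimal sketch of truncating a document to fit a fixed context window. The helper name and the whitespace-based token count are illustrative assumptions; real systems count tokens with a model-specific tokenizer such as OpenAI's tiktoken.

```python
# Sketch: fitting a long document into a fixed context window.
# Whitespace splitting is only a rough stand-in for a real tokenizer,
# used here to keep the example dependency-free.

def truncate_to_context(text: str, max_tokens: int = 4096,
                        reserved_for_output: int = 500) -> str:
    """Keep only as many (approximate) tokens as the model can accept,
    reserving room for the generated answer."""
    budget = max_tokens - reserved_for_output
    tokens = text.split()          # crude stand-in for a real tokenizer
    if len(tokens) <= budget:
        return text
    return " ".join(tokens[:budget])

doc = "word " * 10_000             # a document far beyond the window
clipped = truncate_to_context(doc)
print(len(clipped.split()))        # 3596 approximate tokens survive
```

Everything past the budget is silently dropped, which is exactly why long documents lose information when forced through a fixed context window.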
Vector databases have recently emerged to address these problems and enable these use cases. Storing data in an external vector database and letting the LLM retrieve only the passages relevant to the prompted question can improve LLM performance. Vector databases are designed for efficient similarity search and, thanks to advanced indexing and distributed architectures, can scale to massive volumes of data.
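The core operation a vector database performs is nearest-neighbor search over embeddings. The sketch below reduces it to pure Python with made-up 3-dimensional vectors; a production system would store millions of high-dimensional embeddings in an engine such as FAISS.

```python
# Sketch of similarity search: rank stored chunks by cosine similarity
# to a query embedding. All vectors below are hypothetical toy values.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# toy "index": chunk text -> embedding
index = {
    "The deviled crab was excellent": [0.9, 0.1, 0.2],
    "Parking downtown is difficult":  [0.1, 0.8, 0.3],
    "Best crab legs in Tampa":        [0.8, 0.2, 0.1],
}
query_vec = [0.85, 0.15, 0.15]     # pretend embedding of the query "crab"

ranked = sorted(index, key=lambda t: cosine(index[t], query_vec),
                reverse=True)
print(ranked)                      # crab chunks rank above the parking chunk
```

Only the top-ranked chunks are passed to the LLM, which is how retrieval keeps the prompt inside the context window.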
But vector databases have their own issues. Generating indexes and maintaining millions of data chunks can cost tens of thousands of dollars per month, adds latency, and is susceptible to scenarios where related information is scattered across chunks, leading to poor-quality or incomplete retrieval.
This article gives step-by-step instructions for improving vector search: drastically shrinking the indexing footprint to massively reduce indexing costs, improving search precision, and using targeted search to get users the answers they want.
Data labeling entails creating labels for data chunks across large volumes of text. These labels must be focused on search, intent, and the problem statement.
For example, assume 10 documents, each with 1,500 words across 50-75 sentences, divided into 10-15 chunks apiece. With data labeling, indexing and maintenance cover only a few label words instead of 1,500 words per document (a reduction of more than 99%). This also yields much better search precision, since the labels quickly identify which tranches of text contain information relevant to the query.
Labels or tags are ideal for indexing data, but unstructured data has none. Lacking the e-commerce equivalent of tags (metadata), this article presents an ML technique for automatically tagging and labeling millions of documents, down to the paragraph level, and these tags then serve as the indexes for search.
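A minimal sketch of the tagging step, to make the chunk-to-tag mechanics visible. The article's actual pipeline uses an ML model; the keyword rules below are a hypothetical stand-in, and the tag names mirror the labeled examples shown later in the article.

```python
# Hypothetical rule-based tagger standing in for the ML model:
# chunks mentioning food terms get 'food_talk', the rest 'emotional'.
FOOD_TERMS = {"crab", "sandwich", "cuban", "carrot", "cake", "breakfast"}

def tag_chunk(chunk: str) -> str:
    words = {w.strip(".,!?").lower() for w in chunk.split()}
    return "food_talk" if words & FOOD_TERMS else "emotional"

chunks = [
    "So excited to find a great Mexican place in Tampa",
    "And OMG the warm carrot cake",
]
tag_index = {c: tag_chunk(c) for c in chunks}
print(tag_index)
```

The resulting short tags, not the full chunk text, are what gets indexed, which is where the footprint reduction comes from.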
In addition to the aforementioned issues, informal texts such as customer reviews and comments contain excessively casual language. For the precise, information-based search that prospective users want, this noise must be filtered out.
Figure-1 - Tag-indexed document: indexing architecture offline and real-time queries
This example uses Yelp restaurant recommendations, but the same approach applies equally to other Yelp categories such as automobiles, cuisine, health services, and beauty and spas.
The idea is to use a retrieval-based chatbot with semantic search to find the best answer among numerous candidate responses. Tags help natural language processing models discover and analyze content by marking out the relevant portions of text. Yelp reviews of restaurants in the Tampa, Florida, area were used to build this example search.
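The real-time query path in Figure-1 can be sketched as a tag lookup that narrows millions of reviews to a small candidate set before any expensive semantic ranking runs. The tag names, review ids, and naive substring matching below are all illustrative assumptions.

```python
# Sketch of tag-targeted retrieval: the tag index maps short labels to
# review ids, so a query only fans out to the reviews whose tags it hits.
tag_index = {
    "crab": ["review_17", "review_42"],
    "cuban": ["review_3"],
    "service": ["review_8", "review_42"],
}

def candidates(query: str) -> list[str]:
    """Return the deduplicated review ids whose tags appear in the query.
    Naive substring matching; a real system would match on query intent."""
    hits = []
    for tag, ids in tag_index.items():
        if tag in query.lower():
            hits.extend(i for i in ids if i not in hits)
    return hits

print(candidates("best crab and service in Tampa"))
# ['review_17', 'review_42', 'review_8']
```

Only this small candidate set then goes through semantic ranking and the LLM, which is the "targeted search" the article describes.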
The recommendations shown were drawn from five-star reviews in the Tampa, Florida, area for "crab" searches.
Figure-2 - RAG output using LLMs and FAISS running on index-tags
{'review': 'So excited to find a great Mexican place in Tampa',
 'label': 'emotional'},
{'review': 'And OMG the warm carrot cake', 'label': 'food_talk'},
{'review': 'This place is a Tampa classic', 'label': 'emotional'},
{'review': 'The Cuban sandwich was on point', 'label': 'food_talk'}
Original review: 'This place is a Tampa classic. The Cuban sandwich was on point. The breakfast was another winner. I enjoyed also the devil crabs. Appreciate the reasonable prices, can't wait to come back and have again'
Filtered review: 'The Cuban sandwich was on point. I enjoyed also the devil crabs'
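The filtering step above can be sketched directly from the labeled sentences: everything tagged 'emotional' is dropped, and only 'food_talk' sentences survive into the filtered review.

```python
# Reproduce the filtered review from sentence-level labels.
labeled = [
    ("This place is a Tampa classic", "emotional"),
    ("The Cuban sandwich was on point", "food_talk"),
    ("I enjoyed also the devil crabs", "food_talk"),
]
filtered = ". ".join(s for s, label in labeled if label == "food_talk")
print(filtered)
# The Cuban sandwich was on point. I enjoyed also the devil crabs
```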
Gaspar's Grotto has a little something for everyone! This is my go to bar during the week and the weekend, whether you're in the mood to chill, dance, or brunch with the best of them, there's something for everyone. They have the best deviled crab I've had since in Tampa.
So delicious!!! Do not hesitate to come eat your little Crab Heart out. Crab spice is just hot enough, and I love hot.
I'm blown away by the service and the food! I ordered the ayce crab and they don't lie! Big claws and clusters and they bring it out faster then I could eat it. I'm a PICKY person and this was hands down the BEST crab I've had in town!!! See ya next week! ….
“I want you to act as a data scientist and analyze the dataset. Provide me with the exact and definitive answer to each question. Do not provide me with the code snippets for the questions. The dataset is provided below. Consider the given dataset for analysis. The first row of the dataset contains the header. Convert the filtered_customer_review column into 20 words. return as summarized food-magazine article to lure foodie customer to restaurant in 'name' column as per food in column filtered_customer_review”
Final results for "crab" as a query:

Gaspar's Grotto: This popular bar offers something for everyone, including the best deviled crab in Tampa. Come for a chill night, a dance, or brunch with friends.
Mr and Mrs Crab: Delicious crab dishes await at this restaurant. Try the Crab Spice for a hot and tasty treat.
Surf Shack: Ayce crab is the star of the show here. Big claws and clusters are served quickly and are sure to please even the pickiest of eaters.
Brocato's Sandwich Shop: This spot has the best deviled crabs around. Try the Italian sandwich for a meal that weighs 3 pounds.
Crafty Crab: Take your mom here for a great time and delicious crab legs.
Seascapes Beach Club: Outstanding crab legs are the highlight of this beach club. Enjoy a delicious meal with a view….
Another search, for the query "Cubano" food:

Pepo's Cafe: Enjoy the best Cuban food in Tampa at Pepo's Cafe.
Franci's Cafe: Franci's Cafe offers fresh, authentic Cuban food that will delight your taste buds.
Columbia Restaurant: When you're craving Cuban, head to Columbia Restaurant for a delicious meal.
The Stone Soup Company: The Stone Soup Company offers award-winning Cuban cuisine.
La Teresita Cafe: La Teresita Cafe is the best place to get Cuban food in Tampa.
Pinky's: Pinky's is the go-to spot for the best breakfast Cuban in Tampa.
Domain-based reduction of the indexing footprint prior to vector search
Figure-5 - Example of reduction in number of tokens to be indexed
Using a vector database together with semantic search improves GPT's performance in answering questions over massive amounts of text data. This article provided step-by-step instructions for reducing indexing costs by automatically tagging and labeling documents for indexing.
The data presented above indicates an 85% decrease in the number of tokens that need to be indexed. Token counts were measured with OpenAI's tiktoken tokenizer, following OpenAI's example on counting tokens.
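The reduction figure is simple arithmetic once the before and after token counts are known. The article measured real counts with tiktoken; the chunk and tag sizes below are hypothetical values chosen only to illustrate the calculation behind an 85% figure.

```python
# Illustrative reduction calculation (sizes are assumptions, not measured).
full_chunk_tokens = 1500   # tokens indexed per document originally
tag_tokens = 225           # tokens indexed once tags replace the full text

reduction = 100 * (full_chunk_tokens - tag_tokens) / full_chunk_tokens
print(f"{reduction:.0f}% fewer tokens to index")   # 85% fewer tokens to index
```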
Authors: Anshuman Guha, Sid Kashiramka, Ravi Krishnan