Large Language Models (LLMs) are revolutionizing the AI domain at an incredible pace. The diffusion of such powerful models is happening across various domains, from education to entertainment to healthcare.
Although the flexibility of this technology is laudable, the reliability of the generated content needs to be approached cautiously. The propensity of LLMs to generate non-factual information, known as hallucinations, is a bottleneck for deploying such powerful models in accuracy-sensitive spaces like healthcare, biomedicine, and the military, among others.
Approaches such as fine-tuning and domain-specific pre-training have been useful in addressing this issue to some extent; however, they come with the overhead of additional computational expense. This blog discusses an alternative approach to empowering LLMs to generate credible and factual content using Knowledge Graphs (KGs). KGs represent knowledge explicitly in a connected fashion, while decoder-only models (such as GPT) represent knowledge in a parametric fashion. Domain-specific explicit knowledge retrieved from a KG can be used to augment the user prompt, complementing the parametric knowledge of the LLM.
Such augmentation provides additional context for the LLM to generate content that is grounded in factual knowledge from the KG. This approach not only enhances the veracity of LLM responses but also comes with minimal computational expense.
Before delving into the nitty-gritty of this exciting topic, it is essential to establish clear definitions for a few key concepts. Hence, we will start by defining what a KG is, then look at how LLMs work, and finally explore the ways in which KGs can be leveraged to enhance LLM performance.
A KG is a way of representing information as a graph, with nodes representing concepts and edges representing the relationships between them. This form of representation offers several benefits. One key advantage is the ability to discover new knowledge by navigating the connections within the graph, which helps unveil actionable insights and supports informed decision-making.
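To make this concrete, here is a minimal sketch of a toy KG built with the networkx library in Python. The nodes and relations are hypothetical and chosen purely for illustration.

```python
import networkx as nx

# A tiny, hypothetical knowledge graph: nodes are concepts and
# each directed edge carries a 'relation' label describing the link.
kg = nx.DiGraph()
kg.add_edge("Aspirin", "COX-1", relation="inhibits")
kg.add_edge("COX-1", "Prostaglandin synthesis", relation="drives")
kg.add_edge("Prostaglandin synthesis", "Inflammation", relation="promotes")

# Discover new knowledge by navigating connections:
# which concepts are reachable from 'Aspirin'?
for concept in nx.descendants(kg, "Aspirin"):
    print("Aspirin is connected to:", concept)
```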
Knowledge Graphs (KGs) find applications in various aspects of our daily lives. For instance, Google employs a knowledge panel sourced from its KG to provide information related to your search queries. Pinterest utilizes its own KG to suggest images to users, while Uber relies on its KG to recommend food items tailored to individual user preferences.
Social media platforms such as Facebook use their social graph to suggest connections with messages like ‘People you may know’. In biomedicine, KGs serve as invaluable resources, facilitating insights into drug mechanisms and thereby aiding in repurposing or discovering new drugs. To summarize, a KG is an excellent way of representing knowledge, and its applicability across diverse domains demonstrates its versatility.
As advancements and breakthroughs unfold in the field of AI, understanding the concept of Large Language Models (LLMs) and their underlying mechanisms becomes crucial. At the core of an LLM lies the Transformer network, prompting the fundamental question: what exactly is a Transformer network?
A Transformer is a type of neural network designed to process sequences and discern the relationships between their elements through an efficient technique known as the ‘attention mechanism’. What sets Transformers apart from their predecessors is their capacity to process sequences in parallel, significantly enhancing processing speed. Initially employed for language translation tasks, such as English-to-French translation, the Transformer architecture found its way into other language modeling efforts. This involved stacking multiple Transformer layers, resulting in the creation of LLMs with parameters numbering in the billions or even trillions.
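As a rough illustration of the attention mechanism, the NumPy sketch below implements single-head scaled dot-product attention. Real Transformers add learned projections, multiple heads, masking, and feed-forward layers, so treat this only as a schematic.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Minimal single-head attention: every position attends to
    every other position in the sequence in parallel."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over positions
    return weights @ V                                  # weighted mix of values

# Toy sequence of 4 tokens, each represented by an 8-dimensional vector
x = np.random.randn(4, 8)
out = scaled_dot_product_attention(x, x, x)             # self-attention
print(out.shape)  # (4, 8)
```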
While there are various types of LLMs available today, varying in scale based on their parameters, the foundational structure continues to be rooted in the Transformer model.
Large Language Models (LLMs) can be categorized into three types: Encoder-Decoder type (e.g., T5 model), Encoder-only (e.g., BERT model), and Decoder-only types (e.g., GPT models). The distinctions among these LLMs stem from variations in their architectural style and training methodologies. This discussion will specifically delve into Decoder-only type LLMs.
Decoder-only type LLMs, such as GPT, Llama, PaLM, and others, operate as ‘next-token predictors’. In essence, these models use the given input to generate the subsequent token. The generated token, in conjunction with the previous inputs, is then employed to generate the next token, and so forth. This process is known as the autoregressive approach to text generation. A key insight from this brief overview of how GPT-like models function is that they represent the knowledge necessary for predicting the next token within the weight parameters of the Transformer neural network. Hence, LLMs have implicit or parametric knowledge representation.
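The autoregressive loop can be summarized in a few lines of schematic Python; `model.predict_next`, `tokenizer`, and `eos_id` are hypothetical placeholders rather than the API of any specific library.

```python
def generate(model, tokenizer, prompt, max_new_tokens=20):
    """Autoregressive decoding: repeatedly predict the next token and
    append it to the context before predicting again."""
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        next_token = model.predict_next(tokens)   # hypothetical next-token predictor
        tokens.append(next_token)
        if next_token == tokenizer.eos_id:        # stop at end-of-sequence
            break
    return tokenizer.decode(tokens)
```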
The application of parametric knowledge representation extends to various capabilities, with one notable example being text generation, as exemplified by models like ChatGPT. This functionality allows these models to serve as user-friendly chatbots. Given that LLMs undergo training on an extensive dataset comprising billions or trillions of tokens, they acquire a comprehensive grasp of the language on which they are trained. This broad understanding enables them to perform tasks for which they were not specifically trained.
For instance, even though ChatGPT may not be explicitly trained to recognize sentiment in movie reviews, it can still accomplish this task when prompted. This capability, known as zero-shot prediction, exemplifies another advantage of parametric knowledge representation. Moreover, these parameters can be fine-tuned using custom data, allowing the infusion of domain specificity into the LLM.
Despite its advantages, parametric knowledge representation comes with certain limitations. LLMs occasionally show a proclivity to generate non-factual information in response to user prompts, a phenomenon known as 'hallucination'. This poses a significant challenge, particularly in domains where accuracy is paramount, such as biomedicine and military applications.
Another drawback is the lack of provenance in the text generated by an LLM. The generated text originates from the embedded knowledge in the parameters, tuned based on the training data. Consequently, tracing the provenance associated with the generated text becomes challenging, raising concerns about the veracity of the information. Furthermore, parametric knowledge is confined to the data used for training up to a specific point in time. Updating the knowledge in LLMs necessitates continual training, incurring additional computational costs.
This holds true for fine-tuning LLMs to impart domain specificity, introducing a similar challenge of computational overhead.
As mentioned in the definition of KGs, they represent knowledge explicitly using nodes and edges. LLMs, on the other hand, represent knowledge in an implicit fashion within their parameters. Integrating these two different types of knowledge representation offers many advantages. Primarily, the interconnected knowledge within a KG can furnish a context-rich environment for an LLM to address user queries effectively. Consider a scenario where a user poses the question "who is the president of the United States?".
The answer to this question changes over time, requiring the LLM to continually update its internal knowledge to provide accurate responses. Continuously retraining the LLM to keep its knowledge up to date is computationally intensive. Here, KGs come to the rescue, offering a more cost-effective way to maintain up-to-date knowledge than retraining the LLM.
The approach involves extracting information from a KG to provide context to the LLM. Consequently, the LLM can utilize this information to generate timely and accurate responses without the need for expensive parameter tuning. This approach is known as 'Retrieval Augmented Generation' (RAG).
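A minimal sketch of this RAG pattern is shown below, reusing the toy networkx graph from earlier. Here `llm_complete` stands in for whatever LLM API a real system would call, and the retrieval is deliberately naive (simple triple matching rather than vector search or graph queries).

```python
def retrieve_facts(kg, entity):
    """Collect (subject, relation, object) facts that mention the entity."""
    facts = []
    for subj, obj, data in kg.edges(data=True):
        if entity in (subj, obj):
            facts.append(f"{subj} {data['relation']} {obj}")
    return facts

def answer_with_rag(kg, question, entity, llm_complete):
    # Augment the user prompt with explicit, up-to-date facts from the KG
    context = "\n".join(retrieve_facts(kg, entity))
    prompt = (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm_complete(prompt)   # hypothetical call to any LLM
```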
In addition to providing up-to-date information, KGs offer further benefits to LLMs. Since information is represented in a connected fashion in a KG, it serves as a valuable resource for conducting complex multi-hop reasoning. KGs have been extensively employed for predicting the likelihood of a relationship between two nodes based on their connectivity, exemplified by applications like Facebook's friend suggestions in the 'People you may know' feature. This capability allows LLMs to leverage KGs as reasoning tools when responding to user prompts, offering well-informed answers grounded in factual knowledge.
For instance, if a user seeks an explanation for why drug A is effective in treating disease B, the KG would have nodes representing both drug A and disease B. Traversing the graph from drug A to disease B enables the collection of meaningful connections that elucidate the mechanistic reason behind the efficacy of drug A in treating disease B. This contextual information can then be incorporated into an LLM’s response to provide a comprehensive answer to the user’s query.
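On the toy graph from earlier, such a traversal could be sketched as follows; the drug and disease names are illustrative, and a production system would use richer path ranking than a single shortest path.

```python
import networkx as nx

def explain_path(kg, drug, disease):
    """Verbalize the chain of relations connecting a drug to a disease,
    ready to be inserted into an LLM prompt as mechanistic context."""
    path = nx.shortest_path(kg, source=drug, target=disease)
    steps = [f"{u} {kg.edges[u, v]['relation']} {v}"
             for u, v in zip(path, path[1:])]
    return "; ".join(steps)

# e.g. explain_path(kg, "Aspirin", "Inflammation")
# -> "Aspirin inhibits COX-1; COX-1 drives Prostaglandin synthesis;
#     Prostaglandin synthesis promotes Inflammation"
```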
In addition to being a context provider, a KG also qualifies as an excellent data source for training an LLM. LLMs necessitate high-quality data for parameter tuning and the generation of coherent text, with data processing being a pivotal step prior to training. KGs lend themselves well to this step since they adhere to a structured data modeling schema that defines how nodes and edges are represented. It is important to note that this approach of utilizing a KG for LLM training demands careful consideration of graph serialization. Choosing the order of relations to include in the training data is crucial for capturing intricate relationships within the data. For instance, if we have a graph illustrating the interactions between products and users, training an LLM using first-order relations, such as which products a user utilizes, provides one level of insight.
Alternatively, training with higher-order relations, such as second-order relations revealing that products used by user A are also purchased by user B, could enable the LLM to comprehend user similarity based on product interactions. To summarize, utilizing KGs as a data source for LLM training could contribute to the development of compelling and impactful LLM applications.
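The sketch below shows one hypothetical way to serialize such a user–product graph into training sentences at first and second order; the graph, entities, and sentence templates are made up purely for illustration.

```python
import networkx as nx

# Hypothetical directed graph of users and the products they use
g = nx.DiGraph()
g.add_edges_from([("user_A", "camera"), ("user_A", "tripod"),
                  ("user_B", "camera"), ("user_B", "drone")])

# First-order serialization: one training sentence per edge
first_order = [f"{user} uses {product}." for user, product in g.edges()]

# Second-order serialization: users linked through a shared product
second_order = []
for product in {v for _, v in g.edges()}:
    users = sorted(g.predecessors(product))
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            second_order.append(f"{a} and {b} both use {product}.")

print(first_order)   # e.g. "user_A uses camera."
print(second_order)  # e.g. "user_A and user_B both use camera."
```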
The integration of KGs and LLMs could emerge as a potent technological alliance, showcasing its prowess in addressing real-world challenges. This integration goes beyond the conventional method of training an LLM on language corpora alone, thereby offering extended utility and versatility. Whether a KG is employed within a RAG framework or used directly as training data for an LLM depends on the specific problem at hand.
In essence, the synergy between KGs and LLMs seamlessly integrates explicit and implicit knowledge, respectively. This approach, grounded in factual knowledge, has the potential to enhance the reliability of the text generated by the LLM, making it suitable for deployment in knowledge-intensive domains.
Find the author on LinkedIn.