Knowledge Graphs and LLMs - tackling limitations with synergy

By Konrad Budek

Jörg Schad, CTO at ArangoDB, illustrates how the fusion of Large Language Models and Knowledge Graphs creates a potent synergy, bypassing the need for specialist teams to manage database queries and averting model hallucinations.

Large language models (LLMs), driven particularly by the success and versatility of ChatGPT, are making significant waves in tech. At the same time, knowledge graphs have captured attention too, and are actively used by industry giants such as Amazon, Facebook, and Google.

In a recent webinar, Jörg Schad showed that combining these technologies creates a synergy that boosts their effectiveness and offsets their downsides.

What's a large language model?

"A Large Language Model is a probabilistic model of natural language. It essentially serves as a natural language interface to general knowledge. Everybody has tried ChatGPT or GPT-4," says Jörg Schad. "You basically get human-like output. That would pass the Turing test," he adds.

Yet large language models also come with limitations. For example, they are prone to hallucinations. "The language model generates random facts that are not based on the data it was trained on and do not correspond to reality. This is because it was trained on unstructured data and delivers probabilistic outcomes," says the expert.
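To make "probabilistic model of natural language" concrete, here is a toy bigram model in Python. It is a drastic simplification (real LLMs are neural networks over tokens, not word-pair counts), and the two-sentence corpus is hypothetical, but it shows how a model that only learns which word tends to follow which can assign high probability to a fluent sentence that never appeared in its training data and is false.

```python
from collections import Counter, defaultdict

END = "<end>"  # marker for the end of a sentence

def train_bigram(sentences):
    """Estimate P(next word | previous word) from word-pair counts."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split() + [END]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {word: n / sum(nxts.values()) for word, n in nxts.items()}
        for prev, nxts in counts.items()
    }

def generate_all(model, start, max_len=20):
    """Enumerate every sentence the model can emit from `start`, with its probability."""
    results = {}

    def walk(words, prob):
        if len(words) > max_len:  # guard against cycles in larger corpora
            return
        for word, p in model.get(words[-1], {}).items():
            if word == END:
                results[" ".join(words)] = prob * p
            else:
                walk(words + [word], prob * p)

    walk([start], 1.0)
    return results

# Hypothetical training data: two true statements.
corpus = [
    "paris is the capital of france",
    "rome is the capital of italy",
]
model = train_bigram(corpus)
for sentence, prob in generate_all(model, "paris").items():
    print(f"{prob:.2f}  {sentence}")
# 0.50  paris is the capital of france
# 0.50  paris is the capital of italy
```

The false sentence gets the same probability as the true one, because after "of" the model has seen "france" and "italy" equally often: it has learned word patterns, not facts.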

What's a knowledge graph?

Basically, a knowledge graph represents a network of real-world entities and the relationships between them. "There is also a knowledge graph version of Wikipedia, delivering knowledge in a structured form," says the expert. Knowledge graphs can be private or public, depending on the needs of the company that creates them, and are widely used by companies such as Google, Facebook, and Amazon.
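A common way to represent such a network is as a set of (subject, relation, object) triples. The minimal Python sketch below stores a few hypothetical facts that way and answers lookups. It can only return facts that were explicitly inserted, which is both its strength and, as Schad notes next, its limitation.

```python
from collections import defaultdict

class TripleStore:
    """A minimal in-memory knowledge graph: entities linked by named relations."""

    def __init__(self):
        self.triples = set()
        self.outgoing = defaultdict(set)  # subject -> {(relation, object)}

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))
        self.outgoing[subject].add((relation, obj))

    def objects(self, subject, relation):
        """Return every object linked to `subject` via `relation`."""
        return {o for r, o in self.outgoing.get(subject, set()) if r == relation}

# Hypothetical facts; relation names are illustrative.
kg = TripleStore()
kg.add("Paris", "capital_of", "France")
kg.add("Paris", "located_in", "Europe")
kg.add("France", "member_of", "European Union")

print(kg.objects("Paris", "capital_of"))    # {'France'}
print(kg.objects("Paris", "twinned_with"))  # set(): nothing was inserted for this relation
```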

"What is easy to see is that these concepts complement each other," says Jörg Schad. "Knowledge graphs require an expert to retrieve information, and the graph is limited to the knowledge that was inserted into it. The large language model requires no skill, but it lacks custom knowledge and suffers from hallucinations," he adds. "And together, they form a powerful combination."

Combining knowledge graphs with LLMs

Jörg Schad illustrates how ArangoDB harnesses the strengths of both technologies to build a tool with a natural language interface that sidesteps the problem of hallucinations.

Three pivotal intersections between Knowledge Graphs and Large Language Models:
  • Structured Knowledge in Training: While text is inherently unstructured, integrating a knowledge graph provides structure, facilitating easier access to information.
  • LLM-Enhanced Knowledge Graphs: This approach capitalizes on LLM capabilities for enriching or generating knowledge graphs.
  • Harmonized Knowledge Platforms: A collaborative framework where both technologies operate in tandem.

The expert also shared more detailed use cases of the two technologies working together.

Structuring the knowledge

If a company already holds data, for example in a data lake, an LLM can be used to transform that data into a knowledge graph.

"A knowledge graph consists of separate entities. For example, a paper can be related to the conference where it was published, its author, its title, the year it was published, and sometimes other papers that cite it or that it cites," says Jörg Schad.
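The bibliographic example can be sketched as typed, directed edges. The Python below uses hypothetical paper, author, and conference identifiers and only the standard library. Even a two-hop question such as "who wrote the papers that cite P1?" requires knowing the edge types and their direction, which is the kind of schema knowledge the next paragraph attributes to the expert team.

```python
from collections import defaultdict

# Hypothetical bibliographic graph as (source, edge_type, target) edges.
edges = [
    ("P1", "published_at", "ConfA"),
    ("P2", "published_at", "ConfA"),
    ("P3", "published_at", "ConfB"),
    ("P1", "written_by", "Alice"),
    ("P2", "written_by", "Bob"),
    ("P3", "written_by", "Carol"),
    ("P2", "cites", "P1"),  # P2 cites P1
    ("P3", "cites", "P1"),
    ("P3", "cites", "P2"),
]
year = {"P1": 2019, "P2": 2021, "P3": 2022}  # node attributes (hypothetical)

# Index edges in both directions so we can ask "who points at X?" cheaply.
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for src, kind, dst in edges:
    out_edges[src].append((kind, dst))
    in_edges[dst].append((kind, src))

def follow(node, kind, direction="out"):
    """Neighbours of `node` over edges of type `kind`, in the given direction."""
    index = out_edges if direction == "out" else in_edges
    return [other for k, other in index.get(node, []) if k == kind]

citing = follow("P1", "cites", direction="in")             # papers that cite P1
authors = sorted({a for p in citing for a in follow(p, "written_by")})
recent = [p for p in citing if year[p] >= 2022]            # filter on an attribute
print(citing)   # ['P2', 'P3']
print(authors)  # ['Bob', 'Carol']
print(recent)   # ['P3']
```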

As he points out, a key challenge with knowledge graphs is that every question has to be prepared and processed by an expert team, which acts as a buffer between the knowledge graph and the rest of the organization.

"This slows down the process, hampers business performance, and puts additional stress on the expert team," comments Jörg Schad. The solution is to use the LLM as a man-in-the-middle that replaces the expert team in knowledge extraction.
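One common way to put an LLM in the middle is to let it translate a business user's question into a structured query over the graph schema, which the system then validates and executes. The Python sketch below assumes that pattern; it is not ArangoDB's implementation. `call_llm` is a hypothetical placeholder for a real model API (it returns a canned reply so the example runs offline), and the JSON query format is invented for illustration.

```python
import json

# Hypothetical schema and data, standing in for a real graph database.
SCHEMA = "Edges: (paper)-[cites]->(paper), (paper)-[written_by]->(author)"
ALLOWED_EDGES = {"cites", "written_by"}
EDGES = [
    ("P2", "cites", "P1"),
    ("P3", "cites", "P1"),
    ("P2", "written_by", "Bob"),
    ("P3", "written_by", "Carol"),
]

def build_prompt(question):
    """Give the model the schema so it can map the question onto it."""
    return (
        'Translate the question into JSON with keys "edge" and "target".\n'
        f"Schema: {SCHEMA}\nQuestion: {question}\nJSON:"
    )

def call_llm(prompt):
    """HYPOTHETICAL placeholder for a real LLM call; returns a canned reply."""
    return '{"edge": "cites", "target": "P1"}'

def run_query(query):
    """Return all sources connected to `target` by an edge of type `edge`."""
    return sorted(
        src for src, edge, dst in EDGES
        if edge == query["edge"] and dst == query["target"]
    )

def answer(question):
    query = json.loads(call_llm(build_prompt(question)))
    if query.get("edge") not in ALLOWED_EDGES:  # reject edge types the schema lacks
        raise ValueError(f"unknown edge type: {query.get('edge')!r}")
    return run_query(query)

print(answer("Which papers cite P1?"))  # ['P2', 'P3']
```

Checking the generated query against the known schema before running it is one way to stop a mistaken model output from silently returning wrong results.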

Generating a knowledge graph

Typically, a knowledge graph is generated from multiple sources, such as relational databases or data lakes. An LLM can help the company parse unstructured text data into the knowledge graph.
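For structured sources such as relational tables, the mapping can often be done deterministically, without an LLM: each row becomes a vertex, non-key columns become attributes, and each foreign key becomes an edge. A minimal Python sketch with hypothetical `authors` and `papers` tables; the `collection/key` identifiers mirror the document-handle style ArangoDB uses, but nothing here calls ArangoDB.

```python
# Hypothetical rows from two relational tables.
authors = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
]
papers = [
    {"id": 10, "title": "Paper A", "year": 2021, "author_id": 1},
    {"id": 11, "title": "Paper B", "year": 2022, "author_id": 2},
]

def rows_to_graph(authors, papers):
    """Map rows to vertices and foreign keys to edges."""
    vertices = {}
    edges = []
    for row in authors:
        vertices[f"authors/{row['id']}"] = {"name": row["name"]}
    for row in papers:
        key = f"papers/{row['id']}"
        # Non-key columns become vertex attributes.
        vertices[key] = {"title": row["title"], "year": row["year"]}
        # The author_id foreign key becomes a written_by edge.
        edges.append((key, "written_by", f"authors/{row['author_id']}"))
    return vertices, edges

vertices, edges = rows_to_graph(authors, papers)
print(edges)
# [('papers/10', 'written_by', 'authors/1'), ('papers/11', 'written_by', 'authors/2')]
```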

"It can be text from Wikipedia, and the model can be used to identify and extract each entity in the text and then determine the relations. Large language models can both detect entities and map their relations," says Jörg Schad. This speeds up building the knowledge graph and significantly reduces its cost.
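A minimal sketch of that extraction step, assuming the model is prompted to return JSON triples. `call_llm` is a hypothetical placeholder returning a canned reply, and the prompt wording and JSON format are assumptions. The part worth showing is the defensive parsing: model output is checked and de-duplicated before anything is written to the graph.

```python
import json

PROMPT = (
    "Extract entities and relations from the text below. Answer only with a "
    'JSON list of objects with keys "subject", "relation", "object".\n\nText: {text}'
)

def call_llm(prompt):
    """HYPOTHETICAL stand-in for a real LLM API call; returns a canned reply."""
    return json.dumps([
        {"subject": "Marie Curie", "relation": "born_in", "object": "Warsaw"},
        {"subject": "Marie Curie", "relation": "won", "object": "Nobel Prize in Physics"},
        {"subject": "Marie Curie", "relation": "born_in", "object": "Warsaw"},  # duplicate
        {"subject": "Warsaw"},  # malformed: missing keys
    ])

def extract_triples(text):
    """Ask the model for triples, then keep only well-formed, unique ones."""
    raw = call_llm(PROMPT.format(text=text))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable output: add nothing rather than guess
    if not isinstance(items, list):
        return []
    triples = []
    for item in items:
        if not isinstance(item, dict):
            continue
        parts = [item.get(k) for k in ("subject", "relation", "object")]
        if all(isinstance(p, str) and p.strip() for p in parts):
            triple = tuple(p.strip() for p in parts)
            if triple not in triples:
                triples.append(triple)
    return triples

text = "Marie Curie was born in Warsaw and won the Nobel Prize in Physics."
for triple in extract_triples(text):
    print(triple)
```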

The power of embeddings 

"The knowledge graph will never be complete, and some information will be missing," says Jörg Schad. "But classic machine learning can be extremely helpful," he adds.

The key lies in embeddings: long sequences of numbers that represent entities. Related concepts are usually mapped close to each other, because they appear in similar contexts.
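"Mapped nearby" is usually measured with cosine similarity. The Python sketch below uses hand-picked 3-dimensional vectors that are purely hypothetical (real embeddings are learned and typically have hundreds of dimensions) to show how nearest neighbours are found.

```python
import math

# Hypothetical toy vectors, chosen by hand for illustration.
embeddings = {
    "paper":      [0.90, 0.10, 0.00],
    "article":    [0.85, 0.15, 0.05],
    "conference": [0.30, 0.90, 0.00],
    "banana":     [0.00, 0.05, 0.95],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nearest(word):
    """Rank all other concepts by similarity to `word`, most similar first."""
    others = [w for w in embeddings if w != word]
    return sorted(others, key=lambda w: cosine(embeddings[word], embeddings[w]), reverse=True)

print(nearest("paper"))  # ['article', 'conference', 'banana']
```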

"The knowledge graph can be encoded into embeddings. We can leverage that to group or cluster concepts and, by that, find missing links in the knowledge graph and encode them back into it."

As the expert explains, the process can run in cycles: the company encodes the graph into embeddings, uses them to enhance the graph, and then writes the results back. This helps fill the gaps in the graph and map its interdependencies.
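The encode, enhance, and write-back cycle can be sketched in a few lines of Python. To stay self-contained, this swaps learned embeddings for the simplest possible encoding: each entity becomes its binary adjacency vector, and cosine similarity between those vectors (the Salton index, a classic link-prediction heuristic) flags pairs that share most of their neighbours as candidate missing links. The graph and the 0.8 threshold are hypothetical; a real system would use trained graph embeddings and would likely review suggested links before adding them.

```python
import math
from itertools import combinations

def encode(nodes, edges):
    """Encode each node as a binary adjacency vector (a deliberately simple embedding)."""
    index = {n: i for i, n in enumerate(nodes)}
    vectors = {n: [0.0] * len(nodes) for n in nodes}
    for a, b in edges:
        vectors[a][index[b]] = 1.0
        vectors[b][index[a]] = 1.0
    return vectors

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def complete_graph(nodes, edges, threshold=0.8, max_rounds=5):
    """Repeat encode -> score -> add links until no new link passes the threshold."""
    edges = {frozenset(e) for e in edges}  # undirected edges
    for _ in range(max_rounds):
        vectors = encode(nodes, [tuple(e) for e in edges])
        new = {
            frozenset((a, b))
            for a, b in combinations(nodes, 2)
            if frozenset((a, b)) not in edges
            and cosine(vectors[a], vectors[b]) >= threshold
        }
        if not new:
            break
        edges |= new  # write the predicted links back into the graph
    return edges

nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
completed = complete_graph(nodes, edges)
added = completed - {frozenset(e) for e in edges}
print([sorted(link) for link in added])  # [['A', 'D']]
```

A and D share both of A's neighbours (B and C), so their similarity is about 0.82 and the link is added; in the next round no pair reaches the threshold and the cycle stops.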

End-to-end picture

"LLMs can help us create the knowledge graph from data. In that way, they support the expert team in building the knowledge graph," says Jörg Schad. "On the other hand, LLMs support business users, who need neither know the structure of the graph nor have expert knowledge. They can just ask their questions in natural language and have them answered."

This could be implemented as a business chatbot, where users pose questions in natural language, similar to querying an expert team, and receive direct answers.
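A minimal sketch of how such a chatbot might keep answers grounded, assuming a retrieve-then-generate design (a common pattern, not necessarily the one ArangoDB uses): facts about entities mentioned in the question are pulled from the graph and placed in the prompt, and the bot declines when the graph has nothing relevant. The `facts` list and the simple word matching are hypothetical simplifications, and the final LLM call is omitted.

```python
# Hypothetical knowledge graph contents as (subject, relation, object) triples.
facts = [
    ("P1", "published_at", "ConfA"),
    ("P1", "written_by", "Alice"),
    ("P2", "cites", "P1"),
]

def retrieve(question):
    """Return triples whose subject or object appears as a word in the question."""
    words = set(question.replace("?", " ").split())
    return [t for t in facts if t[0] in words or t[2] in words]

def build_grounded_prompt(question):
    """Build an LLM prompt restricted to graph facts, or None if nothing matches."""
    found = retrieve(question)
    if not found:
        return None  # caller should answer "I don't know" instead of letting the model guess
    context = "\n".join(f"{s} {r} {o}" for s, r, o in found)
    return (
        "Answer using only these facts. If they are insufficient, say so.\n"
        f"Facts:\n{context}\nQuestion: {question}"
    )

print(build_grounded_prompt("Who wrote P1?"))
print(build_grounded_prompt("Who wrote P9?"))  # None
```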
