Demystifying Prompt Engineering for ChatGPT

By Jaya Susan Mathew

ChatGPT was introduced in November 2022, sparking a revolution of generative AI adoption. Before its publication, only academia-related people, researchers, and experts in the field knew about GenAI and the myriad possibilities it brings. By August 2023, McKinsey reported that one-third of the users surveyed already have used generative AI to enhance their work. 

With the publication of the paper, ‘Attention is all you need’ in 2017, transformer architecture and attention mechanism made way for plethora of large language models (LLMs) developed by both research groups and companies, as shown in the figure, some of which are open sourced and available at (https://huggingface.co/models). 

Previously, users needed to learn to code in various computer programming languages like C, C++, Python, Java, SQL etc., to get the computer to understand what they wanted to do and use an AI model, which created a barrier to entry when trying to use NL models.

Now with using natural language via an AI prompt to state what they want to accomplish, this breakthrough has been a turning point in the adoption and is indeed helping democratize AI. 

Introduce ChatGPT to motivate the need to learn a bit how to craft prompts: 

ChatGPT, which was introduced in Nov-2022, has gained popularity and now practically everyone is looking for ways to use it to improve their productivity. ChatGPT’s popularity stems from the fact that it can take NL input from the user and return appropriate responses in a variety of formats such as natural language, code, etc. 

To ensure the adoption of these models continue to grow, the output from the models has to be of acceptable quality. The quality of the responses that a GenAI application like ChatGPT returns not only depends on the underlying model but also on the types of prompts the user provides to the model. 

So as an end user, we have limited levers to change the underlying models (settings like max response, temperature, top-p, frequency penalty, presence penalty etc.) but can tweak the response by learning how to prompt the model effectively. 

Often the user might notice that the same prompt might elicit different responses, to ensure the response stays more or less consistent while the user experiments with the prompts, the user could start off by setting the model’s ‘temperature’ parameter to equal 0.

What is a prompt in AI (and additional terminology) 

Before we deep-dive into how to create a good prompt, let’s get started by defining what a ‘prompt’ is. ‘Prompts’ are simply ways the user can tell an application like ChatGPT what the user wants it to do (for example: summarize a text, write a joke etc.). 

Next, the term ‘prompt engineering’ describes the process of improving the prompt, where users who use the applications can improve the quality of the generated responses. 

Let’s also define a few additional terms that the user might encounter when using GenAI applications like ChatGPT and develop their own custom prompts.

  • First is the ‘system message’ which helps sets the context for the model by describing expectations from the model and constraints (if any). 
    For example, ‘You are an AI assistant that helps people find information.’, this tells the system that it’s generic role is to help people find information via the Chat UI.
  • Next in the ‘user prompt’ the user provides additional context and describes the NL task to be performed. 
    For example: “Translate the following text into French: On a scale of 1 to 10, how satisfied are you with your in-store experience today?’’.  

Artificial Intelligence Prompt Engineer manual - Designing prompts and tips on how to create good prompts, what to avoid when creating prompts.

  • Start off by creating very clear, simple prompts to get a good baseline response from the model. Often long convoluted instructions tend to result in incorrect responses from the model.
    For example: “Translate the following text into French: ‘On a scale of 1 to 10, how satisfied are you with your in-store experience today?’’’.  
  • Examine the quality of the response generated and then help improve the output by giving the model examples which are called one-shot/few-shot learning prompts that can include one, or a few, examples of the output you expect. This will ensure that the model can generate further responses in the same style as the examples provided in the prompt.
    For example: “‘Hello, welcome to the store’ in English can be translated into French as ‘Bonjour, bienvenue dans le magasin’. Now, translate the following text into French: ‘On a scale of 1 to 10, how satisfied are you with your in-store experience today?’’’.  
  • To further improve the output, often providing a line to provide step by step reasoning while generating the output could prove to be useful. This technique is especially helpful when the question posed by the user requires some mathematical computation or deductive reasoning and the answer is not found directly in the document corpus. 
    For example: in the system message, the user could add the following text: ‘If the question has multiple parts or if the question is complex, think step by step to generate the answer’. For more details refer to: Chain of thought reasoning.
  • If the user has any other specifications on how the model generates the output for example detailed vs short answer, further instructions can be provided. 
    As an example: ‘Provide detailed, accurate and factual answers to user questions.’.

Preventing hallucinations and ensuring the output is grounded

While the output in majority of the cases seem coherent, one of the major issues with putting these models into production is that the output generated could be completely inaccurate, ungrounded (fabricated with no citation). To prevent inaccurate responses, the prompts can also include grounding data to provide additional context. 

To do so, an AI prompt engineer should include the contextual data in the prompt so that the model can use it to generate an appropriate output. Unlike the initial few models like GPT-3.5 which had a context window of 4,096 tokens, the latest GPT4-Turbo model has a rather large context window of 128K tokens (tokens are essentially pieces of words, typically a token is approximately equal to 3/4th of a word in English). 

Often editing the system prompt also might help reduce model hallucinations or ungrounded responses. 
For example, ‘You are an AI assistant that helps people find information, use only the information provided to generate a response.’.  If needed, the user can even add such text to prevent ungrounded responses ‘If there isn't enough information below, say you don't know. Do not generate answers that don't use the sources below’. 

Users have to also bear in mind that these LLMs are pre-trained on large datasets (often open source from the internet and other sources) but at a specific point in time, hence they lack knowledge about concepts outside that training dataset (for example if the training data is from 2022 and the user asks about an event from 2023, the models tends to give inaccurate information unless additional data is provided). Here the user can provide more data to the context to help generate data for the newer timeframe. 

As time goes on, newer models or model generations will have richer capabilities but also bring unique quirks and trade-offs in terms of cost & complexity. Often, the latest and more expensive GPT-4 models might not be needed for your use case, hence experimenting with GPT-3 models might be worth investing some time in to check for the output generated by the different models before finalizing on the model to be used in production. 

Safeguarding the system

Prior to putting these prompts into production systems, it is also good to be aware of the vast domain of ‘prompt hacking’ and how these systems need to be safeguarded against malicious users. 

‘Prompt hacking’ is a term used to describe a type of attack that exploits the vulnerabilities of LLMs, by manipulating their inputs or prompts. Unlike traditional hacking, which typically exploits software vulnerabilities, prompt hacking relies on carefully crafting prompts to deceive the LLM into performing unintended actions like Prompt injection, Prompt leaking, Jailbreaking. 

Summary

Prompt engineering as a domain is still evolving. It is a super-fresh, new and exciting field to explore. It is an opportunity for data science experts and non-technical people alike by introducing the natural language into the human-machine relations.

Further reading:

To learn more about common AI terminology, refer to https://news.microsoft.com/10-ai-terms and for more information about the mathematical principles behind these LLMs, refer to Generative AI for Beginners.

References: 

Papers:

SIGN UP FOR THE DSS PLAY WEEKLY NEWSLETTER
Get the latest data science news and resources every Friday right to your inbox!