Prompt Engineering

Last Updated: February 2023

Introduction

Prompt engineering refers to the process of designing and optimizing the inputs, or "prompts", to an AI model to elicit the most accurate, relevant, and meaningful outputs. This process involves a deep understanding of the model's capabilities, idiosyncrasies, and limitations, making it a blend of science and art. Implementing applications that rely on engineered prompts also brings its own technical challenges, which are discussed later in this post.

The heart of effective prompt engineering lies in understanding how AI models learn. Large language models, trained on vast datasets, learn to predict the next word in a sentence based on the context provided by the preceding words. Therefore, the way a prompt is framed can significantly influence the model's response. The choice of keywords, the context provided, the tone, and even the specificity of a prompt can dramatically alter the output. Consequently, prompt engineering involves a great deal of experimentation and iteration to find the prompts that yield the desired results.

So, why is prompt engineering so crucial? The answer lies in the fundamental goal of any AI system: to serve as a useful tool that can assist in complex tasks, provide insights, and make our work more efficient. For example, in data analysis, a well-crafted prompt can help an AI model generate a nuanced analysis of complex data, identify patterns, and provide predictions. Similarly, in research, optimized prompts can enable AI models to solve complex problems or generate new hypotheses.

However, while prompt engineering is powerful, it is not without its challenges. The quality of a model's output is highly dependent on the quality and specificity of the input prompt, which can be a double-edged sword. Too vague a prompt might result in broad, irrelevant responses, while an overly specific prompt might constrain the model's creativity and limit the scope of its output. Prompt engineering therefore requires a delicate balance. Furthermore, biases inherent in the training data can surface in the model's responses, which calls for careful scrutiny and continuous tweaking of prompts.

Demonstrating the impact of prompt engineering

We can use the webapp below to see how prompt engineering alters a model's response. The same language model is prompted with a company name. On the left the prompt is passed directly to the model, whilst on the right it is engineered to obtain a more targeted response.
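For readers who cannot run the webapp, the sketch below shows the kind of comparison it performs: the same model is called twice, once with the company name passed straight through and once wrapped in an engineered prompt. It is only a minimal sketch, assuming the openai Python package (v1-style client) and an OPENAI_API_KEY environment variable; the model name, prompt wording, and company name are illustrative rather than what the webapp actually uses.

```python
# Hypothetical sketch of the raw-vs-engineered comparison.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send a single prompt to the model and return the text of its reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

company = "Acme Ltd"  # illustrative input

raw = ask(company)  # prompt passed directly to the model
engineered = ask(
    f"In two sentences, describe what {company} does and who its main "
    "customers are. If you are not sure, say so explicitly."
)

print("Raw response:\n", raw)
print("Engineered response:\n", engineered)
```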

Approaches

There are several ways in which we can engineer prompts to get responses from models that best fit a specific requirement. The "zero-shot" and "few-shot" approaches provide the model with no examples, or just a handful, to guide its output. The "instruction" approach focuses on the task definition or command within the input prompt, influencing the model's output through its specificity, clarity, and context. "Chain of Thought" (CoT) approaches allow for a more dynamic and interactive engagement with the model, breaking down complex tasks into a series of simpler, linked prompts.

Zero-shot

In the context of machine learning, "zero-shot" learning refers to the ability of a model to handle tasks it has not explicitly seen during training. Consider a language model like GPT-4, which has been trained on a diverse set of text data. The model learns to predict the next word in a sequence given the context of the previous words. However, it doesn't explicitly learn tasks like translation or summarization during training. Instead, it learns a rich representation of the data, which can be leveraged to perform these tasks in a zero-shot manner. This is achieved by carefully crafting the input prompt to specify the task and provide the necessary context. Despite not being explicitly trained on these tasks, GPT-4 can perform them, effectively taking a shot at a completely new task with zero examples, hence the name.
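As a minimal sketch of what a zero-shot prompt looks like in code, the snippet below specifies a summarization task entirely in the prompt, with no worked examples. It assumes the openai Python package (v1-style client); the model name and prompt wording are illustrative only.

```python
# Zero-shot: the task is described in the prompt, with no examples given.
from openai import OpenAI

client = OpenAI()

article = "..."  # any passage of text you want summarised

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The instruction alone defines the task; no worked examples follow.
        {
            "role": "user",
            "content": f"Summarise the following text in one sentence:\n\n{article}",
        }
    ],
)
print(response.choices[0].message.content)
```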

Few-shot

"Few-shot" learning, on the other hand, is a learning setup where the model is provided with a small number of examples during the inference stage to guide its output. In the context of a language model like GPT-4, this might look like providing a couple of examples of a specific task as part of the prompt. For example, to make the model generate an analysis, I might start with a couple of examples before providing the initial line for the model to complete. The few-shot setup helps the model understand the task better by providing direct examples, guiding its inference towards more accurate outputs. This approach has proven effective across a range of tasks like text classification, translation, and more.

Chain of Thought (CoT)

The "Chain of Thought" (CoT) approach is a strategy for engaging with AI models in a more dynamic and interactive manner. Instead of crafting single-shot prompts and expecting the model to understand complex tasks in one go, CoT involves breaking down the tasks into a series of simpler, linked prompts. In each step, the model's output from the previous step serves as part of the input for the next, creating a continuous chain of thought. This allows for better control over the model's output, especially for complex or nuanced tasks. However, it requires careful handling of the context window since these models have a limit to how much context they can consider in one go. Therefore, crucial information needs to be reiterated to ensure it's within the model's context window and considered in its responses.

Instruction

The term "instruction" in the context of language models, refers to the task definition or the command given to the model in the input prompt. The instruction is a crucial part of the prompt that guides the model's output. It can be explicit, where the task is directly stated like "Translate all non-English text in the following text to English:", or implicit, where the task is indirectly indicated through the context. The instruction's formulation is a key aspect of prompt engineering, and its specificity, clarity, and context can greatly influence the quality and relevance of the model's output. Furthermore, the instruction can be designed to include reasoning steps or desired output format, providing more control over the model's output.

Technical Challenges

We can see in the section above, "Demonstrating the impact of prompt engineering", that the GPT-4 response is quite slow. The asynchronous function I am using is already optimized to a great extent: it runs the OpenAI API call in a separate thread, which is the best approach I have found so far to speed up I/O-bound tasks like this one. However, the speed of this function is primarily dependent on the response time of the OpenAI API, which is out of my control. If the API is slow, my function will also be slow because it has to wait for the response.
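A minimal sketch of this pattern is shown below: the blocking API call is handed to a worker thread so it does not block the event loop. It assumes the openai Python package (v1-style client) and Python 3.9+ for asyncio.to_thread; the actual function in the webapp may use a different mechanism, such as an executor.

```python
# Run the blocking OpenAI call in a separate thread to keep the event loop free.
import asyncio

from openai import OpenAI

client = OpenAI()

def _blocking_call(prompt: str) -> str:
    """Ordinary synchronous API call; its latency is set by the API itself."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

async def ask_async(prompt: str) -> str:
    # asyncio.to_thread hands the blocking call to a worker thread, so other
    # coroutines can run while we wait for the API to respond.
    return await asyncio.to_thread(_blocking_call, prompt)

async def main() -> None:
    print(await ask_async("Explain prompt engineering in one sentence."))

asyncio.run(main())
```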

AI computations can be quite intensive, and the speed will also depend on the complexity of the task I am asking the model to perform. In this case, I am asking the model to generate a fairly complex and detailed output, which involves obtaining the "raw" model response plus the engineered response, so it takes some time. To improve performance I have tried things like reducing the max_tokens value, but this has the negative effect of also reducing the quality of the response. I have also tried running multiple tasks concurrently, but this risks running into the API rate limits.

Where I have multiple independent tasks, I try to run them concurrently. For example, if I am calling this function multiple times with different prompts and modalities, I use asyncio.gather() to run the calls concurrently. I have also reduced the amount of data requested, for example by lowering max_tokens. This does speed up the response time, but at the cost of potentially less detailed responses. As with many aspects of the implementation, it is a trade-off, and a lot of trial and error is required.
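The sketch below combines both ideas: several independent prompts are run concurrently with asyncio.gather(), and max_tokens is capped to trade detail for speed. It assumes the openai Python package (v1-style client) and the thread-based pattern sketched earlier; the prompts and limits are illustrative only.

```python
# Run independent prompts concurrently and cap max_tokens for faster replies.
import asyncio

from openai import OpenAI

client = OpenAI()

def _blocking_call(prompt: str, max_tokens: int) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,  # smaller values respond faster but say less
    )
    return response.choices[0].message.content

async def ask_async(prompt: str, max_tokens: int = 256) -> str:
    return await asyncio.to_thread(_blocking_call, prompt, max_tokens)

async def main() -> None:
    prompts = [
        "Summarise the benefits of few-shot prompting in two sentences.",
        "Summarise the benefits of chain-of-thought prompting in two sentences.",
    ]
    # The calls run concurrently, so total wall time is roughly that of the
    # slowest call rather than the sum of all of them, subject to rate limits.
    results = await asyncio.gather(*(ask_async(p) for p in prompts))
    for result in results:
        print(result)

asyncio.run(main())
```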

Source code

Source code for this post can be found on my GitHub.

References

Chip Huyen (2023). Building LLM applications for production. Chip Huyen.

Lilian Weng (2023). Prompt Engineering. Lil'Log.

Wei, J., et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903.

Yoran, O., et al. (2023). Answering Questions by Meta-Reasoning over Multiple Chains of Thought. arXiv:2304.13007.