
Showing how Llama-2 can be used as an 'internal' AI assistant alongside 'external' AI assistants such as GPT-4 when, for example, client data must be protected

Introduction

Many enterprises I engage with express hesitancy about transmitting their data through public APIs when leveraging AI models like ChatGPT or PaLM-2. The primary apprehensions are rooted in safeguarding intellectual property (IP) and ensuring stringent data privacy protocols. While the latest version of ChatGPT Enterprise does offer robust security measures, including end-to-end encryption and explicit assurances that user data won't contribute to the ongoing training of OpenAI's models, some institutions may still have specialized requirements that necessitate an isolated environment for model deployment. In such scenarios, implementing and perhaps fine-tuning an on-premises AI model like Llama-2 can offer a tailored solution. By adopting this approach, data flow remains confined within the organization's secure network perimeter, and the unique advantages of customizing the model for domain-specific tasks are retained exclusively by the enterprise.

In this post I show how I have implemented llama-2-7b alongside gpt-4-0613 to produce a basic due diligence report for a given corporate name, tailored to specific circumstances, for example, different levels of sensitivity around the organization's interest in that name, or additional information passed with the name through the prompt.

Combining Open-Source and Closed-Source Approaches: The Most Sensible Strategy Today

Top-tier models are improving at a rate that many businesses and financial organizations may not expect, and they are set to keep enhancing their performance. However, there's a case to be made that for specific, specialized tasks, open-source models could eventually outperform their closed-source counterparts. This viewpoint suggests that the more niche the task and the more exclusive the data, the higher the chances that a customized foundational model will exceed the capabilities of leading closed-source models.

Views differ on the duration represented by the x-axis in the adjacent chart, ranging from a few months to several years, or even an indefinite period. My belief is that the time frame is highly dependent on the specific task or business environment at hand.

In the realm of language models, "closed-source" generally refers to models where aspects like training code, architecture, or pre-trained parameters are proprietary and not open for public use or alteration. On the other hand, "open-source" usually means that the training code, model architecture, and possibly even the pre-trained model weights are openly accessible and modifiable.

Completions code

My implementation of Llama-2 currently runs on one of my machines, a Lambda Ubuntu laptop with a 3080 GPU, hence the screenshots of the UI below instead of the interactive demo I usually provide. I plan to provide a working demo in the near future. The code dynamically selects between GPT-4 and llama-2-7b based on the provided modality. In the llama-2-7b modality, for example, completions are generated from the prompts using the Llama class's text_completion method. My code for the llama-2-7b text_completion is shown below:

[Screenshot: completions code]
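Since a screenshot does not copy and paste well, the sketch below approximates that call using the standard interface from the Llama 2 repo. The checkpoint and tokenizer paths and the wrapper function are placeholders rather than a verbatim copy of my code.

    from llama import Llama

    # Build the generator once; paths are placeholders for the local checkpoint files.
    generator = Llama.build(
        ckpt_dir="llama-2-7b/",
        tokenizer_path="tokenizer.model",
        max_seq_len=100,
        max_batch_size=4,
    )

    def complete(prompts):
        # Run llama-2-7b text completion locally -- the data never leaves the machine.
        results = generator.text_completion(
            prompts,
            max_gen_len=1500,
            temperature=0,
            top_p=0.9,
        )
        return [r["generation"] for r in results]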

Prompts

My prompts are shown below ...

[Screenshot: gpt-4 response]

[Screenshot: llama-2-7b response]
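To give a sense of the shape of these prompts, here is a purely illustrative sketch of how a prompt might be assembled from a corporate name, a sensitivity level, and optional extra information. The function, parameter names, and wording are hypothetical and are not the prompts used in the screenshots.

    # Hypothetical illustration only -- not the actual prompts used in this post.
    def build_prompt(company_name, sensitivity, extra_info=""):
        # Assemble a basic due diligence request for a given corporate name.
        prompt = (
            f"Produce a brief due diligence report on {company_name}. "
            f"Treat our interest in this name as {sensitivity} sensitivity."
        )
        if extra_info:
            prompt += f" Additional information: {extra_info}"
        return prompt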

It is necessary to do quite a bit of parameter tuning to get to an acceptable initial response. To give a sense of this, I set max_seq_len=100, max_gen_len=1500 (roughly comparable to max_tokens=1500 for gpt-4), max_batch_size=4, temperature=0, and top_p=0.9, although top_p is effectively overridden here by temperature=0, which aligns with my gpt-4 settings. You can see the entirety of my parameter tuning on my GitHub. Note the relative simplicity of the prompt, compared, for example, to the commented-out version I experimented with. The snippet after this paragraph makes these settings concrete, and the response that follows it shows, I think, that the basics are in place. Please bear in mind this model has not been fine-tuned or otherwise trained, as would be the case if we were to use llama-2 as a baseline model.
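Concretely, those settings amount to something like the following. This is a sketch of the values reported above, not a verbatim copy of my configuration.

    # Sketch of the llama-2-7b settings described above.
    llama_params = dict(
        max_seq_len=100,
        max_batch_size=4,
        max_gen_len=1500,  # roughly comparable to max_tokens=1500 for gpt-4
        temperature=0,     # greedy decoding; effectively overrides top_p
        top_p=0.9,
    )

    # The matching gpt-4-0613 settings.
    gpt4_params = dict(
        model="gpt-4-0613",
        temperature=0,
        max_tokens=1500,
    )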

[Screenshot: llama-2-7b response]

gpt-4 response

Compare this to the gpt-4-0613 response, bearing in mind the differences between the respective models! I find the kind of set-up the UI illustrates compelling: internal models trained to meet specific corporate objectives handle sensitive information, while external models serve as more general AI assistants, providing further levels of support and automation.

[Screenshot: gpt-4 response]
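To make that internal/external split concrete, here is a hedged sketch of the kind of dispatch the modality flag implies. The function name and flag values are illustrative rather than my exact implementation, generator is the llama-2-7b instance built earlier, and the external call assumes the 2023 openai-python ChatCompletion interface.

    import openai

    def generate_report(prompt, modality):
        if modality == "internal":
            # Sensitive requests stay on-premises with llama-2-7b.
            results = generator.text_completion(
                [prompt], max_gen_len=1500, temperature=0, top_p=0.9
            )
            return results[0]["generation"]
        # General requests go to the external assistant, gpt-4-0613.
        response = openai.ChatCompletion.create(
            model="gpt-4-0613",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
            max_tokens=1500,
        )
        return response["choices"][0]["message"]["content"]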

Source code

Source code for this post can be found on my GitHub.

References

Llama 2 technical overview.

Llama 2 repo.

© 2023 johncollins