From “iron manual” to “Iron Man”: Augmenting GPT with fast, editable memory to enable context-aware question answering

March 27, 2024
TL;DR We show how Marqo can be used as an updatable, domain-specific memory for GPT to perform question answering for products and chat agents. This lets companies move beyond generic LLM chatbots (like ChatGPT) by using their own datasets and context. We also show how reference and hallucination checks can be easily implemented. A walk-through with images and animations is provided, along with the code to reproduce it.

1. Product Q and A — “Iron Manual”

Indexing the product manual makes it usable by GPT (there are several GPT models, but GPT-3.5 Turbo and GPT-4 are the latest). Highlights are retrieved from the manual and provided as context in the GPT prompt. This allows GPT to answer specific questions about the product, which is the foundation of the question answering system. Code here.

2. Chat agent with history — “Ironman”

An NPC superhero whose memory also includes the iron manual. GPT, playing the NPC, is encouraged to draw on the additional context that comes from this memory. Code here.


Large language models (LLMs) can be used for many tasks with little (few-shot) or no (zero-shot) training data. A single LLM (like GPT or BERT) can be used for semantic, contextual tasks like summarization, translation, question answering, and classification; they are very good at natural language processing (NLP), which goes far beyond information retrieval.

Despite the recent success of LLMs, they still have some limitations. For example, once the models are trained, they cannot easily be updated with new information. They also have a fixed input length, which restricts how much context can be inserted into the prompt. To overcome these limitations, we show how an external knowledge base can be used alongside the LLM to provide a fast and editable memory (i.e. a document store) for it to draw from.

Use case 1 — Product Q&A

For the first use case, GPT is paired with Marqo to search product documentation and answer questions about the product's features. The answers are compact and easy to read.

1.1 The product documents

To test the question answering capabilities, we wanted an “in the wild” use case, so we selected the paper manual for a recently purchased clothes iron. If digitized text is already available, the digitization step can be skipped.

The product documentation.

The manual is a particularly dry read. It consists of seven pages of information about the iron, including its safe operation and maintenance.

1.2 Preparing the documents

Since the manual was on paper, it had to be digitized. AWS Textract was used to perform optical character recognition (OCR). The pages had two columns, which was a challenge: the OCR output reads left to right across the whole page, so text from the two columns was interleaved. The OCR output includes bounding boxes, which would allow the text to be grouped by column, but this would have taken too long. Instead, the OCR was run again with one column covered by another piece of paper.

The original document with two columns and the masked document to make it one column.
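The bounding-box grouping that was skipped above could be sketched as follows. This is a minimal sketch, assuming each OCR line has already been reduced to a hypothetical `(text, left, top)` tuple with coordinates normalized to the page (0 to 1), similar to the bounding-box geometry Textract returns. The split at `midpoint=0.5` is an assumption that only holds for evenly split two-column pages, and the manual text is made up.

```python
from typing import List, Tuple

# Hypothetical representation of an OCR line: (text, left, top),
# with left/top normalized to the page width/height (0.0 to 1.0).
OcrLine = Tuple[str, float, float]


def split_columns(lines: List[OcrLine], midpoint: float = 0.5) -> List[str]:
    """Group OCR lines into a left and a right column, each read top to bottom.

    Assumes an evenly split two-column page: any line starting left of
    `midpoint` belongs to the left column, everything else to the right one.
    """
    left = [line for line in lines if line[1] < midpoint]
    right = [line for line in lines if line[1] >= midpoint]
    columns = []
    for column in (left, right):
        # Sort by vertical position so each column reads in order.
        ordered = sorted(column, key=lambda line: line[2])
        columns.append(" ".join(text for text, _, _ in ordered))
    return columns


# Interleaved OCR output: the reader scans left to right across both columns.
page = [
    ("Fill the water tank", 0.05, 0.10),
    ("Descale monthly", 0.55, 0.10),
    ("before switching on.", 0.05, 0.12),
    ("in hard water areas.", 0.55, 0.12),
]
left_text, right_text = split_columns(page)
print(left_text)
print(right_text)
```

Real scans are rarely this tidy (headings spanning both columns, figures, skewed pages), which is why masking one column and re-running the OCR was the quicker route here.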

After scanning, there were seven documents, each representing a column of text from the manual. Below is an example of the text after OCR.

1.3 Indexing the documents

After creating a digital copy of the product manual, the next step is to index it into Marqo. Marqo embeds the documents using an encoder and allows fast and efficient retrieval of relevant documents, using either embedding-based or lexical retrieval. The retrieved documents are then passed into a prompt for GPT, which is asked to answer the query with respect to them (the “sources”).

1.3.1 Installing Marqo

We first install Marqo and the Marqo Python client:

1.3.2 Indexing documents

We then need to index the pages of the product manual. Each page (here, each OCR'd column) needs to be formatted as a Python dictionary for ingestion.
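A minimal sketch of that formatting step, assuming each OCR'd column is a plain string. Marqo uses `_id` as the document identifier; the `title` and `text` field names and the ID scheme are illustrative choices, not the ones used in the original code.

```python
from typing import Dict, List


def pages_to_documents(pages: List[str], product: str) -> List[Dict[str, str]]:
    """Turn each OCR'd column of the manual into a dictionary ready for indexing.

    Field names other than `_id` are illustrative; any string fields work.
    """
    documents = []
    for number, text in enumerate(pages, start=1):
        documents.append({
            "_id": f"{product}-page-{number}",  # stable ID so a page can be updated later
            "title": f"{product} manual, part {number}",
            "text": text.strip(),  # drop stray whitespace left over from OCR
        })
    return documents


docs = pages_to_documents(["  Fill the water tank...  ", "Descale monthly..."], "iron")
print(docs[0])
```

A list of dictionaries like this is what the Python client's `add_documents` call takes; which fields get embedded (for example via `tensor_fields` in recent versions) depends on the Marqo version you run.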

Once the documents are prepared, we can index them using the Python client. If no index settings are provided, the default encoder is used.

1.3.3 Searching documents

At this point, Marqo can be used to search over the document embeddings using an approximate nearest neighbor algorithm (HNSW).
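HNSW is an approximate index: what it approximates is an exact nearest-neighbour search over the embeddings. The toy sketch below shows that exact version with cosine similarity. The 3-dimensional vectors are made up for illustration; in Marqo they come from the encoder, and HNSW avoids scoring every document.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Score every document and return the top-k (doc_id, similarity) pairs.

    This brute-force scan is what an ANN index such as HNSW approximates
    without visiting every vector.
    """
    scored = [(doc_id, cosine(query, vector)) for doc_id, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings" (made up for illustration).
index = {
    "safety": [0.9, 0.1, 0.0],
    "descaling": [0.1, 0.9, 0.1],
    "warranty": [0.0, 0.2, 0.9],
}
print(exact_search([0.8, 0.2, 0.0], index, k=2))
```

The exact scan costs one similarity per document per query, which is fine for a seven-document manual but is the reason an ANN index is used at scale.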

Alternatively, Marqo can use lexical search, which is based on BM25.
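For intuition about the lexical side, here is a compact, self-contained BM25 scorer with the common defaults `k1=1.2` and `b=0.75` and the Lucene-style IDF `log(1 + (N - n + 0.5) / (n + 0.5))`. It is a teaching sketch of the scoring formula, not Marqo's internal implementation, and the tokenizer is a naive lowercase whitespace split.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25.

    Uses a naive lowercase whitespace tokenizer and the Lucene-style IDF,
    which stays positive even for terms that appear in most documents.
    """
    if not docs:
        return []
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Term-frequency saturation (k1) and document-length normalisation (b).
            norm_tf = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(toks) / avg_len))
            score += idf * norm_tf
        scores.append(score)
    return scores


docs = [
    "descale the iron every month",
    "do not leave the iron unattended",
    "the warranty lasts two years",
]
print(bm25_scores("descale iron", docs))
```

Lexical search rewards exact term matches (useful for model numbers or part names), while the embedding search above can match paraphrases.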

1.4 Connecting Marqo to GPT

The documents (product manual) can now be searched. Searching and retrieving relevant documents provides the context GPT needs to create a final answer. GPT requires an API key, which can be obtained from the OpenAI website and then needs to be set as an environment variable.
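The OpenAI client libraries, and Langchain's OpenAI wrapper, conventionally look for the key in `OPENAI_API_KEY` (set, for example, with `export OPENAI_API_KEY=...` in the shell). A small helper like the hypothetical `get_openai_key` below fails early with a clear message instead of at the first GPT call.

```python
import os


def get_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read the OpenAI API key from the environment, failing with a clear message."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable before calling GPT.")
    return key
```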

1.4.1 Prompt creation

The first step is to create a prompt. There are plenty of examples to draw from, but something like the following will get good results.

Here we instruct GPT to answer based on the context and not to make anything up (although it does not always comply). The question is then inserted into the prompt along with the context (“summaries”).
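A prompt in that spirit might look like the template below. The wording is an illustrative assumption, not the exact prompt used for the results in this post; `{question}` and `{summaries}` are the placeholders that get filled in, and plain `str.format` stands in for Langchain's prompt template.

```python
QA_TEMPLATE = """Given the following extracted parts of a product manual ("SOURCES") and a question, create a final answer.
Only use information from the SOURCES. If you don't know the answer, say that you don't know. Don't try to make up an answer.

QUESTION: {question}
=========
SOURCES:
{summaries}
=========
FINAL ANSWER:"""


def build_prompt(question: str, summaries: str) -> str:
    """Insert the question and the retrieved context into the template."""
    return QA_TEMPLATE.format(question=question, summaries=summaries)


prompt = build_prompt("How often should I descale the iron?", "Source 1: Descale monthly.")
print(prompt)
```

Ending the template with `FINAL ANSWER:` nudges the model to continue with the answer itself rather than restating the question.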

To save time, we use Langchain to handle communication with GPT. Langchain makes it easy to set up interactions with LLMs and removes a lot of the boilerplate code that would otherwise be required.

1.4.2 Preparing the context

In order to connect GPT to Marqo, we need to format the results so they can be easily inserted into the prompt that was just created.

To start with, we take just the highlights from Marqo. These are short pieces of text, so they fit within the prompt without using too many tokens. Token counts matter because GPT (like LLMs in general) has a context length, the maximum number of tokens that can be used for the input, and usage is often charged per token. The more context, the more text (and background) GPT can draw on. The drawback is that the highlights might be too short to answer the question accurately.
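A minimal sketch of that formatting step, assuming the highlight strings have already been pulled out of Marqo's search hits (the shape of the `_highlights` field differs between Marqo versions, so that extraction is left out). Token counts are approximated by whitespace-separated words; a real tokenizer such as `tiktoken` would give GPT's exact counts. The `format_context` name and the `Source n:` layout are hypothetical.

```python
from typing import List


def format_context(highlights: List[str], max_tokens: int = 200) -> str:
    """Join highlights into a numbered "Source n:" block that fits a token budget.

    Tokens are approximated as whitespace-separated words. Results are assumed
    to arrive best-first, so when the budget runs out the least relevant
    highlights are the ones dropped.
    """
    lines, used = [], 0
    for number, text in enumerate(highlights, start=1):
        cost = len(text.split())
        if used + cost > max_tokens:
            break
        lines.append(f"Source {number}: {text}")
        used += cost
    return "\n".join(lines)


hits = ["Descale the iron monthly.", "Unplug before filling the tank.", "Keep away from children."]
print(format_context(hits, max_tokens=9))
```

Numbering the sources also makes it easier to refer back to them when checking the answer later.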

1.4.3 Token aware context truncating (optional)

To stay within token limits while keeping control over the input context length, a “dilation” procedure can be applied around each highlight: some text before and after the highlight is included to give GPT more context. This control is also helpful because these models are often priced per token.
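The dilation can be sketched as: locate the highlight inside the full document text and extend it by a fixed number of words on each side. The hypothetical `dilate_highlight` helper and its word-level `window` are an assumed stand-in for counting GPT tokens; the same idea works with a real tokenizer.

```python
def dilate_highlight(full_text: str, highlight: str, window: int = 10) -> str:
    """Return the highlight plus up to `window` words before and after it.

    Falls back to the highlight alone if it cannot be found verbatim in
    the document (e.g. because of whitespace differences).
    """
    start = full_text.find(highlight)
    if start == -1:
        return highlight
    before = full_text[:start].split()
    after = full_text[start + len(highlight):].split()
    # Guard window == 0: before[-0:] would return the whole list.
    left = before[-window:] if window > 0 else []
    right = after[:window]
    return " ".join(left + [highlight] + right)


doc = "Always unplug the iron. Empty the water tank after use. Store it upright to avoid leaks."
print(dilate_highlight(doc, "Empty the water tank after use.", window=2))
```

The window size is the knob that trades answer quality against token cost: a larger window gives GPT more surrounding text per source but fewer sources fit in the prompt.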

The next step is to provide the prompt with formatted context.

1.4.4 GPT inference

Now that we have the prepared documents and prompt, we can call GPT using Langchain. We instantiate an OpenAI class, which communicates with the GPT API.

The result is a dictionary with the text output from GPT. This completes the steps required to augment GPT with an external knowledge base. However, we can add another feature that includes and scores the sources of information used in the answer. This is useful because LLMs are known to hallucinate details, and the original sources can be used to check the LLM's output. In the next section we show how a re-ranker from two-stage retrieval can be repurposed to check which sources were used and provide a score.

1.4.5 Rating sources

After receiving a response from the LLM, we can score the sources with respect to that response. This contrasts with other methods that ask the LLM itself to cite its sources, which in our experience was sometimes unreliable.

For the proposed method, we take a re-ranker (a sentence-transformers cross-encoder) that would normally score each (query, document) pair to re-rank search results. Here, we instead score each (llm_result, document) pair. This provides a score for the “relevance” of each provided source to the LLM's response. The idea is that the sources that were actually used will be the most similar to the response.

We can see scores for each piece of context with the LLM response.

The model is a classification model: a score close to 1 indicates a match and a score close to 0 indicates no match. This method is not perfect, so some care should be taken. For example, GPT can be quite verbose when it does not know an answer, and such a response can repeat the original question, which causes false positives when scoring. Fine-tuning the re-ranker would probably reduce the false positives considerably.
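The pairing logic can be sketched without downloading a model. With sentence-transformers, the scores would come from a `CrossEncoder` model's `predict` method applied to the list of `(llm_result, document)` pairs. Below, a simple word-overlap (Jaccard) score is swapped in as a stand-in so the example runs anywhere. It is a much weaker proxy than a cross-encoder and only shows the shape of the check; the answer and sources are invented.

```python
import re
from typing import Callable, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for a cross-encoder score."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def rate_sources(
    llm_result: str,
    sources: List[str],
    score_fn: Callable[[str, str], float] = jaccard,
) -> List[Tuple[str, float]]:
    """Score each (llm_result, source) pair and sort sources by score, highest first."""
    scored = [(source, score_fn(llm_result, source)) for source in sources]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


answer = "You should descale the iron once a month."
sources = ["Descale the iron once a month.", "The warranty lasts two years."]
for source, score in rate_sources(answer, sources):
    print(f"{score:.2f}  {source}")
```

Passing a different `score_fn`, for example a small wrapper around a cross-encoder, leaves the ranking logic unchanged. The same false-positive caveat applies: an answer that restates the question will overlap with any source that also contains the question's words.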

Use case 2 — conversational agents with a story

The second use case deals with a conversational agent that can draw on history (or background) as context to answer questions. This could be used to create NPCs with a backstory, or other chat agents that need past context.

2.1 Indexing NPC data

In the exact same way we indexed the iron manual, we will index some character data. This data contains various pieces of their backstory which can then be drawn on when answering questions.

2.1.1 NPC data

Here is an example of what the documents look like for the NPCs:
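As a hypothetical illustration of the shape (the names and text here are invented), each document could pair a character `name`, which is useful for filtering later, with one short `text` snippet of backstory.

```python
from typing import Dict, List

# Hypothetical NPC backstory documents: one short snippet per document,
# tagged with the character it belongs to.
npc_docs: List[Dict[str, str]] = [
    {"_id": "1", "name": "Ironman", "text": "I built my first suit in a cave."},
    {"_id": "2", "name": "Ironman", "text": "I run a technology company."},
    {"_id": "3", "name": "Edna", "text": "I design suits for superheroes."},
]

# Every document needs the same fields so it can be indexed and filtered uniformly.
assert all({"_id", "name", "text"}.issubset(doc) for doc in npc_docs)
print(len(npc_docs), "documents for", len({doc["name"] for doc in npc_docs}), "characters")
```

Keeping each snippet short means a single search hit maps to a single fact about the character.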

2.1.2 Indexing the data

Now we can index the data. We need to get the Marqo client and create an index name.

Now we index the documents.

We can search and see what comes back.

This gives the desired output.

Different characters can easily be selected with a filter, so that only their background is searched.
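In Marqo this is done by passing a filter to the search call (a `filter_string` such as `name:(Ironman)`; check the Marqo documentation for the filter syntax your version supports). The pure-Python sketch below only illustrates the semantics: restrict to one character's documents first, then rank what remains. The hypothetical `search_character` helper ranks by shared words as a stand-in for Marqo's search, and the documents are invented.

```python
from typing import Dict, List


def search_character(docs: List[Dict[str, str]], name: str, query: str, limit: int = 2) -> List[str]:
    """Return the best-matching backstory snippets for a single character.

    Filtering happens before ranking, so other characters' documents can never
    appear in the results. Ranking by shared lowercase words is a stand-in for
    Marqo's tensor or lexical search.
    """
    query_words = set(query.lower().split())
    candidates = [doc for doc in docs if doc["name"] == name]
    ranked = sorted(
        candidates,
        key=lambda doc: len(query_words & set(doc["text"].lower().split())),
        reverse=True,
    )
    return [doc["text"] for doc in ranked[:limit]]


docs = [
    {"name": "Ironman", "text": "I built my first suit in a cave."},
    {"name": "Ironman", "text": "I run a technology company."},
    {"name": "Edna", "text": "I design suits for superheroes."},
]
print(search_character(docs, "Ironman", "tell me about your suit", limit=1))
```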

2.2 Connecting Marqo to GPT

Now that the documents are indexed, we can search over them. In this case the documents are the backstories, and the search results are used as context for the conversation. This search-and-retrieve step provides the context for GPT to create a final answer.

2.2.1 Prompt creation

We use a prompt that contains some context and sets the stage for the LLM conversationalist.

Here we instruct GPT to answer for the character based on the background and to reference it where possible. Langchain is then used to create the prompt,

2.2.2 Preparing the context

Here we truncate the context around the highlight from the previous Marqo search, using the token-aware truncation that adds context from before and after the highlight.

The next step is to provide the prompt with formatted context.

2.2.3 GPT inference

Now that we have the prepared documents and prompt, we can call GPT using Langchain. We instantiate an OpenAI class, which communicates with the GPT API:

The result is a dictionary with the text output from GPT. For example, this is the response to the human's first message:

which aligns well with the background which was,

2.3 Making it conversational

The next step is to do some iterative prompting and inference to create a chat. We do this by iteratively updating the prompt with the question, searching across the background, formatting the context, calling the LLM and appending the result to the chat sequence (full code here).
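The loop can be sketched as below. `search_background` and `call_llm` are hypothetical placeholders for the Marqo search and the Langchain/GPT call; they are passed in as functions so the control flow can be shown, and tested, with stubs. The prompt layout is also an assumption.

```python
from typing import Callable, List


def chat_turn(
    history: List[str],
    question: str,
    search_background: Callable[[str], List[str]],
    call_llm: Callable[[str], str],
) -> str:
    """Run one turn: retrieve background, build the prompt, call the LLM, record both sides."""
    history.append(f"Human: {question}")
    # Retrieve background relevant to the latest question only.
    context = "\n".join(search_background(question))
    prompt = (
        f"BACKGROUND:\n{context}\n\n"
        "CONVERSATION:\n" + "\n".join(history) + "\nNPC:"
    )
    reply = call_llm(prompt)
    history.append(f"NPC: {reply}")
    return reply


# Stubs standing in for Marqo and GPT so the loop runs offline.
def fake_search(query: str) -> List[str]:
    return ["I built my first suit in a cave."]


def fake_llm(prompt: str) -> str:
    return "I started out building a suit in a cave." if "cave" in prompt else "I'm not sure."


history: List[str] = []
for question in ["How did you get started?", "What happened next?"]:
    print(chat_turn(history, question, fake_search, fake_llm))
```

Because the whole history is resent every turn, the prompt grows with the conversation; in practice it would also need truncating against the token limit.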

We can now see the conversation and the agent drawing on its retrieved background.

This fits nicely with the background we gave the character,

2.4 Editing the characters background

We can patch, delete or add documents for the agent's background with Marqo. Let's add something from the previous example:
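Selecting the least relevant results and turning them into new background documents might look like the sketch below. The hypothetical `bottom_k_as_background` helper, the ID scheme and the ranked list of manual snippets are assumptions for illustration. The resulting dictionaries would then be added to the index with the client's `add_documents` call, as in the indexing step.

```python
from typing import Dict, List


def bottom_k_as_background(ranked_texts: List[str], name: str, k: int = 2) -> List[Dict[str, str]]:
    """Turn the k least relevant search results into backstory documents for `name`.

    `ranked_texts` is assumed to be ordered most relevant first, as search results are.
    """
    if k <= 0:
        return []
    least_relevant = ranked_texts[-k:]
    return [
        {"_id": f"{name}-extra-{i}", "name": name, "text": text}
        for i, text in enumerate(least_relevant)
    ]


manual_hits = [
    "Descale the iron monthly.",
    "Unplug before filling the tank.",
    "Do not point the steam at people.",
    "Keep the cord away from hot surfaces.",
]
new_docs = bottom_k_as_background(manual_hits, "Ironman", k=2)
print([doc["text"] for doc in new_docs])
```

Tagging the new documents with the character's `name` means the same filtered search picks them up without any other change to the chat loop.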

This adds some of the safety information from the iron manual. To make it interesting, we take the bottom-ranked (i.e. least relevant) results. The following is the conversation; we can see the agent weaving its new background into the story nicely!


We have shown how easy it is to build product question answering and chat agents with an editable background using LLMs like GPT together with Marqo. We also showed how the limits of context length can be overcome by judicious truncation of the text, and how reference scoring can help verify the output. Although the results are very encouraging, some care is still needed: GPT still has a habit of hallucinating results, even with strong instructions and reference checks. If you are interested in combining GPT (or LLMs in general) with Marqo, check out the GitHub repository. Finally, if you are interested in running this in production, sign up for our cloud.

Jesse Clark