RAG architectures using GPT models
Data and context

Category: Artificial Intelligence
Author: Fabian Weiland
Reading time: 20 minutes

Is ChatGPT a trend or a digital revolution?

In a world where technological innovations make headlines and are just as quickly forgotten, ChatGPT has maintained a remarkable presence. But what is behind this continued interest? ChatGPT, a dialog system based on artificial intelligence, has proven to be more than just a fleeting trend.

It is the software's ability to hold human-like conversations and respond to a wide variety of queries with astounding precision that fascinates users worldwide. This fascination is fueled by continuous development and a broad range of possible applications, which keep opening up new perspectives in both private and professional contexts.

The birth of ChatGPT

ChatGPT was introduced by OpenAI at the end of 2022 and has since reignited the discussion about artificial intelligence (AI) in society. The key to its social acceptance lies in its accessibility and ease of use. By integrating complex AI technology into a simple chat interface, ChatGPT has brought AI out of the research labs and directly into the hands of the general public. It has shown that AI is not just a tool for experts, but can also have practical uses in everyday life. Its intuitive usability and ability to understand and generate human speech have helped AI to be perceived as a tangible and useful tool.

Why companies are jumping on the ChatGPT bandwagon

From start-ups to multinational corporations, interest in ChatGPT and chatbots can be seen across all industries. Companies in customer service, IT, marketing and even education are recognizing the potential of this technology to increase efficiency, reduce costs and improve customer experiences. Chatbots can provide round-the-clock support, are scalable and handle a large number of requests simultaneously. They also enable personalized interactions and can therefore increase customer satisfaction and loyalty.

Superpowers and Achilles' heels

ChatGPT is an impressive example of how far the capabilities of AI-based dialog systems have already developed. It can communicate in natural language, answer complex questions, write texts and even generate programming code. However, despite its advanced capabilities, ChatGPT also has limitations. For example, it cannot independently feel emotions or develop creative ideas to the same extent as a human. It also relies on the quality and variety of the data it uses to learn. In the following, we will explore in more detail what ChatGPT can do, what its limitations are and how it fits into the existing technology landscape.

ChatGPT meets the business world

The integration of ChatGPT into the corporate context opens up a wide range of possibilities for optimizing processes and revolutionizing interaction with customers. For example, companies use ChatGPT in the form of chatbot solutions to create automated, intelligent and interactive dialog systems. These systems can answer customer queries, provide assistance or even support internal workflows.

LLMs

The language model behind ChatGPT is a Large Language Model (LLM). LLMs are advanced machine learning models trained on large amounts of text data to understand and generate natural language, enabling them to perform complex language tasks. This makes them useful for applications in areas such as customer service and education. When we talk about LLMs in this article, we focus on OpenAI's GPT models (ChatGPT). The findings and possible applications also apply to other LLMs such as those from Meta (LLaMA 2) or Google (Gemini).

Intelligent chatbot integrations

In order to better understand the practical use of ChatGPT in companies, it is worth taking a closer look at the Retrieval Augmented Generation (RAG) approach. This extends the possibilities of ChatGPT by incorporating specific knowledge into the answering of questions. Imagine being able to interact directly with your company's collected data as if you were chatting with an expert. This is exactly where RAG comes in: It combines an extensive database that makes it possible to find precise information on a topic or question with the ability of generative AI models, such as GPT, to translate this information into understandable, natural answers.

Fig. 1: RAG architecture in Azure. Source: taod

As part of the Azure platform, in particular through Azure OpenAI Services, companies have access to advanced tools for creating chatbots based on GPT models. Azure AI Search serves as the company knowledge base and finds the information or documents relevant to a query. This information is then fed into a GPT model (for example GPT-4), which formulates precise, contextual answers that are provided to the user in real time.

In the practical use of RAG, especially in the Azure integration, the process is carried out in two key steps, as shown in the figure above: First, the user's query, formulated in natural language, is converted into a search query using GPT-4 (see Fig. 1, steps 1-3), which can be processed by Azure AI Search (step 4). This makes it possible to extract exactly the information needed to answer the question from the wealth of available company data (step 5, retrieval). This information is then passed, together with the original query, to GPT-4, which generates a detailed and accurate answer based on this specific context (steps 6-8, generation).
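To make the flow concrete, here is a minimal sketch of the two-step pipeline in Python. All endpoint, key, deployment and index names (`gpt-4`, `company-docs`) and the index field `content` are placeholder assumptions; a real implementation would add configuration and error handling:

```python
# Minimal RAG sketch: query rewriting, retrieval, grounded generation.
# Endpoint, keys, deployment and index names are placeholders.
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

llm = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-02-01",
)
search = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="company-docs",
    credential=AzureKeyCredential("<search-key>"),
)

def answer(user_question: str) -> str:
    # Steps 1-3: GPT-4 turns the natural-language question into a search query.
    rewrite = llm.chat.completions.create(
        model="gpt-4",      # name of the Azure deployment
        temperature=0.0,    # deterministic query generation (see learnings below)
        messages=[
            {"role": "system",
             "content": "Rewrite the user question as a concise search query. "
                        "Return only the query."},
            {"role": "user", "content": user_question},
        ],
    )
    search_query = rewrite.choices[0].message.content

    # Steps 4-5 (Retrieval): fetch the most relevant documents from Azure AI Search.
    hits = search.search(search_text=search_query, top=3)
    context = "\n\n".join(doc["content"] for doc in hits)

    # Steps 6-8 (Generation): answer the original question, grounded in the context.
    completion = llm.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context. "
                        "If the context is insufficient, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {user_question}"},
        ],
    )
    return completion.choices[0].message.content
```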

This approach makes the chatbot not only a powerful tool for information retrieval, but also an interactive medium that answers complex queries with a precision previously reserved for manual research.

Azure AI Search

Microsoft's cloud-based search service makes it possible to integrate complex search functions into applications. With semantic search, Azure AI Search can understand the context and meaning behind search queries, leading to more accurate and relevant results than traditional, keyword-based search. Vector search uses machine learning to convert content into vectors (numerical representations of the data) and finds similarities based on the context of the query, even if the exact search terms do not appear in the text. This allows users to find information more efficiently, because results are based not only on keywords but on actual meaning. Alternatives to Azure AI Search include Elasticsearch and Algolia.

Steer AI correctly

Prompt engineering is a crucial aspect of RAG solutions and plays a central role in optimizing the performance of AI chatbots. It refers to the process of formulating instructions, called "prompts", that are directed to an LLM such as GPT-4 to achieve the desired output. These prompts are critical because they tell the LLM what is expected of it and how it should respond to an input. A well-designed prompt can significantly increase the effectiveness of the LLM by eliciting more accurate and relevant responses.

In the context of RAG solutions, this means that the prompt must be designed to effectively guide the retrieval system to pull the most relevant information from the database. It must then instruct the generation system to convert this information into a fluent, understandable and helpful response (see Fig. 1).
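As an illustration, such a generation prompt could look like the following sketch; the wording is invented here and would have to be tuned to each use case:

```python
# Hypothetical RAG system prompt; the exact wording must be tuned per use case.
SYSTEM_PROMPT = """You are a support assistant for <product>.
Answer the user's question using ONLY the documents provided under 'Context'.
Name the documents you used in your answer.
If the context does not contain the answer, say so and recommend
contacting customer support. Respond in a friendly, professional tone."""
```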

Challenges of prompt engineering

Prompt engineering is challenging. Developing effective prompts requires creativity and the ability to precisely interpret and formulate the user's intentions. The challenges include:

1. Precision
The prompt must be precise enough to generate relevant responses, but not so specific that it limits the creativity and flexibility of the AI.

2. Contextual understanding
The prompt must provide the LLM with sufficient context so that it can interpret the request within the correct framework.

3. Linguistic nuances
The nuances of human language, including irony, metaphors and cultural references, can be difficult for LLMs to grasp, making prompt design complicated.

4. Dynamics
Since every company and every use case is unique, prompts must be customized, which requires continuous testing and adaptation.

5. Scalability
Prompts must remain effective even with a large number of requests and different use cases.

Prompt engineering is an extensive but rewarding task that bridges the gap between human intuition and machine intelligence. Through careful development and continuous optimization of prompts, RAG solutions unleash their full potential and help companies automate and improve interactions.

ChatGPT in action

For a customer project, we developed a solution based on RAG architecture as part of an MVP. The aim was to develop a powerful chatbot that can automatically answer customer support queries relating to extensive ERP software.

For this project, we selected two diverse and unstructured sources of information that provide a solid data basis for the system. The first was over 2,600 Confluence pages, which serve both as the user manual for end customers and as a knowledge base for customer support. The second was over 160,000 historical customer support tickets, each containing a conversation history that starts with a customer inquiry and continues with the instructions given by customer support. This data was carefully processed, anonymized (in the case of the support tickets) and loaded into Azure AI Search to enable efficient and accurate information retrieval.

The chatbot we developed has proven to make the company's work much easier. Not only can it answer some customer queries completely on its own, it also supplies the relevant information sources alongside each answer, giving customer support employees a starting point for handling special cases that the chatbot cannot resolve independently.

Strategies and solutions

Over the course of the project, we gained important insights that allowed us to refine our methods and improve the quality of our chatbot solutions. The challenges we faced led to the following learning outcomes:

1. Quality of the document search

In order to generate precise answers, it was crucial to optimize the search for the most relevant documents. By using a combination of vector search, semantic search and keyword search in Azure AI Search, we were able to achieve our goals. For the vector search, a vector representation had to be generated for each document in the search database. To do this, we used the text embedding model ada-002 from OpenAI. These representations of text as vectors are called text embeddings.

Just as the information basis is stored as a vector representation in AI Search, we also generate a vector representation of the search query for AI Search. This allows the similarities of the vectors to be compared and the most relevant documents to be found. The semantic search in Azure AI Search also takes into account the context and meaning of the search queries, not just the exact match of the words. This enables a deeper analysis of the content by understanding and using the relationships between words and concepts. To do this, Microsoft has adapted methods that were also used in the Bing search.
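Generating such an embedding is a single API call against Azure OpenAI. A sketch, assuming a deployment of ada-002 named `text-embedding-ada-002` and placeholder credentials:

```python
# Sketch: embed documents at indexing time and queries at search time with the
# same model, so both live in the same vector space. Names are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-02-01",
)

def embed(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-ada-002",  # Azure deployment name of ada-002
        input=text,
    )
    return response.data[0].embedding    # 1536-dimensional vector for ada-002

doc_vector = embed("How to create an invoice in the ERP system ...")
query_vector = embed("create invoice")
```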

In keyword search, which is also part of Azure AI Search, the system focuses on identifying and matching specific keywords within documents. This more traditional search is particularly effective for specific and well-defined search queries for which users know exact terms or phrases that occur in the desired documents. By combining keyword search with advanced methods such as semantic search and vector search, AI Search can cover a wide range of information needs, from very specific queries to those that require a deeper understanding of context.
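In the `azure-search-documents` SDK, the three retrieval modes can be combined in one hybrid query. A sketch, assuming an index `company-docs` with a vector field `content_vector`, a semantic configuration named `default`, and the `embed` helper from the previous sketch:

```python
# Sketch: hybrid retrieval = keyword search + vector search + semantic reranking
# in a single Azure AI Search query. Index and field names are assumptions.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

search = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="company-docs",
    credential=AzureKeyCredential("<search-key>"),
)

query = "create invoice"
results = search.search(
    search_text=query,                      # keyword part
    vector_queries=[VectorizedQuery(
        vector=embed(query),                # vector part (embed() from above)
        k_nearest_neighbors=5,
        fields="content_vector",            # assumed vector field in the index
    )],
    query_type="semantic",                  # semantic reranking on top
    semantic_configuration_name="default",  # assumed configuration name
    top=3,
)
for doc in results:
    print(doc["title"])                     # assumed title field
```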

What are text embeddings?

Text embeddings are a method of representing text documents as vectors in a high-dimensional space. Each document is represented by a vector that summarizes the meaning and context of the words in the document. This ensures that the texts can be compared.

To determine the relevance of the documents in relation to a search query, we convert the query into a vector in addition to the documents themselves. We then measure the similarity between the query vector and the document vectors using cosine similarity. This measures the cosine of the angle between two vectors and returns a value between -1 and 1; a value of 1 means the vectors point in exactly the same direction, i.e. the query and the document are maximally similar in content. For example, if someone searches for "best practices for data analysis", we compare the vector for this query with the vectors of all documents in the knowledge base. The documents with the highest cosine similarity are considered most similar to the query.

Let's take a look at an example: We examine the similarity between a search query and two documents within a three-dimensional space (for comparison: ada-002 uses a space with 1536 dimensions!). This similarity is determined by the positions of the vectors in space and their cosine similarity to each other. The search query is represented by the vector Q, while the documents are represented by the vectors A and B. The specific values of these vectors are in our example:

  • Search query Q: [0.5, 0.4, 0.7]
  • Document A: [0.2, 0.8, 0.5]
  • Document B: [0.9, 0.1, 0.2]

The calculated cosine similarity values between the search query Q and the documents A and B are (we omit the formula here; the code sketch below shows the calculation):

  • For Q and A: 0.842
  • For Q and B: 0.716

These values indicate that search query Q is more similar to document A than to document B. The higher cosine similarity for Q-A shows a greater proximity in three-dimensional space, which indicates a stronger thematic or content-related match between the search query and document A.
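The numbers above can be verified in a few lines of Python:

```python
# Verifying the cosine similarities from the example with NumPy.
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # cos(theta) = (u . v) / (|u| * |v|)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

Q = np.array([0.5, 0.4, 0.7])  # search query
A = np.array([0.2, 0.8, 0.5])  # document A
B = np.array([0.9, 0.1, 0.2])  # document B

print(round(cosine_similarity(Q, A), 3))  # 0.842
print(round(cosine_similarity(Q, B), 3))  # 0.716
```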

Fig. 2 illustrates the positions of the vectors in space. Search query Q is shown in red, document A in green and document B in blue. The spatial arrangement and the proximity between the vectors visualize why document A is considered more similar to search query Q: A is closer to Q than B, which is quantitatively supported by the cosine similarity values.

By using cosine similarity, we can effectively assess which document is more relevant to a given search query based on their orientation and proximity in multidimensional space. In this case, document A is clearly more relevant to search query Q than document B, which is confirmed by both the mathematical calculation and the visual representation.

2. Complex data basis

During the development of our project, we were faced with the challenge of processing complex information from a comprehensive ERP system as described in a detailed manual. The manual contained detailed descriptions of functions, processes and domain-specific terminology that required specialized industry knowledge. The GPT model we used, despite its advanced nature, showed a good basic ability to understand the texts, but reached its limits with very detailed descriptions that additionally contained specific technical jargon. The limitations of the model became apparent especially in cases where the answers to user questions were not directly represented in the available data and a transfer was necessary.

This experience emphasizes how important human input and specialized knowledge remain in many areas. It becomes clear that when introducing chatbots with the support of large language models, a thorough examination of their limitations is essential. In this context, continuous collaboration with subject matter experts is essential in order to establish a link between the competencies of AI and the specific needs of users.

3. Non-contextual questions

When dealing with questions that lay outside the specific company context, we encountered an additional difficulty: it was not easy to answer such queries appropriately or to forward them accordingly. Leaving it to the GPT models alone to decide whether a question was relevant or not led to problems. To overcome this, we adapted our prompts to define exactly which topics the model should provide information on, while deliberately leaving out all other topics.
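One way to implement such a topic whitelist is an explicit instruction in the system prompt; the following wording is a hypothetical sketch, not the prompt used in the project:

```python
# Hypothetical scope restriction in the system prompt: the model is told which
# topics it may answer and how to deflect everything else.
SCOPE_INSTRUCTION = """You answer questions about the following topics only:
invoicing, order management, user administration and reporting in <product>.
For any other topic, reply: 'I can only help with questions about <product>.
Please contact our support team for anything else.'"""
```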

4. Human communication style

It was crucial for us that the support responses generated by the system were not only accurate in content, but also in the tone of human customer service. To achieve this, we developed carefully crafted guidelines for designing a customer-friendly response and encouraged the model to follow the conventions of human customer service communication. This methodical approach proved to be particularly effective.

We also used the technique of "few-shot learning". In this approach, the model generates similar answers based on just a few examples or "shots". Instead of training the model in the traditional sense, this method allowed the model to quickly adapt to specific customer support requirements and styles by analyzing a small number of examples. This approach significantly helped to improve response quality by enabling the model to generate accurate responses that were formulated in line with human communication styles.
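Sketched in code, few-shot learning simply means prepending example dialogs to the message list; the examples below are invented and the deployment name `gpt-4` is a placeholder:

```python
# Few-shot sketch: example dialogs show the model the desired support tone
# before the real request. Examples and names are invented.
from openai import AzureOpenAI

llm = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-02-01",
)

messages = [
    {"role": "system", "content": "You are a friendly ERP support assistant."},
    # Shot 1: tone and structure of a good support answer.
    {"role": "user", "content": "The invoice export hangs at 99%."},
    {"role": "assistant", "content": "Thanks for reaching out! This usually "
        "happens when a filter is still active. Please close the export dialog, "
        "reset the filters and try again. Let us know if the issue persists."},
    # Shot 2: another example in the same style.
    {"role": "user", "content": "How do I add a new user?"},
    {"role": "assistant", "content": "Happy to help! You can add users under "
        "Administration > Users > New. Feel free to contact us anytime."},
    # The actual customer request comes last.
    {"role": "user", "content": "Where can I change the invoice numbering?"},
]
response = llm.chat.completions.create(model="gpt-4", messages=messages)
print(response.choices[0].message.content)
```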

5. Avoidance of hallucinations

To minimize the risk of "hallucinations" - a phenomenon in which AI models present inaccurate or outright fabricated information - adjusting the temperature parameter proved effective. This parameter influences how creative or cautious the model is in its responses. By setting this value low, we were able to significantly reduce the frequency of such hallucinations, although they still pose a challenge. Additionally, optimizing the input prompts helped guide the model more accurately and encourage relevant responses. This targeted guidance reduces the likelihood of misleading or fabricated responses and, together with adjusting the temperature parameter, helps to improve response quality and reduce hallucinations.

6. Reproducibility of the results

Ensuring consistent and repeatable results played a central role in the reliability of our system. By defining a specific "random_seed", we were already able to take a big step towards deterministic results. Particularly important, however, was our decision to set the temperature parameter to 0.0 only when generating the search queries. This allowed us to achieve predictable results by always generating the same search query for each customer request and therefore extracting the same documents from the AI search. This targeted customization in the creation of the search queries was instrumental in increasing the robustness of our responses and ensuring reliable consistency in the information provided.
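In the OpenAI Chat Completions API the corresponding parameter is called `seed` (rather than `random_seed`) and provides best-effort determinism. A sketch of the deterministic query-generation step, with placeholder names as before:

```python
# Sketch: deterministic search-query generation. temperature=0.0 removes
# sampling randomness; a fixed seed adds best-effort reproducibility.
# `llm` is the AzureOpenAI client from the first sketch.
rewrite = llm.chat.completions.create(
    model="gpt-4",
    temperature=0.0,  # same customer request -> same search query
    seed=42,          # fixed seed for reproducible sampling
    messages=[
        {"role": "system",
         "content": "Rewrite the user question as a concise search query."},
        {"role": "user", "content": "Where can I change the invoice numbering?"},
    ],
)
search_query = rewrite.choices[0].message.content
```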

7. Extensive testing

Intensive testing ensured the quality and precision of the chatbot interactions. Testing turned out to be the biggest challenge, especially because the product that our chatbot answers questions about is very complex. This complexity meant that we as developers could not rely solely on our intuition to judge the quality of the results. To ensure that we always had an accurate picture of the answer quality, we established a continuous feedback process with the customer. Through this process, we received regular assessments from the customer and adapted the chatbot accordingly. Working closely with our client was crucial to develop a deep understanding of the product details and customer interaction requirements.

For an innovative and, above all, automated quality assurance, we also used another GPT model to evaluate the performance of our chatbot. We created a database of test questions and the corresponding correct answers provided by customer support. For each test question, we generated an answer through our chatbot and then used a separate GPT model to evaluate the match between the chatbot answer and the correct answer provided by customer support. This approach allowed us to identify weaknesses or errors in the chatbot responses that may have been missed by traditional testing methods.
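A sketch of this automated evaluation loop; the test data and grading prompt are invented for illustration, and `answer()` and `llm` refer to the RAG pipeline sketched earlier:

```python
# Sketch: a second GPT model grades chatbot answers against reference answers
# from customer support. Test data and grading prompt are invented.
test_cases = [
    {"question": "How do I reset a user's password?",
     "reference": "Go to Administration > Users, select the user and click "
                  "'Reset password'."},
    # ... more question/reference pairs from historical support tickets
]

JUDGE_PROMPT = """Compare the chatbot answer with the reference answer.
Rate the factual agreement from 1 (contradicts) to 5 (fully matches).
Return only the number."""

for case in test_cases:
    bot_answer = answer(case["question"])  # RAG pipeline from the first sketch
    verdict = llm.chat.completions.create(
        model="gpt-4",
        temperature=0.0,
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user",
             "content": f"Reference: {case['reference']}\n"
                        f"Chatbot answer: {bot_answer}"},
        ],
    )
    print(case["question"], "->", verdict.choices[0].message.content)
```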

This automated review, combined with the valuable feedback from human testers, allowed us to develop a deep understanding of the strengths and weaknesses of our chatbot and ensured that the final product met our client's high standards. This approach takes quality assurance to a new level and ensures that our chatbot communicates with users not only intelligently, but also reliably and accurately.

Continuous improvement

The project demonstrated the significant added value that the use of RAG in combination with Azure AI Search and advanced AI methods brings to companies - especially in terms of optimizing customer service and increasing efficiency. In the first expansion stage of our chatbot, we were able to observe how positively it was received by our customer's teams. The use of AI technology in customer support has proven to be an important step towards making work easier and, in particular, making customer support processes more efficient. This initial implementation underlines the potential of the chatbot to serve as an essential tool to support daily operations and lays the foundation for the successful integration of AI into the customer support strategy.

In order to further increase the performance of our chatbot and ensure even greater satisfaction among end users, we consider it necessary to carry out further development cycles. Through continuous testing and the use of the knowledge gained from these tests, we aim to increase the rate of requests answered independently by the chatbot. The continuous improvement process increases the precision and reliability of the chatbot and makes it an even more effective customer support tool.

This methodical approach illustrates our efforts to continuously expand the possibilities of AI technology while incorporating human input and technical expertise as key components of further development. Our experience with the use of chatbots in customer support processes shows that the development of a fully autonomous and reliable AI-supported customer service is an ongoing process that requires constant adjustments and improvements.

AI driver Azure

Deploying GPT models on Azure enables companies to benefit from the most advanced AI technologies without having to build and maintain the necessary infrastructure themselves. This provides a scalable and flexible solution that is well suited to the development and deployment of AI applications. However, the latest OpenAI model versions regularly arrive on Azure only several months after their initial release, which can be a challenge for users who want to stay at the cutting edge of AI development. These delays mean that users cannot immediately benefit from the latest OpenAI innovations and improvements, which could put them at a competitive disadvantage, especially in fast-moving industries.

"We are only at the beginning"

For data! author and data scientist Fabian, ChatGPT has long been part of the daily business. He sees it as a tool that, when used correctly, can give companies a real advantage. His personal take on new technologies helps companies put the hype surrounding RAG architectures into perspective.

What do you use ChatGPT for?

Especially for the development of chatbots, such as the RAG project that I carried out with my team. In my private life, ChatGPT helps me to overcome writer's block, for example, or simply to get creative input for my own projects.

What do you like about technology?

Above all, the speed with which the models learn and adapt. It is impressive how natural and human the responses have become and how versatile the areas of application can be. The technology has the potential to revolutionize so many industries, and we are only at the beginning.

Does technology have limits?

You must always remain critical and remember that a model is only as good as the data it was trained on or the data it is provided with as context.

What do you say to companies that really want to do something with AI?

First, be clear about what you want to achieve with the technology. It is important to identify specific use cases where AI offers real added value. It is best to develop a strategy together with experts. Don't jump on the bandwagon, but use technology sensibly.

How do you stay up to date?

It is a continuous learning process. Online research, specialist articles and exchanges with colleagues keep me up to date. In our field, this is simply necessary in order to remain competitive. One example of this is our last RAG project: we had barely developed an initial functional prototype of our chatbot when we realized that the interfaces to Azure OpenAI were already outdated. So we were faced with the task of quickly converting our code to the more up-to-date API version. This shows just how dynamic and fast-moving the tech industry is.

GPT generations in comparison

GPT-3.5 and GPT-4 are both advanced language models from OpenAI, with GPT-4 being the newer and more powerful model. GPT-4 is a larger, multimodal model that can process both text and image inputs and produce text outputs. It is characterized by a broader generality and advanced reasoning capabilities that allow it to solve more difficult problems with greater accuracy. In comparison, GPT-3.5 Turbo is optimized for chat applications, but also works well for non-chat tasks. GPT-4 not only outperforms previous large language models, but also most state-of-the-art systems in different languages.

The transition from GPT-3.5 to GPT-4 brought significant improvements in language processing capability. GPT-4 can recognize more complex relationships, understand longer and more detailed contexts, and generate more accurate responses based on this. Particularly important is the model's improved ability to support fine-tuning, which allows organizations to more easily tailor the model to specific requirements. The multimodal capability of GPT-4, which includes image recognition, opens up new fields of application and makes the model more versatile.

GPT generations in comparison. Status: May 2024.

Potential for improvement

A decisive improvement in response times is of central importance for the next update of GPT-4. The current latencies are still too high for smooth and efficient use, which limits the application in real-time scenarios. Users expect a fast response to their requests, as this directly influences the usability and effectiveness of the chatbot. Targeted optimization in this area would significantly improve the user experience and make GPT models more viable for a wider range of applications, such as customer service or as personal assistants. It is therefore of utmost importance that the reduction of response times is the focus of the next update in order to increase the usability and acceptance of the model in daily use.

OpenAI update

OpenAI recently announced GPT-4o. GPT-4o (the "o" stands for "omni") can process not only text, but also audio, images and videos as input and then generates text, audio and images as output. The response time to audio input is impressive, averaging just 320 milliseconds - almost as fast as a human.

The improvements in the processing of visual and acoustic content are particularly noteworthy. GPT-4o outperforms its predecessors in the recognition and translation of speech, especially for less common languages. GPT-4o also sets new standards in the area of multilingualism by handling different languages more efficiently and requiring fewer tokens. GPT-4o is currently (as of May 2024) in a gradual rollout and is already available in two US regions of Azure.

For chatbot scenarios, GPT-4o means a significantly shorter response time, which can increase customer satisfaction as conversations appear more natural. The integration of voice and audio capabilities, which will be important for many customer service applications, is particularly exciting.

More features will be released in the coming weeks, including audio and video outputs, which will initially be made available to a small group of trusted partners. We are excited about future developments and the possibilities that GPT-4o offers for AI interaction.

AI in the future

AI technologies such as ChatGPT undoubtedly have the potential to transform business processes and lead us to new ways of thinking and working. However, our fascination with their capabilities should not blind us to the fact that they are not the optimal solution in every context. Human empathy, creativity and ethical judgment are qualities that can be supported, but not replaced, by machine intelligence.

It is essential that we carefully consider the limitations and risks associated with the implementation of chatbots. Technology should serve people, not the other way around. In a world that is always reaching for the latest technological solution, it is crucial to define the actual use case before deciding on a tool.

We are at a turning point where we can set the course for the future. It is up to us to choose the path wisely - a path that is guided not only by the possibilities of the technology, but also by a deep understanding of its limitations. Let's work together to ensure that we use AI technology wisely and in the best interests of all. Because only in this way can technological progress be shaped in harmony with human values and needs.

This article first appeared in a similar form in issue 01/24 of data! All issues and articles of our biannual magazine can be found here: data! Magazine: Cloud Services, Data Analytics & AI | taod

