At MOHARA, we’ve found retrieval-augmented generation (RAG) to be a game-changing tool for our work with generative AI models.
It’s a bridge that connects the broad internal knowledge of large language models (LLMs) to the focused insight provided by an information retrieval component.
AI excels at delivering answers using its training data, but is constrained by the scope of that dataset.
Retrieval-augmented generation enhances its ability to manage knowledge-intensive tasks by integrating external data sources.
By tapping into that external knowledge at query time, the system can produce responses that are contextually relevant and grounded in the most current information available.
In this blog we dive into how it works, why it’s such a breakthrough, and how it has positively impacted our work at MOHARA.
Understanding LLMs and Their Limitations
The advent of large language models like OpenAI’s GPT-3 and GPT-4 or Google’s Gemini has marked a milestone in artificial intelligence, particularly within natural language processing.
These models work by predicting the next word in a sequence, making educated guesses based on vast amounts of LLM training data.
At their core, LLMs are powered by a few fundamental principles that enable their advanced capabilities:
1. Data
LLMs like GPT-3, GPT-4, and Gemini are trained on enormous amounts of unlabelled and unstructured text data, including books, articles, conversations, code, and forum posts.
This extensive training material is what enables them to generate human-like responses to user queries. For perspective, GPT-3’s training corpus was drawn from roughly 45 terabytes of raw text; OpenAI has not disclosed the size of GPT-4’s training data, though it is widely reported to be considerably larger.
2. Architecture
These models are based on the Transformer architecture, a type of neural network that excels in handling sequences of data.
This architectural choice is key to their success in generating coherent and contextually relevant text. The scale of these models is immense, with GPT-3 boasting 175 billion parameters and GPT-4 widely reported, though not officially confirmed, to run into the trillions.
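For readers who want a feel for the mechanism, here is a minimal, illustrative sketch of the scaled dot-product attention at the heart of the Transformer, written with NumPy on toy data (the shapes and values are invented purely for demonstration):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: every position attends over the whole sequence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # pairwise similarity of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ V                                       # each output mixes all value vectors

# Three token positions with four-dimensional toy embeddings
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
```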
3. Training
During their training phase, LLMs learn the probability of sequences of words occurring together.
This probabilistic understanding is what allows them to predict the next word in a sequence with remarkable accuracy and efficiency.
This process involves a few key concepts:
Probabilistic prediction: The model predicts the next word based on the likelihood of various possibilities, using a vast dataset to inform these probabilities.
Context sensitivity: The effectiveness of an LLM is enhanced by the context provided, which guides its predictions.
Temperature parameter: This controls the level of creativity or randomness in the model’s output, with a lower temperature leading to more predictable and less varied outputs.
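To make the temperature idea concrete, here is a small, hypothetical sketch of how the next token could be sampled from a handful of candidate scores; the candidate words and their scores are invented for illustration:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token index; lower temperature makes the choice more predictable."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature   # temperature rescales the scores
    probs = np.exp(scaled - scaled.max())
    probs = probs / probs.sum()                              # softmax -> probability distribution
    return rng.choice(len(probs), p=probs)

# Hypothetical next-word candidates and model scores
tokens = ["cat", "dog", "car"]
logits = [2.0, 1.0, 0.2]
print(tokens[sample_next_token(logits, temperature=0.2)])  # almost always "cat"
print(tokens[sample_next_token(logits, temperature=1.5)])  # noticeably more varied
```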
Despite their capabilities, LLMs are not without limitations. Their tendency to generate plausible yet incorrect information (called “hallucination”) poses a challenge, especially when it comes to knowledge-intensive tasks that require precise data.
Fine-Tuning vs. Retrieval-Augmented Generation
Although LLMs are trained on extensive datasets, their knowledge is fixed at training time and rarely covers specialised or private domains.
To address this, two methods are typically used when developing applications that leverage LLMs for specialised fields, both of which incorporate domain-specific data:
Fine-tuning
Fine-tuning is a process where a model, initially trained on a broad dataset, is later retrained with domain-specific training data to refine its output according to particular knowledge areas.
While fine-tuning customises the model to perform better on tasks related to the added domain, it is still limited. Some of the challenges involved in the fine-tuning process include:
Opaque sources
It’s challenging to discern whether information comes from the general dataset the model was initially trained on, or from the domain-specific dataset used during fine-tuning.
This ambiguity can impact the reliability and traceability of the information.
All or nothing access
Fine-tuned models typically do not support the granularity of controlling access to specific domain data for different users.
This ‘all or nothing’ approach to data access can pose limitations in scenarios where it’s necessary to tailor the model’s knowledge bases for diverse user groups or applications.
Cost implications
The financial aspect of fine-tuning models cannot be overlooked. For instance, OpenAI’s Davinci text model costs approximately $0.02 per 1,000 tokens for standard usage.
Fine-tuning it, however, costs around $0.03 per 1,000 tokens for training plus $0.12 per 1,000 tokens for usage of the fine-tuned model. These costs can add up quickly, especially for applications requiring extensive use of the AI.
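As a back-of-the-envelope illustration of how these per-token rates compound, the short calculation below uses the figures quoted above together with assumed token volumes; actual prices and volumes will vary by provider and project:

```python
# Hypothetical fine-tuning cost estimate using the per-1,000-token rates quoted above.
TRAINING_RATE_PER_1K = 0.03   # $ per 1,000 training tokens
USAGE_RATE_PER_1K = 0.12      # $ per 1,000 tokens when querying the fine-tuned model

training_tokens = 5_000_000        # assumed size of the fine-tuning dataset
monthly_usage_tokens = 10_000_000  # assumed monthly inference traffic

training_cost = training_tokens / 1_000 * TRAINING_RATE_PER_1K
monthly_usage_cost = monthly_usage_tokens / 1_000 * USAGE_RATE_PER_1K

print(f"One-off training cost: ${training_cost:,.2f}")      # $150.00
print(f"Monthly usage cost:    ${monthly_usage_cost:,.2f}")  # $1,200.00
```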
Repetitions and retraining
Perhaps one of the most significant drawbacks of fine-tuning is the necessity for repeated retraining cycles.
Any changes in the knowledge base, such as updates or corrections, require the model to undergo retraining to reflect this new information accurately.
This not only adds to the cost but also to the time invested in keeping the model current, making it a less agile solution in fast-paced or rapidly evolving domains.
Retrieval-augmented generation
Retrieval-augmented generation supplements the capabilities of LLMs with a smart retriever.
This retriever gathers relevant information from a knowledge base at query time, allowing the model to draw on the latest external knowledge without being continuously retrained.
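As a rough sketch of the retrieve-then-generate loop, the snippet below shows the overall shape of a RAG pipeline; the documents, the word-overlap scoring function, and the prompt format are illustrative placeholders, where a production system would use an embedding model, a vector database, and an LLM API:

```python
# Minimal, illustrative retrieve-then-generate loop (not a production RAG system).
documents = [
    {"source": "warranty-policy.pdf", "text": "All parts are covered by a 24-month warranty."},
    {"source": "shipping-faq.md", "text": "Orders are dispatched within 3 working days."},
]

def relevance(query: str, text: str) -> float:
    """Toy relevance score based on word overlap; a real retriever would use embeddings."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / max(len(query_words), 1)

def build_prompt(query: str, k: int = 1) -> str:
    """Retrieve the k most relevant documents and assemble a grounded prompt."""
    ranked = sorted(documents, key=lambda d: relevance(query, d["text"]), reverse=True)
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in ranked[:k])
    return (
        "Answer the question using only the context below. "
        "If the context is not relevant, say you do not know.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# The assembled prompt is what gets sent to the LLM; the cited sources travel with it.
print(build_prompt("How long is the parts warranty?"))
```

Because the retrieved passages carry their source labels into the prompt, the eventual answer can point to exactly where its domain knowledge came from.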
The advantages of retrieval-augmented generation extend further, addressing some of the key limitations of fine-tuning:
Source identification
Unlike fine-tuned models, RAG allows for precise identification of the sources from which it draws domain-specific knowledge.
This transparency is invaluable because it enables users to verify the reliability and origin of the information provided in the response.
Reduced hallucination
Retrieval-augmented generation systems are designed to avoid making up answers when no relevant domain knowledge is available.
This is a significant improvement over traditional LLMs, which may “hallucinate” information in the absence of relevant data, leading to potentially misleading or incorrect responses.
Maintainability
The search index used by retrieval-augmented generation systems can be continually updated as the knowledge base evolves, ensuring the model remains up to date with the latest information without the need for retraining.
MOHARA Case Study: Automotive Industry
At MOHARA, one of our clients faced a challenge that could potentially derail their production schedules and inflate costs: the validation of bills of materials (BOMs) for vehicle manufacturing.
Given the vast number of parts involved, often running into the thousands, ensuring the accuracy of these BOMs is paramount to preventing production delays and keeping costs under control.
Traditionally, this task involved manual checks and basic database queries, both of which were time-consuming and prone to errors given the complexity and specificity of automotive parts.
Given the critical nature of BOM accuracy, we explored several strategies for validation:
Human checks: This involves engineering BOM reviews, simple line-by-line validations, commodity quantity checks, visual validations, and virtual builds. While effective, such knowledge-intensive tasks are time-consuming and subject to human error.
Physical prototyping: We also considered building a prototype to identify errors. However, this method is prohibitive in terms of both time and financial cost.
Leveraging generative AI: Recognising the limitations of traditional methods, we turned our focus towards leveraging generative AI, specifically retrieval-augmented generation technology, to enhance our validation processes.
Our approach
We developed an Excel add-on that uses an LLM to interact with the BOMs, returning answers grounded in the information retrieved by the RAG component.
The initial scenarios enabled basic queries about the BOMs, such as the number of parts and the percentage validated.
More advanced scenarios leveraged the model’s semantic understanding capabilities to validate the BOMs more effectively.
This was particularly useful for identifying parts that might be listed under various names or descriptions, a task that traditional exact-match searches would fail to accomplish.
For example, the distinction between “right-hand side steering wheel” and “orange steering wheel” illustrates the nuanced understanding required—a challenge that was met by retrieval-augmented generation.
This not only demonstrated the potential to significantly speed up the BOM validation process, but also highlighted the cost-saving implications of early discrepancy identification.
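Below is a simplified sketch of the semantic matching idea, using the open-source sentence-transformers library; the model name and BOM entries are illustrative examples rather than our client’s data:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative BOM descriptions: the same part can appear under different names
bom_entries = [
    "steering wheel, right-hand drive",
    "RH side steering wheel assembly",
    "orange steering wheel cover",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose embedding model
entry_embeddings = model.encode(bom_entries, convert_to_tensor=True)

query = "right-hand side steering wheel"
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank entries by semantic similarity rather than exact text match
scores = util.cos_sim(query_embedding, entry_embeddings)[0]
for entry, score in sorted(zip(bom_entries, scores.tolist()), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {entry}")
```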
A New Era of AI: Lead the Way with MOHARA
Retrieval-augmented generation systems revolutionise how AI accesses and integrates vast pools of external knowledge, delivering responses that are not only contextually relevant but also enriched with the latest information.
At MOHARA, we’re harnessing this technology to redefine the boundaries of what AI can achieve. Ready to unlock the full potential of artificial intelligence? Get in touch with us.