At MOHARA, we’ve found retrieval-augmented generation (RAG) to be a game-changing tool for our work with generative AI models.

It’s a bridge that connects the broad internal knowledge of large language models (LLMs) to the focused insight provided by an information retrieval component.

AI excels at delivering answers using its training data, but is constrained by the scope of that dataset.

Retrieval-augmented generation enhances its ability to manage knowledge-intensive tasks by integrating external data sources.

It does this by tapping into a pool of data stored across external knowledge sources, which allows the system to develop responses that are relevant to the context and draw from the most current knowledge available.

In this blog we dive into how it works, why it’s such a breakthrough, and how it has positively impacted our work at MOHARA.

Understanding LLMs and Their Limitations

The advent of large language models like OpenAI’s GPT-3 and GPT-4 or Google’s Gemini has marked a milestone in artificial intelligence, particularly within natural language processing.

These models work by predicting the next word in a sequence, making educated guesses based on vast amounts of LLM training data.

💡 Prompt Engineering

Prompt engineering plays a crucial role in how effective an LLM is.

The input (or “prompt”) you provide is crafted in a specific manner to guide the model’s response. This involves designing the prompt to provide the LLM with as much context as possible for its answer.

For example, you can provide it with an opening statement that positions the AI as an expert in a particular field, like vehicle manufacturing.

Following this, the prompt can assign a specific task to the AI, such as deciphering a car model part number based on user input.

To enhance the accuracy of its response, you can use RAG to supply the model with additional information from external knowledge bases. This could include data from documents that have been previously identified and integrated into a database.

By strategically engineering the prompt in this manner, the AI is better equipped to generate responses that address the user’s query.
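Putting these pieces together, a prompt like the one described might be assembled as follows. This is a minimal sketch: the role statement, task, and retrieved snippets are all illustrative, not taken from a real system.

```python
def build_prompt(user_query: str, retrieved_context: list[str]) -> str:
    """Assemble a prompt that positions the model as a domain expert
    and supplies retrieved documents as additional context."""
    context = "\n".join(f"- {snippet}" for snippet in retrieved_context)
    return (
        "You are an expert in vehicle manufacturing.\n"
        "Task: decipher the car model part number in the user's question.\n"
        f"Relevant reference material:\n{context}\n\n"
        f"Question: {user_query}"
    )

# Hypothetical part number and reference snippets, for illustration only.
prompt = build_prompt(
    "What does part number VW-1J0-615-301 refer to?",
    ["Prefix 1J0 denotes the Golf Mk4 platform.",
     "Suffix 615-301 is a front brake disc."],
)
print(prompt)
```

The final string would then be sent to the LLM as a single prompt; the retrieved snippets are what the RAG component contributes.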

At their core, LLMs are powered by a few fundamental principles that enable their advanced capabilities:

1. Data

LLMs like GPT-3, GPT-4, and Gemini are trained on enormous amounts of unlabelled and unstructured text data, including books, articles, conversations, code, and forum posts.

This extensive training material is what enables them to generate a human-like response to a user query. For perspective, GPT-3 was trained on around 45 terabytes of text data, while GPT-4 is estimated, though not officially confirmed, to have been trained on roughly 1 petabyte (1,000 terabytes).

2. Architecture

These models are based on the Transformer architecture, a type of neural network that excels in handling sequences of data.

This architectural choice is key to their success in generating coherent and contextually relevant text. The scale of these models is immense: GPT-3 has 175 billion parameters, and GPT-4 is estimated (though not confirmed by OpenAI) to have around 1.7 trillion.

3. Training

During their training phase, LLMs learn the probability of sequences of words occurring together.

This probabilistic understanding is what allows them to predict the next word in a sequence with remarkable accuracy and efficiency.

This process involves a few key concepts:

Probabilistic prediction: The model predicts the next word based on the likelihood of various possibilities, using a vast dataset to inform these probabilities.

Context sensitivity: The effectiveness of an LLM is enhanced by the context provided, which guides its predictions.

Temperature parameter: This controls the level of creativity or randomness in the model’s output, with a lower temperature leading to more predictable and less varied outputs.
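These three concepts can be illustrated with a toy sampler. The logits below are invented for the example; a real model would produce scores over its entire vocabulary.

```python
import math
import random

def sample_next_word(logits: dict[str, float], temperature: float = 1.0) -> str:
    """Sample the next word from temperature-scaled softmax probabilities.
    Lower temperature sharpens the distribution toward the likeliest word."""
    scaled = {w: l / temperature for w, l in logits.items()}
    max_l = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(l - max_l) for w, l in scaled.items()}
    total = sum(exps.values())
    probs = {w: e / total for w, e in exps.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Illustrative scores for continuing "The car drives on the ..."
logits = {"road": 3.2, "street": 2.5, "moon": 0.1}

random.seed(0)
print(sample_next_word(logits, temperature=0.2))  # near-greedy: almost always "road"
print(sample_next_word(logits, temperature=2.0))  # flatter distribution, more varied
```

At low temperature the model behaves almost deterministically, picking the most probable word; raising the temperature flattens the distribution and makes less likely continuations more common.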

Despite their capabilities, LLMs are not without limitations. Their tendency to generate plausible yet incorrect information (called “hallucination”) poses a challenge, especially when it comes to knowledge-intensive tasks that require precise data.

Fine-Tuning vs. Retrieval-Augmented Generation

Although LLMs have extensive datasets, they are still limited.

To address this challenge, fine-tuning and retrieval-augmented generation can be employed in order to incorporate domain-specific data.

These two methods are typically used when developing applications that leverage LLMs for specialised fields:

Fine-tuning

Fine-tuning is a process where a model, initially trained on a broad dataset, is later retrained with domain-specific training data to refine its output according to particular knowledge areas.

ℹ️ Fine-Tuning: A Real-World Example

Take, for example, a healthcare organisation aiming to improve their ability to accurately diagnose rare diseases by analysing patient records.

The initial model, trained on a broad dataset of medical information, might struggle with the rarity of certain conditions. This can lead to potential misdiagnoses or overlooked symptoms.

The organisation decides to fine-tune the model using curated training data comprised of:

  • Detailed case studies.
  • Research papers and medical documents.
  • Patient records specifically related to rare diseases.

With the addition of this external knowledge, the model can now understand subtle symptoms and rare combinations of clinical signs that it previously could not accurately interpret.

While fine-tuning customises the model to perform better on tasks related to the added domain, it is still limited. Some of the challenges involved in the fine-tuning process include:

Opaque sources

It’s challenging to discern whether information comes from the general dataset the model was initially trained on, or from the domain-specific dataset used during fine-tuning.

This ambiguity can impact the reliability and traceability of the information.

All or nothing access

Fine-tuned models typically do not support the granularity of controlling access to specific domain data for different users.

This ‘all or nothing’ approach to data access can pose limitations in scenarios where it’s necessary to tailor the model’s knowledge bases for diverse user groups or applications.

Cost implications

The financial aspect of fine-tuning models cannot be overlooked. For instance, using OpenAI’s Davinci text model costs approximately $0.02 per 1,000 tokens for standard usage.

For a fine-tuned Davinci model, however, the costs rise to $0.03 per 1,000 tokens for training plus $0.12 per 1,000 tokens for usage. These costs can add up quickly, especially for applications requiring extensive use of the AI.

Repetitions and retraining

Perhaps one of the most significant drawbacks of fine-tuning is the necessity for repeated retraining cycles.

Any changes in the knowledge base, such as updates or corrections, require the model to undergo retraining to reflect this new information accurately.

This not only adds to the cost but also to the time invested in keeping the model current, making it a less agile solution in fast-paced or rapidly evolving domains.

Retrieval-augmented generation

Retrieval-augmented generation supplements the capabilities of LLMs with a smart retriever that gathers relevant information from a knowledge base.

This allows the model to access the latest information without needing continuous retraining on external knowledge.

The advantages of retrieval-augmented generation extend further, addressing some of the key limitations of fine-tuning:

Source identification

Unlike fine-tuned models, RAG allows for precise identification of the sources from which it draws domain-specific knowledge.

This transparency is invaluable because it enables users to verify the reliability and origin of the information provided in the response.

Reduced hallucination

Retrieval-augmented generation systems are designed to avoid making up answers when no relevant domain knowledge is available.

This is a significant improvement over traditional LLMs, which may “hallucinate” information in the absence of relevant data, leading to potentially misleading or incorrect responses.

Continuous updates

The search index used by retrieval-augmented generation systems can be continually updated as the knowledge base evolves, ensuring the model remains up to date with the latest information without the need for retraining.

💬 James Potgieter, Software Engineer, MOHARA

“Anything that’s domain-specific, or not available to the public, is where retrieval-augmented generation shines.

When you perform a smart retrieval, the system compares your query’s encoding with those in the domain knowledge base, selecting the information most similar to your query. This supplements the model’s knowledge, enabling it to provide a more accurate answer.

Without retrieval-augmented generation, if you asked a large language model (like ChatGPT) a question about something very domain-specific without external information, it might just make a best guess, essentially fabricating an answer.

But with retrieval-augmented generation it accesses this domain-specific knowledge base, finding details on the latest updates about a specific topic, and then injects relevant information into the response.

This drastically reduces the model’s tendency to ‘hallucinate’, instead offering answers grounded in domain-specific knowledge you can trust.”
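The smart retrieval described in the quote, comparing the query’s encoding against the encodings in the knowledge base, can be sketched as a toy similarity search. The three-dimensional embeddings and documents below are invented for illustration; a real system would use a learned encoder producing vectors with hundreds of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec: list[float], knowledge_base: list[dict], top_k: int = 2) -> list[str]:
    """Return the top_k documents whose embeddings are most similar to the query."""
    ranked = sorted(knowledge_base,
                    key=lambda doc: cosine_similarity(query_vec, doc["embedding"]),
                    reverse=True)
    return [doc["text"] for doc in ranked[:top_k]]

# Toy knowledge base with made-up embeddings.
kb = [
    {"text": "Brake disc specs", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Paint colour codes", "embedding": [0.0, 0.2, 0.9]},
    {"text": "Brake pad suppliers", "embedding": [0.8, 0.3, 0.1]},
]

# A query about brakes lands closest to the brake-related documents.
print(retrieve([1.0, 0.2, 0.0], kb))
```

The retrieved texts are then injected into the prompt, as in the prompt-engineering example earlier, so the model answers from the domain knowledge rather than guessing.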

MOHARA Case Study: Automotive Industry

At MOHARA, one of our clients faced a challenge that could potentially derail their production schedules and inflate costs: the validation of bills of materials (BOMs) for vehicle manufacturing.

ℹ️ Bill of Materials

A Bill of Materials (BOM) is a comprehensive list detailing all the components required to manufacture a product.

In automotive manufacturing, a BOM can range anywhere from 2,000 to 50,000 lines, with each line specifying parts with attributes such as quantity, cost, weight, and supplier information.

The sheer volume and detail encapsulated in these documents underscores their complexity and the necessity for precision.

The core issue revolved around the validation of BOMs. Given the vast number of parts involved—often ranging into thousands—ensuring the accuracy of these BOMs is paramount to prevent production delays and reduce costs.

Traditionally, this task involved manual checks and basic database queries, both of which were time-consuming and prone to errors given the complexity and specificity of automotive parts.

Given the critical nature of BOM accuracy, we explored several strategies for validation:

Human checks: This involves engineering BOM reviews, simple line-by-line validations, commodity quantity checks, visual validations, and virtual builds. While effective, such knowledge-intensive tasks are time-consuming and subject to human error.

Physical prototyping: We also considered building a prototype to identify errors. However, this method is prohibitive in terms of both time and financial cost.

Leveraging generative AI: Recognising the limitations of traditional methods, we turned our focus towards leveraging generative AI, specifically retrieval-augmented generation technology, to enhance our validation processes.

Our approach

We developed an Excel add-on that uses an LLM to interact with the BOMs and provide accurate answers based on the information supplied by the RAG component.

The initial scenarios enabled basic queries about the BOMs, such as the number of parts and the percentage validated.
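A basic query of this kind reduces to a simple aggregation over BOM rows. The sketch below uses hypothetical field names, not the client’s actual schema:

```python
# Hypothetical BOM rows; real BOMs run from 2,000 to 50,000 lines.
bom = [
    {"part_no": "1J0-615-301", "description": "Front brake disc", "validated": True},
    {"part_no": "1J0-615-415", "description": "Brake pad set", "validated": False},
    {"part_no": "1J0-121-253", "description": "Radiator", "validated": True},
]

total = len(bom)
validated_pct = 100 * sum(row["validated"] for row in bom) / total
print(f"{total} parts, {validated_pct:.0f}% validated")
```

In the add-on, the LLM translates the user’s natural-language question into a query like this, with the RAG component supplying the relevant BOM rows as context.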

More advanced scenarios leveraged the model’s semantic understanding capabilities to validate the BOMs more effectively.

This was particularly useful for identifying parts that might be listed under various names or descriptions, a task that traditional exact-match searches would fail to accomplish.

For example, the distinction between “right-hand side steering wheel” and “orange steering wheel” illustrates the nuanced understanding required—a challenge that was met by retrieval-augmented generation.

This not only demonstrated the potential to significantly speed up the BOM validation process, but also highlighted the cost-saving implications of early discrepancy identification.

A New Era of AI: Lead the Way with MOHARA

Retrieval-augmented generation systems revolutionise how AI accesses and integrates vast pools of external knowledge, delivering responses that are not only contextually relevant but also enriched with the latest information.

At MOHARA, we’re harnessing this technology to redefine the boundaries of what AI can achieve. Ready to unlock the full potential of artificial intelligence? Get in touch with us.