From RAG to RAFT: My Journey to a Smarter Reasoning LLM

When I first experimented with Retrieval-Augmented Generation (RAG), it felt like giving a language model access to the internet – suddenly it could pull facts, names, and numbers from external data instead of hallucinating them.

It was fast, accurate (most of the time), and deeply satisfying to watch.
But then came the tricky questions – the kind that require reasoning.

The kind where you don’t just ask “what”, but “why”.
That’s when I realized: retrieval alone wasn’t enough.


Step 1: When RAG Meets Its Limits

At its core, RAG is simple:

Text
User Question → Retriever → Context → LLM → Answer

It looks like this:

Text
+-------------+     +-------------+     +-------------+     +-------------+
| User Query  | --> | Retriever   | --> | Context     | --> | LLM Answer  |
+-------------+     +-------------+     +-------------+     +-------------+

The retriever searches a vector database for relevant chunks of text, then the LLM uses that text to answer.
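Here is a minimal sketch of that loop in Python. The embed(), vector_db.search(), and llm.generate() calls are placeholders for whatever embedding model, vector store, and LLM client you actually use – not any specific library's API.

Python
# Minimal RAG loop: retrieve relevant chunks, then let the LLM answer from them.
# `embed`, `vector_db`, and `llm` are placeholders for your own components.
def answer_with_rag(question: str, vector_db, llm, embed, top_k: int = 3) -> str:
    # 1. Embed the question and search the vector database for similar chunks.
    query_vector = embed(question)
    chunks = vector_db.search(query_vector, top_k=top_k)

    # 2. Concatenate the retrieved chunks into a single context block.
    context = "\n\n".join(chunk.text for chunk in chunks)

    # 3. Ask the LLM to answer strictly from the retrieved context.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm.generate(prompt)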

But here’s the problem:
Even with the right context, the model sometimes fails to reason logically.

I saw it many times – the retriever fetched the perfect paragraph, but the model misinterpreted it or missed the cause-effect relationship entirely.

It could recall, but it couldn’t think.

So I asked myself:

“If RAG is about knowledge, what gives the model reasoning?”


Step 2: Teaching the Model to Think — Fine-Tuning with Reasoning

That’s when I turned to fine-tuning, specifically LoRA (Low-Rank Adaptation) – a method that trains a small set of additional low-rank adapter weights on top of the frozen base model, instead of updating every parameter.

I created a dataset that didn’t just include questions and answers, but also the reasoning steps behind them.

Example:

Text
Q: Why did ERCOT’s reserve margin fall in 2021?
Reasoning: The winter storm caused multiple thermal plants to fail, reducing available capacity.
A: Because the winter storm froze key power plants and reduced generation.
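For reference, here is a rough sketch of the LoRA setup, assuming the Hugging Face transformers and peft libraries; the base model name and hyperparameters are purely illustrative.

Python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # illustrative base model, not a recommendation
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA attaches small low-rank adapter matrices to the attention projections;
# the original base weights stay frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# Each training sample keeps the reasoning step between question and answer,
# so the model learns the "why", not just the "what".
def format_sample(q: str, reasoning: str, a: str) -> str:
    return f"Question: {q}\nReasoning: {reasoning}\nAnswer: {a}"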

After fine-tuning, the model became more analytical.
It started answering why questions with structured logic.

Still, I noticed something interesting:
When combined with RAG, this fine-tuned model didn’t always use the retrieved context correctly – it sometimes ignored parts or overemphasized irrelevant ones.

That made me think: maybe reasoning and retrieval need to learn together, not separately.


Step 3: Combining Retrieval and Reasoning

I envisioned an improved RAG architecture: one that integrates Knowledge Graphs (KG) and Graph Neural Networks (GNN) to make retrieval more structured and relational.

Imagine this flow:

Text
Question
   ↓
Retriever (Vector DB + KG + GNN)
   ↓
Context (facts + relationships)
   ↓
Reasoning Model (LoRA Fine-tuned)
   ↓
Final Answer

This system can find not only facts but also connections between them.
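A simplified sketch of what that hybrid retriever could look like – networkx stands in for the knowledge graph, and vector_search() / gnn_rank() are placeholders for a real vector store and a trained GNN re-ranker.

Python
import networkx as nx

# Toy knowledge graph: nodes are entities, edges carry relationship labels.
kg = nx.DiGraph()
kg.add_edge("2021 winter storm", "ERCOT thermal plants", relation="caused failure of")
kg.add_edge("ERCOT thermal plants", "reserve margin", relation="reduced")

def retrieve(question: str, vector_search, gnn_rank, top_k: int = 3):
    # 1. Vector retrieval: semantically similar text chunks (with tagged entities).
    chunks = vector_search(question, top_k=top_k)

    # 2. Graph retrieval: follow relationships from entities mentioned in the chunks.
    facts = []
    for chunk in chunks:
        for entity in chunk.entities:
            if entity in kg:
                for _, neighbor, data in kg.out_edges(entity, data=True):
                    facts.append(f"{entity} {data['relation']} {neighbor}")

    # 3. Let a GNN-based scorer re-rank the combined evidence by relevance.
    return gnn_rank(question, chunks, facts)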
But I still had one question:

“If my model already knows how to reason, do I still need prompting tricks like ‘Let’s think step by step’?”


Step 4: Discovering RAFT – Retrieval-Augmented Fine-Tuning

That’s when I discovered RAFT – Retrieval-Augmented Fine-Tuning.
It’s exactly the hybrid approach I was looking for.

What is RAFT?

RAFT is like teaching your model how to use retrieval properly.

Instead of fine-tuning only on question–answer pairs, RAFT fine-tunes the model on question + retrieved context + answer.
Each sample looks like this:

JSON
{
  "question": "Why did ERCOT experience low reserve margin in 2021?",
  "context": "In 2021, ERCOT's thermal generation dropped due to the Texas freeze...",
  "answer": "Because the winter storm caused multiple thermal plants to fail."
}

By including the retrieved context during fine-tuning, the model learns how to interpret and reason with it – automatically.
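In practice, that just means the retrieved context is baked into every training prompt. A minimal sketch of how such a sample could be turned into a prompt/completion pair (the exact template is up to you):

Python
import json

def build_raft_example(record: dict) -> dict:
    # The retrieved context is part of the training prompt, so the model learns
    # to ground its answer in the evidence rather than in its memory alone.
    prompt = (
        f"Context:\n{record['context']}\n\n"
        f"Question: {record['question']}\n"
        "Answer:"
    )
    return {"prompt": prompt, "completion": " " + record["answer"]}

sample = {
    "question": "Why did ERCOT experience low reserve margin in 2021?",
    "context": "In 2021, ERCOT's thermal generation dropped due to the Texas freeze...",
    "answer": "Because the winter storm caused multiple thermal plants to fail.",
}
print(json.dumps(build_raft_example(sample), indent=2))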

It no longer needs explicit step-by-step prompting (Chain of Thought) during inference.
It already knows how to “think” with retrieved evidence.


Step 5: How RAFT Differs from RAG and LoRA

Here’s how I see it now:

| Method | Retrieval | Fine-tuning | Needs CoT Prompt | Accuracy | Cost | Ideal Use |
|---|---|---|---|---|---|---|
| RAG | ✅ Yes | ❌ No | ✅ Yes | Medium | High | Quick factual lookup |
| Fine-tune (LoRA) | ❌ No | ✅ Yes | ✅ Yes | High but prone to hallucination | Low | Internal reasoning tasks |
| RAFT | ✅ Yes | ✅ Yes | ❌ No (optional) | Very High | Low | Reliable reasoning + retrieval QA |

In short:

  • RAG gives the model memory.
  • Fine-tuning gives it logic.
  • RAFT combines both – and gives it wisdom.

Step 6: My “KG-Augmented RAFT” Architecture

After a few iterations, I designed what I call a KG-Augmented RAFT pipeline — blending reasoning and relational retrieval into one intelligent system.

Text
                +-------------------+
                | User Question     |
                +---------+---------+
                          |
                          v
              +-----------+-----------+
              | Retriever Layer       |
              | (Vector DB + KG + GNN)|
              +-----------+-----------+
                          |
                    Retrieved Context
                          |
                          v
               +----------+----------+
               | RAFT Model (LoRA FT)|
               |  - Trained on Q+C+A |
               +----------+----------+
                          |
                      Final Answer

Key Components:

  • Retriever Layer: Uses both vector embeddings and graph relationships.
  • RAFT Model: Fine-tuned to understand and reason with retrieved context.
  • Prompt Builder: Minimal – no long “think step by step” instructions needed.
  • Optional Verification Step: Fact-check final outputs against retrieved sources.
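
Putting these pieces together, a minimal sketch of the end-to-end flow might look like this – retriever, raft_model, and verify are hypothetical stand-ins for the components above.

Python
def kg_raft_answer(question: str, retriever, raft_model, verify) -> str:
    # 1. Retriever layer: vector + graph evidence.
    evidence = retriever(question)

    # 2. Minimal prompt builder: no "think step by step" scaffolding,
    #    because the RAFT model already learned to reason over retrieved context.
    prompt = f"Context:\n{evidence}\n\nQuestion: {question}\nAnswer:"

    # 3. The RAFT model produces a grounded answer.
    answer = raft_model.generate(prompt)

    # 4. Optional verification: check the answer against the retrieved sources.
    if not verify(answer, evidence):
        answer += "\n(Note: this answer could not be fully verified against sources.)"
    return answer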

This architecture felt like the “click” moment – a system that truly combines external knowledge with learned reasoning.


Step 7: Lessons Learned

This journey from RAG → Fine-Tuning → RAFT taught me more than just model mechanics.
It showed me how intelligence emerges from interaction – not from memorization alone.

Here are my biggest takeaways:

  • RAG is great for recall, but not reasoning.
  • Fine-tuning builds reasoning, but can’t access new facts.
  • RAFT unifies both worlds – reasoning grounded in factual retrieval.
  • KG + GNN retrieval adds structure, reducing ambiguity in context.
  • You don’t always need step-by-step prompts if your model has already learned to reason during fine-tuning.

Final Thoughts

Building an LLM system is a lot like mentoring a student.

You can give them a library (RAG),
teach them how to think (Fine-tuning),
or – if you’re clever – let them learn how to use both together (RAFT).

And when they finally master both?
They stop guessing, and start understanding.


That’s the beauty of RAFT – it’s not just a technique, it’s a philosophy:
teach your model not only to know, but to think.
