7 Comments
werdnagreb

My company is using RAG-like mechanisms to help find bugs in software. Given the right context, the LLM can do a pretty good job (most of the time). The problem is determining which context to give the LLM. So, we’re using standard static analysis techniques (data flow and finding references mostly) to find code related to the pieces being changed. Our experiments are showing a big improvement.

From what I’ve seen, RAG approaches do well when they operate on structured data. In those cases it’s easy to know what extra context to provide.

When the data is unstructured, there is no easy way to discover relationships without using another LLM to make the determination.
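A minimal sketch of the find-references idea described in this comment, assuming a Python codebase. The symbol name, file pattern, and character budget are illustrative assumptions; real data-flow analysis would go well beyond this crude pass.

```python
# Hypothetical sketch: use a find-references pass to select code context for an LLM.
# Given a symbol touched by a change, collect the files that reference it and pack
# them into a bounded context block for the prompt.
import ast
from pathlib import Path

def find_referencing_files(repo_root: str, symbol: str) -> list[Path]:
    """Return Python source files whose AST references `symbol`."""
    hits = []
    for path in Path(repo_root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue
        names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
        attrs = {n.attr for n in ast.walk(tree) if isinstance(n, ast.Attribute)}
        if symbol in names or symbol in attrs:
            hits.append(path)
    return hits

def build_context(repo_root: str, symbol: str, max_chars: int = 8000) -> str:
    """Concatenate referencing files into a size-bounded context block."""
    parts, total = [], 0
    for path in find_referencing_files(repo_root, symbol):
        snippet = f"# --- {path} ---\n{path.read_text(encoding='utf-8')}\n"
        if total + len(snippet) > max_chars:
            break
        parts.append(snippet)
        total += len(snippet)
    return "".join(parts)

if __name__ == "__main__":
    # e.g. the diff under review touches a hypothetical `parse_invoice` function
    print(build_context(".", "parse_invoice")[:500])
```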

Daniel

It feels like we’re building libraries for models that never read — they skim and guess

Bayblade Garena

In a nutshell, fine-tuning is the way? Is there any literature supporting this claim?

Sergei Polevikov

Unless your organization is severely budget-constrained and willing to cut corners, no one in their right mind would choose RAG over literally any neural network-based technique for document or text retrieval.

Bayblade Garena

What do you think RAG bots are using for retrieval? 🤷
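For context on this exchange, a minimal sketch of the retrieval step inside a typical RAG pipeline: the retriever is itself usually a neural embedding model. The model name and toy corpus are illustrative assumptions.

```python
# Sketch of embedding-based retrieval, the usual "R" in RAG.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed dependency

corpus = [
    "Invoices are processed nightly by the billing service.",
    "The billing service retries failed payments three times.",
    "User avatars are cached for 24 hours.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice
doc_vecs = model.encode(corpus, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k corpus passages closest to the query in embedding space."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q  # cosine similarity, since vectors are normalized
    return [corpus[i] for i in np.argsort(-scores)[:k]]

print(retrieve("How often are payments retried?"))
```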

Maria Sukhareva

When OpenAI released its first fine-tuning API, they explicitly said that fine-tuning should not be viewed as a way to teach the model something new, but rather as a way to steer it in a particular direction, e.g. toward a certain writing style. The weights seem to be quite fragile, and fine-tuning done incorrectly can essentially break the model so that it loses its ability to generalize. It seems like the answer is actually in pre-training.
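To make the “steering, not teaching” point concrete, here is a hedged sketch of a style-focused fine-tuning run with the OpenAI Python SDK. The dataset contents, file name, and base model are illustrative assumptions.

```python
# Illustrative sketch: fine-tuning to steer tone/style, not to inject new facts.
# Each JSONL line pairs a prompt with an answer written in the target style;
# the factual content stays within what the base model already knows.
import json
from openai import OpenAI  # assumes the openai Python SDK v1+

examples = [
    {"messages": [
        {"role": "system", "content": "Answer in terse, bullet-free plain prose."},
        {"role": "user", "content": "Summarize what a vector database does."},
        {"role": "assistant", "content": "It stores embeddings and returns the nearest ones to a query embedding."},
    ]},
    # ... more style exemplars, all covering knowledge the base model already has
]

with open("style_tuning.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

client = OpenAI()
training_file = client.files.create(file=open("style_tuning.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # hypothetical choice of base model
)
print(job.id)
```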

Chris W

Two thoughts:

1) The same problem would be present in using an LLM to summarize, say, a company meeting, I think. I hypothesize, for example, that if there’s a novel idea (relative to the LLM’s training data) proposed in the meeting, it may not be summarized correctly.

2) It’s annoying from a scientific point of view that “reasoning models” are hybrids between LLMs and other, often secret (symbolic?) technologies, and are expensive to use, yet they often outperform their standalone LLM counterparts. I wonder how, for example, o3 would treat collisions between retrieved facts and facts learned during pre-training. Better or worse?
