Large Language Models (LLMs) and, more recently, Large Reasoning Models (LRMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, such as classification and information extraction. However, their high computational demands, reliance on robust infrastructure, and security constraints, especially regarding API usage, limit their practical adoption in many scenarios. Compact open-source models, such as DeepSeek-R1 8B and Llama3 8B, have emerged as more cost-effective, self-hosted alternatives that enable local execution and reduce dependence on external services. Nevertheless, their smaller parameter counts may hinder performance on tasks requiring complex reasoning or specialized knowledge. In this work, we investigate how compact LLMs and LRMs can achieve competitive results without fine-tuning by leveraging graph-based retrieval-augmented generation (Graph-RAG) and in-context learning (ICL). We conduct a comparative evaluation of compact and large-scale LLMs and LRMs on four NLP tasks (classification, information extraction, sentiment analysis, and fake news detection) across eight datasets, analyzing accuracy, latency, and cost. Our experiments examine: (i) the effectiveness of Graph-RAG for factual enrichment; (ii) the impact of ICL with examples the model struggles to answer; (iii) the combination of both techniques, culminating in the new EcoRAG framework; and (iv) the trade-offs between LLMs and LRMs. The results indicate that compact models, when enhanced with Graph-RAG and ICL, can match or even surpass the performance of large-scale LLMs, while LRMs yielded underwhelming results and exhibited up to 97\% higher latency than compact models.