
Fine-Tuning Small Language Models to Optimize Code Review Accuracy

Generative AI is transforming enterprises by driving innovation and boosting efficiency across numerous applications. However, adopting large foundational models poses several challenges, including high costs, slow performance, and data privacy concerns. Many enterprises hesitate to share sensitive code or data with external LLM providers. Additionally, while foundational LLMs excel at general tasks, they often require extensive prompt engineering to achieve high accuracy on specific enterprise-focused use cases. 

Fine-tuning small language models (SLMs), often leveraging techniques like knowledge distillation, offers an attractive solution to these challenges. These smaller models can deliver performance close to that of larger models while being significantly faster and more cost-effective. Additionally, SLMs can be deployed on-premises or in virtual private clouds (VPCs), enabling enterprises to keep sensitive data secure. However, fine-tuning smaller models requires high-quality labeled data, which is time-consuming and expensive to create. 

This post introduces an automated fine-tuning approach that addresses these challenges by using the data flywheel strategy, a feedback-driven mechanism that iteratively enhances model performance. The approach incorporates curriculum learning, a technique inspired by human learning, where training data is introduced progressively based on complexity. By using large “teacher” models to generate and structure synthetic training data, this method optimizes the fine-tuning process, enabling smaller models to handle complex tasks more effectively while minimizing human intervention.

We’ll cover the following topics: 

  • Overview of the automated fine-tuning approach: A teacher-student paradigm for creating efficient training workflows.
  • Implementation steps: Key stages like exam generation, evaluation, and fine-tuning.
  • Applications in code-review automation: Real-world examples like severity rating and explanation generation, where the automated fine-tuned SLM (Llama 3 8B Instruct plus low-rank adaptation (LoRA), or llama3-8b+LoRA) improved accuracy by 18%, outperforming larger models, and delivered expert-aligned explanations—all with lower costs and latency.
  • Lessons learned: Best practices for scalable, cost-effective AI solutions.

By the end of this post, you’ll know how fine-tuned SLMs can enable enterprises to achieve competitive accuracy while addressing challenges related to cost, latency, and scalability. While the focus here is on code assistance, the methodology is applicable across diverse enterprise use cases.

This post is part of the NVIDIA Chat Labs series, which shares insights and best practices from internal generative AI projects to help others navigate AI adoption.

Overview of automated fine-tuning approach

Our automated fine-tuning approach draws inspiration from how teachers adapt lessons to address students’ specific areas for improvement. It incorporates the principles of knowledge distillation, using a teacher-student paradigm.

A process flow depicting the developed automated fine-tuning process. The teacher model generates an exam, the student model takes the exam, and the teacher evaluates the results. Based on the evaluation, the teacher generates a new curriculum for fine-tuning. The loop continues until the desired performance is achieved.
Figure 1. High-level overview of the automated fine-tuning architecture

In this process, a large LLM (the teacher) organizes and prepares training data (the curriculum) for the smaller LLM or SLM (the student), using five iterative steps:

1. Exam generation: The teacher LLM creates a test for the student SLM, based on prior performance, user feedback (data flywheel), and previous exam results. 

2. Taking the exam: The student takes the test generated by the teacher. 

3. Evaluation: The teacher evaluates the student’s performance, highlighting the student’s strengths and areas for improvement. 

4. Curriculum generation: The teacher customizes the training data based on evaluation results, tailoring it to address specific weaknesses.

5. Fine-tuning: The student is fine-tuned on the updated dataset using techniques such as LoRA. Unlike traditional fine-tuning, which adjusts all the model parameters and requires significant computational resources, LoRA optimizes a smaller set of parameters specific to the task, making the process cost- and memory-efficient.

This process is repeated until the student’s performance stabilizes or a computation budget is reached, ensuring cost-effective and efficient training. 
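To make the loop concrete, below is a minimal Python sketch of the iteration described above. The five helper functions are hypothetical stand-ins for the steps detailed in the next section, not the production implementation.

def automated_finetuning_loop(teacher, student, task_data,
                              target_proficiency=9, max_rounds=5):
    # Hypothetical helpers corresponding to steps 1-5 of the workflow.
    proficiency, feedback, curriculum = 0, "", []
    for _ in range(max_rounds):  # computation budget
        exam = generate_exam(teacher, task_data, proficiency, feedback)  # step 1
        results = take_exam(student, exam)                               # step 2
        proficiency, feedback, new_examples = evaluate_exam(
            teacher, results)                                            # step 3
        if proficiency >= target_proficiency:  # performance has stabilized
            break
        curriculum = build_curriculum(curriculum, new_examples)          # step 4
        student = finetune_student(student, curriculum)                  # step 5
    return student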

Implementing fine-tuning

The following sections dive deeper into each of the five iterative steps of the automated fine-tuning workflow shown in Figure 1. 

1. Exam generation

The teacher LLM generates the exam using the EXAM_PROMPT below. Inputs to the prompt include:

  1. Task data: Specific data related to the task, including user feedback or synthetic examples.
  2. Task prompt: A prompt that converts task data into a single training example for the student LLM.
  3. Current proficiency: The student LLM’s proficiency level from previous evaluations.
  4. Feedback: Teacher-generated insights highlighting the student’s weaknesses. 

To generate exam questions, the teacher LLM applies TASK_PROMPT to each entry in the DATA_SOURCE, tailoring questions to the student’s areas for improvement. 

An example EXAM_PROMPT is shown below.

EXAM_PROMPT = """
[TASK]
%s

[DATA SOURCE]
%s

[PREVIOUS_EXAM_RESULTS]
Proficiency: %s
Feedback: %s

Create an exam of %s questions based on the task after the [TASK] tag.
You can use the data after [DATA_SOURCE] for creating the dataset.
Modify the data in the data source appropriately to create questions for the exam (for example to balance the exam).  
If there is none, then create your own.
Results for the expected proficiency and feedback from the previous exam (if any)
are indicated after [PREVIOUS_EXAM_RESULTS]. You can use that information to create
a better exam (in terms of difficulty, questions etc).
The complete exam must appear as json after any thoughts you have
and there must be no other text after it. Think about your answer carefully.

Your output format must strictly be in the following format as a json.
"exam": A list of json objects in the format below
"question": A json with the information in the [INPUT_JSON] of the [TASK]
"answer": A json with the response based on the [OUTPUT_JSON] of the [TASK]
"""

TASK_PROMPT generates task-specific exam questions. An example for predicting the severity of a code change during code review is shown below.

TASK_PROMPT = """Assign an issue type to the code below.

[ISSUE_TYPES]
critical: Security vulnerabilities, bugs that will cause a crash or code that can abruptly exit the execution.
major: Severe bugs that will cause a system to produce incorrect results.
minor: Results in some unexpected or undesired behavior, but not enough to disrupt system function.
trivial: Issue won't result in any noticeable breakdown of the system. (e.g. docstring changes, comments etc)

The code and review are formatted as json below.
[INPUT_JSON]
%s

Your output must be in json with no other text. The format is below.
{"issue_type": A value in [critical, major, minor, trivial]}
"""

INPUT_JSON = """
"code": The code snippet under review
"review": A review of the code.
"""

The exam generation step produces a list of question-and-answer pairs in JSON format. Below is an example question-and-answer pair for the code review severity prediction task. The question includes the code change and review feedback, while the answer specifies the expected severity level.

{
    "question": {
        "code": "<code snippet>",
        "review": "<code review>"
    },
    "answer": {
        "issue_type": "major"
    }
}

2. Taking the exam 

In this step, the questions generated in Step 1 are used to evaluate the student LLM. Each exam question is combined with TASK_PROMPT to create a list of prompts. The student LLM processes these prompts and generates answers based on its understanding.

For example, in code review severity prediction, the student LLM analyzes the code snippet and review feedback, classifying the severity as critical, major, minor, or trivial.
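A minimal sketch of this step is shown below. Here, student_generate is a hypothetical callable wrapping the student SLM’s inference endpoint; output that fails to parse is recorded as a missing answer, which the evaluation prompt in the next step treats as incorrect.

import json

def take_exam(student_generate, exam):
    # student_generate: hypothetical callable mapping a prompt string to
    # the student LLM's raw completion.
    results = []
    for item in exam:
        prompt = TASK_PROMPT % json.dumps(item["question"])
        raw = student_generate(prompt)
        try:
            model_answer = json.loads(raw)
        except json.JSONDecodeError:
            model_answer = None  # evaluator treats missing answers as incorrect
        results.append({
            "question": item["question"],
            "answer": item["answer"],
            "model_answer": model_answer,
        })
    return results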

3. Evaluation 

After the student LLM takes the exam, the teacher LLM evaluates its performance using the EXAM_EVALUATION_PROMPT, which includes:

  1. Task prompt: Used to generate the exam questions.
  2. Exam results: Student’s answers from Step 2, formatted as a list of question-answer pairs.
  3. Data source: An optional data source the teacher uses to generate additional training examples. If not provided, the teacher LLM generates its own examples.
  4. Number of training samples: Specifies how many new training samples to create. These examples are added to the existing training data to address the student’s weaknesses. 

The teacher LLM assigns a proficiency score (1-10), provides feedback, and generates a tailored training dataset that addresses the student’s weaknesses. 

An example EXAM_EVALUATION_PROMPT is shown below.

EXAM_EVALUATION_PROMPT = """For the task specified by [TASK], an exam was
administered to evaluate the model's current capabilities. The results
appear after [EXAM_RESULTS] as a json list where each element is a json
consisting of the fields:
"question": The data for the task
"answer": The correct answer
"model_answer": The model's answer. If there is no answer here, then assume that 
the model could not answer the question correctly.

[TASK]
%s

[EXAM_RESULTS]
%s

Your task is to evaluate these exam results to determine the model's capabilities
on a scale of 1-10 with 10 being proficient. Next, based on the model's results
on the exam, identify areas for improvement.
Finally, please develop a curriculum of %s examples to help train the model.
The output is VALID JSON formatted as follows:
"feedback": The feedback as a string
"proficiency": The proficiency score of the model as an integer.
"dataset": A list of json objects for training the model.
   "question": A question for training the model in the exact same format as the [TASK] asks
   "answer": The answer to the question in the same format as the [TASK] asks

You must use the data after [DATA_SOURCE] for creating the dataset. You can modify
the data as you see fit. If there is none, then create your own.

[DATA_SOURCE]
%s
"""

4. Curriculum generation

The new training examples generated in Step 3 are combined with the existing dataset to create an updated curriculum. This tailored curriculum specifically addresses the student’s areas of weakness, enabling more effective fine-tuning.
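One way to implement the merge is sketched below, under two assumptions: duplicates are dropped by question payload, and the curriculum is serialized as the input/output JSONL records that NeMo’s fine-tuning script consumes by default (verify the field names against your NeMo version).

import json

def build_curriculum(existing, new_examples, path="curriculum_train.jsonl"):
    # Merge, skipping questions already present in the curriculum.
    seen = {json.dumps(ex["question"], sort_keys=True) for ex in existing}
    curriculum = list(existing)
    for ex in new_examples:
        key = json.dumps(ex["question"], sort_keys=True)
        if key not in seen:
            curriculum.append(ex)
            seen.add(key)
    # Assumption: NeMo's default prompt template expects "input"/"output" keys.
    with open(path, "w") as f:
        for ex in curriculum:
            record = {"input": TASK_PROMPT % json.dumps(ex["question"]),
                      "output": json.dumps(ex["answer"])}
            f.write(json.dumps(record) + "\n")
    return curriculum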

5. Fine-tuning

The student LLM is fine-tuned on the updated curriculum using the NVIDIA NeMo Framework, specifically the megatron_gpt_finetuning.py script available in the NeMo Framework Docker container. 

To simplify the process, the fine-tuning workflow is encapsulated in the PEFTFineTuning class, which enables seamless integration and execution within Python. Below is an example of how fine-tuning is initiated.

import subprocess
import pathlib
import os
import shutil


def initialize_directory(directory, clean=True):
   if os.path.exists(directory) and clean:
       shutil.rmtree(directory)
   os.makedirs(directory, exist_ok=True)

class PEFTFineTuning:

   # Path to the PEFT fine-tuning script inside the NeMo Framework container
   MEGATRON_GPT_FINETUNING_SCRIPT = \
       "/opt/NeMo/examples/nlp/language_modeling/tuning/megatron_gpt_finetuning.py"

   def __init__(self, scheme, dataset,
       model,
       adapter_name=None,
       output_dir=None,
       torchrun_nproc_per_node=1,
       devices=1, num_nodes=1,
       megatron_amp_O2=True, mcore_gpt=True,
       tensor_size=1,
       pipeline_size=1,
       micro_batch_size=1,
       global_batch_size=16,
       ds_num_workers=0,
       train_sampling_probs=[1.0],
       adapter_restore_path=None,
       lr=1e-4,
       adapter_dim=32):

       self.nproc_per_node = torchrun_nproc_per_node

       self.megatron_gpt_params = {
           "trainer.devices": devices,
           "trainer.num_nodes": num_nodes,
           "model.megatron_amp_O2": megatron_amp_O2,
           "++model.mcore_gpt": mcore_gpt,
           "model.tensor_model_parallel_size": tensor_size,
           "model.pipeline_model_parallel_size": pipeline_size,
           "model.micro_batch_size": micro_batch_size,
           "model.global_batch_size": global_batch_size,
           "model.data.train_ds.num_workers": ds_num_workers,
           "model.data.train_ds.concat_sampling_probabilities": train_sampling_probs,
           "model.data.validation_ds.num_workers": ds_num_workers,
           "model.peft.peft_scheme": scheme,
           "model.optim.lr": lr,
           "model.peft.lora_tuning.adapter_dim": adapter_dim
       }

       if adapter_restore_path is not None:
           self.megatron_gpt_params["model.peft.restore_from_path"] = \
               adapter_restore_path


       self.model = model
       self.dataset = dataset
       self._adapter_name = adapter_name
       if self._adapter_name is None:
           self._adapter_name = "%s_%s" % (scheme, dataset.name)

       self.output_dir = output_dir
       if self.output_dir is None:

           self.output_dir = "%s/%s" % (self.model.model_dir,
                                        self._adapter_name)
  
   @property
   def adapter_name(self):

       return self._adapter_name

   def _get_peft_cmd(self):

       cmd = ["torchrun"]
       cmd.append("--nproc_per_node=%s" % (self.nproc_per_node))
       cmd.append(PEFTFineTuning.MEGATRON_GPT_FINETUNING_SCRIPT)
      
       for key, value in self.megatron_gpt_params.items():
           cmd.append("%s=%s" % (key, value))

       return cmd

   def finetune(self, clean=True,
                val_check_interval=20, max_steps=8000):
      
       initialize_directory(self.output_dir, clean)   

       cmd = self._get_peft_cmd()

       cmd += [
           "exp_manager.exp_dir=%s" % (self.output_dir),
           "exp_manager.explicit_log_dir=%s" % (self.output_dir),
           "trainer.precision=%s" % (self.model.precision),
           "trainer.val_check_interval=%s" % (val_check_interval),
           "trainer.max_steps=%s" % (max_steps),
           "model.restore_from_path=%s" % (self.model.model_path),
           "model.data.train_ds.file_names=%s" % (self.dataset.train_ds),
           "model.data.validation_ds.file_names=%s" % (self.dataset.val_ds),
       ]

       subprocess.call(cmd)

   def get_nim_adapter_path(self, base_dir=ncodepro.NIM_STORE):
       # ncodepro is an internal module; NIM_STORE is the directory from
       # which NIM loads LoRA adapters.
       nim_store_dir = "%s/%s" % (base_dir, self._adapter_name)
       nemo_model_path = "%s/%s.nemo" % (nim_store_dir, self._adapter_name)
       return nemo_model_path
  
   def save(self, base_dir=ncodepro.NIM_STORE, clean=True):

       nim_store_dir = "%s/%s" % (base_dir, self._adapter_name)
       nemo_model_path = "%s/%s.nemo" % (nim_store_dir, self._adapter_name)

       initialize_directory(nim_store_dir, clean)

       # Copy the LoRA adapter checkpoint produced by the NeMo script
       # into the NIM adapter store.
       peft_checkpoint = "%s/checkpoints/" \
           "megatron_gpt_peft_lora_tuning.nemo" % (self.output_dir)

       shutil.copyfile(peft_checkpoint, nemo_model_path)
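A hypothetical usage sketch follows. The model and dataset objects are SimpleNamespace stand-ins exposing the attributes PEFTFineTuning reads (model_path, model_dir, precision; name, train_ds, val_ds), and every path is illustrative.

from types import SimpleNamespace

# Illustrative stand-ins; in practice these come from your model and
# dataset management code.
model = SimpleNamespace(
    model_path="/models/llama3-8b-instruct.nemo",
    model_dir="/models/llama3-8b-instruct",
    precision="bf16",
)
dataset = SimpleNamespace(
    name="code_review_severity",
    train_ds="[curriculum_train.jsonl]",  # Hydra-style list override
    val_ds="[curriculum_val.jsonl]",
)

finetuner = PEFTFineTuning(scheme="lora", dataset=dataset, model=model,
                           devices=8, torchrun_nproc_per_node=8)
finetuner.finetune(max_steps=2000)
finetuner.save(base_dir="/models/nim_store")  # export the LoRA adapter as .nemo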

Real-world application in code review automation

Code reviews are essential for ensuring software quality and performance, and are traditionally performed by human reviewers. A typical code review process involves the following: 

  • The author submits a merge request (MR) containing code that implements a feature or a bug fix.
  • Human reviewers assess the MR, suggesting changes or approving the code. 
  • If changes are requested, the author revises and resubmits the MR, repeating the process until the code is accepted.
Overview of the code review process. The author submits an initial MR, which is reviewed by others. Feedback leads to updates, and the cycle repeats until the code is accepted.
Figure 2. Code review process

Recent advancements in generative AI have enabled the automation of the code review process as shown in Figure 3. A fine-tuned LLM evaluates MRs to identify bugs or issues, assigns severity levels to each, and provides explanations for its ratings. The process filters out low-severity issues below a user-defined threshold, enabling developers to focus on critical concerns such as security vulnerabilities.
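The threshold filter itself is straightforward; here is an illustrative sketch, with field names following the TASK_PROMPT output format shown earlier.

# Severity order matching the [ISSUE_TYPES] taxonomy above.
SEVERITY_RANK = {"trivial": 0, "minor": 1, "major": 2, "critical": 3}

def filter_issues(issues, threshold="major"):
    # Keep only issues at or above the user-defined severity threshold.
    return [issue for issue in issues
            if SEVERITY_RANK[issue["issue_type"]] >= SEVERITY_RANK[threshold]]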

Fine-tuned SLMs enhanced the following two key areas in automated code reviews at NVIDIA:

  • Severity rating: Improving the LLM accuracy in assigning severity levels.
  • Explanation generation: Enhancing the clarity and quality of reasoning provided by the LLM. 
Diagram showing automated code review process with LLMs. Code is analyzed, issues are identified and rated for severity, and feedback is provided for the authors to address.The process repeats until the MR is accepted.
Figure 3. Automating code reviews using LLMs

Performance evaluation: Accuracy and quality gains

We fine-tuned the Llama 3 8B Instruct model using our automated fine-tuning technique, resulting in llama3-8b+LoRA. We assessed its performance on the following two tasks: 

  • Severity rating prediction: Measuring the accuracy of severity predictions.
  • Severity explanation generation: Evaluating the quality of explanations for severity ratings. 

Severity rating prediction

Fine-tuning with GPT-4 as a teacher, leveraging knowledge distillation, significantly enhanced the severity rating prediction accuracy of the smaller model. As shown in Figure 4, the fine-tuned Llama 3 8B plus LoRA (highlighted in green) achieved an improvement of more than 18% compared to its baseline model (Llama 3 8B without fine-tuning).

Notably, the fine-tuned Llama 3 8B plus LoRA (llama3-8b+LoRA) also outperformed much larger models, such as Llama 3 70B (8x larger) and Nemotron 4 340B Instruct (40x larger). Despite its superior accuracy, the model maintained lower latency and reduced inference costs, demonstrating that this fine-tuning approach is both efficient and highly effective for optimizing smaller models.

Bar chart comparison of severity rating accuracy across models. The fine-tuned Llama 3 8B Instruct (llama3-8b+LoRA) with GPT-4 as a teacher outperformed its baseline by 18% and surpassed larger models, demonstrating better performance with reduced latency and resource usage.
Figure 4. Severity rating accuracy achieved with different LLMs. The LoRA fine-tuned Llama 3 8B Instruct model (llama3-8b+LoRA) outperformed its baseline by 18% and surpassed larger models

Severity explanation quality

To evaluate explanation quality, the teacher LLM (GPT-4) was used as a judge. The teacher compared explanations from the fine-tuned model to those produced by other models. Figure 5 illustrates the preference differential, which measures how often GPT-4 preferred the fine-tuned model’s explanation over another model’s. A positive preference differential indicates the fine-tuned model outperformed the other model, while a negative value suggests the opposite. 
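The post does not give the exact formula, but one plausible formulation is the judge’s wins for the fine-tuned model minus its wins for the comparison model, as a fraction of all judgments, with ties counting for neither side.

def preference_differential(judgments):
    # judgments: per-example verdicts from the GPT-4 judge, each one of
    # "finetuned", "other", or "tie".
    wins = judgments.count("finetuned")
    losses = judgments.count("other")
    return (wins - losses) / len(judgments)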

The results demonstrate that the LoRA fine-tuned Llama 3 8B model (llama3-8b+LoRA) consistently outperformed Llama 3 70B, Nemotron 4 340B, and even its own baseline (Llama 3 8B). For all comparisons, the fine-tuned model was either preferred or performed equally well, showcasing its strong alignment with expert-level standards for explanation quality. 

Bar chart comparing the explanation quality of the fine-tuned Llama 3 8B (llama3-8b+LoRA) model against other models. Green bars indicate GPT-4 preference for the LoRA fine-tuned (llama3-8b+LoRA) model, while gray bars indicate preference for the other model. The fine-tuned model consistently matches or outperforms other models.
Figure 5. Comparison of severity explanation quality. The preference differential highlights how often GPT-4 preferred the explanations from the LoRA fine-tuned model (llama3-8b+LoRA)

Benefits of fine-tuned SLMs: Efficiency and performance gains

The application of fine-tuned SLMs to code review automation demonstrates two primary advantages:

  • Cost-effective fine-tuning: Using fine-tuned SLMs for code review tasks reduces costs and latency. This makes it an ideal approach for enterprise workflows that need to balance budget constraints with performance requirements.
  • Improved accuracy and alignment: Using fine-tuned SLMs significantly enhances task-specific performance. By improving severity ratings and aligning explanations with expert-level standards, fine-tuned SLMs deliver reliable evaluations that help development teams focus on critical code issues.

Lessons learned from scaling AI with SLMs

The development of fine-tuned SLMs using an automated approach has provided valuable insights into creating cost-efficient and scalable AI solutions tailored for enterprise applications. Key lessons include:

  • Start with targeted fine-tuning: Focus on smaller models to achieve an optimal balance between performance and resource utilization. This enables enterprises to evaluate trade-offs effectively before scaling up.
  • Leverage parameter-efficient fine-tuning (PEFT) and knowledge distillation: Combining PEFT methods like LoRA with knowledge distillation ensures high performance while minimizing computational overhead, making them ideal for resource-limited environments.

By using fine-tuned smaller LLMs, enterprises can address the challenges of high costs and slow performance often associated with large models. This strategy enables businesses to achieve competitive accuracy while keeping AI solutions cost-effective, responsive, and tailored to specific needs. Although this post highlights applications in code assistance, this methodology is highly versatile and applicable across a wide range of enterprise use cases.

Begin fine-tuning models for your AI applications

Discover how NVIDIA generative AI technologies can help you fine-tune and deploy models for your specific needs. If you’re just getting started, check out Building Your First LLM Agent Application and Build Your First Human-in-the-Loop AI Agent with NVIDIA NIM to gain practical experience with NVIDIA tools and methodologies for developing and deploying NVIDIA NIM LLM microservices.

Acknowledgments

We extend our heartfelt gratitude to Rushang Karia, Agustin Rivera, Mark Philipp, Abhinav Kumar, Anbang Xu, Rama Akkiraju, Ashwin Poojary, Ahmad Daoud, and Ashwin Jha for their invaluable contributions and unwavering support. Their expertise and dedication were instrumental in bringing this work to fruition.
