Graphite tested the latest AI models for code review and found a surprising winner. Newer, bigger models actually created more noise for developers.
#1 about 2 minutes
The challenge of reviewing exponentially growing AI-generated code
The rapid increase in AI-generated code creates a significant bottleneck in the software development lifecycle, particularly in the code review stage.
#2 about 3 minutes
How AI code generation strains the developer outer loop
While AI accelerates code writing (the inner loop), it overwhelms the critical outer loop processes of testing, reviewing, and deploying code.
#3 about 1 minute
Introducing Diamond, an AI agent for automated code review
Graphite's AI agent, Diamond, acts as an always-on senior engineer within GitHub to summarize, prioritize, and review every code change.
#4 about 3 minutes
Using comment acceptance rate to measure AI review quality
The primary metric for a successful AI reviewer is the acceptance rate of its comments, as every high-signal comment should result in a code change.
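As a rough illustration of the metric, the sketch below computes an acceptance rate as the share of AI review comments that were followed by a code change. The `ReviewComment` type and its `led_to_code_change` field are hypothetical names chosen for this example, not part of Graphite's tooling, and how a change is attributed to a comment is left out as an assumption.

```python
from dataclasses import dataclass


@dataclass
class ReviewComment:
    """Hypothetical record of one AI review comment."""
    comment_id: str
    led_to_code_change: bool  # assumption: attribution is decided upstream


def acceptance_rate(comments: list[ReviewComment]) -> float:
    """Fraction of comments that resulted in a code change (0.0 if none were posted)."""
    if not comments:
        return 0.0
    accepted = sum(1 for c in comments if c.led_to_code_change)
    return accepted / len(comments)


comments = [
    ReviewComment("c1", True),
    ReviewComment("c2", False),
    ReviewComment("c3", True),
    ReviewComment("c4", False),
]
print(acceptance_rate(comments))  # 2 of 4 comments accepted
```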
#5 about 1 minute
Why evaluations are the key lever for LLM performance
Unlike traditional machine learning, optimizing large language models relies heavily on a robust evaluation process rather than other levers like feature engineering.
#6 about 2 minutes
A methodology for evaluating AI code comprehension models
Models are evaluated against a large dataset of pull requests using two core metrics: matched comment rate for recall and unmatched comment rate for noise.
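One way to picture the two metrics is the minimal sketch below. It assumes each comment is identified by a `(file, line)` pair and that a model comment "matches" when a reference comment sits at the same location. The talk summary does not specify the real matching rule or the exact denominators, so the function name, the matching rule, and both formulas here are assumptions for illustration only.

```python
Location = tuple[str, int]  # assumption: a comment is keyed by (file path, line number)


def evaluate_comments(
    model_comments: set[Location],
    reference_comments: set[Location],
) -> dict[str, float]:
    """Score one model's comments against reference comments for the same pull requests."""
    matched = model_comments & reference_comments
    unmatched = model_comments - reference_comments

    # Recall proxy: share of reference comments the model also raised.
    matched_rate = len(matched) / len(reference_comments) if reference_comments else 0.0
    # Noise proxy: share of the model's comments with no reference counterpart.
    unmatched_rate = len(unmatched) / len(model_comments) if model_comments else 0.0

    return {"matched_comment_rate": matched_rate, "unmatched_comment_rate": unmatched_rate}


model = {("a.py", 10), ("a.py", 20), ("b.py", 5)}
reference = {("a.py", 10), ("b.py", 5), ("c.py", 1), ("d.py", 2)}
scores = evaluate_comments(model, reference)
print(scores)
```

Under these assumptions, a model that comments on everything can push the matched rate up while also inflating the unmatched rate, which is why the two numbers are read together.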
#7 about 3 minutes
A comparative analysis of GPT-4.0, Opus, and Gemini
A detailed comparison reveals that models like GPT-4.0 excel at precision while Gemini has the best recall, but no single model wins on all metrics.
#8 about 2 minutes
Evaluating Sonnet models and the problem of AI noise
Testing reveals that Sonnet 4.0 generates the most noise, making it less suitable for high-signal code review compared to its predecessors.
#9 about 2 minutes
Why Sonnet 3.7 offers the best balance for code review
Sonnet 3.7 is the chosen model because it provides the optimal blend of strong reasoning, high recall of important issues, and low generation of noisy comments.
#10 about 3 minutes
The critical role of continuous evaluation for new models
The key to leveraging AI effectively is to constantly re-evaluate new models, as shown by preliminary tests on Grok 4, which revealed significant performance gaps.