AGI Advance: Weekly AI & AGI Insights (May 27, 2025)

Turing Staff
28 May 2025 · 3 min read
LLM training and enhancement

This week in AGI Advance, we explore how sampling quality, repo diversity, and feedback-driven fine-tuning are reshaping agent performance. From the 9.6% resolve rate our RFT experiments reached on SWE-bench Verified to AlphaEvolve’s algorithmic breakthroughs, the signal is clear: intelligent data selection and human-grounded evaluation are outperforming brute-force scale.

What we're thinking

We’ve been diving into how Rejection Sampling Fine-Tuning (RFT) can boost LLM performance on real-world software engineering benchmarks like SWE-bench, without the cost of full RL.

Here’s what stood out from our internal experiments:

  • SFT alone isn’t enough: Models trained with basic supervised fine-tuning struggled to pass eval tests, even when trained on thousands of PRs.
  • Filtering is critical: Success rates improved dramatically when low-quality or failed samples were filtered out during data generation.
  • More repos, better generalization: Performance scaled roughly logarithmically with the number of training repos, confirming that training breadth matters as much as depth.
  • Resolve rate ≠ loss: Loss curves didn’t predict success. This reinforces the need for evaluation grounded in task outcomes, not just token probabilities.

The result: Our best-performing model achieved a 9.6% resolve rate on SWE-bench Verified, a strong signal that smart sampling, filtering, and SFT can rival RL-based setups at a fraction of the cost. A simplified sketch of the data-curation loop follows.
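For a sense of the mechanics, here is a minimal Python sketch of that loop: sample several candidate patches per task, keep only the ones that pass the task’s evaluation tests, and fine-tune on the survivors. The helpers `sample_candidates` and `passes_eval_tests` are hypothetical stand-ins for a real model call and a real test harness, not our internal pipeline:

```python
# Minimal sketch of rejection-sampling data curation for SFT.
# sample_candidates and passes_eval_tests are hypothetical stubs,
# not a real model call or test harness.
import json
import random

def sample_candidates(task: dict, k: int = 8) -> list[str]:
    """Stub: draw k candidate patches from a base model."""
    return [f"patch-{i}-for-task-{task['id']}" for i in range(k)]

def passes_eval_tests(task: dict, patch: str) -> bool:
    """Stub: run the task's test suite against the patch."""
    return random.random() < 0.2  # pretend ~20% of samples resolve the task

def build_rft_dataset(tasks: list[dict], k: int = 8) -> list[dict]:
    """Keep only verified samples; this filtering step is where
    most of the quality gain comes from."""
    kept = []
    for task in tasks:
        for patch in sample_candidates(task, k):
            if passes_eval_tests(task, patch):
                kept.append({"prompt": task["prompt"], "completion": patch})
    return kept

if __name__ == "__main__":
    tasks = [{"id": i, "prompt": f"Fix bug #{i}"} for i in range(100)]
    with open("rft_sft_data.jsonl", "w") as f:
        for row in build_rft_dataset(tasks):
            f.write(json.dumps(row) + "\n")
```

The resulting JSONL then feeds a standard SFT run. As the bullets above suggest, the filtering, not the raw sample volume, is what drives most of the gain.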

What we're reading

  • AlphaEvolve: A Coding Agent for Scientific and Algorithmic Discovery
    AlphaEvolve is an agent that combines LLMs with automated evaluation to iteratively discover and optimize algorithms. It found a 4×4 complex-matrix multiplication scheme using 48 scalar multiplications, the first improvement over Strassen’s 1969 construction in 56 years, solved open math problems, and delivered real-world performance gains, like a 23% kernel speedup in Gemini training and a 0.7% uplift in data center efficiency.
  • SWE-smith: Scaling Data for Software Engineering Agents
    SWE-smith is a new data generation pipeline that creates over 50,000 validated bug-fixing tasks from 128 real Python repos, at a fraction of SWE-bench’s cost and setup time. By synthesizing bugs through LLM rewriting, AST mutation, PR reversal, and patch combinations (see the toy AST-mutation sketch after this list), it produces rich training instances for agent models like SWE-agent. Using rejection sampling fine-tuning on this dataset, SWE-agent-LM-32B achieved 40.2% Pass@1 on SWE-bench Verified, the best result for any open-weight model to date.
  • Vision as LoRA
    VoRA introduces a new approach to building MLLMs by integrating visual understanding directly into LLMs using LoRA layers, with no external vision encoder required. With techniques like block-wise distillation and bi-directional attention, VoRA achieves competitive performance at lower computational cost and could generalize beyond vision to other modalities. A minimal LoRA sketch also appears below.
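To make the AST-mutation idea concrete, here is a toy Python example that injects a bug by flipping comparison operators and unparsing the result. It is our own simplification for illustration, not SWE-smith’s actual implementation:

```python
# Toy bug injection via AST mutation: flip comparison operators to turn
# a working function into a plausible-but-broken variant (Python 3.9+).
import ast

FLIPS = {ast.Eq: ast.NotEq, ast.NotEq: ast.Eq,
         ast.Lt: ast.GtE, ast.GtE: ast.Lt,
         ast.Gt: ast.LtE, ast.LtE: ast.Gt}

class ComparisonFlipper(ast.NodeTransformer):
    """Replace each comparison operator with its negation."""
    def visit_Compare(self, node: ast.Compare) -> ast.Compare:
        self.generic_visit(node)
        node.ops = [FLIPS.get(type(op), type(op))() for op in node.ops]
        return node

source = """
def is_adult(age):
    return age >= 18
"""

mutated = ComparisonFlipper().visit(ast.parse(source))
# Prints the function with the injected bug: "return age < 18".
print(ast.unparse(mutated))
```

Pair the original function with its mutated variant, validate against the repo’s test suite, and you have the kind of verifiable bug-fixing instance such pipelines generate at scale.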
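And to ground the “vision as LoRA” idea, below is a minimal LoRA adapter over a frozen linear layer, using the standard LoRA formulation (a scaled low-rank update B·A added to frozen weights). VoRA’s block-wise distillation and bi-directional attention are beyond this sketch:

```python
# Minimal LoRA adapter: freeze the pretrained weights and train only a
# low-rank update, h = Wx + (alpha/rank) * B(Ax). Standard LoRA math;
# an illustrative sketch, not VoRA's implementation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A starts small and random, B starts at zero, so the adapter
        # initially leaves the base model's behavior unchanged.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
print(layer(torch.randn(2, 4096)).shape)  # torch.Size([2, 4096])
```

Because only A and B train, the trainable-parameter count is a small fraction of the base layer’s, which is what makes folding a new modality into the LLM this way comparatively cheap.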

Where we’ll be

Turing will be at two major AI conferences in the coming months—join us to discuss the future of AGI:

  • ICML 2025 [Vancouver Convention Center, Canada | July 13 – 19]
    The International Conference on Machine Learning (ICML) is one of the world’s leading conferences on advances in machine learning and its applications.
  • KDD 2025 [Toronto, ON, Canada | Aug 3 – 7]
    The ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) focuses on innovative research in data mining, knowledge discovery, and large-scale data analytics.

If you’re attending, reach out—we’d love to connect and exchange insights!

Stay ahead with AGI Advance

Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.

[Subscribe & Read More]

Want to accelerate your business with AI?

Talk to one of our solutions architects and start innovating with AI-powered talent.

Get Started