This week in AGI Advance, we explore how sampling quality, repo diversity, and feedback-driven fine-tuning are reshaping agent performance. From SWE-agent’s 9.6% resolve rate on SWE-Bench Verified to AlphaEvolve’s algorithmic breakthroughs, the signal is clear: intelligent data selection and human-grounded evaluation are outperforming brute-force scale.
We’ve been diving into how Rejection Sampling Fine-Tuning (RFT) can boost LLM performance on real-world software engineering benchmarks like SWE-Bench, without the cost of full RL.
Here’s what stood out from our internal experiments:
The result: our best-performing model achieved a 9.6% resolve rate on SWE-Bench Verified, a strong signal that smart sampling, filtering, and SFT can rival RL-based setups at a fraction of the cost.
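To make the sample, filter, then fine-tune loop concrete, here is a minimal sketch of the data-selection step in rejection sampling. It is an illustration under stated assumptions, not our actual pipeline: `Trajectory`, `rejection_sample`, `generate`, `passes_tests`, and both limits are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Trajectory:
    """One accepted sample: the task it solves and the candidate patch."""
    task_id: str
    patch: str


def rejection_sample(
    task_ids: List[str],
    generate: Callable[[str], str],            # hypothetical: model proposes a patch for a task
    passes_tests: Callable[[str, str], bool],  # hypothetical: verifier, e.g. the task's test suite
    samples_per_task: int = 4,                 # assumed sampling budget per task
    max_kept_per_task: int = 2,                # assumed cap so easy tasks do not dominate
) -> List[Trajectory]:
    """Sample candidates per task and keep only distinct ones the verifier accepts."""
    kept: List[Trajectory] = []
    for task_id in task_ids:
        accepted = set()  # patches already accepted for this task
        for _ in range(samples_per_task):
            if len(accepted) >= max_kept_per_task:
                break
            patch = generate(task_id)
            if patch in accepted:
                continue  # skip exact duplicates of an accepted patch
            if passes_tests(task_id, patch):
                accepted.add(patch)
                kept.append(Trajectory(task_id, patch))
    return kept
```

The returned list would then serve as the supervised fine-tuning set. Capping accepted samples per task is one simple way to keep the data from skewing toward problems the model already solves easily.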
Turing will be at two major AI conferences in the coming months—join us to discuss the future of AGI:
If you’re attending, reach out—we’d love to connect and exchange insights!
Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.