This week’s edition highlights how clean, synthetic trajectory data can power meaningful gains in model performance without resorting to larger models. We explore how Turing’s multilingual coding traces helped Qwen 2.5 jump from zero to 25 solved tasks on SWE-Bench, offering a clear case for fine-tuning with process-level supervision. Also in this issue: Anthropic uncovers the first autonomous AI-powered cyber attack, Google Research introduces a new learning paradigm to battle catastrophic forgetting, and Cognizant sets a new milestone for long-horizon LLM task execution.
This week, we explored how Turing's multilingual trajectory data measurably improves performance on software engineering benchmarks. In a series of fine-tuning experiments, our team validated that synthetic, pass-only agent traces generated from Turing's internal dataset help open-weight models climb the SWE-Bench leaderboard.
Here’s what we’re seeing:
Training better coding agents isn’t just about bigger models; it’s about cleaner, more human-like paths to solutions. With the right data mix and balanced scaffolding, even synthetic traces can push models toward SOTA performance.
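To make the "pass-only" idea concrete, here is a minimal sketch of how such a filtering and formatting step might look. It assumes a hypothetical JSONL file of agent rollouts with `tests_passed`, `issue_description`, `steps`, and `final_patch` fields; the file names and schema are illustrative, not Turing's actual pipeline.

```python
import json
from pathlib import Path

# Hypothetical file names and field layout, for illustration only.
RAW_TRAJECTORIES = Path("trajectories.jsonl")   # one agent rollout per line
SFT_OUTPUT = Path("sft_pass_only.jsonl")        # chat-formatted training examples

def to_chat_example(trajectory: dict) -> dict:
    """Flatten an agent rollout into a single chat-style SFT record."""
    messages = [{"role": "system", "content": "You are a software engineering agent."}]
    messages.append({"role": "user", "content": trajectory["issue_description"]})
    # Keep intermediate tool calls so the model learns the *process*,
    # not just the final patch (process-level supervision).
    for step in trajectory["steps"]:
        messages.append({"role": "assistant", "content": step["action"]})
        messages.append({"role": "user", "content": step["observation"]})
    messages.append({"role": "assistant", "content": trajectory["final_patch"]})
    return {"messages": messages}

def main() -> None:
    kept, dropped = 0, 0
    with RAW_TRAJECTORIES.open() as src, SFT_OUTPUT.open("w") as dst:
        for line in src:
            trajectory = json.loads(line)
            # "Pass-only" filter: keep a rollout only if its patch passed
            # the repository's test suite during generation.
            if not trajectory.get("tests_passed", False):
                dropped += 1
                continue
            dst.write(json.dumps(to_chat_example(trajectory)) + "\n")
            kept += 1
    print(f"kept {kept} passing trajectories, dropped {dropped}")

if __name__ == "__main__":
    main()
```

The resulting JSONL can then be fed to any standard supervised fine-tuning stack; the key design choice is discarding failed rollouts up front so the model only imitates trajectories that actually solved the task.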
🎉 ServiceNow AI Research × Turing × Mila: GroundCUA and GroundNext
This week, ServiceNow AI Research released GroundCUA, a high-quality dataset of over 3 million annotations spanning 80+ real-world desktop applications, designed to teach agents how humans actually use software like IDEs, spreadsheets, design tools, and browsers. Turing contributed frontier-grade data and feedback to help scale human-grounded demonstrations.
Turing will be at this major AI conference in the coming month—join us to discuss the future of AGI:
If you’re attending, reach out—we’d love to connect and exchange insights!
Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.
Partner with Turing to fine-tune, validate, and deploy models that learn continuously.