AGI Advance: Weekly AI & AGI Insights (Oct 14, 2025)

Turing Staff
15 Oct 2025 · 4 min read
LLM training and enhancement · AGI Advance Newsletter

This week, we’re diving into where models still struggle and how to fix it. Turing built a 20K-sample dataset designed to surface failures in chart reasoning, helping a top AI lab improve accuracy on scientific figures through expert-written chains of thought (CoTs). Our CEO, Jonathan Siddharth, breaks down what it takes to move from sweatshop data labeling to frontier research acceleration. We also explore recursive reasoning with tiny models, single-agent ML engineering systems, and Google’s new Gemini 2.5 UI agent that interacts with real apps, not just APIs.

What we're thinking

This week, we’re showcasing how Turing helped a leading AI lab pinpoint systemic failures in chart reasoning using a 20K-sample dataset of long-form expert-written CoTs. Designed to stress-test multimodal models on real scientific figures, this data is now powering fine-tuning, eval, and reward shaping.

Here’s what we’re seeing:

  • 20,000+ adversarial CoTs: Multi-step reasoning tasks across 6+ scientific domains, built to break models on trend analysis, subplots, and legends.
  • 98%+ factual accuracy: Every CoT is grounded in visual features alone (no external context) and fully verified by expert annotators; a hypothetical sample layout follows this list.
  • ~7–8 pt projected accuracy lift: Improvement expected in trend comparison, subplot alignment, and figure-level reasoning.
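
For illustration, here is a hypothetical sketch of what a single sample in a chart-reasoning CoT dataset might look like. The field names and values below are our own assumptions for the example, not Turing’s actual schema.

```python
# Hypothetical shape of one chart-reasoning CoT sample (illustrative only).
sample = {
    "figure_id": "fig_000123",              # reference to the source scientific figure
    "domain": "materials_science",          # one of the 6+ scientific domains
    "question": "Which subplot shows the steepest decline after 400 K?",
    "chain_of_thought": [
        "Subplot (b) plots conductivity against temperature on a log scale.",
        "Between 400 K and 600 K, the curve in (b) drops by roughly an order of magnitude.",
        "Subplots (a) and (c) stay nearly flat over the same range.",
    ],
    "answer": "Subplot (b)",
    "grounding": "visual_features_only",    # no external context allowed
    "verified_by_expert": True,             # human annotator sign-off
}
```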

From surface-level captions to grounded reasoning, this is what visual understanding should look like.

What we’re saying

🗣️ Jonathan Siddharth, Founder & CEO:

“The era of sweatshop data labeling is over, and $30 trillion of human work is on the verge of automation.”

What frontier labs need now is frontier data: expert-level, hard, realistic, and diverse enough to push models beyond their limits. That’s where Turing comes in, not just as a data vendor but as a research accelerator, working with 7 of the 8 leading labs to advance the four pillars of superintelligence: multimodality, reasoning, tool use, and coding.

Watch the full conversation

What we're reading

  • Less is More: Recursive Reasoning with Tiny Networks
    This paper introduces the Tiny Recursive Model (TRM), a minimalist approach to hard reasoning tasks like Sudoku and ARC-AGI that uses just 7M parameters yet outperforms much larger LLMs. TRM uses a single 2-layer network and simple recursion to refine its answers step by step, achieving 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2 and beating models like DeepSeek R1 and Gemini 2.5 Pro. It avoids the complexity of prior approaches like HRM by eliminating fixed-point assumptions, dropping self-attention on some tasks, and halving memory costs, showing that recursion can outperform scale on structured reasoning tasks (see the toy sketch after this list).
  • Operand Quant: A Single-Agent Architecture for Autonomous Machine Learning Engineering
    Operand Research introduces Operand Quant, a single-agent system that outperforms multi-agent setups on the MLE-Benchmark by maintaining contextual continuity and eliminating orchestration overhead. Instead of relying on agent hierarchies, Operand Quant runs all ML engineering phases, including exploration, modeling, evaluation, and deployment, within a simulated IDE. The system achieved the highest recorded benchmark performance (39.6% medal rate across 75 tasks) and introduced “deep-thinking” ensemble reasoning to overcome context bias. Its results suggest that autonomous, non-blocking single-agent workflows may be a more scalable path to AI-driven ML engineering.
  • Introducing the Gemini 2.5 Computer Use Model
    Google just released Gemini 2.5 Computer Use, a preview model that lets agents interact directly with user interfaces (filling forms, scrolling, dragging, and more) without relying on APIs. The model runs in a loop: it sees a screenshot + action history → predicts a UI action → executes → gets an updated screenshot → repeats (a stubbed version of this loop appears after this list). On Google’s reported benchmarks it outperforms alternatives on web and mobile control tasks while achieving lower latency. The model also ships with built-in safety guardrails (per-step safety checks, user confirmation for risky actions, and developer-defined action exclusions) to mitigate misuse.
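
For intuition on the first paper, here is a toy sketch of the recursive-refinement idea behind TRM: one tiny network applied repeatedly to improve its own answer. The architecture, dimensions, and update rule below are illustrative assumptions, not the paper’s exact setup.

```python
import torch
import torch.nn as nn

class TinyRecursiveRefiner(nn.Module):
    """Toy recursive refiner: a single small MLP reused at every step."""
    def __init__(self, dim: int = 64, steps: int = 6):
        super().__init__()
        self.steps = steps
        # One 2-layer MLP is shared across all refinement steps.
        self.refine = nn.Sequential(
            nn.Linear(3 * dim, dim),
            nn.ReLU(),
            nn.Linear(dim, dim),
        )
        self.to_answer = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.zeros_like(x)  # current answer guess
        z = torch.zeros_like(x)  # latent "scratchpad" state
        for _ in range(self.steps):
            z = self.refine(torch.cat([x, y, z], dim=-1))  # update reasoning state
            y = y + self.to_answer(z)                      # refine the answer
        return y

model = TinyRecursiveRefiner()
out = model(torch.randn(8, 64))  # batch of 8 toy puzzle embeddings
print(out.shape)                 # torch.Size([8, 64])
```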
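
And here is a minimal, stubbed sketch of the observe-and-act loop that Gemini 2.5 Computer Use runs in. Every name below (the dataclasses and stub functions) is a placeholder we invented to show the loop structure; the real integration goes through the Computer Use tool in the Gemini API.

```python
from dataclasses import dataclass, field

@dataclass
class UIAction:
    kind: str            # e.g. "click", "type", "scroll", "done"
    target: str = ""     # element description or coordinates
    risky: bool = False  # should the user confirm before execution?

@dataclass
class AgentState:
    history: list = field(default_factory=list)  # prior (action, observation) pairs

def capture_screenshot() -> bytes:
    return b"<png bytes>"  # stub: grab the current screen

def propose_action(goal: str, screenshot: bytes, state: AgentState) -> UIAction:
    # Stub for the model call: the real system sends the screenshot plus
    # action history to the model and gets back the next UI action.
    return UIAction(kind="done")

def execute(action: UIAction) -> str:
    return f"executed {action.kind} on {action.target or 'screen'}"  # stub executor

def run_ui_agent(goal: str, max_steps: int = 20) -> None:
    state = AgentState()
    for _ in range(max_steps):
        shot = capture_screenshot()                 # observe
        action = propose_action(goal, shot, state)  # predict the next UI action
        if action.kind == "done":
            break
        if action.risky and input(f"Allow '{action.kind}'? [y/N] ").lower() != "y":
            break                                   # stepwise safety confirmation
        state.history.append((action, execute(action)))  # act, then feed back
    print(f"Finished after {len(state.history)} actions.")

run_ui_agent("Fill out the signup form")
```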

Where we’ll be

Turing will be at NeurIPS 2025, which runs across two venues this year. Join us to discuss the future of AGI:

  • NeurIPS 2025
    [Mexico City | Nov 30 – Dec 5]
    [San Diego Convention Center | Dec 2 – 7]

    The Neural Information Processing Systems Foundation is a non-profit that promotes research in AI and ML by organizing a leading annual conference focused on ethical, diverse, and interdisciplinary collaboration.

If you’re attending, reach out—we’d love to connect and exchange insights!

Stay ahead with AGI Advance

Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.

[Subscribe & Read More]

Ready to Optimize Your Model for Real-World Needs?

Partner with Turing to fine-tune, validate, and deploy models that learn continuously.

Optimize Continuously