This week, we’re diving into where models still struggle and how to close those gaps. Turing built a 20K-sample dataset designed to surface failures in chart reasoning, helping a top AI lab improve accuracy on scientific figures through expert-written chains of thought (CoTs). Our CEO, Jonathan Siddharth, breaks down what it takes to move from sweatshop data labeling to frontier research acceleration. We also explore recursive reasoning with tiny models, single-agent ML engineering systems, and Google’s new Gemini 2.5 UI agent that interacts with real apps, not just APIs.
Turing helped a leading AI lab pinpoint systemic failures in chart reasoning using a 20K-sample dataset of long-form, expert-written CoTs. Designed to stress-test multimodal models on real scientific figures, the dataset now powers fine-tuning, evaluation, and reward shaping.
Here’s what we’re seeing:
Moving models from surface-level captions to grounded reasoning is what real visual understanding should look like.
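For readers curious what a single sample in a dataset like this might look like, here is a minimal sketch in Python. The field names (figure_path, question, expert_cot, final_answer) and the formatting helper are illustrative assumptions, not Turing's actual schema or pipeline.

```python
# Minimal sketch of one chart-reasoning sample.
# Field names and structure are illustrative assumptions, not Turing's schema.
sample = {
    "figure_path": "figures/plasma_density_vs_time.png",  # hypothetical scientific figure
    "question": "Between which two time points does plasma density drop fastest?",
    "expert_cot": (
        "Step 1: Identify the y-axis (density, 10^19 m^-3) and x-axis (time, ms). "
        "Step 2: Compare the slope between consecutive markers. "
        "Step 3: The steepest negative slope occurs between t=40 ms and t=50 ms."
    ),
    "final_answer": "Between 40 ms and 50 ms.",
}

def to_finetune_example(record: dict) -> dict:
    """Flatten a record into a prompt/target pair for supervised fine-tuning.
    The target keeps the long-form CoT so the model learns grounded reasoning,
    not just the short answer."""
    prompt = f"<image:{record['figure_path']}>\nQuestion: {record['question']}"
    target = f"{record['expert_cot']}\nAnswer: {record['final_answer']}"
    return {"prompt": prompt, "target": target}

if __name__ == "__main__":
    print(to_finetune_example(sample))
```

The same record structure could back evaluation (score the final answer only) or reward shaping (score intermediate reasoning steps), though how the lab actually uses the data is not specified here.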
🗣️ Jonathan Siddharth, Founder & CEO:
“The era of sweatshop data labeling is over, and $30 trillion of human work is on the verge of automation.”
What frontier labs need now is frontier data: expert-level, hard, realistic, and diverse enough to push models beyond their limits. That’s where Turing comes in, not just as a data vendor but as a research accelerator, working with 7 of the 8 leading labs to advance the four pillars of superintelligence: multimodality, reasoning, tool use, and coding.
Turing will be at two major AI conferences in the coming months—join us to discuss the future of AGI:
If you’re attending, reach out—we’d love to connect and exchange insights!
Turing is leading the charge in bridging AI research and real-world applications. Subscribe to AGI Advance for weekly insights into the breakthroughs, research, and industry shifts that matter.
Partner with Turing to fine-tune, validate, and deploy models that learn continuously.