This week, we highlight how Turing delivered 70,000+ structured table reasoning Q&A pairs sourced from real-world PDFs across seven domains, enforcing strict zero-inference standards and reasoning-type integrity at scale. We also share our partnership with Math Kangaroo USA to power AI-driven math learning, along with new research on large-scale audio editing benchmarks, encoder-free multimodal models, and diffusion-based reasoning architectures.
What we're doing
This week, we're highlighting how Turing delivered a table understanding dataset for AI training, sourced from real-world PDFs across seven document domains. Unlike standard table QA benchmarks that rely on simple lookups, this dataset trains models to perform the full range of table reasoning, including exact value retrieval, multi-row filtering, multi-cell calculations, and cross-table synthesis.
Here's what we delivered:
- 70,000+ structured table reasoning Q&A pairs, spanning financial reports, government files, benefit plans, academic papers, and more
- Zero external inference enforced throughout: all answers are sourced strictly from table data, with no calculation steps, assumptions, or outside knowledge permitted
- Precise numerical extraction that preserves exact values, superscripts, subscripts, and formatting from source documents
- 95%+ overall pass rate across all delivered tasks, backed by a two-layer QA system combining expert pre-submission checklists with independent human review
💡 By maintaining strict reasoning-type integrity and zero-inference standards at scale, this dataset gives models the supervision signal they need to reason accurately over structured data.
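To make the four reasoning types named above concrete, here is a minimal Python sketch over two toy tables. The tables, column names, and helper functions are all hypothetical and invented for illustration; they are not drawn from the delivered dataset or its tooling.

```python
# Toy tables: every value here is invented for illustration only.
revenue = [
    {"region": "North", "year": 2023, "revenue": 120},
    {"region": "South", "year": 2023, "revenue": 95},
    {"region": "North", "year": 2024, "revenue": 150},
    {"region": "South", "year": 2024, "revenue": 80},
]
headcount = [
    {"region": "North", "year": 2024, "employees": 30},
    {"region": "South", "year": 2024, "employees": 20},
]


def exact_value(rows, region, year, column):
    """Exact value retrieval: return a single cell, unchanged."""
    for row in rows:
        if row["region"] == region and row["year"] == year:
            return row[column]
    raise KeyError((region, year))


def filter_rows(rows, column, threshold):
    """Multi-row filtering: keep every row whose column exceeds a threshold."""
    return [row for row in rows if row[column] > threshold]


def total(rows, column, year):
    """Multi-cell calculation: sum one column across all rows for a year."""
    return sum(row[column] for row in rows if row["year"] == year)


def revenue_per_employee(revenue_rows, headcount_rows, region, year):
    """Cross-table synthesis: combine cells from two different tables."""
    rev = exact_value(revenue_rows, region, year, "revenue")
    staff = exact_value(headcount_rows, region, year, "employees")
    return rev / staff


if __name__ == "__main__":
    print(exact_value(revenue, "North", 2024, "revenue"))
    print([r["region"] for r in filter_rows(revenue, "revenue", 100)])
    print(total(revenue, "revenue", 2024))
    print(revenue_per_employee(revenue, headcount, "North", 2024))
```

Each function maps to one question category, which is one way to picture what keeping a Q&A pair within its labeled reasoning type could mean: a retrieval question should be answerable by `exact_value` alone, without drifting into filtering or arithmetic.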
What we're celebrating
🎉 Turing × Math Kangaroo USA — AI-Powered Math Learning
We're proud to share that Turing partnered with Math Kangaroo USA to build the Kangaroo AI Tool, an AI-powered learning experience designed to help students strengthen problem-solving skills through interactive, reasoning-first guidance.
Built with Google Gemini, the tool bridges the gap between classroom learning and the advanced logic required for competitive mathematics, encouraging critical thinking and curiosity at every step.
What we're reading
- MMAE: A Massive Multitask Audio Editing Benchmark
This paper introduces MMAE, the first comprehensive benchmark for instruction-based audio editing, covering 2,000 real-world samples and 17,741 rubric-based evaluation criteria across sound, speech, music, and mixed-modality audio. The benchmark spans 7 audio modalities, 6 levels of task complexity, and 8 editing operation types, ranging from simple edits to multi-hop reasoning and multi-round editing.
Using a rubric-based evaluation framework that separately measures Instruction Following (IFR) and Consistency (CR), MMAE reveals that current audio editing systems remain far from reliable. Across leading models, Exact Match Rate (EMR) stays below 5%, dropping to 0% on complex mixed-modality tasks, highlighting major challenges in precise editing and context preservation.
- Introducing Gemma 4 12B: A Unified, Encoder-Free Multimodal Model
Google introduced Gemma 4 12B, a multimodal open model designed to bring agentic reasoning, vision, and audio capabilities directly to laptops. Unlike traditional multimodal architectures, Gemma 4 12B uses a novel encoder-free design, processing images and audio directly through the LLM backbone without separate vision or audio encoders, reducing latency and memory usage.
Despite requiring only 16GB of RAM/VRAM, Gemma 4 12B achieves performance approaching Google’s larger 26B MoE model, while adding native audio support, multimodal reasoning, and multi-step agent workflows. The model also includes Multi-Token Prediction (MTP) drafters to improve inference speed and responsiveness.
- Introducing Mercury 2
Inception introduced Mercury 2, a reasoning model built on a diffusion architecture rather than traditional autoregressive decoding. By generating and refining multiple tokens in parallel, Mercury 2 achieves 1,009 tokens/sec, delivering over 5× faster generation while maintaining reasoning-level quality for coding, agents, search, and voice applications.
The model supports 128K context, native tool use, tunable reasoning, and structured JSON outputs, while remaining cost-efficient at $0.25 per million input tokens and $0.75 per million output tokens. Its key advantage is enabling complex agentic and retrieval workflows without the latency penalties typically associated with higher test-time compute.
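For a rough sense of what the Mercury 2 figures above mean in practice, here is a back-of-the-envelope Python sketch. It uses only the quoted prices and throughput; the function names and the example token counts are hypothetical, and the time estimate assumes the headline 1,009 tokens/sec holds for the whole response, ignoring network latency and time to first token.

```python
# Figures quoted in the announcement summary above.
INPUT_PRICE_PER_MILLION_USD = 0.25
OUTPUT_PRICE_PER_MILLION_USD = 0.75
TOKENS_PER_SECOND = 1009  # headline throughput; real workloads may differ


def estimate_cost_usd(input_tokens, output_tokens):
    """Cost of one request at the quoted per-million-token prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION_USD
        + output_tokens * OUTPUT_PRICE_PER_MILLION_USD
    ) / 1_000_000


def estimate_generation_seconds(output_tokens):
    """Idealized generation time, assuming constant headline throughput."""
    return output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # Hypothetical request: 100,000 input tokens, 2,018 output tokens.
    print(f"cost: ${estimate_cost_usd(100_000, 2_018):.4f}")
    print(f"time: {estimate_generation_seconds(2_018):.1f} s")
```

Treat the output as an order-of-magnitude estimate rather than a billing or latency guarantee.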
Where we’ll be
🔹 ICML 2026 — International Conference on Machine Learning
📍 Seoul, South Korea | 🗓️ July 6-11
ICML is one of the world’s leading machine learning conferences, highlighting frontier research across AI, data science, and applied domains from vision to robotics.
Stay ahead with AGI Advance
Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.
Ready to Optimize Your Model for Real-World Needs?
Partner with Turing to fine-tune, validate, and deploy models that learn continuously.

