Progress in frontier AI is often framed as a scaling problem. But as models begin to tackle advanced knowledge work, the bottleneck is shifting from data volume to access to expert talent. This week, we introduce Turing Frontier, a platform giving AI labs direct access to rigorously vetted U.S.-based experts across engineering, science, and enterprise domains.
We also cover new research on benchmarking multimodal agents in real computer environments, optimizing model harnesses end-to-end, and improving long-context reasoning efficiency.
What we're doing
This week, we launched Turing Frontier, a platform giving AI labs direct access to rigorously vetted U.S.-based experts across software engineering, STEM, and enterprise domains including finance, legal, medicine, energy, and manufacturing.
What Turing Frontier enables:
- Identity-verified, elite domain experts across critical industries
- Expert-designed prompts, RL environments, and evaluation frameworks
- High-signal data generation and validation grounded in real workflows
💡 Frontier models improve when they are shaped by professionals who actually perform the work and can build the evaluation systems around it.
What we're reading
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
This paper introduces OSWorld, the first scalable benchmark built on a real, interactive computer environment for evaluating multimodal agents on OS-level tasks. It includes 369 real-world tasks spanning web, desktop apps, file systems, and multi-app workflows, with execution-based evaluation for reliable measurement.
- Results show a large capability gap: while humans achieve ~72% success, the best models reach only ~12%, struggling with GUI grounding, action precision, and multi-step workflows.
- Meta-Harness: End-to-End Optimization of Model Harnesses
This paper introduces Meta-Harness, a framework that automatically optimizes the model harness: the code controlling context, memory, and tool use around an LLM. Instead of manual tuning, it uses an agent to iteratively propose and evaluate new harnesses, leveraging the full execution history (code, traces, scores) stored in a filesystem for long-horizon optimization.
- Across tasks, Meta-Harness delivers strong gains: +7.7% on text classification with 4× fewer tokens, +4.7% on IMO-level math reasoning, and state-of-the-art performance on TerminalBench-2, surpassing hand-engineered agent systems.
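The propose-evaluate-remember loop described above can be sketched in a few lines. This is a minimal illustration of the idea, not the paper's implementation; the function names, the `history` representation, and the toy `propose`/`evaluate` callables are all assumptions for the example.

```python
def optimize_harness(propose, evaluate, n_iters=20):
    """Illustrative meta-optimization loop (not the paper's code):
    an agent repeatedly proposes a new harness conditioned on the
    history of past attempts and their scores, keeping the best."""
    history = []                          # full execution history: (harness, score)
    best, best_score = None, float("-inf")
    for _ in range(n_iters):
        harness = propose(history)        # agent drafts a new harness from history
        score = evaluate(harness)         # run it against the task suite
        history.append((harness, score))  # persist for long-horizon search
        if score > best_score:
            best, best_score = harness, score
    return best, best_score


# Toy usage: a "harness" is just an integer, and the evaluator
# peaks at 7 — the loop recovers it from score feedback alone.
best, score = optimize_harness(
    propose=lambda hist: len(hist),
    evaluate=lambda h: -(h - 7) ** 2,
)
```

In the paper, the proposer is itself an LLM agent and the history lives on a filesystem, but the control flow is this same hill-climbing search over harness candidates.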
- TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
This paper introduces TriAttention, a KV cache compression method that improves long-context reasoning efficiency by leveraging a new observation: pre-RoPE query/key vectors are highly concentrated, enabling attention to be approximated as a trigonometric function of token distance.
- Using this, TriAttention scores and retains only important tokens, combining distance-based (trigonometric) scoring with norm signals. On AIME25, it matches full-attention accuracy while achieving 2.5× higher throughput and 10.7× lower KV memory, significantly outperforming prior compression methods.
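To make the scoring idea concrete, here is a toy sketch in the spirit of the method: score each cached token by a trigonometric function of its distance combined with a norm signal, then keep only the top scorers. The specific formula, the `period` and `alpha` parameters, and the function names are illustrative assumptions, not TriAttention's actual equations.

```python
import math

def trig_scores(distances, key_norms, period=64.0, alpha=0.5):
    """Toy token scoring: blend a trigonometric function of token
    distance (a proxy for approximated attention) with key-norm
    magnitude. Formula and parameters are illustrative only."""
    scores = []
    for d, n in zip(distances, key_norms):
        trig = math.cos(2 * math.pi * d / period)  # distance-based component
        scores.append(alpha * trig + (1 - alpha) * n)
    return scores

def compress_kv(kv_cache, scores, keep=4):
    """Retain only the `keep` highest-scoring KV entries,
    preserving the original token order."""
    ranked = sorted(range(len(kv_cache)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [kv_cache[i] for i in kept]
```

The appeal of this shape is that the score needs no attention computation at decode time: distance and key norms are cheap to read, so the cache can be pruned before the expensive attention step.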
Where we’ll be
ICLR - The International Conference on Learning Representations
🔹 LLM Researchers Happy Hour During ICLR - April 23
📍 Rio de Janeiro, Brazil | 🗓️ April 23 - 27
📌 Booth #301
ICLR focuses on cutting-edge research in deep learning, highlighting advancements in representation learning, optimization, and AI theory.
AI Dev 26 - The AI Developers Conference
🔹 LLM Researchers Happy Hour During AI Dev - April 28
📍 San Francisco, California | 🗓️ April 28 - 29
AI Dev brings together developers for hands-on AI workshops, expert talks, startup showcases, and live demos focused on real-world AI systems.
Stay ahead with AGI Advance
Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.
Ready to Optimize Your Model for Real-World Needs?
Partner with Turing to fine-tune, validate, and deploy models that learn continuously.