This week’s edition focuses on agent safety in the real world. Turing built a dataset of 24,000+ multi-turn conversations, capturing how AI agents make decisions across tool use, refusals, and final response, annotated step by step across 30+ safety dimensions. Additionally, Jonathan Siddharth speaks at Axios House in Davos about why enterprise is the proving ground for superintelligence, and we dig into new research on dynamic context discovery, tool orchestration, and execution-grounded data generation.
This week, we’re spotlighting how Turing helped a client build a dataset of 24,000+ multi-turn conversations to evaluate and improve agent safety in tool-rich environments. Unlike traditional datasets focused only on final responses, this effort supervised every step, from tool calls to confirmations, refusals, and rewrites across 30+ safety dimensions.
Here’s what we delivered:
💡 Real safety failures don’t just happen at the final output; they unfold across decisions, tools, and turns. This dataset captures them all.
🗣️ Jonathan Siddharth at Axios House, Davos
In a conversation with Axios Publisher Nicholas Johnston, Turing CEO Jonathan Siddharth made the case for why enterprise is the real proving ground for superintelligence.
“The models are capable of X, but we’re only extracting X minus delta of value. Closing that gap is where the next breakthroughs will come from.”
Jonathan shared why real-world deployment across banks, life sciences, and government is the key to uncovering model failure modes, surfacing missing enterprise knowledge, and building the systems where intelligence becomes infrastructure.
Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.
Partner with Turing to fine-tune, validate, and deploy models that learn continuously.