AGI Advance: Weekly AI & AGI Insights (Sept 23, 2025)

Turing Staff
24 Sep 2025 | 3 min read

Welcome to AGI Advance, Turing’s weekly briefing on AI breakthroughs, AGI research, and industry trends.

This week, we explore a foundational shift in how AI systems are trained: from passive learning on scraped text to active learning in simulated environments. RL gyms (domain-specific training environments built with real workflows, tools, and verifiers) are fast becoming essential infrastructure for enterprise AI. We also look at how reasoning can emerge through pure RL, why usage patterns show ChatGPT’s impact beyond work, and what batch-invariant inference unlocks for model reliability and reward learning.

What we're thinking

This week, we’re looking at how RL environments, once the domain of games and navigation agents, are becoming critical infrastructure for enterprise AI. At Turing, we call these RL gyms: curated environments where agents don’t just learn from data; they learn from experience.

Here’s what we’re seeing:

  • RL gyms simulate enterprise workflows: These are structured with prompts, tools, and verifiers. Think DCF modeling, performance marketing ops, or multi-modal navigation of complex UI layers.
  • Human feedback is built into the loop: As agents explore action paths, domain experts label, correct, or nudge the trajectory, producing rich evaluative data that improves with every cycle.
  • Agents train by playing against themselves: Much as AlphaGo discovered Move 37 through self-play, enterprise agents can now improve by testing strategies, self-verifying outputs, and iterating within closed environments designed for verifiability and reward.

As models begin learning real-world workflows, realism matters more than ever. With RL environments and expert training data, Turing is powering the next generation of AI co-workers.
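
For readers who want to see the shape of this loop in code, here is a minimal sketch in Python. It is an illustration under assumptions, not Turing’s actual implementation: the Task, Trajectory, and run_episode names are hypothetical, and the toy numeric verifier stands in for a real workflow verifier.

    from dataclasses import dataclass
    from typing import Callable

    # Illustrative sketch only: these names and this structure are assumptions,
    # not Turing's RL gym implementation.

    @dataclass
    class Task:
        prompt: str                       # an enterprise workflow framed as a task
        verifier: Callable[[str], bool]   # checks the agent's final answer

    @dataclass
    class Trajectory:
        task: Task
        final_answer: str
        reward: float

    def run_episode(task: Task, agent_policy: Callable[[str], str]) -> Trajectory:
        """Roll out one episode: the agent acts, the verifier scores the outcome."""
        answer = agent_policy(task.prompt)                # the agent explores an action path
        reward = 1.0 if task.verifier(answer) else 0.0    # outcome-based reward
        return Trajectory(task=task, final_answer=answer, reward=reward)

    # Toy "DCF-style" task: discount a $110 cash flow one year out at a 10% rate.
    task = Task(
        prompt="Discount a $110 cash flow one year out at a 10% rate. Answer with a number.",
        verifier=lambda ans: abs(float(ans) - 100.0) < 0.01,
    )

    trajectory = run_episode(task, agent_policy=lambda p: "100.0")
    print(trajectory.reward)  # 1.0 -> a verified trajectory ready to feed an RL update

In a production gym, the verifier would call domain-specific checkers, and expert labels or corrections on each trajectory would supplement the binary reward signal.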

What we're saying

🗣️ Nishad Acharya, Head of Fulfillment:

“There's enormous opportunity not just for language specialists but also for interdisciplinary talent who can fine-tune AI systems for healthcare, education, and finance.

“As AI companies push to make their models relevant across cultures and geographies, linguistic expertise has emerged as a critical skill in the AI economy.”

What we're reading

  • DeepSeek-R1 Incentivizes Reasoning in LLMs Through Reinforcement Learning
    This Nature paper shows that advanced reasoning in LLMs can emerge through pure reinforcement learning, without supervised fine-tuning on reasoning traces. By optimizing only for correct final answers, DeepSeek-R1-Zero learned to self-reflect, verify, and adapt strategies, reaching 86.7% on AIME 2024, surpassing human competitors. The full DeepSeek-R1 pipeline adds instruction tuning and preference alignment, balancing reasoning skill with general usability.
  • How People Are Using ChatGPT
    OpenAI and Harvard researchers analyzed 1.5 million ChatGPT conversations drawn from a base of 700 million users worldwide in the largest-ever consumer study of LLM usage. They found that non-work use now accounts for 73% of conversations, with "Asking" overtaking "Doing" as the dominant user intent. Writing and practical guidance dominate workplace use, while global adoption has surged, especially in low- and middle-income countries.
  • Defeating Nondeterminism in LLM Inference
    This study shows that LLM inference is often nondeterministic not because of concurrency or floating-point math, but because of batch-size-dependent kernel behavior. The authors introduce batch-invariant kernels for RMSNorm, matmul, and attention, and demonstrate fully deterministic completions using Qwen-3 with vLLM, achieving identical output across 1,000 completions at temperature 0.
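
The claim is easy to probe against any OpenAI-compatible inference server (vLLM exposes one). The sketch below is a rough check under assumptions, not the authors’ code: the endpoint URL, model name, prompt, and sample count are placeholders. It sends the same prompt repeatedly at temperature 0 and counts distinct completions; without batch-invariant kernels, a busy server’s shifting batch composition can still yield more than one.

    from collections import Counter
    import requests

    # Rough determinism probe, not the authors' code. Assumes an OpenAI-compatible
    # completions endpoint (e.g. a local vLLM server); the URL, model name, prompt,
    # and sample count below are placeholders rather than the post's exact setup.
    URL = "http://localhost:8000/v1/completions"
    MODEL = "Qwen/Qwen3-8B"  # stand-in for whichever Qwen-3 variant is served

    def sample_once(prompt: str) -> str:
        resp = requests.post(URL, json={
            "model": MODEL,
            "prompt": prompt,
            "max_tokens": 200,
            "temperature": 0,  # greedy decoding: any remaining variation is server-side
        }, timeout=120)
        resp.raise_for_status()
        return resp.json()["choices"][0]["text"]

    # Send the same prompt repeatedly while the server handles other traffic.
    # Batch-size-dependent kernels can change the numerics, and hence the text,
    # even at temperature 0; batch-invariant kernels should leave exactly one.
    outputs = [sample_once("Tell me about reinforcement learning.") for _ in range(100)]
    print(f"{len(Counter(outputs))} distinct completions out of {len(outputs)}")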

Where we’ll be

Turing will be at two major AI conferences in the coming months—join us to discuss the future of AGI:

  • COLM 2025 [Montreal, Canada | Oct 7 – 10]
    The Conference on Language Modeling (COLM) aims to create a community of researchers with expertise in different disciplines, focused on understanding, improving, and critiquing the development of LM technology.
  • NeurIPS 2025 [Mexico City | Nov 30 – Dec 5] [San Diego Convention Center | Dec 2 – 7]
    The Neural Information Processing Systems Foundation is a non-profit that promotes research in AI and ML by organizing a leading annual conference focused on ethical, diverse, and interdisciplinary collaboration.

If you’re attending, reach out—we’d love to connect and exchange insights!

Stay ahead with AGI Advance

Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.

[Subscribe & Read More]

Want to accelerate your business with AI?

Talk to one of our solutions architects and start innovating with AI-powered talent.

Get Started