AGI Advance: Weekly AI & AGI Insights (Sept 16, 2025)

Turing Staff
17 Sep 2025 · 4 min read
LLM training and enhancement · AGI_Advance_Newsletter

Welcome to AGI Advance, Turing’s weekly briefing on AI breakthroughs, AGI research, and industry trends.

This week, we shift focus from generation to evaluation: specifically, how code review is emerging as a practical benchmark for agent reliability. We explore why even top agents struggle to catch regressions in real PRs, how perturbative training reduces multimodal hallucination without compute tradeoffs, and what it looks like when millions of human–LLM dyads evolve into a collective SuperBrain.

What we're thinking

This week, we’re looking at how code review can act as a scalable eval signal, not just a productivity tool. Unlike full task execution or unit tests, review-based feedback is easier to collect, easier to interpret, and directly tied to developer intent.

Here’s what we’re seeing from recent experiments:

  • Binary reviews are high-signal, low-friction: Framing code review as a simple approve vs. request-changes decision lets us benchmark agents precisely against ground truth from real rejected PRs (a minimal scoring sketch follows this list).
  • Agents often miss scope, not syntax: Failure cases weren’t about correctness but about incomplete reasoning, such as ignoring request descriptions, missing the actual source of a regression, or approving unrelated test changes.
  • Code review is a harder eval than it looks: Even top models (GPT-5, Claude 4, Gemini 2.5) only reached 62–76% F1, with performance dropping on large diffs, refactors, and lower-resource languages like Go or Rust.
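
To make this framing concrete, here’s a minimal sketch of how a binary review benchmark could be scored. Everything named below (the PullRequest schema, the review_agent stub, and its placeholder heuristic) is an illustrative assumption, not Turing’s actual harness; only the approve/request-changes framing and the F1 metric come from the experiments above.

```python
# Minimal sketch of a binary code-review eval. All names here
# (PullRequest, review_agent) are hypothetical illustrations,
# not an actual Turing API or dataset schema.
from dataclasses import dataclass

@dataclass
class PullRequest:
    diff: str
    description: str
    ground_truth: str  # "approve" or "request_changes", from real review history

def review_agent(pr: PullRequest) -> str:
    """Stand-in for the agent under test; returns a binary verdict."""
    # Trivial placeholder heuristic: flag very large diffs.
    return "request_changes" if pr.diff.count("\n") > 400 else "approve"

def f1_request_changes(prs: list[PullRequest]) -> float:
    # Treat "request_changes" as the positive class: catching a bad PR
    # is the behavior being measured.
    tp = fp = fn = 0
    for pr in prs:
        pred = review_agent(pr)
        if pred == "request_changes" and pr.ground_truth == "request_changes":
            tp += 1
        elif pred == "request_changes":
            fp += 1
        elif pr.ground_truth == "request_changes":
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```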

We think code review might be the most practical path to evaluating real-world coding agents, not by testing code generation, but by assessing whether agents can accurately identify incomplete, unsafe, or misaligned implementations.

What we're saying

🗣️ Jonathan Siddharth, Founder & CEO: 

“We’re going to enter the agentic era where LLMs can amplify your productivity by 100x.”

In this episode of B2BaCEO with Ashu Garg, Jonathan shares his vision for the agentic era, how we'll achieve artificial superintelligence, and why he believes that, within three years, even the CEO’s role will be automated.

What we're reading

  • PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
    This paper tackles multimodal hallucination with PerturboLLaVA, a training method that injects adversarial text perturbations to reduce language bias and force models to ground outputs in visual content. It introduces HalFscore, a new concept-graph-based metric that measures hallucination and completeness in dense captions. Compared to SOTA baselines, PerturboLLaVA improves F1 by +2.4, reduces object hallucinations by 33%, and adds no training or inference overhead, making it a scalable drop-in improvement for MLLMs.
  • Faster Cascades via Speculative Decoding
    This paper proposes SpecCascade, a hybrid of speculative decoding and model cascading that optimizes when and how to defer to larger models. By using parallel verification to decide deferral dynamically rather than sequentially, SpecCascade achieves up to 2.6× latency reduction over standard cascades while improving output quality by up to +3 BLEU and +2 ROUGE across tasks like summarization, translation, and reasoning (see the toy sketch after this list).
  • LLM-Assisted Iterative Evolution with Swarm Intelligence Toward Superbrain
    This paper introduces SuperBrain, a multi-layer architecture where LLMs co-evolve with high-value users to form persistent “Subclass Brains.” These dyads undergo forward (user-side) and backward (LLM-side) evolutionary loops, refining prompts, fitness functions, and heuristics through genetic algorithms. At scale, millions of Subclass Brains interact through swarm intelligence to form a Superclass Brain: an emergent, adaptive intelligence layer capable of abstraction, generalization, and cross-domain synthesis. Tested on real-world scheduling tasks, the system outperformed human-only baselines and suggests a roadmap for scalable, explainable, and ethically aligned collective AI.
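
To make the deferral mechanism in SpecCascade concrete, here’s a toy, self-contained sketch of speculative cascading as we read it: a cheap drafter proposes a few tokens, the expensive verifier checks the whole draft in one parallelizable pass, and the system defers to the expensive model only at the first disagreement. The interfaces below (draft_next, verify, expensive_next) are our illustrative assumptions, not the paper’s code.

```python
# Toy sketch of speculative cascading (our reading, not the paper's code).
from typing import Callable, List

def spec_cascade_step(
    draft_next: Callable[[List[str]], str],                 # small model: propose next token
    verify: Callable[[List[str], List[str]], List[bool]],   # large model: accept flags for a draft
    expensive_next: Callable[[List[str]], str],             # large model: generate one token
    prefix: List[str],
    k: int = 4,
) -> List[str]:
    # 1. Draft k tokens cheaply and sequentially with the small model.
    draft: List[str] = []
    for _ in range(k):
        draft.append(draft_next(prefix + draft))
    # 2. Score the entire draft with the large model in one pass
    #    (this is the parallel-verification step).
    flags = verify(prefix, draft)
    # 3. Keep the accepted prefix of the draft; on the first rejection,
    #    defer once to the large model and stop.
    out: List[str] = []
    for tok, ok in zip(draft, flags):
        if not ok:
            out.append(expensive_next(prefix + out))  # dynamic deferral
            return out
        out.append(tok)
    return out
```

The latency win comes from step 2: a standard cascade decides deferral token by token, while here the expensive model scores a whole draft at once and is invoked for generation only where it disagrees.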

Where we’ll be

Turing will be at two major AI conferences in the coming months—join us to discuss the future of AGI:

  • COLM 2025 [Montreal, Canada | Oct 7 – 10]
    The Conference on Language Modeling (COLM) aims to create a community of researchers with expertise in different disciplines, focused on understanding, improving, and critiquing the development of LM technology.
  • NeurIPS 2025 [Mexico City | Nov 30 – Dec 5; San Diego Convention Center | Dec 2 – 7]
    The Neural Information Processing Systems Foundation is a non-profit that promotes research in AI and ML by organizing a leading annual conference focused on ethical, diverse, and interdisciplinary collaboration.

If you’re attending, reach out—we’d love to connect and exchange insights!

Stay ahead with AGI Advance

Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.

[Subscribe & Read More]

Want to accelerate your business with AI?

Talk to one of our solutions architects and start innovating with AI-powered talent.

Get Started