AGI Advance: Weekly AI & AGI Insights (Nov 11, 2025)

Turing Staff
12 Nov 2025 | 3 min read

This week’s edition explores one of the most complex frontiers for LLMs: the financial services industry. From fraud detection to credit scoring, we examined where current models fall short, and where they might finally break through. Plus, don’t miss the latest research on deceptive dialogue, real-time reasoning agents, and a new benchmark for coding workflows.

What we're thinking

This week, we examined the role of LLMs in the financial services and payments industry, where billions of real-time, high-stakes transactions make model accuracy, explainability, and low latency non-negotiable.

Here’s what we’re seeing:

  • Fraud and credit risk remain distinct ML problems: Fraud detection emphasizes real-time decisioning and low false negatives, leaning on graph neural nets, RNN autoencoders, and human-in-the-loop setups. Credit scoring favors transparent models like logistic regression and XGBoost, often produced via teacher-student distillation (see the sketch after this list).
  • Infrastructure and regulation are major blockers: Most financial institutions still operate on-prem, with private cloud GPU infrastructure years away. Data aggregation and compliance (especially PII removal) add friction to foundation model training.
  • Foundation model opportunities still exist: Agentic assistants for internal financial teams, fine-tuned small models for fraud detection, and metadata-enriched tokenization pipelines for tabular data are promising directions.
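
For the teacher-student distillation pattern mentioned in the first bullet, here is a minimal, illustrative sketch in Python. It is not a production credit-scoring pipeline: the data is synthetic, a scikit-learn gradient-boosted model stands in for whatever high-capacity teacher a bank might actually use, and the transparent student is a plain logistic regression trained to imitate the teacher's risk scores.

```python
# Illustrative sketch of teacher-student distillation for credit scoring,
# not a production pipeline: data is synthetic, and a scikit-learn
# gradient-boosted model stands in for the higher-capacity "teacher".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for tabular credit-application features.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Teacher: higher-capacity model trained on the hard default labels.
teacher = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
teacher_scores = teacher.predict_proba(X_train)[:, 1]   # soft risk scores

# 2) Student: transparent logistic regression trained to imitate the teacher.
#    Logistic regression needs discrete targets, so one common approximation
#    is to threshold the soft scores and weight samples by teacher confidence.
confidence = np.abs(teacher_scores - 0.5) * 2
student = LogisticRegression(max_iter=1000)
student.fit(X_train, (teacher_scores > 0.5).astype(int), sample_weight=confidence)

print("teacher AUC:", roc_auc_score(y_test, teacher.predict_proba(X_test)[:, 1]))
print("student AUC:", roc_auc_score(y_test, student.predict_proba(X_test)[:, 1]))
```

The appeal of the pattern is that the student keeps per-feature coefficients a risk and compliance team can audit, while recovering some of the teacher's accuracy.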

The gap between AI labs and financial institutions isn't just technical; it's regulatory, cultural, and infrastructural. Closing it requires more than a model drop: it demands research-grade fine-tuning, data engineering, and system integration.

What we're celebrating

🗣️ Jonathan Siddharth, Founder & CEO:

Turing was honored to be represented at the Digital Government Authority Conference in Saudi Arabia, where Jonathan met HE Ahmed Alsuwaiyan, Governor of the DGA, and delivered a keynote on AI trends and their positive impact on the government sector.

What we're reading

  • Evaluating & Reducing Deceptive Dialogue from Language Models with Multi-Turn RL
    LLMs exhibit deceptive behavior in ~26% of dialogue turns, even without explicit prompts to deceive, rising to 43% when trained with RLHF. This study introduces a new metric, belief misalignment, which quantifies deception by measuring how far a listener's beliefs drift from the truth over a conversation, and shows that it aligns more closely with human judgment than existing metrics. Using multi-turn reinforcement learning with this metric as the reward signal, the researchers fine-tuned LLMs to reduce deceptive behavior by 77.6%, offering a pathway to safer, more trustworthy dialogue systems. (A rough sketch of the metric idea appears after this list.)
  • SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models
    SWE-Compass introduces a large-scale benchmark designed to evaluate coding agents across 2,000 real GitHub-derived tasks spanning 8 task types, 8 engineering scenarios, and 10 programming languages. Unlike prior benchmarks that focus narrowly on bug fixing, SWE-Compass covers performance optimization, test generation, refactoring, and more. The benchmark supports two agent workflows, SWE-Agent and Claude Code, and shows that top models like Claude 4 and Qwen 3 perform best when agents are tailored to deterministic workflows.
  • Real-Time Reasoning Agents in Evolving Environments
    Researchers introduce Real-Time Reasoning Gym, a benchmark where, unlike traditional static setups, the environment keeps evolving even while the agent is reasoning. The team proposes AgileThinker, a dual-thread architecture that pairs a fast, reactive agent with a deeper planning agent to enable robust real-time behavior (a minimal sketch of the pattern also follows below). Tested across games like Snake, Freeway, and Overcooked, AgileThinker outperforms single-paradigm agents, especially under high cognitive load or time pressure, and its gains persist in real wall-clock settings, offering a blueprint for latency-sensitive, high-stakes applications like robotics or finance.
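
To make the belief-misalignment idea from the first paper above concrete, here is a rough Python sketch. It is not the paper's formulation: it simply models a listener's belief as a probability per factual claim after each turn, scores each turn by the distance from the ground truth, and averages over the dialogue. All names and numbers are illustrative.

```python
# Rough intuition-builder for a belief-misalignment-style score; this is NOT
# the paper's exact formulation. The listener's belief is a probability per
# factual claim after each turn, and the score is the average distance from
# the ground truth across claims and turns (higher = more misled).
from typing import Dict, List

def belief_misalignment(
    listener_beliefs: List[Dict[str, float]],  # one belief dict per turn
    ground_truth: Dict[str, int],              # claim -> 1 (true) / 0 (false)
) -> float:
    per_turn = []
    for beliefs in listener_beliefs:
        drift = [abs(beliefs[c] - ground_truth[c]) for c in ground_truth]
        per_turn.append(sum(drift) / len(drift))
    return sum(per_turn) / len(per_turn)

# Hypothetical three-turn exchange about one claim the speaker knows is
# false ("the fee is refundable"): the listener is pushed further from the
# truth each turn, so the score climbs.
truth = {"fee_refundable": 0}
beliefs = [{"fee_refundable": 0.3}, {"fee_refundable": 0.6}, {"fee_refundable": 0.9}]
print(belief_misalignment(beliefs, truth))  # 0.6
```

In a multi-turn RL setup like the one described, the negated score could serve as part of the reward the model is trained against.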
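
And for the dual-thread idea behind AgileThinker, the following sketch shows the general shape; it is an assumption-laden illustration, not the authors' code. A cheap reactive policy keeps the control loop responsive, while a slower planner runs on a background thread and publishes its latest plan whenever it finishes thinking.

```python
# Minimal, hypothetical sketch of a dual-thread agent, not the authors'
# AgileThinker implementation: a cheap reactive policy keeps the control
# loop responsive, while a slower planner runs in a background thread and
# publishes its latest plan whenever it finishes.
import random
import threading
import time

latest_plan = None                # written by the planner, read by the actor
plan_lock = threading.Lock()

def reactive_policy(obs):
    # Fast heuristic that never blocks the control loop.
    return random.choice(["left", "right", "stay"])

def planner(get_obs):
    global latest_plan
    while True:
        obs = get_obs()
        time.sleep(0.5)           # stand-in for slow, deliberate reasoning
        with plan_lock:
            latest_plan = f"planned-move-for-tick-{obs}"

def act(obs):
    # Prefer the deep plan when one is ready; otherwise stay reactive.
    with plan_lock:
        plan = latest_plan
    return plan if plan is not None else reactive_policy(obs)

if __name__ == "__main__":
    tick = 0
    threading.Thread(target=planner, args=(lambda: tick,), daemon=True).start()
    for tick in range(10):        # the environment keeps evolving each tick
        print(tick, act(tick))
        time.sleep(0.1)
```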

Where we’ll be

Turing will be at this major AI conference in the coming month—join us to discuss the future of AGI:

  • NeurIPS 2025
    [Mexico City | Nov 30 – Dec 5]
    [San Diego Convention Center | Dec 2 – 7]

    The Neural Information Processing Systems Foundation is a non-profit that promotes research in AI and ML by organizing a leading annual conference focused on ethical, diverse, and interdisciplinary collaboration.

If you’re attending, reach out—we’d love to connect and exchange insights!

Stay ahead with AGI Advance

Turing is leading the charge in bridging AI research with real-world applications. Subscribe to AGI Advance for weekly insights into breakthroughs, research, and industry shifts that matter.

[Subscribe & Read More]

Ready to Optimize Your Model for Real-World Needs?

Partner with Turing to fine-tune, validate, and deploy models that learn continuously.

Optimize Continuously