Improving LLM Performance with 4,000+ Apex and SOQL Notebook Tasks

Created Apex and SOQL datasets to simulate real developer workflows and support next-generation assistant development. The data enables LLMs to reason through syntax errors, refactor insecure logic, and translate natural language into structured queries with precision.

4,000+

structured training samples: Including Apex and SOQL/SOSL tasks.

99.6%

SOQL data acceptance rate: Reflects high-quality mapping and language precision.

100%

human-in-the-loop QA: Spanning issue validation, corrected code explanations, and prompt rewrites.

Industry: Software Development
Company type: Enterprise
Country: United States
Capabilities used: Turing AGI Advancement
The Challenge

The client needed to improve LLM performance across two core developer capabilities:

  • Structured error intelligence: Existing Apex datasets lacked real-world coding mistakes and corrective logic, making it difficult for the model to spot a flawed snippet and explain why it was wrong.
  • Prompt-to-query mapping depth: SOQL examples lacked linguistic diversity and contextual reasoning, limiting the model’s ability to translate complex user prompts into accurate and optimized SOQL or SOSL queries.

These gaps limited the model’s ability to perform context-aware reasoning, error detection, and syntactically correct code generation, all core capabilities for an enterprise-grade assistant.

The Approach

Dataset

To address these challenges, the team designed a two-phase approach focused on error comprehension and semantic translation.

a. Apex Notebooks

  • Built around a bad code → corrected code → explanation structure
  • Sourced and validated violations using PMD, a static code analyzer with a dedicated Apex ruleset
  • Covered recurring anti-patterns such as hardcoded values, inefficient loops, unguarded triggers, and DML misuse (see the before/after sketch following this list)
  • Enabled fine-grained error classification, correction logic, and instruction-following training
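
For illustration, here is a minimal sketch of what one such task pair can look like, built around the DML-inside-a-loop anti-pattern that PMD’s Apex rules flag. The trigger and its logic are invented for this example, not drawn from the client’s dataset.

```apex
// Notebook "bad code" cell (illustrative): SOQL and DML run inside the loop,
// so a bulk insert of 200 contacts issues 200 queries and 200 DML statements,
// blowing past governor limits. The null-AccountId case is also unguarded.
trigger ContactRollup on Contact (after insert) {
    for (Contact c : Trigger.new) {
        Account acc = [SELECT Id, Description FROM Account WHERE Id = :c.AccountId];
        acc.Description = 'Has contacts';
        update acc;
    }
}
```

```apex
// Notebook "corrected code" cell: bulkified to one query and one DML statement
// for the whole batch, with a guard against null AccountId values.
trigger ContactRollup on Contact (after insert) {
    Set<Id> accountIds = new Set<Id>();
    for (Contact c : Trigger.new) {
        if (c.AccountId != null) {
            accountIds.add(c.AccountId);
        }
    }
    List<Account> accounts = [SELECT Id, Description FROM Account WHERE Id IN :accountIds];
    for (Account acc : accounts) {
        acc.Description = 'Has contacts';
    }
    update accounts;
}
```

The explanation cell then walks through why the first version fails at scale and how bulkification resolves it, which is the corrective logic the model learns to reproduce.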

b. SOQL Notebooks

  • Each prompt simulated complex logic, including nested filters, multi-object joins, and aggregate queries
  • Expert-written responses translated prompts into optimized SOQL or SOSL (a sample mapping is sketched after this list)
  • Commentary documented reasoning steps, constraints, and selection logic
  • Taught the model how to reason across both syntax and business logic
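
As a sketch of that prompt-to-query pattern (the prompt is invented for illustration, though the objects and fields are standard Salesforce ones), a single pair might look like this:

```apex
// Prompt (illustrative): "Show the five technology-industry accounts with the
// highest total closed-won opportunity value this year."
//
// Reasoning captured in the notebook commentary:
//   "this year"     -> CloseDate = THIS_YEAR date literal
//   "closed-won"    -> StageName filter
//   "highest total" -> SUM(Amount) with ORDER BY ... DESC and LIMIT 5
List<AggregateResult> topAccounts = [
    SELECT AccountId, SUM(Amount) totalWon
    FROM Opportunity
    WHERE StageName = 'Closed Won'
      AND CloseDate = THIS_YEAR
      AND Account.Industry = 'Technology'
    GROUP BY AccountId
    ORDER BY SUM(Amount) DESC
    LIMIT 5
];
```

The commentary lines mirror the reasoning steps a reviewer verifies during QA: each clause in the query should be traceable to a phrase in the prompt.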

Evaluation

Each notebook task underwent a structured QA process:

  • Code correctness and formatting verified for Apex and SOQL responses
  • Natural-language clarity and diversity reviewed for prompt quality
  • Error explanations and reasoning steps checked for completeness and clarity
  • Standardized notebook templates enabled scalable reuse across future training cycles (a template sketch follows this list)
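
One hypothetical shape for such a template, written here as an Apex data class purely for illustration (the field names are assumptions, not the client’s actual schema):

```apex
// Hypothetical notebook-task template as an Apex data class.
public class NotebookTask {
    public String prompt;        // natural-language instruction or user request
    public String badCode;       // flawed snippet (Apex error tasks only)
    public String correctedCode; // validated fix, or expert-written SOQL/SOSL
    public String explanation;   // reasoning: what was wrong, why the fix works
}

// A uniform shape makes tasks easy to serialize for reuse, e.g. exporting
// to JSON for a supervised fine-tuning pipeline:
NotebookTask task = new NotebookTask();
task.prompt = 'Fix the governor-limit risk in this trigger.';
task.explanation = 'DML moved out of the loop and bulkified.';
String record = JSON.serialize(task); // built-in Apex JSON support
```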

Key Results

  • Delivered 2,000+ Apex samples featuring real and simulated logic flaws with detailed fix reasoning
  • Delivered 2,000+ SOQL/SOSL samples aligned to complex user intent, showcasing high linguistic and structural diversity
  • Achieved ~99.6% data acceptance rate for SOQL samples
  • Improved LLM precision in:
    - Detecting and fixing insecure or invalid Apex logic
    - Translating prompts to accurate, optimized SOQL queries
  • Provided a modular notebook format supporting future dataset expansion and fine-tuning

The Outcome

This dataset provided the client with a scalable, high-quality foundation for training models on real-world software development tasks. The notebooks improved:

  • Error detection and fix classification
  • Natural language to code translation
  • Model interpretability and reasoning traceability

The client can now apply this framework to build development assistants with stronger grounding in syntax, semantics, and best practices.

Want to evaluate your model’s code reasoning or query generation?

Request a sample notebook with realistic errors, corrected code, natural language prompts, step-by-step explanations, and QA-aligned notebook formatting.

Request Sample

FAQ

What’s in the sample?

Each sample includes a complete task: the prompt, the code, the correction, and an accompanying explanation.

Can this be used for fine-tuning?

Yes. The format is designed for supervised fine-tuning or instruction-tuning pipelines.

Is this usable across languages?

Yes. The dataset includes Python-style reasoning patterns in Apex and language-aligned prompts for query translation.

What’s the QA process?

Each task was validated by expert reviewers for correctness, clarity, and coverage.

What’s the NDA process?

A standard mutual NDA. Turing provides the countersigned agreement within one business day.

How fast can I get a sample?

Within three business days after NDA execution.

How well does your model follow multi-part developer instructions?

Request a QA-validated notebook task with structured prompts, fix logic, and query mappings.

Request Sample