Improving LLM Performance with 4,000+ Apex and SOQL Notebook Tasks

Created Apex and SOQL datasets to simulate real developer workflows and support next-generation assistant development. The data enables LLMs to reason through syntax errors, refactor insecure logic, and translate natural language into structured queries with precision.

4,000+

structured training samples: Including Apex and SOQL/SOSL tasks.

99.6%

SOQL data acceptance rate: Reflects high-quality mapping and language precision.

100%

human-in-the-loop QA: Spanning issue validation, corrected code explanations, and prompt rewrites.

Industry: Software Development
Company type: Enterprise
Country: United States
Capabilities used: Turing AGI Advancement
The Challenge

The client needed to improve LLM performance across two core developer capabilities:

  • Structured error intelligence: Existing Apex datasets lacked real-world coding mistakes and corrective logic, making it difficult for the model to spot a flawed snippet and explain why it was wrong.
  • Prompt-to-query mapping depth: SOQL examples lacked linguistic diversity and contextual reasoning, limiting the model’s ability to translate complex user prompts into accurate and optimized SOQL or SOSL queries.

These gaps limited the model’s ability to perform context-aware reasoning, error detection, and syntactically correct code generation, all core capabilities for an enterprise-grade assistant.

The Approach

Dataset

To address these challenges, the team designed a two-phase approach focused on error comprehension and semantic translation.

a. Apex Notebooks

  • Built around a bad code → corrected code → explanation structure
  • Sourced and validated violations using PMD, a static code analyzer with a dedicated Apex ruleset
  • Covered recurring anti-patterns such as hardcoded values, inefficient loops, unguarded triggers, and DML misuse (see the before/after sketch following this list)
  • Enabled fine-grained error classification, correction logic, and instruction-following training
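
For illustration, here is a minimal sketch of what one such task pair can look like, built around the DML-inside-a-loop anti-pattern that PMD’s Apex rules flag. The trigger and its logic are invented for this example, not drawn from the client’s dataset.

```apex
// Notebook "bad code" cell (illustrative): SOQL and DML run inside the loop,
// so a bulk insert of 200 contacts issues 200 queries and 200 DML statements,
// blowing past governor limits. The null-AccountId case is also unguarded.
trigger ContactRollup on Contact (after insert) {
    for (Contact c : Trigger.new) {
        Account acc = [SELECT Id, Description FROM Account WHERE Id = :c.AccountId];
        acc.Description = 'Has contacts';
        update acc;
    }
}
```

```apex
// Notebook "corrected code" cell: bulkified to one query and one DML statement
// for the whole batch, with a guard against null AccountId values.
trigger ContactRollup on Contact (after insert) {
    Set<Id> accountIds = new Set<Id>();
    for (Contact c : Trigger.new) {
        if (c.AccountId != null) {
            accountIds.add(c.AccountId);
        }
    }
    List<Account> accounts = [SELECT Id, Description FROM Account WHERE Id IN :accountIds];
    for (Account acc : accounts) {
        acc.Description = 'Has contacts';
    }
    update accounts;
}
```

The explanation cell then walks through why the first version fails at scale and how bulkification resolves it, which is the corrective logic the model learns to reproduce.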

b. SOQL Notebooks

  • Each prompt simulated complex logic, including nested filters, multi-object joins, and aggregate queries
  • Expert-written responses translated prompts into optimized SOQL or SOSL (a sample mapping is sketched after this list)
  • Commentary documented reasoning steps, constraints, and selection logic
  • Taught the model how to reason across both syntax and business logic
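
As a sketch of that prompt-to-query pattern (the prompt is invented for illustration, though the objects and fields are standard Salesforce ones), a single pair might look like this:

```apex
// Prompt (illustrative): "Show the five technology-industry accounts with the
// highest total closed-won opportunity value this year."
//
// Reasoning captured in the notebook commentary:
//   "this year"     -> CloseDate = THIS_YEAR date literal
//   "closed-won"    -> StageName filter
//   "highest total" -> SUM(Amount) with ORDER BY ... DESC and LIMIT 5
List<AggregateResult> topAccounts = [
    SELECT AccountId, SUM(Amount) totalWon
    FROM Opportunity
    WHERE StageName = 'Closed Won'
      AND CloseDate = THIS_YEAR
      AND Account.Industry = 'Technology'
    GROUP BY AccountId
    ORDER BY SUM(Amount) DESC
    LIMIT 5
];
```

The commentary lines mirror the reasoning steps a reviewer verifies during QA: each clause in the query should be traceable to a phrase in the prompt.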

Evaluation

Each notebook task underwent a structured QA process:

  • Code correctness and formatting verified for Apex and SOQL responses
  • Natural-language clarity and diversity reviewed for prompt quality
  • Error explanations and reasoning steps checked for completeness and clarity
  • Standardized notebook templates enabled scalable reuse across future training cycles (a template sketch follows this list)
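
One hypothetical shape for such a template, written here as an Apex data class purely for illustration (the field names are assumptions, not the client’s actual schema):

```apex
// Hypothetical notebook-task template as an Apex data class.
public class NotebookTask {
    public String prompt;        // natural-language instruction or user request
    public String badCode;       // flawed snippet (Apex error tasks only)
    public String correctedCode; // validated fix, or expert-written SOQL/SOSL
    public String explanation;   // reasoning: what was wrong, why the fix works
}

// A uniform shape makes tasks easy to serialize for reuse, e.g. exporting
// to JSON for a supervised fine-tuning pipeline:
NotebookTask task = new NotebookTask();
task.prompt = 'Fix the governor-limit risk in this trigger.';
task.explanation = 'DML moved out of the loop and bulkified.';
String record = JSON.serialize(task); // built-in Apex JSON support
```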

Key Results

  • Delivered 2,000+ Apex samples featuring real and simulated logic flaws with detailed fix reasoning
  • Delivered 2,000+ SOQL/SOSL samples aligned to complex user intent, showcasing high linguistic and structural diversity
  • Achieved ~99.6% data acceptance rate for SOQL samples
  • Improved LLM precision in:
    - Detecting and fixing insecure or invalid Apex logic
    - Translating prompts to accurate, optimized SOQL queries
  • Provided a modular notebook format supporting future dataset expansion and fine-tuning

The Outcome

This dataset provided the client with a scalable, high-quality foundation for training models on real-world software development tasks. The notebooks improved:

  • Error detection and fix classification
  • Natural language to code translation
  • Model interpretability and reasoning traceability

The client can now apply this framework to build development assistants with stronger grounding in syntax, semantics, and best practices.

Want to evaluate your model’s code reasoning or query generation?

Request a sample notebook with realistic errors, corrected code, natural language prompts, step-by-step explanations, and QA-aligned notebook formatting.

Request Sample

FAQ

What’s in the sample?

Each sample includes a complete task: the prompt, the code, the correction, and an accompanying explanation.

Can this be used for fine-tuning?

Yes. The format is designed for supervised fine-tuning or instruction-tuning pipelines.

Is this usable across languages?

Yes. The dataset includes Python-style reasoning patterns in Apex and language-aligned prompts for query translation.

What’s the QA process?

Each task was validated by expert reviewers for correctness, clarity, and coverage.

What’s the NDA process?

A standard mutual NDA. Turing provides the countersigned agreement within one business day.

How fast can I get a sample?

Within three business days after NDA execution.

How well does your model follow multi-part developer instructions?

Request a QA-validated notebook task with structured prompts, fix logic, and query mappings.

Request Sample