ServiceNow partnered with Turing to build a high-quality SFT dataset spanning reasoning, code, Glide scripting, function calling, and complex instruction following. This dataset powered the training of Apriel-1.5, a 15-billion-parameter model that achieved frontier-level benchmark scores before any reinforcement learning stage, matching much larger models while remaining deployable on a single GPU.

ServiceNow aimed to build a small but extremely capable reasoning model, one that could match much larger models while remaining deployable on a single GPU.
Achieving this capability within a 15-billion-parameter constraint required domain-balanced, high-signal SFT data, offering not only quantity but high-quality coverage across diverse taxonomies.
Turing built an SFT dataset of more than 390,000 samples, precisely structured to uplift all major capability axes that ServiceNow targeted.
i. Code & Glide domain (ServiceNow platform)
The team created domain-specific data to train the model to write and debug Glide scripts, build platform workflows, and implement Now Platform automation logic.
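As a rough illustration of the shape such data can take, the sketch below shows one possible Glide-domain training sample. The field names and the Glide snippet are invented for this example and are not drawn from the delivered dataset.

```python
# Hypothetical shape of a single Glide-domain SFT sample.  The field names and
# the Glide snippet below are illustrative only, not records from the dataset.
glide_sample = {
    "domain": "code_glide",
    "turns": [
        {
            "role": "user",
            "content": (
                "Write a server-side script that reassigns all open P1 incidents "
                "to the Major Incident group."
            ),
        },
        {
            "role": "assistant",
            "content": (
                "var gr = new GlideRecord('incident');\n"
                "gr.addQuery('priority', 1);\n"
                "gr.addQuery('active', true);\n"
                "gr.query();\n"
                "while (gr.next()) {\n"
                "    gr.assignment_group.setDisplayValue('Major Incident');\n"
                "    gr.update();\n"
                "}"
            ),
        },
    ],
}
```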
ii. Function calling & tool-use
Turing developed thousands of structured examples that taught the model to interpret tool definitions and emit correctly structured function calls.
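A minimal sketch of what one such sample might look like follows; the tool name, schema, and arguments are hypothetical stand-ins, not items from the actual corpus.

```python
import json

# Hypothetical function-calling SFT sample: the prompt exposes a tool schema,
# and the target turn is the structured call the model should emit.
tool_schema = {
    "name": "create_incident",  # invented tool name for illustration
    "description": "Create an incident record on the Now Platform.",
    "parameters": {
        "type": "object",
        "properties": {
            "short_description": {"type": "string"},
            "priority": {"type": "integer", "minimum": 1, "maximum": 5},
        },
        "required": ["short_description", "priority"],
    },
}

function_call_sample = {
    "domain": "function_calling",
    "tools": [tool_schema],
    "turns": [
        {"role": "user", "content": "Log a P2 incident: VPN is down for the Berlin office."},
        {
            "role": "assistant",
            "tool_call": {
                "name": "create_incident",
                "arguments": {"short_description": "VPN outage - Berlin office", "priority": 2},
            },
        },
    ],
}

print(json.dumps(function_call_sample, indent=2))
```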
Impact:
Improved scores on benchmarks measuring structured reasoning and task execution, such as IFBench and Tau.
iii. Complex instruction following (CIF)
Tasks spanned layered, multi-constraint instructions with strict formatting and schema requirements.
The dataset followed a strict distribution of 25% single-turn and 75% multi-turn data, and model improvements were tested on a small sample to validate the uplift statistics.
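A quick audit of that split could look like the sketch below; the sample structure follows the earlier examples, and the tolerance value is an arbitrary placeholder.

```python
from collections import Counter

def audit_turn_distribution(samples, target_single=0.25, tolerance=0.02):
    """Check the single-turn vs. multi-turn mix against the 25/75 target.

    Assumes each sample is a dict with a "turns" list, as in the sketches
    above; the 2% tolerance is a placeholder, not a project requirement.
    """
    counts = Counter(
        "single_turn" if len(s["turns"]) <= 2 else "multi_turn" for s in samples
    )
    total = sum(counts.values())
    single_share = counts["single_turn"] / total
    if abs(single_share - target_single) > tolerance:
        raise ValueError(
            f"single-turn share {single_share:.1%} misses the {target_single:.0%} target"
        )
    return {k: v / total for k, v in counts.items()}
```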


Impact:
Significant uplift in deterministic, schema-controlled outputs, which is critical for enterprise usage.
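One way to quantify that determinism during evaluation is to validate raw model outputs against the schema an instruction specifies. The sketch below uses the jsonschema package; the change-request schema is a made-up example, not one of ServiceNow's.

```python
import json

from jsonschema import ValidationError, validate

# Made-up example: the instruction asks the model to return a change-request
# summary as a strict JSON object with exactly these fields.
CHANGE_REQUEST_SCHEMA = {
    "type": "object",
    "properties": {
        "risk": {"type": "string", "enum": ["low", "moderate", "high"]},
        "affected_ci": {"type": "string"},
        "rollback_plan": {"type": "string"},
    },
    "required": ["risk", "affected_ci", "rollback_plan"],
    "additionalProperties": False,
}

def schema_compliance_rate(raw_outputs):
    """Fraction of raw model outputs that both parse as JSON and satisfy the schema."""
    passed = 0
    for raw in raw_outputs:
        try:
            validate(instance=json.loads(raw), schema=CHANGE_REQUEST_SCHEMA)
            passed += 1
        except (json.JSONDecodeError, ValidationError):
            continue
    return passed / len(raw_outputs) if raw_outputs else 0.0
```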
iv. Complex reasoning & agentic tasks
We constructed reasoning tasks requiring long-horizon, multi-step planning and tool-assisted execution.
Impact:
Material improvements in long-horizon reasoning contributed to the model's Artificial Analysis score of 52 and Tau score of 68.
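To make the shape of these agentic tasks concrete, a sample might interleave a plan, tool calls, and observations before the final answer. Everything in the sketch below, including the tool names and record identifiers, is invented for illustration.

```python
# Invented illustration of a long-horizon agentic sample: the target trace
# interleaves a plan, tool calls, and observations before the final answer.
agentic_sample = {
    "domain": "agentic_reasoning",
    "turns": [
        {
            "role": "user",
            "content": "Why did last night's catalog deployment fail, and what should we roll back?",
        },
        {
            "role": "assistant",
            "plan": [
                "List change requests from last night's deployment window",
                "Pull error logs for the failed catalog items",
                "Identify the offending change and recommend a rollback",
            ],
        },
        {"role": "assistant", "tool_call": {"name": "query_change_requests", "arguments": {"window": "last_24h"}}},
        {"role": "tool", "content": "[CHG0031245, CHG0031246]"},
        {"role": "assistant", "tool_call": {"name": "get_error_logs", "arguments": {"change": "CHG0031246"}}},
        {"role": "tool", "content": "Script include 'CatalogPricing' throws a ReferenceError"},
        {
            "role": "assistant",
            "content": "CHG0031246 introduced a broken 'CatalogPricing' script include; roll that change back.",
        },
    ],
}
```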
We used a multi-layer QA pipeline to ensure world-class quality: every sample passed through three review layers, including LLM-assisted coverage checks and calibration by domain experts.
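A simplified sketch of how such a gate can be wired together is shown below; the function names and pass criteria are placeholders, with the LLM review and expert calibration steps left as injected callables.

```python
from typing import Callable

# Placeholder sketch of a three-layer QA gate over SFT samples.  The layering
# mirrors the description above; names and pass criteria are invented.

def automated_checks(sample: dict) -> bool:
    """Layer 1 (assumed): deterministic lint, e.g. every turn has a role and some content."""
    return all(
        turn.get("role") and (turn.get("content") or turn.get("tool_call") or turn.get("plan"))
        for turn in sample["turns"]
    )

def passes_qa(
    sample: dict,
    llm_coverage_review: Callable[[dict], bool],  # layer 2: LLM-assisted coverage check
    expert_calibration: Callable[[dict], bool],   # layer 3: domain-expert sign-off
) -> bool:
    """A sample must clear all three layers, in order, to enter the dataset."""
    return (
        automated_checks(sample)
        and llm_coverage_review(sample)
        and expert_calibration(sample)
    )
```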
Turing's 390K-sample SFT dataset helped ServiceNow train Apriel-1.5-15B-Thinker, a 15B model that matches frontier-model capabilities at a fraction of their size.
Highlights from ServiceNow's benchmark release include an Artificial Analysis score of 52 and a Tau score of 68, achieved before any reinforcement learning stage and while remaining deployable on a single GPU.
Turing contributed a high-quality SFT corpus spanning reasoning, instruction following, code generation, function calling, and ServiceNow's Glide domain, which lifted the model's mid-training capabilities.
Request a sample set of multi-turn SFT examples for code generation, reasoning, and tool use, grounded in real automation tasks.
What does the dataset include?
The dataset includes reasoning, code generation, complex instruction following, function calling, agentic task planning, and Glide (ServiceNow's proprietary scripting language).
Does it cover ServiceNow's Glide domain specifically?
Yes. Turing provided extensive domain-specific SFT focused on writing and debugging Glide scripts, platform workflows, and Now Platform automation logic.
How was quality assured?
All 390,000 samples were curated and reviewed through a three-layer QA system, including LLM-assisted coverage checks and calibration from domain experts.
Did the dataset improve benchmark results?
Yes. The dataset contributed to benchmark wins during mid-training, showing significant uplift before reinforcement stages.
How was the dataset structured?
It was taxonomy-aligned, with structured distributions across multi-turn vs. single-turn, code vs. non-code, and agentic vs. declarative workflows, designed to match ServiceNow's internal modeling goals.
What agreement is required to receive a sample?
A standard mutual NDA. Turing provides the countersigned agreement within one business day.
How quickly is a sample delivered?
Within three business days after NDA execution.
Request curated, benchmark-aligned datasets across reasoning, code, and enterprise domains.