Created a safety supervision dataset that captures end-to-end agent behavior across multi-turn interactions, including tool calls, confirmations, refusals, and final responses. Each conversation was annotated for safety-relevant behavior with policy-compliant rewrites for identified violations.

Traditional safety datasets often focus on final model outputs but overlook safety decisions made during multi-turn interactions involving tools, confirmations, and intermediate reasoning.
The client required a dataset that could:
- capture end-to-end agent behavior across multi-turn, tool-using workflows;
- surface safety-relevant decisions at every step, not just in the final response; and
- pair every identified violation with a policy-compliant correction.
Turing deployed a team of trained safety annotators and reviewers to evaluate agent tool-use conversations against the client’s safety policies and associated taxonomies. The workflow emphasized end-to-end supervision, explicit critique, and corrective rewrites, supported by both manual and automated quality controls.
1. Task design and coverage
Each task consisted of a realistic, multi-turn conversation in which an agent interacted with one or more tools to complete a user request. Tasks were designed to cover a range of locales, risk categories, harmfulness levels, and task types, with varying numbers of tool calls and confirmation patterns.
All tasks were tagged with structured metadata, including locale, risk category, harmfulness level, task type, tool call counts, and confirmation usage.
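As a concrete illustration, a task’s metadata record might look like the sketch below. The field names and example values are assumptions made for illustration; only the metadata dimensions themselves come from the description above.

```python
from dataclasses import dataclass

# Hypothetical metadata record for one task. Field names and example values
# are illustrative; the dimensions mirror those described above.
@dataclass
class TaskMetadata:
    locale: str              # e.g. "en-US"
    risk_category: str       # e.g. "financial_transactions"
    harmfulness_level: str   # e.g. "low", "medium", or "high"
    task_type: str           # e.g. "tool_use_with_confirmation"
    tool_call_count: int     # number of tool calls in the conversation
    uses_confirmation: bool  # whether the agent asked before acting

example = TaskMetadata(
    locale="en-US",
    risk_category="financial_transactions",
    harmfulness_level="medium",
    task_type="tool_use_with_confirmation",
    tool_call_count=3,
    uses_confirmation=True,
)
```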
2. Safety annotation and labeling
Annotators evaluated each conversation step by step, applying safety-relevant labels that reflect both the task type and policy expectations, covering tool calls, confirmations, refusals, and final responses.
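A per-step annotation might be captured along the following lines. This is a minimal sketch: the label vocabulary, policy ID, and field names are hypothetical, since the actual taxonomy was defined by the client’s policies.

```python
# Sketch of one per-step annotation record. The label set and policy ID
# are hypothetical placeholders, not the client's actual taxonomy.
step_annotation = {
    "turn_index": 4,
    "step_kind": "tool_call",          # tool_call | confirmation | refusal | final_response
    "label": "missing_confirmation",   # e.g. compliant | violation | missing_confirmation
    "policy_reference": "POLICY-3.2",  # placeholder policy identifier
    "critique": "Agent executed a destructive action without asking the user first.",
}
```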
3. Full rewrite to policy-compliant ideal responses
When a violation was identified, annotators provided a full rewrite of the agent’s response that corrected the violation and met the client’s policy expectations for an ideal response.
Each conversation was continued through completion using the corrected responses, ensuring a fully policy-compliant trajectory.
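The sketch below shows one way a corrected response could be spliced back into a trajectory so that later turns build on the compliant rewrite. The apply_rewrite helper and the record fields are hypothetical, assuming a simple list-of-dicts conversation format.

```python
# Minimal sketch: replace a flagged agent response with its policy-compliant
# rewrite so the continued conversation builds on the corrected turn.
# The conversation format and field names are illustrative assumptions.
def apply_rewrite(trajectory: list[dict], turn_index: int, rewrite: str) -> list[dict]:
    fixed = [dict(step) for step in trajectory]  # copy; keep the original for audit
    fixed[turn_index]["content"] = rewrite
    fixed[turn_index]["rewritten"] = True
    return fixed

trajectory = [
    {"role": "user", "content": "Delete all files in /shared."},
    {"role": "agent", "content": "Done. All files deleted.", "rewritten": False},
]
compliant = apply_rewrite(
    trajectory,
    turn_index=1,
    rewrite="That folder is shared with others, so deleting it could affect them. "
            "Do you want me to proceed?",
)
```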
4. Multi-pass human review and adjudication
All annotated conversations underwent multi-pass human review, with escalated issues adjudicated through explicit decision outcomes.
Reviewers acted as arbiters to determine whether issues originated from annotation, policy interpretation, or task construction.
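An adjudication outcome might be recorded as in the following sketch. The three issue origins come directly from the process described above; the remaining fields and values are illustrative assumptions.

```python
# Hypothetical adjudication record. The "issue_origin" values match the
# three sources named above; everything else is an illustrative placeholder.
adjudication = {
    "conversation_id": "conv-0042",
    "issue_origin": "policy_interpretation",  # annotation | policy_interpretation | task_construction
    "decision": "revise_annotation",
    "notes": "Annotator applied the stricter reading; reviewer ruled the confirmation optional here.",
}
```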
5. Automated quality checks
In parallel with human review, automated checks enforced process consistency, including checks for schema validity, metadata completeness, correct tool usage, and policy alignment.
These checks ensured that every conversation met baseline technical and policy constraints before final acceptance.
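A minimal sketch of such a gate, assuming a simple dict-based conversation format, might look like the following; the check names mirror the four families described above, while the logic itself is illustrative.

```python
# Illustrative automated gate covering the four check families named above.
# The conversation format, field names, and required keys are assumptions.
REQUIRED_METADATA = {"locale", "risk_category", "harmfulness_level",
                     "task_type", "tool_call_count", "uses_confirmation"}

def run_checks(conversation: dict) -> list[str]:
    failures = []
    steps = conversation.get("steps", [])
    # Schema validity: every step carries a role and content.
    for i, step in enumerate(steps):
        if "role" not in step or "content" not in step:
            failures.append(f"schema: step {i} missing role/content")
    # Metadata completeness: all required tags are present.
    missing = REQUIRED_METADATA - conversation.get("metadata", {}).keys()
    if missing:
        failures.append(f"metadata: missing {sorted(missing)}")
    # Tool usage: the declared tool_call_count matches the trajectory.
    declared = conversation.get("metadata", {}).get("tool_call_count")
    actual = sum(1 for s in steps if s.get("step_kind") == "tool_call")
    if declared is not None and declared != actual:
        failures.append(f"tool_usage: declared {declared}, found {actual}")
    # Policy alignment: every flagged violation carries a rewrite.
    for i, step in enumerate(steps):
        if step.get("label") == "violation" and not step.get("rewrite"):
            failures.append(f"policy: step {i} flagged but not rewritten")
    return failures
```

A conversation would be accepted only when run_checks returns an empty list.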
The client now has a high-fidelity dataset to fine-tune, calibrate, and evaluate agent safety behavior across realistic tool-use workflows.
This dataset provides a strong foundation for improving safety, alignment, and reliability in agentic systems operating in real-world tool environments.
Request a sample of supervised agent tool-use conversations with safety labels, critiques, and policy-compliant rewrites.
How is this dataset different from typical safety datasets?
This dataset supervises full agent behavior, including tool calls, confirmations, refusals, and final responses, rather than labeling only single-turn outputs.

Can the dataset be used for pre-training?
No. The dataset is designed for safety fine-tuning, calibration, and evaluation, not large-scale pre-training.

How were safety violations handled?
Violations were identified through annotation and review, then corrected with a full rewrite to a policy-compliant ideal response.

How was quality assured?
The process combined multi-pass human review, reviewer adjudication with explicit decision outcomes, and automated checks for schema validity, metadata completeness, tool usage, and policy alignment.

What agreement is required to receive a sample?
A standard mutual NDA. Turing provides the countersigned agreement within one business day.

How quickly is a sample delivered?
Within three business days after NDA execution.