Product: Hi-Fi, Multilingual Full-Duplex Datasets

Full-Duplex Conversational Datasets

Two-speaker conversations captured in full-duplex stereo across languages and dialects, with overlapping speech, backchannels, and natural disfluencies preserved.

Languages: American English, French, Arabic, Spanish, Indonesian, Russian, Mandarin, Vietnamese, Japanese, Thai, German, Hindi, Korean, and Polish

Encoding Human Expertise into Machines.

Human voice, vision, reasoning, & expertise — encoded into training data by our Expert Network and Data Foundry.

10hr Multi-Accent English ASR Dataset

An open-source, multi-accent English speech dataset spanning 11 countries, balanced across genders and accents, for training Automatic Speech Recognition (ASR) models. It contains 7,377 recordings totaling 10.25 hours of audio.

The Problem

AI can solve olympiad problems. It still lacks human nuance.

We're in a Technological Renaissance. Models have memorized the internet. They can write essays, pass bar exams, and prove theorems. But ask one to negotiate a deal, comfort a grieving patient, or speak with the warmth and timing of a real human voice, and the illusion breaks.

Human expertise is staggeringly complex. It spans every modality and every culture: how a surgeon sees the one shadow on a scan that changes everything, how a trader hears risk in a pause between words, how a therapist reads a face before a single sentence is spoken, how meaning shifts between languages, accents, and dialects that no model was trained to understand.

None of this was ever in the training data. Scaling compute won't conjure it. Synthetic data won't approximate it. The bottleneck was never intelligence; it's the richness of lived human experience.

Our Solution

We encode human expertise into models that work for the real world.

We're an applied research lab building the data infrastructure and human expertise network to encode real-world knowledge into frontier models, across every modality, language, and domain.

We partner with elite professionals to capture what they actually do. The reasoning behind a diagnosis. The instinct in a negotiation. The cadence of a native speaker. The engineering insight in a design decision. The micro-expressions a machine has never been taught to see.

This flows through our Data Foundry, a purpose-built engine that transforms raw expertise into structured training data, alignment signals, and rigorous evaluations at scale. From PhD mathematicians and voice actors to constitutional lawyers and linguists, every discipline, accent, and dialect gets its own pipeline.

Expert-Level Training Data: Datasets, Alignments, Evals, Benchmarks

Data Foundry: Tasks, Tools, RL Environments, and Rubrics

Elite Expert Network: Domain Experts, Linguists, Researchers, Global Workforce

Voice & Speech Data

Models can speak. Teaching them how to sound human is the real work.

Voice is becoming the primary AI interface. Every frontier lab is moving voice-first, and users no longer judge an assistant by what it knows — they judge it by how it sounds. By naturalness, emotional intelligence, responsiveness, the prosody of a real human voice, and the millisecond timing of an actual conversation.

Speech is harder than text. The same sentence can read sarcastic, calm, anxious, or confident. Audio carries accent, code-switching, room noise, mispronunciation, and the timing of barge-in and backchannel. Right-vs-wrong benchmarks break down — voice is evaluated subjectively, line by line, by the people who hear it.

The durable moat is the data around the model: full-duplex captures, emotional tagging, prosodic markers, scenario-anchored conversations, human preference data, and the evaluation loops that turn raw audio into training signal. This is the catalogue we ship to the labs building the next generation of conversational AI.

Full-Duplex Conversational Datasets

Two-speaker conversations captured at 48 kHz with isolated channels, overlap, backchannels, and barge-in preserved verbatim — the training audio behind real-time, conversational voice agents.
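
Because each speaker sits on an isolated channel, overlapping speech can be located by checking when both channels are active at the same time. The sketch below is illustrative only: it assumes each channel is already decoded to a list of float samples in the range -1.0 to 1.0, and it uses a simple frame-energy threshold in place of a real voice activity detector. The function names, the 20 ms frame size, and the 0.02 threshold are assumptions for the example, not part of any published dataset tooling.

```python
import math

def frame_rms(samples, frame_len):
    """Split a mono sample list into fixed-size frames and return per-frame RMS."""
    rms = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(s * s for s in frame) / frame_len))
    return rms

def overlap_regions(left, right, sample_rate=48_000, frame_ms=20, threshold=0.02):
    """Return (start_s, end_s) spans where both channels exceed the energy threshold.

    The threshold and frame size are illustrative assumptions, not tuned values.
    """
    frame_len = sample_rate * frame_ms // 1000
    active_l = [r > threshold for r in frame_rms(left, frame_len)]
    active_r = [r > threshold for r in frame_rms(right, frame_len)]
    n = min(len(active_l), len(active_r))
    regions, start = [], None
    for i in range(n):
        both = active_l[i] and active_r[i]
        if both and start is None:
            start = i                      # overlap begins at this frame
        elif not both and start is not None:
            regions.append((start * frame_ms / 1000, i * frame_ms / 1000))
            start = None                   # overlap ended before this frame
    if start is not None:                  # overlap runs to the end of the audio
        regions.append((start * frame_ms / 1000, n * frame_ms / 1000))
    return regions
```

Short overlap spans found this way are candidate backchannels, and longer ones are candidate barge-ins; telling them apart reliably would still need human labels or a stronger detector.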

Domain-Specific Speech Datasets

Task-anchored sessions across medical intake, customer support, technical interviews, and emergency calls — tagged by scenario, role, and intent for vertical voice agents.

Scripted Voice Datasets

Single-speaker performance reads from voice actors and trained narrators, phonetically balanced with controlled emotion ranges and multiple takes per line — production-grade material for TTS, voice cloning, and speech-to-speech.

Transcription (00:00:12)

Yeah, so I was thinking, <breathe/> maybe we could push the release until [hesitation] next Thursday? [laughter]

That's not a bad idea, actually. [agreement] Let me check the calendar.
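
The sample above mixes spoken words with inline event markers. As a rough sketch of how such lines could be separated into clean text and an event list, the snippet below assumes markers appear either as self-closing tags like `<breathe/>` or as bracketed labels like `[laughter]`. That tag syntax is inferred only from the two sample lines; the actual annotation schema may differ, and `split_tags` is a hypothetical helper name.

```python
import re

# Matches <label/> or [label]; syntax inferred from the sample lines only.
TAG_PATTERN = re.compile(r"<(\w+)/>|\[(\w+)\]")

def split_tags(line):
    """Return (clean_text, events) for one annotated transcript line."""
    # findall yields (tag_label, bracket_label) pairs; one of the two is empty.
    events = [a or b for a, b in TAG_PATTERN.findall(line)]
    text = TAG_PATTERN.sub(" ", line)               # drop the markers
    text = re.sub(r"\s+", " ", text).strip()        # collapse leftover spaces
    text = re.sub(r"\s+([,.?!])", r"\1", text)      # no space before punctuation
    return text, events

line = ("Yeah, so I was thinking, <breathe/> maybe we could push the release "
        "until [hesitation] next Thursday? [laughter]")
text, events = split_tags(line)
print(text)
print(events)
```

Keeping the events as a separate list lets the same line serve both as plain ASR text and as a source of non-lexical labels.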

Annotation & Evaluation Datasets

Word-level transcripts, diarization, prosodic markers, scenario and role labels, continuous emotional tagging, and human preference scores — the training signal that turns raw audio into controllable, evaluable speech.

Available in 40+ languages