Arabic Full-Duplex Conversational Dataset
MarketplaceConversational Arabic captured in full-duplex stereo across Levantine, Gulf, Egyptian, and Maghrebi dialects.
Overview
Naturalistic, two-speaker Arabic conversations captured at studio quality in full-duplex stereo. Pairs of native Arabic speakers from Egypt, the Levant, the Gulf, and the Maghreb discuss everyday topics for the full duration of the session — no read scripts, no scene cuts. Each recording preserves real overlapping speech, backchannels, hesitations, and code-switching, so downstream models train on the way Arabic actually sounds in the wild. Every clip is collected from paid contributors with explicit consent, scene-level provenance, and metadata for speaker demographics, dialect, and acoustic environment.
Key highlights
- 01
Levantine, Gulf, Egyptian, and Maghrebi dialects paired with explicit per-speaker dialect tags — not collapsed into a single "Arabic" label.
- 02
MSA (Modern Standard Arabic) ↔ colloquial register switching preserved within sessions, mirroring real bilingual cognitive patterns.
- 03
Religious greetings (As-salāmu ʿalaykum, in shāʾ Allāh), kinship terms, and culturally embedded honorifics captured naturally.
- 04
French and English loanwords from North African and Levantine speakers tagged at the utterance level for code-switching research.
- 05
Disfluencies — filled pauses (uh, um, hmm), false starts, self-repairs, hesitations, laughter, sighs, breath, and throat clears — are preserved with utterance-level timestamps rather than normalised away, so models can learn from them or filter them out as a first-class signal.
Technical specifications
Coverage
Hundreds of paired sessions from native Arabic speakers across the Arab world — coverage extends to bespoke dialects, age groups, and topical targets on request.
Capture specs
Stereo full-duplex audio at 48 kHz / 24-bit per channel from studio-grade microphones, with per-speaker channel isolation, calibrated noise floor, and continuous capture for the full lifespan of each session — not cherry-picked moments.
Annotations
Every session ships with rich speaker / contributor metadata (age, gender, region, dialect, native language, acoustic environment) plus an utterance-level annotation layer: emotion tags (joy, frustration, neutral, surprise, sadness, anger, amusement, empathy, and more), topic tags spanning everyday domains (work, family, sports, travel, health, finance, technology, food, pop culture, politics, education), intent labels (question, agreement, backchannel, hedge, interruption, repair, opinion), turn-taking markers (overlap onset/offset, gap, hold, yield), and prosody cues (pitch contour, stress, laughter, sighs, hesitation, code-switch boundaries). Custom annotation schemas — domain-specific intents, fine-grained emotion taxonomies, named-entity spans, sentiment scoring, or any task-specific labels — are available on request.
Use cases
- Full-duplex conversational AI training and evaluation
- Speaker diarization and Arabic ASR / TTS modelling
- Turn-taking, backchannel, and overlap-handling research
- Emotion-aware and intent-aware voice agent fine-tuning
- Voice agent benchmarks for natural, multi-party conversation
Request samples
Share your use case and we'll send sample clips, pricing, and recommended next steps for your pipeline.
More datasets
American English Full-Duplex Conversational Dataset
Two-speaker American English conversations captured in full-duplex stereo, covering everyday topics with overlapping speech, backchannels, and natural disfluencies preserved.
Languages
Countries