PC-Mix pairs speech from PartialSpoof v1.2 with a self-curated, partially spoofed background pool to create mixtures in which the authenticity of the speech and of the background (bona fide or spoofed) is controlled independently. The original samples are drawn from VGGSound and AudioCaps.

| Split | Mixed Samples | Original Samples | Background Source | Event Source | Fusion Method |
|---|---|---|---|---|---|
| Train | 25,380 | 6,345 | SONYC | 60% AudioLDM2, 40% UrbanSound8K (0–1 s) | Ducking Overlay |
| Dev | 24,844 | 6,211 | IHTApark-UBS | 60% AudioLDM2, 40% ESC-50 (0–1 s) | Ducking Overlay |
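
The tables name the fusion method but do not define it. A common reading of "Ducking Overlay", and the one assumed in this minimal Python sketch, is that an event is laid over the background while the background is attenuated ("ducked") for the event's duration, with short linear gain ramps on either side to avoid clicks. The function name `ducking_overlay` and the `duck_db` and `fade_ms` defaults are hypothetical, not published PC-Mix settings.

```python
import numpy as np


def ducking_overlay(background, event, sr, start, duck_db=-10.0, fade_ms=50.0):
    """Overlay `event` on `background` at sample `start`, ducking the background under it.

    Hypothetical sketch: `duck_db` and `fade_ms` are illustrative defaults,
    not published PC-Mix parameters.
    """
    background = np.asarray(background, dtype=np.float64)
    event = np.asarray(event, dtype=np.float64)
    end = start + len(event)
    if start < 0 or end > len(background):
        raise ValueError("event must fit inside the background")

    duck = 10 ** (duck_db / 20)  # dB -> linear amplitude gain
    ramp = int(sr * fade_ms / 1000)  # ramp length in samples

    # Background gain: 1.0 away from the event, `duck` while the event plays.
    gain = np.ones(len(background))
    gain[start:end] = duck
    # Linear ramp down into the duck just before the event...
    r0 = max(0, start - ramp)
    if start > r0:
        gain[r0:start] = np.linspace(1.0, duck, start - r0, endpoint=False)
    # ...and back up toward unity just after it.
    r1 = min(len(background), end + ramp)
    if r1 > end:
        gain[end:r1] = np.linspace(duck, 1.0, r1 - end, endpoint=False)

    out = background * gain
    out[start:end] += event
    return out
```

Ducking keeps the inserted event audible without simply raising the overall level; a plain additive overlay is the limiting case `duck_db=0.0`.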

| Subset | Mixed Samples | Original Samples | Background Source | Event Source | Fusion Method |
|---|---|---|---|---|---|
| E0 Baseline | 17,809 | 17,809 | SONYC | 60% AudioLDM2, 40% FSD50K (0–1 s) | Ducking Overlay |
| E1 Generator OOD | 14,247 | - | SONYC | AudioGen | Ducking Overlay |
| E2 Fusion OOD | 14,247 | - | SONYC | AudioLDM2 | Energy Matching + Crossfade (20–80 ms) |
| E3 Background OOD | 17,809 | - | DEMAND | AudioLDM2 | Ducking Overlay |
| E4 Noise OOD | 7,125 | - | WHAM Noise | AudioLDM2 | Ducking Overlay |
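
E2 changes only the fusion method, to "Energy Matching + Crossfade (20–80 ms)". One plausible reading, assumed in this Python sketch, is that the event replaces a segment of the background after being scaled to that segment's RMS level, and is spliced in with a linear crossfade whose length is drawn uniformly from 20–80 ms. The function `energy_matched_crossfade`, the splice-in-place behaviour, and the uniform draw are illustrative assumptions rather than the PC-Mix implementation.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a signal (tiny epsilon avoids division by zero)."""
    return float(np.sqrt(np.mean(np.square(x)))) + 1e-12


def energy_matched_crossfade(background, event, sr, start, rng=None,
                             fade_range_ms=(20.0, 80.0)):
    """Splice an energy-matched `event` into `background` at `start` with crossfades.

    Hypothetical sketch: the fade length is drawn uniformly from `fade_range_ms`
    to mirror the 20-80 ms range in the table; everything else is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    background = np.asarray(background, dtype=np.float64)
    event = np.asarray(event, dtype=np.float64)
    end = start + len(event)
    if start < 0 or end > len(background):
        raise ValueError("event must fit inside the background")

    segment = background[start:end]
    # Energy matching: scale the event to the RMS of the segment it replaces.
    scaled = event * (rms(segment) / rms(event))

    fade_ms = rng.uniform(*fade_range_ms)
    fade = min(int(sr * fade_ms / 1000), len(event) // 2)
    # Event weight w ramps 0 -> 1 at the start and 1 -> 0 at the end.
    w = np.ones(len(event))
    if fade > 0:
        ramp = np.linspace(0.0, 1.0, fade)
        w[:fade] = ramp
        w[-fade:] = ramp[::-1]

    out = background.copy()
    # Crossfade: background weight is (1 - w), so the two weights sum to 1.
    out[start:end] = (1.0 - w) * segment + w * scaled
    return out, fade_ms
```

Matching energy before splicing avoids a level jump at the boundaries, and the crossfade removes the waveform discontinuity; together they make the splice harder to spot than a plain overlay, which is presumably why E2 is treated as out-of-distribution for the fusion step.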

| Type | Audio |
|---|---|
| Original | |
| Speech Bona + Background Bona | |
| Speech Bona + Background Spoof | |
| Speech Spoof + Background Bona | |
| Speech Spoof + Background Spoof | |
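
Each mixture type pairs a bona fide or spoofed speech track with a bona fide or spoofed background. This short Python sketch enumerates the four combinations in the table's order; the label strings are illustrative, not an official PC-Mix label format. It also checks a relation visible in the Train/Dev table: in both splits the mixed count is exactly four times the original count, which is consistent with (though not stated as) one mixture per combination for each original sample.

```python
from itertools import product

# Authenticity values for each track; "bona" = bona fide, "spoof" = spoofed.
# These strings are illustrative, not an official PC-Mix label format.
AUTHENTICITY = ("bona", "spoof")

# (split, mixed samples, original samples) as listed in the Train/Dev table.
SPLITS = [("Train", 25_380, 6_345), ("Dev", 24_844, 6_211)]


def authenticity_combinations():
    """All (speech, background) authenticity pairs, in the example table's order."""
    return list(product(AUTHENTICITY, repeat=2))


def combination_name(speech, background):
    """Format a pair like the example table, e.g. 'Speech Bona + Background Spoof'."""
    return f"Speech {speech.capitalize()} + Background {background.capitalize()}"


def mixes_per_original(mixed, original):
    """Mixed samples per original sample (a float; 4.0 means exactly four)."""
    return mixed / original


if __name__ == "__main__":
    for speech, background in authenticity_combinations():
        print(combination_name(speech, background))
    for split, mixed, original in SPLITS:
        print(split, mixes_per_original(mixed, original))
```

The 4:1 ratio holds only for Train and Dev; E0 lists equal mixed and original counts, so the same assumption should not be carried over to the evaluation subsets.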