Datasets

Common Voice

Common Voice Spontaneous Speech 2.0 - Gheg Albanian

A collection of spontaneous spoken phrases in Gheg Albanian.
License: CC0-1.0

Locale: aln

Task: ASR

Format: MP3

Size: 200.85 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Adyghe

A collection of spontaneous spoken phrases in Adyghe.
License: CC0-1.0

Locale: ady

Task: ASR

Format: MP3

Size: 107.44 MB

Common Voice

Common Voice Spontaneous Speech 2.0 - Arvanitika

A collection of spontaneous spoken phrases in Arvanitika.
License: CC0-1.0

Locale: aat

Task: ASR

Format: MP3

Size: 46.68 MB

Common Voice

Common Voice Scripted Speech 24.0 - Teutila Cuicatec

A collection of scripted spoken phrases in Teutila Cuicatec.
License: CC0-1.0

Locale: cut

Task: ASR

Format: MP3

Size: 209.52 MB

Common Voice

Common Voice Scripted Speech 24.0 - Norwegian Nynorsk

A collection of scripted spoken phrases in Norwegian Nynorsk.
License: CC0-1.0

Locale: nn-NO

Task: ASR

Format: MP3

Size: 33.55 MB

My Cool Organization Changed Again

rm-vallader test

rm-vallader test
License: BSD-3-Clause

Locale: rm-vallader

Task: NLP

Format: MP3

Size: 2.63 MB

Mozilla

checksum dataset

License: Apache-2.0

Locale: en-US

Task: N/A

Format: Not specified

Size: 914.69 KB

Mozilla

dawdad

wadaddwa
License: Apache-2.0

Locale: awdad

Task: NLP

Format: awdawd

Size: 34.00 MB

MozFam

Common Voice AZ DF

That's my little test upload. It contains the cv 10 corpus for az.
License: CC0-1.0

Locale: az

Task: ASR

Format: mp3

Size: 3.41 MB

Mozilla

test

test
License: Apache-2.0

Locale: en-US

Task: N/A

Format: WAV

Size: 2.63 MB

Common Voice

Dataset for API & Python SDK Tests [Do not remove] - Mock Spontaneous Speech English

DO NOT DELETE. E2E tests of the Python SDK depend on this test dataset
License: CC-BY-4.0

Locale: en-US

Task: NLP

Format: CSV

Size: 119.84 KB

Community

Community Dataset

Community Dataset
License: CC-BY-SA-4.0

Locale: en-US

Task: RAG

Format: MP3

Size: 2.76 MB

My Cool Organization Changed Again

Community Dataset

My Community Dataset
License: BSD-3-Clause

Locale: en-US

Task: MT

Format: MP3

Size: 2.76 MB

Mozilla

Otro dataset bonito

This is a fairly short description to describe my dataset
License: CC-BY-ND-4.0

Locale: es_MX

Task: NLP

Format: wav

Size: 180.78 MB

Mozilla Foundation

test 3.0

testing stuff
License: CC-BY-NC-SA-4.0

Locale: en

Task: MT

Format: Not specified

Size: 914.69 KB

Common Voice

file upload edit2

test
License: Apache-2.0

Locale: en-CA

Task: TTS

Format: MP3

Size: 72.21 MB

Mozilla Foundation

Test 2.0

Test 2
License: CC-BY-4.0

Locale: nhi

Task: NLP

Format: TXT

Size: 914.69 KB

MoFo-BetaBugBash

JohannBetaBugBashDataset

My Beta Bug Bash Dataset
License: CC-0

Locale: en-US

Task: CALL

Format: MP3

Size: 7.57 MB

MDC

Antarctic Penguin Observation

A comprehensive collection of field observations of three Antarctic penguin species (Emperor, Adelie, Gentoo) gathered between 2015 and 2023.
License: BSD Zero Clause License

Locale: en-US

Task: ASR

Format: MP3

Size: 34.00 MB

Elotl

Otro bonito dataset

This dataset is for testing that I can upload datasets to MDC
License: CC-BY-4.0

Locale: es-MX

Task: ASR

Format: MP3

Size: 3.15 MB

Elotl

My bonito dataset

This is a very fitting description for my dataset. TQM Elotl
License: CC-BY-4.0

Locale: en-US

Task: NLP

Format: wav

Size: 3.15 MB

Common Voice

kostis-test-28oct

License: cc

Locale: en-US

Task: N/A

Format: Not specified

Size: 12.06 MB

Mozilla Foundation

ReRooted 1.0

A speech corpus of Syrian Armenian refugee testimonials
License: GPL-3.0

Locale: en-US

Task: OTH

Format: WAV, TSV

Size: 914.69 KB

Common Voice

md test

testing markdown
License: cc-0

Locale: en-US

Task: NLU

Format: mp3

Size: 2.76 MB