A simple audio/speech dataset consisting of recordings of spoken digits in wav files at 8 kHz. The recordings are trimmed so that they have near minimal silence at the beginnings and ends.
FSDD is an open dataset, which means it will grow over time as data is contributed. To enable reproducibility and accurate citation, the dataset is versioned using a Zenodo DOI as well as git tags.
Files are named in the following format:
{digitLabel}_{speakerName}_{index}.wav
Example: 7_jackson_32.wav
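Because the label, speaker and index are all encoded in the file name, they can be recovered without any extra metadata. The sketch below is illustrative and is not part of the repository; the function name `parse_fsdd_filename` and the `Recording` type are hypothetical, and it assumes the digit label is a single character 0-9 and that speaker names contain no underscores.

```python
import os
import re
from typing import NamedTuple

# Matches {digitLabel}_{speakerName}_{index}.wav, e.g. "7_jackson_32.wav".
# Assumption: speaker names never contain an underscore.
_FSDD_NAME = re.compile(r"^(\d)_([^_]+)_(\d+)\.wav$")


class Recording(NamedTuple):
    digit: int
    speaker: str
    index: int


def parse_fsdd_filename(path: str) -> Recording:
    """Split an FSDD file name into its digit label, speaker name and index."""
    name = os.path.basename(path)
    match = _FSDD_NAME.match(name)
    if match is None:
        raise ValueError(f"not an FSDD file name: {name!r}")
    digit, speaker, index = match.groups()
    return Recording(int(digit), speaker, int(index))


print(parse_fsdd_filename("recordings/7_jackson_32.wav"))
# Recording(digit=7, speaker='jackson', index=32)
```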
metadata.py
Contains metadata on the speakers' gender and accents.
trimmer.py
Trims silence at the beginning and end of an audio file, and splits an audio file into multiple audio files at periods of silence.
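One common way to trim leading and trailing silence is a per-frame amplitude threshold. The sketch below illustrates that idea only; it is not the algorithm used by trimmer.py, which is not described here. The function name `trim_silence`, the 0.02 threshold and the 80-sample frame (10 ms at 8 kHz) are hypothetical choices, and samples are assumed to be floats in [-1.0, 1.0]. It covers trimming only, not splitting on interior silences.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.02,
                 frame_len: int = 80) -> List[float]:
    """Drop leading and trailing frames whose peak amplitude is below threshold.

    Hypothetical parameters: threshold on |sample| and frame_len in samples
    (80 samples = 10 ms at 8 kHz). Trimming happens at frame granularity.
    """
    n_frames = (len(samples) + frame_len - 1) // frame_len
    # A frame counts as "loud" if any sample in it reaches the threshold.
    loud = [
        max(abs(s) for s in samples[i * frame_len:(i + 1) * frame_len]) >= threshold
        for i in range(n_frames)
    ]
    if not any(loud):
        return []  # the whole signal is below the threshold
    first = loud.index(True)
    last = n_frames - 1 - loud[::-1].index(True)
    return list(samples[first * frame_len:(last + 1) * frame_len])
```

For production use, an established routine such as `librosa.effects.trim` (decibel-based) is usually preferable to a fixed linear threshold.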
fsdd.py
A simple class that provides an easy-to-use API for accessing the data.
spectogramer.py
Used for creating spectrograms of the audio data. Spectrograms are often a useful pre-processing step.
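A spectrogram is the magnitude of a short-time Fourier transform: the signal is cut into overlapping windowed frames and each frame is transformed to the frequency domain. The dependency-free sketch below shows that computation; it does not reproduce spectogramer.py, whose window size, hop and scaling are not given here. The names `hann` and `spectrogram` and the defaults of 256-sample frames with a 128-sample hop are assumptions, and it uses a naive DFT for clarity rather than speed.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(samples: Sequence[float], frame_len: int = 256,
                hop: int = 128) -> List[List[float]]:
    """Magnitude spectrogram: one row per frame, frame_len // 2 + 1 bins per row.

    Uses an O(frame_len^2) DFT per frame; a real pipeline would use an FFT
    (for example numpy.fft.rfft or scipy.signal.spectrogram).
    """
    window = hann(frame_len)
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        rows.append(row)
    return rows
```

At 8 kHz with 256-sample frames, each bin spans 8000 / 256 = 31.25 Hz, so a 1 kHz tone peaks in bin 32.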
The official test set consists of the first 10% of the recordings: recordings with index 0-4 (inclusive) are in the test set, and those with index 5-49 are in the training set.
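Since the index is the last field of each file name, the official split can be reproduced from file names alone. This is a minimal sketch; `fsdd_split` is a hypothetical helper, not a function provided by fsdd.py.

```python
import os
from typing import Iterable, List, Tuple


def fsdd_split(paths: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (train, test) following the official FSDD split.

    The index is the last underscore-separated field of the file name
    ({digitLabel}_{speakerName}_{index}.wav); indices 0-4 go to the test set.
    """
    train: List[str] = []
    test: List[str] = []
    for path in paths:
        stem = os.path.splitext(os.path.basename(path))[0]
        index = int(stem.rsplit("_", 1)[1])
        (test if index <= 4 else train).append(path)
    return train, test
```

Splitting on the recording index rather than at random keeps the split fixed as new recordings are contributed.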
Did you use FSDD in a paper, project or app? Add it here!
Licensed under Creative Commons Attribution-ShareAlike 4.0 International.