Speech Recognition Datasets for Congolese Languages
2023
Metrics Details
- Usage: 350
- Views: 329
- Downloads: 21
Dataset Description
This dataset contains two new benchmark corpora designed for low-resource languages spoken in the Democratic Republic of the Congo: the Lingala Read Speech Corpus (LRSC), with 4.3 hours of labelled audio, and the Congolese Speech Radio Corpus (CSRC), which offers 741 hours of unlabelled audio spanning four significant low-resource languages of the region (Lingala, Tshiluba, Kikongo and Congolese Swahili). Speech and audio were collected through two processes: (1) for LRSC, 32 Congolese adult participants were instructed to sit in a relaxed manner within centimetres of an audio recording device or smartphone and read the text utterances aloud; (2) for CSRC, recordings from the archives of a broadcast station were pre-processed and curated. Congolese languages tend to fall into the “low-resource” category: in contrast to “high-resource” languages, they have fewer accessible datasets, which limits the development of Conversational Artificial Intelligence. This results in cr...