TIMIT

TIMIT is a corpus of phonemically and lexically transcribed read speech of American English speakers of both sexes and a range of regional dialects. Each transcribed element (phone and word) is time-aligned with the waveform.
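In the published corpus, the time alignment is stored in plain-text annotation files next to each waveform: each line of a phone-level (.phn) or word-level (.wrd) file gives a start sample, an end sample, and a label, with samples counted at the 16 kHz recording rate. The minimal sketch below, which assumes that standard file layout, converts such a file into segments measured in seconds; the path in the usage comment is a hypothetical example.

  # Illustrative sketch: parse a TIMIT-style time-aligned .phn (or .wrd) file.
  # Each non-empty line has the form "<start_sample> <end_sample> <label>",
  # with samples counted at the corpus's 16 kHz sampling rate.

  SAMPLE_RATE = 16000  # TIMIT waveforms are sampled at 16 kHz

  def read_alignment(path):
      """Return a list of (start_sec, end_sec, label) tuples."""
      segments = []
      with open(path) as f:
          for line in f:
              if not line.strip():
                  continue
              start, end, label = line.split()
              segments.append((int(start) / SAMPLE_RATE,
                               int(end) / SAMPLE_RATE,
                               label))
      return segments

  # Example usage (hypothetical path within a local copy of the corpus):
  # for start, end, phone in read_alignment("TRAIN/DR1/FCJF0/SA1.PHN"):
  #     print(f"{phone:6s} {start:.3f}-{end:.3f} s")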

TIMIT was designed to advance acoustic-phonetic research and the development and evaluation of automatic speech recognition systems. It was commissioned by DARPA, and the corpus design was a joint effort between the Massachusetts Institute of Technology (MIT), SRI International, and Texas Instruments (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for publication by the National Institute of Standards and Technology (NIST).[1] There is also a telephone-bandwidth version called NTIMIT (Network TIMIT).

TIMIT and NTIMIT are not freely available; access to the dataset requires either membership in the Linguistic Data Consortium or a monetary payment.

History

The TIMIT corpus was an early attempt to create a database of recorded speech samples.[2] It was published on CD-ROM in 1988 and contains only 10 sentences per speaker. Each speaker read two 'dialect' sentences as well as eight further sentences selected from a larger set.[3] In total, the 630 speakers contribute 6,300 sentences, each lasting a few seconds.[4] It was the first notable attempt to create and distribute a speech corpus, and the project as a whole cost about 1.5 million US dollars.[5]

The full name of the project is the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus,[6] and the acronym TIMIT stands for Texas Instruments/Massachusetts Institute of Technology. The corpus was created primarily to support the training and evaluation of speech recognition software. It has also served as a standardized baseline in evaluation campaigns such as the Blizzard Challenge, in which competing systems must synthesize speech from textual input.[7]

Machine Learning Method Comparison


A comparison of phoneme recognition methods on the TIMIT dataset:

  Study             Method        Accuracy (%)
  Cao and Fan[8]    KIRF          93.1
  Bird et al.[9]    DEvo MLP      92.85
  Cao and Fan[8]    NPCD/MPLSR    92.8
  Cao and Fan[8]    NPCD/PCA      92.1
  Cao and Fan[8]    MPLSR         91.1
  Cao and Fan[8]    PDA/Ridge     91.1
  Li and Ghosal     UMP           89.25
  Li and Ghosal     MLO           85.25
  Li and Ghosal     QDA           83.75
  Ager et al.       GMM           81.5
  Li and Yu[10]     FSDA          81.5
  Li and Yu[10]     FSVM          78
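For context on what the table measures, the sketch below illustrates a generic phoneme-classification baseline of the kind the "GMM" row refers to: one Gaussian mixture model is fitted per phoneme class on pre-computed acoustic feature vectors, and test segments are assigned to the class with the highest likelihood. It is not a re-implementation of any cited study; the feature matrix X, the label vector y, and the use of scikit-learn are illustrative assumptions.

  # Illustrative sketch (not any cited system): per-class GMM phoneme classifier.
  # X is an (n_segments, n_features) array of acoustic features, one row per
  # labelled phoneme segment; y holds the corresponding phoneme labels.
  # Equal class priors are assumed when comparing likelihoods.
  import numpy as np
  from sklearn.mixture import GaussianMixture

  def fit_gmm_classifier(X, y, n_components=8):
      """Fit one diagonal-covariance GMM per phoneme class."""
      models = {}
      for label in np.unique(y):
          gmm = GaussianMixture(n_components=n_components,
                                covariance_type="diag", random_state=0)
          gmm.fit(X[y == label])
          models[label] = gmm
      return models

  def predict(models, X):
      """Assign each feature vector to the class with the highest log-likelihood."""
      labels = list(models)
      scores = np.stack([models[l].score_samples(X) for l in labels], axis=1)
      return np.array(labels)[scores.argmax(axis=1)]

  # Accuracy of the kind reported above would then be the fraction of test
  # segments whose predicted label matches the reference label.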

See also

  • Comparison of datasets in machine learning

References

  1. Fisher, William M.; Doddington, George R.; Goudie-Marshall, Kathleen M. (1986). The DARPA Speech Recognition Research Database: Specifications and Status. pp. 93–99.
  2. Morales, Nicolas; Tejedor, Javier; Garrido, Javier; Colas, Jose; Toledano, Doroteo T. (2008). "STC-TIMIT: Generation of a single-channel telephone corpus". Proceedings of the Sixth International Language Resources and Evaluation (LREC'08): 391–395.
  3. Lamel, Lori F.; Kassel, Robert H.; Seneff, Stephanie (1986). Speech Database Development: Design and Analysis of the Acoustic-Phonetic Corpus (Technical report). DARPA (SAIC-86/1546).
  4. Garofolo, John S.; Lamel, Lori F.; Fisher, William M.; Fiscus, Jonathan G.; Pallett, David S.; Dahlgren, Nancy L. (1993). DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM (Technical report). National Institute of Standards and Technology. doi:10.6028/nist.ir.4930.
  5. Chanchaochai, Nattanun; Cieri, Christopher; Debrah, Japhet; Ding, Hongwei; Jiang, Yue; Liao, Sishi; Liberman, Mark; Wright, Jonathan; Yuan, Jiahong; Zhan, Juhong; Zhan, Yuqing (2018). GlobalTIMIT: Acoustic-Phonetic Datasets for the World's Languages. Interspeech 2018. ISCA. doi:10.21437/interspeech.2018-1185.
  6. Bauer, Patrick; Scheler, David; Fingscheidt, Tim (2010). WTIMIT: The TIMIT Speech Corpus Transmitted Over The 3G AMR Wideband Mobile Network. LREC.
  7. Sawada, Kei; Asai, Chiaki; Hashimoto, Kei; Oura, Keiichiro; Tokuda, Keiichi (2016). The NITech text-to-speech system for the Blizzard Challenge 2016. Blizzard Challenge 2016 Workshop.
  8. Cao, Jiguo; Fan, Guangzhe (2010). Signal Classification Using Random Forest with Kernels. IEEE. doi:10.1109/aict.2010.81. ISBN 978-1-4244-6748-8.
  9. Bird, Jordan J.; Wanner, Elizabeth; Ekárt, Anikó; Faria, Diego R. (2020). "Optimisation of phonetic aware speech recognition through multi-objective evolutionary algorithms". Expert Systems with Applications. Elsevier BV. 153: 113402. doi:10.1016/j.eswa.2020.113402. ISSN 0957-4174.
  10. Li, Bin; Yu, Qingzhao (2008). "Classification of functional data: A segmentation approach". Computational Statistics & Data Analysis. Elsevier BV. 52 (10): 4790–4800. doi:10.1016/j.csda.2008.03.024. ISSN 0167-9473.