Slovenian National Corpus

The Slovenian National Corpus FidaPLUS is a 621-million-word (token) corpus of the Slovenian language, compiled from selected Slovenian texts of various genres and styles, mainly books and newspapers.[1]

The FidaPLUS database is an upgrade of the older FIDA corpus, which was developed between 1997 and 2000. FidaPLUS adds texts published up to 2006. It was the result of an applied research project of the Faculty of Arts and the Faculty of Social Sciences, both at the University of Ljubljana, and the Department of Knowledge Technologies of the Jožef Stefan Institute.[2]

The corpus is available through the corpus manager Sketch Engine.[3] This version of FidaPLUS includes word sketches: automatically generated, corpus-derived summaries of a word's grammatical and collocational behaviour.

Year of publication   Number of words   Percent
1979-1990                     262,708     0.04%
1991                        1,487,895     0.24%
1992                        2,256,692     0.36%
1993                        3,208,687     0.52%
1994                        7,534,689     1.21%
1995                        7,433,897     1.20%
1996                       16,913,916     2.72%
1997                       31,589,250     5.09%
1998                       43,512,041     7.01%
1999                       54,711,630     8.81%
2000                       57,677,534     9.29%
2001                       74,720,532    12.03%
2002                       72,802,484    11.72%
2003                       82,897,097    13.35%
2004                       67,041,167    10.79%
2005                       39,086,695     6.29%
2006                       44,526,825     7.17%
N/A                        13,486,261     2.17%
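
The Percent column is each row's word count divided by the total number of words in the corpus. As a rough check, the short Python sketch that follows copies the counts from the table, sums them, and recomputes each share rounded to two decimal places. The variable names are illustrative only, and reading "N/A" as texts without a recorded publication year is an assumption.

```python
# Word counts per year of publication, copied from the FidaPLUS table.
# "N/A" is assumed to cover texts without a recorded publication year.
counts = {
    "1979-1990": 262_708,
    "1991": 1_487_895,
    "1992": 2_256_692,
    "1993": 3_208_687,
    "1994": 7_534_689,
    "1995": 7_433_897,
    "1996": 16_913_916,
    "1997": 31_589_250,
    "1998": 43_512_041,
    "1999": 54_711_630,
    "2000": 57_677_534,
    "2001": 74_720_532,
    "2002": 72_802_484,
    "2003": 82_897_097,
    "2004": 67_041_167,
    "2005": 39_086_695,
    "2006": 44_526_825,
    "N/A": 13_486_261,
}

# The corpus size is the sum of all rows.
total = sum(counts.values())
print(f"Total: {total:,} words")  # Total: 621,150,000 words

# Each row's share is its count divided by the total, expressed as a
# percentage and rounded to two decimal places.
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}
for label, n in counts.items():
    print(f"{label:<10}{n:>12,}{shares[label]:>8.2f}%")
```

Because each share is rounded independently, the rounded percentages need not add up to exactly 100%.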

References

  1. The FidaPLUS number of words by date of publication. Archived from the original on 2010-11-14. Retrieved 2012-03-15.
  2. The FidaPLUS team list and institutional affiliations. Archived from the original on 2012-03-21. Retrieved 2011-03-22.
  3. FidaPLUS corpus in Sketch Engine
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.