Blitz Latin’s quality testing

More than 4,000 files were used to develop and test Blitz Latin. As a result, every word that occurs more than four times across these files will be recognised. The only unknown words (excepting proper names) that Blitz Latin will encounter are those that occur four or fewer times across the whole file set. Most of these less frequent words will nevertheless be translated.
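
The frequency criterion above can be illustrated with a short sketch. This is not Blitz Latin's own tooling, which is not described here; the function names `count_words` and `frequent_words` are hypothetical, and splitting on unaccented ASCII letters is a simplifying assumption that ignores macrons, ligatures and abbreviations.

```python
import re
from collections import Counter


def count_words(texts):
    """Count how often each word occurs across all the given texts.

    Assumption: a "word" is a run of ASCII letters, compared in lower case.
    """
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def frequent_words(counts, threshold=4):
    """Return the words that occur MORE than `threshold` times.

    With threshold=4 this mirrors the "more than four times" rule:
    these are the words a dictionary would need to cover.
    """
    return {word for word, n in counts.items() if n > threshold}


# Tiny illustrative corpus (hypothetical sample data, not the real test files).
sample_texts = ["arma virumque cano", "arma arma arma", "arma cano"]
sample_counts = count_words(sample_texts)
must_recognise = frequent_words(sample_counts)
```

In this toy corpus only "arma" occurs more than four times, so it is the only word that the criterion would require to be recognised; "cano" and "virumque" fall into the less frequent group.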

The program has been tested without failure on multiple scans of nearly 1,000 Latin texts downloaded from The Latin Library, a prime repository of Latin files in HTML format, although of uncertain accuracy. The files contain all the works of such well-known Latin authors as St. Augustine, Caesar, Cicero, Livy, Ovid and Vergil, as well as numerous other files, including fragmentary texts by less well-known authors. The file set, which contains nearly 5 million Latin words in all, can therefore be regarded as representative of classical Latin.

Other test documents included:
  • 0.5 million words from 180 medieval and modern Latin texts at the <> site.
  • 1.7 million words of legal Latin from Justinian’s Digest/Codex and from Theodosius’ Codex.
  • 600,000 words from the Vulgate Latin Bible, with all the most common words incorporated.
  • 7.8 million words from medieval documents (mostly from ‘Augsburg’, see ‘useful Web addresses’), translated and with all the most common words incorporated.
  • 1 million Latin words describing the medieval theory of music (‘TMT project’) have been processed, as well as 600,000 Latin words from Bracton’s medieval Law.
  • 7.3 million words from the P.H.I. CD ROM No. 5.3 (many overlapping with the Latin files described above), courtesy of the Packard Humanities Institute, USA. These include all known Latin texts, including fragments but excluding inscriptions, up to about AD 200, and many later texts.
  • 1.0 million words from neo-Latin texts on mathematics (courtesy of Ian Bruce, Australia).
  • 11.5 million words from the Vatican’s Acta Apostolicae Sedis files (AD 1909-2002, after bulk removal of non-Latin content).