See the "Demo" section below for a quick introduction to the key concepts.
SCTK has at its core sclite (Score-Lite), a flexible dynamic programming alignment engine used to "align" errorful hypothesis texts, such as the output of an ASR system, with the correct reference texts. After alignment, sclite generates a variety of summary and detailed scoring reports.
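The alignment idea can be illustrated with a textbook word-level edit-distance (Levenshtein) alignment. This is a sketch, not sclite's implementation: the function name `align` and the unit costs for substitutions, insertions, and deletions are assumptions for illustration, and sclite's own cost settings may differ. It returns the number of correct words, substitutions, deletions, and insertions, the same four quantities reported on the `Scores: (#C #S #D #I)` lines of the demo output.

```python
def align(ref, hyp, sub_cost=1, ins_cost=1, del_cost=1):
    """Align two word lists by minimum edit cost and return (C, S, D, I) counts."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = cheapest way to turn the first i ref words into the first j hyp words
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * del_cost
    for j in range(1, m + 1):
        cost[0][j] = j * ins_cost

    def diag(i, j):
        # Cost of pairing ref[i-1] with hyp[j-1]: free if they match, else a substitution
        return cost[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else sub_cost)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(diag(i, j),
                             cost[i - 1][j] + del_cost,
                             cost[i][j - 1] + ins_cost)

    # Trace one optimal path back from the bottom-right cell, counting each edit type
    c = s = d = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == diag(i, j):
            if ref[i - 1] == hyp[j - 1]:
                c += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + del_cost:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return c, s, d, ins


# The demo's second utterance, lowercased (sclite's report prints errors in uppercase)
ref = "for a two trillion dollar business built on public confidence this trend is disheartening at best and downright dangerous at worst".split()
hyp = "freed to trying to lure business built on public confidence this trend is this tightening and best and downright dangerous at worst".split()
print(align(ref, hyp))  # → (14, 7, 0, 1)
```

With unit costs, every minimum-cost alignment of this utterance gives the same counts, and they agree with the demo's `Scores` line. In general, different cost settings or tie-breaking rules can change how errors are split among S, D, and I.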
This version of sclite comes bundled with the CMU-Cambridge Statistical Language Modeling Toolkit v2, which is used to compute word weights from an N-gram language model. The directory 'src/slm_v2' contains the complete distribution and is compiled automatically by the installation scripts.
While sclite aligns and scores a single system, sc_stats compares performance across two or more systems, provided the systems under test were run on identical test data using an identical test paradigm. Inter-system comparisons are made by running paired-comparison statistical significance tests.
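To illustrate what a paired-comparison test does, here is a minimal exact sign test over per-speaker error counts for two systems scored on the same test set. It is a sketch under assumptions, not sc_stats's code: the function name `sign_test`, the use of speakers as the pairing unit, and the error counts in the example are all hypothetical, and the sign test stands in for the paired-comparison idea in general rather than for the exact set of tests sc_stats runs.

```python
from math import comb


def sign_test(errors_a, errors_b):
    """Exact two-sided sign test on paired error counts for systems A and B.

    Each position is one pairing unit (e.g. one speaker) scored for both systems.
    Ties are discarded, as is usual for the sign test.
    Returns (units where A has fewer errors, units where B has fewer errors, p-value).
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("both systems must be scored on the same units")
    a_better = sum(1 for a, b in zip(errors_a, errors_b) if a < b)
    b_better = sum(1 for a, b in zip(errors_a, errors_b) if b < a)
    n = a_better + b_better
    if n == 0:
        return a_better, b_better, 1.0
    k = min(a_better, b_better)
    # Under the null hypothesis each untied unit favours A or B with probability 1/2,
    # so either count follows Binomial(n, 1/2); double the tail at the smaller count.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_better, b_better, min(1.0, 2 * tail)


# Hypothetical per-speaker word-error counts for two systems on the same test set
sys_a = [3, 5, 2, 4, 6, 1, 3, 2]
sys_b = [5, 6, 2, 7, 8, 3, 4, 5]
print(sign_test(sys_a, sys_b))  # → (7, 0, 0.015625)
```

The test only looks at which system did better on each unit, not by how much, which is why it needs identical test data: every unit must have a score from both systems.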
ROVER (Recognizer Output Voting Error Reduction) is a tool that combines an arbitrary number of ASR system outputs into a composite word transition network, which is then searched and scored to retrieve the best-scoring word sequence.
The program is documented in the paper "A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)", presented at the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding. The paper is also available in PostScript.
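The voting step can be sketched as follows, assuming the system outputs have already been aligned into a word transition network, represented here as a list of slots holding one word (or a null marker) per system. ROVER builds that network itself by aligning the outputs with dynamic programming; this sketch skips that step and uses plain frequency-of-occurrence voting only. The paper also describes voting schemes that use word confidence scores, which are omitted here. The `@` null marker, the function name `rover_vote`, and the three example outputs are hypothetical.

```python
from collections import Counter

NULL = "@"  # hypothetical marker meaning "this system has no word in this slot"


def rover_vote(slots):
    """Return the word sequence chosen by majority vote over a pre-aligned network.

    Each slot holds one entry per system. The most frequent entry wins; on a tie,
    max() keeps the entry from the earliest system. Winning null markers are dropped.
    """
    output = []
    for slot in slots:
        counts = Counter(slot)
        best = max(slot, key=lambda word: counts[word])
        if best != NULL:
            output.append(best)
    return output


# Three hypothetical system outputs, already aligned slot by slot:
#   sys1: the cat sat on  @   mat
#   sys2: the cat sad on  the mat
#   sys3: a   cat sat on  the mat
slots = [
    ("the", "the", "a"),
    ("cat", "cat", "cat"),
    ("sat", "sad", "sat"),
    ("on", "on", "on"),
    (NULL, "the", "the"),
    ("mat", "mat", "mat"),
]
print(" ".join(rover_vote(slots)))  # → the cat sat on the mat
```

Each individual output above contains an error, but the vote recovers a sequence that none of the three systems produced on its own.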
So, what is this package for? It computes the accuracy of ASR engines, which convert recordings of speech into text. We'll use the data in ../src/sclite/testdata for the demo. The process to compute the accuracy is:
SYSTEM SUMMARY PERCENTAGES by SPEAKER
,------------------------------------------------------------------.
|                           demo.hyp.txt                           |
|------------------------------------------------------------------|
| SPKR     | # Snt # Wrd |  Corr    Sub    Del    Ins    Err S.Err |
|----------+-------------+-----------------------------------------|
| speaker1 |     2    46 |  84.8   15.2    0.0    2.2   17.4  50.0 |
|==================================================================|
| Sum/Avg  |     2    46 |  84.8   15.2    0.0    2.2   17.4  50.0 |
|==================================================================|
| Mean     |   2.0  46.0 |  84.8   15.2    0.0    2.2   17.4  50.0 |
| S.D.     |   0.0   0.0 |   0.0    0.0    0.0    0.0    0.0   0.0 |
| Median   |   2.0  46.0 |  84.8   15.2    0.0    2.2   17.4  50.0 |
`------------------------------------------------------------------'
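The percentages in this table follow from the per-utterance counts on the `Scores: (#C #S #D #I)` lines of the alignment dump that follows. Each column is a percentage of the reference words (# Wrd = C + S + D), Err is (S + D + I) / # Wrd, and S.Err is the percentage of sentences containing at least one error. The short check below reproduces the speaker1 row; the function name `summarize` is hypothetical, and only the arithmetic is meant to match sclite.

```python
# (C, S, D, I) per utterance, copied from the "Scores:" lines of the demo dump
counts = [
    (25, 0, 0, 0),   # speaker1-utterance1
    (14, 7, 0, 1),   # speaker1-utterance2
]


def summarize(utts):
    """Recompute the sclite summary columns from per-utterance (C, S, D, I) counts."""
    c = sum(u[0] for u in utts)
    s = sum(u[1] for u in utts)
    d = sum(u[2] for u in utts)
    i = sum(u[3] for u in utts)
    n_words = c + s + d  # every reference word is correct, substituted, or deleted

    def pct(x, total):
        return round(100.0 * x / total, 1)

    snt_errors = sum(1 for u in utts if u[1] + u[2] + u[3] > 0)
    return {
        "# Snt": len(utts),
        "# Wrd": n_words,
        "Corr": pct(c, n_words),
        "Sub": pct(s, n_words),
        "Del": pct(d, n_words),
        "Ins": pct(i, n_words),
        "Err": pct(s + d + i, n_words),
        "S.Err": pct(snt_errors, len(utts)),
    }


print(summarize(counts))
# → {'# Snt': 2, '# Wrd': 46, 'Corr': 84.8, 'Sub': 15.2, 'Del': 0.0, 'Ins': 2.2, 'Err': 17.4, 'S.Err': 50.0}
```

Because insertions add errors without adding reference words, Err can exceed 100 - Corr: here 17.4 = 15.2 + 0.0 + 2.2.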
DUMP OF SYSTEM ALIGNMENT STRUCTURE
System name: demo.hyp.txt
Speakers:
0: speaker1
Speaker sentences 0: speaker1 #utts: 2
id: (speaker1-utterance1)
Scores: (#C #S #D #I) 25 0 0 0
REF: as competition in the mutual fund business grows increasingly intense more players in the industry appear willing to sacrifice integrity in the name of performance
HYP: as competition in the mutual fund business grows increasingly intense more players in the industry appear willing to sacrifice integrity in the name of performance
Eval:
id: (speaker1-utterance2)
Scores: (#C #S #D #I) 14 7 0 1
REF:  FOR   A  TWO    TRILLION DOLLAR business built on public confidence this trend is **** DISHEARTENING AT  best and downright dangerous at worst
HYP:  FREED TO TRYING TO       LURE   business built on public confidence this trend is THIS TIGHTENING    AND best and downright dangerous at worst
Eval: S     S  S      S        S                                                        I    S             S
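Each Eval label can be read off the aligned REF and HYP lines: a run of asterisks in REF marks an inserted word (I), a run of asterisks in HYP would mark a deleted word (D), and any other mismatch is a substitution (S). The minimal sketch below re-derives the labels and the `Scores:` counts for speaker1-utterance2 from those two lines. The function name `eval_labels` is hypothetical, and the sketch assumes the tokens of the two lines pair up one-to-one, as they do in this dump.

```python
from collections import Counter


def is_empty(token):
    """True for the run of asterisks sclite prints when one side has no word."""
    return set(token) == {"*"}


def eval_labels(ref_line, hyp_line):
    """Label each aligned REF/HYP column as C, S, D or I."""
    ref_tokens = ref_line.split()
    hyp_tokens = hyp_line.split()
    if len(ref_tokens) != len(hyp_tokens):
        raise ValueError("REF and HYP must have the same number of columns")
    labels = []
    for r, h in zip(ref_tokens, hyp_tokens):
        if is_empty(r):
            labels.append("I")  # word present only in the hypothesis
        elif is_empty(h):
            labels.append("D")  # reference word missing from the hypothesis
        elif r.lower() == h.lower():
            labels.append("C")  # errors are printed in uppercase, so compare case-insensitively
        else:
            labels.append("S")
    return labels


ref = "FOR A TWO TRILLION DOLLAR business built on public confidence this trend is **** DISHEARTENING AT best and downright dangerous at worst"
hyp = "FREED TO TRYING TO LURE business built on public confidence this trend is THIS TIGHTENING AND best and downright dangerous at worst"
labels = eval_labels(ref, hyp)
print(" ".join(lab for lab in labels if lab != "C"))  # → S S S S S I S S
tally = Counter(labels)
print(tally["C"], tally["S"], tally["D"], tally["I"])  # → 14 7 0 1
```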