#include <ngram2.h>

Data Fields

| Type | Field | Description |
|---|---|---|
| int | version | Version number. |
| boolean | from_bin | TRUE if the source is bingram, otherwise ARPA. |
| WORD_ID | max_word_num | N-gram vocabulary size. |
| NNID | ngram_num [MAX_N] | Total number of tuples for each N. |
| NNID | bigram_bo_num | Total number of bigram tuples that have a back-off weight (i.e. that are the context of an upper 3-gram) (v4). |
| WORD_ID | unk_id | Unknown word ID. |
| int | unk_num | Number of dictionary words that are not in this N-gram vocabulary. |
| LOGPROB | unk_num_log | Log10 value of unk_num, used for calculating the probability of unknown words. |
| boolean | isopen | TRUE if the dictionary has unknown words, i.e. words that do not appear in this N-gram. |
| char ** | wname | List of word strings [nid]. |
| PATNODE * | root | Root of the index tree used to look up an N-gram word ID from its name. |
| LOGPROB * | p | 1-gram log probabilities [nid]. |
| LOGPROB * | bo_wt_lr | Back-off weights for LR 2-gram [nid]. |
| LOGPROB * | bo_wt_rl | Back-off weights for RL 2-gram [nid]. |
| NNID * | n2_bgn | 2-gram IDs (n2) marking the beginning of the 2-gram entries that share a left context. |
| WORD_ID * | n2_num | Number of 2-grams that share the left context above. |
| WORD_ID * | n2tonid | Mapping from each 2-gram index ID (n2) to its last word ID (nid). |
| LOGPROB * | p_lr | LR 2-gram log probabilities [n2]. |
| LOGPROB * | p_rl | RL 2-gram log probabilities [n2]. |
| NNID_UPPER * | n2bo_upper | Mapping from each 2-gram index ID (n2) to the bigram back-off index (n2-bo) (v4). |
| NNID_LOWER * | n2bo_lower | Mapping from each 2-gram index ID (n2) to the bigram back-off index (n2-bo) (v4). |
| LOGPROB * | bo_wt_rrl | Back-off weights for RL 3-gram [n2-bo]. |
| NNID * | n3_bgn | 3-gram IDs (n3) marking the beginning of the 3-gram entries that share a left context (v3). |
| NNID_UPPER * | n3_bgn_upper | Upper 8 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share a left context (v4). |
| NNID_LOWER * | n3_bgn_lower | Lower 16 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share a left context (v4). |
| WORD_ID * | n3_num | Number of 3-grams that share the left context above. |
| WORD_ID * | n3tonid | Mapping from each 3-gram index ID (n3) to its last word ID (nid). |
| LOGPROB * | p_rrl | RL 3-gram log probabilities [n3]. |
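The v4 fields n3_bgn_upper and n3_bgn_lower hold one index as an 8-bit upper part and a 16-bit lower part. The sketch below shows one plausible way such a pair could be joined into a single 24-bit ID and split again. The typedef widths and the helper names `join_nnid()` and `split_nnid()` are assumptions made for illustration; they are not taken from ngram2.h, and the library may reserve special values (for example, for "no entry") that this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the library typedefs; the real
   definitions live in the library headers and may differ. */
typedef uint8_t  NNID_UPPER;   /* upper 8 bits of an index  */
typedef uint16_t NNID_LOWER;   /* lower 16 bits of an index */
typedef uint32_t NNID;         /* full index value          */

/* Join the two stored parts into one 24-bit index (assumed layout). */
static NNID join_nnid(NNID_UPPER upper, NNID_LOWER lower)
{
    return ((NNID)upper << 16) | (NNID)lower;
}

/* Split a 24-bit index into the two parts that would be stored. */
static void split_nnid(NNID id, NNID_UPPER *upper, NNID_LOWER *lower)
{
    assert(id <= 0xFFFFFFu);               /* must fit in 24 bits */
    *upper = (NNID_UPPER)((id >> 16) & 0xFFu);
    *lower = (NNID_LOWER)(id & 0xFFFFu);
}
```

Under these assumptions, splitting 0x123456 gives an upper part of 0x12 and a lower part of 0x3456, and joining them restores the original value. Storing 3 bytes per entry instead of 4 is presumably the motivation for the split.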
Bigrams and trigrams are stored as sequential lists. Entries are grouped by shared context, and each context ((N-1)-gram) entry refers to its group by the group's beginning ID and its number of entries.
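The grouped layout can be illustrated with a small lookup over the 2-gram arrays. This is a minimal sketch, not the library's implementation: `MINI_NGRAM` and `find_bigram()` are hypothetical names, the typedef widths are assumed, and a plain linear scan is used so that no ordering of entries within a group has to be assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the library typedefs. */
typedef unsigned short WORD_ID;
typedef unsigned int   NNID;
typedef float          LOGPROB;

/* Only the fields needed for a left-to-right 2-gram lookup. */
typedef struct {
    NNID    *n2_bgn;   /* [nid] index of the first 2-gram with this left context */
    WORD_ID *n2_num;   /* [nid] number of 2-grams with this left context */
    WORD_ID *n2tonid;  /* [n2]  last word ID of each 2-gram */
    LOGPROB *p_lr;     /* [n2]  LR 2-gram log probabilities */
} MINI_NGRAM;

/* Look up the 2-gram (w1, w2). Returns 1 and stores its log
   probability in *out if the entry exists, otherwise returns 0. */
static int find_bigram(const MINI_NGRAM *ng, WORD_ID w1, WORD_ID w2,
                       LOGPROB *out)
{
    assert(ng != NULL && out != NULL);
    NNID begin = ng->n2_bgn[w1];
    NNID end   = begin + ng->n2_num[w1];   /* one past the last entry */
    for (NNID n2 = begin; n2 < end; n2++) {
        if (ng->n2tonid[n2] == w2) {
            *out = ng->p_lr[n2];
            return 1;
        }
    }
    return 0;   /* not found: a caller would fall back to back-off */
}
```

The same pattern would apply one level up: a 2-gram index selects a range of 3-gram entries through n3_bgn and n3_num, and n3tonid identifies the last word of each entry.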
Definition at line 113 of file ngram2.h.
Unknown word ID.
This value is always fixed to 0, since the CMU-Cambridge SLM Toolkit always defines the unknown word "<UNK>" as the first word in the vocabulary.
Definition at line 126 of file ngram2.h.
Referenced by bi_prob_lr(), bi_prob_rl(), make_ngram_ref(), make_voca_ref(), print_ngram_info(), set_unknown_id(), tri_prob_rl(), and uni_prob().
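As an illustration of how unk_id and unk_num_log might work together, the sketch below shares the "<UNK>" unigram probability equally among the unk_num dictionary words that are missing from the N-gram vocabulary, which in the log domain is a subtraction of unk_num_log. This is an assumption about the intended use, not a copy of uni_prob(); `MINI_UNIGRAM` and `unigram_logprob()` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical stand-ins for the library typedefs. */
typedef unsigned short WORD_ID;
typedef float          LOGPROB;

/* Only the fields needed for a 1-gram lookup. */
typedef struct {
    WORD_ID  max_word_num;  /* N-gram vocabulary size */
    WORD_ID  unk_id;        /* unknown word ID, fixed to 0 */
    int      unk_num;       /* dictionary words not in the N-gram vocabulary */
    LOGPROB  unk_num_log;   /* log10(unk_num), precomputed by the caller */
    LOGPROB *p;             /* [nid] 1-gram log probabilities */
} MINI_UNIGRAM;

/* Return the log10 probability of word w. For the unknown word, the
   stored mass is divided by unk_num (assumed behaviour), which is a
   subtraction in the log domain. */
static LOGPROB unigram_logprob(const MINI_UNIGRAM *ng, WORD_ID w)
{
    assert(w < ng->max_word_num);
    if (w == ng->unk_id && ng->unk_num > 0) {
        return ng->p[w] - ng->unk_num_log;
    }
    return ng->p[w];
}
```

For example, with a stored "<UNK>" log probability of -2.0 and unk_num = 10 (so unk_num_log = 1.0), each unknown word would receive -3.0 under this assumption, while known words keep their stored values.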