#include <ngram2.h>

Data Fields

| Type | Name | Description |
|---|---|---|
| int | version | Version number. |
| boolean | from_bin | TRUE if the source is bingram, otherwise ARPA. |
| WORD_ID | max_word_num | N-gram vocabulary size. |
| NNID | ngram_num [MAX_N] | Total number of tuples for each N. |
| NNID | bigram_bo_num | Total number of bigram tuples that have a back-off weight (i.e. that are the context of an upper 3-gram) (v4). |
| WORD_ID | unk_id | Unknown word ID. |
| int | unk_num | Number of dictionary words that are not in this N-gram vocabulary. |
| LOGPROB | unk_num_log | Log10 value of unk_num, used for calculating the probability of unknown words. |
| boolean | isopen | TRUE if the dictionary has unknown words, i.e. words that do not appear in this N-gram. |
| char ** | wname | List of word strings [nid]. |
| PATNODE * | root | Root of the index tree used to look up an N-gram word ID from its name. |
| LOGPROB * | p | 1-gram log probabilities [nid]. |
| LOGPROB * | bo_wt_lr | Back-off weights for LR 2-gram [nid]. |
| LOGPROB * | bo_wt_rl | Back-off weights for RL 2-gram [nid]. |
| NNID * | n2_bgn | 2-gram IDs (n2) marking the beginning of the 2-gram entries that have a given left context. |
| WORD_ID * | n2_num | Number of 2-grams that have the above left context. |
| WORD_ID * | n2tonid | Mapping from each 2-gram index ID (n2) to its last word ID (nid). |
| LOGPROB * | p_lr | LR 2-gram log probabilities [n2]. |
| LOGPROB * | p_rl | RL 2-gram log probabilities [n2]. |
| NNID_UPPER * | n2bo_upper | Mapping from each 2-gram index ID (n2) to the bigram back-off index (n2-bo) (v4). |
| NNID_LOWER * | n2bo_lower | Mapping from each 2-gram index ID (n2) to the bigram back-off index (n2-bo) (v4). |
| LOGPROB * | bo_wt_rrl | Back-off weights for RL 3-gram [n2-bo]. |
| NNID * | n3_bgn | 3-gram IDs (n3) marking the beginning of the 3-gram entries that have a given left context (v3). |
| NNID_UPPER * | n3_bgn_upper | Upper 8 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that have a given left context (v4). |
| NNID_LOWER * | n3_bgn_lower | Lower 16 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that have a given left context (v4). |
| WORD_ID * | n3_num | Number of 3-grams that have the above left context. |
| WORD_ID * | n3tonid | Mapping from each 3-gram index ID (n3) to its last word ID (nid). |
| LOGPROB * | p_rrl | RL 3-gram log probabilities [n3]. |
Bigrams and trigrams are stored as sequential lists. Entries are grouped by their shared context, and each group is referenced from the context ((N-1)-gram) data by the ID of its first entry and the number of entries it contains.
Definition at line 113 of file ngram2.h.
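The grouped layout can be illustrated with a minimal sketch of a 2-gram lookup. The typedefs, the reduced MINI_NGRAM struct, and find_bigram() are assumptions made for this example and are not part of ngram2.h; only the field names n2_bgn, n2_num, n2tonid, and p_lr come from NGRAM_INFO. The sketch scans the group linearly, which may differ from how the library actually searches.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in typedefs (assumed); the real ones come from the library headers. */
typedef unsigned short WORD_ID;
typedef unsigned int NNID;
typedef float LOGPROB;

/* Reduced, illustrative copy of the 2-gram part of NGRAM_INFO. */
typedef struct {
  NNID *n2_bgn;      /* [nid] n2 index of the first entry for left context nid */
  WORD_ID *n2_num;   /* [nid] number of 2-gram entries for that left context */
  WORD_ID *n2tonid;  /* [n2]  last word ID of each 2-gram entry */
  LOGPROB *p_lr;     /* [n2]  LR 2-gram log probability */
} MINI_NGRAM;

/* Hypothetical helper: return the n2 index of the 2-gram (w1, w2),
 * or -1 if no such entry exists in the group belonging to w1. */
static long find_bigram(const MINI_NGRAM *ng, WORD_ID w1, WORD_ID w2)
{
  NNID first, count, i;

  assert(ng != NULL);
  first = ng->n2_bgn[w1];        /* beginning ID of the group */
  count = ng->n2_num[w1];        /* number of entries in the group */
  for (i = 0; i < count; i++) {  /* entries are stored contiguously */
    if (ng->n2tonid[first + i] == w2) return (long)(first + i);
  }
  return -1;
}
```

When find_bigram() returns a non-negative index n2, the corresponding LR log probability would be p_lr[n2]. The 3-gram arrays (n3_bgn, n3_num, n3tonid) appear to follow the same pattern with the 2-gram index as the context.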
Unknown word ID. This value is always fixed to 0, since the CMU-Cambridge SLM Toolkit always defines the unknown word "<UNK>" as the first word in the vocabulary.
Definition at line 126 of file ngram2.h. Referenced by bi_prob_lr(), bi_prob_rl(), make_ngram_ref(), make_voca_ref(), print_ngram_info(), set_unknown_id(), and tri_prob_rl().
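As a rough sketch of how unk_id, unk_num, and unk_num_log might work together, the example below assumes that the probability mass of "<UNK>" is shared equally among the unk_num dictionary words missing from the N-gram vocabulary, so that unk_num_log is subtracted in the log10 domain. That sharing rule, the typedefs, and the helper unigram_logprob() are assumptions for illustration, not the library's documented behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in typedefs (assumed); the real ones come from the library headers. */
typedef unsigned short WORD_ID;
typedef float LOGPROB;

/* Hypothetical helper: 1-gram log10 probability of word w.
 * p           : 1-gram log probabilities indexed by nid
 * unk_id      : ID of the unknown word
 * unk_num     : number of dictionary words not in the N-gram vocabulary
 * unk_num_log : log10(unk_num), precomputed by the caller
 * Assumption: the <UNK> probability is divided evenly among the unk_num
 * unknown words, which is a subtraction in the log domain. */
static LOGPROB unigram_logprob(const LOGPROB *p, WORD_ID w, WORD_ID unk_id,
                               int unk_num, LOGPROB unk_num_log)
{
  assert(p != NULL);
  if (w != unk_id || unk_num <= 0) {
    return p[w];               /* known word, or nothing to share among */
  }
  return p[w] - unk_num_log;   /* log10(P(<UNK>) / unk_num) */
}
```

For example, with p[unk_id] = -2.0 and unk_num = 100 (unk_num_log = 2.0), each unknown word would receive a log probability of -4.0 under this assumption.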