#include <ngram2.h>
[Collaboration diagram for NGRAM_INFO]

Data Fields

| Type | Name | Description |
|---|---|---|
| int | version | Version number |
| boolean | from_bin | TRUE if the source is a bingram file, otherwise ARPA |
| WORD_ID | max_word_num | N-gram vocabulary size |
| NNID | ngram_num [MAX_N] | Total number of tuples for each N |
| NNID | bigram_bo_num | Total number of bigram tuples that have a back-off weight (i.e., those that serve as the context of a 3-gram) (v4) |
| WORD_ID | unk_id | Unknown word ID |
| int | unk_num | Number of dictionary words that are not in this N-gram vocabulary |
| LOGPROB | unk_num_log | Log10 value of unk_num, used for calculating the probability of unknown words |
| boolean | isopen | TRUE if the dictionary contains unknown words, i.e., words that do not appear in this N-gram |
| char ** | wname | List of word strings [nid] |
| PATNODE * | root | Root of the index tree used to look up an N-gram word ID from its name |
| LOGPROB * | p | 1-gram log probabilities [nid] |
| LOGPROB * | bo_wt_lr | Back-off weights for LR 2-gram [nid] |
| LOGPROB * | bo_wt_rl | Back-off weights for RL 2-gram [nid] |
| NNID * | n2_bgn | 2-gram IDs (n2) marking the beginning of the 2-gram entries that have the left context |
| WORD_ID * | n2_num | Number of 2-grams that have the above left context |
| WORD_ID * | n2tonid | Mapping from each 2-gram index ID (n2) to its last word ID (nid) |
| LOGPROB * | p_lr | LR 2-gram log probabilities [n2] |
| LOGPROB * | p_rl | RL 2-gram log probabilities [n2] |
| NNID_UPPER * | n2bo_upper | Upper part of the mapping from each 2-gram index ID (n2) to the bigram back-off index (n2-bo) (v4) |
| NNID_LOWER * | n2bo_lower | Lower part of the mapping from each 2-gram index ID (n2) to the bigram back-off index (n2-bo) (v4) |
| LOGPROB * | bo_wt_rrl | Back-off weights for RL 3-gram [n2-bo] |
| NNID * | n3_bgn | 3-gram IDs (n3) marking the beginning of the 3-gram entries that have the left context (v3) |
| NNID_UPPER * | n3_bgn_upper | Upper 8 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that have the left context (v4) |
| NNID_LOWER * | n3_bgn_lower | Lower 16 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that have the left context (v4) |
| WORD_ID * | n3_num | Number of 3-grams that have the above left context |
| WORD_ID * | n3tonid | Mapping from each 3-gram index ID (n3) to its last word ID (nid) |
| LOGPROB * | p_rrl | RL 3-gram log probabilities [n3] |
Bigrams and trigrams are stored as sequential lists. Entries that share the same context are grouped together, and each group is referenced from the context ((N-1)-gram) data by its beginning ID and its entry count.
unk_id: Unknown word ID
This value is always fixed to 0, because the CMU-Cambridge SLM Toolkit always defines the unknown word "<UNK>" as the first word of the vocabulary.
Referenced by bi_prob_lr(), bi_prob_rl(), make_ngram_ref(), make_voca_ref(), print_ngram_info(), set_unknown_id(), tri_prob_rl(), and uni_prob().
1.5.0