#include <ngram2.h>
Collaboration diagram for NGRAM_INFO:
Data Fields | |
int | version |
version number | |
boolean | from_bin |
TRUE if source is bingram, otherwise ARPA | |
WORD_ID | max_word_num |
N-gram vocabulary size | |
NNID | ngram_num [MAX_N] |
Total number of tuples for each N | |
NNID | bigram_bo_num |
Total number of bigram tuples that have a back-off weight (i.e. that serve as the context of an upper 3-gram) (v4) | |
WORD_ID | unk_id |
Unknown word ID | |
int | unk_num |
Number of dictionary words that are not in this N-gram vocabulary | |
LOGPROB | unk_num_log |
Log10 value of unk_num, used for calculating probability of unknown words | |
boolean | isopen |
TRUE if the dictionary has unknown words, i.e. words that do not appear in this N-gram | |
char ** | wname |
List of word strings [nid] | |
PATNODE * | root |
Root of the index tree used to look up an N-gram word ID from its name | |
LOGPROB * | p |
1-gram log probabilities [nid] | |
LOGPROB * | bo_wt_lr |
Back-off weights for LR 2-gram [nid] | |
LOGPROB * | bo_wt_rl |
Back-off weights for RL 2-gram [nid] | |
NNID * | n2_bgn |
2-gram ID (n2) at which the 2-gram entries sharing each left context begin | |
WORD_ID * | n2_num |
Number of 2-grams that have the above left context | |
WORD_ID * | n2tonid |
Mapping each 2-gram index ID (n2) to its last word ID (nid) | |
LOGPROB * | p_lr |
LR 2-gram log probabilities [n2] | |
LOGPROB * | p_rl |
RL 2-gram log probabilities [n2] | |
NNID_UPPER * | n2bo_upper |
Mapping each 2-gram index ID (n2) to bigram back-off index (n2-bo) (v4) | |
NNID_LOWER * | n2bo_lower |
Mapping each 2-gram index ID (n2) to bigram back-off index (n2-bo) (v4) | |
LOGPROB * | bo_wt_rrl |
Back-off weights for RL 3-gram [n2-bo] | |
NNID * | n3_bgn |
3-gram ID (n3) at which the 3-gram entries sharing each left context begin (v3) | |
NNID_UPPER * | n3_bgn_upper |
Upper 8 bits of the 3-gram ID (n3) at which the 3-gram entries sharing each left context begin (v4) | |
NNID_LOWER * | n3_bgn_lower |
Lower 16 bits of the 3-gram ID (n3) at which the 3-gram entries sharing each left context begin (v4) | |
WORD_ID * | n3_num |
Number of 3-grams that have the above left context | |
WORD_ID * | n3tonid |
Mapping each 3-gram index ID (n3) to its last word ID (nid) | |
LOGPROB * | p_rrl |
RL 3-gram log probabilities [n3] |
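The v4 fields `n3_bgn_upper` and `n3_bgn_lower` are described above as the upper 8 bits and lower 16 bits of a 3-gram ID. A minimal sketch of how such a pair could be joined into, and split from, a single 24-bit index is shown below. The helper names `toy_join_index()` and `toy_split_index()` and the fixed-width stand-in types are assumptions made for this illustration; the exact packing used by the library, including any reserved value meaning "no entry", is not stated on this page.

```c
#include <stdint.h>

/* Stand-in types for this sketch only (assumed widths). */
typedef uint8_t  NNID_UPPER;  /* upper 8 bits of an index  */
typedef uint16_t NNID_LOWER;  /* lower 16 bits of an index */
typedef uint32_t NNID;        /* full index                */

/* Hypothetical helper: join the two halves into one 24-bit index. */
static NNID toy_join_index(NNID_UPPER upper, NNID_LOWER lower)
{
  return ((NNID)upper << 16) | (NNID)lower;
}

/* Hypothetical helper: split a 24-bit index into its two halves. */
static void toy_split_index(NNID id, NNID_UPPER *upper, NNID_LOWER *lower)
{
  *upper = (NNID_UPPER)((id >> 16) & 0xffu);
  *lower = (NNID_LOWER)(id & 0xffffu);
}
```

Storing the index as 8 + 16 bits takes 3 bytes per entry instead of 4, at the cost of limiting the index range to 24 bits.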
Bigrams and trigrams are stored as sequential lists. Entries that share the same context are grouped together, and each context ((N-1)-gram) entry refers to its group by the group's beginning ID and the number of entries in it.
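The grouped-list layout can be sketched as follows for the 2-gram case. This is a minimal, self-contained illustration, not the library's implementation: the reduced struct `toy_ngram`, the function `toy_find_bigram()`, the stand-in typedefs, and the sample data are all hypothetical, and only the field names `n2_bgn`, `n2_num`, and `n2tonid` are taken from this structure. The sketch also assumes a plain linear scan within one context group.

```c
#include <stdint.h>

/* Stand-in types for this sketch only (assumed widths). */
typedef uint16_t WORD_ID;
typedef uint32_t NNID;
#define TOY_NOT_FOUND ((NNID)0xffffffffu)

/* Hypothetical reduced view of the 2-gram index fields. */
typedef struct {
  const NNID    *n2_bgn;   /* [nid] first 2-gram index of each left context */
  const WORD_ID *n2_num;   /* [nid] number of 2-grams in that group         */
  const WORD_ID *n2tonid;  /* [n2]  last word ID of each 2-gram             */
} toy_ngram;

/* Return the 2-gram index (n2) of (left, right), or TOY_NOT_FOUND. */
static NNID toy_find_bigram(const toy_ngram *ng, WORD_ID left, WORD_ID right)
{
  NNID first = ng->n2_bgn[left];        /* beginning ID of the group    */
  NNID end = first + ng->n2_num[left];  /* one past the last group entry */
  NNID n2;

  for (n2 = first; n2 < end; n2++) {
    if (ng->n2tonid[n2] == right) return n2;
  }
  return TOY_NOT_FOUND;
}

/* Sample data: word 0 is followed by {1, 2}, word 1 by nothing, word 2 by {0}. */
static const NNID    toy_n2_bgn[]  = {0, 2, 2};
static const WORD_ID toy_n2_num[]  = {2, 0, 1};
static const WORD_ID toy_n2tonid[] = {1, 2, 0};
static const toy_ngram toy = {toy_n2_bgn, toy_n2_num, toy_n2tonid};
```

With the sample data, `toy_find_bigram(&toy, 0, 2)` scans the group of word 0 (indices 0 and 1) and returns index 1, which could then be used to read a per-2-gram value such as `p_lr[n2]`. The same begin-and-count scheme applies one level up, from a 2-gram to its group of 3-grams.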
Unknown word ID
This value is always fixed to 0, since the CMU-Cambridge SLM Toolkit always defines the unknown word "<UNK>" as the first word in the vocabulary.
Referenced by bi_prob_lr(), bi_prob_rl(), make_ngram_ref(), make_voca_ref(), print_ngram_info(), set_unknown_id(), tri_prob_rl(), and uni_prob().
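As an illustration of how `unk_id` and `unk_num_log` might work together, the sketch below computes a 1-gram log probability under one plausible reading of the field descriptions: the probability mass of the unknown word is shared evenly among the `unk_num` dictionary words that are missing from the N-gram vocabulary, which in the log10 domain means subtracting `unk_num_log`. The function `toy_uni_prob()` and the stand-in typedefs are hypothetical, and the even-sharing rule is an assumption rather than something stated on this page.

```c
#include <stdint.h>

/* Stand-in types for this sketch only. */
typedef float    LOGPROB;
typedef uint16_t WORD_ID;

/* Hypothetical helper: 1-gram log10 probability of N-gram word w.
 * p           : 1-gram log probabilities, indexed by N-gram word ID
 * unk_id      : ID of the unknown word
 * unk_num_log : log10 of the number of dictionary words mapped to unk_id
 */
static LOGPROB toy_uni_prob(const LOGPROB *p, WORD_ID unk_id,
                            LOGPROB unk_num_log, WORD_ID w)
{
  if (w != unk_id) return p[w];  /* known word: use the stored value */
  return p[w] - unk_num_log;     /* unknown word: share the mass evenly */
}
```

For example, with a stored unknown-word log probability of -1.0 and 100 unmapped dictionary words (`unk_num_log` = 2.0), each such word would receive -3.0 under this assumption.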