#include <ngram2.h>
Collaboration diagram for NGRAM_INFO
Data Fields | |
int | version |
version number | |
boolean | from_bin |
TRUE if source is bingram, otherwise ARPA. | |
WORD_ID | max_word_num |
N-gram vocabulary size. | |
NNID | ngram_num [MAX_N] |
Total number of tuples for each N. | |
NNID | bigram_bo_num |
Total number of bigram tuples that have a back-off weight (i.e. that serve as the context of an upper 3-gram) (v4). | |
WORD_ID | unk_id |
Unknown word ID. | |
int | unk_num |
Number of dictionary words that are not in this N-gram vocabulary. | |
LOGPROB | unk_num_log |
Log10 value of unk_num, used for calculating probability of unknown words. | |
boolean | isopen |
TRUE if the dictionary contains unknown words, i.e. words that do not appear in this N-gram. | |
char ** | wname |
List of word strings [nid]. | |
PATNODE * | root |
Root of index tree to search n-gram word ID from its name. | |
LOGPROB * | p |
1-gram log probabilities [nid] | |
LOGPROB * | bo_wt_lr |
Back-off weights for LR 2-gram [nid]. | |
LOGPROB * | bo_wt_rl |
Back-off weights for RL 2-gram [nid]. | |
NNID * | n2_bgn |
2-gram IDs (n2) marking the beginning of the 2-gram entries that share a given left context. | |
WORD_ID * | n2_num |
Number of 2-grams that have the above left context. | |
WORD_ID * | n2tonid |
Mapping each 2-gram index ID (n2) to its last word ID (nid). | |
LOGPROB * | p_lr |
LR 2-gram log probabilities [n2]. | |
LOGPROB * | p_rl |
RL 2-gram log probabilities [n2]. | |
NNID_UPPER * | n2bo_upper |
Upper bits of the mapping from each 2-gram index ID (n2) to its bigram back-off index (n2-bo) (v4). | |
NNID_LOWER * | n2bo_lower |
Lower bits of the mapping from each 2-gram index ID (n2) to its bigram back-off index (n2-bo) (v4). | |
LOGPROB * | bo_wt_rrl |
Back-off weights for RL 3-gram [n2-bo]. | |
NNID * | n3_bgn |
3-gram IDs (n3) marking the beginning of the 3-gram entries that share a given left context (v3). | |
NNID_UPPER * | n3_bgn_upper |
Upper 8 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share a given left context (v4). | |
NNID_LOWER * | n3_bgn_lower |
Lower 16 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share a given left context (v4). | |
WORD_ID * | n3_num |
Number of 3-grams that have the above left context. | |
WORD_ID * | n3tonid |
Mapping each 3-gram index ID (n3) to its last word ID (nid). | |
LOGPROB * | p_rrl |
RL 3-gram log probabilities [n3]. |
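The v4 fields `n3_bgn_upper` and `n3_bgn_lower` hold the upper 8 bits and lower 16 bits of one index. The following is a minimal sketch of how such a pair might be joined and split. The `ex_` type and function names are hypothetical stand-ins, not part of the library, and the bit layout (upper byte in bits 16 to 23) is an assumption inferred from the field descriptions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for NNID_UPPER, NNID_LOWER and NNID.
   Widths follow the "upper 8-bit" / "lower 16-bit" descriptions. */
typedef uint8_t  ex_nnid_upper;
typedef uint16_t ex_nnid_lower;
typedef uint32_t ex_nnid;

/* Join the two halves into one 24-bit index.
   Assumed layout: the upper byte occupies bits 16..23. */
static ex_nnid ex_join_nnid(ex_nnid_upper upper, ex_nnid_lower lower)
{
    return ((ex_nnid)upper << 16) | (ex_nnid)lower;
}

/* Split a 24-bit index back into its stored halves. */
static void ex_split_nnid(ex_nnid id, ex_nnid_upper *upper, ex_nnid_lower *lower)
{
    *upper = (ex_nnid_upper)((id >> 16) & 0xFFu);
    *lower = (ex_nnid_lower)(id & 0xFFFFu);
}
```

Storing the index as an 8-bit array plus a 16-bit array uses 3 bytes per entry instead of 4, which is presumably why the v4 layout splits it.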
Bigrams and trigrams are stored as sequential lists. Entries are grouped by shared context, and each group is referenced from its context ((N-1)-gram) data by the ID of its first entry and the number of entries.
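The grouped layout described above can be illustrated with a small lookup sketch. It assumes a cut-down, hypothetical `ex_ngram` struct carrying only the 2-gram fields; the `ex_` types are stand-ins whose widths are assumptions. A linear scan is used so that nothing is assumed about the ordering of entries within a group; the library itself may search differently.

```c
#include <stddef.h>

typedef unsigned short ex_word_id;  /* stand-in for WORD_ID (assumed width) */
typedef unsigned int   ex_nnid;     /* stand-in for NNID (assumed width)    */
typedef float          ex_logprob;  /* stand-in for LOGPROB (assumed type)  */

/* Hypothetical subset of NGRAM_INFO holding only the 2-gram fields. */
typedef struct {
    ex_nnid    *n2_bgn;   /* [nid] first n2 index for this left context */
    ex_word_id *n2_num;   /* [nid] number of 2-grams for this context   */
    ex_word_id *n2tonid;  /* [n2]  last word ID of each 2-gram          */
    ex_logprob *p_lr;     /* [n2]  LR 2-gram log probability            */
} ex_ngram;

/* Return the n2 index of the 2-gram (left, right), or -1 if it is not stored. */
static long ex_find_bigram(const ex_ngram *ng, ex_word_id left, ex_word_id right)
{
    ex_nnid begin = ng->n2_bgn[left];
    ex_nnid end = begin + ng->n2_num[left];
    ex_nnid n2;

    /* Scan only the group of entries that share the left context. */
    for (n2 = begin; n2 < end; n2++) {
        if (ng->n2tonid[n2] == right) {
            return (long)n2;
        }
    }
    return -1;
}
```

A non-negative result can be used directly as an index into `p_lr`. A result of -1 means the 2-gram is absent, in which case a caller would normally fall back to the back-off weight and the 1-gram probability.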
Unknown word ID. This value is always fixed to 0, since the CMU-Cambridge SLM Toolkit always defines the unknown word "<UNK>" as the first word in the vocabulary.
Referenced by bi_prob_lr(), bi_prob_rl(), make_ngram_ref(), make_voca_ref(), print_ngram_info(), set_unknown_id(), and tri_prob_rl().
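As an illustration of how the fixed unknown word ID might interact with `unk_num` and `unk_num_log`, here is a minimal sketch of a 1-gram lookup. The formula is an assumption, not taken from this page: it shares the "<UNK>" probability mass evenly among the `unk_num` dictionary words missing from the N-gram vocabulary, which in the log10 domain is a subtraction of `unk_num_log`. The function name and `EX_UNK_ID` are hypothetical.

```c
typedef float ex_logprob;  /* stand-in for LOGPROB (assumed type) */

#define EX_UNK_ID 0  /* "<UNK>" is always the first vocabulary word */

/* 1-gram log10 probability of word `nid`.
   p            : 1-gram log probabilities indexed by nid
   unk_num      : number of dictionary words outside the N-gram vocabulary
   unk_num_log  : log10(unk_num), precomputed by the caller
   Assumption: the <UNK> mass is divided evenly among the unknown words. */
static ex_logprob ex_unigram_logprob(const ex_logprob *p, int nid,
                                     int unk_num, ex_logprob unk_num_log)
{
    if (nid != EX_UNK_ID || unk_num == 0) {
        return p[nid];
    }
    return p[nid] - unk_num_log;
}
```

For example, with `p[0] = -1.0`, `unk_num = 100` and `unk_num_log = 2.0`, each unknown word would receive a log probability of -3.0 under this assumption.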