NGRAM_INFO Struct Reference

Main N-gram structure

#include <ngram2.h>

[Figure: collaboration diagram for NGRAM_INFO]


Data Fields
int	version
	version number
boolean	from_bin
	TRUE if source is bingram, otherwise ARPA
WORD_ID	max_word_num
	N-gram vocabulary size
NNID	ngram_num [MAX_N]
	Total number of tuples for each N
NNID	bigram_bo_num
	Total number of bigram tuples that have a back-off weight (i.e. that are the context of a 3-gram) (v4)
WORD_ID	unk_id
	Unknown word ID
int	unk_num
	Number of dictionary words that are not in this N-gram vocabulary
LOGPROB	unk_num_log
	Log10 value of unk_num, used for calculating probability of unknown words
boolean	isopen
	TRUE if the dictionary has unknown words, i.e. words that do not appear in this N-gram
char **	wname
	List of word strings [nid]
PATNODE *	root
	Root of the index tree used to look up an N-gram word ID by its name
LOGPROB *	p
	1-gram log probabilities [nid]
LOGPROB *	bo_wt_lr
	Back-off weights for LR 2-gram [nid]
LOGPROB *	bo_wt_rl
	Back-off weights for RL 2-gram [nid]
NNID *	n2_bgn
	2-gram IDs (n2) marking the beginning of the 2-gram entries that share each left context
WORD_ID *	n2_num
	Number of 2-grams that share the above left context
WORD_ID *	n2tonid
	Mapping each 2-gram index ID (n2) to its last word ID (nid)
LOGPROB *	p_lr
	LR 2-gram log probabilities [n2]
LOGPROB *	p_rl
	RL 2-gram log probabilities [n2]
NNID_UPPER *	n2bo_upper
	Mapping from each 2-gram index ID (n2) to its bigram back-off index (n2-bo), upper part (v4)
NNID_LOWER *	n2bo_lower
	Mapping from each 2-gram index ID (n2) to its bigram back-off index (n2-bo), lower part (v4)
LOGPROB *	bo_wt_rrl
	Back-off weights for RL 3-gram [n2-bo]
NNID *	n3_bgn
	3-gram IDs (n3) marking the beginning of the 3-gram entries that share each left context (v3)
NNID_UPPER *	n3_bgn_upper
	Upper 8 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share each left context (v4)
NNID_LOWER *	n3_bgn_lower
	Lower 16 bits of the 3-gram IDs (n3) marking the beginning of the 3-gram entries that share each left context (v4)
WORD_ID *	n3_num
	Number of 3-grams that share the above left context
WORD_ID *	n3tonid
	Mapping each 3-gram index ID (n3) to its last word ID (nid)
LOGPROB *	p_rrl
	RL 3-gram log probabilities [n3]
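
The v4 fields split one index into an upper 8-bit part and a lower 16-bit part (n3_bgn_upper / n3_bgn_lower). A minimal sketch of how two such halves could be joined into a single 24-bit index follows. The type names and the helpers nnid24_join() and nnid24_split() are hypothetical and not part of Julius; the actual typedefs of NNID_UPPER and NNID_LOWER and the actual joining code may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for NNID_UPPER, NNID_LOWER and NNID.
   The real typedefs live in the Julius headers and may differ. */
typedef uint8_t  upper8_t;
typedef uint16_t lower16_t;
typedef uint32_t nnid_t;

/* Join an upper 8-bit part and a lower 16-bit part into one 24-bit index. */
nnid_t nnid24_join(upper8_t upper, lower16_t lower)
{
    return ((nnid_t)upper << 16) | (nnid_t)lower;
}

/* Split a 24-bit index back into its upper 8-bit and lower 16-bit parts. */
void nnid24_split(nnid_t id, upper8_t *upper, lower16_t *lower)
{
    assert(id <= 0xFFFFFFu);            /* must fit into 24 bits */
    *upper = (upper8_t)(id >> 16);
    *lower = (lower16_t)(id & 0xFFFFu);
}
```

Storing the halves in separate arrays takes 3 bytes per entry instead of 4, at the cost of one shift and one OR per access.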

Detailed Description

Main N-gram structure

Bigrams and trigrams are stored as sequential lists. Entries that share the same context are grouped together, and each context ((N-1)-gram) entry refers to its group by the ID of the first entry and the number of entries.

Definition at line 113 of file ngram2.h.
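
The grouped layout described above can be illustrated with a small sketch. MiniNgram, mini_find_bigram(), mini_bigram_logprob() and the toy data are hypothetical and only mirror the field names of NGRAM_INFO with plain C types. The linear scan and the back-off rule (back-off weight of the context plus the 1-gram log probability) are assumptions for illustration, not necessarily what bi_prob_lr() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of NGRAM_INFO: only the 2-gram fields, with
   plain C types standing in for WORD_ID, NNID and LOGPROB. */
typedef struct {
    const unsigned int   *n2_bgn;    /* [nid] first n2 index for this left context */
    const unsigned short *n2_num;    /* [nid] number of entries for this left context */
    const unsigned short *n2tonid;   /* [n2]  last word ID of each entry */
    const float          *p_lr;      /* [n2]  LR 2-gram log probabilities */
    const float          *p;         /* [nid] 1-gram log probabilities */
    const float          *bo_wt_lr;  /* [nid] LR back-off weights */
} MiniNgram;

#define MINI_NOT_FOUND ((unsigned int)-1)

/* Toy model with three words (0, 1, 2) and three 2-grams:
   n2=0: (1,1)   n2=1: (1,2)   n2=2: (2,1)                  */
static const unsigned int   toy_n2_bgn[]   = { 0, 0, 2 };
static const unsigned short toy_n2_num[]   = { 0, 2, 1 };
static const unsigned short toy_n2tonid[]  = { 1, 2, 1 };
static const float          toy_p_lr[]     = { -0.5f, -0.25f, -0.75f };
static const float          toy_p[]        = { -2.0f, -1.0f, -1.5f };
static const float          toy_bo_wt_lr[] = { 0.0f, -0.125f, -0.5f };

const MiniNgram mini_toy = {
    toy_n2_bgn, toy_n2_num, toy_n2tonid, toy_p_lr, toy_p, toy_bo_wt_lr
};

/* Scan the group of 2-gram entries whose left context is w1 and return
   the n2 index of the entry whose last word is w2, or MINI_NOT_FOUND. */
unsigned int mini_find_bigram(const MiniNgram *ng,
                              unsigned short w1, unsigned short w2)
{
    unsigned int n2, bgn, end;

    assert(ng != NULL);
    bgn = ng->n2_bgn[w1];
    end = bgn + ng->n2_num[w1];
    for (n2 = bgn; n2 < end; n2++) {
        if (ng->n2tonid[n2] == w2) return n2;
    }
    return MINI_NOT_FOUND;
}

/* Log probability of w2 following w1: the stored 2-gram value if the
   entry exists, otherwise back-off weight of w1 plus 1-gram of w2. */
float mini_bigram_logprob(const MiniNgram *ng,
                          unsigned short w1, unsigned short w2)
{
    unsigned int n2 = mini_find_bigram(ng, w1, w2);

    if (n2 != MINI_NOT_FOUND) return ng->p_lr[n2];
    return ng->bo_wt_lr[w1] + ng->p[w2];
}
```

In the toy model, the pair (1, 2) is found at n2 = 1, while the pair (2, 2) has no entry and falls back to the back-off path. If the entries within a group are sorted by word ID, the linear scan could be replaced by a binary search.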

Field Documentation

WORD_ID NGRAM_INFO::unk_id

Unknown word ID

This value is always fixed to 0, since the CMU-Cambridge SLM Toolkit always defines the unknown word "<UNK>" as the first word in the vocabulary.

See also: set_unknown_id

Definition at line 126 of file ngram2.h.

Referenced by bi_prob_lr(), bi_prob_rl(), make_ngram_ref(), make_voca_ref(), print_ngram_info(), set_unknown_id(), tri_prob_rl(), and uni_prob().
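
As an illustration of how unk_id and unk_num_log might work together, the sketch below assumes that the probability mass of "<UNK>" is shared equally among the unk_num dictionary words that are missing from the N-gram, so that unk_num_log is subtracted from the log probability of "<UNK>". MiniUnigram and mini_uni_logprob() are hypothetical names, and this reading of unk_num_log is an assumption; the actual uni_prob() may behave differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the 1-gram part of NGRAM_INFO, with plain
   C types standing in for WORD_ID and LOGPROB. */
typedef struct {
    unsigned short unk_id;       /* ID of "<UNK>" (0 according to the text above) */
    float          unk_num_log;  /* log10 of the number of unknown dictionary words */
    const float   *p;            /* [nid] 1-gram log probabilities */
} MiniUnigram;

/* 1-gram log probability of word nid. For the unknown word, the mass of
   "<UNK>" is divided equally among the unknown dictionary words, which
   is a subtraction in the log10 domain (assumption, see lead-in). */
float mini_uni_logprob(const MiniUnigram *ng, unsigned short nid)
{
    assert(ng != NULL);
    if (nid == ng->unk_id) return ng->p[nid] - ng->unk_num_log;
    return ng->p[nid];
}
```

For example, with 100 unknown dictionary words (unk_num_log = 2.0) and a log probability of -1.0 for "<UNK>", each unknown word would receive -3.0 under this assumption.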

The documentation for this struct was generated from the following file:

libsent/include/sent/ngram2.h

Generated on Tue Dec 26 16:21:34 2006 for Julius by doxygen 1.5.0