factoring_sub.c

言語スコアのfactoring計算 [詳細]

#include <julius.h>

factoring_sub.cのインクルード依存関係図

ソースコードを見る。

説明

言語スコアのfactoring計算

作者:: Akinobu LEE

日付:: Mon Mar 7 23:20:26 2005

このファイルには，第１パスにおいて言語スコアの factoring を行うための関数が含まれています．木構造化辞書上でのサブツリー内の単語リスト (successor list) の構築，および認識中の言語スコア計算ルーチンが含まれます．

successor list は，木構造化辞書の各ノードに割り付けられる，そのノードを共有する単語のリストです．木構造化辞書において，枝部分の次のノードがこのリストを保持します．実際にはリストが変化する場所，すなわち木構造化辞書の枝の分岐点に割り付けられます．例えば，以下のような木構造化辞書の場合，数字の書いてあるノードに successor list が割り付けられます．

2-o-o - o-o-o - o-o-o word "A" / 1-o-o \ 4-o-o word "B" \ / 3-o-o - 5-o-o - 7-o-o word "C" \ \ \ 8-o-o word "D" 6-o-o word "E"

各 successor list はそのサブツリーに含まれる単語のリストです．この例では以下のようになります．

node | successor list (wchmm->state[node].sc) ======================= 1 | A B C D E 2 | A 3 | B C D E 4 | B 5 | C D 6 | E 7 | C 8 | D

ある successor list に含まれる単語が１つになったとき，その時点で単語が確定する．上記の場合，単語 "A" はノード 2 の位置ですでにその後続単語として "A" 以外無いので，そこで確定する．すなわち，単語 A の正確な言語スコアは，単語終端を待たずノード 2 で決まる．

第１パスにおける factoring の計算は，実際には beam.c で行なわれる． 2-gram factoringの場合，次ノードに successor list が存在すれば, その successor list の単語の 2-gram の最大値を求め, 伝搬してきている factoring 値を更新する．successor list に単語が1つのノードでは，正しい2-gramが自動的に割り当てられる． 1-gram factoringの場合，次ノードに successor list が存在する場合，その successor list の単語の 1-gram の最大値を求め，伝搬してきている factoring 値を更新する．successor list に単語が１つのノードで，はじめて 2-gram を計算する．

実際では 1-gram factoring では各 successor list における factoring 値は単語履歴に非依存なので，successor list 構築時に全てあらかじめ計算しておく．すなわち，エンジン起動時に木構造化辞書を構築後，successor list を構築したら，単語を2個以上含む successor list についてはその 1-gram の最大値を計算して，それをそのノードの fscore メンバに格納しておき，その successor list は free してしまえばよい．単語が１つのみの successor list についてはその単語IDを残しておき，探索時にパスがそこに到達したら正確な2-gramを計算すれば良い．

DFA文法使用時は，デフォルトでは言語制約(カテゴリ対制約)をカテゴリ単位で木を構築することで静的に表現する．このため，これらの factoring 機構は用いられない．ただし， CATEGORY_TREE が undefined であれば，決定的 factoring を用いた言語制約適用を行うことも可能である．すなわち，次ノードに successor list が存在すれば, その successor list 内の各単語と直前単語の単語対制約を調べ, そのうち一つでも接続可能な単語があれば，その遷移を許し，一つもなければ遷移させない．この機能は技術参考のために残されているのみである．

This file contains functions to do language score factoring on the 1st pass. They build a successor lists which holds the successive words in each sub tree on the tree lexicon, and also provide a factored LM probability on each nodes on the tree lexicon.

The "successor list" will be assigned for each lexicon tree node to represent a list of words that exist in the sub-tree and share the node. Actually they will be assigned to the branch node. Below is the example of successor lists on a tree lexicon, in which the lists is assigned to the numbered nodes.

2-o-o - o-o-o - o-o-o word "A" / 1-o-o \ 4-o-o word "B" \ / 3-o-o - 5-o-o - 7-o-o word "C" \ \ \ 8-o-o word "D" 6-o-o word "E"

The contents of the successor lists are the following:

node | successor list (wchmm->state[node].sc) ======================= 1 | A B C D E 2 | A 3 | B C D E 4 | B 5 | C D 6 | E 7 | C 8 | D

When the 1st pass proceeds, if the next going node has a successor list, all the word 2-gram scores in the successor list on the next node will be computed, and the propagating LM value in the token on the current node will be replaced by the maximum value of the scores when copied to the next node. Appearently, if the successor list has only one word, it means that the word can be determined on that point, and the precise 2-gram value will be assigned as is.

When using 1-gram factoring, the computation will be slightly different. Since the factoring value (maximum value of 1-gram scores on each successor list) is independent of the word context, they can be computed statically before the search. Thus, for all the successor lists that have more than two words, the maximum 1-gram value is computed and stored to "fscore" member in tree lexicon, and the successor lists will be freed. The successor lists with only one word should still remain in the tree lexicon, to compute the precise 2-gram scores for the words.

When using DFA grammar, Julian builds separated lexicon trees for every word categories, to statically express the catergory-pair constraint. Thus these factoring scheme is not used by default. However you can still force Julian to use the grammar-based deterministic factoring scheme by undefining CATEGORY_TREE. If CATEGORY_TREE is undefined, the word connection constraint will be performed based on the successor list at the middle of tree lexicon. This enables single tree search on Julian. This function is left only for technical reference.

Revision: 1.1.1.1

factoring_sub.c で定義されています。

Juliusに対してTue Mar 28 16:20:58 2006に生成されました。

1.4.2