julius/factoring_sub.c File Reference

Build successor tree and compute LM factoring values. More...

#include <julius.h>

Include dependency graph for factoring_sub.c:

Go to the source code of this file.

Detailed Description

Build successor tree and compute LM factoring values.

Author:: Akinobu LEE

Date:: Mon Mar 7 23:20:26 2005

This file contains functions to do language score factoring on the 1st pass. They build a successor lists which holds the successive words in each sub tree on the tree lexicon, and also provide a factored LM probability on each nodes on the tree lexicon.

The "successor list" will be assigned for each lexicon tree node to represent a list of words that exist in the sub-tree and share the node. Actually they will be assigned to the branch node. Below is the example of successor lists on a tree lexicon, in which the lists is assigned to the numbered nodes.

2-o-o - o-o-o - o-o-o word "A" / 1-o-o \ 4-o-o word "B" \ / 3-o-o - 5-o-o - 7-o-o word "C" \ \ \ 8-o-o word "D" 6-o-o word "E"

The contents of the successor lists are the following:

node | successor list (wchmm->state[node].sc) ======================= 1 | A B C D E 2 | A 3 | B C D E 4 | B 5 | C D 6 | E 7 | C 8 | D

When the 1st pass proceeds, if the next going node has a successor list, all the word 2-gram scores in the successor list on the next node will be computed, and the propagating LM value in the token on the current node will be replaced by the maximum value of the scores when copied to the next node. Appearently, if the successor list has only one word, it means that the word can be determined on that point, and the precise 2-gram value will be assigned as is.

When using 1-gram factoring, the computation will be slightly different. Since the factoring value (maximum value of 1-gram scores on each successor list) is independent of the word context, they can be computed statically before the search. Thus, for all the successor lists that have more than two words, the maximum 1-gram value is computed and stored to "fscore" member in tree lexicon, and the successor lists will be freed. The successor lists with only one word should still remain in the tree lexicon, to compute the precise 2-gram scores for the words.

When using DFA grammar, Julian builds separated lexicon trees for every word categories, to statically express the catergory-pair constraint. Thus these factoring scheme is not used by default. However you can still force Julian to use the grammar-based deterministic factoring scheme by undefining CATEGORY_TREE. If CATEGORY_TREE is undefined, the word connection constraint will be performed based on the successor list at the middle of tree lexicon. This enables single tree search on Julian. This function is left only for technical reference.

Revision: 1.3

Definition in file factoring_sub.c.

Generated on Tue Dec 26 12:53:30 2006 for Julian by

1.5.0