julius/factoring_sub.c File Reference

Build successor tree and compute LM factoring values. More...

#include <julius.h>

Include dependency graph for factoring_sub.c:

Go to the source code of this file.

Functions

static void add_successor (WCHMM_INFO *wchmm, int node, WORD_ID w)
 Add a word to the successor list on a node in tree lexicon. Words in lists should be ordered by ID.
static boolean match_successor (WCHMM_INFO *wchmm, int node1, int node2)
static void free_successor (WCHMM_INFO *wchmm, int scid)
static void compaction_successor (WCHMM_INFO *wchmm)
static void shrink_successor (WCHMM_INFO *wchmm)
void make_successor_list (WCHMM_INFO *wchmm)
void max_successor_cache_init (WCHMM_INFO *wchmm)
static void max_successor_prob_iw_free ()
void max_successor_cache_free ()
void make_iwcache_index (WCHMM_INFO *wchmm)
 Make a list of word head nodes on which cross-word factoring cache is needed.
void calc_all_unigram_factoring_values (WCHMM_INFO *wchmm)
 Calculate all the 1-gram factoring values on tree lexicon.
LOGPROB max_successor_prob (WCHMM_INFO *wchmm, WORD_ID lastword, int node)
 compute factoring LM score for the given word-internal node.
LOGPROBmax_successor_prob_iw (WCHMM_INFO *wchmm, WORD_ID lastword)
 Compute cross-word facgtoring values for word head nodes and return the list.

Variables

static LOGPROBprobcache
 Word-internal factoring cache indexed by scid, holding last score.
static WORD_IDlastwcache
 Word-internal factoring cache indexed by scid, holding last N-gram entry ID.
static LOGPROB ** iw_sc_cache
 Cross-word factoring cache to hold last-word-dependent factoring score toward word head nodes.
static int iw_cache_num


Detailed Description

Build successor tree and compute LM factoring values.

Author:
Akinobu LEE
Date:
Mon Mar 7 23:20:26 2005
This file contains functions to do language score factoring on the 1st pass. They build a successor lists which holds the successive words in each sub tree on the tree lexicon, and also provide a factored LM probability on each nodes on the tree lexicon.

The "successor list" will be assigned for each lexicon tree node to represent a list of words that exist in the sub-tree and share the node. Actually they will be assigned to the branch node. Below is the example of successor lists on a tree lexicon, in which the lists is assigned to the numbered nodes.

2-o-o - o-o-o - o-o-o word "A" / 1-o-o \ 4-o-o word "B" \ / 3-o-o - 5-o-o - 7-o-o word "C" \ \ \ 8-o-o word "D" 6-o-o word "E"

The contents of the successor lists are the following:

node | successor list (wchmm->state[node].sc) ======================= 1 | A B C D E 2 | A 3 | B C D E 4 | B 5 | C D 6 | E 7 | C 8 | D

When the 1st pass proceeds, if the next going node has a successor list, all the word 2-gram scores in the successor list on the next node will be computed, and the propagating LM value in the token on the current node will be replaced by the maximum value of the scores when copied to the next node. Appearently, if the successor list has only one word, it means that the word can be determined on that point, and the precise 2-gram value will be assigned as is.

When using 1-gram factoring, the computation will be slightly different. Since the factoring value (maximum value of 1-gram scores on each successor list) is independent of the word context, they can be computed statically before the search. Thus, for all the successor lists that have more than two words, the maximum 1-gram value is computed and stored to "fscore" member in tree lexicon, and the successor lists will be freed. The successor lists with only one word should still remain in the tree lexicon, to compute the precise 2-gram scores for the words.

When using DFA grammar, Julian builds separated lexicon trees for every word categories, to statically express the catergory-pair constraint. Thus these factoring scheme is not used by default. However you can still force Julian to use the grammar-based deterministic factoring scheme by undefining CATEGORY_TREE. If CATEGORY_TREE is undefined, the word connection constraint will be performed based on the successor list at the middle of tree lexicon. This enables single tree search on Julian. This function is left only for technical reference.

Revision
1.3

Definition in file factoring_sub.c.


Function Documentation

static void add_successor ( WCHMM_INFO wchmm,
int  node,
WORD_ID  w 
) [static]

Add a word to the successor list on a node in tree lexicon. Words in lists should be ordered by ID.

Parameters:
wchmm [i/o] tree lexicon
node [in] node id
w [in] word id

Definition at line 184 of file factoring_sub.c.

static boolean match_successor ( WCHMM_INFO wchmm,
int  node1,
int  node2 
) [static]

Check if successor lists on two nodes are the same.

Parameters:
wchmm [in] tree lexicon
node1 [in] 1st node id
node2 [in] 2nd node id
Returns:
TRUE if they have the same successor list, or FALSE if they differ.

Definition at line 236 of file factoring_sub.c.

static void free_successor ( WCHMM_INFO wchmm,
int  scid 
) [static]

Free successor list at the node

Parameters:
wchmm [i/o] tree lexicon
scid [in] node id

Definition at line 276 of file factoring_sub.c.

static void compaction_successor ( WCHMM_INFO wchmm  )  [static]

Garbage collection of the successor list, by deleting successor lists to which the link was deleted on the lexicon tree.

Parameters:
wchmm [i/o] tree lexiton

Definition at line 305 of file factoring_sub.c.

static void shrink_successor ( WCHMM_INFO wchmm  )  [static]

Shrink the memory area that has been allocated for building successor list.

Parameters:
wchmm [i/o] tree lexicon

Definition at line 341 of file factoring_sub.c.

Referenced by max_successor_cache_init().

void make_successor_list ( WCHMM_INFO wchmm  ) 

Main function to build whole successor list to lexicon tree.

Parameters:
wchmm [i/o] tree lexicon

Definition at line 360 of file factoring_sub.c.

void max_successor_cache_init ( WCHMM_INFO wchmm  ) 

Initialize factoring cache for a tree lexicon, allocating memory for cache. This should be called only once on start up.

Parameters:
wchmm [i/o] tree lexicon

Definition at line 602 of file factoring_sub.c.

Referenced by final_fusion().

static void max_successor_prob_iw_free (  )  [static]

Free cross-word factoring cache.

Definition at line 645 of file factoring_sub.c.

Referenced by max_successor_cache_free(), and max_successor_prob_iw().

void max_successor_cache_free (  ) 

Free all memory for factoring cache.

Definition at line 665 of file factoring_sub.c.

void make_iwcache_index ( WCHMM_INFO wchmm  ) 

Make a list of word head nodes on which cross-word factoring cache is needed.

On 1-gram factoring, the branch nodes on tree lexicon has a fixed factoring value (maximum 1-gram score of all sub-tree words). Thus, when computing cross-word factoring at word head nodes on inter-word transition, such 1-gram factoring nodes on word head, shared by several words, need not be cached in inter-word factoring cache.

This function make a list of word-head nodes which requires inter-word factoring caching (i.e. isolated word head nodes, does not shared by other words) from the existing list of word head nodes, and set it to wchmm->start2isolate and wchmm->isolatenum.

Parameters:
wchmm [i/o] tree lexicon

Definition at line 715 of file factoring_sub.c.

void calc_all_unigram_factoring_values ( WCHMM_INFO wchmm  ) 

Calculate all the 1-gram factoring values on tree lexicon.

On 1-gram factoring, the shared nodes on branch has fixed factoring score from 1-gram values, independent of the word context on recognition. So the values are fixed for all recognition and can be calculated before search. This function stores all the neede 1-gram factoring value by traversing tree lexicon with successor lists and compute maximum 1-gram for each successor lists that has more than two words (=shared). Since a successor list is no more neede after the 1-gram value is computed, they will be freed.

Actually, computed factoring scores will be stored in wchmm->fscore sequencially, and the index value, starting from 1, to the fscore list is stored in scid of each nodes as a negative value. The free will be performed in compaction_successor() by checking if a successor's corresponding scid on tree lexicon has negative value.

Parameters:
wchmm [i/o] tree lexicon

Definition at line 773 of file factoring_sub.c.

LOGPROB max_successor_prob ( WCHMM_INFO wchmm,
WORD_ID  lastword,
int  node 
)

compute factoring LM score for the given word-internal node.

If it is a shared branch node and 1-gram factoring is used, the constant factoring value which has already been assigned before search will be returned immediately. Else, the maximum 2-gram probability of corresponding successor words are computed.

The word-internal factoring cache is consulted within this function. If the given last word is the same as the last call on that node, the last computed value will be returned, else the maximum value will be computed update the cache with the last word and value.

Parameters:
wchmm [in] tree lexicon
lastword [in] word ID of last context word
node [in] node ID
Returns:
the LM factoring score.

Definition at line 900 of file factoring_sub.c.

Referenced by get_back_trellis_proceed(), and init_nodescore().

LOGPROB* max_successor_prob_iw ( WCHMM_INFO wchmm,
WORD_ID  lastword 
)

Compute cross-word facgtoring values for word head nodes and return the list.

Given a last word, this function compute the factoring LM scores for all the word head node to which the context-dependent (not 1-gram) factoring values should be computed. The resulting list of factoring values are cached within this function per the last word.

Parameters:
wchmm [in] tree lexicon
lastword [in] last word
Returns:
the list of factoring LM scores for all the needed word-head nodes.

Definition at line 992 of file factoring_sub.c.


Variable Documentation

LOGPROB** iw_sc_cache [static]

Cross-word factoring cache to hold last-word-dependent factoring score toward word head nodes.

Cached values will be stored as [last_nword][n], where n is the number of word-head node on which the last_nword-dependent N-gram factoring value should be computed on cross-word transition. In 1-gram factoring, n equals to wchmm->isolatenum, the number of isolated (not shared) word-head nodes. In 2-gram factoring, n simply equals to wchmm->startnum, the number of all word-head nodes.

The cache area will be allocated per the previous word when they appeared while search. It will retain across the speech stream, so the cache area will grow to an extent as recognition was done for many files.

Definition at line 575 of file factoring_sub.c.

Referenced by max_successor_prob_iw().

int iw_cache_num [static]

Maximum size of cross-word factoring cache iw_sc_cache per last word. The value is set in max_successor_cache_init().

Definition at line 580 of file factoring_sub.c.

Referenced by max_successor_prob_iw().


Generated on Tue Dec 26 16:16:41 2006 for Julius by  doxygen 1.5.0