libjulius/src/factoring_sub.c File Reference

LM factoring on 1st pass.

#include <julius/julius.h>


Functions

static void add_successor (WCHMM_INFO *wchmm, int node, WORD_ID w)

Add a word to the successor list on a node in tree lexicon.

static boolean match_successor (WCHMM_INFO *wchmm, int node1, int node2)

Check if successor lists on two nodes are the same.

static void free_successor (WCHMM_INFO *wchmm, int scid)

Free successor list at the node.

static void compaction_successor (WCHMM_INFO *wchmm)

Garbage-collect successor lists by deleting those that are no longer linked from any node of the lexicon tree.

static void shrink_successor (WCHMM_INFO *wchmm)

Shrink the memory area that has been allocated for building successor list.

void make_successor_list (WCHMM_INFO *wchmm)

Main function to build the whole set of successor lists on the lexicon tree.

void adjust_sc_index (WCHMM_INFO *wchmm)

Adjust factoring data in tree lexicon for multipath transition handling.

void max_successor_cache_init (WCHMM_INFO *wchmm)

Initialize factoring cache for a tree lexicon, allocating memory for cache.

static void max_successor_prob_iw_free (WCHMM_INFO *wchmm)

Free cross-word factoring cache.

void max_successor_cache_free (WCHMM_INFO *wchmm)

Free all memory for factoring cache.

static LOGPROB calc_successor_prob (WCHMM_INFO *wchmm, WORD_ID lastword, int node)

Compute 2-gram factoring value for the node and return the probability.

LOGPROB max_successor_prob (WCHMM_INFO *wchmm, WORD_ID lastword, int node)

Compute the factoring LM score for the given word-internal node.

LOGPROB * max_successor_prob_iw (WCHMM_INFO *wchmm, WORD_ID lastword)

Compute cross-word factoring values for word-head nodes and return the list.

boolean can_succeed (WCHMM_INFO *wchmm, WORD_ID lastword, int node)

Deterministic factoring for grammar-based recognition (Julian).

Detailed Description

LM factoring on 1st pass.

This file contains functions that perform language score factoring on the 1st pass. They build successor lists, which hold the words reachable in each subtree of the tree lexicon, and also provide a factored LM probability on each node of the tree lexicon.

A "successor list" is assigned to a lexicon tree node to represent the list of words that exist in the subtree below it and therefore share that node. In practice, lists are assigned only to branch nodes. Below is an example of successor lists on a tree lexicon, in which the lists are assigned to the numbered nodes.

         2-o-o - o-o-o - o-o-o          word "A" 
        /
   1-o-o
        \       4-o-o                   word "B"
         \     /   
          3-o-o - 5-o-o - 7-o-o         word "C"
           \            \ 
            \            8-o-o          word "D"
             6-o-o                      word "E"

The contents of the successor lists are the following:

  node  | successor list (wchmm->state[node].sc)
  =======================
    1   | A B C D E
    2   | A
    3   |   B C D E
    4   |   B
    5   |     C D
    6   |         E
    7   |     C
    8   |       D
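The table can be reproduced mechanically: walk from the node where each word's path ends back up to the root and append the word to every node passed. The sketch below does this on a toy tree shaped like the figure. The arrays `parent` and `word_end`, the fixed-size `sc` table, and both function names are hypothetical and are not the layout Julius uses in `WCHMM_INFO`; for simplicity the sketch also gives a list to every numbered node rather than only to branch nodes.

```c
#include <stdio.h>

#define NNODES 9 /* nodes 1..8 as numbered in the figure; index 0 = "no parent" */
#define NWORDS 5 /* words A..E get IDs 0..4 */

/* Hypothetical toy tree shaped like the figure: parent[n] is the parent of node n. */
static const int parent[NNODES] = {0, 0, 1, 1, 3, 3, 3, 5, 5};
/* Node where each word's path ends: A->2, B->4, C->7, D->8, E->6. */
static const int word_end[NWORDS] = {2, 4, 7, 8, 6};

int sc[NNODES][NWORDS]; /* successor list of each node (word IDs) */
int sc_num[NNODES];     /* number of words in each list */

/* Append every word to all nodes on the path from its end node to the root.
   Visiting words in ascending ID order keeps every list sorted by ID. */
void build_successor_lists(void)
{
    for (int n = 0; n < NNODES; n++)
        sc_num[n] = 0;
    for (int w = 0; w < NWORDS; w++)
        for (int n = word_end[w]; n != 0; n = parent[n])
            sc[n][sc_num[n]++] = w;
}

/* Print one row of the table, e.g. "3 | B C D E". */
void print_successor_list(int n)
{
    printf("%d |", n);
    for (int i = 0; i < sc_num[n]; i++)
        printf(" %c", 'A' + sc[n][i]);
    printf("\n");
}
```

Calling `build_successor_lists()` and then `print_successor_list(n)` for n = 1..8 prints the same contents as the table.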

As the 1st pass proceeds, when a token moves to a node that has a successor list, the 2-gram scores of all words in that list are computed, and the LM value the token carries from the current node is replaced by the maximum of those scores as it is copied to the next node. If the successor list has only one word, the word is determined at that point, and its exact 2-gram value is assigned as is.
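A minimal sketch of that update, assuming the 2-gram scores for the current context word are available as a dense array `bigram_row`, and that a token remembers the LM estimate it last applied so the estimate can be swapped rather than accumulated. `logprob_t`, `token_t`, `factoring_value()` and `enter_node()` are hypothetical names, not Julius APIs.

```c
#include <float.h>

/* Hypothetical stand-in for Julius's LOGPROB type. */
typedef float logprob_t;

/* Factoring value of a node: the maximum 2-gram log probability over the
   words in its successor list. bigram_row[w] is assumed to hold
   log P(w | lastword) for the current context word. */
logprob_t factoring_value(const int *sc, int sc_num, const logprob_t *bigram_row)
{
    logprob_t best = -FLT_MAX;
    for (int i = 0; i < sc_num; i++)
        if (bigram_row[sc[i]] > best)
            best = bigram_row[sc[i]];
    return best; /* with a single successor this is that word's exact 2-gram */
}

/* Hypothetical token: its score includes the LM estimate last applied. */
typedef struct {
    logprob_t score;   /* accumulated score, LM estimate included */
    logprob_t last_lm; /* the LM estimate currently inside score */
} token_t;

/* Replace (not add) the LM estimate when the token enters a node with a list. */
void enter_node(token_t *t, logprob_t node_lm)
{
    t->score += node_lm - t->last_lm;
    t->last_lm = node_lm;
}
```

Subtracting the old estimate before adding the new one is one way to express "replaced"; the token's score then always contains exactly one LM term.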

With 1-gram factoring, the computation is slightly different. Since the factoring value (the maximum 1-gram score over each successor list) does not depend on the word context, it can be computed statically before the search. Thus, for every successor list that has more than one word, the maximum 1-gram value is computed and stored in the "fscore" member of the tree lexicon, and the successor list is freed. Successor lists with only one word remain in the tree lexicon so that the exact 2-gram scores for those words can still be computed.
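The static precomputation can be sketched as below. The `toy_node` record and `precompute_unigram_factoring()` are hypothetical (only the name "fscore" comes from the text); Julius keeps this data inside the tree lexicon rather than in a separate node array. The unigram scores are assumed to be given as an array indexed by word ID.

```c
#include <float.h>
#include <stdlib.h>

/* Hypothetical node record; Julius stores this in the tree lexicon instead. */
typedef struct {
    int *sc;        /* successor list (word IDs), or NULL once freed */
    int sc_num;     /* number of words in sc */
    float fscore;   /* precomputed 1-gram factoring value */
    int has_fscore; /* nonzero if fscore is valid */
} toy_node;

/* Before search: replace every list of two or more words by its maximum
   1-gram score. Single-word lists are kept for the exact 2-gram later. */
void precompute_unigram_factoring(toy_node *nodes, int nnodes, const float *unigram)
{
    for (int n = 0; n < nnodes; n++) {
        if (nodes[n].sc_num < 2)
            continue;
        float best = -FLT_MAX;
        for (int i = 0; i < nodes[n].sc_num; i++)
            if (unigram[nodes[n].sc[i]] > best)
                best = unigram[nodes[n].sc[i]];
        nodes[n].fscore = best;
        nodes[n].has_fscore = 1;
        free(nodes[n].sc);
        nodes[n].sc = NULL;
        nodes[n].sc_num = 0;
    }
}
```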

When a DFA grammar is used, Julian builds a separate lexicon tree for each word category so that the category-pair constraint is expressed statically. Thus this factoring scheme is not used by default. However, you can still force Julian to use the grammar-based deterministic factoring scheme by undefining CATEGORY_TREE. If CATEGORY_TREE is undefined, the word connection constraint is applied using the successor lists in the middle of the tree lexicon, which enables single-tree search in Julian. This function is kept only for technical reference.

Author:: Akinobu LEE

Date:: Mon Mar 7 23:20:26 2005

Revision: 1.1.1.1

Definition in file factoring_sub.c.

Function Documentation

static void add_successor ( WCHMM_INFO * wchmm, int node, WORD_ID w ) [static]

Add a word to the successor list on a node in tree lexicon.

Words in lists should be ordered by ID.

Parameters:

	wchmm	[i/o] tree lexicon
	node	[in] node id
	w	[in] word id

Definition at line 191 of file factoring_sub.c.

Referenced by make_successor_list().
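The documentation only requires that words stay ordered by ID. A sketch of sorted insertion into a growable array follows; `succ_list` and `succ_add()` are hypothetical, Julius's own list representation may differ, and skipping a word that is already present is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable successor list. */
typedef struct {
    int *words; /* word IDs in ascending order */
    int num;    /* number of words stored */
    int cap;    /* allocated capacity */
} succ_list;

/* Insert w keeping ascending ID order; a word already present is not added twice.
   Returns 0 on success, -1 on allocation failure. */
int succ_add(succ_list *l, int w)
{
    int pos = 0;
    while (pos < l->num && l->words[pos] < w)
        pos++;
    if (pos < l->num && l->words[pos] == w)
        return 0; /* already present */
    if (l->num == l->cap) {
        int ncap = l->cap ? l->cap * 2 : 4;
        int *p = realloc(l->words, (size_t)ncap * sizeof *p);
        if (p == NULL)
            return -1;
        l->words = p;
        l->cap = ncap;
    }
    /* shift the tail right by one to open slot pos */
    memmove(&l->words[pos + 1], &l->words[pos], (size_t)(l->num - pos) * sizeof *l->words);
    l->words[pos] = w;
    l->num++;
    return 0;
}
```

Usage: start from `succ_list l = {NULL, 0, 0};`, call `succ_add()` for each word, and `free(l.words)` when done.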

static boolean match_successor ( WCHMM_INFO * wchmm, int node1, int node2 ) [static]

Check if successor lists on two nodes are the same.

Parameters:

	wchmm	[in] tree lexicon
	node1	[in] 1st node id
	node2	[in] 2nd node id

Returns:: TRUE if they have the same successor list, or FALSE if they differ.

Definition at line 240 of file factoring_sub.c.

Referenced by make_successor_list().
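Because the lists are kept ordered by ID, comparing two of them needs no set logic: equal contents means equal length and equal elements at every position. A sketch with plain arrays (`succ_match()` is a hypothetical name):

```c
#include <stdbool.h>

/* Both lists are assumed sorted by word ID, so equal contents means
   equal length and equal elements position by position. */
bool succ_match(const int *a, int na, const int *b, int nb)
{
    if (na != nb)
        return false;
    for (int i = 0; i < na; i++)
        if (a[i] != b[i])
            return false;
    return true;
}
```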

static void free_successor ( WCHMM_INFO * wchmm, int scid ) [static]

Free successor list at the node.

Parameters:

	wchmm	[i/o] tree lexicon
	scid	[in] node id

Definition at line 280 of file factoring_sub.c.

Referenced by make_successor_list().

static void compaction_successor ( WCHMM_INFO * wchmm ) [static]

Garbage-collect successor lists by deleting those that are no longer linked from any node of the lexicon tree.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 309 of file factoring_sub.c.

Referenced by make_successor_list().
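One way to read "compaction" is sketched below: lists that no node links to are freed, the survivors are renumbered contiguously in their original order, and node references are remapped. The storage layout (`node_sc`, `lists`, `list_len`) and `compact_lists()` are hypothetical and only illustrate the idea; they are not the Julius data structures.

```c
#include <stdlib.h>

/* node_sc[n]: ID of the list node n links to, or -1 for none.
   lists[s], list_len[s]: the word IDs and length of list s.
   Deletes lists no node links to, renumbers survivors 0..k-1 in their
   original order, remaps node_sc, and returns k (or -1 on allocation failure). */
int compact_lists(int *node_sc, int nnodes, int **lists, int *list_len, int nlists)
{
    if (nlists == 0)
        return 0;
    int *remap = malloc((size_t)nlists * sizeof *remap);
    if (remap == NULL)
        return -1;
    for (int s = 0; s < nlists; s++)
        remap[s] = -1; /* -1: not linked from any node */
    for (int n = 0; n < nnodes; n++)
        if (node_sc[n] >= 0)
            remap[node_sc[n]] = 1; /* still linked */
    int k = 0;
    for (int s = 0; s < nlists; s++) {
        if (remap[s] < 0) { /* link was deleted: free the list */
            free(lists[s]);
            continue;
        }
        lists[k] = lists[s]; /* slide the survivor down to slot k */
        list_len[k] = list_len[s];
        remap[s] = k++;
    }
    for (int n = 0; n < nnodes; n++)
        if (node_sc[n] >= 0)
            node_sc[n] = remap[node_sc[n]];
    free(remap);
    return k;
}
```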

static void shrink_successor ( WCHMM_INFO * wchmm ) [static]

Shrink the memory area that has been allocated for building successor list.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 347 of file factoring_sub.c.

Referenced by max_successor_cache_init().

void make_successor_list ( WCHMM_INFO * wchmm )

Main function to build the whole set of successor lists on the lexicon tree.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 370 of file factoring_sub.c.

void adjust_sc_index ( WCHMM_INFO * wchmm )

Adjust factoring data in tree lexicon for multipath transition handling.

Parameters:

	wchmm	[in] tree lexicon

Definition at line 480 of file factoring_sub.c.

void max_successor_cache_init ( WCHMM_INFO * wchmm )

Initialize factoring cache for a tree lexicon, allocating memory for cache.

This should be called only once at startup.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 574 of file factoring_sub.c.

Referenced by j_launch_recognition_instance().


static void max_successor_prob_iw_free ( WCHMM_INFO * wchmm ) [static]

Free cross-word factoring cache.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 628 of file factoring_sub.c.

Referenced by max_successor_cache_free().

void max_successor_cache_free ( WCHMM_INFO * wchmm )

Free all memory for factoring cache.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 656 of file factoring_sub.c.

static LOGPROB calc_successor_prob ( WCHMM_INFO * wchmm, WORD_ID lastword, int node ) [static]

Compute 2-gram factoring value for the node and return the probability.

Parameters:

	wchmm	[in] tree lexicon
	lastword	[in] the last context word
	node	[in] node ID on wchmm

Returns:: the log probability of 2-gram on that node.

Definition at line 848 of file factoring_sub.c.

LOGPROB max_successor_prob ( WCHMM_INFO * wchmm, WORD_ID lastword, int node )

Compute the factoring LM score for the given word-internal node.

If the node is a shared branch node and 1-gram factoring is used, the constant factoring value that was assigned before the search is returned immediately. Otherwise, the maximum 2-gram probability over the corresponding successor words is computed.

The word-internal factoring cache is consulted within this function. If the given last word is the same as in the previous call on that node, the previously computed value is returned; otherwise the maximum value is computed and the cache is updated with the new last word and value.

Parameters:

	wchmm	[in] tree lexicon
	lastword	[in] word ID of last context word
	node	[in] node ID

Returns:: the LM factoring score.

Definition at line 923 of file factoring_sub.c.
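The per-node cache described above amounts to one slot per node holding the last context word and its value. A sketch follows; `node_cache`, `cached_factoring()` and the call-counting `toy_factoring()` are hypothetical, and the 1-gram shortcut for shared branch nodes is omitted.

```c
/* One cache slot per node: the context word and the value computed for it. */
typedef struct {
    int lastword; /* context word of the cached value, -1 when empty */
    float value;  /* factoring value computed for that context */
} node_cache;

typedef float (*factoring_fn)(int lastword, int node);

/* Return the factoring value for (lastword, node), recomputing only when the
   context word differs from the one cached on this node. */
float cached_factoring(node_cache *cache, int node, int lastword, factoring_fn compute)
{
    node_cache *c = &cache[node];
    if (c->lastword != lastword) { /* miss: recompute and remember */
        c->value = compute(lastword, node);
        c->lastword = lastword;
    }
    return c->value;
}

/* Toy scorer for illustration only: counts calls so cache hits can be observed. */
int toy_calls = 0;
float toy_factoring(int lastword, int node)
{
    toy_calls++;
    return -(float)(lastword + node);
}
```

A single slot per node is cheap and works well because consecutive frames usually revisit a node with the same context word.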

LOGPROB* max_successor_prob_iw ( WCHMM_INFO * wchmm, WORD_ID lastword )

Compute cross-word factoring values for word-head nodes and return the list.

Given the last word, this function computes the factoring LM scores for all word-head nodes for which context-dependent (not 1-gram) factoring values are needed. The resulting list of factoring values is cached within this function per last word.

Parameters:

	wchmm	[in] tree lexicon
	lastword	[in] last word

Returns:: the list of factoring LM scores for all the needed word-head nodes.

Definition at line 1030 of file factoring_sub.c.
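A per-last-word cache can be sketched as a lazily filled table of arrays, one array of head-node values per context word. The `iw_cache` layout, `iw_factoring()`, `iw_cache_free()` and the toy scorer are hypothetical; Julius's actual cache structure is not described here.

```c
#include <stdlib.h>

typedef float (*head_score_fn)(int lastword, int head);

typedef struct {
    float **by_word; /* by_word[w]: values for all head nodes after word w, or NULL */
    int nwords;      /* vocabulary size */
    int nheads;      /* head nodes needing context-dependent values (assumed > 0) */
} iw_cache;

/* Return the per-head factoring values for lastword, computing them once
   and reusing the stored array on later calls with the same lastword. */
const float *iw_factoring(iw_cache *c, int lastword, head_score_fn score)
{
    if (c->by_word[lastword] == NULL) {
        float *v = malloc((size_t)c->nheads * sizeof *v);
        if (v == NULL)
            return NULL;
        for (int h = 0; h < c->nheads; h++)
            v[h] = score(lastword, h);
        c->by_word[lastword] = v;
    }
    return c->by_word[lastword];
}

void iw_cache_free(iw_cache *c)
{
    for (int w = 0; w < c->nwords; w++)
        free(c->by_word[w]);
    free(c->by_word);
    c->by_word = NULL;
}

/* Toy scorer for illustration only: counts calls so cache hits can be observed. */
int toy_head_calls = 0;
float toy_head_score(int lastword, int head)
{
    toy_head_calls++;
    return -(float)(lastword + head);
}
```

Keeping one array per context word trades memory for speed; a smaller variant would keep only the array for the most recent last word.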

boolean can_succeed ( WCHMM_INFO * wchmm, WORD_ID lastword, int node )

Deterministic factoring for grammar-based recognition (Julian).

If CATEGORY_TREE is defined (the default) in Julian, the tree lexicon is organized per category, and the category-pair constraint used in the 1st pass can be applied statically at cross-word transitions.

If CATEGORY_TREE is not defined, a single tree lexicon is constructed for the whole dictionary. In this case, the category-pair constraint must be applied dynamically at word-internal transitions, like the N-gram factoring scheme in Julius.

This function provides such word-internal factoring for grammar-based recognition (called deterministic factoring) when CATEGORY_TREE is undefined in Julian.

Parameters:

	wchmm	[in] tree lexicon
	lastword	[in] last word
	node	[in] node ID to check the constraint

Returns:: TRUE if the transition to the branch is allowed under the category-pair constraint, or FALSE if not.

Definition at line 1176 of file factoring_sub.c.
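Deterministic factoring reduces to a yes/no test: a branch may be entered only if at least one word in its successor list belongs to a category that can follow the last word's category. The sketch assumes a boolean category-pair matrix and a word-to-category array; `toy_can_succeed()` and `NCAT` are hypothetical, and the special case of sentence start (no last word) is not handled.

```c
#include <stdbool.h>

#define NCAT 3 /* hypothetical number of word categories */

/* cpair[a][b] is true if a word of category b may follow one of category a.
   word_cat[w] is the category of word w; sc/sc_num is the node's successor list.
   The branch is allowed if at least one word below it may follow lastword. */
bool toy_can_succeed(bool cpair[NCAT][NCAT], const int *word_cat,
                     int lastword, const int *sc, int sc_num)
{
    int lc = word_cat[lastword];
    for (int i = 0; i < sc_num; i++)
        if (cpair[lc][word_cat[sc[i]]])
            return true;
    return false;
}
```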

Generated on Tue Dec 18 16:01:01 2007 for Julius by doxygen 1.5.4

