julius/wchmm.c File Reference

Build tree lexicon.

#include <julius.h>

Functions

WCHMM_INFO * wchmm_new ()

static void wchmm_init (WCHMM_INFO *wchmm)

static void wchmm_extend (WCHMM_INFO *wchmm)

void wchmm_free (WCHMM_INFO *w)

static int compare_wseq (WORD_ID *widx1, WORD_ID *widx2)

static void wchmm_sort_idx_by_wseq (WORD_INFO *winfo, WORD_ID *windex, WORD_ID bgn, WORD_ID len)

static int wchmm_check_match (WORD_INFO *winfo, int i, int j)

static void add_wacc (WCHMM_INFO *wchmm, int node, LOGPROB a, int arc)

static void wchmm_link_hmm (WCHMM_INFO *wchmm, int from_node, int to_node, HTK_HMM_Trans *tinfo)

static void wchmm_link_subword (WCHMM_INFO *wchmm, int from_word, int from_seq, int to_word, int to_seq)

static void wchmm_duplicate_state (WCHMM_INFO *wchmm, int node, int word)

static void wchmm_duplicate_leafnode (WCHMM_INFO *wchmm)

static void wchmm_add_word (WCHMM_INFO *wchmm, int word, int matchlen, int matchword)

static void wchmm_index_ststart (WCHMM_INFO *wchmm)

static void wchmm_calc_wordend_arc (WCHMM_INFO *wchmm)

static int compare_prob (LOGPROB *a, LOGPROB *b)

static LOGPROB get_nbest_uniprob (WORD_INFO *winfo, int n)

void build_wchmm2 (WCHMM_INFO *wchmm)

void print_wchmm_info (WCHMM_INFO *wchmm)

Variables

static int dupcount = 0

Number of duplicated nodes (for debug only). If defined, do wchmm size estimation (for debug only).

static WORD_INFO * local_winfo

Temporary work area for sort callbacks.

static int separated_word_count

Number of words actually separated (linearized) from the tree.

Detailed Description

Build tree lexicon.

Author: Akinobu Lee

Date: Mon Sep 19 23:39:15 2005

Functions to build a tree lexicon (called a word-conjunction HMM here) from a word dictionary, HMMs, and language models are defined here. The constructed tree lexicon is used for recognition in the 1st pass. The lexicon is built per HMM state, and information such as output probabilities, arcs, and language model constraints is assembled in it.

Note that the word "wchmm" in the source code is a synonym of "tree lexicon".

Revision: 1.4

Definition in file wchmm.c.

Function Documentation

WCHMM_INFO* wchmm_new ( )

Allocate a new tree lexicon structure.

Returns: pointer to the newly allocated tree lexicon structure.

Definition at line 70 of file wchmm.c.

Referenced by final_fusion().

static void wchmm_init ( WCHMM_INFO * wchmm ) [static]

Initialize content of a lexicon tree.

Parameters:

	wchmm	[out] pointer to the lexicon tree structure

Definition at line 103 of file wchmm.c.

Referenced by build_wchmm2().

static void wchmm_extend ( WCHMM_INFO * wchmm ) [static]

Expand state-related area in a tree lexicon by MAXWCNSTEP.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 156 of file wchmm.c.

Referenced by wchmm_add_word(), and wchmm_duplicate_state().
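
As a rough illustration of expanding storage in fixed steps, the sketch below grows a single per-node array by a constant increment using realloc. The names `NodeTable`, `STEP`, `table_extend`, and `table_push` are hypothetical and not taken from wchmm.c; the actual value of MAXWCNSTEP and the set of arrays that wchmm_extend() resizes are not given in this reference.

```c
#include <stdlib.h>

#define STEP 4  /* hypothetical stand-in for MAXWCNSTEP */

/* Hypothetical per-node storage; the real WCHMM_INFO is not shown here. */
typedef struct {
    int *state;   /* one entry per node */
    int  maxnum;  /* allocated capacity */
    int  n;       /* number of nodes in use */
} NodeTable;

/* Grow the capacity by STEP; returns 0 on success, -1 on allocation failure. */
static int table_extend(NodeTable *t)
{
    int newmax = t->maxnum + STEP;
    int *p = realloc(t->state, (size_t)newmax * sizeof *p);
    if (p == NULL) return -1;   /* the old block is still valid on failure */
    t->state = p;
    t->maxnum = newmax;
    return 0;
}

/* Append a value, extending the table first when it is full. */
static int table_push(NodeTable *t, int v)
{
    if (t->n >= t->maxnum && table_extend(t) != 0) return -1;
    t->state[t->n++] = v;
    return 0;
}
```

Growing by a fixed step keeps the memory overhead bounded, at the cost of more reallocations than a doubling strategy would need for very large lexicons.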

void wchmm_free ( WCHMM_INFO * w )

Free all data in a tree lexicon.

Parameters:

	w	[in] tree lexicon

Definition at line 206 of file wchmm.c.

static int compare_wseq ( WORD_ID * widx1, WORD_ID * widx2 ) [static]

qsort callback function to sort words by their phoneme sequence.

Parameters:

	widx1	[in] pointer to word id #1
	widx2	[in] pointer to word id #2

Returns: 1 if word[widx2] is part of word[widx1], -1 if word[widx1] is part of word[widx2], or 0 if the two words are equal.

Definition at line 292 of file wchmm.c.

Referenced by wchmm_sort_idx_by_wseq().

static void wchmm_sort_idx_by_wseq ( WORD_INFO * winfo, WORD_ID * windex, WORD_ID bgn, WORD_ID len ) [static]

Sort word IDs in windex[bgn..bgn+len-1] by their phoneme sequence order.

Parameters:

	winfo	[in] word lexicon
	windex	[i/o] index sequence of word IDs (sorted in place by this function)
	bgn	[in] start point to sort in windex
	len	[in] length of indexes to be sorted from bgn

Definition at line 341 of file wchmm.c.

Referenced by build_wchmm2().
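
The comparator and range sort described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in wchmm.c: the `Word` type, the integer phoneme IDs, and the names `sort_words`, `cmp_by_phonemes`, and `sort_index` are all hypothetical. The file-scope pointer plays a role similar to `local_winfo`, because a qsort callback receives no context argument.

```c
#include <stdlib.h>

/* Hypothetical word entry: a phoneme sequence stored as integer IDs. */
typedef struct {
    int        wlen;  /* number of phonemes */
    const int *wseq;  /* phoneme IDs, wlen entries */
} Word;

/* Work pointer read by the comparator; set just before calling qsort. */
static const Word *sort_words;

/* Order two word indices by phoneme sequence; a prefix sorts first. */
static int cmp_by_phonemes(const void *a, const void *b)
{
    const Word *w1 = &sort_words[*(const int *)a];
    const Word *w2 = &sort_words[*(const int *)b];
    int n = w1->wlen < w2->wlen ? w1->wlen : w2->wlen;

    for (int i = 0; i < n; i++) {
        if (w1->wseq[i] != w2->wseq[i])
            return w1->wseq[i] < w2->wseq[i] ? -1 : 1;
    }
    if (w1->wlen == w2->wlen) return 0;   /* identical sequences */
    return w1->wlen < w2->wlen ? -1 : 1;  /* shorter word is a prefix */
}

/* Sort index[bgn..bgn+len-1] in place by phoneme sequence. */
static void sort_index(const Word *words, int *index, int bgn, int len)
{
    sort_words = words;
    qsort(index + bgn, (size_t)len, sizeof *index, cmp_by_phonemes);
}
```

Sorting an index array rather than the words themselves leaves the dictionary order, and therefore the word IDs, unchanged.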

static int wchmm_check_match ( WORD_INFO * winfo, int i, int j ) [static]

Compare two words phoneme by phoneme from the word head to see how many phonemes the two can share.

Parameters:

	winfo	[in] word dictionary
	i	[in] a word
	j	[in] another word

Returns: the number of phonemes to be shared from the head of the words.

Definition at line 430 of file wchmm.c.
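
A minimal sketch of counting shared head phonemes is shown below. It assumes phonemes are represented as integer IDs in plain arrays; the function name `shared_prefix_len` and its signature are hypothetical, and any extra conditions the real wchmm_check_match() applies are not covered by this reference.

```c
/* Count how many leading phonemes two sequences have in common. */
static int shared_prefix_len(const int *a, int alen, const int *b, int blen)
{
    int k = 0;
    while (k < alen && k < blen && a[k] == b[k])
        k++;
    return k;
}
```

For example, the sequences {1, 2, 3, 4} and {1, 2, 5} share their first two phonemes, so the two words could share the first two phonemes of a tree branch.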

static void add_wacc ( WCHMM_INFO * wchmm, int node, LOGPROB a, int arc ) [static]

Add a transition arc between two nodes on the tree lexicon.

Parameters:

	wchmm	[i/o] tree lexicon
	node	[in] node number of source node
	a	[in] transition probability in log scale
	arc	[in] node number of destination node

Definition at line 463 of file wchmm.c.

Referenced by wchmm_duplicate_state(), and wchmm_link_hmm().
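
To make the idea of attaching an arc to a source node concrete, here is a small sketch that keeps one linked list of outgoing arcs per node. The storage layout is an assumption: this reference does not describe how WCHMM_INFO stores arcs, and the names `Arc`, `add_arc`, `count_arcs`, and `free_arcs` are hypothetical. A plain `float` stands in for LOGPROB.

```c
#include <stdlib.h>

/* Hypothetical outgoing arc: destination node and log probability. */
typedef struct Arc {
    int         to;    /* destination node number */
    float       logp;  /* transition probability in log scale */
    struct Arc *next;  /* next arc leaving the same source node */
} Arc;

/* Prepend an arc node -> to; returns 0 on success, -1 on allocation failure. */
static int add_arc(Arc **arcs, int node, float logp, int to)
{
    Arc *a = malloc(sizeof *a);
    if (a == NULL) return -1;
    a->to = to;
    a->logp = logp;
    a->next = arcs[node];
    arcs[node] = a;
    return 0;
}

/* Count the arcs leaving a node. */
static int count_arcs(Arc **arcs, int node)
{
    int n = 0;
    for (const Arc *a = arcs[node]; a != NULL; a = a->next) n++;
    return n;
}

/* Release every arc list for nodes 0..num_nodes-1. */
static void free_arcs(Arc **arcs, int num_nodes)
{
    for (int i = 0; i < num_nodes; i++) {
        while (arcs[i] != NULL) {
            Arc *next = arcs[i]->next;
            free(arcs[i]);
            arcs[i] = next;
        }
    }
}
```

A self-transition is expressed in this sketch by passing the same node number as source and destination.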

static void wchmm_link_hmm ( WCHMM_INFO * wchmm, int from_node, int to_node, HTK_HMM_Trans * tinfo ) [static]

Add a transition from end node of a phone to start node of another phone.

Parameters:

	wchmm	[i/o] tree lexicon
	from_node	[in] end node of a phone
	to_node	[in] start node of a phone
	tinfo	[in] transition prob. matrix of the from_node phone.

Definition at line 592 of file wchmm.c.

Referenced by wchmm_link_subword().

static void wchmm_link_subword ( WCHMM_INFO * wchmm, int from_word, int from_seq, int to_word, int to_seq ) [static]

Connect two phonemes in tree lexicon.

Parameters:

	wchmm	[i/o] tree lexicon
	from_word	[in] source word ID
	from_seq	[in] index of source phoneme in from_word from which the other will be connected
	to_word	[in] destination word ID
	to_seq	[in] index of destination phoneme in to_word to which the other will connect

Definition at line 641 of file wchmm.c.

static void wchmm_duplicate_state ( WCHMM_INFO * wchmm, int node, int word ) [static]

Isolate word-end nodes for homophones: duplicate the word-end state, link the copy in the same way as the original, and make it the word-end node of the given new word.

Parameters:

	wchmm	[i/o] tree lexicon
	node	[in] the word end node of the already existing homophone
	word	[in] word ID to be added to the tree

Definition at line 696 of file wchmm.c.

Referenced by wchmm_duplicate_leafnode().

static void wchmm_duplicate_leafnode ( WCHMM_INFO * wchmm ) [static]

Scan the whole lexicon tree to find already registered homophones, and isolate the word-end nodes of the found homophones from the others.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 810 of file wchmm.c.

static void wchmm_add_word ( WCHMM_INFO * wchmm, int word, int matchlen, int matchword ) [static]

Add a new word to the lexicon tree. The longest matched word in the current lexicon tree and the number of phonemes matched from the word head must be specified to tell where in the tree to insert the new word.

Parameters:

	wchmm	[i/o] tree lexicon
	word	[in] word id to be added to the lexicon
	matchlen	[in] phoneme match length between word and matchword.
	matchword	[in] the longest matched word with word in the current lexicon tree

Definition at line 891 of file wchmm.c.

static void wchmm_index_ststart ( WCHMM_INFO * wchmm ) [static]

Inspect the whole lexicon tree to generate a list of word head states for inter-word transition computation.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 1329 of file wchmm.c.

static void wchmm_calc_wordend_arc ( WCHMM_INFO * wchmm ) [static]

Scan the lexicon tree to make a list of emission probabilities from the word end states.

Parameters:

	wchmm	[i/o] tree lexicon

Definition at line 1367 of file wchmm.c.

static int compare_prob ( LOGPROB * a, LOGPROB * b ) [static]

qsort callback function to sort unigram values.

Parameters:

	a	[in] element #1
	b	[in] element #2

Returns: the result of comparison.

Definition at line 1407 of file wchmm.c.

Referenced by get_nbest_uniprob().

static LOGPROB get_nbest_uniprob ( WORD_INFO * winfo, int n ) [static]

Get the Nth-best unigram probability from all words.

Parameters:

	winfo	[in] word dictionary
	n	[in] required rank

Returns: the Nth-best unigram probability.

Definition at line 1433 of file wchmm.c.

Referenced by build_wchmm2().
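
One straightforward way to obtain an Nth-best value is to sort a copy of the probabilities in descending order and read entry N. The sketch below does that with qsort. It is not taken from wchmm.c: the names `cmp_desc` and `nth_best`, the use of `float` in place of LOGPROB, the 1-based rank, and the clamping of an oversized rank to the last entry are all assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* qsort callback: larger values first. */
static int cmp_desc(const void *a, const void *b)
{
    float x = *(const float *)a;
    float y = *(const float *)b;
    return (x < y) - (x > y);
}

/*
 * Store the nth-best (1-based) value of probs[0..num-1] in *out.
 * A rank larger than num is clamped to num (an assumption of this sketch).
 * Returns 0 on success, -1 on bad arguments or allocation failure.
 */
static int nth_best(const float *probs, int num, int n, float *out)
{
    if (num <= 0 || n < 1) return -1;
    if (n > num) n = num;

    float *tmp = malloc((size_t)num * sizeof *tmp);
    if (tmp == NULL) return -1;

    memcpy(tmp, probs, (size_t)num * sizeof *tmp);  /* keep input unsorted */
    qsort(tmp, (size_t)num, sizeof *tmp, cmp_desc);
    *out = tmp[n - 1];
    free(tmp);
    return 0;
}
```

Sorting a copy costs extra memory but leaves the caller's array in its original order.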

void build_wchmm2 ( WCHMM_INFO * wchmm )

Build a tree lexicon from the given word dictionary and language model. This function does the same job as build_wchmm(), but it is much faster because the longest matched word for each word being added is found by first sorting all the words in the dictionary by their phoneme sequence order. This function is used instead of build_wchmm() by default.

Parameters:

	wchmm	[i/o] lexicon tree

Definition at line 1682 of file wchmm.c.

Referenced by final_fusion().
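
The benefit of sorting first can be seen in a simplified sketch: once words are ordered by phoneme sequence, the word sharing the longest head with a given word, among those before it, is its immediate predecessor, so one comparison per word is enough. The code below uses that property to count how many phoneme nodes a prefix tree would need. It works at the phoneme level only and ignores HMM states, homophone handling, and language model data; `SortedWord`, `prefix_len`, and `count_tree_phones` are hypothetical names.

```c
/* Hypothetical word entry: phonemes stored as integer IDs. */
typedef struct {
    int        wlen;  /* number of phonemes */
    const int *wseq;  /* phoneme IDs, wlen entries */
} SortedWord;

/* Number of leading phonemes two words have in common. */
static int prefix_len(const SortedWord *a, const SortedWord *b)
{
    int k = 0;
    while (k < a->wlen && k < b->wlen && a->wseq[k] == b->wseq[k])
        k++;
    return k;
}

/*
 * Count the phoneme nodes of a prefix tree built from words that are
 * already sorted by phoneme sequence. Each word only adds the phonemes
 * that follow its match with the previous word.
 */
static int count_tree_phones(const SortedWord *w, int num)
{
    int total = 0;
    for (int i = 0; i < num; i++) {
        int matchlen = (i == 0) ? 0 : prefix_len(&w[i - 1], &w[i]);
        total += w[i].wlen - matchlen;
    }
    return total;
}
```

Without the sort, finding the longest match for each new word would mean comparing it against every word already in the tree.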

void print_wchmm_info ( WCHMM_INFO * wchmm )

Output some specifications of the tree lexicon (size etc.) to stdout.

Parameters:

	wchmm	[in] tree lexicon already built

Definition at line 1958 of file wchmm.c.

Referenced by print_info().

Generated on Tue Dec 26 16:16:59 2006 for Julius by Doxygen 1.5.0