julius/ngram_decode.c File Reference

Word prediction based on N-gram probability and word trellis index for the 2nd pass of Julius.

#include <julius.h>


Functions

static int compare_nw (NEXTWORD **a, NEXTWORD **b)

static NEXTWORD * search_nw (NEXTWORD **nw, WORD_ID w, int num)

static void set_word_context (WORD_ID *cseq, int n, WORD_INFO *winfo)

static int pick_backtrellis_words (BACKTRELLIS *bt, WORD_INFO *winfo, NGRAM_INFO *ngram, NEXTWORD **nw, int oldnum, NODE *hypo, short t)

Extract next word candidates from word trellis.

int get_backtrellis_words (BACKTRELLIS *bt, WORD_INFO *winfo, NGRAM_INFO *ngram, NEXTWORD **nw, NODE *hypo, short tm, short t_end)

Look for the next word candidates on the word trellis near the specified time frame.

int limit_nw (NEXTWORD **nw, NODE *hypo, int num)

int ngram_firstwords (NEXTWORD **nw, int peseqlen, int maxnw, WORD_INFO *winfo, BACKTRELLIS *bt)

Return the set of initial word hypotheses at the beginning.

int ngram_nextwords (NODE *hypo, NEXTWORD **nw, int maxnw, NGRAM_INFO *ngram, WORD_INFO *winfo, BACKTRELLIS *bt)

Return the list of next word candidates.

boolean ngram_acceptable (NODE *hypo, WORD_INFO *winfo)

Variables

static WORD_ID cnword [2]

Last two non-transparent words.

static int cnnum

Number of non-transparent words found (<=2).

static int last_trans

Number of skipped transparent words.

Detailed Description

Word prediction based on N-gram probability and word trellis index for the 2nd pass of Julius.

Author:: Akinobu Lee

Date:: Fri Jul 8 14:57:51 2005

These functions return the next word candidates for the 2nd recognition pass of Julius, i.e. N-gram based stack decoding.

Given a partial sentence hypothesis, they first estimate the beginning frame of the hypothesis based on the word trellis. The words that survived in the word trellis around the estimated frame are then extracted and returned together with their N-gram probabilities.

In Julius, ngram_firstwords(), ngram_nextwords() and ngram_acceptable() are called from the main search function wchmm_fbs(). In Julian, the corresponding functions in dfa_decode.c are used instead.

Revision: 1.4

Definition in file ngram_decode.c.

Function Documentation

static int compare_nw	(	NEXTWORD **	a,
		NEXTWORD **	b
	)			`[static]`

qsort callback function to sort next word candidates by their word ID.

Parameters:

	a	[in] element 1
	b	[in] element 2

Returns:: 1 if word id of a > that of b, -1 if word id of a < that of b, 0 if equal.

Definition at line 71 of file ngram_decode.c.

Referenced by get_backtrellis_words().
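
The comparator contract above maps directly onto the standard C qsort() callback. The following is a minimal sketch, not the Julius source: WORD_ID and NEXTWORD are simplified stand-ins (assumed here to be an unsigned short ID plus a float score), and compare_nw_sketch() and sort_nw_sketch() are hypothetical names. Because qsort() passes the comparator pointers to the array elements, each argument is really a NEXTWORD ** even though it arrives as const void *.

```c
#include <stdlib.h>

/* Simplified stand-ins (assumption): the real Julius types have more fields. */
typedef unsigned short WORD_ID;
typedef struct {
  WORD_ID id;    /* word ID of the candidate */
  float lscore;  /* N-gram language score */
} NEXTWORD;

/* qsort() comparator for an array of NEXTWORD pointers. qsort() passes
   pointers to the array elements, so each argument is really a NEXTWORD **. */
static int compare_nw_sketch(const void *pa, const void *pb)
{
  const NEXTWORD *a = *(NEXTWORD *const *)pa;
  const NEXTWORD *b = *(NEXTWORD *const *)pb;
  if (a->id > b->id) return 1;
  if (a->id < b->id) return -1;
  return 0;
}

/* Sort the first num candidates in nw by ascending word ID. */
void sort_nw_sketch(NEXTWORD **nw, int num)
{
  qsort(nw, (size_t)num, sizeof(NEXTWORD *), compare_nw_sketch);
}
```

Explicit comparisons are used instead of returning `a->id - b->id`, which can yield the wrong sign if WORD_ID is ever a wide unsigned type.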

static NEXTWORD* search_nw	(	NEXTWORD **	nw,
		WORD_ID	w,
		int	num
	)			`[static]`

Find a word in the list of next word candidates.

Parameters:

	nw	[in] list of next word candidates
	w	[in] word id to search for
	num	[in] length of nw

Returns:: the pointer to the NEXTWORD data if found, or NULL if not found.

Definition at line 101 of file ngram_decode.c.

Referenced by pick_backtrellis_words().
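
The documented contract of search_nw() (return the matching entry, or NULL) can be met with a plain linear scan. In this sketch NEXTWORD is reduced to a simplified stand-in (assumption) and search_nw_sketch() is a hypothetical name; the search strategy of the real function is not stated here. A linear scan has the advantage of not requiring nw to be sorted, which matters if it is called while candidates are still being accumulated.

```c
#include <stddef.h>

/* Simplified stand-ins (assumption): the real Julius types have more fields. */
typedef unsigned short WORD_ID;
typedef struct {
  WORD_ID id;
  float lscore;
} NEXTWORD;

/* Return the candidate whose id equals w among nw[0..num-1], or NULL.
   A linear scan does not require nw to be sorted. */
NEXTWORD *search_nw_sketch(NEXTWORD **nw, WORD_ID w, int num)
{
  for (int i = 0; i < num; i++) {
    if (nw[i]->id == w) return nw[i];
  }
  return NULL;
}
```

If the list were already sorted by word ID, a binary search would reduce each lookup from O(n) to O(log n).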

static void set_word_context	(	WORD_ID *	cseq,
		int	n,
		WORD_INFO *	winfo
	)			`[static]`

Find the last two non-transparent words in the given word sequence and store them in cnword.

Parameters:

	cseq	[in] word sequence
	n	[in] length of cseq
	winfo	[in] word dictionary information

Definition at line 147 of file ngram_decode.c.

Referenced by pick_backtrellis_words().
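
Assuming cseq lists words in the order they were expanded (so cseq[n-1] is the most recently added word), the lookup can be sketched as a backward scan that skips transparent words until two context words are found. The WORD_INFO_SKETCH type with its is_transparent flag array, and the WORD_CONTEXT struct that returns the cnword/cnnum/last_trans values instead of writing file-scope statics, are illustrative assumptions.

```c
#include <stdbool.h>

typedef unsigned short WORD_ID;  /* simplified stand-in (assumption) */

/* Hypothetical minimal dictionary: only a transparency flag per word ID. */
typedef struct {
  const bool *is_transparent;  /* is_transparent[w] is true for transparent words */
} WORD_INFO_SKETCH;

/* Mirrors the module's cnword / cnnum / last_trans variables. */
typedef struct {
  WORD_ID cnword[2];  /* non-transparent context words, most recent first */
  int cnnum;          /* how many were found (0..2) */
  int last_trans;     /* transparent words skipped during the scan */
} WORD_CONTEXT;

WORD_CONTEXT set_word_context_sketch(const WORD_ID *cseq, int n,
                                     const WORD_INFO_SKETCH *winfo)
{
  WORD_CONTEXT ctx = {{0, 0}, 0, 0};
  /* scan from the most recent word backward, stopping after two hits */
  for (int i = n - 1; i >= 0 && ctx.cnnum < 2; i--) {
    if (winfo->is_transparent[cseq[i]]) {
      ctx.last_trans++;
    } else {
      ctx.cnword[ctx.cnnum++] = cseq[i];
    }
  }
  return ctx;
}
```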

static int pick_backtrellis_words	(	BACKTRELLIS *	bt,
		WORD_INFO *	winfo,
		NGRAM_INFO *	ngram,
		NEXTWORD **	nw,
		int	oldnum,
		NODE *	hypo,
		short	t
	)			`[static]`

Extract next word candidates from word trellis.

This function extracts the trellis words whose word ends survived in the word trellis at the specified frame. Their N-gram probabilities are then computed, and the words are appended to the current list of next word candidates.

Parameters:

	bt	[in] word trellis structure
	winfo	[in] word dictionary structure
	ngram	[in] N-gram data structure
	nw	[i/o] list of next word candidates (new words will be appended at oldnum)
	oldnum	[in] number of words already stored in nw
	hypo	[in] the source sentence hypothesis
	t	[in] specified frame

Returns:: the total number of words currently stored in the nw.

Definition at line 205 of file ngram_decode.c.

Referenced by get_backtrellis_words().
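
A minimal sketch of the per-frame extraction, assuming a hypothetical TRELLIS_FRAME view that lists the IDs of words whose word end survived at one frame, and a hypothetical lm_table[] holding each word's N-gram score for the current context (the real function derives these from ngram and hypo). Words already present in nw are skipped. The extra maxnw parameter, which the documented signature does not have, is added only to guard the buffer.

```c
typedef unsigned short WORD_ID;  /* simplified stand-ins (assumption) */
typedef struct {
  WORD_ID id;
  float lscore;
} NEXTWORD;

/* Hypothetical view of one trellis frame: IDs of the words whose word end
   survived at that frame. */
typedef struct {
  const WORD_ID *wid;
  int num;
} TRELLIS_FRAME;

static int nw_contains(NEXTWORD **nw, int num, WORD_ID w)
{
  for (int i = 0; i < num; i++)
    if (nw[i]->id == w) return 1;
  return 0;
}

/* Append the not-yet-seen words of one frame to nw, starting at oldnum.
   lm_table[w] is assumed to hold the N-gram score of w for the current
   hypothesis context. Returns the new total number of candidates. */
int pick_backtrellis_words_sketch(const TRELLIS_FRAME *frame, const float *lm_table,
                                  NEXTWORD **nw, int oldnum, int maxnw)
{
  int num = oldnum;
  for (int i = 0; i < frame->num && num < maxnw; i++) {
    WORD_ID w = frame->wid[i];
    if (nw_contains(nw, num, w)) continue;  /* already a candidate */
    nw[num]->id = w;
    nw[num]->lscore = lm_table[w];
    num++;
  }
  return num;
}
```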

int get_backtrellis_words	(	BACKTRELLIS *	bt,
		WORD_INFO *	winfo,
		NGRAM_INFO *	ngram,
		NEXTWORD **	nw,
		NODE *	hypo,
		short	tm,
		short	t_end
	)

Look for the next word candidates on the word trellis near the specified time frame.

This function builds a list of next word candidates by looking up the word trellis at the specified frame, with a margin of lookup_range frames. If the same word exists in several nearby frames, only the one nearest to the specified frame is chosen.

Parameters:

	bt	[in] word trellis structure
	winfo	[in] word dictionary structure
	ngram	[in] word N-gram structure
	nw	[out] pointer to hold the extracted words as list of next word candidates
	hypo	[in] partial sentence hypothesis from which the words will be expanded
	tm	[in] center time frame to look up the words
	t_end	[in] right frame boundary for the lookup.

Returns:: the number of next word candidates stored in nw.

Definition at line 309 of file ngram_decode.c.

Referenced by ngram_nextwords().
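
The "nearest frame wins" rule can be implemented by visiting frames in order of increasing distance from tm and skipping words already collected. The sketch below does that and then sorts the result by word ID, in the spirit of compare_nw(). TRELLIS_FRAME, lm_table[], the t field recorded in NEXTWORD, and treating t_end as an exclusive bound (0 <= t < t_end) are all assumptions made for illustration.

```c
#include <stdlib.h>

typedef unsigned short WORD_ID;  /* simplified stand-ins (assumption) */
typedef struct {
  WORD_ID id;
  float lscore;
  short t;       /* frame the candidate was taken from */
} NEXTWORD;

typedef struct {
  const WORD_ID *wid;  /* words whose word end survived at this frame */
  int num;
} TRELLIS_FRAME;

static int cmp_nw_id(const void *pa, const void *pb)
{
  const NEXTWORD *a = *(NEXTWORD *const *)pa;
  const NEXTWORD *b = *(NEXTWORD *const *)pb;
  return (a->id > b->id) - (a->id < b->id);
}

/* Append words of frame t not yet in nw[0..num-1]; returns the new count. */
static int pick_frame(const TRELLIS_FRAME *frames, int t, const float *lm_table,
                      NEXTWORD **nw, int num, int maxnw)
{
  for (int i = 0; i < frames[t].num && num < maxnw; i++) {
    WORD_ID w = frames[t].wid[i];
    int seen = 0;
    for (int j = 0; j < num && !seen; j++) seen = (nw[j]->id == w);
    if (seen) continue;
    nw[num]->id = w;
    nw[num]->lscore = lm_table[w];
    nw[num]->t = (short)t;
    num++;
  }
  return num;
}

/* Visit tm, tm-1, tm+1, tm-2, tm+2, ... so that a word found in several
   frames keeps the entry from the frame nearest to tm. */
int get_backtrellis_words_sketch(const TRELLIS_FRAME *frames, const float *lm_table,
                                 NEXTWORD **nw, int maxnw,
                                 short tm, short t_end, int lookup_range)
{
  int num = 0;
  for (int d = 0; d <= lookup_range; d++) {
    int cand[2] = { tm - d, tm + d };
    int ncand = (d == 0) ? 1 : 2;  /* avoid visiting tm twice */
    for (int k = 0; k < ncand; k++) {
      int t = cand[k];
      if (t >= 0 && t < t_end)
        num = pick_frame(frames, t, lm_table, nw, num, maxnw);
    }
  }
  /* sort by word ID so later lookups can rely on the order */
  qsort(nw, (size_t)num, sizeof(NEXTWORD *), cmp_nw_id);
  return num;
}
```

When a word appears at equal distance on both sides of tm, this sketch keeps the earlier frame.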

int limit_nw	(	NEXTWORD **	nw,
		NODE *	hypo,
		int	num
	)

Remove words in the nextword list which should not be expanded.

Parameters:

	nw	[i/o] list of next word candidates (will be shrunk by removing some words)
	hypo	[in] partial sentence hypothesis from which the words will be expanded
	num	[in] current number of next words in nw

Returns:: the new number of words in nw

Definition at line 391 of file ngram_decode.c.

Referenced by ngram_nextwords().
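
The documentation does not say which words are removed, so this sketch leaves the criterion to a caller-supplied predicate (NW_REJECT_FN and reject_word_id() are hypothetical) and shows only the in-place compaction. Kept entries are swapped forward rather than overwritten, so every pointer originally in nw is still present somewhere in the array; this matters if the entries point into preallocated storage that is reused.

```c
#include <stdbool.h>

typedef unsigned short WORD_ID;  /* simplified stand-ins (assumption) */
typedef struct {
  WORD_ID id;
  float lscore;
} NEXTWORD;

/* Hypothetical rejection test: true means "do not expand this candidate". */
typedef bool (*NW_REJECT_FN)(const NEXTWORD *cand, const void *ctx);

/* Compact nw[0..num-1] in place, keeping only accepted candidates at the
   front. Rejected pointers are swapped to the tail, not lost. */
int limit_nw_sketch(NEXTWORD **nw, int num, NW_REJECT_FN reject, const void *ctx)
{
  int dst = 0;
  for (int src = 0; src < num; src++) {
    if (reject(nw[src], ctx)) continue;
    if (dst != src) {
      NEXTWORD *tmp = nw[dst];
      nw[dst] = nw[src];
      nw[src] = tmp;
    }
    dst++;
  }
  return dst;  /* new number of candidates */
}

/* Example predicate: reject one specific word ID passed via ctx. */
bool reject_word_id(const NEXTWORD *cand, const void *ctx)
{
  return cand->id == *(const WORD_ID *)ctx;
}
```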

int ngram_firstwords	(	NEXTWORD **	nw,
		int	peseqlen,
		int	maxnw,
		WORD_INFO *	winfo,
		BACKTRELLIS *	bt
	)

Return the set of initial word hypotheses at the beginning.

In N-gram based recognition, the initial hypothesis is fixed to the tail silence word. The exception is short-pause segmentation mode, in which the initial hypothesis is chosen from the words that survived on the last input frame of the first pass.

Parameters:

	nw	[out] pointer to hold the initial word candidates
	peseqlen	[in] input frame length
	maxnw	[in] maximum number of words that can be stored in nw
	winfo	[in] word dictionary information
	bt	[in] word trellis structure

Returns:: the number of words extracted and stored to nw.

Definition at line 462 of file ngram_decode.c.

Referenced by wchmm_fbs().
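
The two cases described above can be sketched as follows. This is an illustration under stated assumptions, not the Julius implementation: tail_silwid and the sp_segment flag are hypothetical parameters standing in for the dictionary's tail silence word and the short-pause segmentation mode, the trellis is reduced to a per-frame list of surviving word IDs, and in segmentation mode the sketch simply returns every survivor at the last frame (up to maxnw), which may be broader than the actual selection.

```c
#include <stdbool.h>

typedef unsigned short WORD_ID;  /* simplified stand-in (assumption) */

/* Only the field this sketch fills; the real NEXTWORD has more. */
typedef struct {
  WORD_ID id;
} NEXTWORD;

typedef struct {
  const WORD_ID *wid;  /* words whose word end survived at this frame */
  int num;
} TRELLIS_FRAME;

int ngram_firstwords_sketch(NEXTWORD **nw, int peseqlen, int maxnw,
                            WORD_ID tail_silwid, bool sp_segment,
                            const TRELLIS_FRAME *frames)
{
  if (maxnw <= 0) return 0;
  if (!sp_segment) {
    /* normal case: always start from the tail silence word */
    nw[0]->id = tail_silwid;
    return 1;
  }
  if (peseqlen <= 0) return 0;  /* no frames to look at */
  /* short-pause segmentation: start from words alive at the last frame */
  const TRELLIS_FRAME *last = &frames[peseqlen - 1];
  int num = 0;
  for (int i = 0; i < last->num && num < maxnw; i++) {
    nw[num++]->id = last->wid[i];
  }
  return num;
}
```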

int ngram_nextwords	(	NODE *	hypo,
		NEXTWORD **	nw,
		int	maxnw,
		NGRAM_INFO *	ngram,
		WORD_INFO *	winfo,
		BACKTRELLIS *	bt
	)

Return the list of next word candidates.

Given a partial sentence hypothesis "hypo", it returns the list of next word candidates. Specifically, it extracts from the word trellis the words whose word-end node survived near the estimated beginning-of-word frame of the last word, "hypo->estimated_next_t", and stores them in "nw" together with their N-gram probabilities.

Parameters:

	hypo	[in] source partial sentence hypothesis
	nw	[out] pointer to store the list of next word candidates (should be already allocated)
	maxnw	[in] maximum number of words that can be stored to nw
	ngram	[in] word N-gram
	winfo	[in] word dictionary
	bt	[in] word trellis structure

Returns:: the number of extracted next word candidates in nw.

Definition at line 531 of file ngram_decode.c.
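
Since nw must already be allocated, the caller has to provide storage for up to maxnw candidates. One way to do that, sketched here with hypothetical helper names (NW_BUFFER, nw_buffer_init(), nw_buffer_free()) and a simplified NEXTWORD, is an array of pointers into a single contiguous pool:

```c
#include <stdbool.h>
#include <stdlib.h>

typedef unsigned short WORD_ID;  /* simplified stand-ins (assumption) */
typedef struct {
  WORD_ID id;
  float lscore;
} NEXTWORD;

/* Caller-owned candidate buffer (hypothetical helper, not a Julius API). */
typedef struct {
  NEXTWORD **nw;   /* passed as the nw argument */
  NEXTWORD *pool;  /* backing storage that nw[i] points into */
  int maxnw;       /* passed as the maxnw argument */
} NW_BUFFER;

bool nw_buffer_init(NW_BUFFER *buf, int maxnw)
{
  buf->nw = NULL;
  buf->pool = NULL;
  buf->maxnw = 0;
  if (maxnw <= 0) return false;
  buf->nw = malloc((size_t)maxnw * sizeof *buf->nw);
  buf->pool = malloc((size_t)maxnw * sizeof *buf->pool);
  if (buf->nw == NULL || buf->pool == NULL) {
    free(buf->nw);
    free(buf->pool);
    buf->nw = NULL;
    buf->pool = NULL;
    return false;
  }
  for (int i = 0; i < maxnw; i++) buf->nw[i] = &buf->pool[i];
  buf->maxnw = maxnw;
  return true;
}

/* Free through the saved pool pointer, so it does not matter whether nw[]
   was reordered by sorting or filtering in the meantime. */
void nw_buffer_free(NW_BUFFER *buf)
{
  free(buf->nw);
  free(buf->pool);
  buf->nw = NULL;
  buf->pool = NULL;
  buf->maxnw = 0;
}
```

A buffer like this can be allocated once and reused for each expansion, with buf.nw and buf.maxnw passed as the nw and maxnw arguments.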

boolean ngram_acceptable	(	NODE *	hypo,
		WORD_INFO *	winfo
	)

Return whether the given partial hypothesis is acceptable as a sentence and can be treated as a final search candidate. In N-gram mode, it checks whether the last word is the beginning-of-sentence silence (silhead).

Parameters:

	hypo	[in] partial sentence hypothesis to be examined
	winfo	[in] word dictionary

Returns:: TRUE if acceptable as a sentence, or FALSE if not.

Definition at line 584 of file ngram_decode.c.
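
Because the 2nd pass starts from the tail silence word and expands toward the beginning of the utterance, a complete sentence is one whose most recently expanded word is the beginning-of-sentence silence. A minimal sketch, assuming a hypothetical HYPO_SKETCH that stores words in expansion order and a hypothetical silhead_wid parameter for the silhead word ID:

```c
#include <stdbool.h>

typedef unsigned short WORD_ID;  /* simplified stand-in (assumption) */

/* Hypothetical hypothesis view: seq[] holds words in expansion order,
   so seq[seqnum-1] is the most recently expanded word. */
typedef struct {
  const WORD_ID *seq;
  int seqnum;
} HYPO_SKETCH;

bool ngram_acceptable_sketch(const HYPO_SKETCH *hypo, WORD_ID silhead_wid)
{
  if (hypo->seqnum <= 0) return false;  /* empty hypothesis */
  return hypo->seq[hypo->seqnum - 1] == silhead_wid;
}
```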

Generated on Tue Dec 26 16:16:53 2006 for Julius by Doxygen 1.5.0

