julius/ngram_decode.c File Reference

Word prediction based on N-gram probability and word trellis index for the 2nd pass of Julius.

#include <julius.h>



Functions

static int compare_nw (NEXTWORD **a, NEXTWORD **b)
static NEXTWORD *search_nw (NEXTWORD **nw, WORD_ID w, int num)
static void set_word_context (WORD_ID *cseq, int n, WORD_INFO *winfo)
static int pick_backtrellis_words (BACKTRELLIS *bt, WORD_INFO *winfo, NGRAM_INFO *ngram, NEXTWORD **nw, int oldnum, NODE *hypo, short t)
 Extract next word candidates from word trellis.
int get_backtrellis_words (BACKTRELLIS *bt, WORD_INFO *winfo, NGRAM_INFO *ngram, NEXTWORD **nw, NODE *hypo, short tm, short t_end)
 Look for the next word candidates on the word trellis near the specified time frame.
int limit_nw (NEXTWORD **nw, NODE *hypo, int num)
int ngram_firstwords (NEXTWORD **nw, int peseqlen, int maxnw, WORD_INFO *winfo, BACKTRELLIS *bt)
 Return the set of initial word hypotheses at the beginning.
int ngram_nextwords (NODE *hypo, NEXTWORD **nw, int maxnw, NGRAM_INFO *ngram, WORD_INFO *winfo, BACKTRELLIS *bt)
 Return the list of next word candidates.
boolean ngram_acceptable (NODE *hypo, WORD_INFO *winfo)

Variables

static WORD_ID cnword [2]
 Last two non-transparent words.
static int cnnum
 Number of non-transparent words found (<= 2).
static int last_trans
 Number of skipped transparent words.


Detailed Description

Word prediction based on N-gram probability and word trellis index for the 2nd pass of Julius.

Author:
Akinobu Lee
Date:
Fri Jul 8 14:57:51 2005
These functions return the next word candidates in the 2nd recognition pass of Julius, i.e., N-gram-based stack decoding.

Given a partial sentence hypothesis, they first estimate the beginning frame of the hypothesis based on the word trellis. The words in the word trellis around the estimated frame are then extracted and returned with their N-gram probabilities.

In Julius, ngram_firstwords(), ngram_nextwords() and ngram_acceptable() are called from the main search function wchmm_fbs(). In Julian, the corresponding functions in dfa_decode.c are used instead.

Revision
1.4

Definition in file ngram_decode.c.


Function Documentation

static int compare_nw ( NEXTWORD **  a,
NEXTWORD **  b 
) [static]

qsort callback function to sort next word candidates by their word ID.

Parameters:
a [in] element 1
b [in] element 2
Returns:
1 if the word ID of a is greater than that of b, -1 if it is smaller, 0 if they are equal.

Definition at line 71 of file ngram_decode.c.

Referenced by get_backtrellis_words().
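A minimal sketch of a comparator in the style of compare_nw, sorting an array of NEXTWORD pointers by ascending word ID with the standard qsort(). It assumes a reduced stand-in for NEXTWORD (only id and lscore) and an unsigned short WORD_ID; the real types in julius.h differ. The documented compare_nw takes NEXTWORD ** parameters directly, whereas this hypothetical compare_nw_sketch uses the const void * form that qsort expects, so no cast is needed at the call site.

```c
#include <stdlib.h>

/* Assumed stand-ins: the real WORD_ID and NEXTWORD come from julius.h,
   and the real NEXTWORD has more fields than shown here. */
typedef unsigned short WORD_ID;
typedef struct {
  WORD_ID id;      /* word ID in the dictionary */
  float lscore;    /* language model score (log probability) */
} NEXTWORD;

/* qsort() comparator over an array of NEXTWORD pointers, ascending by ID.
   qsort passes pointers to the elements, i.e. NEXTWORD ** here. */
static int compare_nw_sketch(const void *pa, const void *pb)
{
  const NEXTWORD *a = *(NEXTWORD * const *)pa;
  const NEXTWORD *b = *(NEXTWORD * const *)pb;
  if (a->id > b->id) return 1;
  if (a->id < b->id) return -1;
  return 0;
}

/* Sort the first num entries of nw by word ID (hypothetical helper). */
static void sort_nw_sketch(NEXTWORD **nw, int num)
{
  qsort(nw, (size_t)num, sizeof(NEXTWORD *), compare_nw_sketch);
}
```

Sorting the pointer array rather than the NEXTWORD records keeps the sort cheap and leaves the underlying storage in place.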

static NEXTWORD* search_nw ( NEXTWORD **  nw,
WORD_ID  w,
int  num 
) [static]

Find a word in the list of next word candidates.

Parameters:
nw [in] list of next word candidates
w [in] word id to search for
num [in] length of nw
Returns:
the pointer to the NEXTWORD data if found, or NULL if not found.

Definition at line 101 of file ngram_decode.c.

Referenced by pick_backtrellis_words().
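Because the candidate list can be kept sorted by word ID with a compare_nw-style comparator, a lookup like search_nw can be done by binary search. The sketch below assumes that nw[0..num-1] is already sorted and uses the same reduced, hypothetical NEXTWORD stand-in; it is an illustration, not necessarily how ngram_decode.c implements the search.

```c
#include <stddef.h>

typedef unsigned short WORD_ID;                         /* assumed stand-in */
typedef struct { WORD_ID id; float lscore; } NEXTWORD;  /* reduced stand-in */

/* Binary search for word w in nw[0..num-1], which must already be sorted
   by ascending id. Returns the matching entry, or NULL if w is absent. */
static NEXTWORD *search_nw_sketch(NEXTWORD **nw, WORD_ID w, int num)
{
  int left = 0, right = num - 1;
  while (left <= right) {
    int mid = left + (right - left) / 2;   /* avoids overflow of left+right */
    if (nw[mid]->id == w) return nw[mid];
    if (nw[mid]->id < w) left = mid + 1;
    else right = mid - 1;
  }
  return NULL;                             /* also covers num == 0 */
}
```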

static void set_word_context ( WORD_ID *  cseq,
int  n,
WORD_INFO *  winfo 
) [static]

Find the last two non-transparent words in the given word sequence and store them in cnword.

Parameters:
cseq [in] word sequence
n [in] length of cseq
winfo [in] word dictionary information

Definition at line 147 of file ngram_decode.c.

Referenced by pick_backtrellis_words().
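A minimal sketch of how the context variables cnword, cnnum and last_trans could be filled: scan the sequence from its last element backwards, count transparent words that are skipped, and stop once two non-transparent words are found. The is_transparent flag array is a hypothetical stand-in for the per-word information held in WORD_INFO, and counting every transparent word skipped during the scan is an assumption about what last_trans means.

```c
typedef unsigned short WORD_ID;   /* assumed stand-in */

/* File-scope context, mirroring the variables documented above. */
static WORD_ID cnword[2];  /* last two non-transparent words; [0] is the last in cseq */
static int cnnum;          /* number found (0..2) */
static int last_trans;     /* transparent words skipped during the scan */

/* Scan cseq[0..n-1] from the end, skipping words flagged in the
   hypothetical is_transparent array (indexed by word ID). */
static void set_word_context_sketch(const WORD_ID *cseq, int n,
                                    const int *is_transparent)
{
  cnnum = 0;
  last_trans = 0;
  for (int i = n - 1; i >= 0 && cnnum < 2; i--) {
    if (is_transparent[cseq[i]]) {
      last_trans++;              /* e.g. a filler word: not used as context */
    } else {
      cnword[cnnum++] = cseq[i];
    }
  }
}
```

For the sequence {5, 6, 3, 7, 3} with word 3 transparent, this yields cnword = {7, 6}, cnnum = 2 and last_trans = 2.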

static int pick_backtrellis_words ( BACKTRELLIS *  bt,
WORD_INFO *  winfo,
NGRAM_INFO *  ngram,
NEXTWORD **  nw,
int  oldnum,
NODE *  hypo,
short  t 
) [static]

Extract next word candidates from word trellis.

This function extracts the list of trellis words whose word ends survived in the word trellis at the specified frame. Their N-gram probabilities are then computed, and the words are added to the current list of next word candidates.

Parameters:
bt [in] word trellis structure
winfo [in] word dictionary structure
ngram [in] N-gram data structure
nw [in] list of next word candidates (new words will be appended at oldnum)
oldnum [in] number of words already stored in nw
hypo [in] the source sentence hypothesis
t [in] specified frame
Returns:
the total number of words currently stored in nw.

Definition at line 205 of file ngram_decode.c.

Referenced by get_backtrellis_words().
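A simplified sketch of the append step described for pick_backtrellis_words: the words ending at one frame are appended to nw starting at oldnum, skipping any word already in the list, and each new entry gets a language model score. Several things here are hypothetical: the frame is given as a plain array of word IDs instead of a BACKTRELLIS, the N-gram lookup is replaced by a per-word score table lm_table, and maxnw is an added capacity bound that the documented function does not take.

```c
typedef unsigned short WORD_ID;                         /* assumed stand-in */
typedef struct { WORD_ID id; float lscore; } NEXTWORD;  /* reduced stand-in */

/* Append frame_words[0..count-1] to nw[oldnum..], skipping duplicates.
   lm_table[w] is a hypothetical precomputed log probability for word w.
   nw must point to preallocated NEXTWORD storage with maxnw slots.
   Returns the new total number of entries in nw. */
static int pick_words_sketch(const WORD_ID *frame_words, int count,
                             const float *lm_table,
                             NEXTWORD **nw, int oldnum, int maxnw)
{
  int num = oldnum;
  for (int i = 0; i < count && num < maxnw; i++) {
    WORD_ID w = frame_words[i];
    int dup = 0;
    for (int j = 0; j < num; j++) {   /* linear scan kept simple for clarity */
      if (nw[j]->id == w) { dup = 1; break; }
    }
    if (dup) continue;                /* keep the entry found earlier */
    nw[num]->id = w;
    nw[num]->lscore = lm_table[w];
    num++;
  }
  return num;
}
```

Skipping duplicates means that an entry added earlier always wins, which is what lets a caller control which occurrence of a word is kept by choosing the order in which frames are visited.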

int get_backtrellis_words ( BACKTRELLIS *  bt,
WORD_INFO *  winfo,
NGRAM_INFO *  ngram,
NEXTWORD **  nw,
NODE *  hypo,
short  tm,
short  t_end 
)

Look for the next word candidates on the word trellis near the specified time frame.

This function builds a list of next word candidates by looking up the word trellis at the specified frame, within a margin of lookup_range frames. If the same word exists in several nearby frames, only the occurrence nearest to the specified frame is chosen.

Parameters:
bt [in] word trellis structure
winfo [in] word dictionary structure
ngram [in] word N-gram structure
nw [out] pointer to hold the extracted words as list of next word candidates
hypo [in] partial sentence hypothesis from which the words will be expanded
tm [in] center time frame to look up the words
t_end [in] right frame boundary for the lookup.
Returns:
the number of next word candidates stored in nw.

Definition at line 309 of file ngram_decode.c.

Referenced by ngram_nextwords().
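One way to realize the "nearest occurrence wins" rule is to visit frames in order of distance from the center frame tm: tm, tm-1, tm+1, tm-2, tm+2, and so on. When candidates are appended frame by frame in this order and a word already present is skipped, the occurrence kept for each word is the one nearest to tm. The hypothetical helper below only computes that visiting order; treating t_end as an exclusive bound and breaking ties toward the earlier frame are assumptions, and the actual traversal in ngram_decode.c may differ.

```c
/* Fill order[] with the frames to examine around center frame tm, nearest
   first: tm, tm-1, tm+1, tm-2, tm+2, ... within +/- range, clipped to
   [0, t_end). order must hold at least 2*range+1 entries.
   Returns the number of frames written. */
static int lookup_order_sketch(int tm, int range, int t_end, int *order)
{
  int n = 0;
  for (int d = 0; d <= range; d++) {
    int before = tm - d, after = tm + d;
    if (before >= 0 && before < t_end) order[n++] = before;
    if (d > 0 && after >= 0 && after < t_end) order[n++] = after;
  }
  return n;
}
```

For tm = 5, range = 2 and t_end = 7, the order is 5, 4, 6, 3; frame 7 is excluded by the right boundary.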

int limit_nw ( NEXTWORD **  nw,
NODE *  hypo,
int  num 
)

Remove words in the nextword list which should not be expanded.

Parameters:
nw [i/o] list of next word candidates (shrunk in place by removing some words)
hypo [in] partial sentence hypothesis from which the words will be expanded
num [in] current number of next words in nw
Returns:
the new number of words in nw

Definition at line 391 of file ngram_decode.c.

Referenced by ngram_nextwords().
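The filtering criteria used by limit_nw are not described here, but the in-place removal itself can be sketched as a stable compaction of the pointer array. In this sketch the decision is delegated to no_expand, a hypothetical per-word flag array that stands in for whatever rules depend on hypo. Pointers are swapped rather than overwritten, so every preallocated NEXTWORD slot remains referenced by nw after the call.

```c
typedef unsigned short WORD_ID;                         /* assumed stand-in */
typedef struct { WORD_ID id; float lscore; } NEXTWORD;  /* reduced stand-in */

/* Keep only entries whose word is not flagged in no_expand. Surviving
   entries keep their relative order. Returns the new count. */
static int limit_nw_sketch(NEXTWORD **nw, int num, const int *no_expand)
{
  int dst = 0;
  for (int src = 0; src < num; src++) {
    if (no_expand[nw[src]->id]) continue;   /* drop this candidate */
    if (dst != src) {
      NEXTWORD *tmp = nw[dst];   /* swap so no storage slot is lost */
      nw[dst] = nw[src];
      nw[src] = tmp;
    }
    dst++;
  }
  return dst;
}
```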

int ngram_firstwords ( NEXTWORD **  nw,
int  peseqlen,
int  maxnw,
WORD_INFO *  winfo,
BACKTRELLIS *  bt 
)

Return the set of initial word hypotheses at the beginning.

In N-gram-based recognition, the initial hypothesis is fixed to the tail silence word. The exception is short-pause segmentation mode, in which the initial hypotheses are chosen from the words that survived on the last input frame in the first pass.

Parameters:
nw [out] pointer to hold the initial word candidates
peseqlen [in] input frame length
maxnw [in] maximum number of words that can be stored in nw
winfo [in] word dictionary information
bt [in] word trellis structure
Returns:
the number of words extracted and stored in nw.

Definition at line 462 of file ngram_decode.c.

Referenced by wchmm_fbs().
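A minimal sketch of the two cases described for ngram_firstwords. All parameters beyond nw and maxnw are hypothetical: tail_silwid is the ID of the tail silence word, sp_segment flags short-pause segmentation mode, and last_frame_words / nlast stand in for the words that survived on the last frame of the first pass (in Julius these would come from the word trellis). The language scores are set to 0 as placeholders.

```c
typedef unsigned short WORD_ID;                         /* assumed stand-in */
typedef struct { WORD_ID id; float lscore; } NEXTWORD;  /* reduced stand-in */

/* Fill nw with the initial hypotheses and return how many were stored. */
static int firstwords_sketch(NEXTWORD **nw, int maxnw, WORD_ID tail_silwid,
                             int sp_segment, const WORD_ID *last_frame_words,
                             int nlast)
{
  if (!sp_segment) {
    /* normal mode: the search always starts from the tail silence word */
    if (maxnw < 1) return 0;
    nw[0]->id = tail_silwid;
    nw[0]->lscore = 0.0f;      /* placeholder score */
    return 1;
  }
  /* short-pause segmentation: start from words alive on the last frame */
  int n = 0;
  for (int i = 0; i < nlast && n < maxnw; i++) {
    nw[n]->id = last_frame_words[i];
    nw[n]->lscore = 0.0f;      /* placeholder score */
    n++;
  }
  return n;
}
```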

int ngram_nextwords ( NODE *  hypo,
NEXTWORD **  nw,
int  maxnw,
NGRAM_INFO *  ngram,
WORD_INFO *  winfo,
BACKTRELLIS *  bt 
)

Return the list of next word candidates.

Given a partial sentence hypothesis "hypo", it returns the list of next word candidates. Specifically, it extracts from the word trellis the words whose word-end nodes survived near the estimated beginning-of-word frame of the last word, "hypo->estimated_next_t", and stores them in "nw" with their N-gram probabilities.

Parameters:
hypo [in] source partial sentence hypothesis
nw [out] pointer to store the list of next word candidates (must already be allocated)
maxnw [in] maximum number of words that can be stored in nw
ngram [in] word N-gram
winfo [in] word dictionary
bt [in] word trellis structure
Returns:
the number of extracted next word candidates in nw.

Definition at line 531 of file ngram_decode.c.

boolean ngram_acceptable ( NODE *  hypo,
WORD_INFO *  winfo 
)

Return whether the given partial hypothesis is acceptable as a sentence and can be treated as a final search candidate. In N-gram mode, it checks whether the last word is the beginning-of-sentence silence (silhead).

Parameters:
hypo [in] partial sentence hypothesis to be examined
winfo [in] word dictionary
Returns:
TRUE if acceptable as a sentence, or FALSE if not.

Definition at line 584 of file ngram_decode.c.
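A minimal sketch of the acceptance test: a hypothesis is accepted when its most recently expanded word is the head silence word. It assumes that this word is stored at seq[seqnum - 1] and uses a hypothetical NODE_SKETCH with a fixed-size seq array in place of the real NODE; head_silwid is passed in directly instead of being read from WORD_INFO, and C99 bool replaces Julius's own boolean type.

```c
#include <stdbool.h>

typedef unsigned short WORD_ID;   /* assumed stand-in */

typedef struct {
  WORD_ID seq[64];   /* words in order of expansion (fixed size for brevity) */
  int seqnum;        /* number of valid entries in seq */
} NODE_SKETCH;

/* TRUE if the last expanded word is the beginning-of-sentence silence. */
static bool acceptable_sketch(const NODE_SKETCH *hypo, WORD_ID head_silwid)
{
  return hypo->seqnum > 0 && hypo->seq[hypo->seqnum - 1] == head_silwid;
}
```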


Generated on Tue Dec 26 16:16:53 2006 for Julius by doxygen 1.5.0