libjulius/src/spsegment.c File Reference

Short-pause segmentation and decoder-based VAD.

#include <julius/julius.h>


Functions

boolean is_sil (WORD_ID w, RecogProcess *r)
 Check if the given word is a short-pause word.
void mfcc_copy_to_rest_and_shrink (MFCCCalc *mfcc, int start, int end)
 Split input parameter for segmentation.
void mfcc_shrink (MFCCCalc *mfcc, int p)
 Shrink the parameter sequence.
boolean detect_end_of_segment (RecogProcess *r, int time)
 Speech end point detection.
void finalize_segment (Recog *recog)
 Finalize the first pass for successive decoding.
boolean spsegment_need_restart (Recog *recog, int *rf_ret, boolean *repro_ret)
 Check if rewind and restart of recognition is needed.
void spsegment_restart_mfccs (Recog *recog, int rewind_frame, boolean reprocess)
 Execute rewinding.


Detailed Description

Short-pause segmentation and decoder-based VAD.

In short-pause segmentation mode, Julius tries to find "pause frames" by watching the word hypotheses at each frame. Julius treats a word that consists only of a silence model as a "pause word", and judges an input frame to be a "pause frame" when one of the pause words gets the maximum score at that frame. It then segments the input when the duration of successive pause frames reaches a limit.
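
The per-frame judgment described above can be sketched as follows. This is only an illustration, not the Julius implementation: the FrameHypo type and the frame_is_pause() function are hypothetical names, and treating an empty hypothesis list as "not a pause frame" is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical word hypothesis at one frame; not a Julius type. */
typedef struct {
    int   word_id;        /* word identifier */
    float score;          /* accumulated score at this frame (higher is better) */
    bool  is_pause_word;  /* true if the word has only a silence model */
} FrameHypo;

/* Return true when the best-scoring hypothesis at this frame is a pause word.
 * Assumption: an empty hypothesis list is not a pause frame. */
bool frame_is_pause(const FrameHypo *hypos, size_t n)
{
    if (n == 0) return false;

    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (hypos[i].score > hypos[best].score) best = i;
    }
    return hypos[best].is_pause_word;
}
```

A caller would evaluate this once per input frame and count how many successive frames return true.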

In normal short-pause segmentation (as of ver.3.x), the pause frames are not eliminated. The input is segmented at the frame where speech begins after the pause frames, and the next input is processed from the beginning of the pause frames. In other words, the detected pause frames are processed twice: as end-of-segment silence in the former input segment and as beginning-of-segment silence in the latter.
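
As a rough illustration of where the two boundaries fall, the sketch below scans a sequence of per-frame pause flags. SegmentCut and find_segment_cut() are hypothetical names, min_pause_frames stands in for the duration limit, and ignoring a pause run that precedes any speech is a simplifying assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical result type; not part of Julius. */
typedef struct {
    bool found;       /* true if a segment boundary was found */
    int  cut_frame;   /* frame where speech restarts; the current segment is cut here */
    int  next_start;  /* first frame of the pause run; the next segment starts here */
} SegmentCut;

/* Find the first pause run of at least min_pause_frames that is followed by speech.
 * Assumption: a pause run before any speech does not produce a boundary. */
SegmentCut find_segment_cut(const bool *is_pause, int nframes, int min_pause_frames)
{
    SegmentCut cut = { false, -1, -1 };
    bool seen_speech = false;
    int run_start = -1;  /* start of the current pause run, or -1 if none */

    for (int t = 0; t < nframes; t++) {
        if (is_pause[t]) {
            if (seen_speech && run_start < 0) run_start = t;
        } else {
            if (run_start >= 0 && t - run_start >= min_pause_frames) {
                cut.found = true;
                cut.cut_frame = t;
                cut.next_start = run_start;
                return cut;
            }
            seen_speech = true;
            run_start = -1;
        }
    }
    return cut;
}
```

Because next_start is smaller than cut_frame, the frames in between belong to both segments, which is the double processing described above.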

When SPSEGMENT_NAIST is defined, a long pause area is dropped from recognition. When the run of detected pause frames becomes longer than a threshold, Julius segments the input at that point and skips the continuing pause until a speech frame arrives. The recognition process is kept in a special status during the pause segment. This scheme works as a decoder-driven VAD.

Author:
Akinobu Lee
Date:
Wed Oct 17 12:47:29 2007
$Revision:$

Definition in file spsegment.c.


Function Documentation

boolean is_sil ( WORD_ID  w,
RecogProcess *  r 
)

Check if the given word is a short-pause word.

Parameters:
w [in] word id
r [in] recognition process instance
Returns:
TRUE if it is short pause word, FALSE if not.

Definition at line 98 of file spsegment.c.

Referenced by detect_end_of_segment().
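
The file description defines a pause word as a word with only a silence model. A minimal sketch of that test is shown below; it is not the body of is_sil(). WordEntry, is_pause_word() and the silence_model parameter are hypothetical, and the real function may apply further conditions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical dictionary entry; not a Julius type. */
typedef struct {
    const char *const *models;  /* names of the acoustic models that make up the word */
    int nmodels;                /* number of models in the word */
} WordEntry;

/* Return true if the word consists of exactly one model and that model
 * is the silence model named by silence_model. */
bool is_pause_word(const WordEntry *w, const char *silence_model)
{
    return w->nmodels == 1 && strcmp(w->models[0], silence_model) == 0;
}
```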


void mfcc_copy_to_rest_and_shrink ( MFCCCalc *  mfcc,
int  start,
int  end 
)

Split input parameter for segmentation.

Copy the remaining samples in param to rest_param, and shrink the param in the mfcc instance. [start...param->samplenum] will be copied to rest_param, and [0...end] will be left in param.

Parameters:
mfcc [i/o] MFCC calculation instance
start [in] copy start frame
end [in] original end frame

Definition at line 154 of file spsegment.c.

Referenced by finalize_segment().
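
The split can be pictured with a plain array of feature vectors. The sketch below is an illustration under stated assumptions, not the Julius code: FrameSeq and split_frames() are hypothetical, both ranges are treated as half-open (frames start to nframes-1 are copied, and end becomes the new length), and the caller frees rest->data.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical frame sequence; not the Julius parameter type. */
typedef struct {
    float *data;     /* nframes * veclen values, stored frame by frame */
    int    veclen;   /* values per frame */
    int    nframes;  /* number of valid frames */
} FrameSeq;

/* Copy frames [start, nframes) into a newly allocated rest sequence and
 * shrink param to its first `end` frames.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int split_frames(FrameSeq *param, FrameSeq *rest, int start, int end)
{
    if (start < 0 || start > param->nframes || end < 0 || end > param->nframes)
        return -1;

    int rest_len = param->nframes - start;
    size_t bytes = (size_t)rest_len * (size_t)param->veclen * sizeof(float);
    float *buf = NULL;

    if (bytes > 0) {
        buf = malloc(bytes);
        if (buf == NULL) return -1;
        memcpy(buf, param->data + (size_t)start * (size_t)param->veclen, bytes);
    }
    rest->data = buf;
    rest->veclen = param->veclen;
    rest->nframes = rest_len;

    param->nframes = end;  /* storage is kept; only the valid length shrinks */
    return 0;
}
```

Note that start may be smaller than end. In that case the frames in between appear in both sequences, which matches the double processing of pause frames in normal short-pause segmentation.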


void mfcc_shrink ( MFCCCalc *  mfcc,
int  p 
)

Shrink the parameter sequence.

Drop the frames before p (frames 0 to p-1) and move [p..samplenum] to position 0.

Parameters:
mfcc [i/o] MFCC Calculation instance
p [in] frame point to remain

Definition at line 194 of file spsegment.c.
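
The shrink operation amounts to moving the tail of the sequence to the front. A minimal sketch on a flat float array follows. shrink_frames() is a hypothetical helper rather than the Julius function, and its handling of out-of-range p is an assumption.

```c
#include <string.h>

/* Drop frames [0, p) from a frame-major array and move frame p to index 0.
 * Returns the new number of frames.
 * Assumption: p <= 0 drops nothing, p >= nframes drops everything. */
int shrink_frames(float *data, int veclen, int nframes, int p)
{
    if (p <= 0) return nframes;
    if (p >= nframes) return 0;

    /* memmove is used because source and destination overlap */
    memmove(data, data + (size_t)p * (size_t)veclen,
            (size_t)(nframes - p) * (size_t)veclen * sizeof(float));
    return nframes - p;
}
```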

boolean detect_end_of_segment ( RecogProcess *  r,
int  time 
)

Speech end point detection.

Detect end-of-input from the duration of short-pause words when short-pause segmentation is enabled. When a pause word gets the maximum score for a number of successive frames, those frames are treated as pause frames. When speech re-triggers, the current input is segmented at that point.

When SPSEGMENT_NAIST is defined, this function performs an extended version of short-pause segmentation called "decoder-based VAD". Before the speech trigger (r->pass1.after_trigger == FALSE), it tells the recognition functions not to generate the word trellis and to continue calculation. If a speech trigger is found (a non-pause word gets the maximum score), the input is rewound by a certain number of frames (r->config->successive.sp_margin) and the normal recognition process starts from the rewound frame (r->pass1.after_trigger = TRUE). When the pause frame duration reaches a limit (r->config->successive.sp_frame_duration), it terminates the search.

Parameters:
r [i/o] recognition process instance
time [in] current input frame
Returns:
TRUE if end-of-input detected at this frame, FALSE if not.

Definition at line 262 of file spsegment.c.

Referenced by decode_proceed().
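
The decoder-based VAD behaviour described for SPSEGMENT_NAIST can be read as a small state machine. The sketch below is one possible reading, not the code of detect_end_of_segment(). VadState, VadAction and vad_step() are hypothetical; the field names only mirror after_trigger, sp_margin and sp_frame_duration from the text, and resetting the trigger after a segment ends is an assumption.

```c
#include <stdbool.h>

/* Hypothetical action codes returned for each frame. */
typedef enum { VAD_CONTINUE, VAD_REWIND, VAD_END_OF_SEGMENT } VadAction;

/* Hypothetical state; fields mirror the names used in the description. */
typedef struct {
    bool after_trigger;      /* false while waiting for speech */
    int  pause_run;          /* successive pause frames seen after the trigger */
    int  sp_margin;          /* frames to rewind when speech triggers */
    int  sp_frame_duration;  /* pause length that ends the segment */
} VadState;

/* Process one frame. On VAD_REWIND, *rewind_to receives the frame from which
 * normal recognition should start. */
VadAction vad_step(VadState *s, bool frame_is_pause, int time, int *rewind_to)
{
    if (!s->after_trigger) {
        if (frame_is_pause) return VAD_CONTINUE;  /* still before the trigger */
        s->after_trigger = true;
        s->pause_run = 0;
        *rewind_to = (time > s->sp_margin) ? time - s->sp_margin : 0;
        return VAD_REWIND;
    }
    if (!frame_is_pause) {
        s->pause_run = 0;  /* speech continues */
        return VAD_CONTINUE;
    }
    s->pause_run++;
    if (s->pause_run >= s->sp_frame_duration) {
        /* Assumption: go back to waiting for the next speech trigger. */
        s->after_trigger = false;
        s->pause_run = 0;
        return VAD_END_OF_SEGMENT;
    }
    return VAD_CONTINUE;
}
```

With sp_margin = 2, a speech trigger at frame 3 yields a rewind to frame 1, so a little audio before the detected onset is included in the recognized segment.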


void finalize_segment ( Recog *  recog  ) 

Finalize the first pass for successive decoding.

When successive decoding mode is enabled, this function is called just after finalize_1st_pass() to finish the beam search of the last segment. The beginning and ending words for the 2nd pass are set according to the 1st pass result. The current input is then shrunk to the segmented length, and the unprocessed region is copied to rest_param for the next decoding.

Parameters:
recog [in] engine instance

Definition at line 632 of file spsegment.c.

Referenced by decode_end(), and decode_end_segmented().


boolean spsegment_need_restart ( Recog *  recog,
int *  rf_ret,
boolean *  repro_ret 
)

Check if rewind and restart of recognition is needed.

This function checks if an instance requires rewinding of input samples, and if recognition re-processing is needed after rewinding.

Parameters:
recog [in] engine instance
rf_ret [out] length of frame to rewind
repro_ret [out] TRUE if re-process is required after rewinding
Returns:
TRUE if rewinding is required, or FALSE if not.

Definition at line 836 of file spsegment.c.

Referenced by get_back_trellis(), and RealTimePipeLine().


void spsegment_restart_mfccs ( Recog *  recog,
int  rewind_frame,
boolean  reprocess 
)

Execute rewinding.

This function sets the restart point for the following processing and shrinks the parameters for the rewound part. The restart point is 0 (the beginning of the rest samples) when recognition is restarted, or the current position moved back by the specified number of rewind frames when it is not.

Parameters:
recog [i/o] engine instance
rewind_frame [in] frame length to rewind
reprocess [in] TRUE if re-processing recognition is required for the following processing

Definition at line 907 of file spsegment.c.

Referenced by get_back_trellis(), and RealTimePipeLine().
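
The choice of restart point described above reduces to a two-way branch. The helper below is a hypothetical sketch of that rule only; restart_point() is not a Julius function, and clamping a negative result to 0 is an assumption.

```c
#include <stdbool.h>

/* Return the frame from which processing continues after rewinding.
 * reprocess == true : recognition restarts from the beginning of the rest samples.
 * reprocess == false: step back rewind_frame frames from the current position.
 * Assumption: a result below 0 is clamped to 0. */
int restart_point(int current_frame, int rewind_frame, bool reprocess)
{
    if (reprocess) return 0;

    int f = current_frame - rewind_frame;
    return f < 0 ? 0 : f;
}
```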



Generated on Tue Dec 18 16:01:28 2007 for Julius by  doxygen 1.5.4