#include <julius/julius.h>
Functions | |
boolean | is_sil (WORD_ID w, RecogProcess *r) |
Check if the given word is a short-pause word. | |
void | mfcc_copy_to_rest_and_shrink (MFCCCalc *mfcc, int start, int end) |
Split input parameter for segmentation. | |
void | mfcc_shrink (MFCCCalc *mfcc, int p) |
Shrink the parameter sequence. | |
boolean | detect_end_of_segment (RecogProcess *r, int time) |
Speech end point detection. | |
void | finalize_segment (Recog *recog) |
Finalize the first pass for successive decoding. | |
boolean | spsegment_need_restart (Recog *recog, int *rf_ret, boolean *repro_ret) |
Check if rewind and restart of recognition is needed. | |
void | spsegment_restart_mfccs (Recog *recog, int rewind_frame, boolean reprocess) |
Execute rewinding. |
In short-pause segmentation mode, Julius tries to find a "pause frame" by watching the word hypotheses at each frame. Julius treats words that consist only of a silence model as "pause words", and judges whether an input frame is a "pause frame" by checking whether any pause word gets the maximum score at that frame. It then segments the input when the duration of consecutive pause frames reaches a limit.
In normal short-pause segmentation (as of ver.3.x), the pause frames are not eliminated. The input is segmented at the frame where speech begins after the pause frames, and the next input is processed from the beginning of the pause frames. In other words, the detected pause frames are processed twice: as end-of-segment silence in the former input segment and as beginning-of-segment silence in the latter input segment.
When SPSEGMENT_NAIST is defined, a long pause area is dropped from recognition. When the run of detected pause frames grows longer than a threshold, Julius segments the input at that point and skips the continuing pause until a speech frame arrives. The recognition process is kept in a special status while inside the pause segment. This scheme works as a decoder-driven VAD.
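The normal (non-NAIST) scheme described above can be pictured with a small stand-alone sketch. It is not taken from spsegment.c: the PauseTracker type, its fields, and the function names are hypothetical, and the caller is assumed to supply, for each frame, whether the best-scoring word hypothesis is a pause word.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracker state; not a Julius structure. */
typedef struct {
  int pause_len;    /* consecutive pause frames seen so far */
  int pause_start;  /* frame where the current pause run began, -1 if none */
  int min_pause;    /* pause duration (in frames) required before segmenting */
} PauseTracker;

void pause_tracker_init(PauseTracker *t, int min_pause)
{
  assert(min_pause > 0);
  t->pause_len = 0;
  t->pause_start = -1;
  t->min_pause = min_pause;
}

/* Feed one frame. best_is_pause says whether the best-scoring word at this
 * frame is a pause word. Returns true when speech resumes after a pause run
 * of at least min_pause frames, i.e. the input should be segmented here.
 * In that case *next_start receives the first frame of the pause run, so
 * the next segment re-processes the pause as its leading silence. */
bool pause_tracker_step(PauseTracker *t, int frame, bool best_is_pause,
                        int *next_start)
{
  if (best_is_pause) {
    if (t->pause_len == 0) t->pause_start = frame;
    t->pause_len++;
    return false;
  }
  /* Speech frame: segment only if the preceding pause was long enough. */
  bool segment = (t->pause_len >= t->min_pause);
  if (segment) *next_start = t->pause_start;
  t->pause_len = 0;
  t->pause_start = -1;
  return segment;
}
```

Because next_start points at the beginning of the pause run rather than at the speech frame, the pause frames end up in both segments, which matches the "processed twice" behaviour described above.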
Definition in file spsegment.c.
boolean is_sil (WORD_ID w, RecogProcess *r)
Check if the given word is a short-pause word.
w | [in] word id | |
r | [in] recognition process instance |
Definition at line 98 of file spsegment.c.
Referenced by detect_end_of_segment().
void mfcc_copy_to_rest_and_shrink (MFCCCalc *mfcc, int start, int end)
Split input parameter for segmentation.
Copy the remaining samples in param to rest_param, and shrink param in the mfcc instance. Frames [start...param->samplenum] are copied to rest_param, and frames [0...end] are left in param.
mfcc | [i/o] MFCC calculation instance | |
start | [in] copy start frame | |
end | [in] original end frame |
Definition at line 154 of file spsegment.c.
Referenced by finalize_segment().
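As an illustration of the split described for mfcc_copy_to_rest_and_shrink(), the following sketch works on a plain array of feature frames. The ParamSeq type and split_param() are hypothetical stand-ins, not the MFCCCalc or parameter structures used by Julius, and the ranges are assumed to be half-open ([start, samplenum) and [0, end)), which the description above does not state explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parameter sequence: samplenum frames of
 * veclen floats each, stored frame after frame. */
typedef struct {
  float *vec;
  int samplenum;
  int veclen;
} ParamSeq;

/* Copy frames [start, src->samplenum) into a newly allocated sequence and
 * truncate src to frames [0, end). Returns NULL on allocation failure, in
 * which case src is left unchanged. start may be smaller than end, so the
 * two sequences can overlap in content. */
ParamSeq *split_param(ParamSeq *src, int start, int end)
{
  assert(start >= 0 && start <= src->samplenum);
  assert(end >= 0 && end <= src->samplenum);

  size_t n = (size_t)(src->samplenum - start) * (size_t)src->veclen;
  ParamSeq *rest = malloc(sizeof *rest);
  if (rest == NULL) return NULL;
  rest->vec = malloc((n > 0 ? n : 1) * sizeof(float));
  if (rest->vec == NULL) {
    free(rest);
    return NULL;
  }
  memcpy(rest->vec, src->vec + (size_t)start * (size_t)src->veclen,
         n * sizeof(float));
  rest->samplenum = src->samplenum - start;
  rest->veclen = src->veclen;

  src->samplenum = end;  /* shrink: frames from end onward are dropped */
  return rest;
}
```

Allowing start to be smaller than end is what lets the pause frames appear both at the end of the current segment and at the beginning of the rest.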
void mfcc_shrink (MFCCCalc *mfcc, int p)
Shrink the parameter sequence.
Drop the frames preceding frame p and move [p..samplenum] to position 0.
mfcc | [i/o] MFCC Calculation instance | |
p | [in] frame point to remain |
Definition at line 194 of file spsegment.c.
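The shrink operation can be sketched on a flat frame buffer as follows. FrameSeq and shrink_front() are hypothetical names used only for illustration, and the sketch assumes 0-based frame indices, so that frame p becomes the new frame 0 and the frames at indices 0 to p-1 are discarded.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical frame buffer: samplenum frames of veclen floats each. */
typedef struct {
  float *vec;
  int samplenum;
  int veclen;
} FrameSeq;

/* Discard the frames before index p and move frames [p, samplenum) to the
 * front of the buffer. memmove is used because source and destination
 * regions overlap. */
void shrink_front(FrameSeq *s, int p)
{
  assert(p >= 0 && p <= s->samplenum);
  if (p == 0) return;  /* nothing to drop */

  size_t remain = (size_t)(s->samplenum - p) * (size_t)s->veclen;
  memmove(s->vec, s->vec + (size_t)p * (size_t)s->veclen,
          remain * sizeof(float));
  s->samplenum -= p;
}
```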
boolean detect_end_of_segment (RecogProcess *r, int time)
Speech end point detection.
Detect end-of-input by the duration of short-pause words when short-pause segmentation is enabled. When a pause word gets the maximum score for a number of successive frames, those frames are treated as pause frames. When speech re-triggers, the current input is segmented at that point.
When SPSEGMENT_NAIST is defined, this function performs an extended version of short-pause segmentation called "decoder-based VAD". Before a speech trigger (r->pass1.after_trigger == FALSE), it tells the recognition functions not to generate the word trellis and to continue calculation. If a speech trigger is found (a non-pause word gets the maximum score), the input is rewound by a fixed number of frames (r->config->successive.sp_margin) and normal recognition starts from the rewound position (r->pass1.after_trigger = TRUE). When the pause frame duration reaches a limit (r->config->successive.sp_frame_duration), it terminates the search.
r | [i/o] recognition process instance | |
time | [in] current input frame |
Definition at line 262 of file spsegment.c.
Referenced by decode_proceed().
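The decoder-based VAD behaviour described for the SPSEGMENT_NAIST case can be viewed as a two-state machine. The sketch below is a simplified illustration, not the code in spsegment.c: VadState, VadAction, and vad_step() are hypothetical, the field names only mirror the roles of after_trigger, sp_margin, and sp_frame_duration, and clamping the rewind point at frame 0 and resetting the trigger after a segment ends are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
  VAD_CONTINUE,          /* keep processing the next frame */
  VAD_REWIND_AND_START,  /* speech triggered: rewind and start recognition */
  VAD_END_SEGMENT        /* pause lasted long enough: terminate the search */
} VadAction;

/* Hypothetical state; field names mirror the roles described in the text. */
typedef struct {
  bool after_trigger;     /* false while waiting for a speech trigger */
  int pause_len;          /* consecutive pause frames after the trigger */
  int sp_margin;          /* frames to rewind when speech triggers */
  int sp_frame_duration;  /* pause length that ends the segment */
} VadState;

/* Process one frame. best_is_pause says whether a pause word has the
 * maximum score at this frame. On VAD_REWIND_AND_START, *restart_frame
 * receives the frame from which normal recognition should begin. */
VadAction vad_step(VadState *v, int frame, bool best_is_pause,
                   int *restart_frame)
{
  assert(v->sp_frame_duration > 0 && v->sp_margin >= 0);

  if (!v->after_trigger) {
    if (best_is_pause) return VAD_CONTINUE;  /* still in the pause */
    v->after_trigger = true;
    v->pause_len = 0;
    /* Assumption: never rewind past the first frame. */
    *restart_frame = frame > v->sp_margin ? frame - v->sp_margin : 0;
    return VAD_REWIND_AND_START;
  }

  if (!best_is_pause) {
    v->pause_len = 0;  /* speech continues; reset the pause counter */
    return VAD_CONTINUE;
  }
  if (++v->pause_len >= v->sp_frame_duration) {
    v->after_trigger = false;  /* assumption: wait for the next trigger */
    v->pause_len = 0;
    return VAD_END_SEGMENT;
  }
  return VAD_CONTINUE;
}
```

Rewinding by a margin gives the recognizer a short stretch of leading silence before the first speech frame instead of starting exactly at the trigger point.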
void finalize_segment (Recog *recog)
Finalize the first pass for successive decoding.
When successive decoding mode is enabled, this function is called just after finalize_1st_pass() to finish the beam search of the last segment. The beginning and ending words for the 2nd pass are set according to the 1st pass result. Then the current input is shrunk to the segmented length and the unprocessed region is copied to rest_param for the next decoding.
recog | [in] engine instance |
Definition at line 632 of file spsegment.c.
Referenced by decode_end(), and decode_end_segmented().
boolean spsegment_need_restart (Recog *recog, int *rf_ret, boolean *repro_ret)
Check if rewind and restart of recognition is needed.
This function checks if an instance requires rewinding of input samples, and if recognition re-processing is needed after rewinding.
recog | [in] engine instance | |
rf_ret | [out] length of frame to rewind | |
repro_ret | [out] TRUE if re-process is required after rewinding |
Definition at line 836 of file spsegment.c.
Referenced by get_back_trellis(), and RealTimePipeLine().
void spsegment_restart_mfccs (Recog *recog, int rewind_frame, boolean reprocess)
Execute rewinding.
This function sets the restart point for the following processing and shrinks the parameters for the rewound part. The restart point is 0 (the beginning of the rest samples) when recognition is restarted; otherwise processing simply goes back by the specified number of rewind frames.
recog | [i/o] engine instance | |
rewind_frame | [in] frame length to rewind | |
reprocess | [in] TRUE if re-processing recognition is required for the following processing |
Definition at line 907 of file spsegment.c.
Referenced by get_back_trellis(), and RealTimePipeLine().