#include <julius/julius.h>
Functions | |
static void | init_param (MFCCCalc *mfcc) |
Prepare parameter holder in MFCC calculation instance to store MFCC vectors. | |
boolean | RealTimeInit (Recog *recog) |
Initializations for the on-the-fly 1st pass decoding. | |
void | reset_mfcc (Recog *recog) |
Prepare work area for MFCC calculation. | |
boolean | RealTimePipeLinePrepare (Recog *recog) |
Preparation for the on-the-fly 1st pass decoding. | |
boolean | RealTimeMFCC (MFCCCalc *mfcc, SP16 *window, int windowlen) |
Compute a parameter vector from a speech window. | |
int | RealTimePipeLine (SP16 *Speech, int nowlen, Recog *recog) |
Main function of the on-the-fly 1st pass decoding. | |
int | RealTimeResume (Recog *recog) |
Resuming recognition for short pause segmentation. | |
boolean | RealTimeParam (Recog *recog) |
Finalize the 1st pass on-the-fly decoding. | |
void | RealTimeCMNUpdate (MFCCCalc *mfcc, Recog *recog) |
Update cepstral mean. | |
void | RealTimeTerminate (Recog *recog) |
Terminate the 1st pass on-the-fly decoding. | |
void | realbeam_free (Recog *recog) |
Free the whole work area for 1st pass on-the-fly decoding. |
These functions perform on-the-fly decoding of the 1st pass (frame-synchronous beam search). They can be used instead of new_wav2mfcc() and get_back_trellis(). They start recognition as soon as input is triggered, and the 1st pass processing runs concurrently with the input.
The basic recognition procedure of Julius in main_recognition_loop() is as follows:
In on-the-fly decoding, procedures 1 to 3 above are performed in parallel. This is implemented with a simple scheme that processes small captured speech fragments one by one as they arrive:
The actual procedure is as follows. RealTimePipeLine() is given to adin_go() as a callback. adin_go() then watches the input and, once speech input starts, calls RealTimePipeLine() for every captured input fragment. RealTimePipeLine() computes the feature vectors of the given fragment, advances the 1st pass processing for them, and returns to the capture function. The current status is held until the next call so that inter-frame processing (computing delta coefficients, etc.) can be performed.
Note about CMN: With acoustic models trained with CMN, Julius applies CMN to the input. For file input, the mean of the whole sentence is computed and subtracted. In on-the-fly decoding, CMN is performed using the cepstral mean of the last 5 seconds of input (excluding rejected inputs). This was the behavior in versions earlier than 3.5; version 3.5.1 now applies MAP-CMN in on-the-fly decoding, using the cepstrum of the last 5 seconds as the initial mean. The initial cepstral mean at startup can be given with the option "-cmnload", and updates of the initial cepstral mean at each input can be disabled with "-cmnnoupdate". The latter option is useful for always using a static global cepstral mean as the initial mean for each input.
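The callback scheme described above can be sketched in a few lines of C. This is only an illustration: capture_loop() is a toy stand-in for adin_go(), and StreamState, fragment_cb and on_fragment() are hypothetical names, not part of the Julius API. The point is that the capture side owns the loop, while the callback keeps its own state between calls.

```c
#include <stddef.h>

typedef short sample_t;   /* stand-in for SP16 (assumption) */

/* Hypothetical state that survives between callback invocations. */
typedef struct {
    long fragments_seen;  /* number of times the callback was called */
    long samples_seen;    /* total samples handed to the callback */
} StreamState;

/* Callback type: return 0 to keep capturing, non-zero to stop. */
typedef int (*fragment_cb)(const sample_t *buf, int len, void *state);

/* Hypothetical callback. A real one would compute feature vectors for
 * the fragment and advance the 1st pass before returning. */
static int on_fragment(const sample_t *buf, int len, void *state)
{
    StreamState *st = (StreamState *)state;
    (void)buf;
    st->fragments_seen++;
    st->samples_seen += len;
    return 0;
}

/* Toy capture loop: hands the input to the callback fragment by fragment. */
static int capture_loop(const sample_t *input, int total, int fraglen,
                        fragment_cb cb, void *state)
{
    int pos = 0;
    while (pos < total) {
        int n = (total - pos < fraglen) ? total - pos : fraglen;
        int rc = cb(input + pos, n, state);
        if (rc != 0) return rc;   /* callback asked to stop the input */
        pos += n;
    }
    return 0;
}
```

Calling capture_loop(input, total, fraglen, on_fragment, &state) with a zero-initialized StreamState drives the callback once per fragment; the last fragment may be shorter than fraglen.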
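As a rough illustration of the MAP-CMN idea, the sketch below blends an initial (prior) mean with the running sum of the current input, so the estimate starts at the initial mean and moves toward the mean of the current input as frames accumulate. The MapCmn structure, the weight value, the fixed dimension and the choice to accumulate the current frame before subtracting are all assumptions made for this example; they are not taken from the Julius source.

```c
#define CEP_DIM 3   /* illustrative vector dimension (assumption) */

/* Hypothetical MAP-CMN work area. */
typedef struct {
    double init_mean[CEP_DIM]; /* initial mean, e.g. from previous inputs */
    double sum[CEP_DIM];       /* running sum over the current input */
    long   nframes;            /* frames accumulated in the current input */
    double weight;             /* how strongly the initial mean is trusted */
} MapCmn;

/* Start a new input: clear the running statistics, keep the initial mean. */
static void map_cmn_reset(MapCmn *c)
{
    int d;
    for (d = 0; d < CEP_DIM; d++) c->sum[d] = 0.0;
    c->nframes = 0;
}

/* Accumulate one frame, then subtract the MAP estimate of the mean:
 *   mean = (weight * init_mean + sum) / (weight + nframes)          */
static void map_cmn_apply(MapCmn *c, double *vec)
{
    int d;
    for (d = 0; d < CEP_DIM; d++) c->sum[d] += vec[d];
    c->nframes++;
    for (d = 0; d < CEP_DIM; d++) {
        double mean = (c->weight * c->init_mean[d] + c->sum[d])
                    / (c->weight + (double)c->nframes);
        vec[d] -= mean;
    }
}
```

With a large weight the estimate stays close to the initial mean, which is roughly the effect of never updating it; with a small weight it tracks the current input quickly.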
The primary functions in this file are:
Definition in file realtime-1stpass.c.
static void init_param | ( | MFCCCalc * | mfcc | ) | [static] |
Prepare parameter holder in MFCC calculation instance to store MFCC vectors.
This function stores header information based on the parameters in mfcc->para and allocates an initial buffer for the incoming vectors. The vector buffer is expanded as needed during recognition, so only a minimal amount is allocated at this point. If the instance already has a vector buffer of some length, it is kept.
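The allocation policy described above (minimal initial buffer, growth on demand, reuse across inputs) might look like the following sketch. VecBuf, vecbuf_prepare(), vecbuf_push() and the initial size are hypothetical names and values chosen for illustration; the doubling growth strategy is also an assumption, not necessarily what init_param() and its callers do.

```c
#include <stdlib.h>
#include <string.h>

#define VECBUF_INITIAL 4   /* initial capacity in frames (assumption) */

/* Hypothetical growable store for feature vectors. Zero-initialize before use. */
typedef struct {
    float *data;      /* capacity * veclen floats */
    int    veclen;    /* dimension of one vector */
    int    nframes;   /* frames currently stored */
    int    capacity;  /* frames allocated */
} VecBuf;

/* Called at the start of each input: keep an existing buffer if it fits,
 * otherwise allocate only a minimal one. Returns 0 on success. */
static int vecbuf_prepare(VecBuf *b, int veclen)
{
    if (b->data != NULL && b->veclen == veclen) {
        b->nframes = 0;            /* reuse the allocation as-is */
        return 0;
    }
    free(b->data);
    b->data = malloc(sizeof(float) * VECBUF_INITIAL * veclen);
    if (b->data == NULL) { b->capacity = 0; return -1; }
    b->veclen = veclen;
    b->capacity = VECBUF_INITIAL;
    b->nframes = 0;
    return 0;
}

/* Append one vector, doubling the buffer when it is full. */
static int vecbuf_push(VecBuf *b, const float *vec)
{
    if (b->nframes == b->capacity) {
        int newcap = b->capacity * 2;
        float *p = realloc(b->data, sizeof(float) * newcap * b->veclen);
        if (p == NULL) return -1;
        b->data = p;
        b->capacity = newcap;
    }
    memcpy(b->data + (size_t)b->nframes * b->veclen, vec,
           sizeof(float) * b->veclen);
    b->nframes++;
    return 0;
}
```

Because the buffer survives vecbuf_prepare(), a long first input pays the reallocation cost once and later inputs of similar length do not reallocate at all.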
This function will be called each time a new input begins.
mfcc | [i/o] MFCC calculation instance |
Definition at line 159 of file realtime-1stpass.c.
Referenced by RealTimePipeLinePrepare().
boolean RealTimeInit | ( | Recog * | recog | ) |
Initializations for the on-the-fly 1st pass decoding.
Work areas for all MFCC calculation instances are allocated. Additionally, some initialization is done, such as allocating the work area for spectral subtraction, loading the noise spectrum from file, and loading the initial cepstral mean data for CMN from file.
This will be called only once, on system startup.
recog | [i/o] engine instance |
Definition at line 222 of file realtime-1stpass.c.
Referenced by j_final_fusion().
void reset_mfcc | ( | Recog * | recog | ) |
Prepare work area for MFCC calculation.
Reset values in the work area for starting the next input. The output probability cache for each acoustic model is also prepared in this function.
This function will be called before starting each input (segment).
recog | [i/o] engine instance |
Definition at line 321 of file realtime-1stpass.c.
Referenced by RealTimePipeLinePrepare(), and RealTimeResume().
boolean RealTimePipeLinePrepare | ( | Recog * | recog | ) |
Preparation for the on-the-fly 1st pass decoding.
Variables are reset and data are prepared for the next input recognition.
This function will be called before starting each input (segment).
recog | [i/o] engine instance |
Definition at line 379 of file realtime-1stpass.c.
boolean RealTimeMFCC | ( | MFCCCalc * | mfcc, | SP16 * | window, | int | windowlen | ) |
Compute a parameter vector from a speech window.
This function calculates an MFCC vector from speech data windowed from the input speech. The resulting MFCC vector is stored in mfcc->tmpmfcc.
mfcc | [i/o] MFCC calculation instance | |
window | [in] speech input (windowed from input stream) | |
windowlen | [in] length of window |
Definition at line 463 of file realtime-1stpass.c.
Referenced by j_recog_new().
int RealTimePipeLine | ( | SP16 * | Speech, | int | nowlen, | Recog * | recog | ) |
Main function of the on-the-fly 1st pass decoding.
This function performs successive MFCC calculation and 1st pass decoding. The given input data are windowed to a certain length and converted to MFCC, and decoding for that input frame is performed, all in one process cycle. The cycle repeats with a window shift until the whole given input has been processed.
If the decoding process (in decode_proceed()) requests input segmentation, this function keeps the remaining unprocessed speech in a buffer and tells the caller to stop the input and end the 1st pass.
When a back-end VAD such as SPSEGMENT_NAIST or GMM_VAD is defined, decoder-based VAD is enabled and its decoding control is managed here. In decoder-based VAD mode, recognition proceeds but no output is produced during the initial untriggered input region. When the start of speech input is detected, this function rewinds the already obtained MFCC sequence by a certain number of frames and restarts normal recognition from that point. When multiple recognition process instances are running, their segmentation is synchronized.
This function is called as a callback from the A/D-in routine each time new speech samples arrive.
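The window-and-shift cycle described above can be sketched as follows. The sketch assumes fixed, illustrative window and shift sizes and uses hypothetical names (Framer, framer_feed(), process_frame()); process_frame() only counts frames where a real implementation would compute the MFCC vector and advance decoding. Samples that do not yet fill a window are kept until the next call.

```c
#include <string.h>

#define WIN_LEN   8   /* window length in samples (illustrative) */
#define WIN_SHIFT 4   /* window shift in samples (illustrative) */

/* Hypothetical framing state kept between calls. Zero-initialize before use. */
typedef struct {
    short window[WIN_LEN]; /* samples collected for the current window */
    int   filled;          /* number of valid samples in window[] */
    long  frames;          /* frames processed so far */
} Framer;

/* Placeholder for "convert this window to MFCC and decode one frame". */
static void process_frame(Framer *f, const short *win)
{
    (void)win;
    f->frames++;
}

/* Consume one captured fragment of arbitrary length. */
static void framer_feed(Framer *f, const short *speech, int nowlen)
{
    int pos = 0;
    while (pos < nowlen) {
        int need = WIN_LEN - f->filled;
        int n = (nowlen - pos < need) ? nowlen - pos : need;
        memcpy(f->window + f->filled, speech + pos, sizeof(short) * n);
        f->filled += n;
        pos += n;
        if (f->filled < WIN_LEN) break;   /* wait for the next fragment */
        process_frame(f, f->window);
        /* Shift: keep the overlapping tail for the next window. */
        memmove(f->window, f->window + WIN_SHIFT,
                sizeof(short) * (WIN_LEN - WIN_SHIFT));
        f->filled = WIN_LEN - WIN_SHIFT;
    }
}
```

The number of frames produced depends only on the total number of samples fed, not on how the input was split into fragments.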
Speech | [in] pointer to the speech sample segments | |
nowlen | [in] length of above | |
recog | [i/o] engine instance |
Definition at line 635 of file realtime-1stpass.c.
Referenced by RealTimeResume().
int RealTimeResume | ( | Recog * | recog | ) |
Resuming recognition for short pause segmentation.
This function processes the overlapped data and the remaining speech before the next input, when the input was segmented in the last processing.
recog | [i/o] engine instance |
Definition at line 904 of file realtime-1stpass.c.
boolean RealTimeParam | ( | Recog * | recog | ) |
Finalize the 1st pass on-the-fly decoding.
This function is called after the 1st pass processing ends. It fixes the input length of the parameter vector sequence and calls decode_end() (or decode_end_segmented() when the last input was ended by segmentation) to finalize the 1st pass.
If the last input was ended by end-of-stream (for example, when file input reached EOF), it also processes the samples remaining in the delta buffers.
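Why frames remain in a delta buffer at end-of-stream can be seen from a small streaming delta calculator. The sketch below uses the standard regression formula for delta coefficients over a window of +/- DELTA_WIN frames, so each output lags the input by DELTA_WIN frames and those last frames must be flushed when the stream ends. All names are hypothetical, the value is one-dimensional for brevity, and padding the edges by repeating the first or last frame is an assumption for this example rather than a description of the Julius implementation.

```c
#define DELTA_WIN    2                    /* frames of context on each side */
#define DELTA_BUFLEN (2 * DELTA_WIN + 1)

/* Hypothetical delta work area. Zero-initialize before use. */
typedef struct {
    float buf[DELTA_BUFLEN]; /* sliding window; buf[DELTA_WIN] is the center */
    int   count;             /* frames pushed so far */
} DeltaBuf;

static void delta_shift_in(DeltaBuf *d, float c)
{
    int i;
    for (i = 0; i + 1 < DELTA_BUFLEN; i++) d->buf[i] = d->buf[i + 1];
    d->buf[DELTA_BUFLEN - 1] = c;
}

/* Regression delta for the center frame:
 *   sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2)                 */
static float delta_center(const DeltaBuf *d)
{
    float num = 0.0f, den = 0.0f;
    int k;
    for (k = 1; k <= DELTA_WIN; k++) {
        num += (float)k * (d->buf[DELTA_WIN + k] - d->buf[DELTA_WIN - k]);
        den += 2.0f * (float)(k * k);
    }
    return num / den;
}

/* Push one frame. Returns 1 and writes *out once enough right context
 * exists; the output belongs to the frame pushed DELTA_WIN calls earlier. */
static int delta_push(DeltaBuf *d, float c, float *out)
{
    int i;
    if (d->count == 0) {
        for (i = 0; i < DELTA_BUFLEN; i++) d->buf[i] = c; /* left padding */
    } else {
        delta_shift_in(d, c);
    }
    d->count++;
    if (d->count <= DELTA_WIN) return 0;
    *out = delta_center(d);
    return 1;
}

/* End of stream: emit the frames still waiting for right context by
 * repeating the last frame. Returns the number of values written. */
static int delta_flush(DeltaBuf *d, float *out, int maxout)
{
    int pending = (d->count < DELTA_WIN) ? d->count : DELTA_WIN;
    float last = d->buf[DELTA_BUFLEN - 1];
    int n = 0;
    while (n < pending && n < maxout) {
        delta_shift_in(d, last);
        out[n++] = delta_center(d);
    }
    return n;
}
```

Without the flush step, the final DELTA_WIN frames of a file input would never receive delta values and the parameter sequence would come out shorter than the input.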
recog | [i/o] engine instance |
Definition at line 1059 of file realtime-1stpass.c.
void RealTimeCMNUpdate | ( | MFCCCalc * | mfcc, | Recog * | recog | ) |
Update cepstral mean.
This function updates the initial cepstral mean for CMN of the next input.
mfcc | [i/o] MFCC Calculation instance to update its CMN | |
recog | [i/o] engine instance |
Definition at line 1296 of file realtime-1stpass.c.
void RealTimeTerminate | ( | Recog * | recog | ) |
Terminate the 1st pass on-the-fly decoding.
recog | [i/o] engine instance |
Definition at line 1357 of file realtime-1stpass.c.
void realbeam_free | ( | Recog * | recog | ) |
Free the whole work area for 1st pass on-the-fly decoding.
recog | [in] engine instance |
Definition at line 1382 of file realtime-1stpass.c.
Referenced by j_recog_free().