libjulius/src/realtime-1stpass.c File Reference

The first pass: frame-synchronous beam search (on-the-fly version).

#include <julius/julius.h>

Functions

static void init_param (MFCCCalc *mfcc)
 Prepare parameter holder in MFCC calculation instance to store MFCC vectors.
boolean RealTimeInit (Recog *recog)
 Initializations for the on-the-fly 1st pass decoding.
void reset_mfcc (Recog *recog)
 Prepare work area for MFCC calculation.
boolean RealTimePipeLinePrepare (Recog *recog)
 Preparation for the on-the-fly 1st pass decoding.
boolean RealTimeMFCC (MFCCCalc *mfcc, SP16 *window, int windowlen)
 Compute a parameter vector from a speech window.
int RealTimePipeLine (SP16 *Speech, int nowlen, Recog *recog)
 Main function of the on-the-fly 1st pass decoding.
int RealTimeResume (Recog *recog)
 Resuming recognition for short pause segmentation.
boolean RealTimeParam (Recog *recog)
 Finalize the 1st pass on-the-fly decoding.
void RealTimeCMNUpdate (MFCCCalc *mfcc, Recog *recog)
 Update cepstral mean.
void RealTimeTerminate (Recog *recog)
 Terminate the 1st pass on-the-fly decoding.
void realbeam_free (Recog *recog)
 Free the whole work area for 1st pass on-the-fly decoding.


Detailed Description

The first pass: frame-synchronous beam search (on-the-fly version).

These functions perform on-the-fly decoding of the 1st pass (frame-synchronous beam search). They can be used instead of new_wav2mfcc() and get_back_trellis(). They allow recognition to begin as soon as input is triggered, and the 1st pass is processed concurrently with the input.

The basic recognition procedure of Julius in main_recognition_loop() is as follows:

  1. speech input: (adin_go()) ... buffer `speech' holds the input
  2. feature extraction: (new_wav2mfcc()) ... compute feature vector from `speech' and store the vector sequence to `param'.
  3. recognition 1st pass: (get_back_trellis()) ... frame-wise beam decoding to generate word trellis index from `param' and models.
  4. recognition 2nd pass: (wchmm_fbs())
  5. Output result.

In on-the-fly decoding, steps 1 to 3 above are performed in parallel. This is implemented by a simple scheme that progressively processes the captured small speech fragments one by one.

The actual procedure is as follows. The function RealTimePipeLine() is given to adin_go() as a callback. adin_go() then watches the input, and once speech input starts, it calls RealTimePipeLine() for every captured input fragment. RealTimePipeLine() computes the feature vectors of the given fragment, advances the 1st pass processing for them, and returns to the capture function. The current status is held until the next call so that inter-frame processing (computing delta coefficients, etc.) can be performed.
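The callback scheme described above can be sketched as follows. This is only an illustration: capture_loop(), on_fragment(), FragmentCallback and PipelineState are hypothetical names, not part of Julius, and the return-value convention (0 to continue, 1 to stop input, -1 on error) is borrowed from the RealTimePipeLine() documentation in this file.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the Julius API. */
typedef struct {
    int  calls;          /* number of fragments seen so far */
    long total_samples;  /* samples consumed across all calls */
    long stop_after;     /* request a stop once this many samples arrived (0 = never) */
} PipelineState;

/* Callback contract: 0 = continue, 1 = stop input and move on, -1 = error. */
typedef int (*FragmentCallback)(const short *frag, int len, void *ctx);

/* Per-fragment work. The state in *ctx persists between calls, which is
   what makes inter-frame processing possible. */
int on_fragment(const short *frag, int len, void *ctx)
{
    PipelineState *st = (PipelineState *)ctx;
    if (frag == NULL || len < 0) return -1;
    st->calls++;
    st->total_samples += len;
    if (st->stop_after > 0 && st->total_samples >= st->stop_after) return 1;
    return 0;
}

/* Stand-in for the capture routine: slices the input into fragments and
   hands each one to the callback until it asks to stop. */
int capture_loop(const short *input, int total, int fraglen,
                 FragmentCallback cb, void *ctx)
{
    assert(fraglen > 0);
    for (int pos = 0; pos < total; pos += fraglen) {
        int len = (total - pos < fraglen) ? total - pos : fraglen;
        int rc = cb(input + pos, len, ctx);
        if (rc != 0) return rc;  /* 1 or -1: stop feeding fragments */
    }
    return 0;
}
```

The capture side never needs to know what the callback computes; it only reacts to the return value.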

Note about CMN: With acoustic models trained with CMN, Julius applies CMN to the input. For file input, the mean of the whole sentence is computed and subtracted. In on-the-fly decoding, CMN was performed using the cepstral mean of the last 5 seconds of input (excluding rejected ones). This was the behavior in versions earlier than 3.5; 3.5.1 now applies MAP-CMN in on-the-fly decoding, using the cepstrum of the last 5 seconds as the initial mean. The initial cepstral mean at startup can be given with the option "-cmnload", and updates of the initial cepstral mean at each input can be disabled with "-cmnnoupdate". The latter option is useful for always using a static global cepstral mean as the initial mean for each input.
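As a rough illustration of MAP-CMN, the sketch below blends an initial mean with the running sum of the current input, weighting the initial mean as if it were `weight` frames of data. This is one common formulation and is an assumption here: the exact formula and weight used by Julius are not stated in this page. MapCmn, map_cmn_start(), map_cmn_apply() and CMN_DIM are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define CMN_DIM 3   /* illustrative vector dimension only */

/* Hypothetical work area; not a Julius structure. */
typedef struct {
    double init_mean[CMN_DIM]; /* initial (prior) cepstral mean */
    double sum[CMN_DIM];       /* running sum over the current input */
    int    nframes;            /* frames accumulated so far */
    double weight;             /* assumed prior weight, counted in frames */
} MapCmn;

/* Reset the accumulator at the start of an input. */
void map_cmn_start(MapCmn *c, const double *init_mean, double weight)
{
    assert(c != NULL && init_mean != NULL && weight > 0.0);
    for (int d = 0; d < CMN_DIM; d++) {
        c->init_mean[d] = init_mean[d];
        c->sum[d] = 0.0;
    }
    c->nframes = 0;
    c->weight = weight;
}

/* Accumulate one frame, then subtract the blended mean from it in place:
   mean = (weight * init_mean + sum) / (weight + nframes). */
void map_cmn_apply(MapCmn *c, double *frame)
{
    c->nframes++;
    for (int d = 0; d < CMN_DIM; d++) {
        c->sum[d] += frame[d];
        double mean = (c->weight * c->init_mean[d] + c->sum[d])
                      / (c->weight + c->nframes);
        frame[d] -= mean;
    }
}
```

With few frames the initial mean dominates; as the input grows, the estimate moves toward the mean of the current input.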

The primary functions in this file are:

Author:
Akinobu Lee
Date:
Tue Aug 23 11:44:14 2005
Revision
1.1.1.1

Definition in file realtime-1stpass.c.


Function Documentation

static void init_param ( MFCCCalc * mfcc ) [static]

Prepare parameter holder in MFCC calculation instance to store MFCC vectors.

This function stores header information based on the parameters in mfcc->para and allocates an initial buffer for the incoming vectors. The vector buffer is expanded as needed during recognition, so only the minimal amount is allocated at this point. If the instance already has a vector buffer of some length, it is kept.

This function will be called each time a new input begins.
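The allocation policy described above (allocate a minimum, keep an existing buffer, grow on demand) might look like the following sketch. VecBuf, vecbuf_begin() and vecbuf_append() are hypothetical names, and doubling the capacity is an assumed growth strategy, not necessarily what Julius does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable store of fixed-length vectors. */
typedef struct {
    float *data;      /* capacity * veclen floats, or NULL before first use */
    int    veclen;    /* dimension of one vector */
    int    capacity;  /* frames currently allocated */
    int    length;    /* frames currently stored */
} VecBuf;

/* Call at the start of each input. An existing allocation that is large
   enough is kept. Returns 0 on success, -1 on allocation failure. */
int vecbuf_begin(VecBuf *b, int min_frames)
{
    assert(b != NULL && b->veclen > 0 && min_frames > 0);
    b->length = 0;
    if (b->data != NULL && b->capacity >= min_frames) return 0;
    float *p = realloc(b->data, (size_t)min_frames * b->veclen * sizeof(float));
    if (p == NULL) return -1;
    b->data = p;
    b->capacity = min_frames;
    return 0;
}

/* Append one vector, doubling the buffer when it is full.
   vecbuf_begin() must have succeeded first. */
int vecbuf_append(VecBuf *b, const float *vec)
{
    if (b->length == b->capacity) {
        int newcap = b->capacity * 2;
        float *p = realloc(b->data, (size_t)newcap * b->veclen * sizeof(float));
        if (p == NULL) return -1;
        b->data = p;
        b->capacity = newcap;
    }
    memcpy(b->data + (size_t)b->length * b->veclen, vec,
           (size_t)b->veclen * sizeof(float));
    b->length++;
    return 0;
}
```

Keeping the buffer across inputs avoids reallocating on every utterance once it has grown to a typical input length.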

Parameters:
mfcc [i/o] MFCC calculation instance

Definition at line 159 of file realtime-1stpass.c.

Referenced by RealTimePipeLinePrepare().

boolean RealTimeInit ( Recog * recog )

Initializations for the on-the-fly 1st pass decoding.

Work areas for all MFCC calculation instances are allocated. Additionally, some initialization is done, such as allocating the work area for spectral subtraction, loading the noise spectrum from file, and loading initial cepstral mean data for CMN from file.

This will be called only once, on system startup.

Parameters:
recog [i/o] engine instance

Definition at line 222 of file realtime-1stpass.c.

Referenced by j_final_fusion().

void reset_mfcc ( Recog * recog )

Prepare work area for MFCC calculation.

Reset values in the work area for starting the next input. The output probability cache for each acoustic model is also prepared in this function.

This function will be called before starting each input (segment).

Parameters:
recog [i/o] engine instance

Definition at line 321 of file realtime-1stpass.c.

Referenced by RealTimePipeLinePrepare(), and RealTimeResume().

boolean RealTimePipeLinePrepare ( Recog * recog )

Preparation for the on-the-fly 1st pass decoding.

Variables are reset and data are prepared for the next input recognition.

This function will be called before starting each input (segment).

Parameters:
recog [i/o] engine instance
Returns:
TRUE on success. FALSE on failure.

Definition at line 379 of file realtime-1stpass.c.

boolean RealTimeMFCC ( MFCCCalc * mfcc,
SP16 * window,
int windowlen
)

Compute a parameter vector from a speech window.

This function calculates an MFCC vector from speech data windowed from input speech. The obtained MFCC vector will be stored to mfcc->tmpmfcc.

Parameters:
mfcc [i/o] MFCC calculation instance
window [in] speech input (windowed from input stream)
windowlen [in] length of window
Returns:
TRUE on success (a vector was obtained). Returns FALSE if no parameter vector has been obtained yet (due to delta delay).

Definition at line 463 of file realtime-1stpass.c.

Referenced by j_recog_new().

int RealTimePipeLine ( SP16 * Speech,
int nowlen,
Recog * recog
)

Main function of the on-the-fly 1st pass decoding.

This function performs successive MFCC calculation and 1st pass decoding. The given input data are windowed to a certain length and converted to MFCC, and decoding for that input frame is performed, all in one process cycle. The cycle repeats with a window shift until the whole given input has been processed.
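The window-and-shift cycle can be sketched as below. Fragments of arbitrary length are appended to a window buffer; each time the window fills, one frame is produced and the buffer is shifted so that the overlapping tail carries over to the next window and to the next call. Windower, windower_init(), windower_feed() and the sizes FRAME_SIZE and FRAME_SHIFT are hypothetical, and the point where feature extraction and decoding would run is only marked by a comment.

```c
#include <assert.h>
#include <string.h>

#define FRAME_SIZE  400   /* assumed window length in samples */
#define FRAME_SHIFT 160   /* assumed window shift in samples */

/* Hypothetical windowing state kept between calls. */
typedef struct {
    short window[FRAME_SIZE]; /* samples carried over between calls */
    int   filled;             /* valid samples currently in window */
    int   frames;             /* frames produced so far */
} Windower;

void windower_init(Windower *w)
{
    assert(w != NULL);
    w->filled = 0;
    w->frames = 0;
}

/* Consume a fragment of any length. Returns the number of complete
   frames produced by this call. */
int windower_feed(Windower *w, const short *speech, int len)
{
    int produced = 0;
    int pos = 0;
    while (pos < len) {
        int need = FRAME_SIZE - w->filled;
        int take = (len - pos < need) ? len - pos : need;
        memcpy(w->window + w->filled, speech + pos, (size_t)take * sizeof(short));
        w->filled += take;
        pos += take;
        if (w->filled == FRAME_SIZE) {
            /* A full window is ready: feature extraction and one
               decoding step for this frame would run here. */
            produced++;
            w->frames++;
            /* Shift: keep the overlapping tail for the next window. */
            memmove(w->window, w->window + FRAME_SHIFT,
                    (size_t)(FRAME_SIZE - FRAME_SHIFT) * sizeof(short));
            w->filled = FRAME_SIZE - FRAME_SHIFT;
        }
    }
    return produced;
}
```

Because the leftover samples stay in the buffer, the frame boundaries do not depend on how the capture routine happens to split the input.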

If the decoding process (in decode_proceed()) requests input segmentation, this function keeps the remaining unprocessed speech in a buffer and tells the caller to stop input and end the 1st pass.

When a back-end VAD such as SPSEGMENT_NAIST or GMM_VAD is defined, decoder-based VAD is enabled and its decoding control is managed here. In decoder-based VAD mode, recognition is processed but no output is produced in the initial untriggered input area. When the start of speech input is detected, this function rewinds the already obtained MFCC sequence by a certain number of frames and restarts normal recognition at that point. When multiple recognition process instances are running, their segmentation is synchronized.

This function is called as a callback from the A/D-in routine each time new speech samples arrive.

Parameters:
Speech [in] pointer to the speech sample segments
nowlen [in] length of above
recog [i/o] engine instance
Returns:
-1 on error (tells the caller to terminate), 0 on success (the caller may call this function for the next segment), or 1 to tell the caller to terminate input and go on to the next pass.

Definition at line 635 of file realtime-1stpass.c.

Referenced by RealTimeResume().

int RealTimeResume ( Recog * recog )

Resuming recognition for short pause segmentation.

This function processes the overlapped data and the remaining speech before the next input, when the input was segmented in the last processing.

Parameters:
recog [i/o] engine instance
Returns:
-1 on error (tells the caller to terminate), 0 on success (the caller may call this function for the next segment), or 1 when an end-of-sentence is detected at this point (in that case the caller will stop input and go on to the 2nd pass).

Definition at line 904 of file realtime-1stpass.c.

boolean RealTimeParam ( Recog * recog )

Finalize the 1st pass on-the-fly decoding.

This function is called after the 1st pass processing ends. It fixes the input length of the parameter vector sequence and calls decode_end() (or decode_end_segmented() when the last input was ended by segmentation) to finalize the 1st pass.

If the last input was ended by end-of-stream (for example, when file input reached EOF), it also processes the samples remaining in the delta buffers.

Parameters:
recog [i/o] engine instance
Returns:
TRUE on success, or FALSE on error.

Definition at line 1059 of file realtime-1stpass.c.

void RealTimeCMNUpdate ( MFCCCalc * mfcc,
Recog * recog
)

Update cepstral mean.

This function updates the initial cepstral mean for CMN of the next input.

Parameters:
mfcc [i/o] MFCC Calculation instance to update its CMN
recog [i/o] engine instance

Definition at line 1296 of file realtime-1stpass.c.

void RealTimeTerminate ( Recog * recog )

Terminate the 1st pass on-the-fly decoding.

Parameters:
recog [i/o] engine instance

Definition at line 1357 of file realtime-1stpass.c.

void realbeam_free ( Recog * recog )

Free the whole work area for 1st pass on-the-fly decoding.

Parameters:
recog [in] engine instance

Definition at line 1382 of file realtime-1stpass.c.

Referenced by j_recog_free().


Generated on Tue Dec 18 16:01:24 2007 for Julius by  doxygen 1.5.4