julius/realtime-1stpass.c File Reference

On-the-fly decoding of the 1st pass.

#include <julius.h>

Functions

static void init_param ()

void RealTimeInit ()

void RealTimePipeLinePrepare ()

int RealTimePipeLine (SP16 *Speech, int nowlen)

Main function of on-the-fly 1st pass decoding.

HTK_Param * RealTimeParam (LOGPROB *backmax)

void RealTimeCMNUpdate (HTK_Param *param)

void RealTimeTerminate ()

Variables

static HTK_Param * param = NULL

Computed MFCC parameter vectors.

static float * bf

Work space for FFT.

static DeltaBuf * db

Work space for delta MFCC cycle buffer.

static DeltaBuf * ab

Work space for accel MFCC cycle buffer.

static VECT * tmpmfcc

Work space to hold a temporary MFCC vector.

static int maxframelen

Maximum allowed input frame length.

static int last_time

Last processed frame.

static boolean last_is_segmented

TRUE if last pass was a segmented input.

static int f_raw

Frame pointer of current base MFCC.

static int f

Frame pointer where all MFCC computation has been done.

static SP16 * window

Window buffer for MFCC calculation.

static int windowlen

Buffer length of window.

static int windownum

Number of samples currently left in window.

Detailed Description

On-the-fly decoding of the 1st pass.

Author: Akinobu Lee

Date: Tue Aug 23 11:44:14 2005

These functions perform on-the-fly decoding of the 1st pass (frame-synchronous beam search). They can be used instead of new_wav2mfcc() and get_back_trellis(). They let recognition start as soon as input is triggered, and the 1st pass is processed concurrently with the input.

The basic recognition procedure of Julius in main_recognition_loop() is as follows:

1. speech input: (adin_go()) ... buffer `speech' holds the input
2. feature extraction: (new_wav2mfcc()) ... compute feature vectors from `speech' and store the vector sequence to `param'.
3. recognition 1st pass: (get_back_trellis()) ... frame-wise beam decoding to generate word trellis index from `param' and models.
4. recognition 2nd pass: (wchmm_fbs())
5. Output result.

In on-the-fly decoding, steps 1 to 3 above are performed in parallel. This is implemented by a simple scheme that processes small captured speech fragments progressively, one by one:

- Define a callback function that can do feature extraction and 1st pass processing progressively.
- Pass the callback to the A/D-in function adin_go().

The actual procedure is as follows. RealTimePipeLine() is given to adin_go() as the callback. adin_go() then watches the input and, once speech input starts, calls RealTimePipeLine() for every captured input fragment. RealTimePipeLine() computes the feature vectors of the given fragment, advances the 1st pass over them, and returns to the capture function. The current state is held until the next call so that inter-frame processing (computing delta coefficients, etc.) can be performed.
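The state carry-over between callback invocations can be sketched as follows. This is a minimal illustration, not the Julius source: the names fragment_push(), fragment_state_reset() and process_frame(), as well as the window size and shift, are assumptions chosen for the example. It shows how fragments of arbitrary length are accumulated into overlapping analysis windows, with leftover samples kept for the next call.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names and sizes for illustration; not the Julius source. */
#define FRAME_SIZE  400   /* samples per analysis window (assumed) */
#define FRAME_SHIFT 160   /* samples between window starts (assumed) */

typedef short sample_t;   /* stand-in for the SP16 sample type */

/* State that must survive between callback invocations. */
static sample_t win[FRAME_SIZE]; /* window buffer */
static int win_fill = 0;         /* samples currently held in the window */
static int frames_done = 0;      /* frames processed so far in this input */

/* Reset the state before each new input. */
void fragment_state_reset(void)
{
    win_fill = 0;
    frames_done = 0;
}

/* Placeholder for per-frame work (feature extraction, one search step). */
static void process_frame(const sample_t *frame)
{
    (void)frame;
    frames_done++;
}

/* Consume one captured fragment of arbitrary length.
 * Returns the number of complete frames produced by this call. */
int fragment_push(const sample_t *speech, int nowlen)
{
    int before = frames_done;
    int pos = 0;

    assert(speech != NULL && nowlen >= 0);
    while (pos < nowlen) {
        /* Copy as many samples as still fit into the window. */
        int n = FRAME_SIZE - win_fill;
        if (n > nowlen - pos) n = nowlen - pos;
        memcpy(win + win_fill, speech + pos, (size_t)n * sizeof(sample_t));
        win_fill += n;
        pos += n;

        if (win_fill == FRAME_SIZE) {
            process_frame(win);
            /* Keep the overlapping tail as the head of the next window. */
            memmove(win, win + FRAME_SHIFT,
                    (size_t)(FRAME_SIZE - FRAME_SHIFT) * sizeof(sample_t));
            win_fill = FRAME_SIZE - FRAME_SHIFT;
        }
    }
    return frames_done - before;
}

/* Total frames processed since the last reset. */
int fragment_frames_done(void)
{
    return frames_done;
}
```

With these assumed sizes, a first fragment of 300 samples yields no frame because the window is not yet full, and a second fragment of 300 samples completes two frames; the remaining samples stay in the window for the following call.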

Note about CMN: With acoustic models trained with CMN, Julius applies CMN to the input. For file input, the mean of the whole sentence is computed and subtracted. In on-the-fly decoding, CMN is performed using the cepstral mean of the last 5 seconds of input (excluding rejected inputs). This was the behavior in versions earlier than 3.5; 3.5.1 now applies MAP-CMN in on-the-fly decoding, using the cepstrum of the last 5 seconds as the initial mean. The initial cepstral mean at startup can be given by the option "-cmnload", and updates of the initial cepstral mean at each input can be disabled with "-cmnnoupdate". The latter option is useful for always using a static global cepstral mean as the initial mean for each input.
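As an illustration of the MAP-CMN idea, the sketch below uses a common formulation in which the running mean is a weighted blend of an initial mean and the frames seen so far in the current input. The exact formula and weight used by Julius are not stated on this page, so the structure MapCmn, the functions mapcmn_start() and mapcmn_apply(), the dimension, and the weight parameter are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define DIM 3  /* cepstral dimension; kept small for illustration */

/* Hypothetical state for MAP-CMN over one input. */
typedef struct {
    float init_mean[DIM]; /* initial (prior) mean, e.g. from earlier inputs */
    float sum[DIM];       /* running sum of cepstra in the current input */
    int   nframes;        /* frames seen in the current input */
    float weight;         /* prior weight, in frames (assumed tunable) */
} MapCmn;

/* Start a new input with the given initial mean and prior weight. */
void mapcmn_start(MapCmn *c, const float *init_mean, float weight)
{
    int d;
    assert(c != NULL && init_mean != NULL && weight >= 0.0f);
    for (d = 0; d < DIM; d++) {
        c->init_mean[d] = init_mean[d];
        c->sum[d] = 0.0f;
    }
    c->nframes = 0;
    c->weight = weight;
}

/* Normalize one frame in place:
 * mean = (weight * init_mean + sum of frames so far) / (weight + nframes) */
void mapcmn_apply(MapCmn *c, float *vec)
{
    int d;
    assert(c != NULL && vec != NULL);
    c->nframes++;
    for (d = 0; d < DIM; d++) {
        float mean;
        c->sum[d] += vec[d];
        mean = (c->weight * c->init_mean[d] + c->sum[d])
             / (c->weight + (float)c->nframes);
        vec[d] -= mean;
    }
}
```

Under this formulation the initial mean dominates for the first few frames, and the estimate moves toward the mean of the current input as more frames arrive. A large weight approximates a fixed global mean.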

The primary functions in this file are:

RealTimeInit() - initialization at application startup
RealTimePipeLinePrepare() - initialization before each input
RealTimePipeLine() - callback for on-the-fly 1st pass decoding
RealTimeResume() - recognition resume procedure for short-pause segmentation.
RealTimeParam() - finalize the on-the-fly 1st pass when input ends.
RealTimeCMNUpdate() - update CMN data for next input
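The order in which these functions are called can be sketched with stand-in stubs. Every name in the code is hypothetical; each stub only records that the documented function named in its comment would be called at that point. The 2nd pass, RealTimeResume() and RealTimeTerminate() are left out, and the order follows the list above rather than the actual body of main_recognition_loop().

```c
#include <assert.h>
#include <string.h>

/* Every name below is hypothetical; each stub stands in for the documented
 * function named in its comment and only records that it was called. */
static char call_log[64];
static size_t log_len;

/* Append one step marker to the log, never overflowing the buffer. */
static void log_step(char step)
{
    if (log_len + 1 < sizeof(call_log)) {
        call_log[log_len++] = step;
        call_log[log_len] = '\0';
    }
}

static void demo_init(void)     { log_step('I'); } /* RealTimeInit() */
static void demo_prepare(void)  { log_step('P'); } /* RealTimePipeLinePrepare() */
static void demo_pipeline(void) { log_step('L'); } /* RealTimePipeLine(), per fragment */
static void demo_finalize(void) { log_step('F'); } /* RealTimeParam() */
static void demo_cmn(void)      { log_step('U'); } /* RealTimeCMNUpdate() */

/* Run a mock session and return the recorded call order. */
const char *demo_session(int n_inputs, int fragments_per_input)
{
    int i, k;

    assert(n_inputs >= 0 && fragments_per_input >= 0);
    memset(call_log, 0, sizeof(call_log));
    log_len = 0;

    demo_init();                      /* once at application startup */
    for (i = 0; i < n_inputs; i++) {
        demo_prepare();               /* before each input */
        for (k = 0; k < fragments_per_input; k++)
            demo_pipeline();          /* once per captured fragment */
        demo_finalize();              /* when the input ends */
        demo_cmn();                   /* prepare CMN for the next input */
    }
    return call_log;
}
```

Initialization happens once, while the prepare, pipeline, finalize and CMN update steps repeat for every input.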

Revision: 1.12

Definition in file realtime-1stpass.c.

Function Documentation

static void init_param ( ) [static]

Prepare parameter vector holder to incrementally store the calculated MFCC vectors. This function will be called each time after a recognition ends and new input begins.

Definition at line 165 of file realtime-1stpass.c.

Referenced by RealTimePipeLinePrepare().

void RealTimeInit ( )

Initializations for on-the-fly 1st pass decoding (will be called once on startup)

Definition at line 207 of file realtime-1stpass.c.

Referenced by final_fusion().

void RealTimePipeLinePrepare ( )

Data preparation for on-the-fly 1st pass decoding (will be called at the start of each sentence input)

Definition at line 266 of file realtime-1stpass.c.

Referenced by main_recognition_loop().

int RealTimePipeLine ( SP16 * Speech, int nowlen )

Main function of on-the-fly 1st pass decoding.

This function is called each time new speech samples arrive, as a callback from the A/D-in routine. Once speech input begins, the captured speech is passed to this function one sample segment at a time. This continues until the A/D-in routine detects the end of speech or the input stream reaches its end.

This function performs feature vector extraction and beam decoding as the 1st pass of recognition simultaneously, in a frame-wise manner.

Parameters:

	Speech	[in] pointer to the speech sample segments
	nowlen	[in] length of above

Returns: -1 on error (tells the caller to terminate), 0 on success (the caller may call this function again for the next segment), or 1 when input segmentation is required at this point (in that case the caller stops input and goes to the 2nd pass).
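The return-value protocol can be illustrated with a mock capture loop. This is a sketch under stated assumptions: mock_capture(), demo_callback(), demo_reset(), the enum names and the 2500-sample threshold are all hypothetical and do not reproduce adin_go() or the actual segmentation logic. Only the meaning of the three return values is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

typedef short sample_t;  /* stand-in for SP16 */

/* Return codes as documented for RealTimePipeLine(); the names are hypothetical. */
enum { CB_ERROR = -1, CB_CONTINUE = 0, CB_SEGMENT = 1 };

typedef int (*fragment_cb)(sample_t *speech, int nowlen);

/* Hypothetical capture loop: feeds fragments of at most fraglen samples to
 * the callback and stops as soon as the callback returns a non-zero status.
 * Returns the last callback status; *consumed receives the samples delivered. */
int mock_capture(sample_t *input, int total, int fraglen,
                 fragment_cb cb, int *consumed)
{
    int pos = 0;
    int status = CB_CONTINUE;

    assert(input != NULL && cb != NULL && consumed != NULL && fraglen > 0);
    while (pos < total && status == CB_CONTINUE) {
        int n = (total - pos < fraglen) ? total - pos : fraglen;
        status = cb(input + pos, n);
        pos += n;
    }
    *consumed = pos;
    return status;
}

/* Samples seen by the example callback in the current input. */
static int seen = 0;

void demo_reset(void)
{
    seen = 0;
}

/* Example callback: requests segmentation after an arbitrary number of
 * samples. A real decoder would decide this from its own search state. */
int demo_callback(sample_t *speech, int nowlen)
{
    if (speech == NULL || nowlen < 0) return CB_ERROR;
    seen += nowlen;
    return (seen >= 2500) ? CB_SEGMENT : CB_CONTINUE;
}
```

Feeding 4000 samples in fragments of 1000 stops after the third fragment with status 1, so the caller would proceed to the 2nd pass at that point. Feeding only 2000 samples runs to the end of the input with status 0.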

Definition at line 351 of file realtime-1stpass.c.

Referenced by main_recognition_loop().

HTK_Param* RealTimeParam ( LOGPROB * backmax )

Finalize the 1st pass on-the-fly decoding.

Parameters:

	backmax	[out] pointer to store the maximum score of last frame.

Returns: newly allocated input parameter data for this input.

Definition at line 712 of file realtime-1stpass.c.

Referenced by main_recognition_loop().

void RealTimeCMNUpdate ( HTK_Param * param )

Update cepstral mean of CMN to prepare for the next input.

Parameters:

	param	[in] current input parameter

Definition at line 862 of file realtime-1stpass.c.

Referenced by main_recognition_loop().

void RealTimeTerminate ( )

Finalize the 1st pass on-the-fly decoding when terminated.

Definition at line 910 of file realtime-1stpass.c.

Referenced by main_recognition_loop().

Generated on Tue Dec 26 16:16:54 2006 for Julius by Doxygen 1.5.0

