Julius: libjulius/src/spsegment.c File Reference

In short-pause segmentation mode, Julius tries to find "pause frames" by watching the word hypotheses at each frame. Julius treats a word that consists only of a silence model as a "pause word", and judges an input frame to be a "pause frame" if any pause word has the maximum score at that frame. It segments the input when the duration of successive pause frames reaches a limit.
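
The duration check described above can be sketched as a small counter. This is a minimal illustration rather than the code in spsegment.c: the PauseTracker type, its fields and pause_tracker_step() are hypothetical names, and the caller is assumed to already know whether the best-scoring word at the current frame is a pause word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tracker; these names are illustrative, not Julius identifiers. */
typedef struct {
    int pause_run;  /* number of successive pause frames seen so far */
    int limit;      /* pause duration (in frames) that triggers segmentation */
} PauseTracker;

/* Feed one frame. best_is_pause tells whether the best-scoring word
 * hypothesis at this frame is a pause word. Returns true once the run
 * of successive pause frames has reached the limit. */
static bool pause_tracker_step(PauseTracker *t, bool best_is_pause)
{
    assert(t != NULL && t->limit > 0);
    if (best_is_pause) {
        t->pause_run++;
    } else {
        t->pause_run = 0;  /* a speech frame breaks the run */
    }
    return t->pause_run >= t->limit;
}
```

A single speech frame resets the counter, so only an uninterrupted run of pause frames can reach the limit.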

In normal short-pause segmentation (as of ver.3.x), the pause frames are not eliminated. The input is segmented at the frame where speech begins after the pause frames, and the next input is processed from the beginning of the pause frames. In other words, the detected pause region is processed twice: as end-of-segment silence in the former input segment and as beginning-of-segment silence in the latter input segment.

When SPSEGMENT_NAIST is defined, long pause regions are dropped from recognition. When the run of detected pause frames gets longer than a threshold, Julius segments the input at that point and skips the continuing pause frames until a speech frame arrives. The recognition process is kept in a special status while in the pause segment. This scheme works as a decoder-driven VAD.

Function Documentation

boolean is_sil	(	WORD_ID	w,
		RecogProcess *	r
	)

Check if the given word is a short-pause word.

Parameters:

	w	[in] word id
	r	[in] recognition process instance

Returns: TRUE if it is a short-pause word, FALSE if not.

Definition at line 98 of file spsegment.c.

Referenced by detect_end_of_segment().
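
As a rough illustration of the test "a word with only a silence model", the sketch below checks that a dictionary entry has exactly one model and that this model is the silence model. The DictWord type, is_pause_word() and the model name "sp" used in the usage note are assumptions made for this example; they do not reproduce the WORD_ID / RecogProcess data structures that is_sil() actually works on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical dictionary entry: a word is a sequence of model names. */
typedef struct {
    const char *const *models;  /* model names in pronunciation order */
    size_t num_models;          /* number of models in the word */
} DictWord;

/* Return true if the word consists of exactly one model and that model
 * is the given silence model. */
static bool is_pause_word(const DictWord *w, const char *silence_model)
{
    assert(w != NULL && silence_model != NULL);
    return w->num_models == 1
        && strcmp(w->models[0], silence_model) == 0;
}
```

With a silence model named "sp", a word whose only model is "sp" is a pause word, while a word with several models is not, even if one of them is "sp".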

void mfcc_copy_to_rest_and_shrink	(	MFCCCalc *	mfcc,
		int	start,
		int	end
	)

Split input parameter for segmentation.

Copy the remaining samples in param to rest_param, and shrink param in the mfcc instance. Frames [start...param->samplenum] are copied to rest_param, and frames [0...end] are left in param.

Parameters:

	mfcc	[i/o] MFCC calculation instance
	start	[in] copy start frame
	end	[in] original end frame

Definition at line 154 of file spsegment.c.

Referenced by finalize_segment().
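
The split can be pictured with a plain frame buffer. The sketch below is an illustration under stated assumptions, not the library code: FrameBuf and split_rest_and_shrink() are hypothetical, and both ranges are treated as half-open, [start, len) and [0, end), which is one reading of the description.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter buffer: len frames of dim floats each. */
typedef struct {
    float *data;
    int len;
    int dim;
} FrameBuf;

/* Copy frames [start, src->len) into a newly allocated buffer and
 * truncate src to frames [0, end). The caller frees rest.data.
 * On allocation failure the result has data == NULL and len == 0,
 * and src is left unchanged. */
static FrameBuf split_rest_and_shrink(FrameBuf *src, int start, int end)
{
    FrameBuf rest = { NULL, 0, 0 };
    assert(src != NULL && src->dim > 0);
    assert(0 <= start && start <= src->len);
    assert(0 <= end && end <= src->len);

    rest.dim = src->dim;
    int n = src->len - start;
    if (n > 0) {
        size_t bytes = (size_t)n * (size_t)src->dim * sizeof(float);
        rest.data = malloc(bytes);
        if (rest.data == NULL) {
            return rest;  /* allocation failed: leave src untouched */
        }
        memcpy(rest.data, src->data + (size_t)start * (size_t)src->dim, bytes);
        rest.len = n;
    }
    src->len = end;  /* shrink in place; the storage itself is kept */
    return rest;
}
```

When start is smaller than end, frames [start, end) appear in both buffers. This matches the normal short-pause segmentation behavior, in which the pause region is processed once at the end of one segment and again at the beginning of the next.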

void mfcc_shrink	(	MFCCCalc *	mfcc,
		int	p
	)

Shrink the parameter sequence.

Drop the frames that precede frame p and move frames [p..samplenum] to the beginning of the sequence, so that frame p becomes frame 0.

Parameters:

	mfcc	[i/o] MFCC Calculation instance
	p	[in] frame point to remain

Definition at line 194 of file spsegment.c.
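
An in-place shrink of this kind can be sketched with memmove(). The ParamSeq type and shrink_front() are hypothetical names for this example, and frames are assumed to be numbered from 0, so that the p frames 0..p-1 are dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter sequence: len frames of dim floats each. */
typedef struct {
    float *data;
    int len;
    int dim;
} ParamSeq;

/* Drop frames [0, p) and move frames [p, len) to the head of the buffer. */
static void shrink_front(ParamSeq *seq, int p)
{
    assert(seq != NULL && seq->dim > 0);
    assert(0 <= p && p <= seq->len);
    if (p == 0) {
        return;  /* nothing to drop */
    }
    int remaining = seq->len - p;
    /* Source and destination may overlap, so memmove is required. */
    memmove(seq->data,
            seq->data + (size_t)p * (size_t)seq->dim,
            (size_t)remaining * (size_t)seq->dim * sizeof(float));
    seq->len = remaining;
}
```

memmove() is used instead of memcpy() because the source and destination regions overlap whenever fewer than half of the frames are dropped.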

boolean detect_end_of_segment	(	RecogProcess *	r,
		int	time
	)

Speech end point detection.

Detect end-of-input from the duration of short-pause words when short-pause segmentation is enabled. When a pause word gets the maximum score for successive frames, those frames are treated as pause frames. When speech re-triggers, the current input is segmented at that point.

When SPSEGMENT_NAIST is defined, this function performs an extended version of short-pause segmentation called "decoder-based VAD". Before a speech trigger (r->pass1.after_trigger == FALSE), it tells the recognition functions not to generate the word trellis and to continue calculation. When a speech trigger is found (a non-pause word gets the maximum score), the input is rewound by a certain number of frames (r->config->successive.sp_margin) and the normal recognition process starts from the rewound frame (r->pass1.after_trigger = TRUE). When the pause frame duration reaches a limit (r->config->successive.sp_frame_duration), it terminates the search.

Parameters:

	r	[i/o] recognition process instance
	time	[in] current input frame

Returns: TRUE if end-of-input detected at this frame, FALSE if not.

Definition at line 262 of file spsegment.c.

Referenced by decode_proceed().
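
The decoder-based VAD behavior described above can be summarized as a two-state machine. The sketch below is a simplified illustration, not the implementation: the field names after_trigger, sp_margin and sp_frame_duration are borrowed from the description, while VadState, VadAction, pause_run, rewind_to and vad_step() are hypothetical. Trellis generation and the actual search are left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the caller should do after a frame (hypothetical). */
typedef enum {
    VAD_CONTINUE,          /* keep going in the current state */
    VAD_REWIND_AND_START,  /* speech triggered: rewind to rewind_to and start */
    VAD_END_OF_SEGMENT     /* pause lasted long enough: terminate the search */
} VadAction;

typedef struct {
    bool after_trigger;     /* false until a speech trigger is found */
    int pause_run;          /* successive pause frames since the last speech frame */
    int sp_margin;          /* frames to rewind at a speech trigger */
    int sp_frame_duration;  /* pause length that ends the segment */
    int rewind_to;          /* valid after VAD_REWIND_AND_START */
} VadState;

static VadAction vad_step(VadState *s, int time, bool best_is_pause)
{
    assert(s != NULL && time >= 0);
    assert(s->sp_margin >= 0 && s->sp_frame_duration > 0);

    if (!s->after_trigger) {
        if (best_is_pause) {
            return VAD_CONTINUE;  /* still waiting for speech */
        }
        /* A non-pause word has the best score: speech trigger. */
        s->after_trigger = true;
        s->pause_run = 0;
        s->rewind_to = (time > s->sp_margin) ? time - s->sp_margin : 0;
        return VAD_REWIND_AND_START;
    }

    if (best_is_pause) {
        s->pause_run++;
        if (s->pause_run >= s->sp_frame_duration) {
            s->pause_run = 0;
            /* The caller is expected to reset the state for the next segment. */
            return VAD_END_OF_SEGMENT;
        }
    } else {
        s->pause_run = 0;  /* speech resumed inside the segment */
    }
    return VAD_CONTINUE;
}
```

Clamping rewind_to at 0 is a choice made for this sketch, so that a trigger near the start of the input does not rewind past the first frame.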

void finalize_segment ( Recog * recog )

Finalize the first pass for successive decoding.

When successive decoding mode is enabled, this function is called just after finalize_1st_pass() to finish the beam search of the last segment. The beginning and ending words for the 2nd pass are set according to the 1st pass result. Then the current input is shrunk to the segmented length and the unprocessed region is copied to rest_param for the next decoding.

Parameters:

	recog	[in] engine instance

Definition at line 632 of file spsegment.c.

Referenced by decode_end(), and decode_end_segmented().

boolean spsegment_need_restart	(	Recog *	recog,
		int *	rf_ret,
		boolean *	repro_ret
	)

Check if rewind and restart of recognition is needed.

This function checks if an instance requires rewinding of input samples, and if recognition re-processing is needed after rewinding.

Parameters:

	recog	[in] engine instance
	rf_ret	[out] length of frame to rewind
	repro_ret	[out] TRUE if re-process is required after rewinding

Returns: TRUE if rewinding is required, or FALSE if not.

Definition at line 836 of file spsegment.c.

Referenced by get_back_trellis(), and RealTimePipeLine().

void spsegment_restart_mfccs	(	Recog *	recog,
		int	rewind_frame,
		boolean	reprocess
	)

Execute rewinding.

This function sets the restart point for the following processing, and shrinks the parameters for the rewound part. The restart point is 0 (the beginning of the rest samples) when recognition is restarted, or the current position moved back by the specified number of rewind frames when it is not.

Parameters:

	recog	[i/o] engine instance
	rewind_frame	[in] frame length to rewind
	reprocess	[in] TRUE if re-processing recognition is required for the following processing

Definition at line 907 of file spsegment.c.

Referenced by get_back_trellis(), and RealTimePipeLine().
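
The choice of restart point can be written as a one-line rule. The sketch below follows only the description above and is an assumption about the arithmetic, not a reproduction of the function: restart_frame() and its current_frame argument are hypothetical, and the shrinking of the parameter buffers is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Return the frame index from which processing should resume.
 * reprocess == true : recognition restarts at frame 0 of the rest samples.
 * reprocess == false: go back rewind_frame frames from the current frame. */
static int restart_frame(int current_frame, int rewind_frame, bool reprocess)
{
    assert(rewind_frame >= 0 && rewind_frame <= current_frame);
    return reprocess ? 0 : current_frame - rewind_frame;
}
```

For example, with the current frame at 120 and a rewind length of 30, processing would resume at frame 90 without reprocessing, or at frame 0 of the shrunk input with reprocessing.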

Functions
boolean	is_sil (WORD_ID w, RecogProcess *r)
	Check if the given word is a short-pause word.
void	mfcc_copy_to_rest_and_shrink (MFCCCalc *mfcc, int start, int end)
	Split input parameter for segmentation.
void	mfcc_shrink (MFCCCalc *mfcc, int p)
	Shrink the parameter sequence.
boolean	detect_end_of_segment (RecogProcess *r, int time)
	Speech end point detection.
void	finalize_segment (Recog *recog)
	Finalize the first pass for successive decoding.
boolean	spsegment_need_restart (Recog *recog, int *rf_ret, boolean *repro_ret)
	Check if rewind and restart of recognition is needed.
void	spsegment_restart_mfccs (Recog *recog, int rewind_frame, boolean reprocess)
	Execute rewinding.