libsent/include/sent/speech.h File Reference

Miscellaneous definitions for speech input processing.

#include <sent/adin.h>

Defines

#define MAXSEQNUM   150
 Maximum number of words in an input.
#define MAXSPEECHLEN   320000
 Maximum length of an input in samples.
#define INPUT_DELAY_SEC   8
 Maximum length of input delay in seconds.
#define OUTPROB_CACHE_PERIOD   100
 Expansion period in frames for output probability cache.
#define period2freq(A)   (10000000.0 / (float)(A))
 Macro to convert smpPeriod (100nsec unit) to frequency (Hz).
#define freq2period(A)   (10000000.0 / (float)(A))
 Macro to convert sampling frequency (Hz) to smpPeriod (100nsec unit).
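
As a quick check of the two conversion macros: smpPeriod is expressed in 100 ns units, so both directions divide 10,000,000 by their argument. The sketch below copies the macro definitions from this page; the helper round_trip_hz() is hypothetical and exists only for illustration.

```c
#include <assert.h>

/* Copied from the definitions listed on this page. */
#define period2freq(A)   (10000000.0 / (float)(A))
#define freq2period(A)   (10000000.0 / (float)(A))

/* Hypothetical helper (not part of speech.h):
   convert Hz -> smpPeriod -> Hz to show the macros are inverses. */
static double round_trip_hz(int sfreq)
{
    assert(sfreq > 0);
    return period2freq(freq2period(sfreq));
}
```

For 16 kHz audio, freq2period(16000) evaluates to 625, that is, 62.5 microseconds per sample.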

Functions

int wrsamp (int fd, SP16 *buf, int len)
 Write waveform data in big endian to a file descriptor.
FILE * wrwav_open (char *filename, int sfreq)
 Open/create a WAVE file and write header.
boolean wrwav_data (FILE *fp, SP16 *buf, int len)
 Write speech samples.
boolean wrwav_close (FILE *fp)
 Close the file.
int strip_zero (SP16 a[], int len)
 Strip zero samples from speech data.


Detailed Description

Miscellaneous definitions for speech input processing.

This file contains miscellaneous definitions for speech input processing. Several limits on input speech length are also defined here.

Please refer to adin.h for speech capturing, mfcc.h for MFCC parameter extraction, and htk_param.h for storing the parameter vectors.

Author:
Akinobu LEE
Date:
Sat Feb 12 11:16:41 2005
Revision:
1.1.1.1

Definition in file speech.h.


Define Documentation

#define MAXSEQNUM   150

Maximum number of words in an input.

This value limits the number of words in one utterance. If the number of words exceeds this value, Julius reports an error, so set it large enough for the longest input you expect.

Definition at line 50 of file speech.h.

Referenced by confout_audio(), cpy_node(), and wb_init().

#define MAXSPEECHLEN   320000

Maximum length of an input in samples.

This value limits the length of a single utterance. If an input exceeds this length, Julius stops the input at that point and recognizes it, discarding the rest until the end of speech (a long silence) arrives.

The default value is 320000, which allows an input of at most 20 seconds at 16 kHz sampling. Setting a smaller value reduces memory usage.

Definition at line 65 of file speech.h.

Referenced by adin_cut(), adin_setup_param(), adin_thread_create(), confout_audio(), and RealTimeInit().
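
The arithmetic behind the 20-second figure can be sketched as follows. The helper names are hypothetical, and the byte count assumes that SP16 is a 16-bit sample type (int16_t stands in for it here), which this page does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXSPEECHLEN 320000   /* value documented on this page */

/* Hypothetical helper: longest input, in seconds,
   that fits in MAXSPEECHLEN samples at the given rate. */
static double max_input_seconds(int sfreq)
{
    assert(sfreq > 0);
    return (double)MAXSPEECHLEN / (double)sfreq;
}

/* Hypothetical helper: bytes needed for a full input buffer,
   ASSUMING 16-bit samples (int16_t stands in for SP16). */
static size_t max_input_bytes(void)
{
    return (size_t)MAXSPEECHLEN * sizeof(int16_t);
}
```

At 16000 Hz this gives 20 seconds, and under the 16-bit assumption a full buffer occupies 640000 bytes.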

#define INPUT_DELAY_SEC   8

Maximum length of input delay in seconds.

This value defines the maximum delay allowed in live speech recognition on slow machines. If the input falls behind by more than this duration, the overflowing samples are dropped. This value is used by the callback-based A/D-in, namely the portaudio interface.

The default value is 8 seconds. Setting a smaller value reduces memory usage, but increases the risk of overflow on slow machines.

Definition at line 79 of file speech.h.

Referenced by adin_mic_standby().
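
To make the trade-off concrete, the following sketch computes how many samples such a delay buffer would hold and how many incoming samples would be dropped once it is full. Both helpers are hypothetical; how Julius actually sizes and manages its buffer is an assumption here, not something this page specifies.

```c
#include <assert.h>

#define INPUT_DELAY_SEC 8   /* value documented on this page */

/* Hypothetical helper: capacity, in samples, of a buffer
   holding INPUT_DELAY_SEC seconds of audio. */
static long delay_capacity(int sfreq)
{
    assert(sfreq > 0);
    return (long)INPUT_DELAY_SEC * sfreq;
}

/* Hypothetical helper: number of incoming samples that do not fit
   when `pending` samples are already waiting in the buffer. */
static long dropped_samples(long pending, long incoming, int sfreq)
{
    long room = delay_capacity(sfreq) - pending;
    if (room < 0) room = 0;
    return incoming > room ? incoming - room : 0;
}
```

At 16 kHz the capacity is 128000 samples, so halving INPUT_DELAY_SEC halves the buffer but also halves the backlog a slow machine can absorb before samples are lost.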

#define OUTPROB_CACHE_PERIOD   100

Expansion period in frames for output probability cache.

During recognition, the 1st pass stores all the output probabilities of HMM states for every incoming input frame, to speed up the re-computation of acoustic likelihoods in the 2nd pass. In live input mode, this output probability cache is re-allocated dynamically as the input becomes longer.

This value specifies the re-allocation period in frames. The probability cache is expanded each time the input advances by this many frames.

A smaller value may improve memory efficiency, but too small a value causes frequent memory re-allocation and can slow down recognition.

Definition at line 98 of file speech.h.

Referenced by calc_tied_mix_extend() and outprob_cache_extend().
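
The period-based expansion described above can be illustrated with a minimal sketch. The ProbCache structure and cache_ensure_frame() are hypothetical and are not the layout used by outprob_cache_extend(); they only show a cache growing in whole multiples of OUTPROB_CACHE_PERIOD frames.

```c
#include <assert.h>
#include <stdlib.h>

#define OUTPROB_CACHE_PERIOD 100   /* value documented on this page */

/* Hypothetical cache: one row of `nstates` probabilities per frame. */
typedef struct {
    float *data;            /* allocated_frames * nstates values */
    int nstates;            /* number of HMM states per frame */
    int allocated_frames;   /* always a multiple of the period */
} ProbCache;

/* Make frame index t addressable, growing by whole periods.
   Returns 0 on success, -1 if memory could not be allocated. */
static int cache_ensure_frame(ProbCache *c, int t)
{
    assert(c != NULL && t >= 0 && c->nstates > 0);
    if (t < c->allocated_frames) return 0;   /* already large enough */

    int frames = (t / OUTPROB_CACHE_PERIOD + 1) * OUTPROB_CACHE_PERIOD;
    float *p = realloc(c->data,
                       (size_t)frames * (size_t)c->nstates * sizeof *p);
    if (p == NULL) return -1;                /* old block is still valid */

    c->data = p;
    c->allocated_frames = frames;
    return 0;
}
```

With a period of 100, frames 0 to 99 trigger a single allocation and frame 100 triggers the next one, which is the memory-versus-reallocation trade-off the text describes.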


Function Documentation

int wrsamp ( int  fd,
SP16 *  buf,
int  len 
)

Write waveform data in big endian to a file descriptor.

Parameters:
fd [in] file descriptor
buf [in] array of speech data
len [in] length of above
Returns:
number of bytes written, -1 on error.

Definition at line 37 of file wrsamp.c.
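
Writing "in big endian" means each 16-bit sample is emitted most significant byte first, whatever the byte order of the host. The sketch below shows one way to do that conversion into a byte buffer; pack_be16() is a hypothetical helper, int16_t stands in for SP16, and whether wrsamp() swaps in place or through a temporary buffer is not stated on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store `len` 16-bit samples into `out`
   (which must hold 2 * len bytes), most significant byte first,
   independent of host byte order. Returns the number of bytes stored. */
static size_t pack_be16(const int16_t *buf, size_t len, unsigned char *out)
{
    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        uint16_t v = (uint16_t)buf[i];             /* keep the bit pattern */
        out[2 * i]     = (unsigned char)(v >> 8);  /* high byte first */
        out[2 * i + 1] = (unsigned char)(v & 0xFF);
    }
    return 2 * len;
}
```

The resulting byte buffer could then be handed to a single write call on the file descriptor.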

FILE* wrwav_open ( char *  filename,
int  sfreq 
)

Open/create a WAVE file and write header.

Open or create a new WAV file and prepare it for later data writing. The frame length written here is a dummy value and will be overwritten when the file is closed by wrwav_close().

Parameters:
filename [in] file name
sfreq [in] sampling frequency of the data you are going to write
Returns:
the file pointer.

Definition at line 74 of file wrwav.c.

Referenced by record_sample_open().

boolean wrwav_data ( FILE *  fp,
SP16 *  buf,
int  len 
)

Write speech samples.

Parameters:
fp [in] file pointer
buf [in] speech data to be written
len [in] length of above
Returns:
TRUE on success, FALSE on failure.

Definition at line 127 of file wrwav.c.

Referenced by record_sample_write().

boolean wrwav_close ( FILE *  fp  ) 

Close the file.

The frame length in the header is overwritten with the actual value before the file is closed.

Parameters:
fp [in] file pointer to close, previously opened by wrwav_open().
Returns:
TRUE on success, FALSE on failure.

Definition at line 147 of file wrwav.c.

Referenced by record_sample_close().
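
The "dummy length now, real length on close" pattern used by wrwav_open() and wrwav_close() can be sketched with the canonical 44-byte header for 16-bit mono PCM. All function names below are hypothetical, and the exact header that wrwav.c writes is an assumption; only the placeholder-then-patch idea comes from this page.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Write a 32-bit value in little-endian order, as RIFF requires. */
static int put_le32(FILE *fp, uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v & 0xFF),         (unsigned char)((v >> 8) & 0xFF),
        (unsigned char)((v >> 16) & 0xFF), (unsigned char)((v >> 24) & 0xFF)
    };
    return fwrite(b, 1, 4, fp) == 4;
}

/* Write a 16-bit value in little-endian order. */
static int put_le16(FILE *fp, uint16_t v)
{
    unsigned char b[2] = { (unsigned char)(v & 0xFF), (unsigned char)(v >> 8) };
    return fwrite(b, 1, 2, fp) == 2;
}

/* Hypothetical: write a 44-byte header with zero placeholder sizes. */
static int sketch_write_header(FILE *fp, uint32_t sfreq)
{
    assert(fp != NULL);
    return fwrite("RIFF", 1, 4, fp) == 4 && put_le32(fp, 0)   /* placeholder */
        && fwrite("WAVEfmt ", 1, 8, fp) == 8 && put_le32(fp, 16)
        && put_le16(fp, 1) && put_le16(fp, 1)          /* PCM, 1 channel */
        && put_le32(fp, sfreq) && put_le32(fp, sfreq * 2) /* rate, bytes/s */
        && put_le16(fp, 2) && put_le16(fp, 16)         /* align, bits */
        && fwrite("data", 1, 4, fp) == 4 && put_le32(fp, 0);  /* placeholder */
}

/* Hypothetical: patch both size fields once the data size is known. */
static int sketch_patch_sizes(FILE *fp, uint32_t data_bytes)
{
    assert(fp != NULL);
    return fseek(fp, 4, SEEK_SET) == 0 && put_le32(fp, 36 + data_bytes)
        && fseek(fp, 40, SEEK_SET) == 0 && put_le32(fp, data_bytes)
        && fseek(fp, 0, SEEK_END) == 0;
}
```

A caller would write the header, append sample data while counting bytes, then call sketch_patch_sizes() just before closing the file. This requires a seekable stream, which is one reason such a writer takes a FILE pointer to a regular file.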

int strip_zero ( SP16  a[],
int  len 
)

Strip zero samples from speech data.

Parameters:
a [I/O] speech data
len [in] length of above
Returns:
new length after stripping.

Definition at line 41 of file strip.c.

Referenced by adin_cut().


Generated on Tue Dec 18 16:01:37 2007 for Julius by  doxygen 1.5.4