libsent/src/ngram/ngram_read_bin.c File Reference

Read binary format N-gram file.

#include <sent/stddefs.h>
#include <sent/ngram2.h>


Defines

#define rdn(A, B, C, D)   if (rdnfunc(A,B,C,D) == FALSE) return FALSE
#define rdn_wordid(A, B, C, D)   if (rdn_wordid_func(A,B,C,D) == FALSE) return FALSE

Functions

static boolean rdnfunc (FILE *fp, void *buf, size_t unitbyte, int unitnum)
 Binary read function with byte swap.
static boolean check_header (FILE *fp)
 Check header to see whether the version matches.
static boolean ngram_read_bin_v5 (FILE *fp, NGRAM_INFO *ndata)
static boolean ngram_read_bin_compat (FILE *fp, NGRAM_INFO *ndata, int *retry_ret)
boolean ngram_read_bin (FILE *fp, NGRAM_INFO *ndata)
 Read an N-gram binary file and store it in ndata.

Variables

static int file_version
 N-gram format version of the file.
static boolean need_swap
 TRUE if byte swapping is needed.


Detailed Description

Read binary format N-gram file.

In the binary format, both the 2-gram and the reverse 3-gram are stored together in one file. This binary format is not compatible with other binary language model formats.

From 3.5, the internal format of the binary N-gram changed to use machine-dependent natural byte order (previously fixed to big endian), a 24-bit index, and 2-gram backoff compression. As a result, binary N-gram files generated by mkbingram of 3.5 and later will not work on 3.4.2 and earlier versions.

Version 3.5 keeps full upward and cross-machine compatibility. Old binary N-gram files can still be read directly, in which case the conversion to the 24-bit index is performed just after the model has been read. Byte order is also determined from the header information, so binary N-gram files can still be shared among different machines.
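
To make the 24-bit index idea concrete, here is a minimal sketch of one way to hold a 24-bit index as an 8-bit upper part plus a 16-bit lower part. The layout and the names idx24_split and idx24_join are assumptions made for illustration; they are not taken from ngram_read_bin.c, and the actual representation in the library may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split an index that fits in 24 bits into an
   8-bit upper part and a 16-bit lower part. */
static void idx24_split(uint32_t id, uint8_t *upper, uint16_t *lower)
{
  assert(id <= 0xFFFFFFu);            /* must fit in 24 bits */
  *upper = (uint8_t)(id >> 16);       /* bits 16..23 */
  *lower = (uint16_t)(id & 0xFFFFu);  /* bits 0..15 */
}

/* Hypothetical helper: rebuild the index from its two parts. */
static uint32_t idx24_join(uint8_t upper, uint16_t lower)
{
  return ((uint32_t)upper << 16) | (uint32_t)lower;
}
```

Storing the two parts in separate arrays, rather than in one padded structure, is what would save space compared with a plain 32-bit index.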

Author:
Akinobu LEE
Date:
Wed Feb 16 17:12:08 2005
Revision:
1.1.1.1

Definition in file ngram_read_bin.c.


Function Documentation

static boolean rdnfunc ( FILE *  fp,
void *  buf,
size_t  unitbyte,
int  unitnum 
) [static]

Binary read function with byte swap.

Parameters:
fp [in] file pointer
buf [out] data buffer
unitbyte [in] unit size in bytes
unitnum [in] number of units to read

Definition at line 86 of file ngram_read_bin.c.
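
The following is a minimal sketch of a binary read with optional byte swap, in the spirit of the description above. It is not the source of rdnfunc: the names read_units and swap_units are hypothetical, and the swap flag is passed as a parameter here, whereas this file keeps it in the static variable need_swap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: reverse the bytes of each unit in place. */
static void swap_units(void *buf, size_t unitbyte, size_t unitnum)
{
  unsigned char *p = (unsigned char *)buf;
  size_t i, j;

  for (i = 0; i < unitnum; i++, p += unitbyte) {
    for (j = 0; j < unitbyte / 2; j++) {
      unsigned char t = p[j];
      p[j] = p[unitbyte - 1 - j];
      p[unitbyte - 1 - j] = t;
    }
  }
}

/* Read unitnum units of unitbyte bytes each into buf, swapping the byte
   order of every unit when swap is non-zero.
   Returns 1 on success, 0 on a short read. */
static int read_units(FILE *fp, void *buf, size_t unitbyte, int unitnum, int swap)
{
  assert(fp != NULL && buf != NULL && unitnum >= 0);
  if (fread(buf, unitbyte, (size_t)unitnum, fp) != (size_t)unitnum) {
    return 0;
  }
  if (swap && unitbyte > 1) {
    swap_units(buf, unitbyte, (size_t)unitnum);
  }
  return 1;
}
```

A caller would typically check the return value after every read and stop on the first failure, which is the pattern the rdn macro in this file wraps.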

static boolean check_header ( FILE *  fp  )  [static]

Check header to see whether the version matches.

Parameters:
fp [in] file pointer

Definition at line 141 of file ngram_read_bin.c.

Referenced by ngram_read_bin().

boolean ngram_read_bin ( FILE *  fp,
NGRAM_INFO *  ndata 
)

Read an N-gram binary file and store it in ndata.

Parameters:
fp [in] file pointer
ndata [out] N-gram data to store the read data
Returns:
TRUE on success, FALSE on failure.

Definition at line 604 of file ngram_read_bin.c.

Referenced by init_ngram_bin().


Generated on Tue Dec 18 16:01:40 2007 for Julius by  doxygen 1.5.4