Vietnamese dictionaries for mobile phones

The first algorithm to use the Lempel-Ziv substitutionalcompression schemes, proposed in 1977. LZ77 compression

keeps track of the last n bytes of data seen, and when a

phrase is encountered that has already been seen, it outputs a

pair of values corresponding to the position of the phrase in

the previously-seen buffer of data, and the length of the

phrase. In effect the compressor moves a fixed-size "window"

over the data (generally referred to as a "sliding window"),

with the position part of the (position, length) pair

referring to the position of the phrase within the window.

The most commonly used algorithms are derived from the

LZSS scheme described by James Storer and Thomas Szymanski

in 1982. In this the compressor maintains a window of size N

bytes and a "lookahead buffer", the contents of which it tries

to find a match for in the window:

while (lookAheadBuffer not empty)

else

}

Decompression is simple and fast: whenever a (POSITION,

LENGTH) pair is encountered, go to that POSITION in the window

and copy LENGTH bytes to the output.

Sliding-window-based schemes can be simplified by numbering

the input text characters mod N, in effect creating a circular

buffer. The sliding window approach automatically creates the

LRU effect which must be done explicitly in LZ78 schemes.

Variants of this method apply additional compression to the

output of the LZSS compressor, which include a simple

variable-length code (LZB), dynamic Huffman coding

(LZH), and Shannon-Fano coding (ZIP 1.x), all of which

result in a certain degree of improvement over the basic

scheme, especially when the data are rather random and the

LZSS compressor has little effect. An algorithm was developed

which combines the ideas behind LZ77 and LZ78 to produce a

hybrid called LZFG. LZFG uses the standard sliding window,

but stores the data in a modified trie data structure and

produces as output the position of the text in the trie.

Since LZFG only inserts complete *phrases* into the

dictionary, it should run faster than other LZ77-based

compressors.

All popular archivers (arj, lha, zip, zoo) are

variations on LZ77.

[comp.compression FAQ].

(1995-04-07)