Compression of Structured Data

Structured data can be compressed very densely, and it can be searched directly in its compressed form.

Densifier™ data compression software for structured data: much better compression density

 

Characteristics and typical values

Compression density (percentage of the original size removed): 2-4 times better than ZIP for optimized formats

·        PostScript: 80%

·        PDF (Adobe Portable Document Format): 50%

·        Texts: 80%

·        HTML / XML: 70%

·        Java / binary data: 70%

Compression speed (Pentium IV, 3 GHz): 3 MB/second, similar to ZIP

Decompression speed (Pentium IV, 3 GHz): 5 MB/second, similar to ZIP

Memory requirements: at least 20 MB (more memory gives better compression density)

Operating systems: MS-Windows NT, 2k, XP, 2003, Unix (Linux, Solaris); others on request

Technology: not a flat list that maps words to bit codes, but graphs (state-transition graphs, finite state transducers)

Special technological strengths

·        Support for structural descriptions and nested structures, for example: URLs mostly have the form http://www.name.com

·        Correct modeling of the beginning, the inner parts, and the ending of structures, e.g.: after <center>, a closing </center> HTML tag is expected, with natural-language text in between.

·        Intelligent recognition when new names are being introduced, e.g. through typedef in C/C++ or def in PostScript

·        Automatic learning of data structures

·        Graph structures that can be exchanged dynamically at runtime, each of which packs one specialized data format optimally
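The <center> ... </center> point above can be illustrated with a minimal sketch. It is not Densifier's algorithm: it assumes a hypothetical one-character code (CLOSE_MARK), handles only attribute-free tags, and requires that the input contains no NUL character. Because the innermost open tag is always known from a stack, a correctly nested closing tag carries no new information and can be replaced by that single code.

```python
import re

CLOSE_MARK = "\x00"  # hypothetical code meaning "close the innermost open tag"
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)>")
# Same tag pattern plus the NUL mark, used when decoding.
TOKEN = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)>|\x00")


def encode(html: str) -> str:
    """Replace every correctly nested closing tag with CLOSE_MARK."""
    assert CLOSE_MARK not in html, "input must not contain the mark itself"
    out, stack, pos = [], [], 0
    for m in TAG.finditer(html):
        out.append(html[pos:m.start()])
        pos = m.end()
        tag, name = m.group(0), m.group(1)
        if tag.startswith("</"):
            if stack and stack[-1] == name:
                stack.pop()
                out.append(CLOSE_MARK)  # expected close: one symbol is enough
            else:
                out.append(tag)         # unexpected close: keep it verbatim
        else:
            stack.append(name)
            out.append(tag)
    out.append(html[pos:])
    return "".join(out)


def decode(data: str) -> str:
    """Rebuild the original text by replaying the same tag stack."""
    out, stack, pos = [], [], 0
    for m in TOKEN.finditer(data):
        out.append(data[pos:m.start()])
        pos = m.end()
        tok = m.group(0)
        if tok == CLOSE_MARK:
            out.append("</" + stack.pop() + ">")
        elif tok.startswith("</"):
            out.append(tok)
        else:
            stack.append(m.group(1))
            out.append(tok)
    out.append(data[pos:])
    return "".join(out)


page = "<center>Hello <b>world</b></center>"
packed = encode(page)
restored = decode(packed)
```

Here both closing tags collapse to one character each, and decode(packed) returns the original page. A real structure model would also predict what may follow an opening tag; this sketch covers only the closing side.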

Characteristics / advantages:

·        efficient search in stored data possible

·        easy integration in programs, easy compression of structures and objects in programming languages

·        efficient operations on compressed data: sometimes even faster than without compression, because compressed "tokens" that stand for larger independent data units are used directly, so less data needs to be processed

·        faster random access to single data sets and even the smallest information units

·        faster decompression, as only the information actually needed is decompressed

·        Protection against spying out data:

·        Practically: Reading the data is difficult (an attacker first has to understand the data compression method and usually has to reimplement it)

·        Legally: Decompression by third parties violates data compression patents (either we develop an individual method for you or you use the protection from our patents)

·        Data compression is a recommended first step before data encryption

·        Existing applications: MS Office since 1997, Adobe PDF files, Corel Draw, and many databases use compression because of the advantages above
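The search and random-access advantages listed above can be sketched with a simple word-level token store. This is an illustration under stated assumptions, not the Densifier format: the class name TokenStore and its methods are hypothetical, tokens are plain integers rather than entropy-coded bit strings, and whitespace between words is normalized to single spaces.

```python
from array import array


class TokenStore:
    """Stores each record as a sequence of integer tokens, one per word."""

    def __init__(self):
        self.word_to_id = {}   # word -> token id
        self.id_to_word = []   # token id -> word
        self.records = []      # one array of token ids per record

    def add(self, text):
        """Tokenize a record, store it, and return its index."""
        ids = array("I")
        for word in text.split():
            tok = self.word_to_id.get(word)
            if tok is None:
                tok = len(self.id_to_word)
                self.word_to_id[word] = tok
                self.id_to_word.append(word)
            ids.append(tok)
        self.records.append(ids)
        return len(self.records) - 1

    def search(self, word):
        """Return indexes of records containing word, without decoding them."""
        tok = self.word_to_id.get(word)
        if tok is None:
            return []          # word was never stored, so nothing can match
        return [i for i, ids in enumerate(self.records) if tok in ids]

    def get(self, index):
        """Decode one record only (random access)."""
        return " ".join(self.id_to_word[t] for t in self.records[index])


store = TokenStore()
store.add("the quick brown fox")
store.add("the lazy dog")
store.add("quick quick dog")
hits = store.search("quick")
```

The query is translated to a token once, and the scan then compares integers instead of strings. A word that is missing from the dictionary is rejected without touching any record, and get() decodes a single record rather than the whole store.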

Areas of application:

·        Data transmission over networks: Data compression techniques can increase the network data throughput several times over

·        Protection against hackers/crackers: Does your application contain company-specific know-how, e.g. data records, graphical or lexical data? The more complex the employed compression method, the more difficult it will be to read data illegally. With textual and structured data, in particular with lexical data, processing can even be faster than without compression.

·        Access and user identification: Are personal or other data entered into your program that should be protected from being seen by other users? Data compression, complemented by encryption, solves this problem. By adding elaborate data compression and checksum algorithms, you can make the life of crackers as hard as you choose.

·        Secret parameters / functions / algorithms: Have you developed your own algorithms? With data compression you can protect your own functions/algorithms (e.g. for checking copy protection) as well as parameters/input data for an algorithm from being detected when debugged or disassembled.

·        Company-internal data worth protecting: Do you offer users additional benefits to which they obtain access after paying, e.g. an additional dictionary of technical terms? With data compression, potentially followed by an encryption component, you can protect your data effectively.

·        When encryption is not an alternative: Encryption requires that the key is not known to an attacker. To access local data, e.g. from a synonym dictionary or a grammar checker, a password must either be stored in the program or be entered by the user on every program call. With this solution, the database must also be adapted for each user (encrypted beforehand, or an individual complementary key must be stored). In this case an obscure data compression program is far better.

·        Like known methods, our methods can also be applied to the compression of files (text files, web pages, binary files), hard disks, and memory areas.
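The network throughput claim in the list above follows from simple arithmetic, sketched here. The function names and the link rates are assumptions for illustration; the 80% figure and the 3 MB/second and 5 MB/second speeds are the typical values quoted earlier. If a percentage p of the data is removed, the compression ratio is 100 / (100 - p), and the delivered rate is capped by the slowest stage.

```python
def compression_ratio(removed_percent):
    """Original size divided by compressed size for a given removed percentage."""
    if not 0 <= removed_percent < 100:
        raise ValueError("removed_percent must be in [0, 100)")
    return 100.0 / (100.0 - removed_percent)


def effective_throughput(link_mb_s, removed_percent, compress_mb_s, decompress_mb_s):
    """Uncompressed MB/second delivered end to end.

    The link carries compressed data, so it delivers link rate times ratio,
    but the compressor and decompressor can each become the bottleneck.
    """
    over_link = link_mb_s * compression_ratio(removed_percent)
    return min(compress_mb_s, decompress_mb_s, over_link)


ratio_text = compression_ratio(80)                      # 80% removed
slow_link = effective_throughput(0.5, 80, 3.0, 5.0)     # assumed 0.5 MB/s link
fast_link = effective_throughput(1.0, 80, 3.0, 5.0)     # assumed 1.0 MB/s link
```

With 80% removed the ratio is 5, so the assumed 0.5 MB/second link delivers 2.5 MB/second of original data. On the assumed 1.0 MB/second link the compressor, at 3 MB/second, becomes the limit, so the gain applies in full only while the link is the slowest stage.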

Available programming components

price

·        Software development kit with English documentation, format:

·        Object-library or

·        DLL (Dynamic Link Library; only for MS-Windows)

·        US $4980,- once per platform for all components, $100,- per installed computer
– no patent license cost –

 

Densifier™ data compression to achieve persistence, compression, and encryption for your source code

 

The goal of this project, which is still under development: automatically add storage routines for compression or encryption to your programs by simply adding a line to your makefile or project file. Even this last step can be done automatically by an installation program.

 

Functionality: A preprocessor automatically analyzes the C++ source code to find the data members of classes. Because their types are known, functions to read and write these data can be generated, if desired with compression and/or encryption included. Essentially, this is a variation on the software for compressing structured data, applied automatically. The generated functions are inserted either directly as source code or as calls to functions in the provided libraries. Naturally, the system can also be used for the compression of structured data or for encryption with Twofish or with Rijndael, the DES successor adopted as the AES standard.
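The idea of deriving save/load routines from the declared data members can be sketched as follows. This is only an analogy under stated assumptions: Python runtime reflection stands in for the C++ preprocessor, zlib stands in for the Densifier compressor, and the decorator name persistent and the class Customer are hypothetical.

```python
import json
import zlib
from dataclasses import dataclass, fields


def persistent(cls):
    """Add save()/load() to a dataclass, generated from its declared fields."""
    names = [f.name for f in fields(cls)]

    def save(self):
        # Serialize the field values in declaration order, then compress.
        payload = json.dumps([getattr(self, n) for n in names]).encode("utf-8")
        return zlib.compress(payload)

    def load(data):
        # Decompress, then rebuild the object from the stored field values.
        values = json.loads(zlib.decompress(data).decode("utf-8"))
        return cls(**dict(zip(names, values)))

    cls.save = save
    cls.load = staticmethod(load)
    return cls


@persistent
@dataclass
class Customer:
    name: str
    age: int
    balance: float


original = Customer("Ada", 36, 12.5)
blob = original.save()
copy = Customer.load(blob)
```

The class author writes no storage code: adding or removing a field changes what save() and load() handle. An encryption step could be chained after the compression call in the same generated function.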

 

Characteristics and typical values

Compression density (percentage of the original size removed): 2-4 times better than ZIP for optimized formats

·        Character strings: 80%

·        Natural numbers / integers: 50%

·        Floating-point numbers: 80%

·        General buffer: 50%

Compression speed (Pentium IV, 3 GHz): 4 MB/second, similar to ZIP

Decompression speed (Pentium IV, 3 GHz): 6 MB/second, similar to ZIP

Memory requirements: about 20 MB

Operating systems: MS-Windows NT, 2k, XP, 2003, Unix (Linux, Solaris); others on request

Compression: new Compris Intelligence technology for compressing structured data

Encryption: public-key method with RSA key exchange; block encryption with Rijndael, the DES successor adopted as the AES standard

Steganography (information hiding): patented, award-winning TextHide method to hide information in text; also methods for hiding information in pictures, sounds (WAV), and compressed data/ZIP data files

Watermarking technology: in texts, pictures, sounds, compressed data/ZIP data files (similar to steganography)

Available components: save/load alone or combined with compression, encryption, hiding of information in text or pictures, and watermarking technology; all combinations are available

Technology: not a flat list that maps words to bit codes, but graphs (state-transition graphs, finite state transducers); everything is tailored to the respective program; pointer structures are represented as graphs. These connected structures allow much better compression

Special technological strengths

·        Detection when new names are introduced, e.g. by class, struct, #define, const, typedef in C/C++

·        Automatic learning of data structures

·        Graph structures that can be exchanged dynamically at run time and can each compress one data format optimally
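The last point, models that are exchanged at run time with one model per data format, can be approximated with a standard feature of zlib: preset dictionaries. The sketch below is an analogy, not the graph-based Densifier models. The MODELS table and its contents are assumptions chosen for illustration, and both sides must use the same model for a given format.

```python
import zlib

# Hypothetical per-format models: byte strings that are frequent in each format.
MODELS = {
    "html": b"<html><head><title></title></head><body><center></center><p></p></body></html>",
    "postscript": b" moveto lineto stroke showpage def /Helvetica findfont scalefont setfont",
}


def compress(data, fmt):
    """Compress data with the model selected for its format."""
    compressor = zlib.compressobj(level=9, zdict=MODELS[fmt])
    return compressor.compress(data) + compressor.flush()


def decompress(blob, fmt):
    """Decompress with the same model that was used for compression."""
    decompressor = zlib.decompressobj(zdict=MODELS[fmt])
    return decompressor.decompress(blob) + decompressor.flush()


page = b"<html><body><center>Hi</center></body></html>"
program = b"/Helvetica findfont 12 scalefont setfont 72 720 moveto showpage"
page_blob = compress(page, "html")
program_blob = compress(program, "postscript")
```

Switching the format string is all it takes to swap the model. A preset dictionary helps most on short inputs that share substrings with it, which is the situation a format-specific model is meant to exploit.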

Characteristics / Advantages:

·        10%-50% time and cost reduction when coding any C++ program

·        trouble-free data interchange between versions of a program for different operating systems; far less effort for porting source code

·        efficient search in stored data possible

·        easy integration in programs, easy compression of structures and objects in programming languages

·        efficient operations on compressed data: sometimes even faster than without compression, because compressed "tokens" that stand for larger independent data units are used directly, so less data needs to be processed

·        faster random access to single data sets and even the smallest units of information

·        faster decompression, as only the information actually needed is decompressed

Available programming components and prices

·        Software development kit for automated save/load functionality with German/English documentation: US $2990,- per installed computer or user (with server)

·        General & structured data compression: US $945,- extra charge per computer/user

·        Encryption (Rijndael & RSA): US $345,- extra charge per computer/user

·        Steganography (hiding, watermarking): US $345,- extra charge, separately for texts, graphics, ZIP data files, sounds

Release date

Planned for 2005

 



Information & Questions: Densifier@compris.com



