One of the most flexible systems for information extraction and information acquisition: processing is done at the level of content, that is, the meaning of texts. Consequently, differences in the wording or formatting of texts can easily be abstracted away. Extraction rules are therefore more generally valid and rarely need to be adapted, even when the data from which information is to be extracted changes.

Our software tools can extract the following types of information from most kinds of textual documents (plain text, RTF, HTML, SGML, XML, PDF, PostScript):

We work with state-of-the-art, highly flexible declarative systems that combine rules with a probability model, which allows an optimized assignment of data to the information slots that need to be filled. For example, the extraction of addresses poses the following major difficulties for classical information extraction approaches:

Our novel underlying approaches offer solutions to the problems listed above.
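
To illustrate the general idea of combining rules with probabilities to fill information slots, here is a minimal sketch in Python. It does not show our actual system: the rule set, the probability values, the slot names and the functions `score` and `assign` are all hypothetical and chosen only for this example. Each rule says how likely a text fragment matching a pattern belongs to a slot, and the assignment with the highest combined probability across all slots is selected.

```python
import re
from itertools import permutations

# Hypothetical rules: (slot, pattern, assumed probability that a match fills the slot).
RULES = [
    ("postal_code", re.compile(r"^\d{5}$"), 0.9),
    ("postal_code", re.compile(r"^\d{4}$"), 0.3),
    ("house_number", re.compile(r"^\d{1,4}[a-z]?$"), 0.8),
    ("street", re.compile(r".*(street|road|avenue)$", re.I), 0.9),
    ("city", re.compile(r"^[A-Z][a-z]+$"), 0.5),
]
SLOTS = ["street", "house_number", "postal_code", "city"]


def score(slot, fragment):
    """Best rule probability for this slot/fragment pair, or a small default."""
    matches = [p for s, rx, p in RULES if s == slot and rx.match(fragment)]
    return max(matches, default=0.01)


def assign(fragments):
    """Return the one-to-one slot assignment with the highest product of scores.

    Brute force over all orderings; returns None if there are fewer
    fragments than slots.
    """
    best, best_score = None, -1.0
    for perm in permutations(fragments, len(SLOTS)):
        total = 1.0
        for slot, fragment in zip(SLOTS, perm):
            total *= score(slot, fragment)
        if total > best_score:
            best, best_score = dict(zip(SLOTS, perm)), total
    return best


if __name__ == "__main__":
    print(assign(["Springfield", "12345", "Main Street", "42"]))
    # "4020" matches both the house number and the postal code rule;
    # the overall assignment resolves the ambiguity.
    print(assign(["Elm Road", "4020", "7", "Linz"]))
```

Because the slots are scored jointly rather than one at a time, a fragment that fits several slots is placed where the overall assignment is most plausible. The exhaustive search is only practical for a handful of fragments; a larger example would need a proper assignment algorithm.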

Our particular strengths in content extraction are:



