Challenges in Next Generation Scientific and Patent Information Mining

In the last decade, concepts and tools for the search, retrieval and presentation of scientific and IP information have evolved enormously. Information professionals demand ever more efficient ways to obtain precise and highly relevant search results. In particular, the retrieval of patent information in the chemical and pharmaceutical area requires specialised indexing, searching and visualisation techniques. The automatic extraction of chemical compounds, both fully defined and generic (Markush structures), from text and images is particularly important. Moreover, completeness and currency are indispensable, and information about the scope of patent families and their legal status is of the utmost importance.

InfoChem’s main projects in this area are the automatic extraction of Markush structures (ChemProspector), the automated work-up of structure and reaction information from graphical schemes and images, and the creation of a database of “Supplementary Patent Information” (SPIN).

In this lecture we will present the status of a series of projects on which InfoChem is currently working, describe the main challenges and discuss some of the results.
