ChemProspector: Advanced Mining and Searching of Chemical Content in Patent Documents

 

Chemical information mining has become a well-established scientific area over the last five years. Several software solutions can identify and extract names of chemical compounds in text documents and convert them into structure-searchable chemical information. However, the automatic abstraction of generic compounds (Markush structures) remains an unsolved issue. These usually consist of a core structure image and variable groups specified in the text, in additional images or in tables.
   This presentation describes our hybrid approach to extracting generic structure information from documents, which combines information science, cheminformatics, computational linguistics and pattern recognition techniques. The development and use of chemical ontologies are discussed, and experiences with the envisaged methodology and first results are presented.
   This research project is funded by the German Ministry of Economics and Technology. It is part of the THESEUS research programme, which aims to develop a new Internet-based infrastructure in order to make better use of the knowledge available on the Internet.