ICIC 2009 Programme: Progress in Automated Chemical Structure Recognition in Text and Images

Tuesday 20 October 2009

Text mining in chemistry and drug discovery relies heavily on the automated extraction of chemical compounds and pharmaceutical substance names from text and images. This presentation describes a hybrid approach that combines techniques from information science, cheminformatics, computational linguistics and pattern recognition. Various text mining applications developed recently promise researchers comprehensive access to knowledge. However, in many cases the quality of the extracted chemical content, in terms of precision and recall, is questionable. Poor image quality, ambiguous notation or incorrect names can all lead to errors and wrong results. Strict chemical validation and verification of the extracted information is therefore of utmost importance for achieving reliable and consistent results. The approach presented here combines specialised software tools for graphical structure recognition, chemical named entity extraction and name-to-structure conversion. Combining these with established verification and checking tools for automatic chemical validation ensures high quality in the generated content.
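To make the precision and recall argument concrete, the following is a minimal sketch in Python of how a validation step can change the quality of extracted content. It is not the software described in the talk: the function names, the toy compound list and the rejection list standing in for a chemical validation tool are all assumptions made for illustration.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a gold-standard set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def validate(candidates, is_valid):
    """Keep only the candidates accepted by the supplied validation check."""
    return [name for name in candidates if is_valid(name)]


# Hypothetical gold standard: compounds actually mentioned in a document.
gold = {"aspirin", "ibuprofen", "caffeine", "paracetamol"}

# Hypothetical raw output of a named entity extractor, including two
# false positives ("lead" used as a verb, and a table caption).
extracted = ["aspirin", "ibuprofen", "caffeine", "lead", "table 2"]

# Stand-in for a real chemical validation tool (assumption): here a name
# is rejected simply because it appears on a small rejection list.
rejected = {"lead", "table 2"}


def is_valid(name):
    return name not in rejected


raw_scores = precision_recall(extracted, gold)
validated_scores = precision_recall(validate(extracted, is_valid), gold)

print("raw:      ", raw_scores)        # precision 0.6, recall 0.75
print("validated:", validated_scores)  # precision 1.0, recall 0.75
```

In this toy case validation removes the false positives and raises precision without affecting recall. A real validator may also reject correct names, so both figures would need to be measured against a curated reference set.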