Home  »  ICIC  »  ICIC 2005 - 2009  »  Programme  »  Wednesday 21 Oct

Wednesday 21 October 2009

09:00 - 10:00

Patent Research: How a Patent Document Resists Searching

Patents are both technical and legal documents, and they are easily available in an analysis-friendly format, which can lure unsuspecting researchers into struggling with many unknowns in the data as presented. The liberties that inventors and lawyers take with language reflect a balance: on one side, the scientist’s technical discussion and duty of disclosure; on the other, the assignee’s demand not to give a competitor any additional information, together with the patent attorney’s intent to fully protect the client’s rights under the umbrella of international patent law. Patents are further complicated by variations in patent families, which can lead to double counting of inventions; by classification codes, which vary quite dramatically; by changes in the law, which can influence a patent drafter’s lexicography; and even by problems introduced by the very forms used to apply for patents. Citation analysis, a staple of academic research, can also be clouded by influences that are unique to patents, including the sometimes competing intents of the examiner and the applicant. These various interests add up to a document that holds hidden traps for the unwary researcher. This presentation discusses many of these issues as an aid to patent analysis.
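As an illustration of the double-counting problem mentioned above, the following minimal Python sketch groups publications by a family identifier so that each family is counted as one invention. The publication numbers, the family identifiers and the function name `group_by_family` are all hypothetical, and the sketch assumes that a family identifier is already supplied with each record; since family definitions themselves vary, this is only one possible way of counting.

```python
from collections import defaultdict


def group_by_family(records):
    """Group publication numbers by family ID.

    `records` is an iterable of (publication_number, family_id) pairs.
    Each family is treated as a single invention (an assumption that
    depends on which family definition the data source uses).
    """
    families = defaultdict(list)
    for pub_number, family_id in records:
        families[family_id].append(pub_number)
    return dict(families)


# Hypothetical records: three publications of one invention, plus one other.
records = [
    ("US-0000001-A", "FAM-1"),
    ("EP-0000002-A1", "FAM-1"),
    ("JP-0000003-A", "FAM-1"),
    ("US-0000004-A", "FAM-2"),
]

families = group_by_family(records)
print(len(records), "publications,", len(families), "inventions")
```

Counting publications here would suggest four inventions, while counting families gives two.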

10:00 - 10:30

Characterising Pharmaceutical Patent Space Using a Combined Text and Chemical Analysis Approach

Patents represent a rich source of crucial information for chemists in the pharmaceutical and biotechnology industries. Identifying key patents and establishing a network of related patents are two important challenges for comprehensive characterization of freedom-to-operate in patent space. One way to tackle these joint challenges is to combine text and chemical analysis of patents in a highly integrated fashion. This can be achieved with a data pipelining approach, using Pipeline Pilot, from Accelrys, and including the dedicated ChemMining module for chemical text mining, developed by Notiora. With Pipeline Pilot and ChemMining it is possible to chemically index a large number of patents by identifying and extracting chemical names, converting them to molecular structures, and storing them in a chemically-aware database. Chemical queries to this database will return patents containing structures that match the query, either within a user-specified degree of similarity or by substructure. Once key patents have been identified, twin networks of related patents can be computed 1) from the patent family information, and 2) by using all structures in the key patent as queries against the chemically indexed patents, deriving a network based on chemical structural similarity. Together, these twin networks will provide greater coverage of patent space than either network alone.
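The twin-network idea in the abstract above can be sketched in a few lines of plain Python. This is not Pipeline Pilot or ChemMining code: structures are stood in for by toy fingerprints (sets of integer feature IDs), similarity is measured with the Tanimoto coefficient as an assumed choice of metric, and every patent ID, family ID, threshold and function name is hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of feature IDs."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_network(key_structures, index, threshold):
    """Return IDs of patents holding a structure similar to any key structure."""
    related = set()
    for query in key_structures:
        for patent_id, structures in index.items():
            if any(tanimoto(query, s) >= threshold for s in structures):
                related.add(patent_id)
    return related


# Hypothetical chemically indexed patents: patent ID -> list of fingerprints.
index = {
    "P1": [frozenset({1, 2, 3, 4})],
    "P2": [frozenset({1, 2, 3, 5})],
    "P3": [frozenset({7, 8, 9})],
}
# Hypothetical family data; P4 has no extracted structures at all.
family_of = {"P1": "F1", "P4": "F1", "P2": "F2", "P3": "F3"}

key = "P1"
# Network 1: patents in the same family as the key patent.
family_net = {p for p, f in family_of.items() if f == family_of[key] and p != key}
# Network 2: patents with a structure similar to one in the key patent.
chem_net = similarity_network(index[key], index, threshold=0.5) - {key}
# Combining both covers more patents than either network alone.
combined = family_net | chem_net
print(sorted(family_net), sorted(chem_net), sorted(combined))
```

In this toy data the family network finds only P4 and the similarity network finds only P2, so the combined set covers more than either alone. A substructure search would replace the similarity test with a containment test on real molecular graphs, which this sketch does not attempt.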

11:15 - 11:45

Semantic Insights using Agile Natural Language-based Text Mining

This presentation reviews the challenges faced by researchers in bringing together diverse information resources to answer business-critical R&D questions. In particular, it outlines how an NLP-based approach to discovering facts and relationships in free text can be used to leverage chemical and biological knowledge. We also discuss the challenges faced when moving from relatively small texts, such as Medline abstracts, to more complex patent submissions and study reports.

A variety of applications are described, ranging from ad-hoc querying, where unstructured text is treated as an on-demand virtual database, to automatically generated hypotheses based on both text and structured data. Examples include a specific pharmaceutical case study demonstrating how NLP-based text mining can unlock value in legacy safety studies for toxicity predictions.

Evaluation of Information Retrieval Tools for Chemical Patents and Scientific Articles

The presentation describes the experience of the TREC Chemistry Track, which, by 18 October 2009, will be in its final stages. TREC-CHEM is the most comprehensive evaluation effort for information retrieval tools on chemical patents and scientific articles. It is organised, with the support of NIST (USA), by the Information Retrieval Facility (Austria), University College London (UK) and York University (Canada). For the first year, the evaluation will focus on text retrieval, aiming to compare generic text retrieval tools with domain-specific tools in terms of recall and precision, but also in terms of efficiency. The data provided to the participating research groups will consist of approximately one million patent files provided by the IRF and 59,000 scientific articles provided by the Royal Society of Chemistry, UK. Both datasets will be available in XML format. The creation of topics, as well as the evaluation of the results, will be done partially in an automatic fashion, leveraging searches performed by entities involved in the patent application process (applicant, patent office, opposers). The track also benefits from the support of patent experts who have volunteered to help create industry-relevant topics and evaluate the results provided by the participants.
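For readers unfamiliar with the two measures named above, here is a minimal Python sketch of set-based precision and recall for a single topic. It is not the track's evaluation software; the document IDs, the relevance judgements and the function name `precision_recall` are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Compute set-based precision and recall for one topic.

    `retrieved` is the list of document IDs a tool returned;
    `relevant` is the set of IDs judged relevant for the topic.
    """
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: four documents returned, three judged relevant overall.
retrieved = ["D1", "D2", "D3", "D4"]
relevant = {"D1", "D3", "D7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four returned documents are relevant (precision 0.5) and two of the three relevant documents were found (recall about 0.67).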