
Tuesday 25 Oct

Starts at 09:00

09:00 - 10:20

Chair: Randall Marcinko, MEI, USA

How to Perform Comprehensive Patent Landscape Searches

Patent landscapes are valuable tools for both scientists and business developers and can be used for strategic planning. A patent landscape gives an overview of what is patented, where, when and by whom. The discipline involves in-depth search strategies, thorough sorting, meaningful charting and analysis of the often large data set retrieved within a given area of technology. Patent landscapes are discussed in general, focusing on what they can be used for and on how to perform a comprehensive search without losing important patent documents. Several parallel search strategies using, for example, various indexing terms, classification codes, chemical structures and keywords are presented in order to obtain complementary results. Practical ways to sort a large number of documents are discussed, and relevant options for displaying the results in a useful and reader-friendly manner are shown. This is exemplified by searches aiming to give an overview of an active pharmaceutical ingredient.

Challenges in Next Generation Scientific and Patent Information Mining

In the last decade, various concepts and tools for the search, retrieval and presentation of scientific and IP information have evolved enormously. Information professionals demand ever more efficient ways to obtain precise and highly relevant search results. In particular, the retrieval of patent information in the chemical and pharmaceutical area requires specialised indexing, searching and visualisation techniques. The automatic extraction of chemical compounds, both fully defined and generic (Markush structures), from text and images is of significant importance. Moreover, completeness and currency are indispensable, whilst information about the scope of patent families and their legal status is of utmost importance.

InfoChem’s main projects in this area are the approach to automatic extraction of Markush structures (ChemProspector), the automated work-up of structure and reaction information from graphical schemes and images, and the creation of a database of “Supplementary Patent Information” (SPIN).

In this lecture we will present the status of a series of projects on which InfoChem is currently working, describe the main challenges and discuss some of the results.

New Product Introductions - STN / Dialog / QWAM Content Intelligence / CAS

10:20 - 10:50

Exhibition and Networking Break

The 2011 ICIC Patent Panel

Are we working towards a changed and interesting future for all the participants in Intellectual Property information and applications? Patent professionals, users, suppliers and developers give their views.


This year's panel is chaired and animated by Pierre Buffet of Questel. The five expert panellists are Rahman Hyatt of Minesoft (a service provider), Alfred Elmaleh of the French Petroleum Institute (a patent professional), Monika Hanelt of Agfa Graphics (a patent information expert), James Ryley of SumoBrain (a service and software provider) and Roca Campana of WIPO. The panellists will examine and discuss, with the audience, a range of questions relating to the future of the patent information domain. The future scenarios touched upon will be of interest to everyone at the ICIC meeting.

Please find the Patent Panel Report and the Patent Panel Questions in the attachment.

12:50 - 14:30

Lunch, Exhibition and Networking

14:30 - 16:25

Chair: Elisabeth Piveteau, Digital Science, UK

Synthesising Knowledge by Exploiting Diverse Data Sources: from Microblogs to Patents

This presentation examines the role of text mining in providing answers for business-critical decisions, whether in R&D or commercial contexts. With use cases from pharmaceutical and other domains, it shows how agile text mining can be applied to unstructured and semi-structured documents ranging from microblogs to full-text patents. In particular, the presentation demonstrates how we can both categorise and summarise sets of documents and find connections between entities within and across them. We show how adopting these techniques makes it possible to accelerate the systematic and comprehensive analysis of literature sources, and to discover new insights by structuring and synthesising knowledge from diverse sources.

From BACON to XML – Why Patent Information Publishers are still Converting Image Data to Searchable Text


With almost 150 active patent issuing authorities, the fact remains that fewer than 20% of patent offices publish regular updates of their full text data; fewer still have published their complete backfiles. This presentation outlines why and how the patent information publishing industry is still creating full text data from images, more than 25 years after the EPO began its BACfile CONversion (BACON) project. During the presentation, we will describe the breadth of sources currently available to vendors, some of the processes and systems involved in converting images to text, and the scope for adding value to existing first-level patent data, and we will discuss some of the new content that is fast emerging. We will also highlight some of the issues faced by users of such databases and offer insights on overcoming those challenges.

ChEMBL - Open Data for Drug Discovery

At EMBL-EBI we have recently established a large-scale database of drug Structure-Activity Relationship (SAR) data. These data are integrated with other EMBL-EBI resources, which include ENSEMBL, UniProt, PDBe and ArrayExpress. Together these Open Resources form a powerful and free infrastructure to assist drug discovery; they typically sit alongside, and complement, commercial software products at user sites. The construction process for the database is complex, with early stages outsourced (itself an unusual aspect for an academic database) and a series of data transforms and normalisations applied during later curation and indexing steps to improve usability of the data. Over the next year, we plan to introduce new views on the data which are friendlier to a semantic web environment, while continuing to improve descriptors and annotation for the underlying data.

New Product Introductions - Fairview / Max.recall / ChemAxon / Springer

16:25 - 16:55

Exhibition and Networking Break

16:55 - 17:55

Chair: Heinz-Gerd Kneip, BASF, Germany

Making Searching Faster and More Complete: Cross-Collection Search and Automated Result Set Analysis


Specialist databases are frequently limited by collection rather than subject matter. For example, patent searches are frequently performed against patent-only databases. That situation seems to be dictated largely by copyright and convenience, not by what the searcher really wants. What the searcher would presumably prefer is a database that covers all quality information in his area of search.

However, broader databases can also introduce more irrelevant documents, raising the question of what is ideal in terms of corpus scope. Another question that becomes more relevant as the database becomes larger and more diverse is the automated analysis of query result sets. For example, tools such as real-time clustering, the ability to review search results as a set of images instead of just text, and the ability to use a subject document as the basis for either a new search or the refinement of an existing search can increase search efficiency.

Our observations in the patent search space will be presented, as we have expanded our patent database to also include thousands of full-text journals, millions of technical abstracts, more sophisticated search tools, and advanced post-search tools that help the searcher quickly sift through large result sets.


Multi-file Searching of Polymers in Various Databases

Identifying all relevant patents in the field of polymer chemistry can be challenging. One can search by: (a) the chemical structure of monomers, which are the building blocks of polymers, in the REGISTRY database, (b) keywords representing attributes of polymers in CAplus, and (c) specific structure and attribute index codes in IFICDB and WPIX. Experience indicates that multi-file searching in various databases, as opposed to a single database, yields the highest number of relevant hits. We illustrate the complexity of the search process with the following example.

Case:  Terpolymer Search
Objective: Retrieve all patent records for a terpolymer adhesive made up of three monomers and a plasticizer. The plasticizer in this case is generic, meaning it could be any plasticizer.
Search:  STN platform using REGISTRY, CAplus, IFICDB, and WPIX databases
Standard Classification Codes Used: IPC, ECLA and US class codes
Indexing Codes Used: Fragment, Uniterm, and Derwent manual codes
Search Results: Each of the databases uniquely contributed a significant percentage of the relevant patents, defined as ‘10% relevant and new patents’ found in one database over the other after de-duplication of results. These findings suggest that a multiple-database search is the most exhaustive, and we will demonstrate the benefits of our search strategy.

In this presentation we will walk through such challenging case studies and discuss what works and what does not in multi-file searching.
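The unique-contribution measure in the terpolymer case can be illustrated with a minimal Python sketch. It is not the authors' method: it assumes that each database's relevant hits have already been reduced to a common de-duplication key (for example a patent family identifier), and it interprets the 10% figure as a threshold on the share of records found in only one database. The function name, the threshold constant and the sample identifiers are all hypothetical.

```python
from typing import Dict, Set

# Assumed reading of the abstract's "10% relevant and new patents" criterion.
SIGNIFICANT_SHARE = 0.10


def unique_contribution(hits: Dict[str, Set[str]]) -> Dict[str, float]:
    """Share of the de-duplicated result set found in exactly one database.

    `hits` maps a database name to the set of relevant record keys it
    returned. Keys are assumed to be comparable across databases.
    """
    union = set().union(*hits.values())
    if not union:
        return {db: 0.0 for db in hits}
    shares = {}
    for db, found in hits.items():
        # Everything retrieved by any of the other databases.
        others = set().union(*(h for name, h in hits.items() if name != db))
        shares[db] = len(found - others) / len(union)
    return shares


# Hypothetical relevant hits per database, keyed by made-up family IDs.
hits = {
    "CAplus": {"F1", "F2", "F3"},
    "IFICDB": {"F2", "F4"},
    "WPIX": {"F3", "F5", "F6"},
}
shares = unique_contribution(hits)
significant = sorted(db for db, s in shares.items() if s >= SIGNIFICANT_SHARE)
```

With real data, a database whose share falls below the threshold adds little beyond what the others already retrieve, whereas several databases above it support the case for multi-file searching.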


Buses leave for Barceloneta

20:30 - 22:30

Conference Dinner