|
Monday 19 Oct
09:00 - 13:00
Chair: Wendy Warr, Wendy Warr & Associates, UK
Evaluating Patent Full Text Documents with Chemical Ontologies
Chemical ontologies represent abstractions of chemical compounds - providing structural as well as functional and chemical property classifications. With automated patent text processing, there is also increasing interest in automatically classifying chemical compounds in patent documents to enable chemical searches based on known chemical classes.
Thus, we will present strategies to automatically classify chemical compounds based on their names and chemical structure or function, using a chemical ontology derived from the pure lexical variants MeSH and ChEBI but incorporating SMARTS and chemical calculation based logic. We will describe the development of this ontology, which also comprises functional classifications and material science terms such as alloys and polymers.
Using our UIMA based OCMiner annotation pipeline, over 90 million patent full text documents were processed to find mentions of chemical compounds, substances, chemical classes and chemical groups. In addition, the claimed uses of these compounds were extracted. Subsequently, chemical terms were classified by our chemical ontology, transforming more than 10 billion found chemical class mentions into an ontology enabled, Lucene based search index. This index was also used to analyze the frequency of found chemical classes per time period, giving indications of the focus of general chemical research activities and recent trends in patenting strategies.
An annotated data set covering 10 years of US patents is freely available for further investigations and can be used to train and further develop the use, quality and interchangeability of chemical ontologies.
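The classification and frequency analysis described above can be illustrated with a minimal sketch. It is not the OCMiner pipeline or the authors' ontology: the tiny class hierarchy, the term lexicon and the function names below are all hypothetical, chosen only to show how a mention of a specific compound can be counted under every broader class it belongs to, per year.

```python
from collections import Counter, defaultdict

# Hypothetical toy ontology: each class points to its parent (None = root).
PARENT = {
    "benzodiazepine": "heterocycle",
    "heterocycle": "organic compound",
    "polymer": "material",
    "organic compound": None,
    "material": None,
}

# Hypothetical lexicon: chemical term found in text -> most specific class.
TERM_TO_CLASS = {
    "diazepam": "benzodiazepine",
    "pyridine": "heterocycle",
    "polyethylene": "polymer",
}


def ancestors(cls):
    """Return cls followed by all of its ancestors, most specific first."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = PARENT.get(cls)
    return chain


def class_counts_by_year(mentions):
    """mentions: iterable of (year, term). Returns {year: Counter(class -> count)}."""
    counts = defaultdict(Counter)
    for year, term in mentions:
        cls = TERM_TO_CLASS.get(term.lower())
        if cls is None:
            continue  # term not covered by the toy lexicon
        # Count the mention for its own class and every broader class.
        counts[year].update(ancestors(cls))
    return dict(counts)


# Made-up example mentions, for illustration only.
mentions = [
    (2013, "Diazepam"),
    (2013, "pyridine"),
    (2014, "polyethylene"),
    (2014, "unknown term"),
]
result = class_counts_by_year(mentions)
```

Counting each mention under all ancestor classes is what lets a search for a broad class also reflect its more specific members; a production system would derive the hierarchy from the ontology itself rather than from a hand-written table.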
Automatic Chemical Annotation of Large Full-Text Patent Corpora. Pitfalls, Challenges and User Benefits
The past decade has seen the emergence of various approaches for automatic identification and extraction of chemical information from unstructured sources. These have opened new possibilities to exploit, organize, query, and analyse chemical content to support research and development processes as well as IP-related tasks.
Several solutions for chemical named entity recognition exist, all of them showing reasonable annotation quality. Each of them uses a slightly different approach depending on its focus and therefore shows specific strengths and weaknesses. However, when it comes to real-world applications, technical challenges such as large and/or heterogeneous text corpora appear. Questions of scalability, performance, and parallelization emerge.
This talk addresses the above-mentioned questions and challenges in the context of a joint FIZ Karlsruhe and InfoChem project, where FIZ Karlsruhe will leverage the chemical annotations, based on InfoChem’s chemical text mining technology, for its comprehensive range of patent full-texts, making them more easily accessible and allowing for even more precise and complete user queries.
New Product Introductions - Minesoft, InfoChem, BizInt
10:30 - 11:00
Exhibition and Networking Break
Efficient and Effective Patent Landscaping Using PatBase: a Case Study
Key questions for any patent landscape analysis are:
- Is this a growing area of interest?
- What are the fields of current interest?
- Who are the key players?
Delivered from two different perspectives, we will discuss the challenges in creating a patent landscape analysis and the features and functionality of PatBase which enable the efficient creation of clean and reliable patent landscape studies.
From the development of a comprehensive search strategy to the analysis and visualization of results, we will use a case study to demonstrate a number of features and techniques including:
- Thesauri to craft a comprehensive strategy.
- Citation ranking to identify the most pertinent patents.
- Advanced keyword highlighting to efficiently review large numbers of documents.
- Use of the “Similar” function to identify related documents.
- PatBase Analytics to:
  - Help build a comprehensive strategy.
  - Visualize the results of the landscape analysis.
The aim of any patent landscape analysis is to accurately identify and visualise results so that answers to key questions are quickly and easily found. This presentation will demonstrate how any user can benefit from the innovative features and functionality in PatBase to craft and visualize a meaningful patent landscape for any technical area.
Systematic, Automated Analysis of Patents and Related Literature
Text mining is increasingly being used not only to find patents, but also to provide systematic and automated analysis of the patents and associated literature. Automated workflows are used to alert subscribers to relevant patents, and provide tailored summaries of the patents for fast review. This talk will present new use cases for text mining patents, and demonstrate recent developments in the I2E platform, including better multilingual support, improved visualization, and easier extraction of information from tables. Finally, it will demonstrate how the use of federated text mining facilitates opposition searching across multiple data sources such as literature and grants as well as patents themselves.
The open patent chemistry “big bang”: Implications, opportunities and caveats
In 2012, after the first IBM deposition, few would have predicted that PubChem compounds that included patent-extracted structures would exceed 20 million within three years (i.e. 30% of the total). The current major open patent chemistry submitters (in size order) are NextMove, SCRIPDB, Thomson Pharma, IBM and SureChEMBL. This “big bang” has a range of utilities and implications. Firstly, pharmaceutical companies must now integrate their exploitation of both public and commercial patent chemistry because capture is divergent. Secondly, the academic community and small companies can now patent-mine extensively without commercial sources. Thirdly, first-filings of most lead series and clinical candidates can now be tracked. Fourthly, drug targets in ChEMBL can be intersected with Structure Activity Relationship (SAR) data sets from patents, some of which are now target-mapped in other databases (doi:10.1016/j.ddtec.2014.12.001). However, while this patent chemistry “big bang” is generally welcomed by database users, there are significant caveats. In particular, both automated and manual extraction bring in a variety of artefacts that add confounding structural “noise”. These include a) permutations of mixtures and chiral exemplifications, b) virtual structures (including isotopic analogues of approved drugs), c) an emerging trend of vendor “patent picking” for non-stocked compounds, d) 85% of public patent chemistry has no biological data links and e) extractions from documents do not directly indicate IP status. These problems and some partial solutions will be discussed.
New Product Introduction - RightsDirect, Intellixir
Welcome to France, Homebase of the French Speaking Patent Information Association
Let us present the CFIB, the French Speaking Patent Information Association (Club Francophone d’Information Brevet). Who are we? What are our activities and how do they impact the patent information com.
13:00 - 14:30
Lunch, Exhibition and Networking
14:30 - 16:10
Chair: Morten Christoffersen, NovoNordisk, Denmark
Big Data: Big Issues for IP
"Big data" is a broad term that encompasses a wide range of data and content. Big data offers new approaches to analysis and decision making. At first glance big data and IP may seem to be opposites, but they have more in common than one may think. This talk will focus on how big data will impact, and be impacted by, IP. One of the biggest promises of big data is the possibility to re-use data produced by different sources, create new services or predict the future through the analysis of correlations. In this context, how can companies protect information assets and analytical skills? What new skills are required to search and analyze large numbers of datasets in real time? Big data will not only change patent information, but will also generate new types of patents.
Thieme Publishers: New Vistas for the Pharmaceutical Industry: Combining full-text and chemical information
The pharmaceutical and health industry strives to continually improve and augment the available repertoire of pharmaceutical treatments. Medicinal chemists start the innovation cycle with systematic searches of all available published information, i.e. scientific journals, chemical databases and internal documentation.
An impediment is the existence of two separate information silos: full text information and chemical structures/reactions contained in databases. This talk will describe how InfoChem has developed a technological infrastructure within the MarkLogic platform that allows effective and integrated searches in both full-text and chemical information.
New Product Introductions - CAS, FIZ Karlsruhe, ChemAxon, Questel
16:10 - 16:40
Exhibition and Networking Break
16:40 - 17:40
Chair: Jignesh Bhate, Molecular Connections, India
Optimising Content Spending with Analytics
Libraries are constantly under pressure to reduce content spending. In order to meet this challenge while continuing to serve their users’ real needs, libraries need to develop a deep understanding of content use across the organisation. This means augmenting plain usage data with details of organisational structure, and using that knowledge to choose optimal license and access strategies. By taking analytics a step deeper and leveraging a broad portfolio of access options, libraries can continue to support the research process while managing costs.
We will discuss the challenges of aggregating usage data, demonstrate a selection of analytical tools, and show how the analysis can inform the buy decision as well as the justification (ROI) for it. Attendees will gain an overview of analytical techniques, learn about the trade-offs of build vs. buy strategies, and come away with an understanding of key success factors for implementing an analytics program.
The Enterprise Search Market in a Nutshell
As data sets continue to grow, search remains a key technology for many applications. But what is the current state of the enterprise search market? Which providers are gaining market share, and what are the latest developments and innovations? Based on experience from dozens of recent search projects using a range of technologies, this presentation will summarize market conditions, discuss current best practices for creating great search systems, and suggest some future trends to watch out for.
19:00 - 22:00
Conference Dinner at the Hotel Westminster in Nice
|