Tuesday 19 April 2016

Chair: Andrew Hinton, Linguamatics, UK

09:00 - 09:30

Data Science with R and Vanilla Air

Companies face the challenge of analysing big data stores and data lakes in an uncertain technology environment, in order to provide accurate analysis and build forecast models.
In a context of budget constraints, the R project is a reliable alternative to legacy commercial software for developing and deploying business analytics models. R enjoys worldwide recognition and rapid adoption by companies everywhere. Together with Vanilla Air, anyone can start a Data Science project now and share their analysis in an instant.
This session includes a presentation of Vanilla Air, a new "cloud / on-premise" Data Science platform for developing and deploying R data models at enterprise level.


09:30 - 10:00

Sentiment Analysis: What your Choice of Words Says about your Company

Text Mining research has historically focused on addressing who, what, when, and where. However, new technical approaches combined with advancements in related scientific areas mean that Text Mining can potentially address how and why. This presentation provides an example of using sentiment analysis to assess behavior and outcomes at publicly traded companies. Using US Securities and Exchange Commission filings semantically analyzed in VantagePoint, we explore correlations between the language used in corporate filings and financial outcomes and behaviors. This discussion covers some of the techniques used for this type of analysis and addresses some of the application areas where this approach might be useful.
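
The presentation's own tooling (VantagePoint) is not shown here. As a toy illustration of the general technique, the following Python sketch scores filing excerpts against small hand-made positive/negative word lists and correlates the scores with a financial metric; all word lists and figures are invented for illustration.

```python
# Toy sentiment scoring of filing excerpts, correlated with a financial
# outcome. Word lists and numbers are invented examples, not real data.
POSITIVE = {"growth", "strong", "improved", "record", "exceeded"}
NEGATIVE = {"decline", "impairment", "litigation", "weak", "restatement"}

def sentiment_score(text: str) -> float:
    """Net positive share of all sentiment-bearing words (-1 to +1)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Invented (filing excerpt, subsequent stock return) pairs.
filings = [
    ("record growth and strong demand exceeded expectations", 0.12),
    ("weak sales and a decline in margins", -0.05),
    ("litigation charges and an impairment of goodwill", -0.09),
    ("improved outlook with strong cash flow", 0.07),
]
scores = [sentiment_score(t) for t, _ in filings]
returns = [r for _, r in filings]
print(round(pearson(scores, returns), 2))
```

Real analyses would of course use validated financial sentiment lexicons and far larger samples; the sketch only shows the shape of the correlation step.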

10:30 - 11:00

Exhibition and Networking Break

Chair: Nils Newman, Search Technology/VantagePoint, USA

11:00 - 11:30

Elastic Search & Patent Information @ mtc

This presentation will introduce our next-generation patent publication store, built on Elasticsearch 2.0 with our own modules for query translation, document access and content delivery. This is how we think offices and data providers should give access to their content and back ends. But they won't, so we will. We have our alpha version ready to show off its features and are curious about your feedback.
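
mtc's actual query-translation module is not public. As a hypothetical sketch of the idea, this Python function maps a simple fielded patent query into an Elasticsearch bool-query body; the index field names (`applicant`, `ipc`, `fulltext`) are invented for illustration.

```python
# Hypothetical translation of a fielded patent query into an
# Elasticsearch bool-query body. Field names are invented examples.
def translate_query(applicant=None, ipc_class=None, text=None):
    """Build an Elasticsearch query dict from optional patent criteria."""
    must = []
    if applicant:
        must.append({"match": {"applicant": applicant}})
    if ipc_class:
        # IPC classes are hierarchical, so a prefix query matches subclasses.
        must.append({"prefix": {"ipc": ipc_class}})
    if text:
        must.append({"match": {"fulltext": text}})
    return {"query": {"bool": {"must": must}}}

body = translate_query(applicant="ACME GmbH", ipc_class="H04L")
```

The resulting dict is what a client would POST to an Elasticsearch `_search` endpoint; the real module presumably handles far richer query grammars.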


11:30 - 12:00

Visual Data Exploration – Having a Conversation with Complex Data to Understand What Else it Contains

As we are all aware, the world is collecting and storing more transactional data now than at any time in history. Specifically in healthcare, most G20 countries either already have, or are developing, electronic health record capture and storage for the majority (and frequently the entirety) of their populations. Terms such as “personal medicine” and “learning health system” have become common ways of expressing the desire to use these large health record datasets to guide ways of improving patient outcomes, improving population health and reducing the cost burden of care. What is not discussed in very much detail is how these data are going to be used to create these benefits – and that is because many of the methods that have traditionally formed the core of health data analytics are not capable of achieving these goals.

Traditional analytical methods rely on two things – a clear definition of the data to be used and a clear understanding of the hypothesis that is going to be tested. Together these mean that only a subset of the data is used for any given analysis and that the scope of the analysis is restricted. Even in so-called artificial intelligence applications there is a need to define the boundaries of the analysis due to the size and complexity of the data. This bounded approach has the benefit of producing analytical problems that can be readily addressed, but has a significant weakness in that the true complexity of most human health issues is, in effect, ignored.

For example, the healthcare community will frequently discuss chronic diseases as if they exist in isolation. A typical analysis will explore patients with, for example, Type 2 diabetes and high blood pressure, but will not further divide that patient group into the many sub-cohorts of patients with multiple additional co-morbid conditions. Frequently this lack of granularity is driven by an inability to analyse the data in sufficient depth – the analytical problem becomes too big for a straightforward hypothesis and so the more complex issues remain unanalysed.
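
The paper's own platform is not described in code. As a toy illustration of why the sub-cohorts multiply so quickly, this Python snippet enumerates every comorbidity combination present in a handful of invented patient records; real condition coding and patient data would of course be far richer.

```python
from itertools import combinations
from collections import Counter

# Invented patient records: each is the set of coded conditions present
# (T2DM = Type 2 diabetes, HTN = hypertension, CKD = chronic kidney
# disease, CHF = congestive heart failure).
patients = [
    {"T2DM", "HTN"},
    {"T2DM", "HTN", "CKD"},
    {"T2DM", "HTN", "CKD", "CHF"},
    {"T2DM", "HTN", "CHF"},
]

# Count every sub-cohort: each combination of conditions a patient carries.
cohorts = Counter()
for conditions in patients:
    for r in range(1, len(conditions) + 1):
        for combo in combinations(sorted(conditions), r):
            cohorts[combo] += 1

print(cohorts[("HTN", "T2DM")])         # 4: all four patients
print(cohorts[("CKD", "HTN", "T2DM")])  # 2: the two who also have CKD
```

Even four patients and four conditions yield a dozen distinct sub-cohorts; with tens of millions of records and hundreds of conditions, the combinatorics quickly outgrow any single pre-specified hypothesis.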

This paper describes a means of using visual data exploration to overcome many of these analytical problems. Using multi-dimensional data mapping and on-demand computation techniques allows the analyst to explore layers of complexity not previously available. In addition, completing these explorations very rapidly, in near real-time, allows the analyst to explore multiple options and alternative approaches – in effect analysing multiple hypotheses simultaneously.

The paper will illustrate how visual data exploration methods can identify characteristics of disease that would not previously have been considered, allowing them to be defined and quantified. The paper will illustrate how, by allowing the analyst to simultaneously explore tens of millions of patient records in their entirety without prior exclusion / inclusion criteria, the analyst can have the opportunity to derive findings of significant benefit that previously would have almost certainly been missed.

The approach described leads to levels of clinical insight that were previously often suspected but not known, allowing for the generation of knowledge relating to patient care, improved clinical outcomes and better direction of clinical trial design - among numerous other benefits. The paper will conclude by describing the ways in which visual data exploration is being used to take us closer to some of the goals of “personal medicine” and the “learning health system”.


12:00 - 12:30

KOL Analytics from Biomedical Literature

Strategic partnerships between pharmaceutical companies and medical experts lead to more effective medical and marketing activities throughout a product life cycle. Identification of such medical experts, that is, key opinion leaders (KOLs), from bibliometric analysis is challenging due to the volume and variety of the data. Today, the research community is flooded with scientific literature, with thousands of journals and over 20 million abstracts in PubMed. Developing a holistic framework to identify, profile and update KOLs is the need of the hour. Customers want digestible information – everything relevant, and nothing more. In this talk, we will present case studies on how we used ontologies and disambiguation techniques to address KOL identification for different therapeutic areas.
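
The speakers' ontology and disambiguation pipeline is proprietary. As a minimal sketch of the disambiguation step alone, this Python snippet collapses printed name variants and crude affiliation fingerprints before ranking authors by publication count; the records, heuristics and stop-word list are all invented for illustration.

```python
from collections import Counter

# Invented bibliographic records: (author name as printed, affiliation).
records = [
    ("Smith, J.", "Univ. of Oxford"),
    ("Smith, John", "University of Oxford"),
    ("J. Smith", "Oxford University"),
    ("Smith, J.", "MIT"),           # a different J. Smith
    ("Lee, A.", "Stanford"),
]

def normalise(name: str) -> str:
    """Reduce a printed name to 'surname initial' for matching."""
    name = name.replace(",", " ").replace(".", " ")
    parts = [p for p in name.split() if p]
    # Crude heuristic: the longest token is the surname; take the first
    # letter of another token as the initial.
    surname = max(parts, key=len)
    initials = [p[0] for p in parts if p != surname]
    return f"{surname.lower()} {initials[0].lower()}" if initials else surname.lower()

def affiliation_key(aff: str) -> str:
    """Crude affiliation fingerprint: keep the first distinctive word."""
    stop = {"univ", "university", "of", "institute"}
    words = [w.strip(".").lower() for w in aff.split()]
    keep = [w for w in words if w not in stop]
    return keep[0] if keep else words[0]

# Disambiguate on (normalised name, affiliation fingerprint), then rank.
counts = Counter((normalise(n), affiliation_key(a)) for n, a in records)
print(counts.most_common(1))  # the Oxford J. Smith, with 3 papers
```

Production systems would add ontology-driven topic matching, co-author networks and citation weights on top of this matching step.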


12:30 - 14:00

Lunch, Exhibition and Networking

Chair: Bob Stembridge, Thomson Reuters, UK

14:00 - 14:30

Taking Patent Research Platforms beyond Search

While the perpetual quest for access to more full-text coverage and authorities goes on, what are patent databases doing beyond expanding content and coverage? How can all of this help you? 

This talk will highlight various innovations and capabilities being added to Patent Research Platforms that add greater value to day-to-day operations of patent and scientific information teams. Various innovations for the patent professional, the patent department, the patent analyst and the patent licensing teams will be highlighted and discussed in depth. We will also take a look at what future innovations can be expected in the next 2 years.


14:30 - 15:00

Patent Landscape Reports and Other WIPO Activities in the Area of Patent Analytics

WIPO started work in the area of patent analytics in 2010 with a Development Agenda project on “Developing Tools for Access to Patent Information” which resulted in the production of a series of Patent Landscape Reports (available on the WIPO website). These reports, prepared in cooperation with various UN Agencies, non-governmental organizations, research institutes and national IP Offices, analyze patent activity in various topics in the areas of public health, food and agriculture, environment and energy, and disabilities. The key findings are often summarized in an infographic.

In 2013 WIPO started working also on awareness raising and capacity building in the area of patent analytics. Apart from various workshops organized on this topic, WIPO published in September 2015 the “Guidelines for Preparing Patent Landscape Reports”.  The Guidelines describe the objectives and motivations for preparing Patent Landscape Reports (PLR) and other types of patent analysis, the tasks associated with patent analytics, as well as the stages in the preparation of PLRs, providing also some insights from WIPO’s experience in the area.

Since 2015 WIPO has been exploring open source tools for patent analytics in the framework of the preparation of a Manual on Open Source Tools for Patent Analytics. Open source tools are typically used in other disciplines – by business and data analysts, statisticians, IT professionals and scientists – rather than with patent data. Nevertheless, in recent years they have started emerging as an alternative and/or a complement to ready-to-use tools, providing flexibility and adaptability in different types of analysis. In view of the programming required by this type of tool, WIPO developed step-by-step instructions in the Manual with example datasets, and will provide capacity building activities with training on patent analytics for Technology and Innovation Support Centers (TISCs) around the world (for more information on the TISC program please visit www.wipo.int/tisc).



15:00 - 15:30

Monitoring and Analysis of Web Information for Various Business Contexts: Competitive Intelligence, Company Information, Knowledge Management, Market Intelligence

A range of business use cases will be presented to illustrate the combined use of several technologies and tools developed by Qwam, including web content crawling and monitoring, advanced information search and retrieval, information analytics and information delivery.

Use cases presented will cover:

  • Automated gathering and extraction of corporate web information for competitive intelligence, CRM systems, lead generation, etc.
  • Building knowledge bases with selected web information for marketing intelligence needs
  • Information analytics of content and usage with knowledge portal applications


15:30 - 16:00

Exhibition and Networking Break

Chair: Christoph Haxel, Dr. Haxel CEM, Austria

16:00 - 16:30

Improving Text Mining Results with Access to Full-Text Scientific Articles

Life science companies increasingly rely on text mining to gain important insights from vast amounts of published information. But researchers struggle to get access to full-text articles for text mining. When they do get the full text, they must contend with multiple formats and inconsistent license terms – all of which inhibit text mining efforts. In this presentation, we will describe the value of mining full-text scientific literature and outline the issues researchers face in accessing and licensing this content for commercial purposes. We will provide a walkthrough of Copyright Clearance Center’s (CCC) RightFind™ XML for Mining solution and contrast this with other approaches to solving these time-consuming content and licensing challenges. CCC is the parent organization of RightsDirect.



16:30 - 17:00

Text Mining – as Normal as Data Mining?

How can we capture information from free text as conveniently as accessing a database? One of the essential differences is the lack of normalisation of terms and concepts in free text. In this talk we will discuss several applications of specialised normalisation solutions. We will show how range search can be achieved over free text – e.g. capturing weights between 60 and 80 kg, whether expressed in kilograms or pounds, for patient selection from EHRs. We will also show how queries can be expressed in an Extraction and Search Language (EASL), which allows programmatic access to unstructured data similar to SQL over structured data. Finally we will show a particular use case where gene mutations have been linked to rare disease progression.
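
EASL itself is proprietary and not shown here. As a minimal sketch of the normalisation step behind the weight example (assuming a simple regex pass over clinical notes), this Python snippet extracts weights stated in kilograms or pounds, converts them to a common unit, and filters to the 60-80 kg range.

```python
import re

# Match a number followed by a weight unit, e.g. "72 kg" or "165 lbs".
WEIGHT = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|kilograms?|lbs?|pounds?)", re.I)
LB_TO_KG = 0.453592

def weight_kg(text: str):
    """Return the first weight mentioned in the text, normalised to kg."""
    m = WEIGHT.search(text)
    if not m:
        return None
    value, unit = float(m.group(1)), m.group(2).lower()
    return value if unit.startswith("k") else value * LB_TO_KG

# Invented free-text notes, as might appear in EHR records.
notes = [
    "Patient weighs 72 kg, stable.",
    "Weight recorded as 165 lbs at admission.",
    "Weight 95 kilograms, advised diet.",
    "No weight recorded.",
]

# Range search over free text: keep notes whose weight is 60-80 kg,
# regardless of the unit the note was written in.
selected = [n for n in notes if (w := weight_kg(n)) is not None and 60 <= w <= 80]
print(selected)  # the 72 kg and 165 lb (≈74.8 kg) notes
```

Once values are normalised like this, a numeric range predicate behaves the same over free text as it would over a typed database column, which is the point of the talk's comparison with SQL.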