Monday 20 April 2015

Chair: Harry Collier and Anne Girad, Infonortics, UK

09:00 - 09:30

WebCrawling / Internet Research: Emancipation from Public Search or Why Boehringer Ingelheim is Betting on Proprietary Search Corpora

Because having to rely on services like Google to access unstructured scientific information has proven to be an obstacle to high-quality Internet research, the Scientific Information Center (SIC) at Boehringer Ingelheim started its BI SEARCHCORPUS project in 2014. SIC has been building a constantly evolving repository of structured and unstructured information about biotechnology companies on an international scale. Sources such as company websites, news and blogs, as well as more specific public information about company products on the market and in development, are constantly crawled, extracted and structured. They serve as the search base used to collect, aggregate and rate scientific information available in proprietary sources and publicly on the Internet, tailored to the specific needs of chemical and biochemical researchers.


09:30 - 10:00

Graph & Maps Visualization Using Open Source Components

This session will introduce open source maps infrastructure, namely OSM (OpenStreetMap), the Free Wiki World Map, as a basis for building scalable visualization platforms that can be deployed in the cloud on infrastructure such as Amazon EC2 or OpenStack. In the new era of the IoT (Internet of Things), map visualization gives managers an instant overview of their live data at a fraction of the cost of legacy maps platforms.

This session also includes a presentation of Thermolabo and its innovative pharmaceutical data-logging platform dedicated to the traceability of transport conditions. It will show how map visualization, instant data analysis and alerting on a Hadoop platform could be a path forward to streamline data collection and processing.

10:30 - 11:00

Exhibition and Networking Break

Chair: Phil Hastings, Linguamatics, UK

11:00 - 11:30

Bridging the Gap: Best Practices for Operationalizing Predictive Analytics across your Organization

Organizations often build and deploy transformative predictive analytics platforms with great excitement, but are frustrated that analytical insights are not fully adopted in day-to-day decision making. The greatest challenge is in the translation of predictive insights into clear business objectives. This session provides best practices for “bridging the gap” between the data scientist and the business decision maker to ensure that analytics is embedded in your business decisions.  

In The Data Warehousing Institute (TDWI) Best Practices Report on Predictive Analytics, respondents stated that operationalizing analytics was one of the most important tasks of a predictive analytics platform. However, the same study showed that only 25% of those respondents believed that data drives their companies’ day-to-day decision making. A major issue in operationalizing predictive analytics is the handoff between the data scientist and the business decision maker. In this session, I will go over best practices to ensure that this handoff is seamless and will share examples of success stories in operationalizing analytics. These best practices include:

  •  Transforming predictive model results into tangible business insights
  •  Use of model scoring to obtain granular level customer, supplier, and transaction information
  •  Use of visualization, mobile, and social technologies to alert and inform the business user
  •  Measuring the true impact of embedding predictive analytics into business processes
  •  Creating organizational structures, training programs, and incentives to ensure adoption


11:30 - 12:00

Smart City: A Patent Thematic Analysis

Smart cities are currently gaining a lot of attention as a new way of thinking about cities in a context of growing urban populations. In Europe, almost three quarters of the inhabitants live in cities, and one of the greatest challenges facing the EU is how best to design and adapt cities into smart, intelligent and sustainable environments. The concept is used all over the world in different contexts and with different meanings. This study uses the definition given by the Commissariat Général au Développement Durable (2012), which includes the description of the European Initiative on Smart Cities. This definition considers seven main topics: transportation, smart networks, centralised services management, smart buildings, digital spaces, information systems, and urban functionality.

Based on this definition, smart city technologies are the object of analysis in this study, which attempts to answer the following question: What does the landscape of “smart city” technologies look like? To describe this landscape, we use the April 2014 version of the EPO Worldwide Patent Statistical Database (PATSTAT), which allows an analysis along several dimensions, namely patent classifications and territorial dimensions.


12:30 - 14:00

Lunch, Exhibition and Networking

Chair: Jane List, Extract Information, UK

14:00 - 14:30

An Overview of the Enterprise Search Engine Market & Current Best Practices

Today’s leading enterprise search engines provide reliable, scalable and detailed functionality, and these are indicators of a maturing market. So, although some functional differences remain, core capabilities are generally good. The growing use of cloud-based repositories complicates the picture, and other aspects of building and sustaining search excellence are becoming more important than core technology. This presentation will provide a brief summary of the leading players in the enterprise search market, and will discuss how techniques and approaches to search applications are evolving. It will conclude by discussing future architectures for enterprise search, taking account of the increasing involvement of search with “big data.”

14:30 - 15:00

Analyzing and Visualizing Information: Tools and Brains

How can we process the increasing amounts of information? What are we expecting from tools dedicated to information research and analysis?

This presentation offers feedback on the evolution of tools (Orbit, Digimind, Intellixir, WoS...) and raises questions about the way we use them and the way we provide information to our customers.


15:30 - 16:00

Exhibition and Networking Break

Chair: Nils Newman, Search Technology, USA

16:00 - 16:30

Text Mining: it's about Time

Text mining improves time to insight. There are several ways in which timeliness is achieved. Firstly, by use of automation, data can be dealt with immediately as it comes in. This is seen in the case of workflows for competitive intelligence which push relevant information to end-users as data sources get updated. Secondly, it is achieved by more efficient, faster, and more uniform ways to deal with large quantities of both unstructured and, increasingly, structured information. Finally, we will look at how we can deal with mentions of time within documents themselves, reporting on the i2b2 competition for extracting current patient information.

16:30 - 17:00

Solving the Content Retrieval and Licensing Conundrums for Text and Data Mining

Over the last few decades, there has been a confluence of events that makes text and data mining (TDM) not just feasible but extremely valuable to a number of industry segments. On the one hand, significant progress has been made in the fields of natural language processing (NLP), data mining, and machine learning. On the other hand, technologies that can support the velocity, the variety, the volume, and the veracity of text and data have been developed and have begun to spread across the industry. However, in order to fully exploit the potential of that confluence, we must overcome a number of practical hurdles.


In this presentation, we will describe a number of challenges that surface when one tries to take advantage of the technological advances in the area of text and data mining, and we will outline solutions for two specific problem categories, namely content retrieval and licensing.


17:00 - 17:30

Taming Patent Data with Elasticsearch

A brief introduction to Elasticsearch and how one can utilise the technology to build a web-based service on top of patent data.

18:30 - 20:00

Welcome Reception - Cocktails and Finger Food