Monday 14 April 2014

Predictive Analytics and the Big Data Challenge

With examples from life sciences and gene expression research, this presentation explores the impact of Big Data and In-Memory technologies on analytics, and discusses the possibility of a new reality where accurate prediction is an affordable commodity available to everyone. How will that change our world?

Predictive analytics is about analysing known facts to make predictions about unknown events. What if we knew absolutely everything that ever happened and every bit of that data was available instantaneously; would that enable us to make more accurate predictions?

Organising Data: The step before visualisation

Nils Newman (Search Technology, USA)

When working with large collections of text -- patent, scientific publication, other, or a combination -- creating usable order out of the mass of information before you poses a real challenge, particularly for topical content. This presentation looks at the fundamental issues related to analysing text and summarises some of the key machine learning approaches to create ‘order out of chaos.’ It also addresses techniques ranging from Natural Language Processing to statistical models, such as Topic Modelling, and discusses hybrid approaches such as ‘Term Clumping’, which use a mixture of computer algorithms and human-aided processes. These sets of techniques will be placed in a framework to help analysts determine the best approach for their type of data and the impact of their decision on the visualisation process downstream.

Innovation in Building Materials: Implementing a collaborative intelligence platform in India using Digimind

Carla Monfray (Lafarge, France)

Nelly Gilibert (Digimind, France)

This presentation introduces Digimind and its collaborative intelligence platform, and reports on the experience of deploying it in India: the key steps, success factors, difficulties and recommendations.

Lafarge’s Construction Development Laboratory (CDL), based in India, asked the Group Technology and Competitive Intelligence Department to implement a collaborative watch with its teams. The goal was to detect new building products and technologies, Indian competitors and architects’ initiatives in the country. This project was Lafarge’s first Digimind deployment outside France, which made it a real challenge. The project lasted three months, and the CDL team succeeded in issuing news alerts and creating specific deliverables.

Standing on the Shoulders of Giants: New strategies to involve more of the world with data mining and intelligence extraction

Steven M. Muskal (Eidogen-Sertanty, USA)

Over 70% of the world's population, some 5 billion people, will routinely use mobile devices in the next few years. In this age of immediate access and connectivity, an unprecedented “global conversation” and capability has arisen, creating many new opportunities for intelligence extraction. Unfortunately, the number of people developing cloud-based mobile applications geared towards data capture and data mining is far too small to meet the demand. To this end, we have been collaborating with Accelrys over the last year to extend the server-side Pipeline Pilot framework to interface with various hand-held mobile device functions, including image capture, upload and annotation, geo-location tagging, audio and video capture, and other forms of data capture. By enabling mobile devices through server-based pipeline technologies, we hope to simplify mobile-cloud development and engage larger populations in the quest to collect and extract intelligence from big data more readily.

Development of Reports and Visualisations to Facilitate the Analysis and Transformation of the Clinical Trial Landscape into Actionable Intelligence

Diane Webb (BizInt Solutions, California, USA)

Jasen Chooramun (AstraZeneca, UK)

This presentation describes a project to identify and develop new reports and visualizations to address clinical trial intelligence challenges facing pharmaceutical companies, including:

how can we best identify and evaluate key competitor compounds?
how can we understand the development cycle for a compound, including trial phases, countries and outcomes?
what can we learn from successful -- and especially unsuccessful -- competitor trials?
how can we identify partnership opportunities -- especially with academic centres and institutes?

In this paper, we show how we identify key business issues, select relevant data sources, and then develop targeted reports and visualizations to help decision makers evaluate threats and opportunities.

Patent Valuation: Building the tools to extract and unveil intelligence and value from patent data

Laurent Hill (Questel, France)

Patent tools are now expected to answer far more complex business questions, and within a shorter turnaround, than ever before. This presentation shows how, by combining high-quality patent content (such as invention-based patent families, normalised legal status, company names, citations, dates, numbers and litigation) with sharp similarity algorithms that incorporate patent searchers’ best practices and state-of-the-art semantics, one can produce sets of valuation metrics that are adjustable to company and industry specificities.

Network Visualisation - unpicking the knot

Joseph Parry (Cambridge Intelligence, UK)

This presentation discusses the science of information visualisation and how it applies to networks, with many inspiring pictures and interactive demos.

We live in a world of connected networks. Everyone and everything is connected through ubiquitous networks that operate on many different levels. There are conceptual networks of ideas, social networks of people, communication networks and, of course, networks right down to the physical layer of device inter-connectivity. We know that the behaviour of people and nodes is fundamentally governed by their network location. Near neighbours and structural positioning govern both access to information and influence over other nodes. Because of this, understanding network structure is a critical concern in a growing number of fields, as diverse as anti-fraud, PR, corporate sales and IT infrastructure management. To fully understand a network one needs to both see it and interact with it. However, building these kinds of visualisations is difficult – both technically and from a design perspective. The talk will contain bad (and hair-raisingly awful!) examples as well as good ones, to uncover the fundamental design principles and give practical advice for those wishing to pursue a more visual approach to interacting with networks.

Recommender Systems for Analysis Applications

Roger Bradford (Agilex Technologies, USA)

This presentation provides an overview of recommender systems for analytic applications. Topics covered include methods for expressing user interests, techniques for comparing user interests and items, dealing with noise, suppressing redundant data, highlighting novelty, providing explanations of recommendations, and enhancing user trust. Although the talk deals primarily with high-level aspects of these systems, it does include some data on performance and scalability issues.

In recent years, recommender systems have come to play a major role in online commercial applications. For example, in 2012, Netflix, the world’s largest movie rental company, reported that 75% of its movie rentals were generated through its recommendation engine. Today, recommender systems provide advice to users on a wide variety of topics, including entertainment, consumer products, investments, travel itineraries and even jokes.

One of the most recent developments in this area has been the implementation of recommender systems to aid users in the conduct of analytic work, such as activity monitoring and legal investigations. These systems differ from transaction-oriented commercial recommenders in several important ways. In particular, recommenders used in commerce typically employ a “wisdom of crowds” strategy. The large retailer Amazon, for example, develops recommendations for products based on the purchase history of over 100 million customers – a highly successful big data application. In analytic applications, there typically are few users, so recommendations must be based on a different strategy. In practice, they are based on the content of the items being recommended.
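The content-based strategy described above can be illustrated with a minimal sketch. It is not taken from the presentation: it assumes a plain bag-of-words representation and cosine similarity, and the function names (`term_vector`, `cosine`, `recommend`) and the sample documents are hypothetical, invented purely for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(interest, items, top_n=2):
    """Rank items by how closely their content matches the stated interest.

    No other users' behaviour is consulted: only the text of each item.
    Items with zero similarity are never recommended.
    """
    profile = term_vector(interest)
    scored = [(cosine(profile, term_vector(text)), item_id)
              for item_id, text in items.items()]
    # Highest score first; ties are broken by item id for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [item_id for score, item_id in scored[:top_n] if score > 0]


# Hypothetical documents, invented for illustration only.
ITEMS = {
    "doc1": "wire transfer flagged for fraud review",
    "doc2": "quarterly sales report for retail stores",
    "doc3": "fraud investigation of offshore wire transfer",
}

print(recommend("wire transfer fraud", ITEMS))
```

Because the ranking depends only on the analyst's stated interest and the item text, it works with a single user, which is the situation the abstract describes for analytic applications. A production system would likely use a richer representation than raw term counts.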

The Road to Federated Text Mining: Are we there yet?

Guy Singh (Linguamatics, UK)

This presentation looks at how a wide variety of document types, along with a range of ontologies and vocabularies spread across multiple servers, can be used for text mining. Instead of normalising and integrating the data, we leave it in multiple locations on the network and in heterogeneous form. A single search that covers multiple data sources, regardless of where the data resides on the network, has become even more important as data continues its rapid and distributed growth. Extending this approach to NLP-based text mining introduces additional obstacles due to the varied semantic and structural content of documents.

This presentation looks at the challenges associated with providing such a capability and outlines approaches to address them.

Insights into a Next Generation IP and R&D Dashboard

Jeroen Kleinhoven (Treparel, The Netherlands)

Together with Evalueserve, Treparel has released a source-neutral, web-based dashboard designed for easy client use in IP and R&D environments. This presentation discusses the results and reactions from the launch customers and their experiences of working with it.