Tuesday 15 April 2014

Hybrid Intelligence – foresight for opportunities

Youri Aksenov (Philips, The Netherlands)

The daily routines of adaptable entities involve predicting how the environment is evolving, what new things might happen, and where it will all end up. By drawing on laws of nature and behaviour and integrating the necessary information about a given society or system, it is possible to recognise patterns that will be completed some time in the future.

The presentation looks at a framework that facilitates the foresight efforts of commercial entities. The framework exploits an interlaced co-operation of human and machine intelligence and illuminates trends in technologies, innovation and business opportunities that might have a chance in the future. The core of the framework comprises structuring, scouting for information, analysing, decision making and visualising. Patent and other IP information is used as a scaffold for the system. The design of the scaffold resembles and accords with the human approach to commercial innovation, including parts of new business creation thinking. The mass of intelligence built up in this manner opens new dimensions and facilitates decision making by a variety of stakeholders. The presentation opens discussion on the framework itself, on the interfaces for co-operation between human and machine intelligence, and on the requirements placed on machine abilities in analysis, decision making and visualisation.

Search and Data Mining Open Source Platforms

Patrick Beaucamp (Bpm-Conseil, France)

This presentation discusses the technologies available to deploy search applications and statistical/data mining applications using Learning Machines and Open Source Components. Global Statistical Approach and Simulation/Prediction will be covered as well as the learning machine concept, constraints and implementation. Business cases from large companies are discussed, to show how they are using Open Source technologies and Cloud infrastructure to deploy data mining applications at reasonable cost.

Design and development of a novel Patent Alerting Service

Wolfgang Thielemann (Bayer, Germany)

Although there are multiple commercial patent alerting products or patent sources with alerting functionality available, we decided to develop a proprietary system which combines categorisation of patents with content enrichment through text mining. This talk focuses on the design and functionality of this novel proprietary system which has recently been launched at Bayer HealthCare.

Automated Relevancy Check of Patents and Scientific Literature

Philipp Daumke (Averbis, Germany)

Katrin Tomanek (Averbis, Germany)

In the era of Big Data and the explosion of data volumes of all kinds, organisations seek to leverage such data - be it patent information, research literature, social media data etc. - for competitive advantage and to help achieve strategic aims. A strategic task is to filter out relevant knowledge from large amounts of data and to provide information specialists with accurate and timely information. Traditional approaches try to filter such information using simple keyword searches, the results of which are subsequently assessed for relevance by information specialists (IS). However, the search and filter requirements of IS typically go far beyond simple keyword search. Text mining is a promising approach to support more fine-grained analysis of data. Nevertheless, defining a detailed search strategy is still a tedious task, and not all rules can be made explicit and translated into text-mining approaches. Therefore, there is a pressing need for intelligent decision support systems that assist human experts in this complex filtering and categorisation task and at the same time learn from previous expert relevance judgments to speed up the whole process.

In this presentation we describe a largely automated process to assess the relevance of literature (patents, scientific publications, etc.) with respect to the strategic direction of the organisation. Where possible, the system needs to decide automatically whether literature is relevant or not. Critical cases, however, are handed to a human to achieve a high precision rate. Our approach follows a typical four-step screening workflow, and the elements of the methodology are described: scope definition, filtering, assessment and visualisation.
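The split between automatic decisions and human review can be illustrated with a minimal sketch. It assumes that some classifier has already assigned each document a relevance score between 0 and 1; the `Document` class, the `triage` function and the two thresholds are hypothetical names and values chosen for illustration and are not taken from the system described in the abstract.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    score: float  # hypothetical relevance score in [0, 1] from some classifier


def triage(docs, low=0.2, high=0.8):
    """Split documents into auto-relevant, auto-irrelevant and human-review lists.

    The thresholds are illustrative assumptions, not values from the abstract.
    """
    relevant, irrelevant, review = [], [], []
    for doc in docs:
        if doc.score >= high:
            relevant.append(doc.doc_id)    # confident enough to accept automatically
        elif doc.score <= low:
            irrelevant.append(doc.doc_id)  # confident enough to reject automatically
        else:
            review.append(doc.doc_id)      # critical case: hand to a human expert
    return relevant, irrelevant, review


docs = [Document("a", 0.95), Document("b", 0.05), Document("c", 0.5)]
relevant, irrelevant, review = triage(docs)
```

Narrowing the gap between the two thresholds sends fewer documents to human review at the cost of more automatic errors, so in practice the values would have to be tuned against expert judgments.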

Analysing Patent Full Text – Comparison against analysis of abstract and bibliographic data, and lessons learned

Richard Gynn (LexisNexis, UK)

Over recent years we have seen a shift towards full text data in patent searching and towards machine translations that help us search across this international content using a single language. For example, five to ten years ago we were relying on bibliographic data and abstracts, often in languages other than English, for countries whose full text is now available in English. This has increased the possibilities for comparison between, and analysis across, full text content from a range of different authorities. Rather than critiquing actual tools, we will take a look at the data behind the analysis and use a range of analytical tools, on example datasets, to compare what is uncovered through bibliographic analysis and text mining of abstracts with what is uncovered through mining the full text. What are the benefits of mining full text, and what lessons have been learned?

A New Approach to Flexible, Meaning-Rich Document Parsing

Paul Barba (Lexalytics, USA)

Through the use of decomposed matrices/tensors, we are able to take an unsupervised learning approach that is flexible across multiple corpora and application spaces to extract unambiguous meaning. Examples of ambiguity will be described. Across a large document corpus, you detect which phrases are used commonly and which only show up as the occasional possible parse.

What we do is run a ‘chunker’ or shallow parser to identify the phrase units in a document. This is a straightforward, fast process that can identify simple grammatical units, such as noun phrases, but does not know how they relate to each other. Then, a few simple rules state what links could possibly exist between the phrases. A verb usually has a subject and an object. A conjunction can let two words share a relationship, but does not have to. With these, we can generate all the possible parses for each document in a giant corpus and record how often word and phrase pairs and triples occur as possible parses. As we pour more and more data in, real, human parses of a sentence start to show up more often than incorrect ones, and we end up with a parser that used no hand-annotated training data. This means we can quickly add parsers in new languages, or parsers that are specialised to individual domains. This will have huge implications for accuracy and context sensitivity for anyone looking for advanced text mining.
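The counting step described above can be sketched in a few lines. This is only a toy illustration under stated assumptions: the corpus is given as already-chunked sentences of `(tag, text)` pairs, the single linking rule is "any verb phrase may link to any noun phrase in the same sentence", and the names `candidate_links` and `count_links` are hypothetical. It does not reproduce the matrix or tensor decomposition mentioned in the abstract.

```python
from collections import Counter


def candidate_links(chunks):
    """Yield every (verb phrase, noun phrase) pair the simple rule allows."""
    verbs = [text for tag, text in chunks if tag == "VP"]
    nouns = [text for tag, text in chunks if tag == "NP"]
    for verb in verbs:
        for noun in nouns:
            yield (verb, noun)


def count_links(corpus):
    """Count in how many sentences each candidate link appears."""
    counts = Counter()
    for chunks in corpus:
        # A set ensures each link is counted at most once per sentence.
        counts.update(set(candidate_links(chunks)))
    return counts


# Hypothetical chunker output for three short sentences.
corpus = [
    [("NP", "the dog"), ("VP", "chased"), ("NP", "the cat")],
    [("NP", "the dog"), ("VP", "chased"), ("NP", "a ball")],
    [("NP", "a boy"), ("VP", "saw"), ("NP", "the dog")],
]
counts = count_links(corpus)
```

Even in this tiny example the link between "chased" and "the dog" is seen twice while the others are seen once, which hints at how recurring, plausible links could come to outweigh accidental ones as the corpus grows.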

The Challenges of Managing “Big Data” in the Patent Field: Patents for business

Olivier Huc (Minesoft, UK)

This paper presents the challenges of maintaining a global patent information database – how to ensure quality in the face of an ever-increasing quantity of patent data, and how to enable users to extract meaningful insights for business decisions.

As international patent filings increase steadily year on year, we discuss how a 35 terabyte relational patent database tackled issues of availability of patent data, variations in quality depending on the source and variable formats, and how we continue to implement automated and manual quality control processes.

With patent data retrieved from over 100 patent issuing authorities around the world, language presents another significant challenge. We discuss how pre-machine translated text, on-demand machine translations and cross-lingual search tools can help overcome language barriers, and we highlight areas that present particular challenges such as non-Latin text.

While having access to quality patent information is important, today’s users are demanding more from a patent database. Organisations need to be able to extract relevant and meaningful information and visualise trends from patent data in order to make strategic decisions. Since patent documents are legally and scientifically drafted documents that are designed to be difficult for the lay person to understand, it is vital that tools exist to help interpret patent search results.

Finally, a crucial element of patent information is Legal Status, since it is used to determine, for example, whether an application has been rejected or granted. Legal Status has long been problematic for patent database providers, with sparse data availability and collection creating a barrier to providing up-to-date information, and differing patent laws in virtually every country making it hard to create a homogeneous product. We discuss steps we have taken recently to overcome these problems, and the challenges that remain.

Competitive Positioning and Technological Complementarity in the Case of Hydraulic Fracturing: a combination of scientific and technological approaches

Jean-Michel Careil (Intellixir, France)

The exploitation of shale gas resources is currently at the heart of a European debate. Beyond the energy or commercial potential of shale gas, it is in fact the main production technique that is controversial: hydraulic fracturing raises several environmental concerns. Patent databases, however, list no fewer than 4,000 patents worldwide related to hydraulic fracturing, suggesting a true economic value of this technology. Patent applications have sharply increased over the past three years. In what way do current debates impact the propensity to innovate in Europe? Can patent analysis reveal major evolutions regarding the environmental issues raised? More generally, answering the last question will provide qualitative and quantitative analyses of the competitive environment in which companies in the sector evolve and innovate.

To this end, we combine two complementary sources of information: patents (Questel) and scientific publications (Web of Science). Intellixir is used as an analysis tool to study the specific characteristics and positioning of those involved. Finally, the combination of Gephi and its extension Sigma JS produces a global mapping of those involved and of their technological proximities, allowing for an efficient understanding of the results.

Patent Intelligence with Bibliographic, Legal Status and Patent Register Data: How patent statistical analyses can help to improve services

Christian Soltmann (European Patent Office – Austria)

Case studies will be presented which illustrate how patent intelligence can be based on data from the PATSTAT database developed by the European Patent Office to disseminate bibliographic, legal status and patent register information. The case studies aim to clarify how relevant patent data can be processed using various available statistical tools, ranging from user-friendly software such as Tableau to flexible and diverse tools such as R and KNIME, to provide meaningful information.

Patent intelligence can play an important role in improving decision making in companies as well as in organisations. It can back technology management in small and medium-sized enterprises and strategic decisions of economists and other experts on the political level. Potential fields of application of patent intelligence range from the assessment of new technologies and the identification of potential competitors, to clarifying IP strategies in specific markets.

The Digital Workplace – The death of desktop search?

Martin White (Intranet Focus, UK)

For fifty years the modus operandi of search has been a user sitting in front of a terminal, conducting a search and sending documents on to colleagues by email. This will no longer be the dominant use pattern. Physical and virtual team working will result in a shift towards collaborative search models. In addition, search sessions may well be started on a laptop, shared using a tablet or an Electronic Laboratory Notebook (ELN) and then completed on a smartphone. This presentation looks at the implications for search of the evolution, albeit slow at present, of the digital workplace. It will include survey results from the 2014 global Digital Workplace Trends report of NetJMC and summarise the outcomes of recent research. The objective is to set out the role that search will play in distributed virtual working environments, and to assess the technical, managerial and governance implications.