Monday 14 Oct

Starts at 09:00

Welcome

09:15 - 12:30

Chair: Wendy Warr, Wendy Warr & Associates, UK

Text and Non-textual Objects: Seamless access for scientists

Uwe Rosemann (German National Library of Science and Technology (TIB), Germany)

The European High Level Expert Group on Scientific Data has formulated its vision of a scientific infrastructure to be achieved by 2030: “Our vision is a scientific e-infrastructure that supports seamless access, use, re-use, and trust of data. In a sense, the physical and technical infrastructure becomes invisible and the data themselves become the infrastructure – a valuable asset, on which science, technology, the economy and society can advance”.

Here, “data” is not restricted to primary data but also includes all non-textual material (graphs, spectra, videos, 3D-objects etc.).

The German National Library of Science and Technology (TIB) has developed a concept for a national competence center for non-textual materials, which is now funded by the German federal government and the German federal states. The center's task is to develop solutions and services together with the scientific community that make such data available, citable, sharable and usable, including visual search tools and enhanced content-based retrieval.

With solutions such as DataCite and modular development for extraction, indexing and visual searching of new scientific metadata, TIB will accept the challenge and will make all data accessible to its users in a fast, convenient and easy-to-use way.

The paper shows which special tools TIB has developed for scientific AV media, 3D objects and research data.

Link to presentation @ Slideshare

The Big Data Challenges Associated with Building a National Data Repository for Chemistry

Presentation as pdf

Antony Williams (Royal Society of Chemistry, USA)

At a time when the data explosion has simply been redefined as “Big”, the hurdles associated with building a subject-specific data repository for chemistry are daunting. With the need to combine a multitude of non-standard data formats for chemicals, related properties, reactions, spectra etc., to navigate the confusion of licensing and embargoing, and to provide for data exchange and integration with services and platforms external to the repository, the challenge is significant. All this comes at a time when semantic technologies are touted as the fundamental technology to enhance integration and discoverability. Funding agencies are demanding change, especially a change towards access to open data to parallel their expectations around Open Access publishing. The Royal Society of Chemistry has been funded by the Engineering and Physical Sciences Research Council (EPSRC) of the UK to deliver a “chemical database service” for UK scientists. This presentation will provide an overview of the challenges associated with this project and our progress in delivering a chemistry repository capable of handling the complex data types associated with chemistry. The benefits of such a repository in terms of providing data to develop prediction models to further enable scientific discovery will be discussed and the potential impact on the future of scientific publishing will also be examined.

Link to presentation @ Slideshare

New Product Introductions - InfoChem / BizInt / Minesoft

Link to new product introduction @ Slideshare

InfoChem

BizInt

Minesoft

10:30 - 11:00

Exhibition and Networking Break

The Pistoia Alliance and HELM – an open information standard for the molecular representation of Biologics

Alex Drijver (Pistoia Alliance, USA / ChemAxon, Hungary)

Sergio Rotstein (Pfizer, USA)

Tianhong Zhang (Pfizer, USA)

Presentation as pdf

Roland Knispel (ChemAxon, Hungary)

The share of biopharmaceutical substances, or Biologics, in pharma R&D portfolios has grown substantially in recent years. Current informatics tools, which have been primarily designed to work with small molecules and unmodified, unconjugated biological sequences, lack capabilities for supporting the molecular complexity of novel biotherapeutic agents, such as antibody drug conjugates (ADCs), which commonly incorporate unnatural, conjugated or otherwise modified building blocks. The Hierarchical Editing Language for Macromolecules (HELM) was developed by Pfizer researchers to address this gap (see J. Chem. Inf. Model. 2012, 52, 2796-2806) and is now proposed as a versatile approach for general Biologics handling.

The Pistoia Alliance, a diverse organization aiming to foster pre-competitive, collaborative innovations for improved R&D processes, has initiated the HELM project to promote HELM as an industry-wide information standard for describing Biologics. Widespread adoption of HELM is anticipated to reduce costs substantially and improve Biologics R&D efficiency in increasingly collaborative and cross-functional environments. An initial standard definition document was created and an open source reference implementation, comprising a notation toolkit and a graphical bio-molecule editor, was released. Because HELM is an open standard, extensions to it will be driven by the evolving needs of its user base. Discussion and suggestions for extending HELM are encouraged to ensure the relevance and long-term stability of this standard.

This talk is intended to draw attention to this project and will briefly touch upon achievements since its initiation in Q4 2012. We will describe the current capabilities and limitations of the HELM standard and elaborate on the expected impact of HELM in improving and consolidating biological information management, information exchange and registration solutions, both from the perspective of the pharmaceutical industry and from that of vendors of information management solutions.

Link to the presentation @ Slideshare

European Lead Factory – A unique public-private partnership

Colm Carroll (Innovative Medicines Initiative)

Presentation as pdf

Hugh Laverty (IMI, Belgium)

The European Lead Factory is a pan-European platform for drug discovery supported by the Innovative Medicines Initiative (IMI) that is set to give a major boost to drug discovery in Europe. Comprising a collection of half a million compounds (derived from new public and existing private company collections) and a screening centre, the European Lead Factory will offer researchers in academia, small and medium-sized enterprises (SMEs) and patient organisations an unprecedented opportunity to advance medical research and develop new medicines. The screening centre will provide High Throughput Screening (HTS) services for selected public projects from academia and SMEs. It will also handle all logistics for the Joint European Compound Collection, acting as a neutral, ‘honest’ broker in the transfer, handling and analysis of confidential data.

The European Lead Factory will provide the compounds and support for 48 HTS screens per year. Of these, 24 will come from the industry partners, who will run their own screens. The other 24 HTS projects will be selected from the public sector following competitive Calls for proposals. Once the HTS has been run, the ‘target owner’ (i.e. the organisation that submitted the target for inclusion in the project) will receive a list of a maximum of 50 compounds that have been identified. The project hopes to attract public screening proposals in a variety of therapeutic areas.

A legal framework covering all these activities has been put in place to protect the confidentiality of key data, ownership and access rights, as well as allowing the generation of value and the filing of intellectual property. Importantly, this legal framework tries to accommodate the interests of all the stakeholders, both public and private.

The European Lead Factory combines the power of the pharmaceutical industry’s previously inaccessible compound libraries with the innovation of the academic communities in designing novel compounds and the expertise of many SMEs in HTS and library generation. Importantly, it will provide a screening platform of industrial quality focused on value generation.

Link to presentation @ Slideshare

New Product Introductions - max.recall / LexisNexis / Linguamatics

New product introductions @ Slideshare

max.recall

LexisNexis

Linguamatics

12:30 - 14:00

Lunch, Exhibition and Networking - Lunch sponsored by Linguamatics

14:00 - 17:40

Chair: Monika Hanelt, Agfa Graphics, Belgium

Open Source Search

Andreas Pesenhofer (max.recall, Austria)

Presentation as pdf

Helmut Berger (max.recall, Austria)

Open source search technologies are receiving more and more attention for a growing range of applications. While initially developed for the sole purpose of search, they are increasingly being used to power analytics applications that perform complex operations on large amounts of complex data.

With the ever-increasing amount of data available on many levels (e.g. personal, team- or company-wide, or global), search is often the only way to get access to the information actually needed. Given the value of this information, it is all the more important to have full control over how it is indexed, a fundamental property that open source search technologies are able to provide, in contrast to many proprietary solutions.

This presentation provides an overview of what can be done with Lucene and Lucene-based search engines such as Solr and ElasticSearch, the latter recently receiving more attention in the light of cloud-based scale-out solutions. These open source projects have reached a state of maturity and commercial support that enables them to compete with, and already replace, proprietary solutions of established vendors.

Link to presentation @ Slideshare

Untangling the scientific information web

Presentation as pdf

Jane List (Extract Information, UK)

Professional users of information expect their sources to be reliable, secure, complete and current. Microsoft Academic Search, Google Scholar, and Elsevier’s Scirus all provide access to academic scientific, technical, legal and medical information. All three provide search of academic journal articles, theses, ebooks, abstracts, and conference papers from academic publishers, universities, and professional societies. This paper will review the three search engines by comparing inclusion criteria, ranking methodology, citation analysis, link analysis, visualisation, social media and collaborative tools. Could a professional search service depend on Elsevier, Google or Microsoft to straighten out the tangled web of published academic information? More recently, reference management tools such as Mendeley and Qiqqa have entered the scientific search arena. These tools are rapidly growing their collections of cited articles, which users can access through social networking linkages and user-added search tags. This paper will conclude by considering the searching opportunities offered by reference management products: could they offer users a real alternative to the ranked-results search engine?

Link to presentation @ Slideshare

New Product Introductions - Dolcera / RightsDirect / ChemAxon / GenomeQuest

New product introductions @ Slideshare

Dolcera

RightsDirect

ChemAxon

GenomeQuest

15:40 - 16:10

Exhibition and Networking Break

Extraction of structural information from ChemDraw CDX files: easy, or an underestimated, difficult challenge?

Presentation as pdf

Josef Eiblmaier (InfoChem, Germany)

In the past decade various systems for the automatic identification and extraction of chemistry-related information from unstructured sources have emerged. They have opened up new possibilities for organizing, querying, and analyzing chemical content to support the research and development process. Patent authorities and scientific publishers make available, on a large scale, not only full text and images, but also ChemDraw CDX files for many sources. The chemical information contained in these CDX files is primarily intended for layout purposes for publications but it is often erroneously considered to be readily available as input for structure and reaction database building processes. Unfortunately, automatic work-up of chemical structures and reactions from these CDX files entails serious obstacles and problems and consequently the information produced is often incorrect or incomplete and thus not properly available to information professionals via structure and reaction searching. This talk will present different approaches to extracting reactions and structures correctly from CDX files and will describe the main difficulties and drawbacks encountered.

Link to presentation @ Slideshare

Making hidden data discoverable: How to build effective drug discovery engines?

Presentation as pdf

Sebastian Radestock (Elsevier, Germany)

In a complex IT environment comprising dozens if not hundreds of databases, and likely as many user interfaces, it becomes difficult if not impossible to find all the relevant information needed to make informed decisions. Historical data get lost, non-normalized data cannot be compared, and maintenance becomes a nightmare. We will discuss a new approach to this issue, using various examples and use cases to show how in-house data and public data can be integrated to address the unique and individual needs of companies that want to keep their competitive edge.

Link to presentations @ Slideshare

Personal Data in the Digital Age - Big Brother on Steroids?

Wolfie Christl (Cuteacute, Austria)

In the digital age virtually everything we do is recorded, monitored or tracked in some way. Consequently, the processing and exploitation of personal data has become part of all areas of life. The quantity and the value of the many different types of personal data being collected today are vast: our profiles and demographic data, from bank accounts to medical records to employment data; our Web searches, the sites we have visited, our likes and dislikes and purchase histories; our tweets, texts, emails, phone calls and photos, as well as the coordinates of our real-world locations. State-of-the-art data mining technologies help to analyze and combine these massive amounts of personal data, and emerging businesses in social media, mobile apps and online marketing increasingly make commercial use of this data. According to former European Consumer Commissioner Meglena Kuneva, “personal data is the new oil of the Internet and the new currency of the digital world.” At the same time, most people have insufficient knowledge about what may happen to their personal data when using contemporary ICT, and many are stuck between uncritical usage and apathy towards the increasingly non-transparent aggregation of personal data. As a result, the project “Data Dealer” was started. “Data Dealer” is an online game about collecting, collating and selling personal data, which provides a unique, casual and humorous way to engage with issues of personal data and privacy.

No slides available ... but there was a similar presentation in Vienna (tedxVienna)
