Monday, 10. October 2022

Conference starts at 09:00

Opening by Christoph Haxel (Dr. Haxel CEM, Germany)

Big data analytics platform at Bayer – Turning bits into insights

What if there were a platform where literature, conference abstracts, patents, clinical trials, news, grants and other sources were fully integrated? What if the data were harmonized, enriched with standardized concepts and ready for analysis? After building our patent analytics platform, we didn’t stop dreaming and built our big data analytics platform by semantically integrating text-rich scientific sources. In my presentation I will talk about what we built and why we built it. And, of course, I will also address the challenges and hurdles along the way. Was it worth it and what comes next? Let’s talk about it!

Possibilities and limitations of AI-boosted multi-categorization for patents, scientific literature, and web

The everyday use of AI-driven algorithms for data search, analysis and synthesis brings important time savings, but also reveals the need to understand and accept the limitations of the technology. Practical deployments on concrete topics are indispensable for assessing and managing the challenges of neural-network-based AI. A workshop report.

New Product Introductions: CENTREDOC / Lighthouse IP / Copyright Clearance Center

10:45 - 11:15

Exhibition and Networking Break / Breakout Session

Where’s the one about…? Looney Tunes® Revisited

How do you find video when you only have sparse data? While you can wander the stacks (if you can still find open stacks) for inspiration, video, whether physical or digital, is difficult to discover. Wandering the virtual stacks is, well, virtually impossible. Discovery platforms on the whole have not replicated the inspirational experience of wandering the stacks.

More companies are using archivable video for internal communication of the various research projects, product developments, test results, and more that are being considered, in progress, or completed.  Showing how an experiment was conducted can convey considerably more information that is very difficult to communicate via text.  How do you find a company video that might be helpful for your project? 

A case study is presented of the problems and the solutions that were implemented by a large, multinational chemical company. A suite of content discovery technologies was used, including a video-to-text-to-tagging system connected to their documents database and automatically indexed using several chemical as well as conceptual systems (rule-based, NLP, inference engine). To build the system and support manuscript and video submission, there is a metadata extraction program which pulls the metadata and inserts it into the submission forms so the author can move quickly through that process.



AI developments and usability

New Product Introductions: Search Technology/VantagePoint

12:30 - 14:00

Lunch, Exhibition and Networking

Scientific publishing in the age of data mining and artificial intelligence

Most scientific journals request that the complete set of research data be published simultaneously with the peer-reviewed paper. The research data is usually published as so-called "Supplementary Material" attached to the original paper, or in a "Research Data Repository". Both forms have in common that the data is usually published unstructured and not in a uniform machine-processable format. This makes its further use in electronic tools for AI or data mining unnecessarily difficult or even impossible. A concept is presented in which the data is digitally recorded, following the principle of FAIR data, as part of the publication process. This digital capture makes the data available to the scientific community for easy use in data mining and AI tools. The data in the repository contains links to the publication to document its origin. The concept is applicable to preprints, peer-reviewed papers, diploma and doctoral theses and is particularly suitable for open access publications. Moreover, the presentation highlights corresponding activities that were recently described in scientific publications.

Extracting information from tables in documents

In our customer projects involving automated document processing, we often encounter document types providing crucial data in the form of tables. While established text analytics algorithms are usually optimized to operate on running text, they tend to produce rather poor results on tables as they do not capture the non-sequential relations inside them (e.g. interpreting the content of a table cell relative to its column title, or interpreting line breaks inside a cell differently from line breaks between cells or rows). While there are elaborate information extraction products on the market for a few highly specific types of tabular documents, there is no general approach out there. The main cause for this is the fact that table structures can be encoded by a heterogeneous range of layout means (e.g. column boundaries can be signaled by lines vs. aligned text vs. white space). In this talk, we will illustrate several solutions that we have developed for a range of challenges occurring in this context, both for scanned and digitally generated documents.
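
To make the white-space case concrete, here is a minimal sketch of one possible heuristic. It is not the approach presented in the talk, which the abstract does not describe. It assumes a plain-text table in a monospaced layout whose columns are separated by at least two blank character positions on every line, and it reads each cell relative to its column title. The function names, the `min_gap` parameter and the sample values are all hypothetical and made up for illustration.

```python
def find_column_spans(lines, min_gap=2):
    """Return (start, end) character spans of columns.

    A column separator is a run of at least `min_gap` character
    positions that are blank on every line of the table.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # blank[i] is True when position i is a space on every line.
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    spans, start, i = [], None, 0
    while i < width:
        if blank[i]:
            j = i
            while j < width and blank[j]:
                j += 1
            # Runs shorter than min_gap are treated as spaces inside a cell.
            if start is not None and (j - i >= min_gap or j == width):
                spans.append((start, i))
                start = None
            i = j
        else:
            if start is None:
                start = i
            i += 1
    if start is not None:
        spans.append((start, width))
    return spans


def parse_table(text, min_gap=2):
    """Split a white-space-aligned table into one dict per body row,
    keyed by the column titles found in the first line."""
    lines = [line for line in text.splitlines() if line.strip()]
    spans = find_column_spans(lines, min_gap)
    width = max(len(line) for line in lines)
    rows = [[line.ljust(width)[a:b].strip() for a, b in spans] for line in lines]
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


# Made-up sample data, aligned with fixed-width formatting.
_rows = [
    ("Compound", "Melting point", "Yield"),
    ("Benzoic acid", "122 C", "85 %"),
    ("Urea", "133 C", "90 %"),
]
SAMPLE = "\n".join(f"{a:<14}{b:<16}{c}" for a, b, c in _rows)

for record in parse_table(SAMPLE):
    print(record)
```

The single spaces inside "Melting point" and "Benzoic acid" are not mistaken for column boundaries because they are narrower than `min_gap`. Tables that signal columns by ruled lines, or scanned documents, would need different handling.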


New Product Introductions: Patent Intelligence and Engineering Management

15:30 - 16:00

Exhibition and Networking Break / Breakout Session

New Insights from Trademarks with Natural Language Processing

Trademarks serve as key leading indicators for innovation and economic growth. As the vanguards of new and expanding enterprises, trademarks can be used to study entrepreneurship and shifting market demands in response to varying economic factors. This responsiveness has been seen as recently as the COVID-19 pandemic, where trademark research revealed key insights about business reaction to the global upheaval.

At CIPO, we have been delving more deeply than ever before into trademark analysis by leveraging cutting-edge natural language processing (NLP) tools to derive actionable business intelligence from trademark data. In this presentation, we present a survey of the NLP tools in use at CIPO and the insights we have gained from applying them. These insights include COVID-19 responses, line-of-business trends based on firm characteristics, and more.

We also discuss ongoing and future trademark research projects at CIPO. These projects include emerging technology detection methods and high-resolution trademark classification systems. We conclude that artificial intelligence-enhanced tools like NLP are key components of future exploitation of trademark data for business and economic intelligence.

Finding the WHAT – Will AI help?

It is relatively easy for a human to read a document and quickly figure out which concepts are important. However, this task is a difficult challenge for a machine. During the past few decades, there have been two main approaches for concept identification: Natural Language Processing and Machine Learning. During the early part of this century, Machine Learning made great strides as new techniques came into wider use (SVMs, Topic Modeling, etc.). Sensing the competition, Natural Language Processing responded with the deployment of new emerging techniques (semantic networks, finite state automata, etc.). Neither approach has completely solved the WHAT problem. Advances in Artificial Intelligence have the potential to significantly improve the situation. Where AI is making the most impact is as an enhancement to make Machine Learning and Natural Language Processing work better and, more importantly, work together. This presentation looks at some of this history and what might happen in the future when we blend the interpretation of language with pattern prediction.


Conference ends at 17:30


Bus leaves from Hotel Savoyen

19:00 - 23:00

Conference Dinner / Heurigen