Tuesday, 11. October 2022

Conference starts at 09:30

Machine learning based patent categorization: A success story in monitoring a complex technology with high patenting activity

10:00 - 10:30

Exhibition and Networking Break / Breakout Session

Rolling out web crawling at Boehringer Ingelheim - 10 years of experience

10 years in the making. How real-world business cases have driven the development of CCC's deep search solutions, leading to the capabilities for web-crawling and delivery of targeted intelligence that helps R&D intensive companies gain a competitive advantage.

Embedding-based Search Vs. Relevancy Search: comparing the new with the old

In 2013 the NLP field underwent an evolutionary change thanks to the introduction of space embeddings that, with the use of deep learning architectures, achieved human-level performance in many NLP tasks. With the introduction of the Attention mechanism in 2017, the results were further improved and, as a result, embeddings are quickly becoming the de facto standard for solving many NLP problems. In this presentation, you will learn how to generate and use space embeddings for search purposes, and I will provide comparison metrics against more traditional relevance-based search engines. Moreover, I will provide some initial results from a paper currently under review that provides insight into hyperparameter tuning during the generation of embeddings.

Identifying Relevant Patent Data with the help of Artificial Intelligence

Conventional patent search engines mainly rely on patent classification, technical keywords, and other bibliographic data. Patent classification and keywords are the areas where the application of intelligence is most sought. For a given patent search strategy, the patent search database mainly looks for the technical keywords, identifies the patent classifications in the patent documents, and generates the patent result set. Based on the frequency of keywords and the number of patent classifications, the search engine scouts the patent result set and ranks the results accordingly. The key attribute of any smart patent search is the context in which the search strategy is run. In many cases, the context corresponding to the relevant patent technology is found to be missing from the patent result set. To bring in this context and make the patent search engine behave more like a human, we propose that patent search engines incorporate Artificial Intelligence. The patent search engine then performs like a technical expert in identifying the relevant patent dataset: when a search query is run, it automatically generates questions related to the technology and, depending on the answers, automatically captures the relevant patents and populates the result set.

12:30 - 14:00


Domain Knowledge makes Artificial Intelligence Smart

In the patent domain, all types of issues, from very specific search requirements to the linguistic characteristics of the text domain, are accentuated. Consequently, to develop patent text mining tools for scientists and patent experts, we need to understand their daily work tasks, as well as the linguistic character of the text genre (i.e., patentese). Patent text is a mixture of legal and domain-specific terms. In technical English texts, multi-word units are often deployed as a word-formation strategy to expand the working vocabulary, i.e., to introduce a new concept without inventing an entirely new word. This productive word formation is a well-known challenge for traditional natural language processing tools utilizing supervised machine learning algorithms, due to limited domain-specific training data. Deep learning technologies have been introduced to overcome the reduction in performance of traditional NLP tools. In the Artificial Researcher technologies, we have integrated explicit and implicit linguistic knowledge into the deep learning algorithms, which is essential for domain-specific text mining tools. In this talk, we will present a step-by-step process of how we have developed these text mining tools. Finally, we will also demonstrate how these tools can be integrated in a cross-genre passage retrieval system, based on a technology from 2016 that still holds the state of the art within the patent text mining research community in 2022.

Accommodating the Deep Learning Revolution by a Development Process Methodology

Word embeddings, deep learning, transformer models and other pre-trained neural language models (sometimes recently referred to as "foundational models") have fundamentally changed the way state-of-the-art systems for natural language processing and information access are built today. The "Data-to-Value" process methodology (Leidner 2013; Leidner 2022a,b) has been devised to embody best practices for the construction of natural language engineering solutions; it can assist practitioners and has also been used to transfer industrial insights into the university classroom. This talk recaps how the methodology supports engineers in building systems more consistently and then outlines the changes in the methodology to adapt it to the deep learning age. The cost and energy implications will also be discussed.

15:00 - 15:30

Exhibition and Networking Break / Breakout Session

The race to net zero: Tracking the green industrial revolution through IP

In 2019 the UK was the first major economy to embrace a legal obligation to achieve net zero carbon emissions by 2050. More broadly, the 2021 UK Innovation Strategy sets out the UK government’s vision to make the UK a global hub for innovation by 2035 with a target of increasing public and private sector R&D expenditure to 2.4% of GDP to support the UK being a science superpower with a world-class research and innovation system.

IP rights create an incentive for R&D which ultimately leads to innovation. Analysis and insights from IP data can therefore help provide a better understanding of how the IP system is being used and where and what innovation is taking place. Research and analysis of IP data is a key input to the ongoing work of the UKIPO’s Green Tech Working Group which seeks to:

  • further the UK’s status as a global leader by making the UK’s IP environment the best for innovating green technology;
  • develop and deliver IP policies to support government’s ambition on climate change and green technologies; and
  • help innovators best protect and commercialise their green tech innovations both at home and internationally.

The UKIPO has been developing a broad portfolio of ‘green’ IP analytics research. A series of patent analytics reports looking at green technologies has been published, and the UK’s Green Channel scheme for accelerated processing of green patent applications has been analysed. Patents have been used to identify technological comparative advantage within different green technologies at a country level, and new insights have been uncovered by mapping green technology patents to the UN Sustainable Development Goals (SDGs). Trade mark data provides a timeliness and closeness-to-market factor that patent data does not, and complementary analysis of UK ‘green’ trade marks, identified using a machine learning algorithm, adds a commercialisation angle to our research.

Creation and updating of large Knowledge Graphs through NLP Analysis of Scientific Literature

Knowledge Graphs are an increasingly relevant approach to storing detailed knowledge in many domains. Recent advances in NLP make it possible to enrich Knowledge Graphs through automated analysis of large volumes of literature, greatly reducing the effort of traditional manual information capture. In our presentation we report on the approach taken in a life sciences project with our partner Fraunhofer SCAI, in which a knowledge graph organising detailed facts about psychiatric diseases has been computed.

Information on cause-effect relations between proteins, genes, drugs and diseases has been encoded in BEL (Biological Expression Language) and imported into a graph database to approach an indication-wide Knowledge Graph for the selected therapeutic area. Ultimately, updating the graph will amount to just rerunning the analysis on newly published literature.

16:30 - 16:45

Closing Remarks - Christoph Haxel