ICIC 2008 Programme

Wednesday 21 Oct 2008

08:30 - 09:00

XTractor - Data Mining Simplified

Mining PUBMED to retrieve accurate hits or extract relations is a long-standing problem in biology. Consistently identifying the right gene name or disease term, and extracting the relations between them, is close to impossible. Many NLP engines have addressed this problem, and many solutions have been suggested.
To circumvent this problem we have developed a text mining service model called XTractor. XTractor is highly accurate and more efficient than many NLP engines because it uses a hybrid technology of semi-automated data mining: NLP mining followed by a layer of manual validation. The result is highly accurate hits for genes, diseases, drugs and many other entities. Because the annotation is accurate, users can also perform complex queries and retrieve complex relations in PUBMED, which is currently not possible with conventional NLP systems. We have achieved up to 99% accuracy in term pickup and relationship extraction with the XTractor system. A few advantages of the XTractor system are as follows:

  • XTractor acts as an alert service and keeps you up to date with the latest publications for your chosen keywords as soon as they are published in PUBMED.
  • Sentences are manually validated and classified into categories such as Biomarker-Disease, Drug-Gene, Gene-Process and many other relevant categories, so searching PUBMED and extracting relationships becomes simpler and more effective.

With XTractor, the entities and terms in the sentences are manually mapped to public biological ontologies. Users can also create their own databases of sentences and relations for their sets of keywords, and can change their keyword preferences from time to time.

09:00 - 09:30

Emerging issues with corporate intranets: options, opportunities and current issues

Ensuring that your intranet delivers business value requires the right mix of content, form, technology and strategy. This presentation takes a critical look at emerging trends within the intranet sphere, at trends which are -- or should be -- dying and at evergreens.
Some of the issues discussed are: What does it take to run a good wiki? How should you deal with intranet news and personalisation? Is MOSS 2007 the answer to your dreams? How should you organise to increase your chances of making your intranet a success?
This presentation also looks at some of the specific issues related to running intranets in the pharmaceutical industry, at innovative approaches and at mission-critical knowledge management tools, all seen in the context of wider online media trends.

09:30 - 10:00

An A to X of Patent Citations and Searching

Using citations is a convenient and well-known method for expanding searches in patent and scientific literature. Cited references and citing references can add hits backwards in time, forwards in time and also laterally. In a patent search, citations can provide new approaches for the search, new search terms and new potential applications. This presentation looks at the different types of patent citations, what they mean and how they can add insight for searching. Examples of patent searches such as competitor patent monitoring, prior art and invalidity searching may be used to explore the strengths and weaknesses of using A, X, Y and examiner citations. Patent citations from patent offices around the world are indexed by several vendors and also by patent office search engines and other independent search engines for the benefit of patent information searchers. The approaches to citations taken on databases such as Patbase, DPCI and esp@cenet are reviewed for different searches. To finish, we take a look at graphic visualisations of patent citations. How useful are patent citation trees in gaining deeper understanding of the patent landscape?
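The backward, forward and lateral expansion described in this abstract can be illustrated with a minimal sketch. It is not taken from the presentation: the document identifiers (P1 to P5) and the function names are hypothetical, and "lateral" is interpreted here, as an assumption, to mean other documents that cite at least one of the same references as the seed document.

```python
from collections import defaultdict


def build_citation_index(citation_pairs):
    """Build lookup tables from (citing, cited) pairs."""
    backward = defaultdict(set)  # document -> documents it cites
    forward = defaultdict(set)   # document -> documents that cite it
    for citing, cited in citation_pairs:
        backward[citing].add(cited)
        forward[cited].add(citing)
    return backward, forward


def expand_hits(seed, backward, forward):
    """Return backward, forward and lateral hits for one seed document."""
    back = set(backward.get(seed, ()))
    fwd = set(forward.get(seed, ()))
    # Assumed meaning of "lateral": documents sharing a cited reference
    # with the seed, excluding the seed and hits already found.
    lateral = set()
    for reference in back:
        lateral |= forward.get(reference, set())
    lateral -= {seed} | back | fwd
    return {"backward": back, "forward": fwd, "lateral": lateral}


# Hypothetical citation data: each pair is (citing document, cited document).
pairs = [("P3", "P1"), ("P3", "P2"), ("P4", "P1"), ("P5", "P3")]
backward, forward = build_citation_index(pairs)
result = expand_hits("P3", backward, forward)
print(result)
```

In this toy data, P3 cites P1 and P2 (backward in time), is cited by P5 (forward in time), and shares the reference P1 with P4 (lateral).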

10:00 - 10:30

Finding Meaningful Competitive Intelligence through Enterprise Search

According to leading technology analyst firm Forrester Research, the top concern of market and competitive intelligence professionals is "a wellspring of competitive and market insight that goes untapped." Researchers, scientists and marketers are all struggling to find the best way to understand their own products' competitive weaknesses and strengths, to know when competitors will announce their next product or upgrade, and to judge whether their product line will soon be imitated by a lower cost offering. Imagine being able to search instantly across multiple data repositories to learn more about your market in order to ward off outside threats and competition. The ability to easily find and digest information, such as which patents your competitors are applying for or which new compounds might be on the horizon, could be invaluable to an organization. The reality today is that organizations are doing this by leveraging the power of enterprise search.

This presentation describes enterprise search and provides several real-world use cases of organisations – particularly in the pharmaceutical industry – that use search today for competitive intelligence as well as for boosting employee productivity. It also touches upon the social aspects of search and demonstrates how companies are enabling collaboration through this tool.

10:30 - 11:00

The Future of Searching for Scholarly Literature: Discipline-specific Research Databases in Web 2.0 and the Semantic Web

Both generalised web search engines and discipline-specific bibliographic databases will need to evolve to remain competitive (comprehensive and authoritative) in the discovery of scholarly literature. Initiatives such as Google Scholar and Microsoft Academic-live Search acknowledge the importance of specialisation in searching for scholarly literature, and rising expectations of comprehensive access require that discipline-specific databases increase coverage. In parallel, cross-disciplinary pursuits such as neuroaesthetics (neuroscience and art history) increase the need for an integrated search of specialised databases. By following models of open collaboration in Web 2.0 and applying thesauri in the ontology of the Semantic Web, producers of discipline-specific databases can apply existing knowledge bases not only to expand coverage and maximise discovery of scholarly literature but also to foster interdisciplinarity. A strategy for leveraging the primary assets of a specialised database (discipline-specific partnerships, expert abstracts and indexing, and discipline-specific thesauri) serves as a case study. The strategy illuminates the potential for integrating a discipline-specific database in the humanities with datasets from the sciences through the evolving infrastructure of the Web.

11:00 - 11:30

Prospecting for Chemistry in Publishing

The RSC's Project Prospect, which was the first application of semantic web technologies to primary research publishing, won the 2007 ALPSP/Charlesworth Award for Publishing Innovation. The application of open and standard identifiers for both compounds and subject matter has opened new possibilities for linking between related publications and data, which promise to transform the way published chemistry is handled in the next few years. The role of a publisher, between author and reader, offers particular advantages and challenges - to preserve more of the original lab science throughout the publication process while delivering the science in ways that aid discovery and re-use. This presentation discusses the problems with the conventional publication process which we tried to address, the development process, and successes and failures in applying new standards. We look at the InChI and identifying chemical entities, using existing ontologies and building new ones, and their real-life application. While new developments applied to RSC's book and journal portfolio will be highlighted, the application of the underlying technologies can be seen to offer real benefits for both standalone and web-wide chemical information applications.

11:30 - 12:00

Full Text Searches in E-books. Crossing the Borders of Publishing Houses

In recent years major publishing houses have started to publish electronic books, which they offer within their own search systems. Normally, using a publishing house's search engine, the user can search that publisher's content, either in full text or within defined metadata, and then download hits as PDF files in conformity with licence agreements. Unfortunately, this is a time-consuming and tedious procedure, because searching the total content of a publishing house retrieves all possible hits, whether in eBooks or journals, regardless of the actual licence agreement. It is therefore difficult to pinpoint the desired answer in a licensed eBook. Another problem users face is that libraries hold licences with many publishing houses while, for obvious reasons, publishing houses do not allow searching across publishers. This greatly hinders the use of eBooks and therefore the development of this new and important market.

This presentation discusses implementing a licence-dependent, full-text search engine that works across publishing houses. It shows users the highest ranked hits relating to their searches and licence agreements, independent of the publishing house. A prototype is presented which already contains more than 10,000 of the latest eBooks from three major scientific publishers. The search engine is updated daily. Initial experiences of university librarians, consortia and industry will be discussed.
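As a rough illustration of licence-dependent ranking across publishers, the sketch below filters a merged hit list down to the titles a library licenses and then ranks what remains by score. It is not the prototype's implementation: the `Hit` class, the `licensed_hits` function, the publisher names, the titles and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    title: str
    publisher: str
    score: float  # relevance score from a full-text search (assumed given)


def licensed_hits(hits, licences, limit=10):
    """Keep only hits the library licenses, ranked by score across publishers.

    licences maps a publisher name to the set of licensed titles.
    """
    allowed = [h for h in hits if h.title in licences.get(h.publisher, set())]
    return sorted(allowed, key=lambda h: h.score, reverse=True)[:limit]


# Hypothetical merged result list from three publishers.
hits = [
    Hit("Book 1", "Publisher A", 0.70),
    Hit("Book 2", "Publisher A", 0.90),
    Hit("Book 3", "Publisher B", 0.80),
    Hit("Book 4", "Publisher C", 0.95),
]
# Hypothetical licence agreements for one library.
licences = {"Publisher A": {"Book 1"}, "Publisher B": {"Book 3"}}

result = licensed_hits(hits, licences)
for hit in result:
    print(hit.publisher, hit.title, hit.score)
```

Here the two highest-scoring hits are dropped because the library holds no licence for them, so the user sees only Book 3 and Book 1, in that order.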

12:00 - 12:30

Usage Analysis with COUNTER: Pros and Cons of the Code of Practice for Electronic Non-Periodicals

Six COUNTER reports are analyzed for e-books in general and technical reference works in particular to ascertain the value derived from each report by subscribers of online services. Some of the reports provide skewed statistics and are not very useful for aggregated STM e-references; however, COUNTER compliance is frequently a requirement and is certainly desired by subscribers.

COUNTER statistics favour comprehensiveness over relevancy in search and retrieval. Relevancy matters most for e-references, whereas comprehensiveness matters most for periodicals, which leaves e-references at a disadvantage. Several changes to the COUNTER Code of Practice are proposed to correct this bias. One proposal involves separate reports for e-books, databases and interactive e-books and online tools.