Tuesday 18 October 2005

Where is Google going? What information professionals should be watching, and why

Google is a new type of information company, and its strengths come from highly motivated, brilliant people and a little-understood technical engine that makes startling new products and services appear effortless. Google has emerged as a computing platform with remarkable scope, flexibility, and raw power.

Google insists that it is a search company "solving interesting problems in information science." At the same time, the company has deployed a beta service that challenges commercial database companies on their home ground -- indexing scientific and technical information and placing the content in a separate collection. Google has continued to enhance its enterprise search platform, tapping the strong demand for the simplicity of the Google interface combined with the power of its search-and-retrieval technology. With little fanfare, Google has entered into relationships with more than 300 educational institutions. The long-term implications of this type of tie-up are unclear.

Almost simultaneously, Google has launched major efforts to integrate search across books and monographs, creating a next-generation version of the old University Microfilms. Google's speed in developing large-scale applications and in launching initiatives around edited and peer-reviewed content is remarkable. The real power behind Google is its computing infrastructure and its employees' cleverness in using this infrastructure as a development platform. Google is a new type of company, combining applications development and advertising sales. Google's impact on organisations -- particularly those engaged in fundamental research -- will grow over time. Google is a company for the digital age. Its rivals in the commercial database and operating-system businesses are rooted in an analogue environment.

Innovating at and with Google: Potential for the STM Community

With the launch of Google Scholar, Google signalled its commitment to providing search services that meet the needs of professional researchers. Google is developing further services in support of publishers, information professionals and users in the STM community. This presentation provides an overview of Google innovations relevant to the STM community, while explaining Google's perception of "information problems" and the company's non-traditional approach to product development.

The Changing Face of Scientific Reference

Information products have always developed in response to user needs, and the web has expanded options for consumers such that their needs are now in the driving seat. The success of search engines indicates that end-users value convenience over quality, but this is not a new phenomenon. Users have always pursued the most convenient answers to their questions, and Google is, in fact, simplifying and extending that pursuit. There is no doubt that chemists and other scientists formulate better early-stage questions and information strategies through search.

Our success will be through better-informed users. The challenge is to understand how users interact with information and to develop products that make quality more convenient. This presentation describes a way of transforming printed reference works into effective solutions for chemists and engineers by developing more intuitive and natural discovery and navigation processes and analytical tools to help users put data to productive use. Fitting product innovations to user needs, rather than forcing users to fit their behaviour to our products, will shape the nature and delivery of scientific reference information.

The New World of Publishing

Publishers face ever-increasing challenges from new media that deliver information to their customers in a variety of ways. P2P, blogs, search engines, aggregators, syndicators, infomediaries: all present challenges to the traditional publishing model of the magazine or book. Reed Business Information publishes over 140 magazines with differing business models, including subscription and controlled circulation, and this presentation discusses how the company is entering the web world, embracing Google, and commercialising its content using new media.

Cyber-Infrastructure: Enabling Science and Engineering Research

After setting the stage with an overview of National Science Foundation investments in Cyber-Infrastructure (information technology), the current status of Cyber-Infrastructure (CI) and its impacts are explored, with examples from chemical information and related science and engineering research. Then, three key challenges facing today's science and engineering community are laid out. The first challenge is devising a strategy to integrate emerging CI (the fruits of new technological advances from computer science and engineering research) with existing components to produce a reliable yet dynamically evolving infrastructure environment. The second challenge is how to balance the development of CI tailored to each individual discipline for specific applications against more general, interoperable CI shared across multiple applications, communities, and disciplines, which reduces redundancy while still offering the flexibility needed for rapid advancement of each frontier. The third challenge is how best to address the range of non-technological challenges accompanying all opportunities that are revolutionary in scope.

From Knowledge Discovery to Understanding: Making Sense of Large Data Sets

Even the most sophisticated search strategies often yield large and unwieldy answer sets. End users may find it impractical to determine relevance by reviewing abstracts or full documents one by one, and the problem of sheer magnitude will only increase as the growth curves of the literature and of patents continue to climb more steeply. This presentation reviews an innovative new tool for analysing and visualising STN search results. The tool helps users take unstructured text data and find patterns, segments, trends and relations, enabling them to make the most of these data. The new tool powers analytical decision-making, allowing users to gain insight and actionable intelligence from large amounts of data.
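The abstract does not describe the tool's internals. Purely as an illustrative sketch of the kind of pattern-finding it alludes to, the Python snippet below tallies term frequencies and pairwise term co-occurrences across a small, invented answer set of record abstracts; the `records` list and the stop-word set are assumptions for the example, not part of any STN product.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical answer set: free-text abstracts returned by a search.
records = [
    "Catalytic hydrogenation of nitroarenes over palladium catalysts",
    "Palladium-catalysed coupling reactions in aqueous media",
    "Hydrogenation kinetics of nitroarenes using platinum catalysts",
]

STOP_WORDS = {"of", "in", "over", "using", "the", "and"}

def terms(text):
    """Lower-case word tokens with stop words removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

term_freq = Counter()
co_occurrence = Counter()
for record in records:
    tokens = set(terms(record))
    term_freq.update(tokens)
    co_occurrence.update(combinations(sorted(tokens), 2))

# Frequent terms and term pairs hint at segments and relations within the set.
print(term_freq.most_common(5))
print(co_occurrence.most_common(5))
```

A production tool would of course layer linguistic processing, clustering and visualisation on top of such raw counts; the sketch only shows the underlying idea of turning unstructured text into countable patterns.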

Analyzing Intellectual Property in Engineering Markets

Efficient analysis of intellectual property (IP) data is a key element in achieving positive business results. This is especially true in IP-active markets. Successful IP mapping can contribute directly to business profitability when performed in the context of a defined business strategy. Many mapping tools are successful at identifying general trends and relationships in large amounts of patent information. However, the ability to draw conclusions from search results that are relevant to a business strategy is lacking. As a result, analysing the most relevant information in a competitive landscape still requires multiple steps and significant resource investment. A process used for mapping intellectual property will be discussed in the context of a chemical industry example. Tools used to map the patent landscape will be described, including the search-retrieval-analysis process used. Outcomes will be reviewed using real examples, along with the limitations of current tools and potential areas for improvement.

A Case Study for Information Integration and Analysis with Wistract – a New Platform for Scientific Information

Information management for chemical and biological R&D has to cope with a number of challenges: high-quality integration of data from a large variety of important scientific data sources, convenient data management combined with flexible data-mining functions, and server synchronisation for data security and team-oriented multi-user operation.

A case study based on a new commercial platform for scientific information is presented which addresses these challenges. The software system Wistract provides a (growing) number of import filters capable of high-quality extraction and standardisation of data from relevant scientific information sources such as Chemical Abstracts, Medline or the World Patent Index. Scientific information is organised in datapools, which contain documents. These documents carry distinct pieces of information in descriptors and corresponding contents. Descriptors may also have non-textual contents such as graphical images, chemical structures/reactions, biological sequences or other binary data types. Wistract provides links to associated data sources, e.g. patent originals or web order services, as well as convenient information-management capabilities with various view, import/export, merge and printing options. Especially attractive is a specific user-defined descriptor/content management that supports the use of a controlled vocabulary. At its core, Wistract features data-mining functions on descriptors and contents, such as frequency distributions; left/right-truncated textual queries with optional regular expressions and Boolean logic; and similarity ranking and neural-network classification on the basis of patent codes. Current data-mining developments target advanced frequency distributions, biological sequence alignment, chemical (sub)structure and similarity search, rule-based descriptor/content generation for "intelligent" text-mining, and profile scoring for information dissemination. All data-mining operations and information-management capabilities are available within a successive multi-level analysis framework for a thorough drill-down into the data.
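Wistract's internal design is not published in this abstract. The following minimal Python sketch only mirrors the datapool/document/descriptor organisation and two of the functions named above (frequency distributions and truncated/regular-expression queries on a descriptor); every class, field and record in it is invented for illustration and does not reflect the actual Wistract implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
import re

@dataclass
class Document:
    # Each document carries its information as descriptor -> content pairs;
    # non-textual contents (structures, sequences, images) would use other types.
    fields: dict

@dataclass
class Datapool:
    name: str
    documents: list = field(default_factory=list)

    def frequency_distribution(self, descriptor):
        """Frequency of the contents found under one descriptor across the pool."""
        return Counter(
            doc.fields[descriptor] for doc in self.documents if descriptor in doc.fields
        )

    def query(self, descriptor, pattern):
        """Regular-expression (and hence truncated) match on one descriptor."""
        rx = re.compile(pattern, re.IGNORECASE)
        return [doc for doc in self.documents
                if descriptor in doc.fields and rx.search(str(doc.fields[descriptor]))]

# Hypothetical usage with invented records.
pool = Datapool("patents")
pool.documents += [
    Document({"IPC": "C07D 213/00", "TI": "Pyridine derivatives"}),
    Document({"IPC": "C07D 401/04", "TI": "Fused pyridine compounds"}),
]
print(pool.frequency_distribution("IPC"))
print([d.fields["TI"] for d in pool.query("TI", r"pyridine")])
```

The similarity ranking, neural-network classification and multi-level drill-down described in the abstract would sit on top of such a descriptor/content model but are beyond the scope of this sketch.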

The New Generation of Semantic Web-Based Solutions for the Organization of Corporate Knowledge

Today, life science companies and life science content publishers need effective and reliable access to their knowledge and associated content as part of a drive to improve their decision-making processes, to re-use content in different contexts, to enhance collaborative work, and to identify and highlight strategic information. Text-mining solutions associated with the concepts within ontologies allow strategic, relevant and up-to-date corporate documentation systems to be built. They also allow content to be managed and transformed into actionable knowledge.
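As a minimal sketch of how text can be linked to ontology concepts, the Python snippet below matches document text against a tiny hand-made dictionary of concept identifiers and synonyms. Both the identifiers and the synonyms are invented for the example; real semantic-web and text-mining solutions rely on far richer ontologies and linguistic processing than plain string matching.

```python
import re

# Tiny illustrative ontology: concept identifiers mapped to synonym lists.
# The identifiers and synonyms are invented for this example.
ONTOLOGY = {
    "CHEM:aspirin": ["aspirin", "acetylsalicylic acid"],
    "DIS:inflammation": ["inflammation", "inflammatory response"],
}

def annotate(text):
    """Return the ontology concepts whose synonyms appear in the text."""
    found = set()
    for concept, synonyms in ONTOLOGY.items():
        for synonym in synonyms:
            if re.search(r"\b" + re.escape(synonym) + r"\b", text, re.IGNORECASE):
                found.add(concept)
                break
    return found

print(annotate("Acetylsalicylic acid reduces the inflammatory response."))
```

Annotations of this kind are what allow documents to be retrieved, grouped and re-used by concept rather than by literal wording.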