5 Successful Text Mining solutions

Text Mining solutions process text to enable better access, to extract well-defined

results, to reduce the content to the relevant parts and, in the end, to reduce the

amount of reading, which is the main benefit to their users. It is not yet clear which existing

or future solution will prove best in the end. The following are some of the parameters

relevant in the design of Text Mining solutions that either support improvements

or, if not considered, will hinder usability: the types of data searched for in the literature,

the types of documents available, the different ways to post-process the data, interface design,

and links to other resources. On the other hand, every successful Text Mining

solution incorporates design principles that help to explain how terminological

resources, user profiles and user expectations fit together.

The third day therefore featured talks on the ingredients and pitfalls of successful

Text Mining systems. Opportunities for integrating Text Mining into everyday

curation work were explained in detail by Judith Blake (Jackson Lab), using the experience

from the Mouse Genome Database as an example, including relevance

classification, topic-based routing, gene name tagging and information extraction.

Anna Divoli (University of Chicago, U.S.A.) presented results from two user surveys

which were conducted in conjunction with the BioText project to explore the priorities

in the design of user interfaces for biological users. There was general agreement

that it is important to keep end users involved in the development phase. HM

Müller (Caltech, California, U.S.A.) presented the design principles of TextPresso,

which is being used by at least 20 curation teams around the world. Jörg Hakenberg

and Martin Krallinger (CNIO, Madrid, Spain) reported on the development of a meta

service for Text Mining tools that emerged from the second BioCreative competition.

The service was acknowledged as having the potential for high impact in the field by giving

access to advanced Text Mining solutions. Services were also the focus of the

presentation of Dietrich Rebholz-Schuhmann, highlighting a suite of Text Mining tools

hosted at the European Bioinformatics Institute. Commercial tools were presented by

Dagstuhl seminar proposal „ Ontologies and Text Mining for Life Science“ 5/5

Michael Schröder (GoPubMed, University of Dresden, D) and David Milward (Linguamatics,

Cambridge, U.K.). An example of a highly innovative application of Text

Mining was shown by Nigel Collier (University of Tokyo, Jp): the BioCastor system

gathers news reports and analyses their relevance as indicators of disease outbreaks, thus

building an early warning or “rumor surveillance” system.

6 Ongoing work in the development of phenotype resources

A topic that emerged in the course of the seminar was the increasing demand for, and

importance of, managing, representing and integrating conceptual representations of phenotypes.

In immediate response, the experts on this topic who were present reported on ongoing work

and progress in this domain. Judith Blake (Jackson Laboratory, Maine, U.S.A.) presented

ongoing work in the design and development of the Mammalian Phenotype

Ontology at the Mouse Informatics Centre. This ontology was, among many other

textual resources, used by Ulf Leser and colleagues to infer predictions of protein

functions through the association of concept profiles composed of phenotypic features.

Suzanna Lewis (Berkeley Drosophila Genome Project, U.S.A.) reported on the

development of phenote.org, a novel resource for describing phenotype data in a

very generic data format. The format reduces all representations to tuples that are

formed by an ontological concept and a qualifier from a special qualifier ontology, an

approach which nicely leverages existing ontologies for a new purpose. Finally,

Robert Höhndorf (MPI, Leipzig, D) showed the intricate logical consequences of representing

“phenotypes” as derivations from a wildtype, which calls for the use of nonmonotonic

or default logics.
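
The tuple format described for phenote.org can be illustrated with a minimal sketch. It assumes only what the report states, namely that a phenotype description is reduced to a pair of an ontological concept and a qualifier from a qualifier ontology. The class names, field names and term identifiers below are hypothetical placeholders and are not taken from phenote.org or from any real ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A reference to a term in some ontology (all values here are placeholders)."""
    ontology: str   # name of the source ontology
    term_id: str    # identifier of the term within that ontology
    label: str      # human-readable label


@dataclass(frozen=True)
class PhenotypeTuple:
    """A phenotype description reduced to a (concept, qualifier) pair."""
    concept: Term     # the ontological concept being described
    qualifier: Term   # a term from a dedicated qualifier ontology


def describe(phenotype: PhenotypeTuple) -> str:
    """Render a tuple as a short readable string."""
    return f"{phenotype.concept.label}: {phenotype.qualifier.label}"


# Hypothetical example terms; the identifiers are invented for illustration.
eye = Term(ontology="EXAMPLE_ANATOMY", term_id="EXAMPLE:0001", label="eye")
reduced = Term(ontology="EXAMPLE_QUALIFIER", term_id="EXAMPLE:0002", label="reduced size")

annotation = PhenotypeTuple(concept=eye, qualifier=reduced)
print(describe(annotation))
```

Because both parts of the tuple are plain references to terms, the same concept ontology can be reused unchanged and only the qualifier varies, which is one way to read the remark that the approach leverages existing ontologies for a new purpose.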