Publication
CIKM 2002
Conference paper

A system for knowledge management in bioinformatics

Abstract

Emerging biochip technology has made it possible to study the expression (activity level) of thousands of genes or proteins simultaneously in a single laboratory experiment. However, to extract relevant biological knowledge from biochip experimental data, it is critical not only to analyze the data but also to cross-reference and correlate these large volumes of data with information available in external biological databases accessible online. We address this problem with e2e, a comprehensive system for knowledge management in bioinformatics. To the biologist or to biological applications, e2e exposes a common semantic view of the inter-relationships among biological concepts in the form of an XML representation called eXpressML; internally, it can use any data integration solution to retrieve data and return results corresponding to the semantic view. We have implemented an e2e prototype that enables a biologist to analyze her gene expression data, supplied in GEML or obtained from a public site such as Stanford, and to discover knowledge through operations such as querying relevant annotated data represented in eXpressML, drawing on pathway data from KEGG, publication data from Medline, and protein data from SWISS-PROT.