The dragon on the gold: Myths and realities for data mining in biomedicine and biotechnology using digital and molecular libraries
Abstract
To develop bioscience and personalized medicine in the post-genomic era, the biggest problem may be how to extract knowledge from the rich libraries of biomedical data. A particular dragon guards the gold therein: the dragon is the "curse of dimensionality", and its formidable fiery weapon, which is burning researchers, is the "combinatorial explosion". This arises because many genomic, proteomic, clinical, and lifestyle factors may interact in ways that cannot necessarily be captured on a simple pairwise or additive basis. A suggested theoretical solution (or at least a "road map" that eases the management of these problems) borrows from several disciplines. It is also undertaken in the hope that it might lead to research with broader impact on several unresolved issues in biotechnology. Conversely, a mathematical understanding of processes involving molecular libraries, such as cDNA libraries and DNA in the living cell itself, may open opportunities to use biotechnology to construct nanotechnological storage and query systems.
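The scale of the "combinatorial explosion" can be made concrete with a simple count. The sketch below assumes N factors and counts candidate k-way interactions as the binomial coefficient C(N, k), plus all possible factor subsets as 2^N. The function names and the choice of 100 factors are illustrative assumptions, not taken from the paper.

```python
from math import comb


def interaction_terms(n_factors: int, max_order: int) -> int:
    """Count candidate interaction terms of order 2..max_order among n_factors.

    Illustrative helper (hypothetical name): each k-way interaction is one
    unordered choice of k distinct factors, i.e. C(n_factors, k) terms.
    """
    return sum(comb(n_factors, k) for k in range(2, max_order + 1))


def all_subsets(n_factors: int) -> int:
    """Number of possible factor combinations, including the empty set: 2**n."""
    return 2 ** n_factors


if __name__ == "__main__":
    n = 100  # arbitrary illustrative number of genomic/clinical/lifestyle factors
    for order in (2, 3, 4):
        print(f"terms up to {order}-way:", interaction_terms(n, order))
    print("all subsets:", all_subsets(n))
```

With 100 factors there are 4,950 pairwise terms, but 166,650 once three-way interactions are included, and 2^100 possible subsets overall. This rapid growth is why exhaustive testing of interactions soon becomes infeasible.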