Combinatorial pattern discovery approach for the folding trajectory analysis of a β-hairpin
Abstract
The study of protein folding mechanisms continues to be one of the most challenging problems in computational biology. Currently, the protein folding mechanism is often characterized by calculating the free energy landscape versus various reaction coordinates, such as the fraction of native contacts, the radius of gyration, RMSD from the native structure, and so on. In this paper, we present a combinatorial pattern discovery approach toward understanding the global state changes during the folding process. This is a first step toward an unsupervised (and perhaps eventually automated) approach toward identification of global states. The approach is based on computing biclusters (or patterned clusters): each cluster is a combination of various reaction coordinates, and its signature pattern facilitates the computation of the Z-score for the cluster. For this discovery process, we present an algorithm of time complexity O((N + nm) log n), where N is the size of the output patterns and (n × m) is the size of the input with n time frames and m reaction coordinates. To date, this is the best time complexity for this problem. We next apply this to a β-hairpin folding trajectory and demonstrate that this approach extracts crucial information about protein folding intermediate states and mechanism. We make three observations about the approach: (1) The method recovers states previously obtained by visually analyzing free energy surfaces. (2) It also succeeds in extracting meaningful patterns and structures that had been overlooked in previous works, which provides a better understanding of the folding mechanism of the β-hairpin. These new patterns also interconnect various states in existing free energy surfaces versus different reaction coordinates.
(3) The approach does not require calculating the free energy values, yet it offers an analysis comparable to, and sometimes better than, the methods that use free energy landscapes, thus validating the choice of reaction coordinates. (An abstract version of this work was presented at the 2005 Asia Pacific Bioinformatics Conference [1].)
Copyright: © 2005 Parida and Zhou.
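To make the notion of a bicluster and its Z-score concrete, here is a minimal sketch. It is not the algorithm of this paper: it enumerates column subsets by brute force (exponential in m), whereas the abstract describes an output-sensitive method. It assumes the reaction coordinates have already been discretized into symbols, and it assumes a Z-score based on a binomial null model with independent columns. The function names `biclusters` and `z_score`, the thresholds, and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def biclusters(rows, min_cols=2, min_rows=2):
    """Brute-force enumeration of (columns, pattern, frames) triples.

    rows[t][c] is the discretized value of reaction coordinate c at time
    frame t. A bicluster here is a set of columns plus the set of frames
    that share the same values (the signature pattern) on those columns.
    """
    m = len(rows[0])
    found = []
    for k in range(min_cols, m + 1):
        for cols in combinations(range(m), k):
            groups = defaultdict(list)
            for t, row in enumerate(rows):
                groups[tuple(row[c] for c in cols)].append(t)
            for pattern, frames in groups.items():
                if len(frames) >= min_rows:
                    found.append((cols, pattern, frames))
    return found


def z_score(rows, cols, pattern, frames):
    """Z-score of the observed frame count under an assumed null model.

    Assumption: columns are independent, so the pattern probability p is
    the product of the per-column symbol frequencies, and the count of
    matching frames is Binomial(n, p).
    """
    n = len(rows)
    p = 1.0
    for c, v in zip(cols, pattern):
        p *= sum(1 for row in rows if row[c] == v) / n
    sd = sqrt(n * p * (1 - p))
    return (len(frames) - n * p) / sd if sd > 0 else 0.0


# Toy trajectory: 6 time frames, 3 reaction coordinates, each "L" or "H".
trajectory = [
    ("L", "L", "H"),
    ("L", "L", "H"),
    ("L", "H", "H"),
    ("H", "H", "L"),
    ("H", "H", "L"),
    ("H", "H", "L"),
]

for cols, pattern, frames in biclusters(trajectory):
    print(cols, pattern, frames, round(z_score(trajectory, cols, pattern, frames), 3))
```

In this toy setting, a pattern that occurs in more frames than independent columns would predict receives a positive Z-score, which is one plausible way to rank candidate global states.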