We make a novel contribution to the theory of biopolymer folding, by developing an efficient algorithm to compute the number of locally optimal secondary structures of an RNA molecule, with respect to the Nussinov-Jacobson energy model. Additionally, we apply our algorithm to analyze the folding landscape of selenocysteine insertion sequence (SECIS) elements from A. B\"ock (personal communication), hammerhead ribozymes from Rfam, and tRNAs from Sprinzl's tRNA database. It is known that RNA with a functionally important structural role has lower folding energy (i.e. energy of the equilibrium minimum free energy structure) than does random RNA of the same dinucleotide frequency. However, applications of our algorithm demonstrate that there is greatly different distribution of kinetic traps, or non-global, local energy minima, in structural RNA versus random RNA of the same dinucleotide frequency.
Given an RNA molecule a1,...,an and an integer 0 <= k, a k-locally optimal secondary structure S is a secondary structure on a1,...,an which has k fewer base pairs than the maximum possible number, yet for which no basepairs can be added without violation of the definition of secondary structure (e.g. introducing a pseudoknot). Despite the fact that the number numStr(k) of k-locally optimal structures for a given RNA molecule in general is exponential in n, we present an algorithm running in time O(n^4) and space O(n^3), which computes numStr(k) for each k. Structurally important RNA, such as SECIS elements, hammerhead ribozymes, tRNA, all have a markedly smaller number of k-locally optimal structures than that of random RNA of the same dinucleotide frequency, for small and moderate values of k. This suggests a potential future role of our algorithm as a tool to detect noncoding RNA genes. Algorithm webserver at http://clavius.bc.edu/~clotelab/
We develop a new, self-contained proof that the expected number of generations required for gene allele fixation or extinction in a population of size n is O(n), provided that mating is random, without selection, mutation, migration or other factors. Similarly, starting from a founding population of n females, the expected number of generations before mitochondrial monomorphism is reached is linear in n, provided that the female population size for each generation is n. These problems, collectively known, as the Fisher-Wright problem, had previously been shown to have expected linear time solutions, only by a difficult application of the diffusion equation (a.k.a. Fokker-Planck equation), or by use of the coalescent, both of which are continuous models requiring careful justification for the discrete Fisher-Wright problem. We additionally develop an algorithm, implemented in C, to compute expected fixation/extinction time to any desired precision.
Extending work of Haig-Hurst on optimality of the natural genetic code, in this paper we use Monte-Carlo (with simulated annealing) to consider general (rather than block-structured or codon-shuffle) codes and study the optimality of natural genetic code is with respect to fault tolerance using the WAC similarity matrix of Wei-Altman-Chang (amino acid microenvironments in 1 Angstrom shells).