Isaac Newton Institute for Mathematical Sciences

Mathematical and Statistical Aspects of Molecular Biology

Protein Sequence Evolution: Markov or Non-Markov?

Authors: Carolin Kosiol (EMBL - European Bioinformatics Institute), Nick Goldman (EMBL - European Bioinformatics Institute)

Abstract

In 1992, Henikoff and Henikoff derived the series of BLOSUM matrices, whose elements reflect probabilities of amino acid substitutions but are not based on a Markov model. BLOSUM matrices often perform better than evolutionary models for protein sequence alignment and database searches. It is unclear why this should be, but it may be because protein sequences behave in a non-Markovian manner. We show that some of the non-Markovian behaviour reported in the literature can be explained by an aggregated Markov process (AMP), which incorporates rate heterogeneity among the codon sites of the protein and the properties of the amino acids encoded by the sequence.
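
The general idea behind an aggregated Markov process can be illustrated with a toy example: a process that is Markov on a set of hidden states can look non-Markovian once several hidden states are merged into one observed class. The sketch below is not the authors' model. The three-state transition matrix, the grouping of states, and all function names are hypothetical and chosen only for illustration. It compares the observed two-step transition probabilities with what a Markov model fitted to the observed one-step transitions would predict; a mismatch means the observed process does not satisfy the Chapman-Kolmogorov equation.

```python
# Toy aggregated Markov process (all numbers are made up for illustration).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def stationary(p, iterations=500):
    """Approximate the stationary distribution by power iteration."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi

def aggregate(p, pi, classes):
    """Transition probabilities between observed classes at stationarity.

    Each hidden state i in a source class is weighted by pi[i], because an
    observer only knows the class, not which hidden state is occupied.
    """
    agg = []
    for src in classes:
        weight = sum(pi[i] for i in src)
        agg.append([sum(pi[i] * p[i][j] for i in src for j in dst) / weight
                    for dst in classes])
    return agg

# Hidden chain on states 0, 1, 2 (hypothetical values; rows sum to 1).
P = [[0.9, 0.1, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]

# Observed classes: A merges hidden states 0 and 1, B is hidden state 2.
CLASSES = [[0, 1], [2]]

pi = stationary(P)
one_step = aggregate(P, pi, CLASSES)                 # observed one-step matrix
two_step_true = aggregate(matmul(P, P), pi, CLASSES) # observed two-step matrix
two_step_markov = matmul(one_step, one_step)         # Markov prediction

print("observed A->B in two steps:   ", round(two_step_true[0][1], 4))
print("Markov prediction for A->B:   ", round(two_step_markov[0][1], 4))
```

For this toy matrix the two values differ (roughly 0.125 versus 0.118), so the observed two-class process is not itself a Markov chain even though the hidden three-state process is. In the abstract's setting, the hidden structure would correspond to features such as site-specific rate classes, which this sketch does not attempt to model.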