Talk:Substitution model

	This article is within the scope of WikiProject Molecular Biology, a collaborative effort to improve the coverage of Molecular Biology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has not yet received a rating on the importance scale.
	This article is supported by the Computational Biology task force (assessed as Mid-importance).

Statistics Low‑importance

	This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
	This article has been rated as Low-importance on the importance scale.

Untitled

I moved the descriptions of the DNA models to Models of DNA evolution because I think they fit better there. I would like to use this page more as an overview of substitution models and to make it more generic (not only DNA but also protein and codon-based models). I will shortly include a discussion of empirical vs. parametric models and then summaries and discussions of common approaches taken for DNA, amino acids and codons. I hope this idea also appeals to previous contributors to this article. If not, please let us discuss the issues. Wild8oar 13:07, 27 June 2007 (UTC)

There are some big problems with this page. The main issue is that it's wrong. It claims that exponentiating a matrix must be approximated with a Taylor series. This is plainly wrong. It's not an approximation, and you can easily do it with an ordinary eigenvalue decomposition. I have too much teaching for the next few days to fix it properly (I am teaching precisely this material). I will see how next week goes. Delt0r (talk) 17:41, 6 December 2007 (UTC)

Added a bit about the diagonalizable case (you might want to check it, though :-)) Tjunier (talk) 11:08, 28 October 2009 (UTC)
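The point about eigenvalue decomposition can be illustrated with a minimal sketch. It assumes a two-state rate matrix Q = [[-a, a], [b, -b]], chosen only because its eigenvalues (0 and -(a+b)) and eigenvectors can be written down by hand; the function names and the rates a, b are hypothetical and do not come from the article. The sketch computes exp(Qt) exactly from the decomposition and compares it with a truncated Taylor series, which converges to the same matrix.

```python
import math


def matmul(A, B):
    """Plain matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def expm_eigen_two_state(a, b, t):
    """exp(Qt) for Q = [[-a, a], [b, -b]] via Q = U diag(0, -(a+b)) U^-1."""
    s = a + b
    U = [[1.0, a], [1.0, -b]]                      # columns are right eigenvectors
    U_inv = [[b / s, a / s], [1.0 / s, -1.0 / s]]  # inverse of U, worked out by hand
    D = [[1.0, 0.0], [0.0, math.exp(-s * t)]]      # exp of each eigenvalue times t
    return matmul(matmul(U, D), U_inv)


def expm_taylor(Q, t, terms):
    """Truncated series sum_{k < terms} (Qt)^k / k!, shown for comparison only."""
    n = len(Q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    Qt = [[q * t for q in row] for row in Q]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, Qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


a, b, t = 1.0, 2.0, 0.5
P_exact = expm_eigen_two_state(a, b, t)
P_series = expm_taylor([[-a, a], [b, -b]], t, terms=30)
```

Each row of P_exact sums to 1, as a transition probability matrix should. The decomposition route only works as written when Q is diagonalizable, which is the case for this two-state example.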

Diagonalizability of Q

Is a rate matrix Q always diagonalizable? All of the off-diagonal entries of Q are non-negative, and the diagonal entries of Q make each row sum to 0. These conditions guarantee the desired probabilistic interpretation and the desired infinite divisibility of the Markov chain; do they also guarantee diagonalizability? Quantling (talk) 19:30, 4 January 2010 (UTC)

Never mind. The matrix

Q=\begin{pmatrix}-1&1&0\\0&-1&1\\0&0&0\end{pmatrix}

is not diagonalizable. Quantling (talk) 17:48, 5 January 2010 (UTC)
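The counterexample can be checked with a short sketch. Because this Q is upper triangular, its eigenvalues are the diagonal entries -1, -1 and 0, so the question is whether the repeated eigenvalue -1 has two independent eigenvectors. The helper name `rank` and the variable names below are hypothetical; the check uses exact fractions so that no rounding is involved.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current pivot position with a nonzero entry.
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(rank_count + 1, len(rows)):
            factor = rows[r][col] / rows[rank_count][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank_count])]
        rank_count += 1
    return rank_count


Q = [[-1, 1, 0], [0, -1, 1], [0, 0, 0]]
n = len(Q)
eigenvalue = -1

# Algebraic multiplicity: how often -1 appears on the diagonal of the triangular Q.
algebraic_multiplicity = sum(1 for i in range(n) if Q[i][i] == eigenvalue)  # 2

# Geometric multiplicity: dimension of the null space of Q - (-1) * I.
shifted = [[Q[i][j] - (eigenvalue if i == j else 0) for j in range(n)]
           for i in range(n)]
geometric_multiplicity = n - rank(shifted)  # 1
```

The eigenvalue -1 has algebraic multiplicity 2 but only a one-dimensional eigenspace, so Q has no basis of eigenvectors.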

Transposed matrix

Hello. The matrix in the section GTR: Generalised time reversible was transposed, but the result is not in accordance with the definition. --kupirijo (talk) 11:20, 28 July 2010 (UTC)

support: I agree, the matrix has columns summing to zero rather than rows summing to zero, and needs to be transposed. Care to do it? Quantling (talk) 16:31, 28 July 2010 (UTC)

I left a message for the user who made the modification, but I do not know if he will read the message soon. What do you suggest? --kupirijo (talk) 16:39, 28 July 2010 (UTC)

Oh, I see. Another article, Models of DNA evolution, has columns summing to zero instead of rows summing to zero. We could just transpose the matrix in Substitution model, so that each article is self-consistent, but it would probably be better to make both articles use the same convention. My preference is for row vectors, i.e., that rows sum to zero in these instantaneous rate matrices, but we should look at any other relevant articles. Quantling (talk) 20:23, 28 July 2010 (UTC)

How do we go about the task of polling others for their opinions on the issue of row-vector convention vs. column-vector convention in these related articles? Quantling (talk) 20:23, 28 July 2010 (UTC)

If you want a general, Wikipedia-wide convention, you can try asking at the WikiProject talk page: Wikipedia talk:WikiProject Statistics. You can also try a Request for comment, but only if this is a moderate-to-major issue; otherwise, feel free to be bold and change everything yourself. --fetch·comms 20:31, 28 July 2010 (UTC)

I agree with you, Quantling: write the matrices so that rows sum to zero in both articles. However, the article Models of DNA evolution needs a lot of changes that I am not capable of making, since my maths knowledge is not up to that level. We should still aim for consistency within each article, though. --kupirijo (talk) 00:02, 29 July 2010 (UTC)
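The two conventions discussed in this thread can be sketched as follows. The 3-state matrix and the probability vector are made-up illustrative values, not taken from either article, and the helper names are hypothetical. In the row convention, Q[i][j] is the rate from state i to state j, rows sum to zero, and the state distribution is a row vector multiplied on the left. Transposing gives the column convention, where columns sum to zero and the distribution is a column vector multiplied on the right.

```python
from fractions import Fraction


def transpose(M):
    return [list(col) for col in zip(*M)]


def row_sums(M):
    return [sum(row) for row in M]


def column_sums(M):
    return row_sums(transpose(M))


def vec_mat(p, M):
    """Row vector times matrix: (p M)[j] = sum_i p[i] * M[i][j]."""
    return [sum(p[i] * M[i][j] for i in range(len(p))) for j in range(len(M[0]))]


def mat_vec(M, p):
    """Matrix times column vector: (M p)[i] = sum_j M[i][j] * p[j]."""
    return [sum(M[i][j] * p[j] for j in range(len(p))) for i in range(len(M))]


# Hypothetical rate matrix in the row convention (rows sum to zero).
Q_row = [[-3, 1, 2],
         [4, -5, 1],
         [2, 2, -4]]

# The same process in the column convention (columns sum to zero).
Q_col = transpose(Q_row)

p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]

# dp/dt computed under each convention; the two results describe the same change.
dp_row_convention = vec_mat(p, Q_row)
dp_col_convention = mat_vec(Q_col, p)
```

Both calls return the same three numbers, so the choice is purely notational, but mixing the two within one article changes the meaning of every off-diagonal entry.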

What's neutral got to do with it?

Why are these models described as "neutral"? No reference is given for this assertion, and it is not explained or justified in the article.

While one could interpret the models as neutral models, this interpretation is not formally necessary. They could be models of an ongoing process of beneficial changes, or deleterious changes, or neutral changes, or a combination of these. In 1971, Kimura & Ohta ("On the Rate of Molecular Evolution." J Mol Evol 1(1): 1-17) presented formulas for the rate of substitution, and these included both a formula (Eqn 9) for a neutral process, and another formula (Eqn 7) for beneficial changes, both of which are versions of a more general formula for a steady-state rate of an origin-fixation process (Eqn 5). The substitution models in this article could be interpreted as applications of Eqn 5, but even this is not necessary. We could treat them as merely descriptive models.
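The origin-fixation idea can be sketched numerically. The code below uses the standard textbook form, rate = (number of new mutants per generation) x (probability that each one fixes), together with Kimura's diffusion approximation for the fixation probability under the simplifying assumption that the effective population size equals the census size N. The symbols mu, s and N and the function names are assumptions made for this sketch; it is not claimed that they match the notation or the exact form of Eqns 5, 7 and 9 in Kimura & Ohta.

```python
import math


def fixation_probability(s, N):
    """Fixation probability of a new mutant at frequency 1/(2N), assuming Ne = N."""
    if s == 0:
        return 1.0 / (2 * N)  # neutral limit of the expression below
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for numerical accuracy
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * N * s)


def substitution_rate(mu, s, N):
    """Origin-fixation rate: 2*N*mu new mutants per generation, each fixing
    with the probability above."""
    return 2 * N * mu * fixation_probability(s, N)


mu, N = 1e-8, 10_000
rate_neutral = substitution_rate(mu, 0.0, N)         # equals mu, independent of N
rate_beneficial = substitution_rate(mu, 0.01, N)     # close to 4 * N * s * mu
rate_deleterious = substitution_rate(mu, -0.001, N)  # far below mu
```

The same rate expression covers neutral, beneficial and deleterious changes; only the fixation probability differs, which is why the form of a substitution model does not by itself imply neutrality.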

So, logically, the form of these models does not make them neutral, if "neutral" is intended to be a reference to the concept of random fixation by genetic drift.

This raises the question of whether there is some historical reason to call these neutral models, e.g., if they were derived by assuming neutrality. This also is not necessarily the case. Zuckerkandl, Jukes and others started proposing Markov models of sequence change prior to the proposal of the neutral theory.

Therefore, I believe that it is a mistake to refer to these as neutral models. If someone has a counter-argument to this, I would like to hear it. Otherwise I'm going to change the article.

Dabs (talk) 15:10, 3 August 2011 (UTC)

An approach that has been taken is that of first describing evolution in the absence of selective pressures, in terms of a substitution matrix, and then modifying that substitution matrix to reflect the selection pressures. For instance, Halpern and Bruno [1] take this approach, and the resulting matrix is generally not from the same family as was the neutral substitution matrix. Or, if the selective pressure is epistatic, so that fitness depends not on just one sequence position, but on the combination of several positions (perhaps adjacent positions), then things are no longer simple small matrices. So, yes, substitution matrices can be applicable even in non-neutral settings, but you have to be careful about precisely what you mean. -- Quantling (talk | contribs) 00:49, 4 August 2011 (UTC)

[1] Halpern, A. L.; Bruno, W. J. (1998). "Evolutionary distances for protein-coding sequences: Modeling site-specific residue frequencies". Molecular Biology and Evolution. 15 (7): 910–917. doi:10.1093/oxfordjournals.molbev.a025995. PMID 9656490.

Thanks! That is a fascinating article. I agree that substitution matrices can be applicable in non-neutral settings, but I'm actually making a more basic point than that. There are two well-established meanings of "neutral model" in biology. The more common but narrower meaning is a model of evolutionary change in which fixations take place by drift. This is the mechanistic meaning of neutral, the pop-gen meaning. The second, less common, but more general meaning refers to a kind of null model, an other-things-being-equal model, as in Nitecki, _Neutral Models in Biology_. This is an instrumental or epistemological sense of neutral. I would not object to calling most substitution models "neutral" in this latter sense. Most of them are just black-box Markov transition models with factors thrown in to make things fit. Obviously these factors can have a biological interpretation, but this does not make them mechanistic. For instance, equilibrium frequencies may have a biological interpretation, but they are not mechanistic. When modelers toss in free parameters that are labeled as equilibrium frequencies, or loaded with the estimates of presumed equilibrium frequencies, they are implying the mechanistic equivalent of future causation, i.e., teleology.

The mechanistic sense of "neutral" is clearly intended in the Substitution model article. But, so far as I can see, the baseline "absence of selective pressures" model of Halpern and Bruno is not necessarily neutral in this sense. The authors merely assert that the probability of fixation in this model does not reflect selection, but Kimura & Ohta do not assume that a probability of fixation is a neutral probability. The probability-of-fixation term for any class or sub-class of transitions could be an average reflecting the fixation of a mixture of deleterious, nearly-neutral, neutral and beneficial mutations. The more advanced Halpern-Bruno model (with so-called "selective pressures") is clearly not instrumentally neutral, but it might be mechanistically neutral if we choose to interpret it that way: a biased spectrum of amino acid frequencies may come about through unbiased mutation, unbiased random genetic drift, and differential negative selection. -- Preceding unsigned comment added by Dabs (talk • contribs) 02:16, 5 August 2011 (UTC)

Yes, in phylogeny, "neutral" means the absence of selection pressures at a sequence position, and, usually, statistical independence from other sequence positions, including independence from indel processes. If you want to, please modify the article to clarify that it is in this sense rather than another sense. -- Quantling (talk | contribs) 00:35, 6 August 2011 (UTC)

I would like to remove the references to the substitution models being "neutral". I appreciate the comment by Dabs that the "second, less common, but more general meaning refers to a kind of null model, an other-things-being-equal model, as in Nitecki, _Neutral Models in Biology_. This is an instrumental or epistemological sense of neutral. I would not object to calling most substitution models "neutral" in this latter sense. Most of them are just black-box Markov transition models with factors thrown in to make things fit." However, the link in the "Neutral, independent, finite sites models" section is to the "Neutral theory of molecular evolution" page, and there is an explicit statement in that section that "selection does not operate on the substitutions".

The less common but more general sense of "neutral models" seems confusing to me, given the importance of neutral theory in molecular evolution. The statement that selection does not operate on the substitutions is simply wrong in all but the simplest of cases. Most substitution models are combined with rate heterogeneity (e.g., invariant sites and/or gamma-distributed rates). Although it is possible, in principle, that sites differ in the rate at which new mutations enter the population, essentially all molecular evolutionists interpret low-rate or invariant sites as evidence for purifying selection. Moreover, models of codon evolution can definitely include positive selection (i.e., cases where Ka/Ks > 1). I think referring to standard substitution models as neutral is just wrong and plan to remove this material, unless another user would like to argue for its retention in a modified form. EBraun68 (talk) 21:05, 1 November 2020 (UTC) -User:Ebraun68

Since there has been no feedback on the issue of referring to substitution models as neutral, I have deleted the material. I felt it was appropriate to preserve the text here in case one of the original authors wishes to clarify the intent and reintroduce the material in a revised form. Original text follows:

Neutral, independent, finite sites models

Most substitution models used to date are neutral, independent, finite sites models. Neutral sites mean selection does not operate on the substitutions, and so they are unconstrained. Independent sites mean changes in one site do not affect the probability of changes in another site. Finite sites are finitely many sites, and so over evolution, a single site can be changed multiple times. This means that, for example, if a character has value 0 at time 0 and at time t, it could be that no changes occurred, or that it changed to a 1 and back to a 0, or that it changed to a 1 and back to a 0 and then to a 1 and then back to a 0, and so on.

I do not feel this should be reintroduced unless it is revised to clarify the intent. EBraun68 (talk) 18:07, 10 November 2020 (UTC)
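The "finite sites" point in the quoted text (a character that is 0 at time 0 and 0 at time t may have changed an even number of times in between) can be sketched with a two-state symmetric model. This is an illustrative assumption, not something stated in the deleted text: both states are left at the same rate, so the number of changes in time t is Poisson distributed, and the site is back in its starting state exactly when that number is even. The names `rate`, `t` and the helper functions are hypothetical.

```python
import math


def prob_k_changes(k, rate, t):
    """Poisson probability of exactly k changes in time t."""
    return math.exp(-rate * t) * (rate * t) ** k / math.factorial(k)


def prob_same_state(rate, t, max_changes=60):
    """Probability of an even number of changes (0, 2, 4, ...), truncated."""
    return sum(prob_k_changes(k, rate, t) for k in range(0, max_changes + 1, 2))


rate, t = 1.0, 0.8
no_change = prob_k_changes(0, rate, t)
same_state = prob_same_state(rate, t)

# Closed form for the same quantity in the two-state symmetric model.
closed_form = 0.5 + 0.5 * math.exp(-2 * rate * t)
```

The gap between same_state and no_change is the probability of hidden back-and-forth changes, which is what a finite-sites model accounts for and an infinite-sites model ignores.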

Redirected from T92?

Looking for the T92 Howitzer Motor Carriage, I was redirected from "T92" to this article. This redirect baffles me, since I cannot fathom (or grep, for that matter) what "T92" has to do with "Substitution model". -- DevSolar (talk) 09:04, 5 June 2013 (UTC)
