Template:Did you know nominations/AlphaFold

The following is an archived discussion of the DYK nomination of the article below. Please do not modify this page. Subsequent comments should be made on the appropriate discussion page (such as this nomination's talk page, the article's talk page or Wikipedia talk:Did you know), unless there is consensus to re-open the discussion at this page. No further edits should be made to this page.

The result was: promoted by SL93 (talk) 00:54, 5 February 2021 (UTC)

AlphaFold

~~... that the results of DeepMind's AlphaFold 2 program in the CASP 14 protein structure prediction competition have been called "astounding" and transformational?~~ Source: "astounding": CASP14 scores just came out and they’re astounding; "transformational": ‘It will change everything’: DeepMind’s AI makes gigantic leap in solving protein structures (Nature)

Comment: Note: I am one day over in submitting this, because it was previously up for consideration at WP:ITNC (discussion), and only fell off the page there at midnight this morning. So any leeway you could give it would be appreciated.
Reviewed: The Adults Are Talking

Converted from a redirect by Ktin (talk), Jheald (talk), and My very best wishes (talk). Nominated by Jheald (talk) at 18:36, 8 December 2020 (UTC).

The article is new enough and long enough. The rationale for the two nonfree images is perfectly argued. I had some doubt about the first image, but a Google Images search didn't show the same image anywhere else. The article is neutral and well sourced. Earwig's Copyvio Detector doesn't show any copyvio (it just marked the quoted part). You have done a great job, congratulations. Alexcalamaro (talk) 20:03, 9 December 2020 (UTC)
Comment: in the spirit of MOS:PEACOCK, I suggest the following blurb: ~~... that AlphaFold 2 won the 14th biennial CASP competition achieving 92% accuracy, essentially solving the decades-old protein folding problem.~~ --bender235 (talk) 01:26, 10 December 2020 (UTC)

@Bender235: While claims such as "In a serious sense the single protein single domain [prediction] problem is largely solved" have been widely made (that quote is from conference chair John Moult's closing presentation to the conference), were very widely featured as a top line in media coverage, and have also been supported in thoughtful commentary by eg Mohammed AlQuraishi [1], they have also met with opposition; and so we are not currently running them on the article. (Though this could be changed). See article talk page for extended discussions. That is why I submitted the DYK text as above.

Note also that while AF2 has made a very significant advance in the protein structure prediction problem, this is a different question to the question of how protein folding develops in nature, so caution should be taken not to confuse the two. — Preceding unsigned comment added by Jheald (talk • contribs) 09:43, 10 December 2020 (UTC)

I would suggest the following: ... that a team of researchers who used AlphaFold 2, an artificial intelligence-based program, won the 14th biennial CASP competition in protein structure prediction and achieved high accuracy for almost all single-domain protein targets. See discussion here. My very best wishes (talk) 21:11, 10 December 2020 (UTC)

Returning to WP:DYKN until review is settled. I personally find the ALT0 wordy and difficult to understand. Yoninah (talk) 21:28, 12 December 2020 (UTC)

Folks @Jheald, My very best wishes, Alexcalamaro, and Bender235:, this one has been open for some time now; let's go ahead and drive it to closure. I think the text below is the most that someone on the homepage would be able to follow; anything more and we run the risk that folks find it too wordy or too complex. Let's move ahead, if you are good. Also @Yoninah: I do not want to presuppose your background, but can you read the two hooks below as a layperson and let me know whether you a) find them interesting and b) generally get the gist? If you are not a layperson for this topic, I am happy to go chase down some laypersons. Cheers. Ktin (talk) 22:42, 14 December 2020 (UTC)

ALT 3.0 ~~... that DeepMind's protein-folding AI AlphaFold 2 has solved a 50-year-old grand challenge of biology?~~ (source: MIT Technology Review).

OR

ALT 4.0 ~~... that DeepMind's AI AlphaFold 2 can predict the shape of proteins to within a width of an atom?~~ (source: MIT Technology Review).

@Ktin: I am a layperson, and I really like ALT 3.0. ALT 4.0 is also more understandable than ALT0. Thanks, Yoninah (talk) 22:35, 14 December 2020 (UTC)

Thanks @Yoninah:. Wonderful. @Alexcalamaro and Bender235: -- if one of you has a moment, could you please review the above two hooks per the standard WP:DYK hook review guidelines? Cheers. Ktin (talk) 22:42, 14 December 2020 (UTC)

I agree with Yoninah, ALT 4 blurb is very catchy. Probably the best choice. --bender235 (talk) 23:04, 14 December 2020 (UTC)
@Ktin: I prefer ALT 3.0; I think it fits better with the achievement and is more attractive for "the layperson" ;-) . Alexcalamaro (talk) 23:07, 14 December 2020 (UTC)

Wonderful. Thanks, both of you, @Alexcalamaro and Bender235:. Could one of you please review the hooks per our guidelines and approve both? We can then choose one of the two, or empower the posting admin to make the choice. But, as a first step, let's approve the hooks. Cheers. Thanks again folks. Ktin (talk) 23:14, 14 December 2020 (UTC)

General: Article is new enough and long enough
New enough: Long enough:

Policy: Article is sourced, neutral, and free of copyright problems
Adequate sourcing: Neutral: Free of copyright violations, plagiarism, and close paraphrasing:

Hook: Hook has been verified by provided inline citation
Cited: Interesting:

Image: Image is freely licensed, used in the article, and clear at 100px.
Freely licensed: Used in article: Clear at 100px:

QPQ: Done.

Overall: Both hooks ALT 3.0 and ALT 4.0 meet our guidelines Alexcalamaro (talk) 18:08, 15 December 2020 (UTC)

Wonderful. Thanks much Alexcalamaro.

Passing the baton over to you Yoninah to take it from here. I am good with either of the hooks (ALT3 or ALT4). I know you had preferred ALT3 and Bender235 had preferred ALT4. Alexcalamaro -- do you want to cast the tie-breaker vote? ;) Ktin (talk) 18:54, 15 December 2020 (UTC)

I vote for ALT 3.0 option (after all, we are talking about folding proteins). Alexcalamaro (talk) 19:24, 15 December 2020 (UTC)

Thanks much Alexcalamaro. Passing the baton to Yoninah. Over to you now for next steps :) Thanks everyone. I want to specially thank @Jheald and My very best wishes: who have done and continue to do lots of good work on the article. Genuinely thank you folks. Ktin (talk) 19:27, 15 December 2020 (UTC)

I think all these versions of hooks, including ALT3 and ALT4, misinform the reader. No, the "50-year-old grand challenge of biology" has not been solved. There will be many future CASP meetings to assess further progress in this direction. Just saying that it "predicts the shape of proteins to within a width of an atom" is also wrong. No, it does not. AlphaFold-2 makes sufficiently precise predictions only for 2/3 of proteins, according to CASP assessors. But even in these good cases it does NOT predict protein structure with such precision for all atoms, as a reader would assume. Actually, such a claim is simply ridiculous because proteins are dynamic and there is no such thing as the width of an atom. There are only atomic radii, but this is not a single number; they are very different for different types of atoms. Also, this is not "shape", but a three-dimensional structure. The referencing is to a misleading opinion piece. The author does make a claim that AlphaFold can predict the shape of proteins to within the width of an atom, but he apparently does not have the slightest idea what he is talking about. Let's not multiply the misinformation in Wikipedia. Please see the hook I suggested above (it can be shortened if needed). My very best wishes (talk) 19:51, 15 December 2020 (UTC)

All, very good points from My very best wishes, but this gets very close to WP:OR unless substantiated with a clear note from WP:RS. For now, the statements are sourced perfectly from WP:RS, and I think they meet the layperson's needs on the homepage. My suggestion is that we move forward with ALT3 as discussed above. Ktin (talk) 20:12, 15 December 2020 (UTC)

Yes, there are indeed WP:News sources about it (some of which claim nonsense like predicting "the shape of proteins to within the width of an atom"). However, this is an extraordinary and exceptional claim about solving a fundamental scientific problem, and not everyone agrees (some similar WP:News type sources claim the opposite). I think we do need some WP:MEDRS quality sources here, such as serious independent scientific reviews. There are none. The method (AlphaFold-2) has not been published. The official assessment at CASP has not been published in any peer-reviewed journal.

For example, as this article tells, "DeepMind’s press release trumpeted “a solution to a 50-year-old grand challenge in biology” based on its breakthrough performance on a biennial competition dubbed the Critical Assessment of Protein Structure Prediction (CASP). But the company drew criticism from academics because it made its claim without publishing its results in a peer-reviewed paper. ... “Frankly the hype serves no one,”" and so on. I just do not think we should multiply this "hype" in WP. My very best wishes (talk) 20:29, 15 December 2020 (UTC)

@My very best wishes: in a literal sense the protein folding problem is not "solved," since we can obviously always move the goalposts regarding the necessary precision (≥90% accuracy? ≥99%? ≥99.99%?). The jump in precision at this year's CASP certainly deserves to be called a "breakthrough." I agree that the catchy "width of an atom" is not a precisely determined length (just as the even more popular "width of a human hair" is not); the press release said less than two angstroms, which we could use, too. --bender235 (talk) 21:31, 15 December 2020 (UTC)

Yes, one can say a "breakthrough" (I agree), but one cannot say that "the problem was solved" for a number of reasons, such as (a) the protein set at CASP is absolutely not a representative set of proteins (it included only one membrane protein and the group was ranked #43 for this target; it did not include any "intertwined" protein structures, any linear peptides, or any proteins with a unique sequence in genomes, and so on), (b) the method has not even been published and is not publicly available for independent evaluation, (c) AF2 failed for a single multi-domain protein in the CASP14 data set, while such proteins represent a majority in eukaryotes, (d) the method was not tested for protein complexes. This is not at all about the percentage. We simply do not know that percentage. We do not even know the percentage at CASP until the assessment has been officially published. My very best wishes (talk) 18:52, 16 December 2020 (UTC)

I would oppose most of these hooks. OK, let's keep it simple. We do have the page AlphaFold. I think this is a fair page. However, any hook above (except my suggestion) simply contradicts this page. Does it follow from our AlphaFold page that it "has solved a 50-year-old grand challenge of biology"? No, it does not. Does it follow that AF2 "can predict the shape of proteins to within a width of an atom?" No, it does not. Not at all. Take the lead of this page and summarize it in the hook please. That is what I was trying to do. My very best wishes (talk) 15:03, 16 December 2020 (UTC)

Now, let's consider the first hook at the top: "that the results of DeepMind's AlphaFold 2 program in the CASP 14 protein structure prediction competition have been called "astounding" and transformational?" Well, this is actually much better than the last versions. Yes, this is advertisement (just like the others), but at least it is not explicit misinformation. Some people did say that, and most importantly, yes, the results were very good. My very best wishes (talk) 15:23, 16 December 2020 (UTC)

In the spirit of serving our homepage readers, I will still recommend that we go with either ALT3 or ALT4. There is sufficient backing from WP:RS to move ahead. Ktin (talk) 00:00, 16 December 2020 (UTC)
Maybe we could replace the "problematic" word "solved" with "cracked" (also used in the MIT review), so we keep the catchy hook for the "layperson" without multiplying the "hype". What do you think of this one?

ALT 3.1 ... that DeepMind's protein-folding AI AlphaFold 2 has cracked a 50-year-old grand challenge of biology? (source: MIT Technology Review).

Alexcalamaro (talk) 04:06, 16 December 2020 (UTC)

@Alexcalamaro: I am good with this hook (i.e. ALT 3.1). Ktin (talk) 06:30, 16 December 2020 (UTC)

OK, we need an uninvolved reviewer to review ALT3.1. Striking unused hooks Yoninah (talk) 12:48, 17 December 2020 (UTC)

OK. I am an uninvolved reviewer because I have not been at CASP for a long time and I do not have connections to the CASP organizers or any participants. I only helped with editing the page about AF2 in WP. Here is my independent assessment. Yes, there was great progress in protein structure prediction at CASP14. True. However, the "protein folding problem" was NOT solved by AF2 (yet). This is hype. Here is why:

There was only one transmembrane protein in the CASP14 dataset, and the AF2 team was ranked #43 for this target; the prediction for this target by AF2 or other groups is a far cry from solving the structure. Transmembrane proteins constitute at least 25-30% of proteins in the human genome [2] (more by other estimates).
The performance of AF2 was not great for multidomain proteins, as could be expected because AF2 was not tested for predicting protein complexes. The subunits in complexes are similar to domains. Up to 80% of all eukaryotic proteins are multidomain proteins [3].

Was it solved by AF2 at least for single domains of water-soluble proteins? There is no proof of that, because:

Many proteins are represented by just a single sequence or by a few related sequences in sequence databases, so one cannot build a large sequence alignment. However, the AF2 method is actually based on using large, high-quality sequence alignments. We do not know if AF2 was tested for such cases or how it performed.
As follows from presentations at CASP14 (for example, [4]), AF2 did NOT achieve the accuracy of experimental methods. Moreover, looking at the distance cutoff-sequence coverage graphs here for specific CASP14 targets (T1024, T1027, T1028, T1029, T1030, T1032, T1040, T1047, T1052, T1061, T1070, T1091, T1092, T1099, T1100), one can see they are not even very close. For example, T1024 has only 50% of residues covered by the best models for a distance cutoff of 2 Å. Yes, they correctly predicted the protein "fold", even the family to which it belongs (which is great!), but this is a far cry from "solving the protein folding" problem.
AF2 is not publicly available for an independent evaluation.
AF2 and the assessment of AF2 have not been published in any peer-reviewed sources, let alone in WP:MEDRS sources.
The GDT measure used by CASP assessors is a poor (insensitive) measure of performance for high-precision modeling. Having a GDT of 90 or 60 (e.g. [5]) does not mean that 90% or 60% of the structure was predicted with the same accuracy as provided by X-ray crystallography, for example.
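
Illustrative note on the GDT point above: the following is a minimal Python sketch of how a GDT_TS-style score is commonly defined, namely the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose predicted position lies within that cutoff of the experimental position. It assumes the per-residue distances have already been computed after a single superposition (the real measure searches over many superpositions), and the function name and the example distances are hypothetical, not taken from CASP data.

```python
from typing import Sequence


def fraction_within(distances: Sequence[float], cutoff: float) -> float:
    """Fraction of residues whose deviation is at most `cutoff` (in angstroms)."""
    return sum(d <= cutoff for d in distances) / len(distances)


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS-style score on a 0-100 scale.

    Assumption: `distances` are per-residue deviations (angstroms) between
    model and experiment after one fixed superposition.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    fractions = [fraction_within(distances, c) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical four-residue example: one residue within 1 A, two within 2 A,
# three within 4 A, and all four within 8 A.
example = [0.5, 1.5, 3.0, 6.0]
print(gdt_ts(example))                # 62.5
print(fraction_within(example, 1.0))  # 0.25
```

In this made-up example the score is 62.5 even though only a quarter of the residues fall within 1 Å, which is the sense in which a GDT of about 60 need not mean that 60% of the structure is placed with near-crystallographic accuracy.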

My conclusion: Hook ALT 3.1 is misinformation. Do not do it. My very best wishes (talk) 14:31, 18 December 2020 (UTC)

Following the comments above by My very best wishes and aiming to reach a wide consensus, I propose the following alternative hook :

ALT 3.2 ... that DeepMind's protein-folding AI AlphaFold 2 has made great progress towards a decades-old grand challenge of biology? (source: ~~MIT Technology Review~~ Nature).

Alexcalamaro (talk) 08:05, 19 December 2020 (UTC)

Yes, I think that's OK, with one correction: if you need a ref, it should be this [6]. That MIT writer makes too many incorrect claims, such as that AF2 used 170,000 PDB structures for training (they used fewer), etc. My very best wishes (talk) 21:19, 19 December 2020 (UTC)

Comment I have changed the source of ALT3.1 to Nature, and added the hook text to the "Responses" section of the article (to meet Hooks criteria). We need more reviewers to validate the proposal. Alexcalamaro (talk) 06:31, 21 December 2020 (UTC)

Hey @Yoninah and Ktin: I think we have a consensus here with ALT3.2. I am not very familiar with these matters. What is the next step in the DYK process? Thank you. Alexcalamaro (talk) 21:40, 26 December 2020 (UTC)

Missed this one. @Yoninah: as an uninvolved editor, could you please help review this one? I know this has been waiting for a long time, but it is worth wrapping up, imo. Appreciate your helping hand in the review. Ktin (talk) 22:56, 2 January 2021 (UTC)

OK, ALT3.2 looks good but there is a bit of run-on blue linking in the beginning of the hook. What words don't need to be linked? I also would like to know why the two images from a CASP presentation are licensed as fair use. It seems to me that OTRS permission should be obtained from the author. Alternately, can't someone draw up a similar graph that would be freely licensed? Yoninah (talk) 18:04, 9 January 2021 (UTC)

The last point is addressed in the fair-use rationales. Regarding OTRS permission, before Christmas I emailed the DeepMind press account for the block-diagram image, and both the CASP account and John Moult for the graph, and didn't get back a reply from any of them. Jheald (talk) 19:12, 22 January 2021 (UTC)

As for the hook, I would suggest unlinking "AI", as that should be pretty obvious and is a term known to most people. Jheald, still no update on the OTRS? Yoninah seems to be on a short break atm, but I think this DYK should be finished at some point. It's the only one remaining from November. --LordPeterII (talk) 12:03, 3 February 2021 (UTC)

@LordPeterII: Thanks much. This has been waiting for quite some time. Thanks again for picking this up. Let's go without the image. I have written ALT 3.3 with AI removed. Appreciate your approval. Thanks. Ktin (talk) 03:52, 4 February 2021 (UTC)

ALT 3.3 ... that DeepMind's protein-folding program AlphaFold 2 has made great progress towards a decades-old grand challenge of biology?

@Ktin: Oh, I'm afraid I don't feel confident enough to approve this nomination myself :/ I have never reviewed anything, and this article's topic is rather complex. I was merely pinging to inquire about the progress, to get the discussion going again. Maybe some experienced editor or admin can help out... maybe @Cwmhiraeth would you have time for this? (sorry, I'm not really sure whom to ask) --LordPeterII (talk) 09:35, 4 February 2021 (UTC)

Relying on the original review by User:Alexcalamaro and the comments by User:My very best wishes, I am approving ALT 3.3, which seems to meet the DYK criteria. Cwmhiraeth (talk) 10:21, 4 February 2021 (UTC)