Wikipedia talk:Article size/Archive 6
This is an archive of past discussions on Wikipedia:Article size. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
Oct. 2 Lead
Allow me to explain the changes in the lead made on Oct 2. The discussion above titled Proposed load size rule of thumb under discussion is most pertinent to the understanding, but it is the same topic as the two discussions before it and the one after it—"which metric?"
Old | New | Expl. |
---|---|---|
This page contains an overview on issues related to article size. Several different ways may be used in Wikipedia to measure article size, but readable prose size (which is largely the text the reader reads, ignoring references) is generally considered the most important. | This page contains an overview of the key issues concerning article size. Three measures are key: I then explicitly define the three metrics in bullet points. They need improvement. | undefined phrase "readable prose" is used (the parenthetical definition doesn't quite help); summary statement is yet premature I think, because "most important" seems it may shift in the many efforts under discussion, from "readable prose" |
a list of issues that over-sized articles cause | I then term them "usability considerations" | These four "categories of issues that arise" are largely unmentioned in the project body, and could be better integrated |
<nada> | When an article seems the wrong size, five, rule-of-thumb administrative responses are given here. The decision to divide it (break it out) into a new, or to combine it into an existing article, is based solely on the size metric. The rule-of-thumb responses not to; maybe to; possibly to; probably, or certainly to, is currently guided by readable-prose size. Other solutions to large articles, for editors and administrators, and information about the size of articles, are provided below. Note that the licensing policy mandates that any decision to "break out" or combine articles also be accompanied by an edit summary ala "new content from the [[page name]]". | Add'l info from body of project: "break out" is project phraseology; an edit summary with specific content is a licensing requirement; the result of the rule-of-thumb chosen is always one of five possible results. There's room for improvement. |
— CpiralCpiral 06:55, 20 December 2012 (UTC)
Claims for average attention span have never been cited
In previous discussion, the issue of an unsourced claim of an average attention span of 20 minutes was challenged. A source was never found, and eventually the claim was removed. Currently the article/guideline has a claim that there is a "limit of the average concentration span of 40 to 50 minutes." It may well be correct, but if it is, there has to be a source. I am certain extensive research was done into this, so it should not be hard to source it. Of course it is likely that such research was debated vigorously, so beware. The claim was removed before because no one bothered to source it. As in the previous case, and as is customary, I've added a citation needed tag. The claim of the average attention span is really kind of a distraction in my opinion, as the point can be made without resorting to potentially contentious claims. Rifter0x0000 (talk) 10:40, 31 January 2013 (UTC)
Reading speed & attention span statistics
I've removed the parts about averages. I think it's unnecessary to mention statistics that are highly context-dependent and debatable.
Average reading speed isn't cited, but in order to find a source on average speed, we should know how readers read Wiki pages. Do most readers skim? If so, 200 seems low, and a source on average skimming speed should be used.
The source for a 40-minute attention span didn't seem reliable for that fact. It tangentially mentions attention span and states that "the normal maximum attention span is said to be 40–50 minutes" (emphasis mine). (Aren't those weasel words?) In short, a better source is needed. I think a max of 40 min actually seems too long in this day and age. It seems more suitable for an action movie than a long, written article.
In the wider context, I don't think reading speed should even be mentioned. As long as we agree on an acceptable attention span, lengths of long articles should be adjusted accordingly (assuming that most readers would like to read the entire article). –Temporal User (Talk) 09:49, 12 May 2013 (UTC)
- It's better to have a reference than no reference, otherwise it's OR. While OR is not actually banned in policy pages, it's still better to avoid it. If you can find a better reference then we can use that instead. Teapeat (talk) 11:46, 12 May 2013 (UTC)
- The 40-50min attention span seems seldom relevant in this context, when we know from Alexa that time spent on Wikipedia was between 4 and 5 minutes on average in the past two years. --ELEKHHT 22:32, 12 May 2013 (UTC)
- That's not the same thing. Wikipedia is a reference site; the main use is to dip into a page, grab the info you want from a page and go elsewhere. It doesn't matter so very much what size the page is for that.
- But the article size guideline is rather different, it's about how big the article should be for people that actually do read the whole article.Teapeat (talk) 00:36, 13 May 2013 (UTC)
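The relationship debated in this thread (article length, reading speed, attention span) is simple arithmetic. A minimal sketch, assuming the uncited 200 words-per-minute figure and the disputed 40-minute span discussed above; the function names and default values are hypothetical illustrations, not part of the guideline.

```python
def reading_minutes(word_count: int, words_per_minute: float = 200.0) -> float:
    """Estimate the minutes needed to read word_count words at a given speed."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


def max_words_for_span(span_minutes: float, words_per_minute: float = 200.0) -> int:
    """Largest word count that fits within span_minutes at the given speed."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return int(span_minutes * words_per_minute)


if __name__ == "__main__":
    # 200 wpm is the uncited figure questioned above (an assumption).
    print(reading_minutes(8000))
    # A skimming reader at an assumed 400 wpm covers twice as much text.
    print(max_words_for_span(40, words_per_minute=400.0))
```

Because both inputs are contested, the result moves in direct proportion to whichever reading speed and attention span one assumes, which is the objection raised in the thread.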
Question
At what size do the technical issues start to become a problem nowadays? Chrisrus (talk) 18:10, 25 August 2013 (UTC)
- This is a question that has come up in looking at the number of images at History of Painting, where there's 391 images in addition to 118k of readable prose - the question is if this much data is a technical problem for the readers we expect to serve? (There are content and other related issues, but I'm focusing only on the byte-size right now, which I estimate ends up being about 3M for the entire page + images of raw HTML. ) --MASEM (t) 16:27, 24 October 2013 (UTC)
Size in kB being related to readability
I would just like to raise an issue with the article size in kB being related to readability: what about articles that contain a lot of sound files and pictures? In these cases, an article might have a high kB count, while the amount of text might not be overly long. OnBeyondZebrax (talk) 01:11, 6 November 2013 (UTC)
- The page size tool gives a prose kB size. I'm not familiar with how exactly it figures out the difference, but it does give a much smaller figure than the overall page size. CMD (talk) 12:33, 17 November 2013 (UTC)
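As an illustration of how a prose-only figure can come out much smaller than the overall page size, here is a minimal sketch that counts only the text inside paragraph elements of rendered HTML and skips footnote markers. This is an assumed approach for illustration, not a description of how the page size tool actually works; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ProseSizer(HTMLParser):
    """Collect text inside <p> elements, ignoring <sup> footnote markers.

    Assumption: paragraphs are explicitly closed with </p>; implicitly
    closed paragraphs are not handled by this sketch.
    """

    def __init__(self):
        super().__init__()
        self.p_depth = 0      # how many <p> elements are currently open
        self.sup_depth = 0    # how many <sup> elements are currently open
        self.chunks = []      # text fragments counted as prose

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_depth += 1
        elif tag == "sup":
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.p_depth > 0:
            self.p_depth -= 1
        elif tag == "sup" and self.sup_depth > 0:
            self.sup_depth -= 1

    def handle_data(self, data):
        # Tables, lists and captions sit outside <p>, so they are skipped.
        if self.p_depth > 0 and self.sup_depth == 0:
            self.chunks.append(data)


def prose_bytes(html: str) -> int:
    """Return the UTF-8 byte length of the paragraph text in html."""
    sizer = ProseSizer()
    sizer.feed(html)
    sizer.close()
    return len("".join(sizer.chunks).encode("utf-8"))


if __name__ == "__main__":
    sample = (
        "<p>Hello world<sup>[1]</sup></p>"
        "<table><tr><td>cell</td></tr></table>"
        "<ul><li>item</li></ul>"
    )
    print(prose_bytes(sample), "prose bytes of", len(sample.encode("utf-8")), "total")
```

Under this assumption, images, sound files, tables and references add to the overall byte count but contribute nothing to the prose figure.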
explanation of how to edit only introduction
Especially the section talking about difficulties encountered in editing large pages is missing the important info that one can get an edit link next to the introductions of all articles. And why isn't this edit link there by default? --Espoo (talk) 16:00, 19 January 2014 (UTC)
Guidelines for FAs and GAs
I think it'd be a good idea to include a mention of this, specifically to showcase that they can vary widely in size while still being considered among the highest-quality Wikipedia articles. (For example, look at MissingNo. and then at nearly any FA about a country or politician.) Tezero (talk) 21:47, 14 February 2014 (UTC)
On mobile devices
I think consideration must now be given to how material will present on mobile devices such as mobile phones. As smartphones have become more prevalent, more and more people will be reading Wikipedia through such devices. Something that is no problem on a laptop can be a pain to scroll through on a phone. --KTo288 (talk) 22:13, 11 November 2014 (UTC)
- Very true! I'm using a smartphone right now, as I do for perhaps half my Wikipedia time including editing, and I'm a Wikignome. --Thnidu (talk) 16:08, 2 August 2015 (UTC)
Regardless of it specifically stating that WP:SIZERULE shouldn't be used for general content deletion, it's still cited if no other reason exists
Something I have noticed more "veteran" users do is cite this article to delete large chunks of text and entire sections without any other justification. I think that it should also be stated somewhere within the summation or introduction of this article that this is in fact not intended to purely mean deletion for the sake of deletion. Ahoy, --42.114.33.55 (talk) 06:56, 30 December 2015 (UTC)
- Where is that statement made? Either way, if such deletions are a common problem then this should be pointed out in the article. Gap9551 (talk) 21:11, 31 December 2015 (UTC)
100 kB too much? An ideal? Or for old/slow computers/networks? Out-of-date?
Hi, "> 100 kB Almost certainly should be divided". See Special:LongPages. Over 6000 pages this size (probably a lot more, just didn't want to scroll down too much), 6000th is 1999–2000 UEFA Cup [107,611 bytes]. Thereof the biggest 1000 are 174,380 bytes or over. Among bigger than 100 kB, Barack Obama (and lots with his name - lists related to his administration), [5828th] Internet Explorer and my favorite The Big Bang Theory. Do we really want all of these split up? Many are lists, more frequent the bigger they get, probably some even harder to split up logically? I'm not saying we shouldn't have any limits (as a guideline), maybe just higher? comp.arch (talk) 10:55, 25 October 2013 (UTC)
- Many of those can be split apart without much difficulty; the fifth highest, List of people of the Three Kingdoms, is just 26 tables (one for each alphabet character) so that could be split in 4 to bring each below 100k, for example. Articles like Golden Eagle can be split to move finer details to subpages per WP:SS. Interestingly, Barack Obama is only 53k of readable prose - the bulk of the wikitext is in references (and something to remember with Special:LongPages is that it considers the entire page, not readable prose). The 100k limit is still good both technically and keeping in mind the reader's attention and time. --MASEM (t) 13:50, 25 October 2013 (UTC)
We'll never get numbers that are right; that's why it's all vague. However, the importance of size is not declining. All over the world, market share is shifting to smaller screens, often using slower connections. Usage in some countries of "real" computers with big screen and fast connection has even stopped rising and begun declining, in favor of mobile devices connected by 4G or more often slower. Last weekend I was out of town with only those two items, and some articles were just too big to study easily, either by mobile version with sections too numerous or more often too large, or by "desktop" version too big to understand on the little screen. And editing anything big with that slow connection was out of the question. So yes, large articles still ought to be trimmed, split, adjusted. At least, important ones. For trivial articles like a cute current TV sitcom especially appreciated by us geeks, there's little need for rigor in this or other quality criteria. And yes, today I'm back to a luxuriously big screen, real keyboard and moderately fast connection, but an increasing fraction of the WP:AUDIENCE doesn't work that way. Jim.henderson (talk) 15:05, 25 October 2013 (UTC)
- Even trivial ones can be improved. Take The Big Bang Theory. For one, there's a large amount of trainspotting plot details that could be taken out (given this is a sit com and not a serial drama), but easily the list of awards can be split to a separate article, a common practice for highly successful shows and actors. WP:SS is always a good place to start for these. --MASEM (t) 15:22, 25 October 2013 (UTC)
I've put this question to the larger VPP at [2]. --MASEM (t) 00:05, 26 October 2013 (UTC)
- Archive link: Wikipedia:Village_pump (policy)/Archive 110#Our current WP:SIZE metrics and modern technology. ~ Tom.Reding (talk ⋅dgaf) 19:36, 2 May 2016 (UTC)
This discussion is getting confused among three different metrics: readable prose size that the user sees, wiki markup size that the editor sees, and fully expanded HTML etc size that the browser sees. The 100 kB figure applies only to the first. Wasted Time R (talk) 13:36, 27 October 2013 (UTC)
- I wasn't even sure of the reason, I thought it was the user's network bandwidth. That has been improving. As for reader comprehension, many will just read the lead anyway or skip sections? On small screens/my tablet I have to click on each section to see it anyway. I wonder if sections are downloaded lazily or could be. comp.arch (talk) 09:35, 29 October 2013 (UTC)
People kind of missed my point and took individual examples of how pages are too big and can be split up. "Almost certainly should be divided" means all 6000+ top articles should be split up? I'm not going to do it! Or even put a banner on top of each of these articles! There must be (many) exceptions to this rule. People use this as an excuse to not improve (add to) Internet Explorer (see: Wikipedia:Articles_for_deletion/Internet_Explorer_11). comp.arch (talk) 09:56, 29 October 2013 (UTC)
- It's a bit cliché, but a Wikipedia article is a work in progress. The vast majority of our articles won't follow all our guidelines (which aren't hard and fast rules), but they're always open to editing. Improving is not just adding to articles; refining articles also improves them. CMD (talk) 21:32, 4 November 2013 (UTC)
It's interesting that some of us think one of the ways of measuring size, or one of the reasons, is important and the others not. I don't see why this is so. And no, an article being too big should not be taken as an excuse to avoid improving it. Just the opposite. Big articles should be improved, made more pleasant to the screen and to the mind that isn't familiar with the topic. They should be made easier to download, and easier to edit with slow connections and slow methods such as the new, and as yet poor, WP:VisualEditor. The main way to improve such articles is by trimming, especially by a more rigorous application of WP:SUMMARY. 14:12, 7 November 2013 (UTC)
- For long lists, especially glossaries, which are mostly consulted for a single entry (each of which has its own anchor), not read top-to-bottom, even 400 KB is reasonable, and we used to have text in the guideline saying so; while it takes a while to render in editing mode, this is still smaller than many single image files. I'll have to dig in the page and talk history to see why this was moved and on what basis (I'd be almost willing to bet it was removed without consensus). — SMcCandlish ☺ ☏ ¢ ≽ʌⱷ҅ᴥⱷʌ≼ 21:35, 23 August 2015 (UTC)
Inconsistency with Wikipedia:Summary style
This guideline states that for articles under 40 kB readable prose size "[l]ength alone does not justify division". But Wikipedia:Summary style#Rationale states: "What constitutes 'too long' is largely based on the topic, but generally 30 kilobytes of readable prose is the starting point at which articles may be considered too long. Articles that go above this have a burden of proof that extra text is needed to efficiently cover their topics and that the extra reading time is justified." I started a discussion at Wikipedia talk:Summary style#Inconsistency with WP:Article Size to address this issue and also made the suggestion that this guideline be merged with the WP:Summary style guideline. AHeneen (talk) 09:23, 23 May 2016 (UTC)
List articles
It is claimed, at Talk:List of compositions by Johann Sebastian Bach#Page length, that this guideline does not apply to list articles. List of compositions by Johann Sebastian Bach is 702,038 bytes long, and is the third-longest entry at Special:LongPages. Additional input at that talk page would be appreciated. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:53, 1 May 2016 (UTC)
- Likewise, at Talk:List of law clerks of the Supreme Court of the United States#Re-split. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:21, 2 May 2016 (UTC)
- This comment is a mischaracterization of the discussion at Talk:List of compositions by Johann Sebastian Bach#Page length. You prefer to apply the more-strict, less-relevant, non-list-article portions of WP:SIZE, and selectively ignore the less-strict, more-relevant, list-article portion of WP:SIZE, particularly:
"They [(these rules of thumb)] also apply less strongly to list articles, especially if splitting them would require breaking up a sortable table."
~ Tom.Reding (talk ⋅dgaf) 17:00, 2 May 2016 (UTC)
- No, it does not; comments made there include:
"Note also WP:SPLITLIST – which means that other general considerations regarding article size don't always apply to lists. Did I miss anything regarding 'a list or table should be kept as short as is feasible for its purpose and scope'?"
and
"Nah, 'Regardless, a list or table should be kept as short as is feasible for its purpose and scope' is about all that applies here from the page size guideline. Specifically, WP:TOOBIG is about prose (literally: "These rules of thumb apply only to readable prose"), not sortable tables. So, please indicate where the list falls short of 'a list or table should be kept as short as is feasible for its purpose and scope' if you think it does."
Nothing in this guideline supports having pages of over 700K; indeed, it advises that "Long stand-alone list articles are split into subsequent pages alphabetically, numerically, or subtopically." Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:19, 12 May 2016 (UTC)
- Yes, you appropriately applied the guideline to Talk:List of comets by type#Page length, since it was easily split by subtopic. Doing so for the Bach & law clerks lists is much less straight-forward, so you need to consider the rest of the guideline. Saying "It is claimed [...] that this guideline doesn't apply to list articles." is more inciteful than it is accurate, even after the quotes you mention (WP:SPLITLIST & WP:TOOBIG both point to this very guideline), hence your mischaracterization. ~ Tom.Reding (talk ⋅dgaf) 13:31, 23 May 2016 (UTC)
Would like to change Last name in page Chamnongsri Rutnin
Dear Sir, Regarding the page Chamnongsri Rutnin: she would now like to change her last name to the updated one, Chamnongsri (Rutnin) Hanchanlash. Please advise how to change the page name.
Thank you and best regards. Kaewchieranai, Assistant to K.Y.Chamnongsri
tong.hmt@gmail.com +66818463019 — Preceding unsigned comment added by 114.109.170.170 (talk) 08:21, 9 June 2017 (UTC)
The largest article on Wikipedia
See Talk:List of compositions by Franz Schubert#This article is far too long. --Francis Schonken (talk) 04:26, 28 August 2017 (UTC)
This rule seems useless
As far as I know, Wikipedia exists to give people knowledge on things. I don't see how having a large amount of information is a bad thing. Perhaps there should be other sections on articles that condense it? Alex of Canada (talk) 20:19, 2 November 2017 (UTC)
- Please read it more closely; it is not about removing information from Wikipedia, but about moving it around so that articles are not so long that people fall asleep while reading them, or have their browser crash. — SMcCandlish ☏ ¢ >ʌⱷ҅ᴥⱷʌ< 22:50, 2 November 2017 (UTC)
Smartphones compatibility
Are large articles really such a technical problem for smartphones these days? Apart from the very largest, they seem to render ok on my phone. Certainly a good few hundred KB seems fine. Greenshed (talk) 04:45, 11 November 2017 (UTC)
- For most new smartphones sold in the US, Europe, and other very developed countries/regions, the size is not a major problem. However, keep in mind that Wikipedia is also accessed on smartphones that may be a few years old (insufficient RAM) and, more importantly, smartphones in many poorer/developing countries may have more trouble accessing longer Wikipedia pages. English Wikipedia is accessed by people in many poorer countries where English is either a common first or second language. I'm not saying that quality smartphones aren't sold in poorer countries, just that most manufacturers make cheap models to sell there that may be common. That said, I think that connection speed over mobile data networks is a much bigger issue on smartphones than technical limitations of the smartphone itself. Even in the U.S., mobile phone service providers limit the amount of data that can be accessed (each billing cycle) at higher speed and then throttle down the speed after the limit is reached to very slow speeds. AHeneen (talk) 10:54, 22 November 2017 (UTC)
- Still basically a non-issue. Even the longest WP pages are smaller in byte-size than the average medium-quality image file at a photography website, and than any YouTube video clip longer than a couple of seconds. — SMcCandlish ☏ ¢ >ʌⱷ҅ᴥⱷʌ< 21:15, 22 November 2017 (UTC)
So an article has been tagged as too large. What steps can be taken to remove the tag if the user community thinks the article is not too large?
I came to the Wikipedia project page for Article size because a list I work on (List of 2016 albums) has been tagged as being too large. I do admit that the page is approximately the 30th largest article in Wikipedia, but as it is an annual list, it is probably difficult to split it without confusion, and the article is a quarter the size of the technical size limit. I will take the issue of the article's size to the talk page for the article, and if there is a consensus that the article is not too large, I will remove the "too large" tag. If there is consensus for splitting the article, then we the users can do that. However, I think that this project page should have a paragraph on when a tag may be removed without altering the article. I find this project page too vague, too definitive of size and not definitive enough on action. Mburrell (talk) 05:33, 22 November 2017 (UTC)
- See WP:SALAT:
- "The potential for creating lists is infinite. The number of possible lists is limited only by our collective imagination. To keep the system of lists useful, we must limit the size and topic of lists."
- "Lists that are too general or too broad in scope have little value, unless they are split into sections. For example, a list of brand names would be far too long to be of value. If you have an interest in listing brand names, try to limit the scope in some way (by product category, by country, by date, etc.). This is best done by sectioning the general page under categories. When entries in a category have grown enough to warrant a fresh list-article, they can be moved out to a new page, and be replaced by a See [[new list]] link. When all categories become links to lists, the page becomes a list repository or 'List of lists' and the entries can be displayed as a bulleted list. For an example, see Lists of people, which is made up of specific categorical lists."
- I would say that the longest months (if not all, see WP:LISTOFLISTS) be split out into new articles. AHeneen (talk) 10:42, 22 November 2017 (UTC)
- What to do with the list is a decision to be made by the users of the list. I apologize for being vague. The question I had, or the request I was making, was what mechanism should be in place to support the removal of the tag if the article users choose not to split up the list? Would a general vote by the users of the articles who choose to participate in the vote be sufficient? If the tag is left on the article for a month and no-one steps up to split up the article, is it okay to remove the tag? Applying the Too Long tag appears to be an opinion by one user. Can another user apply their opinion and remove the tag?
- What I hoped would happen is that someone who is into the developing rules and definition articles such as the Too Long article would expand on the article to provide a mechanism to enforce the ruling or to allow the tag to be removed. Mburrell (talk) 05:04, 23 November 2017 (UTC)
- If a consensus discussion in article talk concludes without a consensus to split, then just remove the tag. There is no "waiting period" after consensus is clear. RfCs run for a month (max), so if the discussion has been open for a month+ that is surely long enough to determine whether consensus has been reached, and if so, for what. — SMcCandlish ☏ ¢ >ʌⱷ҅ᴥⱷʌ< 08:20, 23 November 2017 (UTC)
Subtopic of too long, excessive header section
Any recommendations for the case of the before-TOC header becoming too large? 4 paragraphs seems a bit big, and 6 (even short!) paragraphs seems right out. 74.104.188.4 (talk) 06:43, 2 March 2018 (UTC)
- See Wikipedia:Manual of Style/Lead section#Length. --Francis Schonken (talk) 06:58, 2 March 2018 (UTC)
Aloha Marketing Tech - A.i. Bot
Meet our a.i. not. KauVakauta22 (talk) 19:00, 3 June 2018 (UTC)
Thank you, well be in touch soon! KauVakauta22 (talk) 19:03, 3 June 2018 (UTC)
Proposing a change to § Splitting an article
I propose changing
- For non-mainspace articles, consider splitting and transcluding into the split parts.
to
- Also consider splitting and transcluding the split parts.
since this method is not limited to non-mainspace articles. See for example List of Latin phrases (full) and its AfD. wumbolo ^^^ 17:10, 2 July 2018 (UTC)
- See also this AfD. wumbolo ^^^ 23:34, 10 July 2018 (UTC)
- Sure, but "transcluding into" doesn't seem to make sense here, just "transcluding". I.e. transcluding the split parts into a larger combo article. — SMcCandlish ☏ ¢ 😼 21:22, 11 July 2018 (UTC)
Largest articles?...
The Technical issues section states that "there are ~1,000 articles larger than 200 kB, the largest being ~1.1 MB (as of December 2016)." Is there a place around WP somewhere where I can see this info, maybe a list of the articles that are larger than 200kB or a tool that I can run to see such a list? Thanks, Shearonink (talk) 17:43, 23 June 2018 (UTC)
- @Shearonink: If you don't get a quick answer here, try WT:Tools, and if not there WP:VPTECH. — SMcCandlish ☏ ¢ 😼 18:29, 23 June 2018 (UTC)
- @Shearonink: (and anyone else): Special:LongPages. Onetwothreeip (talk) 00:25, 13 August 2018 (UTC)
Issues with implementation
In recent weeks, I've been involved in a number of discussions of our largest articles (400,000-600,000 bytes, or more), where people have cited this page as a justification for not splitting the articles, saying things like (I paraphrase) "list articles are allowed to be this long", "there is no rule requiring us to split this article" and "the guideline only applies to prose, not tables or references".
Examples include:
- Talk:List of law clerks of the Supreme Court of the United States#Redux: 2018
- Talk:Opinion polling for the 2015 United Kingdom general election#Redux
- Talk:Sub-national opinion polling for the 2015 Spanish general election#Page size
It is my view that this page is being misinterpreted, perhaps because it is not worded clearly. What do other editors think? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:09, 24 December 2018 (UTC)
- I don't think this page has a lot of traffic. The most important point here is that the intent of guideline applying to prose is that most articles are not simply tables, and elements like tables are usually just additional information. These very large articles which are lists are essentially completely tables, and are more relevant to the reader than the prose. To stick with the intent of the guideline means to consider that these tables are too long and should be split. The rhetoric from some editors like how you describe is not much of a problem and most people are reasonable when there is actually a solution. I've only ever had one incident that ended up on the ANI. Onetwothreeip (talk) 12:48, 24 December 2018 (UTC)
Applying WP:Notable to splitting every topic
Pontificalibus, regarding this and this, I reverted because I haven't seen it always be the case that the split-off topic must meet the WP:Notability guideline. I'm mainly referring to "list of" articles and filmography articles. I see so many "list of" articles that have notable and non-notable fictional characters. But is the topic of "list of" actually WP:Notable itself in those cases? Similar goes for filmography articles. It's common to split out an actor's filmography section into its own article, sometimes even when this is unnecessary. In those cases, are we stating that the filmography is WP:Notable because the actor's films (or some or most of the actor's films) are WP:Notable? Going by what I've seen over the years, and your edit summary, I think this needs wider discussion. On a side note: Since this page is on my watchlist, I ask that you don't ping me to this page when replying. Flyer22 Reborn (talk) 01:49, 21 January 2019 (UTC)
- A reminder that WP:LISTN does exist and does mean that notability standards don't necessarily apply to every list but we'd still prefer some factor of notability for the arcing topic. --Masem (t) 01:53, 21 January 2019 (UTC)
- I think our practice has been to require that a list article has a main "mother" article, and, to save doubt and controversy, to require that each item in a list article is blue-linked, IOW that it has an article where the sourcing proves its notability, and the existence of the article proves it's not some flash-in-the-pan thing. -- BullRangifer (talk) PingMe 01:57, 21 January 2019 (UTC)
- We have plenty of split articles like Filmographies, discographies, and the like where the bulk of the works are blue-linked, but the remainder are unlinked. Key is that sourcing must still be present to support each entry in the list. --Masem (t) 02:05, 21 January 2019 (UTC)
- In that case, the sourcing must be very good and established by multiple sources. Notability is not established by one RS.
- Unlike a normal article, where the topic's notability must exist to get an article, and that by many good sources, a list article's individual elements must be notable, so requiring an article places the burden on the one who wishes to include the item. That weeds out all the spammers and promoters. -- BullRangifer (talk) PingMe 05:27, 21 January 2019 (UTC)
Does this project page regulate size of lists / maximum size non-prose Wikipedia articles?
There have been some discussions in some of the larger Wikipedia articles about splitting the articles, and I and another user, User:Onetwothreeip, have been going around on whether this project page regulates list sizes as well as prose articles.
This project page was created to manage the readability issues. It starts out with three related measures of article size: readable prose size, wiki markup size, and browser page size. My opinion is that this project page does not regulate list size, as the readable prose section says that measure excludes material such as footnotes and reference sections, diagrams and images, tables and lists. Under size guidelines, it again uses the term readable prose size when defining size limits. It also states under readability issues that Wikipedia has practically unlimited storage space, but that long articles may be more difficult to read, navigate, and comprehend. Again, my opinion is that lists have their own internal logic for reading, navigating and comprehending, so for these reasons, I do not believe this project page regulates list size.
On the other hand, this project page lists technical issues, such as users with dial-up, smartphones with bad bandwidth, and others may take longer to load up, so it is difficult for older browsers and such to view larger articles. The end of that section lists the maximum default article size as basically 2 million characters, but at this time, the largest list Wikipedia articles are under 600,000 characters. The list I am most familiar with, List of 2017 albums, hit another hidden limitation, which is that there is a limited amount of template usage allowed per article, and so at 1284 references, the reference table failed, and so the list had to be scaled back. In my opinion, the maximum size of the list is regulated by the amount of references, and so has a natural cap that will keep the article under 600,000 characters, probably, and I do not see the need to split the list.
However, I just might be interpreting the project page words to mean what I want them to mean, and so I would like some administrators to chime in on article size limits, the domain of this project page, and what is best for Wikipedia.
I invite User:Onetwothreeip and/or User:Pigsonthewing to provide a countervailing view. Of course other editors as well, do not mean to be exclusive. Mburrell (talk) 05:28, 10 January 2019 (UTC)
- Well, this is more information about the List of 2017 albums than I believe you revealed on its talk page. If there is a limit to how many references the page can hold, the solution is to split the article so that more entries can be included. Article size considerations are of course applicable to list articles too, and why wouldn't they be? Even if lists have separate requirements, that page is the second largest list on Wikipedia. The reasons to split any article are not to adhere to the 100,000 bytes rule or to be less than 600,000 bytes, it's simply to make an article easier to read and easier to edit. I think some editors may feel that an article being split which they have worked on is somehow penalising them, but if anything this is more like approving the content they have created and says that there is more than one article's worth of work put into it. I don't think any discussion would do well with large opening statements, so I'll just leave it at that. Onetwothreeip (talk) 05:38, 10 January 2019 (UTC)
- I also would like to be enlightened. The List of observatory codes was split by a non-involved editor, breaking sortability and cross-references from a large amount of articles without a single note on the talk page. This seems reckless. The spirit of a collaboration is not given when bureaucrats pursue a single-minded goal of reducing a kilobyte figure while the frustrated contributors/experts potentially abandon the topic. After reading the above posts, I know of at least one other frustrated contributor. Rfassbind – talk 16:54, 21 January 2019 (UTC)
- I am not a bureaucrat. Discussion went on from February 2018 about making the article shorter, and it wasn't split until December by another editor. That split was reverted, I decided to restore the split, and it was reverted again. This is an entirely proper process in the spirit of discussion and of bold edits. Please do not hesitate to query me or bring to my attention any issues regarding splits at any place, including on my user talk page. Onetwothreeip (talk) 22:27, 21 January 2019 (UTC)
The RfC at Talk:Israeli occupation of the West Bank#RfC: Article size may interest talk page watchers here. Input welcome. Icewhiz (talk) 06:53, 27 January 2019 (UTC)
Unnecessary archive links
I wonder how the addition of unnecessary archive links affects this? Some editors use IABot (not an approved bot?) to do this, and forget to set it so it only adds the links to deadlinks. They thus create a huge amount of bloat, and make the article much harder to edit.
Here's a recent example with this edit summary: "(Rescuing 190 sources and tagging 0 as dead.) #IABot (v2.0)". That single edit added 39,490 bytes! I've seen this done where it added over 100,000 bytes. -- BullRangifer (talk) 17:38, 24 January 2020 (UTC)
References
Are all the bytes used for references counted in a "too large" article? -- BullRangifer (talk) 04:55, 25 January 2020 (UTC)
- No, refs aren't part of readable prose. --Masem (t) 06:15, 25 January 2020 (UTC)
- Thanks. BTW, how is readable prose calculated? Does one simply copy the finished article as it appears to readers, minus the references, and then use that figure? -- BullRangifer (talk) 16:10, 25 January 2020 (UTC)
- User:Dr pda/prosesize can do that. --Masem (t) 17:18, 25 January 2020 (UTC)
Content removal section
As seen here, I removed the Content removal section and moved the link to the See also section. This section relied on an essay and contradicts the entire page. We do remove content solely to reduce length. But, yes, that content is saved elsewhere if valid to retain. That is per WP:Preserve. If having a Content removal section, it needs to be better presented. I'm not sure how it lasted so long in that form. Flyer22 Frozen (talk) 02:01, 2 April 2020 (UTC)
- Disagree. This needs to be further discussed and a clear consensus established before the section in question, i.e. Content removal, is removed. Content removal is based on tangential bloating, lack of verifiability, NPOV issues, off-topic material, the breaking out of an unwanted section, per consensus, etc. Content should not be removed if it is presented in summary style, well sourced, has consensus, etc., i.e. it should not be removed simply on the basis that the article exceeds size guidelines. We will need more than the Talk at the O. J. Simpson article over so-called bloating to decide whether this long-standing section should be removed. -- Gwillhickers (talk) 23:19, 30 June 2020 (UTC)
Ruhul Islam Hridoy
Ruhul Islam Hridoy (born 21 February 1998)[1] is a Bangladeshi journalist. He was work in Daily Desh Rupantor. [2] He is also a political analyst. — Preceding unsigned comment added by Ruhul Islam Hridoy (talk • contribs) 04:24, 7 October 2020 (UTC)
- Did this get posted on the wrong talk page? There is nothing in the project page that mentions Ruhul Islam Hridoy. Recommend previous editor determines where the comment should have been posted, and posts in the correct article. Thanks. Also recommend that another editor take this as a vote to delete this section from the talk page for Article size. Mburrell (talk) 04:33, 7 October 2020 (UTC)
SIZERULE
In a recent discussion, the following opinions were offered:
- "As for WP:SIZERULE, that applies to articles, not lists."[3]
- "in practice, SIZERULE applies to any page that you want people to be able to load on mobile or low-end devices, and any page you want people to actually read. Also note that SIZERULE itself currently says "apply less strongly to list articles", not "does not apply to lists".[4]
Also WP:HASTE says "As browsers have improved, there is no need for haste in splitting an article when it starts getting large."
I don't think that latter is accurate. In many third-world countries -- and among the elderly in first-world countries -- low end devices with limited browsers are still in use. We should not assume that just because we have newer browsers that everyone does.
There is another group that can have problems with overly long pages: users who use satellite phones. If you are on a small ship in the middle of the ocean, that satellite may be your only internet access. Such access is often slow, and each packet sees a delay.
Also, a few people actually buy refrigerators with internet browsers in them, and a lot more people use the browser that is in their TV -- and people keep TVs for many years. I know that my TV can access the Internet but it is quite limited.
Finally, as an engineer in the toy industry, I know that it is only a matter of time before the costs get low enough so that we can start putting WiFi and browsers into low-cost toys.
In my opinion, we should make the advice at SIZERULE apply to all mainspace pages on Wikipedia.
Comments? --Guy Macon (talk) 06:04, 30 November 2020 (UTC)
- It definitely should apply to all content. Too many people think how they browse from a first-world country where there may be no seemingly practical limits, but we have to remember about both third-world limits as well as screen readers and accessibility concerns related to that. (I also beg the question of a non-article that can get past SIZERULE without having problems with reference template issues - unless they have no references, which is an even worse problem). --Masem (t) 06:26, 30 November 2020 (UTC)
- Nope-- it is an antique rule that was designed for the AVERAGE reader who now has 1000x faster access. The rule reduces the quality and content of articles. Even "slow" browsers can download faster than people can read so they lose no time. I suggest that depriving folks in poor countries of information is no service to them. As for imaginary cruise ship passengers, they have time to spare for the extra 20 seconds. Rjensen (talk) 06:37, 30 November 2020 (UTC)
- We are looking not at the average reader but the lowest common denominator - those with poor infrastructure in third-world countries. We want to reach the maximum possible audience, not make it convenient for the average audience.
- And I will strongly argue that most articles that get to the 100k prosesize limit (eg ignoring tables, references and other wikimarkup) that SIZERULE gives are almost always possible to split via summary style into two or more articles that are more comfy to read given that we're supposed to be an encyclopedia. It's not only a technical limit but a human comprehension limit as well. --Masem (t) 06:51, 30 November 2020 (UTC)
- A Wikipedia that is designed for the average reader is, by default, suboptimal for 50% of our readers. From Rjensen's "cruise ship" comment, it is clear that they don't understand the technical issues with satellite phones. Cruise ships have onboard WiFi, served by an antenna bigger than your house. That's why I specified small ships. And even the cruise lines have issues: see www.cruisecritic.com/articles.cfm?ID=1419\ and thepointsguy.com/guide/cruise-ship-internet-wifi/ for details. A slow connection is a real problem when you are paying 75 cents a minute to look something up on Wikipedia. --Guy Macon (talk) 07:13, 30 November 2020 (UTC)
- what % of Wiki readers pay 75c per minute? I think the max size rule is suboptimal for 90+% of our users. Rjensen (talk) 08:12, 30 November 2020 (UTC)
- (edit conflict) Oppose – this confounds a lot of things. The SIZERULE is about readable prose, not wiki markup size. There is no direct relation between readable prose and download time; and, BTW, there is also no direct relation between wiki markup size and download time. Note, for instance, that images and other files are neither counted in readable prose, nor in wiki markup size. Same for, for example, templates: e.g. a battery of five detailed navboxes at the bottom of an article counts zero in readable prose, and nearly nothing in wiki markup, but can load slower than a fairly extended list. Also cite templates are known for taking a lot of bandwidth relative to their prose size (zero) and wiki markup (always much smaller than the number of bytes they make you download). So, when comparing, for example, a reasonably-sized well-referenced article about a famous prolific baroque painter (lots of images, lots of templates), and a very extended list, which has no images and only a few general references for the entire list, then the first may load much slower.
- I think download times should be reasonable, and splits may be implemented where possible. See e.g. Wikipedia:Naming conventions (long lists) which implicitly shows how to split up lists. Any bullet list, or ordered list, can be split up in this fashion. Tables generally can: large sortable tables are a bit trickier if one wants to keep the sort functionality (a possible solution is, for example, to provide smaller partial lists for those who don't want to load the big sortable list). But one doesn't have to be freakish about some pages loading slower than others. People with slow connections know that. A prose article with a lot of images can load slow. A comprehensive sortable list can load slow. Provide alternatives where possible, but, e.g., removing all images, references and navboxes from Raphael's article is as bad an idea as converting all sortable tables to splittable bullet lists.
- Another issue altogether is WP:CHOKING, including CHOKING of an edit window: these are the "amount of downloaded bytes" issues that matter. For the edit window, the "amount of downloaded bytes" is effectively correlated to wiki markup size. In which case large chunks of markup code can often be put in templates, so that they take less space (and download time) on the article you're editing. But as such, these techniques can be applied without downsizing, e.g., large sortable tables. --Francis Schonken (talk) 07:21, 30 November 2020 (UTC)
- I was also under the impression that SIZERULE is as much, if not more, about readability than technical issues, as part of Wikipedia:Summary style. As Francis Schonken notes, there are far more important technical considerations for size than prose. Anecdotally, in areas or times with particularly spotty internet, I have found pages with significant numbers of templates to be the biggest barrier. I believe this is why for example templates are not allowed on FA review pages. CMD (talk) 08:33, 30 November 2020 (UTC)
- Oppose - For all the reasons listed by User:Francis Schonken. Article size has three related measures, Readable prose, Wiki markup size, and Browser page size. This applies to articles, and is not in question. In the Readability issues section, there is the section on Lists, tables, and summaries that already sums up the thinking on lists and such, where they may be left intact if no natural way to split the list, keeping them as short as feasible within its scope and purpose. In the technical issues section, it mentions that templates are affected past the post-expand limit issue. Certainly, all lists, tables and summaries should be kept smaller than the post-expand limit. That is the technical limitation, not the article size.
- I work with some large lists, and have made an edit while a user was making an edit using a smart phone. The user complained about the problem of making an edit when someone posts an edit conflict, but did not complain that it was impossible to make edits using a smart phone. I have myself made edits on my older model smart phone. When someone states that technical issues should limit list size, how will it be quantified that there are edit challenges? Should we review large list and table talk pages and see if people are leaving comments about difficulty in making edits due to list or table size? Are we assuming problems that don't exist? As for low cost toys being the new computer lowest common denominator, has it been quantified who the users of these devices will be and those users' desires to edit large tables and lists? When was the last time a child with a toy tablet wanted to add to the List of Joe Biden 2020 presidential campaign endorsements (currently the largest list/article in Wikipedia)? I oppose placing hard limits on list sizes for assumed problems, trying to fix issues that are probably hypothetical, not proven to be real at this time. Let's collect proof of a problem before breaking what works in the name of fixing a perceived problem. Mburrell (talk) 21:12, 30 November 2020 (UTC)
Updating language re slow connections
Is dial-up prevalent enough in 2020 to still make reference in Wikipedia:Article size#Technical issues? Lots of people in rural areas do have slow connections (even just a few kilometres northwest of modern, urban Toronto, there's a large area with near paleolithic download speeds), but it's not technically dial-up. Is there a better way to word the issue? -- Zanimum (talk) 02:09, 21 April 2020 (UTC)
- +1 to Zanimum's point. I would add that I'm pretty sure images are more of a problem than text length. Ed [talk] [majestic titan] 05:38, 13 September 2020 (UTC)
- The ed17 Very true. I may take a stab at re-writing the section, as there have been no objections listed here. -- Zanimum (talk) 02:57, 16 September 2020 (UTC)
- Section appears to be stable and acceptable, and talk here is six months old, so I have removed the section banner. Please replace the banner if discussion resumes. SilkTork (talk) 10:55, 19 March 2021 (UTC)
Inefficient lead
I came to this article because I read today's featured article John, King of England (12762), found it a bit long, and wanted to see what Wikipedia's guidelines say about this. So I searched for WP:ARTICLESIZE, which opened the present article. I read the lead and found it read rather like an introduction to the problem than a summary of a guideline. I had to read for quite a while before finding the table in Section 3.4, which gives size limits in terms of characters. I feel the limits expressed in words would also be useful. According to WP:Size comparisons the average word length in Wikipedia is 6. I feel the lead should contain a sentence that says something to the effect of:
- Articles having a readable prose size in excess of 100000 bytes or 17000 words, whichever is reached first, should almost certainly be shortened. Articles with a readable prose size in excess of 60000 bytes or 10000 words should probably be shortened. The size might be reduced by using more concise language, dropping trivia, offloading to related articles, splitting, or creating articles focussed on matters treated in the article as a related matter. Johannes Schade (talk) 15:38, 19 October 2021 (UTC)
- @Johannes Schade: Thanks for your input. The discussion that precedes this section is an attempt to get some clarifications to size guidelines as they are sometimes misinterpreted. As a fellow OG (old guy), I can appreciate the length issue, but I personally would rather have a longer article as I usually just read the parts I'm interested in at the time. Putting some of the detail into notes is sometimes, but not often, used. I would change your wording "should probably be shortened" to "should be considered for shortening" and I think the size quantifications are about right, in nice, round numbers. Coincidentally, a couple of us were actually having a lengthy conversation this morning about King John and his many mistresses and children. VarmtheHawk (talk) 17:54, 19 October 2021 (UTC)
- The proposed language goes in the wrong direction.
- We should be getting rid of the "almost certainly" language. And the "whichever comes first" language is absolutely atrocious, making it sound even more like crafting articles is an exercise in bean-counting.
- If the language can be made more concise, it should be made more concise, period, regardless of the length of the article.
- If trivia doesn't add to the reader's understanding of the subject, it should be dropped, regardless of the size of the article.
- EEng 18:02, 19 October 2021 (UTC)
- What I am trying to do is nudge one of the people that have the power to edit this guideline (like User:SandyGeorgia?) to write a better lead, which should include the essential numeric limits. I find this is needed for general guidance but perhaps most importantly for use by reviewers so that articles that are too long are not promoted (e.g. to GA or FA). Johannes Schade (talk) 07:39, 20 October 2021 (UTC)
- There are no essential numerical limits. EEng 02:14, 21 October 2021 (UTC)
- Dear User:EEng. I see you are a very experienced Wikipedian and I cannot match your qualifications, but surely the table in Section 3.4 does give numeric limits. Perhaps you can edit the lead? Johannes Schade (talk) 06:55, 21 October 2021 (UTC)
- See the discussion elsewhere on this page. We should be deemphasizing numeric "limits", not giving them a more prominent role. EEng 12:12, 21 October 2021 (UTC)
Clarification needed for "article splitting activists"
There seems to be a recent trend of a couple of people (@Blubabluba9990, @Zsteve21, and @Onetwothreeip) using the Wikipedia:Database_reports/Articles_by_size page and going around to each page and trying to split articles or edit them in some ways incorrectly to try to shrink the size. Is there any description that can be added to this page such that it can be clarified that simply trying to split articles because they're relatively large and ONLY because they're relatively large is not good editing etiquette? Especially when the splitting is being done by non-subject-matter experts, they seem to commonly make mistakes when splitting, and the splits are done without consultation of the regular editors of the pages. Ergzay (talk) 04:01, 15 October 2021 (UTC)
- This guideline already states that such editorial decisions should obtain consensus. The rest seems to fall somewhat within WP:BOLD. If an editor is being perhaps too bold, the best course of action is probably direct engagement with the users. CMD (talk) 06:09, 15 October 2021 (UTC)
- I've split and reduced many articles over the last few years, mostly without any controversy at all. I can't stress enough that the vast majority of articles >450,000 bytes that I have split have been without any opposition from other editors. I'm sorry if some editors supporting such actions have been uncivil, but the etiquette is clearly a matter of how it is done rather than it being done at all and I have always sought to uphold the highest standards of civility, even when faced with spurious accusations of vandalism, sockpuppeting, bad-faith editing or other abuses. Sometimes I have disagreements with editors over splitting or condensing articles, and that's fine, we work them out. I am willing to offer advice or assistance to Blubabluba9990, Zsteve21 or any other editors that wish to help in the size area, but ultimately they will be accountable for their own actions.
- Most of all I would like to stress to everyone that civility should be of the highest importance. Sometimes editors feel that they own a certain article, and can feel offended when other editors seek to make the article congruent with Wikipedia's guidelines and the vast majority of other articles. This should be considered, although obviously we don't let editors make decisions for an article as if they are the owner(s). An editor who has never edited a particular article has as much right to make changes as an editor who has done most of the work on it.
- I would also be the first person to say that editors who have worked on articles for a significant amount of their time are often those who know best the most optimal way to split an article, or to otherwise reduce its size. These articles may not be split or reduced in the way that one might anticipate, but it happens eventually in some way or another. Onetwothreeip (talk) 06:39, 15 October 2021 (UTC)
- @Onetwothreeip "can feel offended when other editors seek to make the article congruent with Wikipedia's guidelines and the vast majority of other articles" Except this is not true. You're not trying to make articles congruent with Wikipedia's guidelines. You're trying to make articles congruent with your own opinion that many articles should be much smaller than they are now. You've made your own guidelines that you think should be followed, and that is fine, but then you go on to assert that those personal guidelines are Wikipedia's guidelines which is simply a form of gaslighting. Ergzay (talk) 20:52, 15 October 2021 (UTC)
- The articles we are talking about are the extremely long articles, several times larger than the average article size. Those articles are inconsistent with the great majority of Wikipedia articles which are much smaller. Articles being split when they get large is a normal process on this project. Onetwothreeip (talk) 21:59, 15 October 2021 (UTC)
- So they're several times larger than the average article. So what? Are they several times larger than the average well-developed, comprehensive article? And even if so, again: so what? Different topics have different needs. And are you really using Wikipedia:Database_reports/Articles_by_size, which reports the source size of each page, not the amount of readable prose? This is the worst kind of gnoming.
- You say "Articles being split when they get large is a normal process on this project" -- yeah, a normal process when carried out by people who have an interest in a topic and have thought about how it might be best presented, not drive-bys who fancy themselves working "in the size area". I'll say it again: worst kind of gnoming.
- Exhibit A: Talk:Glossary_of_engineering#Splitting_this_article was a complete waste of time -- yours; that of anyone else interested in the article; and that of anyone wanting to use the article, since you've uselessly broken it into two pieces so that readers have to jump around. You also broke intra-article links while you were at it. Tell us what you achieved there? And while you're at it, convince the rest of us that you even understand the difference between source size and rendered size (or, if you like, readable size). EEng 23:10, 15 October 2021 (UTC)
- Yes, they are several times larger than the average well-developed comprehensive article, and their excessive size is either an issue itself, caused by another issue, or both. I don't know what you mean by using that particular page, that's simply a weekly summary of the largest articles by the size of the source code. I did not split that particular article you are mentioning, but I'm happy to defend the splitting of any articles I've split myself, or any other issues to do with this area. All I am concerned with is that the articles and Wikipedia itself is improved. Onetwothreeip (talk) 23:21, 15 October 2021 (UTC)
- I've had a look at the split of that article and it seems fine to me. There doesn't seem to be any issues with intra-article links being broken. One of the two halves hadn't been renamed yet, but I've done that now. It looks like the only issue in this example was that editors were too concerned about process. Onetwothreeip (talk) 23:33, 15 October 2021 (UTC)
- Onetwothreeip: Before we go on... where do you get your statistics on the average size of well-developed, comprehensive articles? You say you didn't split Glossary_of_engineering -- that's right, you merely told others it was a good idea [5][6], and now say that splitting it into two arbitrary halves "seem fine". So I'm going to insist that you defend that decision. You have still failed to give any indication of what the benefit was, so I repeat the challenge: how did it help anything? Because here are eight ways it hurt:
- (1) Readers have to think about which of two arbitrary subpages (A-L, M-Z) has the entry they're looking for;
- (2) If you're searching for a word or phrase, you have to do it on two different pages;
- (3) Intra-article links are broken (contrary to what you say -- if you think they're not, then you're not competent to be splitting articles);
- (4) Even once the intra-article links are fixed, it will take significantly longer to follow such links (in 1/2 the cases);
- (5) Countless incoming inter-article links are now broken, and I don't see you rushing to find and fix them;
- (6) Fixing (5) will create pointless churn of watchlists;
- (7) Adding new links from other articles is now harder, since editors have to remember how the list is split;
- (8) Everyone's time has been wasted marveling at this personal crusade you've created for yourself so that you can feel you're doing something useful, which you're not.
- Now, again: what was the benefit of the split? And, specifically, when do you bunch plan to find and fix all the broken intra-article links and incoming links? EEng 04:06, 16 October 2021 (UTC)
- By comparing the sizes of these super-large articles, which are often but not always those with the most source code, with the sizes of what are considered our better articles, such as featured articles. I did express that it would be good for the article to be split, but that doesn't endorse any possible split. The splitting that did indeed take place of that article, I support.
- This is not the right place to discuss the merits of splitting the article, and I'm happy to discuss that on my user talk page. I will briefly address the points you raise. (1) assumes the reader is looking for a specific entry, which is not true. If they wanted the definition of one specific word or phrase, they would use the main search function. (2) is essentially the same point as (1). (3), you'll have to be specific which links you're referring to, but you are admitting in (4) that it's a fixable problem and I don't accept that it takes longer. The same can be said of (5), keeping in mind that I didn't split the article myself. If I did, I would be attentive to particular issues arising from the split. (6), added activity on watchlists is negligible, (7) is not true as the previous links still apply, and (8) it's up to you if you want to spend your time discussing this, that's not my fault or the fault of anyone splitting the article. There are thousands, if not millions, of articles that could use my attention or the attention of any editor, and since I don't have the capacity to address all of the articles we have, I decide which articles I focus on. Onetwothreeip (talk) 06:07, 16 October 2021 (UTC)
- "I did express that it would be good for the article to be split, but that doesn't endorse any possible split." -- What??? EEng 06:51, 16 October 2021 (UTC)
- (1) Of course they may be looking for a particular entry. By your reasoning we ought to have a thousand individual pages instead of one (or, I guess, two) consolidated pages.
- (2) Your response makes no sense at all. Let's say I'm interested in engineering terms related to the word heat sink. I have to search two different pages.
- (3) No, I don't have to be specific what links I'm referring to. If you can't find them without my help then (I repeat) you're not competent to be dealing with article splits.
- (4) So it's someone else's job to fix the broken intra-article links (I guess because you don't even know how to find them). And of course it takes longer in half the cases, since half of the intra-article links are now inter-article links, so that you have to load a new page to follow them. Do you really not grasp that?
- (5) The point remains.
- (6) I guess watchlist churn is unimportant to you, but to those who actually tend to articles it's a significant timewaster.
- (7) What are you talking about? Someone wanting to add a link to a particular entry on what used to be a single page now has to go look to see how the page was split. Many will perhaps be completely unaware that it was split, and unknowingly link to the old article, which no longer exists.
- (8) What about the participants at WP:Administrators'_noticeboard/IncidentArchive1026#Undiscussed_split? Was that up to them as well? Are you just an innocent onlooker, or are you the editor whose activities are raising so much concern?
- I'll note again that, for all the above threadbare excuses for why nothing too bad resulted from the split, you still haven't responded to the most important question asked: What was the benefit? While you struggle to find an answer to that, let's look (as you suggest) at an article you yourself did split. This article [7] was a handy collection of statistics on the 2021 German elections. Apparently because its source was 400K+ (which is a result of every line of every table carrying an external link as a source, not because there's unusually much material in the article, for an article of this kind) you decided to split off one arbitrary piece [8]. Why that piece? How does that better serve the reader? In fact, do you have any idea of how that material relates to the rest of the material? Do you have even the foggiest idea of the significance of what you did, or how it might affect a reader interested in the elections? Let me guess: no. Pinging in Rosguill, who closed the ANI discussion linked in (8).
- EEng 06:59, 16 October 2021 (UTC)
- The opposite of splitting something into a thousand individual articles is to combine a thousand different articles into one. My comment on a talk page saying that it would be good for an article to be split doesn't mean I support every possible way to split an article. Articles should be neither too small nor too large, but often the large size is because of another problem.
- I'm willing to take this extensive discussion to my talk page, but I think you should take a break from your computer as you're getting needlessly heated. To respond briefly, on 1 and 2, readers search using the search bar in the top right. I can't address which links you're talking about in 3, 4 and 5 if you don't tell me which links you're talking about. 6 is bizarre, because edits shouldn't be discouraged on the basis that they appear in watchlists. It only takes a few edits to fully split an article anyway. 7, the old article destination has links to both.
- This next article you mention was never "a handy collection of statistics on the 2021 German elections". It was and remains an article about opinion polling for a federal German election.
"Why that piece?"
It was an especially large part of an article which was not the core content for the article and worthy of its own article.
"How does that better serve the reader?"
Both the content that remains in the main article and the article split off are more accessible to readers.
"In fact, do you have any idea of how that material relates to the rest of the material?"
Yes, the content is about opinion polling for the election; voting intention polling and favourability of the lead candidates. That article is one I have been reading for years and is currently on my watchlist. If you wish to follow up about this article, I invite you to take the discussion to my talk page. Onetwothreeip (talk) 07:26, 16 October 2021 (UTC) (Note: Much of the comment which this is a response to, timestamped 06:59, was added in subsequent edits after I had first read EEng's comment, so my response didn't cover all of what they added afterwards. Onetwothreeip (talk) 02:27, 17 October 2021 (UTC))
- @Onetwothreeip Note, normally with opinion polling you include a constituency prediction based on the opinion polling. They go hand in hand and splitting that article was incorrect based on how the two pieces of information are normally together (look elsewhere on Wikipedia where similar information is presented and those pieces of information are on the same page). I'm going to revert that split of the German article. Ergzay (talk) 15:51, 16 October 2021 (UTC)
- Good idea. Replace the old split page with #REDIRECT [[destination page]] {{R from merge}}. EEng 19:06, 16 October 2021 (UTC)
- That is not true. Constituency results, predictions and polling are typically separate from the other articles in an election series when there is enough content to justify a separate article. Onetwothreeip (talk) 21:21, 16 October 2021 (UTC)
- We're having the discussion here, now, because the real issue isn't any particular article, but your idea that arbitrary splits based on size, and with little or no attention to the effect on the presentation of the material, are somehow helpful. We are trying to help you see that, but you seem unable to engage the issues I've raised -- for example, after doing years of splits you still don't seem to know how to find links broken by a split, and when I've referred to watchlist churn caused by fixing broken links, you responded by saying "only takes a few edits to fully split an article", which shows you still don't understand the issue. So we'll put that stuff aside to focus on this one thing: I've asked over and over what the benefit was of these splits, and the best you've come up with is "Both the content that remains in the main article and the article split off are more accessible to readers". Sorry, but that makes no sense. How in the world does splitting the article make any content "more accessible to readers"? EEng 19:06, 16 October 2021 (UTC)
- Sometimes it's appropriate to split large articles, but that's not always the best solution. Often there are other solutions not only to the issue of an article being exceptionally large, but other issues which also happen to greatly increase the source size of the article.
- You can't say I haven't engaged with what you've been saying; I've taken each point you've made and responded. What you mean to say is that I am not agreeing with the opinions you've presented.
- It is not controversial at all to say that articles being extremely large are harder to read for readers, and harder to edit for editors. You can read the Wikipedia guidelines to see more on that. Onetwothreeip (talk) 21:25, 16 October 2021 (UTC)
- Other editors will decide whether you've engaged my concerns. I'm assuming your statement that "articles being extremely large are harder to read for readers, and harder to edit for editors" is an attempt to answer my request that you explain how (as you claimed) splitting articles makes them more accessible to readers. Let's say that's true, at least all other things being equal (which they rarely are). But how does that apply to the engineering glossary, which isn't "read", or to the German polling, which also isn't "read" (though someone might want to use it to find trends and so on -- a use case you've neatly hobbled by isolating a big part of the data from all the rest)? Please explain. EEng 02:01, 17 October 2021 (UTC)
- Both those articles are read, viewed and accessed by readers. Those verbs can be used interchangeably with my previous use of "read", which should cover those articles. Only one of those articles you mention have I actually split, and I've very easily defended it (and also the split of another article by a different editor). Even if you disagree with an article split I have made, what you should have done is revert the split or raise it with me. In the few circumstances that I have made a split that was contested, this is what editors who opposed it have done. Then we go to the talk page and work it out, coming to an agreeable conclusion as per WP:BRD. Onetwothreeip (talk) 02:21, 17 October 2021 (UTC)
- Read, viewed, and accessed are certainly not interchangeable -- you don't "read" a glossary the way you might read the bio of some senator. But in any event, you still haven't said in what way the split makes it easier for a reader to read, view, or access the material, especially given that you've broken it into two pieces that can't be considered together. Again, please explain. EEng 18:18, 17 October 2021 (UTC)
- Onetwothreeip: Before we go on... where do you get your statistics on the average size of well-developed, comprehensive articles? You say you didn't split Glossary_of_engineering -- that's right, you merely told others it was a good idea [5][6], and now say that splitting it into two arbitrary halves "seem fine". So I'm going to insist that you defend that decision. You have still failed to give any indication of what the benefit was, so I repeat the challenge: how did it help anything? Because here are eight ways it hurt:
- @EEng: Wow that split to glossary of engineering is horrendous. Is there any way to revert these types of things? Ergzay (talk) 15:08, 16 October 2021 (UTC)
- That's going to be a bit harder. More urgent is to put a stop to all this ongoing split nonsense. EEng 19:06, 16 October 2021 (UTC)
- So they're several times larger than the average article. So what? Are they several times larger than the average well-developed, comprehensive article? And even if so, again: so what? Different topics have different needs. And are you really using Wikipedia:Database_reports/Articles_by_size, which reports the source size of each page, not the amount of readable prose? This is the worst kind of gnoming.
- The articles we are talking about are the extremely long articles, several times larger than the average article size. Those articles are inconsistent with the great majority of Wikipedia articles which are much smaller. Articles being split when they get large is a normal process on this project. Onetwothreeip (talk) 21:59, 15 October 2021 (UTC)
- @Onetwothreeip "can feel offended when other editors seek to make the article congruent with Wikipedia's guidelines and the vast majority of other articles" Except this is not true. You're not trying to make articles congruent with Wikipedia's guidelines. You're trying to make articles congruent with your own opinion that many articles should be much smaller than they are now. You've made your own guidelines that you think should be followed, and that is fine, but then you go on to assert that those personal guidelines are Wikipedia's guidelines which is simply a form of gaslighting. Ergzay (talk) 20:52, 15 October 2021 (UTC)
- Referring to 123IP, after years of disruption and IDHT refusals to accept the concerns of myriad editors, I think the only solution is a topic ban against splitting of any kind, including discussion of the subject, as much of the disruption is on talk pages. This excessive focus on article size is weird and counterproductive. -- Valjean (talk) 20:18, 16 October 2021 (UTC)
- Not true at all Valjean, I have a long record of collaboration on article talk pages with editors I disagree with. You've lied about me before, which you admitted to after being called out by other editors, so I don't think you are being or will be constructive in advising me. I would much prefer to have disagreements over content than whatever personal issues you may have with me, stemming from our interactions on contentious articles. Onetwothreeip (talk) 21:29, 16 October 2021 (UTC)
- Well I've never run into you before, and my analysis is exactly the same. I think we should wait to hear from the admin who closed the ANI thread on this two years ago, and then decide how to move forward. EEng 02:01, 17 October 2021 (UTC)
- You should've raised any concerns you had with any of my edits on the talk pages of those articles. You're overreacting. Onetwothreeip (talk) 02:08, 17 October 2021 (UTC)
- EEng, I see quite a bit of heated discussion about article splitting philosophy, and a handful of editors asserting that specific edits were poor. I don't have a strong opinion about the topic. Onetwothreeip has made an adequate effort to respond to several complaints here. It's clear that several editors disagree with Onetwothreeip about article organization, and that Onetwothreeip's changes are discovered by said editors long after they have been made, creating a scenario where you're objecting to a pattern of behavior rather than challenging individual edits. I would need to see much stronger consensus that Onetwothreeip's recent edits were undesirable and reckless to justify a sanction. signed, Rosguill talk 06:02, 17 October 2021 (UTC)
- I should have been clearer that it's indeed the pattern of behavior that's of concern here. Obviously a sanction (read: topic ban, as was proposed at ANI last time) would need careful evidence and a community discussion. I was just interested in your thoughts about the situation, given that you closed that discussion (and so perhaps have the best sense of its gestalt); I wasn't suggesting that you do anything. EEng 18:18, 17 October 2021 (UTC)
- 123IP, that's an oddly hypocritical personalization, considering you're charging me with actually lying about you. I suggest you strike that and stick to the issue, which happens to be your attitude toward article splitting. -- Valjean (talk) 02:06, 17 October 2021 (UTC)
- Your entire comment was about myself personally. I would much rather discuss issues to do with editing articles. Onetwothreeip (talk) 02:10, 17 October 2021 (UTC)
This entire discussion shows that there needs to be additional guidance on article size. Although I think it unlikely that this is a coordinated effort, there is a group of editors whose objective seems to simply be to split articles. It would be instructive to look at how the list of longest articles has evolved over the last year, especially the editors and their rhetorical tactics and creative use of the Wikipedia policies. Just some examples from a recent "split battle":
- "The largest, second largest and third largest articles should be split, or in some way have their size reduced." [Obviously an impossibility–there will always be a largest article.]
- "The article is almost at 500,000 bytes, so it is not consistent with WP:SIZE." [Again, obviously wrong, as the reference only discusses readable prose]
- "Our size guidelines do allow for articles to exceed 100,000 bytes, but this article is a few times larger than that." [The first part contradicts the second bullet above, the second part is irrelevant.]
- "The reason for splitting this article is best summarised as making it easier for readers to access and view the overall content, which may be better done over more than one article." [A segue to the alternate argument, pointing out a problem that doesn't exist.]
- "I don't think you would be convinced by anything I would show you." [The fallback approach when the others are failing.]
- "The prose size limits are there for the ease of the reader in reading the main content of the article, which is typically the written prose for most articles." [A made-up "rule"]
- "When assessing the size of a prose article, we typically don't consider tables, images and other elements to be the primary content of the article, but that's obviously not tenable for articles which primarily contain those elements." [Another made-up rule, where the second part contradicts the first part.]
As I said, there needs to be some written policy on this because once started, the assault never stops. VarmtheHawk (talk) 16:19, 17 October 2021 (UTC)
- Well summarized. Can I trouble you for diffs for the above, or links to the discussions? EEng 18:23, 17 October 2021 (UTC)
- Those quotes all appear to be mine. In the first one, I was referring to the articles that are currently the very largest, not all articles. All the other quotes are correct in their context. What's most important is to take an approach that evaluates each article's needs separately, so for example if an article's content is mostly in tables, we would evaluate the size of the content within the tables. Onetwothreeip (talk) 21:21, 17 October 2021 (UTC)
- Thank you for explaining that "The largest, second largest and third largest articles should be split..." referred to "articles that are currently the very largest, not all articles." Those of us with a public education had trouble figuring that one out. You might note that this is in direct opposition to your last sentence above. Maybe you could enlighten us as to what your position is on this issue, quantitatively if possible. VarmtheHawk (talk) 05:23, 18 October 2021 (UTC)
- See Tall_poppy_syndrome#Etymology. EEng 14:33, 25 October 2021 (UTC)
- Great analogy. But, as noted below, some tall poppies are not really tall at all; all are equal but some are more equal than others. And, as I frequently point out, there will always be a largest article. VarmtheHawk (talk) 17:52, 25 October 2021 (UTC)
Discussion about proposed solution
We obviously need some clearly stated official wording to guide editors when the idea of splitting is broached. We are not talking about normal content editing here. Splitting is rarely necessary and should always be preceded by a thorough discussion and near 100% consensus for splitting. It should NEVER be a BOLD move.
With other content changes that may be considered controversial, BOLD does not apply, but sometimes a passing editor is not aware of any controversy and they make a BOLD controversial edit. In such cases, they should follow BRD when their edit is reverted and not restore their change. They should allow the status quo version to remain untouched until a discussion has produced a very solid consensus. With normal content editing, BOLD is okay once, but if there are objections, caution should then rule. We are not talking about normal content editing here.
Proposal: It should be plainly stated here and at BOLD that:
Article splitting (which is never a normal content type edit) is a de facto controversial change that excludes appeals to BOLD. Splitting is too consequential a change to do as normal editing and using BRD. The possibility of edit warring over a split should be excluded. A split should only happen after an official RfC reaches a very clear consensus, determined by outside observers, not the one wishing to do the splitting.
Let's discuss and improve this suggested wording. -- Valjean (talk) 19:01, 17 October 2021 (UTC)
- Wikipedia doesn't have a problem of bold edits which split articles. Any objected bold edit which splits articles gets reverted and they don't get reinstated unless there's consensus. I would certainly self-revert a bold edit splitting an article if I was asked to do so. Onetwothreeip (talk) 21:24, 17 October 2021 (UTC)
- History has shown that to not be the case. BOLD splits have often caused problems. This proposal would prevent the many debacles that have led to much debate, edit wars, disruption, wasted time, and strong warnings which have been ignored. Let's plug that open pit so more people don't fall into it and even more editors have to waste time pulling them out and cleaning up the mess they have made. We would not be here if this proposal had been our guideline for splits. We're here because it hasn't been. Following this proposal would also prevent the need to threaten topic blocks for BOLD edits that did not enjoy consensus and the ensuing, long, IDHT discussions that have often followed. That's history. Let's enforce the basic principle that is supposed to work here, which is collaborative editing. Let's stop the kind of solo editing that creates problems. Splits are not normal content editing, so they should be treated differently. -- Valjean (talk) 02:14, 18 October 2021 (UTC)
"Splits are not normal content editing" – That's a really good point. In many cases they're more akin to RMs, and should be treated as such (in -- I repeat -- many cases, but by no means all). EEng 06:00, 18 October 2021 (UTC)
- Moves are actually often very routine and unremarkable. RMs are in effect only for contested moves. Onetwothreeip (talk) 06:20, 18 October 2021 (UTC)
- What part of "in many cases ... I repeat -- many cases, but by no means all" do you not understand? You have an extremely annoying habit of responding to fragments of what others say. EEng 17:47, 18 October 2021 (UTC)
- You're here because you want to be here, in your case because you saw a comment on my talk page. "Bold splits" have a very simple solution when they are contested: they are reverted. That is what's happened every single time they were contested before and is normal process. If someone is edit warring over it, that's a specific matter solved through our usual processes. Onetwothreeip (talk) 02:51, 18 October 2021 (UTC)
- I'm not quite sure what the first sentence means, but simply repeating the current policy doesn't really add much to the discussion. Valjean has raised a valid point, and I don't think the policy anticipated the destructive editing that is occurring. Case in point is zsteve21. He says, and I quote: "I am just a novice Wikipedia editor on my own who wants to make articles have manageable markup sizes." Notwithstanding the bizarre nature of that statement, do we want a novice editor making bold split decisions as he has numerous times in his impressive 2-month experience with Wikipedia? I don't much care about "List of Hallmark Movies" but don't think an article like "Glossary of Engineering" should be messed with without a discussion with the large number of expert contributors. VarmtheHawk (talk) 05:23, 18 October 2021 (UTC)
- At this point I don't know if we need a guideline change, or a handful of topic bans, or both. Your last example there is pretty scary. EEng 05:46, 18 October 2021 (UTC)
- If a particular editor is making bad edits, that's a specific matter and not a flaw of policy. It would be helpful if editors could raise what they think are the bad edits. These are all pretty solvable simply by reverting such bad edits. Onetwothreeip (talk) 06:22, 18 October 2021 (UTC)
- This is not about "good" or "bad" edits. Splits can be either. They are bad when made as BOLD edits without a pre-existing consensus for "if" and exactly "how" it should be done.
- Splits are not normal editing. They are very different and should be governed by different rules. They should not be subject to back-and-forth and BRD editing procedures. All the preliminary work should be done on the talk page, with no attempts to perform the split until a consensus is reached. -- Valjean (talk) 17:35, 18 October 2021 (UTC)
Summary to-date. Since this discussion appears to be winding down, I thought it would be useful to summarize the major points that have been made.
- There is a group of editors whose mission is to split the largest articles, regardless of merit.
- This group will use a myriad of arguments, generally untrue, irrelevant or exaggerated, and will continue to recycle these arguments until the contributors of the article are worn down.
- Counterarguments by the subject matter experts to these arguments are met with derision or requests to prove their counterarguments by comparing their work to other Wikipedia articles; responses are never good enough.
- They support the destructive process of "bold splitting" and believe that it is easy to counter, despite evidence to the contrary.
- They are very familiar with Wikipedia policies, much more so than an average editor working on an article.
- They do not support any changes to policy.
- They are particularly adept at using the SIZE argument, making it mean whatever they feel like at the time.
Please feel free to add to this list or to show that any of them are untrue. VarmtheHawk (talk) 17:41, 18 October 2021 (UTC)
- There may be more but that's a great start. Perfect description of the behavior on display at Talk:Glossary_of_engineering:_A–L#Reverting_the_split. A friend suggests that
What's needed is much more prominent guidance at WP:SPINOUT that this should only be done to well-established articles when there is both a strong consensus and adequate subject expertise to make a sensible subdivision. WP:HASTE is too wishy-washy about how maybe you might think for five seconds before breaking out the chainsaw.
I think that's a great framework to work from. EEng 17:02, 19 October 2021 (UTC)
- In looking into the background of this issue, I've noticed that, in the last six months, the top 10 longest articles have been targeted and split by this group, relegating them to a lower spot on the list. Of the current top 20, almost all are identified as targets for splitting. The first two, List of chess grandmasters and List of Falcon 9 and Falcon Heavy launches are subject to fierce debate (in addition to the revisit of Glossary of engineering). Yet the third, List of The Amazing Spider-Man issues has no such comments. Even more interesting is the fact that #7 and #18 are about Donald Trump and his presidency and both exceed 100k in readable prose. I wish someone would comment on this dichotomy. Perhaps a list of longest articles by readable prose?
- What often goes unsaid is that an article once split may again be subject to another split as the list narrows. The attempt to call a moratorium on further discussion of splitting List of Falcon 9 and Falcon Heavy launches was, of course, met with derision. VarmtheHawk (talk) 17:54, 19 October 2021 (UTC)
- When you say "top 10 longest", you mean longest per that database report Ergzay mentioned in his OP at the top of this thread? EEng 18:04, 19 October 2021 (UTC)
- Hopefully fixed. VarmtheHawk (talk) 18:53, 19 October 2021 (UTC)
- My point is that that report is about wikisource size, which is completely irrelevant. EEng 01:24, 21 October 2021 (UTC)
- The longest articles purely by prose would mostly be related to Donald Trump and recent American politics, in my experience. Onetwothreeip (talk) 06:49, 20 October 2021 (UTC) Onetwothreeip (talk) 06:16, 20 October 2021 (UTC)
That's not even remotely true (see, for example, Douglas MacArthur), but I think the picture on this issue is getting clearer:
- The arguments usually presented towards splitting an article are frequently misstated, and the articles in question are almost always in compliance with WP:AS.
- There is not a "one size fits all" approach that works.
- Many articles should be split or otherwise reduced in size, but any argument for doing that should include a valid reason. For example, if one section is considerably more detailed than the rest of the article, that may be a candidate.
- Appearance on Special:LongPages (and similar reports based on wikisource size) has zero weight in arguing for an article to be split (e.g., List of The Amazing Spider-Man issues);[further explanation needed] all arguments should relate to the amount of material the reader sees, distinguishing article prose vs. tabular (and similar) material vs. notes and references vs. images and other visuals, etc., and take into account the distribution of material into sections, the nature of the topic, ways readers are likely to approach the material, and so on.
- The list of prose-size breakpoints (e.g. "almost certainly split" at 100K) is just something someone wrote on the back of an envelope 15 years ago, yet certain editors treat it like the Ten Commandments.
- Because of WP:HASTE, articles should not be split boldly. If an editor feels that an article needs to be split, they should make a concrete proposal and consensus should be reached. Significant weight should be given to the opinions of the subject matter experts.
If the vote is against, the issue should be put to rest until the article has major changes.
In particular, this practice of constantly throwing up arguments, with the corresponding scramble to respond, really should stop.VarmtheHawk (talk) 23:52, 20 October 2021 (UTC)
- Subject to User:VarmtheHawk's approval, I've made some changes to the above list. EEng 01:37, 21 October 2021 (UTC)
- Yes, no problem. The comment above reflects that the List of The Amazing Spider-Man issues is not materially different from the lists being contested and yet no one has a problem with it. What I meant to say on the proposed deletion was: "If the vote is against, the issue should be put to rest. Should the article significantly change through the addition of material, the issue could be revisited." Either way is fine with me.VarmtheHawk (talk) 01:48, 21 October 2021 (UTC)
- There's always a general discouragement of re-raising a question too soon, but specific language saying "You can't raise this again until X" would be very unusual, and there's no special reason for it here. It would become a point of contention, trust me. EEng 01:58, 21 October 2021 (UTC)
- VarmtheHawk, I agree, and would like to add that article splitting is so consequential that long discussions by advocates should be avoided. They should only try to split articles where they meet no resistance from other editors. The need for splitting, and manner of doing so, should be readily apparent to all and uncontroversial. If there is much resistance, they should move on and not try to press the point. -- Valjean (talk) 15:19, 21 October 2021 (UTC)
Rewrite of WP:SIZERULE section
I did an initial rewrite of the WP:SIZERULE as we seemed to be making no progress in the discussion. It's been rewritten to use words instead of byte size, as humans don't read bytes; we read words. I took the previous values and used a length of "5" for the word length, which is intentionally small to also uprate the length of articles to be considered and also factor in spaces and other punctuation characters. Ergzay (talk) 01:51, 26 October 2021 (UTC)
- Switching from bytes to words has one VERY important effect, which is to stop people from stupidly looking at the size of the source instead of the amount of readable prose. EEng 02:09, 26 October 2021 (UTC)
- Can we recommend how people can find this info? Maybe using Wikipedia:Prosesize?VR talk 21:24, 28 October 2021 (UTC)
- Ergzay I'm not sure if the "5" factor makes sense. I'm looking at today's main page article and it's 27,527 bytes or 2,345 words, meaning a factor of 12. Yesterday's main page article had a factor of 39.VR talk 21:32, 28 October 2021 (UTC)
- For God's sake, the 27k is the wikisource size, not the readable prose size, which is 14k. If we can't get stuff like this straight this conversation is doomed. EEng 05:39, 29 October 2021 (UTC)
- @Vice regent I wasn't trying to match it exactly. I intentionally was doing it to slightly expand the maximum size of articles as the rule was written back when computers and phones in general were less performant and couldn't handle large page sizes. (The sizes were originally added before 2006 when flip phones were the norm even in the US.) Exploring the page history is illuminating. Ergzay (talk) 23:11, 28 October 2021 (UTC)
- @Ergzay: ok but I don't see this as merely a slight expansion. The last few main page FAs have had this many words: 3339, 2345, 4799, 3308, 3003, 3931, 2055, 2869, for an average of ~3,200 words/featured article. So maybe we should recommend splitting a lot earlier than 20,000 words.VR talk 00:24, 29 October 2021 (UTC)
- I don't think the factor of bytes to words should matter. The point of the size rule is to separate out the byte count as the reason to split the article, as the byte count includes tables, references, re-worded links, and other things that should not be counted for the reason to split articles. Wikipedia uses XTools to calculate prose words and characters, and I verified by copying over the article page for 1989 (Taylor Swift album) to Microsoft Word, deleted out tables and photos, and used the Word Count tool on the Review tab, and got a word count of 5,132 words, for 34,053 characters, for an average word size of 6.64, compared to XTools calculation of 4,819 prose words and 30,423 prose bytes or characters with an average word size of 6.31, not the factor of 39 mentioned above. This is a reasonable size article that is not close to needing to be split. If we compare a word count of 4,819 to an article byte size of 188,818 for those stuck on byte size, then if this article grew to 20,000, the article might have a byte size of 784,000 bytes, maybe an imposingly large article size, about 50% larger than article sizes of 500,000 that the article editors have been pursuing. I suppose we could compromise on 15,000 words, which if we extrapolated the Taylor Swift album article, would take it to 588,000 bytes, in range with other extra large articles, and almost 95,000 prose bytes, at which point it is a good idea to recommend splitting the article. Mburrell (talk) 03:11, 29 October 2021 (UTC)
- 15,000 words is nearly 5 times more than the average featured article that has ~3,200 words (1989 (Taylor Swift album) is the biggest FA; I'm using FAs on main page in last week as a random sample). I'm seeing plenty of other FAs also in the 2,000-5,000 word count range. I think moderately sized articles are easier to read and maintain and that's what we should strive for.VR talk 04:25, 29 October 2021 (UTC)
- I'm going to draw a line in the sand right here and now on this (and I'm sorry if you feel picked on -- not my intention):
- (a) Our convenience in editing is of ZERO consequence. All that matters is what serves our readers best.
- (b) Articles are very rarely "read". Most, er, readers read the lead, read or skim the first section or two, and then dip in here and there according to level of interest or what they're after, possibly using the TOC as a guide. Talking about making articles "easy to read" (read top to bottom, that is) is a red herring.
- EEng 06:00, 29 October 2021 (UTC)
- @Vice regent To be frank I'm in favor of deleting the section entirely. The rule was originally written in a time frame of the internet when it was dominated by low performance devices with very low amounts of internal memory. That is not the norm now even for the most underpowered of Android phones. I changed it to words to "repurpose" the size section for something useful, as splitting on fixed byte size in this day and age is frankly ridiculous. Ergzay (talk) 05:54, 29 October 2021 (UTC)
- I think there still need to be byte-size considerations, particularly when you get to pages like tables and lists that do not use a lot of prose. While it is important to not have extensively long prose articles and thus reasons to split, we also don't want pages that are extremely large in byte-size for readers on slower/limited connections (5G and fast connections are *still* not universal). You probably need to have both word count and byte size, though word count should be the leading reason to split. --Masem (t) 23:14, 28 October 2021 (UTC)
- I think this suggestion conflates two different discussions, because lists and tables are excluded from the paragraph about readable prose. The Lists, tables and summaries section does not have a size limit specified, but these days there is an unofficial splitting logic that reduces tables and lists when they approach 500,000 characters.Mburrell (talk) 03:11, 29 October 2021 (UTC)
- Let's make the unofficial, official. And I'd prefer splitting much before 500,000 characters.VR talk 04:27, 29 October 2021 (UTC)
- Characters are not bytes though. Wikimarkup isn't readable characters either. None of these metrics are good as they are all open to interpretation. It's better to have no rule at all and better to have a "no extreme articles" rule and "split where appropriate" rules rather than constantly trying to chop articles to smaller sizes purely based on their size. Some topics need lots of references and sources which inflate the size to extreme sizes despite the page itself being small. Others have massive amounts of prose with little sources and could often have sections split out and summarized when they become too bloated. There is no hard and fast rule based on size on when something should be split, so writing it down in a page like this just gives excuses to trolls to come and split a page you're working on (I had the poor experience to encounter such a troll recently which is what caused me to come to this page and start this effort to fix this page). Ergzay (talk) 06:12, 29 October 2021 (UTC)
- @Masem When you say "not 5G", I don't even have 5G nor does anyone I know. 5G isn't even relevant. Byte size is a historical remnant of the time when 2G (or slower) was the norm and devices had memory sizes that were given in terms of single digit megabytes. The size limits were originally added even before the iPhone 1 came out with 128 MB of onboard RAM (and little left for the web browser) which was huge for the time. The era of trying to limit page sizes to such an extreme extent is long past. Modern websites even clock into the megabytes (which I agree is too much), but trying to chop webpages at the 100kb or 200kb mark is just absurd. Ergzay (talk) 06:00, 29 October 2021 (UTC)
- While I think this whole guideline is in dire need of reform, I vigorously object to etching in stone a new set of numbers coming off the back of some envelope in 2021, to replace the previous written-in-stone numbers that came off the back of some other envelope 15 years ago. And I absolutely cannot believe there's still talk about "byte size", meaning the size of the wikisource -- which is absolutely irrelevant to what the reader sees, the cost of downloading, or anything else. And even if you fixed that goofup in the discussion and talked about HTML (etc.) size instead, that's still irrelevant to download cost. Are there people in this day and age who don't realize that images completely dominate download cost? And is there anyone who thinks this guideline will be substantially changed without an eventual RfC? EEng 04:56, 29 October 2021 (UTC)
I'm going to muscle in here to order you all to do something, on pain of excommunication (and I've got a personal pipeline to the pope, so I can arrange that if really necessary). Before anyone says one more word, everyone needs to go to Preferences > Gadgets > Browsing and check the box that says Prosesize: add a toolbox link to show the size of and number of words in a page. That adds a "Page size" link to the toolbox to the left of each article. When you click that, it barfs back a bunch of statistics for the article you're looking at, including
- Prose size (text only): XX kB (YYYY words) "readable prose size"
Those, and only those, are the numbers we should be discussing (at least when we're talking about prose, not tables and quotes and stuff).
After everyone does the above, I'll allow discussion to resume. The Great and Powerful Oz has spoken! EEng 05:46, 29 October 2021 (UTC)
- I have reverted the change, which did more than a slight upgrade to length, it actively changed the level of guidance so that it was far more encouraging of longer articles. Humans read words not bytes, but bytes are used as a proxy here, on the assumption that 10,000 words is around 50,000 bytes. The SIZERULE section is a supplementary rule of thumb for the rough 10,000 word guideline. Further, the edit removed mention of two tools which can measure the byte proxy, with no replacements. Regarding overall technical size, the byte size of the SIZERULE section explicitly refers to prose size, so it does not correspond to the download size, the loading size, or similar considerations. If there is a technical issue relating to overall technical size (eg. Wikipedia:Template limits), it would need to be reflected in the Markup size section. CMD (talk) 05:38, 29 October 2021 (UTC)
- I made the change and I frankly agree. I'm primarily in favor of deleting the section entirely, but I kept the section and rearranged it as a concession. If the preference is that it's entirely bad I'd prefer to delete it. Ergzay (talk) 06:01, 29 October 2021 (UTC)
- I've deleted the section as an alternative rather than trying to massage it into something more appropriate for this day and age. If this is agreeable (or no comments in a few weeks) I'll also go fix all the now broken links to the section. Ergzay (talk) 06:04, 29 October 2021 (UTC)
- I love your enthusiasm, and personally I'm for it, but just killing all numbers is never going to fly without substantial discussion. Or maybe it will -- wouldn't that be wonderful? EEng 06:15, 29 October 2021 (UTC)
- Right now I'm trying to drive more discussion on why people think fixed size limits are a good idea at all. As far as I'm aware, if you have internet at all wikipedia web page sizes are going to be smaller than the rest of almost anywhere else on the internet. Even the largest pages (that don't have images). I'd like one person to arrive with real numbers that can justify such limits. As so far it's just been hand waving. Ergzay (talk) 06:19, 29 October 2021 (UTC)
- I think EEng's doctrinal invocation should be noted here. The presence and absence of images is not relevant to the SIZERULE subsection, and overall web page sizes similarly do not relate to the subsection's purpose or intention. If the issue relates to images and overall web pages, I am not sure why SIZERULE is being edited in any direction. CMD (talk) 06:44, 29 October 2021 (UTC)
- I think you're agreeing with me but to be honest I'm not sure what you just said. EEng 06:47, 29 October 2021 (UTC)
- @Chipmunkdavis My primary impetus for starting this discussion is people (gnomes/trolls) abusing SIZERULE to go around to many disparate articles that they are not involved with, trying to split them, often ignoring discussion or not seeking to bring in the usual editors of the page to consult their opinions. Then if you go against their splitting of the article they immediately point to SIZERULE and use the wikimarkup size as proof that they are doing good work by chopping other people's articles into smaller pieces. I wish to stop this behavior. How we get there I am not particular on. Cutting off their incorrect use of a very old article written in the days of 2G and sub 100 MB memory phones by deleting/modifying/etc the section they are using seems like a good start. In either case the article should be changed because it is outdated for the modern era. Ergzay (talk) 07:54, 29 October 2021 (UTC)
- Anyone pointing to wikimarkup size to support SIZERULE is not applying SIZERULE, and if done consistently should be handled as disruptive behaviour as in any other area of the wiki. With regards to 2G and sub 100 MB, such factors are not relevant to SIZERULE, which is more or less a part of MOS, and not too attached to how modern our era is. CMD (talk) 08:22, 29 October 2021 (UTC)
- @Chipmunkdavis How is it part of MOS? Isn't it just something that people just got used to as it encrustified into "this is just how we've always done things"? Look at the section I created below. It definitely started as a technical limitation when the rule was originally created. Ergzay (talk) 11:43, 29 October 2021 (UTC)
- It is similar to MOS in how it is treated, providing general parameters for the formatting of our articles. MOS is indeed pretty encrustified. CMD (talk) 12:34, 29 October 2021 (UTC)
- Well I'm up for getting MOS changed. It deserves to be with regards to this. Ergzay (talk) 12:51, 29 October 2021 (UTC)
- That's fine, but it should have a wide discussion involving the areas that use it, such as the GAN and FAC processes. CMD (talk) 05:52, 30 October 2021 (UTC)
- Today's FA is Climate change, which is a complex topic with scientific, political, and economical dimensions. Despite its complexities, it is covered in prose size of only 53,000 bytes (8298 words). Each of the article's main sections has its own article, and what's left behind is a summary (WP:SUMMARYSTYLE). To me limiting article size is not about bandwidth limitations, it's about article quality.VR talk 15:50, 31 October 2021 (UTC)
- Took a look at Climate change. You are correct that the article has a prose size of 53 kB, 8294 words. It has a wiki-text of 263 kB. So if we took the 8294 words, scaled it up to 20,000 words proposed for a size limit, we would have a prose size in bytes of 128 kB, and a total wiki-text size of about 634 kB, not that we are trying to use wiki-text size, so just using that to compare to currently enforced unofficial standards. This makes it a little larger, but not excessively larger, so if I am reading your statement as a comment on article size, it seems to say that a proposed 20,000 word limit would be acceptable? Or are you suggesting that a smaller 15,000 word limit would be more acceptable (a scaled 95 kB prose text, 475 kB wiki-text)? Maybe I am missing the thrust of your argument on the discussion on article size. Are you saying article size does not matter, as long as every article is written to the quality of a featured article standard? Could you expand on what you are trying to state in terms of article size? Thanks. Mburrell (talk) 20:50, 31 October 2021 (UTC)
- @Mburrell: yes I think 15,000 words should be the limit. Although personally I'd prefer even lower, as the policy page does quote 10,000 words as ideal from a human attention span perspective[9]. Smaller pages force us to summarize content, which is incredibly useful to the average reader (they can always go to the spun-off article if they want more detail).VR talk 23:11, 31 October 2021 (UTC)
- Or they could keep reading the current article if they want more detail. I'm sorry, but this discussion is built on sand. I've just removed the passage asserting that articles should be X words at most because humans read at Y words per minute and can only concentrate for 40 minutes -- cited to a book on management (not psychology, or education, or anything like that) -- and which at the same time links to the article attention span -- which, interestingly, says that adults can concentrate for 5 to 6 hours. It's all a mess of conjecture and OR, founded only on a few random editors' unsupported assertions about what our readers want or need. EEng 00:56, 1 November 2021 (UTC)
- Can we have an RfC to decide this? Guidelines should reflect a broader and stronger consensus than we have here.VR talk 01:15, 1 November 2021 (UTC)
- I too would like to see an RfC. I am mostly in agreement with the changes that User:EEng is doing, but I agree that we need a broad and strong consensus to change a project page. I would not mind a fuzzy upper limit where reducing or splitting a prose article should be discussed, and I would not mind setting the relative (not absolute) limit at either 15k or 20k, and I wouldn't mind an upper fuzzy limit on lists and tables as well, but it should be based on community agreement, or real size logic, and not the hand-waving logic that EEng has been excising. I think the current modifications to the article give a good discussion point for the RfC, but I would like to see community buy-in on the changes. Mburrell (talk) 02:13, 1 November 2021 (UTC)
- As I note in a separate section below, Wikipedia:Splitting is a parallel page that very much overlaps this one (or the way this one was until the axe was taken to it recently) re the triggers and considerations for splitting, plus it gives detailed how-to on carrying out splits. I think the thing to do is to take this conversation over there, or get the participants there over here, and hash it out among us -- before opening any RfC. On the down side, I now have an external commitment that's going to take up a lot of time, and may not be as attentive as I know all of you would love me to be. EEng 02:52, 1 November 2021 (UTC)
- As there's no recent discussion there, best to continue it here. And yes the RfC would affect the wording both here and at Wikipedia:Splitting. What exactly will we be asking in an RfC? a) no size guideline, b) prose-based size guideline of a max of 10 or 15 or 20 k words, c) a guideline that is based on both prose and other markup? Sorry just hypothesizing.VR talk 16:35, 1 November 2021 (UTC)
- There shouldn't be any use of markup in the size guideline, if we have a guideline at all, as that reason existed historically only for technical reasons that no longer exist.
- @EEng I know you said you'd be busy soon, but you're leading this very well so far with your edits. If you want additional people to comment just ping me however and I'll stop by. I'll be watching this page but probably not regularly looking. Ergzay (talk) 05:37, 2 November 2021 (UTC)
- It's impossible to overemphasize the following: There are clearly editors over at WP:Splitting who would be interested in what we've been doing here, and my guess is they don't have this page watchlisted, which explains why there's been so little comment so far. They need to be brought into the discussion before we start thinking about a project-wide RfC. It's just that I've been hesitant to open that door given my other commitments. EEng 11:05, 2 November 2021 (UTC)
- Comment - so I came to this page this morning, looking for the usual article size guide table, only to find it's gone completely... and on the back of a few bold edits by just two or three editors, which has already been reverted once by Chipmunkdavis. So I've reverted again, pretty much for the identical reasons given by CMD above. The guidelines on article length are a longstanding and highly-used aspect of the MOS, and I cite the 60kb "probably should be split" guidance frequently at FAC and elsewhere. Of course, there are exceptions, and the guidance already gives advice about not being hasty, but the general guidance is sound, and it's not just about length of time to load the page (something which is still a factor for those in the global south who don't enjoy the advanced internet connections that we do), it's also a simple issue of readability and good article design. If changes of this magnitude are to be effected, it needs to be via a sitewide RFC, and with extremely good reasons set out as to why having long articles is suddenly fine and dandy, when it never has been before. — Amakuru (talk) 10:17, 4 November 2021 (UTC)
- @Amakuru They're longstanding because they've been forgotten about with the advancement of technology. Please see my summary in the documentation down below. The rules are abused by trolls/gnomes to chop articles on the basis of wiki markup size rather than some reasonable standard about clarity or topic coverage being too wide. There are numerous long-standing articles that have been ruined by the adventurism of these types of people. The size rules date to the era of 2G pre-smartphone phones, when many people had dialup internet, and should be discarded. Ergzay (talk) 15:26, 4 November 2021 (UTC)
- Also, do note @Vice regent recently put an item on the talk page for starting discussion to head into an RFC. Ergzay (talk) 15:27, 4 November 2021 (UTC)
- It really doesn't matter how the guidelines came into being 15 years ago, the point is that they are in effect today and they are used regularly to inform size decisions and I see no evidence that the rules of thumb contained in those tables are not relevant now. As noted previously, the recent FA articles on climate change and Earth both come in at significantly below 60kb of prose size, yet these are among the most complex topics that one could possibly seek to write an article on. So it is not only possible to write articles that aren't too long, it is also desirable. From a stylistic standpoint as much as from a technology one. And, on that topic, the "advancement of technology" you mention may be significant in the western world, but as someone with experience working in Africa, I can assure you that bandwidths and data rates there can still be limited.
- If the guidance re bytesize is misunderstood by those whom you characterise as "trolls/gnomes", then the solution is to clarify the language around this guidance so that it's crystal clear that we refer to prose sizing rather than Wiki markup sizing. The solution is not to throw the whole guidance out altogether, just because a few people misunderstand it. Also, the concept of prose size as a byte count is well-established and already used to ensure minimum article sizes in processes such as WP:DYK and destubathons, so let's not pretend this is an archaic and little-understood metric. Cheers — Amakuru (talk) 15:57, 4 November 2021 (UTC)
- Confusion re source size vs prose length is the least of it: the history shows that the numerical limits are simply made up, the end product of a cascade of arbitrary transformations applied to an original, real, 32K limit on source size (which of course no longer applies), leavened by some nonsense about human attention span. They're built on nothing, and while it may be comforting to feel you're guided by some kind of authority [10], it's not healthy for articles to be cut apart on such a basis.
- There were dozens of changes made, the substantive ones explained in edit summaries. If you think something should be restored or changed, do that (after duly considering the reason offered for the original change, of course), but blindly reverting everything because you miss the comfort of someone telling you what to do instead of deciding for yourself, no. EEng 18:12, 4 November 2021 (UTC)
- The edits start here [11]. I propose we keep them. If no one says what they don't like about them in the next few days I'll be putting them back. EEng 06:35, 10 November 2021 (UTC)
- I didn't like the edit here. VR talk 13:07, 10 November 2021 (UTC)
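As an aside, the proportional scaling used in Mburrell's comment in this thread can be reproduced with a short sketch. It assumes that prose size and wiki-text size both grow linearly with word count, which is the comment's own simplification rather than a measured property of articles; the function name `scale_sizes` is hypothetical and introduced only for illustration.

```python
def scale_sizes(words, prose_kb, wikitext_kb, target_words):
    """Scale prose and wiki-text sizes linearly to a target word count.

    Assumption: both sizes are proportional to the number of words.
    """
    factor = target_words / words
    return prose_kb * factor, wikitext_kb * factor


# Figures quoted for Climate change in the thread.
WORDS, PROSE_KB, WIKITEXT_KB = 8294, 53, 263

for target in (20_000, 15_000):
    prose, wikitext = scale_sizes(WORDS, PROSE_KB, WIKITEXT_KB, target)
    print(f"{target} words: ~{prose:.0f} kB prose, ~{wikitext:.0f} kB wiki-text")
```

Rounded to the nearest kB, this gives about 128 kB and 634 kB for 20,000 words, and about 96 kB and 476 kB for 15,000 words, so the 95 kB and 475 kB quoted in the comment appear to be truncated rather than rounded values.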
wp:size under discussion
The redirect page wp:size is currently discussed at WP:RFD. --George Ho (talk) 10:13, 23 December 2021 (UTC)
Documentation of the history of WP:SIZERULE
Here I will document the history of SIZERULE and show how little it has been updated in recent years. SIZERULE first appeared with the creation of this page on March 7th, 2003. At the time the max page size was given as 30K, and it was explicitly written as a technical limitation of browsers of that period. Smartphones didn't exist in 2003, and data plans for phones, such as they were, often used 2G or worse. Some people had cable internet but most people still used dialup. At some point in the intervening years the value was tweaked to 32K, likely to be base 2, and there were additional changes clarifying that the limit was only for the article, and not for lists, as the meaning started to drift away from it being for a technical reason. Some time before 2005 or so a clarification was added that said that mobile browsers and some web browsers crop any pages longer than 32KB and refuse to load any more. On January 17th, 2006 the limit was increased to 50kb. On February 22nd, 2007 the limit was increased to 100kb. And there it has sat for 13 years. With a technological and digital revolution happening around it, we still keep chopping articles to 100kb in wikitext length for "technical reasons".
Does this not strike anyone else as utterly ridiculous? Ergzay (talk) 08:19, 29 October 2021 (UTC)
- It's not only utterly ridiculous, but completely and totally ridiculous as well. And here's more ridiculousness: that early guideline was talking about the size of the wikisource [12], but then suddenly someone apparently just stuck in the words "of readable prose", thereby completely changing the meaning [13].
- Then in 2006 someone actually proposed (AND I AM NOT MAKING THIS UP) a "Mandatory breakup committee":
First, an editor tries to establish consensus: the issue is brought up on the talk page, and it is suggested that the regulars break up the article into subtopics, with short summary paragraphs (w/ main article attachments), see thermodynamics as an example, so that the main page gets below a certain limit. Second, if plan #1 stifles out in argument and indecision to act, for a number of consecutive weeks, then a breakup arbitration committee notice is placed on the talk page, putting an ultimatum deadline, such that either the regulars break up the page to below a certain limit by that date or an external breakup committee, enforced by a team of administrators, will do so.
- This cookie-cutter approach persists to this day (see elsewhere on this very page) and must be resisted at all costs. As one editor put it (elsewhere on the page just linked):
The persons providing the justification for limits on article size are predominantly "techies" for whom the writing part is a chore compared to the joy of formatting pages, blocking miscreants and otherwise engaging in the plumbing aspects of html page production. These are the folks who theorize that readers will get bored with articles that are longer than x kb (notice how the limits are in kb and not words - very instructive) often because of their own inadequacies in that department. What is lost in this discussion is that some articles are well written and can hold the readers' interest far longer than much of the mediocre prose found in other entries.
- Just so! I absolutely support removing the numerical limits, which are fashioned from whole cloth and based on no evidence whatsoever about what readers want or need. EEng 13:47, 29 October 2021 (UTC)
- I'm afraid you're both mistaken. There are many content creators, including yours truly and SandyGeorgia, who emphasize the importance of writing concise articles and using summary style when they get too long. Anything over 10,000 words is unlikely to pass at FAC. (t · c) buidhe 10:34, 7 January 2022 (UTC)
- Note to self and to my fellow editors: Don't forget Wikipedia:Splitting, which repeats the stupid character count cutoffs, claiming they're based on an assertion that readers can concentrate for 30-40 minutes, citing our very own article Attention span -- which says nothing like that, rather says 5-6 hours. We all know this varies tremendously depending on the reader, motivation, nature of material, and 50 other things, and figures like 30-40 minutes (or 5-6 hours, for that matter) are just pulled out of the air.
- This page and that page need to be harmonized somehow; they're really both trying to do the same thing. I know! Let's merge them! EEng 22:55, 30 October 2021 (UTC)
- I agree that the attention span argument is quite unfounded; in my experience many readers don't even read a full Wikipedia article from top to bottom anyway; they are looking for a specific piece of information. But I think that's not the only reason why you'd want to limit how long articles can get. Extremely long articles have longer load times and are harder to navigate. I've spent some time at Wikipedia talk:Manual of Style lately and that page is so long that I constantly get lost. If we want to aid readers in finding information then splitting extremely long articles up into ones that are overseeable seems like a good idea to me. ―Jochem van Hees (talk) 16:41, 2 November 2021 (UTC)
- I'm afraid I must disagree with some of what you say. The load-time argument is completely fallacious given that most articles' download time is driven almost entirely by images. And saying something about articles based on your experience with MOS is like saying you drive a compact car because you had trouble finding the bathroom on a 747. They're completely different animals. EEng 22:44, 2 November 2021 (UTC)
- Sorry I only noticed your reply just now. I'm not sure which fallacy my argument has, nor where you got that statistic for download time from. I'm not at all an expert on this but I did a quick test by loading the page Border control, which is not only huge in page size but also has loads of images. According to Chrome's network devtools, loading the page content took longer than any of the images; the content took 833ms while the images were anywhere between 10ms and 130ms, and they were downloaded in parallel. (That's with my relatively good wired connection; it will take significantly longer on a weak mobile wireless connection.) In any case, if you're really that dissatisfied with me using the MOS as an example, I only used that example because that happened recently and was therefore quickly on my mind. I have had similar issues during the UEFA Euro 2020 when I often wanted to look up the latest developments but had to scroll all the way down each time; especially on mobile it's hard to find stuff. Or an article that I have worked on myself, List of Eurovision Song Contest entries, which was ginormous before we split it into two. ―Jochem van Hees (talk) 12:13, 5 November 2021 (UTC)
- @Jochem van Hees The mobile site being a bad user experience is mostly a result of them collapsing every table section heading by default. You can scroll from the top of the page to the bottom in an instant, though, as a single flick can move very quickly from top to bottom. Ergzay (talk) 16:17, 6 November 2021 (UTC)
- I was just trying to replace some references in an article whose entire size (not just prose) is 131K. It was slowing my browser down to edit the entire article, so I had to edit it section by section. Does this happen to others too? VR talk 13:08, 10 November 2021 (UTC)
- @Vice regent I've never had that problem personally, even for large articles (the one I commonly edit is over 400K). It sometimes takes a couple of seconds to load the page and then submit the edit, but there is no problem browsing or editing the page. I've heard that the "visual editor" is extremely bad/slow for Wikipedia. Are you using that? Ergzay (talk) 20:09, 10 November 2021 (UTC)
- No, always source editor. Maybe I have too many windows or tabs open? VR talk 20:32, 10 November 2021 (UTC)
- I'm not sure. I'm on Firefox on a more recent Macbook M1 but I had no problems on my 2015 Macbook Pro I used to use. Ergzay (talk) 21:25, 10 November 2021 (UTC)
- Yes. I live in a first world country and use a recent laptop with an updated browser with a fast broadband connection. Editing starts to slow down around 40-50k wikitext and when it gets much longer than that, either you have to live with a lot of lag or go section by section. (t · c) buidhe 10:31, 7 January 2022 (UTC)
- Then go section by section. That's what section editing is for. It's incomprehensible that you're putting the convenience of your editing over the needs of the reader. EEng 00:32, 8 January 2022 (UTC)
- And I am also in a developed country with modern technology yet am finding J.K. Rowling hard to edit. There are still very good reasons for the size limits, not all related to technology, and one of the key historical issues left out of the initial analysis is attention span and average time to read the page. I suggest the page has not changed because it is still useful as is. SandyGeorgia (Talk) 00:27, 8 January 2022 (UTC)
- "one of the key historical issues left out of the initial analysis is attention span and average time to read the page" – And the evidence about attention span, and types of users who want to "read the page" versus use it in other ways is ... where? EEng 00:32, 8 January 2022 (UTC)
- Our experience at WP:FAC and WP:FAR (where you don't contribute) shows that too-long articles accumulate bloat, and are very difficult to write and maintain, which has a detrimental downstream effect on the article quality and therefore the reader experience compared to an article kept to an appropriate length. A reader who wants more detail on a specific aspect should visit a sub-article. (t · c) buidhe 00:41, 8 January 2022 (UTC)
- That a "too long" article is ... well, too long, is a tautology. The question is: how long is too long? At what point should a particular article have a chunk split off? It's self-evident that articles should be, just as you say, "an appropriate length", but that takes judgment based on the topic, not some stupid one-size-fits-all table based on, AFAICT, just something someone arbitrarily wrote down 15 years ago.
- I really appreciate the "where you don't contribute" throwaway, because it gives me a chance to remind everyone that FAC reliably produces articles which conform to a checklist of mindless rules but which are often pretty awful, sometimes laughably so. As well expressed in the essay User:Physchim62/Situation Normal: All FACked up, "ensuring that featured articles meet the featured article criteria is NOT the end in itself." EEng 01:19, 8 January 2022 (UTC)
Should the size guideline be removed?
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
This guideline currently contains a table described as "Some useful rules of thumb for splitting articles, and combining small pages". Should this table (and the two short paragraphs of attendant notes) be retained or removed? XOR'easter (talk) 18:21, 28 June 2022 (UTC)
Survey
- I support removing the size guideline. Wrote my reasoning above. Ak-eater06 (talk) 04:18, 28 June 2022 (UTC)
- I also support the removal of numerical "limits", though it might be helpful to give editors an idea what the distribution of articles sizes is. And there's plenty of room for guidelines about how to usefully think about article length as an important aspect of topic organization and presentation. I suggest other editors review the earlier threads on this page to get an idea of the recent history of this issue. EEng 04:27, 28 June 2022 (UTC)
- Oppose simple scrapping, without any alternatives being proposed. There is no hard limit as is, there is a range of guidelines which can be applied differently if situations warrant it (depending on how the overall topic is structured with regards to WP:SUMMARYSTYLE for example). As it stands, we have articles whose topics fill multiple books. Unless the proposal is to have book length articles (which is one reading of one of the quotes above I suppose), then there are obviously going to be some guidelines on the matter. The quote from EEng above is correct that size is "one of many considerations" and that this guideline shouldn't be "blindly applied", however those are not reasons to scrap this guideline, they apply inherently to all guidelines on en.wiki. CMD (talk) 05:22, 28 June 2022 (UTC)
- I'm sympathetic to most of what you say, but the problem is that this particular guideline is peculiarly susceptible to being mindlessly "enforced". EEng 06:43, 28 June 2022 (UTC)
- Oppose it's not about "devices [not having] the capacity to navigate long articles smoothly", it's about brains not having the capacity to navigate long articles smoothly. Indeed, the attention span of our average user now is likely even less than it was when Wikipedia was first formed, back in the year 6 B.I. ("before iPhone"), at a time when everybody read books (for fun! not just for school!) instead of being glued to their smartphones. Which, of course, have now colonized brains with the idea that a "full page of information" is whatever fits on a six-inch (15cm) diagonal screen. If it were only about device capacity, then articles could be a thousand times longer than they are now, and a couple years from now, a million times longer. Mathglot (talk) 07:28, 28 June 2022 (UTC)
- It's great to see that the outmoded, 20-year-old made up stuff about attention spans and reading speed -- the justification for this guideline until now -- is being replaced by new and modern made up stuff about attention spans and reading speed. EEng 13:17, 28 June 2022 (UTC)
- Oppose These are guidelines, not rules, designed to say that Wikipedia articles should be neither too large nor too small. They should contain as much information as a narrow subject can allow, and when there is the opportunity to expand the overall content by splitting an article into one or more, that opportunity is usually taken. Guidelines like these are necessary to promote good editorial standards, such as articles neither being "book-length" when it is more appropriate for a topic or list to cover more than one article, nor many articles being one sentence long when they could be reasonably merged into one article under a broader topic. Onetwothreeip (talk) 07:44, 28 June 2022 (UTC)
- Oppose per Mathglot and Onetwothreeip, and keep the limits where they are. WP:NOTTEXTBOOK is policy. Miserably long articles are ... miserable to read, difficult to check, often filled with bloat and text-to-source integrity issues ... and unencyclopedic. WikiBooks is a sister project for those who want to write books. SandyGeorgia (Talk) 10:49, 28 June 2022 (UTC)
- Adding on, now that the non-neutrally-framed RFC has been converted to a still-less-than-neutrally-framed RFC containing a large and one-sided introduction. As a huge percentage of Wikipedia content is sub-standard, allowing articles to sprawl even larger will only add to that problem, and make articles harder to hold to any reliability and harder to check for core policies. Wikipedia is an encyclopedia, not a book; allowing articles to sprawl even larger will only add to Wikipedia's already existing quality problems, and make it harder to prune the extraneous trivia and enforce any standards of writing. I cannot recall having seen an article larger than 10,000 words of prose that couldn't be trimmed and wasn't full of unnecessary detail. This proposal will make an already bad situation worse. SandyGeorgia (Talk) 19:33, 28 June 2022 (UTC)
- Oppose I think there are solid reasons to avoid using the numbers given as gospel (that's why it's a guideline), but the proposal seems to act as if article splits are the only option. Given Wikipedia's purpose as a general encyclopedia and its focus on summary style, the other answer that should probably be considered first in every case is whether to just cut less-important or extraneous details/information. Der Wohltemperierte Fuchs talk 11:01, 28 June 2022 (UTC)
- This is completely illogical. Extraneous detail should be cut from an article regardless of the article's length. And cutting "less important" material just to fit some arbitrary size guideline is even worse than splitting just to fit some arbitrary size guideline; at least in a split the material is still somewhere. EEng 13:21, 28 June 2022 (UTC)
- Your post would make the same point more effectively if you cut the first sentence, which unnecessarily personalizes comments made by another editor; it seems perfectly logical to me that too-long articles are often that way because of extraneous information. Also, see WP:BLUDGEON re the entirety of this page. SandyGeorgia (Talk) 13:44, 28 June 2022 (UTC)
- I'm sorry, but you're just repeating the illogic. If an article is "too long" because of extraneous information, then the extraneous information should be removed. And if the article isn't "too long", but has extraneous information, then the extraneous information should be removed. Same either way. The two things aren't related. Saying that we should keep an otherwise unfounded length guideline because it incidentally prompts people to do things they should be doing anyway, and meanwhile also prompts people to do things they shouldn't be doing anyway (like removing "less important" material, or splitting the article) is (and here I'll say it again) completely illogical. As for WP:BLUDGEON, I'll just respond by pointing you to WP:BAREASSERTIONSABOUTATTENTIONSPANANDWHATREADERSWANTREPEATEDOVERANDOVERWITHNOEVIDENCETOBACKTHEMUP. EEng 14:13, 28 June 2022 (UTC)
- Oppose. Not everyone lives in a first world country with the latest technology and fast broadband. Long pages DO take longer to load and ARE difficult to impossible to edit for some people. If wikipedia wants to be the encyclopedia that everyone can access and edit, then pages must be accessible to all. DrKay (talk) 16:44, 28 June 2022 (UTC)
- Doctor, I like you, so it pains me to point out that this is a completely false argument. Any one image in a article takes more download bandwidth than all the text put together. EEng 17:34, 28 June 2022 (UTC)
- I know. That's why I, and others who were in similar circumstances, spent 10 years accessing wikipedia with images turned off. DrKay (talk) 20:13, 28 June 2022 (UTC)
- Support removal of quantitative number-of-kilobytes rules, which are too easy to enforce blindly and AFAICT have no empirical foundation in either technical or human limitations, having been changed from one round figure to another without methodical study of what is slow to transfer over which network and why. Moreover, the specific table being talked about refers to "Readable prose size", not "amount of data that a browser has to download", so if we are basing our guidelines on the latter, we should still scrap the existing table as a distraction. Concerns about attention spans are more relevant to article organization than size; "get to the important stuff quickly" is an argument for a good lede but not an argument for a short page overall. Long articles can be hard to check, but material spread across multiple articles can be even harder. Pages grow out of sync, content gets added to the main article rather than the more appropriate sub-article because the main article gets more traffic, etc. The Wikipedia is not a textbook policy mentioned above strikes me as a red herring. Content can be long and not-textbook-like, or short and textbook-like. Textbookishness is about working step-by-step through mathematical calculations, asking leading questions, and other such stylistic choices. For example, if Speed of light (FA, ~141K) were written like a textbook, it would probably start with the Maxwell equations, write them for vacuum conditions, derive a wave equation, deduce the propagation speed of the resulting waves, etc. Likewise, Pi (FA, ~158K) presents mathematics encyclopedically rather than textbookily. XOR'easter (talk) 16:55, 28 June 2022 (UTC)
- Textbookily is a great word! I'm adding it to my spellchecker. EEng 17:34, 28 June 2022 (UTC)
- Speed of light has a prose size of 46 kB, within the lower bounds of the currently suggested length, while Pi has a prose size of 64 kB, only slightly over the currently suggested length. Holding these up as examples of well-written articles would seem to support the current guidelines. CMD (talk) 01:21, 29 June 2022 (UTC)
- Except that people aren't thinking of "prose size", even though that's what the table is nominally about; they're making arguments based on data transfer, which includes everything that DYKcheck doesn't put in yellow. XOR'easter (talk) 01:35, 29 June 2022 (UTC)
- That may be the case, but there are no quantitative rules relating to data transfer to remove. (Although we should probably create one for WP:PEIS.) I'm not sure we have great tools to handle data issues at the moment, an easy way to disable images on mobile Wikipedia is probably a good start, but out of scope here. CMD (talk) 02:00, 29 June 2022 (UTC)
- Disabling images and auto-collapsing infoboxes are both ideas for mobile browsing that feel like they should have been implemented long ago. But if data issues are "out of scope here", then we fall back on the question of whether long articles (by the "readable prose" metric) are too long to be useful. To perhaps further clarify: I picked the first two Featured articles that came to mind which could illustrate the difference between textbook-style and encyclopedia-style writing. For long FA's, Allied logistics in the Southern France campaign has 79,068 characters of "readable prose", Harry S. Truman manages a whopping 84,021, Sonic the Hedgehog has 62,573, Vampire has 61,025, Intelligent design edges the line with 59,137, Pink Floyd build a wall of 71,290, Paul McCartney scores 82,506, and The Beatles need to break up at 90,185. Perhaps appropriately, Byzantine Empire breaks the 105 barrier with 105,111. I just don't think the guideline provides more than an illusion of objectivity, and it's so good at doing that that it becomes a risk. XOR'easter (talk) 02:08, 29 June 2022 (UTC)
- I'm getting a slightly different 103 kb for Byzantine Empire, but either way that is a prime example of an article that needs a judicious cutting of extraneous information. Over half (59 kB) of its length is in one section! I'm not sure what the risk being mentioned is, but this seems an example of why length guidelines may be useful to focus minds. An article should fail FACR4 if it has a single section that is longer than the entirety of Speed of light. Happy for this to be moved into the discussion section, if it gets longer. CMD (talk) 02:51, 29 June 2022 (UTC)
- The risk is that people take it seriously and make judgments based on it, when they are really just numbers plucked from nowhere and backed by nothing. Should we have some guidelines about what makes a page egregiously big? Quite possibly. Is what we've got now even a reasonable starting point for that? I only find myself growing more convinced that it isn't. XOR'easter (talk) 03:27, 29 June 2022 (UTC)
- Support A size limit from 15 years ago limits military and political pages far too much. Presidency articles are divided into far too many sub-pages and SCOTUS rulings need more room to explain the background and legacy of rulings. It should at least be doubled. Jon698 (talk) 17:59, 28 June 2022 (UTC)
- I don't see how attention spans matter. If you don't have the will to scroll down a Wikipedia page you most likely don't have the will to learn anything. Longer pages would give people the ability to improve their attention spans as no other popular website offers as much information about a wide array of topics as Wikipedia. Jon698 (talk) 18:04, 28 June 2022 (UTC)
- Also I just thought about this. Somebody with an attention span too short to read a Wikipedia article would probably not be willing to search for information across five separate pages that it was diced up into. Jon698 (talk) 18:06, 28 June 2022 (UTC)
- User:Jon698 I like your presidency point. Obama's presidency for example, is split into a DOZEN or so articles (economic policy, energy policy, East Asia policy, South Asia policy, space policy, etc.). Why need an East Asia article and South Asia article when you can merge them with the foreign policy article? Well, you can't unfortunately because there is this certain group of editors that will claim it breaks the size guideline. So now if readers want to find one policy, they will have to flip between a dozen articles. Ak-eater06 (talk) 18:09, 28 June 2022 (UTC)
- I haven't seen the specific articles in question, but this line of argument chases a red herring. Articles are rarely created or split just due to the size guidelines. More commonly, they are created because someone thinks "Wikipedia should have an article on this". For example, Obama's foreign policy pivot towards the Asia-Pacific was widely remarked about at the time. It is undoubtedly a notable topic, that many would be interested in writing an article on. Could you nonetheless merge it with another article? Possibly yes, but there are reasons such merges tend not to happen and these reasons emphatically are not primarily related to this size guideline. If it were, the result would not be the current proliferation of stubs throughout Wikipedia. CMD (talk) 01:28, 29 June 2022 (UTC)
- User:Chipmunkdavis. I understand. Thank you for the clarification. However, I know for sure that Stephen Harper's tenure is divided into four articles (Premiership, domestic policy, foreign policy, and environmental policy) in the name of one combined article being too long. Same with his successor, Justin Trudeau (tenure is divided into premiership, domestic policy, and foreign policy). Ak-eater06 (talk) 02:56, 29 June 2022 (UTC)
- Support It's nonsense to think that Wikipedia should be written to the lowest common denominator. I read a lot of articles, but usually just the intro and the parts I'm interested in. I'm not going to repeat the arguments above as to why the limits should be abolished, but would like to point out that there are thousands of articles that do not conform. What should we do about them, convert Wikipedia into a Reader's Digest look-alike? Dr. Grampinator (talk) 18:34, 28 June 2022 (UTC)
- Oppose removal. For me, it's less about device limitations and more about human reader attention span. A 100kb article is just too long. But there are still also technical limitations: long articles can be very difficult to edit, both because the source is difficult to navigate and because the browser scripts used to edit articles don't handle the length well. —David Eppstein (talk) 19:20, 28 June 2022 (UTC)
- Support. Splitting is sometimes needed, but often it's just a topic of a debate that ends not in two (for example) nice smaller articles, but in two chunks of text that should be read together. And besides, why does it even matter how many readers read the whole article? People can look for some specific details, they can just read the intro, ctrl-f for something, etc. And the book comparison is a good one, IMO - it's much better to have everything related in one place than to have a dozen scrappy little articles that nobody would ever maintain (even if readers would read a small article from top to bottom, nobody would read all the articles split from it). Artem.G (talk) 19:26, 28 June 2022 (UTC)
- Oppose removal. But I would prefer a change on the limits. A several MB page makes my computer hang, while a 100 KB page is fine. Now that we have better technology, it may be time for more limits. weeklyd3 (message me | my contributions) 19:33, 28 June 2022 (UTC)
- Oppose It's a guideline, not policy. 100kb ish should be sufficient for a single article, but I wouldn't object to increasing it a little bit, not sure how much (I'm resisting using the 640kb quote). Needs an upper limit, don't want even 1mb page really. -Kj cheetham (talk) 19:33, 28 June 2022 (UTC) P.S. If people are calling on it too strictly to the point of being detrimental to articles, I'm sure it could be better worded - that doesn't justify removing it completely. -Kj cheetham (talk) 21:28, 28 June 2022 (UTC)
- Oppose – although I would not count prose in footnotes and infoboxes, which require essentially no effort to gloss over. Sure, my computer can handle a bloated article, but they are beyond painful to read, verify, and balance. (As a reader, too, assessing reliability.) The argument that "you only read what you're interested in" doesn't really hold up because you have to find it in the first place! There's no cmd-F on phones. Even the pi article that XOReaster cited, in my opinion, deserves a splitting of the "roles and characterizations" section. Yes, it's probably only GA and FA reviewers who actually read through the entirety of longer articles, and I think MOS:REPEATLINK should be abolished on that account. But when I want an overview, I read a few paragraphs of every section. Why have a sprawling article when it can be easily split into more enjoyable and compact articles? 100 KB doesn't need to go anywhere.
EEng: Can you give some specific examples where this guideline was invoked to reduce an article's size (whether through decruftification, concisification, splitting...) and clearly reduced the encyclopedic quality of the article? (By "clear", I mean something that reasonable and/or experienced editors would agree on, independent of size guidelines, not personal preference.) Just saw the examples in the preceding section. Will have to assess them to see whether I agree. Ovinus (talk) 21:29, 28 June 2022 (UTC)
- Oppose. First of all, the notion that page size doesn't matter any more from a technical point of view is nonsense and shows a little bit of ignorance of our worldwide audience. Perhaps that's the case for people in the western world with fast internet and fancy machines, but a lot of people around Africa and other areas don't have such luxury. Loading pages and also editing them in the code editor is definitely still an issue. Secondly, as noted by Sandy and others, a size guide is definitely needed from an encyclopedic point of view. The idea that we might combine all the different pages on Obama's presidency into one megalith is absurd. Even if readers don't read from top to bottom, an article should still be a coherent and summary style overview of what is in its scope rather than a free for all. I use these size guidelines regularly and I expect them to stay. Amakuru (talk) 21:52, 28 June 2022 (UTC)
- In anticipation of the remark that photos take up the most bandwidth: Images load asynchronously in modern browsers. You can see this in action by (in Chrome) opening developer tools, going to Network, and changing the "No throttling" option to something else, to simulate a poor connection. We first have today's TFA (copied to my sandbox), which has a fair amount of text and a fair number of images. In the graph, green indicates that a resource has been requested and a response is being waited for. Turquoise indicates that the resource is being downloaded. The blue line indicates the DOMContentLoaded event (DCL), which is slightly different from FCP (First Contentful Paint) but all that matters is that's approximately when the page becomes usable, which happens after the HTML and CSS have been loaded. The images, at that point, are only partially loaded and appear as blank or half-filled, but the text may be scrolled through and read. The images finally load after some time, as indicated by the red line, and the page is finished. Now, observe WP:FAC, a page of impressive size but with no (large) images. In this run, the CSS was also cached, giving it a leg up. But it was still slower than Red panda to initially load, because the dominant time is the HTML. Finally, we observe List of Johnson solids, an article replete with images, but because the HTML is small, the text loads quickly (while the images take a full minute!). My data collection here was rushed but someone can do a rigorous test. With a slow connection, HTML size matters. Ovinus (talk) 22:47, 28 June 2022 (UTC)
- Oppose removing entirely, but definitely support using character or word counts instead of the bytecounts currently in the table. Some sort of rule of thumb is useful to have (even if it is, in fact, entirely arbitrary), and this one seems to have served reasonably well. I'm not convinced that having even more long, unmaintainable articles will serve the project or the reader. (Also "it's a guideline, not policy" needs to be said more often in general.) -- Visviva (talk) 22:02, 28 June 2022 (UTC)
- The current table is, to my understanding, already a character count. CMD (talk) 01:32, 29 June 2022 (UTC)
- To mine, too. Mathglot (talk) 02:05, 29 June 2022 (UTC)
- Right, I didn't mean to imply otherwise. But it seems unnecessarily roundabout and confusing to have a table with page sizes in kilobytes and a note saying "remember, by bytes we mean characters of readable prose." Especially when bytes are a common metric for other kinds of page size that this section isn't about. Using normal human units up front makes things clearer. -- Visviva (talk) 04:23, 29 June 2022 (UTC)
- Oppose it's a useful guideline which has guided my thoughts on many long articles. But it's also not a mandate of length. It reflects a reality that 100kb of readable prose is quite a lot to sift through, and that unless you have a compelling reason, it really should be more concise. I'm working on American Civil War right now, and it's at 99k readable prose, and I am being conscious about length, using summary style, and putting extraneous detail in sub articles. I want to take it to GA, and maybe it will be over 100k when I do so. But the mere fact that we have a guideline that points out 100k is a good max is keeping conciseness in my thoughts. Our editors are not always good at being concise, me included!
- My fear is that by removing this we push articles to be longer, without being better. Shorter does not mean less quality. My favorite example of that is World War II, which is a GA but only 82k of readable prose, despite being a very very broad topic.
- Now, I could see revising some of the numbers up a bit. 60k, more like 75k. Or maybe listing examples of FAs at different size levels? But I don't think getting rid of this useful rule of thumb is gonna improve the encyclopedia. CaptainEek Edits Ho Cap'n!⚓ 05:59, 29 June 2022 (UTC)
- Oppose Oh god no. There are too many unwieldy articles already, this would only encourage more. - LCU ActivelyDisinterested ∆transmissions∆ °co-ords° 12:46, 29 June 2022 (UTC)
Discussion
The size guideline rule was made in 2007, when devices didn't have the capacity to navigate long articles smoothly. 15 years later, we have advanced technology and this rule is ridiculous. Let's remove it altogether. People can split articles without being influenced by this obsolete rule.
Some other arguments:
The idea that any more than 1/10 of 1% of visitors have the desire to read an article from top to bottom is absurd, as is reasoning based on such an idea. Overall article size is just one of many considerations in trying to answer the following question: What structure (of this one article, or of a group of related articles) best allows readers of various kinds to satisfy their knowledge-needs? Stupid size formulas, blindly applied, are not the answer to that question.
"Too long" is not the issue it once was, so the proposed change makes sense. To understand this, imagine a book. No matter the size of a normal book, it is always easier to find a bit of content as long as it's between the book's covers. Those who want to know everything about the topic/story will read the whole book/article, regardless of length. Splitting articles into separate locations (different book volumes) makes it much more difficult to find stuff, and increases the chance of important info never being seen by the reader.
Nowadays most readers search a page for information, so keeping it all in one place makes most sense. Few read the whole article. They may read the whole lead, and may skip to interesting parts, but that's all, unless they search for key words and phrases.
One editor's single-minded, pathological, obsession with splitting long articles is usually very destructive and contrary to the needs of 95% of our readers. Almost no one benefits from it.
It's time. Ak-eater06 (talk) 04:18, 28 June 2022 (UTC)
Is there any good research on readability size? We have editor retention stats for Wikipedia... but is there accessibility data? On a side note, a site-wide RfC should take place, as mentioned in previous talks and edit summaries. Moxy- 04:37, 28 June 2022 (UTC)
- I hope everyone involved here is not thinking a guideline can be changed based on a survey that a) is not neutrally positioned, and b) is not a site-wide RFC. SandyGeorgia (Talk) 18:09, 28 June 2022 (UTC)
- I have changed the section heading and tried my hand at posing the question in a neutral way. Ak-eater06's opening statement is now under the "Survey" heading, where it can be read as part of a !vote. XOR'easter (talk) 18:25, 28 June 2022 (UTC)
- I think the large, non-neutral introduction should be moved here, to the discussion section. SandyGeorgia (Talk) 19:34, 28 June 2022 (UTC)
- It should be moved to WP:VPP, have a proper introduction, and be put on WP:Centralized discussion. This is major, even if it has some enthusiastic supporters. Ovinus (talk) 21:39, 28 June 2022 (UTC)
- I've left a message for the original poster here (in the interest of first trying to work it out with the editor); if this isn't resolved quickly, other steps will be needed. I will be busy for several hours; hopefully others will follow up there in my absence. What an unfortunate approach to a long-standing guideline page. SandyGeorgia (Talk) 21:50, 28 June 2022 (UTC)
- It looks like the post has been moved, though there's still a bit of cleanup to do (an "above" should now be a "below", and there's a dangling reference to it before the !votes start). XOR'easter (talk) 22:23, 28 June 2022 (UTC)
- I'll delete that then ... now that all is moved ... SandyGeorgia (Talk) 03:02, 29 June 2022 (UTC) Diff of deleted post, leftover from when large non-neutral block of quotes was at beginning of RFC. SandyGeorgia (Talk) 03:05, 29 June 2022 (UTC)
How important is the speed of content appearing versus the number of bytes, if people are on a so-and-so-many-Mb-per-month data plan? In other words, for the purpose of conserving limited network resources, is time-to-text-rendering actually the measure of "bandwidth" that matters? I'm concerned that, by merely throwing some round numbers into a table, we are patting ourselves on the back for serving a global audience without having done the serious work to determine how to do the job properly. The same goes for loading pages in either of the editors, running scripts, etc. If we need a guideline, we need a guideline, not guesstimates. XOR'easter (talk) 23:24, 28 June 2022 (UTC)
- Regarding the changing of the discussion question after the start of the discussion here, I agree it's a more neutral phrasing, but I do not think the RfC is about the table itself, but about the guideline as a whole. It wouldn't make sense to treat the table in isolation, leaving everything else in place. CMD (talk) 01:37, 29 June 2022 (UTC)
Change size guideline proposal
I propose changing the "Probably should be divided" in Wikipedia:Article size#size guideline from 60kB to 100kB and the "Almost certainly should be divided" from 100kB to 200kB. This rule was made in 2007, when devices didn't have the capacity to navigate long articles smoothly. 15 years later, we have advanced technology and this rule is ridiculous. Ak-eater06 (talk) 20:15, 25 June 2022 (UTC)
- The rule does not only relate to computer processing speed; it relates to reading and attention span. I think it fine as is. SandyGeorgia (Talk) 20:24, 25 June 2022 (UTC)
- The idea that any more than 1/10 of 1% of visitors have the desire to read an article from top to bottom is absurd, as is reasoning based on such an idea. Overall article size is just one of many considerations in trying to answer the following question: What structure (of this one article, or of a group of related articles) best allows readers of various kinds to satisfy their knowledge-needs? Stupid size formulas, blindly applied, are not the answer to that question. EEng 20:50, 25 June 2022 (UTC)
- User:EEng#s I agree, this size rule is infuriating. Take Barack Obama's presidency for example, his policies are split into a DOZEN or so articles (economic policy, energy policy, East Asia policy, space policy, etc.) in the name of the presidency article being "too long". It just makes it more disorganized. Ak-eater06 (talk) 21:01, 25 June 2022 (UTC)
- Our stats Research:Which parts of an article do readers read Moxy- 04:20, 28 June 2022 (UTC)
- I agree that the guideline shouldn't be changed. Reading and attention spans have not substantially increased during the existence of Wikipedia. While articles do not need to be written as shortly as possible, they should not be difficult to read in their entirety. Onetwothreeip (talk) 03:10, 26 June 2022 (UTC)
- It's really unbelievable to see the attention span argument trotted out over and over and over. There's no evidence anything on this page is more than stuff a few editors made up one day. EEng 21:37, 26 June 2022 (UTC)
- My proposal is changing the "Probably should be divided" from 60kB to 80kB. Ak-eater06 (talk) 19:30, 26 June 2022 (UTC)
- While I'm happy to see even the tiniest move in the direction of sanity, it's really just deck chairs on the Titanic. As you can see from various comments on this page, there's a core group (a) utterly dedicated to the ridiculous idea that any significant proportion of visitors to an article have the intention of reading it top to bottom and (b) willing to translate that myth into word counts derived via baseless "attention span" and "reading rate" numbers from low-quality, one-size-fits-all sources. I hope your change sticks, but if it doesn't just do what I do: ignore this page's nonsense and structure articles according to the needs of each topic. Let those with no judgment of their own apply this Procrustean bed to articles unlucky enough to attract their attention. EEng 21:37, 26 June 2022 (UTC)
- EEng if you want an example of people defending article-splitting insanity, check out this discussion where my proposal to merge the four articles into one of Stephen Harper's tenure was shot down 4-2. Four users opposed merging due to this stupid size rule. Stephen Harper's tenure as Canadian prime minister is divided into four separate articles (Premiership, domestic policy, foreign policy and environmental policy)! Same with his successor, Justin Trudeau (Premiership, domestic policy, and foreign policy).
- You say to "ignore this page's nonsense and structure articles according to the needs of each topic" and while I try to, these users who voted against merging in the discussion I linked to always prevent me from merging due to WP:MERGEPROP and when I do follow WP:MERGEPROP my merger proposals get shot down due to people citing the size guideline.
- Coming back to my Harper example, it is extremely frustrating to know people have to flip between four different pages when we can easily have his tenure in one, clean article. Ak-eater06 (talk) 06:01, 27 June 2022 (UTC)
- @Ak-eater06: I haven't read all the arguments on the relevant article talk pages but I think you would find success in merging the Premiership, Domestic policy and Environmental policy articles, while keeping the Foreign policy article separate. Onetwothreeip (talk) 08:57, 27 June 2022 (UTC)
- User:Onetwothreeip I did that and got reverted. They cited WP:Mergeprop and once again the stupid size rule. Ak-eater06 (talk) 17:31, 27 June 2022 (UTC)
- I feel for you, but as things stand, reforming this page is one of those situations in which the ratio (effort to overcome mindless idiocy) / (benefit) is just too high. I wish you luck. EEng 12:22, 27 June 2022 (UTC)
- EEng and Ak-eater06 are right. "Too long" is not the issue it once was, so the proposed change makes sense. To understand this, imagine a book. No matter the size of a normal book, it is always easier to find a bit of content as long as it's between the book's covers. Those who want to know everything about the topic/story will read the whole book/article, regardless of length. Splitting articles into separate locations (different book volumes) makes it much more difficult to find stuff, and increases the chance of important info never being seen by the reader.
- Nowadays most readers search a page for information, so keeping it all in one place makes most sense. Few read the whole article. They may read the whole lead, and may skip to interesting parts, but that's all, unless they search for key words and phrases.
- One editor's single-minded, pathological, obsession with splitting long articles is usually very destructive and contrary to the needs of 95% of our readers. Almost no one benefits from it. (Yes, you know who you are.)
- EEng is right: "What structure... best allows readers of various kinds to satisfy their knowledge-needs? Stupid size formulas, blindly applied, are not the answer to that question." -- Valjean (talk) (PING me) 22:48, 26 June 2022 (UTC)
User:Valjean and User:EEng I updated it again to reflect common sense more...hope my change sticks. Thanks for your efforts :) Ak-eater06 (talk) 01:06, 28 June 2022 (UTC)
- A good change that brings us out of the dark ages. -- Valjean (talk) (PING me) 01:09, 28 June 2022 (UTC)
- You really should have consensus before changing a guideline. And these discussions would be more effective if the snark and insult throughout were lowered. SandyGeorgia (Talk) 10:16, 28 June 2022 (UTC)
- Oppose the main purpose of the guideline is not to ensure articles can load (although those with too much wiki text can still pose an issue for some readers) but to optimize the length and level of detail for readers' attention spans and ensure that the most important information remains accessible. Really if you're above 7000 to 8000 words for most topics it's better to split off another article. (t · c) buidhe 20:02, 30 June 2022 (UTC)
- "if you're above 7000 to 8000 words for most topics it's better to split off another article" – And we know that because the guideline says so, and therefore the guideline is correct. That's logic! EEng 22:18, 30 June 2022 (UTC)
- No, I know this based on my experience writing featured articles. (t · c) buidhe 04:33, 1 July 2022 (UTC)
- We're really ringing the changes here. Sometimes it's the bandwidth/plight-of-the-third-world argument. Other times it's the I-read-something-somewhere-about-attention-spans argument. Now it's the I-write-featured-articles-argument-so-I-know-best argument. (And the way you say it, it's almost as if you imagine it will impress people!)
- If your assertion truly reflects a universal truth, then editors will discover it for themselves when they apply their good judgment to particular editing situations as they arise; they won't need the FAC elites to show them the way. EEng 05:14, 1 July 2022 (UTC)
Let's settle on a compromise...shall we?
It appears my proposal to abolish the size guideline may have been a bit too radical for some of you...which explains the overwhelming rejection in the recent RfC.
As such, I propose a compromise. How about we increase the kB limit on "probably should be divided" and "almost certainly should be divided" on the Wikipedia:Article size#Size guideline? I personally think "probably should be divided" should be increased from 60kB to 100kB and "almost certainly should be divided" should be increased from 100kB to 125kB.
Readable prose size | What to do |
---|---|
> 125 kB | Almost certainly should be divided |
> 100 kB | Probably should be divided (although the scope of a topic can sometimes justify the added reading material) |
> ?? kB | May need to be divided (likelihood goes up with size) |
< 40 kB | Length alone does not justify division |
< 1 kB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub. |
Ak-eater06 (talk) 13:26, 29 June 2022 (UTC)
- This is an effective doubling of current article size recommendations, so it would be good to hear the rationale for it. CMD (talk) 13:45, 29 June 2022 (UTC)
- Might help if there was a coherent rationale for the current recommendations. EEng 16:57, 29 June 2022 (UTC)
- Yeah, doubling numbers that were arbitrary and unmotivated will just make a new set of numbers that continue to be arbitrary and unmotivated. XOR'easter (talk) 18:50, 29 June 2022 (UTC)
- This is somewhat my view. Some sort of guideline here is useful to provide a goal. While I'm not too torn on what they are, if picking between two arbitrary numbers, sticking with the current ones makes more sense than picking a new one, ceteris paribus. CMD (talk) 02:24, 30 June 2022 (UTC)
- My own sense is that sticking with numbers just because we've stuck with them so far does nothing but enshrine arbitrary choices for the sake of having something to point at and call traditional. The "readable prose size" numbers are explicitly not about technical limitations, since everything from download sizes to script functionality will depend upon all the other bytes that don't count as "readable prose". Nor do they relate to reader attention spans: if people stop reading after the first few paragraphs, it doesn't matter whether the "readable prose" they leave unread is 10kB or 100kB. The numbers are just an excuse to call an article "bloated" without reading it, and thus force editors to maintain six articles instead of one. But, fuck it. Nobody ever gives up the illusion of numerical objectivity. The community will never agree that there's a problem, let alone on how to solve it, and I only push myself closer to another month-long burnout if I try to care. XOR'easter (talk) 03:59, 30 June 2022 (UTC)
- Let me give you some guidelines that I found in a crappy management manual from 1993: If your burnout is only a few days long, you should probably combine it with some other burnouts. If your burnout is a few weeks long, it might need to be divided into several shorter burnouts. If it's a month long, it should almost certainly be divided. EEng 04:10, 30 June 2022 (UTC)
- See the comment in the RFC above by ActivelyDisinterested. Same thing. This kind of increase would permit several of the already discussed and way too long articles to continue with, and even expand, their unnecessary bloat. SandyGeorgia (Talk) 13:46, 29 June 2022 (UTC)
- Would oppose this change on the same grounds. (Not technical ones, and I don't think that aspect is as important as the content-related objections.) All I'd suggest is amending the parts of this guideline which refer to bandwidth etc. and updating them to line up with today's practice, so that they aren't used as a questionable justification. And if you open an RfC, put it at a more central location. Ovinus (talk) 15:01, 29 June 2022 (UTC)
- I agree with Ovinus. The guideline overall could be better worded, which is perhaps a more worthwhile use of time than this. -Kj cheetham (talk) 17:20, 29 June 2022 (UTC)
- Ak-eater06, I understand your consternation at the incoherent opposition to reforming this relic of the 20th century web, but a carefully planned project to bring enlightenment to the benighted will be required to get anywhere at all on this. There are just too many editors who need Norman to coordinate instead of applying judgment of their own. Add in the editors who confuse bandwidth with latency and stuff like that, and it's just hopeless. EEng 16:57, 29 June 2022 (UTC)
- While it's good that the rejection of your previous proposal hasn't disheartened you in attempting to improve the guideline area, I don't think there is much utility in changing the figures of the guidelines. Onetwothreeip (talk) 07:34, 30 June 2022 (UTC)
- WP:SIZE is a relic of a bygone era; I consider it obsolete, and encourage other editors to do likewise. It lacks a rationale, and violates Wikipedia:Purpose:
to benefit readers by acting as a widely accessible and free encyclopedia; a comprehensive written compendium that contains information on all branches of knowledge.
For a decade now, people have been saying that Douglas MacArthur is too long, and then went ahead and added more material to it. Dividing articles is not simple, and when I attempted to do it with American logistics in the Siegfried line campaign (dividing it into transportation and services and supply), the result was highly unsatisfactory. Dividing the Galileo project into articles about the spacecraft and the project made it harder for readers to find what they wanted, and an editor trashed the article history in the process. Hawkeye7 (discuss) 20:34, 30 June 2022 (UTC)
- I can see an immediate way to cover MacArthur by moving the excessive details of his WWII activities (which include a lot of events not directly related to him) to a separate article and leaving high-level summaries there. Same with Galileo. The problem seems to be that editors don't leave good summaries behind when content is split, summaries that make it easy to see the high-level details of the split content and make clear that more details can be found elsewhere. Masem (t) 21:15, 30 June 2022 (UTC)
- Size would be only one consideration of articles such as Douglas MacArthur and Galileo project. There would be other considerations that ensure difficulty in splitting or reducing the size of those articles, and reduction certainly shouldn't be pursued without consideration. I don't think guidelines should be changed based on errors made by editors in a few articles, which are much more to do with the editors of those articles than these guidelines. Onetwothreeip (talk) 07:47, 1 July 2022 (UTC)
- Agreed; the decision to split articles should never be dictated by this guideline. Douglas MacArthur has grown organically over time. I have created two sub-articles, though: Douglas MacArthur's escape from the Philippines and Relief of Douglas MacArthur. The point illustrated by Galileo is that people complained bitterly about not being able to find information that was plainly in the sub-article. Simply splitting off material and summarising is unacceptable; the amount of detail in an article still has to be "balanced" to avoid being UNDUE. Hawkeye7 (discuss) 07:55, 1 July 2022 (UTC)
- Outliers don't make good examples for the purposes of this discussion; that MacArthur hasn't yet been split doesn't mean it can't be or shouldn't be. SandyGeorgia (Talk) 17:06, 1 July 2022 (UTC)
- Strong oppose among other reasons articles this long are very hard to maintain at a good level of quality, as I've found in my experience at FAC and FAR. (t · c) buidhe 04:36, 1 July 2022 (UTC)
- By the way everyone, did buidhe mention that he writes featured articles? EEng 16:53, 1 July 2022 (UTC) P.S. He also writes featured articles. Featured articles too!
- I've never quite understood this argument. If you have 12,000 words of featured content, why is it harder to maintain that in one article, compared to maintaining 7,000 featured words in one article and 3,000 featured words in each of two subarticles? (The split-up total will always be greater due to summary sections repeating information, and indeed that gives additional maintenance trouble in terms of the common content getting out of synch.) This argument only works if you don't bother with making the two subarticles featured, but how that improves the encyclopedia I'm not sure. Wasted Time R (talk) 19:18, 1 July 2022 (UTC)
- Look, people who write featured articles dwell on Mt. Olympus. From their lofty perch they look down and take delighted amusement in the feeble editing efforts of the rest of us. They also breathe pure, rarefied air and gorge themselves on a magic ambrosia; by consumption of these they are endowed with uncanny powers of composition denied to us pathetic little people. So stop arguing, insect. EEng 21:28, 1 July 2022 (UTC)
Hey guys, let's WP:BIKESHED over the Talk archiving parameters!
Just when you thought there was no final answer to the question, "To what absurd lengths can an obsessive need to prescribe and control everything be taken?", we have this: [14]. Yes, let's have a discussion on the archiving parameters now, too! See also WP:MOSBLOAT#A_rolling_stone_gathers_no_MOS. EEng 23:54, 2 July 2022 (UTC)
- And now, as a result of the above, the key recent (i.e. last 12 months') discussions on reforming the sorry monument to mindless prescription that is this guideline have been banished to the archive. Thus anyone wanting to review the history of the issue has to jump back and forth among pages (e.g. WT:Article_size/Archive_6). A delicious irony indeed! EEng 00:54, 3 July 2022 (UTC)
- "Banished to archive" is just silliness; anything archived is quite easily found. If there is something in archive that is relevant to a current topic on the page, just add a hatnote to that archive link at the top of the relevant section. Srsly, not worth worrying about. SandyGeorgia (Talk) 01:02, 3 July 2022 (UTC)
- The irony really, really is delicious. I'm trying to decide whether you see that but can't bring yourself to acknowledge it, or just don't see it. EEng 01:24, 3 July 2022 (UTC)
- Checking a watchlist only to find an unhelpful personal off-topic comment is not fun. So while I'm here, I added the hatnotes for you of the recently archived discussions. SandyGeorgia (Talk) 01:35, 3 July 2022 (UTC)
- Well, checking my watchlist only to find the same tired circular reasoning long used to justify slicing articles up, now being applied to justify slicing talk pages up as well -- that isn't all that fun either. EEng 01:45, 3 July 2022 (UTC)
- Actually, who am I kidding? It gave me a good laugh. EEng 01:46, 3 July 2022 (UTC)
- This is confusing, since it's the opener of this section who is proposing to change the archiving period. I'm not necessarily against that, but I wouldn't support a length as long as the proposed 720 days. As for the archive itself, Archive 6 currently contains about eight years of discussions, so I don't think there is much to move back and forth. For now I don't mind the current archiving period staying. Onetwothreeip (talk) 02:53, 3 July 2022 (UTC)
- Did I mention the delicious irony? EEng 05:13, 3 July 2022 (UTC)
- Second request. Please stop WP:BLUDGEONing on this page. With less than a year posting here, you have a third of the total page content for all time, and this kind of off-topic posting disrupts watchlists. SandyGeorgia (Talk) 05:31, 3 July 2022 (UTC)
- I guess you missed my response to your earlier BLUDGEON complaint: WP:BAREASSERTIONSABOUTATTENTIONSPANANDWHATREADERSWANTREPEATEDOVERANDOVERWITHNOEVIDENCETOBACKTHEMUP. Consider this a second (actually, more like third, fourth, fifth, or more) request to respond to that. As for the disruption to watchlists, I suppose it's similar to the watchlist churn stemming from articles being sliced up into pieces in obeisance to this guideline from the stone age. EEng 06:07, 3 July 2022 (UTC) Shhhh! Featured-article masters at work!
This template is being used very selectively
Hi
There is a list (https://en.wikipedia.org/wiki/Special:LongPages) in which every article in that list could technically be eligible to have the "Very Long" template on the top of the article. That task would be quite easy to do at a rate of about 100 per day or more.
The "Very Long" template I am talking about has the following text:
"This article may be too long to read and navigate comfortably. Please consider splitting content into sub-articles, condensing it, or adding subheadings. Please discuss this issue on the article's talk page. (September 2022)"
And the words "too long" hyperlink to this page.
If it was simply about the prose size of an article, one would just go through the above list and put the "Very Long" template on each article, starting from the longest in descending order.
The question is why is it not done?
A reasonable and logical conclusion is that this template is being used very selectively by groups of editors for their own reasons.
No group of editors likes to have a massive warning at the start of their article, indicating some kind of fault in the article, which has the effect of depreciating the article to readers from the onset.
Also the small version of the "Very Long" template is buggy, and will not allow you to replace the word "section" with "article", and so objections can be raised in its use.
Now, I could have quite correctly, according to this guideline, put 500 of these templates on 500 articles across Wikipedia today. But I didn't because I fear that would spark such controversy and would take up too much time of too many editors that are busy improving Wikipedia.
If you want me to, I could go through the list adding "Very Long" templates by prose size to articles in descending order. However, if I were to put that template on the Donald Trump page, which is eligible for the "Very Long" template, I have a feeling that this guideline would be heavily challenged and would probably eventually be removed.
So can we just avoid that scenario and have this guideline removed or enforce it logically based on size, which shouldn't take too long.
Darylprasad (talk) 09:13, 15 September 2022 (UTC)
- Not sure what template you're referring to, but this page is a guideline, to be used with common sense, not an immutable law. It also refers to the prose size (length of an article for readers), not file size as suggested by your link. Please don't spam any template across random pages. CMD (talk) 10:02, 15 September 2022 (UTC)
- The template I am referring to is the "Very Long" template.
- Most if not all the articles on the list (https://en.wikipedia.org/wiki/Special:LongPages) will have a prose size that breaches this guideline.
- Whilst guidelines are not immutable laws, they are treated as such by groups of editors. That point is evident when one writes Wikipedia articles.
- As far as common sense goes, isn't it common sense to put the "Very Long" template on articles starting with the largest prose size in descending order? There are tools to determine the prose size of an article and it would be an easy task to go through the list (https://en.wikipedia.org/wiki/Special:LongPages) and simply add the "Very Long" template to those that exceed the recommended prose length.
- My point is that this simple task hasn't been done because editors have other reasons for selectively adding the "Very Long" template to some articles and not others.
- Darylprasad (talk) 11:01, 15 September 2022 (UTC)
- There's no need for a conspiracy, all tags are added manually by various editors for various reasons. Blindly tagging articles would be a very poor idea. The top article on that link you provide (ie. the longest article), has according to the usual prose size tool, 11 words. CMD (talk) 11:07, 15 September 2022 (UTC)
- Yes well that one wouldn't need a template. How about others on the list? The only reason an editor adds a "Very Long" template to an article is because it breaches this guideline. There is no other reason. Darylprasad (talk) 11:14, 15 September 2022 (UTC)
- The second and third one also no, and that's already more engagement than this is worth. If you find an article that is too long, make a note, or rewrite it in WP:SUMMARYSTYLE yourself. Otherwise, not sure what you're looking for here. CMD (talk) 11:16, 15 September 2022 (UTC)
- By make a note, do you mean add a "Very Long" template to the article? I note that the prose size (text only) on the Donald Trump article is 149 kB. Would it be OK to add a "Very Long" template to that article? Darylprasad (talk) 11:25, 15 September 2022 (UTC)
- No, no, no, and fucking no. No. Find something useful to do. EEng 21:23, 15 September 2022 (UTC)
- I agree. We already have an editor with an OCD fetish for reducing the size of long articles, often without discussion. Sometimes they'll just delete huge swaths from an article, or they'll split them so info is hard to find. They did this with the Trump-Russia investigation timelines, so now one has to perform the same searches on multiple articles. No, find something useful to do that won't irritate other editors. -- Valjean (talk) (PING me) 05:05, 16 September 2022 (UTC)
- What editor is that? EEng 07:58, 16 September 2022 (UTC)
- My question about the Donald Trump article was rhetorical. What you say ("Sometimes they'll just delete huge swaths from an article, or they'll split them so info is hard to find") is being done and has been done to the Neoplatonism article, which is why I started the thread. I try my best to make positive contributions to Wikipedia, and very, very rarely delete text without replacing it with more accurate and better cited information.
- Note: Since I predominantly finished my rewriting of the Proclus article on the 8th of April 2022, the page views have gone up from 122/day (7/1/2015 - 4/8/2022) to 155/day (4/8/2022 - 9/17/2022)...a 27% increase in page views per day. So I think that readers like the new longer article. Darylprasad (talk) 03:29, 18 September 2022 (UTC)
- It's correct to say that the template is not applied consistently. The largest thousand articles in size, would all or almost all have the template fairly applied to them, which would be less than 0.02% of articles on English Wikipedia. In most cases, the reason that an extremely large article does not have the template is simply because nobody has bothered to include one, or that the article is only large for a brief period of time. Onetwothreeip (talk) 07:31, 25 September 2022 (UTC)
- It is used consistently. It is not used on any article where there is no ongoing discussion. The template is invalid without one. No one has bothered to start one. Most likely because the extremely large articles are just fine as they are. Hawkeye7 (discuss) 22:42, 29 September 2022 (UTC)
- For the record, this entire discussion is a POINTy response to events at Neoplatonism, and Darylprasad has been indefinitely blocked. * Pppery * it has begun... 22:44, 2 October 2022 (UTC)
Speaking of OCD
Admirers of pointless OCD wastes of time are invited to admire the goings-on at Talk:List_of_Hindi_songs_recorded_by_Asha_Bhosle#Splitting_this_article -- the very page that User:Chipmunkdavis specifically called out at the top of this thread as one not needing splitting. EEng 23:52, 29 September 2022 (UTC)
Every attempt at shortening the article failed. Does ANYONE know what to do to get this behemoth to follow this guideline? 192.42.55.22 (talk) 16:02, 24 October 2022 (UTC)
- 10,033 words excluding the table. On pure numbers, there's no egregious violation. If there are issues with overdetail, due weight concerns, unencyclopaedic information, or the potential for more effective summary style, they should be discussed on talk pages. That is without using sockpuppets of course, which appears to have bedevilled a previous attempt. CMD (talk) 16:10, 24 October 2022 (UTC)
- That article is way too over detailed on the path and destruction of the storm. Large swathes of text are only pulling from one or two sources. We are to be summarizing, not duplicating, the event. Masem (t) 18:53, 24 October 2022 (UTC)
Instructions
Onetwothreeip, I see that you've reverted a change. Here's how I think the two versions compare:
New | Old |
---|---|
When you split a section from a long article into an independent article, you should carefully follow the directions related to Wikipedia's licensing requirements and proper copyright attribution. Leave a summary of the removed material in the original article, along with a link to the new, independent article. | When you split a section from a long article into an independent article, you should leave a short summary of the material that is removed along with a pointer to the independent article. In the independent article, put the {{SubArticle}} or {{Summary in}} tag on the talk page to create a banner that refers back to the main article.
To conform with Wikipedia's licensing requirements, which permit modification and reuse but require attribution of the content contributors, the new page should be created with an edit summary attesting proper copy attribution, such as "split content from [[article name]]". (Do not omit this step or omit the page name.) A note should also be made in the edit summary of the source article, "split content to [[article name]]", to protect against the article subsequently being deleted and the history of the new page eradicated. The {{Copied}} template can also be placed on the talk page of both articles. |
You reverted the old version back into this page, so I assume you think it was better. Why do you think it's better to have redundant, incomplete instructions here, instead of sending people to the main page? WhatamIdoing (talk) 03:20, 6 November 2022 (UTC)
- Thank you WhatamIdoing for your work on this. I don't think it is important why I think your version is better or worse than the original, since I broadly agree with the approach you have taken. I would propose instead
When splitting a section into a new article, you should refer to the steps in WP:PROPERSPLIT, including an edit summary in the new article attributing the origin of the content to the existing article.
Working from your version, this would be even more concise and less likely to conflict with other pages, while still reminding the reader about the importance of edit summaries. Onetwothreeip (talk) 03:41, 6 November 2022 (UTC)
- That sounds like an improvement over what we've got now, @Onetwothreeip. Would you like to make that change? WhatamIdoing (talk) 16:04, 6 November 2022 (UTC)
Convert to word count
Currently the article size table is given in kB of readable prose. I think it would be better to give this information as a word count, for three reasons:
- it is more intuitive. It's the units used for secondary school essays for instance.
- It requires less technical knowledge. Currently many have to learn about how to convert character count to kB
- the confusion between overall article size (as found in history) and readable prose size (as it is defined in the table) is eliminated.
I've checked the ratio between readable prose in word count and character count for 10 articles spread over different topic areas.[1] The median I found was 0.154 words per character. So rounded to the nearest thousand, the table would become:
Readable prose size | What to do |
---|---|
> 15,000 words | Almost certainly should be divided |
> 9,000 words | Probably should be divided (although the scope of a topic can sometimes justify the added reading material) |
> 8,000 words | May need to be divided (likelihood goes up with size) |
< 6,000 words | Length alone does not justify division |
< 150 words | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub. |
The tools to get word count would be Caorongjin's Word Count Tool and XTools' Page History. This has been proposed in the past (f.i. 2008/2021). These proposals often did not make it, as people started a secondary discussion about changing the limits. Which is a shame, as I see this as a small easy win for clarity.
If there are no objections within a few days, I will implement this.
- ^ Indonesia: 9,025 words / 60,407 B = 0.149. Earth: 8,768 / 56,774 B = 0.154. Cancer: 9,672 / 62,862 B = 0.154. Alexander the Great: 13,632 / 85,586 B = 0.159. Ancient Egypt: 12,359 / 78,587 B = 0.157. Novel: 7,598 / 48,580 B = 0.156. Memory: 8,529 / 55,717 B = 0.153. Deforestation: 12,043 / 78,311 B = 0.153. Music: 12,193 / 78,222 B = 0.156. Epistemology = 0.153.
—Femke 🐦 (talk) 19:13, 11 January 2023 (UTC)
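For anyone who wants to check the arithmetic in the proposal, here is a minimal Python sketch of the conversion. It takes the word and byte counts from the footnote, computes the median words-per-byte ratio, and converts kB cutoffs to word counts rounded to the nearest thousand. The cutoff list (100, 60, 50 and 40 kB) and the choice of 1 kB = 1,000 bytes are assumptions made for illustration; the names `samples`, `cutoffs_kb` and `to_words` are hypothetical and do not come from any existing tool.

```python
from statistics import median

# (words, readable-prose bytes) pairs copied from the footnote above.
samples = {
    "Indonesia": (9025, 60407),
    "Earth": (8768, 56774),
    "Cancer": (9672, 62862),
    "Alexander the Great": (13632, 85586),
    "Ancient Egypt": (12359, 78587),
    "Novel": (7598, 48580),
    "Memory": (8529, 55717),
    "Deforestation": (12043, 78311),
    "Music": (12193, 78222),
}

# Epistemology is listed only as a ratio, so it is appended directly.
ratios = [words / size for words, size in samples.values()] + [0.153]

# Median words per byte, rounded to three decimals as in the proposal.
factor = round(median(ratios), 3)

# Assumed kB cutoffs of the existing table (1 kB taken as 1,000 bytes).
cutoffs_kb = [100, 60, 50, 40]


def to_words(kb, words_per_byte):
    """Convert a kB cutoff to a word count, rounded to the nearest thousand."""
    return round(kb * words_per_byte) * 1000


converted = {kb: to_words(kb, factor) for kb in cutoffs_kb}

print(f"median factor: {factor}")
for kb, words in converted.items():
    print(f"{kb} kB is roughly {words:,} words")
```

Under these assumptions the 60 kB cutoff comes out at 9,000 words rather than 10,000, which is the point raised in the replies below.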
Oppose as stated. There is quite the difference between "Readable prose" and "Word count excluding references". Readable prose is the amount of words that make up the main article, but excludes references, tables, lists, images. Readable prose can be in either words or characters, that I don't care, the size of the article should only be judged by readable prose, not by total article character count. I just looked up one article for example purposes, Anna Nalick, which has a byte size of 22,997, but XTools states it has a prose of 8,313 characters or 1,436 words. Quite the difference between 22,997 and 8,313, and it is not all references. I think before any changes can be made to the article size listing, the wording needs to be better understood, and I will always oppose any attempt to replace "Readable prose" with a different wording combination. Leave "Readable prose" in the table heading, provide a table footnote that restates the definition of readable prose as stated at the top of the article where readable prose was defined. The first column of the table change can still be by word count instead of character count, but the column header is unacceptable as stated. Mburrell (talk) 20:19, 11 January 2023 (UTC)
- Mburrell, discussing before !voting is always helpful. In this case, if we can't come to consensus via discussion, then a survey can be initiated. So far, I don't see anything wrong with this proposal, as it does nothing more than convert kb to readable prose, which is how the page should have always been constructed. SandyGeorgia (Talk) 22:14, 11 January 2023 (UTC)
- Is it necessary to discard Wikipedia:Prosesize, Dr. Pda's script, which as far as I can tell, still works?
- Considering past contention here, I suggest waiting longer than a few days to be sure no one is caught by surprise.
- When using your .154 factor, I do not come up with the same cutoffs as in the table above; pls spell out? For example, the 60 kb cutoff gives me 9,000 words, not 10,000. What am I doing wrong ?
- Other than that, your common sense proposal is a good one, but I want to go on record (still) with ... accepting this change for clarity in how to calculate the numbers does not necessarily imply acceptance of the cutoffs as defined. I continue to opine that the "almost certainly" cutoff is much lower than 15,000. SandyGeorgia (Talk) 20:33, 11 January 2023 (UTC)
- Mburrell: Happy to keep readable prose size as a column. Not sure why I changed that, did not mean to change the meaning there. I think we are in agreement about the rest, right?
- SG: We very much agree on disagreeing with the 15,000. I'm planning to make a proposal in this direction later. Wanted to get the easy bit done first. 9,000 vs 10,000 was a brainfart, corrected now. I just tried to install Dr Pda's script, but it didn't show up for me (at least not where I expect). The word count is a bit buried in all the other numbers. I think Caorongjin's script is cleaner. —Femke 🐦 (talk) 20:50, 11 January 2023 (UTC)
- Good approach :) I installed Dr pda over a decade ago, so can't speak to how to do it now, but for me, it is in the toolbox on the left-hand side of the screen. SandyGeorgia (Talk) 22:05, 11 January 2023 (UTC)
- I am satisfied with the table as currently presented. I just wanted to make sure we were working with readable prose as the definition for article size limits. I can support your changes now.
- We should still have a table footnote, with directions on how to access the Xtools or the other one you mentioned, so that any editor who accesses this page can easily determine prose size. Mburrell (talk) 01:12, 12 January 2023 (UTC)
Oppose. Byte sizes are invariably larger than prose sizes, and we use prose size for a good reason. WP:SIZE describes how it is arrived at. References, footnotes, tables, captions and the like are excluded. Note also that there is a difference between characters and bytes. Byte size really blows out in an article incorporating foreign script. But images account for most of our download. Consider this. It is a meagre 600 words (3,300 bytes) but the image is 1.89 million bytes. It's technical. (@EEng:) I rewrote the prose counter as a C# module so the MilHistBot could use it and could never quite get the same number as Dr pda (but did not need it any closer than the < 0.1% difference). The correlation between words and characters is of course inexact. I think 10,000 words is more suitable, as it is nice and round and a good size for a featured article. (Hanford Site, for example, is 10,271 words and only 63 KB.) Hawkeye7 (discuss) 23:21, 11 January 2023 (UTC)
- Hawkeye7 I'm not sure you're understanding the proposal; could you have another look? We are all, best I can tell, in agreement that prose size is what we should be using, which is what Femke is trying to accomplish by simply changing the current numbers in the chart from KB to an approximate prose size, without proposing to change the current cutoffs. The .154 (average) conversion factor from KB to prose size is .163 in the case of Hanford Site, so close enough. Femke is not proposing to change the current cutoffs, rather to just get the chart to reflect approximate prose size in readable words rather than KB, which made no sense. The .154 factor works closely enough for every article I've checked ... as to whether we should alter the cutoffs, that would be in a separate proposal. SandyGeorgia (Talk) 23:29, 11 January 2023 (UTC)
- Fair enough. Stricken. However, the proposal still requires the use of the automated tool to determine the word count, and does not eliminate the confusion between overall article size (as found in history) and readable prose size (as it is defined in the table). Hawkeye7 (discuss) 00:08, 12 January 2023 (UTC)
- Support. Word count is what the publishing industry uses; it should be used here as well. And for what it's worth, I also still use the Dr Pda thing. Wasted Time R (talk) 01:26, 12 January 2023 (UTC)
- Oppose or find a system with both. Word count doesn't apply to tables, lists and references, and I have seen some massive tables/lists that need splitting at times when they go over 100k. We can't just focus on readable prose, though for prose-heavy articles, word count is reasonable. --Masem (t) 01:30, 12 January 2023 (UTC)
- This point needs to be resolved before I can support. SandyGeorgia (Talk) 15:36, 12 January 2023 (UTC)
- This is true, but a whole different can of worms. Word count is only for prose-heavy articles. The guideline explicitly excludes list articles, and for good reason. Hawkeye7 (discuss) 17:18, 12 January 2023 (UTC)
- This problem should be the same for the two units right? For instance, the insanely long List of Hindi songs recorded by Asha Bhosle has 55 characters according to the DYKcheck and 54 according to XTools. So the old system didn't work there either. —Femke 🐦 (talk) 17:21, 12 January 2023 (UTC)
- Yes. That is because almost everything in the article is in tables. WP:SIZE explicitly excludes list articles, and advises against breaking up tables. We need to come up with a completely new way of measuring them. Some list articles have blown hard limits. Transclusions are the usual culprits. But as large as it is, List of Hindi songs recorded by Asha Bhosle has not exceeded any software limits, and people keep adding more material to it. Hawkeye7 (discuss) 19:35, 13 January 2023 (UTC)
- WP:SIZE, as in the article in discussion here, doesn't exclude list articles. There is an entire subsection on lists, at WP:SPLITLIST. Lists are also mentioned in the section about splitting articles, WP:SPINOUT: "Long stand-alone list articles are split into subsequent pages alphabetically, numerically, or subtopically." and in the size guideline subsection, WP:SIZERULE: "The rules of thumb [...] apply less strongly to list articles". The full list of Asha Bhosle songs is already split among a few articles, it happens that one of them is extremely large. Onetwothreeip (talk) 19:48, 13 January 2023 (UTC)
- It would be a step forward to apply the readable prose guideline to the readable words in the list articles. These list articles still have their own word count, they are just stored in tables. These software tools make measuring word counts faster for many articles, but are certainly not necessary. Onetwothreeip (talk) 19:53, 13 January 2023 (UTC)
- I can see issues with both long prose where there are some lists around (For example, a well-versed actor may have tons of prose in addition to a list of films and TV appearances in a table). This situation would be in addition to where there are long lists with minimal prose around them. Masem (t) 20:50, 14 January 2023 (UTC)
- Why not include both? We can add "approximately 8,000 words" for example. Or vice versa, prioritising the word count, as the main text currently does: At 10,000 words (50 kB and above) (Presumably this number will need to be changed to 8,000 if this passes). CMD (talk) 01:38, 12 January 2023 (UTC)
- I think, given the confusion by smart people above, it would be good to keep the character count in the table too, but temporarily. Just to ensure people don't think we've actually changed the guideline here in any practical sense. I want to see our guidelines simple and accessible, and one way to do that is to reduce the guideline's jargon and word count. The kB column requires explanation. And yeah, that conversion in the body of the article doesn't make sense, and the citation is out-of-date (attention span has reduced / has a fuzzy definition). —Femke 🐦 (talk) 17:26, 12 January 2023 (UTC)
- Without withdrawing my longtime objection to the complete arbitrariness of the existing guidelines, changing to a word count has the laudable effect of, at least, making it impossible for people to think the wikisource size (the size you see in the article history) is what any of this is talking about -- it forces them to use some kind of word-count widget, which (presumably) restricts itself to prose only. EEng 09:39, 12 January 2023 (UTC)
- A word count guideline would be a good addition, alongside the existing technical size guideline. We can also use a readable character count as well. I would suggest to add an approximate word count measure to the existing technical size guideline, as any more significant changes would be unlikely to gain the consensus necessary to change longstanding and entrenched project guidelines. Onetwothreeip (talk) 21:27, 12 January 2023 (UTC)
Readable prose size (words) | Readable prose size (kB) | What to do |
---|---|---|
> 15,000 words | > 100 kB[a] | Almost certainly should be divided |
> 9,000 words | > 60 kB | Probably should be divided (although the scope of a topic can sometimes justify the added reading material) |
> 8,000 words | > 50 kB | May need to be divided (likelihood goes up with size) |
< 6,000 words | < 40 kB | Length alone does not justify division |
< 150 words | < 1 kB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub. |
- ^ Each kB can be equated to 1,000 characters
Based on the comments above, I think the above table would work. I'm putting words first, as it's good scicom practice to put the easy units first. I would like to not say something like "approximately 8000 words", as it's clutter, and we should use the easiest unit as the norm, not the kB unit. This'll make future discussions easier too.
If we keep the kB unit (I hope only for the transition, letting people know we've not changed anything), a note directly under the table with an explanation how to convert this to characters would be useful. Q: forgot how to do this in wiki markup. Can somebody help? —Femke 🐦 (talk) 17:50, 13 January 2023 (UTC)
- Not sure what the question is, but use sup tags??? For example: Note A
- Note A. Note.
- SandyGeorgia (Talk) 18:02, 13 January 2023 (UTC)
- This table looks good. I support this in theory but I would like to see how the discussion continues. As for removing the technical units, it would take years to meaningfully transition given how long they have been around, so I would not anticipate those being removed in the near future. Onetwothreeip (talk) 19:58, 13 January 2023 (UTC)
- SG: the thing I'm looking for is a note directly under the table, without the white space that is now visible between the table and the "* Each kB .." note. I believe I've seen this in the past, but I never work with tables.
- I would say 1-2 years is sufficient for the transition, given that the tools people use to estimate readable prose size either display both units, or only word count. —Femke 🐦 (talk) 09:09, 14 January 2023 (UTC)
- {{ping Femke}} is the table note formatted how you wanted now? Caeciliusinhorto (talk) 07:38, 16 January 2023 (UTC)
- @Femke: fixing ping Caeciliusinhorto-public (talk) 09:01, 16 January 2023 (UTC)
- Not really, as I'd like less rather than more emphasis on it. I'd like it to hug the table and be in small font. I must have imagined seeing this if nobody understands me :). —Femke 🐦 (talk) 17:05, 16 January 2023 (UTC)
- Hmm, well I can't find a template that does that, but apparently there's a style= parameter for {{reflist-talk}} and {{notelist-talk}} which you can mess about with, like so Caeciliusinhorto-public (talk) 16:54, 17 January 2023 (UTC)
- Femke, have a look at the infobox at German invasion of Greece? That suggest that maybe you can just add a field at the bottom of the table, but then you'd need to figure out the col span etc ... SandyGeorgia (Talk) 16:57, 18 January 2023 (UTC)
- I've implemented that in the end. Looks okay enough, thanks :). —Femke 🐦 (talk) 18:35, 18 January 2023 (UTC)
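The word-count column of the table proposed in this thread can be read as a simple threshold lookup. The following is a minimal Python sketch of that reading, not part of any guideline or tool: the function name `size_advice` and the `NO_GUIDANCE` constant are hypothetical, the advice strings are shortened from the table, and the 6,000 to 8,000 word range is treated as having no guidance because the proposed table does not cover it.

```python
# Hypothetical sketch of the proposed rule-of-thumb table (word-count column only).
# The thresholds come from the table in this thread; the names are illustrative.

NO_GUIDANCE = "No guidance in the proposed table for this range"


def size_advice(words: int) -> str:
    """Return the table's rule-of-thumb advice for a readable prose size in words."""
    if words > 15_000:
        return "Almost certainly should be divided"
    if words > 9_000:
        return "Probably should be divided"
    if words > 8_000:
        return "May need to be divided"
    if words < 150:
        return "Consider combining with a related page, or expanding"
    if words < 6_000:
        return "Length alone does not justify division"
    # The table leaves 6,000 to 8,000 words unassigned.
    return NO_GUIDANCE


if __name__ == "__main__":
    for count in (100, 3_000, 7_000, 8_500, 9_500, 26_305):
        print(count, "->", size_advice(count))
```

The sketch makes the gap between the "< 6,000 words" and "> 8,000 words" rows explicit, which may be worth settling before the table is adopted.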
Request for comment on use of template:very long
Hi. A disagreement about tagging an article with {{very long}} has begun on the talk page of French Resistance between myself and Scope creep (who I see is involved in discussions on this page). Other interested editors may wish to contribute there. Fred Gandt · talk · contribs 17:30, 12 February 2023 (UTC)
- That article is 26,305 words long? That's too much even by the standards of lengthy-article enthusiasts, especially on a brutal topic such as that one. Wasted Time R (talk) 21:19, 12 February 2023 (UTC)
- Depends on the needs of the topic. My quick analysis suggests the article could use a streamlining copyedit, with more detail moved into the existing sub articles. Possibly some new subarticles could be created. But this would be because it better serves the reader's understanding, not in obeisance to some stupid numbers someone wrote on the back of an envelope 20 years ago. EEng 21:33, 12 February 2023 (UTC)
Maintainability issues
The most common argument I hear on article size discussions is that very long articles are difficult to maintain. I was surprised that the body of this guideline says nothing about this, so I'm drafting some text to reflect this practice. It would be added as a second heading after the readability issues section.
Early draft: Maintainability issues
Wikipedia articles are in constant need of maintenance. This ranges from small edits to correct spelling and grammar to major updates that reflect changing knowledge and changing situations. Many articles also require a rewrite after the dust has settled to remove breaking news that does not pass the WP:10YEARTEST. In general, it is good practice to take extra care that articles in need of frequent updating do not become too long.
The key sentence is the recommendation here: it is good practice to take extra care that articles in need of frequent updating do not become too long.
I've phrased this quite weakly (compare: Articles that cover particularly technical subjects should, in general, be shorter than articles on less technical subjects.) Alternatively, we could say: "Articles in need of frequent updating should, in general, be shorter than articles on more static topics". Or something in the middle.
Happy to hear your thoughts / see suggestions for improved language and maybe more guidance on how people can reduce the maintainability burden of their article. —Femke 🐦 (talk) 09:00, 14 January 2023 (UTC)
- It seems to me that this is largely already covered in the guideline, though I would have no objection to including that articles should be a size that they remain easy to maintain. Onetwothreeip (talk) 09:46, 14 January 2023 (UTC)
In general, it is good practice to take extra care that articles in need of frequent updating do not become too long.[citation needed]
articles cease to grow significantly once they reach a certain size, even when there is still relevant information that can be added.[citation needed]
- EEng 09:48, 14 January 2023 (UTC)
- Onetwothreeip: Can you give an example? The word maintain is not mentioned a single time in the guideline.
- @EEng: the first sentence is a normative one, so consensus rather than a citation is needed for it. The second sentence is repeating a part of the lead that may have had past consensus, but I was equally wondering if there is hard data on this. My gut feeling is that it would be true. However, I'm very happy to omit that sentence. —Femke 🐦 (talk) 09:55, 14 January 2023 (UTC)
the first sentence is a normative one
– Consensus on normative exhortations should be based on experience and evidence of what's desirable, not daydreaming. EEng 19:22, 14 January 2023 (UTC)
- Well, I think it's smart. It is always better, in my mind, for a policy/guideline/process to have a bit of text explaining why it exists; if anything, it makes it easier to comprehend, and I don't think there is any real objection to what you have said here. I could make some minor quibbles over phrasing, if you'd like. jp×g 11:42, 14 January 2023 (UTC)
- I (and other experienced editors) absolutely object, I assure you. There's a small number of editors obsessed with mindlessly breaking up large articles without regard to the needs of the subject matter, and this is more propaganda for that. EEng 19:19, 14 January 2023 (UTC)
- EEng, can you stop focussing on editors? I agree with a lot of the points you make in the history and discussions here, and I'm sure we can improve this guideline together, but that won't work if you describe unnamed editors as "obsessed", and arguments as "propaganda". —Femke 🐦 (talk) 20:11, 14 January 2023 (UTC)
- @JPxG: very happy to hear quibbles over phrasing :). —Femke 🐦 (talk) 20:16, 14 January 2023 (UTC)
- The second point is wrong. The idea is that large size deters people from updating it, but this is demonstrably incorrect. On the contrary, all the evidence points to large size not only being no deterrent to additions, but encouraging the addition of more material, the result of a form of Matthew effect. Some examples:
- World War II: no issues about splitting here; there are already more child articles than you can count. But that doesn't stop people wanting to add more material to the main article.
- Donald Trump: Again, no shortage of subarticles, but continues to get larger despite aggressive trimming efforts. (The article I mean.)
- You reasonably expect an article on a living person or an ongoing event or series of events to grow over time regardless of its size, but the subarticle Presidency of Donald Trump is actually larger at 23,000 words and despite being about a past event it too continues to grow.
- List of Hindi songs recorded by Asha Bhosle: We don't have an agreed-on metric for the article size, but we can agree that it has continued to get bigger over the last year.
- Hawkeye7 (discuss) 18:49, 14 January 2023 (UTC)
- Hmm.. I'm not sure if those articles become bigger because they are big to start with, or because their subject matter attracts a wide variety of people who want to contribute. Anyway, let's remove it from the proposal, and also remove it from the existing lead, as there it may be factually incorrect. —Femke 🐦 (talk) 20:15, 14 January 2023 (UTC)
The idea is that large size deters people from updating it, but this is demonstrably incorrect.
It's actually a very correct statement, that extremely large size ends up deterring edits to the article. First of all, World War II is an article that happens to be a very large topic and large area of interest, so is most definitely an outlier, same with Donald Trump, which has greatly decreased in size over the last few years. The size of List of Hindi songs recorded by Asha Bhosle is certainly a deterrent to making edits, as it is with Donald Trump. Onetwothreeip (talk) 20:45, 14 January 2023 (UTC)
- The second point is perhaps inaccurate and should be removed. Is it really certain that maintainability is a factor, or is it just perceived that maintainability is a factor? I've never come across any editor worth their salt who is incapable of updating and restructuring the very largest articles, and size tends to encourage more interaction, more talk activity and more updates. scope_creepTalk 13:20, 16 January 2023 (UTC)
- I think it's a WP:SKYISBLUE situation. If you have 2x the amount of text, you'll have 2x the amount of maintenance. Most maintenance scales linearly with how much text there is: checking text-source integrity, updating, copyediting or improving accessibility and readability. There are different ideas about motivation here. Maybe if an article becomes 2x as big, it's intimidating to edit (for me it is, I often only gnome these articles). Maybe the opposite happens, with more people disagreeing over content and flocking to talk. Either effect is very unlikely to be bigger than the "mathematical" increased burden of maintenance. —Femke 🐦 (talk) 17:15, 16 January 2023 (UTC)
- I feel it's less maintenance, more initial investment. I've seen the opinion bandied about among FA writers that it is often easier to just rewrite an article completely than edit existing articles. That seems less feasible as article size increases. (If you, say, take a Wikibreak, coming back to a long previously maintained page is daunting, but that is sort of initial investment again.) I think scope-creep has cause and effect regarding talk activity and updates the wrong way around though: talk and updates generate size rather than vice versa, and I suspect it is all ultimately dependent on article title/topic rather than particular article quality. CMD (talk) 01:43, 17 January 2023 (UTC)
- @Scope creep: Extreme size often gets in the way of editing an article. One example would be that the visual editor often fails to open and to save edits with very large articles. The size of an article also has very little to do with the size of the talk page or the amount of discussion. Onetwothreeip (talk) 08:11, 17 January 2023 (UTC)
- That's what section edits are for. EEng 17:57, 13 February 2023 (UTC)
- @EEng, VisualEditor has to process section edits as edits to the whole page, so the performance for them is the same (see this Phabricator task). Best, EpicPupper (talk) 00:06, 14 February 2023 (UTC)
- @Onetwothreeip: That may be a function of the generally poor-quality software we have on here, not a function of the size of the article, where any editor who is capable of working on it can. scope_creepTalk 12:27, 17 January 2023 (UTC)
- Of course if the software was better then it would be less of a problem, but as it is, the editing software has issues handling large articles. Onetwothreeip (talk) 08:06, 18 January 2023 (UTC)
- I agree that excess size impedes maintenance. It is rare to encounter an oversized article that is not chock full of redundancy and verbosity and off-topic content or content that fails to adequately use WP:SS, and attempting to sort reliability of sources, NPOV, due weight and more important matters when you have to wade through text that shouldn't even be there discourages one from wading in at all. SandyGeorgia (Talk) 02:16, 17 January 2023 (UTC)
- If you cut out the redundancy and verbosity then it won't be so long anymore. And that's way preferable to splitting it into two redundant, verbose articles. EEng 21:40, 12 February 2023 (UTC)
- You're absolutely right EEng. The sizerule is now framed as a splitting rule. In many overly long articles, redundancy and boring details should be dealt with by deleting and condensing instead. If there are no objections, I'll move the sizerule to a separate heading and rewrite it to nudge people to use common sense / consider multiple solutions. —Femke 🐦 (talk) 17:00, 13 February 2023 (UTC)
- I do not disagree that chopping the content that shouldn't be there to begin is usually in order for most oversized articles, and support any effort to make that distinction (split vs chop) clear. SandyGeorgia (Talk) 13:49, 14 February 2023 (UTC)
- @Femke: That depends entirely on what you intend to write. Long articles can be improved by a range of measures, not limited to condensing content, removing extraneous information, splitting the article. Onetwothreeip (talk) 20:08, 14 February 2023 (UTC)
- This is an important point on the problems with large size. Often it is a sign that there are significant problems with the content or format, let alone problems to do with the size itself. Onetwothreeip (talk) 08:12, 17 January 2023 (UTC)
- I would suggest altering your proposal to the following:
|
- If this has cut anything that may have been important to the proposal, I would certainly be open to including further wording. Onetwothreeip (talk) 08:21, 17 January 2023 (UTC)
- Perhaps work in a WP:PROSELINE link somewhere, as that is also one of the factors that often turns up in oversized articles. SandyGeorgia (Talk) 16:59, 18 January 2023 (UTC)
Proposal
Working from Onetwothreeip's text: I've switched the last sentence from 'article' to 'amount of text on a topic', to reflect that splitting isn't always the sensible thing to do.
|
—Femke 🐦 (talk) 16:40, 22 January 2023 (UTC)
- Absolutely oppose. Keeping several fragmented pages, all actually on a single subject, in sync and nonredundant is way more difficult than maintaining a single integrated page. EEng 17:57, 13 February 2023 (UTC)
- See my suggestion above. The SIZERULE shouldn't be framed as a splitting rule. Often, splitting only makes the problems worse. Common sense should prevail when articles are unwieldy. —Femke 🐦 (talk) 18:39, 13 February 2023 (UTC)
- There is a trade-off in ease of maintenance between few large articles and many small articles. If the entirety of Wikipedia was in one article, it would be very hard to edit it. Likewise, it would be difficult to edit Wikipedia if each word was on a separate page. Onetwothreeip (talk) 09:40, 14 February 2023 (UTC)
- Nothing here recommends "several fragmented pages", so that's a non-objection. This proposal is incredibly inoffensive, to the point that it almost states the obvious. It should also be obvious that splitting should occur when it reduces problems, and not when it creates more than it resolves. Onetwothreeip (talk) 09:43, 14 February 2023 (UTC)
Size guideline: subsection of 'splitting an article'?
@Onetwothreeip: could you detail why you reverted [15]? Your edit summary gave the impression you reverted solely because the previous discussion (in maintainability issues) was too short, which is not a proper reason to revert per Wikipedia:Don't revert due solely to "no consensus". I believe this is an edit that resolved an issue brought up repeatedly by EEng and others, namely that people use this guideline to split articles without considering alternatives. Femke (alt) (talk) 09:54, 16 February 2023 (UTC)
- I have no issue with the inclusion of maintenance issues. The risk is confusing readers and creating unintended meanings with this language. Onetwothreeip (talk) 10:07, 16 February 2023 (UTC)
- Could you explain what kind of confusion this specific edit causes or what unintended meanings you think this change could have? Femke (alt) (talk) 10:55, 16 February 2023 (UTC)
- Unforeseen misunderstanding being a general risk when adding new wording, which does not appear to be necessary. Onetwothreeip (talk) 21:39, 18 February 2023 (UTC)
I really do not see how such a simple change can lead to misunderstanding, and the discussion just above shows that the old wording can guide people to the wrong solution for an overly long article. —Femke 🐦 (talk) 22:30, 18 February 2023 (UTC)
- Let's ask for some more opinions of the people who seem to be regulars here: @EEng, @Hawkeye7, @Scope creep, @SandyGeorgia, @Masem, @Chipmunkdavis. Should the size guideline be framed solely as a splitting guideline, or should the text be more agnostic about the proper solution to dealing with long articles? —Femke 🐦 (talk) 08:22, 19 February 2023 (UTC)
- Is this just about the "or shortening" clause? I feel the times you most look to shorten rather than split (if a split is desirable) is when information is trivia or recentism, and thus undue anywhere. In those cases, the other guidelines suffice. CMD (talk) 08:40, 19 February 2023 (UTC)
- Yes. My experience is that shortening is usually about removing out-of-date information (which may have been 20-year-old recentism, but perhaps not recognizable as recentism).
- My experience is that people look at the table without reading other parts of the guideline, so I'd disagree with the assessment that we can refer people to other parts of the guideline. Especially given the fact that the headings about deleting trivial information are vague ("Breaking out trivial or controversial sections", rather than something like "when not to split".) —Femke 🐦 (talk) 08:52, 19 February 2023 (UTC)
- I'd like to see a rationale, a reason that is deeply reflective on why long articles are considered a "bad thing" or "not a good thing", based on a modern premise, possibly design, flow, readability, technical reason, software limits. Not based solely on some kind of technical limitation, which no longer exists for the majority of people. The current article, which doesn't seem to have been updated for some time except in the most cursory manner, is increasingly being called outdated. Most folks seem to look at the table immediately, and it seems to be the same kind of response you get when the discussion starts at splitting. It's always the same. It needs to be much finer in its approach, with more guidance, and possibly a better decision path, not "it's 230k, it needs to be split". scope_creepTalk 09:48, 19 February 2023 (UTC)
- I agree this guidance needs to be updated with rationales that apply to the current state of the encyclopedia. In the olden days, splits made more sense than now. I think my proposed change will make it less likely people say '230k, needs to be split', and more likely '230k, what's the best way forward'. —Femke 🐦 (talk) 09:58, 19 February 2023 (UTC)
- Tinkering around the edges isn't going to fix the problem with the article. WP:BIAS needs to be addressed as well, as the article is seemingly written with an American audience in mind. scope_creepTalk 10:04, 19 February 2023 (UTC)
- The table in question relates to prose size not total article size, so it isn't related to the technical limitations. CMD (talk) 10:14, 19 February 2023 (UTC)
- The only acceptable reasons for removing content from an article are that it is unsourced, inaccurate, irrelevant or inappropriate. Or it has been moved to another article by splitting. Shortening an article in order to reduce the word count is unacceptable behaviour, and as an admin you are expected to issue a block. Hawkeye7 (discuss) 10:12, 19 February 2023 (UTC)
- Is there better wording possible where it's not implied people remove appropriate content with the sole goal of reducing word count? Usually long articles have problems with irrelevant or otherwise inappropriate content. I hope people look at that first before splitting an article, making it harder to find appropriate content. —Femke 🐦 (talk) 10:19, 19 February 2023 (UTC)
- I don't understand the reasoning for the revert at all; separating the size guideline to its own section made sense, and it is a matter distinct from simply "splitting an article". A frequent problem in overly long articles is that they include information that has already been "split"-- that is, the info is already or should be in a sub-article, and the main article fails to adequately use summary style. Repeating excess detail in a main article, that is already in a sub-article, simply means maintaining the same (sometimes long) content in two places. This may not matter (as) much in fairly static articles, but it matters a lot, for example, in medical content. Details of the "Management of X condition" can change often, and summarizing the detail at a sub-article is an example where the maintenance nightmare is avoided by using a sub-article. Often, the split is already there, and editors simply chunk up the main article rather than more appropriately use the sub-article. Maintainable and readable size is about more than "splitting"; it's about appropriate use of sub-articles. Separately, due weight issues also come into play. One good example is that it would be UNDUE to include all of the Politics of J. K. Rowling at J. K. Rowling, as that is not a reflection of due weight in the best sources. When someone chunks in a bit of WP:NOTNEWS about JKR to the main article, that can be removed with a reminder/move to the appropriate article. Just as it would not be appropriate to include every detail of the Management of schizophrenia at Schizophrenia, because that is not what the highest quality sources do (where instead, entire journal publications are dedicated to "Management of ... " aspects of medical conditions). SandyGeorgia (Talk) 13:47, 19 February 2023 (UTC)
- WP:UNDUE is indeed a problem, but it mainly affects short articles rather than long ones. This often is the case with biographical articles where some scandal is written up in great detail, unbalancing the whole article. Splitting the scandal off into its own article is not recommended (Politics of J. K. Rowling seems like an invitation to use that article as a toxic waste dump to me). The best possible approach is to expand the whole article so the scandal is no longer UNDUE. The parent article still needs to be complete in its own right, so where a subarticle exists a summary of it (usually about the size of its lead) is still required. Hawkeye7 (discuss) 21:47, 19 February 2023 (UTC)
- It was a problem at the very long schizophrenia; often a matter of researchers trying to get their work shoved in to a high-pageview Wikipedia article. Sometimes that work might belong in a sub-article. (Politics of J. K. Rowling is about more than the recent transgender issue, where it has been used as a toxic dump; if someone cared to clean it up, that could be a decent sub-article. But that's not where the interests lie ... ). But that's a digression ... The more relevant example is medical content. At the FA level, Wikipedia generally (I say generally because there is so much neglected content these days) follows the pattern in the highest quality sources; that is, it is not possible for a broad medical overview to cover every aspect, and Epidemiology of, Diagnosis of, Management of, History of, and so on typically refer to the reader out to such overviews. This is not only an appropriate use of WP:SIZE, WP:SS; it is a due reflection of what is done in sources, and makes maintenance of medical content much easier. Imagine the work if every time a new drug is introduced, one had to add that, rather than stating in the main article that such-and-so class of drugs is used, with the explicit detail in the Management of sub-article. SandyGeorgia (Talk) 21:58, 19 February 2023 (UTC)
- To paraphrase Inigo Montoya, summary style doesn't mean what many people think it means. It says:
The parent article should have general summary information, and child articles should expand in more detail on subtopics summarized in the parent article. The child article in turn can also serve as a parent article for its own sections and subsections on the topic, and so on, until a topic is very thoroughly covered.
All works well when we have a high level parent article with many child articles. But if we split off only one child article, then we can generate a conflict between WP:SS and WP:UNDUE, in that the parent article is now very detailed about some aspects, but not about those that are in the child article. Hawkeye7 (discuss) 23:26, 19 February 2023 (UTC)
- I don't see how that conflicts. You can have a single sub-topic covered in extensive detail that would be undue at the main parent page. The way that information is appropriately balanced on the parent page is a question that can be discussed even with a sub-article. It is not uncommon for articles with multiple sub-articles to devote different proportions of their length to each, so an article with one major sub-article could still devote significant attention to that topic if warranted. CMD (talk) 01:38, 20 February 2023 (UTC)
Validity of this guidance
There is apparently some disagreement with the validity, and even worth of this guidance, with arguments that it is outdated, not taking into account modern broadband speeds and computing power of the devices used to access the encyclopedia. I have seen outright rejection of the guidance on these grounds with no apparent deference to the other concerns about navigability, maintainability or reader engagement/comfort. Unfortunately there is also a lot of opinion and not a lot of real data in these arguments, so I am (non-exhaustively) pinging the following (who's who of) highly skilled, experienced and knowledgeable technicians for a consult: @Izno, Jon (WMF), TheDJ, Mr. Stradivarius, Primefac, and Xaosflux: what is your expert evaluation of this guidance as it stands, and how should it be improved if at all? Fred Gandt · talk · contribs
16:47, 13 February 2023 (UTC)
- Perennial argument. Please review archives before launching two new sections here. SandyGeorgia (Talk) 16:49, 13 February 2023 (UTC)
- It is precisely because it is perennial and ongoing that I am asking for fresh input.
Fred Gandt · talk · contribs
16:50, 13 February 2023 (UTC)
- I think parts of it are still very relevant - while computing power and desktop screens have certainly increased, what has increased even faster is the number of readers accessing pages from mobile devices. Some of these choke on pages with very complicated and large tables, especially the types that like to load hundreds (or more!) templates with images (I'm looking at you National Flag templates in giant tables). — xaosflux Talk 17:10, 13 February 2023 (UTC)
- Are there parts of the guideline that need updating? I'm lost on the technical details. —Femke 🐦 (talk) 17:15, 13 February 2023 (UTC)
- Yes; "mobile first" is the mantra of modern web development. For example; I was pleasantly surprised by how well French Resistance (not exceptionally massive) loaded on Chrome with mobile emulation and slow 3G throttling enabled (not real world of course), but it wasn't an experience I would be impressed with as a reader trying to find quick info on the train. Putting aside the massive dent in a limited data plan; long sections are a nightmare on phone screens.
Fred Gandt · talk · contribs
17:36, 13 February 2023 (UTC)
- Here's where I point out for the 1000th time that the presence of just a few images in an article consumes about the same bandwidth as all of its text. Yet somehow everyone's still talking about text. It's insane. EEng 18:00, 13 February 2023 (UTC)
- If you have a really bad connection, images don't load at all, right? Article is still semi-functional without the images, so it's not completely irrelevant. I have had trouble loading pages on shitty train connections before.. —Femke 🐦 (talk) 18:32, 13 February 2023 (UTC)
- No, that's not the case. They will download in background in a series of resolutions. The image in the infobox alone is 3.5 MB - four times the size of the HTML of the article!!! This guideline is completely irrelevant to download times. Hawkeye7 (discuss) 18:39, 13 February 2023 (UTC)
- I see, interesting. Hope that somebody with technical knowledge goes over the text and removes the obsolete parts of it. —Femke 🐦 (talk) 18:41, 13 February 2023 (UTC)
- Agreed; real, solid, data-driven reason needs to be applied to a review of this guideline so the value of it can be set in stone for at least a few more years. If there are definite concerns regarding the size of articles, they need to be clearly and unarguably described and adhered to for the benefit of all.
Fred Gandt · talk · contribs
19:09, 13 February 2023 (UTC)
- "The image in the infobox alone is 3.5 MB" eh. no it's not... It is 57Kb. It's a thumbnail, not the original file. And the fact that I need to explain this is... why we have the guideline :) —TheDJ (talk • contribs) 21:44, 13 February 2023 (UTC)
- Images are loaded whenever the browser is ready for them. That is the big difference with the text content. The browser NEEDS to download all the base HTML, including all the text content before it can do most other things (like running scripts, download images etc). So images don't affect the experience much, but long pages create a long delay before the page is done towards a state that it can do scripts/images etc. —TheDJ (talk • contribs) 21:40, 13 February 2023 (UTC)
- EEng; there are more concerns than purely bandwidth. The battery life and memory of mobile devices are hot topics in development, navigability is also a huge concern with very long documents that appear like seas of text. Loading a large article is pointless, whether technically easy or not, if the only content required could have been delivered in a smaller package. Opinion: shorter articles with tighter focus are more likely to deliver what a searching reader wants than long, wide focussed articles requiring a full read to comprehend perhaps a limited subset of the content.
Fred Gandt · talk · contribs
19:26, 13 February 2023 (UTC)
- Exactly. Try loading the larger pages with desktop mode on the cheap iPads and you already see why they are WAY too big. And those are modern devices. —TheDJ (talk • contribs) 21:34, 13 February 2023 (UTC)
Okay. That article is a mess, but I tried loading it in Safari with 3G (had to switch from 4G to 3G - no idea how you set throttling on an iPhone but if you do you're not expecting anything to download quickly) and -- zoom! (I thought about loading it on the tram, but that would cause it to use wireless instead.) 3G is about 5 Mbps and 4G is around 50 Mbps. HTML is 890 K, so 890 KB × 8 / (5 × 1000) ≈ 1.4 s if it all loaded at once. Less than one second at 4G. No dent in your data plan (compared to videos of cats). Nobody looking at that article is expecting quick information; possibly they would be looking for a particular fact, like how many people were involved, or what their leadership was. What the article is really an advertisement for is that people who are WP:NOTHERE can template all they like, but the article requires expert attention. This is a contentious area, and just being able to read French sources is insufficient; you have to actually be up to date on the literature. Hawkeye7 (discuss) 18:22, 13 February 2023 (UTC)
- I really had no intention to inspire interest in that one example article. I have inserted a qualifier.
Fred Gandt · talk · contribs
19:09, 13 February 2023 (UTC)
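The 3G and 4G estimate in Hawkeye7's comment can be reproduced with a short sketch. It assumes the figures quoted there (890 KB of HTML, roughly 5 Mbps for 3G and 50 Mbps for 4G), treats 1 KB as 1000 bytes, and ignores latency, compression and protocol overhead, so it is an idealised lower bound rather than a measurement. The function name is hypothetical.

```python
def transfer_seconds(size_kb: float, link_mbps: float) -> float:
    """Idealised time to move size_kb kilobytes over a link of link_mbps megabits/s.

    Assumes 1 KB = 1000 bytes and 1 Mbps = 1,000,000 bits/s; no latency or overhead.
    """
    bits = size_kb * 1000 * 8
    return bits / (link_mbps * 1_000_000)


html_kb = 890  # HTML size quoted in the comment
for label, mbps in [("3G", 5), ("4G", 50)]:
    print(f"{label}: {transfer_seconds(html_kb, mbps):.2f} s")
# 3G: 1.42 s
# 4G: 0.14 s
```

As later comments in this thread point out, raw transfer time is only one of several factors in how slow a page feels.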
OK.. i'll try. I'll try to go through several separate points.
- Bigger articles is more bandwidth
- This creates a (financial) cost for the user.
- This creates cost on hosting.
- The biggest part of pages (and thus bandwidth) is generally not the text, but the images, css, javascript widgets etc.
- The savings made for this problem are mostly done at a technical/infrastructure level (thumbnail size, only loading what is required, compression etc etc). These savings are much more important than what editors can do with article writing (especially big articles are relatively rare).
- Bigger articles require more from the client device
- This mostly starts counting at the graphics level. The entire page needs to be rendered and kept in active memory and the more your device is doing the harder this becomes.
- While technically this shouldn't affect most modern devices.. A lot of people are NOT on modern devices. There are 100 million users of the feature phone KaiOS for instance. My dad was using a 2012 iPad mini up until a year ago.
- And even on more modern devices... If I have my entry level iPad from 3 years ago and have lots of tabs open and then try to load a big article, you can get that to crash a Tab. You simply run out of memory available to that 1 single app.
- Bigger articles is more download time!
- If you have bad bandwidth, your biggest concerns are not actually bandwidth cost, but bandwidth effects. They are: DELAY and interruptions.
- Bigger articles means you need a stable connection for a longer time before the page works. If you have interruptions you may end up with half loaded pages and incomplete state of the page.
- You always first have to download the complete initial document, which all content is a part of. Only then will the extras like javascript, images, graphs etc start coming in (technically this is more complex/fluid). Once these extras are in, they are cached (which is why when you reload an incomplete page, the chance that it will finish loading completely the second time around is often a bit higher, because half the stuff won't need to be downloaded again).
- Because article content is part of the initial document that you HAVE to download, your experience actually DOES get heavily affected by delays and interruptions when they occur here. It's like an axe to the trunk. There may be many leaves you can cut, but it all doesn't matter without the trunk, and you need the complete trunk.
- Interruptions while downloading the images etc will only affect a part of the page, not the entire page, so they are not as disruptive to the enduser.
- Very long articles therefore definitely affect your experience when you have Internet of a lower quality.
- Bigger articles is a longer initial document download.
- While the page becomes visible when the first content starts to become available (This is called FirstPaint), it is still downloading the rest as you already see the document.
- The initial document size is important because certain actions (like scripts, which might have to initialize buttons for instance), will only begin once the entire document has finished downloading. So what might happen is that the UI is not fully available as you expect between first load and 'full load'.
- After the full page has downloaded, it might jump around a bit to realign based on all the content that is available that was not available before, or because scripts are modifying the page. The longer you have already been reading the page, before this happens, the more disturbing it becomes to your experience.
- But most importantly, more wikitext and more complex wikitext, equals longer compute time for the servers.
- This means that it takes the servers longer before they even begin giving you an answer.
- This means longer load times (especially for editors) when that page was not recently requested. Editors especially, because their pages cannot be cached completely (there are always some parts calculated on the fly). There is a lot of variance on this metric, as it really depends on if the page needs to be calculated from scratch for you as a user.
- This is why we have the technical limits, like max article size of 2MB and the Template limits. Without limits, you would temporarily make entire servers unavailable to anyone else if this was not kept in check.
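A minimal sketch of checking a page against the size limit mentioned in the last point. The "2MB" figure comes from the comment above; reading it as 2 × 1024 × 1024 bytes of UTF-8 encoded wikitext is an assumption here, and the constant and function names are hypothetical. The point it illustrates is that the size is counted in bytes, not characters.

```python
# Assumed reading of the "2MB" limit quoted above; check the live configuration
# before relying on this number.
MAX_ARTICLE_BYTES = 2 * 1024 * 1024


def wikitext_size_bytes(wikitext: str) -> int:
    """Size of the wikitext when encoded as UTF-8 (bytes, not characters)."""
    return len(wikitext.encode("utf-8"))


def exceeds_limit(wikitext: str, limit: int = MAX_ARTICLE_BYTES) -> bool:
    """True if the encoded wikitext is larger than the limit."""
    return wikitext_size_bytes(wikitext) > limit


sample = "Café " * 10  # non-ASCII characters take more than one byte each
print(len(sample), wikitext_size_bytes(sample), exceeds_limit(sample))
```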
Some things to keep in mind:
- 32kb wikitext is a rather arbitrary point. But you have to put a line somewhere I guess.
- 2000kb of a page (rendered wikitext) of larger pages is often 1500-1800kb of generated wikitables, navboxes and references and other 'invisible' content. So that means that it might only be 200-500kb of raw wikitext.
- So somewhere between 32kb and 500kb of unparsed wikitext is probably where 'pain' begins for 'some people' and you need to determine how much pain you want them to deal with, and it's really hard to put a number on what is "too much" here.
- What you experience as 1-3 seconds can be 30-90 seconds on an island in Indonesia, wifi on an intercontinental flight, a ferry off the coast of Italy, a subway tunnel in Kharkiv, or 64 kbps speeds because you spent your 20GB databundle for the month. And while savings might only scrape off .2secs for you, this can sometimes translate into 20 of those 90 or more seconds for those users (they are disproportionally affected, because the complexity is exponential in its effects).
- Wikipedia is pretty special with initial document size. Most websites that have a lot of content, switch to using pagination, on demand loading, continuous scrolling and similar techniques. One of the primary reasons to do that is to keep this initial size down, improving performance. We can't do that for readability reasons and because the editors won't like it. It's only a few pages, but they generally are also oft accessed pages.
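The arithmetic behind two of the points above can be sketched as follows. It assumes "kb" in the comment means kilobytes, takes the 2000kb rendered page and the 1500-1800kb of generated content at face value, and models the throttled 64 kbps connection as a pure bandwidth limit with no latency or retries. Both function names are hypothetical.

```python
def raw_wikitext_range_kb(rendered_kb, generated_low_kb, generated_high_kb):
    """Estimate (low, high) raw wikitext size by subtracting generated content."""
    return rendered_kb - generated_high_kb, rendered_kb - generated_low_kb


def seconds_at_kbps(size_kb, kbps):
    """Idealised transfer time for size_kb kilobytes at kbps kilobits per second."""
    return size_kb * 8 / kbps


low, high = raw_wikitext_range_kb(2000, 1500, 1800)
print(low, high)                  # 200 500
print(seconds_at_kbps(2000, 64))  # 250.0
```

On these assumptions a page near the upper limit would need over four minutes on a throttled connection, which is the kind of disproportionate effect the comment describes.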
So is it up to date..
- Yes and no
- Slow is often read by people to mean speed. But what is more important here is the EXPERIENCE by the end user. And that experience is caused by a multitude of technical factors (of which raw transfer speed is one) which all get stacked together and amplify each other in different ways for different users, but which they will all experience as 'slow'.
- 32kb of wikitext is probably a bit dated
- There seems to be a particular focus in the examples on mobile devices and dialup.. these are somewhat dated examples, and focusing on these examples takes away from the many other instances where you might not have optimal connections available.
Hope that clarifies some things. —TheDJ (talk • contribs) 23:47, 13 February 2023 (UTC)
- That is an outstanding contribution TheDJ; seriously; that would serve well as extended reading in the guideline IMO (explaining why this matters might actually help more than simply saying it does)
Fred Gandt · talk · contribs
02:24, 14 February 2023 (UTC)
- From my experience managing large articles, issues of navigability are more significant than technical issues such as download capacity, though technical issues may be more prominent in certain circumstances as previously described. Onetwothreeip (talk) 09:37, 14 February 2023 (UTC)
- This is great and it's great to see editors thinking about this!
- Only thing I might add is that it's not always about length of article - number of images and inclusion of interactive elements, e.g. graphs, maps, galleries, add to the article size in that they require loading additional JavaScript that requires a lot of a device's memory. For example an article A twice the length in text of an article B with a single graph/map could have a smaller article size with all things considered.
- On the mobile site, we currently perform various modifications to HTML, which can be slowed down by large articles. meta:Recommendations_for_mobile_friendly_articles_on_Wikimedia_wikis#Limit_number_of_images_in_a_page might be helpful when thinking about number of images in a page. Jdlrobson (talk) 18:01, 3 March 2023 (UTC)
Thanks to expert evaluation from TheDJ and xaosflux so far, we can rest assured that this guideline is not only relevant but needed. It perhaps could do with some additional explanation of why it matters, to assist confused editors in understanding its importance, and TheDJ has given a huge insight into the many technical concerns. As Onetwothreeip and others (including myself) have noted; there are other concerns than technical that could probably still benefit from review; navigation for readers and editors of large pages, and whether readers are better served by smaller, narrower-focussed articles are of particular interest to me. Fred Gandt · talk · contribs
19:50, 14 February 2023 (UTC)
How do we measure "readability"?
This guideline has for almost two decades assumed just one type of external consumer: a reader with the goal of making it through an entire article, preferably in one sitting, regardless of size, topic and context. Any other way of consuming an article is not even hinted at. We present no evidence whatsoever if an article of, say, 15000 words is actually a bigger problem for most readers compared to an article of 6000 words.
Assuming that our highest priority is to provide the best possible content for readers, how do we know what readers really want and are best served by? Why is reading an article from start to finish in a single sitting the only reader-focused ideal we strive for? Peter Isotalo 09:07, 4 July 2023 (UTC)
- I fully agree that readers typically spend less time on an article than fully reading it. I think we need to rephrase the readability section to make that clear.
- Most readers now access the site via mobile and read one or two sections. With the loss of search functionality compared to a PC reader experience, this limits the effectiveness of finding data in long articles, and a long article may be cumbersome to wade through. So the arguments have changed since this was first written.
- Unfortunately, I do not think research exists to justify a longer or shorter article length. —Femke 🐦 (talk) 16:14, 4 July 2023 (UTC)
- At least some does – see Wikipedia talk:Manual of Style/Linking#DL, sections, and_mobile readers, which changed MOS:DL in response to such data. My understanding is that it rather indicates that length is much less important than good sectionalization and clear headings, because people jump around a lot to get at the specifics they've come for. But this is not a topic I've "wallowed" in, so maybe there's more and different usability data to be had. — SMcCandlish ☏ ¢ 😼 12:18, 27 July 2023 (UTC)
dial-up? older browsers?
Please tell me you're kidding when you, in the year 2023, talk about "The text on a 32 kB page takes about five seconds to load for editing on a dial-up connection". How can it be that this sentence has been overlooked for what must be decades since it was relevant?
Can we please agree that 32 kB is nothing. Maybe if you were concerned about a 32 GB page I could see the point.
TL;DR: This isn't 1991 when 9600 baud modems were the newest and shiniest. I certainly didn't do any math, but I can absolutely believe loading your example page would take five seconds on that thing. Even in the poorest and most remote areas of the world people no longer connect using 8-bit Ataris. Thank you. CapnZapp (talk) 07:33, 7 March 2023 (UTC)
- While broadband is certainly more popular than it was in 2001, we should keep in mind that not everyone has access to the blazing fast connections of today. Hell, I'm still technically on modem because they didn't route fibre to my area. From what I gather, the point of the limit is to allow English editors outside of urban areas to edit with some degree of efficiency. No one wants to wait 2 hours to load an article. - MountainKemono (talk) 09:59, 16 May 2023 (UTC)
- A bigger consideration is computer processing speeds, especially when running multiple programs and web pages, on desktop and mobile devices. Onetwothreeip (talk) 22:41, 21 May 2023 (UTC)
Regards, CapnZapp (talk) 07:33, 7 March 2023 (UTC)
- Over a quarter of a million Americans still use dial-up internet. [16] Hawkeye7 (discuss) 03:22, 11 April 2023 (UTC)
- And 22% have no internet at all. In consideration of them, let's limit articles to zero bytes. EEng 03:53, 11 April 2023 (UTC)
- The bit about internet connection speed is completely irrelevant when it comes to text. The amount of data involved is completely overshadowed by images and other graphical elements, for example. It's simply not a relevant enough factor when discussing prose size. Peter Isotalo 09:16, 6 July 2023 (UTC)
- I agree that the technical issues section is obsolete at this point. It's also contributing to the ridiculous proliferation of articles based on every facet of a topic. I propose removing it entirely. Riposte97 (talk) 23:03, 19 October 2023 (UTC)
- Inclined to concur with wholesale removal. In particular, kb size of pages is more influenced by the number/size of images they have than by the "article size" per se. So it's not in practice within the remit of this policy page. Jo-Jo Eumerus (talk) 10:27, 20 October 2023 (UTC)
- Good point. Images are far more heavy on page size than simple text. Eyeballing it alone, this thread would be close to 2K. If anything, we should have a policy more directed towards images and their raw file size. 256K is about the upper limit for what web devs use in regards to image size, what's stopping us from doing the same? - MountainKemono (talk) 10:41, 20 October 2023 (UTC)
- Agree with you both, and I'll create a new topic to discuss image size. To draw a line under this conversation, I'm going to boldly remove the part in question. Riposte97 (talk) 08:36, 21 October 2023 (UTC)
- Whoops, turns out the image size policy can be found here and here. Riposte97 (talk) 08:44, 21 October 2023 (UTC)
Deferring to those doing the actual work instead of drive-by taggers
Kind of buried in above discussions, I proposed: While the guideline briefly touches on the idea that there is no big hurry in splitting up a long article, I think it should more clearly state the point that if someone is actively developing the material and doesn't want it split yet, that they should be listened to since they're doing (or doing a significant portion of) the work.
And some detailed rationale behind that: [17] That was in July, and nothing's happened in this regard, probably because it's mired in a broader thread. Might take a little work to craft up a guideline-worthy line item about this, but it's a pretty simple point. — SMcCandlish ☏ ¢ 😼 07:29, 2 November 2023 (UTC)
- Sounds reasonable. Hawkeye7 (discuss) 08:21, 16 November 2023 (UTC)
- The wording will need significant improvement over the green-i-fied portion above, but I would support in theory. I've been part of a situation where someone split content prematurely and without discussion, to the wrong title, without a lead, based on the usual misunderstanding of readable prose vs. KB, leaving undefined citations that couldn't be recovered by bot, not attributing with WP:CWW, creating a double maintenance load and separating primary from secondary sources, when in fact cleaning up prose redundancies and overquoting brought the main article back within size recommendations so that the split wasn't needed anyway. The wording should focus on something like taking care not to do a BOLD split, rather discussing first to see what other options there are, and outlining the steps to do it correctly, or waiting for an editor who knows how to do it correctly. The editor who did that defended it as BOLD and accused me of OWN; in fact, it was reckless and took me days of work to correct, ending up not being needed at all. I wouldn't position it as "those doing the actual work", rather not a situation where bold should be implemented without prior discussion, to a) make sure it's necessary and b) get it done right when it is. A lot of this, though, should actually be addressed at WP:PROPERSPLIT-- and people don't read either page anyway, but reinforcement of the concept at both pages might help. SandyGeorgia (Talk) 11:34, 16 November 2023 (UTC)
- WP:PROPERSPLIT should be enhanced to add the steps to keeping the referencing in order. Hawkeye7 (discuss) 23:52, 16 November 2023 (UTC)
- Isn't that going to vary a lot depending on what citation techniques are involved? — SMcCandlish ☏ ¢ 😼 23:58, 16 November 2023 (UTC)
- Yes, but in this case, the editor just split the content and never even glanced at the mess left. First, if you're using sfns, you have to bring over the matching cite template. Second, they had made so many other messes before the split that the bot couldn't figure out what named refs to use. Third, editor never bothered to check after the bot run that citations were a mess. The general message would be that the editor who does the split should be responsible to make sure all citation info is copied over. SandyGeorgia (Talk) 01:10, 17 November 2023 (UTC)
- We should be clearer in point 5 on the need to make sure references themselves are in place, not just "A References section". Point 6 should also include a note that you may need to add references, such as if the base named reference is removed (although luckily bots seem to be okay at catching this). CMD (talk) 02:31, 17 November 2023 (UTC)
- Sure, that all sounds reasonable (both of you, I mean). — SMcCandlish ☏ ¢ 😼 07:50, 17 November 2023 (UTC)
Iconoclasm
I've noticed the {{too long}} tag on three articles recently: Winston Churchill, John F. Kennedy and Napoleon. The readership for all of these great men will always be high but it will be especially so currently as JFK is doubly-featured on the main page today as it's the anniversary of the assassination. And there's a big new biopic about Napoleon which was released today.
The typical reader will not be surprised that these pages are long as there's obviously a lot to say about these people. For example, Ridley Scott said that there are over 10,000 books about Napoleon – one for every week since he died. What may surprise the thousands of readers is that Wikipedia is complaining about this length and is expecting them to do something about it. But they probably suspect that, if they do start taking an axe to the content, it won't end well. Other readers may suspect that the tags have been placed by iconoclastic vandals who want to diminish and disrespect these great men. But again, if you revert these tags as vandalism, that's not likely to end well either. And so, as usual, the tags linger to annoy all those readers while nothing is actually done about them. It's not a good look for our highest-profile articles.
As for the question of the best or most appropriate length, note that Scott uses the "bum ache factor" for his movies. That makes sense for a captive audience in a cinema but Wikipedia readers are not like that, are they?
Andrew🐉(talk) 22:01, 22 November 2023 (UTC)
- Thank you, yes, they are! Most people scrolling through an encyclopedia article of 100,000 words, will feel brain ache won't they, hence the guidelines including 15,000 words. The ache factor is likely normally distributed, with mean about 2-3 hours for films. People have effectively been trying to estimate the ache factor above in the "Is there verifiable scientific basis for the article length guidelines?" section, about readability. I'll see if I can find any evidence on readability, Tom B (talk) 17:35, 23 November 2023 (UTC)
- "Brain-ache" from scrolling through a long article? I hardly think so; maybe scroll-wheel finger-ache; but the ToC helps with that. Or did you mean, "from *reading* through" a long article? Again, no; when you're on the sofa or your commuter train, you read as little or as long as you like, then switch to the puzzle, or your song playlist, and pick up the reading later; just like with a book. By your reckoning, the printed book industry (and eBooks) might as well shut their doors because nobody could possibly read a 100-page novella, containing around 25,000 words, let alone War and Peace. And by the way, there is no Wikipedia article with 100,000 words; that's about four times bigger than the largest one. An extreme high end article like Presidency of Donald Trump is #29 in the list and weighs in at 525kb raw and 23,606 prose words (all the larger ones are list- or table-rich and prose words can't be counted accurately). And I don't even know why we bother arguing about this, as some studies (which maybe someone will link for me) have shown that most readers don't read past the lead. Mathglot (talk) 06:38, 24 November 2023 (UTC)
- Most readers use the mobile view which gives quite a different perspective. I just tried looking at Napoleon on my phone. The many large sections of prose are not a problem in this because they are condensed into single lines with fairly clear titles like Early life and Exile on Saint Helena. They can be expanded with a single click and that seems fine as a way of handling such a big topic.
- What causes brain ache is not the condensed body but the clutter at the start of the article. The first screen starts with some tiresome disambiguation and then an even more tiresome tag complaining that the article is too long. The actual article text starts over half way down the screen and its first sentence is quite complex and cluttered:
Napoleon Bonaparte (born Napoleone Buonaparte;[1][a] 15 August 1769 – 5 May 1821), later known by his regnal name Napoleon I, was a French emperor and military commander who rose to prominence during the French Revolution and led successful campaigns during the Revolutionary Wars.
The source-code for this is even more complex and convoluted and so would immediately repel anyone who dared to start trimming it.
- If the mobile reader then tries to understand why the article is considered too long and scrolls down through it, they may conclude that this is because it has too many categories – a huge list of over 100 which are not condensed in the same way as the prose sections and so requires a lot of scrolling to get through.
- So, this "too long" issue is very dependent on your device and preferences. The people placing these tags are not typical readers, right?
- Andrew🐉(talk) 10:48, 24 November 2023 (UTC)
- @Andrew Davidson, thank you, I agree it's about preferences. One thing we've not managed to sift are reader preferences about length of articles, except for the good point you both make that leads are key. You and @Mathglot have positively influenced my behaviour in concentrating on leads more, Tom B (talk) 15:44, 24 November 2023 (UTC)
- Andrew, you raise two issues, each of which deserves its own discussion section: mobile view (which I'll leave for later), and lead clutter, and of course there's some overlap. With respect to just one aspect of lead clutter (which applies to all devices) I've always thought that we have a WP:FIRSTSENTENCE problem in most articles, where we separate the subject of the sentence, which is typically the bolded article topic, from the verb (often is or was) by all sorts of interpolated information that hardly anybody cares about, or at least, not that soon. I have an informal measure about how egregious it is for a given article, by simply counting the number of words between the subject, and the verb; for example, the Napoleon article has a score of "16", and it's not that unusual to find articles with scores in the 30s. This really detracts a lot from readability in my view. It's not that the information isn't important to the article at all, it's just that it shouldn't be crammed into the WP:FIRSTSENTENCE between the subject and verb. Imho, that is just a ridiculous holdover from print encyclopedias which do it this way, and a failure to fully embrace the advantages of hypertext in the first sentence, perhaps in an attempt to borrow some gravitas from staid, old print encyclopedias. We don't need to do this anymore. I'd like to see the verb come right after the subject, with rare exceptions. The interpolated material could go later in the lead, later in the body, or my preference as of now, an explanatory note. That would make the lead sentence of Napoleon look like this:
- Napoleon Bonaparte[c] was a French emperor and military commander who rose to prominence during the French Revolution and led successful campaigns during the Revolutionary Wars.
- with the note showing up in the Notes section, as usual. (For the purposes of this example, I bundled everything into one note, but some of that could be dispersed in different body sections, and needn't all be covered in the note.) Mathglot (talk) 22:19, 24 November 2023 (UTC)
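The informal score described above (the number of words wedged between the subject and the verb of the first sentence) can be sketched in code. This is only an illustration under stated assumptions, not the method actually used in the discussion: the function name `interpolation_score` is made up here, the caller has to supply the subject and the verb by hand, and bracketed footnote markers such as [1] or [a] are discarded before counting. Tokenization choices (for example, whether the numbers in a date count as words) change the result, so the output need not match the figure quoted in the thread.

```python
import re


def interpolation_score(sentence: str, subject: str, verb: str) -> int:
    """Count word tokens between the subject and the first later occurrence of the verb.

    Hypothetical helper: `subject` and `verb` are supplied by the caller,
    since finding them automatically would need a real parser.
    """
    # Assumption: bracketed footnote markers like [1] or [a] are not words.
    cleaned = re.sub(r"\[[^\]]*\]", "", sentence)

    start = cleaned.find(subject)
    if start == -1:
        raise ValueError("subject not found in sentence")
    rest = cleaned[start + len(subject):]

    match = re.search(r"\b" + re.escape(verb) + r"\b", rest)
    if match is None:
        raise ValueError("verb not found after subject")

    # Every run of letters or digits counts as one word (an assumption).
    return len(re.findall(r"\w+", rest[:match.start()]))


original = (
    "Napoleon Bonaparte (born Napoleone Buonaparte;[1][a] 15 August 1769 "
    "– 5 May 1821), later known by his regnal name Napoleon I, was a French emperor"
)
rewritten = "Napoleon Bonaparte[c] was a French emperor"

original_score = interpolation_score(original, "Napoleon Bonaparte", "was")
rewritten_score = interpolation_score(rewritten, "Napoleon Bonaparte", "was")
```

With the interpolated material moved into a note, the rewritten sentence scores zero, which is the outcome the proposal aims for.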
- @Mathglot hiya, we're editing encyclopedia articles rather than 100-page books! This is a source of the disagreement: some editors are ok with articles being book-length, others aren't. I believe you, thanks for the good point: that most don't read past the lead. Surely it's worth bothering discussing that? Why would someone only read the lead of Donald Trump, rather than scrolling through the 17,000 words below? If the guideline was reduced from 15,000 to 12,000, would that increase quality, increase readers scrolling past leads, reduce quality or have little effect? Tom B (talk) 15:40, 24 November 2023 (UTC)
- Charles de Gaulle recently underwent huge chops, and is still undergoing them despite being below 80kB prose size. Material (usually valuable context, or relevant detail) is getting split to new articles with ~10 daily pageviews. I expressed support for these splits a year ago, but the discussion above changed my mind, they weren't an improvement; I agree with Andrew. It likely wouldn't go well, but I wish the template was deleted altogether. The first step should be to remove any word-count/prosesize-based limit from this guideline. DFlhb (talk) 10:41, 24 November 2023 (UTC)
I second Mathglot's comment at 06:38. We should think of articles as books. How do most readers treat books? Added to that is the fact "that most readers don't read past the lead", so I'd vote to delete this guideline or severely redefine and restrict its use. (That is a bit of an extreme statement which I will explain below.) -- Valjean (talk) (PING me) 18:39, 24 November 2023 (UTC)
- WP articles are not like books. WP articles are also not like conventional encyclopedia articles. They are a new thing unto themselves, hence the difficulty of knowing how large they should be able to be. Wasted Time R (talk) 19:54, 24 November 2023 (UTC)
- Precisely. And this was not known for certain when the Wikipedia project began. What we had was a preconception of what an encyclopaedia should look like based on the familiar paper ones. Hawkeye7 (discuss) 20:36, 24 November 2023 (UTC)
- See Wikipedia:Five pillars and What Wikipedia is not. I have two copies of The Last Lion if anyone needs to read a book about Winston Churchill. Wikipedia is an encyclopedia; articles are not books. SandyGeorgia (Talk) 20:25, 24 November 2023 (UTC)
- This defines what Wikipedia is not, but not what it is. Nowadays, if someone wants to know what regiment of the British Army Winston Churchill served in, they don't borrow The Last Lion from the library, they turn to the internet. A Google search will tell you. For further information, they can try more queries or turn to Wikipedia. This is not in the lead, but in the body they will find it. Hawkeye7 (discuss) 20:56, 24 November 2023 (UTC)
Yes, our articles are obviously not books. (My comment was too simplistic.) I meant that the reader's approach to our articles could be seen more like the way people approach books, in the sense that they rarely read a whole book in one sitting, with no pauses. (Yes, yes, and yes again, there are people who read whole books in one sitting, but they are exceptional.) Also, not just like paper books, but like digital books, which can be infinite in size.
Some of those who most strongly advocate for limiting the size of our long articles (one seems to have OCD about it) do so with the flawed approach of "paper" mindset editors, rather than "digital" mindset editors who embrace the newer possibilities of documenting the "sum of all human knowledge". (That was previously an unimaginable thought, so thanks to Jimbo for opening our minds to the new possibilities of the digital age.) Their Wikipedia would miserably fail, according to Baseball Bugs: "If I go looking for info, and Wikipedia doesn't have it, then Wikipedia has failed."
The old-fashioned editorial approach toward creating paper encyclopedias must be abandoned, which is why we have NOTPAPER. In spite of that, we have editors who still treat our articles as if they are paper, with the constraints that are involved, and they cite this page. No, the English Wikipedia, without media, can fit on an Apple Watch, and regardless of size, we can instantly find specific words, phrases, and sections of interest. Our articles can be searched for keywords by researchers and scholars who have no intention of reading the whole article. We do not create articles only for those who sit down and read a whole article. I don't give a flying eff if they never finish the article. They will usually be satisfied with the lead anyway. They can also hop to the sections they want to read.
Our primary concern/goal is documenting the sum of all human knowledge about that subject without violating any of our PAG. "Size" limits should never get in the way of that goal. PRESERVE and NOTPAPER trump "size". In fact, if there is any guideline that should be the first to fall victim to IAR, it is this one. The size in words or bytes is rather irrelevant. We need to get away from the "paper" mindset. Wasted Time R said it well: "They are a new thing unto themselves, hence the difficulty of knowing how large they should be able to be." We now know that size constraints are a thing of the past in a digital age. -- Valjean (talk) (PING me) 21:23, 24 November 2023 (UTC)
- @Andrew Davidson: It would seem that the tags on those articles are working, as the sizes of those articles have been reducing. Typically, excessive length is more a symptom of other problems, than a problem itself, and the editors of the individual articles would generally be those who can best determine whether an article is too long or not, with some exceptions. Wikipedia has developed the solution of the summary style, to ensure that as much information can be kept in Wikipedia as possible, but that individual articles remain accessible, readable and manageable. It's generally good practice to ensure that content removed from one article can be found in another article, typically a sub-article of the main subject. Onetwothreeip (talk) 21:32, 24 November 2023 (UTC)
Notes and refs
- Notes
- ^ English: /nəˈpoʊliən ˈboʊnəpɑːrt/, French: Napoléon Bonaparte [napɔleɔ̃ bɔnapaʁt]; Corsican: Napulione Buonaparte.
- ^ English: /nəˈpoʊliən ˈboʊnəpɑːrt/, French: Napoléon Bonaparte [napɔleɔ̃ bɔnapaʁt]; Corsican: Napulione Buonaparte.
- ^ Born Napoleone Buonaparte;[2][b] 15 August 1769 – 5 May 1821), later known by his regnal name Napoleon I.
- References
- ^ "CPA: corse, AJACCIO, L'ACTE DE BAPTEME DE NAPOLEON Ier". www.antiqu-arts.com. Retrieved 2023-09-24.
- ^ "CPA: corse, AJACCIO, L'ACTE DE BAPTEME DE NAPOLEON Ier". www.antiqu-arts.com. Retrieved 2023-09-24.
MilHist oped
Wikipedia:WikiProject Military history/News/November 2023/Op-ed. SandyGeorgia (Talk) 21:11, 26 November 2023 (UTC)
"Ease of access limits" on section size
I support "ease of access limits" on section size. It should be possible to easily hop to a section and open it using a mobile device. AFAIK, unduly large sections might be problematic. (Maybe not, so let's discuss that.) Section size is more of a concern than total article size. Finding info by searching is not a problem, including trillions of bytes. OTOH, "opening" an unduly large section might be a problem for some users. -- Valjean (talk) (PING me) 17:27, 26 November 2023 (UTC)
- I agree that section lengths are perhaps more strongly related to many of the reader issues described above than overall article length. Scanning 1000 words for the pertinent information is quite difficult, and search functions on mobile phones are not super intuitive (assuming that the information has easy keywords to search for). I would be surprised if my parents could find them.
- I would support adding a (better-phrased) consideration like this:
- There are fewer reader issues with long articles if they are well-structured, and sections are not unduly long. Vice versa, articles with long sections may benefit more from trimming or splitting if restructuring is impossible. Of course, arguments around maintainability remain.
- —Femke 🐦 (talk) 17:39, 26 November 2023 (UTC)
- Now we're heading in the right direction. Thanks for the attempt to improve this situation. -- Valjean (talk) (PING me) 17:44, 26 November 2023 (UTC)
- 70% now are mobile users thus section size matters because most will only scroll a few times.[1] This also affects article size - as in how many sections - because most will not scroll 5 times to see a huge TOC to begin with. As we know many look at the TOC for navigation (should not be collapsed by default).[1] If the article is huge and full of sections it may appear overwhelming to find basic information. Moxy- 17:40, 26 November 2023 (UTC)
- I do take Moxy's point that there is a limit to how many sections we can have without impeding navigation. I think this is sort of the difference between a 6000 or 10000 max to allow for the other considerations. —Femke 🐦 (talk) 17:46, 26 November 2023 (UTC)
- Splitting does have its place, but can also create accessibility problems, and even function as improper forking. Let's look at an example. An extreme and OCD [struck] SPA type of behavior enforcement of this guideline (by one editor) using splitting has rendered it extremely!! difficult to find and access the information linked in this article Timelines related to Donald Trump and Russian interference in United States elections. (I created that article to somewhat ameliorate the situation, but it's still a problem for readers to find information because they are forced to search many lists.) A reader has to know where to find the relevant list. Before all this splitting, a reader could find it all in one list, but that was considered (by one editor) to be too long, citing this guideline. The result is effectively a way to hide the uncomfortable information that negatively impacts Trump's administration and its proven cooperation ("conspiracy" is unproven) with Russian election interference. Splitting with this effect violates the spirit of improper forking, even if that may not have been the motive. The result is the same. -- Valjean (talk) (PING me) 17:58, 26 November 2023 (UTC)
- While I've also had my issues with that main editor of the timeline article(s), and I have long found that group of articles to be a mess, it's very concerning to see an editor suggest they have a mental illness, obsessive-compulsive disorder. Can we not do that? Onetwothreeip (talk) 21:33, 26 November 2023 (UTC)
- Stricken. SPA is the better description, at least that used to be the case. I had never seen such an extremely limited focus before. -- Valjean (talk) (PING me) 02:14, 27 November 2023 (UTC)
- Sounds like a reasonable argument for the creation of a Wikipedia:Section size article (currently a redirect). This page would continue to be about article size, and editors can use whichever guideline they find more useful at the time. Onetwothreeip (talk) 21:41, 26 November 2023 (UTC)
- Well, if we are going to start thought-policing, be aware that a lot of people with OCD are quite offended by calling their condition a "mental illness" instead of a common neurodivergence. — SMcCandlish ☏ ¢ 😼 10:37, 27 November 2023 (UTC)
- Like who? The Wikipedia article on the subject refers to it as an illness, and not as a neurodivergence. The neurodiversity article doesn't mention obsessive-compulsive disorder either. Onetwothreeip (talk) 20:12, 27 November 2023 (UTC)
- When replying to editors about sections within large articles I point to WP:DETAIL saying MOS:LEADLENGTH is a good guide. ...BUT yes a new page or section here would be good. ...Lead info would be good here too. Moxy- 22:43, 26 November 2023 (UTC)
- WP:MILMOS#SECTLEN:
There remains some disagreement regarding the precise point at which a section becomes too long, so editors are encouraged to use their own judgment on the matter.
Hawkeye7 (discuss) 01:47, 27 November 2023 (UTC)
- This is a pointless guideline. I now have more questions after reading it. Moxy- 21:44, 27 November 2023 (UTC)
I've found references that give 1,000 words as a limit for sections of journal articles.[2] Would such references be useful for applying to sub-sections, i.e. level ===; is this what people are talking about? There are fewer references than for total article length. Would it be simpler to have only a total article length guideline, rather than getting into non-lead section sizes too?
References
- ^ a b "Research:Which parts of an article do readers read". Meta. April 22, 2015. Retrieved November 26, 2023.
- ^ [1]
WP:SPLIT
I've raised a concern at Wikipedia talk:Splitting#Numbers and changes about the guideline there having numbers badly out-of-step with actual practice (as does the present text at WP:SIZE despite all the arguing above). It will eventually need to be normalized to whatever more solidly emerges from discussion here. Just saying we need to be mindful not to create a WP:POLICYFORK.
At any rate, the idea that an article of 100K is "too long" is clearly not tenable. Most of our country and major city articles are much larger, and they are not broken (plus have almost always already been split many times, to numerous extant side articles). — SMcCandlish ☏ ¢ 😼 09:06, 27 November 2023 (UTC)
- India, Canada, Minneapolis, Cleveland -- all Featured articles, all about 11,000 words of readable prose -- all reasonably within guideline more or less, no problem, although it's likely one could find a way to split a piece from any of those if someone insisted. If "most of our country and major city articles are much larger", they're probably a mess, as "most of our country and major city articles" are. It's unfortunate to have this discussion split to the talk page of an informational essay. SandyGeorgia (Talk) 13:20, 27 November 2023 (UTC)
- It would be useful if we could inform editors on the statistics for featured articles by giving the average word count, and giving the upper and lower amounts for something like an 80% range, i.e. the word count which 10% of featured articles are under and which 10% of them are over. This should be counted at the time when the article becomes a featured article. Onetwothreeip (talk) 20:20, 27 November 2023 (UTC)
- Not really, because a huge number of FAs have grown beyond the size they were promoted at, and now need to go to WP:FAR. It will take a script writer to go back and dig up the promoted version and calculate its prose size. Dr pda used to do that work, but I'm unaware of anyone else doing it since he left. SandyGeorgia (Talk) 22:51, 27 November 2023 (UTC)
- Right, I'm explicitly saying that it should be counted at the time the article becomes a featured article. It may also be useful to only consider recently featured articles. Onetwothreeip (talk) 09:06, 29 November 2023 (UTC)
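The statistics proposed in this thread (an average word count plus the bounds of a central 80% range) could be computed roughly as sketched below. This is a minimal illustration, not an existing Wikipedia tool: the function name `summarise_word_counts` is hypothetical, the sample figures are invented placeholders rather than real featured-article data, and gathering the prose size of each article at promotion time (the hard part noted above) is assumed to have been done already.

```python
import statistics


def summarise_word_counts(word_counts):
    """Return the mean and the 10th/90th percentile bounds of a list of word counts."""
    # quantiles(n=10) returns the nine decile cut points; the first and last
    # bound the central 80% of the data. The "inclusive" method treats the
    # list as the whole population rather than a sample from a larger one.
    deciles = statistics.quantiles(word_counts, n=10, method="inclusive")
    return {
        "mean": statistics.mean(word_counts),
        "p10": deciles[0],   # 10% of articles are at or under this size
        "p90": deciles[-1],  # 10% of articles are at or over this size
    }


# Invented placeholder data: prose word counts at the time of promotion.
sample_counts = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000]
summary = summarise_word_counts(sample_counts)
```

Restricting the input list to recently promoted articles, as suggested above, would only change which counts are passed in, not the calculation.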
Summarising evidence, arguments on limits
Issue | Summary of arguments |
---|---|
Readability | Some believe tightening the guideline will increase readability. Attention span time. The average reading session is below the "don't bother to split"-limit.[18] It is not even 10% of the limits proposed in the earliest "readers may tire"-argument from 2004.[19] Content further down is less likely to be read, but readers can pick out sections they want to read.[20] |
Comprehensiveness | Some believe there is a trade-off between comprehensiveness and readability, others believe there is no trade-off. |
Accessibility | Concision is included in dyslexia friendly guidelines and fatiguing conditions. Accessible text should be structured well. This is more challenging with longer articles, especially on mobile, which only allows navigation on top-level headings. Search engines often direct the reader to the main article even when there is a subarticle on the exact topic. Some believe technical issues for readers with slower connections should mean limiting length. |
Quality | Some believe tightening the guideline will increase quality. |
Maintenance | Long articles have more content to maintain. On the other hand, when articles are split to resolve length issues, the maintenance load over multiple articles may become even larger. |
Explicit consensus | It is difficult to achieve explicit consensus on large bodies of text; there is a higher risk of single-authored text that may not reflect consensus. |
Guideline limits, existing and others (words) | Summary of evidence |
---|---|
8,000-10,000 | Length of journal articles.[1][2][3][4][5][6][7][8] For attention span, a 2005 study includes this session estimate: 40 minutes x 238 words ~ 10,000 words. [21]. |
15,000 | Current guideline. |
No limit | Some editors believe removing the guideline would increase comprehensiveness. Removing limits reduces rules, consistent with Avoid instruction creep, WP:IAR and MOS:BLOAT. |
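The attention-span estimate in the table is simple arithmetic and can be checked directly. The sketch below takes the 40-minute session and the 238 words-per-minute reading speed from the table row; the function names are made up for illustration, and the second calculation (how long the current 15,000-word guideline would take at the same speed) is an added worked example, not a figure from the discussion.

```python
def words_per_session(minutes: float, words_per_minute: float) -> float:
    """Words read in one sitting at a constant reading speed."""
    return minutes * words_per_minute


def minutes_to_read(word_count: float, words_per_minute: float) -> float:
    """Minutes needed to read a text of the given length at a constant speed."""
    return word_count / words_per_minute


# 40 minutes at 238 words per minute gives 9,520 words,
# which the table rounds to about 10,000.
session_words = words_per_session(40, 238)

# At the same speed, a 15,000-word article takes a little over an hour.
guideline_minutes = minutes_to_read(15_000, 238)
```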
- I have edited this with additional material and corrections. Substituted Elvis Presley as FA example and made clearer the distinction between bytes and words. Hawkeye7 (discuss) 18:30, 24 November 2023 (UTC)
- I see no mention of maintenance issues, and don't understand where the 15,000 comes from. SandyGeorgia (Talk) 18:12, 24 November 2023 (UTC)
- Probably because length cuts in two directions, maintenance-wise: A long article has more content to check, but updating multiple articles is harder than just one. Jo-Jo Eumerus (talk) 18:26, 24 November 2023 (UTC)
- Added to the table. Hawkeye7 (discuss) 18:38, 24 November 2023 (UTC)
- @Hawkeye7, thanks very much, I've done some editing. Would anyone be able to easily calculate the max promoted size each year from 2010-2022? This metric may circle back to the current 15,000 guideline, like your example suggests, Tom B (talk) 12:30, 25 November 2023 (UTC)
- That's why splitting is no longer touted as the only solution in the guideline. Sometimes, it's about selecting only the interesting and important, and not covering ever-changing details at all. For me maintenance, and the ability to achieve active consensus on a larger proportion of the text are the key arguments in favour of not creating too long an article. —Femke 🐦 (talk) 18:39, 24 November 2023 (UTC)
- I've discussed the three different methods of reducing page size in an essay. Hawkeye7 (discuss) 21:28, 24 November 2023 (UTC)
- In your essay, the sentence "Material must be preserved unless it is unsourced, libelous, patent nonsense, vandalism or violates copyright." is stronger than the policy being cited (which says should and caveats with ), and that policy as written contradicts other key policies and guidelines and daily practice. For instance, we delete information when it's outdated (WP:MEDDATE, but also common in other science articles), overly detailed (WP:summary style), information discouraged by Wikipedia:What Wikipedia is not, and I'm probably overlooking others. Starting from a clean slate is common when wanting to meet FA criteria with their focus on HQRS. —Femke 🐦 (talk) 12:37, 25 November 2023 (UTC)
- Your "caveats with" appears to be missing a quote. :-) — SMcCandlish ☏ ¢ 😼 13:08, 25 November 2023 (UTC)
- Apologies. The quote: "If you think an article needs to be rewritten or changed substantially, go ahead and do so, but it is best to leave a comment about why you made the changes on the article's talk page." —Femke 🐦 (talk) 13:13, 25 November 2023 (UTC)
- WP:RETAIN is a policy, so it overrides WP:MEDDATE. I tried to get WP:FALSE upgraded to a guideline without success. So unfortunately, correcting outdated information relies on WP:IAR. There is no contradiction between WP:RETAIN and WP:Summary style; the latter can never be used to justify deletion of sourced material on the basis that it is "overly detailed"; rather, it recommends the creation of or movement to a subarticle. Hawkeye7 (discuss) 19:22, 25 November 2023 (UTC)
- Although WP:Summary can justify deleting sourced material if such content is already contained in another article, usually a sub-article. Onetwothreeip (talk) 02:57, 26 November 2023 (UTC)
- These tables may be a good faith effort to determine the common views regarding article size, but seem quite biased. Who are these "some editors"? I haven't seen anybody ever advocate that there should be no guidelines at all regarding size, or any editors advocating for Wikipedia to be comprised of very short articles. Editors generally agree that there is compromise between articles being comprehensive and being accessible (as in readable), it's not as though some editors want articles to be readable and others want them to be comprehensive. As for the technical issues, this comprises much more than the page size limit of 2 million bytes, such as downloading and loading speeds, displaying particularly on mobile devices, and editing particularly with visual editor. Onetwothreeip (talk) 02:55, 26 November 2023 (UTC)
- There are editors advocating the abolition of the guidelines regarding size; it is not generally agreed that there is compromise between articles being comprehensive and being accessible (which is not supported by the studies); and it has been repeatedly pointed out that downloading and loading speeds have nothing to do with readable prose size. Hawkeye7 (discuss) 04:03, 26 November 2023 (UTC)
- I'm sure you would agree that all of Wikipedia should not exist in one article. While such an article would certainly be comprehensive, it would not be accessible. Likewise, each article should not be as small as one sentence, despite being very easy to read. Regarding the abolition of guidelines, I'll address that in my response to Tpbradbury. Onetwothreeip (talk) 06:30, 26 November 2023 (UTC)
- @Onetwothreeip, editors have said above: "I'd vote to delete this guideline", "if there is any guideline that should be the first to fall victim to IAR, it is this one". "The first step should be to remove any word-count/prosesize-based limit from this guideline." You're right that no one is advocating WP be comprised of very short articles, Tom B (talk) 04:05, 26 November 2023 (UTC)
- Tom B, that was me, and it was a bit of an extreme comment taken alone, so let's look at the rest of that sentence: "I'd vote to delete this guideline or severely redefine and restrict its use." I, of course, favor redefinition and restriction, rather than total deletion. I have written my thoughts in a section below about "accessibility". -- Valjean (talk) (PING me) 17:40, 26 November 2023 (UTC)
- They would still support some guideline, whether written or not, which would relate to the size of the article. It may not be explicit, and they may not like the current written guidelines, but obviously there are no editors who would realistically say that it would be fine for a single article to be millions of words long. Onetwothreeip (talk) 06:33, 26 November 2023 (UTC)
- I think one caveat is that "the article combines too many unrelated topics" is an objection that you could expect being applied to many lengthy articles, but it's not (necessarily) about the size, nor can it be defined as a size issue. The caveat with technical issues is that a lot of people confuse the size of the page (which is often dominated by images) with the size of the prose (which isn't). Jo-Jo Eumerus (talk) 08:14, 26 November 2023 (UTC)
- @Onetwothreeip, if editors say I want to delete this guideline, it is not obvious they would support even an unwritten guideline. It would make sense for guidelines to be written? We have different estimates of good comprehensiveness, it will be different for each article, reader, editor. Many readers just need the lead, where 300 words is the suggested total. For the full Churchill article, 1k words isn't comprehensive, what's too comprehensive? Many think readability or quality starts deteriorating at 10-15k given the evidence, others aren't being explicit where they think it starts deteriorating, it seems they think tens of thousands of words are ok, which is difficult to navigate. Most agree 100,000, 90,000, 80,000 is too comprehensive. Down at 20,000 some will start disagreeing and saying we need to add more words, to be comprehensive. There is a trade-off between comprehensiveness and readability? Tom B (talk) 12:56, 26 November 2023 (UTC)
- There is no trade-off between comprehensiveness and readability. Articles are simply as large as they need to be. Summary style comes into play when a section becomes undue or is a subject that the readers would search for in its own right. Hawkeye7 (discuss) 19:22, 26 November 2023 (UTC)
- Of course there is a trade-off between readability and article length. If you need to spend longer to find the main points of a text because there is more text, you're more likely to abort or drift off halfway, and have a lower understanding of the material at hand. Readability and accessibility are intricately intertwined and there is a reason why it's included in various accessibility guidelines. —Femke 🐦 (talk) 19:30, 26 November 2023 (UTC)
- Even though usability is maybe a more correct term for this [22]: the ability of a reader to locate information. Too little information, and the reader has to go to another page. Too much information, and the reader cannot find the information among intricate details. —Femke 🐦 (talk) 19:39, 26 November 2023 (UTC)
- Have added the bit I think we all agree on from your essay Hawkeye: there is a trade-off between readability and unnecessary wordiness. I think the 5% in that essay may be a testament of good quality writing from MILHIST, I think I often achieve 10% in climate change related articles. —Femke 🐦 (talk) 20:29, 26 November 2023 (UTC)
- The summary style guideline is one about comprehensiveness and readability. Onetwothreeip (talk) 20:57, 26 November 2023 (UTC)
- Hawkeye is right. Readability is about structure and writing style, and is unrelated to comprehensiveness. Wikipedia has a readability problem but that's despite length guidelines; we're misstating the problem and failing to fix it.
- We need to make sure we understand readers and don't project our preferences onto them. I found two studies: 2017 and 2019. Roughly a third of visits involve looking for specific facts, another third are for in-depth information, and another third are for an overview. That's visits, not visitors, so it can't be said that most users don't care about in-depth information. And the assumption that looking for specific facts involves digging into the body, or is impacted by length, is pretty tenuous. DFlhb (talk) 20:36, 29 November 2023 (UTC)
- There are editors advocating the abolition of the guidelines regarding size; it is not generally agreed that there is compromise between articles being comprehensive and being accessible (which is not supported by the studies); and it has been repeatedly pointed out that downloading and loading speeds have nothing to do with readable prose size. Hawkeye7 (discuss) 04:03, 26 November 2023 (UTC)
References
- ^ "European Journal of Futures Research". SpringerOpen. May 20, 2013. Retrieved November 26, 2023.
- ^ "Information for Authors". academic.oup.com. Oxford University Press. Retrieved November 26, 2023.
- ^ "Manuscript Submission Guidelines: AERA Open: Sage Journals". Sage Journals. January 1, 2023. Retrieved November 26, 2023.
- ^ "Early Modern Women: An Interdisciplinary Journal: Instructions for authors". Early Modern Women: An Interdisciplinary Journal. November 17, 2019. Retrieved November 26, 2023.
- ^ "Development and Change". OnlineLibrary.Wiley.com. Wiley. doi:10.1111/(issn)1467-7660. ISSN 0012-155X.
- ^ "Submissions". Global Labour Journal. February 3, 2022. Retrieved November 26, 2023.
- ^ "BGSU SSCI Journal Publishing Guide" (PDF). Retrieved November 26, 2023.
- ^ "Guide for authors". ScienceDirect.com by Elsevier. January 6, 2016. Retrieved November 26, 2023.
Lead size
Could someone remind me where the 300-word lead size on Featured articles came from? [23] I've looked at "my own" (and I've seen much longer, eg climate change), and they're all around 400 (Tourette syndrome, Dementia with Lewy bodies, J. K. Rowling, Samuel Johnson). I suspect the 300 is heavily influenced by short articles like hurricanes and pop culture, and am concerned that, by stating the 300 as fact without a range or more qualifiers, we'll see it misused to imply adequate leads are too long. I suspect the range on lead size is dependent on topic: more technical articles have longer leads in order to adequately summarize the content. SandyGeorgia (Talk) 12:52, 27 November 2023 (UTC)
- As an example of how the average is skewed by highly represented FAs, look at the number of Featured articles in Wikipedia:Featured articles#Meteorology and climate, and glance at their lead sizes, and then contrast that to the number of articles in Wikipedia:Featured articles#Health and medicine, and look at their leads. The size of the lead is more appropriately governed at WP:LEAD, and more usefully measured as a percentage of the article size. SandyGeorgia (Talk) 13:11, 27 November 2023 (UTC)
- @SandyGeorgia, 300 was added in Jan 2023 [24] "Calculated from last month's TFAs in [25]". 300 was actually the average, the range appears to be usually 200-400 Tom B (talk) 14:01, 27 November 2023 (UTC)
- Ah, I see ... we should be very cautious about using one-month's TFA data or an average without better qualifiers. What's left to run TFA isn't highly representative, either, since we're running out of TFA material. I'm unsure how we can better word this, but I'm also not sure why we need to get into lead size on this page, when WP:LEAD is the page governing leads. If we do mention it here, it needs much better qualifiers than just a one-month TFA average; for example, there are no medical FAs left to run TFA, but plenty of hurricanes. SandyGeorgia (Talk) 14:24, 27 November 2023 (UTC)
- WP:LEAD had the 300 on it. i've amended it there to: "Most Featured articles have a lead length of about three paragraphs, containing 12 to 15 sentences, or 200–400 words". yes that page governs, but useful to pull across all the length guidelines to here? i.e. total, lead, non-lead. on the one-month average point, i've found very limited, simple evidence can often be enough, particularly when compared with no evidence! Tom B (talk) 17:01, 27 November 2023 (UTC)
- As MOS:Lead is the main location, let's discuss further at Wikipedia_talk:Manual_of_Style/Lead_section#FA_numbers —Femke 🐦 (talk) 17:37, 27 November 2023 (UTC)
- See also User:WhatamIdoing/Sandbox#WPMED FAs. Nobody who knows my editing will be surprised to see that my numbers aren't based on pop culture articles. ;-) I do think that a range of 250–400 would be fine. 200 is not unheard of, but it is lower than normal. WhatamIdoing (talk) 18:49, 27 November 2023 (UTC)
- Is there a way to get more content editors involved here, as in prolific FA article writers? Do we have stats to find people? Moxy- 21:40, 27 November 2023 (UTC)
- I doubt that any good FA writer spends a lot of time counting words (or sentences). If we want these kinds of numbers, I think it would be more effective to see what the end results are. WhatamIdoing (talk) 22:30, 27 November 2023 (UTC)
- Sure don't ... this fixation on word/sentence counts in FA leads is a bit troubling, as it overlooks the overarching points of WP:LEAD, which covers the territory quite well. But the way to hear from more FA writers is to post to WT:FAC. SandyGeorgia (Talk) 22:50, 27 November 2023 (UTC)
- I have six pieces of featured content, with an average lead of 345 words. These range from Ai-Khanoum with 484 to Boukephala and Nikaia with 199. I have just finished rewriting Genghis Khan, which I hope to take to FA in the near future; this has a lead of 587 words, as you would expect for a pivotal figure in world history. As a rule, I think more about how long a lead should be to properly summarize an article, rather than word counts. ~~ AirshipJungleman29 (talk) 23:06, 27 November 2023 (UTC)
- I have successfully nominated 63 articles at FAC in the past five years. I have no idea what my shortest or longest leads are, nor their mean, mode or median. And I have no interest in finding out. I can only quote Sandy "this fixation on word/sentence counts in FA leads is a bit troubling, as it overlooks the overarching points of WP:LEAD, which covers the territory quite well." I am also an FAC coordinator, and wearing that hat I care - if it is possible - even less about word counts: if a lead fits WP:LEAD, fine; if it doesn't, not fine.
- There seems to be an element of a solution in search of a problem about this discussion. Just what is it that is considered to be "broke"? Gog the Mild (talk) 23:26, 27 November 2023 (UTC)
- Bingo! -- Valjean (talk) (PING me) 23:50, 27 November 2023 (UTC)
- Gog, we've got two problems:
- The first is that we previously recommended a paragraphs:article length ratio, which is pretty silly. You could make an article "comply" or "violate" the advice by just adding or removing a line break. That's not really a way to improve a lead.
- The second is that not everyone is a good writer, and we want to give folks a basic handle on what a typical result is. The statement is not "Your lead should be n words long"; it reports only the fact that a lot of well-written leads end up being approximately this size. The idea is that if you don't really know what you're doing, you'll be able to figure out if yours is significantly different from typical. This isn't really aimed at the FAC process, but at articles like Donald Trump on the long side, whose lead is presently 7 paragraphs, 700 words, and about twice as long as the leads for all the other modern US presidents, and at the many articles with very brief, even single-sentence, leads.
- Our best writers don't need this. It's a crutch to help those who are just learning how to write. WhatamIdoing (talk) 00:45, 28 November 2023 (UTC)
- I'm still trying to figure out what Moxy's question was about, in a section started by a person who has read several thousand FACs and promoted a thousand or so FAs :) Maybe they were looking for WP:WBFAN, so they wouldn't have to take my word for it :) A "crutch" may be a start, but for those who may or may not understand the ranges and complexity and how to interpret a guideline, when guideline pages get too WP:CREEPy, they tend to be misused by those who take them literally, and we still can't base statements about what most FA leads look like based on a one-month sample of TFA. Some types of articles have longer leads than others. SandyGeorgia (Talk) 07:59, 28 November 2023 (UTC)
- My intuition is that a relative word count is likely a more relevant metric than an absolute word count (see e.g. my previous comments at Talk:Édith Piaf/GA1 and Talk:Bellona's Husband: A Romance/GA2). Because of this, as well as out of curiosity, I took a look at my three WP:Featured articles (specifically, I copied the leads to https://wordcounter.net/ and got the full word counts from https://xtools.wmcloud.org/articleinfo – I'm not sure, but I think that the lead is included in the WP:XTools word count but e.g. image captions and headings are not). Mars in fiction has a 414-word lead, and the entire article is 8,336 words (5.0%). Venus in fiction has a 271-word lead, and the entire article is 4,006 words (6.8%). Sun in fiction has a 448-word lead, and the entire article is 3,304 words (13.6%). I also took a look at my current WP:Featured article candidate: George Griffith, where the lead is 528 words and the entire article is 5,704 words (9.3%). So these articles seem to have leads with roughly 5–15% of the entire article's word count, which is a fairly large span. To me, this indicates that the figures themselves aren't really all that important (seeing as I also don't think that my relatively lengthier leads need to be shorter or the relatively shorter ones need to be longer). The qualitative aspects are more important than the quantitative ones here, as noted above. Word counts are occasionally useful to illustrate that a lead is way too long or way too short, but shouldn't be viewed as targets in themselves lest we fall victim to Goodhart's law. TompaDompa (talk) 00:13, 28 November 2023 (UTC)
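For anyone who wants to repeat the lead-to-article comparison on other articles, here is a minimal Python sketch of the arithmetic in the comment above. The word counts are the ones quoted there; the names `ARTICLES` and `lead_share` are made up for illustration, and the sketch assumes (as the comment itself only tentatively does) that the whole-article count already includes the lead.

```python
# Word counts as quoted in the comment above: (lead words, whole-article words).
# Assumption: the whole-article count includes the lead.
ARTICLES = {
    "Mars in fiction": (414, 8336),
    "Venus in fiction": (271, 4006),
    "Sun in fiction": (448, 3304),
    "George Griffith": (528, 5704),
}


def lead_share(lead_words: int, total_words: int) -> float:
    """Return the lead's share of the article's word count, as a percentage."""
    return 100 * lead_words / total_words


# Round to one decimal place, matching the precision used in the comment.
shares = {title: round(lead_share(*counts), 1) for title, counts in ARTICLES.items()}

for title, share in shares.items():
    print(f"{title}: {share}%")
```

The spread of the results is the point being made: the same writer's leads occupy quite different shares of their articles, so a single target percentage would not describe them well.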
I agree with TompaDompa: "The qualitative aspects are more important than the quantitative ones here, as noted above. Word counts are occasionally useful to illustrate that a lead is way too long or way too short, but shouldn't be viewed as targets in themselves lest we fall victim to Goodhart's law":
Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.
Applied to the current topic, it means that even the best leads will be damaged and not serve their purpose as well when revised after focusing on word count. No, we need to think differently about this and use other metrics, and they are summarized in the nutshell at WP:LEAD: "The lead should identify the topic and summarize the body of the article with appropriate weight."
Here is a section from my essay How to create and manage a good lead section: (not for list articles)
This rule of thumb will ensure the lead covers all significant subject matter in the article:
If a subject is worth a whole section, then it deserves short mention in the lead according to its real due weight.
That due weight should also include careful consideration of the real weight of sections that summarize child articles. Those sections have much more weight than their visible size. Their weight is equal to the weight of the child article(s).
If we do not follow that equation, then POV warriors can successfully hide negative material away from many readers' notice by spinning it off and leaving a small section which is then viewed as not worthy of mention in the lead. That must not happen. It should still be mentioned in the lead according to its real due weight.
There should not be anything in the lead that does not refer to specific content in the article and is not backed up by specific references found in the article. There should not be any unnecessary elaboration or detail in the lead. Elaboration should be reserved for the body of the article. Remember to awaken the reader's interest without satisfying their hunger.
A lead written this way will stand on its own, and someone who reads it will not later be surprised by anything they find in the article or what someone else tells them about the topic. They should know because they read our article. -- Valjean (talk) (PING me) 03:36, 28 November 2023 (UTC)
- Even "See also"? "In popular culture"? "List of publications"? Hawkeye7 (discuss) 04:03, 28 November 2023 (UTC)
- No. That should be clarified better than the "(not for list articles)" above. I'm speaking of actual thematic content. Do you have a suggestion? -- Valjean (talk) (PING me) 04:20, 28 November 2023 (UTC)
- If you want to test your advice against an edge case, consider Talk:Hallowe'en Party#RfC on mention of film adaptation in the lead. WhatamIdoing (talk) 05:03, 28 November 2023 (UTC)
- I left my opinion in that RfC. -- Valjean (talk) (PING me) 07:04, 28 November 2023 (UTC)
- Valjean, most of that is very sensible in my view (modulo Hawkeye7's point about sections that aren't really what you call thematic content). — SMcCandlish ☏ ¢ 😼 23:45, 29 November 2023 (UTC)
- The section->lead due weight assessment is a good one. I was advised that when I started editing and it has served well since then. That also provides a rough feel between article content and lead length, which is better than a 300 word limit. CMD (talk) 04:24, 28 November 2023 (UTC)
- Yes, a content-rich and complex section might deserve 1-3 whole sentences in the lead. An insignificant section might deserve two words. Each section deserves some form of mention in the lead.
- The general guide of 3-4 paragraphs is a rough minimum for normal-length, relatively uncomplicated and uncontroversial articles. If an article is very long because it is extremely notable, widely covered, very controversial, and is very significant using many different parameters, the body will have many sections and the lead will reflect that with a length of 5-7 paragraphs, and that would be the proper functional lead length the topic deserves. A short lead would not be able to serve the function properly. -- Valjean (talk) (PING me) 04:51, 28 November 2023 (UTC)
- I don't recall ever coming across a well-written lead with seven paragraphs. SandyGeorgia (Talk) 04:57, 28 November 2023 (UTC)
- I think I encountered one once in an article that might justify a long lead, but I'm not sure it was optimally written. -- Valjean (talk) (PING me) 06:54, 28 November 2023 (UTC)
- Four paragraphs has traditionally been considered the maximum. I don't ever remember seeing an FA or GA with more than four paragraphs. Most of them don't even have four. WhatamIdoing (talk) 06:59, 28 November 2023 (UTC)
- There are more than a few FAs with five-paragraph leads; I've already given one sample in these discussions (climate change). And Introduction to viruses and Chagas disease and Subarachnoid hemorrhage in the med dep't. As a contrast, view India's four-paragraph lead which would be better written as five, but has been artificially constrained to meet this imaginary four limit. (India was noticed as needing a Featured article review three years ago.) Are leads getting too long? I think so, but then I think articles are too long, so of course leads are growing as well. SandyGeorgia (Talk) 07:05, 28 November 2023 (UTC)
- There might be situations where I'd choose a properly written lead over FA or GA status any day. I don't get this obsession with FA and GA. That should never get in the way of other legitimate objectives. Ideally, we should be able to do both, but if achieving FA or GA means dumbing down an article or failing to document the sum of all human knowledge on the subject, then forget about FA and GA and do what we are supposed to do. FA and GA are not the ultimate purpose of Wikipedia, and they should not be a stumbling block. The ones evaluating for FA or GA status should loosen up their lead length criteria and allow longer leads when justified. -- Valjean (talk) (PING me) 07:11, 28 November 2023 (UTC)
- FA and GA reinforce the wider guidelines and MOS, rather than do anything specific by themselves. I don't think I've seen dumbing down as a consistent issue there, although there are some rewrites to deal with WP:TECHNICAL. Documenting the sum of human knowledge is something FAC often assists with, finding holes in coverage for nominated articles. CMD (talk) 07:18, 28 November 2023 (UTC)
- Well, that's good. Maybe it's just the "lead" issue that needs fixing. -- Valjean (talk) (PING me) 07:22, 28 November 2023 (UTC)
- What needs fixing? Guidelines are guidelines, they are interpreted as such at FAC, and there's not necessarily anything wrong with a five-paragraph lead. India has needed a WP:FAR for three years, so it's not representative. SandyGeorgia (Talk) 07:31, 28 November 2023 (UTC)
- Exactly. Guidelines are rubbery, and that's on purpose. We should never force articles to stay at the middle of the bell curve. Some articles should be outliers without being penalized for it. -- Valjean (talk) (PING me) 07:35, 28 November 2023 (UTC)
- I had thought that generally understood and agreed. Gog the Mild (talk) 08:45, 28 November 2023 (UTC)
- I think the problem (some of us/me) are having with the text being introduced here is that, while FAC/FAR Coords are experienced writers who know how to interpret a guideline (that is, what's on this page isn't going to change anything at FAC/FAR), we know hard data in guideline pages can be misinterpreted by editors who aren't exposed to work at the FA level, and those are the very editors who frequently misinterpret pages like these ... so while we're basing numbers on FAs, we should be explaining in better detail to those not accustomed to working at that level either how to use these numbers, or we should avoid the creep entirely. We will see editors say, "lead exceeds 400 words, too long", so we should anticipate that our shorthand might not serve those we intend to reach with this page. SandyGeorgia (Talk) 16:10, 28 November 2023 (UTC)
- So "most are 250–400 words long, but some are longer or shorter"? It irritates me to be reminded that there are people who don't understand that "most" means "not all", but it's a fact that some people are completely innumerate. WhatamIdoing (talk) 17:06, 28 November 2023 (UTC)
- No ... a better qualifier ... something like "varies by content area" or some such ... anything else that can be added to remind not to apply the numbers as absolutes. SandyGeorgia (Talk) 17:08, 28 November 2023 (UTC)
- We have no actual evidence that it varies by content area, aside from the fact that clicking semi-randomly on FAs about hurricanes this morning produced a sample set of five articles, 100% of which had two paragraphs in the lead. WhatamIdoing (talk) 17:11, 28 November 2023 (UTC)
- Well, then, we have no actual evidence of anything but random samples which could be outliers, so we should be REALLY careful about adding this content at all. SandyGeorgia (Talk) 17:24, 28 November 2023 (UTC)
- If you run some descriptive statistics on a sample, and it has a 5% chance of being an outlier, and you repeat that with a second sample, also with a 5% chance of being an outlier, and they get the same results, then the odds of both of them being outliers is just 0.25% (a one in 400 chance). If you run a third 5% sample and get the same results, the odds decrease to 0.0125% (a one in 8,000 chance).
- I haven't calculated the p-value for the samples we've run, but all of them are getting the same results. You probably don't need to worry about them being outliers. WhatamIdoing (talk) 22:12, 28 November 2023 (UTC)
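The arithmetic in the comment above can be checked with a short Python sketch. It rests on two assumptions that the comment does not establish: that the samples are independent of each other, and that the 5% per-sample figure can be taken at face value. The function name `chance_all_outliers` is made up for illustration.

```python
from fractions import Fraction


def chance_all_outliers(p_outlier: Fraction, n_samples: int) -> Fraction:
    """Chance that every one of n independent samples is an outlier.

    Assumption: the samples are independent, so the per-sample
    probabilities simply multiply.
    """
    return p_outlier ** n_samples


p = Fraction(5, 100)  # the 5% per-sample chance used in the comment
for n in (2, 3):
    chance = chance_all_outliers(p, n)
    print(f"{n} samples: {float(chance) * 100:.4f}% (1 in {chance.denominator})")
```

This reproduces the one-in-400 and one-in-8,000 figures. It says nothing about whether lead length varies by content area, which is a separate question from whether the samples are outliers.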
- Re: "I don't recall ever coming across a well-written lead with seven paragraphs." – But "paragraph" isn't a meaningful measurement, since it has no fixed or even approximate number of sentences or words or characters or bytes. Any four-paragraph lead can be turned into a seven-paragraph one (or vice versa) simply by fiddling with line breaks, without changing a single letter of the content or any of its order. Given that there's mounting Web-usability material that recommends shorter paragraphs in online reading material as a means of better reader attention and retention, it's likely that organization of leads will trend toward more paragraphs of fewer sentences each. I've actually noticed this over time as a reader (and not just in leads), and noticed it in my own lead-writing. Early on, I avoided introducing paragraph changes unless the focus of the material had palpably shifted, which produced what today would be considered over-long paragraphs (more like late Victorian to 1920s writing, honestly; I've been reading a lot of material of that sort lately, and keep being struck by how long the paragraphs are and how similar they are to how I was writing in the 2000s). These days, I introduce para breaks much more frequently, and try to lead each with a catchy sentence, with following sentences in the same para. relating at least indirectly to material in the first, and sometimes rewritten to more clearly do so. The opposite extreme, of making individual sentences into "micro-paragraphs", should also be avoided, except for a long and complex sentence, which is usually better split up into multiple sentences in the first place. — SMcCandlish ☏ ¢ 😼 23:45, 29 November 2023 (UTC)
Further, I'm not sure what the investment in getting someone to produce the longest article of each year is going to result in, other than a) articles which have since been defeatured, and b) outliers (all the Dynasty articles, and some others since defeatured). We're asking someone at WP:VPT to take a lot of time to produce something I can already tell you, and which will be somewhat meaningless because it by definition looks at outliers. And it appears that we now have very short articles coming through FAC, which is also perhaps an oddity, perhaps temporary. SandyGeorgia (Talk) 17:28, 28 November 2023 (UTC)
- I believe that his goal is to get numbers from this decade. WhatamIdoing (talk) 22:14, 28 November 2023 (UTC)
- Hi @SandyGeorgia My hypothesis was the longest promoted article had gradually been declining from 15,000 due to quality improvement. If it has been then this might be an argument to reduce the size guideline to this metric. In fact data suggests it was stable at 15,000, due to the guideline and may still be stable? Please add your stats to this time series! The data you linked does not appear to have this particular time series in it. There may be outliers or not. Your definition is average size, but the longest size time series may still be meaningful. 15,000 would imply the guideline is limiting promotion size, and therefore would not be meaningless, Tom B (talk) 13:06, 29 November 2023 (UTC)
- Tpbradbury, I questioned what useful information we would get from having someone do the work to discover the largest for each of year, a) because we shouldn't be making decisions based on outliers anyway, b) the longest by year is not meaningful data (it's editor driven outliers, not guideline driven), and c) because I doubt that your hypothesis will be proven true, simply because of what is already known about our longest FAs. The very long articles getting through FAC-- historically and today still-- are outliers that are generally driven by individual editors writing on specific topics, and as outliers, have little relation to overall trends. The historic data linked on the talk page of WP:FAS is unlikely to be any different from what you will find today, so it seemed a lot of work to go through to find something we probably already know. On the talk page of WP:FAS, you'll see:
- the author of Ketuanan Melayu wrote very long articles; they are almost all defeatured today
- the early Dynasty articles were all written by one editor, and no one has (yet) challenged that length at FAR; they continued writing very long articles, and because their writing is competent, reviewers appear reluctant to challenge the length.
- Similar to Cleopatra, there are some very long bios getting through without length complaints (Elvis Presley, Bob Dylan, and lately a lot of presidents, often from the same main author, which are being challenged at FAR), and that trend hasn't changed.
- And the other group of very long articles are those written by Hawkeye7 on Military history topics (several mentioned in this discussion, eg the logistics article from this year, and Hawkeye7 is in favor of very long articles).
- It is specific editors who historically write and still write long articles; no one wants to challenge their work, as they are generally competent writers, but these individual authors' works aren't representative of trends. I don't see what we will gain in terms of what to write on this page by talking about these outliers. In fact, if there is a trend, it's probably towards shorter FAs, as FAC increasingly becomes more like GAN, and articles that would not have come to FAC in the past now do, and are passed. SandyGeorgia (Talk) 22:50, 29 November 2023 (UTC)
- That might be a statistical factor, but I have a history of writing short articles (just long enough to get the job done and before my sourcing patience wears out), but for one topic close to my heart I've produced a very long article (briefly the longest non-list article, and still in the top 20), and am in slow and painstaking process of splitting it up. Meanwhile, I would bet that various of the "long article writers" you mention by name have also produced some short ones. I'm skeptical that "individual editor blame" is going to be helpful here. — SMcCandlish ☏ ¢ 😼 23:45, 29 November 2023 (UTC)
- I agree the trend is for shorter FAs. If the largest articles now being promoted at FA are 12k, this might be an argument to drop the 15k evidence-free guideline to an evidence-based 10k. The largest articles are 2-4K above the guideline? I.e. articles would end up being 12-14k, like there are currently some at 19k? 10k is very simple, understandable and the same as journal articles,
- My draft on lead length is still going on strong (even though it is stalled). You are welcome to check my detailed numbers. Regards, Thinker78 (talk) 00:11, 29 November 2023 (UTC)
New essay about "Factors that influence article size"
[[Wikipedia:Article size factors]]. Factors that influence article size. Feel free to improve. -- Valjean (talk) (PING me) 23:56, 28 November 2023 (UTC)
- You may want to consider a different name for that essay, as it seems to contradict the current guidelines on article size. Onetwothreeip (talk) 09:12, 29 November 2023 (UTC)
- Essays do not have to be fully aligned with PAG. They often explore new ways of thinking and challenge our current PAG. Sometimes they end up becoming part of them and lead to their alteration. -- Valjean (talk) (PING me) 17:52, 29 November 2023 (UTC)
- That's fine, but we may want to change the name of it then. I don't have a particular alternative to propose though. Onetwothreeip (talk) 19:45, 29 November 2023 (UTC)
- What's wrong with the title? -- Valjean (talk) (PING me) 20:24, 29 November 2023 (UTC)
- I interpret the concern to be that it could be confused with the guideline WP:Article size. VQuakr (talk) 20:57, 29 November 2023 (UTC)
- That wouldn't be good. How about "Factors that influence article size"? -- Valjean (talk) (PING me) 16:19, 30 November 2023 (UTC)
- Personally, I think that's better. VQuakr (talk) 17:45, 30 November 2023 (UTC)
- Still very similar. We would have to alter the essay to be more inclusive of mainstream views. Onetwothreeip (talk) 19:22, 30 November 2023 (UTC)
- No, those who wish to push a different view are encouraged to create their own essays. That's why we have essays that take different and opposing positions. We do not change the initial position of an essay. We seek to strengthen its position through better wording and argumentation. If you wish to push a different view, then create your own essay. -- Valjean (talk) (PING me) 21:13, 30 November 2023 (UTC)
- Now updated to better title. -- Valjean (talk) (PING me) 21:17, 30 November 2023 (UTC)
- As long as you are fine with a "factors that influence article size" essay broadly comprising factors that influence article size, and not only the ones you wish to include. Onetwothreeip (talk) 09:36, 1 December 2023 (UTC)
- I welcome the addition of other factors. -- Valjean (talk) (PING me) 17:00, 1 December 2023 (UTC)
Avoiding an Rfc on kb limits
The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.
"RfCs are time consuming, and editor time is valuable. Editors should try to resolve their issues before starting an RfC....If you can reach a consensus, then there is no need to start an RfC." User VQuakr wants an RFC on kb limits, and has said "We've got a guideline that has referenced kB of readable prose for well over a decade; more than a half-dozen editors' involvement is warranted before changing." My understanding is there have been more than 6 editors involved and many have spent time explaining why the kb limits should be removed. I start this new section to get as many people involved as possible, in the hope we can avoid an RfC, but if there are good arguments to open one, then fair enough. How do we resolve this as efficiently as possible? Tom B (talk) 08:28, 1 December 2023 (UTC)
- As long as the imperfect KB approximations are in the guideline, they will cause confusion, and should be removed. Entirely. We don't need an RFC; it's been discussed forever. SandyGeorgia (Talk) 16:45, 1 December 2023 (UTC)
- Indeed. I support removal. -- Valjean (talk) (PING me) 17:02, 1 December 2023 (UTC)
- +1. Removal of the kb values has been discussed for a long time. They are misleading. Jo-Jo Eumerus (talk) 17:07, 1 December 2023 (UTC)
- The kilobyte guideline should remain, and the easiest way to refactor the table is to present the kilobytes measurements in terms of word count, as an estimated equivalent. Onetwothreeip (talk) 20:04, 1 December 2023 (UTC)
- Agree with Sandy. I would have supported a hard limit on the size of leads and yet despite the absence of one, there have been no problems. The kb limits only cause confusion and serve no real purpose. Hawkeye7 (discuss) 20:25, 1 December 2023 (UTC)
- Word count is a less confusing metric than kb because it's more clear that it refers to prose size. I don't agree with throwing out any guidance to keep the article size to a fixed length. The guidance is a necessary bulwark against the product of editors who write too much and don't use summary style. (t · c) buidhe 23:24, 1 December 2023 (UTC)
- Also agree, kB of readable prose is a confusing measure. The publishing industry uses word count, we have tools that calculate word count, that's what any guidelines should be written in terms of. Wasted Time R (talk) 00:56, 2 December 2023 (UTC)
- Also support. The kB values have always been confusing with the revision size also being in kB. Galobtter (talk) 14:22, 2 December 2023 (UTC)
Alternative proposal: include mark-up size
- Here is a proposal which retains the kilobyte size guideline, but frames it in equivalence to the prose guideline. The longstanding kilobyte guideline is useful for those who cannot use the various software tools, and articles for which these tools do not easily apply.
Readable prose size[a] | Estimated average markup size | What to do |
---|---|---|
> 15,000 words | > 100,000 bytes | Almost certainly should be divided or trimmed. |
> 9,000 words | > 60,000 bytes | Probably should be divided or trimmed, although the scope of a topic can sometimes justify the added reading material. |
> 8,000 words | > 50,000 bytes | May need to be divided or trimmed; likelihood goes up with size. |
< 6,000 words | < 40,000 bytes | Length alone does not justify division or trimming. |
< 150 words | < 1,000 bytes | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub. |
Onetwothreeip (talk) 05:15, 2 December 2023 (UTC)
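As a rough check on the table above, the bytes-per-word ratio implied by each row can be computed directly from the proposed figures. This is only a sketch: it uses nothing but the numbers in the table, the names `rows` and `implied_bytes_per_word` are made up for illustration, and it says nothing about the markup size of any real article.

```python
# (words, bytes) pairs copied from the rows of the proposed table.
rows = [
    (15_000, 100_000),
    (9_000, 60_000),
    (8_000, 50_000),
    (6_000, 40_000),
    (150, 1_000),
]


def implied_bytes_per_word(words: int, size_bytes: int) -> float:
    """Return the bytes-per-word ratio that one table row implies."""
    return size_bytes / words


ratios = [round(implied_bytes_per_word(w, b), 2) for w, b in rows]
print(ratios)  # [6.67, 6.67, 6.25, 6.67, 6.67]
```

The proposed byte column therefore works out to roughly 6 to 7 bytes per word. Whether that ratio is realistic for markup is the point taken up in the replies that follow.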
- @Onetwothreeip, won't someone who cannot use software tools, be unlikely to find kb useful, particularly when the word count, which everyone understands, is there? Surely it's the opposite of what you're saying? Tom B (talk) 07:01, 2 December 2023 (UTC)
- It's not at all obvious to me why we need to estimate kb sizes on the page. A link to a word counter like the one used on DYK is all that's needed. The problem of images still exists with your proposal. Jo-Jo Eumerus (talk) 07:29, 2 December 2023 (UTC)
- This proposal seems like a further example of the current readable prose in kB and markup size in kB getting confused. 15k words is much bigger than 100 kB markup size. —Femke 🐦 (talk) 07:50, 2 December 2023 (UTC)
- The markup size, measured in bytes, is easily the most accessible measure of a Wikipedia page's size, whereas the word count is not. Onetwothreeip (talk) 07:46, 2 December 2023 (UTC)
- Generally support removal of the kB stuff, per most of the above. I'm not utterly opposed to including a table column like the above, but it should not be "markup size"; we have a whole subthread up there somewhere with multiple explanations for why such a figure will not reliably relate to readable-prose size. And even if we kept a column for readable-prose kB size, it needs a footnote explaining both how to figure that out and that it is not a rule to enforce. — SMcCandlish ☏ ¢ 😼 07:36, 2 December 2023 (UTC)
- If those numbers are not accurately equivalent to markup size, then the numbers should be changed. I do not intend it to be equivalent to the byte size of the readable prose. For example, if the average 15,000-word article is 150kB, then that is what the "estimated average markup size" column should say. Onetwothreeip (talk) 07:49, 2 December 2023 (UTC)
- Markup size is totally irrelevant for the scope of this page. It's about length, and that is always measured by word count or bytes of prose, something that's very easy to find just by adding the relevant script to your profile. Furthermore, these guide figures are very useful for citing when articles come in too long (or too short) at FAC and elsewhere. We've discussed it repeatedly in the past and I see no case for removal. — Amakuru (talk) 08:56, 2 December 2023 (UTC)
- Amakuru: could you clarify what you mean by "no case for removal". That refers to the limits in general, right? Or do you refer to the second conversion column from word count into readable prose in kB? —Femke 🐦 (talk) 09:01, 2 December 2023 (UTC)
- Amakuru the argument here has been that we need the word count table but not the KB, which is an inaccurate approximation that is only confusing people. That is, this version is what is under discussion, with one editor opposing and reverting. SandyGeorgia (Talk) 15:05, 2 December 2023 (UTC)
- It may be your opinion that markup size should be irrelevant, but markup size and other non-word-count technical considerations are currently relevant to this guideline and always have been. Onetwothreeip (talk) 10:14, 2 December 2023 (UTC)
- It appears that no one but you thinks markup size is relevant, and multiple reasons, which you have not refuted, have been provided above for why they are not. — SMcCandlish ☏ ¢ 😼 10:28, 2 December 2023 (UTC)
- The inclusion of markup size is longstanding in the guideline and it is not my proposal to add it, because it's already there. It has nothing to do with my support. What I propose is changing how the current markup size guideline is described, to be in terms of the word count guideline. This would make it useful in situations where editors are not using specialised technical tools, and where those tools do not work. Onetwothreeip (talk) 11:36, 2 December 2023 (UTC)
- Onetwothreeip, you're badgering every other poster here. SandyGeorgia (Talk) 15:01, 2 December 2023 (UTC)
- User:SMcCandlish and User:SandyGeorgia, you're right. This is an example, repeated many times over the years, of longstanding IDHT behavior, and it's very destructive and prevents any change of this guideline. It won't stop until the editor is stopped. That's what their history on this guideline tells us. -- Valjean (talk) (PING me) 18:44, 2 December 2023 (UTC)
- Really? I was responding to someone who responded to me. They said I have not refuted something, so I responded to them refuting that thing. Onetwothreeip (talk) 20:52, 2 December 2023 (UTC)
- Valjean, that's more history and drama than I know anything about. I was just responding to the current/ongoing IDHT issue that after multiple editors at this page just over the last few days point out why using byte-size is "broken", nevertheless Onetwothreeip proposes to use byte size, then having those objections and rationales pointed out, Onetwothreeip has no substantive response about that, just a re-iteration of a desire to use byte size no matter what. — SMcCandlish ☏ ¢ 😼 00:12, 3 December 2023 (UTC)
- "I do not intend it to be equivalent to the byte size of the readable prose" means a proposal to do what you want has failed right from the start, since multiple editors already object to using markup byte size, and no one is in support of the idea. — SMcCandlish ☏ ¢ 😼 09:56, 2 December 2023 (UTC)
- User:SMcCandlish, you're right. This is an example, repeated many times over the years, of longstanding IDHT behavior, and it's very destructive and prevents any change of this guideline. -- Valjean (talk) (PING me) 18:41, 2 December 2023 (UTC)
- I am not the one who proposed the bytes measures in the table, of course, as they have been there on the page for many years. I propose that we adapt the bytes measures to be the article markup size for articles with the defined word count, rather than the alternative proposal of removing the bytes measures altogether. We cannot rely on a word count measure alone, as many editors are not using some largely unknown technical tools, and these tools cannot work for all articles. Onetwothreeip (talk) 10:18, 2 December 2023 (UTC)
- Which articles do these tools not work for? Genuinely curious since I maintain Wikipedia:Prosesize so if it doesn't work for you or gives bad results on some article let me know.
- I would understand if prosesize wasn't a gadget, but who exactly (apart from IP editors) can't turn on the gadget in preferences? Galobtter (talk) 14:36, 2 December 2023 (UTC)
- Also IP editors and people without javascript can use https://prosesize.toolforge.org/ Galobtter (talk) 14:39, 2 December 2023 (UTC)
- Prosesize has trouble with articles that contain mathematical markup. Recently there was a discussion concerning the size of John von Neumann, and the prosesize tool overstated the size of the article by 5,000 words. Hawkeye7 (discuss) 18:17, 2 December 2023 (UTC)
- Other than IP editors, the tools don't work for editors who are not aware of such tools. That is not at all a criticism of these tools, it is simply that most editors are not going to be using these tools that are ubiquitous for some editors. As the guidelines should be broadly accessible, the bytes size element should remain in the article, but can be modified or contextualised as required. Onetwothreeip (talk) 20:40, 2 December 2023 (UTC)
- The tools are mentioned at WP:SIZERULE - so anyone who reads that table should be aware of those tools, so I don't see the issue here. Galobtter (talk) 23:26, 2 December 2023 (UTC)
- @Galobtter: It's not the tools that are an issue. The vast majority of editors are not going to be involved in using these tools, and likely won't be aware of them, even if they are aware of the size guidelines. Many more editors are familiar with bytes as a measure of size, as that is how size is tracked in the page history. There is also the fact that the tool does not work for all articles. Onetwothreeip (talk) 02:17, 3 December 2023 (UTC)
- If they're not aware of the tools, then they are not aware of the guideline either. Hawkeye7 (discuss) 02:48, 3 December 2023 (UTC)
- There is much greater awareness of the size guidelines than the size tools, but even greater awareness and use of bytes as a measurement of article size. Even more than all that, editors are aware of the concept of something being very big or too big. Tools are a super-editor thing, as most editors don't think tools are for them. Onetwothreeip (talk) 03:17, 3 December 2023 (UTC)
- Nah. Having an awareness that a byte count is given in the history page doesn't make it a tool for measurement of article size if the editor in question isn't aware of the guideline on article size and therefore is not doing article size measurement. To them it's just either trivia, or an indication of how much of the article changed between two edits. — SMcCandlish ☏ ¢ 😼 12:15, 3 December 2023 (UTC)
There is also the fact that the tool does not work for all articles.
I asked this above - which ones? Galobtter (talk) 03:19, 3 December 2023 (UTC)
- I previously did not answer this as somebody else had answered. The article 17th century in literature is another example, which I found using the random article feature. The tool displays the HTML size and wiki text size, but not anything similar to the word count for a prose article. Onetwothreeip (talk) 04:34, 3 December 2023 (UTC)
- Before I became aware of tools, I simply copy-pasted Wikipedia articles into Word or an online word counter. Those don't require much tech-awareness. —Femke 🐦 (talk) 08:14, 3 December 2023 (UTC)
- Nor does WordCounter.net, WordCount.com, and all the other fast-and-easy such web tools available with few-second "word count" search at Google. — SMcCandlish ☏ ¢ 😼 12:15, 3 December 2023 (UTC)
Readable prose size: the amount of viewable text in the main sections of the article, not including tables, lists, or footer sections.
The tool is simply reporting accurately that an article that is entirely a list has 0 readable prose. Galobtter (talk) 16:12, 3 December 2023 (UTC)
- Putting aside the definition of readable prose, I'm not raising anything that isn't already known about the tool. It doesn't meaningfully assess the length of articles which are predominantly lists, for example. This is understandable, since it is designed for articles which are not lists. Onetwothreeip (talk) 20:43, 3 December 2023 (UTC)
Original proposal: remove KB, retain word count
Some useful rules of thumb for splitting or trimming articles, and combining small articles:
Readable prose size | What to do |
---|---|
> 15,000 words | Almost certainly should be divided or trimmed. |
> 9,000 words | Probably should be divided or trimmed, although the scope of a topic can sometimes justify the added reading material. |
> 8,000 words | May need to be divided or trimmed; likelihood goes up with size. |
< 6,000 words | Length alone does not justify division or trimming. |
< 150 words | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub. |
Please note: These rules of thumb apply only to readable prose. Word counts can be found with the help of WP:Xtools, Shubinator's DYK tool or Prosesize.
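The rules of thumb in the table above can be read as a simple lookup on word count. The sketch below is only an illustration of that reading: the function name `size_guidance` is hypothetical, and the table gives no row for 6,000 to 8,000 words, so the wording returned for that range is an assumption rather than anything in the guideline.

```python
def size_guidance(words: int) -> str:
    """Map a readable-prose word count to the table's rule of thumb."""
    if words > 15_000:
        return "Almost certainly should be divided or trimmed."
    if words > 9_000:
        return "Probably should be divided or trimmed."
    if words > 8_000:
        return "May need to be divided or trimmed."
    if words < 150:
        return "Consider combining with a related page, or expanding."
    if words < 6_000:
        return "Length alone does not justify division or trimming."
    # Assumption: the table has no row for 6,000 to 8,000 words.
    return "No row in the table covers this range."


for count in (16_000, 8_500, 7_000, 3_000, 100):
    print(count, "->", size_guidance(count))
```

The checks run from the largest threshold down so that each count matches only the most specific row, and the stub row (under 150 words) is tested before the broader "under 6,000 words" row.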
- Because the #Alternative proposal: include mark-up size was inserted into the middle of the discussion and has resulted in confusion, I've excerpted again from #Removing kb limits the original proposal which has been under discussion for some months. SandyGeorgia (Talk) 15:14, 2 December 2023 (UTC)
- Sorry, planned to add another section with a summary of viewpoints so far, but real life got in the way.
- I count 9 people in favour in this discussion, almost all of whom indicate the kB inclusion is confusing. Two others in the above discussion also support, and one person objects because of an alternative proposal. @VQuakr, do you agree this is a strong enough consensus? —Femke 🐦 (talk) 17:57, 2 December 2023 (UTC)
- Femke, delete my post (and this one) if it helps clean up the page ... as you wish ... but I saw that Amakuru was confused, so wanted to get this back on track somehow. SandyGeorgia (Talk) 20:24, 2 December 2023 (UTC)
- @Femke: No, as was already noted this needs a RFC to get sufficient participation. Also you're involved and shouldn't be evaluating the consensus, which isn't vote count. VQuakr (talk) 00:53, 3 December 2023 (UTC)
- I think you're alone in calling for an RfC. What I can do is request that this be closed by an uninvolved editor? —Femke 🐦 (talk) 08:09, 3 December 2023 (UTC)
- Removing the bytes from the guideline would be a monumental change requiring a lot more participation. I don't know if an RfC is what's required, maybe something even larger. Onetwothreeip (talk) 09:31, 3 December 2023 (UTC)
- @Femke: this proposal hasn't had enough visibility to have consensus. A RFC is the normal way to get that quorum. I'm quite confused why there's resistance to that, to be honest. VQuakr (talk) 09:57, 3 December 2023 (UTC)
- @Femke and SandyGeorgia: thanks for the clarification, and that's my bad - I thought this was one of those occasional calls to scrap size limits altogether. To be honest I actually think more in readable prose bytes than words - mainly because the stub/non-stub boundary is usually cited as around 1500 bytes - but it's pretty much a straight conversion anyway and the prose tool gives both, so if there's confusion occurring I can certainly support the removal you propose. It looks to me like we already have a consensus to go ahead with that anyway. It isn't actually a substantive change anyway because the guideline already specifies that "Please note: These rules of thumb apply only to readable prose and not to wiki markup size", meaning an interpretation involving markup size is simply not correct even with the status quo. Cheers — Amakuru (talk) 14:03, 3 December 2023 (UTC)
- Maybe we could replace kB with kilocharacters - I don't think people really mean bytes when they are counting length (certain emojis/scripts can be 4 bytes but one character, I think prosesize has had variable behaviour in how exactly to count that). Characters are probably the most objective measure of prose length. Galobtter (talk) 16:23, 3 December 2023 (UTC)
- When we have implemented the current consensus, a next step should be to propose the same conversions in other areas. DYK is much more beginner-facing, and we also have a byte measure of readable prose there. The WP:Stub page seems to indicate DYK is where the 1500 bytes comes from. —Femke 🐦 (talk) 17:06, 3 December 2023 (UTC)
- @Amakuru: Based on that, which is fairly common among editors, it would make sense to retain a bytes measure as a supplementary indicator of article length. Onetwothreeip (talk) 20:47, 3 December 2023 (UTC)
- Well that measure has been in place for many years, certainly. But the whole reason we're here is because it seems to repeatedly cause confusion. With that in mind I would concur with those above that a change is advisable. Galobtter's suggestion might well be a good one (although I'd just go with "1500 characters" and suchlike rather than the rather obscure-sounding "1.5 kilocharacters"), or indeed just removing the "byte" equivalents would be fine with me. — Amakuru (talk) 21:17, 3 December 2023 (UTC)
- @VQuakr, do you think we should remove the kb limits, go with 123i's proposal, or keep the current kb limits? Tom B (talk) 16:04, 3 December 2023 (UTC)
- I think you're alone in calling for an RfC. What I can do is request that this be closed by an uninvolved editor? —Femke 🐦 (talk) 08:09, 3 December 2023 (UTC)
Readable prose size | Average markup size of readable prose[a] | What to do |
---|---|---|
> 15,000 words | > 100,000 bytes | Almost certainly should be divided or trimmed. |
> 9,000 words | > 60,000 bytes | Probably should be divided or trimmed, although the scope of a topic can sometimes justify the added reading material. |
> 8,000 words | > 50,000 bytes | May need to be divided or trimmed; likelihood goes up with size. |
< 6,000 words | < 40,000 bytes | Length alone does not justify division or trimming. |
< 150 words | < 1,000 bytes | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, the article could be expanded; see Wikipedia:Stub. |
|
This is another way we can retain bytes as a common measure of size, but factored as equivalent to word count. Onetwothreeip (talk) 22:16, 3 December 2023 (UTC)
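As a rough illustration of how the proposed table reads, the following is a minimal sketch that maps a readable-prose word count to the table's "What to do" column and checks the bytes-per-word ratio implied by each row. The function names and the wording of the fallback case are assumptions made for illustration; they are not part of the guideline or of any Wikipedia tool.

```python
# Thresholds are copied from the proposed table above.
# Function names and the "not covered" fallback are illustrative assumptions.

def rule_of_thumb(words: int) -> str:
    """Return the table's advice for a readable-prose word count."""
    if words > 15_000:
        return "Almost certainly should be divided or trimmed."
    if words > 9_000:
        return "Probably should be divided or trimmed."
    if words > 8_000:
        return "May need to be divided or trimmed."
    if words < 150:
        return "Consider combining with a related page, or expanding."
    if words < 6_000:
        return "Length alone does not justify division or trimming."
    # The table leaves a gap between its "< 6,000" and "> 8,000" rows.
    return "Not covered by the table (6,000 to 8,000 words)."


def bytes_per_word(words: int, byte_count: int) -> float:
    """Ratio implied by one table row (bytes of prose per word)."""
    return byte_count / words


# (words, bytes) pairs as they appear in the table rows.
TABLE_ROWS = [(15_000, 100_000), (9_000, 60_000), (8_000, 50_000),
              (6_000, 40_000), (150, 1_000)]

if __name__ == "__main__":
    for words, byte_count in TABLE_ROWS:
        print(words, byte_count, round(bytes_per_word(words, byte_count), 2))
    print(rule_of_thumb(7_000))
```

Run as a script, this shows that most rows imply about 6.67 bytes per word while the 8,000-word row implies 6.25, so the two columns are approximate equivalents rather than an exact conversion.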
Summary
The talk page has gone silent for almost 3 days, after lots of activity. In this kb limit discussion: 13 editors commented, 11 want to remove the kb limit, Onetwothreeip wants to change it to bytes, VQuakr has not said whether they want to retain kb or change to bytes. Of the 13 editors, VQuakr wants an RfC, and Onetwothreeip doesn't "know if an RfC is what's required, maybe something even larger." Is that accurate?
VQuakr said, "you're involved and shouldn't be evaluating the consensus, which isn't vote count". Consensus doesn't require unanimity. Shall we get an admin to evaluate consensus in this discussion, the one above and in the archives, or can we manage this ourselves? Tom B (talk) 14:32, 6 December 2023 (UTC)
- With this strong of a consensus, we don't need an RfC or outside help. -- Valjean (talk) (PING me) 16:49, 6 December 2023 (UTC)
- There's talk about propagating this through other PAGs, etc. There is expressly not a strong consensus on this, because there are not enough participants or visibility to have a strong consensus. Has this proposal at least been advertised at WP:VPP?
- @Tpbradbury: I see no benefit whatsoever from changing from kB to bytes as a unit of measurement. VQuakr (talk) 17:14, 6 December 2023 (UTC)
- @VQuakr, thanks, do you think we should remove the kb limits or not? You said "this isn't a sufficient level of involvement to change a guideline per WP:CONLEVELS", but that guideline does not mention the word strong? In any case, there is strong consensus to remove the confusing kb limits; no one has opposed removing them. Onetwothreeip wants to remove them and change to bytes, and we do not know what you want? Even if you were both against, that would still be strong consensus for removing them. We generally want to waste people's time as little as possible. Who is talking about propagating it through PAGs? Tom B (talk) 17:35, 6 December 2023 (UTC)
- I think VQuakr refers to "Wikipedia has a standard of participation and consensus for changes to policies and guidelines.". It doesn't refer to a specific minimum, and 12 in support (including previous discussion), and 1 or 2 against easily meets this for a non-substantive change imo. Anyway, we're going in circles. We should list this at WP:CR as a compromise, and VQuakr is welcome to post at VPP. —Femke 🐦 (talk) 17:53, 6 December 2023 (UTC)
- It could be 14-0 and you still wouldn't have quorum for this. VQuakr (talk) 20:16, 6 December 2023 (UTC)
- Can you point to the guidance in CONLEVELS or elsewhere, about what level of quorum is needed? And do you think we should remove kb limits or not pls? Tom B (talk) 09:35, 7 December 2023 (UTC)
- @Tpbradbury: that would be WP:PGCHANGE. Changes should reflect broad community consensus, and since discussion on this is being kept silo'd for some reason no broad consensus can yet exist. I see no benefit whatsoever to removing the kB portion of the guideline, and since there are drawbacks the answer is not to change it. VQuakr (talk) 16:58, 7 December 2023 (UTC)
- thank you, no one has explained a single drawback that has withstood scrutiny? Please may someone write down one drawback that is correct and outweighs the confusion of having the kb limits, when many more people understand what word limits mean? Tom B (talk) 18:30, 7 December 2023 (UTC)
- Well no, the onus is on the proposers to come up with a benefit. Word count has the exactly analogous issue with prose size vs total word count. I don't see any difference with regard to confusion level, and using inconsistent units for various metrics is a big ol' drawback. VQuakr (talk) 20:36, 7 December 2023 (UTC)
- Because this would be a very substantive change to policy, it requires a stronger and broader consensus than would normally be required of an editorial dispute. I also don't agree that 13 editors have commented, since there seem to be far fewer participants. Onetwothreeip (talk) 20:13, 6 December 2023 (UTC)
- The 12 people in favour: me, Tom B, SandyGeorgia, Valjean, Jo-Jo Eumerus, Hawkeye7, buidhe, Wasted Time R, Galobtter, SMcCandlish, Amakuru (mild), DFlhb (previous discussion). One person against, as they're in favour of an alternative proposal. I've asked an uninvolved editor to close at CR, they can decide on this question too. Again, feel free to post at VPP if you want more participation. There is no specified quorum for guideline changes, especially this small, and the high agreement makes clear that more participation won't change the outcome. —Femke 🐦 (talk) 20:38, 6 December 2023 (UTC)
- That's including some very weak support, and inferring support that isn't explicit. Regardless, it would be a very significant change, because the kilobytes measures are often quoted by editors in relation to this guideline. Rightly or wrongly, this guideline page is often seen as being summarised by the kilobytes measurements. A change of that magnitude should be representative. Onetwothreeip (talk) 09:46, 7 December 2023 (UTC)
- It also isn't true. @Femke: you're too involved with this to even accurately count !votes let alone remember this isn't a vote; each time you claim consensus when none exists your argument becomes weaker (especially when you use inaccurate numbers). VQuakr (talk) 17:02, 7 December 2023 (UTC)
- The top of this section is literally, mathematically summarised, with "Latest comment: 2 minutes ago, 68 comments, 13 people in discussion". And there have been other discussions above and in the archives, Tom B (talk) 09:45, 7 December 2023 (UTC)
- See WP:NOTAVOTE. There are important reasons to use bytes or characters rather than words because they have a clearer definition which works better for scientific and technical topics such as maths. The page size gadget uses kb as its standard unit whereas words are exceptional and only used once. For example, for this page:
- HTML document size: 156 kB
- Prose size (including all HTML code): 21 kB
- References (including all HTML code): 3613 B
- Wiki text: 15 kB
- Prose size (text only): 10192 B (1679 words) "readable prose size"
- References (text only): 275 B
- Andrew🐉(talk) 10:51, 7 December 2023 (UTC)
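To make the distinction between bytes, characters, and words in this thread concrete, here is a minimal sketch that measures a string three ways. It assumes UTF-8 encoding and whitespace-separated words, and it counts Unicode code points as "characters"; how the prosesize gadget counts each of these is not stated in this discussion, so the function below is an illustrative assumption and not a reimplementation of that tool.

```python
# Illustrative only: `measure` is a hypothetical helper, not the prosesize gadget.

def measure(text: str) -> dict:
    """Return three common size measures for a piece of prose."""
    return {
        "characters": len(text),                  # Unicode code points
        "utf8_bytes": len(text.encode("utf-8")),  # storage size in UTF-8
        "words": len(text.split()),               # whitespace-separated tokens
    }


plain = measure("Readable prose size")
emoji = measure("Femke 🐦")

if __name__ == "__main__":
    print(plain)  # {'characters': 19, 'utf8_bytes': 19, 'words': 3}
    print(emoji)  # {'characters': 7, 'utf8_bytes': 10, 'words': 2}
```

For plain ASCII text the byte and character counts coincide, while the single emoji adds one character but four bytes. That difference is the case Galobtter describes, and it is why a byte figure and a character figure can diverge for non-Latin scripts.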
- I am currently comparing Parinacota (volcano) with Mount Hudson with this tool and I notice that while there is an almost 50% difference in prose - Parinacota has 39916 characters (6291 words) and Mount Hudson 26331 characters (4193 words), the kb size difference is much smaller - only about 371.28kb for Parinacota vs 368.08kb for Hudson. Or for that matter, the images - removing them all from Parinacota in this revision drops the size by 30kb relative to the normal version, even though the character count went from 39916 characters (6291 words) to 40772 characters (6384 words). It's these discrepancies that make kb size a poor measure of prose. Jo-Jo Eumerus (talk) 11:26, 7 December 2023 (UTC)
- @Jo-Jo Eumerus: The Wikipedia size tool shows 25 kB for Hudson and 38 kB for Parinacota. You are listing the HTML document size, which is larger than the total Wiki markup size and not relevant to any of the discussions on this page regarding article size. VQuakr (talk) 16:51, 7 December 2023 (UTC)
- So there is both a markup size in kb and a html size in kb? That seems like another argument against using "kb" as a size estimate. A markup kb size would probably also be influenced by citation formatting, which has nothing to do with any metric of article size (that can't be more easily captured by html size) Jo-Jo Eumerus (talk) 17:16, 7 December 2023 (UTC)
- The information that is pulled from the Wiki size tool is copy/pasted in Andrew Davidson's comment, to which you replied just above. Using consistent units for the various metrics is a feature, not a bug. No, the relevant metric to article length is the prose size, which is not impacted by citation formatting. VQuakr (talk) 17:19, 7 December 2023 (UTC)
- To be specific, the Parinacota article has 50% more prose than the Mount Hudson article (that's not quite the same as there being a 50% difference). The markup size for the former is 8% larger than the latter. Two random articles should not be used to demonstrate a relationship (or lack thereof) between prose size and markup size. It's not a bad thing that we have more than one measure of size, and there are elements of articles that should be measured that aren't prose, such as use of templates. Onetwothreeip (talk) 19:55, 7 December 2023 (UTC)
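The "50% more" versus "50% difference" distinction can be checked against the figures quoted earlier in this thread. The sketch below is a hedged worked example: the helper name is an assumption, and the inputs are only the prose and HTML numbers Jo-Jo Eumerus gave for the two articles (the markup sizes behind the 8% figure are not quoted here, so they are not computed).

```python
# Hypothetical helper for illustration; inputs are the figures quoted in this thread.

def percent_more(larger: float, smaller: float) -> float:
    """How much bigger `larger` is than `smaller`, as a percentage of `smaller`."""
    return (larger - smaller) / smaller * 100


# Parinacota (volcano) vs Mount Hudson, as quoted above.
words_pct = percent_more(6291, 4193)      # readable prose, words
chars_pct = percent_more(39916, 26331)    # readable prose, characters
html_pct = percent_more(371.28, 368.08)   # HTML document size, kB

if __name__ == "__main__":
    print(round(words_pct, 1))  # 50.0
    print(round(chars_pct, 1))  # 51.6
    print(round(html_pct, 2))   # 0.87
```

On these numbers the prose differs by about half while the HTML document size differs by under one percent, which is consistent with the point that HTML size says little about prose length.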
- We don't have any need to measure use of templates, unless something hits the transclusion limit of the parser, which is quite hard to do. — SMcCandlish ☏ ¢ 😼 00:31, 8 December 2023 (UTC)
- I believe List of earthquakes in 2023 does, if anyone needs an example of one such page in the wild. I recall someone was having some kind of problem getting citation templates to work properly on it. -- asilvering (talk) 02:06, 8 December 2023 (UTC)
- Yes, it's been done. But remember that many templates contain no prose at all, consisting entirely of images, markup or categories. Hawkeye7 (discuss) 02:39, 8 December 2023 (UTC)
- You say there's no need to measure templates, then you give a reason to measure templates. Templates are an example of non-prose size where management applies. Onetwothreeip (talk) 08:23, 8 December 2023 (UTC)
- Not really, I just used poor wording. If you hit the transclusion limit, you don't need to "measure" templates at all, you need to look for ways to remove a bunch of them (immediately, since the situation is breaking the source citation rendering at the bottom of the page!) that are not really necessary. Start with typographic convenience ones like {{snd}} and {{anchor}} and {{official homepage}} and so on, and also look for anything being used systematically throughout the page that the removal of which (or replacement of which by something exactly or roughly equivalent in plain markup instead of a template) would result in a large reduction in the number of template calls. One culprit may be excessive use of {{sfn}} and other unnecessary citation templates, when bare <ref> cites would work fine. (Various CITEVAR pundits will hate the idea of having to change citation styles, but there really is nothing wrong whatsoever with doing <ref>Smith (2023), p. 123.</ref>, and you can even do something more helpful with <ref>[[#Smith (2023)|Smith (2023)]], p. 123.</ref> and |ref=Smith (2023) in the original full citation, with zero templates used beyond the one in the original cite, and this can make a big difference if you're citing that source over and over and over again.) Also, yes, consider splitting the article. If it's not practical to split, then you have little choice but to go template-hunting. But not "template measuring"; I'm skeptical there is any such thing or that it has much utility if it is a thing. — SMcCandlish ☏ ¢ 😼 11:36, 8 December 2023 (UTC)
- Measures of markup size can identify template overuse risks and also excessive prose. Onetwothreeip (talk) 21:03, 8 December 2023 (UTC)
- Honestly, that doesn't really parse as a meaningful claim, much less a demonstrated one. Knowing what the markup size is tells us nothing whatsoever about what percentage of it is accounted for by templates and what proportion is prose. And it's not even really clear what you mean by "excessive prose" in the first place. — SMcCandlish ☏ ¢ 😼 08:12, 9 December 2023 (UTC)
- With 60k+ articles with broken anchor references these everything from with them, especially as they produce no visible error messages. Usually removing flag templates is the answer, otherwise splitting the article is usually a good idea. Once that limit is hit the article is usually problematic to load or edit on mobile. -- LCU ActivelyDisinterested «@» °∆t° 00:22, 9 December 2023 (UTC)
- I'm not quite sure what "these everything from with them" was going for, but, yeah, I had forgotten about flag templates as a major source of template code bloat in some classes of articles. There are probably a number of other over-used template types. — SMcCandlish ☏ ¢ 😼 08:12, 9 December 2023 (UTC)
- Templates can be surprising. {{YEAR}}, for example, invokes a dozen templates and a Lua module. Hawkeye7 (discuss) 10:06, 9 December 2023 (UTC)
- Hi Andrew, thanks, Onetwothreeip said "I also don't agree that 13 editors have commented", when 13 people had commented. It's not about votes, but a more basic disagreement about totalling the number of people who have commented in a discussion, Tom B (talk) 18:34, 7 December 2023 (UTC)
- Where is this alleged change in policy/guideline? Nothing is changing ... the guideline has always been based on readable prose, and we are dropping KB as an inaccurate approximation for word count, which isn't changing at all. SandyGeorgia (Talk) 20:47, 7 December 2023 (UTC)
- The significant change is in attempting to remove the use of kilobytes as a measurement of article or page size. Onetwothreeip (talk) 08:24, 8 December 2023 (UTC)
- That's not a significant change, because the kilobyte measurement is simply a direct 150 words = 1 kb multiplication. And the vast majority of measurement tools (including the tool that is commonly used within Wikipedia) give both metrics. Therefore, given (a) the potential for confusion with the bytes of markup prose listed in article histories, and (b) the redundancy of the kb measurement as against the words measurement, it has to go. — Amakuru (talk) 12:44, 8 December 2023 (UTC)
- It's actually the other way around, the word count measure is derived from the kilobyte measure. We can simply address the confusion, therefore keeping both measures, and greatly strengthening this article. Both the word count and markup size are widely used as indicators of size across Wikipedia, so removing one from the guideline would be a change that requires significant support. Onetwothreeip (talk) 21:06, 8 December 2023 (UTC)
- Could you explain where you got the idea that the "word count measure is derived from the kilobyte measure"? As far as I know, the word count is ... a count of the words ... and that statement is false. SandyGeorgia (Talk) 23:59, 8 December 2023 (UTC)
- And I'm not aware of any place where the KB measure is used; it is used on this page, where it has always been a problem. SandyGeorgia (Talk) 23:59, 8 December 2023 (UTC)
- This is the page before the word count measures were added. They were specifically calculated based on the kilobyte measurements which originate from the first edition of the page in 2003, last updated in 2007. These are commonly used across Wikipedia as a measurement of article and page length. Onetwothreeip (talk) 06:11, 9 December 2023 (UTC)
- That something 20 years ago was once calculated a particular way does not prove it was a good idea, and has nothing to do with what we should advise today and how to arrive at such numbers now. Look, we all already get that you really, really want to include KB measures of total markup size, but your case has not been convincing and just re-re-repeating it in various wording isn't working. — SMcCandlish ☏ ¢ 😼 08:12, 9 December 2023 (UTC)
- Okay well I never said that its age means that it's good, I'm in favour of updating the numbers. I was explaining, per request, how I know that the word count measure was derived from the kilobytes measure. Largely, I have been responding to those who have been responding to me. Onetwothreeip (talk) 09:58, 9 December 2023 (UTC)
- Fair enough. — SMcCandlish ☏ ¢ 😼 12:04, 10 December 2023 (UTC)
- Coincidentally, this diff today raises the kb size of the article from 42K to 48K and does nothing at all except format the references. Not so much as a comma is changed in the article's text. WhatamIdoing (talk) 23:11, 10 December 2023 (UTC)
- That would be because it adds archive URLs and template information for most of the article's sources, so that kind of size increase would be expected. Onetwothreeip (talk) 02:06, 11 December 2023 (UTC)
- I think this diff [26] is what was meant. We need to have an RfC about this, probably. Multiple editors I know of have for some time been complaining about all the constant watchlist churn of people using automated tools to completely pointlessly add |url-status=live |archive-url=... |archive-date=... . We have absolutely no need for archive-urls for websites that are not dead. (Except in the rare event that the page changes frequently and our archived snapshot has information which disappeared later, in which case we'd probably need to have an explanatory note and be using a separate Wayback template to just link to that version and not to the live page since we're not citing the live page and linking to it would be confusing.) It is entirely sufficient that Wayback or some other Web archiver has archived a copy, so it is there if/when the URL does eventually go dead. The dumb thing about this "pre-emptive archival template noise" is that if the URL does go dead later, having this noise in the template does not do readers a service, since it will still say |url-status=live. So the cite template would have to be edited to fix that anyway, ergo that is the time to add the archive-url in the first place: when the source link has gone dead (or usurped or whatever). I really do not believe there is a consensus to go around adding archive-urls for perfectly live Web sources. — SMcCandlish ☏ ¢ 😼 22:32, 11 December 2023 (UTC)
- Yes, that's the link. @Onetwothreeip, of course it's expected. This diff is an illustration of why looking at the file size (=the number of bytes shown on the history page) is an unreliable method for finding out what the article size is. That article uses almost 50 kb of wikitext to record ~25 kb of content. Another page with 50 kb of wikitext might actually have close to 50 kb of content.
- (Whether and how to record archived links is a different subject, for a different page.) WhatamIdoing (talk) 22:24, 13 December 2023 (UTC)
- The byte size is itself a measure of article size. If what you mean is that it is an unreliable method of finding the prose size, that would be partially true, but byte size still correlates strongly with prose size. Onetwothreeip (talk) 10:10, 14 December 2023 (UTC)
Is there verifiable scientific basis for the article length guidelines?
Why isn't there any sourcing on the article size guidelines? This seems very much like a "trust me, bro" situation. And when I look at the discussions above to try to gain insight, things spin rapidly into secondary topics like "well it isn't the size per se, but about bandwidth, or editorial issues, or WP:UNDUE," or whatever. I realize that Wikipedia is built on a delicate balance of community consensus, but based on the discussions here, this guideline seems one of the most arbitrary; and therefore, one of the least respectable. Orange Suede Sofa (talk) 04:06, 6 July 2023 (UTC)
- Which part of our various manual of style pages do you feel to be the most scientifically grounded? CMD (talk) 05:08, 6 July 2023 (UTC)
- CMD, when you look at the MOS overall, it's just an arbitrary standard of how to format text that we all find consensus on. No one would demand of us that we "prove" that this or that spelling or punctuation is superior to another. It's primarily a matter of having a standard to avoid pointless disputes.
- The guidance on article size is a completely different beast. It makes claims about "readability" and that's a real-world issue that isn't just an arbitrary standard. Peter Isotalo 08:34, 6 July 2023 (UTC)
- I beg to differ with this comment as correctly formally needs to be adhered to, otherwise what is the point of an encyclopedia? Furthermore, there is only one way! I am but a mere man and stand to be corrected! Michaelcockrell7 (talk) 06:02, 12 November 2023 (UTC)
- This guideline is a standard to help avoid pointless disputes. CMD (talk) 10:05, 6 July 2023 (UTC)
- Why is the standard argued solely from the perspective of a reader who wants to read through one particular article in a single sitting? Peter Isotalo 12:09, 6 July 2023 (UTC)
- What would you wish to base the standard on? CMD (talk) 13:03, 6 July 2023 (UTC)
- It isn't (that's not even true). But now that you mention it, wouldn't it be nice if someone expanded the other sections to include all of the logic discussed here over the years, like the maintenance burden, and the typically poor and redundant and off-topic prose found in many(most?) excessively long articles. SandyGeorgia (Talk) 13:54, 6 July 2023 (UTC)
- I think Peter Isotalo said it more succinctly than I did, which is that the guideline makes a claim to something that is concretely measurable, yet there is nothing here to back that claim up. I can easily see the difference between this and community consensus on things like serial commas or how we treat numerals. Right now the guideline is clearly a WP:COATRACK for many other topics, which I feel doesn't address any underlying issues. If an article has off-topic content, why isn't that specifically addressed instead of just waving an arbitrary size limit around? If I come across an article with redundant information, what happens if I cut out all the good stuff and the article goes below the size limit? Is the article now fixed, or does it have some other problem that the size guideline covered up? Orange Suede Sofa (talk) 19:03, 6 July 2023 (UTC)
- Sandy, my impression from discussions about the size of specific articles is basically what Sofa pointed out: people will get hung up on counting kB or words at the expense of any other consideration. Even the 60 kB limit appears to have drifted towards being some sort of hard rule. I feel that people will often say that something's "too long to read" with this guideline as the only argument. Unless it's backed up by solid evidence, that will not improve articles per se.
- CMD, I would like to focus on the reader experience to start with, not all the other factors. What evidence can we find of any type of relevant reading behavior? Has there been discussion about this before? How can we find something more concrete than just guesswork and anecdotal evidence? Peter Isotalo 22:42, 6 July 2023 (UTC)
- If I recall correctly originally it was based on academic submission size......Oxford University Press. Moxy- 22:48, 6 July 2023 (UTC)
- If that's where the guideline comes from, then it's even more problematic than I thought. That's a guideline from a publisher of paper journals that makes no reference to readability; they're going to need a size limit for their own, and very different, practical reasons. Orange Suede Sofa (talk) 22:55, 6 July 2023 (UTC)
- When Wikipedia started the vast majority of us were academics so we simply follow academic norms. That said many many studies have been done about the 10,000 word count for readability and reader retention. I assume everyone has the capability of searching this. Moxy- 23:01, 6 July 2023 (UTC)
- I'm enough of an academic myself to know that academic norms vary; for example, Elsevier does not have strict word count limits, at least not for their engineering journals. And as for searching for the many studies, do we not observe WP:BURDEN here? Orange Suede Sofa (talk) 23:25, 6 July 2023 (UTC)
- We don't, this is a guideline. In general, we know that people don't read things past a certain length, but obviously it's not something which has a clear scientifically defined golden number of words people will read. So we have a guideline, same as any other style guideline, (and which is not strict,) which helps us as editors. CMD (talk) 00:25, 7 July 2023 (UTC)
- Since everyone is throwing around baseless claims about how they think people read articles, I'm going to join in and claim that nobody actually reads a long article from start to finish, and that all the editors defending this standard have various arguments that are neither cohesive nor based in any actual data. The even more disturbing thing is that there are participants here who I know from my fifteen years here have consistently whined about why even change anything because the WMF will just overrule it, yet here they are with equivalent ancien régime arguments that are nothing more than appeals to authority. The most disappointing thing for me, personally, is that I have respected 99% of Wikipedia's consensus policies to the point where I have taught them in public to others as respectable examples of how communities come together to arrive at a common good, but nobody here is able to agree on a common defense. The responses here are like a clown car; everyone has a different rationalization for something that isn't justifiable. I won't be tendentious and argue about it any more, but now I'm going to use this as a counter-example of how even a long-standing and ultimately productive community like Wikipedia can find itself completely up in itself. Orange Suede Sofa (talk) 06:27, 7 July 2023 (UTC)
- This page isn't (typically) used to address truly off-topic content, but rather level of detail. Keep in mind that some of the topics covered by articles here have literally millions of words written about them, and we need some way of identifying a good middle ground between that and a one-sentence stub in order to make an article that is reasonably useful. This also isn't the only page giving the message of "be concise" - cf WP:DETAIL.
- So with that in mind, what makes sense for drawing that line? WP:CANYOUREADTHIS proposes attention span times reading speed as the basis for that determination. It gives an average attention span of 30 to 40 minutes, cited to a 2005 reference - many more recent sources actually suggest smaller numbers, such as 15 or 20 minutes[27][28][29][30]. After that point, information processing is impeded[31] and information recall suffers[32]; cognitive fatigue[33] and mind wandering[34] both impact reading comprehension. Average reading speed meanwhile is roughly 238 words per minute[35], although there are some assumptions built into that estimation (education level, neurotypicality) that might support a lower number for accessibility purposes. So that calculation suggests a reasonable maximum around 9500 words of readable prose - lower than the current limits at TOOBIG.
- Other than readability, you could also consider, as Sandy mentioned, maintenance burden - but if anything that's likely to promote even shorter limits. Nikkimaria (talk) 04:55, 7 July 2023 (UTC)
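The attention-span calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the 238 words per minute figure and the attention spans of 15 to 40 minutes are the ones quoted in the comment, while the function name `max_readable_words` is hypothetical and not part of any Wikipedia tool.

```python
# Hypothetical helper: words readable within one attention span.
# The default reading speed (238 wpm) is the figure quoted in the discussion.
def max_readable_words(attention_span_minutes: float,
                       words_per_minute: float = 238) -> int:
    """Return attention span multiplied by reading speed, rounded to whole words."""
    return round(attention_span_minutes * words_per_minute)


# Attention spans mentioned in the discussion: 15-20 minutes (more recent
# sources) and 30-40 minutes (the 2005 reference).
for minutes in (15, 20, 30, 40):
    print(f"{minutes} min -> {max_readable_words(minutes)} words")
```

With the 40-minute figure the product is 9,520 words, which matches the "around 9500 words" estimate; the shorter attention spans would give proportionally lower ceilings.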
- I deeply appreciate that you have been the first editor to make the effort to provide relevant data, and I have enormous amounts of respect for that. Yet, I predict that your analysis of "lower than the current limits at TOOBIG" will not result in an actual lowering of the guideline, indicating that the guideline itself is a WP:COATRACK of other issues, and shouldn't pretend to be based in data. Orange Suede Sofa (talk) 06:39, 7 July 2023 (UTC)
- Nikkimaria, from what I can tell these links about attention span seem to all be about attention span while listening to a continuous spoken lecture.
- I would certainly agree that if you gathered a room full of ordinary adults and had someone at the front read a long Wikipedia article straight through in an unexciting manner, there would very likely be a significant proportion of the audience who were unable or uninterested in paying attention, pretty close to immediately, and if you kept the reading going for an hour almost none of the audience would catch every part.
- However, that doesn't really seem like the most relevant criterion for deciding what to include or not include in an encyclopedia article. –jacobolus (t) 12:39, 26 October 2023 (UTC)
- Nikkimaria, we absolutely need some form of upper limit on article size for technical or editorial reasons, but we need to argue those things separately from the reader experience. The concept of "readability" is here entirely built on the idea that unless an article is read in full, from start to finish, it does not live up to quality standards. There's no evidence that this has anything to do with what's good for readers. It's the underlying assumption on what "readability" is that needs to be addressed here.
- Regarding this issue, I emailed the WMF research list the other day and asked about research relevant to article length. I just received a very friendly reply with a "non-exhaustive" list of relevant research (everything in italics below):
- Improving Website Hyperlink Structure Using Server Logs[36]: Figure 3 shows that probability for a reader to click a link decreases substantially the later they appear in the text
- Research:Reading time[37]: characterizes how much time readers spent on a page reading an article. also interesting are the related projects under See also: i) impact of having section collapsed or not[38], as well as which parts of articles are read[39]
- A Large-Scale Characterization of How Readers Browse Wikipedia[40]: Figure 11c shows that readers have a much higher chance to stop navigating when encountering an article with low quality. this is related to the length in the sense that length is often used as one proxy to assess quality.
- A large scale study of reader interactions with images on Wikipedia[41]: explores how images in articles help readers to navigate.
- Ongoing development of models to measure readability of Wikipedia articles across languages [42]
- This is the kind of research we need to be looking into. And in the meantime, I propose we remove the parts of the guideline that purport to be based on research.
- Peter Isotalo 06:55, 7 July 2023 (UTC)
- God yes. Let's hope that at long last sanity finally prevails. EEng 07:13, 7 July 2023 (UTC)
- If you agree that we need some kind of upper limit, but don't agree on the reader-experience-based metrics, what specifically would you propose as the upper limit based on technical/editorial reasons? Nikkimaria (talk) 01:00, 8 July 2023 (UTC)
- There seems to be a point at which browsers struggle and lag to render the page (e.g. when going from reading to edit mode, when clicking preview, when saving the changes) making it kind of frustrating. I have a honkin'-fast machine with more RAM than god, and I've still been hit with this issue on really long articles. But I have no idea how to concretize it into a number, and I don't think there's any kind of research that could be cited. It's likely to vary widely by browser and by machine capabilities. So it seems to be a legit issue but one hard to reduce to a "this is too big" specific number. — SMcCandlish ☏ ¢ 😼 14:58, 26 October 2023 (UTC)
- I figure the WMF might have information on when bytesize or display size becomes too much. Granted, as noted repeatedly bytesize is more about the number of images in an article than about its prose contents. Jo-Jo Eumerus (talk) 17:29, 26 October 2023 (UTC)
- I have a strong suspicion that it has more to do with code complexity – how much render parsing the browser has to do (elements and style applied to them). On Facebook, I can load screen after screen after screen of images and videos, while the article I'm thinking of where I had this page loading delay recently doesn't have any more images on it than a few screenfuls of Facebook. But it has much more text with complex markup in it. The total byte-size of all that text might be less than that of a single image, but the browser has to do much more work to interpret and styled-display it. — SMcCandlish ☏ ¢ 😼 18:29, 26 October 2023 (UTC)
- Not at all. The speed of modern processors is very great, and the CPU time will not be noticeable. Download time is what is important. A Wikipedia article will load slowly on first download, afterwards it will be cached on the server and the network. A popular article will therefore load quickly, because it will be cached. See performance tuning for details Hawkeye7 (discuss) 19:30, 26 October 2023 (UTC)
- The quantity of text should really not be any particular problem on the user browser side. The part that makes pages require more bandwidth is mostly images, and the part that is slowest to render for the server is going to be stuff like templates invoking scripts, fancy mediawiki features, footnotes, etc. In practice I've only ever had a problem with large numbers of math formulas, and only during specific times when the backend math renderer was having some kind of bug causing unexpected slowdowns, which seems to have been resolved. –jacobolus (t) 20:10, 26 October 2023 (UTC)
- Why is the standard argued solely from the perspective of a reader who wants to read through one particular article in a single sitting? Peter Isotalo 12:09, 6 July 2023 (UTC)
I appreciate that you ask, but I don't want to pull a figure out of thin air. I don't see that there's any more solid data than what's in the table right now. Plus, technical and editorial are two completely different things.
I'd like to focus first on getting rid of unsupported claims about what's best for readers. Can we start looking at options of how to clean up what's currently under "Readability"? I started tinkering on new wording on my own here for example. Should I make a concrete suggestion here on the talkpage?
Peter Isotalo 09:29, 8 July 2023 (UTC)
- Given that we agree there should be limits, I would not support getting rid of limits entirely unless/until we have a proposed replacement. Nikkimaria (talk) 12:59, 8 July 2023 (UTC)
- Re "Sandy, my impression from discussions about the size of specific articles is basically what Sofa pointed out: people will get hung up on counting kB or words at the expense of any other consideration. Even the 60 kB limit appears to have drifted towards being some sort of hard rule. I feel that people will often say that something's 'too long to read' with this guideline as the only argument. Unless it's backed up by solid evidence, that will not improve articles per se." Then I suggest you need to read more discussions, and that your reading has been selective. I don't support removing any text here, rather expanding it to include all the other reasons that have come up in all the other discussions over the years. First, I've never seen anyone in a real discussion refer to KB; readable prose is the relevant metric. Second, I've never encountered an FA well over these limits that did not suffer from excess detail, verbosity, redundancy, and off-topic matter that could not be better included in a different article and summarized back to the main article. And every time I find that, I provide concrete example after example. It is not "counting words at the expense of any other consideration" (you can find same in an 8,000-word article); it is cutting words that were excess verbosity impeding readability to begin with. It's not just "too long to read"; it's more about "too boring to read as it doesn't come to the point and instead provides unencyclopedic trivia". Some editors believe that to meet comprehensive (WP:WIAFA) they must include every trivial fact ever written on the topic, with no other discretion applied. We aren't writing journal articles; we're writing encyclopedic entries. If I ever see an FA that passes the recommended size considerably, and isn't overly detailed and excessively verbose, I'll support it at FAC or FAR. To date, I haven't. SIZE helps prevent poor writing, as well as helping assure that articles are maintainable and encyclopedic. SandyGeorgia (Talk) 13:23, 8 July 2023 (UTC)
- PS, I believe it was Femke who tried to make the chart make more sense by removing the useless KB metric, and I think that got stalled. SandyGeorgia (Talk) 13:29, 8 July 2023 (UTC)
- I'm sorry, what are you commenting on here, Sandy? I haven't proposed removing the current size limit. Peter Isotalo 14:26, 8 July 2023 (UTC)
- I've quoted in green exactly what I'm commenting on (and agreeing with Nikkimaria that "I would not support getting rid of limits entirely unless ... "). SandyGeorgia (Talk) 18:28, 8 July 2023 (UTC)
- Okay, I don't see what you mean by that. I hadn't proposed specific size changes. I was talking about adjusting the text under "Readability" so that it doesn't include claims that it's based on actual research.
- Regarding your other comments, I see that we have different perspectives on who invokes this guideline, how, where and why. I'm commenting from what I believe are my experiences. You're welcome to pick that apart if you want to, but this is my genuine impression of things. Take it or leave it.
- Now regarding the "Readability" section, below in gray is a concrete suggestion of how to reword it to get rid of the unverified claims about how people read articles. I'm excluding the shortcut note and see also for convenience. It's only for the main section, not the sub-sections.
Each Wikipedia article is in a process of evolution and is likely to continue growing. Other editors will add to articles when you are done with them. Wikipedia has practically unlimited storage space; however, long articles may be more difficult to navigate, and comprehend. An article that has grown to more than five paragraphs, or about 500 words, should start being split up into one or more sections. This helps organize content, especially for readers that are looking for information about a specific aspect of the article topic (see Wikipedia:Manual of Style and Wikipedia:Layout for guidance). Individual sections should not be so long that they impede the ability to find information and should be further divided into sub-sections. At around 10,000 words it may be beneficial to move some sections to other articles and replace them with summaries per Wikipedia:Summary style – see also Size guideline (rule of thumb) below.
- I'm not putting this up for a vote, just discussion. I'd appreciate if we didn't start splitting up in oppose/support camps. Peter Isotalo 22:25, 8 July 2023 (UTC)
- Concur with (please) don't start !voting before discussing. Sorry I was rolling two answers in to one (what I quoted in green from Peter, and what Nikki said separately). Is this the page for addressing how articles are split into sections? I'm honestly confused by that suggestion, as we must have another MOS guideline somewhere about that specific subject, while this page is about the size of an article overall and when to think about splitting to a different article, as opposed to how to create sub-headings/sections. But if this is the page, I would expand the second line from "difficult to navigate, and comprehend" to something along the lines of ... "difficult to navigate, maintain, and comprehend; and may contain extraneous, repetitive or off-topic content that would be better contained in a sub-article". I also wouldn't restrict the need to use summary style to articles reaching 10,000 words, because the need to use summary style can be present well below that word count. SandyGeorgia (Talk) 23:49, 8 July 2023 (UTC)
- Suggested "difficult to navigate"-addition sounds like a good idea. I'm thinking we could also avoid getting into details of what paragraphs should look like here by just removing the first "An article that"-sentence and simply start with "Splitting up articles into sections helps..."
- A thought regarding the number of words: how about not writing it out at all? Like maybe just "When an article has grown to a considerable size" or whatever? Peter Isotalo 01:10, 13 July 2023 (UTC)
- As far as I know, this is the first time that someone has mentioned the size of a section. What the MOS currently says is "Overly lengthy continuous blocks of text should be avoided; sections which are so long as to impede reader understanding should be broken down into subsections. There remains some disagreement regarding the precise point at which a section becomes too long, so editors are encouraged to use their own judgment on the matter." (WP:MILMOS#SECTLEN) I question whether this is appropriate to have here, rather than in MOS:LAYOUT or Help:Section. I think it should be in MOS:LAYOUT, where people would be most likely to find it. Hawkeye7 (discuss) 06:52, 10 July 2023 (UTC)
- I agree. Peter Isotalo 01:17, 13 July 2023 (UTC)
- 2+2=4, yes you're right! Michaelcockrell7 (talk) 06:05, 12 November 2023 (UTC)
Discussion seems to have dropped off, but I still think this issue is important. So here's an attempt at a second suggestion. My proposal is to replace what's currently under "Readability" with the text below in gray.
Each Wikipedia article is in a process of evolution and is likely to continue growing. Other editors will add to articles when you are done with them. Wikipedia has practically unlimited storage space; however, long articles may be more difficult to navigate, and comprehend. Once an article has grown large enough, it should be split into sections. This helps organize content, especially for readers that are looking for information about a specific aspect of the article topic. Individual sections should not be so long that they impede the ability to find information and should be further divided into sub-sections. For more guidance on how to organize sections, see Wikipedia:Manual of Style and Wikipedia:Layout. When an article has grown very large, it may be beneficial to move some sections to other articles and replace them with summaries per Wikipedia:Summary style – see also Size guideline (rule of thumb) below.
Still only for discussion, but I would like to hear how close we might be to some sort of reasonable consensus. Peter Isotalo 20:10, 18 July 2023 (UTC)
- I would support adding "This helps organize content, especially for readers that are looking for information about a specific aspect of the article topic" to the existing second paragraph of this section. Nikkimaria (talk) 04:10, 19 July 2023 (UTC)
- no. You will start creating loopholes! IE. I, myself am commenting on this issue and don't even have a concept of the article! No, it is I who needs to read the article first, to understand, without going to subsections, then I will recommend audio in a scenario of size. Yours sincerely. Michaelcockrell7 (talk) 06:14, 12 November 2023 (UTC)
- On something way up near the top: "like the maintenance burden, and the typically poor and redundant and off-topic prose found in many(most?) excessively long articles." I've been in the process of writing (not from scratch, but from a really bad allegedly C-class but in length only, more like Stub in quality) a detailed article on a semi-major topic, using virtually all available reliable source material (I'm even keeping a log on the talk page of more sources to get), and covering it all has produced a very long article. In the course of doing all this, these sources have been also very useful in other related topics, and it's become clear through direct experience that the "maintenance burden" argument is faulty. It is vastly easier to work the new sources and their facts into a single article, to keep the citations in good order, and to keep the material from becoming repetitive, self-contradictory, or otherwise problematic, than it has been to use the same sources across related articles. While I've opened a thread on the talk page about plans for splitting, I'm in no hurry to do it until I've exhausted the sources available to me, for that very reason. The difficulty of "mining" the sources for what they are worth across that topic space would become almost exponentially more troublesome, and thus discouraging and laziness-inducing (like leave this fact out of that other article even though per WP:SUMMARY it should be in there, etc.). Maybe even discouraging to continue at all. And see also Hawkeye7 way below: "Splitting an article means that some information will be duplicated, which increases our maintenance burden, as the two will have to be kept in step."
- While the guideline briefly touches on the idea that there is no big hurry in splitting up a long article, I think it should more clearly state the point that if someone is actively developing the material and doesn't want it split yet, that they should be listened to since they're doing (or doing a significant portion of) the work.
- Second, maybe there are lots of long articles that are full of redundancy and other crap prose, but – while I won't blow my own horn about the alleged quality of my writing – there is no redundancy in that long piece at all that I have not intentionally put in there in preparation for splitting into multiple articles, nor is anything in it off-topic, though some short bits have been written with an intent of merging them out to other pre-existing related articles or new ones that will result from the split-up. (They are not non sequitur or COATRACKS, but they are in a few places more detailed than necessary for that article, but are just right for a post-split side article, and will be swapped out with SUMMARY versions in the presently-long article). The point being, "long = low-quality" isn't a good assumption to make.
- I realize this has no real impact on the ultimate size of the more-or-less-completed article (or I wouldn't've proposed splitting it and been preparing for that). Just want to avoid people pre-emptively splitting it in the middle of my now three-month marathon of work on it. — SMcCandlish ☏ ¢ 😼 12:14, 27 July 2023 (UTC)
- I think the situation you mention is a red herring in terms of the maintenance discussion. Articles built by a single user are not a big maintenance burden at any length. They are fully maintained by the individual in question, in the same way that a lengthy book written by a single author is similarly simple enough to maintain, with small updates by the author in new revisions. It's articles which sprawl over time due to various edits to particular sections and subsections by a number of different editors which are difficult. The difficulty of maintenance is also a function of broadness. The broader the concept, the more it could sprawl and the more difficult it is for any editor to be across the various topics covered in the article. (I don't know what article you are working on, but given you are close to using all available sources, I assume it must be somewhat specific.) Broadness also affects how summary style can be applied. The broader a topic, the more likely it is to cover something that could sensibly make up its own article. CMD (talk) 12:47, 27 July 2023 (UTC)
- I see where you're coming from, in the shape of the larger debate about this material, but my concern is narrower. The thing is, as soon as it got long, someone slapped a length "objection" tag on it, so it's not a red herring from my position; someone at least in theory wants to split it up before it's ready for that from the perspective of the person doing the work, but this guideline doesn't give me any solid rationale along those lines that I can cite. It's not urgent because no one is actually trying to force that split right this moment, but that could change in a day or an hour, meanwhile I have at least another 2 months probably of work to do on the piece, and a split up in the middle of it would be very disruptive to that work (or inimical if you like, since "disruptive" has a specially defined meaning in WP jargon). — SMcCandlish ☏ ¢ 😼 23:51, 27 July 2023 (UTC)
- I agree that the size limit is arbitrary and should go. SMcCandlish's argument is good, and I want to extend that argument to current event articles; it's true that those articles may have proseline issues, but I think the length limit prematurely kneecaps those articles, and worse, it favours early coverage (which got freely added to the article) at the expense of later coverage (whose addition is impeded by the {{too long}} warning; even if the later coverage is more relevant, there's an inertia towards keeping what we already have). After {{too long}} was added to 2023 Israel-Hamas war, prose additions slowed down significantly, and surprisingly, that activity didn't shift to the child articles. Incidentally, there are far too many child articles, which are getting BLAR'd and reinstated repeatedly, and they're disorganized to the point where there's now an "outline" article for the war. It'll take a while to fix. If we hadn't added the tag, and allowed the article to keep growing, and only later did calmly reasoned splits, I think the overall result would be superior. (FTR, I'm mostly on the "pro-NOTNEWS" side, if you've followed those recent arguments elsewhere, but I think the size limit is a lose-lose for both sides of that argument, not a win for the "NOTNEWS"/"higher-level coverage" side as is thought).
- As others said, while the data doesn't support any clear conclusion, people tend to seek out specific parts of articles, not read top-to-bottom; I doubt readers care about overall length, though they do care if our coverage is biased towards early events and against recent events and analysis, and if things are organized clearly (not prematurely split) and, ideally, easily accessible (not needing an "Outline" article to figure out where things are).
- My argument only addresses the impact on major current event articles, which is a minor point; many other good reasons have already been given here. DFlhb (talk) 08:25, 30 October 2023 (UTC)
- I would respectfully like to point out that technology is always evolving, loading and processing times are getting faster, and that Wikipedia articles tend only to keep growing and not shrinking, in the same way that annual global inflation is never 0% or negative. So to assign arbitrary and capricious numbers on article or section lengths depending on byte-based metrics will only be a Sisyphean task with constantly moving goalposts. Also, standing back and perhaps looking at the forest through the trees using a different perspective, why not simply let supply and demand dictate the market, rather than tampering with it? In other words, the more pageviews an article gets on average within a given RfC topic area (history, biography, language, society, science, etc.), obviously more people are demanding more info and more complexity from that page itself, without wanting to take the additional time 1) to find the right subarticle, and then 2) to load that subarticle; all of which adds more time and less efficiency to the whole search-and-read process. I don't see why some detailed information that has been forked into one of a zillion subarticles of a particular page needs to be mutually exclusive and cannot also be included on the main page itself, in the appropriate subsection, in order to actually enhance the efficiency of the total "search and get the needed information quickly" process. Castncoot (talk) 01:50, 20 December 2023 (UTC)
- There is no evidence that more pageviews = desire for more complexity and more words. Pageviews are tied to article topic, and occasionally to promotion of an article (eg via DYK), rather than article quality. A market-based approach to determine desired length would be to compare pageviews for multiple versions of the same article - eg tell readers searching for Topic X that they can read Version A at 5,000 words or Version B at 15,000 words, and see which gets clicked on more - but we can't do that effectively. Nikkimaria (talk) 02:19, 20 December 2023 (UTC)
- We have been able to compare articles with their subarticles. We know that readers are directed to the main article by search engines even when their search query is for the subject of the subarticle. We know too that readers have resistance, reluctance or difficulty navigating to the subarticles. Hawkeye7 (discuss) 03:16, 20 December 2023 (UTC)
- And to segue onto your well-worn statement there Hawkeye7, main articles get an order of magnitude more pageviews than any of their subarticles. In my humble opinion, we should be responding to the market, our readers, rather than imposing our will upon the way they should search. I favor keeping main articles for the market and subarticles for the academic researchers. If we want Wikipedia to be financially self-sustaining, then we need to listen to the market whose members (readers) are going to voluntarily want to keep Wikipedia thriving through recurrent small donations made based on their own free goodwill as a result of their feeling respected in terms of Wikipedia honoring their needs. Castncoot (talk) 04:33, 20 December 2023 (UTC)
- Your market-based framing is a non-starter for several reasons—the easiest one being that it's absolutely not our job to ensure the site is "financially self-sustaining". Any compelling argument one can make about site policy can just as well be made without any mention whatsoever of donations or the site's finances. Remsense留 04:39, 20 December 2023 (UTC)
- To be clear, the market is our global readership who donates to keep Wikipedia alive. Wikipedia will never be bought or sold to corporate interests. Hence, we have to listen to our global readership rather than telling them what to do and how to search against their own instincts. Castncoot (talk) 04:45, 20 December 2023 (UTC)
- It is not our job to consider what will promote donations per se. If you want to argue in this vein, you can simply argue that it is better for the reader. Remsense留 04:48, 20 December 2023 (UTC)
- Very well then! It is indeed much better for the reader! Castncoot (talk) 08:20, 20 December 2023 (UTC)
- One big problem is that search engines are really bad at matching people's queries to the most appropriate Wikipedia article to put in the little special wikipedia result box. If you search for the exact title or something extremely close, you'll get what you're looking for, but if you search for a synonym, even if Wikipedia has a redirect pointing from that term to the right page, mentions the synonym in the article text, etc., the search engine will often just leave it out.
- Personally I don't think Wikipedia should try especially hard to cater to the exact technical bugs and limitations of proprietary third party tools. But it is kind of unfortunate.
- First-party search on Wikipedia could be a lot better, but I'm not sure how much budget / technical attention there is for improving that. –jacobolus (t) 04:47, 20 December 2023 (UTC)
- For market demand, the closest evidence anyone has found so far is for journal articles. The demand is for articles with fewer than 10,000 words.[1][2][3][4][5][6][7][8] In terms of market research, we can look at the market for summaries. Readers usually only use the leads, which are about 500 words long, rather than scroll through 45,000 words e.g. List_of_Assassin's_Creed_characters, i.e. there is higher demand for more summarised info. [43] The Vietnam War article gets 400,000 views per month; assume 25% go past the lead i.e. about 100,000. Thousands of books on it are sold each year, but I'd be amazed if it's the same level of demand as for the WP article. I.e. all the demand evidence says there is higher demand for more summarised info, Tom B (talk) 19:25, 20 December 2023 (UTC)
- @Tpbradbury: should meeting market demand be our goal, though? And lists probably aren't a good example since they aren't generally intended to be linearly read as if they were prose. VQuakr (talk) 19:34, 20 December 2023 (UTC)
- you can ask castncoot, he brought up market demand! Writing an encyclopedia article to yourself, rather than readers seems self-indulgent! i take your point that there is bigger demand for the britney spears article, but the article on music, which isn't currently referenced, should take priority in many ways, Tom B (talk) 20:29, 20 December 2023 (UTC)
- That's nonsensical and does not follow from my reply at all. Wikipedia is an encyclopedia. Writing encyclopedia articles on it is not self-indulgent; if a particular reader is looking for something hyper-summarized then Simple English Wikipedia is a separate project, or maybe a Wikimedia site shouldn't be their search target at all. Music is definitely referenced as of this writing. VQuakr (talk) 22:12, 20 December 2023 (UTC)
- hey, thank you, yes Music has references, but there are currently sections like composition that don't have any. I should go reference! "should meeting market demand be our goal, though?", a fair question. castncoot brought it up. there's demand for articles on pop culture, but I appreciate what you're saying e.g. there might not be demand for George Washington's political evolution, until someone supplies it. For hyper-summarised info, people read leads, which I've noticed smart editors have been working on, and discussed having a competition to accelerate the development of. We should prioritise leads, what do you think? Tom B (talk) 11:01, 21 December 2023 (UTC)
- The submission guidelines for scholarly papers published in typical academic journals are totally irrelevant here. The submission guidelines from a sonnet contest will suggest shorter word counts, and the submission guidelines from a publisher of textbooks for year-long undergraduate courses will suggest longer word counts, but who cares? None of these are apples-to-apples, and just listing a whole bunch of different journals doesn't make this claim any more convincing.
"assume 25% go past the lead" – this is a wild guess, but it's also kind of irrelevant what the percentage is: if many readers find what they need in the lead, then the article is doing a great job (for those readers) irrespective of how long the rest is. But to really determine how various readers engage with Wikipedia takes a much more detailed study of reader behavior than aggregate page-view statistics on specific pages.
"higher demand for more summarised info" – this largely misses the point. Wikipedia serves a wide range of readers with widely varying needs and interests, and there are plenty of readers for whom "more summarized info" does not meet their needs. –jacobolus (t) 19:44, 20 December 2023 (UTC)
- Journal articles are much more relevant to encyclopedia articles than sonnets or books. Have you evidence of relevant guidelines? Wikipedia:How_to_create_and_manage_a_good_lead_section#Importance_and_purpose_of_the_lead_section says 40% of mobile readers go past the lead i.e. about 240,000 readers of the Vietnam war article. What is stopping most people scrolling through the 27,000 words in California Trail? The point castncoot was making was about market demand. To work out what demand is, or what readers need, we need to do better than 'plenty', Tom B (talk) 21:03, 20 December 2023 (UTC)
"need to do better" – That would be wonderful, please feel free! No one is stopping anyone from running a detailed observational study of Wikipedia reader behavior, though it would take quite a lot of effort/resources to do it. Maybe you could find a library science department somewhere or similar with the grant money / grad student time to work on that. –jacobolus (t) 21:57, 20 December 2023 (UTC)
- Agree it would take a lot of effort. So, what is the best evidence we currently have? What is closest to an encyclopedia article: a journal article, book or sonnet? Some of us think it's a journal article, others think it's books. Journal articles have a limit of 10,000 words. The article on the United States is 9,000 words and on all Human history, 10,000 words. The current guideline says "almost certainly should be 15,000 words". Would changing it to "should be 15,000 words" remove confusion and help? Would reducing it to 10,000 words, increase the quality of the pedia by increasing summarisation? Would removing the guideline decrease quality by leading to very long articles that are comprehensive, but don't summarise as well? Tom B (talk) 11:19, 21 December 2023 (UTC)
- Am I sensing the beginnings of a breathtaking discussion here between several of us that could be a game changer to alter the face of Wikipedia forever? I'm starting an RfC here, and if anyone feels it would be more appropriately placed below, please feel free to do so.
RfC Reader's choice for extended length versus condensed length
Wrong forum, try WP:VPR. Broken and buried in the middle of a very long section. |
---|
Inasmuch as multiple readers or even a single reader at different times may seek more extended or conversely more condensed versions of longer articles, and especially inasmuch as Wikipedia article sizes are generally expected to continue on an indefinite growth trajectory over time - in concept, should we begin to explore the idea of the creation of both an extended version as well as a condensed version of longer articles on the English Wikipedia mainspace, and hence allowing the reader the choice of either option at any given time? The condensed version would be one that heavily emphasizes tight summaries and forking to subarticles within English Wikipedia, (bearing no relationship to its counterpart article on Simple English Wikipedia); while the extended version would be more comprehensive and allow for more important points from subarticles to be included into its own sections and subsections, with generally much more tolerance of length vis-a-vis both sections and the article itself? Castncoot (talk) 01:40, 21 December 2023 (UTC) |
Research-based observations
I've observations from two of the studies listed above, one on average reading time and one on the use of sections.
The measured average time that readers linger on articles that they actually want to read a bit more extensively is about 45 seconds.[44]
Applying a reading speed of 238 words/minute (as cited by Nikki above), that would mean that a reading session is about 180 words or just over 1 kB of prose. And that's assuming people actually read the entire time rather than simply browse. So the average reading session is below a "don't bother to split"-limit that has been around for over 20 years.[45] It also has zero relevance to the current "proven" attention span argument, and is not even 10% of the limits proposed in the earliest "readers may tire"-argument from March 2004.[46]
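For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 45 seconds and 238 words/minute are the figures quoted above; the 6 bytes per word (about five letters plus a space, plain ASCII) is purely an assumption for illustration, and the function names are made up for this example.

```python
# Figures quoted in the discussion above.
DWELL_SECONDS = 45        # average time readers linger on a page
WORDS_PER_MINUTE = 238    # reading speed cited earlier in the thread

# Assumption for illustration only: about 5 letters plus a space per word.
BYTES_PER_WORD = 6


def words_read(seconds: float, wpm: float) -> float:
    """Words covered in `seconds` at a steady reading speed of `wpm`."""
    return seconds / 60 * wpm


def prose_bytes(words: float, bytes_per_word: float) -> float:
    """Approximate prose size in bytes for a given word count."""
    return words * bytes_per_word


words = words_read(DWELL_SECONDS, WORDS_PER_MINUTE)
size = prose_bytes(words, BYTES_PER_WORD)
print(f"{words:.1f} words, about {size / 1000:.2f} kB")
```

With those inputs the session comes out at roughly 180 words and a little over 1 kB, in line with the estimate above. A different bytes-per-word assumption would shift the kB figure but not the word count.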
Regarding how readers navigate articles, content further down in an article is far less likely to be read, but readers also seem perfectly capable of picking out specific sections they want to read, regardless of how far down they are.[47] If you look at the two images on the right that present the results you'll note how readers seem to favor quite specific sections, not just those that are closest to the top. To me it seems to indicate that proper organization of an article may matter far more than its total size.
Peter Isotalo 14:47, 7 July 2023 (UTC)
- That 2003 limit appears to be on total bytes, not readable prose as in the current page. Nikkimaria (talk) 01:00, 8 July 2023 (UTC)
- That's true, but the point is that it's the origin of the "readers may tire"-argument used today. It didn't appear based on fact or research, but was simply something someone just made up at some point. Peter Isotalo 08:30, 8 July 2023 (UTC)
- It's not, though - the original limits were rather more heavily focused on technical limitations. Nikkimaria (talk) 12:59, 8 July 2023 (UTC)
- You're splitting hairs. The 2003 version clearly says that "[r]eaders may also tire". It wasn't the primary argument back then, but it's exactly this that morphed into the "Readability" section that is currently based on pure conjecture and the notion of a fantasy reader type that doesn't exist.
- Yesterday you cited a whole bunch of research about attention span. Now the data says that the average reader reads at most 1 kB at a time. Are you going to argue that we reduce the maximum article size to 200 words? Peter Isotalo 14:35, 8 July 2023 (UTC)
- No, I'm going to argue that the data you've presented doesn't help us identify an alternate appropriate maximum article size. Nikkimaria (talk) 14:46, 8 July 2023 (UTC)
- (Disclaimer: I am the author of African humid period which is pretty lengthy) Well, I would argue that these data show that the amount of prose in the article is irrelevant to readers and thus can't be used to justify any maximum article size. That leaves editability (because VE in particular hangs up on very long articles), technical issues (the template limits) and connection issues (articles+image combinations that are overly large in terms of byte size can overtax connections) however. Jo-Jo Eumerus (talk) 05:31, 11 July 2023 (UTC)
- This is pretty much what I believe we need to be moving towards. Not saying we need to go full yolo and just abolish maximum size completely, but I think we need to focus on arguing for two separate issues:
- editorial capacity to maintain article quality
- technical limitations that actually make article loading difficult
- I think the issues need to be argued separately from each other. And I think the technical limitations absolutely need to be backed up by actual reader data. Personal experiences and anecdotal evidence make for awfully messy discussions. Peter Isotalo 01:17, 13 July 2023 (UTC)
- I would formulate it as such:
- How long it takes in VE to edit a long article.
- For FAC/GAN/other content processes, how much work it takes to review a long article.
- How long a text+image combination can be before loading the article becomes difficult.
- Jo-Jo Eumerus (talk) 06:23, 13 July 2023 (UTC)
- Regarding requirements for review processes, is it really something that can be included here? Wouldn't that make this page something of an extension of the FA and GA criteria? Peter Isotalo 20:15, 18 July 2023 (UTC)
- I agree with Jo-Jo that the amount of prose in an article is irrelevant to readers and thus can't be used to justify any maximum article size. I don't think attention span is a good argument for cutting article size because attention span varies from person to person. There's a possibility people will read an article from start to finish if they find it interesting enough but why does that matter? Everyone has their own reason for why they are reading a WP article in the first place. Some are only looking for a specific piece of information while others are just curious about the article subject. Many (most?) people don't read articles from start to finish even if they have a prose size of less than 50 kB. Does that mean articles should be split at a smaller size? It just doesn't seem logical to me. Volcanoguy 01:16, 21 July 2023 (UTC)
- I don't usually read a book all in one go. Hawkeye7 (discuss) 19:28, 22 July 2023 (UTC)
- Maybe not but this guideline currently gives the impression that people should be able to read an article in one go. Volcanoguy 06:37, 16 November 2023 (UTC)
- My view is that we need to get rid of the readability argument at least for the time being. The reasoning we've relied on for the past 20 years simply doesn't hold up to scrutiny. See the suggestion above for a rewording of what's currently under "Readability". Peter Isotalo 20:05, 22 July 2023 (UTC)
- Or at least moderate the readability claims, and start including real evidence-based material. E.g., MOS:DL was recently overhauled on the basis of a study (WMF's I think) that showed readers, especially on mobile (now over 50% of our readership at any given moment), don't read top-to-bottom but jump around all over the place; thread is over here. I also agree with much of what Jo-Jo Eumerus wrote, other than I think that FAC/GAN are non-concerns unless/until an article is headed for one of those processes. Readers generally don't notice or care about them, only a certain camp of editors do, and many of us are not in that camp. E.g., I consider it vastly more important to improve crappy Stub and C articles into B-class than to polish the chrome on articles that are already encyclopedic enough to be useful. And I'm not alone in that. I don't think articles should be split up on a GAN/FAC review basis unless there's certainty that the review is going to happen and soon. Splitting well can take a tremendous amount of work, and when it's done poorly, the results can be very unhelpful to readers. — SMcCandlish ☏ ¢ 😼 12:14, 27 July 2023 (UTC)
- There is a counter-balancing technical point: splitting an article means that some information will be duplicated, which increases our maintenance burden, as the two will have to be kept in step. Hawkeye7 (discuss) 06:52, 10 July 2023 (UTC)
- Isn't that why we try to write ledes? Essentially, we begin each article with a heavily condensed version of the article that covers and summarizes the main points... this lets people decide whether they are done after the first few paragraphs or whether they want to go deeper (and either way, we have something tailored to their preference). jp×g 04:58, 9 August 2023 (UTC)
- Partly - see the explanation at WP:DETAIL. Nikkimaria (talk) 02:45, 10 August 2023 (UTC)
- "A reading session is about 180 words". That reflects my usage of Wikipedia as a reader as well. I usually consult it for concise answers of who, what, when, why, not to read in general more than a small section. That does not mean I don't support long articles though. Regards, Thinker78 (talk) 20:27, 22 October 2023 (UTC)
- The readers who are after concise answers tend to jump in to the section they hope contains the detailed information that they are looking for. Hawkeye7 (discuss) 22:47, 22 October 2023 (UTC)
Research about how long readers spend on the page before bouncing, how long a typical person can read continuously before tiring, etc. are not really that relevant to this question in my opinion. Wikipedia articles should not only be designed to be read in one sitting by a single committed reader, nor should they be judged by the typical reader behavior as described by a couple pieces of timing data stripped from other context. Articles are many things to many readers: readers arrive from various sources (search engine, social media / direct mail or message from an acquaintance, link in another web document, personal bookmark, wikilink from various other articles, ...) and have widely varying levels of preparation, interests, goals, needs, and reading styles. Some would be satisfied with a half-sentence, and others could read a book and still want more.
Some want a short description of WTF the subject is / what field it's in, some are procrastinating by browsing around following wikilinks from one topic to another, some are opening up 20 wiki articles in tabs and skipping back and forth between them, some are reading a textbook and opening wiki articles about the unfamiliar terms as a supplementary resource, some want to fact check a particular detail they heard on the news, some want to learn about a specific subtopic and might skip most of an article but compare details across multiple articles, some want to learn a new challenging technical subject and plan to repeatedly revisit an article over time, some are looking for an interesting bedtime read and will keep reading as long as they find a nice narrative arc, some are students writing a paper who want to plagiarize the article, or rather look at Wikipedia's sources so they can cite them, some are researchers who will follow up the cited sources and do a deep dive, some are book authors (or blog authors) who want to find freely available images to use in their own works, some are programmers using articles as a reference for code/formulas to adapt in implementing their own code, etc. etc.
I would guess a vanishingly small proportion (less than 1/100, maybe orders of magnitude less) of total page views to articles of even moderate length ever result in someone reading the whole article straight through end to end. But that doesn't mean any particular part of the article is bad, or too long, or off topic, or should be removed. For instance, a section of niche interest to specialists, containing tabular data, or including rarely needed technical reference details might only be examined carefully by a trivial fraction of page readers, but could still be important to include somewhere (ideally nearer the bottom than the top of a page). Such sections generally do no harm to anyone who isn't looking for them, so long as they don't become magnets for collections of unrelated trivia or material that is clearly unencyclopedic; deciding this should be left up to local consensus and editors' discretion.
To concretely/specifically understand how people use articles is going to take much more detailed research than anything I've seen above or elsewhere. It will take finding (a large number of) specific readers who have some particular goal in using their computer / the internet (write a paper, learn about a topic, answer a question, ...), tracking their full browsing session (to see how/whence they arrive at articles), and then analyzing it in detail, including asking them detailed questions about which parts they read / skimmed, which parts they needed to know, which parts they found meaningful/interesting, what they plan to come back to later, etc. etc.
Trying to set a uniform standard for how long every encyclopedia article should be, irrespective of subject matter, article style, relation to other articles, importance of the topic, etc. is hopeless, and should never have been described as a "guideline", even if qualified as a "rule of thumb". The "WP:TOOBIG standard" was inevitably going to be a half-assed justification to abusively plaster {{too long}} banners all over the place and wikilawyer about one editor or another's preferred inter-article organization scheme.
The important criteria for articles are that they have clear scope, are clearly organized, stay on topic, have a moderately clear narrative flow (esp. within sections), put the most important information nearer the top, are well illustrated especially near the top, link obviously to relevant nearby/overlapping topics, etc. What byte/word count an article has is nowhere near as important, and does not in my opinion meaningfully help readers except insofar as it focuses attention on one of these more important primary goals. To the extent that bikeshedding about byte counts distracts from those criteria, it is actively harmful. –jacobolus (t) 04:41, 26 October 2023 (UTC)
- In a word: yep. — SMcCandlish ☏ ¢ 😼 05:22, 26 October 2023 (UTC)
- Agreed. XOR'easter (talk) 16:28, 27 October 2023 (UTC)
- The main purpose of WP:SIZERULE is to complement WP:NOT and make sure that Wikipedia stays an encyclopedia, rather than becoming a dumping ground of unmanageable crap. If you think readability is the overriding motive, you've obviously never read a Wikipedia article! Nosferattus (talk) 01:46, 28 October 2023 (UTC)
- I don't understand your point. Can you explain how "don't dump crap" is co-extensive with the standard of "try hard to split articles at 10k words and never allow them to grow bigger than 15k words", as this is employed by wikilawyers in practice? Those two things seem largely unrelated. There are heaps of individually short "unmanageable crap" articles strewn across the project, as well as several excellent very long articles. If the purpose is just to tell people not to dump crap, wouldn't it be better to just say so directly? –jacobolus (t) 02:05, 28 October 2023 (UTC)
- Generally have to agree with jacobolus on this. WP's crap problem is almost entirely in two forms: junk articles, mostly in popular-culture spheres, and indiscriminate "In popular culture" sections that just add as much trivia as humanly possible. Our average long article is not full of unencyclopedic crap, but is the work of concerted editors trying to be comprehensive. If the content in them were crap, this would not at all be an issue for article size and for splitting, but for removing crap per WP:NOT's various criteria. We would have no interest in taking unencyclopedic garbage found in a long article and spinning it out into new side articles of the same unencyclopedic garbage. (Aside: The problem of junk pop-cult articles is only ultimately solvable by adjusting WP:GNG to be more stringent, probably discounting entertainment news as sources that help establish notability. The fact that some random actor has credits in 5 movies or TV shows and has been mentioned repeatedly in entertainment news – which over-dwells on name-dropping of actors and serves little purpose other than promoting actors and the works they are in, to the benefit of the media-company advertisers who keep these publications alive at all, and thus they lack independence from the subject – doesn't really make the actor encyclopedically notable, it just makes them marginally competent enough in their field to not have given up and gone back to waiting tables or driving for Uber.) — SMcCandlish ☏ ¢ 😼 04:33, 28 October 2023 (UTC)
- The connection with WP:NOT lies in the spurious belief that an encyclopaedia must consist of small articles. This was necessary in the days of print encyclopaedias because there was limited space. Wikipedia, however, is WP:NOTPAPER and has no such limits. (The Encyclopaedia Britannica also has the Macropedia with its large articles, but this was not what the authors of WP:SIZE had in mind.) The "In popular culture" sections have a similar origin: the widespread belief that the Wikipedia should be restricted to articles on popular culture. By adding trivia they are, in their mind, saving the article from being deleted under WP:NOT. Hawkeye7 (discuss) 09:12, 28 October 2023 (UTC)
- We might take for example the Dictionary of Scientific Biography, which includes a biography of Isaac Newton which runs to 61 pages. –jacobolus (t) 02:24, 30 October 2023 (UTC)
- (de-indent) But... we do have a 61 page biography of Isaac Newton here? It's just split across Category:Isaac Newton for summary style reasons. Let's say that book has around 450 words per page; 450*60 = 27000 words. From the PROSESIZE tool, Isaac Newton comes back with 7000 words, Early life of Isaac Newton at 5000 words, Later life of Isaac Newton at 5000 words, Religious views of Isaac Newton at 2500 words, Isaac Newton's occult studies at 4000 words, and Isaac Newton's apple tree at 2000 words. That's 25500 words out of 27000 already; surely the DSB biography delves into some of the science he was involved in, and if we include the other Newton-adjacent things in the category like rotating spheres, Leibniz–Newton calculus controversy, and others, Wikipedia surely beats out the Dictionary of Scientific Biography in word count. I don't think anyone is demanding Wikipedia narrow its focus on topics with a lot of ground to cover like Newton, just... spread it out, so that the casuals looking for a basic overview read just the lede of the main article, mildly more diligent people read all of the main Isaac Newton article, and graduate students or Newton fans who really want to drill down into the nitty gritty read the subarticles. And to be clear, there's some pretty savage dropoff in reader views in subarticles, traditionally (see this set of well-maintained articles, where the main article has ~3,000 hits daily, and the spinoff articles generally have single-digit daily hits, with just two of the subarticles squeaking up to ~100 hits daily). That suggests that stuffing the information back in the main article will just result in it getting skipped if it's getting so few clicks relatively: most readers aren't bothering. But screw it, I've certainly worked on plenty of single digit daily hit articles for that one random reader who's interested. 
So in-depth coverage is welcome, but it can be done in compatibility with existing summary style, and further, it's a good idea. SnowFire (talk) 02:22, 2 November 2023 (UTC)
- FTR, the Complete Dictionary of Scientific Biography (Scribner's, 2008) gives Newton about 35,000 words. EEng 08:20, 2 November 2023 (UTC)
- Sure. I only oppose arbitrary splitting, splitting just for the sake of it, or to meet arbitrary word limits. I would have opposed, for example, splitting John McCain III's political career in two at 2000. This comes in part, from splitting articles and then having to defend the new subarticle at AfD. If you had asked me who John McCain was, I would have said he was an admiral. Hawkeye7 (discuss) 02:56, 2 November 2023 (UTC)
- I can't agree with the rather odd conclusion "stuffing the information back in the main article will just result in it getting skipped" leapt to after this: "there's some pretty savage dropoff in reader views in subarticles, traditionally (see this set of well-maintained articles, where the main article has ~3,000 hits daily, and the spinoff articles generally have single-digit daily hits, with just two of the subarticles squeaking up to ~100 hits daily)." SnowFire seems to be assuming that the facts/content itself that has been shunted into a side page is intrinsically of lower reader interest (a quality that sticks to it, no matter where we put it), when there is no evidence of this, but a lot of counter-evidence from various studies of WP and general web usage that people are simply resistant to following links to additional pages to get information. Side articles get lower views because they don't match up with simple searches, they take more work to get to, and they pertain to narrowed scopes that match fewer aggregate interests. These are properties of the "container", the side article, not of a discrete fact that someone subjectively puts into that container. What this all tells me is that moving content into side-articles has a ghettoizing effect. The presence of "fact X" in John McCain means it will necessarily get more readers seeing it than moving it into something like Early life and military career of John McCain, and someone looking to whitewash or otherwise PoV-push in our content will probably know this and use it to their ill-motivated advantage. This effect is actually doubled, because side articles almost always have far fewer watchlisters and other interested parties, so it becomes much easier to completely suppress or PoV-alter material after it has been ghettoized to a side article. 
This isn't to say no long articles should be split, of course, but we have to be aware of potential consequences, and also should not be jumping to unsupportable conclusions about the intrinsic interest to the reader of something that could be moved to another article. That's putting the cart before the horse. "Fact X is in side article B" doesn't magically make fact X of lower innate interest to anyone, but it certainly will translate to more difficult findability by a reader looking for it, and lower visibility/access by readers in general. — SMcCandlish ☏ ¢ 😼 07:20, 2 November 2023 (UTC)
- From my experience, biographical subarticles in particular get ignored by search engines. You would hope that a search for "john mccain pow vietnam" would turn up Early life and military career of John McCain in its first page of results, but it does not on either Google or Bing. You just get John McCain. Similarly with a search for "john mccain senator" not finding either of the congressional career subarticles (which, in retrospect, were a bad idea to begin with). Wasted Time R (talk) 11:22, 2 November 2023 (UTC)
- Much of what SMcCandlish says above is non-controversial. Yes, spinoff articles can be used for POV-pushing and can easily fall into a trap if the few watchers they have move on and they become "dumping grounds". And yes, picking and choosing which facts make the main article is inherently a powerful editorial decision, with facts relegated to subarticles likely being deemed as less important. That is in fact precisely what I was saying. And I think people are aware of the consequences of subarticles (I certainly am, at least). And it sounds like we both agree that there are still plenty of times when this is the right trade-off. So... this isn't something that goes against what I said.
- We're going off the rails as far as the claim about lesser facts. I'm not saying that less important facts are less important because they're in a side article (which is obviously backwards), but rather that less important facts should go in side articles (well, when there's cause for a side article at all, for topics with tons of stuff written about 'em that don't fit). McCain getting 89,116 votes in his 1982 election is probably less of reader interest than him being held as a POW in Vietnam, hence why one is in the Electoral history subarticle and the other is in the lede of the main article. (And why I picked a well-maintained set of articles that can reasonably be trusted as having good judgment on what facts to stick where.)
- Anyway, as far as research goes, per the initial comments in this section, the vast majority of hits (some of which are bots, in fairness) stick around on the page for a very short period of time - enough to read the first paragraph or maybe the lede. People who read the entirety of our longer articles are rare, and would become rarer if we weakened the SIZE guidelines and just started stuffing articles with everything. Being able to get to the gist is part of what makes a good writer. All of the potential problems with split-offs SMcCandlish mentioned are absolutely true, but (and here's the value judgment part) adhering to size guidance is still more important. We should be offering a snappy, concise 15-to-20 pager as an introduction to Newton (or whoever) for readers interested in that. SnowFire (talk) 18:04, 2 November 2023 (UTC)
- Agree with SnowFire; expanding size opens up to poor additions of everything under the sun, rather than encouraging encyclopedic focus. SandyGeorgia (Talk) 18:22, 2 November 2023 (UTC)
- Wikipedia is a compendium of knowledge, but the form it takes - what is regarded as "encyclopaedic" - is not bound by the conventions of paper encyclopaedias. I've had readers who resented having to go to a subarticle for the details they were looking for, yet could find the time to complain about it on the talk page. Our solution to keeping the articles to an arbitrary size - which increasingly lacks cogent justification - is summary style, the creation of subarticles, since unlike a paper encyclopaedia we are not limited in the number of articles we can have. But there is an inherent tension between summary style and notability. There are the POV splits, and the practice of unloading toxic waste like "in popular culture" sections into subarticles (which then get nominated for deletion). Hawkeye7 (discuss) 23:16, 2 November 2023 (UTC)
- SnowFire's response was well put, and assuages many of my concerns, but Hawkeye7's issue about an arbitrary size limit remains a live one. I'm not sure what the way around it is (or would have posted a concrete proposal for it by now!). — SMcCandlish ☏ ¢ 😼 08:55, 3 November 2023 (UTC)
- My point is that "length of an encyclopedia article" in the broader world doesn't have some kind of inherent cap, even in a paper encyclopedia where there are relatively steep trade-offs for every extra page. This is not a Newton word count contest. In my opinion any "main" article should cover all of the important aspects of the subject in a reasonably self-contained way, in sufficient detail to match the importance/extent of each subtopic. A side article with even further detail doesn't absolve the main article from its "responsibility" to cover that subtopic. Too often on articles here, the {{main}} template and "summary style" is used as an excuse to put a uselessly and often misleadingly short summary, under the theory that anyone who cares will just click the link. This article size guideline should not be used as a justification for such changes. When material is removed (whether or not it gets summarized), it should be on the basis that the removed material was veering off topic or out of scope, giving undue consideration to a particular subtopic at the expense of the main subject, interrupting the narrative flow of the article, or the like. Not just that the whole article hit some hard word count limit. Different subjects take more or less detail to adequately cover. There are some subjects about which barely anything has been written, and the most we can write is a few hundred words, mostly about the broader context. For other subjects, 15,000 words is really not enough. –jacobolus (t) 14:36, 3 November 2023 (UTC)
- I think this is bringing up one issue to argue against something else. Bad writing is bad, but I could easily flip around your example and say that spinning a long digression off into a subarticle is improving "the narrative flow of the article." And I suspect this case is far more common than the improper spin-off that interrupts the flow. Discipline about size tends to improve writing, not make it worse. Sure, there's no "inherent" limit, but there is a practical limit, and if you're going over 10,000 words you've probably hit it.
- There's no shame in summary style removals. Take Chemistry and Category:Chemistry. These other articles in the tree are not discussing matters "off-topic" or "out of scope" for the top-level Chemistry article; just the top-level Chemistry article needs to be an encyclopedia article and not a five-volume textbook.
- Undoubtedly there are topics where 15,000 words are not enough, nor even 1,500,000 words. Great, make subarticles! Or a freely licensed Wikibook, perhaps. That doesn't argue against the reasonable WP:SIZE limits of how long one single page of an encyclopedia article should be. SnowFire (talk) 19:28, 3 November 2023 (UTC)
- Category:Chemistry is not an article, and it gets 1% of the views of Chemistry so is demonstrably not something that readers look at or care about in practice. "Chemistry article needs to be an encyclopedia article and not a five-volume textbook" – nobody has ever proposed anything like this hyperbolic straw man. –jacobolus (t) 21:51, 3 November 2023 (UTC)
- I trust that you will understand what is actually meant here: not the category page itself, but rather the articles within the category (Organic chemistry, etc.), many of which are topics that are perfectly validly part of chemistry and not off-topic, etc. SnowFire (talk) 23:35, 3 November 2023 (UTC)
- An article about a very expansive topic like "Chemistry" or "History" necessarily has a huge number of topics to cover. Not only subject content about the table of elements and chemical reactions and so on, but also meta-information about the history of chemistry, the methods and tools used by chemists, the relation of chemistry to other scientific disciplines, the practical applications of chemistry, the economic impact and organization of the chemical industry and other industries with a heavy reliance on chemistry, chemistry as a career, chemistry education from secondary through postgraduate school, the organization of the chemistry research community, etc. etc.
- Our current article is quite limited and if I try to imagine an ideal Wikipedia article about the subject, it would stretch easily to 15,000 words if not beyond – that is, we could probably triple the length of current article without getting bogged down with an excessive level of detail about any particular subtopic. Most of the additions belonging in an ideal article would be topics that our current article doesn't even mention let alone cover adequately, rather than additional detail about the subjects already discussed (though I'm sure there's room for that too). Disclaimer: I don't know that much about chemistry and have very limited personal connection to the subject, beyond taking a 1 year course in high school and sometimes watching NileRed youtube videos with my 4-year-old. –jacobolus (t) 00:52, 4 November 2023 (UTC)
- The chemistry article does look rather paltry right now — and I'm saying that as a physicist, so I'm not trying to hype my own field. :-) XOR'easter (talk) 18:15, 4 November 2023 (UTC)
- Creating subarticles is not easy. First, unless we are going to relax WP:GNG for subarticles, we have to have a subtopic that is itself notable. So we cannot have "Article (part one)" and "Article (part two)" (except for list articles). Usually, we look for a section that can be split off, but not all articles have these, and creating them may involve restructuring the whole article. Since the section will be replaced with a three or four paragraph summary, it will have to be larger than that, or we won't substantially reduce the size of the parent article, which would defeat the purpose of the exercise. The subarticle has to stand on its own, so we may have to add a background summary that fits it into the subarticle. So there is considerable work involved. Hawkeye7 (discuss) 22:09, 3 November 2023 (UTC)
- I wouldn't take the alleged consensus against "inherited notability" too seriously. I think that if there's valid sources, an AFD is very unlikely to succeed on even tiny subtopics, if it can be shown that the sources are of strong quality (something like Influences on J. R. R. Tolkien, perhaps). It only gets dicey when it's, say, a fictional character spin-off and all the sources are primary sources. (And I personally would be disinclined to even think of that as a huge problem, but eh, no need to re-fight the "fancruft" wars of 2007-2011).
- If you want to avoid duplicate content, my personal suggestion is to slap a {{Main}} at the very top of the lede section of a spin-off article as a clue for "we really expect you to have read the above article as background." The nice thing about writing for readers hardcore enough to find their way to a subarticle in the first place is that you can somewhat trust them to click links if need be. It doesn't look like Battle of Gettysburg, first day spends tons of time going over the basic background to the battle itself, for example - it's understood that a reader clueless about that needs to read the top-level article first. SnowFire (talk) 23:35, 3 November 2023 (UTC)
- Regarding this: "I wouldn't take the alleged consensus against 'inherited notability' too seriously." If there's some other encyclopedia project where people don't take alleged consensuses about article-worthiness deadly seriously, maybe I should be editing over there instead. In general, deciding how to organize content across multiple articles is a hard problem, and I don't see how this guideline helps in any meaningful way to solve it. XOR'easter (talk) 18:22, 4 November 2023 (UTC)
- Agreed. I split Assessment of the Battle of Long Tan (4,500 words) off from the 16,000-word main article. Then had to defend the decision at AfD - twice (Wikipedia:Articles for deletion/Assessment of the Battle of Long Tan, Wikipedia:Articles for deletion/Assessment of the Battle of Long Tan (2nd nomination)) Hawkeye7 (discuss) 20:14, 4 November 2023 (UTC)
Removing Almost
WP:PG says "Be clear...plain, direct, unambiguous, and specific....Even in guidelines...do not be afraid to tell editors directly they must or should do something." WP:CREEP says, "Avoid instruction creep to keep...guideline pages easy to understand. The longer, more detailed, and more complicated you make the instructions, the less likely anyone is to read or follow whatever you write." Currently the guideline says, if an article is more than 15,000 words it Almost certainly should be divided or trimmed. Almost is definitely ambiguous and unclear, so it would make sense to simply remove Almost to avoid creep and make the guideline clear. What do people think? Tom B (talk) 17:59, 10 December 2023 (UTC)
- I would oppose changing the phrase to "Certainly should be divided or trimmed". I object to solidifying a requirement to trim an article at an arbitrarily chosen article size. Once there is evidence that no prose article should be greater than 15,000 words, then go ahead and remove the word "almost". Or double the size to 30,000 words to be safe and remove the word "almost". Otherwise, wait for size requirements to be scientifically proven before doing a major edit to the article. Removing that one word would make the guideline into a prescriptive command. Mburrell (talk) 19:36, 10 December 2023 (UTC)
- How about, "Should be divided or trimmed"? The evidence e.g. in WP:Tomat, suggests a length of 10,000 words, would you be happier with that? Tom B (talk) 20:04, 10 December 2023 (UTC)
- If we were designing articles based on one sigma from the peak of a bell curve, sure. If we are doing a prescriptive command, I would prefer four sigma, say about 30,000 words. I would have stated three sigma (probably the 15,000 word limit), but an internet article states that three sigma is 1% of all data beyond the bell curve, and I am more interested in being prescriptive for a fraction of a percent. Otherwise, we should be guiding people to a good conclusion, suggesting with words like "almost". Mburrell (talk) 20:28, 10 December 2023 (UTC)
- @Mburrell, are you certain that 30K words is four sigma? WhatamIdoing (talk) 07:56, 26 December 2023 (UTC)
- Since there is no rationale for it, it would be better to say "consideration should be given to splitting or trimming" Hawkeye7 (discuss) 20:06, 10 December 2023 (UTC)
- I would support the wording proposed by Hawkeye7. Mburrell (talk) 20:28, 10 December 2023 (UTC)
- But that is ambiguous too Tom B (talk) 20:44, 10 December 2023 (UTC)
- Ambiguous means "Open to multiple interpretations". There is no ambiguity here. Hawkeye7 (discuss) 21:08, 10 December 2023 (UTC)
- I can support removing "almost" to make the guideline simpler. We might as well remove "certainly" as well, which would also help to make it less prescriptive. Editors should not fear that every article with 15,001 words would be split, as this is a guideline and not a law. Arbitrarily increasing the guideline amount to 30,000 would not solve any problem here, as the guideline still would not stop articles exceeding that amount but permit many more articles in the 15,000-30,000 range. I agree that "consideration" would be too weak, as consideration could be given to splitting or reducing an article at any point. Onetwothreeip (talk) 21:46, 10 December 2023 (UTC)
- Editors should fear that every article with 15,001 words would be split. The relevant policy is WP:ADHERENCE: "Use common sense in interpreting and applying policies and guidelines; rules have occasional exceptions. However, those who violate the spirit of a rule may be reprimanded or sanctioned even if they do not technically break the rule." Hawkeye7 (discuss) 23:21, 10 December 2023 (UTC)
- It wouldn't be rational to fear that. This is first of all a guideline and not a rule, even rules can be ignored per WP:IAR, and there can always be local consensus for unusual cases. Onetwothreeip (talk) 02:03, 11 December 2023 (UTC)
- On the surface, to the extent that you talk about "removing almost" in this context, the discussion might not attract responses from those opposed to the larger issue, and reasonably so; after all, it makes sense to remain 'on-topic' to the question you posed. However, it's possible that the almost is what is keeping an uneasy consensus alive. What I mean by this, is that there are clearly editors here who wish to either remove the numbers from the table, or even dismantle the table entirely, and if you remove almost, that might push it past the tipping point, and activate other editors who prefer removing the entire sentence. I'm somewhat closer to the latter view than your view, although I don't really like either one. I'd vote for leaving it alone. Mathglot (talk) 06:21, 11 December 2023 (UTC)
- I would go for "should probably". The "almost certainly" language has always been, well, nonsense. I'm sorry that the OP feels this should have some kind of more emphatic wording implying that a split is not optional, but it definitely should not, since this is a guideline, and a hotly disputed one right now, at that; it is not something like a legal policy imposed on us by WP:OFFICE. — SMcCandlish ☏ ¢ 😼 22:36, 11 December 2023 (UTC)
- If it was to say "should be divided or trimmed", that would provide clear and concise direction, while also being not too firm as to prevent exceptions. Onetwothreeip (talk) 19:56, 12 December 2023 (UTC)
- If I had to pick one, should probably works for me a lot better than should would. Mathglot (talk) 08:28, 26 December 2023 (UTC)
Quality and the 15,000 guideline
Hi everyone, guidelines say articles over 15,000 'readable' words, "Almost certainly should be divided or trimmed." 15,000 comes from a compromise relating to a 2007 change to do with 100 kilobytes. I.e. 15,000 is not based on what would lead to higher quality, more readable articles, hence the readability discussion above. I've read the links helpfully posted by @Peter Isotalo and not found anything useful on readability. But what about using verifiable evidence on quality, which is very related to readability. Quality is not mentioned once in the article size guidelines? I looked at recently promoted featured articles - October 2023 - and found the largest was about 12,000 words. We could analyse 'recently' promoted featured article maximum length to help improve the guidelines, or put them on a better footing? Grateful for evidence on quality and readability, Tom B (talk) 20:05, 23 November 2023 (UTC)
- The problem is that quality is a different thing than quantity. Badly written text is badly written, no matter whether it's 1500 or 150000 words long. Jo-Jo Eumerus (talk) 07:24, 24 November 2023 (UTC)
- There's a semi-informal 10k limit for FAs, so this is to be expected; there's no causality. DFlhb (talk) 10:35, 24 November 2023 (UTC)
- hi @DFlhb, thank you, there is a formal length requirement yes, but no exact number as you intimate. I was surprised to find a recently promoted article at 12k, that might effectively be the informal limit? For me and others there is causality, the informal limit aids quality, Tom B (talk) 16:05, 24 November 2023 (UTC)
- Neither quality nor readability is a sensible reason to sub-divide articles, because quality has nothing to do with size while readability has to do with the chunking and navigational structure of topics at multiple levels – sentence, paragraph, section, page, topic, category and so forth.
- The real issue is the technical size of the page and this seems to be most affected by the amount of templates rather than the amount of prose. For example, the popular page Deaths in 2023 has an edit note that "References should be in <ref>[url & title]</ref> format, as full citations make the page too slow to load, and too big to edit."
- Andrew🐉(talk) 12:37, 24 November 2023 (UTC)
- @Andrew Davidson, thank you, we have a simple disagreement: you say quality has nothing to do with size, I say it does. For me the Napoleon article increases in quality from 1,000 words to 8,000 and starts decreasing before about 12,000 words. I got it promoted to GA at 8,000 words and it got demoted at 18,000. Don't most think the quality decreases at some point? We just disagree when? I appreciate it will be different amounts for different articles. I don't think technical size is the big issue any more. The consensus appears to be that readability is now key? Some think we should remove the limit, some like me think we should reduce it e.g. to 12,000, but I'm open to evidence, others might think the 15,000 guideline is fine. Everyone thinks their position will improve quality or not affect it? Tom B (talk) 16:26, 24 November 2023 (UTC)
- Tpbradbury said,
Don't most think the quality decreases at some point?
- That reminds me of Salieri's, "Too many notes" in Amadeus. In some cases it may, but not all, and it's not purely a function of length, imho, but of other factors. I believe that there are various human factors involved, one of which is the icons related to quality article awards. I notice many user pages with a string of GA or FA icons, and while I respect the work involved and applaud the improvement to the encyclopedia by these volunteer editors, once an article achieves the award, what happens then? Do these editors continue watching, improving, pruning, maintaining quality as the article grows from 8k to 18k after the award, or do they move on to something else? I'm not ashamed to say that I'd probably move on; I think that's human nature in large part (although I'm aware that some articles have long-term, non-OWNy watchers that remain active and I think that's commendable).
- I think another human factor that affects quality that is size-related in a way difficult to quantify is the basic structure of the article as manifested by the choice of section headers, how many of them there are, how deeply nested, and how much content in them. Section headers are the musculo-skeletal system of an article, and the larger it grows, the more difficult it becomes to move sections around, or to disassemble them and reorganize along different lines. Partly this is simply mechanical: moving a top level section with 22kb of content to a different point in the article requires finding the begin and end points, cutting it, finding the destination point, and pasting it. If you've done this, you know it's tedious to prepare and a bit white-knuckly to execute even for the tech-savvy, and probably scares away many editors as not worth the effort. Better tools designed for manipulating article sections could mitigate that problem.
- Far more difficult imho, and less amenable to new tools, is analyzing the section structure of an article, realizing that it could be improved by a different organization, designing a new structure, and moving the article towards that goal. That's not so hard for a small stub, but as it grows beyond a stub and gains section headers, my impression is that there are fewer and fewer editors willing to take a 40,000-foot view and reorganize the basic structure. When the original content is extremely poor in quality, WP:TNT is an option, and I've done this two or three times with buy-in at Talk, but if quality is not at the extreme end of bad and merely 'poor', there may be opposition to it which may make it impossible to carry out. The result is that articles tend to suffer from a kind of atherosclerosis as they grow, making it harder and harder to do a complete overhaul even if you could find a wiki-surgeon willing to tackle it, and it happens much earlier imho than the size limits mentioned in the table as split territory.
- I think this paradigm, if accurate, puts a lot of pressure on editors to get the basic organizational structure of the article right fairly early on while it's still relatively easy to adjust in order to avoid sclerosis later, but that doesn't always happen. Maybe a new type of reviewing team could help, sort of like Afc but with the goal of having a second look at articles around the time they transition from stubs to start class with a view to establishing a solid section structure amenable to future growth before the article grows too big and locks in something less than optimal. Mathglot (talk) 21:50, 26 December 2023 (UTC)
- In answer to your first question: most of us remain watching the articles as stewards or shepherds. Libel, nonsense and vandalism get reverted but additions are not removed unless they are unreferenced. (I had a particular problem with Frank Borman when IPs started posting that he had died. I had to revert them until the news was reported in a RS.) FAs are comprehensive by nature and rarely grow although some topics like Batman by their nature require ongoing updates. Occasionally you get called back to an old FA when there is an FAR. GAs though can be substantially updated or rewritten.
- As a rule, articles will increase in size as new material is added since old material is only removed when a subarticle is created. A recent case of what you are talking about is John von Neumann. He was a polymath, which is to say a complicated subject from our point of view. Normally biographical articles are organised chronologically. The article had grown organically as a result of a series of editors (including myself) who had very different interests and areas of expertise. The readers were probably just as diverse. The obvious path to the article's growth was to create a series of subarticles on the different areas of von Neumann's interest, wherein readers could find the detailed information that they were looking for. The problem was that this would involve major restructuring of the main article and considerable work setting up the subarticles and then summarising them. The issue that was then debated at length was whether this was worth the effort when the only issue was the size of the article (15,000 words). Hawkeye7 (discuss) 00:20, 27 December 2023 (UTC)
- That article is an example in similar space to what Mathglot discusses. The main issues in the John von Neumann article case were not size per se, they were issues with factual accuracy and writing quality. Even after substantial reworking, the most recent major edit to that article was to delete a subsection as being apparently fundamentally misguided. The length there served as a warning that brought the other issues to light, but presumably also made maintenance difficult in the preceding period as it was an increasingly large amount to monitor. CMD (talk) 02:29, 27 December 2023 (UTC)
Tables and lists
After a recent (and correct) edit, a paragraph in the guideline now reads:
Readable prose is the main body of the text, excluding material such as footnotes and reference sections ("see also", "external links", bibliography, etc.), diagrams and images, tables and lists, Wikilinks and external URLs, and formatting and mark-up. The measure may substantially underestimate the amount of content in articles that summarize much of their information in tables, especially when these contain notes and explanations in text columns.
I propose that it would make more sense to remove "tables and lists", and remove the newly added second sentence. Some articles (including some of our longest) consist almost entirely of lists (sometimes formatted as tables). — SMcCandlish ☏ ¢ 😼 16:34, 25 August 2023 (UTC)
- Your final sentence is true, but I'm having trouble seeing why the previous one follows from that. Could you explain? Nikkimaria (talk) 03:20, 26 August 2023 (UTC)
- What's not clear? The "readable" article content at a long list is the list. The current wording a) creates a loophole such that list articles are not subject to length limits at all, and b) another loophole whereby an article that consists of, say, 75% a list ignores the entire list for purposes of length calculation. I doubt anyone actually agrees that's a good idea. Hell, it could be a [bad] excuse to convert prose material into inappropriate lists/tables, just to skirt the length guidelines. — SMcCandlish ☏ ¢ 😼 05:10, 26 August 2023 (UTC)
- I am inclined to concur that tables and lists should be treated like normal wikitext. Jo-Jo Eumerus (talk) 07:40, 26 August 2023 (UTC)
- It is unclear to me why tables and lists should be treated like normal wikitext for the purpose of article size. Size limits are to do with readability, and tables are for data presentation. I am unaware of people who look up lists to read from beginning to end. Tables and lists are reference material, while articles are a presentation of information about a particular subject, which I judge to be completely different subjects. As an engineer, I have a steam table book written in the 50s (prior to computers and the internet) that is almost entirely tables about temperature and pressure for various fluids. I do not believe anyone would read the book from beginning to end as a subject matter description of steam temperature and pressure, one would just go to the table needed for the particular values. The point I am trying to make is that tables and lists should not be subject to readability limits, but certainly should be subject to technical limits, such as maximum character limit, or limits on how many citations can be included before the article breaks, or general reports on slow-down on download speed on limited access machines such as commonly used smartphones in nations with more limited data carriers. But putting a size limit on tables and lists based on the subjective readability limits would not be a good idea. It is not a loophole, it is a different perspective. Mburrell (talk) 21:58, 26 August 2023 (UTC)
- As people have noted in the paragraphs above, though, people mostly don't read articles top to bottom, either. Jo-Jo Eumerus (talk) 17:46, 27 August 2023 (UTC)
- Agree with Mburrell. SandyGeorgia (Talk) 11:40, 30 October 2023 (UTC)
- I agree too. Also, at least with tables, collapsing them can put them out of sight and out of mind. Riposte97 (talk) 22:15, 30 October 2023 (UTC)
- Ah, okay, I misunderstood. I don't object to the principle but we may need to deal with the fact that the added sentence is true wrt the tools often used for assessment of this issue. Nikkimaria (talk) 13:12, 26 August 2023 (UTC)
- MOS uses the term appendix (sometimes, footers) to refer to the bottom matter (another term!) that we'd like to exclude, so maybe we could borrow that. Mathglot (talk) 09:23, 26 August 2023 (UTC)
- Sure, but it still shouldn't include "tables and lists" which are part of the main-body content of the article. — SMcCandlish ☏ ¢ 😼 05:24, 26 October 2023 (UTC)
- Disagree entirely that tables and lists should be added; the issue is readable prose, and tables are skimmed. SandyGeorgia (Talk) 11:39, 30 October 2023 (UTC)
- Tables and lists wouldn't be considered the readable prose in prose articles, but they would be considered the readable prose, or whatever is closest to that, for list articles. Onetwothreeip (talk) 09:43, 9 November 2023 (UTC)
- The problem with tables and lists is that we don't have an automated tool for counting the "readable prose" in them. The reason is that we have not determined a way of counting the text in them. We need to agree on this first. Only then can we consider size limits. Hawkeye7 (discuss) 18:22, 23 November 2023 (UTC)
On this point, I recommend Faked death as an example of the problem. It's 1,000 words if you exclude "lists". It's 4,000 words if you don't. In this instance, the latter is the correct/relevant number. WhatamIdoing (talk) 07:20, 26 December 2023 (UTC)
I think we may be running into an "I know it when I see it" problem. Correct me if my assumption here is wrong, but I think we would all wish to exclude the tables of (mostly) figures in List of municipalities in Alberta for purposes of prose calculation (because it ain't prose), but does anyone really want to exclude the table content in Wikipedia:Reliable sources/Perennial sources? The content of that table *is* prose (well, col. 5, anyway), and I want to count it for any prose calculation. (Yes, I know that's not an article, I just don't have a sample article at hand with long prose sections, and I'm too lazy to look now; please help me out by linking one.) Admittedly, that makes automated counting solutions more complex and that's unfortunate, but I don't think it would be fair to either include tables in both of those pages in the count, or to exclude tables in both. They require separate treatment, imho. Agree? Disagree? Mathglot (talk) 08:12, 26 December 2023 (UTC)
- I agree that we want to count the full article content but not simple lists and data tables (e.g., undescribed lists of notable people, tables of sports scores, names of songs in an album).
- I think that the problem should be solved in documentation, rather than code. That means that we say something like "You can use Wikipedia:Prosesize, but be aware that it undercounts the words in articles that have significant material formatted as lists or tables." Also, perhaps we should document the "exact numbers not important" part. When we say "10,000 words", we mean something like "9,000 to 11,000 words" – not 9,999 to 10,001 words. WhatamIdoing (talk) 17:13, 26 December 2023 (UTC)
- Just adding a link to Doctor Who (series 4)#Episodes, which has plenty of prose table content, replacing my poor example above. This link is thanks to helpful Teahouse responder User:Deltaspace42, who also points out that "pretty much all pages about episodes of TV series contain such table[s]". As I wasn't entirely sure that such tables existed in mainspace, it's very helpful to have this example, and to find out that it's representative of an entire class of articles. Mathglot (talk) 04:35, 28 December 2023 (UTC)
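To make the gap between the two counts concrete, here is a minimal sketch in Python of a word counter that can either skip or include text inside lists and tables. It is an illustration only, not how Wikipedia:Prosesize or any other existing tool works: the `WordCounter` class, the `count_words` function, the `LIST_TABLE_TAGS` set and the sample HTML are all hypothetical, and real article HTML would need more care (references, navigation boxes, hidden elements).

```python
from html.parser import HTMLParser

# Assumption: these tags stand in for "lists and tables" in this sketch.
LIST_TABLE_TAGS = {"ul", "ol", "dl", "table"}


class WordCounter(HTMLParser):
    """Counts whitespace-separated words, optionally skipping lists and tables."""

    def __init__(self, include_lists_and_tables):
        super().__init__()
        self.include = include_lists_and_tables
        self.depth = 0  # number of list/table elements currently open
        self.words = 0

    def handle_starttag(self, tag, attrs):
        if tag in LIST_TABLE_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in LIST_TABLE_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Count text when outside lists/tables, or everywhere if they are included.
        if self.include or self.depth == 0:
            self.words += len(data.split())


def count_words(html, include_lists_and_tables):
    parser = WordCounter(include_lists_and_tables)
    parser.feed(html)
    parser.close()
    return parser.words


# Made-up sample: one prose paragraph followed by a list whose items are prose.
sample = (
    "<p>Some people fake their own deaths.</p>"
    "<ul><li>First example with a sentence of explanation.</li>"
    "<li>Second example, also described in prose.</li></ul>"
)

print(count_words(sample, include_lists_and_tables=False))  # 6
print(count_words(sample, include_lists_and_tables=True))   # 19
```

The sketch treats every list or table the same way, so it cannot tell a table of figures from a table of prose; that distinction is the part that would still need human judgement or an agreed rule.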