Wikipedia talk:AutoWikiBrowser/Typos/Archive 4
This is an archive of past discussions on Wikipedia:AutoWikiBrowser. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page. |
cruse -> cruise
Three times now, AWB has "corrected" "the cruse of oil" to "cruise". The first instance was in February. Can this be a little more careful about correcting actual words? Thanks. Elizium23 (talk) 02:04, 19 September 2014 (UTC)
- @Elizium23: Unfortunately, editors using AWB are not always as careful as they should be. I have protected the word in Holy anointing oil from further accidents by adding the {{Not a typo}} template around it, so that AWB will no longer suggest making this change here. I think that overall the "cruse > cruise" rule is a good one, as "cruise" is a difficult word for people to spell correctly. -- John of Reading (talk) 06:39, 19 September 2014 (UTC)
Lowlines not Lowliness
Can I suggest AWB not autocorrect Lowlines cattle. Thanks! --Breno talk 10:59, 22 September 2014 (UTC)
- @Breno: I have added {{Not a typo}} to Lowline cattle so that AWB will not suggest this change in future. That's the only article mentioning these cattle. -- John of Reading (talk) 19:15, 22 September 2014 (UTC)
beneficient → beneficent
AWB corrects beneficient → beneficent. However, beneficient isn't really a typo. While some sources consider the spelling obsolete, others do not. -- Cpt.a.haddock (talk) 09:03, 28 September 2014 (UTC)
- The link you've labelled "others" is a page for "beneficent" not "beneficient". My Concise Oxford lists only "beneficent". -- John of Reading (talk) 11:27, 28 September 2014 (UTC)
- My bad. I actually can't find an entry for it in the current editions of any mainstream dictionaries. However, it is marked as obsolete usage in certain older editions. It is therefore not exactly a typo. And a brief analysis of results on Google Books, Google Scholar, and Google Trends suggests that it is not all that obsolete either. There are also 5 hits in the COCA corpus (post 2000) … I leave it up to you guys. Thanks. --Cpt.a.haddock (talk) 18:02, 28 September 2014 (UTC)
New typo lists available
FYI, new typo lists are available at Wikipedia talk:WikiProject TypoScan#Manual typo lists - September 2014, courtesy of Breno. GoingBatty (talk) 13:48, 4 October 2014 (UTC)
Ukelele -> Ukulele
Could someone please add Ukelele -> Ukulele Jamesmcmahon0 (talk) 14:20, 24 October 2014 (UTC)
- @Jamesmcmahon0: Not done Dictionary.com seems to indicate that "ukelele" is an acceptable variant. Thanks! GoingBatty (talk) 14:36, 24 October 2014 (UTC)
- @Jamesmcmahon0: @GoingBatty: Going even further, macmillandictionary.com (a major dictionary, very respected) has "ukelele" only, no "ukulele". It would be wrong for AWB Typos or any human editor to change "ukelele". — Preceding unsigned comment added by Chris the speller (talk • contribs) 13:27, 25 October 2014 (UTC)
- @GoingBatty: Thanks for that! I found this on Wikipedia:Database reports/Linked misspellings. On closer inspection, it seems that a number of the pages only appear on that list because the redirects have been incorrectly tagged as misspellings rather than as alternate spellings or otherwise. I will try to work through a few of them to improve the categorisation, and thus the usefulness of the report. However, I think there are some instances of genuine spelling mistakes on there that possibly aren't picked up by AWB yet, so I would recommend you cast an experienced eye over it! Thanks again Jamesmcmahon0 (talk) 16:00, 26 October 2014 (UTC)
Viginia - Virginia
I started fixing Viginia but I think they are all typos of Virginia. One for AWB? ϢereSpielChequers 13:50, 5 January 2015 (UTC)
- It appears that the existing "Virginia_" rule should fix "Viginia", but it doesn't on 25th Antisubmarine Wing. Can anyone figure out why? GoingBatty (talk) 17:23, 5 January 2015 (UTC)
- The typo-fixer skips indented paragraphs in case they are quotations. -- John of Reading (talk) 17:46, 5 January 2015 (UTC)
- Oh, just like I should have seen in the instructions: "Typo fixing is prevented within: image names, templates names and parameters, wikilink targets, quotations, and any text that follows a colon or asterisk." (emphasis added) Thanks! GoingBatty (talk) 02:20, 6 January 2015 (UTC)
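The skipping behaviour quoted above can be pictured with a small sketch. This is not AWB's actual code: it uses Python's `re` module as a stand-in, the `VIRGINIA` pattern and the `fix_typos` helper are hypothetical names invented for illustration, and it assumes that "text that follows a colon or asterisk" means lines beginning with `:` or `*`.

```python
import re

# Hypothetical rule, for illustration only: "Viginia" -> "Virginia".
VIRGINIA = re.compile(r"\bViginia\b")


def fix_typos(wikitext: str) -> str:
    """Apply the rule line by line, skipping lines that start with ':' or '*'."""
    fixed_lines = []
    for line in wikitext.split("\n"):
        if line.lstrip().startswith((":", "*")):
            # Indented or bulleted text may be a quotation, so leave it alone.
            fixed_lines.append(line)
        else:
            fixed_lines.append(VIRGINIA.sub("Virginia", line))
    return "\n".join(fixed_lines)
```

Under these assumptions, a misspelling in an ordinary paragraph is fixed while the same misspelling in an indented paragraph is left untouched, which would explain what was seen on 25th Antisubmarine Wing.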
7 game series → 7-game series
Here's another typo I see in sports articles, with 'series' and 'lead' as follow-on words. Stevie is the man! Talk • Work 18:56, 21 January 2015 (UTC)
Humourous
Back in 2012, there was a brief discussion of humourous => humorous which referenced the OED. On my talk page, CaesarsPalaceDude has informed me that the preferred Australian spelling is "humourous". Thought we should have some discussion before changing the rule. Thanks! GoingBatty (talk) 13:47, 23 January 2015 (UTC)
- After CaesarsPalaceDude pointed me to The Free Dictionary, I updated the "-orous" rule so it will not change "humourous". Discussion still appreciated if you find this bold move is not "humourous" (or "humorous"). :-) GoingBatty (talk) 19:17, 23 January 2015 (UTC)
- One might therefore expect to find 'humourous' at http://www.australiannationaldictionary.com.au, which specialises in Australianisms, but it doesn't seem to be there. Rothorpe (talk) 23:56, 25 January 2015 (UTC)
Targetted
Amakuru (talk · contribs) has removed the fixes "targetted > targeted" and "targetting" > "targeting", with the edit summary "These are legitimate British English spellings". But my paper copy of the COD only lists the -t- forms, and the results at Onelook.com for targetted and targetting show that no major dictionaries allow the -tt- forms. -- John of Reading (talk) 16:27, 13 January 2015 (UTC)
- Hi @John of Reading:, I do apologise for the above change, and any inconvenience caused by my bold change. I've done some more research and I see that you're absolutely correct in this matter. "Targetting" does not appear in either the Oxford or the Collins dictionaries as a present participle for "target". I was misled because instinctively (for some reason) I thought that the two-t version should be correct in British English, and when I found that AWB was removing it, I also went over to Wiktionary to "verify" my instinct, and found at https://en.wiktionary.org/wiki/targetting that our sister project does indeed list it as a British variant. But of course, Wiktionary is not a verifiable published dictionary! I will also have to go back to my articles and change my usage of this word accordingly. Thanks — Amakuru (talk) 22:51, 13 January 2015 (UTC)
- This has come up in the past. There is no variant of English in which "targetted" is an accepted spelling, and my challenge to find any non user-generated dictionary listing it as such remains open. Mogism (talk) 23:02, 13 January 2015 (UTC)
- Possibly just one dictionary and variant of English: the spelling does appear in the OED, though only in a cite from Scottish English in 1651: " The preachers spake freelie against the targetting of weomen's tailes, and the rest of their vanitie." ("targets" were "trimmings" in case anyone is puzzled). That doesn't change the (non-)validity of the spelling in modern English, and the Wiktionary entry has now been adjusted. There's something odd in British English, because the (mis-)spelling reached more than 5% of usage in the 1980s if this is accurate. Dbfirs 08:15, 1 February 2015 (UTC)
Hyphenation
Could consideration be given to removing the hyphenation "fix" of "one night stand" to "one-night stand"? While this does indeed change it to Wikipedia's current article title's format, a glance at every single entry on One-night stand (disambiguation) - or just googling the term - shows that the Wikipedia article is virtually alone in hyphenating the phrase in this way. It seems unnecessarily WP:BITE-y to be tagging edits using a perfectly valid form of the phrase as "typos" and "fixing" them.
On a more general note, I'm getting quite uncomfortable with the number of dubious hyphenation "fixes" which have crept into the list recently, to the extent that I find manually skipping hyphenation false-positives takes up well over half the time of any AWB run with typo fixing enabled. How feasible would it be to have the hyphenation "fixes" as a separate list which could be disabled, or a "Skip if only hyphenation fixed" check-box? Mogism (talk) 18:49, 30 January 2015 (UTC)
- The phrase "one-night stand" is so hyphenated by Oxford, Cambridge, American Heritage, Macmillan and Merriam-Webster dictionaries. "Googling the term" will, of course, only show that the web is seething with people who can barely write, can't spell or capitalize properly, and do not have a decent grasp of punctuation beyond usually putting a full stop at the end of a sentence. You seem to be implying that Wikipedia should be dumbed down to match that lack of writing ability. I totally disagree. If you find a huge number of hyphenation false positives, please share them on this forum; that way they can be explained or fixed. Chris the speller yack 21:26, 30 January 2015 (UTC)
- The page One-night stand (disambiguation) is basically a list of titles. The Typo rule does not change these titles. The people who crank out movies, recorded songs and breakfast cereal boxes are notoriously oblivious to proper hyphenation and other punctuation. They do not set any standard for the English language; dictionaries do. Chris the speller yack 21:35, 30 January 2015 (UTC)
- Yes, the OED (Third Edition updated 2004) has seven cites from 1878 to 2001 and none omits the hyphen. Omission is common in casual usage, but Wikipedia uses formal English in its main text. Dbfirs 08:30, 1 February 2015 (UTC)
"back and fourth" → "back and forth"
I'm seeing the typo "back and fourth" (or "back-and-fourth"). If there's hyphens in it, those are accepted as an alternative use, so keeping the hyphens in a typo correction is a must. Also, only change if it's lower case. On search, it's not a common misspelling -- just a few. Any objections to a fix for this? Is it worth it? Stevie is the man! Talk • Work 18:34, 1 February 2015 (UTC)
- @Stevietheman: When I search for "back and fourth", I only see two that need to be changed, which doesn't seem to me to be enough for a rule. However, if you've just manually fixed hundreds of them, then a rule would be appropriate. Thanks! GoingBatty (talk) 18:46, 1 February 2015 (UTC)
- Yes, there's just a few. I brought this up because I just saw one case fixed for an article in a project I watch changes for. Stevie is the man! Talk • Work 18:48, 1 February 2015 (UTC)
- @Stevietheman: I just fixed the two others I saw. GoingBatty (talk) 20:12, 1 February 2015 (UTC)
Commens Dictionary of Peirce's Terms
While working through some Philosophy related pages today I ran across this book title several times, Commens Dictionary of Peirce's Terms. A Google search confirmed that this is the correct spelling for this title, but of course AWB script wants to change it to "Commons". I have no idea how to work this into the typo rules, so I thought I would mention it here. Anyone able to assist with this? — Bill W. (Talk) (Contrib) (User:Wtwilson3) — 14:13, 3 February 2015 (UTC)
- Done. The rule will now not try to change "Commens" (capitalized). The proper name shows up in other places besides the dictionary name. Chris the speller yack 15:42, 3 February 2015 (UTC)
- @Wtwilson3: Italicizing the book title would also protect the dictionary title from being changed (but not the other proper names). GoingBatty (talk) 03:00, 4 February 2015 (UTC)
Last names
- $1$2ely
This rule gives false positives when it comes to the last name Densley. (t) Josve05a (c) 18:42, 15 February 2015 (UTC)
- $1ight$2
This rule gives false positives when it comes to the last name Sligting, like in Douglas Sligting. (t) Josve05a (c) 18:53, 15 February 2015 (UTC)
- Done. Both surnames have been given a free pass. Chris the speller yack 19:51, 16 February 2015 (UTC)
"moive" -> "movie"
Not a very common typo, but one that does appear, and there is no other possible explanation for "moive". I was going to add it myself, but after a couple of failed tries, I decided to stop before I blew something up. --AmaryllisGardener talk 21:43, 18 February 2015 (UTC)
- @AmaryllisGardener: I fixed the only instance of "moive" I could find to "movie". However, it could hypothetically also be "move", with someone accidentally pressing "oi" together since they're next to each other. GoingBatty (talk) 03:50, 19 February 2015 (UTC)
40 point game → 40-point game
In some sports articles, I'm seeing things like "40 point game" and "50 point performance". Would it make sense to have a typo rule that places a hyphen between the number and 'point'? Perhaps there's other follow-on words other than 'game' and 'performance' to consider as well. Stevie is the man! Talk • Work 18:24, 21 January 2015 (UTC)
- 'spread' is another follow-on word for this typo correction. Stevie is the man! Talk • Work 18:34, 21 January 2015 (UTC)
- Another thought: Perhaps this could accommodate spelled-out numbers as well. Stevie is the man! Talk • Work 18:53, 21 January 2015 (UTC)
- @Stevietheman: I see units of measure that I believe should be hyphenated too (e.g. "one-mile track", "10-inch record", "two-liter bottle") GoingBatty (talk) 03:18, 22 January 2015 (UTC)
- Those are good ideas for typo fixing as well. It would seem we would need to have a discussion for a period of time to make a full list of these kind of corrections before committing them to the typo list. Does that sound good? Stevie is the man! Talk • Work 13:51, 22 January 2015 (UTC)
- Periods of time would be good as well (e.g. "two-day meeting", "four-year degree"). GoingBatty (talk) 13:31, 23 January 2015 (UTC)
I'm still interested in doing something with this. If nobody beats me to it, soon I'll come up with a draft typo fix to fix all of these permutations. Stevie is the man! Talk • Work 16:13, 29 January 2015 (UTC)
- Actually, it won't be soon, as I'm having technical difficulties with my laptop, and will likely have to replace the hard drive soon. If anyone else wants to tackle this, please proceed. Stevie is the man! Talk • Work 21:24, 5 February 2015 (UTC)
- @Stevietheman: See the discussion below – Expand "n-year" rule? – it touches on this topic and much more. Chris the speller yack 21:54, 16 March 2015 (UTC)
Analy is not anal
Please change the rule for anal so it does not change the personal name Analy to Anal or anal. --DThomsen8 (talk) 21:15, 1 April 2015 (UTC)
"split between" → "split among"
Hello fellow spell checkers! How about a rule to change "split between" → "split among" when the number following is greater than two? Something like find="\bsplit\s+between\s+([3-9]|\d{2,3}|three|four|five|six|seven|eight|nine|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen|twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)\s+" replace="split among $1 " to start? Thanks! GoingBatty (talk) 23:09, 18 April 2015 (UTC)
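The proposed find/replace pair can be tried outside AWB with a rough sketch. The pattern is copied from the proposal; Python's `re` module is used here as a stand-in for AWB's regex engine, `\1` replaces `$1` in the replacement string, and the function name `fix_split_between` is made up for the example.

```python
import re

# Pattern copied from the proposal above; only the replacement syntax differs.
FIND = re.compile(
    r"\bsplit\s+between\s+([3-9]|\d{2,3}|three|four|five|six|seven|eight|nine"
    r"|ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|eighteen"
    r"|nineteen|twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety)\s+"
)


def fix_split_between(text: str) -> str:
    """Change 'split between N ' to 'split among N ' when N is three or more."""
    return FIND.sub(r"split among \1 ", text)
```

One thing the sketch makes visible: a range such as "split between 10 and 20 people" would also be changed, so the rule might need an exception for a number followed by "and".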
"least" → "fewest"
How about another rule for "least" → "fewest" when followed by a plural word, such as "goals"? GoingBatty (talk) 23:26, 18 April 2015 (UTC)
Incorrect typo change
Could someone please check the "A n-something" rule? In the Mahalia Jackson article, I believe it's changing "Her Aunt Bell told her one day she would sing" to "Her Aunt Bell told her one-day she would sing". Thanks! GoingBatty (talk) 04:03, 19 April 2015 (UTC)
- Done I'm not sure my fix is the most optimal, but I tested it and it seems to work. There should be no cases that use trigger words other than 'her' and 'one' in the phrase, and I'm unsure if there could be something other than 'day' in a phrase like that. Stevie is the man! Talk • Work 16:56, 19 April 2015 (UTC)
Belarussian -> Belarusian
Could someone write a rule to fix misspellings of Belarusian, particularly Belarussian as there are quite a lot of instances of it. Jamesmcmahon0 (talk) 13:09, 28 April 2015 (UTC)
- Collins Dictionary lists "Belarussian" as the primary spelling, along with three variations. If a major dictionary even allows a spelling as a variant, WP usually allows it, so there would have to be consensus established somewhere, and this page is not the proper forum. This is sort of in the same boat as "publically", which a lot of editors would like to stamp out, but we can't do that, for the same reason. Chris the speller yack 14:12, 28 April 2015 (UTC)
- See http://www.onelook.com/?w=Belarusian&ls=a and http://www.onelook.com/?w=Belarussian&ls=a and http://www.onelook.com/?w=Byelorussian&ls=a.
- —Wavelength (talk) 15:57, 28 April 2015 (UTC)
- Thanks, glad I posted here first! I really need to get into the habit of doing a dictionary search... Jamesmcmahon0 (talk) 19:02, 28 April 2015 (UTC)
Harmonising the use of abbreviation of Creative Commons licenses
I've created a bot request for harmonising the abbreviations of Creative Commons licenses used throughout Wikipedia.
Basically there are a lot of common misspellings of these abbreviations:
Bad | Good |
---|---|
CC-BY | CC BY |
CC-BY-NC | CC BY-NC |
CC-BY-SA | CC BY-SA |
CC-BY-NC-SA | CC BY-NC-SA |
CC-BY-ND | CC BY-ND |
CC-BY-NC-ND | CC BY-NC-ND |
cc-by | CC BY |
cc-by-nc | CC BY-NC |
cc-by-sa | CC BY-SA |
cc-by-nc-sa | CC BY-NC-SA |
cc-by-nd | CC BY-ND |
cc-by-nc-nd | CC BY-NC-ND |
Next to a bot to fix these typos, shall we add them to WT:AWB/T?
--Martsniez (talk) 10:38, 28 April 2015 (UTC)
- Hi, I am no expert in RegExp. But I believe this should do it, right?
<Typo word="CC BY" find="\b(cc|CC)(-by|BY)\b" replace="CC BY"/>
<Typo word="CC BY-NC" find="\b(cc|CC)(-by-nc|BY-NC)\b" replace="CC BY-NC"/>
<Typo word="CC BY-SA" find="\b(cc|CC)(-by-sa|BY-SA)\b" replace="CC BY-SA"/>
<Typo word="CC BY-ND" find="\b(cc|CC)(-by-nd|BY-ND)\b" replace="CC BY-ND"/>
<Typo word="CC BY-NC-SA" find="\b(cc|CC)(-by-nc-sa|BY-NC-SA)\b" replace="CC BY-NC-SA"/>
<Typo word="CC BY-NC-ND" find="\b(cc|CC)(-by-nc-nd|BY-NC-ND)\b" replace="CC BY-NC-ND"/>
--Martsniez (talk) 09:09, 30 April 2015 (UTC)
- Also no expert but I think;
<Typo word="CC BY" find="\b(cc|CC)-?\s*(by|BY)\b" replace="CC BY"/>
<Typo word="CC BY-NC" find="\b(cc|CC)-?\s*(by-nc|BY-NC)\b" replace="CC BY-NC"/>
<Typo word="CC BY-SA" find="\b(cc|CC)-?\s*(by-sa|BY-SA)\b" replace="CC BY-SA"/>
<Typo word="CC BY-ND" find="\b(cc|CC)-?\s*(by-nd|BY-ND)\b" replace="CC BY-ND"/>
<Typo word="CC BY-NC-SA" find="\b(cc|CC)-?\s*(by-nc-sa|BY-NC-SA)\b" replace="CC BY-NC-SA"/>
<Typo word="CC BY-NC-ND" find="\b(cc|CC)-?\s*(by-nc-nd|BY-NC-ND)\b" replace="CC BY-NC-ND"/>
- to find CC by-nc-sa etc. though this would also find CCby-nc-sa which is maybe not desirable? Jamesmcmahon0 (talk) 14:36, 30 April 2015 (UTC)
- this should fix that, right:
<Typo word="CC BY" find="\b(cc|CC)[-\s](by|BY)\b" replace="CC BY"/>
<Typo word="CC BY-NC" find="\b(cc|CC)[-\s](by-nc|BY-NC)\b" replace="CC BY-NC"/>
<Typo word="CC BY-SA" find="\b(cc|CC)[-\s](by-sa|BY-SA)\b" replace="CC BY-SA"/>
<Typo word="CC BY-ND" find="\b(cc|CC)[-\s](by-nd|BY-ND)\b" replace="CC BY-ND"/>
<Typo word="CC BY-NC-SA" find="\b(cc|CC)[-\s](by-nc-sa|BY-NC-SA)\b" replace="CC BY-NC-SA"/>
<Typo word="CC BY-NC-ND" find="\b(cc|CC)[-\s](by-nc-nd|BY-NC-ND)\b" replace="CC BY-NC-ND"/>
--Martsniez (talk) 22:40, 30 April 2015 (UTC)
- Although I now see that this also matches all correctly typed abbreviations --Martsniez (talk) 09:32, 1 May 2015 (UTC)
AWB will only fix typos that are in prose (that is, not in tables, infoboxes and such). Are any of these appearing in prose and do you have example articles? Stevie is the man! Talk • Work 14:40, 30 April 2015 (UTC)
- On a similar bot request page, User:Jamesmcmahon0 found over 30,000 articles affected. He produced a long list (slow to load) at User:Jamesmcmahon0/CC abbreviation typos. This includes a lot of references to CC-BY-SA instead of CC BY-SA; however, you also see this in a lot of prose, like on Books_LLC. It is difficult to use the site search for this, as that does not distinguish between prose and infoboxes, or between dashes and spaces. --Martsniez (talk) 22:37, 30 April 2015 (UTC)
- There are many (I'm sure there's a simple way to filter my long list to find the occurrences in prose only and thus find the actual number) where it is written in a note in the references section. — Preceding unsigned comment added by Jamesmcmahon0 (talk • contribs) 08:04, 1 May 2015 (UTC)
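On the point raised in this thread that the `[-\s]` version also matches abbreviations that are already correct, one possible way around it is sketched below. This is an illustration in Python rather than an AWB Typo rule, and it makes several assumptions: a single case-insensitive pattern covers all six licenses, the separator is a hyphen or a single space, and a replacement function (`fix_cc`, a made-up name) counts only matches that actually change.

```python
import re

# One case-insensitive pattern for CC BY and its -NC/-SA/-ND variants.
# A separator (hyphen or space) is required, so "CCby-nc-sa" is not matched.
CC_PATTERN = re.compile(r"\bcc[- ]by((?:-(?:nc|sa|nd))*)\b", re.IGNORECASE)


def fix_cc(text: str):
    """Return (normalised text, number of abbreviations actually changed)."""
    changes = 0

    def repl(match):
        nonlocal changes
        fixed = "CC BY" + match.group(1).upper()
        if fixed != match.group(0):
            # Only count matches that were not already in the target form.
            changes += 1
        return fixed

    return CC_PATTERN.sub(repl, text), changes
```

In a plain find/replace rule the same effect would need a lookahead that excludes the correct form; the counting function above is simply the easiest way to show the idea.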
Expand "n-year" rule?
Could we expand the new "n-year" rule to other time periods (e.g. n-month, n-week)? Also, how about a "n-year-old" rule? GoingBatty (talk) 18:46, 15 March 2015 (UTC)
- It would be possible to expand it to cover second, minute, hour, day, night, week, month, year, and season. In fact, I run such a F&R rule quite often, but I'm not sure it is ready for prime time as a Typo rule. Note that "second" and "minute" cause some false positives because they have other meanings as adjectives; we might want to go slow on those. It would also be possible to fix spelled-out numbers ("five-year contract" in addition to "5-year contract"). Again, I run such a F&R rule. One trouble with adding other units of time is that the list of modified nouns would also need to be expanded: a 3-year contract is common, but not a 3-hour contract; a 4-hour baseball game is common, but not a 4-year baseball game. I build the list of modified nouns by sampling hundreds or thousands of articles to see what nouns are frequently modified; I've done that for 'year' but not for the other time units. The list of modified nouns will get pretty long, but probably not as long as the existing "Self-" rule. I have also tackled compound modifiers with numbers and units that are not related to time: a 3-point game, a 4-page report, a 5-game series, a 6-item list, a 20-yard pass, a 55-gallon drum, a 32-acre lot, a 6-room house, a 2-cylinder engine, a 4-door sedan. As for "n-year-old", we would have to fix "4 year old boy", "4-year old boy" and "4 year-old boy". I avoid messing with "1 year old" as it could be "... he was 1 year old when he wrote his first novel". Should I work on expanding the rule to handle "month" next? Chris the speller yack 04:49, 16 March 2015 (UTC)
- Sounds like a lot of work to be done that will lead to loads of new typos to fix haha! I'm not too sure how the rules work, but would it be possible to add the modified noun to the edit summary, i.e. replaced 1 year contract -> 1-year contract instead of 1 year -> 1-year Jamesmcmahon0 (talk) 14:10, 16 March 2015 (UTC)
- Yes, it would be easy to include the modified noun in the edit summary, but at the cost of more lengthy summaries; I am often running hundreds of my own F&R rules along with Typo rules, and some edit summaries are getting truncated already. Also, the modified noun will not be completely helpful if the rule fixes more than one case: if "5 year veteran" and "20 year sentence" are fixed in the same article, the edit summary will show "5 year veteran → 5-year veteran (2)". Let's get comments from other users about whether adding 16 characters to the summary is adding any value for the editors and readers. Chris the speller yack 14:46, 16 March 2015 (UTC)
- @Chris the speller: Since you are already running this as F&R rules, I am happy to defer to your experience as to what's reasonable to make a typo rule for all of us to use. Thanks! GoingBatty (talk) 00:38, 17 March 2015 (UTC)
- OK, my next step is to add n-month and see how it goes from there. Chris the speller yack 01:19, 17 March 2015 (UTC)
I have expanded it to also fix 'n-month', to also fix spelled-out numbers ("three-month ban"), and greatly expanded the list of modified nouns. Maybe in a few days, after we see what kind of cheers or boos it inspires, I'll add 'n-week'. Chris the speller yack 03:01, 18 March 2015 (UTC)
- I've found one instance of a correction I expected but didn't happen. In Madison-Model High School, "twenty-five year existence" didn't correct to "twenty-five-year existence". On the typos page, apparently compound spelled-out numbers aren't covered (yet). Stevie is the man! Talk • Work 15:36, 18 March 2015 (UTC)
- @Stevietheman: I skipped these on purpose. I manually change "its twenty-five year existence" to "its 25-year existence" when I find these, or I switch to using the Age template. It's much easier to read, better than a group of 3 words hyphenated. Chris the speller yack 21:16, 18 March 2015 (UTC)
- OK, that makes perfect sense. Thanks! Stevie is the man! Talk • Work 21:28, 18 March 2015 (UTC)
- Does the rule account for "n" in "an" before "eight" and "eleven" and "eighteen" and "eighty"?
- —Wavelength (talk) 20:57, 18 March 2015 (UTC)
- @Wavelength: No, it doesn't look for or correct preceding "a"/"an", but the new rule below uses "a" or "an" (doesn't care which). There is another rule that fixes "a eight" and "an seven", I think. Chris the speller yack 21:06, 18 March 2015 (UTC)
- Does the rule account for proper names? Some examples are Five Acre Grove and Five Dollar Bill and Five Foot Thick and Five Island Harbour and Five Mile River and Ten Dollar Dinners and Ten Foot Pole and Ten Inch Hero and Ten Inch Men and Ten Mile Creek and Ten Pound Hammer and Ten Yard Fight and Ten Year Crusade and Ten Year Night and Ten Year Rule.
- —Wavelength (talk) 16:42, 19 March 2015 (UTC)
- @Wavelength: The current version of the rule only works when the number and the noun are both lowercase. So none of your examples will be affected. -- John of Reading (talk) 17:02, 19 March 2015 (UTC)
- I have encountered a few challenging expressions in articles about racehorses and horse races. I discussed the challenges on my talk page. I am watching this talk page for any reply from editors of WP:AWB/T.
- —Wavelength (talk) 19:34, 19 March 2015 (UTC)
Another kind of false positive: at Dwight Helminen we have "a 2004 second round draft choice". I fixed this one by adding an HTML comment to the article, but there are a few dozen "2004 second" false positives out there; multiply that by 20 or 30 different years and that's too many, I think. Examples for "2004 second" or "2005 second" have "second" followed by: album/and third/behind/compilation/count/deputy/edition/inauguration/place/release/semi-final/series/studio album/team/vice-deputy. I haven't tried to fix the rule. -- John of Reading (talk) 08:09, 23 March 2015 (UTC)
- @John of Reading: You are completely correct; that is way too many false positives, and I have tweaked the rule "A n-something" to fix it. Please see the next subsection, "Other number phrases", which deals specifically with that rule. Thanks for catching and reporting that. Chris the speller yack 04:32, 24 March 2015 (UTC)
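The design discussed in this section (a number, a time unit, and a noun drawn from a sampled list, all lowercase) might look roughly like the following. It is a sketch only: the real rule is not shown on this page, the noun list here is a tiny hypothetical sample, and Python's `re` stands in for AWB's regex engine.

```python
import re

# Hypothetical, deliberately tiny lists; the real rule's noun list is far longer.
NOUNS = "contract|sentence|veteran|ban|rule"
NUMBERS = r"\d+|two|three|four|five|six|seven|eight|nine|ten"

# Case-sensitive on purpose, so capitalized proper names are never touched.
# The lookbehind skips compound numbers such as "twenty-five year ...".
N_UNIT = re.compile(rf"(?<![\w-])({NUMBERS}) (year|month) ({NOUNS})\b")


def hyphenate(text: str) -> str:
    """Insert the hyphen in phrases like '5 year contract'."""
    return N_UNIT.sub(r"\1-\2 \3", text)
```

Requiring a noun from the list is what keeps the rule from firing on every "5 year" in running text, at the cost of a list that has to grow with each new unit.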
Other number phrases
I was invited to join other number phrases to this, so let's build a list of them and determine whether they are worthy for AWB typo correction. (I meant to do this earlier but got caught up with other things.) I'm assuming that time period phrases using -week and -day will be covered under the above strategy. Stevie is the man! Talk • Work 15:55, 18 March 2015 (UTC)
Please add to this list or adjust items as you like:
- {number}-point (game|performance|spread)
- {number}-mile track
- {number}-inch record
- {number}-liter bottle
- {number}-game series
- {number}-round decision
- {number}-decade hiatus
- I have added a rule ("A n-something") to handle these and many others; however, it only fixes those that are preceded by 'a' or 'an': a seven-point lead, a two-mile track, a 7-inch record, a 2-liter bottle, a five-game series, an eight-round decision. It also fixes "n-month" and "n-year" things that are not caught by the earlier rule, such as "a three-month religious retreat". A different rule will still be needed to fix things like "their earlier 8 point lead evaporated". I think this is a pretty healthy start. Chris the speller yack 20:56, 18 March 2015 (UTC)
- Kewl. I'll go test this on a couple wikiprojects' articles. Thanks here too! Stevie is the man! Talk • Work 21:30, 18 March 2015 (UTC)
- It looks like we'll have to prevent fixes for phrases with particular secondary words, or bypass where the number is likely a year. In Bear Bryant, the phrase "a 1935 game against" gets snagged. Stevie is the man! Talk • Work 23:58, 18 March 2015 (UTC)
- Found another one: "a {year} game between". I've run into "a {year} game against/between" in three articles so far. Stevie is the man! Talk • Work 00:53, 19 March 2015 (UTC)
- And another: "a {year} game with". Stevie is the man! Talk • Work 01:02, 19 March 2015 (UTC)
- @Stevietheman: OK, no more messing with "a {year} game". Refresh status/typos, and enjoy! Chris the speller yack 02:22, 19 March 2015 (UTC)
- Thanks again. By the way, there's no need to ping me, as I watch this page. Stevie is the man! Talk • Work 04:35, 19 March 2015 (UTC)
- I have also exempted "a {year} second" and expanded both that and "a {year} game" to avoid anything that might be a 4-digit year, not just the 19th to 21st centuries. Chris the speller yack 04:24, 24 March 2015 (UTC)
I have an idea for expansion of this rule as I just ran into an example ("another 45-day suspension"). So, I would suggest 'another', 'an additional', 'a second' and 'a third' (it's unusual to have more than a third something, I think) in addition to 'a' and 'an'. Stevie is the man! Talk • Work 16:17, 24 March 2015 (UTC)
And here's another example I just ran into: "an astounding 25-second lead". Perhaps also look for "a(n) {word ending in 'ing'}" before the number phrase? Stevie is the man! Talk • Work 18:04, 24 March 2015 (UTC)
- Yes, that would probably fix a few more without risking many false positives. Other possibilities are fixing "her 30 year news career", "their 25 year reunion", and, really, many adjectives, as in "controversial one point defeat". But we'll need to choose carefully to avoid runaway bloat ("controversial" would probably not make the cut). Chris the speller yack 18:19, 24 March 2015 (UTC)
- 'her', 'his', 'their' sound workable too. Also I want to note that I posted ideas directly above "a(n) {word ending in 'ing'}" in case you didn't eyeball them. :) Other adjectives could probably wait for us to see which ones are common and safe. Stevie is the man! Talk • Work 18:33, 24 March 2015 (UTC)
- Also it has occurred to me that there could be a few -ing words we would want to avoid, such as 'landing' (as in "a landing one mile away"). Stevie is the man! Talk • Work 18:45, 24 March 2015 (UTC)
- I ran a test yesterday across all articles in WP Louisville and WP Kentucky (with all above new ideas except words ending in -ing), and his/her/their got significant hits, while the others got no hits. 'his' got the most hits by far. A downside is that I also got a few false positives ("his one year of", "his one game for", "his one game at"). So, it would be useful to test for his/her/their, but I need a suggestion for dealing with the false positives. I imagine we could either check for the 'one' in these cases, or more generally look at the words after year/game/etc. to avoid prepositions. Stevie is the man! Talk • Work 18:35, 26 March 2015 (UTC)
- Here are some possible phrases with cardinal numbers.
- "n-stor(e)y (office, commercial, residential) building, tower, skyscraper"
- "n-page {book, report}"
- "n-act play"
- "n-degree {angle, weather, heat}"
- "n-acre {lot, property, farm, park}"
- "n-watt light bulb"
- "n-string {guitar, (musical) instrument}"
- "n-(bed, bath)room {suite, apartment, building, school(house), dormitory, hotel, hostel, motel, cruiser, (ocean) liner}"
- "n-course meal"
- "n-pin bowling (alley)"
- "n-hole golf (course)"
- "n-horsepower {motor, engine, vehicle}"
- "n-door {car, auto(mobile), vehicle, sedan}"
- "n-yard line"
- "n-seater" [word]
- "n-wheeler" [word]
- "n-star {hotel, restaurant, officer, general}"
- "n-lane {road, street, highway, route, thoroughfare, swimming pool}"
- "n-member {association, audience, band, board, clan, class, club, committee, congregation, crew, faculty, family, orchestra, panel, squad, staff, team, tribe}"
- "n-line {verse, stanza, poem}"
- "n-{verse, stanza} poem"
- "n-seat {car, auto(mobile), vehicle, sedan, bus, helicopter, (aero, air)plane, bobsleigh, boat, canoe, kayak}"
- "n-car {train, accident, garage}"
- "n-time {candidate, champion, (prize) winner, loser, president}"
- "n-{carat, karat} gold" (See wikt:carat and wikt:karat and "Carat (mass)" and "Carat (purity)".)
- "n-carat diamond"
- "ten-gallon hat"
- "three-ring {binder, circus}"
- "one-way {mirror, street, window}"
- "two-decker"
- "two-edged sword"
- "two-minute silence"
- "n-stroke engine"
- "two-way {mirror, street, window}"
- "seven-second delay"
- —Wavelength (talk) 17:43, 27 March 2015 (UTC) and 18:41, 27 March 2015 (UTC) and 02:55, 30 March 2015 (UTC)
- Here are some possible phrases with ordinal numbers.
- "nth-{grade, year} {pupil, student, teacher, instructor, professor, exam(ination), course, test}"
- "nth-stor(e)y {suite, apartment, office, balcony, window}"
- "nth-class {ticket, lever, travel}"
- "nth-century {writer, author, poet, painter, artist, sculptor, composer, musician, book, painting, sculpture, composition, poem, discovery, invention, style, custom, ruler, leader, trip, voyage, journey, expedition, (postage) stamp, coin, medallion}"
- —Wavelength (talk) 17:43, 27 March 2015 (UTC) and 18:41, 27 March 2015 (UTC)
- I have expanded it (before the number) to also accept "first, second, third, additional, his, her, their, its". After the number, I have added "goal, page, member, decker, horsepower". Restrictions after the three-word phrase now include prepositions and verbs and anything else that is not a noun or adjective – "of, for, at, as, in, with, by, is, was" – as Stevie suggested above. Some of the items listed by Wavelength were already covered, or were in my list of things to investigate. I'll keep adding words as the investigation proceeds. Did I just say "three-word phrase"? There's another one we all missed. Chris the speller yack 16:31, 31 March 2015 (UTC)
I just had to correct the associated rule to avoid correcting phrases like "1973 stage adaption" or "1982 stage play" or "2007 stage show". Another one I've run into while running AWB is "Of its 10 member schools...", but I'm not sure yet how we should proceed on avoiding that "fix" to "10-member". Stevie is the man! Talk • Work 13:44, 1 April 2015 (UTC)
- Here's a similar phrase I just ran into: "in which its fifteen member universities". Stevie is the man! Talk • Work 14:23, 1 April 2015 (UTC)
- Similarly, I've just run into "between its 157 member countries" -- John of Reading (talk) 10:29, 10 April 2015 (UTC)
- OK. Plugged it for "n member schools", universities, countries, nations, states. Chris the speller yack 16:09, 10 April 2015 (UTC)
- Reworked it to avoid "n member xxxxxxs", any word that ends with 's', including "organizations" and anything else we might run across. Chris the speller yack 16:24, 10 April 2015 (UTC)
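The expanded rule and its restrictions can be sketched roughly as follows. This is an illustrative Python approximation, not the real AWB regex: the lists `LEAD`, `NUMBER`, `UNIT` and `STOP` are hypothetical and abridged. It accepts the extra lead words, refuses to fire when the next word is one of the listed prepositions or verbs, and skips "n member" when the following word ends in "s".

```python
import re

# Hypothetical, abridged lists based on the words mentioned in this thread.
LEAD = r"(?:an?|first|second|third|additional|his|her|their|its)"
NUMBER = r"(?:\d+|one|two|three|four|five|six|seven|eight|nine|ten)"
# "member" is skipped when the next word ends in "s" ("10 member schools").
UNIT = r"(?:year|game|point|goal|page|member(?! \w+s\b))"
# Words that signal the phrase is not a compound modifier ("his one year of").
STOP = r"(?:of|for|at|as|in|with|by|is|was)"

RULE = re.compile(
    rf"\b({LEAD}) ({NUMBER}) ({UNIT}) (?!{STOP}\b)(?=\w)",
    re.IGNORECASE,
)

def hyphenate(text: str) -> str:
    """Hyphenate number + unit unless one of the restrictions applies."""
    return RULE.sub(r"\1 \2-\3 ", text)

print(hyphenate("her 30 year news career"))           # her 30-year news career
print(hyphenate("his one year of service"))           # unchanged
print(hyphenate("between its 157 member countries"))  # unchanged
```

One trade-off of the "ends with 's'" test in this sketch is that it would also skip a singular noun such as "class" after "member".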
"A 300 game" is a technical term in bowling; see Perfect game (bowling). That's an exception, isn't it, that shouldn't be hyphenated? -- John of Reading (talk) 15:36, 10 May 2015 (UTC)
- See also "3,000 hit club", with a recent picture of the day.—Wavelength (talk) 05:47, 12 May 2015 (UTC)
Abbreviation "aka"
Editors might consider making a rule for replacing "aka" and "a.k.a." with "also known as". It occurs in "Fort Abraham Lincoln".
—Wavelength (talk) 00:11, 20 May 2015 (UTC)
- I would definitely oppose that. While it may have originated as an initialism, "aka" is a perfectly acceptable word in British English (complete with OED entry and so forth). People misusing AWB to impose their personal preferences on articles under the guise of "fixes" generate enough problems as it is; if a rule like this is added to the main WP:AWB/T regex the backlash will be very noisy and bad-tempered. – iridescent 10:12, 23 May 2015 (UTC)
- We should be basing the typo fixes on what a preponderance of English dictionaries say. If there's any rule that goes against this, let us know and it will be examined. Stevie is the man! Talk • Work 19:13, 23 May 2015 (UTC)
- WP:ABBR says of "a.k.a." and "AKA" that they "Should only be used in small spaces, otherwise use the full phrase. It does not need to be linked. Never use aka." Never mind that OED explains what the meaning of "aka" is; that dictionary is, to its credit, helping readers who look that up, but that doesn't mean that "aka" is formal enough to use in an encyclopedia entry. Let's test a rule that expands all three abbreviations to find out whether it affects "small spaces" such as tables and infoboxes. Chris the speller yack 05:13, 25 May 2015 (UTC)
- WP:ABBR is not and never has been any kind of policy, it's the personal opinions of the two editors who wrote it, and as with almost all the MOS, compliance with it has never been compulsory; the wave of disdain for the MOS shown here seems to me to be a fairly accurate reflection of the active editor base's attitude towards the small handful of people who want to give the MOS some kind of official status. (What is policy is "Where Wikipedia does not mandate a specific style, editors should not attempt to convert Wikipedia to their own preferred style, nor should they edit articles for the sole purpose of converting them to their preferred style, or removing examples of, or references to, styles which they dislike".) Most editors see AWB as a minor irritant at the best of times; every time an AWB user "corrects" someone's perfectly acceptable grammar, punctuation or hyphenation (particularly if they use an edit summary of "typo fixing" or similar), it just hastens the day when someone loses patience, goes to Arbcom and gets the typo-fixing function of AWB shut down altogether.
- (That sounds like hyperbole, but speaking as both a former Arb and as someone with over 60,000 AWB edits, I can say with a reasonable degree of confidence that if WP:AWB/T were hauled before Arbcom in its current state, it would be shut down. Anyone around long enough to remember Date delinking and the huge stack of bans resulting from it—or remembers what happened to Betacommand—will know how easily Arbcom can go into wrath-of-god mode when it comes to complaints about script-assisted editing. Every edit like this has the potential to convince another editor that the bathwater is so irritating, it justifies throwing out the baby.) – iridescent 10:57, 25 May 2015 (UTC)
- I am unable to participate in a discussion with editors who don't feel that WP should employ reasonably formal language and widely accepted standards of punctuation, such as hyphenation of compound modifiers. There are a few editors who object to the actions of other editors who follow the policies and guidelines that WP provides. The former can, of course, contribute to WP, and the latter will clean up afterwards. This is how things have gone for the entire history of WP. I am dropping out of this discussion. Chris the speller yack 14:59, 25 May 2015 (UTC)
- I totally agree with you. Iridescent's complaint is meritless in my judgment (unless a much better example can be found, as hyphenation of a compound modifier is the way it's supposed to be, not according to me, but according to the rules of English grammar itself which we must enforce). We should be using only formal English supported by a preponderance of dictionaries here. And we should correct articles without any fear. If people don't agree with the typo corrections, they can discuss here. As for guidelines, they should be regarded as the usual way to write an article with very few exceptions. If anyone disagrees with a guideline, they need to take the effort to formally change it, or at least have a public discussion about it. If anyone truly dislikes AWB edits, I realize there are a few cranks out there, but most editors seem to accept their cleanups, which are reasonable the vast majority of the time. Stevie is the man! Talk • Work 15:57, 25 May 2015 (UTC)
- It is my understanding that AWB doesn't correct typos within tables, infoboxes, etc., so in that respect, it's a go. The only aspect I'm unsure of is whether "a.k.a." isn't considered formal enough in most writings. "aka" surely is too informal. But it seems like "a.k.a." is in common enough use that maybe we shouldn't correct it. *But* maybe let's correct "aka" to "a.k.a.". Stevie is the man! Talk • Work 16:02, 25 May 2015 (UTC)
More comma rules
I see a new rule was created to add a comma after "However" in some circumstances. Commas also belong after "Subsequently" and "Consequently" when they lead a sentence. Can anyone think of instances where they would NOT be followed by commas? Thanks! GoingBatty (talk) 17:36, 5 June 2015 (UTC)
- I added a rule to handle "Consequently" and some other transitional words. I held off for "Subsequently" because of uses other than as a transitional word: "Subsequently filed reports corroborated his claim that the light had turned red for the garbage truck." "As a result" needed special treatment to avoid "As a result of ...". Chris the speller yack 22:33, 5 June 2015 (UTC)
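The special treatment for "As a result" can be illustrated with a negative lookahead. The following Python sketch is an assumption about how such a rule might look, not the rule that was actually added; the word list is hypothetical. The match is case-sensitive, so only capitalized (sentence-leading) uses are touched.

```python
import re

# Hypothetical list of transitional words; "Subsequently" is left out on
# purpose because of uses such as "Subsequently filed reports ...".
TRANSITION = re.compile(
    r"\b(Consequently|Furthermore|As a result(?! of\b)) (?=\w)"
)

def add_comma(text: str) -> str:
    """Insert a comma after a capitalized transitional word or phrase."""
    return TRANSITION.sub(r"\1, ", text)

print(add_comma("As a result the team folded."))         # As a result, the team folded.
print(add_comma("As a result of the flood, it closed."))  # unchanged
```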
Encyclopædia Britannica
Editors might wish to make a rule to correct misspellings of Encyclopædia Britannica. Sometimes "e" occurs instead of "æ", and sometimes "ttan" occurs instead of "tann".
—Wavelength (talk) 01:30, 4 June 2015 (UTC)
- @Wavelength: There are already rules in Wikipedia:AutoWikiBrowser/Typos#New additions that cover those scenarios. Do you see places where the typo rules aren't making those corrections but should be? Thanks! GoingBatty (talk)
- GoingBatty, many instances of "encyclopedia brittanica" are listed at https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=%22encyclopedia+brittanica%22&fulltext=Search.
- —Wavelength (talk) 02:02, 6 June 2015 (UTC)
- @Wavelength: AWB and WPCleaner don't apply the typo rules within citation templates, so these will need to be fixed manually. See Wikipedia:AutoWikiBrowser/Typos#Usage for more information. GoingBatty (talk) 02:12, 6 June 2015 (UTC)
- GoingBatty, thank you. I have now read that section, and I hope that I understand it correctly. I edited "Dordrecht".
- —Wavelength (talk) 02:54, 6 June 2015 (UTC)
- @Wavelength: Great - together we've fixed all the articles with "Encyclopedia Brittanica". GoingBatty (talk) 03:50, 6 June 2015 (UTC)
Problem with a regex
Hi!
I don't know if this is the right place to ask this question. If not, can you please tell me where to ask it?
I have a problem with this regex. I only want it to detect the second "[[" and not the first. In the example, I only want the regex to detect "[[Krishna (fleuve)]]" and not "[[Nâgârjuna Sâgar]] on the [[Krishna (fleuve)]]".
Do you know where the problem is? I think it's a question of greediness, but I don't know where/if I must put the "?,$" symbols.
Thank you! Simon Villeneuve (talk) 10:36, 15 June 2015 (UTC)
- @Simon Villeneuve: Presuming you want to detect the wikilinks with the parentheses, try
\[\[([\w\s]+) \(([\w\s]+)\)\]\]
instead. Good luck! GoingBatty (talk) 04:14, 16 June 2015 (UTC)
- Yes!! Thank you very much! Simon Villeneuve (talk) 10:44, 16 June 2015 (UTC)
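To show why the suggested pattern helps, here is a small Python comparison. The original regex is not quoted in this thread, so the `greedy` pattern below is only a hypothetical stand-in for an over-broad expression; `specific` is the pattern suggested above. Because `[\w\s]+` cannot cross a `]` or `[`, the match cannot start at the first link.

```python
import re

text = "the [[Nâgârjuna Sâgar]] on the [[Krishna (fleuve)]]"

# Hypothetical over-broad pattern: ".*" happily runs across "]] on the [[".
greedy = re.compile(r"\[\[(.*) \((.*)\)\]\]")

# Pattern suggested above: only word characters and whitespace are allowed
# inside the link, so the match must start at the second "[[".
specific = re.compile(r"\[\[([\w\s]+) \(([\w\s]+)\)\]\]")

greedy_match = greedy.search(text).group(0)
specific_match = specific.search(text)

print(greedy_match)              # [[Nâgârjuna Sâgar]] on the [[Krishna (fleuve)]]
print(specific_match.group(0))   # [[Krishna (fleuve)]]
print(specific_match.groups())   # ('Krishna', 'fleuve')
```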
"Master's degree" and "Bachelor's degree" – how about all lower case?
We have a rule that changes "Degree" to "degree" whenever the apostrophe is missing or misplaced in "Master's Degree" or "Bachelor's Degree". But there are many hundreds of articles that are something like "He got his Bachelor's Degree at ..." or "She got her Master's degree in ..."; shouldn't "Master's" and "Bachelor's" be in lower case (when they are not the first word of a sentence, as in "Master's degrees have been awarded since 1952")? It seems safe enough to change "his/her/a Master's degree". Can anyone think of situations where a capital letter is appropriate? I guess "Associate degree" needs the same treatment, too. We would have to split the "Bachelor's/Master's degree" rule into two to effect the capitalization change. Chris the speller yack 00:59, 24 June 2015 (UTC)
- I have no objection. I was wondering why our typo rules were leaving them capitalized anyway. Stevie is the man! Talk • Work 10:21, 25 June 2015 (UTC)
- I have created the three new rules. The existing rule "Bachelor's/Master's degree" is still effective for cases where those begin a sentence. I expect such cases to be rather rare, but I am not anxious to yank that rule yet. Chris the speller yack 21:23, 25 June 2015 (UTC)
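For illustration only, the lowercase change for "his/her/a Master's degree" might look like the Python sketch below. It is an assumption about the shape of such a rule, not one of the three rules actually created, and it handles only the straight apostrophe.

```python
import re

# Only fire after "his", "her" or "a", so that a sentence-initial
# "Master's degrees have been awarded since 1952" is left alone.
DEGREE = re.compile(r"\b(his|her|a) (Bachelor|Master)'s [Dd]egree\b")

def lowercase_degree(text: str) -> str:
    """Lowercase a generic bachelor's/master's degree after his/her/a."""
    return DEGREE.sub(
        lambda m: f"{m.group(1)} {m.group(2).lower()}'s degree", text
    )

print(lowercase_degree("He got his Bachelor's Degree at Yale."))
# He got his bachelor's degree at Yale.
```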
FYI
@I dream of horses, Anomie, and Reguyla: (t) Josve05a (c) 19:56, 29 June 2015 (UTC)
Typo rule around_
Could someone add [Aa]rounf
to this rule? (t) Josve05a (c) 13:56, 5 July 2015 (UTC)
- @Josve05a: Done in this edit. GoingBatty (talk) 15:09, 5 July 2015 (UTC)
- Ty! (t) Josve05a (c) 15:25, 5 July 2015 (UTC)
"Therefore" followed by a comma?
"Therefore" was removed from the rule "Furthermore," and the edit summary quoted the Chicago Manual of Style 5.69: "When [transitional adverbs] are used in such a way that there is no real break in continuity and no call for any pause in reading, commas should be omitted." However, CMOS uses the example "I therefore urge you all to remain loyal;" In that case, there is no reason for any pause in reading (and the rule would not try to change it, being in lower case), but when "Therefore" begins a sentence, there would be a pause: "Therefore, viruses in the microbial food web act to reduce the population of bacteria." I therefore suggest that "Therefore" be put back into the rule. How do others feel? Chris the speller yack 13:57, 12 July 2015 (UTC)
- I concur with your position. The typo removal was incorrect. Stevie is the man! Talk • Work 16:34, 12 July 2015 (UTC)
- Also, I find the change to be on the disruptive side, with the changer deciding this on his own, without discussing here first. Stevie is the man! Talk • Work 17:05, 12 July 2015 (UTC)
- Many online punctuation guides specify a comma after transitional words and phrases (such as "therefore") when they begin a sentence. byu.edu • purdue.edu • wheaton.edu. It seems pretty clear that the CMOS 5.69 example was meant to discourage a comma after "therefore" only in the middle of a sentence. I will restore the rule. I think it was removed in good faith. If someone decides to remove it again, it would be nice to point out an example or two where it is thought that AWB is suggesting "many bad changes". Chris the speller yack 02:55, 13 July 2015 (UTC)
- @Chris the speller: How about adding something like
(?!.{1,30},)
to these rules? For example, at Designated Targets a sentence begins "Meanwhile in the Pacific, the Japanese...", and adding yet another comma makes a choppy sentence. -- John of Reading (talk) 06:15, 18 July 2015 (UTC)
- @John of Reading: In that example, the comma after the three-word prepositional phrase is superfluous. Only prepositional phrases longer than four words need a comma. Better would be "Meanwhile, in the Pacific the Japanese ..." Chris the speller yack 13:58, 18 July 2015 (UTC)
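The effect of the `(?!.{1,30},)` guard suggested above can be tried out with a short Python sketch. The word list and the surrounding pattern are assumptions made for the demonstration; only the lookahead itself comes from this thread. With the guard in place, the rule does nothing when a comma already appears within the next 30 characters.

```python
import re

# Hypothetical word list. (?= ) requires a plain space after the word, and
# (?!.{1,30},) backs off when a comma follows within the next 30 characters.
GUARDED = re.compile(r"\b(Therefore|Meanwhile)(?= )(?!.{1,30},)")

def add_comma(text: str) -> str:
    """Add a comma after the transitional word unless one follows shortly."""
    return GUARDED.sub(r"\1,", text)

print(add_comma("Therefore viruses in the microbial food web act to reduce the population of bacteria."))
# Therefore, viruses in the microbial food web act to reduce the population of bacteria.
print(add_comma("Meanwhile in the Pacific, the Japanese advanced."))
# unchanged
```

Whether skipping such sentences is desirable is the point under discussion; the sketch only shows what the guard does.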
- I'm confused. It seems that it can or needn't have a comma. Therefore no rule should be applied. All the best: Rich Farmbrough, 22:07, 19 July 2015 (UTC).
- I find lots of instructions to use a comma when "Therefore" begins a sentence:
- byu.edu
- "James is not feeling well. Therefore, he will not be here today."
- owl.english.purdue.edu
- "Use a comma after a transitional element (however, therefore, nonetheless, also, otherwise, finally, instead, thus, of course, above all, for example, in other words, as a result, on the other hand, in conclusion, in addition)"
- wheaton.edu
- "Many adverbs that end in –ly and transitions at the beginning of a sentence need to be followed by a comma, too." ... "therefore"
- zencomma.wordpress.com
- "She left early. Therefore, he was lonely." ... "When the conjunctive adverb is at the beginning of the first example, it needs to be followed by a comma to separate it from the rest of the sentence"
- I find no instructions to omit the comma when "Therefore" begins a sentence. The only instruction to omit the comma was the snippet from CMOS above, but "therefore" in that case does not begin the sentence, and there is no pause when saying "I therefore urge you all to remain loyal". However, there is a pause when saying "Therefore, he will not be here today." There have been hundreds of these changes made, and not one case has been shown to be wrong. Chris the speller yack 04:50, 20 July 2015 (UTC)
Possible correction: a 18th-century → an 18th-century
I recently had "a 18th century" correct to "a 18th-century" and I wondered if the 'a' should also be corrected to 'an'. I don't believe we have a rule for these cases, as I tested AWB on an article with the typo (Santa Maria Annunziata di Fossolo, Bologna). There are apparently many articles with this typo. This would also apply to "a 8th", "a 11th", "a 8[0-9]th", etc. Does anyone see an issue with doing a correction like this? Stevie is the man! Talk • Work 16:30, 13 August 2015 (UTC)
- These should be fine. I have rules that look for "a" plus a relevant number plus an optional st/nd/rd/th. There are plenty of false matches where the ordinal suffix is missing, such as "Missa a 8", but I can't remember seeing any false matches where the suffix was present. -- John of Reading (talk) 16:52, 13 August 2015 (UTC)
- OK, I put in a correction for "a 8th", "a 11th" and "a 18th". The regex logic gets hairier with higher numbers but higher numbers seem to be much less common anyway. Stevie is the man! Talk • Work 18:18, 13 August 2015 (UTC)
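A rough Python sketch of this kind of correction is shown below. It is not the rule that was added: the pattern is an assumption that covers 8, 11, 18 and 80 to 89, and it requires the ordinal suffix so that phrases like "Missa a 8" are not touched.

```python
import re

# "a" before an ordinal that starts with a vowel sound: 8th, 11th, 18th,
# and 80th to 89th. The suffix is mandatory to avoid matches like "Missa a 8".
A_TO_AN = re.compile(r"\b([Aa]) (8\d?|11|18)(st|nd|rd|th)\b")

def fix_article(text: str) -> str:
    """Change "a" to "an" before the listed ordinals."""
    return A_TO_AN.sub(r"\1n \2\3", text)

print(fix_article("a 18th-century church"))  # an 18th-century church
print(fix_article("Missa a 8"))              # unchanged
```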
Informal expression "phenom"
Sometimes an article uses the informal expression "phenom" in reference to a person of phenomenal performance. Here are search results for phenom. One example is the article "Klaas-Erik Zwering" (version of 19:00, 11 August 2014). I propose a rule for changing "phenom" to "phenomenon" in each of those instances, and providing a justification in the edit summaries. However, WP:PEACOCK might take precedence.
—Wavelength (talk) 23:02, 19 August 2015 (UTC)
- I agree that "phenom" is not formal enough for an encyclopedia. But AWB Typos have no way to provide a justification in the edit summary. Chris the speller yack 04:05, 20 August 2015 (UTC)
Hung vs. hanged
My understanding is that "hung" is the proper past tense word, unless you're referring to a hanging, in which case "hanged" is appropriate. Using that thought process, I'm thinking of adding a rule to change "hung himself/herself" to "hanged himself/herself". Any objections or expansions? GoingBatty (talk) 02:40, 31 August 2015 (UTC)
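As a sketch of the proposed rule (an assumption about its shape, written in Python rather than as an AWB rule), restricting the match to the reflexive pronouns keeps ordinary uses of "hung" untouched:

```python
import re

# Only "hung himself/herself" is changed; "hung the picture" is left alone.
HUNG = re.compile(r"\bhung (himself|herself)\b")

def fix_hung(text: str) -> str:
    """Replace "hung himself/herself" with "hanged himself/herself"."""
    return HUNG.sub(r"hanged \1", text)

print(fix_hung("He hung himself in 1890."))  # He hanged himself in 1890.
```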
Suggested spelling correction
The rules for months of the year seem quite complicated but the rule for January does not catch the misspelling "Janury". This is a reasonably common error due to phonetic transcription and it's not in Wiktionary as a word in any language. Could somebody add a rule for it, please? BethNaught (talk) 09:11, 1 September 2015 (UTC)
- @BethNaught: Added in this edit. I also fixed each misspelling. Thanks for the suggestion! GoingBatty (talk) 10:18, 1 September 2015 (UTC)
- Many thanks! BethNaught (talk) 10:35, 1 September 2015 (UTC)
minor issue with {{Commons}} → {{Commons category}} conversion
This isn't that big of a deal, but whenever the 1st unnamed parameter in template:commons begins with the "category:" prefix, AWB converts {{Commons}} to {{Commons category}}. In most cases, this isn't an issue, but when an article topic has both a commons gallery and category with the same name, using both templates yields link boxes that appear identical, but link to different pages. E.g., if you run the current version of AWB on the amphetamine article, it will attempt to convert the 2 commons boxes that link to Commons:Amphetamine and Commons:Category:Amphetamine in the Amphetamine#External links section (shown below to the left) to the set of boxes below to the right. Might be worth disabling this fix or preventing the change when a commons gallery and commons category with the same names are linked to via the {{commons}} template. Seppi333 (Insert 2¢) 13:36, 6 September 2015 (UTC)
Left (current boxes): Wikimedia Commons has media related to Amphetamine. | Wikimedia Commons has media related to Category:Amphetamine.
Right (after AWB): Wikimedia Commons has media related to Amphetamine. | Wikimedia Commons has media related to Amphetamine.
- Edit: I replaced
{{Commons|Category:Amphetamine}}
with
{{Commons category|Amphetamine|Category:Amphetamine}}
to circumvent the problem in this example; the general issue still remains though. Seppi333 (Insert 2¢) 13:57, 6 September 2015 (UTC)
- This isn't AWB/Typos related. Please move discussion to the general discussion area. Stevie is the man! Talk • Work 14:14, 6 September 2015 (UTC)
Academic degrees...
Due to an actual error made by an editor, I've discovered that there's some typo correction going on to say "bachelor's" and "master's" instead of "Bachelor's" and "Master's" when referring to academic degrees being gone through via listification through AWB by @The Quixotic Potato:. The rationale by the editor seems to be that "other sources don't say they have to be capped", but I don't know what our MOS says, never mind the fact that they should be proper nouns as titles, just like "Doctor of Medicine", etc. I don't believe that uncapping is correct, because they are not being referred to in general when relating to a person who has one; in that case they are a specific degree as noted here. MSJapan (talk) 19:47, 13 September 2015 (UTC)
- @MSJapan: I don't think we have a typo correction that changes specific degrees. The corrections here only go after the generic usages ("bachelor's degree"), as far as I know. It's possible an AWB user was correcting something else by their own volition. Could you give us a link to an errant change? Stevie is the man! Talk • Work 19:54, 13 September 2015 (UTC)
- I am only using the default AWB typolist, no custom rules. The Quixotic Potato (talk) 20:17, 13 September 2015 (UTC)
- Then explain this, where it's not academic, or what looks like specific usage here, unless specific usage doesn't apply to particular people getting particular degrees in particular subjects? Also, the fact that it hit and changed three non-academic usages tells me there's a problem somewhere. MSJapan (talk) 22:25, 13 September 2015 (UTC)
- The first one shouldn't have been accepted (apparently part of a proper name), while the second bunch of changes looks correct. The second bunch is referring to a kind of degree in a subject or at a school - they read like generic non-proper usages. Stevie is the man! Talk • Work 23:14, 13 September 2015 (UTC)
- @MSJapan: Thanks, I think I understand your confusion now.
- I already explained three times that the freemason-related edits are my fault, mea culpa, I don't know anything about freemasons, so I thanked you for letting me know. The regexps don't take into account that there are freemasons on this planet. You linked one example, and there are 2 more. I can make AWB skip articles that contain the word "mason". Problem solved.
- The edit in the second link you posted is completely correct, and there is consensus for that edit. You misunderstand the Manual of Style, and your edits on my talkpage contain many misconceptions.
- Maybe this helps:
- The Associated Press Stylebook (AP) recommends no capitals when referring to degrees in general terms (bachelor’s, master’s, doctorate, associate degree) but always capitalizing specific degrees (Bachelor of Arts, Master of Science), whether or not they directly precede or follow a name.
- and
- "Academic degrees are capitalized only when the full name of the degree is used, such as Bachelor of Arts or Master of Social Work. General references, such as bachelor’s, master’s or doctoral degree, are not capitalized."
- You seem to think that "specific usage" means: when used in relation to one specific person. In reality specific usage means when referring to one specific degree (you know, when the full name of the degree is used), like "Master of Social Work".
- "General" in this context means stuff like "bachelor’s degree, master’s degree, associate degree", because this is a category of degrees, and not a specific degree.
- I will now continue with my work. I have waited a long time and I have over 70,000 articles on my to-do list, and it is clear that there is consensus for my edits and that the various manuals of style I could find [2] [3] say that "bachelor’s degree" & "master’s degree" should not be capitalized. The Quixotic Potato (talk) 00:53, 14 September 2015 (UTC)
- I see, so grammarbook isn't OK for support when I have a concern, but it is OK for you to refute me. I get it. Moreover, in that second edit, there are multiple changes, and two instances that specifically say "Bachelor's degree in English" and "Master's degree in English" would seem to fall under your "needs to be capitalized as full names" rule, but they are not, so I would like you to take a look at that again and read the context, which is what I think you are not seeing via this "preparsed list of changes." MSJapan (talk) 01:07, 14 September 2015 (UTC)
So you mean to tell me "So-and-so, who received a Masters' in English" is not supposed to be capitalized, despite it being a "specific" reference to a specific degree? I may not get it, but you clearly can't explain it, yet you want to make thousands of changes to the encyclopedia because of it. MSJapan (talk) 01:17, 14 September 2015 (UTC)
- Sigh. It is quite difficult to have a debate with someone who responds this slowly. Now you've finally explained what you mean after 2 days of waiting. Thank you. "Bachelor of Arts in English" and "Master of Arts in English" would be the full, specific, names. But, to be honest, I am bored and I understand why you are confused and I don't want to debate this bullshit anymore so I'm gonna be bold and remove the typofixes so I can move on. The Quixotic Potato (talk) 01:12, 14 September 2015 (UTC)
- Like I expected my edit has been reverted. Now I don't have to debate this anymore (it is impossible to blame me for the existence of those regexptypofixes now) and I can continue with my work. The regexptypofixes were written by someone else and there is clear consensus to keep them, and you shouldn't blame me for following consensus, even if you disagree with everyone else. I hope you understand that it is up to you to create a new consensus if you want to change the regexptypofixes. I wish I had (even) more patience but I don't want to have to explain this again and again. The Quixotic Potato (talk) 01:37, 14 September 2015 (UTC)
- To be blunt, the opinions of a poster at "grammarbook.com blogs" mean precisely zero on Wikipedia. Can you point to a single genuine style guide that recommends the use of "Bachelor's Degree" and "Master's Degree" in an academic context other than immediately following a name ("John Smith, Bachelor of Arts")? I have never seen one, and I can point you to plenty which say the opposite. I'm one of the most vocal opponents of people using AWB to make inappropriate changes because "it's what the MOS says" (the recent habit of adding gratuitous hyphens after "newly" is one which particularly grates), but as far as I can see The Quixotic Potato is acting entirely correctly here. ‑ iridescent 20:06, 13 September 2015 (UTC)
- AWB/Typos removes the hyphen after "newly". Stevie is the man! Talk • Work 20:14, 13 September 2015 (UTC)
- Relax guys, I think MSJapan made a simple mistake (see also my talkpage). Regexps can be quite confusing. So far we have at least 4 Wikipedians and 3 Manuals of Style that support the current regexps. The Quixotic Potato (talk) 20:09, 13 September 2015 (UTC)
- Thanks. Now I can see the regexp, I think we can word round the problem to avoid future confusion. Fiddlersmouth (talk) 22:53, 13 September 2015 (UTC)
- Well, we still gotta explain it to MSJapan. I will try again (see above, my comment dated 00:53, 14 September 2015). The Quixotic Potato (talk) 00:53, 14 September 2015 (UTC)
- @MSJapan: Alice Jackson Stuart earned a Bachelor of Arts and a Master of Arts in English, which are capitalized per this University of Virginia web page. However, when we use the generic bachelor's degree and master's degree, they would not be capitalized, per The Journal of Blacks in Higher Education article and this University of Virginia web page and this Library of Virginia web page. These are used references for the Wikipedia article on Stuart. Hope this helps! GoingBatty (talk) 01:34, 14 September 2015 (UTC)
- I don't think it is very complicated, but I seem to be unable to explain this to MSJapan. I will use my own username as an example:
- The Quixotic Potato graduated with a bachelor's degree in English.
- The Quixotic Potato graduated with a Bachelor of Arts in English.
- The words "bachelor's degree" are not capitalized because they are referring to a category of degrees. The wordcombo "bachelor's degree in English" is not the specific name of a degree, because the degree is called "Bachelor of Arts".
- But "Bachelor of Arts" in the second example is capitalized because this is the name of a specific degree. The Quixotic Potato (talk)
- They're both "specific degrees" - one can't get a B.S. in English, for example. Therefore, a bachelor's in a field is the same as a bachelor's in <X> in a field. MSJapan (talk) 21:07, 14 September 2015 (UTC)
- Like I explained before, in this context "specific degree" means "the name of a specific degree". You seem to think that just because the wordcombo "bachelor's degree in English" can only refer to 1 specific degree (called Bachelor of Arts in English) it therefore must also be capitalized as if it is the name of a degree. We capitalize "Eiffel Tower", but that doesn't mean that the words "the most famous tower in Paris" should therefore also be capitalized, even though it is clear that they can only refer to the Eiffel Tower...
- When the name of the specific degree is written in full (e.g. Bachelor of Arts in English) then we should use capitalization. The generic "bachelor's degree" should not be capitalized. Do you honestly believe that there is a chance that you are correct and that we are all wrong (all the Wikipedians, the writers of the manuals of style and even the authors of the pages GoingBatty linked to)?
- According to your userpage you are a native speaker of English, you claim you attend or have attended Harvard and have a BA, MBA and MA and you have 10yrs of experience on Wikipedia. According to my userpage I am a potato. I assume you are capable of finding someone who can explain to you why you are wrong (IRL or on Wikipedia). Please stop wasting my time. The Quixotic Potato (talk) 21:51, 14 September 2015 (UTC)
- I think the problem is that you do not understand what you are doing, and if you did, you could explain it. Let me break it down differently, and maybe you'll see where the issue is.
- 1. "Jane Doe has a bachelor's degree" - refers to a class of degrees - we don't know what field the degree is in, period. "Dingbat College offers bachelor's degrees" is also a nonspecific reference with the same meaning - they offer multiple degrees in the same class. This is what the "nonspecific wording" is above.
- 2. "Jane Doe has a Bachelor of Science in Physics" - that is a specific degree, and should also be clear.
- 3. "Jane Doe has a Bachelor's degree in Physics" - this is also a specific degree, and in fact the same as in #2. It does not list "of Science", but a degree is offered either one way or the other, so omission makes no difference to the specificity of a degree. It is a particular degree class (bachelor's), in a particular field (Physics) just like in #2. These references are what you are changing, thereby claiming that this construction is the same as #1, and it is not. Jane Doe has a degree of a specific class in a specific field; just because the full title is not specified does not change the caps, because the caps aren't tied to the full title, but the specificity of the degree. MSJapan (talk) 01:21, 16 September 2015 (UTC)
- Why do you claim that it matters if we know what field the degree is in? You call this "specificity".
- You claim that "the caps aren't tied to the full title, but the specificity of the degree"....
- I have already explained why you are confused. I don't have to explain stuff to you. If you want me to explain stuff to you then you should pay me. I will simply keep following the consensus. The fact that you disagree with everyone else is irrelevant. Feel free to write your own manual of style. You can probably (ask someone to) write a Javascript that ensures that you see the capitalization you want to see.
- Associated Press Style Guide "bachelor’s degree in English"
- Santa Barbara City College Manual of Style "master’s degree in comparative literature"
- Yale Style Guide: Should be capitalized when the formal title is used; lowercased when used informally (Bachelor of Arts degree/bachelor’s degree; Master of Arts degree/master’s degree).
- Western Michigan University Manual of Style Academic degrees are capitalized only when the full name of the degree is used, such as Bachelor of Arts or Master of Social Work. -- "master's degree in English"
- Dixie State University AP Style Quick Reference: Capitalize, however, when the formal name of the degree is used
- Michigan Technological University Editorial Guide Capitalize academic degrees and disciplines in full, complete use.
- LinguisTech (reference website for language professionals) Degrees should only be capitalized when they are written in full and not when mentioned informally.
- It seems like you claim that we should write: "Jane Doe has a bachelor of science degree." (nota bene: according to you this is not specific, we do not know what field the degree is in).
- In reality we should write "Jane Doe has a Bachelor of Science degree." because "Bachelor of Science" is a specific degree, it is the full formal name of a degree, and the field it is in is irrelevant.
- Here are some more examples:
- Colby-Sawyer College Style Guide Capitalize full and formal names of specific degrees: Bachelor of Science, Bachelor of Arts, Doctor of Philosophy.
- Ferris State University - Marketing and Communications Style Guide: "Use uppercase for specific degrees, such as Master of Science or Bachelor of Arts, lowercase for non-specific degrees, such as bachelor’s degree.".
- We don't have to know what field a degree is in, that is irrelevant, "Master of Science" or "Bachelor of Arts" are both specific degrees and should be capitalized.
- Again, please stop wasting my time. The Quixotic Potato (talk) 02:21, 16 September 2015 (UTC)
- @MSJapan: I didn't see you respond to my post above, so I'll ask again in case you missed it - you don't agree with the capitalization used by The Journal of Blacks in Higher Education article and this University of Virginia web page and this Library of Virginia web page? GoingBatty (talk) 02:44, 16 September 2015 (UTC)
- The Journal of Blacks in Higher Education "bachelor's degree in English"
- University of Virginia web page "bachelor's degrees in political science and law"
- Library of Virginia web page "bachelor's degree in English" - "master's degree in French"
- The Quixotic Potato (talk) 03:52, 16 September 2015 (UTC)
@Going Batty: I did miss it, actually. I don't disagree with it, because my question is "what does the MOS say?", not "what website's usage are we following?" Clearly, this is a big question.
I'm also going to point out that a Bachelor's of Science itself is not a specific degree but a class, and I've got a problem with Quixotic Potato not seeing that, especially if he wants to run a script on thousands of "typos". The further question nobody's asking and should is "why, in an encyclopedia, are we not referring to degrees formally in keeping with the overall tone, as we discourage informal usage?" MSJapan (talk) 01:35, 17 September 2015 (UTC)
- I was thinking about our conversation and the fact that there are many HFA people on Wikipedia. If I would've thought about this sooner I wouldn't have used such imprecise language. I know quite a few, many are colleagues, some are friends, and there is even one that I love, and I've noticed that it is a good idea to communicate slightly differently than I would otherwise do. If you want me to I can try to explain this in a different way that is less ambiguous (I hope). Or we can simply stop talking about this subject; our conversation so far hasn't been very productive. The Quixotic Potato (talk) 22:29, 17 September 2015 (UTC)
- @MSJapan: You make an excellent point about referring to degrees formally, so I have made this edit to the Alice Jackson Stuart article, which I believe has capitalization we all agree upon. Thanks for this creative solution! GoingBatty (talk) 23:58, 17 September 2015 (UTC)
- @The Quixotic Potato: I think implementing GoingBatty's format would solve the problem entirely. Therefore, to @GoingBatty: that being the case, is there any chance of getting consensus to make that the accepted form for referring to degrees in both the correction of existing usages as well as for usage in new articles going forward in cases where it is appropriate (mainly for degrees earned by article subjects)? Obviously, references in general terms (such as university offerings or the top-level articles about the degrees in general) are fine, as they do conform to appropriate capitalization and usage already. MSJapan (talk) 05:16, 18 September 2015 (UTC)
- @MSJapan: I'm struggling to think of a typo rule we could use that would reliably know how to change "bachelor's degree" to either "Bachelor of Arts" or "Bachelor of Science" (or anything else), but please suggest something here if you can think of something. As far as getting consensus on what format to use on all articles, you might want to start by posting a suggestion at Wikipedia talk:Manual of Style/Biographies. Good luck! GoingBatty (talk) 20:41, 18 September 2015 (UTC)
- I am glad to see that the topic has changed to something more constructive.
- Both "master's degree in English" and "Master of Arts in English" are accepted forms, and that is unlikely to change. I think you want to make "Master of Arts in English" the preferred form.
- I am neutral, because I do not really care.
- If you want to you can edit pages that contain something like:
- "Jane Doe earned her master's degree in English in 1997."
- and change it to:
- "Jane Doe earned her Master of Arts in English in 1997."
- Of course there are many pages that contain something like:
- "Jane Doe earned her master's degree in 1997."
- In that case you will have to find a source that says that it is (for example) a Master of Arts degree. If you are lucky it will state what field the degree is in (e.g. Master of Arts in English).
- AutoWikiBrowser is not (yet) able to help with this task, so I think you will have to make edits like this manually. If you want to get consensus to make "Master of Arts" the preferred form then maybe you can propose adding a couple of sentences to the Manual of Style. The Quixotic Potato (talk) 21:55, 18 September 2015 (UTC)
Change A.M. to a.m.?
Per Wikipedia:Manual_of_Style/Dates_and_numbers#Time_of_day I do not see capital A.M. Should we add a typo rule for that? -- Magioladitis (talk) 08:43, 19 September 2015 (UTC)
- I don't have a hard objection to this, but aren't guidelines/MOS issues usually fixed via General Fixes? Some editors may object to something like this being called a typo, but if we say it's MOS, we're on more solid ground. Stevie is the man! Talk • Work 13:10, 19 September 2015 (UTC)
- Stevietheman: I think it's better if capitalisation issues are left to typo fixing instead of hard-coded. Especially after the objections stated below. -- Magioladitis (talk) 14:03, 19 September 2015 (UTC)
- Better not: if you look at AM you see there are several legitimate uses of "A.M." in other contexts. See also wikt:A.M.. It is a Jewish calendar epoch. BethNaught (talk) 13:13, 19 September 2015 (UTC)
- @Magioladitis and BethNaught: If we were to have a rule, it would have to be sure "A.M." was preceded by #:## or ##:##. GoingBatty (talk) 14:01, 19 September 2015 (UTC)
- OK, that makes sense. Thanks. BethNaught (talk) 09:10, 20 September 2015 (UTC)
- Sounds good, but perhaps we could handle top-of-the-hour times by looking for "at #" before "A.M.". Also, "P.M." could be handled in a wider variety of cases. Stevie is the man! Talk • Work 14:13, 19 September 2015 (UTC)
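The guard GoingBatty describes (only touch "A.M." when a clock time comes directly before it) can be sketched as follows. This is a minimal illustration in Python, not an actual AWB typo rule; the pattern and the function name `lowercase_meridiem` are assumptions made for the example, and the "at #" top-of-the-hour case is deliberately not covered.

```python
import re

# Hypothetical pattern (not taken from WP:AWB/T): "A.M." or "P.M." is matched
# only when a clock time of the form #:## or ##:## comes directly before it.
TIME_MERIDIEM = re.compile(r"\b(\d{1,2}:\d{2})(\s?)([AP])\.M\.")


def lowercase_meridiem(text: str) -> str:
    """Lowercase A.M./P.M. after a clock time; leave every other use alone."""
    return TIME_MERIDIEM.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{m.group(3).lower()}.m.", text
    )


if __name__ == "__main__":
    print(lowercase_meridiem("The train left at 9:30 A.M. sharp."))
    # The calendar-epoch use has no clock time in front, so it is not changed.
    print(lowercase_meridiem("The manuscript is dated 5776 A.M."))
```

Requiring the colon form is what keeps the calendar-epoch and other legitimate uses of "A.M." out of reach of the rule.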
More typo possibilities - Give me your thoughts
While testing the previously discussed typo correction, I ran into four additional correction ideas, one I've already implemented because it seemed straightforward enough. Here are the other 4:
- "an nth generation something" → "an nth-generation something"
- Partly done "an nth-placed finish" or "an nth placed finish" → "an nth-place finish" (I'm not sure if using 'placed' here is disallowed in formal English)
- Done "an nth minute opener/substitute" → "an nth-minute opener/substitute" (sports-related; possibly other following words)
- "a/some/the left over" → "a/some/the leftover" ("left over" is incorrect in this context according to Wiktionary, although "left-over" is considered an alternative (but maybe change for consistency?); possibly other preceding words)
Thoughts, yeas/nays on these? Stevie is the man! Talk • Work 21:09, 13 August 2015 (UTC)
- I just added another idea to the list. Stevie is the man! Talk • Work 22:41, 19 August 2015 (UTC)
- After some research and testing, I implemented "nth-minute something". "nth-generation something" seems a little hairy to me at the moment but if someone else would like to take that on, be my guest. The remaining ones need grammar review, and I'm not sure exactly where to turn yet. @Chris the speller: do you have any thoughts about these suggestions, or where I may look to research the grammar aspects? Stevie is the man! Talk • Work 16:57, 20 August 2015 (UTC)
- For "nth-place finish", it continues to be unclear whether "placed" instead of "place" is incorrect grammar, but I added a seek for the "placed" possibility in "nth-place something" to fix more typos while preserving "placed". Stevie is the man! Talk • Work 14:11, 23 September 2015 (UTC)
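The "nth-minute something" idea above can be sketched like this. It is a rough Python illustration only, not the rule that was actually implemented; the list of following words beyond "opener" and "substitute" is an assumption, as is the function name `hyphenate_minute`.

```python
import re

# Hypothetical list of nouns that may follow "nth minute" in sports writing.
# Only "opener" and "substitute" come from the discussion; the rest are guesses.
FOLLOWERS = r"(?:opener|substitute|goal|winner|equaliser)"

# Match an ordinal such as "90th" followed by " minute", but only when one of
# the listed nouns comes next, so "in the 90th minute of play" is left alone.
NTH_MINUTE = re.compile(
    r"\b(\d+(?:st|nd|rd|th)) minute(?= " + FOLLOWERS + r"\b)"
)


def hyphenate_minute(text: str) -> str:
    """Insert the hyphen in compound modifiers like '90th minute winner'."""
    return NTH_MINUTE.sub(r"\1-minute", text)


if __name__ == "__main__":
    print(hyphenate_minute("He scored a 90th minute winner."))
    print(hyphenate_minute("He was sent off in the 90th minute of play."))
```

The lookahead is the design choice that matters here: the hyphen is added only when "minute" is acting as part of a modifier before a noun.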
Please add
These are some typos I found just by doing a search. I already fixed most of them manually but it's good to be proactive I think and include them in AWB in case they pop up again. These are just suggestions though, I won't be offended if anyone opposes any of these.
- bradcast --> broadcast (EXCEPT "The BradCast" capitalized, which appears to be the title of a radio show, but there are other legit spelling errors for this one, surprisingly)
- suprass/ed/ing --> surpass/ed/ing
- inpending --> impending
- inpenetrable --> impenetrable
Thanks. -- Ϫ 04:39, 27 September 2015 (UTC)
- @OlEnglish: If you fixed most of them, how many of these errors were there? We generally don't add a Typo rule unless a couple of dozen or more instances of a misspelling have been seen. A quick look at your contributions did not turn up any large-scale correction of these goofs. Chris the speller yack 13:36, 27 September 2015 (UTC)
- @OlEnglish: If "Bradcast*" (capital B, lowercase c, any suffix) should always have an "o" added, the existing "broadcast" rule could be tweaked. It appears the "Imp-" rule would already fix "inpenetrable". GoingBatty (talk) 14:13, 27 September 2015 (UTC)
- @Chris the speller: Hmm. Yes, there were less than a dozen of the above mentioned errors, but considering how few spelling errors are left to fix in Wikipedia, I would think that any spelling error occurring even twice, made by different users, would be common enough and good enough reason to add a rule, so as to prevent any future occurrences. But perhaps there is a technical reason I'm not aware of, oh well. @GoingBatty: Ah that's good. Thank you. -- Ϫ 20:22, 27 September 2015 (UTC)
- And voila, a quick peek at the talk archives uncovered a discussion mentioning just such a technical reason. So I answered my own question :) -- Ϫ 20:32, 27 September 2015 (UTC)
- @OlEnglish: I updated the "Broadcast" rule to also fix "Bradcast" (but not "BradCast") and fixed all the articles. GoingBatty (talk) 21:17, 27 September 2015 (UTC)
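For readers wondering how a rule can fix "Bradcast" while leaving "BradCast" alone, here is a minimal Python sketch. It is not the actual "Broadcast" rule from the typo list; the pattern and the name `fix_bradcast` are assumptions for illustration.

```python
import re

# Hypothetical pattern: "B" or "b", then "radcast" in lowercase only.
# Because the "c" must be lowercase, the show title "BradCast" never matches.
BRADCAST = re.compile(r"\b([Bb])radcast")


def fix_bradcast(text: str) -> str:
    """Insert the missing 'o', keeping the original case and any suffix."""
    return BRADCAST.sub(r"\1roadcast", text)


if __name__ == "__main__":
    print(fix_bradcast("The match was bradcast live."))
    print(fix_bradcast("Bradcasting began in 1950."))
    print(fix_bradcast("He hosts The BradCast."))
```

Regular expressions are case-sensitive by default, so the exclusion of "BradCast" falls out of the pattern without any extra lookahead.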
Chhattisgarh, Jharkhand, and Telangana
Please include the following commonly misspelt Indian states:
- Jharkand (Jharkhand)
- Chattisgarh, Chhatisgarh (Chhattisgarh)
- Chattisgarhi, Chhatisgarhi (Chhattisgarhi, the language)
- Telengana (Telangana)
I had a round cleaning these up a few months ago. But more appear to have cropped up. Thanks.--Cpt.a.haddock (talk) 17:39, 8 September 2015 (UTC)
- I added a rule for Chhattisgarh and Chhattisgarhi, and it also fixes lower case. Chris the speller yack 16:05, 9 September 2015 (UTC)
- @Chris the speller: Thanks. Are the others not notable enough?--Cpt.a.haddock (talk) (please ping when replying) 12:46, 28 September 2015 (UTC)
- @Cpt.a.haddock: I have not investigated the "Jharkand" and "Telengana" misspellings yet; I haven't given up on them, just stepped away for a while. But "Jharkand" only pops up in 12 articles, and we usually only add a rule if a couple of dozen are found. I plan to do something for "Telengana" soon. Chris the speller yack 14:12, 28 September 2015 (UTC)
- @Chris the speller: Ah. Those 12 have cropped up in the last couple of months. Cheers.--Cpt.a.haddock (talk) (please ping when replying) 15:28, 28 September 2015 (UTC)
- @Cpt.a.haddock: I added a rule to fix "Telengana", and it also capitalizes "telangana". Also, three sections below this one is a section "Please add", where the topic of adding rules for a small number of misspellings is discussed. Chris the speller yack 19:54, 28 September 2015 (UTC)
Prince-elector
AWB just offered to "correct" Prince-elector in Albert of Mainz to Prince-elect.
Before: Cardinal '''Albert of Brandenburg''' ({{lang-de|Albrecht von Brandenburg}}; 28 June 1490 – 24 September 1545) was [[Prince-elector|Elector]] and [[Archbishop of Mainz]] from 1514 to 1545, and [[Archbishop of Magdeburg]] from 1513 to 1545.
After: Cardinal '''Albert of Brandenburg''' ({{lang-de|Albrecht von Brandenburg}}; 28 June 1490 – 24 September 1545) was [[Prince-elect|Elect]] and [[Archbishop of Mainz]] from 1514 to 1545, and [[Archbishop of Magdeburg]] from 1513 to 1545.
Regards, Bazj (talk) 11:20, 9 November 2015 (UTC)
- @Bazj: I can't reproduce this, and cannot see anything in WP:AWB/T that would do it. Also, AWB's built-in typo fixer never adjusts the left hand side of a piped link. Can you post your settings file somewhere so that someone can try it using exactly the same settings as you are using? -- John of Reading (talk) 11:58, 9 November 2015 (UTC)
Spelling of "toponymy"
Please add a rule to correct "toponomy" to "toponymy", in harmony with wikt:toponymy.
—Wavelength (talk) 02:33, 30 September 2015 (UTC)
- toponomy redirects to toponymy, and the latter article doesn't mention "toponomy" as an alternate spelling. However, wikt:toponomy says it's an "Alternative spelling of toponymy". GoingBatty (talk) 03:15, 30 September 2015 (UTC)
- The word "toponymy" means "the study of place names", but "toponomy" has other definitions (sometimes in addition to "toponymy") according to http://www.onelook.com/?w=toponomy (#6, 7, 8, 10, 11). Therefore, the spelling "toponomy" is ambiguous, but the spelling "toponymy" is unambiguous. The combining form "‑onymy" means "name" (http://www.etymonline.com/index.php?allowed_in_frame=0&search=toponymy&searchmode=none), but the combining form "‑nomy" means "arranging, regulating" (http://www.etymonline.com/index.php?allowed_in_frame=0&search=astronomy&searchmode=none).
- —Wavelength (talk) 20:10, 30 September 2015 (UTC)
- @Wavelength: Based on your most recent post, it seems that this would not be a good candidate for a typo rule. GoingBatty (talk) 23:24, 1 October 2015 (UTC)
- @GoingBatty: Thank you for your consideration of my request.—Wavelength (talk) 23:34, 1 October 2015 (UTC) and 00:03, 2 October 2015 (UTC)
- W. K. Sullivan used the spelling toponomy in the Encyclopaedia Britannica in 1876 (he was writing about poems), but it seems clear that the normal modern spelling is toponymy for the place-name sense. We have over 100 instances of toponomy, lots of them as section headings. I'd be inclined to change them to the usual spelling, but perhaps by hand? Dbfirs 17:54, 27 December 2015 (UTC)
False positives for teamwork rule
I'm going through all the articles that contain "team work" and letting the typo rule change it to "teamwork" where appropriate. However, there are many false positives, such as "he and his team work on...." Anyone want to see if they can tweak the rule to ignore some of these false positives? Thanks! GoingBatty (talk) 14:24, 27 December 2015 (UTC)
- I added that one, and I'll be happy to fix the false positives. Could you list the articles that have them so I can test a new rule? Stevie is the man! Talk • Work 15:26, 27 December 2015 (UTC)
- I don't understand why you want to impose concatenation. I agree that the concatenated form is becoming more common under American influence, but why are we imposing a rule at all? Dbfirs 16:04, 27 December 2015 (UTC)
- Webster, Oxford and Wiktionary don't show an alternative of "team work". If anyone can show "team work" is a valid alternative in today's use of English, then this should be removed. Stevie is the man! Talk • Work 16:15, 27 December 2015 (UTC)
- The big Oxford (Second Edition) has both team work and team-work with just one cite for teamwork as a concatenation. I agree that usage has changed considerably over the past fifty years, and that entry has not yet been updated for the Third Edition. Perhaps I'm just out of date? Does Wikipedia have a policy of preference for concatenated forms? Dbfirs 16:47, 27 December 2015 (UTC)
- I don't have answers for those questions (maybe someone else does). We're just in the business of correcting typos. So, perhaps it should be determined if "team work" is a typo in the sense that this isn't the common contemporary usage. Based on my lookups when I created the rule, I couldn't find "team work". I don't see how this correction could be controversial; however, if anyone wants to start a process for determining consensus on keeping it or not, that would be perfectly fine. I'm not married to the idea of keeping it. Stevie is the man! Talk • Work 17:18, 27 December 2015 (UTC)
- Well obviously you won't find an entry for team work in most dictionaries because it consists of two words. The OED has it under team. The two-word term would merit an entry in Wiktionary since it satisfies the three cites rule, but you will not find an entry because the expression is just the sum of the two parts. If you want some cites, here are three: The efforts undertaken by every team member for the achievement of the teams objective is known as team work. from managementstudyguide.com; Good understanding of team work and team building are critical for your business success or corporate office career from time-management-guide.com; and Everyone agrees team work is (like apple pie) a good thing from the UK National Health Employers' website. There are many more. Dbfirs 17:41, 27 December 2015 (UTC)
- Also obvious is that teamwork as one word is extremely common (type 'teamwork' on Google). I will abide by a consensus decision either way, but it seems to me that 'teamwork' is by far the prevalent usage, and so that's my stance. I don't see harm from this correction at any rate. Stevie is the man! Talk • Work 00:37, 28 December 2015 (UTC)
- @Stevietheman: Here's the list of articles where I thought it was NOT appropriate to change "team work" to "teamwork":
- Thanks! GoingBatty (talk) 02:33, 28 December 2015 (UTC)
- You'd also have to consider "team work" as opposed to "individual work" (e.g. a cart being pulled by two oxen instead of one), with "team work" referring to the work performed by the group of individuals, not necessarily referring to active cooperation. Consider also Team#Interdependent teams versus Team#Independent teams; you could talk about "team work" on the part of the latter, even though they're just individuals whose successes get amalgamated. Nyttend (talk) 22:38, 29 December 2015 (UTC)
- Honestly, I am having trouble wrapping my head around this. Perhaps a link to a source that expands upon this would help. Stevie is the man! Talk • Work 17:17, 1 January 2016 (UTC)
- Thank you for the list. I could probably work around most of these easily. Stevie is the man! Talk • Work 17:17, 1 January 2016 (UTC)
This agglutination has been a problem with AWB typos for years. It needs very careful and informed consideration before such rules are added. All the best: Rich Farmbrough, 01:33, 1 January 2016 (UTC).
- We're always happy to deal with false positives, or re-consider entries. But we can't do that unless someone brings up a specific issue. Stevie is the man! Talk • Work 17:17, 1 January 2016 (UTC)
- Well thank you for that. I have been offering specific issues since the day this talk page was started in August 2006, but since I no longer use AWB for typo fixing, I don't come across many specific issues. I am still happy to provide some shreds of institutional memory when I can. All the best: Rich Farmbrough, 00:42, 2 January 2016 (UTC).
Removed - this is just too complicated. Thanks to everyone who tried to explain that, but it took looking at the various examples deeply to realize how complicated a working rule would have to be. Therefore, I conclude it's not worth it to have the rule. Stevie is the man! Talk • Work 22:40, 6 January 2016 (UTC)
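To illustrate why a working rule gets complicated, here is a rough Python sketch of the sort of guard that would be needed. It is not the removed rule; the list of following words and the name `join_teamwork` are assumptions, and the heuristic still misses cases (for example, the noun phrase "team work in the office" would be skipped).

```python
import re

# Hypothetical guard: skip "team work" when the next word suggests that
# "work" is a verb ("his team work on ..."). The word list is a guess.
VERB_FOLLOWERS = r"(?:on|with|at|in|for|to|together)"

TEAM_WORK = re.compile(r"\b([Tt])eam work\b(?! " + VERB_FOLLOWERS + r"\b)")


def join_teamwork(text: str) -> str:
    """Join 'team work' into 'teamwork' unless a listed word follows."""
    return TEAM_WORK.sub(r"\1eamwork", text)


if __name__ == "__main__":
    print(join_teamwork("Good team work is essential."))
    print(join_teamwork("He and his team work on engines."))
```

Even with the guard, the sense Nyttend describes (work done by a team as opposed to an individual) cannot be told apart from "teamwork" by pattern matching alone.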
New typo rule?
Hey there. I was doing some typo correction and when I got to Chalco de Díaz Covarrubias AWB wanted to correct "native americans" to "native Americans". I'm thinking "Native Americans" is the correct capitalization. If that sounds reasonable, would someone mind adding a new rule? I would do it myself but I don't know how. Cheers. Braincricket (talk) 21:28, 7 December 2015 (UTC)
- @Braincricket: Added in this edit. GoingBatty (talk) 01:56, 8 December 2015 (UTC)
- @GoingBatty:Thanks! Braincricket (talk) 19:52, 8 December 2015 (UTC)
- @Braincricket and GoingBatty: Just so you know, I processed nearly 10,000 pages over the past two days with AWB and → Native American was among the top suggestions. Cheers! {{u|Checkingfax}} {Talk} 07:59, 13 January 2016 (UTC)
If word or phrase is already piped, I consider ]]'s to be a typo
In a situation like this: [[Pressman Toy Corporation|Pressman company]]'s, I consider putting the 's outside the brackets to be a typo. It is crufty and ugly, and should be like this: [[Pressman Toy Corporation|Pressman company's]]
Can somebody create a typo fix rule to scan for those and fix them? The regex search would be for wikilinks that are piped and contain a ]]'s instead of a 's]] and the replacement would be to replace the ]]'s right-hand portion of the wikilink with a 's]] Cheers! {{u|Checkingfax}} {Talk} 09:18, 13 January 2016 (UTC)
- As far as I know, we don't do MOS-related fixes as typos - we only look at misspellings and limited grammar misuses. Also, I'm not sure where in the MOS there is a preference for one approach over the other. If you can point that out in the MOS, that would be something the AWB developers could look at. Stevie is the man! Talk • Work 12:02, 13 January 2016 (UTC)
- @Checkingfax: Interesting how [[toy]]s and [[toys]] look the same to the reader (toys and toys), but links followed by an apostrophe do not (Pressman company's vs. Pressman company's) per MOS:PIPE. Since AWB ignores the typo rules within wikilinks, I don't know if the AWB typo rules could technically support your request. Instead, you could try a find and replace rule, such as:
\[\[(.*?)\|(.*?)\]\]'s
→[[$1|$2's]]
(which I have not tested). Hope this helps! GoingBatty (talk) 17:05, 13 January 2016 (UTC)
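As a rough sketch of the find and replace idea above, here it is in Python rather than in AWB itself. The function name `move_possessive_inside` and the tightened character classes are assumptions made for illustration, not part of the suggestion; Python also writes the backreferences as `\1` and `\2` where AWB uses `$1` and `$2`.

```python
import re

# Assumed tightening of the suggested pattern: [^\[\]|]+ keeps each captured
# part inside a single link instead of relying on the lazy .*? groups.
PIPED_POSSESSIVE = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]'s\b")


def move_possessive_inside(wikitext: str) -> str:
    """Move a trailing 's inside the label of a piped wikilink."""
    return PIPED_POSSESSIVE.sub(r"[[\1|\2's]]", wikitext)


# Piped link: the possessive moves inside the label.
print(move_possessive_inside("[[Pressman Toy Corporation|Pressman company]]'s games"))
# Unpiped link: left alone, since only piped links were requested.
print(move_possessive_inside("[[toy]]'s wheels"))
```

Anything like this would still need checking against real articles before use, for example links inside templates or file captions.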
people that → people who
There are many instances of "people that" which should be changed to "people who", but quite a few false positives. I'm thinking that "people that verb" are those to be changed, but it's hard to create a regex for that. Maybe \bpeople\s+that\s+([a-z]+ed|had|went|were)\b
would catch many past tense verbs. Any other suggestions? GoingBatty (talk) 19:33, 4 February 2016 (UTC)
- Trying out this rule - feedback is appreciated, as always. GoingBatty (talk) 19:13, 6 February 2016 (UTC)
- Two of my edits were reverted, so I did more research, and not everyone agrees that "people that" is wrong. For example, see 10 Grammar Mistakes People Love To Correct (That Aren't Actually Wrong). GoingBatty (talk) 20:29, 6 February 2016 (UTC)
- Hello; the construction "people that" (or "man that", "person that", etc.) is in fact recommended by certain style guides (especially in the social sciences) for restrictive clauses. The change to "who" (which is also common and acceptable) is unnecessary. Thanks. Doremo (talk) 21:53, 6 February 2016 (UTC)
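For anyone wanting to see what the trial pattern from this thread actually catches, a small Python sketch follows. The regex is the one quoted above; the replacement string `people who \1` and the function name `suggest` are assumptions, since the thread does not show the replacement that was used.

```python
import re

# Pattern quoted in the thread; intended to catch "people that" + past-tense verb.
PEOPLE_THAT = re.compile(r"\bpeople\s+that\s+([a-z]+ed|had|went|were)\b")


def suggest(text: str) -> str:
    """Return the text with the (assumed) 'people who' replacement applied."""
    return PEOPLE_THAT.sub(r"people who \1", text)


print(suggest("the people that lived there"))   # matched via [a-z]+ed
print(suggest("people that are still living"))  # not matched: "are" is not listed
# "need" ends in "ed", so it is matched even though it is not a past-tense verb.
print(suggest("people that need help"))
```

The third case shows that `[a-z]+ed` is only a rough stand-in for "past tense verb", which fits the thread's conclusion that the rule is hard to get right.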
- Two of my edits were reverted, so I did more research, and not everyone agrees that "people that" is wrong. For example, see 10 Grammar Mistakes People Love To Correct (That Aren't Actually Wrong). GoingBatty (talk) 20:29, 6 February 2016 (UTC)
Well-received
<Typo word="well received" find="\b([Ww])ell-received\b(?=\.| by\b| in\b| at\b)" replace="$1ell received"/>
"Well-received" is not a mistake. It's an attested variant of "well received", as with many other compound adjectives beginning with "well-". [5][6] Deryck C. 21:52, 18 February 2016 (UTC)
- The existing rule only changes "well-received" when the compound modifier is postpositive. This is in agreement with collinsdictionary.com, macmillandictionary.com and dictionary.com. No action is needed here. Chris the speller yack 23:29, 18 February 2016 (UTC)
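The effect of the lookahead in the rule quoted at the top of this thread can be checked with a short Python sketch. The regex is copied from the rule; the function name `apply_rule` is an assumption, and Python's `\1` stands in for the `$1` used in the rule.

```python
import re

# find= pattern from the "well received" rule: the lookahead only allows a
# match when the phrase is followed by a full stop or " by", " in", " at".
WELL_RECEIVED = re.compile(r"\b([Ww])ell-received\b(?=\.| by\b| in\b| at\b)")


def apply_rule(text: str) -> str:
    """Apply the rule's replacement, keeping the original capitalisation."""
    return WELL_RECEIVED.sub(r"\1ell received", text)


print(apply_rule("The album was well-received by critics."))  # postpositive: changed
print(apply_rule("It was a well-received album."))            # before the noun: unchanged
```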
Therefore, ?
I've noticed that a new feature of AWB's typo correcting actions is adding a comma to the following words:
Accordingly, Consequently, Furthermore, Indeed, Meanwhile, Moreover, Nevertheless, Therefore, etc.
I have recently received a message on my talk page informing me that my AWB edit changing Therefore to Therefore, was 'not good style' according to the Chicago Manual of Style 5.69.
I'm not sure if there has already been a discussion about this somewhere.
Could someone please look into this, as it could potentially affect hundreds of AWB edits?
Kind Regards Marek.69 talk 22:19, 23 February 2016 (UTC)
- This was indeed discussed recently. Stevie is the man! Talk • Work 23:02, 23 February 2016 (UTC)
- I'm going to post over there. I still think it's not warranted. - Eponymous-Archon (talk) 23:33, 23 February 2016 (UTC)
- Thanks for the link Stevietheman. Best Regards -- Marek.69 talk
- Sorry to raise this again, but I still think these "corrections" are unwarranted (agreeing with Rich). The Chicago Manual of Style indeed lacks a direct example of an initial "therefore" in the section cited above, but at several points it gives examples for other items that use such an initial "therefore" without a comma. For example, at "6.55 Semicolons with “however,” “therefore,” “indeed,” and the like" it provides an example with a clause-initial "therefore" that lacks a comma: The trumpet player developed a painful cold sore; therefore plans for a third show were scrapped.; or 5.199 he had betrayed the king; therefore he was banished. In contrast Appendix A of the same work provides an example with a comma in its text (that is, not in an example), so the manual is not even consistent (my point). Clearly other style guides disagree, but many will also say that short initial phrases shouldn't get a comma. My point simply is that this is a matter of preference and not at all a settled "rule", and so should be left alone. - Eponymous-Archon (talk) 6:50 pm, Yesterday (UTC−5)
- @Eponymous-Archon: I'm not a punctuation expert, so I won't try to express an opinion on your general point. But your examples with lowercase "therefore" won't be changed by the AWB rule as it stands, because the rule is only written to add a comma after uppercase "Therefore". -- John of Reading (talk) 16:14, 24 February 2016 (UTC)
Meanwhile Gardens
There is a place called "Meanwhile Gardens". AWB inserts a comma after Meanwhile. Can someone update this regex to exclude "Meanwhile gardens" please? The Quixotic Potato (talk) 12:03, 11 March 2016 (UTC)
- Done -- John of Reading (talk) 15:25, 11 March 2016 (UTC)
Franciscans-based
At least 3 times now, AWB has hit Spanish conquest of Yucatán with "Franciscans-based" in the phrase probably the majority of the Franciscans based there at the time. This is not a typo. Simon Burchell (talk) 12:11, 7 March 2016 (UTC)
- @Simon Burchell: I've added {{Not a typo}} to the article so that AWB users won't be prompted to make this mistake again. -- John of Reading (talk) 12:22, 7 March 2016 (UTC)
- That's great - thanks. Simon Burchell (talk) 12:29, 7 March 2016 (UTC)
- I'm not sure of all the ins and outs of this, but generally phrases like this ("something-based") use singular nouns followed by "-based", so no final s on the noun. The exception of course is for place names that end in s, but those should be capitalized, so that could help with the rule that the bot is using. "Franciscans-based", for example, wouldn't be good usage anyway. The rule then is "X based" should be hyphenated, except when X begins with a lower-case letter and ends in a single s. This won't always work since some words end in s in the singular, but not many and a number of those end in two s's (e.g., business). - Eponymous-Archon (talk) 14:12, 7 March 2016 (UTC)
- This seems like another non-ENGVAR neutral change. Americans, perhaps because of their large German heritage, seem to prefer more agglutinative styles, hyphenating (sometimes optionally) where the rest of the world would leave a space, and joining (sometimes optionally) where the rest of the world would use a hyphen. AWB should not be enforcing a national preference, unless it can be demonstrated to comply with WP:COMMON. In general the more separated style makes sense to everyone, the more agglutinative style jars badly for some readers. All the best: Rich Farmbrough, 19:02, 31 March 2016 (UTC).
- I'm OK with an evidence-based discussion of this, including specific examples where typo fixes are unnecessary. We don't want to be correcting something where a correction isn't called for. Stevie is the man! Talk • Work 20:17, 31 March 2016 (UTC)
Changing "for" to "of" after "accused"
Many pages have "for" after "accused", instead of the correct preposition "of". Apparently, some people are confusing "accused of" with "punished for" and "forgiven for".
—Wavelength (talk) 02:50, 12 March 2016 (UTC)
- @Wavelength: There seem to be some false positives. Maybe limiting it to "(is|was) accused for" would work? GoingBatty (talk) 14:08, 3 April 2016 (UTC)
- GoingBatty, thank you for your reply. This now seems to me to be an area of too much uncertainty for editors who do not have a knowledge of the subject matter. Even when I know that "accused for" is incorrect, I do not know whether the author intended "accused of" [a wrong that might not have happened] or "blamed for" [a wrong that definitely happened] or something else. Instead of other editors trying to acquire knowledge of the subject matter, it might be better if one or more editors focus on helping writers of incorrect English to improve their use of English.
- —Wavelength (talk) 01:14, 4 April 2016 (UTC)
- @Wavelength: That's a good reason to not have a typo rule for this - thanks for the explanation. GoingBatty (talk) 01:57, 4 April 2016 (UTC)
between ... to → between ... and
Can anyone see a potential problem with or source of false positives for
<Typo word="between ... and" find="\b([Bb]etween (?:[0-9,.]+|zero|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|(?:twen|thir|four|for|fif|six|seven|eigh|nine)(?:teen|ty)) )to\b" replace="$1and"/>
? I'm not very experienced with regex, so I may have made some mistakes.
Additionally, what am I doing wrong with the search
insource:/([Bb]etween ([0-9,.]+|zero|one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|(twen|thir|four|for|fif|six|seven|eigh|nine)(teen|ty)) )to /
?
When I use the search box I get the expected results, but when I put it in AWB using "Wiki search (text)", I get a very large number of anomalous results. — crh 23 (Talk) 13:52, 8 April 2016 (UTC)
- The list returned from that search in AWB appears to be searching for zero in article body and sorting by relevance — crh 23 (Talk) 20:21, 8 April 2016 (UTC)
- I've gone through about 100 edits using a database dump with the above regex, and I have found no false positives, so I'm adding the regex to the list. — crh 23 (Talk) 16:12, 15 April 2016 (UTC)
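The proposed rule can be tried outside AWB with a short Python sketch. The regex is the `find` pattern quoted above, split across lines only for readability; the function name `fix_between` is an assumption, and `\1` replaces the rule's `$1`.

```python
import re

# find= pattern from the proposed "between ... and" rule.
BETWEEN_TO = re.compile(
    r"\b([Bb]etween (?:[0-9,.]+|zero|one|two|three|four|five|six|seven|eight"
    r"|nine|ten|eleven|twelve"
    r"|(?:twen|thir|four|for|fif|six|seven|eigh|nine)(?:teen|ty)) )to\b"
)


def fix_between(text: str) -> str:
    """Replace 'to' with 'and' after 'between <number> '."""
    return BETWEEN_TO.sub(r"\1and", text)


print(fix_between("aged between 5 to 10 years"))
print(fix_between("Between fourteen to sixteen people attended"))
print(fix_between("it was left between them to decide"))  # no number: unchanged
```

A handful of sample sentences is no substitute for the database-dump run described above, but it makes it easy to probe suspected false positives.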
mali → Mali
Hi, I seem to be getting a lot of false positives on Regex typo fixing mali → Mali. I edit a lot of Eastern European articles in which these false positives are quite frequent (I believe 'mali' means 'little' in Slavic). Is there a way to modify the Regex to overcome this problem? -- Marek.69 talk 21:20, 28 April 2016 (UTC)
- @Marek69: could you give a couple of examples where mali should not be corrected to Mali? — crh 23 (Talk) 11:13, 29 April 2016 (UTC)
"Indeed" being the last word in a bullet point
Could the code be changed so as not to try to "fix" the "Indeed[comma]" issue with Oscar Brand discography (§ I Love Cats (1994/5-Alcazar/Alacazam!))? (t) Josve05a (c) 17:36, 2 May 2016 (UTC)
- It looks to me that the problem in the rule causing this,
<Typo word="Furthermore," find="\b(Accordingly|Consequently|Even\s+so|Furthermore|In\s+other\s+words|Indeed|Meanwhile(?!\s+Gardens)|Moreover|Nevertheless|On\s+the\s+other\s+hand|Therefore|For\s+example)(?=\s)" replace="$1,"/>
would be fixed by replacing the lookahead (?=\s) with (?= ). However, it looks like the vast majority of the rules on this page use \s over [ ]: does anyone know why newlines and tabs need to be matched in all those cases? I've not fixed it for now, as I feel like there's something going on that I don't understand. — crh 23 (Talk) 08:52, 3 May 2016 (UTC)
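The difference between the two lookaheads can be seen in a small Python sketch. The word list here is deliberately shortened to three entries from the rule, and the names `ANY_WHITESPACE`, `SPACE_ONLY` and `add_comma` are assumptions for illustration.

```python
import re

# Abbreviated word list taken from the rule (the real rule has more entries).
WORDS = r"\b(Indeed|Meanwhile(?!\s+Gardens)|Therefore)"

ANY_WHITESPACE = re.compile(WORDS + r"(?=\s)")  # lookahead as in the current rule
SPACE_ONLY = re.compile(WORDS + r"(?= )")       # lookahead proposed in this thread


def add_comma(pattern: re.Pattern, text: str) -> str:
    """Insert a comma after any matched sentence adverb."""
    return pattern.sub(r"\1,", text)


bullets = "* Indeed\n* Next item"
print(repr(add_comma(ANY_WHITESPACE, bullets)))  # newline counts as \s, so a comma is added
print(repr(add_comma(SPACE_ONLY, bullets)))      # only a literal space qualifies: unchanged
print(add_comma(SPACE_ONLY, "Therefore he was banished."))
print(add_comma(SPACE_ONLY, "Meanwhile Gardens is a park."))  # exclusion still applies
```

Note that the space-only variant would also stop matching a word followed by a tab or a line break in mid-sentence, which may be part of why `\s` is so common in the other rules.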
"They where"
I've gone through "They where" a couple of times now changing them to "They were", including another twenty from the last month. I think this would be worth adding to AWB for the future. ϢereSpielChequers 08:56, 12 May 2016 (UTC)
- The mistake is possibly caused by the wine–whine merger, which can also lead to "were is/are/was/were" instead of "where is/are/was/were".
- —Wavelength (talk) 18:42, 12 May 2016 (UTC)
High-profile
The new "High-profile" rule begins "At every word break, look back at the previous text to see whether..." Is this as inefficient as it looks? -- John of Reading (talk) 10:57, 2 August 2016 (UTC)
- @John of Reading: Hmm, not sure. The current regex is
\b(?<!(?:[Bb]ecause\s+of\s+(?:his|her|its|their)|(?:achiev(?:es?|ed|ing)|creat(?:es?|ed|ing)|display(?:s?|ed|ing)|has|have|keep(?:s?|ing)|kept|(?:main|re)tain(?:s?|ed|ing)|with)\s+a)\s+)([Hh])igh(?<!(?:[A-Z][A-Za-z]+|specified|the)\s+High)[-\s]+profile(?!,|\s+(?:and|as|in|of))\b
which without the lookbehinds and lookaheads becomes
\b([Hh])igh[-\s]+profile\b
which is a pretty normal regex. I think the order of execution depends on the regex engine; it could be that it does the lookbehinds last. As a side note, I'm not sure why it matches "high-profile" and replaces it with itself. Pinging Stevietheman — crh 23 (Talk) 11:30, 2 August 2016 (UTC)
- This regex finds instances of "high profile" as an adjective and replaces it with "high-profile". The "inefficiency" is avoiding uses where "high profile" isn't an adjective. I suppose it doesn't have to seek "high-profile" but when it "replaces" it, there's no change. At any rate, I tested this across all pages of the English Wikipedia that contain this word. Let me know if it incorrectly corrects something. Stevie is the man! Talk • Work 12:10, 2 August 2016 (UTC)
- Also, as far as I know, it processes the lookaheads/lookbehinds after it finds the core text, and in the vast majority of articles, this core text won't be there. Therefore, I don't believe there is an efficiency issue. Stevie is the man! Talk • Work 12:14, 2 August 2016 (UTC)
- If that is indeed the case, then it's perfectly fine. It would only be a problem if the regex is strictly matched left to right, as it would match every word boundary, check the exclusions with the negative lookbehind, and only then attempt to match "[hH]igh profile". I'm going to stop it from matching "high-profile", as that increases the frequency of testing the negative lookbehinds for no benefit (that I can see). Otherwise, I think this looks good. — crh 23 (Talk) 13:42, 2 August 2016 (UTC)
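A Python sketch of the stripped-down core pattern shows two points from this thread: matching "high-profile" produces no change, and without the exclusions the rule would also hyphenate noun uses. Python's `re` module does not accept the variable-width lookbehinds in the full rule, so only the core is reproduced; `CORE_SPACES_ONLY` is a hypothetical narrowed variant, not the rule as it was actually edited.

```python
import re

# Core of the rule with all lookbehinds and lookaheads removed.
CORE = re.compile(r"\b([Hh])igh[-\s]+profile\b")
# Hypothetical variant that no longer matches the already-hyphenated form.
CORE_SPACES_ONLY = re.compile(r"\b([Hh])igh\s+profile\b")


def hyphenate(pattern: re.Pattern, text: str) -> str:
    """Replace any match with the hyphenated spelling."""
    return pattern.sub(r"\1igh-profile", text)


print(hyphenate(CORE, "a high profile case"))   # adjective use: hyphen added
print(hyphenate(CORE, "a high-profile case"))   # matched, but replaced with itself
print(CORE_SPACES_ONLY.search("a high-profile case"))  # None: no match attempted
# Without the rule's exclusions, a noun use is changed too.
print(hyphenate(CORE, "She keeps a high profile."))
```

This says nothing about the order in which the .NET engine evaluates the lookarounds; it only illustrates what the core pattern does on its own.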
False positive with "onboard" → "on board"
I just wanted to report a false positive for the subject typo fix, in case anyone thinks there's a way to fix it. For now, I've just applied the {{not a typo}} template. Stevie is the man! Talk • Work 19:39, 11 August 2016 (UTC)
Romansh digraph being treated as "id est"
I keep having to revert "corrections" that change the Romansh digraph "ie" to "i.e.": [7], [8], [9]. Might be an issue in other articles too that deal with other languages. --Terfili (talk) 23:37, 12 August 2016 (UTC)
World-famous
@GoingBatty: let's have a discussion about whether the recent correction to Nate Thayer was incorrect or not, and whether we should revert the entire rule based on that. I reviewed the change in the article, and as far as I know, the correction looked, well, correct. My experience with the rule is that I've done a lot of seemingly correct corrections with no complaints whatsoever. Stevie is the man! Talk • Work 21:23, 18 August 2016 (UTC) Also calling Chris the speller. Stevie is the man! Talk • Work 21:26, 18 August 2016 (UTC)
- @Cmacauley: Could you please join this conversation, since you reverted my correction on the Nate Thayer article? Thanks! GoingBatty (talk) 02:28, 19 August 2016 (UTC)
- @GoingBatty: @Stevietheman: @Cmacauley: Have you looked at Macmillan dictionary? Macmillan is very dependable for hyphenation issues, and they make a point of specifying whether compound modifiers are hyphenated only when they appear before the noun. Chris the speller yack 02:38, 19 August 2016 (UTC)
- It's clearly hyphenated no matter what. Also per [10], [11], [12] and [13], applicable in both British and American English. Stevie is the man! Talk • Work 14:34, 19 August 2016 (UTC)
"after it's" → "after its" in "Its (after)" rule?
Recently, I ran into a clause "modeled after it's" that wasn't corrected. Given that Wikipedia frowns upon using contractions in non-quoted prose, does anyone see a problem with fixing "after it's"? Stevie is the man! Talk • Work 19:52, 11 August 2016 (UTC)
- I explored creating a rule (rather, expanding a current rule) for this but it looked to be overly complicated. Instead, I just went through all the cases and corrected them via Find/Replace. Stevie is the man! Talk • Work 11:02, 7 September 2016 (UTC)
Scretary
There are a large number of instances of people typing "scretary" when they mean "secretary" - too many to fix by hand and the number and variety of articles suggests that it will be repeatedly made. It's probably worth doing a run through to fix all the current ones and then adding it to the general fixes list. Thryduulf (talk) 12:27, 31 August 2016 (UTC)
- I went through and manually fixed all of them a few days after leaving the above comment. Today, I've just fixed another few including one "scretaries" → "secretaries". There is definitely scope therefore for this to be added to the list by someone who understands how (I don't). Thryduulf (talk) 22:02, 21 September 2016 (UTC)
- @Thryduulf: Done in this edit. GoingBatty (talk) 15:33, 27 November 2016 (UTC)
Question about # meter lead
1993 World Championships in Athletics – Men's 10,000 metres includes the text "Tanui sprinted out to a quick 5 meter lead, expanding to a 10 meter lead". Why does AWB change this to "10-meter lead", but doesn't change the "5 meter lead"? Thanks! GoingBatty (talk) 15:30, 27 November 2016 (UTC)
- Because "5 meter lead" is preceded by a word we don't check for: 'quick'. Stevie is the man! Talk • Work 15:41, 27 November 2016 (UTC)
"award winning" → "award-winning"
This seems to be a somewhat common misspelling, using a space instead of a hyphen. However, I'm wondering whether this word's use in puffery means we shouldn't correct it. Any thoughts? Stevie is the man! Talk • Work 20:21, 31 December 2016 (UTC)
- There are many instances where the expression seems to be substantiated.
- —Wavelength (talk) 20:49, 31 December 2016 (UTC)
- Thanks. I can see that in my testing, although there does seem to be some puffery uses too. I went ahead and added it. I guess we don't have to be puffery police when we're just correcting typos. :) Stevie is the man! Talk • Work 15:07, 1 January 2017 (UTC)
"instant grat" → "instant great" incorrect
There's a rule which replaces grat with great (named "_Great" in the typo list). However, instant grat (short for instant gratification) is a real concept, referring to songs released on iTunes during the pre-order phase of an album. Could the rule be modified to put in instant grat as an exclusion? Harryboyles 13:36, 14 January 2017 (UTC)
- Done -- John of Reading (talk) 14:07, 14 January 2017 (UTC)
- A very quick test shows that the exception for instant grat works. Thanks! Harryboyles 16:04, 14 January 2017 (UTC)
Halfway vs half way
What on earth is going on with AWB enforcing "halfway" over "half way"? Both spellings are usually considered acceptable, and the choice is stylistic. It is a bit much having AWB enforce an arbitrarily preferred spelling where no error exists. Simon Burchell (talk) 11:40, 14 January 2017 (UTC)
- @Simon Burchell: more appropriate arena for this question is WT:AWB/T. But this specific change was added here by Chris the speller. --Edgars2007 (talk/contribs) 12:27, 14 January 2017 (UTC)
- @Simon Burchell:@Edgars2007: Dictionaries determine what is acceptable, and I found none that accept "half way". While digging through them again, I see that Collins says "also half-way", so I have changed the rule accordingly. It will still change "half way" to "halfway". Chris the speller yack 14:46, 14 January 2017 (UTC)
- A Google search for "half way" site:theguardian.com finds about 13,700 results, from that one source alone. Please remove this rule ASAP. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:50, 14 January 2017 (UTC)
- General question: Should the Wikipedia be endorsing the misspelling of a word, even if by a mass media outlet? Stevie is the man! Talk • Work 18:59, 14 January 2017 (UTC)
- From the Grauniad's own style guide, halfway, halfwit. Mr Stephen (talk) 23:20, 14 January 2017 (UTC)
- The 13,700 hits is a crock. Some are movie titles, and the first one I examined, "Which plays should you leave halfway through?", has it spelled correctly, and nowhere does it have the wrong spelling. Maybe Google search equates "halfway" and "half way" in some cases. A search for "halfway through" site:theguardian.com gives twice as many hits as "half way through" site:theguardian.com, so apparently most of the contributors to that site get it right. The AWB rule requires "half way" to be followed by a qualifying word such as "across" or "through", and the Google search does not; this is definitely apples and oranges. At any rate, the Gaurnida doesn't outrank a good dictionary. If the rule were removed, it would do nothing to help the many WP articles that have both "halfway" and "half way". Chris the speller yack 18:22, 15 January 2017 (UTC)
- Chaucer used "half way" (but spelt it "half wey"); Shakespeare used "half way" in The Taming of the Shrew (but spelt it "halfe way"); California had a place called Half Way; more recently Harold Pinter used "half way" in The Caretaker (1960); various books use "half way" such as Half Way House (Maurice Hewlett), Half Way Home (C. W. Gill - 2011); Half Way to 1983: Governor Tatari's 24 Months in Office : Oct. 1979 - 1981; Half Way Between Everything (Marilyn Dennes, Susan Gilchrist, 2005); " Many inns like this were called “Half Way House” because they were half way between one town or village and the next." (Dolls In Canada by Marion E. Hislop - 1997 - Page 14). I could go on and on ... I agree that we should manually check consistency within one article, but half way, half-way and halfway are all acceptable. Why are we even considering this hypercorrection by bot? Dbfirs 21:11, 15 January 2017 (UTC)
- Our concern is with contemporary English. Also, there is creative license for how language is used in titles. Our concern is what up-to-date reliable English dictionaries recommend. Halfway isn't spelled "half way" in any of them, apparently. Unless anyone can find "half way" as a legitimate spelling, I support the correction. Stevie is the man! Talk • Work 21:22, 15 January 2017 (UTC)
- Both half and way are found in all dictionaries (two-word terms are not listed of course, but they are cited), along with half-way and halfway. We had this argument in 2012. I cited Harold Pinter and a few recent books. I could find lots more. Could we please also remove midway from the bot? I'd support the correction if it could be limited to articles written in American English where concatenation is the current fashion. Dbfirs 21:28, 15 January 2017 (UTC)
- Oxford and Cambridge (U.S. and British) dictionaries don't agree with your position. If they did, they would show "half way" as an alternative. As for arguments, I'm only willing to see cites from dictionaries as arguments. Stevie is the man! Talk • Work 21:50, 15 January 2017 (UTC)
- The OED under the entry half- says: "The two elements are often written separately when the adj. is in the predicate (see half adv. 1); the use of the hyphen mostly implies a feeling of closer unity of notion in the compound attribute, as in half-blind, half-dressed, half-raw, viewed as definite states; but it is often merely for greater syntactical perspicuity, on which ground it is regularly used when the adjective is attributive, thus I am half dead (or half-dead) with cold; a half-dead dog.". I quoted three examples where the OED cites half way (Chaucer, Shakespeare and Pinter; I missed the fourth one by Goldsmith) though I agree that only Pinter is modern. The OED puts the hyphenated form first with the concatenated form as an alternative. Dbfirs 22:04, 15 January 2017 (UTC)
- As you have shown, it's either "half-way" or "halfway" per a dictionary entry (what AWB tolerates at this point). I don't think we should be concerned about a particular writer's (external to Wikipedia) usage. Stevie is the man! Talk • Work 22:21, 15 January 2017 (UTC)
- You are, of course, entitled to your opinion, but I have provided numerous examples of half way as two words in common usage, and you have not established that this usage is proscribed. Dbfirs 00:02, 16 January 2017 (UTC)
- My position is based on dictionary entries, not my feelings. "Common usage" does not bound English writing in an encyclopedia, as there are all kinds of incorrect but common usages. We are going for the correct here. As is usual, please feel free to start an RfC if you would like to establish a consensus decision on this matter. Stevie is the man! Talk • Work 00:19, 16 January 2017 (UTC)
- No dictionary proscribes the adjacent use of two words that appear in the dictionary. Dictionaries do not make rules about style. The Oxford English Dictionary has several cites for half way. I can see that we are not going to agree, so may we compromise by keeping halfway for all articles in American English, and half-way for all articles in British English (since the OED puts the hyphenated form first)? Dbfirs 00:30, 16 January 2017 (UTC)
- I agree with the above. Looking at my previous sentence, I doubt any dictionary contains most of the two-word combinations there, yet they are correct as written. Hyphenated "half-way" is an acceptable compromise, but honestly, "half way" is not wrong. Simon Burchell (talk) 10:16, 16 January 2017 (UTC)
- I am not in agreement per my earlier statements. Dictionaries ordinarily say if separated word usages ("half way") are an alternative. In this case, no dictionaries appear to do that. I side with correct uses only. As for leaving "half-way" as is (btw, two-word term), the current typo rule does this. Also, technically speaking, there is no way to split typo corrections for U.S./British English. If "common use" is going to be forced here, IMHO, this requires a community consensus to go against the dictionaries. Stevie is the man! Talk • Work 15:12, 16 January 2017 (UTC)
- Also I will reiterate that two UK-based dictionaries (Cambridge/Oxford), in their entries for this word, say it's 'halfway' with no alternatives. Stevie is the man! Talk • Work 15:20, 16 January 2017 (UTC)
- Sorry to butt in, but the above is just not true. The OED specifically gives "half-way" as the primary form, with "halfway" relegated to "accepted variant" and numerous examples of the unhyphenated "half way" form in the usage examples. ‑ Iridescent 15:28, 16 January 2017 (UTC)
- The 13,700 hits is a crock. Some are movie titles, and the first one I examined, "Which plays should you leave halfway through?", has it spelled correctly, and nowhere does it have the wrong spelling. Maybe Google search equates "halfway" and "half way" in some cases. A search for
- A Google search for
- @Simon Burchell:@Edgars2007: Dictionaries determine what is acceptable, and I found none that accept "half way". While digging through them again, I see that Collins says "also half-way", so I have changed the rule accordingly. It will still change "half way" to "halfway". Chris the speller yack 14:46, 14 January 2017 (UTC)
@Iridescent:, I don't have access to the link you used, but I will take your word for it. The current typo fix doesn't correct half-way or halfway. I don't know why examples of "half way" are provided but not shown as an alternative in the entry itself. Note that my position is not controlling on this typo fix, as others add/update these things, and this typo fix was not created by me, but I frankly don't think there's a strong reason to change my position. If the dictionary thought that "half way" was common enough, it would have noted that as an alternative. I would prefer a community consensus to decide this. Stevie is the man! Talk • Work 15:38, 16 January 2017 (UTC)
Also it might help to see the OED examples of "half way". That might illuminate things. Stevie is the man! Talk • Work 15:40, 16 January 2017 (UTC)
- These are the examples the OED uses of the unhyphenated form. Bear in mind that the OED focuses primarily on earliest usage of a form, so most of them are 17th-century, but there's no suggestion that "half way" isn't still an acceptable usage and it isn't marked as archaic:
- Adv
- c1405 (▸c1390) Chaucer Reeve's Tale (Hengwrt) (2003) Prol. l. 52 Lo Depeford and it is half wey pryme.
- 1530 J. Palsgrave Lesclarcissement 861/2 Halfe waye, au milieu du chemyn, or a my chemyn.
- a1616 Shakespeare Taming of Shrew (1623) i. i. 62 I-wis it is not halfe way to her heart.
- 1717 tr. A. F. Frézier Voy. South-Sea 106 A little above half way up a high mountain.
- 1757 G. Shelvocke, Jr. Shelvocke's Voy. round World (ed. 2) vi. 198 Before I had got half way off.
- 1766 O. Goldsmith Vicar of Wakefield I. x. 96 About half way home.
- 1960 H. Pinter Caretaker iii. 77 He's nutty, he's half way gone.
- Noun
- 1634 T. Herbert Relation Some Yeares Trauaile 13 Cape of good Hope..being the halfe way into India.
- c1665 L. Hutchinson Mem. Col. Hutchinson (1973) 20 In the halfe way betweene Owthorpe and Nottingham.
- Prep
- 1613 S. Purchas Pilgrimage 488 A cloth..which reacheth halfe way the thigh.
- 1706 I. Watts Devotion & Muse in Horæ Lyricæ i. iii, Faint devotion panting lies Half way th' ethereal hill.
- ‑ Iridescent 15:56, 16 January 2017 (UTC)
- Thanks. It should be noted that we don't fix typos in quotes of any text or in titles. Typo fixing is strictly done in encyclopedic prose. Also, I assume that the English Wikipedia is effectively based on Modern English (assumed further to be based on contemporary English dictionary entries), but I don't know where this is stated (we might have to get a ruling on that by itself). In any of the Modern English examples, I don't think the current rule would change them (per the regex, a space followed by across/around/round/between/down/from/into/line/out/point/through/up must follow). Stevie is the man! Talk • Work 16:08, 16 January 2017 (UTC)
- In addition to those cites from the OED, I can find dozens of examples of half way followed by "across/around/round/between/down/from/into/line/out/point/through/up" etc. in modern English. I suggested the compromise because I discovered that the use of half way as two words in British English, though very common in British English when I learnt to read, is becoming less common in the twenty-first century, and is rare in American English. I do admire and approve the excellent work of Chris and Stevie in correcting typos and spellings, but the theory that the use of two words instead of a concatenated form is an incorrect spelling, just because some dictionaries omit to mention the alternative, seems like WP:OR to me. Dbfirs 09:42, 17 January 2017 (UTC)
- I would say the reverse, that I (and I'm only speaking for myself) am going by WP:RS, that is, dictionary entries by widely agreed official sources. Doing research to find examples of different uses or misuses would seem to be leaning to WP:OR. Stevie is the man! Talk • Work 12:53, 17 January 2017 (UTC)
- By your own criterion, the Oxford English Dictionary would not include recent citations (such as Pinter) that contained spelling errors. The interpretation of lack of entries in some dictionaries is what I consider original research. Anyway, if we go with the compromise, there is no original research or loss of honour (or even honor) on either side. Dbfirs 13:22, 17 January 2017 (UTC)
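As an illustration of the rule shape described in this thread ("half way" is changed only when a space and one of a fixed list of words follows, and "half-way" is left alone), here is a minimal Python sketch. The pattern and the name `fix_half_way` are assumptions made for illustration; the actual AWB rule is not quoted in this discussion and may differ.

```python
import re

# Words that, per the description in the thread, must follow "half way"
# for the rule to fire.
FOLLOWERS = "across|around|round|between|down|from|into|line|out|point|through|up"

# Hypothetical reconstruction of the rule, not the real AWB entry.
# The lookahead requires a space plus one of the listed words,
# so "half way gone" or "half way home" is left untouched.
HALF_WAY = re.compile(r"\b([Hh])alf way(?= (?:%s)\b)" % FOLLOWERS)


def fix_half_way(text: str) -> str:
    """Join 'half way' into 'halfway' only before the listed follower words."""
    return HALF_WAY.sub(r"\1alfway", text)


print(fix_half_way("He stopped half way up the hill."))
print(fix_half_way("He's nutty, he's half way gone."))
print(fix_half_way("It lies half-way between the two towns."))
```

Under these assumptions the first sentence is changed, while the Pinter-style "half way gone" and the hyphenated form are not.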
Opostegidae
There is a family of moths called Opostegidae. AWB regex typo fixing has been changing it automatically to Oppostegidae (note the second p), but it shouldn't. If someone could fix that I would appreciate it. Thank you, SchreiberBike | ⌨ 01:07, 18 January 2017 (UTC)
- @SchreiberBike: Done. This is another rule that matches [a-z]+. I'm not keen on those, as they so often end up damaging unusual or foreign words that the rule-writer never thought of. -- John of Reading (talk) 07:09, 18 January 2017 (UTC)
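To show why an open-ended [a-z]+ tail is risky, the following Python sketch compares a broad rule with a narrower one. Both patterns are hypothetical; the actual rule that produced "Oppostegidae" is not quoted here, and the list of endings in the narrow variant is only an example.

```python
import re

# Hypothetical broad rule (NOT the actual AWB entry): any word starting
# "opos" followed by lower-case letters gets a second "p".
BROAD = re.compile(r"\b([Oo])pos([a-z]+)\b")

# Hypothetical narrower variant: only a short, illustrative list of endings.
NARROW = re.compile(r"\b([Oo])pos(e[ds]?|ing|ite|ition)\b")


def apply_rule(rule: re.Pattern, text: str) -> str:
    """Insert the missing 'p' wherever the given rule matches."""
    return rule.sub(r"\1ppos\2", text)


print(apply_rule(BROAD, "The family Opostegidae"))   # the moth family is damaged
print(apply_rule(NARROW, "The family Opostegidae"))  # left alone
print(apply_rule(NARROW, "They opose the plan"))     # intended fix still works
```

The narrow variant trades coverage for safety: it misses misspellings with unlisted endings but cannot touch words the rule-writer never thought of.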
"Comercial"
Can someone add this to the typo list? I'm not really confident in doing it. Comercial → Commercial. Appreciated! --Jennica✿ / talk 01:41, 19 January 2017 (UTC)
- It's probably not feasible to have an AWB correction for this. Comercial is a proper noun in several names, so we would be safe to only look for lower-case uses. 'comercial' is the Spanish/Portuguese word for 'commercial', even though that shouldn't ordinarily be seen in prose. However, it still shows up in spots that AWB will try to "correct", such as lists of titles that aren't properly formatted. Another thing is that this typo appears very infrequently, as I found only 6 articles in the whole of Wikipedia with this typo that could be legitimately corrected. I just corrected most of those. Stevie is the man! Talk • Work 13:16, 19 January 2017 (UTC)
- Oops, I assumed we didn't have this already, but the typo fix is there that avoids it when capitalized. It will still (apparently) fix the lower-case word, and that may be iffy per my previous review. Stevie is the man! Talk • Work 13:32, 19 January 2017 (UTC)
A small "need-fix" report
During some typo scanning I found a couple of non-typos which may be easy to fix, in order to avoid problems:
- niger → Niger creates a problem with scientific binomial names, where niger is rather common as a species name, as in Black duiker Cephalophus Niger; see diff.
- Sark based → Sark-based creates an error when changing the Sark based publishing company into the Sark-based publishing company diff
- team mate → teammate, according to User:MilborneOne: common usage is still team mate in Br English diff
- Ganes → Games became an error for Ganes Creek, named after Thomas Gane. This is a very uncommon error, I guess, and should be easy to see and avoid; I should have identified that one. diff
Dan Koehl (talk) 19:19, 13 February 2017 (UTC)
- I created the teammate typo fix. All the dictionaries I saw, including British ones Cambridge and Oxford, show it without the space. I need more than a user disagreeing with it to change it. I need reliable sources (dictionaries) that show "team mate" is an alternative spelling. I will review the others. Stevie is the man! Talk • Work 19:37, 13 February 2017 (UTC)
- This is what I thought was interesting, because in that case I trusted AWB. Dan Koehl (talk) 19:43, 13 February 2017 (UTC)
- My review so far:
- niger→Niger looks like a tough one. Not sure where to go with that yet.
- Sark based→Sark-based in the diff given actually looks correct. The publishing company is based on Sark (island), therefore it is Sark-based.
- Ganes→Games with 'Creek' following it affects four articles, so I added regex code to avoid it.
Stevie is the man! Talk • Work 20:05, 13 February 2017 (UTC)
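The Ganes Creek exception mentioned in the review above can be sketched with a negative lookahead. This is a minimal Python illustration under assumptions: the pattern and the name `fix_ganes` are hypothetical, and the regex actually added to the AWB typo list is not shown in this discussion.

```python
import re

# Hypothetical rule: change "Ganes" to "Games" unless " Creek" follows.
# (?! Creek\b) is a negative lookahead, so "Ganes Creek" never matches.
GANES = re.compile(r"\bGanes\b(?! Creek\b)")


def fix_ganes(text: str) -> str:
    """Correct 'Ganes' to 'Games' except in the place name 'Ganes Creek'."""
    return GANES.sub("Games", text)


print(fix_ganes("She competed at the Olympic Ganes."))
print(fix_ganes("Gold was found at Ganes Creek."))
```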
- @Stevietheman:
- I'd say niger in a scientific name in most cases comes like (Somegenus niger), which means italic and within parentheses.
- Regarding Sark-based, please see discussion on my talk page.
- Thanks so far for your kind assistance. Dan Koehl (talk) 20:35, 13 February 2017 (UTC)
- I have responded to #2 in your user talk. #1 is not quite as simple as it seems. False positive testing for text being inside of italics is not very simple using regex. It can be done, and I've done it in my own Find&Replace's, but it's not reliable enough to replicate for AWB Typos, which needs to avoid an editor's second-guessing as much as possible. Stevie is the man! Talk • Work 20:47, 13 February 2017 (UTC)
- (Edit conflict) I was responsible for "Sark based"; and I'm definitely not versatile enough to be able to decide when to use proper nouns as noun adjuncts or not. (I now noticed that OED lists ten "London-based" to one "London based"; I did not look for "Sark-based".)
- I suppose that "niger" invariably should occur as the second part of a species binomen; and these should always be italicised. The first part of the binomen should be a capitalised word or an abbreviation "capital"+"full stop" (like Tyrannosaurus rex or T. rex). Is it possible to make the bot avoid a combination like [''][A-Z][[a-z]+|.][ ][niger''] (where the ' and the space perhaps should be quoted; I do not know your regexp conventions)? JoergenB (talk) 20:57, 13 February 2017 (UTC)
- Thanks for your suggestion @JoergenB:. @Stevietheman:, if JoergenB's suggestion doesn't work, maybe an alternative could be to run a search, and put {{Not a typo|niger}} on the instances that can be found on enwiki? Would such operations in general make sense, or is it meant just for a couple of handpicked cases? Dan Koehl (talk) 21:02, 13 February 2017 (UTC)
- The niger→Niger "fix" is odd, because we already have regex that avoids two single quotes after 'niger', and when I run this in my Find&Replace typo test, it doesn't do the "fix" on List of mammals of Ghana. So, the regex works technically, but somehow fails when run as a typo fix. Stevie is the man! Talk • Work 12:45, 15 February 2017 (UTC)
- I have created a phab ticket for this issue. Stevie is the man! Talk • Work 13:39, 15 February 2017 (UTC)
- Per the ticket, it turns out another entry on List of mammals of Ghana had its italics off-balance, and this affected how typos were processed. Stevie is the man! Talk • Work 21:22, 15 February 2017 (UTC)
- Same problem at Wikispecies; I removed it while we are waiting for a result from the Phab ticket. Dan Koehl (talk) 18:06, 20 February 2017 (UTC)
- @Dan Koehl: the phab ticket was closed as invalid. As I said above, the italics used in the article were off-balance. Apparently, the typo fixing software needs italics (two single quotes, or '') to be in total balance for the typo fixes to work properly. I think this has to do with words inside italics being off-limits to such fixes. So, to fix any false-positive you run into, you will need to inspect the article and balance the italics. This shouldn't happen very often. Stevie is the man! Talk • Work 21:35, 20 February 2017 (UTC)
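The off-balance italics problem described in this thread can be illustrated with a simplified model in Python. This is only a sketch under assumptions: it treats every '' pair as an italics toggle (ignoring ''' bold), it is not how AWB itself is implemented, and the names `fix_outside_italics`, `italics_balanced` and `BINOMIAL` are hypothetical. `BINOMIAL` is one possible regex rendering of JoergenB's suggested pattern.

```python
import re

NIGER = re.compile(r"\bniger\b")

# One way to write JoergenB's suggestion as a regex: italic markup, a
# capitalised genus or an abbreviated genus like "C.", a space, then niger.
BINOMIAL = re.compile(r"''[A-Z](?:[a-z]+|\.) niger''")


def fix_outside_italics(text: str) -> str:
    """Apply niger -> Niger only to text outside '' ... '' spans."""
    parts = text.split("''")
    # Even-numbered pieces are outside italics, odd-numbered are inside.
    for i in range(0, len(parts), 2):
        parts[i] = NIGER.sub("Niger", parts[i])
    return "''".join(parts)


def italics_balanced(text: str) -> bool:
    """True when the '' markers pair up (simplified: ignores ''' bold)."""
    return text.count("''") % 2 == 0


balanced = "''Cephalophus niger'' lives near the niger river"
unbalanced = "''Panthera leo is listed. ''Cephalophus niger'' is too."

print(fix_outside_italics(balanced))
print(italics_balanced(unbalanced), fix_outside_italics(unbalanced))
```

In the unbalanced example a stray opening '' earlier in the text shifts which pieces count as "inside", so the species name is wrongly capitalised. That is consistent with the advice above to balance the italics in the article rather than change the rule.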
arch rival
Dear @Stevietheman:, According to user @Struway2:, in British English, arch rivals is either spaced or hyphenated, please see his revert of my edit. I think I have changed that word on at least 20 other occasions though, without any objections from other users. Dan Koehl (talk) 08:40, 28 February 2017 (UTC)
- Neither OED nor Chambers include archrival as a single word. OED hyphenates it. Searching Chambers for both words gives a list of entries related to arch. It doesn't include rival specifically, but does list the similar arch enemy, spaced. I think the reason you've had no objections is that, even if anyone's watching the articles you've made this change in, many (most?) people ignore typo-fixer edits on the not unreasonable assumption that those who specialise in this field generally get it right. cheers, Struway2 (talk) 10:36, 28 February 2017 (UTC)
- @Dan Koehl, Struway2, and Chris the speller: This one looks shaky indeed. While the typo fix doesn't change the hyphenated version to a non-hyphenated version, it changes those with a space to an unspaced, non-hyphenated version. Even without evidence of "arch rival" being allowed as two separate words, our changing "arch rival" to "archrival" does seem biased to U.S. English. Any thoughts, Chris? Stevie is the man! Talk • Work 21:14, 28 February 2017 (UTC)
- Note that the link above does not lead to OED but to oxforddictionaries.com, which is quite another thing. Changing "arch rival" or "arch-rival" to "archrival" is completely defensible in American English, according to any dictionaries I can find. I am not surprised to find that some version of OED (full, short or compact) can be found that shows it as two words, so perhaps the rule should be retired. Chris the speller yack 23:27, 28 February 2017 (UTC)
- @Chris the speller: oxforddictionaries.com is by Oxford University Press and the website is connected to OED -- thus, I think anyone looking up the spelling of the word would have a case for saying "arch-rival" is a viable alternative spelling. Since AWB typo fixers have to work across articles where either U.S. or UK English could be in force, I think it would be safest to retire the rule. Stevie is the man! Talk • Work 23:44, 28 February 2017 (UTC)
- Agree. Dan Koehl (talk) 23:45, 28 February 2017 (UTC)
- OK, I see that the rule is gone. Fine work! Chris the speller yack 14:26, 1 March 2017 (UTC)
honorary
@Stevietheman and Chris the speller:, According to user @Scope creep:, in British English, honorary is spelled honourary, see this diff. Dan Koehl (talk) 02:13, 4 March 2017 (UTC)
- I just did a Google Ngram and it looks like "honourary" is almost never used in British books. See ngram. A couple of web discussions and dictionaries support that. SchreiberBike | ⌨ 03:36, 4 March 2017 (UTC)
- @Dan Koehl, SchreiberBike, and Scope creep: An interesting web source is this page, which explains that "honourary" is a common misspelling. It's understandable that someone might believe that "honourary" is an extension of "honour" and should therefore be spelled with a "u", but I have seen sources that say "honorary" is derived from the French word "honoraire", not the English word "honour". So "honorary" is the correct spelling on either side of the Atlantic. British dictionaries such as collinsdictionary.com and oxforddictionaries.com agree. Wikipedia's style is to use a form that is understandable to all readers. Chris the speller yack 05:12, 4 March 2017 (UTC)
- Thanks, @SchreiberBike and Chris the speller:. Dan Koehl (talk) 07:59, 4 March 2017 (UTC)
- Coolio. scope_creep (talk) 12:07, 4 March 2017 (UTC)
- The latest cite for the spelling honourary in the OED is from 1825, and that dictionary (in its draft Third Edition, March 2014) tags the spelling as "now nonstandard". Dbfirs 12:48, 4 March 2017 (UTC)
- It looks like Merriam-Webster is showing 'honourary' as legitimate in British English. The question for me, like in the previous "archrival" matter, is whether we should correct a word when a major dictionary anyone can look up says the pre-corrected spelling is accepted. Stevie is the man! Talk • Work 15:40, 4 March 2017 (UTC)
- There is also a suggestion in a comment at the webpage linked by Chris the speller that Australia's Macquarie Dictionary shows 'honourary' as acceptable. I don't have a subscription to it to verify, however. Stevie is the man! Talk • Work 15:57, 4 March 2017 (UTC)
- I think that the spelling 'honourary' should not be corrected, if it's accepted. It's a type of word which may be cited from an older book or manuscript, and a correction will in such a case not benefit the text. If seldom used, but still accepted, I think we should let it remain unchanged. Dan Koehl (talk) 16:03, 4 March 2017 (UTC)
- @Dan Koehl: AWB doesn't correct any words in quotations. Also, I don't think we should decide what common usage is, as that tends to be original research and opinion-oriented. I think we need to make this decision strictly on what major dictionaries say. And as long as someone can look up Merriam-Webster and see that 'honourary' is a valid form, our rule needs adjustment or removal. Stevie is the man! Talk • Work 16:10, 4 March 2017 (UTC)
- Agree, please adjust or remove. (I guess I expressed myself confusingly above; what I meant by "let it remain unchanged" is to accept the use of 'honourary', instead of correcting it with AWB.) Dan Koehl (talk) 16:15, 4 March 2017 (UTC)
- Whoa! Merriam-Webster is an American dictionary, and should not be considered an authority on British English. If dictionaries of British English don't count "honourary" as a standard spelling, then WP should not. Chris the speller yack 16:34, 4 March 2017 (UTC)
- I am in no hurry to actually adjust or remove the rule in this case. M-W may not be an authority on British English, but general readers will see it as a reliable source because it's a major dictionary. Also, like I said above, I'm not sure if Australia agrees with the UK on this. At any rate, I'm willing to accept this as a bit too fuzzy to change right now. Stevie is the man! Talk • Work 16:49, 4 March 2017 (UTC)
- Dictionaries published a century ago should not be considered authorities on current English. If a current dictionary of Australian English considers the spelling valid, then it could be retained in articles written in Australian English, but not in current British English (per the most recent OED). Dbfirs 16:54, 4 March 2017 (UTC)
- M-W is a current major dictionary. Also, we're talking about universal typo correction, where the AWB user, although responsible for their edits, will nevertheless be presented with this correction no matter what article they are working on. AWB Typos doesn't know what variant of English is being used in any particular article, and thus it will try to correct this in any Australia-oriented article. Stevie is the man! Talk • Work 16:58, 4 March 2017 (UTC)
- Sorry, I don't have access to the full current M-W dictionary. Does it really think that "honourary" is current British English? The on-line link was obviously just an extract, possibly outdated. Dbfirs 17:43, 4 March 2017 (UTC)
- I don't have full access, but without any additional information, since M-W is presenting that publicly, I think face value is a stronger aspect than what is possibly under the covers. Now, whether we accept M-W's apparent conclusion can be a different matter. Like I said above, I have concern that what will be normally seen as a RS will continually give us reasonable questioning over this correction by well-meaning wiki editors. Stevie is the man! Talk • Work 17:56, 4 March 2017 (UTC)
- The problem with these simplified websites is that they don't explain what they mean. Perhaps the full dictionary records the historic occasional spelling, and the web developer just copied the word. Merriam-Webster's Learner's Dictionary regards "honourary" as a misspelling. Dbfirs 18:14, 4 March 2017 (UTC)
Everyone also please note this previous meaty discussion from 2013 on this subject. It seemed to be strongly in favor of 'honorary' being correct no matter what, although it didn't go without any objection. Stevie is the man! Talk • Work 17:03, 4 March 2017 (UTC)
- Thanks for finding the "previous meaty discussion", which is one of the places where I saw the source as the French word "honoraire". Note at the end of that discussion the reference to WP:COMMONALITY. Though "honourary" might be tolerated by a few people in the UK this month, in Australia next month, and in Canada the month after that, "honorary" is the much better choice in any country at any time. Even the editor who originally objected said "Coolio". Chris the speller yack 19:16, 4 March 2017 (UTC)
- It's useful that the plaintiff (for lack of a better word) thinks it's OK, although that was said before further discussion appeared. I'm concerned about future plaintiffs bringing to us an entry from a major dictionary that contradicts our position. If we are going to say that OED (and other UK English dictionaries) overrule major U.S. dictionaries on British English, maybe we should state that somewhere. Without additional hard info, I am sanguine about the current rule as long as we also have a consensus about UK dictionaries overriding U.S. ones on British English. Re: WP:COMMONALITY, "Insisting on a single term..." seems to oppose universal typo correction if there are regional varieties, but given we don't accept M-W's entry, this doesn't figure in here. If an insistent plaintiff from Australia shows us a hard dictionary entry per my earlier wondering, that may open this up again. Stevie is the man! Talk • Work 19:46, 4 March 2017 (UTC)
- (driveby comment) The OED (who, despite their occasional eccentricities like refusing to acknowledge the "-ise" suffix, are usually the final arbiter when it comes to BrEng usage) only accepts the no-u "honorary" as a legitimate current usage. Their etymology is from the Latin honorarius, not the French honoraire, but the basic point (that its etymology is from a word that has never included the letter u, and not from "honour") remains valid. (
In modern British English the spelling of derivative formations and other related words varies between honour- and honor-, with spellings in honor- generally being used for those words where the connection with a Latin etymon or model is more evident, and spellings in honour- for those words where the derivative relationship with honour is most obvious; compare e.g. honoured, honourable, honourless, beside e.g. honorary, honorific, honorand. Johnson 1755 likewise has honourable beside honorary, but there is much more variation among his contemporaries.
if you really want OED chapter-and-verse.) I agree strongly that per WP:COMMONALITY, since there's nowhere on Earth where "honorary" is an invalid spelling, we should only ever be using that other than in direct quotations. (I also agree with the point made at the "meaty discussion" that most of the sources claimed to accept "honourary" as an acceptable spelling turn out to be apocryphal when one actually checks, and that the relevant style guides for Canadian military ranks, Australian academic degrees, etc, actually say nothing of the kind.) ‑ Iridescent 19:56, 4 March 2017 (UTC)
high profile
I've reverted this error twice. This is the more recent: "Several years as party leader earned Salmond an unusually high-profile for an SNP politician". This should not be hyphenated. EddieHugh (talk) 23:20, 18 March 2017 (UTC)
- @EddieHugh: This is just a false positive. I will fix it shortly. I tested this typo fix extensively so there shouldn't be many of these. Stevie is the man! Talk • Work 23:26, 18 March 2017 (UTC)
- Done. Atchom, please re-load typo fixes to avoid this false positive. Stevie is the man! Talk • Work 23:38, 18 March 2017 (UTC)
Targeted -> targetted
The Regex typo fixing is making this correction (with the double "t"). But British and American dictionaries both tell me that the original spelling "targeted" was correct. --Gronk Oz (talk) 13:17, 25 March 2017 (UTC)
- @Gronk Oz: The rule is the other way round, I think, converting double-T into single-T. Here's a test diff. -- John of Reading (talk) 14:10, 25 March 2017 (UTC)
- OOPS! You're right John of Reading - I need a red-face icon. Another editor reverted that correction and I took their word for it. Sorry to bother you, and thank you so much for your assistance. --Gronk Oz (talk) 15:09, 25 March 2017 (UTC)
- @John of Reading: I need to ask for your guidance here. The other editor continues to insist that "targetted" is a national variant so it should not be changed. Is there something like a spelling help desk where we can come to a consensus about how to spell a word?--Gronk Oz (talk) 08:37, 27 March 2017 (UTC)
- @Gronk Oz: Try Wikipedia talk:Typo Team. But comparing this list of dictionaries with this list, I would say the case is clear. -- John of Reading (talk) 08:46, 27 March 2017 (UTC)
- Brilliant - I gotta bookmark that site! Though I notice it omits Wiktionary. Wow, thanks for that. {{smiuley2)) --Gronk Oz (talk) 09:39, 27 March 2017 (UTC)
Semi-protected edit request on 26 March 2017
This edit request to Wikipedia:AutoWikiBrowser/Typos has been answered. Set the |answered= or |ans= parameter to no to reactivate your request.
metre WilliamLucking (talk) 01:55, 26 March 2017 (UTC)
- Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format. — IVORK Discuss 03:46, 26 March 2017 (UTC)
- Is this a request to change metre to meter, or vice versa? We shouldn't ask AWB to do that, because either spelling can be correct, depending on context. Certes (talk) 17:07, 26 March 2017 (UTC)
Sorry, new to this... Three XML errors found with an XML parser-
- word=Milli- SI prefix - word boundary token outside of the quotes, should be inside
- word=Eur(asia/ope) - the attribute name replace needs a space before it
- word=Springfield - an errant backslash appears after the tag — Preceding unsigned comment added by WilliamLucking (talk • contribs) 15:22, 27 March 2017 (UTC)
- Done Thanks for the great catches! Stevie is the man! Talk • Work 15:42, 27 March 2017 (UTC)
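As a rough illustration of the kind of check described above, the sketch below runs individual rule lines through a standard XML parser. The sample rules are hypothetical and only mimic the three error types reported (a token outside the quotes, a missing space before an attribute name, and a stray backslash after the tag); they are not the actual entries that were fixed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(rule: str) -> bool:
    """Return True if a single <Typo .../> line parses as XML."""
    try:
        ET.fromstring(rule)
        return True
    except ET.ParseError:
        return False


# Hypothetical rules for illustration only.
good = r'<Typo word="Example" find="\bexmaple\b" replace="example"/>'
# Word-boundary token left outside the closing quote of the attribute.
token_outside_quotes = r'<Typo word="Example" find="\bexmaple"\b replace="example"/>'
# No whitespace between the find value and the replace attribute name.
missing_space = r'<Typo word="Example" find="\bexmaple\b"replace="example"/>'
# Errant backslash after the end of the tag.
stray_backslash = good + "\\"

for name, rule in [
    ("good", good),
    ("token_outside_quotes", token_outside_quotes),
    ("missing_space", missing_space),
    ("stray_backslash", stray_backslash),
]:
    print(name, is_well_formed(rule))
```

A parser check like this only confirms that a line is well-formed XML; it says nothing about whether the regex inside the `find` attribute behaves as intended.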
- Greetings again, I have compared the list from https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings against this Typos project and the Typos regex entries will correct about 93 - 94% of misspellings appearing on the Common Misspellings entry. Do you have an interest in knowing the missing words or is the Common Misspellings entry considered non-authoritative? — Preceding unsigned comment added by WilliamLucking (talk • contribs) 18:56, 27 March 2017 (UTC)
- @WilliamLucking: Since you've done the work, it would be interesting to see how WP:AWB/T could be expanded. The two lists have different strengths and weaknesses; I wouldn't say that either is "authoritative". A typo can only be listed in WP:AWB/T if there are very few false positives and only one sensible fix; and these fixes are not run inside references or anywhere that might be a quotation. Whereas the searches in WP:LCM are used by editors who should be careful enough to skip the correct uses [proper names, foreign words, archaic or informal quotations, ...] or to choose the best fix depending on the context [beared > bared, bearded, bore, borne]. -- John of Reading (talk) 20:33, 27 March 2017 (UTC) link corrected John of Reading (talk) 06:16, 28 March 2017 (UTC)
- Thank you for the explanation as to the differences and the nuances of these correction techniques. — Preceding unsigned comment added by WilliamLucking (talk • contribs) 21:05, 27 March 2017 (UTC)
- To add to JofR's answer, we have typos that are not just pure misspellings, such as comma spacing and adjective hyphenations, and normally when we add a fix of any kind, we check to see if there are actually a significant amount of the typos in the Wikipedia to correct. Also, not all misspellings are simple to fix, as many depend on context, like the "a part (not apart) of" fix I just added. LCM may give us some good ideas for expansion, but I think we would need to see which misspellings are actually very common, like if they appear 20 or more times (my rule of thumb). It's not the best use of development time to create a correction for a misspelling that only appears a few times at most. By the way, please sign your comments with ~~~~. Stevie is the man! Talk • Work 22:00, 27 March 2017 (UTC)
Using Typo Lists on Other Wikis
So I use AWB for the Little Shop Of Horrors wiki, and it says: No Typos found for littleshop.wikia.com/wiki/Little Shop Of Horrors Wiki:AutoWikiBrowser/Typos. Should I make that a page so I can have specific typos that pertain to my wiki? And if so, how would I make said list (like: Audrey 2 -> Audrey II, Mushnick -> Mushnik)? FiveCraft (talk) 20:59, 4 April 2017 (UTC)
Sciurus niger
Sciurus niger should not be corrected to Sciurus Niger. Any way to add this as an exception to the Niger-rule? (t) Josve05a (c) 15:02, 16 April 2017 (UTC)
- @Josve05a: there is already a false positive detection for this rule that deals with this situation, but occasionally it doesn't work because it depends on the scientific name being italicized and previous such italicizing being in balance. If you could give me a link to the article where this correction was attempted, I will try to fix it so it won't happen again. Stevie is the man! Talk • Work 15:45, 16 April 2017 (UTC)
- @Stevietheman: Please see Special:Diff/775698089&oldid=775417128 (please disregard my misclick on the revert button on later edits, I meant to click "Thank") (t) Josve05a (c) 16:03, 16 April 2017 (UTC)
- @Josve05a: your putting the term in italics seems to have fixed it. Stevie is the man! Talk • Work 21:01, 16 April 2017 (UTC)
- "niger" is also used correctly in other cases, e.g. a caption in RYB color model. This may be so rare that we can ignore it. Certes (talk) 14:19, 17 April 2017 (UTC)
Using the AWB typo-list
I am running AWB on Norwegian WP, using the typo-list for Norwegian bokmål correcting errors. I also have a "private" list of typos in my Find and replace - Normal settings. My experience is that AWB will not stop, and do corrections from the common AWB-typo list unless it also finds a word to correct from my personal settings list. Is it possible to make AWB do corrections, even if the corrections are rules taken only from the AWB-typo list? To explain in a better way: let's say I got a "personal list" in my personal settings that is set up to correct, hypothetically, Norvay to Norway and Sveden to Sweden, but I also have my AWB checked to make corrections from the common AWB-typo list, consisting of 300+ misspellings. My problem is... My AWB only makes a stop if it finds the word 'Norvay' or 'Sveden' in an article. If it then also finds any errors matching with the common typo list, it corrects those errors too. But let's say there's an article only containing an error from the common AWB typo list... then my AWB does not stop to correct this, unless the article also includes an error to be corrected from my personal list of typos... in my example, this article also has to include the word Sveden or Norvay. --TorbjørnS-AWB (talk) 17:34, 11 May 2017 (UTC)
- @TorbjørnS-AWB: This is controlled by the tick boxes on the "Options" tab. You should un-tick "Skip if no replacement" and un-tick "Skip if no typo fixed". Then AWB will always try to run your Find+Replace rules AND the standard set of typo fixes. You may want to visit the "Skip" tab as well, and tick "Skip if no changes are made". Then if none of your rules make a change AND none of the typo fixes make a change, AWB will go straight on to the next article. -- John of Reading (talk) 17:43, 11 May 2017 (UTC)
- Thanks a lot for your answer. I will try this, and see how it works out with these new settings. --TorbjørnS-AWB (talk) 18:33, 11 May 2017 (UTC)
- @John of Reading: Seems like it works just the way I wanted it to work when I changed those settings. --TorbjørnS-AWB (talk) 18:40, 11 May 2017 (UTC)
n-story
- Resolved – figures over 3 characters long will be avoided. Chris the speller yack 16:23, 20 June 2017 (UTC)
@Chris the speller: The new rule wants to damage "in a 2012 story." at Cato Institute. Perhaps add a test on the number of digits? -- John of Reading (talk) 08:55, 20 June 2017 (UTC)
- You are absolutely right. That's what I'll do. Chris the speller yack 13:35, 20 June 2017 (UTC)
@Stevietheman: Thanks for adding stor(e)y to the general case, but doesn't the addition of "store?y" to the part of the expression without the four digit check catch "in a 2012 story"? My apologies if I missed a subtlety which cleverly avoids that problem. Certes (talk) 18:53, 12 July 2017 (UTC)
- @Certes: In my change, I added false-positive code to prevent correction of "in a 2012 story". "A n-something" already had constructs for making this easy to do. If this is failing, please let me know. Stevie is the man! Talk • Work 19:01, 12 July 2017 (UTC)
- (ec) Yes I just realised that, thanks and sorry for bothering you. Certes (talk) 19:02, 12 July 2017 (UTC)
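The digit-count guard discussed in this thread can be sketched as follows. The pattern is a hypothetical stand-in written for illustration, not the actual "n-story" or "A n-something" regex, which is not quoted here; it only shows how limiting the figure to three digits keeps a year such as 2012 from being hyphenated.

```python
import re

# Hypothetical stand-in for the rule (assumption, not the real AWB regex):
# hyphenate "<1-3 digit figure> stor(e)y" and leave longer figures alone.
N_STORY = re.compile(r"\b(\d{1,3}) (store?y)\b")


def fix_n_story(text: str) -> str:
    """Insert the hyphen between a short figure and story/storey."""
    return N_STORY.sub(r"\1-\2", text)


print(fix_n_story("a 12 story building"))
print(fix_n_story("a 5 storey house"))
print(fix_n_story("in a 2012 story."))
```

Because `\b` cannot fall between two digits, the three-digit limit cannot be satisfied by the tail of a four-digit number, so "in a 2012 story." is left untouched.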
third season finale
Hello typo fixers! Would like some input on the "third season finale" rule. AWB's typo fixer will change it to "third-season finale" on the Agents of S.H.I.E.L.D. article, and Adamstom.97 reverted it in this edit. Thanks! GoingBatty (talk) 22:36, 4 July 2017 (UTC)
- @GoingBatty: The typo fix you made appears correct to me (obvious compound adjective), and it appears another editor made the same fix and it's sticking so far. Stevie is the man! Talk • Work 18:32, 12 July 2017 (UTC)
- @GoingBatty: I'd actually read a difference in meaning there. "Third-season finale" means the finale of the third season but "third season finale" could have a different meaning (perhaps there was no finale for the first season, so the "third season finale" took place at the end of the fourth season). SchreiberBike | ⌨ 02:28, 17 July 2017 (UTC)
- While the latter approach is possible, how frequent would it occur? {{Not a typo}} can be used for these instances. Stevie is the man! Talk • Work 15:00, 5 September 2017 (UTC)
qualitly → qualitely?
I just had AWB correct this wrong word to another wrong word. Qualitely isn't a word. --Jennica✿ / talk 05:27, 18 July 2017 (UTC)
- @Jennica: This incorrect fix has been fixed. Was it supposed to be 'quality'? (a link to the article in question would help) I might be able to add a fix that would cover it. Stevie is the man! Talk • Work 16:15, 5 September 2017 (UTC)
- Yeah, it was supposed to be quality. It was so long ago I don't remember what article it was. --Jennica✿ / talk 16:41, 5 September 2017 (UTC)
- OK, thanks. I will indefinitely hold off working on a new fix since I can't see the context and a search for 'qualitly' currently turns up nothing. Stevie is the man! Talk • Work 16:44, 5 September 2017 (UTC)
welshing or Welshing?
In History of the National Crime Syndicate, the typo rules want to change "welshing" to "Welshing" in the sentence "Underworld folklore states that he was shot for welshing on a gambling debt...". What's correct: lowercase or uppercase? Thanks! GoingBatty (talk) 21:07, 17 September 2017 (UTC)
- A quick look at on-line dictionaries and my Merriam Webster's shows lower case for that meaning. SchreiberBike | ⌨
Fiance / Fiancee
Fiance -> Fiancé, as well as possibly Fiancee --> fiancée.
Would it make sense to add a rule for these rules with accent for this word? Including with the accent seems to be the generally accepted practice. Shaded0 (talk) 18:35, 15 September 2017 (UTC)
- Also - I think this rule might get conflicts(?) with the Finance typo rule:
- <Typo word="Fiancé" find="\b([Ff])iance(e?)\b(?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999})" replace="$1iancé$2"/>
- <Typo word="Finance" find="\b([Ff])ia?(?:ni?an[ai]?n?|na?)c(e[ds]?|ing|ially|ials?)(?<!iance|inanc(?:e[ds]?|ing|ially|ials?))\b" replace="$1inanc$2"/>
year round → year-round
AWB Typos automatically changes "year round" to "year-round". I found that Merriam-Webster says the phrase should be hyphenated, but in its entry for "all year round" it does not. Most other sources only use the hyphen when it is a compound adjective describing the noun it precedes: "It is a year-round school" but "The school is open year round". I think the rule should be removed from automatic typo fixes. Thanks, SchreiberBike | ⌨ 02:50, 17 July 2017 (UTC)
- Any thoughts? I could remove it from the list, but I don't know enough regex to be sure I'm not messing things up. SchreiberBike | ⌨ 05:02, 27 July 2017 (UTC)
- The rules have inappropriately added a hyphen to the following sentences:
- "Adults have been recorded on wing nearly year-round in the southern part of the range."
- "Adults are on wing year-round in the southern part of the range."
- "Adults have been recorded on wing year-round."
- If I remove the text below, will that solve the problem without messing anything up?
<Typo word="year-round" find="\byear\s+round\b(?<=\b(?:[Aa]|[Aa]ctive|[Aa]lmost|[Aa]nd|are|[Aa]vailable|[Ff]or|[Ff]ound|[Ll]ive[ds]?|[Mm]aintained|[Nn]early|[Oo]ccurs?|[Oo]f|[Oo]ffers|[Oo]pen|[Oo]perate[ds]?|[Pp]recipitation|[Pp]rovide[ds]?|[Pp]ublic|[Rr]ainfall|[Rr]esident|there|[Tt]o|[Uu]sed|[Ww]eather|wing|[Ww]ith)\s+year\s+round)" replace="year-round"/>
- I'm not competent in regex, but it looks like I could also just remove |[Nn]early and |wing to solve those specific problems. I'm not sure which is better - Thank you. SchreiberBike | ⌨ 22:46, 17 August 2017 (UTC)
- If there's no objection, after a couple of days I will remove the code above. As I am not competent as a programmer, I hope someone will warn me if that will cause problems. Thank you. SchreiberBike | ⌨ 04:18, 2 September 2017 (UTC)
- I took it out. SchreiberBike | ⌨ 21:49, 4 September 2017 (UTC)
- I have restored the rule because there really hasn't been proper discussion leading to its wholesale removal (speaking for myself as a significant typos editor, I'm on a long break and didn't know this was being discussed). Do you have links to where this typo fix erred? We generally start from that in such discussions. At any rate, if the rule has caused an error, usually it can be fixed rather than removed. Stevie is the man! Talk • Work 15:11, 5 September 2017 (UTC)
- @SchreiberBike: (sorry I didn't ping above) Links where the errors occurred would help me better understand the context of sentences you show above. Stevie is the man! Talk • Work 16:18, 5 September 2017 (UTC)
- @Stevietheman: Thanks for your attention to this. It is difficult to make a list of errors caused by the program, because generally I catch them and fix them before saving. Usually I also remember to remove the text from the edit summary, so I can only make a list of places where I failed to do those things. Based on an edit summary search for "year round", it tried to do it here, here and here, but I undid it and failed to remove the edit summary. It did it here with the phrase "Adults have been recorded on wing nearly year-round in Florida", here with the phrase "Adults have been recorded nearly year-round in Florida", here with the phrase "Adults have been recorded on wing nearly year-round" and here with the phrase "Adults have been recorded on wing year-round".
I ran AWB on the phrases below to test:
* 18 hits for "Adults are on wing nearly year round". AWB adds the hyphen in every case where it is not already there.
* 53 hits for "Adults have been recorded on wing year round". I checked a sample and AWB adds the hyphen in every case where it is not already there.
* 3 hits for "Adults are on wing year round in the southern part of the range". AWB adds the hyphen in every case.
I hope that makes sense. SchreiberBike | ⌨ 20:12, 5 September 2017 (UTC)
- @Stevietheman and I dream of horses: Any other ideas here? Earlier today I saw and fixed this change. Is there an existing consensus that this is a good idea? Should we seek other opinions as to whether or not the hyphen is appropriate? So far, no one has said that this rule is helpful, just that there's been little discussion about its removal. SchreiberBike | ⌨ 23:24, 16 September 2017 (UTC)
- @SchreiberBike: I don't have an opinion one way or the other.
- You could do like I've done in similar situations, and surround word(s) with {{not a typo}}, which is designed for this situation (that is, something that is a typo sometimes but not all the time). I dream of horses If you reply here, please ping me by adding {{U|I dream of horses}} to your message (talk to me) (My edits) @ 01:19, 17 September 2017 (UTC)
- Is there any objection if I remove |[Nn]early and |wing to solve those specific problems? As I said above, I am not competent as a programmer and I really don't understand regex enough to be editing it. Thanks, SchreiberBike | ⌨ 04:13, 29 September 2017 (UTC)
- I just ran into this problem again and I removed |[Nn]early and |wing as described above. SchreiberBike | ⌨ 19:43, 4 October 2017 (UTC)
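The effect of dropping `[Nn]early` and `wing` from the rule can be approximated in Python. This is a sketch under one stated assumption: Python's `re` does not accept the variable-length lookbehind used in the quoted rule, so the preceding word is captured and re-emitted instead. The word list is copied from the rule quoted in this thread; the helper names are made up for the example.

```python
import re

# Preceding words from the lookbehind alternation in the quoted rule.
PRECEDING = [
    "[Aa]", "[Aa]ctive", "[Aa]lmost", "[Aa]nd", "are", "[Aa]vailable",
    "[Ff]or", "[Ff]ound", "[Ll]ive[ds]?", "[Mm]aintained", "[Nn]early",
    "[Oo]ccurs?", "[Oo]f", "[Oo]ffers", "[Oo]pen", "[Oo]perate[ds]?",
    "[Pp]recipitation", "[Pp]rovide[ds]?", "[Pp]ublic", "[Rr]ainfall",
    "[Rr]esident", "there", "[Tt]o", "[Uu]sed", "[Ww]eather", "wing",
    "[Ww]ith",
]


def build_rule(words):
    # Capture the preceding word and whitespace rather than using (?<=...),
    # because Python's re requires fixed-width lookbehind.
    return re.compile(r"\b(" + "|".join(words) + r")(\s+)year\s+round\b")


def apply_rule(rule, text):
    return rule.sub(r"\1\2year-round", text)


original = build_rule(PRECEDING)
trimmed = build_rule([w for w in PRECEDING if w not in ("[Nn]early", "wing")])

samples = [
    "Adults are on wing year round in the southern part of the range.",
    "Adults have been recorded on wing nearly year round.",
    "The school is open year round.",
]
for s in samples:
    print(apply_rule(original, s), "|", apply_rule(trimmed, s))
```

With the two alternatives removed, the "on wing" sentences are left alone, while other listed words such as "open" still trigger the hyphen, which is the broader behaviour questioned at the top of this thread.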
Can't find: Puting → Putting
I'm trying to find this rule so that it can avoid "Isla Puting Bato". Can anyone help? ~ Tom.Reding (talk ⋅dgaf) 15:05, 18 January 2018 (UTC)
- @Tom.Reding: The rule is named "-tting". To find this, I got AWB to look at the few pages containing "Isla Puting Bato" and found that the typo-fixer wanted to change Archaeology of the Philippines. Then at the bottom right I clicked on the "Typos" tab to see the "Find" and "Replace" regexes of the rule that fired. Then I searched for that "Find" string in the list of typo rules to find the name of the rule. -- John of Reading (talk) 15:13, 18 January 2018 (UTC)
- Oooh, I've never used that tab before! Thanks John of Reading! ~ Tom.Reding (talk ⋅dgaf) 15:37, 18 January 2018 (UTC)
Should we also avoid Tanjung Puting? Certes (talk) 15:54, 18 January 2018 (UTC)
- Certes, yes; Done. ~ Tom.Reding (talk ⋅dgaf) 15:59, 18 January 2018 (UTC)
Error in "individuals" correction
I can't tell whether it's useful to report incorrect corrections (or if a failed correction is still a success because it brought the typo to a human AWB user's attention), but if it's worth reporting it, AWB has just suggested replacing the small typo "indiviluals" with the slightly more wrong "individualals". --Lord Belbury (talk) 18:19, 10 April 2018 (UTC)
- Lord Belbury, on what page? ~ Tom.Reding (talk ⋅dgaf) 18:34, 10 April 2018 (UTC)
- @Tom.Reding: You can try this using the regex tester. The relevant rule is named Individual*, but I can't immediately see how to fix it without making it much less "fuzzy". -- John of Reading (talk) 18:42, 10 April 2018 (UTC)
- It was this version of "Indian Students' Union and Hostel", I've since fixed it. --Lord Belbury (talk) 18:43, 10 April 2018 (UTC)
Assesor
I think we should add Assesor as a typo for Assessor. I may fix the couple of dozen currently around, but for the future it would be worth adding. ϢereSpielChequers 23:47, 7 March 2018 (UTC)
A n-something
Could someone please update the "A n-something" rule to also fix "n game suspension" (e.g. BALCO scandal has the phrase "4 game suspension") and "n game sweep" (e.g. Lou Piniella has the phrase "4 game sweep")? Thanks! GoingBatty (talk) 02:26, 30 October 2017 (UTC)
- GoingBatty, added, but to rule "n-something contract/deal/run/etc." instead, which was more appropriate. ~ Tom.Reding (talk ⋅dgaf) 20:19, 10 April 2018 (UTC)
Proposed modification to the Coca-Cola regex
I have a minor improvement to the Coca-Cola regex, and would like some feedback if possible before implementing it:
Current Regex
<Typo word="Coca-Cola" find="\b[Cc]oca(\s|-)?[Cc]ola\b" replace="Coca-Cola"/>
Proposed New Regex
<Typo word="Coca-Cola" find="\b[Cc]o(ke|ca)(\s|-)?[Cc]ola\b" replace="Coca-Cola"/>
This would not only catch incorrect title-casing and a missing hyphen (current behavior), but also the somewhat infrequent misspelling of "Coke-Cola". As far as I can see, the only instance of a false positive that this would cause in the main article namespace would be on Company Names Tribunal, where a company was previously named "Coke Cola Limited". This could easily be fixed with a {{Not a typo}} tag. Phuzion (talk) 04:30, 1 December 2017 (UTC)
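The two patterns use only constructs that Python's `re` also supports, so the difference can be tried out directly. This is a quick comparison sketch; the sample strings are made up apart from the "Coke Cola Limited" name mentioned above, and AWB's own engine may differ in details.

```python
import re

CURRENT = re.compile(r"\b[Cc]oca(\s|-)?[Cc]ola\b")
PROPOSED = re.compile(r"\b[Cc]o(ke|ca)(\s|-)?[Cc]ola\b")


def fix_cola(rule, text):
    """Apply one of the two rules with the shared replacement."""
    return rule.sub("Coca-Cola", text)


for sample in ["a can of coca cola", "a can of Coke-Cola", "Coke Cola Limited"]:
    print(repr(sample), "|", fix_cola(CURRENT, sample), "|", fix_cola(PROPOSED, sample))
```

The proposed rule picks up "Coke-Cola" in addition to what the current rule fixes, and it also rewrites the company name, which is the false positive that would need a {{Not a typo}} tag.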
linguistic cited forms and rules like "i.e."
I'm surely bringing this up in the wrong place, since it's not the sort of thing that can be set right just by changing regular expressions, but it does relate to typos, so here goes. I am getting tired of fixing AWB's wrongly inserted "i.e." and "e.g." in articles on individual languages which are trying to discuss the diphthong ie or a spelling rule regarding the letters eg or whatnot. I'd love it if rules like these could be disabled in philological articles... (However you might detect that: being in a category whose name contains "language"?) 4pq1injbok (talk) 18:55, 15 December 2017 (UTC)
- 4pq1injbok, could you link any examples? Perhaps there is some common-ish surrounding text that can be used to ignore these cases. ~ Tom.Reding (talk ⋅dgaf) 19:41, 10 April 2018 (UTC)
Typo McDonald's
Currently: <Typo word="McDonald's" find="\bM[Cc][Dd]onalds\b" replace="McDonald's"/>.
- this would appear to target the restaurant chain. However, it will generate false positives for couples named "McDonald" referred to in the plural: "The McDonalds moved to New York..."[a]
- if intended to target the restaurant chain, the typo name should be changed to make that clearer
- The find "\bMa?[Cc][Dd]onalds" would match a common misspelling, but only with " restaurant\b" or " fries\b" or similar follow-up word(s), otherwise will have too many false positives. Mathglot (talk) 01:59, 31 March 2018 (UTC)
Notes
- ^ Examples: McDonald family, Duncan v McDonald, Jack McDonald (musician); however this is probably a pretty small false positive list in total, and is probably dwarfed by the number of good hits so could be weeded out manually.
- Mathglot, added some; feel free to add more. ~ Tom.Reding (talk ⋅dgaf) 19:48, 10 April 2018 (UTC)
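The suggestion to require a follow-up word can be sketched with a lookahead. The follow-up list here is an assumption limited to the two words named above ("restaurant", "fries"); the words actually added to the rule are not shown in this thread.

```python
import re

# Illustrative follow-up words only (assumption); not the list used in the live rule.
FOLLOW_UPS = ["restaurants?", "fries"]
MCDONALDS = re.compile(
    r"\bMa?[Cc][Dd]onalds(?= (?:" + "|".join(FOLLOW_UPS) + r")\b)"
)


def fix_mcdonalds(text: str) -> str:
    """Add the apostrophe only when a restaurant-related word follows."""
    return MCDONALDS.sub("McDonald's", text)


print(fix_mcdonalds("The McDonalds moved to New York"))
print(fix_mcdonalds("a McDonalds restaurant"))
print(fix_mcdonalds("MacDonalds fries"))
```

The family-name sentence is left alone because nothing in the follow-up list comes after it, at the cost of missing chain references that are followed by any other word.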
Ciruit
Ciruits seem popular with judges, racing drivers and electronics enthusiasts alike. Should they be added to the existing Circuit typo case? Certes (talk) 16:12, 12 April 2018 (UTC)
- Certes, done! ~ Tom.Reding (talk ⋅dgaf) 16:24, 12 April 2018 (UTC)
- Thanks! And yes, I think Ciruit de la sarthe is a redirect from misspelling. Certes (talk) 16:36, 12 April 2018 (UTC)
Battalion and Gendarme
Common misspellings of the words battalion (batallion, battallion, batalion) and gendarme (gendarm) keep popping up.--Catlemur (talk) 10:26, 26 May 2018 (UTC)
- @Catlemur: There is already a rule for "battalion" that seems to cover the misspellings you provided. Are you seeing articles where these are not being fixed? Also, I don't see any instances of "gendarm" in Wikipedia. GoingBatty (talk) 16:46, 19 June 2018 (UTC)
- @Catlemur: "Gendarm" seems to be an acceptable spelling, per Wiktionary. GoingBatty (talk) 16:50, 19 June 2018 (UTC)
- I went over the "battalion" ones half a year ago fixing quite a few.--Catlemur (talk) 16:51, 19 June 2018 (UTC)
- Gendarm might be okay in German but definitely not in English. Mathglot (talk) 18:09, 19 June 2018 (UTC)
Bandmate rule
Using the "Bandmate/Roommate/Teammate" rule, which has a comment stating "none of these spaced or hyphenated", I changed "band mates" to "bandmates" on Genesis (band). @Ritchie333: then reverted my edit stating that "band mates" is not a typo. Should this rule be reviewed?
- Also pinging @Stevietheman: who added the rule. GoingBatty (talk) 16:43, 19 June 2018 (UTC)
- Pinging @Ritchie333: and @Stevietheman: again to discuss. GoingBatty (talk) 03:29, 25 July 2018 (UTC)
- Which word do you think is misspelled? Ritchie333 (talk) (cont) 08:50, 25 July 2018 (UTC)
Mantian
Wikipedia mentions the Chinese names "Ni Mantian" and "Feng Mantian". Can these be exceptions to the mantain typo fix? Certes (talk) 00:30, 30 July 2018 (UTC)
Lieutenant
An editor has reported and fixed misspellings such as Lieutenatn, lieutennat, lieutennat, lietuenant and Luietenant. Can the existing pattern
- <Typo word="Lieutenant" find="\b([Ll])[ieu]{2,3}t[ae]{1,2}nt?[ae]{1,2}(?<![Ll]ieutena)n(ts?|cy)\b" replace="$1ieutenan$2"/>
be expanded safely to include those? Certes (talk) 11:07, 6 August 2018 (UTC)
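Before expanding the pattern, it may help to see which of the reported misspellings it already catches. The sketch below runs the quoted pattern under Python's `re`, which accepts it because the lookbehind is fixed-width; AWB's engine may differ in other respects, so treat the outcome as indicative only.

```python
import re

LIEUTENANT = re.compile(
    r"\b([Ll])[ieu]{2,3}t[ae]{1,2}nt?[ae]{1,2}(?<![Ll]ieutena)n(ts?|cy)\b"
)


def fix_lieutenant(text: str) -> str:
    """Apply the existing rule with its $1ieutenan$2 replacement."""
    return LIEUTENANT.sub(r"\1ieutenan\2", text)


reported = ["Lieutenatn", "lieutennat", "lietuenant", "Luietenant"]
for word in reported:
    print(word, "->", fix_lieutenant(word))
```

Under Python's `re`, only "Luietenant" is rewritten; the other three pass through unchanged, since the pattern expects the vowel cluster before the first "t" and the final "n" before "t" or "cy". Catching them would need additional alternatives rather than a small tweak.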
More misspellings
I noticed a lot of the misspellings from all the lists in Wikipedia:Lists_of_common_misspellings (0–9, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, Wikipedia:Lists of common misspellings/Repetitions and Wikipedia:Lists of common misspellings/Grammar and miscellaneous) aren't listed. There's a lot of them and I'm not really good at adding misspellings so maybe something could be arranged to add them? – BrandonXLF (t@lk) 16:31, 8 September 2018 (UTC)
- @BrandonXLF: Many of the entries at Wikipedia:Lists of common misspellings are too subtle to be handled by regular expressions. Some of the words listed there are correct in some contexts and incorrect in others, or have more than one possible fix. -- John of Reading (talk) 06:43, 12 September 2018 (UTC)
- @John of Reading: I'm just saying there's some good ones like under Wikipedia:Lists of common misspellings/Grammar and miscellaneous there's a lot like switching an European to a European, I'm assuming those would be easy to add. – BrandonXLF (t@lk) 12:12, 12 September 2018 (UTC)
- @BrandonXLF: The rule labelled word="A …" already corrects "an European". But yes, if a typo is common enough to be worth the extra overhead (20 or more occurrences, maybe?) and has no false positives, it can be added fairly easily. -- John of Reading (talk) 13:17, 12 September 2018 (UTC)
Awhile
Using AWB it has tried to convert awhile into a while, I can see that some awhiles may be typos, but probably not enough to justify having this in AWB. ϢereSpielChequers 11:08, 15 September 2018 (UTC)
SI units and word breaks
In Nora Neset Gjøen, <Typo word="J (joule)" find="([\d\.]+(?:\s| |-)?[µmkMGT])j\b" replace="$1J"/> wants to capitalise the J in "At the end of 2013 Gjøen signed…". I assume that the middle of "jø" matches \b, because ø is the wrong sort of letter, and she seems to be generating 2,013 Gigajoules. It may be too rare an error to bother fixing, but something to beware of. Certes (talk) 13:12, 16 September 2018 (UTC)
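The word-break explanation can be reproduced in Python, with one caveat: Python's `re` treats ø as a word character by default, so the match only appears when `\b` is restricted to ASCII. Whether AWB's engine was behaving in an ASCII-like way on that page is an assumption here, not something this thread establishes. The pattern is copied from the rule above, with the second alternative typed as a plain space.

```python
import re

# Pattern as quoted above; the second alternative is typed as a plain space.
JOULE = r"([\d\.]+(?:\s| |-)?[µmkMGT])j\b"
text = "At the end of 2013 Gjøen signed"

ascii_rule = re.compile(JOULE, re.ASCII)  # \b, \w, \d, \s consider ASCII only
unicode_rule = re.compile(JOULE)          # Python 3 default for str patterns

ascii_result = ascii_rule.sub(r"\1J", text)
unicode_result = unicode_rule.sub(r"\1J", text)

print(ascii_result)
print(unicode_result)
```

With ASCII-only word boundaries the "j" before "ø" counts as the end of a word and is capitalised; with Unicode-aware boundaries the text is left alone. That suggests the false positive depends on how the engine classifies ø.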
Apostrophe S
@Smasongarrison: This edit to the "apostrophe S" punctuation rule is correct according to MOS:PUNC. However, it may be counterproductive to include this fix in WP:AWB/T, as it operates so frequently that it makes diffs hard to check. I have the "regex typo fixing" checkbox unticked at present so that I can see and check the results of my own spelling rules. -- John of Reading (talk) 15:57, 10 October 2018 (UTC)
- Yeah, I was worried about that too. Not sure what to do about it though. Smasongarrison (talk) 17:58, 10 October 2018 (UTC)
- Can we get an estimate of how many mainspace pages would cause this rule to fire? If many, I volunteer to help reduce that #, so we can get down to a more manageable count, as the rule appears useful. WP:TYPO would want to help too, I imagine. ~ Tom.Reding (talk ⋅dgaf) 18:23, 10 October 2018 (UTC)
- @Tom.Reding: I ran the first 1% of a database scan, and found 3,000. So something like 300,000 articles. <gulp> -- John of Reading (talk) 19:02, 10 October 2018 (UTC)
- So, about one edit in 20 will have a 's correction. That may be a hit worth taking to fix such a widespread problem. Is it worth including the ʻokina as an apostrophe-like character? I think (but I may be wrong) that it only appears legitimately before a vowel. It is abused in several articles, e.g. Halawa, Hawaii has womenʻs temples. Is it also worth adding capital S to the search to fix articles such as The Paz Show with LET`S CALL THE WHOLE THING OFF? Certes (talk) 23:20, 10 October 2018 (UTC)
- @John of Reading: time to put my money where my mouth is... I just finished downloading & unpacking the latest dump, and ran this myself to get a full list of pages. "Only" 30,085 pages found breaking that 1 rule (ignoring redirects and comments). Kind of reminds me of when Trappist uncovered many more pages with various citation maintenance required... Took a while to whittle down, but I'm glad he did! ~ Tom.Reding (talk ⋅dgaf) 02:34, 11 October 2018 (UTC)
- @Tom.Reding: Just a thought: did you remember to increase the "Limit list to" figure on the "Searching" tab of the database scanner? -- John of Reading (talk) 06:56, 15 October 2018 (UTC)
- Oh god... ~ Tom.Reding (talk ⋅dgaf) 12:19, 15 October 2018 (UTC)
- I've just come across this same issue. Unless you guys have been extraordinarily productive it doesn't look that bad. Though I am using a slightly out of date dump, I'm seeing 9000 total typo issues in the first 30% of the dump. (And about 25% of these are fixed.) All the best: Rich Farmbrough, 11:06, 6 December 2018 (UTC).
collpase->collapse
I fixed over 200 articles yesterday, using AWB. I think it should be monitored regularly. Thanks. Uziel302 (talk) 06:39, 16 December 2018 (UTC)
- Added by myself based on collaspe-collapse that appeared. Uziel302 (talk) 00:43, 17 December 2018 (UTC)
False positive: wont
Could someone please change the "-n't" rule in the Contractions section so it will not try to change correct instances of "wont"? (e.g. A Day in the Life of a Tree) Thanks! GoingBatty (talk) 14:05, 23 December 2018 (UTC)
- @GoingBatty and Smasongarrison: I've tweaked the rule. It was also damaging "cant", a valid spelling sometimes. -- John of Reading (talk) 15:08, 23 December 2018 (UTC)
- Thanks! Smasongarrison (talk) 16:16, 23 December 2018 (UTC)
New rule: United State's → United States'
I've tried adding a new rule to change United State's → United States', but can't get it to pick up typos in these articles using AWB or WPCleaner. What am I doing wrong? Thanks! GoingBatty (talk) 17:41, 24 December 2018 (UTC)
- @GoingBatty: Fixed - you had a typo in "Typo"! Certes (talk) 17:48, 24 December 2018 (UTC)
- @Certes: Facepalm Thanks! Guess I need to lay off the egg nog! GoingBatty (talk) 18:07, 24 December 2018 (UTC)
Parc
The following rule seems to have the effect of changing the perfectly proper French word 'parc' into the nonsense 'pharmac'...
<Typo word="Pharmacy" find="\b([pP])h?a(?:m[mr]*|r(?:am|[mr]*))[aei]?c(?<![pP]harmac)(|eutic[a-z]+|i(?:es|sts?)|o(?:log(?:i[cs][a-z]+|y)|p[aeio]+l?|thera[a-z]+)|y)\b" replace="$1harmac$2"/>
Sorry, I'm not an AWB user; perhaps someone can stop it from making the change unless there is at least one letter after 'parc'. Imaginatorium (talk) 09:23, 2 January 2019 (UTC)
- We should probably remove the "|" in "|eutic", to limit the change to words which have further letters after the variant of "pharmac". Certes (talk) 10:34, 2 January 2019 (UTC)
- I agree, and Done -- John of Reading (talk) 12:22, 2 January 2019 (UTC)
Thanks for the rapid response. I have looked at this a bit more, though, and really wonder if this is useful. Obviously it corrects some misspellings of "pharma__", but I can't honestly imagine any common misspelling, and the regexp looks far too powerful, so I can imagine erroneous mappings being much more common. (For example, how about "Parma"?) What is the procedure for assessing the error rate? Imaginatorium (talk) 08:40, 5 January 2019 (UTC)
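The effect of the empty alternative discussed in this thread can be checked with a short script. This is a sketch only: it translates the quoted rule to Python (so `$1` becomes `\1`), and the "new" pattern is my reading of the suggestion to drop the "|" before "eutic", not a copy of the live rule. The names `OLD_RULE`, `NEW_RULE` and `apply_rule` are made up for the illustration.

```python
import re

# Stem and suffix parts of the quoted "Pharmacy" rule.
HEAD = r"\b([pP])h?a(?:m[mr]*|r(?:am|[mr]*))[aei]?c(?<![pP]harmac)"
TAIL = r"eutic[a-z]+|i(?:es|sts?)|o(?:log(?:i[cs][a-z]+|y)|p[aeio]+l?|thera[a-z]+)|y"

OLD_RULE = re.compile(HEAD + "(|" + TAIL + r")\b")  # empty suffix allowed
NEW_RULE = re.compile(HEAD + "(" + TAIL + r")\b")   # some suffix required


def apply_rule(rule, text):
    """Apply one rule with the replacement $1harmac$2."""
    return rule.sub(r"\1harmac\2", text)


for word in ("parc", "parmacy", "pharmacy", "Parma"):
    print(word, "->", apply_rule(OLD_RULE, word), "/", apply_rule(NEW_RULE, word))
```

In this sketch "Parma" is left alone by both versions because the stem requires a "c", though that says nothing about other inputs; a database scan would still be needed to estimate the real error rate.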
Add efficiency improvements to doc?
Smasongarrison, thank you for your efficiency improvements. Would you be willing to put basic examples of what's-faster-than-what in WP:AWB/RegEx (perhaps under #Tips and tricks)? ~ Tom.Reding (talk ⋅dgaf) 02:12, 12 January 2019 (UTC)
- Sure, I'd be happy to do that. Let me think about how I can go about it systematically. Smasongarrison (talk) 20:18, 12 January 2019 (UTC)
- Starting AWB and activating Regex typos, I get an error message re "Mbit" ("too many )"). Smasongarrison -DePiep (talk) 16:22, 15 January 2019 (UTC)
- thanks for letting me know. I'll fix that asap! Smasongarrison (talk) 16:33, 15 January 2019 (UTC)
Women's and children's
Could one or both of the "Men's" rules be safely extended to cover "Women's" too? Certes (talk) 19:28, 19 January 2019 (UTC)
The "Children's" rule may need to be a bit tighter, as childrenswear is being changed to children'swear. Certes (talk) 19:28, 19 January 2019 (UTC)
- @Certes: I've fixed the "Children's" rule. It would be hard to merge the rules; "mens" is tricky because of mens rea and similar Latin phrases. -- John of Reading (talk) 19:47, 19 January 2019 (UTC)
- Thanks. What I meant was simply extending "Men's" to include "Women's" so it will fix "womens basketball", etc. The combined rule would exclude "womens rea", but fortunately that nonsensical phrase is rare. Or is there already a separate "Women's" rule that I missed? Certes (talk) 20:41, 19 January 2019 (UTC)
- @Certes: (After a quick test) The rule named "-men's" corrects various misspellings of "women's" -- John of Reading (talk) 21:07, 19 January 2019 (UTC)
- Great. I only looked at "Men's" and overlooked "-men's". Sorry for bothering you. Certes (talk) 21:21, 19 January 2019 (UTC)
- @Certes: Have you spotted how to find out which typo rule is making a change? View the "Typos" tab at the bottom right of the AWB window, and the "On this page" list tells you the "find" and "replace" strings of the relevant rule(s). Without this it's nearly impossible! -- John of Reading (talk) 21:36, 19 January 2019 (UTC)
- @John of Reading: Thanks but I'm using JWB which doesn't have that feature, and yes it's tedious pinning down which rule applies. Certes (talk) 21:38, 19 January 2019 (UTC)
Gbit rule improperly changes "GB/sec"
@Smasongarrison: The change made on January 15 needs to be adjusted. In IBM Blue Gene it got me into trouble; I have been happily watching AWB change "Gb/sec" to "Gbit/sec" for years, and did not notice that this time (using a different rule) it was changing "GB/sec" (gigabytes, not gigabits). I received a lecture about the difference between bytes and bits – hard to take, being a longtime systems engineer. Chris the speller yack 15:59, 1 February 2019 (UTC)
- While we're looking at that rule: is the "|㎇" at the end helpful? I don't think that Unicode character means "gigabits per second", and due to \b it would only match if followed by letters (1.23 ㎇xyz) anyway. Certes (talk) 17:08, 1 February 2019 (UTC)
- Done. @Chris the speller and Certes: references to Gbytes removed, but the whole ensemble of "Xbit" rules needs a revamp. ~ Tom.Reding (talk ⋅dgaf) 00:17, 2 February 2019 (UTC)
(Un)Successful fuzzy rule is unsuccessful
Could someone please review the "(Un)Successful" fuzzy rule? It will change "succesfull" → "successfull", and then depend on the "-ful" rule to change "successfull" → "successful". (e.g. Wayne Clarke (footballer)) Thanks! GoingBatty (talk) 02:18, 9 February 2019 (UTC)
- I'm skeptical about fuzzy rules, only due to the guidance given on the rule page "Do not expect rules to be applied in the order they appear", so self-contained rules should be the goal. ~ Tom.Reding (talk ⋅dgaf) 14:13, 9 February 2019 (UTC)
Domain & URL look-arounds no longer appear necessary
Testing the "Improv(e/ise)" rule (find="\b([iI])mp(?:or|re)v([a-z]+)\b" replace="$1mprov$2"), I inserted into my sandbox
- "Imporve" as plain text
- "g/Imporve/s" buried in a URL,
- "m.Imporve.R" buried in a URL,
- "e.Imporve.I" at the beginning of a URL,
and only the plain text version was fixed. I suspect the domain/URL look-arounds (?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999}) were incorporated into the surrounding error prevention scaffolding in AWB. Also, there are numerous rules without these checks, and I've never had to add them - I've only added them when I saw them in surrounding rules. I'll remove them (carefully) in a day or so if no issues. ~ Tom.Reding (talk ⋅dgaf) 22:02, 31 January 2019 (UTC)
- @Tom.Reding: Make sure you've had feedback from someone using JWB. AWB hides URLs before running these regular expressions, but I'm not sure about the other tool. -- John of Reading (talk) 07:25, 1 February 2019 (UTC)
- @Certes: could you see if you get the same result via JWB? User:Tom.Reding/sandbox is ready to test. ~ Tom.Reding (talk ⋅dgaf) 14:25, 1 February 2019 (UTC)
- The only change JWB suggests is the first one: Imporve→Improve on line 4. Certes (talk) 15:23, 1 February 2019 (UTC)
- @Tom.Reding: At Amissville, Virginia, the simplified rule wants to correct "www.lva.virginia.gov" to "www.lva.Virginia.gov". The previous rule would not have done this. -- John of Reading (talk) 07:57, 7 February 2019 (UTC)
- JWB does not suggest capitalising "virginia" there. Are you using AWB or some other tool? Certes (talk) 11:13, 7 February 2019 (UTC)
- It appears that the omission of "http://" or "https://" before the "www." is enough for it not to be considered a URL (using AWB). I don't see why "www."s shouldn't be considered URLs though. ~ Tom.Reding (talk ⋅dgaf) 12:48, 7 February 2019 (UTC)
- Phab'd @ T215698. ~ Tom.Reding (talk ⋅dgaf) 14:35, 9 February 2019 (UTC)
Going off at a slight tangent, JWB often suggests changing filenames such as "File:Speld itt rong.jpg", which is unhelpful if that's what the file is actually called. Does AWB do the same? Certes (talk) 11:13, 7 February 2019 (UTC)
- I'm pretty sure AWB ignores filenames, not sure how strictly though (given the above). ~ Tom.Reding (talk ⋅dgaf) 12:48, 7 February 2019 (UTC)
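The "hide URLs first, then run the typo rules" behaviour described in this thread can be illustrated with a small sketch. It is not AWB's or JWB's actual code: the URL detector below is a hypothetical one that only recognises strings starting with "http://" or "https://", chosen to mirror the observation that a bare "www." address was not treated as a URL. The function name and the example.org addresses are made up.

```python
import re

# "Improv(e/ise)" rule quoted in the thread, with $1/$2 rewritten as \1/\2.
IMPROVE = (re.compile(r"\b([iI])mp(?:or|re)v([a-z]+)\b"), r"\1mprov\2")

# Hypothetical URL detector: requires an explicit http:// or https:// scheme.
URL = re.compile(r"https?://\S+")


def fix_outside_urls(text):
    """Mask URLs with placeholders, apply the rule, then restore the URLs."""
    hidden = []

    def hide(match):
        hidden.append(match.group(0))
        return "\x00%d\x00" % (len(hidden) - 1)

    masked = URL.sub(hide, text)
    fixed = IMPROVE[0].sub(IMPROVE[1], masked)
    return re.sub(r"\x00(\d+)\x00", lambda m: hidden[int(m.group(1))], fixed)


print(fix_outside_urls("Imporve this: http://example.org/Imporve/s"))
print(fix_outside_urls("See www.example.org/Imporve"))
```

With this detector the first URL is protected and the bare "www." address is not, which is the kind of gap that makes rule-level look-arounds or a broader URL detector worth considering.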
Overly specific "Capitalisation" section: rename or upmerge?
Many of the typos under "Capitalisation" include other typo fixes, and some are first-letter-case-insensitive, making it slightly ambiguous where to place a rule for "Continents and subcontinents" if the first letter is case-insensitive (for example, does "America" go under "A" or "Capitalisation" > "Continents and subcontinents"?). I certainly don't want to see 2 "Continents and subcontinents", 1 under "Capitalisation" and the other under a different heading; that would be equally, if not more, confusing/tedious/slow while looking through the long typo list.
So, I'd like to either:
- rename "Capitalisation" to something more inclusive (suggestions?), or
- upmerge all subheadings of "Capitalisation" to the current level of "Capitalisation".
I'd prefer #1, if we can find a good heading name. ~ Tom.Reding (talk ⋅dgaf) 14:15, 31 January 2019 (UTC)
- As you say, we have two classes of fix: "capitalisation only" and "capitalisation combined with other fixes". Perhaps the "other typo fixes" should be extended to the case where the initial is correctly capitalised. For example, it looks as if we fix algonkin→Algonquin but don't fix Algonkin. Then it might make more sense to combine the second class with the main body, noting in some way that we're capitalising [Aa]→A rather than preserving case with something like ([Aa])→$1. Certes (talk) 14:32, 31 January 2019 (UTC)
- 'Algonkin' is a good example, and part of the reason for wanting to fix the "Capitalisation" heading - many of those rules would benefit from case-insensitivity.
- However, I think we should maintain the current "Capitalisation" subsections, and possibly add more in the future. I like grouping related typos together, as that makes any systematic changes/checks/etc. easier to perform. Making sure not to WP:OVERCAT, of course. ~ Tom.Reding (talk ⋅dgaf) 14:58, 31 January 2019 (UTC)
- One potential issue I see is that negative look-behinds would need to be added to the newly-combined first-letter-case-insensitive rules, to avoid correct self-matching (for example, as required for "Georgia"). This is probably more efficient than having 2 similar rules run, though. @Smasongarrison: could you provide any input on this? Even if slightly less efficient, I would argue it's better to combine similar rules to avoid forking. ~ Tom.Reding (talk ⋅dgaf) 17:06, 31 January 2019 (UTC)
- I like combining rules whenever possible because fewer rules is almost always faster than more rules. The negative look-behinds is a good way to do it, although I think that some programs that use the typos can't process look behinds. Smasongarrison (talk) 18:31, 31 January 2019 (UTC)
- "Rule groups", "Grouped rules", or something along those lines, I think are reasonable replacements for "Capitalisation". Does anyone have a preference? ~ Tom.Reding (talk ⋅dgaf) 13:51, 9 February 2019 (UTC)
- If we rename 4.29 from "Capitalization" to something like "Rule groups", then I think 4.28 and 4.30 - 4.33 would then have to be subheadings of "Rule groups". Therefore, I suggest upmerging instead. GoingBatty (talk) 15:39, 9 February 2019 (UTC)
Whoops! Not sure where this one came from.
AWB Typos flagged "enmeies" and suggested correcting with "emmeies", the intended word was "enemies". Probably the first error I've seen in this fine collection of regular expressions.
For context, the sentence was: Slippy: This is Slippy! We've discovered new enmeies up here! --Pawngpawng (talk) 02:56, 11 February 2019 (UTC)
- Probably from word="Emm-", which correctly concludes that anything of the form "enm..." that isn't whitelisted isn't a word, and assumes that emm... was intended. Although rule order isn't guaranteed, we might consider this rule to be fuzzy and encourage it to fire last, after other rules have had a chance to apply a more specific correction. Certes (talk) 11:18, 11 February 2019 (UTC)
privately-
Are the "Privately (1)" and "Privately (2)" rules incorrectly fixing legitimate instances of the adjective "privately-owned"? (e.g. Lufussa) Thanks! GoingBatty (talk) 02:25, 16 February 2019 (UTC)
- I could be wrong. According to Case Western University:
Do not use a hyphen ... [i]n compound adjectives that contain adverb forms ending in "ly" unless the hyphen is needed for clarity.
Example: Three federally funded programs were renewed.
Example: The privately owned companies did not disclose financial details.
Example: early-morning talk
Rules that match correct spellings?
Are the typo rules supposed to avoid matching properly spelled words?
- "Collaborate" matches the properly spelled "collaboration" (e.g. Meego)
- "Prestigious" matches the properly spelled "prestigious" (e.g. Kristalina Georgieva)
- "Translate" matches the properly spelled "translate" (e.g. Quidditch Benelux)
- "-tility" matches the properly spelled "utilities" (e.g. Metro Tunnel)
Thanks! GoingBatty (talk) 18:21, 9 February 2019 (UTC)
- I think we need to find a consensus between "overuse" of look-behinds, and compatibility with JWB, which can't use them, so omits those rules. The downside of not using look-behinds is, in AWB, the Typo tab will be filled with cases where it matched the correct text. And performing many unnecessary replacements is slower than using look-behinds. The upside of not using look-behinds is that JWB, and possibly other software(?), is able to use the typo rules. Which of these is more important?
- If we decide against look-behinds (limiting them anyway), that's a good argument to keep capitalizations in their own section, since capitalization + another typo = look-behind needed. It would be useful to know how many non-capitalization rules have and still need a look-behind (i.e. would adding look-behinds to capitalizations greatly increase the look-behind count, or if it would be a drop in the bucket?). ~ Tom.Reding (talk ⋅dgaf) 12:18, 11 February 2019 (UTC)
- A few thoughts from a non-expert:
- Could JWB be enhanced to remove look-behinds rather than skip the entire expression? The safest approach is to apply this tactic only to look-behinds which we mark as "efficiency only" in some way, but we may even be able to let JWB remove all look-behinds unless some of them actually prevent inappropriate changes.
- Could AWB be enhanced to filter typos from its list where the replacement text equals the replaced text?
- Could we use look-aheads instead, or would that be grossly inefficient as we'd have to check every start position rather than just those that matched the typo?
- Look-behinds may be coming soon to JavaScript, so this problem may eventually solve itself. Interesting reading, and a couple of useful links at the bottom: [14].
- — Certes (talk) 15:43, 11 February 2019 (UTC)
- @Joeytje50: would question #1 above be doable in JWB, perhaps by appending a specific comment at the end of specific rules which have 'efficiency-only' look-behinds? ~ Tom.Reding (talk ⋅dgaf) 05:06, 15 February 2019 (UTC)
- @Reedy: would question #2 above be doable in AWB? If >= maybe, I'll create a phab ticket. ~ Tom.Reding (talk ⋅dgaf) 05:11, 15 February 2019 (UTC)
- Could AWB add lookbehinds automatically, changing L → R to L(?<!r) → R? r is R made suitable for searching; I don't know this flavour of regex in detail but that may mean replacing $1 by \1, etc. For example, if we write ([Ss])pel+(ed|ing) → $1pell$2, could AWB actually run ([Ss])pel+(ed|ing)(?<!\1pell\2) → $1pell$2? Does this work? Is it efficient? Certes (talk) 12:24, 6 March 2019 (UTC)
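The second idea in this thread, filtering out matches whose replacement equals the matched text, needs no look-behind at all. A minimal sketch, assuming a rule is just a (pattern, replacement) pair; the function name `real_changes` is hypothetical, and Python's `re` module is used here rather than the engine of either tool:

```python
import re

# Example rule from the thread, with $1/$2 rewritten as \1/\2 for Python.
SPELL = (re.compile(r"([Ss])pel+(ed|ing)"), r"\1pell\2")


def real_changes(rule, text):
    """List (found, replacement) pairs, skipping matches that are already correct."""
    pattern, template = rule
    changes = []
    for match in pattern.finditer(text):
        replacement = match.expand(template)
        if replacement != match.group(0):  # drop no-op "corrections"
            changes.append((match.group(0), replacement))
    return changes


print(real_changes(SPELL, "She spelled it; he speled it; Speling bee"))
```

This keeps the rule itself free of look-behinds at the cost of doing the comparison after every match, so it addresses the cluttered typo list but not necessarily the speed concern.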
New infobox image?
The infobox image shows that AWB/T uses replacewith=. Since AWB/T actually uses replace=, would someone like to change the infobox image? Thanks! GoingBatty (talk) 18:21, 10 March 2019 (UTC)
"In the mean time" vs "In the meantime"
I recently created a new rule to change "In the mean time" to "In the meantime" and updated the relevant articles, including Henry Gage (soldier). My edit was reverted by Andreas Philopater as "stylistically inferior". Looking for thoughts from the experts here about this rule. Thanks! GoingBatty (talk) 16:33, 20 March 2019 (UTC)
- "Mean time" would mean "average time", while "meantime" means "concurrently" or similar, as described in Merriam-Webster and many other reliable sources. I'm failing to find an authoritative source recommending "in the mean time". I suppose there might exist some earlier period in English usage where "mean time" was common, but it has no place in article prose except for inside quotation marks. ~ Tom.Reding (talk ⋅dgaf) 17:01, 20 March 2019 (UTC)
- But why would you take an American dictionary as the sole standard? And why would you rule out a legitimately existing option within the full range of the English language, simply because American English is poor in alternatives? --Andreas Philopater (talk) 21:24, 20 March 2019 (UTC)
- Andreas Philopater, I'll be interested in further discussion after you provide a reliable source supporting your position. ~ Tom.Reding (talk ⋅dgaf) 22:49, 20 March 2019 (UTC)
- @Andreas Philopater: It looks like Tom.Reding referred to "Merriam-Webster and many other reliable sources" instead of taking "an American dictionary as the sole standard". Since American English can be different from English in other parts of the world, I look forward to other reliable sources. Thanks! GoingBatty (talk) 01:52, 23 March 2019 (UTC)
- Since both forms are available in English, there's no need to force either one as an automatic "correction". --Andreas Philopater (talk) 21:24, 20 March 2019 (UTC)
- "Mean time" has a specific meaning, that is the local mean time is determined by the sun as distinct from local standard time. It is defined as noon by local mean time is when the sun is directly above the local meridian. Meantime does not have the same meaning at all. "In the meantime" means "while something else is currently happening". I don't know which use is appropriate in the article concerned but I do know that you cannot simply switch between "mean time" and "meantime". American English has nothing to do with it. - Nick Thorne talk 22:00, 20 March 2019 (UTC)
- Upon re-reading the OP, I think the rule is appropriate. "In the mean time" does not actually make any sense, given what mean time really means. - Nick Thorne talk 22:03, 20 March 2019 (UTC)
- And of course it isn't conceivable that the "mean time" in "in the mean time" signifies something other than the "mean time" in "Greenwich Mean Time". Because language always lends itself to neat categorisation and polysemy just isn't a thing. --Andreas Philopater (talk) 19:31, 21 March 2019 (UTC)
- @Andreas Philopater: If you find some examples in the English Wikipedia where "in the mean time" means something other than "while something else is currently happening", could you please post them here? Thanks! GoingBatty (talk) 01:52, 23 March 2019 (UTC)
- I've looked but can't see any on Wikipedia. They occasionally occur elsewhere, when reporting the average duration or interval of some event.
- Maybe run this but do a negative lookahead for "of|between"? – Certes (talk) 11:19, 23 March 2019 (UTC)
- As others have said, "mean time" is a term of art for average time, such as the mean time of solar noon or the mean time between failures. The other meaning, a rough synonym for "meanwhile", is usually written "meantime" but is sometimes written "mean time". The question is whether the latter is valid or should be corrected. It's listed in Wiktionary and The Free Dictionary, and there are examples of its use attributed to Reuters reports here, but I can't find a RS to confirm whether it's actually right. Certes (talk) 00:40, 21 March 2019 (UTC)
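The negative lookahead for "of|between" suggested in this thread could look roughly like the sketch below. The pattern is an assumption written for illustration, since the live rule's text is not quoted here, and it uses Python syntax (`\1` instead of `$1`).

```python
import re

# Hypothetical rule: fix "in the mean time" unless "of" or "between" follows,
# which would suggest the "average time" sense.
MEANTIME = re.compile(r"\b([Ii])n the mean time\b(?!\s+(?:of|between)\b)")


def fix_meantime(text):
    """Join "mean time" into "meantime" where the lookahead allows it."""
    return MEANTIME.sub(r"\1n the meantime", text)


print(fix_meantime("In the mean time, he left."))
print(fix_meantime("a drop in the mean time between failures"))
```

The lookahead only guards the two phrasings named in the discussion; other "average time" wordings would still need {{Not a typo}} or a manual skip.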
"nationalwide" → "nationwide"
Doesn't occur too often, but a useful addition in my opinion. --bender235 (talk) 21:22, 15 April 2019 (UTC)
"based off of" → "based on"
This is reasonably common but incorrect usage. Dalziel 86 (talk) 01:44, 25 May 2019 (UTC)
- Am I not seeing this as first rule under Wikipedia:AutoWikiBrowser/Typos#Incorrect_phrases? Shenme (talk) 04:27, 1 June 2019 (UTC)
HTML entities
So I've been running database scans and finding that people often write certain non-ASCII characters as HTML entities. General consensus seems to be that for wikitext readability, this shouldn't be done for Latin alphabet-based letters, certainly, and some common symbols. There are many hundreds of instances, so it would be nice to have some semi-automated help fixing them. Below is a list I've put together of the most frequent occurrences that should be universally safe. I'm hoping this syntax will work with AWB and friends...would anyone be interested in testing them out and adding to the official list if they work? Thanks! -- Beland (talk) 23:30, 13 March 2019 (UTC)
<Typo find="°" replace="°"/>
<Typo find="§" replace="§"/>
<Typo find="é" replace="é"/>
<Typo find="£" replace="£"/>
<Typo find="ç" replace="ç"/>
<Typo find="ü" replace="ü"/>
<Typo find="ä" replace="ä"/>
<Typo find="ó" replace="ó"/>
<Typo find="ö" replace="ö"/>
<Typo find="á" replace="á"/>
<Typo find="é" replace="é"/>
<Typo find="è" replace="è"/>
<Typo find="í" replace="í"/>
<Typo find="ñ" replace="ñ"/>
<Typo find="à" replace="à"/>
<Typo find="ë" replace="ë"/>
<Typo find="É" replace="É"/>
<Typo find="ö" replace="ö"/>
<Typo find="ø" replace="ø"/>
<Typo find="ß" replace="ß"/>
<Typo find="¶" replace="¶"/>
<Typo find="Ü" replace="Ü"/>
<Typo find="á" replace="á"/>
<Typo find="í" replace="í"/>
<Typo find="£" replace="£"/>
<Typo find="ã" replace="ã"/>
<Typo find="ê" replace="ê"/>
<Typo find="ä" replace="ä"/>
<Typo find="‰" replace="‰"/>
<Typo find="Í" replace="Í"/>
<Typo find="Ö" replace="Ö"/>
<Typo find="å" replace="å"/>
<Typo find="Á" replace="Á"/>
<Typo find="Å" replace="Å"/>
<Typo find="ú" replace="ú"/>
<Typo find="ô" replace="ô"/>
<Typo find="â" replace="â"/>
<Typo find="€" replace="€"/>
<Typo find="ø" replace="ø"/>
<Typo find="ō" replace="ō"/>
<Typo find="ā" replace="ā"/>
<Typo find="¶" replace="¶"/>
<Typo find="ü" replace="ü"/>
<Typo find="ó" replace="ó"/>
<Typo find="ŏ" replace="ŏ"/>
<Typo find="ā" replace="ā"/>
<Typo find="é" replace="é"/>
<Typo find="ć" replace="ć"/>
<Typo find="è" replace="è"/>
<Typo find="°" replace="°"/>
<Typo find="ś" replace="ś"/>
<Typo find="ż" replace="ż"/>
<Typo find="ñ" replace="ñ"/>
<Typo find="à" replace="à"/>
<Typo find="š" replace="š"/>
<Typo find="ş" replace="ş"/>
<Typo find="ī" replace="ī"/>
<Typo find="ō" replace="ō"/>
<Typo find="ǫ" replace="ǫ"/>
<Typo find="è" replace="è"/>
<Typo find="ū" replace="ū"/>
<Typo find="ł" replace="ł"/>
<Typo find="č" replace="č"/>
<Typo find="ë" replace="ë"/>
<Typo find="ŭ" replace="ŭ"/>
<Typo find="Č" replace="Č"/>
<Typo find="é" replace="é"/>
<Typo find="â" replace="â"/>
<Typo find="Ż" replace="Ż"/>
<Typo find="á" replace="á"/>
<Typo find="ö" replace="ö"/>
<Typo find="É" replace="É"/>
<Typo find="á" replace="á"/>
<Typo find="ä" replace="ä"/>
<Typo find="ó" replace="ó"/>
<Typo find="Á" replace="Á"/>
<Typo find="Š" replace="Š"/>
<Typo find="ä" replace="ä"/>
<Typo find="ń" replace="ń"/>
<Typo find="Ā" replace="Ā"/>
<Typo find="á" replace="á"/>
<Typo find="a" replace="a"/>
<Typo find="á" replace="á"/>
<Typo find="Ł" replace="Ł"/>
<Typo find="ó" replace="ó"/>
<Typo find="ü" replace="ü"/>
<Typo find="£" replace="£"/>
<Typo find="ţ" replace="ţ"/>
<Typo find="°" replace="°"/>
<Typo find="ě" replace="ě"/>
<Typo find="ó" replace="ó"/>
<Typo find="ž" replace="ž"/>
<Typo find="ř" replace="ř"/>
<Typo find="é" replace="é"/>
<Typo find="Ō" replace="Ō"/>
<Typo find="í" replace="í"/>
<Typo find="§" replace="§"/>
<Typo find="Ç" replace="Ç"/>
<Typo find="ę" replace="ę"/>
<Typo find="ż" replace="ż"/>
- @Beland: most of this is duplication of AWB's WP:AWB/UNICODIFY feature. There are only a handful of exceptions, which can be made into temporary typo rules until they're added to the unicodifying function. See Special:Diff/899831606 for reference. ~ Tom.Reding (talk ⋅dgaf) 18:07, 1 June 2019 (UTC)
- Aha, good to know that's already available. I was trying to test with JWB, but neither the regular rules nor the new ones are working. (And I don't think it supports unicodify?) I'll remove the duplicates from the live listing for now. -- Beland (talk) 18:13, 1 June 2019 (UTC)
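For readers without AWB's unicodify feature, the general idea of replacing entities for letters and a few symbols while leaving markup-sensitive ones alone can be sketched as follows. This is not how AWB implements it. The safe-symbol set and the function name are assumptions made for the example, and Python's standard `html.unescape` does the decoding.

```python
import html
import re

# Named, decimal and hexadecimal entity forms.
ENTITY = re.compile(r"&(?:#\d+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")

# Assumed whitelist of non-letter symbols that are safe to write literally.
SAFE_SYMBOLS = set("°§£€¶‰")


def unescape_safe(wikitext):
    """Decode entities for non-ASCII letters and whitelisted symbols only."""

    def repl(match):
        ch = html.unescape(match.group(0))
        if len(ch) == 1 and ord(ch) > 127 and (ch.isalpha() or ch in SAFE_SYMBOLS):
            return ch
        return match.group(0)  # keep &amp;, &nbsp;, unknown entities, etc.

    return ENTITY.sub(repl, wikitext)


print(unescape_safe("caf&eacute; &amp; 20&deg;C, &#233;, &#xE9;, a&nbsp;b"))
```

Entities such as `&amp;` and `&nbsp;` are deliberately left untouched, since decoding them would either change the markup or hide the character from editors.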
shortended
shortended - shortened - just spotted this typo not being fixed by AWB and there are a few of them. ϢereSpielChequers 18:21, 21 June 2019 (UTC)
"Peace Price" issue?
Peace Price → P$1 Prize--Pawngpawng (talk) 03:56, 24 June 2019 (UTC)
- What is the issue? --DannyS712 (talk) 04:08, 24 June 2019 (UTC)
- Should be fixed now -- John of Reading (talk) 06:19, 24 June 2019 (UTC)
Billoard - billboard
I'm going to do an AWB run to fix the ones that are there now, but can we put Billoard/bilboard - billboard in? thanks for fixing Also we need to take "encyclopaedic - encyclopedic" out per Engvar. ϢereSpielChequers 14:19, 26 June 2019 (UTC)
encyclopaedia > encyclopedia
@Tom.Reding: Since this edit, the typo fixer has been suggesting we change "encyclopaedia" to "encyclopedia". Was that your intention? Lots of dictionaries list "encyclopaedia" as a valid spelling. -- John of Reading (talk) 06:30, 7 June 2019 (UTC)
- How about something like
<Typo word="Encyclopedia (1)" find="\b([eE])ncyl?c?l?op(a?e|æ)a?di(as?|c)\b(?<![eE]ncyclop(a?e|æ)di[ac]s?)" replace="$1ncyclop$2di$3"/>
(not tested)? Certes (talk) 08:59, 7 June 2019 (UTC)
- @John of Reading: I don't think I changed this specific behavior (though I did add 2 "l?"s for good measure) - the intention was simply to merge (and non-controversially embellish) existing similar rules. It would be better to find & ask the original editor of rule "Encyclopedia(3)". ~ Tom.Reding (talk ⋅dgaf) 11:36, 7 June 2019 (UTC)
- @Tom.Reding and Certes: The old "Encyclopedia(3)" used to match \b([eE])ncyclop(?:a?|ea)di(as?|c)\b, not -ae-. Looking at my own rule set, I find I have two rules. One matches
\b([eE])ncy(?:cloap|clop|clp|clpo|clpop|lcop)(?:ad|ead)(ias?|iac|iacal|ial|ians?|ic|ical|ically|icity|ism|ists?)\b
(2nd syllable either valid or invalid, 3rd syllable invalid) and corrects it to $1ncycloped$2, and the other matches
\b([eE])ncy(?:cloap|clp|clpo|clpop|lcop)(aed|æd|ed)(ias?|iac|iacal|ial|ians?|ic|ical|ically|icity|ism|ists?)\b
(2nd syllable invalid, 3rd syllable valid) and corrects it to $1ncyclop$2$3. Any good? -- John of Reading (talk) 12:27, 7 June 2019 (UTC)
- @John of Reading: ah, I see - that was definitely not intentional. Certes' solution seems the most intuitive, though yours seem more thorough, and possibly less prone to typos in the regex. Will take some time to digest these options. It would be nice to have 1 rule, if possible. ~ Tom.Reding (talk ⋅dgaf) 13:02, 7 June 2019 (UTC)
- I think that one rule is perfectly possible but would only be efficient and practical with a negative look-behind, making the rule unusable in JWB. Certes (talk) 13:31, 7 June 2019 (UTC)
- I've disabled the faulty rule, following WereSpielChequers' reminder, below. -- John of Reading (talk) 17:54, 26 June 2019 (UTC)
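The two look-behind-free rules quoted in this thread can be tried out in isolation. The sketch below copies their patterns, rewrites `$1`, `$2`, `$3` as Python backreferences, and checks that both valid spellings are left alone; it says nothing about how the rules interact with the rest of the typo list, and the function name `fix` is made up.

```python
import re

SUFFIX = r"(ias?|iac|iacal|ial|ians?|ic|ical|ically|icity|ism|ists?)"

RULES = [
    # 2nd syllable valid or invalid, 3rd syllable invalid -> "encycloped..."
    (re.compile(r"\b([eE])ncy(?:cloap|clop|clp|clpo|clpop|lcop)(?:ad|ead)" + SUFFIX + r"\b"),
     r"\1ncycloped\2"),
    # 2nd syllable invalid, 3rd syllable valid -> keep "aed", "æd" or "ed" as written
    (re.compile(r"\b([eE])ncy(?:cloap|clp|clpo|clpop|lcop)(aed|æd|ed)" + SUFFIX + r"\b"),
     r"\1ncyclop\2\3"),
]


def fix(text):
    """Apply both rules in turn."""
    for pattern, template in RULES:
        text = pattern.sub(template, text)
    return text


for word in ("encyclopaedia", "encyclopedia", "encylcopaedia", "Encyclopadic", "encyclpedia"):
    print(word, "->", fix(word))
```

Because the second rule re-emits whichever of "aed", "æd" or "ed" it captured, a misspelt "encylcopaedia" keeps its -ae- form, which is the ENGVAR-neutral behaviour the thread is after.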
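For readers who want to try the two-rule approach described in this thread, here is a minimal sketch in Python. The two patterns are transcribed from John of Reading's comment; the names `SUFFIXES`, `RULE_A`, `RULE_B` and `fix_encyclopedia` are made up for illustration, and Python's `re` module is only assumed to behave like AWB's .NET engine for these simple patterns. Neither rule needs a look-behind.

```python
import re

# Shared list of word endings, transcribed from the two rules quoted in the thread.
SUFFIXES = r"(ias?|iac|iacal|ial|ians?|ic|ical|ically|icity|ism|ists?)"

# Rule A: 2nd syllable valid or invalid, 3rd syllable invalid ("ad" / "ead").
RULE_A = re.compile(
    r"\b([eE])ncy(?:cloap|clop|clp|clpo|clpop|lcop)(?:ad|ead)" + SUFFIXES + r"\b"
)

# Rule B: 2nd syllable invalid, 3rd syllable valid ("aed" / "æd" / "ed").
RULE_B = re.compile(
    r"\b([eE])ncy(?:cloap|clp|clpo|clpop|lcop)(aed|æd|ed)" + SUFFIXES + r"\b"
)


def fix_encyclopedia(text):
    """Apply both rules in turn (hypothetical helper, not part of AWB)."""
    text = RULE_A.sub(r"\1ncycloped\2", text)
    return RULE_B.sub(r"\1ncyclop\2\3", text)
```

Because rule B keeps whatever valid third syllable it finds, "Encylcopaedia" becomes "Encyclopaedia" rather than being forced to the -e- spelling, while correctly spelled words match neither rule.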
What is RegExTypoFix doing here?
I've only been using RegExTypoFix for a short time so I apologize in advance if this is something incredibly obvious... but what is it doing in the photo I have attached to the right? Looks like it's changing an apostrophe but I can't see the difference or the reason for it. A good majority of the "typos" it finds are just this sort of change... so is this a glitch, or is this intended, and why? TheAwesomeHwyh 21:22, 3 July 2019 (UTC)
- @TheAwesomeHwyh: The difference here is that some kind of typesetter's apostrophe has been replaced with a straight quote mark, '. This is recommended by MOS:PUNCT. In general, to see which typo rules have made changes to the current page, switch to the "Typos" tab at the bottom right of the AWB window. -- John of Reading (talk) 06:00, 4 July 2019 (UTC)
- Ah, thanks! TheAwesomeHwyh 17:13, 4 July 2019 (UTC)
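The two characters are hard to tell apart on screen, so a small sketch may help. It is written in Python, not AWB's own engine, and the character list `CURLY` is an assumed subset of what the live rule targets; `straighten_possessive` is a made-up helper name.

```python
import re

# Apostrophe-like characters that are not the straight ASCII quote (assumed subset).
CURLY = "´ˈ׳᾿‘’′Ꞌꞌ`"
POSSESSIVE = re.compile(r"(\w+)[" + re.escape(CURLY) + r"]s\b")


def straighten_possessive(text):
    """Replace a typographic apostrophe before a possessive 's' with U+0027."""
    return POSSESSIVE.sub(r"\1's", text)


before = "the city’s mayor"
after = straighten_possessive(before)

# The strings look alike, but the code points differ.
codepoints = (hex(ord("’")), hex(ord("'")))
```

Here `codepoints` holds the code point of the typesetter's apostrophe (U+2019) next to that of the straight quote (U+0027), which is the only difference between `before` and `after`.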
MB/s != Mbits/s
I was running AWB on some random pages, and one fix that it suggested was on Intel 810, replacing 266MB/s with 266Mbit/s. 266MB/s is megabytes per second, Mbits/s is megabits per second. 1 megabyte = 8 megabits, so they're not interchangeable. I'm not regex-competent enough to make the fix myself. The one that triggered this is:
([\d\.]+(?:[−―–—\s]| )?)(?:M(?:B(?:it(?:s\/se?c?|\/s)|ps|\/se?c?)|b(?:its\/se?c?|ps|\/se?c?))|m[bB](?:it(?:s\/se?c?|\/s)|ps|\/se?c?))\b
- Frood (talk!) 05:30, 29 August 2019 (UTC)
- @Frood: I've deleted part of this rule and the similar "kilobit" rule. What's left is still quite hard to understand! Before this edit by Smasongarrison (talk · contribs), the rules didn't touch anything with an uppercase ASCII "B". -- John of Reading (talk) 06:59, 29 August 2019 (UTC)
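To illustrate why the uppercase "B" matters, here is a rough Python sketch. `ORIGINAL` is transcribed from Frood's post; `LOWERCASE_B_ONLY` is a hypothetical narrower pattern written for this example only (it is not the rule as edited on the typo list), and `megabytes_to_megabits` is an illustrative helper.

```python
import re

# The rule as quoted in the report; it accepts both "MB" and "Mb".
ORIGINAL = re.compile(
    r"([\d\.]+(?:[−―–—\s]| )?)(?:M(?:B(?:it(?:s\/se?c?|\/s)|ps|\/se?c?)"
    r"|b(?:its\/se?c?|ps|\/se?c?))|m[bB](?:it(?:s\/se?c?|\/s)|ps|\/se?c?))\b"
)

# Hypothetical alternative: only a lowercase "b" (bits) is ever matched.
LOWERCASE_B_ONLY = re.compile(r"([\d.]+\s?)[Mm]b(?:its?/s(?:ec)?|ps|/s(?:ec)?)\b")


def megabytes_to_megabits(mb_per_s):
    """1 megabyte = 8 megabits, so the two units differ by a factor of 8."""
    return mb_per_s * 8
```

With these definitions, `ORIGINAL` matches "266MB/s" (the false positive reported above), whereas the narrower pattern leaves it alone and still matches "266Mb/s". Rewriting 266 MB/s as 266 Mbit/s would understate the figure eightfold.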
Strange typo problem
Mayor of Cebu City has the phrase "oath of office" in it twice. AWB is correcting a double "of" as a typo ([[WP:AWB/T|typo(s) fixed]]: of of → of), but only in the first occurrence. Is there some invisible character in the article causing this? MB 00:32, 12 September 2019 (UTC)
- @MB: Yes. Pasting the wikitext out of the edit window into Notepad++ shows that the text contains many soft hyphens. -- John of Reading (talk) 04:03, 12 September 2019 (UTC)
- It sounds as if the soft hyphen is being misinterpreted as a word delimiter. Can we do anything about that, short of replacing \b by something much more complex? Certes (talk) 10:33, 12 September 2019 (UTC)
- @MB: Fixed the article so AWB won't make an incorrect edit. GoingBatty (talk) 01:12, 17 September 2019 (UTC)
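The effect Certes describes can be reproduced outside AWB. The sketch below uses Python's `re` module, which is assumed here to treat the soft hyphen (U+00AD) as a non-word character in the same way; `DOUBLED_OF` is a simplified stand-in for the "Duplicated words" rule, and `strip_soft_hyphens` is a made-up helper.

```python
import re

SOFT_HYPHEN = "\u00ad"

# Simplified stand-in for the duplicated-word rule, limited to "of".
DOUBLED_OF = re.compile(r"\b(of)\s+\1\b")

plain = "took the oath of office"
# Same phrase, but with an invisible soft hyphen inside "office".
hyphenated = "took the oath of of" + SOFT_HYPHEN + "fice"


def strip_soft_hyphens(text):
    """Remove soft hyphens before any typo rules are applied."""
    return text.replace(SOFT_HYPHEN, "")
```

In `hyphenated`, `\b` matches between the "f" and the soft hyphen, so "of of" is seen as a doubled word; in `plain`, and in the stripped text, no boundary exists inside "office" and the rule does not fire. Stripping the character first is one possible workaround that avoids replacing `\b` in every rule.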
Request to expand "A to An"
Could someone please expand the "A to An" rule to include:
- "a ana.." (e.g. "a analyst", "a analytics", "a anatomical")
- "a anc.." (e.g. "a ancestor", "a anchor", "a ancient")
- "a and.." (e.g. "a android", but NOT "a and ")
- "a ang.." (e.g. "a angel", "a angled", "a anglicized", "a angry")
- "a ani.." (e.g. "a animal", "a animated", "a anime")
- "a ann.." (e.g. "a annual")
- "a ano.." (e.g. "a anonymous")
- "a ant.." (e.g. "a anterior", "a antiqued", "a antagonist", "a anti-", "a anticipatory")
- "a MVP"
Thanks! GoingBatty (talk) 02:22, 17 September 2019 (UTC)
- I added a "A to An (2)" rule. Feel free to combine it with the "A to An" rule and/or expand it. GoingBatty (talk) 02:44, 19 September 2019 (UTC)
- Interesting and surprising discovery!
[aA](?!AA?T?|ED|FN|l(?:do|guien\b)|LL|MD|nd\b|NG|OA|p(?:agar\b|robat\b)|rtelor\b|RS|s\b|tahualpa\b|UD|ustriei\b|WG|ZN)?\b
- I don't see why the original does not correct all those. Code above is the exclusions for words starting with 'a' or 'A'. Regards, Sun Creator(talk) 22:16, 28 September 2019 (UTC)
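As a rough illustration of the request above, the following Python sketch handles the "a an.." cases with a single lookahead while leaving "a and " alone. The pattern `A_BEFORE_AN` is hypothetical and is not the "A to An (2)" rule that was actually added; it does not cover "a MVP".

```python
import re

# "a"/"A" followed by a word starting an + one of a, c, g, i, n, o, t,
# or "and" followed by at least one more word character (e.g. "android").
A_BEFORE_AN = re.compile(r"\b([aA]) (?=an(?:[acginot]|d\w))")


def a_to_an(text):
    """Insert the missing 'n' after the article; the case of the article is kept."""
    return A_BEFORE_AN.sub(r"\1n ", text)
```

The `d\w` branch is what separates "a android" from "a and b": the standalone word "and" is followed by a space, so the lookahead fails and the text is left untouched.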
Onboard
- Also "onboard" is a real word, please don't assume it should be "on board" ϢereSpielChequers 20:59, 7 October 2019 (UTC)
- @WereSpielChequers: The "Onboard" rule tries to peek at what's coming next; there's discussion in the archives. -- John of Reading (talk) 06:07, 8 October 2019 (UTC)
- Thanks John, I'll read that archive and reread the dictionary definition, I may have got that word wrong. ϢereSpielChequers 07:58, 8 October 2019 (UTC)
lower case tests
Filim and Offred have way too many false positives to do, but as long as you make it case sensitive:
- offred - offered
- filim - film
- mainy - mainly
I'm doing the current batch but it would be good to get these tests into AWB. ϢereSpielChequers 20:02, 8 October 2019 (UTC)
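A case-sensitive rule of the kind requested here could look like the sketch below. It is Python, not the AWB rule syntax, and the names `LOWERCASE_FIXES` and `fix_lowercase_typos` are invented for the example.

```python
import re

# Lowercase-only corrections suggested in this thread.
LOWERCASE_FIXES = {"offred": "offered", "filim": "film", "mainy": "mainly"}

# No re.IGNORECASE flag, so capitalised names such as "Offred" never match.
PATTERN = re.compile(r"\b(" + "|".join(LOWERCASE_FIXES) + r")\b")


def fix_lowercase_typos(text):
    """Replace each lowercase typo with its correction; leave capitalised forms."""
    return PATTERN.sub(lambda m: LOWERCASE_FIXES[m.group(1)], text)
```

The trade-off is that a typo at the start of a sentence ("Mainy ...") is also skipped, which seems acceptable if the capitalised forms are mostly proper names.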
List of low frequency typos you can load on AWB
Hi guys, I know this page is dedicated to high frequency typos, but some high frequency sources of typos can be addressed as well. The most common ones I found are swapping adjacent characters, and removing, duplicating or replacing a character: Levenshtein distance 1 in formal language. I took all common words, generated all possible variations of them and removed the legitimate words from the output. I then searched for those 200K variations across Wikipedia dumps. What I found helped me create a list of less frequent replacements and a list of the articles where they are found. You can load those lists from Wikipedia:AutoWikiBrowser/Settings/Autocorrect and the talk page and start fixing thousands of obvious typos across Wikipedia, a few seconds per fix. I hope you will find this list useful. Any feedback is much appreciated! Uziel302 (talk) 14:03, 21 July 2019 (UTC)
- How do you determine intent in ambiguous cases? How do you know *bacronym is acronym and not backronym? Is *baettled meant to be settled, or battled? And so on. Mathglot (talk) 17:41, 12 September 2019 (UTC)
- bacronym is a valid alternative spelling for backronym and probably shouldn't be changed. On the other hand, \bacroynm\b, i.e. acroynm [sic] as a word, should be safe to correct to acronym. (\b is a word boundary.) \baettled\b is ambiguous and might need manual attention because aettled could be a typo for ettled, fettled, kettled, mettled or nettled, but the adjacency of A to S on most keyboards may make it worth being bold and correcting to settled. Certes (talk) 18:37, 12 September 2019 (UTC)
- I don't have a sophisticated way to guess, I just guess and people can easily type the right correction on Wikipedia:Correct typos in one click. Uziel302 (talk) 16:26, 10 October 2019 (UTC)
- I've gone through some of Uziel's lists and made some suggestions later on this page. I agree with others that we can't introduce a whole batch without checking that each individual rule is sufficiently safe for AWB. But it is time that we can start looking at the potential typos that AWB doesn't yet pick up to see which ones can be added to AWB. ϢereSpielChequers 15:54, 11 October 2019 (UTC)
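The generation step Uziel302 describes (swap, remove, duplicate, replace, then drop real words) can be sketched as follows. This is a reconstruction from the description in this thread, not the script that was actually used; `edit_variants` and `candidate_typos` are hypothetical names, and the alphabet is assumed to be lowercase ASCII.

```python
import string


def edit_variants(word, alphabet=string.ascii_lowercase):
    """Return single-edit misspellings of `word` (the word itself excluded)."""
    variants = set()
    for i in range(len(word)):
        variants.add(word[:i] + word[i + 1:])                 # remove a character
        variants.add(word[:i] + word[i] * 2 + word[i + 1:])   # duplicate a character
        for ch in alphabet:
            variants.add(word[:i] + ch + word[i + 1:])        # replace a character
        if i + 1 < len(word):
            # Swap adjacent characters (strictly a Damerau-Levenshtein edit).
            variants.add(word[:i] + word[i + 1] + word[i] + word[i + 2:])
    variants.discard(word)
    return variants


def candidate_typos(words, dictionary):
    """Map each variant that is not itself a legitimate word to its source words."""
    candidates = {}
    for word in words:
        for variant in edit_variants(word) - set(dictionary):
            candidates.setdefault(variant, set()).add(word)
    return candidates
```

Keeping the source words per variant makes the ambiguity raised by Mathglot visible: a variant that maps to more than one source word has no single safe correction and would need manual attention.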
Bulit
"Bulit" is a surname, bulit a typo of built. Could we make this test case sensitive please? This would avoid some current false positives. ϢereSpielChequers 11:43, 11 October 2019 (UTC)
- Or might it mean "bullet" or even "Bullet"? Certes (talk) 11:54, 11 October 2019 (UTC)
- In theory yes, but I've cleared the current backlog and they were all "built". ϢereSpielChequers 17:33, 11 October 2019 (UTC)
Trivial changes
I see that new "typos" (broadly defined) to clean up spurious spacing have been added and removed. WP:AWBRULES 4 sensibly states that we shouldn't save a page just to do this. However, would it make sense to create a new class of minor correction – probably not called typos – which are applied if and only if the page is being saved anyway to make more significant changes? Certes (talk) 10:55, 13 October 2019 (UTC)
- Aren't these the general fixes?
- Of the two new rules, the one that removes spaces at the end of paragraphs is not needed, as the general fixes already do this. The other, replacing double spaces by single spaces, contradicts MOS:DOUBLE SPACE, which allows both styles. -- John of Reading (talk) 12:34, 13 October 2019 (UTC)
- Thanks, I thought this might already be covered somewhere. I also use double spaces after a sentence (as above) but occasionally condense multiple spaces mid-sentence if they seem distracting when I'm editing a page for other reasons. Certes (talk) 12:48, 13 October 2019 (UTC)
In the 1970's
I have had one of my AWB edits reverted on the basis that grocer's apostrophe's are acceptable in decades, and a comment on my talkpage. As far as I'm aware, the only correct way to handle this is "though born in the 1960s, their tastes were more for 1970's fashion". Since this is a standard AWB fix, it would be better to discuss this here rather than on my talkpage. ϢereSpielChequers 15:48, 14 October 2019 (UTC)
- Having had a canter over to the style guide [19], it seems to be ambivalent on the point of the 1970's -v- 1970s, though does state that some [other] style guides prefer the latter to the former without specifying a preference for Wikipedia which suggests that either is acceptable (with the usual caveat regarding consistency in articles). -86.130.28.61 (talk) 16:10, 14 October 2019 (UTC)
- The Wikipedia style guide is at MOS:DECADE and says we should be using no apostrophe. -- John of Reading (talk) 16:18, 14 October 2019 (UTC)
- OK. I missed that one. I was just about to add that it is probably an issue that is not worth getting excited about. If the style guide does say 1970s rather than 1970's then I will happily concede the point. -86.130.28.61 (talk) 16:41, 14 October 2019 (UTC)
Can't this be used to clean up code?
Why revert, John of Reading? Why not simply disable? — Guarapiranga (talk) 08:24, 15 October 2019 (UTC)
- @Guarapiranga: Because AutoWikibrowser hides all templates from the text before running these find and replace rules. -- John of Reading (talk) 14:52, 15 October 2019 (UTC)
Veill
@Tom.Reding: I'm not a fan of wide-ranging rules, myself! From the first 2% of a database scan, I find false matches with éveille, Merveilles, [Rr]eveille, Reveillon, Veillet, Veilleux, Veillon, and Veillot. -- John of Reading (talk) 20:07, 15 October 2019 (UTC)
- Constrained - thank you for that analysis. damn French ~ Tom.Reding (talk ⋅dgaf) 16:12, 16 October 2019 (UTC)
predominate/predominately
My edit was reverted with the comment that predominant was more common/preferred/modern. That seems to be backed-up in [20]. MB 15:14, 20 October 2019 (UTC)
ammasso/i
I have hit a problem with some of these rules in Italian language fixes on the English wikipedia. I don't speak a word of Italian so feel uncomfortable when AWB prompts me with changes such as ammasso → amasso or indeed comprese → compresse. Is there any possibility that we have some over confident rules in AWB? ϢereSpielChequers 13:51, 24 October 2019 (UTC)
He quartered
Hi, "He quartered" is not necessarily a typo of headquartered, could that test be removed please? ~ Tom.Reding (talk ⋅dgaf) 03:03, 19 October 2019 (UTC) Some you could usefully add would be:
- "featureed" as a typo for "featured" ~ Tom.Reding (talk ⋅dgaf) 17:15, 16 October 2019 (UTC)
- "unveilled" - "unveiled" ~ Tom.Reding (talk ⋅dgaf) 17:15, 16 October 2019 (UTC)
- "receving" - "receiving" ~ Tom.Reding (talk ⋅dgaf) 17:15, 16 October 2019 (UTC)
- "sigend" - "signed" ~ Tom.Reding (talk ⋅dgaf) 03:03, 19 October 2019 (UTC)
- "voage/voyae" - "voyage" ~ Tom.Reding (talk ⋅dgaf) 03:03, 19 October 2019 (UTC)
- "gulity" - "guilty" ~ Tom.Reding (talk ⋅dgaf) 03:03, 19 October 2019 (UTC)
- "sporano" - "soprano" ~ Tom.Reding (talk ⋅dgaf) 03:03, 19 October 2019 (UTC)
- "aritst" - "artist" ~ Tom.Reding (talk ⋅dgaf) 03:03, 19 October 2019 (UTC)
- "prometed" - "promoted" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "outsed" - "ousted" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "registred" - "registered" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "servicable" - "serviceable" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "bethroted" - "betrothed" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "preciptiation" - "precipitation" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "pardonned" - "pardoned" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "alliegence" - "allegiance" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
- "parternship" - "partnership" ~ Tom.Reding (talk ⋅dgaf) 16:44, 25 October 2019 (UTC)
I've dealt with the current crop of them all, but it would help if AWB could catch them in the future. ϢereSpielChequers 23:03, 30 September 2019 (UTC)
- Plus "invitiation" - "invitation"
- "restaured" - "restored"
- "highlited" - "highlighted" (still making my way through User:Uziel302/oddwords, fixing the current examples and identifying ones we can be confident are typos)ϢereSpielChequers 08:14, 15 October 2019 (UTC)
Request new typos
Can someone please add unservicable -> unserviceable, spellt-spelled and liensman -> linesman to the list please? Bellowhead678 (talk) 17:37, 27 October 2019 (UTC)
- unservicable -> unserviceable - I agree this could go into AWB - I have just dealt with the existing examples
- "liensman" would appear to be archaic and rare, but where it is used on Wikipedia it seems to be correct.
- "spellt" is definitely a typo, but could be either spelled or spelt depending on the version of English, I suggest not suitable for AWB as you don't know which of those to go with ϢereSpielChequers 17:54, 27 October 2019 (UTC)
heriditary
- heriditary->hereditary I have just fixed all 14 please can we put the word into AWB for the future.
- remphasised - reemphasised
- exceled - excelled
- pallisaded - palisaded
- debutting - debuting debutted - debuted ϢereSpielChequers 17:54, 27 October 2019 (UTC)
Dash fix
I'm not sure if this is considered a typo or belongs somewhere else. In this edit (see line 62), the dash was changed in three highlighted bullet points, but not in the prior one (April 24, 1775-December 1775). MB 02:29, 5 November 2019 (UTC)
Nemesis rule
This rule currently turns typo 'archnemesis' into nemesis. Sun Creator(talk) 02:49, 5 November 2019 (UTC)
harvard rule
The rule should not capitalize a domain name, e.g. harvard.edu to Harvard.edu. Sun Creator(talk) 13:19, 3 November 2019 (UTC)
- Fixed by adding a lookahead. Those exceptions were removed in this edit by Tom.Reding; the discussion is now in Archive 4. -- John of Reading (talk) 13:34, 3 November 2019 (UTC)
- Thanks. Same issue with disney. Sun Creator(talk) 14:50, 3 November 2019 (UTC)
- And Ireland. All URL look-arounds should be restored. Sun Creator(talk) 16:10, 4 November 2019 (UTC)
- I disagree, based on cost/benefit. The cost is a small # of FPs, and probable rule-opacity for JWB users; performance too, but it's probably? a negligible difference (I might run some tests). The largest benefit is allowing JWB users access to these many rules.
- However, if JWS/JS-in-browsers allows lookaheads, I'm for restoring those only. @Certes: do you know? (I tried [21] & [22] without meaningful success) ~ Tom.Reding (talk ⋅dgaf) 16:36, 4 November 2019 (UTC)
- Yes, lookaheads work in JS and JWB, including variable width ones such as (?=a+b). I think they're considered so basic that they're not listed in the feature availability table. Lookbehinds are the only problem. Certes (talk) 16:51, 4 November 2019 (UTC)
- I agree. Lookahead worked in <s>JWB</s>WPCleaner last time I tried it. The '*' operator does NOT work in JS or <s>JWB</s>WPCleaner, although it was allowed in years past, but it's a form of exploit as it can hang a computer by resource overload. Sun Creator(talk) 18:59, 4 November 2019 (UTC)
- In what context does '*' not work? I ran JWB on my sandbox, replacing a(?=n*) by b, and it duly changed each 'a' to 'b' regardless of how many 'n's followed it. Certes (talk) 07:31, 5 November 2019 (UTC)
- My mistake, I meant WPC not JWB. Sun Creator(talk) 11:36, 5 November 2019 (UTC)
- Same problem with india btw and recall others apple etc. Sun Creator(talk) 18:59, 4 November 2019 (UTC)
- Done ~ Tom.Reding (talk ⋅dgaf) 17:28, 6 November 2019 (UTC)
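For reference, a lookahead of the kind discussed in this thread might look like the sketch below. The pattern is hypothetical (the live "Harvard" rule is not quoted on this page) and is shown in Python, although the same lookahead syntax is accepted by JavaScript and .NET.

```python
import re

# Capitalise "harvard" unless it is immediately followed by ".edu" (a domain name).
HARVARD = re.compile(r"\bharvard\b(?!\.edu\b)")


def capitalise_harvard(text):
    """Hypothetical helper: apply the capitalisation fix outside domain names."""
    return HARVARD.sub("Harvard", text)
```

Because the check looks forwards, not backwards, this form of URL exception stays usable in JWB; only exceptions that need to inspect text before the match would require a lookbehind.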
instentence
"on instentence of her mother meets a few prospective grooms", the "(As/Re)sistant" ""-(st)ance"" rule changes it to 'instantence', but 'insistence' would be correct. I'm not sure anything can be done about this, just leaving it here in case it sparks some solution. Sun Creator(talk) 17:44, 5 November 2019 (UTC)
- Are you sure? regex101 says that doesn't match, and I'm not getting the change when I do a dummy typo run on this talk page. Certes (talk) 23:27, 7 November 2019 (UTC)
- Yes, test works in AWB at User:Sun_Creator/sandbox2. Sun Creator(talk) 23:37, 7 November 2019 (UTC)
- Ah, AWB is matching not "(As/Re)sistant" but the more general "-(st)ance" rule (which JWB ignores due to a lookbehind). Certes (talk) 00:13, 8 November 2019 (UTC)
- Right, my mistake, it is the "-(st)ance" rule. Sun Creator(talk)
Punctuation/apostrophe rule
Is this rule (\w+)[´ˈ׳᾿‘’′Ꞌꞌ`;]s\b(?<!'\w[´ˈ׳᾿‘’′Ꞌꞌ`;]s|&[#\w]{1,99};s) really necessary? My first thought is that it seems rather trivial. I would prefer typos to be typos, not styling. Sun Creator(talk) 06:36, 28 October 2019 (UTC)
- @Sun Creator: see WT:AWB/T#Move "'s" rule to WP:GENFIXES?. WP:AWB/T was the fastest & most convenient way to fix the very large # of pages using random apostrophes (I recall 1000s of pages like this, but it is much more under control now). ~ Tom.Reding (talk ⋅dgaf) 12:32, 28 October 2019 (UTC)
- Seems to me that this is approaching WP:COSMETICBOT. And it is therefore in the interest of WP:AWB users to revert the rule, the alternative is lots of manually time checking and skipping, or the misuse of this rule. I also note this issue has been raised at Wikipedia_talk:AutoWikiBrowser#Skip_changing_apostrophes. Sun Creator(talk) 14:28, 28 October 2019 (UTC)
- @Sun Creator: you shouldn't let your dislike of the rule (for any reason) bias and/or cloud your judgement.
- "[A]pproaching WP:COSMETICBOT": is a slippery slope argument; if it violated said guideline, it would have indeed been removed.
- "[M]anually time checking": all typos require a manual check. The instances of this rule firing on a page is probably still on-par with other common typo rules, and the best way to keep it there/reduce it is for WP:AWB, WP:JWB, WP:WPCleaner, etc. users to find and correct them periodically. There could be pages where this rule dominates (it's been several months since I last checked), but those pages are in the very small minority. Were it not for considerable effort by several editors after this rule's creation, which brought the hit-rate down immensely, there could be a basis for this argument. IIRC, there were a few complaints initially, but most of the high firing rate pages (see my 1st response) were addressed quickly. If that is again the case now, then it won't last for long.
- "[S]kipping": why would you skip?
- "[M]isuse of this rule": what misuse? There have been, to my knowledge, essentially no false positives (which would otherwise be opportunities to constrain the rule, rather than eliminate it).
- To quantify your 'time' concern, and as a checkup since my last one 6~8 months ago, I'm scanning the latest (Oct 20) database dump for a list of candidates for this rule, then sorting pages by the # occurrences. This will give an upper limit to the rule's distribution; as I can't replicate perfectly all of AWB's many typo constraints (if someone else has, I would love to have it), I invariably pick up a few more than would actually fire.
- To further address your 'time' concern, I changed the rule so that its edit summary is much shorter & consistent, i.e. ’s → 's, regardless of the affected word. Previously, the entire word was included to make finding potential false positives easier, but that has not been necessary for some time (possibly never, but it's better to have erred on the side of caution). As a bonus side-effect, the rule is now much faster.
- I'm agnostic to whether the rule should be in both WP:GenFixes & WP:AWB/T, but it should definitely be at least one, and WP:AWB/T is the easiest entry point for non-maintainers. You are by all means welcome to try to hasten the rule's addition to WP:GenFixes, and then argue for its WP:AWB/T removal. That would be a better use of your time. ~ Tom.Reding (talk ⋅dgaf) 23:21, 28 October 2019 (UTC)
- AWB is semi-automatic, so a typo rule itself can't violate WP:COSMETICBOT. However, using AWB/T without some prior selecting of articles, is likely to result in being presented with a large percentage of articles with only cosmetic changes. If each of the cosmetic changes are saved, then in my view it would amount to a violation of WP:COSMETICBOT. Sun Creator(talk) 00:14, 29 October 2019 (UTC)
For future reference, the current distribution is thus:
# "'s" | # pages | % of total
1 | 78,572 | 70%
2 | 20,546 | 18%
3 | 7,303 | 6.5%
4 | 3,077 | 2.7%
5 | 1,535 | 1.4%
6 | 766 | 0.68%
7 | 421 | 0.37%
8 | 314 | 0.28%
9 | 14 | 0.01%
10+ | 38 | 0.03%
The % values here (3rd column) are more relevant/meaningful than the # count (2nd column), since the # count is only a ceiling, due to imperfect scanning. The % values show that the "'s" rule is very heavily weighted in the 1-4-per-page range, which together comprise ~97% of all affected pages, which I think is perfectly acceptable. ~ Tom.Reding (talk ⋅dgaf) 17:02, 30 October 2019 (UTC)
- After adding a few exceptions to my personal scanning regex, I update the distribution, which is now much steeper. The change is mostly due to that, and partly due to my running typo fixes on the 9 & 10+ pages lists. ~ Tom.Reding (talk ⋅dgaf) 03:38, 1 November 2019 (UTC)
- I've done maybe 5K of these edits now and no complaints so far, so that's good. Maybe people are more forgiving than in the past or fewer people watching pages, or perhaps a combination of both. Either way, leave this rule running. Sun Creator(talk) 00:47, 8 November 2019 (UTC)
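The percentages in the distribution posted in this thread can be re-derived from the page counts. The short Python check below uses only the figures from that table; the variable names are illustrative.

```python
# Pages grouped by how many times the "'s" rule would fire on them (from the table).
counts = {
    "1": 78572, "2": 20546, "3": 7303, "4": 3077, "5": 1535,
    "6": 766, "7": 421, "8": 314, "9": 14, "10+": 38,
}

total = sum(counts.values())

# Share of all affected pages in each bucket, in percent.
shares = {bucket: 100 * pages / total for bucket, pages in counts.items()}

# Combined share of pages with one to four hits.
share_1_to_4 = sum(shares[b] for b in ("1", "2", "3", "4"))
```

The combined share for the 1-4-per-page range comes out at a little over 97%, in line with the "~97%" figure quoted by Tom.Reding.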
The 76 slowest typos
I tried running a database scan for all AWB typos, and after an hour it said ETC: 31500 minutes, or over 3 weeks! As a sanity check, it only reached 0.19% done; 1/0.0019 = 526 h = just over 3 weeks... Because of this, and partly for fun, I decided to see what the slowest rules were, and if/how they could be improved. I ran each typo rule 110x (~2 CPU sec/rule on average) on the WP:AWB/T page (this page was the easiest option to quickly code), and determined each rule's run time as a multiple of the fastest rules' run time (as a means of normalization). The results range from 1~355x, with an average and median of ~37x, and a stdev of ~31x. This is a list of the top 2% of rules, which all run > 130x slower than the fastest rules, and 3~10 stdev slower than the mean, so they are the worst of the worst.
- Improved
<Typo word="-ment" find="\b([A-Za-z]*(?:[aA](?:gree|r(?:ma|range))|[dD]ocu|[pP]ay)|[aA](?:mend|rgu)|[eE](?:nviron|xperi)|[iI]mprove|[sS](?:eg|tate))m(?:an|e(?:mt|tn)|n(?:et)?)(a[lr][a-z]*|ed|s?)\b(?<!Segman)" replace="$1ment$2"/><!--avoid surname Segman-->
- Improved
<Typo word="-ally (1)" find="\b((?:[A-Z][a-z]*|[a-z]+)(?:[cd]i|er|gi|i(?:[cn]|on)|li|n[it]|ot|son|[tv]i))aly\b(?<!Finaly|qualy)" replace="$1ally"/><!--avoid B(r)ialy, Castaly, Finaly, qualy--><!--see also "-ically", "-ually"-->
- Improved
<Typo word="-ference" find="\b((?:[A-Z][a-z]*|[a-z]+)(?:con|trans)|[cC](?:ircum|on)|[dD](?:e|if)|[iI]n(?:dif|ter)?|[pP][dr]e|[rR]e|[tT]rans)f(?:er(?:an|e(?:m|r[ae]n)|ne?|r[ae]n)|fer(?:e(?:m|r[ae]n)|r[ae]n)|r[ae]n)(c(?:e[drs]?|ing)|t(?:ial(?:ly|s?)|ly|s?))\b(?<!Defrance)" replace="$1feren$2"/>
- Improved
<Typo word="-XXX(ed/er/ing/ive)" find="\b([A-Z][a-z]*[aeiou]|[a-z]+[aeiou])([bdfgklmnprstvz])\2{2,}(e(?:d|rs?)|i(?:ngs?|ons?|ves?)|ors?)\b" replace="$1$2$2$3"/>
- Improved
<Typo word="-ally (2)" find="\b((?:[A-Z][a-z-]*|[a-z-]+)(?:[enu]|ic?))alyl?\b(?<!(?:Ann?|B(?:allyhe|i|on|ri)|br?i|C(?:onne|re)|D(?:e|o[nu])|F(?:e|in)|G(?:lene|re)|He|K(?:an|e(?:nn?e)?|i(?:lte|nn?s?e))|M(?:cNealy|e)|me|N(?:an|e)|Que?|S(?:e|[hm]e|pezi)|Vit|Whe)aly|[lL]inalyl|[sS]ialyl)" replace="$1ally"/><!--avoid many proper names-->
- Improved
<Typo word="-ish" find="\b([A-Za-z]+?)i?sih(e(?:[ds]|rs?)|ing(?:ly)?|ly)?\b(?<!asih|A(?:isih|riningsih|sih)|Bersih|esih|Finarsih|ingsih|K(?:asih|osasih)|[rs]sih|M(?:a(?:drasih|ss?ih)|essih|irajoucsih)|N(?:esih|ingsih|urnaningsih)|Su(?:kaesih|mbangsih)|T(?:laksih|sih)|Y(?:ingtsih|ulianingsih))" replace="$1ish$2"/><!--avoid proper names with -asih -esih -rsih -ssih, e.g., Bersih, Finarsih, Kasih, Kosasih, Madrasih, Masih, Massih, Messih, Nesih, Sukaesih, Nurnaningsih, Ningsih, Ariningsih, Yulianingsih, Asih, Tsih, Aisih, Tlaksih, Mirajoucsih, Sumbangsih, Yingtsih-->
- Improved
<Typo word="-fering" find="\b([A-Z][a-z]*|[a-z]+)fereing(s)?\b" replace="$1fering$2"/>
- Improved
<Typo word="-ology" find="\b([A-Z][a-z]*|[a-z]+)ol(?:[ai]?|ol)g(y(?<![vV]olgy\b)|i(?:c[a-z]*|es|sts?))\b" replace="$1olog$2"/>
- Improved
<Typo word="-ing" find="\b([bB]ak|[cC](?:a[kr]|ontinu)|[dD](?:a(?:nc|r)|i(?:v|s(?:bak|c(?:a[kr]|ontinu)|d(?:a(?:nc|r)|iv|riv)|f(?:ak|eatur|orc)|giv|hav|l(?:anc|iv)|mak|notic|ra[kv]|s(?:av|h(?:a[rtv]|in)|ka[rtv])|tak|us|w(?:a[kv]|hin)))|riv)|[eE]n(?:bak|c(?:a[kr]|ontinu)|d(?:a(?:nc|r)|iv|riv)|f(?:ak|eatur|orc)|giv|hav|l(?:anc|iv)|mak|notic|ra[kv]|s(?:av|h(?:a[rtv]|in)|ka[rtv])|tak|us|w(?:a[kv]|hin))|[fF](?:ak|eatur|orc)|[gG]iv|[hH]av|[lL](?:anc|iv)|[mM](?:ak|is(?:bak|c(?:a[kr]|ontinu)|d(?:a(?:nc|r)|iv|riv)|f(?:ak|eatur|orc)|giv|hav|l(?:anc|iv)|mak|notic|ra[kv]|s(?:av|h(?:a[rtv]|in)|ka[rtv])|tak|us|w(?:a[kv]|hin)))|[nN]otic|[rR]a[kv]|[sS](?:av|h(?:a[rtv]|in)|ka[rtv])|[tT]ak|[uU]s|[wW](?:a[kv]|hin))eing(s)?\b" replace="$1ing$2"/>
- Improved
<Typo word="-ining" find="\b([A-Z][a-z]*|[a-z]+)inig(ly|s?)\b(?<!\b(?:Bre|He|K(?:le|urt)|Lap|Me|Nar(?:ir)?|Re|Stee|[tT]|We)inig\b)" replace="$1ining$2"/><!--avoid (Br/Kl/M/H/R/St/W)einig, (Nar/Narir/Kurt/Lap/T)inig. 'ing' typos can be false positive i.e 'paintinig'-->
- Improved
<Typo word="-ation" find="\b([A-Z][a-z]*|[a-z]+)ati?oin(al(?:ly)?|ed|ing|s?)\b" replace="$1ation$2"/>
- Improved
<Typo word="-ceive" find="\b([AIMRU]?[aeimnprsu]*[pP]er|[dD]e|[IMPRU]?[aeilmnprsu]*[cC]on|[rR]e|[tT]rans)c(?:e?|eie|ie?)v(ables?|e(?:[ds]?|r(?:s(?:hip)?)?)|ing)\b" replace="$1ceiv$2"/>
- Improved
<Typo word="-nally" find="\b([A-Z][a-z]*[a-mo-z]|[a-z]+[a-mo-z])(?:anlly|nalyl)\b" replace="$1nally"/><!--avoid incorrect to incorrect change on -nanlly-->
- Improved
<Typo word="-acious" find="\b([A-Z][a-z]*|[a-z]+)acitous(?<!anthracitous)(ly|ness(?:es)?)?\b" replace="$1acious$2"/>
- Improved
<Typo word="-bility" find="\b([A-Z][a-z]*|[a-z]+)b(?:il|li)(?:li?)?t(ies|y)\b" replace="$1bilit$2"/>
- Improved
<Typo word="-vement" find="\b([A-Z][a-z]*|[a-z]+)vment(al|ed|ing|s?)\b" replace="$1vement$2"/>
- Improved
<Typo word="-acity" find="\b([A-Z][a-z]*|[a-z]+)act?iy\b" replace="$1acity"/>
- Improved
<Typo word="-tional(ly)" find="\b([A-Z][a-z]*|[a-z]+)tion(?:a(ly)|nal(ly)?)\b" replace="$1tional$2$3"/>
- Improved
<Typo word="-(a/e/i/o/u)(c/n/o/r/s)king" find="\b([A-Z][a-z]*[aeiou][cnors]|[a-z]+[aeiou][cnors])kign\b" replace="$1king"/>
- Improved
<Typo word="-itely" find="\b([A-Z][a-z]*[lnst]|[a-z]+[lnst])(?<![qQ]ual)itly\b" replace="$1itely"/>
- Improved
<Typo word="-ictive" find="\b([A-Z][a-z]*|[a-z]+)icitve(ly|s?)\b" replace="$1ictive$2"/>
- Improved
<Typo word="-wed/-wing" find="\b([A-Z][a-z]*|[a-z]+)ww(ed|ing|s)\b" replace="$1w$2"/>
- Improved
<Typo word="-ately_" find="\b([A-Z][a-z]*[bcdgimstv]|[a-z]+[bcdgimstv])atly\b" replace="$1ately"/>
- Improved
<Typo word="-(c/l/t)ious" find="\b([A-Z][a-z]*[clt]|[a-z]+[clt])ioous([a-z]*)\b" replace="$1ious$2"/>
- Improved
<Typo word="-tion(s)" find="\b([A-Z][a-z]*|[a-z]+)tio(?:i|(s))n\b" replace="$1tion$2"/>
- Improved
<Typo word="-eaning" find="\b([A-Z][a-z]*|[a-z]+)ean(?:in|ni)ng\b" replace="$1eaning"/>
- Improved
<Typo word="-solutely" find="\b([A-Z][a-z]*|[a-z]+)solutly\b" replace="$1solutely"/>
- Improved
<Typo word="-ively" find="\b([A-Z][a-z]*|[a-z]+)ivly\b" replace="$1ively"/>
- Improved
<Typo word="-ceiving" find="\b([AIMRU]?[aeimnprsu]*[pP]er|[dD]e|[IMPRU]?[aeilmnprsu]*[cC]on|[rR]e|[tT]rans)c(?:ei|ie)ve(ables?|ing)" replace="$1ceiv$2"/>
- Improved
<Typo word="(-)Coming" find="\b([A-Z][a-z]*c|[a-z]+c|[cC])om[em]ing(s)?\b(?<!Commings)" replace="$1oming$2"/><!--avoid surname Commings-->
- Improved
<Typo word="-(g/p)ressive" find="\b([A-Z][a-z]*[gp]res|[a-z]+[gp]res)i(ons?|ve[a-z]*)\b" replace="$1si$2"/>
- Not done (2.5~3.5x gain only)
<Typo word="-ification" find="\b([dD](?:e|is)|[mM]is|[rR]e)?([cC](?:ert|lass)|[eE]lectr|[fF]ort|[iI]dent|[mM](?:agn|od)|[nN]ot|[pP](?:erson|ur)|[qQ]ual|[sS]pec|[uU]n|[vV]er)(?:fici?ati?|if(?:cati?|ic(?:at|iati?)))on(s)?\b" replace="$1$2ification$3"/>
- Improved
<Typo word="-tally" find="\b([A-Z][a-z]*[b-eghj-z]|[a-z]+[b-eghj-z])talyl?\b" replace="$1tally"/><!--avoid names Naftaly, Nataly-->
- Improved
<Typo word="-sequence" find="\b([A-Z][a-z]*s|[a-z]+s|[sS])equesece([ds])?\b" replace="$1equence$2"/>
- Improved
<Typo word="Its (after)" find="\b([aA](?:bove|[lm]ong(?:st)?|r(?:e|ound)|t)|[bB](?:e(?:low|tween|yond)?|oth|y)|[cC]elebrat(?:e[ds]?|ing)|[dD]uring|[fF]rom|[hH][eo]ld|[iI]n(?:to)?|[kK]eep|[mM]ade|[oO](?:f|n(?:to)?|ver)|[tT](?:hrough(?:out)?|o)|[uU](?:nder(?:neath)?|p(?:on)?)|[wW]ith(?:in|out)?)\s+it[´ˈ׳᾿‘’′Ꞌꞌ`;']s\b" replace="$1 its"/>
- Improved
<Typo word="(Ad/E/Inter/O/…)Mission" find="\b([aA]d|[cC]om|[dD]e(?:ad|com|sub|trans)|[eE]|[iI]nter|[oO]|[pP]er|[rR]e(?:ad|com|sub|trans)?|[sS]ub|[tT]rans)?mis[is](bl[ey]|on(?:ar(?:ies|y)|s?)|ve(?:ly)?)\b" replace="$1missi$2"/>
- Improved
<Typo word="-Graph-" find="\b([A-Z][a-z]*g|[a-z]+g|[gG])rpah([a-z]*)\b" replace="$1raph$2"/>
- Not done (2.5~4.5x gain only)
<Typo word="-ely" find="\b([aA]ctiv|[cC]los|[dD]ens|[eE]ntir|[fF](?:als|ierc)|[iI](?:mmens|n(?:activ|clos|dens|entir|f(?:als|ierc)|immens|l(?:a(?:rg|t)|i[kv]|o(?:n|os))|nam|precis|s(?:ever|incer|pars)|wid))|L(?:a(?:rg|t)|i[kv]|on)|l(?:a(?:rg|t)|i[kv]|o(?:n|os))|[nN]am|[pP]recis|[sS](?:ever|incer|pars)|[uU]n(?:activ|clos|dens|entir|f(?:als|ierc)|immens|l(?:a(?:rg|t)|i[kv]|o(?:n|os))|nam|precis|s(?:ever|incer|pars)|wid)|[wW]id)l+e?y\b(?<!Densley)" replace="$1ely"/>
- Improved
<Typo word="-tifie(d/s)" find="\b([bB]eau?|[cC]er|[fF]or|[iI]den|[jJ]us|[mM]or|[nN]o|[qQ]uan|[rR](?:a|e(?:beau?|c(?:er)?|for|iden|jus|mor|no|quan|r(?:a|ec)|tes))|[tT]es|[uU]n(?:beau?|cer|for|iden|jus|mor|no|quan|r(?:a|ec)|tes))tife([ds])\b" replace="$1tifie$2"/><!--see also "-tified"-->
- Improved
<Typo word="-geni(s/z)e" find="\b([A-Z][a-z]*gen|[a-z]+gen)ei([sz][a-z]+)\b" replace="$1i$2"/>
- Not done (1.8~2.2x gain only)
<Typo word="-rance" find="\b([aA](?:ppea|ssu)|[cC]lea|[dD]elive|[eE]n(?:du|t)|[fF][lr]ag|[hH]ind|[iI](?:gno|nsu)|[pP]erseve|[rR]ememb|[sS]eve|[tT](?:empe|ole))e?rea?n([ct][a-gi-z][a-z]*)\b(?<![iI]nsurency\b)" replace="$1ran$2"/><!--avoid Insurgency-->
- Improved
<Typo word="-soning" find="\b([A-Z][a-z]*son|[a-z]+son)inig\b" replace="$1ing"/>
- Improved
<Typo word="-ilities" find="\b([A-Z][a-z]*il|[a-z]+il)l+ities\b" replace="$1ities"/>
- Improved
<Typo word="-duction" find="\b([aA](?:[bd]|utopro)|[cC]o(?:n|pro)|[dD]e(?:xtro)?|[hH]yperpro|[iI]n(?:tro)?|[kK]inopro|[nN]onpre|[oO]verpre|[pP](?:ostpre|r[eo])|[rR]e(?:d?|intro|[pt]ro)|[sS](?:e|u(?:perpro|rpro))|[uU]nderpro|[yY]pro)du(?:c[it]|ti)on(s)?\b" replace="$1duction$2"/>
- Improved
<Typo word="-fully" find="\b([A-Z][a-z]*ful|[a-z]+ful)y\b" replace="$1ly"/>
- Not done (1.02~1.2x gain only)
<Typo word="-able (1)" find="\b([aA](?:ccept|rgu)|[cC](?:ap|onfigur)|[fF]orgiv|[hH]ospit|[iI]n(?:[aA](?:ccept|rgu)|[cC](?:ap|onfigur)|[fF]orgiv|[hH]ospit|[mM]istak|[nN]ot|[oO]ppos|[sS]cal|[tT]ranslat|[uU]s|[vV](?:alu|ulner))|[mM]istak|[nN]ot|[oO]ppos|[sS]cal|[tT]ranslat|[uU](?:s|n(?:[aA](?:ccept|rgu)|[cC](?:ap|onfigur)|[fF]orgiv|[hH]ospit|[mM]istak|[nN]ot|[oO]ppos|[sS]cal|[tT]ranslat|[uU]s|[vV](?:alu|ulner)))|[vV](?:alu|ulner))(?:[eiu]a?)b(ilit(?:ies|y)|l[ey])\b" replace="$1ab$2"/>
- Improved
<Typo word="-aking" find="\b([bB](?:re)?|[cC]re|[fF](?:re)?|[lL]e|[mM](?:is(?:b(?:re)?|cre|f(?:re)?|le|m|pe|[rt]|s(?:cre|[hlo]|ne?|pe|t(?:re)?)|w(?:re)?))?|[pP]e|[rR](?:e(?:b(?:re)?|cre|f(?:re)?|le|m|pe|[rt]|s(?:cre|[hlo]|ne?|pe|t(?:re)?)|w(?:re)?))?|[tT]|[sS](?:cre|[hlo]|ne?|pe|t(?:re)?)|[wW](?:re)?)kaing(s)?\b" replace="$1aking$2"/>
<Typo word="Duplicated words" find="\b(a(?:[ms]?|nd?|re)|b(?:e(?:come)?|y)|could|d(?:id|o)|for|go|h(?:a(?:s|ve)|e|im|ow)|i[fst]s?|m(?:ade|e|ore)|no|o(?:[fr]|ther)|sh(?:e|ould)|t(?:h(?:e(?:ir|[mny]?|se)|[iu]s)|o)|w(?:as|ere|h(?:at|e(?:n|re)|i(?:ch|le)|om?|y)ith|ould))\s+\1\b" replace="$1"/><!--avoid "in", per talk in Archive 3-->
- Improved
<Typo word="More/Less/etc. than_" find="\b([bB](?:etter|igger|raver)|[gG]reater|[hH]igher|[mM]ore|[lL](?:arger|ess(?:er)?|o(?:nger|wer))|[oO]lder|[rR]ather|[sS](?:horter|ma(?:ller|rter))|[tT](?:aller|hi(?:cker|nner))|[wW]orse|[yY]ounger)\s+then\s+(?!than\b)" replace="$1 than "/><!--avoid ends of sentences, e.g., "Life was better then."; too many false positives for "other then"-->
- Maybe (5~11x gain)
<Typo word="-(t)an(ce/t)" find="\b([aA](?:c(?:cep|qu(?:ain|it))|dmit)|[bB]la|[cC]omba|[eE]xpec|[hH](?:abi|e[rs]i)|[iI](?:mp[ao]r|nh(?:abi|e[rs]i))|[mM]ili|[nN]oncomba|[pP]it|[rR]e(?:luc|mit|pen))t[ei]n((?:c[eiy]|t(?<!\b[rR]emittent))[a-z]*)\b" replace="$1tan$2"/><!--allow remittent-->
- Improved
<Typo word="-iness" find="\b([cC]raz|[dgDG]ust|[fF]u(?:nn|st)|[hH](?:a(?:st|z)|ill)|[lL](?:az|o(?:nel|rdl|vel|wl)|ust)|[mM]ust|[nN]ast|[rR]ust|[sS](?:ill|unn)|[tT](?:ast|rustworth)|[uU]ntrustworth|[wW]orth)yness\b" replace="$1iness"/>
- Improved
<Typo word="-field" find="\b([aA](?:ir)?|[bB](?:a(?:ck|ttle)|[lr]oo[km])|[cC](?:an|hester|o(?:al|rn))|[dD]own|[gG]a[rs]|[hH]ome|[iI]n|[mM](?:a(?:ke|ns|se)|i(?:d|ne))|[oO](?:il|ut)|[sS](?:cho|hef|now|pring)|[uU]p)?feild([a-z]*)\b" replace="$1field$2"/><!--avoid surname Feild-->
- Improved
<Typo word="known as" find="\b(a(?:lso|re|s)|Also|b(?:e(?:came|en|st|tter)|ut)|Be(?:st|tter)|[cC]ommonly|[fF]requently|[gG]enerally|is|[mM]ostly|[nN]ormally|Often|o(?:ften|r)|perhaps|[uU]sually|W(?:ell|idely)|w(?:as|e(?:ll|re)|idely))\s+know(?:ed|s?)\s+(as|for)\b" replace="$1 known $2"/>
- Not done (1.5~1.9x gain only)
<Typo word="-an(ce/t)" find="\b([aA](?:bund|dam|ttend)|(?:[dD]is|[rR]e)?[aA]ppear|[aA]sson|[cC]o(?:gni[sz]|nson)|[dD](?:efend|isson)|[iI]gnor|[mM]erch|[oO]xid|[rR]ecogni[sz]|[sS]erv|[vV]ac)(?:and|en)(c(?:es?|ies?|y)|t(?:ly|s?))\b" replace="$1an$2"/>
- Not done (1.2~1.3x gain only)
<Typo word="A n-something" find="\b([\d]+[\d,\.]*|e(?:ight(?:een|y?)|leven)|f(?:i(?:ft(?:een|y)|ve)|o(?:rty|ur(?:teen)?))|hundred|nine(?:t(?:een|y))?|one|s(?:even|ix)(?:t(?:een|y))?|t(?:en|h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)))\b(?<=\b(?:[aA](?:dditional|n?)|first|[hH](?:er|is)|[iI]ts|second|th(?:eir|ird)|Their)\s+[\da-z]+)(?: |\s+)(?!member\s+[a-z]+s\b)(acre|bed|cylinder|d(?:ay|ecker|oor)|foot|g(?:a(?:llon|me)|oal)|h(?:o(?:le|rsepower|ur)|uman)|inch|lit(?:er|re)|m(?:an|e(?:mber|t(?:er|re))|i(?:le|nute)|onth)|ounce|p(?:a(?:ge|ssenger)|erson|o(?:int|und))|r(?:o(?:om|und)|unner)|s(?:e(?:a(?:son|t(?:er)?)|cond)|ong|t(?:age|ore?y))|ton|vote|w(?:eek|heel(?:e[dr])?|oman)|y(?:ard|ear))(?=[,\s]|-(?:deep|high|long|old|tall|wide)\b)(?!\s+(?:a[st]|by|deep|for|high|i[ns]|long|o(?:f|ld)|t(?:all|here)|w(?:as|i(?:de|th)))\b)(?<!\b\d{4}\s+(?:game|s(?:e(?:ason|cond)|ong|t(?:age|ory))|vote))(?<![dD]uring\s+h(?:er|is)\s+one\s+season|told\s+h(?:er|im)\s+one\s+day|send\s+for\s+h(?:er|im)\s+one\s+day)" replace="$1-$2"/><!--Note: If the n-something potentially has a year as the 'n', be sure to add the 'something' to the "(?<!\b\d{4}\s+" false-positive alternation list.-->
- Improved
<Typo word="-(s)ible" find="\b([aA]dmis|[dD](?:efen|ivi)|[fF]ea|[iI][mnr](?:admis|d(?:efen|ivi)|fea|mer|osten|p(?:lau|os)|rever|[st]en|vi)|mer|[oO]sten|[pP](?:lau|os)|[rR]ever|[stST]en|[vV]i)sab(ility|l[ey])\b" replace="$1sib$2"/>
- Improved
<Typo word="-anging" find="\b([aA]rr|[pP]?[rR]earr|(?:[eE]x|[iI]nter|[sS]hort|[uU]n)?[cC]h|[dD]er|[rR])an(?:egi|gei)?ng\b" replace="$1anging"/>
- Not done (0.99~1.3x loss/gain only)
<Typo word="n-year" find="\b([\d]+[\d,\.]*|e(?:ight(?:een|y?)|leven)|f(?:i(?:ft(?:een|y)|ve)|o(?:rty|ur(?:teen)?))|hundred|nine(?:t(?:een|y))?|one|s(?:even|ix)(?:t(?:een|y))?|t(?:en|h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)))(?: |\s+)(month|year)\b(?<= [\da-z]+(?: [a-z]+|\s+[a-z]+))(?=\s+(?:a(?:bsence|ffair|greement|ss(?:ignment|ociation))|b(?:a(?:n|ttle)|reak)|c(?:a(?:mpaign|reer)|ease-?fire|losure|o(?:m(?:a|petition)|ntract|urse)|ruise|ycle)|d(?:e(?:a(?:dline|l)|lay|ployment)|rought|uration)|e(?:ffort|n(?:gagement|listment)|x(?:hibit(?:ion)?|i(?:le|stence)|pedition|tension))|feasibility|g(?:ap|estation|uest)|h(?:i(?:atus|story)|ospital)|i(?:llness|n(?:cumbent|jury|ternship|vestigation))|j(?:ail|ourney)|l(?:ay-?off|ea[sv]e|ife-?span|o(?:an|ckout))|m(?:aintenance|i(?:litary|ssion)|o(?:dernization|ratorium))|notice|overhaul|p(?:artnership|eriod|lan|osting|r(?:ison|o(?:cess|fessional|gram(?:me)?|ject)))|r(?:e(?:c(?:overy|urring)|fit|gular|ign|lationship|s(?:earch|idency|tricted))|otation|un)|s(?:abbatical|cho(?:larship|ol)|e(?:ason|ntence)|iege|ojourn|p(?:an|e(?:aking|ll))|t(?:a(?:rter|y)|int|r(?:ike|uggle)|udy)|u(?:bs(?:cription|idy)|pen(?:ded|sion)))|t(?:e(?:nure|rm)|our|r(?:aining|eatment|i(?:al|p)|uce))|v(?:eteran|isit|oyage)|w(?:a(?:it(?:ing)?|r)|orkshop))\b)" replace="$1-$2"/>
- Not done (2.2~3.2x gain only)
<Typo word="-struct" find="\b((?:[dD]e|[mM]is|[rR]e)?[cC]on|(?:[iI]n|[nN]on)?[drDR]e|[iI]n(?:fra)?|[mM][ai]cro|[oO]b|[sS]u(?:b|per))(?:s(?:ruct|t(?:ruc|truct|uct))|truct)(ed|i(?:ng|on(?:is[mt]s?|s?)|vis[mt]s?)|ive(?:ly)?|ors?|s?|ur(?:al(?:ly)?|es?))\b" replace="$1struct$2"/><!--Error 'Instruction(s) => Instructtions' but maybe a hidden control character-->
- Not done (2.3~2.6x gain only)
<Typo word="-ually" find="\b([aA]sex|[cC]as|[eE](?:q|vent)|[fF]act|[gG]rad|[mM](?:an|ut)|[sS]ex|[tT]act|[uU](?:nus|s)|[vV]is)(?:al?|u[al]?)ly\b" replace="$1ually"/><!--avoid Annaly-->
- Not done (0.92~1.2x loss/gain only)
<Typo word="n-year-old" find="\b(\d+|e(?:ight(?:een|y?)|leven)|f(?:i(?:ft(?:een|y)|ve)|o(?:rty|ur(?:teen)?))|hundred|nine(?:t(?:een|y))?|one|s(?:even|ix)(?:t(?:een|y))?|t(?:en|h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)))(?:\s+year(?:\s+|-)|-year\s+)[oO]ld\s+(b(?:oys?|r(?:idge|others?)|uilding)|c(?:h(?:ild(?:ren)?|urch)|o(?:llege|mpany))|d(?:aughter|esign)|f(?:a(?:cility|rmhouse|ther)|emales?)|g(?:irls?|rand(?:daughter|father|mother|son))|h(?:igh\s+school|ouse)|institution|la(?:ndmark|w)|m(?:a(?:les?|n(?:sion)?)|en|iddle\s+school|other)|patient|record|s(?:chool|isters?|on|t(?:ructures?|udents?))|t(?:heat(?:ers?|res?)|r(?:adition|ees?))|wo(?:m[ae]n|rld\s+record))\b" replace="$1-year-old $2"/>
- Improved
<Typo word="-mentary" find="\b([aA]li|[cC]om(?:pl[ei])?|[dD]ocu|[eE]le|[fF]rag|[mM]o|[pP]arlia|[rR]udi|[sS](?:edi|upple))men(?:atr|t(?:a|er|r))(i(?:ans?|es|ly)|y)\b" replace="$1mentar$2"/>
- Improved
<Typo word="-en(ce/t)" find="\b([aA]ccid|[cC]li|[dD]isobedi|[eE]xcell|[iI]ngredi|[lL]eni|[oO]bedi|[sS]uperintend|[tT]ranscend|[vV]iol)an(c[ey]|t[a-z]*)\b(?<!Violant[aei])" replace="$1en$2"/><!--avoid the names Violant[aei]-->
- Improved
<Typo word="-vel" find="\b([blBL]e|[dD]ri|[gG](?:a|r[ao])|[hH]o|[mM]ar|[nN][ao]|[rR][ae]|[tT]r[ao]|[sS](?:h(?:o|ri)|[nw]i))vle(s)?\b" replace="$1vel$2"/>
- Not done (3.8~5.8x gain only)
<Typo word="-rious" find="\b([cC][au]|[dD]eli|[fF]u|[hH]ila|[iI](?:llust|n(?:dust|ju))|[lL](?:abou?|uxu)|[mM]yste|[nN]oto|[pP]reca|[sS]e|[vV](?:a|icto))r(?:i(?:o(?:iu|ui)|uo)|o(?:iu|ui?)|riou)s(ly|ness)?\b(?<!\b[sS]erous\b)" replace="$1rious$2"/>
- Maybe (5~13x gain)
<Typo word="-tified" find="\b([bB]eau?|[cC]er|[fF]or|[iI]den|[jJ]us|[mM]or|[nN]o|[qQ]uan|[rR](?:a|ec)|[tT]es)ta?fi(abl[ey]|cat(?:es?|ions?)|e[ds])\b" replace="$1tifi$2"/><!--see also "-tifie(d/s)"-->
- Improved
<Typo word="-ful" find="\b([bB]eauti|[cC](?:are|heer)|[dD]is(?:beauti|c(?:are|heer)|event|gra[ct]e|help|p(?:eace|ower)|s(?:poon|uccess)|use|wonder)|[eE]vent|[gG]ra[ct]e|[hH]elp|[pP](?:eace|ower)|[sS](?:poon|uccess)|[uU](?:n(?:beauti|c(?:are|heer)|event|gra[ct]e|help|p(?:eace|ower)|s(?:poon|uccess)|use|wonder)|se)|[wW]onder)full(ly|ness|s?)\b" replace="$1ful$2"/>
- Not done (1.1~1.4x gain only)
<Typo word="n-time champion/winner_" find="\b([\d]+[\d,\.]*|e(?:ight(?:een|y?)|leven)|f(?:i(?:ft(?:een|y)|ve)|o(?:rty|ur(?:teen)?))|hundred|nine(?:t(?:een|y))?|one|s(?:even|ix)(?:t(?:een|y))?|t(?:en|h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)))\b\s+time(?=\s+(?:champions?|defending\s+champions?|losers?|nominees?|winners?))" replace="$1-time"/>
- Not done (0.89~1.8x loss/gain only)
<Typo word="n-round something" find="\b(\d+|e(?:ight(?:een|y?)|leven)|f(?:i(?:ft(?:een|y)|ve)|o(?:rty|ur(?:teen)?))|hundred|nine(?:t(?:een|y))?|one|s(?:even|ix)(?:t(?:een|y))?|t(?:en|h(?:irt(?:een|y)|ousand|ree)|w(?:e(?:lve|nty)|o)))\b\s+round(?=\s+(?:d(?:ecisions?|raws?)|knockouts?|KOs?|match(?:es)?|newspaper\s+decisions?|technical\s+knockouts?|TKOs?))" replace="$1-round"/><!--"A n-something" won't catch all useful, esp. boxing-related things-->
- Not done (1.0~1.4x gain only)
<Typo word="n-something contract/deal/run/etc." find="\b((?<!,)\d{1,3}|e(?:ight(?:een|y?)|leven)|f(?:i(?:ft(?:een|y)|ve)|o(?:rty|ur(?:teen)?))|hundred|nine(?:t(?:een|y))?|one|s(?:even|ix)(?:t(?:een|y))?|t(?:en|hr(?:ee|irt(?:een|y))|w(?:e(?:lve|nty)|o)))(?: |\s+)(album|book|episode|fi(?:ght|lm)|game|movie|picture|record)(?=\s+(?:contract|deal|run|s(?:uspension|weep))\b)" replace="$1-$2"/><!--entertainment-related hyphen combos-->
- Improved
<Typo word="-mitted" find="\b([aA]d|[cC]om|[eoEO]|[pP]er|[rR]e(?:[aA]d|[cC]om|[sS]ub|[tT]rans)?|[sS]ub|[tT]rans)mit(ed(?:ly)?|ing)\b" replace="$1mitt$2"/>
- Not done (2~2.1x gain only)
<Typo word="-ical" find="\b([aA]tr?[oy]p|[cC](?:lin|rit)|[eE]lectr|[gG]eograph|[iI]dent|[lL]og|M(?:ag|etaphor)|m(?:ag|etaphor|us)|[pP](?:ho[nt]ograph|olit|ract)|[tT](?:e(?:chn|legraph)|op|r[oy]p|yp))(?:c?|ic)ial(ly|s?)\b" replace="$1ical$2"/><!--avoid Stan Musial-->
- Improved
<Typo word="(Ad/…)Version" find="\b([aA]dv|[cC]onv|[dD]iv|[iI]nv|[oO]bv|[pP]erv|[rR]ev|[sS]ubv|[vV])er(?:is|ti)on(s)?\b" replace="$1ersion$2"/>
- Improved
<Typo word="-ently" find="\b([aA]ppar|[cC]urr|[dD]ec|[eE]vid|[iI]nt|[pP]res|[rR]ec|[sS]il)enlty\b" replace="$1ently"/><!--see also "-equently"-->
- Improved
<Typo word="-ality" find="\b([dD]u|[eE]qu|[fnFN](?:at|orm)|[lL](?:eg|oc)|[qQ]u|[rR]eg?|[tT]o[nt]|[vV]it)all+it(ies|y)\b" replace="$1alit$2"/>
- Improved
<Typo word="-press" find="\b([cC]om|[dD]e(?:com|ex)?|[eE]x|[iI](?:m|n(?:com|ex)?)|[oO]p|[rR]e(?:com|ex)?|[sS]up)pres(e[ds]?|i(?:ng|on[a-z]*|ve(?:ly)?))?\b" replace="$1press$2"/>
The 76 slowest typos discussion
The problem, and sometimes a feature, of most of these rules is their open-ended beginnings (beginnings are expensive). These can be changed from a capture group to a lookbehind, which would speed things up immensely (I can quantify just how much in the near future), but the edit summary of some of those rules would/might be less helpful. By this I mean:
- this change to the "-ment" rule would benignly change its edit summary from, for example, disagreemetn → disagreement, to agreemetn → agreement
- this change to the "-ology" rule would unfavorably change its edit summary from, for example, biolagy → biology, to olagy → ology
So my view is, for these slow rules, if the rule can be vastly sped up while maintaining a meaningful, but slightly shorter, edit summary (i.e. #1 & not #2), then a leading lookbehind should be used instead of an expensive ([a-z])-esque leading capture group. ~ Tom.Reding (talk ⋅dgaf) 14:13, 30 October 2019 (UTC)
- Right, they look like they can be made faster without negatively impacting the edit summary. Notice how the top 5 use the "*" quantifier and the complete top 30 use either the "*" or "+" quantifiers. Sun Creator(talk) 01:47, 31 October 2019 (UTC)
- I've updated the slowest rule. Let me know if that turbo charged it. Sun Creator(talk) 02:07, 31 October 2019 (UTC)
- @Sun Creator: [A-Za-z]* is equivalent to [A-Za-z]{0,99} since there is 0 threat of any 100+ letter words. It might even be slower, b/c it has to check after each letter if it's still less than 99. I'll just make the described change above via a leading lookbehind as an example improvement. ~ Tom.Reding (talk ⋅dgaf) 02:36, 31 October 2019 (UTC)
- The leading lookbehind brought the "-ment" rule down from 1st place to 5th (corrected from 85th), or from 355x slower than the fastest rule down to ~255x (corrected from ~125x), or from 10 stdev from the mean down to ~2.87. ~ Tom.Reding (talk ⋅dgaf) 04:01, 31 October 2019 (UTC)
- "-ment" rule corrected to include the \b in the lookbehind, per below. ~ Tom.Reding (talk ⋅dgaf) 14:39, 31 October 2019 (UTC)
- Is it worth timing with the lookbehind removed completely? It may have no practical effect. The exact character set bounded by \b is implementation dependent but seems to be [a-zA-Z0-9_] here, so the only difference would be to start matching strings such as foo_agreemnet and bar123agreemnet (and to allow JWB to use the rule at all). Certes (talk) 15:05, 31 October 2019 (UTC)
- With the lookbehind completely removed: "-ment" is down to 51st, ~170x, and ~4.2 stdev. If the beginnings of any of these rules have no material effect on them (i.e. if removing a beginning won't trigger any FPs), then I'm in favor of the performance increase, even with the abbreviated edit summary. ~ Tom.Reding (talk ⋅dgaf) 19:04, 31 October 2019 (UTC)
- Naive question: this is a great improvement but can we remove the lookbehind completely? Surely all matches are preceded by 0 or more letters. Certes (talk) 10:36, 31 October 2019 (UTC)
- [A-Za-z] is different than \w is different than \p{L}, and I'd like to think that whoever made the rule, and the many people who've reviewed it, chose the beginning carefully, so it wouldn't be a good idea to outright remove it without doing the requisite research. Or, it could have been a lazy implementation of \b([A-Z][a-z]*|[a-z]+), which appears ~17 times in the 76 rules. That allows the subtle exclusion of "stRANGEly" capitalized words/acronyms/portmanteaus and words with diacritics, so should only be completely removed if there are no such exceptions to the core of the rule.
- I'm only interested in large efficiency improvements here, so I won't be changing the rules qualitatively. ~ Tom.Reding (talk ⋅dgaf) 12:40, 31 October 2019 (UTC)
- OK, I probably misunderstood something. It looks to me as if it's checking that the phrase is preceded by 0 or more letters, which is blatantly true even if I don't know what "letter" means. I can't test this at sites like regex101.com as they don't support variable width lookbehinds. Also the \b is going to rule out some cases: foobar matches /\b[A-Za-z]*bar/ (\b occurs before foo) but not /\b(?<=[A-Za-z]*)bar/ (\b does not occur after foo). Certes (talk) 13:14, 31 October 2019 (UTC)
- Correct. And a subtle consequence of /\b[A-Za-z]*bar/ is that it avoids föbar. ~ Tom.Reding (talk ⋅dgaf) 13:25, 31 October 2019 (UTC)
- Ah, I think I see your point now - the \b should be included in the lookbehind, yes. ~ Tom.Reding (talk ⋅dgaf) 13:32, 31 October 2019 (UTC)
- Yes. Also the lookbehind is almost completely redundant in this case. Apart from subtle quibbles about [A-Za-z] not being the exact set of characters which triggers the \b boundary test, every string that begins with a letter will automatically be preceded by \b[A-Za-z]* Certes (talk) 13:38, 31 October 2019 (UTC)
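The boundary behaviour discussed in this sub-thread can be checked with a small sketch. It uses Python's `re` module, which is an assumption of convenience: AWB runs .NET regular expressions, and Python rejects variable-width lookbehinds. Because `(?<=[A-Za-z]*)` can match the empty string, it always succeeds, so `\b(?<=[A-Za-z]*)bar` is written here as plain `\bbar`. The helper name `matched` is hypothetical.

```python
import re

# Leading letters are consumed by the pattern, so \b can sit at the start of the word.
with_prefix = re.compile(r"\b[A-Za-z]*bar")

# A zero-width lookbehind consumes nothing, so \b must sit immediately before "bar".
# Simplified stand-in for \b(?<=[A-Za-z]*)bar, which Python's re cannot compile.
boundary_only = re.compile(r"\bbar")


def matched(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(matched(with_prefix, "foobar"))     # whole word is matched
print(matched(boundary_only, "foobar"))   # no boundary directly before "bar"
print(matched(with_prefix, "föbar"))      # "ö" is a word character but not in [A-Za-z]
print(matched(boundary_only, "foo bar"))  # boundary directly before "bar"
```

Under these assumptions the first and last calls find a match and the middle two do not, which mirrors the foobar and föbar points made above.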
The more I look at this list, the more I realize my #1 solution (only a minor change to the edit summary) is an exception, and not possible in most (possibly all, save 1 or 2 jic) of the other rules. Fortunately, most of the slowest rules 1~45 are short & simple & address word endings, and can have their "base" edit summary text extended slightly to remain helpful, even though the front of the word would be removed. For example,
<Typo word="-ilities" find="\b([A-Z][a-z]*il|[a-z]+il)l+ities\b" replace="$1ities"/>
responsibillities → responsibilities
can be changed to:
<Typo word="-ilities" find="(?<=\b(?:[A-Z][a-z]*|[a-z]+))ill+ities\b" replace="ilities"/>
illities → ilities
Does anyone have a problem with this? ~ Tom.Reding (talk ⋅dgaf) 15:09, 1 November 2019 (UTC)
- No objections. I am trying to run a full typo scan now for WP:TSN, and it is taking ages, so improvements are welcomed. –Darkwind (talk) 20:52, 3 November 2019 (UTC)
- ~5% overall speed gain so far (on a fastest-rule basis, of course), after 26 rules improved. ~ Tom.Reding (talk ⋅dgaf) 14:00, 4 November 2019 (UTC)
- @Tom.Reding: You are doing these edits in such a way that you mess up the 'edit summary'. I'm all for making things faster but not at the expense that it's no longer giving desirable results. Sun Creator(talk) 18:54, 6 November 2019 (UTC)
- @Sun Creator: yes, that's exactly what I said would happen above. I'll pause, pending further discussion. To reiterate, only these 76 would be edited in this way. ~ Tom.Reding (talk ⋅dgaf) 19:02, 6 November 2019 (UTC)
- ~6% overall speed gain now after 39 rules improved. ~ Tom.Reding (talk ⋅dgaf) 00:33, 7 November 2019 (UTC)
- I think "mess up" is a slight overstatement. Hundreds of typos are corrected manually every day with an edit summary of "typo" or worse; we're grateful for the improvement and can easily call up a diff if we want to know more. Certes (talk) 19:13, 6 November 2019 (UTC)
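The edit-summary trade-off debated in this section can be illustrated with the "-ilities" rule quoted above. This is a sketch in Python rather than AWB's .NET engine, so two things are assumptions: `$1` is written as `\1`, and the variable-width lookbehind of the faster rule is approximated by a single-letter lookbehind, since Python's `re` only accepts fixed-width lookbehinds. The `summary` helper is hypothetical and only imitates the "old → new" shape of an edit summary.

```python
import re


def summary(pattern, repl, text):
    """Build an 'old → new' summary from the first match, or None if no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    return f"{m.group(0)} → {m.expand(repl)}"


# Original rule: the leading capture group keeps the whole word in the match.
OLD_RULE = (r"\b([A-Z][a-z]*il|[a-z]+il)l+ities\b", r"\1ities")

# Approximation of the faster rule: the word's start is only asserted, not matched.
NEW_RULE_APPROX = (r"(?<=[A-Za-z])ill+ities\b", "ilities")

text = "She has many responsibillities."
print(summary(*OLD_RULE, text))
print(summary(*NEW_RULE_APPROX, text))
```

The first summary shows the full word and the second only the corrected ending, which is the loss of context that Sun Creator objects to and Certes considers acceptable.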
The 76 slowest typos results
Metric                                   Before      After       % improvement
-------------------------------------------------------------------------------
Total run time:                          ~142,351x   ~130,482x   ~8.3%
Slowest rule speed:                      ~355x       ~218x       ~39%
Average rule speed:                      ~37x        ~34x
Median rule speed:                       ~37x        ~34x
Stdev:                                   ~31x        ~22.5x      ~27%
Stdev from mean for the slowest rule:    ~10.3       ~8.15       ~21%
x = times the fastest rule
This is after optimizing 58 of the former 76 slowest rules.
If there's interest, after a few weeks of these changes being in place, I can continue down the list. The next batch would be the ~69 rules that are > 100x slower than the fastest rule. Some of those 69 are in the original 76, so they can't be improved further. I might post a graph of the before & after distributions, if I can figure that out in the WP graphing utility. ~ Tom.Reding (talk ⋅dgaf) 04:15, 8 November 2019 (UTC)
- Thanks again to Tom.Reding and everyone else who's contributed for all the diligent optimisation. It's well worth losing a few letters from the edit summary: we have diff. In deciding whether to continue, I'd look at difference (speed factor) from the median/mean rule rather than the fastest; the latter is easily influenced by unfair comparison with a simple and (hopefully) quick rule like "ELLIPSIS". Certes (talk) 11:50, 8 November 2019 (UTC)
- @Certes: thank you. I've added the slowest rule's stdev from the mean to the results.
- Re fastest vs median/mean: I wanted to normalize against something meaningful & consistent across all tests and across all times (i.e. 9 days ago vs today), to make the comparison easier, even using a dedicated separate computer to run all of timing tests/results. The median & mean change slightly from run to run, even without a rule change, and even more after a round of rule optimization. The fastest rules' ticks, however, don't change. There are always 59~85 fastest rules, and they in fact had the exact same # of ticks for 55 firings, as measured by DateTime.Now.Ticks in AWB's C# environment, since I started, 156,000. The next fastest category, 156,001 ticks, contains 25~32 rules, so there is a long stable tail. This is a very nice & solid foundation to build off of. ~ Tom.Reding (talk ⋅dgaf) 13:39, 8 November 2019 (UTC)
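The normalization described above (each rule expressed as a multiple of the fastest rule, plus the slowest rule's distance from the mean in standard deviations) might be sketched as follows. This is Python rather than the C# used for the actual measurements, the tick counts are made-up illustrative values apart from the 156,000 and 156,001 figures quoted above, and the choice of population standard deviation is an assumption, since the thread does not say which variant was used.

```python
from statistics import mean, pstdev


def normalize(ticks_by_rule):
    """Express every rule's tick count as a multiple of the fastest rule's."""
    fastest = min(ticks_by_rule.values())
    return {rule: ticks / fastest for rule, ticks in ticks_by_rule.items()}


def slowest_stdevs_from_mean(factors):
    """How many standard deviations the slowest rule sits above the mean."""
    values = list(factors.values())
    return (max(values) - mean(values)) / pstdev(values)


# Hypothetical rule names and tick counts, for illustration only.
ticks = {"rule_a": 156_000, "rule_b": 156_001, "rule_c": 312_000, "rule_d": 1_560_000}

factors = normalize(ticks)
print(factors)
print(slowest_stdevs_from_mean(factors))
```

Dividing by the fastest rule gives a baseline that stays fixed between runs, which is the stability argument made above; dividing by the mean or median instead would follow Certes's suggestion.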
My view is they are best reverted: the overall impact is to slow down someone using AWB, because extra time is spent finding the change in question in the preview pane, and there is no way to go back through contribution history in case of a rule problem like the one I encountered earlier with 'Amuck'. If JWB wants to have faster rules, maybe consideration could be given to a separate rules page with faster rules that lack the detailed edit summary. Sun Creator(talk) 00:03, 9 November 2019 (UTC)
A to An (2)
@GoingBatty and Sun Creator: The "A to An (2)" rule wants to damage "non Shari'a evidence" in Muhammad Ali of Egypt. -- John of Reading (talk) 20:37, 18 November 2019 (UTC)
- Thanks John. I have seen this situation on several articles. I've fixed it now. Sun Creator(talk) 20:48, 18 November 2019 (UTC)
- @John of Reading: Thanks for reporting the issue!
- @Sun Creator: Thanks for fixing the issue, but your fix looks complicated. Would simply replacing the first \b with \s provide any significant time savings? Thanks! GoingBatty (talk) 02:57, 19 November 2019 (UTC)
- Yes, I'd be happy with using \s although I didn't do that because historically objections have been made about starting with \s. Sun Creator(talk) 13:00, 19 November 2019 (UTC)
- On further thought there is good reason for avoiding the \s as it's not just a space but new line and tabs and that occasionally creates exceptions so while the \b is not ideal the \s has its own problems. Sun Creator(talk) 01:22, 23 November 2019 (UTC)
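The difference between starting a rule with \b and starting it with \s can be seen in a much-simplified stand-in for an "a → an" rule. The patterns below are hypothetical and are not the actual "A to An (2)" rule, and Python's `re` is used in place of AWB's .NET engine.

```python
import re

# Hypothetical simplification: "a" followed by whitespace and a vowel.
RULE_WITH_B = re.compile(r"\b[aA]\s+(?=[aeiou])")
# Same rule, but requiring whitespace (not just a word boundary) before the "a".
RULE_WITH_S = re.compile(r"(?<=\s)[aA]\s+(?=[aeiou])")

text = "non Shari'a evidence"

# \b holds between the apostrophe and the final "a", so the rule fires here.
print(RULE_WITH_B.search(text))
# The apostrophe is not whitespace, so this variant leaves the text alone.
print(RULE_WITH_S.search(text))

# \s is broader than a plain space: it also matches newlines and tabs.
print(re.findall(r"\s", "a b\nc\td"))
```

The last line shows why \s is not a drop-in replacement for a literal space, which is the concern raised in the final comment above.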
Lieutenant
Tan family of Cirebon uses "Luitenant" which is not a typo for the military rank, it means "a Luitenant der Chinezen, a civil bureaucrat" so this should be left alone. Just mentioning this here... MB 22:40, 30 November 2019 (UTC)
achivered
FYI, "achivered" was changed to "achievered" in Chaudhary Charan Singh Haryana Agricultural University Sports Complex. I manually fixed as "achieved". MB 20:36, 5 December 2019 (UTC)
ie -> i.e.
Can "ie" be left unchanged (not converted to "i.e.") when it is directly preceded by a period as part of an Irish internet domain, e.g. www.live95.ie?
Is reporting available?
There are many rules such as "(Working/upper/middle/lower)-class" that only make a change based on about a dozen words following "class". Does a report exist where you could input something like "working class", and it would analyze Wikipedia articles and display a count of the most common subsequent words? I'm thinking that such a report could help us intelligently expand rules. Thanks! GoingBatty (talk) 22:10, 28 December 2019 (UTC)
Move "'s" rule to WP:GENFIXES?
The "'s"/apostrophe-s rule has been straightening apostrophes since October 2018, with no issues as far as I can tell. Could it be added to WP:GENFIXES, as it's more of a MOS:PUNCT fix than a proper typo? ~ Tom.Reding (talk ⋅dgaf) 18:31, 18 July 2019 (UTC)
Tracked in T231012. ~ Tom.Reding (talk ⋅dgaf) 14:27, 22 August 2019 (UTC)
- @Tom.Reding: Adding it to WP:GENFIXES and removing it from WP:AWB/T wouldn't impact AWB users who have genfixes and typo fixes turned on (except maybe those who skip if no typos or skip if no genfixes). However, removing it from WP:AWB/T means that these wouldn't get fixed by other tools that use WP:AWB/T, such as WPCleaner. GoingBatty (talk) 20:22, 16 September 2019 (UTC)
- @GoingBatty: could it just be added to WP:GENFIXES instead, and also kept as a WP:AWB/T rule? The benefit would be, if the user has both WP:GENFIXES & WP:AWB/T enabled, to not clog up the edit summary with multiple innocuous "'s" fixes, which don't require the same user attention as a typical, actual, typo fix. ~ Tom.Reding (talk ⋅dgaf) 20:31, 16 September 2019 (UTC)
- @Tom.Reding: Per WP:AWB/OP, I believe that would work. GoingBatty (talk) 20:35, 16 September 2019 (UTC)
Bannana → Banan?
This was a strange one I came across. It tried changing "Bannana" to "Banan". The offending regex seems to be: <Typo word="Banana" find="\b([bB])an(?:an|na)na(s)?\b" replace="$1anan$2"/>. – Frood (talk) 03:45, 12 January 2020 (UTC)
- @Frood: Should be fixed now. -- John of Reading (talk) 07:46, 12 January 2020 (UTC)
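The reported behaviour can be reproduced with the rule as quoted, translated to Python syntax (`$1` becomes `\1`). The corrected replacement shown is an assumption about what a fix could look like; the thread does not say what change was actually made.

```python
import re

FIND = r"\b([bB])an(?:an|na)na(s)?\b"
BUGGY_REPLACE = r"\1anan\2"   # as quoted above: the final "a" is missing
FIXED_REPLACE = r"\1anana\2"  # assumed correction, not confirmed by the thread

print(re.sub(FIND, BUGGY_REPLACE, "Bannana"))   # reproduces the report
print(re.sub(FIND, FIXED_REPLACE, "Bannana"))
print(re.sub(FIND, FIXED_REPLACE, "banannas"))
```

The find pattern consumes the trailing "na", so the replacement has to put the final "a" back; the buggy replacement stops one letter short.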
omitted
Please add "omitted" to the list, with the misspelling "ommitted". 1234qwer1234qwer4 (talk) 09:48, 25 January 2020 (UTC)
A two-part number might not be a range
@Chris the speller: At Colorado Department of State v. Baca, the new "0-0" rule wants to edit the two-part number in the first sentence. Is that correct? -- John of Reading (talk) 12:42, 28 January 2020 (UTC)
- @John of Reading: No, not right. The rule has, in my experience, hit very few false positives. I could add a negative lookbehind for "No.", if you think that's a good idea. Chris the speller yack 14:47, 28 January 2020 (UTC)
- If you do add a lookbehind, Ambeyrac needs an exception for the word "ISSN". -- John of Reading (talk) 15:00, 28 January 2020 (UTC)
- I added the correct template in the article. This could also be added automatically by AWB. 1234qwer1234qwer4 (talk) 16:08, 28 January 2020 (UTC)
- I'll add a lookbehind for No., ISSN, ISBN and nn-nnn or n-nn (not many notable coaches will have a career record of 31–253!) Chris the speller yack 17:26, 28 January 2020 (UTC)
- Done. Thanks for pointing this out. Chris the speller yack 21:49, 28 January 2020 (UTC)
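A negative lookbehind of the kind proposed above might look like the following sketch. The pattern is hypothetical: the real "0-0" rule is not quoted in this discussion, and only the "No.", "ISSN" and "ISBN" exclusions are modelled, not the nn-nnn and n-nn cases. Python's `re` stands in for AWB's .NET engine, so each label gets its own fixed-width lookbehind.

```python
import re

# Hypothetical stand-in for the "0-0" rule: hyphen between two numbers becomes an en dash,
# unless the pair directly follows "No. ", "ISSN " or "ISBN ".
RANGE = re.compile(r"(?<!No\. )(?<!ISSN )(?<!ISBN )\b(\d+)-(\d+)\b")


def fix_ranges(text):
    """Replace the hyphen in matched number pairs with an en dash."""
    return RANGE.sub("\\1\u2013\\2", text)


print(fix_ranges("a career record of 10-3"))
print(fix_ranges("docket No. 12-345"))
print(fix_ranges("ISSN 1234-5678"))
```

This sketch only protects the number pair that immediately follows the label; a number with more than two hyphenated parts would still be touched further along, so a real rule would need more care than this.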
AWB-proofing
(Perhaps a FAQ; if so, sorry. I don't use AWB and am iggernant.) If we're writing English, the conventional spelling is "Afghanistan". "Afganistan" is likely to be a typo. But in this edit (yes, 2015!) I dream of horses used AWB to alter a correct Italian spelling "Afganistan", giving it a spurious "h". (I don't say that "Afghanistan", with an "h", is wrong in Italian; but I don't believe that it's the spelling used for the book whose title is being specified here.) I could of course remove the "h" and add a SGML comment ("<!-- Please don't add an 'h'! -->" or similar), but I'd be addressing the (perhaps tired and inattentive) human; is there a way to warn off AWB itself? (Of course I'm not asking just about this one instance; if there is a neat method, I'd like to use it elsewhere too.) -- Hoary (talk) 09:45, 9 February 2020 (UTC)
- Wrap it in the {{notatypo}} template (i.e. {{notatypo|Afganistan}}). If AWB is causing multiple issues with a page and you want to prevent it touching the page in any capacity, then putting {{bots|deny=AWB}} anywhere on the page will prevent anyone using AWB to edit it at all, but that's a last resort as it will prevent even uncontroversial fixes. ‑ Iridescent 09:56, 9 February 2020 (UTC)
- That's exactly what I was looking for. Thank you, Iridescent! -- Hoary (talk) 11:35, 9 February 2020 (UTC)
- @Hoary: Another solution for text in a foreign language is to wrap it inside a {{lang}} template. This not only turns off AWB's spelling rules, but also marks the text in the HTML version of the page. For example, screen-reading software might know how to switch to Italian pronunciation when reading out that book title. -- John of Reading (talk) 12:58, 9 February 2020 (UTC)
- Oh my frolicking gerbil, John of Reading: that's a week's worth of reading/studying assignment you've just given me there. Till now I've been empty-headedly using Template:Nihongo2 in order to help ensure that an appropriate font would be used to display Japanese script to the sighted (and have wondered why nothing analogous seemed to be available for Chinese); I've merely assumed that Mediawiki did something or other for voice readers or that voice readers did something or other for Wikimedia pages. {{Nihongo2|なにか}} produces <span class="t_nihongo_kanji"><span title="Japanese language text" lang="ja">なにか</span></span>; OK as far as I understand it, but I can't see any mention of "t_nihongo_kanji" in any of three CSS files, so don't know what that does, if anything. Later, I'll sneak {{lang|ja|なにか}} into a page and take a look at what that does to the HTML. But of course if I have a question then I should be asking it elsewhere. -- Hoary (talk) 06:37, 10 February 2020 (UTC)
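The effect of the {{notatypo}} wrapper suggested above can be imitated with a small sketch that applies a typo rule only to text outside protected spans. This is not AWB's implementation: the `PROTECTED` pattern, the `apply_outside_protected` helper and the one-word Afghanistan rule are all hypothetical, and nested templates are not handled.

```python
import re

# Hypothetical matcher for {{notatypo|...}} / {{not a typo|...}} without nested braces.
PROTECTED = re.compile(r"\{\{\s*(?:notatypo|not a typo)\s*\|[^{}]*\}\}", re.IGNORECASE)


def apply_outside_protected(text, pattern, repl):
    """Apply a rule to everything except the protected template spans."""
    out = []
    last = 0
    for m in PROTECTED.finditer(text):
        out.append(pattern.sub(repl, text[last:m.start()]))  # unprotected stretch
        out.append(m.group(0))                               # protected span, untouched
        last = m.end()
    out.append(pattern.sub(repl, text[last:]))
    return "".join(out)


RULE = re.compile(r"\bAfganistan\b")  # simplified stand-in for the real rule
sample = "Afganistan (English typo); {{notatypo|Afganistan}} (Italian title)"
print(apply_outside_protected(sample, RULE, "Afghanistan"))
```

Only the unwrapped occurrence is changed, which is the behaviour Hoary was asking for.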
quotation marks
See Marek Kamiński for some subscript quote marks that were not fixed. MB 15:50, 18 February 2020 (UTC)
- @MB: The existing apostrophe-straightening rules only catch apostrophe-s and a couple of others; they haven't been written to straighten the paired marks around quotations. I suspect that it won't be possible for RegExTypoFix to fix those. Material in quotations is hidden away before the RegExTypoFix rules are run, so that these rules never damage quotations; I wouldn't be surprised if the surrounding quote marks are also hidden. The general fixes do know how to straighten more quote marks, but those fixes currently only run inside citation templates. -- John of Reading (talk) 17:17, 18 February 2020 (UTC)
- OK, I've gone ahead and fixed this article manually. MB 17:55, 18 February 2020 (UTC)
Military personal -> Military Personnel
Hi, I've found a somewhat common typo, "military personal", which should somewhat frequently be corrected to "military personnel". Some gotchas you may want to look out for when writing this regex would be: "military personal equipment", "military personalit(y|ies)", "military personally", and "military personal ensign", "military personal decoration". When it comes to "military personal decoration", the correct usage is unclear to me, so I am recommending that we exclude it for now, as the concept of "military personnel decorations" and "military personal decorations" both appear to exist as separate concepts. I've seen both in official US Military documents. Additionally, the regex should catch the typo "military personals" and "military personnels" as those are both typos as well. I've fixed a bunch of these typos using find and replace in AWB, and having this in the typo list would be extremely helpful. Phuzion (talk) 04:08, 27 February 2020 (UTC)
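One possible shape for the requested rule, covering the gotchas listed above, is sketched below in Python syntax. The pattern is a hypothetical draft, not a rule from the typo list, and it has only been checked against the examples in this request.

```python
import re

# Hypothetical draft rule: "military personal(s)" / "military personnels" → "military personnel",
# skipping "personal equipment", "personal ensign" and "personal decoration(s)".
FIND = re.compile(
    r"\b([mM]ilitary\s+)person(?:als?|nels)\b"
    r"(?!\s+(?:decorations?|ensign|equipment)\b)"
)
REPLACE = r"\1personnel"


def fix(text):
    """Apply the draft rule to a piece of text."""
    return FIND.sub(REPLACE, text)


for phrase in (
    "military personal were deployed",
    "Military personals",
    "military personnels",
    "military personal equipment",
    "military personality",
    "military personally",
    "military personal decorations",
):
    print(phrase, "=>", fix(phrase))
```

The trailing \b keeps "personality" and "personally" out of reach, and the negative lookahead handles the multi-word exceptions; further exceptions would be added to that alternation.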
Very slow regular expressions due to start with (?<=…)
Hello. Users are reporting that WPCleaner is very slow on several big pages. My first check of this problem shows that several regular expressions from this project page each take several dozen seconds to run on my computer. The common denominator is that they all start with a complex (?<=…), which I consider very bad practice because it implies an expensive check at each character in the page, regardless of whether the regular expression can actually apply there. Here are the 4 regular expressions that are detected as very slow when run on 2019 reasons of the Supreme Court of Canada:
Slow regular expression: Typo AWB -ish:(?<=\b(?:[A-Za-z]+?))i?sih(e(?:[ds]|rs?)|ing(?:ly)?|ly)?\b(?<!asih|A(?:isih|riningsih|sih)|Bersih|esih|Finarsih|ingsih|K(?:asih|osasih)|[rs]sih|M(?:a(?:drasih|ss?ih)|essih|irajoucsih)|N(?:esih|ingsih|urnaningsih)|Su(?:kaesih|mbangsih)|T(?:laksih|sih)|Y(?:ingtsih|ulianingsih))(13387ms)
Slow regular expression: Typo AWB known as:(?<=\b(?:a(?:lso|re|s)|Also|b(?:e(?:came|en|st|tter)|ut)|Be(?:st|tter)|[cC]ommonly|[fF]requently|[gG]enerally|is|[mM]ostly|[nN]ormally|Often|o(?:ften|r)|perhaps|[uU]sually|W(?:ell|idely)|w(?:as|e(?:ll|re)|idely))\s+)know(?:ed|s?)\s+(as|for)\b(37228ms)
Slow regular expression: Typo AWB Its (after):(?<=\b(?:[aA](?:bove|[lm]ong(?:st)?|r(?:e|ound)|t)|[bB](?:e(?:low|tween|yond)?|oth|y)|[cC]elebrat(?:e[ds]?|ing)|[dD]uring|[fF]rom|[hH][eo]ld|[iI]n(?:to)?|[kK]eep|[mM]ade|[oO](?:f|n(?:to)?|ver)|[tT](?:hrough(?:out)?|o)|[uU](?:nder(?:neath)?|p(?:on)?)|[wW]ith(?:in|out)?)\s+)it[´ˈ׳᾿‘’′Ꞌꞌ`;']s\b(41140ms)
Slow regular expression: Typo AWB More/Less/etc. than_:(?<=\b(?:[bB](?:etter|igger|raver)|[gG]reater|[hH]igher|[mM]ore|[lL](?:arger|ess|o(?:nger|wer))|lesser|[oO]lder|[rR]ather|[sS](?:horter|ma(?:ller|rter))|[tT](?:aller|hi(?:cker|nner))|[wW]orse|[yY]ounger)\s+)then\s+(?!than\b)(38345ms)
Would it be possible to optimize them so that the lookbehind is not the first element analyzed in the regular expression? Basically, by moving the text that is actually searched for to the beginning of the regular expression, and also adding it at the end of the lookbehind: programs that use the regular expression will start by looking for the text and, if it is found, go back to check the lookbehind (the text is added to the lookbehind so that it spans the text just matched). For example, for the last one, moving the "then" before the (?<=…) and also adding it inside, at the end of the (?<=…), would help a lot. What do you think?
First proposal
<Typo word="-ish" find="i?sih(?<=\b(?:[A-Za-z]+?)i?sih)(e(?:[ds]|rs?)|ing(?:ly)?|ly)?\b(?<!asih|A(?:isih|riningsih|sih)|Bersih|esih|Finarsih|ingsih|K(?:asih|osasih)|[rs]sih|M(?:a(?:drasih|ss?ih)|essih|irajoucsih)|N(?:esih|ingsih|urnaningsih)|Su(?:kaesih|mbangsih)|T(?:laksih|sih)|Y(?:ingtsih|ulianingsih))" replace="ish$1"/><!--avoid proper names with -asih -esih -rsih -ssih, e.g., Bersih, Finarsih, Kasih, Kosasih, Madrasih, Masih, Massih, Messih, Nesih, Sukaesih, Nurnaningsih, Ningsih, Ariningsih, Yulianingsih, Asih, Tsih, Aisih, Tlaksih, Mirajoucsih, Sumbangsih, Yingtsih--><!--cheapened expensive beginning-->
<Typo word="known as" find="know(?<=\b(?:a(?:lso|re|s)|Also|b(?:e(?:came|en|st|tter)|ut)|Be(?:st|tter)|[cC]ommonly|[fF]requently|[gG]enerally|is|[mM]ostly|[nN]ormally|Often|o(?:ften|r)|perhaps|[uU]sually|W(?:ell|idely)|w(?:as|e(?:ll|re)|idely))\s+know)(?:ed|s?)\s+(as|for)\b" replace="known $1"/><!--cheapened expensive beginning-->
<Typo word="Its (after)" find="it(?<=\b(?:[aA](?:bove|[lm]ong(?:st)?|r(?:e|ound)|t)|[bB](?:e(?:low|tween|yond)?|oth|y)|[cC]elebrat(?:e[ds]?|ing)|[dD]uring|[fF]rom|[hH][eo]ld|[iI]n(?:to)?|[kK]eep|[mM]ade|[oO](?:f|n(?:to)?|ver)|[tT](?:hrough(?:out)?|o)|[uU](?:nder(?:neath)?|p(?:on)?)|[wW]ith(?:in|out)?)\s+it)[´ˈ׳᾿‘’′Ꞌꞌ`;']s\b" replace="its"/><!--cheapened expensive beginning-->
<Typo word="More/Less/etc. than_" find="then(?<=\b(?:[bB](?:etter|igger|raver)|[gG]reater|[hH]igher|[mM]ore|[lL](?:arger|ess|o(?:nger|wer))|lesser|[oO]lder|[rR]ather|[sS](?:horter|ma(?:ller|rter))|[tT](?:aller|hi(?:cker|nner))|[wW]orse|[yY]ounger)\s+then)\s+(?!than\b)" replace="than "/><!--avoid ends of sentences, e.g., "Life was better then."; too many false positives for "other then"; cheapened expensive beginning-->
--NicoV (Talk on frwiki) 07:04, 30 December 2019 (UTC)
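To make the first proposal concrete, here is a simplified Python sketch. It is only an illustration: Python's `re` module accepts fixed-width lookbehinds only, so the long list of comparatives is cut down to two words of equal length ("more" and "less"), and the sketch checks that the rewritten form matches the same text rather than measuring speed in .NET or Java.

```python
import re

# Original shape: the lookbehind is the first thing the engine evaluates.
ORIGINAL = re.compile(r"(?<=\b(?:more|less)\s)then\s+(?!than\b)")

# First proposal: search for the literal "then" first, then look back over
# the context *and* the "then" that was just consumed.
MOVED = re.compile(r"then(?<=\b(?:more|less)\sthen)\s+(?!than\b)")

sample = "She had more then enough; back then it was less then ideal."

fixed_original = ORIGINAL.sub("than ", sample)
fixed_moved = MOVED.sub("than ", sample)

print(fixed_original)
print(fixed_original == fixed_moved)
```

Both patterns rewrite "more then" and "less then" and leave "back then" alone; only the order in which the engine does the work differs.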
Or I can suggest another option: simply add a positive lookahead at the beginning, so that something simpler and faster is checked before the complex and slow positive lookbehind. Here are the suggestions; any objection to implementing them?
Second proposal
<Typo word="-ish" find="(?=i?sih)(?<=\b(?:[A-Za-z]+?))i?sih(e(?:[ds]|rs?)|ing(?:ly)?|ly)?\b(?<!asih|A(?:isih|riningsih|sih)|Bersih|esih|Finarsih|ingsih|K(?:asih|osasih)|[rs]sih|M(?:a(?:drasih|ss?ih)|essih|irajoucsih)|N(?:esih|ingsih|urnaningsih)|Su(?:kaesih|mbangsih)|T(?:laksih|sih)|Y(?:ingtsih|ulianingsih))" replace="ish$1"/><!--avoid proper names with -asih -esih -rsih -ssih, e.g., Bersih, Finarsih, Kasih, Kosasih, Madrasih, Masih, Massih, Messih, Nesih, Sukaesih, Nurnaningsih, Ningsih, Ariningsih, Yulianingsih, Asih, Tsih, Aisih, Tlaksih, Mirajoucsih, Sumbangsih, Yingtsih--><!--cheapened expensive beginning-->
<Typo word="known as" find="(?=know)(?<=\b(?:a(?:lso|re|s)|Also|b(?:e(?:came|en|st|tter)|ut)|Be(?:st|tter)|[cC]ommonly|[fF]requently|[gG]enerally|is|[mM]ostly|[nN]ormally|Often|o(?:ften|r)|perhaps|[uU]sually|W(?:ell|idely)|w(?:as|e(?:ll|re)|idely))\s+)know(?:ed|s?)\s+(as|for)\b" replace="known $1"/><!--cheapened expensive beginning-->
<Typo word="Its (after)" find="(?=it)(?<=\b(?:[aA](?:bove|[lm]ong(?:st)?|r(?:e|ound)|t)|[bB](?:e(?:low|tween|yond)?|oth|y)|[cC]elebrat(?:e[ds]?|ing)|[dD]uring|[fF]rom|[hH][eo]ld|[iI]n(?:to)?|[kK]eep|[mM]ade|[oO](?:f|n(?:to)?|ver)|[tT](?:hrough(?:out)?|o)|[uU](?:nder(?:neath)?|p(?:on)?)|[wW]ith(?:in|out)?)\s+)it[´ˈ׳᾿‘’′Ꞌꞌ`;']s\b" replace="its"/><!--cheapened expensive beginning-->
<Typo word="More/Less/etc. than_" find="(?=then)(?<=\b(?:[bB](?:etter|igger|raver)|[gG]reater|[hH]igher|[mM]ore|[lL](?:arger|ess|o(?:nger|wer))|lesser|[oO]lder|[rR]ather|[sS](?:horter|ma(?:ller|rter))|[tT](?:aller|hi(?:cker|nner))|[wW]orse|[yY]ounger)\s+)then\s+(?!than\b)" replace="than "/><!--avoid ends of sentences, e.g., "Life was better then."; too many false positives for "other then"; cheapened expensive beginning-->
--NicoV (Talk on frwiki) 10:31, 2 January 2020 (UTC)
- I implemented the second option; it results in a drastic performance improvement in WPCleaner on big pages. --NicoV (Talk on frwiki) 16:51, 2 January 2020 (UTC)
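The second option can be sketched in Python in the same spirit. This is a simplified stand-in for the "Its (after)" rule, not the rule itself: the preposition list is shortened to two-letter words because Python's `re` requires fixed-width lookbehinds, and the names `ORIGINAL` and `GUARDED` are invented for the example. The sketch only checks that adding the cheap `(?=it)` guard does not change what is matched; any speed gain depends on the regex engine in use.

```python
import re

PREPOSITIONS = "of|in|to|at|by|on"

# Original shape: lookbehind first, evaluated at every position.
ORIGINAL = re.compile(rf"(?<=\b(?:{PREPOSITIONS})\s)it['’]s\b")

# Second proposal: a cheap lookahead rejects most positions before the
# lookbehind is ever evaluated.
GUARDED = re.compile(rf"(?=it)(?<=\b(?:{PREPOSITIONS})\s)it['’]s\b")

sample = "The size of it's hull and in it’s wake; it's fine."

fixed_original = ORIGINAL.sub("its", sample)
fixed_guarded = GUARDED.sub("its", sample)

print(fixed_guarded)
print(fixed_original == fixed_guarded)
```

The guard must spell out the start of the text that follows the lookbehind; if it names text that can never appear there, the rule silently stops matching.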
Apparently, the same problem happens for some users for 30 other regular expressions.
Detailed logs
22:40:50.514 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -fully:(?<=\b(?:[A-Z][a-z]*|[a-z]+))fuly\b (12607ms)
22:41:21.127 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -itely:(?<=\b(?:[A-Z][a-z]*|[a-z]+)[lnst])(?<![qQ]ual)itly\b (30554ms)
22:41:46.163 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -nally:(?<=\b(?:[A-Z][a-z]*|[a-z]+)[a-mo-z])(?:anlly|nalyl)\b (24740ms)
22:41:58.558 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -acious:(?<=\b(?:[A-Z][a-z]*|[a-z]+))acitous(?<!anthracitous)(ly|ness(?:es)?)?\b (12303ms)
22:42:12.434 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -bility:(?<=\b(?:[A-Z][a-z]*|[a-z]+))b(?:il|li)(?:li?)?t(ies|y)\b (13845ms)
22:43:22.159 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ally (2):(?<=\b(?:(?:[A-Z][a-z-]*|[a-z-]+)(?:[enu]|ic?)))alyl?\b(?<!(?:Ann?|B(?:allyhe|i|on|ri)|br?i|C(?:onne|re)|D(?:e|o[nu])|F(?:e|in)|G(?:lene|re)|He|K(?:an|e(?:nn?e)?|i(?:lte|nn?s?e))|M(?:cNealy|e)|me|N(?:an|e)|Que?|S(?:e|[hm]e|pezi)|Vit|Whe)aly|[lL]inalyl|oualy|[sS]ialyl) (69717ms)
22:43:36.822 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ively:(?<=\b(?:[A-Z][a-z]*|[a-z]+))ivly\b (14644ms)
22:43:49.429 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -eaning:(?<=\b(?:[A-Z][a-z]*|[a-z]+))ean(?:in|ni)ng\b (12597ms)
22:44:02.905 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ictive:(?<=\b(?:[A-Z][a-z]*|[a-z]+))icitve(ly|s?)\b (13333ms)
22:44:15.890 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -fering:(?<=\b(?:[A-Z][a-z]*|[a-z]+))fereing(s)?\b (12916ms)
22:44:28.421 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -geni(s/z)e:(?<=\b(?:[A-Z][a-z]*|[a-z]+))genei([sz][a-z]+)\b (12485ms)
22:46:37.495 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ally (1):(?<=\b(?:(?:[A-Z][a-z]*|[a-z]+)(?:[cd]i|er|gi|i(?:[cn]|on)|li|n[it]|ot|son|[tv]i)))aly\b(?<!Finaly|qualy) (129067ms)
22:46:52.062 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -acity:(?<=\b(?:[A-Z][a-z]*|[a-z]+))act?iy\b (14482ms)
22:47:04.739 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -solutely:(?<=\b(?:[A-Z][a-z]*|[a-z]+))solutly\b (12451ms)
22:47:18.853 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ology:(?<=\b(?:[A-Z][a-z]*|[a-z]+))ol(?:[ai]?|ol)g(y(?<![vV]olgy\b)|i(?:c[a-z]*|es|sts?))\b (13997ms)
22:47:49.058 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -(c/l/t)ious:(?<=\b(?:[A-Z][a-z]*|[a-z]+)[clt])ioous([a-z]*)\b (30168ms)
22:48:04.443 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -wed/-wing:(?<=\b(?:[A-Z][a-z]*|[a-z]+))ww(ed|ing|s)\b (15382ms)
22:48:19.303 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ining:(?<=\b(?:[A-Z][a-z]*|[a-z]+))inig(ly|s?)\b(?<!\b(?:Bre|He|K(?:le|urt)|Lap|Me|Nar(?:ir)?|Re|Stee|[tT]|We)inig\b) (14783ms)
22:48:31.512 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -(g/p)ressive:(?<=\b(?:[A-Z][a-z]*|[a-z]+))([gp]res)i(ons?|ve[a-z]*)\b (12150ms)
22:49:01.829 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ately_:(?<=\b(?:[A-Z][a-z]*|[a-z]+)[bcdgimstv])atly\b (30149ms)
22:49:37.375 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -(a/e/i/o/u)(c/n/o/r/s)king:(?<=\b(?:[A-Z][a-z]*|[a-z]+)[aeiou][cnors])kign\b (35020ms)
22:49:49.535 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -tional(ly):(?<=\b(?:[A-Z][a-z]*|[a-z]+))tion(?:a(ly)|nal(ly)?)\b (12113ms)
22:50:04.381 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -XXX(ed/er/ing/ive):(?<=\b(?:[A-Z][a-z]*|[a-z]+))([aeiou])([bdfgklmnprstvz])\2{2,}(e(?:d|rs?)|i(?:ngs?|ons?|ves?)|ors?)\b (14837ms)
22:50:17.912 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ation:(?<=\b(?:[A-Z][a-z]*|[a-z]+))ati?oin(al(?:ly)?|ed|ing|s?)\b (13431ms)
22:50:31.527 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -vement:(?<=\b(?:[A-Z][a-z]*|[a-z]+))vment(al|ed|ing|s?)\b (13358ms)
22:50:43.184 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -ilities:(?<=\b(?:[A-Z][a-z]*|[a-z]+))ill+ities\b (11652ms)
22:51:12.089 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -tally:(?<=\b(?:[A-Z][a-z]*|[a-z]+)[b-eghj-z])talyl?\b (28823ms)
22:51:25.835 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -tion(s):(?<=\b(?:[A-Z][a-z]*|[a-z]+))tio(?:i|(s))n\b (13585ms)
22:51:38.309 [Thread-2] INFO PERF - Slow regular expression: Typo AWB -soning:(?<=\b(?:[A-Z][a-z]*|[a-z]+))soninig\b (12372ms)
22:51:59.917 [Thread-2] INFO PERF - Slow regular expression: Typo AWB High-profile:\b(?<!(?:[bB]ecause\s+of\s+(?:h(?:er|is)|its|their)|(?:achiev(?:e[ds]?|ing)|creat(?:e[ds]?|ing)|display(?:ed|ing|s?)|ha(?:s|ve)|ke(?:ep(?:ing|s?)|pt)|maintain(?:ed|ing|s?)|retain(?:ed|ing|s?)|with)\s+a)\s+)([hH])igh(?<![A-Z][A-Za-z]+\s+High|specified\s+High|the\s+High)\s+profile\b(?!,|\s+(?:a(?:nd|s)|for|in|of)\b) (21470ms)
I will apply the same kind of optimization to all of them. --NicoV (Talk on frwiki) 08:03, 8 January 2020 (UTC)
- I've done the 30 optimizations. Tom.Reding, would you be interested in updating the graphs you made recently to see if it has made a significant optimization for AWB? For WPCleaner, I will wait for Jerodlycett's answer to see if it has the huge performance improvement I hope for... --NicoV (Talk on frwiki) 08:42, 8 January 2020 (UTC)
- Thank you for tackling these. My experience with regex searches in the AWB database scanner confirms that re-arrangements like these are very effective, reducing the scan times by hours. -- John of Reading (talk) 08:58, 8 January 2020 (UTC)
- Thanks! I have confirmation that, for WPCleaner, it helped go from 11 minutes to 6 seconds for a page like 2019 reasons of the Supreme Court of Canada. --NicoV (Talk on frwiki) 09:09, 8 January 2020 (UTC)
@NicoV: as requested, I reran my analysis, using the same code on the same machine in the same configuration on the latest version of WP:AWB/T as of 2020 February 27. I ran the analysis 4x and averaged the results to obtain the #s in the new 'NicoV' column and the new graph below.
Metric                  Before       After        % improvement | NicoV
----------------------------------------------------------------|----------
Total run time:         ~142,351x    ~130,482x    ~8.3%         | ~138,115x
Slowest rule speed:     ~355x        ~218x        ~39%          | ~337x
Average rule speed:     ~37x         ~34x                       | ~36x
Median rule speed:      ~37x         ~34x                       | ~38x
Stdev:                  ~31x         ~22.5x       ~27%          | ~24x
Stdev from mean for                                             |
the slowest rule:       ~10.3        ~8.15        ~21%          | ~12.56x

x = times the fastest rule
Graphs are unavailable due to technical issues.
The results are marginally worse. The 'NicoV' curve is shaped differently than the 'After' curve (as opposed to being a copy of the 'After' curve pasted slightly higher above it or to the right of it), which means that the difference is probably not the result of some unknown systematic effect being applied to all the rules. Regardless, the difference is small, and if other uses of WP:AWB/T show vast improvement, then I think it was worthwhile.
Since my foray into this rule optimization was spurred by AWB's database scanner taking ~31,500 minutes (21.88 days!) to complete, I decided to rerun the database scanner today (there have been no changes to WP:AWB/T since 2020 February 25). I regret not having done this after my original optimization, as a basis for comparison, so this run will have to suffice. After 1 hour of running, on the same database as before even (20191020), I get ~27,700 minutes remaining, or 19.24 days, so still abysmally slow. ~ Tom.Reding (talk ⋅dgaf) 15:58, 29 February 2020 (UTC)
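For anyone wanting to check the arithmetic behind the figures quoted above, here is a small Python sketch. The helper names are invented for the example; the inputs are the numbers from the table and from the database-scanner estimates, and the results agree with the rounded percentages and day counts given in the post.

```python
def percent_improvement(before, after):
    """Relative reduction from 'before' to 'after', as a percentage."""
    return (before - after) / before * 100


def minutes_to_days(minutes):
    """Convert a duration in minutes to days."""
    return minutes / (60 * 24)


# (before, after) pairs taken from the table above.
table_rows = {
    "Total run time": (142_351, 130_482),
    "Slowest rule speed": (355, 218),
    "Stdev": (31, 22.5),
    "Stdev from mean for the slowest rule": (10.3, 8.15),
}

for label, (before, after) in table_rows.items():
    print(f"{label}: {percent_improvement(before, after):.1f}%")

for minutes in (31_500, 27_700):
    print(f"{minutes} minutes = {minutes_to_days(minutes):.2f} days")
```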
BC / AD
Could BC and AD (and probably BCE etc.) be added as typos to correct things like 123BC and 123 B.C.? I just edited this horrible page, Nabarangpur district, which was littered with all kinds of issues and errors, but AWB wasn't picking up any of the AD / BC bits. Jamesmcmahon0 (talk) 09:51, 6 April 2020 (UTC)
- B.C. has tens of thousands of occurrences; aside from British Columbia, it seems to be a common suffix for Israeli football teams. It might be worth carefully looking at the strings "numericB.C." and "numericspaceB.C." but I wouldn't be optimistic. ϢereSpielChequers 10:25, 6 April 2020 (UTC)
- We would need to exclude cases such as Empresas 1BC, 2BC, 4BC, character U+28BC, etc. but they should be relatively rare. Certes (talk) 11:59, 6 April 2020 (UTC)
Véhicule Press
Hi can we amend AWB so that Vehicule Press becomes Véhicule Press instead of Vehicle Press? Ta ϢereSpielChequers 09:18, 7 April 2020 (UTC)
- Done. (as requested by @WereSpielChequers:). Happy editing! Chris the speller yack 16:42, 18 April 2020 (UTC)
Crusierweight
Hi, could we add "Crusierweight" as a typo? I can fix the current crop, but it would be good to get into AWB. ϢereSpielChequers 23:21, 24 March 2020 (UTC)
- @WereSpielChequers: How many did you fix? Fifty? Then I'll do it. Five? Probably not worth a Typo rule. Chris the speller yack 16:56, 18 April 2020 (UTC)
- Hi, I found 25 among my edits immediately after this request, but I couldn't now be sure that was the only pass I made to fix the problem. ϢereSpielChequers 18:07, 18 April 2020 (UTC)
- Well, we have usually required about two dozen before we would add a rule. This is right about there. Do you think it is worth it? Chris the speller yack 20:10, 18 April 2020 (UTC)
- Done. I found a case that popped up yesterday [Terry Ray (boxer)]. If one is created every day, that's 366 or 365 a year (based on a ridiculously small sample)! It also fixes "cruserweight". Enjoy! Chris the speller yack 22:53, 18 April 2020 (UTC)
Legionnaires and legionaries
I've had a misstep or two with the legionaires --> legionnaires fix, because the Roman soldiers were legionaries[23]. Should the plural be removed from the fix? -- JHunterJ (talk) 13:09, 23 April 2020 (UTC)
Question about new president and vice-president rules
@Chris the speller: I see you added new rules to change "President" and "Vice President" to lower case. The capitalization in List of stories set in a future now past, List of people barred or excluded from the United States, President of Argentina, Vladimir Putin and Recep Tayyip Erdoğan seems appropriate, but your rule changes it. When is it appropriate to capitalize, and when is it not appropriate? Thanks! GoingBatty (talk) 04:16, 26 April 2020 (UTC)
- @GoingBatty: In MOS:JOBTITLES there are only three cases where "president" should be capitalized: (1) when it has become part of the name, as in "President Nixon"; (2) when it refers to a specific person as a substitute for their name during their time in office, as in "Vice President Truman met with the President only a few times during 1945"; (3) as part of an unmodified formal title, as in "Ford became President of the United States in 1974". When preceded by "the", "a", "37th", "US", "former", or any other modifier, it gets lower case. Looking at the articles mentioned above, the rule is working exactly as designed. It misses a fair number of cases that should be in lower case, such as "Assuming the role of President, Erdoğan was criticized ...", but it produces very few false positives. Chris the speller yack 05:21, 26 April 2020 (UTC)
- @Chris the speller: Thanks for the response. Since there's a banner at MOS:JOBTITLES stating that section is "disputed or under discussion", and lots of discussion at the talk page that doesn't seem to come to a consensus, I'm wondering whether the decapitalization should wait until there's consensus. Thanks again! GoingBatty (talk) 05:50, 26 April 2020 (UTC)
- @GoingBatty: The MoS has been quite consistent about this for a year and a half, and, long before that, generally consistent about lower case where "president" is used as a job title ("... he decided not to run for president"), and the status quo was reached by consensus, which, of course, does not mean unanimity. There is a lot of misunderstanding about current wording of the MoS, and some editors are happy to capitalize "president" every chance they get. A recent RFC tried to throw out all of JOBTITLES and start over, but there was no consensus to do so. The main point of misunderstanding and/or disagreement is on lower casing to "president of the United States" when preceded by an ordinal number or definite article. This rule does not make that change, and the changes it does make are generally agreed upon and are unlikely to be overthrown, so I think the rule is doing a great deal of good. If you see it try to make a change that you have misgivings about, don't hit the "SAVE" button. Happy editing! Chris the speller yack 13:35, 26 April 2020 (UTC)
- @Chris the speller: I appreciate you taking the time to explain this. Thanks! GoingBatty (talk) 13:44, 26 April 2020 (UTC)
seal level - sea level
I have just changed 69 instances of seal level to sea level, and left two false positives. Am I being a spoilsport? Should we allow the language to evolve in such a logical direction? Or should we put an extra rule into AWB? ϢereSpielChequers 18:21, 3 May 2020 (UTC)
- I fear that Mrs. Malaprop has found employment with Geoview. Their page on each place gives its height above seal level [sic], e.g. [24] [25]. Certes (talk) 18:52, 3 May 2020 (UTC)
- the Journal of Indo-Pacific Archaeology at something called washington.edu has another member of the Malaprop clan employed. I checked back to the original report and the pdf was titled "Cultural adaptations and late Holocene sea level change in the Marianas: recent excavations at Chalan Piao, Saipan, Micronesia" rather than "Cultural adaptations and late Holocene seal level change in the Marianas: recent excavations at Chalan Piao, Saipan, Micronesia". ϢereSpielChequers 19:06, 3 May 2020 (UTC)
- To answer the question: if editors are going to copy and paste facts from Geoview then we probably should have a rule. It needs to leave Elsword, Silver Shadow (song) and List of Volkswagen Group petrol engines unmolested. The typo most commonly occurred in the phrase "The estimate terrain elevation above seal level", so perhaps we should throw a "d" at "estimate" while we're there. I gloss over the questions of whether Geoview is a reliable source and whether it's properly attributed. (Geoview is kind enough to link to Wikipedia, although it thinks we are a tourist and accommodation guide.) Finally, some estimates may be less accurate than others. Certes (talk) 19:42, 3 May 2020 (UTC)
- One of those three has a comma between seal and level, so I didn't count it with the two false positives. "estimate Terrain" didn't have any false positives, but I hadn't finished it, last two done now. I'm pretty sure I didn't find 25 of them to fix, and people have mentioned 25 as a threshold for additions to AWB typo rules. ϢereSpielChequers 20:51, 3 May 2020 (UTC)
- I fixed 21 cases of "estimate terrain", and deleted the text from two more. Certes (talk) 21:37, 3 May 2020 (UTC)
- OK "estimate terrain" - "estimated terrain" would also be a useful rule. ϢereSpielChequers 08:29, 7 May 2020 (UTC)
"in (date)" -> "on (date)" and "on (month)" -> "in (month)"
Lately I've noticed multiple instances of typos in the form of, for example, "In May 9, 2020". I did a search and found 56 results for "in May 9". Many of these are false positives, but that still implies perhaps 10,000 or more instances of this typo if both DMY and MDY dates are considered. Likewise, a search for "on April 2020" reveals 93 results. Ionmars10 (talk) 04:23, 9 May 2020 (UTC)
- Good catch. Any typo regex should allow only spaces within the matched string. We might treat other cases with separate AWB/JWB runs, so editors can be vigilant for false positives such as "In May, 31 things happened". Certes (talk) 10:05, 9 May 2020 (UTC)
- @Ionmars10 and Certes: I just added "On MDY" and "on MDY" rules. Let's see how these go before adding "On/on DMY", and "In/in MY" rules. GoingBatty (talk) 02:52, 12 May 2020 (UTC)
- Thanks. I think you could even make the comma optional, to catch cases like the Pacific Star Building which was "inaugurated in May 17 1989". Certes (talk) 12:03, 12 May 2020 (UTC)
- @Certes: The AWB order of procedures is to run the general fixes (which include fixing dates by adding the comma) before launching the typo rules. GoingBatty (talk) 15:40, 12 May 2020 (UTC)
- @Ionmars10 and Certes: I just added "On DMY" and "on DMY" rules. GoingBatty (talk) 17:03, 12 May 2020 (UTC)
- @Ionmars10 and Certes: Last piece done: adding "In MY" and "in MY" rules. The rules are somewhat conservative, and can be expanded as time goes on. These six rules should keep us busy for a couple days. :-) GoingBatty (talk) 02:14, 13 May 2020 (UTC)
- GoingBatty and Certes, here are some modified regexes which should catch both months and dates that aren't followed by a year:
<Typo word="In month" find="\bOn\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+(\D)" replace="In $1 $2" />
<Typo word="in month" find="\bon\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+(\D)" replace="in $1 $2" />
<Typo word="On DM" find="\bIn\s+([1-9]|[1-2][0-9]|3[01])\s+(January|February|March|April|May|June|July|August|September|October|November|December)\b" replace="On $1 $2" />
<Typo word="on DM" find="\bin\s+([1-9]|[1-2][0-9]|3[01])\s+(January|February|March|April|May|June|July|August|September|October|November|December)\b" replace="on $1 $2" />
<Typo word="On MD" find="\bIn\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+([1-9]|[1-2][0-9]|3[01])\b" replace="On $1 $2" />
<Typo word="on MD" find="\bin\s+(January|February|March|April|May|June|July|August|September|October|November|December)\s+([1-9]|[1-2][0-9]|3[01])\b" replace="on $1 $2" />
- Already tested these out during a (relatively short) AWB run, with no issues so far. Any objections to adding them to the list? Ionmars10 (talk) 04:21, 19 May 2020 (UTC)
- My "in date" > "on date" rule excludes (?<!\b(?:drafted|mustered|penciled|sworn)\s+in). -- John of Reading (talk) 06:04, 19 May 2020 (UTC)
- This is a very helpful change which I support, but please forgive me one more negative comment. Do we exclude (or fix differently) phrases like "on January the 3rd" in Genevieve? Certes (talk) 11:37, 19 May 2020 (UTC)
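As a way of trying the edge cases raised in this thread, here is a Python sketch of the lowercase "on MD" and "in month" rules proposed above. It is an approximation only: AWB uses .NET regular expressions, and Python's `re` does not accept the variable-length lookbehind quoted by John of Reading, so that exclusion is emulated by inspecting the text before each match. The function names are invented for the example.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DAY = r"[1-9]|[1-2][0-9]|3[01]"

ON_MD = re.compile(rf"\bin\s+({MONTHS})\s+({DAY})\b")
IN_MONTH = re.compile(rf"\bon\s+({MONTHS})\s+(\D)")

# Stand-in for the (?<!\b(?:drafted|mustered|penciled|sworn)\s+in) exclusion.
EXCLUDED_VERB = re.compile(r"\b(?:drafted|mustered|penciled|sworn)\s+$")


def fix_on_md(text):
    """'in May 17' -> 'on May 17', unless 'in' belongs to 'sworn in' etc."""
    def repl(match):
        if EXCLUDED_VERB.search(text[:match.start()]):
            return match.group(0)
        return f"on {match.group(1)} {match.group(2)}"
    return ON_MD.sub(repl, text)


def fix_in_month(text):
    """'on April he' -> 'in April he' (month not followed by a digit)."""
    return IN_MONTH.sub(r"in \1 \2", text)


print(fix_on_md("inaugurated in May 17 1989"))
print(fix_on_md("He was sworn in January 3"))
print(fix_on_md("In May, 31 things happened"))
print(fix_in_month("on January the 3rd"))
```

In this sketch the last sample is rewritten to "in January the 3rd", which is the kind of phrase the final question above is about; a production rule would presumably need an exclusion for "the" followed by an ordinal.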
Borned > Born
I have fixed more than fifty of these, I think it qualifies for AWB. ϢereSpielChequers 19:03, 31 May 2020 (UTC)
possibilites > possibilities
possibilites > possibilities is also common; I have just fixed nearly thirty, though occasionally the change should be to the French "possibilités". ϢereSpielChequers 16:58, 25 May 2020 (UTC)
- Done. Most cases of French "possibilites" are in quotations, so the rule does not affect them, and cases that are not can have the lang template added, as in "{{lang|fr|Les possibilites et les limites du travail et de la recherche scientifiques}}". This should be done anyway (see "Rationale" section of the template's documentation). In fact, I tried the rule on all articles that contained that string, and it only attempted to change one, a true positive (Samba). Chris the speller yack 23:57, 31 May 2020 (UTC)
carrer - career
I have just fixed about 80 of these, but there are complications as Carrer is both a surname and the Catalan equivalent of Street, boulevard or something like that, except upper case. So ideally the lowercase test would be carrer > career, but we could also have an uppercase test for the start, end or total of a section heading, that would pick up a plethora of == Carrer highlights == ==Political Carrer== or just plain ==Carrer== . ϢereSpielChequers 13:54, 30 May 2020 (UTC)
- Did you find any carers or carriers masquerading as carrers, or are all except the surname typos for career? Certes (talk) 14:07, 30 May 2020 (UTC)
- there may have been one or two, but almost all went to career. ϢereSpielChequers 19:03, 31 May 2020 (UTC)
- Bear in mind that the Typo rules are not applied to headings, just to plain text outside quotations, so they can't fix "==Political Carrer==". Chris the speller yack 20:44, 31 May 2020 (UTC)
- In that case can we just have a case sensitive test please? carrer > career in text was probably about half of the 80 I fixed. ϢereSpielChequers 12:55, 1 June 2020 (UTC)
- There is already an ancient Typo rule that does what you want (try it on Cesar Carrillo), but, as I pointed out above, Typo rules can't fix headings. I think the rules are doing all that can be expected. Chris the speller yack 15:51, 1 June 2020 (UTC)
- There were only four cases of these in section headings; I fixed 'em. Chris the speller yack 16:02, 1 June 2020 (UTC)
Tarbutton
This recent rule:
<Typo word="(At/Con/Dis/Re(dis))Tribute" find="\b([aA]tt|[cC]ont|[dD]ist|[rR]edist|[tT])t?(?:ribu(e[ds]|i(?:ng|on))\b|(?:[aeiou]?r(?:[iu]+)?b(?:[aeiu]+)?t(?<!arbat|[tT]ribut)|ritut)([a-z]+)\b(?<!Attribates|b(?:at(?:a(?:lis|ria|s?)|e(?:jamae|lla)?|i(?:a?|on)|or?|rix|u[ms])|et(?:ek|isonios|sk[iy]?s?|t(?:ite)?)|it(?:an?|[ho]|kan|t(?:ite)?)|ott(?:ite)?|u(?:atur|it[aeiou](?:gli(?:de)?|le|r?)|t(?:aline|h(?:ylazine)?|it|r[oy]n|t(?:ite?|s?))|utti))|conturbat(?:ed|um)|disturbator[ey]|k(?:aya|o[iy]s?)|T(?:arb(?:butt(?:on)?|et[hs]|i(?:at(?:e|ul)|t[sz]a?))|er(?:b(?:a(?:atar|tas)|itlah)|ibithia|ubetaake)|or(?:b(?:at(?:eheydarieh|[io]|ross)|i(?:at[io]|tch)|ut(?:rol|ton))|iibata)|r(?:ib(?:at(?:e|io)|et(?:ek|o(?:n|on|y)|t)|itch)|ub(?:at(?:a|ch(?:ov)?|sa)|e(?:ats|t(?:a|chin(?:o|sky)|sk(?:a|o(?:go|j))|zin))|it(?:s[iy]na?|t)|t(?:ensee|hob)|ute))|urb(?:at(?:a|hi|[iu]|or?|r(?:ix|oss))|et(?:li|ts?)|it(?:een|ity)|utt))|t(?:ax|rubed)|urbitt?s?))(?![^\s\.]*\.\w)(?<!\.[^\s\.]{0,999})" replace="$1ribut$2$3"/>
changes Tarbutton to Tributton, and Tarbutton is a surname. -- JHunterJ (talk)
- Done. It won't bite you again. I also fixed two articles where AWBers pulled the trigger when they should have been more alert. Chris the speller yack 16:23, 1 June 2020 (UTC)
qualifed qualified
Hi I have just run through Wikipedia and made changes in 59 articles from qualifed to qualified. Can we set a rule for the future? ϢereSpielChequers 13:35, 25 May 2020 (UTC)
- We also had 41 sporting qualifer[s] but 80% were [sic], correctly quoted titles or URLs, so I'm not sure whether that should be included. Certes (talk) 14:06, 25 May 2020 (UTC)
- Done, including qualifes, qualfies and qualifer[s]. Also disqualifed, unqualifed, requalifed. Chris the speller yack 22:17, 1 June 2020 (UTC)
carrer - career
I have just fixed about 80 of these, but there are complications as Carrer is both a surname and the Catalan equivalent of Street, boulevard or something like that, except upper case. So ideally the lowercase test would be carrer > career, but we could also have an uppercase test for the start end or total of a section heading, that would pick up a plethora of == Carrer highlights == ==Political Carrer== or just plain ==Carrer== . ϢereSpielChequers 13:54, 30 May 2020 (UTC)
- Did you find any carers or carriers masquerading as carrers, or are all except the surname typos for career? Certes (talk) 14:07, 30 May 2020 (UTC)
- there may have been one or two, but almost all went to career. ϢereSpielChequers 19:03, 31 May 2020 (UTC)
- Bear in mind that the Typo rules are not applied to headings, just to plain text outside quotations, so they can't fix "==Political Carrer==". Chris the speller yack 20:44, 31 May 2020 (UTC)
- In that case can we just have a case sensitive test please? carrer > career in text was probably about half of the 80 I fixed. ϢereSpielChequers 12:55, 1 June 2020 (UTC)
- There is already an ancient Typo rule that does what you want (try it on Cesar Carrillo), but, as I pointed out above, Typo rules can't fix headings. I think the rules are doing all that can be expected. Chris the speller yack 15:51, 1 June 2020 (UTC)
- There were only four cases of these in section headings; I fixed 'em. Chris the speller yack 16:02, 1 June 2020 (UTC)
aslo > also
There is an organisation called ASLO and a fictional character called Aslo, but aslo is just a typo for also. I should know: I must have fixed a hundred over the last few years. ϢereSpielChequers 13:00, 1 June 2020 (UTC)
- There is an existing and venerable Typo rule for this; things seem to be well in hand. Chris the speller yack 16:53, 1 June 2020 (UTC)
- Ah, good to know. Probably best if I keep an eye on it for "see aslo" section names. ϢereSpielChequers 11:08, 2 June 2020 (UTC)
tranverse > transverse
I have just fixed more than thirty of these, none of them seemed to be typos of traverse, all of transverse, so a good one for AWB. ϢereSpielChequers 11:10, 2 June 2020 (UTC)
Parlamentarisch
Please can we exclude parlamentarisch(e|er|en|es)? from the "Parliament" fix? I've fixed the articles affected, such as Debate. Thanks, Certes (talk) 10:06, 2 June 2020 (UTC)
boeing - Boeing
Can we add a rule to capitalise this one please? ϢereSpielChequers 08:29, 7 May 2020 (UTC)
"until to" > "until"
Got 229 search results for this phrase which again I'm pretty sure(?) is incorrect but it also might just be an obscure way of phrasing this: [26] Ionmars10 (talk) 18:57, 2 June 2020 (UTC)
- I'm having a look. Most I have found so far I have corrected to until, at least one to to, and a couple just don't make sense regardless. However the search will also bring in a number of quotes with "until to-day" or "until to-morrow", I assume AWB would leave those, so it won't be 228, but easily enough to justify a rule. Good spot. ϢereSpielChequers 22:53, 2 June 2020 (UTC)
- "until to his horror" looks like a legit "until to" to me, though perhaps a comma would be in order. ϢereSpielChequers 07:44, 3 June 2020 (UTC)
- OK, I have now run a first pass through them using AWB and fixed more than two thirds. "Until to his amazement" also looks right. I suspect some of the ones I have fixed are translation errors, and others will be typos that should have been "until 10 Month, Year". The latter I have no compunction changing to "until Month, Year". However, I don't think that AWB typo fixing would be right for this one: there are too many false positives, even after AWB screens out a bunch of them, and at least two different results, with each instance requiring a human to decide what to do. I have started putting until to into a tool designed to handle more false positives than AWB. ϢereSpielChequers 12:13, 3 June 2020 (UTC)
Disabling rules that fix hyphenation
Way back on 4 April 2018 user:Tom.Reding disabled two rules that removed useless hyphenation such as "a newly-elected dogcatcher" and "the idea was well-received by the university", claiming "context-specific", whatever that means. These rules were very heavily tested (not "heavily-tested"!). There was no discussion, and the person who inserted the rules was not notified. For over two years I thought folks running AWB were finding and fixing gratuitous hyphenation, and now I find out that two years' worth of opportunities were missed. I plan to re-enable these rules unless specific false positives can be provided and discussed. Chris the speller yack 21:03, 31 May 2020 (UTC)
- Done. As there has been no objection after 10 days, I am re-enabling the two rules. Chris the speller yack 02:42, 11 June 2020 (UTC)
Interim
Please can "Interim" not change intermède→interimède or intermédie→interimédie? (Not all implementations of \b recognise é and è as alphabetic.) Certes (talk) 23:34, 12 June 2020 (UTC)
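(Illustration of the \b point above.) A minimal sketch in Python, not the AWB engine: Python's `re` treats "è" as a word character by default, and the `re.ASCII` flag switches to the ASCII-only behaviour described. The pattern `\binterm\b` is a hypothetical stand-in, since the actual "Interim" rule is not quoted here.

```python
import re

word = "intermède"

# Default (Unicode-aware) matching: "è" counts as a word character,
# so there is no word boundary between "m" and "è".
unicode_hit = re.search(r"\binterm\b", word)

# ASCII-only matching: "è" is treated as a non-word character,
# so \b matches right after "interm" and a rule could fire mid-word.
ascii_hit = re.search(r"\binterm\b", word, re.ASCII)

print(unicode_hit is None)
print(ascii_hit is not None)
```

Under the ASCII-only interpretation the rule sees "interm" as a whole word, which is consistent with the intermède→interimède change reported.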
Mali - mali Suggestion
Hello,
I am using WPCleaner and in the article Amnat Charoen Province the cleaner is stating that mali should start with a capital letter as in the country Mali. However, this would introduce an error. "hom mali" is a dish as well therefore it does not require capitalization. It is located in the link I provided. Please do let me know if this gets changed. If I understand all of the WPCleaner information, it will keep catching this as an error.
Thank you very much, Bakertheacre (talk) 19:31, 16 June 2020 (UTC)
Enbee fix needed
<Typo word="Emb-" find="\b([eE])nb([a-z]+)\b(?<!\bEnb(?:a(?:[ns]|qom|rr?)|e(?:kshi[a-z]{0,99}|rgs?|tsu)?|i(?:lulu|se)|lend|o(?:m|rne|th)|r(?:el|idge)|u(?:kan|lufushi|n))\b)(?<!Bir Enba)" replace="$1mb$2"/>
claims to avoid Enbee but does not (e.g., on Hotel Milan). -- JHunterJ (talk) 12:57, 22 June 2020 (UTC)
Raised to the ground
I regret to inform you that 13 structures have been raised to the ground, though one case is literally true. Is this something we should correct mechanically, perhaps checking for (?!\s*floor)? Certes (talk) 11:23, 8 July 2020 (UTC)
- Yeah, likely meant it to be wikt:razed to the ground. I'd support it being added. Jerod Lycett (talk) 07:29, 15 July 2020 (UTC)
- I've fixed 11 cases. We should also exclude capital G, as in metal band "Raised to the Ground" mentioned in Jacksepticeye#Personal life. Certes (talk) 11:28, 15 July 2020 (UTC)
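(Illustration.) The suggestions in this thread could be combined roughly as follows. This is a sketch in Python rather than an actual AWB Typo rule; the names `RAISED` and `fix_raised` are made up for the example. It keeps the match case-sensitive on "ground" so the band name is skipped, and uses the proposed (?!\s*floor) lookahead.

```python
import re

# Lowercase "ground" only, so "Raised to the Ground" (the band) is left alone.
# The negative lookahead skips the literal "raised to the ground floor".
RAISED = re.compile(r"\b([Rr])aised to the ground\b(?!\s*floor)")

def fix_raised(text):
    """Replace 'raised to the ground' with 'razed to the ground'."""
    return RAISED.sub(r"\1azed to the ground", text)

print(fix_raised("The fort was raised to the ground in 1945."))
print(fix_raised("The lift was raised to the ground floor."))
print(fix_raised("He joined the band Raised to the Ground."))
```

The one "literally true" case mentioned above would still need {{Not a typo}} or a human check, since no pattern can tell it apart.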
resitential
Hi, I have just had AWB suggest that I change "resitential" to "resistantial". I think it would be better if "resitential" corrected to "residential". Have others noticed the same? ϢereSpielChequers 19:38, 20 July 2020 (UTC)
Lowercase tests
Celler is a surname, but a lowercase only test for celler > cellar is worthwhile, I've just fixed a pile of them. ϢereSpielChequers 16:58, 25 May 2020 (UTC)
- Similarly troup > troupe would be a good test, I have just corrected about a hundred. Troup is a surname, so this needs to be a case sensitive one. ϢereSpielChequers 14:56, 27 May 2020 (UTC)
- Also jains > Jains is another lowercase test. ϢereSpielChequers 21:33, 28 May 2020 (UTC)
- Done. – jain, jains, jainism – Chris the speller yack 20:47, 3 June 2020 (UTC)
- Thanks. How about troup and groud? ϢereSpielChequers 11:57, 11 June 2020 (UTC)
- Groud is a surname, but groud is a typo of ground. I fixed a couple of dozen today. ϢereSpielChequers 19:06, 31 May 2020 (UTC)
- flim and flims to film and films would be another good lowercase test. There is a town called Flims and a Star Wars character called Flim so only lowercase fix makes sense. ϢereSpielChequers 16:28, 26 June 2020 (UTC)
- Flim flam? All the best: Rich Farmbrough 08:34, 21 July 2020 (UTC).
- Good point. Most Flim Flams are uppercase, but there are enough lowercase ones that a flim test needs to exclude "flam" or "flams", the flim test itself would be a useful one though, I have corrected another bunch of them all of which must have come in since I patrolled "flim" in May. ϢereSpielChequers 11:25, 23 July 2020 (UTC)
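(Illustration.) A rough sketch of what "lowercase only" tests for celler, troup and flim might look like, written in Python rather than as AWB Typo rules. The rule list and the helper name `fix_lowercase` are hypothetical; the flim pattern includes the "flam" exclusion discussed above.

```python
import re

# Case-sensitive patterns: capitalised surnames and place names
# (Celler, Troup, Flims, Flim) never match.
RULES = [
    (re.compile(r"\bceller(s?)\b"), r"cellar\1"),
    (re.compile(r"\btroup(s?)\b"), r"troupe\1"),
    # Skip "flim flam", "flim-flam" and their plurals.
    (re.compile(r"\bflim(s?)\b(?![ -]flams?\b)"), r"film\1"),
]

def fix_lowercase(text):
    """Apply each lowercase-only rule in turn."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(fix_lowercase("a theatre troup made a flim in the wine celler"))
print(fix_lowercase("Troup County; the town of Flims; a flim flam artist"))
```

A sentence that begins with a capitalised typo ("Flim was released...") is deliberately left for a human, which is the trade-off a case-sensitive test accepts.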
Build up
I've taken out the automated replacing of "build up" with "buildup". According to the OED, "build up" is two words, not one. - SchroCat (talk) 11:51, 23 July 2020 (UTC)
can be find
It's surprising what can be find if you look. Should we fix this automatically? Smaller numbers of things "can now be find", "should be find", "could be find", etc. Certes (talk) 11:32, 29 July 2020 (UTC)
Excersice
Should excersice -> exercise be added?
currently six — Preceding unsigned comment added by MB (talk • contribs) 17:01, 31 August 2020 (UTC)
More issues
Probably since the introduction of the "efficiency" changes. Now, "long term" is "fixed" (to include the hyphen) to "long-m". I am a beginner at regex, so just reporting this for now. This is on North Devon Railway while doing a standard typo-fixing run. Thanks! After refreshing the typos, the problem no longer exists. Dawnseeker2000 17:22, 6 September 2020 (UTC)
- That one's been fixed, along with "vice president" and "on date". See also WT:AWB#Institute. Certes (talk) 18:02, 6 September 2020 (UTC)
- I've checked for other possibly problematic recent changes, and pre-emptively fixed the only ones I found: "east–west" and "west–east" (the second rule of each name). Of course, there may be other bugs both new and old which didn't match the pattern I was seeking. Certes (talk) 18:17, 6 September 2020 (UTC)
- Thank you for the quick response and explanations. Dawnseeker2000 20:32, 6 September 2020 (UTC)
- I think this is now all cleaned up. (I fixed a ve-president and a couple of long-m and short-m relationships.) However, there is a small risk that someone who opened AWB on Saturday and has not reloaded the typo list since will introduce other errors, so it's worth another check later. The institute one is awkward to check for, as ie is common (though often wrong) in other contexts. Certes (talk) 15:36, 7 September 2020 (UTC)
Profesor
Apparently Profesor is correct in Polish, and I suspect in some other languages. We currently have a couple of thousand articles with profesor. I suspect this means too many false positives for this to be useful in AWB. Should this typofix be disabled? ϢereSpielChequers 17:55, 5 September 2020 (UTC)
- wikt:profesor is valid in several languages, notably Spanish. A random selection that I checked were correct use of another language rather than typos. Perhaps it's a job for a one-off manual fix, after excluding phrases which indicate correct use such as el profesor. Certes (talk) 18:19, 5 September 2020 (UTC)
- If you're going to exclude phrases, also exclude any use of Profesor followed by a word beginning with a capital letter.--Srleffler (talk) 21:43, 5 September 2020 (UTC)
- I just checked through those, and fixed the 7 out of 677 which were errors. Certes (talk) 22:41, 5 September 2020 (UTC)
- I've also checked the 508 not followed by a capital, and fixed 7 out of 508. The rest seem correct, though I left four borderline cases (1 2 3 4) on the assumption that the previous editor has a clue. The discrepancy between my total and the original couple of thousand is because I required profesor (any capitalisation) in the source; I excluded profesör etc. and transclusion via {{Japanese Club Football}}, {{Televisa telenovelas 1970s}}, etc. This has been a useful check, but I fear that automating it would cause more false positives than improvements. Certes (talk) 12:29, 6 September 2020 (UTC)
- The problem is that it currently is in the AWB typo fixes. My argument is that it should come out of them. ϢereSpielChequers 15:04, 7 September 2020 (UTC)
- Then I agree with you. It may have done a good job in the past but seems likely to do more harm than good in future. Certes (talk) 15:42, 7 September 2020 (UTC)
Repetoire
I've just come across a problem: this regex:
<Typo word="Repertoire" find="\b([rR])ep[eir]to(?:ires?|r(?:i(?:al|es)|y))\b" replace="$1eperto$2"/>
converts Repetoire to Reperto$2 but I can't see what's wrong with it. Colonies Chris (talk) 12:39, 9 September 2020 (UTC)
- @Colonies Chris: There is only one capturing group, ([rR]), so only $1 is set. I think the first ?: needs to be removed, so $2 can be set to "ire". Or it may be better to replace the first ?: by ?= to make it a lookahead, and remove the $2. Certes (talk) 12:54, 9 September 2020 (UTC)
- I've implemented the first of these fixes, since it will give a better edit summary than the other. -- John of Reading (talk) 13:16, 9 September 2020 (UTC)
- Thanks, John of Reading. I found two more cases which may need fixing:
- <Typo word="(Dis)Colour-" find="\b([cC]|[dD]isc)olou(?:[a-ln-qs-y][a-z]*)\b" replace="$1olour$2"/>
- <Typo word="ma(d/k)e" find="\bam([dk](?:es?|ing))\b" replace="ma$1$2"/>
- I think that's all of them. Certes (talk) 13:20, 9 September 2020 (UTC)
- Hopefully fixed now. -- John of Reading (talk) 13:28, 9 September 2020 (UTC)
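(Illustration of the diagnosis above.) A minimal Python sketch counting capturing groups in the rule as quoted and in a version with the first ?: removed. Python writes the replacement as \1 and \2 where the Typo rule uses $1 and $2. The constant names and `fix_repertoire` are made up for the example, and this is not necessarily the exact text now in the rule list.

```python
import re

# Rule as quoted: the suffix alternation is non-capturing, so there is no group 2.
BROKEN = r"\b([rR])ep[eir]to(?:ires?|r(?:i(?:al|es)|y))\b"
# First suggested fix: drop the first "?:" so the suffix becomes group 2.
FIXED = r"\b([rR])ep[eir]to(ires?|r(?:i(?:al|es)|y))\b"

broken_groups = re.compile(BROKEN).groups
fixed_groups = re.compile(FIXED).groups
print(broken_groups, fixed_groups)

def fix_repertoire(text):
    """Apply the fixed rule; \\1 is the initial letter, \\2 the suffix."""
    return re.sub(FIXED, r"\1eperto\2", text)

print(fix_repertoire("Repetoire"))
print(fix_repertoire("a repetory company"))
```

Comparing the number of groups in "find" with the highest $n used in "replace" is a cheap check that would also flag the two other rules listed above.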
childrens'
This gets corrected to "children's'"... could someone more adept at regexes make it eat the extra apostrophe? Alistair1978 (talk) 18:09, 9 August 2020 (UTC)
- I fixed those I found, but I haven't changed the regex and cases are still being added. See also women's' and mens's (and even one men's's). Certes (talk) 13:34, 25 September 2020 (UTC)
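(Illustration.) One way a rule could "eat" the trailing apostrophe, sketched in Python. The existing AWB rule is not quoted in this thread, so the pattern and the name `fix_plural_possessive` are assumptions rather than the rule's real text.

```python
import re

# Consume the misplaced apostrophe as part of the match so it is replaced,
# not left behind. The lookahead avoids touching forms such as "mens's".
POSSESSIVE = re.compile(r"\b([Cc]hildren|[Ww]omen|[Mm]en)s'(?!\w)")

def fix_plural_possessive(text):
    """Turn childrens' / womens' / mens' into children's / women's / men's."""
    return POSSESSIVE.sub(r"\1's", text)

print(fix_plural_possessive("a childrens' book and the womens' team"))
print(fix_plural_possessive("the children's ward"))
```

Already-damaged forms such as "children's'" or "men's's" are not covered by this sketch and would need their own pattern or a manual pass.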
21th
Does the team think that correcting text such as "21th" would be safe and useful? I was thinking of something like (\d*[02-9]1)th → $1st, (\d*[02-9]2)th → $1nd and (\d*[02-9]3)th → $1rd. (Omitted for clarity: \b at both ends, and making \d*[02-9] a lookbehind for efficiency.) Certes (talk) 21:31, 22 September 2020 (UTC)
- 1th, 2th and 3th are also disappointingly common. I suppose we should consider abominations like 2/3ths and 3thly, and limit the lookbehind size. That may give something like 1th(ly|s)?\b(?<=\b(?:\d{0,9}[02-9])?1th(?:ly|s)?) → 1st$1, etc. but I'll await advice from a more experienced typo fixer before attempting to introduce anything. Certes (talk) 22:50, 22 September 2020 (UTC)
- I'm currently wading through some of these. The false positives include some UK postcodes and an abbreviation for Second Thessalonians. But the military and sports ones are likely typos. Not sure if there are some safe rules we can put into AWB due to false positives. Don't you love Wikipedia! ϢereSpielChequers 16:42, 26 September 2020 (UTC)
- Thank you. I was going to look at these next, but I think there are thousands of errors even once we eliminate FPs. I was limiting to lower case th, which should weed out the postcodes and biblical references. There are a few valid uses such as the John A. Wilson Building on 13 1/2th Street, quotations from erroneous sources ("Siege enters 21th day" – Daily Slapdash) and some needing non-standard correction (1/2th finals of the 2016 Moroccan Throne Cup) but most lowercase uses encased in \b seem safe to correct. Certes (talk) 17:45, 26 September 2020 (UTC)
Another question: should we assume that (for example) 3th means 3rd, or is there a significant risk that it is a typo for 4th, 13th etc? An editor making unrelated bulk edits may not have time to check sources every time an unexpected typo fix appears. Certes (talk) 21:45, 26 September 2020 (UTC)
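(Illustration.) A simplified Python sketch of the ordinal idea in this thread, folding the three proposed rules into one pattern with a lookup table. It is case-sensitive on "th", as suggested for avoiding postcodes, and it does not use the lookbehind form because Python's `re` only allows fixed-width lookbehind. The names `SUFFIX`, `ORDINAL` and `fix_ordinals` are invented for the example.

```python
import re

SUFFIX = {"1": "st", "2": "nd", "3": "rd"}

# A final 1, 2 or 3, optionally preceded by digits whose last one is not 1,
# so 11th, 12th, 13th, 111th, 213th, etc. can never match.
ORDINAL = re.compile(r"\b((?:\d*[02-9])?([123]))th\b")

def fix_ordinals(text):
    """Rewrite e.g. 21th -> 21st, 2th -> 2nd, 103th -> 103rd."""
    return ORDINAL.sub(lambda m: m.group(1) + SUFFIX[m.group(2)], text)

print(fix_ordinals("the 21th day, 2th place and the 103th Regiment"))
print(fix_ordinals("the 11th, 12th, 13th and 112th"))
```

The sketch assumes the digit is right and the suffix is wrong, which is exactly the assumption questioned above, and it would still mangle legitimate text such as "13 1/2th Street", so it is not a substitute for review.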
enmasse
Hi, AWB just suggested to me that I change "enmasse" to "emmasse". Can I suggest that "en masse" would be a better call. ϢereSpielChequers 22:22, 24 August 2020 (UTC)
- WereSpielChequers, as far as I can tell there were only three instances of this that weren't some sort of proper name, which have now all been fixed. Ionmars10 (talk) 22:52, 24 August 2020 (UTC)
- Thanks, but that's not the issue. There will be a steady trickle of these things in the future, and currently the typo rules suggest the wrong change. ϢereSpielChequers 22:54, 24 August 2020 (UTC)
- I fixed four instances of "emmasse" (all resulting from AWB typo fixing). My opinion is that editors shouldn't be fixing typos with AWB if they do not have the time to review the edits they make, but some people seem to think otherwise. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 18:51, 29 September 2020 (UTC)
- Done[27] -- JHunterJ (talk) 19:35, 29 September 2020 (UTC)
municipailty
This gets corrected to "municipalty" instead of "municipality". 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 18:31, 29 September 2020 (UTC)
- Done The existing rules "Municipal (1)" and "Municipal (2)" allow any kind of word ending (correct or incorrect), so added a new rule for fixing "-ality" suffix. -- JHunterJ (talk) 20:00, 29 September 2020 (UTC)
John Hopkins University
I've just fixed 25 of those. Is this worth adding? 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 18:38, 29 September 2020 (UTC)
- There's a "Johns Hopkins University" rule already. Is it not catching? -- JHunterJ (talk) 20:02, 29 September 2020 (UTC)
120hz → 120Hz → 120 Hz
AWB first corrected a typo, 120hz, to 120Hz. When I ran AWB on that particular page again, it corrected it again, from 120Hz to 120 Hz (by inserting a space between 120 and Hz). Could you fix it so that from now on ###hz would be corrected to ### Hz straight away? Zarex (talk) 22:00, 9 September 2020 (UTC)
- Done[28] -- JHunterJ (talk) 20:06, 29 September 2020 (UTC)
aberannt
AWB has just suggested that I "improve" an article by changing "aberannt" to "aberrannt". Can this be changed to "aberrant" please. ϢereSpielChequers 15:06, 7 September 2020 (UTC)
- Which rule suggested that fix? (The "Typos" tab in the edit box will show you which rule(s) hit on a page.) -- JHunterJ (talk) 20:08, 29 September 2020 (UTC)
marine planation
Hi, it would seem that marine planation is a real thing. Please could someone change the plantation - planation rule to leave marine planation alone. ϢereSpielChequers 10:35, 7 October 2020 (UTC)
- Planation surface mentions marine processes as a cause. The word may have other legitimate uses and we might consider removing this rule or exempting the specific word "[Pp]lanations?" from being changed. Certes (talk) 11:14, 7 October 2020 (UTC)
- Zambia seems particularly well endowed with "plantation surfaces", which I suspect are all planation. Certes (talk) 12:23, 7 October 2020 (UTC)
- Done[29] -- JHunterJ (talk) 19:23, 7 October 2020 (UTC)
Were died
Is it worth adding "were died"? In the 24 cases I just fixed, this meant "died", as in "123 people were died". Lower case, because Jonathan Binns Were actually did die. I see that a further 30 souls "was died", though one is a legitimate archaic quote. Certes (talk) 15:46, 9 October 2020 (UTC)
access-date
I've noticed recently that AWB is changing |accessdate= to |access-date=. The page I'm currently trying to edit has 218 such changes, making it very difficult to find and check my actual intended edit buried amongst them. I realise that I can turn off typo fixing entirely but would rather not. Is this a deliberate change? Certes (talk) 12:55, 27 October 2020 (UTC)
- @Certes: This change is part of the general fixes, not the typos. And yes, I've run into the same issue and have turned off the general fixes for now. It is a deliberate change, part of the long-term project to switch over to the hyphenated form of the parameter names. There's an RFC about it somewhere. -- John of Reading (talk) 13:03, 27 October 2020 (UTC)
- John of Reading, do you know if the RfC is still open? I can't find it in Wikipedia:Requests for comment/All. Ionmars10 (talk) 13:10, 27 October 2020 (UTC)
- @Ionmars10: I think the original RFC is this one from 2014. There's been a gradual move to deprecate and eventually remove the non-hyphenated forms. -- John of Reading (talk) 13:15, 27 October 2020 (UTC)
- Certes, it was caused by this edit to Wikipedia:AutoWikiBrowser/Rename template parameters. I actually noticed this myself a few days ago (see Wikipedia_talk:AutoWikiBrowser/General_fixes#accessdate_->_access-date?), and I was worried that other users would complain about it. Ionmars10 (talk) 13:05, 27 October 2020 (UTC)
- Thanks for all the replies. I'll turn genfixes off for now, as it's the only practical way to get any work done. It may be useful to get a bot to change this one for us. Certes (talk) 13:42, 27 October 2020 (UTC)
- John of Reading, perhaps it is worth adding a few words in the update description to the effect that the previous parameters names are deprecated and that current names are preferred? This may assist with reminding editors of the desired changes when they come to add new material. Neils51 (talk) 12:04, 29 October 2020 (UTC)
- @Neils51: I try to include a link to WP:AWB/GF in the edit summary whenever the general fixes have made a difference to an edit. Since the main focus of my edits is to fix spellings and grammar, any pressure to explain the relevant general fixes in the edit summary would merely be another reason to keep the general fixes turned off. -- John of Reading (talk) 17:00, 29 October 2020 (UTC)
Staes
Hi Staes is a surname, but United Staes would appear to be a typo. I've just cleared out 23 of them. I think it should be in AWB. ϢereSpielChequers 22:18, 7 November 2020 (UTC)
- Done with this set of edits -- JHunterJ (talk) 15:25, 8 November 2020 (UTC)
Please add "occured"
It should be "occurred". Xunonotyk (talk) 09:41, 9 November 2020 (UTC)
- It's already there.
<Typo word="(Re)Occurred/ing/ence" find="\b([oO]|[rR]eo)c(?:cur|ur+)(e(?:d|n(?:ces?|t))|ing)\b" replace="$1ccurr$2"/>
- -- JHunterJ (talk) 12:41, 9 November 2020 (UTC)
Cuidad
Hola! We have 147 cases of "Cuidad", usually capitalised. Wiktionary suggests that it's Spanish for [you (plural) must] take care [of someone], but can we assume that almost all enwp appearances are typos for "ciudad"? Certes (talk) 00:29, 9 November 2020 (UTC)
- Cuidad is a somewhat unusual 2nd person imp. non-reflexive conjugation form, and you could go a long time in a Spanish-speaking country before you ran into that verb form, if ever. (And if you did run into it, it would probably be the reflexive form cuidados.) So, yes, it's more likely to be the typo for "ciudad", imho. Mathglot (talk) 07:46, 9 November 2020 (UTC)
- I tried to add the typo with this edit (which I later reverted) and reloaded typos but nothing happened even when I restarted AWB. I have "Regex typo fixing" checked. The regex works in "Find and replace – Normal settings". Please can someone spot my error? 1970 in aviation is a test case. Certes (talk) 13:06, 9 November 2020 (UTC)
- If anyone is fixing these, beware that "Fabrice Cuidad" is a genuine alias of Fabrice Cuitad. I'll add that as an exception once I discover how to turn the typo on. Certes (talk) 13:22, 9 November 2020 (UTC)
- @Certes: AWB won't touch the one in 1970 in aviation, both because it is in italics and because the paragraph starts with an asterisk; see Wikipedia:AutoWikiBrowser/Typos#AutoWikiBrowser (AWB). -- John of Reading (talk) 14:31, 9 November 2020 (UTC)
Convertor
Should we remove the "Converter" entry? It only seems to catch convertor, which multiple dictionaries list as a valid alternative spelling. Certes (talk) 12:24, 16 November 2020 (UTC)
crealy - creally or clearly
Hi, AWB has just prompted me to change "crealy" to "creally" rather than "clearly" which in this case I'm doing. I don't think that creally is a word - I think we have an odd rule here. "Crealy" uppercase is a surname and should stay untouched. ϢereSpielChequers 21:16, 17 November 2020 (UTC)
- The "-ally (2)" rule does ignore Crealy, just not crealy. And the "-ally (2)" rule (like many suffix rules) may swap out one non-word for another. The question is should there be an earlier rule that catches "crealy" and assumes it should always be changed to "clearly"? -- JHunterJ (talk) 12:17, 20 November 2020 (UTC)
Short stores Short stories
I have just fixed 18 of these. Is that enough for a Short stores - Short stories rule? ϢereSpielChequers 22:27, 18 November 2020 (UTC)
- 50 Commons PDFs discuss "short stores", usually meaning a lack of saved goods. I think it's a legitimate phrase. Certes (talk) 12:56, 20 November 2020 (UTC)
- The 18 I fixed were all typos of "Short Stories". If it is legit on some Commons PDFs but not on Wikipedia then I suspect it is archaic. ϢereSpielChequers 09:25, 21 November 2020 (UTC)
Please add "Quaternary" & "Quatercentenary"
Quaternary & Quatercentenary should be added to your auto spell-check. Quaternary means four (4) & quatercentenary means four hundred (400), not (25) quarternary or quartercentenary. Here's the link that was discussed more than a year ago: Purpose of Disambiguous/Redirect Links & Spelling NKM1974 (talk) 01:52, 20 November 2020 (UTC)
- What misspellings are you encountering? -- JHunterJ (talk) 12:09, 20 November 2020 (UTC)
- A search for ~quarternary reveals 23 articles (and 1,600 Commons PDFs in File:). Some faithfully render misspelt source titles. Quartercentury only occurs in five articles citing this 25-year review, which should perhaps have been published as "A quarter century..." , and 200 Commons PDFs. Certes (talk) 12:51, 20 November 2020 (UTC)
- This is a date/time stamp from the previous year, regarding adding that word to your auto spell-check: 23:44, 18 October 2019. The left with the yellow highlight is the correct spelling, while the right with the blue highlight is the wrong spelling. I hope this proof of evidence is the reason why quaternary & quatercentenary should be added in your auto spell-check. NKM1974 (talk) 13:23, 20 November 2020 (UTC)
- Some points of clarification: That edit was by a single AWB user, the word changed was "quater", not "quaternary" or "quatercentenary", and the list here is of typos, not spellchecks. So we'd need to decide what typo(s) (or misspellings) to fix. -- JHunterJ (talk)
- This has happened a total of 3 times. Here are 2 links with time/date stamp from: 12:10, 15 July 2019 & 16:53, 31 January 2018. The left with the yellow highlight is the correct spelling, while the right with the blue highlight is the wrong spelling. Here's the Wiktionary link with no "r" to quaternary & quatercentenary. Here are 2 Google links: Showing results for quaternary & Did you mean: quatercentenary. I don't know if this is persuasive enough to include these 2 words in your entry. If not, at least I tried. NKM1974 (talk) 01:41, 21 November 2020 (UTC)
- @NKM1974: The prefix "quater" was marked with {{not a typo}} in this edit on 19 October 2019. That stops AWB's respelling rules from changing it, and should deter anyone else from changing it manually without thinking hard about it first. -- John of Reading (talk) 08:55, 21 November 2020 (UTC)
- And again, the list here isn't a list of correctly spelled words. It's a list of typos. So we can't add the correct words "quaternary" and "quatercentenary"; we'd need to add the misspellings of them that need to be corrected to "quaternary" or "quatercentenary". Which is why I keep asking for which misspellings need to be added. -- JHunterJ (talk) 12:55, 21 November 2020 (UTC)
- We have 13 examples of "quartercentenary " in wikipedia. I think that at least some of them are the archaic use, and I'm not volunteering to go through and check because I don't see them confusing anyone. As for becoming an AWB rule, even if all of them were typos of "quatercentenary " it is simply too rare for an AWB rule - we are looking for at least twenty occurrences before it becomes worth automating. ϢereSpielChequers 14:28, 28 November 2020 (UTC)
Aircraft naming conventions
Hello everyone. Aircraft naming conventions typically use a hyphen. When aircraft aficionados say them out loud they'll say "dash", but when written it's a hyphen. I'm bringing this up because by default, AWB will suggest a fix for one of these situations. I'd like to start by showing the Boeing types, but I'm certain there are plenty of others in use that may need to be accounted for.
This screenshot shows several instances of a typical Boeing model designation and AWB suggesting fixes for what it perceives as a fix for a range of figures. I'd like to suggest that we exclude these types of fixes. I'll post over on the Aviation Wikiproject to see if some of the editors have other suggestions to exclude. Dawnseeker2000 05:02, 3 December 2020 (UTC)
- Yup, this is annoying.--Marc Lacoste (talk) 06:27, 3 December 2020 (UTC)
- There's always the {{Not a typo}} template, but as far as I can tell you have to mark every single instance of, say, 737-200 in the entire article to prevent AWB "fixing" them. It can be done, of course, but it makes the raw article very hard to read when editing. Perhaps it would be better if AWB ignored all instances of a pattern once one was marked with {{Not a typo|737-200}}. Or perhaps better still, a new template could be placed at the top of a page, listing AWB exceptions that would apply to the whole page, e.g. {{Not typos|737-100|737-200|737-300|...}} or ideally {{Not typos|737-*|757-*}}. Rosbif73 (talk) 07:36, 3 December 2020 (UTC)
- Something like (?<!\bBoeing\b.{1,999}7\d\d-\d+) at the end of the "2-1" rule might help, catching the case where the word "Boeing" occurs earlier in the same paragraph as "7xx-xxx". This works in a regular "Find and Replace" rule, but I'm not sure what AWB does to the article text before running the AWB/T regular expressions on it. -- John of Reading (talk) 08:08, 3 December 2020 (UTC)
- Sanity check needed here. All US military designations similarly use a hyphen, though it is silent: F-16, B-2 and so on. Other nations also often include hyphens. Having uninformed users constantly going round corrupting our hundreds, perhaps thousands, of Wikipedia articles would necessitate a constant tail-chase to go round sorting out the mess. Please, this absolutely has to be an exception programmed in from ground zero; AWB really does need to be on top of this. — Cheers, Steelpillow (Talk) 11:57, 3 December 2020 (UTC)
- I've stopped AWB making many such changes and I'm sure I've let others slip through in my name. If - is right and – is wrong then we do need to do something about this error. Certes (talk) 12:12, 3 December 2020 (UTC)
- I have disabled the "2-1" rule, since it clearly needs more work to avoid false positives. Courtesy ping Chris the speller. -- John of Reading (talk) 12:29, 3 December 2020 (UTC)
- Almost all aircraft designations worldwide use a typed "hyphen" and not any form of dash, so AWB needs to be adjusted for this reality. In the recent past I have seen AWB pass-throughs of aircraft articles that all need reverting for just this reason. Marc is right - it is annoying. Please fix it. - Ahunt (talk) 12:54, 3 December 2020 (UTC)
- I will re-enable the rule after adding a check for 7x7-xxx, which will take care of the problem for Boeing models, but first I will clean up about 280 articles that have en dashes in Boeing model numbers. Have you seen similar false positives that are not for Boeing aircraft? Chris the speller yack 19:00, 3 December 2020 (UTC)
- Boeing model numbers are the most prone to this, in that they are purely numeric-hyphen-numeric. Most other manufacturers' model numbers have an alphabetic character that stops this rule from hitting, e.g. Annn-nnn for Airbus, though they are sometimes seen incorrectly written without the A. Rosbif73 (talk) 21:01, 3 December 2020 (UTC)
- I have added a check for Boeing-style model numbers (7x7-xxx) to the "2–1" rule, and re-enabled it, and also to the "0–0" rule, as most occurrences are not immediately preceded by "Boeing". I also added a check for preceding "Dash", as in "Dash 8-200"; there were not many of these. I fixed a few hundred Boeing model numbers, not all of which were damaged by this rule. Some even had a minus sign instead of a hyphen. I just now noticed the irony in the fact that "Dash 8-200" doesn't use a dash. Thanks to all for the feedback, and I guess we're Done. Chris the speller yack 15:38, 4 December 2020 (UTC)
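The kind of exclusion described in this thread can be sketched roughly as follows. This is only an illustration in Python, not the actual "2–1" rule: the pattern RANGE_RULE and the helper fix_ranges are hypothetical names, and the pattern is a simplified stand-in that assumes a plain digits-hyphen-digits range. AWB itself runs .NET regular expressions.

```python
import re

# Hypothetical stand-in for a digit-range rule (NOT the real AWB "2-1" rule):
# turn "120-125" into "120–125", but skip Boeing-style 7x7-xxx designations.
RANGE_RULE = re.compile(r"\b(?!7\d7-)(\d+)-(\d+)\b")


def fix_ranges(text: str) -> str:
    """Replace the hyphen in a numeric range with an en dash."""
    return RANGE_RULE.sub(r"\1–\2", text)


print(fix_ranges("pages 120-125"))                   # range gets an en dash
print(fix_ranges("the Boeing 737-200 and 757-300"))  # model numbers are left alone
```

Note that the variable-width lookbehind suggested earlier in the thread cannot be tried in this form with Python's re module, which only accepts fixed-width lookbehind, so the sketch uses a lookahead guard instead.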
canister
AWB suggested cannisters -> canistes. The correct change was cannisters -> canisters. Canistes is a rare genus; the term only appears four times in all WP. I think if someone types "cannisters", they are much more likely to mean "canisters". In fact, Cannisters is a redirect to Canister. MB 02:40, 12 December 2020 (UTC)
- @MB: Fixed -- John of Reading (talk) 08:30, 12 December 2020 (UTC)
Unites States - United States
I have just corrected over 300 of these. Can we have a case sensitive test Unites States - United States (unites states would create a false positive). Thanks ϢereSpielChequers 18:54, 26 November 2020 (UTC)
- I think <Typo word="United States" find="\b[uU]n(?:ite?|[it]e)[ds]\s*[sS]t(?:ate?|[at]e)[ds]\b(?<!United States)" replace="United States"/> should catch that. -- JHunterJ (talk) 00:22, 27 November 2020 (UTC)
- That may be a bit broad. We should avoid text like "In January 2018, United stated that..." (History of United Airlines), "Airdrie United stats" (Kevin Watt) and "Table 8 on non-SI units states:" (Molar mass constant). Certes (talk) 00:44, 27 November 2020 (UTC)
- Agreed, but "united sates" would be safe to add except for one false positive "UNITED SATES". "unite states" also has false positives. ϢereSpielChequers 06:51, 27 November 2020 (UTC)
- Darn, I'll never get my chain of franchised Thai-Vietnamese fast food joints into Wikipedia, now. Mathglot (talk) 09:02, 12 December 2020 (UTC)
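The concern about breadth can be checked with a quick sketch. The pattern below is the one proposed in this thread; the sample strings are the false positives listed in the reply. Python's re module is used here on the assumption that it treats this particular pattern the same way as AWB's .NET engine does.

```python
import re

# The proposed pattern, copied from the thread above.
PROPOSED = re.compile(
    r"\b[uU]n(?:ite?|[it]e)[ds]\s*[sS]t(?:ate?|[at]e)[ds]\b(?<!United States)"
)

samples = [
    "Unites States",                        # the typo to fix
    "United States",                        # already correct, must not match
    "In January 2018, United stated that",  # false positives from the thread
    "Airdrie United stats",
    "Table 8 on non-SI units states:",
]

for text in samples:
    hit = PROPOSED.search(text)
    print(f"{text!r}: {hit.group(0) if hit else None}")
```

The three legitimate phrases all match, which supports preferring a narrower, case-sensitive rule.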
False positive: España anyvowel
In "España Boulevard", JWB (and presumably AWB) wants to change "España ends" to "Españan ends". I think the ñ|a boundary matches \b, so rule "A to An" is trying to change the indefinite article from "a ends" to "an ends". Can we do better? Certes (talk) 19:13, 19 December 2020 (UTC)
- @Certes: AWB doesn't try to change "España ends". GoingBatty (talk) 20:34, 19 December 2020 (UTC)
- Thanks, that's interesting! Perhaps AWB is better than JWB at realising that ñ is a letter. Certes (talk) 21:49, 19 December 2020 (UTC)
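One plausible explanation for the difference is whether \b treats ñ as a word character. The sketch below illustrates the effect in Python, where the re.ASCII flag switches \b to an ASCII-only definition. The pattern A_BEFORE_VOWEL is a hypothetical, much simplified stand-in for the "A to An" rule, and nothing here claims to reproduce what JWB or AWB actually do internally.

```python
import re

# Hypothetical stand-in for the "A to An" rule: a lone "a" before a vowel.
A_BEFORE_VOWEL = r"\ba (?=[aeiou])"
text = "Espa\u00f1a ends"  # "España ends"

# Unicode-aware \b (Python's default for str): ñ is a letter, so there is
# no word boundary between ñ and the final a.
unicode_hit = re.search(A_BEFORE_VOWEL, text)

# ASCII-only \b: ñ counts as a non-word character, so "ñ|a" is a boundary
# and the trailing "a " looks like a standalone indefinite article.
ascii_hit = re.search(A_BEFORE_VOWEL, text, flags=re.ASCII)

print(unicode_hit)
print(ascii_hit)
```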
False positive: one day
In "Chapter 13: The Jedi", AWB wants to change "Ahsoka gives her one day to surrender" to "Ahsoka gives her one-day to surrender". Could someone please tweak the appropriate rule to fix this false positive? Thanks! GoingBatty (talk) 17:47, 19 December 2020 (UTC)
- Fixed with this edit. -- JHunterJ (talk) 14:49, 20 December 2020 (UTC)
aquire -> acquire
There are currently over 30 of these... MB 22:30, 2 January 2021 (UTC)
- Almost all are citations, though I've not checked whether it's the source or the citing editor who kant spel. Certes (talk) 22:33, 2 January 2021 (UTC)
Date ranges
In Duke of Ferrara and of Modena, AWB typo fixes will change "(1597-1796, 1814-1859)" to "(1597-1796, 1814–1859)" which fixes the second year range but not the first year range. In the same article, it doesn't change ranges such as (1490-1757). Could someone please update the rule so it changes all the ranges or none of the ranges? Thanks! GoingBatty (talk) 04:24, 13 January 2021 (UTC)
- I've noticed similar behaviour in several other articles, though I can't cite a link. Consistently using the wrong sort of dash is probably less bad than becoming inconsistent. Certes (talk) 10:22, 13 January 2021 (UTC)
- Yeah, this has been annoying me for a while. It means I have to manually invoke the dashes script to get everything back to consistency. It was suggested at some point that the regexes from that script be included as part of AWB, which would be significantly more convenient. Ionmars10 (talk) 13:33, 13 January 2021 (UTC)
- List of historical ships in British Columbia is another example where genfixes and typo fixes don't fix all year ranges. I've opened an AWB bug to ask if genfixes can be expanded. I'm also hoping someone can update the typo fix rule. Thanks! GoingBatty (talk) 16:49, 17 January 2021 (UTC)
Degree capitalization
I just saw "Master's degree" changed to "master's degree", and I recall it does that with Bachelor also, but it left "Doctorate degree" capped. Also, degrees are frequently wiki-linked and if so, they are left upper case. Can this be improved? MB 23:08, 23 January 2021 (UTC)
Omits
@Tom.Reding: The new typo fix changes ommits to omitts, but I'm not sure how best to fix that. I think find="\b([oO])mmi(?=t)t*(ted|ting|s)\b" replace="$1mit$2" works but might be more expensive. Certes (talk) 15:29, 26 January 2021 (UTC)
- Which is basically what I had done before Tom Reding; except with "ted|ting" instead of "t(?:ed|ing)". 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 18:39, 28 January 2021 (UTC)
- So you did. Ah well, it's sorted out now. Certes (talk) 19:10, 28 January 2021 (UTC)
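For reference, the suggested find/replace pair can be tried out with a short Python sketch. The helper name fix_omit is made up for the example, and the .NET replacement "$1mit$2" is written as \1mit\2 for Python's re module.

```python
import re

# The suggested pattern from the thread above.
OMIT_RULE = re.compile(r"\b([oO])mmi(?=t)t*(ted|ting|s)\b")


def fix_omit(text: str) -> str:
    """Apply the suggested replacement to every match in text."""
    return OMIT_RULE.sub(r"\1mit\2", text)


for word in ["ommits", "ommited", "ommitted", "ommiting", "Ommitting"]:
    print(word, "->", fix_omit(word))
```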
Abbreviations
"e.g," and "i.e," are typos, right? 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 18:39, 28 January 2021 (UTC)
- They are, with \b prefixed to exclude a few oddities. Can we always simply turn the comma into a full stop or do we sometimes need to replace with "i.e.,"? Certes (talk) 19:17, 28 January 2021 (UTC)
- I would always do the latter. 𝟙𝟤𝟯𝟺𝐪𝑤𝒆𝓇𝟷𝟮𝟥𝟜𝓺𝔴𝕖𝖗𝟰 (𝗍𝗮𝘭𝙠) 19:19, 28 January 2021 (UTC)
Localized edit summary
Is it possible to have a localized edit summary instead of the English "typos fixed: x -> y" when editing other Wikipedias using AWB? 85.76.140.34 (talk) 15:49, 10 March 2021 (UTC)
- There isn't an option to change the typo portion of AWB's automatic edit summary, but you could manually change each edit summary before saving your edit. GoingBatty (talk) 04:20, 11 March 2021 (UTC)
- Alternatively, you could use the local equivalent of "typos fixed" as the default edit summary copied to each edit, so it reads "errata fixi – typos fixed: x → y" or whatever. Certes (talk) 11:58, 11 March 2021 (UTC)
Ad-hoc
@Chris the speller: Re this addition, I have been advised that "ad-hoc" is in the OED. -- John of Reading (talk) 07:07, 2 April 2021 (UTC)
- @John of Reading: John, thanks. Sorry that it caused trouble. I changed the rule to allow the hyphen. I guess there won't be an easy way to get the hyphens out of articles that use American English. Chris the speller yack 14:13, 2 April 2021 (UTC)
False positive
I'm not very good at regex but there is a false positive with typo fixing from AWB that is changing "synthases" to "syntheses" which is wrong when referring to the enzyme malate synthase. I have pasted the section of the regex which is making this happen.
<Typo word="Synthesis" find="\b([sS])ynth[ai]s(es|i(?:s(?:e[drs]?)?|ze[drs]?))\b" replace="$1ynthes$2"/>
I can easily spot and stop such edits from happening but it would be very helpful if the regex was tweaked to stop AWB from changing "synthases" to "syntheses". Pkbwcgs (talk) 17:11, 14 April 2021 (UTC)
- "Synthese" and "syntheses" aren't even words so I don't understand why AWB is trying to correct "synthase" and "synthases" which are correct. Pkbwcgs (talk) 17:15, 14 April 2021 (UTC)
- Ah, just realised that I should have put this on Wikipedia talk:AutoWikiBrowser/Typos instead. Pkbwcgs (talk) 17:17, 14 April 2021 (UTC)
- Moved. ~ Tom.Reding (talk ⋅dgaf) 17:28, 14 April 2021 (UTC)
- "Syntheses" is the plural of "synthesis", at least according to some dictionaries. Since "synthase" and "synthases" are also words, this pattern should probably be removed, since it's not an automatic typo fix every time. – Jonesey95 (talk) 18:38, 14 April 2021 (UTC)
- We should probably remove es| so it only matches a subset of synth.si.*, not synth.ses. That will also skip synthises, but it's unclear whether that is a substitution typo for plural syntheses or a transposition typo for the more common singular synthesis. Certes (talk) 20:02, 14 April 2021 (UTC)
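The effect of dropping the es| alternative can be seen in a small Python sketch. CURRENT is the rule as quoted at the top of this thread and TRIMMED is the same pattern with es| removed; the helper run_rule and the sample words are illustrative only.

```python
import re

# The rule as quoted in the thread, and the same rule with "es|" removed.
CURRENT = re.compile(r"\b([sS])ynth[ai]s(es|i(?:s(?:e[drs]?)?|ze[drs]?))\b")
TRIMMED = re.compile(r"\b([sS])ynth[ai]s(i(?:s(?:e[drs]?)?|ze[drs]?))\b")


def run_rule(rule: re.Pattern, text: str) -> str:
    """Apply the rule's replacement ("$1ynthes$2" in AWB syntax)."""
    return rule.sub(r"\1ynthes\2", text)


for word in ["synthases", "synthasis", "synthisize", "synthises"]:
    print(f"{word}: current -> {run_rule(CURRENT, word)}, "
          f"trimmed -> {run_rule(TRIMMED, word)}")
```

With the trimmed pattern, "synthases" and "synthises" are both left untouched, while "synthasis" and "synthisize" are still corrected.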
Through out
"through out" is almost invariably a typo of "throughout", though the test could be ""through out" -"through out of court"" there are 473 at present, though a few are "through-out" which may be an OK variant. ϢereSpielChequers 07:45, 9 May 2021 (UTC)
- I am concerned it could be a sound-alike for "threw out" (as in the court example). If people don't know the difference between there, their, and they're, can we expect them to distinguish through from threw? --Gronk Oz (talk) 10:03, 10 May 2021 (UTC)
- Chain sinnet, Dave Glinka and Operation Flavius have rare correct uses of "through out", though we shouldn't let a few false positives stop a good fix. We also need to look ahead to skip cases with a hyphen after "out", e.g. "through out-of-pocket payments". A few also deserve a hyphen, such as "through out of hours centres" in Glasgow. Certes (talk) 10:38, 10 May 2021 (UTC)
- Thanks, I have fixed a few manually, but at present I don't have a machine that runs AWB so it is time consuming. I have not yet seen one that would be better as threw, but AWB is not a fully automated system, I'm more relaxed about ones where people have to tweak the result than ones where they will keep coming up as a false positive until someone accidentally accepts it. ϢereSpielChequers 15:52, 19 May 2021 (UTC)
- @WereSpielChequers: I added a new rule "Throughout (2)" to change "through out" to "throughout", except for "through out-" and "through out of". GoingBatty (talk) 16:40, 19 May 2021 (UTC)
- Thanks GoingBatty, much appreciated. Will that also handle "through-out"? ϢereSpielChequers 17:02, 19 May 2021 (UTC)
- @WereSpielChequers: It does now! GoingBatty (talk) 02:20, 20 May 2021 (UTC)
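A rule with the behaviour described in this thread might look something like the sketch below. The real "Throughout (2)" rule text is not quoted here, so the pattern THROUGHOUT and the helper fix_throughout are assumptions made purely for illustration.

```python
import re

# Hypothetical approximation of the behaviour described above: join
# "through out" or "through-out", except before "-" or " of".
THROUGHOUT = re.compile(r"\b([tT])hrough[ -]out\b(?!-| of\b)")


def fix_throughout(text: str) -> str:
    """Join the two words while keeping the original capitalisation."""
    return THROUGHOUT.sub(r"\1hroughout", text)


print(fix_throughout("popular through out the year"))
print(fix_throughout("funded through out-of-pocket payments"))
print(fix_throughout("care through out of hours centres"))
```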
British army (capitalization query)
So I discovered there were 3,726 results for potential incorrect capitalization of British Army. This includes all instances of British army (linked or unlinked). Some of these would not need to be changed, but I am finding that many do, according to MOS:MILTERMS. Is this something that's been looked into previously? Dawnseeker2000 09:42, 18 May 2021 (UTC)
- I occasionally patrol British Amy, but I don't recall discussions here re Army. ϢereSpielChequers 08:42, 20 May 2021 (UTC)
- I recently changed a load of links from British [[Army]] to [[British Army]], and some [[Army]] links to [[British Army|Army]] where context suggested it, but I didn't edit the lowercase ones. British army is usually a typo for the whole British Army, but it can refer to one army (unit) such as British Army Germany. Certes (talk) 10:06, 20 May 2021 (UTC)
'lifelong'
"life long" is fixed into "lifelong" OK. But sometimes the wording is "half-life long ..." , which should not be edited. (In radioactivity, eg neptunium). Can this be excepted? -DePiep (talk) 16:35, 11 May 2021 (UTC)
- @DePiep: I think this could be fixed by changing the rule from "\b([lL])ife... to "\s([lL])ife... - what do you think? GoingBatty (talk) 02:41, 13 May 2021 (UTC)
- Yes - I see some potential FPs with ['"]([lL])ifelong, so I think \s over \b as well. ~ Tom.Reding (talk ⋅dgaf) 10:25, 13 May 2021 (UTC)
- Would a negative lookahead for " *enough" be better? (Special:Search/"life long enough") Certes (talk) 10:46, 13 May 2021 (UTC)
- I am not familiar enough with typo-regexes to !vote a solution; I do see that \s solves the incident. As for "...*enough": indeed hard to think of another wording in half-life context. But as \s is the simplest solution, that would do (unless we find errors in that one). -DePiep (talk) 11:43, 13 May 2021 (UTC)
- \s doesn't exclude most of the 17 FPs in my search, e.g. Loaded Weapon 1 "…clings to life long enough to…" Certes (talk) 12:04, 13 May 2021 (UTC)
- Done. I changed the rule to ignore "life long" preceded by a hyphen or followed by " enough". Chris the speller yack 02:54, 4 June 2021 (UTC)
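The three approaches discussed in this thread can be compared with a short sketch. The full AWB rule is elided above, so all three patterns below are simplified stand-ins written for illustration, and the sample sentences are made up.

```python
import re

# Simplified stand-ins; the real rule text is not quoted in this thread.
WITH_B = re.compile(r"\b[lL]ife long\b")   # \b anchor, as in the original rule
WITH_S = re.compile(r"\s[lL]ife long\b")   # the \s suggestion
GUARDED = re.compile(r"(?<!-)\b[lL]ife long\b(?! enough)")  # hyphen/"enough" guards

samples = [
    "a life long friend",                # genuine candidate for "lifelong"
    "a half-life long compared with",    # hyphen before "life"
    "clings to life long enough to",     # followed by " enough"
]

for text in samples:
    flags = [bool(p.search(text)) for p in (WITH_B, WITH_S, GUARDED)]
    print(f"{text!r}: \\b={flags[0]}, \\s={flags[1]}, guarded={flags[2]}")
```

The \s variant handles the half-life case but still matches "life long enough", which is the gap the lookahead closes.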
Year Contact - year Contract
Hi, please could we add a test for "year contact" to "year contract" there are over a hundred current examples. I've done the much rarer month contact, but this needs AWB. ϢereSpielChequers 22:40, 28 June 2021 (UTC)
- @WereSpielChequers: Doing... GoingBatty (talk) 02:54, 29 June 2021 (UTC)
- @WereSpielChequers: Done, with no false positives. GoingBatty (talk) 04:26, 29 June 2021 (UTC)
Scores changed to X to Y, from X–Y.
Hi! Recently did a run, didn't realise this change has happened: https://en.wikipedia.org/w/index.php?title=Graeme_Dott&oldid=prev&diff=1031111390. I can't see any specific thing that I have set up that would cause this to be made as a change - is this new? — Preceding unsigned comment added by Lee Vilenski (talk • contribs) 08:20, 30 June 2021 (UTC)
- @Lee Vilenski: If you load the Graeme Dott article in AWB and click the Typos tab, you can see the regex for the rule is making that change. Looks like JHunterJ added the "From 0 to 1" rule in September 2020. (Please remember to sign your posts on talk pages by typing four keyboard tildes like this: ~~~~. Or, you can use the [ reply ] button, which automatically signs posts.) GoingBatty (talk) 16:39, 30 June 2021 (UTC)
- The from one should probably be removed, it's quite normal wording in sports to say that a team went from %Score% to %score% Best Wishes, Lee Vilenski (talk • contribs) 16:44, 30 June 2021 (UTC)
- @Lee Vilenski: Maybe the rule could be restricted to only work if X is a four-digit year? GoingBatty (talk) 04:41, 2 July 2021 (UTC)
- There are scoring systems that do use points of that size, think decathlon or English billiards. I have no issues with the other typo being fixed, but this isn't going to always be a typo. Best Wishes, Lee Vilenski (talk • contribs) 12:35, 2 July 2021 (UTC)
- I do not believe the rule needs to be removed. It is indeed normal to say that a team went from 2 to 3 (for example), and the rule fixes "the team went from 2-3." to "the team went from 2 to 3." The specialized usage here, the team "went from 10-12 behind to ... 13-12", is something that the AWB user should catch when reviewing the edits, IMO. -- JHunterJ (talk) 14:49, 5 July 2021 (UTC)
Cite web
A search for "cite web" brings up 1,948 articles where the phrase appears in the visible text (not just the wikitext source). From a quick check, most of these are caused by the template name splitting over two lines, e.g.
The sky is blue.<ref>{{cite
web|url=example.com|title=Celestial hue}}</ref>
This produces the usual reference superscript ([1]) in the article body but renders in the References section as
1. ^ {{cite web|url=example.com...
Replacing the newline by a space fixes the problem. Is this the sort of thing that typo fixing should include? Related templates may have similar problems. Certes (talk) 01:54, 1 July 2021 (UTC)
- @Certes: When using AWB/JWB/WPCleaner, the typo rules are not applied within templates. They apparently are applied everywhere when using wikEd, but I'm not familiar with that. How about I submit a bot request instead? GoingBatty (talk) 13:37, 1 July 2021 (UTC)
- @GoingBatty: That sounds like a good idea, thanks. It could be even as generic as fixing (\{\{\s*\w+)\s*\n\s*(\w+) → $1 $2 if that's felt to be safe. I'm sure a BRFA would spot any legitimate uses that I may have missed. Alternatively, could it be a new AWB genfix? Certes (talk) 13:51, 1 July 2021 (UTC)
- @Certes: You could request a new AWB genfix by using the Phabricator links at Wikipedia_talk:AutoWikiBrowser. GoingBatty (talk) 14:55, 1 July 2021 (UTC)
- Thanks. Although my original report applies to hundreds of cases, I'm finding other types of error which make "cite web" appear, such as }} missing, so I'll try to classify and count them before making a combined proposal. Certes (talk) 15:32, 1 July 2021 (UTC)
- (edit conflict) @Certes: I searched the June 20 database dump for (\{\{\s*\w+)\s*\n\s*(\w+) and found 283 matches. Based on this edit and this edit and this edit (and a few others) I made today, I'd be more comfortable with a more specific bot fix, such as (\{\{\s*[Cc]ite)\s*\n\s*(book|journal|news|web|GovTrack) → $1 $2 and then fix others manually. BRFA filed. GoingBatty (talk) 20:48, 1 July 2021 (UTC)
- Thanks, GoingBatty. Not far off my latest guess of 375 (of which I've fixed 15 when refining my regexes). I've identified a couple of similar types of error, and you may (or may not) find the clues here helpful. Certes (talk) 20:58, 1 July 2021 (UTC)
- Should we keep the Phab ticket? A bot will be perfect for an initial run, but it may still be useful to get this into AWB eventually, both to handle future new errors and perhaps to also fix the missing braces which seem to cause the bulk of the errors I identified initially.
- @Certes: AWB should already fix (or at least provide alerts) about the missing braces. Having a new alert for the newline in the middle of the template name would be nice too. GoingBatty (talk) 21:24, 1 July 2021 (UTC)
- @Certes: My bot and I have fixed all the instances of (\{\{\s*\w+)\s*\n\s*(\w+) from the June 20 database dump. While some were as simple as changing to $1 $2, many involved reverting an unsourced addition that broke the template (e.g. text incorrectly added between {{cite and web). Happy editing! GoingBatty (talk) 00:25, 7 July 2021 (UTC)
- Thank you very much. I see that you've fixed several hundred articles. Certes (talk) 01:04, 7 July 2021 (UTC)
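As an illustration of the narrower fix discussed in this thread, the sketch below applies the cite-specific pattern to a split template name. It is written in Python rather than as a bot task, the helper name rejoin_template_name is made up, and "$1 $2" is expressed as \1 \2 for Python's re module.

```python
import re

# The narrower pattern from the thread above.
SPLIT_CITE = re.compile(
    r"(\{\{\s*[Cc]ite)\s*\n\s*(book|journal|news|web|GovTrack)"
)


def rejoin_template_name(wikitext: str) -> str:
    """Replace the newline inside a split "cite xxx" template name by a space."""
    return SPLIT_CITE.sub(r"\1 \2", wikitext)


broken = "The sky is blue.<ref>{{cite\nweb|url=example.com|title=Celestial hue}}</ref>"
print(rejoin_template_name(broken))
```

As noted in the thread, many real cases needed a manual revert rather than this simple substitution, so a sketch like this only covers the easy ones.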
Past records
We'll need to exercise discretion when changing past records → records. Music articles tend to use the phrase, e.g. The Burning World (album) "differs from the sound from their past records" (but not their future records). Certes (talk) 00:31, 8 July 2021 (UTC)
Webiste-Website
Hi, I have been doing the occasional check for "webiste" for a decade or so. But I'm pretty sure it would merit an AWB test. ϢereSpielChequers 16:49, 9 July 2021 (UTC)
- @WereSpielChequers: The existing "Website (1)" rule already fixes "webiste" (e.g. this edit) GoingBatty (talk) 20:20, 9 July 2021 (UTC)
Chair of the board
I just made an edit to Anna Wishart which changed the capitalisation of 'Chair'. However the full phrase was 'Chair of the Board' and it left the 'Board' capitalised. Having googled the grammar behind this it seems it should all be lower case. I looked through the rules and it looks like a rule should already be catching this but my regex isn't so hot, and it clearly missed this one. Could someone check / update the rule if there's no particular reason this case shouldn't have been fixed? Jamesmcmahon0 (talk) 14:35, 9 August 2021 (UTC)
- I made an edit for the rule for 'Chairman of the board', now called 'Chair of the board' - I tested my regex against a few pages found through the DB dump. Seems fine to me but I'd appreciate if someone could check! Jamesmcmahon0 (talk) 15:11, 9 August 2021 (UTC)
Typo word relay
The typo word "relay" seems to miss examples that I would expect it to catch. I had a look at the regex and I'm either missing something or it's more complicated than it needs to be...
The current regex is <Typo word="Relay" find="\b4(?: (?:x\s?|×)|[x×]\s?)([248]00|15?00)\s*m\b" replace="4 × $1 m"/>
I think \b4\s?[x×]\s?((?:[248]|15?)00)\s*m\b would be simpler and more optimised?
It doesn't seem to catch cases where the × character is already in use, e.g. Jamaica; I'm not sure what's going on there. I would also think it could be extended to catch metres: \b4\s?[x×]\s?((?:[248]|15?)00)\s*m(?:etres?)?\b (with the relevant replacement)? Jamesmcmahon0 (talk) 10:46, 13 August 2021 (UTC)
- The complication avoids replacing good text by itself, which would pollute the edit summary and might cause other problems. Does Jamaica need correction? Certes (talk) 12:57, 13 August 2021 (UTC)
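The point about replacing good text by itself can be seen by running both patterns over a few spellings. This is a Python sketch: CURRENT is the rule as quoted in this thread, SIMPLER is the proposed shorter pattern written with balanced parentheses, and the sample strings are made up.

```python
import re

# The existing rule and the proposed simpler pattern, both from the thread above.
CURRENT = re.compile(r"\b4(?: (?:x\s?|×)|[x×]\s?)([248]00|15?00)\s*m\b")
SIMPLER = re.compile(r"\b4\s?[x×]\s?((?:[248]|15?)00)\s*m\b")

samples = ["4x100m", "4 x 400 m", "4×100 m", "4 × 100 m"]

for text in samples:
    print(f"{text!r}: current={bool(CURRENT.search(text))}, "
          f"simpler={bool(SIMPLER.search(text))}")
```

Only the simpler pattern matches the already correct "4 × 100 m", which would then be replaced by identical text and still be reported as a fix.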
Belarussian
I would like to suggest the addition of 'Belarussian' (currently around 500). People of Belarus are Belarusian.
Suggestion is; <Typo word="Belarusian" find="\b[Bb]elarussian\b" replace="Belarusian"/>
- Neils51 (talk) 21:20, 17 August 2021 (UTC)
- @Neils51: Looks like "Belarussian" is a valid alternate spelling - see https://www.dictionary.com/browse/belarussian GoingBatty (talk) 21:34, 17 August 2021 (UTC)
- Thanks GoingBatty. I note though that if an attempt has been made to use the 'Belarussian' spelling in an article title that a redirect has been put in place. All good. - Neils51 (talk) 04:15, 18 August 2021 (UTC)
ladies' singes
Looking through our twenty articles that refer to "ladies' singes" I couldn't find any that were euphemisms for alternatives to waxing, they were all typos for "ladies' singles". So a rule would make sense. "singes title(s)" was another 11. ϢereSpielChequers 22:57, 19 August 2021 (UTC)
Suggestion: "Head of (Department|Operations) etc.
And likely others. Dawnseeker2000 00:51, 24 August 2021 (UTC)
- @Dawnseeker2000: Hi there! Could you please clarify what change you want the typo rule to make? Thanks! GoingBatty (talk) 04:11, 24 August 2021 (UTC)
- Yes, sorry. I am dovetailing off the work that Chris the speller has been doing now for quite some time. He's been creating rules for capitalization changes for job titles and this suggestion is kind of along those lines. Dawnseeker2000 04:59, 24 August 2021 (UTC)
Question about "Vice" rules
I see there are rules to sometimes change "Vice-President" to "vice-president" and other rules to change it to "Vice-president" (as we see when running the typo rules on the August 1966 article). Could @Chris the speller: or someone else help me understand the difference between "Vice-President", "Vice-president", and "vice-president"? Thanks! GoingBatty (talk) 02:03, 14 September 2021 (UTC)
- I don't know specifically regarding the MoS, but noticing that Vice Admiral redirects to Vice admiral. Dawnseeker2000 02:38, 14 September 2021 (UTC)
- See MOS:BIO, the main section "Titles of people", which says "When hyphenated and capitalized, e.g. Vice-president (as it is usually spelled in contexts other than US politics), the element after the hyphen is not capitalized." So, we have Vice President Harris in the US, where we don't hyphenate the title, but Vice-president Boissezon in Mauritius, where they do hyphenate it. In my experience with hyphenated titles, though, the presence or absence of hyphens is fairly unpredictable. But when using generic terms, lower case is used for all: "As vice president, he mostly attended ribbon-cutting ceremonies" or "She was vice-president until her death". Hope this helps. Chris the speller yack 04:21, 14 September 2021 (UTC)
- After looking at the August 1966 article, the title of Lorenzo Guerrero could be kept in all upper case if the hyphen is removed to match the WP article Vice President of Nicaragua. There's a ton of sloppiness in the hyphenation of titles. Chris the speller yack 04:31, 14 September 2021 (UTC)
- @Chris the speller: Done, as well as removing the hyphen to match the WP article Vice Chairman of the Chinese Communist Party. Thanks! GoingBatty (talk) 05:12, 14 September 2021 (UTC)
Fend of - fend off
I have just corrected 36 "fend of" to "fend off"; there are also some in quotes, but no false positives. ϢereSpielChequers 22:41, 17 September 2021 (UTC)
Capitalization
In this edit, Board of Trustees is lc'ed in one sentence but not in the next. I don't see any difference. Can someone explain? MB 16:49, 22 September 2021 (UTC)
- The regex only matches when Board of Trustees is followed by ".", ";" or certain words such as "and". The first sentence matches "."; the second continues with "are" which isn't in the list of matched words. Certes (talk) 17:21, 22 September 2021 (UTC)
- Can this be improved in any way? I noticed this because the two uses were nearby. If they had not been, I might have accepted the change and left an inconsistency within the article. MB 15:00, 6 October 2021 (UTC)
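The behaviour Certes describes, matching only when certain text follows, is what a lookahead produces. The following JavaScript sketch uses a hypothetical pattern, since the real rule and its word list are not quoted in this thread; it only illustrates why one sentence matched and the next did not.

```javascript
// Hypothetical pattern: the actual rule's list of following words is not
// shown in the thread, so only ".", ";" and "and" are assumed here.
const trusteesRule = /\bBoard of Trustees(?=[.;]| and\b)/g;

const lowerTrustees = (text) => text.replace(trusteesRule, "board of trustees");

// Made-up sentences mirroring the situation described in the thread.
const trusteesSample =
  "He joined the Board of Trustees. The Board of Trustees are elected annually.";
console.log(lowerTrustees(trusteesSample));
```

Only the first occurrence is followed by "." and so only that one is lower-cased; adding more words to the lookahead list would be one way to widen the rule, at the cost of more possible false positives.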
Offshore
There are over 25,000 uses of offshore, and a couple of thousand of "off shore" or "off-shore". Shouldn't it always be "offshore"? MB 14:56, 6 October 2021 (UTC)
- The major online dictionaries don't list "off shore" or "off-shore", so I think you're right. See also onshore, but beware of sailors on shore leave, etc. Certes (talk) 15:43, 6 October 2021 (UTC)
New Additions
"Old, stable rules (>1 year since last edit) can be sorted into their appropriate sections." How about something like this <!--CCYYMMDD-->
as a suffix to each new addition, containing last edit date? (removed when moved; later, script/code could do the moves) - Neils51 (talk) 21:14, 5 November 2021 (UTC)
Museum
The following misspellings appear to exist:
musuem, musueum, museuem, mueseum, muesuem, muesem, muesum. Suggest the following update to the regex:
From find="\b([mM])usu?em(s)?\b"
to find="\b([mM])ue?su?e?u?e?m(s)?\b"
- Neils51 (talk) 12:04, 11 October 2021 (UTC)
- @Neils51: Your suggestion would also match the correct spelling of "museum", which we avoid with these typo rules. I have expanded the "Museum" rule to catch these misspellings, and will run AWB to fix them all. Thanks! GoingBatty (talk) 13:19, 11 October 2021 (UTC)
- @Neils51: Fixed 22 with the expanded typo rule, 31 manually, added 1 {{not a typo}}, added 14 {{R from misspelling}}, and submitted 1 {{rename media}} request. GoingBatty (talk) 14:46, 11 October 2021 (UTC)
- Excellent, thanks! - Neils51 (talk) 21:11, 11 October 2021 (UTC)
- Need to add musesum to the list. - Neils51 (talk) 03:07, 8 November 2021 (UTC)
- @Neils51: Added to the typo rule, Fixed 12 articles (most of which had the typo in an area that the typo rule wouldn't fix it, so I fixed them manually). GoingBatty (talk) 04:15, 8 November 2021 (UTC)
- Thanks again. I think that each year there should be a word that wins a prize for the most ways that editors can find to misspell it. Might need to be a 'silent' award else some may endeavor to game it. - Neils51 (talk) 04:24, 8 November 2021 (UTC)
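The objection that the suggested pattern also matches the correct spelling can be verified quickly. In this JavaScript sketch the first two patterns are copied from the thread; the third, with a negative lookahead, is only a hypothetical way of excluding "museum" and is not the rule that was actually added.

```javascript
// Original and suggested patterns, as quoted in the thread.
const oldMuseumRule = /\b([mM])usu?em(s)?\b/;
const proposedMuseumRule = /\b([mM])ue?su?e?u?e?m(s)?\b/;

// Hypothetical variant: (?!useum) rejects the correct spelling up front.
const guardedMuseumRule = /\b([mM])(?!useum)ue?su?e?u?e?m(s)?\b/g;
const fixMuseum = (text) => text.replace(guardedMuseumRule, "$1useum$2");

console.log(oldMuseumRule.test("museum"));      // false
console.log(proposedMuseumRule.test("museum")); // true: would "fix" a correct word
console.log(fixMuseum("The musuem and two Mueseums near the museum"));
```

Typo rules that match already-correct text would replace it with itself, which is why the suggested pattern could not be used unchanged.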
Skiier(s)
Suggested addition, skiier, skiiers. Average is around 2 a month. Perhaps the following? - \b([sS])ki(?:i+)er(s?)\b - $1kier$2
- Neils51 (talk) 02:19, 11 November 2021 (UTC)
- @Neils51: Added the rule (it's rule 4000!) which fixed 9 misspellings. Fixed other misspellings manually. Also added {{R from misspelling}} to some redirects and submitted a request to rename Category:Harvard Crimson skiiers. GoingBatty (talk) 03:23, 11 November 2021 (UTC)
- Thanks @GoingBatty:, I trust you are fine with doing it this way. I have done a little regex work in a previous life however I would rather make suggestions and bow to the superior experience of you and others than make a mess of the list! - Neils51 (talk) 07:17, 11 November 2021 (UTC)
- @Neils51: When you're ready, be bold and add your own rules. This is a collaborative friendly environment where we all help each other and tweak the rules together, and would be happy to have you join in! GoingBatty (talk) 13:37, 11 November 2021 (UTC)
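For readers less familiar with regex, here is how the suggested find and replace pair behaves. This is a JavaScript sketch using the pattern and replacement from the suggestion; the sample sentence is invented, and the rule as finally added to the list may differ.

```javascript
// Find and replace strings as suggested in the thread.
const skiierRule = /\b([sS])ki(?:i+)er(s?)\b/g;
const fixSkiier = (text) => text.replace(skiierRule, "$1kier$2");

// "skier" has only one "i" after "sk", so i+ cannot match and it is left alone.
console.log(fixSkiier("Skiiers and a skiier met a skier"));
```

The first capture keeps the original capitalisation and the second keeps the optional plural "s".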
Enmedio
Another one for the avoid list. ("Emm-") - Neils51 (talk) 19:53, 19 November 2021 (UTC)
- @Neils51: Fixed the "Emm-" rule. GoingBatty (talk) 18:13, 21 November 2021 (UTC)
- Thanks for that! - Neils51 (talk) 19:49, 21 November 2021 (UTC)
Lowercase company
@Chris the speller: You added the "lower-case c" rule which changes "Company" to "company". In this edit, Bebington reverted my changes (which included several instances of "Company" to "company"), stating "Company has a capital when referrng to a specific company when it is a part of its title. compsny would be the generic". Could you two please discuss what the proper capitalization should be? Thanks! GoingBatty (talk) 18:03, 21 November 2021 (UTC)
- Bebington should read and follow MOS:INSTITUTIONS, which says:
- Generic words for institutions, organizations, companies, etc., and rough descriptions of them (university, college, hospital, church, high school) do not take capitals:
- Incorrect (generic): The University offers programs in arts and sciences.
- Correct (generic): The university offers programs in arts and sciences.
- Correct (proper name): The University of Delhi offers programs in arts and sciences.
- Just knowing what company or university is being referred to ("the company" vs. "a company") does not constitute a reason for upper case. Chris the speller yack 22:23, 21 November 2021 (UTC)
Rugby lague - Rugby league
Please can we have a rule for "Rugby lague - Rugby league" I'm working my way through a current crop of 31, so I think it common enough to be worthwhile. ϢereSpielChequers 21:56, 21 November 2021 (UTC)
- @WereSpielChequers: Added! GoingBatty (talk) 22:41, 21 November 2021 (UTC)
- Ta muchly. That saves me adding it to my regular stuff. ϢereSpielChequers 22:42, 21 November 2021 (UTC)
Misplaced sign when space is used as the thousands separator (#2)
Reported here a year ago, but the problem (changes: "5 000€" -> "5 €000" etc.) still exists: special:diff/1040830307 Older diffs: special:diff/596345000, special:diff/738453701, special:diff/611337488 85.23.79.231 (talk) 17:24, 22 September 2021 (UTC)
- Bumping this to avoid archiving before it's fixed. 85.23.79.231 (talk) 17:11, 3 November 2021 (UTC)
eg to e.g.
eg is an internet domain country code for Egypt. Can eg be left alone when in a string separated by dots (e.g. www.someplace.edu.eg or www.someplace.gov.eg)? MB 01:37, 4 December 2021 (UTC)
- @MB: Probably - could you please give an example where the typo rule incorrectly wants to update a domain? Thanks! GoingBatty (talk) 05:06, 4 December 2021 (UTC)
- It happened in this version, in the external links section, but won't in the current version because the domain is no longer plain text. MB 16:41, 4 December 2021 (UTC)
- @MB: Those email addresses weren't appropriate for the article, and converting them to URLs wasn't appropriate either, so I've deleted them. Any other instances of bad typo fixing? GoingBatty (talk) 17:57, 4 December 2021 (UTC)
- No, but I recall this happened before with .ie (Ireland) and I thought a rule was updated at that time, and this was the same thing. MB 18:06, 4 December 2021 (UTC)
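One way to leave a country-code domain alone is a negative lookbehind for a preceding dot. The JavaScript sketch below is purely illustrative: the actual "eg" rule is not quoted in this thread, so the pattern, replacement and sample text are all assumptions.

```javascript
// Hypothetical rule: "eg" or "eg." as a standalone word followed by
// whitespace, but not when preceded by a dot (as in ".edu.eg") or a word character.
const egRule = /(?<![.\w])eg\.?(?=\s)/g;
const fixEg = (text) => text.replace(egRule, "e.g.");

console.log(fixEg("Citrus fruit, eg lemons"));
console.log(fixEg("See www.someplace.edu.eg for details"));
```

The second string is returned unchanged because the only "eg" in it follows a dot.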
Publishers Weekly
I've fixed a few mentions of Publisher's Weekly, which should presumably refer to Publishers Weekly (example). I am wary of continuing as we have about 1000 cases, suggesting that I may be the one out of step here rather than a thousand other editors. A sanity check would be welcome before I go further. Also, do we have a bot or other process for handling widespread errors, or is it better to continue manually? The only false positives I've found so far are cases like Publishers Weekly#cite_note-twsOctJ22-22, which quotes a source describing Publisher's Weekly [sic]. Certes (talk) 12:52, 7 December 2021 (UTC)
- To my surprise, we already have the typo listed. These cases must be a combination of articles which AWB hasn't visited recently, and parameters such as |website= in templates which AWB would skip. Certes (talk) 16:36, 7 December 2021 (UTC)
- This does look like one of those errors that are common within references. With the complication that search doesn't easily differentiate between Publisher's and Publishers. So a bespoke AWB run is probably needed, I'd do it but I don't currently have a machine that runs windows. ϢereSpielChequers 22:16, 7 December 2021 (UTC)
- Thanks for the feedback. I'll do another 100 or so for now, then finish the job if they attract no adverse comments. I don't use Windows either, but find AWB usable on Linux and use JWB for simple stuff like this. Certes (talk) 23:53, 7 December 2021 (UTC)
- @Certes: Remember that AWB's typo rules don't fix text within italics, and Publisher's Weekly would probably be in italics when in prose. Besides a dedicated run, the best bet would be for us to duplicate the rule to our default Find and Replace rules, so we fix the typos while we're doing other things. GoingBatty (talk) 04:07, 8 December 2021 (UTC)
- I've completed a dedicated run in batches, leaving a few older citations of The Publishers' Weekly where appropriate. Thanks for the advice. Certes (talk) 01:30, 12 December 2021 (UTC)
Gold medals
In AWB, I saw it remove the first hyphen in "gold-medal-winning team". I think it should stay and the second hyphen should be changed to an en-dash. BillFlis (talk) 11:18, 16 December 2021 (UTC)
Presidents, etc.
Some uneven behavior currently: President --> president, Vice-President --> Vice-president, Treasurer is unchanged. BillFlis (talk) 12:31, 16 December 2021 (UTC)
- ... and Vice-Chairman --> vice-chairman. BillFlis (talk) 12:37, 16 December 2021 (UTC)
- ... and Directors also remains unchanged. This unevenness looks very odd when multiple corporate officers are discussed in the same paragraph. BillFlis (talk) 12:57, 16 December 2021 (UTC)
Re: the "kbit" rule under SI units. "KB" (or kB) is often used to denote kilobytes, not kilobits and ought not be changed. BillFlis (talk) 15:24, 16 December 2021 (UTC)
Currently, "Academy" is getting de-capitalized, in this, the original name of the University of Pennsylvania. BillFlis (talk) 16:53, 16 December 2021 (UTC)
Morocco
... is not always capitalized (see [31], for example), but you wouldn't know it from Wikipedia's morocco leather article. In the earliest edits of that Wikiarticle, morocco was not capitalized. Maybe add a "look-behind" for leather?--BillFlis (talk) 21:10, 16 December 2021 (UTC)
Predessor
I've just done a search and found over 40 Predessor(s) that should be Predecessor(s). Could we have a rule please? ϢereSpielChequers 23:38, 9 January 2022 (UTC)
- @WereSpielChequers: Added rule and Fixed all "Predessors". GoingBatty (talk) 00:17, 10 January 2022 (UTC)
- Ta muchly. ϢereSpielChequers 09:58, 10 January 2022 (UTC)
Hyphenation rules
They apparently don't work correctly for a leading parenthesis: in "(1808-1810, 1812–1813)", only the second hyphen got changed to an en-dash. BillFlis (talk) 13:26, 16 December 2021 (UTC)
- ... and "(1920-1921)" is left unchanged. BillFlis (talk) 13:28, 16 December 2021 (UTC)
Another issue: GA-RT-22 reported on my talk page that it incorrectly changes an album's catalog number on the Ronnie Spector article. GoingBatty (talk) 23:09, 16 January 2022 (UTC)
University
I'm just going through a few universities in a list i have, and i'm finding several University of Foo being replaced by university of Foo; this is incorrect, both in English and according to our MOS. I am not regex-fluent, but is it possible that this behaviour be changed, please? Happy days ~ LindsayHello 12:10, 20 January 2022 (UTC)
- @LindsayH: Could you please provide the names of the articles where the typo correction is incorrect? Thanks! GoingBatty (talk) 14:15, 20 January 2022 (UTC)
- I see this a fair amount. It's usually when an editor would probably have preferred to link to the actual university article page, but either it did not exist at the time or the editor that wrote it did not know it existed. I usually just manually change these to link to the university's article.
- University of [[California]] → [[University of California]]
- Dawnseeker2000 15:33, 20 January 2022 (UTC)
- Hi, GoingBatty, sorry, i probably should have given the link at the time. This edit at University of Łódź was the time that caught mine attention and brought me here; as i made the edit, i changed back several of the original five substitutions of university for University, but i would prefer not to have to do that ~ apart from anything else i then need to adjust the summary and i notice i changed it to the wrong number that time. Hope this helps. I would imagine that if there's a way to change University but not University of, that would probably meet the need but, as i say, i don't come close to understanding regex. Happy days ~ LindsayHello 15:56, 20 January 2022 (UTC)
- @LindsayH: I think that edit is correct. I'd guess that the user changed "University of Łódź" → "University", and then reparsed so the typo rules changed "University" → "university". GoingBatty (talk) 16:19, 20 January 2022 (UTC)
- GoingBatty Yes, the edit is correct; i'm terribly sorry i just amn't making myself clear: I made the edit with my semi-auto alternate account, Kahtar; it's correct because AWB changed University of Łódź to university of Łódź five times but, before i saved it, i manually changed all of them back or adjusted the wording (mostly deleting of Łódź i think), then reparsed, then manually changed the summary. If it recognised that University of... is only valid with a capital letter, all would have been fine. Happy days ~ LindsayHello 16:43, 20 January 2022 (UTC)
- This also happens at University at Buffalo. The article has 15-20 cases of "the University at Buffalo" and every time I run AWB on this article, I have to re-cap all of them manually. MB 16:49, 20 January 2022 (UTC)
- If it helps, i've gone back to the same article and grabbed a screenshot of just what AWB is trying to do. It's still trying on the three instances i reverted on my last edit there. If i've done it correctly, the image should appear... Happy days ~ LindsayHello 17:00, 20 January 2022 (UTC)
- @LindsayH and MB: Fixed both of these issues. Thanks for the examples. GoingBatty (talk) 17:24, 20 January 2022 (UTC)
- Brilliant! Thank you, GoingBatty, Happy days ~ LindsayHello 07:51, 21 January 2022 (UTC)
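The request in this thread, lower-casing "University" except in "University of ..." or "University at ...", maps naturally onto a negative lookahead. The JavaScript sketch below is a hypothetical illustration only; the fix actually applied to the rule is not quoted here and may work differently.

```javascript
// Hypothetical rule: lower-case "the University" unless "of" or "at" follows.
const universityRule = /\b(the|The) University\b(?! (?:of|at)\b)/g;
const lowerUniversity = (text) => text.replace(universityRule, "$1 university");

console.log(
  lowerUniversity("The University of Łódź is large; the University has 12 faculties.")
);
console.log(lowerUniversity("Students at the University at Buffalo"));
```

A real rule would need further exclusions for other proper names; this only shows the two cases raised in the thread.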
MOS:BADDATE request
For this situation: when a slash is used as a separator in a date range. In this case, it's the displayed range after the pipe.
- [[1978–79 in English football|1978/79]]
Dawnseeker2000 00:00, 19 December 2021 (UTC)
- Done here (hopefully). ― Qwerfjkltalk 18:29, 11 January 2022 (UTC)
- @Qwerfjkl: I think the rule does not work properly on List of historical Greek countries and regions. GoingBatty (talk) 14:31, 12 January 2022 (UTC)
- Should the rule be extended to handle the first millennium (xx0s as well as xxx0s)? Certes (talk) 14:33, 12 January 2022 (UTC)
- Here's the diff:
- Aegean islands (1516<del>/</del><ins>–</ins>1770-1821): most of the islands in the Aegean Sea retained their distinct local governments and charters, flourishing into maritime states. Some would provide sailors to the Ottoman fleet in exchange for advantageous trade agreements.
- ― Qwerfjkltalk 17:26, 12 January 2022 (UTC)
- @Qwerfjkl: I had to look at the source to understand the <del>...</del> and <ins>...</ins> tags, but I agree you have captured the impact of the typo rule on this article. I don't know that the typo rule is making a valid correction in this case. GoingBatty (talk) 20:02, 12 January 2022 (UTC)
- I'm not sure what the article is trying to say with stuff like (1224/2345-3456). ― Qwerfjkltalk 20:10, 12 January 2022 (UTC)
- @Qwerfjkl: That makes two of us. GoingBatty (talk) 20:23, 12 January 2022 (UTC)
- Not my field, but I guess that autonomy for each island began on various dates between 1516 and 1770, and ended with the Greek War of Independence in 1821. At least it's not been converted to a telephone number. Certes (talk) 23:26, 12 January 2022 (UTC)
- @Qwerfjkl: The rule works as designed on Saint-Étienne-du-Mont, but again I'm not sure the correction is appropriate. GoingBatty (talk) 00:52, 13 January 2022 (UTC)
- @Qwerfjkl: Another article where the correction is not appropriate is Sexual harassment. Is there a way to tweak the rule to prevent the false positives? GoingBatty (talk) 16:34, 13 January 2022 (UTC)
- Thank you editors. This is producing lots of hits as I've begun a typo-fixing run. Dawnseeker2000 15:55, 13 January 2022 (UTC)
- @GoingBatty: I think I've fixed the issue on List of historical Greek countries and regions with Special:Diff/1065462507. Can you provide examples for the other errors? ― Qwerfjkltalk 17:41, 13 January 2022 (UTC)
- @Qwerfjkl: Saint-Étienne-du-Mont and Sexual harassment are the examples I have found so far. (I've also found articles where the correction is appropriate.) Thanks! GoingBatty (talk) 17:48, 13 January 2022 (UTC)
- What's the issue on Saint-Étienne-du-Mont? This?
…
File:St. Etienne du Mont, Facade by Henry Fox Talbot.jpg|St. Etienne du Mont, Facade by Henry Fox Talbot, circa 1853<del>/</del><ins>–</ins>58.
File:Paris - Saint Etienne-du-Mont.jpg|Turn of the century
…
― Qwerfjkltalk 17:55, 13 January 2022 (UTC)
- @Qwerfjkl: Correct! I wonder if "1853/58" means 1853 or 1858, while "1853–58" seems like a date range. GoingBatty (talk) 19:55, 13 January 2022 (UTC)
- It looks okay to me, as it's circa 1853–58. Not sure what could be done about this. ― Qwerfjkltalk 20:26, 13 January 2022 (UTC)
- I just turned a division into a subtraction here. My fault for not checking, of course, but it would be nice if RETF didn't suggest that change. Certes (talk) 13:05, 28 January 2022 (UTC)
- That happened to me also. MB 15:55, 28 January 2022 (UTC)
- Maybe we should consider limiting this to resolve sports-related content first. That is where I first started noticing issues with a slash as a separator and many fixes exist in this space alone. The script is not complete as there is much to be said about the issue with uncertain birth/death years and math now. I suggest using "seasons" and/or similar words as limiters. Thanks as always. Dawnseeker2000 16:11, 28 January 2022 (UTC)
- I have no problem if it's removed, as it seems to be causing lots of false positives. ― Qwerfjkltalk 21:02, 28 January 2022 (UTC)
- I think it's salvageable. There needs to be a fix in this area; it just needs to be refined and limited a bit. I'll take a stab at it. Dawnseeker2000 22:34, 30 January 2022 (UTC)
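As an illustration of what a more limited rule could look like, the JavaScript sketch below only touches the slash in the display text of a piped link when the link target already begins with the same season written with an en dash, as in the original request. This is a different limiter from the "seasons" keyword idea suggested in the thread, and the pattern is hypothetical, not the rule that was added or later refined.

```javascript
// Hypothetical limited rule: [[YYYY–YY ...|YYYY/YY]] -> [[YYYY–YY ...|YYYY–YY]].
// \2 and \3 require the displayed years to repeat the years in the link target.
const seasonLinkRule = /\[\[((\d{4})–(\d{2})[^|\]]*)\|\2\/\3\]\]/g;
const fixSeasonLink = (wikitext) => wikitext.replace(seasonLinkRule, "[[$1|$2–$3]]");

console.log(fixSeasonLink("[[1978–79 in English football|1978/79]]"));
// Plain-text slashes, such as the false positives reported in the thread, are left alone:
console.log(fixSeasonLink("circa 1853/58, and a ratio of 1024/768"));
```

Because it requires a matching link target, this sketch would miss unlinked season ranges, which is the trade-off for avoiding divisions and uncertain dates.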
Was happened
Wikipedia records that 61 events "was happened". (Another three "was happened upon", which seem correct.) "Happen" can be transitive (e.g. 2 Peter 2:22), but such use is archaic and usually unintentional. Should we be fixing "was happened(?! upon)" → "happened" or similar? I've fixed three that "were happened" – probably not worth automating. Another 27 things "is happened", but some are legitimate quotes of ancient texts. Certes (talk) 18:31, 1 February 2022 (UTC)
- Interesting. There appear to be cases for 'happened', 'had happened' and 'was happening'. Perhaps the alert items need to have an addition around 'construction' flagging a possible manual intervention requirement? Neils51 (talk)
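The pattern floated in the opening post uses a negative lookahead to spare "was happened upon". A minimal JavaScript sketch of that idea follows; the sample sentences are invented, and, as the reply notes, the right replacement may vary by sentence, so this is not a proposal for an automatic rule.

```javascript
// Pattern from the opening post, with a word boundary added in front.
const wasHappenedRule = /\bwas happened(?! upon)/g;
const fixWasHappened = (text) => text.replace(wasHappenedRule, "happened");

console.log(fixWasHappened("The accident was happened in 1990."));
console.log(fixWasHappened("The cave was happened upon by hikers."));
```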
Cat-like
AWB changed cat-like to catlike in Dog. It ignored the term dog-like which I manually changed to doglike after checking merriam-webster.com. Both were reverted. AWB also skipped over wolf-like and fox-like. I am wondering if there should be consistency to how AWB treats cat-like, dog-like, wolf-like and fox-like (and whether either form is allowed). Also if cat-like is acceptable or correct should cat-like be ignored? Kaltenmeyer (talk) 00:29, 7 March 2022 (UTC)
- @Kaltenmeyer: I agree that the typo rules should treat "cat-like", "dog-like", etc. the same way. However, it may be challenging to expand any rule to include every possible animal. GoingBatty (talk) 03:25, 7 March 2022 (UTC)
Adding comma after MDY?
Can someone help point me to where the MOS says we should update "was held on November 2, 2010 to elect all 11 members of the newly formed" to "was held on November 2, 2010, to elect all 11 members of the newly formed"? (i.e. adding a comma after 2010). Jonatan Svensson Glad (talk) 00:37, 2 April 2022 (UTC)
- MOS:DATECOMMA MB 00:53, 2 April 2022 (UTC)
- Duh, of course it was that easily named (/me facepalms). I've just never seen that anywhere in writing before, so it looks really weird to me. But if it's in the manual, I won't argue. Jonatan Svensson Glad (talk) 00:56, 2 April 2022 (UTC)
- I call it the "wikicomma". Yes, it's weird, and hard to remember as I would never use it elsewhere. I think the idea is to treat 2010 like a relative clause qualifying November 2 (as in "November 2, All Souls' Day, was foggy.") Certes (talk) 10:32, 2 April 2022 (UTC)
- Really, Certes? Fascinating ~ i wouldn't consider my writing to be correct if i didn't use that second comma. That's what i love about this community ~ different people/generations/educations/preferences all working together; hooray for us! Happy days ~ LindsayHello 10:52, 2 April 2022 (UTC)
- Yeah, here in Sweden we use DMY or the ISO YYYY-MM-DD format, and when using MDY I rarely use a comma after the year in my natural writing. Feels even weirder when typing things like "(born March 13, 2020, in New York)" since that is such a short phrase and not a full sentence. Jonatan Svensson Glad (talk) 11:53, 2 April 2022 (UTC)
savinging$3
'test saving test'.replace(/(?=([aeiou][bdfgklmnprstvz])\2{2,})(?<=\b(?:[A-Z][a-z]*|[a-z]+))\1\2{3,}(e(?:d|rs?)|i(?:ngs?|ons?|ves?)|ors?)\b/,'$1$2$2$3');
returns "test savinging$3 test". Why is this happening? Wikipedia:AutoWikiBrowser/Typos (diff ~256522285) @ThaddeusB: any insight? — Alexis Jazz (talk or ping me) 16:42, 17 April 2022 (UTC)
- The pattern matches "aving", setting $1 to "av" and $2 to "ing". There are only two captures – ([a... and (e(... – so $3 is unset and just returns "$3". "aving" is replaced by "av" + "ing" + "ing" + "$3". Certes (talk) 17:55, 17 April 2022 (UTC)
- ...er... what's that \2 doing to the left of capture 2? That can't be right. Certes (talk) 18:13, 17 April 2022 (UTC)
- Certes (or anyone), so it may be broken (but it doesn't break AWB? we'd have heard sooner?) any idea what this replacement is even supposed to do? — Alexis Jazz (talk or ping me) 20:42, 17 April 2022 (UTC)
- Amending the pattern to /(?=([aeiou])([bdfgklmnprstvz])\2{2,})(?<=\b(?:[A-Z][a-z]*|[a-z]+))\1\2{3,}(e(?:d|rs?)|i(?:ngs?|ons?|ves?)|ors?)\b/ (adding two brackets, between [aeiou] and [bdfgklmnprstvz]) would make it reduce the number of consecutive identical consonants to two in typos like "gettting" and "scisssors". But I'm not sure why we're picking on this particular pattern. Almost all triple letters are wrong, and the false positives have an upper case initial (e.g. Rossshire) with very few exceptions such as Riot grrrl. Certes (talk) 20:55, 17 April 2022 (UTC)
- Any replacement also needs to avoid changing www.example.com and similar (which this regexp does by excluding w from the consonant list). Certes (talk) 11:05, 18 April 2022 (UTC)
- Certes, interesting. One more question: I assume AWB isn't affected as an expression that mangles every instance of common words like "saving" or "living" would have been caught years ago. Any idea why? — Alexis Jazz (talk or ping me) 14:21, 18 April 2022 (UTC)
- @Alexis Jazz, the \2 before the second capture group is defined might lead it to be ignored? Qwerfjkltalk 14:36, 18 April 2022 (UTC)
- @Alexis Jazz: The diff in your first post here dates from 2008. The rule was edited by Special:Diff/976913898 in 2020, and has probably not achieved anything since then. -- John of Reading (talk) 15:02, 18 April 2022 (UTC)
- John of Reading, I missed that, you found the parentheses Certes was talking about! What do you mean when you say "has probably not achieved anything since then"? This issue was discovered by Qwerfjkl 7 hours and 1 minute after I added RegExTypoFix support to Bawl. Considering the number of users AWB has it seems unlikely this would go unnoticed for some one and a half years if AWB was actually affected. So I'd assume somehow AWB and Bawl implement this list differently, and one of them might be suboptimal, but I don't know which. @Smasongarrison: why did you remove them? — Alexis Jazz (talk or ping me) 15:55, 18 April 2022 (UTC)
- @Alexis Jazz: Yes, AWB and Bawl must be using different regex engines behind the scenes. The rule currently tries to make use of a numbered capture group before it's been defined, so it's an edge case that might turn out differently in different implementations. I'm going to put those parentheses back, as with those in place I can see what the rule is trying to do. -- John of Reading (talk) 16:16, 18 April 2022 (UTC)
- John of Reading, thank you! Bawl just uses the browser, so the .replace JS above is what would be running. Perhaps different browsers could yield different results. I'd think JWB should be affected as well, but who knows. Fixed is fixed. — Alexis Jazz (talk or ping me) 16:35, 18 April 2022 (UTC)
- @Alexis Jazz and John of Reading: I scanned an April 1 database dump for the updated regex pattern, and used AWB to fix 60 typos so far, with hundreds more to go. Some of the typos had an additional problem besides the triple consonant, but it was still good that AWB identified the issue. GoingBatty (talk) 04:35, 19 April 2022 (UTC)
- @Alexis Jazz and John of Reading: Done - 189 typos fixed. One false positive was the musician Spellling; I added wikilinks to the article to avoid incorrect fixes. GoingBatty (talk) 15:14, 19 April 2022 (UTC)
- Amending the pattern to
- Certes (or anyone), so it may be broken (but it doesn't break AWB? we'd have heard sooner?) any idea what this replacement is even supposed to do? — Alexis Jazz (talk or ping me) 20:42, 17 April 2022 (UTC)
- There don't seem to be any similar problems in other regexps on this page: all \2s are to the right of two captures, all $3s have three captures, etc. We caught one stray $3 after the big 2020 optimisation changes but I don't think we checked for \2. Certes (talk) 20:44, 18 April 2022 (UTC)
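As a hedged illustration of the edge case discussed in this thread: in JavaScript, a backreference to a capture group that has not yet taken part in the match succeeds against the empty string, while other regex engines may treat the same construct as a failed match. The sketch below uses simplified, hypothetical patterns (not the actual rule from the typo list) to show the forward-reference behaviour and a triple-consonant fix that leaves "www" alone.

```javascript
// Forward reference: \2 appears before group 2 is defined.
// In JavaScript it matches the empty string, so the pattern still matches "xy".
const forwardRef = /^\2(x)(y)$/;
const forwardRefMatches = forwardRef.test("xy");

// Hypothetical simplified rule: collapse three or more identical
// consonants to two. "w" is left out of the character class so that
// "www" in URLs is untouched.
const tripleConsonant = /([b-df-hj-np-tvxz])\1{2,}/g;

function fixTripleConsonants(text) {
  return text.replace(tripleConsonant, "$1$1");
}

console.log(forwardRefMatches);
console.log(fixTripleConsonants("gettting scisssors at www.example.com"));
```

Because the capture group is defined before `\1` is used, this form should behave the same way in engines that disagree about forward references.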
Proposed additions
I'm considering some new additions listed here and would value any comments before I mess up your list. I've fixed 30+ cases of each in the previous month with few or no false positives. A few suggestions resemble existing fixes but address different typos, e.g. the current entry for Mauritius uppercases the initial M whereas this fix is for misspellings such as Mauritus. Certes (talk) 13:51, 18 April 2022 (UTC)
- @Certes: Most of these look good to me! The article Argentina lists "Argentinian" as an appropriate demonym, so I suggest that your "Argentine" be changed to "Argentinian". If a rule already exists, I hope you consider merging your changes instead of creating a new rule. GoingBatty (talk) 15:44, 18 April 2022 (UTC)
- I changed many Argentinan typos to Argentinian, as it's nearer to the text, but used Argentine for people (where it seems to be preferred). Argentinian throughout isn't wrong and would be an improvement. What's the best way to merge (for example) Mauritius? The tricky bit is avoiding null changes. A negative lookahead before the expression can be expensive, a negative lookbehind after it may not work in all regexp parsers, and separating it as "(M...|m...)" is only paying lip service to the concept of merging. Certes (talk) 20:17, 18 April 2022 (UTC)
- Added. Thanks for the advice. I've labelled the rules with duplicate names "Foo (2)", but if someone can combine them efficiently that might be an improvement. Certes (talk) 14:34, 21 April 2022 (UTC)
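The negative-lookahead option mentioned in this thread can be sketched as follows. This is an assumed, simplified pattern for illustration only, not the rule on the typo list: one expression covers both the lower-case initial and the "Mauritus" misspelling, and the lookahead skips text that is already correct so that no null change is reported.

```javascript
// Hypothetical merged rule: matches "mauritius", "Mauritus" and "mauritus",
// but the negative lookahead rejects the already-correct "Mauritius".
const mauritius = /\b(?!Mauritius\b)[Mm]auriti?us\b/g;

function fixMauritius(text) {
  return text.replace(mauritius, "Mauritius");
}

console.log(fixMauritius("mauritius, Mauritus and Mauritius"));
```

The lookahead is evaluated at every word boundary, which is the cost Certes refers to; whether that matters in practice would need measuring against the real list.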
Duplicate word=
We have a few duplicated values for word= in the typo list. Do these need to be made unique? List: "-ality", "First (3)", "Its (after)", "Its (before)", "Nonoperational", "Predecessor", "Regardless", "Sanskrit", "Thaw", "e.g.", "east–west", "km²", "north–south", "south–north", "sworn in", "west–east". (I was checking in case I duplicated any, but someone seems to have beaten me to it.) Certes (talk) 22:47, 20 April 2022 (UTC)
- Also, we have a typo entry marked disable=. Should that be disabled=, or are the two equivalent (perhaps anything other than word= works)? Certes (talk) 11:41, 21 April 2022 (UTC)
- @Certes: If I remember correctly, the AWB implementation just checks that "word=" is present, but doesn't do anything else with it. So, yes, changing "word" to anything else will disable a rule. Duplicate names have no effect, but it's easier to refer to a rule in edit summaries and discussions if they are unique. It's time I downloaded the source code again. -- John of Reading (talk) 14:55, 21 April 2022 (UTC)
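A check for duplicated word= values could look roughly like the sketch below. The function name and the sample rules are made up for illustration; the only assumption taken from the discussion is that a rule counts as active when it carries a word= attribute.

```javascript
// Collect the word="..." value of each rule and return names used more than once.
function findDuplicateWords(typoListText) {
  const counts = new Map();
  for (const match of typoListText.matchAll(/<Typo\s+word="([^"]*)"/g)) {
    counts.set(match[1], (counts.get(match[1]) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([word]) => word);
}

// Made-up sample in the list's format. "Thaw" appears twice; the
// disabled="..." line has no word= attribute and is therefore ignored.
const sample = [
  '<Typo word="Thaw" find="\\bthaww\\b" replace="thaw"/>',
  '<Typo word="Sanskrit" find="\\bsanskrit\\b" replace="Sanskrit"/>',
  '<Typo word="Thaw" find="\\bthawwed\\b" replace="thawed"/>',
  '<Typo disabled="Old rule" find="\\bfoo\\b" replace="bar"/>',
].join("\n");

console.log(findDuplicateWords(sample));
```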
"libration war"
Hi, we currently have 107 examples of "libration war", please can they be changed to "liberation war"? Ta ϢereSpielChequers 21:37, 27 April 2022 (UTC)
- In progress, done. Neils51 (talk) 03:37, 28 April 2022 (UTC)
MilliWatt = MediaWiki
<Typo word="W (watt)" find="([\d\.]+(?:[−―–—\s]| )?[µmkMGT])w\b" replace="$1W"/>
changes ".mw-first-heading" (a CSS class of #firstHeading) to ".mW-first-heading". For a non-code example, the ccTLD for Malawi (http://www.registrar.mw/) also matches. Found only three live bad replacements: 2004 New Zealand local elections (diff 457286863), Gulf University for Science and Technology (diff 708765198) and What's Going On up There? (diff 660471548). — Alexis Jazz (talk or ping me) 04:06, 4 May 2022 (UTC)
- @Alexis Jazz: Could we fix this by ensuring a digit appears before the period, such as this:
find="(\d[\d\.]*(?:[−―–—\s]| )?[µmkMGT])w\b"
GoingBatty (talk) 12:37, 4 May 2022 (UTC)
- ...or indeed after the period with just
find="(\d(?:…
, as ".123 mW" seems more likely than "123. mW". That also avoids domains such as "source123.mw". Certes (talk) 13:30, 4 May 2022 (UTC)
- Certes, GoingBatty, ensuring there's a digit sounds good. The digit would have to appear after the period (if there is a period) as .1mW is sometimes used for 0.1mW. — Alexis Jazz (talk or ping me) 14:24, 4 May 2022 (UTC)
- @Alexis Jazz Fixed! GoingBatty (talk) 18:33, 4 May 2022 (UTC)
- GoingBatty, thanks! I think originally it was meant to also match 5.−mw. Seems like an unusual way to write to me (it's more common for prices?), but the −―–— is probably not really needed anymore when not matching a period. Edit: you're right, matching "48-kw engine" makes more sense. — Alexis Jazz (talk or ping me) 12:03, 5 May 2022 (UTC)
- @Alexis Jazz I think the dashes are needed for something like "a 48-kw engine". GoingBatty (talk) 14:16, 5 May 2022 (UTC)
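The false positive and the effect of requiring a digit can be reproduced with a small sketch. The first pattern is the rule as quoted at the start of this thread; the second is a hypothetical revision written for this example, and the rule actually deployed may differ.

```javascript
// Rule as quoted at the start of this thread (optional separator copied as shown).
const originalWatt = /([\d\.]+(?:[−―–—\s]| )?[µmkMGT])w\b/g;

// Hypothetical revision: require a digit immediately before the
// optional separator and the unit prefix, so a bare "." no longer matches.
const revisedWatt = /(\d(?:[−―–—\s]| )?[µmkMGT])w\b/g;

const fixWatt = (pattern, text) => text.replace(pattern, "$1W");

console.log(fixWatt(originalWatt, ".mw-first-heading"));
console.log(fixWatt(revisedWatt, ".mw-first-heading"));
console.log(fixWatt(revisedWatt, "a .1mw laser and a 0.5 mw source"));
console.log(fixWatt(revisedWatt, "http://www.registrar.mw/ and source123.mw"));
```

Note that the separator class contains typographic dashes but no ASCII hyphen, so text such as "48-kw" is outside what either pattern in this sketch matches.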
Hyphenated phrase
The hyphen is not removed from "less-populated". MB 04:11, 19 April 2022 (UTC)
- @MB I just added a rule for you to fix both "less-populated" and "more-populated". GoingBatty (talk) 04:34, 19 April 2022 (UTC)
I'm getting a lot of -- what I consider -- false positives for the ly-hyphens. Can somebody point me in the direction of the styleguide for that rule? Smasongarrison (talk) 18:46, 9 May 2022 (UTC)
- @Smasongarrison See the response I received from BD2412 on Wikipedia_talk:AutoWikiBrowser/Typos/Archive_4#privately-. Happy editing! GoingBatty (talk) 18:50, 9 May 2022 (UTC)
- thanks! Smasongarrison (talk) 18:53, 9 May 2022 (UTC)
Olso
Saw this edit correcting a typo of Oslo, had run AWB with Regex on that page right before so would have been fixed earlier if it was in. Just made me think it might be worth adding if someone familiar with the process would like to. Cheers! --TylerBurden (talk) 12:50, 21 June 2022 (UTC)
- I fixed about 30 Olso→Oslo typos in April. There are a few dozen false positives, including some typos for also, Olsen, etc. and the usual verbatim quotes of mistyped sources, so I didn't create a rule, but it might be useful if applied carefully. Certes (talk) 14:20, 21 June 2022 (UTC)
MOS:CURLY
@Trebuchette: To what extent have these new rules been tested? After a quick check, using User:John of Reading/X3, I don't think they work in AWB itself, because AWB automatically protects quoted text from typo-fixing. And the "CURLY SINGLE QUOTES" rule could cause formatting damage if a curly quote is placed next to a straight quote, as the resultant double-straight-quote will trigger italic markup. -- John of Reading (talk) 07:04, 6 July 2022 (UTC)
- Also beware of converting italic to bold, as in
Spielberg wrote Amblin´
. Certes (talk) 11:07, 6 July 2022 (UTC)
- I had a similar rule on my own setup for '. I never ported it over to typos because well, it does break some formatting. However, I've used this rule with good success. Mason (talk) 14:51, 6 July 2022 (UTC)
<Typo word="z" find="(?<=[\]\)A-Za-z])[´ˈ׳᾿’′Ꞌꞌ`]" replace="'" />
- I think that MOS for quotes is safe. I have a slightly broader version that doesn't seem to have problems.
<Typo word="5" find="[«»“”„″]" replace=""" />
— Preceding unsigned comment added by Smasongarrison (talk • contribs) 14:51, 6 July 2022 (UTC)
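The markup risk raised in this thread can be shown with a short sketch. It assumes the "Amblin´" example was wrapped in wiki italic markers (two straight apostrophes on each side), and the guarded pattern is a hypothetical variant written for this illustration, not one of the rules quoted above.

```javascript
// Naive replacement: every acute accent or curly apostrophe becomes a straight one.
const naiveQuotes = (text) => text.replace(/[´’]/g, "'");

// Hypothetical guarded version: skip a mark that touches a straight
// apostrophe, because the result would lengthen a run of ' characters
// and so change wiki italic/bold markup.
const guardedQuotes = (text) => text.replace(/(?<!')[´’](?!')/g, "'");

const italic = "''Spielberg wrote Amblin´''"; // assumed surrounding markup

console.log(naiveQuotes(italic));    // closing run grows from two apostrophes to three
console.log(guardedQuotes(italic));  // left unchanged
console.log(guardedQuotes("don’t")); // ordinary apostrophe is still straightened
```

The guard trades coverage for safety: marks adjacent to markup are left for manual review rather than fixed automatically.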
Testing with JWB
Perhaps everyone else knew this already, and there may well be an easier way to do it, but I've finally found a way to test new additions without riskily adding them to the public list or going through the tedious and error-prone process of copying and pasting every regexp into the UI. To add a custom set of typos in a format matching AWB/T to the list, start JWB, invoke the browser's JavaScript console and paste
RETF.list = []; // Empty the list - only needed for iterative testing
(new mw.Api()).get({
    action: 'query',
    prop: 'revisions',
    titles: 'User:Example/typos', // Substitute the title of your typo list page here
    rvprop: 'content',
    rvlimit: '1',
    indexpageids: true,
    format: 'json',
}).done(RETF.buildList);
Omit the first line to retain the standard list, but it's useful to get rid of a broken custom list before retesting after a fix. The titles: line can be any Wikipedia page, e.g. User:You/sandbox. Certes (talk) 21:03, 20 April 2022 (UTC)
- Certes, in Bawl you can now enter a custom page title to be used for RegExTypoFix. Only one page will be used so if a custom title is given the regular RETF won't be used. To test your entries, enter a page title for RETF to use instead of the default, save the settings, enter some text, press the magnifying glass and press the AWB RegExTypoFix button. Bawl will immediately report which (if any) rules matched something. Afterwards, empty the custom page title and save the settings to revert back to the title that is associated with your wiki according to d:Q6585066. — Alexis Jazz (talk or ping me) 22:37, 17 May 2022 (UTC)
- Thanks. I've not been using Bawl but it looks useful; I'll investigate it soon. Certes (talk) 22:40, 17 May 2022 (UTC)
- @Certes: Thank you so much for this! I used it to test my most recent change, which had a lot of problems. I'll be using this a lot. -- Beland (talk) 21:43, 21 July 2022 (UTC)
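For quick checks outside the browser, a rough offline equivalent can be sketched as below. This is an assumption-laden illustration rather than a description of how RETF.buildList works: the function names are made up, attributes are expected in word/find/replace order, and patterns written for AWB's regex engine may not all be valid JavaScript.

```javascript
// Parse <Typo word="..." find="..." replace="..."/> lines into rule objects.
// Lines without word= (for example disabled= rules) are skipped.
function parseRules(listText) {
  const rules = [];
  const line = /<Typo\s+word="([^"]*)"\s+find="([^"]*)"\s+replace="([^"]*)"\s*\/>/g;
  for (const [, word, find, replace] of listText.matchAll(line)) {
    rules.push({ word, find: new RegExp(find, "g"), replace });
  }
  return rules;
}

// Apply every rule in order and report which ones changed the text.
function applyRules(rules, text) {
  const matched = [];
  for (const rule of rules) {
    const next = text.replace(rule.find, rule.replace);
    if (next !== text) matched.push(rule.word);
    text = next;
  }
  return { text, matched };
}

// Made-up test list: one active rule and one disabled rule.
const testList = [
  '<Typo word="Maritime" find="\\b([Mm])artime\\b" replace="$1aritime"/>',
  '<Typo disabled="Example" find="\\bfoo\\b" replace="bar"/>',
].join("\n");

const result = applyRules(parseRules(testList), "Martime law and foo");
console.log(result.text, result.matched);
```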
The US
I'm running typo-fixing right now and came across these two articles in quick succession. It looks like a "The US" typo-fix is trying to repair two phrases in these two articles:
- Moesi: likely to explain the usage of the name → more likely to explain tth USge of the name (typo(s) fixed: Furthermore → Furthermore,)
- New Zealand national rugby sevens team: During the USA Sevens → During tth US Sevens (typo(s) fixed: Covid-19 → COVID-19)
Because the edit summary in parentheses doesn't show anything related to "The US", I don't think there's a typo in:
<Typo word="the US" find="\bthe\s+U\.?S\.?A\.?(?<!Church\s+in\s+the\s+U\.?S\.?A\.?|Girl\s+Scouts\s+of\s+the\s+USA)(?=(?:,|\s+(?:a(?:fter|nd|[st])|by|f(?:or|rom)|in|to|w(?:hen|ith))\s))(?!\s+for\s+Africa)" replace="the US"/>
I see that that typo fix was changed on June 4 [32] but I don't think that that is the cause of this issue. It must be somewhere else.
I've tried isolating the issue as much as I can by using the current version of the typo list and turning off my own custom date formatting module. Dawnseeker2000 21:55, 20 July 2022 (UTC)
- I don't see that problem in either AWB or JWB, even after reloading typos and genfixes. At the risk of being patronising, could you possibly have a local Find & Replace in "Normal Settings" or elsewhere that you've forgotten about? Certes (talk) 22:42, 20 July 2022 (UTC)
- Yeah, that was it. I was experimenting with something related to "The US" and had forgotten about it. I guess that means I might need a small fish. Dawnseeker2000 23:08, 20 July 2022 (UTC)
- I think you have to actually mess up the articles to earn your trout. I've been confused the same way; I recently requested an enhancement to JWB to show a reminder when there are invisible find & replace expressions in force. Certes (talk) 12:40, 21 July 2022 (UTC)
- Thanks for that. Dawnseeker2000 00:43, 26 July 2022 (UTC)
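One way to confirm that the quoted rule was not responsible is to run it directly against the two phrases. The sketch below copies the find and replace values from the rule quoted in this thread into a JavaScript regular expression; it assumes a JavaScript engine with lookbehind support, and the helper name is made up.

```javascript
// The "the US" rule as quoted in this thread, applied case-sensitively.
const theUS = /\bthe\s+U\.?S\.?A\.?(?<!Church\s+in\s+the\s+U\.?S\.?A\.?|Girl\s+Scouts\s+of\s+the\s+USA)(?=(?:,|\s+(?:a(?:fter|nd|[st])|by|f(?:or|rom)|in|to|w(?:hen|ith))\s))(?!\s+for\s+Africa)/g;

const fixTheUS = (text) => text.replace(theUS, "the US");

// The two reported phrases: neither should change.
console.log(fixTheUS("likely to explain the usage of the name"));
console.log(fixTheUS("During the USA Sevens"));

// A phrase the rule is meant to catch, for comparison.
console.log(fixTheUS("He moved to the USA in 1990"));
```

The pattern needs an upper-case "U" and a following comma or listed preposition, which neither reported phrase provides, consistent with the local find and replace being the cause.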
Non-breaking space rule seems to break things
I'm not sure if this is a quirk of this specific page, Af-nest, but as you can see from my edit (https://en.wikipedia.org/w/index.php?title=Af-nest&oldid=1100070362), the rule changed every white space on that page to &nbsp;. I'm not sure what to think about it, but my inclination is that the current implementation breaks more things than it fixes.
<Typo disabled="hidden non-breaking space" find=" " replace=" " /><!--per [[MOS:NBSP]]-->
Mason (talk) 02:42, 24 July 2022 (UTC)
- @Smasongarrison When I try running Af-nest through AWB with Regex typo fixing enabled today, AWB doesn't suggest any changes. GoingBatty (talk) 19:14, 29 July 2022 (UTC)
- I disabled the rule. Did you try it with the rule reactivated? Mason (talk) 19:50, 29 July 2022 (UTC)
- @Smasongarrison I did not. GoingBatty (talk) 20:29, 29 July 2022 (UTC)
- Lol, no worries. I disabled it because I figured that was the least damaging approach. I don't have strong feelings about the rule staying or leaving. Mason (talk) 20:33, 29 July 2022 (UTC)
Lowercasing titles
I am wondering whether this should be removed from AWB "typo fixing": Edits like this just make the capitalisation inconsistent and look unprofessional. —Kusma (talk) 15:37, 29 July 2022 (UTC)
- Additionally, this kind of thing is not a typo, but a stylistic choice. —Kusma (talk) 15:49, 29 July 2022 (UTC)
- @Chris the speller: I believe this user is referring to a rule you added. GoingBatty (talk) 19:26, 29 July 2022 (UTC)
- @Kusma: Let's not lose sight of WP:VOLUNTEER. If a user corrects "berlin" to "Berlin" in an article and does not correct "stuttgart" to "Stuttgart", wouldn't the more helpful action be to correct "stuttgart" yourself? Maybe the editor was changing lots of "berlin" yesterday and plans to change lots of "stuttgart" tomorrow, or next week, or next month; it's his or her time, and his or her choice. If you don't want to help, then it would be best not to interfere. I use a lot of semi-automated tools, and usually make additional changes if they are nearby, as in this case, but I don't scour the whole article every time. And let's not complain about the tools and rules. It is fairly safe to fix "of Deputy Director and", but "Director of" is often part of an official title that is customarily capitalized. And leaving all common job titles in upper case is not a stylistic choice that Wikipedia editors are free to make; we simply don't capitalize common nouns, per MOS:JOBTITLES. Leaving "Doorman" and "Waiter" in upper case looks very unprofessional. I have cleaned up the article that you mentioned. Chris the speller yack 20:03, 29 July 2022 (UTC)
- Changing some job titles to lower case but not all (or doing so incompletely) is what is looking unprofessional. I think it should be done with care, not with rules that are too simplistic. Here is another example: either it is "the Deputy Director for Flight Crew Operations" or it is "deputy director for flight crew operations", but "deputy director for Flight Crew Operations" is just wrong, and AWB should not suggest this. It certainly isn't "fairly safe" to do this. —Kusma (talk) 20:53, 29 July 2022 (UTC)
- And if people use semiautomatic tools to introduce problems (like changing something from a consistent style to an inconsistent style) the most important action is to fix the tool to prevent further bad edits. Precious volunteer time should not be used to clean up after malfunctioning software. —Kusma (talk) 20:56, 29 July 2022 (UTC)
- MOS:JOBTITLES is a stylistic choice that Wikipedia has adopted fairly recently. Not adhering to the manual of style may need to be fixed, but is not a "typo". —Kusma (talk) 21:42, 29 July 2022 (UTC)
- I think over-capitalization is an issue here in the encyclopedia and I do what Chris the speller does. When using semi-automated tools, it is necessary to manually fix some of the other instances that AWB doesn't catch. Regex isn't perfect and cannot be used in a way that captures every single example of text that doesn't align with the Manual of Style. I also think that in cases where the software/editor combo doesn't fix every instance, the edit can be a signal to other editors that may be more familiar with the article to make the rest of the changes. In other words, an edit/change isn't required to be perfect; just an improvement. Dawnseeker2000 21:10, 29 July 2022 (UTC)
IMBD
I'm wondering whether to add a typo for IMBD. I'm currently fixing about 400 intended for IMDb, but there are also a few correct references to International Migratory Bird Day which don't lend themselves to a (?!regex). (Also, Turkey has carelessly censored imbd.com [sic].) Certes (talk) 13:03, 25 July 2022 (UTC)
- It looks like there are only two articles with IMBD referring to International Migratory Bird Day. You could create a IMBD -> IMDb rule and add {{not a typo}} on those two articles. GoingBatty (talk) 19:34, 29 July 2022 (UTC)
- Done; thanks. Certes (talk) 23:18, 29 July 2022 (UTC)
- @Certes Should the rule also change IMDB -> IMDb? GoingBatty (talk) 01:11, 31 July 2022 (UTC)
- Good question. I thought it was a bit pedantic to change a reasonable alternative to the stylised version. I certainly wouldn't edit the 17,000 articles which use all caps just for that reason, but it might be worth doing en passant. Certes (talk) 10:31, 31 July 2022 (UTC)
"undefined" error
I'm editing Federal Way, Washington with JWB, and RETF would like to change "undefined" to "p$1up$1np$1dp$1ep$1fp$1ip$1np$1ep$1dp$1". It's a recent change to AWB/T, because a session I've had open for a while without reloading AWB/T doesn't attempt that change, but I can't track down which one. Any clues please? Certes (talk) 23:27, 21 August 2022 (UTC)
- @Certes: AWB does not propose any typo changes to that article. GoingBatty (talk) 03:08, 22 August 2022 (UTC)
- Thanks; that's very interesting! It also occurs for other pages such as Undefined itself, though only the first occurrence (which has a capital U). Certes (talk) 11:27, 22 August 2022 (UTC)
- I wonder if this could be a JWB problem. I've asked at User talk:Joeytje50/JWB. Certes (talk) 11:39, 22 August 2022 (UTC)
- One of the typo rules had lost its "find" clause, which might trigger different strange behaviours in AWB and in JWB. Certes, could you try JWB again? Smasongarrison, I have disabled one of your case-changing rules. -- John of Reading (talk) 11:54, 22 August 2022 (UTC)
- Now working again, thank you! I should have thought of that. Certes (talk) 12:41, 22 August 2022 (UTC)
- Interesting! I'll take a look at it when I can. (Classes started today, so it's a tad chaotic...) Mason (talk) 15:34, 22 August 2022 (UTC)
- So I dug back in... and I accidentally removed the "find" https://en.wikipedia.org/w/index.php?title=Wikipedia:AutoWikiBrowser/Typos&oldid=1101334358 on July 30. Hmmmm.... well, at least it's an easy fix. I'll have to ponder how I missed that... in my workflow. I'll doublecheck it and then turn it back on in a bit. Mason (talk) 16:36, 27 August 2022 (UTC)
Martime - Maritime
Hi, can we have Martime - Maritime added as a rule please? There are about 40 of them. Ta ϢereSpielChequers 01:03, 4 September 2022 (UTC)
- @WereSpielChequers: I created a rule, and used AWB to fix 12 misspellings. The remaining misspellings are in references or are "Martimes". Could you please fix these manually as appropriate? GoingBatty (talk) 20:05, 4 September 2022 (UTC)