Wikipedia:Bots/Requests for approval/SmackBot 35
- The following discussion is an archived debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA. The result of the discussion was Approved.
Operator: Rich Farmbrough (talk · contribs)
Automatic or Manually assisted: Automatic
Programming language(s): Perl/AWB
Source code available: AWB/Perl no.
Function overview: Canonicalise clean up tags to enable dating
Links to relevant discussions (where appropriate): N/A
Edit period(s): Continuousish
Estimated number of pages affected: 0 - this will only be done on pages already being edited.
Exclusion compliant (Y/N): N
Already has a bot flag (Y/N): N
Function details: For some, most or all maintenance tags (templates) on the page the following will be done:
- Removal, replacement and reduction of leading, inter-token and trailing spaces and underscores to the minimum number of spaces required.
- removal of leading :, msg:, template: Template: Msg:
- Replacement of some, all or any template names listed on the "what links here" (redirects only) page by the template name as shown at the top of the page
- Replacement of a large variety of mis-spellings of "date", together with known aliases of date that are not parameters of the templates
- Replacement of a large variety of mis-spellings, abbreviations and translations of month names within the date parameter
- Replacement of a modest variety of mis-formattings and abbreviations of years within the date parameter
- Removal of undesirable date components (time, day of week, day number, time zone, etc.)
- Rearrangement of components into monthname 4-digit-year
- Removal of duplicate date parameters
- Removal of certain cruft, vandalism and errors from date parameters
- De-substituting of the template
- Replacement of invalid dates with the current date
For clarity, "maintenance tags" excludes infoboxes, cite templates, navboxes, succession boxes, interwiki sister links (Commons, Wiktionary, Wikisource etc.), portal boxes, convert, language, mark-up and formatting templates: to these only rule 2 above will be applied.
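The first two rules in the list (space and underscore reduction, and prefix removal) can be sketched roughly as follows. This is only an illustration in Python, not SmackBot's actual AWB/Perl ruleset, which is not published; the function names `normalise_template_name` and `normalise_openings` are hypothetical. Note that a leading ":" normally means "transclude from article space", so a real rule would have to be restricted to known maintenance tags.

```python
import re

# Prefixes listed for removal in rule 2: ":", "msg:", "template:" (any case).
_PREFIX = re.compile(r"^(?::|msg:|template:)\s*", re.IGNORECASE)


def normalise_template_name(raw: str) -> str:
    """Collapse runs of spaces/underscores and strip the listed prefixes."""
    name = re.sub(r"[ _]+", " ", raw).strip()
    # Strip repeatedly in case prefixes are stacked, e.g. ":msg:Fact".
    while True:
        stripped = _PREFIX.sub("", name)
        if stripped == name:
            return name
        name = stripped


def normalise_openings(wikitext: str) -> str:
    """Rewrite '{{ template:Foo__bar |' as '{{Foo bar|' (hypothetical helper)."""
    def fix(match: re.Match) -> str:
        return "{{" + normalise_template_name(match.group(1)) + match.group(2)

    # Group 1 is the raw name, group 2 the "|" or "}" that ends it.
    return re.sub(r"\{\{([^{}|]+?)\s*([|}])", fix, wikitext)


print(normalise_openings("{{ template:Citation__needed |date=May 2010}}"))
```

The sketch deliberately does no redirect replacement or date handling; it only shows how the template opening can be tidied before later rules run.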
In addition:
- Mis-spellings of Subst:, use of various DATE/Date templates, substituting of templates such as "fact now", removal or corrections of copy-pastes from template documentation which break the intended syntax and other multifarious, nefarious and toothfarious errors.
- Special re-arrangement and re-formatting where required of dated maintenance templates that do not use a date= parameter
- Certain limited conversions between section versions of templates and templates with a section (list, table...) parameter
- Certain limited conversions between non-stub and stub versions of cleanup templates.
- Certain limited conversions between BLP and non BLP versions of templates.
- AWB's General Fixes, excluding reference ordering, and with limited orphan tagging.
- Replacement of Subst:CURRENTMONTHNAME and Subst:CURRENTYEAR with the build month and year of the ruleset (rules are generally built several times a month, and certainly for each new month) to overcome T4700.
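As a rough illustration of the date rules above (month-name repair, removal of unwanted components, rearrangement into "monthname 4-digit-year", and falling back to the ruleset's build month and year for invalid dates), here is a minimal Python sketch. The alias table is a tiny hypothetical sample, not the bot's real list, and `canonical_date` is an invented name.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical alias table: full names, three-letter abbreviations, and a
# few sample misspellings/translations. The real ruleset is far larger.
ALIASES = {m.lower(): m for m in MONTHS}
ALIASES.update({m[:3].lower(): m for m in MONTHS})
ALIASES.update({"sept": "September", "febuary": "February", "octobre": "October"})


def canonical_date(value: str, build: date) -> str:
    """Return 'Monthname YYYY', or the build month and year if value is unusable."""
    words = re.findall(r"[A-Za-z]+", value)
    month = next((ALIASES[w.lower()] for w in words if w.lower() in ALIASES), None)
    # A year is exactly four digits not embedded in a longer digit run.
    year = re.search(r"(?<!\d)(\d{4})(?!\d)", value)
    if month is None or year is None:
        return f"{MONTHS[build.month - 1]} {build.year}"
    return f"{month} {year.group(1)}"


print(canonical_date("22:14, 6 Sept 2009 (UTC)", date(2010, 10, 15)))
```

Passing the build date explicitly, rather than calling `date.today()`, mirrors the idea of baking the build month and year into the ruleset instead of relying on substitution at save time.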
Discussion
To do its dating task properly SmackBot has evolved many additional rules over and above simply inserting "|date=October 2010" inside templates. The importance of these rules cannot be overstated, and indeed many of them have become part of AWB general fixes, whether by knowledge sharing or independently. It is also the case that many minor fixes that are not essential to dating templates have been added, in order to get the most value out of each edit. By and large these fixes, trivial individually though they are, seem to be appreciated by the community, or at least non-contentious. Nonetheless a change on the 6th of September resulted in some high WikiDrama a few weeks later which readers may be familiar with. For this reason, and because drama knows no reason, nor yet bounds, I have pulled all SmackBot's custom find and replace rules, and fallen back to running on Full General Fixes (less reference ordering) alone, while I BRFA the more useful rules back. Since there are over 5000 rules, BAGGERs may be alarumed, especially if they are also reviewing Femto Bot 4. Have no fear! The urgent set are covered in this BRFA, the bulk of the rest should be in one additional batch: I will then review what is left.
As I said, the importance of these rules cannot be overstated: the proof of the pudding is that without them 85% of pages requiring dating of tags fail to be dated. A rapid approval of this BRFA would be appreciated; whilst I am aware there is a lot here, I hope none of it actually causes any problems. Rich Farmbrough, 22:14, 6 October 2010 (UTC).
Detailed explanation to Fram
| ||||
---|---|---|---|---|
This was actually out of date even then, both including false positives and excluding real redirects; however, a glance will show that this is a very incomplete set.
Here are some possibilities. Rich Farmbrough, 16:12, 7 October 2010 (UTC).
<Replacement>
<Replace>{{$1|$2date=October 2010$3</Replace> <Comment>fix nn nnnnn Year specific:Any ISO date or just year to current</Comment> <IsRegex>true</IsRegex> <Enabled>true</Enabled> <Minor>false</Minor> <RegularExpressionOptions>IgnoreCase</RegularExpressionOptions> </Replacement>
<Replacement> <Find>{{\s*(Citation[ _]+needed|Facts|Citeneeded|Citationneeded|Cite[ _]+needed|Cite-needed|Citation[ _]+required|Uncited|Cn|Needs[ _]+citation|Reference[ _]+needed|Citation-needed|An|Sourceme|OS[ _]+cite[ _]+needed|Refneeded|Source[ _]+needed|Citation[ _]+missing|FACT|Cite[ _]+missing|Citation[ _]+Needed|Proveit|CN|Source\?|Fact|Refplease|Needcite|Cite[ _]+ref[ _]+pls|Needsref|Ref\?|Citationeeded|Are[ _]+you[ _]+sure\?|Citesource|Cite[ _]+source) *([\|}\n])</Find> <Replace>{{Citation needed$2</Replace> <Comment /> <IsRegex>true</IsRegex> <Enabled>true</Enabled> <Minor>false</Minor> <RegularExpressionOptions>IgnoreCase</RegularExpressionOptions> </Replacement>
<Replacement> <Find>{{(Citation[ _]+needed)((?:\|\s*(?:(?:text|reason|category|discuss|topic|1)\s*=[^\|{}]*|[^\|{}=]*))*)}}</Find> <Replace>{{$1$2|date=October 2010}}</Replace> <Comment /> <IsRegex>true</IsRegex> <Enabled>true</Enabled> <RegularExpressionOptions>IgnoreCase</RegularExpressionOptions> </Replacement>
|
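For readers more used to Python than to AWB's .NET find-and-replace XML, the last two rules above can be approximated as below. This is a sketch only: the redirect list is a short subset of the names in the rule, the function name `date_citation_needed` is hypothetical, and because the name group is made non-capturing here the trailing character is `\1` rather than the rule's `$2`.

```python
import re

# Subset of the redirect names listed in the rule; the real list is longer.
REDIRECTS = (r"Citation[ _]+needed|Facts|Citeneeded|Cn|Fact|Proveit"
             r"|Refneeded|Source[ _]+needed")

# Step 1: replace any listed redirect by the canonical template name.
CANONICALISE = re.compile(r"\{\{\s*(?:" + REDIRECTS + r") *([|}\n])",
                          re.IGNORECASE)

# Step 2: match an undated {{Citation needed}} whose parameters are only
# the known named ones or unnamed values, so a date can be appended.
ADD_DATE = re.compile(
    r"\{\{(Citation[ _]+needed)"
    r"((?:\|\s*(?:(?:text|reason|category|discuss|topic|1)\s*=[^|{}]*"
    r"|[^|{}=]*))*)\}\}",
    re.IGNORECASE,
)


def date_citation_needed(wikitext: str, month_year: str) -> str:
    """Canonicalise the template name, then append |date= if it is missing."""
    text = CANONICALISE.sub(r"{{Citation needed\1", wikitext)
    return ADD_DATE.sub(
        lambda m: "{{" + m.group(1) + m.group(2) + "|date=" + month_year + "}}",
        text,
    )


print(date_citation_needed("Water is wet.{{fact}}", "October 2010"))
```

A template that already carries a `date=` parameter is left alone, because `date` is not among the named parameters the second pattern accepts and unnamed values may not contain "=".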
Trial run
How about a 20,000 trial run? Rich Farmbrough, 16:05, 8 October 2010 (UTC).
5,000? Rich Farmbrough, 20:28, 8 October 2010 (UTC).
1,000? Rich Farmbrough, 23:34, 10 October 2010 (UTC).
500? Rich Farmbrough, 19:03, 11 October 2010 (UTC).
100? Rich Farmbrough, 22:56, 11 October 2010 (UTC).
20? Rich Farmbrough, 17:50, 12 October 2010 (UTC).
5? Rich Farmbrough, 22:07, 12 October 2010 (UTC).
1? Rich Farmbrough, 14:31, 13 October 2010 (UTC).
- Approved for trial (250 edits). Please provide a link to the relevant contributions and/or diffs when the trial is complete. –xenotalk 14:40, 13 October 2010 (UTC)
- Excellent. — Preceding unsigned comment added by Rich Farmbrough (talk • contribs) 18:20, 13 October 2010
- Trial complete. here Rich Farmbrough, 01:06, 14 October 2010 (UTC).
- Could you explain these edits? [4] [5] [6] [7] [8][9] [10][11] They don't appear to do anything substantive. –xenotalk 13:10, 18 October 2010 (UTC)
- Yes, these are similar to the edits discussed at the foot of the page. They are items in [Category:Templates with invalid dates] which are there due to the changes to the {{Cleanup}} template on the 30th September which you saw discussed on my talk page that day. You will also have seen the request to clean the category on my talk page since. They will be cleaned out by any edit or within a month or two they will expire from cache. Fortunately or unfortunately this is the first category that SmackBot tackles and exceeds the size of the trial run by a factor of two (usual backlog is 10-15 articles). Rich Farmbrough, 04:31, 22 October 2010 (UTC).
- Ok, so you chose to do dummy edits instead of null edits - probably not a choice I would have made, and probably not something you should have rolled into this trial which is supposed to cover the bot's normal operations. I don't think that approval should be granted to change the first-letter capitalization of templates when consensus does not exist for a "Ucfirst" schema - they should just be left as-is. –xenotalk 13:44, 22 October 2010 (UTC)
- Hm. Did you miss the point that no-one objects? Did you miss the point that the vast majority of cleanup templates are ucfirst, a de facto agreement? Did you miss the point that no-one objects? Rich Farmbrough, 03:20, 24 October 2010 (UTC).
- Rich, the majority of cleanup templates being ucfirst is because your bot changed them to be that way. Please obtain consensus for your personal belief that templates should always be ucfirst. –xenotalk 03:28, 24 October 2010 (UTC)
- I read that a couple of times today, and apparently I have only imagined that several people objected to that, so let me state it here clearly:
I now object to a bot changing capitalization of the first letter of a transcluded template if that's all it does to that transclusion.
Amalthea 18:04, 24 October 2010 (UTC)
- We are talking cleanup templates here. But if Amalthea's objection is the precise wording of what's needed to get this running again, then someone say so. Rich Farmbrough, 04:48, 31 October 2010 (UTC).
Footnotes
- ^
Moderately long explanation about template name diffusion
Clearly some of these names are more likely than others (about 91 possibilities are in use), however the key factor here is that people tend, quite reasonably, to replicate the tag names they have seen: both literally and by analogy. If all a given editor sees is "One source" they will tend to replicate that: they may of course use "One Source" or "Onesource" (especially if they have been exposed to run together words in other template names) or even "OneSource": this is all well and good except when they get red links{*} and get frustrated.
It is, for this reason, perfectly wise and helpful to these editors to create a small array of redirects. It would be better and more efficient if we had usage data: if, for example, we knew that of 100,000 attempts to enter "One source" there were 10,000 "Onesource" and only one "OneSource", we would probably create the first redirect and not bother with the second, or at least it would inform our decision on a similar template that is only expected to be used 10 times. But we haven't much data on that as far as I know, although I have gathered a little on template name-space diffusion and consolidation, relating to the template redirect {{Infobox actor}} and its former redirects.
A problem arises, however, if we leave these template redirects languishing in articles forever. The sample editors are seeing is now, let us say, the six actual redirects to One source (excluding T:SINGLE and T:ONES):
- {{One source}}
- {{Singlesource}}
- {{Single source}}
- {{Oneref}}
- {{Onesource}}
- {{1source}}
plus maybe our OneSource and One Source.
At this point an editor who is used to seeing spaced templates and recalls {{1source}} or {{Oneref}} is likely to enter "1 source" or "One ref" or even 1-source...
We now have the position where instead of dealing with redirects that are one step removed (Coding theory if anyone is interested) from our canonical name, we have to deal with items two, three and more steps away.
The further this goes
- the more redirects we need - until we have completed the dictionary - in this example a relatively small 3x3x4 = 36 redirects covers all combinations generated by the implicit rules. In the unref example the number is in the thousands - but even 36 * current number of templates is rather undesirable (though not infeasible).
- the more chance we have that separate domains start to blur. In the documentation cited you can see this with "Uncited": this has been on the edge of two domains of the partition and has been moved from one to the other. Prevent the thought that this is a rare occurrence even now! Sceptics and skeptics are invited to view "[1]" a list of hundreds of cases.
- the more confusing it becomes for users trying to extract, consciously or subconsciously, the rules for template naming. Do we use Sentence case? Title Case? UPPER CASE? lower case? CamelCase? Do we abbr as mch as poss.? And whn we d, do we use fll. stps. (prds.)? Dowenotleavespaces? Or-do-we-separate-words? And_if_so_how? (I have myself spent time in the last few months trying to choose a valid redirect to "Unreferenced section", and I hazard I work more with these tags than anyone.) This discourages users from using the templates, and ultimately from editing - it is an unnecessary part of the massive learning that is required to become a fluent editor.
Therefore replacing template redirects in articles, while not being a pressing problem, seems worthwhile at least where it can be built into another, ideally bot, edit.
(*) Example at Talk:Dachau_massacre#Changes, first bullet of second list.
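The "3x3x4 = 36" figure in the footnote can be reproduced with a small enumeration. The footnote does not say which three factors are meant, so the three lists below are an assumption chosen to be consistent with the names it mentions (three first words, three ways of joining, four second-word variants); they are illustrative only.

```python
from itertools import product

# Assumed factors; the footnote does not spell them out.
FIRST = ["One", "Single", "1"]               # 3 first words
JOINER = [" ", "", "-"]                      # 3 ways of joining the words
SECOND = ["source", "Source", "ref", "Ref"]  # 4 second-word variants


def redirect_dictionary() -> list:
    """Every name generated by combining the assumed factors."""
    return sorted({a + j + b for a, j, b in product(FIRST, JOINER, SECOND)})


names = redirect_dictionary()
print(len(names))  # 3 * 3 * 4 combinations
print(names[:5])
```

Adding one more option to any factor multiplies the total, which is the footnote's point about the dictionary growing as redirects drift further from the canonical name.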
{{BAG assistance needed}}
Rich Farmbrough 23:30, 10 October 2010 (UTC)
Irrelevant stuff
|
---|
This edit [12] was ostensibly for this trial but:
The edit summaries for bot trials must be specific to the trial, if anyone is supposed to be able to tell what is being tested! — Carl (CBM · talk) 11:31, 18 October 2010 (UTC)[reply]
|
{{BAG assistance needed}} Rich Farmbrough 23:37, 20 October 2010 (UTC)
I think, unfortunately, that Rich sometimes attracts drama, which is of course why we are reading this BRFA in the first place. If, however, we look specifically at his ability to run this bot task, it would be a massive leap (not to mention a mistake) to render this anything other than Approved. - Jarry1250 [Who? Discuss.] 17:13, 16 November 2010 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. To request review of this BRFA, please start a new section at WT:BRFA.