
Wikipedia talk:Administrator intervention against vandalism/Archive 17

From Wikipedia, the free encyclopedia

Question for the volunteers

Does it actually make it easier for blocking admins when I report rapid-fire whack-a-mole instances like 176.7.8.92 (talk · contribs · WHOIS) returning 8 minutes later as 176.7.4.10 (talk · contribs · WHOIS) on the same pages? I'm very eager to nip things in the bud but I'm curious if there are any common reporting behaviors that accidentally create more work for admins rather than expediting the process as much as possible. Remsense ‥  03:33, 12 September 2024 (UTC)

Any administrator can answer this, not only functionaries – If they're on the same small number of pages you're better off requesting page protection. If they vary their targets but their IPs are on a small range (WP:RANGE), you should post about it with the appropriate evidence at WP:ANI. DatGuyTalkContribs 09:27, 12 September 2024 (UTC)
Oh, I was using that word for flair, I actually didn't know that was a role, oops! Thanks for letting me know so I can stop confusing people. Remsense ‥  09:29, 12 September 2024 (UTC)
Probably worth pointing out that this is in fact now at ANI (currently here). I don't have a better answer for you. It kinda depends how whac-a-mole things are and how well it can be reported. -- zzuuzz (talk) 09:41, 12 September 2024 (UTC)
Oh, it was entirely just the tangent point—I've been meaning to ask admins on the other side of the AIV process if certain things impede or help them for a bit. Remsense ‥  09:51, 12 September 2024 (UTC)
I hear that, and it kinda depends (and opinions may differ). When there's something like a board invasion involving hundreds of IPs on a limited set of articles, it's sometimes not worth blocking individuals, and better to go straight for protection. If it's a limited set of IPs, say within a small range like 176.7.* above, there's no harm and some benefit to reporting them individually. The admins will see the range patterns, and the bot will clear them all if range blocked. Some people can actually report ranges, but admins need to see specific examples. For more complicated scenarios, ANI makes sense. -- zzuuzz (talk) 09:57, 12 September 2024 (UTC)

At 13.44 - I added a report "43.247.122.64 (talk · contribs · deleted contribs · filter log · WHOIS · RBLs · http · block user · block log) block evasion by 43.247.122.4 (talk · contribs · deleted contribs · filter log · WHOIS · RBLs · http · block user · block log) same falsification of statistics - rangeblock requested"
At 13.48 - HBC AIV helperbot14 removed my request with an edit summary of "5 users left, rm 43.247.122.4 (blocked by Joyous!)"
It appears that the bot will not let me be helpful and show the edits by the original IP that the new IP is block-evading around
Can this problem be avoided or at least warned about? - Thanks _ Arjayay (talk) 15:25, 1 November 2024 (UTC)

@Arjayay: The bot just looks for the templates 'vandal', 'ipvandal' and 'user-uaa' in the report, and if any are blocked it removes the entire thing.
I think that you can use redirects like iplinks or IPvan, since that's not one of the 3 it uses. – 2804:F14:80E8:FC01:C4BA:71FE:6F4D:7E5E (talk) 16:51, 1 November 2024 (UTC)
For this I just try to do either the IP range, and leave the former sock unlinked so that it gets admin eyes and they know what to do. Nate (chatter) 17:51, 1 November 2024 (UTC)

The official Wikipedia page for the Golden Spiral is being vandalized by fans of the show JoJo's Bizarre Adventure, who purposefully change the name of the inventor of the Golden Spiral from Euclid, the Greek mathematician, to a fictional character in the show known as Gyro Zepelli, and this isn't tolerable.

The main user contributing to this issue is known as Tukumslativian and the other being Goldminer24 2A02:C7C:66C8:B800:2D00:918E:8D03:ADC1 (talk) 20:01, 3 November 2024 (UTC)

And now that a Reddit community has spread this, it's only gonna be more people messing up the wiki page. 2A02:C7C:66C8:B800:2D00:918E:8D03:ADC1 (talk) 20:05, 3 November 2024 (UTC)

Golden spiral article semi-protected for two weeks. Favonian (talk) 20:12, 3 November 2024 (UTC)

Vandalism and misinformation on the articles about Argentinian nineteenth-century parties

This is about these articles: https://en.wikipedia.org/w/index.php?title=Federalist_Party_(Argentina) and https://en.wikipedia.org/wiki/Unitarian_Party

The user Vif12vf keeps reverting my edits correcting the misconception that the Unitarian Party is left wing and the Federalist party is right wing. I have provided arguments in my comments on the edits and on my and his talk page but he just erases it.

No Argentine historian has ever said that the Unitarians were left wing; this is just a stupid childish interpretation. EmpyrosHunyadi (talk) 22:25, 3 November 2024 (UTC)

@EmpyrosHunyadi: This is not the page for reporting vandalism – you're looking for Wikipedia:Administrator intervention against vandalism. This talk page is for discussing that page. Anyway, Vif12vf's edits don't seem to be vandalism. The two of you are having a content dispute, so please use the methods for resolving content disputes. I suggest starting a discussion at Talk:Federalist Party (Argentina) and/or Talk:Unitarian Party. jlwoodwa (talk) 22:46, 3 November 2024 (UTC)

Sleeper PR/spam accounts

I often come across sleeper PR role accounts while jumping between articles. Those that are used cyclically or only as needed under company's control. For example and go hibernate until there's a change they want to make. How should these sleeper accounts be handled? Sometimes they're accepted. Sometimes they're rejected as "stale" for not having edited for xx days, weeks, months. Asked at Wikipedia_talk:WikiProject_Spam#Handling_of_promo_only_role_accounts_in_AIV also. Graywalls (talk) 15:27, 5 November 2024 (UTC)

Note

https://en.m.wikipedia.org/wiki/Special:Contributions/46.217.101.27

Massive "crystal ball" rule break by adding self-proclaimed transfers of handball players. Any solution? 93.138.218.98 (talk) 22:55, 19 November 2024 (UTC)

https://en.m.wikipedia.org/w/index.php?title=User:Infotalks23&redlink=1
https://en.m.wikipedia.org/w/index.php?title=User:Genuine_10&redlink=1
https://en.m.wikipedia.org/wiki/User:Bobanfasil
(more suspected sport socks) 93.138.218.98 (talk) 23:02, 19 November 2024 (UTC)

Gladiators 2024 vandal (potential ip hopper)

https://en.wikipedia.org/wiki/Special:Contributions/2A0E:CB01:4E:400:D400:330D:492E:87AD

https://en.wikipedia.org/wiki/Special:Contributions/2A06:5906:3E08:8F00:D9B1:C672:85A2:DA42

suspected sockpuppets of blocked ip https://en.wikipedia.org/wiki/Special:Contributions/78.86.131.106

Continuing to add unsourced information to the Gladiators 2024 article in spite of the block (including offensive false material in the section on the Gladiators Ready! book). Visokor (talk) 19:14, 24 November 2024 (UTC)

repeated vandalism in ahir clan page

User talk:HistorianAlferedo has indulged in repeated vandalism by reverting valid sourced contents from university of chicago, JN university, london school of economics.

Also the user is editing contents with intention of caste POV using raj sources from 1900

kindly edit the article ahir clan and provide inputs , Thanks Drisha herjee (talk) 02:42, 30 November 2024 (UTC)

I have left you a note at your talkpage [1]. - Ratnahastin (talk) 02:45, 30 November 2024 (UTC)
I have to disagree the user HistorianAlferedo is using British Raj sources
The user HistorianAlferedo has removed contents from University of Chicago, JN University, London School of Economics, Oxford; removing edits which follow Wikipedia:Reliable sources is vandalism.
Overall, user HistorianAlferedo repeatedly removes academic scholarly contents. Drisha herjee (talk) 02:59, 30 November 2024 (UTC)

Danielle LoPresti - possible IP hopper, vandal

IP user 69.209.27.213, user "KaliIsComingForYou", 2600:1700:9584:c10:7ce9:5ae2:1f1b:e1bf, 2600:1700:9584:c10:1c35:46a4:fdae:4281 have repeatedly removed sourced materials on BLP. Review of the IP addresses indicates a possible connection to linked sources. They have removed benign and well-publicized facts, such as the fact that the subject was married.

Subject is a public figure recently divorced. Sourced materials are related to a domestic violence restraining order.

Page may warrant at least a temporary protection with sourced materials preserved. ASunnyDisposition (talk) 16:36, 4 December 2024 (UTC)

@ASunnyDisposition: As I've mentioned in my edit summary and your talk page, the other users are correct; scribd.com is not a reliable source, and that material should not be in the article without an actually reliable source. Per WP:BLP, please do not restore it without finding such a source. Writ Keeper  16:47, 4 December 2024 (UTC)
Thank you! These editors also removed sources earlier in the history of the page, including items from reputable sources. The flag about scribd (even if including legitimate legal docs) is helpful. Thanks! ASunnyDisposition (talk) 17:44, 4 December 2024 (UTC)

Homoglyph vandalism

In response to a particular instance of vandalism raised at ANI (here; permalink), I added a subsection there to discuss a class of vandalism that it represents. Due to its subject, this discussion more properly belongs here, where it can get a more focused airing from those interested in vandalism as a topic, and also so it can end up archived somewhere where it can more easily be found, if need be. Content of the previous discussion at ANI follows: Mathglot (talk) 19:35, 3 December 2024 (UTC)

Copy of discussion originally at ANI.

Although they are already indeffed, I wanted to call attention to the Mojibake edit linked by Gaismagorm. Τhis is a particularly pernicious form of vandalism that I call homoglyph vandalism (but I'd appreciate hearing the expression used at Wikipedia, if there is one). It involves replacing one character, say, a Latin capital T (Unicode U+0054) with another one, say a Greek capital letter Tau (U+03A4), or a Cyrillic Capital letter Te (U+0422) which has the identical, or almost identical appearance as the original latin T. You can see this in operation at Washeans's edit, where the first letter of the first word in the expression "The result is a systematic replacement of symbols..." in the original is Latin letter capital T (UTF-8: 54) but was replaced with homoglyph Greek capital letter Tau (UTF-8 CE A4) in the wikicode.

It is not by coincidence that they vandalized this article and not some other one, because the topic of the article is related to the type of vandalism they performed; they probably felt pretty clever about themselves doing it, right up to the point where they got indeffed. I am not aware of useful tools for detecting homoglyph vandalism at Wikipedia, but if there is anything at Toolforge, I'd like to know about it. We need a tool to help vandalism fighters detect and correct vandalism of this sort. Not sure if the AWB flavor of regex is powerful enough to write a pattern that would highlight script characters that appear to be embedded in characters belonging to a different Unicode script block, but if it is, that might be one way. Mathglot (talk) 00:59, 2 December 2024 (UTC)
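Illustrative note: the three lookalike characters named in the comment above can be inspected directly. The following is only a minimal Python sketch, using the standard `unicodedata` module; the helper name `describe` is made up for this illustration and is not part of any Wikipedia tool.

```python
import unicodedata

# The three near-identical capital letters mentioned in the discussion.
lookalikes = ["\u0054", "\u03A4", "\u0422"]

def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    return f"U+{ord(ch):04X}  {unicodedata.name(ch)}  UTF-8: {utf8_hex}"

for ch in lookalikes:
    print(describe(ch))
```

The Latin letter occupies a single UTF-8 byte (54), while the Greek and Cyrillic letters each take two (CE A4 and D0 A2), which is the source of the byte-length tell discussed further down.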

As the editor who had to revert it, and as someone who is probably in the 99th percentile of editors for potential awareness of this issue, it took me a solid 20 seconds staring at the diff to realize what was actually changed. An ability to check for this seems technically difficult—surely it would end up being a "notice one diff by a user and the whole house of cards comes tumbling down" thing? Remsense ‥  01:07, 2 December 2024 (UTC)
presumably so. Sometimes I just search up common words in the search but replace l's with capital I's or the other way around, and use that to find vandalism. Gaismagorm (talk) 01:10, 2 December 2024 (UTC)

Mathglot, please see User:Radarhump. Drmies (talk) 04:11, 2 December 2024 (UTC)

(edit conflict) Diffs highlighting words that look identical, and unexpected differences in the byte length are two of the tells of homoglyph vandalism. I did a test edit to this section to demonstrate this. If you look at rev. 1260701025 of 04:02, 2 December 2024 by Mathglot, you will see that that edit replaced the 'T' in the first letter of the word 'This' in rev. 1260672475 of 00:59, 2 December 2024 with Greek letter capital Tau (U+03A4). Note the diff (Special:Diff/1260699524/1260701025) highlighting the word 'This' with no visible change to the word 'This', and then look at the History, and note the difference in byte length: rev. 1260701025 is one byte longer (363,186 bytes) than rev. 1260699524, because UTF-8 requires only one byte to encode a Latin T, but two bytes to encode a Tau.

These are two of the clues that help find this type of vandalism, the first being a word that is highlighted with no visible change; and the second is the byte count. The latter is easiest to use when only one word is changed, or multiple words but without additional text being added. But careful character counting may reveal it, if one of the encodings requires more UTF-8 bytes than the other, which is normally the case if one of the characters was Latin and the other was not. Mathglot (talk) 04:36, 2 December 2024 (UTC)
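Illustrative note: the byte-count tell described above can be expressed as a small check. This is a rough Python sketch under the assumption that the before and after text of an edit are available as plain strings; the function names `utf8_delta` and `looks_like_homoglyph_swap` are hypothetical.

```python
def utf8_delta(before: str, after: str) -> tuple[int, int]:
    """Return (change in character count, change in UTF-8 byte count)."""
    chars = len(after) - len(before)
    size = len(after.encode("utf-8")) - len(before.encode("utf-8"))
    return chars, size

def looks_like_homoglyph_swap(before: str, after: str) -> bool:
    """Heuristic: same number of characters, but a different byte length."""
    chars, size = utf8_delta(before, after)
    return chars == 0 and size != 0

original = "This is a sentence."
tampered = "\u03A4his is a sentence."  # Greek capital Tau in place of Latin T

print(utf8_delta(original, tampered))  # (0, 1)
print(looks_like_homoglyph_swap(original, tampered))
```

This would miss a swap between two scripts whose characters have the same UTF-8 width (for example Greek for Cyrillic), and it would also fire on legitimate edits such as replacing "e" with "é", so it is at best one signal among several.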

I remember a case of this from a few years ago. The tell was a redlink which I knew should have gone to a DAB page, and the corrupting alphabet was Cyrillic. It was a real head-scratcher until I worked out what was going on. Fortunately, the editor had never been very active, and had given up. I cleaned them out by copying suspect characters in their edits into the searchbar; but that requires familiarity with the corrupting alphabet, and it might have been simpler to link every word and see what turned red on preview. Narky Blert (talk) 08:31, 2 December 2024 (UTC)

My interest in raising this here at AIV is multipronged, including introducing the topic to those who might not be aware of it, and to stimulate discussion about it, especially regarding methods and tools to detect and repair it. I would hope that one thing that would come out of a discussion here would be a Help- or Info page-style write-up about the topic at an appropriate venue, directed at vandalism fighters who could go there to read up about it and get advice about how to deal with it. Mathglot (talk) 19:35, 3 December 2024 (UTC)

If we could compile a list of the homoglyphs used for this type of vandalism, it would be, I think, pretty straightforward to put together some Javascript that vandal patrollers could use to more readily identify it (maybe causing the homoglyph characters to highlight in bright blue or the like?) Seraphimblade Talk to me 19:45, 3 December 2024 (UTC)
Having a list of glyph collisions is another great idea. Unicode is a big place and that could be a long list, but we are a big project with a lot of motivated vandal fighters and other interested parties, and if we started a subpage or draft somewhere initiating such a list, it could grow organically over time and if properly formatted, perhaps could be used as the data page upon which the javascript could run (I presume JS can read an external data page?) Mathglot (talk) 19:54, 3 December 2024 (UTC)
Now that I think about it, this may be an area where an LLM might shine. I am going to give it a try with ChatGPT, and will report back. Maybe that can be the germ of a list that Seraphimblade is talking about. Mathglot (talk) 20:00, 3 December 2024 (UTC)
Did it work? I also feel like there is likely one online. I'll go and see if I can find one. Gaismagorm (talk) 01:23, 4 December 2024 (UTC)
@Mathglot https://github.com/codebox/homoglyph/blob/master/raw_data/chars.txt found something that could be useful. Gaismagorm (talk) 01:23, 4 December 2024 (UTC)
That's at least a great place to start, and it's MIT licensed, so entirely fine to use here. We'd probably want to take out just the ASCII ones so that it's not too heavy a load on the user, but that'll certainly get us going. Let me see if I can put together a quick prototype based upon that. Seraphimblade Talk to me 02:02, 4 December 2024 (UTC)
Gaismagorm, I was on a couple of other things and then away, but it looks like you've found a great resource in the meantime, good work! Mathglot (talk) 07:38, 4 December 2024 (UTC)
Thanks! Gaismagorm (talk) 11:24, 4 December 2024 (UTC)
Another way your idea could be helpful, is to write a bot based on the homoglyph list, which would categorize the page in Category:Wikipedia articles with possible homoglyphs and tag the suspect word(s) inline with a new inline template, allowing vandal fighters to deal with the issue in an organized fashion, attempting to reduce the category to empty and the transclusion list of the template to none. Mathglot (talk) 20:21, 3 December 2024 (UTC)
Perhaps an edit filter could be used (I'm not good with code so I have no clue if that would be feasible). It probably shouldn't disallow the edits, but tagging might be nice. Gaismagorm (talk) 01:26, 4 December 2024 (UTC)
Yes, an edit filter is another possibility. It could alert you in the same way that the {{Alert}} template warns you to check the user's page history and logs when you hit Save for the first time, but then lets you save it, if you click Save a second time.
To me, the interesting (and non-obvious) part of any automated detector task is in defining exactly what you want to flag, which requires a heuristic of some sort, which will inherently have the standard precision and recall tension between wanting to catch as many genuine cases as possible, while minimizing false positives. Seraphimblade is probably wrestling with that issue right now, and opening up discussion about what the heuristic should be may help. For example, ideally we wouldn't want it to tag this page (a false positive here would not be a disaster however) although it would be good if it tagged the intentional homoglyph test-word highlighted by the Diff program in this diff. It's not a trivial task to define what exactly you want to tag. Mathglot (talk) 07:55, 4 December 2024 (UTC)
I think at least to start, especially if we're not doing anything like disallowing or auto-reverting anything but just flagging it, a certain number of false positives are acceptable. So, yes, you might see some false positives in the midst of some words written in Greek, Cyrillic, whatever have you, but presumably most people will know "That's not malicious." There's also the question of vandals learning to game any heuristic there is, so if, for example, we say "Don't flag a character if it's surrounded by other non-ASCII characters", vandals could use several in a row to avoid tripping the detection. So, certainly not an easy question, and I doubt there's a perfect solution that will result in no false positives or negatives. The question is more whether false positives or false negatives are more tolerable. Seraphimblade Talk to me 08:03, 4 December 2024 (UTC)
I would say false positives. I also put the question to an LLM; see the subsection below. Mathglot (talk) 09:40, 4 December 2024 (UTC)
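Illustrative note: the list-based flagging idea discussed in this thread could look roughly like the Python sketch below. The `CONFUSABLES` table is a tiny hand-picked sample for illustration only (it is not the linked codebox list), and `suspicious_words` is a hypothetical name. The rule used here is one possible heuristic: flag a word only if it mixes at least one ASCII letter with at least one known lookalike.

```python
import re

# Tiny illustrative table: lookalike character -> the ASCII letter it imitates.
# A real list would be far longer.
CONFUSABLES = {
    "\u03A4": "T",  # Greek capital Tau
    "\u0422": "T",  # Cyrillic capital Te
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043E": "o",  # Cyrillic small o
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
}

WORD = re.compile(r"\w+")

def suspicious_words(text: str) -> list[tuple[str, str]]:
    """Return (word as written, ASCII reading) for each mixed-script word."""
    hits = []
    for word in WORD.findall(text):
        has_ascii = any(ch.isascii() and ch.isalpha() for ch in word)
        has_lookalike = any(ch in CONFUSABLES for ch in word)
        if has_ascii and has_lookalike:
            reading = "".join(CONFUSABLES.get(ch, ch) for ch in word)
            hits.append((word, reading))
    return hits

print(suspicious_words("\u03A4he result is a systematic replacement"))
```

Under this rule a word written entirely in Greek or Cyrillic is left alone, which keeps false positives down but also means a vandal who replaces every letter of a word would slip through, as noted above.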

@Narky Blert:, your link-everything to see what turns red is a great idea, and suggests a technique that could be automated via template or other tools. In templating, there is the #ifexist parser conditional, which implies that Lua and Toolforge tools would have access to similar functionality, although I'm not familiar with exactly how they do it. Perhaps other tools might be able to be designed, based on finding "unexpected" byte count changes due to the UTF-8 issue regarding the number of bytes to represent Latin vs. non-Latin characters. (edit conflict) Mathglot (talk) 19:49, 3 December 2024 (UTC)

With save disabled (to stymie MOS:OVERLINKers), such a tool might find general use as a rough-and-ready preview spellchecker. (I've had the embarrassing experience of adding a well-crafted well-cited sentence to an article, only for a pagewatcher to correct my glaring typo.) Narky Blert (talk) 16:53, 4 December 2024 (UTC)

Detection heuristic options

Starting this subsection as a place to discuss how to define the detection heuristic; i.e., what do we want to flag, along with questions about tilt towards finding more cases at the risk of more false positives, or the other way. I put the question about defining a heuristic to an LLM, and recorded the response at /Homoglyph detection heuristic. (Feel free to retitle the page or move it to another location.) There's way more there than is desirable or doable for a first effort, but perhaps some of the ideas will be helpful. Some are not, but I hope those will be obvious; in particular, as written, it would find words enclosed in {{lang}} or {{ill}} templates which it should not, but those should be easy exclusions. (They are not immune to homoglyph vandalism, but the heuristic should be defined to handle those cases differently.) Mathglot (talk) 09:57, 4 December 2024 (UTC)

Looking through it some more, I'm not all that impressed. I think we can do better. Mathglot (talk) 11:08, 4 December 2024 (UTC)
I think it might make sense to flag edits that insert homoglyphs into words starting and ending with a standard English character, or an edit adding in a large amount of homoglyphs but that has a byte change of 0 (in order to account for people who would replace entire words with homoglyphs). Gaismagorm (talk) 16:30, 4 December 2024 (UTC)
Also, it should flag edits that create words ending/starting with a homoglyph, but with English characters within. Gaismagorm (talk) 16:34, 4 December 2024 (UTC)
That's one of the tells: it won't have a byte change of 0 if they replace an entire English word with homoglyphs, it will have a byte change of (at least) the number of letters in the word. E.g., changing This to homoglyphs will result in a byte change of +4 or larger, because of the way that UTF-8 works. Mathglot (talk) 06:14, 5 December 2024 (UTC)
ah i see Gaismagorm (talk) 11:24, 5 December 2024 (UTC)
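Illustrative note: the byte arithmetic and the two word-shape rules from the exchange above can be sketched in Python as follows. The `LOOKALIKES` table covers only the four letters of "This" and is an assumption made for this example; `byte_change` and `classify` are hypothetical names.

```python
from typing import Optional

# Illustrative lookalikes for the letters of "This"; each takes 2 bytes in UTF-8.
LOOKALIKES = {
    "T": "\u03A4",  # Greek capital Tau
    "h": "\u04BB",  # Cyrillic small shha
    "i": "\u0456",  # Cyrillic small Byelorussian-Ukrainian i
    "s": "\u0455",  # Cyrillic small dze
}
FAKE = set(LOOKALIKES.values())

def byte_change(before: str, after: str) -> int:
    """Difference in UTF-8 byte length between two strings."""
    return len(after.encode("utf-8")) - len(before.encode("utf-8"))

def is_ascii_letter(ch: str) -> bool:
    return ch.isascii() and ch.isalpha()

def classify(word: str) -> Optional[str]:
    """Label a word that mixes ASCII letters with known lookalikes."""
    has_fake = any(ch in FAKE for ch in word)
    has_ascii = any(is_ascii_letter(ch) for ch in word)
    if not (has_fake and has_ascii):
        return None
    if is_ascii_letter(word[0]) and is_ascii_letter(word[-1]):
        return "lookalike inside"
    if word[0] in FAKE or word[-1] in FAKE:
        return "lookalike at edge"
    return None

whole = "".join(LOOKALIKES[ch] for ch in "This")  # every letter replaced
print(byte_change("This", whole))  # 4
print(classify("T\u04BBis"), classify("\u03A4his"), classify(whole))
```

A fully replaced word has no ASCII letters left, so the two shape rules return nothing for it; in this sketch it is the byte change of +4 that gives it away.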

If this discussion results in a usable tool of some sort, it might be useful to log occurrences of individual cases presented to users and the subsequent choice they made (not-change/change,and before-after). Those results could then be used to refine the tool further. Mathglot (talk) 11:15, 4 December 2024 (UTC)