
User talk:The Earwig/Archive 18

From Wikipedia, the free encyclopedia
Archive 15 | Archive 16 | Archive 17 | Archive 18

sigma.toolforge.org

Looks like toolforge:sigma got shut down in the Grid Engine deprecation (see phab:T320041). User:Σ is inactive, and you're the only other listed maintainer. Are you planning to migrate it, or should I start trying to find someone to help? AntiCompositeNumber (talk) 00:42, 21 December 2023 (UTC)

@AntiCompositeNumber: Ah. No, the timeline's been so protracted, I haven't been actively following things and didn't know this was happening today. (The date in my mind was early next year.) I could probably do it, but certainly can't allocate time right now to immediately fix this. — The Earwig (talk) 03:27, 21 December 2023 (UTC)
Yeah, they started shutting down tools where maintainers hadn't requested more time today. The Grid won't be shut down completely until February though. I've left a note on the phab task asking for the tool to be un-disabled in the meantime. AntiCompositeNumber (talk) 03:43, 21 December 2023 (UTC)
Thanks! — The Earwig (talk) 03:44, 21 December 2023 (UTC)
Hi, I'm available today or tomorrow and would have time to fix this if it is possible to add me as a co-maintainer. I might need some time to familiarize with the infra though, as it looks like the tool isn't open source. 0xDeadbeef→∞ (talk to me) 04:02, 21 December 2023 (UTC)
Thanks for volunteering, 0xDeadbeef! I've added you as a co-maintainer. There's supposed to be a code repository but it must've disappeared (any idea where that ended up, Lego?). The active code is in ~/www/python/src and possibly other places; there are local changes not in sync with the git repo. Feel free to ping if you have any questions, though honestly, beyond what I just said, I probably know as much as you do about this. — The Earwig (talk) 04:10, 21 December 2023 (UTC)
The repository is there, it's just marked as private. It's up to date with what's on Toolforge, aside from all the uncommitted changes that is. Probably best to push the repository to Wikimedia GitLab tbh. Legoktm (talk) 04:25, 21 December 2023 (UTC)
I just did, at https://gitlab.wikimedia.org/toolforge-repos/sigma 0xDeadbeef→∞ (talk to me) 05:09, 21 December 2023 (UTC)
Btw, has the "AFD Stats" page at https://sigma.toolforge.org/afdstats always been like that? 0xDeadbeef→∞ (talk to me) 06:41, 21 December 2023 (UTC)
Besides the weird afd stats page, I've restored the others and they seem to be running fine. Lowercase sigmabot III's two daily jobs have been converted to use the new framework. Let me know if there are any other errors. 0xDeadbeef→∞ (talk to me) 07:13, 21 December 2023 (UTC)
@0xDeadbeef: Thanks a bunch! I don't think AFD Stats has always been broken, but people are mostly using https://afdstats.toolforge.org/ now, so it's not a priority to fix. Maybe I can take a look at that myself later. I also noticed the main page at https://sigma.toolforge.org/ still displays the 410 Gone error, though the individual tools are fine; did we have an index page before that disappeared? Scratch that, just some bad caching on my end. All good. — The Earwig (talk) 14:02, 21 December 2023 (UTC)
Well...seems like the afdstats tool is also still on the grid, c.f. https://github.com/enterprisey/afdstats/pull/27. Ping @Enterprisey! Legoktm (talk) 07:00, 22 December 2023 (UTC)

The Signpost: 24 December 2023

A solstice greeting

❄️ Happy holidays! ❄️

Hi Ben! I'd like to wish you a splendid solstice season as we wrap up the year. Here is an artwork, made individually for you, to celebrate. It was great to meet you in Toronto, and looking forward to collaborations in the coming year! Take care, and thanks for all you do to make Wikipedia better!
Cheers,
{{u|Sdkb}}talk
Solstice Celebration for The Earwig, 2023, DALL·E 3. (View full series) Note: The vibes are winter solsticey. If you're in the southern hemisphere, oops, apologies.

{{u|Sdkb}}talk 07:06, 24 December 2023 (UTC)

Thanks very much, Sdkb! Great meeting you as well. All the best to you in the new year. — The Earwig (talk) 20:30, 24 December 2023 (UTC)

Merry Christmas!

Hello, The Earwig! Thank you for your work to maintain and improve Wikipedia! Wishing you a Merry Christmas and a Happy New Year!
Chris Troutman (talk) 23:15, 24 December 2023 (UTC)

Spread the WikiLove and leave other users this message by adding {{subst:Multi-language Season's Greetings}}

Copyvio tool is down

Hello Ben. Sorry to bother you, but the copyvio tool is down; it's been down for about an hour and a half with 504 gateway timeout errors. Any help appreciated. Thanks, — Diannaa (talk) 16:56, 23 December 2023 (UTC)

Thanks; I've noticed things being a little spotty over the past couple weeks, but haven't identified a cause yet (i.e. no single culprit for increased usage). I'll continue to keep an eye out. — The Earwig (talk) 18:59, 23 December 2023 (UTC)
Sorry to bother you today of all days, but the tool is suffering outages again, and has currently been down for an hour and a half. Thanks, — Diannaa (talk) 17:29, 25 December 2023 (UTC)

Administrators' newsletter – January 2024

News and updates for administrators from the past month (December 2023).

Administrator changes

added Clovermoss
readded Dennis Brown
removed

Arbitration

Miscellaneous


The Signpost: 10 January 2024

User:Reports bot

Hi Earwig, I am enquiring about User:Reports bot and its task to update Wikipedia:WikiProject Women in Red/Metrics. There is a proposal to update the WikiProject banner for this project and I'm just checking that it won't disrupt the work of the bot? Best regards — Martin (MSGJ · talk) 22:33, 18 January 2024 (UTC)

Hey MSGJ, I don’t see any issue with this. The bot is flexible about the page contents, provided its Reports bot variable comments on the individual metric pages are preserved. — The Earwig alt (talk) 22:44, 18 January 2024 (UTC)
Thanks. Not planning to change that page itself but only the banner {{WIR}} used to tag relevant pages within the scope of the project. It was just in case your bot was relying on any specific template or categories to find these pages. — Martin (MSGJ · talk) 09:01, 19 January 2024 (UTC)

Temporary Password

I am User:Wxao Zesty. I am requesting a temporary password to my email, since the last one did not go through. 216.176.69.228 (talk) 20:02, 19 January 2024 (UTC)

The Signpost: 31 January 2024

Administrators' newsletter – February 2024

News and updates for administrators from the past month (January 2024).

CheckUser changes

removed Wugapodes

Interface administrator changes

removed

Guideline and policy news

  • An RfC about increasing the inactivity requirement for Interface administrators is open for feedback.

Technical news

  • Pages that use the JSON contentmodel will now use tabs instead of spaces for auto-indentation. This will significantly reduce the page size. (T326065)

Arbitration

  • Following a motion, the Arbitration Committee adopted a new enforcement restriction on January 4, 2024, wherein the Committee may apply the 'Reliable source consensus-required restriction' to specified topic areas.
  • Community feedback is requested for a draft to replace the "Information for administrators processing requests" section at WP:AE.

Miscellaneous


Using The Wikipedia Library for copyvio detection

Hello. I noticed that large chunks of this section of herbicide are copied directly from this source (you'll need to log in), but the copyvio detector doesn't pick it up: [1]. I can't find a tool to show it nicely, but it is especially obvious if you look at the original diff: [2]. Presumably it isn't detected because the tool can't access the full text? I just wondered whether you'd considered linking up the detector with WP:TWL so that it can check the full text? Admittedly, I am not sure whether the publishers permit automated access, but you would think that they would like us to be checking whether their copyright is being violated! @Samwalton9 (WMF): just in case they can add anything. SmartSE (talk) 10:29, 19 December 2023 (UTC)

@Smartse It's an interesting idea! I don't think we could do anything immediately, but if it would be feasible/helpful we could initiate a conversation with one or more of the library's partners about this. Perhaps EBSCO, given that they're our search provider? I'm not sure on the details of how this would work. Samwalton9 (WMF) (talk) 12:56, 19 December 2023 (UTC)
Hey Smartse. I'm with Samwalton9 that this would be really cool to support, but I'd be very surprised if TWL's partners would be willing to open up a service to us that would enable the copyvio detector to check content programmatically. Initiating a conversation couldn't hurt, though. — The Earwig (talk) 03:56, 21 December 2023 (UTC)
@The Earwig It's not impossible to imagine - TWL's partners are often concerned that WP editors are going to be copying content, so being able to say "we want to make absolutely sure that's not happening" could be seen quite positively. Would EBSCO be the right organisation, do you think, since they run (and provide us with) EBSCO Discovery Service? Samwalton9 (WMF) (talk) 09:51, 21 December 2023 (UTC)
@Samwalton9 (WMF): I was initially thinking of just searching the sources cited in the article. Apparently, most of the full texts can be accessed by appending the DOI to https://doi-org.wikipedialibrary.idm.oclc.org/ so it shouldn't be too difficult to programmatically access the full text (notwithstanding the authentication and any rate-limiting), and then the text could be compared as the tool already does. I'm not familiar with EBSCO, but I imagine that using that would be more complicated, as you would need to take chunks of the article, query the search engine repeatedly and then check full texts that could be matches. I also posted about this at meta:Talk:CopyPatrol#Can_the_tool_access_paywalled_full_texts? and the ithenticate service can detect it in a new edit - see the hit for link.springer.com - even though the full text is paywalled, so maybe using that service in this tool could be an option as well? It seems like that tool does a pretty good job of catching new copyvios but we are less capable of detecting old instances. SmartSE (talk) 12:26, 21 December 2023 (UTC)
Checking the DOIs of sources directly cited would be a good start and wouldn't require us to get a search engine working, so we could try that (though the full scope is of course somewhat limited). If I'm to do that through TWL's proxy, we'd need to get the bot access somehow and confirm this usage is within their terms. @Samwalton9: I'm also unfamiliar with EBSCO and from skimming the linked pages it's not clear to me if they offer a search API that I would be able to use for what SmartSE described (query the search engine repeatedly given text snippets from the article and receive results that enable me to get the full text of the source for comparison). I see discussion of end-user search tools, but not an API. One change to the copyvio detector I am sure we will need to make is not showing the user the full text of the suspected source, only the copied snippets. — The Earwig (talk) 14:19, 21 December 2023 (UTC)
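The DOI-appending step described above can be sketched as follows. This is only an illustration of building the URL, assuming the proxy prefix quoted in the thread; the helper name twl_doi_url is hypothetical, the DOI in the usage note is made up, and authentication, rate-limiting and the proxy's terms of use are not addressed.

```python
from urllib.parse import quote

# Proxy prefix as quoted in the discussion above.
TWL_DOI_PREFIX = "https://doi-org.wikipedialibrary.idm.oclc.org/"


def twl_doi_url(doi: str) -> str:
    """Build a proxied URL by appending a DOI to the prefix (hypothetical helper)."""
    doi = doi.strip()
    # Accept the common ways a DOI is written in citations.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Keep "/" literal and percent-encode any other unsafe characters.
    return TWL_DOI_PREFIX + quote(doi, safe="/")
```

For a made-up DOI such as 10.1000/xyz123, twl_doi_url("doi:10.1000/xyz123") gives the prefix followed by 10.1000/xyz123. Fetching the page behind that URL would still need a logged-in session, which this sketch leaves out.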
@The Earwig Is this a helpful link? Once we've confirmed this is a viable and useful approach I'd be happy to bring this up with them. Samwalton9 (WMF) (talk) 16:07, 8 January 2024 (UTC)
@Samwalton9 (WMF): Probably. I can't say for sure (the API documentation requires an account, and I still don't know the terms of use), but it looks like the right direction. Thanks! — The Earwig (talk) 17:01, 8 January 2024 (UTC)
Alright, I'll get an initial conversation kicked off with them and see how feasible this is. I'll be in touch! Samwalton9 (WMF) (talk) 10:33, 12 January 2024 (UTC)
@The Earwig Good news! We met with EBSCO today and they're enthusiastic about the idea. Their main question was around request load - do you have any data/estimates about how many daily or monthly requests Copyvios makes?
The other topic we talked about was how pulling the text through would work (or not). EDS has access to all these databases to index for searching, but not necessarily for displaying full text. Even if they did, that would be for subscribing customers so there would be some concern about pulling the full text through to display publicly in the tool. It might be the case that they could return some information about finding a match in a source, but perhaps not display the actual matched text directly. That's something we'll need to get more clarity on with them, but perhaps even if that is the case we could make some UI changes to highlight that a match was found in EDS, and the relevant URL, but not display the matching text? Happy to think that through with you.
If this still sounds feasible to you I'd be happy to copy you into our email thread so you could ask any more specific questions you might have. Samwalton9 (WMF) (talk) 16:25, 5 February 2024 (UTC)
@Samwalton9 (WMF): Sounds good, thanks for the update! We can definitely indicate a match without including the full text if needed. There is already some support in the tool for this with the Turnitin option.
Regarding request rate, the tool checks about 1,200 articles per day or 36,000 per month. I'd be surprised if that's too much for them, but we could make the new functionality opt-in like Turnitin, so users have to check a box to use EDS which will drastically reduce the rate (the Turnitin feature is used only 100 times/day). — The Earwig (talk) 16:54, 5 February 2024 (UTC)
@The Earwig Thanks for the data! I remember reading somewhere that the tool makes multiple requests per article check, is that right? I wonder if you have a sense of how many actual API requests are being made? Samwalton9 (WMF) (talk) 13:05, 6 February 2024 (UTC)
@Samwalton9 (WMF): Yes, that's right – up to 8 per article, depending on page size, but again, configurable. Altogether for Google Search the number is under 10k for most days. — The Earwig (talk) 14:41, 6 February 2024 (UTC)
Great, thanks! I've cc'd you on an email. Samwalton9 (WMF) (talk) 15:36, 6 February 2024 (UTC)
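As a rough check of the request-load figures quoted in this thread, the arithmetic can be written out as below. The constants come from the comments above; treating opt-in EDS usage as similar to current Turnitin usage is an assumption made here for illustration, not something the thread establishes, and the function name is hypothetical.

```python
ARTICLES_PER_DAY = 1_200        # "about 1,200 articles per day"
MAX_QUERIES_PER_ARTICLE = 8     # "up to 8 per article"
TURNITIN_CHECKS_PER_DAY = 100   # opt-in Turnitin usage quoted above


def daily_query_ceiling(checks_per_day: int, max_queries_per_check: int) -> int:
    """Upper bound on search queries per day if every check used the maximum."""
    return checks_per_day * max_queries_per_check


# Every check hitting the search engine: 1,200 * 8 = 9,600 queries per day.
all_checks = daily_query_ceiling(ARTICLES_PER_DAY, MAX_QUERIES_PER_ARTICLE)

# Assumption: opt-in usage similar to Turnitin's: 100 * 8 = 800 queries per day.
opt_in = daily_query_ceiling(TURNITIN_CHECKS_PER_DAY, MAX_QUERIES_PER_ARTICLE)

# A 30-day month at the quoted rate: 1,200 * 30 = 36,000 articles.
monthly_articles = ARTICLES_PER_DAY * 30
```

The 9,600 ceiling is consistent with the "under 10k for most days" figure given for Google Search, and an opt-in checkbox would cut the ceiling by roughly a factor of twelve under the stated assumption.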

The Signpost: 13 February 2024

lowercase sigmabot III

Hi! I reached out to Σ by email about lowercase sigmabot III, which had not been archiving anything (with the exceptions of AN and ANI) since last week. They responded (by email) saying "Please reach out to Earwig for this issue. The crontab was erased somehow, which means that it's no longer running the bot on its schedule. I'm not sure what changed but I think he will know where to look" and that "For the time being I just kicked it off manually." Thank you for any insight you might have! HouseBlaster (talk · he/him) 15:07, 28 February 2024 (UTC)

Thanks for letting me know. I'll take a look at this. — The Earwig (talk) 15:29, 28 February 2024 (UTC)
It's not clear what the original issue was, but I've jiggled things a bit, so if we're lucky it won't happen again. — The Earwig (talk) 16:29, 28 February 2024 (UTC)
Thank you! HouseBlaster (talk · he/him) 17:05, 28 February 2024 (UTC)

Administrators' newsletter – March 2024

News and updates for administrators from the past month (February 2024).

Guideline and policy news

Technical news

  • The mobile site history pages now use the same HTML as the desktop history pages. (T353388)

Miscellaneous


The Signpost: 2 March 2024

Revdel-responder

Hi, it could be a WP:THURSDAY thing but the revdel-responder script seems to have a problem today. I keep getting a message "Sorry! revdel-responder failed to parse the page content". I'm not good enough at interpreting the console to work out what's gone wrong. Nthep (talk) 11:44, 14 March 2024 (UTC)

Not sure if anything has happened during the day, but it seems to have resolved itself. Nthep (talk) 18:49, 14 March 2024 (UTC)
Thanks for letting me know, Nthep. It's possible that was some intermittent error. If you run across it again, let me know the page, or send me the text from the console (right click -> Inspect -> "Console" tab, there should be a line starting with "Error while parsing page content"). — The Earwig (talk) 03:48, 15 March 2024 (UTC)

The Signpost: 29 March 2024

Administrators' newsletter – April 2024

News and updates for administrators from the past month (March 2024).

Administrator changes

removed

Guideline and policy news

Technical news

  • The Toolforge Grid Engine services have been shut down after the final migration process from Grid Engine to Kubernetes. (T313405)

Arbitration

Miscellaneous

  • Editors are invited to sign up for The Core Contest, an initiative running from April 15 to May 31, which aims to improve vital and other core articles on Wikipedia.

request to tag article talk pages within scope of Women's Basketball

Hi The Earwig,

I would like to request that talk pages for articles within the scope of WP:WBB be tagged with both

WikiProject Basketball: Women's (Unassessed)
This article is within the scope of WikiProject Basketball, a collaborative effort to improve the coverage of Basketball on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article is supported by the Women's basketball task force.

WikiProject Women's sport: Basketball (Unassessed)
This article is within the scope of WikiProject Women's sport (and women in sports), a WikiProject which aims to improve coverage of women in sports on Wikipedia. For more information, visit the project page, where you can join the project and/or contribute to the discussion.
This article has not yet received a rating on Wikipedia's content assessment scale.
This article has not yet received a rating on the project's importance scale.
This article is supported by the Women's basketball task force.

Do you need additional information and/or should I post this request somewhere else?

Thank you, Hmlarson (talk) 23:20, 5 March 2024 (UTC)

Hi Hmlarson, you'll need to define what "within the scope of WP:WBB" means in order to run the bot. — The Earwig (talk) 01:52, 6 March 2024 (UTC)
Can you do any article tagged with a subcategory of Category:Women's basketball? Hmlarson (talk) 18:58, 11 March 2024 (UTC)
@Hmlarson: OK. Subcats are sometimes tricky because of unexpected relationships (a subcategory of a subcategory a few levels deep sometimes has little relationship with the original category), but I reviewed this situation, and it looks mostly fine.
I'll have the bot generate a list of pages it would tag, and we can double-check those. It'll take me a few days.
Separately, there is a requirement that you mention on the WikiProject talk page that you want to run this tagging job, in case there are any objections.
Thanks! — The Earwig (talk) 03:43, 12 March 2024 (UTC)
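The subcategory concern mentioned above (pages several levels down can drift away from the original topic) is often handled by capping the traversal depth. Below is a minimal sketch on a toy in-memory category graph; the category names, the graph and the function name are hypothetical, and this is not a description of how the bot actually selects pages.

```python
from collections import deque


def subcategories_within_depth(graph, root, max_depth):
    """Breadth-first walk of a category graph, stopping at max_depth levels.

    Returns a dict mapping each reached category to its depth below root.
    """
    seen = {root: 0}
    queue = deque([root])
    while queue:
        category = queue.popleft()
        depth = seen[category]
        if depth == max_depth:
            continue  # do not descend below the depth cap
        for child in graph.get(category, []):
            if child not in seen:  # also guards against category loops
                seen[child] = depth + 1
                queue.append(child)
    return seen


# Hypothetical toy graph: parent category -> list of subcategories.
toy_graph = {
    "Women's basketball": ["Women's basketball players", "Women's basketball teams"],
    "Women's basketball players": ["Players by country"],
    "Players by country": ["Sportspeople by city"],  # starting to drift off-topic
}

reached = subcategories_within_depth(toy_graph, "Women's basketball", max_depth=2)
```

With a cap of 2, "Sportspeople by city" is never reached. A generated list like this would still need the manual double-check described above, since a depth cap only limits drift and does not remove it.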
Thank you. Sounds good. I've posted the notice here. Hmlarson (talk) 17:18, 12 March 2024 (UTC)
Hi The Earwig - Any chance you can provide an ETA on this request? Thank you! Hmlarson (talk) 20:01, 1 April 2024 (UTC)
So sorry for the wait here, I had to make some code changes to handle tagging both banners and a few personal things came up – I have some free time now and will get back to you in a few days. — The Earwig (talk) 20:19, 6 April 2024 (UTC)

The Signpost: 25 April 2024

Copyvio Detector not working well

Hello Ben, hope you are well. I just thought I'd let you know that the Copyvio Detector is not functioning all that well the last couple of days, timing out on just about every comparison. ("The URL https://www.dvfu.ru/en/about/ timed out before any data could be retrieved", for example.) It even times out on simple, short webpages of the type that it's usually able to access easily. Any assistance appreciated. Thanks, — Diannaa (talk) 13:34, 1 May 2024 (UTC)

Hi Diannaa. We (Chlod and I) did just block a misbehaving bot last night, so that would account for some extra load, but it doesn't totally explain the issue. That one URL is working for me at the moment, only taking a couple seconds. I will investigate further. — The Earwig (talk) 15:27, 1 May 2024 (UTC)

Administrators' newsletter – May 2024

News and updates for administrators from the past month (April 2024).

Administrator changes

readded Nyttend
removed

Bureaucrat changes

removed Nihonjoe

CheckUser changes

readded Joe Roe

Oversight changes

removed GeneralNotability

Guideline and policy news

Technical news

  • Partial action blocks are now in effect on the English Wikipedia. This means that administrators have the ability to restrict users from certain actions, including uploading files, moving pages and files, creating new pages, and sending thanks. T280531

Arbitration

Miscellaneous


The Signpost: 16 May 2024

Administrators' newsletter – June 2024

News and updates for administrators from the past month (May 2024).

Administrator changes

readded Graham Beards
removed

Bureaucrat changes

removed

Oversight changes

removed Dreamy Jazz

Guideline and policy news

Technical news

  • The Nuke feature, which enables administrators to mass delete pages, will now correctly delete pages which were moved to another title. T43351

Arbitration

Miscellaneous


WikiProject Banner Tagging

Hi, @The Earwig! You seem to be the most active operator for one of the Category:WikiProject tagging bots so I hope this isn't a bother. I'm overseeing the newly created WP:WikiProject AfroCreatives now and would like to disseminate {{WikiProject AfroCreatives}} through our targeted articles in the AfroCreatives categories with all subcategories included. We are willing to make use of auto assessment and to inherit it from existing WP banners too. The template already accommodates this. I would very much appreciate your help. Assem Khidhr (talk) 06:12, 5 May 2024 (UTC)

Hi Assem Khidhr, my apologies for not replying to this sooner, but as you probably guessed by my lack of response I don't have the free time to work on this task at the moment. Sorry. — The Earwig (talk) 04:23, 7 June 2024 (UTC)
Best of luck, @The Earwig. I was since granted AWB authorization and managed to add those banners myself. Thanks! Assem Khidhr (talk) 15:51, 7 June 2024 (UTC)

Copyvio detector not working

Hello Ben, sorry to bother you so early and on a Sunday. The Copyvio detector seems unable to perform any comparisons at the moment. It sits and spins for three minutes before timing out ("The URL https://www.bbc.com/news/articles/cz55y6k0p5go timed out before any data could be retrieved.") Any assistance appreciated, as we have a lot of reports at CopyPatrol, a lot more than usual, and we will not be able to assess them without this tool. Thank you! — Diannaa (talk) 11:48, 2 June 2024 (UTC)

Update: It seems to be functioning normally now. Thank you! — Diannaa (talk) 14:08, 2 June 2024 (UTC)

@The Earwig: It's down again as of 6 June 2024. It takes a long time to reach, and then after entering the page title and clicking submit it runs for several minutes and ends with 0 errors. I've tried this with other articles that got higher violations before. Thanks for any help you can provide. Greg Henderson (talk) 09:06, 2 June 2024 (UTC)

Today, getting the error message: "An error occurred while using the search engine (Google Error: HTTP Error 429: Too Many Requests). Note: there is a daily limit on the number of search queries the tool is allowed to make. You may repeat the check without using the search engine." Greg Henderson (talk) 23:14, 7 June 2024 (UTC)

(talk page watcher) @Greghenderson2006: This happens when we've reached our daily quota with Google. Unfortunately, the copyvio detector can only handle up to around 1,250 a day. You'll need to try again after a few hours or so. In the meantime, you can try using the copyvio detector without search engine checks, which will still work. Chlod (say hi!) 01:07, 8 June 2024 (UTC)

The Signpost: 8 June 2024

lowercase sigmabot III not archiving properly

For about the last three days, lowercase sigmabot III has only been archiving the Administrator's noticeboards and nothing else. Somebody mentioned that you gave it a good kick the last time it went on the fritz, so I will go ahead and notify you. Safiel (talk) 16:37, 29 April 2024 (UTC)

Thanks for the notice. I've kicked it again and added a workaround in case this issue happens again. — The Earwig (talk) 04:29, 30 April 2024 (UTC)
Hi, hope you're well. I think the bot is down again. ~~ AirshipJungleman29 (talk) 11:36, 12 June 2024 (UTC)
Thanks, AirshipJungleman29. Different issue from last time. I think I've fixed it. — The Earwig (talk) 03:01, 13 June 2024 (UTC)

Copyvio detector constantly timing out

Hello again Ben! I am having issues with the Copyvio detector, finding it almost impossible to get it to generate a report. "The URL http://weaponsystems.net/weaponsystem/CC02%20-%20PTZ89.html timed out before any data could be retrieved" for example. Frequently it goes down completely as well. Any assistance appreciated. Thanks, — Diannaa (talk) 11:00, 13 June 2024 (UTC)

Sorry, there aren't any quick fixes for this. I am working on it. — The Earwig (talk) 16:06, 13 June 2024 (UTC)
Actually, I’ve found a partial fix to improve performance. Let’s see if it helps. — The Earwig alt (talk) 17:19, 13 June 2024 (UTC)
It's much better, thanks! Fixing copyvio is tedious enough lol. — Diannaa (talk) 23:16, 13 June 2024 (UTC)

The Signpost: 4 July 2024

Administrators' newsletter – July 2024

News and updates for administrators from the past month (June 2024).

Administrator changes

added
removed

Technical news

Miscellaneous


Copyvios + Arc (Also, RichBot)

Hi Ben,

I've started using the Arc browser, for some reason whenever I try and access Copyvios on it, I get an Internal Server Error. Trying the same URL in Edge works fine. Not sure where the bug is there, but hopefully you can find it.

Also, I see above there still seems to be issues regarding usage, did you need me to tone RichBot down a bit? - RichT|C|E-Mail 17:10, 28 June 2024 (UTC)

Hey Rich, sorry I took a bit to reply. This is my first time hearing about Arc and I don't really feel like creating an account to test, so I can't confirm on my end. Are you sure it's an Internal Server Error or may it be a 403 Forbidden? (We may have inadvertently blocked its user agent as a crawler, which would give a 403, but I don't see anything in our block list that looks like it or Chrome [except Linux], so I don't know.) This is pretty strange.
Regarding bot usage, there are two main issues the tool's had lately: general downtime and exhausting our Google credits. I've improved the tool's performance a bit so the former is not a major issue now, but we are still frequently exhausting our daily Google quota. I've checked RichBot's usage and recently it's been consuming around 10-20% of our total Google credits. That's not too excessive, but if you could find a way to tone it down a bit without compromising its usefulness, it would be appreciated. — The Earwig (talk) 08:10, 1 July 2024 (UTC)
No worries, I have reduced RichBot to only look at 100 (plus existing CVs) per run, so 200 per day (excluding manual runs). Is there a way we can increase the credits? I don't mind throwing some £ at it if need be - RichT|C|E-Mail 09:31, 1 July 2024 (UTC)
No way that I know of unfortunately; the WMF pays for it, but Google's API terms limit our usage without some kind of special arrangement that I have been unable to get. — The Earwig (talk) 15:25, 1 July 2024 (UTC)
Typical Google lol... ah well, worth a shot - RichT|C|E-Mail 17:52, 1 July 2024 (UTC)
Hey The Earwig. Big fan. Is there a venue where advocacy from affected editors might get us closer to that special arrangement? Firefangledfeathers (talk / contribs) 17:50, 18 July 2024 (UTC)
Hi Firefangledfeathers, thank you. I'm not sure who we could talk to about this, to be honest. My former contact at the WMF no longer works there and it's not clear to me who is responsible for managing the relationship with Google right now. Going the other way, i.e. getting someone in a position of power at Google who could help, might be more fruitful. But that is just speculation; I don't know who specifically that might be. — The Earwig (talk) 06:02, 19 July 2024 (UTC)
Thanks. I don't have any bright ideas. I'll probably go with the low-hanging fruit and post at WP:VPWMF. Firefangledfeathers (talk / contribs) 12:00, 19 July 2024 (UTC)
And it's definitely a 500, 'The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.' - RichT|C|E-Mail 14:07, 1 July 2024 (UTC)
Ah, I think I've figured it out. Could you try now? — The Earwig (talk) 15:36, 1 July 2024 (UTC)
Much better :) Thanks :D - RichT|C|E-Mail 17:51, 1 July 2024 (UTC)

The Signpost: 22 July 2024

Administrators' newsletter – August 2024

News and updates for administrators from the past month (July 2024).

Administrator changes

readded Isabelle Belato
removed

Interface administrator changes

readded Izno

CheckUser changes

removed Barkeep49

Technical news

  • Global blocks may now target accounts as well as IP's. Administrators may locally unblock when appropriate.
  • Users wishing to permanently leave may now request "vanishing" via Special:GlobalVanishRequest. Processed requests will result in the user being renamed, their recovery email being removed, and their account being globally locked.

Arbitration


Earwig's Copyvio Detector

Hello, The Earwig,

I have a question about this editing tool. It seemed like I could run this 20 or more times before I got a notice that I had reached my daily limit. But now, I receive a notice if I just run it a few times. Has this limit been decreased for some reason? I use this tool quite a lot while patrolling drafts and CSD categories so it's sometimes difficult to remember to go back to reexamine some pages the next day when I have reached my daily limit for the current day. Thanks for any insight you can provide. Liz Read! Talk! 20:21, 8 June 2024 (UTC)

Hi Liz. Rest assured this isn't related to your own usage of the tool. The daily limit is shared by all users, and allows for about 1000–2000 pages to be checked per day, so even if you're checking a few dozen, that's not a major contributor to the limit getting reached. We've been noticing this issue more frequently recently (see a few threads above) and we're doing some work to restrict other users of the tool who are actually overusing their share of its resources. I'm hoping to have things back to normal soon. — The Earwig (talk) 04:23, 11 June 2024 (UTC)
I didn't realize that I posted two messages about the same issue. I should have reviewed your talk page before posting my subsequent message. I guess I have a sense of frustration now that I know I'm competing with RichBot for copyright inquiries. Liz Read! Talk! 03:11, 8 August 2024 (UTC)

Earwig returns 0% on url-comparison with clever close paraphrase

Hello. I noticed a {{circular}} tag at Ceteris paribus and ran this URL comparison to find out how much duplication there was, and in what section(s). To my surprise, it came back with 0.0%. However, notice these:

Comparison snippets

From: https://www.masterclass.com/articles/ceteris-paribus-explained#7MlD3BCbNL4NC0BejpGo02

1. Supply chain: Ceteris paribus considers production factors, such as logistics, sourcing, competition, and trends with buyers to determine the price of goods. For example, a bread seller observes the costs of the ingredients, labor, packaging, and distribution, in addition to competitors, economic inflation, and consumer trends. Ceteris paribus stipulates that if other factors remain the same, a decrease in the supply of bread will cause prices to rise.

2. The law of supply and demand: In the law of demand, buyers demand less of an economic good when prices are higher. The law of supply says that sellers will supply more of an economic good when prices are higher. The interaction of these two laws determines the actual market price and volume of goods. Ceteris paribus identifies, isolates, and tests the impact of an independent variable that would affect these two laws and the causal factors in the market supply and prices.

3. Gross domestic product: Economists use ceteris paribus to study the GDP, assuming that variables remain fixed to determine the effect in the money market.

4. Interest rates: If the interest rates increase, the independent variable, then the demand for debt goes down as the cost of borrowing increases, the dependent variable.

5. Minimum wage: Economists use ceteris paribus to determine the potential effects of a minimum wage increase, including the possible outcome of fewer jobs available if companies must pay employees more.


From Ceteris paribus#Applications rev. 1238986793:

The concept of ceteris paribus is crucial for economists and can be applied in researching:

  1. Supply chain. Ceteris paribus considers aspects of production, that being competition in the market, production costs, inflation, and consumer trends to conclude pricing of goods, imposing that keeping the aspects of production constant, minimising supply will adjust prices to increase.[1]
  2. Law of supply and demand. The law of demand states that, when prices rise the demand of goods fall, whilst the law of supply dictates that as prices rise sellers are more willing to supply. When these laws interrelate market prices and supply in the market are determined. Ceteris paribus is used in the law of supply and demand through determining how independent variables will impact the casual factors of prices and supply in the market.[1]
  3. Gross domestic product. Ceteris paribus is used in relation to GDP to determine how the money market will change when variables remain constant.[1]
  4. Interest rates. Through keeping interest rates as the independent variable, as interest rates rise, thus borrowing costs rise forcing a reduction in the demand for debt, that being the dependent variable.[1]
  5. Minimum wage. To define the possible effects of a rise in the minimum wage economists will use ceteris paribus. Possible effects include how wage increases may force employments down.[1]

References

  1. ^ a b c d e "Ceteris Paribus Explained: 5 Economic Uses for Ceteris Paribus". MasterClass. 2021-12-21. Retrieved 2024-06-05.

There is a lot of close paraphrase here, maybe enough to cover their tracks and confuse the detector. I remember glancing at Andrei Broder's shingle-based detection paper eons ago (might be this one) and I don't know how yours works, but if it is shingle-based, would it be feasible to add a new param to the input form, or in the settings, maybe in an 'advanced' section, to set the shingle size? In a case of paraphrase like this one, where the information is clearly copied but words are shifted around in the sentences, a shorter shingle size might do a lot better at detecting the similarities. This might kill processing time in the web search version, so maybe would only work when the 'url' radio button was selected, but still could be pretty useful for cases like that, and might make a great tool for assigning a measurable value to close paraphrase, which afaik we do not have currently, and is all very hand-wavy. Thanks, Mathglot (talk) 19:32, 6 August 2024 (UTC)

It does slightly better (4.8%) specifying revision id 1151114395. What is going on here? Mathglot (talk) 20:09, 6 August 2024 (UTC)
Okay, just noticed that in both of those revisions, Earwig doesn't appear to see past the first short section of the web page, so the paraphrased section I am addressing doesn't appear to be visible to Earwig, or at least, it isn't displaying it on the comparison page, for some reason, if you scroll down. Mathglot (talk) 21:59, 6 August 2024 (UTC)
That's exactly it, Mathglot. The website loads its content through JavaScript so it's not available to the tool. There isn't an easy workaround for this, but there are some options I could try further in the future. Since the content doesn't show up in the comparison view as part of the source, my hope is that people will figure out what's going on, as you were able to. — The Earwig (talk) 00:23, 7 August 2024 (UTC)
Thanks for that. Even if it could see it, I wonder if it would come up with any kind of rating, due to the paraphrase? Not sure what kind of test bed you use, but if you could copy the MasterClass page and save it offline locally (post-js, or just scraping the rendered page manually and saving it) and run Earwig against that file, I'd be interested to see what it would come up with. And if you use shingling and it's parametrizable, whether the rating would change if you reduced the shingle size. Mathglot (talk) 01:14, 7 August 2024 (UTC)
OK, I can do a quick experiment of that, Mathglot. The tool does use shingling, actually. I haven't seen this paper and independently came up with a similar algorithm many years ago. Internally I call the shingle size the degree, and I've exposed that as a query-string-only parameter if you would like to play with it.
I manually copied the text to a pastebin. With the tool's default shingle size of 5 words, almost no similar text is found, and the similarity score is 5.7%. With size 3, it's 38.3%. With size 2, it's 67.1%. At this point a lot of the similar content is trivial ("is a", "in the", "of the"), so the odds of a false positive are much higher, though it does at least highlight some interesting similarities, too.
The tool doesn't have a way of identifying more unique common phrases. If we could down-weigh "is a" but up-weigh, say, "wage economists", we could lower the default shingle size and get more sensitive results. The default size was actually 3 several years ago, but I raised it because the false positive rate was just a bit too high and it was causing confusion. So there's a delicate balancing act with the current algorithm.
Food for thought. Thanks. — The Earwig (talk) 05:20, 7 August 2024 (UTC)
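For readers following along, the shingle-size experiment above can be illustrated with a minimal sketch. This is not the tool's actual code or scoring formula; the function names, the tokenizer, and the "fraction of article shingles found in the source" score are all assumptions made for illustration. Only the idea of a configurable word-shingle size (the "degree") comes from the discussion.

```python
import re

# Two sentences taken from the snippets quoted earlier in this thread.
SOURCE = ("Economists use ceteris paribus to determine the potential "
          "effects of a minimum wage increase")
ARTICLE = ("To define the possible effects of a rise in the minimum wage "
           "economists will use ceteris paribus")


def shingles(text, degree=5):
    """Return the set of word n-grams ("shingles") of length `degree`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + degree])
            for i in range(len(words) - degree + 1)}


def similarity(article, source, degree=5):
    """Fraction of the article's shingles that also occur in the source.

    Hypothetical scoring, chosen for simplicity; the real tool may
    compute its percentage differently.
    """
    article_shingles = shingles(article, degree)
    if not article_shingles:
        return 0.0
    shared = article_shingles & shingles(source, degree)
    return len(shared) / len(article_shingles)


if __name__ == "__main__":
    for degree in (5, 3, 2):
        print(degree, round(similarity(ARTICLE, SOURCE, degree), 3))
```

With the reworded sentence pair, no 5-word shingle survives the paraphrase, while shorter shingles start to match. Some of those matches are trivial pairs such as "of a", which is the false-positive trade-off described above.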
Oh, that's very thought-provoking, thanks! You could start with a stop-word list, and eliminate those, and there may be lists of bigrams containing stop words. I searched /most common bi-grams with stop words in English/ and repeatedly ran into "tidytext in R", and "NLTK in Python"; also articles like 1, 2. As far as how to down-weigh and up-weigh, TF-IDF is one very standard solution, which works better on a larger corpus or bag of words, which you could accumulate yourself, by just dumping all of the words of each document you come across into a list, and counting later, maybe once a week or month, and recalculating the frequencies, but my understanding is that there is a budget available for Earwig (for the Google API) and it's likely that there is a term frequency list out there somewhere for English, and we could just buy it. (You would only have to do that once in theory, although language does evolve, so maybe once a year?) Then you wouldn't have to build your own bag of words. Your experiment looks really interesting, and I wonder if any of these other ideas would kick it up a level. Mathglot (talk) 04:05, 13 August 2024 (UTC)
This is helpful. Thanks! — The Earwig (talk) 13:22, 13 August 2024 (UTC)
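The down-weighting idea from this thread (stop words, TF-IDF) could look roughly like the sketch below. It is only an illustration under stated assumptions: the corpus, the smoothed IDF formula, and every function name are hypothetical, and nothing here reflects how the tool works today. Each shingle is weighted by the summed inverse document frequency of its words, so a shingle made only of words that appear in every document (such as "is a") contributes nothing to the score.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def build_idf(corpus):
    """Smoothed inverse document frequency for each word in `corpus`.

    Returns (idf_by_word, weight_for_unseen_words). A word present in
    every document gets log(1) = 0.
    """
    n = len(corpus)
    doc_freq = Counter(w for doc in corpus for w in set(tokenize(doc)))
    idf = {w: math.log((1 + n) / (1 + c)) for w, c in doc_freq.items()}
    unseen = math.log(1 + n)  # rarest possible weight, for unknown words
    return idf, unseen


def shingle_weight(shingle, idf, unseen):
    """Weight of one shingle: the sum of its words' IDF values."""
    return sum(idf.get(word, unseen) for word in shingle)


def weighted_similarity(article, source, idf, unseen, degree=2):
    """IDF-weighted share of the article's shingles found in the source."""
    def grams(text):
        words = tokenize(text)
        return {tuple(words[i:i + degree])
                for i in range(len(words) - degree + 1)}

    article_grams = grams(article)
    total = sum(shingle_weight(g, idf, unseen) for g in article_grams)
    if total == 0:
        return 0.0
    shared = article_grams & grams(source)
    return sum(shingle_weight(g, idf, unseen) for g in shared) / total
```

In practice the IDF table would come from a large word-frequency list or an accumulated corpus, as suggested above, rather than from a handful of documents.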

The Signpost: 14 August 2024

EarwigBot might be down

Hello friend. EarwigBot hasn't edited since August 17. I believe it has some daily tasks such as Wikipedia:Bots/Requests for approval/EarwigBot 3, so this is abnormal, right? It might need a nudge :) –Novem Linguae (talk) 12:50, 21 August 2024 (UTC)

Thanks for the ping! The task was active but had gotten stuck somehow. I've restarted it. — The Earwig (talk) 13:39, 21 August 2024 (UTC)
Thanks! I went ahead and boldly signed you up for a bot to alert you if it goes down again. Diff. If undesired, feel free to revert. –Novem Linguae (talk) 18:23, 21 August 2024 (UTC)
Much obliged. — The Earwig (talk) 07:18, 22 August 2024 (UTC)

Administrators' newsletter – September 2024

News and updates for administrators from the past month (August 2024).

Administrator changes

removed Pppery

Interface administrator changes

removed Pppery

Oversighter changes

removed Wugapodes


The Signpost: 4 September 2024

The Signpost: 26 September 2024

Administrators' newsletter – October 2024

News and updates for administrators from the past month (September 2024).


Hello, The Earwig,

I regularly used this tool you created, mostly when patrolling drafts or CSD-tagged articles; I'd probably use it 3 or 4 times a day. When I used it too much, I'd get a message that I was over my limit of how often I could use it. At least that's how I thought things worked. Now, I get this message every time I try to see whether a page is a copyright violation; I have not gotten a successful response to a query in many, many weeks. So, I'm wondering: is this "limit" actually for all users on this platform and not tied to individual editors? Because something odd is going on, and maybe new page patrollers or AFC reviewers are using it for every article they review, if I cannot get even one or two reports on suspicious articles or drafts I've come across. I know with AI, there are ways users can get around copyright restrictions but I still found the tool helpful.

Do you have any idea why it is suddenly no longer available to generate reports? Can you tell me the time of the day when it "resets" so that maybe I could make inquiries then? Or is there any possibility of raising this limit of reports generated? I mean, I'm glad it's become so popular but it has also become unavailable for use for those of us who just want to make a few queries a day. Thank you. Liz Read! Talk! 22:31, 19 July 2024 (UTC)

Hi Liz, truly sorry about the ongoing issues. I'm aware and working on it (see some of the threads above you), with the time I have available. I thought things had improved with the overall performance improvement last month, but it has really just made this particular problem of running out of the search quota much worse. Anyway, I am working on it now.
To answer your questions: yes the quota is shared by all users, and we cannot easily raise it. It's a hard limit enforced by Google that I cannot bypass without some special arrangement. It resets I think around midnight Pacific Time, i.e. Google's time zone.
I think the issue is some bots/automated traffic making too many queries. In the past I have been able to block them or ask them to slow down, but that approach has become less effective lately. So, I will be adding authentication to the tool to make sure only logged in users can use it and I can more accurately identify who is overusing it. I expect to finish that work this weekend and I am hopeful that will solve the issue. If it doesn't, there are other things I can try. — The Earwig (talk) 00:43, 20 July 2024 (UTC)
Update: I am still working on this, but have made progress. — The Earwig (talk) 05:14, 22 July 2024 (UTC)
FYI, I've also run into this issue the last couple of days. I'm assuming you're still working on it, or that life has gotten in the way of you fixing the issue. I dream of horses (Hoofprints) (Neigh at me) 21:20, 30 July 2024 (UTC)
Yes, it's still my current focus with the free time I have. — The Earwig (talk) 00:21, 31 July 2024 (UTC)
Just circling back to see how you responded to my query last month. Still have not successfully submitted a query and gotten a report in several months now. I realize that we are all volunteers so I don't have high expectations of when this issue might be "fixed" as we all have outside lives.
But I didn't realize though that regular editors were competing with bots, that's a battle individual editors can never win so please block those bots, if possible! I don't even see how a bot would be able to handle a copyright violation report and interpret it appropriately. Liz Read! Talk! 03:06, 8 August 2024 (UTC)
To second what @Liz said above, I just tried to run the copyvio tool on a promotional draft, and got the error again. Any progress to report on?
Also, Liz, I think authentication has been added so we aren't competing against bots, at least not as much, per "So, I will be adding authentication to the tool to make sure only logged in users can use it and I can more accurately identify who is overusing it." I dream of horses (Hoofprints) (Neigh at me) 23:48, 25 August 2024 (UTC)
Is there anything other people can do to help with getting the copyvio tool up, or is this something you're going to need to do on your own? I dream of horses (Hoofprints) (Neigh at me) 03:09, 25 September 2024 (UTC)
Hey Liz and I dream of horses. With substantial help from Chlod, we've released a change to require logging in to use the search engine option in the tool. (It uses OAuth, and it should redirect you automatically when running a new check.) This is still new, but it looks like this has eased our usage enough that the tool should not run out of quota so often. — The Earwig (talk) 15:19, 5 October 2024 (UTC)
Great! I dream of horses (Hoofprints) (Neigh at me) 15:32, 5 October 2024 (UTC)
It works! I dream of horses (Hoofprints) (Neigh at me) 15:33, 5 October 2024 (UTC)

Copyvio Detector and Google

Hi,

(Sorry if this is the wrong forum for asking, but if so, perhaps you could point me in the right direction?)

I use the Copyvio Detector (great tool, BTW!) in checking new AfC drafts, at least a dozen times most days. I sometimes get an error message saying that the detector has exceeded its maximum allowed Google searches. This issue has always been there, occasionally, but in the last week or two it has occurred daily. When I start reviewing, around 6am or so UK time, the first few reviews always hit this problem. Then, maybe 8am (?) the daily quota probably gets reset, or something else happens, because from then onwards everything is fine until the next morning.

So I was thinking, I don't suppose there's much we can do to increase the quota (?), but would it be possible to add another search engine as a fallback option? Either so that when the user gets that error message, they could manually tick a box to use Bing (say) instead; or maybe the Detector could automatically switch to using the alternative if Google has failed.

I realise this may not be possible, either for technical or policy reasons, but thought I'd ask at least. Cheers, -- DoubleGrazing (talk) 09:35, 8 May 2024 (UTC)

Hi DoubleGrazing, using Bing or some other engine as a fallback is definitely something we’ve discussed—I hadn’t realized the issue had gotten this bad recently. The main issue here is these services usually cost money, and while the WMF pays for our Google access right now, I don’t know if I will be able to ask for access to additional search engines. First, I can take a deeper look into whether anyone is overusing their share of the tool’s resources; we might need to block/limit them. (Our plan with Google allows about 1500 articles to be checked per day.) — The Earwig alt (talk) 16:11, 8 May 2024 (UTC)
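The fallback idea raised in this thread can be sketched as follows. This is purely illustrative: the exception class, the function names, and the two stub backends are hypothetical stand-ins, not real search-engine clients and not part of the tool.

```python
class QuotaExceeded(Exception):
    """Raised by a search backend when its daily quota is used up."""


def search_with_fallback(query, engines):
    """Try each (name, search_fn) pair in order.

    Returns (engine_name, results) from the first backend that answers.
    Raises QuotaExceeded only if every backend is out of quota.
    """
    last_error = None
    for name, search in engines:
        try:
            return name, search(query)
        except QuotaExceeded as exc:
            last_error = exc  # remember why this backend failed, try the next
    raise QuotaExceeded("all search engines are out of quota") from last_error


# Hypothetical stand-ins for real search clients, for demonstration only.
def primary_stub(query):
    raise QuotaExceeded("daily limit reached")


def secondary_stub(query):
    return ["https://example.org/result-for-" + query.replace(" ", "-")]


if __name__ == "__main__":
    engines = [("primary", primary_stub), ("secondary", secondary_stub)]
    print(search_with_fallback("ceteris paribus", engines))
```

Whether a second engine is affordable is a separate question, as noted above; the sketch only shows that switching automatically on a quota error is simple to express.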
Okay, thanks for shedding some more light on this; needless to say, I knew nothing about how these things work.
I guess we at AfC are taking up quite a chunk of that quota, given that we see what are by definition new drafts usually by new users. I for one run the check probably at least on ⅓ of the drafts I review (and if you think that makes me an overuser, feel absolutely free to point this out, of course!). Even at NPP we deal with relatively more experienced users, so there's that much less of a need to check for CV.
It may be that I see the problem worse than some others, mind, because of my weird early-morning AfC habit, combined with the time zone I'm in. -- DoubleGrazing (talk) 17:05, 8 May 2024 (UTC)
Hi again,
Quick update on this: the problem (of the copyvio detector running out of Google quota) has lately become worse. Unlike before, when it would only manifest in the early morning UK time, and usually be fine after 8am UK / 0700 UTC, it's now happening also in the afternoon. This is relatively new, maybe in the past week or two, so I don't yet have a good feel for what time it happens exactly (in case that matters); I would have said late afternoon, but eg. today it started already around 1pm UK / 1200 UTC.
Best, -- DoubleGrazing (talk) 12:35, 4 July 2024 (UTC)
Sorry for taking a while to get back, but I'm actively working on an improvement for this now. — The Earwig (talk) 06:43, 19 July 2024 (UTC)
Great to hear, thanks. :) DoubleGrazing (talk) 10:35, 19 July 2024 (UTC)
Do we really still have the same quota we've had for months? (or years?) As in, are we sure it hasn't been reduced? I haven't had a copyvio check go through with the search engine box checked in what seems like weeks. I can't imagine there are suddenly so many new page patrollers that it's making that much of a difference, but... -- asilvering (talk) 22:45, 23 August 2024 (UTC)
Oh. But what has really taken off in the last several months is AI. Nevermind. I think I've answered my own question. ugh. -- asilvering (talk) 22:47, 23 August 2024 (UTC)
I think we were discussing this on WP:VPWMF a few weeks ago, and the idea of making everyone log in using OAUTH came up. If bots are indeed the problem, I think this is a good idea to try. –Novem Linguae (talk) 23:06, 23 August 2024 (UTC)
Yes, we're actively working on this. — The Earwig (talk) 00:09, 24 August 2024 (UTC)
Thanks, and good luck! -- asilvering (talk) 00:26, 24 August 2024 (UTC)
Hey DoubleGrazing and asilvering. With substantial help from Chlod, we've released a change to require logging in to use the search engine option in the tool. (It uses OAuth, and it should redirect you automatically when running a new check.) This is still new, but it looks like this has eased our usage enough that the tool should not run out of quota so often. — The Earwig (talk) 15:20, 5 October 2024 (UTC)
Brilliant, thanks so much. -- asilvering (talk) 17:47, 5 October 2024 (UTC)
Sounds good, thanks! Already tried it and seems to work well. Glad to hear it's taking some of the pressure off the quota. Cheers, -- DoubleGrazing (talk) 19:07, 5 October 2024 (UTC)

Error message on Pablo Escobar

Hello Ben, I have a weird error to report: when I perform a copyvio search on Pablo Escobar I get an error message "Access to copyvios.toolforge.org was denied, You don't have authorisation to view this page. HTTP ERROR 403". It doesn't matter what source url I try to compare it against. However if I try to compare using a specific revision ID of that article, it works okay. It's only occurred on Pablo Escobar (at least so far). Thought you might like to know. — Diannaa (talk) 20:32, 6 October 2024 (UTC)

Hey Diannaa, we had an unusual issue a while back where some bots/crawlers kept running checks against that page so I disabled it. As you noticed, the revision ID should still work. I’ll check if the bots are still hitting it and re-enable if not. — The Earwig alt (talk) 20:37, 6 October 2024 (UTC)
Ok cool, no problem though if you have to leave it, as there's a simple workaround - using the revision ID number. — Diannaa (talk) 20:39, 6 October 2024 (UTC)