User talk:The Earwig/Archive 18
This is an archive of past discussions with User:The Earwig. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.
sigma.toolforge.org
Looks like toolforge:sigma got shut down in the Grid Engine deprecation (see phab:T320041). User:Σ is inactive, and you're the only other listed maintainer. Are you planning to migrate it, or should I start trying to find someone to help? AntiCompositeNumber (talk) 00:42, 21 December 2023 (UTC)
- @AntiCompositeNumber: Ah. No, the timeline's been so protracted, I haven't been actively following things and didn't know this was happening today. (The date in my mind was early next year.) I could probably do it, but certainly can't allocate time right now to immediately fix this. — The Earwig (talk) 03:27, 21 December 2023 (UTC)
- Yeah, they started shutting down tools where maintainers hadn't requested more time today. The Grid won't be shut down completely until February though. I've left a note on the phab task asking for the tool to be un-disabled in the meantime. AntiCompositeNumber (talk) 03:43, 21 December 2023 (UTC)
- Thanks! — The Earwig (talk) 03:44, 21 December 2023 (UTC)
- Hi, I'm available today or tomorrow and would have time to fix this if it is possible to add me as a co-maintainer. I might need some time to familiarize myself with the infra though, as it looks like the tool isn't open source. 0xDeadbeef→∞ (talk to me) 04:02, 21 December 2023 (UTC)
- Thanks for volunteering, 0xDeadbeef! I've added you as a co-maintainer. There's supposed to be a code repository but it must've disappeared (any idea where that ended up, Lego?). The active code is in ~/www/python/src and possibly other places; there are local changes not in sync with the git repo. Feel free to ping if you have any questions, though honestly, beyond what I just said, I probably know as much as you do about this. — The Earwig (talk) 04:10, 21 December 2023 (UTC)
- The repository is there, it's just marked as private. It's up to date with what's on Toolforge, aside from all the uncommitted changes, that is. Probably best to push the repository to Wikimedia GitLab tbh. Legoktm (talk) 04:25, 21 December 2023 (UTC)
- I just did, at https://gitlab.wikimedia.org/toolforge-repos/sigma 0xDeadbeef→∞ (talk to me) 05:09, 21 December 2023 (UTC)
- Btw, has the "AFD Stats" page at https://sigma.toolforge.org/afdstats always been like that? 0xDeadbeef→∞ (talk to me) 06:41, 21 December 2023 (UTC)
- Besides the weird AFD Stats page, I've restored the others and they seem to be running fine; Lowercase sigmabot III's two daily jobs have been converted to use the new framework. Let me know if there are any other errors. 0xDeadbeef→∞ (talk to me) 07:13, 21 December 2023 (UTC)
- @0xDeadbeef: Thanks a bunch! I don't think AFD Stats has always been broken, but people are mostly using https://afdstats.toolforge.org/ now, so it's not a priority to fix. Maybe I can take a look at that myself later.
I also noticed the main page at https://sigma.toolforge.org/ still displays the 410 Gone error, though the individual tools are fine; did we have an index page before that disappeared? Scratch that, just some bad caching on my end. All good. — The Earwig (talk) 14:02, 21 December 2023 (UTC)
- Well... seems like the afdstats tool is also still on the grid, cf. https://github.com/enterprisey/afdstats/pull/27. Ping @Enterprisey! Legoktm (talk) 07:00, 22 December 2023 (UTC)
The Signpost: 24 December 2023
- Special report: Did the Chinese Communist Party send astroturfers to sabotage a hacktivist's Wikipedia article?
- News and notes: The Italian Public Domain wars continue, Wikimedia RU set to dissolve, and a recap of WLM 2023
- In the media: Consider the humble fork
- Discussion report: Arabic Wikipedia blackout; Wikimedians discuss SpongeBob, copyrights, and AI
- In focus: Liquidation of Wikimedia RU
- Technology report: Dark mode is coming
- Recent research: "LLMs Know More, Hallucinate Less" with Wikidata
- Gallery: A feast of holidays and carols
- Comix: Lollus lmaois 200C tincture
- Crossword: when the crossword is sus
- Traffic report: What's the big deal? I'm an animal!
- From the editor: A piccy iz worth OVAR 9000!!!11oneone! wordz ^_^
- Humour: Guess the joke contest
A solstice greeting
❄️ Happy holidays! ❄️
Hi Ben! I'd like to wish you a splendid solstice season as we wrap up the year. Here is an artwork, made individually for you, to celebrate. It was great to meet you in Toronto, and looking forward to collaborations in the coming year! Take care, and thanks for all you do to make Wikipedia better! Cheers,
{{u|Sdkb}} talk 07:06, 24 December 2023 (UTC)
- Thanks very much, Sdkb! Great meeting you as well. All the best to you in the new year. — The Earwig (talk) 20:30, 24 December 2023 (UTC)
Merry Christmas!
Joyeux Noël! ~ Buon Natale! ~ Vrolijk Kerstfeest! ~ Frohe Weihnachten!
¡Feliz Navidad! ~ Feliz Natal! ~ Καλά Χριστούγεννα! ~ Hyvää Joulua!
God Jul! ~ Glædelig Jul! ~ Linksmų Kalėdų! ~ Priecīgus Ziemassvētkus!
Häid Jõule! ~ Wesołych Świąt! ~ Boldog Karácsonyt! ~ Veselé Vánoce!
Veselé Vianoce! ~ Crăciun Fericit! ~ Sretan Božić! ~ С Рождеством!
শুভ বড়দিন! ~ 圣诞节快乐!~ メリークリスマス!~ 메리 크리스마스!
สุขสันต์วันคริสต์มาส! ~ Selamat Hari Natal! ~ Giáng sinh an lành!
Весела Коледа! ~ Meri Kirihimete!
Hello, The Earwig! Thank you for your work to maintain and improve Wikipedia! Wishing you a Merry Christmas and a Happy New Year!
Chris Troutman (talk) 23:15, 24 December 2023 (UTC)
Copyvio tool is down
Hello Ben. Sorry to bother you but the copyvio tool is down; it's been down for about an hour and a half with 504 gateway timeout errors. Any help appreciated. Thanks, — Diannaa (talk) 16:56, 23 December 2023 (UTC)
- Thanks; I've noticed things being a little spotty over the past couple weeks, but haven't identified a cause yet (i.e. no single culprit for increased usage). I'll continue to keep an eye out. — The Earwig (talk) 18:59, 23 December 2023 (UTC)
- Sorry to bother you today of all days, but the tool is suffering outages again, and has currently been down for an hour and a half. Thanks, — Diannaa (talk) 17:29, 25 December 2023 (UTC)
Administrators' newsletter – January 2024
News and updates for administrators from the past month (December 2023).
- Following the 2023 Arbitration Committee elections, the following editors have been appointed to the Arbitration Committee: Aoidh, Cabayi, Firefly, HJ Mitchell, Maxim, Sdrqaz, ToBeFree, Z1720.
- Following a motion, the Arbitration Committee rescinded the restrictions on the page name move discussions for the two Ireland pages that were enacted in June 2009.
- The arbitration case Industrial agriculture has been closed.
- The New Pages Patrol backlog drive is happening in January 2024 to reduce the backlog of articles in the new pages feed. Currently, there is a backlog of over 13,000 unreviewed articles awaiting review. Sign up here to participate!
The Signpost: 10 January 2024
- From the editor: NINETEEN MORE YEARS! NINETEEN MORE YEARS!
- Special report: Public Domain Day 2024
- Technology report: Wikipedia: A Multigenerational Pursuit
- News and notes: In other news ... see ya in court!
- WikiProject report: WikiProjects Israel and Palestine
- Obituary: Anthony Bradbury
- Traffic report: The most viewed articles of 2023
- Comix: Conflict resolution
User:Reports bot
Hi Earwig, I am enquiring about User:Reports bot and its task to update Wikipedia:WikiProject Women in Red/Metrics. There is a proposal to update the WikiProject banner for this project and I'm just checking that it won't disrupt the work of the bot? Best regards — Martin (MSGJ · talk) 22:33, 18 January 2024 (UTC)
- Hey MSGJ, I don’t see any issue with this. The bot is flexible about the page contents, provided its "Reports bot variable" comments on the individual metric pages are preserved. — The Earwig alt (talk) 22:44, 18 January 2024 (UTC)
- Thanks. Not planning to change that page itself but only the banner {{WIR}} used to tag relevant pages within the scope of the project. It was just in case your bot was relying on any specific template or categories to find these pages. — Martin (MSGJ · talk) 09:01, 19 January 2024 (UTC)
Temporary Password
I am User:Wxao Zesty, and I am requesting that a temporary password be sent to my email, since the last one did not go through. 216.176.69.228 (talk) 20:02, 19 January 2024 (UTC)
The Signpost: 31 January 2024
- News and notes: Wikipedian Osama Khalid celebrated his 30th birthday in jail
- Opinion: Until it happens to you
- Disinformation report: How paid editors squeeze you dry
- Recent research: Croatian takeover was enabled by "lack of bureaucratic openness and rules constraining [admins]"
- Traffic report: DJ, gonna burn this goddamn house right down
Administrators' newsletter – February 2024
News and updates for administrators from the past month (January 2024).
- An RfC about increasing the inactivity requirement for Interface administrators is open for feedback.
- Pages that use the JSON contentmodel will now use tabs instead of spaces for auto-indentation. This will significantly reduce the page size. (T326065)
- Following a motion, the Arbitration Committee adopted a new enforcement restriction on January 4, 2024, wherein the Committee may apply the 'Reliable source consensus-required restriction' to specified topic areas.
- Community feedback is requested for a draft to replace the "Information for administrators processing requests" section at WP:AE.
- Voting in the 2024 Steward elections will begin on 06 February 2024, 14:00 (UTC) and end on 27 February 2024, 14:00 (UTC). The confirmation process of current stewards is being held in parallel. You can automatically check your eligibility to vote.
- A vote to ratify the charter for the Universal Code of Conduct Coordinating Committee (U4C) is open till 2 February 2024, 23:59:59 (UTC) via Secure Poll. All eligible voters within the Wikimedia community have the opportunity to either support or oppose the adoption of the U4C Charter and share their reasons. The details of the voting process and voter eligibility can be found here.
- Community Tech has made some preliminary decisions about the future of the Community Wishlist Survey. In summary, they aim to develop a new, continuous intake system for community technical requests that improves prioritization, resource allocation, and communication regarding wishes. Read more
- The Unreferenced articles backlog drive is happening in February 2024 to reduce the backlog of articles tagged with {{Unreferenced}}. You can help reduce the backlog by adding citations to these articles. Sign up to participate!
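The JSON indentation item above can be illustrated with a small size comparison. This is only a rough sketch: it assumes the previous style was four spaces per indentation level (not stated in the newsletter), and the sample data and the helper name `indented_sizes` are made up for illustration.

```python
import json


def indented_sizes(data):
    """Return (bytes with 4-space indent, bytes with tab indent) for `data`."""
    # Assumption: the older style used four spaces per indentation level.
    spaces = json.dumps(data, indent=4)
    # json.dumps accepts a string as `indent`, so "\t" gives one tab per level.
    tabs = json.dumps(data, indent="\t")
    return len(spaces.encode("utf-8")), len(tabs.encode("utf-8"))


# Made-up sample data, nested a few levels deep.
sample = {"entries": [{"id": i, "tags": ["a", "b"]} for i in range(50)]}
space_bytes, tab_bytes = indented_sizes(sample)
print(f"4 spaces: {space_bytes} bytes, tabs: {tab_bytes} bytes")
```

Under that assumption, each indentation level on each line costs one byte instead of four, so the saving grows with nesting depth and line count.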
Using The Wikipedia Library for copyvio detection
Hello. I noticed that large chunks of this section of herbicide are copied directly from this source (you'll need to log in) but the copyvio detector doesn't pick it up: [1]. I can't find a tool to show it nicely, but it is especially obvious if you look at the original diff: [2]. Presumably it isn't detected because the tool can't access the full text? I just wondered whether you'd considered linking up the detector with WP:TWL so that it can check the full text? Admittedly, I am not sure whether the publishers permit automated access, but you would think that they would like us to be checking whether their copyright is being violated! @Samwalton9 (WMF): just in case they can add anything. SmartSE (talk) 10:29, 19 December 2023 (UTC)
- @Smartse It's an interesting idea! I don't think we could do anything immediately, but if it would be feasible/helpful we could initiate a conversation with one or more of the library's partners about this. Perhaps EBSCO, given that they're our search provider? I'm not sure on the details of how this would work. Samwalton9 (WMF) (talk) 12:56, 19 December 2023 (UTC)
- Hey Smartse. I'm with Samwalton9 that this would be really cool to support, but I'd be very surprised if TWL's partners would be willing to open up a service to us that would enable the copyvio detector to check content programmatically. Initiating a conversation couldn't hurt, though. — The Earwig (talk) 03:56, 21 December 2023 (UTC)
- @The Earwig It's not impossible to imagine - TWL's partners are often concerned that WP editors are going to be copying content, so being able to say "we want to make absolutely sure that's not happening" could be seen quite positively. Would EBSCO be the right organisation, do you think, since they run (and provide us with) EBSCO Discovery Service? Samwalton9 (WMF) (talk) 09:51, 21 December 2023 (UTC)
- @Samwalton9 (WMF): I was initially thinking of just searching the sources cited in the article. Apparently, most of the full texts can be accessed by appending the DOI to https://doi-org.wikipedialibrary.idm.oclc.org/ so it shouldn't be too difficult to programmatically access the full text (notwithstanding the authentication and any rate-limiting) and then the text could be compared as the tool already does. I'm not familiar with EBSCO, but I imagine that using that would be more complicated as you would need to take chunks of the article, query the search engine repeatedly and then check full texts that could be matches. I also posted about this at meta:Talk:CopyPatrol#Can_the_tool_access_paywalled_full_texts? and the ithenticate service can detect it in a new edit - see the hit for link.springer.com - even though the full text is paywalled, so maybe using that service in this tool could be an option as well? It seems like that tool does a pretty good job of catching new copyvios but we are less capable of detecting old instances. SmartSE (talk) 12:26, 21 December 2023 (UTC)
- Checking the DOIs of sources directly cited would be a good start and wouldn't require us to get a search engine working, so we could try that (though the full scope is of course somewhat limited). If I'm to do that through TWL's proxy, we'd need to get the bot access somehow and confirm this usage is within their terms. @Samwalton9: I'm also unfamiliar with EBSCO and from skimming the linked pages it's not clear to me if they offer a search API that I would be able to use for what SmartSE described (query the search engine repeatedly given text snippets from the article and receive results that enable me to get the full text of the source for comparison). I see discussion of end-user search tools, but not an API. One change to the copyvio detector I am sure we will need to make is not showing the user the full text of the suspected source, only the copied snippets. — The Earwig (talk) 14:19, 21 December 2023 (UTC)
- @The Earwig Is this a helpful link? Once we've confirmed this is a viable and useful approach I'd be happy to bring this up with them. Samwalton9 (WMF) (talk) 16:07, 8 January 2024 (UTC)
- @Samwalton9 (WMF): Probably. I can't say for sure (the API documentation requires an account, and I still don't know the terms of use), but it looks like the right direction. Thanks! — The Earwig (talk) 17:01, 8 January 2024 (UTC)
- Alright, I'll get an initial conversation kicked off with them and see how feasible this is. I'll be in touch! Samwalton9 (WMF) (talk) 10:33, 12 January 2024 (UTC)
- @The Earwig Good news! We met with EBSCO today and they're enthusiastic about the idea. Their main question was around request load - do you have any data/estimates about how many daily or monthly requests Copyvios makes?
- The other topic we talked about was how pulling the text through would work (or not). EDS has access to all these databases to index for searching, but not necessarily for displaying full text. Even if they did, that would be for subscribing customers so there would be some concern about pulling the full text through to display publicly in the tool. It might be the case that they could return some information about finding a match in a source, but perhaps not display the actual matched text directly. That's something we'll need to get more clarity on with them, but perhaps even if that is the case we could make some UI changes to highlight that a match was found in EDS, and the relevant URL, but not display the matching text? Happy to think that through with you.
- If this still sounds feasible to you I'd be happy to copy you into our email thread so you could ask any more specific questions you might have. Samwalton9 (WMF) (talk) 16:25, 5 February 2024 (UTC)
- @Samwalton9 (WMF): Sounds good, thanks for the update! We can definitely indicate a match without including the full text if needed. There is already some support in the tool for this with the Turnitin option.
- Regarding request rate, the tool checks about 1,200 articles per day or 36,000 per month. I'd be surprised if that's too much for them, but we could make the new functionality opt-in like Turnitin, so users have to check a box to use EDS which will drastically reduce the rate (the Turnitin feature is used only 100 times/day). — The Earwig (talk) 16:54, 5 February 2024 (UTC)
- @The Earwig Thanks for the data! I remember reading somewhere that the tool makes multiple requests per article check, is that right? I wonder if you have a sense of how many actual API requests are being made? Samwalton9 (WMF) (talk) 13:05, 6 February 2024 (UTC)
- @Samwalton9 (WMF): Yes, that's right – up to 8 per article, depending on page size, but again, configurable. Altogether for Google Search the number is under 10k for most days. — The Earwig (talk) 14:41, 6 February 2024 (UTC)
- Great, thanks! I've cc'd you on an email. Samwalton9 (WMF) (talk) 15:36, 6 February 2024 (UTC)
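Two points from the thread above can be sketched in a few lines of Python: building a proxied URL by appending a DOI to the prefix SmartSE quoted, and the upper bound on daily requests implied by the figures given (about 1,200 checks per day, up to 8 requests per check). This is only an illustration, not the copyvio detector's code. The function names and the example DOI are hypothetical, authentication and rate limiting are not handled, and the opt-in figure of 100 checks per day is borrowed from the Turnitin usage mentioned above as an assumption.

```python
from urllib.parse import quote

# Proxy prefix quoted in the thread above.
TWL_DOI_PROXY = "https://doi-org.wikipedialibrary.idm.oclc.org/"


def proxied_doi_url(doi: str) -> str:
    """Return the proxy URL for a DOI by appending it to the prefix."""
    doi = doi.strip()
    # Hypothetical convenience: accept DOIs pasted as doi.org links.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Keep "/" unescaped so the DOI keeps its prefix/suffix form.
    return TWL_DOI_PROXY + quote(doi, safe="/")


def max_daily_requests(checks_per_day: int, requests_per_check: int) -> int:
    """Upper bound on search requests if every check used the maximum."""
    return checks_per_day * requests_per_check


# "10.1000/xyz123" is a made-up DOI used only for illustration.
print(proxied_doi_url("10.1000/xyz123"))
# All checks using the maximum of 8 requests each.
print(max_daily_requests(1200, 8))
# Assumed opt-in usage similar to the Turnitin option (100 checks per day).
print(max_daily_requests(100, 8))
```

The first bound, 1,200 × 8 = 9,600, is consistent with the "under 10k for most days" figure quoted for Google Search; the opt-in case would be an order of magnitude lower if usage really resembled Turnitin's.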
The Signpost: 13 February 2024
- News and notes: Wikimedia Russia director declared "foreign agent" by Russian gov; EU prepares to pile on the papers
- Disinformation report: How low can the scammers go?
- Serendipity: Is this guy the same as the one who was a Nazi?
- Traffic report: Griselda, Nikki, Carl, Jannik and two types of football
- Crossword: Our crossword to bear
- Comix: Strongly
lowercase sigmabot III
Hi! I reached out to Σ by email about lowercase sigmabot III, which had not been archiving anything (with the exceptions of AN and ANI) since last week. They responded (by email) saying "Please reach out to Earwig for this issue. The crontab was erased somehow, which means that it's no longer running the bot on its schedule. I'm not sure what changed but I think he will know where to look" and that "For the time being I just kicked it off manually."
Thank you for any insight you might have! HouseBlaster (talk · he/him) 15:07, 28 February 2024 (UTC)
- Thanks for letting me know. I'll take a look at this. — The Earwig (talk) 15:29, 28 February 2024 (UTC)
- It's not clear what the original issue was, but I've jiggled things a bit, so if we're lucky it won't happen again. — The Earwig (talk) 16:29, 28 February 2024 (UTC)
- Thank you! HouseBlaster (talk · he/him) 17:05, 28 February 2024 (UTC)
Administrators' newsletter – March 2024
News and updates for administrators from the past month (February 2024).
- Phase I of the 2024 RfA review is now open for participation. Editors are invited to review, comment on, and propose improvements to the requests for adminship process.
- Following an RfC, the inactivity requirement for the removal of the interface administrator right increased from 6 months to 12 months.
- The mobile site history pages now use the same HTML as the desktop history pages. (T353388)
- The 2024 appointees for the Ombuds commission are だ*ぜ, AGK, Ameisenigel, Bennylin, Daniuu, Doǵu, Emufarmers, Faendalimas, MdsShakil, Minorax, Nehaoua, Renvoy and RoySmith as members, with Vermont serving as steward-observer.
- Following the 2024 Steward Elections, the following editors have been appointed as stewards: Ajraddatz, Albertoleoncio, EPIC, JJMC89, Johannnes89, Melos and Yahya.
The Signpost: 2 March 2024
- News and notes: Wikimedia enters US Supreme court hearings as "the dolphin inadvertently caught in the net"
- Recent research: Images on Wikipedia "amplify gender bias"
- In the media: The Scottish Parliament gets involved, a wikirace on live TV, and the Foundation's CTO goes on record
- Obituary: Vami_IV
- Traffic report: Supervalentinefilmbowlday
- WikiCup report: High-scoring WikiCup first round comes to a close
Revdel-responder
Hi, it could be a WP:THURSDAY thing but the revdel-responder script seems to have a problem today. I keep getting a message "Sorry! revdel-responder failed to parse the page content". I'm not good enough at interpreting the console to work out what's gone wrong. Nthep (talk) 11:44, 14 March 2024 (UTC)
- Not sure if anything has happened during the day, but it seems to have resolved itself. Nthep (talk) 18:49, 14 March 2024 (UTC)
- Thanks for letting me know, Nthep. It's possible that was some intermittent error. If you run across it again, let me know the page, or send me the text from the console (right click -> Inspect -> "Console" tab, there should be a line starting with "Error while parsing page content"). — The Earwig (talk) 03:48, 15 March 2024 (UTC)
The Signpost: 29 March 2024
- Technology report: Millions of readers still seeing broken pages as "temporary" disabling of graph extension nears its second year
- Recent research: "Newcomer Homepage" feature mostly fails to boost new editors
- Traffic report: He rules over everything, on the land called planet Dune
- Humour: Letters from the editors
- Comix: Layout issue
Administrators' newsletter – April 2024
News and updates for administrators from the past month (March 2024).
- An RfC is open to convert all current and future community discretionary sanctions to (community designated) contentious topics procedure.
- The Toolforge Grid Engine services have been shut down after the final migration process from Grid Engine to Kubernetes. (T313405)
- An arbitration case has been opened to look into "the intersection of managing conflict of interest editing with the harassment (outing) policy".
- Editors are invited to sign up for The Core Contest, an initiative running from April 15 to May 31, which aims to improve vital and other core articles on Wikipedia.
request to tag article talk pages within scope of Women's Basketball
Hi The Earwig,
I would like to request that talk pages for articles within the scope of WP:WBB be tagged with both of these banners:
- Basketball: Women's (Unassessed)
- Women's sport: Basketball (Unassessed)
Do you need additional information and/or should I post this request somewhere else?
Thank you, Hmlarson (talk) 23:20, 5 March 2024 (UTC)
- Hi Hmlarson, you'll need to define what "within the scope of WP:WBB" means in order to run the bot. — The Earwig (talk) 01:52, 6 March 2024 (UTC)
- Can you do any article tagged with a subcategory of Category:Women's basketball? Hmlarson (talk) 18:58, 11 March 2024 (UTC)
- @Hmlarson: OK. Subcats are sometimes tricky because of unexpected relationships (a subcategory of a subcategory a few levels deep sometimes has little relationship with the original category), but I reviewed this situation, and it looks mostly fine.
- I'll have the bot generate a list of pages it would tag, and we can double-check those. It'll take me a few days.
- Separately, there is a requirement that you mention on the WikiProject talk page that you want to run this tagging job, in case there are any objections.
- Thanks! — The Earwig (talk) 03:43, 12 March 2024 (UTC)
- Thank you. Sounds good. I've posted the notice here. Hmlarson (talk) 17:18, 12 March 2024 (UTC)
- Hi The Earwig - Any chance you can provide an ETA on this request? Thank you! Hmlarson (talk) 20:01, 1 April 2024 (UTC)
- So sorry for the wait here, I had to make some code changes to handle tagging both banners and a few personal things came up – I have some free time now and will get back to you in a few days. — The Earwig (talk) 20:19, 6 April 2024 (UTC)
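The concern about deep subcategories in the thread above can be sketched as a depth-limited walk over a category tree that records why each page was picked up, so the candidate list can be reviewed before anything is tagged. This is a minimal sketch on made-up in-memory data, not the bot's actual code; the function name, the category names and the page names are all hypothetical.

```python
from collections import deque


def pages_within_depth(root, subcats, members, max_depth):
    """Map each page to the (category, depth) where it was first found.

    `subcats` maps a category to its direct subcategories and `members`
    maps a category to its direct pages; both are plain dicts here.
    """
    seen = {root}
    queue = deque([(root, 0)])
    pages = {}
    while queue:
        cat, depth = queue.popleft()
        for page in members.get(cat, []):
            # Breadth-first order means the shallowest category wins.
            pages.setdefault(page, (cat, depth))
        if depth < max_depth:
            for child in subcats.get(cat, []):
                if child not in seen:  # guards against category loops
                    seen.add(child)
                    queue.append((child, depth + 1))
    return pages


# Made-up category data for illustration only.
subcats = {
    "Women's basketball": ["Women's basketball players", "Women's basketball venues"],
    "Women's basketball venues": ["Multi-purpose arenas"],
}
members = {
    "Women's basketball": ["Women's basketball"],
    "Women's basketball players": ["Example Player"],
    "Multi-purpose arenas": ["Example Concert Hall"],
}

for page, (cat, depth) in pages_within_depth("Women's basketball", subcats, members, 1).items():
    print(f"{page}: via {cat} (depth {depth})")
```

With a depth limit of 1, the loosely related page two levels down is left out; raising the limit to 2 would include it, which is the kind of entry a manual review of the generated list is meant to catch.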
The Signpost: 25 April 2024
- In the media: Censorship and wikiwashing looming over RuWiki, edit wars over San Francisco politics, and another wikirace on live TV
- News and notes: A sigh of relief for open access as Italy makes a slight U-turn on their cultural heritage reproduction law
- WikiConference report: WikiConference North America 2023 in Toronto recap
- WikiProject report: WikiProject Newspapers (Not WP:NOTNEWS)
- Recent research: New survey of over 100,000 Wikipedia users
- Traffic report: O.J., cricket and a three body problem
Copyvio Detector not working well
Hello Ben, hope you are well. I just thought I'd let you know that the Copyvio Detector has not been functioning all that well the last couple of days, timing out on just about every comparison. ("The URL https://www.dvfu.ru/en/about/ timed out before any data could be retrieved", for example.) It even times out on simple, short webpages of the type that it's usually able to access easily. Any assistance appreciated. Thanks, — Diannaa (talk) 13:34, 1 May 2024 (UTC)
- Hi Diannaa. We (Chlod and I) did just block a misbehaving bot last night, so that would account for some extra load, but it doesn't totally explain the issue. That one URL is working for me at the moment, only taking a couple seconds. I will investigate further. — The Earwig (talk) 15:27, 1 May 2024 (UTC)
Administrators' newsletter – May 2024
News and updates for administrators from the past month (April 2024).
- Phase I of the 2024 requests for adminship review has concluded. Several proposals have passed outright and will proceed to implementation, including creating a discussion-only period (3b) and administrator elections (13) on a trial basis. Other successful proposals, such as creating a reminder of civility norms (2), will undergo further refinement in Phase II. Proposals passed on a trial basis will be discussed in Phase II, after their trials conclude. Further details on specific proposals can be found in the full report.
- Partial action blocks are now in effect on the English Wikipedia. This means that administrators have the ability to restrict users from certain actions, including uploading files, moving pages and files, creating new pages, and sending thanks. (T280531)
- The arbitration case Conflict of interest management has been closed.
- This may be a good time to reach out to potential nominees to ask if they would consider an RfA.
- A New Pages Patrol backlog drive is happening in May 2024 to reduce the number of unreviewed articles in the new pages feed. Currently, there is a backlog of over 15,000 articles awaiting review. Sign up here to participate!
- Voting for the Universal Code of Conduct Coordinating Committee (U4C) election is open until 9 May 2024. Read the voting page on Meta-Wiki and cast your vote here!
The Signpost: 16 May 2024
- News and notes: Democracy in action: multiple elections
- Special report: Will the new RfA reform come to the rescue of administrators?
- Arbitration report: Ruined temples for posterity to ponder over – arbitration from '22 to '24
- Comix: Generations
- Traffic report: Crawl out through the fallout, baby
Administrators' newsletter – June 2024
News and updates for administrators from the past month (May 2024).
- Phase II of the 2024 RfA review has commenced to improve and refine the proposals passed in Phase I.
- The Nuke feature, which enables administrators to mass delete pages, will now correctly delete pages which were moved to another title. T43351
- The arbitration case Venezuelan politics has been closed.
- The Committee is seeking volunteers for various roles, including access to the conflict of interest VRT queue.
- WikiProject Reliability's unsourced statements drive is happening in June 2024 to replace {{citation needed}} tags with references! Sign up here to participate!
WikiProject Banner Tagging
Hi, @The Earwig! You seem to be the most active operator for one of the Category:WikiProject tagging bots so I hope this isn't a bother. I'm overseeing the newly created WP:WikiProject AfroCreatives now and would like to disseminate {{WikiProject AfroCreatives}} through our targeted articles in the AfroCreatives categories with all subcategories included. We are willing to make use of auto assessment and to inherit it from existing WP banners too. The template already accommodates this. I would very much appreciate your help. Assem Khidhr (talk) 06:12, 5 May 2024 (UTC)
- Hi Assem Khidhr, my apologies for not replying to this sooner, but as you probably guessed by my lack of response I don't have the free time to work on this task at the moment. Sorry. — The Earwig (talk) 04:23, 7 June 2024 (UTC)
- Best of luck, @The Earwig. I was since granted AWB authorization and managed to add those banners myself. Thanks! Assem Khidhr (talk) 15:51, 7 June 2024 (UTC)
Copyvio detector not working
Hello Ben, sorry to bother you so early and on a Sunday. The Copyvio detector seems unable to perform any comparisons at the moment. It sits and spins for three minutes before timing out ("The URL https://www.bbc.com/news/articles/cz55y6k0p5go timed out before any data could be retrieved.") Any assistance appreciated, as we have a lot of reports at CopyPatrol, a lot more than usual, and we will not be able to assess them without this tool. Thank you! — Diannaa (talk) 11:48, 2 June 2024 (UTC)
Update: It seems to be functioning normally now. Thank you! — Diannaa (talk) 14:08, 2 June 2024 (UTC)
@The Earwig: It's down again as of 6 June 2024. It takes a long time to reach and then after entering the page title and clicking submit it runs after several minutes with 0 errors. I've tried this with other articles that got higher violations before. Thanks for any help you can provide. Greg Henderson (talk) 09:06, 2 June 2024 (UTC)
Today, getting the error message: "An error occurred while using the search engine (Google Error: HTTP Error 429: Too Many Requests). Note: there is a daily limit on the number of search queries the tool is allowed to make. You may repeat the check without using the search engine." Greg Henderson (talk) 23:14, 7 June 2024 (UTC)
- (talk page watcher) @Greghenderson2006: This happens when we've reached our daily quota with Google. Unfortunately, the copyvio detector can only handle up to around 1,250 a day. You'll need to try again after a few hours or so. In the meantime, you can try using the copyvio detector without search engine checks, which will still work. Chlod (say hi!) 01:07, 8 June 2024 (UTC)
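The shared daily limit and the "repeat the check without using the search engine" fallback described above can be sketched roughly as follows. This is only an illustration, not the tool's code: in practice the limit is enforced by Google (hence the HTTP 429), whereas here a hypothetical local counter `DailyQuota` stands in for it, the limit of 1,250 is the approximate figure quoted above, and `run_check`, `search`, and `compare_links` are made-up names.

```python
import datetime


class DailyQuota:
    """Hypothetical shared counter allowing at most `limit` searches per day."""

    def __init__(self, limit=1250):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today):
        # Start a fresh count whenever the calendar day changes.
        if today != self.day:
            self.day, self.used = today, 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def run_check(title, quota, today, search, compare_links):
    """Use the search engine while quota remains, else fall back to link checks."""
    if quota.try_consume(today):
        return "search", search(title)
    return "links-only", compare_links(title)


if __name__ == "__main__":
    quota = DailyQuota(limit=1)
    day = datetime.date(2024, 6, 7)
    for _ in range(2):
        mode, _result = run_check("Example", quota, day,
                                  lambda t: [], lambda t: [])
        print(mode)
```

Because the counter is shared by every user, one heavy consumer can exhaust it for everyone, which matches the behaviour reported in this thread.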
The Signpost: 8 June 2024
- Technology report: New Page Patrol receives a much-needed software upgrade
- Deletion report: The lore of Kalloor
- In the media: National cable networks get in on the action arguing about what the first sentence of a Wikipedia article ought to say
- News from the WMF: Progress on the plan — how the Wikimedia Foundation advanced on its Annual Plan goals during the first half of fiscal year 2023-2024
- Recent research: ChatGPT did not kill Wikipedia, but might have reduced its growth
- Featured content: We didn't start the wiki
- Essay: No queerphobia
- Special report: RetractionBot is back to life!
- Traffic report: Chimps, Eurovision, and the return of the Baby Reindeer
- Comix: The Wikipediholic Family
- Concept: Palimpsestuous
lowercase sigmabot III not archiving properly
For about the last three days, lowercase sigmabot III has only been archiving the Administrator's noticeboards and nothing else. Somebody mentioned that you gave it a good kick the last time it went on the fritz, so I will go ahead and notify you. Safiel (talk) 16:37, 29 April 2024 (UTC)
- Thanks for the notice. I've kicked it again and added a workaround in case this issue happens again. — The Earwig (talk) 04:29, 30 April 2024 (UTC)
- Hi, hope you're well. I think the bot is down again. ~~ AirshipJungleman29 (talk) 11:36, 12 June 2024 (UTC)
- Thanks, AirshipJungleman29. Different issue from last time. I think I've fixed it. — The Earwig (talk) 03:01, 13 June 2024 (UTC)
Copyvio detector constantly timing out
Hello again Ben! I am having issues with the Copyvio detector, finding it almost impossible to get it to generate a report. "The URL http://weaponsystems.net/weaponsystem/CC02%20-%20PTZ89.html timed out before any data could be retrieved" for example. Frequently it goes down completely as well. Any assistance appreciated. Thanks, — Diannaa (talk) 11:00, 13 June 2024 (UTC)
- Sorry, there aren't any quick fixes for this. I am working on it. — The Earwig (talk) 16:06, 13 June 2024 (UTC)
- Actually, I’ve found a partial fix to improve performance. Let’s see if it helps. — The Earwig alt (talk) 17:19, 13 June 2024 (UTC)
- It's much better, thanks! Fixing copyvio is tedious enough lol. — Diannaa (talk) 23:16, 13 June 2024 (UTC)
The Signpost: 4 July 2024
- News and notes: WMF board elections and fundraising updates
- Special report: Wikimedia Movement Charter ratification vote underway, new Council may surpass power of Board
- In focus: How the Russian Wikipedia keeps it clean despite having just a couple dozen administrators
- Discussion report: Wikipedians are hung up on the meaning of Madonna
- In the media: War and information in war and politics
- Sister projects: On editing Wikisource
- Opinion: Etika: a Pop Culture Champion
- Gallery: Spokane Willy's photos
- Humour: A joke
- Recent research: Is Wikipedia Politically Biased? Perhaps
- Traffic report: Talking about you and me, and the games people play
Administrators' newsletter – July 2024
News and updates for administrators from the past month (June 2024).
- Local administrators can now add new links to the bottom of the site Tools menu without using JavaScript. Documentation is available on MediaWiki. (T6086)
- The Community Wishlist is re-opening on 15 July 2024. Read more
Copyvios + Arc (Also, RichBot)
Hi Ben,
I've started using the Arc browser, for some reason whenever I try and access Copyvios on it, I get an Internal Server Error. Trying the same URL in Edge works fine. Not sure where the bug is there, but hopefully you can find it.
Also, I see above there still seems to be issues regarding usage, did you need me to tone RichBot down a bit? - RichT|C|E-Mail 17:10, 28 June 2024 (UTC)
- Hey Rich, sorry I took a bit to reply. This is my first time hearing about Arc and I don't really feel like creating an account to test, so I can't confirm on my end. Are you sure it's an Internal Server Error or may it be a 403 Forbidden? (We may have inadvertently blocked its user agent as a crawler, which would give a 403, but I don't see anything in our block list that looks like it or Chrome [except Linux], so I don't know.) This is pretty strange.
- Regarding bot usage, there are two main issues the tool's had lately: general downtime and exhausting our Google credits. I've improved the tool's performance a bit so the former is not a major issue now, but we are still frequently exhausting our daily Google quota. I've checked RichBot's usage and recently it's been consuming around 10-20% of our total Google credits. That's not too excessive, but if you could find a way to tone it down a bit without compromising its usefulness, it would be appreciated. — The Earwig (talk) 08:10, 1 July 2024 (UTC)
- No worries, I have reduced RichBot to only look at 100 (plus existing CVs) per run, so 200 per day (excluding manual runs). Is there a way we can increase the credits? I don't mind throwing some £ at it if need be - RichT|C|E-Mail 09:31, 1 July 2024 (UTC)
- No way that I know of unfortunately; the WMF pays for it, but Google's API terms limit our usage without some kind of special arrangement that I have been unable to get. — The Earwig (talk) 15:25, 1 July 2024 (UTC)
- Typical Google lol... ah well, worth a shot - RichT|C|E-Mail 17:52, 1 July 2024 (UTC)
- Hey The Earwig. Big fan. Is there a venue where advocacy from affected editors might get us closer to that special arrangement? Firefangledfeathers (talk / contribs) 17:50, 18 July 2024 (UTC)
- Hi Firefangledfeathers, thank you. I'm not sure who we could talk to about this, to be honest. My former contact at the WMF no longer works there and it's not clear to me who is responsible for managing the relationship with Google right now. Going the other way, i.e. getting someone in a position of power at Google who could help, might be more fruitful. But that is just speculation; I don't know who specifically that might be. — The Earwig (talk) 06:02, 19 July 2024 (UTC)
- Thanks. I don't have any bright ideas. I'll probably go with the low-hanging fruit and post at WP:VPWMF. Firefangledfeathers (talk / contribs) 12:00, 19 July 2024 (UTC)
- And it's definitely a 500, 'The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.' - RichT|C|E-Mail 14:07, 1 July 2024 (UTC)
- Ah, I think I've figured it out. Could you try now? — The Earwig (talk) 15:36, 1 July 2024 (UTC)
- Much better :) Thanks :D - RichT|C|E-Mail 17:51, 1 July 2024 (UTC)
The Signpost: 22 July 2024
- Discussion report: Internet users flock to Wikipedia to debate its image policy over Trump raised-fist photo
- News and notes: Wikimedia community votes to ratify Movement Charter; Wikimedia Foundation opposes ratification
- Obituary: JamesR
- Crossword: Vaguely bird-shaped crossword
Administrators' newsletter – August 2024
News and updates for administrators from the past month (July 2024).
- Global blocks may now target accounts as well as IPs. Administrators may locally unblock when appropriate.
- Users wishing to permanently leave may now request "vanishing" via Special:GlobalVanishRequest. Processed requests will result in the user being renamed, their recovery email being removed, and their account being globally locked.
- The Arbitration Committee appointed the following administrators to the conflict of interest volunteer response team: Bilby, Extraordinary Writ
Earwig's Copyvio Detector
Hello, The Earwig,
I have a question about this editing tool. It seemed like I could run this 20 or more times before I got a notice that I had reached my daily limit. But now, I receive a notice if I just run it a few times. Has this limit been decreased for some reason? I use this tool quite a lot while patrolling drafts and CSD categories so it's sometimes difficult to remember to go back to reexamine some pages the next day when I have reached my daily limit for the current day. Thanks for any insight you can provide. Liz Read! Talk! 20:21, 8 June 2024 (UTC)
- Hi Liz. Rest assured this isn't related to your own usage of the tool. The daily limit is shared by all users, and allows for about 1000–2000 pages to be checked per day, so even if you're checking a few dozen, that's not a major contributor to the limit getting reached. We've been noticing this issue more frequently recently (see a few threads above) and we're doing some work to restrict other users of the tool who are actually overusing their share of its resources. I'm hoping to have things back to normal soon. — The Earwig (talk) 04:23, 11 June 2024 (UTC)
- I didn't realize that I posted two messages about the same issue. I should have reviewed your talk page before posting my subsequent message. I guess I have a sense of frustration now that I know I'm competing with RichBot for copyright inquiries. Liz Read! Talk! 03:11, 8 August 2024 (UTC)
Earwig returns 0% on url-comparison with clever close paraphrase
Hello. I noticed a {{circular}} tag at Ceteris paribus and ran this URL comparison to find out how much duplication there was, and in what section(s). To my surprise, it came back with 0.0%. However, notice these:
Comparison snippets
From: https://www.masterclass.com/articles/ceteris-paribus-explained#7MlD3BCbNL4NC0BejpGo02
1. Supply chain: Ceteris paribus considers production factors, such as logistics, sourcing, competition, and trends with buyers to determine the price of goods. For example, a bread seller observes the costs of the ingredients, labor, packaging, and distribution, in addition to competitors, economic inflation, and consumer trends. Ceteris paribus stipulates that if other factors remain the same, a decrease in the supply of bread will cause prices to rise.
2. The law of supply and demand: In the law of demand, buyers demand less of an economic good when prices are higher. The law of supply says that sellers will supply more of an economic good when prices are higher. The interaction of these two laws determines the actual market price and volume of goods. Ceteris paribus identifies, isolates, and tests the impact of an independent variable that would affect these two laws and the causal factors in the market supply and prices.
3. Gross domestic product: Economists use ceteris paribus to study the GDP, assuming that variables remain fixed to determine the effect in the money market.
4. Interest rates: If the interest rates increase, the independent variable, then the demand for debt goes down as the cost of borrowing increases, the dependent variable.
5. Minimum wage: Economists use ceteris paribus to determine the potential effects of a minimum wage increase, including the possible outcome of fewer jobs available if companies must pay employees more.
From Ceteris paribus#Applications rev. 1238986793: The concept of ceteris paribus is crucial for economists and can be applied in researching:
There is a lot of close paraphrase here, maybe enough to cover their tracks and confuse the detector. I remember glancing at Andrei Broder's shingle-based detection paper eons ago (might be this one) and I don't know how yours works, but if it is shingle-based, would it be feasible to add a new param to the input form, or in the settings, maybe in an 'advanced' section, to set the shingle size? In a case of paraphrase like this one, where the information is clearly copied but words are shifted around in the sentences, a shorter shingle size might do a lot better at detecting the similarities. This might kill processing time in the web search version, so maybe would only work when the 'url' radio button was selected, but still could be pretty useful for cases like that, and might make a great tool for assigning a measurable value to close paraphrase, which afaik we do not have currently, and is all very hand-wavy. Thanks, Mathglot (talk) 19:32, 6 August 2024 (UTC)
- It does slightly better (4.8%) specifying revision id 1151114395. What is going on here? Mathglot (talk) 20:09, 6 August 2024 (UTC)
- Okay, just noticed that in both of those revisions, Earwig doesn't appear to see past the first short section of the web page, so the paraphrased section I am addressing doesn't appear to be visible to Earwig, or at least, it isn't displaying it on the comparison page, for some reason, if you scroll down. Mathglot (talk) 21:59, 6 August 2024 (UTC)
- That's exactly it, Mathglot. The website loads its content through JavaScript so it's not available to the tool. There isn't an easy workaround for this, but there are some options I could try further in the future. Since the content doesn't show up in the comparison view as part of the source, my hope is that people will figure out what's going on, as you were able to. — The Earwig (talk) 00:23, 7 August 2024 (UTC)
- Thanks for that. Even if it could see it, I wonder if it would come up with any kind of rating, due to the paraphrase? Not sure what kind of test bed you use, but if you could copy the MasterClass page and save it offline locally (post-js, or just scraping the rendered page manually and saving it) and run Earwig against that file, I'd be interested to see what it would come up with. And if you use shingling and it's parametrizable, whether the rating would change if you reduced the shingle size. Mathglot (talk) 01:14, 7 August 2024 (UTC)
- OK, I can do a quick experiment of that, Mathglot. The tool does use shingling, actually. I haven't seen this paper and independently came up with a similar algorithm many years ago. Internally I call the shingle size the degree, and I've exposed that as a query-string-only parameter if you would like to play with it.
- I manually copied the text to a pastebin. With the tool's default shingle size of 5 words, almost no similar text is found, and the similarity score is 5.7%. With size 3, it's 38.3%. With size 2, it's 67.1%. At this point a lot of the similar content is trivial ("is a", "in the", "of the"), so the odds of a false positive are much higher, though it does at least highlight some interesting similarities, too.
- The tool doesn't have a way of identifying more unique common phrases. If we could down-weigh "is a" but up-weigh, say, "wage economists", we could lower the default shingle size and get more sensitive results. The default size was actually 3 several years ago, but I raised it because the false positive rate was just a bit too high and it was causing confusion. So there's a delicate balancing act with the current algorithm.
- Food for thought. Thanks. — The Earwig (talk) 05:20, 7 August 2024 (UTC)
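To illustrate the effect of shingle size described above, here is a minimal Python sketch of word-shingle comparison. It is not the tool's actual code: the tokenizer, the function names `shingles` and `similarity`, and the scoring rule (the fraction of the article's shingles that also occur in the source) are assumptions made for illustration. Only the name `degree` and the default of 5 words come from the discussion.

```python
import re


def shingles(text, degree=5):
    """Return the set of overlapping word sequences of length `degree`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + degree]) for i in range(len(words) - degree + 1)}


def similarity(article, source, degree=5):
    """Fraction of the article's shingles that also appear in the source."""
    art = shingles(article, degree)
    src = shingles(source, degree)
    if not art:
        return 0.0
    return len(art & src) / len(art)


if __name__ == "__main__":
    # Two made-up sentences with the same content but reordered clauses.
    article = "a decrease in the supply of bread will cause prices to rise"
    source = "prices will rise after a decrease in the supply of bread"
    for degree in (5, 3, 2):
        print(degree, round(similarity(article, source, degree), 3))
```

On these two sentences, lowering `degree` raises the score, because reordering breaks long shingles but leaves short ones intact. The same mechanism lets trivial pairs such as "in the" match, which is the false-positive trade-off mentioned above.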
- Oh, that's very thought-provoking, thanks! You could start with a stop-word list, and eliminate those, and there may be lists of bigrams containing stop words. I searched /most common bi-grams with stop words in English/ and repeatedly ran into "tidytext in R", and "NLTK in Python"; also articles like 1, 2. As far as how to down-weigh and up-weigh, TF-IDF is one very standard solution, which works better on a larger corpus or bag of words, which you could accumulate yourself, by just dumping all of the words of each document you come across into a list, and counting later, maybe once a week or month, and recalculating the frequencies, but my understanding is that there is a budget available for Earwig (for the Google API) and it's likely that there is a term frequency list out there somewhere for English, and we could just buy it. (You would only have to do that once in theory, although language does evolve, so maybe once a year?) Then you wouldn't have to build your own bag of words. Your experiment looks really interesting, and I wonder if any of these other ideas would kick it up a level. Mathglot (talk) 04:05, 13 August 2024 (UTC)
- This is helpful. Thanks! — The Earwig (talk) 13:22, 13 August 2024 (UTC)
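The down-weighing idea suggested above could look something like the following sketch, which weights each shingle by the inverse document frequency of its words so that "is a" counts for little and rarer phrases count for more. Everything here is an assumption for illustration: the tiny in-line corpus, the smoothed IDF formula, summing word weights per shingle, and the names `build_idf` and `weighted_similarity`. A real deployment would need a much larger corpus or a published frequency list.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


def shingles(text, degree=2):
    words = tokenize(text)
    return {tuple(words[i:i + degree]) for i in range(len(words) - degree + 1)}


def build_idf(corpus):
    """Return a function mapping a word to a smoothed inverse document frequency."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))

    def idf(word):
        # Words found in every document score 0; unseen words score highest.
        return math.log((1 + n_docs) / (1 + doc_freq[word]))

    return idf


def weighted_similarity(article, source, idf, degree=2):
    """Share of the article's total shingle weight that also occurs in the source."""
    art = shingles(article, degree)
    src = shingles(source, degree)

    def weight(shingle):
        return sum(idf(word) for word in shingle)

    total = sum(weight(sh) for sh in art)
    if total == 0:
        return 0.0
    return sum(weight(sh) for sh in art & src) / total


if __name__ == "__main__":
    # Hypothetical background corpus; "the", "is" and "a" occur in every document.
    corpus = ["the cat is a pet", "the law of supply is a rule", "the wage is a cost"]
    idf = build_idf(corpus)
    print(weighted_similarity("the cost is a burden", "this rule is a guide", idf))
    print(weighted_similarity("minimum wage is a floor", "this is a minimum wage", idf))
```

In the first comparison the only shared bigram is "is a", which carries zero weight, so the pair scores zero instead of producing a false positive. The second pair shares "minimum wage" and keeps a positive score.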
The Signpost: 14 August 2024
- In the media: Portland pol profile paid for from public purse
- In focus: Twitter marks the spot
- News and notes: Another Wikimania has concluded.
- Special report: Nano or just nothing: Will nano go nuclear?
- Opinion: HouseBlaster's RfA debriefing
- Traffic report: Ball games, movies, elections, but nothing really weird
- Humour: I'm proud to be a template
EarwigBot might be down
Hello friend. EarwigBot hasn't edited since August 17. I believe it has some daily tasks such as Wikipedia:Bots/Requests for approval/EarwigBot 3, so this is abnormal, right? It might need a nudge :) –Novem Linguae (talk) 12:50, 21 August 2024 (UTC)
- Thanks for the ping! The task was active but had gotten stuck somehow. I've restarted it. — The Earwig (talk) 13:39, 21 August 2024 (UTC)
- Thanks! I went ahead and boldly signed you up for a bot to alert you if it goes down again. Diff. If undesired, feel free to revert. –Novem Linguae (talk) 18:23, 21 August 2024 (UTC)
- Much obliged. — The Earwig (talk) 07:18, 22 August 2024 (UTC)
Administrators' newsletter – September 2024
News and updates for administrators from the past month (August 2024).
- Following an RfC, there is a new criterion for speedy deletion: C4, which applies to unused maintenance categories, such as empty dated maintenance categories for dates in the past.
- A request for comment is open to discuss whether Notability (species) should be adopted as a subject-specific notability guideline.
- Following a motion, remedies 5.1 and 5.2 of World War II and the history of Jews in Poland (the topic and interaction bans on My very best wishes, respectively) were repealed.
- Remedy 3C of the German war effort case ("Cinderella157 German history topic ban") was suspended for a period of six months.
- The arbitration case Historical Elections is currently open. Proposed decision is expected by 3 September 2024 for this case.
- Editors can now enter into good article review circles, an alternative for informal quid pro quo arrangements, to have a GAN reviewed in return for reviewing a different editor's nomination.
- A New Pages Patrol backlog drive is happening in September 2024 to reduce the number of unreviewed articles and redirects in the new pages feed. Currently, there is a backlog of over 13,900 articles and 26,200 redirects awaiting review. Sign up here to participate!
The Signpost: 4 September 2024
- News and notes: WikiCup enters final round, MCDC wraps up activities, 17-year-old hoax article unmasked
- In the media: AI is not playing games anymore. Is Wikipedia ready?
- News from the WMF: Meet the 12 candidates running in the WMF Board of Trustees election
- Wikimania: A month after Wikimania 2024
- Serendipity: What it's like to be Wikimedian of the Year
- Traffic report: After the gold rush
The Signpost: 26 September 2024
- In the media: Courts order Wikipedia to give up names of editors, legal strain anticipated from "online safety laws"
- Community view: Indian courts order Wikipedia to take down name of crime victim, editors strive towards consensus
- Serendipity: A Wikipedian at the 2024 Paralympics
- Opinion: asilvering's RfA debriefing
- News and notes: Are you ready for admin elections?
- Recent research: Article-writing AI is less "prone to reasoning errors (or hallucinations)" than human Wikipedia editors
- Traffic report: Jump in the line, rock your body in time
Administrators' newsletter – October 2024
News and updates for administrators from the past month (September 2024).
- Administrator elections are a proposed new process for selecting administrators, offering an alternative to requests for adminship (RfA). The first trial election will take place in October 2024, with candidate sign-up from October 8 to 14, a discussion phase from October 22 to 24, and SecurePoll voting from October 25 to 31. For questions or to help out, please visit the talk page at Wikipedia talk:Administrator elections.
- Following a discussion, the speedy deletion reason "File pages without a corresponding file" has been moved from criterion G8 to F2. This does not change what can be speedily deleted.
- A request for comment is open to discuss whether there is a consensus to have an administrator recall process.
- The arbitration case Historical elections has been closed.
- An arbitration case regarding Backlash to diversity and inclusion has been opened.
- Editors are invited to nominate themselves to serve on the 2024 Arbitration Committee Electoral Commission until 23:59 October 8, 2024 (UTC).
- If you are interested in stopping spammers, please put MediaWiki talk:Spam-whitelist and MediaWiki talk:Spam-blacklist on your watchlist, and help out when you can.
Copyright violation tool
Hello, The Earwig,
I regularly used this tool you created, mostly when patrolling drafts or CSD-tagged articles, I'd probably used it 3 or 4 times a day. When I used it too much, I'd get a message that I was over my limit of how often I could use it. At least that's how I thought things worked. Now, I get this message every time I try to see whether a page is a copyright violation, I have not gotten a successful response to a query in many, many weeks now. So, I'm wondering is this "limit" actually for all users on this platform and not tied to individual editors? Because something odd is going on and maybe new page patrollers or AFC reviewers are using it for every article they review if I can not just get one or two reports on suspicious articles or drafts I've come across. I know with AI, there are ways users can get around copyright restrictions but I still found the tool helpful.
Do you have any idea why it is suddenly no longer available to generate reports? Can you tell me the time of the day when it "resets" so that maybe I could make inquiries then? Or is there any possibility of raising this limit of reports generated? I mean, I'm glad it's become so popular but it has also become unavailable for use for those of us who just want to make a few queries a day. Thank you. Liz Read! Talk! 22:31, 19 July 2024 (UTC)
- Hi Liz, truly sorry about the ongoing issues. I'm aware and working on it (see some of the threads above you), with the time I have available. I thought things had improved with the overall performance improvement last month, but it has really just made this particular problem of running out of the search quota much worse. Anyway, I am working on it now.
- To answer your questions: yes the quota is shared by all users, and we cannot easily raise it. It's a hard limit enforced by Google that I cannot bypass without some special arrangement. It resets I think around midnight Pacific Time, i.e. Google's time zone.
- I think the issue is some bots/automated traffic making too many queries. In the past I have been able to block them or ask them to slow down, but that approach has become less effective lately. So, I will be adding authentication to the tool to make sure only logged in users can use it and I can more accurately identify who is overusing it. I expect to finish that work this weekend and I am hopeful that will solve the issue. If it doesn't, there are other things I can try. — The Earwig (talk) 00:43, 20 July 2024 (UTC)
- Update: I am still working on this, but have made progress. — The Earwig (talk) 05:14, 22 July 2024 (UTC)
- FYI, I've also run into this issue the last couple of days. I'm assuming you're still working on it, or that life has gotten in the way of you fixing the issue. I dream of horses (Hoofprints) (Neigh at me) 21:20, 30 July 2024 (UTC)
- Yes, it's still my current focus with the free time I have. — The Earwig (talk) 00:21, 31 July 2024 (UTC)
- Just circling back to see how you responded to my query last month. Still have not successfully submitted a query and gotten a report in several months now. I realize that we are all volunteers so I don't have high expectations of when this issue might be "fixed" as we all have outside lives.
- But I didn't realize though that regular editors were competing with bots, that's a battle individual editors can never win so please block those bots, if possible! I don't even see how a bot would be able to handle a copyright violation report and interpret it appropriately. Liz Read! Talk! 03:06, 8 August 2024 (UTC)
- To second what @Liz said above, I just tried to run the copyvio tool on a promotional draft, and got the error again. Any progress to report on?
- Also, Liz, I think authentication has been added so we aren't competing against bots, at least not as much, per
So, I will be adding authentication to the tool to make sure only logged in users can use it and I can more accurately identify who is overusing it.
I dream of horses (Hoofprints) (Neigh at me) 23:48, 25 August 2024 (UTC)
- Is there anything other people can do to help with getting the copyvio tool up, or is this something you're going to need to do on your own? I dream of horses (Hoofprints) (Neigh at me) 03:09, 25 September 2024 (UTC)
- Hey Liz and I dream of horses. With substantial help from Chlod, we've released a change to require logging in to use the search engine option in the tool. (It uses OAuth, and it should redirect you automatically when running a new check.) This is still new, but it looks like this has eased our usage enough that the tool should not run out of quota so often. — The Earwig (talk) 15:19, 5 October 2024 (UTC)
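Editorial note on the reset time mentioned above ("around midnight Pacific Time"): the sketch below converts that moment into a reader's local time zone. It assumes the reset happens at exactly 00:00 Pacific Time, which the thread only gives as an estimate. The function name `quota_reset_local` is hypothetical and is not part of the tool.

```python
from datetime import date, datetime
from zoneinfo import ZoneInfo

# Assumption: the quota resets at exactly 00:00 Pacific Time.
# The thread only says "I think around midnight Pacific Time".
PACIFIC = ZoneInfo("America/Los_Angeles")


def quota_reset_local(day: date, tz_name: str) -> datetime:
    """Return midnight Pacific Time on `day`, expressed in the zone `tz_name`."""
    reset = datetime(day.year, day.month, day.day, tzinfo=PACIFIC)
    return reset.astimezone(ZoneInfo(tz_name))


if __name__ == "__main__":
    # Example: the reset on 8 May 2024 as seen from the UK.
    reset = quota_reset_local(date(2024, 5, 8), "Europe/London")
    print(reset.strftime("%H:%M %Z"))
```

For a UK reader on that date this works out to 08:00 local time (07:00 UTC), which is consistent with checks starting to succeed again at about 8am UK time.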
Copyvio Detector and Google
Hi,
(Sorry if this is the wrong forum for asking, but if so, perhaps you could point me in the right direction?)
I use the Copyvio Detector (great tool, BTW!) in checking new AfC drafts, at least a dozen times most days. I sometimes get an error message saying that the detector has exceeded its maximum allowed Google searches. This issue has always been there, occasionally, but in the last week or two it has occurred daily. When I start reviewing, around 6am or so UK time, the first few reviews always hit this problem. Then, maybe 8am (?) the daily quota probably gets reset, or something else happens, because from then onwards everything is fine until the next morning.
So I was thinking, I don't suppose there's much we can do to increase the quota (?), but would it be possible to add another search engine as a fallback option? Either so that when the user gets that error message, they could manually tick a box to use Bing (say) instead; or maybe the Detector could automatically switch to using the alternative if Google has failed.
I realise this may not be possible, either for technical or policy reasons, but thought I'd ask at least. Cheers, -- DoubleGrazing (talk) 09:35, 8 May 2024 (UTC)
- Hi DoubleGrazing, using Bing or some other engine as a fallback is definitely something we’ve discussed—I hadn’t realized the issue had gotten this bad recently. The main issue here is these services usually cost money, and while the WMF pays for our Google access right now, I don’t know if I will be able to ask for access to additional search engines. First, I can take a deeper look into whether anyone is overusing their share of the tool’s resources; we might need to block/limit them. (Our plan with Google allows about 1500 articles to be checked per day.) — The Earwig alt (talk) 16:11, 8 May 2024 (UTC)
- Okay, thanks for shedding some more light on this; needless to say, I knew nothing about how these things work.
- I guess we at AfC are taking up quite a chunk of that quota, given that we see what are by definition new drafts usually by new users. I for one run the check probably at least on ⅓ of the drafts I review (and if you think that makes me an overuser, feel absolutely free to point this out, of course!). Even at NPP we deal with relatively more experienced users, so there's that much less of a need to check for CV.
- It may be that I see the problem worse than some others, mind, because of my weird early-morning AfC habit, combined with the time zone I'm in. -- DoubleGrazing (talk) 17:05, 8 May 2024 (UTC)
- Hi again,
- Quick update on this: the problem (of the copyvio detector running out of Google quota) has lately become worse. Unlike before, when it would only manifest in the early morning UK time, and usually be fine after 8am UK / 0700 UTC, it's now also happening in the afternoon. This is relatively new, maybe in the past week or two, so I don't yet have a good feel for what time it happens exactly (in case that matters); I would have said late afternoon, but e.g. today it started already around 1pm UK / 1200 UTC.
- Best, -- DoubleGrazing (talk) 12:35, 4 July 2024 (UTC)
- Sorry for taking a while to get back, but I'm actively working on an improvement for this now. — The Earwig (talk) 06:43, 19 July 2024 (UTC)
- Great to hear, thanks. :) DoubleGrazing (talk) 10:35, 19 July 2024 (UTC)
- Do we really still have the same quota we've had for months? (or years?) As in, are we sure it hasn't been reduced? I haven't had a copyvio check go through with the search engine box checked in what seems like weeks. I can't imagine there are suddenly so many new page patrollers that it's making that much of a difference, but... -- asilvering (talk) 22:45, 23 August 2024 (UTC)
- Oh. But what has really taken off in the last several months is AI. Nevermind. I think I've answered my own question. ugh. -- asilvering (talk) 22:47, 23 August 2024 (UTC)
- I think we were discussing this on WP:VPWMF a few weeks ago, and the idea of making everyone log in using OAUTH came up. If bots are indeed the problem, I think this is a good idea to try. –Novem Linguae (talk) 23:06, 23 August 2024 (UTC)
- Yes, we're actively working on this. — The Earwig (talk) 00:09, 24 August 2024 (UTC)
- Thanks, and good luck! -- asilvering (talk) 00:26, 24 August 2024 (UTC)
- Hey DoubleGrazing and asilvering. With substantial help from Chlod, we've released a change to require logging in to use the search engine option in the tool. (It uses OAuth, and it should redirect you automatically when running a new check.) This is still new, but it looks like this has eased our usage enough that the tool should not run out of quota so often. — The Earwig (talk) 15:20, 5 October 2024 (UTC)
- Brilliant, thanks so much. -- asilvering (talk) 17:47, 5 October 2024 (UTC)
- Sounds good, thanks! Already tried it and seems to work well. Glad to hear it's taking some of the pressure off the quota. Cheers, -- DoubleGrazing (talk) 19:07, 5 October 2024 (UTC)
Error message on Pablo Escobar
Hello Ben, I have a weird error to report: when I perform a copyvio search on Pablo Escobar I get an error message "Access to copyvios.toolforge.org was denied, You don't have authorisation to view this page. HTTP ERROR 403". It doesn't matter what source URL I try to compare it against. However, if I try to compare using a specific revision ID of that article, it works okay. It's only occurred on Pablo Escobar (at least so far). Thought you might like to know. — Diannaa (talk) 20:32, 6 October 2024 (UTC)
- Hey Diannaa, we had an unusual issue a while back where some bots/crawlers kept running checks against that page so I disabled it. As you noticed, the revision ID should still work. I’ll check if the bots are still hitting it and re-enable if not. — The Earwig alt (talk) 20:37, 6 October 2024 (UTC)
- Ok cool, no problem though if you have to leave it, as there's a simple workaround - using the revision ID number. — Diannaa (talk) 20:39, 6 October 2024 (UTC)