Wikipedia:Wikipedia Signpost/2012-10-01/Technology report
WMF and the German chapter face up to Toolserver uncertainty
The Toolserver is an external service hosting hundreds of webpages and scripts (collectively known as "tools") that assist Wikimedia communities with dozens of mostly menial tasks. Few people think it has been operating well recently: its problems, which include high database replication lag and periods of total downtime, have caused considerable disruption to its usual functions. Those functions are highly valued by many Wikimedia communities and include data reports on the relationships between pages, categories, images, and external links; support for Wiki Loves Monuments, OpenStreetMap and GLAM projects; talk-page archiving services; edit counters; and tools aimed at easing many automated administrative processes, such as the account and unblock request processes on several major wikis, as well as cross-wiki abuse detection.
How did the Toolserver start?
It was originally set up in 2005, when Sun Microsystems donated servers to Wikimedia Deutschland (WMDE); the German chapter thus took on responsibility for the project almost by coincidence. WMDE has since invested heavily in Toolserver infrastructure and operations, an unusually global role for a chapter that stems from the particular nature of its revenue streams and from German charity laws. The Wikimedia Foundation has provided in-kind support, mostly in the form of database replication and space in its Amsterdam data centre (valued at US$65k a year), as well as financial grants to expand the hardware (example). Nevertheless, WMDE still provides the bulk of the general budget of about €100k (US$130k); other chapters, such as Wikimedia UK, have also made smaller contributions.
Wikimedia Labs vs Toolserver: a comedy of errors?
In 2011, the Foundation announced the creation of Wikimedia Labs, a much better-funded project that, among other things, aimed to mimic the Toolserver's functionality by mid-2013. At the same time, Erik Möller, the WMF's director of engineering, announced that the Foundation would no longer support the Toolserver financially, but would continue to provide the same in-kind support as before.
DaB is the volunteer who administers the Toolserver and who, in the process, has acquired unique expertise in running the system. (WMDE has also contracted Marlen Caemmerer to assist in Toolserver administration since October 2011.) DaB told the Signpost that there is a simple reason for the recent degradation in performance: no hardware was added to the Toolserver in 2012, while more tools have been written and more people are using them. The German chapter, he says, has refused his request to extend the hardware infrastructure, offering only a vague commitment of support, and its September forward planning allocates just a fraction of last year's funding.
DaB's comments are a reference to a message from WMDE's CEO, Pavel Richter, who publicly reassured Toolserver developers this week that "Wikimedia Deutschland will make all necessary investments [including new hardware] to keep the Toolserver up and running", but said that the chapter could not ignore the existence and growth of Labs. The movement now faces a complex challenge in working out how to maintain continuous support of the tools, a complexity that is obvious from recent debates (conducted in German) on Meta and the German Wikipedia; moreover, DaB has threatened to resign if WMDE does not allocate funds for hardware purchase.
What the WMF didn't anticipate, and what it now seems as though they're naively ignoring despite the outcry, is that WMDE doesn't have anything like the foundation's eight-figure budget, and apparently the WMF has decided the Toolserver is going to get the short end of the stick when it comes to funding.
Richter's reference to the rapid growth of Wikimedia Labs prompted WMF deputy director Erik Möller to set out the Foundation's thinking (full version, including rationale) in response to questions raised about the scenario:
It is true that we (the WMF) have ... asked WMDE to work with us in transitioning from Toolserver to Labs. ... Chapters are autonomous organizations, and it's WMDE's call how much / whether it wants to continue to invest in [the Toolserver] ... However, for our part, we will not continue to support the current arrangement ... indefinitely. The timeline we've discussed with Wikimedia Germany is roughly as follows:
- Wind down new account creation on Toolserver by Q2 of 2013 calendar year
- Decommission Toolserver by December 2013
Möller accepted that Labs, while well resourced in terms of both processing capability and storage space, is not yet suitable for Toolserver migrants: it lacks, among other things, both database replication and a "Quick Start" mode for users who are not interested in Labs' capacity for custom server setups. Although funding has been set aside for developing such features, Möller would not commit to targeted WMF funding for the transition of tools, and therein lies the concern among volunteer Toolserver developers: they could be left facing a switchover deadline without the time, the capabilities, or both to migrate their tools themselves. Their concern, then, is that only time will tell what will happen to these popular but difficult-to-migrate tools, to whose continued existence both WMDE and the WMF seem unwilling to commit.
English Wikipedia arbitrator Hersfold was closely involved in writing the "unblock ticket request system" (UTRS), which allows blocked users—including innocent parties caught up in range-blocks—to appeal their blocks. UTRS, created only recently and now officially mandated by the Foundation, is written for the Toolserver, not the Labs environment. Hersfold told the Signpost:
How Labs functions seems to be almost completely different from how the Toolserver functions. We've been told multiple times that Labs will provide lots of "beefy" infrastructure for tools development; ... users will be able to set up virtual machines, or "instances" ... to handle their development, and submit new programming code to a shared location. As one may expect from the Foundation, it's a very collaborative setup. Once inside their instance, a user can more-or-less do whatever they want; install MediaWiki, run a bot, set up web pages for tools, whatever. But most people on the Toolserver don't need "beefy"; we just need a web server that will let us run our tools and access the databases holding information about Wikipedia and the other projects. If someone needed "beefy," they'd have set up their own server ages ago. While Labs is all swishy and fancy (and presumably has less downtime than the Toolserver), it's an environment we're all completely unused to, and perhaps worst of all, it provides no access to the Wikimedia databases, which will prevent most tools and bots from working at all. Supposedly this functionality will be available at some point in the future [editor's note: planned for the first quarter of 2013] ... I don't think either organization fully realizes how much Wikipedia, the Commons, and all the other projects rely on the tools provided by the Toolserver ... [if it goes,] most of the tools and bots we take for granted will suddenly cease to function.
Carl, another developer, agreed, "labs will be useful for some projects, particularly for developing MediaWiki extensions. [But] the current plans seem to be intentionally preventing [other] Toolserver users from simply migrating their tools to Labs; the result will be a great leap backwards when/if the toolserver is taken offline."
The Signpost understands that licensing is a further sticking point: although encouraged to do so, some tool operators have not released their code under a free license, which is a requirement for using Labs. (One operator has stated that he legally cannot do so because he created the tool using his company's computer systems, so the company holds the copyright.)
- An earlier version of this article incorrectly asserted that access to the Wikimedia databases would occur in December 2013. It is actually planned for the first quarter of next year.
In brief
Not all fixes may have gone live to WMF sites at the time of writing; some may not be scheduled to go live for several weeks.
- Page Curation launched: Page Curation, a set of tools developed by the WMF to assist in reviewing newly-created articles (see the video tour), was deployed to the English Wikipedia on 20 September. The initial responses have been positive, with only minor bugs affecting performance.
- MediaWiki 1.21wmf1 begins deployment cycle: 1.21wmf1, the first release to Wikimedia wikis of the 1.21 cycle (that is to say, the first since the branching of MediaWiki 1.20 proper last week), was deployed to its first wikis on October 1 and will be deployed to all wikis by October 10. The release incorporates about 220 changes to the MediaWiki software that powers Wikipedia: 101 "core" changes plus a similar number of patches for WMF-deployed extensions. Among the changes (the product of some 14 days of development time) are fixes for Special:BrokenRedirects (bug #9237) and log pages, along with a range of database-handling improvements aimed at improving performance and reducing the possibility of errors. In related news, former bugmeister Mark Hershberger, who has taken on the role of overseeing the MediaWiki 1.20 release, published a list of what needed doing before the software update could go public.
- Wikidata six months on: The Wikidata project to implement centralised interwiki links (phase 1), infoboxes (phase 2) and dynamic lists (phase 3) is now officially six months old, the project's identi.ca feed noted this week. Work was originally on schedule, but the need for code review has slowed progress on phase 1 of the project, which is jointly funded by Wikimedia Deutschland and several of its partners. Fortunately, many of that phase's tricky code review issues do seem to have been resolved over the past fortnight (wikitech-l mailing list), suggesting that the long-awaited trial deployment to the Hungarian Wikipedia could begin shortly. Development work on phase 2 is already under way, while phase 3 is not expected until well into next year.
- Assault on email spoofing begins: Concern early last month about a fake @wikimedia.org email address, combined with internal WMF concerns about donation emails, led to action this week. The Foundation will now use the Sender Policy Framework (SPF) to help ISPs flag spoofed emails that claim to come from @wikimedia.org addresses. However, worries about whether all staff and volunteer holders of such addresses can be registered with the system mean that only low-level protection is being offered initially; the plan is to increase the protection over time, making it progressively harder for anyone other than the rightful owners of the addresses to use them in the "From" field of their emails (wikitech-l mailing list).
- Issue with multistream bz2 files fixed, aids dump accessibility: A long-standing problem that prevented reusers from making full use of .bz2 multistream files was fixed this week by WMF Data Dumps Engineer Ariel Glenn (blogpost). The multistream files for September are being regenerated with the fix, which allows reusers to jump straight to the content of the article they are interested in without searching through the entire, often prohibitively large, file. Alongside the fix, Glenn wrote a simple proof-of-concept tool demonstrating one way to take advantage of the feature, which is likely to be of most use to researchers tracking the content of individual articles over time.
- Three bots approved: 3 BRFAs were recently approved for use on the English Wikipedia:
- VIAFbot's 1st BRFA, adding {{authority control}} tags to articles linking to viaf.org; the operator is Maximilianklein;
- BattyBot's 12th BRFA, changing {{Cleanup}} to {{Video game cleanup}} on the relevant articles; the operator is GoingBatty;
- DPL bot's 4th BRFA, tagging and removing tags from articles based on whether they should have the {{incoming links}} template; the operator is JaGa;
- At the time of writing, 14 BRFAs are active. As usual, community input is encouraged.
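The multistream dump item above can be illustrated with a minimal Python sketch of how a reuser might jump to a single article. It assumes that a multistream file is a concatenation of independent bz2 streams and that a companion index lists one `offset:page_id:title` line per page, where `offset` is the byte position of the stream holding that page. The helper names `find_offset` and `read_stream_at` are hypothetical; this is not Glenn's proof-of-concept tool.

```python
import bz2
from typing import Iterable, Optional


def find_offset(index_lines: Iterable[str], title: str) -> Optional[int]:
    """Return the byte offset of the stream holding `title`, or None.

    Assumes index lines of the form "offset:page_id:title". Titles may
    themselves contain colons (e.g. "Talk:Foo"), so split at most twice.
    """
    for line in index_lines:
        parts = line.rstrip("\n").split(":", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        offset, _page_id, page_title = parts
        if page_title == title:
            return int(offset)
    return None


def read_stream_at(path: str, offset: int, chunk_size: int = 65536) -> bytes:
    """Decompress only the single bz2 stream that starts at byte `offset`."""
    decompressor = bz2.BZ2Decompressor()
    pieces = []
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to the stream; earlier data is never read
        while not decompressor.eof:  # eof becomes True at the end of this stream
            chunk = f.read(chunk_size)
            if not chunk:
                break  # truncated file: return whatever was decoded so far
            pieces.append(decompressor.decompress(chunk))
    # Any bytes belonging to the next stream land in
    # decompressor.unused_data and are ignored here.
    return b"".join(pieces)
```

Because a `BZ2Decompressor` stops at the end of the first stream it sees, the reader decompresses only the block containing the wanted page rather than the whole dump. The XML for that block would still need to be parsed to pick out the individual page.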
Discuss this story
In general, as I noted on toolserver-l, I agree with Carl that we should find ways to support projects like the WP 1.0 assessment DB in Labs. The feature set of the Labs DB replication isn't final, and it's likely going to be iterative.
We'll host an IRC meeting soon that we'll broadcast to toolserver-l@ as well to allow for more discussion of requirements for tool labs (the phase of the labs project dedicated to supporting tools development) and to answer questions about how folks can use Labs today. In the meantime, there are usually folks hanging out on #wikimedia-labs on irc.freenode.net as well in case you have immediate questions.--Eloquence* 20:18, 2 October 2012 (UTC)
I've written several tools that aid maintenance work on Wikipedia, most notably in identifying uncategorized articles and extensive work with disambiguation. If I lose (1) Wikipedia database replication or (2) the ability to join my user database to the replicated database, all of that work is lost. All of it. I know that maintenance work is not glamorous or interesting to most Wikipedians, but it is nevertheless important. I hope that those who are making the decisions about keeping Toolserver viable during the interim and how to set up Wikimedia Labs take into account the role Toolserver plays in maintaining Wikipedia infrastructure. --JaGatalk 22:37, 2 October 2012 (UTC)
The thing I find a little hard to understand is why the Toolserver needs to be shut down at all. The reasoning behind creating Labs is pretty solid, but the answer to why Labs and the Toolserver can't coexist is not. The key question here seems to be SQL replication to the outside world. If the WMF takes it away, then there is no future for anything like the Toolserver at all. Period. An alternative vision could be that in the future, besides Labs, there could be multiple independent [tool]server instances working with replicated data. The current TS could be used as a prototype for this. The reasoning for independent systems would be that even when Labs is fully operational, it can't ever be used for everything. One limitation is the licence policy: one cannot use closed source in Labs. Second, even though Labs' horsepower is considerable, it is not unlimited or suitable for everything. Someone may prefer to use specialized computing for their own needs. --Zache (talk) 08:13, 3 October 2012 (UTC)
I have direct, personal experience of the utility of the Toolserver for creating content on the projects (Wikisource, in particular). Whatever the engineering considerations, I'm certainly concerned that the approach taken doesn't seem driven by free content. Does seem "more of the same" with the "cool" stuff. I.e. the cart gets put before the horse. Charles Matthews (talk) 16:30, 3 October 2012 (UTC)[reply]