Major news in July includes:
a recap of how the Operations team collaborated with the RIPE NCC to measure the delivery of Wikimedia sites to users in Asia and elsewhere;
an analysis of the impact of the San Francisco data center on the speed of Wikimedia sites;
the launch of the new native Wikipedia app for iOS;
a first look at the content translation tool.
Note: We’re also providing a shorter and translatable version of this report.
Engineering metrics in July:
164 unique committers contributed patchsets of code to MediaWiki.
The total number of unresolved commits went from around 1575 to about 1642.
About 31 shell requests were processed.
Work with us
Are you looking to work for Wikimedia? We have a lot of hiring coming up, and we really love talking to active community members about these roles.
VP of Engineering
Software Engineer – Front-end (VisualEditor)
Software Engineer – Services
Software Engineer – Front-end
Software Engineer – Maps & Geo
Software Engineer – Mobile – iOS
Software Engineer – Full Stack
Product Manager – Language Engineering
Operations Security Engineer
UX Senior Designer
UX Senior Design Researcher
UX User Research Recruiter
Project Coordinator – Engineering
Mobile Partnerships Regional Manager
Program Evaluation Internship
Arthur Richards is now Team Practices Manager (announcement).
Kristen Lans joined the Team Practices Group as Scrum Master (announcement).
Joel Sahleen joined the Language Engineering team as Software Engineer (announcement).
Dallas data center
Throughout July, the cabling work of all racked servers and other equipment was nearly completed. We’re still awaiting the installation of the first connectivity to the rest of our US network in early August before we can begin installation of servers and services.
San Francisco data center
Due to a necessary upgrade to the power and cooling infrastructure in our San Francisco data center (which we call ulsfo), our racks were migrated to a new floor within the same building on July 9. The move went smoothly, without user impact, and the site was back online serving all user traffic in less than 24 hours.
Through the help of volunteer work and research, our staff enabled Perfect Forward Secrecy on our SSL infrastructure, significantly increasing the security of encrypted user traffic.
Labs metrics in July:
Number of projects: 173
Number of instances: 464
Amount of RAM in use (in MBs): 1,933,824
Amount of allocated storage (in GBs): 20,925
Number of virtual CPUs in use: 949
Number of users: 3,500
We’ve made several minor updates to Wikitech: we added OAuth support, fixed a few user interface issues, and purged the obsolete ‘local-*’ terminology for service groups.
OPW Intern Dinu Sandaru has set up forms for structured project documentation. This should help match new volunteers with existing projects, and will make communication with project administrators more straightforward.
Sean Pringle is in the process of updating the Tool Labs replica databases to MariaDB version 10.0. This may reduce replag, and should improve performance and reliability.
We’re setting up new storage hardware for the project dumps. This will resolve our ongoing problems with full drives and out-of-date dumps.
Editor retention: Editing tools
In July, the team working on VisualEditor converged the design for mobile and desktop, made it possible to see and edit HTML comments, improved access to re-using citations, and fixed over 120 bugs and tickets.
The new design, with controls focussed at the top of each window in consistent positions, was made possible due to the significant progress made in cross-platform support in the UI library, which now provides responsively-sized windows that can work on desktop, tablet and phone with the same code. HTML comments are occasionally used on a few articles to alert editors to contentious or problematic issues without disrupting articles as they are read, so making them prominently visible avoids editors accidentally stepping over expected limits. Re-using citations is now provided with its simple dialog available in the toolbar so that it is easier for users to find.
Other improvements include an array of performance fixes targeted especially at mobile users, fixes for a number of minor cases where VisualEditor would corrupt the page, better monitoring of corruptions if they occur, and better support for right-to-left languages, with icons displayed in the right orientation based on context.
The mobile version of VisualEditor, currently available for beta testers, moved towards stable release, fixing a number of bugs and editing issues and improving loading performance. Our work to support languages made some significant gains, nearing the completion of a major task to support IME users, and the work to support Internet Explorer uncovered some more issues as well as fixes. The deployed version of the code was updated five times in the regular release cycle (1.24-wmf12, 1.24-wmf13, 1.24-wmf14, 1.24-wmf15 and 1.24-wmf16).
In wider news, the team expanded its scope to cover all MediaWiki editing tools as well, as the new Editing Team (covered below).
In July, the newly re-named and re-scoped Editing Team was formed from the VisualEditor Team. We are responsible for extending and improving the editing tools used at Wikimedia – primarily VisualEditor and maintenance for WikiEditor. We exist to support new and existing editors alike; our current work is mostly on desktop, and we are working with Mobile to take responsibility for all editing across desktop, tablet and phone platforms, spanning approximately 50 different areas of MediaWiki and extensions related to editing. We will continue to report progress on VisualEditor separately.
The biggest Editing change this month was in the Cite extension (for footnotes) – this now automatically shows a references list at the end of the page if you forget to put in a <references /> tag, instead of displaying an ugly error message. The Math extension (for formulæ) was improved with more rigorous error handling and LaTeX formula checking, as part of the long-term volunteer-led work to introduce MathML-based display and editing. The TemplateData GUI editor was deployed to a further six wikis – the English, French, Italian, Russian, Finnish and Dutch Wikipedias.
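The Cite fallback described above can be pictured with a minimal sketch. This is not the extension's actual code; the function name `render_footnotes` and the plain-text output format are hypothetical, chosen only to illustrate the behaviour: when a page has `<ref>` footnotes but no `<references />` tag, the list is appended at the end of the page instead of producing an error.

```python
import re

REF_PATTERN = re.compile(r"<ref>(.*?)</ref>", re.DOTALL)
REFERENCES_TAG = re.compile(r"<references\s*/>")


def render_footnotes(wikitext: str) -> str:
    """Replace <ref> tags with numbered markers and emit the reference list.

    Hypothetical illustration only: the list goes where <references /> is,
    or at the end of the page when the tag is missing.
    """
    notes = []

    def number_ref(match):
        notes.append(match.group(1).strip())
        return f"[{len(notes)}]"

    body = REF_PATTERN.sub(number_ref, wikitext)
    ref_list = "\n".join(f"{i}. {note}" for i, note in enumerate(notes, start=1))

    if REFERENCES_TAG.search(body):
        # The editor placed the tag explicitly: honour that position.
        return REFERENCES_TAG.sub(lambda _m: ref_list, body, count=1)
    if notes:
        # Tag forgotten: fall back to appending the list at the end of the page.
        return body.rstrip("\n") + "\n" + ref_list
    return body
```

For example, `render_footnotes("Fact.<ref>Source A</ref>")` yields the text with a `[1]` marker followed by a one-item list, even though no `<references />` tag was written.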
A lot of work was done on libraries and infrastructure for the Editing Team and others. The OOjs UI library was extensively modified to bring in a new window management system for comprehensive combined desktop, tablet and phone support, as well as other updates to improve Internet Explorer compatibility and accessibility of controls. In the next few months the team will continue working on OOUI to support other teams’ needs and implement a consistent look-and-feel in collaboration with the Design team. The OOjs library was updated to fix a minor bug, with a new version (v1.0.11) released and pushed downstream into MediaWiki, VisualEditor and OOjs UI. The ResourceLoader framework was extended to allow skins to set the “skinStyles” property themselves, rather than rely on faux dependencies, as part of wider efforts led jointly by a volunteer and a team member to improve MediaWiki’s skin support.
In July, the Parsoid team continued with ongoing bug fixes and bi-weekly deployments.
With an eye towards supporting Parsoid-driven page views, the Parsoid team strategized on addressing Cite extension rendering differences that arise from site-messages based customizations and is considering a pure CSS-based solution for the common use cases. We also finished developing the test setup for mass visual diff tests between PHP parser rendering and Parsoid rendering. It was tested locally, and we started preparing to deploy it on our test servers. This will go live at the end of July or in early August.
The GSoC 2014 LintTrap project continued to make good progress. We had productive conversations with Project WikiCheck about integrating LintTrap with WikiCheck in a couple different ways. We hope to develop this further over the coming months.
Overall, this was also a month of reduced activity, with Gabriel now officially full time in the Services team and Scott focused on the PDF service deployment that went live a couple of days ago. The full team is also spending a week at an off-site meeting, working and spending time together in person prior to Wikimania in London.
Services and REST API
The brand new Services group (currently Matt Walker and Gabriel Wicke) started July with two main projects:
PDF render service deployment
Design and prototyping work on the storage service and REST API
The PDF render service is now deployed in production, and can be selected as a render backend in Special:Book. The renderer does not work perfectly on all pages yet, but the hope is that this will soon be fixed in collaboration with the other primary author of this service, C. Scott Ananian.
Prototyping work on the storage service and REST API is progressing well. The storage service now has early support for bucket creation and multiple bucket types. We decided to configure the storage service as a backend for the REST API server. This means that all requests will be sent to the REST API, which will then route them to the appropriate storage service without network overhead. This design lets us keep the storage service buckets very general by adding entry point specific logic in front-end handlers. The interface is still well-defined in terms of HTTP requests, so it remains straightforward to run the storage service as a separate process. We refined the bucket design to allow us to add features very similar to Amazon DynamoDB in a future iteration. There is also an early design for light-weight HTTP transaction support.
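The layering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's code: the class names `StorageService` and `RestApi`, the `/v1/pages/` entry point, and the `"kv"` bucket type are all hypothetical. The point it shows is that the front end calls the storage backend in-process (a plain method call, no network hop), while the backend's interface stays expressed as HTTP-like requests so it could later run as a separate process.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    bucket_type: str                      # hypothetical type names, e.g. "kv"
    items: dict = field(default_factory=dict)


class StorageService:
    """Generic bucket store addressed purely in terms of HTTP-like requests."""

    def __init__(self):
        self.buckets = {}

    def handle(self, method: str, path: str, body=None):
        parts = path.strip("/").split("/", 1)
        name = parts[0]
        key = parts[1] if len(parts) > 1 else None
        if method == "PUT" and key is None:
            # Bucket creation; the body carries the bucket type.
            self.buckets[name] = Bucket(bucket_type=body["type"])
            return 201, None
        bucket = self.buckets.get(name)
        if bucket is None:
            return 404, None
        if method == "PUT":
            bucket.items[key] = body
            return 201, None
        if method == "GET":
            if key in bucket.items:
                return 200, bucket.items[key]
            return 404, None
        return 405, None


class RestApi:
    """Front end: entry-point-specific logic lives here, not in the buckets."""

    def __init__(self, storage: StorageService):
        self.storage = storage  # in-process backend: a plain method call

    def handle(self, method: str, path: str, body=None):
        # Hypothetical entry point: /v1/pages/{title} maps onto the "pages" bucket.
        if path.startswith("/v1/pages/"):
            title = path[len("/v1/pages/"):]
            return self.storage.handle(method, f"/pages/{title}", body)
        return 404, None
```

Because `RestApi` only ever talks to the backend through `handle(method, path, body)`, swapping the in-process object for an HTTP client would not change the front-end handlers.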
Matt Walker is sadly leaving the Foundation by the end of this month to follow his passion of building flying cars. This means that we currently have three positions open in the service group, which we hope to start filling soon.
In July, the Flow team built the ability for users to subscribe to individual Flow discussions, instead of following an entire page of conversations. Subscribing to an individual thread is automatic for users who create or reply to the thread, and users can choose to subscribe (or unsubscribe) by clicking a star icon in the conversation’s header box. Users who are subscribed to a thread receive notifications about any replies or activity in that thread. To support the new subscription/notification system, the team created a new namespace, Topic, which is the new “permalink” URL for discussion threads; when a user clicks on a notification, the target link will be the Topic page, with the new messages highlighted with a color. The team is currently building a new read/unread state for Flow notifications, to help users keep track of the active discussion topics that they’re subscribed to.
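The subscription rules above can be summarised in a small sketch. It is only a model of the described behaviour, not Flow's implementation; the `Topic` class, its method names, and the notification text are hypothetical.

```python
class Topic:
    """A single discussion thread with its own subscriber set."""

    def __init__(self, title: str, author: str):
        self.title = title
        self.subscribers = {author}   # creating a topic subscribes the author
        self.notifications = []       # (recipient, message) pairs

    def reply(self, user: str, text: str) -> None:
        # Notify everyone already subscribed, except the person replying.
        for subscriber in sorted(self.subscribers - {user}):
            self.notifications.append(
                (subscriber, f"{user} replied in '{self.title}'")
            )
        self.subscribers.add(user)    # replying subscribes automatically

    def toggle_star(self, user: str) -> bool:
        """Mimic the star icon: flip the subscription, return the new state."""
        if user in self.subscribers:
            self.subscribers.remove(user)
            return False
        self.subscribers.add(user)
        return True
```

Keeping the subscriber set on the topic rather than on the page is what lets a user follow one thread without following every conversation on the board.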
In July, the Growth team completed its second round of A/B testing of signup invitations for anonymous editors on English Wikipedia, including data analysis. The team also built the first API and interface prototypes for task recommendations. This new system, first aimed at brand new editors, makes suggestions based on a user’s previous edits.
Following on from the successful launch to Android, the Mobile Apps team released the new native Wikipedia app to iOS on July 31. The app is the iOS counterpart to the Android app, with many of the same features such as editing, saving pages for offline reading, and browsing history. The iOS app also contains an onboarding screen that is shown the first time the app is launched, asking users to sign up, a feature which was also launched on Android this month (see below).
On Android this month we released to production accessibility and styling features which were requested by our users, such as a night mode for reading in the dark and a font size selector. We also released an onboarding screen that asks users to sign up.
Our plan for next month is to get user feedback from Wikimania, wrap up our styling fixes, and begin work on an onboarding screen the first time that someone taps edit.
Mobile web projects
This month, the team continued to focus on wrapping up the collaboration with the Editing team to bring VisualEditor to tablet users on the mobile site. We also began working to design and prototype our first new Wikidata contribution stream, which we will build and test with users on the beta site in the coming month.
During the last month, the team worked on software architecture features that allow for expansion of the Wikipedia Zero footprint on partner networks and that get users to content faster with support for lowered cache fragmentation on Varnish caches. Whereas the previous system supported one-size-fits-all configuration for heterogeneous partner networks, inhibiting some zero-rated access, the new system supports multiple configurations for disparate IP addresses and connection profiles per operator. Additionally, lightweight script and GIF-ified Wikipedia Zero banner support has been added and is being tested; in time this should drastically reduce Varnish cache fragmentation, making pages be served faster and reducing Varnish server load. A faster landing page was introduced for “zerodot” (zero.wikipedia.org, legacy text-only experience) landing pages when operators have multiple popular languages in their geography. Work on compression proxy traffic analysis for header enrichment conformance with the official Wikipedia Zero configurations was also performed after more diagnostic logging code was added to the system. Finally, watchlist thumbnails, although low bandwidth, were removed from the zerodot user experience, as was the higher bandwidth MediaViewer feature for zerodot; mdot will have these features, though.
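As a rough illustration of "multiple configurations for disparate IP addresses and connection profiles per operator", the lookup might look like the sketch below. The data layout, the operator name, the `profile` and `banner` fields, and the function `find_zero_config` are assumptions made for this example (the IP ranges are documentation-reserved blocks); the actual Wikipedia Zero configuration format is not described in this report.

```python
import ipaddress
from typing import Optional

# Hypothetical operator configuration: several profiles per operator,
# each tied to its own IP ranges (documentation-reserved ranges used here).
OPERATOR_CONFIGS = {
    "example-operator": [
        {"networks": ["192.0.2.0/25"], "profile": "direct", "banner": True},
        {
            "networks": ["192.0.2.128/25", "198.51.100.0/24"],
            "profile": "proxy",
            "banner": False,
        },
    ],
}


def find_zero_config(client_ip: str) -> Optional[dict]:
    """Return the first configuration whose networks contain client_ip."""
    address = ipaddress.ip_address(client_ip)
    for operator, configs in OPERATOR_CONFIGS.items():
        for config in configs:
            if any(
                address in ipaddress.ip_network(net)
                for net in config["networks"]
            ):
                return {"operator": operator, **config}
    return None
```

Under a one-size-fits-all scheme the operator would have a single entry; here the same operator can be matched to different settings depending on which of its networks the request comes from.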
In side project work, the team spent time on API continuation queries, Android IP editing notices, Amazon Kindle and other non-Google Play distribution, and Google Play reviews (now that the Android launch dust has settled, mobile apps product management will be triaging the reviews). In partnerships work, the team met with Mozilla to talk about future plans for the Firefox OS HTML5 app (e.g., repurposing the existing mobile website, but without any feature reduction) and how Wikimedia search might be further integrated into Firefox OS, and also spoke with Canonical about how Wikipedia might be better integrated into the forthcoming Ubuntu Phone OS.
Routine pre- and post-launch configuration changes were made to support operator zero-rating, with routine technical assistance provided to operators and the partner management team to help add zero-rating and address anomalies. The team also continued its search for a third Partners engineering teammate.
Wikipedia Zero (partnerships)
We served an estimated 68 million free page views in July through Wikipedia Zero. We continue to bring new partners into the program, though none launched in July. Adele Vrana met with prospective partners and local Wikimedians in Brazil. We published our operating principles to increase transparency.
The CLDR extension was updated to use CLDR 25; this work was mostly done by Ryan Kaldari. The team made various internationalization fixes in core, MobileFrontend, the Wikipedia Android app, Flow, VisualEditor and other features. In the Translate extension, Niklas Laxström fixed ElasticSearchTTMServer to provide translation memory suggestions longer than one word, and improved translation memory suggestions for translation units containing variables (bug 67921).
Language Engineering Communications and Outreach
We announced the initial availability of the Content translation tool with limited feature support. We are focusing on supporting Spanish to Catalan translations for this initial release. You can read a report on the feedback received since deployment.
An initial version was released on Beta Labs; it supports machine translation between Spanish and Catalan. The machine translation API leverages open source machine translation with Apertium. The tool supports experimental template adaptation between languages. Numerous bug fixes were made based on testing and user feedback. We worked on matching the Apertium version to the cluster, and planning for the next round of development has started.
The Beta cluster is running HHVM. The latest MediaWiki-Vagrant and Labs-vagrant use HHVM by default.
Admin tools development
Most admin tools resources are currently diverted towards SUL finalisation, which will greatly help in reducing the admin tools backlog. July saw the deployment of the global rename tool (bug 14862), and core fixes including the creation of the “viewsuppressed” userright (bug 20476).
Our deployment of CirrusSearch to larger wikis as the primary search back-end turned out to be too ambitious. After encountering performance issues, we rolled back this change. We are now addressing the root of the problem by getting more servers (nearly doubling the cluster size) and putting together more optimizations for the portion of Cirrus that fell over (the working set). If everything goes as planned, the working set will be reduced by about 80%, trading some indexing performance for search performance. These optimizations will slightly change result relevance; please let us know if you notice any issues.
Most work was spent on SUL Finalization tasks. Phpunit and browser tests were added for CentralAuth, global rename was deployed, and lots of small fixes were made to CentralAuth to clean up user accounts in preparation for finalization.
In July, the SUL finalisation team began work on completing the necessary feature work to support the SUL finalisation.
To help users with local-only accounts that are going to be forcibly renamed due to the SUL finalisation, the team is working on a form that lets those users request a rename. These requests will be forwarded onto the stewards to handle. The SUL team is currently in consultation with the stewards about how they would like this tool to work. When this consultation is wrapped up, the team will begin design and implementation.
To help users get globally renamed without having to request renames on potentially hundreds of wikis, the team implemented and deployed GlobalRenameUser, a tool which renames users globally. As the tool is designed to work post-finalisation, it only performs renames where the current name is global, and the requested name is totally untaken (no global account and no local accounts exist with that name).
To help users who get renamed by the finalisation and, despite our best efforts to reach out to them, did not get the chance to request a rename before the finalisation, the team is working on a feature to let users log in with their old credentials. The feature will display an interstitial when they log in, informing them that they logged in with old credentials and that they need to use new ones. We are also considering a persistent banner for those users, so that they definitely know they need to use their new credentials. An early beta version of this feature is complete, and now needs design and product refinements to be completed.
To help users who get renamed by the finalisation and, as a result, have several accounts that were previously local-only turned into separate global accounts, the team is working on a tool to merge global accounts. We chose to merge accounts as it was the easiest way to satisfy the use case without causing further local-global account clashes that would cause us to have to perform a second finalisation. The tool is in its preliminary stages.
The team also globalised some accounts that were not globalised but had no clashes. These accounts were either created in this local-only form due to bugs, or are accounts from before CentralAuth was deployed where the user never globalised. As these accounts had no clashes, there were no repercussions to globalising these accounts, so we did this immediately.
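The eligibility rule for GlobalRenameUser described above (the current name must be global, and the requested name must have neither a global account nor any local account) can be expressed as a short check. This is a sketch only: the function name and the two data structures are hypothetical stand-ins for CentralAuth's account data, not its API.

```python
def can_rename_globally(current: str, requested: str,
                        global_accounts: set, local_accounts: dict) -> bool:
    """Check the two conditions described for a global rename.

    global_accounts: names that exist as global accounts.
    local_accounts: maps wiki name -> set of local account names.
    Both structures are hypothetical stand-ins for CentralAuth's data.
    """
    if current not in global_accounts:
        return False  # the current name must already be global
    if requested in global_accounts:
        return False  # the requested name has a global account
    # The requested name must not exist locally on any wiki either.
    return all(requested not in names for names in local_accounts.values())
```

With `global_accounts = {"Alice"}` and a local-only "Bob" on one wiki, renaming "Alice" to "Carol" passes the check, while renaming "Alice" to "Bob" does not, because the name is taken locally.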
At present, no date has been chosen for the finalisation. The team plans to have the necessary engineering work done by the end of the quarter (end of September 2014), and have a date chosen by then.
Next month the team plans to continue work on these features.
Security auditing and response
MediaWiki 1.23.2 was released, fixing 3 security bugs. Security reviews were made for BounceHandler and Petition extensions, and the password API was merged.
This month, the Release and QA Team became the Release Engineering Team, mostly reflecting the transition from a team made up of members of other distinct teams to a coherent, (mostly) self-contained team. This will, hopefully, allow better coordination of “Release” and “QA” things (broadly speaking).
A lot of progress was made on making Phabricator suitable as a task/bug tracking system for Wikimedia projects. You can see the work to be sorted and completed at this workboard.
The Beta Cluster now runs with HHVM, bringing us much closer to full HHVM deployment. In addition, the Language Team deployed the new Content translation system on the Beta Cluster with the help of the Release Engineering team.
The second round of public RFP for third-party MediaWiki release management was conducted and concluded.
We now no longer use the third-party Cloudbees service for any of our Jenkins jobs and run all jobs locally. This will enable us to better diagnose issues with our build process, especially as it pertains to our browser tests (which still mostly run on SauceLabs).
This month, the QA team finished two significant achievements: after porting all the remaining browser tests from the browsertests repository to the repositories of the extensions being tested in June, as well as porting a significant set of tests to MediaWiki core itself, we completely retired the Jenkins instance running on a third-party host in favor of running test builds from the Wikimedia Jenkins instance, and we deleted the /qa/browsertests code repository. These moves are the result of more than two years of work. In addition, we have added more functions to the API wrapper used by browser tests, improved support for testing in Vagrant virtual machines, added new Jenkins builds for extensions, and improved the function of the beta labs test environments by preventing database locks and stopping users from being logged out by accident.
The browser tests are now all integrated with builds on the Wikimedia Jenkins host. We added browser tests for MediaWiki core that will validate the correctness of a MediaWiki installation regardless of language, or of what extensions may or may not exist on the wiki, so that the tests may be packaged with the distribution of MediaWiki itself and used on arbitrary wikis. We saw a lot of browser test activity for Flow development, and we are preparing to support even more extensions and features in the very near future.
Media Viewer’s new ‘minimal design’.
In July, the multimedia team reviewed more feedback about Media Viewer, from three separate Requests for Comments on the English and German Wikipedias, as well as on Wikimedia Commons. Based on this community feedback, the team worked to make the tool more useful for readers, while addressing editor concerns. We are now considering a new ‘minimal design’, which would include: a much more visible link to the File: page; an even easier way to disable the tool; a caption or description right below the image; removing additional metadata below the image, directing users to the File: page instead.
As described in our improvements plan, these new features are being prototyped and will be carefully tested with target users in August, so we can validate their effectiveness before developing and deploying them in September. You can see some of our thinking in this presentation.
This month, we continued to work on the Structured Data project with the Wikidata team and many community members, to implement machine-readable data on Wikimedia Commons. We prepared to host a range of online and in-person discussions to plan this project with our communities, and aim to develop our first experiments in October, based on their recommendations. We also continued a major code refactoring for the UploadWizard, and fixed a number of bugs in some of our other multimedia tools.
Last but not least, we prepared seven different multimedia roundtables and presentations for Wikimania 2014, which we will report on in more depth in August. For now, you can keep up with our work by joining the multimedia mailing list.
Engineering Community Team
At the Pywikibot bugdays, 189 reports received updates. Technically, Jan enabled invalidating the CSS cache and strict transport security, Matanya updated Bugzilla’s cipher_suite and cleaned up a template, and Daniel deleted an unused config file. Tyler and Andre added requested components to Bugzilla. Planning of an exposed “easy bug of the week” continued, summarized on a wikipage.
Phabricator’s “Legalpad” application (a tool to manage trusted users) was set up on a separate server. This instance provides WMF Single-User Login authentication.
Mukunda implemented restricting access to tasks in a certain project which can be tested on fab.wmflabs.org. As a followup, he investigated enforcing security policy also on files and attachments and replacing the IRC bots by Phab’s chatbot. Chase worked on initial migration code to import data from Bugzilla reports into Phabricator tasks (and ran into missing API code in Phabricator), investigated configuring Exim for mail, set up a data backup system for Phabricator, and upgraded the dedicated Phabricator server to Ubuntu Trusty. Quim started documenting Phabricator.
Andre helped make decisions on defining field values and on how to handle certain Bugzilla fields in the import script, and sent a summary email to wikitech-l about the Phabricator migration status.
All Google Summer of Code and FOSS Outreach Program for Women projects continued their development toward a successful end. For details, check the reports:
Tools for mass migration of legacy translated wiki content
Wikidata annotation tool
Email bounce handling to MediaWiki with VERP
Google Books, Internet Archive, Commons upload cycle
UniversalLanguageSelector fonts for Chinese wikis
MassMessage page input list improvements
Book management in Wikibooks/Wikisource
Parsoid-based online-detection of broken wikitext
Usability improvements for the Translate extension
A modern, scalable and attractive skin for MediaWiki
Automatic cross-language screenshots for user documentation
Separating skins from core MediaWiki
Chemical Markup support for Wikimedia Commons
Improving URL citations on Wikimedia
Welcoming new contributors to Wikimedia Labs and Tool Labs
Evaluating, documenting, and improving MediaWiki web API client libraries
Feed the Gnomes – Wikidata Outreach
Template Matching for RDFIO
Switching Semantic Forms Autocompletion to Select2
Catalogue for Mediawiki Extensions
Generic, efficient localisation update service.
Chart showing historical Flesch reading ease data for Tech News, a measure of the newsletter’s readability. Higher scores indicate material that is easier to read. A score of 60–70 corresponds to content easily understood by 13- to 15-year-old students.
Guillaume Paumier collaborated with authors of the Education newsletter to set it up for multilingual delivery, using a script similar to the one used for Tech News. He also wrote a detailed how-to to accompany the script for people who want to send a multilingual message across wikis. In preparation for the Wikimania session about Tech News, he updated the readability and subscribers metrics. He also continued to provide ongoing communications support for the engineering staff, and to prepare and distribute Tech News every week.
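For readers unfamiliar with the readability metric mentioned above, the standard Flesch reading ease formula is 206.835 − 1.015 × (words per sentence) − 84.6 × (syllables per word). The helper below is a minimal sketch that applies it to raw counts; how the Tech News metrics are actually computed (in particular, how syllables are counted) is not described here, so the function name and the count-based interface are assumptions.

```python
def flesch_reading_ease(sentences: int, words: int, syllables: int) -> float:
    """Standard Flesch reading ease formula computed from raw counts."""
    if sentences <= 0 or words <= 0:
        raise ValueError("sentences and words must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
```

Shorter sentences and shorter words both raise the score, which is why a higher value indicates easier reading.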
Volunteer coordination and outreach
We focused on the preparation of the Wikimania Hackathon, encouraging all registered participants to propose topics and sign up for sessions that interest them. We also organized a Q&A session with potential organizers of the Wikimedia Hackathon 2015. We organized two Tech Talks: “Hadoop and Beyond. An overview of Analytics infrastructure” and “HHVM in production: what that means for Wikimedia developers”. More activities hosted in July can be found at Project:Calendar/2014/07.
Architecture and Requests for comment process
Developers finished the security architecture guidelines, and discussed several requests for comment in online architecture meetings:
2014-07-10 — Frontend standardization discussion focusing on Requests for comment/Redo skin framework;
2014-07-16 — RfC discussion focusing on Requests for comment/Vertical writing support;
2014-07-23 — RfC discussion focusing on Requests for comment/Composer managed libraries for use on WMF cluster, in which the architecture committee approved the RfC;
2014-07-30 — RfC discussion focusing on Requests for comment/CentralNotice Caching Overhaul – Frontend Proxy.
In July, Quim Gil sorted the tasks necessary for the first hub prototype into a Phabricator board, and Sumana Harihareswara determined which three APIs she would document first.
Wikimetrics can now generate vital sign metrics for every project daily. The Rolling Monthly Active Editor metric has been implemented; the reports are in JSON format, downloadable from a logical path on a file server. The team also worked on backfilling data for the daily reports on Newly Registered and Rolling Active Editor, and on numerous optimizations to backfill the data quickly.
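To make the idea of a rolling metric with a daily JSON report concrete, here is a minimal sketch. It is not Wikimetrics code: the function names and the report layout are hypothetical, and the 30-day window and 5-edit threshold are assumptions used for illustration, since the report does not spell out the metric's definition.

```python
import json
from collections import Counter
from datetime import date, timedelta


def rolling_active_editors(edits, as_of: date, window_days: int = 30,
                           min_edits: int = 5) -> int:
    """Count users with at least min_edits edits in the trailing window.

    edits: iterable of (user, edit_date) pairs.
    The 30-day window and 5-edit threshold are assumptions for illustration.
    """
    start = as_of - timedelta(days=window_days)
    counts = Counter(user for user, day in edits if start < day <= as_of)
    return sum(1 for n in counts.values() if n >= min_edits)


def daily_report(project: str, edits, as_of: date) -> str:
    """Serialize one day's value as JSON (hypothetical report layout)."""
    return json.dumps({
        "project": project,
        "date": as_of.isoformat(),
        "rolling_active_editor": rolling_active_editors(edits, as_of),
    })
```

Because the window is recomputed for each day, backfilling amounts to calling `daily_report` once per past date, which is where optimizations to the underlying queries matter.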
New nodes were added to the cluster this month and all machines were upgraded to run CDH5. The team decided not to preserve any data on the cluster during the upgrade and started fresh. The team hosted a Tech Talk on our Hadoop installation (see video and slides). Duplicate monitoring has also been implemented in Hadoop to monitor the incoming Varnish logs.
Editor Engagement Vital Signs
The culmination of our efforts this month can be visualized in a prototype built for Wikimania. This was made possible thanks to many back-end enhancements (optimizations) to Wikimetrics, along with research and selection of the optimal technologies to implement the stack to display a dashboard.
EventLogging monitoring is now in graphite, and we can see which schemas cause spikes in traffic (example).
Research and Data
This month, we completed the documentation for the Active Editor Model, a set of metrics for observing sub-population trends and setting product team goals. We also engaged in further work on the new pageviews definition. An interim solution for Limited-duration Unique Client Identifiers (LUCIDs) was also developed and passed to the Analytics Engineering team for review.
We analyzed trends in mobile readership and contributions, with a particular focus on the tablet switchover and the release of the native Android app. We found that in the first half of 2014, mobile surpassed desktop in the rate at which new registered users become first-time editors and first-time active editors in many major projects, including the English Wikipedia. An update on mobile trends will be presented at the upcoming Monthly Metrics meeting on July 31.
Development of a standardised toolkit for geolocation, user agent parsing and accessing pageviews data was completed.
We supported the multimedia team in developing a research study to objectively measure the preferences of Wikipedia editors and readers.
We hosted the July research showcase with a presentation by