On the first day of Mozlandia, Johnny Stenback and Doug Turner presented a list of key accomplishments in Platform Engineering/Engineering Operations in 2014.
I have been told a few times recently that people don’t know what my teams do, so in the interest of addressing that, I thought I’d share our part of the list. It was a pretty damn good year for us, all things considered, and especially given the level of organizational churn and other distractions.
We had a bit of organizational churn ourselves. I started the year managing Web Engineering, and between March and September ended up also managing the Release Engineering teams, Release Operations, SUMO and Input Development, and Developer Services. It’s been a challenging but very productive year.
Here’s the list of what we got done.
Web Engineering
Migrate crash-stats storage off HBase and into S3
Launch Crash-stats “hacker” API (access to search, raw data, reports)
Ship fully-localized Firefox Health Report on Android
Many new crash-stats reports including GC-related crashes, JS crashes, graphics adapter summary, and modern correlation reports
Crash-stats reporting for B2G
Pluggable processing architecture for crash-stats, and alternate crash classifiers
Symbol upload system for partners
Migrate l10n.mozilla.org to modern, flexible backend
Prototype services for checking health of the browser and a support API
Solve scaling problems in MozTrap to reduce pain for QA
New admin UI for Balrog (new update server)
Bouncer: correctness testing, continuous integration, a staging environment, and multi-homing for high availability
Grew Air Mozilla community contributions from 0 to 6 non-staff committers
Many new features for Air Mozilla, including: direct download for offline viewing of public events, tear-out video player, WebRTC self-publishing prototype, Roku channel, multi-rate HLS streams for automatic switching to the optimal bitrate, search over transcripts, integration with Mozilla Popcorn functionality, and access control based on Mozillians groups (e.g. “nda”)
DXR
Modeless, explorable UI with all-new JS
Case-insensitive searching
Proof-of-concept Rust analysis
Improved C++ analysis, with lots of new search types
Multi-tree support
Multi-line selection (linkable!)
HTTP API for search
Line-based searching
Multi-language support (Python already implemented, Rust and JS in progress)
Elasticsearch backend, bringing speed and features
Completely new plugin API, enabling binary file support and request-time analysis
SUMO
Offline SUMO app in Marketplace
SUMO Community Hub
Improved SUMO search with Synonyms
Instant search for SUMO
Redesigned and improved SUMO support forums
Improved support for more products in SUMO (Thunderbird, Webmaker, Open Badges, etc.)
BuddyUp app (live support for Firefox OS) (in progress, TBC Q1 2015)
Input
“Dashboards for everyone” infrastructure, allowing anyone to build charts and dashboards using Input data
Backend for heartbeat v1 and v2
Overhauled the feedback form to support multiple products, streamline the user experience, and prepare for future changes
Support for Loop/Hello, Firefox Developer Edition, Firefox 64-bit for Windows
Infrastructure for automated machine and human translations
Massive infrastructure overhaul to improve overall quality
Release Engineering
Cut AWS costs by over 70% during 2014 by switching builds to spot instances and using intelligent bidding algorithms
Migrated all hardware out of SCL1 and closed the datacenter to save $1 million per year (with Relops)
Optimized network transfers for build/test automation between datacenters, decreasing bandwidth usage by 50%
Halved build time on b2g-inbound
Parallelized verification steps in release automation, cutting over an hour from the end-to-end time required for each release
Decommissioned legacy systems (e.g. tegras, tinderbox) (with Relops)
Enabled build slave reboots via API
Self-serve arbitrary builds via API
b2g FOTA updates
Builds for open H.264
Built flexible new update service (Balrog) to replace legacy system (will ship first week of January)
Support for Windows 64 as a first-class platform
Supported FX10 builds and releases
Release support for switch to Yahoo! search
Update server support for OpenH264 plugins and Adobe’s CDM
Implement signing of EME sandbox
Per-checkin and nightly Flame builds
Moved desktop Firefox builds to mach+mozharness, improving reproducibility and hackability for devs
Helped mobile team ship different APKs targeted by device capabilities rather than a single, monolithic APK
Release Operations
Decreased operating costs by $1 million per year by consolidating infrastructure from one datacenter into another (with Releng)
Decreased operating costs and improved reliability by decommissioning legacy systems (kvm, redis, r3 mac minis, tegras) (with Releng)
Decreased operating costs for physical Android test infrastructure through a 30% reduction in hardware
Decreased MTTR by developing a simplified releng self-serve reimaging process for each supported build and test hardware platform
Increased security for all releng infrastructure
Increased stability and reliability by consolidating single-point-of-failure releng web tools onto a highly available cluster
Increased network reliability by developing a tool for continuous validation of firewall flows
Increased developer productivity by updating Windows platform developer tools
Increased fault and anomaly detection by auditing and augmenting releng monitoring and metrics gathering
Simplified the build/test architecture by creating a unified releng API service for new tools
Developed a disaster recovery and business continuity plan for 2015 (with Releng)
Researched bare-metal private cloud deployment and produced a POC
Developer Services
Ship MozReview, a new review architecture integrated with Bugzilla (with A-team)
Massive improvements in hg stability and performance
Analytics and dashboards for version control systems
New architecture for try to make it stable and fast
Deployed Treeherder (TBPL replacement) to production
Assisted A-team with Bugzilla performance improvements
I’d like to thank the team for their hard work. You are amazing, and I look forward to working with you next year.
At the start of 2015, I’ll share our vision for the coming year. Watch this space!