2014-12-20

On the first day of Mozlandia, Johnny Stenback and Doug Turner presented a list of key accomplishments in Platform Engineering/Engineering Operations in 2014.

I have been told a few times recently that people don’t know what my teams do, so in the interest of addressing that, I thought I’d share our part of the list. It was a pretty damn good year for us, all things considered, and especially given the level of organizational churn and other distractions.

We had a bit of organizational churn ourselves. I started the year managing Web Engineering, and between March and September ended up also managing the Release Engineering teams, Release Operations, SUMO and Input Development, and Developer Services. It’s been a challenging but very productive year.

Here’s the list of what we got done.

Web Engineering

Migrate crash-stats storage off HBase and into S3

Launch Crash-stats “hacker” API (access to search, raw data, reports)

Ship fully-localized Firefox Health Report on Android

Many new crash-stats reports including GC-related crashes, JS crashes, graphics adapter summary, and modern correlation reports

Crash-stats reporting for B2G

Pluggable processing architecture for crash-stats, and alternate crash classifiers

Symbol upload system for partners

Migrate l10n.mozilla.org to modern, flexible backend

Prototype services for checking health of the browser and a support API

Solve scaling problems in MozTrap to reduce pain for QA

New admin UI for Balrog (new update server)

Bouncer: correctness testing, continuous integration, a staging environment, and multi-homing for high availability

Grew Air Mozilla community contributions from 0 to 6 non-staff committers

Many new features for Air Mozilla including: direct download for offline viewing of public events, tear out video player, WebRTC self publishing prototype, Roku Channel, multi-rate HLS streams for auto switching to optimal bitrate, search over transcripts, integration with Mozilla Popcorn functionality, and access control based on Mozillians groups (e.g. “nda”)

DXR

Modeless, explorable UI with all-new JS

Case-insensitive searching

Proof-of-concept Rust analysis

Improved C++ analysis, with lots of new search types

Multi-tree support

Multi-line selection (linkable!)

HTTP API for search

Line-based searching

Multi-language support (Python already implemented, Rust and JS in progress)

Elasticsearch backend, bringing speed and features

Completely new plugin API, enabling binary file support and request-time analysis

SUMO

Offline SUMO app in Marketplace

SUMO Community Hub

Improved SUMO search with synonyms

Instant search for SUMO

Redesigned and improved SUMO support forums

Improved support for more products in SUMO (Thunderbird, Webmaker, Open Badges, etc.)

BuddyUP app (live support for Firefox OS) (in progress, TBC Q1 2015)

Input

“Dashboards for everyone” infrastructure: allows anyone to build charts and dashboards using Input data

Backend for heartbeat v1 and v2

Overhauled the feedback form to support multiple products, streamline the user experience, and prepare for future changes

Support for Loop/Hello, Firefox Developer Edition, Firefox 64-bit for Windows

Infrastructure for automated machine and human translations

Massive infrastructure overhaul to improve overall quality

Release Engineering

Cut AWS costs by over 70% during 2014 by switching builds to spot instances and using intelligent bidding algorithms

Migrated all hardware out of SCL1 and closed the datacenter to save $1 million per year (with Relops)

Optimized network transfers for build/test automation between datacenters, decreasing bandwidth usage by 50%

Halved build time on b2g-inbound

Parallelized verification steps in release automation, cutting over an hour from the end-to-end time required for each release

Decommissioned legacy systems (e.g. tegras, tinderbox) (with Relops)

Enabled build slave reboots via API

Self-serve arbitrary builds via API

b2g FOTA updates

Builds for open H.264

Built flexible new update service (Balrog) to replace legacy system (will ship first week of January)

Support for Windows 64 as a first class platform

Supported FX10 builds and releases

Release support for switch to Yahoo! search

Update server support for OpenH264 plugins and Adobe’s CDM

Implement signing of EME sandbox

Per-checkin and nightly Flame builds

Moved desktop Firefox builds to mach+mozharness, improving reproducibility and hackability for devs

Helped the mobile team ship different APKs targeted by device capabilities rather than a single, monolithic APK

Release Operations

Decreased operating costs by $1 million per year by consolidating infrastructure from one datacenter into another (with Releng)

Decreased operating costs and improved reliability by decommissioning legacy systems (kvm, redis, r3 mac minis, tegras) (with Releng)

Decreased operating costs for physical Android test infrastructure through a 30% reduction in hardware

Decreased MTTR by developing a simplified releng self-serve reimaging process for each supported build and test hardware platform

Increased security for all releng infrastructure

Increased stability and reliability by consolidating single-point-of-failure releng web tools onto a highly available cluster

Increased network reliability by developing a tool for continuous validation of firewall flows

Increased developer productivity by updating Windows platform developer tools

Increased fault and anomaly detection by auditing and augmenting releng monitoring and metrics gathering

Simplified the build/test architecture by creating a unified releng API service for new tools

Developed a disaster recovery and business continuity plan for 2015 (with Releng)

Researched bare-metal private cloud deployment and produced a POC

Developer Services

Ship MozReview, a new review architecture integrated with Bugzilla (with A-team)

Massive improvements in hg stability and performance

Analytics and dashboards for version control systems

New architecture for try to make it stable and fast

Deployed Treeherder (TBPL replacement) to production

Assisted A-team with Bugzilla performance improvements

I’d like to thank the team for their hard work. You are amazing, and I look forward to working with you next year.

At the start of 2015, I’ll share our vision for the coming year. Watch this space!
