2014-01-30

… while neither Amazon nor Google publicizes its server designs yet

Designing Cloud Infrastructure for 1m+ Server Scale [“cloud scale”] – Kushagra Vaid (General Manager, Cloud Server Engineering, Microsoft) [Open Compute Project YouTube channel, Jan 29, 2014]

OCP Summit V – January 28, 2014 – San Jose Convention Center, San Jose, California. Designing Cloud Infrastructure for 1m+ Server Scale [“cloud scale”] – Kushagra Vaid, General Manager, Cloud Server Engineering, Microsoft. For more information about OCP go to: http://www.opencompute.org/

“The chassis can take 24 servers”

Microsoft Contributes Cloud Server Specification to Open Compute Project [Microsoft Data Centers Blog, Jan 27, 2014]

Today Microsoft announced that it will be joining the Open Compute Project Foundation (OCP) and will be contributing hardware specifications, design collateral (CAD and Gerber files), and system management source code for Microsoft’s cloud server designs. These specifications apply to the server fleet being deployed for Microsoft’s largest global cloud services, including Windows Azure, Bing, and Office 365. This significant contribution demonstrates our continued commitment to sharing our key learnings and experiences from more than 19 years of operating online services with the industry.

Microsoft manages a global portfolio of datacenters across all continents, has an installed base of over one million servers, and delivers more than 200 services for 1+ billion customers and 20+ million businesses in 90+ markets. Deploying and operating a huge cloud-scale [Cloud-Scale Data Centers, Feb 11, 2013; see also Microsoft Cloud-Scale Data Center designs [Microsoft Data Centers Blog, March 26, 2013]] infrastructure requires careful attention to several system design principles:

Simplicity of the design is essential, since at cloud-scale the smallest issues can get magnified and potentially cause unexpected availability issues for customers.

Efficiency gains across cost, power, and performance vectors are required to deliver the lowest total cost of ownership (TCO).

Modular system design provides flexibility to accommodate hardware changes necessary for evolving workload requirements, plus it helps streamline the integration of new technologies.

Supply chain agility is essential for adapting to rapid variations in server capacity demand signals.

Ease of operations is key to ensuring system management at scale and cost effective servicing for hardware failures in the datacenter.

Environmental sustainability is an important part of our cloud strategy. This includes minimizing material use and ensuring re-use of components wherever possible across the server lifecycle.

Based on these guiding principles, Microsoft has designed an innovative system architecture that we believe will drive design and operational efficiency beyond the conventional commodity servers currently available in the market. The key design features include:

Chassis-based shared design for cost and power efficiency

EIA rack mountable 12U Chassis leverages existing industry standards

Modular design for simplified solution assembly: mountable sidewalls, 1U trays, high efficiency commodity power supplies, large fans for efficient air movement, management card

Up to 24 commodity servers per chassis (two servers side-by-side), option for JBOD storage expansion

Optimized for mass contract manufacturing

Up to 40% cost savings and 15% power efficiency benefits vs. traditional enterprise servers

Estimated to save 10,000 tons of metal per one million servers manufactured

Blind-mated signal connectivity for servers

Decoupled architecture for server node and chassis enabling simplified installation and repair

Cable-free design results in significantly fewer operator errors during servicing

Reduction of ‘No problem found’ incidents from loose cables

Up to 50% improvement in deployment and service times

Network and storage cabling via backplane architecture

Passive PCB backplane for simplicity and signal integrity risk reduction

Architectural flexibility for multiple network types such as 10GbE/40GbE, copper/optical

One-time cable install during chassis assembly at factory

No cable touch required during production operations and on-site support

Expected to save 1,100 miles of cable for a deployment of one million servers

Secure and scalable systems management

X86 SoC-based management card per chassis

Multiple layers of security for hardware operations: TPM secure boot, SSL transport for commands, role-based authentication via Active Directory domain

REST API and CLI interfaces for scalable systems management

Support for server diagnostics and self-health checks

Up to 75% improvement in operational agility vs. traditional enterprise servers
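To put the headline numbers in the list above into per-unit terms, here is a minimal back-of-the-envelope sketch. The chassis height, tray height, servers per tray, and fleet-wide savings are taken from the list; the unit conversions (metric tons, statute miles) and the assumption that every rack unit of the chassis holds a tray are mine, not Microsoft's, and all function and constant names are purely illustrative.

```python
# Figures quoted in the design-feature list above.
CHASSIS_HEIGHT_U = 12        # EIA rack-mountable 12U chassis
TRAY_HEIGHT_U = 1            # 1U trays
SERVERS_PER_TRAY = 2         # two servers side-by-side
METAL_SAVED_TONS = 10_000    # per one million servers manufactured
CABLE_SAVED_MILES = 1_100    # per one million servers deployed
FLEET_SIZE = 1_000_000

# Assumptions, NOT stated in the source.
KG_PER_TON = 1_000.0         # assumes metric tons
METERS_PER_MILE = 1_609.344  # assumes statute miles


def servers_per_chassis(chassis_u=CHASSIS_HEIGHT_U,
                        tray_u=TRAY_HEIGHT_U,
                        per_tray=SERVERS_PER_TRAY):
    """Maximum servers, assuming every rack unit of the chassis holds a tray."""
    return (chassis_u // tray_u) * per_tray


def per_server(total, fleet=FLEET_SIZE):
    """Spread a fleet-wide total evenly across the fleet."""
    return total / fleet


metal_kg_per_server = per_server(METAL_SAVED_TONS * KG_PER_TON)
cable_m_per_server = per_server(CABLE_SAVED_MILES * METERS_PER_MILE)

if __name__ == "__main__":
    print(f"servers per chassis:    {servers_per_chassis()}")
    print(f"metal saved per server: {metal_kg_per_server:.1f} kg")
    print(f"cable saved per server: {cable_m_per_server:.2f} m")
```

Under those assumptions the 24-server figure falls straight out of 12 trays times two servers, and the fleet-wide savings work out to roughly 10 kg of metal and a little under 2 m of cable per server.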

The Microsoft cloud server is a revolutionary design that brings the benefits of commoditization and cloud-scale operations to the industry. The specifications we’re contributing to OCP embody our long history and deep experience in datacenter architecture and cloud computing, and our commitment to sharing our cloud infrastructure best practices with the industry since 2007. As part of joining OCP, Microsoft will be making the following contributions for our Microsoft cloud server design and manufacturing collateral:

Hardware specifications

Server, mezzanine card, tray, chassis, and management card

Management APIs and protocols (for chassis and server)

Mechanical CAD models

Chassis, server, chassis manager, and mezzanines

Gerber files

Management card, power distribution board, and tray backplane

Source code for Chassis infrastructure

Server management, fan and power supply control, diagnostics and repair policy

Microsoft will also be engaging in the OCP community via active participation in the various sub-committees and engineering forums. I am pleased to announce that Mark Shaw, Director of Hardware Development on my team, has been appointed as the Chair of the Server committee via the OCP community voting process. Additionally, MS Open Tech is releasing an open source implementation of the Chassis Manager specification [“As part of this effort, MS Open Tech is releasing an open source reference implementation of the Chassis Manager specification. Today, this code, is available on GitHub, and implements functions such as server management, and fan and power supply control.”]. We would like to help to build an open source software community for this project within OCP.

Our hardware partners are developing products for Microsoft based on these specifications and we look forward to availability of commercial offerings from our partners in the near future.

We are excited to share our cloud infrastructure learnings and operational experiences with the broader community to help drive the industry efficiencies forward, reduce the cost of hardware for all participants, and accelerate the adoption of cloud computing. You can find more information about the Microsoft cloud server specification via my customer discussion video, our white paper and at www.opencompute.org. 

Compare this with the current (certified) and upcoming (new) Intel boards based on the current OCP specifications: Decathlete for financial services; Windmill, the design Facebook developed with Intel for dense servers; and the upcoming Leopard, the next-generation compute module for Facebook:

But keep in mind Intel’s advanced interest in:  

All from Designing the Datacenter of the Future – Eric Hooper (Director, Cloud Service Provider Optimization, Intel Corporation) [Open Compute Project YouTube channel, Jan 29, 2014] video:

OCP Summit V – January 28, 2014 – San Jose Convention Center, San Jose, California. Designing the Datacenter of the Future – Eric Hooper. To find out more about OCP go to: http://www.opencompute.org/

That video also includes a testimonial from Goldman Sachs, which uses the jointly developed Decathlete design (code named “Swiss Army Knife”).

Filed under: servers Tagged: cloud server, cloud-scale, Cloud-Scale Data Center designs, Decathlete, designs, Microsoft, Open Compute Project, Silicon Photonics, Windmill
