2012-08-01

Internet Explorer 10’s networking code builds upon the performance improvements in IE9 (caching, overall networking) to help ensure that IE10 loads pages as quickly as possible. In IE10, we identified a few key areas for improvement based on customer feedback, code inspection, and telemetry data.

Our efforts fall into two major categories – enhanced caching, and improved connection management.

Caching Enhancements

New cache database

In Windows 8, we made a huge architectural change and moved the browser cache to a true database, obsoleting the index.dat memory-mapped file used over the past nine releases. Moving to a proper database has several obvious advantages: the cache is much more reliable and resilient against corruption, and it performs better on today’s multi-core architectures. The move has also reduced the time taken to purge files from the cache and significantly lowered the lookup cost for cache misses. In addition, the new structure of the cache enables complex queries to support tailored applications in the Windows 8 web platform. This work paved the way for the new HTML5 AppCache feature.

Caching compiled script

JavaScript and CSS are the two primary resource types that can block page load. JavaScript execution often ends up rewriting the markup, forcing new resource downloads. The following “waterfall” chart shows that once the browser finishes compiling script.js, several resource downloads are initiated:



In IE10, we cache the JIT-compiled JavaScript for reuse on subsequent pages. This optimization enables IE to download subsequent resources without a delay for compilation.

Here are the performance results (in milliseconds) recorded during a sequence of 100 runs of two HTML & JavaScript applications:

App      Time taken without JIT caching (ms)   Time taken with JIT caching (ms)   Improvement
News     8234                                  5074                               38%
Stocks   9618                                  6115                               36%
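As a quick check on the Improvement column, the short Python sketch below recomputes the percentages from the two timing columns. It assumes "improvement" means the relative reduction against the time taken without JIT caching; the function name is illustrative only.

```python
# Timings in milliseconds, copied from the table above.
timings_ms = {
    "News": (8234, 5074),
    "Stocks": (9618, 6115),
}


def improvement_percent(without_cache_ms: int, with_cache_ms: int) -> float:
    """Relative reduction in time taken, as a percentage of the uncached time."""
    return (without_cache_ms - with_cache_ms) / without_cache_ms * 100


for app, (before, after) in timings_ms.items():
    print(f"{app}: {improvement_percent(before, after):.1f}%")
```

Under that assumption, the computed values agree with the table once rounded to whole percentages.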

Better scavenging

Using telemetry data from Microsoft employees, we collected data about the types and sizes of files in the browser’s cache, with the hope of tuning the cache size limit if doing so would improve performance.

As an experiment, we removed the cache limit entirely and compared the cache hit ratio against the default cache limit of 250MB. Remarkably, we found that an unlimited cache increased the hit rate by only one percentage point, from 45% to 46%!

Total Requests:                             1244458
   Internet:                                1122943 (90%)
   Intranet:                                 119485 ( 9%)
   localhost:                                    36 ( 0%)
   Unknown:                                    1994
   GET:                                     1146969 (92%)
   POST:                                      63205 ( 5%)
   SSL:                                      174360 (14%)
      Connection: close:                      20279 (11%)
Cache Misses:                                681544 (54%)
   Not in Cache:                             643292 (51%)
   Validation Required, Content Modified:     38252 ( 3%)
Cache Hits:                                  562914 (45%)
   No Validation Required:                   476589 (38%)
   Validation Required, Not Modified (304):   86325 ( 6%)

Observed Scavenger
---
Resources Scavenged:                         193491
   Re-Requested After Removal:                26934
      Possibly Saved With Infinite Cache:     17129 (1% of requests, 8% of scavenged resources)
      Had Already Expired:                     9805

Infinite Cache
---
Infinite Cache Potential Hit Rate:               46% (+1% over observed hit rate)

Based on this data, we didn’t increase the maximum size of the cache. However, our experiments led to other optimizations in the cache logic.
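The "Infinite Cache Potential Hit Rate" line can be reproduced from the counts in the report. The sketch below is an assumption about how the figure was derived: it treats every resource listed as "Possibly Saved With Infinite Cache" as an extra hit. The variable names are illustrative.

```python
# Counts copied from the telemetry report above.
total_requests = 1_244_458
cache_hits = 562_914
possibly_saved = 17_129  # scavenged, re-requested later, and not yet expired

observed_hit_rate = cache_hits / total_requests
potential_hit_rate = (cache_hits + possibly_saved) / total_requests

print(f"Observed hit rate:  {observed_hit_rate:.2%}")
print(f"Potential hit rate: {potential_hit_rate:.2%}")
print(f"Gain: {(potential_hit_rate - observed_hit_rate) * 100:.2f} percentage points")
```

The report appears to truncate percentages to whole numbers rather than round them, which would explain why the potential rate is listed as 46%.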

Note: In Internet Explorer, downloaded files are guaranteed to be kept in the cache for at least 10 minutes. This “protection period” ensures that applications downloading a file are able to consume it. During this period, these files do not count towards the cache quota. Also, large files are temporarily excluded from the cache quota. This ensures that a huge file that was just downloaded does not wipe out the rest of the cache.
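To make the quota rules in the note concrete, here is a minimal Python sketch of quota accounting that skips protected entries. Only the 10-minute protection period comes from the note. The large-file threshold, the length of the temporary exclusion, and all names are hypothetical, and this is not a description of the actual scavenger code.

```python
from dataclasses import dataclass
from typing import Iterable

PROTECTION_PERIOD_S = 10 * 60        # from the note above: 10 minutes
LARGE_FILE_BYTES = 50 * 1024 * 1024  # hypothetical threshold, not from the source
LARGE_FILE_GRACE_S = 60 * 60         # hypothetical length of the "temporary" exclusion


@dataclass
class CacheEntry:
    url: str
    size_bytes: int
    downloaded_at: float  # seconds on any monotonic clock


def counts_toward_quota(entry: CacheEntry, now: float) -> bool:
    """Return True if the entry should be charged against the cache quota."""
    age = now - entry.downloaded_at
    if age < PROTECTION_PERIOD_S:
        return False  # still inside the protection period
    if entry.size_bytes >= LARGE_FILE_BYTES and age < LARGE_FILE_GRACE_S:
        return False  # large file, temporarily excluded
    return True


def quota_usage(entries: Iterable[CacheEntry], now: float) -> int:
    """Total bytes charged against the quota at time `now`."""
    return sum(e.size_bytes for e in entries if counts_toward_quota(e, now))
```

With this accounting, a freshly downloaded file never pushes the cache over its limit by itself, so it cannot trigger the eviction of older entries until its exemption lapses.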

Performance improvements for no-cache and “Do not save encrypted pages to disk”

The no-cache directive of the HTTP Cache-Control header is described in RFC 2616:

If the no-cache directive does not specify a field-name, then a cache MUST NOT use the response to satisfy a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent caching even by caches that have been configured to return stale responses to client requests.

If the no-cache directive does specify one or more field-names, then a cache MAY use the response to satisfy a subsequent request, subject to any other restrictions on caching. However, the specified field-name(s) MUST NOT be sent in the response to a subsequent request without successful revalidation with the origin server. This allows an origin server to prevent the re-use of certain header fields in a response, while still allowing caching of the rest of the response.

IE9 did not cache any response that had the no-cache directive. In IE10, we improved performance by caching such responses and always revalidating them before use. This change matches other browsers’ behavior and significantly improves upon the back/forward caching optimization we made in IE9, because we can now retrieve these resources from the cache.
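The difference between the two behaviors can be illustrated with a small Python model. This is a sketch under stated assumptions, not WinINET code: the HttpCache class, the fetch callback signature, and the fake origin are all hypothetical, ETag-based validation is assumed, and expiry handling is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical origin callback: (url, If-None-Match value or None)
# -> (status, body, etag, response_has_no_cache)
Fetch = Callable[[str, Optional[str]], Tuple[int, bytes, str, bool]]


@dataclass
class Entry:
    body: bytes
    etag: str
    must_revalidate: bool  # True when the response carried no-cache


class HttpCache:
    def __init__(self, fetch: Fetch, store_no_cache: bool) -> None:
        self.fetch = fetch
        self.store_no_cache = store_no_cache  # False models IE9, True models IE10
        self.entries: Dict[str, Entry] = {}

    def get(self, url: str) -> bytes:
        entry = self.entries.get(url)
        if entry is not None and not entry.must_revalidate:
            return entry.body  # expiry checks omitted for brevity
        status, body, etag, no_cache = self.fetch(url, entry.etag if entry else None)
        if status == 304 and entry is not None:
            return entry.body  # validated: reuse the stored body
        if self.store_no_cache or not no_cache:
            self.entries[url] = Entry(body, etag, no_cache)
        return body


def make_origin(body: bytes, etag: str):
    """Fake origin that marks every response no-cache and counts transfers."""
    counts = {"full": 0, "not_modified": 0}

    def fetch(url: str, if_none_match: Optional[str]):
        if if_none_match == etag:
            counts["not_modified"] += 1
            return 304, b"", etag, True
        counts["full"] += 1
        return 200, body, etag, True

    return fetch, counts
```

In this model, requesting the same URL twice with store_no_cache=True costs one full download plus one bodiless 304 response, while store_no_cache=False costs two full downloads.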

In IE9, we didn’t create cache files for HTTPS responses if the “Do not save encrypted pages to disk” option was checked. This led to problems with file downloads and other issues across the browser. In IE10, we create a temporary cache file, which is purged when the browser session ends.

Connection Management Improvements

The logic browsers use to establish and reuse connections to servers has a significant impact on overall performance. Consider the following page:



This trivial page, whose resources are all hosted on the same server, results in the browser opening multiple connections to download:

Top webpage (default.htm) containing the HTML

Script (script.js)

Styles (styles.css)

Banner advertisement (buyme.jpg)

Picture (picoftheday.jpg)

More complex sites require many more resources to display their pages, and it is common for those resources to be spread across many different servers.

The following are some of the reasons why website authors create pages with resources from different hostnames:

Domain Sharding: All modern browsers use connection limits to enforce a maximum number of simultaneous connections to a single host. In an attempt to improve performance, some website developers intentionally spread resources across multiple hostnames. Sharding was useful when the connection limit was too low (e.g. 2-per-host in IE7) but has fallen out of favor now that the limits are higher (6-per-host in IE8+ and most other browsers).

Performance: A website may choose to serve some resources like images from a different “cookieless domain” to reduce request size and improve performance. Similarly, they may serve some resources from a geographically load-balanced Content Delivery Network (CDN) for faster performance.

Advertising: Website authors can subsidize the costs of running a website by displaying ads to their users. These ads are typically hosted in a different domain owned by the advertiser.

Content Aggregation: Content aggregators, sites dedicated to pulling together content about a specific subject from around the web, are becoming increasingly popular.

Optimizing the use of connections leads to faster resource downloads and improved page-load times.

The following sections detail some of IE10’s improvements in this area.

Pre-Resolve/Pre-Connect (Pre2)

As the browser processes the HTML of a top-level page, it must perform the following operations to be able to download the additional resources (dependencies) required to display the page:

DNS resolution: The browser relies on the Domain Name System (DNS) to discover the IP address of the server where a resource is stored. DNS converts the domain part of each dependency’s URL into one or more IP addresses.

TCP/IP connection: Once the browser discovers the IP addresses of a dependency, it will establish a TCP/IP connection to the server where the resource is stored.

(Optional) Secure the Connection: If the target resource is retrieved via HTTPS, the client and server must perform a cryptographic handshake to agree on the security parameters for the connection.

HTTP download: After the browser has established a TCP/IP connection, it issues a Hypertext Transfer Protocol (HTTP) request for the resource.

The following waterfall diagram depicts the network activity for a small subset of the resources needed to load the webpage http://www.msn.com. Each horizontal bar represents the duration of an HTTP request, with yellow representing DNS Resolutions, light yellow representing TCP/IP Connections, blue representing download time and green representing idle time:



As you can see from the above graph, the download of resources needed to display the page is often blocked awaiting DNS Resolution and a TCP/IP Connection. In IE10, we introduced two new features to minimize the time required to download dependencies. These features are called DNS Pre-Resolve and TCP/IP Pre-Connect, referred to collectively as Pre2 in this blog post.

When a user visits a page, IE10 will remember the hostname components of the URLs of each of the dependencies needed to load the page. When the user later revisits a page for which IE has stored dependency data, the browser will resolve hostnames and initiate TCP/IP connections for dependencies in parallel with the download of the root page itself. The following waterfall diagram depicts the network activity for the same webpage shown above with Pre2 enabled:

As the browser processes the HTML of the page and determines that it needs to download other dependent resources, those requests are unblocked from waiting for DNS resolution or TCP/IP Connections because the necessary connections have been pre-established.
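The pre-resolve half of this idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DependencyHistory class and the pre_resolve helper are hypothetical names, the history is kept in memory, and the resolver is injectable so the sketch can be exercised without network access. A pre-connect step would follow the same pattern, opening sockets instead of resolving names.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set
from urllib.parse import urlsplit


class DependencyHistory:
    """Remembers which hostnames a page's dependencies were loaded from."""

    def __init__(self) -> None:
        self._hosts_by_page: Dict[str, Set[str]] = {}

    def record(self, page_url: str, dependency_urls: Iterable[str]) -> None:
        hosts = {urlsplit(u).hostname for u in dependency_urls}
        hosts.discard(None)  # drop URLs without a hostname
        self._hosts_by_page[page_url] = hosts

    def hosts_for(self, page_url: str) -> Set[str]:
        return self._hosts_by_page.get(page_url, set())


def resolve(host: str) -> List[str]:
    """Blocking DNS lookup through the system resolver."""
    infos = socket.getaddrinfo(host, 80, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def pre_resolve(
    history: DependencyHistory,
    page_url: str,
    resolver: Callable[[str], List[str]] = resolve,
) -> Dict[str, List[str]]:
    """Resolve all remembered dependency hostnames for a page in parallel."""
    hosts = sorted(history.hosts_for(page_url))
    if not hosts:
        return {}
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        return dict(zip(hosts, pool.map(resolver, hosts)))
```

In a browser, this work would be kicked off alongside the request for the root page rather than waited on; the sketch blocks until all lookups finish only to keep the example short.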

In order to measure the impact of the Pre2 work, we set up a test environment in one of our labs. To eliminate network variability, we copied 25 representative web pages locally to a web server behind a hardware network emulator. For this particular experiment, we set the latency of the network emulator to 39ms, which is commonly seen in the wild.

We were pleased to see that on average, the overall time required to load pages was reduced by 9.28%. The following are the full results of our experiment.

Site#   Page load time improvement   Site#   Page load time improvement
1       5.31%                        14      3.78%
2       14.02%                       15      17.80%
3       6.46%                        16      6.59%
4       11.39%                       17      12.62%
5       9.20%                        18      8.30%
6       7.27%                        19      6.30%
7       -0.64%                       20      3.78%
8       0.91%                        21      8.42%
9       15.81%                       22      6.29%
10      3.11%                        23      6.76%
11      21.90%                       24      7.57%
12      10.49%                       25      22.93%
13      15.62%

 

 

These results validate that IE10’s Pre2 feature provides meaningful performance gains. We also expect that users on higher-latency networks (e.g., 3G and Wi-Fi) will experience even bigger improvements.
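The 9.28% average quoted earlier can be checked against the per-site figures. The sketch below assumes the average is a simple unweighted mean of the 25 per-site improvements.

```python
# Per-site page load time improvements (percent), sites 1 through 25.
improvements = [
    5.31, 14.02, 6.46, 11.39, 9.20, 7.27, -0.64, 0.91, 15.81, 3.11,
    21.90, 10.49, 15.62, 3.78, 17.80, 6.59, 12.62, 8.30, 6.30, 3.78,
    8.42, 6.29, 6.76, 7.57, 22.93,
]

mean_improvement = sum(improvements) / len(improvements)
regressions = [i + 1 for i, value in enumerate(improvements) if value < 0]

print(f"Mean improvement: {mean_improvement:.2f}%")
print(f"Sites that got slower: {regressions}")
```

The unweighted mean matches the quoted average, and only one site in the set (site 7) loaded slightly slower with Pre2 enabled.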

These improvements also apply to HTTPS connections, where the connection cost is even higher than that of unencrypted HTTP.

SSL False Start

In December of 2010, Google published a draft proposal for a feature called TLS false start. False Start defines an abbreviated TLS handshake that reduces latency by saving one round-trip:

In IE10, we implemented False Start as described in the IETF draft. Here are the performance improvements we saw for 20 representative sites (times shown represent the cost of 1 roundtrip):

 

         Ethernet   Wi-Fi   Mobile broadband
Best     0.09s      0.57s   2.16s
Avg.     0.03s      0.24s   1.02s
Worst    0.01s      0.11s   0.38s

With our implementation, we see a lower compatibility impact than Google reported for their browser when the feature was enabled. Internet Explorer does not send application data in the same TCP packet as the TLS Client Finished message; instead it is sent in a new TCP packet. The following data describes the compatibility impact of both implementations:

Google’s implementation: 423 out of 2000 sites hang when sending application data in the same TCP packet as the TLS Client Finished message.

Our implementation: 23 out of 2000 sites hang when sending application data in a separate TCP packet.

To minimize compatibility issues, we added the following to our TLS False Start feature:

A blocklist that is populated with the sites known to hang. Currently the list has the 23 sites mentioned above.

Logic to detect if a site hangs for longer than a given threshold, which is currently 4 seconds. If this happens, the next time the user navigates to the site, we will not attempt TLS False Start. This information is persisted only in memory, so that a site can benefit from the improvement once its operators update their servers to be standards-compliant.
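The two safeguards above amount to a small decision policy, sketched here in Python. The 4-second threshold comes from the text; the class name, method names, and hostnames are hypothetical, and how a hang is actually detected is outside the scope of this sketch.

```python
from typing import Iterable, Set

HANG_THRESHOLD_S = 4.0  # threshold quoted in the text


class FalseStartPolicy:
    """Decides whether to attempt TLS False Start for a given host."""

    def __init__(self, blocklist: Iterable[str]) -> None:
        self._blocklist = frozenset(blocklist)   # sites known to hang
        self._hung_this_session: Set[str] = set()  # kept in memory only

    def should_attempt(self, host: str) -> bool:
        return host not in self._blocklist and host not in self._hung_this_session

    def record_handshake(self, host: str, seconds_waited: float) -> None:
        """Call after a False Start attempt with how long the server took to respond."""
        if seconds_waited > HANG_THRESHOLD_S:
            self._hung_this_session.add(host)


policy = FalseStartPolicy(blocklist={"known-bad.example"})
policy.record_handshake("slow.example", seconds_waited=5.2)
print(policy.should_attempt("known-bad.example"))  # on the blocklist
print(policy.should_attempt("slow.example"))       # hung earlier in this session
print(policy.should_attempt("fine.example"))       # no known problems
```

Because the set of hung hosts is never written to disk, a fresh policy object (standing in for a new browser session) retries False Start against a host that hung earlier.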

Connection selection reordering

TCP Slow Start is a congestion control algorithm defined in RFC 2581 and RFC 3390. TCP Slow Start is used to control the amount of outstanding data in a TCP connection between two hosts. The amount of unacknowledged data starts with a low limit which increases as data is transmitted between the hosts. The higher the connection throughput, the fewer packets and roundtrips it takes to transfer the same amount of data. If a connection remains open but idle (in the connection keep-alive pool, but no data is transferred between the hosts) its limit will gradually decline.

IE9 reused connections from the keep-alive pool using First In, First Out (FIFO, or “queue”) ordering. This resulted in suboptimal performance, since the connections most recently added to the keep-alive pool have the highest throughput. In IE10, WinINET was changed to use Last In, First Out (LIFO, or “stack”) ordering to select a connection:

This optimization ensures that resources are downloaded using the connection with the highest available throughput.
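The change in ordering is easy to see in a toy keep-alive pool. The Python sketch below is illustrative only: the KeepAlivePool class is hypothetical, and connections are represented by plain strings rather than sockets.

```python
from collections import deque
from typing import Deque, Optional


class KeepAlivePool:
    """Idle connections to one host, ordered by when they became idle."""

    def __init__(self, lifo: bool) -> None:
        self._idle: Deque[str] = deque()
        self._lifo = lifo  # True models IE10 ("stack"), False models IE9 ("queue")

    def release(self, connection: str) -> None:
        """Return a connection to the pool once its response has been read."""
        self._idle.append(connection)

    def acquire(self) -> Optional[str]:
        """Pick an idle connection for the next request, or None if the pool is empty."""
        if not self._idle:
            return None
        return self._idle.pop() if self._lifo else self._idle.popleft()


for lifo in (False, True):
    pool = KeepAlivePool(lifo=lifo)
    for name in ("idle-longest", "idle-medium", "idle-shortest"):
        pool.release(name)
    print("LIFO" if lifo else "FIFO", "picks", pool.acquire())
```

With FIFO ordering, the next request goes to the connection that has been idle the longest, whose congestion window has had the most time to decay. With LIFO ordering, it goes to the one that was active most recently.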

First available connection

If multiple resources are stored on the same target host, browsers will open multiple parallel connections to the host, up to the per-host connection limit (6). As mentioned in the previous section, connection throughput increases as data is transferred between hosts and the throughput starts to decrease when a connection becomes idle. Therefore, there is a very high probability that a recently used connection will have higher throughput than a newly established one.

Internet Explorer 9 introduced a performance improvement whereby it would always open a second “background” connection when navigating to a page (since virtually all real-world sites benefit from loading over at least two connections). But this improvement was limited in scope, and IE10 improves performance further by assigning requests to connections more flexibly.

In IE9 and lower, if an existing connection became available after the browser had started to establish a new connection for a subsequent request, IE would always wait for and subsequently use the newly-established connection (behavior sometimes called “early connection binding”). Beyond the unnecessary delay in the establishment of the new connection, that new connection would go through TCP Slow Start, meaning that the browser had spent extra time waiting to use a slower connection!

In IE10, the browser always uses the first available connection (behavior sometimes called “late connection binding”). That means the browser will reuse an existing connection as soon as it becomes available, or a newly-established connection if none of the existing connections become free.
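The effect of the two binding strategies on when a request can start is shown in the Python sketch below. The function names and the timings are hypothetical, chosen only to illustrate the case described above. The sketch models waiting time only and ignores the additional throughput penalty of TCP Slow Start on the new connection.

```python
from typing import Optional


def start_time_early_binding(new_ready_ms: float, existing_free_ms: Optional[float]) -> float:
    """IE9-style: the request stays bound to the connection being established."""
    return new_ready_ms


def start_time_late_binding(new_ready_ms: float, existing_free_ms: Optional[float]) -> float:
    """IE10-style: the request goes out on whichever connection is usable first."""
    if existing_free_ms is None:  # no existing connection frees up
        return new_ready_ms
    return min(new_ready_ms, existing_free_ms)


# Hypothetical timings: the new connection needs 120 ms to establish,
# while an existing connection finishes its current download after 30 ms.
print(start_time_early_binding(120.0, 30.0))
print(start_time_late_binding(120.0, 30.0))
```

When no existing connection becomes free, both strategies behave identically, so late binding can only shorten the wait in this model.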

DNS cache entries increased

For every new domain that a page accesses, IE spends time in DNS resolution. To optimize performance, IE maintains its own DNS cache (in addition to the system’s DNS cache). Most modern webpages contain content from several domains. In Windows 8, IE’s DNS cache grows from 32 entries to 256 entries, helping minimize the expense of DNS lookups for recently used hostnames.
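A bounded hostname cache of this kind might look like the Python sketch below. The 32 and 256 entry limits come from the text. The least-recently-used eviction policy is an assumption made for illustration, since the post does not say how entries are evicted, and entry lifetimes (TTLs) are not modeled. The DnsCache class is a hypothetical name.

```python
from collections import OrderedDict
from typing import List, Optional


class DnsCache:
    """Fixed-capacity hostname cache with assumed LRU eviction."""

    def __init__(self, capacity: int = 256) -> None:
        self._capacity = capacity
        self._entries: "OrderedDict[str, List[str]]" = OrderedDict()

    def get(self, host: str) -> Optional[List[str]]:
        addresses = self._entries.get(host)
        if addresses is not None:
            self._entries.move_to_end(host)  # mark as most recently used
        return addresses

    def put(self, host: str, addresses: List[str]) -> None:
        self._entries[host] = addresses
        self._entries.move_to_end(host)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry


# Visit 40 distinct hostnames, then return to the first one.
for capacity in (32, 256):
    cache = DnsCache(capacity)
    for i in range(40):
        cache.put(f"host{i}.example", ["192.0.2.1"])
    print(capacity, "entries:", "hit" if cache.get("host0.example") else "miss")
```

Under these assumptions, a browsing session that touches 40 hostnames has already evicted the first one from a 32-entry cache, while a 256-entry cache still holds it.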

Summary

Taken together, these improvements help ensure that Internet Explorer 10 minimizes page load times for today’s real-world pages.

We hope you enjoy the improvements!
