2016-09-03

TL;DR

See the code samples for all this on GitHub.

Meat & Potatoes

I’m writing a podcast app — I’m calling it ‘sodes — both as a way to let off steam and so that I can have the fussy-casual podcast app I’ve always wanted. Most podcast apps pre-download a full queue of episodes before you listen to them, and offer settings to manage how many episodes are downloaded, how often, and what to do with them when finished. ‘Sodes will be streaming-only. I think managing downloads is an annoying vestigial trait from when iPods synced via iTunes. I only listen to a handful of podcasts, and never from a place that doesn’t have Internet access. I’d rather never futz with toggles and checkmarks or police disk usage.

Most other apps do have optional streaming-only modes which, as far as I know[1], are implemented as follows:

When a typical app streams an episode, the audio data is streamed using AVFoundation or some similar framework. Except for services like Stitcher, the original audio file is streamed in full quality. This streaming may only buffer a portion of the episode, depending on where you start, how far you skip, etc. Listen for long enough and it will eventually buffer the entire episode, byte-range by byte-range.

In parallel to the audio stream, a typical app also downloads the episode file in full and caches it locally. This is so the app can resume your place in the episode more quickly in a future session, or during the current session if your internet connection craps out.

In other words, even though you may be using a streaming-only mode, your app might be downloading the episode twice. It’s a little sneaky, but it’s a perfectly sensible compromise. If the parallel download succeeds it means the current episode won’t need to be re-buffered during a future session. AVFoundation does not persist streaming buffers across app sessions. Since it’s not uncommon for a podcast MP3 to be encoded at ~60 megabytes an hour, resuming playback from a cached file can dramatically reduce data usage over time, especially if it takes several sessions for someone to finish listening to an episode.

I could use that same dual-download pattern with ‘sodes, but I wondered if it would be possible to eliminate the need for a parallel download without also having to re-download the same streaming buffer with every new app session. After some digging, I found an obscure corner of AVFoundation which will allow me to do exactly that. There’s a protocol called AVAssetResourceLoaderDelegate.

It lets your code take the reins for individual buffer requests when streaming audio or video with an AVPlayer. When setting up an AVURLAsset to stream, you can set the asset’s resource loader’s delegate to an instance of a conforming class of your own.

Your custom resource loader delegate is given an opportunity to handle each individual request for a range of bytes from the streamed asset, which means you could load that data from anywhere: from the network if you don’t already have the bytes, or by reading it from a local file if you do.

A proper implementation of AVAssetResourceLoaderDelegate is hard to get right. The actual code you write needn’t be extraordinary. What’s hard is that the documentation is spotty, the protocol method names are misleading, the required URL manipulation is bizarre, and the order of events at run-time isn’t obvious. There are still aspects of it that I don’t fully understand, but what follows is a record of what I’ve learned so far.

Note: there are portions of AVAssetResourceLoaderDelegate that are only applicable to streamed media that requires expiring forms of authentication. Those are outside the scope of this post since I don’t need to use them for streaming a podcast episode.

Basics of a Streaming Session

When you add an AVPlayerItem to an AVPlayer, the player prepares its playback pipeline. If that item’s asset points to a remotely-hosted media file, the player will want to acquire a sufficient buffer of a portion of that file so that playback can continue without stalling. The internal structure of the relationship between AVPlayer, AVPlayerItem, and AVURLAsset is not publicly exposed. But it is clear that AVPlayer fills its buffer with the help of AVURLAsset’s resourceLoader property, an instance of AVAssetResourceLoader. The resource loader is provided by AVFoundation and cannot be changed. The resource loader fulfills the AVPlayer’s requests for both content information about the media as well as requests for specific byte-ranges of the media data.

AVAssetResourceLoaderDelegate

AVAssetResourceLoader has an optional delegate property that must conform to AVAssetResourceLoaderDelegate. If your app provides a delegate for the resource loader, the loader will give its delegate an opportunity to handle all content info requests and data requests for its asset. If the delegate reports back that it can handle a given request, the resource loader relinquishes control of that request and waits for the delegate to signal that the request finished.

For our purposes, there are two delegate methods we need to implement: resourceLoader(_:shouldWaitForLoadingOfRequestedResource:) and resourceLoader(_:didCancel:).

The first method should return true if the receiver can handle the loading request. The method name is confusing at first glance since it’s written from the perspective of the resource loader (“should wait”) as opposed to the delegate (“can handle”), but it makes enough sense. The delegate is returning true if the resource loader should wait for the delegate to signal that the request has been completed. It is from inside this method that the delegate will kick off the asynchronous work needed to satisfy the request.

The second method is called whenever a loading request is cancelled. This is easy enough to reproduce. If you start playback from the beginning of a file, and then scrub far ahead into the timeline, there’s no longer a need to fill up the earlier buffer so the request for that initial range of data will be cancelled in order to spawn a new request starting from the scrubbed-to point.

Both delegate methods will be called on the dispatch queue you provide when setting the resource loader’s delegate with setDelegate(_:queue:).

I recommend that you use something other than the main queue so that loading request work never competes with the UI thread. I also recommend using a serial queue so that you don’t have to juggle concurrent procedures within your delegate.
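As a minimal sketch of that setup (not the code from ’sodes): the class name, queue label, and URL below are placeholders invented for illustration, and the custom URL scheme is explained later in the post.

```swift
import AVFoundation

// Hypothetical delegate class. Every method of AVAssetResourceLoaderDelegate
// is optional, so an empty conforming class is enough to show the wiring.
final class PodcastResourceLoaderDelegate: NSObject, AVAssetResourceLoaderDelegate {}

// DispatchQueue(label:) creates a serial queue that is not the main queue.
let loaderQueue = DispatchQueue(label: "com.example.resource-loader")

// Placeholder URL. The made-up "sketchhttps" scheme stands in for whatever
// custom scheme you choose; plain http/https URLs bypass the delegate.
let assetURL = URL(string: "sketchhttps://example.com/episode.mp3")!
let asset = AVURLAsset(url: assetURL)

// The resource loader references its delegate weakly, so keep a strong
// reference to the delegate somewhere for as long as the asset is in use.
let loaderDelegate = PodcastResourceLoaderDelegate()
asset.resourceLoader.setDelegate(loaderDelegate, queue: loaderQueue)

let player = AVPlayer(playerItem: AVPlayerItem(asset: asset))
```

Set the delegate before handing the asset to a player item, so that the very first loading request is already routed to your delegate.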

AVAssetResourceLoadingRequest

The AVAssetResourceLoadingRequest class represents either a request for content information about the asset or a request for a specific range of bytes in the asset’s remotely-hosted file. You can determine which kind of request it is by inspecting its contentInformationRequest and dataRequest properties.

If there is a non-nil content information request, then the loading request is a content info request. If there is a non-nil data request and if the content info request is nil, then the loading request is a data request. It’s crucial to note here that content info requests are always accompanied by a data request for the first two bytes of the file. The actual received bytes are not used by the resource loader.

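My implementation of resourceLoader(_:shouldWaitForLoadingOfRequestedResource:) checks which kind of request it has been given and hands it off to one of two private convenience methods, one for content info requests and one for data requests. I perform the work specific to either kind of request in those two methods.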
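The original listing is not reproduced here, so the following is only a sketch of how such a dispatch could look. The class name, the pending array, and the two private helper names are assumptions made for illustration; only the two delegate method signatures come from AVFoundation.

```swift
import AVFoundation

final class SketchLoaderDelegate: NSObject, AVAssetResourceLoaderDelegate {

    // Strong references to in-flight loading requests (hypothetical bookkeeping).
    private var pending: [AVAssetResourceLoadingRequest] = []

    var pendingCount: Int { return pending.count }

    func resourceLoader(_ resourceLoader: AVAssetResourceLoader,
                        shouldWaitForLoadingOfRequestedResource loadingRequest: AVAssetResourceLoadingRequest) -> Bool {
        // A content info request also carries a data request for the first
        // two bytes, so the content info check has to come first.
        if let infoRequest = loadingRequest.contentInformationRequest {
            pending.append(loadingRequest)
            handleContentInfoRequest(infoRequest, for: loadingRequest)
            return true
        }
        if let dataRequest = loadingRequest.dataRequest {
            pending.append(loadingRequest)
            handleDataRequest(dataRequest, for: loadingRequest)
            return true
        }
        // Neither kind of request: tell the resource loader not to wait on us.
        return false
    }

    func resourceLoader(_ resourceLoader: AVAssetResourceLoader,
                        didCancel loadingRequest: AVAssetResourceLoadingRequest) {
        // Drop the strong reference; any network task started for this
        // request should be cancelled here as well.
        pending.removeAll { $0 === loadingRequest }
    }

    // Placeholder helpers: the asynchronous work for each kind of request
    // would be kicked off from these.
    private func handleContentInfoRequest(_ infoRequest: AVAssetResourceLoadingContentInformationRequest,
                                          for loadingRequest: AVAssetResourceLoadingRequest) {}

    private func handleDataRequest(_ dataRequest: AVAssetResourceLoadingDataRequest,
                                   for loadingRequest: AVAssetResourceLoadingRequest) {}
}
```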

Content Info Requests

Handling a content info request is straightforward. Create a URLRequest for the original URL using a GET verb and set the value of the Range header to the loading request’s dataRequest’s byte range.

You may wonder why I’m not using a HEAD request instead. I’m following Apple’s lead. Their engineers have their well-considered reasons. My educated guess is that if you request a byte range, the response header field Content-Range will contain a value for the expected content length of the entire file. This value wouldn’t be present in a HEAD response header. A range of two bytes is the smallest valid range, which helps avoid unnecessary data transfer.
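A small sketch of building such a request with Foundation follows. The function name makeRangeRequest and the example URL are made up for illustration; the offset and length would normally come from the loading request’s dataRequest.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

/// Builds a GET request for `length` bytes starting at `offset`.
/// HTTP byte ranges are inclusive on both ends, hence the "- 1".
func makeRangeRequest(url: URL, offset: Int64, length: Int) -> URLRequest {
    precondition(offset >= 0 && length > 0, "a byte range needs at least one byte")
    var request = URLRequest(url: url)
    request.httpMethod = "GET"
    let lastByte = offset + Int64(length) - 1
    request.setValue("bytes=\(offset)-\(lastByte)", forHTTPHeaderField: "Range")
    return request
}

// The two-byte request that accompanies a content info request.
let contentInfoURLRequest = makeRangeRequest(
    url: URL(string: "https://example.com/episode.mp3")!,
    offset: 0,
    length: 2
)
```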

Hang onto a strong reference to the loading request and the loading request’s contentInformationRequest. After receiving a response back from the server, you must update the content info request’s contentType, contentLength, and isByteRangeAccessSupported properties.

Warning: do not pass the two requested bytes of data to the loading request’s dataRequest. This will lead to an undocumented bug where no further loading requests will be made, stalling playback indefinitely.

After updating those three values on the content info request, mark the associated loading request as finished by calling finishLoading().

If you get an error when trying to fetch the content info, mark the loading request as finished with an error by calling finishLoading(with:).
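The steps above could be sketched roughly as follows. This is not the original listing: the function names are invented, using a 206 status code as the signal for byte range support is an assumption, and the UTType(mimeType:) lookup requires iOS 14 or macOS 11 (code from the time of this post would have used the older UTTypeCreatePreferredIdentifierForTag function instead).

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif
#if canImport(AVFoundation) && canImport(UniformTypeIdentifiers)
import AVFoundation
import UniformTypeIdentifiers
#endif

/// Extracts the full resource length from a Content-Range value such as
/// "bytes 0-1/52428800". Returns nil when the total is missing or "*".
func totalLength(fromContentRange value: String) -> Int64? {
    guard let slash = value.lastIndex(of: "/") else { return nil }
    let total = value[value.index(after: slash)...].trimmingCharacters(in: .whitespaces)
    return Int64(total)
}

#if canImport(AVFoundation) && canImport(UniformTypeIdentifiers)
/// Fills in the three content info values and finishes the loading request,
/// or finishes it with an error if anything needed is missing.
@available(iOS 14.0, macOS 11.0, tvOS 14.0, watchOS 7.0, *)
func finishContentInfo(_ loadingRequest: AVAssetResourceLoadingRequest,
                       response: HTTPURLResponse?,
                       error: Error?) {
    guard error == nil,
          let info = loadingRequest.contentInformationRequest,
          let response = response,
          let contentRange = response.value(forHTTPHeaderField: "Content-Range"),
          let length = totalLength(fromContentRange: contentRange),
          let mimeType = response.mimeType,
          let type = UTType(mimeType: mimeType) else {
        let failure: Error = error ?? URLError(.badServerResponse)
        loadingRequest.finishLoading(with: failure)
        return
    }
    info.contentType = type.identifier          // a UTI, not a MIME type
    info.contentLength = length                 // length of the whole file
    info.isByteRangeAccessSupported = (response.statusCode == 206)
    // Deliberately no dataRequest.respond(with:) call for the two bytes.
    loadingRequest.finishLoading()
}
#endif
```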

While your delegate is handling the content info request, it is unlikely that any other requests will be started. Your request could be cancelled during this time if the player happens to cancel playback. Since you’re holding onto a strong reference to the loading request, you should take care to cancel any URLSessionTasks and relinquish references to the loading request when it’s cancelled as well as when it’s finished.

Assuming you fetched the content info successfully, calling finishLoading() will trigger the resource loader to follow up with the first genuine data request.

Data Requests

For a given asset, the resource loader will only make one content info request but will make one or more data requests (instances of AVAssetResourceLoadingDataRequest). If the host server does not support byte range requests, there will be one data request for the full file.

The iTunes podcast registry will reject any podcast feed whose host server doesn’t support byte range requests. Thus in practice it’s probably hard to find a podcast host server that doesn’t support byte range requests. It’s not a terrible idea for a podcast-specific implementation of AVAssetResourceLoaderDelegate to always fail if you determine that the host server doesn’t support byte range requests. This will spare you the additional headache of handling the edge cases where either the full file is being requested or the length of the file exceeds the maximum length that can be expressed in an NSInteger on the current architecture (this can happen on 32-bit systems). See the documentation for AVAssetResourceLoadingDataRequest for more information about these edge cases.

Most of the time your data requests will be for a specific byte range, described by the data request’s requestedOffset and requestedLength properties.
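One way to turn a data request into an HTTP Range header value is sketched below. The function itself is hypothetical; its parameters mirror the AVAssetResourceLoadingDataRequest properties currentOffset, requestedOffset, requestedLength, and requestsAllDataToEndOfResource, passed as plain values so the logic is easy to follow.

```swift
import Foundation

/// Hypothetical helper that builds a Range header value for a data request.
/// Starting at `currentOffset` avoids re-requesting bytes that have already
/// been handed to the data request.
func rangeHeaderValue(currentOffset: Int64,
                      requestedOffset: Int64,
                      requestedLength: Int,
                      requestsAllDataToEndOfResource: Bool) -> String {
    if requestsAllDataToEndOfResource {
        // Open-ended range: everything from the current offset to the end.
        return "bytes=\(currentOffset)-"
    }
    // HTTP byte ranges are inclusive, so the last byte is offset + length - 1.
    let lastByte = requestedOffset + Int64(requestedLength) - 1
    return "bytes=\(currentOffset)-\(lastByte)"
}
```

For a fresh request, currentOffset equals requestedOffset, so a request for 65,536 bytes starting at offset 65,536 yields "bytes=65536-131071".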

A simplistic implementation would make a GET request with the Range header set to the requested byte range, download the data using URLSessionDownloadTask, and pass the result to the loading request’s data request in a single call to respond(with:) before finishing the loading request.

A problem with this implementation is that the request doesn’t receive data progressively, but rather in one big bolus at the tail end of the URL task. The respond(with: data) method is designed to be called numerous times, progressively adding more and more data as it is received. AVPlayer will calculate whether or not playback is likely to keep up based on the rate at which data is passed to the data request via respond(with: data). For this reason, I recommend using a URLSession configured with a URLSessionDataDelegate, and downloading the data using URLSessionDataTask so that the data delegate can pass chunks of data to the loading request’s data request as each chunk is received.

When the URLSessionDataTask finishes successfully or with an error, finish the loading request accordingly.

If the user starts skipping or scrubbing around in the file, or if the network conditions change dramatically, the resource loader may elect to cancel an active request. Your delegate implementation should cancel any URLSessionTasks still in progress. In practice, requests can be started and cancelled in rapid succession. Failure to properly cancel network requests can degrade overall streaming performance very quickly.
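A rough sketch of progressive delivery and cancellation is shown below. The DataRequestStreamer class and its method names are invented for illustration and are not the ’sodes implementation. It assumes the same serial queue is used for the resource loader delegate and for the URLSession callbacks, so its state is only ever touched from one queue, and it skips checking the HTTP status code for brevity.

```swift
import AVFoundation

final class DataRequestStreamer: NSObject, URLSessionDataDelegate {

    // In-flight work, keyed by URLSession task identifier.
    private var active: [Int: (task: URLSessionDataTask, loadingRequest: AVAssetResourceLoadingRequest)] = [:]
    private let callbackQueue = OperationQueue()
    private lazy var session = URLSession(configuration: .default,
                                          delegate: self,
                                          delegateQueue: callbackQueue)

    var activeCount: Int { return active.count }

    /// `queue` should be the serial queue given to setDelegate(_:queue:).
    init(queue: DispatchQueue) {
        callbackQueue.underlyingQueue = queue
        callbackQueue.maxConcurrentOperationCount = 1
        super.init()
    }

    /// Call from resourceLoader(_:shouldWaitForLoadingOfRequestedResource:).
    func start(_ loadingRequest: AVAssetResourceLoadingRequest, urlRequest: URLRequest) {
        let task = session.dataTask(with: urlRequest)
        active[task.taskIdentifier] = (task, loadingRequest)
        task.resume()
    }

    /// Call from resourceLoader(_:didCancel:).
    func cancel(_ loadingRequest: AVAssetResourceLoadingRequest) {
        guard let id = active.first(where: { $0.value.loadingRequest === loadingRequest })?.key,
              let entry = active.removeValue(forKey: id) else { return }
        entry.task.cancel()
    }

    /// The session retains its delegate, so invalidate it when done.
    func invalidate() {
        session.invalidateAndCancel()
    }

    // Pass each chunk along as soon as it arrives.
    func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive data: Data) {
        active[dataTask.taskIdentifier]?.loadingRequest.dataRequest?.respond(with: data)
    }

    // Finish the loading request once the task completes or fails.
    func urlSession(_ session: URLSession, task: URLSessionTask, didCompleteWithError error: Error?) {
        // Cancelled requests were already removed in cancel(_:), so they are ignored here.
        guard let entry = active.removeValue(forKey: task.taskIdentifier) else { return }
        if let error = error {
            entry.loadingRequest.finishLoading(with: error)
        } else {
            entry.loadingRequest.finishLoading()
        }
    }
}
```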

URL Manipulation

I’ve skipped over an important part of implementing AVAssetResourceLoaderDelegate. Your delegate will never be given an opportunity to handle a loading request if the AVURLAsset’s URL uses an http or https URL scheme. In order to get the resource loader to use your delegate, you must initialize the AVURLAsset using a URL that has a custom scheme.

What I recommend doing is prefixing the existing scheme with a custom prefix.

This is a non-destructive edit that can be removed later. Otherwise it would be more difficult to determine whether to use http or https when handling the loading request.

Your resource loader delegate implementation should check for the presence of your custom url scheme prefix when determining whether or not it can handle a loading request. If so, you’ll strip the prefix from the loading request’s url purely as an implementation detail, using the original url value when fulfilling the loading request. The resource loader doesn’t need to know that you’re manipulating the url in this way.
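The prefixing and stripping described above could be sketched with URLComponents as follows. The prefix string "sketch" and both function names are placeholders chosen for this example; the post does not say which prefix ’sodes actually uses.

```swift
import Foundation

// Hypothetical prefix; any string of characters valid in a URL scheme works.
let customSchemePrefix = "sketch"

/// Turns https://… into sketchhttps://… so the resource loader consults the delegate.
func addingCustomSchemePrefix(to url: URL) -> URL? {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false),
          let scheme = components.scheme else { return nil }
    components.scheme = customSchemePrefix + scheme
    return components.url
}

/// Recovers the original URL, or returns nil if the URL does not carry the
/// prefix (in which case the delegate should decline the loading request).
func removingCustomSchemePrefix(from url: URL) -> URL? {
    guard var components = URLComponents(url: url, resolvingAgainstBaseURL: false),
          let scheme = components.scheme,
          scheme.hasPrefix(customSchemePrefix),
          scheme.count > customSchemePrefix.count else { return nil }
    components.scheme = String(scheme.dropFirst(customSchemePrefix.count))
    return components.url
}
```

Inside the delegate, the URL to strip would come from the loading request’s underlying URLRequest, for example loadingRequest.request.url.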

Warning: if you forget to modify the url scheme for the AVURLAsset, your delegate method implementations will never be called.

Special Implementation in ’sodes

My resource loader delegate for ’sodes will be optimized for podcast streaming. When it receives a data request from a resource loader, it will first check a locally-cached “scratch file” to see if any portions of the requested byte range have already been downloaded and written to the scratch file during a previous request. For any overlapping ranges, the pre-cached data will be passed to the data request. For all the gaps, the data will first be downloaded from the internet, and then both written to the scratch file and passed to the data request. In this way, I can download each byte only one time, even across multiple app sessions.[2]

As byte ranges are downloaded, I write the data to the scratch file using NSFileHandle. If written successfully, I annotate the downloaded range in a plist stored in the same directory as the scratch file. The plist gets updated at regular intervals during a download session. I combine all the contiguous or overlapping downloaded byte ranges when generating the plist, so that it’s easy to parse for gaps in the scratch file when servicing future data requests. The plist is necessary because I am not aware of any facility in Foundation that can determine whether a given file contains ranges of “empty” data. Indeed, an “empty” range might not even contain all zeroes. I take great pains to ensure that the loaded byte range plist is only updated after the data has been successfully written to the scratch file. I’d rather err on the side of having data the plist doesn’t know about, rather than the plist reporting that there is a range of data that hasn’t actually been downloaded.
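The range bookkeeping described above might be sketched like this. It is not the SodesAudio code: the function names are made up, and half-open Range<Int64> values are simply one convenient way to model downloaded byte ranges.

```swift
/// Merges contiguous or overlapping byte ranges into the smallest
/// equivalent list, sorted by starting offset.
func mergedByteRanges(_ ranges: [Range<Int64>]) -> [Range<Int64>] {
    var result: [Range<Int64>] = []
    let sorted = ranges.filter { !$0.isEmpty }.sorted { $0.lowerBound < $1.lowerBound }
    for range in sorted {
        if let last = result.last, range.lowerBound <= last.upperBound {
            // Touches or overlaps the previous range: extend it.
            result[result.count - 1] = last.lowerBound..<max(last.upperBound, range.upperBound)
        } else {
            result.append(range)
        }
    }
    return result
}

/// Returns the portions of `requested` that are not covered by `loaded`,
/// i.e. the byte ranges that still have to come from the network.
func missingByteRanges(in requested: Range<Int64>, loaded: [Range<Int64>]) -> [Range<Int64>] {
    var gaps: [Range<Int64>] = []
    var cursor = requested.lowerBound
    for range in mergedByteRanges(loaded) {
        if range.upperBound <= cursor { continue }           // entirely before the cursor
        if range.lowerBound >= requested.upperBound { break } // entirely after the request
        if range.lowerBound > cursor {
            gaps.append(cursor..<range.lowerBound)
        }
        cursor = max(cursor, range.upperBound)
        if cursor >= requested.upperBound { break }
    }
    if cursor < requested.upperBound {
        gaps.append(cursor..<requested.upperBound)
    }
    return gaps
}
```

Everything outside the returned gaps can be read from the scratch file, and each gap becomes one network request whose bytes are written to the scratch file before the range list is updated.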

GitHub

I’ve posted to GitHub a slightly modified version of the actual code that will go into ’sodes. You can see it here. It’s MIT licensed, so feel free to re-use any of it as you see fit. It is not intended for use as a re-usable framework since it’s heavily optimized for the needs of ’sodes. The Xcode project has a framework called SodesAudio which has all the AVFoundation-related code, including my resource loader delegate. There’s also an example host app that plays an episode of Exponent on launch. The example app has simple playback controls, and also a text view that prints out the loaded byte ranges that have been written to the scratch file. The ranges are updated as more data is received.

[1] If you make or know of an app that solves this problem in a different way, I’m anxious to hear about it.

[2] If the user jumps around between multiple episodes, this will negate that effort. I could guard against this by keeping more than one scratch file around in the cache, but for now I’m only keeping a single scratch file around, so that I can minimize disk usage. Disk space tends to be more constrained on iOS devices than network bandwidth.
