Just to be absolutely clear with everybody: this impression is based purely on a review of Apple’s publicly available Metal API documentation, found here:
https://developer.apple.com/library/prerelease/ios/documentation/Miscellaneous/Conceptual/MTLProgGuide/Introduction/Introduction.html
It’s very difficult to accurately or fairly judge a new technology without actually getting your hands on an SDK and digging in to try to use it. That said, I confess that I’m a little excited about what I see here. It’s easy to see things you WANT to see in an API that don’t necessarily hold up when you try to use it, but here’s what I was looking for. The big “obstacle” to modern 3D graphics has been that the traditional OpenGL/DirectX graphics pipeline was very linear and heavily moderated by the CPU. In the era before GPUs became incredibly fast, parallel and general purpose, this was a very efficient architecture: the extreme linear pipelining of the graphics API gave the GPU hardware every opportunity to optimize that pipeline for maximum throughput.

Times, however, have changed. CPUs are relatively slow, and buses to the GPU are relatively slow and getting relatively “slower” as GPU parallelism accelerates. This makes the host operating system, CPU and bus a growing performance bottleneck. Furthermore, GPUs have gotten so fast and versatile that they are increasingly capable of running the whole game… graphics, physics, AI, everything. It has become increasingly practical to think about new kinds of game engines that treat graphics, physics, AI and sound as parts of the same world model instead of separate components that have to be artificially stitched together by the game developer. The desire to create “grand unified” game engines is hard to resist, but the traditional graphics pipeline has largely isolated graphics rendering from the other aspects of game design.
Here is a crude diagram of what the modern DirectX/OpenGL graphics pipeline looks like. Notice that it’s not exactly a linear pipeline as it was in its early days. The “valves” and feedback loops added to the pipeline are modern adaptations to the growing need to interject or mediate intermediate stages in the pipeline to achieve particular graphics effects. For example, most modern games use a lighting model called volumetric lighting that involves rendering a given scene repeatedly, once from the point of view of each light in the scene, in order to correctly light and shadow the complete scene. Thus each scene may make multiple passes through the pipeline to complete a single image frame in a game. Making lights dynamic (allowing them to move around and change properties) has largely been sacrificed in many games in favour of static realism. Because the 3D pipeline is so specialized for graphics, game physics and AI are generally computed elsewhere (mostly on the CPU) and incorporated into the resulting scene. Thus a modern 3D pipeline isn’t really linear anymore; it has joints, loops and interception points that are mostly artefacts of earlier eras in GPU architecture, 3D authoring tools and engine design.
In an era when the GPU is almost perfectly capable of running the entire game and is inherently massively parallel, it would be ideal if a game engine could dynamically control every stage of the pipeline in parallel and mix and match graphics functionality with the more general compute-shader style of programming. For example, for a 3D scene with three light sources that each need to be rendered to generate light volumes, why not run all three renders concurrently and then composite them in a final stage, INSTEAD of the CPU calling the render loop four times sequentially to achieve the same result? Mixing GPU compute programming with graphics programming has generally been difficult to date. A developer could either rely on a CPU-bound graphics API like OpenGL to inject programmable shader code into the graphics pipeline in a highly constrained way, OR use a more general-purpose GPU programming language like CUDA or OpenCL and use its graphics-interoperability functions to crudely mix and match OpenGL calls with GPU-powered computations. None of it has been a very elegant solution, which is what led me to previously suggest that our existing graphics APIs have reached obsolescence. They’re hindering progress, not enabling it.
On first inspection of Apple’s Metal API documentation I see several familiar CUDA-like references to general-purpose compute kernels, and programming semantics very similar to CUDA or OpenCL sitting directly alongside familiar OpenGL/DirectX-like semantics for programmable vertex and pixel shaders. At first perusal it appears that Apple may actually have flattened the graphics pipeline… exactly as desired, such that it becomes possible to create many concurrent pipelines, which may or may not be graphical in nature but can all be executed simultaneously.
In this illustration from Apple’s Metal API documentation we see what appear to be multiple “custom” constructed graphics pipelines, with compute elements, being assembled to execute concurrently in separate parallel threads. Some of the pipelines are 3D, some are 2D image blits, and some are compute packages that may or may not execute as part of a graphics package. Going back to my earlier example, it would appear that I could indeed render a given scene from three lighting positions concurrently using this approach. This, then, appears to be a FORWARD step in the direction of a unified graphics/compute architecture. I hate to admit it, but I want to get excited about this… Gimme some slack, people; I’m the Direct3D guy, this kills me to admit…
So here we see something in the Metal doc that appears to be analogous to a CUDA __global__ compute function.
In this example the code resembles a traditional OpenGL or DirectX shader function, or a general-purpose CUDA __global__ kernel function. The keyword “kernel” marks this as a GPU function. The double-bracket syntax appears to be Metal’s way of indicating buffers allocated in GPU memory, and the function body itself looks like modern C++11 code that is clearly interoperating with graphics functionality. Apple’s Metal documentation says that the Metal compiler will be C++11 compatible, which means that more advanced C++ template functions may also be arriving on the GPU very soon. *CUDA doesn’t fully support this yet, but then CUDA is a live API and this is just a document at the moment, so there is a good chance that Nvidia and Apple will arrive at C++11-compliant GPU compilers in the same near time frame.
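For reference, a minimal compute kernel in Metal’s shading language might look like the sketch below. This is my reconstruction from the documented syntax, not code copied from Apple’s example, so treat the details as assumptions:

```cpp
#include <metal_stdlib>
using namespace metal;

// A minimal Metal compute kernel: the "kernel" keyword marks a GPU
// entry point, and the double-bracket attributes bind arguments to
// GPU-side buffers and to this thread's position in the dispatch grid.
kernel void add_arrays(device const float* a   [[ buffer(0) ]],
                       device const float* b   [[ buffer(1) ]],
                       device       float* out [[ buffer(2) ]],
                       uint id [[ thread_position_in_grid ]])
{
    out[id] = a[id] + b[id];   // one element per GPU thread, all in parallel
}
```

Note how ordinary C++-looking code and graphics-style resource bindings sit in the same function, which is exactly the graphics/compute mixing discussed above.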
I confess that on first read Apple’s document looks pretty complex, but then graphics has never been simple, and trying to mix general-purpose GPU programming with a deconstructed classical 3D pipeline is an ambitious undertaking. On examination it seems to be the right idea, but it’s impossible for me to judge how well it works until I can get my hands on it. I suspect that this new style of parallel graphics programming will present a learning curve for most 3D programmers. The big observation I would take away is that, unlike the media characterization of this as a Mantle-like solution for accelerating draw calls, I get the impression that the deconstructed graphics pipeline is actually the real news event here. I know that if I had access to a deconstructed 3D pipeline from CUDA I would be thrilled, so if this is Apple’s vision for how that challenge can be tackled, I’m eager to see more of it.
*Update: After I posted this it occurred to me that I might be guilty of a little observer bias. I’ve been working with CUDA so long that any time I see a reference to something like a GPU kernel, I just assume that it has CUDA-like properties, such as the ability for a GPU kernel to launch other GPU kernel functions. I don’t recall noticing that Metal kernels supported this… if they don’t, then it means that the API is still CPU-bound, just more parallel than before. That would be an improvement, but not the exciting leap to pure GPU computing I would hope for. I’ll have to look more closely to see if that feature is really missing…
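For comparison, the CUDA feature I mean here is dynamic parallelism: a kernel launching other kernels with no CPU round trip. A rough sketch (it requires a device of compute capability 3.5 or later and compilation with -rdc=true):

```cuda
#include <cstdio>

// Child kernel: a unit of work spawned from the GPU itself.
__global__ void child(int parent_block) {
    printf("child of block %d, thread %d\n", parent_block, threadIdx.x);
}

// Parent kernel: launches child kernels directly from device code,
// with no CPU involvement in the launch.
__global__ void parent() {
    child<<<1, 4>>>(blockIdx.x);
}

int main() {
    parent<<<2, 1>>>();          // the CPU launches only the top-level kernel
    cudaDeviceSynchronize();     // wait for the parents AND their children
    return 0;
}
```

If Metal kernels can do something equivalent, the API escapes CPU sequencing entirely; if not, the CPU is still the conductor.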
The post Apple’s Metal API, first impressions appeared first on The Saint.