2016-10-31


Why is it so difficult to get sunsets to appear ‘right’? Read on for the answer.

Following the last two articles on rules of vision, it seemed very appropriate to finish the mini-series with this little reminder from 18 months ago…

Many photographs do not work. We subsequently find out they do not work because there is a difference between what you saw and what your audience sees in the image. Sometimes this comes down to a lack of skill in translating an idea, but often it’s more subtle than that: the camera doesn’t see what we see, and we need to be highly aware both of that and of how to compensate for it. Yesterday’s photoessay is a good example: it’s no big deal to make a monochrome image, but our eyes only perceive a lack of color under very exceptional circumstances. Yet it’s these differences that make some images stand out, and others not really ‘work’.

There are a few important properties of the eye/brain combination to note: firstly, we have an incredibly wide dynamic range, but it isn’t linear. Highlights seldom clip to white or oversaturate, though blacks go black fairly quickly. Also, our vision switches to luminance-sensitive rods at lower brightness levels, meaning that the darker it gets, the more desaturated and flat things appear. A camera mostly maintains linear tonal response across the entire tonal range, and thus the final output will look different to both what we see and our idea of how a scene of that given brightness should look.
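To put very rough numbers on that difference, here is a toy comparison of my own (in Python; the ten-stop logarithmic response is an arbitrary stand-in for perception, not a model of the eye). A linear sensor devotes half of its output values to the brightest stop alone, while a roughly logarithmic response spreads the stops out evenly; this is part of why deep shadows that we can still differentiate by eye end up crushed in a straight linear rendering:

```python
import math

# Toy comparison: how a linear response and a roughly logarithmic one
# encode the same luminance ratios. 1.0 is the brightest patch; each
# successive patch is progressively darker.
patches = [1.0, 0.1, 0.01, 0.001]
for lum in patches:
    linear = lum                                  # sensor: output proportional to input
    log_ish = max((math.log2(lum) + 10) / 10, 0)  # crude log response over ~10 stops
    print(f"scene {lum:6.3f} -> linear {linear:5.3f}, log-ish {log_ish:5.3f}")
```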

This is a structural difference: a sensor’s photosites are identical and equally sensitive across the entire field of view. They are color filtered (assuming a Bayer sensor) and the ‘actual’ color reinterpolated later in software, but this interpolation is again spatially uniform. Our eyes, on the other hand, are not spatially uniform at all. There are two types of photoreceptor cells: one sensitive to color, but only with adequate amounts of light; the other sensitive to luminance, far less so to color, and able to work across a much wider brightness range. The density of the luminance-sensitive rods – some 100-120 million in all – falls off gradually towards the edges, ending about 80 degrees or so off axis. There is a central ‘hot spot’ known as the macula, covering 3-4 degrees or thereabouts, where there are few luminance-sensitive rods but a very high density of color-sensitive cones – about 6-7 million of them. These cones are responsible for both color and detail vision; their density, too, falls off gradually out to about 20 degrees off-axis or so.
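For the technically inclined, here is a heavily simplified sketch of what that spatially uniform interpolation looks like: a toy bilinear demosaic of an RGGB Bayer mosaic, written purely for illustration (real raw converters are far more sophisticated). The point is simply that every photosite is processed by exactly the same kernels, wherever it sits in the frame; the sensor has no macula.

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw):
    """Toy bilinear demosaic of an RGGB Bayer mosaic (float array, 0..1).
    Every photosite gets identical treatment regardless of position."""
    h, w = raw.shape
    # Masks marking which pixels carry which color in an RGGB pattern.
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1
    g_mask = 1 - r_mask - b_mask

    # Spatially invariant interpolation kernels: identical for the
    # centre of the frame and the extreme corners.
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

    rgb = np.dstack([
        convolve(raw * r_mask, k_rb, mode='mirror'),
        convolve(raw * g_mask, k_g,  mode='mirror'),
        convolve(raw * b_mask, k_rb, mode='mirror'),
    ])
    return np.clip(rgb, 0.0, 1.0)

# A uniform mid-grey mosaic demosaics to uniform mid-grey everywhere.
print(demosaic_bilinear(np.full((6, 6), 0.5))[2, 2])
```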

This is the cause of the second difference: the visual acuity of our eyes varies across the visual field; the corners are not as sharp or well defined as the centre, though there are no real corners to begin with. We can perceive motion in these areas, but not much detail or color. This structure is the reason we are aware of an entire scene at a macro level, yet can ‘focus in’ on a detail whilst simultaneously retaining some awareness of the wider context. Finally, the image-processing portion complicates things further: our brains correct for all sorts of optical deficiencies (such as distortion, chromatic aberration, noise, skewed horizons or minor keystoning) through a combination of experience and automatic processing. Our eyes continually scan a scene and use a latent image to perceive color and detail across a wider field of view than should be possible with a static eyeball and the cells in the macula alone. A photographic image obviously cannot do this: it has distinct ‘hard’ boundaries which make you aware of the edges of the frame, and your eyes scan the image as they would the real scene, so you still have to maintain acuity across the entire portion of the image you wish your viewer to scan.


Dark areas must be less saturated to appear ‘natural’

The real differences in translation amount to what is probably best described as ‘strange tones’ and ‘subjects that looked a lot stronger in reality’. Tones look strange because, as mentioned before, a camera responds linearly to both luminance and color, but our eyes do not. Achieving this linearity has been the goal of engineers since the advent of the device. However, now that the technical capabilities of sensors have come close to matching the eye in both resolution and absolute dynamic range, more thought needs to be given to the presentation of the output.

Anybody who has a color-sensitive eye and has subsequently attempted to use a spectrometer to profile a monitor, camera or printer will know that the results aren’t quite what you expect. Even on a wide-gamut monitor, you may still end up with images that don’t quite appear like reality; this is simply because a spectrometer functions like a camera, but we interpret the output with our eyes. The calibration profile put out by the spectrometer (in the form of brightness-map instructions so that each RGB pixel matches an absolute color on output) is tuned for absolute output, not perceptual output. So you may still be able to achieve 100% saturation at 10% luminance, even if in reality our eyes cannot perceive this. This may at first seem odd: if we can’t perceive it, how would we know the calibration looks incorrect? We can’t resolve the individual color, but we do notice that the overall rendering feels wrong: fully saturated shadows simply read as unnatural. Remember too that the luminance of the output device isn’t linear either, which introduces another complication into the equation. In other words, blacks are not truly black (even though they are input as RGB 0,0,0). This makes more of a difference to output than you might think, and it’s also the reason why, after trying a large variety of calibration tools, I still find the best results are achieved by eye: it is the ultimate viewing/input device anyway.
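As a rough numerical illustration of the caption above, here is a minimal sketch (my own, in Python, using only the standard library; the rolloff exponent is an arbitrary choice, not a prescription) of what desaturating the shadows might look like: saturation is scaled down as lightness falls, so deep shadows drift towards neutral rather than staying fully saturated.

```python
import colorsys

def desaturate_shadows(r, g, b, rolloff=0.5):
    """Scale saturation down as lightness falls, so that deep shadows
    drift towards neutral grey instead of staying fully saturated.
    r, g, b are floats in 0..1; rolloff is an arbitrary tuning value."""
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Full saturation is allowed above mid-grey; below it, saturation
    # is progressively suppressed. Purely illustrative, not a standard curve.
    weight = min(1.0, (2.0 * l) ** rolloff)
    return colorsys.hls_to_rgb(h, l, s * weight)

# A fully saturated red at roughly 10% lightness keeps less than half
# of its original saturation:
print(desaturate_shadows(0.2, 0.0, 0.0))
```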

You’ll notice I haven’t said anything about dynamic range: this is a very difficult question to answer in absolute terms, but I have a feeling that the eyes are very highlight-biased. By that, I mean that we see shadows blocking up to black fairly quickly; it’s normal to see black. But it isn’t at all normal to see something blown out to featureless white with no spatial detail whatsoever – it has to be a painfully bright object for this to happen. Of course, the response is again nonlinear: the brighter things get, the less able we are to differentiate between luminance levels. This is not the same as any sensor: most of the time, it’s a linear input-output relationship. The closest camera I’ve seen to matching this nonlinearity is the D810 or Hasselblad CMOS in combination with ACR 8x – shadow recoverability is traded for a very long highlight shoulder. This is not so good for working in low light because of the noise penalty, but it renders normal to bright scenes better than anything else I’ve seen.


Smooth, natural highlight rolloff is not so much about not clipping as clipping gradually.
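To make ‘clipping gradually’ concrete, here is a minimal sketch of a soft highlight shoulder (a toy curve of my own in Python; the knee point and exponential shape are arbitrary assumptions, not the actual tone curve of ACR or any camera). Below the knee the response stays linear; above it, values roll off smoothly towards white instead of hitting a hard ceiling:

```python
import math

def soft_clip(x, knee=0.8):
    """Map linear scene values (x >= 0) into 0..1 display values.
    Linear below `knee`; above it, an exponential shoulder approaches
    1.0 asymptotically, so bright values retain some separation."""
    if x <= knee:
        return x
    return 1.0 - (1.0 - knee) * math.exp(-(x - knee) / (1.0 - knee))

# Compare hard clipping with the gradual shoulder for bright inputs:
for v in (0.5, 0.9, 1.2, 2.0, 4.0):
    print(f"scene {v:4.1f} -> hard clip {min(v, 1.0):.2f}, shoulder {soft_clip(v):.2f}")
```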

The answer to the disconnect between perceived subject prominence and photographic prominence is related to the ability of our eyes to ‘lock in’ on one specific angle of view, letting the periphery fade out thanks to those less-sensitive and less-dense rods: the rest simply seems less prominent because we’re not taking in as much information in those areas. For reasons explained earlier, a camera doesn’t do this. Furthermore, when we view an image, we scan the entire image (and thus take in plenty of extraneous distractions) rather than instantly focus in on one specific area.

Executionally, this means we have to pay extra attention to ensuring that the non-subject areas are distinctly non-prominent and don’t attract more attention (by breaking pattern) than the subject areas. Perhaps this explains why bokeh is so popular: it approximates the way our eyes work by smoothing out the areas that are not meant to be the subject. It isn’t a perfect replication by any means, because of two more structural differences. Firstly, we don’t have the ability to produce shallow depth of field with our eyes, or even really control it; the irises are limited to about f4 equivalent and we have no conscious control over how wide they open (in that respect, they’re more for light control than for resolution or depth of field). Secondly, there are two of them: binocular, stereoscopic vision means both that the visual field is roughly rectangular in shape, and that we are able to ascertain relative distances by interpreting the differences between the images from the left and right eyes.

In reality, our eyes are somewhat like a video camera in program or shutter priority mode with auto-ISO: we maintain smooth motion and increase the impression of detail by continuous scanning, and to do that, the exposure time must remain relatively constant. The amount of light collected is regulated automatically by the iris – removing depth of field control – and further compensated for by the retina and brain to manage situations where the iris cannot open any wider to admit more light, or close down any further to restrict it. The iris’s action is involuntary, and the only slight control we have over it is to squint, which sometimes helps us to resolve distant or bright objects by both controlling the amount of light and effectively stopping down.

For something to stand out, it has to really stand out in a highly exaggerated way. But even with faithful tones and luminance, this image still appears unnatural because of the perspective compression and elimination of greater context – our eyes simply don’t work this way. Does it make it any less interesting? Not necessarily.

And here we’re back to the difference between perceived and captured subject prominence: the eyes are getting a bit of a bonus thanks to an extra dimension. We need to imply spatial relationships in a two-dimensional image with depth of field cues and shadows; if a shadow falls on something as opposed to behind it, then you know that something must be behind the object casting the shadow. We can of course use these properties to create visual non sequiturs: removing depth of field cues, or creating false ones through careful placement of lighting and perspective, allows a photograph to represent reality in a way that is not immediately perceivable to the naked eye. These images are interesting precisely because they are clearly reality, yet at the same time do not fully agree with our personal visual experiences of it.

Here is where we need to learn to see like a camera. The easiest way is to compose on something as close to the final presentation medium as possible; I think this is why we see people able to compose just fine with iPads and iPhones but struggling with optical finders. The large LCDs are simply much closer to how the image will eventually be viewed, and of course they also preview focus and exposure – neither of which is accurately represented even with the best SLR finders. The advantage of optical finders, of course, remains immediacy and the ability to see every single nuance in the scene, both in tonality and detail; it requires some imagination and experience to translate that into a finished image.

Ironically, to come to a finished output image – let’s say a print – that represents the scene we’d see with our eyes, we have to do a lot of nonintuitive things. We perceive the output through the same ‘lens’ as we would perceive the actual scene, so in effect we need to compensate for the limitations of both the capture and output media to restore transparency. It’s not as easy as it looks: remember, thanks to pattern recognition, we already have an expectation from experience of how reality ‘should’ appear. The more familiar the scene, the harder it becomes to reproduce it in a transparent, ‘ordinary’ way, simply because we have more personal experience with such scenes.

We return to the sunset question posed at the very start of the article: every day, we add to our ‘experience database’ of what a sunset can and should look like. We perceive it as a dynamic thing that changes with time and with the physical position of observation. Colors are relative, especially if the main light source is heavily biased warm. Yet our image is static, there are extremes of dynamic range (especially with a moon in play), and we have no color reference point if nothing is actually or perceptually white. See the challenge? There is of course no right or wrong between camera and eye – we can use the limitations and properties of either to translate an idea into an image in an unexpected way, and create something memorable as a result. But we can’t do that without understanding both the technical differences and their impact on perception. MT

We go into far more detail on good compositional practice and the underlying psychology behind it all in The Fundamentals and the Making Outstanding Images workshop video series – available here from the teaching store.


Images and content copyright Ming Thein | mingthein.com 2012 onwards. All rights reserved
