
ICO Newsletter

July 2015 Number 104



A stroll through 3D imaging and measurement


 
 Fig. 1: Triangular mesh of 3D data, with colour texture. 

Light, whether during the International Year of Light or at any other time, is the principal intermediary that links the world we live in with the model of that world we carry in our brains. How that model is produced – how our brains convert the effectively flat two-dimensional images produced by the optics of our eyes into a sophisticated model of the world around us – is one of the wonders of nature and of the evolutionary process. Obviously, stereo vision was sufficient for the survival of mankind. But we may ask: is stereo vision all that we can get? Is there something like a 3D camera? Where are the limits of optical acquisition of 3D data? In this article we attempt briefly to convey an idea of how our eye–brain system works and then consider how specialized 3D optical measuring systems can do a much more thorough and precise job.

 
 Fig. 2: Perspective (a) and shading (b, c) give pseudo-3D information [1][2].

We begin by stating a sometimes overlooked fact: the three-dimensional world around us is far from fully accessible with our eyes. Indeed, although the people, cars and ocean shores that we see and photograph are embedded in a 3D space, we see only projections of their textured surfaces onto our retinas. What is actually incident on the retina is a 2D intensity image I(x,y) that is the product of the illumination and the local surface reflectivity. I(x,y) is a 2D data manifold that fails utterly to include any direct information about the local distance z(x,y) for each point (x,y). We see a projection of the world, a projection of three-space onto two-space. Somewhat surprising is the fact that we so readily accept the flat photos, the flat paintings, the flat TV screens as representing “the world” (though as parents we sometimes cannot help wondering whether, for our children, the flat screens of their smartphones and tablets constitute their actual world). One gets the sense that survival is possible with just 2D vision and 2D displays. Why should this be so?

It is of course not quite true that pure intensity data I(x,y) include no 3D information: we estimate the shape and distance of objects from perspective and from shaded texture, as illustrated in figure 2. The texture can be created or enhanced by proper illumination. Scanning electron microscope (SEM) images are intriguing to us because the surface slope is encoded by the secondary electron emission in a way that our brain interprets as “three dimensional”.

Shading by oblique illumination encodes the surface slope extremely efficiently, the tiniest deformations being detectable without the aid of a true 3D sensor. The basic idea is implemented by so-called shape-from-shading or photometric stereo sensor systems for low-accuracy quantitative shape measurement [3].
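
To make the idea of photometric stereo concrete, here is a minimal sketch of the classic approach [3]: for a Lambertian surface imaged under three known, non-coplanar light directions, each pixel yields three linear equations for the albedo-scaled surface normal. The light directions and the synthetic test data below are illustrative assumptions, not values from any particular sensor.

```python
import numpy as np

# Photometric stereo sketch (after Woodham [3]): three images of a Lambertian
# surface under three known, non-coplanar light directions give three linear
# equations per pixel for the scaled surface normal.
# The light directions (rows) below are illustrative unit vectors.
L = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.866],
              [0.0, 0.5, 0.866]])

def surface_normals(images):
    """images: array of shape (3, H, W) with intensities under the 3 lights."""
    h, w = images.shape[1:]
    I = images.reshape(3, -1)                  # (3, H*W)
    g = np.linalg.solve(L, I)                  # scaled normals, (3, H*W)
    albedo = np.linalg.norm(g, axis=0) + 1e-12
    n = (g / albedo).reshape(3, h, w)          # unit normals per pixel
    return n, albedo.reshape(h, w)

# Synthetic check: a flat, fronto-parallel patch with unit albedo.
imgs = np.stack([np.full((4, 4), L[i, 2]) for i in range(3)])
normals, albedo = surface_normals(imgs)
print(normals[:, 0, 0])   # ~ [0, 0, 1]
```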

If 2D images include so much valuable indirect distance information, what, in addition, should a true 3D camera provide and how could we use it? A 3D camera should acquire data about not only the shape of the object, but also its location in space. The shape is commonly encoded as the local distance z(x,y) from the camera reference system to each object position (x,y). Shape data are important in large part because of the following fundamental fact: the shape of an object is invariant against orientation, illumination and texture. The appearance of the shape in a 2D image, on the other hand, depends strongly on these parameters. Immediate applications of 3D data are thus to be found in automated inspection, documentation, metrology, and 3D printing; in industry, medicine, cultural heritage; and in other such areas. Beyond the physical generation of objects by 3D printing, 3D virtual reality is useful in entertainment and industrial design. During the last 30 years, optical 3D sensors with high precision have been developed. Whereas common 2D cameras record a dynamic intensity range of about 100:1, the dynamic distance range of 3D sensors may exceed 1000:1 or even 10,000:1. Without a doubt, optical 3D sensors are useful, and indeed there is a billion-dollar market for such devices.

 
 Fig. 3 Stereoscopic vision.

Consider now stereoscopic vision, which guides us to a major principle of 3D metrology: triangulation. The underlying notion is well known: our eyes see an object from different perspectives. As illustrated in figure 3, the two eyes and a given point on the object span a triangle with triangulation angle θ. On the basis of the positions of the images on the two retinas, the brain calculates the local distance of each point and, eventually, the shape of the surface in space. Triangulation has been used in the measurement of distances for thousands of years; the majority of contemporary optical 3D sensors exploit triangulation.
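
For a rectified stereo pair, the geometry of figure 3 reduces to the textbook relation z = f·b/d, where b is the baseline between the two eyes (or cameras), f the focal length, and d the disparity between the two image positions. A minimal sketch with purely illustrative numbers:

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """Classic rectified-stereo triangulation: z = f * b / d.
    f_px: focal length in pixels, baseline_m: eye/camera separation in metres,
    disparity_px: horizontal shift of the same point between the two images."""
    if disparity_px <= 0:
        raise ValueError("point at infinity or mismatched correspondence")
    return f_px * baseline_m / disparity_px

# Illustrative numbers (not from the article): 1000 px focal length,
# 65 mm "interocular" baseline, 13 px disparity -> 5 m distance.
print(depth_from_disparity(1000, 0.065, 13.0))   # 5.0
```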

 
 Fig. 4: Stereoscopic entertainment from the mid-19th century to 2015 [4][5][6].

As figure 4 shows, 3D imaging has a close relationship with 3D display. As early as 1922, 3D motion pictures were produced for showing in movie theaters. Each eye of the viewer sees its own image from a slightly different perspective, separated by colour or polarization encoding. There was no great demand for stereo movies until they were recently reintroduced – with considerably greater quality – by Hollywood and by the computer-gaming industry. Today, a variety of consumer cameras can capture stereo image pairs.

 
 Fig. 8: Copying the “trombone-angel” at Bamberg Cathedral: a) 800-year-old original; b) virtual 3D model stitched from 200 3D measurements; c) casting mould manufactured by 3D printing; d) cast copy, now at the portal of the cathedral.

Most of us know that the observation of a stereo image pair allows the viewing of a scene from only a single viewpoint: movement of the head provides no access to different perspectives (although the image pair inherently includes the information to reconstruct different perspectives). Ideally, a 3D display should behave like a “virtual window” such as is provided by a Fresnel hologram: if you move your head from left to right, you should see first the right ear of a person looking at you and then the left ear. The free choice of viewing perspective is clearly a feature that 3D camera data should provide. In the best case, the data should allow access to all possible perspectives. Indeed, sensors and algorithms may provide such data by acquiring many views from different directions and stitching (“registering”) the views together. An example is provided in figure 8.

A ray optical approach for the acquisition of 3D features is implemented by plenoptic cameras. Plenoptic (or light-field) cameras store the intensity of rays along with their direction. This additional information increases the data-storage requirement enormously, but it enables intriguing a posteriori manipulation of the displayed images, such as choosing different planes of sharp focus or different perspectives. Currently, the pixel count for plenoptic cameras is much smaller than that achieved by conventional cameras.
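
As an illustration of the a posteriori manipulation mentioned above, the following sketch implements the simplest “shift-and-add” light-field refocusing of sub-aperture images. The array layout and the focus parameter alpha are generic assumptions, not the interface of any particular plenoptic camera.

```python
import numpy as np

def refocus(subviews, offsets, alpha):
    """Shift-and-add refocusing of a light field.
    subviews: (N, H, W) sub-aperture images; offsets: (N, 2) viewpoint offsets
    in pixels; alpha: synthetic focus parameter (0 = original focus plane).
    Names and conventions are illustrative only."""
    acc = np.zeros_like(subviews[0], dtype=float)
    for img, (du, dv) in zip(subviews, offsets):
        # Shift each sub-view in proportion to its viewpoint offset, then average.
        acc += np.roll(img, (int(round(alpha * dv)), int(round(alpha * du))),
                       axis=(0, 1))
    return acc / len(subviews)
```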

 
 Fig. 5: The simplest 3D sensor: laser triangulation.

Triangulation requires the identification of corresponding points as viewed from a distance. An important variant is active triangulation, which allows one to measure the surface of a diffuse but otherwise untextured object (think white walls). In active triangulation, a special pattern is projected onto the object, eliminating the need for an intrinsic surface texture and for extensive computation to find corresponding points. The simplest example is laser point triangulation, where the projected pattern is a single laser spot, as shown in figure 5. With suitable calibration, the distance to the point is easily calculated from the geometry. Laser point triangulation sensors are available for a few dollars. The basic principle can be upgraded by the projection of a line, instead of a point, allowing the measurement of the distance to 1000 or so points with a 1000 × 1000 pixel camera. Projection of many lines allows the acquisition of many thousands of points. This fact leads to the question: where is the limit?
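
A hedged sketch of the underlying geometry: the spot must lie on the known laser ray, and it is also seen along a known line of sight through the camera pixel; intersecting the two rays (here via the midpoint of their common perpendicular) yields the 3D point. The coordinates below are illustrative only.

```python
import numpy as np

def intersect_rays(p1, d1, p2, d2):
    """Return the midpoint of the shortest segment between ray 1 (origin p1,
    direction d1) and ray 2 (origin p2, direction d2).  For laser triangulation,
    ray 1 is the known laser beam and ray 2 is the line of sight through the
    camera pixel that sees the spot."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    a, b, c = d1 @ d1, d1 @ d2, d2 @ d2
    w = p1 - p2
    denom = a * c - b * b            # ~0 would mean parallel rays
    s = (b * (d2 @ w) - c * (d1 @ w)) / denom
    t = (a * (d2 @ w) - b * (d1 @ w)) / denom
    return 0.5 * ((p1 + s * d1) + (p2 + t * d2))

# Laser along +z from the origin; camera 0.1 m to the side sees a spot that is
# actually 0.5 m away (values are illustrative only).
spot = np.array([0.0, 0.0, 0.5])
cam = np.array([0.1, 0.0, 0.0])
print(intersect_rays(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                     cam, spot - cam))          # -> [0, 0, 0.5]
```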

Questions of this kind are of critical importance to 3D measurement, as are questions of accuracy, speed, ease of system setup, and so forth. It is difficult to acquire “dense” 3D data on the surface of an object, where “dense” means that each camera pixel should provide data about the distance to the corresponding object point. Spatial encoding of corresponding points by pattern projection wastes space-bandwidth product: only sparse 3D data can be acquired by pure spatial encoding. Each camera pixel stores information about the distance z(x,y), the illumination E(x,y), and the local reflectivity R(x,y). These three unknowns cannot be decoded from a single exposure, so we need some additional modality. Commonly, it is time.

 
 Fig. 6: Phase measuring triangulation: principle and 3D data.

The prototypical triangulation sensor, exploiting fringe projection triangulation, uses temporal encoding of each pixel through the projection of a temporal sequence of varying patterns, typically sinusoidal fringes with at least three varying phases (three equations for three unknowns!). The temporal nature of the data-acquisition process implies that neither the object shape nor the object position can change during the imaging operation. We cannot acquire suitable 3D data from walking or talking people unless the projector and camera are extremely fast. Nevertheless, fringe projection is the paradigm principle for many macroscopic applications. In figure 6 we see a face with projected fringes in the centre and two profile images provided by mirrors. The sensor can take data from three directions – from ear to ear – at the same time, with photorealistic quality and dense data. This type of sensor is used, for example, for the planning of cranio-facial surgery.
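
The “three equations for three unknowns” remark can be made explicit with the standard three-step phase-shifting formulas. The sketch below assumes phase steps of 0°, 120° and 240° and leaves phase unwrapping and the phase-to-height calibration aside.

```python
import numpy as np

def decode_three_step(i0, i1, i2):
    """Three-step phase-shifting decode (phase steps 0, 120, 240 degrees).
    Per pixel, the three exposures I_k = A + B*cos(phi + 2*pi*k/3) yield the
    background A, the modulation B and the fringe phase phi, i.e. the three
    unknowns (illumination, reflectivity, distance-encoding phase)."""
    phi = np.arctan2(np.sqrt(3.0) * (i2 - i1), 2.0 * i0 - i1 - i2)
    a = (i0 + i1 + i2) / 3.0
    b = np.hypot(np.sqrt(3.0) * (i2 - i1), 2.0 * i0 - i1 - i2) / 3.0
    return phi, a, b   # phi is wrapped to (-pi, pi]; unwrapping and the
                       # phase-to-height calibration are separate steps

# Synthetic check with an assumed true phase of 1.0 rad:
A, B, true_phi = 0.5, 0.3, 1.0
ii = [A + B * np.cos(true_phi + 2 * np.pi * k / 3) for k in range(3)]
print(decode_three_step(*ii))   # ~ (1.0, 0.5, 0.3)
```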

 
 Fig. 7: Tooth with inlay. The inlay was milled according to the measured 3D data of the tooth. High precision is required, since the gap between tooth and inlay has to be smaller than 30 μm to avoid penetration of bacteria.

High-quality data are required as well for the documentation of artwork or to make a 3D copy with a 3D printer, i.e. “3D cloning”. Figure 7 shows a dental inlay that was “printed” (in this case by diamond milling) using data from the measured prepared tooth. In another example of 3D cloning, shown in figure 8, a copy was made of the famous “trombone-angel” at Bamberg Cathedral in Germany. The 3D numerical model of the 800-year-old statue was stitched together from some 200 3D measurements. The copy now sits at the entrance to the cathedral, allowing the original to be moved inside and protected from the elements.

The 200 measurements had to be made in a stop-and-go process, with subsequent re-orientation of the complicated object. This exhausting procedure can be avoided by “Flying Triangulation”, in which case the sensor can be freely guided around the object. While the successive (sparse) 3D views are registered in video real time, dense data are accumulated within a few seconds. The measurement result is immediately displayed, allowing interactive guiding of the sensor.

“Flying Triangulation”, although single-shot, requires rigid objects for its operation. To acquire the shape of talking or walking persons, we need something like a 3D movie camera, which encodes the 3D information within each single frame. Such cameras are available, but as they exploit spatial encoding, there is unavoidable spatial sparseness. The sparseness is commonly concealed from the naïve user by interpolation and display of “smoothed” surfaces with reduced lateral resolution.

 
 Fig. 9: Madame Leota, as an example of a pseudo-3D impression [7].

Many 3D sensors provide a “perfect” 3D illusion by simply projecting high-resolution 2D texture onto low-quality 3D data. This effect is beautifully exploited in the Haunted Mansion at Disneyland, where a movie of a talking person is projected onto a white painted sphere inside a spherical glass container. A person viewing the projection has the impression of a mysterious woman’s head talking within a crystal globe (figure 9).

Let us come back to high-quality 3D movies. There is a new approach to the single-shot 3D movie camera that provides improved lateral resolution. Recently, it was shown that with a 1 MP camera, 160,000 true 3D pixels can be acquired. The basic idea is to project 160 narrow lines onto the object, and to view the illuminated object with not one, but two, cameras. The correspondence problem is solved by one of the cameras, which operates with a very small triangulation angle, while the desired precision is achieved by a second camera that operates with a large triangulation angle. Each measured point conveys true, unsmoothed 3D data. Owing to its single-shot ability, a sequence of (non-static) 3D models can be acquired. This ability enables the recording of a “real” 3D movie, since the viewpoint can be freely chosen at any time (unlike in stereoscopy); see figure 10.
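
Conceptually, the role of the two cameras can be sketched as follows: for a given pixel of the precise (large-angle) camera there is one depth hypothesis per projected line, and the coarse but unambiguous depth from the small-angle camera selects among them. The function and numbers below are an illustrative sketch, not the authors' implementation.

```python
def resolve_correspondence(coarse_z, candidate_z_by_line):
    """coarse_z: unambiguous but noisy depth from the small-angle camera.
    candidate_z_by_line: {line_index: precise depth hypothesis} from the
    large-angle camera, one hypothesis per projected line that could explain
    the observed pixel.  The coarse value selects the correct hypothesis; the
    returned precise value keeps the accuracy of the large triangulation angle."""
    best = min(candidate_z_by_line,
               key=lambda k: abs(candidate_z_by_line[k] - coarse_z))
    return best, candidate_z_by_line[best]

# Illustrative numbers only: coarse depth 0.52 m picks line 18.
print(resolve_correspondence(0.52, {17: 0.301, 18: 0.498, 19: 0.702}))
# -> (18, 0.498)
```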

 
 Fig. 10: Three frames of a 3D movie. In each frame, the viewpoint can be freely chosen.

Triangulation serves as the basis for 3D measurements on a macroscopic scale, e.g., for people, car bodies, artwork. But optical sensors can just as well measure the distance to nearby stars or the shape of mirrors with sub-nanometre precision. How do they do this?

Figure 11 displays a canonical sensor model, with all options for the illumination, for the interaction of light with the surface, and for the different modalities of information conveyed by the object. The combination of options for illumination, interaction, and exploited information enables many different sensors with different limits of physically achievable precision [8], depending on the dominating source of noise. It turns out that there are only a few different sensor principles if we categorize sensors according to the precision-limiting source of noise. For example, all triangulation methods are seriously limited by coherent noise (even if they do not use a laser). So we look for other principles.

A straightforward solution is time-of-flight (TOF) measurement. A laser pulse or temporally modulated laser beam is directed at the object and the time-of-flight of the backscattered signal is measured via fast photodetectors (often by phase correlation). This method was used in the lunar laser ranging project to measure, to within a few centimetres, the distance from the Earth to a reflector array carried to the Moon by the Apollo 11 mission. Today, the shape of large buildings and even of mountains can be measured by terrestrial laser TOF scanning with a precision of a few millimetres. The latest development is the so-called TOF camera with on-chip integration of extremely fast illumination and pixel-by-pixel detection. Such sensors are already implemented for car-driver assistance.
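
The arithmetic behind time-of-flight ranging is simple: the measured quantity is a round-trip time (or, for modulated illumination, a modulation phase), and a factor of two accounts for the out-and-back path. A minimal sketch with illustrative values:

```python
import math

C = 299_792_458.0   # speed of light in vacuum, m/s

def distance_from_pulse(round_trip_time_s):
    """Pulsed time-of-flight: the light travels out and back, so divide by 2."""
    return C * round_trip_time_s / 2.0

def distance_from_phase(phase_rad, mod_freq_hz):
    """Continuous-wave TOF (phase correlation): the phase shift of the
    modulation gives the distance, ambiguous modulo c / (2 * f_mod)."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)

# A ~2.56 s round trip corresponds to roughly 3.84e8 m (Earth-Moon distance).
print(distance_from_pulse(2.56))
```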

 
 Fig. 11: The canonical optical 3D sensor.

As governed by the speed of light, 1 ns temporal resolution is required for 150 mm depth resolution. Such temporal resolution would seem difficult to achieve, but surprisingly, with a related technique, it is possible to achieve a resolution in time of even attoseconds, yielding a corresponding depth resolution of nanometres. Interferometric methods can be used to measure the difference of the time-of-flight to a certain object point against a reference object. The paradigm example is the use of the Michelson interferometer, ubiquitous in optical workshops, to measure the shape of polished surfaces. In fact, the time-of-flight difference is not measured by a clock, but via the phase difference between the object wave and the reference wave. Because phase differences can be measured with extreme precision, limited only by photon noise, distance variations of atomic dimensions can be detected. Gravitational wave interferometry aims for the detection of distance variations smaller than 10⁻²⁰ m. For very long-distance measurement, it is again interferometry that provides the key. With a stellar interferometer, Michelson and Pease measured the diameter of nearby stars in 1920.
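
The corresponding conversion for interferometry in reflection is dz = λ·Δφ/(4π), since a height change of dz alters the optical path by 2·dz. A short sketch with illustrative numbers shows why phase measurement reaches atomic-scale distance variations:

```python
import math

def height_from_phase(delta_phi_rad, wavelength_m):
    """Michelson-type interferometry in reflection: a surface height change dz
    changes the optical path by 2*dz, so dz = lambda * dphi / (4*pi)."""
    return wavelength_m * delta_phi_rad / (4.0 * math.pi)

# Example: a phase change of 1 mrad at 633 nm (HeNe) corresponds to ~50 pm.
print(height_from_phase(1e-3, 633e-9))   # ~5e-11 m
```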

 
 Fig. 12: Rough and smooth objects, measured by scanning white light interferometry.

Classical interferometry displays a wide range of applications. However, it is restricted to specularly reflecting objects of regular shape, such as flats or spheres. Scattering (“optically rough”) objects display speckles, each speckle characterized by a random phase. Since interferometry operates on the basis of phase measurements, it fails to measure rough objects. Some 30 years ago it was discovered that, instead of measuring the phase, one can measure the local position of the temporal coherence function (“correlogram”), even in speckles. The method is called white light scanning interferometry or “coherence radar”. It displays surprising features. For example, the distance precision is determined only by the surface roughness; neither the stand-off distance nor the aperture limits the precision. Measurements from a great distance and within deep boreholes are possible, as suggested by figure 12. Since the phase is of no importance, coherence radar is comparatively insensitive to vibrations, so we can measure human skin in vivo. Rough objects can be measured even against a rough reference, enabling the measurement of large objects with large height variations.
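
A per-pixel sketch of the coherence-radar idea: scan through z, record the correlogram, and take the height where its contrast envelope peaks rather than evaluating the phase. The crude envelope estimate and the synthetic correlogram below are illustrative only; a real sensor would use proper demodulation or a fit.

```python
import numpy as np

def height_from_correlogram(z_positions, intensities):
    """White-light (coherence-scanning) interferometry, per pixel: the height is
    taken where the contrast envelope of the correlogram peaks, not from the
    phase.  The envelope is estimated crudely here as |I - mean(I)|."""
    envelope = np.abs(intensities - np.mean(intensities))
    return z_positions[np.argmax(envelope)]

# Synthetic correlogram centred at z = 12.3 um with ~1.5 um coherence length:
z = np.linspace(0, 25e-6, 2001)
corr = 1 + np.exp(-((z - 12.3e-6) / 1.5e-6) ** 2) * np.cos(4 * np.pi * z / 0.6e-6)
print(height_from_correlogram(z, corr) * 1e6)   # ~12.3 (micrometres)
```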

 
 Fig. 13: Deflectometry – the lens under test reflects the fringe pattern of the background screen. From the deformation of the reflected pattern, the local slope of the surface can be determined.

We consider now a completely different principle that closes a gap – deflectometry. Specular surfaces are difficult to measure if they are not flat or spherical. If they are clean, we cannot see mirror surfaces; we can only see the mirror image of the illuminating light source. And we see this mirror image only if rays from the light source are reflected into the pupil of the observation system. A further problem is that optical surfaces often require very high precision, for example λ/100 ~ 5 nm. The required depth dynamic range can be up to 1,000,000:1. For simple surfaces, the problem can be fixed by proper reference surfaces. In the absence of a reference surface, a solution can be found that exploits what information theorists call “source encoding” or “redundancy reduction”. Deflectometry measures the local slope of the surface, removing the stand-off distance from the equation. The basic idea is sketched in figure 13. A large screen containing a sinusoidal intensity pattern illuminates the object under test. A camera sees the object and, simultaneously, an unsharp image of the reflected screen. Note that the fringe pattern is not projected onto the object, but is just seen as the mirror image of the remote screen. The local slope of the object causes a local fringe deformation, as shown in the figure.
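
A minimal sketch of the slope recovery: once the decoded fringe phase identifies which screen point is mirrored at a given surface point, the law of reflection fixes the local surface normal as the bisector of the directions towards the camera and towards that screen point. The geometry below is illustrative and assumes the surface point itself is known, which in practice requires the calibration discussed below.

```python
import numpy as np

def surface_normal(surface_pt, camera_pt, screen_pt):
    """Deflectometry sketch: the decoded fringe phase tells which screen point is
    seen mirrored at a given surface point.  The law of reflection then gives the
    local surface normal as the bisector of the two unit vectors from the surface
    point towards the camera and towards that screen point."""
    to_cam = camera_pt - surface_pt
    to_screen = screen_pt - surface_pt
    n = to_cam / np.linalg.norm(to_cam) + to_screen / np.linalg.norm(to_screen)
    return n / np.linalg.norm(n)

# Flat mirror at z = 0, camera and screen point placed symmetrically above it:
print(surface_normal(np.array([0.0, 0.0, 0.0]),
                     np.array([-0.2, 0.0, 1.0]),
                     np.array([+0.2, 0.0, 1.0])))   # -> [0, 0, 1]
```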

 
 Fig. 14: Deflectometry measurement – curvature maps of a progressive eyeglass lens (a) and a car windscreen (b).

With proper calibration of the system – a quite sophisticated procedure – the local gradient of the surface can be inferred from the fringe images. The method is already used for the in-line measurement of progressive eyeglass lenses and car windscreens, as demonstrated in figure 14, and for astronomical mirrors.

Compared with its competitor, interferometry, deflectometry displays quite remarkable features. There are no optical elements involved except the sample under test and the camera lens, and the measurement does not depend on any reference or special illumination optics. The object need not be precisely positioned with respect to the sensor. A sensitivity to local depth variations of 1 nm can easily be achieved, and even improved upon, by simple means. Deflectometry is scalable from microscopic objects up to large telescope mirrors.

We cannot conclude without mentioning sensors that truly acquire 3D data from the bulk of volume-scattering objects, such as skin, the retina, and ceramics. A well-established application is optical coherence tomography, or OCT, of the retina; see figure 15. OCT is based on time-of-flight: signals coming from different depth positions travel different path lengths. These path lengths can be deciphered by white-light (coherence-scanning) interferometry, as explained above. More efficient is the so-called Fourier-domain OCT, but the basic idea is the same.
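
A sketch of the Fourier-domain idea: the interference spectrum, sampled evenly in wavenumber, is Fourier transformed, and reflectors at different depths show up as peaks at different fringe frequencies. The synthetic single-reflector spectrum below is illustrative only.

```python
import numpy as np

def a_scan(spectrum_vs_wavenumber):
    """Fourier-domain OCT sketch: the interference spectrum, sampled evenly in
    wavenumber k, is Fourier transformed; reflectors at different depths appear
    as peaks at different frequencies (path-length differences)."""
    s = spectrum_vs_wavenumber - np.mean(spectrum_vs_wavenumber)  # drop DC term
    return np.abs(np.fft.rfft(s))

# Synthetic spectrum with a single reflector: cosine fringes in k whose
# frequency is proportional to the depth (values are illustrative).
k = np.linspace(0, 1, 1024)
spectrum = 1.0 + 0.2 * np.cos(2 * np.pi * 40 * k)     # "depth" bin ~40
print(np.argmax(a_scan(spectrum)))                    # -> 40
```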

 
 Fig. 15: Optical coherence tomography (OCT) imaging of the retina [9].

We have completed our stroll along the 3D path, necessarily missing many details and only glancing over the fence here and there, but we have learned that there is a wide spectrum of sensors and intriguing applications. Research in this field continues to progress and continues to be fascinating, not least because of new and challenging applications that demand attention.

To help the reader bring some order to this variety, we conclude with a guide to the selection of the proper sensor principles for different applications; see figure 16.

 
 Fig. 16: Menu of sensor principles and applications.

We gratefully acknowledge the valuable and patient help of Prof. W T Rhodes, who not only polished our English but also contributed new ideas.

We invite readers to investigate the many internet links (in blue) provided in the article and on our homepage, www.optik.uni-erlangen.de/osmin

Gerd Häusler is a professor at the Institute of Optics, Information and Photonics, at the University of Erlangen-Nuremberg, Germany. His group’s research centers on the physical and information theoretical limits of optical 3D sensors and on practical sensors that work at these limits. Together with former students, he founded 3D-Shape GmbH.

Florian Willomitzer is a PhD student in Häusler’s group, working on the real-time 3D movie camera.

References

[1] https://upload.wikimedia.org/wikipedia/commons/c/c7/Canaletto_Grand_Canal_from_Palazzo_Flangini_-_JPGM.jpg

[2] https://upload.wikimedia.org/wikipedia/commons/a/a4/Misc_pollen.jpg

[3] R Woodham, “Photometric method for determining surface orientation from multiple images,” in Shape from Shading, B Horn and M Brooks, eds. (MIT Press, Cambridge, Mass., 1989), pp. 513–532.

[4] khnemo.wordpress.com/antique-3-d-cheesecakes-from-swell3d-antique-stereoscopes-from-ortskundeprufung

[5] https://upload.wikimedia.org/wikipedia/commons/b/b0/The_National_Archives_UK_-_WORK_25-208.jpg

[6] blogs-images.forbes.com/insertcoin/files/2014/03/oculus2.jpg

[7] https://s-media-cache-ak0.pinimg.com/736x/18/9a/f2/189af2eee46fd59969f3e60b29d9cf9f.jpg

[8] G Häusler and S Ettl, “Limitations of optical 3D sensors,” in Optical Measurement of Surface Topography, R Leach, ed. (Springer, 2011), pp. 23–48.

[9] https://upload.wikimedia.org/wikipedia/commons/2/2d/Retina-OCT800.png




International Commission for Optics

Bureau members (2014-2017):

President: Y. Arakawa;

Past-President: D. T. Moore; Treasurer: J. A. Harrington;

Secretary: A M Guzmán, CREOL, The College of Optics and Photonics, University of Central Florida, PO Box 162700, 4000 Central Florida Blvd, Orlando, FL 32816-2700, USA; e-mail angela.guzman@creol.ucf.edu

Associate Secretary: G von Bally

Vice-Presidents, elected: J. Harvey, F. Höller, H. Michinel, J. Niemela, R. Ramponi, S-H Park, J. Zakrzewski, M. Zghal

Vice-Presidents, appointed: Y. J. Ding, J. C. Howell, S. Morgan, E. Rosas, P. Urbach, A Wagué, M. J. Yzuel

IUPAP Council Representative: C Cisneros

Editor in chief: A M Guzmán

Editorial committee:
K Baldwin, Australian National University, Australia;
J Dudley, Université de Franche-Comté, France;
William T Rhodes, Florida Atlantic University, USA.