For years, suppliers and users of motion-capture hardware and software have been striving to make mocap a mainstream production tool. The technology certainly has reached broad market acceptance in the 3D entertainment world, simply because of its ability to provide compelling signature motion at a lower cost than traditional keyframe methods—but it was a long road. The early adopters in the mid-1990s accepted significant shortcomings in the core technology, but were still able to use it to their advantage to help tell stories.
Revolutionary production feats in motion capture for major films were taking place as far back as 1997, with Titanic. Today, directors Robert Zemeckis and James Cameron are shifting paradigms in modern filmmaking with their groundbreaking 3D productions, which take full advantage of motion capture. Facilities such as Sony Pictures Imageworks and Industrial Light & Magic are also driving leaps in mocap development with work on films such as Beowulf, Monster House, and The Polar Express from the former, and Van Helsing, The Hulk, and the Pirates of the Caribbean series from the latter. Features such as Happy Feet, The Lord of the Rings trilogy, and King Kong have also used motion capture to help tell their stories. In all these recent cases, motion capture offered a clear method of producing economical and compelling character animation for visual effects and CG filmmaking.
What does the future hold for filmmakers who embrace motion capture? The answer is as easy as looking at the history of the technology—bigger, better, faster, and cheaper. More specifically, we’ll continue to see much more extreme cases of existing production achievements—larger capture areas, higher precision, simultaneous face and full-body capture, first-unit-friendly equipment, and, as always, more performers, more cameras, more markers, and more automation.
Advanced Motion Capture Today
There are several motion-capture technologies available, but passive optical systems are, by an order of magnitude or more, the mainstay of entertainment production teams. This is true for all sectors of visual entertainment, including video games, film, commercials, music videos, and broadcast. Why is this so? Primarily because the performance that passive optical systems offer, with respect to quality, speed, and price/performance, overshadows the alternatives.
A thresholded image using a 4MP (megapixel) camera. The white “video” lines detect the presence of a marker. However, the centroid calculation requires a “best fit circle” approach, which is prone to significant error.
So what makes passive optical systems the broadly accepted standard? There are several reasons:
- Precision. Artifact-free motion data reflects the nuance and signature motion required to suspend disbelief and entertain audiences.
- No technology on the performer. Nothing to break, recharge, synchronize, or otherwise worry about on set—just reflective markers.
- Ability to capture in large areas.
- Ability to capture hands, face, and full-body performances simultaneously.
- Ability to capture the camera track, in addition to the character motion. This provides the director with a real-time preview of the scene.
- Integration with other production-standard equipment. This includes electronic slating, genlock, and time-code support, reference video, and more.
- Production-proven pipelines. These run from marker capture through skeletal solve and retargeting to final output in any animation package.
The bottom line is that optical systems offer a workflow that most closely mimics live-action shooting methods, thereby enabling film directors to transition into 3D production without having to learn a whole new process.
3D Precision: Camera Resolution and Grayscale
One of the most significant differentiators in the overall performance of premium systems is 3D precision. The highest precision systems reduce or eliminate artifacts associated with the capture technology, thus reducing the amount of cleanup, filtering, or other issues that can dramatically affect final character animation. These premium systems also are capable of viewing markers that are much farther away from the motion-capture cameras, thereby enabling artifact-free animation, even in very large performance areas.
3D precision can be affected by a number of things, but by far two of the most important contributors are camera resolution and the use of grayscale image processing. Quite simply, for a given marker size, camera field of view, and marker-to-camera distance, the higher the camera resolution, the more pixels there are to describe the marker in the camera image. More pixels, in turn, mean we are better able to calculate the centroid, or center point, of the marker. In summary, the better the centroid calculation, the better the 3D position calculation, and the more accurate the capture data.
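The relationship between resolution, distance, and marker size can be sketched with a simple pinhole-camera estimate. This is only an illustration: the 60-degree field of view, the 5 m distance, and the 4:3 aspect ratio used to turn a megapixel count into a horizontal pixel count are assumptions, not specifications of any real camera. The marker sizes and megapixel counts are the ones quoted in the figure caption.

```python
import math


def assumed_horizontal_pixels(megapixels, aspect_ratio=4 / 3):
    """Estimate sensor width in pixels from a megapixel count.

    Assumption: the sensor has the given aspect ratio (4:3 by default).
    Real sensors may differ.
    """
    return math.sqrt(megapixels * 1_000_000 * aspect_ratio)


def pixels_across_marker(marker_diameter_mm, distance_mm,
                         horizontal_fov_deg, horizontal_pixels):
    """Approximate number of pixels spanning a marker's diameter."""
    # Width of the scene that the image covers at the marker's distance.
    view_width_mm = 2.0 * distance_mm * math.tan(
        math.radians(horizontal_fov_deg) / 2.0)
    # The marker occupies the same fraction of the pixels as of the view.
    return marker_diameter_mm / view_width_mm * horizontal_pixels


if __name__ == "__main__":
    distance_mm = 5000.0   # hypothetical camera-to-marker distance
    fov_deg = 60.0         # hypothetical horizontal field of view
    for megapixels in (0.3, 2.0, 4.0):
        width_px = assumed_horizontal_pixels(megapixels)
        for marker_mm in (14.0, 9.0, 6.5, 3.0):
            span = pixels_across_marker(marker_mm, distance_mm,
                                        fov_deg, width_px)
            print(f"{megapixels} MP, {marker_mm} mm marker: "
                  f"about {span:.2f} px across")
```

Under these assumptions, the number of pixels across a marker grows in proportion to the horizontal pixel count and shrinks in proportion to distance, which is why a small marker that a high-resolution camera can still resolve may shrink to a fraction of a pixel on a low-resolution one.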
(From left to right) A 14mm, 9mm, 6.5mm, and 3mm marker seen from a 0.3MP camera (MX3+ at the top), a 2.0MP camera (F20 in the middle), and a 4.0MP camera (F40 at the bottom). The higher-resolution cameras also can see smaller markers farther away.
That sounds easy. Why not just use bigger markers to cheat your way around resolution? There are several reasons why that is not an ideal solution. Larger markers are far more distracting to performers, they tend to move more than smaller markers, and they cause more discomfort for performers who are doing stunt work. As an example, it is very common for a director to capture the facial and full-body performance of an actor at the same time, and facial markers are typically between 2mm and 4mm in diameter. Only the highest-resolution cameras are capable of accomplishing this in a usable performance area.
OK, what about image processing and its effect on 3D precision? Older systems from Vicon and existing systems from other manufacturers often use a “thresholded” image, rather than a “grayscale” image, to calculate the centroid. The difference in 3D precision between the two methods is substantial. Our tests with cameras in which all other things were equal (camera resolution, marker size, camera-to-marker range) showed an 8x improvement in 3D precision when moving from a thresholded method to grayscale image processing.
Where is the magic here? As is the case with camera resolution, the goal is to get the most accurate centroid calculations for better-quality mocap data. A thresholded image picks out only the brightest parts of the scene (as it should), but each pixel’s value can only be “yes” or “no” in answer to the question of whether a marker is present.
Grayscale processing is unique and enables Vicon cameras to achieve extremely accurate centroid calculations and resulting 3D data. With grayscale processing, each pixel describing a marker can have one of 256 shades of gray (or values) to describe its intensity. Each pixel’s “weight,” or value, defined by its shade of gray, then influences the centroid calculation for the marker. This offers a dramatic improvement in 3D precision; in addition, it removes many artifacts that are present when markers overlap in a camera view or touch each other during capture.
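The difference between the two approaches can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not Vicon's algorithm: the marker is modeled as an axis-aligned bright square rather than a circle, each pixel's 8-bit value is taken to be proportional to the fraction of the pixel the marker covers, and the thresholded centroid is a plain average of the "yes" pixels rather than a best-fit circle. All function names are hypothetical.

```python
def render_square_marker(width, height, left, top, size):
    """Render a bright square as rows of 8-bit intensities (0 to 255).

    Assumption: a pixel's value is proportional to the fraction of the
    pixel that the square covers. Pixel i spans [i, i + 1) on each axis.
    """
    def overlap(pixel_index, lo, hi):
        # Length of the overlap between the pixel and the interval [lo, hi].
        return max(0.0, min(pixel_index + 1, hi) - max(pixel_index, lo))

    return [
        [round(255 * overlap(x, left, left + size)
               * overlap(y, top, top + size))
         for x in range(width)]
        for y in range(height)
    ]


def thresholded_centroid(image, threshold=128):
    """Unweighted mean of the centers of pixels at or above the threshold."""
    hits = [(x + 0.5, y + 0.5)
            for y, row in enumerate(image)
            for x, value in enumerate(row)
            if value >= threshold]
    count = len(hits)
    return (sum(p[0] for p in hits) / count,
            sum(p[1] for p in hits) / count)


def grayscale_centroid(image):
    """Mean of pixel centers, weighted by each pixel's intensity."""
    total = sum(sum(row) for row in image)
    cx = sum((x + 0.5) * value
             for row in image for x, value in enumerate(row)) / total
    cy = sum((y + 0.5) * value
             for y, row in enumerate(image) for value in row) / total
    return (cx, cy)


if __name__ == "__main__":
    # A 2 x 2 pixel marker whose true center is at (2.3, 2.6).
    image = render_square_marker(6, 6, left=1.3, top=1.6, size=2.0)
    print("true center:         ", (2.3, 2.6))
    print("thresholded centroid:", thresholded_centroid(image))
    print("grayscale centroid:  ", grayscale_centroid(image))
```

In this toy case the thresholded estimate snaps to whole pixels and lands a noticeable fraction of a pixel away from the true center, while the intensity-weighted estimate recovers the sub-pixel position almost exactly. The size of the gap here says nothing about the 8x figure measured on real cameras; it only shows the mechanism.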
Looking Ahead
Years ago, we saw that third-party camera manufacturers were clearly unable to produce the resolution needed to satisfy the growing markets, so Vicon started designing and manufacturing its own cameras, starting with the MCam in 1998. The MCam, along with several subsequent cameras, has led the industry in speed, resolution, and, ultimately, 3D precision.
Between then and now, Vicon has produced a dozen different camera types to help its clients tackle their most complex production challenges. In early April, the company hit another significant milestone. Sensor manufacturers were falling well behind the company’s future requirements, so we began producing our own sensor (dubbed the Vegas sensor) and launched it in the latest Vicon cameras, the F40 and F20 (see the Products section, May 2007, pg. 4).
This new sensor combines high resolution, very high speed, and true freeze-frame shuttering in a single package, and again, it’s the highest-specification camera of its kind, inside or outside the world of motion capture. It has a resolution of 4MP and can operate at full resolution at up to 370 fps, while its shuttering makes it far more resistant to high-intensity studio lighting, an important feature for combined CG/live-action work.
The sensor also meets the specific requirements of moviemakers: The motion-capture system is friendlier to use on set, and it even enables the shooting of live action and motion capture simultaneously.
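To put the sensor figures quoted above in perspective, here is a back-of-envelope estimate of the raw pixel throughput at 4MP and 370 fps. It assumes exactly 4,000,000 pixels per frame and 8 bits per pixel (matching the 256 gray levels described earlier); the real sensor's pixel count and bit depth may differ, and the function names are hypothetical.

```python
def raw_pixel_rate(pixels_per_frame, frames_per_second):
    """Pixels read out per second at the given resolution and frame rate."""
    return pixels_per_frame * frames_per_second


def raw_data_rate_bytes(pixels_per_frame, frames_per_second,
                        bits_per_pixel=8):
    """Uncompressed bytes per second, assuming a fixed bit depth."""
    return raw_pixel_rate(pixels_per_frame,
                          frames_per_second) * bits_per_pixel / 8


if __name__ == "__main__":
    pixels = 4_000_000   # assumed: nominal 4MP taken as exactly 4 million
    fps = 370            # maximum full-resolution rate quoted in the article
    print(f"{raw_pixel_rate(pixels, fps) / 1e9:.2f} gigapixels per second")
    print(f"{raw_data_rate_bytes(pixels, fps) / 1e9:.2f} GB per second "
          f"per camera, uncompressed")
```

Under these assumptions a single camera generates on the order of a gigabyte and a half of raw image data every second, which suggests why reducing each frame to marker data on the camera itself is attractive when many cameras run from one system.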
It used to be that movie crews had to be instructed on how to operate with motion-capture equipment present, which is quite the opposite of the hierarchy that crews demand. “Don’t bump into a camera,” “Wait, we need to recalibrate,” and “That studio lighting is too bright for some of the mocap cameras” were common phrases but unacceptable for efficient productions.
Since then, processors onboard the camera, intelligent thresholded masking, camera calibration from subject data rather than the wand wave, and more have made both motion-capture technicians and movie crews rest easier.
What else is being done with mocap and film today? The sheer scale and ingenuity of the work performed for three consecutive movies at Sony Pictures Imageworks deserve mention. With several thousand face and full-body markers in a scene, hundreds of motion-capture cameras being operated from a single system, as well as A-list talent and props, the logistics of both the motion-capture shoot and the processing of the data required a great deal of innovation.
ILM took mocap outdoors and into the jungle with a groundbreaking technology it developed in-house. The results helped Pirates of the Caribbean: Dead Man’s Chest win this year’s Visual Effects Oscar, and helped break several box-office records, too.
Several other technologies rising out of the minds and hearts of clever people all over the world will continue to help filmmakers tantalize audiences. Our R&D lab has literally dozens of relevant new technologies, ones that are often customized for individual directors’ and technical teams’ needs, and many that will become components of our commercial solutions.
By the very nature of the design goal, motion capture will become more transparent on set and easier to use on the back end, and it will continue to produce compelling character animation for the final moving image.
Brian Nilles is the CEO of Vicon Motion Systems, Inc. and a Member of the Board of OMG, plc. OMG is Vicon’s parent company, whose group of technology entities produces image-solving solutions for the entertainment, defense, life-science, and engineering markets. Nilles has been CEO of Vicon for eight years. Prior to joining Vicon in 1997, he held sales and engineering management positions for Smiths Industries.