With a new Star Trek film due out this summer, get ready for the usual slew of tenuous associations, philosophical and technological. Given that, let’s say upfront that the concept just floated by chipmaker AMD is likely years away.
This story is about how the building blocks already at our disposal could ultimately deliver extraordinarily immersive augmented reality environments. Those building blocks will need a huge amount of refinement and expansion before we see the mooted innovation, if we ever do.
However, the idea is a pathfinder. Just as communicators inspired flip-phones and medical tricorders are echoed in wireless e-health monitoring, this speculative corporate science project aims to show more immediately where computing is heading in the short term, and what we should target in both software and hardware architectures.
Enough caveats for yer? Great. Let’s reveal the target: it is the holodeck from Star Trek: The Next Generation.
For those of you unfamiliar with the show, the holodeck is a space on the USS Enterprise capable of faithfully reproducing just about any environment, and allowing any crew member to enter it, walk around it, and interact with it as though it were ‘real’. Think of it as cosplay to the nth degree. Want to relive a classic 1940s detective novel? Done. Captain an 18th-century ship of the line? Made so.
The technological driver is the shift from homogeneous to heterogeneous processing.
Many different processors
So what is this shift? In essence, it’s the next generation of multicore computing. We’re going from multiple identical cores doing the heavy lifting (homogeneous) to multiple different cores that the system activates as appropriate to optimize performance for the task in hand (heterogeneous).
Today, one model for doing this is big.LITTLE, developed by ARM, the company that provides the core processing technology for most mobile devices. Simply put, it lets a device choose between cores of (typically) two different sizes on the same chip, so that each task runs on the combination that meets its performance needs while consuming the least power. Result: longer battery life.
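The selection idea can be made concrete with a minimal Python sketch: an energy-aware choice between a fast, power-hungry core and a slow, frugal one. The `Core` type, the `pick_core` function and all throughput and wattage figures are hypothetical. Real big.LITTLE systems leave this decision to the operating system’s scheduler and frequency scaling, which are considerably more sophisticated.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Core:
    name: str
    ops_per_sec: float  # sustained throughput (hypothetical)
    watts: float        # active power draw (hypothetical)


def pick_core(cores: Sequence[Core], ops: float, deadline_s: float) -> Optional[Core]:
    """Return the core that finishes `ops` within `deadline_s` using the least energy."""
    best: Optional[Core] = None
    best_energy = float("inf")
    for core in cores:
        runtime = ops / core.ops_per_sec
        if runtime > deadline_s:
            continue  # too slow to meet the deadline
        energy = core.watts * runtime  # joules = watts x seconds
        if energy < best_energy:
            best, best_energy = core, energy
    return best  # None if no core can meet the deadline


# Illustrative figures only, not real ARM specifications.
BIG = Core("big", ops_per_sec=2e9, watts=2.0)
LITTLE = Core("LITTLE", ops_per_sec=0.5e9, watts=0.25)

if __name__ == "__main__":
    print(pick_core([BIG, LITTLE], ops=1e8, deadline_s=0.5).name)  # light task -> LITTLE
    print(pick_core([BIG, LITTLE], ops=1e9, deadline_s=0.6).name)  # heavy task -> big
```

With these numbers the light task goes to the LITTLE core (0.05 J against 0.1 J on the big core). The heavy task can only meet its deadline on the big core.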
However, the industry already wants to go much further.
At the recent International Solid State Circuits Conference (ISSCC), one of the main hardware design technology events, Lisa Su, senior vice president and general manager for AMD’s Global Business Units, floated the idea of a heterogeneous roadmap towards a holodeck. But to ensure any Trekkers didn’t get too misty-eyed, she also described the state-of-the-art in heterogeneous processing performance.
Today, this means sharing and allocating tasks between a traditional microprocessor (CPU) and a graphics processor (GPU) on a single system-on-chip connected by a unified memory controller. The problem here is the bus over which those units share information.
“They’re really bottlenecked by the bus,” said Su. “When you want to switch from A to B, you really have to move a whole bunch of data over, and that ends up being the bottleneck.”
“The next generation takes a more system view of the world and this is where you can put CPUs, other HSA [Heterogeneous System Architecture] computing units like audio acceleration as well as graphics units together on chip where they have unified coherent memory. That allows you to have a unified set of data. So, when you’re doing a [holodeck-like] 360-degree environment, any one of the compute units being able to access that data gives you tremendous capability in terms of optimization. It reduces the latency as you go between the switching. It allows you to do switching in much smaller chunks so that you can optimize.
“Then when we go to next generation processing capability, that includes more sophistication in the graphics unit, including compute context switching and graphics pre-emption.”
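Su’s point about the bus can be expressed as a back-of-the-envelope cost model. The Python sketch below uses hypothetical bandwidth and latency figures. It charges every CPU-to-GPU handoff a fixed latency plus the time to copy the shared data across the bus. In the idealized unified, coherent memory case, the units simply share the same buffer, so no copy is charged. Real coherent systems still pay synchronization and coherence costs, which this model ignores.

```python
def handoff_cost_s(data_bytes: float, handoffs: int, bus_bytes_per_s: float,
                   latency_s: float, unified: bool) -> float:
    """Seconds spent moving shared data between compute units over `handoffs` switches."""
    if unified:
        # Idealized coherent shared memory: every unit reads the same buffer in place.
        return 0.0
    per_handoff = latency_s + data_bytes / bus_bytes_per_s
    return handoffs * per_handoff


if __name__ == "__main__":
    # Hypothetical: a 64 MB scene buffer, 8 GB/s bus, 10 us setup per transfer.
    for handoffs in (10, 100):
        copied = handoff_cost_s(64e6, handoffs, 8e9, 10e-6, unified=False)
        shared = handoff_cost_s(64e6, handoffs, 8e9, 10e-6, unified=True)
        print(f"{handoffs} handoffs: copy {copied * 1e3:.1f} ms, shared {shared * 1e3:.1f} ms")
```

The copy cost grows with every switch. That is why switching in the “much smaller chunks” Su describes only pays off once the copy goes away.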
All this could be realised in only a few years. Indeed, with HSA as its own take on the future of heterogeneous processing, AMD has a great deal riding on getting others to join and invest in the associated ecosystem as quickly as possible. The competition to develop these multi-faceted systems is already hotting up, with not just Intel but also Qualcomm, Samsung, Broadcom, Nvidia and other major processor developers looking to capitalise.
For example, the basic distinction between a CPU and a GPU, that one handles general tasks and the other graphics, is giving way to the understanding that a GPU may perform some highly parallel general tasks better. This extends the fact that graphics rendering is itself an inherently parallel activity. Add the way heterogeneous designs will draw on many other design specialities, and the reasons for still broader interest become easier to see.
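Amdahl’s law is one standard way to see why only highly parallel work gains from a GPU. It is a general result, not something the article cites. If a fraction p of a task can be accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A minimal Python version:

```python
def amdahl_speedup(parallel_fraction: float, accel: float) -> float:
    """Overall speedup when `parallel_fraction` of the work runs `accel` times faster."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if accel <= 0:
        raise ValueError("accel must be positive")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / accel)


if __name__ == "__main__":
    # Hypothetical 10x faster GPU offload for tasks of varying parallelism.
    for p in (0.5, 0.9, 0.99):
        print(f"p={p}: {amdahl_speedup(p, 10):.2f}x")
```

Even a 10x-faster accelerator yields under 2x overall when half the work is serial. The shift therefore matters most for tasks that are overwhelmingly parallel.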
But before we can even start thinking about a holodeck as a long-term goal, there is another important component: the software.
Developing code for even today’s homogeneous multicore hardware has been a challenge. Such systems are inherently parallel and require sophisticated task-allocation management.
The Multicore Association, the main technical trade body addressing the shift to homogeneous and heterogeneous multicore architectures, has only just published its Multicore Programming Practices (MPP) guide to coding best practices, after four years of development.
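At its simplest, task allocation means splitting work evenly across cores. The Python sketch below uses static block partitioning with the standard library’s `concurrent.futures.ThreadPoolExecutor`; `partition` and `parallel_sum_of_squares` are hypothetical names. It only illustrates the bookkeeping. Because of CPython’s global interpreter lock, threads will not actually speed up CPU-bound Python code. Production multicore code usually also needs dynamic load balancing.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(n_items: int, n_workers: int) -> List[range]:
    """Split range(n_items) into n_workers contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_workers)
    chunks: List[range] = []
    start = 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers take one more item
        chunks.append(range(start, start + size))
        start += size
    return chunks


def parallel_sum_of_squares(n_items: int, n_workers: int = 4) -> int:
    """Give each worker one chunk, then combine the partial results."""
    chunks = partition(n_items, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(lambda r: sum(i * i for i in r), chunks))


if __name__ == "__main__":
    print([len(c) for c in partition(10, 3)])  # [4, 3, 3]
    print(parallel_sum_of_squares(100))        # 328350
```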
“While the industry continues to make important long-term research into new programming languages and methodologies, the MPP guide tackles how existing embedded C/C++ code may be written to be ‘multicore ready’ today,” said Multicore Association president Markus Levy.
AMD’s approach goes much further than C and C++, though.
“We want to use the tens of thousands [of software developers] who can program in C, C++, HTML5, Java and some of the domain-specific APIs, and really build on top the capability to have an intermediate program language – the HSA Intermediate Language [or HSAIL] – that can unify those models. This is a big undertaking. It requires a lot of cross-collaboration between hardware, software, the system and all the industry standards that have to come around,” Su said.
The end goal here makes a lot of sense. “Think about what you could get with that: the idea that you could write your software once and run it anywhere, albeit with different performance. This is the view of how we really bring heterogeneous systems together.”
But remember that the Multicore Association’s guide took four years. Even edging towards the individual functions that could ultimately be combined in a holodeck, many of which also have discrete near-term applications, will take time. That mission, though, has begun.
Su told ISSCC that five key technologies will need to come together in any future holodeck. All of them, however, are already delivering some innovations and promise many more. They are:
- Computational photography: reproducing a seamless, immersive video environment.
- Directional audio: creating still more immersive and thereby realistic audio to enhance our belief in the environment.
- Natural user interfaces: making man-to-machine and machine-to-man communication more ‘human’ and thereby more realistic.
- Context computing: intuitively understanding and reacting to the user’s needs in real time.
- Augmented reality: combining the real and the virtual.
Here are some examples of how we can already see some of these technologies in use today.
Dolby has just launched the Dolby Atmos audio system for blockbuster movies. It can support up to 128 discrete audio tracks and 64 unique speaker feeds in a cinema. A key driver here has been to match the trend towards very high-end 3D cinema (in films like The Hobbit: An Unexpected Journey and, inevitably, Star Trek Into Darkness) with much richer and more spatially distributed sound. It’s a huge leap forward from 7.1’s spread of seven main and surround speakers and one subwoofer.
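In object-based systems such as Atmos, each sound is an object with a position, and a renderer turns that position into per-speaker gains. This sketch does not describe Dolby’s own renderer. It uses a generic, published technique instead: distance-based amplitude panning (DBAP), here with a hypothetical 2D speaker layout. Gains fall off with distance from the object and are normalized so that total power stays constant wherever the object moves.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dbap_gains(source: Point, speakers: Sequence[Point],
               rolloff: float = 1.0, blur: float = 0.1) -> List[float]:
    """Distance-based amplitude panning: one gain per speaker for a single audio object."""
    # A small spatial blur keeps the distance non-zero when the object sits on a speaker.
    dists = [math.sqrt(math.dist(source, spk) ** 2 + blur ** 2) for spk in speakers]
    # Louder in nearer speakers: gain proportional to 1 / distance**rolloff.
    raw = [1.0 / d ** rolloff for d in dists]
    # Normalize so the squared gains sum to 1, i.e. constant total power.
    norm = math.sqrt(sum(g * g for g in raw))
    return [g / norm for g in raw]


if __name__ == "__main__":
    # Hypothetical square room with a speaker in each corner (metres).
    room = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    for position in [(1.0, 1.0), (5.0, 5.0), (9.0, 2.0)]:
        gains = dbap_gains(position, room)
        print(position, [round(g, 2) for g in gains])
```

Moving the object simply means recomputing its gains. With 64 speaker feeds, the same calculation runs over a longer list of speakers.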
Then there is the extension of the eye-tracking features just announced for the Galaxy S4 handset. The Smart Stay function already checked whether you were actually looking at the screen. This has now been joined by Smart Rotation, which adjusts the screen to the angle from which you are looking at it; Smart Pause, which stops a video when it detects you are no longer watching; and Smart Scroll, which will notice you are looking at a document and scroll the text as you tilt the screen.
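A feature like Smart Pause needs some hysteresis so that a blink or a quick glance away does not stop the video. Samsung has not published its logic. The Python sketch below is hypothetical: it pauses after a run of frames in which no gaze is detected and resumes as soon as gaze returns. The `lost_after` threshold is an assumption.

```python
from typing import Iterable, List


def playback_states(gaze_frames: Iterable[bool], lost_after: int = 3) -> List[str]:
    """Return 'play' or 'pause' for each frame, pausing after `lost_after` frames without gaze."""
    missed = 0
    state = "play"
    states: List[str] = []
    for looking in gaze_frames:
        if looking:
            missed = 0
            state = "play"  # resume as soon as the viewer looks back
        else:
            missed += 1
            if missed >= lost_after:
                state = "pause"  # only pause after a sustained look away
        states.append(state)
    return states


if __name__ == "__main__":
    print(playback_states([True, False, False, False, True]))
    # ['play', 'play', 'play', 'pause', 'play']
```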
Completing the trio are the first steps in augmented reality that Sony has taken with its PlayStation Vita, which are likely to reach still greater maturity on the PlayStation 4 (powered, perhaps not that surprisingly, by an AMD chipset). This version of AR inserts virtual objects into real environments captured by the console’s on-board camera, in games such as PulzAR, an update of Asteroids and Missile Command.
What these examples demonstrate is that Su’s basic shopping list isn’t that far-fetched. The issues are rather ones of sophistication and efficiency. She showed just how far we still have to go there by recounting an AMD exercise in facial recognition using existing technology.
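At the core of this kind of AR is projecting a virtual object, anchored at a 3D position, into the camera image. The Python sketch below uses the standard pinhole camera model. The camera pose (`R`, `t`) and intrinsics (`fx`, `fy`, `cx`, `cy`) are assumed to be already known, which sidesteps the genuinely hard part: tracking the camera in real time. None of this describes Sony’s implementation.

```python
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def world_to_camera(p: Vec3, R: Sequence[Sequence[float]], t: Vec3) -> Vec3:
    """Apply the camera pose: p_cam = R * p_world + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a camera-space point (z pointing forward) to pixel coordinates."""
    x, y, z = p_cam
    if z <= 0:
        return None  # behind the camera, so nothing to draw
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    identity = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
    # Hypothetical: a virtual asteroid 2 m in front of the camera, 0.5 m to the right.
    anchor = world_to_camera((0.5, 0.0, 0.0), identity, (0.0, 0.0, 2.0))
    print(project(anchor, fx=800, fy=800, cx=320, cy=240))  # (520.0, 240.0)
```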
Facial recognition still largely relies on a detection step built from cascades of rectangular Haar-like features, named after the Haar wavelet first described more than a century ago.
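The standard way to make Haar-like features cheap is the integral image (summed-area table), introduced for this purpose in the Viola-Jones detector. It lets the sum of any rectangle be read with four lookups. The article does not detail AMD’s implementation. This is a minimal Python sketch of the general technique, using a horizontal two-rectangle feature as the example.

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """Summed-area table with a zero border: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle whose top-left pixel is (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


if __name__ == "__main__":
    ii = integral_image([[1, 2], [3, 4]])
    print(rect_sum(ii, 0, 0, 2, 2))          # 10
    print(two_rect_feature(ii, 0, 0, 2, 2))  # (1 + 3) - (2 + 4) = -2
```

Each feature is cheap on its own. The cost comes from evaluating huge numbers of them over millions of candidate squares.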
Mapping these features onto an image and then comparing the results against a reference bank is famously computationally intensive. Su illustrated this by describing facial analysis and recognition of the Mona Lisa at 1080p resolution using a heterogeneous CPU/GPU system.
The process begins by sliding a 21x21-pixel search square across the image, which gives roughly 2 million squares. Repeating the search at different scales then increases that by a factor of just under two, to 3.8 million squares. The maths then looks like this:
Search squares = 3.8 million
Average features per square = 124
Calculations per feature = 100
Calculations per frame = 47 GCalcs
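The 2 million figure matches sliding the 21x21 square one pixel at a time across a 1920x1080 frame, which gives (1920 - 20) x (1080 - 20) = 2,014,000 positions. The Python sketch below counts those positions and then sums them over an image pyramid. The one-pixel stride and the `scale` step are assumptions. AMD’s exact scaling scheme isn’t given, so the pyramid total is illustrative and is not meant to reproduce the 3.8 million figure.

```python
def window_positions(width: int, height: int, win: int = 21) -> int:
    """Positions a win x win square can occupy when stepped one pixel at a time."""
    return max(0, width - win + 1) * max(0, height - win + 1)


def pyramid_windows(width: int, height: int, win: int = 21, scale: float = 1.25) -> int:
    """Total positions over an image pyramid that shrinks by `scale` until the square no longer fits."""
    total = 0
    w, h = float(width), float(height)
    while w >= win and h >= win:
        total += window_positions(int(w), int(h), win)
        w, h = w / scale, h / scale  # next, smaller pyramid level
    return total


if __name__ == "__main__":
    print(window_positions(1920, 1080))  # 2014000
    print(pyramid_windows(1920, 1080))   # depends on the assumed scale step
```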
Beyond that, for the sake of extending the argument into the holodeck world, she offered the numbers for moving from a still image to HD video:
30 frames/sec = 1.4 TCalcs/second
60 frames/sec = 2.8 TCalcs/second
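Su’s figures multiply out as stated. The short Python sketch below reproduces them from the three inputs she gave: search squares, average features per square and calculations per feature.

```python
def calcs_per_frame(squares: float = 3.8e6, features_per_square: int = 124,
                    calcs_per_feature: int = 100) -> float:
    """Calculations for one frame: squares x features per square x calculations per feature."""
    return squares * features_per_square * calcs_per_feature


def calcs_per_second(fps: int) -> float:
    """Scale the per-frame load to a video frame rate."""
    return calcs_per_frame() * fps


if __name__ == "__main__":
    print(f"Per frame: {calcs_per_frame() / 1e9:.0f} GCalcs")  # Per frame: 47 GCalcs
    for fps in (30, 60):
        print(f"{fps} frames/sec: {calcs_per_second(fps) / 1e12:.1f} TCalcs/second")
    # 30 frames/sec: 1.4 TCalcs/second
    # 60 frames/sec: 2.8 TCalcs/second
```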
However, the immediate limitation is that these numbers apply only to front-facing faces, so the video figures are no more than indicative.
The good news, though, was that applying the Haar cascades heterogeneously did point to efficiencies available with existing architectures. It transpired that while the GPU was clearly more efficient in the earlier cascade stages (1-8), the CPU was better for stages 9-21. By allocating the different processors to different stages of the analysis, AMD got a 2.5X performance boost and a 2.5X reduction in the energy consumed per frame analyzed.
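The stage-by-stage split amounts to a greedy assignment: run each cascade stage on whichever processor handles it faster. The per-stage timings in the Python sketch below are hypothetical. The model also ignores the cost of handing data between processors, which is exactly the cost that coherent shared memory is meant to shrink. One plausible reason for the pattern AMD found is that early stages evaluate every candidate square in parallel, while later stages see only the few squares that survive; that reading is an interpretation, not something the article states.

```python
from typing import List, Sequence, Tuple


def assign_stages(cpu_ms: Sequence[float], gpu_ms: Sequence[float]) -> Tuple[List[Tuple[int, str]], float]:
    """Greedily run each cascade stage on the faster processor; return (plan, total_ms)."""
    plan: List[Tuple[int, str]] = []
    total = 0.0
    for stage, (c, g) in enumerate(zip(cpu_ms, gpu_ms), start=1):
        device = "GPU" if g < c else "CPU"
        plan.append((stage, device))
        total += min(c, g)  # handoff cost between devices is ignored
    return plan, total


if __name__ == "__main__":
    # Hypothetical per-stage timings: the GPU wins early, the CPU wins late.
    cpu = [4.0, 3.0, 1.0, 0.5]
    gpu = [1.0, 1.5, 2.0, 1.5]
    plan, total = assign_stages(cpu, gpu)
    print(plan)  # [(1, 'GPU'), (2, 'GPU'), (3, 'CPU'), (4, 'CPU')]
    print(f"split {total} ms vs CPU-only {sum(cpu)} ms, GPU-only {sum(gpu)} ms")
```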
Pulling it together
Better algorithms and more efficient heterogeneity could therefore start to bring this process up to holodeck levels. They could certainly reach the levels needed for less complex tasks where facial recognition has been proposed, such as security.
For now, the nearest-term advantages appear to come from hardware acceleration. Across a number of other holodeck-type tasks, AMD claims it can see the following near-term improvements:
Gesture recognition 12X
Photo indexing 10X
Voice recognition 10X
Visual search 9X
Audio search 5X
Stereo vision 4X
Video stabilization 4X
None of these is anywhere near enough to enable a holodeck, but they could take its technologies into a number of other less demanding applications.
The real key, though, and it explains why AMD has adopted the holodeck example, is that hardware alone will not do it. As the Mona Lisa example shows, we will need better algorithms and, more generally, more easily portable multicore software.
The trick is to find an example that, while fictional, is also attractive and, in AMD’s case, gets its strategy for the heterogeneous future noticed. It’s whimsy with a very serious purpose.
And it also shows how processing and software are drawing once-disparate elements ever closer together. Success in the future may well be dictated by just how far you can go in that process. To that end, it’s hard to think of a better final objective than the holodeck.