Big data sets are valued for the patterns and trends that they reveal when analysed by equally massive computing resources, but scientists are now questioning whether bigger is always better when it comes to deriving value from the analogue world.
The term 'big data' generally refers to digital data sets so large and complex that it becomes difficult to process them using standard database-management tools or traditional data-processing applications. When big data meets big science the challenges escalate almost beyond calculation: trafficking petabytes of data from deep space, for example, as the next generation of radio telescopes will soon start to do, will stretch the world's ICT infrastructures to their limits.
Meanwhile, measurements of continuous real-world analogue quantities such as temperature, pressure, light, vibration, radio waves and magnetism now dwarf the digital detritus of sales figures, tweets, and Facebook postings chugging through our virtual data communications infrastructures. Data acquisition specialist National Instruments (NI) likes the term 'big analog data' (BAD) to describe this phenomenon (NI favours the American spelling of analogue in its brand appellation).
"Think of the constant stream of images, sounds, temperature, and pressure changes that the human body picks up. Then consider the machines and sensors doing a similar job. You get a sense of the mind-boggling quantity of analogue data," says Dr Tom Bradicich, R&D fellow for NI, and one of those in charge of developing huge data-acquisition systems for NI customers in fields as diverse as power generation, vehicle manufacturing, construction, and medical imaging.
Faster databases, higher-speed bus systems, more powerful computers, and cleverer data visualisation software are all critical to extracting the value from BAD. Data volume, however, seems set to expand relentlessly with the number of sensors being added into systems for a wide variety of applications. You could say it is hostage to the sampling theorem of Harry Nyquist, the physicist and electrical and communications engineer. A central tenet of signal processing, the theorem states that you cannot capture a signal without error unless you sample it at least twice per cycle of the highest frequency component it contains.
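A small sketch shows what goes wrong below that rate. The frequencies are chosen purely for illustration, and the helper name sample is an assumption of this example. A 9Hz tone sampled only 10 times a second produces exactly the same numbers as an inverted 1Hz tone, so the two cannot be told apart after sampling.

```python
import math

def sample(freq_hz, rate_hz, n):
    """Return n samples of a unit sine at freq_hz, taken rate_hz times a second."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

# A 9 Hz tone needs at least 18 samples a second; here it gets only 10.
undersampled = sample(9.0, 10.0, 20)

# An inverted 1 Hz tone, sampled at the same 10 Hz rate.
alias = [-x for x in sample(1.0, 10.0, 20)]

# The two sets of samples agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(undersampled, alias))
print(max_diff)
```

Raising the rate above 18 samples a second removes the ambiguity, which is why sensor data volumes grow in step with the highest frequency of interest.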
Consider a photograph of a flower taken by a mobile phone. Millions of pixels are captured by the phone's high resolution CCD sensor; then most of the pixels are thrown away when the phone's processors compress it into a JPEG file. It remains a perfectly good picture of a flower; the question is: if you can compress such an image without visible loss of information, were too many measurements taken in the first place?
These days we "record many more things simply because we can, and we are left trying to tease out a few effects in a sea of irrelevance", according to Emmanuel Candès, professor of mathematics and statistics at Stanford University in California, and one of the originators of the theory of compressive sampling (CS), a 'less is more' approach to measuring real-world phenomena.
In 2004, Candès (then at Caltech) and his colleagues Justin Romberg, David Donoho and Terence Tao showed that this insight was indeed correct – and that Nyquist was a pessimist. Their theory of compressive sampling says that signals and images can be reconstructed from fewer measurements as long as sampling bandwidth is proportional to the information content.
Usefully, it turns out that most signals and images we want to capture are 'sparse', which means the information of interest is a small percentage of what we would traditionally sample, store and process.
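Sparsity is easy to see in a toy case. The sketch below is an illustration only: the smooth bump is a made-up test signal, and the discrete cosine transform is just one common choice of basis (it is the one JPEG uses). The sketch transforms 64 samples and checks how much of the signal's energy sits in the six largest coefficients.

```python
import math

N = 64
# Hypothetical test signal: a smooth bump standing in for a natural signal.
signal = [math.exp(-(((n - 31.5) / 10.0) ** 2)) for n in range(N)]

def dct(x):
    """Orthonormal DCT-II, written out directly for clarity rather than speed."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n)
                               for i in range(n)))
    return out

coeffs = dct(signal)

# Rank the coefficients by energy and keep roughly 10 per cent of them.
energy = sorted((c * c for c in coeffs), reverse=True)
keep = 6
fraction = sum(energy[:keep]) / sum(energy)
print(fraction)
```

For this signal nearly all of the energy lands in a handful of coefficients. Real images and radio spectra are less tidy, but the same concentration is what compressive sampling relies on.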
Mirrors, but no smoke
Since then, CS theory has been quietly inspiring a reconception of a number of physical signal acquisition applications, ranging from high-end imaging to remote wireless sensing, with a variety of aims including shrinking the datasets, shortening capture time, reducing processing load, and cutting power consumption.
An outstanding example of the scope and strength of CS is the short-wave infrared (SWIR) single-pixel camera developed by Professors Rich Baraniuk and Kevin Kelly of Rice University, Houston, Texas and now sold through a spin-off company called InView Technology Corp. Unlike shorter wavelength visible light, SWIR light (around 0.9 to 1.7µm in the spectrum) is useful in navigation, surveillance, and security because it travels through haze, dust and smoke without scattering.
SWIR is also used in certain biological tissue imaging applications because of its good penetration and ability to pick up spectral signatures of different materials. What has held the technology back is the expense of the cameras – which cost $70,000 and upwards because of the exotic InGaAs detector array required to respond to SWIR light. InView cut its SWIR camera cost to under $20,000 by replacing the InGaAs array with a single InGaAs detector and a 1024 x 768 pixel digital micromirror device (DMD) made by semiconductor company Texas Instruments.
The DMD works as a modulator, sampling the incident image by rapidly adjusting its tiny mirrors so that some are 'on' (light from that pixel is directed towards the detector) and some are 'off' (directing the light away from the detector) at a rate of up to 32,000 times a second.
As Lenore McMackin, InView's chief technology officer, explains, "For the most part, random-looking patterns are used with 50 per cent of the pixels on and 50 per cent are off. After being modulated, the image is sent through a lens that focuses it on the single detector. In this way information from the entire image is converted into an electrical signal."
The signals generated by sequentially displaying a set of patterns on the DMD become coefficients of a measurement vector that can be used to reconstruct the original image.
Image reconstruction uses knowledge of the mirror patterns and the measured data to determine what image/scene was present at the input of the camera. A typical CS reconstruction algorithm iteratively constructs the image using mathematical optimisation procedures that search for the sparsest solution. The result is a camera that can produce high-quality images using only 10 per cent of the data that would be gathered in a one-to-one imaging technique.
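The measure-then-reconstruct loop can be sketched on a toy scale. Everything here is an assumption for illustration, not InView's method: the 16 x 16 scene with three bright points, the 64 random patterns, the extra all-mirrors-on reading used to turn on/off masks into plus/minus patterns, and the choice of orthogonal matching pursuit, a standard greedy algorithm for finding sparse solutions.

```python
import random

def solve(G, b):
    """Solve the small square system G c = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (M[i][n] - sum(M[i][j] * c[j] for j in range(i + 1, n))) / M[i][i]
    return c

def omp(S, y, sparsity):
    """Orthogonal matching pursuit: greedily pick the pixels that best explain y."""
    m, n = len(S), len(S[0])
    support, coef, residual = [], [], y[:]
    for _ in range(sparsity):
        # Correlate each pixel's column of S with what is still unexplained.
        corr = [abs(sum(S[i][j] * residual[i] for i in range(m))) for j in range(n)]
        support.append(max(range(n), key=corr.__getitem__))
        # Least-squares fit on the chosen pixels via the normal equations.
        G = [[sum(S[i][a] * S[i][b] for i in range(m)) for b in support]
             for a in support]
        rhs = [sum(S[i][a] * y[i] for i in range(m)) for a in support]
        coef = solve(G, rhs)
        residual = [y[i] - sum(c * S[i][a] for c, a in zip(coef, support))
                    for i in range(m)]
    image = [0.0] * n
    for c, a in zip(coef, support):
        image[a] = c
    return image

random.seed(0)
n_pixels, n_patterns = 256, 64          # 16 x 16 scene, a quarter as many readings
scene = [0.0] * n_pixels
for j in random.sample(range(n_pixels), 3):
    scene[j] = 1.0                      # three bright points on a dark background

# Each pattern switches roughly half the mirrors on (1) and half off (0).
masks = [[random.randint(0, 1) for _ in range(n_pixels)] for _ in range(n_patterns)]
readings = [sum(b * x for b, x in zip(mask, scene)) for mask in masks]
total = sum(scene)                      # one extra reading with every mirror on

# Re-express the 0/1 masks as +/-1 patterns, which suit the greedy search better.
S = [[2 * b - 1 for b in mask] for mask in masks]
y = [2 * r - total for r in readings]

recovered = omp(S, y, sparsity=3)
worst_error = max(abs(a - b) for a, b in zip(recovered, scene))
print(worst_error)
```

The toy scene is far smaller and sparser than a real image, so the ratio of readings to pixels here says nothing about the 10 per cent figure quoted for the camera. The point is only that the single detector's readings, together with knowledge of the patterns, are enough to locate the bright pixels.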
Interestingly, further data reduction is possible using the micromirror array for direct image processing such as tracking, target-detection, and solar exclusion. "For instance, if there is sun in your field of view, you can turn off those pixels automatically at the DMD," explains McMackin. Some initial processing is needed to find the region of interest (the micromirror is not smart enough to do that), but it is minimal because of the small amount of data captured.
The results are fed back to the DMD and used for adjusting the mirrors from then on. "You don't necessarily need to reconstruct an image in every case. There are pieces of information you can find just by processing the raw data," she adds.
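Solar exclusion at the mirror array amounts to forcing some mirrors off regardless of the sampling pattern. The following is a rough sketch under invented conditions: the 64-pixel scene, the brightness values and the set of sun pixels are all hypothetical, and in practice the region would come from the initial processing step described above.

```python
import random

random.seed(1)
n_pixels = 64
scene = [random.random() for _ in range(n_pixels)]   # hypothetical dim scene
sun_pixels = {10, 11, 18, 19}                        # hypothetical flagged region
for j in sun_pixels:
    scene[j] = 1000.0                                # the sun swamps everything else

# An ordinary random sampling pattern: 1 = mirror on, 0 = mirror off.
pattern = [random.randint(0, 1) for _ in range(n_pixels)]

# Force the mirrors covering the sun to stay off, whatever the pattern says.
masked = [0 if j in sun_pixels else b for j, b in enumerate(pattern)]

raw_reading = sum(b * x for b, x in zip(pattern, scene))
masked_reading = sum(b * x for b, x in zip(masked, scene))
print(raw_reading, masked_reading)
```

Because the exclusion happens before the light reaches the detector, the bright region never enters the measurements and needs no later processing to remove.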
The first generation of cameras can capture an image every three seconds. A video version (offering up to 15 frames per second), combining three single-pixel cameras, is due for launch soon. Next up is the intriguing idea of adding single-pixel detectors sensitive to other wavelengths such as UV, visible, IR, and terahertz, to provide a highly data-efficient way of detecting objects and materials by their unique electromagnetic 'fingerprint'.
In the UK, Edinburgh University's Compressed Sensing Research Group, led by Professor Mike Davies, has been applying compressive sampling techniques to various applications. "The ideas in compressive sampling present a philosophical change: in effect the reconstruction algorithm guys are telling the sensing people what they should measure," Davies explains. Recently, they accurately measured cardiac blood flow using an MRI scan of the profile of an artery in the neck. MRI is a notoriously slow imaging approach – up to an hour for a 3D brain scan – and if the patient moves at all, the image gets blurred.
Using compressive sampling, the Edinburgh team could scan eight to ten times faster than usual, using a specially designed measurement sequence in each of multiple progressive scans, generating, in effect, a moving image of cardiac blood flow.
UWB radar imaging
In another project with the Ministry of Defence's Science & Technology Laboratory at Porton Down, the Edinburgh team has explored applying CS to improving ground images acquired from aircraft using ultra wideband (UWB) synthetic aperture radar. "UWB radar imaging from a plane has to deal with holes in the spectrum due either to the existence of licensed bands, such as TV signals, taking up the space, or perhaps from gaps caused by jamming signals," explains Edinburgh University's Professor Mike Davies. He and his team have shown that the ideas behind compressed sampling can significantly improve UWB radar image quality, because the images of interest are sparse but have structure (the researchers used examples like military vehicles with some surrounding clutter). The ideas are now being evaluated on working aircraft.
Dedicated silicon chips for CS have taken some time to appear, but the last couple of years have seen some promising results. Emmanuel Candès and colleagues at Stanford have been working with researchers from Caltech and the company Northrop Grumman in a DARPA-funded project called Analog-to-Information (A-to-I).
The goal was to make CS-inspired analogue to digital converter (ADC) chips suitable for use in small, light, power-constrained airborne systems that might want to capture and transmit signals buried in a wide band of radio spectrum: 2.5GHz of bandwidth was the target which, according to Nyquist, would usually require a digitisation rate of five billion times per second, generating a DVD's-worth of data every second. But because the radio signals of interest are typically sparse (for instance, the way GSM towers allocate bands to minimise interference means that the GSM spectrum is relatively empty), compressive sampling theory says that a much lower sampling rate will work without information loss.
In 2011 the researchers produced their first chips, and showed that they could sample 2.5GHz at 12.5 times lower than the Nyquist rate and with lower power. A useful device to emerge from this is a 90nm CMOS 'universal' encoder, known as the random modulator pre-integrator (RMPI), which achieves 2GHz bandwidth while digitising at 320Msps. While a traditional ADC digitises physical voltage levels, which represent desired information, the RMPI is more like a conventional RF/baseband architecture or analogue pre-processor.
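The arithmetic behind these figures is short enough to check directly. In the sketch below, the one-byte-per-sample figure is an assumption made to test the DVD comparison, 4.7GB is the capacity of a single-layer DVD, and the function name is invented for the example. The 12.5 ratio is the one implied by the RMPI's quoted 2GHz bandwidth and 320Msps output.

```python
def nyquist_rate(bandwidth_hz):
    """Minimum sampling rate, in samples a second, for the given bandwidth."""
    return 2 * bandwidth_hz

# The 2.5GHz project target.
target_rate = nyquist_rate(2_500_000_000)

# Assumption: one byte per sample, compared with a single-layer DVD.
bytes_per_second = target_rate * 1
dvd_bytes = 4_700_000_000
dvds_per_second = bytes_per_second / dvd_bytes

# The RMPI: 2GHz of bandwidth digitised at 320 million samples a second.
rmpi_reduction = nyquist_rate(2_000_000_000) / 320_000_000

print(target_rate, dvds_per_second, rmpi_reduction)
```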
Described in a paper published in 2012, the eight-channel RMPI chip encodes compressed samples by modulating the input signal with a pseudo-random number generator (one per channel). Each channel is in effect implemented as a modified direct down-conversion receiver with the oscillator replaced by the pseudo-random number generator. Applying a similar approach, Yusuke Oike of Sony and Abbas El Gamal of Stanford have designed a 256 x 256 pixel CS-based CMOS image sensor that works by summing random combinations of analogue pixels during analogue-to-digital conversion.
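The modulate-and-integrate idea can be imitated in a few lines. This is a loose software analogy rather than a model of the chip: the window of 100 input samples per output, the test tone and the function name rmpi_encode are all assumptions, with the window length picked so that eight channels give the same 12.5-fold reduction quoted for the RMPI.

```python
import math
import random

def rmpi_encode(signal, n_channels=8, window=100, seed=0):
    """Mix the input with +/-1 chip sequences and integrate over each window."""
    # One independent pseudo-random generator per channel.
    rngs = [random.Random(seed + ch) for ch in range(n_channels)]
    samples = []
    for start in range(0, len(signal) - window + 1, window):
        block = signal[start:start + window]
        for rng in rngs:
            chips = [rng.choice((-1, 1)) for _ in block]
            # Multiply by the chips, then sum: one output per channel per window.
            samples.append(sum(c * x for c, x in zip(chips, block)))
    return samples

# Hypothetical input: a single tone, which is sparse in frequency.
signal = [math.cos(2 * math.pi * 0.123 * t) for t in range(1000)]
samples = rmpi_encode(signal)
reduction = len(signal) / len(samples)
print(len(samples), reduction)
```

Each output is a random combination of many input samples, which is the property a sparse reconstruction algorithm needs in order to recover the original signal from the shorter record.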
CS is not the only game in town. Analogue computing in various guises is also emerging as a way of processing analogue sources more efficiently. For instance, Intel Labs and researchers at Purdue University (in Indiana) have developed a memristor-based analogue computer chip design outlined in a paper 'Ultra Low Energy Analog Image Processing Using Spin Based Neurons' in April 2013. The architecture, which could be used to perform analogue computation for common image sensing and processing applications, is described as providing 'on-sensor' image processing. Simulations of feature extraction, halftone compression, and digitisation suggest that the design can achieve around 100 times reduction in computation energy, compared with conventional mixed-signal image acquisition and processing chips.
Meanwhile, a research group at Columbia University has developed a programmable 'hybrid discrete continuous architecture' (HDCA), which operates like a kind of two-way data device. An HDCA has special instructions to allow computations to be sent to either analogue or digital units and to garner feedback from either side about the computation. The programmer, compiler, or the hardware, can adapt their programs based on this feedback.
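One way to picture this division of labour is a fast but imprecise unit that supplies a starting point, and a precise unit that polishes it. The sketch below is only an analogy in software, not the HDCA's instruction set: the 5 per cent error assigned to the stand-in analogue unit, the square-root task and both function names are assumptions.

```python
import random

def analogue_sqrt(a, rng, relative_error=0.05):
    """Stand-in for an analogue unit: quick, but only good to a few per cent."""
    return (a ** 0.5) * (1 + rng.uniform(-relative_error, relative_error))

def digital_refine(a, guess, tolerance=1e-12):
    """Newton's method: refine a guess at the square root of a."""
    x, steps = guess, 0
    while abs(x * x - a) > tolerance * a:
        x = 0.5 * (x + a / x)
        steps += 1
    return x, steps

rng = random.Random(0)
a = 1_000_000.0

# Digital refinement starting from the approximate analogue answer.
warm_root, warm_steps = digital_refine(a, analogue_sqrt(a, rng))

# The same refinement starting cold, with no analogue help.
cold_root, cold_steps = digital_refine(a, a)
print(warm_steps, cold_steps)
```

Both routes reach the same answer, but the refinement that starts from the approximate value needs far fewer iterations. That is the kind of feedback a program could use to decide where to send a computation.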
"The analogue side can quickly compute an approximate solution and give it to the digital side for refinement," explains Professor Simha Sethumadhavan who with colleagues Mingoo Seok and Yannis Tsividis is building a prototype chip based on a custom silicon design. "Similarly the digital side can decide that the computation it has been tasked with does not need capabilities offered by the digital domain and shuttle the work to the analogue unit."