vol 9, issue 6

Cyber-physical systems begin to embrace time

16 June 2014
By Chris Edwards
MIRA is researching new ways to make self-driving vehicles co-operate on the road so they can form platoons

New safety systems will address issues such as other autonomous vehicles approaching drivers' blind spots

Google has begun building a fleet of experimental electric-powered cars that will have a stop-go button but no controls

A research emphasis on time-stamping will give local clocks in any given circuit a consistent view of the time

Timing information trickles down through a series of repeated messages to keep systems synchronised

Despite claiming real-time support, many embedded systems only have a passing relationship with the clock. That is beginning to change as designers try to build safer systems.

Almost 11 years ago, the electrical network in the north-east of the USA blacked out for hours as problems with distribution equipment rippled through the network. As investigators pored over the data that might reveal the cause, they quickly ran into one major problem: none of it gave easy clues as to when it had been recorded.

The report claimed: "The Task Force's investigators laboured over thousands of data items to determine the sequence of events, much like putting together small pieces of a very large puzzle. That process would have been significantly faster and easier if there had been wider use of synchronised data-recording devices."

The following February, the North American Electric Reliability Council issued the recommendation that the data-acquisition devices inside power plants and substations should be time-synchronised so that the data they record can easily be assembled on a consistent timeline. The move represents part of a trend to stop time being the forgotten component of embedded systems as they link up as parts of the Internet of Things (IoT).

Because many of these systems will involve some level of motor or robotic control – a class of computerised mechanisms known as cyber-physical systems – safety plays a major role in the decision to look much more closely at timing.

Sending signals

Anthony Baxendale, manager for future transport technologies at research institute MIRA, said at the Future World Symposium, organised by the NMI, that research is progressing on making road vehicles co-operate on the road so they can form more efficient platoons on motorways and eventually self-navigate around cities. "The high degrees of system autonomy bring major challenges with testing and ensuring reliability and safety," he said.

Long before vehicles become self-guided, they will incorporate computerised safety systems designed to warn drivers of impending problems, such as approaching vehicles in a blind spot.

Industrial control systems are already becoming more distributed as intelligence moves down to the motors and actuators themselves. This is the thinking behind robots such as those made by Synapticon. Each joint has its own microcontroller that can make local decisions about how to drive the motor, based on commands sent from a co-ordinating computer.

"Telling a biological arm to perform an action is done using an abstract command rather than giving each muscle multiple separate commands," says Nikolai Ensslen of Synapticon.
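As a rough sketch of that division of labour (all names here are invented for illustration, not Synapticon's API), a co-ordinating computer might send each joint nothing more than a target angle, leaving the local microcontroller to close its own control loop:

/* Hypothetical co-ordinator: one abstract goal per joint, no low-level
   motor commands. Each joint's microcontroller handles current control,
   ramping and fault reactions locally. */

typedef struct { int joint_id; double target_angle_rad; } joint_cmd_t;

extern void network_send(int joint_id, joint_cmd_t cmd);  /* assumed transport */

void move_arm(const double *angles, int njoints)
{
    for (int j = 0; j < njoints; j++) {
        joint_cmd_t cmd = { j, angles[j] };
        network_send(j, cmd);   /* 'move to this angle' - the how is decided locally */
    }
}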

Devolved intelligence brings challenges with it – the main one being what happens when two controllers conflict, which is entirely possible if one or even both of them are dealing with out-of-date data.

In a distributed system no process or node has an up-to-date and consistent view of the system's global state. Although a node may have a model that it can regard as up to date, there may be changes elsewhere to which it has not yet been alerted.

A centralised system can have a more complete picture, although it may unwittingly act on out-of-date data simply because a sensor reading from a remote location has not reached it by the time it makes a decision.

Systems like this can work – and have done for years – without explicit timing information because they have a clear view of the order in which signals arrive from peripherals. However, multitasking systems can be prone to race conditions caused by interrupts from peripherals being processed in the 'wrong' order: a high-priority task may handle a later interrupt first, because it can pre-empt a less important task assigned to process a signal that happened to arrive earlier.
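One common defence is to capture a timestamp at the moment each interrupt fires, so that later processing can reconstruct the true order of arrival regardless of pre-emption. A minimal sketch, assuming a free-running hardware counter and an event queue (hw_counter_read and queue_push are hypothetical names):

#include <stdint.h>

typedef struct {
    uint64_t timestamp;   /* time of arrival, read from a free-running counter */
    uint8_t  source;      /* which peripheral raised the interrupt */
} event_t;

extern uint64_t hw_counter_read(void);   /* assumed hardware timer */
extern void     queue_push(event_t e);   /* assumed interrupt-safe queue */

/* Each ISR records *when* its signal arrived, not when it was processed. */
void uart_isr(void)   { queue_push((event_t){ hw_counter_read(), 1 }); }
void sensor_isr(void) { queue_push((event_t){ hw_counter_read(), 2 }); }

/* The consumer task can then order events by timestamp, so pre-emption no
   longer determines the order in which events appear to have occurred. */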

Getting it right

Distributed systems make race conditions even more apparent, leading systems designers to look for more deterministic models of the world that let them cope with real-world conditions more effectively.

Simon Knowles, chief technology officer at chipmaker Xmos, which makes processors that are designed to handle I/O deterministically, says: "Determinism helps both security and safety. Having composable software so that when you add something it doesn't disturb the timing of another component is crucial to the safety world."

Rahman Jamal, European director of marketing for National Instruments, says: "Timing is the crux of the matter. You need a different paradigm that has timing embedded in it. As we move into cyber-physical systems, timing becomes centre stage." 

The idea is not new. More than 20 years ago, MIT's Ford professor of engineering Barbara Liskov argued in a paper for the journal Distributed Computing: "Synchronised clocks are interesting because they can be used to improve the performance of distributed algorithms. They make it possible to replace communication with local computation."

Instead of a remote node having to ask a master whether a condition holds, it can work out the answer itself based on information from the master node's past and the current time, according to the node's own clock. As long as the clocks are synchronised, the two nodes will agree on the outcome.
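Leases are the classic example of this trade. Rather than ask the master on every access whether a cached right is still valid, a node checks its own clock against an expiry time the master granted earlier. A minimal sketch, assuming clocks synchronised to within a known bound (the names are illustrative):

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t expires_at;   /* absolute expiry time granted by the master */
} lease_t;

extern uint64_t local_clock_now(void);   /* kept in sync with the master */

/* No network round trip: as long as the clocks agree to within max_skew,
   master and replica reach the same verdict about whether the lease holds. */
bool lease_valid(const lease_t *l, uint64_t max_skew)
{
    return local_clock_now() + max_skew < l->expires_at;
}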

Professor Hermann Kopetz of the Vienna University of Technology has argued for the inclusion of time in embedded systems behaviour for years, helping to drive the adoption of time-triggered networks in automotive systems during the past decade. He says: "In embedded system design, physical time is a first-order citizen that must be part of any realistic system model."

Some areas embraced the idea of timing-based networking and control years ago. In many cases, the decision to use timing was to provide a reasonable trade-off between complexity and performance.

Timed communications

Audio processing has relied for years on time-triggered communications, from the early telephone network through to music studio tools such as the DSP networks inside Avid Pro Tools' TDM hardware. In Pro Tools TDM – but not in the version that runs on 'native' hardware inside a PC or Mac – all communication and processing is timesliced. Each DSP works for a fixed length of time on a given task then, at the end, delivers its data to the network using its dedicated time slot.
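In outline, each processor's work is pinned to the sample clock, with output handed over only in its own slot. The sketch below is illustrative rather than a description of Avid's hardware; wait_until, bus_write and process_block are assumed primitives:

#include <stdint.h>

#define SAMPLE_RATE_HZ  48000
#define SLOTS_PER_FRAME 8        /* hypothetical: 8 DSPs share the bus */

extern void    wait_until(uint64_t t_ns);          /* assumed timer primitive */
extern void    bus_write(int slot, int32_t sample);/* assumed TDM bus access */
extern int32_t process_block(void);                /* this DSP's work */

void dsp_loop(int my_slot)
{
    const uint64_t frame_ns = 1000000000ull / SAMPLE_RATE_HZ;
    const uint64_t slot_ns  = frame_ns / SLOTS_PER_FRAME;
    uint64_t frame_start = 0;
    for (;;) {
        int32_t out = process_block();   /* must finish before our slot opens */
        wait_until(frame_start + my_slot * slot_ns);
        bus_write(my_slot, out);         /* delivery in the dedicated slot */
        frame_start += frame_ns;
    }
}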

Systems that do not rely on sample accuracy avoid this kind of time-division multiplexing because it lowers overall utilisation: to ensure that the software always completes within its time slot, programmers will often leave plenty of headroom, which translates into wasted processor cycles.

Systems that have to guarantee safe, dependable behaviour generally also resort to some form of strict time-slicing because it avoids the risk of tasks failing to be scheduled in time to meet a deadline. In this scheme, important tasks may have nothing to do, but they are guaranteed a number of cycles within each period, just in case they do need to respond to a problem. ARINC 653 avionics systems have used this approach for years: each piece of software in an ARINC 653 system has its own dedicated, protected memory space, and the APEX operating-system interface provides a guaranteed time slice to each one.
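The shape of such a schedule can be captured in a simple table. The values below are invented for illustration – real ARINC 653 configurations are expressed in configuration files rather than C – but the principle is the same: every partition gets its window in every major frame, busy or idle.

#include <stdint.h>

typedef struct {
    const char *partition;   /* isolated application with its own memory space */
    uint32_t offset_ms;      /* start within the major frame */
    uint32_t duration_ms;    /* guaranteed CPU time, used or not */
} window_t;

/* A hypothetical 50ms major frame: flight control always gets its 20ms. */
static const window_t schedule[] = {
    { "flight_control",  0, 20 },
    { "navigation",     20, 15 },
    { "maintenance",    35, 15 },
};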

Accurate timing

Digital hardware design extends the idea of rigid time-slicing down to the nanosecond level. The approach restricts design flexibility and has led to more and more energy and design effort being applied to maintaining a consistent clock from one side of a chip to the other, because the time it takes for a signal to cross the chip is often longer than the clock period itself.

The design style is used almost everywhere in digital electronics because it greatly simplifies verification. The designs may not be optimal, but at least they get finished. Researchers such as Professor Steve Furber of the University of Manchester have demonstrated greater power efficiency from asynchronous circuitry, but the difficulty of verification keeps many electronics engineers from applying the techniques.

Easier verification provides a strong incentive to move to time-based systems in a wider range of distributed systems. But, rather than attempt to lock systems to a regular clock pulse in the way that TDMA networks and digital hardware do, the emphasis is now on time-stamping communications so that local clocks have a consistent view of the time.

At the University of Illinois, Professors Lui Sha and Jose Meseguer have worked on methods to apply synchronisation to a range of distributed-systems problems. They call their technique physically asynchronous, locally synchronous (PALS) design; it demands that the clocks on each controller remain in sync with the others in the system.

For a dual-redundant flight guidance system prototype, the results showed that PALS reduced verification time from 35 hours to less than 30 seconds compared with a traditional, purely event-driven design.
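The heart of the pattern is that every controller runs the same periodic loop against its synchronised clock, so the replicas step through logical rounds together. A much-simplified sketch: the period must exceed the sum of worst-case clock skew, computation time and message delay, and all names here are illustrative rather than from the published PALS framework.

#include <stdint.h>

#define PERIOD_MS 10   /* must exceed max skew + compute + network delay */

extern uint64_t synced_clock_ms(void);   /* kept in sync with the other nodes */
extern void sleep_until_ms(uint64_t t);
extern void read_inputs(void);           /* messages from the previous round */
extern void compute_and_send(void);      /* outputs take effect next round */

void pals_loop(void)
{
    uint64_t round = synced_clock_ms() / PERIOD_MS + 1;
    for (;;) {
        sleep_until_ms(round * PERIOD_MS);   /* all nodes wake 'simultaneously' */
        read_inputs();
        compute_and_send();
        round++;
    }
}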

"Synchronised clocks allow events to happen simultaneously. This not only makes it possible to simplify verification and conformance testing, but also makes it possible to design protocols that are much safer, because events will occur at specified times, rather than based upon message receptions, which may be unreliable," says Columbia University Professor Nick Maxemchuk, who has worked on an experimental protocol based on timing synchronisation for automated vehicles.

"In our communications, all messages are scheduled, so that all cars know when they have missed a message, and can begin recovery. Not sending a scheduled message is a reliable way to inform the other cars that you cannot take part in the manoeuvre."

On the road

Prof Maxemchuk uses the example of a car moving into a lane between two other vehicles. Under the timed protocol, the three cars – which have their clocks synchronised using GPS or IEEE1588 – agree to communicate up to a deadline time. If the manoeuvre does not complete before the deadline, the cars abort the procedure so they can start afresh or deal with other vehicles. If the cars transmit messages that signal they can go ahead before the deadline expires – and those messages are not missed by any of the cars – the manoeuvre can complete.
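Reduced to its essentials, the commit rule looks something like the sketch below. The function names are invented for illustration; the real protocol involves several rounds of scheduled messages.

#include <stdbool.h>
#include <stdint.h>

extern uint64_t synced_time_ms(void);              /* clocks synced via GPS or IEEE 1588 */
extern bool all_go_ahead_messages_received(void);  /* one from every car involved */

/* Commit only if every 'go-ahead' arrived before the agreed deadline.
   A missing scheduled message is itself information: abort and start afresh. */
bool may_complete_manoeuvre(uint64_t deadline_ms)
{
    if (synced_time_ms() >= deadline_ms)
        return false;                              /* deadline passed: abort */
    return all_go_ahead_messages_received();
}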

Without a deadline and synchronised clocks, it would be far harder to establish whether all of the cars could complete the manoeuvre – an indication of how important synchronisation is in ensuring correct behaviour.

Although timing has been neglected for years in many systems, the focus on easier verification as well as on safety in cyber-physical systems means that industrial, automotive and other networks will incorporate support for timestamping and synchronisation into their infrastructure.


Keeping networks in sync

How do systems actually know what the time is? Delays and even general relativity play major roles in making the communication of time more complex than it might appear.

Even when reset together, the clocks on most microcontroller-based systems soon drift out of sync. A quartz oscillator's frequency shifts with changes in temperature: lower-cost crystals typically follow a parabolic curve centred close to room temperature, so unless the device is a specialised, thermally controlled version designed to limit clock drift, small changes in temperature will cause the oscillator to run slower. A 10°C rise can cost the clock around three seconds per day.
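The conversion between frequency error and lost time is direct: parts per million of error map straight on to seconds lost per day. A quick back-of-envelope check – the 35ppm figure here is inferred from the numbers above, not taken from any datasheet:

#include <stdio.h>

int main(void)
{
    double error_ppm = 35.0;            /* assumed offset after a 10°C swing */
    double seconds_per_day = 86400.0;
    /* 35 ppm of 86,400 seconds is roughly three seconds a day */
    printf("drift: %.1f s/day\n", error_ppm * seconds_per_day / 1e6);
    return 0;
}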

To limit these inaccuracies, the clocks in time-aware systems need to adjust continually and resynchronise to a master. The aim is to overcome the delays inherent in asking the time of a clock that might be miles away – delays that limit older protocols, such as the Network Time Protocol (NTP) used to set the clocks of Internet-connected computers automatically, to millisecond-level accuracy.

IEEE 1588 grew out of techniques used in older telecom networks to set time, although it adds more informal techniques that reflect the erratic growth of the Internet. Almost all digital telecom systems require precise clock synchronisation to avoid missing bits as data passes from switch to switch, although protocols such as SDH contain buffer intervals to compensate for small amounts of clock drift. Because Ethernet is taking over from SDH in high-bandwidth backbones, IEEE 1588 is becoming the leading standard for network-based timing synchronisation, although it may not make it as far as the radio links between road vehicles.

At the root of any IEEE 1588 system is the grandmaster clock, which is normally an atomic clock. From this grows a tree of slave clocks that are either boundary clocks, which exist mainly to provide accurate time information locally, or ordinary clocks, which sit inside the computers or embedded systems themselves. Timing information ripples down the tree using a series of message exchanges that must be repeated at regular intervals to keep all the systems synchronised.

The master clock – the more accurate clock of any pair – first sends out a sync message, following it shortly afterwards with a follow-up that carries the precise time at which the sync message left. The slave records when the sync arrived, then sends the master a delay-request message; the master replies with the time at which that request reached it. The slave ends up with four timestamps – two recorded locally and two supplied by the master – and from these works out how to alter its internal clock to match the master as closely as possible.
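From those four timestamps, and on the assumption that the network delay is the same in both directions, the slave can solve for both its offset from the master and the one-way path delay. A sketch using the conventional t1..t4 labels (the structure here is illustrative; the standard defines its own message formats):

#include <stdint.h>

typedef struct {
    int64_t t1;  /* master: time the sync message left (carried in the follow-up) */
    int64_t t2;  /* slave:  time the sync message arrived, by its own clock */
    int64_t t3;  /* slave:  time the delay-request left */
    int64_t t4;  /* master: time the delay-request arrived (in the reply) */
} ptp_exchange_t;

/* t2 - t1 = delay + offset and t4 - t3 = delay - offset, so: */
int64_t ptp_offset(const ptp_exchange_t *e)
{
    return ((e->t2 - e->t1) - (e->t4 - e->t3)) / 2;
}

int64_t ptp_delay(const ptp_exchange_t *e)
{
    return ((e->t2 - e->t1) + (e->t4 - e->t3)) / 2;
}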

The protocol results in timing synchronisation accurate to tens of nanoseconds over a local area network, where cable runs typically extend no further than 100m or so. Careful design can bring this down to under 10ns and, with specialised hardware support, to less than 1ns across links up to 10km long, as Cern has demonstrated with its White Rabbit extension to IEEE 1588.

Because Cern opted for an approach based on existing standards, a number of labs have begun to install their own White Rabbit-based systems and extend the applications for the network. By relaxing the timing constraints, scientists in Finland demonstrated the protocol on a 1,000km link between two labs. Although the delay was more than 10ms, the daily variation in timing was kept to less than 80ns.

Although it was not designed for the job, there is a competitor to IEEE 1588 that is likely to be more commonly used in wireless systems. Rather than rely on two-way transactions, which would be impractical between low-power devices on the ground and orbiting satellites, the Global Positioning System (GPS) and similar systems such as the Russian GLONASS and European Union's Galileo, rely on the ability of the slaves to interpret timing signals from more than one source.

The ability to combine signals from as many satellites as possible is vital for determining position, and with GPS it also keeps receivers' clocks consistent with the atomic sources on each of the orbiting satellites. Until it has a good idea of its position, a GPS receiver cannot be sure how long each satellite's timing signal is taking to reach it.

The problem with using GPS as a timing source is that satellites need to be in view to maintain synchronisation. This limits its use to outdoor systems, but by avoiding the need for constant two-way communication with reference clocks, GPS has advantages for transportation systems. IEEE 1588 is more likely to be used in industrial networks, where consistent communication over a network can be assured.

Ethernet learns to keep time

When it was first introduced, Ethernet was designed for low cost and easy setup. Any node on the network could try to send a packet at any time – and would often find that it had picked a time when another node was trying to do the same thing. The result on the original Ethernet, which used a bus topology in which nodes share access to a single cable, was a collision.

Whenever a collision occurred, the offending network adapters would back off, wait and try again. To avoid colliding again, each adapter generated a random number to use as its delay time. This made Ethernet a very unreliable network for communicating time data: the sender had very little idea of how long a timestamped message would take to reach its destination.
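The backoff rule itself is simple. The sketch below follows the classic truncated binary exponential backoff of 10Mbit/s Ethernet, where the slot time is 512 bit times; the randomness is exactly what makes delivery time unpredictable.

#include <stdlib.h>
#include <stdint.h>

#define SLOT_TIME_US 51.2   /* 512 bit times at 10 Mbit/s */

/* After the nth collision, wait a random number of slot times in
   [0, 2^min(n,10) - 1] before retrying. */
double backoff_us(int collisions)
{
    int exponent = collisions < 10 ? collisions : 10;
    uint32_t slots = (uint32_t)(rand() % (1u << exponent));
    return slots * SLOT_TIME_US;
}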

Forty years later, practically no Ethernet networks use a bus. Instead, switches provide connections to each node and the switch arbitrates access to the network. As the twisted-pair wiring that carries the signals has multiple cores, both the switch and the network adapter in a computer can transmit simultaneously, which brings network delays down even further.

Industrial, avionic and automotive networks are going further through the introduction of forms of Ethernet that are either fully synchronous, embedding a high-accuracy clock in their signals, or that attach timestamps to each packet they transmit and can provide guaranteed time slots. The latter is the approach taken by versions of Ethernet such as SAE AS6802, developed for avionic systems. The former is represented by Synchronous Ethernet (SyncE), which distributes its clock through the physical layer and is being inserted into telecom networks as a replacement for time-sliced Synchronous Digital Hierarchy equipment.

Timing is extending to higher layers of the Internet. Google's Spanner project attempts to provide a sense of global time to client software through an API called TrueTime, tackling the problem of delivering accurate timing over long distances.

If an application asks TrueTime for the time 'now', the software responds with a bounded range that is guaranteed to contain the absolute time at which the function was invoked. It is up to the application to decide how to treat that uncertainty. It may simply wait until it can be sure that a target time has passed – which is what happens in the Spanner database manager: it delays transactions to compensate for the uncertainty.
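Google's published description of TrueTime has its 'now' call return exactly such an interval; the C rendering below is an illustrative sketch, not Google's API:

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t earliest;   /* the absolute time is guaranteed to be >= this */
    uint64_t latest;     /* ...and <= this */
} tt_interval_t;

extern tt_interval_t tt_now(void);   /* assumed binding to a TrueTime-style service */

/* A timestamp has definitely passed everywhere once even the most
   pessimistic reading of the clock is beyond it. */
bool definitely_after(uint64_t timestamp)
{
    return tt_now().earliest > timestamp;
}

Spanner's 'commit wait' is this check in action: a transaction's timestamp is only made visible once even the most pessimistic clock reading agrees it lies in the past.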
