We are moving toward a world where every connected system is becoming safety critical - so ICT professionals should step up to the principles of ultra-rigorous system design and build.
It has become increasingly important for technologists of all disciplines to focus on cross-sector communities and activities that may not fit neatly into standard professional vertical sectors. For instance, trends such as ICT convergence and the emergence of 'smart' products - including smart buildings, smart cars, smart transport, smart grids and smart factories, among others - need dependable IT-enabled control systems that are assuredly safe and secure.
However, as consideration of the issues meets aspects of 'criticality', further questions arise: do we need dependability, or ultra-dependability? Embedded systems are now pervasive in our societies. Used in consumer electronics, transport, buildings, manufacturing, military, satellites and more, embedded systems represent the largest segment of contemporary computing.
Yet arguably, despite our increasing reliance on 'smart' gadgets, they are not fundamentally required to be totally 'dependable'. However, many equally numerous systems are more critical, especially those that are safety-related, safety-critical or form part of society's critical infrastructure. Automotive electronic control units, safety-related systems used to protect operators or the public from machinery, amusement ride safety, industrial chemical processing safety systems and aircraft safety-critical systems are examples where dependability is a requirement. Commercial jet aircraft utilisation is rising to meet our ever increasing demand for cheaper air travel - and to justify airline economics. Aircraft depend on many embedded systems to fly, and the failure of any embedded system may potentially cause disaster. Aircraft may operate for 30 years or longer and, given tens of thousands of planes flying, the world's fleet operating hours are huge, which in turn dramatically increases the chances of an error occurring during operation. In such situations, where systems are required to be practically flawless, embedded systems designers must ensure that their systems are both dependable and 'failsafe'.
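The fleet-exposure argument can be made concrete with some simple arithmetic. The sketch below assumes a constant failure rate (an exponential model), which is a simplification, and every figure in it is a hypothetical value chosen for illustration rather than data from any airline, manufacturer or regulator.

```python
import math


def fleet_hours(aircraft: int, hours_per_year: float, years: float) -> float:
    """Total operating hours accumulated across a whole fleet."""
    return aircraft * hours_per_year * years


def prob_at_least_one_failure(rate_per_hour: float, exposure_hours: float) -> float:
    """P(at least one failure) = 1 - exp(-rate * hours), assuming a constant rate."""
    return -math.expm1(-rate_per_hour * exposure_hours)


# Hypothetical inputs, for illustration only.
AIRCRAFT = 20_000        # "tens of thousands of planes"
HOURS_PER_YEAR = 3_000   # assumed utilisation per aircraft
SERVICE_YEARS = 30       # "30 years or longer"
RATE = 1e-9              # assumed failures per operating hour

exposure = fleet_hours(AIRCRAFT, HOURS_PER_YEAR, SERVICE_YEARS)
single = prob_at_least_one_failure(RATE, HOURS_PER_YEAR * SERVICE_YEARS)
fleet = prob_at_least_one_failure(RATE, exposure)

print(f"fleet exposure: {exposure:.2e} hours")
print(f"one aircraft over its life: {single:.2e}")
print(f"whole fleet over the same period: {fleet:.2f}")
```

Under these assumed numbers, an event that is extremely unlikely for any single aircraft becomes more likely than not across the fleet, which is the point the paragraph above makes about exposure.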
For instance, the UK Ministry of Defence funded a project called SafSec with the aim of, "reduc[ing] the cost and effort of safety certification and security accreditation for future military avionics systems". The SafSec project demonstrated that there are strong parallels between safety and security certification, and trials on avionics showed benefits in terms of speed and cost of accreditation.
Avionics and military approaches consider the dependability of safety-critical systems, expressed as a set of attributes that describe the characteristics or behaviour of a system. These dependability attributes include:
- safety - the likelihood that a system will not cause harm;
- reliability - the probability that the system will deliver services as expected;
- availability - the probability that a system will be available to users when required;
- integrity - ensures that data is correct and not tampered with;
- confidentiality - prevents the disclosure of information to unauthorised persons;
- maintainability - the ability to introduce new features to a system, whilst having a low probability of introducing programming errors;
- survivability - the ability of a system to deliver services while under attack or partially disabled (particularly important for networked systems).
These attributes combine the two distinct approaches of safety assurance and information assurance, otherwise now called 'cyber' security.
Safety methodologies are generally concerned with non-malicious faults, and how these can be avoided or mitigated. They consider the likelihood that an event will occur (probability or frequency) and the severity of the resulting accident. Conversely, security addresses malicious attacks on a system, by identifying the threat actors and sources, compromise methods and vulnerabilities that may be exploited. Security analysis assesses the likelihood that a threat will exploit a vulnerability, leading to a (business) impact.
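Both disciplines therefore arrive at a risk judgement from a likelihood and a consequence. A minimal sketch of that shared shape is given below; the category names, the multiplicative scoring and the thresholds are hypothetical assumptions for illustration and are not taken from IEC 61508 or any security standard.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    """Hypothetical likelihood categories (probability or frequency)."""
    IMPROBABLE = 1
    REMOTE = 2
    OCCASIONAL = 3
    FREQUENT = 4


class Severity(IntEnum):
    """Hypothetical consequence categories (accident severity or impact)."""
    NEGLIGIBLE = 1
    MARGINAL = 2
    CRITICAL = 3
    CATASTROPHIC = 4


def risk_score(likelihood: Likelihood, severity: Severity) -> int:
    """Combine likelihood and consequence into a single illustrative score."""
    return int(likelihood) * int(severity)


def risk_class(score: int) -> str:
    """Map a score to an assumed tolerability band."""
    if score >= 12:
        return "intolerable"
    if score >= 6:
        return "tolerable only if reduced as far as practicable"
    return "broadly acceptable"


# The same consequence, judged first as a random fault and then as a
# deliberate attack that is assumed to be more likely to occur.
hazard = risk_score(Likelihood.REMOTE, Severity.CATASTROPHIC)
threat = risk_score(Likelihood.OCCASIONAL, Severity.CATASTROPHIC)

print("safety hazard:", hazard, "-", risk_class(hazard))
print("security threat:", threat, "-", risk_class(threat))
```

The design choice worth noting is that one scale serves both views: only the source of the likelihood estimate changes, from fault statistics in the safety case to threat assessment in the security case.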
Functional safety issues
Developments in safety technology, particularly the dramatic changes brought about by the Functional Safety standard IEC 61508 (for electrical, electronic, and programmable safety systems) have driven the focus of the safety discipline within Europe in the last decade. The standard has fundamentally changed how equipment with safety functions is engineered, implemented, and maintained, with an emphasis on a lifecycle approach, 'from cradle to grave', which includes managing the competency of those responsible.
The breadth of applicability is demonstrated by the scope of the domain-specific standards based upon IEC 61508, which include automotive, machinery, industrial processes, railway, safety-related networks and nuclear. The very recent development of the Functional Safety standards, together with the requirement to develop appropriate demonstrable competencies and the certification of Functional Safety specialists, has led many organisations to focus upon a safety-centric approach, to the exclusion of other factors that affect safety, such as malicious intent. An exception has been the nuclear industry, where Functional Safety has been aligned with security by relating IEC 61508 Safety Integrity Levels (SILs) to Business Impact Levels (BILs). In this notable combined approach, the SIL is used as a measure of reliability and/or risk reduction and is aligned to a BIL. Business Impact Levels are used by the UK government in the Security Policy Framework and the CESG HMG IA Standard No 1 for protecting the Confidentiality, Integrity or Availability of assets.
They are used by the UK government, government suppliers and in Critical National Infrastructure. The Impact Levels relate directly to Confidentiality protective markings. However, there is no equivalent set of markings for Integrity or Availability, hence the method to use a dependability approach to combine Availability and Integrity for control systems (see 'Combining safety and security' table, below right).
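To show what a SIL expresses as a measure of reliability and risk reduction, the sketch below classifies an average probability of failure on demand (PFDavg) into the IEC 61508 bands for low-demand mode and derives the corresponding risk reduction factor. The function names are hypothetical, and the alignment of SILs to Business Impact Levels is deliberately not reproduced here.

```python
from typing import Optional

# IEC 61508 low-demand mode bands: (lower bound inclusive, upper bound exclusive, SIL)
SIL_BANDS = [
    (1e-5, 1e-4, 4),
    (1e-4, 1e-3, 3),
    (1e-3, 1e-2, 2),
    (1e-2, 1e-1, 1),
]


def sil_for_pfd(pfd_avg: float) -> Optional[int]:
    """Return the SIL whose band contains pfd_avg, or None if it falls outside all bands."""
    for low, high, sil in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    return None


def risk_reduction_factor(pfd_avg: float) -> float:
    """Risk reduction factor is the reciprocal of the probability of failure on demand."""
    return 1.0 / pfd_avg


# Hypothetical safety function that fails on demand 5 times in 1,000.
pfd = 5e-3
print("SIL:", sil_for_pfd(pfd))
print("risk reduction factor:", round(risk_reduction_factor(pfd)))
```

A safety function with the assumed PFDavg of 5e-3 falls in the SIL 2 band and reduces risk by a factor of about 200.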
Lessons identified, but not learnt?
Organisations generally manage information risk using Information Assurance (IA) processes based on ISO/IEC 27001 and ISO/IEC 27002 standards originally developed in the UK. The ISO/IEC 27001/27002 series standards provide a framework for cyber security under the explicit control of management; however, compliance is voluntary. They provide for a risk-based management system that specifies the overarching structural requirements for information management frameworks. As such, they are flexible depending on the requirements of the specific organisation in question and do not require specific security measures to be implemented.
The security risk-based approach using the Confidentiality, Integrity and Availability (CIA) triad does not fully account for the Risk Tolerability, Functional Integrity, [and] Availability aspects of control systems used in industrial processes.
This gap is now being addressed jointly in ISA99 and IEC 62443 for the cyber security of industrial systems. Work recently initiated in ISA99 and being adopted by the IEC Working Group is in progress to align the management framework and control system specific measures to ISO 27000. ISA99 work being incorporated into IEC 62443 applies a more granular approach to security objectives, which include:
- access control - control access to selected devices, information or both to protect against unauthorised interrogation of the device or information;
- use control - control use of selected devices, information or both to protect against unauthorised operation of the device or use of information;
- data integrity - ensure the integrity of data on selected communication channels to protect against unauthorised changes;
- data confidentiality - ensure the confidentiality of data on selected communication channels to protect against eavesdropping;
- restrict data flow - restrict the flow of data on communication channels to protect against the publication of information to unauthorised sources;
- timely response to event - respond to security violations by notifying the proper authority, reporting needed for forensic evidence of the violation, and automatically taking timely corrective action in mission critical or safety critical situations;
- resource availability - ensure the availability of all network resources to protect against denial of service attacks.
These objectives will assist in the provision of appropriate security for automation systems that utilise Functional Safety standards in their implementation of safety-critical systems for the protection of people, the environment and against business loss. The combination of these safety and security standards will be used during the systems design and integration, but not yet in the development of safety-related automation technology.
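One way to put the seven objectives to work during design and integration is as a coverage checklist. The sketch below is illustrative only: the control names and their mapping to objectives are hypothetical assumptions, not requirements drawn from ISA99 or IEC 62443.

```python
from enum import Enum


class SecurityObjective(Enum):
    """The seven objectives listed above."""
    ACCESS_CONTROL = "access control"
    USE_CONTROL = "use control"
    DATA_INTEGRITY = "data integrity"
    DATA_CONFIDENTIALITY = "data confidentiality"
    RESTRICT_DATA_FLOW = "restrict data flow"
    TIMELY_RESPONSE_TO_EVENT = "timely response to event"
    RESOURCE_AVAILABILITY = "resource availability"


def uncovered(controls: dict[str, set[SecurityObjective]]) -> set[SecurityObjective]:
    """Return the objectives that no listed control claims to address."""
    covered = set().union(*controls.values())
    return set(SecurityObjective) - covered


# Hypothetical controls for one part of an automation system.
controls = {
    "role-based operator login": {
        SecurityObjective.ACCESS_CONTROL,
        SecurityObjective.USE_CONTROL,
    },
    "signed controller firmware": {SecurityObjective.DATA_INTEGRITY},
    "firewall between corporate and control networks": {
        SecurityObjective.RESTRICT_DATA_FLOW,
    },
}

for gap in sorted(uncovered(controls), key=lambda o: o.value):
    print("no control addresses:", gap.value)
```

With these assumed controls, the check reports data confidentiality, resource availability and timely response to event as gaps, which is the kind of omission a combined safety and security review would need to resolve.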
According to McAfee's 2011 critical infrastructure protection report, 'In The Dark', nearly two-thirds of critical infrastructure companies regularly find malware designed to sabotage their systems. The Stuxnet and Duqu malware were defining moments for users of industrial automation; they precisely targeted particular industrial control systems and were attributed to the US and Israel (an attribution that was not denied). Stuxnet was evidently the first time that very carefully crafted malware had been produced to compromise industrial control systems and remain undetectable to operators or system programmers.
It was propagated via Microsoft Windows networking and USB auto execution exploits, and not only had the ability to jump the security 'air gap' separating corporate and industrial networks, but also to jump species, from Windows to Siemens' proprietary control system operating system infecting the target programmable logic controllers and affecting their networked I/O. The associated safety-related system in the Iranian nuclear plants should arguably have prevented the intermittent demands for overrated speed that destroyed thousands of uranium enrichment centrifuges. Safety systems were allegedly circumvented (see 'Stuxnet infection routes' diagram, p94).
A Controller Area Network (CAN) is a network technology widely used in a variety of embedded systems, including civil and military vehicles, cranes, marine instrumentation, rail, industrial automation, industrial machinery, lifts, medical devices and even coffee-makers. It is also used in safety-related applications, as an automation building block in many familiar systems, including bridges, amusement rides, baggage systems, ski-lifts and industrial automation. CAN has also been the de facto standard for in-vehicle networking for nearly 20 years.
Automotive safety recalls are alarming, and particularly so if they involve vehicle braking. Research by the Center for Intelligent Transport Systems (a collaboration between the University of Washington and the University of California San Diego) has demonstrated significant weaknesses in the security of vehicle networks and electronic control units (ECUs). Early research focused upon fuzzing of a car's CAN bus and bridging the various car networks to assess potential vulnerabilities. Physical access to the car's On Board Diagnostic (OBD)-II port was used to provide connectivity to the CAN network. Using a custom-developed CAN bus analyser and packet injection tool, the researchers sniffed the network and conducted a number of attacks, including sending commands to various ECUs, reprogramming ECUs and circumventing safety-critical systems. They successfully demonstrated that false speed readings and other information could be sent to the dashboard, that braking could be triggered or the brakes disabled, and that functions could be initiated either by command or via blended attacks using reprogrammed ECUs and events to trigger operation, whether the vehicle was static or in motion.
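Part of the reason such injection is possible is visible in the shape of a classic CAN frame, which carries an identifier, a length and up to eight data bytes, but no sender address and no authentication field. The sketch below builds frames purely in memory, using the 16-byte layout of the classic Linux SocketCAN frame structure; the identifier, payload and function names are hypothetical, and nothing here talks to a real bus or vehicle.

```python
import struct

# Classic SocketCAN layout: 32-bit identifier, 1 length byte, 3 padding bytes, 8 data bytes.
CAN_FRAME_FORMAT = "=IB3x8s"


def pack_frame(can_id: int, data: bytes) -> bytes:
    """Serialise an identifier and up to 8 data bytes into a 16-byte frame."""
    if len(data) > 8:
        raise ValueError("classic CAN carries at most 8 data bytes")
    return struct.pack(CAN_FRAME_FORMAT, can_id, len(data), data.ljust(8, b"\x00"))


def unpack_frame(frame: bytes) -> tuple[int, bytes]:
    """Recover the identifier and the meaningful data bytes from a frame."""
    can_id, length, data = struct.unpack(CAN_FRAME_FORMAT, frame)
    return can_id, data[:length]


# Hypothetical message: identifier 0x123 with a two-byte payload.
genuine = pack_frame(0x123, b"\x01\x02")   # as sent by the intended ECU
injected = pack_frame(0x123, b"\x01\x02")  # as sent by any other node on the bus

print(len(genuine), unpack_frame(genuine))
print("indistinguishable to a receiver:", genuine == injected)
```

Because the two frames are byte-for-byte identical, a receiving ECU has nothing in the frame itself with which to tell the intended sender from any other node that has gained access to the bus.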
Even more alarmingly, the researchers demonstrated that the 'malicious' code could be removed on rebooting the affected ECU, preventing forensic analysis. This research was criticised for the requirement to have physical access to the OBD-II port; later research successfully showed how potentially malicious code could be injected by various remote means. The methodology used various ECUs as network gateways to other ECUs, using corrupted music files via the entertainment system and wireless networks via the telematics ECU, including Bluetooth, standard mobile and Wi-Fi connectivity.
Omission or commission?
The psychologist's view of those other factors affecting safety would be either passive negligence (omission) or active commission. Morally, a person can feel righteous by abstaining from the sin of commission, whereas an act of commission may be considered nefarious and illegal. In the safety engineering domain, acts of omission can lead to situations where safety is compromised because the scenario was not envisaged.
With the convergence of systems and technologies, the reliance on programmable electronics and data networks, is it realistic to assume that safety is unlikely to be compromised through the omission of the security of a safety-related system?
In mainstream security, a security case remains constantly challenged, yet a safety case often does not take into account any intentional, nefarious activity that may compromise safety. The issue, where safety and security do not yet operate in concert, is the differing perspectives of the respective disciplines. Yet the lifecycles of safety and security are similar, with assessment, risk mitigation (of hazards or threats), design, implementation, validation, monitoring, management and auditing activities.
Although there is an underlying consensus that improving the overall trustworthiness of all software would be a desirable state of affairs, efforts in this area have tended to be concentrated in isolated 'stovepipes' where such trust is a functional requirement, in particular in the fields of Cyber Security and Safety Critical systems (see box-out, 'Differentiating threats, hazards, and adversities', above right).
There is commonality in safety and security, and there are advantages in combining attributes of both disciplines early in the design process, rather than late during implementation. With similar lifecycles, albeit different perspectives on threats and hazards, there is an opportunity to adopt a shared outlook for dealing with adversity and to increase dependability, potentially at a lower lifetime cost.
Dr Richard Piggin CEng MIET is a security sector manager at Atkins, and a UK expert to IEC Cyber Security Working Groups involved in producing IEC 62443 covering Industrial Automation and Control Systems Security, and the IEC 61784-3 Working Group responsible for safety-related use of Ethernet in industrial control systems. Dr Carl Sandom CEng MIET is a safety consultant at iSys Integrity, and a member of BSI GEL/65/1, the UK National Committee contributing to the development of IEC 61508 covering Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related Systems.