Safety Critical Software Using Ada
Introduction
Software now pervades almost every aspect of daily life. Transport systems depend on software for control of vehicles and their infrastructure. Financial institutions rely on software for accounting and the transfer of money. Industrial software controls equipment and manufacturing processes. Hospitals depend on software for managing patient records and for control of life-support systems. The use of software has grown dramatically over the last decade with the availability of low-cost, high-performance hardware. It is clear that the safety of much human life and property depends directly or indirectly upon the correctness and deterministic properties of software.
Software can provide users with considerable operational flexibility. However, this flexibility brings with it a greatly increased chance of error. There is now an increasing awareness that strict control is needed in order to reduce the risks of errors in what has come to be called safety critical software - that is, software systems whose failure may lead to loss of life or severe injury.
As a result, there is a growing concern in all major industrial nations regarding the legal and ethical obligations of companies and their officers to ensure that systems do not violate safety regulations. Many industries are in the process of setting, or have already set, specific standards for the development, testing, and certification of safety critical software. As these standards emerge, the focus is on the use of best practice. In some areas, standards mandate specific techniques for the development of safety critical systems. In all cases, a reasoned justification for the techniques actually used is required, together with evidence to show that the life cycle development processes are being followed.
Example of Failure
A passenger airplane is circling in a prearranged location off the coast of Florida. The landing is delayed because of bad weather conditions.
As the plane is banking into a turn, a sudden updraft causes the plane to roll much faster than the software control system expects. The software "assumes" a glitch, and the computers are set into an automatic reboot process. The pilot looks on with horror as all of the navigation displays turn blue with a white line through them. At a most crucial moment, when the pilot needs information to stabilize the aircraft, the computers are performing memory checks and restarting the display software. Fortunately, the pilot has enough height and time to fly the plane by "feel" until the displays are functioning correctly. Had this error occurred when the plane was landing, the consequences could have been catastrophic.
What is Safety?
MIL-STD 882B (1984) defines safety as follows:
"Freedom from those conditions that can cause death, injury, occupational illness, or damage to or loss of equipment or property."
The terms safety, reliability, security, and correctness should not be confused. Leveson [Leveson 86] defines the differences among these terms as follows:
"In general, reliability requirements are concerned with making a system failure free, whereas safety requirements are concerned with making it mishap free . . ."
Software is a set of instructions and data that makes a general-purpose computer into an application specific one. Software itself is neither safe nor unsafe. However, if software controls the functionality of a safety critical system, then it becomes safety critical software. A study of system safety is described in "Safeware - System Safety and Computers" by Nancy Leveson [Leveson 95]. An extensive discussion of safety critical systems, which are usually also real-time control systems, is found in the reference [Pyle 91].
Criticality Levels
Most standards assign a criticality level to a system based on the severity of a potential catastrophe and the probability of its occurrence. These are then mapped onto categories for the system criticality levels.
If software controls these safety critical systems, then the software too is assigned a criticality level. The software criticality levels correspond to the failure conditions that would result if the software were to fail. The Federal Aviation Administration (FAA) recognizes five categories of failure conditions and five software-level definitions.
The new IEC 61508 standard uses four software integrity levels, and IEC 880 also uses four levels. [IEC 61508] There is a strong correlation between the way software can contribute to a potential hazard and the severity level. The level assigned to some software has a great influence upon the rigor with which the software is developed and verified and the evidence that must be collected to confirm this.
Standards
The avionics industry has taken the lead in the development of safety certification standards for computer programs. The Radio Technical Commission for Aeronautics (RTCA), a nonprofit organization formed in 1935, has been instrumental in shaping the future of aviation through the application of electronics and telecommunications. The RTCA operates as a federal advisory committee that is composed of industry and government representatives. One of the key activities undertaken by the RTCA is the development and publication of guidance documents and minimum operational performance standards for avionics technology. Although the acronym remains the same, RTCA now represents "Requirements and Technical Concepts for Aviation." The RTCA board of directors and international associates work in association with members of the international aviation community to propose the formation of new committees and provide input to ongoing special committees. One such committee, SC-167, was responsible for the preparation and revision of standards for the certification of avionics software. Major areas of concern include:
SC-167 was responsible for the review and revision of the document Software Considerations in Airborne Systems and Equipment Certification (DO-178A). This review resulted in a new document, DO-178B, which was issued in December 1992. The Federal Aviation Administration (FAA) of the U.S. Department of Transportation issued an Advisory Circular on January 11, 1993, stating that DO-178B may be used as a basis for submission of material required to obtain FAA approval of digital computer software. The document is known as ED-12B in Europe and is used by the Joint Aviation Authorities (JAA) as a basis for certification.
As DO-178B was used by industry, it became apparent that parts of the document were unclear or ambiguous. A special committee was formed and tasked to produce guidance materials based on experiences and expertise in the industry. SC-190/WG52 is working on the production of these guidelines initially in four principal areas:
ISO 9000
The ISO 9000 guidelines do not address the production of safety critical software, but rather focus on the issue of quality. All certification bodies insist on a quality system that instills confidence in the production of safety critical software. ISO 9000 standards are not sufficient in themselves to satisfy safety standards. However, any company undertaking the ISO 9000 certification process clearly makes a positive statement of intent regarding quality objectives.
Def-Stan 0055
The Procurement of Safety Related Software in Defence Equipment [DS 00-55] was published in August 1997. It is composed of two parts. Part one covers the requirements and part two provides guidance. Like most others, this standard requires a thorough development and verification process. There is, however, a heavy emphasis on formal approaches to specification, design, verification, etc. The Def-Stan 0055 standard recognizes that, although desirable, formal methods require special skills and tools, which may not be available to a project in a timely fashion. The representative program manager has a great deal of discretion on the use of formal methods, and how much they should be supplemented with alternative verification techniques.
Development Guidelines for Vehicle-Based Software
The Motor Industry Software Reliability Association (MISRA) is a consortium of automotive and component manufacturers, together with motor industry associations and a university. The MISRA members undertook to research specific issues relating to automotive software. This research resulted in nine detailed reports being published. These development guidelines cover the software life cycle and offer a different perspective for the needs of the automobile industry. Safety analysis must consider possible hazards associated with human interaction with a system. Driver experience, human reaction times, and attentiveness are some of the factors listed.
Also included is the risk compensation factor, where improved safety can lead to more risky behavior. Several of the reports discuss the use of languages and compilers:
"Some safety-critical software pundits deprecate the use of `C' due to its incomplete ISO definition resulting in many aspects of the language being undefined, unspecified or implementation specific. In these aspects, it is viewed as being weaker than assembler. The advent of C++ is regarded by some with horror as the language specification is even weaker than that for `C' and elements of the compiler are becoming very complex."
Since the publication of the Development Guidelines for Vehicle Based Software, another document entitled Guidelines for the use of the C language in Vehicle Based Software has been published [MISRA-C]. In the introductory paragraphs the following statements are made:
"Nonetheless, it should be recognised that there are other languages available which are in general better suited to safety critical systems"
The MISRA C guidelines catalog language rules. There are 92 required rules and 35 advisory rules. Of the 92 required rules, 70 do not apply to Ada. Of the 35 advisory rules, 27 do not apply to Ada. An example of a rule that applies to both C and Ada is "no-recursion." An example of a rule that does not apply to Ada is "do not use leading zeros in numbers to denote octal-based numbers." Many of the problems addressed by the C guidelines, which require special care and attention by the C programmer, are solved by the Ada language definition and implementation.
IEC 880
In 1986, the International Electrotechnical Commission (IEC) published a standard on Software for Computers in the Safety Systems of Nuclear Power Stations. This IEC 880 standard is applicable to the highly reliable software required for computers used in the safety systems of nuclear power plants for safety functions.
Although no specific language is recommended, guidance is given for the selection of a suitable language based on some common basic rules for safety-system programming languages. For example:
Use of Ada in Safety Critical Systems
"It is the desire of the airline community to reduce the cost and the economic risk associated with avionics software systems. As a means to achieve this, it is recommended that the Ada programming language be used as the standard High-Order Language (HOL) in avionics equipment design."
ARINC Report 613 also makes the following recommendation on the certification of the Run-Time Library (RTL):
"It should be the compiler manufacturer's responsibility to verify the RTL, using the guidelines of DO-178A to the highest level of criticality . . ."
The Boeing Commercial Airplane Group (BCAG) has selected Ada as the programming language for use in digital avionics computers. Ada was used by Boeing and by their suppliers to develop code for the new Boeing 777 aircraft. Future commercial aircraft and systems will also use Ada. BCAG Airborne Software Product Requirements [D6-81925] specifies:
"For software levels A or B the supplier shall implement source code in a programming language as defined in table 2.1-1."
Table 2.1-1 lists just two languages: Ada 83 and Ada 95. BCAG Digital Avionics Ada Standard [D6-53339] identifies the standards for use of the Ada programming language. The [D6-53339] document lists requirements that specify the features of the Ada language and how they may be used in digital avionics software. The document also addresses the Ada Run-Time System (RTS) and states the following:
"3.2 For the purposes of airborne digital computer avionics systems, the Ada RTS is part of the software end item. The following apply if an Ada RTS is used:"
These and other safety standards require or recommend the use of industry-best practices in all aspects of systems development. One area of key importance is the programming language used as the basis of the final installed system. The standards specify the use of a language which is well defined, has validated tools, enables modular programming, has strong checking properties, and is clearly readable.
The Benefits of Ada in Safety Critical Systems
A study undertaken by A. Hill [Hill 91] of Nuclear Electric states the following:
"The languages C and Ada are compared in some detail, from the point of view of their suitability for use in safety critical software systems or indeed for any application for which reliability is a major concern. The conclusion reached is that C is unsuitable because it contains a large number of dangerous features. Ada, on the other hand, contains many features which safeguard program integrity and is believed to be wholly appropriate for this purpose."
The conclusion is clear: of all the widely available languages suitable for safety critical software, only Ada is an appropriate baseline for these applications.
Restricting Ada for Safety Critical Systems
Tasking
When the combination of these states becomes complex, it becomes very difficult to predict the exact operational sequences that will take place. Writing a set of tests for these tasking operations is formidable. The current state of the art precludes such complex and rigorous testing. Use of the full features of the Ada tasking system is not recommended in safety critical systems. A subset of the tasking system, which constrains tasking operations to a deterministic subset, has been defined and is described in the section on the Ravenscar Profile.
Exception Declarations
The time taken to locate the appropriate handler depends on the dynamic nesting of subprograms. Testing for all possible subprogram combinations at each point at which an exception can be raised presents a combinatorial explosion of states, since exceptions can be raised implicitly during program execution. When an exception is raised, control is transferred from any statement in the enclosing frame, which makes it difficult to predict the state of the global variables. Thus, the use of nested exception handlers is to be avoided in safety critical systems.
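The propagation behavior described above can be seen in a small sketch. It is an illustration only: the subtype Percent and the procedures Add and Middle are hypothetical names, not taken from any standard or from the original text. The assignment in Add contains no raise statement, yet it raises Constraint_Error implicitly, and the handler that finally runs is two call frames away from the point of failure.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Exception_Demo is

   --  Hypothetical constrained subtype used only for this illustration.
   subtype Percent is Integer range 0 .. 100;

   Tank_Level : Percent := 50;

   --  No raise statement appears here, yet the assignment raises
   --  Constraint_Error implicitly when the sum leaves 0 .. 100.
   procedure Add (Level : in out Percent; Step : in Integer) is
   begin
      Level := Level + Step;
   end Add;

   --  Middle has no handler, so the exception passes straight through
   --  and the statement after the call is skipped.
   procedure Middle (Level : in out Percent) is
   begin
      Add (Level, 60);
      Put_Line ("Middle: statement after the call to Add");
   end Middle;

begin
   Middle (Tank_Level);
exception
   when Constraint_Error =>
      --  The handler that runs is selected by the dynamic call chain,
      --  not by the text surrounding the failing assignment.
      Put_Line ("Handled two frames up; Tank_Level ="
                & Integer'Image (Tank_Level));
end Exception_Demo;
```

Even in this tiny program a reviewer has to trace the call chain to learn which statements were skipped and what value Tank_Level holds in the handler, which is the analysis burden the text warns about.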
Other Restrictions
Run-Time Requirements
The run-time system must be provided in the program library, which is used during the compilation of an application. While application developers have control and visibility over their own code, the Ada run-time system is usually not visible to users. It may be possible to limit the program constructs to a subset that requires no run-time support at all or to inline the run-time support code. This puts the burden of programming and verification on the user. Since the run-time system is part of the delivered executable program for a safety critical application, it must be subjected to the same hazard analysis and certification as the application code. Consequently, as part of the deliverables, full documentation and certification materials must be supplied, not only for the application code itself but also for the run-time system actually used. The normal run-time system for full Ada is not appropriate for safety critical systems (which use only a subset of the language) because it contains code that uses techniques outside the certifiable regime.
Ravenscar Profile
The Ravenscar model is defined by the following:
The Ada tasking system includes a rich set of features, some of which are inappropriate for safety critical systems. Certain tasking constructs must be avoided and a style of programming imposed which ensures deterministic behavior.
Cyclic Executives
Traditionally, safety critical real-time control systems are programmed using simple cyclic executives. These are usually implemented as a loop, which calls procedures in turn. A simple representation of such a loop is shown in the figure below:
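As an illustrative stand-in for such a loop, here is a minimal Ada sketch. It is not the original figure: the three placeholder procedures stand in for Pa through Ph, the 50 millisecond frame length is an assumed value, and the loop is bounded to three cycles only so that the demonstration terminates. Pacing the loop with a delay until statement is one possible mechanism among several.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;

procedure Cyclic_Executive is

   --  Assumed frame length for the illustration.
   Frame      : constant Time_Span := Milliseconds (50);
   Next_Frame : Time := Clock;

   --  Placeholder procedures; a real system would have Pa .. Ph.
   procedure Pa is
   begin
      Ada.Text_IO.Put_Line ("Pa");
   end Pa;

   procedure Pb is
   begin
      Ada.Text_IO.Put_Line ("Pb");
   end Pb;

   procedure Pc is
   begin
      Ada.Text_IO.Put_Line ("Pc");
   end Pc;

begin
   --  A real executive would loop forever; three cycles keep the
   --  demonstration finite.
   for Cycle in 1 .. 3 loop
      Pa;
      Pb;
      Pc;
      --  Advance by a fixed amount from the previous frame start, so
      --  that timing errors do not accumulate, then wait for that time.
      Next_Frame := Next_Frame + Frame;
      delay until Next_Frame;
   end loop;
end Cyclic_Executive;
```

Computing Next_Frame from the previous frame start, rather than from the current clock reading, keeps each call of Pa on a fixed time grid provided the procedures finish within the frame.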
The calls are made in a cyclic sequence. When the execution of the last procedure completes, a clock function may be invoked to delay the call of Pa until a known time-point, such that each call of Pa starts at a fixed time interval established by the clock. An alternative mechanism may be to suspend the main program after Ph (perhaps executing some background task) and delaying the restart of the loop until triggered by an interrupt generated externally at a prescribed time interval. This time interval is known as the time frame. It is essential that all of the procedures, Pa through Ph, complete their actions within the specified time frame. In safety critical systems, the longest path through each procedure must be estimated and verified through test. When taken in aggregate, the sum determines if the work required can be completed within the allotted time frame. There are a number of problems with this approach:
Periodic Tasks
Rather than having one loop which calls the procedures, each of these procedures may be mapped to a task, with the body of the task implemented as a loop. The main loop of the task contains the same code as in the procedure Pa, but at the end of the loop a delay until statement causes the task to be suspended until it needs to be run again. Since these tasks execute continuously in a loop at a frequency that is determined by the delay until statement, they are called periodic tasks. An example of the structure of a periodic task is shown below:
    task Ta is
To implement the same cyclic scheduling code using tasks instead of procedures Pa to Ph, each procedure is mapped to a corresponding task, Ta to Th. Each of these eight tasks is given a different priority, with task Ta at the highest priority, Tb at the next highest, and so on down to the lowest priority task, Th. Task Ta completes its processing, calculates Next_Frame, and delays until that time arrives. Task Tb then completes its processing and delays until Next_Frame. Tasks Tc to Th follow in turn. The effect is that all tasks execute just once in each time frame and then are suspended until the end of the current frame. At the start of each time frame, all tasks are ready to run, in priority order. If each of the tasks suspends itself until the end of the frame, then this implements the cyclic executive scheduling algorithm.
Each task may be run at its optimum frequency subject to the availability of hardware resources. If the time frame for these tasks is 50 milliseconds, then a delay until statement in each task increments the time of the next frame by this amount in each loop. The schedulability of the system can be determined by finding the maximum time taken for each task in the periodic loop and dividing this by the period in which each task has to be repeated. These ratios define the portion of time that the processor must devote to each task.
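A periodic task of the kind described above might be sketched as follows. This is an illustration under assumptions, not the original listing: the priority value, the printed message, and the three-cycle bound are chosen only for the demonstration, and the task is declared inside a procedure for brevity, whereas a Ravenscar-style program would declare it at library level and let it run forever.

```ada
with Ada.Real_Time; use Ada.Real_Time;
with Ada.Text_IO;
with System;

procedure Periodic_Demo is

   task Ta is
      --  Ta is described as the highest priority task.
      pragma Priority (System.Priority'Last);
   end Ta;

   task body Ta is
      --  50 milliseconds is the frame length used in the text.
      Frame      : constant Time_Span := Milliseconds (50);
      Next_Frame : Time := Clock;
   begin
      --  A real periodic task loops forever; three cycles keep the
      --  demonstration finite.
      for Cycle in 1 .. 3 loop
         --  The work formerly done by procedure Pa goes here.
         Ada.Text_IO.Put_Line ("Ta: cycle" & Integer'Image (Cycle));

         --  Compute the start of the next frame and suspend until then.
         Next_Frame := Next_Frame + Frame;
         delay until Next_Frame;
      end loop;
   end Ta;

begin
   null;  --  The main procedure simply waits for Ta to finish.
end Periodic_Demo;
```

Tasks Tb to Th would follow the same pattern with successively lower priorities and, where the application allows, their own frame lengths.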
The sum of these ratios may be used to calculate and check if there is sufficient processor time available to ensure that no task misses its deadline. So far, only periodic tasks have been considered. These run concurrently and do not interact. Notice, however, that the period of each task may be different and should reflect the repetition rate required by the application rather than the rate imposed by the program structure.
Event Driven Tasks
The full Ada language provides several mechanisms for tasks to synchronize operations. There are three main reasons why tasks need to synchronize:
Protected procedures may be attached to interrupt handlers. An interrupt may simply result in some code being executed in response to the hardware event, or the handler may trigger a task awaiting a corresponding software event. An alternative synchronization mechanism called Synchronous_Task_Control is defined in Annex D (Real-Time Systems Annex) of the Ada Reference Manual (ARM). This package allows users to define very simple suspension objects with their corresponding operations. The simple Suspend_Until_True, Set_True, and Set_False operations provide the primitive mechanisms to synchronize operations between tasks. Although not explicitly mentioned in the Ravenscar Profile, the implementation of this package is not precluded. It provides a simple building block to construct higher level application specific mechanisms.
Benefits of the Ravenscar Profile
The Ravenscar Profile possesses the following advantageous software features:
Ada 95 for Safety Critical Systems
Annex H - Safety and Security
The ANSI/ISO standard for Ada [Ada 95] addresses the requirements for systems that are safety critical. The language is described in the core document, required libraries are defined in Annexes A and B, and optional packages and pragmas are described in Annexes C through H. The safety and security annex, Annex H, covers three topics:
HRG
The Annex H Rapporteurs Group (ISO/IEC JTC1/SC22/WG9/HRG) was formed to produce a guidance document for the use of Ada 95 in high integrity systems. Over a three year period, the group produced a document which is currently under review and on track to becoming an ISO/IEC technical report [ISO/IEC PDTR 15942]. The group analyzed several sector-specific standards and extracted the requirements for software verification. Ada 95 constructs were then analyzed and a set of tables produced which state the degree of difficulty that a particular construct, or combination of constructs, presents to various verification techniques. The guideline does not make recommendations on the constructs that could be used and those that should be avoided, as this depends on the safety guidelines, the certification level, and the verification methods proposed. Safety critical projects could use this information, in combination with the proposed verification plans, to specify design and coding standards for the project.
Certification
What is Required for Certification
All certification guidelines stress the importance of a process based on sound engineering practices. The steps to be taken in the development of safety critical software must be well understood and documented before the software can be certified. Rather than waiting until the software is fully developed and tested, it is wise to involve the certification authorities in the early planning stages.
Planning Documents
To raise the confidence in safety critical software, the development stages used in its production must be understood. Software developed by a software engineer working alone does not instill the confidence required to certify a system for flight. A preferred approach is to use a team that follows a controlled software engineering method. It is important that this method be used consistently throughout the project.
Plan for Software Aspects of Certification (PSAC)
Software Development Plan (SDP)
Software Verification Plan (SVP)
The following topics must be addressed in the SVP:
Software Configuration Management Plan (SCMP)
Software Quality Assurance Plan (SQAP)
STANDARDS
Software Design Standards
Software Code Standards
Data Documents
The following documents serve as the repository for data and records maintained for lifecycle processes.
Several kinds of testing strategies are required to achieve confidence in a safety critical system. "Black Box" testing checks that each function generates the expected results under all conditions that the function might encounter. Each function must be tested with its typical data values, and also with its data values at the boundaries to check the extreme conditions that could be experienced. "White Box" testing (also known as "Glass Box" testing) is a more stringent testing process. It involves analyzing the structure of a function under test to ensure that all the elements of the function are required, all the elements are executed, and all execution paths in the application are adequately covered. The tests must ensure that the program executes all conditions, and that all conditions work correctly when evaluated to both true and false. The tests and testing environment must be designed to ensure that the tested software is as close to the final configuration as possible. If the testing environment is intrusive, the test results must describe the expected differences between the tests and the final product.
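The black-box strategy described above, typical values plus values at the boundaries, can be sketched in a few lines of Ada. The function Limit, its range of -30 to 30, and the helper Check are hypothetical and exist only for this illustration; a certification project would use its own test harness and would record results rather than print them.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Boundary_Test_Demo is

   --  Hypothetical function under test: limits a commanded angle
   --  to the range -30 .. 30 degrees.
   function Limit (Angle : Integer) return Integer is
   begin
      if Angle > 30 then
         return 30;
      elsif Angle < -30 then
         return -30;
      else
         return Angle;
      end if;
   end Limit;

   --  Compares the actual result with the expected one and reports it.
   procedure Check (Input, Expected : Integer) is
   begin
      if Limit (Input) = Expected then
         Put_Line ("pass:" & Integer'Image (Input));
      else
         Put_Line ("FAIL:" & Integer'Image (Input));
      end if;
   end Check;

begin
   --  Typical values.
   Check (0, 0);
   Check (12, 12);

   --  Values on and just beyond each boundary.
   Check (30, 30);
   Check (31, 30);
   Check (-30, -30);
   Check (-31, -30);

   --  Extreme values of the input type.
   Check (Integer'Last, 30);
   Check (Integer'First, -30);
end Boundary_Test_Demo;
```

These inputs also happen to drive all three branches of Limit, which is the kind of structural evidence that white-box testing then has to demonstrate explicitly.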
Condition/Decision Coverage Testing
Identification of Conditions and Decisions. If several conditions are combined with boolean operations, then this becomes a decision. For DO-178B the requirements for coverage testing specify that:
    TEMP := A=B and (C2 or D<3);
Modified Condition/Decision Coverage
Modified condition/decision coverage (MC/DC) tests are more difficult to formulate. MC/DC tests show that each condition will independently affect the outcome of the decision. Tests are written in combinations such that for each condition there is a pair of states that change only the one condition and there is a corresponding change in the decision outcome. For the Ada statement:
    if A=B and (C2 or D>3) then P; end if;
the following tests must be shown to comply with these truth table results:
A=B - shows corresponding results in cases 3 and 4
C2 - shows corresponding results in cases 1 and 2
D>3 - shows corresponding results in cases 1 and 3
One way to avoid modified condition/decision coverage is to use the Ada short-circuit conditions:
    if A=0 and then B<2 and then C>5 then P; end if;
This ensures that the conditions are only evaluated so far as they are needed to achieve the final result. The short circuit form is equivalent to three nested if statements:
    if A=0 then
However, this shortcut method of avoiding MC/DC testing is only valid if the testing method shows each condition, and not just the decision, being evaluated to both TRUE and FALSE.
Coverage Testing
Operational software (software that is loaded as part of the application itself) must satisfy the requirements of the system. There must be no additional software that cannot be traced back to requirements. The rigor with which this is checked depends on the criticality of the application. Under the DO-178B guidelines, level B requires the relationship between requirements and code to be verified at the source code level. At level A this must be shown at the object code level, unless the traceability between source and object code is verified. Compilers usually contain complex algorithms that use data and code flow information to optimize the use of machine registers.
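Returning to the modified condition/decision example above, the independence requirement can be checked mechanically. The sketch below is an illustration, not the original truth table: the four cases are one assignment of condition values that is consistent with the pairs quoted in the text (cases 3 and 4 for A=B, cases 1 and 2 for C2, cases 1 and 3 for D>3), and the names Test_Case, Decision, and Independent_Pair are hypothetical. X stands for A=B, Y for C2, and Z for D>3.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure MCDC_Demo is

   --  X stands for A=B, Y for C2, Z for D>3.
   type Test_Case is record
      X, Y, Z : Boolean;
   end record;

   type Case_List is array (1 .. 4) of Test_Case;

   --  Assumed case values, chosen to agree with the quoted pairs.
   Cases : constant Case_List :=
     (1 => (X => True,  Y => False, Z => False),
      2 => (X => True,  Y => True,  Z => False),
      3 => (X => True,  Y => False, Z => True),
      4 => (X => False, Y => False, Z => True));

   --  The decision from the statement: A=B and (C2 or D>3).
   function Decision (T : Test_Case) return Boolean is
   begin
      return T.X and (T.Y or T.Z);
   end Decision;

   --  True when exactly one condition differs between the two cases
   --  and the decision outcome differs as well.
   function Independent_Pair (L, R : Test_Case) return Boolean is
      Changed : Natural := 0;
   begin
      if L.X /= R.X then
         Changed := Changed + 1;
      end if;
      if L.Y /= R.Y then
         Changed := Changed + 1;
      end if;
      if L.Z /= R.Z then
         Changed := Changed + 1;
      end if;
      return Changed = 1 and then Decision (L) /= Decision (R);
   end Independent_Pair;

   procedure Report (Name : String; First, Second : Positive) is
   begin
      Put_Line (Name & " pair independent: "
                & Boolean'Image
                    (Independent_Pair (Cases (First), Cases (Second))));
   end Report;

begin
   for I in Cases'Range loop
      Put_Line ("Case" & Integer'Image (I) & ": "
                & Boolean'Image (Decision (Cases (I))));
   end loop;
   Report ("A=B", 3, 4);
   Report ("C2 ", 1, 2);
   Report ("D>3", 1, 3);
end MCDC_Demo;
```

With these values the decision evaluates to FALSE, TRUE, TRUE, FALSE for cases 1 to 4, and each of the three pairs changes exactly one condition together with the outcome, so four tests suffice for three conditions.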
Inserting code to trace the results of conditions and decisions will affect the registers used and possibly the order of evaluation. Some method of mitigating the risk of missing some condition during test is required if the source is modified to obtain coverage. One approach may be to compile and test with different compilers and verify equivalent coverage results.
Control to Data Coupling
DO-178B requires that the relationship between data and the code that transforms it be documented. A flight control system implemented using a cyclic scheduler might perform the following:
Systems that pass information through unstructured global data areas make data-to-code coupling analysis difficult. Ada has features that can make implementation, documentation, and subsequent analysis much easier. Putting type and data information in package specifications, use of private types, use of parameter modes and so on, helps to establish the relationship between data and code that uses the data. Careful use of scope, visibility, and access rules can make the data-to-code coupling obvious.
Data "Freshness"
In a cyclic scheduler, data input is obtained by polling some value at the repetition rate for the loop. The actual data arrival rate may vary. For example, an analog/digital converter, which counts incremental voltage steps, takes longer to convert a higher voltage than a lower one. It could be difficult to couple data values to the particular invocations of the functions that process them. Control systems that require such coupling to be shown may need to increase processing power to ensure that the loop is always "ahead" of the data, improve the quality of the data input device, downgrade the performance of the system, or take a number of other steps to ensure data integrity.
Event-Driven Solution
It may be easier to certify an event-driven system than a cyclic-based design (except for the simplest cyclic applications). A protected object provides a natural and direct coupling between a data value and the code that uses it. Tasks may be triggered to process data on its arrival. Such event-driven systems implement control based on system needs rather than on the flow imposed by the code structure. Event-based scheduling, together with well structured scope and visibility controls, provides an easier way to demonstrate data-to-code coupling.
Software Accomplishment Summary (SAS)
All of the system and software requirements must be adequately covered by tests.
Implicit derived requirements, including initialization of the stack or set-up of heap addressing registers, must also be tested. To ensure that every requirement and every byte of code is tested, a compliance matrix must be produced that records the relationships between documents, code, tests, and test results. The matrix must be identified in the SAS. The entire development process used for the project is recorded in the SAS, which provides documentation of compliance with the PSAC.
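As an illustration of the event-driven approach described under Event-Driven Solution above, the sketch below couples a data value to the code that consumes it through a protected object with a single entry. It is a minimal sketch under assumptions: the type Reading, the object Sensor_Data, and the task Consumer are hypothetical names, a single sample is exchanged so that the demonstration terminates, and the declarations are nested in a procedure for brevity rather than placed at library level as a Ravenscar-style program would require.

```ada
with Ada.Text_IO;

procedure Event_Demo is

   --  Hypothetical 12-bit converter reading.
   type Reading is range 0 .. 4095;

   --  The data value is private to the protected object, so the only
   --  code that can touch it is Put and Get.
   protected Sensor_Data is
      procedure Put (Value : in Reading);
      entry Get (Value : out Reading);
   private
      Current   : Reading := 0;
      Available : Boolean := False;
   end Sensor_Data;

   protected body Sensor_Data is

      --  Called when new data arrives (for example from a handler).
      procedure Put (Value : in Reading) is
      begin
         Current   := Value;
         Available := True;
      end Put;

      --  The barrier keeps the caller suspended until data is present.
      entry Get (Value : out Reading) when Available is
      begin
         Value     := Current;
         Available := False;
      end Get;

   end Sensor_Data;

   task Consumer;

   task body Consumer is
      Value : Reading;
   begin
      --  Suspends here until Put has supplied a value.
      Sensor_Data.Get (Value);
      Ada.Text_IO.Put_Line ("Processed" & Reading'Image (Value));
   end Consumer;

begin
   --  Stands in for the arrival of one sample.
   Sensor_Data.Put (1234);
end Event_Demo;
```

Because Current is visible only inside Sensor_Data, a reviewer can establish which code reads and writes the value by inspecting one declaration, which is the coupling evidence the text refers to.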
Conclusions
Ada plays an important role in the development of safety critical software because it is the only commercially available language with characteristics corresponding to various standards, such as those developed by the avionics industry. The Ada program ultimately delivered requires an embedded run-time system as well as project-specific Ada code.
Liability Considerations
On July 25, 1985, the European Community (EC) Council ratified a Directive Concerning Liability for Defective Products 85/374. Under this directive, products are considered to be defective when they do not provide the level of safety that the public has a right to expect. The directive creates a strict liability factor and introduces a uniform concept of product liability in some EC nations where such a view did not previously exist. The EC Directive on General Product Safety 92/59 was approved by the EC Council. The directive became fully operational in June 1994, and applies to all products placed on the EC markets. The objective of the directive is to impose a general requirement on producers to introduce only safe products into the EC market. Some legal opinion has been expressed that appropriate quality standards (ISO 9000) together with safety certification materials could be useful in a legal defense.
Trends
The era of safety critical software is just beginning. Software applications will increase in size and complexity as the move toward automated systems continues to grow in all sectors. Public expectations for safety in products and services will also increase. The wide acceptance of the automotive industry's introduction of air bags and anti-lock brakes is clear proof that customers will pay a premium for safety features. As product safety becomes more of an international imperative, a growing number of industries will be forced to develop and enforce their own standards for safety critical certification.
However, this process can also result in a number of positive benefits to the enterprise:
Recommendations for Safety
References