Standard To Define RCM

An SAE technical committee has completed a draft standard for Reliability-Centered Maintenance (RCM). It provides criteria that can be used to evaluate proposed maintenance program development processes and determine whether they are RCM processes.

In February 1999, a technical committee sponsored by the International Society of Automotive Engineers (SAE) completed a draft standard for Reliability-Centered Maintenance (RCM), for use by anyone who wishes to apply RCM to their physical assets. The draft SAE standard provides criteria that can be used to evaluate proposed maintenance-program-development processes and determine whether they are RCM processes. SAE approval is expected by September 30, 1999.

RCM is one of several processes developed during the 1960s and 1970s, in various industries, in order to help people determine the best policies for managing the functions of physical assets, and for managing the consequences of their failures. Of these processes, RCM is the most thorough.

An RCM process systematically identifies all of the asset's functions and functional failures, and identifies all of its reasonably likely failure modes (or failure causes). It then proceeds to identify the effects of these likely failure modes, and to identify in what way those effects matter. Once it has gathered this information, the RCM process then selects the most appropriate asset management policy.

Unlike some other maintenance development processes, RCM considers all asset management options: on-condition task, scheduled restoration task, scheduled discard task, failure-finding task, and one-time change (to hardware design, operating procedures, personnel training, or other aspects of the asset outside the strict world of maintenance).

Development of the standard
When the SAE group began working, it thought in the same terms as most others; it thought that an RCM standard had to prescribe a standard RCM process. Therefore it began to work on developing such a process. This was difficult, because different members of the group were already using different processes as they performed RCM. The first members of the group had to work together for about a year of occasional meetings before they developed enough respect for each other's expertise to allow them to listen to one another without rejecting each other's proposals outright. It took another year before they began to agree on a common process that might be called a standard RCM process.

Informal feedback from the RCM community showed that people outside the committee were unaware of the careful compromises in the first draft, and they saw no need for such compromises. It appeared that the effort to develop a standard process was likely to produce only another process, which would be added to the processes already competing for the RCM name. It took another half-year to realize that there was another way.

The current draft standard being considered by the SAE does not present a standard RCM process. Its title is Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes. This standard presents criteria against which a process may be compared. If the process meets the criteria, the standard's user may confidently call it an RCM process. If it does not, the user probably will not call it an RCM process.

The draft standard is not a large document. Including foreword, glossary, and bibliography, it contains only about 4000 words. After the introduction, it begins with the basic statement of the seven RCM questions noted in the accompanying section Essential Elements of RCM.

The last sentence in the essential elements section represents an important concept: any process that conforms to this standard will make the information and the decisions fully available to and acceptable to the owner or user of the asset. Since the goal of maintenance is to ensure that a physical asset continues to do what its owner or user wants it to do, every RCM process must ensure that the desires of the owner or user are made an integral part of the maintenance development process. It is not enough for the vendor simply to hand owners or users a maintenance program, without asking owners or users what they wish the asset to do for them, not if the vendor is going to develop the maintenance program using RCM.

Each of the seven RCM questions is then supported with specific criteria that ensure that the process under evaluation answers the question satisfactorily.

Question 1: Functions
What are the functions and associated desired standards of performance of the asset in its present operating context (functions)? The specific criteria that the process must satisfy are:
• The operating context of the asset shall be defined.
• All the functions of the asset/system shall be identified (all primary and secondary functions, including the functions of all protective devices).
• All function statements shall contain a verb, an object, and a performance standard (quantified in every case where this can be done).
• Performance standards incorporated in function statements shall be the level of performance desired by the owner or user of the asset/system in its operating context.

The operating context is what it says: the context in which the asset is operated. The same hardware does not always require the same failure management policy in all installations. For example, a single pump in a system will usually need a different failure management policy from a pump that is one of several redundant units in a system. A pump moving corrosive fluids will usually need a different policy from a pump moving benign fluids.

Protective devices are often overlooked; an RCM process shall ensure that their functions are identified.

Finally, the owner or user shall dictate the level of performance that the maintenance program shall be designed to sustain. Once again, this key RCM characteristic is part of the evaluation criteria provided by the SAE standard.
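As a rough illustration of the criteria above, a function statement can be recorded as a small structure and checked for the parts the standard asks for. This is only a sketch: the class name, field names, and the pump example (including its flow figure) are hypothetical and do not come from the standard.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FunctionStatement:
    """One function of an asset, as the owner or user wants it performed.

    Field names are illustrative assumptions, not terms defined by the standard.
    """
    verb: str
    obj: str
    performance_standard: str
    operating_context: str


def problems(stmt: FunctionStatement) -> List[str]:
    """List the required parts that are missing from a function statement."""
    issues = []
    if not stmt.verb.strip():
        issues.append("missing verb")
    if not stmt.obj.strip():
        issues.append("missing object")
    if not stmt.performance_standard.strip():
        issues.append("missing performance standard")
    if not stmt.operating_context.strip():
        issues.append("operating context not defined")
    return issues


# Hypothetical example: a stand-alone pump with an invented flow requirement.
complete = FunctionStatement(
    verb="pump",
    obj="water from tank X to tank Y",
    performance_standard="at not less than 800 litres per minute",
    operating_context="single pump, no standby unit",
)
incomplete = FunctionStatement(
    verb="pump",
    obj="water from tank X to tank Y",
    performance_standard="",
    operating_context="single pump, no standby unit",
)

print(problems(complete))    # no problems reported
print(problems(incomplete))  # the performance standard is missing
```

A check like this only confirms that the parts are present. Whether the performance standard is the level the owner or user actually wants remains a matter for the owner or user.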

Question 2: Functional failures
In what ways can it fail to fulfil its functions (functional failures)? This question has only one specific criterion: All the failed states associated with each function shall be identified.

If functions are well defined, listing functional failures is relatively easy. For example, if a function is to keep system temperature between 50 C and 70 C, then functional failures might be: unable to raise system temperature above ambient, unable to keep system temperature above 50 C, or unable to keep system temperature below 70 C.
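The temperature example can be sketched in code. The following is a minimal illustration, assuming that a single temperature reading and a known ambient temperature are available; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TemperatureFunction:
    """Function: keep system temperature between low_c and high_c (degrees C)."""
    low_c: float
    high_c: float


def failed_states(fn: TemperatureFunction, observed_c: float, ambient_c: float) -> List[str]:
    """Return the functional failures (failed states) that a reading represents."""
    states = []
    if observed_c <= ambient_c:
        states.append("unable to raise system temperature above ambient")
    if observed_c < fn.low_c:
        states.append(f"unable to keep system temperature above {fn.low_c:g} C")
    if observed_c > fn.high_c:
        states.append(f"unable to keep system temperature below {fn.high_c:g} C")
    return states


heating = TemperatureFunction(low_c=50, high_c=70)

print(failed_states(heating, observed_c=60, ambient_c=20))  # within limits
print(failed_states(heating, observed_c=20, ambient_c=20))  # stuck at ambient
print(failed_states(heating, observed_c=75, ambient_c=20))  # too hot
```

Note that a reading at ambient temperature represents two failed states at once, because the system is then also below its lower limit.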

Question 3: Failure modes
What causes each functional failure (failure modes)? In Failure Modes, Effects and Criticality Analysis (FMECA), the term failure mode is used in the way that RCM uses the term functional failure. However, the RCM community uses the term failure mode to refer to the event that causes functional failure, so the SAE standard uses the term in this way as well. The standard's criteria for a process that identifies failure modes are:
• All failure modes reasonably likely to cause each functional failure shall be identified.
• The method used to decide what constitutes a “reasonably likely” failure mode shall be acceptable to the owner or user of the asset.
• Failure modes shall be identified at a level of causation that makes it possible to identify an appropriate failure management policy.
• Lists of failure modes shall include failure modes that have happened before, failure modes that are currently being prevented by existing maintenance programs, and failure modes that have not yet happened but that are thought to be reasonably likely (credible) in the operating context.
• Lists of failure modes should include any event or process that is likely to cause a functional failure, including deterioration, human error whether caused by operators or maintainers, and design defects.

RCM is the most thorough of the analytic processes that develop maintenance programs and manage physical assets. It is therefore appropriate for RCM to identify every reasonably likely failure mode. While “reasonably likely” is obviously not subject to a strict and rigorous definition, it is possible to name some of the things that are expected to be included within it: certain things that some processes explicitly exclude from their analysis.

For example, some processes explicitly exclude failure modes already addressed by the existing maintenance program. An RCM process will examine these failure modes, in order to decide whether existing maintenance practices are truly the best way to manage those failure modes.

Another thing that an RCM process will include is failure modes that have not yet happened, but that are thought to be reasonably likely (credible) in the operating context. Some analytic processes look only at failure histories, not attempting to foresee problems that have not yet been encountered. In retrospect, it is often said of many industrial accidents that they were simply waiting to happen, that it was only a matter of time before the site's unsafe but customary practices arranged themselves in a sequence that led to disaster. Before that disaster, the failure mode had never appeared in the site's failure history.

Finally, an RCM process will not restrict itself to engineering processes such as deterioration. Human error (especially from lack of training) and design defects lead to many failures as well, and in many industrial sites no one looks at these topics in an organized way.

The standard recognizes that some organizations, especially very large organizations such as the U.S. military, distribute responsibilities for these topics in different offices and may be reluctant to put all of those responsibilities under an RCM program office. The process being evaluated is intended to be the process that the entire organization uses, not simply one office within the organization. If the organization's process satisfies these criteria, then the organization has satisfied this element of an RCM process.

Question 4: Failure effects
What happens when each failure occurs (failure effects)? The standard's criteria for a process that identifies failure effects are:
• Failure effects shall describe what would happen if no specific task is done to anticipate, prevent, or detect the failure.
• Failure effects shall include all the information needed to support the evaluation of the consequences of the failure, such as: (a) What evidence (if any) that the failure has occurred (in the case of hidden functions, what would happen if a multiple failure occurred); (b) What it does (if anything) to kill or injure someone, or to have an adverse effect on the environment; (c) What it does (if anything) to have an adverse effect on production or operations; (d) What physical damage (if any) is caused by the failure; and (e) What (if anything) must be done to restore the function of the system after the failure.
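One way to picture the five elements of information listed above is as a record in which every element must be answered, even if the answer is "none." The sketch below is illustrative only; the field names are assumptions, and the sample text is invented.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class FailureEffects:
    """The five elements (a) to (e); None means not yet recorded.

    An explicit answer such as "none" counts as recorded, reflecting the
    "(if any)" wording of each element.
    """
    evidence_of_failure: Optional[str] = None             # (a)
    safety_or_environmental_harm: Optional[str] = None    # (b)
    production_or_operations_impact: Optional[str] = None # (c)
    physical_damage: Optional[str] = None                 # (d)
    restoration_needed: Optional[str] = None              # (e)


def missing_elements(effects: FailureEffects) -> List[str]:
    """Return the names of the elements that have not been recorded."""
    return [f.name for f in fields(effects) if getattr(effects, f.name) is None]


# Invented example: a partly completed description of a pump bearing seizure.
draft = FailureEffects(
    evidence_of_failure="low-flow alarm sounds in the control room",
    safety_or_environmental_harm="none",
    physical_damage="bearing and shaft damaged",
)

print(missing_elements(draft))
```

Here the draft still lacks the operational impact and the restoration work, which are the items needed later to compare the cost of maintenance with the cost of the failure.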

FMECA usually describes failure effects in terms of the effects at the local level, at the subsystem level, and at the system level. This reflects its origins in the U.S. military, which assigns each component a place in a functional hierarchy. Some RCM processes follow FMECA's example here. A process that follows this three-part format can satisfy the SAE criteria, so long as the information above is provided.

Some people may stumble over the last element of information: what (if anything) must be done to restore the function of the system after the failure. They may feel that this brings corrective maintenance into the RCM analysis. Actually, this is information that someone will have to gather at some point, in order to compare the costs of maintenance versus the costs of the failure. Practical experience with RCM has found that this is the most convenient point at which to gather (and record) that information.

Question 5: Failure consequences
In what way does each failure matter (failure consequences)? The standard's criteria for a process that identifies failure consequences are:

  • The assessment of failure consequences shall be carried out as if no specific task is currently being done to anticipate, prevent, or detect the failure.
  • The consequences of every failure mode shall be formally categorized as follows:
    • The consequence categorization process shall separate hidden failure modes from evident failure modes.
    • The consequence categorization process shall clearly distinguish events (failure modes and multiple failures) that have safety and/or environmental consequences from those that only have economic consequences (operational and nonoperational consequences).

RCM assesses failure consequences as if nothing is being done about the failure. Some people are tempted to say, "Oh, that failure doesn't matter because we always do (something), which protects us from it." However, RCM is thorough. It checks the assumption that the action they always do actually does protect them from the failure, and it checks the assumption that this action is worth the effort.

RCM assesses failure consequences by formally assigning each failure mode into one of four categories: hidden, evident safety/environmental, evident operational, and evident non-operational. The explicit distinction between hidden and evident failures, performed at the outset of consequence assessment, is one of the characteristics that most clearly distinguishes RCM, as defined by Stan Nowlan and Howard Heap, from MSG-2 and earlier U.S. civil aviation processes.

The SAE's criteria add an element to Nowlan and Heap's categories. In 1978, people were far less conscious of the environment than they are today. The consequences of harming the environment were chiefly economic, in terms of fines and fees that many firms felt they could afford.

Then, in the 1980s, the world experienced a string of industrial accidents with serious effects on the environment, such as Chernobyl and Exxon Valdez. Governments increased the severity of their punishments for environmental accidents. Today in some cases an environmental accident may cause a plant to be shut down completely, and its owners or users may be subject to prison terms. To harm the environment is becoming as dangerous to the organization, in business terms, as it is to harm people directly. Environmental consequences are becoming as important as safety consequences.
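The four-way categorization described in this section can be sketched as a short decision function. This is a simplified illustration, not the standard's wording: the three yes/no inputs and the order of the checks after the hidden/evident split are assumptions, and the standard itself prescribes no decision-logic diagram.

```python
from enum import Enum


class Consequence(Enum):
    HIDDEN = "hidden"
    SAFETY_ENVIRONMENTAL = "evident safety/environmental"
    OPERATIONAL = "evident operational"
    NON_OPERATIONAL = "evident non-operational"


def categorize(evident: bool,
               harms_safety_or_environment: bool,
               affects_operations: bool) -> Consequence:
    """Assign a failure mode to one of the four consequence categories.

    The assessment assumes that no task is currently being done to
    anticipate, prevent, or detect the failure.
    """
    # The hidden/evident split comes first.
    if not evident:
        return Consequence.HIDDEN
    # Safety and environmental consequences are kept apart from purely
    # economic ones (operational and non-operational).
    if harms_safety_or_environment:
        return Consequence.SAFETY_ENVIRONMENTAL
    if affects_operations:
        return Consequence.OPERATIONAL
    return Consequence.NON_OPERATIONAL


print(categorize(evident=False, harms_safety_or_environment=True, affects_operations=True))
print(categorize(evident=True, harms_safety_or_environment=False, affects_operations=True))
```

In this sketch a hidden failure mode is placed in the hidden category regardless of the other two answers, because its consequences arise only through a multiple failure.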

Question 6: Proactive tasks
What should be done to predict or prevent each failure (proactive tasks and task intervals)? This is a complex topic, and so its criteria are presented in two groups. The first group pertains to the overall topic of selecting failure management policies. The second group of criteria pertains to scheduled tasks and intervals.

The standard requires a process that selects failure management policies to work as if nothing is currently being done about the failure, and to make no assumptions about the presence or absence of wearout. It also requires the process to select scheduled tasks only if they are technically feasible and are worth doing.

With respect to scheduled tasks and their intervals, the standard describes in detail what criteria an RCM process must use to determine whether a task and its interval are technically feasible and worth doing.

Details about all these criteria will be covered in a subsequent article.

Question 7: Default actions
What should be done if a suitable proactive task cannot be found (default actions)? This question pertains to one kind of scheduled task (failure-finding), as well as unscheduled failure management policies: the decision to let an asset run to failure, and the decision to change something about the asset's operating context (such as its design or the way it is operated). Once again, the standard describes in detail what criteria an RCM process must use to determine whether a failure-finding task is technically feasible and worth doing, and whether an unscheduled failure management policy may be selected.

The full set of criteria will be discussed in a subsequent article along with further discussion of selecting proactive tasks.

It is at this point, after selecting proactive tasks and default actions, that most RCM processes usually end. The SAE standard continues briefly.

Two remaining issues
The first issue has to do with the fate of the analysis after the process has run its course. The SAE standard recognizes that (1) much of the data used in the initial analysis are inherently imprecise, and that more precise data will become available in time, (2) the way in which the asset is used, together with associated performance expectations, will also change with time, and (3) maintenance technology continues to evolve. Thus a periodic review is necessary if the RCM-derived asset management program is to ensure that the assets continue to fulfill the current functional expectations of their owners and users.

Therefore, the standard continues, any RCM process shall provide for a periodic review of both the information used to support the decisions and the decisions themselves. The process used to conduct such a review shall ensure that all seven RCM questions continue to be answered satisfactorily and in a manner consistent with criteria set out for each in the standard.

The second issue has to do with mathematical and statistical formulae used while applying the process, especially those used to compute task intervals. Some processes offered by some vendors use mathematical algorithms that apply the methods appropriate to scheduled restoration or discard tasks when they compute task intervals for on-condition tasks. Some vendors use complex algorithms that are not easily explained, and then bury them in computer software without offering a coherent account of the mathematics involved.

The SAE standard contains this requirement regarding such formulae: Any mathematical and statistical formulae that are used in the application of the process (especially those used to compute the intervals of any tasks) shall be logically robust, and shall be available to and approved by the owner or user of the asset.

Issues not included
We have seen what is in the SAE standard. What is equally worth noting is what is not in the SAE standard.

First, the SAE standard has no decision-logic diagram. This is deliberate. Different members of the RCM community use different decision-logic diagrams. It seemed not only difficult but also unnecessary to require an RCM process to use only one diagram, since it was possible to give general requirements for tasks that were technically feasible and worth doing without using a diagram.

Second, the standard does not dictate how to organize the RCM analysis. This too is deliberate. Within the RCM community, there are different opinions about the best way to organize the work of an RCM analysis. Some say that the analysis is best performed by a single RCM expert, supplemented by technical information about the asset from local experts, while others say that the analysis is best performed in groups, by local experts who are trained in RCM.

Further, some say that the RCM analysis should be performed by expert consultants (usually a single expert), while others say that the analysis should be performed by trained employees of the organization (either a single analyst or a group). As an intermediate position, some say that a consultant should lead a group of trained employees.

The SAE standard does not prescribe such matters. It is restricted to evaluating possible RCM processes, and does not evaluate ways to use an RCM process.

Third, the SAE standard does not address the question of which assets should be subjected to the process. This question lies outside the scope of the RCM process itself, and is a matter for the owner or user of the asset to decide.

In practice, some owners/users decide to apply RCM to all their assets. Others decide to apply RCM only to their most critical units. Some who decide to apply RCM to all their assets decide to do so as quickly as possible. Others decide to apply RCM to all their assets more slowly.

All of these decisions are made on business grounds, based on weighing the expected costs of the process against its expected benefits. None of these decisions has any bearing on the validity of the process that is applied. Since this standard addresses only the process itself, it therefore remains silent on this decision.

Using the standard
Now that we have seen the standard, how would it be used?

The SAE standard is expected to be used by organizations that want to receive the benefits of RCM and that wish to ensure that the process they use is indeed an RCM process.

Some organizations may already be using a process, and may wish to see whether it is an RCM process. The SAE standard helps them do this. If it is not, they may wish to consider whether the benefits they are obtaining from their non-RCM process justify their efforts to use it. If they conclude that the process works and is worth doing, they are free of course to continue to use it. However, if they do wish to use RCM, now they will know that they need to begin using a different process--either in place of, or as well as, their existing process.

Organizations that do not already have RCM in place may want help from those who offer to use or teach RCM. The standard is voluntary; no one requires an organization to use it to evaluate RCM processes. However, an organization that chooses to use the SAE standard is free to reject those whose processes do not meet the standard's criteria, just as a company is free to reject vendors whose processes do not meet the criteria of other noncompulsory industry standards, such as SAE's automotive FMECA standard.

An organization that uses this standard may find itself evaluating a number of firms that offer RCM services. Of those firms, it is possible that more than one firm may have processes that satisfy the criteria in this standard.

Once these surviving processes have been identified--once it is clear that these firms are offering RCM processes--the evaluating organization will then need to choose among them. It is expected that the organization will make its final decision on business grounds: for example, by examining which firm seems best-equipped to provide appropriate services by an appropriate date, which firm has the most appropriate price, and so forth. Issues such as the way in which RCM is organized and which assets should receive RCM, discussed previously, will emerge in the course of these business decisions, and this SAE standard is not intended to evaluate those decisions.

Thanks in part to its carefully restricted scope, the standard was completed only 8 months after its first draft was presented to the SAE RCM subcommittee. In February 1999, the final draft was submitted to the SAE's Supportability Committee for balloting. Once any necessary changes have been made and approved, the draft then will be submitted to the SAE's Technical Standards Council for balloting. If all goes well, the standard should be approved by the SAE in September 1999.

The SAE standard for RCM is expected to help those who wish to apply RCM as they evaluate their own processes, or the processes proposed by vendors and consultants. By using the standard, organizations will be able to determine which processes are RCM processes, and which are not.

(Proactive tasks and default actions, Questions 6 and 7, are discussed at length in "RCM Tasks.")

Dana Netherton is chairman of the SAE's RCM subcommittee, and a principal of American Management Systems, Inc. (AMS), Fairfax, VA. He has been working in the field of RCM since 1989, chiefly with surface ships in the U.S. Navy. Before joining AMS, he served in the U.S. Navy for 10 years as an officer in nuclear submarines.

Essential Elements of RCM

The proposed standard, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes, contains the following statement as a basis for an RCM process:
Any RCM process shall ensure that all of the following seven questions are answered satisfactorily and are answered in the sequence shown below:
1. What are the functions and associated desired standards of performance of the asset in its present operating context (functions)?
2. In what ways can it fail to fulfil its functions (functional failures)?
3. What causes each functional failure (failure modes)?
4. What happens when each failure occurs (failure effects)?
5. In what way does each failure matter (failure consequences)?
6. What should be done to predict or prevent each failure (proactive tasks and task intervals)?
7. What should be done if a suitable proactive task cannot be found (default actions)?

To answer each of the above questions satisfactorily, information shall be gathered, and decisions shall be made using the criteria discussed in the body of this article. All information and decisions shall be documented in a way that makes the information and the decisions fully available to and acceptable to the owner or user of the asset.
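The top-level requirement, that all seven questions be answered satisfactorily and in sequence, can be pictured as a simple checklist. The sketch below is illustrative only: the function name and data layout are assumptions, and whether a given question has been answered "satisfactorily" is a judgment made against the detailed criteria, represented here by a plain true/false value.

```python
from typing import Dict, List

# Short labels for the seven questions, in the required sequence.
RCM_QUESTIONS = [
    "functions",
    "functional failures",
    "failure modes",
    "failure effects",
    "failure consequences",
    "proactive tasks and task intervals",
    "default actions",
]


def shortfalls(order_answered: List[str], satisfactory: Dict[str, bool]) -> List[str]:
    """Return the reasons a process falls short; an empty list means none were found."""
    found = []
    if order_answered != RCM_QUESTIONS:
        found.append("questions are not all answered in the required sequence")
    for question in RCM_QUESTIONS:
        if not satisfactory.get(question, False):
            found.append(f"not answered satisfactorily: {question}")
    return found


# Hypothetical process that never identifies failure consequences.
order = [q for q in RCM_QUESTIONS if q != "failure consequences"]
judged = {q: True for q in order}

print(shortfalls(order, judged))
```

A process with an empty list of shortfalls would still need its documentation to be fully available to and acceptable to the owner or user of the asset.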
