Do you use a risk management program to conserve asset resources? Does your employer foster a site environment where risk management is a routine part of job planning, preparation and execution?
Risk management was once thought to be the sole responsibility of the site safety department. Maintenance and operations professionals, however, now understand how a risk management process helps protect and conserve critical assets and extend their reliability. Failure to effectively manage the risk of asset failure can add costs to an operating unit at any plant, site or installation.
Managing risks related to asset maintenance and operation requires good judgment and some professional expertise. It is both an art, or vocation, and a science, with its own well-developed technological hierarchy. The objective of managing risk is not to remove all risk, but to eliminate unnecessary or avoidable risk. Thus, the process must allow individuals to make informed decisions about which risks to accept at each operational level. Managers should compare standard risk management principles with historical asset data and their own personal experience. Then, they need to consider how, when and why those principles apply to specific situations within their area of functional responsibility.
Both managers and craft/techs manage risk on a daily basis. During daily job performance, craft/techs continuously search for hazards within their areas of expertise and routinely recommend the proper controls to reduce risks. Potential hazards and the resulting risks vary as operating circumstances and parameters change. Management knowledge gained from these experienced craft/techs, coupled with additional subject-matter training, can influence the extent and success of risk reduction measures.
Have you ever heard of SFMEA? RCFA? Maintenance Optimization? RCM? These are all tools that can help a site preserve asset resources. Programs such as these provide the means to identify and assess risks and potential hazards to critical assets, and to implement controls for them. Specific parts of these tools also help compile the information needed to balance PM/PdM program costs against increased operating benefits. What does each have in common with the others? They all ask the same questions as the basic risk management model. You can see the similarities of each process step, or decision level, in Fig. 1.
Most of the previously referenced processes also have a big "M" in their acronym. Its meaning varies for different individuals. The commonality of these programs points to the real definition of that big "M." All call for management. The acute risk to critical assets at a plant, site or installation is failing to use a process to manage them.
The Risk Management Process is composed of these five basic tasks or process steps: (1) Identify failure hazards; (2) Assess failure hazards; (3) Develop controls & make risk decisions; (4) Implement controls; and (5) Supervise & evaluate (performance of the control measures).
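For sites that track these tasks in software, the five steps can be sketched as an ordered cycle. This is a minimal Python sketch; the names `RiskTask`, `next_task` and `RISK_ASSESSMENT` are hypothetical, and looping from Task 5 back to Task 1 is an assumption based on the practice of reassessing risk as new controls are identified.

```python
from enum import IntEnum


class RiskTask(IntEnum):
    """The five basic tasks of the risk management process, in order."""
    IDENTIFY_HAZARDS = 1
    ASSESS_HAZARDS = 2
    DEVELOP_CONTROLS_AND_DECIDE = 3
    IMPLEMENT_CONTROLS = 4
    SUPERVISE_AND_EVALUATE = 5


def next_task(task: RiskTask) -> RiskTask:
    """Return the task that follows `task`.

    Assumption: after Task 5 the process returns to Task 1, so newly
    found hazards are identified and assessed again.
    """
    if task is RiskTask.SUPERVISE_AND_EVALUATE:
        return RiskTask.IDENTIFY_HAZARDS
    return RiskTask(task + 1)


# Tasks 1 and 2 make up the risk assessment; Tasks 3 through 5 manage the risk.
RISK_ASSESSMENT = {RiskTask.IDENTIFY_HAZARDS, RiskTask.ASSESS_HAZARDS}
```

Starting at Task 1 and calling `next_task` five times visits every task once before returning to hazard identification.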
Tasks 1 and 2 make up the risk assessment. In Task 1, managers and craft/techs identify the failure modes and hazards that may be encountered during operation of the critical assets. Task 2 determines the probability and impact of each failure incident and the resulting loss of operational function.
Tasks 3 through 5 help the manager effectively reduce the occurrence of failure incidents, mitigate their consequences and manage risk. In these steps, managers balance asset failure risks against the costs of performing RBI (risk-based inspection), increased-frequency PM procedures and expanded PdM programs. They also implement the actions required to eliminate unnecessary failure risks during asset operation. The planning, preparation and performance of repair, replacement and preventive maintenance activities are carefully evaluated during these steps along the risk management path. Lastly, control activities are monitored and evaluated for effectiveness, and lessons learned are collected for use by others.
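The balance between failure risk and control cost can be illustrated with a simple expected-cost comparison. This technique is a stand-in chosen for illustration, not a method the article prescribes; the function names and all figures in the example are hypothetical, and the sketch assumes that a control's only benefit is a lower failure rate.

```python
def expected_failure_cost(failures_per_year: float, cost_per_failure: float) -> float:
    """Expected annual cost of failure for one asset (failure rate x consequence)."""
    return failures_per_year * cost_per_failure


def control_is_worthwhile(
    failures_per_year_before: float,
    failures_per_year_after: float,
    cost_per_failure: float,
    annual_control_cost: float,
) -> bool:
    """True when the failure cost a control avoids exceeds what the control costs.

    The control could be added RBI, a higher-frequency PM or an expanded PdM
    route; here it is modeled only by how much it lowers the failure rate.
    """
    avoided = (expected_failure_cost(failures_per_year_before, cost_per_failure)
               - expected_failure_cost(failures_per_year_after, cost_per_failure))
    return avoided > annual_control_cost
```

For example, a control that cuts failures on an asset from 2 to 0.5 per year, where each failure costs 40,000, avoids 60,000 a year. That would justify a control costing 15,000 a year, but not one costing 70,000.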
Applying the basic risk management model
1. Identify the failure hazards. . .
A hazard is an actual or potential condition in which a failure results in loss of an operating function, or in damage to or loss of an asset and its related components, within an operational environment.
2. Assess the failure hazards. . .
Asset risk is defined as the combination of the probability of failure and the consequences (severity) of that failure. Probability is the likelihood of a failure occurring, and severity is a measure of the failure's impact on the operating functions of the plant, site or installation. Calculated asset risk increases with a higher probability of failure and with greater impact on the operation.
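Risk Assessment Tool 1A, described below, combines the two factors by multiplication. Assuming both probability P and severity S are scored on five-level scales from 1 to 5, the asset risk R can be written as:

```latex
R = P \times S, \qquad P, S \in \{1, 2, 3, 4, 5\} \;\Longrightarrow\; 1 \le R \le 25
```

Because both factors are positive, raising either one while the other stays fixed raises R, which is why a higher probability or a greater impact increases calculated risk.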
A risk assessment requires each potential failure incident, hazard or mode to be evaluated in relation to the probability of an incident occurring and the severity (or impact upon the plant, site or installation) of that incident or failure.
This activity is heavily dependent upon the use of asset history, lessons learned in the field, intuitive analysis, the manager’s and craft/tech’s experience and sound judgment. Incomplete, inaccurate, undependable or contradictory information creates doubt and uncertainty when determining the probability and severity of a failure incident. Assessment of risk requires good judgment.
Fig. 2 and Fig. 3 show two tools that can be used to perform an asset risk assessment. Risk Assessment Tool 1A is a simplified matrix in which the manager or craft/tech enters the estimated degree of severity and probability for each failure incident or hazard. A numerical value has been assigned to each of the standardized descriptors. Multiplying the severity number by the probability number yields a product between 1 and 25. Comparing that number to the accompanying key indicates the estimated risk of failure: the larger the number, the higher the risk.
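The Tool 1A calculation can be sketched in a few lines of Python. The actual descriptor values and key are defined in the figure, so the scores below (5 for the worst level down to 1) and the band boundaries in `RISK_KEY` are assumptions for illustration only; `risk_score` and `risk_level` are hypothetical names.

```python
# Assumed numerical values for the standardized descriptors (5 = worst).
SEVERITY = {"Catastrophic": 5, "Critical": 4, "Marginal": 3, "Negligible": 2, "None": 1}
PROBABILITY = {"Frequent": 5, "Likely": 4, "Occasionally": 3, "Seldom": 2, "Unlikely": 1}

# Hypothetical key: (lowest score in the band, label), checked from the highest band down.
RISK_KEY = [(15, "Extremely high"), (10, "High"), (5, "Medium"), (1, "Low")]


def risk_score(severity: str, probability: str) -> int:
    """Tool 1A: multiply the severity number by the probability number (1..25)."""
    return SEVERITY[severity] * PROBABILITY[probability]


def risk_level(score: int) -> str:
    """Compare the score with the key; a larger number means a higher risk."""
    for lowest, label in RISK_KEY:
        if score >= lowest:
            return label
    raise ValueError(f"score {score} is outside 1..25")
```

With these assumed values, a Critical failure that occurs Occasionally scores 4 x 3 = 12, which falls in the hypothetical "High" band.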
Risk Assessment Tool 1B is a similarly designed table that can be used by the manager or craft/tech in much the same way. First, estimate the level of severity and probability of occurrence, then read right and up. The point where the failure severity row and probability of occurrence column intersect will define the level of failure risk for a particular asset.
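Tool 1B needs no arithmetic: the risk level is simply the cell where a row and a column meet. The sketch below models that lookup; the letter grades in `RISK_TABLE` are hypothetical placeholders, since the real cell values appear only in the figure.

```python
SEVERITY_ROWS = ["Catastrophic", "Critical", "Marginal", "Negligible", "None"]
PROBABILITY_COLUMNS = ["Frequent", "Likely", "Occasionally", "Seldom", "Unlikely"]

# Hypothetical cell contents: E = extremely high, H = high, M = medium, L = low.
# Rows follow SEVERITY_ROWS; columns follow PROBABILITY_COLUMNS.
RISK_TABLE = [
    ["E", "E", "H", "H", "M"],  # Catastrophic
    ["E", "H", "H", "M", "L"],  # Critical
    ["H", "M", "M", "L", "L"],  # Marginal
    ["M", "L", "L", "L", "L"],  # Negligible
    ["L", "L", "L", "L", "L"],  # None
]


def lookup_risk(severity: str, probability: str) -> str:
    """Tool 1B: return the cell where the severity row meets the probability column."""
    row = SEVERITY_ROWS.index(severity)
    column = PROBABILITY_COLUMNS.index(probability)
    return RISK_TABLE[row][column]
```

A lookup table lets each cell be set by judgment, so the ranking does not have to follow a strict product of the two scores.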
Defining the levels of probability of failure occurrence:
Frequent - Failures happen often.
Likely - A failure will occur several times during the functional life of the asset.
Occasionally - Sporadic incidents of failure.
Seldom - Remote chance of an isolated failure.
Unlikely - An asset failure is not impossible but highly improbable.
The degrees of failure severity are:
Catastrophic - Total loss of asset functionality. Implied threat to related assets, systems and property.
Critical - Significant reduction in asset, system, or plant operational capability. Significant collateral damage to adjacent assets, components, property, or environmental systems.
Marginal - Possibility of minor impact upon plant, site, or installation operational activities and requirements.
Negligible - Little or no impact on asset, system, or plant operation or capability. Little or no collateral asset, property, or environmental damage.
None - No impact.
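The descriptor definitions above form two ordered scales. A minimal sketch of them as Python `IntEnum` types follows; the 1 to 5 scores are assumed (the article's figure holds the actual values), and `parse_level` is a hypothetical helper for reading a descriptor as written on an assessment worksheet.

```python
from enum import IntEnum


class Probability(IntEnum):
    """Probability of failure occurrence (assumed scores, 5 = most likely)."""
    UNLIKELY = 1      # not impossible but highly improbable
    SELDOM = 2        # remote chance of an isolated failure
    OCCASIONALLY = 3  # sporadic incidents of failure
    LIKELY = 4        # several failures during the asset's functional life
    FREQUENT = 5      # failures happen often


class Severity(IntEnum):
    """Degree of failure severity (assumed scores, 5 = most severe)."""
    NONE = 1          # no impact
    NEGLIGIBLE = 2    # little or no impact or collateral damage
    MARGINAL = 3      # possible minor impact on operations
    CRITICAL = 4      # significant loss of capability or collateral damage
    CATASTROPHIC = 5  # total loss of asset function; threat to related assets


def parse_level(enum_type, text: str):
    """Convert a descriptor written on a worksheet (any case) to its enum member."""
    return enum_type[text.strip().upper()]
```

Because the members are ordered integers, levels can be compared directly (for example, `Severity.CRITICAL > Severity.MARGINAL`), and an unknown descriptor raises a `KeyError` instead of being silently scored.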
The risk assessment tool examines potential failure occurrences in terms of probability and severity to determine the level of risk.
3. Develop controls & make risk decisions. . .
After identifying and assessing each failure hazard, managers and craft/techs must develop one or more risk controls that will aid in avoiding, preventing or reducing the risk (probability and/or severity) of a failure incident. While developing controls, managers must consider the reason for the failure, not just the incident or its impact on asset functions and operation.
Failure controls generally fall into three categories: risk avoidance, reliability-based technology and educational.
Risk avoidance may include engineering changes or redesign of the asset installation and operating profile to remove the risk threat from operation and use of the equipment. Reliability-based activities can include optimized PM procedures, PdM technologies, RCFA (root cause failure analysis) and SFMEA (simplified failure mode effects analysis). RBI (risk-based inspection) is an application of basic risk principles to managing inspection programs for critical assets. Educational and training controls provide knowledge- and skill-based programs to ensure that implemented procedures and tasks are performed to specific standards.
To make a meaningful risk decision, conduct a risk assessment soon after the program controls referenced above have been developed and implemented. The results then support the decision about how much risk the manager is willing to accept in operating a critical asset or system.
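The reassess-then-decide step can be sketched as follows. The article says only that controls reduce probability and/or severity; modeling each control as lowering those scores by whole levels is an assumption, as are the 1 to 5 scales and the names `assess_with_controls` and `risk_decision`.

```python
def assess_with_controls(severity: int, probability: int,
                         controls: list[tuple[int, int]]) -> int:
    """Reassess the 1..25 risk score once the proposed controls are in place.

    severity and probability are the starting 1..5 levels. Each control is a
    hypothetical (severity_reduction, probability_reduction) pair in levels;
    neither level is allowed to drop below 1.
    """
    for severity_cut, probability_cut in controls:
        severity = max(1, severity - severity_cut)
        probability = max(1, probability - probability_cut)
    return severity * probability


def risk_decision(score: int, acceptable_score: int) -> str:
    """Accept the risk only if the reassessed score is within the manager's limit."""
    return "accept" if score <= acceptable_score else "develop further controls"
```

For instance, a hazard scored 4 x 4 = 16 that receives a PdM control judged to cut probability by two levels reassesses to 4 x 2 = 8, which a manager whose limit is 9 could accept.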
A key activity of this task is to specify who will use each control, and what, where, when and how it is to be used.
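One way to make sure every control answers those five questions is to record each one in a structured form. The `ControlPlan` record and `missing_fields` check below are a hypothetical sketch, not a CMMS schema.

```python
from dataclasses import dataclass, fields


@dataclass
class ControlPlan:
    """Who, what, where, when and how for one risk control (hypothetical record)."""
    who: str    # responsible person or craft
    what: str   # the control itself, e.g. a PM task or PdM route
    where: str  # asset or location
    when: str   # frequency or trigger
    how: str    # procedure or technology used


def missing_fields(plan: ControlPlan) -> list[str]:
    """List any of the five questions that the plan leaves blank."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]
```

A plan whose `how` field is empty would be flagged as `["how"]` before the control is released for implementation.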
4. Implement risk controls. . .
The number of higher-failure-risk assets is generally a small percentage of total plant assets. Implement the new or additional PM and PdM tasks when and where needed and focus efforts on the most critical items. Institute a formalized proactive planning and scheduling function to ensure all resources required to perform the newly implemented activities will be available. The site CMMS should be configured to record and report KPIs (key performance indicators) required for implementation and continuance of a risk reduction or avoidance program. Do not discount or neglect interaction with MRO. Improve the skills of the workforce through asset, maintenance and reliability training.
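Because the higher-risk assets are usually a small share of the total, a short filter-and-sort is often enough to decide where new PM and PdM tasks go first. In this sketch the asset tags, scores and threshold are hypothetical, and `prioritize` is an illustrative name.

```python
def prioritize(risk_scores: dict[str, int], threshold: int) -> list[str]:
    """Return asset tags whose 1..25 risk score meets the threshold, highest first."""
    high_risk = [tag for tag, score in risk_scores.items() if score >= threshold]
    return sorted(high_risk, key=lambda tag: risk_scores[tag], reverse=True)
```

For example, with scores of 20, 4, 12, 2 and 16 across five assets and a threshold of 12, only three assets are returned, ordered 20, 16, 12, so planning and scheduling effort lands on the most critical items first.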
5. Supervise & evaluate. . .
The manager is responsible for evaluating the effectiveness of the implemented controls and programs in reducing or removing the failure potential.
Managers and first-line supervision must ensure that subordinates understand how to execute risk controls. Craft/techs continuously assess risks during the workday and should maintain communication with managers. Both groups should guard against complacency to ensure that risk control and mitigation standards are not relaxed, circumvented or violated.
Managers must continuously supervise and monitor asset PM/PdM and other inspection activities to ensure they are effective and keep risks at an acceptable level. Use the asset history in the site CMMS to determine which controls failed and why. In some cases, a completely different procedure may prove more effective and should be implemented instead.
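As a sketch of mining CMMS history for controls that are not working, the function below counts failures that occurred after each asset's control was put in place. The export format (a start date per asset tag and a list of asset/failure-date pairs) and the name `failures_after_control` are assumptions; real CMMS exports vary.

```python
from collections import Counter
from datetime import date


def failures_after_control(
    control_start: dict[str, date],
    failure_history: list[tuple[str, date]],
) -> Counter:
    """Count failures per asset that occurred on or after its control start date.

    control_start maps a hypothetical asset tag to the date its control began.
    failure_history holds (asset tag, failure date) pairs exported from the CMMS.
    Assets that keep failing after the control started are candidates for a
    different procedure.
    """
    counts = Counter()
    for tag, failed_on in failure_history:
        started = control_start.get(tag)
        if started is not None and failed_on >= started:
            counts[tag] += 1
    return counts
```

Failures before a control began, and failures on assets with no control, are ignored, so the count reflects only the period the control was supposed to cover.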
The level of failure risk for each asset remaining after implementation of best practice controls is called residual risk. As new controls for failure hazards are identified and selected, a risk assessment is again performed and levels of asset risk revised. MT