Achieving Maximum Benefit from CMMS Data

Reliability analyses conducted on groups of plant equipment provide insight not readily apparent to the casual observer.

Wall Street values a company's stock based upon the company's ability to predict production and earnings. Consistently low estimates are just as bad as overestimates. Persistent production reliability problems can have a direct impact on a company's market share and revenues. Good, reliable production demonstrates control. The question investors ask is: "Does management have control of the business?"

Investors will put their faith (and their money) into companies that demonstrate the best control. Meeting production targets is a very important part of demonstrating this control. Highly reliable plants consistently meet production goals, which gives the perception that the plant is well run. Reliability problems can erode this perception. The need for reliable operations can be summed up in one word: Predictability. Predictability is one of the most sought after, yet rarely achieved, aspects of modern business. While better predictability is the goal, it is not clear how to achieve it.

New decision-making schemes must accompany advances in information technology. Risk, reliability, projections, and experience must be brought together to understand current business needs and future functions. Investment in technology must be used to satisfy those needs. These new processes often require a paradigm shift in order to be successful. This paradigm shift necessitates that we implement new processes and modify or eliminate old processes.

How do we get people to accept, comprehend, and use statistical techniques applied to computerized maintenance management systems (CMMS) data? Some will tell you, "Ah, that data is just garbage!" Others will say, "I just don't have the time to get to it." Still others contend that "even if I had the time, I don't know what data to use in order to understand reliability."

Resistance to change is quite often the largest barrier to successful implementation of new technologies and procedures. People resist change and tend to trust familiar practices more than new ones.

Reliability analysis
Reliability analysis is a business practice that will make your business more competitive. The goal of analyzing installed assets is to uncover the reasons, symptoms, causes, and effects of equipment unreliability to get a handle on unexpected equipment failures.

An effective reliability program rests on one fundamental principle: future probability of failure can be accurately predicted using previous failure data. If the decision-makers in the company do not accept this concept, there is virtually no probability of success for the program. Success can range from solving a few problems to the tracking of reliability of all major assets and allowing reliability results to influence the "repair-overhaul-replace" decisions that are made on a daily basis. This is not to imply that this should be the only criterion used to make these decisions, but that reliability analysis can be used to modify this approach to improve reliability.

Some companies track failure data separately, simply to report on reliability. In some industries, regulatory agencies require failure tracking and some even require adherence to limits on failure rates of assets installed in their facilities. This kind of regulatory requirement ensures that reliability problems get addressed as a regular part of doing business.

CMMS were not necessarily designed to capture and report reliability data. These systems were optimized to manage, organize, and plan complex maintenance schedules. Because these systems were not originally intended for reliability tracking purposes, some people argue that reliability analysis on this data is invalid. While this is true in some cases, many plants have excellent record keeping in their CMMS, and analysis conducted on this data can be very helpful.

Data integrity is a key issue in using CMMS data for reliability analysis. The analysts need to know the data collection practices used to gather the CMMS data. Is the data submitted in a consistent fashion or is each work order subject to a high level of variability? Variability in data capturing is the enemy of good reliability analysis.

What data is needed?
A basic assumption of reliability theory is that equipment "times to failure" can be modeled with statistical analysis techniques. The first step in this modeling is to create a set of data based upon failure records for the equipment under study. Some CMMS capture the failure data needed to conduct the reliability analysis. Work orders often contain a vast amount of information, including:

Asset or equipment ID

Asset type

Manufacturer

Model number

Event type (PM, repair, etc.)

Description of work

Out of service date or failure date

Maintainable item or failed part

These data fields are used to extract failures against individual assets, manufacturers, or asset types. This is important when trying to model failures from the same cause (Weibull) or different causes (growth). Once we have created the set of data that describes the failures, statistical tools are applied to reveal additional information about the nature and cause of failure, the expected current reliability of equipment, the future reliability of the equipment if we solve the current problem, prediction of future failure time if no action is taken, and the evolution of a failure.
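As a sketch of that first step, the following Python fragment builds a failure data set from hypothetical work order records (the field names are illustrative, not from any particular CMMS) and computes the times between failures for one asset:

```python
from datetime import date

# Hypothetical work order records; field names mirror the ones listed above.
work_orders = [
    {"asset_id": "P-101", "event_type": "repair", "failure_date": date(1998, 1, 5)},
    {"asset_id": "P-101", "event_type": "PM",     "failure_date": date(1998, 2, 1)},
    {"asset_id": "P-101", "event_type": "repair", "failure_date": date(1998, 3, 20)},
    {"asset_id": "P-102", "event_type": "repair", "failure_date": date(1998, 2, 10)},
]

def failure_dataset(records, asset_id):
    """Keep only true failure events (repairs) for one asset, sorted by date.
    PM and other non-failure events are excluded from the data set."""
    failures = [r["failure_date"] for r in records
                if r["asset_id"] == asset_id and r["event_type"] == "repair"]
    return sorted(failures)

# Times between failures, in days, for the asset under study.
dates = failure_dataset(work_orders, "P-101")
tbf = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
print(tbf)  # [74]
```

The key design point is the event-type filter: mixing preventive maintenance events into the failure set would understate the true times to failure.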

CMMS offer many ways to collect data about maintenance activities. In many systems, there are often areas for cost data, spare parts, and other fields to capture comments and descriptions. There are often many date fields that describe when work is scheduled to start, actually started, scheduled to be completed, and actually completed. When using CMMS data to perform failure analysis, care must be taken to use the proper data. In order to understand what data is available, definitions of each field need to be understood by the analyst.

When a piece of equipment fails in service, a sequence of events occurs. The same sequence happens, in most cases, independent of the CMMS used; listed chronologically, it goes like this:

The item fails

Someone notices that the item has failed

Someone contacts maintenance or enters a work request into the CMMS

The item is scheduled to be repaired

The repairs are conducted

The item is tested and made available for service

The item is returned to service.

While there are variations on this process, this list describes, in a generic sense, how a failed item is recognized and repaired. The CMMS entry noted above is the worker's first interaction with the CMMS, and it may or may not coincide with the actual date and time of the failure. Reliability analysts need to keep this in mind when extracting data from the CMMS. Analysts often assume that the delay between the failure and its entry into the system is short compared with the life of the equipment. For most analyses, this is a good assumption, especially for critical equipment. When needed, more accurate failure times sometimes can be extracted from process data or operations logs.
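A simple way to encode that practice in code is to prefer the recorded out-of-service date when it exists and fall back to the work request date otherwise. This is a minimal sketch with illustrative field names, not the schema of any particular CMMS:

```python
from datetime import datetime

# A hypothetical record carrying the date fields a CMMS typically offers.
wo = {
    "failure_date":   None,                       # out-of-service date, often left blank
    "reported_date":  datetime(1998, 1, 19, 8),   # when the work request was entered
    "completed_date": datetime(1998, 1, 21, 16),  # when repairs were finished
}

def best_failure_time(record):
    """Prefer the recorded failure date; fall back to the report date,
    on the assumption that the reporting delay is short relative to
    equipment life (reasonable for critical equipment)."""
    return record["failure_date"] or record["reported_date"]

print(best_failure_time(wo))  # 1998-01-19 08:00:00
```

Note that the completion date is deliberately not used: time between repair completions measures the maintenance process, not the life of the equipment.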

Reliability analysis of the data
What constitutes a reliability analysis? There are many different ways to conduct a reliability analysis. Each method provides a slightly different answer that needs to be interpreted differently. For analyzing reliability data, we suggest the following five steps:

Step 1. Determine the goal of the reliability analysis.

Step 2. Extract the necessary failure data from the work order history using a query.

Step 3. If a growth model is desired, build a query of the data that identifies the population of equipment that you want to model.

Step 4. Build the necessary reliability documents to satisfy the goal of the reliability analysis.

Step 5. Interpret the results and implement a corrective action if possible.

The goal of the reliability analysis
Each reliability analysis should have a goal. The goal helps to decide which tools to use. Unfortunately, analysts sometimes use the wrong tool for the goal they wish to achieve. Two types of reliability modeling techniques are popular in industry today: distribution analysis (Weibull, normal, lognormal, and exponential distributions) and growth modeling.

In some cases, both a Weibull analysis and a growth model need to be constructed to get a complete picture of what is going on.

Weibull analysis
Weibull is by far the most popular approach for failure data analysis because the probability density function adapts itself to the population. A Weibull analysis provides information that can help an analyst understand whether assets are experiencing end-of-life failures, infant mortality failures, or simply random failures with no discernible pattern. Weibull analysis results also can be used to estimate the time until a certain level of unreliability is reached.
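To make this concrete, here is a minimal Python sketch of a Weibull fit by median-rank regression (using Benard's approximation for the median ranks), one common hand method; the failure times are invented for the example:

```python
import math

def weibull_fit(times):
    """Median-rank regression on the linearized Weibull CDF:
    ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)."""
    t = sorted(times)
    n = len(t)
    x = [math.log(ti) for ti in t]
    # Benard's approximation: F_i ~ (i - 0.3) / (n + 0.4)
    y = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    beta = num / den                       # shape parameter (slope)
    eta = math.exp(xbar - ybar / beta)     # characteristic life (from intercept)
    return beta, eta

# Illustrative times to failure, in days.
beta, eta = weibull_fit([45, 60, 80, 120, 200])
# beta < 1 suggests infant mortality; beta near 1, random failures;
# beta > 1, wear-out (end-of-life) failures.
```

In practice a maximum-likelihood fit or a commercial package would be used, but the shape-parameter interpretation is the same.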

Growth modeling
In trying to understand the overall reliability of a set of equipment failure data, a popular technique called reliability growth modeling (also known as AMSAA growth, Duane-AMSAA, or Crow-AMSAA) is often used. Growth modeling is valid across all failure causes and can be set up to include assets that have not yet experienced failures. It is the ideal tool for understanding the overall reliability of a population of equipment.

The growth model produces two parameters: beta and lambda. A beta value less than 1 indicates improving mean time between failures (MTBF). When beta is greater than 1, MTBF is decreasing and reliability is deteriorating. Beta and lambda also can be applied to a formula that estimates the time to the next failure. These estimates have proven to be very accurate when the data set is complete and the failure dates are accurate.
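As an illustration, here is a Python sketch of the standard Crow-AMSAA maximum-likelihood estimates for a time-terminated data set; the cumulative operating times are invented, and the next-failure estimate simply solves the cumulative-failure equation N(t) = n + 1 for t:

```python
import math

def crow_amsaa(failure_times, T):
    """MLE for the Crow-AMSAA (NHPP) model, time-terminated at T.
    Cumulative failures are modeled as N(t) = lam * t**beta."""
    n = len(failure_times)
    beta = n / sum(math.log(T / t) for t in failure_times)
    lam = n / T ** beta
    return beta, lam

def next_failure_time(beta, lam, n):
    """Solve lam * t**beta = n + 1 for the expected (n+1)th failure time."""
    return ((n + 1) / lam) ** (1 / beta)

# Hypothetical cumulative operating times (days) at each failure,
# with observation ending at day 400.
times = [50, 120, 210, 260, 300, 330, 355, 375]
beta, lam = crow_amsaa(times, 400)
# Here the interarrival times are shrinking, so beta comes out above 1:
# failures are arriving faster and MTBF is deteriorating.
```

A usage note: `next_failure_time(beta, lam, len(times))` gives the projected cumulative time of the next failure, which is the estimate the text refers to.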

Accuracy of reliability data
Reliability programs use historical data to predict the future. When doing this, plant personnel are immediately faced with questions about the accuracy of the data being used to make the predictions. This concern arises whenever future business decisions must be based on historical data: is the data accurate? Two features of reliability analysis help here.

The first is that the analysis itself can be used to sort out inaccurate data. Poor Weibull curve fit results point to inaccurate or "dirty" data. Mixed mode data can be sorted out through the visual inspection of the curve fit. Finally, significantly changing MTBF, detected through growth analysis, gives additional clues to a lack of data integrity.

The second feature of reliability analysis that helps overcome data inaccuracies is that the most important piece of information used for the reliability analysis is the failure date and time. This data for the most part is often quite accurate because it requires no interpretation by the user and, in many systems, is captured automatically.

Case history
During a routine inspection of maintenance cost data, a reliability specialist came across a disturbing finding: a Pareto chart of pump costs by location ID revealed a particularly alarming result.

The graph, built with Meridium software, showed that maintenance costs for the pump at location Pump-1000 exceeded $300,000 in 1998. This result triggered an investigation by the reliability personnel. The first step was to understand the cause of such high costs for the asset at this location. Another query was constructed to extract the total maintenance costs for the entire period for which data existed, going back to 1990. The total maintenance cost over the 9-year period was $445,891.

Upon examining the results of the work order history query, the company found that $329,800 was associated with a single event: one work order conducted on January 19, 1998. The client investigated further to understand the cause. The history query revealed that 68 work orders had been written against this location over the 9-year period from January 1, 1990, to December 31, 1998. Of the 68 work orders, 57 were described as "routine repairs," indicating that the asset had been experiencing a high failure rate in addition to high costs. The work order history query was refined to extract only the routine repairs, and a Weibull analysis was then conducted on the resulting data set.

The reliability analysis showed a rather low MTBF of 60 days. Typical MTBF values for this type of asset (centrifugal pump) normally exceed 700 days in practice, and 1,458 days according to industry reliability databases. The analysis produced a Weibull shape parameter beta of 0.84, which indicates an infant mortality failure mode.

Since infant mortality is not an expected failure mode for this type of machinery, the failures can be attributed to a procedural deficiency rather than a design deficiency. Further investigation revealed that the cause of failure was an inadequate lubrication program for this machine. The analysis was critical in bringing attention to the problem so that mitigating tasks could be put into place to prevent recurrence of this type of failure.

The reliability analysis bears out the results of the investigation as a procedural problem for this pump. The Weibull results, while not used in this case to solve the problem, provide additional information that further defines and illustrates the severity of the problem. This case shows that procedural problems can be very costly and disruptive to an organization. Identifying and eliminating these types of problems through regular analysis can lead to marked improvements in efficiency and cost performance.

The data contained within the CMMS can be used to describe reliability problems with machinery. Using statistical reliability analysis techniques, managers can identify the general area that causes the problem. The results of the reliability analysis can be compared with the results of analyses done on other equipment from the same site, analyses on like equipment from other sites, and industry data.

In order to manage the reliability of equipment, you need to measure it. A good source of data for this analysis can be the CMMS, if the data can be manipulated to provide useful results. By conducting these types of analyses on problem areas, the organization can take appropriate corrective measures before the next costly in-service failure and its unexpected downtime occur. This means better predictability for the plant, better performance for the company, and higher stock prices for investors. MT

Bob Matusheski is senior consultant at Meridium, Inc., supplier of enterprise reliability management software, Roanoke, VA; (540) 344-9205