# The Fundamentals: How To Begin Measuring Maintenance Effectiveness: Part II

#### Successful maintenance organizations track a number of KPIs. This article continues a discussion of some of the best ones to use.

As we noted in the first installment of this series, Key Performance Indicators (KPIs) are simply the metrics an organization selects to monitor and measure process and operational performance. The first step toward managing by metrics is collecting meaningful, pertinent data, which is then converted into a KPI that can be evaluated and tracked over time. Unfortunately, there is no single formula into which all of your data can be loaded, although we will look at one that comes close. For that reason, most successful organizations track several KPIs.

The following examples are representative of the types of metrics that can help you to manage your particular process. For the sake of these KPI illustrations, let's say that we're watching a machine center that is capable of producing one widget every five seconds, or 12 per minute. The plant runs eight-hour production shifts and does not stop for lunch or breaks. On the production shift in question, our widget-maker experienced 115 minutes of downtime during the scheduled shift and produced 4015 widgets while the equipment was running. Additional information will be given as needed in the examples.

Production Uptime Percentage. Once thought of as the gold standard of KPIs, Production Uptime Percentage is still quite popular today, despite its limitations when it is not used in conjunction with other, more specific measures. It is one of the easiest metrics to calculate, as the following formula shows:

Actual Uptime Minutes ÷ Scheduled Runtime Minutes × 100 = Production Uptime Percentage

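Subtracting the shift's 115 minutes of downtime from its 480 scheduled minutes leaves 365 Actual Uptime Minutes, so for our hypothetical widget machine the calculation would be 365 Actual Uptime Minutes ÷ 480 Scheduled Runtime Minutes × 100 = 76% Production Uptime Percentage.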
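For anyone pulling these numbers from a spreadsheet export or a data historian, the calculation is easy to script. The following is a minimal Python sketch; the function name is hypothetical, and it assumes uptime and scheduled runtime have already been measured in minutes.

```python
def production_uptime_pct(actual_uptime_min: float, scheduled_runtime_min: float) -> float:
    """Return Production Uptime Percentage (actual uptime / scheduled runtime x 100)."""
    if scheduled_runtime_min <= 0:
        raise ValueError("scheduled runtime must be greater than zero")
    return actual_uptime_min / scheduled_runtime_min * 100


# Widget machine: an eight-hour (480-minute) shift with 115 minutes of downtime.
scheduled_min = 480
downtime_min = 115
uptime_pct = production_uptime_pct(scheduled_min - downtime_min, scheduled_min)
print(f"Production Uptime: {uptime_pct:.0f}%")  # Production Uptime: 76%
```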

It should be noted that runtime must be measured at a key machine or location in the process, one whose failure would stop the plant's production. Most facilities take the runtime measure at the primary conversion machine center within their process: the machine that actually produces or converts the product for their customers. Regardless of where the uptime measurement is taken, it will have limited value if it does not take the machine center's capacity into account. For purposes of this computation, a stoppage for any reason is to be considered downtime. If our widget machine is capable of producing one unit every five seconds, the downtime counter should begin to accrue idle seconds each time that a five-second increment elapses and a widget doesn't pop out. Otherwise, these idle seconds will never be captured and eliminated.
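One way to capture those idle seconds automatically is to compare each widget's timestamp with the design cycle. The sketch below is illustrative only: it assumes a hypothetical sensor log of the times (in seconds from shift start) at which each widget came off the machine, and it counts any part of a gap longer than one five-second cycle as idle time, so short stops and slow running both show up.

```python
from typing import Iterable


def idle_seconds(output_times_s: Iterable[float], shift_length_s: float,
                 cycle_time_s: float = 5.0) -> float:
    """Accrue idle time whenever a design cycle passes without a widget.

    output_times_s: seconds from shift start at which each widget was produced
    (hypothetical sensor data). For every gap between outputs, including the
    gap from shift start to the first widget and from the last widget to the
    end of the shift, whatever exceeds one design cycle is counted as idle.
    """
    idle = 0.0
    previous = 0.0  # shift start
    for t in sorted(output_times_s):
        idle += max(0.0, (t - previous) - cycle_time_s)
        previous = t
    # Silence after the last widget until the shift ends also counts.
    idle += max(0.0, (shift_length_s - previous) - cycle_time_s)
    return idle


# Widgets at 5, 10 and 15 s, then a 17 s gap (12 idle seconds), widgets at
# 32 and 37 s, then 8 s of silence before a 45 s window ends (3 idle seconds).
print(idle_seconds([5, 10, 15, 32, 37], shift_length_s=45))  # 15.0
```

Whether a partial cycle (say, a 5.2-second gap) should count as 0.2 idle seconds or be rounded to whole cycles is a policy choice; this sketch counts it continuously.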

Uptime percentage can be an effective KPI. Tracked over time, it can reveal trends in your process and serve as a good tool for comparing similar processes across shifts or plants. But you must recognize its limitations.

If you manage strictly by uptime, you may miss production slowdowns or quality issues. Recall our discussion of the definition of failure. If a process ran at 96% uptime yesterday but half of its product had to be remanufactured or discarded because it couldn't meet customer specs, the picture would not be so rosy. Similarly, if a process ran at 96% uptime yesterday but, for some reason, at slower-than-normal speed, the uptime statistic again wouldn't tell the whole story.

Mean Time Between Failure (MTBF). Closely related to Mean Time To Failure (MTTF), a term usually reserved for items that are replaced rather than repaired, MTBF is best used as a machine-specific measure of reliability. As the name implies, it is the average, or mean, of all uptime periods during a given scheduled production period. The formula is:

Sum of Uptime Periods ÷ Number of Uptime Periods = MTBF

Suppose our widget machine is scheduled to run nonstop for 480 minutes. Instead, it runs 63 minutes, jams and goes down for 16 minutes, runs 102 minutes, blows a fuse and is offline for 30 minutes, then runs 200 minutes before smoking a belt and going down for the rest of the shift. The MTBF would be calculated as (63 Minutes + 102 Minutes + 200 Minutes) ÷ 3 Occurrences = 121.7 Minutes MTBF.
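If stoppages are logged as alternating run and down intervals, MTBF falls straight out of the log. In this minimal Python sketch, the log format (a list of state and minutes pairs) is a hypothetical assumption; the figures are the shift described above.

```python
# Hypothetical shift log: (state, minutes) in the order events occurred.
shift_log = [
    ("run", 63), ("down", 16),   # jam
    ("run", 102), ("down", 30),  # blown fuse
    ("run", 200), ("down", 69),  # smoked belt, down for the rest of the shift
]


def mtbf(log: list[tuple[str, float]]) -> float:
    """Mean Time Between Failure: sum of uptime periods / number of uptime periods."""
    uptime_periods = [minutes for state, minutes in log if state == "run"]
    if not uptime_periods:
        raise ValueError("log contains no uptime periods")
    return sum(uptime_periods) / len(uptime_periods)


assert sum(minutes for _, minutes in shift_log) == 480  # log covers the whole shift
print(f"MTBF: {mtbf(shift_log):.1f} minutes")  # MTBF: 121.7 minutes
```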

As stated earlier, this metric is best employed as a tactical tool at the machine center level. It can be used at the mill level, but the information it produces at that scale is diluted and, as such, less useful. Exercise caution when using this number to compare one machine center or industry with another.

Depending on the nature of your business, an MTBF of 121.7 minutes might reflect a fair outing or awful performance. In the widget business, it leaves a great deal of room for improvement. As with Uptime Percentage, MTBF should trend upward over time. To learn more about MTBF, visit www.apcmedia.com/salestools/VAVR-5WGTSB_R0_EN.pdf

Mean Time To Repair (MTTR). Often used in conjunction with MTBF (a measure of reliability), MTTR is a measure of severity. Simply put, it is the average time required to repair or restore a failed machine or process component. Like its sister metric, MTTR is best employed at the machine or component level. The formula to compute this metric is:

Sum of Downtime Periods ÷ Number of Downtime Periods = MTTR

Using the same example (a machine scheduled to run nonstop for 480 minutes that instead runs 63 minutes, jams and goes down for 16 minutes, runs 102 minutes, blows a fuse and is offline for 30 minutes, then runs 200 minutes before smoking a belt and going down for the remaining 69 minutes of the shift), MTTR would be calculated as (16 Minutes + 30 Minutes + 69 Minutes + 21 Minutes) ÷ 3 Occurrences = 45.3 Minutes MTTR. The extra 21 minutes is the time it took after the shift ended to finish the belt repair, which is why four downtime figures are divided by only three occurrences. Note that the downtime for MTTR didn't end with the shift: for this measure to be valid, all repair time must be counted, not just the repair time that occurs during scheduled runtime.
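Because repair time can run past the end of the shift, the MTTR input is better taken from repair or work-order records than from the production log alone. A minimal sketch, assuming a hypothetical list of total repair minutes per failure:

```python
# Total repair minutes per failure, including any time after the shift ended.
repair_minutes = [
    16,       # jam
    30,       # blown fuse
    69 + 21,  # smoked belt: 69 minutes in the shift plus 21 minutes afterward
]


def mttr(repairs: list[float]) -> float:
    """Mean Time To Repair: sum of downtime periods / number of downtime periods."""
    if not repairs:
        raise ValueError("no repairs recorded")
    return sum(repairs) / len(repairs)


print(f"MTTR: {mttr(repair_minutes):.1f} minutes")  # MTTR: 45.3 minutes
```

Dropping the 21 post-shift minutes would understate the result at (16 + 30 + 69) ÷ 3 = 38.3 minutes, which is exactly the error the text warns against.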

As your equipment reliability improves, MTTR should trend downward as the failures become less severe. Still, it is important to carefully examine this metric to ensure that what you're actually measuring is NOT your millwrights' increasing expertise in repairing emergency breakdowns due to the constant practice they receive. An interesting article on MTTR can be found at http://en.wikipedia.org/wiki/Mean_time_to_repair.

Overall Equipment Effectiveness (OEE). This exceptionally useful machine-specific KPI can paint a very accurate picture of a machine's general health. That's because OEE takes into account three key factors of machine performance: Availability, Performance and Quality. These factors combine to produce the following formula for OEE:

Availability × Performance × Quality × 100 = OEE

where each factor is expressed as a decimal fraction (for example, 76% = 0.76).

Availability compares actual machine uptime to scheduled runtime:

Actual Machine Uptime ÷ Scheduled Runtime × 100 = Availability Percentage

In our earlier example, although the machine was scheduled to run nonstop for 480 minutes, it incurred 115 minutes of total downtime during the scheduled runtime. This calculation would be 365 Minutes Actual Uptime ÷ 480 Minutes Scheduled Runtime × 100 = 76% Machine Availability. (Note that this factor is essentially the same as Production Uptime Percentage.)

The Performance factor is determined by looking at how fast a machine produces product versus its production capability. This formula is:

(Actual Widgets Produced ÷ Actual Runtime Minutes) ÷ Designed Capability per Minute × 100 = Performance

In the previous example, the machine is designed to produce one widget every five seconds, or 12 per minute, which would result in 5760 widgets during an uninterrupted eight-hour shift. Our machine ran 365 minutes and produced 4015 widgets in that time, an average of 11 per minute. In this case, the calculation would be (4015 Widgets ÷ 365 Uptime Minutes) ÷ 12 Widgets Per Minute Design Capacity × 100 = 91.7% Performance.

The Quality factor is found using the following formula:

Good Widgets ÷ Total Widgets × 100 = Quality

For the same machine, assume that 97 of the 4015 widgets manufactured during the shift were not of sufficient quality to be shipped, leaving 3918 good widgets. Accordingly, the computation would be 3918 Good Widgets ÷ 4015 Total Widgets × 100 = 97.6% Quality.

Once we have derived the three separate components of OEE, we can calculate it. In this case, we have 76% Availability × 91.7% Performance × 97.6% Quality = 68.0% OEE. This tells us that the plant ran fairly well when it was running, but it wasn't running enough. An OEE of 68.0% may be respectable in some industries and poor in others, but as with the other KPIs we have studied, the number cannot become the point of the exercise.
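The three factors can be computed together from raw shift data. The Python sketch below is a minimal illustration; the `ShiftData` structure and its field names are hypothetical. It also shows a useful cross-check: because uptime and total output cancel when the factors are multiplied, OEE equals the good widgets divided by the widgets the machine could have made over the full scheduled shift (3918 ÷ 5760).

```python
from dataclasses import dataclass


@dataclass
class ShiftData:
    scheduled_min: float        # scheduled runtime, minutes
    downtime_min: float         # all stoppages during scheduled runtime, minutes
    design_rate_per_min: float  # designed capability, units per minute
    total_units: int            # everything produced
    rejected_units: int         # units that could not be shipped


def oee_factors(s: ShiftData) -> tuple[float, float, float, float]:
    """Return (availability, performance, quality, oee) as decimal fractions."""
    uptime_min = s.scheduled_min - s.downtime_min
    availability = uptime_min / s.scheduled_min
    performance = (s.total_units / uptime_min) / s.design_rate_per_min
    quality = (s.total_units - s.rejected_units) / s.total_units
    return availability, performance, quality, availability * performance * quality


shift = ShiftData(scheduled_min=480, downtime_min=115, design_rate_per_min=12,
                  total_units=4015, rejected_units=97)
a, p, q, oee = oee_factors(shift)
print(f"Availability {a:.1%}, Performance {p:.1%}, Quality {q:.1%}, OEE {oee:.1%}")
# Availability 76.0%, Performance 91.7%, Quality 97.6%, OEE 68.0%
```

Rerunning the function with a different mix of downtime and rejects is a quick way to see how a trade-off between factors moves the total.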

As an example, suppose you make a concerted effort to improve Availability as a path to a higher OEE. Over the course of a month, you raise this factor to 81%, but your Quality factor drops to 95%. Your OEE would be higher, yet your reject rate would have roughly doubled, and you can't sell OEE. In other words, the waste you accumulate by paying attention to the number rather than the process can cost you real dollars at a time when dollars are especially hard to come by.