Managing Reliability by Managing Backlog

Teamwork and the discipline to follow the work management process are critical ingredients to successful backlog management.

Operations uses MRP II, or a similar process, and software to plan and schedule work. Maintenance has a process also; it's called work management. Its tool is the computerized maintenance management system (CMMS). But many organizations fail to see the similarity of these two business processes. They don't realize that they can manage maintenance in much the same way as they manage production. Most of these organizations also fail to understand that they must manage maintenance in order to manage reliability. And further, they must understand the repository of maintenance work called the backlog in order to manage maintenance.

The maintenance backlog is a key component to managing maintenance. It is a valuable source of information about the work management process, information that can be used to make decisions for today and for the future. Many organizations have conducted downsizing or reorganizing activities without understanding their work management process or their workload (i.e., backlog). As a result, organizations frequently find themselves short-handed, over-worked, and ineffective. When asked if the right maintenance work is being completed at the right time with the right resources, management is unable to provide an answer, even though it has a powerful CMMS at its disposal.

Managing the backlog requires that work be identified properly and prioritized by maintenance and operations together. Lowest-cost reliability can be achieved only when the right work is planned, scheduled, and executed in the proper order. The backlog provides an effective way to organize and quantify the workload. For this reason, backlogged maintenance work is not only desirable, but necessary, just as a backlog of production work is necessary for effective production planning.

Backlog management is a logistical challenge easily mastered when fundamental maintenance philosophies are defined, understood, and adhered to by the organization. Organizations that recognize the value of backlog management, thereby actively and successfully managing it, will go a long way toward achieving lowest-cost reliability.

The work management process
The work management process consists of a number of key steps: work identification, work planning, work scheduling, work accomplishment, work documentation, and work analysis and measurement, as outlined in the accompanying chart. Each of these steps must be well managed if the right work is to be done at the right time and with the right resources. Organizations that have worked to improve these steps in their process have positioned themselves well for achieving lowest-cost reliability.

The keystone activity for the first four steps of the work management process is backlog management. With disciplined and effective backlog management, organizations can position themselves to manage proactively, thereby mitigating resource problems weeks in advance, not just days or, worse still, hours before the work is to begin.

Reliability comes at two levels: equipment and human. Human reliability is achieved through an effective work management process. That is to say, the process is defined and followed, teamwork exists among departments, people are held accountable, and there is a continuous effort to solve problems. From human reliability flows equipment reliability. When everyone is upholding their end of the bargain (human reliability is high), equipment reliability problems diminish due to avoidance (problems aren't allowed to occur) and problem solving (those that do occur are solved at their root cause).

Backlog management then contributes to lowest-cost reliability by ensuring the right work is getting done at the right time and with the right resources. In the case of preventive maintenance work, backlog management contributes to problem avoidance. For repair work, it contributes to problem solving.

Key concepts
First, let's begin with a definition of backlog: Backlog is all jobs, regardless of status, that have been identified, but are not yet complete. Thus, a job enters the backlog following work order approval and is removed only after the work is complete or it is deleted, for whatever reason.

This definition of backlog includes repair work as well as preventive tasks. It includes work in the formal work order system as well as the lists kept in control rooms or, worse yet, in people's heads. It includes routine, daily work as well as work to be done during an overhaul or turnaround. It includes maintenance work as well as capital project work that will utilize plant craftspeople. Backlog is all uncompleted maintenance work. Thus, for effective management purposes, all of this work should be captured in one place, ideally a CMMS—a powerful relational database tool that allows maintenance work data to be organized in the best manner to meet the organization's needs.
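
As a minimal sketch of this definition, assume a simplified work order record. The field names, the sample jobs, and the "complete" and "deleted" status values are illustrative and are not taken from any particular CMMS. Under those assumptions, the backlog is every approved job that has been neither completed nor deleted:

```python
from dataclasses import dataclass

# Hypothetical open status values; a real CMMS defines its own codes.
OPEN_STATUSES = {"approved", "planned", "scheduled", "in-work", "hold"}


@dataclass
class WorkOrder:
    number: str
    description: str
    status: str       # an open status, or "complete" / "deleted"
    est_hours: float  # estimated labor hours


def backlog(work_orders):
    """Return every job that is approved but not yet complete or deleted."""
    return [wo for wo in work_orders if wo.status in OPEN_STATUSES]


orders = [
    WorkOrder("WO-1", "Replace pump seal", "planned", 6.0),
    WorkOrder("WO-2", "Quarterly PM on conveyor", "scheduled", 4.0),
    WorkOrder("WO-3", "Repair valve actuator", "complete", 3.0),
    WorkOrder("WO-4", "Install guard rail", "hold", 8.0),
]

open_jobs = backlog(orders)
backlog_hours = sum(wo.est_hours for wo in open_jobs)
print([wo.number for wo in open_jobs])  # ['WO-1', 'WO-2', 'WO-4']
print(backlog_hours)                    # 18.0
```

Note that a job on hold still counts: status does not remove a job from the backlog, only completion or deletion does.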

Think of the backlog as the fuel supply for the work management process. Just like the fuel for a power plant, it must be managed and cared for. Managing the backlog means using the backlog daily as a tool to make decisions—decisions about what to plan, what to schedule, how many craftspeople are needed, when to take equipment out of service, when to use contractors, when to schedule vacations, etc. Caring for the backlog means ensuring that every job is accurate—accuracy of job information, job plans and labor estimates, status coding and routing. Caring for the backlog also means promptly removing jobs when they are completed.

Now think of the backlog as the backbone of the joint prioritization process noted in the accompanying flowchart. Joint prioritization is the continual evaluation and resequencing of work needs. This process begins following job approval (work identification) and ends at job completion (work accomplishment). Joint prioritization is performed by operations and maintenance, resulting in mutual agreement on the right work, at the right time, and with the right resources. The backlog is the source of all job information used during joint prioritization and must be updated to reflect the decisions that are made to ensure its integrity.

The backlog is not the domain of just the maintenance department. Operations is instrumental in feeding the backlog with new work requests. Next, operations should be intimately involved in establishing the order in which jobs will be planned and should have equal say in how they are scheduled. Finally, operations determines when the work performed is acceptable and the job is finished. Increasingly, operations also is involved in the accomplishment of work when it performs preventive maintenance and minor maintenance tasks itself. Thus, the operations department that believes its involvement with a job ends once the work order is submitted is doing the organization and the work management process a grave disservice. As operations becomes more involved, the backlog becomes a part of its daily routine, since it has a vested interest in what gets done and when.

Organizing the backlog
How well an organization manages its backlog demonstrates the effectiveness with which it applies its resources to the highest priority work. But to effectively manage the backlog, the organization first must organize the backlog by breaking it into useful pieces.

The first level of organization addresses the urgency of the job. This is accomplished through the use of a priority coding system. The primary purpose of a priority code is to segregate the emergency work from everything else. This is similar to the triage process used in the treatment of people in a hospital emergency room. True emergencies get immediate attention and thus don't spend much time in the backlog, if at all.

Think about what this means. The first concern of course is that the organization with a lot of emergencies is being highly reactive. This breeds inefficiency and waste. But also consider that work that never shows up in the backlog is work that can't be measured and thus can't be managed. When emergency work is kept to just a few percentage points, this concern is minimal; if it is allowed to grow, the backlog becomes an invalid predictor of future resource needs because of the high variability of emergency work. The more stable the workload, the more predictable the resource needs.

A basic priority coding system may look like this:

  • Priority 1. Emergency, to be started immediately, with little to no formal planning
  • Priority 2. High priority, work to begin within 24-48 hr, to be determined after planning
  • Priority 3. Low priority, to be completed as resources become available, must be planned

It is important to remind ourselves that when a priority code is assigned to a work order, it is done without the benefit of viewing all the work in the backlog. It is a decision based solely on the perceived importance of the job in relative isolation from all other work. Therefore, we use the priority code to organize incoming jobs into two basic buckets, emergencies and nonemergencies, but it serves little value beyond this. As discussed previously, joint prioritization is the preferred process that ultimately determines when a job will get accomplished. Joint prioritization is a key component of backlog management. Work order priority codes alone cannot accomplish this; it takes people (not computers) viewing the entire backlog to achieve joint prioritization.
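
The two-bucket use of the priority code can be sketched as follows. This is an illustration only: it assumes each job is a simple record with a numeric priority (1 meaning emergency, as in the list above) and an estimated labor figure, and it assumes emergency jobs are recorded at all. The field names and sample jobs are hypothetical.

```python
def split_by_priority(jobs):
    """Separate emergencies (priority 1) from all other work."""
    emergencies = [j for j in jobs if j["priority"] == 1]
    others = [j for j in jobs if j["priority"] != 1]
    return emergencies, others


def emergency_share(jobs):
    """Fraction of estimated labor hours that is emergency work."""
    total = sum(j["est_hours"] for j in jobs)
    if total == 0:
        return 0.0
    emergency = sum(j["est_hours"] for j in jobs if j["priority"] == 1)
    return emergency / total


jobs = [
    {"wo": "WO-21", "priority": 1, "est_hours": 2.0},
    {"wo": "WO-22", "priority": 2, "est_hours": 10.0},
    {"wo": "WO-23", "priority": 3, "est_hours": 28.0},
]

emergencies, others = split_by_priority(jobs)
print([j["wo"] for j in emergencies])  # ['WO-21']
print([j["wo"] for j in others])       # ['WO-22', 'WO-23']
print(f"{emergency_share(jobs):.0%}")  # 5%
```

The code deliberately stops at the two buckets. Sequencing the nonemergency work is left to the people doing joint prioritization, not to the priority code.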

The next level of organization breaks the jobs into work types. The primary purpose of this categorization is to divide the work into buckets that will support tactical decisions around using internal craft resources vs contractors. Every organization should have a strategy regarding the use of contractors. Here are the most common types of work around which strategy may differ from one job to the next:

  • Corrective: Day-to-day work (includes emergency work). Corrective work can be accomplished as routine maintenance or in an outage.
  • Proactive: Preventive maintenance (PM) and predictive maintenance (PdM). Proactive work is typically accomplished as nonoutage work.
  • Turnaround or overhaul: Major recurring work (another form of PM/PdM) requiring an outage.
  • Project or capital: Typically work requiring installation of new equipment or modification to current production equipment. More likely to be construction in nature rather than repair and may require an outage.

Beyond work types, the backlog should be broken down by status codes and routing codes. Status codes define what the condition of the job is, such as approved, planned, scheduled, in-work, and hold. Routing codes define who currently has responsibility for the jobs by department or possibly a person's name. Further still, the backlog should be organized by the primary craft group that will be working the job and by the department that "owns" the equipment. Flagging a job as work requiring an outage (lost production) or as safety work, or any other useful designation, is dependent on the organization's need for viewing and managing the backlog.

Proper organization of the backlog permits it to be viewed by many different people in many different ways to meet their daily responsibilities in performing their jobs. In this manner, there are overlapping responsibilities for the backlog. Operations views the backlog from its perspective of operating units or equipment, while maintenance views it from the perspective of the craft group performing the work. When set up properly, this matrixed approach to backlog organization ensures that no jobs "fall through the cracks" and get forgotten.
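
One way to picture the matrixed view is to total the same set of jobs along two different codes. The sketch below assumes each job carries a craft code and an owning-department code. The field names and sample values are hypothetical.

```python
from collections import defaultdict


def hours_by(jobs, field):
    """Total estimated labor hours grouped by one coding field."""
    totals = defaultdict(float)
    for job in jobs:
        totals[job[field]] += job["est_hours"]
    return dict(totals)


jobs = [
    {"wo": "WO-1", "craft": "mechanical", "owner": "Unit 1", "est_hours": 6.0},
    {"wo": "WO-2", "craft": "electrical", "owner": "Unit 1", "est_hours": 4.0},
    {"wo": "WO-3", "craft": "mechanical", "owner": "Unit 2", "est_hours": 10.0},
]

by_craft = hours_by(jobs, "craft")  # the maintenance view
by_owner = hours_by(jobs, "owner")  # the operations view
print(by_craft)  # {'mechanical': 16.0, 'electrical': 4.0}
print(by_owner)  # {'Unit 1': 10.0, 'Unit 2': 10.0}
```

Both views add up to the same 20 hours, which is the point: every job appears in each view, so none is left out of either department's picture.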

Maintaining the backlog
If organizing the backlog is setting up the structure and codes to help an organization manage the workload, then maintaining the backlog refers to the constant activity of updating data within that structure to ensure that the information derived from it is accurate. This is key, for if everyone in the organization doesn't work toward this end, then the backlog as a decision-making tool becomes suspect and once again will fall into disuse.

Maintaining the backlog is not the responsibility of one person or group; it is everyone's responsibility. It should not be viewed as a task that is separate from each person's daily routine. The backlog is a tremendously powerful tool that will maintain itself if people wholeheartedly incorporate it into their daily job functions. To ensure that the backlog is maintained properly, every step (particularly the first four steps: identification through accomplishment) of the work management process should address the roles and responsibilities, by job position, for using and maintaining the backlog.

Purging the backlog
Periodically, despite the constant efforts of everyone who uses the backlog, it will be necessary to perform a thorough review of all jobs in the backlog to ensure its accuracy. Jobs that will never be worked, were completed some time ago, or are duplicates should be removed.

Purging requires that all departments participate in a review process in which jobs are identified for removal, consensus is reached on the candidate jobs, and action is taken to remove them. Purging is typically a semiannual activity. Each department reviews the backlog for candidates for removal and comes to a meeting prepared to discuss why the jobs are no longer needed. This premeeting work is essential or the review meetings will become so long and cumbersome that no one will want to participate. When this happens, the integrity of the backlog becomes questionable once again.
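
The premeeting work can be helped along by a simple screen of the backlog. The sketch below is one possible approach, not a prescribed method: the duplicate rule (same equipment and same description) and the one-year age threshold are assumptions chosen for illustration, as are the field names. It only flags candidates for discussion and removes nothing.

```python
from datetime import date


def purge_candidates(jobs, today, max_age_days=365):
    """Flag possible duplicates and long-open jobs for the review meeting."""
    first_seen = {}  # (equipment, normalized description) -> earliest work order
    candidates = []
    for job in sorted(jobs, key=lambda j: j["approved"]):
        key = (job["equipment"], job["description"].strip().lower())
        if key in first_seen:
            candidates.append((job["wo"], f"possible duplicate of {first_seen[key]}"))
            continue
        first_seen[key] = job["wo"]
        age_days = (today - job["approved"]).days
        if age_days > max_age_days:
            candidates.append((job["wo"], f"open {age_days} days"))
    return candidates


jobs = [
    {"wo": "WO-10", "equipment": "P-101", "description": "Replace seal",
     "approved": date(2023, 1, 10)},
    {"wo": "WO-11", "equipment": "P-101", "description": "replace seal ",
     "approved": date(2024, 3, 1)},
    {"wo": "WO-12", "equipment": "C-200", "description": "Align coupling",
     "approved": date(2024, 5, 1)},
]

candidates = purge_candidates(jobs, today=date(2024, 6, 1))
for wo, reason in candidates:
    print(wo, "-", reason)
```

Here WO-10 is flagged for its age and WO-11 as a likely duplicate of it, while WO-12 is left alone. The decision to remove either job still belongs to the departments at the review meeting.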

Measuring the backlog
The backlog serves an obvious purpose in the daily functioning of the work management process. For this reason alone, it should be maintained properly. However, the backlog serves a second purpose as a decision-making tool for management so it may prepare for the future. This need is as big as, if not bigger than, the daily need. To accomplish this, the backlog must be measured in a meaningful way: by estimated labor hours.

Key to measuring the backlog is the understanding that accurate labor estimates are a necessity. Accuracy may be a misnomer, for what is truly important is that the planning process produce consistent estimates; these then can be factored to reflect reality, thus achieving accuracy. The work scheduling process levies a similar requirement on planning; to build accurate schedules requires accurate labor estimates. So why are labor estimates so important in measuring the backlog? Because the key resource that determines when a job will get done is almost always manpower.
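
To illustrate how consistent estimates can be factored to reflect reality, the sketch below derives a single correction factor from completed jobs and applies it to the backlog total. Using one plant-wide ratio of actual to estimated hours is an assumption made here for simplicity, and the figures are invented. An organization might just as well keep separate factors by craft or work type.

```python
def estimate_factor(completed):
    """Ratio of actual to estimated hours over a set of completed jobs.

    `completed` is a list of (estimated_hours, actual_hours) pairs.
    """
    estimated = sum(est for est, _ in completed)
    actual = sum(act for _, act in completed)
    if estimated == 0:
        raise ValueError("no estimated hours to compare against")
    return actual / estimated


# (estimated, actual) labor hours for recently completed jobs
completed = [(4.0, 5.0), (8.0, 10.0), (2.0, 2.5)]

factor = estimate_factor(completed)
backlog_estimated_hours = 400.0
backlog_factored_hours = backlog_estimated_hours * factor

print(factor)                  # 1.25
print(backlog_factored_hours)  # 500.0
```

Because every estimate in this example runs low by the same proportion, one factor corrects them all. Estimates that scatter unpredictably cannot be corrected this way, which is why consistency matters more than raw accuracy.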

There are many indicators that can be derived from the backlog, but one of the most useful is input/output. This measurement is one of throughput; it is much more meaningful than just looking at the size (or weeks) of backlog.

Critical to this measurement is the need to measure the output of work on the same scale as the input of work. This may sound obvious, but the tendency is to measure estimated labor hours going in and actual labor hours coming out. This "apples to oranges" approach invalidates the measurement. The simplest solution is to measure estimated hours in and estimated hours out.

Throughput should be calculated for each work type (i.e., staffing strategy) and, logically, by every craft group. The difference between input and output represents the throughput of that type of work for that craft group. When input consistently exceeds output, this may be an indication of a craft group that is understaffed, and vice versa when output exceeds input. Factoring in conditions such as seasonal events, planned outages, vacations, etc., the manager can gauge whether the trend is temporary or a situation that needs to be permanently addressed. Thus, we have the basis for making decisions about future resource needs, developing a staffing strategy that is consistent with the overall plant objectives of increasing reliability and availability.
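
A minimal sketch of the input/output measurement follows. It assumes the CMMS can report, for a given period, the estimated labor hours of jobs entering the backlog (input) and of jobs completed (output), tagged by work type and craft. The record layout and the numbers are hypothetical. Estimated hours are used on both sides, in keeping with the same-scale rule above.

```python
from collections import defaultdict


def net_input(events):
    """Input minus output, in estimated hours, per (work type, craft)."""
    net = defaultdict(float)
    for event in events:
        key = (event["work_type"], event["craft"])
        if event["kind"] == "input":
            net[key] += event["est_hours"]
        elif event["kind"] == "output":
            net[key] -= event["est_hours"]
        else:
            raise ValueError(f"unknown event kind: {event['kind']}")
    return dict(net)


# One period of backlog activity, all in estimated labor hours
events = [
    {"kind": "input", "work_type": "corrective", "craft": "mechanical", "est_hours": 120.0},
    {"kind": "output", "work_type": "corrective", "craft": "mechanical", "est_hours": 100.0},
    {"kind": "input", "work_type": "proactive", "craft": "electrical", "est_hours": 40.0},
    {"kind": "output", "work_type": "proactive", "craft": "electrical", "est_hours": 50.0},
]

net = net_input(events)
print(net[("corrective", "mechanical")])  # 20.0  (backlog growing)
print(net[("proactive", "electrical")])   # -10.0 (backlog shrinking)
```

A single period proves little. It is the trend over many periods, read against outages, vacations, and seasonal events, that points to a staffing question.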

The culture
The concept of backlog management sounds simple, but getting it implemented is the challenge. Implementation is always the challenge in any new process because it is the work culture of the organization that requires the most effort to change, not the mechanics of doing work. Remember that backlog management is the backbone of the first four steps (work identification through work accomplishment) of the work management process. Therefore, all four steps must be improved to make backlog management effective, and those four steps involve 100 percent of the organization's work force. Teamwork and the discipline to follow the work management process are critical ingredients to successful backlog management.

Backlog management requires organizational commitment. The steps in the overall work management process are only as good as the execution. If an organization is to achieve lowest-cost reliability, then it must be able to manage the backlog in a manner that ensures its accuracy and its value. This may be a significant cultural change for organizations that are accustomed to "fire fighting" as their approach to managing maintenance work.

Every organization that performs maintenance work has a backlog. Many mistakenly believe that having a backlog of maintenance work is bad and conceal the work by calling it something else or by spreading it out among several systems. Backlog is good, provided it is actively managed and used to prepare the organization for the future. The future can be tomorrow or next month, but regardless of which, backlog management contributes to lowest-cost reliability by allowing the organization to select the right work to be done at the right time and with the right resources.

Management requires discipline
Backlog management requires the organization to use the tool at its disposal (i.e., the CMMS), but first it must make an investment in getting accurate data into the tool to improve the quality of the decisions derived from it. Achieving and maintaining data accuracy is the cultural challenge for the organization. A well-defined and understood work management process, with specific roles and responsibilities for everyone involved, will help overcome this challenge.

The organization must approach backlog management with the same rigor with which it tackles production planning. The two business processes are interrelated. The cost of poor reliability is not just the annual maintenance expense; it also includes the lost opportunity cost of not being able to produce. In a competitive marketplace this can be the difference between economic survival and failure. MT


The author is project manager at Reliability Management Group, 151 W. Burnsville Pkwy., Suite 224, Minneapolis, MN 55337; (612) 882-8122.