Setting a cadence for your failure analysis efforts

I want to address this topic because I believe we are leaving a lot of value on the table with our failure analysis (RCFA, RCA, problem solving, 8D) efforts.

Some simple facts

Let me share a few ideas that I hope you will agree with:

  1. Problems occur every day, both big ones and small ones.
  2. We cannot possibly address them all.
  3. The benefit from a failure analysis effort comes from finding the root cause of a problem and permanently eliminating it.
  4. Getting good at problem solving is a learned behavior that we must practice, much like a golf swing or the ability to play music.

If you can accept these ideas as facts, then what I want to talk about today is how we can build a process around them that produces results we can benefit from.

The challenge

When I interact with maintenance organizations, I see the same patterns again and again, and they are why I believe we are leaving value on the table (some or all of these may apply to you):

  • A general understanding of the RCFA process, with some classroom training being provided in the past that was well accepted and understood.
  • When the big problem (the huge, catastrophic, apocalyptic one) happens, we perform a failure analysis on it and issue a report. We rarely do so for any of the small ones.
  • If we are honest with ourselves, we largely rely on documenting problems, but do not often dig down to the root cause or make any permanent change that will really prevent them from happening again.
  • When we look at our failure analysis records, we must admit that we often treat symptoms of the failure and fail to dig all the way to the actual root cause.
  • Our failure analysis efforts are accepted to be an engineering function, and we fail to sufficiently involve the people at the front line (operators and maintenance) who have direct knowledge and experience with the failure.
  • Our solutions are more often than not engineering solutions (translated to things that cost money) and we rarely address process, training, or procedural solutions (all human related and relatively inexpensive).

The fix

All right, I am not going to throw stones here. If any of those bullet points above look familiar, then just take a moment to consider some of these simple solutions I have provided below.

Trigger points

Trigger points help us understand when to act. They are the rules that tell us when we need to perform a failure analysis and when we do not. They need to be set in such a way as to drive a constant level of activity around this process. The level of activity is determined by how many problems you can adequately study and solve in a given time frame, which is likely much lower than you would think.

Let’s say that within your span of control (department or workgroup), you can solve only 1 or 2 problems per month (identify, study and find root cause, then design, implement, and test a solution). Then we must set trigger points that will drive that level of activity each month.

For example, if you say that we must address every production delay of 60 minutes or more, and you have 23 such events in a given month, that is not a good trigger point for you. Set your trigger point high enough (say 2 or even 4 hours) so that you are addressing those 1 or 2 critical problems per month.
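To make the arithmetic concrete, here is a minimal sketch of how you might check candidate trigger points against a month of delay records. The delay values, the candidate thresholds, and the function names `events_over` and `pick_trigger` are all made up for illustration; substitute your own data and your own sense of how many problems your team can handle.

```python
def events_over(delays_min, threshold_min):
    """Count delay events at or above the threshold (both in minutes)."""
    return sum(1 for d in delays_min if d >= threshold_min)


def pick_trigger(delays_min, candidates_min, capacity):
    """Return the lowest candidate threshold that produces no more
    events than the team can handle, or None if none qualifies."""
    for threshold in sorted(candidates_min):
        if events_over(delays_min, threshold) <= capacity:
            return threshold
    return None


# Hypothetical month: 23 production delays of 60 minutes or more.
delays = [60, 61, 65, 70, 72, 75, 80, 85, 90, 95, 100, 105,
          110, 115, 118, 125, 130, 150, 170, 200, 230, 250, 310]

# Assumed capacity: the team can fully solve 2 problems per month.
trigger = pick_trigger(delays, candidates_min=[60, 120, 180, 240], capacity=2)
print(f"Events at 60 min: {events_over(delays, 60)}")
print(f"Suggested trigger: {trigger} min")
```

With this made-up data, a 60-minute trigger would demand 23 analyses in the month, while a 4-hour (240-minute) trigger leaves the 2 events the team can actually work through.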

What about the smaller problems, you may ask? Their turn will come. Our reward for success is an adjustment to our trigger points. If we find ourselves in a position where we have no production delays greater than 2 hours for several months in a row, that means we have improved. Congratulations! Now let’s adjust the triggers downward to 60 minutes and continue on.
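The adjustment rule can be written down just as simply. The sketch below assumes that "several months in a row" means three consecutive months with no events over the current trigger; that number, like the function name `next_trigger`, is an assumption for illustration and not a recommendation.

```python
def next_trigger(current_min, monthly_counts, lower_min, quiet_months=3):
    """Return the trigger to use next month.

    monthly_counts holds the number of events over the current trigger
    for each past month, oldest first. If the most recent `quiet_months`
    months all had zero events, step down to `lower_min`; otherwise
    keep the current trigger. (quiet_months=3 is an assumed value.)
    """
    recent = monthly_counts[-quiet_months:]
    if len(recent) == quiet_months and all(count == 0 for count in recent):
        return lower_min
    return current_min


# Hypothetical history: events over the 120-minute trigger, by month.
history = [2, 1, 0, 0, 0]
print(next_trigger(120, history, lower_min=60))
```

Writing the rule down, even this informally, keeps the decision to tighten the trigger from depending on who happens to be looking at the numbers that month.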

These simple trigger points set a cadence for your failure analysis efforts and allow you to drive improvements continuously, a little bit at a time.

Quality control and oversight

Failure analysis is a skill that must be built and refined over time. It does not come naturally and it is difficult for us to be critical of our own work. Put a responsible person in charge of your failure analysis program to review your failure analysis records (spot check or all of them, your call) and ask the following questions:

  • Did we identify the root cause or did we address a symptom?
  • Are we applying our triggers correctly and do we have the correct level of activity?
  • Do our solutions address the actual root cause?
  • Are we keeping our solutions small and achievable? (Let’s not solve all of the world’s problems, just this one.)
  • Did we address the problem in a timely manner?
  • Did we involve the right people in the analysis or was this performed in a back office somewhere?

Drive the process to the front line

Finally, true value from your problem solving efforts comes when we involve a team of people with the right skills and knowledge. It is very easy for us to overlook our front line operators and maintenance technicians and look at our failure analysis efforts as an “engineering only” effort.

These people likely have direct knowledge of the failure, and may even have seen it firsthand. They can also help you find simple and effective solutions that relate back to the way we maintain and operate our assets.

There is a tendency to treat failure analysis as an “equipment redesign” effort. These types of solutions are generally costly and tend to mask the human-related causes of failures.

Getting results from your failure analysis effort takes discipline and a knowledgeable team of people with the right training, tools, and focus.

You as a leader can provide this focus through the way you administer your program. Provide your teams with a framework to function within and they will do the rest.

Source: Plant Engineering