Failure mode and effects analysis
An FMEA is often the first step of a
system reliability study. It involves reviewing as many components, assemblies,
and subsystems as possible to identify failure modes, and their causes and
effects. For each component, the failure modes and their resulting effects on
the rest of the system are recorded in a specific FMEA worksheet. There are
numerous variations of such worksheets. An FMEA is mainly a qualitative
analysis. It is an inductive (forward logic), single-point-of-failure
analysis and a core task in reliability engineering, safety engineering and
quality engineering. Quality engineering is especially concerned
with the "Process" (manufacturing and assembly) type of FMEA. Failure
modes and effects analysis (FMEA) is a step-by-step approach for identifying
all possible failures in a design, a manufacturing or assembly process, or a
product or service. “Failure modes” means the ways, or modes, in which
something might fail. Failures are any errors or defects, especially ones that
affect the customer, and can be potential or actual. “Effects analysis” refers
to studying the consequences of those failures. Failures are prioritized
according to how serious their consequences are, how frequently they occur and
how easily they can be detected. The purpose of the FMEA is to take actions to
eliminate or reduce failures, starting with the highest-priority ones. Failure
modes and effects analysis also documents current knowledge and actions about
the risks of failures, for use in continuous improvement. FMEA is used during
design to prevent failures. Later it’s used for control, before and during
ongoing operation of the process. Ideally, FMEA begins during the earliest
conceptual stages of design and continues throughout the life of the product or
service. A successful FMEA activity helps to identify potential failure modes
based on experience with similar products and processes, or on common
physics-of-failure logic. It is widely used in development and manufacturing
industries in various phases of the product life cycle.
When to use FMEA
· When a process, product or service is being designed or redesigned, after quality function deployment.
· When an existing process, product or service is being applied in a new way.
· Before developing control plans for a new or modified process.
· When improvement goals are planned for an existing process, product or service.
· When analyzing failures of an existing process, product or service.
· Periodically throughout the life of the process, product or service.
FMEA Procedure
1. Assemble a cross-functional team of people with diverse knowledge about the process, product or service and customer needs. Functions often included are: design, manufacturing, quality, testing, reliability, maintenance, purchasing (and suppliers), sales, marketing (and customers) and customer service.
2. Identify the scope of the FMEA. Is it for concept, system, design, process or service? What are the boundaries? How detailed should we be? Use flowcharts to identify the scope and to make sure every team member understands it in detail. (From here on, we’ll use the word “scope” to mean the system, design, process or service that is the subject of your FMEA.)
3. Fill in the identifying information at the top of your FMEA form. Figure 1 shows a typical format. The remaining steps ask for information that will go into the columns of the form.
4. Identify the functions of your scope. Ask, “What is the purpose of this system, design, process or service? What do our customers expect it to do?” Name it with a verb followed by a noun. Usually you will break the scope into separate subsystems, items, parts, assemblies or process steps and identify the function of each.
5. For each function, identify all the ways failure could happen. These are potential failure modes. If necessary, go back and rewrite the function with more detail to be sure the failure modes show a loss of that function.
6. For each failure mode, identify all the consequences on the system, related systems, process, related processes, product, service, customer or regulations. These are potential effects of failure. Ask, “What does the customer experience because of this failure? What happens when this failure occurs?”
7. Determine how serious each effect is. This is the severity rating, or S. Severity is usually rated on a scale from 1 to 10, where 1 is insignificant and 10 is catastrophic. If a failure mode has more than one effect, write on the FMEA table only the highest severity rating for that failure mode.
8. For each failure mode, determine all the potential root causes. Use cause analysis tools, as well as the best knowledge and experience of the team. List all possible causes for each failure mode on the FMEA form.
9. For each cause, determine the occurrence rating, or O. This rating estimates the probability of failure occurring for that reason during the lifetime of your scope. Occurrence is usually rated on a scale from 1 to 10, where 1 is extremely unlikely and 10 is inevitable. On the FMEA table, list the occurrence rating for each cause.
10. For each cause, identify current process controls. These are tests, procedures or mechanisms that you now have in place to keep failures from reaching the customer. These controls might prevent the cause from happening, reduce the likelihood that it will happen or detect failure after the cause has already happened but before the customer is affected.
11. For each control, determine the detection rating, or D. This rating estimates how well the controls can detect either the cause or its failure mode after they have happened but before the customer is affected. Detection is usually rated on a scale from 1 to 10, where 1 means the control is absolutely certain to detect the problem and 10 means the control is certain not to detect the problem (or no control exists). On the FMEA table, list the detection rating for each cause.
12. (Optional for most industries) Is this failure mode associated with a critical characteristic? (Critical characteristics are measurements or indicators that reflect safety or compliance with government regulations and need special controls.) If so, a column labeled “Classification” receives a Y or N to show whether special controls are needed. Usually, critical characteristics have a severity of 9 or 10 and occurrence and detection ratings above 3.
13. Calculate the risk priority number, or RPN, which equals S × O × D. Also calculate Criticality by multiplying severity by occurrence, S × O. These numbers provide guidance for ranking potential failures in the order they should be addressed.
14. Identify recommended actions. These actions may be design or process changes to lower severity or occurrence. They may be additional controls to improve detection. Also note who is responsible for the actions and target completion dates.
15. As actions are completed, note results and the date on the FMEA form. Also, note new S, O or D ratings and new RPNs.
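The rating and ranking steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `FmeaRow` class, its field names, the `looks_critical` heuristic and the sample rows are all hypothetical and not part of any standard FMEA form. Each row records one cause, keeps only the highest severity among the effects, and derives RPN (S × O × D) and Criticality (S × O) from the 1 to 10 ratings.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    """One cause-level row of a hypothetical FMEA worksheet."""
    function: str
    failure_mode: str
    effect_severities: list[int]  # one 1-10 rating per potential effect
    cause: str
    occurrence: int               # O: 1 (extremely unlikely) to 10 (inevitable)
    detection: int                # D: 1 (certain to detect) to 10 (certain to miss)

    @property
    def severity(self) -> int:
        # Only the highest severity among the effects is written on the form.
        return max(self.effect_severities)

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

    @property
    def criticality(self) -> int:
        return self.severity * self.occurrence

    @property
    def looks_critical(self) -> bool:
        # Rule of thumb from the text: S of 9 or 10, with O and D above 3.
        # The real classification remains a team judgment.
        return self.severity >= 9 and self.occurrence > 3 and self.detection > 3


def rank_by_rpn(rows):
    """Return rows ordered from highest to lowest RPN."""
    return sorted(rows, key=lambda r: r.rpn, reverse=True)


# Made-up example rows, for illustration only.
rows = [
    FmeaRow("Dispense cash", "Does not dispense cash", [8, 6], "Out of cash", 5, 5),
    FmeaRow("Dispense cash", "Does not dispense cash", [8, 6], "Power failure", 2, 10),
    FmeaRow("Dispense cash", "Dispenses too much cash", [6], "Bills stuck together", 2, 7),
]

for row in rank_by_rpn(rows):
    print(row.cause, row.rpn, row.criticality, row.looks_critical)
```

With these sample ratings the causes come out in the order "Out of cash" (RPN 200), "Power failure" (RPN 160) and "Bills stuck together" (RPN 84), which is the order in which recommended actions would be considered.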
Probability (P)
In this step, examine the cause of each failure mode and its likelihood of
occurrence. This can be done by analysis, calculations or finite element
methods (FEM), or by looking at similar items or processes and the failure
modes that have been documented for them in the past. A failure cause is
regarded as a design weakness. All the potential causes for a failure mode
should be identified and documented in technical terms. Examples of causes
are: human errors in handling, manufacturing-induced faults, fatigue, creep,
abrasive wear, erroneous algorithms, excessive voltage, or improper operating
conditions or use (depending on the ground rules used). Each failure mode is
given a probability ranking.
Rating | Meaning
A | Extremely Unlikely (virtually impossible, or no known occurrences on similar products or processes, with many running hours)
B | Remote (relatively few failures)
C | Occasional (occasional failures)
D | Reasonably Possible (repeated failures)
E | Frequent (failure is almost inevitable)
Severity (S)
Determine the severity of the worst-case adverse end effect (state). It is
convenient to describe these effects in terms of the functional failures the
user might see or experience. Examples of these end effects are: full loss of
function x, degraded performance, functions in reversed mode, delayed
functioning, erratic functioning, etc. Each end effect is given a severity
number (S) from, say, I (no effect) to VI (catastrophic), based on cost and/or
loss of life or quality of life. These numbers prioritize the failure modes
(together with probability and detectability). A typical classification is
given below; other classifications are possible. See also hazard analysis.
Rating | Meaning
I | No relevant effect on reliability or safety
II | Very minor, no damage, no injuries, only results in a maintenance action (only noticed by discriminating customers)
III | Minor, low damage, light injuries (affects very little of the system, noticed by average customer)
IV | Moderate, moderate damage, injuries possible (most customers are annoyed, mostly financial damage)
V | Critical (causes a loss of primary function; loss of all safety margins, one failure away from a catastrophe; severe damage, severe injuries, at most one possible death)
VI | Catastrophic (product becomes inoperative; the failure may result in completely unsafe operation and possible multiple deaths)
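As a small illustration of picking the worst-case end effect, the sketch below assumes the six classes I to VI from the table and a hypothetical mapping of end effects to classes; the function name `worst_case_severity` and the sample assignments are made up for this example.

```python
# Severity classes from the table above, in increasing order of seriousness.
SEVERITY_CLASSES = ["I", "II", "III", "IV", "V", "VI"]


def worst_case_severity(end_effect_classes):
    """Return the most serious severity class among a failure mode's end effects."""
    # list.index gives each class its ordinal position; an unknown class
    # raises ValueError instead of being silently ranked.
    return max(end_effect_classes, key=SEVERITY_CLASSES.index)


# Hypothetical end effects of one failure mode and the classes assigned to them.
end_effects = {
    "degraded performance": "III",
    "erratic functioning": "IV",
    "full loss of function x": "V",
}

print(worst_case_severity(end_effects.values()))  # V
```

Comparing list positions rather than the Roman numeral strings themselves avoids accidental alphabetical ordering (for example, "IV" sorting before "V" only by coincidence).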
Detection (D)
Record the means or method by which a failure is detected and isolated by the
operator and/or maintainer, and the time this may take. This is important for
maintainability control (availability of the system) and it is especially
important for multiple-failure scenarios. These may involve dormant failure
modes (e.g. no direct system effect, because a redundant system or item
automatically takes over, or the failure is only problematic during specific
mission or system states) or latent failures (e.g. deterioration failure
mechanisms, such as a crack growing in metal that has not yet reached critical
length). It should be made clear how the failure mode or cause can be
discovered by an operator under normal system operation, or whether it can be
discovered by the maintenance crew through some diagnostic action or automatic
built-in system test. A dormancy and/or latency period may be entered.
Rating | Meaning
1 | Certain - fault will be caught on test
2 | Almost certain
3 | High
4 | Moderate
5 | Low
6 | Fault is undetected by Operators or Maintainers
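One possible way to prioritize failure modes with the three class scales above, without multiplying them, is a lexicographic sort: severity first, then probability, then detection. This ordering is an assumption made for illustration, not something the tables prescribe, and the failure modes listed are hypothetical.

```python
# Ordinal scales taken from the three tables above.
PROBABILITY = ["A", "B", "C", "D", "E"]          # A = extremely unlikely ... E = frequent
SEVERITY = ["I", "II", "III", "IV", "V", "VI"]   # I = no effect ... VI = catastrophic
# Detection is already numeric: 1 = certain to be caught ... 6 = undetected.


def priority_key(mode):
    """Sort key: severity first, then probability, then detection (assumed order)."""
    _name, probability, severity, detection = mode
    return (SEVERITY.index(severity), PROBABILITY.index(probability), detection)


# Hypothetical failure modes: (name, probability, severity, detection).
modes = [
    ("valve sticks open", "C", "IV", 2),
    ("sensor drifts", "D", "III", 5),
    ("seal cracks", "B", "VI", 6),
]

ranked = sorted(modes, key=priority_key, reverse=True)
for name, p, s, d in ranked:
    print(name, p, s, d)
```

Here "seal cracks" comes first because it has the most serious severity class, even though it is the least likely of the three.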
Calculation of the Risk Priority Number
To judge objectively the current state of a process or product, and the effect
of actions taken during the course of a Failure Mode and Effects Analysis, the
risk priority number (RPN) of an FMEA can be calculated automatically. This is
possible as soon as the ratings for the causes and effects of defects have
been determined and entered in the system.
In FMEA the Risk Priority Number (RPN) is calculated using the following
formula:
RPN = Severity of the effect (S) × Probability of occurrence of the cause (O) × Probability of detection (D)
Here the detection term is the detection rating D, which increases as
detection becomes less likely.
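To show how the RPN can be used to judge the effect of an action, here is a minimal sketch that computes it before and after a hypothetical improvement. It assumes the 1 to 10 rating scales described in the procedure; the ratings themselves are invented for the example.

```python
def rpn(severity, occurrence, detection):
    """Risk priority number, assuming each rating is on a 1-10 scale."""
    ratings = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, value in ratings.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# Hypothetical ratings before the recommended actions.
before = rpn(severity=7, occurrence=6, detection=8)   # 7 * 6 * 8 = 336

# After a process change (lower occurrence) and an added inspection
# (better detection); severity is unchanged.
after = rpn(severity=7, occurrence=3, detection=4)    # 7 * 3 * 4 = 84

reduction = 100 * (before - after) / before
print(before, after, reduction)  # 336 84 75.0
```

Because the ratings are ordinal, the 75 percent figure is best read as a bookkeeping aid for tracking actions, not as a measured reduction in risk.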
Benefits/Advantages
1. It provides a documented method for selecting a design with a high probability of successful operation and safety.
2. A documented uniform method of assessing potential failure mechanisms, failure modes and their impact on system operation, resulting in a list of failure modes ranked according to the seriousness of their system impact and likelihood of occurrence.
3. Early identification of single failure points (SFPs) and system interface problems, which may be critical to mission success and/or safety. It also provides a method of verifying that switching between redundant elements is not jeopardized by postulated single failures.
4. An effective method for evaluating the effect of proposed changes to the design and/or operational procedures on mission success and safety.
5. A basis for in-flight troubleshooting procedures and for locating performance monitoring and fault-detection devices.
6. Criteria for early planning of tests.
7. Improve the quality, reliability and safety of a product/process.
8. Improve company image and competitiveness.
9. Increase user satisfaction.
10. Reduce system development time and cost.
11. Collect information to reduce future failures, capture engineering knowledge.
12. Reduce the potential for warranty concerns.
13. Early identification and elimination of potential failure modes.
14. Emphasize problem prevention.
15. Minimize late changes and associated cost.
16. Catalyst for teamwork and idea exchange between functions.
17. Reduce the possibility of the same kind of failure in the future.
18. Reduce impact on company profit margin.
19. Improve production yield.
Limitations
If used as a top-down tool, FMEA may only identify major failure modes in a
system. Fault tree analysis (FTA) is better suited for "top-down" analysis.
When used as a "bottom-up" tool, FMEA can augment or complement FTA and
identify many more causes and failure modes resulting in top-level symptoms.
FMEA is not able to discover complex failure modes involving multiple failures
within a subsystem, or to report expected failure intervals of particular
failure modes up to the upper-level subsystem or system.
Additionally, the multiplication of the severity, occurrence
and detection rankings may result in rank reversals, where a less serious
failure mode receives a higher RPN than a more serious failure mode. The reason
for this is that the rankings are ordinal scale numbers, and multiplication is not defined for ordinal
numbers. The ordinal rankings only say that one ranking is better or worse than
another, but not by how much. For instance, a ranking of "2" may not
be twice as severe as a ranking of "1," or an "8" may not
be twice as severe as a "4," but multiplication treats them as though
they are.
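A rank reversal of this kind is easy to reproduce. The sketch below uses two hypothetical failure modes with invented ratings on the 1 to 10 scales: the less serious one ends up with the higher RPN.

```python
def rpn(severity, occurrence, detection):
    return severity * occurrence * detection


# Hypothetical failure modes: (severity, occurrence, detection).
failure_modes = {
    "minor leak": (4, 8, 8),        # low severity, frequent, hard to detect
    "loss of braking": (10, 4, 4),  # catastrophic, less frequent, easier to detect
}

by_rpn = sorted(failure_modes, key=lambda name: rpn(*failure_modes[name]), reverse=True)
by_severity = sorted(failure_modes, key=lambda name: failure_modes[name][0], reverse=True)

print(by_rpn)       # ['minor leak', 'loss of braking']  (RPN 256 vs 160)
print(by_severity)  # ['loss of braking', 'minor leak']
```

Ranking by RPN alone would address the minor leak first, which is why many teams also review severity on its own before deciding the order of actions.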