“In our industry we have many more troublemakers than troubleshooters” —(Professor Zarko Olujic, presentation to AIChE) Troubleshooting, revamping, and eliminating waste offer huge benefits in reducing capital investment, downtime, carbon dioxide emissions, and energy consumption. Unfortunately, the attention paid to this resource in the energy-transition era has been too little to reflect its tremendous potential. A multitude of examples in Norman Lieberman’s book Process Engineering for a Small Planet (287) demonstrate how poor troubleshooting leads to wasteful practices that guzzle energy, increase the carbon footprint, and deplete the earth’s precious resources. The chemical process industry (CPI) has been paying dearly for downtime, lost production, substandard product quality, raw material problems, safety and environmental issues, and excessive energy consumption due to ineffective troubleshooting of abnormal situations (16). In 1997, a consortium estimated that the annual loss for the CPI due to ineffective abnormal situation management was US $10 billion (16). Correct diagnosis is at the heart of problem identification and implementing a correct, cost-effective solution. An incorrect diagnosis breeds ineffective solutions that prolong the agony and escalate the costs. This chapter focuses on the systematic diagnostic steps. The following chapters will describe the multitude of techniques that have been found effective for diagnosing distillation problems. A well-known sales axiom states that 20% of the customers bring in 80% of the business. A sales strategy tailored for this axiom concentrates the effort on these 20% without neglecting the others. Distillation diagnostics follow an analogous axiom. A person engaged in diagnosing column problems must develop a good understanding of the factors that cause the vast majority of column malfunctions and the techniques available for narrowing in on their root causes. 
While a good knowledge and understanding of the broader field of distillation is beneficial, the diagnosis often requires only a shallow knowledge of this broader field. It is well accepted that diagnosing problems is a primary job function of operating engineers, supervisors, and process operators. Far too few realize that distillation diagnostics start at the design phase. Any designer wishing to achieve a trouble-free column design and operation must be familiar with diagnostic techniques, many of which are applicable during the design (and more so at the revamp) phase. Expansive surveys of the causes of column malfunctions were described in previous studies (196, 198, 201). Abundant resources are available to distinguish good from poor practices and to avoid and overcome troublesome design and operations (e.g., 192, 285, 286, 293). What is often missing is the connection. How does one link field observations with the known tower malfunctions in order to develop an effective remedy? This book is all about the link: translating field observations into diagnoses and cures. Following a brief survey of the primary causes of tower malfunctions, this chapter looks at the basic diagnostics: the systematic strategy for diagnosing distillation problems and the dos and don’ts for formulating and testing theories. Finally, it reviews the techniques for testing these theories and for focusing on the most likely root cause. Close to 1500 case histories of malfunctioning columns were extracted from the literature and abstracted in Ref. 201. Most of these malfunctions were analyzed in Ref. 198 and classified according to their principal causes. A summary of the common causes of column malfunctions is provided in Table 1.1. If one assumes that these case histories make up a representative sample, then the analysis presented below has statistical significance. 
Accordingly, Table 1.1 can provide a useful guide to the factors most likely to cause column malfunctions and can direct troubleshooters toward the most likely problem areas. The general guidelines in Table 1.1 often do not apply to a specific column or even plant. For instance, foaming is not high up in the table; however, in amine absorbers, it is a very common trouble spot. The author therefore warns against blindly applying these guidelines in any specific situation. The total number of cases in each category is shown in the column headed “Cases.” The other three columns show the split of these cases according to industry categories, namely refining, chemicals, and olefins/gas plants. An analysis of Table 1.1 suggests the following:

Table 1.1 Most common causes of column malfunctions. (From Kister, H. Z., Transactions of the Institution of Chemical Engineers, 81, Part A, p. 5, January 2003. Reprinted Courtesy of the Institution of Chemical Engineers in the UK.)

In Sections 1.3 and 1.4, the systematic approach recommended for diagnosing distillation problems is presented. The recommended sequence of steps is illustrated with reference to the case history described below.

The following story is not a myth; it really happened. One morning as I sat quietly at my desk in corporate headquarters, the boss dropped by to see me. He had some unpleasant news. One of the company’s refinery managers was planning to visit our office to discuss the quality of some of the new plants that had been built in his refinery. As an example of how not to design a unit, he had chosen a new gas plant for which I had done the process design. The refinery manager had but one complaint: “The gas plant would not operate.” I was immediately dispatched to the refinery to determine which aspect of my design was at fault. If nothing else, I should learn what I did wrong so as not to repeat the error. Upon arriving at the refinery, I met with the operating supervisors. 
They informed me that, while the process design was fine, the gas plant’s operation was unstable because of faulty instrumentation. However, the refinery’s lead instrument engineer would soon have the problem resolved. Later, I met with unit operating personnel. They were more specific. They observed that the pumparound circulating pump (see Figure 1.1a) was defective. Whenever they raised hot oil flow to the debutanizer reboiler, the gas plant would become destabilized. Reboiler heat-duty and reflux rates would become erratic. Most noticeably, the hot-oil circulating pump’s discharge pressure would fluctuate wildly. They felt that a new pump requiring less net positive suction head was needed. Both these contradictory reports left me cold. Anyway, the key to successful troubleshooting is personal observation. So I decided to make a field test. When I arrived at the gas plant, both the absorber and debutanizer towers were running smoothly but not well. Figure 1.1b shows the configuration of the gas plant. The debutanizer reflux rate was so low it precluded significant fractionation. Also, the debutanizer pressure was about 100 psi below design. Only a small amount of vapor, but no liquid, was being produced from the reflux drum. Since the purpose of the gas plant was to recover propane and butane as a liquid, the refinery manager’s statement that the gas plant would not operate was accurate. As a first step, I introduced myself to the chief operator and explained the purpose of my visit. Having received permission to run my test, I switched all instruments on the gas-plant control panel from automatic over to local manual. In sequence, I then increased the lean oil flow to the absorber, the debutanizer reflux rate, and the hot-oil flow to the debutanizer reboiler.

Figure 1.1 Column troubleshooting case history. (a) Hot oil from the fractionator supplies heat to gas plant reboilers. (b) Leaking debutanizer reboiler upsets gas plant. (From Lieberman, N. P., “Troubleshooting Process Operations,” Ed. 4, PennWell Books, Tulsa, OK, 2009. Reprinted with permission.)

The gas plant began to behave properly. The hot-oil circulating pump was putting out a steady flow and pressure. Still, the plant was only producing a vapor product from the debutanizer reflux drum. This was because the debutanizer operating pressure was too low to condense the C3–C4 product. By slowly closing the reflux drum vapor vent valve, I gradually increased the debutanizer pressure from 100 psig toward its design operating pressure of 200 psig. Suddenly, at 130 psig, the hot-oil flow to the debutanizer’s reboiler began to waver. At 135 psig the debutanizer pressure and the hot-oil flow plummeted. This made absolutely no sense. How could the debutanizer pressure influence hot-oil flow? To regain control of the gas plant, I cut reflux to the debutanizer and lean-oil flow to the absorber. I was now back where I started. The thought of impending failure loomed. I repeated this sequence twice more. On each occasion, all went well until the debutanizer pressure increased. By this time it was 3 a.m. Was it also time to give up and go home? Just then, I noticed a commotion at the main fractionator control panel. The operators there stated that the fractionator was flooding again, for the third time that night. The naphtha production from the fractionator had just doubled for no apparent reason. In every troubleshooting assignment there always occurs that special moment, the moment of insight. All of the bits and pieces fall into place, and the truth is revealed in its stark simplicity. I cut the debutanizer pressure back to 100 psig and immediately the flooding in the main fractionator subsided. The operators then closed the inlet block valve to the hot-oil side of the reboiler and opened up a drain. Naphtha poured out instead of gas oil. This showed that the debutanizer reboiler had a tube leak. 
Whenever the debutanizer pressure reached 130 psig, the reboiler pressure exceeded the hot-oil pressure. The relatively low-boiling naphtha then flowed into the hot oil and flashed. This generated a large volume of vapor that then backed hot oil out of the reboiler. The naphtha vapors passed on into the main fractionator and flooded this tower. Thus, the cause of the gas plant instability was neither a process design error, instrument malfunction, nor pumping deficiency. It was a quite ordinary reboiler tube failure. In almost any troubleshooting assignment, it is desirable to solve a problem as fast as possible with the least expense. Surprisingly, this objective is often only partially achieved, the obstacle being a poor (often nonexistent) strategy for diagnosing the problem. While devising a troubleshooting strategy, it is useful to think in terms of a “doctor and patient” analogy. The doctor’s strategy for diagnosing an illness is well established and easily understood by most people. Applying a similar strategy to diagnosing distillation problems often constitutes the most effective and least expensive course of action. The sequence of steps listed below is often considered optimum for tackling a troubleshooting problem. It is based on the author’s experience as well as the experience of others (112, 116, 161, 166, 286, 293, 318, 411, 498). The step headings refer to the doctor and patient analogy. Actions described in Lieberman’s case history (Section 1.2) are referenced to demonstrate the optimum sequence of steps. A good troubleshooting strategy always proceeds stepwise, starting with the simple and obvious. Assess the safety, health, and/or environmental hazards that the problem can create. If a hazard exists, an emergency action is required prior to troubleshooting. In terms of the medical analogy, measures to save the patient or prevent the patient’s illness from spreading to others have priority over investigating the cause of the problem. 
Implement a temporary strategy to live with the problem. Problem identification, troubleshooting, and implementing the solution take time. Meanwhile, negative effects on safety, health, the environment, and plant profitability must be minimized. The strategy should be as conducive for troubleshooting as practicable. The strategy, and the adverse effects that need to be temporarily tolerated (e.g., lost production, off-spec product, instability, higher utility costs), usually set the pace of the troubleshooting investigation. In the debutanizer case history, the short-term strategy was to operate the column at a pressure low enough to eliminate instability and to tolerate an off-spec bottom product. In the medical analogy, the short-term strategy is hospitalization, bed rest, special diet, anti-inflammation drugs, or just “taking it easy.” The extent to which the temporary solution (step 2) can be tolerated and, to a lesser degree, the complexity of the problem and resources required and available need to be considered next, which will set the pace of the troubleshooting investigation. “Crisis” urgency is assigned when there are significant adverse safety, health, or environmental concerns, if the column cannot produce an on-spec valuable product, or if there is a major impact on plant profitability. “Medium” urgency is typical when the column falls short of producing the desired capacity or product quality but can still operate and make acceptable products. “Low” urgency is usually assigned to instability or operating nuisance or when the cost effects are not major. In the debutanizer example, the urgency was medium to crisis, as the plant could not make the overhead product and the bottom product was off-spec. In the medical analogy, life threat sets a fast pace of treatment, while minor pain sets a slow pace. The urgency of treatment is affected by the proximity of the next scheduled plant outage or turnaround. 
Urgency is likely to be stepped up when there is an opportunity to attempt a fix in a forthcoming outage or when there is a prospect of having the tower limp along for a lengthy period. In contrast, the urgency is reduced if cleaning, major repairs, or column replacement is planned for a forthcoming outage and is likely to eliminate the problem. In the medical analogy, there may be no need to do some tests when the patient is about to have a major checkup. The urgency may change, sometimes quickly, with changing market conditions. For instance, a rise in product demand or price often steps up the urgency. Obtain a clear, factual definition of the symptoms. A poor problem definition is one of the most common diagnostics pitfalls. In the debutanizer case history above, the definitions used by different people to describe the symptoms of a reboiler tube leak problem were: The above represents a typical problem definitions spectrum. The last definition, provided by a troubleshooting expert, can clearly be distinguished from the others. The first two definitions were nonspecific and insufficiently detailed. The third described one symptom, but left the other symptoms out. The first three definitions also contained implied diagnoses, none of which turned out to be correct. Listening to the people involved helps formulate a good definition. It is easy to miss or overlook crucial details. Different people focus on different details, so talking can bring out hidden details. In the debutanizer case, the observation by the plant personnel became part of the problem definition. The doctor–patient equivalents to the first three definitions are statements such as “I feel I am going to die,” “I am feeling a bit off, but I will be OK soon,” and “I do have a sharp headache” (without mentioning other pains and having a fever as well). These statements do not provide the doctor with the entire story. Familiarize yourself with the column (if not familiar yet). 
What is the main purpose of the column? What are the key performance criteria? What are the operating temperatures, pressures, product purities, and mass balance that the column is attempting to achieve? Have there been any recent modifications? Familiarize yourself with the physical arrangement of the column and its instrumentation. What instruments are available? Do not overdo this at this point; there will be opportunities to learn the finer details as the investigation proceeds. In the medical analogy, upon entry to the doctor’s office, a nurse often measures height, weight, temperature, and blood pressure for the doctor to review. Examine the column behavior yourself. This is imperative if the problem definition is poor. In the debutanizer example, had the troubleshooter based his investigation merely on other people’s observations, he would have missed a major part of the problem definition. Communication gaps are often hard to bridge. In a similar manner, a doctor always needs to examine the patient before coming up with a diagnosis. In some situations, it may be impractical or too costly for the troubleshooter to visit the site (e.g., a column located on another continent). In this case, the troubleshooter must be in close direct video communication with the operating person, who should be entirely familiar with the column, its history, and its operation. Walk around the column, looking for outside signs. Check all lines containing valves that cross-connect products and measure surface temperatures on each side of each valve. Valves often leak or are inadvertently left open (Section 3.2.1 has some case studies). Survey the column piping for any unusual features, poor piping arrangement, leaking valves, “sticking” control valves, and valves partially shut. In one case (166), a cooling water valve left quarter-opened from a previous campaign constrained the vacuum and led to an off-spec product. 
Filter baskets may not have been adequately reinstalled after cleaning. Check that meters are correctly installed and samples are taken from the correct lines. For instance, a thermocouple near the junction of two streams may give an unreliable indication of the temperature of either. Listen to sounds coming from the tower. These may indicate vibrations, eruptions, sloshing, loose nuts, or pump cavitation. Inspect liquid lines entering and leaving. Crackling, vibration, and hammering indicate vapor presence in the lines. Column sways may indicate a high base level. Learn about the column history. The question “What are we doing wrong now that we did right before?” is one of the most powerful diagnostic tools available. Closely examine differences between the column and columns used for identical or at least similar services. Examine any differences between the expected and the actual performance. Each difference can provide a major clue. A tabular listing of the similarities and differences (possibly using a spreadsheet) may provide useful leads. In search of clues, doctors always ask patients about their health histories. In the debutanizer example, the troubleshooter compared the operation to the design performance (he was working with a new column). Dig into past issues. Had plugging, precipitation, scaling, corrosion, or tray or packing damage previously occurred? Review the maintenance records. A pump, filter, or control valve that required an abnormal amount of servicing may give a useful lead. Digging into the past may reveal a recurring (“chronic”) problem. Search for the correct link between the past and present circumstances. Be cautious: a new problem may give the same symptoms as a past problem, but be caused by an entirely different mechanism. A history search may also unveil a hidden flaw. In one case (161), a column modification reduced column efficiency. The reduced efficiency was unnoticed, and the poor performance became the norm. 
The problem was noticed several years later. Search and identify events that occurred around the time the problem started. Carefully review operating charts, trends, computer records, and operator logs. Establish event timing to differentiate an initial problem from its consequences. Chapter 9 describes several case histories with actual operating charts that demonstrate the value of analyzing the event timing. In terms of the medical analogy, doctors always ask patients what happened first and what they did differently about the time when the trouble started. Do not exclude events that may appear completely unrelated, as these may be linked in an obscure manner to the problem. In the debutanizer example, it was the observation that the onset of debutanizer instability coincided with flooding in the fractionator that gave the troubleshooter the vital clue. At first glance, the two appeared completely unrelated. Do not restrict the investigation to the column. Many column problems initiate in upstream equipment. Doctors frequently seek clues by asking patients about people they have been in contact with or their family health history. Question the designers, equipment suppliers, installers, and others that are familiar with the column or worked on the column. Listen to shift operators and supervisors. Take good notes; details that appear unworthy of remembering may turn out to provide vital clues. Experienced people often spot problems even if they cannot fully define or explain them. Operators and supervisors know the behavior of the column better than anyone, and familiarity with this behavior can lead to the correct diagnosis. Different operators may offer different observations, all of which may be useful. Listening to all of them can provide vital clues. When something operators or supervisors say sounds odd, finding out why they say so is crucial and is likely to provide a major clue. 
In the debutanizer example, some of the key observations were supplied by the shift team. Hanson et al. (157, 158) described two cases, and Hasbrouck et al. (166) another case, where observations by operators led to the root cause. Achoundong et al. (2) showed how systematically listing and understanding the reasons for a variety of comments by the operating team became central to a correct diagnosis. In one crude tower, the author calculated a feed capacity of 85,000 BPD, but the operating team swore they could not go past 70,000 BPD. Understanding why the operating team was saying this revealed a previously unknown bottleneck. A similar experience was reported by Litzen and Bravo (298). Language spoken by operators and supervisors may differ from process engineers’ language. For instance, an operator may state that increasing the column reflux makes the trays more efficient. For process engineers, tray efficiency is the number of theoretical stages achieved by a tray, and this number is usually independent of the reflux (193, 245). For operators, the tray efficiency criterion is separation, which does improve with higher reflux. Effective listening requires understanding why the operator is making each statement. Operating personnel often tell you their conclusions first (166). You need to find out how they reached these conclusions. For example, when they tell you that a pump is defective (as they did in the debutanizer example), asking what led to this conclusion revealed the key observation about instability setting in when the oil flow to the reboiler was raised. During all discussions, be on the same team with the operators and supervisors. Do not be afraid to ask questions, even those that may appear stupid. The attitude that you want to learn more about the issue and work with the operating team to seek a solution is productive and will win cooperation. 
If you suspect an operator is incorrect, a polite reply like “I guess that is possible” lets the operator save face (and saves yours if the operator turns out to be correct!). The attitude that you have the answers is counterproductive, and you will lose the much-needed cooperation. Study the column behavior by making small, inexpensive changes. These are central for refining the definition of the symptoms. Record all observations and collect data; these may contain a major clue, which can be forgotten or hidden as the investigation progresses. In the debutanizer example, the expert watched the column response to raising pressure. This led to the observation that the debutanizer pressure affected the oil flow – an unexpected occurrence that became a major clue. In the doctor–patient analogy, this is similar to the doctor asking the patient to take a deep breath or momentarily stop breathing during a medical examination. Take a good set of readings on the column and its auxiliaries, including laboratory analyses. This step is equivalent to laboratory tests prescribed by a doctor to the patient. To avoid misleading information, suspect instrument readings and laboratory analyses, and make as many validation cross-checks as possible. An instrument may lie even when the instrument technician swears it is correct. One expert (112) stated, “I’ve spent a good deal of company money, and a lot of time, chasing a perceived operating problem because of an improperly calibrated instrument.” In one example (498, Case 25.2 in Ref. 201), an erroneous reading of a reflux flow meter, resulting from an incorrect pipe design, led to unnecessary, costly shutdowns. At the same time, do not disbelieve your instruments. They may be trying to tell you something. You need not trust them, but you should listen to them. In one case (213), an apparently defective flow meter reading led to the identification of incorrectly connected cooling water pipes. 
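One simple cross-check of flow meter readings is an overall mass balance closure: if the measured product flows do not add up to the measured feed flows within a reasonable tolerance, at least one reading is suspect or material is leaking. The short Python sketch below illustrates the idea. The function name, the stream names, the flow values, and the 5% tolerance are all illustrative assumptions and do not come from any of the case histories.

```python
def mass_balance_closure(feeds, products, tolerance_pct=5.0):
    """Overall steady-state mass balance check on a column.

    feeds, products: dicts mapping a stream name to its measured mass flow
    (all in the same units). tolerance_pct is an assumed acceptance band,
    not a recommended value.
    Returns (closure_pct, consistent), where closure_pct is the imbalance
    (out minus in) as a percentage of the total feed.
    """
    total_in = sum(feeds.values())
    total_out = sum(products.values())
    if total_in <= 0:
        raise ValueError("total feed flow must be positive")
    closure_pct = 100.0 * (total_out - total_in) / total_in
    return closure_pct, abs(closure_pct) <= tolerance_pct


# Hypothetical meter readings, all in the same mass units (e.g., lb/h).
feeds = {"feed": 100_000.0}
products = {"distillate": 38_000.0, "bottoms": 60_000.0}

closure, consistent = mass_balance_closure(feeds, products)
print(f"closure = {closure:+.1f}%, consistent = {consistent}")
```

A closure outside the tolerance does not say which reading is wrong; it only flags that the set of readings is inconsistent and deserves further validation.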
Compile mass, component, and energy balances; these provide a valuable check on the consistency of instrument readings and the possibility of leakage. Carefully review the column drawings for unusual features. Verify that the column drawings used are the latest update and explore the possibility of past modifications evading the updates. Review the column internals for violations of good design practices. If identified, examine the consequences of such violation and its consistency with the symptoms. Perform a hydraulic calculation under test conditions to determine whether any operating limits are approached or exceeded. For a separation problem, carry out a computer simulation of the column; check against test samples, temperature readings, and exchanger heat loads. Chapters 2 and 3 elaborate on the various checks. If more information is needed, like looking inside the tower, there are a large number of noninvasive techniques, some of them high-tech, that can give close insights. These include gamma scans, neutron backscatter, surface temperature surveys, thermal scans, computer-aided tomography (CAT) scans, tracer injection, quantitative multi-chordal gamma scans, and others. These are described in detail in Chapters 5–7. These are equivalent to ultrasound, X-rays, magnetic resonance imaging (MRI), and CAT scans used in medicine. “Selective attention bias” refers to strongly concentrating on one diagnosis to the exclusion of other observations one should catch (328). Under pressure and stress, narrowing of focus is natural, but can exclude important and relevant information. A well-known example illustrating this bias is discussed in the book The Invisible Gorilla and Other Ways Our Intuition Deceives Us (62). In the medical analogy, experienced X-ray technicians looking for a particular diagnosis on an X-ray missed obvious, serious problems not related to the looked-at diagnosis (62, 328). 
In the debutanizer example, selective attention bias is likely to have played a role in the incomplete problem definitions (step 4) and was overcome by the expert’s definition that combined all the relevant information. The doctor needs to step back and look at the bigger picture to counteract this bias. The data obtained from one technique should be consistent with those of others. For instance, in flood testing, check that alternative techniques such as gamma scans and differential pressure measurements give consistent results. Investigate any inconsistencies; these may provide a vital clue. Repeat measurements as necessary. Doctors check that the X-ray results are consistent with their examination results and with the bloodwork. A simulation of the current column operation can be compared to one with good performance. A tabular listing of the similarities and differences (possibly using a spreadsheet) may highlight the issues. Comparing the patient with a healthy individual can provide the doctor with invaluable clues. The simulation can also provide the internal vapor and liquid loads and the physical properties needed for hydraulic checks. These can determine whether a hydraulic issue is expected or whether it occurs prematurely. The simulation can also provide the feed flashes and properties needed to rate-check distributors and inlet pipes. Input from other disciplines may be critical to problem diagnosis (455). For instance, mechanical engineers will be the best to determine whether a tray was dislodged upward, or downward, or vibrated loose and why. Doctors often enlist help from specialists. Beware of placing undue reliance on expert opinion. While experts have vast knowledge, they are less familiar with the problem and data than you. Closely scrutinize their comments in light of your findings, and do not hesitate to challenge them. 
As an advisor, my opinion had often been correctly challenged by members of a process team, sending me in a different direction. Doctors often go back and forth with a specialist until the correct diagnosis is established. For maximum effectiveness, make the expert a part of the troubleshooting team, working closely together. Tests conducted under upset conditions can be relied on only as a preliminary indication (112), and their data should be suspect and treated with caution. Backing off from the upset to calm conditions eliminates interactions, narrows in on the key variables, and minimizes bad leads. The huge costs of lost production, off-spec products, or idle equipment often tilt the scale to “How fast can we fix the problem?” However, this need for speed should not be allowed to overpower good testing and adequate analysis (16). A study by Swain (16, 470) showed that the probability of coming up with a correct diagnosis initially rises rapidly with the time spent and then flattens off. The author’s father, an old-school doctor, would spend 45 minutes examining a patient before coming up with a diagnosis that was always correct. I have been examined by doctors that took less than two minutes, often with an incorrect diagnosis, and have never returned to them. Following the previous steps, a good problem definition should now be available. In some cases (e.g., the debutanizer), the root cause may be identified. If not, there will be sufficient information to narrow down the possible causes and to formulate a theory. In general, when problems emerge, everyone will have a theory. In the next phase of the investigation, these theories are tested by experimentation or by trial and error. The following guidelines apply to this phase: Sherlock Holmes once stated, “The difficulty is to detach the framework of fact – of absolute undeniable fact – from the embellishments of theorists and reporters.” This detachment is central to obtaining a correct diagnosis. 
Check and recheck the validity of your data until you are positive that they are correct. Never assume anything. See step 12 in Section 1.3 for typical validation checks. Incorrect data support the wrong theories and deny the correct ones. Look for independent ways of confirming or denying the validity of measurements and observations. Any theory must be consistent with adequately validated data. Adequately validated data form a strong basis for formulating theories. Logic is wonderful as long as it is consistent with the facts and the information is good. Clearly distinguish facts from theories and interpretations. The pitfall to avoid is “Don’t let the data get in the way of a good theory.” Follow the data. There are no “impossible” data. If data appear “impossible,” perform additional validation checks to confirm or deny them. When you have conclusive data, adhere to them. Make sure that the data are good. Do not trust instruments and drawings without verifying their correctness (356). Look for anything that does not make sense. Doubt everything you are told, no matter how much you trust and respect the person (356). Seek to positively confirm all data and facts. Critically check that the theory does not violate the laws of physics and chemistry (356). Closing a valve cannot increase the flow. An exothermic reaction does not reduce temperatures. Any theories that violate the laws of physics or chemistry need to be disposed of. Distillation failures are repetitive (see Section 1.5). Therefore, learning from past experiences in similar systems is invaluable for formulating a good theory. Look for something that happened in the past rather than for large-molecular-weight protein molecules wreaking havoc in your system. Talk to people that operate similar columns or check experiences in the literature (see Section 1.5). At the same time, beware of being biased by experience in past cases. 
Treat a theory based on past experiences as one more theory that needs to be tested.
It is important to prevent biases from steering theory formulation in the wrong direction. Mostia (328) elaborates on the variety of biases. Foremost is the tendency to see what one wants to see. Other biases include being swayed by presentation quality (speech, visuals, color), the initial piece of information, group thinking (“bandwagon effect”), or an authority in the field. To counter biases, it is essential to be aware that they exist (no one is immune) and to watch out for them when formulating theories.
When formulating a theory, attempt to map the paths of liquid and vapor travel inside the column. Imagine yourself as a pocket of liquid or vapor traveling through the column internals. Which way will you travel? Remember that this pocket will always look for the easiest path.
Table 1.1 shows that points of transition (tower base, feed points, draws, distributors) are common bottlenecks. The relevant points of transition can be effectively troubleshot by preparing simplified sketches and addressing the question “Would it work like it should?” This technique is described in detail in Chapter 8. When drawings are not clear or miss important information, do not guess. Check with the supplier, designer, or mechanical engineers. If needed, prepare a cardboard model.
Focus on the main components, but do not overlook any foreign material or unexpected byproducts. There have been cases (e.g., 401) in which an apparent poor separation was caused by the presence of an unexpected component. This subject is discussed in detail in Section 3.2.12.
Another useful technique is to think of everyday analogies. The processes that occur inside the column are no different from those that occur in the kitchen, bathroom, or yard.
For instance, blowing air into a straw while sipping a drink will make the drink splash all over; similarly, a reboiler return nozzle submerged in liquid will cause excessive entrainment and premature flooding.
In most cases, the simpler the theory, the more likely it is to be correct. Very few problems are really random. If something happens more than once, there is likely a root cause (318). The most likely problem is probably the problem (488). The key you lost is often in the last place you look (356).
An obvious flaw is not necessarily the root cause of the problem. One of the most common troubleshooting pitfalls is slowing or discontinuing further investigation once an obvious flaw is identified. Often, this flaw fits in with most theories, and all are sure that the flaw is the root cause of the problem. The author has seen many cases where correcting an obvious flaw neither solved the problem nor improved the performance. Once an obvious flaw is detected, it is best to treat it as another theory and continue troubleshooting. Start with an empty sheet of paper (150).
Beware of an obvious cause blinding the team to other causes. It is common to blame new trays or packings for poor performance that begins immediately after a tower revamp; in reality, the root cause is often unrelated to the new trays or packings. In one case (481), premature flooding following a tower repack turned out to be due to a maintenance replacement of a leaking plug valve by a lower-pressure-drop ball valve. This led to a false level indication that in turn induced excessive tower base level and premature flooding. In Case 20.2 in Ref. 201 as well as in the debutanizer example, an exchanger leak rather than the new trays was the root cause. In yet another case (237), a 40-year-old liquid draw issue bottlenecked a tower following a retray. There were other cases where the existing draw rather than the new trays turned out to be the issue (e.g., 150).
In all these cases, the retray or repack was initially suspected. Systematic testing identified the real root cause. In the debutanizer example, it was the troubleshooter’s asking “Why did the tower pressure influence the hot oil flow to the reboiler?” that was invaluable in connecting the dots.
Premises on which theories are based can often be easily supported or disproved by calculation. In one case, it was argued that liquid entrainment was an issue. A simple calculation showed that at the upward velocities involved, the rise of any liquid drops would be reversed by gravity within less than 1 in. This totally invalidated the theory.
Calculations are only as good as their basis. Closely review any assumptions and verify that all the dimensions used for tray internals are correct. Ask the supplier to provide any missing dimensions.
Testing theories should begin with those easiest to prove or disprove, almost irrespective of how likely or unlikely these theories are. If shutting the tower down is expensive (which is almost always the case) but is required for testing a leading theory, it is worthwhile to first cater to alternative theories that require less drastic actions even if they are longer shots. In the medical analogy, surgery should not be performed before a blood test that may identify a less likely cause.
Test the response of the column to changes in variables such as vapor flow rate or liquid flow rate. Compare the results with predictions from the various theories. For instance, if a column flood responds to changes in the vapor load but not in the liquid load, any theory that argues that the flood is due to excessive liquid load is invalidated. In one case (385), determining that the tower responded to changes in the vapor load but not to changes in the liquid load invalidated the leading theory and identified the correct root cause. Change one variable at a time.
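Two of the checks described above lend themselves to a quick sketch: the gravity-reversal estimate used against the entrainment theory, and the elimination of theories by the column's response to vapor and liquid load changes. The Python below is a minimal illustration under stated assumptions, not the calculation used in the cited cases. The ballistic formula h = v^2 / (2g) ignores vapor drag on the drop, and the launch velocity, theory names, and test results are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

G_FT_PER_S2 = 32.174  # standard gravity in ft/s^2


def max_drop_rise_in(launch_velocity_ft_s: float) -> float:
    """Height (inches) at which gravity reverses a drop launched upward.

    Ballistic estimate h = v^2 / (2 g). Vapor drag on the drop is ignored,
    which is an assumption of this sketch, not a statement from the text.
    """
    rise_ft = launch_velocity_ft_s ** 2 / (2.0 * G_FT_PER_S2)
    return rise_ft * 12.0


@dataclass
class Theory:
    name: str
    # Predicted response of the flood to a change in each variable:
    # True = responds, False = does not respond, None = no firm prediction.
    predictions: Dict[str, Optional[bool]]


def surviving_theories(theories: List[Theory],
                       observed: Dict[str, bool]) -> List[str]:
    """Keep theories whose firm predictions all match the observed responses."""
    survivors = []
    for theory in theories:
        contradicted = any(
            theory.predictions.get(variable) is not None
            and theory.predictions[variable] != response
            for variable, response in observed.items()
        )
        if not contradicted:
            survivors.append(theory.name)
    return survivors


# Hypothetical theories and one-variable-at-a-time test results.
THEORIES = [
    Theory("excessive liquid load", {"vapor_load": None, "liquid_load": True}),
    Theory("vapor-handling limit", {"vapor_load": True, "liquid_load": False}),
]
OBSERVED = {"vapor_load": True, "liquid_load": False}

print(round(max_drop_rise_in(2.0), 2), "in. rise for a 2 ft/s launch")
print(surviving_theories(THEORIES, OBSERVED))
```

Each entry in `OBSERVED` should come from a test in which only that variable was changed; otherwise the comparison with the predictions is not meaningful.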
If several variables are allowed to change simultaneously during a test, the result is likely to be inconclusive.
Take time to plan every step of your test and consider all possible outcomes (356). Tests that reveal very little not only are a waste of time and effort, but also undermine people’s confidence in the investigation. Discussing the plans with knowledgeable individuals can help avoid this trap. Refrain from making any permanent changes until all practical tests are done.
Look for possibilities of simplifying the system. For instance, if it is uncertain whether an undesirable component enters the column from outside or is generated inside the column, consider operating at total reflux to check it out. In one amine absorber (225), where foaming was suspected, a field trial was conducted in which the amine solution was replaced by nonfoaming clean water. The water trial showed an identical capacity limitation, conclusively ruling out foaming. A small-diameter chemical tower (481) flooded after packing replacement. The flood persisted in a trial where the old packings were reinstalled, ruling out the new packing as the root cause.
Be cautious in drawing conclusions from simplified tests. For instance, in the amine absorber trial discussed above, the plant took special care to check the water quality before the trial. Had there been doubts about water cleanliness, foaming could not have been ruled out.
Critically examine the argument that the same equipment has been performing flawlessly in a similar/identical column for many years (456). No two columns are the “same.” Small variations may make large differences in performance. Closely compare the variations; they may lead you to the root cause.
People act based on their reasoning, which is likely to differ from yours. People often have their own agendas, especially when they have a lot to lose in an unfavorable outcome.
The more thoroughly you question their operating or design philosophy, the more closely you will be able to reconstruct the sequence of events. Their replies may also reveal considerations you are not aware of. Be cautious in your questioning. An attitude of wanting to learn more about the system and what can be improved will win cooperation. Pointing fingers or implying that someone screwed up is a sure way of losing cooperation (411).
It is human nature to give preference to data or information that favors one’s beliefs over conflicting data. Such bias can steer an investigation away from the correct theory. Be open to new ideas and beware of the human tendency to rely on the initial diagnostic impression. To maintain an unbiased attitude when working with conflicting data, request a “cold eyes review” by knowledgeable colleagues or use an alternative analysis method (456). When working with experts, keep in mind step 17 in Section 1.3.
Ensure that management is well informed of what is being done and is receptive to it (454, 498). Otherwise, important nontechnical considerations may be overlooked. Further, management is far less likely to become frustrated with a slow-moving investigation when it is convinced that the best course of action is being taken. Often, managers are technical people with expertise that can contribute ideas. Moreover, such technical people often expect their ideas to be incorporated into the testing.
Whenever possible, give supervisors and operators detailed guidelines for the fix attempt, and leave them some freedom to make the system work. The author has been involved in several cases where the actions of a motivated operator made a fix work and has seen other cases where a correct fix was unsuccessful because of an unmotivated effort by the operators.
With different people having different ideas and theories, it is important to assemble all these ideas into constructive teamwork and to suppress any confrontations.
Some people will have a stake in their theory being correct. They will feel that they win when their theory is pushed ahead and feel rejected when their theory is dismissed. Good troubleshooting leadership needs to encourage all ideas, treat all respectfully, recognize that even the ideas that are disproved contribute to the path to solving the problem, and acknowledge their initiators accordingly.
Admitting that you are wrong is inherently difficult to do. Nonetheless, recognize that the investigation is not about who is right and who is wrong, but about finding the correct technical solution. Everyone serves on the same team, and all will win when the correct solution is found. Accepting the truth, or accepting that others’ ideas are superior to yours, sends the message of cooperation and the dominance of technical validity. This will promote idea exchange, productivity, and teamwork.
Verbal instruction, multidiscipline personnel involvement, and haste generate an atmosphere ripe for miscommunication (161). Ensure any instructions are clear, concise, easy to follow, and sufficiently detailed. When leaving a shift team to implement a fix by themselves, provide them with written instructions. Be reachable and encourage communication should questions or problems arise. Call in at the beginning of the shift to check whether the shift team understood your instructions and are comfortable with them.
Unforeseen side effects of even seemingly minor modifications have been the root cause of many accidents. Disallow “back of an envelope” modifications, as their side effects can generate hazards. Properly document any planned modification, and have a team systematically review it with the aid of a “HAZOP” or similar checklist. Prior to completion, inspect to ensure the modification was implemented correctly and as intended.
Document any fix attempt, the reasons for it, and the results. This information will be useful for future fixes.
In many cases, a sudden change in plant conditions lowers the priority of a troubleshooting endeavor, and it is discontinued. At a later time, the endeavor is resumed. Good documentation of the initial endeavor gives the resumed endeavor a much better starting point. On one occasion, we designed and built baffles to prevent vortexing near a feed inlet, only to find that similar baffles already existed but were not shown on the drawings, and no one in the plant knew about them.
Tests that require column shutdown or low-rate operation are usually impractical. Nonetheless, unforeseen circumstances such as plant slowdown or crash shutdown may open an opportunity for performing them if one moves fast. Be alert for and capitalize on such opportunities. In one large column where a tray malfunction was suspected, a crash shutdown opened an opportunity for installing 10 well-designed trays as a trial fix. A subsequent gamma scan showed that this fix would solve the problem.
If a plant shutdown occurs before the tower problem is diagnosed, do not miss the opportunity to enter the column and investigate. Inspecting column internals often reveals unexpected features that contribute to or are the root cause of the problem (Chapter 10). Opportunities for measuring and taking photographs of the internals will not return once the tower is back in service and may turn out to be crucial. In a recent experience, the question of whether 40 welded-in bubble cap trays should be cut out and replaced (a mammoth task) hinged on the distance between the vapor riser and the cap, and this dimension was unavailable.
When a deadline approaches (e.g., a plant turnaround), there is pressure on the team to suspend troubleshooting efforts and to proceed with an arbitrary fix, often imposed by the more vocal or emotional members of the team. This may not coincide with the best engineering fix (455).
Anticipate this possibility well before the deadline, and work with the team to promote the best engineering fix while there is still plenty of time.
Troubleshooting is not magic, nor is it performed by magicians. It is a learned art. Unfortunately, not much of it is taught at school, although a few university courses on troubleshooting exist. It is learned in the school of hard knocks. You can avoid most of these hard knocks by learning from other people’s experiences. The objective of Refs. 196, 198, 201 was to put these experiences in the hands of every engineer, supervisor, or operator who is interested. Failures are repetitive, and learning from the past can solve today’s problems and avoid tomorrow’s.
Three elements have been listed as critical to successful troubleshooting (16): knowledge of the process and equipment, experience with operations and solving problems, and using an effective method to solve the problem. Training programs should provide an understanding of the process and equipment, be based on a large number of experiences, include examples and exercises based on these experiences, proceed stepwise from simple to complex, address interactions with other units, and be accompanied by a relevant manual (14, 16).
There are many other resources. Talk to the experienced people in your plant and organization and to fellow workers in professional meetings, and attend their presentations. Get involved in startup, shutdown, and commissioning work. Get involved in incident investigations. Inspect equipment and participate in equipment testing.
Consider supplementing the above with self-training. After three years in technical services, I was transferred to operations on startup duties, needing to become an overnight troubleshooting expert. I picked up a notebook and talked to experienced people, taking notes at each stop and collecting lessons, guidelines, and advice.
I combed the literature in search of other people’s experiences, often writing to the authors. Their stories and wisdom, together with many other lessons my colleagues and I learned in the school of hard knocks over the years, are among the pages of this book. It is my hope that all these invaluable lessons will be useful to future students of the art of troubleshooting.
The problems usually experienced in distillation columns can be classified as follows:
Often, a separation problem may show up as a capacity or instability problem. The reason is that due to poor separation, operators increase the reflux and boilup to maintain the product on-spec. This hydraulically loads up the column, and the problem shows up as a capacity limit or, when operating right near the limit, as an instability. Conversely, a capacity problem may produce premature flood, which shows up as poor separation or instability. Likewise, pressure or temperature deviations may be a reflection of premature flood or poor separation or may cause them. The troubleshooter’s challenge is to distinguish the cause from the result. The following chapters will cover the primary techniques available to narrow the root cause down.
Chapter 1
Troubleshooting Steps
1.1 CAUSES OF COLUMN MALFUNCTIONS
Table 1.1 Common causes of column malfunctions

| No. | Cause | Total cases | Refinery cases | Chemical cases | Olefins/gas cases |
|---|---|---|---|---|---|
| 1 | Plugging, coking | 121 | 68 | 32 | 16 |
| 2 | Tower base and reboiler return | 103 | 51 | 22 | 11 |
| 3 | Tower internals damage (excluding explosion, fire, implosion) | 84 | 35 | 33 | 6 |
| 4 | Abnormal operation incidents (startup, shutdown, commissioning) | 84 | 35 | 31 | 12 |
| 5 | Assembly mishaps | 75 | 23 | 16 | 11 |
| 6 | Packing liquid distributors | 74 | 18 | 40 | 6 |
| 7 | Intermediate draws (including chimney trays) | 68 | 50 | 10 | 3 |
| 8 | Misleading measurements | 64 | 31 | 9 | 13 |
| 9 | Reboilers | 62 | 28 | 13 | 15 |
| 10 | Chemical explosions | 53 | 11 | 34 | 9 |
| 11 | Foaming | 51 | 19 | 11 | 15 |
| 12 | Simulations | 47 | 13 | 28 | 6 |
| 13 | Leaks | 41 | 13 | 19 | 7 |
| 14 | Composition control difficulties | 33 | 11 | 17 | 5 |
| 15 | Condensers that did not work | 31 | 14 | 13 | 2 |
| 16 | Control assembly | 29 | 7 | 14 | 7 |
| 17 | Pressure and condenser controls | 29 | 18 | 3 | 2 |
| 18 | Overpressure relief | 24 | 10 | 7 | 2 |
| 19 | Feed inlets to tray towers | 18 | 11 | 3 | 3 |
| 20 | Fires (excluding explosions) | 18 | 11 | 3 | 4 |
| 21 | Intermediate component accumulation | 17 | 6 | 4 | 7 |
| 22 | Chemicals release to the atmosphere | 17 | 6 | 10 | 1 |
| 23 | Subcooling problems | 16 | 8 | 5 | 1 |
| 24 | Low liquid loads in tray towers | 14 | 6 | 2 | 3 |
| 25 | Reboiler and preheater controls | 14 | 6 | – | 5 |
| 26 | Two liquid phases | 13 | 3 | 9 | 1 |
| 27 | Heat integration issues | 13 | 5 | 2 | 6 |
| 28 | Poor packing efficiency (excluding maldistribution/support/hold-down) | 12 | 4 | 3 | 2 |
| 29 | Troublesome tray layouts | 12 | 5 | 2 | – |
| 30 | Tray weep | 11 | 6 | 1 | 3 |
| 31 | Packing supports and hold-downs | 11 | 4 | 2 | 2 |
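The total-case counts in Table 1.1 can be used to check how concentrated the malfunctions are among the leading causes, in the spirit of the sales axiom quoted earlier in the chapter. The Python below is a minimal sketch: the counts are copied from the table, while the function names are hypothetical. The shares are computed relative to the sum of the listed totals only, which is an assumption, since the table does not cover every case history in the survey.

```python
from typing import List, Tuple

# (cause, total cases) pairs copied from Table 1.1.
TABLE_1_1: List[Tuple[str, int]] = [
    ("Plugging, coking", 121),
    ("Tower base and reboiler return", 103),
    ("Tower internals damage", 84),
    ("Abnormal operation incidents", 84),
    ("Assembly mishaps", 75),
    ("Packing liquid distributors", 74),
    ("Intermediate draws", 68),
    ("Misleading measurements", 64),
    ("Reboilers", 62),
    ("Chemical explosions", 53),
    ("Foaming", 51),
    ("Simulations", 47),
    ("Leaks", 41),
    ("Composition control difficulties", 33),
    ("Condensers that did not work", 31),
    ("Control assembly", 29),
    ("Pressure and condenser controls", 29),
    ("Overpressure relief", 24),
    ("Feed inlets to tray towers", 18),
    ("Fires (excluding explosions)", 18),
    ("Intermediate component accumulation", 17),
    ("Chemicals release to the atmosphere", 17),
    ("Subcooling problems", 16),
    ("Low liquid loads in tray towers", 14),
    ("Reboiler and preheater controls", 14),
    ("Two liquid phases", 13),
    ("Heat integration issues", 13),
    ("Poor packing efficiency", 12),
    ("Troublesome tray layouts", 12),
    ("Tray weep", 11),
    ("Packing supports and hold-downs", 11),
]


def top_n_share(rows: List[Tuple[str, int]], n: int) -> float:
    """Fraction of the listed cases accounted for by the n largest causes."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    listed_total = sum(count for _, count in rows)
    return sum(count for _, count in ranked[:n]) / listed_total


def causes_needed_for(rows: List[Tuple[str, int]], target_share: float) -> int:
    """Smallest number of leading causes whose combined share reaches the target."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    listed_total = sum(count for _, count in rows)
    running = 0
    for index, (_, count) in enumerate(ranked, start=1):
        running += count
        if running >= target_share * listed_total:
            return index
    return len(ranked)


print(f"Top 10 causes: {top_n_share(TABLE_1_1, 10):.0%} of listed cases")
print("Causes needed to reach half of listed cases:",
      causes_needed_for(TABLE_1_1, 0.5))
```

The same two functions can be applied to the refinery, chemical, or olefins/gas columns of the table to see whether the ranking of causes differs by industry.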
1.2 COLUMN TROUBLESHOOTING – A CASE HISTORY
1.3 STRATEGY FOR TROUBLESHOOTING DISTILLATION PROBLEMS
1.4 DOS AND DON’TS FOR FORMULATING AND TESTING THEORIES
1.5 LEARNING TO TROUBLESHOOT
1.6 CLASSIFICATION OF COLUMN PROBLEMS
NOTE