Online monitoring of strategic transformers … Why bother?



By Trevor Lord, Director, LORD Consulting

Most executive engineers managing transformer fleets are quick to acknowledge that certain units of that fleet would be classified as ‘strategic’ assets to their company. They would readily agree that the unforeseen loss of such strategic assets would be a calamity, but then the interesting conversation begins. When asked to articulate the likely cost of that impact to the business, most would cite an approximate figure for the transformer replacement cost, but begin to pale when asked next to articulate the cost to the business beyond the value of the asset itself. Perhaps a ‘well it depends…’ statement could be expected, but few have worked through it to the extent that the true likely impacts sit at their fingertips or, by corollary, at the top of their mind.

It is not uncommon at this point for the conversation to leapfrog to comments like: “Well, what are the chances… we have had no trouble with these transformers to date”, or: “We look after these transformers well and do annual/periodic DGA and oil samples so we are doing all we can to manage the risk”, and almost certainly the perennial comment flows: “Well even if it did fail we have N-1 so would be fine”. Really? Would the reader be happy to draw the conclusion from such a conversation that strategic risk to the business lying with such plant was actually being well managed? One would hope that was not the conclusion.

What about thoughts like: “Are there any spares?”; “How long would it take to procure and install another transformer?”; “Could the failure be far more dramatic than the imagined ‘quiet handover’ to the ‘sister N-1 unit’… with no risk at all of fire, explosion, loss of life, consequential damage to the site, or injury?”; and “What confidence do we have that the N-1 sister transformer really could then manage nominally twice normal load?”. What would be the cost then were such a failure to be more complicated than often imagined? Well, it soon gets to be a ‘difficult conversation’.

Given the prevalence of the above ‘N-1’ statement in routine conversation with the industry, it is worth digressing for a moment of reflection. One of the biggest ‘frights’ an Australasian transmission company ever got was the day they closed down one vital transformer for maintenance, transferred the load to the ‘N-1 sister transformer’, dismantled the yard around the shut-down transformer, then suffered the indignity of the N-1 unit failing catastrophically inside 12 hours under the extra loading (it was not fitted with online monitoring, of course). A major city was very nearly plunged into a major and long-term power cut, as it turned out that the incident took place on a highly strategic set of transformers which only 12 hours prior had been considered highly reliable. Of course the lesson here, as with all the above points, is that one really never does know when such an event will occur, despite many years of the situation never being considered possible nor showing any likelihood of occurring. A monitor costing some $30,000 would have alerted the asset manager of that company to this risk in a timely fashion and allowed the failing unit to be shut down in a more orderly manner, then repaired economically and put back into service comparatively quickly. As it was, this situation became a protracted risk exposure to the company whilst a replacement ‘N-1 transformer’ was procured at great cost and after a lengthy delay. Is the point here becoming a little more evident with just one illustration alone?

It is pertinent here to address one of the most common misconceptions of all which, if taken literally (as it so often is now that wider expertise is progressively leaving the industry), would put a very large population of strategic transformers at hugely elevated risk. The misconception to which I refer is the belief that transformers fail slowly and progressively over a long period of time, and certainly at such a rate that annual or even bi-annual sampling (perhaps even accompanied by comprehensive offline electrical testing every four years on average) is deemed perfectly adequate to be ‘doing a good job’ of asset management. By corollary of this approach, the asset manager is saying to us that the fastest fault he expects will occur is one that will take a minimum of two years to happen, thus allowing this to be picked up in a timely manner by annual sampling. (This is an illustration of the Nyquist Sampling Theorem all engineers learned at University: one must sample at least twice as fast as an event in order to detect the event in a timely manner.) Can this be right… are the failures that long in duration and development? Very sadly, I must convey bad news… that assumption is by no means always the case. Such thinking cost the company dearly in the above case study and very nearly caused a further political fallout and public embarrassment that alone would have easily added a six-figure sum to the misery it already had incurred.

To explain, transformer failures typically fall into two camps… slowly-developing faults that one might best describe as textbook ageing (ageing DP, moisture and particle build up in oil due to paper ageing, etc., reducing oil dielectric strength), and fast-developing faults that can arise with catastrophic outcomes in weeks, days, or even in under a day. Sadly, the latter are real (see illustrative Figures 1-3) and have been frequently observed and caught by purpose-built tools for noting such things. Those asset owners who have experienced the action of fast-developing faults have been greatly humbled and shaken by the event, but there are some asset owners who have had this occur on their assets and, whilst shaken, have developed a wry smile as their purpose-built tools (more commonly known as ‘on-line main tank monitors’) installed to ‘see this coming’ did not fail them and the event was contained in a timely and very cost-effective manner.

Really, one might ask? Surely these sorts of fast-developing faults are more in line with ‘Acts of God’… “surely that is why we have a Buchholz relay”, one might say. “How many people have ever had this issue happen to them?” is often the cry. The conversation typically slides onto very thin ice past this point and has a very familiar parallel with the sort of argument that house fires are so rare that there is no point in putting in fire alarms, sprinklers etc. and paying huge money in fire insurance. Rationally, however, despite the small risks of an event, it is a very small percentage of homeowners who gamble their biggest strategic asset on a game of chance… small risk but big consequence. So, why would an asset manager take the very same gamble with the most strategic power systems assets of their company, all on the premise that failures only occur slowly?

Fig 1: Random Arcing faults in main tank. High level, short duration, 3-4000 ppm H2, dissipating over 1 week.
Arguably, the most dangerous of all main tank faults & almost certainly missed by ‘routine’ annual oil sampling.
Fig 2: Failure of a brand new 500MVA GSU at Entergy Louisiana USA, just 2 weeks old.
Timely detection & action led to a full repair &
Fig 3: Failure of a bushing tail in GSU at Sam Yamoto Steel, Thailand. Failure developed over 2 weeks and ran away in just 4 days.
Timely detection allowed a full repair & return to service.
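
To put a rough number on the claim under Fig 1, the following is a minimal sketch of the arithmetic, not a model of any real transformer. It assumes the fault occurs at a random time relative to the sampling schedule and that the hydrogen remains detectable for the whole one-week window; the function names and the three-hourly monitor interval used for comparison are our own illustrative assumptions.

```python
# Sketch only: assumes the fault occurs at a uniformly random time relative to
# the sampling schedule and that the gas is detectable for the whole window.

def capture_probability(window_days: float, interval_days: float) -> float:
    """Chance that at least one periodic sample lands inside the detectable window."""
    return min(1.0, window_days / interval_days)


def samples_in_window(window_days: float, interval_days: float) -> float:
    """Average number of samples taken while the gas is still detectable."""
    return window_days / interval_days


WINDOW_DAYS = 7.0            # Fig 1: H2 dissipates over about one week
ANNUAL_DAYS = 365.0          # routine annual oil sample
THREE_HOURLY_DAYS = 3 / 24   # assumed online-monitor sampling interval

annual_chance = capture_probability(WINDOW_DAYS, ANNUAL_DAYS)
monitor_samples = samples_in_window(WINDOW_DAYS, THREE_HOURLY_DAYS)

print(f"Annual sampling catches the event about {annual_chance:.1%} of the time")
print(f"A three-hourly monitor takes about {monitor_samples:.0f} samples in the window")
```

Under these assumptions an annual sample has roughly a two per cent chance of landing inside the window, whereas a monitor sampling every three hours would take 56 samples while the gas is still present.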

LORD Consulting is very concerned to observe that such pushback on important conversations of engineering risk management is becoming more commonplace in our interactions with industry. It would appear to be simply due to a lack of knowledge of the possible failure mechanisms of transformers, a situation appearing to gain momentum as older and experienced engineers progressively hand over to younger asset managers who are yet to acquire such skills. While we can rationalise the cause, the problem remains, and the risk grows. This is of course a much wider industry concern.

But back to the story…

Fast-evolving faults might include scenarios like thermal faults progressively worsening in a runaway mode and then leading to arcing faults (not uncommon in GSUs), or arcing faults occurring randomly due to poor oil dielectric strength, or paper ageing prematurely due to failures of cooling systems. To catch these in a timely manner and then have time to react suitably, one must sample at no less than an amazingly short three-hourly interval, a fact again derived by pitching the Nyquist Theorem against the reality of the observed time lines of fast-evolving faults.
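
The rule of thumb used in this article (sample at least twice within the time a fault takes to develop) can be written down in a few lines. This is a hedged sketch of that rule only; the function names are hypothetical and the factor of two is the article's stated rule, not a property of any particular monitor.

```python
from datetime import timedelta

# Sketch of the article's rule of thumb: at least two samples must fall
# within the time a fault takes to develop.
SAMPLES_PER_FAULT = 2


def max_sampling_interval(fault_development: timedelta) -> timedelta:
    """Longest interval between samples that still meets the rule of thumb."""
    return fault_development / SAMPLES_PER_FAULT


def fastest_detectable_fault(sampling_interval: timedelta) -> timedelta:
    """Shortest fault development time that a given sampling interval covers."""
    return sampling_interval * SAMPLES_PER_FAULT


annual = timedelta(days=365)
three_hourly = timedelta(hours=3)

# Annual sampling only covers faults that take two years or more to develop.
print("Annual sampling covers faults of", fastest_detectable_fault(annual).days, "days or longer")

# Three-hourly sampling covers faults developing over six hours or more.
hours = fastest_detectable_fault(three_hourly).total_seconds() / 3600
print("Three-hourly sampling covers faults of", hours, "hours or longer")
```

Read the other way, a fault that runs away in under a day calls for a sampling interval measured in hours, which is how the three-hourly figure arises.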

Cigre (TB 642, Dec 2015) has in fact assessed failures and failure rates of transformers of diverse populations. While failure probabilities are low (ranging from around 0.4 to two per cent), strategic transformers have, by their very nature, a ‘high’ cost of risk to the asset owner. The matter, sadly, is a full and ever-present reality of transformer failure events. The reality is well known to insurers and experienced transformer risk professionals alike… no one is served well in the end by being unaware of the potential failure mechanisms of strategic transformers and the timelines that pertain. Interestingly, age of the transformer is not a good guide to the risk potential… FM Global revealed in 2018 international claims statistics which even show that failures of transformers under 20 years old are disproportionately represented.
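
The ‘cost of risk’ idea can be illustrated with a simple expected-loss calculation. The sketch below is illustrative only: it assumes the quoted probabilities are annual failure probabilities, and the consequence cost of $5,000,000 is a purely hypothetical figure chosen for the example, not a value from Cigre or from any real case.

```python
# Sketch only: expected annual loss = probability of failure x cost of consequence.
# ASSUMPTIONS: the probabilities are treated as annual, and the consequence
# cost below is a hypothetical figure for illustration.

def expected_annual_loss(annual_failure_probability: float, consequence_cost: float) -> float:
    """Probability-weighted cost of failure per year."""
    return annual_failure_probability * consequence_cost


LOW_PROBABILITY = 0.004        # 0.4 per cent, lower end of the quoted range
HIGH_PROBABILITY = 0.02        # two per cent, upper end of the quoted range
CONSEQUENCE_COST = 5_000_000   # hypothetical total cost of losing a strategic unit

for probability in (LOW_PROBABILITY, HIGH_PROBABILITY):
    loss = expected_annual_loss(probability, CONSEQUENCE_COST)
    print(f"Probability {probability:.1%}: expected loss of about ${loss:,.0f} per year")
```

With these assumed figures the expected loss sits at roughly $20,000 to $100,000 per year, which shows how a low probability can still carry a substantial cost of risk once the full consequence is priced in.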


Fig 4: FM Global clients transformer loss experience (by value, 2006 to 2015). Note the significant contribution of the main tank faults. Courtesy: FM Global Australia

So, what is the asset manager to do when thinking of their strategically-critical transformers?

  1. Understand and accept that there is a real risk of short-gestation time failures.
  2. Consider risk potential very seriously. Even if the probability might appear slight on paper, consider the consequences.
  3. Carry out a detailed risk assessment. Examine availability of spares, the effects of secondary damage from a failure, contingency plans for a worst-case scenario, cost of a potential failure and consider the cost of the risk.
  4. Consider if the consequences of an adverse failure are accepted to be high. If so, take steps to deploy transformer main tank monitoring in the first instance. Provided that a suitable monitor specification is chosen, that the devices sample and analyse representative main tank oil at nominally three-hourly intervals, and that monitoring is suitably implemented in the company, FM Global [Fig 4] has shown that this will cover some 80 per cent of the total risk to the transformer by offering a timely warning of both fast and slow-evolving faults. Other options open to the asset manager, but possibly offering lesser return on the investment (depending upon the risk assessment), are to fit on-line monitoring to the bushings (representing nominally 12 per cent of the risk by value) and to either upgrade or service more routinely the tap changer (about eight per cent of the claims risk value). Incidentally, we also recommend that the traditional, nominally annual, more comprehensive oil analysis and longer-term off-line diagnostic testing at shut-downs are continued as a further complementary preventative measure to the fast-sampling on-line monitors.
  5. Embrace transformer on-line monitoring. It is vital that several key steps are taken as part of that decision. These should include:
    • Development of a ‘Transformer Monitoring Policy’ aligned with your company’s Asset Management Plan
    • Establishment of a procedure to service and maintain the monitors on a regular basis, ensuring that they are working as they should be (an asset management plan for the new monitoring systems)
    • Ensure that main tank alarms are set, and maintained, based upon the stable level of the gases being monitored. A good manufacturer of the monitors will guide in this matter. It is vital to set monitors in this manner, and not simply via a blanket ‘level based’ triggering common to a whole fleet which simply is inappropriate in the risk management of a diverse fleet of transformers, even if they have similar specifications. Recall that transformers are all hand-made and likely no two are ever ‘the same’.
    • Ensure that there is a ‘Plan of Action’ in place for when the monitor alarms. Remember that there may only be hours or days to react and one must plan accordingly. Ideally, a small specialist team is established to take and react to the alarms from the monitors. This decision will pay handsome dividends in reacting appropriately to any alarm and in a timely manner, thus delivering the final goal of timely warnings meeting a timely and informed reaction and subsequent risk mitigation.
  6. Review your transformer protection scheme. While the recommended steps above essentially allow one to manage nearly all potential risk, remember that there is still a need to consider the fastest-evolving faults of all… catastrophic failures. Typically, such incidents are more likely the result of mechanical damage to windings and insulation because of significant “through faults”. Transformer monitoring is not designed to be a protection relay… that is the domain of the likes of the Buchholz relay and the overall performance of the breakers and protection relays safeguarding the transformer. Make sure that the overall protection architecture external to the transformer is as fast acting as practicable and that ideally there is a breaker on the upstream side.
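
The point made above about setting alarms against each transformer's own stable gas level, rather than a blanket fleet-wide level, can be sketched as follows. This is an illustration under stated assumptions and not any manufacturer's algorithm: the function names, the eight-sample baseline window, the 50 ppm rise limit, the 100 ppm blanket limit and all of the hydrogen readings are hypothetical.

```python
from statistics import median

# Sketch only: all thresholds and readings below are hypothetical.


def blanket_alarm(readings_ppm, limit_ppm=100.0):
    """Index of the first reading above one fleet-wide level, or None."""
    for i, value in enumerate(readings_ppm):
        if value > limit_ppm:
            return i
    return None


def baseline_alarm(readings_ppm, baseline_window=8, rise_limit_ppm=50.0):
    """Index of the first reading that rises above the unit's own recent
    stable level (median of the previous samples) by more than the limit."""
    for i in range(baseline_window, len(readings_ppm)):
        baseline = median(readings_ppm[i - baseline_window:i])
        if readings_ppm[i] - baseline > rise_limit_ppm:
            return i
    return None


# Unit A is normally stable near 20 ppm H2 and then starts to climb.
unit_a = [20] * 8 + [22, 45, 90, 160]
# Unit B is stable at a higher level and shows no change at all.
unit_b = [150] * 12

print("Unit A, blanket alarm at sample:", blanket_alarm(unit_a))
print("Unit A, baseline alarm at sample:", baseline_alarm(unit_a))
print("Unit B, blanket alarm at sample:", blanket_alarm(unit_b))
print("Unit B, baseline alarm at sample:", baseline_alarm(unit_b))
```

In this made-up data the baseline approach flags Unit A one sample earlier than the blanket level, and it stays quiet on Unit B, which the blanket level would hold permanently in alarm despite no change in its condition.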

About LORD Consulting

LORD Consulting has an extensive consulting team specialising in both transformers and in the field of asset management of power systems assets. Each consultant is heavily involved with Cigre in their respective fields of transformer and asset management topics and is both informed and current in their expertise. Realising the time pressures and experience challenges facing the modern asset manager in dealing with all such matters in the level of detail and time frames that typically pertain, it would be our pleasure to offer to guide and assist any asset manager motivated to address a perceived risk to their strategic transformer assets. Such a contribution may be one or more of the following services:

  • Assistance and independent assessment of risk associated with your assets
  • Condition assessments of strategic transformer assets or entire asset fleets
  • Recommendation of suitable systems and specifications for monitoring equipment
  • Peer review of your existing risk-assessed plans for managing transformers, or development of such a plan for you
  • Development of asset management plans.

LORD Consulting cares deeply about the pursuit of excellence in the field of strategic transformer management and monitoring. We welcome the chance to contribute supportive input, or to address questions in a discreet and professional manner.