What Owners Should Know About High Availability and Redundancy

Underspecify redundancy and you might end up paying for it later.

By: Allan Evora

We see a lot of misinformation and underspecification when it comes to high availability in mission-critical control systems. Sometimes this is due to the misconception that high availability is expensive or overly complicated. We're going to debunk that myth and cover other things owners should know about high availability and redundancy.


What is redundancy?

When designing a high-availability system, we strive to eliminate single points of failure. If your process depends on a single PLC, that PLC is a single point of failure. There should be a mechanism that keeps processes and equipment operating even if the primary controller fails. This is redundancy.
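To show why removing a single point of failure helps, here is a minimal Python sketch of standard availability arithmetic. It assumes two identical controllers that fail independently and switch over perfectly. The MTBF (mean time between failures) and MTTR (mean time to repair) values are hypothetical placeholders, not vendor data. Real controller pairs often share power, networks, and firmware, so treat the redundant-pair result as optimistic.

```python
def single_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one unit: uptime / (uptime + repair time)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_pair_availability(a: float) -> float:
    """Availability of two identical units where either one can carry the load.

    Assumes independent failures and a perfect switchover: the system is
    down only when both units are down at the same time.
    """
    return 1.0 - (1.0 - a) ** 2


if __name__ == "__main__":
    HOURS_PER_YEAR = 8760
    # Hypothetical figures for illustration only.
    a = single_availability(mtbf_hours=50_000, mttr_hours=24)
    pair = redundant_pair_availability(a)
    print(f"single controller: {(1 - a) * HOURS_PER_YEAR:.2f} hours down per year")
    print(f"redundant pair:    {(1 - pair) * HOURS_PER_YEAR * 60:.3f} minutes down per year")
```

The point is the squared term. If each unit is unavailable a small fraction of the time, both being unavailable at once is far rarer.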


Who needs redundancy?

High availability is needed whenever a loss of power would significantly impact a facility's production, efficiency, or life safety. For example, in most hospitals, NFPA 110 requires that when normal power is lost, a generator must supply power to loads within 10 seconds. Suppose a single PLC is responsible for ensuring that power delivery. A failure of that one controller would then jeopardize the hospital's ability to restore power within the required time frame.

We also see redundancy warranted in data centers, where a compromised computer system could impact financial transactions worth billions of dollars. In addition, a control failure in certain manufacturing processes could lead to waste, lost production, or safety incidents without redundant systems.


What are the levels of redundancy?

There are three levels of redundancy:


Cold redundancy

In this situation, you likely have spare parts on a shelf that you can put into service in case of failure. Spare equipment probably isn’t pre-programmed and requires human intervention to install, but it beats not having a recovery plan! Depending on your situation, process, and facility, this type of redundancy may work for you.


Warm redundancy

The spare equipment you have is pre-programmed and ready to put into service. For example, suppose a primary computer runs your SCADA system. You might also keep a secondary computer in the same room, always loaded with an up-to-date copy of the configuration. If the primary fails, you turn on the secondary computer, move the network and video cables, and you're back up and running. Warm redundancy is faster than cold, but still not as quick as automated system redundancy.


Hot redundancy

With hot redundancy, everything is automatic. If the primary controller fails, the secondary controller takes over seamlessly, and equipment and processes should not notice the transition. The primary and backup controllers run essentially in lockstep, each monitoring the other's status. Naturally, a hot redundant solution will be more expensive than a system using warm or cold redundancy.
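Vendor redundancy modules handle this internally, typically by also synchronizing program state and I/O between controllers. The hypothetical Python sketch below models only the monitoring idea: a standby that takes over when the primary's heartbeat goes quiet for longer than a timeout. The class name, timeout, and timestamps are illustrative assumptions. This is not any manufacturer's implementation, and it ignores fail-back and split-brain handling.

```python
from dataclasses import dataclass


@dataclass
class StandbyController:
    """Hypothetical standby that watches the primary's heartbeat.

    Only the takeover decision is modeled; state and I/O synchronization,
    which real redundant PLC pairs also perform, are left out.
    """

    timeout_s: float               # how long the primary may stay silent
    last_heartbeat_s: float = 0.0  # time of the most recent status message
    active: bool = False           # True once the standby has taken over

    def on_heartbeat(self, now_s: float) -> None:
        # Called whenever a status message arrives from the primary.
        self.last_heartbeat_s = now_s

    def check(self, now_s: float) -> bool:
        # Promote the standby once the primary has been silent too long.
        # Once active, it stays active (no automatic fail-back here).
        if not self.active and now_s - self.last_heartbeat_s > self.timeout_s:
            self.active = True
        return self.active


if __name__ == "__main__":
    standby = StandbyController(timeout_s=0.05)
    standby.on_heartbeat(now_s=1.00)
    print(standby.check(now_s=1.03))  # primary heard from 30 ms ago: stays standby
    print(standby.check(now_s=1.10))  # 100 ms of silence: standby takes over
```

The timeout is the key design trade-off. If it is too short, a brief network hiccup triggers an unnecessary switchover. If it is too long, the process runs uncontrolled for longer before the standby acts.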


Is the cost of redundancy worth it?

All major PLC manufacturers offer high-availability solutions for their high-end and mid-range controllers. In the past five years, hardware costs and configuration time have decreased substantially.

Here’s a great example.

A small data center (under 20,000 sq ft) installed a non-redundant DDC control system to control its chilled water distribution. It recently experienced significant downtime because the system wasn't redundant, and Affinity Energy was asked to research a retrofit of the control system.

In this case, the DDC control system cost about $17,000, while a retrofitted, PLC-based redundant solution runs about $35,000. It was up to the owner to decide whether the extra $18,000 was worth it to avoid the cost of downtime and the potential loss of customers.
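One way to frame the owner's decision is a break-even estimate: how many hours of avoided downtime would cover the $18,000 difference? The sketch below uses the two prices from this example. The hourly downtime costs are hypothetical placeholders, since every facility's figure is different. The estimate deliberately leaves out lost customers, reputation, and maintenance, so it is a starting point rather than a full analysis.

```python
def break_even_hours(extra_cost: float, downtime_cost_per_hour: float) -> float:
    """Hours of avoided downtime at which the extra cost of redundancy pays for itself."""
    if downtime_cost_per_hour <= 0:
        raise ValueError("downtime cost per hour must be positive")
    return extra_cost / downtime_cost_per_hour


if __name__ == "__main__":
    extra_cost = 35_000 - 17_000  # redundant retrofit minus existing DDC system
    # Hypothetical downtime costs in dollars per hour.
    for cost_per_hour in (1_000, 5_000, 20_000):
        hours = break_even_hours(extra_cost, cost_per_hour)
        print(f"${cost_per_hour:,}/hour of downtime: pays back after {hours:.1f} hours")
```

At a hypothetical $5,000 per hour, avoiding 3.6 hours of downtime covers the $18,000 difference.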

I can't emphasize enough how important it is to understand the cost of downtime for your organization, process, or piece of equipment. Like this data center owner, most owners don't understand the cost of downtime until they experience it for the first time. By then, the money lost would have more than paid for a redundant system. And at that point it's too late: upgrading the system costs more than the original system did.


What availability level makes sense?

Understanding the requirements and goals for your facility is a great way to determine which system, and what level of availability or redundancy, you need.

For data center infrastructure, the Uptime Institute already provides an availability classification based on system requirements. The lowest tier, Tier 1, is a basic non-redundant system that is allowed 28.8 hours of downtime per year. The highest tier, Tier 4, is allowed only 26.3 minutes of downtime per year and should be able to run 96 hours (4 days) on power protection.
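The tier downtime figures follow directly from the availability percentages commonly cited for each tier: 99.671% for Tier 1 and 99.995% for Tier 4. Here is a short Python sketch of the conversion, assuming a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored

# Commonly cited availability figures for the lowest and highest tiers.
TIER_AVAILABILITY = {"Tier 1": 0.99671, "Tier 4": 0.99995}


def annual_downtime_hours(availability: float) -> float:
    """Expected downtime per year for a given fractional availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for tier, availability in TIER_AVAILABILITY.items():
        hours = annual_downtime_hours(availability)
        print(f"{tier}: {availability:.3%} -> {hours:.1f} h ({hours * 60:.1f} min) per year")
```

The same function works in reverse as a sanity check. If your process can tolerate only a few minutes of downtime per year, it needs an availability with several nines after the decimal point.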


We know redundant systems

Have your engineering firm engage a control system integrator (or engage one yourself) to help select and specify the right redundant systems for your facility. We specialize in mission-critical power and have designed many redundant and high-availability control systems, so contact us with any questions about redundancy.


Allan D. Evora is a leading expert in control systems integration and president of Affinity Energy, with over 20 years of industry experience in every capacity of the power automation project life cycle. After working at the Boeing Company and General Electric, Allan founded Affinity Energy in 2002. Allan is an alumnus of Syracuse University with a B.S. in Aerospace Engineering, a graduate of the NC State Energy Management program, and a Certified Measurement & Verification Professional (CMVP).

Throughout his career, Allan has demonstrated a passion for providing solutions. In 1990, he developed FIRST (Fast InfraRed Signature Technique), a preliminary design software tool used to rapidly assess rotorcraft infrared signatures. In 2008, Allan was the driving force behind Affinity Energy's Utilitrend, commercially available, cloud-based software for trending, tracking, and reporting on utility resources.

Allan has been instrumental in large-scale integration projects for utilities, universities, airports, financial institutions, medical campus utility plants, and manufacturing corporations, and has worked with SCADA systems since the early ’90s. A passion for data acquisition, specialty networks, and custom software drives him to incorporate openness, simplicity, and integrity into every design he works on.