Underspecify redundancy and you might end up paying for it later.

By: Allan Evora

We see a lot of misinformation and underspecification when it comes to high availability in mission critical control systems. Sometimes this is due to the misconception that high availability is expensive or overly complicated. We’re going to debunk that myth, and talk about other things owners should know about high availability and redundancy.

What is redundancy?

When designing a high availability system, we strive to eliminate single points of failure. If a single PLC controls a process, there should be a mechanism that keeps the process and equipment running even if that primary controller fails. This is redundancy.
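
As a rough illustration of why removing a single point of failure helps, the standard parallel-availability formula can be applied to a redundant controller pair. This sketch assumes the two controllers fail independently and that the switchover itself never fails, which real installations only approximate. The 99% figure is purely illustrative.

```latex
A_{\text{sys}} = 1 - (1 - A_1)(1 - A_2),
\qquad A_1 = A_2 = 0.99 \;\Rightarrow\; A_{\text{sys}} = 1 - (0.01)^2 = 0.9999
```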

Who needs redundancy?

High availability is needed whenever downtime, such as a loss of power, would significantly impact a facility's production, efficiency, or life safety. For example, in most hospitals NFPA 110 requires that when normal power is lost, a generator must supply power to the loads within 10 seconds. If a single PLC is solely responsible for ensuring that power delivery, its failure would jeopardize the hospital's ability to restore power within the required time frame.

We also see redundancy warranted in data centers, where a compromised computer system could impact financial transactions worth billions of dollars. In addition, certain manufacturing processes would suffer waste, lost productivity, or safety consequences without redundant systems.

What are the levels of redundancy?

There are three levels of redundancy:

Cold redundancy

In this situation, you likely have spare parts on a shelf that you can put into service in case of failure. Spare equipment probably isn’t pre-programmed and requires human intervention to install, but it beats not having a recovery plan! Depending on your situation, process, and facility, this type of redundancy may work for you.

Warm redundancy

The spare equipment is pre-programmed and ready to put into service. For example, if a primary computer runs your SCADA system, you might also keep a secondary computer in the same room that is always kept up to date with the current configuration. If the primary fails, you turn on the secondary computer, move the network and video cables, and you're back up and running. Warm redundancy is faster than cold redundancy, but still not as quick as automated (hot) redundancy.

Hot redundancy

With hot redundancy, everything is automatic. If the primary controller fails, the secondary controller takes over seamlessly, and the transition goes unnoticed by equipment and processes. The primary and backup controllers run essentially in lockstep, each monitoring the other's status. Naturally, a hot redundant solution costs more than one using warm or cold redundancy.
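
The mutual status monitoring described above is commonly built on a heartbeat and a watchdog timeout. The sketch below is a hypothetical Python simulation, not any PLC vendor's redundancy mechanism. The `Controller` and `RedundantPair` classes, the 10 ms scan period, and the 50 ms timeout are all assumptions chosen for illustration. It shows only failure detection and switchover. A real hot-standby pair must also synchronize process data and I/O state so the handover is bumpless.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    """Hypothetical controller that publishes a heartbeat timestamp while healthy."""
    name: str
    healthy: bool = True
    last_heartbeat_ms: int = 0

    def beat(self, now_ms: int) -> None:
        # A healthy controller refreshes its heartbeat on every scan;
        # a failed one stops updating it.
        if self.healthy:
            self.last_heartbeat_ms = now_ms


class RedundantPair:
    """Primary/secondary pair: the secondary takes over when the primary goes silent."""

    def __init__(self, primary: Controller, secondary: Controller, timeout_ms: int) -> None:
        self.primary = primary
        self.secondary = secondary
        self.timeout_ms = timeout_ms  # hypothetical watchdog timeout
        self.active = primary

    def scan(self, now_ms: int) -> str:
        # Each controller publishes its heartbeat (if it is still healthy).
        self.primary.beat(now_ms)
        self.secondary.beat(now_ms)
        # Fail over once the primary's heartbeat is older than the timeout.
        silent_for = now_ms - self.primary.last_heartbeat_ms
        if self.active is self.primary and silent_for > self.timeout_ms:
            self.active = self.secondary
        return self.active.name


# Simulate 10 scans at a hypothetical 10 ms scan period; the primary fails at 30 ms.
pair = RedundantPair(Controller("PLC-A"), Controller("PLC-B"), timeout_ms=50)
history = []
for tick in range(10):
    now_ms = tick * 10
    if now_ms == 30:
        pair.primary.healthy = False
    history.append(pair.scan(now_ms))

print(history)
```

In this run the primary's last heartbeat is at 20 ms. The secondary takes over at the first scan where that heartbeat is more than 50 ms old, which is the 80 ms scan. To keep the example short, the sketch never fails back to the primary.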

Is the cost of redundancy worth it?

All major PLC manufacturers offer high availability solutions for their high-end and mid-range controllers. In the past five years, both hardware costs and configuration time have decreased substantially.

Here’s a great example.

A small (<20,000 sq ft) data center had installed a non-redundant DDC (direct digital control) system to control its chilled water distribution. After the facility experienced significant downtime because the system wasn't redundant, Affinity Energy was asked to research a retrofit of its control system.

In this case, the DDC control system cost ~$17,000, while a retrofitted PLC-based redundant solution runs ~$35,000. It was up to the owner to decide whether the extra $18,000 was worth it to avoid the cost of downtime and the potential loss of customers.
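
One way to frame the owner's decision is a simple break-even calculation. Divide the redundancy premium by the cost of one hour of downtime to see how many hours of avoided downtime would pay for the upgrade. The $17,000 and $35,000 figures are the approximate costs above. The hourly downtime costs in the loop are hypothetical placeholders, since the article does not give this facility's figure. The sketch also ignores maintenance costs, lost customers, and the time value of money.

```python
def break_even_hours(premium_usd: float, downtime_cost_per_hour_usd: float) -> float:
    """Hours of avoided downtime at which the redundancy premium pays for itself."""
    if downtime_cost_per_hour_usd <= 0:
        raise ValueError("downtime cost per hour must be positive")
    return premium_usd / downtime_cost_per_hour_usd


DDC_COST_USD = 17_000            # approximate non-redundant DDC system cost
REDUNDANT_PLC_COST_USD = 35_000  # approximate redundant PLC retrofit cost
premium = REDUNDANT_PLC_COST_USD - DDC_COST_USD  # 18,000

# Hypothetical hourly downtime costs; replace with your own estimate.
for cost_per_hour in (1_000, 5_000, 20_000):
    hours = break_even_hours(premium, cost_per_hour)
    print(f"${cost_per_hour:,}/h of downtime -> break-even after {hours:.1f} h")
```

For example, at a hypothetical $5,000 per hour of downtime, 3.6 hours of avoided downtime would cover the $18,000 difference.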

I can't emphasize enough how important it is to understand the cost of downtime for your organization, process, or piece of equipment. Like this data center owner, most owners don't understand the cost of downtime until they experience it for the first time. By then, the money lost would have more than paid for a redundant system, and it's too late: they end up paying more to retrofit redundancy than it would have cost to include it in the original system.

What availability level makes sense?

Understanding the requirements and goals for your facility is the best way to determine which system, and which level of availability or redundancy, you need.

For data center infrastructure, the Uptime Institute already provides an availability classification based on system requirements. The lowest tier, Tier 1, is a basic non-redundant system that is allowed 28.8 hours of downtime per year. The highest tier, Tier 4, is allowed only 26.3 minutes of downtime per year and should have 96 hours (4 days) of power outage protection.
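
The downtime allowances are easier to compare when expressed as availability percentages. The sketch below converts an availability figure into annual downtime over a 365-day (8,760-hour) year. The inputs 99.671% and 99.995% are the availability percentages commonly cited for Tier 1 and Tier 4. They are not stated in this article, and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def annual_downtime_hours(availability_pct: float) -> float:
    """Allowed downtime per year, in hours, for a given availability percentage."""
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


tier1_hours = annual_downtime_hours(99.671)
tier4_minutes = annual_downtime_hours(99.995) * 60

print(f"Tier 1: {tier1_hours:.1f} hours/year")      # Tier 1: 28.8 hours/year
print(f"Tier 4: {tier4_minutes:.1f} minutes/year")  # Tier 4: 26.3 minutes/year
```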

We know redundant systems

Have your engineering firm engage a control system integrator (or engage one yourself) to help select and specify the right redundant system for your facility. We specialize in mission critical power and have designed many redundant and high availability control systems, so contact us with any questions about redundancy.