Don’t Base Solar Site Performance Off Misleading Facts.
Normalize Data Instead.

Comparing Apples to Apples

By: Adam Baker

Compare apples to apples when analyzing solar site performance data.

Many solar plant owners judge plant performance using misleading data. Many SCADA integrators simply set up systems to display raw values. Raw values are sometimes desirable, but on their own they can be extremely deceptive. Comparative values are what matter most to plant operators.


What is Data Normalization?

Data normalization is simply the act of taking raw data and adjusting it based on other values or percentages onsite. Normalizing data helps present a better overall picture of what’s happening at your solar site, and makes it easy to see the magnitude of facility problems like underperforming modules, bad connections, and blown fuses.

Some SCADA integrators believe that their job is to deliver data, and that once the data is delivered, interpreting it is the owner’s job. If the SCADA developer hasn't spent time in the field, or doesn't have a deep understanding of the hierarchy of energy production at a solar site (cell, module, string, harness, combiner, inverter, transformer, and GSU if transmission connected), then the only context they have to go by is the data map of devices. From this perspective, it's easy to assume all similar devices are created equal.


What Solar Data Should be Normalized?

The most important solar data to normalize is combiner box current. Other variables that can be normalized include module bin class (if a site ends up with a mix based on supplier availability), and DC watts behind inverters.

Combiner box output
Most of the sites I worked on early in my career were large and rectangular. Arrays were identical for most or all inverters around the sites. With the variability often found in smaller sites, array layouts may be unique for every inverter. The result is a set of combiner boxes with different quantities of harnesses coming into them.

Looking at combiner box current as a raw value tells you only that you have 330 A from CBX1 and 220 A from CBX2. An operator could conclude that there is a problem at CBX2. In fact, CBX1 has 16 harnesses feeding it while CBX2 has 10, so it is CBX1 that has an issue, even though the raw data might trigger a work order to investigate CBX2.

In this scenario, the raw value is only useful if the operator knows what the current should be. A normalized value of “current per harness” divides total combiner DC current by the number of harnesses feeding that combiner. Using this metric, the scenario above identifies CBX1 as underperforming by about 6% (roughly 20.6 A per harness, versus 22 A per harness at CBX2).
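The arithmetic behind that figure can be sketched in a few lines of Python. This is only an illustration using the two combiners from the example; the function and variable names are made up for this sketch and do not come from any particular SCADA platform.

```python
def current_per_harness(total_current_a, harness_count):
    """Return combiner DC current divided by the number of harnesses feeding it."""
    if harness_count <= 0:
        raise ValueError("harness_count must be positive")
    return total_current_a / harness_count


# Values from the example: name -> (total DC amps, number of harnesses)
combiners = {"CBX1": (330.0, 16), "CBX2": (220.0, 10)}

per_harness = {
    name: current_per_harness(amps, count)
    for name, (amps, count) in combiners.items()
}

# Compare every combiner against the best per-harness value in the group
best = max(per_harness.values())
shortfall_pct = {
    name: 100.0 * (best - value) / best for name, value in per_harness.items()
}

for name in combiners:
    print(f"{name}: {per_harness[name]:.3f} A/harness, "
          f"{shortfall_pct[name]:.2f}% below best")
```

Comparing against the best performer in the group is one possible reference; an expected value from the design model would work just as well if it is available.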

This is the lowest-hanging fruit for solar data normalization.

Array DC normalization
A more difficult metric to normalize is inverter energy when the amount of DC behind each inverter varies across a site. If the first 1MW inverter has 1.2MW of DC behind it and inverter 2 has 1.3MW of DC, both inverters will peak at 1MW of output. But the smaller array will start clipping later in the day and stop clipping earlier than the array with more DC, so the energy from the two inverters will differ slightly even though both reach max power. Normalizing energy is the key here. It’s a difficult analysis for someone who isn’t highly knowledgeable in how PV solar works, and too complex for this post.
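The clipping effect itself can be illustrated with a rough sketch, without attempting the full normalization. The half-sine power curve and the 12 hours of daylight below are assumptions chosen purely for illustration; real arrays also see temperature effects, losses, and weather that this ignores.

```python
import math


def daily_energy_mwh(dc_mw, ac_limit_mw, steps=1440, daylight_hours=12.0):
    """Integrate clipped inverter output over an idealized half-sine day.

    The half-sine shape and daylight length are illustrative assumptions.
    """
    dt_h = daylight_hours / steps
    energy = 0.0
    for i in range(steps):
        # Midpoint of each time step, as a fraction of the daylight period
        frac = (i + 0.5) / steps
        dc_power = dc_mw * math.sin(math.pi * frac)  # idealized available DC
        energy += min(dc_power, ac_limit_mw) * dt_h  # inverter clips at AC limit
    return energy


e1 = daily_energy_mwh(dc_mw=1.2, ac_limit_mw=1.0)  # inverter 1
e2 = daily_energy_mwh(dc_mw=1.3, ac_limit_mw=1.0)  # inverter 2

print(f"Inverter 1: {e1:.2f} MWh, {e1 / 1.2:.2f} MWh per MW DC")
print(f"Inverter 2: {e2:.2f} MWh, {e2 / 1.3:.2f} MWh per MW DC")
```

Under these assumptions the inverter with more DC produces more energy, but less energy per MW of DC, because it spends more of the day clipped. That is one reason a simple divide-by-DC-capacity normalization does not make the two inverters directly comparable.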


How to Normalize Solar Data For Best Performance

The easiest way to normalize solar data is by looking at percentages rather than values, and the best way to visually understand this is by looking at a bar chart.
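As a minimal sketch of the percentage approach, the snippet below scales each combiner's per-harness current to a percentage of the best performer and prints a crude text bar chart. The 99% flag threshold and the helper names are hypothetical, picked only for this example.

```python
def percent_of_best(values):
    """Scale each value to a percentage of the highest value in the group."""
    best = max(values.values())
    if best <= 0:
        raise ValueError("at least one value must be positive")
    return {name: 100.0 * value / best for name, value in values.items()}


def text_bar(pct, width=40):
    """Render a percentage (0 to 100) as a fixed-width bar of '#' characters."""
    filled = round(width * pct / 100.0)
    return "#" * filled + "." * (width - filled)


# Per-harness current for the two combiners in the example (amps / harnesses)
amps_per_harness = {"CBX1": 330.0 / 16, "CBX2": 220.0 / 10}
pct = percent_of_best(amps_per_harness)

flag_below_pct = 99.0  # hypothetical threshold for drawing attention to a bar

for name, p in pct.items():
    marker = " <-- check" if p < flag_below_pct else ""
    print(f"{name} {text_bar(p)} {p:6.2f}%{marker}")
```

With every bar on the same 0 to 100% scale, the operator no longer needs to remember what the raw amps should be for each combiner.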

The graphs below represent the exact same data. One chart shows the values (amps) of each combiner box, and the other shows the performance percentage of the same combiners using the example described above.

Inverter Current RAW

When looking at the raw values, the user has to know how to interpret those values to determine if something is working well or badly. (Is there something wrong with CBX2? Why isn’t it performing as well as the other combiners in the same area? And why is CBX1 so high?)


Inverter Current Normalized

However, when you look at the output scaled by strings per CBX, CBX2 is performing at its maximum, and it's actually CBX1 that's underperforming. In this representation, even very small differences are apparent. CBX4 is underperforming by 2%, which represents the effect of one bad cell in one module in the feeding strings.


Case Study: How Normalized Data Can Save You

The persistent problem with raw solar data is knowing what the values should be. Operators monitoring more than one site will be hard pressed to know what every value should be, and reviewing every data point on every screen on a regular basis is not realistic.

In previous implementations of very large solar sites (many hundreds of inverters), I have implemented up to 50 bar charts side by side because of how easy it becomes to identify the one that's different from the rest. I could imagine this expanding to even more in some cases.

Comparative data is not applicable for every application. However, PV solar happens to have a very large amount of data available from array to array, with very little variability from device to device (outside the number of strings as inputs to combiners). Comparing inverter to inverter can therefore seem easy, but capturing all the nuances of DC/AC ratio when the ratios are not identical requires knowledgeable engineering to implement a system that makes performance easy to quantify.

Current per string is the easiest, and perhaps the most valuable, opportunity to normalize.

I am reminded that a small amount of energy loss today is multiplied by days, weeks, and years if not identified and corrected. Even at pennies per kWh, when considering one bad cell in one module, the math over the life of the plant can translate to $10,000 in cost. Being able to identify these small problems early is critical to maximizing the long term revenue of the solar plant.



Adam Baker is Senior Sales Executive at Affinity Energy with responsibility for providing subject matter expertise in utility-scale solar plant controls, instrumentation, and data acquisition. With 23 years of experience in automation and control, Adam’s previous companies include Rockwell Automation (Allen-Bradley), First Solar, DEPCOM Power, and GE Fanuc Automation.

Adam was instrumental in the development and deployment of three of the largest PV solar power plants in the United States, including 550 MW Topaz Solar in California, 290 MW Agua Caliente Solar in Arizona, and 550 MW Desert Sunlight in the Mojave Desert.

After a 6-year stint in controls design and architecture for the PV solar market, Adam joined Affinity Energy in 2016 and returned to sales leadership, where he has spent most of his career. Adam has a B.S. in Electrical Engineering from the University of Massachusetts, and has been active in environmental and good food movements for several years.