
Contributed by Amit Sinha, Lead Artificial Intelligence and Machine Learning Engineer at TRC
AI runs on data. It’s the gas in the engine of AI initiatives. But there’s something that every utility executive needs to know about that fuel: It must be the right data, or your AI initiative will be at risk of setbacks, delays or even failure.
Not all data is the same. In fact, most data is not good enough for AI, and using it can undermine the success of your AI initiatives – just like putting kerosene in your car or diesel in a rocket engine. For AI initiatives to succeed, they need to be powered by high-quality data that is timely, accurate and complete.
The challenge, however, is that the data utilities have in their IT and OT systems typically falls short of being AI-ready. And that is a major obstacle to achieving the objectives of utilities’ AI projects.
The data problem is not just an abstract concern. There is clear evidence that AI initiatives across all industries are struggling because of data issues. A recent Forbes article, titled “Why Artificial Intelligence Hype Isn’t Living up to Expectations,” puts a spotlight on statistics revealing that fewer than 7% of companies are successfully using AI. And a white paper by Rackspace reports that the leading cause of those failures is data quality.
For AI initiatives to deliver, organizations need a data strategy that ensures that data is truly AI-ready. And the best data strategy is one that builds processes that achieve this not just at a single point in time before a given AI project, but continuously over time.
Is your data accessible?
This is a critical question for utilities because so much of their data is siloed within multiple IT and OT systems across their operations. This is particularly true for location-based data, which lives in many systems across an organization. Because these systems are often disconnected from one another, data likely exists in locations that are not accessible for AI initiatives. Breaking down those silos should be a focus of a utility’s data strategy.
But there is an important corollary: utilities need to break down not only the walls around where data exists but also the barriers to accessing it when it is needed. Legacy IT and OT systems typically perform data aggregation, reporting and analysis on a periodic basis. Rather than being reported continuously, data often sits in limbo for 12 hours, an entire day or even a full week awaiting batch reporting and processing. Not only does this lack of real-time data create operational headaches for utilities; it is also a major issue for AI initiatives.
DERMS (distributed energy resource management system) data is a great example of why it is so important to make data more accessible and available when it is needed. DERMS data comes from a variety of sources, ranging from renewable energy assets to conventional generation. This data is often scattered across multiple systems, making it difficult to aggregate in a timely manner. The same is often true for data about energy usage by homes, factories and other customers. This metering data typically resides in systems that are not readily accessible. Removing the barriers in IT and OT systems so this data can be reached when it is needed should be a priority for utilities, eliminating inefficiencies that hamper energy distribution at the grid level.
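To make the timeliness problem concrete, here is a minimal sketch of a freshness check across data feeds. The source names and the 15-minute latency threshold are illustrative assumptions, not a real utility configuration; in practice the timestamps would come from DERMS, SCADA or AMI systems rather than being hard-coded.

```python
# Sketch: flag data sources whose latest reading is too stale for
# near-real-time use. Source names and threshold are hypothetical.
from datetime import datetime, timedelta, timezone

import pandas as pd

FRESHNESS_LIMIT = timedelta(minutes=15)  # assumed acceptable latency

# Latest-reading timestamps per source (normally queried from each system).
now = datetime.now(timezone.utc)
latest_readings = pd.DataFrame(
    {
        "source": ["solar_farm_a", "feeder_meter_12", "legacy_billing_extract"],
        "last_seen_utc": [
            now - timedelta(minutes=5),
            now - timedelta(minutes=40),
            now - timedelta(days=1),
        ],
    }
)

latest_readings["age"] = now - latest_readings["last_seen_utc"]
latest_readings["stale"] = latest_readings["age"] > FRESHNESS_LIMIT
print(latest_readings[["source", "age", "stale"]])
```

Even a simple report like this makes visible which feeds are stuck in batch cycles and therefore unusable for AI models that need current conditions.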
Does the data exist?
This is another critical question for utilities because the answer may be surprising. Yes, the investments that utilities have made in their connected infrastructure have successfully generated a wealth of data that can be analyzed by business systems and AI. But there are blind spots where little or no data exists. And those blind spots are often in areas where AI initiatives need data in order to be successful.
One example is substations, which are a common “black box” in a utility’s digital model of its infrastructure – a portion of the infrastructure for which there is no detailed information about what it contains, how it is performing and so on. The lack of detailed, digitized information about substations and other black boxes like them creates massive inefficiencies when work crews do not know exactly what they will find until they are on-site. It is also a major issue for AI initiatives, which are then forced to analyze datasets with significant holes in them.
To eliminate these blind spots, utilities should have a data strategy that identifies these black boxes, shines a light on them and begins gathering information about them. There are a number of proven techniques, including computer vision tools that work crews can use on-site to capture missing information. Another technique is to use AI modeling to fill in gaps by extrapolating from existing data, as sketched below. Using AI to make data more AI-ready may feel a bit ironic, but some utilities are already proving its efficacy, providing a blueprint for the rest of the industry.
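As a simple illustration of model-based gap filling, the sketch below estimates a missing asset attribute from related, known attributes. The column names and values are invented for the example and do not reflect any particular utility’s schema; a production approach would validate imputed values against field surveys.

```python
# Sketch: fill gaps in an asset attribute (e.g., peak load) by modeling it
# from related known attributes. Data and column names are illustrative.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

assets = pd.DataFrame(
    {
        "connected_customers": [120, 450, 300, 80, 510],
        "feeder_length_km": [4.2, 11.5, 8.1, 3.0, 12.9],
        "peak_load_mw": [1.1, np.nan, 2.4, np.nan, 4.0],  # gaps to fill
    }
)

# Iteratively model each column with missing values from the others.
imputer = IterativeImputer(random_state=0)
filled = pd.DataFrame(imputer.fit_transform(assets), columns=assets.columns)
print(filled)
```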
Is your data accurate and rich enough?
Even if data exists and is accessible, it still may not be AI-ready, because not all data is high-quality. It can fall short on accuracy, it can lack the depth it needs, or both. This is often the case with utility data, which can vary widely in quality.
For example, one data point just captured by a field worker using a mobile tablet with 3D scanning might check every box for accuracy and precision. But another data point may be from a legacy system based on handwritten notes on paper maps and reports that are years or decades old. That second data point may fall far short of the threshold for accuracy and richness, which makes it a liability for AI initiatives.
For that reason, utilities need a data strategy that assesses quality and augments accuracy and richness. With the latest technologies and techniques, you can rapidly improve the quality and richness of both existing data and new data on an ongoing basis. One of the ways utilities can do this at scale is through AI-driven data conflation, which automates the process of identifying and enhancing data that needs improvement.
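The sketch below shows one simplified, rule-based version of this idea: when the same asset appears in more than one system, prefer attribute values from the more trusted source and fall back to older records only where the trusted record is silent. The source names, trust rankings and fields are illustrative assumptions; AI-driven conflation would layer learned matching and scoring on top of this kind of logic.

```python
# Sketch: conflate duplicate asset records, preferring the more trusted source.
# Source names, trust rankings and fields are hypothetical.
import pandas as pd

SOURCE_TRUST = {"field_scan_2024": 2, "legacy_paper_map": 1}  # higher = better

records = pd.DataFrame(
    {
        "asset_id": ["pole_001", "pole_001", "xfmr_017", "xfmr_017"],
        "source": ["legacy_paper_map", "field_scan_2024",
                   "legacy_paper_map", "field_scan_2024"],
        "install_year": [1998, 2001, 2005, 2005],
        "material": ["wood", "wood", None, "pad-mount"],
    }
)

records["trust"] = records["source"].map(SOURCE_TRUST)
conflated = (
    records.sort_values("trust", ascending=False)
    .groupby("asset_id", as_index=False)
    .first()  # per asset, take the first non-null value, preferring trusted rows
)
print(conflated[["asset_id", "source", "install_year", "material"]])
```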
Is your data compliant?
After you make sure your data exists, is accessible, and is accurate and rich enough, there’s another challenge that your data strategy needs to address: compliance. Regulations like GDPR in Europe and state-level regulations in the U.S., such as California’s privacy law, make it critical to track ownership of data, which determines whether it can be used in AI initiatives. Large language models that mistakenly use private data may need to be retrained or even destroyed – leading to costly setbacks in AI initiatives.
To mitigate the risk of using data that should not be fed into AI models, utilities need a data strategy that makes it possible to not only track the provenance of data but also manage that data over time. For example, a piece of data may be usable at first, but what if a customer later designates it as private?
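One lightweight way to operationalize this is to carry provenance and consent flags alongside every record and filter on them before any training run. The field names and the opt-out flag in the sketch below are illustrative assumptions; real implementations would tie the flag to the utility’s consent-management and customer systems.

```python
# Sketch: provenance-aware filtering before model training.
# Field names and the ai_training_ok flag are hypothetical.
from dataclasses import dataclass
from datetime import date


@dataclass
class MeterRecord:
    customer_id: str
    kwh: float
    collected_on: date
    source_system: str
    ai_training_ok: bool  # flipped to False if the customer later opts out


records = [
    MeterRecord("c-101", 32.4, date(2024, 5, 1), "ami_headend", True),
    MeterRecord("c-102", 18.9, date(2024, 5, 1), "ami_headend", False),  # opted out
]

training_set = [r for r in records if r.ai_training_ok]
print(f"{len(training_set)} of {len(records)} records cleared for training")
```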
Europe’s data governance landscape, which gave us GDPR, also offers a clear standard for data management that should be incorporated into utilities’ data strategies: the FAIR principles. FAIR stands for findability, accessibility, interoperability and reusability – a set of criteria that aligns with what I have discussed in this article.
It is a blueprint for data stewardship that is relevant not only to compliance with GDPR and other regulations but also to making data AI-ready. Using FAIR as a map for achieving compliance puts utilities on the path to ensuring that their data is also AI-ready.
For those who want to take a deeper dive into what makes data AI-ready, the blog post “What is AI-Ready Data?” by Adam Roche on Snowplow’s website is an excellent read that provides more technical detail.
How long is the path to AI-ready data?
This may feel like a long list of steps to ensure that data is AI-ready, but utility executives should not fear the length of time it will take to put a data strategy into action. Based on our experience supporting companies, implementing a data strategy built on these principles can be completed in just a few months when cloud vendors that specialize in big data management are used. More complex projects in heavily regulated industries may require more time.
About the author
Amit Sinha is the Lead Artificial Intelligence and Machine Learning Engineer at TRC, a global leader providing environmentally focused and digitally powered solutions tailored to meet the unique challenges of the energy transition. In this role, he develops innovative applications of AI and ML technologies for gas, electric and water utilities as well as a range of other client companies. His entire career has been devoted to extracting deep insights from data to solve business challenges, including his prior roles at Esri and DoorDash. He has a BS/MS in Engineering and a Ph.D. in optimization, simulation and automatic differentiation in Artificial Intelligence and Machine Learning from the Indian Institute of Technology Bombay.