The solar manufacturing edge: Software design-for-test

Software is not a major direct cost in solar manufacturing, but it can have major effects on ramp time. Wayne Lobb and Karl Aeder from Foliage Software Systems explain what’s required to enable flexible, reliable control/automation software on the factory floor.

by Wayne Lobb, Karl Aeder, Foliage

Solar manufacturing processes today commonly start as research and ramp up slowly to full production. Though software is not a major direct cost, it can have major effects on ramp time. Delays from enhancing or fixing control and automation software can stop floor progress for days at a time. A production line cannot tolerate unnecessary delays for software modifications. For these reasons, software for solar manufacturing must be agile in the purest sense: quick to evolve, yet reliable as soon as it arrives on the factory floor.

In our experience, the single most important factor in software agility is the ability to functionally test new or changed software, on demand, immediately and thoroughly, separately from hardware.

Untested or lightly tested software will break unnecessarily when installed on hardware, no matter how skilled the developers are. Manual testing on hardware is feasible early on, but it takes too long as the line approaches full function. Delays waiting for manual testing to finish are as painful as delays due to feature-poor or buggy software. But shortcutting testing only leads to overlooked bugs that disrupt progress unpredictably.

The solution is to design and build fast, comprehensive automated tests from the start, applying software design-for-test techniques.

Design-for-test

Hardware design-for-test (DFT) has long been an important strategy in semiconductor circuit manufacturing. Two key concepts are state observability and controllability: how to know exactly what is happening inside a complex system, and how to bring it to a particular state efficiently and repeatably. These concepts also apply to software that controls equipment or automates materials handling. Some hardware DFT techniques have parallels in software: building special test circuitry (analogously, software test code), reducing state-space size by decomposing logic into units (modularizing), and incorporating diagnostics (tracing, logging).

Competitors who employ automated software testing gain significant edges in time-to-market and production reliability. In today’s economic climate, you cannot afford not to take this approach. Here are proven techniques that are inexpensive when adopted from the start and that pay back dramatically during the ramp to production.

Enable simulation of everything through interfaces

For fast turnarounds of software changes, we recommend architecting, designing, and implementing simulation of every aspect of the system that is not the software itself.

In particular, simulate factory floor operators, process engineers, and maintenance technicians. This might sound strange, but it’s crucial: you must design your software so that human interactions occur only through application programming interfaces (APIs). Further, no automation or control logic, such as logic related to shutdown or automated recovery, should execute outside the APIs, where the humans are (Figures 1 and 2).

 

Figure 1. Equipment software configuration: production.


You don’t need to simulate everything that a human could do, only the passing of new or changed information between humans and the rest of the software. For example, software should generate alarm information that carries no inherent assumption that an actual human will be involved. When your software posts a test alarm, a user-interaction simulator easily emulates operator acknowledgement or timeout.
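
As a minimal sketch of this idea, the Java example below keeps operator interaction behind an API; the names here (OperatorInterface, SimulatedOperator) are illustrative assumptions rather than an actual Foliage design. The core software posts an alarm through the interface, and in test the simulator supplies the acknowledgement or timeout that an operator would otherwise provide.

```java
import java.util.concurrent.CompletableFuture;

// Core code raises alarms only through this API and never assumes that a
// human is on the other side of it.
interface OperatorInterface {
    // Completes with true when the alarm is acknowledged, false on timeout.
    CompletableFuture<Boolean> postAlarm(String alarmId, String text);
}

// In production, an adapter behind this interface would drive the real GUI.
// In automated tests, a simulator stands in for the operator:
class SimulatedOperator implements OperatorInterface {
    private final long ackDelayMs;      // how long the "operator" takes to respond
    private final boolean acknowledge;  // false simulates an operator timeout

    SimulatedOperator(long ackDelayMs, boolean acknowledge) {
        this.ackDelayMs = ackDelayMs;
        this.acknowledge = acknowledge;
    }

    @Override
    public CompletableFuture<Boolean> postAlarm(String alarmId, String text) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(ackDelayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return acknowledge;   // true = acknowledged, false = no response
        });
    }
}
```

The alarm-handling logic inside the API is identical whether a GUI adapter or the simulator is plugged in behind it.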

Similarly, emulate equipment interactions with host systems through APIs. You don’t need to simulate the entire host, only aspects that apply directly to your software. Further, simulate bar-code readers, light-tower states, material handling subsystems, and all other hardware-software subsystems. Always start small and inexpensively with a simple API, and grow APIs and simulations together as needed.
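
The same pattern applies to hardware subsystems. The sketch below, again with illustrative names (BarcodeReader, SimulatedBarcodeReader), shows a deliberately small API for a bar-code reader and a simulator that can return scripted labels or inject faults that are hard to reproduce on real hardware.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// The core software sees only this API, whether the reader is real or simulated.
interface BarcodeReader {
    String readLabel() throws ReaderException;
}

class ReaderException extends Exception {
    ReaderException(String message) { super(message); }
}

// Simulator: scripted labels plus on-demand fault injection.
class SimulatedBarcodeReader implements BarcodeReader {
    private final Queue<String> scriptedLabels = new ArrayDeque<>();
    private boolean failNextRead = false;

    void enqueueLabel(String label) { scriptedLabels.add(label); }
    void injectReadFailure()        { failNextRead = true; }

    @Override
    public String readLabel() throws ReaderException {
        if (failNextRead) {
            failNextRead = false;
            throw new ReaderException("simulated misread");
        }
        if (scriptedLabels.isEmpty()) {
            throw new ReaderException("simulated timeout: no label present");
        }
        return scriptedLabels.poll();
    }
}
```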

Each simulated entity needs to be quickly settable to known states to support testing and debugging. For example, if your implementation uses a relational database, then you can make the network location of the database externally configurable to support multiple pre-populated instances of the database. This way, your tests can start or restart immediately from a known database state.
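
Assuming a JDBC-based implementation, one minimal way to do this is to read the database location from an external property, as sketched below; the property name, driver, and URLs are illustrative only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// The database location comes from configuration, so tests can point the
// software at any pre-populated instance and restart from a known state.
final class DatabaseConfig {
    static Connection open() throws SQLException {
        String url = System.getProperty(
                "equipment.db.url",                        // e.g. -Dequipment.db.url=...
                "jdbc:postgresql://prod-db/equipment");    // production default
        return DriverManager.getConnection(url);
    }
}
```

A test run can then pass -Dequipment.db.url pointing at a pre-populated test instance, so every test starts or restarts from the same known database state.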

Advantages of software-only testing through simulation are test speed and thoroughness. As illustrated in Figs. 1 and 2, simulation can insert problems that are difficult or impossible to replicate on actual equipment. Automated tests with simulators can be used to artificially scale up demands on software, revealing performance bottlenecks and memory-usage problems before deployment. Not all problems can be replicated or found through simulation, of course; only execution on real hardware can reveal everything.

Figure 2. Equipment software configuration: simulation and test.


It is essential to recognize that the core software inside the APIs does not change at all between Figs. 1 and 2, and it does not know whether its current configuration is for production or for test.

Cost-effectiveness of simulation and automated testing will depend heavily on your choice of test framework. You may be better off with a batch-oriented test framework such as JUnit-JSystem or xUnit, as opposed to a complex and expensive general-purpose automation tool or programming environment.
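
Building on the hypothetical simulators sketched earlier, a batch-runnable JUnit 4 test might look like the following; CarrierLoadLogic is a deliberately tiny stand-in for a piece of core software, not a real product component.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CarrierLoadLogicTest {

    // Tiny stand-in for core logic: read a carrier label, raise an alarm on failure.
    static class CarrierLoadLogic {
        private final BarcodeReader reader;
        private final OperatorInterface operator;
        private String lastAlarm = "NONE";

        CarrierLoadLogic(BarcodeReader reader, OperatorInterface operator) {
            this.reader = reader;
            this.operator = operator;
        }

        String loadCarrier() {
            try {
                return reader.readLabel();
            } catch (ReaderException e) {
                lastAlarm = "BARCODE_MISREAD";
                operator.postAlarm(lastAlarm, e.getMessage());  // no human needed in test
                return null;
            }
        }

        String lastAlarm() { return lastAlarm; }
    }

    @Test
    public void misreadRaisesAlarmWithoutRealHardwareOrOperator() {
        SimulatedBarcodeReader reader = new SimulatedBarcodeReader();
        reader.injectReadFailure();              // a fault that is hard to stage on real equipment
        SimulatedOperator operator = new SimulatedOperator(1, true);

        CarrierLoadLogic logic = new CarrierLoadLogic(reader, operator);
        logic.loadCarrier();

        assertEquals("BARCODE_MISREAD", logic.lastAlarm());
    }
}
```

Large suites of such tests can run unattended, with no equipment time and no manual setup.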

Clearly, simulation and automated testing require significant work and time. Therefore it is critical to treat simulation as a true project, not just a set of side tasks. Rigor here will pay back many-fold by preventing delays on the floor.

Isolate real code from test and simulation code

In the push to get simulation and test running at the same time as core code that will run the equipment, it’s tempting to mix simulation and test code with the core source. This is a mistake, however. The actual run-time code should never know if it’s running with real hardware versus in simulation or test mode — or even a combination.

Two ways of mixing test and simulation code with core code are conditional compilation and run-time-mode flags. Both lead to executing somewhat different paths through the source during testing versus on the floor, so bugs can slip through testing only to waste time when they emerge on the floor. Other bugs will turn out to be in the test code, not in core code, and will waste time before floor arrival. For these reasons, simulation should be hooked in only indirectly, through APIs, not by mixing test code with real code.
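
One minimal way to keep that separation, reusing the hypothetical BarcodeReader API from the sketches above, is to hand core classes their dependencies from outside; only the outermost wiring knows whether a real driver or a simulator sits behind the interface.

```java
// Anti-pattern: conditional compilation or a run-time mode flag inside core code,
// e.g.  if (SIMULATION_MODE) { label = fakeLabel(); } else { label = reader.readLabel(); }
// which makes the tested path differ from the production path.
//
// Preferred: core code depends only on the interface; which implementation it
// receives is decided where the application is assembled.
final class CoreLoadStation {
    private final BarcodeReader reader;   // may be real or simulated; the core cannot tell

    CoreLoadStation(BarcodeReader reader) {
        this.reader = reader;
    }

    String identifyCarrier() throws ReaderException {
        return reader.readLabel();        // identical code path on the floor and in test
    }
}
```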

Enable deterministic execution for debugging

Most modern automation and control systems are event-driven and use asynchronous messaging on multiple threads to push information to internal and external entities. This differs from polling, in which controllers continually pull state information synchronously from elsewhere and take action on certain signals. Asynchronous messaging is more economical and elegant, because information is passed only when needed.

A challenge with asynchronous communication, however, is that the sequence of delivery of event information is usually not deterministic. That is, two runs from identical initial states using the identical inputs can have different outcomes. Why? Because a particular operating system context at run time can lead to event X being delivered to a subscriber before event Y — but another program run from the same starting point might pass through different contexts, and event Y might happen to be delivered before event X. Underneath, execution has gone through two different series of states. Sporadic bugs due to non-determinism of event sequencing can be maddeningly difficult to fix.

One technique for dealing with non-determinism is to timestamp event messages at the moment of origin. Another is to use priority queues that order messages by origin timestamp rather than by time of receipt at the queue.
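
A minimal sketch of both techniques follows, with illustrative names; a production-grade dispatcher would also need a policy for how long to wait for late-arriving earlier events before dispatching.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Each event is stamped when it is generated, not when it is received.
final class Event {
    final long originNanos;
    final String name;

    Event(String name) {
        this(name, System.nanoTime());
    }

    Event(String name, long originNanos) {   // fixed timestamps support deterministic replay
        this.name = name;
        this.originNanos = originNanos;
    }
}

final class EventDispatcher {
    // Dequeue order follows origin time, even if events arrived out of order.
    private final PriorityBlockingQueue<Event> queue = new PriorityBlockingQueue<>(
            64, Comparator.comparingLong((Event e) -> e.originNanos));

    void publish(Event e) {
        queue.put(e);
    }

    Event next() throws InterruptedException {
        return queue.take();
    }
}
```

For deterministic replay during debugging, a test harness can construct events with fixed origin timestamps so that every run dispatches them in an identical order.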

To track down execution problems due to non-determinism, you should, from the start, build in software mechanisms that let debugging step through event processing in the exact time sequence in which events were generated and processed.

Forced-deterministic testing accelerates delivery of problem-free code. Applying this principle from the start of your project need not be expensive or difficult. Retrofitting it later is usually impossible.

Design logging facilities for debugging and diagnostics

Software developers know that tracing/logging facilities are needed to debug code. A common pattern, however, is to pull something together quickly early on, but never revisit or flesh it out later because “there’s no time for that now.” This is a failure mode. Instead, design tracing and logging to provide accurate diagnostic information that is readable by humans, and that enables them to reconstruct system behavior to determine quickly what went wrong during failures.

Logging to files can be computationally heavy. Also, turning on logging can change low-level details of what is happening, causing certain bugs to disappear. To prevent these problems, always trace to a port, but actively monitor that port and log its data only when configured to do so. Selectively monitoring a tracing port spares main processor power while supporting deterministic execution.
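
One way to realize this, sketched below with illustrative names, is to emit trace records as UDP datagrams to a local port; sending a datagram is cheap, never blocks on file I/O, and happens identically whether or not anything is listening.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;

final class TracePort {
    private final DatagramSocket socket;
    private final InetAddress loopback;
    private final int port;

    TracePort(int port) throws SocketException {
        this.socket = new DatagramSocket();
        this.loopback = InetAddress.getLoopbackAddress();
        this.port = port;
    }

    // level: e.g. 1=error, 2=warn, 3=info, 4=debug. originNanos ties the record
    // to when the event occurred, not to when it was processed or logged.
    void trace(int level, long originNanos, String component, String message) {
        String record = level + "\t" + originNanos + "\t" + component + "\t" + message;
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        try {
            socket.send(new DatagramPacket(bytes, bytes.length, loopback, port));
        } catch (Exception e) {
            // Tracing must never disturb control flow; drop the record on failure.
        }
    }
}
```

A separate monitor process listening on that port captures and writes the stream to disk only when diagnostics are wanted, so the equipment software follows the same code path in every configuration. Tab-separated fields of this kind also keep the records simple to parse by machines and easy to read by humans.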

Additional logging techniques include multiple verbosity levels, timestamps tied directly to when events occur (i.e., not when events are processed), and enabling simple parsing of log data by both machines and humans.

If your software team designs logging carefully from the start, your project will save significant time and money. Otherwise, that will be your loss and your competitor’s gain.

Don’t invent interfaces; use semiconductor standards

Most of these principles are good software engineering. This next principle is generic but it has a special meaning for solar. Rather than inventing new APIs, use applicable, widely-adopted semiconductor manufacturing API standards, i.e., SEMI standards, wherever reasonable.

“SEMI” is the Semiconductor Equipment and Materials International organization headquartered in San Jose, CA. SEMI provides guidance on all aspects of semiconductor manufacturing, including equipment control and automation software. Years of thought, invention, development, and real-world usage have shaken down and ironed out widely-used SEMI API standards. Maturation takes time and money. Immature software is prone to errors. You do not have to pay the money, or lose the time, needed to mature a brand-new interface that could have been reused from SEMI, either conceptually or as an implementation.

Two particular test-related advantages of using SEMI standards are:

– Standards specifications contain much detail about error handling and recovery. This aspect of interfaces takes the longest to work out.
– Off-the-shelf test approaches, protocols, and tools are available for some standards, notably host interfaces and material handoff protocols. Test suites remove ambiguities in written specifications.

Some specific standards that can be cost-effectively leveraged include serial communications (SEMI E4), host-equipment interfaces (E5, E30, E57), reliability tracking (E10, E116), automation handoff protocols (E84), and data collection (E134); details are available on SEMI’s Web site. SEMI’s Photovoltaic Group is also adapting existing SEMI standards for solar and developing new solar standards.

Foliage has used design-for-test techniques and standards to build scores of real-world control and automation software products in multiple manufacturing domains. Solar’s differentiating factors include a wide range of manufactured items — wafers, cells, panels, cylinders, ribbons, fabrics — plus intense competitive time pressures and fast-growing importance of end products. Every available efficiency matters. Software design-for-test may be your edge for significantly accelerating the ramp to production.


Biographies

Wayne Lobb received his doctorate in mathematics from the U. of Illinois at Urbana-Champaign and his undergraduate degree from the California Institute of Technology. He is engineering director at Foliage Software Systems, 168 Middlesex Turnpike, Burlington, MA 01803 USA; ph. 781-993-5500, email wlobb@foliage.com.

Karl Aeder received his BS degree in computer science from the U. of Vermont and is principal software architect at Foliage Software Systems.
