Bill Scanlon, NREL
May 06, 2013
WASHINGTON, D.C. -- Biofuels scientists are asking more complex questions about how molecules spin, bond, and break when enzymes attack plants — all in the name of quickening the process of turning biomass into fuels for the sake of cleaner air and better energy security.
They're the kinds of questions that require trillions of mathematical operations each second on supercomputers. But software engineers hadn't been able to keep up with the ever-increasing demands of the scientists and the growing capabilities of modern supercomputers. That is, until unique work at the U.S. Department of Energy's National Renewable Energy Laboratory (NREL) supercharged an essential decades-old software program to run on a single high-performance computer such as the new petascale computer at NREL's Energy Systems Integration Facility.
Software engineers at NREL have reworked the code and algorithms of the CHARMM (Chemistry at HARvard Macromolecular Mechanics) program so that it can simulate molecular motion over millions to billions of computational steps. That corresponds to nanoseconds to microseconds of molecular motion and takes days of computing time.
How long is a nanosecond? Well, a nanosecond (a billionth of a second) is to a second as a second is to 31.7 years.
And a nanosecond is a very long time when measuring all the movements of thousands of atoms in a molecule.
It takes a million molecular dynamics (MD) steps to simulate a nanosecond of molecular motion.
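The time-scale arithmetic above can be checked with a short sketch. It assumes a Julian year of 365.25 days and a time step of 1 femtosecond, which is what "a million steps per nanosecond" implies; the function name `md_steps` is made up for illustration and is not part of CHARMM.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

# A nanosecond is to a second as a second is to this many years.
years_per_billion_seconds = 1e9 / SECONDS_PER_YEAR

FS_PER_NS = 1_000_000  # femtoseconds in one nanosecond


def md_steps(nanoseconds, timestep_fs=1.0):
    """Number of MD steps needed to cover `nanoseconds` of motion.

    timestep_fs is an assumed step size; 1 fs reproduces the
    million-steps-per-nanosecond figure in the text.
    """
    return round(nanoseconds * FS_PER_NS / timestep_fs)


print(f"1e9 seconds is about {years_per_billion_seconds:.1f} years")
print(f"1 ns   -> {md_steps(1):,} steps")
print(f"100 ns -> {md_steps(100):,} steps")
```

With a 2-femtosecond step instead, `md_steps(10, timestep_fs=2.0)` gives 5 million steps, which matches the 10-nanosecond figure quoted later in the article.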
"For an average system of 100,000 atoms on a single modern processor core, it would take us half a day of computing to simulate less than half a nanosecond," NREL Senior Scientist Michael Crowley said.
But they need to simulate molecular motion for much longer than that — as long as 100 nanoseconds.
"Using the original version of parallel CHARMM, it would take half a year, no matter how many processors we used, to simulate molecular motion for that long," Crowley said.
Thanks to the improvements the NREL engineers made to the CHARMM algorithms and code, they can now do that simulation in a day with hundreds of processors running in parallel.
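As a rough check on what that change means, the two wall-clock figures can be turned into throughput numbers. This is only a back-of-the-envelope sketch: "half a year" is taken as 182.5 days and "a day" as exactly one day, both of which are assumptions layered on Crowley's round numbers.

```python
DAYS_PER_HALF_YEAR = 365 / 2  # assumption: "half a year" taken as 182.5 days


def ns_per_day(simulated_ns, wall_clock_days):
    """Simulation throughput in nanoseconds of molecular motion per day."""
    return simulated_ns / wall_clock_days


original = ns_per_day(100, DAYS_PER_HALF_YEAR)  # original parallel CHARMM
reworked = ns_per_day(100, 1)                   # reworked code, hundreds of processors
speedup = reworked / original

print(f"original: {original:.2f} ns/day, reworked: {reworked:.0f} ns/day")
print(f"speedup of roughly {speedup:.1f}x")
```

Under those assumptions the reworked code is on the order of 180 times faster for the same 100-nanosecond run.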
"To get a microsecond [1,000 nanoseconds] on a thousand processors will now take a few days," Crowley said.
The only limit on the questions scientists can ask, and expect answers to, is computing speed. For more than a decade, each time scientists asked new questions that required more computing power to answer, engineers could count on a computer's speed doubling every year or so to keep up.
"But this is not enough to keep up anymore. Computer chips are not getting any faster — they are getting more parallel," said NREL's Antti-Pekka Hynninen, a physicist and software engineer. "We now have to parallelize the code" to multiply the speed at which the simulations can be run.
CHARMM Models Biological Reactions
CHARMM was developed at Harvard University in the 1980s to allow scientists to generate and analyze a wide range of molecular simulations, including production runs of a molecular dynamics trajectory for proteins, nucleic acids, lipids, and carbohydrates.
It is a favorite program of molecular researchers around the world for simulating biological reactions such as the action of cellulase on cellulose for converting biomass into ethanol. CHARMM is also a crucial code for the pharmaceutical industry.
CHARMM is unique in its ability to build, simulate, and analyze results of molecular motion in a single program. "It provides more methods of simulation than any other program, and the newest and most cutting-edge methods for thermodynamics, reaction sampling, quantum mechanics, molecular mechanics, and advanced imaging," Crowley said.
For all its advantages, though, CHARMM's number-crunching speed hadn't kept up with the new demands and the new questions. The new biomolecular simulations are so large (more than 1 million atoms) and run for so long (5 million time steps for a 10-nanosecond simulation) that they exceeded the capabilities of CHARMM.
So, three years ago, Crowley hired Hynninen to update the code and increase its performance.
If Hynninen had tried rewriting the entire 600,000 lines of code, he estimates it would have taken him about 10 years.
Instead, he focused on rewriting the heart of CHARMM, the molecular dynamics engine, and he was able to pare the chore down to two years. The molecular dynamics engine is where all the heavy computation is done. It may only represent 5% to 10% of the total lines of code, but it accounts for approximately 99% of the central processing unit (CPU) time in a typical simulation.
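The article does not name it, but the reasoning behind that choice is essentially Amdahl's law: the overall gain depends on how much of the run time the rewritten part accounts for. The sketch below treats the quoted 99% as the fraction of run time that gets faster; the speedup factors passed in are illustrative values, not measurements from NREL.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law.

    fraction: share of the original run time spent in the part being sped up.
    factor:   how many times faster that part becomes.
    Returns the speedup of the whole run.
    """
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Speeding up the MD engine (about 99% of CPU time) by illustrative factors.
for factor in (10, 100, 1000):
    print(f"engine {factor:>4}x faster -> whole run {overall_speedup(0.99, factor):.1f}x faster")

# Speeding up everything except the engine (about 1% of CPU time) barely matters.
print(f"rest of code 1000x faster -> whole run {overall_speedup(0.01, 1000):.3f}x faster")
```

However fast the engine gets, the untouched 1% caps the overall gain at about 100 times, while rewriting the other 90% to 95% of the lines could improve run time by about 1% at most.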
He's the first to admit it wasn't exactly a day (or two years) at the beach.
Hard, Laborious Work — with Shortcuts
"It's one of those very hard problems, mechanics of atoms and enzymes," Hynninen said. "There is really no limit to how a molecule can behave." Its motions are determined by the interplay of a multitude of interactions between each atom and every other atom nearby — through both chemical bonds and non-bonded interactions, he noted. That results in thousands of different kinds of interactions per atom. And there can be hundreds of thousands of atoms in a simulation. "And this makes writing the algorithms and code quite challenging."
The day-long task using hundreds or thousands of processors simulates a very brief moment, cataloguing every move by thousands of atoms. "It's not just that they all move but that each atom is feeling forces from thousands of other atoms," Crowley said. "And each one of those forces has to be calculated for every atom at every step."
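To give a feel for the work Crowley describes, here is a toy version of one force evaluation. It is a minimal sketch, not CHARMM's force field or algorithm: it assumes a single Lennard-Jones pair potential in reduced units, a simple distance cutoff, and a plain double loop over atoms. The function name and parameters are hypothetical.

```python
def lj_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Per-atom forces from a Lennard-Jones pair potential (toy model).

    positions: list of (x, y, z) tuples, in the same length unit as sigma.
    Pairs farther apart than `cutoff` are skipped.
    """
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Vector from atom j to atom i and its squared length.
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if r2 > cutoff * cutoff:
                continue
            sr6 = (sigma * sigma / r2) ** 3
            # For U = 4*eps*((s/r)^12 - (s/r)^6), the force on i is scale * d.
            scale = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r2
            for k in range(3):
                forces[i][k] += scale * d[k]  # force on atom i
                forces[j][k] -= scale * d[k]  # equal and opposite on atom j
    return forces


# Two atoms one sigma apart repel each other along the x axis.
print(lj_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

A full evaluation like this has to be repeated at every time step. For 100,000 atoms, an all-pairs loop would visit roughly 5 billion pairs per step, which is why cutoffs and parallel execution matter so much in practice.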
On the time scale most of us are used to, microseconds or nanoseconds of action seem ridiculously short. "But they are long enough to answer lots of questions, because they show us what the molecule is probably doing most of the time," Crowley said.
And simulating the motion of atoms answers important questions about how an enzyme can access the sugars in a plant.
The sugars the biofuels industry wants are locked up in a polymer called cellulose, which forms bundles or fibers of a few dozen polymer chains. CHARMM's molecular dynamics can simulate those bundles and find how strongly they are held together, as well as what interactions are holding them together. Using CHARMM, scientists can also model the interaction of an enzyme with those bundles and determine how the enzyme peels the polymers out of the bundle. "We learn what forces it uses or how it reduces forces holding the bundle together," Crowley said.