The Large Hadron Collider (LHC) at CERN accelerates subatomic particles to velocities near the speed of light using superconducting electromagnets cooled to temperatures near absolute zero.
This is a huge accomplishment in itself, but the hadrons are then sent to collide head-on with other particles moving near the speed of light in the opposite direction. The whole process is observed by powerful detectors, and the output is studied 24 hours a day, seven days a week by thousands of renowned physicists.
Effectively monitoring the supercollider's systems, which require high precision, high accuracy and, above all, high availability, is an intense challenge. Felix Ehm, a member of CERN's beam control group, says it is done with specially tuned open source middleware. He described his experiences with that software – ActiveMQ messaging and the Apache Camel integration framework – at the recent CamelOne conference in Boston.
The Java Message Service-based (JMS-based) ActiveMQ software at work at CERN transfers log activity for storage and display from control systems that watch everything from electricity, ventilation, office environment and fire safety systems to the dipole and quadrupole magnets that steer and focus the crucial particle beam. If the JMS goes down, someone would need to be hired to drive around the CERN campuses looking for signs of fire, jokes Ehm. (In fact, major cogs in the messaging system are duplicated to ensure high availability.)
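ActiveMQ clients commonly achieve that kind of redundancy through the failover transport, which transparently reconnects to a standby broker when the active one disappears. A minimal sketch of such a connection URI, with placeholder hostnames rather than CERN's actual topology:

```
failover:(tcp://broker-a:61616,tcp://broker-b:61616)?randomize=false
```

A client created with this URI connects to broker-a first and, if that connection drops, automatically fails over to broker-b without application code having to handle the outage.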
Beam loss monitors, sequencers and other vital CERN systems now utilize ActiveMQ messaging middleware. In Ehm's CamelOne presentation, he explained that the LHC consists of 85,000 devices that together expose over two million I/O endpoints. All of these systems must be tied together to work in concert. In this way, the middleware can be thought of as CERN's nervous system. It sends messages back and forth between the control centers, where scientists monitor data and adjust the beam, and the hardware devices that actually make these things happen. Some processes involve many crucial sensors and require high message throughput and ultra-close attention; other processes can more safely run on their own in the background, updating a client console only occasionally.
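The fan-out pattern at the heart of this "nervous system" is topic-based publish/subscribe, which JMS topics provide: a device publishes a message once, and every subscribed client receives its own copy. A minimal in-memory sketch of the pattern (illustrative names only, not CERN's code, and no real broker involved):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// A toy stand-in for a JMS topic: holds subscribers and fans
// each published message out to all of them.
class Topic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> listener) {
        subscribers.add(listener);
    }

    void publish(String message) {
        // Every subscriber receives a copy of each published message.
        for (Consumer<String> s : subscribers) {
            s.accept(message);
        }
    }
}

public class PubSubSketch {
    public static void main(String[] args) {
        Topic deviceStatus = new Topic();
        List<String> console = new ArrayList<>();
        deviceStatus.subscribe(console::add);          // a monitoring console
        deviceStatus.subscribe(m -> { /* a logger */ }); // a logging service
        deviceStatus.publish("dipole-042: OK");
        System.out.println(console.get(0)); // prints "dipole-042: OK"
    }
}
```

In the real system, the publisher and subscribers run in separate processes and a broker such as ActiveMQ sits in the middle, but the decoupling shown here is the same: the device does not know or care who is listening.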
For the LHC at CERN, one of those background systems is the beam loss monitor. This is the system that observes beam activity for signs that it might be necessary to dump the beam. Dumping the beam might happen if, for example, one of the dipoles began malfunctioning. This is an incredibly important process.
If the beam strays from its proper course within the vacuum tubes of the LHC, it could obliterate millions, perhaps billions, of dollars' worth of equipment. On the other hand, shutting down and restarting the beam is a very expensive process, and a single false positive can easily create months of unnecessary downtime.
System monitoring related to the dump process, as described by Ehm, highlights the messaging issues CERN's beam control group considers. The message routing for the dump process is fairly simple, Ehm indicated: there is only one message to send every second or so, it is always on the same topic, and it only has to go out to 20 to 30 clients. On the other hand, because there are so many checks that have to clear, that one message is about two megabytes, whereas other messages are typically less than ten kilobytes.
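Back-of-the-envelope arithmetic shows why even one message per second matters at that size. Taking 25 clients as a midpoint (an assumption; the article says 20 to 30) and treating the 2 MB figure as exact, the broker must push roughly 50 MB of payload per second:

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        // Figures from the article: one ~2 MB message per second,
        // fanned out to 20-30 clients. 25 is a midpoint assumption.
        long messageBytes = 2L * 1024 * 1024; // ~2 MB per message
        int clients = 25;
        long aggregateBytesPerSec = messageBytes * clients;
        System.out.println(aggregateBytesPerSec / (1024 * 1024) + " MB/s"); // prints "50 MB/s"
    }
}
```

By contrast, a typical sub-10 kB message sent to the same 25 clients costs well under 1 MB/s of aggregate bandwidth, which is why the dump-related traffic gets special attention despite its simple routing.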
JMS systems control the beam loss monitor that ensures, if anything at all goes wrong with the beam, it is safely dumped. JMS, in Ehm’s words, “has become a vital part of beam instrumentation.”
As use of the JMS system has grown, it has required tuning. “The service suffered by its own success,” said Ehm, indicating that the system had some difficulty scaling up to meet growing demand. “Now 80 Java developers work with it, and more and more data is being sent around,” he said. Twenty production message brokers handle the load.
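Running many ActiveMQ brokers is typically done by joining them into a network of brokers, in which brokers forward messages to one another on demand so load can be spread across machines. A hedged configuration sketch, with placeholder broker names rather than CERN's actual setup:

```xml
<!-- brokerA forwards messages to, and receives them from, brokerB. -->
<broker xmlns="http://activemq.apache.org/schema/core" brokerName="brokerA">
  <networkConnectors>
    <networkConnector uri="static:(tcp://brokerB:61616)" duplex="true"/>
  </networkConnectors>
  <transportConnectors>
    <transportConnector uri="tcp://0.0.0.0:61616"/>
  </transportConnectors>
</broker>
```

With `duplex="true"` a single network connector carries traffic in both directions, so clients attached to either broker can exchange messages as if they shared one.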
This was first published in June 2012