CERN's Scaling Problem: Big Data's Heavy Burden

European particle physics laboratory CERN upgrades the data readout systems for its particle detection experiments

CERN, the European laboratory for particle physics, produces about a petabyte of data per day, which demands modern and reliable computing infrastructure to process. The centrepiece of CERN's work is the Large Hadron Collider (LHC), which accelerates subatomic particles around a 27 km underground ring. One of the experiments running on it is CMS, whose goals include the search for particles that could account for dark matter. [1]

From 2018 to 2022, the LHC was shut down for upgrades. After the restart in July 2022, the three-year Run 3 data-taking period began, during which scientists are collecting data at higher energy and higher collision rates. In preparation, the four large LHC experiments upgraded their data readout systems and computing infrastructure. [2]

According to Brij Kishor Jashal, a scientist on the CMS team, the collaboration currently collects about 30 terabytes of data every 30 days just to monitor the performance of its computing infrastructure. "Going into the new era of Run 3 operations, we will see an increase in the scale of data storage. One of our main tasks is to meet all the requirements for managing that data," Jashal said. [3]
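For a sense of scale, that monitoring stream alone averages out to roughly one terabyte per day. A quick back-of-the-envelope calculation in Python, using only the figures quoted above:

    # Average ingestion rate implied by 30 TB per 30 days.
    TB = 10**12                       # bytes in a decimal terabyte
    total_bytes = 30 * TB             # monitoring data per period
    period_seconds = 30 * 24 * 3600   # 30 days

    rate_mb_s = total_bytes / period_seconds / 10**6
    print(f"average ingestion: {rate_mb_s:.1f} MB/s")   # ~11.6 MB/s

Sustained averages like this say nothing about peaks, which is where the scalability concerns Jashal mentions come in.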

Previously, CERN's monitoring system for this physics-data processing infrastructure was built on the InfluxDB and Prometheus databases. Because of scalability and reliability problems, the team looked for alternatives and chose VictoriaMetrics, an open source time-series database developed by a San Francisco-based startup of the same name. Roman Khavronenko, co-founder of VictoriaMetrics, said the previous setup struggled with high cardinality and a high churn rate of frequently changing time series, problems the new system has resolved. Jashal confirmed they are satisfied with how their clusters and services scale: "We have not yet encountered any scalability restrictions." [4]
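To see why cardinality and churn are the pressure points in a time-series database, note that every distinct combination of label values is stored as a separate series. A minimal Python sketch, with entirely hypothetical metric and label names:

    # Hypothetical labels for a metric such as job_runtime_seconds.
    # Each distinct (host, job_id, state) combination is a separate
    # time series the database must index.
    hosts = [f"node{i:04d}" for i in range(2_000)]    # worker nodes
    job_ids = [f"job-{i}" for i in range(500)]        # batch jobs: high churn
    states = ["running", "done", "failed"]

    series_count = len(hosts) * len(job_ids) * len(states)
    print(f"distinct time series: {series_count:,}")  # 3,000,000

Short-lived job IDs are the churn half of the problem: each new batch of jobs creates fresh series while the old ones linger in the index, so the index keeps growing even when the volume of live data does not.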

The new system from VictoriaMetrics runs in CERN's own data centre on clusters of x86 machines. In March of this year, InfluxDB announced that it had addressed the cardinality problem with its new IOx storage engine. [5]
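For illustration, here is how a client might push a sample into a VictoriaMetrics instance. A single-node deployment listens on port 8428 by default and accepts data in Prometheus exposition format at /api/v1/import/prometheus; the metric name, label value, and host below are made up, and the sketch assumes a locally running instance plus the third-party requests library:

    import time

    import requests

    # Assumption: single-node VictoriaMetrics on localhost, default port.
    VM_IMPORT_URL = "http://localhost:8428/api/v1/import/prometheus"

    # One sample in Prometheus exposition format (value + ms timestamp);
    # the metric and label are purely illustrative.
    timestamp_ms = int(time.time() * 1000)
    line = f'disk_used_bytes{{host="node0001"}} 123456789 {timestamp_ms}\n'

    resp = requests.post(VM_IMPORT_URL, data=line, timeout=5)
    resp.raise_for_status()   # server replies 204 No Content on success

In a cluster deployment like CERN's, writes go to the vminsert components instead, though the data format is the same.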


References:

[1]–[5] Reports, release notes, and official announcements.