Particle physics for everyone

CERN makes measurement data from its large-scale experiments available to the public

The European Research Centre for Particle Physics (CERN) in Geneva is currently working on making the measurement data from four large-scale experiments available to the public via the internet. This form of 'open science' benefits particle physicists around the globe, but also university and high school students, for example in Switzerland.

At the CERN computer centre, the data from the experiments at the LHC particle accelerator are stored on magnetic tapes.
Image: B. Vogel

In 2012, the Higgs particle was discovered at CERN. It was one of the most spectacular discoveries in fundamental physics research in the last decade. Now CERN has decided to make all the data that led to the discovery of the particle available to the public. What is more, CERN wants to make the data from all four large experiments at the LHC ('Large Hadron Collider'), namely ALICE, ATLAS, CMS and LHCb, publicly available. The data will be released gradually, starting this year. All data collected at the LHC from its start in 2010 up to 2016 will soon be available on the internet.

A fascinating prospect, because in plain language this publication means: if the Higgs had not already been discovered, anyone who evaluated the CERN data appropriately could make the discovery themselves, a discovery which, after all, led to a Nobel Prize in 2013. It is true that the Higgs has already been discovered. But who knows, maybe another scientific sensation is hidden in the CERN data?

Sharing data publicly

Admittedly, it takes a lot of expertise to analyse CERN data properly, and access to the measurement data alone is unlikely to lead to sensational discoveries. Nevertheless, CERN's approach is remarkable. It is a prominent example of 'open science': scientists share the data they have obtained with the public, thereby creating the basis for gaining maximum knowledge from the measurements. Science organisations such as NASA and projects such as the Linux operating system or the online encyclopaedia Wikipedia follow a similar approach.

CERN's new approach is laid down in its 'Open Data Policy'. CERN published this policy last December after the 'European Strategy for Particle Physics' called for open data handling. Since then, the large LHC experiments ATLAS, LHCb and ALICE have begun releasing their data. The data from CMS, the fourth large experiment, is already available: this research collaboration had agreed to publish its data earlier.

Focus on Level 3 data

CERN's 'Open Data Policy' mainly refers to the so-called Level 3 data. To understand what is meant by this, one has to picture how the LHC researchers work: in the particle accelerator, protons are made to collide, and the resulting particle tracks are then recorded with sophisticated measuring devices. Since around one billion collisions take place every second in an LHC experiment, a very large amount of data is generated. To prevent scientists from drowning in this flood of information, the part of the measurement data that is not expected to yield any scientific findings is not processed or recorded further.

What remains is still a large amount of raw data, which on its own, however, is hardly meaningful. Only when the raw data is processed with appropriate computer programs does 'reconstructed' data emerge: it describes, for example, the track parameters of a particle that was observed after a proton-proton collision. 'Reconstructed' data consequently forms the basis of all analyses at the four LHC experiments, and thus the basis of all discoveries at CERN.

Publication with a time lag

The 'reconstructed' data is called Level 3 data to distinguish it from other types of data. First, there is the raw data mentioned above (Level 4 data), which has little value in itself and will not be published in the future either. Second, there is the data published in scientific articles (Level 1 data), which has always been accessible to the public through Open Access scientific journals. That leaves the Level 2 data: this is the data that CERN has already used, and will continue to use, for educational purposes and outreach activities. One example is the international 'Masterclasses', which are also held annually in Switzerland: one-day courses in which high school students from all around the world use real but simplified CERN data to track down elementary particles and thus gain a very realistic insight into the workings of particle physics.
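The four data levels described above can be summarised in a small lookup table. The following Python sketch is purely illustrative: the class name, field names and the wording of each entry are assumptions made for this example and paraphrase the article, not CERN's official definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataLevel:
    """One of the four data levels as described in the article (hypothetical structure)."""
    level: int
    content: str
    access: str  # paraphrased from the article, not official CERN wording


# Summary of the article's description of Levels 1 to 4.
DATA_LEVELS = {
    1: DataLevel(1, "data published in scientific articles", "open-access journals"),
    2: DataLevel(2, "real but simplified data", "education and outreach"),
    3: DataLevel(3, "reconstructed data used for analyses", "covered by the Open Data Policy"),
    4: DataLevel(4, "raw data", "not published"),
}


def describe(level: int) -> str:
    """Return a one-line description of a data level."""
    entry = DATA_LEVELS[level]
    return f"Level {entry.level}: {entry.content} ({entry.access})"


for number in sorted(DATA_LEVELS):
    print(describe(number))
```

Note that the numbering runs from the most processed data (Level 1) to the least processed (Level 4).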

The fact that CERN is going to publish the Level 3 data has also aroused reservations among researchers. They feared that external scientists could profit from CERN data without having to participate in the complex set-up of the CERN experiments, and could thus gain scientific credit more or less undeservedly. "We had to strike the right balance here," says CERN physicist Jamie Boyd, who was at the forefront of formulating the Open Data Policy. "To ensure that CERN researchers are not deprived of the fruits of their labour, Level 3 data will not be made public until five years after it has been collected."
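The five-year time lag amounts to simple arithmetic. The following Python sketch is a minimal illustration that assumes the delay is counted in whole calendar years from the year of data collection. The function name and this year-level granularity are assumptions for the example; the actual release schedule of each experiment may differ.

```python
EMBARGO_YEARS = 5  # from the article: Level 3 data is released five years after collection


def earliest_release_year(collection_year: int) -> int:
    """Earliest year in which Level 3 data collected in `collection_year`
    could become public, assuming a whole-year embargo (an assumption)."""
    return collection_year + EMBARGO_YEARS


# Example years taken from the period mentioned in the article (2010 to 2016).
for year in (2010, 2012, 2016):
    print(f"collected {year} -> public from {earliest_release_year(year)} at the earliest")
```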

Right at the border

Physically, all the data is stored on magnetic tapes at CERN's computer centre. Strictly speaking, this is on French soil, because the computer centre lies a few metres across the Swiss border. The data is stored according to the so-called FAIR standard, which ensures that the data is findable, accessible, interoperable and re-usable. Each of the four large LHC experiments produces around one petabyte of data every year. Added to this is the same amount of simulated data, which is indispensable for the analyses of the CERN researchers. From 2027, the LHC will make another jump in performance: the number of collisions in the LHC will then increase five-fold, leading to a large increase in the amount of data.
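The figures in the paragraph above allow a rough estimate of the combined yearly data volume. The following Python sketch simply adds up the approximate numbers given in the article (about one petabyte of measured data and the same amount of simulated data per experiment). The variable and function names are chosen for this example, and the result is an order-of-magnitude estimate, not an official CERN figure.

```python
EXPERIMENTS = ("ALICE", "ATLAS", "CMS", "LHCb")

# Approximate figures from the article, in petabytes per experiment per year.
MEASURED_PB_PER_EXPERIMENT = 1.0
SIMULATED_PB_PER_EXPERIMENT = 1.0  # "the same amount of simulated data"

TB_PER_PB = 1000  # decimal convention: 1 petabyte = 1000 terabytes


def yearly_volume_pb(n_experiments: int = len(EXPERIMENTS)) -> float:
    """Rough combined yearly volume (measured plus simulated) in petabytes."""
    return n_experiments * (MEASURED_PB_PER_EXPERIMENT + SIMULATED_PB_PER_EXPERIMENT)


total_pb = yearly_volume_pb()
print(f"about {total_pb:.0f} PB per year, i.e. roughly {total_pb * TB_PER_PB:.0f} TB")
```

The sketch deliberately does not extrapolate beyond 2027: the article states that the number of collisions will increase five-fold, but not by how much the stored data volume will grow.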

It has already been shown that the CERN data can also be of interest to scientists who are not working at CERN. According to Jamie Boyd, for example, around 10 scientific publications have been produced from the data that the CMS experiment has already made publicly accessible in recent years. In its press release, CERN expresses confidence that this example could now set a precedent thanks to the Open Data Policy: "The new policy could be used as a blueprint for other experiments at CERN and in other scientific organisations."

Author: Benedikt Vogel

The Open Data portal can be found at: http://opendata.cern.ch

Contact

Swiss Institute of Particle Physics (CHIPP)
c/o Prof. Dr. Ben Kilminster
UZH
Department of Physics
36-J-50
Winterthurerstrasse 190
8057 Zürich
Switzerland