NLsignals: pilot projects for the Netherlands Initiative for Next Generation Signal Identification in Astro(particle) Physics

The aim of this initiative is to co-design the next generation of data-processing pipelines, combining energy-efficient computing with effective signal-detection algorithms. Such a holistic approach is urgently needed to process and analyse the large data streams generated by upgraded and upcoming experiments in astrophysics and particle physics, and it has high valorisation potential in industry. To this end, we have formed a Dutch consortium of top-notch experts in data analysis from various (astro-)particle physics experiments, theoretical physics, astronomy, data science, philosophy of science, and high-performance/high-throughput computing. Through collaboration with partners from data-intensive industries and ICT companies, as well as education and outreach partners, we ensure the broad applicability and high impact of our methods.

Subatomic physics finds itself at a historic crossroads. Despite its overwhelming successes, our current fundamental theories cannot, for example, describe gravity at the quantum level, account for Dark Matter, or explain the absence of antimatter in our Universe. New phenomena that would shed light on these striking questions can only be uncovered by analysing more experimental data. Opportunities to address these fundamental conundrums will be provided by the upgrade of the Large Hadron Collider (LHC) and by astrophysics experiments (e.g. MeerLICHT/BlackGEM). However, these upgrades come with an increase of several orders of magnitude in the amount and complexity of the data they generate. For example, up to about 1 billion particle collisions can take place every second inside the LHC experiments' detectors, while optical astronomy is on the brink of breaking the 1 Hz barrier in the detection of transients and stellar variability through large-format CMOS detectors that monitor millions of stars and galaxies (which in turn requires millions of data decisions per second). To deal with these massive data streams, new data-processing workflows, optimised for a broad search for unknown (and unexpected) signals, are mandatory. These cannot be naively scaled-up or scaled-out solutions: we must co-design data-processing workflows and systems to minimise energy consumption, maximise energy efficiency, and reduce environmental impact. In this project, we combine the latest advances in efficient computing and data science with intelligent, specialised signal-detection algorithms, and optimise the entire workflow using simulations of generic models for new physics. With this approach, we can set up data processing that is optimised both for high-accuracy signal detection (application-driven) and for low power consumption (societally and application-driven), while being carried out efficiently on a dedicated, simultaneously co-designed computing infrastructure.
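To make these orders of magnitude concrete, the short sketch below turns the rates quoted above into a rough per-decision energy budget. Only the collision rate, frame rate, and source counts come from the text; the event size and the 10 kW power envelope are illustrative assumptions, not project figures.

```python
# Back-of-envelope scale estimates for the data rates quoted above.
LHC_COLLISIONS_PER_S = 1e9     # up to ~1 billion collisions per second
EVENT_SIZE_BYTES = 1e6         # assumed ~1 MB per collision event (illustrative)

raw_rate = LHC_COLLISIONS_PER_S * EVENT_SIZE_BYTES
print(f"raw LHC rate if every collision were kept: {raw_rate:.0e} B/s")
# ~1e15 B/s (a petabyte per second): storing everything is impossible,
# hence the need for online signal selection.

# Optical astronomy: ~1 Hz imaging of millions of monitored sources.
FRAMES_PER_S = 1.0
SOURCES_PER_FRAME = 1e6
decisions_per_s = FRAMES_PER_S * SOURCES_PER_FRAME

POWER_BUDGET_W = 10_000        # assumed 10 kW envelope for the detection stage
print(f"energy budget per source decision: "
      f"{POWER_BUDGET_W / decisions_per_s * 1e3:.0f} mJ")
```

Under these assumptions, each of the million per-frame source decisions must be made within an energy budget of roughly 10 mJ, which is why algorithm efficiency and hardware choice cannot be treated separately.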

This proposal, NLsignals, requests funding for two pilot projects to support building the Netherlands Initiative for Next Generation Signal Identification in Astronomy and Particle Physics.

Pilot 1: Efficient signal detection for particle physics and astronomy. In anticipation of Run 3 (2021-2023) and the High-Luminosity phase (starting in 2026) of the LHC, we urgently have to design and implement online and offline data-processing strategies that broaden the searches for new particles. In anticipation of emerging astronomical CMOS technologies that enable rapid image capture, and thus real-time source identification if efficient signal-detection algorithms can be developed, we must investigate the performance of automated, deep-learning-based source identification (see e.g. www.autosourceid.org) using data from the current MeerLICHT telescope and the upcoming BlackGEM telescope array (PI Groot@Radboud) as stand-ins for CMOS detectors.

In addition, a wide-field CMOS-equipped telescope of our international partner, the University of Cape Town/SAAO, will be used for this project. With the availability of large-format CMOS detectors, optical astronomy needs to speed up its processing by a factor of >60 to keep up with the acquisition rates of >1 Hz now becoming available. Current pipelines (based on e.g. SExtractor and ZOGY) cannot cope with this incoming flood. We must reduce the latency of processing and of issuing alerts on new sources from minutes to seconds, opening a new window on the violent and energetic Universe. In Pilot 1, we will update and improve signal-detection algorithms for LHC and astronomical source processing, provide accurate and detailed performance data (e.g. latency, throughput, power consumption, and energy efficiency) for these algorithms on different computing architectures (e.g. CPU, GPU, FPGA, neuromorphic hardware such as spiking networks, and in-memory computing), and estimate their environmental impact using basic sustainability metrics. Specifically, we will explore U-Net-based neural-network architectures for rapid track reconstruction in the ATLAS experiment at the LHC, and U-Net-based neural-network architectures to locate and classify point sources in astronomical images.
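As a concrete illustration of this class of architecture, the sketch below (PyTorch) shows a minimal U-Net that maps an image cutout to a per-pixel source heatmap, from which point sources can be read off as local maxima. The depth, channel counts, input size, and the name TinyUNet are placeholder choices for exposition; the actual architectures will be designed within the pilot.

```python
# Minimal U-Net sketch for point-source localisation: image cutout in,
# per-pixel "source probability" heatmap out.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        self.enc1 = conv_block(1, channels[0])
        self.enc2 = conv_block(channels[0], channels[1])
        self.bottleneck = conv_block(channels[1], channels[2])
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(channels[2], channels[1], 2, stride=2)
        self.dec2 = conv_block(channels[2], channels[1])
        self.up1 = nn.ConvTranspose2d(channels[1], channels[0], 2, stride=2)
        self.dec1 = conv_block(channels[1], channels[0])
        self.head = nn.Conv2d(channels[0], 1, 1)   # 1-channel heatmap

    def forward(self, x):
        e1 = self.enc1(x)                    # full resolution
        e2 = self.enc2(self.pool(e1))        # 1/2 resolution
        b = self.bottleneck(self.pool(e2))   # 1/4 resolution
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)                 # per-pixel logits

# Toy usage: a batch of 128x128 single-band cutouts.
net = TinyUNet()
heatmap = net(torch.randn(4, 1, 128, 128))
print(heatmap.shape)  # torch.Size([4, 1, 128, 128])
```

The skip connections (the torch.cat calls) are what distinguish a U-Net from a plain encoder-decoder: they carry fine spatial detail from the encoder directly to the decoder, which matters for precise source localisation. The ATLAS track-reconstruction variant would follow the same pattern, with detector-hit maps as input.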

To tackle realistic application scales, we must rely on high-performance, high-efficiency computing systems. We build these by matching the requirements of specific workflows and data scales to suitable hardware configurations (e.g. CPU + FPGA + GPU, or CPU + GPU) through a co-design methodology driven by performance models. The resulting co-designed signal-detection compute systems minimise compute waste while still providing performance guarantees.
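The sketch below illustrates the kind of measurement that would feed such performance models: inference latency and throughput for the detection network on whichever devices are available, reusing the TinyUNet sketch from Pilot 1. The batch size and repeat counts are arbitrary, and power draw is assumed to be read from external meters or vendor tools alongside this loop rather than from the script itself.

```python
# Per-architecture latency/throughput measurement for the co-design
# performance models (assumes the TinyUNet class defined above).
import time
import torch

def benchmark(model, batch, device, repeats=50):
    model = model.to(device).eval()
    batch = batch.to(device)
    with torch.no_grad():
        for _ in range(5):                   # warm-up iterations
            model(batch)
        if device.type == "cuda":
            torch.cuda.synchronize()         # GPU kernels run asynchronously
        t0 = time.perf_counter()
        for _ in range(repeats):
            model(batch)
        if device.type == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - t0
    latency = elapsed / repeats
    throughput = batch.shape[0] / latency    # cutouts per second
    return latency, throughput

devices = [torch.device("cpu")]
if torch.cuda.is_available():
    devices.append(torch.device("cuda"))

batch = torch.randn(16, 1, 128, 128)
for dev in devices:
    lat, thr = benchmark(TinyUNet(), batch, dev)
    print(f"{dev}: {lat*1e3:.1f} ms/batch, {thr:.0f} cutouts/s")
```

Combined with metered power readings, the same loop yields energy per processed cutout, the quantity the co-design methodology trades off against detection accuracy across CPU, GPU, FPGA, and more exotic back-ends.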

Pilot 2: Sociotechnological ramifications of next-generation data-intensive and energy-efficient signal detection for scientists, students, and the general public. Our aim and approach, enabling the (sustainable) detection of signals in a growing amount of data in particle physics and astronomy, have broad implications for the future of scientific practice and science education. In addition, these challenges and approaches need to be communicated to society, given their large industrial and societal implications. The objective of this second pilot is to explore these implications together with philosophers of science. More concretely, we address the following questions: (1) What will future science look like with ever-growing datasets analysed by algorithms? (2) If new science depends largely on handling complex data collections with machine-learning algorithms on efficient computing architectures, what does this mean for the ways in which science is practised, taught, and communicated, both internally and to the wider public? (3) How can Dutch society profit from these developments, and who are the optimal (societal) partners for this project?