Key Exploitable Result

Technical Blueprint

Technical Blueprint for a European Compute and Data Continuum

This Technical Blueprint outlines a framework for a European Compute and Data Continuum that integrates computational resources, data management systems, and orchestration into a set of interdependent capabilities supporting large-scale scientific discovery. The framework is based on input from the SPECTRUM Community of Practice (SPECTRUMCoP), analysis of use cases from High-Energy Physics (HEP) and Radio Astronomy (RA), a review of existing European e-infrastructures, and existing recommendations for their eco-design.

Overview

The European scientific computing landscape is undergoing a major transition as communities prepare for the exascale era. High-Energy Physics (HEP) and Radio Astronomy (RA) are among the most demanding fields in terms of data and computation, and so illustrate the scale of the challenge ahead. We have therefore selected them as clear examples of future needs, with the considerations extendable to virtually all data-intensive scientific domains. The LOFAR2.0 radio interferometer will begin operating in 2026 and is expected to produce over 100 PB of science data by the end of the decade, after approximately 1 billion CPU hours of processing. The High-Luminosity Large Hadron Collider (HL-LHC), expected to start operations in 2030, will generate data volumes far beyond any previous scientific instrument. The two largest experiments at the LHC, ATLAS and CMS, forecast disk storage requirements of approximately 3 EB and 1.5 EB respectively over the next decade; both also forecast tape archival storage requirements of approximately 4-6 EB by the mid-2030s. The Square Kilometre Array Observatory (SKAO) will produce continuous data streams requiring advanced data processing at data centers in the host countries and, during full operations, will distribute up to 700 PB/year of scientific data for subsequent analysis to a network of regional centers known as the SKA Regional Centre Network (SRCNet).
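To give a feel for what a yearly volume such as the SKAO figure above implies for networks, the following minimal sketch converts a volume in PB/year into an average sustained transfer rate. It is an illustration only: the function name, the use of decimal petabytes (10**15 bytes), the 365.25-day year, and the assumption of perfectly even transfer with no protocol overhead or replication are ours, not part of the Blueprint.

```python
# Illustrative conversion of a yearly data volume into an average network rate.
# Assumptions (not from the Blueprint): decimal petabytes, a 365.25-day year,
# perfectly even transfer, no protocol overhead, no replication.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sustained_rate_gbit_per_s(petabytes_per_year: float) -> float:
    """Return the average rate in Gbit/s needed to move the given yearly volume."""
    bytes_per_second = petabytes_per_year * 1e15 / SECONDS_PER_YEAR
    return bytes_per_second * 8 / 1e9


# Upper bound quoted for SKAO distribution to SRCNet during full operations.
skao_rate = sustained_rate_gbit_per_s(700)
print(f"700 PB/year is about {skao_rate:.0f} Gbit/s sustained")
```

Under these assumptions, 700 PB/year corresponds to an average on the order of 180 Gbit/s. Real transfers are bursty and replicated, so provisioned capacity would need to be higher than this average.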

High-performance computing (HPC) is set to become a core component for HEP and RA. HEP experiments have been using HPC resources for more than 15 years, with HPC now making significant contributions to both production and analysis pipelines. The Worldwide LHC Computing Grid (WLCG) already comprises approximately 1.4 million cores and over 1.5 EB of storage across 170 sites (home.cern), and the HPC resources exploited by experiments range from 3 sites integrated by ALICE and LHCb to more than 10 sites by ATLAS and CMS. The upcoming HL-LHC and SKAO will only increase these demands.

The European e-infrastructure ecosystem has grown organically over several decades and today includes national, international, and domain-specific initiatives. The EuroHPC Joint Undertaking (JU) provides access to leadership-class supercomputing resources through a network of petascale and exascale systems. The European Open Science Cloud (EOSC) serves as a federated environment for data, services, and software, enabling cross-disciplinary collaboration and FAIR data access across European research communities. The EGI Federation coordinates high-throughput computing resources across multiple member countries and domains. Domain-specific initiatives such as the WLCG and the emerging SRCNet address the particular needs of their scientific communities through specialized architectures and governance structures.

While these individual initiatives have achieved considerable success within their respective domains, the increasing convergence of scientific requirements, as outlined in Section 2 of this document, and the scale of emerging challenges necessitate deeper integration between these large-scale initiatives, while better accounting for the requirements of scientific users. With a coherent, well-integrated, accessible, and scalable compute and data continuum, the science community would benefit from economies of scale, increased efficiency, and a lower barrier to using the large-scale computing resources available in Europe. These challenges are not only about raw computing power. They also raise questions about how infrastructures should be designed and operated in a world of growing data volumes and increasingly complex workflows. Future architectural choices will need to build on the strengths of loosely coupled approaches while carefully assessing where strongly coupled models remain justified by scientific need. Systems must accommodate diverse computing architectures, different data types, and workflows that span facilities and domains. HPC, particularly at supercomputing centers, typically prioritizes tightly coupled, large-scale parallel jobs. As a result, operators are concerned that large ensembles of short, data-driven tasks could overload shared services (such as the Slurm scheduler database) or reduce overall system efficiency. In contrast, data-intensive communities tend to prioritize flexibility and high-throughput task execution.

Recent evaluations of the environmental impacts of the IT industry have shown that these impacts are increasing rapidly, such that an absolute reduction is required to meet the Paris Agreement on climate change and, more generally, to stay within the planetary boundaries. Worldwide, IT is responsible for twice the greenhouse gas (GHG) emissions of Canada, or 5.5 times those of France, and its impact amounts to 40% of every connected individual’s sustainable budget. As a leading European example, in France IT represents 11% of total electricity consumption and 4.4% of total GHG emissions, distributed as 50% for terminals, 46% for datacentres, and 4% for networks. These figures encompass significantly more than scientific IT infrastructures, with important contributions from the consumer market and the Cloud. The forthcoming increase in capacity must be carried out responsibly, that is, sustainably, for the corresponding science to remain socially acceptable. This calls for environmental management of the compute and data continuum described in this Blueprint. Life-cycle analysis, from design to disposal, is prescribed to provide a systemic view, allowing the impacts of successive phases to be balanced and the total footprint to be limited. For existing systems, it will account for realised impacts and guide usage, maintenance, and end-of-life policies; for future systems, it will allow trade-offs between capacity increase and footprint reduction so that growth is sustainable.

It is important to recognise that the communities addressed by SPECTRUM encompass distinct categories of users, each with different operational needs and interaction patterns. These include long-tail scientists who require accessible, lightweight interfaces and opportunistic access; software developers and pipeline engineers responsible for adapting and optimising complex scientific workflows; and instrument and observatory operations teams whose priorities centre on reliability, sustained throughput, and predictable long-term resource commitments. Explicitly differentiating these user groups helps align technical solutions with their respective expectations and ensures that the Blueprint remains relevant across the full spectrum of scientific stakeholders.

This document represents the culmination of an extensive collaborative effort involving leading European scientific organizations and infrastructure providers. The resulting framework provides both tactical guidance for near-term infrastructure decisions and a strategic vision for long-term capability development that will enable breakthrough scientific discoveries across multiple domains while establishing European leadership in scientific computing.