This deliverable presents an analysis of the requirements of representative next-generation use cases in big data sciences, together with a cross-cutting analysis of common themes and trends. It also describes the methodology used to identify, select, and analyse the use cases, and briefly reflects on their applicability to other fields of research.
The use cases are drawn primarily from the fields of High Energy Physics (HEP) and Radio Astronomy (RA). In the coming years, both fields will bring new instruments online at around the same time, each producing in excess of one Exabyte of data per year, several orders of magnitude beyond what current compute capabilities can handle.
Several other complementary use cases are presented that highlight related challenges. The main findings in this deliverable are reported in a cross-cutting analysis highlighting common requirements and challenges, developed from the technical analysis of individual use case requirements.
The key findings identify a significant need to greatly expand the compute capabilities of the fields to process unprecedented volumes of data. It is clear this must include heterogeneous compute models with GPUs and specialized accelerators, an area in which all use cases are investing heavily. Deployment of advanced workflow management systems capable of orchestrating large multi-step workflows across distributed resources is critical to Exascale computing. The geographical distribution of the research collaborations and the sheer scale of data involved necessitate robust data federation mechanisms. These must support efficient data discovery, access, and transfer across multiple sites and infrastructures.
Finally, significant technical challenges remain, particularly in I/O performance and energy efficiency.
This deliverable was produced by SPECTRUM Work Package 5, "Landscape, use cases, challenges and gaps", with significant contributions from subject matter experts representing the selected use cases.
The deliverable will serve as one of the inputs to the second phase of SPECTRUM (months 16-30) and, ultimately, to the SRIDA and Technical Blueprint, which are to be delivered at the end of the project.