Papers | Parallel Computing
2024
Giulio Malenza, Valentina Cesare, Marco Aldinucci, Ugo Becciani, Alberto Vecchiato
Toward HPC application portability via C++ PSTL: the Gaia AVU-GSR code assessment Journal Article
In: The Journal of Supercomputing, 2024, ISSN: 0920-8542.
Abstract | Links | BibTeX | Tags: eupex, HPC, icsc
@article{24:jsupe:Gaia,
title = {Toward {HPC} application portability via {C++} {PSTL}: the {Gaia} {AVU-GSR} code assessment},
author = {Giulio Malenza and Valentina Cesare and Marco Aldinucci and Ugo Becciani and Alberto Vecchiato},
doi = {10.1007/s11227-024-06011-1},
issn = {0920-8542},
year = {2024},
date = {2024-03-01},
journal = {The Journal of Supercomputing},
publisher = {Springer},
abstract = {The computing capacity needed to process the data generated in modern scientific experiments is approaching ExaFLOPs. Currently, achieving such performances is only feasible through GPU-accelerated supercomputers. Different languages were developed to program GPUs at different levels of abstraction. Typically, the more abstract the languages, the more portable they are across different GPUs. However, the less abstract and co-designed with the hardware, the more room for code optimization and, eventually, the more performance. In the HPC context, portability and performance are a fairly traditional dichotomy. The current C++ Parallel Standard Template Library (PSTL) has the potential to go beyond this dichotomy. In this work, we analyze the main performance benefits and limitations of PSTL using as a use-case the Gaia Astrometric Verification Unit-Global Sphere Reconstruction parallel solver developed by the European Space Agency Gaia mission. The code aims to find the astrometric parameters of $\sim 10^8$ stars in the Milky Way by iteratively solving a linear system of equations with the LSQR algorithm, originally GPU-ported with the CUDA language. We show that the performance obtained with the PSTL version, which is intrinsically more portable than CUDA, is comparable to the CUDA one on NVIDIA GPU architecture.},
keywords = {eupex, HPC, icsc},
pubstate = {published},
tppubtype = {article}
}
Marco Edoardo Santimaria, Samuele Fonio, Giulio Malenza, Iacopo Colonnelli, Marco Aldinucci
Benchmarking Parallelization Models through Karmarkar Interior-point method Proceedings Article
In: González-Vélez, Horacio, Chis, Adriana E. (Ed.): Proc. of 32nd Euromicro intl. Conference on Parallel, Distributed and Network-based Processing (PDP), pp. 1-8, IEEE, Dublin, Ireland, 2024, ISSN: 2377-5750.
Abstract | Links | BibTeX | Tags: HPC, icsc
@inproceedings{24:pdp:karmarkar,
title = {Benchmarking Parallelization Models through {Karmarkar} Interior-point Method},
author = {Marco Edoardo Santimaria and Samuele Fonio and Giulio Malenza and Iacopo Colonnelli and Marco Aldinucci},
editor = {González-Vélez, Horacio and Chis, Adriana E.},
url = {https://hdl.handle.net/2318/1964571},
doi = {10.1109/PDP62718.2024.00010},
issn = {2377-5750},
year = {2024},
date = {2024-03-01},
booktitle = {Proc. of 32nd Euromicro intl. Conference on Parallel, Distributed and Network-based Processing (PDP)},
pages = {1--8},
publisher = {IEEE},
address = {Dublin, Ireland},
abstract = {Optimization problems are one of the main focus of scientific research. Their computational-intensive nature makes them prone to be parallelized with consistent improvements in performance. This paper sheds light on different parallel models for accelerating Karmarkar's Interior-point method. To do so, we assess parallelization strategies for individual operations within the aforementioned Karmarkar's algorithm using OpenMP, GPU acceleration with CUDA, and the recent Parallel Standard C++ Linear Algebra library (PSTL) executing both on GPU and CPU. Our different implementations yield interesting benchmark results that show the optimal approach for parallelizing interior point algorithms for general Linear Programming (LP) problems. In addition, we propose a more theoretical perspective of the parallelization of this algorithm, with a detailed study of our OpenMP implementation, showing the limits of optimizing the single operations},
keywords = {HPC, icsc},
pubstate = {published},
tppubtype = {inproceedings}
}
2023
Gianluca Mittone, Samuele Fonio
Benchmarking Federated Learning Scalability Proceedings Article
In: Proceedings of the 2nd Italian Conference on Big Data and Data Science, ITADATA 2023, September 11-13, 2023, CEUR, Naples, Italy, 2023.
Abstract | Links | BibTeX | Tags: eupilot, HPC, icsc
@inproceedings{23:itadata:extabstract:mittone:fonio,
title = {Benchmarking {Federated Learning} Scalability},
author = {Gianluca Mittone and Samuele Fonio},
url = {https://hdl.handle.net/2318/1933852},
year = {2023},
date = {2023-09-01},
booktitle = {Proceedings of the 2nd Italian Conference on Big Data and Data Science, ITADATA 2023, September 11--13, 2023},
publisher = {CEUR},
address = {Naples, Italy},
abstract = {Federated Learning (FL) is a widespread Machine Learning paradigm handling distributed Big Data. In this work, we demonstrate that different FL frameworks expose different scaling performances despite adopting the same technologies, highlighting the need for a more comprehensive study on the topic.},
keywords = {eupilot, HPC, icsc},
pubstate = {published},
tppubtype = {inproceedings}
}
Valentina Cesare, Ugo Becciani, Alberto Vecchiato, Mario Gilberto Lattanzi, Fabio Pitari, Marco Aldinucci, Beatrice Bucciarelli
The MPI + CUDA Gaia AVU–GSR Parallel Solver Toward Next-generation Exascale Infrastructures Journal Article
In: Publications of the Astronomical Society of the Pacific, vol. 135, no. 1049, 2023.
Abstract | Links | BibTeX | Tags: HPC
@article{23:GAIAMPI_PASP,
title = {The {MPI} + {CUDA} {Gaia} {AVU--GSR} Parallel Solver Toward Next-generation Exascale Infrastructures},
author = {Valentina Cesare and Ugo Becciani and Alberto Vecchiato and Mario Gilberto Lattanzi and Fabio Pitari and Marco Aldinucci and Beatrice Bucciarelli},
url = {https://iopscience.iop.org/article/10.1088/1538-3873/acdf1e/pdf},
doi = {10.1088/1538-3873/acdf1e},
year = {2023},
date = {2023-08-01},
journal = {Publications of the Astronomical Society of the Pacific},
volume = {135},
number = {1049},
abstract = {We ported to the GPU with CUDA the Astrometric Verification Unit–Global Sphere Reconstruction (AVU–GSR) Parallel Solver developed for the ESA Gaia mission, by optimizing a previous OpenACC porting of this application. The code aims to find, with a [10, 100] μarcsec precision, the astrometric parameters of about 10^8 stars, the attitude and instrumental settings of the Gaia satellite, and the global parameter γ of the parametrized Post-Newtonian formalism, by solving a system of linear equations, A × x = b, with the LSQR iterative algorithm. The coefficient matrix A of the final Gaia data set is large, with ∼1011 × 108 elements, and sparse, reaching a size of ∼10–100 TB, typical for the Big Data analysis, which requires an efficient parallelization to obtain scientific results in reasonable timescales. The speedup of the CUDA code over the original AVU–GSR solver, parallelized on the CPU with MPI + OpenMP, increases with the system size and the number of resources, reaching a maximum of ∼14×, >9× over the OpenACC application. This result is obtained by comparing the two codes on the CINECA cluster Marconi100, with 4 V100 GPUs per node. After verifying the agreement between the solutions of a set of systems with different sizes computed with the CUDA and the OpenMP codes and that the solutions showed the required precision, the CUDA code was put in production on Marconi100, essential for an optimal AVU–GSR pipeline and the successive Gaia Data Releases. This analysis represents a first step to understand the (pre-)Exascale behavior of a class of applications that follow the same structure of this code. In the next months, we plan to run this code on the pre-Exascale platform Leonardo of CINECA, with 4 next-generation A200 GPUs per node, toward a porting on this infrastructure, where we expect to obtain even higher performances.},
keywords = {HPC, icsc, eupex},
pubstate = {published},
tppubtype = {article}
}
Gianluca Mittone, Nicolò Tonci, Robert Birke, Iacopo Colonnelli, Doriana Medić, Andrea Bartolini, Roberto Esposito, Emanuele Parisi, Francesco Beneventi, Mirko Polato, Massimo Torquati, Luca Benini, Marco Aldinucci
Experimenting with Emerging RISC-V Systems for Decentralised Machine Learning Proceedings Article
In: 20th ACM International Conference on Computing Frontiers (CF '23), ACM, Bologna, Italy, 2023, ISBN: 979-8-4007-0140-5/23/05, (https://arxiv.org/abs/2302.07946).
Abstract | Links | BibTeX | Tags: ai, confidential, eupilot, HPC, icsc, riscv
@inproceedings{23:mittone:fl-riscv,
title = {Experimenting with Emerging {RISC-V} Systems for Decentralised Machine Learning},
author = {Gianluca Mittone and Nicolò Tonci and Robert Birke and Iacopo Colonnelli and Doriana Medić and Andrea Bartolini and Roberto Esposito and Emanuele Parisi and Francesco Beneventi and Mirko Polato and Massimo Torquati and Luca Benini and Marco Aldinucci},
url = {https://dl.acm.org/doi/pdf/10.1145/3587135.3592211},
doi = {10.1145/3587135.3592211},
isbn = {979-8-4007-0140-5/23/05},
year = {2023},
date = {2023-05-01},
booktitle = {20th ACM International Conference on Computing Frontiers (CF '23)},
publisher = {ACM},
address = {Bologna, Italy},
institution = {Computer Science Department, University of Torino},
abstract = {Decentralised Machine Learning (DML) enables collaborative machine learning without centralised input data. Federated Learning (FL) and Edge Inference are examples of DML. While tools for DML (especially FL) are starting to flourish, many are not flexible and portable enough to experiment with novel systems (e.g., RISC-V), non-fully connected topologies, and asynchronous collaboration schemes. We overcome these limitations via a domain-specific language allowing to map DML schemes to an underlying middleware, i.e. the FastFlow parallel programming library. We experiment with it by generating different working DML schemes on two emerging architectures (ARM-v8, RISC-V) and the x86-64 platform. We characterise the performance and energy efficiency of the presented schemes and systems. As a byproduct, we introduce a RISC-V porting of the PyTorch framework, the first publicly available to our knowledge.},
note = {https://arxiv.org/abs/2302.07946},
keywords = {ai, confidential, eupilot, HPC, icsc, riscv},
pubstate = {published},
tppubtype = {inproceedings}
}
Giorgio Audrito, Alberto Riccardo Martinelli, Gianluca Torta
Parallelising an Aggregate Programming Framework with Message-Passing Interface Proceedings Article
In: 2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), pp. 140–145, 2023.
@inproceedings{23:acsos:fcppmpi,
title = {Parallelising an Aggregate Programming Framework with {Message-Passing Interface}},
author = {Giorgio Audrito and Alberto Riccardo Martinelli and Gianluca Torta},
doi = {10.1109/ACSOS-C58168.2023.00054},
year = {2023},
date = {2023-01-01},
booktitle = {2023 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)},
pages = {140--145},
keywords = {HPC},
pubstate = {published},
tppubtype = {inproceedings}
}
Javier Garcia-Blas, Genaro Sanchez-Gallegos, Cosmin Petre, Alberto Riccardo Martinelli, Marco Aldinucci, Jesus Carretero
Hercules: Scalable and Network Portable In-Memory Ad-Hoc File System for Data-Centric and High-Performance Applications Proceedings Article
In: Cano, José, Dikaiakos, Marios D., Papadopoulos, George A., Pericàs, Miquel, Sakellariou, Rizos (Ed.): Euro-Par 2023: Parallel Processing, pp. 679–693, Springer Nature Switzerland, Cham, 2023, ISBN: 978-3-031-39698-4.
Abstract | BibTeX | Tags: admire, HPC
@inproceedings{10.1007/978-3-031-39698-4_46,
title = {{Hercules}: Scalable and Network Portable In-Memory Ad-Hoc File System for Data-Centric and High-Performance Applications},
author = {Javier Garcia-Blas and Genaro Sanchez-Gallegos and Cosmin Petre and Alberto Riccardo Martinelli and Marco Aldinucci and Jesus Carretero},
editor = {José Cano and Marios D. Dikaiakos and George A. Papadopoulos and Miquel Pericàs and Rizos Sakellariou},
isbn = {978-3-031-39698-4},
year = {2023},
date = {2023-01-01},
booktitle = {Euro-Par 2023: Parallel Processing},
pages = {679--693},
publisher = {Springer Nature Switzerland},
address = {Cham},
abstract = {The growing demands for data processing by new data-intensive applications are putting pressure on the performance and capacity of HPC storage systems. The advancement in storage technologies, such as NVMe and persistent memory, are aimed at meeting these demands. However, relying solely on ultra-fast storage devices is not cost-effective, leading to the need for multi-tier storage hierarchies to move data based on its usage. To address this issue, ad-hoc file systems have been proposed as a solution. They utilise the available storage of compute nodes, such as memory and persistent storage, to create a temporary file system that adapts to the application behaviour in the HPC environment. This work presents the design, implementation, and evaluation of a distributed ad-hoc in-memory storage system (Hercules), highlighting the new communication model included in Hercules. This communication model takes advantage of the Unified Communication X framework (UCX). This solution leverages the capabilities of RDMA protocols, including Infiniband, Onmipath, shared memory, and zero-copy transfers. The preliminary evaluation results show excellent network utilisation compared with other existing technologies.},
keywords = {admire, HPC},
pubstate = {published},
tppubtype = {inproceedings}
}
2021
Marco Aldinucci, Valentina Cesare, Iacopo Colonnelli, Alberto Riccardo Martinelli, Gianluca Mittone, Barbara Cantalupo, Carlo Cavazzoni, Maurizio Drocco
Practical Parallelization of Scientific Applications with OpenMP, OpenACC and MPI Journal Article
In: Journal of Parallel and Distributed Computing, vol. 157, pp. 13–29, 2021.
Abstract | Links | BibTeX | Tags: HPC
@article{21:jpdc:loop,
title = {Practical Parallelization of Scientific Applications with {OpenMP}, {OpenACC} and {MPI}},
author = {Marco Aldinucci and Valentina Cesare and Iacopo Colonnelli and Alberto Riccardo Martinelli and Gianluca Mittone and Barbara Cantalupo and Carlo Cavazzoni and Maurizio Drocco},
url = {https://iris.unito.it/retrieve/handle/2318/1792557/770851/Practical_Parallelization_JPDC_preprint.pdf},
doi = {10.1016/j.jpdc.2021.05.017},
year = {2021},
date = {2021-01-01},
journal = {Journal of Parallel and Distributed Computing},
volume = {157},
pages = {13--29},
abstract = {This work aims at distilling a systematic methodology to modernize existing sequential scientific codes with a little re-designing effort, turning an old codebase into \emph{modern} code, i.e., parallel and robust code. We propose a semi-automatic methodology to parallelize scientific applications designed with a purely sequential programming mindset, possibly using global variables, aliasing, random number generators, and stateful functions. We demonstrate that the same methodology works for the parallelization in the shared memory model (via OpenMP), message passing model (via MPI), and General Purpose Computing on GPU model (via OpenACC). The method is demonstrated parallelizing four real-world sequential codes in the domain of physics and material science. The methodology itself has been distilled in collaboration with MSc students of the Parallel Computing course at the University of Torino, that applied it for the first time to the project works that they presented for the final exam of the course. Every year the course hosts some special lectures from industry representatives, who present how they use parallel computing and offer codes to be parallelizeda.},
keywords = {HPC},
pubstate = {published},
tppubtype = {article}
}
Daniele D'Agostino, Ivan Merelli, Marco Aldinucci, Daniele Cesini
Hardware and Software Solutions for Energy-Efficient Computing in Scientific Programming Journal Article
In: Scientific Programming, vol. 2021, pp. 5514284, 2021, ISBN: 1058-9244.
Abstract | Links | BibTeX | Tags: HPC
@article{21:dagostino:lowpower,
title = {Hardware and Software Solutions for Energy-Efficient Computing in Scientific Programming},
author = {Daniele D'Agostino and Ivan Merelli and Marco Aldinucci and Daniele Cesini},
url = {https://downloads.hindawi.com/journals/sp/2021/5514284.pdf},
doi = {10.1155/2021/5514284},
issn = {1058-9244},
year = {2021},
date = {2021-01-01},
journal = {Scientific Programming},
volume = {2021},
pages = {5514284},
publisher = {Hindawi},
abstract = {Energy consumption is one of the major issues in today’s computer science, and an increasing number of scientific communities are interested in evaluating the tradeoff between time-to-solution and energy-to-solution. Despite, in the last two decades, computing which revolved around centralized computing infrastructures, such as supercomputing and data centers, the wide adoption of the Internet of Things (IoT) paradigm is currently inverting this trend due to the huge amount of data it generates, pushing computing power back to places where the data are generated—the so-called fog/edge computing. This shift towards a decentralized model requires an equivalent change in the software engineering paradigms, development environments, hardware tools, languages, and computation models for scientific programming because the local computational capabilities are typically limited and require a careful evaluation of power consumption. This paper aims to present how these concepts can be actually implemented in scientific software by presenting the state of the art of powerful, less power-hungry processors from one side and energy-aware tools and techniques from the other one.},
keywords = {HPC},
pubstate = {published},
tppubtype = {article}
}
2020
Vasco Amaral, Beatriz Norberto, Miguel Goulão, Marco Aldinucci, Siegfried Benkner, Andrea Bracciali, Paulo Carreira, Edgars Celms, Luís Correia, Clemens Grelck, Helen Karatza, Christoph Kessler, Peter Kilpatrick, Hugo Martiniano, Ilias Mavridis, Sabri Pllana, Ana Respício, José Simão, Luís Veiga, Ari Visa
Programming languages for data-Intensive HPC applications: A systematic mapping study Journal Article
In: Parallel Computing, pp. 102584, 2020, ISSN: 0167-8191.
Abstract | Links | BibTeX | Tags: HPC
@article{20:sms:chipset,
title = {Programming languages for data-Intensive {HPC} applications: A systematic mapping study},
author = {Vasco Amaral and Beatriz Norberto and Miguel Goulão and Marco Aldinucci and Siegfried Benkner and Andrea Bracciali and Paulo Carreira and Edgars Celms and Luís Correia and Clemens Grelck and Helen Karatza and Christoph Kessler and Peter Kilpatrick and Hugo Martiniano and Ilias Mavridis and Sabri Pllana and Ana Respício and José Simão and Luís Veiga and Ari Visa},
url = {https://iris.unito.it/retrieve/689605/1-s2.0-S0167819119301759-main.pdf},
doi = {10.1016/j.parco.2019.102584},
issn = {0167-8191},
year = {2020},
date = {2020-01-01},
journal = {Parallel Computing},
pages = {102584},
abstract = {A major challenge in modelling and simulation is the need to combine expertise in both software technologies and a given scientific domain. When High-Performance Computing (HPC) is required to solve a scientific problem, software development becomes a problematic issue. Considering the complexity of the software for HPC, it is useful to identify programming languages that can be used to alleviate this issue. Because the existing literature on the topic of HPC is very dispersed, we performed a Systematic Mapping Study (SMS) in the context of the European COST Action cHiPSet. This literature study maps characteristics of various programming languages for data-intensive HPC applications, including category, typical user profiles, effectiveness, and type of articles. We organised the SMS in two phases. In the first phase, relevant articles are identified employing an automated keyword-based search in eight digital libraries. This lead to an initial sample of 420 papers, which was then narrowed down in a second phase by human inspection of article abstracts, titles and projects to 152 relevant articles published in the period 2006–2018. The analysis of these articles enabled us to identify 26 programming languages referred to in 33 of relevant articles. We compared the outcome of the mapping study with results of our questionnaire-based survey that involved 57 HPC experts. The mapping study and the survey revealed that the desired features of programming languages for data-intensive HPC applications are portability, performance and usability. Furthermore, we observed that the majority of the programming languages used in the context of data-intensive HPC applications are text-based general-purpose programming languages. Typically these have a steep learning curve, which makes them difficult to adopt. We believe that the outcome of this study will inspire future research and development in programming languages for data-intensive HPC applications.},
keywords = {HPC},
pubstate = {published},
tppubtype = {article}
}
2019
Clemens Grelck, Ewa Niewiadomska-Szynkiewicz, Marco Aldinucci, Andrea Bracciali, Elisabeth Larsson
Why High-Performance Modelling and Simulation for Big Data Applications Matters Book Chapter
In: Kołodziej, Joanna, González-Vélez, Horacio (Ed.): High-Performance Modelling and Simulation for Big Data Applications: Selected Results of the COST Action IC1406 cHiPSet, no. 11400, pp. 1–35, Springer International Publishing, Cham, 2019, ISBN: 978-3-030-16272-6.
Abstract | Links | BibTeX | Tags: HPC
@inbook{Grelck2019,
title = {Why High-Performance Modelling and Simulation for Big Data Applications Matters},
author = {Clemens Grelck and Ewa Niewiadomska-Szynkiewicz and Marco Aldinucci and Andrea Bracciali and Elisabeth Larsson},
editor = {Joanna Kołodziej and Horacio González-Vélez},
url = {https://link.springer.com/content/pdf/10.1007%2F978-3-030-16272-6_1.pdf},
doi = {10.1007/978-3-030-16272-6_1},
isbn = {978-3-030-16272-6},
year = {2019},
date = {2019-01-01},
booktitle = {High-Performance Modelling and Simulation for Big Data Applications: Selected Results of the COST Action IC1406 cHiPSet},
volume = {11400},
pages = {1--35},
publisher = {Springer International Publishing},
address = {Cham},
series = {LNCS},
abstract = {Modelling and Simulation (M&S) offer adequate abstractions to manage the complexity of analysing big data in scientific and engineering domains. Unfortunately, big data problems are often not easily amenable to efficient and effective use of High Performance Computing (HPC) facilities and technologies. Furthermore, M&S communities typically lack the detailed expertise required to exploit the full potential of HPC solutions while HPC specialists may not be fully aware of specific modelling and simulation requirements and applications.},
keywords = {HPC},
pubstate = {published},
tppubtype = {inbook}
}
2018
Claudia Misale, Maurizio Drocco, Guy Tremblay, Alberto R. Martinelli, Marco Aldinucci
PiCo: High-performance data analytics pipelines in modern C++ Journal Article
In: Future Generation Computer Systems, vol. 87, pp. 392–403, 2018.
Abstract | Links | BibTeX | Tags: fastflow, HPC, toreador
@article{18:fgcs:pico,
title = {{PiCo}: High-performance data analytics pipelines in modern {C++}},
author = {Claudia Misale and Maurizio Drocco and Guy Tremblay and Alberto R. Martinelli and Marco Aldinucci},
url = {https://iris.unito.it/retrieve/handle/2318/1668444/414280/fgcs_pico.pdf},
doi = {10.1016/j.future.2018.05.030},
year = {2018},
date = {2018-01-01},
journal = {Future Generation Computer Systems},
volume = {87},
pages = {392--403},
abstract = {In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo's programming model aims at making easier the programming of data analytics applications while preserving or enhancing their performance. This is attained through three key design choices: (1) unifying batch and stream data access models, (2) decoupling processing from data layout, and (3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types in the sense that it is possible to reuse the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo, when compared to Spark and Flink, can attain better performances in terms of execution times and can hugely improve memory utilization, both for batch and stream processing.},
keywords = {fastflow, HPC, toreador},
pubstate = {published},
tppubtype = {article}
}
Marco Aldinucci, Marco Danelutto, Maurizio Drocco, Peter Kilpatrick, Claudia Misale, Guilherme Peretti Pezzi, Massimo Torquati
A Parallel Pattern for Iterative Stencil + Reduce Journal Article
In: Journal of Supercomputing, vol. 74, no. 11, pp. 5690–5705, 2018.
Abstract | Links | BibTeX | Tags: HPC, repara, rephrase
@article{16:stencilreduce:jsupe,
title = {A Parallel Pattern for Iterative Stencil + Reduce},
author = {Marco Aldinucci and Marco Danelutto and Maurizio Drocco and Peter Kilpatrick and Claudia Misale and Guilherme Peretti Pezzi and Massimo Torquati},
url = {https://iris.unito.it/retrieve/0716fc42-53d7-48c0-9469-697aabfe7759/jspaper.pdf},
doi = {10.1007/s11227-016-1871-z},
year = {2018},
date = {2018-01-01},
journal = {Journal of Supercomputing},
volume = {74},
number = {11},
pages = {5690--5705},
abstract = {We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.},
keywords = {HPC, repara, rephrase},
pubstate = {published},
tppubtype = {article}
}
2017
Salvatore Cuomo, Marco Aldinucci, Massimo Torquati
Guest Editorial for Programming Models and Algorithms for Data Analysis in HPC Systems Journal Article
In: International Journal of Parallel Programming, pp. 1–3, 2017, ISSN: 0885-7458, (Editorial).
Abstract | Links | BibTeX | Tags: HPC
@article{17:ijpp:cuomo:editorial,
title = {Guest Editorial for Programming Models and Algorithms for Data Analysis in {HPC} Systems},
author = {Salvatore Cuomo and Marco Aldinucci and Massimo Torquati},
url = {https://doi.org/10.1007/s10766-017-0531-0},
doi = {10.1007/s10766-017-0531-0},
issn = {0885-7458},
year = {2017},
date = {2017-10-01},
journal = {International Journal of Parallel Programming},
pages = {1--3},
abstract = {Performance is still the hottest keyword in parallel and distributed systems: performance evaluation, design for performance, performance portability and scalability are just a few of the many possible declinations that nowadays are of paramount scientific importance. To tackle these challenges, system architects, applications programmers and data center managers need methodological tools to fit at best the overall workload and the available architecture, maximizing the overall performances and minimizing overheads, energy consumption or idle time while application developers mainly aim at algorithmic and software oriented performances. Proper methodologies for modeling and analysis are the way to turn complexity into opportunities.
This Special Issue of the International Journal of Parallel Programming welcomes papers that present practical and methodological approaches to analytical and simulative performance evaluation for architecturally complex systems and high-performance parallel and computing algorithm. Successful contributions have been done on specific technologies, applications and innovative solutions to system specifications and algorithmic schemes both.},
note = {Editorial},
keywords = {HPC},
pubstate = {published},
tppubtype = {article}
}
This Special Issue of the International Journal of Parallel Programming welcomes papers that present practical and methodological approaches to analytical and simulative performance evaluation for architecturally complex systems and high-performance parallel and computing algorithm. Successful contributions have been done on specific technologies, applications and innovative solutions to system specifications and algorithmic schemes both.
Paolo Viviani, Massimo Torquati, Marco Aldinucci, Roberto d'Ippolito
Multiple back-end support for the Armadillo linear algebra interface Proceedings Article
In: In proc. of the 32nd ACM Symposium on Applied Computing (SAC), pp. 1566–1573, Marrakesh, Morocco, 2017.
Abstract | Links | BibTeX | Tags: HPC, repara, rephrase
@inproceedings{17:sac:armadillo,
title = {Multiple back-end support for the {Armadillo} linear algebra interface},
author = {Paolo Viviani and Massimo Torquati and Marco Aldinucci and Roberto d'Ippolito},
url = {https://iris.unito.it/retrieve/handle/2318/1626229/299089/armadillo_4aperto.pdf},
year = {2017},
date = {2017-04-01},
booktitle = {Proc. of the 32nd {ACM} Symposium on Applied Computing ({SAC})},
pages = {1566--1573},
address = {Marrakesh, Morocco},
abstract = {The Armadillo C++ library provides programmers with a high-level Matlab-like syntax for linear algebra. Its design aims at providing a good balance between speed and ease of use. It can be linked with different back-ends, i.e. different LAPACK-compliant libraries. In this work we present a novel run-time support of Armadillo, which gracefully extends mainstream implementation to enable back-end switching without recompilation and multiple back-end support. The extension is specifically designed to not affect Armadillo class template prototypes, thus to be easily interoperable with future evolutions of the Armadillo library itself. The proposed software stack is then tested for functionality and performance against a kernel code extracted from an industrial application.},
keywords = {HPC, repara, rephrase},
pubstate = {published},
tppubtype = {inproceedings}
}
Marco Aldinucci, Stefano Bagnasco, Stefano Lusso, Paolo Pasteris, Sergio Rabellino
OCCAM: a flexible, multi-purpose and extendable HPC cluster Proceedings Article
In: Journal of Physics: Conf. Series (CHEP 2016), pp. 082039, San Francisco, USA, 2017.
Abstract | Links | BibTeX | Tags: c3s, HPC
@inproceedings{16:occam:chep,
title = {{OCCAM}: a flexible, multi-purpose and extendable {HPC} cluster},
author = {Marco Aldinucci and Stefano Bagnasco and Stefano Lusso and Paolo Pasteris and Sergio Rabellino},
url = {http://iopscience.iop.org/article/10.1088/1742-6596/898/8/082039/meta},
doi = {10.1088/1742-6596/898/8/082039},
year = {2017},
date = {2017-01-01},
booktitle = {Journal of Physics: Conf. Series (CHEP 2016)},
volume = {898},
number = {8},
pages = {082039},
address = {San Francisco, USA},
abstract = {Obtaining CPU cycles on an HPC cluster is nowadays relatively simple and sometimes even cheap for academic institutions. However, in most of the cases providers of HPC services would not allow changes on the configuration, implementation of special features or a lower-level control on the computing infrastructure and networks, for example for testing new computing patterns or conducting research on HPC itself. The variety of use cases proposed by several departments of the University of Torino, including ones from solid-state chemistry, high-energy physics, computer science, big data analytics, computational biology, genomics and many others, called for different and sometimes conflicting configurations; furthermore, several R&D activities in the field of scientific computing, with topics ranging from GPU acceleration to Cloud Computing technologies, needed a platform to be carried out on. The Open Computing Cluster for Advanced data Manipulation (OCCAM) is a multi-purpose flexible HPC cluster designed and operated by a collaboration between the University of Torino and the Torino branch of the Istituto Nazionale di Fisica Nucleare. It is aimed at providing a flexible, reconfigurable and extendable infrastructure to cater to a wide range of different scientific computing needs, as well as a platform for R&D activities on computational technologies themselves. Extending it with novel architecture CPU, accelerator or hybrid microarchitecture (such as forthcoming Intel Xeon Phi Knights Landing) should be as a simple as plugging a node in a rack. 
The initial system counts slightly more than 1100 cpu cores and includes different types of computing nodes (standard dual-socket nodes, large quad-sockets nodes with 768 GB RAM, and multi-GPU nodes) and two separate disk storage subsystems: a smaller high-performance scratch area, based on the Lustre file system, intended for direct computational I/O and a larger one, of the order of 1PB, to archive near-line data for archival purposes. All the components of the system are interconnected through a 10Gb/s Ethernet layer with one-level topology and an InfiniBand FDR 56Gbps layer in fat-tree topology. A system of this kind, heterogeneous and reconfigurable by design, poses a number of challenges related to the frequency at which heterogeneous hardware resources might change their availability and shareability status, which in turn affect methods and means to allocate, manage, optimize, bill, monitor VMs, virtual farms, jobs, interactive bare-metal sessions, etc. This poster describes some of the use cases that prompted the design ad construction of the HPC cluster, its architecture and a first characterization of its performance by some synthetic benchmark tools and a few realistic use-case tests.},
keywords = {c3s, HPC},
pubstate = {published},
tppubtype = {inproceedings}
}
Concetto Spampinato, Simone Palazzo, Daniela Giordano, Marco Aldinucci, Rosalia Leonardi
Deep learning for automated skeletal bone age assessment in X-ray images Journal Article
In: Medical Image Analysis, vol. 36, pp. 41–51, 2017.
Abstract | Links | BibTeX | Tags: HPC
@article{17:deepx:conce,
  author    = {Concetto Spampinato and Simone Palazzo and Daniela Giordano and Marco Aldinucci and Rosalia Leonardi},
  title     = {Deep learning for automated skeletal bone age assessment in X-ray images},
  journal   = {Medical Image Analysis},
  volume    = {36},
  pages     = {41–51},
  year      = {2017},
  date      = {2017-01-01},
  url       = {https://iris.unito.it/retrieve/e27ce42b-5743-2581-e053-d805fe0acbaa/main.pdf},
  doi       = {10.1016/j.media.2016.10.010},
  abstract  = {Skeletal bone age assessment is a common clinical practice to investigate endocrinology, genetic and growth disorders in children. It is generally performed by radiological examination of the left hand by using either the Greulich and Pyle (G&P) method or the Tanner–Whitehouse (TW) one. However, both clinical procedures show several limitations, from the examination effort of radiologists to (most importantly) significant intra- and inter-operator variability. To address these problems, several automated approaches (especially relying on the TW method) have been proposed; nevertheless, none of them has been proved able to generalize to different races, age ranges and genders. In this paper, we propose and test several deep learning approaches to assess skeletal bone age automatically; the results showed an average discrepancy between manual and automatic evaluation of about 0.8 years, which is state-of-the-art performance. Furthermore, this is the first automated skeletal bone age assessment work tested on a public dataset and for all age ranges, races and genders, for which the source code is available, thus representing an exhaustive baseline for future research in the field. Beside the specific application scenario, this paper aims at providing answers to more general questions about deep learning on medical images: from the comparison between deep-learned features and manually-crafted ones, to the usage of deep-learning methods trained on general imagery for medical problems, to how to train a CNN with few images.},
  keywords  = {HPC},
  pubstate  = {published},
  tppubtype = {article},
}
Wissam Abu Ahmad, Andrea Bartolini, Francesco Beneventi, Luca Benini, Andrea Borghesi, Marco Cicala, Privato Forestieri, Cosimo Gianfreda, Daniele Gregori, Antonio Libri, Filippo Spiga, Simone Tinti
Design of an Energy Aware Petaflops Class High Performance Cluster Based on Power Architecture Proceedings Article
In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops 2017, Orlando / Buena Vista, FL, USA, May 29 - June 2, 2017, pp. 964–973, 2017.
@inproceedings{DBLP:conf/ipps/AhmadBBBBCFGGLS17,
title = {Design of an Energy Aware Petaflops Class High Performance Cluster Based on Power Architecture},
author = {Wissam Abu Ahmad and Andrea Bartolini and Francesco Beneventi and Luca Benini and Andrea Borghesi and Marco Cicala and Privato Forestieri and Cosimo Gianfreda and Daniele Gregori and Antonio Libri and Filippo Spiga and Simone Tinti},
doi = {10.1109/IPDPSW.2017.22},
year = {2017},
date = {2017-01-01},
booktitle = {2017 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops 2017, Orlando / Buena Vista, FL, USA, May 29 - June 2, 2017},
pages = {964–973},
keywords = {HPC},
pubstate = {published},
tppubtype = {inproceedings}
}
2016
Paolo Viviani, Marco Aldinucci, Roberto d'Ippolito
An hybrid linear algebra framework for engineering Proceedings Article
In: Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) – Poster Abstracts, Fiuggi, Italy, 2016.
Abstract | Links | BibTeX | Tags: HPC, repara
@inproceedings{16:acaces:armadillo,
title = {An hybrid linear algebra framework for engineering},
author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito},
url = {https://iris.unito.it/retrieve/handle/2318/1622382/300198/armadillo.pdf},
year = {2016},
date = {2016-07-01},
booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) – Poster Abstracts},
address = {Fiuggi, Italy},
abstract = {The aim of this work is to provide developers and domain experts with simple (Matlab-like) interface for performing linear algebra tasks while retaining state-of-the-art computational speed. To achieve this goal we extend Armadillo C++ library is extended in order to support with multiple LAPACK-compliant back-ends targeting different architectures including CUDA GPUs; moreover our approach involves the possibility of dynamically switching between such back-ends in order to select the one which is most convenient based on the specific problem and hardware configuration. This approach is eventually validated within an industrial environment.},
keywords = {HPC, repara},
pubstate = {published},
tppubtype = {inproceedings}
}
Bogdan Nicolae, Carlos H. A. Costa, Claudia Misale, Kostas Katrinis, Yoonho Park
Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics Proceedings Article
In: IEEE/ACM 16th Intl. Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016, IEEE, Cartagena, Colombia, 2016.
Abstract | Links | BibTeX | Tags: HPC
@inproceedings{16:ccgrid:misale,
  author    = {Bogdan Nicolae and Carlos H. A. Costa and Claudia Misale and Kostas Katrinis and Yoonho Park},
  title     = {Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics},
  booktitle = {IEEE/ACM 16th Intl. Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016},
  publisher = {IEEE},
  address   = {Cartagena, Colombia},
  year      = {2016},
  date      = {2016-01-01},
  url       = {http://ieeexplore.ieee.org/document/7515716/},
  doi       = {10.1109/CCGrid.2016.85},
  abstract  = {Big data analytics is an indispensable tool in transforming science, engineering, medicine, healthcare, finance and ultimately business itself. With the explosion of data sizes and need for shorter time-to-solution, in-memory platforms such as Apache Spark gain increasing popularity. However, this introduces important challenges, among which data shuffling is particularly difficult: on one hand it is a key part of the computation that has a major impact on the overall performance and scalability so its efficiency is paramount, while on the other hand it needs to operate with scarce memory in order to leave as much memory available for data caching. In this context, efficient scheduling of data transfers such that it addresses both dimensions of the problem simultaneously is non-trivial. State-of-the-art solutions often rely on simple approaches that yield sub optimal performance and resource usage. This paper contributes a novel shuffle data transfer strategy that dynamically adapts to the computation with minimal memory utilization, which we briefly underline as a series of design principles.},
  keywords  = {HPC},
  pubstate  = {published},
  tppubtype = {inproceedings},
}
Paolo Viviani, Marco Aldinucci, Roberto d'Ippolito, Jean Lemeire, Dean Vucinic
A flexible numerical framework for engineering - a Response Surface Modelling application Unpublished
2016.
Abstract | BibTeX | Tags: HPC, repara, rephrase
@unpublished{16:acex:armadillo,
title = {A flexible numerical framework for engineering - a Response Surface Modelling application},
author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito and Jean Lemeire and Dean Vucinic},
year = {2016},
date = {2016-01-01},
booktitle = {10th Intl. Conference on Advanced Computational Engineering and Experimenting (ACE-X)},
note = {Presented at the 10th Intl. Conference on Advanced Computational Engineering and Experimenting (ACE-X)},
abstract = {This work presents the innovative approach adopted for the development of a new numerical software framework for accelerating Dense Linear Algebra calculations and its application within an engineering context. In particular, Response Surface Models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different Response Surfaces has to be calculated in order to improve the overall accuracy (e.g. like when using Ensemble Modelling techniques). On the other hand, it is a known challenge that the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, while they can lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications and especially of RSMs calculations with a user-friendly syntax that allows good portability between different hardware architectures, with no need of specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a Radial Basis Function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.},
keywords = {HPC, repara, rephrase},
pubstate = {published},
tppubtype = {unpublished}
}
2015
Marco Aldinucci, Marco Danelutto, Maurizio Drocco, Peter Kilpatrick, Guilherme Peretti Pezzi, Massimo Torquati
The Loop-of-Stencil-Reduce paradigm Proceedings Article
In: Proc. of Intl. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara), pp. 172–177, IEEE, Helsinki, Finland, 2015.
Abstract | Links | BibTeX | Tags: fastflow, HPC, repara
@inproceedings{opencl:ff:ispa:15,
  author    = {Marco Aldinucci and Marco Danelutto and Maurizio Drocco and Peter Kilpatrick and Guilherme Peretti Pezzi and Massimo Torquati},
  title     = {The Loop-of-Stencil-Reduce paradigm},
  booktitle = {Proc. of Intl. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara)},
  pages     = {172–177},
  publisher = {IEEE},
  address   = {Helsinki, Finland},
  year      = {2015},
  date      = {2015-08-01},
  url       = {https://iris.unito.it/retrieve/handle/2318/1523738/52857/15_RePara_ISPA.pdf},
  doi       = {10.1109/Trustcom.2015.628},
  abstract  = {In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of the Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both.},
  keywords  = {fastflow, HPC, repara},
  pubstate  = {published},
  tppubtype = {inproceedings},
}
Marco Aldinucci, Guilherme Peretti Pezzi, Maurizio Drocco, Concetto Spampinato, Massimo Torquati
Parallel Visual Data Restoration on Multi-GPGPUs using Stencil-Reduce Pattern Journal Article
In: International Journal of High Performance Computing Applications, vol. 29, no. 4, pp. 461–472, 2015.
Abstract | Links | BibTeX | Tags: fastflow, HPC, impact, paraphrase
@article{ff:denoiser:ijhpca:15,
  author    = {Marco Aldinucci and Guilherme Peretti Pezzi and Maurizio Drocco and Concetto Spampinato and Massimo Torquati},
  title     = {Parallel Visual Data Restoration on Multi-GPGPUs using Stencil-Reduce Pattern},
  journal   = {International Journal of High Performance Computing Applications},
  volume    = {29},
  number    = {4},
  pages     = {461–472},
  year      = {2015},
  date      = {2015-01-01},
  url       = {https://iris.unito.it/retrieve/handle/2318/1522073/299200/ijhpca_4aperto.pdf},
  doi       = {10.1177/1094342014567907},
  abstract  = {In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.},
  keywords  = {fastflow, HPC, impact, paraphrase},
  pubstate  = {published},
  tppubtype = {article},
}
2014
Marco Aldinucci, Massimo Torquati, Maurizio Drocco, Guilherme Peretti Pezzi, Concetto Spampinato
FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs Proceedings Article
In: GPU Technology Conference (GTC), San Jose, CA, USA, 2014.
Abstract | Links | BibTeX | Tags: fastflow, HPC, impact, paraphrase
@inproceedings{ff:gtc:2014,
  author    = {Marco Aldinucci and Massimo Torquati and Maurizio Drocco and Guilherme Peretti Pezzi and Concetto Spampinato},
  title     = {FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs},
  booktitle = {GPU Technology Conference (GTC)},
  address   = {San Jose, CA, USA},
  year      = {2014},
  date      = {2014-03-01},
  url       = {http://calvados.di.unipi.it/storage/talks/2014_S4729-Marco-Aldinucci.pdf},
  abstract  = {Learn how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. As use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.},
  keywords  = {fastflow, HPC, impact, paraphrase},
  pubstate  = {published},
  tppubtype = {inproceedings},
}
Marco Aldinucci, Massimo Torquati, Maurizio Drocco, Guilherme Peretti Pezzi, Concetto Spampinato
An Overview of FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs Proceedings Article
In: GPU Technology Conference (GTC), San Jose, CA, USA, 2014.
Abstract | Links | BibTeX | Tags: fastflow, HPC, impact, paraphrase
@inproceedings{ff:gtc:2014:short,
  author    = {Marco Aldinucci and Massimo Torquati and Maurizio Drocco and Guilherme Peretti Pezzi and Concetto Spampinato},
  title     = {An Overview of FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs},
  booktitle = {GPU Technology Conference (GTC)},
  address   = {San Jose, CA, USA},
  year      = {2014},
  date      = {2014-03-01},
  url       = {http://calvados.di.unipi.it/storage/talks/2014_S4585-Marco-Aldinucci.pdf},
  abstract  = {Get an overview of FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. For a more detailed and technical review of FastFlow's parallel patterns as well as a use case where we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.},
  keywords  = {fastflow, HPC, impact, paraphrase},
  pubstate  = {published},
  tppubtype = {inproceedings},
}
Maurizio Drocco, Marco Aldinucci, Massimo Torquati
A Dynamic Memory Allocator for heterogeneous platforms Proceedings Article
In: Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) – Poster Abstracts, HiPEAC, Fiuggi, Italy, 2014.
Abstract | Links | BibTeX | Tags: fastflow, HPC
@inproceedings{ff:acaces:14,
  author    = {Maurizio Drocco and Marco Aldinucci and Massimo Torquati},
  title     = {A Dynamic Memory Allocator for heterogeneous platforms},
  booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) – Poster Abstracts},
  publisher = {HiPEAC},
  address   = {Fiuggi, Italy},
  year      = {2014},
  date      = {2014-01-01},
  url       = {http://calvados.di.unipi.it/storage/paper_files/2014_ACACES_ex-abstract.pdf},
  abstract  = {Modern computers are built upon heterogeneous multi-core/many cores architectures (e.g. GPGPU connected to multi-core CPU). Achieving peak performance on these architectures is hard and may require a substantial programming effort. High-level programming patterns, coupled with efficient low-level runtime supports, have been proposed to relieve the programmer from worrying about low-level details such as synchronisation of racing processes as well as those fine tunings needed to improve the overall performance. Among them are (parallel) dynamic memory allocation and effective exploitation of the memory hierarchy. The memory allocator is often a bottleneck that severely limits program scalability, robustness and portability on parallel systems. In this work we introduce a novel memory allocator, based on the FastFlow's allocator and the recently proposed CUDA Unified Memory, which aims to efficiently integrate host and device memories into a unique dynamic-allocable memory space, accessible transparently by both host and device code.},
  keywords  = {fastflow, HPC},
  pubstate  = {published},
  tppubtype = {inproceedings},
}