Giulio Malenza
PhD Student
Computer Science Department, University of Turin
Parallel Computing group
Via Pessinetto 12, 10149 Torino – Italy
E-mail: giulio.malenza@unito.it
Short Bio
Giulio Malenza is a Ph.D. student in computer science at the University of Turin. He holds a master's in High Performance Computing, a master's degree in Mathematical Engineering, and a bachelor's degree in Physics.
Fields of interest:
- High Performance Computing
- Parallel Computing
- Scientific Computing
Publications
2024
Gianluca Mittone, Giulio Malenza, Marco Aldinucci, Robert Birke
Distributed Edge Inference: an Experimental Study on Multiview Detection Proceedings Article
In: Proc. of the 16th IEEE/ACM Intl. Conference on Utility and Cloud Computing Companion (UCC), pp. 1-6, ACM, Taormina, Italy, 2024, (eupilot, icsc).
@inproceedings{23:mittone:multiview,
title = {Distributed Edge Inference: an Experimental Study on Multiview Detection},
author = {Gianluca Mittone and Giulio Malenza and Marco Aldinucci and Robert Birke},
url = {https://iris.unito.it/handle/2318/1950083},
doi = {10.1145/3603166.3632561},
year = {2024},
date = {2024-12-01},
booktitle = {Proc. of the 16th IEEE/ACM Intl. Conference on Utility and Cloud Computing Companion (UCC)},
volume = {30},
pages = {1-6},
publisher = {ACM},
address = {Taormina, Italy},
institution = {Computer Science Department, University of Torino},
abstract = {Computing is evolving rapidly to cater to the increasing demand for sophisticated services, and Cloud computing lays a solid foundation for flexible on-demand provisioning. However, as the size of applications grows, the centralised client-server approach used by Cloud computing increasingly limits the applications' scalability. To achieve ultra-scalability, cloud/edge/fog computing converges into the compute continuum, completely decentralising the infrastructure to encompass universal, pervasive resources. The compute continuum makes devising applications benefitting from this complex environment a challenging research problem. We put the opportunities the compute continuum offers to the test through a real-world multi-view detection model (MvDet) implemented with the FastFL C/C++ high-performance edge inference framework. Computational performance is discussed considering many experimental scenarios, encompassing different edge computational capabilities and network bandwidths. We obtain up to 1.92x speedup in inference time over a centralised solution using the same devices.},
note = {eupilot, icsc},
keywords = {ai, eupilot, icsc},
pubstate = {published},
tppubtype = {inproceedings}
}
Giulio Malenza, Valentina Cesare, Marco Aldinucci, Ugo Becciani, Alberto Vecchiato
Toward HPC application portability via C++ PSTL: the Gaia AVU-GSR code assessment Journal Article
In: The Journal of Supercomputing, 2024, ISSN: 09208542.
@article{24:jsupe:Gaia,
title = {Toward HPC application portability via C++ PSTL: the Gaia AVU-GSR code assessment},
author = {Giulio Malenza and Valentina Cesare and Marco Aldinucci and Ugo Becciani and Alberto Vecchiato},
doi = {10.1007/s11227-024-06011-1},
issn = {09208542},
year = {2024},
date = {2024-03-01},
journal = {The Journal of Supercomputing},
publisher = {Springer},
abstract = {The computing capacity needed to process the data generated in modern scientific experiments is approaching ExaFLOPs. Currently, achieving such performances is only feasible through GPU-accelerated supercomputers. Different languages were developed to program GPUs at different levels of abstraction. Typically, the more abstract the languages, the more portable they are across different GPUs. However, the less abstract and co-designed with the hardware, the more room for code optimization and, eventually, the more performance. In the HPC context, portability and performance are a fairly traditional dichotomy. The current C++ Parallel Standard Template Library (PSTL) has the potential to go beyond this dichotomy. In this work, we analyze the main performance benefits and limitations of PSTL using as a use-case the Gaia Astrometric Verification Unit-Global Sphere Reconstruction parallel solver developed by the European Space Agency Gaia mission. The code aims to find the astrometric parameters of $\sim 10^8$ stars in the Milky Way by iteratively solving a linear system of equations with the LSQR algorithm, originally GPU-ported with the CUDA language. We show that the performance obtained with the PSTL version, which is intrinsically more portable than CUDA, is comparable to the CUDA one on NVIDIA GPU architecture.},
keywords = {eupex, HPC, icsc},
pubstate = {published},
tppubtype = {article}
}
Marco Edoardo Santimaria, Samuele Fonio, Giulio Malenza, Iacopo Colonnelli, Marco Aldinucci
Benchmarking Parallelization Models through Karmarkar Interior-point method Proceedings Article
In: González-Vélez, Horacio, Chis, Adriana E. (Ed.): Proc. of 32nd Euromicro intl. Conference on Parallel, Distributed and Network-based Processing (PDP), pp. 1-8, IEEE, Dublin, Ireland, 2024, ISSN: 2377-5750.
@inproceedings{24:pdp:karmarkar,
title = {Benchmarking Parallelization Models through Karmarkar Interior-point method},
author = {Marco Edoardo Santimaria and Samuele Fonio and Giulio Malenza and Iacopo Colonnelli and Marco Aldinucci},
editor = {Horacio González-Vélez and Adriana E. Chis},
url = {https://hdl.handle.net/2318/1964571},
doi = {10.1109/PDP62718.2024.00010},
issn = {2377-5750},
year = {2024},
date = {2024-03-01},
booktitle = {Proc. of 32nd Euromicro intl. Conference on Parallel, Distributed and Network-based Processing (PDP)},
pages = {1-8},
publisher = {IEEE},
address = {Dublin, Ireland},
abstract = {Optimization problems are one of the main focuses of scientific research. Their computational-intensive nature makes them prone to be parallelized with consistent improvements in performance. This paper sheds light on different parallel models for accelerating Karmarkar's Interior-point method. To do so, we assess parallelization strategies for individual operations within the aforementioned Karmarkar's algorithm using OpenMP, GPU acceleration with CUDA, and the recent Parallel Standard C++ Linear Algebra library (PSTL) executing both on GPU and CPU. Our different implementations yield interesting benchmark results that show the optimal approach for parallelizing interior point algorithms for general Linear Programming (LP) problems. In addition, we propose a more theoretical perspective of the parallelization of this algorithm, with a detailed study of our OpenMP implementation, showing the limits of optimizing the single operations.},
keywords = {HPC, icsc},
pubstate = {published},
tppubtype = {inproceedings}
}
Adriano Marques Garcia, Giulio Malenza, Robert Birke, Marco Aldinucci
Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors Proceedings Article
In: Antelmi, Alessia, Carlini, Emanuele, Dazzi, Patrizio (Ed.): Proceedings of BigHPC2024: Special Track on Big Data and High-Performance Computing, co-located with the 3rd Italian Conference on Big Data and Data Science, ITADATA2024, pp. 1-9, CEUR-WS.org, Pisa, Italy, 2024.
@inproceedings{24:garcia:itadata,
title = {Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors},
author = {Adriano Marques Garcia and Giulio Malenza and Robert Birke and Marco Aldinucci},
editor = {Alessia Antelmi and Emanuele Carlini and Patrizio Dazzi},
url = {https://iris.unito.it/retrieve/1540f675-5e88-4f57-95e7-df8e0fe5f1df/paper110.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of BigHPC2024: Special Track on Big Data and High-Performance Computing, co-located with the 3\textsuperscript{rd} Italian Conference on Big Data and Data Science, ITADATA2024},
volume = {3785},
pages = {1-9},
publisher = {CEUR-WS.org},
address = {Pisa, Italy},
series = {CEUR Workshop Proceedings},
abstract = {The rising usage of compute-intensive AI applications with fast response time requirements, such as text generation using large language models, underscores the need for more efficient and versatile hardware solutions. This drives the exploration of emerging architectures like RISC-V, which has the potential to deliver strong performance within tight power constraints. The recent commercial release of processors with RISC-V Vector (RVV) silicon-enabled extensions further amplifies the significance of RISC-V architectures, offering enhanced capabilities for parallel processing and accelerating tasks critical to large language models and other AI applications. This work aims to evaluate the BERT and GPT-2 language models inference performance on the SOPHON SG2042 64-core RISC-V architecture with silicon-enabled RVV v0.7.1. We benchmarked the models with and without RVV, using OpenBLAS and BLIS as BLAS backends for PyTorch to enable vectorization. Enabling RVV in OpenBLAS improved the inference performance by up to 40% in some cases.},
keywords = {eupilot, icsc},
pubstate = {published},
tppubtype = {inproceedings}
}
Iacopo Colonnelli, Robert Birke, Giulio Malenza, Gianluca Mittone, Alberto Mulone, Jeroen Galjaard, Lydia Y. Chen, Sanzio Bassini, Gabriella Scipione, Jan Martinovič, Vit Vondrák, Marco Aldinucci
Cross-Facility Federated Learning Journal Article
In: Procedia Computer Science, vol. 240, pp. 3–12, 2024, ISSN: 1877-0509.
@article{24:eurohpc:xffl,
title = {Cross-Facility Federated Learning},
author = {Iacopo Colonnelli and Robert Birke and Giulio Malenza and Gianluca Mittone and Alberto Mulone and Jeroen Galjaard and Lydia Y. Chen and Sanzio Bassini and Gabriella Scipione and Jan Martinovič and Vit Vondrák and Marco Aldinucci},
url = {https://www.sciencedirect.com/science/article/pii/S1877050924016909},
doi = {10.1016/j.procs.2024.07.003},
issn = {1877-0509},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of the First EuroHPC user day},
journal = {Procedia Computer Science},
volume = {240},
pages = {3–12},
publisher = {Elsevier},
address = {Bruxelles, Belgium},
abstract = {In a decade, AI frontier research transitioned from the researcher's workstation to thousands of high-end hardware-accelerated compute nodes. This rapid evolution shows no signs of slowing down in the foreseeable future. While top cloud providers may be able to keep pace with this growth rate, obtaining and efficiently exploiting computing resources at that scale is a daunting challenge for universities and SMEs. This work introduces the Cross-Facility Federated Learning (XFFL) framework to bridge this compute divide, extending the opportunity to efficiently exploit multiple independent data centres for extreme-scale deep learning tasks to data scientists and domain experts. XFFL relies on hybrid workflow abstractions to decouple tasks from environment-specific technicalities, reducing complexity and enhancing reusability. In addition, Federated Learning (FL) algorithms eliminate the need to move large amounts of data between different facilities, reducing time-to-solution and preserving data privacy. The XFFL approach is empirically evaluated by training a full LLaMAv2 7B instance on two facilities of the EuroHPC JU, showing how the increased computing power completely compensates for the additional overhead introduced by two data centres.},
keywords = {icsc, space, streamflow},
pubstate = {published},
tppubtype = {article}
}
Talks
2024
Giulio Malenza
Exploiting C++ Parallel Algorithms through FastFlow Miscellaneous
2024.
@misc{24:gmalenza:BigHPC2024,
title = {Exploiting C++ Parallel Algorithms through FastFlow},
author = {Giulio Malenza},
url = {https://datacloud.di.unito.it/index.php/s/GcpQ8cz9BRyM85B},
year = {2024},
date = {2024-09-01},
address = {Pisa, Italy},
abstract = {High-performance computing and artificial intelligence simulations necessitate the rapid processing of large quantities of data. To handle such data volumes efficiently, leveraging the parallelism inherent in algorithms is crucial. Consequently, parallel programming frameworks have been developed to fully exploit modern parallel architectures. Among these, C++ PSTL stands out for its user-friendliness, portability, and high performance.
In this study, we introduce a back-end for the PSTL implemented using the FastFlow parallel programming framework. We evaluate the correctness and performance of the back-end by comparing its results with those of traditional vendor-dependent back-ends such as TBB and nvc++. Performance metrics are derived from running the LULESH application on both RISC-V and ARM architectures. Our results indicate that all three back-ends deliver comparable performance.},
keywords = {icsc},
pubstate = {published},
tppubtype = {misc}
}
Giulio Malenza
Exploring energy consumption of AI frameworks on a 64-core RV64 Server CPU Miscellaneous
2024.
@misc{24:gmalenza:scihpcexa,
title = {Exploring energy consumption of AI frameworks on a 64-core RV64 Server CPU},
author = {Giulio Malenza},
url = {https://datacloud.di.unito.it/index.php/s/5aTdyzNB6n9CREq},
year = {2024},
date = {2024-09-01},
address = {Pisa, Italy},
abstract = {In today's era of rapid technological advancement, artificial intelligence (AI) applications require large-scale, high-performance, and data-intensive computations, leading to significant energy demands. Addressing this challenge necessitates a combined approach involving both hardware and software innovations. Hardware manufacturers are developing new, efficient, and specialized solutions, with the RISC-V architecture emerging as a prominent player due to its open, extensible, and energy-efficient instruction set architecture (ISA). Simultaneously, software developers are creating new algorithms and frameworks, yet their energy efficiency often remains unclear.
In this study, we conduct a comprehensive benchmark analysis of machine learning (ML) applications on the 64-core SOPHON SG2042 RISC-V architecture. Specifically, we examine the energy consumption of deep learning inference models across various AI frameworks. By comparing the performance of different frameworks, we aim to provide a detailed understanding of how these frameworks can optimize energy consumption on this architecture.},
keywords = {ai, DYMAN, icsc},
pubstate = {published},
tppubtype = {misc}
}
Giulio Malenza
Preliminary analysis of model parallelism applications on a 64-core RV64 Server CPU Miscellaneous
2024.
@misc{24:gmalenza:hlpp:MPRISC-v,
title = {Preliminary analysis of model parallelism applications on a 64-core RV64 Server CPU},
author = {Giulio Malenza},
url = {https://datacloud.di.unito.it/index.php/s/JrWwKALeaFEJSQo},
year = {2024},
date = {2024-07-01},
address = {Pisa, Italy},
abstract = {Massive Data Parallel workloads, driven by inference on large ML models, are pushing hardware vendors to develop efficient and cost-effective multi-core server CPUs. The RISC-V architecture plays a prominent role due to its open, extensible and energy-friendly ISA. Despite significant progress in recent years, finding efficient methods to run parallel applications on new architectures to harness their maximum performance fully remains a challenge. In this study, we benchmark the inference of machine learning models on the SOPHON SG2042 SoC, the first server-grade CPU based on the RV64 ISA, composed of 64 cores arranged in a grid of 16 groups of 4 cores. Specifically, we aim to enhance performance via better cache hit ratios stemming from model parallelism to split and assign parts of the model to specific (groups of) cores using a pipeline execution. We orchestrate execution using FastFlow, a low-level programming framework designed for multithreaded streaming applications. By comparing the results against the standard multi-core inference and analyzing the effects of different submodel-to-core mapping strategies, we aim to provide a comprehensive understanding of how the model parallel approach can maximize efficiency and utilization of hardware resources.},
keywords = {eupilot, icsc},
pubstate = {published},
tppubtype = {misc}
}
Iacopo Colonnelli, Robert Birke, Giulio Malenza, Gianluca Mittone, Alberto Mulone, Marco Aldinucci
Cross-Facility Federated Learning - Part II Miscellaneous
2024, (Invited talk).
@misc{24:ic:elise:xffl,
title = {Cross-Facility Federated Learning - Part II},
author = {Iacopo Colonnelli and Robert Birke and Giulio Malenza and Gianluca Mittone and Alberto Mulone and Marco Aldinucci},
url = {https://datacloud.di.unito.it/index.php/s/7HonBpcWPxotXLX},
year = {2024},
date = {2024-06-01},
address = {Helsinki, Finland},
note = {Invited talk},
keywords = {eupex, icsc, space},
pubstate = {published},
tppubtype = {misc}
}
Giulio Malenza, Marco Edoardo Santimaria
Benchmarking Parallelization Models through Karmarkar's algorithm Miscellaneous
2024.
@misc{24:pdp:karmarkartalk,
title = {Benchmarking Parallelization Models through Karmarkar's algorithm},
author = {Giulio Malenza and Marco Edoardo Santimaria},
url = {https://datacloud.di.unito.it/index.php/s/JjKcAJpYS7ctX9r},
year = {2024},
date = {2024-03-01},
address = {Dublin, Ireland},
abstract = {Optimization problems are one of the main focuses of scientific research. Their computational-intensive nature makes them prone to be parallelized with consistent improvements in performance. This paper sheds light on different parallel models for accelerating Karmarkar’s Interior-point method. To do so, we assess parallelization strategies for individual operations within the aforementioned Karmarkar’s algorithm using OpenMP, GPU acceleration with CUDA, and the recent Parallel Standard C++ Linear Algebra library (PSTL) executing both on GPU and CPU. Our different implementations yield interesting benchmark results that show the optimal approach for parallelizing interior point algorithms for general Linear Programming (LP) problems. In addition, we propose a more theoretical perspective of the parallelization of this algorithm, with a detailed study of our OpenMP implementation, showing the limits of optimizing the single operations.},
keywords = {HPC, icsc},
pubstate = {published},
tppubtype = {misc}
}
2023
Iacopo Colonnelli, Robert Birke, Giulio Malenza, Gianluca Mittone, Alberto Mulone, Marco Aldinucci, Valerio Basile, Marco Antonio Stranisci, Viviana Patti, Jeroen Galjaard, Lydia Y. Chen, Sanzio Bassini, Massimiliano Guarrasi, Gabriella Scipione, Jan Martinovič, Vit Vondrák
Cross-Facility Federated Learning Miscellaneous
1st EuroHPC User Day, 2023.
@misc{23:eurohpc,
title = {Cross-Facility Federated Learning},
author = {Iacopo Colonnelli and Robert Birke and Giulio Malenza and Gianluca Mittone and Alberto Mulone and Marco Aldinucci and Valerio Basile and Marco Antonio Stranisci and Viviana Patti and Jeroen Galjaard and Lydia Y. Chen and Sanzio Bassini and Massimiliano Guarrasi and Gabriella Scipione and Jan Martinovič and Vit Vondrák},
url = {https://datacloud.di.unito.it/index.php/s/DDAz4QkJP3WZ68M},
year = {2023},
date = {2023-12-01},
address = {Bruxelles, Belgium},
howpublished = {1st EuroHPC User Day},
keywords = {across, ai, eupex, eupilot, HPC},
pubstate = {published},
tppubtype = {misc}
}
Gianluca Mittone, Giulio Malenza, Marco Aldinucci, Robert Birke
Distributed Edge Inference: an Experimental Study on Multiview Detection Miscellaneous
The 16th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2023), 2023.
@misc{23:ucc:multiview,
title = {Distributed Edge Inference: an Experimental Study on Multiview Detection},
author = {Gianluca Mittone and Giulio Malenza and Marco Aldinucci and Robert Birke},
url = {https://datacloud.di.unito.it/index.php/s/XfjNZEPSNfSKPFr},
year = {2023},
date = {2023-12-01},
address = {Taormina, Italy},
abstract = {Computing is evolving rapidly to cater to the increasing demand for sophisticated services, and Cloud computing lays a solid foundation for flexible on-demand provisioning. However, as the size of applications grows, the centralised client-server approach used by Cloud computing increasingly limits the applications' scalability. To achieve ultra-scalability, cloud/edge/fog computing converges into the compute continuum, completely decentralising the infrastructure to encompass universal, pervasive resources. The compute continuum makes devising applications benefitting from this complex environment a challenging research problem. We put the opportunities the compute continuum offers to the test through a real-world multi-view detection model (MvDet) implemented with the FastFL C/C++ high-performance edge inference framework. Computational performance is discussed considering many experimental scenarios, encompassing different edge computational capabilities and network bandwidths. We obtain up to 1.92x speedup in inference time over a centralised solution using the same devices.},
howpublished = {The 16th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2023)},
keywords = {ai, eupilot, icsc},
pubstate = {published},
tppubtype = {misc}
}
Giulio Malenza, Valentina Cesare, Marco Aldinucci
Performance portability in HPC: the Gaia use-case. Miscellaneous
2nd Italian Conference on Big Data and Data Science (ITADATA 2023), 2023.
@misc{23:GAIA:bigHPC,
title = {Performance portability in HPC: the Gaia use-case.},
author = {Giulio Malenza and Valentina Cesare and Marco Aldinucci},
url = {https://datacloud.di.unito.it/index.php/s/RqcZpizFtC9toFq},
year = {2023},
date = {2023-09-01},
address = {Naples, Italy},
howpublished = {2nd Italian Conference on Big Data and Data Science (ITADATA 2023)},
keywords = {icsc},
pubstate = {published},
tppubtype = {misc}
}
Giulio Malenza
Building an accelerated OpenFOAM Proof-of-Concept application using Modern C++. Miscellaneous
18th OpenFOAM Workshop 2023, Genova, 2023.
@misc{23:OF:genova,
title = {Building an accelerated OpenFOAM Proof-of-Concept application using Modern C++.},
author = {Giulio Malenza},
url = {https://datacloud.di.unito.it/index.php/s/mB6omsDB8ERBkGW},
year = {2023},
date = {2023-07-01},
address = {Genova, Italy},
howpublished = {18th OpenFOAM Workshop 2023, Genova},
keywords = {icsc},
pubstate = {published},
tppubtype = {misc}
}