Adriano Marques Garcia
Postdoctoral Researcher
Computer Science Department, University of Turin
Via Pessinetto 12, 10149 Torino – Italy
Email: adriano.marquesgarcia@unito.it
ORCID: 0000-0003-4796-773X
Short Bio
Adriano Marques Garcia is a postdoctoral researcher at the University of Turin. He received his PhD with honours in Computer Science from the Pontifical Catholic University of Rio Grande do Sul, with a thesis on easing the benchmarking of parallel stream processing targeting multicore parallelism. He also holds a Master’s degree in Electrical Engineering from the Federal University of Pampa, where his thesis presented a new parallel benchmark suite for evaluating the performance and energy consumption of parallel programming interfaces. Adriano also worked as a research fellow at SAP SE, researching and developing new methods for fault recovery in distributed data-flow graphs.
Open Source Software
- Creator and maintainer of SPBench, a framework for benchmarking C++ stream processing applications. The main goal of SPBench is to enable users to easily create custom benchmarks from real-world stream processing applications and evaluate multiple parallel programming interfaces.
- Creator and maintainer of PAMPAR, a parallel benchmark suite that provides a broad set of benchmarks (micro-benchmarks, kernels, and pseudo-applications), all parallelized using well-known parallel programming interfaces, such as OpenMP, POSIX Threads, MPI-1.0, and MPI-2.0.
Achievements
- [2023] Winner of the IEEE SBAC-PAD 2023 2nd Best PhD Thesis Award.
- [2020] Winner of the ICCSA 2020 Best Paper Award.
- [2014] Winner of an 18-month Computer Science visiting student scholarship at Dublin Business School in Dublin, Ireland.
Publications
2024
Adriano Marques Garcia, Giulio Malenza, Robert Birke, Marco Aldinucci
Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors Proceedings Article
In: Antelmi, Alessia, Carlini, Emanuele, Dazzi, Patrizio (Ed.): Proceedings of BigHPC2024: Special Track on Big Data and High-Performance Computing, co-located with the 3rd Italian Conference on Big Data and Data Science, ITADATA2024, pp. 1-9, CEUR-WS.org, Pisa, Italy, 2024.
Tags: eupilot, icsc
@inproceedings{24:garcia:itadata,
title = {Assessing Large Language Models Inference Performance on a 64-core RISC-V CPU with Silicon-Enabled Vectors},
author = {Adriano Marques Garcia and Giulio Malenza and Robert Birke and Marco Aldinucci},
editor = {Alessia Antelmi and Emanuele Carlini and Patrizio Dazzi},
url = {https://iris.unito.it/retrieve/1540f675-5e88-4f57-95e7-df8e0fe5f1df/paper110.pdf},
year = {2024},
date = {2024-01-01},
booktitle = {Proceedings of BigHPC2024: Special Track on Big Data and High-Performance Computing, co-located with the 3\textsuperscript{rd} Italian Conference on Big Data and Data Science, ITADATA2024},
volume = {3785},
pages = {1-9},
publisher = {CEUR-WS.org},
address = {Pisa, Italy},
series = {CEUR Workshop Proceedings},
abstract = {The rising usage of compute-intensive AI applications with fast response time requirements, such as text generation using large language models, underscores the need for more efficient and versatile hardware solutions. This drives the exploration of emerging architectures like RISC-V, which has the potential to deliver strong performance within tight power constraints. The recent commercial release of processors with RISC-V Vector (RVV) silicon-enabled extensions further amplifies the significance of RISC-V architectures, offering enhanced capabilities for parallel processing and accelerating tasks critical to large language models and other AI applications. This work aims to evaluate the BERT and GPT-2 language models inference performance on the SOPHON SG2042 64-core RISC-V architecture with silicon-enabled RVV v0.7.1. We benchmarked the models with and without RVV, using OpenBLAS and BLIS as BLAS backends for PyTorch to enable vectorization. Enabling RVV in OpenBLAS improved the inference performance by up to 40% in some cases.},
keywords = {eupilot, icsc},
pubstate = {published},
tppubtype = {inproceedings}
}
Adriano Marques Garcia, Dalvan Griebler, Claudio Schepke, José Daniel García, Javier Fernández Muñoz, Luiz Gustavo Fernandes
Performance and programmability of GrPPI for parallel stream processing on multi-cores Journal Article
In: The Journal of Supercomputing, vol. In press, no. In press, pp. 1-35, 2024, ISSN: 1573-0484.
Tags: admire
@article{GARCIA:JSuper:24,
title = {Performance and programmability of GrPPI for parallel stream processing on multi-cores},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and José Daniel García and Javier Fernández Muñoz and Luiz Gustavo Fernandes},
url = {https://iris.unito.it/retrieve/fff66640-fcbe-4080-a4f1-3279c9fadafb/s11227-024-05934-z.pdf},
doi = {10.1007/s11227-024-05934-z},
issn = {1573-0484},
year = {2024},
date = {2024-01-01},
journal = {The Journal of Supercomputing},
volume = {In press},
number = {In press},
pages = {1-35},
publisher = {Springer},
abstract = {GrPPI library aims to simplify the burdening task of parallel programming. It provides a unified, abstract, and generic layer while promising minimal overhead on performance. Although it supports stream parallelism, GrPPI lacks an evaluation regarding representative performance metrics for this domain, such as throughput and latency. This work evaluates GrPPI focused on parallel stream processing. We compare the throughput and latency performance, memory usage, and programmability of GrPPI against handwritten parallel code. For this, we use the benchmarking framework SPBench to build custom GrPPI benchmarks and benchmarks with handwritten parallel code using the same backends supported by GrPPI. The basis of the benchmarks is real applications, such as Lane Detection, Bzip2, Face Recognizer, and Ferret. Experiments show that while performance is often competitive with handwritten parallel code, the infeasibility of fine-tuning GrPPI is a crucial drawback for emerging applications. Despite this, programmability experiments estimate that GrPPI can potentially reduce the development time of parallel applications by about three times.},
keywords = {admire},
pubstate = {published},
tppubtype = {article}
}
2023
Adriano Marques Garcia, Dalvan Griebler, Claudio Schepke, André Sacilotto Santos, José Daniel García, Javier Fernández Muñoz, Luiz Gustavo Fernandes
A Latency, Throughput, and Programmability Perspective of GrPPI for Streaming on Multi-cores Proceedings Article
In: 31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 164-168, IEEE, Naples, Italy, 2023.
Tags: admire
@inproceedings{GARCIA:PDP:23,
title = {A Latency, Throughput, and Programmability Perspective of GrPPI for Streaming on Multi-cores},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and André Sacilotto Santos and José Daniel García and Javier Fernández Muñoz and Luiz Gustavo Fernandes},
url = {https://iris.unito.it/retrieve/9165d2ef-7140-4645-87cc-269050341c1d/PDP_2023_SPbench_with_GrPPI.pdf},
doi = {10.1109/PDP59025.2023.00033},
year = {2023},
date = {2023-03-01},
booktitle = {31st Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {164-168},
publisher = {IEEE},
address = {Naples, Italy},
series = {PDP'23},
abstract = {Several solutions aim to simplify the burdening task of parallel programming. The GrPPI library is one of them. It allows users to implement parallel code for multiple backends through a unified, abstract, and generic layer while promising minimal overhead on performance. An outspread evaluation of GrPPI regarding stream parallelism with representative metrics for this domain, such as throughput and latency, was not yet done. In this work, we evaluate GrPPI focused on stream processing. We evaluate performance, memory usage, and programming effort and compare them against handwritten parallel code. For this, we use the benchmarking framework SPBench to build custom GrPPI benchmarks. The basis of the benchmarks is real applications, such as Lane Detection, Bzip2, Face Recognizer, and Ferret. Experiments show that while performance is competitive with handwritten code in some cases, in other cases, the infeasibility of fine-tuning GrPPI is a crucial drawback. Despite this, programmability experiments estimate that GrPPI has the potential to reduce by about three times the development time of parallel applications.},
keywords = {admire},
pubstate = {published},
tppubtype = {inproceedings}
}
Adriano Marques Garcia, Dalvan Griebler, Claudio Schepke, Luiz Gustavo Fernandes
Micro-batch and data frequency for stream processing on multi-cores Journal Article
In: The Journal of Supercomputing, vol. 79, no. 8, pp. 9206-9244, 2023, ISSN: 1573-0484.
Tags: parallel
@article{GARCIA:JSuper:23,
title = {Micro-batch and data frequency for stream processing on multi-cores},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://iris.unito.it/retrieve/9328dbca-98ae-4ac5-b856-57c72db4444a/s11227-022-05024-y_preprint.pdf},
doi = {10.1007/s11227-022-05024-y},
issn = {1573-0484},
year = {2023},
date = {2023-01-01},
journal = {The Journal of Supercomputing},
volume = {79},
number = {8},
pages = {9206-9244},
publisher = {Springer},
abstract = {Latency or throughput is often critical performance metrics in stream processing. Applications’ performance can fluctuate depending on the input stream. This unpredictability is due to the variety in data arrival frequency and size, complexity, and other factors. Researchers are constantly investigating new ways to mitigate the impact of these variations on performance with self-adaptive techniques involving elasticity or micro-batching. However, there is a lack of benchmarks capable of creating test scenarios to further evaluate these techniques. This work extends and improves the SPBench benchmarking framework to support dynamic micro-batching and data stream frequency management. We also propose a set of algorithms that generates the most commonly used frequency patterns for benchmarking stream processing in related work. It allows the creation of a wide variety of test scenarios. To validate our solution, we use SPBench to create custom benchmarks and evaluate the impact of micro-batching and data stream frequency on the performance of Intel TBB and FastFlow. These are two libraries that leverage stream parallelism for multi-core architectures. Our results demonstrated that our test cases did not benefit from micro-batches on multi-cores. For different data stream frequency configurations, TBB ensured the lowest latency, while FastFlow assured higher throughput in shorter pipelines.},
keywords = {parallel},
pubstate = {published},
tppubtype = {article}
}
2022
Adriano Marques Garcia, Dalvan Griebler, Claudio Schepke, Luiz Gustavo Fernandes
Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores Proceedings Article
In: 30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 10-17, IEEE, Valladolid, Spain, 2022.
Tags: parallel
@inproceedings{GARCIA:PDP:22,
title = {Evaluating Micro-batch and Data Frequency for Stream Processing Applications on Multi-cores},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://iris.unito.it/retrieve/f6d113e5-789b-4f8b-924d-8ca3d38e8d62/PDP_2022__SPBench_with_Batch_and_Data_Frequency_.pdf},
doi = {10.1109/PDP55904.2022.00011},
year = {2022},
date = {2022-04-01},
booktitle = {30th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {10-17},
publisher = {IEEE},
address = {Valladolid, Spain},
series = {PDP'22},
abstract = {In stream processing, data arrives constantly and is often unpredictable. It can show large fluctuations in arrival frequency, size, complexity, and other factors. These fluctuations can strongly impact application latency and throughput, which are critical factors in this domain. Therefore, there is a significant amount of research on self-adaptive techniques involving elasticity or micro-batching as a way to mitigate this impact. However, there is a lack of benchmarks and tools for helping researchers to investigate micro-batching and data stream frequency implications. In this paper, we extend a benchmarking framework to support dynamic micro-batching and data stream frequency management. We used it to create custom benchmarks and compare latency and throughput aspects from two different parallel libraries. We validate our solution through an extensive analysis of the impact of micro-batching and data stream frequency on stream processing applications using Intel TBB and FastFlow, which are two libraries that leverage stream parallelism on multi-core architectures. Our results demonstrated up to 33% throughput gain over latency using micro-batches. Additionally, while TBB ensures lower latency, FastFlow ensures higher throughput in the parallel applications for different data stream frequency configurations.},
keywords = {parallel},
pubstate = {published},
tppubtype = {inproceedings}
}
Adriano Marques Garcia, Dalvan Griebler, Claudio Schepke, Luiz Gustavo Fernandes
SPBench: a framework for creating benchmarks of stream processing applications Journal Article
In: Computing, vol. 105, no. 5, pp. 1077-1099, 2022, ISSN: 1436-5057.
Tags: parallel
@article{GARCIA:Computing:22,
title = {SPBench: a framework for creating benchmarks of stream processing applications},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://iris.unito.it/retrieve/f17ea8c2-ddd8-425b-b4e7-8315218a6969/s00607-021-01025-6_preprint.pdf},
doi = {10.1007/s00607-021-01025-6},
issn = {1436-5057},
year = {2022},
date = {2022-01-01},
journal = {Computing},
volume = {105},
number = {5},
pages = {1077-1099},
publisher = {Springer},
abstract = {In a fast-changing data-driven world, real-time data processing systems are becoming ubiquitous in everyday applications. The increasing data we produce, such as audio, video, image, and, text are demanding quickly and efficiently computation. Stream Parallelism allows accelerating this computation for real-time processing. But it is still a challenging task and most reserved for experts. In this paper, we present SPBench, a framework for benchmarking stream processing applications. It aims to support users with a set of real-world stream processing applications, which are made accessible through an Application Programming Interface (API) and executable via Command Line Interface (CLI) to create custom benchmarks. We tested SPBench by implementing parallel benchmarks with Intel Threading Building Blocks (TBB), FastFlow, and SPar. This evaluation provided useful insights and revealed the feasibility of the proposed framework in terms of usage, customization, and performance analysis. SPBench demonstrated to be a high-level, reusable, extensible, and easy of use abstraction to build parallel stream processing benchmarks on multi-core architectures.},
keywords = {parallel},
pubstate = {published},
tppubtype = {article}
}
2021
Adriano Marques Garcia, Dalvan Griebler, Claudio Schepke, Luiz Gustavo Fernandes
Introducing a Stream Processing Framework for Assessing Parallel Programming Interfaces Proceedings Article
In: 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 84-88, IEEE, Valladolid, Spain, 2021.
Tags: parallel
@inproceedings{GARCIA:PDP:21,
title = {Introducing a Stream Processing Framework for Assessing Parallel Programming Interfaces},
author = {Adriano Marques Garcia and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes},
url = {https://iris.unito.it/retrieve/8aa73a3f-0b1f-41e4-9440-a87bbaf6e9c4/PDP_2021__Stream_bench_Framework_.pdf},
doi = {10.1109/PDP52278.2021.00021},
year = {2021},
date = {2021-03-01},
booktitle = {29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP)},
pages = {84-88},
publisher = {IEEE},
address = {Valladolid, Spain},
series = {PDP'21},
abstract = {Stream Processing applications are spread across different sectors of industry and people's daily lives. The increasing data we produce, such as audio, video, image, and text are demanding quickly and efficiently computation. It can be done through Stream Parallelism, which is still a challenging task and most reserved for experts. We introduce a Stream Processing framework for assessing Parallel Programming Interfaces (PPIs). Our framework targets multi-core architectures and C++ stream processing applications, providing an API that abstracts the details of the stream operators of these applications. Therefore, users can easily identify all the basic operators and implement parallelism through different PPIs. In this paper, we present the proposed framework, implement three applications using its API, and show how it works, by using it to parallelize and evaluate the applications with the PPIs Intel TBB, FastFlow, and SPar. The performance results were consistent with the literature.},
keywords = {parallel},
pubstate = {published},
tppubtype = {inproceedings}
}
2020
Adriano Marques Garcia, Matheus Serpa, Dalvan Griebler, Claudio Schepke, Luiz Gustavo Fernandes, Philippe O. A. Navaux
The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures Proceedings Article
In: International Conference on Computational Science and its Applications (ICCSA), pp. 142-157, Springer, Cagliari, Italy, 2020.
Tags: parallel
@inproceedings{GARCIA:ICCSA:20,
title = {The Impact of CPU Frequency Scaling on Power Consumption of Computing Infrastructures},
author = {Adriano Marques Garcia and Matheus Serpa and Dalvan Griebler and Claudio Schepke and Luiz Gustavo Fernandes and Philippe O. A. Navaux},
url = {https://iris.unito.it/retrieve/3b8f3dc0-cd4d-4f36-801d-9e8c613ea2e8/ICCSA_Energy_governors_preprint.pdf},
doi = {10.1007/978-3-030-58817-5_12},
year = {2020},
date = {2020-07-01},
booktitle = {International Conference on Computational Science and its Applications (ICCSA)},
volume = {12254},
pages = {142-157},
publisher = {Springer},
address = {Cagliari, Italy},
series = {ICCSA'20},
abstract = {Since the demand for computing power increases, new architectures emerged to obtain better performance. Reducing the power and energy consumption of these architectures is one of the main challenges to achieving high-performance computing. Current research trends aim at developing new software and hardware techniques to achieve the best performance and energy trade-offs. In this work, we investigate the impact of different CPU frequency scaling techniques such as ondemand, performance, and powersave on the power and energy consumption of multi-core based computer infrastructure. We apply these techniques in PAMPAR, a parallel benchmark suite implemented in PThreads, OpenMP, MPI-1, and MPI-2 (spawn). We measure the energy and execution time of 10 benchmarks, varying the number of threads. Our results show that although powersave consumes up to 43.1% less power than performance and ondemand governors, it consumes the triple of energy due to the high execution time. Our experiments also show that the performance governor consumes up to 9.8% more energy than ondemand for CPU-bound benchmarks. Finally, our results show that PThreads has the lowest power consumption, consuming less than the sequential version for memory-bound benchmarks. Regarding performance, the performance governor achieved 3% of performance over the ondemand.},
keywords = {parallel},
pubstate = {published},
tppubtype = {inproceedings}
}
2019
Adriano Marques Garcia, Claudio Schepke, Alessandro Gonçalves Girardi
PAMPAR: A new parallel benchmark for performance and energy consumption evaluation Journal Article
In: Concurrency and Computation: Practice and Experience, vol. 32, no. 20, pp. 1-21, 2019.
Tags: parallel
@article{GARCIA:CCPE:19,
title = {PAMPAR: A new parallel benchmark for performance and energy consumption evaluation},
author = {Adriano Marques Garcia and Claudio Schepke and Alessandro Gonçalves Girardi},
url = {https://iris.unito.it/retrieve/d514c682-a567-4a02-93b7-9e27b6d3da03/Concurrency___Computation__Practice___Experience__Final_Version_.pdf},
doi = {10.1002/cpe.5504},
year = {2019},
date = {2019-10-01},
journal = {Concurrency and Computation: Practice and Experience},
volume = {32},
number = {20},
pages = {1-21},
abstract = {This paper presents PAMPAR, a new benchmark to evaluate the performance and energy consumption of different Parallel Programming Interfaces (PPIs). The benchmark is composed of 11 algorithms implemented in PThreads, OpenMP, MPI-1, and MPI-2 (spawn) PPIs. Previous studies have used some of these pseudo-applications to perform this type of evaluation in different architectures since there is no benchmark that offers this variety of PPIs and communication models. In this work, we measure the energy and performance of each pseudo-application in a single architecture, varying the number of threads/processes. We also organize the pseudo-applications according to their memory accesses, floating-point operations, and branches. The goal is to show that this set of pseudo-applications has enough features to build a parallel benchmark. The results show that there is no single best case that provides both better performance and low energy consumption in the presented scenarios. Moreover, the pseudo-applications usage of the system resources are different enough to represent different scenarios and be efficient as a benchmark.},
keywords = {parallel},
pubstate = {published},
tppubtype = {article}
}
Adriano Marques Garcia, Claudio Schepke, Alessandro Gonçalves Girardi, Sherlon Almeida Silva
A New Parallel Benchmark for Performance Evaluation and Energy Consumption Proceedings Article
In: High Performance Computing for Computational Science – VECPAR 2018, pp. 188-201, Springer International Publishing, Cham, 2019, ISBN: 978-3-030-15996-2.
Tags: parallel
@inproceedings{GARCIA:VECPAR:19,
title = {A New Parallel Benchmark for Performance Evaluation and Energy Consumption},
author = {Adriano Marques Garcia and Claudio Schepke and Alessandro Gonçalves Girardi and Sherlon Almeida Silva},
url = {https://iris.unito.it/retrieve/1272dea3-b1ea-4356-af0d-d180cef341b9/VECPAR_2018_paper_preprint.pdf},
doi = {10.1007/978-3-030-15996-2_14},
isbn = {978-3-030-15996-2},
year = {2019},
date = {2019-03-01},
booktitle = {High Performance Computing for Computational Science – VECPAR 2018},
pages = {188-201},
publisher = {Springer International Publishing},
address = {Cham},
abstract = {This paper presents a new benchmark to evaluate performance and energy consumption of different Parallel Programming Interfaces (PPIs). The benchmark is composed of 11 algorithms implemented in PThreads, OpenMP, MPI-1 and MPI-2 (spawn) PPIs. Previous studies have used some of these applications to perform this type of evaluation in different architectures, since there is no benchmark that offers this variety of PPIs and communication models. In this work we measure the energy and performance of each application in a single architecture, varying the number of threads/processes. The goal is to show that this set of applications has enough features to form a parallel benchmark. The results show that there is no single best case that provides both better performance and low energy consumption in the presented scenarios. However, PThreads and OpenMP achieve the best trade-offs between performance and energy in most cases.},
keywords = {parallel},
pubstate = {published},
tppubtype = {inproceedings}
}
2018
Adriano Marques Garcia, Claudio Schepke, Alessandro Gonçalves Girardi, Sherlon Almeida Silva
Power Consumption of Parallel Programming Interfaces in Multicore Architectures: A Case Study Proceedings Article
In: 2018 Symposium on High Performance Computing Systems (WSCAD), pp. 77-83, 2018.
Tags: parallel
@inproceedings{GARCIA:WSCAD:18,
title = {Power Consumption of Parallel Programming Interfaces in Multicore Architectures: A Case Study},
author = {Adriano Marques Garcia and Claudio Schepke and Alessandro Gonçalves Girardi and Sherlon Almeida Silva},
url = {https://iris.unito.it/retrieve/cab823a1-a6f7-483f-929a-607a166e0e78/A_Case_Study___Adriano___IEEE.pdf},
doi = {10.1109/WSCAD.2018.00021},
year = {2018},
date = {2018-10-01},
booktitle = {2018 Symposium on High Performance Computing Systems (WSCAD)},
pages = {77-83},
abstract = {This paper presents a case study on the power consumption of different Parallel Programming Interfaces (PPIs) in multicore architectures. The study is based on the PAMPAR benchmark, which is composed of 11 algorithms implemented in PThreads, OpenMP, MPI-1 and MPI-2 (spawn) PPIs. The results show that there is no single best case that provides both better performance and low power consumption in the presented scenarios. However, PThreads and OpenMP achieve the best trade-offs between performance and power in most cases.},
keywords = {parallel},
pubstate = {published},
tppubtype = {inproceedings}
}