
Ph.D. student in Modeling and Data Science, University of Turin
Parallel Computing group
Via Pessinetto 12, 10149 Torino – Italy
Email: bruno.casella@unito.it
Short Bio
Bruno Casella is a PhD student in Modeling and Data Science at UniTO, funded by Leonardo Company.
He graduated in Computer Engineering in 2020 with a thesis on the performance of AlphaZero in different scenarios.
He also received a Master's Degree in Data Science for Management in 2021 with a thesis on Federated Transfer Learning.
Fields of interest
- Federated Learning
- Deep Learning
- High Performance Computing
Publications
2023
- M. Pennisi, F. Proietto Salanitri, G. Bellitto, B. Casella, M. Aldinucci, S. Palazzo, and C. Spampinato, “FedER: Federated Learning through Experience Replay and Privacy-Preserving Data Synthesis,” Computer Vision and Image Understanding, 2023. doi:10.1016/j.cviu.2023.103882
[BibTeX] [Abstract] [Download PDF]
In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data. However, recent privacy regulations hinder the possibility to share data, and consequently, to come up with machine learning-based solutions that support diagnosis and prognosis. Federated learning (FL) aims at sidestepping this limitation by bringing AI-based solutions to data owners and only sharing local AI models, or parts thereof, that need then to be aggregated. However, most of the existing federated learning solutions are still in their infancy and show several shortcomings, from the lack of a reliable and effective aggregation scheme able to retain the knowledge learned locally to weak privacy preservation as real data may be reconstructed from model updates. Furthermore, the majority of these approaches, especially those dealing with medical data, rely on a centralized distributed learning strategy that poses robustness, scalability and trust issues. In this paper we present a federated and decentralized learning strategy, FedER, that, exploiting experience replay and generative adversarial concepts, effectively integrates features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy. FedER is tested on two tasks — tuberculosis and melanoma classification — using multiple datasets in order to simulate realistic non-i.i.d. medical data scenarios. Results show that our approach achieves performance comparable to standard (non-federated) learning and significantly outperforms state-of-the-art federated methods in their centralized (thus, more favourable) formulation. Code is available at https://github.com/perceivelab/FedER
@article{23:casella:FedER, author = {Pennisi, Matteo and Proietto Salanitri, Federica and Bellitto, Giovanni and Casella, Bruno and Aldinucci, Marco and Palazzo, Simone and Spampinato, Concetto}, journal = {Computer Vision and Image Understanding}, doi = {10.1016/j.cviu.2023.103882}, institution = {Computer Science Department, University of Torino}, note = {https://www.sciencedirect.com/science/article/pii/S107731422300262X?via%3Dihub}, title = {FedER: Federated Learning through Experience Replay and Privacy-Preserving Data Synthesis}, url = {https://www.sciencedirect.com/science/article/pii/S107731422300262X?via%3Dihub}, year = 2023, abstract = {In the medical field, multi-center collaborations are often sought to yield more generalizable findings by leveraging the heterogeneity of patient and clinical data. However, recent privacy regulations hinder the possibility to share data, and consequently, to come up with machine learning-based solutions that support diagnosis and prognosis. Federated learning (FL) aims at sidestepping this limitation by bringing AI-based solutions to data owners and only sharing local AI models, or parts thereof, that need then to be aggregated. However, most of the existing federated learning solutions are still at their infancy and show several shortcomings, from the lack of a reliable and effective aggregation scheme able to retain the knowledge learned locally to weak privacy preservation as real data may be reconstructed from model updates. Furthermore, the majority of these approaches, especially those dealing with medical data, relies on a centralized distributed learning strategy that poses robustness, scalability and trust issues. In this paper we present a federated and decentralized learning strategy, FedER, that, exploiting experience replay and generative adversarial concepts, effectively integrates features from local nodes, providing models able to generalize across multiple datasets while maintaining privacy. FedER is tested on two tasks — tuberculosis and melanoma classification — using multiple datasets in order to simulate realistic non-i.i.d. medical data scenarios. Results show that our approach achieves performance comparable to standard (non-federated) learning and significantly outperforms state-of-the-art federated methods in their centralized (thus, more favourable) formulation. Code is available at https://github.com/perceivelab/FedER} }
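A minimal sketch of the replay idea described in the abstract above, assuming a PyTorch setup: each node interleaves batches of its private data with batches drawn from a buffer of synthetic, privacy-preserving samples. The names model, local_loader, and buffer_loader are hypothetical placeholders, not the authors' code.

    import torch
    import torch.nn.functional as F

    def train_round(model, local_loader, buffer_loader, lr=1e-3):
        # One local round: alternate a private batch with a replay batch so the
        # node retains knowledge received from the rest of the federation.
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        model.train()
        for (x_loc, y_loc), (x_buf, y_buf) in zip(local_loader, buffer_loader):
            opt.zero_grad()
            loss = F.cross_entropy(model(x_loc), y_loc)          # local data
            loss = loss + F.cross_entropy(model(x_buf), y_buf)   # synthetic buffer
            loss.backward()
            opt.step()
        return model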
- B. Casella, W. Riviera, M. Aldinucci, and G. Menegaz, “MERGE: A model for multi-input biomedical federated learning,” Patterns, p. 100856, 2023. doi:10.1016/j.patter.2023.100856
[BibTeX] [Abstract] [Download PDF]
Driven by the deep learning (DL) revolution, artificial intelligence (AI) has become a fundamental tool for many biomedical tasks, including analyzing and classifying diagnostic images. Imaging, however, is not the only source of information. Tabular data, such as personal and genomic data and blood test results, are routinely collected but rarely considered in DL pipelines. Nevertheless, DL requires large datasets that often must be pooled from different institutions, raising non-trivial privacy concerns. Federated learning (FL) is a cooperative learning paradigm that aims to address these issues by moving models instead of data across different institutions. Here, we present a federated multi-input architecture using images and tabular data as a methodology to enhance model performance while preserving data privacy. We evaluated it on two showcases: the prognosis of COVID-19 and patients’ stratification in Alzheimer’s disease, providing evidence of enhanced accuracy and F1 scores against single-input models and improved generalizability against non-federated models.
@article{23:fl:patterns, title = {MERGE: A model for multi-input biomedical federated learning}, journal = {Patterns}, pages = {100856}, year = {2023}, issn = {2666-3899}, doi = {10.1016/j.patter.2023.100856}, url = {https://www.sciencedirect.com/science/article/pii/S2666389923002404}, author = {Bruno Casella and Walter Riviera and Marco Aldinucci and Gloria Menegaz}, keywords = {icsc, epi}, abstract = {Summary Driven by the deep learning (DL) revolution, artificial intelligence (AI) has become a fundamental tool for many biomedical tasks, including analyzing and classifying diagnostic images. Imaging, however, is not the only source of information. Tabular data, such as personal and genomic data and blood test results, are routinely collected but rarely considered in DL pipelines. Nevertheless, DL requires large datasets that often must be pooled from different institutions, raising non-trivial privacy concerns. Federated learning (FL) is a cooperative learning paradigm that aims to address these issues by moving models instead of data across different institutions. Here, we present a federated multi-input architecture using images and tabular data as a methodology to enhance model performance while preserving data privacy. We evaluated it on two showcases: the prognosis of COVID-19 and patients’ stratification in Alzheimer’s disease, providing evidence of enhanced accuracy and F1 scores against single-input models and improved generalizability against non-federated models.} }
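A minimal sketch of a multi-input model in the spirit of MERGE, assuming PyTorch; the branch sizes and layer choices below are illustrative, not the paper's exact architecture. A CNN branch encodes the image, an MLP branch encodes the tabular record, and the two feature vectors are fused by concatenation before the classifier.

    import torch
    import torch.nn as nn

    class MultiInputNet(nn.Module):
        def __init__(self, n_tabular, n_classes):
            super().__init__()
            # Image branch: tiny CNN reduced to a 16-dim feature vector
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
            # Tabular branch: one hidden layer over the clinical record
            self.mlp = nn.Sequential(nn.Linear(n_tabular, 16), nn.ReLU())
            self.head = nn.Linear(16 + 16, n_classes)

        def forward(self, image, tabular):
            fused = torch.cat([self.cnn(image), self.mlp(tabular)], dim=1)
            return self.head(fused)

    # Example: a batch of 8 grayscale scans plus 10 tabular features each
    logits = MultiInputNet(n_tabular=10, n_classes=2)(
        torch.randn(8, 1, 64, 64), torch.randn(8, 10))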
- O. de Filippo, F. Bruno, T. H. Pinxterhuis, M. Gąsior, L. Perl, L. Gaido, D. Tuttolomondo, A. Greco, R. Verardi, G. Lo Martire, M. Iannaccone, A. Leone, G. Liccardo, S. Caglioni, R. González Ferreiro, G. Rodinò, G. Musumeci, G. Patti, I. Borzillo, G. Tarantini, W. Wańha, B. Casella, E. H. Ploumen, Ł. Pyka, R. Kornowski, A. Gagnor, R. Piccolo, S. R. Roubin, D. Capodanno, P. Zocca, F. Conrotto, G. M. De Ferrari, C. von Birgelen, and F. D’Ascenzo, “Predictors of target lesion failure after treatment of left main, bifurcation, or chronic total occlusion lesions with ultrathin-strut drug-eluting coronary stents in the ULTRA registry,” Catheterization and Cardiovascular Interventions, 2023. doi:10.1002/ccd.30696
[BibTeX] [Abstract] [Download PDF]
Background: Data about the long-term performance of new-generation ultrathin-strut drug-eluting stents (DES) in challenging coronary lesions, such as left main (LM), bifurcation, and chronic total occlusion (CTO) lesions are scant. Methods: The international multicenter retrospective observational ULTRA study included consecutive patients treated from September 2016 to August 2021 with ultrathin-strut (<70µm) DES in challenging de novo lesions. Primary endpoint was target lesion failure (TLF): composite of cardiac death, target-lesion revascularization (TLR), target-vessel myocardial infarction (TVMI), or definite stent thrombosis (ST). Secondary endpoints included all-cause death, acute myocardial infarction (AMI), target vessel revascularization, and TLF components. TLF predictors were assessed with Cox multivariable analysis. Results: Of 1801 patients (age: 66.6±11.2 years; male: 1410 [78.3%]), 170 (9.4%) experienced TLF during follow-up of 3.1±1.4 years. In patients with LM, CTO, and bifurcation lesions, TLF rates were 13.5%, 9.9%, and 8.9%, respectively. Overall, 160 (8.9%) patients died (74 [4.1%] from cardiac causes). AMI and TVMI rates were 6.0% and 3.2%, respectively. ST occurred in 11 (1.1%) patients while 77 (4.3%) underwent TLR. Multivariable analysis identified the following predictors of TLF: age, STEMI with cardiogenic shock, impaired left ventricular ejection fraction, diabetes, and renal dysfunction. Among the procedural variables, total stent length increased TLF risk (HR: 1.01, 95% CI: 1-1.02 per mm increase), while intracoronary imaging reduced the risk substantially (HR: 0.35, 95% CI: 0.12-0.82). Conclusions: Ultrathin-strut DES showed high efficacy and satisfactory safety, even in patients with challenging coronary lesions. Yet, despite using contemporary gold-standard DES, the association persisted between established patient- and procedure-related features of risk and impaired 3-year clinical outcome.
@article{23:casella:ultra, abstract = {Background: Data about the long-term performance of new-generation ultrathin-strut drug-eluting stents (DES) in challenging coronary lesions, such as left main (LM), bifurcation, and chronic total occlusion (CTO) lesions are scant. Methods: The international multicenter retrospective observational ULTRA study included consecutive patients treated from September 2016 to August 2021 with ultrathin-strut (<70µm) DES in challenging de novo lesions. Primary endpoint was target lesion failure (TLF): composite of cardiac death, target-lesion revascularization (TLR), target-vessel myocardial infarction (TVMI), or definite stent thrombosis (ST). Secondary endpoints included all-cause death, acute myocardial infarction (AMI), target vessel revascularization, and TLF components. TLF predictors were assessed with Cox multivariable analysis. Results: Of 1801 patients (age: 66.6$\pm$11.2 years; male: 1410 [78.3\%]), 170 (9.4\%) experienced TLF during follow-up of 3.1$\pm$1.4 years. In patients with LM, CTO, and bifurcation lesions, TLF rates were 13.5\%, 9.9\%, and 8.9\%, respectively. Overall, 160 (8.9\%) patients died (74 [4.1\%] from cardiac causes). AMI and TVMI rates were 6.0\% and 3.2\%, respectively. ST occurred in 11 (1.1\%) patients while 77 (4.3\%) underwent TLR. Multivariable analysis identified the following predictors of TLF: age, STEMI with cardiogenic shock, impaired left ventricular ejection fraction, diabetes, and renal dysfunction. Among the procedural variables, total stent length increased TLF risk (HR: 1.01, 95\% CI: 1-1.02 per mm increase), while intracoronary imaging reduced the risk substantially (HR: 0.35, 95\% CI: 0.12-0.82). Conclusions: Ultrathin-strut DES showed high efficacy and satisfactory safety, even in patients with challenging coronary lesions. Yet, despite using contemporary gold-standard DES, the association persisted between established patient- and procedure-related features of risk and impaired 3-year clinical outcome.}, author = {de Filippo, Ovidio and Bruno, Francesco and Pinxterhuis, Tineke H. and G{\k a}sior, Mariusz and Perl, Leor and Gaido, Luca and Tuttolomondo, Domenico and Greco, Antonio and Verardi, Roberto and Lo Martire, Gianluca and Iannaccone, Mario and Leone, Attilio and Liccardo, Gaetano and Caglioni, Serena and Gonz{\'a}lez Ferreiro, Rocio and Rodin{\`o}, Giulio and Musumeci, Giuseppe and Patti, Giuseppe and Borzillo, Irene and Tarantini, Giuseppe and Wa{\'n}ha, Wojciech and Casella, Bruno and Ploumen, Eline H and Pyka, {\L}ukasz and Kornowski, Ran and Gagnor, Andrea and Piccolo, Raffaele and Roubin, Sergio Raposeiras and Capodanno, Davide and Zocca, Paolo and Conrotto, Federico and De Ferrari, Gaetano M and von Birgelen, Clemens and D'Ascenzo, Fabrizio}, doi = {10.1002/ccd.30696}, journal = {Catheterization and Cardiovascular Interventions}, title = {Predictors of target lesion failure after treatment of left main, bifurcation, or chronic total occlusion lesions with ultrathin-strut drug-eluting coronary stents in the ULTRA registry}, url = {https://onlinelibrary.wiley.com/doi/full/10.1002/ccd.30696}, year = {2023}, bdsk-url-1 = {https://onlinelibrary.wiley.com/doi/full/10.1002/ccd.30696}, bdsk-url-2 = {https://doi.org/10.1002/ccd.30696} }
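For readers unfamiliar with the methodology, a hedged illustration of a Cox multivariable analysis of the kind reported above, using the lifelines library on synthetic data. All column names, effect sizes, and the simulated cohort are invented for the example; this is not the study's code or data.

    import numpy as np
    import pandas as pd
    from lifelines import CoxPHFitter

    rng = np.random.default_rng(0)
    n = 200
    df = pd.DataFrame({
        "age": rng.normal(66, 11, n),
        "stent_length_mm": rng.normal(35, 12, n),
    })
    # Synthetic hazard: older age and longer stents shorten time to failure
    risk = 0.02 * df["age"] + 0.01 * df["stent_length_mm"]
    df["years"] = rng.exponential(np.exp(-(risk - risk.mean())))
    df["tlf"] = (df["years"] < 3.0).astype(int)   # event observed within follow-up
    df.loc[df["tlf"] == 0, "years"] = 3.0         # administrative censoring at ~3 years

    cph = CoxPHFitter()
    cph.fit(df, duration_col="years", event_col="tlf")
    print(cph.hazard_ratios_)  # e.g., HR per year of age and per mm of stent length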
- M. Pennisi, F. Proietto Salanitri, G. Bellitto, B. Casella, M. Aldinucci, S. Palazzo, and C. Spampinato, "Experience Replay as an Effective Strategy for Optimizing Decentralized Federated Learning," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023.
[BibTeX] [Abstract] [Download PDF]
Federated and continual learning are training paradigms addressing data distribution shift in space and time. More specifically, federated learning tackles non-i.i.d. data in space as information is distributed in multiple nodes, while continual learning faces the temporal aspect of training as it deals with continuous streams of data. Distribution shifts over space and time are what happen in real federated learning scenarios, which pose multiple challenges. First, the federated model needs to learn sequentially while retaining knowledge from the past training rounds. Second, the model also has to deal with concept drift from the distributed data distributions. To address these complexities, we attempt to combine continual and federated learning strategies by proposing a solution inspired by experience replay and generative adversarial concepts for supporting decentralized distributed training. In particular, our approach relies on using limited memory buffers of synthetic privacy-preserving samples and interleaving training on local data and on buffer data. By translating the CL formulation into the task of integrating distributed knowledge with local knowledge, our method enables models to effectively integrate learned representations from local nodes, providing models the capability to generalize across multiple datasets. We test our integrated strategy on two realistic medical image analysis tasks — tuberculosis and melanoma classification — using multiple datasets in order to simulate realistic non-i.i.d. medical data scenarios. Results show that our approach achieves performance comparable to standard (non-federated) learning and significantly outperforms state-of-the-art federated methods in their centralized (thus, more favourable) formulation.
@inproceedings{23:casella:continualFL, author = {Pennisi, Matteo and Proietto Salanitri, Federica and Bellitto, Giovanni and Casella, Bruno and Aldinucci, Marco and Palazzo, Simone and Spampinato, Concetto}, doi = {TO DO}, institution = {Computer Science Department, University of Torino}, note = {https://openaccess.thecvf.com/content/ICCV2023W/VCL/papers/Pennisi_Experience_Replay_as_an_Effective_Strategy_for_Optimizing_Decentralized_Federated_ICCVW_2023_paper.pdf}, title = {Experience Replay as an Effective Strategy for Optimizing Decentralized Federated Learning}, url = {https://openaccess.thecvf.com/content/ICCV2023W/VCL/papers/Pennisi_Experience_Replay_as_an_Effective_Strategy_for_Optimizing_Decentralized_Federated_ICCVW_2023_paper.pdf}, year = 2023, abstract = {Federated and continual learning are training paradigms addressing data distribution shift in space and time. More specifically, federated learning tackles non-i.i.d data in space as information is distributed in multiple nodes, while continual learning faces with temporal aspect of training as it deals with continuous streams of data. Distribution shifts over space and time is what it happens in real federated learning scenarios that show multiple challenges. First, the federated model needs to learn sequentially while retaining knowledge from the past training rounds. Second, the model has also to deal with concept drift from the distributed data distributions. To address these complexities, we attempt to combine continual and federated learning strategies by proposing a solution inspired by experience replay and generative adversarial concepts for supporting decentralized distributed training. In particular, our approach relies on using limited memory buffers of synthetic privacy-preserving samples and interleaving training on local data and on buffer data. By translating the CL formulation into the task of integrating distributed knowledge with local knowledge, our method enables models to effectively integrate learned representation from local nodes, providing models the capability to generalize across multiple datasets. We test our integrated strategy on two realistic medical image analysis tasks — tuberculosis and melanoma classification — using multiple datasets in order to simulate realistic non-i.i.d. medical data scenarios. Results show that our approach achieves performance comparable to standard (non-federated) learning and significantly outperforms state-of-the-art federated methods in their centralized (thus, more favourable) formulation.} }
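A minimal sketch of the bounded memory buffer such a strategy relies on, assuming reservoir sampling as the eviction policy. The paper populates the buffer with privacy-preserving GAN outputs; the class below is illustrative only.

    import random

    class ReplayBuffer:
        """Fixed-capacity buffer kept representative via reservoir sampling."""
        def __init__(self, capacity):
            self.capacity, self.data, self.seen = capacity, [], 0

        def add(self, sample):
            # Each of the `seen` samples ends up stored with equal probability.
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append(sample)
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = sample

        def draw(self, k):
            # Batch of replay samples to interleave with local training data.
            return random.sample(self.data, min(k, len(self.data)))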
- B. Casella and L. Paletto, "Predicting Cryptocurrencies Market Phases through On-Chain Data Long-Term Forecasting," in Proceedings of the IEEE International Conference on Blockchain and Cryptocurrency (ICBC), Dubai, United Arab Emirates, 2023.
[BibTeX] [Abstract] [Download PDF]
Blockchain, the underlying technology of Bitcoin and several other cryptocurrencies, like Ethereum, produces a massive amount of open-access data that can be analyzed, providing important information about the network's activity and its respective token. The on-chain data have extensively been used as input to Machine Learning algorithms for predicting cryptocurrencies' future prices; however, there is a lack of studies predicting the future behaviour of on-chain data. This study aims to show how on-chain data can be used to detect cryptocurrency market regimes, like minimum and maximum, bear and bull market phases, and how forecasting these data can provide an optimal asset allocation for long-term investors.
@inproceedings{23:casella:blockchain, abstract = {Blockchain, the underlying technology of Bitcoin and several other cryptocurrencies, like Ethereum, produces a massive amount of open-access data that can be analyzed, providing important information about the network's activity and its respective token. The on-chain data have extensively been used as input to Machine Learning algorithms for predicting cryptocurrencies' future prices; however, there is a lack of study in predicting the future behaviour of on-chain data. This study aims to show how on-chain data can be used to detect cryptocurrency market regimes, like minimum and maximum, bear and bull market phases, and how forecasting these data can provide an optimal asset allocation for long-term investors.}, address = {Dubai, United Arab Emirates}, author = {Casella, Bruno and Paletto, Lorenzo}, booktitle = {Proceedings of the {IEEE} International Conference on Blockchain and Cryptocurrency ({ICBC})}, institution = {Computer Science Department, University of Torino}, keywords = {epi, icsc}, note = {https://iris.unito.it/retrieve/2e845fe4-8562-4898-bb97-184675c455c0/6.%20ICBC23%20-%20PREDICTING%20BTC.pdf}, publisher = {{IEEE}}, title = {Predicting Cryptocurrencies Market Phases through On-Chain Data Long-Term Forecasting}, url = {https://iris.unito.it/retrieve/2e845fe4-8562-4898-bb97-184675c455c0/6.%20ICBC23%20-%20PREDICTING%20BTC.pdf}, year = {2023}, bdsk-url-1 = {https://iris.unito.it/retrieve/2e845fe4-8562-4898-bb97-184675c455c0/6.%20ICBC23%20-%20PREDICTING%20BTC.pdf} }
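As a toy illustration of regime detection on an on-chain series, a moving-average crossover labeller in pandas. This is far simpler than the forecasting models the paper evaluates, and the window lengths are arbitrary.

    import numpy as np
    import pandas as pd

    def label_regimes(series: pd.Series, short=30, long=200) -> pd.Series:
        # 'bull' while the short-window mean sits above the long-window mean;
        # the first `long` warm-up points default to 'bear' because of NaNs.
        fast = series.rolling(short).mean()
        slow = series.rolling(long).mean()
        return pd.Series(np.where(fast > slow, "bull", "bear"), index=series.index)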
- B. Casella, R. Esposito, A. Sciarappa, C. Cavazzoni, and M. Aldinucci, "Experimenting with Normalization Layers in Federated Learning on non-IID scenarios," technical report, Computer Science Department, University of Torino, 2023.
[BibTeX] [Abstract] [Download PDF]
Training Deep Learning (DL) models requires large, high-quality datasets, often assembled with data from different institutions. Federated Learning (FL) has been emerging as a method for privacy-preserving pooling of datasets employing collaborative training from different institutions by iteratively globally aggregating locally trained models. One critical performance challenge of FL is operating on datasets not independently and identically distributed (non-IID) among the federation participants. Even though this fragility cannot be eliminated, it can be mitigated by a suitable optimization of two hyperparameters: layer normalization methods and collaboration frequency selection. In this work, we benchmark five different normalization layers for training Neural Networks (NNs), two families of non-IID data skew, and two datasets. Results show that Batch Normalization, widely employed for centralized DL, is not the best choice for FL, whereas Group and Layer Normalization consistently outperform Batch Normalization. Similarly, frequent model aggregation decreases convergence speed and model quality.
@techreport{23:casella:normalization, abstract = {Training Deep Learning (DL) models require large, high-quality datasets, often assembled with data from different institutions. Federated Learning (FL) has been emerging as a method for privacy-preserving pooling of datasets employing collaborative training from different institutions by iteratively globally aggregating locally trained models. One critical performance challenge of FL is operating on datasets not independently and identically distributed (non-IID) among the federation participants. Even though this fragility cannot be eliminated, it can be debunked by a suitable optimization of two hyperparameters: layer normalization methods and collaboration frequency selection. In this work, we benchmark five different normalization layers for training Neural Networks (NNs), two families of non-IID data skew, and two datasets. Results show that Batch Normalization, widely employed for centralized DL, is not the best choice for FL, whereas Group and Layer Normalization consistently outperform Batch Normalization. Similarly, frequent model aggregation decreases convergence speed and mode quality.}, author = {Casella, Bruno and Esposito, Roberto and Sciarappa, Antonio and Cavazzoni, Carlo and Aldinucci, Marco}, institution = {Computer Science Department, University of Torino}, keywords = {epi, icsc}, title = {Experimenting with Normalization Layers in Federated Learning on non-IID scenarios}, url = {https://arxiv.org/pdf/2303.10630.pdf}, year = {2023}, bdsk-url-1 = {https://arxiv.org/pdf/2303.10630.pdf} }
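A sketch of how such a benchmark can swap normalization layers behind a single factory, assuming PyTorch. The set below overlaps with, but is not guaranteed to match, the five layers compared in the report.

    import torch.nn as nn

    def make_norm(kind: str, channels: int) -> nn.Module:
        # One switch point so the rest of the network definition stays fixed.
        return {
            "batch": nn.BatchNorm2d(channels),
            "group": nn.GroupNorm(num_groups=8, num_channels=channels),
            # GroupNorm with one group normalizes over all of C, H, W,
            # which is the usual "LayerNorm for convnets" formulation.
            "layer": nn.GroupNorm(num_groups=1, num_channels=channels),
            "instance": nn.InstanceNorm2d(channels),
            "none": nn.Identity(),
        }[kind]

    block = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1),
                          make_norm("group", 32), nn.ReLU())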
- I. Colonnelli, B. Casella, G. Mittone, Y. Arfat, B. Cantalupo, R. Esposito, A. R. Martinelli, D. Medić, and M. Aldinucci, "Federated Learning meets HPC and cloud," in Astrophysics and Space Science Proceedings, Catania, Italy, 2023, pp. 193–199. doi:10.1007/978-3-031-34167-0_39
[BibTeX] [Abstract] [Download PDF]
HPC and AI are fated to meet for several reasons. This article will discuss some of them and argue why this will happen through the set of methods and technologies that underpin cloud computing. As a paradigmatic example, we present a new federated learning system that collaboratively trains a deep learning model in different supercomputing centers. The system is based on the StreamFlow workflow manager designed for hybrid cloud-HPC infrastructures.
@inproceedings{22:ml4astro, abstract = {HPC and AI are fated to meet for several reasons. This article will discuss some of them and argue why this will happen through the set of methods and technologies that underpin cloud computing. As a paradigmatic example, we present a new federated learning system that collaboratively trains a deep learning model in different supercomputing centers. The system is based on the StreamFlow workflow manager designed for hybrid cloud-HPC infrastructures.}, address = {Catania, Italy}, author = {Iacopo Colonnelli and Bruno Casella and Gianluca Mittone and Yasir Arfat and Barbara Cantalupo and Roberto Esposito and Alberto Riccardo Martinelli and Doriana Medi\'{c} and Marco Aldinucci}, booktitle = {Astrophysics and Space Science Proceedings}, volume = {60}, pages = {193--199}, editor = {Bufano, Filomena and Riggi, Simone and Sciacca, Eva and Schilliro, Francesco}, keywords = {across, eupilot, streamflow}, publisher = {Springer}, title = {Federated Learning meets {HPC} and cloud}, url = {https://iris.unito.it/retrieve/5631da1c-96a0-48c0-a48e-2cdf6b84841d/main.pdf}, year = {2023}, doi = {10.1007/978-3-031-34167-0_39}, isbn = {978-3-031-34167-0}, bdsk-url-1 = {https://iris.unito.it/retrieve/5631da1c-96a0-48c0-a48e-2cdf6b84841d/main.pdf} }
2022
- B. Casella, R. Esposito, C. Cavazzoni, and M. Aldinucci, "Benchmarking FedAvg and FedCurv for Image Classification Tasks," in Proceedings of the 1st Italian Conference on Big Data and Data Science, ITADATA 2022, September 20-21, 2022, 2022.
[BibTeX] [Abstract] [Download PDF]
Classic Machine Learning (ML) techniques require training on data available in a single data lake (either centralized or distributed). However, aggregating data from different owners is not always convenient for different reasons, including security, privacy and secrecy. Data carry a value that might vanish when shared with others; the ability to avoid sharing the data enables industrial applications where security and privacy are of paramount importance, making it possible to train global models by implementing only local policies which can be run independently and even on air-gapped data centres. Federated Learning (FL) is a distributed machine learning approach which has emerged as an effective way to address privacy concerns by only sharing local AI models while keeping the data decentralized. Two critical challenges of Federated Learning are managing the heterogeneous systems in the same federated network and dealing with real data, which are often not independently and identically distributed (non-IID) among the clients. In this paper, we focus on the second problem, i.e., the problem of statistical heterogeneity of the data in the same federated network. In this setting, local models might be strayed far from the local optimum of the complete dataset, thus possibly hindering the convergence of the federated model. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv), aiming at tackling the non-IID setting, have already been proposed. This work provides an empirical assessment of the behaviour of FedAvg and FedCurv in common non-IID scenarios. Results show that the number of epochs per round is an important hyper-parameter that, when tuned appropriately, can lead to significant performance gains while reducing the communication cost. As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
@inproceedings{casella2022benchmarking, author = {Bruno Casella and Roberto Esposito and Carlo Cavazzoni and Marco Aldinucci}, booktitle = {Proceedings of the 1st Italian Conference on Big Data and Data Science, {ITADATA} 2022, September 20-21, 2022}, editor = {Marco Anisetti and Angela Bonifati and Nicola Bena and Claudio Ardagna and Donato Malerba}, keywords = {eupilot}, publisher = {CEUR-WS.org}, series = {{CEUR} Workshop Proceedings}, title = {Benchmarking FedAvg and FedCurv for Image Classification Tasks}, url = {https://ceur-ws.org/Vol-3340/paper40.pdf}, volume = {3340}, year = {2022}, abstract = {Classic Machine Learning (ML) techniques require training on data available in a single data lake (either centralized or distributed). However, aggregating data from different owners is not always convenient for different reasons, including security, privacy and secrecy. Data carry a value that might vanish when shared with others; the ability to avoid sharing the data enables industrial applications where security and privacy are of paramount importance, making it possible to train global models by implementing only local policies which can be run independently and even on air-gapped data centres. Federated Learning (FL) is a distributed machine learning approach which has emerged as an effective way to address privacy concerns by only sharing local AI models while keeping the data decentralized. Two critical challenges of Federated Learning are managing the heterogeneous systems in the same federated network and dealing with real data, which are often not independently and identically distributed (non-IID) among the clients. In this paper, we focus on the second problem, i.e., the problem of statistical heterogeneity of the data in the same federated network. In this setting, local models might be strayed far from the local optimum of the complete dataset, thus possibly hindering the convergence of the federated model. Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv), aiming at tackling the non-IID setting, have already been proposed. This work provides an empirical assessment of the behaviour of FedAvg and FedCurv in common non-IID scenarios. Results show that the number of epochs per round is an important hyper-parameter that, when tuned appropriately, can lead to significant performance gains while reducing the communication cost. As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.}, bdsk-url-1 = {https://iris.unito.it/bitstream/2318/1870961/1/Benchmarking_FedAvg_and_FedCurv_for_Image_Classification_Tasks.pdf} }
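For reference, a minimal FedAvg aggregation step as commonly formulated (the standard algorithm, not the paper's benchmarking harness): the server averages client weights in proportion to local dataset sizes, while the number of local epochs per round is the hyper-parameter the paper tunes.

    import torch

    def fedavg(client_states, client_sizes):
        """client_states: list of model state_dicts; client_sizes: samples per client."""
        total = float(sum(client_sizes))
        avg = {}
        for key in client_states[0]:
            # Weighted average of each parameter tensor across clients.
            avg[key] = sum(sd[key].float() * (n / total)
                           for sd, n in zip(client_states, client_sizes))
        return avg

    # Usage: global_model.load_state_dict(fedavg(states, sizes)) before the next round.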
- B. Casella, A. Chisari, S. Battiato, and M. Giuffrida, "Transfer Learning via Test-time Neural Networks Aggregation," in Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2022, Volume 5: VISAPP, Online Streaming, February 6-8, 2022, 2022, pp. 642-649. doi:10.5220/0010907900003124
[BibTeX] [Abstract] [Download PDF]
It has been demonstrated that deep neural networks outperform traditional machine learning. However, deep networks lack generalisability, that is, they will not perform as well on a new (testing) set drawn from a different distribution, due to domain shift. In order to tackle this known issue, several transfer learning approaches have been proposed, where the knowledge of a trained model is transferred into another to improve performance with different data. However, most of these approaches require additional training steps, or they suffer from catastrophic forgetting that occurs when a trained model has overwritten previously learnt knowledge. We address both problems with a novel transfer learning approach that uses network aggregation. We train dataset-specific networks together with an aggregation network in a unified framework. The loss function includes two main components: a task-specific loss (such as cross-entropy) and an aggregation loss. The proposed aggregation loss allows our model to learn how trained deep network parameters can be aggregated with an aggregation operator. We demonstrate that the proposed approach learns model aggregation at test time without any further training step, reducing the burden of transfer learning to a simple arithmetical operation. The proposed approach achieves comparable performance w.r.t. the baseline. Besides, if the aggregation operator has an inverse, we show that our model also inherently allows for selective forgetting, i.e., the aggregated model can forget one of the datasets it was trained on, retaining information on the others.
@inproceedings{22:VISAPP:transferlearning, abstract = {It has been demonstrated that deep neural networks outperform traditional machine learning. However, deep networks lack generalisability, that is, they will not perform as good as in a new (testing) set drawn from a different distribution due to the domain shift. In order to tackle this known issue, several transfer learning approaches have been proposed, where the knowledge of a trained model is transferred into another to improve performance with different data. However, most of these approaches require additional training steps, or they suffer from catastrophic forgetting that occurs when a trained model has overwritten previously learnt knowledge. We address both problems with a novel transfer learning approach that uses network aggregation. We train dataset-specific networks together with an aggregation network in a unified framework. The loss function includes two main components: a task-specific loss (such as cross-entropy) and an aggregation loss. The proposed aggregation loss allows our model to learn how trained deep network parameters can be aggregated with an aggregation operator. We demonstrate that the proposed approach learns model aggregation at test time without any further training step, reducing the burden of transfer learning to a simple arithmetical operation. The proposed approach achieves comparable performance w.r.t. the baseline. Besides, if the aggregation operator has an inverse, we will show that our model also inherently allows for selective forgetting, i.e., the aggregated model can forget one of the datasets it was trained on, retaining information on the others.}, author = {Bruno Casella and Alessio Chisari and Sebastiano Battiato and Mario Giuffrida}, booktitle = {Proceedings of the 17th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, {VISIGRAPP} 2022, Volume 5: VISAPP, Online Streaming, February 6-8, 2022}, doi = {10.5220/0010907900003124}, editor = {Giovanni Maria Farinella and Petia Radeva and Kadi Bouatouch}, isbn = {978-989-758-555-5}, organization = {INSTICC}, pages = {642-649}, publisher = {SciTePress}, title = {Transfer Learning via Test-time Neural Networks Aggregation}, url = {https://iris.unito.it/retrieve/handle/2318/1844159/947123/TRANSFER_LEARNING_VIA_TEST_TIME_NEURAL_NETWORKS_AGGREGATION.pdf}, year = {2022}, bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1844159/947123/TRANSFER_LEARNING_VIA_TEST_TIME_NEURAL_NETWORKS_AGGREGATION.pdf}, bdsk-url-2 = {https://doi.org/10.5220/0010907900003124} }
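A hedged sketch of test-time aggregation with an invertible operator: here the operator is a plain average of parameters, whose inverse enables forgetting one of the aggregated networks. This is illustrative only; the paper learns the aggregation through a dedicated loss rather than fixing it to an average.

    import torch

    def aggregate(state_a, state_b):
        # Average two compatible state_dicts at test time, with no extra training.
        return {k: (state_a[k].float() + state_b[k].float()) / 2 for k in state_a}

    def forget_b(aggregated, state_b):
        # Invert the average to recover A's parameters (selective forgetting).
        return {k: 2 * aggregated[k] - state_b[k].float() for k in aggregated}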