Iacopo Colonnelli

PhD Candidate
Alpha Research Group (Parallel Computing)
University of Turin, Computer Science Dept.
Via Pessinetto 12, 10149 Torino – Italy
Email: iacopo.colonnelli@unito.it

Short Bio

Iacopo Colonnelli is a Ph.D. student in Modeling and Data Science at Università di Torino. He received his master’s degree in Computer Engineering from Politecnico di Torino with a thesis on a high-performance parallel tracking algorithm for the ALICE experiment at CERN.

His research focuses on both statistical and computational aspects of data analysis at large scale and on workflow modeling and management in heterogeneous distributed architectures.

Publications

2021

  • I. Colonnelli, B. Cantalupo, R. Esposito, M. Pennisi, C. Spampinato, and M. Aldinucci, “HPC Application Cloudification: The StreamFlow Toolkit,” in 12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021), Dagstuhl, Germany, 2021, pp. 5:1–5:13. doi:10.4230/OASIcs.PARMA-DITAM.2021.5

    Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud infrastructure can be offloaded to HPC environments to speed them up. We introduce StreamFlow, a novel Workflow Management System that supports such a design pattern and makes it possible to run the steps of a standard workflow model on independent processing elements with no shared storage. We validated the proposed approach’s effectiveness on the CLAIRE COVID-19 universal pipeline, i.e. a reproducible workflow capable of automating the comparison of (possibly all) state-of-the-art pipelines for the diagnosis of COVID-19 interstitial pneumonia from CT scans images based on Deep Neural Networks (DNNs).

    @inproceedings{colonnelli_et_al:OASIcs.PARMA-DITAM.2021.5,
    abstract = {Finding an effective way to improve accessibility to High-Performance Computing facilities, still anchored to SSH-based remote shells and queue-based job submission mechanisms, is an open problem in computer science. This work advocates a cloudification of HPC applications through a cluster-as-accelerator pattern, where computationally demanding portions of the main execution flow hosted on a Cloud infrastructure can be offloaded to HPC environments to speed them up. We introduce StreamFlow, a novel Workflow Management System that supports such a design pattern and makes it possible to run the steps of a standard workflow model on independent processing elements with no shared storage. We validated the proposed approach's effectiveness on the CLAIRE COVID-19 universal pipeline, i.e. a reproducible workflow capable of automating the comparison of (possibly all) state-of-the-art pipelines for the diagnosis of COVID-19 interstitial pneumonia from CT scans images based on Deep Neural Networks (DNNs).},
    address = {Dagstuhl, Germany},
    annote = {Keywords: cloud computing, distributed computing, high-performance computing, streamflow, workflow management systems},
    author = {Colonnelli, Iacopo and Cantalupo, Barbara and Esposito, Roberto and Pennisi, Matteo and Spampinato, Concetto and Aldinucci, Marco},
    booktitle = {12th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and 10th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms (PARMA-DITAM 2021)},
    doi = {10.4230/OASIcs.PARMA-DITAM.2021.5},
    editor = {Bispo, Jo\~{a}o and Cherubin, Stefano and Flich, Jos\'{e}},
    isbn = {978-3-95977-181-8},
    issn = {2190-6807},
    keywords = {deephealth, hpc4ai},
    pages = {5:1--5:13},
    publisher = {Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
    series = {Open Access Series in Informatics (OASIcs)},
    title = {{HPC Application Cloudification: The StreamFlow Toolkit}},
    url = {https://drops.dagstuhl.de/opus/volltexte/2021/13641/pdf/OASIcs-PARMA-DITAM-2021-5.pdf},
    urn = {urn:nbn:de:0030-drops-136419},
    volume = {88},
    year = {2021},
    bdsk-url-1 = {https://drops.dagstuhl.de/opus/volltexte/2021/13641/pdf/OASIcs-PARMA-DITAM-2021-5.pdf},
    bdsk-url-2 = {https://doi.org/10.4230/OASIcs.PARMA-DITAM.2021.5}
    }

  • F. D’Ascenzo, O. De Filippo, G. Gallone, G. Mittone, M. A. Deriu, M. Iannaccone, A. Ariza-Solé, C. Liebetrau, S. Manzano-Fernández, G. Quadri, T. Kinnaird, G. Campo, J. P. Simao Henriques, J. M. Hughes, A. Dominguez-Rodriguez, M. Aldinucci, U. Morbiducci, G. Patti, S. Raposeiras-Roubin, E. Abu-Assi, G. M. De Ferrari, F. Piroli, A. Saglietto, F. Conrotto, P. Omedé, A. Montefusco, M. Pennone, F. Bruno, P. P. Bocchino, G. Boccuzzi, E. Cerrato, F. Varbella, M. Sperti, S. B. Wilton, L. Velicki, I. Xanthopoulou, A. Cequier, A. Iniguez-Romo, I. Munoz Pousa, M. Cespon Fernandez, B. Caneiro Queija, R. Cobas-Paz, A. Lopez-Cuenca, A. Garay, P. F. Blanco, A. Rognoni, G. Biondi Zoccai, S. Biscaglia, I. Nunez-Gil, T. Fujii, A. Durante, X. Song, T. Kawaji, D. Alexopoulos, Z. Huczek, J. R. Gonzalez Juanatey, S. Nie, M. Kawashiri, I. Colonnelli, B. Cantalupo, R. Esposito, S. Leonardi, W. Grosso Marra, A. Chieffo, U. Michelucci, D. Piga, M. Malavolta, S. Gili, M. Mennuni, C. Montalto, L. Oltrona Visconti, and Y. Arfat, “Machine learning-based prediction of adverse events following an acute coronary syndrome (PRAISE): a modelling study of pooled datasets,” The Lancet, vol. 397, iss. 10270, pp. 199–207, 2021. doi:10.1016/S0140-6736(20)32519-8

    Background: The accuracy of current prediction tools for ischaemic and bleeding events after an acute coronary syndrome (ACS) remains insufficient for individualised patient management strategies. We developed a machine learning-based risk stratification model to predict all-cause death, recurrent acute myocardial infarction, and major bleeding after ACS. Methods: Different machine learning models for the prediction of 1-year post-discharge all-cause death, myocardial infarction, and major bleeding (defined as Bleeding Academic Research Consortium type 3 or 5) were trained on a cohort of 19826 adult patients with ACS (split into a training cohort [80%] and internal validation cohort [20%]) from the BleeMACS and RENAMI registries, which included patients across several continents. 25 clinical features routinely assessed at discharge were used to inform the models. The best-performing model for each study outcome (the PRAISE score) was tested in an external validation cohort of 3444 patients with ACS pooled from a randomised controlled trial and three prospective registries. Model performance was assessed according to a range of learning metrics including area under the receiver operating characteristic curve (AUC). Findings: The PRAISE score showed an AUC of 0.82 (95% CI 0.78-0.85) in the internal validation cohort and 0.92 (0.90-0.93) in the external validation cohort for 1-year all-cause death; an AUC of 0.74 (0.70-0.78) in the internal validation cohort and 0.81 (0.76-0.85) in the external validation cohort for 1-year myocardial infarction; and an AUC of 0.70 (0.66-0.75) in the internal validation cohort and 0.86 (0.82-0.89) in the external validation cohort for 1-year major bleeding. Interpretation: A machine learning-based approach for the identification of predictors of events after an ACS is feasible and effective. The PRAISE score showed accurate discriminative capabilities for the prediction of all-cause death, myocardial infarction, and major bleeding, and might be useful to guide clinical decision making.

    @article{21:lancet,
    abstract = {Background The accuracy of current prediction tools for ischaemic and bleeding events after an acute coronary syndrome (ACS) remains insufficient for individualised patient management strategies. We developed a machine learning-based risk stratification model to predict all-cause death, recurrent acute myocardial infarction, and major bleeding after ACS.
    Methods Different machine learning models for the prediction of 1-year post-discharge all-cause death, myocardial infarction, and major bleeding (defined as Bleeding Academic Research Consortium type 3 or 5) were trained on a cohort of 19826 adult patients with ACS (split into a training cohort [80%] and internal validation cohort [20%]) from the BleeMACS and RENAMI registries, which included patients across several continents. 25 clinical features routinely assessed at discharge were used to inform the models. The best-performing model for each study outcome (the PRAISE score) was tested in an external validation cohort of 3444 patients with ACS pooled from a randomised controlled trial and three prospective registries. Model performance was assessed according to a range of learning metrics including area under the receiver operating characteristic curve (AUC).
    Findings The PRAISE score showed an AUC of 0.82 (95% CI 0.78-0.85) in the internal validation cohort and 0.92 (0.90-0.93) in the external validation cohort for 1-year all-cause death; an AUC of 0.74 (0.70-0.78) in the internal validation cohort and 0.81 (0.76-0.85) in the external validation cohort for 1-year myocardial infarction; and an AUC of 0.70 (0.66-0.75) in the internal validation cohort and 0.86 (0.82-0.89) in the external validation cohort for 1-year major bleeding.
    Interpretation A machine learning-based approach for the identification of predictors of events after an ACS is feasible and effective. The PRAISE score showed accurate discriminative capabilities for the prediction of all-cause death, myocardial infarction, and major bleeding, and might be useful to guide clinical decision making.},
    author = {Fabrizio D'Ascenzo and Ovidio {De Filippo} and Guglielmo Gallone and Gianluca Mittone and Marco Agostino Deriu and Mario Iannaccone and Albert Ariza-Sol\'e and Christoph Liebetrau and Sergio Manzano-Fern\'andez and Giorgio Quadri and Tim Kinnaird and Gianluca Campo and Jose Paulo {Simao Henriques} and James M Hughes and Alberto Dominguez-Rodriguez and Marco Aldinucci and Umberto Morbiducci and Giuseppe Patti and Sergio Raposeiras-Roubin and Emad Abu-Assi and Gaetano Maria {De Ferrari} and Francesco Piroli and Andrea Saglietto and Federico Conrotto and Pierluigi Omed\'e and Antonio Montefusco and Mauro Pennone and Francesco Bruno and Pier Paolo Bocchino and Giacomo Boccuzzi and Enrico Cerrato and Ferdinando Varbella and Michela Sperti and Stephen B. Wilton and Lazar Velicki and Ioanna Xanthopoulou and Angel Cequier and Andres Iniguez-Romo and Isabel {Munoz Pousa} and Maria {Cespon Fernandez} and Berenice {Caneiro Queija} and Rafael Cobas-Paz and Angel Lopez-Cuenca and Alberto Garay and Pedro Flores Blanco and Andrea Rognoni and Giuseppe {Biondi Zoccai} and Simone Biscaglia and Ivan Nunez-Gil and Toshiharu Fujii and Alessandro Durante and Xiantao Song and Tetsuma Kawaji and Dimitrios Alexopoulos and Zenon Huczek and Jose Ramon {Gonzalez Juanatey} and Shao-Ping Nie and Masa-aki Kawashiri and Iacopo Colonnelli and Barbara Cantalupo and Roberto Esposito and Sergio Leonardi and Walter {Grosso Marra} and Alaide Chieffo and Umberto Michelucci and Dario Piga and Marta Malavolta and Sebastiano Gili and Marco Mennuni and Claudio Montalto and Luigi {Oltrona Visconti} and Yasir Arfat},
    date-modified = {2021-03-26 23:53:19 +0100},
    doi = {10.1016/S0140-6736(20)32519-8},
    issn = {0140-6736},
    journal = {The Lancet},
    keywords = {deephealth, hpc4ai},
    number = {10270},
    pages = {199--207},
    title = {Machine learning-based prediction of adverse events following an acute coronary syndrome {(PRAISE)}: a modelling study of pooled datasets},
    url = {https://www.researchgate.net/profile/James_Hughes3/publication/348501148_Machine_learning-based_prediction_of_adverse_events_following_an_acute_coronary_syndrome_PRAISE_a_modelling_study_of_pooled_datasets/links/6002a81ba6fdccdcb858b6c2/Machine-learning-based-prediction-of-adverse-events-following-an-acute-coronary-syndrome-PRAISE-a-modelling-study-of-pooled-datasets.pdf},
    volume = {397},
    year = {2021},
    bdsk-url-1 = {https://www.researchgate.net/profile/James_Hughes3/publication/348501148_Machine_learning-based_prediction_of_adverse_events_following_an_acute_coronary_syndrome_PRAISE_a_modelling_study_of_pooled_datasets/links/6002a81ba6fdccdcb858b6c2/Machine-learning-based-prediction-of-adverse-events-following-an-acute-coronary-syndrome-PRAISE-a-modelling-study-of-pooled-datasets.pdf},
    bdsk-url-2 = {https://doi.org/10.1016/S0140-6736(20)32519-8}
    }

2020

  • I. Colonnelli, B. Cantalupo, I. Merelli, and M. Aldinucci, “StreamFlow: cross-breeding cloud with HPC,” IEEE Transactions on Emerging Topics in Computing, 2020. doi:10.1109/TETC.2020.3019202

    Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and batch clusters. We present a novel approach to workflow execution, called StreamFlow, that complements the workflow graph with the declarative description of potentially complex execution environments and makes it possible to execute a workflow across multiple sites that do not share a common data space. StreamFlow is then exemplified on a novel bioinformatics pipeline for single-cell transcriptomic data analysis.

    @article{20Lstreamflow:tect,
    abstract = {Workflows are among the most commonly used tools in a variety of execution environments. Many of them target a specific environment; few of them make it possible to execute an entire workflow in different environments, e.g. Kubernetes and batch clusters. We present a novel approach to workflow execution, called StreamFlow, that complements the workflow graph with the declarative description of potentially complex execution environments and makes it possible to execute a workflow across multiple sites that do not share a common data space. StreamFlow is then exemplified on a novel bioinformatics pipeline for single-cell transcriptomic data analysis.},
    author = {Iacopo Colonnelli and Barbara Cantalupo and Ivan Merelli and Marco Aldinucci},
    date-added = {2020-08-27 09:29:49 +0200},
    date-modified = {2020-08-27 09:36:33 +0200},
    doi = {10.1109/TETC.2020.3019202},
    journal = {{IEEE} {T}ransactions on {E}merging {T}opics in {C}omputing},
    keywords = {deephealth, hpc4ai, streamflow},
    title = {{StreamFlow}: cross-breeding cloud with {HPC}},
    url = {https://arxiv.org/pdf/2002.01558},
    year = {2020},
    bdsk-url-1 = {https://arxiv.org/pdf/2002.01558},
    bdsk-url-2 = {https://doi.org/10.1109/TETC.2020.3019202}
    }

  • V. Cesare, I. Colonnelli, and M. Aldinucci, “Practical parallelization of scientific applications,” in Proc. of 28th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), Västerås, Sweden, 2020, pp. 376–384. doi:10.1109/PDP50117.2020.00064

    This work aims at distilling a systematic methodology to modernize existing sequential scientific codes with a limited re-designing effort, turning an old codebase into modern code, i.e., parallel and robust code. We propose an automatable methodology to parallelize scientific applications designed with a purely sequential programming mindset, thus possibly using global variables, aliasing, random number generators, and stateful functions. We demonstrate the methodology by way of an astrophysical application, where we model at the same time the kinematic profiles of 30 disk galaxies with a Monte Carlo Markov Chain (MCMC), which is sequential by definition. The parallel code exhibits a 12 times speedup on a 48-core platform.

    @inproceedings{20:looppar:pdp,
    abstract = {This work aims at distilling a systematic methodology to modernize existing sequential scientific codes with a limited re-designing effort, turning an old codebase into modern code, i.e., parallel and robust code. We propose an automatable methodology to parallelize scientific applications designed with a purely sequential programming mindset, thus possibly using global variables, aliasing, random number generators, and stateful functions. We demonstrate the methodology by way of an astrophysical application, where we model at the same time the kinematic profiles of 30 disk galaxies with a Monte Carlo Markov Chain (MCMC), which is sequential by definition. The parallel code exhibits a 12 times speedup on a 48-core platform.},
    address = {V{\"a}ster{\aa}s, Sweden},
    author = {Valentina Cesare and Iacopo Colonnelli and Marco Aldinucci},
    booktitle = {Proc. of 28th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
    date-modified = {2020-04-05 02:21:31 +0200},
    doi = {10.1109/PDP50117.2020.00064},
    keywords = {hpc4ai, c3s},
    pages = {376--384},
    publisher = {IEEE},
    title = {Practical Parallelization of Scientific Applications},
    url = {https://iris.unito.it/retrieve/handle/2318/1735377/601141/2020_looppar_PDP.pdf},
    year = {2020},
    bdsk-url-1 = {https://doi.org/10.1109/PDP50117.2020.00064},
    bdsk-url-2 = {https://iris.unito.it/retrieve/handle/2318/1735377/601141/2020_looppar_PDP.pdf}
    }

2019

  • P. Viviani, M. Drocco, D. Baccega, I. Colonnelli, and M. Aldinucci, “Deep learning at scale,” in Proc. of 27th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), Pavia, Italy, 2019, pp. 124–131. doi:10.1109/EMPDP.2019.8671552

    This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data parallel training. Established techniques for data parallel training are discussed from both a parallel computing and deep learning perspective, then a different approach is presented that is meant to allow DNN training to scale while retaining good convergence properties. Moreover, an experimental implementation is presented as well as some preliminary results.

    @inproceedings{19:deeplearn:pdp,
    abstract = {This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data parallel training. Established techniques for data parallel training are discussed from both a parallel computing and deep learning perspective, then a different approach is presented that is meant to allow DNN training to scale while retaining good convergence properties. Moreover, an experimental implementation is presented as well as some preliminary results.},
    address = {Pavia, Italy},
    author = {Paolo Viviani and Maurizio Drocco and Daniele Baccega and Iacopo Colonnelli and Marco Aldinucci},
    booktitle = {Proc. of 27th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
    date-added = {2020-01-30 10:48:12 +0100},
    date-modified = {2020-11-15 15:00:34 +0100},
    doi = {10.1109/EMPDP.2019.8671552},
    keywords = {machine learning},
    pages = {124--131},
    publisher = {IEEE},
    title = {Deep Learning at Scale},
    url = {https://iris.unito.it/retrieve/handle/2318/1695211/487778/19_deeplearning_PDP.pdf},
    year = {2019},
    bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1695211/487778/19_deeplearning_PDP.pdf},
    bdsk-url-2 = {https://doi.org/10.1109/EMPDP.2019.8671552}
    }

  • M. Drocco, P. Viviani, I. Colonnelli, M. Aldinucci, and M. Grangetto, “Accelerating spectral graph analysis through wavefronts of linear algebra operations,” in Proc. of 27th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), Pavia, Italy, 2019, pp. 9–16. doi:10.1109/EMPDP.2019.8671640

    The wavefront pattern captures the unfolding of a parallel computation in which data elements are laid out as a logical multidimensional grid and the dependency graph favours a diagonal sweep across the grid. In the emerging area of spectral graph analysis, the computation often consists of a wavefront running over a tiled matrix, involving expensive linear algebra kernels. While these applications might benefit from parallel heterogeneous platforms (multi-core with GPUs), programming wavefront applications directly with high-performance linear algebra libraries yields code that is complex to write and optimize for the specific application. We advocate a methodology based on two abstractions (linear algebra and parallel pattern-based run-time) that allows developers to write portable, self-configuring, and easy-to-profile code on hybrid platforms.

    @inproceedings{19:gsp:pdp,
    abstract = {The wavefront pattern captures the unfolding of a parallel computation in which data elements are laid out as a logical multidimensional grid and the dependency graph favours a diagonal sweep across the grid. In the emerging area of spectral graph analysis, the computation often consists of a wavefront running over a tiled matrix, involving expensive linear algebra kernels. While these applications might benefit from parallel heterogeneous platforms (multi-core with GPUs), programming wavefront applications directly with high-performance linear algebra libraries yields code that is complex to write and optimize for the specific application. We advocate a methodology based on two abstractions (linear algebra and parallel pattern-based run-time) that allows developers to write portable, self-configuring, and easy-to-profile code on hybrid platforms.},
    address = {Pavia, Italy},
    author = {Maurizio Drocco and Paolo Viviani and Iacopo Colonnelli and Marco Aldinucci and Marco Grangetto},
    booktitle = {Proc. of 27th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
    date-modified = {2021-04-24 23:22:22 +0200},
    doi = {10.1109/EMPDP.2019.8671640},
    pages = {9--16},
    publisher = {IEEE},
    title = {Accelerating spectral graph analysis through wavefronts of linear algebra operations},
    url = {https://iris.unito.it/retrieve/handle/2318/1695315/488105/19_wavefront_PDP.pdf},
    year = {2019},
    bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1695315/488105/19_wavefront_PDP.pdf},
    bdsk-url-2 = {https://doi.org/10.1109/EMPDP.2019.8671640}
    }