# Maurizio Drocco

Formerly a PhD student in the Parallel Computing group; since May 2018, Postdoctoral Research Associate in the High-Performance Computing (HPC) group at the Pacific Northwest National Laboratory (Richland, WA).

## Summary

Maurizio Drocco received his Ph.D. in Computer Science from the University of Torino in October 2017, with a thesis on distributed programming in C++.

He was a Research Intern at the IBM T.J. Watson Research Center (NY) in 2015, working on parallel graph processing (Graph500), and at the IBM Dublin Research Lab in 2013, working on semi-automatic tuning of parallel applications. He has been a research associate at the University of Torino since 2009. He has co-authored papers in international journals and conference proceedings (Google Scholar h-index 8).

His research focuses on high-level parallel programming for high-performance computing, in particular models and methods for heterogeneous platforms.

## Publications

### 2019

• P. Viviani, M. Drocco, D. Baccega, I. Colonnelli, and M. Aldinucci, “Deep learning at scale,” in Proc. of 27th Euromicro Intl. Conference on Parallel, Distributed and Network-Based Processing (PDP), Pavia, Italy, 2019, pp. 124-131. doi:10.1109/EMPDP.2019.8671552

This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data parallel training. Established techniques for data parallel training are discussed from both a parallel computing and deep learning perspective, then a different approach is presented that is meant to allow DNN training to scale while retaining good convergence properties. Moreover, an experimental implementation is presented as well as some preliminary results.

@inproceedings{19:deeplearn:pdp,
abstract = {This work presents a novel approach to distributed training of deep neural networks (DNNs) that aims to overcome the issues related to mainstream approaches to data parallel training. Established techniques for data parallel training are discussed from both a parallel computing and deep learning perspective, then a different approach is presented that is meant to allow DNN training to scale while retaining good convergence properties. Moreover, an experimental implementation is presented as well as some preliminary results.},
author = {Paolo Viviani and Maurizio Drocco and Daniele Baccega and Iacopo Colonnelli and Marco Aldinucci},
booktitle = {Proc. of 27th Euromicro Intl. Conference on Parallel, Distributed and Network-Based Processing (PDP)},
date-modified = {2020-01-30 10:48:12 +0100},
doi = {10.1109/EMPDP.2019.8671552},
keywords = {deep learning, distributed computing, machine learning, large scale, C++},
pages = {124-131},
publisher = {IEEE},
title = {Deep Learning at Scale},
url = {https://iris.unito.it/retrieve/handle/2318/1695211/487778/19_deeplearning_PDP.pdf},
year = {2019},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1695211/487778/19_deeplearning_PDP.pdf}
}

• M. Drocco, P. Viviani, I. Colonnelli, M. Aldinucci, and M. Grangetto, “Accelerating spectral graph analysis through wavefronts of linear algebra operations,” in Proc. of 27th Euromicro Intl. Conference on Parallel, Distributed and Network-Based Processing (PDP), Pavia, Italy, 2019, pp. 9-16. doi:10.1109/EMPDP.2019.8671640

The wavefront pattern captures the unfolding of a parallel computation in which data elements are laid out as a logical multidimensional grid and the dependency graph favours a diagonal sweep across the grid. In the emerging area of spectral graph analysis, the computing often consists in a wavefront running over a tiled matrix, involving expensive linear algebra kernels. While these applications might benefit from parallel heterogeneous platforms (multi-core with GPUs), programming wavefront applications directly with high-performance linear algebra libraries yields code that is complex to write and optimize for the specific application. We advocate a methodology based on two abstractions (linear algebra and parallel pattern-based run-time) that allows developing portable, self-configuring, and easy-to-profile code on hybrid platforms.

@inproceedings{19:gsp:pdp,
abstract = {The wavefront pattern captures the unfolding of a parallel computation in which data elements are laid out as a logical multidimensional grid and the dependency graph favours a diagonal sweep across the grid. In the emerging area of spectral graph analysis, the computing often consists in a wavefront running over a tiled matrix, involving expensive linear algebra kernels. While these applications might benefit from parallel heterogeneous platforms (multi-core with GPUs), programming wavefront applications directly with high-performance linear algebra libraries yields code that is complex to write and optimize for the specific application. We advocate a methodology based on two abstractions (linear algebra and parallel pattern-based run-time) that allows developing portable, self-configuring, and easy-to-profile code on hybrid platforms.},
author = {Maurizio Drocco and Paolo Viviani and Iacopo Colonnelli and Marco Aldinucci and Marco Grangetto},
booktitle = {Proc. of 27th Euromicro Intl. Conference on Parallel, Distributed and Network-Based Processing (PDP)},
date-modified = {2019-03-22 23:07:10 +0100},
doi = {10.1109/EMPDP.2019.8671640},
keywords = {eigenvalues, wavefront, GPU, CUDA, linear algebra},
pages = {9-16},
publisher = {IEEE},
title = {Accelerating spectral graph analysis through wavefronts of linear algebra operations},
url = {https://iris.unito.it/retrieve/handle/2318/1695315/488105/19_wavefront_PDP.pdf},
year = {2019},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1695315/488105/19_wavefront_PDP.pdf},
bdsk-url-2 = {https://doi.org/10.1109/EMPDP.2019.8671640}
}
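The diagonal-sweep dependency structure described above can be conveyed in a few lines. This is an illustrative sequential sketch, not the paper's GPU-accelerated implementation; the `wavefront` helper and its signature are made up for the example. Tiles are visited along anti-diagonals, so each tile (i, j) sees its north (i-1, j) and west (i, j-1) neighbours already computed.

```cpp
#include <cstddef>
#include <functional>

// Visit an n x n logical grid of tiles along anti-diagonals, so that each
// tile (i, j) is processed only after (i-1, j) and (i, j-1): the dependency
// shape the wavefront pattern captures. Tiles on the same diagonal are
// mutually independent and could be dispatched in parallel (e.g., each as a
// linear algebra kernel on a GPU).
inline void wavefront(std::size_t n,
                      const std::function<void(std::size_t, std::size_t)>& kernel) {
  for (std::size_t d = 0; d < 2 * n - 1; ++d) {    // one anti-diagonal per step
    std::size_t i_begin = d < n ? 0 : d - n + 1;
    std::size_t i_end   = d < n ? d + 1 : n;
    for (std::size_t i = i_begin; i < i_end; ++i)  // independent tiles
      kernel(i, d - i);
  }
}
```

Tiles sharing a diagonal are the parallelism a pattern-based run-time can exploit, which is what makes the abstraction portable across hybrid platforms.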

• M. Aldinucci, M. Drocco, C. Misale, and G. Tremblay, “Languages for Big Data analysis,” in Encyclopedia of Big Data Technologies, S. Sakr and A. Zomaya, Eds., Cham: Springer International Publishing, 2019. doi:10.1007/978-3-319-63962-8_142-1

In this chapter, some of the most common tools for Big Data analytics are surveyed, inter alia, Apache Spark, Flink, Storm, and Beam. They are compared against well-defined features concerning the programming model (language expressivity and semantics) and the execution model (parallel behaviour and run-time support). The implementation of a running example is provided for all of them.

@inbook{bigdata:encyclopedia:18,
abstract = {In this chapter, some of the most common tools for Big Data analytics are surveyed, inter alia, Apache Spark, Flink, Storm, and Beam. They are compared against well-defined features concerning the programming model (language expressivity and semantics) and the execution model (parallel behaviour and run-time support). The implementation of a running example is provided for all of them.},
author = {Aldinucci, Marco and Drocco, Maurizio and Misale, Claudia and Tremblay, Guy},
booktitle = {Encyclopedia of Big Data Technologies},
date-modified = {2019-03-22 08:13:33 +0100},
doi = {10.1007/978-3-319-63962-8_142-1},
editor = {Sakr, Sherif and Zomaya, Albert},
isbn = {978-3-319-63962-8},
publisher = {Springer International Publishing},
title = {Languages for Big Data analysis},
url = {http://alpha.di.unito.it/storage/papers/2019_bigdataframeworks_enc.pdf},
year = {2019},
bdsk-url-1 = {http://alpha.di.unito.it/storage/papers/2019_bigdataframeworks_enc.pdf},
bdsk-url-2 = {https://doi.org/10.1007/978-3-319-63962-8_142-1}
}

• M. Torquati, G. Mencagli, M. Drocco, M. Aldinucci, T. De Matteis, and M. Danelutto, “On dynamic memory allocation in sliding-window parallel patterns for streaming analytics,” The Journal of Supercomputing, vol. 75, iss. 8, pp. 4114-4131, 2019. doi:10.1007/s11227-017-2152-1

This work studies the issues related to dynamic memory management in Data Stream Processing, an emerging paradigm enabling the real-time processing of live data streams. In this paper we consider two streaming parallel patterns and we discuss different implementation variants related to how dynamic memory is managed. The results show that the standard mechanisms provided by modern C++ are not entirely adequate for maximizing the performance. Instead, the combined use of an efficient general-purpose memory allocator, a custom allocator optimized for the pattern considered, and a custom variant of the C++ shared pointer mechanism provides a performance improvement of up to 16% in the best case.

@article{17:dmadasp:jsupe,
abstract = {This work studies the issues related to dynamic memory management in Data Stream Processing, an emerging paradigm enabling the real-time processing of live data streams. In this paper we consider two streaming parallel patterns and we discuss different implementation variants related to how dynamic memory is managed. The results show that the standard mechanisms provided by modern C++ are not entirely adequate for maximizing the performance. Instead, the combined use of an efficient general-purpose memory allocator, a custom allocator optimized for the pattern considered, and a custom variant of the C++ shared pointer mechanism provides a performance improvement of up to 16{\%} in the best case.},
author = {Massimo Torquati and Gabriele Mencagli and Maurizio Drocco and Marco Aldinucci and De Matteis, Tiziano and Marco Danelutto},
doi = {10.1007/s11227-017-2152-1},
journal = {The Journal of Supercomputing},
keywords = {rephrase, fastflow},
number = {8},
pages = {4114--4131},
title = {On Dynamic Memory Allocation in Sliding-Window Parallel Patterns for Streaming Analytics},
volume = {75},
year = 2019,
bdsk-url-1 = {https://doi.org/10.1007/s11227-017-2152-1}
}
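The allocator trade-off the paper measures can be illustrated with a toy fixed-size pool: sliding-window operators churn through same-sized tuples at a high rate, so recycling freed slots sidesteps the general-purpose allocator on the hot path. `TuplePool` below is a hypothetical sketch, far simpler than the pattern-specific allocator and shared-pointer variant actually evaluated in the paper (and, unlike them, not thread-safe).

```cpp
#include <cstddef>
#include <vector>

// Minimal fixed-size free-list pool: acquire() reuses a previously released
// slot when one is available, allocating a new one only when the free list
// is empty. All slots are returned to the system when the pool is destroyed.
class TuplePool {
public:
  explicit TuplePool(std::size_t slot_size) : slot_size_(slot_size) {}
  ~TuplePool() { for (void* p : all_) ::operator delete(p); }

  void* acquire() {
    if (free_.empty()) {              // slow path: grow the pool
      void* p = ::operator new(slot_size_);
      all_.push_back(p);
      return p;
    }
    void* p = free_.back();           // fast path: reuse a freed slot
    free_.pop_back();
    return p;
  }
  void release(void* p) { free_.push_back(p); }

  std::size_t allocated() const { return all_.size(); }

private:
  std::size_t slot_size_;
  std::vector<void*> free_;           // slots ready for reuse
  std::vector<void*> all_;            // every slot ever allocated
};
```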

### 2018

• P. Viviani, M. Drocco, and M. Aldinucci, “Pushing the boundaries of parallel deep learning – A practical approach,” CoRR, vol. abs/1806.09528, 2018.
@article{18:arxiv:deeplearning,
author = {Paolo Viviani and Maurizio Drocco and Marco Aldinucci},
journal = {CoRR},
title = {Pushing the boundaries of parallel Deep Learning - {A} practical approach},
volume = {abs/1806.09528},
year = {2018}
}

• C. Misale, M. Drocco, G. Tremblay, A. R. Martinelli, and M. Aldinucci, “PiCo: High-performance data analytics pipelines in modern C++,” Future Generation Computer Systems, vol. 87, pp. 392-403, 2018. doi:10.1016/j.future.2018.05.030

In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo’s programming model aims at simplifying the programming of data analytics applications while preserving or enhancing their performance. This is attained through three key design choices: (1) unifying batch and stream data access models, (2) decoupling processing from data layout, and (3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types, in the sense that it is possible to reuse the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo, when compared to Spark and Flink, can attain better performance in terms of execution times and can hugely improve memory utilization, both for batch and stream processing.

@article{18:fgcs:pico,
abstract = {In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo's programming model aims at simplifying the programming of data analytics applications while preserving or enhancing their performance. This is attained through three key design choices: (1) unifying batch and stream data access models, (2) decoupling processing from data layout, and (3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types, in the sense that it is possible to reuse the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo, when compared to Spark and Flink, can attain better performance in terms of execution times and can hugely improve memory utilization, both for batch and stream processing.},
author = {Claudia Misale and Maurizio Drocco and Guy Tremblay and Alberto R. Martinelli and Marco Aldinucci},
date-modified = {2018-12-27 18:39:31 +0100},
doi = {10.1016/j.future.2018.05.030},
journal = {Future Generation Computer Systems},
keywords = {toreador, big data, fastflow},
pages = {392-403},
title = {PiCo: High-performance data analytics pipelines in modern C++},
url = {https://iris.unito.it/retrieve/handle/2318/1668444/414280/fgcs_pico.pdf},
volume = {87},
year = {2018},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1668444/414280/fgcs_pico.pdf},
bdsk-url-2 = {https://doi.org/10.1016/j.future.2018.05.030}
}
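The fluent-interface idea behind PiCo can be conveyed with a toy pipeline over an in-memory list. The names below (`Pipe`, `map`, `filter`, `run`) are illustrative stand-ins, not the actual PiCo API; the sketch only shows how chaining operators yields pipelines whose algorithms are written independently of the underlying data model.

```cpp
#include <functional>
#include <utility>
#include <vector>

// A tiny fluent pipeline: each operator transforms the held data and returns
// *this, so calls chain left to right. In a PiCo-style design the same chain
// of operators could be re-targeted at streams, lists, sets, etc.
template <typename T>
class Pipe {
public:
  explicit Pipe(std::vector<T> data) : data_(std::move(data)) {}

  Pipe& map(const std::function<T(T)>& f) {
    for (T& x : data_) x = f(x);
    return *this;                      // returning *this gives the fluent style
  }
  Pipe& filter(const std::function<bool(const T&)>& p) {
    std::vector<T> kept;
    for (const T& x : data_)
      if (p(x)) kept.push_back(x);
    data_ = std::move(kept);
    return *this;
  }
  std::vector<T> run() const { return data_; }

private:
  std::vector<T> data_;
};
```

A chain such as `Pipe<int>({1, 2, 3, 4}).map(square).filter(is_even).run()` reads as a declarative pipeline description, which is the readability benefit a fluent interface is after.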

• M. Aldinucci, S. Rabellino, M. Pironti, F. Spiga, P. Viviani, M. Drocco, M. Guerzoni, G. Boella, M. Mellia, P. Margara, I. Drago, R. Marturano, G. Marchetto, E. Piccolo, S. Bagnasco, S. Lusso, S. Vallero, G. Attardi, A. Barchiesi, A. Colla, and F. Galeazzi, “HPC4AI, an AI-on-demand federated platform endeavour,” in ACM Computing Frontiers, Ischia, Italy, 2018. doi:10.1145/3203217.3205340

In April 2018, under the auspices of the POR-FESR 2014-2020 program of the Italian Piedmont Region, Turin’s Centre on High-Performance Computing for Artificial Intelligence (HPC4AI) was funded with a capital investment of 4.5M€ and began its deployment. HPC4AI aims to facilitate scientific research and engineering in the areas of Artificial Intelligence and Big Data Analytics. HPC4AI will specifically focus on methods for the on-demand provisioning of AI and BDA Cloud services to the regional and national industrial community, which includes the large regional ecosystem of Small-Medium Enterprises (SMEs) active in many different sectors such as automotive, aerospace, mechatronics, manufacturing, health and agrifood.

@inproceedings{18:hpc4ai_acm_CF,
abstract = {In April 2018, under the auspices of the POR-FESR 2014-2020 program of the Italian Piedmont Region, Turin's Centre on High-Performance Computing for Artificial Intelligence (HPC4AI) was funded with a capital investment of 4.5M€ and began its deployment. HPC4AI aims to facilitate scientific research and engineering in the areas of Artificial Intelligence and Big Data Analytics. HPC4AI will specifically focus on methods for the on-demand provisioning of AI and BDA Cloud services to the regional and national industrial community, which includes the large regional ecosystem of Small-Medium Enterprises (SMEs) active in many different sectors such as automotive, aerospace, mechatronics, manufacturing, health and agrifood.},
author = {Marco Aldinucci and Sergio Rabellino and Marco Pironti and Filippo Spiga and Paolo Viviani and Maurizio Drocco and Marco Guerzoni and Guido Boella and Marco Mellia and Paolo Margara and Idillio Drago and Roberto Marturano and Guido Marchetto and Elio Piccolo and Stefano Bagnasco and Stefano Lusso and Sara Vallero and Giuseppe Attardi and Alex Barchiesi and Alberto Colla and Fulvio Galeazzi},
booktitle = {ACM Computing Frontiers},
date-modified = {2018-12-17 23:57:55 +0100},
doi = {10.1145/3203217.3205340},
month = may,
title = {HPC4AI, an AI-on-demand federated platform endeavour},
url = {http://alpha.di.unito.it/storage/papers/2018_hpc4ai_ACM_CF.pdf},
year = {2018},
bdsk-url-1 = {http://alpha.di.unito.it/storage/papers/2018_hpc4ai_ACM_CF.pdf},
bdsk-url-2 = {https://doi.org/10.1145/3203217.3205340}
}

• P. Viviani, M. Drocco, and M. Aldinucci, “Scaling dense linear algebra on multicore and beyond: a survey,” in Proc. of 26th Euromicro Intl. Conference on Parallel, Distributed and Network-Based Processing (PDP), Cambridge, United Kingdom, 2018. doi:10.1109/PDP2018.2018.00122

The present trend in big-data analytics is to exploit algorithms with (sub-)linear time complexity; in this sense, it is usually worth investigating whether the available techniques can be approximated to reach an affordable complexity. However, there are still problems in data science and engineering that involve algorithms with higher time complexity, like matrix inversion or Singular Value Decomposition (SVD). This work presents the results of a survey that reviews a number of tools meant to perform dense linear algebra at “Big Data” scale: namely, the proposed approach aims first to define a feasibility boundary for the problem size of shared-memory matrix factorizations, then to understand whether it is convenient to employ specific tools meant to scale out such dense linear algebra tasks on distributed platforms. The survey will eventually discuss the presented tools from the point of view of domain experts (data scientists, engineers), hence focusing on the trade-off between usability and performance.

@inproceedings{svd:pdp:18,
abstract = {The present trend in big-data analytics is to exploit algorithms with (sub-)linear time complexity; in this sense, it is usually worth investigating whether the available techniques can be approximated to reach an affordable complexity. However, there are still problems in data science and engineering that involve algorithms with higher time complexity, like matrix inversion or Singular Value Decomposition (SVD). This work presents the results of a survey that reviews a number of tools meant to perform dense linear algebra at ``Big Data'' scale: namely, the proposed approach aims first to define a feasibility boundary for the problem size of shared-memory matrix factorizations, then to understand whether it is convenient to employ specific tools meant to scale out such dense linear algebra tasks on distributed platforms. The survey will eventually discuss the presented tools from the point of view of domain experts (data scientists, engineers), hence focusing on the trade-off between usability and performance.},
author = {Paolo Viviani and Maurizio Drocco and Marco Aldinucci},
booktitle = {Proc. of 26th Euromicro Intl. Conference on Parallel, Distributed and Network-Based Processing (PDP)},
date-modified = {2019-03-22 23:51:52 +0100},
doi = {10.1109/PDP2018.2018.00122},
keywords = {svd, big data, linear algebra},
publisher = {IEEE},
title = {Scaling Dense Linear Algebra on Multicore and Beyond: a Survey},
url = {https://iris.unito.it/retrieve/handle/2318/1659340/387685/preprint_aperto.pdf},
year = {2018},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1659340/387685/preprint_aperto.pdf}
}

• C. Misale, M. Drocco, G. Tremblay, and M. Aldinucci, “PiCo: a novel approach to stream data analytics,” in Proc. of Euro-Par Workshops: 1st Intl. Workshop on Autonomic Solutions for Parallel and Distributed Data Stream Processing (Auto-DaSP 2017), Santiago de Compostela, Spain, 2018. doi:10.1007/978-3-319-75178-8_10

In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo’s programming model aims at simplifying the programming of data analytics applications while preserving or enhancing their performance. This is attained through three key design choices: 1) unifying batch and stream data access models, 2) decoupling processing from data layout, and 3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types, in the sense that it is possible to re-use the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo can attain better performance in terms of execution times and hugely improve memory utilization when compared to Spark and Flink in both batch and stream processing.

@inproceedings{pico:autodasp:17,
abstract = {In this paper, we present a new C++ API with a fluent interface called PiCo (Pipeline Composition). PiCo's programming model aims at simplifying the programming of data analytics applications while preserving or enhancing their performance. This is attained through three key design choices: 1) unifying batch and stream data access models, 2) decoupling processing from data layout, and 3) exploiting a stream-oriented, scalable, efficient C++11 runtime system. PiCo proposes a programming model based on pipelines and operators that are polymorphic with respect to data types, in the sense that it is possible to re-use the same algorithms and pipelines on different data models (e.g., streams, lists, sets, etc.). Preliminary results show that PiCo can attain better performance in terms of execution times and hugely improve memory utilization when compared to Spark and Flink in both batch and stream processing.},
address = {Santiago de Compostela, Spain},
author = {Claudia Misale and Maurizio Drocco and Guy Tremblay and Marco Aldinucci},
booktitle = {Proc. of Euro-Par Workshops: 1st Intl. Workshop on Autonomic Solutions for Parallel and Distributed Data Stream Processing (Auto-DaSP 2017)},
date-modified = {2018-01-21 16:08:28 +0000},
doi = {10.1007/978-3-319-75178-8_10},
month = aug,
publisher = {Springer},
series = {{LNCS}},
title = {PiCo: a Novel Approach to Stream Data Analytics},
url = {https://iris.unito.it/retrieve/handle/2318/1659344/409520/autodasp.pdf},
volume = {10659},
year = {2018},
bdsk-url-1 = {https://dx.doi.org/10.1007/978-3-319-75178-8_10}
}

• M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, C. Misale, G. Peretti Pezzi, and M. Torquati, “A parallel pattern for iterative stencil + reduce,” Journal of Supercomputing, vol. 74, iss. 11, pp. 5690-5705, 2018. doi:10.1007/s11227-016-1871-z

We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.

@article{16:stencilreduce:jsupe,
abstract = {We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.},
author = {Marco Aldinucci and Marco Danelutto and Maurizio Drocco and Peter Kilpatrick and Claudia Misale and Guilherme {Peretti Pezzi} and Massimo Torquati},
date-modified = {2018-12-27 18:19:41 +0100},
doi = {10.1007/s11227-016-1871-z},
journal = {Journal of Supercomputing},
keywords = {nvidia, repara, rephrase},
number = {11},
pages = {5690-5705},
title = {A Parallel Pattern for Iterative Stencil + Reduce},
url = {http://arxiv.org/pdf/1609.04567v1.pdf},
volume = {74},
year = {2018},
bdsk-url-1 = {http://dx.doi.org/10.1007/s11227-016-1871-z},
bdsk-url-2 = {http://arxiv.org/pdf/1609.04567v1.pdf}
}

### 2017

• M. Drocco, “Parallel programming with global asynchronous memory: models, C++ APIs and implementations,” PhD Thesis, Computer Science Department, University of Torino, 2017. doi:10.5281/zenodo.1037585

@phdthesis{17:gam:drocco:thesis,
abstract = {In the realm of High Performance Computing (HPC), message passing
has been the programming paradigm of choice for over twenty years.
The durable MPI (Message Passing Interface) standard, with send/receive
communication, broadcast, gather/scatter, and reduction collectives, is still
used to construct parallel programs where each communication is orchestrated
by the developer, based on precise knowledge of data distribution and
overheads; collective communications simplify the orchestration but might
induce excessive synchronization.
Early attempts to bring the shared-memory programming model---with its
programming advantages---to distributed computing, referred to as the
Distributed Shared Memory (DSM) model, faded away; one of the main issues was
combining performance and programmability with the memory consistency model.
The recently proposed Partitioned Global Address Space (PGAS) model is a modern
revamp of DSM that exposes data placement to enable optimizations based on
locality, but it still addresses (simple) data parallelism only and relies
on expensive sharing protocols.
We advocate an alternative programming model for distributed computing based on
a Global Asynchronous Memory (GAM), aiming to \emph{avoid} coherency and
consistency problems rather than solve them.
We materialize GAM by designing and implementing a \emph{distributed smart
pointers} library, inspired by C++ smart pointers.
In this model, public and private pointers (resembling C++ shared and unique
pointers, respectively) are moved around instead of messages (i.e., data), thus
relieving the user of the burden of minimizing transfers.
On top of smart pointers, we propose a high-level C++ template library for
writing applications in terms of dataflow-like networks, namely GAM nets,
consisting of stateful processors exchanging pointers in fully asynchronous
fashion.
We demonstrate the validity of the proposed approach, from the expressiveness
perspective, by showing how GAM nets can be exploited to implement higher-level
parallel programming models, such as data and task parallelism.
As for the performance perspective, the execution of two non-toy benchmarks on
a number of different small-scale HPC clusters exhibits both close-to-ideal
scalability and negligible overhead with respect to state-of-the-art benchmark
implementations.
For instance, the GAM implementation of a high-quality video restoration filter
sustains a 100 fps throughput over 70\%-noisy high-quality video streams on a
4-node cluster of Graphics Processing Units (GPUs), with minimal programming
effort.},
author = {Maurizio Drocco},
date-modified = {2017-12-12 15:09:35 +0000},
doi = {10.5281/zenodo.1037585},
keywords = {fastflow, rephrase, toreador, repara, paraphrase},
month = oct,
school = {Computer Science Department, University of Torino},
title = {Parallel Programming with Global Asynchronous Memory: Models, {C++} {API}s and Implementations},
url = {https://zenodo.org/record/1037585/files/Drocco_phd_thesis.pdf},
year = {2017},
bdsk-url-1 = {https://zenodo.org/record/1037585/files/Drocco_phd_thesis.pdf},
bdsk-url-2 = {http://dx.doi.org/10.5281/zenodo.1037585}
}
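The thesis's public/private pointer distinction can be mimicked in a single process with standard C++ smart pointers: a private pointer has exclusive ownership and is moved between processors, while a public pointer allows shared access. The aliases and the `consume` function below are illustrative stand-ins, not the GAM library's API.

```cpp
#include <memory>
#include <string>

template <typename T>
using private_ptr = std::unique_ptr<T>;   // exclusive ownership: moved, never copied
template <typename T>
using public_ptr = std::shared_ptr<T>;    // shared access across processors

// A "processor" consumes a private pointer by value: after the call the
// sender no longer owns the data, mirroring how GAM moves pointers around
// instead of messages (i.e., the data itself).
inline std::string consume(private_ptr<std::string> p) { return *p; }
```

Because `private_ptr` is move-only, the transfer of ownership is enforced at compile time: after `consume(std::move(p))`, the caller's `p` is null, so accidental double use of the data cannot type-check.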

• C. Misale, M. Drocco, M. Aldinucci, and G. Tremblay, “A comparison of big data frameworks on a layered dataflow model,” Parallel Processing Letters, vol. 27, iss. 01, pp. 1-20, 2017. doi:10.1142/S0129626417400035

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.

@article{17:bigdatasurvey:PPL,
abstract = {In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.},
author = {Misale, Claudia and Drocco, Maurizio and Aldinucci, Marco and Tremblay, Guy},
date-modified = {2017-12-12 12:16:32 +0000},
date-published = {March 2017},
doi = {10.1142/S0129626417400035},
eprint = {http://www.worldscientific.com/doi/pdf/10.1142/S0129626417400035},
journal = {Parallel Processing Letters},
number = {01},
pages = {1--20},
title = {A Comparison of Big Data Frameworks on a Layered Dataflow Model},
url = {https://iris.unito.it/retrieve/handle/2318/1626287/303421/preprintPPL_4aperto.pdf},
volume = {27},
year = {2017},
bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1626287/303421/preprintPPL_4aperto.pdf},
bdsk-url-2 = {http://dx.doi.org/10.1142/S0129626417400035}
}

• F. Tordini, M. Drocco, C. Misale, L. Milanesi, P. Liò, I. Merelli, M. Torquati, and M. Aldinucci, “NuChart-II: the road to a fast and scalable tool for Hi-C data analysis,” International Journal of High Performance Computing Applications, vol. 31, iss. 3, pp. 196-211, 2017. doi:10.1177/1094342016668567

Recent advances in molecular biology and bioinformatics techniques have led to an explosion of information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, permitting the identification of physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but this important information is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory-bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is an R package that embeds a C++ engine: computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation raises the level of abstraction and permits orchestrating analysis and visualisation of genomic data.

@article{16:ijhpca:nuchart,
abstract = {Recent advances in molecular biology and bioinformatics techniques have led to an explosion of information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, permitting the identification of physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but this important information is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory-bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is an R package that embeds a C++ engine: computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation raises the level of abstraction and permits orchestrating analysis and visualisation of genomic data.},
author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Li{\o} and Ivan Merelli and Massimo Torquati and Marco Aldinucci},
date-modified = {2018-12-27 19:06:22 +0100},
doi = {10.1177/1094342016668567},
journal = {International Journal of High Performance Computing Applications},
keywords = {fastflow, bioinformatics, repara, rephrase, interomics, mimomics},
number = {3},
pages = {196-211},
title = {{NuChart-II}: the road to a fast and scalable tool for {Hi-C} data analysis},
url = {https://iris.unito.it/retrieve/handle/2318/1607126/238747/main.pdf},
volume = {31},
year = {2017},
bdsk-url-1 = {http://hdl.handle.net/2318/1607126},
bdsk-url-2 = {http://dx.doi.org/10.1177/1094342016668567}
}

### 2016

• C. Misale, M. Drocco, M. Aldinucci, and G. Tremblay, “A comparison of big data frameworks on a layered dataflow model,” in Proc. of intl. workshop on high-level parallel programming (hlpp), Muenster, Germany, 2016, pp. 1-19. doi:10.5281/zenodo.321866

In the world of Big Data analytics, there is a series of tools that aim to simplify the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit in each level.

@inproceedings{16:bigdatasurvey:hlpp,
abstract = {In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.},
author = {Claudia Misale and Maurizio Drocco and Marco Aldinucci and Guy Tremblay},
booktitle = {Proc. of Intl. Workshop on High-Level Parallel Programming (HLPP)},
date-modified = {2017-12-12 14:49:47 +0000},
doi = {10.5281/zenodo.321866},
month = jul,
pages = {1-19},
publisher = {arXiv.org},
title = {A Comparison of Big Data Frameworks on a Layered Dataflow Model},
url = {http://arxiv.org/pdf/1606.05293v1.pdf},
year = {2016},
bdsk-url-1 = {http://arxiv.org/pdf/1606.05293v1.pdf},
bdsk-url-2 = {http://dx.doi.org/10.5281/zenodo.321866}
}

• M. Drocco, C. Misale, and M. Aldinucci, “A cluster-as-accelerator approach for SPMD-free data parallelism,” in Proc. of 24th euromicro intl. conference on parallel distributed and network-based processing (pdp), Crete, Greece, 2016, p. 350–353. doi:10.1109/PDP.2016.97

In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The proposed programming model is purely sequential, SPMD-free and based on high-level functional features introduced since the C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle. In this scheme, cluster nodes act as general interpreters of user-defined functional tasks over node-local portions of distributed data structures. We envision coupling a simple yet powerful programming model with a lightweight, locality-aware distributed runtime as a promising step along the road towards high-performance data analytics, in particular in the perspective of the upcoming exascale era. We implemented the proposed approach in SkeDaTo, a prototype C++ library of data-parallel skeletons that exploits cluster-as-accelerator at the bottom layer of the runtime software stack.

@inproceedings{skedato:pdp:16,
abstract = {In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The programming model proposed is purely sequential, SPMD-free and based on high-level functional features introduced since C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle. In this scheme, cluster nodes act as general interpreters of user-defined functional tasks over node-local portions of distributed data structures. We envision coupling a simple yet powerful programming model with a lightweight, locality-aware distributed runtime as a promising step along the road towards high-performance data analytics, in particular under the perspective of the upcoming exascale era. We implemented the proposed approach in SkeDaTo, a prototyping C++ library of data-parallel skeletons exploiting cluster-as-accelerator at the bottom layer of the runtime software stack.},
author = {Maurizio Drocco and Claudia Misale and Marco Aldinucci},
booktitle = {Proc. of 24th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
date-modified = {2017-12-12 14:49:31 +0000},
doi = {10.1109/PDP.2016.97},
keywords = {rephrase, fastflow},
pages = {350--353},
publisher = {IEEE},
title = {A Cluster-As-Accelerator approach for {SPMD}-free Data Parallelism},
url = {http://alpha.di.unito.it/storage/papers/2016_pdp_skedato.pdf},
year = {2016},
bdsk-url-1 = {http://alpha.di.unito.it/storage/papers/2016_pdp_skedato.pdf},
bdsk-url-2 = {https://doi.org/10.1109/PDP.2016.97}
}

### 2015

• M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, G. Peretti Pezzi, and M. Torquati, “The loop-of-stencil-reduce paradigm,” in Proc. of intl. workshop on reengineering for parallelism in heterogeneous parallel platforms (repara), Helsinki, Finland, 2015, pp. 172-177. doi:10.1109/Trustcom.2015.628

In this paper we advocate the Loop-of-Stencil-Reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-Reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-Stencil-Reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as a running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of Loop-of-Stencil-Reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both.

@inproceedings{opencl:ff:ispa:15,
abstract = {In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of the Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both.},
author = {Marco Aldinucci and Marco Danelutto and Maurizio Drocco and Peter Kilpatrick and Guilherme {Peretti Pezzi} and Massimo Torquati},
booktitle = {Proc. of Intl. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara)},
date-modified = {2015-09-24 11:14:56 +0000},
doi = {10.1109/Trustcom.2015.628},
keywords = {fastflow, repara, nvidia},
month = aug,
pages = {172-177},
publisher = {IEEE},
title = {The Loop-of-Stencil-Reduce Paradigm},
url = {http://alpha.di.unito.it/storage/papers/2015_RePara_ISPA.pdf},
year = {2015},
bdsk-url-1 = {http://alpha.di.unito.it/storage/papers/2015_RePara_ISPA.pdf},
bdsk-url-2 = {https://doi.org/10.1109/Trustcom.2015.628}
}

• F. Tordini, M. Drocco, C. Misale, L. Milanesi, P. Liò, I. Merelli, and M. Aldinucci, “Parallel exploration of the nuclear chromosome conformation with NuChart-II,” in Proc. of 23rd euromicro intl. conference on parallel distributed and network-based processing (pdp), 2015. doi:10.1109/PDP.2015.104

High-throughput molecular biology techniques are widely used to identify physical interactions between genetic elements located throughout the human genome. Chromosome Conformation Capture (3C) and other related techniques make it possible to investigate the spatial organisation of chromosomes in the cell's natural state. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but the exploitation of this important information is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we introduce NuChart-II, a tool for Hi-C data analysis that provides a gene-centric view of the chromosomal neighbourhood in a graph-based manner. NuChart-II is an efficient and highly optimized C++ re-implementation of a previous prototype package developed in R. Representing Hi-C data using a graph-based approach overcomes the common view relying on genomic coordinates and permits the use of graph analysis techniques to explore the spatial conformation of a gene neighbourhood.

@inproceedings{nuchar:tool:15,
abstract = {High-throughput molecular biology techniques are widely used to identify physical interactions between genetic elements located throughout the human genome. Chromosome Conformation Capture (3C) and other related techniques allow to investigate the spatial organisation of chromosomes in the cell's natural state. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we introduce NuChart-II, a tool for Hi-C data analysis that provides a gene-centric view of the chromosomal neighbourhood in a graph-based manner. NuChart-II is an efficient and highly optimized C++ re-implementation of a previous prototype package developed in R. Representing Hi-C data using a graph-based approach overcomes the common view relying on genomic coordinates and permits the use of graph analysis techniques to explore the spatial conformation of a gene neighbourhood.},
author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Li{\o} and Ivan Merelli and Marco Aldinucci},
booktitle = {Proc. of 23rd Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
date-modified = {2017-12-12 13:55:10 +0000},
doi = {10.1109/PDP.2015.104},
keywords = {fastflow, bioinformatics, paraphrase, repara, impact},
month = mar,
publisher = {IEEE},
title = {Parallel Exploration of the Nuclear Chromosome Conformation with {NuChart-II}},
year = {2015},
bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2015.104}
}

• I. Merelli, F. Tordini, M. Drocco, M. Aldinucci, P. Liò, and L. Milanesi, “Integrating multi-omic features exploiting Chromosome Conformation Capture data,” Frontiers in genetics, vol. 6, iss. 40, 2015. doi:10.3389/fgene.2015.00040

The representation, integration and interpretation of omic data is a complex task, in particular considering the huge amount of information that is produced daily in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding the functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture (3C) allows the analysis of chromosome organization in the cell's natural state. When performed genome-wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information for understanding why genes sit in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.

@article{nuchart:frontiers:15,
abstract = {The representation, integration and interpretation of omic data is a complex task, in particular considering the huge amount of information that is daily produced in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progresses in high throughput molecular biology techniques and bioinformatics have provided insights into chromatin interactions on a larger scale and offer a formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture (3C) allows the analysis of the chromosome organization in the cell's natural state. While performed genome wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from the information about gene positions, with the possibility of mapping on the achieved graphs genomic features such as methylation patterns and histone modifications, along with expression profiles. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.},
author = {Merelli, Ivan and Tordini, Fabio and Drocco, Maurizio and Aldinucci, Marco and Li{\o}, Pietro and Milanesi, Luciano},
date-modified = {2015-09-24 11:23:10 +0000},
doi = {10.3389/fgene.2015.00040},
issn = {1664-8021},
journal = {Frontiers in Genetics},
keywords = {bioinformatics, fastflow, interomics, hirma, mimomics},
number = {40},
title = {Integrating Multi-omic features exploiting {Chromosome Conformation Capture} data},
url = {http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00040/pdf},
volume = {6},
year = {2015},
bdsk-url-1 = {http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00040/pdf},
bdsk-url-2 = {http://dx.doi.org/10.3389/fgene.2015.00040}
}

• M. Drocco, C. Misale, G. Peretti Pezzi, F. Tordini, and M. Aldinucci, “Memory-optimised parallel processing of Hi-C data,” in Proc. of 23rd euromicro intl. conference on parallel distributed and network-based processing (pdp), 2015, pp. 1-8. doi:10.1109/PDP.2015.63

This paper presents the optimisation efforts in the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures, since scalability is rather limited due to the memory-bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually improve the memory bandwidth utilisation. We used as a running example NuChart-II, a tool for the annotation and statistical analysis of Hi-C data that creates a gene-centric neighborhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issues in the parallelisation of memory-bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, with which it was not possible to complete the graph generation because of strong memory-saturation problems.

@inproceedings{nuchart:speedup:15,
abstract = {This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures since the scalability is rather limited due to the memory bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as running example NuChart-II, a tool for annotation and statistic analysis of Hi-C data that creates a gene-centric neighborhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issues in the parallelisation of memory bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, by which it was not possible to complete the graph generation because of strong memory-saturation problems.},
author = {Maurizio Drocco and Claudia Misale and Guilherme {Peretti Pezzi} and Fabio Tordini and Marco Aldinucci},
booktitle = {Proc. of 23rd Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
date-modified = {2017-12-12 14:45:09 +0000},
doi = {10.1109/PDP.2015.63},
keywords = {fastflow, bioinformatics, paraphrase, repara, impact},
month = mar,
pages = {1-8},
publisher = {IEEE},
title = {Memory-Optimised Parallel Processing of {Hi-C} Data},
year = {2015},
bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2015.63}
}

• M. Aldinucci, G. Peretti Pezzi, M. Drocco, C. Spampinato, and M. Torquati, “Parallel visual data restoration on multi-GPGPUs using stencil-reduce pattern,” International journal of high performance computing applications, vol. 29, iss. 4, pp. 461-472, 2015. doi:10.1177/1094342014567907

In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.

@article{ff:denoiser:ijhpca:15,
abstract = {In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.},
author = {Marco Aldinucci and Guilherme {Peretti Pezzi} and Maurizio Drocco and Concetto Spampinato and Massimo Torquati},
date-modified = {2015-09-24 11:21:20 +0000},
doi = {10.1177/1094342014567907},
journal = {International Journal of High Performance Computing Applications},
keywords = {fastflow, paraphrase, impact, nvidia},
number = {4},
pages = {461-472},
title = {Parallel Visual Data Restoration on Multi-{GPGPUs} using Stencil-Reduce Pattern},
volume = {29},
year = {2015},
bdsk-url-2 = {http://dx.doi.org/10.1177/1094342014567907}
}

• F. Tordini, M. Drocco, I. Merelli, L. Milanesi, P. Liò, and M. Aldinucci, “NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data,” in Proc. of 11th intl. meeting on computational intelligence methods for bioinformatics and biostatistics (cibb), Cambridge, UK, 2015, pp. 298-311. doi:10.1007/978-3-319-24462-4_25

Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expression, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high-throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, a software that allows the user to annotate and visualize a list of input genes with information derived from Hi-C data, integrating knowledge about genomic features that are involved in the chromosome spatial organization. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable scalability problems while working genome-wide on large datasets: particular attention has been paid to optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.

@inproceedings{14:ff:nuchart:cibb,
abstract = {Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expressions, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but these important researches are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, a software that allows the user to annotate and visualize a list of input genes with information relying on Hi-C data, integrating knowledge data about genomic features that are involved in the chromosome spatial organization. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable problems of scalability while working genome-wide on large datasets: particular attention has been paid in optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.},
author = {Fabio Tordini and Maurizio Drocco and Ivan Merelli and Luciano Milanesi and Pietro Li{\o} and Marco Aldinucci},
booktitle = {Proc. of 11th Intl. Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB)},
date-modified = {2017-12-12 15:19:25 +0000},
doi = {10.1007/978-3-319-24462-4_25},
editor = {Clelia Di Serio and Pietro Li{\o} and Alessandro Nonis and Roberto Tagliaferri},
isbn = {978-3-319-24461-7},
keywords = {fastflow, bioinformatics, paraphrase, repara, interomics, mimomics, hirma},
month = jun,
pages = {298-311},
publisher = {Springer},
series = {{LNCS}},
title = {{NuChart-II}: a graph-based approach for the analysis and interpretation of {Hi-C} data},
volume = {8623},
year = {2015},
bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-24462-4_25}
}

### 2014

• M. Aldinucci, M. Torquati, M. Drocco, G. Peretti Pezzi, and C. Spampinato, “An overview of fastflow: combining pattern-level abstraction and efficiency in GPGPUs,” in Gpu technology conference (gtc), San Jose, CA, USA, 2014.

Get an overview of how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs, while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. For a more detailed and technical review of FastFlow's parallel patterns, as well as a use case showing the design and effectiveness of a novel universal image filtering template based on the variational approach, see the companion talk.

@inproceedings{ff:gtc:2014:short,
abstract = {Get an overview of FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. For a more detailed and technical review of FastFlow's parallel patterns as well as a use case where we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.},
address = {San Jose, CA, USA},
author = {Marco Aldinucci and Massimo Torquati and Maurizio Drocco and Guilherme {Peretti Pezzi} and Concetto Spampinato},
booktitle = {GPU Technology Conference (GTC)},
date-modified = {2017-12-12 13:54:20 +0000},
keywords = {fastflow, gpu, nvidia, impact, paraphrase},
month = mar,
title = {An Overview of FastFlow: Combining Pattern-Level Abstraction and Efficiency in {GPGPUs}},
year = {2014},
}

• M. Aldinucci, M. Torquati, M. Drocco, G. Peretti Pezzi, and C. Spampinato, “Fastflow: combining pattern-level abstraction and efficiency in GPGPUs,” in Gpu technology conference (gtc), San Jose, CA, USA, 2014.

Learn how FastFlow’s parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. As use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.

@inproceedings{ff:gtc:2014,
abstract = {Learn how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. As use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.},
address = {San Jose, CA, USA},
author = {Marco Aldinucci and Massimo Torquati and Maurizio Drocco and Guilherme {Peretti Pezzi} and Concetto Spampinato},
booktitle = {GPU Technology Conference (GTC)},
date-modified = {2017-12-12 13:54:25 +0000},
keywords = {fastflow, gpu, nvidia, impact, paraphrase},
month = mar,
title = {FastFlow: Combining Pattern-Level Abstraction and Efficiency in {GPGPUs}},
year = {2014},
}

• M. Aldinucci, M. Torquati, C. Spampinato, M. Drocco, C. Misale, C. Calcagno, and M. Coppo, “Parallel stochastic systems biology in the cloud,” Briefings in bioinformatics, vol. 15, iss. 5, pp. 798-813, 2014. doi:10.1093/bib/bbt040

The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may turn out to be computationally expensive, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud. In particular, the Calculus of Wrapped Compartments (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach, is presented and discussed. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined in such a way that the workflow begins to output a stream of analysis results immediately after the simulation is started. The performance and effectiveness of the CWC simulator are validated on the Amazon Elastic Compute Cloud.

@article{cwc:cloud:bib:13,
abstract = {The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may result computationally expensive reducing the interactivity required in the model tuning. In this work, we advocate the high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud. In particular, the Calculus of Wrapped Components (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach, is presented and discussed. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined in such a way that the workflow begins to output a stream of analysis results immediately after simulation is started. Performance and effectiveness of the CWC simulator are validated on the Amazon Elastic Compute Cloud.},
author = {Marco Aldinucci and Massimo Torquati and Concetto Spampinato and Maurizio Drocco and Claudia Misale and Cristina Calcagno and Mario Coppo},
date-modified = {2015-09-27 12:33:52 +0000},
doi = {10.1093/bib/bbt040},
issn = {1467-5463},
journal = {Briefings in Bioinformatics},
keywords = {fastflow, bioinformatics, cloud, paraphrase, impact, biobits},
number = {5},
pages = {798-813},
title = {Parallel stochastic systems biology in the cloud},
url = {http://alpha.di.unito.it/storage/papers/2013_ff_bio_cloud_briefings.pdf},
volume = {15},
year = {2014},
bdsk-url-1 = {http://alpha.di.unito.it/storage/papers/2013_ff_bio_cloud_briefings.pdf},
bdsk-url-2 = {https://doi.org/10.1093/bib/bbt040}
}

• M. Aldinucci, C. Calcagno, M. Coppo, F. Damiani, M. Drocco, E. Sciacca, S. Spinella, M. Torquati, and A. Troina, “On designing multicore-aware simulators for systems biology endowed with on-line statistics,” Biomed research international, 2014. doi:10.1155/2014/207041

This paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modelling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed with statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems, exhibiting multistable and oscillatory behavior, are used as a testbed.

@article{cwcsim:ff:multicore:biomed:14,
abstract = {The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.},
author = {Marco Aldinucci and Cristina Calcagno and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Eva Sciacca and Salvatore Spinella and Massimo Torquati and Angelo Troina},
date-modified = {2015-09-27 12:17:05 +0000},
doi = {10.1155/2014/207041},
journal = {BioMed Research International},
keywords = {fastflow,bioinformatics, paraphrase, biobits},
title = {On designing multicore-aware simulators for systems biology endowed with on-line statistics},
year = {2014},
bdsk-url-3 = {http://dx.doi.org/10.1155/2014/207041}
}

• M. Aldinucci, M. Drocco, G. Peretti Pezzi, C. Misale, F. Tordini, and M. Torquati, “Exercising high-level parallel programming on streams: a systems biology use case,” in Proc. of 34th ieee intl. conference on distributed computing systems workshops (icdcsw), Madrid, Spain, 2014. doi:10.1109/ICDCSW.2014.38

The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may turn into a computationally expensive task, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to the cloud. In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology, equipped with on-line mining of results and designed according to the FastFlow pattern-based approach, is discussed as a running example. In this work, the CWC simulator is used as a paradigmatic example of a complex C++ application where the quality of results is correlated with both computation and I/O bounds, and where high-quality results may turn into big data. The FastFlow parallel programming framework, which advocates C++ pattern-based parallel programming, makes it possible to develop portable parallel code without relinquishing either run-time efficiency or performance-tuning opportunities. Performance and effectiveness of the approach are validated on a variety of platforms, inter alia cache-coherent multi-cores, clusters of multi-cores (Ethernet and InfiniBand) and the Amazon Elastic Compute Cloud.

@inproceedings{cwc:gpu:dcperf:14,
abstract = {The stochastic modelling of biological systems, cou- pled with Monte Carlo simulation of models, is an increasingly popular technique in Bioinformatics. The simulation-analysis workflow may result into a computationally expensive task reducing the interactivity required in the model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to cloud. In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology equipped with on- line mining of results, which is designed according to the FastFlow pattern-based approach, is discussed as a running example. In this work, the CWC simulator is used as a paradigmatic example of a complex C++ application where the quality of results is correlated with both computation and I/O bounds, and where high-quality results might turn into big data. The FastFlow parallel programming framework, which advocates C++ pattern- based parallel programming makes it possible to develop portable parallel code without relinquish neither run-time efficiency nor performance tuning opportunities. Performance and effectiveness of the approach are validated on a variety of platforms, inter-alia cache-coherent multi-cores, cluster of multi-core (Ethernet and Infiniband) and the Amazon Elastic Compute Cloud.},
author = {Marco Aldinucci and Maurizio Drocco and Guilherme {Peretti Pezzi} and Claudia Misale and Fabio Tordini and Massimo Torquati},
booktitle = {Proc. of 34th IEEE Intl. Conference on Distributed Computing Systems Workshops (ICDCSW)},
date-modified = {2017-12-12 13:53:58 +0000},
doi = {10.1109/ICDCSW.2014.38},
keywords = {fastflow, gpu, bioinformatics, paraphrase, impact, nvidia},
publisher = {IEEE},
title = {Exercising high-level parallel programming on streams: a systems biology use case},
year = {2014},
bdsk-url-2 = {http://dx.doi.org/10.1109/ICDCSW.2014.38}
}

• M. Aldinucci, G. Peretti Pezzi, M. Drocco, F. Tordini, P. Kilpatrick, and M. Torquati, “Parallel video denoising on heterogeneous platforms,” in Proc. of intl. workshop on high-level programming for heterogeneous and hierarchical parallel systems (hlpgpu), 2014.

In this paper, a highly-effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPU(s), or on both. The design and the implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.

@inproceedings{ff:video:hlpgpu:14,
abstract = {In this paper, a highly-effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPU(s), or on both. The design and the implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.},
author = {Marco Aldinucci and Guilherme {Peretti Pezzi} and Maurizio Drocco and Fabio Tordini and Peter Kilpatrick and Massimo Torquati},
booktitle = {Proc. of Intl. Workshop on High-level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU)},
date-modified = {2015-09-27 12:42:02 +0000},
keywords = {fastflow, paraphrase, impact},
title = {Parallel video denoising on heterogeneous platforms},
year = {2014},
}

• M. Drocco, M. Aldinucci, and M. Torquati, “A dynamic memory allocator for heterogeneous platforms,” in Advanced computer architecture and compilation for high-performance and embedded systems (acaces) – poster abstracts, Fiuggi, Italy, 2014.

Modern computers are built upon heterogeneous multi-core/many-core architectures (e.g., a GPGPU connected to a multi-core CPU). Achieving peak performance on these architectures is hard and may require a substantial programming effort. High-level programming patterns, coupled with efficient low-level runtime supports, have been proposed to relieve the programmer from worrying about low-level details, such as the synchronisation of racing processes and the fine tunings needed to improve overall performance. Among these are (parallel) dynamic memory allocation and effective exploitation of the memory hierarchy. The memory allocator is often a bottleneck that severely limits program scalability, robustness and portability on parallel systems. In this work we introduce a novel memory allocator, based on FastFlow’s allocator and the recently proposed CUDA Unified Memory, which aims to efficiently integrate host and device memories into a single dynamically allocable memory space, accessible transparently by both host and device code.

@inproceedings{ff:acaces:14,
abstract = {Modern computers are built upon heterogeneous multi-core/many cores architectures (e.g. GPGPU connected to multi-core CPU). Achieving peak performance on these architectures is hard and may require a substantial programming effort. High-level programming patterns, coupled with efficient low-level runtime supports, have been proposed to relieve the programmer from worrying about low-level details such as synchronisation of racing processes as well as those fine tunings needed to improve the overall performance. Among them are (parallel) dynamic memory allocation and effective exploitation of the memory hierarchy. The memory allocator is often a bottleneck that severely limits program scalability, robustness and portability on parallel systems.
In this work we introduce a novel memory allocator, based on the FastFlow's allocator and the recently proposed CUDA Unified Memory, which aims to efficiently integrate host and device memories into a unique dynamic-allocable memory space, accessible transparently by both host and device code.},
author = {Maurizio Drocco and Marco Aldinucci and Massimo Torquati},
booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) -- Poster Abstracts},
date-modified = {2016-08-20 17:29:47 +0000},
keywords = {fastflow, nvidia},
publisher = {HiPEAC},
title = {A Dynamic Memory Allocator for heterogeneous platforms},
year = {2014},
}

### 2013

• M. Drocco, “Parallel stochastic simulators in systems biology: the evolution of the species,” Master Thesis, 2013.

The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can hardly be captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation/analysis workflow may, however, prove computationally expensive, thus compromising the interactivity required especially in model tuning. In this work we discuss the main opportunities to speed up the framework by parallelisation on modern multicore and hybrid multicore/distributed platforms, advocating the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators endowed with on-line statistical analysis. In particular, the Calculus of Wrapped Compartments (CWC) Simulator, which is designed according to FastFlow’s pattern-based approach, is presented and discussed in this work.

@mastersthesis{tesi:drocco:13,
abstract = {The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can be hardly captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation/analysis workflow may however result in being computationally expensive, thus compromising the interactivity required especially in model tuning. In this work we discuss the main opportunities to speed up the framework by parallelisation on modern multicore and hybrid multicore and distributed platforms, advocating the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators endowed with on-line statistical analysis. In particular, the Calculus of Wrapped Compartments (CWC) Simulator, which is designed according to the FastFlow's pattern-based approach, is presented and discussed in this work.},
author = {Maurizio Drocco},
date-modified = {2013-11-24 00:29:54 +0000},
keywords = {fastflow},
month = jul,
school = {Computer Science Department, University of Torino, Italy},
title = {Parallel stochastic simulators in systems biology: the evolution of the species},
year = {2013},
}

• M. Aldinucci, F. Tordini, M. Drocco, M. Torquati, and M. Coppo, “Parallel stochastic simulators in system biology: the evolution of the species,” in Proc. of 21st euromicro intl. conference on parallel distributed and network-based processing (pdp), Belfast, Northern Ireland, U.K., 2013. doi:10.1109/PDP.2013.66

The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can hardly be captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation-analysis workflow may, however, prove computationally expensive, thus compromising the interactivity required in model tuning. In this work we advocate the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators. In particular, the Calculus of Wrapped Compartments (CWC) simulator, which is designed according to FastFlow’s pattern-based approach, is presented and discussed in this work. FastFlow has been extended to also support clusters of multi-cores with minimal coding effort, assessing the portability of the approach.

@inproceedings{ff_cwc_distr:pdp:13,
abstract = {The stochastic simulation of biological systems is an increasingly popular technique in Bioinformatics. It is often an enlightening technique, especially for multi-stable systems which dynamics can be hardly captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation-analysis workflow may however result in being computationally expensive, thus compromising the interactivity required in model tuning. In this work we advocate the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators. In particular, the Calculus of Wrapped Components (CWC) simulator, which is designed according to the FastFlow's pattern-based approach, is presented and discussed in this work. FastFlow has been extended to support also clusters of multi-cores with minimal coding effort, assessing the portability of the approach.},
address = {Belfast, Northern Ireland, U.K.},
author = {Marco Aldinucci and Fabio Tordini and Maurizio Drocco and Massimo Torquati and Mario Coppo},
booktitle = {Proc. of 21st Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
date-modified = {2017-12-12 13:53:10 +0000},
doi = {10.1109/PDP.2013.66},
keywords = {fastflow, bioinformatics},
month = feb,
publisher = {IEEE},
title = {Parallel stochastic simulators in system biology: the evolution of the species},
year = {2013},
bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2013.66}
}

### 2012

• M. Aldinucci, C. Spampinato, M. Drocco, M. Torquati, and S. Palazzo, “A parallel edge preserving algorithm for salt and pepper image denoising,” in Proc. of 2nd intl. conference on image processing theory tools and applications (ipta), Istanbul, Turkey, 2012, pp. 97-102. doi:10.1109/IPTA.2012.6469567

In this paper a two-phase filter for removing “salt and pepper” noise is proposed. In the first phase, an adaptive median filter is used to identify the set of the noisy pixels; in the second phase, these pixels are restored according to a regularization method, which contains a data-fidelity term reflecting the impulse noise characteristics. The algorithm, which exhibits good performance both in denoising and in restoration, can be easily and effectively parallelized to exploit the full power of multi-core CPUs and GPGPUs; the proposed implementation based on the FastFlow library achieves both close-to-ideal speedup and very good wall-clock execution figures.

@inproceedings{denoiser:ff:ipta:12,
abstract = {In this paper a two-phase filter for removing ``salt and pepper'' noise is proposed. In the first phase, an adaptive median filter is used to identify the set of the noisy pixels; in the second phase, these pixels are restored according to a regularization method, which contains a data-fidelity term reflecting the impulse noise characteristics. The algorithm, which exhibits good performance both in denoising and in restoration, can be easily and effectively parallelized to exploit the full power of multi-core CPUs and GPGPUs; the proposed implementation based on the FastFlow library achieves both close-to-ideal speedup and very good wall-clock execution figures.},
author = {Marco Aldinucci and Concetto Spampinato and Maurizio Drocco and Massimo Torquati and Simone Palazzo},
booktitle = {Proc. of 2nd Intl. Conference on Image Processing Theory Tools and Applications (IPTA)},
date-modified = {2015-09-27 12:53:53 +0000},
doi = {10.1109/IPTA.2012.6469567},
editor = {K. Djemal and M. Deriche and W. Puech and Osman N. Ucan},
isbn = {978-1-4673-2582-0},
keywords = {fastflow, impact},
month = oct,
pages = {97-102},
publisher = {IEEE},
title = {A Parallel Edge Preserving Algorithm for Salt and Pepper Image Denoising},
year = {2012},
bdsk-url-2 = {http://dx.doi.org/10.1109/IPTA.2012.6469567}
}

• M. Coppo, F. Damiani, M. Drocco, E. Grassi, E. Sciacca, S. Spinella, and A. Troina, “Simulation techniques for the calculus of wrapped compartments,” Theoretical computer science, vol. 431, pp. 75-95, 2012. doi:10.1016/j.tcs.2011.12.063

The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of Ordinary Differential Equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyse the behaviour of biological systems. These calculi allow one to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. In many situations it is in fact necessary to use stochastic models: for example, when the system to be described is based on the interaction of few molecules, when a chemical instability is present, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in the Calculus of Wrapped Compartments (CWC), a calculus in which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules.

@article{DBLP:journals/tcs/CoppoDDGSST12,
abstract = {The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of Ordinary Differential Equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyse the behaviour of biological systems. These calculi allow to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. Actually, in many situations it is necessary to use stochastic models. For example when the system to be described is based on the interaction of few molecules, when we are at the presence of a chemical instability, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in the Calculus of Wrapped Compartments (CWC), a calculus on which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules.},
author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Eva Sciacca and Salvatore Spinella and Angelo Troina},
bibsource = {DBLP, http://dblp.uni-trier.de},
date-modified = {2013-12-13 10:37:47 +0000},
doi = {10.1016/j.tcs.2011.12.063},
ee = {http://dx.doi.org/10.1016/j.tcs.2011.12.063},
journal = {Theoretical Computer Science},
pages = {75-95},
title = {Simulation techniques for the calculus of wrapped compartments},
volume = {431},
year = {2012},
bdsk-url-1 = {http://dx.doi.org/10.1016/j.tcs.2011.12.063}
}

• M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, E. Sciacca, S. Spinella, M. Torquati, and A. Troina, “On parallelizing on-line statistics for stochastic biological simulations,” in Proc. of euro-par workshops: 2nd workshop on high performance bioinformatics and biomedicine (hibb), Bordeaux, France, 2012, pp. 3-12. doi:10.1007/978-3-642-29740-3_2

This work concerns a general technique to enrich parallel versions of stochastic simulators for biological systems with tools for on-line statistical analysis of the results. In particular, within the FastFlow parallel programming framework, we describe the methodology and the implementation of a parallel Monte Carlo simulation infrastructure extended with user-defined on-line data filtering and mining functions. The simulator and the on-line analysis were validated on large multi-core platforms and representative proof-of-concept biological systems.

@inproceedings{cwcsim:onlinestats:ff:hibb:11,
abstract = {This work concerns a general technique to enrich parallel version of stochastic simulators for biological systems with tools for on-line statistical analysis of the results. In particular, within the FastFlow parallel programming framework, we describe the methodology and the implementation of a parallel Monte Carlo simulation infrastructure extended with user-defined on-line data filtering and mining functions. The simulator and the on-line analysis were validated on large multi-core platforms and representative proof-of-concept biological systems.},
author = {Marco Aldinucci and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Eva Sciacca and Salvatore Spinella and Massimo Torquati and Angelo Troina},
booktitle = {Proc. of Euro-Par Workshops: 2nd Workshop on High Performance Bioinformatics and Biomedicine (HiBB)},
date-modified = {2017-12-12 14:47:15 +0000},
doi = {10.1007/978-3-642-29740-3_2},
editor = {Michael Alexander and Pasqua D'Ambra and Adam Belloum and George Bosilca and Mario Cannataro and Marco Danelutto and Beniamino Di Martino and Michael Gerndt and Emmanuel Jeannot and Raymond Namyst and Jean Roman and Stephen L. Scott and Jesper Larsson Tr{\"a}ff and Geoffroy Vall{\'e}e and Josef Weidendorfer},
keywords = {bioinformatics, fastflow},
pages = {3-12},
publisher = {Springer},
series = {LNCS},
title = {On Parallelizing On-Line Statistics for Stochastic Biological Simulations},
volume = {7156},
year = {2012},
bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-29740-3_2}
}

### 2011

• M. Coppo, F. Damiani, M. Drocco, E. Grassi, M. Guether, and A. Troina, “Modelling ammonium transporters in arbuscular mycorrhiza symbiosis,” Transactions on computational systems biology, vol. 6575, iss. 13, pp. 85-109, 2011. doi:10.1007/978-3-642-19748-2_5

The Stochastic Calculus of Wrapped Compartments (SCWC) is a recently proposed variant of the Stochastic Calculus of Looping Sequences (SCLS), a language for the representation and simulation of biological systems. In this work we apply SCWC to model a newly discovered ammonium transporter. This transporter is believed to play a fundamental role in plant mineral acquisition, which takes place in the arbuscular mycorrhiza, the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical products used nowadays as artificial fertilizers. In our experiments the passage of NH3/NH4+ from the fungus to the plant has been dissected into known and hypothetical mechanisms; with the model we have so far been able to simulate the behavior of the system under different conditions. Our simulations confirmed some of the latest experimental results about the LjAMT2;2 transporter. Moreover, by comparing the behaviour of LjAMT2;2 with that of another ammonium transporter which exists in plants, viz. LjAMT1;1, our simulations support a hypothesis about why LjAMT2;2 is so selectively expressed in arbusculated cells.

@article{DBLP:journals/tcsb/Coppo/DDGGT11,
abstract = {The Stochastic Calculus of Wrapped Compartments (SCWC) is a recently proposed variant of the Stochastic Calculus of Looping Sequences (SCLS), a language for the representation and simulation of biological systems. In this work we apply SCWC to model a newly discovered ammonium transporter. This transporter is believed to play a fundamental role for plant mineral acquisition, which takes place in the arbuscular mycorrhiza, the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical productions used nowadays to produce artificial fertilizers. In our experiments the passage of NH3/NH4+ from the fungus to the plant has been dissected in known and hypothetical mechanisms; with the model so far we have been able to simulate the behavior of the system under different conditions. Our simulations confirmed some of the latest experimental results about the LjAMT2;2 transporter. Moreover, by comparing the behaviour of LjAMT2;2 with the behaviour of another ammonium transporter which exists in plants, viz. LjAMT1;1, our simulations support an hypothesis about why LjAMT2;2 is so selectively expressed in arbusculated cells.},
author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Mike Guether and Angelo Troina},
date-modified = {2017-12-12 13:50:01 +0000},
doi = {10.1007/978-3-642-19748-2_5},
journal = {Transactions on Computational Systems Biology},
number = {13},
pages = {85-109},
title = {Modelling Ammonium Transporters in Arbuscular Mycorrhiza Symbiosis},
volume = {6575},
year = {2011},
bdsk-url-1 = {http://dx.doi.org/10.1007/978-3-642-19748-2_5}
}

• C. Calcagno, M. Coppo, F. Damiani, M. Drocco, E. Sciacca, S. Spinella, and A. Troina, “Modelling spatial interactions in the arbuscular mycorrhizal symbiosis using the calculus of wrapped compartments,” in Proc. of 3rd intl. workshop on computational models for cell processes (compmod), Aachen, Germany, 2011, pp. 3-18.

Arbuscular mycorrhiza (AM) is the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical products used nowadays as artificial fertilizers. In previous work we used the Calculus of Wrapped Compartments (CWC) to investigate different phases of the AM symbiosis. In this paper, we continue this line of research by modelling the colonisation of the plant root cells by the fungal hyphae spreading in the soil. This study requires the description of spatial interactions. Although CWC has no explicit feature for modelling a spatial geometry, the compartment labelling feature can be effectively exploited to define a discrete surface topology outlining the relevant sectors which determine the spatial properties of the system under consideration. Different situations and interesting spatial properties can be modelled and analysed in such a lightweight framework (which has no explicit notion of geometry with coordinates and spatial metrics), thus exploiting the existing CWC simulation tool.

@inproceedings{DBLP:journals/corr/abs-1109-1363,
abstract = {Arbuscular mycorrhiza (AM) is the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical productions used nowadays to produce artificial fertilizers. In previous work we used the Calculus of Wrapped Compartments (CWC) to investigate different phases of the AM symbiosis. In this paper, we continue this line of research by modelling the colonisation of the plant root cells by the fungal hyphae spreading in the soil. This study requires the description of some spatial interaction. Although CWC has no explicit feature modelling a spatial geometry, the compartment labelling feature can be effectively exploited to define a discrete surface topology outlining the relevant sectors which determine the spatial properties of the system under consideration. Different situations and interesting spatial properties can be modelled and analysed in such a lightweight framework (which has not an explicit notion of geometry with coordinates and spatial metrics), thus exploiting the existing CWC simulation tool.},
author = {Cristina Calcagno and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Eva Sciacca and Salvatore Spinella and Angelo Troina},
bibsource = {DBLP, http://dblp.uni-trier.de},
booktitle = {Proc. of 3rd Intl. Workshop on Computational Models for Cell Processes (CompMod)},
date-modified = {2017-12-12 13:51:04 +0000},
editor = {Ion Petre and Erik P. de Vink},
doi = {10.4204/EPTCS.67.3},
month = sep,
pages = {3-18},
series = {EPTCS},
title = {Modelling Spatial Interactions in the Arbuscular Mycorrhizal Symbiosis using the Calculus of Wrapped Compartments},
volume = {67},
year = {2011}
}
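
The labelled-compartment encoding of space described in the abstract can be sketched in a few lines. This is an illustrative toy, not the CWC tool itself: the sector names and the adjacency relation below are made up. Sectors of the surface are plain labels, and movement is only allowed between labels declared adjacent, so no coordinates or spatial metrics are ever needed.

```python
import random

# Toy sketch of a discrete surface topology encoded purely by labels,
# in the spirit of the compartment-labelling technique described above.
# Sector names and adjacency are hypothetical, not from the paper.
SECTORS = ["s0", "s1", "s2", "s3"]
ADJACENT = {"s0": ["s1"], "s1": ["s0", "s2"], "s2": ["s1", "s3"], "s3": ["s2"]}

def diffuse(state, steps, seed=42):
    """Move one token per step from an occupied sector to a random
    adjacent sector; the topology is entirely given by ADJACENT."""
    rng = random.Random(seed)
    state = dict(state)
    for _ in range(steps):
        src = rng.choice([s for s in SECTORS if state[s] > 0])
        dst = rng.choice(ADJACENT[src])
        state[src] -= 1
        state[dst] += 1
    return state

# ten "hyphal" tokens start in sector s0 and spread over the surface
final = diffuse({"s0": 10, "s1": 0, "s2": 0, "s3": 0}, 100)
```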

• M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, “On designing multicore-aware simulators for biological systems,” in Proc. of 19th euromicro intl. conference on parallel distributed and network-based processing (pdp), Ayia Napa, Cyprus, 2011, pp. 318-325. doi:10.1109/PDP.2011.81

The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, which may however turn out to be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions, involving both the single simulation and a bulk of independent simulations (either replicas or instances derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments), which is carried out by way of the FastFlow programming framework, making fast development and efficient execution on multi-cores possible.

@inproceedings{ff:cwc:pdp:11,
abstract = {The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, which may however turn out to be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions, involving both the single simulation and a bulk of independent simulations (either replicas or instances derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments), which is carried out by way of the FastFlow programming framework, making fast development and efficient execution on multi-cores possible.},
author = {Marco Aldinucci and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Massimo Torquati and Angelo Troina},
booktitle = {Proc. of 19th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP)},
date-modified = {2017-12-12 13:51:21 +0000},
doi = {10.1109/PDP.2011.81},
editor = {Yiannis Cotronis and Marco Danelutto and George Angelos Papadopoulos},
keywords = {fastflow},
month = feb,
pages = {318-325},
publisher = {IEEE},
title = {On Designing Multicore-Aware Simulators for Biological Systems},
year = {2011},
bdsk-url-1 = {http://arxiv.org/pdf/1010.2438v2},
bdsk-url-3 = {http://dx.doi.org/10.1109/PDP.2011.81}
}
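
The "bulk of independent simulations" family mentioned in the abstract is essentially an embarrassingly parallel pattern: many replicas run concurrently and their results are aggregated. A minimal sketch follows, with a toy birth-death walk standing in for the CWC simulator and plain Python threads standing in for FastFlow; none of the names or parameters come from the paper.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def one_replica(seed):
    """Toy stochastic simulation: a 100-step bounded random walk.
    One RNG per replica keeps results reproducible regardless of
    thread scheduling."""
    rng = random.Random(seed)
    x = 50
    for _ in range(100):
        x = max(0, x + (1 if rng.random() < 0.5 else -1))
    return x

def run_replicas(n):
    """Run n independent replicas concurrently and average the results."""
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(one_replica, range(n))) / n

mean = run_replicas(32)
```

In a real simulator each replica would be a full stochastic simulation run, and a process pool (or a FastFlow farm in C++) would replace the thread pool so that CPU-bound replicas actually run in parallel.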

### 2010

• M. Coppo, F. Damiani, M. Drocco, E. Grassi, E. Sciacca, S. Spinella, and A. Troina, “Hybrid calculus of wrapped compartments,” in Proc. of 4th workshop on membrane computing and biologically inspired process calculi (mecbic), Jena, Germany, 2010, pp. 102-120.

The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of ordinary differential equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyze the behaviour of biological systems. These calculi make it possible to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. Indeed, in many situations it is necessary to use stochastic models: for example, when the system to be described is based on the interaction of few molecules, when we are in the presence of a chemical instability, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in CWC, a calculus in which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules.

@inproceedings{DBLP:journals/corr/abs-1011-0494,
abstract = {The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of ordinary differential equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyze the behaviour of biological systems. These calculi make it possible to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. Indeed, in many situations it is necessary to use stochastic models: for example, when the system to be described is based on the interaction of few molecules, when we are in the presence of a chemical instability, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in CWC, a calculus in which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules.},
author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Eva Sciacca and Salvatore Spinella and Angelo Troina},
bibsource = {DBLP, http://dblp.uni-trier.de},
booktitle = {Proc. of 4th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC)},
date-modified = {2013-12-13 10:30:02 +0000},
editor = {Gabriel Ciobanu and Maciej Koutny},
doi = {10.4204/EPTCS.40.8},
month = aug,
pages = {102-120},
series = {EPTCS},
title = {Hybrid Calculus of Wrapped Compartments},
volume = {40},
year = {2010}
}
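
The hybrid idea in the abstract, exact stochastic steps for small populations and ODE integration for large ones, can be sketched for a single decay reaction. The rate constant, switching threshold, and step sizes below are illustrative assumptions, not values from the paper.

```python
import random

K = 0.1          # decay rate constant (illustrative)
THRESHOLD = 100  # switch to the ODE regime above this population

def simulate(x0, t_end, seed=1):
    """Hybrid simulation of the decay reaction X -> 0 (dx/dt = -K*x)."""
    rng = random.Random(seed)
    x, t = float(x0), 0.0
    while t < t_end and x >= 1:
        if x > THRESHOLD:
            # large population: cheap deterministic explicit-Euler step
            dt = 0.01
            x -= K * x * dt
        else:
            # small population: exact Gillespie step, one decay event
            # fired after an exponentially distributed waiting time
            dt = rng.expovariate(K * x)
            x -= 1
        t += dt
    return x

final = simulate(1000, 5.0)   # stays in the ODE regime for this run
```

The same skeleton generalises to several reactions by summing their propensities in the stochastic branch and their rate terms in the ODE branch.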

• M. Coppo, F. Damiani, M. Drocco, E. Grassi, and A. Troina, “Stochastic calculus of wrapped compartments,” in Proc. of 8th workshop on quantitative aspects of programming languages (qapl), Paphos, Cyprus, 2010, pp. 82-98.

The Calculus of Wrapped Compartments (CWC) is a variant of the Calculus of Looping Sequences (CLS). While keeping the same expressiveness, CWC strongly simplifies the development of automatic tools for the analysis of biological systems. The main simplification consists in the removal of the sequencing operator, thus lightening the formal treatment of the patterns to be matched in a term (whose complexity in CLS is strongly affected by the variables matching in the sequences). We define a stochastic semantics for this new calculus. As an application we model the interaction between macrophages and apoptotic neutrophils and a mechanism of gene regulation in E. coli.

@inproceedings{DBLP:journals/corr/abs-1006-5099,
abstract = {The Calculus of Wrapped Compartments (CWC) is a variant of the Calculus of Looping Sequences (CLS). While keeping the same expressiveness, CWC strongly simplifies the development of automatic tools for the analysis of biological systems. The main simplification consists in the removal of the sequencing operator, thus lightening the formal treatment of the patterns to be matched in a term (whose complexity in CLS is strongly affected by the variables matching in the sequences). We define a stochastic semantics for this new calculus. As an application we model the interaction between macrophages and apoptotic neutrophils and a mechanism of gene regulation in E. coli.},
author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Angelo Troina},
bibsource = {DBLP, http://dblp.uni-trier.de},
booktitle = {Proc. of 8th Workshop on Quantitative Aspects of Programming Languages (QAPL)},
pages = {82-98},
title = {Stochastic Calculus of Wrapped Compartments},
year = {2010}
}