Papers | Parallel Computing

2018

Paolo Viviani, Marco Aldinucci, Roberto d'Ippolito, Jan Lemeire, Dean Vucinic

A Flexible Numerical Framework for Engineering—A Response Surface Modelling Application Book Chapter

In: Improved Performance of Materials: Design and Experimental Approaches, pp. 93–106, Springer International Publishing, Cham, 2018, ISBN: 978-3-319-59590-0.

Tags: repara, rephrase

@inbook{17:viviani:advstruct,

title = {A Flexible Numerical Framework for Engineering—A Response Surface Modelling Application},

author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito and Jan Lemeire and Dean Vucinic},

doi = {10.1007/978-3-319-59590-0_9},

isbn = {978-3-319-59590-0},

year  = {2018},

date = {2018-01-01},

booktitle = {Improved Performance of Materials: Design and Experimental Approaches},

pages = {93–106},

publisher = {Springer International Publishing},

address = {Cham},

abstract = {This work presents an innovative approach adopted for the development of a new numerical software framework for accelerating dense linear algebra calculations and its application within an engineering context. In particular, response surface models (RSMs) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove too expensive to compute when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different response surfaces has to be calculated in order to improve the overall accuracy (e.g. when using ensemble modelling techniques). On the other hand, the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, although it can lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications and especially of RSM calculations with a user-friendly syntax that allows good portability between different hardware architectures, with no need for specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a radial basis function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.},

keywords = {repara, rephrase},

pubstate = {published},

tppubtype = {inbook}

}

Marco Aldinucci, Marco Danelutto, Maurizio Drocco, Peter Kilpatrick, Claudia Misale, Guilherme Peretti Pezzi, Massimo Torquati

A Parallel Pattern for Iterative Stencil + Reduce Journal Article

In: Journal of Supercomputing, vol. 74, no. 11, pp. 5690–5705, 2018.

Tags: HPC, repara, rephrase

2017

Maurizio Drocco

Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations PhD Thesis

Computer Science Department, University of Torino, 2017.

Tags: fastflow, paraphrase, repara, rephrase, toreador

@phdthesis{17:gam:drocco:thesis,

title = {Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations},

author = {Maurizio Drocco},

url = {https://zenodo.org/record/1037585/files/Drocco_phd_thesis.pdf},

doi = {10.5281/zenodo.1037585},

year  = {2017},

date = {2017-10-01},

school = {Computer Science Department, University of Torino},

abstract = {In the realm of High Performance Computing (HPC), message passing has been the programming paradigm of choice for over twenty years. The durable MPI (Message Passing Interface) standard, with send/receive communication, broadcast, gather/scatter, and reduction collectives is still used to construct parallel programs where each communication is orchestrated by the developer, based on precise knowledge of data distribution and overheads; collective communications simplify the orchestration but might induce excessive synchronization. Early attempts to bring the shared-memory programming model, with its programming advantages, to distributed computing, referred to as the Distributed Shared Memory (DSM) model, faded away; one of the main issues was to combine performance and programmability with the memory consistency model. The recently proposed Partitioned Global Address Space (PGAS) model is a modern revamp of DSM that exposes data placement to enable optimizations based on locality, but it still addresses (simple) data-parallelism only and it relies on expensive sharing protocols. We advocate an alternative programming model for distributed computing based on a Global Asynchronous Memory (GAM), aiming to avoid coherency and consistency problems rather than solving them. We materialize GAM by designing and implementing a distributed smart pointers library, inspired by C++ smart pointers. In this model, public and private pointers (resembling C++ shared and unique pointers, respectively) are moved around instead of messages (i.e., data), thus relieving the user of the burden of minimizing transfers. On top of smart pointers, we propose a high-level C++ template library for writing applications in terms of dataflow-like networks, namely GAM nets, consisting of stateful processors exchanging pointers in fully asynchronous fashion.
We demonstrate the validity of the proposed approach, from the expressiveness perspective, by showing how GAM nets can be exploited to implement higher-level parallel programming models, such as data and task parallelism. As for the performance perspective, the execution of two non-toy benchmarks on a number of different small-scale HPC clusters exhibits both close-to-ideal scalability and negligible overhead with respect to state-of-the-art benchmark implementations. For instance, the GAM implementation of a high-quality video restoration filter sustains a 100 fps throughput over 70%-noisy high-quality video streams on a 4-node cluster of Graphics Processing Units (GPUs), with minimal programming effort.},

keywords = {fastflow, paraphrase, repara, rephrase, toreador},

pubstate = {published},

tppubtype = {phdthesis}

}

Claudia Misale

PiCo: A Domain-Specific Language for Data Analytics Pipelines PhD Thesis

Computer Science Department, University of Torino, 2017.

Tags: fastflow, paraphrase, repara, rephrase, toreador

@phdthesis{17:pico:misale:thesis,

title = {PiCo: A Domain-Specific Language for Data Analytics Pipelines},

author = {Claudia Misale},

url = {https://iris.unito.it/retrieve/handle/2318/1633743/320170/Misale_thesis.pdf},

doi = {10.5281/zenodo.579753},

year  = {2017},

date = {2017-05-01},

school = {Computer Science Department, University of Torino},

abstract = {In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common underlying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as we often see in state-of-the-art Big Data analytics frameworks.

 From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, which sits on top of a stack of layers that build a prototypical framework for Big Data analytics.

 The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.

 Second, we propose a programming environment based on such a layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user a unique interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.},

keywords = {fastflow, paraphrase, repara, rephrase, toreador},

pubstate = {published},

tppubtype = {phdthesis}

}

Paolo Viviani, Massimo Torquati, Marco Aldinucci, Roberto d'Ippolito

Multiple back-end support for the Armadillo linear algebra interface Proceedings Article

In: Proc. of the 32nd ACM Symposium on Applied Computing (SAC), pp. 1566–1573, Marrakesh, Morocco, 2017.

Tags: HPC, repara, rephrase

Fabio Tordini, Maurizio Drocco, Claudia Misale, Luciano Milanesi, Pietro Liò, Ivan Merelli, Massimo Torquati, Marco Aldinucci

NuChart-II: the road to a fast and scalable tool for Hi-C data analysis Journal Article

In: International Journal of High Performance Computing Applications, vol. 31, no. 3, pp. 196–211, 2017.

Tags: bioinformatics, fastflow, repara, rephrase

@article{16:ijhpca:nuchart,

title = {NuChart-II: the road to a fast and scalable tool for Hi-C data analysis},

author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Liò and Ivan Merelli and Massimo Torquati and Marco Aldinucci},

url = {https://iris.unito.it/retrieve/handle/2318/1607126/238747/main.pdf},

doi = {10.1177/1094342016668567},

year  = {2017},

date = {2017-01-01},

journal = {International Journal of High Performance Computing Applications},

volume = {31},

number = {3},

pages = {196–211},

abstract = {Recent advances in molecular biology and bioinformatics techniques have led to an explosion of information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, which makes it possible to identify physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but the use of this important information is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory-bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is an R package that embeds a C++ engine: computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation raises the level of abstraction and makes it possible to orchestrate analysis and visualisation of genomic data.},

keywords = {bioinformatics, fastflow, repara, rephrase},

pubstate = {published},

tppubtype = {article}

}

2016

Paolo Viviani, Marco Aldinucci, Roberto d'Ippolito

An hybrid linear algebra framework for engineering Proceedings Article

In: Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) – Poster Abstracts, Fiuggi, Italy, 2016.

Tags: HPC, repara

Paolo Viviani, Marco Aldinucci, Roberto d'Ippolito, Jan Lemeire, Dean Vucinic

A flexible numerical framework for engineering - a Response Surface Modelling application Unpublished

2016.

Tags: HPC, repara, rephrase

@unpublished{16:acex:armadillo,

title = {A flexible numerical framework for engineering - a Response Surface Modelling application},

author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito and Jan Lemeire and Dean Vucinic},

year  = {2016},

date = {2016-01-01},

booktitle = {10th Intl. Conference on Advanced Computational Engineering and Experimenting (ACE-X)},

abstract = {This work presents the innovative approach adopted for the development of a new numerical software framework for accelerating Dense Linear Algebra calculations and its application within an engineering context. In particular, Response Surface Models (RSMs) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove too expensive to compute when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different Response Surfaces has to be calculated in order to improve the overall accuracy (e.g. when using Ensemble Modelling techniques). On the other hand, it is a known challenge that the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, although it can lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications and especially of RSM calculations with a user-friendly syntax that allows good portability between different hardware architectures, with no need for specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a Radial Basis Function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.},

keywords = {HPC, repara, rephrase},

pubstate = {published},

tppubtype = {unpublished}

}

Marco Aldinucci, Sonia Campa, Marco Danelutto, Peter Kilpatrick, Massimo Torquati

Pool Evolution: A Parallel Pattern for Evolutionary and Symbolic Computing Journal Article

In: International Journal of Parallel Programming, vol. 44, no. 3, pp. 531–551, 2016, ISSN: 0885-7458.

Tags: fastflow, paraphrase, repara

Fabio Tordini, Ivan Merelli, Pietro Liò, Luciano Milanesi, Marco Aldinucci

NuchaRt: embedding high-level parallel computing in R for augmented Hi-C data analysis Book Section

In: Computational Intelligence Methods for Bioinformatics and Biostatistics, vol. 9874, pp. 259–272, Springer International Publishing, Cham (ZG), 2016, ISBN: 978-3-319-44331-7.

Tags: bioinformatics, fastflow, repara

Manuel F. Dolz, David Rio Astorga, Javier Fernández, J. Daniel García, Félix García-Carballeira, Marco Danelutto, Massimo Torquati

Embedding Semantics of the Single-Producer/Single-Consumer Lock-Free Queue into a Race Detection Tool Proceedings Article

In: Proceedings of the 7th International Workshop on Programming Models and Applications for Multicores and Manycores, pp. 20–29, ACM, Barcelona, Spain, 2016, ISBN: 978-1-4503-4196-7.

Tags: fastflow, repara

2015

Marco Aldinucci, Marco Danelutto, Maurizio Drocco, Peter Kilpatrick, Guilherme Peretti Pezzi, Massimo Torquati

The Loop-of-Stencil-Reduce paradigm Proceedings Article

In: Proc. of Intl. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara), pp. 172–177, IEEE, Helsinki, Finland, 2015.

Tags: fastflow, HPC, repara

Fabio Tordini, Maurizio Drocco, Ivan Merelli, Luciano Milanesi, Pietro Liò, Marco Aldinucci

NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data Proceedings Article

In: Clelia Di Serio, Pietro Liò, Alessandro Nonis, Roberto Tagliaferri (Eds.): Proc. of 11th Intl. Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB), pp. 298–311, Springer, Cambridge, UK, 2015, ISBN: 978-3-319-24461-7.

Tags: bioinformatics, fastflow, paraphrase, repara

@inproceedings{14:ff:nuchart:cibb,

title = {NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data},

author = {Fabio Tordini and Maurizio Drocco and Ivan Merelli and Luciano Milanesi and Pietro Liò and Marco Aldinucci},

editor = {Clelia Di Serio and Pietro Liò and Alessandro Nonis and Roberto Tagliaferri},

url = {http://calvados.di.unipi.it/storage/paper_files/2014_nuchart_cibb.pdf},

doi = {10.1007/978-3-319-24462-4_25},

isbn = {978-3-319-24461-7},

year  = {2015},

date = {2015-06-01},

booktitle = {Proc. of 11th Intl. Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB)},

volume = {8623},

pages = {298–311},

publisher = {Springer},

address = {Cambridge, UK},

series = {LNCS},

abstract = {Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expression, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high-throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, a software tool that allows the user to annotate and visualize a list of input genes with information relying on Hi-C data, integrating knowledge data about genomic features that are involved in the chromosome spatial organization. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable problems of scalability while working genome-wide on large datasets: particular attention has been paid to optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.},

keywords = {bioinformatics, fastflow, paraphrase, repara},

pubstate = {published},

tppubtype = {inproceedings}

}

Maurizio Drocco, Claudia Misale, Guilherme Peretti Pezzi, Fabio Tordini, Marco Aldinucci

Memory-Optimised Parallel Processing of Hi-C Data Proceedings Article

In: Proc. of 23rd Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), pp. 1–8, IEEE, 2015.

Tags: bioinformatics, fastflow, impact, paraphrase, repara

Fabio Tordini, Maurizio Drocco, Claudia Misale, Luciano Milanesi, Pietro Liò, Ivan Merelli, Marco Aldinucci

Parallel Exploration of the Nuclear Chromosome Conformation with NuChart-II Proceedings Article

In: Proc. of 23rd Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), IEEE, 2015.

Tags: bioinformatics, fastflow, impact, paraphrase, repara

2014

Marco Aldinucci, Sonia Campa, Marco Danelutto, Peter Kilpatrick, Massimo Torquati

Pool evolution: a domain specific parallel pattern Proceedings Article

In: Proc. of the 7th Intl. Symposium on High-level Parallel Programming and Applications (HLPP), Amsterdam, The Netherlands, 2014.

Tags: fastflow, paraphrase, repara

Claudia Misale, Giulio Ferrero, Massimo Torquati, Marco Aldinucci

Sequence alignment tools: one parallel pattern to rule them all? Journal Article

In: BioMed Research International, 2014.