Software & tools | Parallel Computing

RISC-V tools

We actively contribute to the RISC-V ecosystem through several software ports. As of today, these include:

  • FastFlow. FastFlow is a C++ framework for high-level, pattern-based, high-performance parallel programming (more info here). The RISC-V port is part of the official repository.
  • PyTorch. PyTorch is one of the most popular Python/C++ frameworks for training and using DNN models. The RISC-V port is available here.
  • OpenFL for RISC-V. We have ported the official Intel® OpenFL federated learning framework to the RISC-V platform. The Python packages can be installed via pip from this repository.
    To add our package repository to your pip configuration, run
    pip config set global.index-url https://gitlab.di.unito.it/api/v4/projects/1057/packages/pypi/simple
    Then, to install OpenFL built for RISC-V, run
    pip install openfl-riscv
    As a side result, the same repository also hosts the following RISC-V-compatible Python packages: ninja (ninja-riscv), meson-python (meson-python-riscv), scipy (scipy-riscv), and scikit-learn (scikit-learn-riscv).

FastFederatedLearning

Fast Federated Learning (FFL) is a C/C++-based Federated Learning framework built on top of the FastFlow parallel programming framework. It uses the Cereal library to efficiently serialise the updates sent over the network and the libtorch library to remove any dependency on Python code. The first release comprises three examples based on three different communication topologies: master-worker, peer-to-peer, and tree-based.
FastFederatedLearning is freely available on GitHub under the LGPLv3 license and has been successfully tested on x86_64, ARM, and RISC-V platforms. It ships with scripts for automatically installing the framework and reproducing all the experiments reported in the original paper. More information about software usage can be found in the official repository.
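As a rough illustration of the master-worker topology mentioned above, the sketch below shows the master's side of one aggregation round in plain standard C++: each worker sends back a locally updated parameter vector and the master averages them element-wise. The averaging rule (unweighted, FedAvg-style), the `Weights` alias, and the `federated_average` function are assumptions made for illustration only; they are not FFL's API, which builds on FastFlow and libtorch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical flat representation of a model's parameters.
using Weights = std::vector<float>;

// Master side of one round (sketch, assumed unweighted averaging):
// element-wise mean of the updates received from the workers.
Weights federated_average(const std::vector<Weights>& updates) {
    assert(!updates.empty());
    const std::size_t n = updates.front().size();
    Weights avg(n, 0.0f);
    for (const Weights& w : updates) {
        assert(w.size() == n);  // every update must match the global model shape
        for (std::size_t i = 0; i < n; ++i) avg[i] += w[i];
    }
    for (float& v : avg) v /= static_cast<float>(updates.size());
    return avg;
}
```

In a peer-to-peer or tree-based topology, one could apply the same reduction at different places, for instance at every inner node of the tree, so that partial averages flow towards the root.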

G. Mittone, N. Tonci, R. Birke, I. Colonnelli, D. Medić, A. Bartolini, R. Esposito, E. Parisi, F. Beneventi, M. Polato, M. Torquati, L. Benini, and M. Aldinucci, “Experimenting with Emerging RISC-V Systems for Decentralised Machine Learning”, 20th ACM International Conference on Computing Frontiers, 2023. DOI: 10.1145/3587135.3592211

OpenFL-extended

OpenFL-extended is an extended version of the official Intel® OpenFL federated learning (FL) framework. It fully supports the standard FL workflow already provided by OpenFL and, in addition, supports both federated bagging and federated boosting. Federated bagging is implemented by having the aggregator perform simple bagging of the models trained by the different parties, while federated boosting relies on the AdaBoost.F algorithm developed at the University of Turin [1]. Thanks to these approaches, OpenFL-extended is fully model-agnostic: it can be used to build federations out of any Machine Learning model, not only Deep Neural Networks.
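The federated bagging idea can be sketched in a few lines: each party contributes a model trained on its own data, and the aggregator combines their predictions. Here the combination rule is an unweighted majority vote for classification, which is an assumption of this sketch; the `Classifier` type and `bagged_predict` function are hypothetical, and OpenFL-extended itself is written in Python and may combine models differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <vector>

// Hypothetical model interface: maps a feature vector to a class label.
using Classifier = std::function<int(const std::vector<double>&)>;

// Aggregator-side bagging (sketch): every party's model votes and the most
// frequent label wins; ties go to the smallest label (std::map is ordered).
int bagged_predict(const std::vector<Classifier>& models,
                   const std::vector<double>& x) {
    assert(!models.empty());
    std::map<int, int> votes;
    for (const Classifier& m : models) ++votes[m(x)];
    int best_label = votes.begin()->first;
    int best_count = votes.begin()->second;
    for (const auto& [label, count] : votes) {
        if (count > best_count) {
            best_label = label;
            best_count = count;
        }
    }
    return best_label;
}
```

Because the aggregator only needs each party's predictions, nothing in this scheme depends on the models being neural networks, which is the property that makes the approach model-agnostic.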
OpenFL-extended is freely available on GitHub under the LGPLv3 license. It is fully Python-based, comes with a wide range of ready-made examples, and has been tested on x86_64, ARM, and RISC-V architectures. More information about software usage can be found in the official repository.

This software’s publication is currently under review, but an open-access version of the paper is available on arXiv.

G. Mittone, W. Riviera, I. Colonnelli, R. Birke, M. Aldinucci, “Model-Agnostic Federated Learning”, arXiv, 2023. DOI: 10.48550/arXiv.2303.04906
[1] M. Polato, R. Esposito, and M. Aldinucci. “Boosting the federation: Cross-silo federated learning without gradient descent.” 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022.

Jupyter Workflow

Jupyter Workflow is an extension of the IPython kernel designed to support distributed literate workflows. The Jupyter Workflow kernel enables Jupyter Notebooks to describe complex workflows and to execute them in a distributed fashion on hybrid cloud/HPC infrastructures. In particular, code cells are regarded as the nodes of a distributed workflow graph, whereas cell metadata are used to express data and control dependencies, parallel execution patterns (e.g. Scatter/Gather), and target execution infrastructures.
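To make the "cells as nodes of a workflow graph" idea concrete, the sketch below derives a valid execution order from data dependencies between cells using Kahn's topological sort. The integer cell indices, the `deps` encoding, and the `execution_order` function are assumptions for illustration; Jupyter Workflow itself is Python-based, reads dependencies from cell metadata, and executes cells in a distributed fashion, which this sketch does not attempt.

```cpp
#include <cassert>
#include <queue>
#include <utility>
#include <vector>

// Returns a valid execution order for `n` cells (indices assumed in [0, n)),
// where each pair (a, b) in `deps` means "cell b consumes data produced by
// cell a". Returns an empty vector if the dependencies contain a cycle.
std::vector<int> execution_order(int n, const std::vector<std::pair<int, int>>& deps) {
    assert(n >= 0);
    std::vector<std::vector<int>> succ(n);
    std::vector<int> indegree(n, 0);
    for (const auto& [from, to] : deps) {
        succ[from].push_back(to);
        ++indegree[to];
    }
    // Kahn's algorithm: repeatedly run a cell whose inputs are all available.
    std::queue<int> ready;
    for (int c = 0; c < n; ++c)
        if (indegree[c] == 0) ready.push(c);
    std::vector<int> order;
    while (!ready.empty()) {
        int c = ready.front();
        ready.pop();
        order.push_back(c);
        for (int next : succ[c])
            if (--indegree[next] == 0) ready.push(next);
    }
    if (static_cast<int>(order.size()) != n) return {};  // cycle: no valid order
    return order;
}
```

Cells that sit in the `ready` queue at the same time do not depend on each other, so a distributed engine could dispatch them concurrently; a scatter/gather pattern corresponds to one cell fanning out to several independent cells that later feed a single gathering cell.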

Jupyter Workflow code is available on GitHub under the LGPLv3 license, and the related Python package can be downloaded from PyPI. More details about the tool and its applications can be found on the Jupyter Workflow website.

I. Colonnelli, M. Aldinucci, B. Cantalupo, L. Padovani, S. Rabellino, C. Spampinato, R. Morelli, R. Di Carlo, N. Magini and C. Cavazzoni, “Distributed workflows with Jupyter”, Future Generation Computer Systems, vol. 128, pp. 282-298, 2022. doi: 10.1016/j.future.2021.10.007.

StreamFlow

The StreamFlow framework is a container-native Workflow Management System (WMS) written in Python 3 and based on the Common Workflow Language (CWL) Standard.

StreamFlow has been designed around two main principles:

  • Allowing the execution of tasks in multi-container environments in order to support the concurrent execution of multiple communicating tasks in a multi-agent ecosystem
  • Relaxing the requirement of a single shared data space to allow for hybrid workflow executions on top of multi-cloud or hybrid cloud/HPC infrastructures.

StreamFlow source code is available on GitHub under the LGPLv3 license. Moreover, a Python package is downloadable from PyPI and Docker containers can be found on Docker Hub. More details about the tool and its applications can be found on the StreamFlow website.
StreamFlow has been selected as an exploring technology by the EC Innovation Radar initiative.

I. Colonnelli, B. Cantalupo, I. Merelli and M. Aldinucci, “StreamFlow: cross-breeding cloud with HPC,” in IEEE Transactions on Emerging Topics in Computing, doi: 10.1109/TETC.2020.3019202.

CAPIO

CAPIO (Cross-Application Programmable I/O) is a middleware capable of transparently injecting I/O streaming capabilities into file-based workflows, improving the computation-I/O overlap without modifying the business code. The contribution is twofold: at design time, a new I/O coordination language allows users to annotate workflow data dependencies with synchronization semantics; at run time, a user-space software layer automatically turns a batch execution into a streaming execution according to the semantics expressed in the configuration file.
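The difference between batch and streaming execution can be illustrated with an in-memory analogue: instead of waiting for the producer to close its output file, the consumer processes each record as soon as it has been committed, so computation overlaps with I/O. The `StreamingFile` class and `run_streaming` function below are hypothetical stand-ins built on standard C++ threads; CAPIO achieves this transparently in a user-space layer, driven by its configuration file, without changes to application code.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical in-memory "file" shared by a producer and a consumer task.
class StreamingFile {
public:
    // Producer side: append one record and wake up any waiting reader.
    void append(std::string record) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            records_.push_back(std::move(record));
        }
        cv_.notify_all();
    }
    // Producer side: mark the file as complete (the analogue of close()).
    void close() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Consumer side: block until record `i` exists; return std::nullopt once
    // the file is closed and record `i` will never appear.
    std::optional<std::string> read(std::size_t i) {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [&] { return i < records_.size() || closed_; });
        if (i < records_.size()) return records_[i];
        return std::nullopt;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<std::string> records_;
    bool closed_ = false;
};

// Runs producer and consumer concurrently: the consumer can start on record 0
// while the producer is still writing later records (streaming, not batch).
std::vector<std::string> run_streaming(int n_records) {
    assert(n_records >= 0);
    StreamingFile file;
    std::vector<std::string> processed;  // touched only by the consumer until join()
    std::thread consumer([&] {
        for (std::size_t i = 0;; ++i) {
            std::optional<std::string> rec = file.read(i);
            if (!rec) break;  // producer closed the file: no more data
            processed.push_back("processed:" + *rec);
        }
    });
    for (int k = 0; k < n_records; ++k) file.append("rec" + std::to_string(k));
    file.close();
    consumer.join();
    return processed;
}
```

In a batch execution the consumer would only start after `close()`; here the wait condition is per record, which is the kind of synchronisation semantics that CAPIO lets users attach to workflow data dependencies.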

CAPIO is free software, available on GitHub (https://github.com/High-Performance-IO/capio) under the LGPLv3 license.

FastFlow

FastFlow (斋戒流) is a C++ parallel programming framework advocating high-level, pattern-based parallel programming. It chiefly supports streaming and data parallelism, targeting heterogeneous platforms composed of clusters of shared-memory platforms, possibly equipped with computing accelerators such as NVIDIA GPGPUs, Xeon Phi, and Tilera TILE64.
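Data parallelism, one of the two styles FastFlow chiefly supports, boils down to applying the same function to independent elements of a collection. The sketch below approximates a parallel map with plain standard C++ threads; it is not FastFlow's API (FastFlow provides its own high-level data-parallel patterns), and the `parallel_map` name and round-robin partitioning are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Parallel map (sketch): worker w processes elements w, w + n_workers,
// w + 2 * n_workers, ... and writes each result at the element's own index,
// so the output preserves input order. `f` must be safe to call concurrently.
std::vector<long> parallel_map(const std::vector<long>& input,
                               const std::function<long(long)>& f,
                               std::size_t n_workers) {
    assert(n_workers > 0);
    std::vector<long> results(input.size());
    std::vector<std::thread> workers;
    workers.reserve(n_workers);
    for (std::size_t w = 0; w < n_workers; ++w) {
        workers.emplace_back([&input, &results, &f, w, n_workers] {
            for (std::size_t i = w; i < input.size(); i += n_workers)
                results[i] = f(input[i]);  // distinct indices: no data race
        });
    }
    for (std::thread& t : workers) t.join();  // wait for every partition
    return results;
}
```

A pattern-based framework hides exactly this boilerplate (thread creation, partitioning, joining) behind a single construct, letting the programmer supply only `f`.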

To date, FastFlow has been the background technology of three European projects and one national project, for an aggregate total cost of 12M € (ParaPhrase FP7, REPARA FP7, RePhrase H2020, and IMPACT; see the projects section). We are still actively developing FastFlow along with its underlying technology, and we are open to turning new challenges into research and innovation. More details can be found on the main FastFlow website.

FastFlow comes as a C++ template library designed as a stack of layers that progressively abstract the programming of parallel applications. The goal of the stack is threefold: portability, extensibility, and performance. To this end, all three layers are realized as thin strata of C++ templates that are 1) seamlessly portable, 2) easily extended via subclassing, and 3) statically compiled and cross-optimized with the application. This terse design makes FastFlow easy to port to almost any OS and CPU with a C++ compiler.
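FastFlow's lowest layer is built around lock-free single-producer/single-consumer (SPSC) queues, on top of which the higher-level patterns are stacked. The class below is a minimal bounded SPSC ring buffer written with `std::atomic`, meant only to illustrate the kind of thin, statically compiled template such a layer consists of; it is not FastFlow's implementation, and the `SpscRing` name and fixed-capacity design are assumptions of this sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>

// Bounded single-producer/single-consumer ring buffer (sketch).
// Exactly one thread may call push() and exactly one thread may call pop().
// One slot is always left empty to distinguish "full" from "empty".
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2, "need at least one usable slot");

public:
    // Producer: returns false if the queue is full.
    bool push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;  // full
        buf_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the element
        return true;
    }
    // Consumer: returns std::nullopt if the queue is empty.
    std::optional<T> pop() {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return std::nullopt;  // empty
        T value = buf_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);  // free the slot
        return value;
    }

private:
    T buf_[Capacity];
    std::atomic<std::size_t> head_{0};  // next slot to read (advanced by the consumer)
    std::atomic<std::size_t> tail_{0};  // next slot to write (advanced by the producer)
};
```

Because each index is written by only one thread, no locks or read-modify-write atomics are needed: the release store of one side paired with the acquire load of the other is enough to hand elements over safely.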

Publications

2012

Marco Aldinucci, Marco Danelutto, Massimo Torquati

FastFlow tutorial (Technical Report)

Università di Pisa, Dipartimento di Informatica, Italy no. TR-12-04, 2012.

Marco Aldinucci, Marco Danelutto, Lorenzo Anardu, Massimo Torquati, Peter Kilpatrick

Parallel patterns + Macro Data Flow for multi-core programming (Proceedings Article)

In: Proc. of Intl. Euromicro PDP 2012: Parallel Distributed and network-based Processing, pp. 27–36, IEEE, Garching, Germany, 2012.

Marco Aldinucci, Mario Coppo, Ferruccio Damiani, Maurizio Drocco, Eva Sciacca, Salvatore Spinella, Massimo Torquati, Angelo Troina

On Parallelizing On-Line Statistics for Stochastic Biological Simulations (Proceedings Article)

In: Alexander, Michael, D'Ambra, Pasqua, Belloum, Adam, Bosilca, George, Cannataro, Mario, Danelutto, Marco, Martino, Beniamino Di, Gerndt, Michael, Jeannot, Emmanuel, Namyst, Raymond, Roman, Jean, Scott, Stephen L., Träff, Jesper Larsson, Vallée, Geoffroy, Weidendorfer, Josef (Ed.): Proc. of Euro-Par Workshops: 2nd Workshop on High Performance Bioinformatics and Biomedicine (HiBB), pp. 3–12, Springer, Bordeaux, France, 2012.

Fabio Tordini, Marco Aldinucci, Massimo Torquati

High-level lock-less programming for multicore (Proceedings Article)

In: Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) – Poster Abstracts, HiPEAC, Fiuggi, Italy, 2012, ISBN: 9789038219875.

2011

Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, Massimiliano Meneghin, Massimo Torquati

Accelerating code on multi-cores with FastFlow (Proceedings Article)

In: Jeannot, E., Namyst, R., Roman, J. (Ed.): Proc. of 17th Intl. Euro-Par 2011 Parallel Processing, pp. 170–181, Springer, Bordeaux, France, 2011.

Marco Aldinucci, Maurizio Drocco, Daniela Giordano, Concetto Spampinato, Massimo Torquati

A Parallel Edge Preserving Algorithm for Salt and Pepper Image Denoising (Technical Report)

Università degli Studi di Torino, Dip. di Informatica, Italy no. 138/2011, 2011.

Marco Aldinucci, Salvatore Ruggieri, Massimo Torquati

Porting Decision Tree Building and Pruning Algorithms to Multicore using FastFlow (Technical Report)

Università di Pisa, Dipartimento di Informatica, Italy no. TR-11-06, 2011.

Marco Aldinucci, Mario Coppo, Ferruccio Damiani, Maurizio Drocco, Massimo Torquati, Angelo Troina

On Designing Multicore-Aware Simulators for Biological Systems (Proceedings Article)

In: Cotronis, Yiannis, Danelutto, Marco, Papadopoulos, George Angelos (Ed.): Proc. of 19th Euromicro Intl. Conference on Parallel Distributed and network-based Processing (PDP), pp. 318–325, IEEE, Ayia Napa, Cyprus, 2011.

2010

Marco Aldinucci, Mario Coppo, Ferruccio Damiani, Maurizio Drocco, Massimo Torquati, Angelo Troina

On Designing Multicore-Aware Simulators for Biological Systems (Technical Report)

Università degli Studi di Torino, Dipartimento di Informatica, Italy no. 131/2010, 2010.

Marco Aldinucci, Salvatore Ruggieri, Massimo Torquati

Porting Decision Tree Algorithms to Multicore using FastFlow (Proceedings Article)

In: Balcázar, José L., Bonchi, Francesco, Gionis, Aristides, Sebag, Michèle (Ed.): Proc. of European Conference in Machine Learning and Knowledge Discovery in Databases (ECML PKDD), pp. 7–23, Springer, Barcelona, Spain, 2010.

Marco Aldinucci, Andrea Bracciali, Pietro Liò, Anil Sorathiya, Massimo Torquati

StochKit-FF: Efficient Systems Biology on Multicore Architectures (Technical Report)

Università di Pisa, Dipartimento di Informatica, Italy no. TR-10-12, 2010.

Marco Aldinucci, Andrea Bracciali, Pietro Liò

Formal Synthetic Immunology (Journal Article)

In: ERCIM News, vol. 82, pp. 40–41, 2010, ISSN: 0926-4981.

Marco Aldinucci, Salvatore Ruggieri, Massimo Torquati

Porting Decision Tree Algorithms to Multicore using FastFlow (Technical Report)

Università di Pisa, Dipartimento di Informatica, Italy no. TR-10-11, 2010.

Marco Aldinucci, Massimiliano Meneghin, Massimo Torquati

Efficient Smith-Waterman on multi-core with FastFlow (Proceedings Article)

In: Danelutto, Marco, Gross, Tom, Bourgeois, Julien (Ed.): Proc. of Intl. Euromicro PDP 2010: Parallel Distributed and network-based Processing, pp. 195–199, IEEE, Pisa, Italy, 2010.

Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, Massimiliano Meneghin, Massimo Torquati

Accelerating sequential programs using FastFlow and self-offloading (Technical Report)

Università di Pisa, Dipartimento di Informatica, Italy no. TR-10-03, 2010.

Marco Aldinucci

Efficient Parallel MonteCarlo with FastFlow (Book Section)

In: HPC-Europa2: Science and Supercomputing in Europe, research highlights 2010, Cineca, 2010.

Marco Aldinucci, Marco Danelutto, Massimiliano Meneghin, Massimo Torquati, Peter Kilpatrick

Efficient streaming applications on multi-core with FastFlow: The biosequence alignment test-bed (Book Chapter)

In: vol. 19, pp. 273–280, Elsevier, 2010.

2009

Marco Aldinucci, Massimo Torquati, Massimiliano Meneghin

FastFlow: Efficient Parallel Streaming Applications on Multi-core (Technical Report)

Università di Pisa, Dipartimento di Informatica, Italy no. TR-09-12, 2009.

Discontinued Parallel Computing tools

Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations
M. Drocco, “Parallel programming with global asynchronous memory: models, C++ APIs and implementations,” PhD Thesis, 2017. doi:10.5281/zenodo.1037585

PiCo (Pipeline Composition) is an open-source C++11 header-only DSL for high-performance data analytics, featuring low latency, high throughput, and minimal memory footprint on multi-core platforms. For more information see the PiCo paper.

The full software package supporting the development of distributed and multi-core applications based on autonomic components and behavioural skeletons is available under the GPL license. More information can be found on the GridCOMP page. The Grid Component Model (GCM) has been standardised by ETSI: DTS/GRID-0004-1 (27/08/2008), DTS/GRID-004-2 (27/08/2008), DTS/GRID-0004-3 (20/03/2009), DTS/GRID-0004-4 (24/03/2010).

Muskel is a parallel programming library providing users with structured parallel constructs (skeletons) that can be used to implement efficient parallel applications. Muskel applications run on networks/clusters of workstations equipped with Java (1.5 or greater). The skeletons are implemented by exploiting macro data flow technology. Muskel extends Lithium with several new features, most notably adaptive and autonomic ones.

AD-HOC (Adaptive Distributed Herd of Object Caches) is a fast and robust distributed object repository. It provides applications with a distributed storage manager that virtualises the memories of a set of PCs into a single, common distributed storage space. AD-HOC can effectively be used to implement DSMs as well as distributed cache subsystems, acting as a high-performance distributed shared memory server for clusters and grids. AD-HOC is a building block enabling the development of shared-memory run-time supports and applications for dynamic and unreliable execution environments (C++, GPL). The libraries and applications developed on top of AD-HOC include:

  • a parallel file system exposing the same API as PVFS, with better performance;
  • a distributed cache that can be plugged into the Apache web server with no modifications to the Apache code; the cache substantially improves web server farm performance at no additional cost;
  • a Distributed Shared Memory (DSM) for ASSIST.

ASSIST (A Software development System based on Integrated Skeleton Technology) is a parallel programming environment based on skeleton and coordination language technology, aimed at the development of distributed high-performance applications. ASSIST applications are compiled into binary packages that can be deployed and run on grids, including those exhibiting heterogeneous platforms. Deployment and execution are handled by standard middleware services (e.g. Globus) enriched with the ASSIST run-time support. ASSIST applications are described by means of a coordination language, which can express arbitrary graphs of modules interconnected by typed streams of data. For more information see the ASSIST papers.

Lithium is a Java-based parallel programming library providing users with structured parallel constructs (patterns/skeletons) that can be used to implement efficient parallel applications on clusters. The skeletons (including pipe, farm, map, reduce, loop) are implemented exploiting macro data flow technology. Lithium skeletons admit a formal specification of both functional and extra-functional behaviour.

Eskimo (Easy SKeleton Interface – Memory Oriented), which was part of my PhD dissertation, is a first (maybe a bit naive) attempt to bring skeletal/pattern-based programming to the shared-memory model. To my knowledge, there had been no previous experiments, since skeletal programming had lived exclusively in the message-passing arena. From a certain viewpoint, it can be considered an ancestor of FastFlow (and of other libraries in this class, such as Intel TBB).

META is a toolkit for the source-to-source optimisation of pattern-based/skeletal parallel programs (OCaml, GPL). It includes a quite efficient subtree-matching implementation.

SkIE (Skeleton-based Integrated Environment) is a skeleton-based parallel programming environment. SkIE was an engineered version of P3L developed within Quadrics Supercomputing World (QSW) and Alenia Aerospace. At QSW, I designed and developed part of the compiler back-end.