Parallel Computing Tools – Marco Aldinucci

Parallel Computing tools

FastFlow (2009-now)


FastFlow (斋戒流) is a C++ parallel programming framework advocating high-level, pattern-based parallel programming. It chiefly supports streaming and data parallelism, targeting heterogeneous platforms composed of clusters of shared-memory machines, possibly equipped with computing accelerators such as NVIDIA GPGPUs, Xeon Phi, and Tilera TILE64.

The main design philosophy of FastFlow is to provide application designers with key features for parallel programming (e.g. time-to-market, portability, efficiency and performance portability) via suitable parallel programming abstractions and a carefully designed run-time support.

FastFlow comes as a C++ template library designed as a stack of layers that progressively abstracts out the programming of parallel applications. The goal of the stack is threefold: portability, extensibility, and performance. For this, all three layers are realised as thin strata of C++ templates that are 1) seamlessly portable; 2) easily extended via subclassing; and 3) statically compiled and cross-optimised with the application. The terse design ensures easy portability to almost all OSes and CPUs with a C++ compiler. The main development platform is Linux/x86_64/gcc, but it has also been tested on various combinations of x86, x86_64, PPC, ARM, Tilera, and NVIDIA with gcc, icc, and Visual Studio on Linux, Mac OS, and Windows XP/7. The FastFlow core has been ported to ARM/iOS.
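To make the idea of "thin strata of C++ templates" more concrete, here is a minimal sketch of how a two-stage pipeline can be composed statically and extended via subclassing. It is not FastFlow code: the names Pipe, Counting, AddOne, Twice and run_pipeline are hypothetical and chosen only for illustration, and the stages run sequentially in the calling thread.

```cpp
#include <cassert>

// Hypothetical user-defined stages (illustration only, not FastFlow classes).
struct AddOne { int operator()(int x) const { return x + 1; } };
struct Twice  { int operator()(int x) const { return 2 * x; } };

// Extension via subclassing: wraps any stage and counts how often it is called.
template <typename Stage>
struct Counting : Stage {
    mutable long calls = 0;
    template <typename T>
    auto operator()(T x) const { ++calls; return Stage::operator()(x); }
};

// Two-stage pipeline. The stage types are template parameters, so the whole
// composition is known at compile time and can be inlined with the application.
template <typename First, typename Second>
struct Pipe {
    First first;
    Second second;
    template <typename T>
    auto operator()(T x) const { return second(first(x)); }
};

// Pushes one value through AddOne followed by Twice and returns the result.
int run_pipeline(int x) {
    Pipe<Counting<AddOne>, Twice> p{};
    const int y = p(x);
    assert(p.first.calls == 1);  // the first stage ran exactly once
    return y;
}
```

Because every stage type is visible to the compiler, no virtual dispatch is needed between stages; this is one plausible reading of "statically compiled and cross-optimised", not a description of the library's internals.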

The FastFlow run-time support uses several techniques to efficiently support fine-grained parallelism (and very high-frequency streaming). Among these are:

  • non-blocking multi-threading with lock-less synchronisations;
  • zero-copy network messaging (via 0MQ/TCP and RDMA/Infiniband);
  • asynchronous data feeding for accelerator offloading.
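As an illustration of the first technique in the list above, the following is a minimal sketch of a bounded single-producer/single-consumer queue that synchronises two threads with atomic loads and stores only, with no mutex. It is a generic textbook-style ring buffer under stated assumptions, not FastFlow's own queue; SpscQueue and stream_sum are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical bounded SPSC queue (illustration only, not FastFlow's code).
// Exactly one thread may call try_push and exactly one may call try_pop.
template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert(Capacity >= 2, "one slot stays empty to tell full from empty");
public:
    // Producer side: returns false when the queue is full.
    bool try_push(const T& value) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (tail + 1) % Capacity;
        if (next == head_.load(std::memory_order_acquire)) return false;
        buffer_[tail] = value;
        tail_.store(next, std::memory_order_release);  // publish the new item
        return true;
    }
    // Consumer side: returns false when the queue is empty.
    bool try_pop(T& out) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head == tail_.load(std::memory_order_acquire)) return false;
        out = buffer_[head];
        head_.store((head + 1) % Capacity, std::memory_order_release);  // free the slot
        return true;
    }
private:
    T buffer_[Capacity]{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Streams the integers 1..n from a producer thread to the caller; returns their sum.
long long stream_sum(int n) {
    SpscQueue<int, 1024> queue;
    std::thread producer([&queue, n] {
        for (int i = 1; i <= n; ++i)
            while (!queue.try_push(i)) std::this_thread::yield();  // retry, never lock
    });
    long long sum = 0;
    int item = 0;
    for (int received = 0; received < n;) {
        if (queue.try_pop(item)) { sum += item; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

Each index is written by one thread only, so a release store paired with an acquire load on the other side is enough to hand items over. A production queue would also pad head_ and tail_ onto separate cache lines to avoid false sharing.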

FastFlow has been adopted by a number of research projects and third-party development initiatives, and has thus been tested in a variety of application scenarios: from systems biology to high-frequency trading.

More details are available on the FastFlow website.

Discontinued Parallel Computing tools

GridCOMP GCM/ProActive (2006)

The full software package supporting the development of distributed and multi-core applications based on autonomic components and behavioural skeletons is available under the GPL license.

Gzipped Tarball (gridcomp_P4_b2_21_01_09.tgz)

VirtuaLinux (2006)

VirtuaLinux is a Linux meta-distribution that allows the creation, deployment and administration of virtualized clusters with no single point of failure. The VirtuaLinux architecture supports disk-less configurations and provides an efficient, iSCSI-based abstraction of the SAN. Clusters running VirtuaLinux have no master node, which boosts resilience and flexibility. Thanks to its storage virtualisation layer, VirtuaLinux was able to deploy hundreds of VMs in a few seconds. In effect, VirtuaLinux realises a cloud, although the word "cloud" in its current meaning did not exist in 2006.

VirtuaLinux-1.0.7-release.iso (multi-tier – 1.9 GB). See also the project website on SourceForge.

Muskel (2005)

Muskel is a parallel programming library providing users with structured parallel constructs (skeletons) that can be used to implement efficient parallel applications. Muskel applications run on networks/clusters of workstations equipped with Java (1.5 or greater). The skeletons are implemented exploiting macro data flow technology. Muskel extends Lithium with several features, most notably adaptive and autonomic ones.

Ad-HOC (2004)

Ad-HOC (Adaptive Distributed Herd of Object Caches) is a fast and robust distributed object repository. It provides applications with a distributed storage manager that virtualises the memories of several PCs into a single common distributed storage space. Ad-HOC can effectively be used to implement DSMs as well as distributed cache subsystems; it acts as a high-performance distributed shared memory server for clusters and grids. Ad-HOC is a basic building block enabling the development of shared-memory run-time supports and applications for dynamic and unreliable execution environments (C++, GPL). The libraries and applications developed on top of Ad-HOC include:

  • a parallel file system exposing the same API as PVFS and delivering better performance;
  • a distributed cache that can be plugged into the Apache web server with no modifications to the Apache code. The cache substantially improves web server farm performance at no additional cost;
  • a Distributed Shared Memory (DSM) for ASSIST.

ASSIST (2003)

ASSIST (A Software development System based on Integrated Skeleton Technology) is a parallel programming environment based on skeleton and coordination language technology, aimed at the development of distributed high-performance applications. ASSIST applications are compiled into binary packages that can be deployed and run on grids, including those with heterogeneous platforms. Deployment and execution are provided through standard middleware services (e.g. Globus) enriched with the ASSIST run-time support. ASSIST applications are described by means of a coordination language, which can express arbitrary graphs of modules interconnected by typed streams of data. For more information see the ASSIST papers.

Lithium (2002)

Lithium is a Java-based parallel programming library providing users with structured parallel constructs (patterns/skeletons) that can be used to implement efficient parallel applications on clusters. The skeletons (including pipe, farm, map, reduce, loop) are implemented exploiting macro data flow technology. Lithium skeletons admit a formal specification of both their functional and extra-functional behaviour.
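The functional behaviour of skeletons such as map and reduce can be stated as ordinary higher-order functions. The sketch below does so in C++ (not in Java, and not with Lithium's API or its macro data flow implementation); map_skeleton, reduce_skeleton and sum_of_squares are hypothetical names, the element type is fixed to int for brevity, and workers is assumed to be at least 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// map: applies f to every element independently. The input is split into at
// most `workers` contiguous chunks, each processed by its own asynchronous task.
template <typename F>
std::vector<int> map_skeleton(F f, const std::vector<int>& xs, std::size_t workers = 4) {
    std::vector<int> out(xs.size());
    std::vector<std::future<void>> tasks;
    const std::size_t chunk = (xs.size() + workers - 1) / workers;
    for (std::size_t begin = 0; begin < xs.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, xs.size());
        tasks.push_back(std::async(std::launch::async, [&, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(xs[i]);
        }));
    }
    for (auto& t : tasks) t.get();  // wait for all chunks to finish
    return out;
}

// reduce: folds the elements with a binary operator (sequentially in this sketch).
template <typename Op>
int reduce_skeleton(Op op, int init, const std::vector<int>& xs) {
    return std::accumulate(xs.begin(), xs.end(), init, op);
}

// Composition of the two skeletons: map squares each element, reduce adds them up.
int sum_of_squares(const std::vector<int>& xs) {
    const std::vector<int> squares = map_skeleton([](int x) { return x * x; }, xs);
    return reduce_skeleton([](int a, int b) { return a + b; }, 0, squares);
}
```

The result of map_skeleton does not depend on the number of workers, which is the sense in which the functional specification is separated from extra-functional concerns such as parallelism degree.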

Eskimo (2002)

Eskimo (Easy SKeleton Interface – Memory Oriented), which was part of my PhD dissertation, is a first (maybe a bit naive) attempt to bring skeletal/pattern-based programming to the shared-memory model. To my knowledge, there were no previous experiments, since skeletal programming lived exclusively in the message-passing arena. From a certain viewpoint, it can be considered an ancestor of FastFlow (and other libraries in this class, such as Intel TBB).

Meta (2000)

META is a toolkit for the source-to-source optimisation of pattern-based/skeletal parallel programs (OCaml, GPL). It includes a quite efficient subtree-matching implementation.

SkIE (1998)

SkIE (Skeleton-based Integrated Environment) is a skeleton-based parallel programming environment. SkIE was an engineered version of P3L developed within Quadrics Supercomputing World (QSW) and Alenia Aerospace. Within QSW, I designed and developed part of the compiler back-end.

P3L (1992)

P3L: Pisa Parallel Programming Language. If you think that Google MapReduce is an original idea, take a look at the P3L paper. I did not directly participate in the design; I was a student at that time …