Parallel programming frameworks and development tools

FastFlow

FastFlow (斋戒流) is a C++ parallel programming framework advocating high-level, pattern-based parallel programming. It chiefly supports streaming and data parallelism, targeting heterogeneous platforms composed of clusters of shared-memory platforms, possibly equipped with computing accelerators such as NVidia GPGPUs, Xeon Phi, and Tilera TILE64.

The main design philosophy of FastFlow is to provide application designers with key features for parallel programming (e.g. time-to-market, portability, efficiency and performance portability) via suitable parallel programming abstractions and a carefully designed run-time support.

FastFlow comes as a C++ template library designed as a stack of layers that progressively abstracts out the programming of parallel applications. The goal of the stack is threefold: portability, extensibility, and performance. To this end, all three layers are realised as thin strata of C++ templates that are 1) seamlessly portable; 2) easily extended via subclassing; and 3) statically compiled and cross-optimised with the application. The terse design ensures easy portability to almost all OSes and CPUs with a C++ compiler. The main development platform is Linux/x86_64/gcc, but it has also been tested on various combinations of x86, x86_64, PPC, ARM, Tilera, and NVidia with gcc, icc, and Visual Studio on Linux, Mac OS, and Windows XP/7. The FastFlow core has been ported to ARM/iOS.

The FastFlow run-time support uses several techniques to efficiently support fine-grained parallelism (and very high-frequency streaming). Among these are:

  • non-blocking multi-threading with lock-less synchronisation;
  • zero-copy network messaging (via 0MQ/TCP and RDMA/Infiniband);
  • asynchronous data feeding for accelerator offloading.
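
To make the first technique in the list above concrete, here is a minimal sketch of lock-less synchronisation between two threads, written in standard C++ only. It is not FastFlow's own queue implementation: the names SpscRing and stream_sum are hypothetical, and the design (a bounded single-producer/single-consumer ring buffer driven by two atomic indices) is an assumption chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration, not FastFlow's queue: a bounded
// single-producer/single-consumer ring buffer synchronised only
// through two atomic indices (no mutex, no condition variable).
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity)
        : buf_(capacity + 1), head_(0), tail_(0) {
        assert(capacity > 0);
    }

    // Called by the producer thread only. Returns false when full.
    bool try_push(const T& v) {
        const std::size_t t = tail_.load(std::memory_order_relaxed);
        const std::size_t next = (t + 1) % buf_.size();
        if (next == head_.load(std::memory_order_acquire)) return false;
        buf_[t] = v;                                   // write the slot first
        tail_.store(next, std::memory_order_release);  // then publish it
        return true;
    }

    // Called by the consumer thread only. Returns false when empty.
    bool try_pop(T& out) {
        const std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;
        out = buf_[h];                                 // read the slot first
        head_.store((h + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<T> buf_;             // one slot is kept empty to tell full from empty
    std::atomic<std::size_t> head_;  // next slot to read
    std::atomic<std::size_t> tail_;  // next slot to write
};

// Streams the integers 1..n from a producer thread to the calling
// thread and returns their sum; both sides retry instead of blocking.
long stream_sum(int n) {
    SpscRing<int> q(64);
    std::thread producer([&q, n] {
        for (int i = 1; i <= n; ++i)
            while (!q.try_push(i)) std::this_thread::yield();
    });
    long sum = 0;
    int v = 0;
    for (int received = 0; received < n; ) {
        if (q.try_pop(v)) { sum += v; ++received; }
        else std::this_thread::yield();
    }
    producer.join();
    return sum;
}
```

Because each index is written by exactly one thread, a release store paired with an acquire load is enough to hand a slot from one side to the other. Neither thread ever waits on a lock, which is what keeps the per-item cost low for fine-grained streams.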

FastFlow has been adopted by a number of research projects and third-party development initiatives, and has thus been tested in a variety of application scenarios: from systems biology to high-frequency trading.

FastFlow funding

To date, FastFlow has been the background technology of three European projects and one national project, for an aggregate total cost of 12M € (ParaPhrase FP7, REPARA FP7, Rephrase H2020, and IMPACT; see the projects section). We are still actively developing FastFlow along with its underlying technology, and we are open to new challenges in research and innovation. More details can be found on the main FastFlow website.

Applications developed with FastFlow

FastFlow is a (pattern-based) parallel programming framework, not an end-user application. However, ease of development and the efficiency of the resulting parallel applications are the ultimate quality metrics for a programming environment. The FastFlow source code includes over one hundred micro-benchmarks and several complete applications, some developed in house and others contributed by the open source community. We keep adding benchmarks and applications in order to test every new feature against stable code (micro-benchmarks and applications are run every night). Some of the applications are of interest in their own right, such as:

  • The CWC simulator for systems biology. Parallel Gillespie simulation pipelined with on-line stream-based data mining and statistics.
  • Bowtie-FF, the Bowtie2 tool made faster with FastFlow: up to twice as fast as the original in some cases (included in the FastFlow tarball).
  • Two-phase video restoration. The paradigmatic video-filtering application transparently implemented on multiple GPUs by way of the stencil-reduce pattern on top of both CUDA and OpenCL. Specifically, the restoration employs a very powerful variational approach. The application was presented at NVidia GTC 2014.
  • The yadt classifier (Parallel C4.5). Tree building and pruning (management of dynamic data structures and irregular load balancing).
  • Peafowl, a flexible and extensible Deep Packet Inspection (DPI) framework.
  • The Performance Enhancement Infrastructure (PEI), an autonomic computing framework supporting heterogeneous (CPU/GPU) platforms, fully implemented with FastFlow.
  • PiCo (Pipeline Composition) is a Domain Specific Language for creating Data Analytics Pipelines.
    The main entity in PiCo is the Pipeline, basically a DAG-composition of processing elements represented by Operators, such as the map or the reduce operators.
    This model is intended to give the user a single interface for both stream and batch processing, completely hiding data management and focusing only on operations. The DSL is entirely implemented in C++11, exploiting the FastFlow library as runtime.
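
The idea of composing map and reduce operators into a pipeline can be sketched in a few lines of standard C++. This is not the PiCo API: the class Pipeline, its methods map, reduce and run, and the helper sum_of_squares are all hypothetical names. The sketch is also sequential and composes a linear chain rather than a general DAG, so it shows only the composition model, not the parallel runtime.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the PiCo API.
// A Pipeline is an ordered list of operators, each one a function
// from an input batch to an output batch.
template <typename T>
class Pipeline {
public:
    using Batch = std::vector<T>;
    using Operator = std::function<Batch(const Batch&)>;

    // Appends a map operator: applies f to each element independently.
    Pipeline& map(std::function<T(const T&)> f) {
        ops_.push_back([f](const Batch& in) -> Batch {
            Batch out;
            out.reserve(in.size());
            for (const T& x : in) out.push_back(f(x));
            return out;
        });
        return *this;
    }

    // Appends a reduce operator: folds the batch into a single element,
    // starting from init.
    Pipeline& reduce(std::function<T(const T&, const T&)> op, T init) {
        ops_.push_back([op, init](const Batch& in) -> Batch {
            T acc = init;
            for (const T& x : in) acc = op(acc, x);
            return Batch{acc};
        });
        return *this;
    }

    // Runs the operators in the order they were added.
    Batch run(Batch data) const {
        for (const Operator& op : ops_) data = op(data);
        return data;
    }

private:
    std::vector<Operator> ops_;
};

// Example: square every element, then sum the squares.
int sum_of_squares(const std::vector<int>& xs) {
    Pipeline<int> p;
    p.map([](const int& x) { return x * x; })
     .reduce([](const int& a, const int& b) { return a + b; }, 0);
    return p.run(xs).front();
}
```

The user code in sum_of_squares states only which operations to apply and in which order. How the batch is stored and moved between operators stays inside Pipeline, which is the separation the text describes.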

Research papers, tutorial, reference manual

PiCo: Pipeline Composition (2017-now) TBD

FastFlow (2009-now)

More details are available on the FastFlow website.