Maurizio got his PhD and is now a postdoctoral researcher at Pacific Northwest National Laboratory (PNNL)

Maurizio Drocco got his PhD in Computer Science at the University of Torino in October 2017, defending his thesis entitled “Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations”, and tomorrow he will start a new adventure as a postdoctoral researcher at Pacific Northwest National Laboratory (PNNL), WA, USA.

Congratulations, Maurizio. It has been a pleasure working with you for the past four years.

M. Drocco, “Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations,” PhD thesis, University of Torino, 2017. doi:10.5281/zenodo.1037585

Viva snapshot

Viva committee: Prof. Jose Daniel Garcia Sanchez (UC3M, Madrid, Spain), Prof. Marco Danelutto Damiani (Università di Pisa, Italy), Prof. Siegfried Benkner (University of Vienna, Austria).

Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations

Drocco, Maurizio

Thesis supervisor

Aldinucci, Marco

In the realm of High-Performance Computing (HPC), message passing has been the programming paradigm of choice for over twenty years. The durable MPI (Message Passing Interface) standard, with its send/receive communication and its broadcast, gather/scatter, and reduction collectives, is still used to construct parallel programs in which each communication is orchestrated by the developer based on precise knowledge of data distribution and overheads; collective communications simplify the orchestration but may induce excessive synchronization.
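As a point of reference (not taken from the thesis), a minimal MPI reduction in C++ illustrates the explicit orchestration the abstract refers to: every rank must reach the collective call, which acts as an implicit synchronization point among the participants.

// Minimal MPI sketch (C API used from C++): a sum reduction across ranks.
// Every rank must call MPI_Reduce; the collective implicitly couples the
// participants, which is the kind of synchronization the abstract alludes to.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int local = rank + 1;   // each rank contributes its own value
    int global = 0;

    // Sum the local contributions into rank 0.
    MPI_Reduce(&local, &global, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("sum over %d ranks = %d\n", size, global);

    MPI_Finalize();
    return 0;
}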

Early attempts to bring the shared-memory programming model, with its programming advantages, to distributed computing, referred to as the Distributed Shared Memory (DSM) model, faded away; one of the main issues was combining performance and programmability with the memory consistency model. The recently proposed Partitioned Global Address Space (PGAS) model is a modern revamp of DSM that exposes data placement to enable locality-based optimizations, but it still addresses only (simple) data parallelism and relies on expensive sharing protocols.

We advocate an alternative programming model for distributed computing based on a Global Asynchronous Memory (GAM), aiming to avoid coherency and consistency problems rather than solving them. We materialize GAM by designing and implementing a distributed smart pointers library, inspired by C++ smart pointers. In this model, public and private pointers (resembling C++ shared and unique pointers, respectively) are moved around instead of messages (i.e., data), thus relieving the user of the burden of minimizing transfers. On top of smart pointers, we propose a high-level C++ template library for writing applications in terms of dataflow-like networks, namely GAM nets, consisting of stateful processors exchanging pointers in a fully asynchronous fashion.
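To make the analogy concrete, here is a minimal single-process C++ sketch. It is not the thesis library: the Frame type, the processor functions, and the use of std::unique_ptr / std::shared_ptr are illustrative assumptions standing in for the private and public pointers described above. The point it shows is that ownership, not data, is what moves between stateful processors: a private pointer is moved, transferring exclusive access, while a public pointer is copied, sharing read-only access.

// Illustrative sketch only: standard C++ smart pointers standing in for the
// GAM-style "private" (unique) and "public" (shared) pointers described above.
#include <memory>
#include <vector>
#include <iostream>

struct Frame { std::vector<int> pixels; };

// A stateful "processor" that takes exclusive ownership of a frame:
// the caller moves its unique_ptr, so no copy of the data is made.
void process_private(std::unique_ptr<Frame> f) {
    for (auto &p : f->pixels) p += 1;          // exclusive access: safe to mutate
    std::cout << "private frame processed, first pixel = " << f->pixels[0] << "\n";
}   // frame destroyed here: ownership was transferred to this processor

// A "processor" that receives a public (shared) frame: many consumers may
// hold the same pointer, so it only reads.
void process_public(std::shared_ptr<const Frame> f) {
    std::cout << "public frame observed, size = " << f->pixels.size() << "\n";
}

int main() {
    auto priv = std::make_unique<Frame>(Frame{{0, 1, 2, 3}});
    process_private(std::move(priv));          // the pointer moves, the data does not

    auto pub = std::make_shared<const Frame>(Frame{{4, 5, 6}});
    process_public(pub);                       // the pointer is copied, the data is shared
    process_public(pub);
    return 0;
}

In the actual GAM setting such pointers would refer to remotely allocated memory, but the ownership discipline sketched here, move for exclusive access and copy for shared read access, is the idea the abstract describes.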

We demonstrate the validity of the proposed approach, from the expressiveness perspective, by showing how GAM nets can be exploited to implement both standalone applications and higher-level parallel programming models, such as data and task parallelism. From the performance perspective, preliminary experiments show both close-to-ideal scalability and negligible overhead with respect to state-of-the-art benchmark implementations. For instance, the GAM implementation of a high-quality video restoration filter sustains a 100 fps throughput over 70%-noisy high-quality video streams on a 4-node cluster of Graphics Processing Units (GPUs), with minimal programming effort.


About Marco Aldinucci

Marco Aldinucci has been an assistant professor at the Computer Science Department of the University of Torino since 2008. Previously, he was a researcher at the University of Pisa and at the Italian National Research Agency. He is the author of over a hundred papers in international journals and conference proceedings (Google Scholar h-index 21). He has participated in over 20 national and international research projects on parallel and autonomic computing. He is the recipient of the HPC Advisory Council University Award 2011 and the NVidia Research Award 2013. He has led the “Low-Level Virtualization and Platform-Specific Deployment” work package within the EU FP7 STREP ParaPhrase (Parallel Patterns for Adaptive Heterogeneous Multicore Systems) project and the GPGPU work package within the IMPACT (Innovative Methods for Particle Colliders at the Terascale) project, and he is the contact person for the University of Torino in the European Network of Excellence on High Performance and Embedded Architecture and Compilation. In the last year (March 2012 – March 2013) he delivered 5 invited talks at international workshops. Together with Massimo Torquati, he co-designed the FastFlow programming framework, as well as several other programming frameworks and libraries for parallel computing. His research focuses on parallel and distributed computing.