Parallel processing papers by year

2018

  • P. Viviani, M. Aldinucci, R. d’Ippolito, J. Lemeire, and D. Vucinic, “A Flexible Numerical Framework for Engineering—A Response Surface Modelling Application,” in Improved Performance of Materials: Design and Experimental Approaches, A. Öchsner and H. Altenbach, Eds., Cham: Springer International Publishing, 2018, pp. 93-106. doi:10.1007/978-3-319-59590-0_9
    [BibTeX] [Abstract] [Download PDF]

    This work presents an innovative approach adopted for the development of a new numerical software framework for accelerating dense linear algebra calculations and its application within an engineering context. In particular, response surface models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different response surfaces has to be calculated in order to improve the overall accuracy (e.g. like when using ensemble modelling techniques). On the other hand, the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, while they can lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications and especially of RSMs calculations with a user-friendly syntax that allows good portability between different hardware architectures, with no need of specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a radial basis function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.

    @inbook{17:viviani:advstruct,
      abstract = {This work presents an innovative approach adopted for the development of a new numerical software framework for accelerating dense linear algebra calculations and its application within an engineering context. In particular, response surface models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different response surfaces has to be calculated in order to improve the overall accuracy (e.g. like when using ensemble modelling techniques). On the other hand, the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, while they can lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications and especially of RSMs calculations with a user-friendly syntax that allows good portability between different hardware architectures, with no need of specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a radial basis function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.},
      address = {Cham},
      author = {Viviani, P. and Aldinucci, M. and d'Ippolito, R. and Lemeire, J. and Vucinic, D.},
      booktitle = {Improved Performance of Materials: Design and Experimental Approaches},
      doi = {10.1007/978-3-319-59590-0_9},
      editor = {{\"O}chsner, Andreas and Altenbach, Holm},
      isbn = {978-3-319-59590-0},
      pages = {93--106},
      publisher = {Springer International Publishing},
      title = {A Flexible Numerical Framework for Engineering---A Response Surface Modelling Application},
      url = {https://doi.org/10.1007/978-3-319-59590-0_9},
      year = {2018},
      bdsk-url-1 = {https://doi.org/10.1007/978-3-319-59590-0_9},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-59590-0_9}
    }

2017

  • M. Drocco, “Parallel Programming with Global Asynchronous Memory: Models, C++ APIs and Implementations,” PhD Thesis, 2017.
    [BibTeX] [Abstract]

    In the realm of High Performance Computing (HPC), message passing has been the programming paradigm of choice for over twenty years. The durable MPI (Message Passing Interface) standard, with send/receive communication, broadcast, gather/scatter, and reduction collectives is still used to construct parallel programs where each communication is orchestrated by the developer-based precise knowledge of data distribution and overheads; collective communications simplify the orchestration but might induce excessive synchronization. Early attempts to bring the shared-memory programming model—with its programming advantages—to distributed computing, referred to as the Distributed Shared Memory (DSM) model, faded away; one of the main issues was to combine performance and programmability with the memory consistency model. The recently proposed Partitioned Global Address Space (PGAS) model is a modern revamp of DSM that exposes data placement to enable optimizations based on locality, but it still addresses (simple) data-parallelism only and it relies on expensive sharing protocols. We advocate an alternative programming model for distributed computing based on a Global Asynchronous Memory (GAM), aiming to avoid coherency and consistency problems rather than solving them. We materialize GAM by designing and implementing a distributed smart pointers library, inspired by C++ smart pointers. In this model, public and private pointers (resembling C++ shared and unique pointers, respectively) are moved around instead of messages (i.e., data), thus alleviating the user from the burden of minimizing transfers. On top of smart pointers, we propose a high-level C++ template library for writing applications in terms of dataflow-like networks, namely GAM nets, consisting of stateful processors exchanging pointers in fully asynchronous fashion. We demonstrate the validity of the proposed approach, from the expressiveness perspective, by showing how GAM nets can be exploited to implement higher-level parallel programming models, such as data and task parallelism. As for the performance perspective, the execution of two non-toy benchmarks on a number of different small-scale HPC clusters exhibits both close-to-ideal scalability and negligible overhead with respect to state-of-the-art benchmark implementations. For instance, the GAM implementation of a high-quality video restoration filter sustains a 100 fps throughput over 70%-noisy high-quality video streams on a 4-node cluster of Graphics Processing Units (GPUs), with minimal programming effort.

    @phdthesis{17:gam:drocco:thesis,
      abstract = {In the realm of High Performance Computing (HPC), message passing 
    has been the programming paradigm of choice for over twenty years.
    The durable MPI (Message Passing Interface) standard, with send/receive 
    communication,
    broadcast, gather/scatter, and reduction collectives is still used to construct 
    parallel programs where each communication is orchestrated by the 
    de\-vel\-oper-based precise knowledge of data distribution and overheads; 
    collective communications simplify the orchestration but might induce excessive 
    synchronization.
    Early attempts to bring shared-memory programming model---with its programming 
    adv\-antages---to distributed computing, referred as the Distributed Shared 
    Memory (DSM) model, faded away; one of the main issue was to combine 
    performance and programmability with the memory consistency model.
    The recently proposed Partitioned Global Address Space (PGAS) model is a modern 
    revamp of DSM that exposes data placement to enable optimizations based on 
    locality, but it still addresses (simple) data-parallelism only and it relies 
    on expensive sharing protocols.
    We advocate an alternative programming model for distributed computing based on 
    a Global Asynchronous Memory (GAM), aiming to \emph{avoid} coherency and 
    consistency problems rather than solving them.
    We materialize GAM by designing and implementing a \emph{distributed smart 
    pointers} library, inspired by C++ smart pointers.
    In this model, public and private pointers (resembling C++ shared and unique 
    pointers, respectively) are moved around instead of messages (i.e., data), thus 
    alleviating the user from the burden of minimizing transfers.
    On top of smart pointers, we propose a high-level C++ template library for 
    writing applications in terms of dataflow-like networks, namely GAM nets, 
    consisting of stateful processors exchanging pointers in fully asynchronous 
    fashion.
    We demonstrate the validity of the proposed approach, from the expressiveness 
    perspective, by showing how GAM nets can be exploited to implement higher-level 
    parallel programming models, such as data and task parallelism.
    As for the performance perspective, the execution of two non-toy benchmarks on 
    a number of different small-scale HPC clusters exhibits both close-to-ideal 
    scalability and negligible overhead with respect to state-of-the-art benchmark 
    implementations.
    For instance, the GAM implementation of a high-quality video restoration filter 
    sustains a 100 fps throughput over 70\%-noisy high-quality video streams on a 
    4-node cluster of Graphics Processing Units (GPUs), with minimal programming 
    effort.},
      author = {Maurizio Drocco},
      keywords = {fastflow, rephrase, toreador, repara, paraphrase},
      month = {October},
      note = {To appear},
      school = {Computer Science Department, University of Torino},
      title = {Parallel Programming with Global Asynchronous Memory: Models, {C++} {API}s and Implementations},
      year = {2017}
    }
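
    The thesis above hinges on moving ownership instead of data: private pointers behave like C++ unique pointers, public ones like shared pointers. The fragment below is a minimal, hypothetical sketch of that idea in plain standard C++; it is not the GAM API (GAM spans distributed nodes, while this toy stays in one process):

    // Hypothetical sketch of the GAM idea (not the actual GAM API):
    // stages exchange ownership of a buffer instead of copying its contents.
    #include <iostream>
    #include <memory>
    #include <vector>

    // A "private pointer" grants exclusive access, like std::unique_ptr;
    // handing it to the next stage moves the capability, not the bytes.
    using PrivateFrame = std::unique_ptr<std::vector<float>>;

    PrivateFrame produce() {
        return std::make_unique<std::vector<float>>(1024, 1.0f);
    }

    PrivateFrame filter(PrivateFrame f) {
        for (auto& x : *f) x *= 2.0f;  // exclusive access: no locks needed
        return f;                      // ownership flows downstream
    }

    int main() {
        auto frame = filter(produce());   // only the pointer moves between stages
        std::cout << (*frame)[0] << '\n'; // prints 2
        return 0;
    }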

  • M. Torquati, G. Mencagli, M. Drocco, M. Aldinucci, T. De Matteis, and M. Danelutto, “On Dynamic Memory Allocation in Sliding-Window Parallel Patterns for Streaming Analytics,” The Journal of Supercomputing, 2017. doi:10.1007/s11227-017-2152-1
    [BibTeX] [Abstract]

    This work studies the issues related to dynamic memory management in Data Stream Processing, an emerging paradigm enabling the real-time processing of live data streams. In this paper we consider two streaming parallel patterns and we discuss different implementation variants related to how dynamic memory is managed. The results show that the standard mechanisms provided by modern C++ are not entirely adequate for maximizing the performance. Instead, the combined use of an efficient general-purpose memory allocator, a custom allocator optimized for the pattern considered and a custom variant of the C++ shared pointer mechanism, provides a performance improvement up to 16% in the best case.

    @article{17:dmadasp:jsupe,
      abstract = {This work studies the issues related to dynamic memory 
      management in Data Stream Processing, an emerging paradigm 
      enabling the real-time processing of live data streams.
      In this paper we consider two streaming parallel patterns and we discuss 
      different implementation variants related on how dynamic memory is managed. 
      The results show that the standard mechanisms provided by modern C++ are 
      not entirely adequate for maximizing the performance. Instead, the combined 
      use of an efficient general-purpose memory allocator, a custom allocator 
      optimized for the pattern considered and a custom variant of the C++ shared 
      pointer mechanism, provides a performance improvement up to 16{\%} on the 
      best case.},
      author = {Massimo Torquati and Gabriele Mencagli and Maurizio Drocco and Marco Aldinucci and Tiziano {De Matteis} and Marco Danelutto},
      date-modified = {2017-06-19 15:48:50 +0000},
      doi = {10.1007/s11227-017-2152-1},
      journal = {The Journal of Supercomputing},
      keywords = {Data Stream Processing, Modern C++, Dynamic Memory Allocation, Multicores, Stream Analytics, Parallel Patterns, rephrase},
      month = sep,
      title = {On Dynamic Memory Allocation in Sliding-Window Parallel Patterns for Streaming Analytics},
      year = 2017,
      bdsk-url-1 = {http://dx.doi.org/10.1007/s11227-017-2152-1}
    }
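
    The improvement of up to 16% reported above comes from combining a general-purpose allocator, a pattern-specific allocator, and a custom shared-pointer variant. As a rough illustration of one of those ingredients, standard C++ already lets a custom allocator back shared-pointer control blocks through std::allocate_shared; the arena below is a simplified stand-in for, not a reconstruction of, the paper's allocator:

    // Simplified monotonic arena: bump-pointer allocation, released all at once.
    // A stand-in for the pattern-specific allocator discussed in the paper.
    #include <cstddef>
    #include <memory>
    #include <new>
    #include <vector>

    struct Arena {
        std::vector<std::byte> buf;
        std::size_t used = 0;
        explicit Arena(std::size_t n) : buf(n) {}
        void* allocate(std::size_t n, std::size_t align) {
            std::size_t p = (used + align - 1) / align * align;
            if (p + n > buf.size()) throw std::bad_alloc{};
            used = p + n;
            return buf.data() + p;
        }
    };

    template <typename T>
    struct ArenaAlloc {
        using value_type = T;
        Arena* arena;
        explicit ArenaAlloc(Arena* a) : arena(a) {}
        template <typename U> ArenaAlloc(const ArenaAlloc<U>& o) : arena(o.arena) {}
        T* allocate(std::size_t n) {
            return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
        }
        void deallocate(T*, std::size_t) {}  // monotonic: no per-object free
    };

    int main() {
        Arena arena(1 << 20);
        // Object and control block both land in the arena, avoiding one
        // malloc/free pair per stream element.
        auto elem = std::allocate_shared<int>(ArenaAlloc<int>(&arena), 42);
        return *elem == 42 ? 0 : 1;
    }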

  • P. Severi, L. Padovani, E. Tuosto, and M. Dezani-Ciancaglini, “On Sessions and Infinite Data,” Logical Methods in Computer Science, vol. 13, iss. 2, 2017. doi:10.23638/LMCS-13(2:9)2017
    [BibTeX] [Download PDF]
    @article{lmcs:3725,
      author = {Severi, Paula and Padovani, Luca and Tuosto, Emilio and Dezani-Ciancaglini, Mariangiola},
      doi = {10.23638/LMCS-13(2:9)2017},
      journal = {{Logical Methods in Computer Science}},
      keywords = {rephrase, lambda},
      month = jun,
      title = {On Sessions and Infinite Data},
      url = {http://lmcs.episciences.org/3725},
      volume = {{Volume 13, Issue 2}},
      year = {2017},
      bdsk-url-1 = {http://lmcs.episciences.org/3725},
      bdsk-url-2 = {http://dx.doi.org/10.23638/LMCS-13(2:9)2017}
    }

  • C. Misale, “PiCo: A Domain-Specific Language for Data Analytics Pipelines,” PhD Thesis, 2017. doi:10.5281/zenodo.579753
    [BibTeX] [Abstract] [Download PDF]

    In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common underlying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simple semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, that is on top of a stack of layers that build a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level. Second, we propose a programming environment based on such layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user a unique interface for both stream and batch processing, hiding completely data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.

    @phdthesis{17:pico:misale:thesis,
      abstract = {In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models---for which only informal (and often confusing) semantics is generally provided---all share a common underlying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics frameworks.
     
     From the user-level perspective, we think that a clearer and simple semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, that is on top of a stack of layers that build a prototypical framework for Big Data analytics.
     
     The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.
     
     Second, we propose a programming environment based on such layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user an unique interface for both stream and batch processing, hiding completely data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.},
      author = {Claudia Misale},
      date-added = {2017-06-19 15:15:52 +0000},
      date-modified = {2017-06-19 15:55:21 +0000},
      doi = {10.5281/zenodo.579753},
      keywords = {fastflow, rephrase, toreador, repara, paraphrase},
      month = may,
      school = {Computer Science Department, University of Torino},
      title = {PiCo: A Domain-Specific Language for Data Analytics Pipelines},
      url = {https://iris.unito.it/retrieve/handle/2318/1633743/320170/Misale_thesis.pdf},
      year = {2017},
      bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1633743/320170/Misale_thesis.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.5281/zenodo.579753}
    }
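
    PiCo's central abstraction is the Pipeline, a DAG composition of processing stages over data collections. The toy below mimics that flavour in plain C++ to make the composition idea concrete; the names (Pipe, map, reduce) are illustrative and do not reproduce PiCo's actual API:

    // Toy pipeline in the spirit of PiCo's Pipeline abstraction: a linear
    // DAG of stages over a collection. Names are illustrative, not PiCo's API.
    #include <functional>
    #include <iostream>
    #include <numeric>
    #include <vector>

    template <typename T>
    struct Pipe {
        std::vector<T> data;
        Pipe map(std::function<T(T)> f) && {
            for (auto& x : data) x = f(x);
            return std::move(*this);
        }
        T reduce(T init, std::function<T(T, T)> f) && {
            return std::accumulate(data.begin(), data.end(), init, f);
        }
    };

    int main() {
        int sum = Pipe<int>{{1, 2, 3, 4}}
                      .map([](int x) { return x * x; })
                      .reduce(0, [](int a, int b) { return a + b; });
        std::cout << sum << '\n';  // 1+4+9+16 = 30
    }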

  • P. Viviani, M. Torquati, M. Aldinucci, and R. d’Ippolito, “Multiple back-end support for the Armadillo linear algebra interface,” in Proc. of the 32nd ACM Symposium on Applied Computing (SAC), Marrakesh, Morocco, 2017, pp. 1566-1573.
    [BibTeX] [Abstract] [Download PDF]

    The Armadillo C++ library provides programmers with a high-level Matlab-like syntax for linear algebra. Its design aims at providing a good balance between speed and ease of use. It can be linked with different back-ends, i.e. different LAPACK-compliant libraries. In this work we present a novel run-time support of Armadillo, which gracefully extends mainstream implementation to enable back-end switching without recompilation and multiple back-end support. The extension is specifically designed to not affect Armadillo class template prototypes, thus to be easily interoperable with future evolutions of the Armadillo library itself. The proposed software stack is then tested for functionality and performance against a kernel code extracted from an industrial application.

    @inproceedings{17:sac:armadillo,
      abstract = {The Armadillo C++ library provides programmers with a high-level Matlab-like syntax for linear algebra. Its design aims at providing a good balance between speed and ease of use. It can be linked with different back-ends, i.e. different LAPACK-compliant libraries. In this work we present a novel run-time support of Armadillo, which gracefully extends mainstream implementation to enable back-end switching without recompilation and multiple back-end support. The extension is specifically designed to not affect Armadillo class template prototypes, thus to be easily interoperable with future evolutions of the Armadillo library itself. The proposed software stack is then tested for functionality and performance against a kernel code extracted from an industrial application.},
      address = {Marrakesh, Morocco},
      author = {Paolo Viviani and Massimo Torquati and Marco Aldinucci and Roberto d'Ippolito},
      booktitle = {Proc. of the 32nd ACM Symposium on Applied Computing (SAC)},
      date-added = {2016-08-19 21:47:45 +0000},
      date-modified = {2017-06-13 15:54:43 +0000},
      keywords = {nvidia, repara, rephrase, itea2},
      month = apr,
      pages = {1566--1573},
      title = {Multiple back-end support for the Armadillo linear algebra interface},
      url = {https://iris.unito.it/retrieve/handle/2318/1626229/299089/armadillo_4aperto.pdf},
      year = {2017},
      bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1626229/299089/armadillo_4aperto.pdf}
    }
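
    The key feature above is switching LAPACK-compliant back-ends at run time, with no recompilation. As a hedged sketch of how such switching can work on POSIX systems (the general technique, not necessarily the paper's implementation), the code below resolves the BLAS dgemm_ symbol from whichever shared library the user names; the library names are examples, and the program must be linked with -ldl:

    // Run-time back-end selection: load a BLAS/LAPACK implementation chosen
    // at launch time and call its dgemm_ through a function pointer.
    #include <dlfcn.h>
    #include <cstdio>

    // Standard Fortran BLAS dgemm_ signature (column-major matrices).
    using dgemm_t = void (*)(const char*, const char*, const int*, const int*,
                             const int*, const double*, const double*, const int*,
                             const double*, const int*, const double*, double*,
                             const int*);

    int main(int argc, char** argv) {
        // Back-end chosen at run time, e.g. "libopenblas.so" or "libmkl_rt.so".
        const char* backend = argc > 1 ? argv[1] : "libopenblas.so";
        void* lib = dlopen(backend, RTLD_NOW);
        if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }
        auto dgemm = reinterpret_cast<dgemm_t>(dlsym(lib, "dgemm_"));
        if (!dgemm) { std::fprintf(stderr, "dgemm_ not found\n"); return 1; }

        const int n = 2;
        const double alpha = 1.0, beta = 0.0;
        double a[4] = {1, 0, 0, 1};        // 2x2 identity
        double b[4] = {1, 2, 3, 4};
        double c[4];
        dgemm("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n);
        std::printf("c[0] = %g\n", c[0]);  // 1: identity times b
        dlclose(lib);
        return 0;
    }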

  • M. Coppo, M. Dezani-Ciancaglini, A. Díaz-Caro, I. Margaria, and M. Zacchi, “Retractions in Intersection Types,” in ITRS’16, 2017, pp. 31-47. doi:10.4204/EPTCS.242.5
    [BibTeX] [Download PDF]
    @inproceedings{CDMZ16,
      author = {Mario Coppo and Mariangiola Dezani-Ciancaglini and Alejandro D\'{\i}az-Caro and Ines Margaria and Maddalena Zacchi},
      booktitle = {ITRS'16},
      doi = {10.4204/EPTCS.242.5},
      editor = {Naoki Kobayashi},
      keywords = {rephrase, lambda},
      pages = {31--47},
      series = {EPTCS},
      title = {Retractions in Intersection Types},
      url = {http://www.di.unito.it/~dezani/papers/cddmz.pdf},
      volume = {242},
      year = {2017},
      bdsk-url-1 = {http://www.di.unito.it/~dezani/papers/cddmz.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.4204/EPTCS.242.5}
    }

  • C. Spampinato, S. Palazzo, D. Giordano, M. Aldinucci, and R. Leonardi, “Deep learning for automated skeletal bone age assessment in X-ray images,” Medical Image Analysis, vol. 36, pp. 41-51, 2017. doi:10.1016/j.media.2016.10.010
    [BibTeX] [Abstract] [Download PDF]

    Skeletal bone age assessment is a common clinical practice to investigate endocrinology, genetic and growth disorders in children. It is generally performed by radiological examination of the left hand by using either the Greulich and Pyle (G&P) method or the Tanner–Whitehouse (TW) one. However, both clinical procedures show several limitations, from the examination effort of radiologists to (most importantly) significant intra- and inter-operator variability. To address these problems, several automated approaches (especially relying on the TW method) have been proposed; nevertheless, none of them has been proved able to generalize to different races, age ranges and genders. In this paper, we propose and test several deep learning approaches to assess skeletal bone age automatically; the results showed an average discrepancy between manual and automatic evaluation of about 0.8 years, which is state-of-the-art performance. Furthermore, this is the first automated skeletal bone age assessment work tested on a public dataset and for all age ranges, races and genders, for which the source code is available, thus representing an exhaustive baseline for future research in the field. Beside the specific application scenario, this paper aims at providing answers to more general questions about deep learning on medical images: from the comparison between deep-learned features and manually-crafted ones, to the usage of deep-learning methods trained on general imagery for medical problems, to how to train a CNN with few images.

    @article{17:deepx:conce,
      abstract = {Skeletal bone age assessment is a common clinical practice to investigate endocrinology, genetic and growth disorders in children. It is generally performed by radiological examination of the left hand by using either the Greulich and Pyle (G&P) method or the Tanner--Whitehouse (TW) one. However, both clinical procedures show several limitations, from the examination effort of radiologists to (most importantly) significant intra- and inter-operator variability. To address these problems, several automated approaches (especially relying on the TW method) have been proposed; nevertheless, none of them has been proved able to generalize to different races, age ranges and genders. In this paper, we propose and test several deep learning approaches to assess skeletal bone age automatically; the results showed an average discrepancy between manual and automatic evaluation of about 0.8 years, which is state-of-the-art performance. Furthermore, this is the first automated skeletal bone age assessment work tested on a public dataset and for all age ranges, races and genders, for which the source code is available, thus representing an exhaustive baseline for future research in the field. Beside the specific application scenario, this paper aims at providing answers to more general questions about deep learning on medical images: from the comparison between deep-learned features and manually-crafted ones, to the usage of deep-learning methods trained on general imagery for medical problems, to how to train a CNN with few images.},
      author = {Concetto Spampinato and Simone Palazzo and Daniela Giordano and Marco Aldinucci and Rosalia Leonardi},
      doi = {10.1016/j.media.2016.10.010},
      journal = {Medical Image Analysis},
      keywords = {nvidia},
      pages = {41-51},
      title = {Deep learning for automated skeletal bone age assessment in X-ray images},
      url = {https://iris.unito.it/retrieve/handle/2318/1607122/341353/main.pdf},
      volume = {36},
      year = {2017},
      bdsk-url-1 = {http://dx.doi.org/10.1016/j.media.2016.10.010},
      bdsk-url-2 = {https://iris.unito.it/retrieve/handle/2318/1607122/341353/main.pdf}
    }

  • C. Misale, M. Drocco, M. Aldinucci, and G. Tremblay, “A Comparison of Big Data Frameworks on a Layered Dataflow Model,” Parallel Processing Letters, vol. 27, iss. 01, p. 1740003, 2017. doi:10.1142/S0129626417400035
    [BibTeX] [Abstract] [Download PDF]

    In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.

    @article{17:bigdatasurvey:PPL,
      abstract = {In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.},
      author = {Misale, Claudia and Drocco, Maurizio and Aldinucci, Marco and Tremblay, Guy},
      date-published = {March 2017},
      date-received = {January 2017},
      doi = {10.1142/S0129626417400035},
      eprint = {http://www.worldscientific.com/doi/pdf/10.1142/S0129626417400035},
      journal = {Parallel Processing Letters},
      keywords = {toreador, rephrase, IBM},
      number = {01},
      pages = {1740003},
      title = {A Comparison of Big Data Frameworks on a Layered Dataflow Model},
      url = {https://iris.unito.it/retrieve/handle/2318/1626287/303421/preprintPPL_4aperto.pdf},
      volume = {27},
      year = {2017},
      bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1626287/303421/preprintPPL_4aperto.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1142/S0129626417400035}
    }

  • M. Aldinucci, M. Danelutto, D. D. Sensi, G. Mencagli, and M. Torquati, “Towards Power-Aware Data Pipelining on Multicores,” in Proceedings of the 10th International Symposium on High-Level Parallel Programming and Applications, Valladolid, Spain, 2017.
    [BibTeX] [Abstract] [Download PDF]

    Power consumption management has become a major concern in software development. Continuous streaming computations are usually composed by different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e., the concurrency control) is a critical aspect for both performance and power consumption. In this paper, we describe the design of an adaptive concurrency control algorithm for implementing power-efficient communications on shared memory multicores. The algorithm provides the throughput offered by a nonblocking implementation and the power efficiency of a blocking protocol. We demonstrate that our algorithm reduces the power consumption of data streaming computations without decreasing their throughput.

    @inproceedings{17:hlpp:powerstream,
      abstract = {Power consumption management has become a major concern in software development. Continuous streaming computations are usually composed by different modules, exchanging data through shared message queues. The selection of the algorithm used to access such queues (i.e., the concurrency control) is a critical aspect for both performance and power consumption. In this paper, we describe the design of an adaptive concurrency control algorithm for implementing power-efficient communications on shared memory multicores. The algorithm provides the throughput offered by a nonblocking implementation and the power efficiency of a blocking protocol. We demonstrate that our algorithm reduces the power consumption of data streaming computations without decreasing their throughput.},
      address = {Valladolid, Spain},
      author = {Marco Aldinucci and Marco Danelutto and Daniele De Sensi and Gabriele Mencagli and Massimo Torquati},
      booktitle = {Proceedings of the 10th International Symposium on High-Level Parallel Programming and Applications},
      date-added = {2017-07-13 09:02:32 +0000},
      date-modified = {2017-07-13 09:05:21 +0000},
      keywords = {rephrase, fastflow},
      title = {Towards Power-Aware Data Pipelining on Multicores},
      url = {https://iris.unito.it/retrieve/handle/2318/1644982/351415/17_HLPP_powerstream.pdf},
      year = {2017},
      bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1644982/351415/17_HLPP_powerstream.pdf}
    }
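
    The adaptive concurrency control described above trades the throughput of nonblocking (spinning) queue access against the power savings of blocking access. Below is a minimal standard C++ sketch of that trade-off, not the paper's algorithm; the spin budget of 4096 polls is an arbitrary placeholder:

    // Consume with the low latency of polling while traffic is high, then
    // fall back to a blocking wait so an idle consumer stops burning power.
    #include <condition_variable>
    #include <mutex>
    #include <queue>

    template <typename T>
    class AdaptiveQueue {
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable cv_;
    public:
        void push(T v) {
            { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
            cv_.notify_one();
        }
        T pop() {
            // Nonblocking phase: a bounded number of polling attempts.
            for (int spin = 0; spin < 4096; ++spin) {
                std::lock_guard<std::mutex> lk(m_);
                if (!q_.empty()) { T v = std::move(q_.front()); q_.pop(); return v; }
            }
            // Blocking phase: sleep until a producer signals.
            std::unique_lock<std::mutex> lk(m_);
            cv_.wait(lk, [this] { return !q_.empty(); });
            T v = std::move(q_.front()); q_.pop(); return v;
        }
    };

    A production design would poll without taking the lock (e.g. on an atomic emptiness flag) and signal only when a consumer is known to be sleeping; the version above keeps the structure visible at the cost of performance.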

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, and M. Torquati, “FastFlow: high-level and efficient streaming on multi-core,” in Programming Multi-core and Many-core Computing Systems, S. Pllana and F. Xhafa, Eds., Wiley, 2017.
    [BibTeX] [Abstract] [Download PDF]

    A FastFlow short tutorial

    @incollection{ff:wileybook:14,
      abstract = {A FastFlow short tutorial},
      annote = {ISBN: 0470936908},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Massimo Torquati},
      booktitle = {Programming Multi-core and Many-core Computing Systems},
      chapter = {13},
      date-added = {2011-06-18 18:28:00 +0200},
      date-modified = {2014-12-31 14:14:28 +0000},
      editor = {Sabri Pllana and Fatos Xhafa},
      keywords = {fastflow},
      publisher = {Wiley},
      series = {Parallel and Distributed Computing},
      title = {FastFlow: high-level and efficient streaming on multi-core},
      url = {http://calvados.di.unipi.it/storage/paper_files/2011_FF_tutorial-draft.pdf},
      year = {2017},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2011_FF_tutorial-draft.pdf}
    }
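
    For readers new to FastFlow, a three-stage streaming pipeline in the style of the tutorial looks roughly as follows. The names (ff_node_t, ff_Pipe, EOS, GO_ON) follow the FastFlow C++11 API as presented in the tutorial, but headers and signatures vary between FastFlow releases, so treat this as a sketch rather than copy-paste material:

    // Three-stage FastFlow pipeline: source -> square -> sink.
    #include <ff/ff.hpp>
    #include <iostream>
    using namespace ff;

    struct Source : ff_node_t<long> {
        long* svc(long*) {
            for (long i = 1; i <= 10; ++i) ff_send_out(new long(i));
            return EOS;                    // end-of-stream marker
        }
    };
    struct Square : ff_node_t<long> {
        long* svc(long* x) { *x *= *x; return x; }
    };
    struct Sink : ff_node_t<long> {
        long* svc(long* x) { std::cout << *x << '\n'; delete x; return GO_ON; }
    };

    int main() {
        Source src; Square sq; Sink snk;
        ff_Pipe<> pipe(src, sq, snk);      // each stage runs on its own thread
        return pipe.run_and_wait_end();
    }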

2016

  • M. Aldinucci, S. Bagnasco, S. Lusso, P. Pasteris, and S. Rabellino, “The Open Computing Cluster for Advanced data Manipulation (OCCAM),” in The 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP), San Francisco, USA, 2016.
    [BibTeX] [Abstract] [Download PDF]

    Obtaining CPU cycles on an HPC cluster is nowadays relatively simple and sometimes even cheap for academic institutions. However, in most of the cases providers of HPC services would not allow changes on the configuration, implementation of special features or a lower-level control on the computing infrastructure and networks, for example for testing new computing patterns or conducting research on HPC itself. The variety of use cases proposed by several departments of the University of Torino, including ones from solid-state chemistry, high-energy physics, computer science, big data analytics, computational biology, genomics and many others, called for different and sometimes conflicting configurations; furthermore, several R&D activities in the field of scientific computing, with topics ranging from GPU acceleration to Cloud Computing technologies, needed a platform to be carried out on. The Open Computing Cluster for Advanced data Manipulation (OCCAM) is a multi-purpose flexible HPC cluster designed and operated by a collaboration between the University of Torino and the Torino branch of the Istituto Nazionale di Fisica Nucleare. It is aimed at providing a flexible, reconfigurable and extendable infrastructure to cater to a wide range of different scientific computing needs, as well as a platform for R&D activities on computational technologies themselves. Extending it with novel architecture CPU, accelerator or hybrid microarchitecture (such as forthcoming Intel Xeon Phi Knights Landing) should be as simple as plugging a node in a rack. The initial system counts slightly more than 1100 CPU cores and includes different types of computing nodes (standard dual-socket nodes, large quad-sockets nodes with 768 GB RAM, and multi-GPU nodes) and two separate disk storage subsystems: a smaller high-performance scratch area, based on the Lustre file system, intended for direct computational I/O and a larger one, of the order of 1PB, to archive near-line data for archival purposes. All the components of the system are interconnected through a 10Gb/s Ethernet layer with one-level topology and an InfiniBand FDR 56Gbps layer in fat-tree topology. A system of this kind, heterogeneous and reconfigurable by design, poses a number of challenges related to the frequency at which heterogeneous hardware resources might change their availability and shareability status, which in turn affect methods and means to allocate, manage, optimize, bill, monitor VMs, virtual farms, jobs, interactive bare-metal sessions, etc. This poster describes some of the use cases that prompted the design and construction of the HPC cluster, its architecture and a first characterization of its performance by some synthetic benchmark tools and a few realistic use-case tests.

    @inproceedings{16:occam:chep,
      abstract = {Obtaining CPU cycles on an HPC cluster is nowadays relatively simple and sometimes even cheap for academic institutions. However, in most of the cases providers of HPC services would not allow changes on the configuration, implementation of special features or a lower-level control on the computing infrastructure and networks, for example for testing new computing patterns or conducting research on HPC itself. The variety of use cases proposed by several departments of the University of Torino, including ones from solid-state chemistry, high-energy physics, computer science, big data analytics, computational biology, genomics and many others, called for different and sometimes conflicting configurations; furthermore, several R&D activities in the field of scientific computing, with topics ranging from GPU acceleration to Cloud Computing technologies, needed a platform to be carried out on.
    The Open Computing Cluster for Advanced data Manipulation (OCCAM) is a multi-purpose flexible HPC cluster designed and operated by a collaboration between the University of Torino and the Torino branch of the Istituto Nazionale di Fisica Nucleare. It is aimed at providing a flexible, reconfigurable and extendable infrastructure to cater to a wide range of different scientific computing needs, as well as a platform for R&D activities on computational technologies themselves. Extending it with novel architecture CPU, accelerator or hybrid microarchitecture (such as forthcoming Intel Xeon Phi Knights Landing) should be as simple as plugging a node in a rack.
    The initial system counts slightly more than 1100 cpu cores and includes different types of computing nodes (standard dual-socket nodes, large quad-sockets nodes with 768 GB RAM, and multi-GPU nodes) and two separate disk storage subsystems: a smaller high-performance scratch area, based on the Lustre file system, intended for direct computational I/O and a larger one, of the order of 1PB, to archive near-line data for archival purposes. All the components of the system are interconnected through a 10Gb/s Ethernet layer with one-level topology and an InfiniBand FDR 56Gbps layer in fat-tree topology.
    A system of this kind, heterogeneous and reconfigurable by design, poses a number of challenges related to the frequency at which heterogeneous hardware resources might change their availability and shareability status, which in turn affect methods and means to allocate, manage, optimize, bill, monitor VMs, virtual farms, jobs, interactive bare-metal sessions, etc.
    This poster describes some of the use cases that prompted the design and construction of the HPC cluster, its architecture and a first characterization of its performance by some synthetic benchmark tools and a few realistic use-case tests.
    },
      address = {San Francisco, USA},
      author = {Marco Aldinucci and Stefano Bagnasco and Stefano Lusso and Paolo Pasteris and Sergio Rabellino},
      booktitle = {The 22nd International Conference on Computing in High Energy and Nuclear Physics (CHEP)},
      date-modified = {2017-09-29 22:46:24 +0000},
      keywords = {nvidia},
      month = oct,
      title = {The {O}pen {C}omputing {C}luster for {A}dvanced data {M}anipulation (OCCAM)},
      url = {https://arxiv.org/pdf/1709.03715.pdf},
      year = {2016}
    }

  • P. Viviani, M. Aldinucci, and R. d’Ippolito, “An hybrid linear algebra framework for engineering,” in Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) — Poster Abstracts, Fiuggi, Italy, 2016.
    [BibTeX] [Abstract] [Download PDF]

    The aim of this work is to provide developers and domain experts with a simple (Matlab-like) interface for performing linear algebra tasks while retaining state-of-the-art computational speed. To achieve this goal, the Armadillo C++ library is extended in order to support multiple LAPACK-compliant back-ends targeting different architectures, including CUDA GPUs; moreover, our approach involves the possibility of dynamically switching between such back-ends in order to select the one which is most convenient based on the specific problem and hardware configuration. This approach is eventually validated within an industrial environment.

    @inproceedings{16:acaces:armadillo,
      abstract = {The aim of this work is to provide developers and domain experts with a simple (Matlab-like) interface for performing linear algebra tasks while retaining state-of-the-art computational speed. To achieve this goal, the Armadillo C++ library is extended in order to support multiple LAPACK-compliant back-ends targeting different architectures, including CUDA GPUs; moreover, our approach involves the possibility of dynamically switching between such back-ends in order to select the one which is most convenient based on the specific problem and hardware configuration. This approach is eventually validated within an industrial environment.},
      address = {Fiuggi, Italy},
      author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito},
      booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) -- Poster Abstracts},
      date-added = {2016-08-20 17:22:51 +0000},
      date-modified = {2016-08-20 17:29:35 +0000},
      keywords = {nvidia,algebra, gpu, itea2, repara},
      month = {July},
      title = {An hybrid linear algebra framework for engineering},
      url = {https://iris.unito.it/retrieve/handle/2318/1622382/300198/armadillo.pdf},
      year = {2016},
      bdsk-url-1 = {https://iris.unito.it/retrieve/handle/2318/1622382/300198/armadillo.pdf}
    }

  • C. Misale, M. Drocco, M. Aldinucci, and G. Tremblay, “A Comparison of Big Data Frameworks on a Layered Dataflow Model,” in Proc. of HLPP2016: Intl. Workshop on High-Level Parallel Programming, Muenster, Germany, 2016, pp. 1-19. doi:10.5281/zenodo.321866
    [BibTeX] [Abstract] [Download PDF]

    In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.

    @inproceedings{16:bigdatasurvey:hlpp,
      abstract = {In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models, for which only informal (and often confusing) semantics is generally provided, all share a common underlying model, namely, the Dataflow model. The Dataflow model we propose shows how various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.},
      address = {Muenster, Germany},
      author = {Claudia Misale and Maurizio Drocco and Marco Aldinucci and Guy Tremblay},
      booktitle = {Proc. of HLPP2016: Intl. Workshop on High-Level Parallel Programming},
      date-added = {2016-06-17 22:15:43 +0000},
      date-modified = {2017-07-13 09:16:30 +0000},
      doi = {10.5281/zenodo.321866},
      keywords = {toreador, rephrase, IBM},
      month = jul,
      pages = {1-19},
      publisher = {arXiv.org},
      title = {A Comparison of Big Data Frameworks on a Layered Dataflow Model},
      url = {http://arxiv.org/pdf/1606.05293v1.pdf},
      year = {2016},
      bdsk-url-1 = {http://arxiv.org/pdf/1606.05293v1.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.5281/zenodo.321866}
    }

  • B. Nicolae, C. H. A. Costa, C. Misale, K. Katrinis, and Y. Park, “Leveraging Adaptive I/O to Optimize Collective Data Shuffling Patterns for Big Data Analytics,” IEEE Transactions on Parallel and Distributed Systems, vol. PP, iss. 99, 2016. doi:10.1109/TPDS.2016.2627558
    [BibTeX] [Abstract] [Download PDF]

    Big data analytics is an indispensable tool in transforming science, engineering, medicine, health-care, finance and ultimately business itself. With the explosion of data sizes and need for shorter time-to-solution, in-memory platforms such as Apache Spark gain increasing popularity. In this context, data shuffling, a particularly difficult transformation pattern, introduces important challenges. Specifically, data shuffling is a key component of complex computations that has a major impact on the overall performance and scalability. Thus, speeding up data shuffling is a critical goal. To this end, state-of-the-art solutions often rely on overlapping the data transfers with the shuffling phase. However, they employ simple mechanisms to decide how much data and where to fetch it from, which leads to sub-optimal performance and excessive auxiliary memory utilization for the purpose of prefetching. The latter aspect is a growing concern, given evidence that memory per computation unit is continuously decreasing while interconnect bandwidth is increasing. This paper contributes a novel shuffle data transfer strategy that addresses the two aforementioned dimensions by dynamically adapting the prefetching to the computation. We implemented this novel strategy in Spark, a popular in-memory data analytics framework. To demonstrate the benefits of our proposal, we run extensive experiments on an HPC cluster with large core count per node. Compared with the default Spark shuffle strategy, our proposal shows: up to 40% better performance with 50% less memory utilization for buffering and excellent weak scalability.

    @article{16:shuffle:tpds:misale,
      abstract = {Big data analytics is an indispensable tool in transforming science, engineering, medicine, health-care, finance and ultimately business itself. With the explosion of data sizes and need for shorter time-to-solution, in-memory platforms such as Apache Spark gain increasing popularity. In this context, data shuffling, a particularly difficult transformation pattern, introduces important challenges. Specifically, data shuffling is a key component of complex computations that has a major impact on the overall performance and scalability. Thus, speeding up data shuffling is a critical goal. To this end, state-of-the-art solutions often rely on overlapping the data transfers with the shuffling phase. However, they employ simple mechanisms to decide how much data and where to fetch it from, which leads to sub-optimal performance and excessive auxiliary memory utilization for the purpose of prefetching. The latter aspect is a growing concern, given evidence that memory per computation unit is continuously decreasing while interconnect bandwidth is increasing. This paper contributes a novel shuffle data transfer strategy that addresses the two aforementioned dimensions by dynamically adapting the prefetching to the computation. We implemented this novel strategy in Spark, a popular in-memory data analytics framework. To demonstrate the benefits of our proposal, we run extensive experiments on an HPC cluster with large core count per node. Compared with the default Spark shuffle strategy, our proposal shows: up to 40\% better performance with 50\% less memory utilization for buffering and excellent weak scalability.},
      author = {Bogdan Nicolae and Carlos H. A. Costa and Claudia Misale and Kostas Katrinis and Yoonho Park},
      date-modified = {2017-04-01 21:55:16 +0000},
      doi = {10.1109/TPDS.2016.2627558},
      journal = {IEEE Transactions on Parallel and Distributed Systems},
      keywords = {ibm},
      number = {99},
      title = {Leveraging Adaptive I/O to Optimize Collective Data Shuffling Patterns for Big Data Analytics},
      url = {https://iris.unito.it/retrieve/handle/2318/1624908/295954/tpds_4aperto.pdf},
      volume = {PP},
      year = {2016},
      bdsk-url-1 = {http://dx.doi.org/10.1109/TPDS.2016.2627558},
      bdsk-url-2 = {https://iris.unito.it/retrieve/handle/2318/1624908/295954/tpds_4aperto.pdf}
    }
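
    The heart of the proposal above is adapting how much shuffle data is prefetched to the pace of the computation. The toy controller below illustrates one plausible feedback rule in C++; the thresholds, bounds and doubling/halving policy are assumptions for illustration, not the strategy implemented in the paper's Spark prototype:

    // Toy adaptive prefetch window: grow while the consumer is starved
    // (transfer-bound), shrink while fetched blocks pile up (compute-bound).
    #include <algorithm>
    #include <cstddef>

    struct AdaptivePrefetcher {
        std::size_t window = 4;       // blocks to fetch ahead
        std::size_t in_flight = 0;    // fetched but not yet consumed

        std::size_t blocks_to_request() const {
            return window > in_flight ? window - in_flight : 0;
        }
        void on_fetched() { ++in_flight; }
        void on_consumed() {
            --in_flight;
            if (in_flight == 0)                // consumer starved: fetch more
                window = std::min<std::size_t>(window * 2, 64);
            else if (in_flight > window / 2)   // memory filling up: back off
                window = std::max<std::size_t>(window / 2, 1);
        }
    };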

  • F. Tordini, M. Aldinucci, L. Milanesi, P. Liò, and I. Merelli, “The Genome Conformation as an Integrator of Multi-Omic Data: The Example of Damage Spreading in Cancer,” Frontiers in Genetics, vol. 7, iss. 194, pp. 1-17, 2016. doi:10.3389/fgene.2016.00194
    [BibTeX] [Abstract] [Download PDF]

    Publicly available multi-omic databases, in particular if associated with medical annotations, are rich resources with the potential to lead a rapid transition from high-throughput molecular biology experiments to better clinical outcomes for patients. In this work, we propose a model for multi-omic data integration (i.e. genetic variations, gene expression, genome conformation and epigenetic patterns), which exploits a multi-layer network approach to analyse, visualize and obtain insights from such biological information, in order to use achieved results at a macroscopic level. Using this representation, we can describe how driver and passenger mutations accumulate during the development of diseases providing, for example, a tool able to characterise the evolution of cancer. Indeed, our test case concerns the MCF-7 breast cancer cell line, before and after the stimulation with estrogen, since many datasets are available for this case study. In particular, the integration of data about cancer mutations, gene functional annotations, genome conformation, epigenetic patterns, gene expression and metabolic pathways in our multi-layer representation will allow a better interpretation of the mechanisms behind a complex disease such as cancer. Thanks to this multi-layer approach, we focus on the interplay of chromatin conformation and cancer mutations in different pathways, such as metabolic processes, that are very important for tumour development. Working on this model, a variance analysis can be implemented to identify normal variations within each omics and to characterize, by contrast, variations that can be accounted to pathological samples compared to normal ones. This integrative model can be used to identify novel biomarkers and to provide innovative omic-based guidelines for treating many diseases, improving the efficacy of decision trees currently used in clinic.

    @article{2016_omics_fgenetics,
      abstract = {Publicly available multi-omic databases, in particular if associated with medical annotations, are rich resources with the potential to lead a rapid transition from high-throughput molecular biology experiments to better clinical outcomes for patients. In this work, we propose a model for multi-omic data integration (i.e. genetic variations, gene expression, genome conformation and epigenetic patterns), which exploits a multi-layer network approach to analyse, visualize and obtain insights from such biological information, in order to use achieved results at a macroscopic level.
     Using this representation, we can describe how driver and passenger mutations accumulate during the development of diseases providing, for example, a tool able to characterise the evolution of cancer. Indeed, our test case concerns the MCF-7 breast cancer cell line, before and after the stimulation with estrogen, since many datasets are available for this case study. In particular, the integration of data about cancer mutations, gene functional annotations, genome conformation, epigenetic patterns, gene expression and metabolic pathways in our multi-layer representation will allow a better interpretation of the mechanisms behind a complex disease such as cancer.
     Thanks to this multi-layer approach, we focus on the interplay of chromatin conformation and cancer mutations in different pathways, such as metabolic processes, that are very important for tumour development. Working on this model, a variance analysis can be implemented to identify normal variations within each omics and to characterize, by contrast, variations that can be accounted to pathological samples compared to normal ones. This integrative model can be used to identify novel biomarkers and to provide innovative omic-based guidelines for treating many diseases, improving the efficacy of decision trees currently used in clinic.},
      author = {Tordini, Fabio and Aldinucci, Marco and Milanesi, Luciano and Li{\`o}, Pietro and Merelli, Ivan},
      date-modified = {2016-12-22 14:19:14 +0000},
      doi = {10.3389/fgene.2016.00194},
      journal = {Frontiers in Genetics},
      number = {194},
      pages = {1--17},
      title = {The Genome Conformation as an Integrator of Multi-Omic Data: The Example of Damage Spreading in Cancer},
      url = {http://journal.frontiersin.org/article/10.3389/fgene.2016.00194},
      volume = {7},
      year = {2016},
      bdsk-url-1 = {http://journal.frontiersin.org/article/10.3389/fgene.2016.00194},
      bdsk-url-2 = {http://dx.doi.org/10.3389/fgene.2016.00194}
    }

  • A. Bracciali, M. Aldinucci, M. Patterson, T. Marschall, N. Pisanti, I. Merelli, and M. Torquati, “pWhatsHap: efficient haplotyping for future generation sequencing,” BMC Bioinformatics, vol. 17, iss. Suppl 11, p. 342, 2016. doi:10.1186/s12859-016-1170-y
    [BibTeX] [Abstract] [Download PDF]

    Background: Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions have typically an exponential computational complexity. WhatsHap is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest when considering sequencing technology’s current trends that are producing longer fragments. Results: Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered pWhatsHap, a parallel, high-performance version of WhatsHap. pWhatsHap is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WhatsHap, pWhatsHap exhibits the same complexity exploring a number of possible solutions which is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a relevant reduction of the execution time for haplotyping, while the provided results enjoy the same high accuracy as that provided by WhatsHap, which increases with coverage. Conclusions: Due to its structure and management of the large datasets, the parallelisation of WhatsHap posed demanding technical challenges, which have been addressed exploiting a high-level parallel programming framework. The result, pWhatsHap, is a freely available toolkit that improves the efficiency of the analysis of genomics information.

    @article{16:pwhatshap:bmc,
      abstract = {Background: Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions have typically an exponential computational complexity. WhatsHap is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest when considering sequencing technology's current trends that are producing longer fragments.
    Results: Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered pWhatsHap, a parallel, high-performance version of WhatsHap. pWhatsHap is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WhatsHap, pWhatsHap exhibits the same complexity exploring a number of possible solutions which is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a relevant reduction of the execution time for haplotyping, while the provided results enjoy the same high accuracy as that provided by WhatsHap, which increases with coverage.
    Conclusions: Due to its structure and management of the large datasets, the parallelisation of WhatsHap posed demanding technical challenges, which have been addressed exploiting a high-level parallel programming framework. The result, pWhatsHap, is a freely available toolkit that improves the efficiency of the analysis of genomics information.
    },
      author = {Andrea Bracciali and Marco Aldinucci and Murray Patterson and Tobias Marschall and Nadia Pisanti and Ivan Merelli and Massimo Torquati},
      date-modified = {2016-10-17 17:28:27 +0000},
      doi = {10.1186/s12859-016-1170-y},
      journal = {BMC Bioinformatics},
      keywords = {fastflow, paraphrase, rephrase},
      number = {Suppl 11},
      pages = {342},
      title = {pWhatsHap: efficient haplotyping for future generation sequencing},
      url = {http://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1170-y?site=bmcbioinformatics.biomedcentral.com},
      volume = {17},
      year = {2016},
      bdsk-url-1 = {http://hdl.handle.net/2318/1607125},
      bdsk-url-2 = {http://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1170-y?site=bmcbioinformatics.biomedcentral.com},
      bdsk-url-3 = {http://dx.doi.org/10.1186/s12859-016-1170-y}
    }
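
    The kernel being parallelised is easiest to picture per DP column: all bipartitions of the reads covering the column are scored, a search space exponential in the coverage. The sketch below is illustrative only (plain OpenMP rather than the high-level framework pWhatsHap actually builds on, and a placeholder cost function):

    // Illustrative sketch only: scores the 2^cov bipartitions of the reads
    // covering one DP column in parallel. The bit-count cost is a stand-in
    // for the real weighted error-correction cost.
    #include <algorithm>
    #include <bitset>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    int main() {
        const int cov = 16;                        // reads covering the column
        const std::uint64_t parts = 1ull << cov;   // all bipartitions
        std::vector<int> cost(parts);

        #pragma omp parallel for schedule(static)
        for (long long b = 0; b < (long long)parts; ++b)
            cost[b] = (int)std::bitset<64>((unsigned long long)b).count();

        std::cout << "best column cost: "
                  << *std::min_element(cost.begin(), cost.end()) << "\n";
    }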

  • I. Castellani, M. Dezani-Ciancaglini, and U. de’ Liguoro, “Secure Multiparty Sessions with Topics,” in PLACES’16, 2016, pp. 1-12.
    [BibTeX] [Download PDF]
    @inproceedings{CDL16,
      author = {Ilaria Castellani and Mariangiola Dezani-Ciancaglini and Ugo de' Liguoro},
      booktitle = {PLACES'16},
      keywords = {rephrase, lambda},
      pages = {1--12},
      series = {EPTCS},
      title = {Secure Multiparty Sessions with Topics},
      url = {http://www.di.unito.it/~dezani/papers/cdl16.pdf},
      volume = {211},
      year = {2016},
      bdsk-url-1 = {http://www.di.unito.it/~dezani/papers/cdl16.pdf}
    }

  • M. Coppo, M. Dezani-Ciancaglini, and B. Venneri, “Parallel Monitors for Self-adaptive Sessions,” in PLACES’16, 2016, pp. 25-36.
    [BibTeX] [Download PDF]
    @inproceedings{CDV16,
      author = {Mario Coppo and Mariangiola Dezani-Ciancaglini and Betti Venneri},
      booktitle = {PLACES'16},
      keywords = {rephrase, lambda},
      pages = {25--36},
      series = {EPTCS},
      title = {Parallel Monitors for Self-adaptive Sessions},
      url = {http://www.di.unito.it/~dezani/papers/cdv16.pdf},
      volume = {211},
      year = {2016},
      bdsk-url-1 = {http://www.di.unito.it/~dezani/papers/cdv16.pdf}
    }

  • M. Dezani-Ciancaglini and P. Giannini, “Reversible Multiparty Sessions with Checkpoints,” in EXPRESS/SOS’16, 2016, pp. 60-74.
    [BibTeX] [Download PDF]
    @inproceedings{DG16,
      author = {Mariangiola Dezani-Ciancaglini and Paola Giannini},
      booktitle = {EXPRESS/SOS'16},
      keywords = {rephrase, lambda},
      pages = {60--74},
      series = {EPTCS},
      title = {Reversible Multiparty Sessions with Checkpoints},
      url = {http://www.di.unito.it/~dezani/papers/dg16.pdf},
      volume = {222},
      year = {2016},
      bdsk-url-1 = {http://www.di.unito.it/~dezani/papers/dg16.pdf}
    }

  • M. Dezani-Ciancaglini, S. Ghilezan, S. Jaksic, J. Pantovic, and N. Yoshida, “Denotational and Operational Preciseness of Subtyping: A Roadmap,” in Theory and Practice of Formal Methods, 2016, pp. 155-172. doi:10.1007/978-3-319-30734-3_12
    [BibTeX] [Download PDF]
    @inproceedings{DGJPY16,
      author = {Mariangiola Dezani-Ciancaglini and Silvia Ghilezan and Svetlana Jaksic and Jovanka Pantovic and Nobuko Yoshida},
      booktitle = {Theory and Practice of Formal Methods},
      doi = {10.1007/978-3-319-30734-3_12},
      keywords = {rephrase, lambda},
      pages = {155-172},
      series = {LNCS},
      title = {Denotational and Operational Preciseness of Subtyping: A Roadmap},
      url = {http://www.di.unito.it/~dezani/papers/dgjpy16.pdf},
      volume = {9660},
      year = {2016},
      bdsk-url-1 = {http://www.di.unito.it/~dezani/papers/dgjpy16.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-30734-3_12}
    }

  • I. Castellani, M. Dezani-Ciancaglini, and J. A. Pérez, “Self-Adaptation and Secure Information Flow in Multiparty Communications,” Formal Aspects of Computing, vol. 28, iss. 4, pp. 669-696, 2016.
    [BibTeX] [Download PDF]
    @article{CDP16,
      author = {Ilaria Castellani and Mariangiola Dezani-Ciancaglini and Jorge A. P\'{e}rez},
      journal = {{Formal Aspects of Computing}},
      keywords = {rephrase, lambda},
      number = {4},
      pages = {669--696},
      publisher = {Springer},
      title = {Self-Adaptation and Secure Information Flow in Multiparty Communications},
      url = {http://www.di.unito.it/~dezani/papers/cdp16.pdf},
      volume = {28},
      year = {2016},
      bdsk-url-1 = {http://www.di.unito.it/~dezani/papers/cdp16.pdf}
    }

  • F. Tordini, “A cloud solution for multi-omics data integration,” in Proceedings of the 16th IEEE International Conference on Scalable Computing and Communication, 2016, pp. 559-566. doi:10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.131
    [BibTeX] [Abstract] [Download PDF]

    Recent advances in molecular biology and Bioinformatics techniques have brought to an explosion of the information about the spatial organisation of the DNA inside the nucleus. In particular, 3C-based techniques are revealing the genome folding for many different cell types, and permit to create a more effective representation of the disposition of genes in the three-dimensional space. This information can be used to re-interpret heterogeneous genomic data (multi-omic) relying on 3D maps of the chromosome. The storage and computational requirements needed to accomplish such operations on raw sequenced data have to be fulfilled using HPC solutions, and the Cloud paradigm is a valuable and convenient means for delivering HPC to Bioinformatics. In this work we describe a data analysis work-flow that allows the integration and the interpretation of multi-omic data on a sort of “topographical” nuclear map, capable of representing the effective disposition of genes in a graph-based representation. We propose a cloud-based task farm pattern to orchestrate the services needed to accomplish genomic data analysis, where each service represents a special-purpose tool, playing a part in well known data analysis pipelines.

    @inproceedings{16:scalcom:cloud,
      abstract = {Recent advances in molecular biology and Bioinformatics techniques have brought to an explosion of the information about the spatial organisation of the DNA inside the nucleus. In particular, 3C-based techniques are revealing the genome folding for many different cell types, and permit to create a more effective representation of the disposition of genes in the three-dimensional space. This information can be used to re-interpret heterogeneous genomic data (multi-omic) relying on 3D maps of the chromosome. The storage and computational requirements needed to accomplish such operations on raw sequenced data have to be fulfilled using HPC solutions, and the Cloud paradigm is a valuable and convenient means for delivering HPC to Bioinformatics. In this work we describe a data analysis work-flow that allows the integration and the interpretation of multi-omic data on a sort of ``topographical'' nuclear map, capable of representing the effective disposition of genes in a graph-based representation. We propose a cloud-based task farm pattern to orchestrate the services needed to accomplish genomic data analysis, where each service represents a special-purpose tool, playing a part in well known data analysis pipelines.},
      author = {Fabio Tordini},
      booktitle = {Proceedings of the 16th IEEE International Conference on Scalable Computing and Communication},
      date-modified = {2016-08-30 10:26:12 +0000},
      doi = {10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.131},
      keywords = {fastflow, bioinformatics, rephrase},
      note = {Best paper award},
      pages = {559--566},
      publisher = {IEEE Computer Society},
      title = {{A cloud solution for multi-omics data integration}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2016_cloudpipeline_scalcom.pdf},
      year = {2016},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2016_cloudpipeline_scalcom.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP-SmartWorld.2016.131}
    }
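
    The task farm named in the abstract is the classic master/worker scheme: a pool of workers drains a queue of independent service invocations. A minimal stand-in (threads instead of cloud-provisioned services, and invented stage names):

    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    int main() {
        std::queue<std::function<void()>> tasks;   // one task per service call
        for (const char* stage : {"align", "normalise", "map-to-graph"})
            tasks.push([stage] { std::cout << stage << " done\n"; });

        std::mutex m;
        auto worker = [&] {
            for (;;) {
                std::function<void()> t;
                {
                    std::lock_guard<std::mutex> lk(m);
                    if (tasks.empty()) return;
                    t = std::move(tasks.front());
                    tasks.pop();
                }
                t();                               // run one pipeline service
            }
        };
        std::vector<std::thread> pool(3);
        for (auto& th : pool) th = std::thread(worker);
        for (auto& th : pool) th.join();
    }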

  • M. Drocco, C. Misale, and M. Aldinucci, “A Cluster-As-Accelerator approach for SPMD-free Data Parallelism,” in Proc. of Intl. Euromicro PDP 2016: Parallel Distributed and network-based Processing, Crete, Greece, 2016, pp. 350-353. doi:10.1109/PDP.2016.97
    [BibTeX] [Abstract] [Download PDF]

    In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The programming model proposed is purely sequential, SPMD-free and based on high-level functional features introduced since C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle. In this scheme, cluster nodes act as general interpreters of user-defined functional tasks over node-local portions of distributed data structures. We envision coupling a simple yet powerful programming model with a lightweight, locality-aware distributed runtime as a promising step along the road towards high-performance data analytics, in particular under the perspective of the upcoming exascale era. We implemented the proposed approach in SkeDaTo, a prototyping C++ library of data-parallel skeletons exploiting cluster-as-accelerator at the bottom layer of the runtime software stack.

    @inproceedings{skedato:pdp:16,
      abstract = {In this paper we present a novel approach for functional-style programming of distributed-memory clusters, targeting data-centric applications. The programming model proposed is purely sequential, SPMD-free and based on high-level functional features introduced since C++11 specification. Additionally, we propose a novel cluster-as-accelerator design principle. In this scheme, cluster nodes act as general interpreters of user-defined functional tasks over node-local portions of distributed data structures. We envision coupling a simple yet powerful programming model with a lightweight, locality-aware distributed runtime as a promising step along the road towards high-performance data analytics, in particular under the perspective of the upcoming exascale era. We implemented the proposed approach in SkeDaTo, a prototyping C++ library of data-parallel skeletons exploiting cluster-as-accelerator at the bottom layer of the runtime software stack.},
      address = {Crete, Greece},
      author = {Maurizio Drocco and Claudia Misale and Marco Aldinucci},
      booktitle = {Proc. of Intl. Euromicro PDP 2016: Parallel Distributed and network-based Processing},
      date-modified = {2016-04-21 17:33:00 +0000},
      doi = {10.1109/PDP.2016.97},
      keywords = {rephrase, fastflow},
      pages = {350--353},
      publisher = {IEEE},
      title = {A Cluster-As-Accelerator approach for {SPMD}-free Data Parallelism},
      url = {http://calvados.di.unipi.it/storage/paper_files/2016_pdp_skedato.pdf},
      year = {2016},
      bdsk-url-1 = {http://hdl.handle.net/2318/1611858},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2016_pdp_skedato.pdf},
      bdsk-url-3 = {http://dx.doi.org/10.1109/PDP.2016.97}
    }
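
    The "purely sequential, SPMD-free" claim is the interesting part: the driver program stays ordinary C++11, and a data-parallel skeleton ships a user lambda to whoever owns each partition. A minimal sketch of that contract follows; map_chunks is our invented name and threads stand in for cluster nodes, so this is not the SkeDaTo API itself:

    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Apply f to node-local chunks of data; each "node" is a thread here.
    template <typename F>
    void map_chunks(std::vector<double>& data, int nodes, F f) {
        std::vector<std::thread> ts;
        const std::size_t chunk = data.size() / nodes;
        for (int n = 0; n < nodes; ++n)
            ts.emplace_back([&, n] {
                std::size_t lo = n * chunk;
                std::size_t hi = (n == nodes - 1) ? data.size() : lo + chunk;
                for (std::size_t i = lo; i < hi; ++i) data[i] = f(data[i]);
            });
        for (auto& t : ts) t.join();
    }

    int main() {
        std::vector<double> v(1000);
        std::iota(v.begin(), v.end(), 0.0);
        map_chunks(v, 4, [](double x) { return x * x; });  // user-defined task
        std::cout << v[999] << "\n";                       // 998001
    }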

  • P. Viviani, M. Aldinucci, R. d’Ippolito, J. Lemeire, and D. Vucinic, “A flexible numerical framework for engineering – a Response Surface Modelling application,” in 10th Intl. Conference on Advanced Computational Engineering and Experimenting (ACE-X), 2016.
    [BibTeX] [Abstract]

    This work presents the innovative approach adopted for the development of a new numerical software framework for accelerating Dense Linear Algebra calculations and its application within an engineering context. In particular, Response Surface Models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different Response Surfaces has to be calculated in order to improve the overall accuracy (e.g. like when using Ensemble Modelling techniques). On the other hand, it is a known challenge that the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, while they can lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications and especially of RSMs calculations with a user-friendly syntax that allows good portability between different hardware architectures, with no need of specific expertise in parallel programming and accelerator technology. The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a Radial Basis Function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.

    @inproceedings{16:acex:armadillo,
      abstract = {This work presents the innovative approach adopted for the development of a new numerical software framework for accelerating Dense Linear Algebra calculations and its application within an engineering context.
    In particular, Response Surface Models (RSM) are a key tool to reduce the computational effort involved in engineering design processes like design optimization. However, RSMs may prove to be too expensive to be computed when the dimensionality of the system and/or the size of the dataset to be synthesized is significantly high or when a large number of different Response Surfaces has to be calculated in order to improve the overall accuracy (e.g. like when using Ensemble Modelling techniques).
    On the other hand, it is a known challenge that the potential of modern hybrid hardware (e.g. multicore, GPUs) is not exploited by current engineering tools, while they can lead to a significant performance improvement. To fill this gap, a software framework is being developed that enables the hybrid and scalable acceleration of the linear algebra core for engineering applications and especially of RSMs calculations with a user-friendly syntax that allows good portability between different hardware architectures, with no need of specific expertise in parallel programming and accelerator technology.
    The effectiveness of this framework is shown by comparing an accelerated code to a single-core calculation of a Radial Basis Function RSM on some benchmark datasets. This approach is then validated within a real-life engineering application and the achievements are presented and discussed.
    },
      author = {Paolo Viviani and Marco Aldinucci and Roberto d'Ippolito and Jean Lemeire and Dean Vucinic},
      booktitle = {10th Intl. Conference on Advanced Computational Engineering and Experimenting (ACE-X)},
      date-added = {2016-08-19 21:37:19 +0000},
      date-modified = {2017-06-19 15:35:39 +0000},
      keywords = {repara, rephrase, nvidia, gpu},
      title = {A flexible numerical framework for engineering - a Response Surface Modelling application},
      year = {2016}
    }
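
    For reference, the dense linear-algebra core of a radial basis function RSM, in textbook form: given samples $(\mathbf{x}_i, f_i)$, $i = 1,\dots,N$, the surface and its weights are

    \[
    s(\mathbf{x}) = \sum_{j=1}^{N} w_j\,\varphi\big(\lVert \mathbf{x}-\mathbf{x}_j\rVert\big),
    \qquad
    \Phi\,\mathbf{w} = \mathbf{f},
    \qquad
    \Phi_{ij} = \varphi\big(\lVert \mathbf{x}_i-\mathbf{x}_j\rVert\big),
    \]

    and it is the $O(N^3)$ solve of the dense system (plus the $O(N^2)$ evaluations) that makes multicore/GPU acceleration pay off as $N$ grows.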

  • M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, C. Misale, G. P. Pezzi, and M. Torquati, “A Parallel Pattern for Iterative Stencil + Reduce,” Journal of Supercomputing, pp. 1-16, 2016. doi:10.1007/s11227-016-1871-z
    [BibTeX] [Abstract] [Download PDF]

    We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.

    @article{16:stencilreduce:jsupe,
      abstract = {We advocate the Loop-of-stencil-reduce pattern as a means of simplifying the implementation of data-parallel programs on heterogeneous multi-core platforms. Loop-of-stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop in both data-parallel and streaming applications, or a combination of both. The pattern makes it possible to deploy a single stencil computation kernel on different GPUs. We discuss the implementation of Loop-of-stencil-reduce in FastFlow, a framework for the implementation of applications based on the parallel patterns. Experiments are presented to illustrate the use of Loop-of-stencil-reduce in developing data-parallel kernels running on heterogeneous systems.},
      author = {Marco Aldinucci and Marco Danelutto and Maurizio Drocco and Peter Kilpatrick and Claudia Misale and Guilherme {Peretti Pezzi} and Massimo Torquati},
      date-added = {2016-08-19 21:52:17 +0000},
      date-modified = {2016-09-23 07:40:20 +0000},
      doi = {10.1007/s11227-016-1871-z},
      journal = {Journal of Supercomputing},
      keywords = {nvidia, repara, rephrase},
      pages = {1--16},
      title = {A Parallel Pattern for Iterative Stencil + Reduce},
      url = {http://arxiv.org/pdf/1609.04567v1.pdf},
      year = {2016},
      bdsk-url-1 = {http://dx.doi.org/10.1007/s11227-016-1871-z},
      bdsk-url-2 = {http://arxiv.org/pdf/1609.04567v1.pdf}
    }
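
    The shape of the pattern is easy to convey in plain sequential C++ (the FastFlow version runs the stencil and the reduce in parallel, possibly offloaded to GPUs): each iteration applies a stencil over the grid, then a reduction over the result decides whether to loop again.

    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<double> a(128, 0.0);
        a.front() = a.back() = 1.0;        // fixed boundary values
        std::vector<double> b(a);
        double delta = 1.0;
        int iters = 0;
        while (delta > 1e-6) {             // loop-of-...
            for (std::size_t i = 1; i + 1 < a.size(); ++i)  // ...stencil...
                b[i] = 0.5 * (a[i - 1] + a[i + 1]);
            delta = 0.0;                   // ...reduce: max residual
            for (std::size_t i = 1; i + 1 < a.size(); ++i)
                delta = std::max(delta, std::fabs(b[i] - a[i]));
            a.swap(b);
            ++iters;
        }
        std::cout << iters << " iterations to converge\n";
    }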

  • V. Janjic, C. Brown, K. MacKenzie, K. Hammond, M. Danelutto, M. Aldinucci, and J. D. Garcia, “RPL: A Domain-Specific Language for Designing and Implementing Parallel C++ Applications,” in Proc. of Intl. Euromicro PDP 2016: Parallel Distributed and network-based Processing, Crete, Greece, 2016. doi:10.1109/PDP.2016.122
    [BibTeX] [Abstract] [Download PDF]

    Parallelising sequential applications is usually a very hard job, due to many different ways in which an application can be parallelised and a large number of programming models (each with its own advantages and disadvantages) that can be used. In this paper, we describe a method to semi-automatically generate and evaluate different parallelisations of the same application, allowing programmers to find the best parallelisation without significant manual reengineering of the code. We describe a novel, high-level domain-specific language, Refactoring Pattern Language (RPL), that is used to represent the parallel structure of an application and to capture its extra-functional properties (such as service time). We then describe a set of RPL rewrite rules that can be used to generate alternative, but semantically equivalent, parallel structures (parallelisations) of the same application. We also describe the RPL Shell that can be used to evaluate these parallelisations, in terms of the desired extra-functional properties. Finally, we describe a set of C++ refactorings, targeting OpenMP, Intel TBB and FastFlow parallel programming models, that semi-automatically apply the desired parallelisation to the application’s source code, therefore giving a parallel version of the code. We demonstrate how the RPL and the refactoring rules can be used to derive efficient parallelisations of two realistic C++ use cases (Image Convolution and Ant Colony Optimisation).

    @inproceedings{rpl:pdp:16,
      abstract = {Parallelising sequential applications is usually a very hard job, due to many different ways in which an application can be parallelised and a large number of programming models (each with its own advantages and disadvantages) that can be used. In this paper, we describe a method to semi-automatically generate and evaluate different parallelisations of the same application, allowing programmers to find the best parallelisation without significant manual reengineering of the code. We describe a novel, high-level domain-specific language, Refactoring Pattern Language (RPL), that is used to represent the parallel structure of an application and to capture its extra-functional properties (such as service time). We then describe a set of RPL rewrite rules that can be used to generate alternative, but semantically equivalent, parallel structures (parallelisations) of the same application. We also describe the RPL Shell that can be used to evaluate these parallelisations, in terms of the desired extra-functional properties. Finally, we describe a set of C++ refactorings, targeting OpenMP, Intel TBB and FastFlow parallel programming models, that semi-automatically apply the desired parallelisation to the application's source code, therefore giving a parallel version of the code. We demonstrate how the RPL and the refactoring rules can be used to derive efficient parallelisations of two realistic C++ use cases (Image Convolution and Ant Colony Optimisation).},
      address = {Crete, Greece},
      author = {Vladimir Janjic and Christopher Brown and Kenneth MacKenzie and Kevin Hammond and Marco Danelutto and Marco Aldinucci and Jose Daniel Garcia},
      booktitle = {Proc. of Intl. Euromicro PDP 2016: Parallel Distributed and network-based Processing},
      date-modified = {2017-06-20 08:19:39 +0000},
      doi = {10.1109/PDP.2016.122},
      keywords = {rephrase, fastflow},
      publisher = {IEEE},
      title = {{RPL}: A Domain-Specific Language for Designing and Implementing Parallel C++ Applications},
      url = {https://iris.unito.it/retrieve/handle/2318/1597172/299237/2016_jsupe_stencil_pp_4aperto.pdf},
      year = {2016},
      bdsk-url-1 = {http://hdl.handle.net/2318/1597172},
      bdsk-url-2 = {https://iris.unito.it/retrieve/handle/2318/1597172/299237/2016_jsupe_stencil_pp_4aperto.pdf},
      bdsk-url-3 = {http://dx.doi.org/10.1109/PDP.2016.122}
    }
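
    The rewrite-rule idea can be pictured with a miniature pattern AST and one semantics-preserving rule: farm(P) computes the same function as P, so a rewriter may introduce or eliminate farms while exploring parallelisations. The AST and rule below are our invention, not RPL's actual syntax or rule set.

    #include <iostream>
    #include <memory>
    #include <string>

    struct Pat {
        std::string kind;              // "seq", "farm" or "pipe"
        std::string name;              // payload for "seq"
        std::shared_ptr<Pat> a, b;     // children
    };
    using P = std::shared_ptr<Pat>;
    P seq(std::string n) { return std::make_shared<Pat>(Pat{"seq", std::move(n), nullptr, nullptr}); }
    P farm(P p)          { return std::make_shared<Pat>(Pat{"farm", "", std::move(p), nullptr}); }
    P pipe(P x, P y)     { return std::make_shared<Pat>(Pat{"pipe", "", std::move(x), std::move(y)}); }

    P drop_farms(P p) {                // rule: farm(P) => P
        if (!p) return p;
        if (p->kind == "farm") return drop_farms(p->a);
        p->a = drop_farms(p->a); p->b = drop_farms(p->b);
        return p;
    }

    std::string show(const P& p) {
        if (p->kind == "seq")  return p->name;
        if (p->kind == "farm") return "farm(" + show(p->a) + ")";
        return "pipe(" + show(p->a) + ", " + show(p->b) + ")";
    }

    int main() {
        P app = pipe(farm(seq("convolve")), seq("write"));
        std::string before = show(app), after = show(drop_farms(app));
        std::cout << before << "  =>  " << after << "\n";
    }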

  • M. Aldinucci, S. Campa, M. Danelutto, P. Kilpatrick, and M. Torquati, “Pool Evolution: A Parallel Pattern for Evolutionary and Symbolic Computing,” International Journal of Parallel Programming, vol. 44, iss. 3, pp. 531-551, 2016. doi:10.1007/s10766-015-0358-5
    [BibTeX] [Abstract] [Download PDF]

    We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have application beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations and evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, we discuss its implementation on modern multi/many core architectures and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.

    @article{pool:ijpp:15,
      abstract = {We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have application beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations and evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, we discuss its implementation on modern multi/many core architectures and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.},
      author = {Marco Aldinucci and Sonia Campa and Marco Danelutto and Peter Kilpatrick and Massimo Torquati},
      date-added = {2015-03-21 22:15:47 +0000},
      date-modified = {2015-09-24 11:15:53 +0000},
      doi = {10.1007/s10766-015-0358-5},
      issn = {0885-7458},
      journal = {International Journal of Parallel Programming},
      keywords = {fastflow, paraphrase, repara},
      number = {3},
      pages = {531--551},
      publisher = {Springer US},
      title = {Pool Evolution: A Parallel Pattern for Evolutionary and Symbolic Computing},
      url = {http://calvados.di.unipi.it/storage/paper_files/2015_ff_pool_ijpp.pdf},
      volume = {44},
      year = {2016},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2015_ff_pool_ijpp.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/s10766-015-0358-5}
    }
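
    The pattern's select/evolve/filter loop, reduced to a sequential hill-climbing toy (the FastFlow and Erlang implementations parallelise the evolution phase across the candidate pool):

    #include <algorithm>
    #include <iostream>
    #include <random>
    #include <vector>

    int main() {
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> mut(-0.1, 0.1);
        std::vector<double> pool(64, 5.0);               // candidate pool
        auto fitness = [](double x) { return -x * x; };  // to be maximised

        for (int gen = 0; gen < 200; ++gen) {
            std::vector<double> offspring(pool);
            for (auto& x : offspring) x += mut(rng);     // evolve (mutation)
            for (std::size_t i = 0; i < pool.size(); ++i)
                if (fitness(offspring[i]) > fitness(pool[i]))
                    pool[i] = offspring[i];              // filter survivors
        }
        auto best = *std::max_element(pool.begin(), pool.end(),
            [&](double a, double b) { return fitness(a) < fitness(b); });
        std::cout << "best candidate: " << best << "\n";
    }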

  • B. Nicolae, C. H. A. Costa, C. Misale, K. Katrinis, and Y. Park, “Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics,” in IEEE/ACM 16th Intl. Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016, Cartagena, Colombia, 2016. doi:10.1109/CCGrid.2016.85
    [BibTeX] [Abstract] [Download PDF]

    Big data analytics is an indispensable tool in transforming science, engineering, medicine, healthcare, finance and ultimately business itself. With the explosion of data sizes and need for shorter time-to-solution, in-memory platforms such as Apache Spark gain increasing popularity. However, this introduces important challenges, among which data shuffling is particularly difficult: on one hand it is a key part of the computation that has a major impact on the overall performance and scalability so its efficiency is paramount, while on the other hand it needs to operate with scarce memory in order to leave as much memory available for data caching. In this context, efficient scheduling of data transfers such that it addresses both dimensions of the problem simultaneously is non-trivial. State-of-the-art solutions often rely on simple approaches that yield sub-optimal performance and resource usage. This paper contributes a novel shuffle data transfer strategy that dynamically adapts to the computation with minimal memory utilization, which we briefly underline as a series of design principles.

    @inproceedings{16:ccgrid:misale,
      abstract = {Big data analytics is an indispensable tool in transforming science, engineering, medicine, healthcare, finance and ultimately business itself. With the explosion of data sizes and need for shorter time-to-solution, in-memory platforms such as Apache Spark gain increasing popularity. However, this introduces important challenges, among which data shuffling is particularly difficult: on one hand it is a key part of the computation that has a major impact on the overall performance and scalability so its efficiency is paramount, while on the other hand it needs to operate with scarce memory in order to leave as much memory available for data caching. In this context, efficient scheduling of data transfers such that it addresses both dimensions of the problem simultaneously is non-trivial. State-of-the-art solutions often rely on simple approaches that yield sub-optimal performance and resource usage. This paper contributes a novel shuffle data transfer strategy that dynamically adapts to the computation with minimal memory utilization, which we briefly underline as a series of design principles.},
      address = {Cartagena, Colombia},
      author = {Bogdan Nicolae and Carlos H. A. Costa and Claudia Misale and Kostas Katrinis and Yoonho Park},
      booktitle = {{IEEE/ACM} 16th Intl. Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016},
      date-modified = {2016-07-26 16:02:54 +0200},
      doi = {10.1109/CCGrid.2016.85},
      keywords = {spark, ibm},
      publisher = {IEEE},
      title = {Towards Memory-Optimized Data Shuffling Patterns for Big Data Analytics},
      url = {http://ieeexplore.ieee.org/document/7515716/},
      year = {2016},
      bdsk-url-1 = {http://ieeexplore.ieee.org/document/7515716/},
      bdsk-url-2 = {http://dx.doi.org/10.1109/CCGrid.2016.85}
    }

  • F. Tordini, M. Drocco, C. Misale, L. Milanesi, P. Liò, I. Merelli, M. Torquati, and M. Aldinucci, “NuChart-II: the road to a fast and scalable tool for Hi-C data analysis,” International Journal of High Performance Computing Applications (IJHPCA), pp. 1-16, 2016. doi:10.1177/1094342016668567
    [BibTeX] [Abstract]

    Recent advances in molecular biology and bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is an R package that embeds a C++ engine: computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation raises the level of abstraction and permits to orchestrate analysis and visualisation of genomic data.

    @article{16:ijhpca:nuchart,
      abstract = {Recent advances in molecular biology and bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus of a cell. High-throughput molecular biology techniques provide a genome-wide capture of the spatial organization of chromosomes at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout a genome. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, an efficient and highly optimized tool for genomic data analysis that provides a gene-centric, graph-based representation of genomic information. While designing NuChart-II we addressed several common issues in the parallelisation of memory bound algorithms for shared-memory systems. With performance and usability in mind, NuChart-II is an R package that embeds a C++ engine: computing capabilities and memory hierarchy of multi-core architectures are fully exploited, while the versatile R environment for statistical analysis and data visualisation raises the level of abstraction and permits to orchestrate analysis and visualisation of genomic data.},
      author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Li{\`o} and Ivan Merelli and Massimo Torquati and Marco Aldinucci},
      date-modified = {2016-10-09 21:55:39 +0000},
      doi = {10.1177/1094342016668567},
      journal = {International Journal of High Performance Computing Applications (IJHPCA)},
      keywords = {fastflow, bioinformatics, repara, rephrase, interomics, mimomics},
      pages = {1--16},
      title = {{NuChart-II}: the road to a fast and scalable tool for {Hi-C} data analysis},
      year = {2016},
      bdsk-url-1 = {http://hdl.handle.net/2318/1607126},
      bdsk-url-2 = {http://dx.doi.org/10.1177/1094342016668567}
    }
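
    The gene-centric, graph-based view amounts to a hop-bounded breadth-first expansion around a root gene over Hi-C contact pairs. The toy below invents a handful of gene names and skips the fragment-level work and normalisation that NuChart-II actually performs.

    #include <iostream>
    #include <map>
    #include <queue>
    #include <set>
    #include <string>
    #include <utility>
    #include <vector>

    int main() {
        std::vector<std::pair<std::string, std::string>> contacts = {
            {"TP53", "BRCA1"}, {"BRCA1", "MYC"}, {"MYC", "KRAS"}};
        std::map<std::string, std::vector<std::string>> adj;
        for (auto& c : contacts) {
            adj[c.first].push_back(c.second);
            adj[c.second].push_back(c.first);
        }
        std::set<std::string> seen = {"TP53"};    // root gene
        std::queue<std::pair<std::string, int>> q;
        q.push({"TP53", 0});
        while (!q.empty()) {                      // hop-bounded BFS
            auto node = q.front(); q.pop();
            if (node.second == 2) continue;       // neighbourhood radius
            for (auto& n : adj[node.first])
                if (seen.insert(n).second) q.push({n, node.second + 1});
        }
        for (auto& g : seen) std::cout << g << " ";
        std::cout << "\n";
    }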

  • F. Tordini, I. Merelli, P. Liò, L. Milanesi, and M. Aldinucci, “NuchaRt: embedding high-level parallel computing in R for augmented Hi-C data analysis,” in Computational Intelligence Methods for Bioinformatics and Biostatistics, S. I. Publishing, Ed., Cham (ZG): Springer International Publishing, 2016, vol. 9874, pp. 259-272. doi:10.1007/978-3-319-44332-4
    [BibTeX] [Abstract] [Download PDF]

    Recent advances in molecular biology and Bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus. High-throughput chromosome conformation capture techniques provide a genome-wide capture of chromatin contacts at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout the human genome. These important studies are hampered by the lack of biologists-friendly software. In this work we present NuchaRt, an R package that wraps NuChart-II, an efficient and highly optimized C++ tool for the exploration of Hi-C data. By raising the level of abstraction, NuchaRt proposes a high-performance pipeline that allows users to orchestrate analysis and visualisation of multi-omics data, making optimal use of the computing capabilities offered by modern multi-core architectures, combined with the versatile and well known R environment for statistical analysis and data visualisation.

    @incollection{15:lnbi:nuchaRt,
      abstract = {Recent advances in molecular biology and Bioinformatics techniques brought to an explosion of the information about the spatial organisation of the DNA in the nucleus. High-throughput chromosome conformation capture techniques provide a genome-wide capture of chromatin contacts at unprecedented scales, which permit to identify physical interactions between genetic elements located throughout the human genome. These important studies are hampered by the lack of biologists-friendly software. In this work we present NuchaRt, an R package that wraps NuChart-II, an efficient and highly optimized C++ tool for the exploration of Hi-C data. By raising the level of abstraction, NuchaRt proposes a high-performance pipeline that allows users to orchestrate analysis and visualisation of multi-omics data, making optimal use of the computing capabilities offered by modern multi-core architectures, combined with the versatile and well known R environment for statistical analysis and data visualisation.},
      address = {Cham (ZG)},
      author = {Fabio Tordini and Ivan Merelli and Pietro Li{\`o} and Luciano Milanesi and Marco Aldinucci},
      booktitle = {Computational Intelligence Methods for Bioinformatics and Biostatistics},
      doi = {10.1007/978-3-319-44332-4},
      editor = {Springer International Publishing},
      isbn = {978-3-319-44331-7},
      keywords = {fastflow, bioinformatics, repara, interomics, mimomics},
      pages = {259--272},
      publisher = {Springer International Publishing},
      series = {{Lecture Notes in Computer Science}},
      title = {{NuchaRt}: embedding high-level parallel computing in {R} for augmented {Hi-C} data analysis},
      url = {http://link.springer.com/book/10.1007%2F978-3-319-44332-4},
      volume = {9874},
      year = {2016},
      bdsk-url-1 = {http://link.springer.com/book/10.1007%2F978-3-319-44332-4},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-44332-4}
    }
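
    The embedding itself follows the usual R-wraps-C++ recipe; with Rcpp it takes a single attribute. The function below is a generic example (its name and purpose are invented, not NuchaRt's interface): compiled via Rcpp::sourceCpp(), it becomes callable from the R session, e.g. degree_stats(c(1, 5, 3)).

    #include <Rcpp.h>

    // [[Rcpp::export]]
    double degree_stats(Rcpp::NumericVector degrees) {
        // heavy graph work lives in C++; R keeps statistics and plotting
        double sum = 0.0;
        for (double d : degrees) sum += d;
        return degrees.size() ? sum / degrees.size() : 0.0;
    }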

  • F. Tordini, “The road towards a Cloud-based High-Performance solution for genomic data analysis,” PhD Thesis, 2016.
    [BibTeX] [Abstract] [Download PDF]

    Nowadays, molecular biology laboratories are delivering more and more data about DNA organisation, at increasing resolution and in a large number of samples. So much that genomic research is now facing many of the scale-out issues that high-performance computing has been addressing for years: they require powerful infrastructures with fast computing and storage capabilities, with substantial challenges in terms of data processing, statistical analysis and data representation. With this thesis we propose a high-performance pipeline for the analysis and interpretation of heterogeneous genomic information: besides performance, usability and availability are two essential requirements that novel Bioinformatics tools should satisfy. In this perspective, we propose and discuss our efforts towards a solid infrastructure for data processing and storage, where software that operates over data is exposed as a service, and is accessible by users through the Internet. We begin by presenting NuChart-II, a tool for the analysis and interpretation of spatial genomic information. With NuChart-II we propose a graph-based representation of genomic data, which can provide insights on the disposition of genomic elements in the DNA. We also discuss our approach for the normalisation of biases that affect raw sequenced data. We believe that many currently available tools for genomic data analysis are perceived as tricky and troublesome applications, that require highly specialised skills to obtain the desired outcomes. Concerning usability, we want to raise the level of abstraction perceived by the user, but maintain high performance and correctness while providing an exhaustive solution for data visualisation. We also intend to foster the availability of novel tools: in this work we also discuss a cloud solution that delivers computation and storage as dynamically allocated virtual resources via the Internet, while needed software is provided as a service. In this way, the computational demand of genomic research can be satisfied more economically by using lab-scale and enterprise-oriented technologies. Here we discuss our idea of a task farm for the integration of heterogeneous data resulting from different sequencing experiments: we believe that the integration of multi-omic features on a nuclear map can be a valuable means for studying the interactions among genetic elements. This can reveal insights on biological mechanisms, such as genes regulation, translocations and epigenetic patterns.

    @phdthesis{tordiniThesis16,
      abstract = {Nowadays, molecular biology laboratories are delivering more and more data about DNA organisation, at increasing resolution and in a large number of samples. So much that genomic research is now facing many of the scale-out issues that high-performance computing has been addressing for years: they require powerful infrastructures with fast computing and storage capabilities, with substantial challenges in terms of data processing, statistical analysis and data representation.
      With this thesis we propose a high-performance pipeline for the analysis and interpretation of heterogeneous genomic information: besides performance, usability and availability are two essential requirements that novel Bioinformatics tools should satisfy. In this perspective, we propose and discuss our efforts towards a solid infrastructure for data processing and storage, where software that operates over data is exposed as a service, and is accessible by users through the Internet.
      We begin by presenting NuChart-II, a tool for the analysis and interpretation of spatial genomic information. With NuChart-II we propose a graph-based representation of genomic data, which can provide insights on the disposition of genomic elements in the DNA. We also discuss our approach for the normalisation of biases that affect raw sequenced data.
      We believe that many currently available tools for genomic data analysis are perceived as tricky and troublesome applications, that require highly specialised skills to obtain the desired outcomes. Concerning usability, we want to raise the level of abstraction perceived by the user, but maintain high performance and correctness while providing an exhaustive solution for data visualisation.
      We also intend to foster the availability of novel tools: in this work we also discuss a cloud solution that delivers computation and storage as dynamically allocated virtual resources via the Internet, while needed software is provided as a service. In this way, the computational demand of genomic research can be satisfied more economically by using lab-scale and enterprise-oriented technologies. Here we discuss our idea of a task farm for the integration of heterogeneous data resulting from different sequencing experiments: we believe that the integration of multi-omic features on a nuclear map can be a valuable means for studying the interactions among genetic elements. This can reveal insights on biological mechanisms, such as genes regulation, translocations and epigenetic patterns.},
      author = {Fabio Tordini},
      keywords = {fastflow, bioinformatics},
      month = {4},
      school = {Computer Science Department, University of Torino, Italy},
      title = {{The road towards a Cloud-based High-Performance solution for genomic data analysis}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2016_tordini_phdthesis.pdf},
      year = {2016},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2016_tordini_phdthesis.pdf}
    }

2015

  • M. Aldinucci, M. Danelutto, M. Drocco, P. Kilpatrick, G. P. Pezzi, and M. Torquati, “The Loop-of-Stencil-Reduce paradigm,” in Proc. of Intl. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara), Helsinki, Finland, 2015, pp. 172-177. doi:10.1109/Trustcom.2015.628
    [BibTeX] [Abstract] [Download PDF]

    In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of the Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both.

    @inproceedings{opencl:ff:ispa:15,
      abstract = {In this paper we advocate the Loop-of-stencil-reduce pattern as a way to simplify the parallel programming of heterogeneous platforms (multicore+GPUs). Loop-of-Stencil-reduce is general enough to subsume map, reduce, map-reduce, stencil, stencil-reduce, and, crucially, their usage in a loop. It transparently targets (by using OpenCL) combinations of CPU cores and GPUs, and it makes it possible to simplify the deployment of a single stencil computation kernel on different GPUs. The paper discusses the implementation of Loop-of-stencil-reduce within the FastFlow parallel framework, considering a simple iterative data-parallel application as running example (Game of Life) and a highly effective parallel filter for visual data restoration to assess performance. Thanks to the high-level design of the Loop-of-stencil-reduce, it was possible to run the filter seamlessly on a multicore machine, on multi-GPUs, and on both.},
      address = {Helsinki, Finland},
      author = {Marco Aldinucci and Marco Danelutto and Maurizio Drocco and Peter Kilpatrick and Guilherme {Peretti Pezzi} and Massimo Torquati},
      booktitle = {Proc. of Intl. Workshop on Reengineering for Parallelism in Heterogeneous Parallel Platforms (RePara)},
      date-added = {2015-07-05 09:48:33 +0000},
      date-modified = {2015-09-24 11:14:56 +0000},
      doi = {10.1109/Trustcom.2015.628},
      keywords = {fastflow, repara, nvidia},
      month = aug,
      pages = {172-177},
      publisher = {IEEE},
      title = {The Loop-of-Stencil-Reduce paradigm},
      url = {http://calvados.di.unipi.it/storage/paper_files/2015_RePara_ISPA.pdf},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2015_RePara_ISPA.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/Trustcom.2015.628}
    }

  • M. Drocco, C. Misale, G. P. Pezzi, F. Tordini, and M. Aldinucci, “Memory-Optimised Parallel Processing of Hi-C Data,” in Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing, 2015, pp. 1-8. doi:10.1109/PDP.2015.63
    [BibTeX] [Abstract] [Download PDF]

    This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures since the scalability is rather limited due to the memory bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as running example NuChart-II, a tool for annotation and statistical analysis of Hi-C data that creates a gene-centric neighborhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issues in the parallelisation of memory bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, by which it was not possible to complete the graph generation because of strong memory-saturation problems.

    @inproceedings{nuchart:speedup:15,
      abstract = {This paper presents the optimisation efforts on the creation of a graph-based mapping representation of gene adjacency. The method is based on the Hi-C process, starting from Next Generation Sequencing data, and it analyses a huge amount of static data in order to produce maps for one or more genes. Straightforward parallelisation of this scheme does not yield acceptable performance on multicore architectures since the scalability is rather limited due to the memory bound nature of the problem. This work focuses on the memory optimisations that can be applied to the graph construction algorithm and its (complex) data structures to derive a cache-oblivious algorithm and eventually to improve the memory bandwidth utilisation. We used as running example NuChart-II, a tool for annotation and statistical analysis of Hi-C data that creates a gene-centric neighborhood graph. The proposed approach, which is exemplified for Hi-C, addresses several common issues in the parallelisation of memory bound algorithms for multicore. Results show that the proposed approach is able to increase the parallel speedup from 7x to 22x (on a 32-core platform). Finally, the proposed C++ implementation outperforms the first R NuChart prototype, by which it was not possible to complete the graph generation because of strong memory-saturation problems.},
      author = {Maurizio Drocco and Claudia Misale and Guilherme {Peretti Pezzi} and Fabio Tordini and Marco Aldinucci},
      booktitle = {Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing},
      date-added = {2014-12-03 13:54:08 +0000},
      date-modified = {2015-09-24 11:17:47 +0000},
      doi = {10.1109/PDP.2015.63},
      keywords = {fastflow,bioinformatics, paraphrase, repara, impact},
      month = mar,
      pages = {1-8},
      publisher = {IEEE},
      title = {Memory-Optimised Parallel Processing of {Hi-C} Data},
      url = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_memopt.pdf},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_memopt.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2015.63}
    }
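
    A representative example of the memory optimisations at stake: replacing an array-of-structures edge list with a structure-of-arrays, so that hot loops stream only the fields they touch instead of dragging whole records through the cache. The data structures below are illustrative, not the paper's.

    #include <cstddef>
    #include <iostream>
    #include <vector>

    struct EdgeAoS { long src; long dst; double weight; };  // "before" layout

    struct EdgesSoA {                       // "after": one array per field
        std::vector<long> src, dst;
        std::vector<double> weight;
    };

    int main() {
        const std::size_t n = 1000000;
        EdgesSoA e;
        e.src.assign(n, 0);
        e.dst.assign(n, 0);
        e.weight.assign(n, 1.0);

        double total = 0.0;
        for (std::size_t i = 0; i < n; ++i)
            total += e.weight[i];           // contiguous scan of one field only
        std::cout << total << "\n";
    }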

  • F. Tordini, M. Drocco, C. Misale, L. Milanesi, P. Liò, I. Merelli, and M. Aldinucci, “Parallel Exploration of the Nuclear Chromosome Conformation with NuChart-II,” in Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing, 2015. doi:10.1109/PDP.2015.104
    [BibTeX] [Abstract] [Download PDF]

    High-throughput molecular biology techniques are widely used to identify physical interactions between genetic elements located throughout the human genome. Chromosome Conformation Capture (3C) and other related techniques allow to investigate the spatial organisation of chromosomes in the cell’s natural state. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we introduce NuChart-II, a tool for Hi-C data analysis that provides a gene-centric view of the chromosomal neighbourhood in a graph-based manner. NuChart-II is an efficient and highly optimized C++ re-implementation of a previous prototype package developed in R. Representing Hi-C data using a graph-based approach overcomes the common view relying on genomic coordinates and permits the use of graph analysis techniques to explore the spatial conformation of a gene neighbourhood.

    @inproceedings{nuchar:tool:15,
      abstract = {High-throughput molecular biology techniques are widely used to identify physical interactions between genetic elements located throughout the human genome. Chromosome Conformation Capture (3C) and other related techniques allow to investigate the spatial organisation of chromosomes in the cell's natural state. Recent results have shown that there is a large correlation between co-localization and co-regulation of genes, but these important information are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we introduce NuChart-II, a tool for Hi-C data analysis that provides a gene-centric view of the chromosomal neighbourhood in a graph-based manner. NuChart-II is an efficient and highly optimized C++ re-implementation of a previous prototype package developed in R. Representing Hi-C data using a graph-based approach overcomes the common view relying on genomic coordinates and permits the use of graph analysis techniques to explore the spatial conformation of a gene neighbourhood.
    },
      author = {Fabio Tordini and Maurizio Drocco and Claudia Misale and Luciano Milanesi and Pietro Li{\`o} and Ivan Merelli and Marco Aldinucci},
      booktitle = {Proc. of Intl. Euromicro PDP 2015: Parallel Distributed and network-based Processing},
      date-added = {2014-12-03 13:51:17 +0000},
      date-modified = {2015-09-24 11:16:43 +0000},
      doi = {10.1109/PDP.2015.104},
      keywords = {fastflow, bioinformatics, paraphrase, repara, impact},
      month = mar,
      publisher = {IEEE},
      title = {Parallel Exploration of the Nuclear Chromosome Conformation with {NuChart-II}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_nuchartff.pdf},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2015_pdp_nuchartff.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2015.104}
    }

  • G. P. Pezzi, E. Vaissié, Y. Viala, D. Caromel, and P. Gourbesville, “Parallel profiling of water distribution networks using the Clément formula,” Applied Mathematics and Computation, vol. 267, pp. 83-95, 2015. doi:10.1016/j.amc.2015.05.084
    [BibTeX] [Abstract] [Download PDF]

    Optimization of water distribution is a crucial issue which has been targeted by many modeling tools. Useful models, implemented several decades ago, need to be updated and implemented in more powerful computing environments. This paper presents the distributed and redesigned version of a legacy hydraulic simulation software written in Fortran (IRMA) that has been used for over 30 years by the Société du Canal de Provence in order to design and to maintain water distribution networks. IRMA was developed aiming mainly at the treatment of irrigation networks — by using the Clément demand model and is now used to manage more than 6000 km of piped networks. The complexity and size of networks have been growing since the creation of IRMA and the legacy software could not handle the simulation of very large networks in terms of performance. This limitation has finally imposed to redesign the code by using modern tools and language (Java), and also to run distributed simulations by using the ProActive Parallel Suite.

    @article{PerettiPezzi201583,
      abstract = {Optimization of water distribution is a crucial issue which has been targeted by many modeling tools. Useful models, implemented several decades ago, need to be updated and implemented in more powerful computing environments. This paper presents the distributed and redesigned version of a legacy hydraulic simulation software written in Fortran (IRMA) that has been used for over 30 years by the Soci{\'e}t{\'e} du Canal de Provence in order to design and to maintain water distribution networks. IRMA was developed aiming mainly at the treatment of irrigation networks -- by using the Cl{\'e}ment demand model and is now used to manage more than 6000 km of piped networks. The complexity and size of networks have been growing since the creation of IRMA and the legacy software could not handle the simulation of very large networks in terms of performance. This limitation has finally imposed to redesign the code by using modern tools and language (Java), and also to run distributed simulations by using the ProActive Parallel Suite.},
      author = {Guilherme Peretti Pezzi and Evelyne Vaissi{\'e} and Yann Viala and Denis Caromel and Philippe Gourbesville},
      date-modified = {2015-09-27 12:12:12 +0000},
      doi = {10.1016/j.amc.2015.05.084},
      issn = {0096-3003},
      journal = {Applied Mathematics and Computation},
      keywords = {Java, impact},
      note = {The Fourth European Seminar on Computing (ESCO 2014)},
      pages = {83--95},
      title = {Parallel profiling of water distribution networks using the Cl{\'e}ment formula},
      url = {http://www.sciencedirect.com/science/article/pii/S0096300315007080},
      volume = {267},
      year = {2015},
      bdsk-url-1 = {http://www.sciencedirect.com/science/article/pii/S0096300315007080},
      bdsk-url-2 = {http://dx.doi.org/10.1016/j.amc.2015.05.084}
    }
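
    For context, the Clément demand model referenced above is, in its commonly stated first form, the design discharge

    \[
    Q_d = \sum_{i=1}^{n} p_i\, d_i \;+\; U(P_q)\,\sqrt{\sum_{i=1}^{n} p_i\,(1 - p_i)\,d_i^{2}},
    \]

    where $d_i$ is the nominal discharge of hydrant $i$, $p_i$ the probability that it is open, and $U(P_q)$ the standard normal quantile associated with the chosen operating quality $P_q$.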

  • F. Tordini, M. Drocco, I. Merelli, L. Milanesi, P. Liò, and M. Aldinucci, “NuChart-II: a graph-based approach for the analysis and interpretation of Hi-C data,” in Computational Intelligence Methods for Bioinformatics and Biostatistics – 11th International Meeting, CIBB 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers, Cambridge, UK, 2015, pp. 298-311. doi:10.1007/978-3-319-24462-4_25
    [BibTeX] [Abstract] [Download PDF]

    Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expression, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but this important research is hampered by the lack of biologist-friendly analysis and visualisation software. In this work we present NuChart-II, a software that allows the user to annotate and visualize a list of input genes with information relying on Hi-C data, integrating knowledge data about genomic features that are involved in the chromosome spatial organization. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable problems of scalability while working genome-wide on large datasets: particular attention has been paid in optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.

    @inproceedings{14:ff:nuchart:cibb,
      abstract = {Long-range chromosomal associations between genomic regions, and their repositioning in the 3D space of the nucleus, are now considered to be key contributors to the regulation of gene expressions, and important links have been highlighted with other genomic features involved in DNA rearrangements. Recent Chromosome Conformation Capture (3C) measurements performed with high throughput sequencing (Hi-C) and molecular dynamics studies show that there is a large correlation between co-localization and co-regulation of genes, but these important researches are hampered by the lack of biologists-friendly analysis and visualisation software. In this work we present NuChart-II, a software that allows the user to annotate and visualize a list of input genes with information relying on Hi-C data, integrating knowledge data about genomic features that are involved in the chromosome spatial organization. This software works directly with sequenced reads to identify related Hi-C fragments, with the aim of creating gene-centric neighbourhood graphs on which multi-omics features can be mapped. NuChart-II is a highly optimized implementation of a previous prototype package developed in R, in which the graph-based representation of Hi-C data was tested. The prototype showed inevitable problems of scalability while working genome-wide on large datasets: particular attention has been paid in optimizing the data structures employed while constructing the neighbourhood graph, so as to foster an efficient parallel implementation of the software. The normalization of Hi-C data has been modified and improved, in order to provide a reliable estimation of proximity likelihood for the genes.},
      address = {Cambridge, UK},
      author = {Fabio Tordini and Maurizio Drocco and Ivan Merelli and Luciano Milanesi and Pietro Li{\`o} and Marco Aldinucci},
      booktitle = {Computational Intelligence Methods for Bioinformatics and Biostatistics - 11th International Meeting, {CIBB} 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers},
      date-modified = {2015-09-24 11:22:30 +0000},
      doi = {10.1007/978-3-319-24462-4_25},
      editor = {Clelia Di Serio and Pietro Li{\`{o}} and Alessandro Nonis and Roberto Tagliaferri},
      isbn = {978-3-319-24461-7},
      keywords = {fastflow, bioinformatics, paraphrase, repara, interomics, mimomics, hirma},
      pages = {298-311},
      publisher = {Springer},
      series = {{LNCS}},
      title = {{NuChart-II}: a graph-based approach for the analysis and interpretation of {Hi-C} data},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_nuchart_cibb.pdf},
      volume = {8623},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_nuchart_cibb.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-24462-4_25}
    }
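
    The gene-centric neighbourhood graph described above can be pictured as a breadth-first expansion from a seed gene over Hi-C contact fragments: level-1 neighbours are genes in direct contact, level-2 neighbours are contacts of contacts, and so on. A minimal sketch of that idea in plain C++ (the Gene and ContactMap types and the contact lookup are hypothetical stand-ins, not NuChart-II's actual data structures):

      #include <queue>
      #include <unordered_map>
      #include <unordered_set>
      #include <utility>
      #include <vector>

      using Gene = int;
      // Hypothetical contact map: for each gene, the genes whose Hi-C
      // fragments pair with one of its fragments.
      using ContactMap = std::unordered_map<Gene, std::vector<Gene>>;

      // Collect the neighbourhood of `seed` up to `levels` hops away.
      std::unordered_set<Gene> neighbourhood(const ContactMap& hic,
                                             Gene seed, int levels) {
          std::unordered_set<Gene> visited{seed};
          std::queue<std::pair<Gene, int>> frontier;
          frontier.push({seed, 0});
          while (!frontier.empty()) {
              auto [g, depth] = frontier.front();
              frontier.pop();
              if (depth == levels) continue;      // stop expanding here
              auto it = hic.find(g);
              if (it == hic.end()) continue;      // gene has no contacts
              for (Gene next : it->second)
                  if (visited.insert(next).second) // first time seen
                      frontier.push({next, depth + 1});
          }
          return visited;
      }

    The per-gene expansions are independent of one another, which is what makes the genome-wide construction amenable to the parallel implementation the paper describes.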

  • M. Aldinucci, G. P. Pezzi, M. Drocco, C. Spampinato, and M. Torquati, “Parallel Visual Data Restoration on Multi-GPGPUs using Stencil-Reduce Pattern,” International Journal of High Performance Computing Applications, vol. 29, iss. 4, pp. 461-472, 2015. doi:10.1177/1094342014567907
    [BibTeX] [Abstract] [Download PDF]

    In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.

    @article{ff:denoiser:ijhpca:15,
      abstract = {In this paper, a highly effective parallel filter for visual data restoration is presented. The filter is designed following a skeletal approach, using a newly proposed stencil-reduce, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multicore machine, on multi-GPGPUs, or on both. The design and implementation of the filter are discussed, and an experimental evaluation is presented.},
      author = {Marco Aldinucci and Guilherme {Peretti Pezzi} and Maurizio Drocco and Concetto Spampinato and Massimo Torquati},
      date-added = {2014-08-23 00:06:10 +0000},
      date-modified = {2015-09-24 11:21:20 +0000},
      doi = {10.1177/1094342014567907},
      journal = {International Journal of High Performance Computing Applications},
      keywords = {fastflow, paraphrase, impact, nvidia},
      number = {4},
      pages = {461-472},
      title = {Parallel Visual Data Restoration on Multi-{GPGPUs} using Stencil-Reduce Pattern},
      url = {http://calvados.di.unipi.it/storage/paper_files/2015_ff_stencilreduce_ijhpca.pdf},
      volume = {29},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2015_ff_stencilreduce_ijhpca.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1177/1094342014567907}
    }
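
    The stencil-reduce pattern named in the title pairs a local stencil map (each output point computed from a point and its neighbours) with a global reduction whose result typically drives a convergence test. A minimal sequential sketch of one such iteration on a 1-D array, written in plain C++ rather than the FastFlow API (function names are illustrative only):

      #include <cmath>
      #include <vector>

      // One stencil-reduce iteration: apply a 3-point stencil to every
      // interior element (the "stencil" map) and fold the per-element
      // updates into a global residual (the "reduce"). In the pattern the
      // map step runs in parallel on CPU cores or GPUs and the reduction
      // result drives the convergence test.
      double stencil_reduce_step(const std::vector<double>& in,
                                 std::vector<double>& out) {
          double residual = 0.0;
          for (std::size_t i = 1; i + 1 < in.size(); ++i) {
              out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0; // stencil
              residual += std::fabs(out[i] - in[i]);          // reduce
          }
          return residual;
      }

      // Iterate until the residual falls below a tolerance.
      void smooth(std::vector<double>& a, double tol) {
          std::vector<double> b(a);
          while (stencil_reduce_step(a, b) > tol) a.swap(b);
      }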

  • M. Aldinucci, A. Bracciali, T. Marschall, M. Patterson, N. Pisanti, and M. Torquati, “High-Performance Haplotype Assembly,” in Computational Intelligence Methods for Bioinformatics and Biostatistics – 11th International Meeting, CIBB 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers, Cambridge, UK, 2015, pp. 245-258. doi:10.1007/978-3-319-24462-4_21
    [BibTeX] [Abstract] [Download PDF]

    The problem of Haplotype Assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractability (FPT), including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process. Technological improvements are currently increasing fragment length, which drastically elevates computational costs for such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent FPT optimal approach to MEC. WhatsHap moves complexity from fragment length to fragment overlap and is hence of particular interest when considering sequencing technology’s current trends. pWhatsHap further improves the efficiency in solving the MEC problem, as shown by experiments performed on datasets with high coverage.

    @inproceedings{14:ff:whatsapp:cibb,
      abstract = {The problem of Haplotype Assembly is an essential step in human genome analysis. It is typically formalised as the Minimum Error Correction (MEC) problem which is NP-hard. MEC has been approached using heuristics, integer linear programming, and fixed-parameter tractability (FPT), including approaches whose runtime is exponential in the length of the DNA fragments obtained by the sequencing process. Technological improvements are currently increasing fragment length, which drastically elevates computational costs for such methods. We present pWhatsHap, a multi-core parallelisation of WhatsHap, a recent FPT optimal approach to MEC. WhatsHap moves complexity from fragment length to fragment overlap and is hence of particular interest when considering sequencing technology's current trends. pWhatsHap further improves the efficiency in solving the MEC problem, as shown by experiments performed on datasets with high coverage.},
      address = {Cambridge, UK},
      author = {Marco Aldinucci and Andrea Bracciali and Tobias Marschall and Murray Patterson and Nadia Pisanti and Massimo Torquati},
      booktitle = {Computational Intelligence Methods for Bioinformatics and Biostatistics - 11th International Meeting, {CIBB} 2014, Cambridge, UK, June 26-28, 2014, Revised Selected Papers},
      date-added = {2014-12-01 23:07:21 +0000},
      date-modified = {2016-08-20 14:15:59 +0000},
      doi = {10.1007/978-3-319-24462-4_21},
      editor = {Clelia Di Serio and Pietro Li{\`{o}} and Alessandro Nonis and Roberto Tagliaferri},
      keywords = {fastflow, bioinformatics},
      pages = {245--258},
      publisher = {Springer},
      series = {{LNCS}},
      title = {High-Performance Haplotype Assembly},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_pHaplo_cibb.pdf},
      volume = {8623},
      year = {2015},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_pHaplo_cibb.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-319-24462-4_21}
    }

  • I. Merelli, F. Tordini, M. Drocco, M. Aldinucci, P. Liò, and L. Milanesi, “Integrating Multi-omic features exploiting Chromosome Conformation Capture data,” Frontiers in Genetics, vol. 6, iss. 40, 2015. doi:10.3389/fgene.2015.00040
    [BibTeX] [Abstract] [Download PDF]

    The representation, integration and interpretation of omic data is a complex task, in particular considering the huge amount of information that is daily produced in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture (3C) allows the analysis of the chromosome organization in the cell’s natural state. When performed genome-wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from the information about gene positions, with the possibility of mapping onto the resulting graphs genomic features such as methylation patterns and histone modifications, along with expression profiles. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.

    @article{nuchart:frontiers:15,
      abstract = {The representation, integration and interpretation of omic data is a complex task, in particular considering the huge amount of information that is daily produced in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progresses in high throughput molecular biology techniques and bioinformatics have provided insights into chromatin interactions on a larger scale and offer a formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture (3C) allows the analysis of the chromosome organization in the cell's natural state. While performed genome wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighbourhood starting from the information about gene positions, with the possibility of mapping on the achieved graphs genomic features such as methylation patterns and histone modifications, along with expression profiles. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.},
      author = {Merelli, Ivan and Tordini, Fabio and Drocco, Maurizio and Aldinucci, Marco and Li{\`o}, Pietro and Milanesi, Luciano},
      date-added = {2015-02-01 16:38:47 +0000},
      date-modified = {2015-09-24 11:23:10 +0000},
      doi = {10.3389/fgene.2015.00040},
      issn = {1664-8021},
      journal = {Frontiers in Genetics},
      keywords = {bioinformatics, fastflow, interomics, hirma, mimomics},
      number = {40},
      title = {Integrating Multi-omic features exploiting {Chromosome Conformation Capture} data},
      url = {http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00040/pdf},
      volume = {6},
      year = {2015},
      bdsk-url-1 = {http://journal.frontiersin.org/Journal/10.3389/fgene.2015.00040/pdf},
      bdsk-url-2 = {http://dx.doi.org/10.3389/fgene.2015.00040}
    }

2014

  • M. G. Epitropakis, A. Bracciali, M. Aldinucci, E. Potts, and E. K. Burke, “Predictive scheduling for optimal cloud configuration,” in Proc. of 10th Intl. Conference on the Practice and Theory of Automated Timetabling, York, United Kingdom, 2014.
    [BibTeX] [Download PDF]
    @inproceedings{cloud:patat:14,
      address = {York, United Kingdom},
      author = {Michael G. Epitropakis and Andrea Bracciali and Marco Aldinucci and Emily Potts and Edmund K. Burke},
      booktitle = {Proc. of 10th Intl. Conference on the Practice and Theory of Automated Timetabling},
      date-added = {2015-03-15 14:34:03 +0000},
      date-modified = {2015-03-15 15:25:52 +0000},
      editor = {Ender \"Ozcan and Edmund K. Burke and Barry McCollum},
      isbn = {978-0-9929984-0-0},
      month = aug,
      publisher = {PATAT},
      title = {Predictive scheduling for optimal cloud configuration},
      url = {http://www.patatconference.org/patat2014/proceedings/3_12.pdf},
      year = {2014},
      bdsk-url-1 = {http://www.patatconference.org/patat2014/proceedings/3_12.pdf}
    }

  • M. Aldinucci, S. Campa, M. Danelutto, P. Kilpatrick, and M. Torquati, “Pool evolution: a domain specific parallel pattern,” in Proc. of the 7th Intl. Symposium on High-level Parallel Programming and Applications (HLPP), Amsterdam, The Netherlands, 2014.
    [BibTeX] [Abstract] [Download PDF]

    We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have application beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations and evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, developed in the framework of the ParaPhrase EU-funded FP7 project, we discuss its implementation on modern multi/many-core architectures and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.

    @inproceedings{2014:ff:pool:hlpp,
      abstract = {We introduce a new parallel pattern derived from a specific application domain and show how it turns out to have application beyond its domain of origin. The pool evolution pattern models the parallel evolution of a population subject to mutations and evolving in such a way that a given fitness function is optimized. The pattern has been demonstrated to be suitable for capturing and modeling the parallel patterns underpinning various evolutionary algorithms, as well as other parallel patterns typical of symbolic computation. In this paper we introduce the pattern, developed in the framework of the ParaPhrase EU-funded FP7 project, we discuss its implementation on modern multi/many core architectures and finally present experimental results obtained with FastFlow and Erlang implementations to assess its feasibility and scalability.},
      address = {Amsterdam, The Netherlands},
      author = {Marco Aldinucci and Sonia Campa and Marco Danelutto and Peter Kilpatrick and Massimo Torquati},
      booktitle = {Proc. of the 7th Intl. Symposium on High-level Parallel Programming and Applications (HLPP)},
      date-modified = {2015-09-27 12:14:30 +0000},
      keywords = {fastflow, paraphrase, repara},
      month = jul,
      title = {Pool evolution: a domain specific parallel pattern},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_hlpp_pool.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_hlpp_pool.pdf}
    }
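
    The pattern is easy to picture in code: each generation selects candidates from the population, evolves them independently (the step a parallel implementation runs as a map or farm), and filters the results back into the pool by fitness. A generic sketch under those assumptions, in plain C++ rather than the paper's FastFlow or Erlang implementations:

      #include <algorithm>
      #include <vector>

      // Generic pool evolution: selection, independent evolution, filter.
      template <typename T, typename Select, typename Evolve, typename Fitness>
      void pool_evolution(std::vector<T>& pool, Select select, Evolve evolve,
                          Fitness fitness, int generations) {
          const std::size_t size = pool.size();
          for (int g = 0; g < generations; ++g) {
              std::vector<T> selected = select(pool);   // selection
              for (T& t : selected) t = evolve(t);      // parallelisable map
              // Filter: merge evolved candidates back, keep the fittest.
              pool.insert(pool.end(), selected.begin(), selected.end());
              std::sort(pool.begin(), pool.end(),
                        [&](const T& a, const T& b) {
                            return fitness(a) > fitness(b);
                        });
              pool.resize(size);                        // fixed pool size
          }
      }

    Only the evolve step needs to be parallelised to cover the evolutionary and symbolic-computation uses the abstract mentions; selection and filtering stay as sequential glue.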

  • M. Aldinucci, M. Torquati, M. Drocco, G. P. Pezzi, and C. Spampinato, “An Overview of FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs,” in GPU Technology Conference (GTC 2014), San Jose, CA, USA, 2014.
    [BibTeX] [Abstract] [Download PDF]

    Get an overview of how FastFlow’s parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. A more detailed and technical review of FastFlow’s parallel patterns, together with a use case showing the design and effectiveness of a novel universal image filtering template based on the variational approach, is given in the companion session.

    @inproceedings{ff:gtc:2014:short,
      abstract = {Get an overview of how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. A more detailed and technical review of FastFlow's parallel patterns, together with a use case showing the design and effectiveness of a novel universal image filtering template based on the variational approach, is given in the companion session.},
      address = {San Jose, CA, USA},
      author = {Marco Aldinucci and Massimo Torquati and Maurizio Drocco and Guilherme {Peretti Pezzi} and Concetto Spampinato},
      booktitle = {GPU Technology Conference (GTC 2014)},
      date-added = {2014-04-13 23:20:52 +0000},
      date-modified = {2016-08-19 21:45:51 +0000},
      keywords = {fastflow, gpu, nvidia, impact, paraphrase},
      month = mar,
      title = {An Overview of FastFlow: Combining Pattern-Level Abstraction and Efficiency in {GPGPUs}},
      url = {http://calvados.di.unipi.it/storage/talks/2014_S4585-Marco-Aldinucci.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/talks/2014_S4585-Marco-Aldinucci.pdf}
    }

  • M. Aldinucci, M. Torquati, M. Drocco, G. P. Pezzi, and C. Spampinato, “FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs,” in GPU Technology Conference (GTC 2014), San Jose, CA, USA, 2014.
    [BibTeX] [Abstract] [Download PDF]

    Learn how FastFlow’s parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. As a use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.

    @inproceedings{ff:gtc:2014,
      abstract = {Learn how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. As use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.},
      address = {San Jose, CA, USA},
      author = {Marco Aldinucci and Massimo Torquati and Maurizio Drocco and Guilherme {Peretti Pezzi} and Concetto Spampinato},
      booktitle = {GPU Technology Conference (GTC 2014)},
      date-added = {2014-04-19 12:52:40 +0000},
      date-modified = {2016-08-19 21:45:39 +0000},
      keywords = {fastflow, gpu, nvidia, impact, paraphrase},
      month = mar,
      title = {FastFlow: Combining Pattern-Level Abstraction and Efficiency in {GPGPUs}},
      url = {http://calvados.di.unipi.it/storage/talks/2014_S4729-Marco-Aldinucci.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/talks/2014_S4729-Marco-Aldinucci.pdf}
    }

  • M. Drocco, M. Aldinucci, and M. Torquati, “A Dynamic Memory Allocator for heterogeneous platforms,” in Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) — Poster Abstracts, Fiuggi, Italy, 2014.
    [BibTeX] [Abstract] [Download PDF]

    Modern computers are built upon heterogeneous multi-core/many-core architectures (e.g. GPGPUs connected to multi-core CPUs). Achieving peak performance on these architectures is hard and may require a substantial programming effort. High-level programming patterns, coupled with efficient low-level runtime support, have been proposed to relieve the programmer from worrying about low-level details such as synchronisation of racing processes, as well as the fine tuning needed to improve the overall performance. Among them are (parallel) dynamic memory allocation and effective exploitation of the memory hierarchy. The memory allocator is often a bottleneck that severely limits program scalability, robustness and portability on parallel systems. In this work we introduce a novel memory allocator, based on FastFlow’s allocator and the recently proposed CUDA Unified Memory, which aims to efficiently integrate host and device memories into a unique dynamically allocable memory space, accessible transparently by both host and device code.

    @inproceedings{ff:acaces:14,
      abstract = {Modern computers are built upon heterogeneous multi-core/many cores architectures (e.g. GPGPU connected to multi-core CPU). Achieving peak performance on these architectures is hard and may require a substantial programming effort. High-level programming patterns, coupled with efficient low-level runtime supports, have been proposed to relieve the programmer from worrying about low-level details such as synchronisation of racing processes as well as those fine tunings needed to improve the overall performance. Among them are (parallel) dynamic memory allocation and effective exploitation of the memory hierarchy. The memory allocator is often a bottleneck that severely limits program scalability, robustness and portability on parallel systems.
    In this work we introduce a novel memory allocator, based on the FastFlow's allocator and the recently proposed CUDA Unified Memory, which aims to efficiently integrate host and device memories into a unique dynamic-allocable memory space, accessible transparently by both host and device code.},
      address = {Fiuggi, Italy},
      author = {Maurizio Drocco and Marco Aldinucci and Massimo Torquati},
      booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) -- Poster Abstracts},
      date-modified = {2016-08-20 17:29:47 +0000},
      keywords = {fastflow, nvidia},
      publisher = {HiPEAC},
      title = {A Dynamic Memory Allocator for heterogeneous platforms},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_ACACES_ex-abstract.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_ACACES_ex-abstract.pdf},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2014_ACACES_ex-abstract.pdf}
    }
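
    The goal stated in the abstract, a single dynamically allocable space visible to both host and device, rests on CUDA Unified Memory. A minimal illustration of that substrate in C++ calling the CUDA runtime (the RAII wrapper below is a hypothetical toy, far simpler than the FastFlow-based allocator the poster describes):

      #include <cstddef>
      #include <cuda_runtime.h>

      // Tiny RAII wrapper over cudaMallocManaged: the returned pointer is
      // valid in both host and device code; the runtime migrates pages
      // between memories on demand.
      template <typename T>
      struct unified_buffer {
          T* ptr = nullptr;
          std::size_t n = 0;
          explicit unified_buffer(std::size_t count) : n(count) {
              cudaMallocManaged(&ptr, n * sizeof(T));
          }
          ~unified_buffer() { cudaFree(ptr); }
          unified_buffer(const unified_buffer&) = delete;
          unified_buffer& operator=(const unified_buffer&) = delete;
      };

    A production allocator would recycle such managed regions through per-thread free lists instead of paying a cudaMallocManaged call per allocation, which is where the FastFlow allocator comes in.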

  • A. Secco, I. Uddin, G. P. Pezzi, and M. Torquati, “Message passing on InfiniBand RDMA for parallel run-time supports,” in Proc. of Intl. Euromicro PDP 2014: Parallel Distributed and network-based Processing, Torino, Italy, 2014. doi:10.1109/PDP.2014.23
    [BibTeX] [Abstract] [Download PDF]

    InfiniBand networks are commonly used in the high performance computing area. They offer RDMA-based operations that help to improve the performance of communication subsystems. In this paper, we propose a minimal message-passing communication layer providing the programmer with a point-to-point communication channel implemented by way of InfiniBand RDMA features. Unlike other libraries exploiting InfiniBand features, such as the well-known Message Passing Interface (MPI), the proposed library is a communication layer only rather than a programming model, and can easily be used as a building block for high-level parallel programming frameworks. Evaluated on micro-benchmarks, the proposed RDMA-based communication channel implementation achieves performance comparable with highly optimised MPI/InfiniBand implementations. Finally, the flexibility of the communication layer is evaluated by integrating it within the FastFlow parallel framework, currently supporting TCP/IP networks (via the ZeroMQ communication library).

    @inproceedings{ff:infiniband:pdp:14,
      abstract = {InfiniBand networks are commonly used in the high performance computing area. They offer RDMA-based operations that help to improve the performance of communication subsystems. In this paper, we propose a minimal message-passing communication layer providing the programmer with a point-to-point communication channel implemented by way of InfiniBand RDMA features. Differently from other libraries exploiting the InfiniBand features, such as the well-known Message Passing Interface (MPI), the proposed library is a communication layer only rather than a programming model, and can be easily used as building block for high-level parallel programming frameworks. Evaluated on micro-benchmarks, the proposed RDMA-based communication channel implementation achieves a comparable performance with highly optimised MPI/InfiniBand implementations. Eventually, the flexibility of the communication layer is evaluated by integrating it within the FastFlow parallel framework, currently supporting TCP/IP networks (via the ZeroMQ communication library).},
      address = {Torino, Italy},
      author = {Alessandro Secco and Irfan Uddin and Guilherme {Peretti Pezzi} and Massimo Torquati},
      booktitle = {Proc. of Intl. Euromicro PDP 2014: Parallel Distributed and network-based Processing},
      date-added = {2013-12-07 18:22:35 +0000},
      date-modified = {2015-09-27 12:35:04 +0000},
      doi = {10.1109/PDP.2014.23},
      editor = {Marco Aldinucci and Daniele D'Agostino and Peter Kilpatrick},
      keywords = {fastflow, paraphrase, impact},
      publisher = {IEEE},
      title = {Message passing on InfiniBand {RDMA} for parallel run-time supports},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_ff_infiniband_pdp.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_ff_infiniband_pdp.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2014.23}
    }
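
    The key design point, a communication layer rather than a programming model, amounts to exposing nothing more than a typed point-to-point channel for a runtime such as FastFlow to build on. A sketch of such a minimal interface (names are hypothetical; the paper's implementation backs these operations with InfiniBand RDMA verbs):

      #include <cstddef>

      // Minimal point-to-point channel: no ranks, collectives or message
      // matching as in MPI -- just a byte stream between two peers. An
      // RDMA-backed implementation would serve send() by writing into a
      // pre-registered buffer on the remote node.
      class p2p_channel {
      public:
          virtual ~p2p_channel() = default;
          virtual bool send(const void* buf, std::size_t len) = 0;
          virtual bool recv(void* buf, std::size_t len) = 0;
      };

    Keeping the surface this small is what lets the same runtime swap TCP/IP, ZeroMQ or RDMA transports underneath unchanged.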

  • C. Misale, G. Ferrero, M. Torquati, and M. Aldinucci, “Sequence alignment tools: one parallel pattern to rule them all?,” BioMed Research International, 2014. doi:10.1155/2014/539410
    [BibTeX] [Abstract] [Download PDF]

    In this paper we advocate a high-level programming methodology for Next Generation Sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools against their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols and task scheduling, gaining more possibility for seamless performance tuning. In this work we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets.

    @article{bowtie-bwa:ff:multicore:biomed:14,
      abstract = {In this paper we advocate high-level programming methodology for Next Generation Sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools against their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols and task scheduling, gaining more possibility for seamless performance tuning. In this work we show some use case in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets.
    },
      author = {Claudia Misale and Giulio Ferrero and Massimo Torquati and Marco Aldinucci},
      date-added = {2013-01-15 15:55:59 +0000},
      date-modified = {2015-09-27 12:16:28 +0000},
      doi = {10.1155/2014/539410},
      journal = {BioMed Research International},
      keywords = {fastflow,bioinformatics, paraphrase, repara},
      title = {Sequence alignment tools: one parallel pattern to rule them all?},
      url = {http://downloads.hindawi.com/journals/bmri/2014/539410.pdf},
      year = {2014},
      bdsk-url-1 = {http://downloads.hindawi.com/journals/bmri/2014/539410.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1155/2014/539410}
    }

  • C. Spampinato, I. Kavasidis, M. Aldinucci, C. Pino, D. Giordano, and A. Faro, “Discovering Biological Knowledge by Integrating High Throughput Data and Scientific Literature on the Cloud,” Concurrency and Computation: Practice and Experience, vol. 26, iss. 10, pp. 1771-1786, 2014. doi:10.1002/cpe.3130
    [BibTeX] [Abstract] [Download PDF]

    In this paper, we present a bioinformatics knowledge discovery tool for extracting and validating associations between biological entities. By mining specialised scientific literature, the tool not only generates biological hypotheses in the form of associations between genes, proteins, miRNA and diseases, but also validates the plausibility of such associations against high-throughput biological data (e.g. microarray) and annotated databases (e.g. Gene Ontology). Both the knowledge discovery system and its validation are carried out by exploiting the advantages and the potentialities of the Cloud, which allowed us to derive and check the validity of thousands of biological associations in a reasonable amount of time. The system was tested on a dataset containing more than 1000 gene-disease associations, achieving an average recall of about 71%, outperforming existing approaches. The results also showed that porting a data-intensive application to an IaaS cloud environment significantly boosts the application’s efficiency.

    @article{biocloud:ccpe:13,
      abstract = {In this paper, we present a bioinformatics knowledge discovery tool for extracting and validating associations between biological entities. By mining specialised scientific literature, the tool not only generates biological hypotheses in the form of associations between genes, proteins, miRNA and diseases, but also validates the plausibility of such associations against high-throughput biological data (e.g. microarray) and annotated databases (e.g. Gene Ontology). Both the knowledge discovery system and its validation are carried out by exploiting the advantages and the potentialities of the Cloud, which allowed us to derive and check the validity of thousands of biological associations in a reasonable amount of time. The system was tested on a dataset containing more than 1000 gene-disease associations achieving an average recall of about 71\%, outperforming existing approaches. The results also showed that porting a data-intensive application in an IaaS cloud environment boosts significantly the application's efficiency.},
      author = {Concetto Spampinato and Isaak Kavasidis and Marco Aldinucci and Carmelo Pino and Daniela Giordano and Alberto Faro},
      date-added = {2014-12-21 17:48:24 +0000},
      date-modified = {2015-03-13 00:30:53 +0000},
      doi = {10.1002/cpe.3130},
      journal = {Concurrency and Computation: Practice and Experience},
      keywords = {cloud},
      number = {10},
      pages = {1771-1786},
      title = {Discovering Biological Knowledge by Integrating High Throughput Data and Scientific Literature on the Cloud},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_biocloud_ccpe.pdf},
      volume = {26},
      year = {2014},
      bdsk-url-1 = {http://dx.doi.org/10.1002/cpe.3130},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2013_biocloud_ccpe.pdf}
    }

  • M. Aldinucci, M. Torquati, C. Spampinato, M. Drocco, C. Misale, C. Calcagno, and M. Coppo, “Parallel stochastic systems biology in the cloud,” Briefings in Bioinformatics, vol. 15, iss. 5, pp. 798-813, 2014. doi:10.1093/bib/bbt040
    [BibTeX] [Abstract] [Download PDF]

    The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may prove computationally expensive, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud. In particular, the Calculus of Wrapped Components (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach, is presented and discussed. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined in such a way that the workflow begins to output a stream of analysis results immediately after simulation is started. Performance and effectiveness of the CWC simulator are validated on the Amazon Elastic Compute Cloud.

    @article{cwc:cloud:bib:13,
      abstract = {The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may result computationally expensive reducing the interactivity required in the model tuning. In this work, we advocate the high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud. In particular, the Calculus of Wrapped Components (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach, is presented and discussed. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined in such a way that the workflow begins to output a stream of analysis results immediately after simulation is started. Performance and effectiveness of the CWC simulator are validated on the Amazon Elastic Compute Cloud.},
      author = {Marco Aldinucci and Massimo Torquati and Concetto Spampinato and Maurizio Drocco and Claudia Misale and Cristina Calcagno and Mario Coppo},
      date-added = {2014-12-21 17:49:54 +0000},
      date-modified = {2015-09-27 12:33:52 +0000},
      doi = {10.1093/bib/bbt040},
      issn = {1467-5463},
      journal = {Briefings in Bioinformatics},
      keywords = {fastflow, bioinformatics, cloud, paraphrase, impact, biobits},
      number = {5},
      pages = {798-813},
      title = {Parallel stochastic systems biology in the cloud},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_ff_bio_cloud_briefings.pdf},
      volume = {15},
      year = {2014},
      bdsk-url-1 = {http://dx.doi.org/10.1093/bib/bbt040},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2013_ff_bio_cloud_briefings.pdf}
    }
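
    The pipelining described here, with the analysis stage consuming trajectory data while the simulation stage is still producing it, can be sketched with two threads sharing a queue. A minimal plain-C++ sketch (the CWC simulator itself composes FastFlow pipeline and farm patterns instead):

      #include <condition_variable>
      #include <mutex>
      #include <queue>
      #include <thread>

      // A trajectory point produced by one Monte Carlo simulation step.
      struct Sample { double t, value; };

      std::queue<Sample> q;
      std::mutex m;
      std::condition_variable cv;
      bool done = false;

      void simulator(int steps) {                 // producer stage
          for (int i = 0; i < steps; ++i) {
              Sample s{i * 0.1, /* simulated value */ i * 1.0};
              { std::lock_guard<std::mutex> lk(m); q.push(s); }
              cv.notify_one();
          }
          { std::lock_guard<std::mutex> lk(m); done = true; }
          cv.notify_one();
      }

      void analyser() {                           // consumer stage
          double mean = 0; long n = 0;
          for (;;) {
              std::unique_lock<std::mutex> lk(m);
              cv.wait(lk, [] { return !q.empty() || done; });
              if (q.empty() && done) break;
              Sample s = q.front(); q.pop();
              lk.unlock();
              mean += (s.value - mean) / ++n;     // running statistic
          }
      }

      int main() {
          std::thread sim(simulator, 1000), ana(analyser);
          sim.join(); ana.join();
      }

    Because the analyser keeps only running statistics, partial results are available from the first sample onwards, which is the interactivity argument the abstract makes.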

  • C. Misale, “Accelerating Bowtie2 with a lock-less concurrency approach and memory affinity,” in Proc. of Intl. Euromicro PDP 2014: Parallel Distributed and network-based Processing, Torino, Italy, 2014. doi:10.1109/PDP.2014.50
    [BibTeX] [Abstract] [Download PDF]

    The implementation of DNA alignment tools for Bioinformatics leads to a number of problems that affect performance. A single alignment takes an amount of time that is not predictable, and different factors can affect performance: for instance, the length of sequences can determine the computational grain of the task, and mismatches or insertions/deletions (indels) increase the time needed to complete an alignment. Moreover, alignment is a strongly memory-bound problem because of irregular memory access patterns and limitations in memory bandwidth. Over the years, many alignment tools have been implemented. A concrete example is Bowtie2, one of the fastest (concurrent, Pthread-based) state-of-the-art non-GPU-based alignment tools. Bowtie2 exploits concurrency by instantiating a pool of threads, which have access to a global input dataset, share the reference genome and have access to different objects for collecting alignment results. In this paper a modified implementation of Bowtie2 is presented, in which the concurrency structure has been changed. The proposed implementation exploits the task-farm skeleton pattern implemented as a Master-Worker. The Master-Worker pattern delegates dataset reading to the Master thread alone, and makes private to each Worker the data structures that are shared in the original version. Only the reference genome is left shared. As a further optimisation, the Master and each Worker were pinned on cores and the reference genome was allocated interleaved among memory nodes. The proposed implementation is able to gain up to 10 speedup points over the original implementation.

    @inproceedings{ff:bowtie2:pdp:14,
      abstract = {The implementation of DNA alignment tools for Bioinformatics lead to face different problems that dip into performances. A single alignment takes an amount of time that is not predictable and there are different factors that can affect performances, for instance the length of sequences can determine the computational grain of the task and mismatches or insertion/deletion (indels) increase time needed to complete an alignment. Moreover, an alignment is a strong memory-bound problem because of the irregular memory access patterns and limitations in memory-bandwidth. Over the years, many alignment tools were implemented. A concrete example is Bowtie2, one of the fastest (concurrent, Pthread-based) and state of the art not GPU-based alignment tool. Bowtie2 exploits concurrency by instantiating a pool of threads, which have access to a global input dataset, share the reference genome and have access to different objects for collecting alignment results. In this paper a modified implementation of Bowtie2 is presented, in which the concurrency structure has been changed. The proposed implementation exploits the task-farm skeleton pattern implemented as a Master-Worker. The Master-Worker pattern permits to delegate only to the Master thread dataset reading and to make private to each Worker data structures that are shared in the original version. Only the reference genome is left shared. As a further optimisation, the Master and each Worker were pinned on cores and the reference genome was allocated interleaved among memory nodes. The proposed implementation is able to gain up to 10 speedup points over the original implementation.},
      address = {Torino, Italy},
      author = {Claudia Misale},
      booktitle = {Proc. of Intl. Euromicro PDP 2014: Parallel Distributed and network-based Processing},
      date-added = {2013-12-07 18:25:55 +0000},
      date-modified = {2015-09-27 12:41:24 +0000},
      doi = {10.1109/PDP.2014.50},
      editor = {Marco Aldinucci and Daniele D'Agostino and Peter Kilpatrick},
      keywords = {fastflow, paraphrase},
      note = {(Best paper award)},
      publisher = {IEEE},
      title = {Accelerating Bowtie2 with a lock-less concurrency approach and memory affinity},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_pdp_bowtieff.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_pdp_bowtieff.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2014.50}
    }
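
    Two of the optimisations described above, pinning threads to cores and keeping per-Worker result structures private, can be sketched with the Linux thread-affinity call. A minimal sketch in plain C++ with pthread affinity (the trivial worker body is a stand-in for Bowtie2's alignment loop, and the core numbering is an assumption):

      #include <pthread.h>   // pthread_setaffinity_np (Linux, _GNU_SOURCE)
      #include <sched.h>
      #include <thread>
      #include <vector>

      // Pin the calling thread to one core, as the modified Bowtie2 does
      // for the Master and each Worker to stabilise memory locality.
      void pin_to_core(int core) {
          cpu_set_t set;
          CPU_ZERO(&set);
          CPU_SET(core, &set);
          pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
      }

      int main() {
          const int nworkers = 4;
          std::vector<std::vector<int>> results(nworkers); // private per Worker
          std::vector<std::thread> workers;
          for (int w = 0; w < nworkers; ++w)
              workers.emplace_back([&, w] {
                  pin_to_core(w + 1);      // leave core 0 to the Master
                  results[w].push_back(w); // align reads, collect locally
              });
          pin_to_core(0);                  // Master: reads the dataset
          for (auto& t : workers) t.join();
      }

    Interleaving the shared reference genome across NUMA nodes (e.g. via libnuma) complements the pinning: every Worker then pays a uniform average memory latency instead of saturating one node.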

  • M. Aldinucci, G. P. Pezzi, M. Drocco, F. Tordini, P. Kilpatrick, and M. Torquati, “Parallel video denoising on heterogeneous platforms,” in Proc. of Intl. Workshop on High-level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU), 2014.
    [BibTeX] [Abstract] [Download PDF]

    In this paper, a highly effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPU(s), or on both. The design and the implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.

    @inproceedings{ff:video:hlpgpu:14,
      abstract = {In this paper, a highly-effective parallel filter for video denoising is presented. The filter is designed using a skeletal approach, and has been implemented by way of the FastFlow parallel programming library. As a result of its high-level design, it is possible to run the filter seamlessly on a multi-core machine, on GPGPU(s), or on both. The design and the implementation of the filter are discussed, and an experimental evaluation is presented. Various mappings of the filtering stages are comparatively discussed.},
      author = {Marco Aldinucci and Guilherme {Peretti Pezzi} and Maurizio Drocco and Fabio Tordini and Peter Kilpatrick and Massimo Torquati},
      booktitle = {Proc. of Intl. Workshop on High-level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU)},
      date-added = {2013-12-07 18:28:32 +0000},
      date-modified = {2015-09-27 12:42:02 +0000},
      keywords = {fastflow, paraphrase, impact},
      title = {Parallel video denoising on heterogeneous platforms},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_ff_video_denoiser_hlpgpu.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_ff_video_denoiser_hlpgpu.pdf}
    }

  • G. P. Pezzi, E. Vaissié, Y. Viala, D. Caromel, and P. Gourbesville, “Parallel Profiling of Water Distribution Networks Using the Clément Formula,” in 4th European Seminar on Computing, 2014.
    [BibTeX] [Abstract]

    Optimization of water distribution is a crucial issue which has been targeted by many modelling tools. Useful models, implemented several decades ago, need to be updated and implemented in more powerful computing environments. This paper presents the distributed and redesigned version of a legacy hydraulic simulation software written in Fortran (IRMA) that has been used for over 30 years by the Société du Canal de Provence in order to design and to maintain water distribution networks. IRMA was developed aiming mainly at the treatment of irrigation networks, using the Clément demand model, and is now used to manage more than 6000 km of piped networks. The growing complexity and size of the networks required redesigning the code using modern tools and languages (Java), and running distributed simulations using the ProActive Parallel Suite.

    @inproceedings{pezzi-clement:14,
      abstract = {Optimization of water distribution is a crucial issue which has been targeted by many modelling tools. Useful models, implemented several decades ago, need to be updated and implemented in more powerful computing environments. This paper presents the distributed and redesigned version of a legacy hydraulic simulation software written in Fortran (IRMA) that has been used for over 30 years by the Societ{\'e} du Canal de Provence in order to design and to maintain water distribution networks. IRMA was developed aiming mainly the treatment of irrigation networks -- by using the Cl{\'e}ment demand model and is now used to manage more than 6.000 km of piped networks. The growing complexity and size of networks requested to redesign the code by using modern tools and language (Java) and also to run distributed simulations by using the ProActive Parallel Suite.},
      author = {Guilherme Peretti Pezzi and Evelyne Vaissi{\'e} and Yann Viala and Denis Caromel and Philippe Gourbesville},
      booktitle = {4th European Seminar on Computing},
      date-added = {2014-12-20 15:54:08 +0000},
      date-modified = {2015-09-27 12:44:01 +0000},
      keywords = {impact},
      title = {Parallel Profiling of Water Distribution Networks Using the Cl{\'e}ment Formula},
      year = {2014}
    }

  • M. Aldinucci, C. Calcagno, M. Coppo, F. Damiani, M. Drocco, E. Sciacca, S. Spinella, M. Torquati, and A. Troina, “On designing multicore-aware simulators for systems biology endowed with on-line statistics,” BioMed Research International, 2014. doi:10.1155/2014/207041
    [BibTeX] [Abstract] [Download PDF]

    The paper focuses on enabling methodologies for the design of a fully parallel, online, interactive tool to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, turning into big data that should be analysed by statistical and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated, for performance and for effectiveness of the online analysis in capturing biological systems behavior, on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.

    @article{cwcsim:ff:multicore:biomed:14,
      abstract = {The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.},
      author = {Marco Aldinucci and Cristina Calcagno and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Eva Sciacca and Salvatore Spinella and Massimo Torquati and Angelo Troina},
      date-added = {2014-06-26 21:30:32 +0000},
      date-modified = {2015-09-27 12:17:05 +0000},
      doi = {10.1155/2014/207041},
      journal = {BioMed Research International},
      keywords = {fastflow,bioinformatics, paraphrase, biobits},
      title = {On designing multicore-aware simulators for systems biology endowed with on-line statistics},
      url = {http://downloads.hindawi.com/journals/bmri/2014/207041.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_ff_cwc_bmri.pdf},
      bdsk-url-2 = {http://downloads.hindawi.com/journals/bmri/2014/207041.pdf},
      bdsk-url-3 = {http://dx.doi.org/10.1155/2014/207041}
    }
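
    "On-line statistics" here means statistics updated sample by sample as trajectories stream out of the simulator, so no trajectory has to be stored before analysis. The canonical single-pass example is Welford's algorithm for mean and variance; a generic sketch of it in C++ (not the simulator's own mining code):

      // Welford's single-pass update: after each push() the mean and
      // variance of all samples seen so far are available, which is what
      // lets the analysis stage emit partial results while simulations
      // are still running.
      struct OnlineStats {
          long   n = 0;
          double mean = 0.0, m2 = 0.0;
          void push(double x) {
              ++n;
              double d = x - mean;
              mean += d / n;
              m2 += d * (x - mean);
          }
          double variance() const { return n > 1 ? m2 / (n - 1) : 0.0; }
      };

    One such accumulator per time point is enough to plot mean trajectories with confidence bands on the fly; richer miners (clustering, peak detection) slot into the same streaming position.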

  • M. Aldinucci, S. Campa, M. Danelutto, P. Kilpatrick, and M. Torquati, “Design patterns percolating to parallel programming framework implementation,” International Journal of Parallel Programming, vol. 42, iss. 6, pp. 1012-1031, 2014. doi:10.1007/s10766-013-0273-6
    [BibTeX] [Abstract] [Download PDF]

    Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems. Recently, a set of simple and powerful parallel building blocks (RISC-pb2l) has been proposed to support modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialized domain specific parallel patterns. We show how an implementation of RISC-pb2l can be realised via the FastFlow framework and present experimental evidence of the feasibility and efficiency of the approach.

    @article{ijpp:patterns:13,
      abstract = {Structured parallel programming is recognised as a viable and effective means of tackling parallel programming problems. Recently, a set of simple and powerful parallel building blocks (RISC-pb2l) has been proposed to support modelling and implementation of parallel frameworks. In this work we demonstrate how that same parallel building block set may be used to model both general purpose parallel programming abstractions, not usually listed in classical skeleton sets, and more specialized domain specific parallel patterns. We show how an implementation of RISC-pb2l can be realised via the FastFlow framework and present experimental evidence of the feasibility and efficiency of the approach.},
      author = {Marco Aldinucci and Sonia Campa and Marco Danelutto and Peter Kilpatrick and Massimo Torquati},
      date-added = {2014-12-21 17:47:21 +0000},
      date-modified = {2015-09-27 12:32:37 +0000},
      doi = {10.1007/s10766-013-0273-6},
      issn = {0885-7458},
      journal = {International Journal of Parallel Programming},
      keywords = {fastflow, paraphrase},
      number = {6},
      pages = {1012-1031},
      title = {Design patterns percolating to parallel programming framework implementation},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_ijpp_patterns-web.pdf},
      volume = {42},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_ijpp_patterns.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/s10766-013-0273-6},
      bdsk-url-3 = {http://calvados.di.unipi.it/storage/paper_files/2013_ijpp_patterns-web.pdf}
    }
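
    The building-block idea, small orthogonal wrappers and combinators from which full skeletons are assembled, can be hinted at with plain function composition: a pipeline is stages composed in sequence, and a farm is one stage replicated. A toy sketch in generic C++ (not the actual RISC-pb2l notation):

      #include <utility>

      // Compose two stages into one pipeline stage: pipe(f, g)(x) == g(f(x)).
      // Patterns such as farms or feedback loops are, in the same spirit,
      // higher-order combinators over such wrapped stages.
      template <typename F, typename G>
      auto pipe(F f, G g) {
          return [=](auto&& x) { return g(f(std::forward<decltype(x)>(x))); };
      }

      // Usage: auto stage = pipe([](int x) { return x + 1; },
      //                          [](int x) { return x * 2; });
      //        stage(3) == 8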

  • M. Aldinucci, M. Drocco, G. P. Pezzi, C. Misale, F. Tordini, and M. Torquati, “Exercising high-level parallel programming on streams: a systems biology use case,” in Proc. of the 2014 IEEE 34th Intl. Conference on Distributed Computing Systems Workshops (ICDCS), Madrid, Spain, 2014. doi:10.1109/ICDCSW.2014.38
    [BibTeX] [Abstract] [Download PDF]

    The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in Bioinformatics. The simulation-analysis workflow may result in a computationally expensive task, reducing the interactivity required in model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to the cloud. In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology equipped with on-line mining of results, which is designed according to the FastFlow pattern-based approach, is discussed as a running example. In this work, the CWC simulator is used as a paradigmatic example of a complex C++ application where the quality of results is correlated with both computation and I/O bounds, and where high-quality results might turn into big data. The FastFlow parallel programming framework, which advocates C++ pattern-based parallel programming, makes it possible to develop portable parallel code without relinquishing either run-time efficiency or performance tuning opportunities. Performance and effectiveness of the approach are validated on a variety of platforms, inter alia cache-coherent multi-cores, clusters of multi-cores (Ethernet and InfiniBand) and the Amazon Elastic Compute Cloud.

    @inproceedings{cwc:gpu:dcperf:14,
      abstract = {The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in Bioinformatics. The simulation-analysis workflow may result into a computationally expensive task reducing the interactivity required in the model tuning. In this work, we advocate high-level software design as a vehicle for building efficient and portable parallel simulators for a variety of platforms, ranging from multi-core platforms to GPGPUs to cloud. In particular, the Calculus of Wrapped Compartments (CWC) parallel simulator for systems biology equipped with on-line mining of results, which is designed according to the FastFlow pattern-based approach, is discussed as a running example. In this work, the CWC simulator is used as a paradigmatic example of a complex C++ application where the quality of results is correlated with both computation and I/O bounds, and where high-quality results might turn into big data. The FastFlow parallel programming framework, which advocates C++ pattern-based parallel programming makes it possible to develop portable parallel code without relinquish neither run-time efficiency nor performance tuning opportunities. Performance and effectiveness of the approach are validated on a variety of platforms, inter-alia cache-coherent multi-cores, cluster of multi-core (Ethernet and Infiniband) and the Amazon Elastic Compute Cloud.},
      address = {Madrid, Spain},
      author = {Marco Aldinucci and Maurizio Drocco and Guilherme {Peretti Pezzi} and Claudia Misale and Fabio Tordini and Massimo Torquati},
      booktitle = {Proc. of the 2014 IEEE 34th Intl. Conference on Distributed Computing Systems Workshops (ICDCS)},
      date-added = {2014-04-19 12:44:39 +0000},
      date-modified = {2015-09-27 12:43:13 +0000},
      doi = {10.1109/ICDCSW.2014.38},
      keywords = {fastflow, gpu, bioinformatics, paraphrase, impact, nvidia},
      publisher = {IEEE},
      title = {Exercising high-level parallel programming on streams: a systems biology use case},
      url = {http://calvados.di.unipi.it/storage/paper_files/2014_dcperf_cwc_gpu.pdf},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2014_dcperf_cwc_gpu.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/ICDCSW.2014.38}
    }
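
    The design described above is, in miniature, a producer stage streaming simulation samples to an on-line analysis stage. The following plain C++11 sketch illustrates only that hand-off pattern; all names are invented here, and FastFlow itself replaces the lock-based queue below with lock-free ones and wraps the stages into its pattern classes.

    // Two-stage pipeline (simulate -> analyse) in plain C++11.
    // Illustrative only: FastFlow implements this hand-off with lock-free
    // queues and pattern classes; all names below are invented for the sketch.
    #include <condition_variable>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <random>
    #include <thread>

    template <typename T>
    class BlockingQueue {                 // simple hand-off between stages
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable cv_;
        bool closed_ = false;
    public:
        void push(T v) {
            { std::lock_guard<std::mutex> l(m_); q_.push(std::move(v)); }
            cv_.notify_one();
        }
        void close() {
            { std::lock_guard<std::mutex> l(m_); closed_ = true; }
            cv_.notify_all();
        }
        bool pop(T& out) {                // returns false once closed and drained
            std::unique_lock<std::mutex> l(m_);
            cv_.wait(l, [&] { return !q_.empty() || closed_; });
            if (q_.empty()) return false;
            out = std::move(q_.front()); q_.pop();
            return true;
        }
    };

    int main() {
        BlockingQueue<double> stream;
        std::thread simulator([&] {       // stage 1: produce Monte Carlo samples
            std::mt19937 rng(42);
            std::normal_distribution<double> step(0.0, 1.0);
            for (int i = 0; i < 100000; ++i) stream.push(step(rng));
            stream.close();
        });
        std::thread analyser([&] {        // stage 2: on-line mining (here: the mean)
            double sum = 0.0, v; long n = 0;
            while (stream.pop(v)) { sum += v; ++n; }
            std::cout << "mean = " << sum / n << " over " << n << " samples\n";
        });
        simulator.join();
        analyser.join();
    }

    In the paper's setting the analysis side is richer (a farm of user-defined statistics engines), but the structured hand-off is the same, which is why restructuring the pipeline does not require touching the stage bodies.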

  • M. Aldinucci, S. Ruggieri, and M. Torquati, “Decision Tree Building on Multi-Core using FastFlow,” Concurrency and Computation: Practice and Experience, vol. 26, iss. 3, pp. 800-820, 2014. doi:10.1002/cpe.3063
    [BibTeX] [Abstract] [Download PDF]

    The whole computer hardware industry embraced multi-core. The extreme optimisation of sequential algorithms is then no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an in-depth study of the parallelisation of an implementation of the C4.5 algorithm for multi-core architectures. We characterise elapsed time lower bounds for the forms of parallelisations adopted, and achieve close to optimal performances. Our implementation is based on the FastFlow parallel programming environment and it requires minimal changes to the original sequential code.
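
    A hedged, attribute-parallel sketch of the split search exploited by this kind of parallelisation follows the BibTeX entry below.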

    @article{yadtff:ccpe:13,
      abstract = {The whole computer hardware industry embraced multi-core. The extreme optimisation of sequential algorithms is then no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an in-depth study of the parallelisation of an implementation of the C4.5 algorithm for multi-core architectures. We characterise elapsed time lower bounds for the forms of parallelisations adopted, and achieve close to optimal performances. Our implementation is based on the FastFlow parallel programming environment and it requires minimal changes to the original sequential code.},
      author = {Marco Aldinucci and Salvatore Ruggieri and Massimo Torquati},
      date-added = {2014-12-21 17:46:33 +0000},
      date-modified = {2015-09-27 12:17:52 +0000},
      doi = {10.1002/cpe.3063},
      journal = {Concurrency and Computation: Practice and Experience},
      keywords = {fastflow, paraphrase},
      number = {3},
      pages = {800-820},
      title = {Decision Tree Building on Multi-Core using FastFlow},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_yadtff_ccpe.pdf},
      volume = {26},
      year = {2014},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_yadtff_ccpe.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1002/cpe.3063}
    }
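
    Decision-tree induction exposes natural concurrency at node and attribute level. As a hedged illustration of the latter (this is not the paper's FastFlow-based code; best_split and the toy gain function are invented for the sketch), the candidate split attributes of one node can be scored in parallel and reduced to the best one:

    // Attribute-parallel split search for one tree node (illustration only;
    // the paper parallelises a C4.5 implementation with FastFlow instead).
    #include <cstddef>
    #include <functional>
    #include <future>
    #include <iostream>
    #include <vector>

    struct Split { std::size_t attribute; double gain; };

    // gain(a) stands in for C4.5's entropy-based information gain of
    // attribute a at the current node (an assumption of this sketch).
    Split best_split(std::size_t n_attrs,
                     const std::function<double(std::size_t)>& gain) {
        std::vector<std::future<double>> futs;
        for (std::size_t a = 0; a < n_attrs; ++a)     // one task per attribute
            futs.push_back(std::async(std::launch::async, gain, a));
        Split best{0, futs[0].get()};
        for (std::size_t a = 1; a < n_attrs; ++a) {   // reduce to the best score
            double g = futs[a].get();
            if (g > best.gain) best = {a, g};
        }
        return best;
    }

    int main() {
        // Toy gain: pretend attribute 3 is the most informative one.
        Split s = best_split(8, [](std::size_t a) { return a == 3 ? 0.9 : 0.1; });
        std::cout << "best attribute: " << s.attribute << " (gain " << s.gain << ")\n";
    }

    In the paper the parallel tasks are scheduled by a FastFlow farm rather than std::async, which keeps the task grain under control on deep trees.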

2013

  • M. Drocco, “Parallel stochastic simulators in systems biology: the evolution of the species,” Master's Thesis, 2013.
    [BibTeX] [Abstract] [Download PDF]

    The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can be hardly captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation/analysis workflow may however result in being computationally expensive, thus compromising the interactivity required especially in model tuning. In this work we discuss the main opportunities to speed up the framework by parallelisation on modern multicore and hybrid multicore and distributed platforms, advocating the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators endowed with on-line statistical analysis. In particular, the Calculus of Wrapped Compartments (CWC) Simulator, which is designed according to the FastFlow’s pattern-based approach, is presented and discussed in this work.

    @mastersthesis{tesi:drocco:13,
      abstract = {The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can be hardly captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation/analysis workflow may however result in being computationally expensive, thus compromising the interactivity required especially in model tuning. In this work we discuss the main opportunities to speed up the framework by parallelisation on modern multicore and hybrid multicore and distributed platforms, advocating the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators endowed with on-line statistical analysis. In particular, the Calculus of Wrapped Compartments (CWC) Simulator, which is designed according to the FastFlow's pattern-based approach, is presented and discussed in this work.},
      author = {Maurizio Drocco},
      date-modified = {2013-11-24 00:29:54 +0000},
      keywords = {fastflow},
      month = jul,
      school = {Computer Science Department, University of Torino, Italy},
      title = {Parallel stochastic simulators in systems biology: the evolution of the species},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_tesi_drocco.pdf},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_tesi_drocco.pdf}
    }

  • M. Aldinucci, F. Tordini, M. Drocco, M. Torquati, and M. Coppo, “Parallel stochastic simulators in system biology: the evolution of the species,” in Proc. of Intl. Euromicro PDP 2013: Parallel Distributed and network-based Processing, Belfast, Northern Ireland, U.K., 2013. doi:10.1109/PDP.2013.66
    [BibTeX] [Abstract] [Download PDF]

    The stochastic simulation of biological systems is an increasingly popular technique in Bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can hardly be captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation-analysis workflow may however result in being computationally expensive, thus compromising the interactivity required in model tuning. In this work we advocate the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators. In particular, the Calculus of Wrapped Compartments (CWC) simulator, which is designed according to FastFlow’s pattern-based approach, is presented and discussed in this work. FastFlow has been extended to also support clusters of multi-cores with minimal coding effort, assessing the portability of the approach.

    @inproceedings{ff_cwc_distr:pdp:13,
      abstract = {The stochastic simulation of biological systems is an increasingly popular technique in Bioinformatics. It is often an enlightening technique, especially for multi-stable systems whose dynamics can hardly be captured with ordinary differential equations. To be effective, stochastic simulations should be supported by powerful statistical analysis tools. The simulation-analysis workflow may however result in being computationally expensive, thus compromising the interactivity required in model tuning. In this work we advocate the high-level design of simulators for stochastic systems as a vehicle for building efficient and portable parallel simulators. In particular, the Calculus of Wrapped Compartments (CWC) simulator, which is designed according to FastFlow's pattern-based approach, is presented and discussed in this work. FastFlow has been extended to also support clusters of multi-cores with minimal coding effort, assessing the portability of the approach.},
      address = {Belfast, Northern Ireland, U.K.},
      author = {Marco Aldinucci and Fabio Tordini and Maurizio Drocco and Massimo Torquati and Mario Coppo},
      booktitle = {Proc. of Intl. Euromicro PDP 2013: Parallel Distributed and network-based Processing},
      date-added = {2012-01-20 19:22:15 +0100},
      date-modified = {2013-11-24 00:30:43 +0000},
      doi = {10.1109/PDP.2013.66},
      keywords = {fastflow, bioinformatics},
      month = feb,
      publisher = {IEEE},
      title = {Parallel stochastic simulators in system biology: the evolution of the species},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_cwc_d_PDP.pdf},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_cwc_d_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2013.66}
    }

  • K. Hammond, M. Aldinucci, C. Brown, F. Cesarini, M. Danelutto, H. González-Vélez, P. Kilpatrick, R. Keller, M. Rossbory, and G. Shainer, “The ParaPhrase Project: Parallel Patterns for Adaptive Heterogeneous Multicore Systems,” in Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures, B. Beckert, F. Damiani, F. S. de Boer, and M. M. Bonsangue, Eds., Springer, 2013, vol. 7542, pp. 218-236. doi:10.1007/978-3-642-35887-6_12
    [BibTeX] [Abstract] [Download PDF]

    This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components are intended to match heterogeneous architectures, defined in terms of CPU/GPU combinations, for example. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation.

    @incollection{paraphrase:fmco:11,
      abstract = {This paper describes the ParaPhrase project, a new 3-year targeted research project funded under EU Framework 7 Objective 3.4 (Computer Systems), starting in October 2011. ParaPhrase aims to follow a new approach to introducing parallelism using advanced refactoring techniques coupled with high-level parallel design patterns. The refactoring approach will use these design patterns to restructure programs defined as networks of software components into other forms that are more suited to parallel execution. The programmer will be aided by high-level cost information that will be integrated into the refactoring tools. The implementation of these patterns will then use a well-understood algorithmic skeleton approach to achieve good parallelism. A key ParaPhrase design goal is that parallel components are intended to match heterogeneous architectures, defined in terms of CPU/GPU combinations, for example. In order to achieve this, the ParaPhrase approach will map components at link time to the available hardware, and will then re-map them during program execution, taking account of multiple applications, changes in hardware resource availability, the desire to reduce communication costs etc. In this way, we aim to develop a new approach to programming that will be able to produce software that can adapt to dynamic changes in the system environment. Moreover, by using a strong component basis for parallelism, we can achieve potentially significant gains in terms of reducing sharing at a high level of abstraction, and so in reducing or even eliminating the costs that are usually associated with cache management, locking, and synchronisation.},
      author = {Kevin Hammond and Marco Aldinucci and Chris Brown and Francesco Cesarini and Marco Danelutto and Horacio Gonz\'alez-V\'elez and Peter Kilpatrick and Rainer Keller and Michael Rossbory and Gilad Shainer},
      booktitle = {Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures},
      date-added = {2012-06-04 19:21:18 +0200},
      date-modified = {2013-11-24 00:33:27 +0000},
      doi = {10.1007/978-3-642-35887-6_12},
      editor = {Bernhard Beckert and Ferruccio Damiani and Frank S. de Boer and Marcello M. Bonsangue},
      isbn = {978-3-642-35886-9},
      keywords = {paraphrase},
      pages = {218-236},
      publisher = {Springer},
      series = {LNCS},
      title = {The ParaPhrase Project: Parallel Patterns for Adaptive Heterogeneous Multicore Systems},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_paraphrase.pdf},
      volume = {7542},
      year = {2013},
      bdsk-url-1 = {http://dx.doi.org/10.1007/978-3-642-35887-6_12},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_paraphrase.pdf}
    }

  • M. Aldinucci, S. Campa, F. Tordini, M. Torquati, and P. Kilpatrick, “An abstract annotation model for skeletons,” in Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures, B. Beckert, F. Damiani, F. S. de Boer, and M. M. Bonsangue, Eds., Springer, 2013, vol. 7542, pp. 257-276. doi:10.1007/978-3-642-35887-6_14
    [BibTeX] [Abstract] [Download PDF]

    Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often leads to a growing gap between peak machine power and actual application performance. In this work a first step toward the automated optimization of high level skeleton-based parallel code is discussed. The paper presents an abstract annotation model for skeleton programs aimed at formally describing suitable mapping of parallel activities on a high-level platform representation. The derived mapping and scheduling strategies are used to generate optimized run-time code.

    @incollection{toolchain:fmco:11,
      abstract = {Multi-core and many-core platforms are becoming increasingly heterogeneous and asymmetric. This significantly increases the porting and tuning effort required for parallel codes, which in turn often leads to a growing gap between peak machine power and actual application performance. In this work a first step toward the automated optimization of high level skeleton-based parallel code is discussed. The paper presents an abstract annotation model for skeleton programs aimed at formally describing suitable mapping of parallel activities on a high-level platform representation. The derived mapping and scheduling strategies are used to generate optimized run-time code.},
      author = {Marco Aldinucci and Sonia Campa and Fabio Tordini and Massimo Torquati and Peter Kilpatrick},
      booktitle = {Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures},
      date-added = {2012-06-04 19:23:25 +0200},
      date-modified = {2013-11-24 00:33:41 +0000},
      doi = {10.1007/978-3-642-35887-6_14},
      editor = {Bernhard Beckert and Ferruccio Damiani and Frank S. de Boer and Marcello M. Bonsangue},
      isbn = {978-3-642-35886-9},
      keywords = {fastflow, paraphrase},
      pages = {257-276},
      publisher = {Springer},
      series = {LNCS},
      title = {An abstract annotation model for skeletons},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_annotation.pdf},
      volume = {7542},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_annotation.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-35887-6_14}
    }

  • M. Aldinucci, S. Campa, P. Kilpatrick, and M. Torquati, “Structured Data Access Annotations for Massively Parallel Computations,” in Euro-Par 2012 Workshops, Proc. of the ParaPhrase Workshop on Parallel Processing, 2013, pp. 381-390. doi:10.1007/978-3-642-36949-0_42
    [BibTeX] [Abstract] [Download PDF]

    We describe an approach aimed at addressing the issue of joint exploitation of control (stream) and data parallelism in a skeleton-based parallel programming environment, based on annotations and refactoring. Annotations drive efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use case applications/kernels may be optimized and discuss preliminary experiments with FastFlow assessing the theoretical results.

    @inproceedings{annotation:para:12,
      abstract = {We describe an approach aimed at addressing the issue of joint exploitation of control (stream) and data parallelism in a skeleton-based parallel programming environment, based on annotations and refactoring. Annotations drive efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use case applications/kernels may be optimized and discuss preliminary experiments with FastFlow assessing the theoretical results.},
      author = {Marco Aldinucci and Sonia Campa and Peter Kilpatrick and Massimo Torquati},
      booktitle = {Euro-Par 2012 Workshops, Proc. of the ParaPhrase Workshop on Parallel Processing},
      date-added = {2012-07-23 21:22:03 +0000},
      date-modified = {2015-09-27 12:49:52 +0000},
      doi = {10.1007/978-3-642-36949-0_42},
      keywords = {fastflow, paraphrase},
      pages = {381-390},
      publisher = {Springer},
      series = {LNCS},
      title = {Structured Data Access Annotations for Massively Parallel Computations},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_annot_europar_workshops.pdf},
      volume = {7640},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_annot_europar_workshops.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-36949-0_42}
    }

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, C. Montangero, and L. Semini, “Managing Adaptivity in Parallel Systems,” in Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures, B. Beckert, F. Damiani, F. S. de Boer, and M. M. Bonsangue, Eds., Springer, 2013, vol. 7542, pp. 199-217. doi:10.1007/978-3-642-35887-6_11
    [BibTeX] [Abstract] [Download PDF]

    The management of non-functional features (performance, security, power management, etc.) is traditionally a difficult, error prone task for programmers of parallel applications. To take care of these non-functional features, autonomic managers running policies represented as rules using sensors and actuators to monitor and transform a running parallel application may be used. We discuss an approach aimed at providing formal tool support to the integration of independently developed autonomic managers taking care of different non-functional concerns within the same parallel application. Our approach builds on the Behavioural Skeleton experience (autonomic management of non-functional features in structured parallel applications) and on previous results on conflict detection and resolution in rule-based systems.

    @incollection{adaptivity:fmco:11,
      abstract = {The management of non-functional features (performance, security, power management, etc.) is traditionally a difficult, error prone task for programmers of parallel applications. To take care of these non-functional features, autonomic managers running policies represented as rules using sensors and actuators to monitor and transform a running parallel application may be used. We discuss an approach aimed at providing formal tool support to the integration of independently developed autonomic managers taking care of different non-functional concerns within the same parallel application. Our approach builds on the Behavioural Skeleton experience (autonomic management of non-functional features in structured parallel applications) and on previous results on conflict detection and resolution in rule-based systems.},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Carlo Montangero and Laura Semini},
      booktitle = {Formal Methods for Components and Objects: Intl. Symposium, FMCO 2011, Torino, Italy, October 3-5, 2011, Revised Invited Lectures},
      date-added = {2012-06-04 19:05:16 +0200},
      date-modified = {2016-08-19 21:44:58 +0000},
      doi = {10.1007/978-3-642-35887-6_11},
      editor = {Bernhard Beckert and Ferruccio Damiani and Frank S. de Boer and Marcello M. Bonsangue},
      isbn = {978-3-642-35886-9},
      keywords = {distributed, paraphrase},
      pages = {199-217},
      publisher = {Springer},
      series = {LNCS},
      title = {Managing Adaptivity in Parallel Systems},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_adaptivity.pdf},
      volume = {7542},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_fmco11_adaptivity.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-35887-6_11}
    }

  • C. Misale, M. Aldinucci, and M. Torquati, “Memory affinity in multi-threading: the Bowtie2 case study,” in Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) — Poster Abstracts, Fiuggi, Italy, 2013.
    [BibTeX] [Abstract] [Download PDF]

    The diffusion of Next Generation Sequencing (NGS) has increased the amount of data obtainable by genomic experiments. From a DNA sample an NGS run is able to produce millions of short sequences (called reads), which should be mapped into a reference genome. In this paper, we analyse the performance of Bowtie2, a fast and popular DNA mapping tool. Bowtie2 exhibits a multithreading implementation on top of pthreads, spin-locks and the SSE2 SIMD extension. From a parallel computing viewpoint, it is a paradigmatic example of software required to address three fundamental problems in shared-memory programming for cache-coherent multi-core platforms: synchronisation efficiency at very fine grain (due to short reads), load-balancing (due to long reads), and efficient usage of the memory subsystem (due to SSE2 memory pressure). We compare the original implementation against an alternative implementation on top of the FastFlow pattern-based programming framework. The proposed design exploits the high-level farm pattern of FastFlow, which is implemented on top of non-blocking multi-threading and lock-less (CAS-free) queues, and provides the programmer with high-level mechanisms to tune task scheduling to achieve both load-balancing and memory affinity. The proposed design, despite its high-level nature, is always faster and more scalable than the original one. The design of both the original and the alternative version will be presented along with their experimental evaluation on real-world data sets. A sketch of the thread-pinning mechanism behind memory affinity follows this entry.

    @inproceedings{ff:acaces:13,
      abstract = {The diffusion of Next Generation Sequencing (NGS) has increased the amount of data obtainable by genomic experiments. From a DNA sample an NGS run is able to produce millions of short sequences (called reads), which should be mapped into a reference genome. In this paper, we analyse the performance of Bowtie2, a fast and popular DNA mapping tool. Bowtie2 exhibits a multithreading implementation on top of pthreads, spin-locks and the SSE2 SIMD extension. From a parallel computing viewpoint, it is a paradigmatic example of software required to address three fundamental problems in shared-memory programming for cache-coherent multi-core platforms: synchronisation efficiency at very fine grain (due to short reads), load-balancing (due to long reads), and efficient usage of the memory subsystem (due to SSE2 memory pressure). We compare the original implementation against an alternative implementation on top of the FastFlow pattern-based programming framework. The proposed design exploits the high-level farm pattern of FastFlow, which is implemented on top of non-blocking multi-threading and lock-less (CAS-free) queues, and provides the programmer with high-level mechanisms to tune task scheduling to achieve both load-balancing and memory affinity. The proposed design, despite its high-level nature, is always faster and more scalable than the original one. The design of both the original and the alternative version will be presented along with their experimental evaluation on real-world data sets.},
      address = {Fiuggi, Italy},
      author = {Claudia Misale and Marco Aldinucci and Massimo Torquati},
      booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) -- Poster Abstracts},
      date-added = {2015-03-21 15:12:59 +0000},
      date-modified = {2015-03-21 15:12:59 +0000},
      isbn = {9789038221908},
      keywords = {fastflow},
      publisher = {HiPEAC},
      title = {Memory affinity in multi-threading: the Bowtie2 case study},
      url = {http://calvados.di.unipi.it/storage/paper_files/2013_ACACES_ex-abstract.pdf},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2013_ACACES_ex-abstract.pdf}
    }
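
    Memory affinity ultimately means keeping a worker's data close to the core that touches it. A minimal Linux/glibc sketch of the low-level mechanism involved is shown below; it uses the raw pthread affinity call on std::thread handles, every name is invented, and FastFlow exposes the same effect through its own thread-mapping and scheduling facilities.

    // Linux/glibc sketch: pin each worker thread to one core so that its
    // working set stays local to that core's caches (and NUMA node).
    #include <pthread.h>
    #include <sched.h>
    #include <cstdio>
    #include <thread>
    #include <vector>

    static bool pin_to_core(std::thread& t, int core) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(t.native_handle(), sizeof(set), &set) == 0;
    }

    int main() {
        unsigned n = std::thread::hardware_concurrency();
        std::vector<std::thread> workers;
        for (unsigned c = 0; c < n; ++c) {
            workers.emplace_back([c] {
                // worker body: e.g., align the reads assigned to this core
                std::printf("worker pinned to core %u\n", c);
            });
            if (!pin_to_core(workers.back(), static_cast<int>(c)))
                std::printf("pinning failed for core %u\n", c);
        }
        for (auto& w : workers) w.join();
    }

    On top of such pinning, a farm scheduler can keep routing a read to the thread whose cache already holds the relevant data, which is the kind of affinity effect the abstract refers to.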

  • M. Aldinucci, S. Campa, M. Danelutto, P. Kilpatrick, and M. Torquati, “Targeting Distributed Systems in FastFlow,” in Euro-Par 2012 Workshops, Proc. of the CoreGrid Workshop on Grids, Clouds and P2P Computing, 2013, pp. 47-56. doi:10.1007/978-3-642-36949-0_7
    [BibTeX] [Abstract] [Download PDF]

    FastFlow is a structured parallel programming framework targeting shared memory multi-core architectures. In this paper we introduce a FastFlow extension aimed at supporting networks of multi-core workstations as well. The extension supports the execution of FastFlow programs by coordinating — in a structured way — the fine grain parallel activities running on a single workstation. We discuss the design and the implementation of this extension presenting preliminary experimental results validating it on state-of-the-art networked multi-core nodes.

    @inproceedings{ff:distr:cgs:12,
      abstract = {FastFlow is a structured parallel programming framework targeting shared memory multi-core architectures. In this paper we introduce a FastFlow extension aimed at supporting networks of multi-core workstations as well. The extension supports the execution of FastFlow programs by coordinating -- in a structured way -- the fine grain parallel activities running on a single workstation. We discuss the design and the implementation of this extension presenting preliminary experimental results validating it on state-of-the-art networked multi-core nodes.},
      author = {Marco Aldinucci and Sonia Campa and Marco Danelutto and Peter Kilpatrick and Massimo Torquati},
      booktitle = {Euro-Par 2012 Workshops, Proc. of the CoreGrid Workshop on Grids, Clouds and P2P Computing},
      date-added = {2012-07-23 21:22:03 +0000},
      date-modified = {2015-09-27 12:47:54 +0000},
      doi = {10.1007/978-3-642-36949-0_7},
      keywords = {fastflow, paraphrase},
      pages = {47-56},
      publisher = {Springer},
      series = {LNCS},
      title = {Targeting Distributed Systems in FastFlow},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_distr_ff_cgsymph.pdf},
      volume = {7640},
      year = {2013},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_distr_ff_cgsymph.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-36949-0_7}
    }

2012

  • M. Aldinucci, C. Spampinato, M. Drocco, M. Torquati, and S. Palazzo, “A Parallel Edge Preserving Algorithm for Salt and Pepper Image Denoising,” in Proc. of 2nd Intl. Conference on Image Processing Theory Tools and Applications (IPTA), Istanbul, Turkey, 2012, pp. 97-102. doi:10.1109/IPTA.2012.6469567
    [BibTeX] [Abstract] [Download PDF]

    In this paper a two-phase filter for removing “salt and pepper” noise is proposed. In the first phase, an adaptive median filter is used to identify the set of the noisy pixels; in the second phase, these pixels are restored according to a regularization method, which contains a data-fidelity term reflecting the impulse noise characteristics. The algorithm, which exhibits good performance both in denoising and in restoration, can be easily and effectively parallelized to exploit the full power of multi-core CPUs and GPGPUs; the proposed implementation based on the FastFlow library achieves both close-to-ideal speedup and very good wall-clock execution figures.
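
    A simplified sketch of the two-phase detect/restore scheme follows the BibTeX entry below.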

    @inproceedings{denoiser:ff:ipta:12,
      abstract = {In this paper a two-phase filter for removing ``salt and pepper'' noise is proposed. In the first phase, an adaptive median filter is used to identify the set of the noisy pixels; in the second phase, these pixels are restored according to a regularization method, which contains a data-fidelity term reflecting the impulse noise characteristics. The algorithm, which exhibits good performance both in denoising and in restoration, can be easily and effectively parallelized to exploit the full power of multi-core CPUs and GPGPUs; the proposed implementation based on the FastFlow library achieves both close-to-ideal speedup and very good wall-clock execution figures.},
      address = {Istanbul, Turkey},
      author = {Marco Aldinucci and Concetto Spampinato and Maurizio Drocco and Massimo Torquati and Simone Palazzo},
      booktitle = {Proc. of 2nd Intl. Conference on Image Processing Theory Tools and Applications (IPTA)},
      date-added = {2012-06-04 18:38:01 +0200},
      date-modified = {2015-09-27 12:53:53 +0000},
      doi = {10.1109/IPTA.2012.6469567},
      editor = {K. Djemal and M. Deriche and W. Puech and Osman N. Ucan},
      isbn = {978-1-4673-2582-0},
      keywords = {fastflow, impact},
      month = oct,
      pages = {97-102},
      publisher = {IEEE},
      title = {A Parallel Edge Preserving Algorithm for Salt and Pepper Image Denoising},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_2phasedenoiser_ff_ipta.pdf},
      year = {2012},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_2phasedenoiser_ff_ipta.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/IPTA.2012.6469567}
    }
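
    The two phases can be caricatured in a few lines of C++: flag impulse candidates, then restore only the flagged pixels. The sketch below simplifies both phases, replacing the adaptive median detector with a fixed-value impulse test and the regularisation-based restoration with a neighbourhood median; every name is invented. Both pixel loops are trivially parallel over rows, which is what the FastFlow-based implementation exploits.

    // Caricature of the two-phase filter: (1) flag impulse candidates,
    // (2) restore only flagged pixels. The paper detects with an *adaptive*
    // median filter and restores by minimising a regularisation functional;
    // here both phases are simplified and every name is invented.
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <vector>

    using Image = std::vector<std::vector<uint8_t>>;

    static uint8_t window_median(const Image& img, int y, int x, int r) {
        std::vector<uint8_t> w;
        const int H = img.size(), W = img[0].size();
        for (int dy = -r; dy <= r; ++dy)
            for (int dx = -r; dx <= r; ++dx)
                w.push_back(img[std::clamp(y + dy, 0, H - 1)]
                               [std::clamp(x + dx, 0, W - 1)]);
        std::nth_element(w.begin(), w.begin() + w.size() / 2, w.end());
        return w[w.size() / 2];
    }

    void denoise(Image& img) {
        const int H = img.size(), W = img[0].size();
        Image out = img;
        for (int y = 0; y < H; ++y)               // both loops are row-parallel
            for (int x = 0; x < W; ++x) {
                const uint8_t p = img[y][x];
                const bool suspect = (p == 0 || p == 255);        // phase 1
                if (suspect) out[y][x] = window_median(img, y, x, 1); // phase 2
            }
        img = std::move(out);
    }

    int main() {
        Image img(4, std::vector<uint8_t>(4, 100));
        img[1][2] = 255;                           // one "salt" pixel
        denoise(img);
        std::cout << "restored pixel = " << int(img[1][2]) << "\n"; // prints 100
    }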

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, “An Efficient Unbounded Lock-Free Queue for Multi-core Systems,” in Proc. of 18th Intl. Euro-Par 2012 Parallel Processing, Rhodes Island, Greece, 2012, pp. 662-673. doi:10.1007/978-3-642-32820-6_65
    [BibTeX] [Abstract] [Download PDF]

    The use of efficient synchronization mechanisms is crucial for implementing fine grained parallel programs on modern shared cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform and a sketch proof of correctness is presented. The queues proposed have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications.
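
    A textbook sketch of the kind of single-producer/single-consumer buffer the paper builds on follows the BibTeX entry below.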

    @inproceedings{ff:spsc:europar:12,
      abstract = {The use of efficient synchronization mechanisms is crucial for implementing fine grained parallel programs on modern shared cache multi-core architectures. In this paper we study this problem by considering Single-Producer/Single-Consumer (SPSC) coordination using unbounded queues. A novel unbounded SPSC algorithm capable of reducing the row synchronization latency and speeding up Producer-Consumer coordination is presented. The algorithm has been extensively tested on a shared-cache multi-core platform and a sketch proof of correctness is presented. The queues proposed have been used as basic building blocks to implement the FastFlow parallel framework, which has been demonstrated to offer very good performance for fine-grain parallel applications.},
      address = {Rhodes Island, Greece},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Massimiliano Meneghin and Massimo Torquati},
      booktitle = {Proc. of 18th Intl. Euro-Par 2012 Parallel Processing},
      date-added = {2011-04-19 10:22:00 +0200},
      date-modified = {2015-09-27 12:55:20 +0000},
      doi = {10.1007/978-3-642-32820-6_65},
      keywords = {fastflow, paraphrase},
      month = aug,
      pages = {662-673},
      publisher = {Springer},
      series = {LNCS},
      title = {An Efficient Unbounded Lock-Free Queue for Multi-core Systems},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_spsc_europar.pdf},
      volume = {7484},
      year = {2012},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_spsc_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-32820-6_65}
    }
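
    For flavour, here is a textbook bounded, Lamport-style SPSC ring written with C++11 atomics: the kind of single-producer/single-consumer buffer on which the paper builds. It is not FastFlow's code; the paper's contribution is an unbounded queue obtained by linking such segments through a memory pool, and the real implementation also caches the indices to avoid most cross-core loads.

    // Bounded wait-free SPSC ring buffer (C++11 atomics). Textbook sketch
    // in the spirit of the buffers the paper builds on, not FastFlow's code.
    #include <array>
    #include <atomic>
    #include <cstddef>

    template <typename T, std::size_t N>           // N must be a power of two
    class SpscRing {
        std::array<T, N> buf_;
        std::atomic<std::size_t> head_{0};         // advanced by the consumer
        std::atomic<std::size_t> tail_{0};         // advanced by the producer
    public:
        bool push(const T& v) {                    // called by the producer only
            const std::size_t t = tail_.load(std::memory_order_relaxed);
            if (t - head_.load(std::memory_order_acquire) == N)
                return false;                      // full
            buf_[t & (N - 1)] = v;
            tail_.store(t + 1, std::memory_order_release);
            return true;
        }
        bool pop(T& v) {                           // called by the consumer only
            const std::size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return false;                      // empty
            v = buf_[h & (N - 1)];
            head_.store(h + 1, std::memory_order_release);
            return true;
        }
    };

    int main() {
        SpscRing<int, 8> q;
        q.push(42);
        int v = 0;
        q.pop(v);
        return v == 42 ? 0 : 1;
    }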

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, and M. Torquati, “Targeting heterogeneous architectures via macro data flow,” Parallel Processing Letters, vol. 22, iss. 2, 2012. doi:10.1142/S0129626412400063
    [BibTeX] [Abstract] [Download PDF]

    We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run time system may be the target of both structured parallel applications developed using algorithmic skeletons/parallel design patterns and also more “domain specific” programming models. Experimental results demonstrating the feasibility of the approach are presented.
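
    A toy macro-data-flow executor illustrating the run-time idea follows the BibTeX entry below.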

    @article{mdf:hplgpu:ppl:12,
      abstract = {We propose a data flow based run time system as an efficient tool for supporting execution of parallel code on heterogeneous architectures hosting both multicore CPUs and GPUs. We discuss how the proposed run time system may be the target of both structured parallel applications developed using algorithmic skeletons/parallel design patterns and also more ``domain specific'' programming models. Experimental results demonstrating the feasibility of the approach are presented.},
      annote = {Extended version of Intl. Workshop on High-level Programming for Heterogeneous and Hierarchical Parallel Systems (HLPGPU)},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Massimo Torquati},
      date-added = {2012-04-25 13:20:40 +0000},
      date-modified = {2015-09-27 12:55:11 +0000},
      doi = {10.1142/S0129626412400063},
      issn = {0129-6264},
      journal = {Parallel Processing Letters},
      keywords = {fastflow, paraphrase},
      month = jun,
      number = {2},
      title = {Targeting heterogeneous architectures via macro data flow},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_mdf_PPL-hplgpu.pdf},
      volume = {22},
      year = {2012},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_mdf_PPL-hplgpu.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1142/S0129626412400063}
    }
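
    The run-time idea, stripped to its bones, is a graph of "macro instructions" whose nodes fire when all inputs are available and whose completions enable their successors. The toy executor below is only a sketch of that idea under invented names; it is not the paper's FastFlow-based interpreter, which adds lock-free plumbing and heterogeneous (CPU/GPU) targets.

    // Toy macro-data-flow executor: a "macro instruction" fires once all of
    // its inputs are available; on completion it enables its successors.
    #include <atomic>
    #include <condition_variable>
    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    struct Task {
        std::function<void()> body;
        std::atomic<int> missing_inputs{0};
        std::vector<Task*> successors;         // tasks fed by this task's output
    };

    class MdfExecutor {
        std::queue<Task*> ready_;
        std::mutex m_;
        std::condition_variable cv_;
        std::atomic<int> pending_{0};

        void enqueue(Task* t) {
            { std::lock_guard<std::mutex> l(m_); ready_.push(t); }
            cv_.notify_one();
        }
    public:
        void submit(Task* t) {                 // register one macro instruction
            ++pending_;
            if (t->missing_inputs.load() == 0) enqueue(t);
        }
        void run(int n_workers) {
            std::vector<std::thread> ws;
            for (int i = 0; i < n_workers; ++i) ws.emplace_back([this] {
                for (;;) {
                    Task* t;
                    {
                        std::unique_lock<std::mutex> l(m_);
                        cv_.wait(l, [this] { return !ready_.empty() || pending_ == 0; });
                        if (pending_ == 0) return;        // graph fully executed
                        t = ready_.front(); ready_.pop();
                    }
                    t->body();                            // fire the instruction
                    for (Task* s : t->successors)         // an input token arrives
                        if (--s->missing_inputs == 0) enqueue(s);
                    if (--pending_ == 0) {                // last one: wake everybody
                        std::lock_guard<std::mutex> l(m_);
                        cv_.notify_all();
                    }
                }
            });
            for (auto& w : ws) w.join();
        }
    };

    int main() {                               // diamond-free toy graph: (a, b) -> c
        Task a, b, c;
        a.body = [] { std::cout << "a\n"; };
        b.body = [] { std::cout << "b\n"; };
        c.body = [] { std::cout << "c (after a and b)\n"; };
        c.missing_inputs = 2;
        a.successors = {&c};
        b.successors = {&c};
        MdfExecutor ex;
        ex.submit(&a); ex.submit(&b); ex.submit(&c);
        ex.run(4);
    }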

  • M. Aldinucci, M. Danelutto, L. Anardu, M. Torquati, and P. Kilpatrick, “Parallel patterns + Macro Data Flow for multi-core programming,” in Proc. of Intl. Euromicro PDP 2012: Parallel Distributed and network-based Processing, Garching, Germany, 2012, pp. 27-36. doi:10.1109/PDP.2012.44
    [BibTeX] [Abstract] [Download PDF]

    Data flow techniques have been around since the early ’70s when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow “macro” instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting “object” code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing efficiency of the proposed approach with those achieved using other, more classical, parallel frameworks are also presented.

    @inproceedings{dataflow:pdp:12,
      abstract = {Data flow techniques have been around since the early '70s when they were used in compilers for sequential languages. Shortly after their introduction they were also considered as a possible model for parallel computing, although the impact here was limited. Recently, however, data flow has been identified as a candidate for efficient implementation of various programming models on multi-core architectures. In most cases, however, the burden of determining data flow ``macro'' instructions is left to the programmer, while the compiler/run time system manages only the efficient scheduling of these instructions. We discuss a structured parallel programming approach supporting automatic compilation of programs to macro data flow and we show experimental results demonstrating the feasibility of the approach and the efficiency of the resulting ``object'' code on different classes of state-of-the-art multi-core architectures. The experimental results use different base mechanisms to implement the macro data flow run time support, from plain pthreads with condition variables to more modern and effective lock- and fence-free parallel frameworks. Experimental results comparing efficiency of the proposed approach with those achieved using other, more classical, parallel frameworks are also presented.},
      address = {Garching, Germany},
      author = {Marco Aldinucci and Marco Danelutto and Lorenzo Anardu and Massimo Torquati and Peter Kilpatrick},
      booktitle = {Proc. of Intl. Euromicro PDP 2012: Parallel Distributed and network-based Processing},
      date-added = {2012-10-24 17:29:14 +0000},
      date-modified = {2013-11-24 00:35:34 +0000},
      doi = {10.1109/PDP.2012.44},
      keywords = {fastflow},
      month = feb,
      pages = {27-36},
      publisher = {IEEE},
      title = {Parallel patterns + Macro Data Flow for multi-core programming},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_mdf_PDP.pdf},
      year = {2012},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_mdf_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2012.44}
    }

  • T. Weigold, M. Aldinucci, M. Danelutto, and V. Getov, “Process-Driven Biometric Identification by means of Autonomic Grid Components,” Int. J. of Autonomous and Adaptive Communications Systems, vol. 5, iss. 3, pp. 274-291, 2012. doi:10.1504/IJAACS.2012.047659
    [BibTeX] [Abstract] [Download PDF]

    Today’s business applications are increasingly process driven, meaning that the main application logic is executed by a dedicated process engine. In addition, component-oriented software development has been attracting attention for building complex distributed applications. In this paper we present the experiences gained from building a process-driven biometric identification application that makes use of Grid infrastructures via the Grid Component Model (GCM). GCM, besides guaranteeing access to Grid resources, supports autonomic management of notable parallel composite components. This feature is exploited within our biometric identification application to ensure real time identification of fingerprints. Therefore, we briefly introduce the GCM framework and the process engine used, and we describe the implementation of the application by means of autonomic GCM components. Finally, we summarize the results, experiences, and lessons learned focusing on the integration of autonomic GCM components and the process-driven approach.

    @article{ibm:ijaacs:12,
      abstract = {Today's business applications are increasingly process driven, meaning that the main application logic is executed by a dedicated process engine. In addition, component-oriented software development has been attracting attention for building complex distributed applications. In this paper we present the experiences gained from building a process-driven biometric identification application that makes use of Grid infrastructures via the Grid Component Model (GCM). GCM, besides guaranteeing access to Grid resources, supports autonomic management of notable parallel composite components. This feature is exploited within our biometric identification application to ensure real time identification of fingerprints. Therefore, we briefly introduce the GCM framework and the process engine used, and we describe the implementation of the application by means of autonomic GCM components. Finally, we summarize the results, experiences, and lessons learned focusing on the integration of autonomic GCM components and the process-driven approach.},
      author = {Thomas Weigold and Marco Aldinucci and Marco Danelutto and Vladimir Getov},
      date-added = {2009-08-01 21:01:36 +0200},
      date-modified = {2013-06-17 14:14:36 +0000},
      doi = {10.1504/IJAACS.2012.047659},
      issn = {1754-8632},
      journal = {Int. J. of Autonomous and Adaptive Communications Systems},
      number = {3},
      pages = {274-291},
      publisher = {Inderscience Enterprises Ltd.},
      title = {Process-Driven Biometric Identification by means of Autonomic Grid Components},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_JAACS_Weigold.pdf},
      volume = {5},
      year = {2012},
      bdsk-url-1 = {http://www.inderscience.com/info/inarticletoc.php?jcode=ijaacs&year=2012&vol=5&issue=3},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2012_JAACS_Weigold.pdf},
      bdsk-url-3 = {http://dx.doi.org/10.1504/IJAACS.2012.047659}
    }

  • F. Tordini, M. Aldinucci, and M. Torquati, “High-level lock-less programming for multicore,” in Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) — Poster Abstracts, Fiuggi, Italy, 2012.
    [BibTeX] [Abstract] [Download PDF]

    Modern computers are built upon multi-core architectures. Achieving peak performance on these architectures is hard and may require a substantial programming effort. The synchronisation of many processes racing to access a common resource (the shared memory) has been a fundamental problem in parallel computing for years, and many solutions have been proposed to address this issue. Non-blocking synchronisation and transactional primitives have been envisioned as a way to reduce the memory wall problem. Though sometimes effective (and exhibiting great momentum in the research community), they are only one facet of the problem, as their exploitation still requires non-trivial programming skills. With the non-blocking philosophy in mind, we propose high-level programming patterns that will relieve the programmer from worrying about low-level details such as the synchronisation of racing processes, as well as the fine tunings needed to improve the overall performance, like proper (distributed) dynamic memory allocation and effective exploitation of the memory hierarchy.

    @inproceedings{ff:acaces:12,
      abstract = {Modern computers are built upon multi-core architectures. Achieving peak performance on these architectures is hard and may require a substantial programming effort. The synchronisation of many processes racing to access a common resource (the shared memory) has been a fundamental problem in parallel computing for years, and many solutions have been proposed to address this issue. Non-blocking synchronisation and transactional primitives have been envisioned as a way to reduce the memory wall problem. Though sometimes effective (and exhibiting great momentum in the research community), they are only one facet of the problem, as their exploitation still requires non-trivial programming skills. With the non-blocking philosophy in mind, we propose high-level programming patterns that will relieve the programmer from worrying about low-level details such as the synchronisation of racing processes, as well as the fine tunings needed to improve the overall performance, like proper (distributed) dynamic memory allocation and effective exploitation of the memory hierarchy.},
      address = {Fiuggi, Italy},
      author = {Fabio Tordini and Marco Aldinucci and Massimo Torquati},
      booktitle = {Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (ACACES) -- Poster Abstracts},
      date-added = {2012-07-17 17:58:06 +0200},
      date-modified = {2013-11-24 00:36:10 +0000},
      isbn = {9789038219875},
      keywords = {fastflow},
      publisher = {HiPEAC},
      title = {High-level lock-less programming for multicore},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_ACACES_ex-abstract.pdf},
      year = {2012},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_ACACES_ex-abstract.pdf}
    }

  • M. Coppo, F. Damiani, M. Drocco, E. Grassi, E. Sciacca, S. Spinella, and A. Troina, “Simulation techniques for the calculus of wrapped compartments,” Theoretical Computer Science, vol. 431, pp. 75-95, 2012. doi:10.1016/j.tcs.2011.12.063
    [BibTeX] [Abstract]

    The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of Ordinary Differential Equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyse the behaviour of biological systems. These calculi allow to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. Actually, in many situations it is necessary to use stochastic models. For example when the system to be described is based on the interaction of few molecules, when we are in the presence of a chemical instability, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in the Calculus of Wrapped Compartments (CWC), a calculus on which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules. A minimal Gillespie-style stochastic step, the stochastic half of such a hybrid scheme, is sketched after this entry.

    @article{DBLP:journals/tcs/CoppoDDGSST12,
      abstract = {The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of Ordinary Differential Equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyse the behaviour of biological systems. These calculi allow to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. Actually, in many situations it is necessary to use stochastic models. For example when the system to be described is based on the interaction of few molecules, when we are in the presence of a chemical instability, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in the Calculus of Wrapped Compartments (CWC), a calculus on which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules.},
      author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Eva Sciacca and Salvatore Spinella and Angelo Troina},
      bibsource = {DBLP, http://dblp.uni-trier.de},
      date-added = {2013-12-12 22:28:07 +0000},
      date-modified = {2013-12-13 10:37:47 +0000},
      doi = {10.1016/j.tcs.2011.12.063},
      ee = {http://dx.doi.org/10.1016/j.tcs.2011.12.063},
      journal = {Theoretical Computer Science},
      pages = {75-95},
      title = {Simulation techniques for the calculus of wrapped compartments},
      volume = {431},
      year = {2012},
      bdsk-url-1 = {http://dx.doi.org/10.1016/j.tcs.2011.12.063}
    }
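
    The stochastic half of such a hybrid scheme is typically a Gillespie-style simulation. A minimal direct-method loop on an invented two-species toy system (the paper couples steps like this with ODE integration for the stable part of the network) looks like:

    // Direct-method Gillespie loop: the stochastic half of the hybrid scheme.
    // The reactions and rates are toy assumptions invented for this sketch.
    #include <cmath>
    #include <iostream>
    #include <random>

    int main() {
        std::mt19937 rng(7);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        double t = 0.0;
        const double t_end = 10.0;
        long A = 100, B = 0;                   // toy system: A -> B and B -> A
        const double k1 = 0.3, k2 = 0.1;
        while (t < t_end) {
            const double a1 = k1 * A, a2 = k2 * B;  // reaction propensities
            const double a0 = a1 + a2;
            if (a0 <= 0) break;                // nothing can fire
            t += -std::log(u(rng)) / a0;       // exponential waiting time
            if (u(rng) * a0 < a1) { --A; ++B; }  // pick reaction r with prob a_r/a0
            else                  { ++A; --B; }
        }
        std::cout << "t=" << t << " A=" << A << " B=" << B << "\n";
    }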

  • M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, E. Sciacca, S. Spinella, M. Torquati, and A. Troina, “On Parallelizing On-Line Statistics for Stochastic Biological Simulations,” in Euro-Par 2011 Workshops, Proc. of the 2nd Workshop on High Performance Bioinformatics and Biomedicine (HiBB), Bordeaux, France, 2012, pp. 3-12. doi:10.1007/978-3-642-29740-3_2
    [BibTeX] [Abstract] [Download PDF]

    This work concerns a general technique to enrich parallel versions of stochastic simulators for biological systems with tools for on-line statistical analysis of the results. In particular, within the FastFlow parallel programming framework, we describe the methodology and the implementation of a parallel Monte Carlo simulation infrastructure extended with user-defined on-line data filtering and mining functions. The simulator and the on-line analysis were validated on large multi-core platforms and representative proof-of-concept biological systems. A sketch of mergeable single-pass statistics of the kind pluggable into such a pipeline follows this entry.

    @inproceedings{cwcsim:onlinestats:ff:hibb:11,
      abstract = {This work concerns a general technique to enrich parallel versions of stochastic simulators for biological systems with tools for on-line statistical analysis of the results. In particular, within the FastFlow parallel programming framework, we describe the methodology and the implementation of a parallel Monte Carlo simulation infrastructure extended with user-defined on-line data filtering and mining functions. The simulator and the on-line analysis were validated on large multi-core platforms and representative proof-of-concept biological systems.},
      address = {Bordeaux, France},
      author = {Marco Aldinucci and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Eva Sciacca and Salvatore Spinella and Massimo Torquati and Angelo Troina},
      booktitle = {Euro-Par 2011 Workshops, Proc. of the 2nd Workshop on High Performance Bioinformatics and Biomedicine (HiBB)},
      date-added = {2010-08-15 00:50:09 +0200},
      date-modified = {2013-11-24 00:35:51 +0000},
      doi = {10.1007/978-3-642-29740-3_2},
      editor = {Michael Alexander and Pasqua D'Ambra and Adam Belloum and George Bosilca and Mario Cannataro and Marco Danelutto and Beniamino Di Martino and Michael Gerndt and Emmanuel Jeannot and Raymond Namyst and Jean Roman and Stephen L. Scott and Jesper Larsson Tr{\"a}ff and Geoffroy Vall{\'e}e and Josef Weidendorfer},
      keywords = {bioinformatics, fastflow},
      pages = {3-12},
      publisher = {Springer},
      series = {LNCS},
      title = {On Parallelizing On-Line Statistics for Stochastic Biological Simulations},
      url = {http://calvados.di.unipi.it/storage/paper_files/2012_onlinestat_HiBB2011.pdf},
      volume = {7156},
      year = {2012},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2012_onlinestat_HiBB2011.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-29740-3_2}
    }
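
    The user-defined on-line statistics mentioned above must be single-pass and cheap to merge across parallel simulation instances. A self-contained sketch under invented names (Welford's running update per worker, combined with the standard pairwise-merge formula) is:

    // Single-pass mean/variance per worker, merged at the end: the kind of
    // streaming statistic that can be plugged into a parallel simulation
    // pipeline. All names here are invented for the sketch.
    #include <iostream>
    #include <random>
    #include <thread>
    #include <vector>

    struct RunningStat {
        long n = 0;
        double mean = 0.0, m2 = 0.0;
        void add(double x) {                   // Welford's on-line update
            ++n;
            double d = x - mean;
            mean += d / n;
            m2 += d * (x - mean);
        }
        void merge(const RunningStat& o) {     // combine two partial statistics
            if (o.n == 0) return;
            const double d = o.mean - mean;
            const long tot = n + o.n;
            m2 += o.m2 + d * d * double(n) * double(o.n) / tot;
            mean += d * o.n / tot;
            n = tot;
        }
        double variance() const { return n > 1 ? m2 / (n - 1) : 0.0; }
    };

    int main() {
        const int W = 4;                       // one statistic per sim worker
        std::vector<RunningStat> part(W);
        std::vector<std::thread> ws;
        for (int w = 0; w < W; ++w)
            ws.emplace_back([&part, w] {
                std::mt19937 rng(w);
                std::normal_distribution<double> sample(5.0, 2.0);
                for (int i = 0; i < 250000; ++i) part[w].add(sample(rng));
            });
        for (auto& t : ws) t.join();
        RunningStat total;
        for (auto& p : part) total.merge(p);   // pairwise reduction
        std::cout << "mean=" << total.mean << " var=" << total.variance() << "\n";
    }

    Because each partial statistic is constant-size, the final reduction stays cheap no matter how many trajectories are streamed.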

2011

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, and V. Xhagjika, “LIBERO: a framework for autonomic management of multiple non-functional concerns,” in Euro-Par 2010 Workshops, Proc. of the CoreGrid Workshop on Grids, Clouds and P2P Computing, Ischia, Italy, 2011, pp. 237-245. doi:10.1007/978-3-642-21878-1_30
    [BibTeX] [Abstract] [Download PDF]

    We describe a lightweight prototype framework (LIBERO) designed for experimentation with behavioural skeletons: components implementing a well-known parallelism exploitation pattern and a rule-based autonomic manager taking care of some non-functional feature related to pattern computation. LIBERO supports multiple autonomic managers within the same behavioural skeleton, each taking care of a different non-functional concern. We introduce LIBERO — built on plain Java and JBoss — and discuss how multiple managers may be coordinated to achieve a common goal using a two-phase coordination protocol developed in earlier work. We present experimental results that demonstrate how the prototype may be used to investigate autonomic management of multiple, independent concerns.

    @inproceedings{libero:cgsymph:10,
      abstract = {We describe a lightweight prototype framework (LIBERO) designed for experimentation with behavioural skeletons: components implementing a well-known parallelism exploitation pattern and a rule-based autonomic manager taking care of some non-functional feature related to pattern computation. LIBERO supports multiple autonomic managers within the same behavioural skeleton, each taking care of a different non-functional concern. We introduce LIBERO -- built on plain Java and JBoss -- and discuss how multiple managers may be coordinated to achieve a common goal using a two-phase coordination protocol developed in earlier work. We present experimental results that demonstrate how the prototype may be used to investigate autonomic management of multiple, independent concerns.},
      address = {Ischia, Italy},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Vamir Xhagjika},
      booktitle = {Euro-Par 2010 Workshops, Proc. of the CoreGrid Workshop on Grids, Clouds and P2P Computing},
      date-added = {2011-09-12 14:58:27 +0200},
      date-modified = {2012-12-27 14:26:15 +0000},
      doi = {10.1007/978-3-642-21878-1_30},
      editor = {M. R. Guarracino and F. Vivien and J. L. Tr\"aff and M. Cannataro and M. Danelutto and A. Hast and F. Perla and A. Kn\"upfer and B. Di Martino and M. Alexander},
      month = sep,
      pages = {237-245},
      publisher = {Springer},
      series = {LNCS},
      title = {LIBERO: a framework for autonomic management of multiple non-functional concerns},
      url = {http://calvados.di.unipi.it/storage/paper_files/2011_libero_coregridworkshop2010.pdf},
      volume = {6586},
      year = {2011},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2011_libero_coregridworkshop2010.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-21878-1_30}
    }
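
    The rule-based autonomic management described above can be illustrated in miniature. The following sketch, in plain C++11 rather than LIBERO's Java/JBoss stack, shows a monitor-analyse-plan-execute loop driven by a single throughput rule that adapts the parallelism degree of a toy farm; the goal rate, tick length, thresholds and all identifiers are invented for the example.

    #include <atomic>
    #include <chrono>
    #include <iostream>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int  max_workers = 8;
        constexpr long goal = 400;                 // hypothetical QoS contract: tasks/second
        std::atomic<long> processed{0};
        std::atomic<int>  degree{1};               // current parallelism degree
        std::atomic<bool> stop{false};

        // Farm: only the first `degree` workers actually compute; the rest are parked.
        std::vector<std::thread> pool;
        for (int id = 0; id < max_workers; ++id)
            pool.emplace_back([&, id] {
                while (!stop) {
                    if (id < degree.load()) {
                        std::this_thread::sleep_for(std::chrono::milliseconds(10)); // stand-in task
                        ++processed;
                    } else {
                        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // parked
                    }
                }
            });

        // Manager: monitor throughput once per tick, then apply the adaptation rules.
        long last = 0;
        for (int tick = 0; tick < 5; ++tick) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            long now = processed.load(), rate = now - last;
            last = now;
            if (rate < goal && degree.load() < max_workers) ++degree;   // missing the goal: grow
            else if (rate > goal * 1.5 && degree.load() > 1) --degree;  // overshooting: shrink
            std::cout << "rate=" << rate << " degree=" << degree.load() << "\n";
        }
        stop = true;
        for (auto& t : pool) t.join();
    }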

  • C. Calcagno, M. Coppo, F. Damiani, M. Drocco, E. Sciacca, S. Spinella, and A. Troina, “Modelling Spatial Interactions in the Arbuscular Mycorrhizal Symbiosis using the Calculus of Wrapped Compartments,” in Proc. of Third International Workshop on Computational Models for Cell Processes (CompMod), Aachen, Germany, 2011, pp. 3-18.
    [BibTeX] [Abstract]

    Arbuscular mycorrhiza (AM) is the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical productions used nowadays to produce artificial fertilizers. In previous work we used the Calculus of Wrapped Compartments (CWC) to investigate different phases of the AM symbiosis. In this paper, we continue this line of research by modelling the colonisation of the plant root cells by the fungal hyphae spreading in the soil. This study requires the description of some spatial interaction. Although CWC has no explicit feature modelling a spatial geometry, the compartment labelling feature can be effectively exploited to define a discrete surface topology outlining the relevant sectors which determine the spatial properties of the system under consideration. Different situations and interesting spatial properties can be modelled and analysed in such a lightweight framework (which does not have an explicit notion of geometry with coordinates and spatial metrics), thus exploiting the existing CWC simulation tool.

    @inproceedings{DBLP:journals/corr/abs-1109-1363,
      abstract = {Arbuscular mycorrhiza (AM) is the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical productions used nowadays to produce artificial fertilizers. In previous work we used the Calculus of Wrapped Compartments (CWC) to investigate different phases of the AM symbiosis. In this paper, we continue this line of research by modelling the colonisation of the plant root cells by the fungal hyphae spreading in the soil. This study requires the description of some spatial interaction. Although CWC has no explicit feature modelling a spatial geometry, the compartment labelling feature can be effectively exploited to define a discrete surface topology outlining the relevant sectors which determine the spatial properties of the system under consideration. Different situations and interesting spatial properties can be modelled and analysed in such a lightweight framework (which does not have an explicit notion of geometry with coordinates and spatial metrics), thus exploiting the existing CWC simulation tool.},
      address = {Aachen, Germany},
      author = {Cristina Calcagno and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Eva Sciacca and Salvatore Spinella and Angelo Troina},
      bibsource = {DBLP, http://dblp.uni-trier.de},
      booktitle = {Proc. of Third International Workshop on Computational Models for Cell Processes (CompMod)},
      date-added = {2013-12-12 22:25:03 +0000},
      date-modified = {2013-12-13 10:32:18 +0000},
      editor = {Ion Petre and Erik P. de Vink},
      ee = {http://dx.doi.org/10.4204/EPTCS.67.3},
      month = sep,
      pages = {3-18},
      series = {EPTCS},
      title = {Modelling Spatial Interactions in the Arbuscular Mycorrhizal Symbiosis using the Calculus of Wrapped Compartments},
      volume = {67},
      year = {2011}
    }

  • M. Aldinucci, A. Bracciali, P. Liò, A. Sorathiya, and M. Torquati, “StochKit-FF: Efficient Systems Biology on Multicore Architectures,” in Euro-Par 2010 Workshops, Proc. of the 1st Workshop on High Performance Bioinformatics and Biomedicine (HiBB), Ischia, Italy, 2011, pp. 167-175. doi:10.1007/978-3-642-21878-1_21
    [BibTeX] [Abstract] [Download PDF]

    The stochastic modelling of biological systems is an informative and, in some cases, very adequate technique, which may however turn out to be more expensive than other modelling approaches, such as differential equations. We present StochKit-FF, a parallel version of StochKit, a reference toolkit for stochastic simulations. StochKit-FF is based on the FastFlow programming toolkit for multicores and exploits the novel concept of selective memory. We experiment with StochKit-FF on a model of HIV infection dynamics, with the aim of extracting information from efficiently run experiments, here in terms of average and variance and, in the longer term, of more structured data. A toy sketch of this replica-level parallelism with on-the-fly statistics merging follows this entry.

    @inproceedings{stochkit-ff:hibb:10,
      abstract = {The stochastic modelling of biological systems is an informative and, in some cases, very adequate technique, which may however turn out to be more expensive than other modelling approaches, such as differential equations. We present StochKit-FF, a parallel version of StochKit, a reference toolkit for stochastic simulations. StochKit-FF is based on the FastFlow programming toolkit for multicores and exploits the novel concept of selective memory. We experiment with StochKit-FF on a model of HIV infection dynamics, with the aim of extracting information from efficiently run experiments, here in terms of average and variance and, in the longer term, of more structured data.},
      address = {Ischia, Italy},
      author = {Marco Aldinucci and Andrea Bracciali and Pietro Li\`o and Anil Sorathiya and Massimo Torquati},
      booktitle = {Euro-Par 2010 Workshops, Proc. of the 1st Workshop on High Performance Bioinformatics and Biomedicine (HiBB)},
      date-added = {2012-04-12 11:23:46 +0000},
      date-modified = {2013-11-24 00:36:38 +0000},
      doi = {10.1007/978-3-642-21878-1_21},
      editor = {M. R. Guarracino and F. Vivien and J. L. Tr\"aff and M. Cannataro and M. Danelutto and A. Hast and F. Perla and A. Kn\"upfer and B. Di Martino and M. Alexander},
      keywords = {bioinformatics},
      month = aug,
      pages = {167-175},
      publisher = {Springer},
      series = {{LNCS}},
      title = {{StochKit-FF}: Efficient Systems Biology on Multicore Architectures},
      url = {http://calvados.di.unipi.it/storage/paper_files/2010_stochkit-ff_hibb.pdf},
      volume = {6586},
      year = {2011},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2010_stochkit-ff_hibb.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-21878-1_21}
    }
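
    The "selective memory" idea (keeping aggregate statistics rather than whole trajectories) combines naturally with replica-level parallelism. The sketch below is a hedged toy version, not StochKit-FF code: each replica runs a stand-in stochastic process and accumulates a Welford mean/variance summary, and the summaries are merged pairwise with Chan et al.'s combination formula. The Gaussian random-walk "model" and all names are invented.

    #include <future>
    #include <iostream>
    #include <random>
    #include <vector>

    struct Stats { long n = 0; double mean = 0, M2 = 0; };      // Welford accumulator

    // One replica: a toy Gaussian random walk standing in for a stochastic trajectory.
    Stats simulate(unsigned seed, int steps) {
        std::mt19937 gen(seed);
        std::normal_distribution<double> step(0.0, 1.0);
        double x = 0; Stats s;
        for (int i = 0; i < steps; ++i) {
            x += step(gen);
            ++s.n;
            double d = x - s.mean;
            s.mean += d / s.n;
            s.M2 += d * (x - s.mean);
        }
        return s;
    }

    // Chan et al. pairwise combination of two running summaries.
    Stats merge(Stats a, Stats b) {
        Stats r; r.n = a.n + b.n;
        double d = b.mean - a.mean;
        r.mean = a.mean + d * b.n / r.n;
        r.M2 = a.M2 + b.M2 + d * d * a.n * b.n / r.n;
        return r;
    }

    int main() {
        std::vector<std::future<Stats>> reps;
        for (unsigned k = 0; k < 8; ++k)                        // one parallel task per replica
            reps.push_back(std::async(std::launch::async, simulate, k + 1, 100000));
        Stats total;                                            // reduce: no trajectory is kept
        for (auto& f : reps) total = merge(total, f.get());
        std::cout << "mean=" << total.mean
                  << " var=" << total.M2 / (total.n - 1) << "\n";
    }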

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, M. Meneghin, and M. Torquati, “Accelerating code on multi-cores with FastFlow,” in Proc. of 17th Intl. Euro-Par 2011 Parallel Processing, Bordeaux, France, 2011, pp. 170-181. doi:10.1007/978-3-642-23397-5_17
    [BibTeX] [Abstract] [Download PDF]

    FastFlow is a programming framework specifically targeting cache-coherent shared-memory multicores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. This paper presents a new FastFlow programming methodology aimed at supporting the parallelization of existing sequential code via offloading onto a dynamically created software accelerator. The new methodology has been validated using a set of simple micro-benchmarks and some real applications. The offloading pattern is sketched after this entry.

    @inproceedings{ff:acc:europar:11,
      abstract = {FastFlow is a programming framework specifically targeting cache-coherent shared-memory multicores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory fence free) synchronization mechanisms. Its philosophy is to combine programmability with performance. In this paper a new FastFlow programming methodology aimed at supporting parallelization of existing sequential code via offloading onto a dynamically created software accelerator is presented. The new methodology has been validated using a set of simple micro-benchmarks and some real applications.},
      address = {Bordeaux, France},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Massimiliano Meneghin and Massimo Torquati},
      booktitle = {Proc. of 17th Intl. Euro-Par 2011 Parallel Processing},
      date-added = {2012-06-04 18:35:57 +0200},
      date-modified = {2013-12-12 00:46:59 +0000},
      doi = {10.1007/978-3-642-23397-5_17},
      editor = {E. Jeannot and R. Namyst and J. Roman},
      keywords = {fastflow},
      month = aug,
      pages = {170-181},
      publisher = {Springer},
      series = {LNCS},
      title = {Accelerating code on multi-cores with FastFlow},
      url = {http://calvados.di.unipi.it/storage/paper_files/2011_fastflow_acc_europar.pdf},
      volume = {6853},
      year = {2011},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2011_fastflow_acc_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-23397-5_17}
    }
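
    A minimal sketch of the self-offloading pattern, under stated assumptions: FastFlow's accelerator is built on lock-free queues, whereas this illustration uses a mutex-protected queue for brevity, and the Accelerator class with its offload() method is a hypothetical name, not the FastFlow API. The original sequential loop stays in control and ships its body to spare cores; completion is awaited when the accelerator goes out of scope.

    #include <condition_variable>
    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class Accelerator {                                    // hypothetical name, not FastFlow API
        std::queue<std::function<void()>> q;
        std::mutex m;
        std::condition_variable cv;
        std::vector<std::thread> workers;
        bool done = false;
    public:
        explicit Accelerator(unsigned n) {
            for (unsigned i = 0; i < n; ++i)
                workers.emplace_back([this] {
                    for (;;) {
                        std::function<void()> task;
                        {
                            std::unique_lock<std::mutex> lk(m);
                            cv.wait(lk, [this] { return done || !q.empty(); });
                            if (q.empty()) return;         // drained and shut down
                            task = std::move(q.front()); q.pop();
                        }
                        task();                            // run offloaded work off the main thread
                    }
                });
        }
        void offload(std::function<void()> t) {            // non-blocking from the caller's view
            { std::lock_guard<std::mutex> lk(m); q.push(std::move(t)); }
            cv.notify_one();
        }
        ~Accelerator() {                                   // destruction waits for all offloaded work
            { std::lock_guard<std::mutex> lk(m); done = true; }
            cv.notify_all();
            for (auto& w : workers) w.join();
        }
    };

    int main() {
        std::vector<double> v(1000, 1.0);
        {
            Accelerator acc(4);                            // dynamically created accelerator
            for (std::size_t i = 0; i < v.size(); ++i)     // the original sequential loop...
                acc.offload([&v, i] { v[i] = v[i] * v[i] + 1; });  // ...now offloads its body
        }                                                  // scope exit == wait for completion
        std::cout << v[0] << "\n";                         // prints 2
    }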

  • M. Aldinucci, M. Coppo, F. Damiani, M. Drocco, M. Torquati, and A. Troina, “On Designing Multicore-Aware Simulators for Biological Systems,” in Proc. of Intl. Euromicro PDP 2011: Parallel Distributed and network-based Processing, Ayia Napa, Cyprus, 2011, pp. 318-325. doi:10.1109/PDP.2011.81
    [BibTeX] [Abstract] [Download PDF]

    The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, which may however turn out to be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas or runs derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC (Calculus of Wrapped Compartments) simulator, carried out by way of the FastFlow programming framework, which makes fast development and efficient execution on multi-cores possible.

    @inproceedings{ff:cwc:pdp:11,
      abstract = {The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, which may however turn out to be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas or runs derived from a parameter sweep). The proposed solutions are tested on the parallelisation of the CWC (Calculus of Wrapped Compartments) simulator, carried out by way of the FastFlow programming framework, which makes fast development and efficient execution on multi-cores possible.},
      address = {Ayia Napa, Cyprus},
      author = {Marco Aldinucci and Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Massimo Torquati and Angelo Troina},
      booktitle = {Proc. of Intl. Euromicro PDP 2011: Parallel Distributed and network-based Processing},
      date-added = {2012-02-25 01:21:25 +0000},
      date-modified = {2013-11-24 00:37:16 +0000},
      doi = {10.1109/PDP.2011.81},
      editor = {Yiannis Cotronis and Marco Danelutto and George Angelos Papadopoulos},
      keywords = {fastflow},
      month = feb,
      pages = {318-325},
      publisher = {IEEE},
      title = {On Designing Multicore-Aware Simulators for Biological Systems},
      url = {http://calvados.di.unipi.it/storage/paper_files/2011_ff_cwc_sim_PDP.pdf},
      year = {2011},
      bdsk-url-1 = {http://arxiv.org/pdf/1010.2438v2},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2011_ff_cwc_sim_PDP.pdf},
      bdsk-url-3 = {http://dx.doi.org/10.1109/PDP.2011.81}
    }

  • M. Coppo, F. Damiani, M. Drocco, E. Grassi, M. Guether, and A. Troina, “Modelling Ammonium Transporters in Arbuscular Mycorrhiza Symbiosis,” Transactions on Computational Systems Biology (TCS), vol. 6575, iss. 13, pp. 85-109, 2011. doi:10.1007/978-3-642-19748-2_5
    [BibTeX] [Abstract]

    The Stochastic Calculus of Wrapped Compartments (SCWC) is a recently proposed variant of the Stochastic Calculus of Looping Sequences (SCLS), a language for the representation and simulation of biological systems. In this work we apply SCWC to model a newly discovered ammonium transporter. This transporter is believed to play a fundamental role for plant mineral acquisition, which takes place in the arbuscular mycorrhiza, the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical productions used nowadays to produce artificial fertilizers. In our experiments the passage of NH3/NH4+ from the fungus to the plant has been dissected in known and hypothetical mechanisms; with the model so far we have been able to simulate the behavior of the system under different conditions. Our simulations confirmed some of the latest experimental results about the LjAMT2;2 transporter. Moreover, by comparing the behaviour of LjAMT2;2 with the behaviour of another ammonium transporter which exists in plants, viz. LjAMT1;1, our simulations support an hypothesis about why LjAMT2;2 is so selectively expressed in arbusculated cells.

    @article{DBLP:journals/tcsb/Coppo/DDGGT11,
      abstract = {The Stochastic Calculus of Wrapped Compartments (SCWC) is a recently proposed variant of the Stochastic Calculus of Looping Sequences (SCLS), a language for the representation and simulation of biological systems. In this work we apply SCWC to model a newly discovered ammonium transporter. This transporter is believed to play a fundamental role for plant mineral acquisition, which takes place in the arbuscular mycorrhiza, the most wide-spread plant-fungus symbiosis on earth. Investigating this kind of symbiosis is considered one of the most promising ways to develop methods to nurture plants in more natural manners, avoiding the complex chemical productions used nowadays to produce artificial fertilizers. In our experiments the passage of NH3/NH4+ from the fungus to the plant has been dissected in known and hypothetical mechanisms; with the model so far we have been able to simulate the behavior of the system under different conditions.  Our simulations confirmed some of the latest experimental results about the LjAMT2;2 transporter. Moreover, by comparing the behaviour of LjAMT2;2 with the behaviour of another ammonium transporter which exists in plants, viz. LjAMT1;1, our simulations support an hypothesis about why LjAMT2;2 is so selectively expressed in arbusculated cells.},
      author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Mike Guether and Angelo Troina},
      date-added = {2013-12-12 22:25:24 +0000},
      date-modified = {2014-08-24 22:03:51 +0000},
      doi = {10.1007/978-3-642-19748-2_5},
      journal = {Transactions on Computational Systems Biology (TCS)},
      number = {13},
      pages = {85-109},
      title = {Modelling Ammonium Transporters in Arbuscular Mycorrhiza Symbiosis},
      volume = {6575},
      year = {2011},
      bdsk-url-1 = {http://dx.doi.org/10.1007/978-3-642-19748-2_5}
    }

2010

  • M. Aldinucci, S. Ruggieri, and M. Torquati, “Porting Decision Tree Algorithms to Multicore using FastFlow,” in Proc. of European Conference in Machine Learning and Knowledge Discovery in Databases (ECML PKDD), Barcelona, Spain, 2010, pp. 7-23. doi:10.1007/978-3-642-15880-3_7
    [BibTeX] [Abstract] [Download PDF]

    The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can only be exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to achieve up to a 7x speedup on an Intel dual-quad core machine. The attribute-level parallelism this exploits is sketched after this entry.

    @inproceedings{fastflow_c45:emclpkdd,
      abstract = {The whole computer hardware industry embraced multicores. For these machines, the extreme optimisation of sequential algorithms is no longer sufficient to squeeze the real machine power, which can be only exploited via thread-level parallelism. Decision tree algorithms exhibit natural concurrency that makes them suitable to be parallelised. This paper presents an approach for easy-yet-efficient porting of an implementation of the C4.5 algorithm on multicores. The parallel porting requires minimal changes to the original sequential code, and it is able to exploit up to 7X speedup on an Intel dual-quad core machine.},
      address = {Barcelona, Spain},
      author = {Marco Aldinucci and Salvatore Ruggieri and Massimo Torquati},
      booktitle = {Proc. of European Conference in Machine Learning and Knowledge Discovery in Databases (ECML PKDD)},
      date-added = {2010-06-15 21:03:56 +0200},
      date-modified = {2013-11-24 00:38:07 +0000},
      doi = {10.1007/978-3-642-15880-3_7},
      editor = {Jos{\'e} L. Balc{\'a}zar and Francesco Bonchi and Aristides Gionis and Mich{\`e}le Sebag},
      keywords = {fastflow},
      month = sep,
      pages = {7-23},
      publisher = {Springer},
      series = {LNCS},
      title = {Porting Decision Tree Algorithms to Multicore using {FastFlow}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2010_c45FF_ECMLPKDD.pdf},
      volume = {6321},
      year = {2010},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2010_c45FF_ECMLPKDD.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-15880-3_7}
    }
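
    The natural concurrency mentioned in the abstract is visible at a single tree node: the information gain of each candidate attribute is computed independently, so the evaluations can run as parallel tasks. The sketch below is an illustrative toy, not the paper's FastFlow port; the dataset, attribute encoding and all names are made up.

    #include <array>
    #include <cmath>
    #include <future>
    #include <iostream>
    #include <map>
    #include <vector>

    struct Row { std::array<int, 3> attr; int label; };    // 3 categorical attributes, binary class

    double entropy(int pos, int n) {
        if (n == 0 || pos == 0 || pos == n) return 0.0;
        double p = double(pos) / n;
        return -p * std::log2(p) - (1 - p) * std::log2(1 - p);
    }

    // Information gain of attribute a over the whole (toy) node.
    double gain(const std::vector<Row>& data, int a) {
        int pos = 0;
        std::map<int, std::pair<int, int>> part;           // attribute value -> (positives, count)
        for (const auto& r : data) {
            pos += r.label;
            auto& c = part[r.attr[a]];
            c.first += r.label; ++c.second;
        }
        double g = entropy(pos, (int)data.size());
        for (const auto& kv : part)                        // subtract weighted child entropies
            g -= double(kv.second.second) / data.size() * entropy(kv.second.first, kv.second.second);
        return g;
    }

    int main() {
        std::vector<Row> data = { {{0,1,0},1}, {{1,1,0},1}, {{0,0,1},0},
                                  {{1,0,1},0}, {{0,1,1},1}, {{1,0,0},0} };
        std::vector<std::future<double>> futs;
        for (int a = 0; a < 3; ++a)                        // one independent task per attribute
            futs.push_back(std::async(std::launch::async, gain, std::cref(data), a));
        int best = 0; double bestg = -1;
        for (int a = 0; a < 3; ++a) {                      // collect and pick the best split
            double g = futs[a].get();
            if (g > bestg) { bestg = g; best = a; }
        }
        std::cout << "split on attribute " << best << " (gain " << bestg << ")\n";
    }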

  • M. Coppo, F. Damiani, M. Drocco, E. Grassi, E. Sciacca, S. Spinella, and A. Troina, “Hybrid Calculus of Wrapped Compartments,” in Proc. of 4th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC), Jena, Germany, 2010, pp. 102-120.
    [BibTeX] [Abstract]

    The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of ordinary differential equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyze the behaviour of biological systems. These calculi allow one to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. Actually, in many situations it is necessary to use stochastic models. For example, when the system to be described is based on the interaction of few molecules, when we are in the presence of a chemical instability, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in CWC, a calculus on which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules. A toy illustration of the hybrid stochastic/ODE switching follows this entry.

    @inproceedings{DBLP:journals/corr/abs-1011-0494,
      abstract = {The modelling and analysis of biological systems has deep roots in Mathematics, specifically in the field of ordinary differential equations (ODEs). Alternative approaches based on formal calculi, often derived from process algebras or term rewriting systems, provide a quite complementary way to analyze the behaviour of biological systems. These calculi allow one to cope in a natural way with notions like compartments and membranes, which are not easy (sometimes impossible) to handle with purely numerical approaches, and are often based on stochastic simulation methods. Recently, it has also become evident that stochastic effects in regulatory networks play a crucial role in the analysis of such systems. Actually, in many situations it is necessary to use stochastic models. For example, when the system to be described is based on the interaction of few molecules, when we are in the presence of a chemical instability, or when we want to simulate the functioning of a pool of entities whose compartmentalised structure evolves dynamically. In contrast, stable metabolic networks, involving a large number of reagents, for which the computational cost of a stochastic simulation becomes an insurmountable obstacle, are efficiently modelled with ODEs. In this paper we define a hybrid simulation method, combining the stochastic approach with ODEs, for systems described in CWC, a calculus on which we can express the compartmentalisation of a biological system whose evolution is defined by a set of rewrite rules.},
      address = {Jena, Germany},
      author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Eva Sciacca and Salvatore Spinella and Angelo Troina},
      bibsource = {DBLP, http://dblp.uni-trier.de},
      booktitle = {Proc. of 4th Workshop on Membrane Computing and Biologically Inspired Process Calculi (MeCBIC)},
      date-added = {2013-12-12 22:24:23 +0000},
      date-modified = {2013-12-13 10:30:02 +0000},
      editor = {Gabriel Ciobanu and Maciej Koutny},
      ee = {http://dx.doi.org/10.4204/EPTCS.40.8},
      month = aug,
      pages = {102-120},
      series = {EPTCS},
      title = {Hybrid Calculus of Wrapped Compartments},
      volume = {40},
      year = {2010}
    }
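
    A toy illustration of the hybrid scheme for one birth-death species: exact Gillespie steps are taken while the population is small enough for fluctuations to matter, and a deterministic Euler step takes over once it is abundant. The rates, the switching threshold and the step size are invented for the example; real CWC models involve many rules and compartments.

    #include <cmath>
    #include <iostream>
    #include <random>

    int main() {
        std::mt19937 gen(42);
        std::uniform_real_distribution<double> u(0.0, 1.0);
        const double k_prod = 50.0, k_deg = 0.1;           // invented birth/death rates
        const double threshold = 100.0;                    // below this, fluctuations matter
        double x = 5.0, t = 0.0;                           // small initial population

        while (t < 100.0) {
            double a1 = k_prod, a2 = k_deg * x, a0 = a1 + a2;
            if (x < threshold) {                           // exact Gillespie direct-method step
                t += -std::log(u(gen)) / a0;               // exponential waiting time
                x += (u(gen) * a0 < a1) ? 1.0 : -1.0;      // choose birth or death
            } else {                                       // deterministic Euler step
                const double dt = 0.01;
                x += (k_prod - k_deg * x) * dt;
                t += dt;
            }
        }
        std::cout << "x(100) = " << x
                  << "  (ODE equilibrium: " << k_prod / k_deg << ")\n";
    }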

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Autonomic Management of Multiple Non-Functional Concerns in Behavioural Skeletons,” in Grids, P2P and Services Computing, F. Desprez, V. Getov, T. Priol, and R. Yahyapour, Eds., Springer, 2010, pp. 89-103. doi:10.1007/978-1-4419-6794-7_8
    [BibTeX] [Abstract] [Download PDF]

    We introduce and address the problem of concurrent autonomic management of different non-functional concerns in parallel applications built as a hierarchical composition of behavioural skeletons. We first define the problems arising when multiple concerns are dealt with by independent managers, then we propose a methodology supporting coordinated management, and finally we discuss how autonomic management of multiple concerns may be implemented in a typical use case. Being based on the behavioural skeleton concept proposed in the CoreGRID GCM, it is anticipated that the methodology will be readily integrated into the current reference implementation of GCM based on Java ProActive and running on top of major grid middleware systems.

    @incollection{multiple-nf-concern:cgsymph:09:book,
      abstract = {We introduce and address the problem of concurrent autonomic management of different non-functional concerns in parallel applications built as a hierarchical composition of behavioural skeletons. We first define the problems arising when multiple concerns are dealt with by independent managers, then we propose a methodology supporting coordinated management, and finally we discuss how autonomic management of multiple concerns may be implemented in a typical use case. Being based on the behavioural skeleton concept proposed in the CoreGRID GCM, it is anticipated that the methodology will be readily integrated into the current reference implementation of GCM based on Java ProActive and running on top of major grid middleware systems.},
      annote = {ISBN: 978-1-4419-6793-0(Proc. of the CoreGRID Symposium 2009)},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Grids, P2P and Services Computing},
      date-added = {2009-06-30 12:24:06 +0200},
      date-modified = {2012-02-25 00:39:47 +0000},
      doi = {10.1007/978-1-4419-6794-7_8},
      editor = {Fr\'ed\'eric Desprez and Vladimir Getov and Thierry Priol and Ramin Yahyapour},
      month = aug,
      pages = {89-103},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Autonomic Management of Multiple Non-Functional Concerns in Behavioural Skeletons},
      url = {http://calvados.di.unipi.it/storage/paper_files/2009_CGSymph_Autonomic_BeSke.pdf},
      year = {2010},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2009_CGSymph_Autonomic_BeSke.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-1-4419-6794-7_8}
    }

  • M. Aldinucci, A. Bracciali, and P. Liò, “Formal Synthetic Immunology,” ERCIM News, vol. 82, pp. 40-41, 2010.
    [BibTeX] [Abstract] [Download PDF]

    The human immune system fights pathogens using an articulated set of strategies whose function is to maintain the organism in health. A large effort to formally model such a complex system using a computational approach is currently underway, with the goal of developing a discipline for engineering "synthetic" immune responses. This requires the integration of a range of analysis techniques developed for formally reasoning about the behaviour of complex dynamical systems. Furthermore, a novel class of software tools has to be developed, capable of efficiently analysing these systems on widely accessible computing platforms, such as commodity multi-core architectures.

    @article{stochkitff:ercimnews:10,
      abstract = {The human immune system fights pathogens using an articulated set of strategies whose function is to maintain the organism in health. A large effort to formally model such a complex system using a computational approach is currently underway, with the goal of developing a discipline for engineering "synthetic" immune responses. This requires the integration of a range of analysis techniques developed for formally reasoning about the behaviour of complex dynamical systems. Furthermore, a novel class of software tools has to be developed, capable of efficiently analysing these systems on widely accessible computing platforms, such as commodity multi-core architectures.},
      author = {Marco Aldinucci and Andrea Bracciali and Pietro Li\`o},
      date-added = {2010-07-02 20:32:31 +0200},
      date-modified = {2013-11-24 00:38:19 +0000},
      issn = {0926-4981},
      journal = {ERCIM News},
      keywords = {bioinformatics, fastflow},
      month = jul,
      pages = {40-41},
      title = {Formal Synthetic Immunology},
      url = {http://ercim-news.ercim.eu/images/stories/EN82/EN82-web.pdf},
      volume = {82},
      year = {2010},
      bdsk-url-1 = {http://ercim-news.ercim.eu/images/stories/EN82/EN82-web.pdf}
    }

  • M. Coppo, F. Damiani, M. Drocco, E. Grassi, and A. Troina, “Stochastic Calculus of Wrapped Compartments,” in Proc. of the 8th Workshop on Quantitative Aspects of Programming Languages (QAPL), Paphos, Cyprus, 2010, pp. 82-98.
    [BibTeX] [Abstract]

    The Calculus of Wrapped Compartments (CWC) is a variant of the Calculus of Looping Sequences (CLS). While keeping the same expressiveness, CWC strongly simplifies the development of automatic tools for the analysis of biological systems. The main simplification consists in the removal of the sequencing operator, thus lightening the formal treatment of the patterns to be matched in a term (whose complexity in CLS is strongly affected by the variables matching in the sequences). We define a stochastic semantics for this new calculus. As an application we model the interaction between macrophages and apoptotic neutrophils and a mechanism of gene regulation in E. coli.

    @inproceedings{DBLP:journals/corr/abs-1006-5099,
      abstract = {The Calculus of Wrapped Compartments (CWC) is a variant of the Calculus of Looping Sequences (CLS). While keeping the same expressiveness, CWC strongly simplifies the development of automatic tools for the analysis of biological systems. The main simplification consists in the removal of the sequencing operator, thus lightening the formal treatment of the patterns to be matched in a term (whose complexity in CLS is strongly affected by the variables matching in the sequences).
    We define a stochastic semantics for this new calculus. As an application we model the interaction between macrophages and apoptotic neutrophils and a mechanism of gene regulation in E.Coli.},
      address = {Paphos, Cyprus},
      author = {Mario Coppo and Ferruccio Damiani and Maurizio Drocco and Elena Grassi and Angelo Troina},
      bibsource = {DBLP, http://dblp.uni-trier.de},
      booktitle = {Proc. of the 8th Workshop on Quantitative Aspects of Programming Languages (QAPL)},
      date-added = {2013-12-12 22:24:44 +0000},
      date-modified = {2013-12-13 10:31:45 +0000},
      editor = {Alessandra Di Pierro and Gethin Norman},
      ee = {http://dx.doi.org/10.4204/EPTCS.28.6},
      month = mar,
      pages = {82-98},
      series = {EPTCS},
      title = {Stochastic Calculus of Wrapped Compartments},
      volume = {28},
      year = {2010}
    }

  • M. Aldinucci, M. Meneghin, and M. Torquati, “Efficient Smith-Waterman on multi-core with FastFlow,” in Proc. of Intl. Euromicro PDP 2010: Parallel Distributed and network-based Processing, Pisa, Italy, 2010, pp. 195-199. doi:10.1109/PDP.2010.93
    [BibTeX] [Abstract] [Download PDF]

    Shared memory multiprocessors have returned to popularity thanks to the rapid spread of commodity multi-core architectures. However, little attention has been paid to supporting effective streaming applications on these architectures. In this paper we describe FastFlow, a low-level programming framework based on lock-free queues explicitly designed to support high-level languages for streaming applications. We compare FastFlow with state-of-the-art programming frameworks such as Cilk, OpenMP, and Intel TBB. We experimentally demonstrate that FastFlow is always more efficient than these on a real-world application: the speedup of FastFlow over the other solutions may be substantial for fine-grain tasks, for example +35% over OpenMP, +226% over Cilk, +96% over TBB for the alignment of protein P01111 against the UniProt DB using the Smith-Waterman algorithm. A sketch of the kind of lock-free queue at FastFlow's core follows this entry.

    @inproceedings{fastflow:pdp:10,
      abstract = {Shared memory multiprocessors have returned to popularity thanks to rapid spreading of commodity multi-core architectures. However, little attention has been paid to supporting effective streaming applications on these architectures. In this paper we describe FastFlow, a low-level programming framework based on lock-free queues explicitly designed to support high-level languages for streaming applications. We compare FastFlow with state-of-the-art programming frameworks such as Cilk, OpenMP, and Intel TBB. We experimentally demonstrate that FastFlow is always more efficient than them on a given real world application: the speedup of FastFlow over other solutions may be substantial for fine grain tasks, for example +35% over OpenMP, +226% over Cilk, +96% over TBB for the alignment of protein P01111 against UniProt DB using the Smith-Waterman algorithm.},
      address = {Pisa, Italy},
      author = {Marco Aldinucci and Massimiliano Meneghin and Massimo Torquati},
      booktitle = {Proc. of Intl. Euromicro PDP 2010: Parallel Distributed and network-based Processing},
      date-added = {2007-10-26 01:02:32 +0200},
      date-modified = {2013-11-24 00:38:51 +0000},
      doi = {10.1109/PDP.2010.93},
      editor = {Marco Danelutto and Tom Gross and Julien Bourgeois},
      keywords = {fastflow},
      month = feb,
      pages = {195-199},
      publisher = {IEEE},
      title = {Efficient {Smith-Waterman} on multi-core with FastFlow},
      url = {http://calvados.di.unipi.it/storage/paper_files/2010_fastflow_SW_PDP.pdf},
      year = {2010},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2010_fastflow_SW_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2010.93}
    }
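
    A sketch of the kind of single-producer/single-consumer lock-free ring buffer on which FastFlow builds its streaming networks. This is a textbook Lamport-style queue with C++11 atomics, written for clarity; the actual FastFlow buffer differs in several respects (e.g. cache-friendly layout and unbounded variants), so treat this as an assumption-laden illustration, not its implementation.

    #include <atomic>
    #include <cstddef>
    #include <iostream>
    #include <thread>
    #include <vector>

    template <typename T, std::size_t N>
    class SpscQueue {                                      // one producer thread, one consumer thread
        std::vector<T> buf = std::vector<T>(N);
        std::atomic<std::size_t> head{0}, tail{0};         // head: consumer side, tail: producer side
    public:
        bool push(const T& v) {                            // producer only
            std::size_t t = tail.load(std::memory_order_relaxed);
            if (t - head.load(std::memory_order_acquire) == N) return false;   // full
            buf[t % N] = v;
            tail.store(t + 1, std::memory_order_release);  // publish the slot
            return true;
        }
        bool pop(T& v) {                                   // consumer only
            std::size_t h = head.load(std::memory_order_relaxed);
            if (h == tail.load(std::memory_order_acquire)) return false;       // empty
            v = buf[h % N];
            head.store(h + 1, std::memory_order_release);  // free the slot
            return true;
        }
    };

    int main() {
        SpscQueue<int, 1024> q;
        std::thread producer([&] {
            for (int i = 1; i <= 100000; ++i)
                while (!q.push(i)) {}                      // spin when full
        });
        long long sum = 0; int v;
        for (int got = 0; got < 100000;)
            if (q.pop(v)) { sum += v; ++got; }             // spin when empty
        producer.join();
        std::cout << "sum = " << sum << "\n";              // 100000 * 100001 / 2
    }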

  • M. Aldinucci, M. Danelutto, M. Meneghin, M. Torquati, and P. Kilpatrick, Efficient streaming applications on multi-core with FastFlow: The biosequence alignment test-bed, Elsevier, 2010, vol. 19. doi:10.3233/978-1-60750-530-3-273
    [BibTeX] [Abstract] [Download PDF]

    Shared-memory multi-core architectures are becoming increasingly popular. While their parallelism and peak performance are ever increasing, their efficiency is often disappointing due to memory fence overheads. In this paper we present FastFlow, a programming methodology based on lock-free queues explicitly designed for programming streaming applications on multi-cores. The potential of FastFlow is evaluated on micro-benchmarks and on the Smith-Waterman sequence alignment application, which exhibits a substantial speedup against the state-of-the-art multi-threaded implementation (SWPS3 x86/SSE2).

    @book{fastflow:parco:09,
      abstract = {Shared-memory multi-core architectures are becoming increasingly popular. While their parallelism and peak performance is ever increasing, their efficiency is often disappointing due to memory fence overheads. In this paper we present FastFlow, a programming methodology based on lock-free queues explicitly designed for programming streaming applications on multi-cores. The potential of FastFlow is evaluated on micro-benchmarks and on the Smith-Waterman sequence alignment application, which exhibits a substantial speedup against the state-of-the-art multi-threaded implementation (SWPS3 x86/SSE2).},
      author = {Aldinucci, M. and Danelutto, M. and Meneghin, M. and Torquati, M. and Kilpatrick, P.},
      doi = {10.3233/978-1-60750-530-3-273},
      keywords = {fastflow},
      language = {English},
      opteditor = {Barbara Chapman and Fr{\'e}d{\'e}ric Desprez and Gerhard R. Joubert and Alain Lichnewsky and Frans Peters and Thierry Priol},
      pages = {273-280},
      publisher = {Elsevier},
      series = {Advances in Parallel Computing},
      title = {Efficient streaming applications on multi-core with FastFlow: The biosequence alignment test-bed},
      url = {http://calvados.di.unipi.it/storage/paper_files/2009_fastflow_parco.pdf},
      volume = {19},
      year = {2010},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2009_fastflow_parco.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.3233/978-1-60750-530-3-273}
    }

  • M. Aldinucci, “Efficient Parallel MonteCarlo with FastFlow,” in HPC-Europa2: Science and Supercomputing in Europe, research highlights 2010, Cineca, 2010.
    [BibTeX] [Abstract] [Download PDF]

    The stochastic simulation of natural systems is very informative but can be computationally expensive. We present StochKit-FF, a parallel version of StochKit, a reference toolkit for stochastic simulations, which substantially improves StochKit performance on multi-core platforms.

    @incollection{ff:hpc-europa:10,
      abstract = {The stochastic simulation of natural systems is very informative but can be computationally expensive. We present StochKit-FF, a parallel version of StochKit, a reference toolkit for stochastic simulations, which substantially improves StochKit performance on multi-core platforms.},
      author = {Marco Aldinucci},
      booktitle = {HPC-Europa2: Science and Supercomputing in Europe, research highlights 2010},
      date-added = {2011-06-18 18:43:19 +0200},
      date-modified = {2013-11-24 00:40:04 +0000},
      keywords = {bioinformatics, fastflow},
      publisher = {Cineca},
      title = {Efficient Parallel {MonteCarlo} with {FastFlow}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2010-ff_hpceuropa2_092-inform-Aldinucci.pdf},
      year = {2010},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2010-ff_hpceuropa2_092-inform-Aldinucci.pdf}
    }

  • T. Weigold, M. Aldinucci, M. Danelutto, and V. Getov, “Integrating Autonomic Grid Components and Process-Driven Business Applications,” in Autonomic Computing and Communications Systems Third International ICST Conference, Autonomics 2009, Limassol, Cyprus, September 9-11, 2009, Revised Selected Papers, Limassol, Cyprus, 2010, pp. 98-113. doi:10.1007/978-3-642-11482-3_7
    [BibTeX] [Abstract] [Download PDF]

    Today’s business applications are increasingly process driven, meaning that the main application logic is executed by a dedicate process engine. In addition, component-oriented software development has been attracting attention for building complex distributed applications. In this paper we present the experiences gained from building a process-driven biometric identification application which makes use of Grid infrastructures via the Grid Component Model (GCM). GCM, besides guaranteeing access to Grid resources, supports autonomic management of notable parallel composite components. This feature is exploited within our biometric identification application to ensure real time identification of fingerprints. Therefore, we briefly introduce the GCM framework and the process engine used, and we describe the implementation of the application using autonomic GCM components. Finally, we summarize the results, experiences, and lessons learned focusing on the integration of autonomic GCM components and the process-driven approach.

    @inproceedings{ibm:autonomics:09,
      abstract = {Today's business applications are increasingly process driven, meaning that the main application logic is executed by a dedicate process engine. In addition, component-oriented software development has been attracting attention for building complex distributed applications. In this paper we present the experiences gained from building a process-driven biometric identification application which makes use of Grid infrastructures via the Grid Component Model (GCM). GCM, besides guaranteeing access to Grid resources, supports autonomic management of notable parallel composite components. This feature is exploited within our biometric identification application to ensure real time identification of fingerprints. Therefore, we briefly introduce the GCM framework and the process engine used, and we describe the implementation of the application using autonomic GCM components. Finally, we summarize the results, experiences, and lessons learned focusing on the integration of autonomic GCM components and the process-driven approach.},
      address = {Limassol, Cyprus},
      annote = {ISBN: 978-3-642-11481-6},
      author = {Thomas Weigold and Marco Aldinucci and Marco Danelutto and Vladimir Getov},
      booktitle = {Autonomic Computing and Communications Systems Third International ICST Conference, Autonomics 2009, Limassol, Cyprus, September 9-11, 2009, Revised Selected Papers},
      date-added = {2010-02-13 16:13:10 +0100},
      date-modified = {2012-11-24 09:44:22 +0000},
      doi = {10.1007/978-3-642-11482-3_7},
      editor = {Athanasios V. Vasilakos and Roberto Beraldi and Roy Friedman and Marco Mamei},
      issn = {1867-8211},
      pages = {98-113},
      publisher = {Springer},
      series = {{Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering (LNICST)}},
      title = {Integrating Autonomic Grid Components and Process-Driven Business Applications},
      url = {http://calvados.di.unipi.it/storage/paper_files/2010_BS_autonomics09.pdf},
      volume = {23},
      year = {2010},
      bdsk-url-1 = {http://dx.doi.org/10.1007/978-3-642-11482-3_7},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2010_BS_autonomics09.pdf}
    }

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Skeletons for multi/many-core systems,” in Parallel Computing: From Multicores and GPU’s to Petascale (Proc. of PARCO 2009, Lyon, France), Lyon, France, 2010, pp. 265-272. doi:10.3233/978-1-60750-530-3-265
    [BibTeX] [Abstract] [Download PDF]

    We discuss how algorithmic skeletons (and structured parallel programming models in general) can be used to efficiently and seamlessly program multi-core as well as many-core systems. We introduce a new version of the muskel skeleton library that can be used to target multi/many-core systems and we present experimental results that demonstrate the feasibility of the approach. The experimental results presented also give an idea of the computational grains that can be exploited on current, state-of-the-art multi-core systems. The skeleton idea is sketched in miniature after this entry.

    @inproceedings{multicoreske:parco:09,
      abstract = {We discuss how algorithmic skeletons (and structured parallel programming models in general) can be used to efficiently and seamlessly program multi-core as well as many-core systems. We introduce a new version of the muskel skeleton library that can be used to target multi/many-core systems and we present experimental results that demonstrate the feasibility of the approach. The experimental results presented also give an idea of the computational grains that can be exploited on current, state-of-the-art multi-core systems.},
      address = {Lyon, France},
      annote = {ISBN: 978-1-60750-529-7},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Parallel Computing: From Multicores and GPU's to Petascale (Proc. of {PARCO 2009}, Lyon, France)},
      date-added = {2009-06-03 17:56:19 +0200},
      date-modified = {2012-11-24 09:43:35 +0000},
      doi = {10.3233/978-1-60750-530-3-265},
      editor = {Barbara Chapman and Fr{\'e}d{\'e}ric Desprez and Gerhard R. Joubert and Alain Lichnewsky and Frans Peters and Thierry Priol},
      pages = {265-272},
      publisher = {IOS Press},
      series = {Advances in Parallel Computing},
      title = {Skeletons for multi/many-core systems},
      url = {http://calvados.di.unipi.it/storage/paper_files/2010_muskel_multicore_parco.pdf},
      volume = {19},
      year = {2010},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2010_muskel_multicore_parco.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.3233/978-1-60750-530-3-265}
    }
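
    The skeleton idea in miniature: a map skeleton is a higher-order function that hides partitioning, thread management and joining, so the programmer supplies only the sequential worker function. muskel itself is a Java library; the hedged sketch below merely mirrors the concept in C++ with invented names.

    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <iostream>
    #include <vector>

    // A "map" skeleton: apply f to every element using nw workers, hiding all threading.
    template <typename T, typename F>
    std::vector<T> map_skeleton(const std::vector<T>& in, F f, unsigned nw) {
        std::vector<T> out(in.size());
        std::vector<std::future<void>> parts;
        std::size_t chunk = (in.size() + nw - 1) / nw;     // static block partitioning
        for (unsigned w = 0; w < nw; ++w)
            parts.push_back(std::async(std::launch::async, [&, w] {
                std::size_t lo = w * chunk, hi = std::min(in.size(), lo + chunk);
                for (std::size_t i = lo; i < hi; ++i)
                    out[i] = f(in[i]);                     // each worker owns a disjoint slice
            }));
        for (auto& p : parts) p.get();                     // barrier: wait for all workers
        return out;
    }

    int main() {
        std::vector<double> xs(16);
        for (std::size_t i = 0; i < xs.size(); ++i) xs[i] = double(i);
        auto ys = map_skeleton(xs, [](double x) { return x * x; }, 4);
        std::cout << ys[5] << "\n";                        // prints 25
    }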

2009

  • M. Aldinucci, H. L. Bouziane, M. Danelutto, and C. Pérez, “STKM on SCA: a Unified Framework with Components, Workflows and Algorithmic Skeletons,” in Proc. of 15th Intl. Euro-Par 2009 Parallel Processing, Delft, The Netherlands, 2009, pp. 678-690. doi:10.1007/978-3-642-03869-3
    [BibTeX] [Abstract] [Download PDF]

    This paper investigates an implementation of STKM, a Spatio-Temporal sKeleton Model. STKM expands the Grid Component Model (GCM) with an innovative programmable approach that allows programmers to compose an application by combining component, workflow and skeleton concepts. The paper deals with a projection of the STKM model on top of SCA and it evaluates its implementation using Tuscany Java SCA. Experimental results show the need and the benefits of the high level of abstraction offered by STKM.

    @inproceedings{stkm:europar:09,
      abstract = {This paper investigates an implementation of STKM, a Spatio-Temporal sKeleton Model. STKM expands the Grid Component Model (GCM) with an innovative programmable approach that allows programmers to compose an application by combining component, workflow and skeleton concepts. The paper deals with a projection of the STKM model on top of SCA and it evaluates its implementation using Tuscany Java SCA. Experimental results show the need and the benefits of the high level of abstraction offered by STKM.},
      address = {Delft, The Netherlands},
      author = {Marco Aldinucci and Hinde Lilia Bouziane and Marco Danelutto and Christian P{\'e}rez},
      booktitle = {Proc. of 15th Intl. Euro-Par 2009 Parallel Processing},
      date-modified = {2009-12-03 00:58:56 +0100},
      doi = {10.1007/978-3-642-03869-3},
      month = aug,
      pages = {678-690},
      publisher = {Springer},
      series = {LNCS},
      title = {{STKM} on {SCA}: a Unified Framework with Components, Workflows and Algorithmic Skeletons},
      url = {http://calvados.di.unipi.it/storage/paper_files/2009_STKM_Europar.pdf},
      volume = {5704},
      year = {2009},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2009_STKM_Europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-03869-3}
    }

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Autonomic management of non-functional concerns in distributed and parallel application programming,” in Proc. of Intl. Parallel & Distributed Processing Symposium (IPDPS), Rome, Italy, 2009, pp. 1-12. doi:10.1109/IPDPS.2009.5161034
    [BibTeX] [Abstract] [Download PDF]

    An approach to the management of non-functional concerns in massively parallel and/or distributed architectures that marries parallel programming patterns with autonomic computing is presented. The necessity and suitability of the adoption of autonomic techniques are evidenced. Issues arising in the implementation of autonomic managers taking care of multiple concerns and of coordination among hierarchies of such autonomic managers are discussed. Experimental results are presented that demonstrate the feasibility of the approach.

    @inproceedings{beske:ipdps:09,
      abstract = {An approach to the management of non-functional concerns in massively parallel and/or distributed architectures that marries parallel programming patterns with autonomic computing is presented. The necessity and suitability of the adoption of autonomic techniques are evidenced. Issues arising in the implementation of autonomic managers taking care of multiple concerns and of coordination among hierarchies of such autonomic managers are discussed. Experimental results are presented that demonstrate the feasibility of the approach.},
      address = {Rome, Italy},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Proc. of Intl. Parallel \& Distributed Processing Symposium (IPDPS)},
      date-added = {2008-12-09 18:58:37 +0100},
      date-modified = {2009-06-07 22:30:35 +0200},
      doi = {10.1109/IPDPS.2009.5161034},
      month = may,
      pages = {1-12},
      publisher = {IEEE},
      title = {Autonomic management of non-functional concerns in distributed and parallel application programming},
      url = {http://calvados.di.unipi.it/storage/paper_files/2009_f_nf_IPDPS.pdf},
      year = {2009},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2009_f_nf_IPDPS.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/IPDPS.2009.5161034}
    }

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Co-design of distributed systems using skeletons and autonomic management abstractions,” in Euro-Par 2008 Workshops – Parallel Processing, Selected Papers, Las Palmas, Spain, 2009, pp. 403-414. doi:10.1007/978-3-642-00955-6_46
    [BibTeX] [Abstract] [Download PDF]

    We discuss how common problems arising with multi/many-core distributed architectures can be effectively handled through co-design of parallel/distributed programming abstractions and of autonomic management of non-functional concerns. In particular, we demonstrate how restricted parallel/distributed patterns (or skeletons) may be efficiently managed by rule-based autonomic managers. We discuss the basic principles underlying pattern+manager co-design, current implementations inspired by this approach and some results achieved with a proof-of-concept prototype.

    @inproceedings{abstraction:europarworkshop:09,
      abstract = {We discuss how common problems arising with multi/many-core distributed architectures can be effectively handled through co-design of parallel/distributed programming abstractions and of autonomic management of non-functional concerns. In particular, we demonstrate how restricted parallel/distributed patterns (or skeletons) may be efficiently managed by rule-based autonomic managers. We discuss the basic principles underlying pattern+manager co-design, current implementations inspired by this approach and some results achieved with a proof-of-concept prototype.},
      address = {Las Palmas, Spain},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Euro-Par 2008 Workshops - Parallel Processing, Selected Papers},
      date-added = {2009-01-09 17:57:45 +0100},
      date-modified = {2009-06-26 16:12:56 +0200},
      doi = {10.1007/978-3-642-00955-6_46},
      editor = {E. C{\'e}sar and M. Alexander and A. Streit and J.L. Tr{\"a}ff and C. C{\'e}rin and A. Kn{\"u}pfer and D. Kranzlm{\"u}ller and S. Jha},
      isbn = {978-3-642-00954-9},
      month = apr,
      pages = {403-414},
      publisher = {Springer},
      series = {LNCS},
      title = {Co-design of distributed systems using skeletons and autonomic management abstractions},
      url = {http://calvados.di.unipi.it/storage/paper_files/2009_abstraction_workshopeuropar.pdf},
      volume = {5415},
      year = {2009},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2009_abstraction_workshopeuropar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-00955-6_46}
    }

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Towards hierarchical management of autonomic components: a case study,” in Proc. of Intl. Euromicro PDP 2009: Parallel Distributed and network-based Processing, Weimar, Germany, 2009, pp. 3-10. doi:10.1109/PDP.2009.48
    [BibTeX] [Abstract] [Download PDF]

    We address the issue of autonomic management in hierarchical component-based distributed systems. The long term aim is to provide a modeling framework for autonomic management in which QoS goals can be defined, plans for system adaptation described and proofs of achievement of goals by (sequences of) adaptations furnished. Here we present an early step on this path. We restrict our focus to skeleton-based systems in order to exploit their well-defined structure. The autonomic cycle is described using the Orc system orchestration language while the plans are presented as structural modifications together with associated costs and benefits. A case study is presented to illustrate the interaction of managers to maintain QoS goals for throughput under varying conditions of resource availability.

    @inproceedings{beske:pdp:09,
      abstract = {We address the issue of autonomic management in hierarchical component-based distributed systems. The long term aim is to provide a modeling framework for autonomic management in which QoS goals can be defined, plans for system adaptation described and proofs of achievement of goals by (sequences of) adaptations furnished. Here we present an early step on this path. We restrict our focus to skeleton-based systems in order to exploit their well-defined structure. The autonomic cycle is described using the Orc system orchestration language while the plans are presented as structural modifications together with associated costs and benefits. A case study is presented to illustrate the interaction of managers to maintain QoS goals for throughput under varying conditions of resource availability.},
      address = {Weimar, Germany},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Proc. of Intl. Euromicro PDP 2009: Parallel Distributed and network-based Processing},
      date-added = {2008-10-15 22:43:41 +0200},
      date-modified = {2009-05-20 10:26:13 +0200},
      doi = {10.1109/PDP.2009.48},
      editor = {Didier El Baz and Tom Gross and Francois Spies},
      month = feb,
      pages = {3-10},
      publisher = {IEEE},
      title = {Towards hierarchical management of autonomic components: a case study},
      url = {http://calvados.di.unipi.it/storage/paper_files/2009_hier_man_PDP.pdf},
      year = {2009},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2009_hier_man_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2009.48}
    }

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Semi-formal models to support program development: autonomic management within component based parallel and distributed programming,” in Formal Methods for Components and Objects: 7th Intl. Symposium, FMCO 2008, Sophia-Antipolis, France, October 20 – 24, 2008, Revised Lectures, 2009, pp. 204-225. doi:10.1007/978-3-642-04167-9
    [BibTeX] [Abstract] [Download PDF]

    Functional and non-functional concerns require different programming effort, different techniques and different methodologies when attempting to program efficient parallel/distributed applications. In this work we present a “programmer oriented” methodology based on formal tools that permits reasoning about parallel/distributed program development and refinement. The proposed methodology is semi-formal in that it does not require the exploitation of highly formal tools and techniques, while providing a palatable and effective support to programmers developing parallel/distributed applications, in particular when handling non-functional concerns.

    @inproceedings{semi-formal:fmco:09,
      abstract = {Functional and non-functional concerns require different programming effort, different techniques and different methodologies when attempting to program efficient parallel/distributed applications. In this work we present a ``programmer oriented'' methodology based on formal tools that permits reasoning about parallel/distributed program development and refinement. The proposed methodology is semi-formal in that it does not require the exploitation of highly formal tools and techniques, while providing a palatable and effective support to programmers developing parallel/distributed applications, in particular when handling non-functional concerns.},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Formal Methods for Components and Objects: 7th Intl. Symposium, FMCO 2008, Sophia-Antipolis, France, October 20 - 24, 2008, Revised Lectures},
      date-added = {2009-06-07 16:05:13 +0200},
      date-modified = {2009-08-30 17:11:01 +0200},
      doi = {10.1007/978-3-642-04167-9},
      editor = {Frank S. de Boer and Marcello M. Bonsangue and Eric Madelaine},
      pages = {204-225},
      publisher = {Springer},
      series = {LNCS},
      title = {Semi-formal models to support program development: autonomic management within component based parallel and distributed programming},
      url = {http://calvados.di.unipi.it/storage/paper_files/2009_semiformal_FMCO08.pdf},
      volume = {5751},
      year = {2009},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2009_semiformal_FMCO08.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-642-04167-9}
    }

2008

  • M. Aldinucci, G. Antoniu, M. Danelutto, and M. Jan, “Fault-Tolerant Data Sharing for High-level Grid Programming: A Hierarchical Storage Architecture,” in Achievements in European Research on Grid Systems, M. Bubak, S. Gorlatch, and T. Priol, Eds., Kraków, Poland: Springer, 2008, pp. 67-81. doi:10.1007/978-0-387-72812-4_6
    [BibTeX] [Abstract] [Download PDF]

    Enabling high-level programming models on grids is today a major challenge. A way to achieve this goal relies on the use of environments able to transparently and automatically provide adequate support for low-level, grid-specific issues (fault-tolerance, scalability, etc.). This paper discusses the above approach when applied to grid data management. As a case study, we propose a 2-tier software architecture that supports transparent, fault-tolerant, grid-level data sharing in the ASSIST programming environment (University of Pisa), based on the JuxMem grid data sharing service (INRIA Rennes).

    @incollection{assist:juxmem:IW_book:07,
      abstract = {Enabling high-level programming models on grids is today a major challenge. A way to achieve this goal relies on the use of environments able to transparently and automatically provide adequate support for low-level, grid-specific issues (fault-tolerance, scalability, etc.). This paper discusses the above approach when applied to grid data management. As a case study, we propose a 2-tier software architecture that supports transparent, fault-tolerant, grid-level data sharing in the ASSIST programming environment (University of Pisa), based on the JuxMem grid data sharing service (INRIA Rennes).},
      address = {Krak{\'o}w, Poland},
      author = {Marco Aldinucci and Gabriel Antoniu and Marco Danelutto and Mathieu Jan},
      booktitle = {Achievements in European Research on Grid Systems},
      date-added = {2007-06-26 01:31:31 +0200},
      date-modified = {2012-11-18 17:45:08 +0000},
      doi = {10.1007/978-0-387-72812-4_6},
      editor = {Marian Bubak and Sergei Gorlatch and Thierry Priol},
      isbn = {978-0-387-72811-7},
      month = nov,
      pages = {67-81},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Fault-Tolerant Data Sharing for High-level Grid Programming: A Hierarchical Storage Architecture},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_IW06_book_juxadhocmem.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_IW06_book_juxadhocmem.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-72812-4_6}
    }
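
    The acquire/release style of grid data sharing described above lends itself to a very small programming interface. The sketch below is a hypothetical Java facade written for illustration only; the type name, methods and parameters are assumptions, not the actual JuxMem or ASSIST API.

    import java.nio.ByteBuffer;

    /* Minimal sketch of a JuxMem-style data-sharing facade. All names and
       signatures are illustrative assumptions, not the real service API. */
    public interface GridSharedStore {
        /** Allocate a replicated, fault-tolerant block; returns a grid-wide id. */
        String attach(int sizeBytes, int replicationDegree);

        /** Enter the critical section on a block and obtain a local view of it. */
        ByteBuffer acquire(String blockId);

        /** Publish local modifications and leave the critical section. */
        void release(String blockId);
    }

    Fault tolerance and consistency live behind the facade, which is exactly the "transparent support for low-level, grid-specific issues" the abstract argues for.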

  • M. Aldinucci, M. Danelutto, H. L. Bouziane, and C. Pérez, “Towards Software Component Assembly Language Enhanced with Workflows and Skeletons,” in Proc. of the ACM SIGPLAN Component-Based High Performance Computing (CBHPC), New York, NY, USA, 2008, pp. 1-11. doi:10.1145/1456190.1456194
    [BibTeX] [Abstract] [Download PDF]

    We explore the possibilities offered by a programming model supporting components, workflows and skeletons. In particular we describe how Stcm (Spatio-Temporal Component Model), an already existing programming model supporting components and workflows, can be extended to also provide algorithmic skeleton concepts. Programmers are therefore enabled to assemble applications by specifying both temporal and spatial relations among components and by instantiating predefined skeleton composite components to implement all those application parts that can be easily modeled with the available skeletons. We discuss preliminary results as well as the benefits deriving from Stkm (Spatio-Temporal sKeleton Model) adoption in a couple of real applications.

    @inproceedings{stkm:CBHPC:08,
      abstract = {We explore the possibilities offered by a programming model supporting components, workflows and skeletons. In particular we describe how Stcm (Spatio-Temporal Component Model), an already existing programming model supporting components and workflows, can be extended to also provide algorithmic skeleton concepts. Programmers are therefore enabled to assembly applications specifying both temporal and spatial relations among components and instantiating predefined skeleton composite components to implement all those application parts that can be easily modeled with the available skeletons. We discuss preliminary results as well as the benefits deriving from Stkm (Spatio-Temporal sKeleton Model) adoption in a couple of real applications.},
      address = {New York, NY, USA},
      author = {Aldinucci, Marco and Danelutto, Marco and Bouziane, Hinde Lilia and P{\'e}rez, Christian},
      booktitle = {Proc. of the ACM SIGPLAN Component-Based High Performance Computing (CBHPC)},
      date-modified = {2008-11-17 18:33:20 +0100},
      doi = {10.1145/1456190.1456194},
      isbn = {978-1-60558-311-2},
      location = {Karlsruhe, Germany},
      month = oct,
      pages = {1-11},
      publisher = {ACM},
      title = {Towards Software Component Assembly Language Enhanced with Workflows and Skeletons},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_CBHPC.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_CBHPC.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1145/1456190.1456194}
    }

  • M. Aldinucci and M. Danelutto, “Securing skeletal systems with limited performance penalty: the Muskel experience,” Journal of Systems Architecture, vol. 54, iss. 9, pp. 868-876, 2008. doi:10.1016/j.sysarc.2008.02.008
    [BibTeX] [Abstract] [Download PDF]

    Algorithmic skeletons have been exploited to implement several parallel programming environments, targeting workstation clusters as well as workstation networks and computational grids. When targeting non-dedicated clusters, workstation networks and grids, security has to be taken adequately into account in order to guarantee both code and data confidentiality and integrity. However, introducing security is usually an expensive activity, both in terms of the effort required to manage security mechanisms and in terms of the time spent performing security related activities at run time. We discuss the cost of security introduction as well as how some features typical of skeleton technology can be exploited to improve the efficiency of code and data securing in a typical skeleton based parallel programming environment, and we evaluate the performance cost of security mechanisms implemented exploiting state of the art tools. In particular, we take into account the cost of security introduction in muskel, a Java based skeletal system exploiting macro data flow implementation technology. We consider the adoption of mechanisms that allow securing all the communications involving remote, unreliable nodes and we evaluate the cost of such mechanisms. Also, we consider the implications on the computational grains needed to scale secure and insecure skeletal computations.

    @article{security:jsa:07,
      abstract = {Algorithmic skeletons have been exploited to implement several parallel programming environments, targeting workstation clusters as well as workstation networks and computational grids. When targeting non-dedicated clusters, workstation networks and grids, security has to be taken adequately into account in order to guarantee both code and data confidentiality and integrity. However, introducing security is usually an expensive activity, both in terms of the effort required to managed security mechanisms and in terms of the time spent performing security related activities at run time.We discuss the cost of security introduction as well as how some features typical of skeleton technology can be exploited to improve the efficiency code and data securing in a typical skeleton based parallel programming environment and we evaluate the performance cost of security mechanisms implemented exploiting state of the art tools. In particular, we take into account the cost of security introduction in muskel,
     a Java based skeletal system exploiting macro data flow implementation technology. We consider the adoption of mechanisms that allow securing all the communications involving remote, unreliable nodes and we evaluate the cost of such mechanisms. Also, we consider the implications on the computational grains needed to scale secure and insecure skeletal computations.},
      author = {Marco Aldinucci and Marco Danelutto},
      date-added = {2007-10-31 19:23:37 +0100},
      date-modified = {2014-08-24 22:18:21 +0000},
      doi = {10.1016/j.sysarc.2008.02.008},
      journal = {Journal of Systems Architecture},
      month = sep,
      number = {9},
      pages = {868-876},
      publisher = {Elsevier},
      title = {Securing skeletal systems with limited performance penalty: the {Muskel} experience},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_security_JSA.pdf},
      volume = {54},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_security_JSA.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1016/j.sysarc.2008.02.008}
    }
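
    The securing mechanisms evaluated in this paper protect serialised code and data travelling to remote nodes. As a rough, self-contained illustration (not the muskel implementation itself), standard JCA primitives suffice to seal a serialisable task before it crosses an untrusted link; the string payload and shared-key setup below are assumptions for the example.

    import java.io.Serializable;
    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SealedObject;
    import javax.crypto.SecretKey;

    public final class TaskSealer {
        // Encrypt a serialisable task; only the ciphertext crosses the network.
        public static SealedObject seal(Serializable task, SecretKey key) throws Exception {
            Cipher c = Cipher.getInstance("AES");
            c.init(Cipher.ENCRYPT_MODE, key);
            return new SealedObject(task, c);
        }

        // Decrypt and deserialise on the receiving side.
        public static Object unseal(SealedObject sealed, SecretKey key) throws Exception {
            return sealed.getObject(key);
        }

        public static void main(String[] args) throws Exception {
            SecretKey key = KeyGenerator.getInstance("AES").generateKey();
            SealedObject wire = seal("a macro data flow task", key);
            System.out.println(unseal(wire, key)); // prints the original payload
        }
    }

    The computational-grain point made in the abstract follows directly: sealing adds a per-task cost, so tasks must be coarse enough that encryption time stays small relative to compute time.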

  • M. Aldinucci and E. Tuosto, “Towards a Formal Semantics for Autonomic Components,” in From Grids To Service and Pervasive Computing (Proc. of the CoreGRID Symposium 2008), Las Palmas, Spain, 2008, pp. 31-45. doi:10.1007/978-0-387-09455-7_3
    [BibTeX] [Abstract] [Download PDF]

    Autonomic management can improve the QoS provided by parallel/distributed applications. Within the CoreGRID Component Model, the autonomic management is tailored to the automatic — monitoring-driven — alteration of the component assembly and, therefore, is defined as the effect of (distributed) management code. This work yields a semantics based on hypergraph rewriting suitable to model the dynamic evolution and non-functional aspects of Service Oriented Architectures and component-based autonomic applications. In this regard, our main goal is to provide a formal description of adaptation operations that are typically only informally specified. We advocate that our approach makes it easier to raise the level of abstraction of management code in autonomic and adaptive applications.

    @inproceedings{sem:cgsymph:08,
      abstract = {Autonomic management can improve the QoS provided by parallel/distributed applications. Within the CoreGRID Component Model, the autonomic management is tailored to the automatic -- monitoring-driven -- alteration of the component assembly and, therefore, is defined as the effect of (distributed)management code.
    This work yields a semantics based on hypergraph rewriting suitable tomodel the dynamic evolution and non-functional aspects of Service Oriented Architectures and component-based autonomic applications. In this regard, our main goal is to provide a formal description of adaptation operations that are typically only informally specified. We advocate that our approach makes easier to raise the level of abstraction of management code in autonomic and adaptive applications.},
      address = {Las Palmas, Spain},
      author = {Marco Aldinucci and Emilio Tuosto},
      booktitle = {From Grids To Service and Pervasive Computing (Proc. of the CoreGRID Symposium 2008)},
      date-added = {2008-05-11 18:46:45 +0200},
      date-modified = {2010-02-13 19:32:53 +0100},
      doi = {10.1007/978-0-387-09455-7_3},
      editor = {Thierry Priol and Marco Vanneschi},
      isbn = {978-0-387-09454-0},
      month = aug,
      pages = {31-45},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Towards a Formal Semantics for Autonomic Components},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_sem_cgsymph.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_sem_cgsymph.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-09455-7_3}
    }

  • M. Aldinucci, S. Campa, M. Danelutto, P. Dazzi, P. Kilpatrick, D. Laforenza, and N. Tonellotto, “Behavioural skeletons for component autonomic management on grids,” in Making Grids Work, M. Danelutto, P. Frangopoulou, and V. Getov, Eds., Springer, 2008, pp. 3-16. doi:10.1007/978-0-387-78448-9_1
    [BibTeX] [Abstract] [Download PDF]

    Autonomic management can improve the QoS provided by parallel/distributed applications. Within the CoreGRID Component Model, the autonomic management is tailored to the automatic — monitoring-driven — alteration of the component assembly and, therefore, is defined as the effect of (distributed) management code. This work yields a semantics based on hypergraph rewriting suitable to model the dynamic evolution and non-functional aspects of Service Oriented Architectures and component-based autonomic applications. In this regard, our main goal is to provide a formal description of adaptation operations that are typically only informally specified. We advocate that our approach makes it easier to raise the level of abstraction of management code in autonomic and adaptive applications.

    @incollection{beske:cg_book:08,
      abstract = {Autonomic management can improve the QoS provided by parallel/distributed applications. Within the CoreGRID Component Model, the autonomic management is tailored to the automatic -- monitoring-driven -- alteration of the component assembly and, therefore, is defined as the effect of (distributed)management code.
    This work yields a semantics based on hypergraph rewriting suitable tomodel the dynamic evolution and non-functional aspects of Service Oriented Architectures and component-based autonomic applications. In this regard, our main goal is to provide a formal description of adaptation operations that are typically only informally specified. We advocate that our approach makes easier to raise the level of abstraction of management code in autonomic and adaptive applications.},
      author = {Marco Aldinucci and Sonia Campa and Marco Danelutto and Patrizio Dazzi and Peter Kilpatrick and Domenico Laforenza and Nicola Tonellotto},
      booktitle = {Making Grids Work},
      chapter = {Component Programming Models},
      date-added = {2007-12-09 22:26:46 +0100},
      date-modified = {2008-11-17 20:07:48 +0100},
      doi = {10.1007/978-0-387-78448-9_1},
      editor = {Marco Danelutto and Paraskevi Frangopoulou and Vladimir Getov},
      isbn = {978-0-387-78447-2},
      month = aug,
      pages = {3-16},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Behavioural skeletons for component autonomic management on grids},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_beske_cg_crete_book.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_beske_cg_crete_book.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-78448-9_1}
    }

  • M. Aldinucci, M. Danelutto, G. Zoppi, and P. Kilpatrick, “Advances in Autonomic Components & Services,” in From Grids To Service and Pervasive Computing (Proc. of the CoreGRID Symposium 2008), Las Palmas, Spain, 2008, pp. 3-18. doi:10.1007/978-0-387-09455-7_1
    [BibTeX] [Abstract] [Download PDF]

    Hierarchical autonomic management of structured grid applications can be efficiently implemented using production rule engines. Rules of the form "precondition-to-action" can be used to model the behaviour of autonomic managers in such a way that the autonomic control and the application management strategy are kept separate. This simplifies the manager design as well as user customization of autonomic manager policies. We briefly introduce rule-based autonomic managers. Then we discuss an implementation of a GCM-like behavioural skeleton — a composite component modelling a standard parallelism exploitation pattern with its own autonomic controller — in SCA/Tuscany. The implementation uses the JBoss rules engine to provide an autonomic behavioural skeleton component and services to expose the component functionality to the standard service framework. Performance results are discussed, and finally similarities and differences with respect to the ProActive-based reference GCM implementation are briefly examined.

    @inproceedings{sca:cgsymph:08,
      abstract = {Hierarchical autonomic management of structured grid applications can be efficiently implemented using production rule engines. Rules of the form "precondition-to-action" can be used to model the behaviour of autonomic managers in such a way that the autonomic control and the application management strategy are kept separate. This simplifies the manager design as well as user customization of autonomic manager policies. We briefly introduce rule-based autonomic managers. Then we discuss an implementation of a GCM-like behavioural skeleton -- a composite component modelling a standard parallelism exploitation pattern with its own autonomic controller -- in SCA/Tuscany. The implementation uses the JBoss rules engine to provide an autonomic behavioural skeleton component and services to expose the component functionality to the standard service framework. Performance results are discussed and finally similarities and differences with respect to the ProActive-based reference GCM implementation are discussed briefly.},
      address = {Las Palmas, Spain},
      author = {Marco Aldinucci and Marco Danelutto and Giorgio Zoppi and Peter Kilpatrick},
      booktitle = {From Grids To Service and Pervasive Computing (Proc. of the CoreGRID Symposium 2008)},
      date-added = {2008-05-11 18:42:40 +0200},
      date-modified = {2012-11-17 16:11:44 +0000},
      doi = {10.1007/978-0-387-09455-7_1},
      editor = {Thierry Priol and Marco Vanneschi},
      isbn = {978-0-387-09454-0},
      month = aug,
      pages = {3-18},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Advances in Autonomic Components {\&} Services},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_SCA_cgsymph.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_SCA_cgsymph.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-09455-7_1}
    }
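
    To make the "precondition-to-action" idea concrete: the sketch below is a hand-rolled rule loop in plain Java, not the JBoss rules engine used in the paper; the rule names, the monitored metric and the thresholds are assumptions for illustration. It shows the separation the abstract describes: the control loop is generic, while the management strategy lives entirely in the rule list.

    import java.util.List;
    import java.util.function.Consumer;
    import java.util.function.Predicate;

    record Rule<M>(String name, Predicate<M> precondition, Consumer<M> action) {}

    class FarmState {
        double avgServiceTimeMs = 120.0; // assumed value coming from monitoring
        int workers = 4;
    }

    public class AutonomicManager {
        public static void main(String[] args) {
            FarmState s = new FarmState();
            List<Rule<FarmState>> policy = List.of(
                new Rule<>("scale-out", m -> m.avgServiceTimeMs > 100.0, m -> m.workers++),
                new Rule<>("scale-in",  m -> m.avgServiceTimeMs < 20.0 && m.workers > 1,
                           m -> m.workers--));
            // One control-loop iteration: fire every rule whose precondition holds.
            for (Rule<FarmState> r : policy)
                if (r.precondition().test(s)) {
                    r.action().accept(s);
                    System.out.println("fired: " + r.name());
                }
            System.out.println("workers = " + s.workers); // 5: only scale-out fired
        }
    }

    Swapping the policy list changes the management strategy without touching the loop, which is the user-customisation property claimed above.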

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, and P. Dazzi, “From Orc Models to Distributed Grid Java code,” in Proc. of the Integrated Research in Grid Computing Workshop, Hersonissos, Crete, Greece, 2008, pp. 2-13.
    [BibTeX] [Abstract] [Download PDF]

    We present O2J, a Java library that allows implementation of Orc programs on distributed architectures including grids and clusters/networks of workstations. With minimal programming effort the grid programmer may implement Orc programs, as he/she is not required to write any low level code relating to distributed orchestration of the computation but only that required to implement Orc expressions. Using the prototype O2J implementation, grid application developers can reason about abstract grid orchestration code described in Orc. Once the required orchestration has been determined and its properties analysed, a grid application prototype can be simply, efficiently and quickly implemented by taking the Orc code, rewriting it into corresponding Java/O2J syntax and finally providing the functional code implementing the sites and processes involved. The proposed modus operandi brings a Model Driven Engineering approach to grid application development.

    @inproceedings{orc:IW:08,
      abstract = {We present O2J, a Java library that allows implementation of Orc programs on distributed architectures including grids and clusters/networks of workstations. With minimal programming effort the grid programmer may implement Orc programs, as he/she is not required to write any low level code relating to distributed orchestration of the computation but only that required to implement Orc expressions. Using the prototype O2J implementation, grid application developers can reason about abstract grid orchestration code described in Orc. Once the required orchestration has been determined and its properties analysed, a grid application prototype can be simply, efficiently and quickly implemented by taking the Orc code, rewriting it into corresponding Java/O2J syntax and finally providing the functional code implementing the sites and processes involved. The proposed modus operandi brings a Model Driven Engineering approach to grid application development.},
      address = {Hersonissos, Crete, Greece},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Patrizio Dazzi},
      booktitle = {Proc. of the Integrated Research in Grid Computing Workshop},
      date-added = {2008-02-09 16:59:20 +0100},
      date-modified = {2012-11-18 18:07:06 +0000},
      editor = {Sergei Gorlatch and Paraskevi Fragopoulou and Thierry Priol},
      keywords = {Duplicate},
      month = apr,
      pages = {2-13},
      series = {CoreGRID},
      title = {From {Orc} Models to Distributed Grid {Java} code},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_IW_O2J.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_IW_O2J.pdf}
    }
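
    O2J's concrete API is not reproduced in this list; as a hedged flavour of what "Orc as a Java library" can look like, the sketch below encodes a classic Orc idiom with plain java.util.concurrent: run two sites in parallel, prune to the first publication, then sequence it into a third site. The site names are assumptions for the example.

    import java.util.concurrent.CompletableFuture;

    public class OrcFlavour {
        // An Orc "site" modelled as an asynchronous call publishing one value.
        static CompletableFuture<String> site(String answer) {
            return CompletableFuture.supplyAsync(() -> answer);
        }

        public static void main(String[] args) {
            // (CNN | BBC) pruned to its first publication, as in Orc's pruning combinator.
            CompletableFuture<Object> first =
                CompletableFuture.anyOf(site("CNN"), site("BBC"));
            // >x> sequencing: pipe that publication into another site.
            CompletableFuture<String> seq =
                first.thenCompose(x -> site("email(" + x + ")"));
            System.out.println(seq.join()); // email(CNN) or email(BBC)
        }
    }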

  • M. Aldinucci and A. Benoit, “Automatic mapping of ASSIST applications using process algebra,” Parallel Processing Letters, vol. 18, iss. 1, pp. 175-188, 2008. doi:10.1142/S0129626408003302
    [BibTeX] [Abstract] [Download PDF]

    Grid technologies aim to harness the computational capabilities of widely distributed collections of computers. Due to the heterogeneous and dynamic nature of the set of grid resources, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.

    @article{assist:pepa:ppl:08,
      abstract = {Grid technologies aim to harness the computational capabilities of widely distributed
    collections of computers. Due to the heterogeneous and dynamic nature of the set of grid resources, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.},
      annote = {ISSN: 0129-6264},
      author = {Marco Aldinucci and Anne Benoit},
      date-modified = {2013-06-17 14:09:49 +0000},
      doi = {10.1142/S0129626408003302},
      issn = {0129-6264},
      journal = {Parallel Processing Letters},
      month = mar,
      number = {1},
      pages = {175-188},
      title = {Automatic mapping of {ASSIST} applications using process algebra},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_pepa_ppl.pdf},
      volume = {18},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_pepa_ppl.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1142/S0129626408003302}
    }

  • M. Aldinucci, M. Torquati, M. Vanneschi, and P. Zuccato, “The VirtuaLinux Storage Abstraction Layer for Efficient Virtual Clustering,” in Proc. of Intl. Euromicro PDP 2008: Parallel Distributed and network-based Processing, Toulouse, France, 2008, pp. 619-627. doi:10.1109/PDP.2008.86
    [BibTeX] [Abstract] [Download PDF]

    VirtuaLinux is a meta-distribution that enables a standard Linux distribution to support robust physical and virtualized clusters. VirtuaLinux helps in avoiding the "single point of failure" effect by means of a combination of architectural strategies, including the transparent support for disk-less and master-less cluster configuration. VirtuaLinux supports the creation and management of Virtual Clusters in a seamless way: VirtuaLinux Virtual Cluster Manager enables the system administrator to create, save, restore Xen-based Virtual Clusters, and to map and dynamically re-map them onto the nodes of the physical cluster. In this paper we introduce and discuss the VirtuaLinux virtualization architecture, features, and tools, and in particular, the novel disk abstraction layer, which permits the fast and space-efficient creation of Virtual Clusters.

    @inproceedings{vlinux:pdp:08,
      abstract = {VirtuaLinux is a meta-distribution that enables a standard Linux distribution to support robust physical and virtualized clusters. VirtuaLinux helps in avoiding the "single point of failure" effect by means of a combination of architectural strategies, including the transparent support for disk-less and master-less cluster configuration. VirtuaLinux supports the creation and management of Virtual Clusters in seamless way: VirtuaLinux Virtual Cluster Manager enables the system administrator to create, save, restore Xen-based Virtual Clusters, and to map and dynamically re-map them onto the nodes of the physical cluster. In this paper we introduce and discuss VirtuaLinux virtualization architecture, features, and tools, and in particular, the novel disk abstraction layer, which permits the fast and space-efficient creation of Virtual Clusters.},
      address = {Toulouse, France},
      author = {Marco Aldinucci and Massimo Torquati and Marco Vanneschi and Pierfrancesco Zuccato},
      booktitle = {Proc. of Intl. Euromicro PDP 2008: Parallel Distributed and network-based Processing},
      date-added = {2009-11-10 01:29:09 +0100},
      date-modified = {2009-11-10 01:29:09 +0100},
      doi = {10.1109/PDP.2008.86},
      editor = {Didier El Baz and Julien Bourgeois and Francois Spies},
      month = feb,
      pages = {619-627},
      publisher = {IEEE},
      title = {The VirtuaLinux Storage Abstraction Layer for Efficient Virtual Clustering},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_VirtuaLinux_PDP.pdf},
      year = {2008},
      bdsk-url-1 = {http://dx.doi.org/10.1109/PDP.2008.86},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2008_VirtuaLinux_PDP.pdf}
    }

  • M. Aldinucci, S. Campa, M. Danelutto, M. Vanneschi, P. Dazzi, D. Laforenza, N. Tonellotto, and P. Kilpatrick, “Behavioural skeletons in GCM: autonomic management of grid components,” in Proc. of Intl. Euromicro PDP 2008: Parallel Distributed and network-based Processing, Toulouse, France, 2008, pp. 54-63. doi:10.1109/PDP.2008.46
    [BibTeX] [Abstract] [Download PDF]

    Autonomic management can be used to improve the QoS provided by parallel/distributed applications. We discuss behavioural skeletons introduced in earlier work: rather than relying on programmer ability to design "from scratch" efficient autonomic policies, we encapsulate general autonomic controller features into algorithmic skeletons. Then we leave to the programmer the duty of specifying the parameters needed to specialise the skeletons to the needs of the particular application at hand. This results in the programmer having the ability to fast prototype and tune distributed/parallel applications with non-trivial autonomic management capabilities. We discuss how behavioural skeletons have been implemented in the framework of GCM (the grid component model developed within the CoreGRID NoE and currently being implemented within the GridCOMP STREP project). We present results evaluating the overhead introduced by autonomic management activities as well as the overall behaviour of the skeletons. We also present results achieved with a long running application subject to autonomic management and dynamically adapting to changing features of the target architecture. Overall the results demonstrate both the feasibility of implementing autonomic control via behavioural skeletons and the effectiveness of our sample behavioural skeletons in managing the "functional replication" pattern(s).

    @inproceedings{orc:pdp:08,
      abstract = {Autonomic management can be used to improve the QoS provided by parallel/distributed applications. We discuss behavioural skeletons introduced in earlier work: rather than relying on programmer ability to design "from scratch" efficient autonomic policies, we encapsulate general autonomic controller features into algorithmic skeletons. Then we leave to the programmer the duty of specifying the parameters needed to specialise the skeletons to the needs of the particular application at hand. This results in the programmer having the ability to fast prototype and tune distributed/parallel applications with non-trivial autonomic management capabilities. We discuss how behavioural skeletons have been implemented in the framework of GCM (the grid component model developed within the CoreGRID NoE and currently being implemented within the GridCOMP STREP project). We present results evaluating the overhead introduced by autonomic management activities as well as the overall behaviour of the skeletons. We also present results achieved with a long running application subject to autonomic management and dynamically adapting to changing features of the target architecture. Overall the results demonstrate both the feasibility of implementing autonomic control via behavioural skeletons and the effectiveness of our sample behavioural skeletons in managing the "functional replication" pattern(s).},
      address = {Toulouse, France},
      author = {Marco Aldinucci and Sonia Campa and Marco Danelutto and Marco Vanneschi and Patrizio Dazzi and Domenico Laforenza and Nicola Tonellotto and Peter Kilpatrick},
      booktitle = {Proc. of Intl. Euromicro PDP 2008: Parallel Distributed and network-based Processing},
      date-added = {2007-10-09 12:13:13 +0200},
      date-modified = {2009-02-05 23:55:55 +0100},
      doi = {10.1109/PDP.2008.46},
      editor = {Didier El Baz and Julien Bourgeois and Francois Spies},
      month = feb,
      pages = {54-63},
      publisher = {IEEE},
      title = {Behavioural skeletons in {GCM}: autonomic management of grid components},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_orc_PDP.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_orc_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2008.46}
    }

  • M. Aldinucci, M. Danelutto, P. Kilpatrick, and P. Dazzi, “From Orc Models to Distributed Grid Java code,” in Grid Computing: Achievements and Prospects, S. Gorlatch, P. Fragopoulou, and T. Priol, Eds., Springer, 2008, pp. 13-24. doi:10.1007/978-0-387-09457-1_2
    [BibTeX] [Abstract] [Download PDF]

    We present O2J, a Java library that allows implementation of Orc programs on distributed architectures including grids and clusters/networks of workstations. With minimal programming effort the grid programmer may implement Orc programs, as he/she is not required to write any low level code relating to distributed orchestration of the computation but only that required to implement Orc expressions. Using the prototype O2J implementation, grid application developers can reason about abstract grid orchestration code described in Orc. Once the required orchestration has been determined and its properties analysed, a grid application prototype can be simply, efficiently and quickly implemented by taking the Orc code, rewriting it into corresponding Java/O2J syntax and finally providing the functional code implementing the sites and processes involved. The proposed modus operandi brings a Model Driven Engineering approach to grid application development.

    @incollection{orc:IW_book:08,
      abstract = {We present O2J, a Java library that allows implementation of Orc programs on distributed architectures including grids and clusters/networks of workstations. With minimal programming effort the grid programmer may implement Orc programs, as he/she is not required to write any low level code relating to distributed orchestration of the computation but only that required to implement Orc expressions. Using the prototype O2J implementation, grid application developers can reason about abstract grid orchestration code described inOrc. Once the required orchestration has been determined and its properties analysed, a grid application prototype can be simply, efficiently and quickly implemented by taking the Orc code, rewriting it into corresponding Java/O2J syntax and finally providing the functional code implementing the sites and processes involved. The proposed modus operandi brings aModel Driven Engineering approach to grid application development.},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick and Patrizio Dazzi},
      booktitle = {Grid Computing: Achievements and Prospects},
      date-added = {2008-11-16 16:26:47 +0100},
      date-modified = {2015-02-21 14:30:35 +0000},
      doi = {10.1007/978-0-387-09457-1_2},
      editor = {Sergei Gorlatch and Paraskevi Fragopoulou and Thierry Priol},
      isbn = {978-0-387-09456-4},
      pages = {13-24},
      publisher = {Springer},
      series = {CoreGRID},
      title = {From {Orc} Models to Distributed Grid {Java} code},
      url = {http://calvados.di.unipi.it/storage/paper_files/2008_IW_book_O2J.pdf},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2008_IW_book_O2J.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-09457-1},
      bdsk-url-3 = {http://dx.doi.org/10.1007/978-0-387-09457-1_2}
    }

  • M. Aldinucci, M. Danelutto, M. Torquati, F. Polzella, G. Spinatelli, M. Vanneschi, A. Gervaso, M. Cacitti, and P. Zuccato, “VirtuaLinux: virtualized high-density clusters with no single point of failure,” in Parallel Computing: Architectures, Algorithms and Applications, The Netherlands, 2008, pp. 355-362.
    [BibTeX] [Abstract] [Download PDF]

    VirtuaLinux is a Linux meta-distribution that allows the creation, deployment and administration of both physical and virtualized clusters with no single point of failure. VirtuaLinux supports the creation and management of virtual clusters in a seamless way: VirtuaLinux Virtual Cluster Manager enables the system administrator to create, save, restore Xen-based virtual clusters, and to map and dynamically remap them onto the nodes of the physical cluster. We introduce and discuss the VirtuaLinux virtualization architecture, features, and tools. These rely on a novel disk abstraction layer, which enables the fast, space-efficient, dynamic creation of virtual clusters composed of fully independent complete virtual machines.

    @inproceedings{virtualinux:parco:07,
      abstract = {VirtuaLinux is a Linux meta-distribution that allows the creation, deployment and administration of both physical and virtualized clusters with no single point of failure. VirtuaLinux supports the creation and management of virtual clusters in seamless way: VirtuaLinux Virtual Cluster Manager enables the system administrator to create, save, restore Xen-based virtual clusters, and to map and dynamically remap them onto the nodes of the physical cluster. We introduces and discuss VirtuaLinux virtualization architecture, features, and tools. These rely on a novel disk abstraction layer, which enables the fast, space-efficient, dynamic creation of virtual clusters composed of fully independent complete virtual machines.},
      address = {The Netherlands},
      annote = {Parco 2007},
      author = {Marco Aldinucci and Marco Danelutto and Massimo Torquati and Francesco Polzella and Gianmarco Spinatelli and Marco Vanneschi and Alessandro Gervaso and Manuel Cacitti and Pierfrancesco Zuccato},
      booktitle = {Parallel Computing: Architectures, Algorithms and Applications},
      date-added = {2007-06-26 01:43:08 +0200},
      date-modified = {2012-11-18 17:56:09 +0000},
      editor = {C. Bischof and M. B{\"u}cker and P. Gibbon and G. R. Joubert and T. Lippert and B. Mohr and F. J. Peters},
      pages = {355-362},
      publisher = {IOS press},
      series = {ADVANCES IN PARALLEL COMPUTING},
      title = {{VirtuaLinux}: virtualized high-density clusters with no single point of failure},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_vlinux_parco.pdf},
      volume = {15},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_vlinux_parco.pdf}
    }

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “A framework for prototyping and reasoning about grid systems,” in Parallel Computing: Architectures, Algorithms and Applications, Germany, 2008, pp. 235-242.
    [BibTeX] [Abstract] [Download PDF]

    A framework supporting fast prototyping as well as tuning of distributed applications is presented. The approach is based on the adoption of a formal model that is used to describe the orchestration of distributed applications. The formal model (Orc by Misra and Cook) can be used to support semi-formal reasoning about the applications at hand. The paper describes how the framework can be used to derive and evaluate alternative orchestrations of a well-known parallel/distributed computation pattern, and shows how the same formal model can be used to support generation of prototypes of distributed application skeletons directly from the application description.

    @inproceedings{orc:parco:07,
      abstract = {A framework supporting fast prototyping as well as tuning of distributed applications is presented. The approach is based on the adoption of a formal model that is used to describe the orchestration of distributed applications. The formal model (Orc by Misra and Cook) can be used to support semi-formal reasoning about the applications at hand. The paper describes how the framework can be used to derive and evaluate alternative orchestrations of a well know parallel/distributed computation pattern; and shows how the same formal model can be used to support generation of prototypes of distributed applications skeletons directly from the application description.},
      address = {Germany},
      annote = {Parco 2007},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Parallel Computing: Architectures, Algorithms and Applications},
      date-added = {2007-06-26 01:48:06 +0200},
      date-modified = {2012-11-18 17:48:22 +0000},
      editor = {C. Bischof and M. B{\"u}cker and P. Gibbon and G. R. Joubert and T. Lippert and B. Mohr and F. J. Peters},
      isbn = {9781586037963},
      pages = {235-242},
      publisher = {IOS press},
      series = {ADVANCES IN PARALLEL COMPUTING},
      title = {A framework for prototyping and reasoning about grid systems},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_orc_parco.pdf},
      volume = {15},
      year = {2008},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_orc_parco.pdf}
    }

2007

  • M. Aldinucci, M. Danelutto, and P. Dazzi, “MUSKEL: an expandable skeleton environment,” Scalable Computing: Practice and Experience, vol. 8, iss. 4, pp. 325-341, 2007.
    [BibTeX] [Abstract] [Download PDF]

    Programming models based on algorithmic skeletons promise to raise the level of abstraction perceived by programmers when implementing parallel applications, while guaranteeing good performance figures. At the same time, however, they restrict the freedom of programmers to implement arbitrary parallelism exploitation patterns. In fact, efficiency is achieved by restricting the parallelism exploitation patterns provided to the programmer to the useful ones for which efficient implementations, as well as useful and efficient compositions, are known. In this work we introduce muskel, a full Java library targeting workstation clusters, networks and grids and providing the programmers with a skeleton based parallel programming environment. muskel is implemented exploiting (macro) data flow technology, rather than the more usual skeleton technology relying on the use of implementation templates. Using data flow, muskel easily and efficiently implements both classical, predefined skeletons, and user-defined parallelism exploitation patterns. This provides a means to overcome some of the problems that Cole identified in his skeleton “manifesto” as the issues impairing skeleton success in the parallel programming arena. We discuss in full how user-defined skeletons are supported by the data flow implementation, present experimental results, and discuss extensions supporting the further characterization of skeletons with non-functional properties, such as security, through the use of Aspect Oriented Programming and annotations.

    @article{muskel:SCPE:07,
      abstract = {Programming models based on algorithmic skeletons promise to raise the level of abstraction perceived by programmers when implementing parallel applications, while guaranteeing good performance figures. At the same time, however, they restrict the freedom of programmers to implement arbitrary parallelism exploitation patterns. In fact, efficiency is achieved by restricting the parallelism exploitation patterns provided to the programmer to the useful ones for which efficient implementations, as well as useful and efficient compositions, are known. In this work we introduce muskel, a full Java library targeting workstation clusters, networks and grids and providing the programmers with a skeleton based parallel programming environment. muskel is implemented exploiting (macro) data flow technology, rather than the more usual skeleton technology relying on the use of implementation templates. Using data flow, muskel easily and efficiently implements both classical, predefined skeletons, and
    user-defined parallelism exploitation patterns. This provides a means to overcome some of the problems that Cole identified in his skeleton ``manifesto'' as the issues impairing skeleton success in the parallel programming arena. We discuss fully how user-defined skeletons are supported by exploiting a data flow implementation, experimental results and we also discuss extensions supporting the further characterization of skeletons with non-functional properties, such as security, through the use of Aspect Oriented Programming and annotations.},
      author = {Marco Aldinucci and Marco Danelutto and Patrizio Dazzi},
      date-added = {2007-06-26 01:27:03 +0200},
      date-modified = {2014-08-24 22:17:35 +0000},
      journal = {Scalable Computing: Practice and Experience},
      month = dec,
      number = {4},
      pages = {325-341},
      title = {MUSKEL: an expandable skeleton environment},
      url = {http://www.scpe.org/index.php/scpe/article/view/429},
      volume = {8},
      year = {2007},
      bdsk-url-1 = {http://www.scpe.org/vols/vol08/no4/SCPE_8_4_01.pdf},
      bdsk-url-2 = {http://calvados.di.unipi.it/storage/paper_files/2007_SCPE_muskel.pdf},
      bdsk-url-3 = {http://www.scpe.org/index.php/scpe/article/view/429}
    }
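
    A sketch of the macro data flow idea underlying muskel, assuming nothing of its real API: skeletons are compiled to a graph of coarse-grain instructions, and any instruction whose input tokens are available is fireable and may run on any interpreter thread. Below, a two-stage pipeline over a stream of three tokens becomes three independent fireable chains; the stage functions and pool size are assumptions for illustration.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.function.Function;

    public class MdfDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService interpreters = Executors.newFixedThreadPool(4);
            Function<Integer, Integer> stage1 = x -> x + 1;  // first pipeline stage
            Function<Integer, Integer> stage2 = x -> x * 10; // second pipeline stage
            // Each input token yields an independent chain of fireable instructions,
            // scheduled on whatever interpreter thread is free.
            List<Future<Integer>> results = List.of(1, 2, 3).stream()
                .map(x -> interpreters.submit(() -> stage2.apply(stage1.apply(x))))
                .toList();
            for (Future<Integer> f : results)
                System.out.println(f.get()); // 20, 30, 40
            interpreters.shutdown();
        }
    }

    Because execution is driven by token availability rather than by per-skeleton templates, a user-defined pattern is just another instruction graph, which is how muskel sidesteps the extensibility problem mentioned above.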

  • M. Aldinucci and M. Danelutto, “Skeleton based parallel programming: functional and parallel semantic in a single shot,” Computer Languages, Systems and Structures, vol. 33, iss. 3-4, pp. 179-192, 2007. doi:10.1016/j.cl.2006.07.004
    [BibTeX] [Abstract] [Download PDF]

    Semantics of skeleton-based parallel programming languages comes usually as two distinct items: a functional semantics, modeling the function computed by the skeleton program, and a parallel semantics describing the ways used to exploit parallelism during the execution of the skeleton program. The former is usually expressed using some kind of semantic formalism, while the latter is almost always given in an informal way. Such a separation of functional and parallel semantics seriously impairs the possibility of programmers to use the semantic tools to prove properties of programs. In this work, we show how a formal semantic framework can be set up that handles both functional and parallel aspects of skeleton-based parallel programs. The framework is based on a labeled transition system. We show how different properties related to skeleton programs can be proved using such a system. We use Lithium, a skeleton-based full Java parallel programming environment, as the case study.

    @article{lithium:sem:CLSS,
      abstract = {Semantics of skeleton-based parallel programming languages comes usually as two distinct items: a functional semantics, modeling the function computed by the skeleton program, and a parallel semantics describing the ways used to exploit parallelism during the execution of the skeleton program. The former is usually expressed using some kind of semantic formalism, while the latter is almost always given in an informal way. Such a separation of functional and parallel semantics seriously impairs the possibility of programmers to use the semantic tools to prove properties of programs. In this work, we show how a formal semantic framework can be set up that handles both functional and parallel aspects of skeleton-based parallel programs. The framework is based on a labeled transition system. We show how different properties related to skeleton programs can be proved using such a system. We use Lithium, a skeleton-based full Java parallel programming environment, as the case study.},
      annote = {ISSN: 1477-8424},
      author = {Marco Aldinucci and Marco Danelutto},
      date-modified = {2014-08-24 22:17:22 +0000},
      doi = {10.1016/j.cl.2006.07.004},
      journal = {Computer Languages, Systems and Structures},
      month = oct,
      number = {3-4},
      pages = {179-192},
      title = {Skeleton based parallel programming: functional and parallel semantic in a single shot},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_semantics_CLSS.pdf},
      volume = {33},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_semantics_CLSS.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1016/j.cl.2006.07.004}
    }
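
    For self-containedness, the textbook definition the framework builds on, in LaTeX; the paper's actual states and labels are of course richer than this sketch.

    % A labelled transition system (LTS): standard definition.
    \[
      \mathcal{T} = (S, \Lambda, \rightarrow), \qquad
      {\rightarrow} \subseteq S \times \Lambda \times S,
    \]
    % writing $s \xrightarrow{\alpha} s'$ for $(s, \alpha, s') \in {\rightarrow}$.
    % Reading states in $S$ as skeleton program configurations and labels in
    % $\Lambda$ as the parallel activities performed in a step, a single relation
    % can expose both the computed function and the parallel behaviour, which is
    % the unification the abstract above describes.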

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Adding metadata to Orc to support reasoning about grid programming,” in Towards Next Generation Grids (Proc. of the CoreGRID Symposium 2007), Rennes, France, 2007, pp. 205-214. doi:10.1007/978-0-387-72498-0_19
    [BibTeX] [Abstract] [Download PDF]

    Following earlier work demonstrating the utility of Orc as a means of specifying and reasoning about grid applications we propose the enhancement of such specifications with metadata that provide a means to extend an Orc specification with implementation oriented information. We argue that such specifications provide a useful refinement step in allowing reasoning about implementation related issues ahead of actual implementation or even prototyping. As examples, we demonstrate how such extended specifications can be used for investigating security related issues and for evaluating the cost of handling grid resource faults. The approach emphasises a semi-formal style of reasoning that makes maximum use of programmer domain knowledge and experience.

    @inproceedings{orc:metadata:cgs:07,
      abstract = {Following earlier work demonstrating the utility of Orc as a means of specifying and reasoning about grid applications we propose the enhancement of such specifications with metadata that provide a means to extend an Orc specification with implementation oriented information. We argue that such specifications provide a useful refinement step in allowing reasoning about implementation related issues ahead of actual implementation or even prototyping. As examples, we demonstrate how such extended specifications can be used for investigating security related issues and for evaluating the cost of handling grid resource faults. The approach emphasises a semi-formal style of reasoning that makes maximum use of programmer domain knowledge and experience.},
      address = {Rennes, France},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Towards Next Generation Grids (Proc. of the CoreGRID Symposium 2007)},
      date-added = {2007-06-26 01:55:01 +0200},
      date-modified = {2009-02-04 18:57:20 +0100},
      doi = {10.1007/978-0-387-72498-0_19},
      editor = {Thierry Priol and Marco Vanneschi},
      isbn = {978-0-387-72497-3},
      month = sep,
      pages = {205-214},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Adding metadata to Orc to support reasoning about grid programming},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_orc_CGSymph.pdf},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_orc_CGSymph.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-72498-0_19}
    }

  • M. Aldinucci, M. Danelutto, and P. Kilpatrick, “Management in distributed systems: a semi-formal approach,” in Proc. of 13th Intl. Euro-Par 2007 Parallel Processing, Rennes, France, 2007, pp. 651-661. doi:10.1007/978-3-540-74466-5
    [BibTeX] [Abstract] [Download PDF]

    The reverse engineering of a skeleton based programming environment and redesign to distribute management activities of the system and thereby remove a potential single point of failure is considered. The Orc notation is used to facilitate abstraction of the design and analysis of its properties. It is argued that Orc is particularly suited to this role as this type of management is essentially an orchestration activity. The Orc specification of the original version of the system is modified via a series of semi-formally justified derivation steps to obtain a specification of the decentralized management version which is then used as a basis for its implementation. Analysis of the two specifications allows qualitative prediction of the expected performance of the derived version with respect to the original, and this prediction is borne out in practice.

    @inproceedings{orc:europar:07,
      abstract = {The reverse engineering of a skeleton based programming environment and redesign to distribute management activities of the system and thereby remove a potential single point of failure is considered. The Orc notation is used to facilitate abstraction of the design and analysis of its properties. It is argued that Orc is particularly suited to this role as this type of management is essentially an orchestration activity. The Orc specification of the original version of the system is modified via a series of semi-formally justified derivation steps to obtain a specification of the decentralized management version which is then used as a basis for its implementation. Analysis of the two specifications allows qualitative prediction of the expected performance of the derived version with respect to the original, and this prediction is borne out in practice.},
      address = {Rennes, France},
      author = {Marco Aldinucci and Marco Danelutto and Peter Kilpatrick},
      booktitle = {Proc. of 13th Intl. Euro-Par 2007 Parallel Processing},
      date-added = {2009-05-01 23:33:34 +0200},
      date-modified = {2009-05-01 23:33:34 +0200},
      doi = {10.1007/978-3-540-74466-5},
      editor = {A.-M. Kermarrec and L. Boug{\'e} and T. Priol},
      isbn = {978-3-540-74465-8},
      month = aug,
      pages = {651-661},
      publisher = {Springer},
      series = {LNCS},
      title = {Management in distributed systems: a semi-formal approach},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_orc_europar.pdf},
      volume = {4641},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_orc_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-540-74466-5}
    }

  • M. Aldinucci, S. Campa, M. Danelutto, P. Dazzi, P. Kilpatrick, D. Laforenza, and N. Tonellotto, “Behavioural skeletons for component autonomic management on grids,” in CoreGRID Workshop on Grid Programming Model, Grid and P2P Systems Architecture, Grid Systems, Tools and Environments, Heraklion, Crete, Greece, 2007.
    [BibTeX] [Abstract] [Download PDF]

    We present behavioural skeletons for the CoreGRID Component Model, which are an abstraction aimed at simplifying the development of GCM-based self-management applications. Behavioural skeletons abstract component self-management in component-based design as design patterns abstract class design in classic OO development. As here we just wish to introduce the behavioural skeleton framework, emphasis is placed on general skeleton structure rather than on their autonomic management policies.

    @inproceedings{beske:cg:heraklion:07,
      abstract = {We present behavioural skeletons for the CoreGRID Component Model, which are an abstraction aimed at simplifying the development of GCM-based self-management applications. Behavioural skeletons abstract component self-managent in component-based design as design patterns abstract class design in classic OO development. As here we just wish to introduce the behavioural skeleton framework, emphasis is placed on general skeleton structure rather than on their autonomic management policies.},
      address = {Heraklion, Crete, Greece},
      author = {Marco Aldinucci and Sonia Campa and Marco Danelutto and Patrizio Dazzi and Peter Kilpatrick and Domenico Laforenza and Nicola Tonellotto},
      booktitle = {CoreGRID Workshop on Grid Programming Model, Grid and P2P Systems Architecture, Grid Systems, Tools and Environments},
      date-added = {2007-06-26 01:50:37 +0200},
      date-modified = {2007-12-16 23:32:27 +0100},
      month = {jun},
      title = {Behavioural skeletons for component autonomic management on grids},
      url = {http://compass2.di.unipi.it/TR/Files/TR-07-12.pdf.gz},
      year = {2007},
      bdsk-url-1 = {http://compass2.di.unipi.it/TR/Files/TR-07-12.pdf.gz}
    }

  • M. Aldinucci and P. Zuccato, “Virtual clusters with no single point of failure,” in Intl. Supercomputing Conference (ISC2007), Poster session, Dresden, Germany, 2007.
    [BibTeX] [Abstract] [Download PDF]

    VirtuaLinux is a Linux meta-distribution that allows the creation, deployment and administration of virtualized clusters with no single point of failure. VirtuaLinux architecture supports diskless configurations and provides an efficient, iSCSI based abstraction of the SAN. Clusters running VirtuaLinux exhibit no master node, thus boosting resilience and flexibility.

    @inproceedings{virtualinux:poster:ics:07,
      abstract = {VirtuaLinux is a Linux meta-distribution that allows the creation, deployment and administration of virtualized clusters with no single point of failure. VirtuaLinux architecture supports diskless configurations and provides an efficient, iSCSI based abstraction of the SAN. Clusters running VirtuaLinux exhibit no master node, thus boosting resilience and flexibility.},
      address = {Dresden, Germany},
      author = {Marco Aldinucci and Pierfrancesco Zuccato},
      booktitle = {Intl. Supercomputing Conference (ISC2007), Poster session},
      date-added = {2007-06-26 01:37:15 +0200},
      date-modified = {2007-11-03 14:28:15 +0100},
      month = jun,
      title = {Virtual clusters with no single point of failure},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_ICS_VirtuaLinux.pdf},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_ICS_VirtuaLinux.pdf}
    }

  • M. Aldinucci and M. Danelutto, “The cost of security in skeletal systems,” in Proc. of Intl. Euromicro PDP 2007: Parallel Distributed and network-based Processing, Napoli, Italia, 2007, pp. 213-220. doi:10.1109/PDP.2007.79
    [BibTeX] [Abstract] [Download PDF]

    Skeletal systems exploit algorithmic skeleton technology to provide the user with very high level, efficient parallel programming environments. They have recently been demonstrated to be suitable for highly distributed architectures, such as workstation clusters, networks and grids. However, when using skeletal systems for grid programming, care must be taken to secure data and code transfers across non-dedicated, non-secure network links. In this work we take into account the cost of security introduction in muskel, a Java based skeletal system exploiting macro data flow implementation technology. We consider the adoption of mechanisms that allow securing all the communications taking place between remote, unreliable nodes and we evaluate the cost of such mechanisms. In particular, we consider the implications on the computational grains needed to scale secure and insecure skeletal computations.

    @inproceedings{security:euromicro:07,
      abstract = {Skeletal systems exploit algorithmical skeletons technology to provide the user very high level, efficient parallel programming environments. They have been recently demonstrated to be suitable for highly distributed architectures, such as workstation clusters, networks and grids. However, when using skeletal system for grid programming care must be taken to secure data and code transfers across non-dedicated, non-secure network links. In this work we take into account the cost of security introduction in muskel, a Java based skeletal system exploiting macro data flow implementation technology. We consider the adoption of mechanisms that allow securing all the communications taking place between remote, unreliable nodes and we evaluate the cost of such mechanisms. In particular, we consider the implications on the computational grains needed to scale secure and insecure skeletal computations.},
      address = {Naples, Italy},
      author = {Marco Aldinucci and Marco Danelutto},
      booktitle = {Proc. of Intl. Euromicro PDP 2007: Parallel Distributed and network-based Processing},
      date-added = {2007-03-08 15:44:26 +0100},
      date-modified = {2008-02-18 12:49:23 +0100},
      doi = {10.1109/PDP.2007.79},
      editor = {Pasqua D'Ambra and Mario Rosario Guarracino},
      month = feb,
      pages = {213-220},
      publisher = {IEEE},
      title = {The cost of security in skeletal systems},
      url = {http://calvados.di.unipi.it/storage/paper_files/2007_security_PDP.pdf},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2007_security_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2007.79}
    }
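
    The trade-off measured in this paper, i.e. what securing task transfers costs on top of plain object shipping, can be illustrated with a small stand-alone benchmark. The sketch below is not muskel code: under the assumption that a macro data flow task travels as a serialized Java object, it simply times serialization of a task-like payload with and without AES sealing.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SealedObject;
    import java.io.*;

    // Illustrative micro-benchmark (not muskel's actual code): compares the cost
    // of shipping a serializable task payload in the clear vs. sealed with AES.
    public class SecurityCost {
        public static void main(String[] args) throws Exception {
            double[] task = new double[1 << 16];      // stand-in for a task's payload

            Cipher c = Cipher.getInstance("AES");
            c.init(Cipher.ENCRYPT_MODE, KeyGenerator.getInstance("AES").generateKey());

            long t0 = System.nanoTime();
            byte[] plain = serialize(task);           // plain serialization only
            long t1 = System.nanoTime();
            byte[] sealed = serialize(new SealedObject(task, c));  // + encryption
            long t2 = System.nanoTime();

            System.out.printf("plain: %d us (%d bytes), sealed: %d us (%d bytes), overhead: %.1fx%n",
                    (t1 - t0) / 1000, plain.length, (t2 - t1) / 1000, sealed.length,
                    (double) (t2 - t1) / (t1 - t0));
        }

        static byte[] serialize(Serializable s) throws IOException {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) { oos.writeObject(s); }
            return bos.toByteArray();
        }
    }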

  • M. Pasin, P. Kuonen, M. Danelutto, and M. Aldinucci, “Skeleton Parallel Programming and Parallel Objects,” in Integrated Research in Grid Computing, S. Gorlatch and M. Danelutto, Eds., Springer, 2007, pp. 59-71. doi:10.1007/978-0-387-47658-2_5
    [BibTeX] [Abstract] [Download PDF]

    This paper describes the ongoing work aimed at integrating the POP-C++ parallel object programming environment with the ASSIST component-based parallel programming environment. Both programming environments are briefly outlined, then several possibilities of integration are considered. For each of these integration opportunities, the advantages and synergies that can possibly be achieved are outlined and discussed. The text explains how GEA, the ASSIST deployer, can be considered the basis for the integration of such different systems. An architecture is proposed, extending the existing tools to work together. The current status of integration of the two environments is discussed, along with the expected results and fallouts on the two programming environments.

    @incollection{pasin:IW_book:06,
      abstract = {This paper describes the ongoing work aimed at integrating the POP-C++ parallel object programming environment with the ASSIST component-based parallel programming environment. Both programming environments are briefly outlined, then several possibilities of integration are considered. For each of these integration opportunities, the advantages and synergies that can possibly be achieved are outlined and discussed.
    The text explains how GEA, the ASSIST deployer, can be considered the basis for the integration of such different systems. An architecture is proposed, extending the existing tools to work together. The current status of integration of the two environments is discussed, along with the expected results and fallouts on the two programming environments.},
      annote = {ISBN: 978-0-387-47656-8},
      author = {Marcelo Pasin and Pierre Kuonen and Marco Danelutto and Marco Aldinucci},
      booktitle = {Integrated Research in Grid Computing},
      date-modified = {2009-02-01 17:51:38 +0100},
      doi = {10.1007/978-0-387-47658-2_5},
      editor = {Sergei Gorlatch and Marco Danelutto},
      isbn = {978-0-387-47656-8},
      owner = {aldinuc},
      pages = {59-71},
      publisher = {Springer},
      series = {CoreGRID},
      timestamp = {2006.06.28},
      title = {Skeleton Parallel Programming and Parallel Objects},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_popc.pdf},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_popc.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-47658-2_5}
    }

  • M. Aldinucci, S. Campa, M. Coppola, M. Danelutto, C. Zoccolo, F. André, and J. Buisson, “An abstract schema modeling adaptivity management,” in Integrated Research in Grid Computing, S. Gorlatch and M. Danelutto, Eds., Springer, 2007, pp. 89-102. doi:10.1007/978-0-387-47658-2_7
    [BibTeX] [Abstract] [Download PDF]

    Component application adaptivity in Grid environments has so far been addressed in different ways, such as those provided by the Dynaco/AFPAC framework and by the ASSIST environment. We propose an abstract schema that captures all the design aspects a model for parallel component applications on the Grid should define in order to uniformly handle the dynamic behavior of computing resources within complex parallel applications. The abstraction is validated by demonstrating how two different approaches to adaptivity, ASSIST and Dynaco/AFPAC, easily map to such a schema.

    @incollection{adapt_rennes:IW_book:06,
      abstract = {Component application adaptivity in Grid environments has so far been addressed in different ways, such as those provided by the Dynaco/AFPAC framework and by the ASSIST environment. We propose an abstract schema that captures all the design aspects a model for parallel component applications on the Grid should define in order to uniformly handle the dynamic behavior of computing resources within complex parallel applications. The abstraction is validated by demonstrating how two different approaches to adaptivity, ASSIST and Dynaco/AFPAC, easily map to such a schema.},
      annote = {ISBN: 0-387-47656-3},
      author = {Marco Aldinucci and Sonia Campa and Massimo Coppola and Marco Danelutto and Corrado Zoccolo and Francoise Andr{\'e} and J{\'e}r{\'e}my Buisson},
      booktitle = {Integrated Research in Grid Computing},
      date-modified = {2012-03-18 00:36:49 +0000},
      doi = {10.1007/978-0-387-47658-2_7},
      editor = {Sergei Gorlatch and Marco Danelutto},
      isbn = {978-0-387-47656-8},
      owner = {aldinuc},
      pages = {89-102},
      publisher = {Springer},
      series = {CoreGRID},
      timestamp = {2006.06.28},
      title = {An abstract schema modeling adaptivity management},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_adapt.pdf},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_adapt.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-47658-2_7}
    }

  • J. Dünnweber, S. Gorlatch, S. Campa, M. Aldinucci, and M. Danelutto, “Adaptable Parallel Components for Grid Programming,” in Integrated Research in Grid Computing, S. Gorlatch and M. Danelutto, Eds., Springer, 2007, pp. 43-57. doi:10.1007/978-0-387-47658-2_4
    [BibTeX] [Abstract] [Download PDF]

    We suggest that parallel software components used for grid computing should be adaptable to application-specific requirements, instead of developing new components from scratch for each particular application. As an example, we take a parallel farm component which is "embarrassingly parallel", i.e., free of dependencies, and adapt it to the wavefront processing pattern with dependencies that impact its behavior. We describe our approach in the context of Higher-Order Components (HOCs), with the Java-based system Lithium as our implementation framework. The adaptation process relies on HOCs’ mobile code parameters that are shipped over the network of the grid. We describe our implementation of the proposed component adaptation method and report first experimental results for a particular grid application — the alignment of DNA sequence pairs, a popular, time-critical problem in computational molecular biology.

    @incollection{codeadapt:IW_book:06,
      abstract = {We suggest that parallel software components used for grid computing should be adaptable to application-specific requirements, instead of developing new components from scratch for each particular application. As an example, we take a parallel farm component which is "embarrassingly parallel", i.e., free of dependencies, and adapt it to the wavefront processing pattern with dependencies that impact its behavior. We describe our approach in the context of Higher-Order Components (HOCs), with the Java-based system Lithium as our implementation framework. The adaptation process relies on HOCs' mobile code parameters that are shipped over the network of the grid. We describe our implementation of the proposed component adaptation method and report first experimental results for a particular grid application -- the alignment of DNA sequence pairs, a popular, time-critical problem in computational molecular biology.},
      author = {Jan D{\"u}nnweber and Sergei Gorlatch and Sonia Campa and Marco Aldinucci and Marco Danelutto},
      booktitle = {Integrated Research in Grid Computing},
      date-modified = {2009-02-01 17:56:57 +0100},
      doi = {10.1007/978-0-387-47658-2_4},
      editor = {Sergei Gorlatch and Marco Danelutto},
      isbn = {978-0-387-47656-8},
      pages = {43-57},
      publisher = {Springer},
      series = {CoreGRID},
      timestamp = {2006.06.28},
      title = {Adaptable Parallel Components for Grid Programming},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_muester.pdf},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_muester.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-47658-2_4}
    }

  • M. Aldinucci and A. Benoit, “Towards the Automatic Mapping of ASSIST Applications for the Grid,” in Integrated Research in Grid Computing, S. Gorlatch and M. Danelutto, Eds., Springer, 2007, pp. 73-87. doi:10.1007/978-0-387-47658-2_6
    [BibTeX] [Abstract] [Download PDF]

    One of the most promising technical innovations in present-day computing is the invention of grid technologies which harness the computational power of widely distributed collections of computers. However, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target in this work applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.

    @incollection{assist:pepa:IW_book:06,
      abstract = {One of the most promising technical innovations in present-day computing is the invention of grid technologies which harness the computational power of widely distributed collections of computers. However, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target in this work applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.},
      author = {Marco Aldinucci and Anne Benoit},
      booktitle = {Integrated Research in Grid Computing},
      date-modified = {2009-02-01 17:26:53 +0100},
      doi = {10.1007/978-0-387-47658-2_6},
      editor = {Sergei Gorlatch and Marco Danelutto},
      isbn = {978-0-387-47656-8},
      owner = {aldinuc},
      pages = {73-87},
      publisher = {Springer},
      series = {CoreGRID},
      timestamp = {2006.06.28},
      title = {Towards the Automatic Mapping of {ASSIST} Applications for the Grid},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_pepa.pdf},
      year = {2007},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_book_pepa.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-47658-2_6}
    }
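
    To give a flavour of the formalism adopted in this line of work, the block below is a minimal, hand-written PEPA model (in LaTeX notation) of a two-stage pipeline; the component names and rates are illustrative and are not generated from any actual ASSIST program. Stage $S_1$ acquires an input at rate $\lambda$ and forwards it at rate $\mu$; stage $S_2$ synchronises passively (rate $\top$) on the shared activity $\mathit{fwd}$ and emits results at rate $\gamma$. Solving the Markov chain underlying such a model yields the throughput and utilisation figures that drive the mapping decisions.

    \[
      S_1 \,\stackrel{\mathrm{def}}{=}\, (\mathit{in},\lambda).(\mathit{fwd},\mu).S_1
      \qquad
      S_2 \,\stackrel{\mathrm{def}}{=}\, (\mathit{fwd},\top).(\mathit{out},\gamma).S_2
    \]
    \[
      \mathit{Pipe} \,\stackrel{\mathrm{def}}{=}\, S_1 \bowtie_{\{\mathit{fwd}\}} S_2
    \]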

2006

  • M. Aldinucci, M. Danelutto, A. Paternesi, R. Ravazzolo, and M. Vanneschi, “Building interoperable grid-aware ASSIST applications via WebServices,” in Parallel Computing: Current & Future Issues of High-End Computing (Proc. of PARCO 2005, Malaga, Spain), Germany, 2006, pp. 145-152.
    [BibTeX] [Abstract] [Download PDF]

    The ASSIST environment provides a high-level programming toolkit for the grid. ASSIST applications are described by means of a coordination language, which can express arbitrary graphs of modules. These modules (or a graph of them) may be enclosed in components specifically designed for the grid (GRID.it components). In this paper we describe how ASSIST modules can be wired through standard Web Services, and how GRID.it components may be made available as standard Web Services.

    @inproceedings{assist:webs:parco:05,
      abstract = {The ASSIST environment provides a high-level programming toolkit for the grid. ASSIST applications are described by means of a coordination language, which can express arbitrary graphs of modules. These modules (or a graph of them) may be enclosed in components specifically designed for the grid (GRID.it components). In this paper we describe how ASSIST modules can be wired through standard Web Services, and how GRID.it components may be made available as standard Web Services.},
      address = {Germany},
      author = {Marco Aldinucci and Marco Danelutto and Andrea Paternesi and Roberto Ravazzolo and Marco Vanneschi},
      booktitle = {Parallel Computing: Current \& Future Issues of High-End Computing (Proc. of {PARCO 2005}, Malaga, Spain)},
      date-modified = {2012-11-18 17:06:42 +0000},
      editor = {G. R. Joubert and W. E. Nagel and F. J. Peters and O. Plata and P. Tirado and E. Zapata},
      isbn = {3000173528},
      month = dec,
      pages = {145-152},
      publisher = {John von Neumann Institute for Computing},
      series = {NIC},
      title = {Building interoperable grid-aware {ASSIST} applications via {WebServices}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_ws_parco.pdf},
      volume = {33},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_ws_parco.pdf}
    }

  • M. Aldinucci, M. Danelutto, G. Giaccherini, M. Torquati, and M. Vanneschi, “Towards a distributed scalable data service for the grid,” in Parallel Computing: Current & Future Issues of High-End Computing (Proc. of PARCO 2005, Malaga, Spain), Germany, 2006, pp. 73-80.
    [BibTeX] [Abstract] [Download PDF]

    ADHOC (Adaptive Distributed Herd of Object Caches) is a Grid-enabled, fast, scalable object repository providing programmers with a general storage module. We present three different software tools based on ADHOC: a parallel cache for Apache, a DSM, and a main-memory parallel file system. We also show that these tools exhibit considerable performance and speedup, both in absolute figures and w.r.t. other software tools offering the same features.

    @inproceedings{adhoc:parco:05,
      abstract = {ADHOC (Adaptive Distributed Herd of Object Caches) is a Grid-enabled, fast, scalable object repository providing programmers with a general storage module. We present three different software tools based on ADHOC: a parallel cache for Apache, a DSM, and a main-memory parallel file system. We also show that these tools exhibit considerable performance and speedup, both in absolute figures and w.r.t. other software tools offering the same features.},
      address = {Germany},
      author = {Marco Aldinucci and Marco Danelutto and Gianni Giaccherini and Massimo Torquati and Marco Vanneschi},
      booktitle = {Parallel Computing: Current \& Future Issues of High-End Computing (Proc. of {PARCO 2005}, Malaga, Spain)},
      date-modified = {2012-11-18 17:07:26 +0000},
      editor = {G. R. Joubert and W. E. Nagel and F. J. Peters and O. Plata and P. Tirado and E. Zapata},
      month = dec,
      optannote = {ISBN: 3-00-017352-8},
      pages = {73-80},
      publisher = {John von Neumann Institute for Computing},
      series = {NIC},
      title = {Towards a distributed scalable data service for the grid},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_adhoc_parco.pdf},
      volume = {33},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_adhoc_parco.pdf}
    }
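
    The core idea behind a “herd of object caches”, i.e. an object repository spread over several cooperating cache daemons, can be sketched in a few lines: keys are hashed onto a set of nodes and each node stores the objects mapped to it. The Java sketch below is only an in-process illustration of this partitioning scheme, with hypothetical names; ADHOC's actual daemons, protocol and API are not shown.

    import java.util.*;

    // Minimal sketch of key-to-node partitioning in a herd of caches: each map
    // stands in for one cache daemon, and a key's hash selects its home node.
    public class Herd {
        private final List<Map<String, byte[]>> nodes = new ArrayList<>();

        public Herd(int n) {
            for (int i = 0; i < n; i++) nodes.add(new HashMap<>());
        }

        private Map<String, byte[]> home(String key) {
            return nodes.get(Math.floorMod(key.hashCode(), nodes.size()));
        }

        public void put(String key, byte[] value) { home(key).put(key, value); }
        public byte[] get(String key)             { return home(key).get(key); }

        public static void main(String[] args) {
            Herd herd = new Herd(4);                  // four "daemons"
            herd.put("page:/index.html", "<html>...</html>".getBytes());
            System.out.println(new String(herd.get("page:/index.html")));
        }
    }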

  • M. Aldinucci, F. André, J. Buisson, S. Campa, M. Coppola, M. Danelutto, and C. Zoccolo, “Parallel program/component adaptivity management,” in Parallel Computing: Current & Future Issues of High-End Computing (Proc. of PARCO 2005, Malaga, Spain), Germany, 2006, pp. 89-96.
    [BibTeX] [Abstract] [Download PDF]

    Grid computing platforms need to handle the dynamic behaviour of computing resources within complex parallel applications. We introduce a formalization of adaptive behaviour that separates the abstract model of the application from the implementation design. We exemplify the abstract adaptation schema on two applications, and we show how two quite different approaches to adaptivity, the ASSIST environment and the AFPAC framework, easily map to this common schema.

    @inproceedings{adaptivity:parco:05,
      abstract = {Grid computing platforms need to handle the dynamic behaviour of computing resources within complex parallel applications. We introduce a formalization of adaptive behaviour that separates the abstract model of the application from the implementation design. We exemplify the abstract adaptation schema on two applications, and we show how two quite different approaches to adaptivity, the ASSIST environment and the AFPAC framework, easily map to this common schema.},
      address = {Germany},
      author = {Marco Aldinucci and Francoise Andr{\'e} and J{\'e}r{\'e}my Buisson and Sonia Campa and Massimo Coppola and Marco Danelutto and Corrado Zoccolo},
      booktitle = {Parallel Computing: Current \& Future Issues of High-End Computing (Proc. of {PARCO 2005}, Malaga, Spain)},
      date-modified = {2012-11-18 17:08:30 +0000},
      editor = {G. R. Joubert and W. E. Nagel and F. J. Peters and O. Plata and P. Tirado and E. Zapata},
      month = dec,
      optannote = {ISBN: 3-00-017352-8},
      pages = {89-96},
      publisher = {John von Neumann Institute for Computing},
      series = {NIC},
      title = {Parallel program/component adaptivity management},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_adaptivity_parco.pdf},
      volume = {33},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_adaptivity_parco.pdf}
    }

  • M. Aldinucci, G. Antoniu, M. Danelutto, and M. Jan, “Fault-Tolerant Data Sharing for High-level Grid Programming: A Hierarchical Storage Architecture,” in Proc. of the Integrated Research in Grid Computing Workshop, Kraków, Poland, 2006, pp. 177-188.
    [BibTeX] [Abstract] [Download PDF]

    Enabling high-level programming models on grids is today a major challenge. A way to achieve this goal relies on the use of environments able to transparently and automatically provide adequate support for low-level, grid-specific issues (fault-tolerance, scalability, etc.). This paper discusses the above approach when applied to grid data management. As a case study, we propose a 2-tier software architecture that supports transparent, fault-tolerant, grid-level data sharing in the ASSIST programming environment (University of Pisa), based on the JuxMem grid data sharing service (INRIA Rennes).

    @inproceedings{assist:juxmem:IW:06,
      abstract = {Enabling high-level programming models on grids is today a major challenge. A way to achieve this goal relies on the use of environments able to transparently and automatically provide adequate support for low-level, grid-specific issues (fault-tolerance, scalability, etc.). This paper discusses the above approach when applied to grid data management. As a case study, we propose a 2-tier software architecture that supports transparent, fault-tolerant, grid-level data sharing in the ASSIST programming environment (University of Pisa), based on the JuxMem grid data sharing service (INRIA Rennes).},
      address = {Krak{\'o}w, Poland},
      author = {Marco Aldinucci and Gabriel Antoniu and Marco Danelutto and Mathieu Jan},
      booktitle = {Proc. of the Integrated Research in Grid Computing Workshop},
      date-modified = {2012-11-18 17:23:11 +0000},
      editor = {Marian Bubak and Sergei Gorlatch and Thierry Priol},
      keywords = {Duplicate},
      month = oct,
      optannote = {ISBN: 83-9115141-6-1},
      pages = {177-188},
      publisher = {Academic Computing Centre {CYFRONET AGH}},
      series = {CoreGRID},
      title = {Fault-Tolerant Data Sharing for High-level Grid Programming: A Hierarchical Storage Architecture},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_juxadhocmem.pdf},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_juxadhocmem.pdf}
    }

  • M. Aldinucci, C. Bertolli, S. Campa, M. Coppola, M. Vanneschi, L. Veraldi, and C. Zoccolo, “Self-configuring and self-optimizing grid components in the GCM model and their ASSIST implementation,” in Proc. of. HPC-GECO/Compframe (held in conjunction with HPDC-15), Paris, France, 2006, pp. 45-52.
    [BibTeX] [Abstract] [Download PDF]

    We present the concept of an autonomic super-component as a building block for Grid-aware applications. Super-components are parametric, higher-order components exhibiting a well-known parallel behaviour. The proposal of a super-component feature is part of the experience we gained in the implementation of the ASSIST environment, which allows the development of self-configuring and self-optimising component-based applications following a structured and hierarchical approach. We discuss how such an approach to Grid programming influenced the design of the Grid Component Model (GCM).

    @inproceedings{selfadapt:hpcgeco:06,
      abstract = {We present the concept of an autonomic super-component as a building block for Grid-aware applications. Super-components are parametric, higher-order components exhibiting a well-known parallel behaviour. The proposal of a super-component feature is part of the experience we gained in the implementation of the ASSIST environment, which allows the development of self-configuring and self-optimising component-based applications following a structured and hierarchical approach. We discuss how such an approach to Grid programming influenced the design of the Grid Component Model (GCM).},
      address = {Paris, France},
      author = {Marco Aldinucci and Carlo Bertolli and Sonia Campa and Massimo Coppola and Marco Vanneschi and Luca Veraldi and Corrado Zoccolo},
      booktitle = {Proc. of. HPC-GECO/Compframe (held in conjunction with HPDC-15)},
      date-modified = {2014-08-25 15:06:03 +0000},
      month = jun,
      owner = {aldinuc},
      pages = {45-52},
      publisher = {IEEE},
      timestamp = {2006.06.28},
      title = {Self-configuring and self-optimizing grid components in the {GCM} model and their {ASSIST} implementation},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_self_HPC-GECO.pdf},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_self_HPC-GECO.pdf}
    }

  • M. Aldinucci, M. Danelutto, and M. Vanneschi, “Autonomic QoS in ASSIST Grid-aware components,” in Proc. of Intl. Euromicro PDP 2006: Parallel Distributed and network-based Processing, Montbéliard, France, 2006, pp. 221-230. doi:10.1109/PDP.2006.25
    [BibTeX] [Abstract] [Download PDF]

    Current Grid-aware applications are developed on existing software infrastructures, such as Globus, by developers who are experts on Grid software implementation. Although many useful applications have been produced this way, this approach can hardly support the additional complexity of Quality of Service (QoS) control in real applications. We describe the ASSIST programming environment, a prototype parallel programming environment currently under development by our group, as a suitable basis to capture all the desired features of QoS control for the Grid. Grid applications, built as compositions of ASSIST components, are supported by an innovative Grid Abstract Machine, which includes essential abstractions of standard middleware services, and a hierarchical Application Manager, which may be considered an early prototype of an Autonomic Manager.

    @inproceedings{assist:qos:euromicro:06,
      abstract = {Current Grid-aware applications are developed on existing software infrastructures, such as Globus, by developers who are experts on Grid software implementation. Although many useful applications have been produced this way, this approach can hardly support the additional complexity of Quality of Service (QoS) control in real applications. We describe the ASSIST programming environment, a prototype parallel programming environment currently under development by our group, as a suitable basis to capture all the desired features of QoS control for the Grid. Grid applications, built as compositions of ASSIST components, are supported by an innovative Grid Abstract Machine, which includes essential abstractions of standard middleware services, and a hierarchical Application Manager, which may be considered an early prototype of an Autonomic Manager.},
      address = {Montb{\'e}liard, France},
      author = {Marco Aldinucci and Marco Danelutto and Marco Vanneschi},
      booktitle = {Proc. of Intl. Euromicro PDP 2006: Parallel Distributed and network-based Processing},
      date-modified = {2012-11-18 16:14:35 +0000},
      doi = {10.1109/PDP.2006.25},
      month = feb,
      pages = {221-230},
      publisher = {IEEE},
      title = {Autonomic {QoS} in {ASSIST} Grid-aware components},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_QoS_PDP.pdf},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_QoS_PDP.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1109/PDP.2006.25}
    }

  • M. Aldinucci, M. Coppola, M. Danelutto, M. Vanneschi, and C. Zoccolo, “ASSIST as a research framework for high-performance Grid programming environments,” in Grid Computing: Software environments and Tools, J. C. Cunha and O. F. Rana, Eds., Springer, 2006, pp. 230-256. doi:10.1007/1-84628-339-6_10
    [BibTeX] [Abstract] [Download PDF]

    ASSIST is a programming environment supporting the development of parallel and distributed high-performance applications on a wide range of target architectures, including massively parallel clusters/networks of workstations and Grids. We discuss how ASSIST can act as a valid research vehicle to study, experiment with, and realize Grid-aware programming environments for high-performance applications. Special emphasis is put on the innovative methodologies, strategies and tools for dynamically adaptive applications that represent the necessary step for the success of Grid platforms. We start by considering the fundamental features of Grid-aware programming environments, based upon structured parallel programming and component technology. Then we show how ASSIST evolved from its very first version, targeting only workstation clusters, to the current version, targeting Grids and solving many critical problems related to expressive power, flexibility, interoperability and efficiency. We also discuss how ASSIST deals with interoperability issues. Finally, we discuss how an ASSIST-based model for supporting dynamically adaptive applications can be derived.

    @incollection{assist:cunhabook:05,
      abstract = {ASSIST is a programming environment supporting the development of parallel and distributed high-performance applications on a wide range of target architectures, including massively parallel clusters/networks of workstations and Grids. We discuss how ASSIST can act as a valid research vehicle to study, experiment with, and realize Grid-aware programming environments for high-performance applications. Special emphasis is put on the innovative methodologies, strategies and tools for dynamically adaptive applications that represent the necessary step for the success of Grid platforms.
    We start by considering the fundamental features of Grid-aware programming environments, based upon structured parallel programming and component technology. Then we show how ASSIST evolved from its very first version, targeting only workstation clusters, to the current version, targeting Grids and solving many critical problems related to expressive power, flexibility, interoperability and efficiency. We also discuss how ASSIST deals with interoperability issues. Finally, we discuss how an ASSIST-based model for supporting dynamically adaptive applications can be derived.},
      author = {Marco Aldinucci and Massimo Coppola and Marco Danelutto and Marco Vanneschi and Corrado Zoccolo},
      booktitle = {Grid Computing: Software environments and Tools},
      chapter = {10},
      date-modified = {2014-06-22 10:12:07 +0000},
      doi = {10.1007/1-84628-339-6_10},
      editor = {J. C. Cunha and O. F. Rana},
      isbn = {978-1-85233-998-2},
      month = jan,
      pages = {230-256},
      publisher = {Springer},
      title = {{ASSIST} as a research framework for high-performance Grid programming environments},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_CuhnaBook.pdf},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_CuhnaBook.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/1-84628-339-6_10}
    }

  • M. Aldinucci and M. Danelutto, “Algorithmic skeletons meeting grids,” Parallel Computing, vol. 32, iss. 7, pp. 449-462, 2006. doi:10.1016/j.parco.2006.04.001
    [BibTeX] [Abstract] [Download PDF]

    In this work, we discuss an extension of the set of principles that should guide the future design and development of skeletal programming systems, as defined by Cole in his "pragmatic manifesto" paper. The three further principles introduced are related to the ability to exploit existing sequential code, as well as to the ability to target typical modern architectures: those made of heterogeneous processing elements with dynamically varying availability, processing power and connectivity, such as grids or heterogeneous, non-dedicated clusters. We outline two skeleton-based programming environments currently developed at our university and we discuss how these environments adhere to the proposed set of principles. Finally, we outline how some other relevant, well-known skeleton environments conform to the same set of principles.

    @article{advske:pc:06,
      abstract = {In this work, we discuss an extension of the set of principles that should guide the future design and development of skeletal programming systems, as defined by Cole in his "pragmatic manifesto" paper. The three further principles introduced are related to the ability to exploit existing sequential code, as well as to the ability to target typical modern architectures: those made of heterogeneous processing elements with dynamically varying availability, processing power and connectivity, such as grids or heterogeneous, non-dedicated clusters. We outline two skeleton-based programming environments currently developed at our university and we discuss how these environments adhere to the proposed set of principles. Finally, we outline how some other relevant, well-known skeleton environments conform to the same set of principles.},
      author = {Marco Aldinucci and Marco Danelutto},
      date-modified = {2008-02-07 03:38:19 +0100},
      doi = {10.1016/j.parco.2006.04.001},
      journal = {Parallel Computing},
      number = {7},
      pages = {449-462},
      title = {Algorithmic skeletons meeting grids},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_advske_PC.pdf},
      volume = {32},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_advske_PC.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1016/j.parco.2006.04.001}
    }

  • M. Aldinucci, M. Coppola, M. Danelutto, N. Tonellotto, M. Vanneschi, and C. Zoccolo, “High level grid programming with ASSIST,” Computational Methods in Science and Technology, vol. 12, iss. 1, pp. 21-32, 2006.
    [BibTeX] [Abstract] [Download PDF]

    The development of efficient Grid applications usually requires writing huge portions of code directly at the level of abstraction provided by the underlying Grid middleware. In this work we discuss an alternative approach that raises the level of abstraction used when programming Grid applications. Our approach requires programmers just to describe, in a qualitative way, the kind of parallelism they want to express. Then, compiler tools, loader tools and the run-time system take complete care of running the application on a Grid target architecture. This moves most of the cumbersome tasks related to Grid targeting and management from the programmer's responsibility to the tools. This paper introduces the structured parallel programming environment ASSIST, whose design aims at raising the level of abstraction in Grid programming, and discusses how it can support transparent Grid programming while implementing Grid adaptivity.

    @article{assist:CMST:06,
      abstract = {The development of efficient Grid applications usually requires writing huge portions of code directly at the level of abstraction provided by the underlying Grid middleware. In this work we discuss an alternative approach that raises the level of abstraction used when programming Grid applications. Our approach requires programmers just to describe, in a qualitative way, the kind of parallelism they want to express. Then, compiler tools, loader tools and the run-time system take complete care of running the application on a Grid target architecture. This moves most of the cumbersome tasks related to Grid targeting and management from the programmer's responsibility to the tools. This paper introduces the structured parallel programming environment ASSIST, whose design aims at raising the level of abstraction in Grid programming, and discusses how it can support transparent Grid programming while implementing Grid adaptivity.},
      annote = {ISSN: 1505-0602},
      author = {Marco Aldinucci and Massimo Coppola and Marco Danelutto and Nicola Tonellotto and Marco Vanneschi and Corrado Zoccolo},
      date-modified = {2012-08-14 15:26:55 +0000},
      journal = {Computational Methods in Science and Technology},
      number = {1},
      owner = {aldinuc},
      pages = {21-32},
      title = {High level grid programming with {ASSIST}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_assist_j_cmst.pdf},
      volume = {12},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_assist_j_cmst.pdf}
    }

  • M. Aldinucci, M. Coppola, S. Campa, M. Danelutto, M. Vanneschi, and C. Zoccolo, “Structured implementation of component based grid programming environments,” in Future Generation Grids, V. Getov, D. Laforenza, and A. Reinefeld, Eds., Springer, 2006, pp. 217-239. doi:10.1007/978-0-387-29445-2_12
    [BibTeX] [Abstract] [Download PDF]

    The design, implementation and deployment of efficient high-performance applications on Grids is usually quite a hard task, even when modern and efficient grid middleware systems are used. We claim that most of the difficulties involved in this process can be moved away from programmer responsibility by following a structured programming model approach. The proposed approach relies on the development of a layered, component-based execution environment. Each layer deals with distinct features and problems related to the implementation of Grid applications, exploiting the most appropriate techniques. Static optimizations are introduced in the compile layer, dynamic optimizations are introduced in the run-time layer, whereas modern grid middleware features are exploited simply by using standard middleware systems as the final target architecture. We first discuss the general idea, then the peculiarities of the approach, and finally the preliminary results achieved in the GRID.it project, where a prototype high-performance, component-based Grid programming environment is being developed using this approach.

    @incollection{assist:dagstuhl:05,
      abstract = {The design, implementation and deployment of efficient high-performance applications on Grids is usually quite a hard task, even when modern and efficient grid middleware systems are used. We claim that most of the difficulties involved in this process can be moved away from programmer responsibility by following a structured programming model approach. The proposed approach relies on the development of a layered, component-based execution environment. Each layer deals with distinct features and problems related to the implementation of Grid applications, exploiting the most appropriate techniques. Static optimizations are introduced in the compile layer, dynamic optimizations are introduced in the run-time layer, whereas modern grid middleware features are exploited simply by using standard middleware systems as the final target architecture. We first discuss the general idea, then the peculiarities of the approach, and finally the preliminary results achieved in the GRID.it project, where a prototype high-performance, component-based Grid programming environment is being developed using this approach.},
      author = {Marco Aldinucci and Massimo Coppola and Sonia Campa and Marco Danelutto and Marco Vanneschi and Corrado Zoccolo},
      booktitle = {Future Generation Grids},
      date-modified = {2012-11-24 09:27:00 +0000},
      doi = {10.1007/978-0-387-29445-2_12},
      editor = {Vladimir Getov and Domenico Laforenza and Alexander Reinefeld},
      isbn = {978-0-387-27935-0},
      pages = {217-239},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Structured implementation of component based grid programming environments},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_Dagstuhl.pdf},
      year = {2006},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_Dagstuhl.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-0-387-29445-2_12}
    }

2005

  • J. Dünnweber, S. Gorlatch, S. Campa, M. Aldinucci, and M. Danelutto, “Using Code Parameters for Component Adaptations,” in Proc. of the Integrated Research in Grid Computing Workshop, Pisa, Italy, 2005, pp. 49-57.
    [BibTeX] [Abstract] [Download PDF]

    Adaptation means that the behavior of a software component is adjusted to application- or platform-specific requirements: new components required in a particular application do not need to be developed from scratch when available components can be adapted accordingly. Instead of introducing a new adaptation syntax (as is done, e.g., in AOP), we describe adaptations in the context of Java-based Higher-Order Components (HOCs). HOCs incorporate a code parameter plugin mechanism enabling adaptations on the grid. Our approach is illustrated using a case study of sequence alignment. We show how a HOC with the required provisions for data dependencies in this application can be generated by adapting a farm component, which is "embarrassingly parallel", i.e., free of data dependencies. This way, we could reuse the efficient farm implementation from the Lithium library, although our case study exhibits the wavefront pattern of parallelism, which is different from the farm.

    @inproceedings{codeadapt:IW:05,
      abstract = {Adaptation means that the behavior of a software component is adjusted to application- or platform-specific requirements: new components required in a particular application do not need to be developed from scratch when available components can be adapted accordingly. Instead of introducing a new adaptation syntax (as is done, e.g., in AOP), we describe adaptations in the context of Java-based Higher-Order Components (HOCs). HOCs incorporate a code parameter plugin mechanism enabling adaptations on the grid. Our approach is illustrated using a case study of sequence alignment. We show how a HOC with the required provisions for data dependencies in this application can be generated by adapting a farm component, which is "embarrassingly parallel", i.e., free of data dependencies. This way, we could reuse the efficient farm implementation from the Lithium library, although our case study exhibits the wavefront pattern of parallelism, which is different from the farm.},
      address = {Pisa, Italy},
      author = {Jan D{\"u}nnweber and Sergei Gorlatch and Sonia Campa and Marco Aldinucci and Marco Danelutto},
      booktitle = {Proc. of the Integrated Research in Grid Computing Workshop},
      date-modified = {2009-02-03 20:12:52 +0100},
      editor = {Sergei Gorlatch and Marco Danelutto},
      month = nov,
      owner = {aldinuc},
      pages = {49-57},
      publisher = {Universit{\`a} di Pisa, Dipartimento di Informatica},
      timestamp = {2006.06.28},
      title = {Using Code Parameters for Component Adaptations},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_muenster.pdf},
      volume = {TR-05-22},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_muenster.pdf}
    }

  • M. Pasin, P. Kuonen, M. Danelutto, and M. Aldinucci, “Skeleton Parallel Programming and Parallel Objects,” in Proc. of the Integrated Research in Grid Computing Workshop, Pisa, Italy, 2005, pp. 115-124.
    [BibTeX] [Abstract] [Download PDF]

    We describe here the ongoing work aimed at integrating the POP-C++ parallel object programming environment with the ASSIST component-based parallel programming environment. Both programming environments are briefly outlined first. Then several possibilities of integration are considered. For each of these integration opportunities, the advantages and synergies that can possibly be achieved are outlined and discussed. Finally, the current status of integration of the two environments is discussed, along with the expected results and fallouts on the two programming environments.

    @inproceedings{pasin:IW:05,
      abstract = {We describe here the ongoing work aimed at integrating the POP-C++ parallel object programming environment with the ASSIST component-based parallel programming environment. Both programming environments are briefly outlined first. Then several possibilities of integration are considered. For each of these integration opportunities, the advantages and synergies that can possibly be achieved are outlined and discussed. Finally, the current status of integration of the two environments is discussed, along with the expected results and fallouts on the two programming environments.},
      address = {Pisa, Italy},
      author = {Marcelo Pasin and Pierre Kuonen and Marco Danelutto and Marco Aldinucci},
      booktitle = {Proc. of the Integrated Research in Grid Computing Workshop},
      date-modified = {2009-02-03 20:28:52 +0100},
      editor = {Sergei Gorlatch and Marco Danelutto},
      month = nov,
      owner = {aldinuc},
      pages = {115-124},
      publisher = {Universit{\`a} di Pisa, Dipartimento di Informatica},
      timestamp = {2006.06.28},
      title = {Skeleton Parallel Programming and Parallel Objects},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_popc.pdf},
      volume = {TR-05-22},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_popc.pdf}
    }

  • M. Aldinucci and A. Benoit, “Towards the Automatic Mapping of ASSIST Applications for the Grid,” in Proc. of the Integrated Research in Grid Computing Workshop, Pisa, Italy, 2005, pp. 59-68.
    [BibTeX] [Abstract] [Download PDF]

    One of the most promising technical innovations in present-day computing is the invention of grid technologies which harness the computational power of widely distributed collections of computers. However, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target in this work applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.

    @inproceedings{assist:pepa:IW:05,
      abstract = {One of the most promising technical innovations in present-day computing is the invention of grid technologies which harness the computational power of widely distributed collections of computers. However, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target in this work applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively.},
      address = {Pisa, Italy},
      author = {Marco Aldinucci and Anne Benoit},
      booktitle = {Proc. of the Integrated Research in Grid Computing Workshop},
      editor = {Sergei Gorlatch and Marco Danelutto},
      month = nov,
      pages = {59-68},
      publisher = {Universit{\`a} di Pisa, Dipartimento di Informatica},
      title = {Towards the Automatic Mapping of {ASSIST} Applications for the Grid},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_pepa.pdf},
      volume = {TR-05-22},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_pepa.pdf}
    }

  • M. Aldinucci, F. André, J. Buisson, S. Campa, M. Coppola, M. Danelutto, and C. Zoccolo, “Parallel program/component adaptivity management,” in Proc. of the Integrated Research in Grid Computing Workshop, Pisa, Italy, 2005, pp. 95-104.
    [BibTeX] [Abstract] [Download PDF]

    Grid computing platforms need to handle the dynamic behaviour of computing resources within complex parallel applications. We introduce a formalization of adaptive behaviour that separates the abstract model of the application from the implementation design. We exemplify the abstract adaptation schema on two applications, and we show how two quite different approaches to adaptivity, the ASSIST environment and the AFPAC framework, easily map to this common schema.

    @inproceedings{adaptivity:IW:05,
      abstract = {Grid computing platforms need to handle the dynamic behaviour of computing resources within complex parallel applications. We introduce a formalization of adaptive behaviour that separates the abstract model of the application from the implementation design. We exemplify the abstract adaptation schema on two applications, and we show how two quite different approaches to adaptivity, the ASSIST environment and the AFPAC framework, easily map to this common schema.},
      address = {Pisa, Italy},
      author = {Marco Aldinucci and Francoise Andr{\'e} and J{\'e}r{\'e}my Buisson and Sonia Campa and Massimo Coppola and Marco Danelutto and Corrado Zoccolo},
      booktitle = {Proc. of the Integrated Research in Grid Computing Workshop},
      date-modified = {2012-11-18 17:04:16 +0000},
      editor = {Sergei Gorlatch and Marco Danelutto},
      keywords = {Duplicate},
      month = nov,
      pages = {95-104},
      publisher = {Universit{\`a} di Pisa, Dipartimento di Informatica},
      title = {Parallel program/component adaptivity management},
      url = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_adapt.pdf},
      volume = {TR-05-22},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2006_IW_adapt.pdf}
    }

  • M. Aldinucci, M. Danelutto, J. Dünnweber, and S. Gorlatch, “Optimization techniques for skeletons on grids,” in Grid Computing and New Frontiers of High Performance Processing, L. Grandinetti, Ed., Elsevier, 2005, vol. 14, pp. 255-273. doi:10.1016/S0927-5452(05)80014-0
    [BibTeX] [Abstract] [Download PDF]

    Skeletons are common patterns of parallelism, such as farm and pipeline, that can be abstracted and offered to the application programmer as programming primitives. We describe the use and implementation of skeletons on emerging computational grids, with the skeleton system Lithium, based on Java and RMI, as our reference programming system. Our main contribution is the exploration of optimization techniques for implementing skeletons on grids based on an optimized, future-based RMI mechanism, which we integrate into the macro-dataflow evaluation mechanism of Lithium. We discuss three optimizations: 1) a lookahead mechanism that allows multiple tasks to be processed concurrently at each grid server, thereby increasing the overall degree of parallelism, 2) a lazy task-binding technique that reduces interactions between grid servers and the task dispatcher, and 3) dynamic improvements that optimize the collection of results and the workload balancing. We report experimental results that demonstrate the improvements due to our optimizations on various testbeds, including a heterogeneous grid-like environment.

    @incollection{vigoni:fut_rmi:book:05,
      abstract = {Skeletons are common patterns of parallelism, such as farm and pipeline, that can be abstracted and offered to the application programmer as programming primitives. We describe the use and implementation of skeletons on emerging computational grids, with the skeleton system Lithium, based on Java and RMI, as our reference programming system. Our main contribution is the exploration of optimization techniques for implementing skeletons on grids based on an optimized, future-based RMI mechanism, which we integrate into the macro-dataflow evaluation mechanism of Lithium. We discuss three optimizations: 1) a lookahead mechanism that allows multiple tasks to be processed concurrently at each grid server, thereby increasing the overall degree of parallelism, 2) a lazy task-binding technique that reduces interactions between grid servers and the task dispatcher, and 3) dynamic improvements that optimize the collection of results and the workload balancing. We report experimental results that demonstrate the improvements due to our optimizations on various testbeds, including a heterogeneous grid-like environment.},
      author = {Marco Aldinucci and Marco Danelutto and Jan D{\"u}nnweber and Sergei Gorlatch},
      booktitle = {Grid Computing and New Frontiers of High Performance Processing},
      chapter = {2},
      date-modified = {2012-09-23 11:03:01 +0000},
      doi = {10.1016/S0927-5452(05)80014-0},
      editor = {L. Grandinetti},
      isbn = {0-444-51999-8},
      issn = {09275452},
      month = oct,
      pages = {255-273},
      publisher = {Elsevier},
      series = {Advances in Parallel Computing},
      title = {Optimization techniques for skeletons on grids},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_LithiumFutRMI_book.pdf},
      volume = {14},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_LithiumFutRMI_book.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1016/S0927-5452(05)80014-0}
    }
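
    The first of the three optimizations above, lookahead, amounts to keeping several tasks in flight per server rather than one, with futures decoupling task dispatch from result collection. The sketch below mimics that scheduling idea with plain java.util.concurrent primitives: a pool of width k stands in for one grid server accepting k concurrent tasks, and a completion service plays the role of the future-based result collection. It is an illustration of the technique, not Lithium's implementation.

    import java.util.concurrent.*;

    // Illustrative lookahead scheduling: up to k tasks are dispatched to the
    // "server" before any result is collected; futures arrive in completion order.
    public class Lookahead {
        static int compute(int x) { return x * x; }    // stand-in for a remote call

        public static void main(String[] args) throws Exception {
            int k = 4;                                 // lookahead degree per server
            ExecutorService server = Executors.newFixedThreadPool(k);
            CompletionService<Integer> results = new ExecutorCompletionService<>(server);

            int tasks = 100;
            for (int t = 0; t < tasks; t++) {          // dispatch without waiting:
                final int task = t;                    // the future "remembers" the
                results.submit(() -> compute(task));   // binding of task to result
            }

            long sum = 0;
            for (int i = 0; i < tasks; i++)
                sum += results.take().get();           // collect as results complete
            System.out.println("sum = " + sum);
            server.shutdown();
        }
    }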

  • M. Aldinucci, M. Vanneschi, and M. Villa, “Grid technologies and c-business for SMEs,” in Innovation and the Knowledge Economy: Issues, Applications, Case Studies (Proc. of Intl. conference eChallenges 2005), Amsterdam, The Netherlands, 2005.
    [BibTeX] [Abstract] [Download PDF]

    We describe the objectives of the SFIDA project, which aims at developing a Grid-based, interoperable platform able to support next-generation applications specifically addressing the needs of SMEs. We sketch the architecture of the platform under development in SFIDA, which will support componentization (e-services), intelligence (mining), collaboration (c-business), and customer business-process orientation concepts on top of ASSIST, a Grid-aware high-level programming environment. The SFIDA project outcomes will be validated on Supply Chain Management applications matching various typical industrial cases, spanning automotive, textile, food, white goods, and media retail. Finally, we show what business benefits the platform is expected to bring.

    @inproceedings{sfida:echallenges:05,
      abstract = {We describe the objectives of the SFIDA project, which aims at developing a Grid-based, interoperable platform able to support next-generation applications specifically addressing the needs of SMEs. We sketch the architecture of the platform under development in SFIDA, which will support componentization (e-services), intelligence (mining), collaboration (c-business), and customer business-process orientation concepts on top of ASSIST, a Grid-aware high-level programming environment. The SFIDA project outcomes will be validated on Supply Chain Management applications matching various typical industrial cases, spanning automotive, textile, food, white goods, and media retail. Finally, we show what business benefits the platform is expected to bring.},
      address = {Amsterdam, The Netherlands},
      author = {Marco Aldinucci and Marco Vanneschi and Matteo Villa},
      booktitle = {Innovation and the Knowledge Economy: Issues, Applications, Case Studies (Proc. of Intl. conference eChallenges 2005)},
      date-modified = {2009-02-03 17:50:53 +0100},
      editor = {P. Cunningham and M. Cunningham},
      month = oct,
      optnote = {ISBN: 1-58603-563-0},
      publisher = {IOS press},
      series = {Information and Communication Technologies and the Knowledge Economy},
      title = {Grid technologies and c-business for {SME}s},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_SFIDA_echallenges.pdf},
      volume = {2},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_SFIDA_echallenges.pdf}
    }

  • M. Aldinucci, A. Petrocelli, E. Pistoletti, M. Torquati, M. Vanneschi, L. Veraldi, and C. Zoccolo, “Dynamic reconfiguration of grid-aware applications in ASSIST,” in Proc. of 11th Intl. Euro-Par 2005 Parallel Processing, 2005, pp. 771-781. doi:10.1007/11549468_84
    [BibTeX] [Abstract] [Download PDF]

    Current grid-aware applications are implemented on top of low-level libraries by developers who are experts on grid middleware architecture. This approach can hardly support the additional complexity of QoS control in real applications. We discuss a novel approach used in the ASSIST programming environment to implement and guarantee user-provided QoS contracts in a transparent and effective way. Our approach is based on automatic run-time reconfiguration of ASSIST application executions, triggered by a mismatch between the user-provided QoS contract and the actual performance values achieved.

    @inproceedings{dyn:europar:05,
      abstract = {Current grid-aware applications are implemented on top of low-level libraries by developers who are experts on grid middleware architecture. This approach can hardly support the additional complexity of QoS control in real applications. We discuss a novel approach used in the ASSIST programming environment to implement and guarantee user-provided QoS contracts in a transparent and effective way. Our approach is based on automatic run-time reconfiguration of ASSIST application executions, triggered by a mismatch between the user-provided QoS contract and the actual performance values achieved.},
      author = {Marco Aldinucci and Alessandro Petrocelli and Edoardo Pistoletti and Massimo Torquati and Marco Vanneschi and Luca Veraldi and Corrado Zoccolo},
      booktitle = {Proc. of 11th Intl. Euro-Par 2005 Parallel Processing},
      date-added = {2007-05-20 21:04:01 +0200},
      date-modified = {2009-01-23 00:16:41 +0100},
      doi = {10.1007/11549468_84},
      editor = {J. C. Cunha and P. D. Medeiros},
      month = aug,
      pages = {771-781},
      publisher = {Springer},
      series = {LNCS},
      title = {Dynamic reconfiguration of grid-aware applications in {ASSIST}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_dyn_europar.pdf},
      volume = {3648},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_dyn_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/11549468_84}
    }
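
    A minimal sketch of the contract-monitoring loop described above, in Python rather than ASSIST (illustrative only: QoSContract, the idealized service-time model, and the grow-by-one reconfiguration policy are assumptions made up for this example, not the paper's API):

    import random

    class QoSContract:
        """User-provided performance contract: a target service time (s/task)."""
        def __init__(self, max_service_time):
            self.max_service_time = max_service_time

    def reconfigure(workers):
        # Stand-in for run-time reconfiguration: grow the parallelism degree.
        return workers + 1

    def monitor(contract, steps=10):
        workers = 2
        for step in range(steps):
            # Idealized measurement: service time shrinks as workers are added.
            measured = 1.0 / workers + random.uniform(0.0, 0.05)
            if measured > contract.max_service_time:
                workers = reconfigure(workers)
                print(f"step {step}: {measured:.3f}s violates contract; now {workers} workers")
            else:
                print(f"step {step}: {measured:.3f}s meets contract")

    monitor(QoSContract(max_service_time=0.25))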

  • M. Aldinucci and A. Benoit, “Automatic mapping of ASSIST applications using process algebra,” in Proc. of HLPP2005: Intl. Workshop on High-Level Parallel Programming, 2005. doi:10.1142/S0129626408003302
    [BibTeX] [Abstract] [Download PDF]

    One of the most promising technical innovations in present-day computing is the invention of grid technologies which harness the computational power of widely distributed collections of computers. However, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively. Our methodology is presented through an example of a classical Divide&Conquer algorithm, together with results which demonstrate the efficiency of this approach.

    @inproceedings{pepa_assist:hlpp:05,
      abstract = {One of the most promising technical innovations in present-day computing is the invention of grid technologies which harness the computational power of widely distributed collections of computers. However, the programming and optimisation burden of a low level approach to grid computing is clearly unacceptable for large scale, complex applications. The development of grid applications can be simplified by using high-level programming environments. In the present work, we address the problem of the mapping of a high-level grid application onto the computational resources. In order to optimise the mapping of the application, we propose to automatically generate performance models from the application using the process algebra PEPA. We target applications written with the high-level environment ASSIST, since the use of such a structured environment allows us to automate the study of the application more effectively. Our methodology is presented through an example of a classical Divide\&Conquer algorithm, together with results which demonstrate the efficiency of this approach.},
      author = {Marco Aldinucci and Anne Benoit},
      booktitle = {Proc. of HLPP2005: Intl. Workshop on High-Level Parallel Programming},
      date-modified = {2007-09-16 18:42:58 +0200},
      doi = {10.1142/S0129626408003302},
      month = jul,
      organization = {Warwick University, Coventry, UK},
      title = {Automatic mapping of {ASSIST} applications using process algebra},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_pepa_hlpp.pdf},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_pepa_hlpp.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1142/S0129626408003302}
    }
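
    The flavour of this model-driven mapping can be conveyed with a toy cost model (this is not PEPA: the per-stage service rates for the two candidate mappings below are invented, and the slowest-stage bound stands in for the throughput figures a PEPA model would derive automatically):

    # Per-stage service rates (tasks/s) a 3-stage pipeline would sustain
    # under two hypothetical mappings onto grid resources.
    candidate_mappings = {
        "all-on-fast-node": [10.0, 4.0, 9.0],
        "spread-over-grid": [8.0, 7.0, 8.0],
    }

    def pipeline_throughput(rates):
        # Steady-state pipeline throughput is bounded by the slowest stage.
        return min(rates)

    for name, rates in candidate_mappings.items():
        print(f"{name}: {pipeline_throughput(rates):.1f} tasks/s")
    best = max(candidate_mappings, key=lambda m: pipeline_throughput(candidate_mappings[m]))
    print("chosen mapping:", best)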

  • M. Aldinucci, S. Gusmeroli, M. Vanneschi, and M. Villa, “SFIDA: interoperability in innovative c-business models for SMEs through an enabling Grid platform,” in Pre-proc. of INTEROP-ESA: Intl. Conference on Interoperability of Enterprise Software and Applications, Geneva, Switzerland, 2005, pp. 547-557.
    [BibTeX] [Abstract] [Download PDF]

    This position paper describes the objectives of project "SFIDA" (co-funded by the Italian Government), aiming at developing a Grid-based interoperability platform able to support next-generation Supply Chain Management applications specifically addressing the needs of SMEs belonging to industrial districts and dynamic supply networks. Next-generation SCM applications are intended in SFIDA to be based on componentization (e-services), intelligence (mining), collaboration (c-business), and customer business-process orientation. The platform and the next-generation SCM applications running on top of it will be tested in various typical industrial cases, spanning automotive, textile, food, white goods, and media retail.

    @inproceedings{sfida:interop:05,
      abstract = {This position paper describes the objectives of project "SFIDA" (co-funded by the Italian Government), aiming at developing a GRID-based inter-operability platform able to support next generation Supply Chain Management applications specifically addressing the needs of SMEs belonging to industrial districts and dynamic supply networks. Next generation SCM applications are intended in SFIDA to be based on componentization (e-services), intelligence (mining), collaboration (c-business) and customer business-processes orientation. The platform and the next generation SCM applications running on top of it will be tested in various typical industrial cases, spanning from automotive, textile, food, white goods and media retail.},
      address = {Geneva, Switzerland},
      author = {Marco Aldinucci and Sergio Gusmeroli and Marco Vanneschi and Matteo Villa},
      booktitle = {Pre-proc. of INTEROP-ESA: Intl. Conference on Interoperability of Enterprise Software and Applications},
      date-modified = {2007-09-16 18:43:26 +0200},
      month = feb,
      pages = {547-557},
      title = {{SFIDA}: interoperability in innovative c-business models for {SMEs} through an enabling Grid platform},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_SFIDA_InteropESA.pdf},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_SFIDA_InteropESA.pdf}
    }

  • M. Aldinucci, S. Campa, M. Coppola, M. Danelutto, D. Laforenza, D. Puppin, L. Scarponi, M. Vanneschi, and C. Zoccolo, “Components for high performance Grid programming in Grid.it,” in Proc. of the Intl. Workshop on Component Models and Systems for Grid Applications, Saint-Malo, France, 2005, pp. 19-38. doi:10.1007/0-387-23352-0_2
    [BibTeX] [Abstract] [Download PDF]

    This paper presents the main ideas of the high-performance component-based Grid programming environment of the Grid.it project. High-performance components are characterized by a programming model that integrates the concepts of structured parallelism, component interaction, compositionality, and adaptivity. We show that ASSIST, the prototype parallel programming environment currently under development by our group, is a suitable basis to capture all the desired features of the component model in a flexible and efficient manner. For the sake of interoperability, ASSIST modules or programs are automatically encapsulated in standard frameworks; currently, we are experimenting with Web Services and the CORBA Component Model. Grid applications, built as compositions of ASSIST components and possibly other existing (legacy) components, are supported by an innovative Grid Abstract Machine, which includes essential abstractions of standard middleware services and a hierarchical Application Manager (AM). The AM supports static allocation and dynamic reallocation of adaptive applications according to a performance contract, a reconfiguration strategy, and a performance model.

    @inproceedings{assist:stmalo:05,
      abstract = {This paper presents the main ideas of the high-performance component-based Grid programming environment of the Grid.it project. High-performance components are characterized by a programming model that integrates the concepts of structured parallelism, component interaction, compositionality, and adaptivity. We show that ASSIST, the prototype of parallel programming environment currently under development at our group, is a suitable basis to capture all the desired features of the component model in a flexible and efficient manner. For the sake of interoperability, ASSIST modules or programs are automatically encapsulated in standard frameworks; currently, we are experimenting Web Services and the CORBA Component Model. Grid applications, built as compositions of ASSIST components and possibly other existing (legacy) components, are supported by an innovative Grid Abstract Machine, that includes essential abstractions of standard middleware services and a hierarchical Application Manager (AM). AM supports static allocation and dynamic reallocation of adaptive applications according to a performance contract, a reconfiguration strategy, and a performance model.},
      address = {Saint-Malo, France},
      author = {Marco Aldinucci and Sonia Campa and Massimo Coppola and Marco Danelutto and Domenico Laforenza and Diego Puppin and Luca Scarponi and Marco Vanneschi and Corrado Zoccolo},
      booktitle = {Proc. of the Intl. Workshop on Component Models and Systems for Grid Applications},
      date-modified = {2009-02-03 18:34:58 +0100},
      doi = {10.1007/0-387-23352-0_2},
      editor = {V. Getov and T. Kielmann},
      isbn = {978-0-387-23351-2},
      month = jan,
      pages = {19-38},
      publisher = {Springer},
      series = {CoreGRID},
      title = {Components for high performance Grid programming in Grid.it},
      url = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_ics_stmalo.pdf},
      year = {2005},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2005_assist_ics_stmalo.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/0-387-23352-0_2}
    }

2004

  • M. Aldinucci and M. Torquati, “Accelerating Apache farms through ad-HOC distributed scalable object repository,” in Proc. of 10th Intl. Euro-Par 2004 Parallel Processing, 2004, pp. 596-605. doi:10.1007/978-3-540-27866-5_78
    [BibTeX] [Abstract] [Download PDF]

    We present HOC: a fast, scalable object repository providing programmers with a general storage module. HOC may be used to implement DSMs as well as distributed cache subsystems. HOC is composed of a set of hot-pluggable cooperating processes that may sustain a close-to-optimal network traffic rate. We designed an HOC-based Web cache that extends the Apache Web server and remarkably improves the performance of Apache farms with no modification to the Apache core code.

    @inproceedings{assist:adhoc:europar:04,
      abstract = {We present HOC: a fast, scalable object repository providing programmers with a general storage module. HOC may be used to implement DSMs as well as distributed cache subsystems. HOC is composed of a set of hot-pluggable cooperating processes that may sustain a close to optimal network traffic rate. We designed an HOC-based Web cache that extends the Apache Web server and remarkably improves the performance of Apache farms with no modification to the Apache core code.},
      author = {Marco Aldinucci and Massimo Torquati},
      booktitle = {Proc. of 10th Intl. Euro-Par 2004 Parallel Processing},
      date-modified = {2012-07-13 19:06:26 +0200},
      doi = {10.1007/978-3-540-27866-5_78},
      editor = {Marco Danelutto and Marco Vanneschi and Domenico Laforenza},
      month = aug,
      pages = {596-605},
      publisher = {Springer},
      series = {LNCS},
      title = {Accelerating {Apache} farms through {ad-HOC} distributed scalable object repository},
      url = {http://calvados.di.unipi.it/storage/paper_files/2004_hoc_europar.pdf},
      volume = {3149},
      year = {2004},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2004_hoc_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-540-27866-5_78}
    }
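
    A minimal sketch of the get/put storage-module interface a repository such as HOC exposes. The real HOC is a set of cooperating processes talking over the network; the in-process, thread-safe stand-in below (class and method names are ours, not HOC's) only illustrates the idea:

    import threading

    class ObjectRepository:
        """A thread-safe key/object store, standing in for a distributed one."""
        def __init__(self):
            self._store = {}
            self._lock = threading.Lock()

        def put(self, key, obj):
            with self._lock:
                self._store[key] = obj

        def get(self, key, default=None):
            with self._lock:
                return self._store.get(key, default)

    # Used as a Web-cache back end: rendered pages keyed by URL.
    repo = ObjectRepository()
    repo.put("/index.html", b"<html>cached page</html>")
    print(repo.get("/index.html"))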

  • M. Aldinucci, S. Campa, M. Coppola, S. Magini, P. Pesciullesi, L. Potiti, R. Ravazzolo, M. Torquati, and C. Zoccolo, “Targeting heterogeneous architectures in ASSIST: Experimental results,” in Proc. of 10th Intl. Euro-Par 2004 Parallel Processing, 2004, pp. 638-643. doi:10.1142/S0129626412400063
    [BibTeX] [Abstract] [Download PDF]

    We describe how the ASSIST parallel programming environment can be used to run parallel programs on collections of heterogeneous workstations, and we evaluate the scalability of one real task-farm application and a data-parallel benchmark, comparing the performance figures measured on homogeneous and heterogeneous workstation clusters. We also describe the ASSIST approach to heterogeneous distributed shared memory and provide preliminary performance figures for the current implementation.

    @inproceedings{assist:hetero:europar:04,
      abstract = {We describe how the ASSIST parallel programming environment can be used to run parallel programs on collections of heterogeneous workstations and evaluate the scalability of one task-farm real application and a data-parallel benchmark, comparing the actual performance figures measured when using homogeneous and heterogeneous workstation clusters. We describe also the ASSIST approach to heterogeneous distributed shared memory and provide preliminary performance figures of the current implementation.},
      author = {Marco Aldinucci and Sonia Campa and Massimo Coppola and Silvia Magini and Paolo Pesciullesi and Laura Potiti and Roberto Ravazzolo and Massimo Torquati and Corrado Zoccolo},
      booktitle = {Proc. of 10th Intl. Euro-Par 2004 Parallel Processing},
      date-modified = {2009-02-04 17:56:42 +0100},
      doi = {10.1142/S0129626412400063},
      editor = {Marco Danelutto and Marco Vanneschi and Domenico Laforenza},
      isbn = {978-3-540-22924-7},
      month = aug,
      pages = {638-643},
      publisher = {Springer},
      series = {LNCS},
      title = {Targeting heterogeneous architectures in {ASSIST}: Experimental results},
      url = {http://calvados.di.unipi.it/storage/paper_files/2004_hetero_europar.pdf},
      volume = {3149},
      year = {2004},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2004_hetero_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1142/S0129626412400063}
    }

  • M. Aldinucci, M. Danelutto, and J. Dünnweber, “Optimization Techniques for Implementing Parallel Skeletons in Grid Environments,” in Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, Stirling, Scotland, UK, 2004, pp. 35-47.
    [BibTeX] [Abstract] [Download PDF]

    Skeletons are common patterns of parallelism, e.g., farm and pipeline, that can be abstracted and offered to the application programmer as programming primitives. We describe the use and implementation of skeletons in a distributed grid environment, with the Java-based system Lithium as our reference implementation. Our main contribution is a set of optimization techniques based on an asynchronous, optimized RMI interaction mechanism, which we integrated into the macro data flow (MDF) implementation technology of Lithium. We report initial experimental results that demonstrate the improvements achieved through the proposed optimizations on a simple grid testbed.

    @inproceedings{lith_rmi:cmpp:04,
      abstract = {Skeletons are common patterns of parallelism like, e.g., farm, pipeline that can be abstracted and offered to the application programmer as programming primitives. We describe the use and implementation of skeletons in a distributed grid environment, with the Java-based system Lithium as our reference implementation. Our main contribution is a set of optimization techniques based on an asynchronous, optimized RMI interaction mechanism, which we integrated into the macro data flow (MDF) implementation technology of Lithium. We report initial experimental results that demonstrate the achieved improvements through the proposed optimizations on a simple grid testbed.},
      address = {Stirling, Scotland, UK},
      author = {Marco Aldinucci and Marco Danelutto and Jan D{\"u}nnweber},
      booktitle = {Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming},
      date-modified = {2007-09-16 18:42:29 +0200},
      editor = {S. Gorlatch},
      month = jul,
      pages = {35-47},
      publisher = {Universit{\"a}t M{\"u}nster, Germany},
      title = {Optimization Techniques for Implementing Parallel Skeletons in Grid Environments},
      url = {http://calvados.di.unipi.it/storage/paper_files/2004_RMI_cmpp.pdf},
      year = {2004},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2004_RMI_cmpp.pdf}
    }

  • M. Aldinucci, S. Campa, P. Ciullo, M. Coppola, M. Danelutto, P. Pesciullesi, R. Ravazzolo, M. Torquati, M. Vanneschi, and C. Zoccolo, “A framework for experimenting with structured parallel programming environment design,” in Parallel Computing: Software Technology, Algorithms, Architectures and Applications (Proc. of PARCO 2003, Dresden, Germany), 2004, pp. 617-624. doi:10.1016/S0927-5452(04)80077-7
    [BibTeX] [Abstract] [Download PDF]

    ASSIST is a parallel programming environment aimed at providing programmers of complex parallel applications with a suitable and effective programming tool. Being based on algorithmic skeleton and coordination language technologies, the programming environment relieves the programmer from a number of cumbersome, error-prone activities that are required when using traditional parallel programming environments. ASSIST has been specifically designed to be easily customizable in order to experiment with different implementation techniques, solutions, algorithms, or back-ends whenever new features are required or new technologies become available. In this work we discuss how this goal has been achieved and how the current ASSIST programming environment has already been used to experiment with solutions not implemented in its first version.

    @inproceedings{assist:parco:03,
      abstract = {ASSIST is a parallel programming environment aimed at providing programmers of complex parallel applications with a suitable and effective programming tool. Being based on algorithmic skeletons and coordination languages technologies, the programming environment relieves the programmer from a number of cumbersome, error prone activities that are required when using traditional parallel programming environments. ASSIST has been specifically designed to be easily customizable in order to experiment different implementation techniques, solutions, algorithms or back-ends any time new features are required or new technologies become available. In this work we discuss how this goal has been achieved and how the current ASSIST programming environment has been already used to experiment solutions not implemented in the first version of the tool.},
      author = {Marco Aldinucci and Sonia Campa and Pierpaolo Ciullo and Massimo Coppola and Marco Danelutto and Paolo Pesciullesi and Roberto Ravazzolo and Massimo Torquati and Marco Vanneschi and Corrado Zoccolo},
      booktitle = {Parallel Computing: Software Technology, Algorithms, Architectures and Applications (Proc. of {PARCO 2003}, Dresden, Germany)},
      date-modified = {2012-11-26 18:49:59 +0000},
      doi = {10.1016/S0927-5452(04)80077-7},
      editor = {G. R. Joubert and W. E. Nagel and F. J. Peters and W. V. Walter},
      issn = {09275452},
      pages = {617-624},
      publisher = {Elsevier},
      series = {Advances in Parallel Computing},
      title = {A framework for experimenting with structured parallel programming environment design},
      url = {http://calvados.di.unipi.it/storage/paper_files/2004_assist_parco03.pdf},
      volume = {13},
      year = {2004},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2004_assist_parco03.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1016/S0927-5452(04)80077-7}
    }

  • M. Aldinucci and M. Danelutto, “An operational semantics for skeletons,” in Parallel Computing: Software Technology, Algorithms, Architectures and Applications (Proc. of PARCO 2003, Dresden, Germany), Germany, 2004, pp. 63-70. doi:10.1016/S0927-5452(04)80011-X
    [BibTeX] [Abstract] [Download PDF]

    A major weakness of the current programming systems based on skeletons is that parallel semantics is usually provided in an informal way, thus preventing any formal comparison about program behavior. We describe a schema suitable for the description of both functional and parallel semantics of skeletal languages which is aimed at filling this gap. The proposed schema of semantics represents a handy framework to prove the correctness and validate different rewriting rules. These can be used to transform a skeleton program into a functionally equivalent but possibly faster version.

    @inproceedings{lith:sem:parco:03,
      abstract = {A major weakness of the current programming systems based on skeletons is that parallel semantics is usually provided in an informal way, thus preventing any formal comparison about program behavior. We describe a schema suitable for the description of both functional and parallel semantics of skeletal languages which is aimed at filling this gap. The proposed schema of semantics represents a handy framework to prove the correctness and validate different rewriting rules. These can be used to transform a skeleton program into a functionally equivalent but possibly faster version.},
      address = {Germany},
      author = {Marco Aldinucci and Marco Danelutto},
      booktitle = {Parallel Computing: Software Technology, Algorithms, Architectures and Applications (Proc. of {PARCO 2003}, Dresden, Germany)},
      date-modified = {2012-07-15 14:39:27 +0000},
      doi = {10.1016/S0927-5452(04)80011-X},
      editor = {G. R. Joubert and W. E. Nagel and F. J. Peters and W. V. Walter},
      pages = {63-70},
      publisher = {Elsevier},
      series = {Advances in Parallel Computing},
      title = {An operational semantics for skeletons},
      url = {http://calvados.di.unipi.it/storage/paper_files/2004_sem_parco03.pdf},
      volume = {13},
      year = {2004},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2004_sem_parco03.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1016/S0927-5452(04)80011-X}
    }
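
    To make the functional half of such a semantics concrete, here is a toy interpreter for three skeletons written as plain Python functions over finite streams (the encoding is ours, not the paper's notation); the assert checks one rewriting rule of the kind the semantics is meant to validate:

    def seq(f):
        # A sequential worker applied elementwise to the input stream.
        return lambda stream: [f(x) for x in stream]

    def pipe(s1, s2):
        # Stage s2 consumes the stream produced by stage s1.
        return lambda stream: s2(s1(stream))

    def farm(worker):
        # Functionally, a farm is the identity on its worker: it changes how,
        # not what, the stream is computed.
        return worker

    inc, dbl = seq(lambda x: x + 1), seq(lambda x: x * 2)
    stream = [1, 2, 3]
    # Candidate rule: farm(pipe(f, g)) == pipe(farm(f), farm(g)).
    assert farm(pipe(inc, dbl))(stream) == pipe(farm(inc), farm(dbl))(stream)
    print(farm(pipe(inc, dbl))(stream))  # [4, 6, 8]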

2003

  • M. Aldinucci, “Dynamic shared data in structured parallel programming frameworks,” PhD Thesis, 2003.
    [BibTeX]
    @phdthesis{phd:marco:2003,
      author = {Marco Aldinucci},
      date-added = {2008-09-14 11:56:48 +0200},
      date-modified = {2008-09-14 11:58:16 +0200},
      month = dec,
      school = {Computer Science Dept., University of Pisa},
      title = {Dynamic shared data in structured parallel programming frameworks},
      year = {2003}
    }

  • M. Aldinucci, “eskimo: experimenting with Skeletons in the Shared Address Model,” Parallel Processing Letters, vol. 13, iss. 3, pp. 449-460, 2003. doi:10.1142/S0129626403001410
    [BibTeX] [Abstract] [Download PDF]

    We discuss the lack of expressivity in some skeleton-based parallel programming frameworks. The problem is further exacerbated when approaching irregular problems and dealing with dynamic data structures. Shared memory programming has been argued to have substantial ease-of-programming advantages for this class of problems. We present the eskimo library, which represents an attempt to merge the two programming models by introducing skeletons in a shared memory framework.

    @article{eskimo:PPL:03,
      abstract = {We discuss the lack of expressivity in some skeleton-based parallel programming frameworks. The problem is further exacerbated when approaching irregular problems and dealing with dynamic data structures. Shared memory programming has been argued to have substantial ease of programming advantages for this class of problems. We present the eskimo library which represents an attempt to merge the two programming models by introducing skeletons in a shared memory framework.},
      annote = {ISSN: 0129-6264},
      author = {Marco Aldinucci},
      date-modified = {2014-08-24 22:20:32 +0000},
      doi = {10.1142/S0129626403001410},
      issn = {0129-6264},
      journal = {Parallel Processing Letters},
      month = sep,
      number = {3},
      pages = {449-460},
      title = {eskimo: experimenting with Skeletons in the Shared Address Model},
      url = {http://calvados.di.unipi.it/storage/paper_files/2003_eskimo_ppl.pdf},
      volume = {13},
      year = {2003},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2003_eskimo_ppl.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1142/S0129626403001410}
    }

  • M. Aldinucci, S. Campa, P. Ciullo, M. Coppola, M. Danelutto, P. Pesciullesi, R. Ravazzolo, M. Torquati, M. Vanneschi, and C. Zoccolo, “ASSIST demo: a high level, high performance, portable, structured parallel programming environment at work,” in Proc. of 9th Intl. Euro-Par 2003 Parallel Processing, Klagenfurt, Austria, 2003, pp. 1295-1300. doi:10.1007/978-3-540-45209-6_176
    [BibTeX] [Abstract] [Download PDF]

    This work summarizes the possibilities offered by the ASSIST parallel programming environment by outlining some of the features that will be demonstrated at the conference demo session. We show how this environment can be deployed on a Linux workstation network/cluster, how applications can be compiled and run using ASSIST, and finally we discuss some ASSIST scalability and performance features. We also outline how the ASSIST environment can be used to target Grid architectures.

    @inproceedings{assist:demo:europar:03,
      abstract = {This work summarizes the possibilities offered by parallel programming environment ASSIST by outlining some of the features that will be demonstrated at the conference demo session. We'll substantially show how this environment can be deployed on a Linux workstation network/cluster, how applications can be compiled and run using ASSIST and eventually, we'll discuss some ASSIST scalability and performance features. We'll also outline how the ASSIST environment can be used to target GRID architectures.},
      address = {Klagenfurt, Austria},
      author = {Marco Aldinucci and Sonia Campa and Pierpaolo Ciullo and Massimo Coppola and Marco Danelutto and Paolo Pesciullesi and Roberto Ravazzolo and Massimo Torquati and Marco Vanneschi and Corrado Zoccolo},
      booktitle = {Proc. of 9th Intl. Euro-Par 2003 Parallel Processing},
      date-modified = {2012-11-10 02:24:20 +0000},
      doi = {10.1007/978-3-540-45209-6_176},
      editor = {H. Kosch and L. B{\"o}sz{\"o}rm{\'e}nyi and H. Hellwagner},
      month = aug,
      pages = {1295-1300},
      publisher = {Springer},
      series = {LNCS},
      title = {{ASSIST} demo: a high level, high performance, portable, structured parallel programming environment at work},
      url = {http://calvados.di.unipi.it/storage/paper_files/2003_assist_demo_europar.pdf},
      volume = {2790},
      year = {2003},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2003_assist_demo_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/978-3-540-45209-6_176}
    }

  • M. Aldinucci, S. Campa, P. Ciullo, M. Coppola, S. Magini, P. Pesciullesi, L. Potiti, R. Ravazzolo, M. Torquati, M. Vanneschi, and C. Zoccolo, “The Implementation of ASSIST, an Environment for Parallel and Distributed Programming,” in Proc. of 9th Intl Euro-Par 2003 Parallel Processing, Klagenfurt, Austria, 2003, pp. 712-721. doi:10.1007/b12024
    [BibTeX] [Abstract] [Download PDF]

    We describe the implementation of ASSIST, a programming environment for parallel and distributed programs. Its coordination language is based on the parallel skeleton model, extended with new features to enhance expressiveness, parallel software reuse, software component integration, and interfacing to external resources. The compilation process and the structure of the run-time support of ASSIST are discussed with respect to the issues introduced by the new characteristics, presenting an analysis of the first test results.

    @inproceedings{assist:imp:europar:03,
      abstract = {We describe the implementation of ASSIST, a programming environment for parallel and distributed programs. Its coordination language is based on the parallel skeleton model, extended with new features to enhance expressiveness, parallel software reuse, software component integration and interfacing to external resources. The compilation process and the structure of the run-time support of ASSIST are discussed with respect to the issues introduced by the new characteristics, presenting an analysis of the first test results.},
      address = {Klagenfurt, Austria},
      author = {Marco Aldinucci and Sonia Campa and Pierpaolo Ciullo and Massimo Coppola and Silvia Magini and Paolo Pesciullesi and Laura Potiti and Roberto Ravazzolo and Massimo Torquati and Marco Vanneschi and Corrado Zoccolo},
      booktitle = {Proc. of 9th Intl Euro-Par 2003 Parallel Processing},
      date-modified = {2010-10-24 15:29:07 +0200},
      doi = {10.1007/b12024},
      editor = {H. Kosch and L. B{\"o}sz{\"o}rm{\'e}nyi and H. Hellwagner},
      isbn = {978-3-540-40788-1},
      month = aug,
      pages = {712-721},
      publisher = {Springer},
      series = {LNCS},
      title = {The Implementation of {ASSIST}, an Environment for Parallel and Distributed Programming},
      url = {http://calvados.di.unipi.it/storage/paper_files/2003_assist_imp_europar.pdf},
      volume = {2790},
      year = {2003},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2003_assist_imp_europar.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1007/b12024}
    }

  • M. Aldinucci, M. Danelutto, and P. Teti, “An advanced environment supporting structured parallel programming in Java,” Future Generation Computer Systems, vol. 19, iss. 5, pp. 611-626, 2003. doi:10.1016/S0167-739X(02)00172-3
    [BibTeX] [Abstract] [Download PDF]

    In this work we present Lithium, a pure Java structured parallel programming environment based on skeletons (common, reusable and efficient parallelism exploitation patterns). Lithium is implemented as a Java package and represents both the first skeleton-based programming environment in Java and the first complete skeleton-based Java environment exploiting macro data flow implementation techniques. Lithium supports a set of user-code optimizations based on skeleton rewriting techniques. These optimizations improve both absolute performance and resource usage with respect to the original user code. Parallel programs developed using the library run on any network of workstations, provided the workstations support a plain JRE. The paper describes the library implementation, outlines the optimization techniques used, and finally presents the performance results obtained on both synthetic and real applications.

    @article{lithium:fgcs:03,
      abstract = {In this work we present Lithium, a pure Java structured parallel programming environment based on skeletons (common, reusable and efficient parallelism exploitation patterns). Lithium is implemented as a Java package and represents both the first skeleton based programming environment in Java and the first complete skeleton based Java environment exploiting macro-data flow implementation techniques.
    Lithium supports a set of user code optimizations which are based on skeleton rewriting techniques. These optimizations improve both absolute performance and resource usage with respect to original user code. Parallel programs developed using the library run on any network of workstations provided the workstations support plain JRE. The paper describes the library implementation, outlines the optimization techniques used and eventually presents the performance results obtained on both synthetic and real applications.},
      author = {Marco Aldinucci and Marco Danelutto and Paolo Teti},
      date-modified = {2014-08-24 22:16:31 +0000},
      doi = {10.1016/S0167-739X(02)00172-3},
      journal = {Future Generation Computer Systems},
      month = jul,
      number = {5},
      pages = {611-626},
      title = {An advanced environment supporting structured parallel programming in {Java}},
      url = {http://calvados.di.unipi.it/storage/paper_files/2003_lithium_fgcs.pdf},
      volume = {19},
      year = {2003},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2003_lithium_fgcs.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1016/S0167-739X(02)00172-3}
    }
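
    The macro data flow idea can be sketched in a few lines (Lithium itself is Java; Python is used here only for brevity, and mdf_run is an invented name). Each application of a stage to a task is one coarse-grain instruction, fired as soon as its input is available, with no barrier between stages, so completion order is nondeterministic:

    from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

    def mdf_run(tasks, stages, workers=4):
        results = []
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Fire one instruction per task for the first stage.
            pending = {pool.submit(stages[0], t): 0 for t in tasks}
            while pending:
                done, _ = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    stage = pending.pop(fut)
                    value = fut.result()
                    if stage + 1 < len(stages):
                        # The result enables the next instruction for this task.
                        pending[pool.submit(stages[stage + 1], value)] = stage + 1
                    else:
                        results.append(value)
        return results

    print(sorted(mdf_run(range(5), [lambda x: x + 1, lambda x: x * x])))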

  • M. Aldinucci, “eskimo: experimenting skeletons on the shared address model,” in Proc. of HLPP2003: Intl. Workshop on High-Level Parallel Programming, Paris, France, 2003, pp. 89-100.
    [BibTeX] [Abstract] [Download PDF]

    We discuss the lack of expressivity in some skeleton-based parallel programming frameworks. The problem is further exacerbated when approaching irregular problems and dealing with dynamic data structures. Shared memory programming has been argued to have substantial ease of programming advantages for this class of problems. We present the eskimo library which represents an attempt to merge the two programming models by introducing skeletons in a shared memory framework.

    @inproceedings{eskimo:hlpp:03,
      abstract = {We discuss the lack of expressivity in some skeleton-based parallel programming
    frameworks. The problem is further exacerbated when approaching irregular problems and dealing with dynamic data structures. Shared memory programming has been argued to have substantial ease of programming advantages for this class of problems. We present the eskimo library which represents an attempt to merge the two programming models by introducing skeletons in a shared memory framework.},
      address = {Paris, France},
      author = {Marco Aldinucci},
      booktitle = {Proc. of HLPP2003: Intl. Workshop on High-Level Parallel Programming},
      date-modified = {2007-09-16 18:41:29 +0200},
      month = jun,
      pages = {89-100},
      title = {eskimo: experimenting skeletons on the shared address model},
      url = {http://calvados.di.unipi.it/storage/paper_files/2003_eskimo_ppl.pdf},
      year = {2003},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2003_eskimo_ppl.pdf}
    }

2002

  • M. Aldinucci, “Automatic Program Transformation: The Meta Tool for Skeleton-based Languages,” in Constructive Methods for Parallel Programming, S. Gorlatch and C. Lengauer, Eds., NY, USA: Nova Science Publishers, 2002, pp. 59-78.
    [BibTeX] [Abstract] [Download PDF]

    Academic and commercial experience with skeleton-based systems has demonstrated the benefits of the approach but also the lack of methods and tools for algorithm design and performance prediction. We propose a (graphical) transformation tool based on a novel internal representation of programs that enables the user to effectively deal with program transformation. Given a skeleton-based language and a set of semantic-preserving transformation rules, the tool locates applicable transformations and provides performance estimates, thereby helping the programmer in navigating through the program refinement space.

    @incollection{meta:CMPP:book:02,
      abstract = {Academic and commercial experience with skeleton-based systems has demonstrated the benefits of the approach but also the lack of methods and tools for algorithm design and performance prediction. We propose a (graphical) transformation tool based on a novel internal representation of programs that enables the user to effectively deal with program transformation. Given a skeleton-based language and a set of semantic-preserving transformation rules, the tool locates applicable transformations and provides performance estimates, thereby helping the programmer in navigating through the program refinement space.},
      address = {NY, USA},
      author = {Marco Aldinucci},
      booktitle = {Constructive Methods for Parallel Programming},
      chapter = {5},
      date-modified = {2009-01-30 14:55:28 +0100},
      editor = {Sergei Gorlatch and Christian Lengauer},
      isbn = {1-59033-374-8},
      pages = {59-78},
      publisher = {Nova Science Publishers},
      series = {Advances in Computation: Theory and Practice},
      title = {Automatic Program Transformation: The {Meta} Tool for Skeleton-based Languages},
      url = {http://calvados.di.unipi.it/storage/paper_files/2002_meta_book.a4.pdf},
      year = {2002},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2002_meta_book.a4.pdf}
    }

2001

  • M. Aldinucci, S. Gorlatch, C. Lengauer, and S. Pelagatti, “Towards Parallel Programming by Transformation: The FAN Skeleton Framework,” Parallel Algorithms and Applications, vol. 16, iss. 2-3, pp. 87-121, 2001. doi:10.1080/01495730108935268
    [BibTeX] [Abstract] [Download PDF]

    A Functional Abstract Notation (FAN) is proposed for the specification and design of parallel algorithms by means of skeletons – high-level patterns with parallel semantics. The main weakness of the current programming systems based on skeletons is that the user is still responsible for finding the most appropriate skeleton composition for a given application and a given parallel architecture. We describe a transformational framework for the development of skeletal programs which is aimed at filling this gap. The framework makes use of transformation rules which are semantic equivalences among skeleton compositions. For a given problem, an initial, possibly inefficient skeleton specification is refined by applying a sequence of transformations. Transformations are guided by a set of performance prediction models which forecast the behavior of each skeleton and the performance benefits of different rules. The design process is supported by a graphical tool which locates applicable transformations and provides performance estimates, thereby helping the programmer in navigating through the program refinement space. We give an overview of the FAN framework and exemplify its use with performance-directed program derivations for simple case studies. Our experience can be viewed as a first feasibility study of methods and tools for transformational, performance-directed parallel programming using skeletons.

    @article{FAN:PPA:01,
      abstract = {A Functional Abstract Notation (FAN) is proposed for the specification and design of parallel algorithms by means of skeletons - high-level patterns with parallel semantics. The main weakness of the current programming systems based on skeletons is that the user is still responsible for finding the most appropriate skeleton composition for a given application and a given parallel architecture.
    We describe a transformational framework for the development of skeletal programs which is aimed at filling this gap. The framework makes use of transformation rules which are semantic equivalences among skeleton compositions. For a given problem, an initial, possibly inefficient skeleton specification is refined by applying a sequence of transformations. Transformations are guided by a set of performance prediction models which forecast the behavior of each skeleton and the performance benefits of different rules. The design process is supported by a graphical tool which locates applicable transformations and provides performance estimates, thereby helping the programmer in navigating through the program refinement space. We give an overview of the FAN framework and exemplify its use with performance-directed program derivations for simple case studies. Our experience can be viewed as a first feasibility study of methods and tools for transformational, performance-directed parallel programming using skeletons.},
      author = {Marco Aldinucci and Sergei Gorlatch and Christian Lengauer and Susanna Pelagatti},
      date-modified = {2014-08-24 22:19:37 +0000},
      doi = {10.1080/01495730108935268},
      journal = {Parallel Algorithms and Applications},
      month = mar,
      number = {2-3},
      pages = {87-121},
      title = {Towards Parallel Programming by Transformation: The {FAN} Skeleton Framework},
      url = {http://calvados.di.unipi.it/storage/paper_files/2001_FAN_paa.pdf},
      volume = {16},
      year = {2001},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2001_FAN_paa.pdf},
      bdsk-url-2 = {http://dx.doi.org/10.1080/01495730108935268}
    }
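
    A toy version of one refinement step in this style, under invented assumptions (skeleton programs as nested tuples, two made-up rewrite rules, an idealized cost model with four workers per farm); the real framework's rules and performance models are far richer, so this only shows the locate-rewrite-estimate loop:

    # A program is ('seq', name) | ('pipe', p1, p2) | ('farm', p).
    def rewrites(prog):
        """Yield programs obtainable by one rule application at the root."""
        if prog[0] == 'pipe':
            yield ('farm', prog)   # rule: pipe(a, b) -> farm(pipe(a, b))
        if prog[0] == 'farm':
            yield prog[1]          # rule: farm(a) -> a

    def cost(prog):
        """Predicted service time under an idealized model (illustrative)."""
        if prog[0] == 'seq':
            return 1.0
        if prog[0] == 'pipe':
            return max(cost(prog[1]), cost(prog[2]))
        return cost(prog[1]) / 4   # farm: assume 4 workers

    prog = ('pipe', ('seq', 'f'), ('seq', 'g'))
    best = min(list(rewrites(prog)) + [prog], key=cost)
    print(best, cost(best))  # the farmed pipeline wins under this model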

2000

  • M. Aldinucci, “The Meta Transformation Tool for Skeleton-Based Languages,” in Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, 2000, pp. 53-68.
    [BibTeX] [Abstract] [Download PDF]

    Academic and commercial experience with skeleton-based systems has demonstrated the benefits of the approach but also the lack of methods and tools for algorithm design and performance prediction. We propose a (graphical) transformation tool based on a novel internal representation of programs that enables the user to effectively deal with program transformation. Given a skeleton-based language and a set of semantic-preserving transformation rules, the tool locates applicable transformations and provides performance estimates, thereby helping the programmer in navigating through the program refinement space.

    @inproceedings{aldinuc:meta:00,
      abstract = {Academic and commercial experience with skeleton-based systems has
    demonstrated the benefits of the approach but also the lack of methods
    and tools for algorithm design and performance prediction.
    We propose a (graphical) transformation tool based on a novel internal
    representation of programs that enables the user to effectively deal with
    program transformation.
    Given a skeleton-based language and a set of semantic-preserving
    transformation rules, the tool locates applicable transformations
    and provides performance estimates, thereby helping the programmer in
    navigating through the program refinement space.},
      author = {Marco Aldinucci},
      booktitle = {Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming},
      date-modified = {2007-09-16 18:41:04 +0200},
      editor = {S. Gorlatch and C. Lengauer},
      month = jul,
      organization = {Fakult{\"a}t f{\"u}r mathematik und informatik},
      pages = {53-68},
      publisher = {Uni. Passau, Germany},
      title = {The {Meta} Transformation Tool for Skeleton-Based Languages},
      url = {http://calvados.di.unipi.it/storage/paper_files/2000_meta_cmpp.pdf},
      year = {2000},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/2000_meta_cmpp.pdf}
    }

1999

  • M. Aldinucci and M. Danelutto, “Stream parallel skeleton optimization,” in Proc. of PDCS: Intl. Conference on Parallel and Distributed Computing and Systems, Cambridge, Massachusetts, USA, 1999, pp. 955-962.
    [BibTeX] [Abstract] [Download PDF]

    We discuss the properties of the composition of stream parallel skeletons such as pipelines and farms. By looking at the ideal performance figures assumed to hold for these skeletons, we show that any stream parallel skeleton composition can always be rewritten into an equivalent "normal form" skeleton composition, delivering a service time equal to or better than that of the original composition, and achieving a better utilization of the processors used. The normal form is defined as a single farm built around a sequential worker code. Experimental results that validate this normal form are discussed.

    @inproceedings{pdcs:nf:99,
      abstract = {We discuss the properties of the composition of stream parallel
      skeletons such as pipelines and farms.  By looking at the ideal
      performance figures assumed to hold for these skeletons, we show
      that any stream parallel skeleton composition can always be
      rewritten into an equivalent "normal form" skeleton composition,
      delivering a service time which is equal to or better than the
      service time of the original skeleton composition, and achieving a
      better utilization of the processors used. The normal form is
      defined as a single farm built around a sequential worker code.
      Experimental results are discussed that validate this normal form.},
      address = {Cambridge, Massachusetts, USA},
      author = {Marco Aldinucci and Marco Danelutto},
      booktitle = {Proc. of PDCS: Intl. Conference on Parallel and Distributed Computing and Systems},
      date-modified = {2007-09-16 18:40:51 +0200},
      month = nov,
      organization = {IASTED},
      pages = {955-962},
      publisher = {ACTA press},
      title = {Stream parallel skeleton optimization},
      url = {http://calvados.di.unipi.it/storage/paper_files/1999_NF_pdcs.pdf},
      year = {1999},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/1999_NF_pdcs.pdf}
    }
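
    The normal-form result lends itself to a small executable check. In the sketch below (names are ours), a three-stage pipeline over a finite stream is rewritten into a single farm whose worker is the sequential composition of the stages; the assert verifies the functional equivalence, while the service-time claim is the paper's and is not measured here:

    from concurrent.futures import ThreadPoolExecutor
    from functools import reduce

    def compose(*stages):
        # The normal form's sequential worker: all pipeline stages composed.
        return lambda x: reduce(lambda acc, f: f(acc), stages, x)

    def farm(worker, stream, nworkers=4):
        # Functionally a map over the stream; operationally, nworkers replicas.
        with ThreadPoolExecutor(max_workers=nworkers) as pool:
            return list(pool.map(worker, stream))

    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    stream = list(range(8))
    pipeline = [stages[2](stages[1](stages[0](x))) for x in stream]  # pipe(f, g, h)
    normal_form = farm(compose(*stages), stream)                     # farm(h . g . f)
    assert pipeline == normal_form
    print(normal_form)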

1998

  • M. Aldinucci, M. Coppola, and M. Danelutto, “Rewriting skeleton programs: How to evaluate the data-parallel stream-parallel tradeoff,” in Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming, 1998, pp. 44-58.
    [BibTeX] [Abstract] [Download PDF]

    Some skeleton-based parallel programming models allow the programmer to use both data and stream parallel skeletons within the same program. It is known that particular skeleton nestings can be formally rewritten into different nestings that preserve the functional semantics. Indeed, the kind, and possibly the amount, of parallelism usefully exploitable may change as rewriting takes place. Here we discuss an original framework allowing the user (and/or the compiling tools) of a skeleton-based parallel programming language to evaluate whether or not the transformation of a skeleton program is worthwhile in terms of the final program performance. We address, in particular, the evaluation of transformations exchanging data parallel and stream parallel skeleton subtrees.

    @inproceedings{aldinuc:stream-data:98,
      abstract = { Some skeleton based parallel programming models allow the programmer to
      use both data and stream parallel skeletons within the same program.
      It is known that particular skeleton nestings can be formally
      rewritten into different nestings that preserve the functional
      semantics. Indeed, the kind and possibly the amount of parallelism
      usefully exploitable may change while rewriting takes place.
      Here we discuss an original framework allowing the user (and/or the
      compiling tools) of a skeleton based parallel programming language to
      evaluate whether or not the transformation of a skeleton program
      is worthwhile in terms of the final program performance. We address,
      in particular, the evaluation of transformations exchanging data
      parallel and stream parallel skeleton subtrees.},
      author = {Marco Aldinucci and Massimo Coppola and Marco Danelutto},
      booktitle = {Proc. of CMPP: Intl. Workshop on Constructive Methods for Parallel Programming},
      date-modified = {2007-09-16 18:40:40 +0200},
      editor = {S. Gorlatch},
      month = may,
      optnumber = {MIP-9805},
      optseries = {University of Passau technical report},
      organization = {Fakult{\"a}t f{\"u}r mathematik und informatik},
      pages = {44-58},
      publisher = {Uni. Passau, Germany},
      title = {Rewriting skeleton programs: How to evaluate the data-parallel stream-parallel tradeoff},
      url = {http://calvados.di.unipi.it/storage/paper_files/1998_transf_cmpp.pdf},
      year = {1998},
      bdsk-url-1 = {http://calvados.di.unipi.it/storage/paper_files/1998_transf_cmpp.pdf}
    }

End of parallel processing papers by year