A novel high-performance libfabric provider for A3Cube network

Paolo Inaudi, an MSc student at the University of Torino, has just completed a first version of a novel, high-performance libfabric provider for the A3Cube in-memory “Ronniee express” network. It is open source under LGPLv3 on GitHub.

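On the Intel i7 cluster, the provider achieves sub-μs latency (the latency/2 column, i.e. half the ping-pong round-trip time) for messages up to 512 bytes. This means the whole protocol adds only a tiny overhead over the bare-metal latency. An excellent result.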

High-Performance cluster - Nodes: Intel i7 cluster
--------------------------------------------------
> ./fi_msg_rma_pingpong -p 1000 FORTISSIMO1

name      size(B) iters   total       time   bandwidth(MB/s)  latency/2(μs)
2_lat     2       100k    390k        0.18s      2.28       0.88
4_lat     4       100k    781k        0.17s      4.62       0.87
8_lat     8       100k    1.5m        0.18s      9.10       0.88
16_lat    16      100k    3m          0.17s     18.45       0.87
32_lat    32      100k    6.1m        0.17s     36.78       0.87
64_lat    64      100k    12m         0.17s     76.06       0.84
128_lat   128     100k    24m         0.17s    151.70       0.84
256_lat   256     100k    48m         0.17s    299.30       0.86
512_lat   512     100k    97m         0.19s    539.02       0.95
1k_lat    1024    10k     19m         0.02s    898.99       1.14
2k_lat    2048    10k     39m         0.03s   1357.87       1.51
4k_lat    4096    10k     78m         0.04s   1855.91       2.21
8k_lat    8192    10k     156m        0.07s   2245.00       3.65
16k_lat   16384   10k     312m        0.13s   2528.30       6.48
32k_lat   32768   10k     625m        0.24s   2696.31      12.15
64k_lat   65536   1k      125m        0.05s   2793.58      23.46

Low-power cluster - Nodes: Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz
------------------------------------------------------------------
> ./fi_msg_rma_pingpong -p 1000 paradigm1

name      size(B) iters   total       time   bandwidth(MB/s)  latency/2(μs)
2_lat     2       100k    390k        0.29s      1.36       1.47
4_lat     4       100k    781k        0.29s      2.72       1.47
8_lat     8       100k    1.5m        0.30s      5.42       1.48
16_lat    16      100k    3m          0.30s     10.83       1.48
32_lat    32      100k    6.1m        0.30s     21.30       1.50
64_lat    64      100k    12m         0.29s     43.83       1.46
128_lat   128     100k    24m         0.26s     98.82       1.30
256_lat   256     100k    48m         0.27s    191.91       1.33
512_lat   512     100k    97m         0.30s    339.60       1.51
1k_lat    1024    10k     19m         0.04s    548.24       1.87
2k_lat    2048    10k     39m         0.05s    790.99       2.59
4k_lat    4096    10k     78m         0.08s   1022.20       4.01
8k_lat    8192    10k     156m        0.14s   1196.03       6.85
16k_lat   16384   10k     312m        0.25s   1310.76      12.50
32k_lat   32768   10k     625m        0.48s   1376.25      23.81
64k_lat   65536   1k      125m        0.09s   1412.90      46.38
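
The columns in the two tables appear to be related in a simple way, which makes the results easy to sanity-check. The sketch below is a reading of the printed numbers, not something taken from the benchmark source: it assumes that bandwidth is the message size divided by latency/2 (bytes per μs, which equals MB/s), that total counts both directions of every iteration, and that time is the number of iterations times the full round trip. The function name `derived_columns` is hypothetical.

```python
def derived_columns(size_bytes, iters, half_rtt_us):
    """Recompute total bytes, elapsed time and bandwidth from one table row.

    Assumed relations (inferred from the printed values, not from the
    benchmark source):
      total     = size * iters * 2      (each iteration sends and receives)
      time      = iters * 2 * latency/2 (one full round trip per iteration)
      bandwidth = size / (latency/2)    (bytes per microsecond == MB/s)
    """
    total_bytes = size_bytes * iters * 2
    time_s = iters * 2 * half_rtt_us * 1e-6
    bandwidth_mb_s = size_bytes / half_rtt_us
    return total_bytes, time_s, bandwidth_mb_s


# (size in bytes, iterations, latency/2 in μs) copied from the i7 table
rows = [(2, 100_000, 0.88), (512, 100_000, 0.95), (65536, 1_000, 23.46)]

for size, iters, half_rtt in rows:
    total, time_s, bw = derived_columns(size, iters, half_rtt)
    print(f"{size:>6} B  total={total / 2**20:8.2f} MiB  "
          f"time={time_s:.2f} s  bandwidth={bw:8.2f} MB/s")
```

The recomputed bandwidth differs slightly from the printed one (for example about 538.9 versus 539.02 MB/s at 512 bytes), presumably because the table's latency/2 values are rounded to two decimals.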

 

Figure: Paradigm low-power Intel Atom cluster with A3Cube and Ethernet networks


About Marco Aldinucci

Marco Aldinucci has been an assistant professor at the Computer Science Department of the University of Torino since 2008. Previously, he was a researcher at the University of Pisa and the Italian National Research Agency. He is the author of over a hundred papers in international journals and conference proceedings (Google Scholar h-index 21). He has participated in over 20 national and international research projects concerning parallel and autonomic computing. He is the recipient of the HPC Advisory Council University Award 2011 and the NVidia Research award 2013. He has been leading the “Low-Level Virtualization and Platform-Specific Deployment” workpackage within the EU-STREP FP7 ParaPhrase (Parallel Patterns for Adaptive Heterogeneous Multicore Systems) project and the GPGPU workpackage within the IMPACT project (Innovative Methods for Particle Colliders at the Terascale), and he is the contact person at the University of Torino for the European Network of Excellence on High Performance and Embedded Architecture and Compilation. In the last year (March 2012 – March 2013) he delivered 5 invited talks at international workshops. He co-designed, together with Massimo Torquati, the FastFlow programming framework and several other programming frameworks and libraries for parallel computing. His research is focused on parallel and distributed computing.