Robert René Maria Birke


Robert Birke

Tenured assistant professor at Computer Science Department, University of Turin
Parallel Computing group
Via Pessinetto 12, 10149 Torino – Italy 
Email: robert.birke@unito.it

Short Bio

Robert Birke is a tenured assistant professor in the Parallel Computing research group at the University of Turin. He received his Ph.D. in Electronics and Communications Engineering from the Politecnico di Torino, Italy, in 2009.
He has been a visiting researcher at IBM Research Zurich, Switzerland, and a Principal Scientist at ABB Corporate Research, Switzerland. His research interests are in the broad area of virtual resource management including network design, workload characterization, and AI and big-data application optimization.
He has published more than 90 papers at venues related to communication, system performance and machine learning, e.g., SIGCOMM, SIGMETRICS, FAST, INFOCOM, ACML and JSAC.
He is a senior member of IEEE.

Publications

2022

  • B. Cox, R. Birke, and L. Y. Chen, “Memory-aware and context-aware multi-DNN inference on the edge,” Pervasive and Mobile Computing, vol. 83, p. 101594, 2022. doi:10.1016/j.pmcj.2022.101594

    Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real time image-based analysis. Increasingly, multi-faced knowledge is extracted by executing multiple DNNs inference models, e.g., identifying objects, faces, and genders from images. It is of paramount importance to guarantee low response times of such multi-DNN executions as it affects not only users quality of experience but also safety. The challenge, largely unaddressed by the state of the art, is how to overcome the memory limitation of edge devices without altering the DNN models. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution and scheduling framework, which requires no modification of DNN models. The aim of Masa is to consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. The enabling features of Masa are (i) modeling inter- and intra-network dependency, (ii) leveraging complimentary memory usage of each layer, and (iii) exploring the context dependency of DNNs. We verify the correctness and scheduling optimality via mixed integer programming. We extensively evaluate two versions of Masa, context-oblivious and context-aware, on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can achieve lower average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to the state of the art multi-DNN scheduling solutions.

    @article{COX2022101594,
    title = {Memory-aware and context-aware multi-DNN inference on the edge},
    journal = {Pervasive and Mobile Computing},
    volume = {83},
    pages = {101594},
    year = {2022},
    issn = {1574-1192},
    doi = {10.1016/j.pmcj.2022.101594},
    url = {https://www.sciencedirect.com/science/article/pii/S1574119222000372},
    author = {Bart Cox and Robert Birke and Lydia Y. Chen},
    keywords = {},
    abstract = {Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real time image-based analysis. Increasingly, multi-faced knowledge is extracted by executing multiple DNNs inference models, e.g., identifying objects, faces, and genders from images. It is of paramount importance to guarantee low response times of such multi-DNN executions as it affects not only users quality of experience but also safety. The challenge, largely unaddressed by the state of the art, is how to overcome the memory limitation of edge devices without altering the DNN models. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution and scheduling framework, which requires no modification of DNN models. The aim of Masa is to consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. The enabling features of Masa are (i) modeling inter- and intra-network dependency, (ii) leveraging complimentary memory usage of each layer, and (iii) exploring the context dependency of DNNs. We verify the correctness and scheduling optimality via mixed integer programming. We extensively evaluate two versions of Masa, context-oblivious and context-aware, on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can achieve lower average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to the state of the art multi-DNN scheduling solutions.}
    }

2021

  • Z. Zhao, R. Birke, R. Han, B. Robu, S. Bouchenak, S. B. Mokhtar, and L. Y. Chen, “Enhancing robustness of on-line learning models on highly noisy data,” IEEE Trans. Dependable Secur. Comput., vol. 18, iss. 5, p. 2177–2192, 2021. doi:10.1109/TDSC.2021.3063947

    Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this article, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. We specifically focus on three use cases, (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising 100 celebrities faces. Our evaluation results show that RAD can robustly improve the accuracy of anomaly detection, to reach up to 98.95 percent for IoT device attacks (i.e., +7%), up to 85.03 percent for cloud task failures (i.e., +14%) under 40 percent label noise, and for its extension, it can reach up to 77.51 percent for face recognition (i.e., +39%) under 30 percent label noise. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.

    @article{ZhaoBHRBMC21,
    author = {Zilong Zhao and
    Robert Birke and
    Rui Han and
    Bogdan Robu and
    Sara Bouchenak and
    Sonia Ben Mokhtar and
    Lydia Y. Chen},
    title = {Enhancing Robustness of On-Line Learning Models on Highly Noisy Data},
    journal = {{IEEE} Trans. Dependable Secur. Comput.},
    volume = {18},
    number = {5},
    pages = {2177--2192},
    year = {2021},
    url = {https://doi.org/10.1109/TDSC.2021.3063947},
    doi = {10.1109/TDSC.2021.3063947},
    abstract = {Classification algorithms have been widely adopted to detect anomalies for various systems, e.g., IoT, cloud and face recognition, under the common assumption that the data source is clean, i.e., features and labels are correctly set. However, data collected from the wild can be unreliable due to careless annotations or malicious data transformation for incorrect anomaly detection. In this article, we extend a two-layer on-line data selection framework: Robust Anomaly Detector (RAD) with a newly designed ensemble prediction where both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider additional features of conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We on-line learn from incoming data streams and continuously cleanse the data, so as to adapt to the increasing learning capacity from the larger accumulated data set. Moreover, we explore the concept of oracle learning that provides additional information of true labels for difficult data points. We specifically focus on three use cases, (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising 100 celebrities faces. Our evaluation results show that RAD can robustly improve the accuracy of anomaly detection, to reach up to 98.95 percent for IoT device attacks (i.e., +7%), up to 85.03 percent for cloud task failures (i.e., +14%) under 40 percent label noise, and for its extension, it can reach up to 77.51 percent for face recognition (i.e., +39%) under 30 percent label noise. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.},
    keywords = {}
    }

  • R. Birke, J. F. Pérez, Z. Qiu, M. Björkqvist, and L. Y. Chen, “sPARE: partial replication for multi-tier applications in the cloud,” IEEE Trans. Serv. Comput., vol. 14, iss. 2, p. 574–588, 2021. doi:10.1109/TSC.2017.2780845

    Offering consistent low latency remains a key challenge for distributed applications, especially when deployed on the cloud where virtual machines (VMs) suffer from capacity variability caused by co-located tenants. Replicating redundant requests was shown to be an effective mechanism to defend application performance from high capacity variability. While the prior art centers on single-tier systems, it still remains an open question how to design replication strategies for distributed multi-tier systems. In this paper, we design a first of its kind PArtial REplication system, sPARE, that replicates and dispatches read-only workloads for distributed multi-tier web applications. The two key components of sPARE are (i) the variability-aware replicator that coordinates the replication levels on all tiers via an iterative searching algorithm, and (ii) the replication-aware arbiter that uses a novel token-based arbitration algorithm (TAD) to dispatch requests in each tier. We evaluate sPARE on web serving and searching applications, i.e., MediaWiki and Solr, the former deployed on our private cloud and the latter on Amazon EC2. Our results based on various interference patterns and traffic loads show that sPARE is able to improve the tail latency of MediaWiki and Solr by a factor of almost 2.7x and 2.9x, respectively.

    @article{BirkePQBC21,
    author = {Robert Birke and
    Juan F. P{\'{e}}rez and
    Zhan Qiu and
    Mathias Bj{\"{o}}rkqvist and
    Lydia Y. Chen},
    title = {sPARE: Partial Replication for Multi-Tier Applications in the Cloud},
    journal = {{IEEE} Trans. Serv. Comput.},
    volume = {14},
    number = {2},
    pages = {574--588},
    year = {2021},
    url = {https://doi.org/10.1109/TSC.2017.2780845},
    doi = {10.1109/TSC.2017.2780845},
    abstract = {Offering consistent low latency remains a key challenge for distributed applications, especially when deployed on the cloud where virtual machines (VMs) suffer from capacity variability caused by co-located tenants. Replicating redundant requests was shown to be an effective mechanism to defend application performance from high capacity variability. While the prior art centers on single-tier systems, it still remains an open question how to design replication strategies for distributed multi-tier systems. In this paper, we design a first of its kind PArtial REplication system, sPARE, that replicates and dispatches read-only workloads for distributed multi-tier web applications. The two key components of sPARE are (i) the variability-aware replicator that coordinates the replication levels on all tiers via an iterative searching algorithm, and (ii) the replication-aware arbiter that uses a novel token-based arbitration algorithm (TAD) to dispatch requests in each tier. We evaluate sPARE on web serving and searching applications, i.e., MediaWiki and Solr, the former deployed on our private cloud and the latter on Amazon EC2. Our results based on various interference patterns and traffic loads show that sPARE is able to improve the tail latency of MediaWiki and Solr by a factor of almost 2.7x and 2.9x, respectively.},
    keywords = {}
    }

  • Z. Zhao, A. Kunar, R. Birke, and L. Y. Chen, “CTAB-GAN: effective table data synthesizing,” in Proceedings of the 13th Asian Conference on Machine Learning, 2021, p. 97–112.

    While data sharing is crucial for knowledge development, privacy concerns and strict regulation (e.g., European General Data Protection Regulation (GDPR)) unfortunately limit its full effectiveness. Synthetic tabular data emerges as an alternative to enable data sharing while fulfilling regulatory and privacy constraints. The state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GAN) and address two main data types in industry, i.e., continuous and categorical. In this paper, we develop CTAB-GAN, a novel conditional table GAN architecture that can effectively model diverse data types, including a mix of continuous and categorical variables. Moreover, we address data imbalance and long tail issues, i.e., certain variables have drastic frequency differences across large values. To achieve those aims, we first introduce the information loss, classification loss and generator loss to the conditional GAN. Secondly, we design a novel conditional vector, which efficiently encodes the mixed data type and skewed distribution of data variable. We extensively evaluate CTAB-GAN with the state of the art GANs that generate synthetic tables, in terms of data similarity and analysis utility. The results on five datasets show that the synthetic data of CTAB-GAN remarkably resembles the real data for all three types of variables and results into higher accuracy for five machine learning algorithms, by up to 17%.

    @inproceedings{pmlr-v157-zhao21a,
    title = {CTAB-GAN: Effective Table Data Synthesizing},
    author = {Zhao, Zilong and Kunar, Aditya and Birke, Robert and Chen, Lydia Y.},
    booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
    pages = {97--112},
    year = {2021},
    editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
    volume = {157},
    series = {Proceedings of Machine Learning Research},
    month = {17--19 Nov},
    publisher = {PMLR},
    pdf = {https://proceedings.mlr.press/v157/zhao21a/zhao21a.pdf},
    url = {https://proceedings.mlr.press/v157/zhao21a.html},
    abstract = {While data sharing is crucial for knowledge development, privacy concerns and strict regulation (e.g., European General Data Protection Regulation (GDPR)) unfortunately limit its full effectiveness. Synthetic tabular data emerges as an alternative to enable data sharing while fulfilling regulatory and privacy constraints. The state-of-the-art tabular data synthesizers draw methodologies from Generative Adversarial Networks (GAN) and address two main data types in industry, i.e., continuous and categorical. In this paper, we develop CTAB-GAN, a novel conditional table GAN architecture that can effectively model diverse data types, including a mix of continuous and categorical variables. Moreover, we address data imbalance and long tail issues, i.e., certain variables have drastic frequency differences across large values. To achieve those aims, we first introduce the information loss, classification loss and generator loss to the conditional GAN. Secondly, we design a novel conditional vector, which efficiently encodes the mixed data type and skewed distribution of data variable. We extensively evaluate CTAB-GAN with the state of the art GANs that generate synthetic tables, in terms of data similarity and analysis utility. The results on five datasets show that the synthetic data of CTAB-GAN remarkably resembles the real data for all three types of variables and results into higher accuracy for five machine learning algorithms, by up to 17%.},
    keywords = {}
    }

  • T. Younesian, Z. Zhao, A. Ghiassi, R. Birke, and L. Y. Chen, “QActor: active learning on noisy labels,” in Proceedings of the 13th Asian Conference on Machine Learning, 2021, p. 548–563.

    Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media from non-experts. Active querying experts are conventionally adopted to provide labels for the informative samples which don’t have labels, instead of possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning. In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure CENT, which considers both cross-entropy and entropy to select informative and noisy labels for an expert cleansing. QActor iteratively cleans samples via quality models and actively querying an expert on those noisy yet informative samples. To adapt to learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss for each learning iteration. We extensively evaluate different image datasets with noise label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle.

    @inproceedings{pmlr-v157-younesian21a,
    title = {QActor: Active Learning on Noisy Labels},
    author = {Younesian, Taraneh and Zhao, Zilong and Ghiassi, Amirmasoud and Birke, Robert and Chen, Lydia Y.},
    booktitle = {Proceedings of The 13th Asian Conference on Machine Learning},
    pages = {548--563},
    year = {2021},
    editor = {Balasubramanian, Vineeth N. and Tsang, Ivor},
    volume = {157},
    series = {Proceedings of Machine Learning Research},
    month = {17--19 Nov},
    publisher = {PMLR},
    pdf = {https://proceedings.mlr.press/v157/younesian21a/younesian21a.pdf},
    url = {https://proceedings.mlr.press/v157/younesian21a.html},
    abstract = {Noisy labeled data is more a norm than a rarity for self-generated content that is continuously published on the web and social media from non-experts. Active querying experts are conventionally adopted to provide labels for the informative samples which don’t have labels, instead of possibly incorrect labels. The new challenge that arises here is how to discern the informative and noisy labels which benefit from expert cleaning. In this paper, we aim to leverage the stringent oracle budget to robustly maximize learning accuracy. We propose a noise-aware active learning framework, QActor, and a novel measure \emph{CENT}, which considers both cross-entropy and entropy to select informative and noisy labels for an expert cleansing. QActor iteratively cleans samples via quality models and actively querying an expert on those noisy yet informative samples. To adapt to learning capacity per iteration, QActor dynamically adjusts the query limit according to the learning loss for each learning iteration. We extensively evaluate different image datasets with noise label ratios ranging between 30% and 60%. Our results show that QActor can nearly match the optimal accuracy achieved using only clean data at the cost of only an additional 10% of ground truth data from the oracle.},
    keywords = {}
    }

  • A. Ghiassi, R. Birke, and L. Y. Chen, “TrustNet: learning from trusted data against (a)symmetric label noise,” in 8th IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT), 2021, p. 52–62. doi:10.1145/3492324.3494166

    Big Data systems allow collecting massive datasets to feed the data hungry deep learning. Labelling these ever-bigger datasets is increasingly challenging and label errors affect even highly curated sets. This makes robustness to label noise a critical property for weakly-supervised classifiers. The related works on resilient deep networks tend to focus on a limited set of synthetic noise patterns, and with disparate views on their impacts, e.g., robustness against symmetric v.s. asymmetric noise patterns. In this paper, we first extend the theoretical analysis of test accuracy for any given noise patterns. Based on the insights, we design TrustNet that first learns the pattern of noise corruption, being it both symmetric or asymmetric, from a small set of trusted data. Then, TrustNet is trained via a robust loss function, which weights the given labels against the inferred labels from the learned noise pattern. The weight is adjusted based on model uncertainty across training epochs. We evaluate TrustNet on synthetic label noise for CIFAR-10, CIFAR-100 and big real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods demonstrating the strong robustness of TrustNet under a diverse set of noise patterns.

    @inproceedings{bdcat-ghiassi21,
    author = {Amirmasoud Ghiassi and
    Robert Birke and
    Lydia Y. Chen},
    title = {TrustNet: Learning from Trusted Data Against (A)symmetric Label Noise},
    booktitle = {8th {IEEE/ACM} International Conference on Big Data Computing, Applications and Technologies ({BDCAT})},
    pages = {52--62},
    publisher = {{ACM}},
    year = {2021},
    month = {6--9 Dec},
    url = {https://doi.org/10.1145/3492324.3494166},
    doi = {10.1145/3492324.3494166},
    abstract = {Big Data systems allow collecting massive datasets to feed the data hungry deep learning. Labelling these ever-bigger datasets is increasingly challenging and label errors affect even highly curated sets. This makes robustness to label noise a critical property for weakly-supervised classifiers. The related works on resilient deep networks tend to focus on a limited set of synthetic noise patterns, and with disparate views on their impacts, e.g., robustness against symmetric v.s. asymmetric noise patterns. In this paper, we first extend the theoretical analysis of test accuracy for any given noise patterns. Based on the insights, we design TrustNet that first learns the pattern of noise corruption, being it both symmetric or asymmetric, from a small set of trusted data. Then, TrustNet is trained via a robust loss function, which weights the given labels against the inferred labels from the learned noise pattern. The weight is adjusted based on model uncertainty across training epochs. We evaluate TrustNet on synthetic label noise for CIFAR-10, CIFAR-100 and big real-world data with label noise, i.e., Clothing1M. We compare against state-of-the-art methods demonstrating the strong robustness of TrustNet under a diverse set of noise patterns.},
    keywords = {}
    }

  • G. Albanese, R. Birke, G. Giannopoulou, S. Schönborn, and T. Sivanthi, “Evaluation of networking options for containerized deployment of real-time applications,” in 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2021, p. 1–8. doi:10.1109/ETFA45728.2021.9613320

    Enterprises in the field of industrial automation experience an increasing demand for providing virtualized software solutions. Inspired by the recent trends in serverless and cloud computing, software virtualization is considered even for safety-critical applications with hard real-time requirements, as a means of avoiding hardware vendor lock-in and reducing volume and maintenance cost of devices. In this work, we evaluate the applicability of OS-level virtualization to an industrial automation use case. Our application runs in Docker containers on top of Linux patched with PREEMPT_RT. We investigate the ability of Docker coupled with diverse networking technologies to fulfill the latency requirements of the application under normal or heavy system load. We empirically compare four networking technologies with respect to communication latency and frequency of missing packets. The results indicate that Docker with certain technologies, such as the Single Root I/O Virtualization interface, performs robustly even under heavy load, enabling sufficient performance isolation and low overhead that does not jeopardise the real-time performance of our application.

    @inproceedings{etfa-albanese21,
    author = {Giuliano Albanese and
    Robert Birke and
    Georgia Giannopoulou and
    Sandro Sch{\"{o}}nborn and
    Thanikesavan Sivanthi},
    title = {Evaluation of Networking Options for Containerized Deployment of Real-Time Applications},
    booktitle = {26th {IEEE} International Conference on Emerging Technologies and Factory Automation ({ETFA})},
    pages = {1--8},
    publisher = {{IEEE}},
    year = {2021},
    month = {7--10 Sep},
    url = {https://doi.org/10.1109/ETFA45728.2021.9613320},
    doi = {10.1109/ETFA45728.2021.9613320},
    abstract = {Enterprises in the field of industrial automation experience an increasing demand for providing virtualized software solutions. Inspired by the recent trends in serverless and cloud computing, software virtualization is considered even for safety-critical applications with hard real-time requirements, as a means of avoiding hardware vendor lock-in and reducing volume and maintenance cost of devices. In this work, we evaluate the applicability of OS-level virtualization to an industrial automation use case. Our application runs in Docker containers on top of Linux patched with PREEMPT_RT. We investigate the ability of Docker coupled with diverse networking technologies to fulfill the latency requirements of the application under normal or heavy system load. We empirically compare four networking technologies with respect to communication latency and frequency of missing packets. The results indicate that Docker with certain technologies, such as the Single Root I/O Virtualization interface, performs robustly even under heavy load, enabling sufficient performance isolation and low overhead that does not jeopardise the real-time performance of our application.},
    keywords = {}
    }

  • A. Ghiassi, R. Birke, R. Han, and L. Y. Chen, “LABELNET: recovering noisy labels,” in International Joint Conference on Neural Networks (IJCNN), 2021, p. 1–8. doi:10.1109/IJCNN52387.2021.9533562

    Today’s available datasets in the wild, e.g., from social media and open platforms, present tremendous opportunities and challenges for deep learning, as there is a significant portion of tagged images, but often with noisy, i.e. erroneous, labels. Recent studies improve the robustness of deep models against noisy labels without the knowledge of true labels. In this paper, we advocate to derive a stronger classifier which proactively makes use of the noisy labels in addition to the original images – turning noisy labels into learning features. To such an end, we propose a novel framework, LABELNET, composed of Amateur and Expert, which iteratively learn from each other. Amateur is a regular image classifier trained by the feedback of Expert, which imitates how human experts would correct the predicted labels from Amateur using the noise pattern learnt from the knowledge of both the noisy and ground truth labels. The trained Amateur and Expert proactively leverage the images and their noisy labels to infer image classes. Our empirical evaluations on noisy versions of MNIST, CIFAR-10, CIFAR-100 and real-world data of Clothing1M show that the proposed model can achieve robust classification against a wide range of noise ratios and with as little as 20-50% training data, compared to state-of-the-art deep models that solely focus on distilling the impact of noisy labels.

    @inproceedings{ijcnn-ghiassi21,
    author = {Amirmasoud Ghiassi and
    Robert Birke and
    Rui Han and
    Lydia Y. Chen},
    title = {{LABELNET:} Recovering Noisy Labels},
    booktitle = {International Joint Conference on Neural Networks ({IJCNN})},
    pages = {1--8},
    publisher = {{IEEE}},
    year = {2021},
    month = {18--22 Jul},
    url = {https://doi.org/10.1109/IJCNN52387.2021.9533562},
    doi = {10.1109/IJCNN52387.2021.9533562},
    abstract = {Today's available datasets in the wild, e.g., from social media and open platforms, present tremendous opportunities and challenges for deep learning, as there is a significant portion of tagged images, but often with noisy, i.e. erroneous, labels. Recent studies improve the robustness of deep models against noisy labels without the knowledge of true labels. In this paper, we advocate to derive a stronger classifier which proactively makes use of the noisy labels in addition to the original images - turning noisy labels into learning features. To such an end, we propose a novel framework, LABELNET, composed of Amateur and Expert, which iteratively learn from each other. Amateur is a regular image classifier trained by the feedback of Expert, which imitates how human experts would correct the predicted labels from Amateur using the noise pattern learnt from the knowledge of both the noisy and ground truth labels. The trained Amateur and Expert proactively leverage the images and their noisy labels to infer image classes. Our empirical evaluations on noisy versions of MNIST, CIFAR-10, CIFAR-100 and real-world data of Clothing1M show that the proposed model can achieve robust classification against a wide range of noise ratios and with as little as 20-50% training data, compared to state-of-the-art deep models that solely focus on distilling the impact of noisy labels.},
    keywords = {}
    }

  • B. Cox, J. Galjaard, A. Ghiassi, R. Birke, and L. Y. Chen, “Masa: responsive multi-DNN inference on the edge,” in 19th IEEE International Conference on Pervasive Computing and Communications (PerCom), 2021, p. 1–10. doi:10.1109/PERCOM50583.2021.9439111

    Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real time image-based analysis. Increasingly, multi-faced knowledge is extracted via executing multiple DNNs inference models, e.g., identifying objects, faces, and genders from images. The response times of multi-DNN highly affect users’ quality of experience and safety as well. Different DNNs exhibit diversified resource requirements and execution patterns across layers and networks, which may easily exceed the available device memory and riskily degrade the responsiveness. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution framework, an on-device middleware featuring on modeling inter- and intra-network dependency and leveraging complimentary memory usage of each layer. Masa can consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. We extensively evaluate Masa on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can achieve lower average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to the state of the art multi-DNN scheduling solutions.

    @inproceedings{percom-cox21a,
    author = {Bart Cox and
    Jeroen Galjaard and
    Amirmasoud Ghiassi and
    Robert Birke and
    Lydia Y. Chen},
    title = {Masa: Responsive Multi-{DNN} Inference on the Edge},
    booktitle = {19th {IEEE} International Conference on Pervasive Computing and Communications ({PerCom})},
    pages = {1--10},
    publisher = {{IEEE}},
    year = {2021},
    month = {22--26 Mar},
    url = {https://doi.org/10.1109/PERCOM50583.2021.9439111},
    doi = {10.1109/PERCOM50583.2021.9439111},
    abstract = {Deep neural networks (DNNs) are becoming the core components of many applications running on edge devices, especially for real-time image-based analysis. Increasingly, multi-faceted knowledge is extracted by executing multiple DNN inference models, e.g., identifying objects, faces, and genders from images. The response times of multi-DNN executions strongly affect users’ quality of experience as well as safety. Different DNNs exhibit diverse resource requirements and execution patterns across layers and networks, which may easily exceed the available device memory and degrade responsiveness. In this paper, we design and implement Masa, a responsive memory-aware multi-DNN execution framework: an on-device middleware that models inter- and intra-network dependency and leverages the complementary memory usage of each layer. Masa can consistently ensure the average response time when deterministically and stochastically executing multiple DNN-based image analyses. We extensively evaluate Masa on three configurations of Raspberry Pi and a large set of popular DNN models triggered by different generation patterns of images. Our evaluation results show that Masa can reduce average response times by up to 90% on devices with small memory, i.e., 512 MB to 1 GB, compared to state-of-the-art multi-DNN scheduling solutions.},
    keywords = {}
    }

  • J. Galjaard, B. Cox, A. Ghiassi, L. Y. Chen, and R. Birke, “MemA: fast inference of multiple deep models,” in 19th IEEE international conference on pervasive computing and communications workshops and other affiliated events, 2021, pp. 281–286. doi:10.1109/PerComWorkshops51409.2021.9430952

    The execution of deep neural network (DNN) inference jobs on edge devices has become increasingly popular. Multiple such inference models can concurrently analyse the on-device data, e.g., images, to extract valuable insights. Prior art focuses on low-power accelerators, compressed neural network architectures, and specialized frameworks to reduce the execution time of single inference jobs on resource-constrained edge devices. However, little is known about how different scheduling policies can further improve the runtime performance of multi-inference jobs without additional edge resources. To enable the exploration of scheduling policies, we first develop an execution framework, EdgeCaffe, which splits DNN inference jobs into the loading and execution of each network layer. We empirically characterize the impact of loading and scheduling policies on the execution time of multi-inference jobs and point out their dependency on the available memory space. We propose a novel memory-aware scheduling policy, MemA, which opportunistically interleaves the executions of different types of DNN layers based on their estimated run-time memory demands. Our evaluation on exhaustive combinations of five networks, data inputs, and memory configurations shows that MemA can alleviate the degradation of execution times of multi-inference (up to 5×) under severely constrained memory compared to standard scheduling policies without affecting accuracy.

    @inproceedings{percom-galjaard21,
    author = {Jeroen Galjaard and
    Bart Cox and
    Amirmasoud Ghiassi and
    Lydia Y. Chen and
    Robert Birke},
    title = {{MemA}: Fast Inference of Multiple Deep Models},
    booktitle = {19th {IEEE} International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events},
    pages = {281--286},
    publisher = {{IEEE}},
    year = {2021},
    month = {22--26 Mar},
    url = {https://doi.org/10.1109/PerComWorkshops51409.2021.9430952},
    doi = {10.1109/PerComWorkshops51409.2021.9430952},
    abstract = {The execution of deep neural network (DNN) inference jobs on edge devices has become increasingly popular. Multiple such inference models can concurrently analyse the on-device data, e.g., images, to extract valuable insights. Prior art focuses on low-power accelerators, compressed neural network architectures, and specialized frameworks to reduce the execution time of single inference jobs on resource-constrained edge devices. However, little is known about how different scheduling policies can further improve the runtime performance of multi-inference jobs without additional edge resources. To enable the exploration of scheduling policies, we first develop an execution framework, EdgeCaffe, which splits DNN inference jobs into the loading and execution of each network layer. We empirically characterize the impact of loading and scheduling policies on the execution time of multi-inference jobs and point out their dependency on the available memory space. We propose a novel memory-aware scheduling policy, MemA, which opportunistically interleaves the executions of different types of DNN layers based on their estimated run-time memory demands. Our evaluation on exhaustive combinations of five networks, data inputs, and memory configurations shows that MemA can alleviate the degradation of execution times of multi-inference (up to 5×) under severely constrained memory compared to standard scheduling policies without affecting accuracy.},
    keywords = {}
    }

  • C. Hong, A. Ghiassi, Y. Zhou, R. Birke, and L. Y. Chen, “Online label aggregation: A variational Bayesian approach,” in WWW ’21: the web conference 2021, 2021, pp. 1904–1915. doi:10.1145/3442381.3449933

    Noisy labeled data is more a norm than a rarity for crowdsourced content. It is effective to distill noise and infer correct labels by aggregating results from crowd workers. To ensure time relevance and overcome slow responses of workers, online label aggregation is increasingly requested, calling for solutions that can incrementally infer the true label distribution via subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method and designs a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels by the exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with the state of the art based on minimax entropy, neural networks and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 and 1.5 percentage points for synthetic and real-world datasets, respectively.

    @inproceedings{www-hong21,
    author = {Chi Hong and
    Amirmasoud Ghiassi and
    Yichi Zhou and
    Robert Birke and
    Lydia Y. Chen},
    editor = {Jure Leskovec and
    Marko Grobelnik and
    Marc Najork and
    Jie Tang and
    Leila Zia},
    title = {Online Label Aggregation: {A} Variational {B}ayesian Approach},
    booktitle = {{WWW} '21: The Web Conference 2021},
    pages = {1904--1915},
    publisher = {{ACM} / {IW3C2}},
    year = {2021},
    month = {19--23 Apr},
    url = {https://doi.org/10.1145/3442381.3449933},
    doi = {10.1145/3442381.3449933},
    abstract = {Noisy labeled data is more a norm than a rarity for crowdsourced content. It is effective to distill noise and infer correct labels by aggregating results from crowd workers. To ensure time relevance and overcome slow responses of workers, online label aggregation is increasingly requested, calling for solutions that can incrementally infer the true label distribution via subsets of data items. In this paper, we propose a novel online label aggregation framework, BiLA, which employs a variational Bayesian inference method and designs a novel stochastic optimization scheme for incremental training. BiLA is flexible enough to accommodate any generating distribution of labels by the exact computation of its posterior distribution. We also derive the convergence bound of the proposed optimizer. We compare BiLA with the state of the art based on minimax entropy, neural networks and expectation maximization algorithms, on synthetic and real-world data sets. Our evaluation results on various online scenarios show that BiLA can effectively infer the true labels, with an error rate reduction of at least 10 and 1.5 percentage points for synthetic and real-world datasets, respectively.},
    keywords = {}
    }