Asynchronous Federated Unlearning

Thanks to regulatory policies such as the General Data Protection Regulation (GDPR), it is essential to provide users with the right to erasure regarding their own private data, even if such data has been used to train a neural network model. Such a machine unlearning problem becomes even more challenging in the context of federated learning, where clients collaborate to train a global model with their private data. When a client requests that its data be erased, its effects have already gradually permeated through a large number of clients, as the server aggregates client updates over multiple communication rounds. Thus, erasing data samples from one client requires a large number of clients to engage in a retraining process.
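
As a minimal sketch of why unlearning is costly in federated learning, consider the toy FedAvg setup below, where scalar "models" stand in for neural networks and retraining from scratch serves as the naive unlearning baseline; the data, learning rate, and client count are all illustrative assumptions.

```python
# A toy FedAvg loop: one client's data influences every aggregation round,
# so the naive way to erase it is retraining without that client.
import numpy as np

rng = np.random.default_rng(0)
client_data = {c: rng.normal(loc=c, size=20) for c in range(5)}

def local_update(w, data, lr=0.1):
    # One step of gradient descent on a squared-error objective.
    return w - lr * 2 * np.mean(w - data)

def fed_avg(w, clients, rounds=10):
    for _ in range(rounds):
        updates = [local_update(w, client_data[c]) for c in clients]
        w = np.mean(updates)  # the server aggregates client updates
    return w

w_full = fed_avg(0.0, clients=[0, 1, 2, 3, 4])
# Client 3 requests erasure: its effect is entangled in every round's
# aggregate, so the naive remedy retrains from scratch without it.
w_unlearned = fed_avg(0.0, clients=[0, 1, 2, 4])
print(w_full, w_unlearned)
```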

Read more →

Gradient Leakage in Production Federated Learning

As an emerging distributed machine learning paradigm, federated learning (FL) allows clients to train machine learning models collaboratively with their private data, without transmitting such data to the server. Though federated learning is celebrated as a privacy-preserving paradigm of training machine learning models, sharing gradients with the server may lead to the potential reconstruction of raw private data, such as images and texts, used in the training process. The discovery of this attack, known as Deep Leakage from Gradients (DLG), has stimulated a new line of research on improving attack efficiency, as well as on designing stronger defenses against known DLG-family attacks.
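
To make the attack concrete, here is a minimal sketch of the DLG optimization loop on a tiny linear model: the attacker fits dummy data so that its gradient matches the gradient a victim shared. It assumes a recent PyTorch (soft-label cross entropy); the model size and iteration counts are illustrative.

```python
# A minimal DLG-style sketch: recover private data by gradient matching.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)
params = list(model.parameters())
loss_fn = torch.nn.CrossEntropyLoss()

# The victim's private sample and the gradient it shares with the server.
x_true = torch.randn(1, 4)
y_true = torch.tensor([1])
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), params)

# The attacker optimizes dummy data and a soft label so that their
# gradient matches the shared gradient, as in the original DLG attack.
x_dummy = torch.randn(1, 4, requires_grad=True)
y_dummy = torch.randn(1, 2, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    opt.zero_grad()
    dummy_loss = torch.nn.functional.cross_entropy(
        model(x_dummy), torch.softmax(y_dummy, dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, params, create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum()
                    for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(30):
    opt.step(closure)
print(x_true, x_dummy.detach())
```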

Read more →

Congestion Control with Deep Reinforcement Learning

For over a quarter of a century, it has been a fundamental challenge in networking research to design the best possible congestion control algorithms that optimize throughput and end-to-end latencies. Research interest in congestion control has recently been increasing, as cloud applications show strong demands for higher throughput and lower latencies. Some prevailing congestion control algorithms may struggle to perform well on multiple diverse networks, or to be fair towards other flows sharing the same network link.
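
As a minimal sketch of how congestion control can be framed as a reinforcement learning loop, the toy example below pairs a simplistic network "environment" with an epsilon-greedy agent that adjusts the congestion window; the bandwidth model, reward weights, and action set are illustrative assumptions, standing in for the deep RL policy a real design would learn.

```python
# A toy RL loop for congestion control: observe throughput and latency,
# adjust the congestion window, and learn from a combined reward.
import random

BOTTLENECK = 100.0  # packets per RTT the link can carry (assumed)
BASE_RTT = 1.0

def step(cwnd):
    """Toy environment: sending above the bottleneck inflates latency."""
    throughput = min(cwnd, BOTTLENECK)
    rtt = BASE_RTT + max(0.0, cwnd - BOTTLENECK) / BOTTLENECK
    reward = throughput - 50.0 * (rtt - BASE_RTT)  # throughput vs. latency
    return throughput, rtt, reward

# An epsilon-greedy agent over multiplicative window adjustments.
actions = [0.8, 1.0, 1.25]
q = {a: 0.0 for a in actions}
cwnd = 10.0
for t in range(2000):
    a = random.choice(actions) if random.random() < 0.1 else max(q, key=q.get)
    cwnd = max(1.0, cwnd * a)
    _, _, reward = step(cwnd)
    q[a] += 0.05 * (reward - q[a])  # incremental value estimate update
print(cwnd, q)
```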

Read more →

Privacy and Fairness in Model Partitioning

Deep learning sits at the forefront of many ongoing advances in a variety of learning tasks. Despite its supremacy in accuracy under benign environments, deep learning suffers from adversarial vulnerability and privacy leakage in adversarial environments. To mitigate privacy concerns in deep learning, the community has developed several distributed machine learning schemes, such as model partitioning, which divides a deep learning model into a client partition and a server partition. Instead of sharing data $x$ with the server, users feed their data into the client partition $F_{\theta}(\cdot)$ and send the output representations $z$ to the server. The server then feeds the representations into the server partition $f_{\phi}(\cdot)$ to make predictions.
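
A minimal sketch of this split forward pass in PyTorch follows, with a small fully connected network standing in for $F_{\theta}(\cdot)$ and $f_{\phi}(\cdot)$; the layer sizes are illustrative.

```python
# Model partitioning: the client runs F_theta, the server runs f_phi,
# and only the intermediate representation z crosses the network.
import torch

client_part = torch.nn.Sequential(      # F_theta, on the user's device
    torch.nn.Linear(784, 128), torch.nn.ReLU())
server_part = torch.nn.Sequential(      # f_phi, on the server
    torch.nn.Linear(128, 10))

x = torch.randn(1, 784)        # private data never leaves the client
z = client_part(x)             # only the representation z is transmitted
prediction = server_part(z)    # the server completes the forward pass
print(prediction.argmax(dim=-1))
```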

Read more →

Device Placement Optimization with Deep Reinforcement Learning

With the growth of machine learning, deep neural networks (DNNs) are now widely used in many real-world applications, and their sizes have grown exceedingly large. To train DNNs with hundreds of millions of parameters, it is common to use a cluster of accelerators, e.g., GPUs and TPUs, to speed up the training process. Thus, there is a need to coordinate these accelerators so that DNNs are trained efficiently, a problem referred to as device placement.
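
As a minimal sketch of what a single placement decision looks like, the PyTorch snippet below assigns two stages of a model to two devices, falling back to the CPU when fewer GPUs are available; the model and the placement itself are illustrative, not the output of a learned policy.

```python
# One device placement: two model stages on two devices, with
# activations crossing the device boundary between them.
import torch

dev0 = torch.device("cuda:0" if torch.cuda.device_count() > 0 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")

# The placement decision: which operations run on which accelerator.
stage0 = torch.nn.Linear(512, 512).to(dev0)
stage1 = torch.nn.Linear(512, 10).to(dev1)

x = torch.randn(8, 512, device=dev0)
h = stage0(x).to(dev1)   # activations cross the device boundary here
y = stage1(h)
print(y.shape)
```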

Read more →

Distributed Inference of Deep Learning Models

Deep learning models are typically deployed at remote cloud servers and require users to upload local data for inference, incurring considerable overhead in the time needed to transfer large volumes of data over the Internet. An intuitive solution to reduce such overhead is to offload these inference tasks from the cloud server to the edge devices. Unfortunately, edge devices are typically resource-constrained while the inference process is extremely computation-intensive. Directly using a deep learning model for inference on devices with limited computation power may result in an even longer inference time. For this reason, it is desirable to design distributed inference mechanisms that accelerate the inference process by partitioning the workload and distributing the partitions to a cluster of edge devices for cooperative inference.
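
A minimal sketch of such a partition is shown below: the layers of one model are assigned to a list of simulated edge devices, each executing its stage in turn; the device names, partition plan, and layer sizes are illustrative.

```python
# Partitioned inference: each simulated edge device runs a slice of the
# model's layers; in a real deployment the intermediate tensor would be
# shipped over the network between devices.
import torch

layers = [torch.nn.Linear(256, 256), torch.nn.ReLU(),
          torch.nn.Linear(256, 256), torch.nn.ReLU(),
          torch.nn.Linear(256, 10)]

# An illustrative partition plan: which layers each device executes.
partition = {"edge-0": layers[:2], "edge-1": layers[2:4], "edge-2": layers[4:]}

x = torch.randn(1, 256)
with torch.no_grad():
    for device_name, stage in partition.items():
        for layer in stage:
            x = layer(x)
        print(f"{device_name} finished its stage")
print(x.argmax(dim=-1))
```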

Read more →

Multi-Modal Federated Learning on Non-IID Data

Existing work in federated learning has focused on addressing uni-modal tasks, where training generally embraces one modality, such as images or texts. As a result, the global model is uni-modal: it contains a modality-specific neural network structure and uses samples from a specific modality as its input for training. It is intriguing to explore how federated learning can be extended effectively to the realm of multi-modal tasks, where models are trained with datasets from multiple modalities, such as images and texts. Though several existing studies proposed to train multi-modal models with federated learning, the specific challenges imposed by non-IID data distributions in the context of multi-modal federated learning, referred to as multi-modal FL, have not been explored.
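
As a minimal sketch of the kind of model multi-modal FL would train, the snippet below fuses a per-modality image encoder and text encoder into one classification head; all dimensions and names are illustrative assumptions.

```python
# A small multi-modal model: modality-specific encoders plus a fused head.
import torch

class MultiModalNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.image_encoder = torch.nn.Linear(1024, 64)  # modality-specific
        self.text_encoder = torch.nn.Linear(300, 64)    # modality-specific
        self.head = torch.nn.Linear(128, 10)            # fusion classifier

    def forward(self, image, text):
        z = torch.cat([self.image_encoder(image), self.text_encoder(text)], -1)
        return self.head(torch.relu(z))

model = MultiModalNet()
# Under non-IID multi-modal FL, some clients may only hold one modality,
# which is exactly where aggregating per-modality encoders becomes delicate.
print(model(torch.randn(2, 1024), torch.randn(2, 300)).shape)
```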

Read more →

Multi-Resource Scheduling

Queueing algorithms determine the order in which packets in various independent flows are processed, and serve as a fundamental mechanism for allocating resources in a network appliance. Traditional queueing algorithms make scheduling decisions in network switches that simply forward packets to their next hops, and link bandwidth is the only resource being allocated.
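
As a minimal sketch of this traditional single-resource setting, the snippet below implements deficit round-robin, a classic queueing algorithm that shares one link's bandwidth among independent flows; the flows, packet sizes, and quantum are illustrative.

```python
# Deficit round-robin: each flow earns a per-round credit (the quantum)
# and sends packets while its accumulated deficit covers them.
from collections import deque

flows = {"A": deque([300, 700, 200]), "B": deque([1500]), "C": deque([400, 400])}
deficit = {f: 0 for f in flows}
QUANTUM = 500  # bytes of service credit added per round

while any(flows.values()):
    for f, queue in flows.items():
        if not queue:
            continue
        deficit[f] += QUANTUM
        while queue and queue[0] <= deficit[f]:
            pkt = queue.popleft()
            deficit[f] -= pkt
            print(f"send {pkt}-byte packet from flow {f}")
        if not queue:
            deficit[f] = 0  # an empty flow forfeits leftover credit
```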

Read more →

Bandwidth Allocation in Datacenter Networks

Web service providers like Google and Facebook have built large-scale datacenters to host many computationally intensive applications, ranging from PageRank to machine learning. In order to efficiently process large volumes of data, these applications typically embrace data parallel frameworks, such as MapReduce. In general, data parallel applications proceed in several computation stages that require communication between them. During the communication stage, link bandwidth is heavily demanded by constituent flows to transfer the intermediate data across the datacenter network. With MapReduce as an example, input data is first partitioned into a set of splits, so that they can be processed in parallel with map computation tasks. The map tasks produce intermediate results, which are then shuffled over the datacenter network to be processed by reduce computation tasks.
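
A minimal sketch of this map/shuffle/reduce pattern, run in a single process for illustration; in a real deployment the shuffle stage is where intermediate data crosses the datacenter network and contends for link bandwidth.

```python
# Word count in the MapReduce pattern: map emits intermediate pairs,
# the shuffle groups them by key, and reduce aggregates each group.
from collections import defaultdict

splits = ["a b a", "b c", "a c c"]  # input partitioned into splits

# Map: each split is processed in parallel by a map task.
intermediate = [(word, 1) for split in splits for word in split.split()]

# Shuffle: intermediate results are grouped by key; this is the stage
# that transfers data across the network in a real cluster.
grouped = defaultdict(list)
for key, value in intermediate:
    grouped[key].append(value)

# Reduce: each reduce task aggregates the values for its keys.
counts = {key: sum(values) for key, values in grouped.items()}
print(counts)  # {'a': 3, 'b': 2, 'c': 3}
```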

Read more →