Research Topics

  • Coded Distributed Computing

Distributed computing has received wide interest because of the huge amount of data required by many popular applications such as machine learning. This topic uses coding theory to improve the performance of distributed computing in terms of computation complexity, communication cost, and delay. For example, the tradeoff between computation and communication cost is considered for MapReduce, the programming framework underlying the well-known distributed processing software Hadoop. As another example, efficient and reliable schemes for secure distributed matrix multiplication are proposed that enable low latency and cooperation among servers in any connected graph.
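As a minimal sketch of how coding can reduce delay, the Python snippet below applies a (3,2) MDS code to a matrix-vector product, so that the result can be recovered from any two of three workers even if the third is slow. This is a textbook-style illustration under simplifying assumptions (an even number of rows, no secrecy, no network model), not the schemes studied in this topic; the function names `encode`, `matvec`, and `decode` are hypothetical.

```python
def matvec(rows, x):
    """Multiply a block of rows by the vector x (the work done by one worker)."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def encode(A):
    """Split A into two halves and add one parity block A1 + A2.

    Assumes A has an even number of rows. Returns three blocks, one per worker.
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results):
    """Recover A @ x from the results of any two workers.

    `results` maps a worker index (0, 1, or 2) to that worker's output vector.
    """
    if len(results) < 2:
        raise ValueError("need results from at least two workers")
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        # Worker 1 is missing: (A1 + A2) x - A1 x = A2 x.
        y1 = results[0]
        y2 = [p - a for p, a in zip(results[2], y1)]
    else:
        # Worker 0 is missing: (A1 + A2) x - A2 x = A1 x.
        y2 = results[1]
        y1 = [p - b for p, b in zip(results[2], y2)]
    return y1 + y2
```

Each worker multiplies one block by `x`; the master calls `decode` as soon as any two results arrive, trading one extra worker's computation for tolerance of a single straggler.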

  • Coding for DNA Storage

DNA storage has attracted a lot of interest in recent years due to its ultra-high data density, long data retention, and low power consumption. However, because of the high raw data error rate, coding techniques are crucial to enable DNA storage in practical applications. This project investigates several coding techniques, such as LDPC codes and constrained codes, as well as new information representation methods using DNA sequences, such as composite DNA letters.
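To make the notion of a constrained code concrete, the sketch below checks whether a DNA sequence satisfies two constraints that are commonly discussed for DNA storage: a bound on the longest run of identical bases and a balanced GC content. The specific thresholds (`max_run=3`, GC content between 0.4 and 0.6) and the function names are assumptions chosen for illustration, not parameters taken from this project.

```python
def max_homopolymer_run(seq):
    """Length of the longest run of identical consecutive bases (0 for empty input)."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        prev = base
        longest = max(longest, run)
    return longest


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for empty input)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def satisfies_constraints(seq, max_run=3, gc_low=0.4, gc_high=0.6):
    """True if seq meets both the run-length and the GC-content constraint.

    The default thresholds are illustrative assumptions.
    """
    return (max_homopolymer_run(seq) <= max_run
            and gc_low <= gc_content(seq) <= gc_high)
```

A constrained encoder would map data only to sequences for which such a check passes; the checker alone does not construct the code.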

  • Private Information Retrieval

The goal of private information retrieval (PIR) is to allow a user to retrieve an arbitrary desired message out of K independent messages that are replicated across N distributed servers, without revealing any information about the identity of the desired message to any individual server. The problem of interest is how to achieve PIR using coding-theoretic or computational-theoretic methods, and what the associated communication cost is.
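The setting above can be illustrated with the classic two-server XOR scheme, sketched here for N = 2 non-colluding servers that each hold all K messages. It is a well-known introductory construction, not a result of this research, and it is not optimized for communication cost. Messages are modeled as Python integers, and the function names are hypothetical.

```python
import secrets


def make_queries(K, index):
    """Build one query per server for the message at position `index`.

    q1 is a uniformly random bit vector; q2 equals q1 with bit `index` flipped.
    Taken alone, each query is uniformly distributed, so a single server
    learns nothing about `index`.
    """
    q1 = [secrets.randbelow(2) for _ in range(K)]
    q2 = q1.copy()
    q2[index] ^= 1
    return q1, q2


def answer(database, query):
    """Server side: XOR together the messages selected by the query bits."""
    result = 0
    for msg, bit in zip(database, query):
        if bit:
            result ^= msg
    return result


def retrieve(database, index):
    """User side: query both servers and combine their answers.

    The two selected subsets differ only in message `index`, so every other
    message cancels in the XOR of the two answers.
    """
    q1, q2 = make_queries(len(database), index)
    return answer(database, q1) ^ answer(database, q2)
```

In this sketch the user uploads K bits to each server and downloads two message-sized answers to obtain one message, which is the kind of communication cost the research question seeks to reduce.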

  • Multi-version Coding and Shared Memory Emulation

The research objective is to study the fundamental problem of storing evolving information in distributed networks through coding-based distributed computing. When multiple versions of a message are to be written to a storage network, the project studies how to use erasure codes and distributed algorithms to tolerate communication and storage node failures and obtain consistent storage, while maintaining a low cost in terms of storage and communication sizes.

  • Regenerating Codes for Distributed Storage

In distributed storage, a file is stored in a set of nodes and protected by erasure-correcting codes. A regenerating code is a type of code with two properties: first, it can reconstruct the entire file in the presence of any r node erasures for some specified integer r; second, it can efficiently repair an erased node from any subset of the remaining nodes of a given size. In the repair process, the amount of information transmitted from each node, normalized by the storage size per node, is termed the repair bandwidth. The research objective is to study regenerating codes and other forms of codes for distributed storage in terms of storage size, repair bandwidth, locality (the number of helper nodes used to repair the erasure), and availability (the number of different ways to repair the erased nodes).
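As a rough numerical illustration of repair bandwidth, the sketch below evaluates the well-known minimum-storage regenerating (MSR) point of the cut-set bound and compares it with naive repair, in which the whole file is downloaded to rebuild one node. The notation is an assumption made for this example: `file_size` is the file size M, `k` is the number of nodes sufficient to reconstruct the file (k = n - r for n nodes tolerating r erasures), and `d` is the number of helper nodes contacted during repair.

```python
from fractions import Fraction


def msr_repair(file_size, k, d):
    """Storage and repair figures at the MSR point of the cut-set bound.

    alpha: storage per node, M / k
    beta: data sent by each helper, alpha / (d - k + 1)
    normalized_bandwidth: beta / alpha, as defined in the text above
    total_download: d * beta, total data received by the replacement node
    """
    if d < k:
        raise ValueError("the number of helpers d must be at least k")
    alpha = Fraction(file_size, k)
    beta = alpha / (d - k + 1)
    return {
        "alpha": alpha,
        "beta": beta,
        "normalized_bandwidth": beta / alpha,
        "total_download": d * beta,
    }


def naive_repair_download(file_size):
    """Naive repair: download k blocks of size M / k, that is, the whole file."""
    return Fraction(file_size)
```

For instance, with a file of 10 units, k = 10, and d = 13 helpers, each helper sends one quarter of what it stores and the replacement node downloads 13/4 units in total, compared with 10 units for naive repair.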

  • Compression of Genomic Data

The start of the 21st century has witnessed a dramatic decrease in the cost of DNA/RNA sequencing: the price of sequencing a human genome dropped from 2 billion dollars to about 1000 dollars. As a result, programs such as the Million Cancer Genome Warehouse and the ENCODE Project have either generated or are expected to generate genomic and functional genomic data on the order of hundreds of petabytes per year. However, current storage solutions and prices do not scale appropriately with such massive data surges. It is therefore of paramount importance to develop efficient and task-oriented compression methods for various emerging genomic data representations. The goal of the research is to find proper models to describe genomic data, and to design compression algorithms so that data can be stored efficiently and also easily processed by downstream applications.
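As a simple baseline for what compressing genomic data means in practice, the sketch below packs a nucleotide string into 2 bits per base, four bases per byte, instead of one byte per character. It assumes an alphabet of exactly A, C, G, and T, and it does not model the statistical structure that the research aims to exploit; the names `pack` and `unpack` are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack a string over {A, C, G, T} into bytes, four bases per byte.

    Returns (length, data); the length is needed to drop padding on decode.
    """
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        # Left-align a final partial chunk by padding with zero bits.
        byte <<= 2 * (4 - len(chunk))
        out.append(byte)
    return len(seq), bytes(out)


def unpack(length, data):
    """Invert pack(): read four 2-bit symbols per byte, then trim the padding."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 3])
    return "".join(bases[:length])
```

Because the packed form keeps a fixed number of bits per base, a downstream tool can locate any position directly without decompressing the whole sequence, which hints at the kind of tradeoff between storage and computability described above.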

Group Members

  • PhD Students

Alireza Javani

Wenkai Zhang

Aadhitya Satchidanandan

Previous Members

  • PhD Students

Marwen Zorgui (Qualcomm)

Weiqi Li (Microsoft)

Zhen Chen (Google)

Peng Fei (Stellar Cyber)

  • Master's Students

Xiaoran Li

Keqing Fu

  • Visiting Scholars

Jinyuan Chen