- Multi-version Coding and Shared Memory Emulation
The research objective is to study the fundamental problem of storing evolving information in distributed networks using coding-based distributed computing. When multiple versions of a message are written to a storage network, the project studies how erasure codes and distributed algorithms can tolerate communication and storage-node failures and provide consistent storage, while keeping storage and communication costs low.
- Regenerating Codes for Distributed Storage
In distributed storage, a file is stored across a set of nodes and protected by an erasure-correcting code. A regenerating code is a type of code with two properties: first, it can reconstruct the entire file in the presence of any r node erasures, for some specified integer r; second, it can efficiently repair an erased node from any subset of the remaining nodes of a given size. In the repair process, the amount of information transmitted from each helper node, normalized by the storage size per node, is termed the repair bandwidth. The research objective is to study regenerating codes and other codes for distributed storage in terms of storage size, repair bandwidth, locality (the number of helper nodes used to repair an erasure), and availability (the number of different ways to repair an erased node).
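The storage/repair-bandwidth trade-off can be made concrete with a short calculation. The sketch below assumes the standard parameterization from the regenerating-codes literature (Dimakis et al.): a file of size B on n nodes, any k of which suffice to rebuild it (so r = n - k), each node storing alpha, and a failed node repaired by downloading beta from each of d helpers. It evaluates the two classical extreme points of the cut-set bound, minimum-storage (MSR) and minimum-bandwidth (MBR), against naive MDS repair. The function names are hypothetical, and the "normalized" field follows the definition in the paragraph above (per-helper traffic divided by per-node storage).

```python
from fractions import Fraction


def cut_set_capacity(alpha, beta, k, d):
    """File size supported by the cut-set bound sum_{i<k} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def repair_profiles(B, n, k, d):
    """Per-node storage and repair traffic for a file of size B on n nodes.

    Assumes any k nodes suffice to rebuild the file (so r = n - k erasures are
    tolerated) and a failed node is repaired from d helpers.
    """
    if not k <= d <= n - 1:
        raise ValueError("need k <= d <= n - 1")
    msr_beta = Fraction(B, k * (d - k + 1))
    mbr_beta = Fraction(2 * B, k * (2 * d - k + 1))
    points = {
        # Classical MDS repair: k helpers each ship their whole node content.
        "naive MDS": (Fraction(B, k), Fraction(B, k), k),
        # Minimum-storage regenerating point.
        "MSR": (Fraction(B, k), msr_beta, d),
        # Minimum-bandwidth regenerating point: store exactly what you download.
        "MBR": (d * mbr_beta, mbr_beta, d),
    }
    profiles = {}
    for name, (alpha, beta, helpers) in points.items():
        # Every operating point must still support a file of size B.
        assert cut_set_capacity(alpha, beta, k, helpers) >= B
        profiles[name] = {
            "alpha": alpha,
            "beta": beta,
            "total_repair": helpers * beta,
            "normalized": beta / alpha,  # per-helper traffic / per-node storage
        }
    return profiles


for name, p in repair_profiles(B=12, n=6, k=3, d=5).items():
    print(f"{name:10s} alpha={p['alpha']} beta={p['beta']} "
          f"total={p['total_repair']} normalized={p['normalized']}")
```

With B = 12, n = 6, k = 3, d = 5, naive repair downloads the whole file (12 units), the MSR point keeps the same per-node storage (4) but downloads only 20/3, and the MBR point stores slightly more per node (5) to bring the repair download down to 5. In this setting the normalized repair bandwidth is 1/(d - k + 1) at MSR and 1/d at MBR.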
- Compression of Genomic Data
The start of the 21st century has witnessed a dramatic decrease in the cost of DNA/RNA sequencing: the price of sequencing a human genome dropped from 2 billion dollars to about 1000 dollars. As a result, programs such as the Million Cancer Genome Warehouse and the ENCODE Project have generated, or are expected to generate, genomic and functional genomic data on the order of hundreds of petabytes (PB) per year. However, current storage solutions and prices do not scale with such massive data surges. It is therefore of paramount importance to develop efficient, task-oriented compression methods for the various emerging representations of genomic data. The goal of this research is to find proper models for describing genomic data and to design compression algorithms so that the data can be stored efficiently while remaining easy to process in downstream applications.
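To illustrate what "stored efficiently but easily processed downstream" can mean, here is a minimal sketch of the well-known 2-bit nucleotide packing baseline, not a method from this research. It assumes reads contain only A, C, G and T (real data also has N symbols, quality scores and metadata). Packing cuts storage to a quarter of 8-bit text, and because every base sits at a fixed bit offset, a query such as GC counting can read individual bases without decompressing the whole sequence. The function names are hypothetical.

```python
# Baseline sketch (hypothetical names): 2-bit packing of an A/C/G/T-only string.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base (4 bases per byte)."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def base_at(packed: bytes, i: int) -> str:
    """Random access: decode the i-th base without unpacking the rest."""
    return BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11]


def unpack(packed: bytes, length: int) -> str:
    """Recover the original string; the length must be stored separately."""
    return "".join(base_at(packed, i) for i in range(length))


def gc_count(packed: bytes, start: int, end: int) -> int:
    """Downstream query on the packed form: count G/C bases in [start, end)."""
    return sum(base_at(packed, i) in "GC" for i in range(start, end))


packed = pack("ACGTTGCA")
print(len(packed), base_at(packed, 5), gc_count(packed, 0, 8))  # 2 G 4
```

Practical genomic compressors go well beyond this fixed-rate code, for example by modeling repeats or referencing a known genome, but the same tension between compression ratio and direct computability applies.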
- PhD Students