Research Interests

Clonal heterogeneity of ovarian cancer subtypes

Recent studies have indicated that individual ovarian cancer tumors are composed of multiple clones. The chronology of subtype differentiation therefore has important implications for subtype-specific treatment, because late events are less likely to be shared among sub-clones. If subtypes are polyclonal within a tumor, then treating one subtype cannot provide lasting remission. Using DNA sequencing and SNP array data from The Cancer Genome Atlas (TCGA), I am investigating this chronology to test the hypothesis that proposed ovarian cancer subtypes arise late in tumorigenesis and are not shared between intra-tumor clones.

Assessing the robustness of ovarian cancer subtypes and their relationship with patient outcome

Ovarian cancer is a molecularly heterogeneous disease in which clinically similar cases can exhibit dramatically different responses to treatment. Several major studies have identified potential ovarian cancer molecular subtypes, while several others have been unable to do so. Each study reporting subtypes has offered related but different definitions, and has presented limited and different datasets for validating subtype discreteness and association with patient outcome. Thus the robustness and clinical utility of transcriptome subtypes of high-grade serous ovarian cancer remain controversial. This project applies the major proposed subtypes consistently across all publicly available datasets and assesses them by comparative meta-analysis, evaluating their robustness and association with overall survival.

Integrative analysis of multi-assay genomic experiments

This project develops scalable R / Bioconductor software infrastructure and data resources to integrate complex, heterogeneous, and large cancer genomic experiments. The falling cost of genomic assays facilitates collection of multiple data types (e.g., gene and transcript expression, structural variation, copy number, methylation, and microRNA data) from a set of clinical specimens. Furthermore, substantial resources are now available from large consortium activities like The Cancer Genome Atlas (TCGA). Existing analysis pipelines focus on the treatment of a specific data type, leaving a critical need for tools for integrative analysis of multiple genomic assays for locally generated or publicly available data. R / Bioconductor has historically provided standardized genomic data structures and annotations that have enjoyed widespread adoption in the cancer genomics research community. This project adapts R / Bioconductor to meet the increasing conceptual and computational complexity of multi-assay cancer genomic experiments, and has produced the MultiAssayExperiment package for Bioconductor.

Human microbiome analysis for public health

I am interested in the human microbiome as an ongoing link between host and environment, and in its role in human health and disease. Together with the laboratory of Nicola Segata at the University of Trento, we developed the curatedMetagenomicData package for Bioconductor, which provides curated microbiome profiles for thousands of human-associated microbiomes. As part of the New York City Health and Nutrition Examination Study, we have profiled the oral microbiome of a representative sample of the population of NYC. This study has collected extensive lifestyle, health, and socio-demographic data, in addition to oral rinse specimens for microbiome analysis, from a randomized population-representative sample of 1,500 adults. Among other things, this study will evaluate changes to the oral cavity caused by a wide range of tobacco exposures (cigarette, secondhand smoke, hookah, and e-cigarette) in a racially and ethnically diverse, population-based sample of NYC adults.