Meta-Analysis in Gene Expression Studies

A book chapter by Markus Riester (Google Scholar) and me just came out (Pubmed, or free pre-print). It outlines the approach to meta-analysis of gene expression studies that we developed primarily while working on two ovarian cancer papers published in JNCI: Markus’s “Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples” and my “Comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer”. With more than 1.75 million gene expression profiles publicly available through the Gene Expression Omnibus, there are good reasons to integrate data from published studies. One is to reduce the risk of drawing conclusions tainted by study-specific artifacts; here is just one example of where that happened and resulted in a retracted ovarian cancer paper. Another is the ability to analyze larger samples, or more specific patient sub-groups, than was possible in any of the original studies alone.

There are plenty of methods around for synthesizing high-throughput gene expression studies (see e.g. this review), but our preferred method is old-fashioned meta-analysis: assume fixed or random effects across studies, and use the fitting and plotting features of the excellent metafor R package, a general-purpose meta-analysis package that has nothing to do with gene expression specifically. These “old-fashioned” approaches include methods for assessing heterogeneity between datasets, finding associations that are relatively consistent across datasets, and identifying outlier datasets. It’s not too hard to extend them from their normal use, synthesizing a single effect estimate, to synthesizing fold-change estimates for thousands of genes.
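
To make the idea concrete, here is a minimal sketch of what fixed- and random-effects pooling looks like when applied gene by gene. It is not the code from the chapter and it does not use metafor; it is a plain-Python illustration that assumes inverse-variance weighting, Cochran's Q and I² for heterogeneity, and the DerSimonian-Laird estimator for the between-study variance. The gene names, log fold-changes and variances are made up for illustration, and the function name meta_analyze is hypothetical.

```python
import math

def meta_analyze(effects, variances):
    """Pool per-study effect estimates (e.g. log2 fold-changes) for one gene.

    effects   -- one estimate per study
    variances -- the sampling variance of each estimate
    """
    k = len(effects)

    # Fixed-effect model: inverse-variance weighted mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q measures how much the studies disagree with the pooled value.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # DerSimonian-Laird estimate of the between-study variance (tau^2).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variation attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects model: same weighting, with tau^2 added to each variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    sum_w_re = sum(w_re)
    random_eff = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum_w_re

    return {
        "fixed": fixed,
        "se_fixed": math.sqrt(1.0 / sum_w),
        "random": random_eff,
        "se_random": math.sqrt(1.0 / sum_w_re),
        "Q": q,
        "tau2": tau2,
        "i2": i2,
    }

# Hypothetical input: per-gene (log fold-changes, variances) across three studies.
# These numbers are invented purely to exercise the function.
studies_by_gene = {
    "GENE_A": ([1.0, 1.0, 1.0], [0.1, 0.1, 0.1]),        # consistent across studies
    "GENE_B": ([2.0, -1.0, 0.5], [0.04, 0.04, 0.04]),    # heterogeneous
}

# The "extension to thousands of genes" is just the same model fit once per gene.
results = {
    gene: meta_analyze(effects, variances)
    for gene, (effects, variances) in studies_by_gene.items()
}

for gene, res in results.items():
    print(gene, {name: round(value, 3) for name, value in res.items()})
```

In this toy input the consistent gene gets a between-study variance of zero, so the fixed and random-effects answers coincide, while the heterogeneous gene gets a large I² and a wider random-effects standard error. In practice a dedicated package such as metafor also handles the plotting and other estimators; this sketch only shows the arithmetic underneath.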