The bioinformatics research team is composed of bioinformatics scientists and statistical analysts with master's or PhD degrees. The team has conducted breast cancer and pan-cancer bioinformatics studies, supported the data analysis needs of researchers in the institute, and collaborated with external scientists on team-science research involving oncology surgeons, pathologists, nurses, genomics scientists, and proteomics scientists. The following are a few highlights of our research projects.

Our bioinformatics team led the clinical data quality improvement effort of the TCGA breast cancer study. As part of the TCGA Breast Cancer Analysis Working Group (BCAWG), we co-authored the breast cancer marker paper, published in Nature in 2012, which provided a comprehensive molecular portrait of breast cancer. In subsequent years, our bioinformatics scientists continued to work with the rest of the BCAWG on a study of invasive lobular breast cancer, published in Cell in 2015. We then made significant contributions to a study of racial disparities between patients of African ancestry and European ancestry, published in JAMA Oncology in 2017, on which one of our team members was listed as a co-first author.

As data from 33 cancer types became available from TCGA, the PanCanAtlas studies started in mid-2015. Thirty AWGs were formed, grouped into three research themes: cell-of-origin patterns, oncogenic processes, and signaling pathways. Scientists from our institute participated in these studies and led the development of the TCGA Clinical Data Resource (TCGA-CDR), with a paper published in Cell. TCGA-CDR removed any doubt that TCGA clinical data were of value and provided guidance on how the clinical data from 11,160 patients across 33 cancer types should be used, adding tremendous value to the enormous amount of premium-quality cancer molecular data that is available. Publication of this paper was accompanied by a press release from NCI, which commented on this study as well as other major PanCanAtlas studies. The institute published a press release jointly with other contributing institutions. In addition to leading the TCGA-CDR study, our scientists contributed significantly to and co-authored four other studies. Our scientists were also part of the Cancer Genome Atlas Research Network, which is listed as an author on all other PanCanAtlas Network papers published in the Cell family of journals. As proud members of the TCGA Research Network, we herald the historic achievements of the program, which provides a foundation for cancer studies for many years to come. Researchers, clinicians, and ultimately patients will all benefit from the data and results generated by TCGA on the way to conquering human cancers.


  1. The Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast cancer. Nature 2012;490(7418):61-70.
  2. Ciriello G, Gatza ML, Beck AH, Wilkerson MD, Rhie SK, Pastore A, Zhang H, McLellan M, Yau C, Kandoth C, Bowlby R, Shen H, Hayat S, Fieldhouse R, Lester SC, Tse GM, Factor RE, Collins LC, Allison KH, Chen YY, Jensen K, Johnson NB, Oesterreich S, Mills GB, Cherniack AD, Robertson G, Benz C, Sander C, Laird PW, Hoadley KA, King TA; TCGA Research Network, Perou CM. Comprehensive Molecular Portraits of Invasive Lobular Breast Cancer. Cell. 2015 Oct 8;163(2):506-19.
  3. Huo D, Hu H, Rhie SK, Gamazon ER, Cherniack AD, Liu J, Yoshimatsu TF, Pitt JJ, Hoadley KA, Troester M, Ru Y, Lichtenberg T, Sturtz LA, Shelley CS, Benz CC, Mills GB, Laird PW, Shriver CD, Perou CM, Olopade OI. Comparison of Breast Cancer Molecular Features and Survival by African and European Ancestry in The Cancer Genome Atlas. JAMA Oncology. 2017 Dec 1;3(12):1654-1662. (Co-first Author)
  4. Liu J, Lichtenberg T, Hoadley KA, Poisson LM, Lazar AJ, Cherniack AD, Kovatich AJ, Benz CC, Levine DA, Lee AV, Omberg L, Wolf DM, Shriver CD, Thorsson V, Cancer Genome Atlas Research Network and Hu H. An Integrated TCGA Pan-Cancer Clinical Data Resource to Drive High-Quality Survival Outcome Analytics. Cell 2018;173(2):400-416 e11. (One of the 8 TCGA PanCanAtlas papers that collectively made the cover)
  5. Ge Z, Leighton JS, Wang Y, Peng X, Chen Z, Chen H, Sun Y, Yao F, Li J, Zhang H, Liu J, Shriver CD, Hu H, Cancer Genome Atlas Research Network, Piwnica-Worms H, Ma L and Liang H. Integrated Genomic Analysis of the Ubiquitin Pathway across Cancer Types. Cell Rep 2018;23(1):213-226 e3.
  6. Thorsson V, Gibbs DL, Brown SD, Wolf D, Bortone DS, Ou Yang TH, Porta-Pardo E, Gao GF, Plaisier CL, Eddy JA, Ziv E, Culhane AC, Paull EO, Sivakumar IKA, Gentles AJ, Malhotra R, Farshidfar F, Colaprico A, Parker JS, Mose LE, Vo NS, Liu J, Liu Y, Rader J, Dhankani V, Reynolds SM, Bowlby R, Califano A, Cherniack AD, Anastassiou D, Bedognetti D, Rao A, Chen K, Krasnitz A, Hu H, Malta TM, Noushmehr H, Pedamallu CS, Bullman S, Ojesina AI, Lamb A, Zhou W, Shen H, Choueiri TK, Weinstein JN, Guinney J, Saltz J, Holt RA, Rabkin CE, Cancer Genome Atlas Research Network, Lazar AJ, Serody JS, Demicco EG, Disis ML, Vincent BG and Shmulevich I. The Immune Landscape of Cancer. Immunity 2018;48:812-830.e14 (Cover paper of the journal)
  7. Taylor AM, Shih J, Ha G, Gao GF, Zhang X, Berger AC, Schumacher SE, Wang C, Hu H, Liu J, Lazar AJ, Cancer Genome Atlas Research Network, Cherniack AD, Beroukhim R and Meyerson M. Genomic and Functional Approaches to Understanding Cancer Aneuploidy. Cancer Cell 2018;33(4):676-689.e3
  8. Knijnenburg TA, Wang L, Zimmermann MT, Chambwe N, Gao GF, Cherniack AD, Fan H, Shen H, Way GP, Greene CS, Liu Y, Akbani R, Feng B, Donehower LA, Miller C, Shen Y, Karimi M, Chen H, Kim P, Jia P, Shinbrot E, Zhang S, Liu J, Hu H, Bailey MH, Yau C, Wolf D, Zhao Z, Weinstein JN, Li L, Ding L, Mills GB, Laird PW, Wheeler DA, Shmulevich I, Cancer Genome Atlas Research Network, Monnat RJ, Jr., Xiao Y and Wang C. Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas. Cell Rep 2018;23(1):239-254 e6.

Beneficiaries of the DoD healthcare system have equal access to health care, which provides a unique opportunity to study disparities in patient treatment and survival that might be due to socioeconomic factors, including access to care. As a baseline study, once clinical outcome data were available for CBCP patients treated at WRNMMC, we compared the survival outcomes of these patients with those of matched patients recorded in the Surveillance, Epidemiology, and End Results (SEER) Program. Our results indicate that, overall, patients treated at WRNMMC were less likely to die from breast cancer. This survival advantage was also significant in African American patients (HR=0.524, 95% CI=0.277-0.992; P=0.047) and in patients older than 50 years. This study was highlighted on the cover of the journal Military Medicine. These initial observations laid the foundation for possible future disparity studies on factors such as race and age at diagnosis.

Reference: Ru Y, Liu J, Fantacone-Campbell JL, Zhu K, Kovatich AJ, Hooke JA, Kvecher L, Deyarmin B, Kovatich AW, Cammarata F, Hueman MT, Rui H, Mural RJ, Shriver CD, and Hu H. Survival comparative analysis of breast cancer patients treated by a military medical center and matched patients of the U.S. general population. Mil Med. 2017 Nov;182(11):e1851-e1858.
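
The hazard ratio, confidence interval, and P value reported above for African American patients can be checked against one another. The sketch below is an illustration only, not the analysis used in the study: it assumes the interval is a Wald-type interval that is symmetric on the log scale and that the P value comes from a two-sided normal test. The function name `wald_check` is hypothetical.

```python
import math

def wald_check(hr, lower, upper, z_crit=1.959964):
    """Recover the standard error, z statistic, and two-sided P value
    implied by a hazard ratio and its 95% confidence interval.

    Assumption (not stated in the source): the interval is
    exp(log(HR) +/- z_crit * SE), i.e. symmetric on the log scale.
    """
    log_hr = math.log(hr)
    # Half the width of the interval on the log scale equals z_crit * SE.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    # Two-sided P value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# Values reported in the text for African American patients.
se, z, p = wald_check(hr=0.524, lower=0.277, upper=0.992)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, P = {p:.3f}")
```

Under these assumptions the recovered P value is about 0.047, which is consistent with the reported value.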

PCA-PAM50 improves consistency between breast cancer intrinsic and clinical subtyping reclassifying a subset of luminal A tumors as luminal B

This study presents a method, termed PCA-PAM50, for intrinsic subtyping that enhances the consistency between gene expression-based PAM50 calls and protein-based IHC subtyping. Gene expression-based subtyping is widely used in research, whereas IHC-based subtyping is used for clinical intervention, and inconsistencies between the two subtype calls hinder the translation of research findings into clinical utility. We evaluated our approach on three primary breast cancer datasets: in-house RNA-Seq, TCGA RNA-Seq, and METABRIC microarray. With PCA-PAM50, consistency between intrinsic and clinical subtyping improved in all three cohorts. In particular, consistency for luminal B (LB) cases increased by 25-49%. In the TCGA-BC cohort, a subset of luminal A (LA) tumors was reclassified as LB; these cases had significantly worse survival outcomes than the cases that remained LA, and it is this subset of patients that produced the significant outcome difference between the newly classified LA and LB subtype cases. Furthermore, the switched cases showed significantly higher expression of the MKI67 gene. Taken together, PCA-PAM50 makes gene expression-based breast cancer subtyping more clinically relevant.


Reference: Raj-Kumar, P. K., Liu, J., Hooke, J. A., Kovatich, A. J., Kvecher, L., Shriver, C. D., & Hu, H. (2019). PCA-PAM50 improves consistency between breast cancer intrinsic and clinical subtyping reclassifying a subset of luminal A tumors as luminal B. Scientific Reports, 9(1), 7956.
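
One simple way to quantify consistency between clinical and intrinsic subtype calls is per-subtype agreement: of the cases assigned a given subtype by IHC, the fraction that receive the same intrinsic call. The sketch below assumes that definition, which may differ from the measure used in the paper. The function name and the toy labels are hypothetical and are not study data.

```python
from collections import Counter

def per_subtype_agreement(clinical, intrinsic):
    """For each clinical (IHC) subtype, return the fraction of cases
    whose intrinsic (gene expression-based) call is the same subtype."""
    if len(clinical) != len(intrinsic):
        raise ValueError("label lists must have the same length")
    totals = Counter(clinical)
    matches = Counter(c for c, i in zip(clinical, intrinsic) if c == i)
    return {subtype: matches[subtype] / totals[subtype] for subtype in totals}

# Toy labels only (hypothetical): LA = luminal A, LB = luminal B.
ihc_calls     = ["LA", "LA", "LB", "LB", "LB", "LB"]
calls_before  = ["LA", "LA", "LA", "LA", "LB", "LB"]
calls_after   = ["LA", "LA", "LA", "LB", "LB", "LB"]

print(per_subtype_agreement(ihc_calls, calls_before))
print(per_subtype_agreement(ihc_calls, calls_after))
```

In this toy example, moving one LA call to LB raises LB agreement from 0.5 to 0.75 while LA agreement is unchanged.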

Through years of translational research, our biomedical informatics scientists gained first-hand knowledge of the field, which prompted them to write a book entitled “Biomedical Informatics in Translational Research”. As the publisher wrote, “This groundbreaking resource on biomedical informatics gives you step-by-step insight into innovative techniques for integrating and federating data from clinical and high-throughput molecular study platforms as well as from the public domain. It details how to apply computational and statistical technologies to clinical, genomic, and proteomic studies to enhance data collection, tracking, storage, visualization, analysis, and knowledge discovery processes, and to translate knowledge from “bench to bedside” and “bedside to bench” with never-before efficiency.”

Reference: Hai Hu, Richard J. Mural, and Michael N. Liebman (Editors). Biomedical Informatics in Translational Research. Artech Publishing House. 2008.


PCA-PAM50 performs intrinsic subtyping of breast cancer gene expression data. It leverages principal component analysis (PCA) and iterative PAM50 calls to derive a gene expression-based estrogen receptor (ER) status for each sample. This yields an ER-balanced subset of samples, which is used automatically for gene centering and the subsequent intrinsic subtype calls. As a result, PCA-PAM50 can perform intrinsic subtyping in cohorts with unbalanced ER status.
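
The gene-centering step described above can be sketched as follows. This is a minimal illustration, not the PCA-PAM50 implementation: it assumes ER status is already known for every sample (in PCA-PAM50 it is derived from PCA and iterative PAM50 calls), assumes a 1:1 ER-positive to ER-negative balance, and uses a hypothetical function name, `er_balanced_center`.

```python
import numpy as np

def er_balanced_center(expr, er_positive, seed=0):
    """Center each gene on its median over an ER-balanced subset of samples.

    expr        : (n_samples, n_genes) array of log-scale expression values
    er_positive : (n_samples,) boolean array, True for ER-positive samples
    Returns the centered matrix and the indices of the balanced subset.
    """
    expr = np.asarray(expr, dtype=float)
    er_positive = np.asarray(er_positive, dtype=bool)
    pos_idx = np.flatnonzero(er_positive)
    neg_idx = np.flatnonzero(~er_positive)
    n = min(len(pos_idx), len(neg_idx))
    if n == 0:
        raise ValueError("need at least one ER-positive and one ER-negative sample")
    # Downsample the larger group so both groups contribute n samples
    # (the 1:1 ratio is an assumption of this sketch).
    rng = np.random.default_rng(seed)
    subset = np.concatenate([
        rng.choice(pos_idx, size=n, replace=False),
        rng.choice(neg_idx, size=n, replace=False),
    ])
    # Per-gene medians come from the balanced subset only,
    # but are subtracted from every sample in the cohort.
    medians = np.median(expr[subset], axis=0)
    return expr - medians, np.sort(subset)

# Toy cohort (hypothetical): one gene, 6 ER-positive and 2 ER-negative samples.
expr = np.array([[10.0]] * 6 + [[0.0]] * 2)
er = np.array([True] * 6 + [False] * 2)
centered, subset = er_balanced_center(expr, er)
print(centered.ravel())
```

In the toy cohort, the whole-cohort median of the gene is 10, so ordinary median centering would place all ER-positive samples at 0. Centering on the balanced subset uses a median of 5 instead, which places the two groups symmetrically at +5 and -5.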
