Our Recent Research Areas Include:

I. Algorithms and tools

INTEGRATE: gene fusion discovery using whole genome and whole transcriptome data

Overview of INTEGRATE

While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, the challenge remains to ensure that causative mutations are not missed while minimizing false positives. Many computational tools currently predict structural variations (SVs) and gene fusions using whole-genome sequencing (WGS) and transcriptome sequencing (RNA-seq) data separately. However, because WGS and RNA-seq each have limitations when used independently, we hypothesized that the orthogonal validation gained by integrating both data types could yield a sensitive and specific approach for high-confidence gene fusion prediction. Fortunately, decreasing NGS costs have resulted in a growing number of patients with both data types available. We therefore developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cancer cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent targeted validation, leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE missed only six of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions, including a subset involving the estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.

Zhang J, et al. Genome Research 2016. PMID: 26556708

Best paper in Bioinformatics and Translational Informatics – IMIA Yearbook of Medical Informatics 2017
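The idea of orthogonal support from two data types can be illustrated with a small sketch. This is not INTEGRATE's algorithm or code: the `Breakend` and `Candidate` types, the `supported_by_both` function, the `window` parameter, and all coordinates are hypothetical and chosen only for illustration. It keeps an RNA-seq fusion candidate when a WGS structural-variant call has both ends nearby, using a wide window on the assumption that genomic breakpoints often fall in introns some distance from the spliced junction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakend:
    chrom: str
    pos: int


@dataclass(frozen=True)
class Candidate:
    left: Breakend
    right: Breakend


def near(a: Breakend, b: Breakend, window: int) -> bool:
    """True when both breakends are on the same chromosome within `window` bp."""
    return a.chrom == b.chrom and abs(a.pos - b.pos) <= window


def supported_by_both(rna_fusions, wgs_svs, window=100_000):
    """Return RNA-seq fusion candidates that have a nearby WGS SV call.

    `window` is an illustrative assumption, not a published parameter.
    """
    supported = []
    for fusion in rna_fusions:
        for sv in wgs_svs:
            same = near(fusion.left, sv.left, window) and near(fusion.right, sv.right, window)
            swapped = near(fusion.left, sv.right, window) and near(fusion.right, sv.left, window)
            if same or swapped:
                supported.append(fusion)
                break  # one supporting SV is enough for this candidate
    return supported


# Toy coordinates, invented for the example.
rna = [
    Candidate(Breakend("chr1", 1_000_000), Breakend("chr8", 5_000_000)),
    Candidate(Breakend("chr2", 2_000_000), Breakend("chr3", 3_000_000)),
]
wgs = [Candidate(Breakend("chr8", 5_020_000), Breakend("chr1", 990_000))]
both = supported_by_both(rna, wgs)
```

In this toy input only the first candidate has a matching WGS call, so only it would be retained as supported by both data types.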

HPV-EM: an accurate HPV detection and genotyping EM algorithm

Overview of the HPV-EM tool

Accurate HPV genotyping is crucial for epidemiology studies, vaccine trials, and HPV-related cancer research. Contemporary HPV genotyping assays detect only <25% of all known HPV genotypes and are not accurate for low-risk or mixed HPV genotypes. Current genomic HPV genotyping algorithms use a simple read-alignment and filtering strategy that has difficulty handling repeats and homologous sequences. Therefore, we have developed an optimized expectation–maximization algorithm, designated HPV-EM, to address the ambiguities caused by repetitive sequencing reads. HPV-EM achieved 97–100% accuracy when benchmarked using cell line data and TCGA cervical cancer data. We also validated HPV-EM using DNA tiling data on an institutional cervical cancer cohort (96.5% accuracy). Using HPV-EM, we demonstrated HPV genotypic differences in recurrence and patient outcomes in cervical and head and neck cancers.

Inkman M, et al. Scientific Reports 2020. PMID: 32868873
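To show how an expectation–maximization scheme can resolve reads that align equally well to more than one genotype, here is a minimal, generic sketch. It is not the HPV-EM implementation: the function name `em_genotype_proportions`, its parameters, and the toy read assignments are assumptions made for illustration, and the model treats every candidate alignment of a read as equally likely apart from the current genotype proportions.

```python
def em_genotype_proportions(read_hits, n_iter=200, tol=1e-9):
    """Estimate genotype proportions from reads with ambiguous alignments.

    read_hits: one collection of candidate genotype names per read.
    Returns a dict mapping genotype -> estimated proportion.
    """
    reads = [set(hits) for hits in read_hits if hits]  # drop unaligned reads
    genotypes = sorted({g for hits in reads for g in hits})
    if not genotypes:
        return {}
    # Start from a uniform guess.
    theta = {g: 1.0 / len(genotypes) for g in genotypes}
    for _ in range(n_iter):
        counts = dict.fromkeys(genotypes, 0.0)
        # E-step: split each read across its candidates in proportion to theta.
        for hits in reads:
            total = sum(theta[g] for g in hits)
            if total == 0.0:
                continue
            for g in hits:
                counts[g] += theta[g] / total
        # M-step: re-estimate proportions from the fractional read counts.
        n = sum(counts.values())
        new_theta = {g: counts[g] / n for g in genotypes}
        delta = max(abs(new_theta[g] - theta[g]) for g in genotypes)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy data: 6 reads unique to HPV16, 2 unique to HPV18, 2 ambiguous.
toy_reads = [{"HPV16"}] * 6 + [{"HPV18"}] * 2 + [{"HPV16", "HPV18"}] * 2
proportions = em_genotype_proportions(toy_reads)
```

For this toy input the ambiguous reads are shared according to the evidence from uniquely aligned reads, so the estimates settle near 0.75 for HPV16 and 0.25 for HPV18.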

More algorithms and tools:

INTEGRATE (Genome Research 2016); INTEGRATE-Neo (Bioinformatics 2017); INTEGRATE-Vis (Scientific Reports 2017); SVseq2 (BMC Bioinformatics 2012); SVseq (Bioinformatics)

II. Applications of gene fusions and structural variations

Functional Annotation of ESR1 Gene Fusions in Estrogen Receptor-Positive Breast Cancer

ESR1 gene fusions promote endocrine therapy-resistant cell proliferation and metastasis.

More on gene fusions and SVs

SMC-RNA DREAM Challenge (Cell Systems 2021); ESR1 EMT (Cell Reports 2018); ESR1 Treatment Resistance (Cell Reports 2013); Prostate Cancer (Cell 2018); mFL-HCC (Annals of Oncology 2016); Adult B-lymphoblastic leukemia (Experimental Hematology 2016)

III. HPV-related cancer biology, metabolism, and radiogenomics

Integrating imaging and RNA-seq improves outcome prediction in cervical cancer

Graphical abstract

Approaches using a single type of data have been applied to classify human tumors. Here we integrate imaging features and transcriptomic data using a prospectively collected tumor bank. We demonstrate that increased maximum standardized uptake value on pretreatment 18F-fluorodeoxyglucose-positron emission tomography correlates with epithelial-to-mesenchymal transition (EMT) gene expression. We derived and validated 3 major molecular groups, namely squamous epithelial, squamous mesenchymal, and adenocarcinoma, using prospectively collected institutional (n = 67) and publicly available (n = 304) data sets. Patients with tumors of the squamous mesenchymal subtype showed inferior survival outcomes compared with the other 2 molecular groups. High mesenchymal gene expression in cervical cancer cells positively correlated with the capacity to form spheroids and with resistance to radiation. CaSki organoids were radiation-resistant but sensitive to the glycolysis inhibitor, 2-DG. These experiments provide a strategy for response prediction by integrating large data sets, and highlight the potential for metabolic therapy to influence EMT phenotypes in cervical cancer.

Zhang J, et al. Journal of Clinical Investigation 2021. PMID: 33645544

From QuadShot News: A hot mes | Zhang, J Clin Invest 2021

More on cervical cancer biology, metabolism, and radiogenomics:

HPV and CRT (JCI Insight 2021); SUVmax and Macrophage (Clinical Cancer Research 2021); Glutaminase Inhibitors (Molecular Cancer Therapeutics 2020); Neutrophils (PNAS 2019); SUVmax/Radiogenomics (ASTRO 2018)

IV. Non-coding RNAs

Comprehensive discovery of noncoding RNAs in acute myeloid leukemia cell transcriptomes

Non-coding small RNAs with different lengths

Zhang J, et al. Experimental Hematology 2017. PMID: 28760689

Featured in: RNA Biology Blog

Multi-institutional Analysis Shows that Low PCAT-14 Expression Associates with Poor Outcomes in Prostate Cancer


Integrative analysis reveals Prostate Cancer Associated Transcript-14 (PCAT-14) expression associates with prostate cancer

More on non-coding RNA:

PCAT-14 (European Urology 2017); Mid-sized RNA (Experimental Hematology 2017); cDNA Capture (The Journal of Molecular Diagnostics 2014)

V. Population genetics and cancer evolution

Tumor Evolution (Science Advances 2020); Rare Variants (BMC Genomics 2013); Haplotype Inference (PSB 2011); SNP and Logic Regression (ISCABS 2011)

VI. Cancers, immunology, radiation therapy, and other clinical applications

Cervical cancer (JCI Insight 2021; Clinical Cancer Research 2021; Journal of Clinical Investigation 2021; Molecular Cancer Therapeutics 2020; Scientific Reports 2020; PNAS 2019; ASTRO 2018)
Breast cancer (Cell Reports 2018; Genome Research 2016; Cell Reports 2013)
Prostate cancer (Journal for ImmunoTherapy of Cancer 2020; Cell 2018; Bioinformatics 2017; Scientific Reports 2017; European Urology 2017; The Journal of Molecular Diagnostics 2014)
Colorectal cancer (Science Advances 2020)
Head and neck cancer (Scientific Reports 2020)
Leukemia (Experimental Hematology 2017; Experimental Hematology 2016)
Lung cancer (The Journal of Molecular Diagnostics 2014)
mFL-HCC (Annals of Oncology 2016)
Immunology (Journal for ImmunoTherapy of Cancer 2020; Bioinformatics 2017)

VII. Machine learning and deep learning

Tumor Evolution (Science Advances 2020); SVM in SV (PLoS One 2014); Synchronization Detection (BMC Genomics 2013); Probabilistic method (BMC Genomics 2013); Logic Regression (ISCABS 2011); Haplotype Inference (PSB 2011)

Machine learning (ML) and deep learning (DL) are among our lab’s current research focuses. Please refer to News for ongoing grants and projects.

Please also refer to the full list of publications.