Our Recent Research Areas Include:
I. Algorithms and tools
While next-generation sequencing (NGS) has become the primary technology for discovering gene fusions, we still face the challenge of ensuring that causative mutations are not missed while minimizing false positives. Currently, many computational tools predict structural variations (SVs) and gene fusions using whole-genome sequencing (WGS) and transcriptome sequencing (RNA-seq) data separately. However, as both WGS and RNA-seq have limitations when used independently, we hypothesized that the orthogonal validation gained by integrating both data types could yield a sensitive and specific approach for producing high-confidence gene fusion predictions. Fortunately, decreasing NGS costs have resulted in a growing number of patients with both data types available. We therefore developed a gene fusion discovery tool, INTEGRATE, that leverages both RNA-seq and WGS data to reconstruct gene fusion junctions and genomic breakpoints by split-read mapping. To evaluate INTEGRATE, we compared it with eight additional gene fusion discovery tools using the well-characterized breast cancer cell line HCC1395 and peripheral blood lymphocytes derived from the same patient (HCC1395BL). The predictions subsequently underwent targeted validation, leading to the discovery of 131 novel fusions in addition to the seven previously reported fusions. Overall, INTEGRATE missed only six of the 138 validated fusions and had the highest accuracy of the nine tools evaluated. Additionally, we applied INTEGRATE to 62 breast cancer patients from The Cancer Genome Atlas (TCGA) and found multiple recurrent gene fusions, including a subset involving the estrogen receptor. Taken together, INTEGRATE is a highly sensitive and accurate tool that is freely available for academic use.
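The idea of orthogonal validation can be illustrated with a minimal sketch. This is not INTEGRATE's implementation: the `FusionCall` and `SVCall` records, the `min_split_reads` threshold, and the placeholder gene names are all assumptions made for illustration. The sketch keeps an RNA-seq fusion candidate only when a WGS structural variant connects the same gene pair, and it computes sensitivity from the validation counts quoted above (132 of 138 validated fusions detected).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    """Hypothetical RNA-seq fusion candidate (not INTEGRATE's data model)."""
    gene5: str        # 5' partner gene
    gene3: str        # 3' partner gene
    split_reads: int  # reads spanning the fusion junction


@dataclass(frozen=True)
class SVCall:
    """Hypothetical WGS structural variant joining two genes."""
    gene_a: str
    gene_b: str


def orthogonally_supported(rna_calls, wgs_calls, min_split_reads=2):
    """Keep RNA-seq candidates that also have a WGS breakpoint for the same gene pair."""
    # Gene order is ignored on the DNA side, so store each pair as a frozenset.
    wgs_pairs = {frozenset((sv.gene_a, sv.gene_b)) for sv in wgs_calls}
    return [
        call for call in rna_calls
        if call.split_reads >= min_split_reads
        and frozenset((call.gene5, call.gene3)) in wgs_pairs
    ]


def sensitivity(true_positives, false_negatives):
    """Fraction of validated events that were detected."""
    return true_positives / (true_positives + false_negatives)


# Placeholder gene names; these are not real predictions.
rna = [
    FusionCall("GENE_A", "GENE_B", 5),  # supported by WGS below
    FusionCall("GENE_C", "GENE_D", 1),  # too few split reads
    FusionCall("GENE_E", "GENE_F", 7),  # no matching WGS breakpoint
]
wgs = [SVCall("GENE_B", "GENE_A"), SVCall("GENE_C", "GENE_D")]

high_confidence = orthogonally_supported(rna, wgs)
detected_fraction = sensitivity(132, 6)  # 138 validated fusions, 6 missed
```

With the counts reported above, `detected_fraction` is 132/138, or roughly 95.7%. A real pipeline would match breakpoints by genomic coordinate rather than by gene name alone.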
Accurate HPV genotyping is crucial for epidemiological studies, vaccine trials, and HPV-related cancer research. Contemporary HPV genotyping assays detect fewer than 25% of all known HPV genotypes and are not accurate for low-risk or mixed HPV genotypes. Current genomic HPV genotyping algorithms use a simple read-alignment and filtering strategy that has difficulty handling repeats and homologous sequences. We have therefore developed an optimized expectation–maximization algorithm, designated HPV-EM, to address the ambiguities caused by repetitive sequencing reads. HPV-EM achieved 97–100% accuracy when benchmarked using cell line data and TCGA cervical cancer data. We also validated HPV-EM using DNA tiling data on an institutional cervical cancer cohort (96.5% accuracy). Using HPV-EM, we demonstrated HPV genotypic differences in recurrence and patient outcomes in cervical and head and neck cancers.
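To show how expectation–maximization can resolve reads that align equally well to several genotypes, here is a minimal sketch of a generic mixture-model EM. It is not the HPV-EM implementation: the function name `em_abundances`, the input format (one set of compatible genotypes per read), and the toy read counts are assumptions for illustration only.

```python
from collections import defaultdict


def em_abundances(read_hits, n_iter=200, tol=1e-9):
    """Estimate genotype proportions from ambiguously aligned reads.

    read_hits: list of sets; each set holds the genotypes one read
    aligns to equally well (hypothetical input format).
    """
    reads = [hits for hits in read_hits if hits]  # ignore unaligned reads
    if not reads:
        return {}
    genotypes = sorted({g for hits in reads for g in hits})
    # Start from a uniform guess over all observed genotypes.
    theta = {g: 1.0 / len(genotypes) for g in genotypes}

    for _ in range(n_iter):
        # E-step: split each read across its compatible genotypes
        # in proportion to the current abundance estimates.
        counts = defaultdict(float)
        for hits in reads:
            total = sum(theta[g] for g in hits)
            for g in hits:
                counts[g] += theta[g] / total
        # M-step: new abundances are the normalized fractional counts.
        new_theta = {g: counts[g] / len(reads) for g in genotypes}
        delta = max(abs(new_theta[g] - theta[g]) for g in genotypes)
        theta = new_theta
        if delta < tol:
            break
    return theta


# Toy data: 6 reads unique to HPV16, 2 unique to HPV18, 4 shared by both.
toy_reads = [{"HPV16"}] * 6 + [{"HPV18"}] * 2 + [{"HPV16", "HPV18"}] * 4
estimates = em_abundances(toy_reads)
```

For this toy input the estimates converge toward 0.75 for HPV16 and 0.25 for HPV18: the four shared reads are apportioned according to the evidence from uniquely aligned reads rather than being discarded or split evenly.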
More algorithms and tools:
II. Applications of gene fusions and structural variations
ESR1 gene fusions promote endocrine therapy resistant cell proliferation and metastasis.
III. HPV-related cancer biology, metabolism, and radiogenomics
Approaches using a single type of data have been applied to classify human tumors. Here we integrate imaging features and transcriptomic data using a prospectively collected tumor bank. We demonstrate that increased maximum standardized uptake value on pretreatment 18F-fluorodeoxyglucose-positron emission tomography correlates with epithelial-to-mesenchymal transition (EMT) gene expression. We derived and validated 3 major molecular groups, namely squamous epithelial, squamous mesenchymal, and adenocarcinoma, using prospectively collected institutional (n = 67) and publicly available (n = 304) data sets. Patients with tumors of the squamous mesenchymal subtype showed inferior survival outcomes compared with the other 2 molecular groups. High mesenchymal gene expression in cervical cancer cells positively correlated with the capacity to form spheroids and with resistance to radiation. CaSki organoids were radiation-resistant but sensitive to the glycolysis inhibitor, 2-DG. These experiments provide a strategy for response prediction by integrating large data sets, and highlight the potential for metabolic therapy to influence EMT phenotypes in cervical cancer.
From QuadShot News: A hot mes | Zhang, J Clin Invest 2021
More on cervical cancer biology, metabolism, and radiogenomics:
HPV and CRT (JCI Insight 2021); SUVmax and Macrophage (Clinical Cancer Research 2021); Glutaminase Inhibitors (Molecular Cancer Therapeutics 2020); Neutrophils (PNAS 2019); SUVmax/Radiogenomics (ASTRO 2018)
IV. Non-coding RNAs
Featured in: RNA Biology Blog
Multi-institutional Analysis Shows that Low PCAT-14 Expression Associates with Poor Outcomes in Prostate Cancer
More on non-coding RNA:
V. Population genetics and cancer evolution
VI. Cancers, immunology, radiation therapy, and other clinical applications
Cervical cancer (JCI Insight 2021; Clinical Cancer Research 2021; Journal of Clinical Investigation 2021; Molecular Cancer Therapeutics 2020; Scientific Reports 2020; PNAS 2019; ASTRO 2018)
Breast cancer (Cell Reports 2018; Genome Research 2016; Cell Reports 2013)
Prostate cancer (Journal for ImmunoTherapy of Cancer 2020; Cell 2018; Bioinformatics 2017; Scientific Reports 2017; European Urology 2017; The Journal of Molecular Diagnostics 2014)
Colorectal cancer (Science Advances 2020)
Head and neck cancer (Scientific Reports 2020)
Leukemia (Experimental Hematology 2017; Experimental Hematology 2016)
Lung cancer (The Journal of Molecular Diagnostics 2014)
mFL-HCC (Annals of Oncology 2016)
Immunology (Journal for ImmunoTherapy of Cancer 2020; Bioinformatics 2017)
VII. Machine learning and deep learning
Tumor Evolution (Science Advances 2020); SVM in SV (PLoS One 2014); Synchronization Detection (BMC Genomics 2013); Probabilistic method (BMC Genomics 2013); Logic Regression (ISCABS 2011); Haplotype Inference (PSB 2011)
Machine learning (ML) and deep learning (DL) are among our lab's most active current research areas. Please refer to News for ongoing grants and projects.
Please also refer to the full list of publications.