Dr Dapeng Wang


I am a Senior Bioinformatics Research Officer at LeedsOmics at the University of Leeds. I develop, maintain and apply automated pipelines and workflows for omics datasets generated by technologies spanning genomics (DNA-Seq, exome-Seq, phylogenomics and RAD-Seq), transcriptomics (bulk RNA-Seq and single-cell RNA-Seq), translatomics (Ribo-Seq), epigenomics (BS-Seq and ChIP-Seq), proteomics and metabolomics. I also have particular research interests in database construction for big biological data, genome evolution (composition, intronic sequences and repetitive sequences), the evolutionary patterns of plastic transcriptomes and gene regulatory networks across tissues or conditions, and population-level variants (SNP, InDel, copy number variation, duplication, inversion and translocation). I am also keen to use machine learning and deep learning techniques to understand and explore complex biological systems by integrating a variety of omics datasets.

I received a bachelor’s degree in mathematics from Shandong University (Jul. 2006) and a PhD in bioinformatics from the Beijing Institute of Genomics of the Chinese Academy of Sciences (Jul. 2011). After graduating, I continued to conduct research at the same institute (Jul. 2011-Feb. 2014) and then moved to the UK to work in the Cancer Institute at University College London (Mar. 2014-Jan. 2016) and the Department of Plant Sciences at the University of Oxford (Feb. 2016-Jan. 2018).

I am a trained and experienced bioinformatician with over 10 years in genomics and bioinformatics research projects in the UK and China. I have in-depth knowledge of genome biology and a broad set of bioinformatics skills for processing large-scale biological data. I am very familiar with popular molecular quantification technologies, including arrays and sequencing, and with state-of-the-art toolkits, which I have used to develop a number of next-generation sequencing pipelines and workflows for exome-Seq, RNA-Seq, RAD-Seq, ChIP-Seq, single-cell analysis, BS-Seq and Ribo-Seq. In addition, I have established several LAMP-based databases and web servers that apply computing techniques to contemporary computational biology. They integrate a large number of genomes covering all major life forms on our planet and classify them within a well-characterized taxonomic system, providing foundations and hypotheses for exploring the evolution of genome architecture from both two-dimensional and three-dimensional points of view.

Latest publications:

Wang D. GCevobase: an evolution-based database for GC content in eukaryotic genomes. Bioinformatics. 2018 Feb 6.

Wang D. hppRNA—a Snakemake-based handy parameter-free pipeline for RNA-Seq analysis of numerous samples. Briefings in Bioinformatics. bbw143-bbw143, 2017.

Böiers C, Richardson SE, Laycock E, Zriwil A, Turati VA, Brown J, Wray JP, Wang D, James C, Herrero J, Sitnicka E, Karlsson S, Smith AJH, Jacobsen SEW, Enver T. A Human IPS Model Implicates Embryonic B-Myeloid Fate Restriction as Developmental Susceptibility to B Acute Lymphoblastic Leukemia-Associated ETV6-RUNX1. Developmental Cell. 2018 Feb 5;44(3):362-377.e7.

Research interests

Biological database construction

To use the standard LAMP framework to integrate and compile publicly available datasets from individual studies held in public repositories.

Genome evolution

To investigate the differences between evolutionary lineages or clades in genomic properties and elements such as GC content, intronic sequences and repetitive sequences.

Omics pipeline development

To construct a series of high-quality and efficient workflows for high-throughput experiments including exome-Seq, RNA-Seq, RAD-Seq, ChIP-Seq, single-cell analysis, BS-Seq, Ribo-Seq, metabolomics, proteomics and array data.

Gene expression

To profile gene expression, identify differential gene expression and probe the evolutionary patterns of gene expression across many species and conditions.

Variants at the population level

To detect the full set of variants, such as SNP, InDel, copy number variation, duplication, inversion and translocation, in population genomic data.



  • BS, Mathematics, Shandong University, 2006
  • PhD, Bioinformatics, Beijing Institute of Genomics, Chinese Academy of Sciences, 2011

Research groups and institutes

  • Leeds Omics