Cercis chinensis

Introduction

Cercis chinensis (Fabaceae, Fabales), also known as Chinese redbud, is a widely cultivated tree in China. It is a diploid (2n = 2x = 14) belonging to the subfamily Cercidoideae. The tree is prized for its ornamental flowers, which typically range in color from purple and pink to white. The flowers are borne in clusters on old branches or directly on the trunk before the heart-shaped leaves emerge. In addition to its horticultural value, the bark of Cercis chinensis is used in traditional Chinese medicine.

Genus: Cercis
Species: Cercis chinensis Bunge
Assembly level: Chromosome
Chromosome number: 2n = 14
Genome size: 310.8 Mb
Contig N50: 4.5 Mb
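Contig N50 is a standard contiguity statistic: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The Python sketch below implements that textbook definition. The contig lengths in the example are invented toy values, not the actual Cercis chinensis contigs.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Toy contig lengths (hypothetical), not real Cercis chinensis data.
toy_contigs = [8, 7, 5, 4, 3, 2, 1]
print(n50(toy_contigs))  # 7: 8 + 7 = 15, which is half of the total 30
```

Applied to the real contig lengths (for example, read from the FASTA download), the same function would yield the contig N50 reported in the table, provided the contigs rather than the chromosome-scale scaffolds are used.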

Assembly

This assembly is based on 454.42 Gb of PacBio CLR reads, generated on a Sequel II sequencer (PacBio, USA) with 900-minute movies by Frasergen Bioinformatics Co., Ltd. (Wuhan, China). Contigs were assembled from the CLR reads using Falcon v1.8.1. Two rounds of polishing were then performed to correct residual errors in the initial assembly. First, the CLR subreads were aligned to the contigs with pbmm2 v1.2.1, and errors were corrected with GCPP v1.9.0-SL-release-8.0.0+1-37-gd7b188d. Second, the contigs were polished with 144.11 Gb of short reads generated on the MGISEQ-2000 and Illumina NovaSeq 6000 platforms. The short reads were cleaned with Trimmomatic v0.38 under stringent criteria to clip low-quality regions and avoid introducing Ns into the contigs; the cleaned reads were then mapped to the genome with BWA v0.7.17-r1188, and the contigs were polished with NextPolish v1.1.0. Finally, the assembly was scaffolded into chromosomes with the Juicer pipeline using 191.89 Gb of Hi-C data (MGISEQ-2000).
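To put the data volumes used for the assembly in perspective, fold coverage can be approximated as total sequenced bases divided by genome size. This sketch applies that rule of thumb to the reported volumes and the 310.8 Mb assembly size. Treat the results as rough upper bounds: it assumes the assembled size approximates the true genome size and that the reported volumes are raw bases, some of which may not contribute to usable depth depending on how the volumes were counted.

```python
GENOME_SIZE_MB = 310.8  # assembled genome size from the summary table

# Sequencing data volumes reported for this assembly, in Gb.
datasets_gb = {
    "PacBio CLR": 454.42,
    "short reads (MGISEQ-2000 + NovaSeq 6000)": 144.11,
    "Hi-C (MGISEQ-2000)": 191.89,
}


def fold_coverage(data_gb: float, genome_mb: float) -> float:
    """Approximate sequencing depth: total bases divided by genome size."""
    # 1 Gb = 1000 Mb, so convert the data volume to Mb before dividing.
    return data_gb * 1000 / genome_mb


for name, gb in datasets_gb.items():
    print(f"{name}: ~{fold_coverage(gb, GENOME_SIZE_MB):.0f}x")
```

Run as a script, this prints roughly 1462x for the CLR data, 464x for the short reads, and 617x for the Hi-C data.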

Annotation

Gene annotation was performed with the MAKER v3.01.02 pipeline, combining evidence-based and ab initio approaches. Repeats were masked using two libraries: a Viridiplantae repeat library and a de novo library constructed with RepeatModeler2. Transcript evidence came from Iso-Seq and RNA-Seq data, and protein evidence came from UniProtKB/Swiss-Prot sequences (taxonomy: Viridiplantae), Arabidopsis thaliana (TAIR10), and Medicago truncatula (MtrunA17r5.0). Three rounds of gene prediction were then run, mainly to train the ab initio gene predictors. The first round predicted genes from the transcript and protein homology evidence and used them to train Augustus and SNAP. The second round ran the trained Augustus and SNAP models with est2genome and protein2genome set to 0. The models were then retrained, and the retrained SNAP and Augustus models were used for the final round of gene annotation. Finally, gene structures were refined with PASA using high-quality full-length non-chimeric (FLNC) reads.
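MAKER reads its run settings from a maker_opts.ctl control file of key=value lines, so the three training rounds described above differ mainly in a few options. The Python sketch below is a hypothetical helper (not part of MAKER) that renders the options each round would change. It assumes the usual setup for an evidence-only first round (est2genome and protein2genome on, no ab initio models), then switches direct evidence-to-gene conversion off in later rounds while snaphmm and augustus_species point at the trained models. The model names are placeholders, and all other settings (evidence files, repeat libraries) are assumed to stay the same across rounds.

```python
from typing import Dict, Optional


def maker_round_options(round_no: int,
                        snap_hmm: Optional[str] = None,
                        augustus_species: Optional[str] = None) -> Dict[str, str]:
    """Return the maker_opts.ctl settings that change per training round (sketch)."""
    if round_no == 1:
        # Round 1 (assumed): build gene models directly from aligned transcripts
        # (est2genome) and proteins (protein2genome); no trained predictors yet.
        return {"est2genome": "1", "protein2genome": "1",
                "snaphmm": "", "augustus_species": ""}
    if round_no in (2, 3):
        if snap_hmm is None or augustus_species is None:
            raise ValueError("rounds 2 and 3 need trained SNAP and Augustus models")
        # Later rounds: the trained ab initio predictors propose gene models,
        # and the aligned evidence informs which predictions MAKER keeps.
        return {"est2genome": "0", "protein2genome": "0",
                "snaphmm": snap_hmm, "augustus_species": augustus_species}
    raise ValueError(f"unexpected round: {round_no}")


def to_ctl(options: Dict[str, str]) -> str:
    """Render options as key=value lines, the format used by maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


# Hypothetical model names from a round-1 training step.
print(to_ctl(maker_round_options(1)))
print(to_ctl(maker_round_options(2, "cercis_round1.hmm", "cercis_round1")))
```

Round 3 would call the same function with the retrained model names; only the model references change between rounds 2 and 3.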

Download

FASTA file: Cercis_chinensis.fa
GFF3 file: Cercis_chinensis.gff3
CDS file: Cercis_chinensis.cds
Protein file: Cercis_chinensis.pep
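Gene coordinates can be pulled out of the GFF3 download with a few lines of Python. This sketch assumes the file follows the GFF3 specification (nine tab-separated columns, percent-encoded key=value attributes in column 9) and that gene features carry an ID and, optionally, a Name attribute. The example records, chromosome name, and gene IDs are made up for illustration.

```python
from typing import Dict, Iterable, List
from urllib.parse import unquote


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string such as 'ID=g1;Name=abc' into a dict."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # GFF3 percent-encodes reserved characters such as ';' and '='.
            attrs[key] = unquote(value)
    return attrs


def gene_table(lines: Iterable[str]) -> List[dict]:
    """Return one record per 'gene' feature: ID, name, seqid, start, end, strand."""
    genes = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, '##' directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # ignore malformed lines and non-gene features (mRNA, exon, CDS)
        attrs = parse_attributes(cols[8])
        genes.append({
            "gene_id": attrs.get("ID", ""),
            "name": attrs.get("Name", ""),
            "seqid": cols[0],
            "start": int(cols[3]),  # GFF3 coordinates are 1-based and inclusive
            "end": int(cols[4]),
            "strand": cols[6],
        })
    return genes


# Hypothetical records; the real file's seqids and IDs may differ.
toy_gff3 = [
    "##gff-version 3",
    "Chr01\tmaker\tgene\t1001\t2500\t.\t+\t.\tID=gene0001;Name=gene0001",
    "Chr01\tmaker\tmRNA\t1001\t2500\t.\t+\t.\tID=gene0001-mRNA-1;Parent=gene0001",
]

for gene in gene_table(toy_gff3):
    print(gene["gene_id"], gene["seqid"], gene["start"], gene["end"], gene["strand"])
```

Because gene_table accepts any iterable of lines, an open file handle to Cercis_chinensis.gff3 can be passed to it directly.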