Zenia insignis

Introduction

Zenia insignis (Fabaceae, Fabales) is a tree native to southwestern China and Vietnam, characterized by ornamental red flowers and red legumes. It belongs to the subfamily Dialioideae of Fabaceae and is diploid, with a chromosome count of 2n=2x=14. Because of its rapid growth, the species is commonly used for afforestation of rocky terrain, particularly in the karst regions of southern and southwestern China. It is also cultivated for its wood, which is used in furniture production, and its tender leaves serve as green manure. Zenia insignis is able to fix atmospheric nitrogen through root nodules and has potential for metal accumulation, both of which contribute to its ecological significance.

Genus: Zenia
Species: Zenia insignis Chun
Assembly Level: Chromosome
Chromosome Number: 2n=14
Genome Size: 361.1 Mb
Contig N50: 6.1 Mb
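
For readers unfamiliar with the Contig N50 statistic reported above, the following minimal sketch shows how it is conventionally computed: contigs are sorted from longest to shortest, and N50 is the length of the contig at which the running total first reaches half of the assembly size. The function name `n50` and the toy lengths are illustrative assumptions, not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Made-up contig lengths, for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # prints 70
```

In this toy case the total is 300, so the threshold of 150 is reached after the second contig (80 + 70), giving an N50 of 70.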

Assembly

This assembly was based on 362.79 Gb of PacBio CLR reads, generated on the Sequel II sequencer (PacBio, USA) with 900-minute movies by Frasergen Bioinformatics Co., Ltd. (Wuhan, China). Contigs were constructed from the CLR reads using Falcon v1.8.1. Two rounds of error correction were then applied to the initial assembly. First, CLR subreads were aligned to the contigs with pbmm2 v1.2.1 and errors were corrected with GCPP v1.9.0-SL-release-8.0.0+1-37-gd7b188d. Second, the contigs were polished with 109.58 Gb of short reads generated on the MGISEQ-2000 platform. The short reads were cleaned with Trimmomatic v0.38 under stringent criteria, to clip low-quality regions and avoid inserting Ns into the contigs. The cleaned short reads were then mapped to the genome with BWA v0.7.17-r1188, and the contigs were polished with NextPolish v1.1.0. The final assembly was scaffolded into chromosomes using 124.23 Gb of Hi-C data (MGISEQ-2000) with the Juicer pipeline.
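
As a rough illustration of how the data volumes above relate to the assembly, the sketch below computes the mean sequencing depth implied by each dataset, that is, total bases divided by genome size. It takes the reported figures at face value and assumes the 361.1 Mb assembly size as the denominator; the names `GENOME_SIZE_BP`, `READ_YIELDS_BP`, and `mean_depth` are illustrative.

```python
# Assembly size as reported on this page (361.1 Mb), used as the denominator.
GENOME_SIZE_BP = 361.1e6

# Data volumes as reported in the assembly description.
READ_YIELDS_BP = {
    "PacBio CLR": 362.79e9,
    "MGISEQ-2000 short reads": 109.58e9,
    "Hi-C": 124.23e9,
}


def mean_depth(total_bases, genome_size):
    """Mean depth = total sequenced bases / genome size."""
    return total_bases / genome_size


for name, bases in READ_YIELDS_BP.items():
    print(f"{name}: ~{mean_depth(bases, GENOME_SIZE_BP):.0f}x")
```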

Annotation

Gene annotation was performed with the MAKER v3.01.02 pipeline, combining evidence-based and ab initio approaches. Repeats were masked using two libraries: a Viridiplantae repeat library and a de novo library constructed with RepeatModeler2. Transcript evidence was generated from Iso-Seq and RNA-Seq data, and protein evidence came from the protein sequences of UniProt/Swiss-Prot (taxonomy: Viridiplantae), Arabidopsis thaliana (TAIR10), and Medicago truncatula (MtrunA17r5.0). Three rounds of gene prediction were then conducted, primarily to train the ab initio gene predictors. The first round used transcript and homology evidence to predict genes and to train Augustus and SNAP. The second round used the trained Augustus and SNAP models, with est2genome and protein2genome set to 0. The retrained SNAP and Augustus models were used for the final round of gene annotation. Finally, gene structures were polished using PASA with high-quality FLNC reads.
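
The three training rounds described above can be summarized as per-round overrides to MAKER's maker_opts.ctl control file. This is only a sketch under stated assumptions: the option names (est2genome, protein2genome, snaphmm, augustus_species) are standard MAKER control-file keys, but the first-round and final-round values of est2genome and protein2genome are assumed from common MAKER practice rather than stated in the text, and every path and species label is a hypothetical placeholder.

```python
# Per-round overrides for maker_opts.ctl. All paths and species labels
# below are hypothetical placeholders, not the ones used for this genome.
ROUNDS = [
    {   # Round 1: gene models inferred directly from evidence
        # (values of 1 are an assumption; the text only says evidence was used).
        "est2genome": "1",
        "protein2genome": "1",
        "snaphmm": "",
        "augustus_species": "",
    },
    {   # Round 2: trained predictors, direct evidence inference off (per the text).
        "est2genome": "0",
        "protein2genome": "0",
        "snaphmm": "round1/snap.hmm",
        "augustus_species": "zenia_round1",
    },
    {   # Final round: retrained predictors (evidence flags assumed to stay 0).
        "est2genome": "0",
        "protein2genome": "0",
        "snaphmm": "round2/snap.hmm",
        "augustus_species": "zenia_round2",
    },
]


def render_overrides(options):
    """Format options as key=value lines, the layout used in maker_opts.ctl."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


for number, options in enumerate(ROUNDS, start=1):
    print(f"# round {number}")
    print(render_overrides(options))
```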

Download

FASTA file: Zenia_insignis.fa
GFF3 file: Zenia_insignis.gff3
CDS file: Zenia_insignis.cds
Protein file: Zenia_insignis.pep
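
Gene coordinates can be pulled from the GFF3 file listed above with a few lines of code. The sketch below reads standard nine-column GFF3 lines and yields the ID, name, chromosome, start, end, and strand of each gene feature. It assumes that gene records carry an `ID` attribute and optionally a `Name` attribute, as the GFF3 specification allows; the sample record is invented for illustration and is not taken from Zenia_insignis.gff3.

```python
def gene_rows(gff3_lines):
    """Yield (gene_id, gene_name, chrom, start, end, strand) per gene feature."""
    for line in gff3_lines:
        # Skip directives, comments, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        yield (
            attrs.get("ID", ""),
            attrs.get("Name", ""),
            cols[0],
            int(cols[3]),
            int(cols[4]),
            cols[6],
        )


# Invented example records, for illustration only.
SAMPLE = [
    "##gff-version 3\n",
    "chr1\tmaker\tgene\t1000\t5000\t.\t+\t.\tID=gene0001;Name=example\n",
    "chr1\tmaker\tmRNA\t1000\t5000\t.\t+\t.\tID=gene0001-RA;Parent=gene0001\n",
]

for row in gene_rows(SAMPLE):
    print("\t".join(str(value) for value in row))
```

To run it on the real file, the same function can be given an open file handle, for example `gene_rows(open("Zenia_insignis.gff3"))`.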