Coriaria nepalensis

Introduction

Coriaria nepalensis (Coriariaceae, Cucurbitales), also known as Masuri berry, is a shrub distributed from southwestern China to Central Asia that can form root nodules with Frankia. The whole plant contains toxic compounds and is used as a biological pesticide. It is also rich in medicinally relevant compounds such as coriatin and tannin.


Genus	Coriaria
Species	Coriaria nepalensis Wall.
Assembly Level	Chromosome
Chromosome Number	2n=20
Genome Size	231.4 Mb
Contig N50	4.6 Mb
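
For readers unfamiliar with the Contig N50 statistic in the table above, the following is a minimal sketch of how it is conventionally computed: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. The function name `n50` and the toy lengths are illustrative assumptions, not values from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    raise ValueError("no lengths given")


# Toy contig lengths (made up for illustration only).
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))  # 5
```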

Assembly

This assembly is based on 372.50 Gb of PacBio CLR reads, generated on a Sequel II sequencer (PacBio, USA) with 900-minute movies by Frasergen Bioinformatics Co., Ltd. (Wuhan, China). Contigs were constructed from the CLR reads using Falcon v1.8.1. Two rounds of error correction were then applied to the initial assembly. First, CLR subreads were aligned to the contigs with pbmm2 v1.2.1 and errors were corrected with GCPP v1.9.0-SL-release-8.0.0+1-37-gd7b188d. Second, the contigs were polished with 246.32 Gb of short reads generated on the MGISEQ-2000 platform. The short reads were cleaned with Trimmomatic v0.38 under stringent criteria, to clip low-quality regions and avoid introducing Ns into the contigs. The cleaned reads were then mapped to the genome with bwa v0.7.17-r1188, and the contigs were polished with NextPolish v1.1.0. Finally, the assembly was scaffolded into chromosomes using 254.82 Gb of Hi-C data (MGISEQ-2000) with the Juicer pipeline.
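
As a rough aid for relating the data volumes above to the assembled genome size, the sketch below computes the nominal sequencing depth (total bases divided by genome size) for each data set. It assumes that "Gb" means 10^9 bases and "Mb" means 10^6 bases, and uses the 231.4 Mb genome size listed on this page; the function name `nominal_depth` is an illustrative choice.

```python
GENOME_SIZE_BP = 231.4e6  # genome size from the summary table (assumed 1 Mb = 1e6 bp)

# Data volumes quoted in the Assembly section, in Gb (assumed 1 Gb = 1e9 bases).
DATASETS_GB = {
    "PacBio CLR": 372.50,
    "MGISEQ-2000 short reads": 246.32,
    "Hi-C": 254.82,
}


def nominal_depth(data_gb, genome_size_bp=GENOME_SIZE_BP):
    """Return total sequenced bases divided by genome size."""
    return data_gb * 1e9 / genome_size_bp


for name, gb in DATASETS_GB.items():
    print(f"{name}: {nominal_depth(gb):.0f}x")
```

This is a nominal figure only: it ignores read filtering, unmapped reads, and uneven coverage.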

Annotation

Gene annotation was performed with the MAKER v3.01.02 pipeline, combining evidence-based and ab initio approaches. Repeats were masked using two libraries: a Viridiplantae repeat library and a de novo library constructed with RepeatModeler2. Transcript evidence was generated from Iso-Seq and RNA-Seq data, and protein evidence came from the protein sequences of UniProt Swiss-Prot (taxonomy: Viridiplantae), Arabidopsis thaliana (TAIR10), and Medicago truncatula (MtrunA17r5.0). Three rounds of gene prediction were then conducted, primarily to train the ab initio gene predictors. The first round used transcript and homology evidence to predict genes and to train Augustus and SNAP. The second round used the trained Augustus and SNAP models, with est2genome and protein2genome set to 0. The retrained SNAP and Augustus models were used for the final round of gene annotation. Finally, gene structures were polished using PASA with high-quality FLNC reads.
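
The three rounds described above can be sketched as a set of MAKER control-file options. This is a hedged illustration, not the settings actually used: `est2genome`, `protein2genome`, `snaphmm`, and `augustus_species` are option names from MAKER's `maker_opts.ctl`, but the round-one and round-three values, the round labels, and all model and file names are assumptions made for the example.

```python
# Option names follow MAKER's maker_opts.ctl; every file and species
# name below is hypothetical and only marks where a trained model goes.
ROUNDS = {
    # Round 1 (assumed values): genes inferred directly from evidence.
    "round1_evidence": {
        "est2genome": 1,
        "protein2genome": 1,
        "snaphmm": "",
        "augustus_species": "",
    },
    # Round 2 (from the text): evidence-only inference switched off,
    # models trained on round 1 supplied to SNAP and Augustus.
    "round2_ab_initio": {
        "est2genome": 0,
        "protein2genome": 0,
        "snaphmm": "round1.hmm",
        "augustus_species": "coriaria_round1",
    },
    # Round 3 (assumed values): same mode with the retrained models.
    "round3_final": {
        "est2genome": 0,
        "protein2genome": 0,
        "snaphmm": "round2.hmm",
        "augustus_species": "coriaria_round2",
    },
}


def render_ctl(options):
    """Render options as key=value lines in control-file style."""
    return "\n".join(f"{key}={value}" for key, value in options.items())


for label, options in ROUNDS.items():
    print(f"# {label}")
    print(render_ctl(options))
```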

Download


Fasta file	Coriaria_nepalensis.fa
GFF3 file	Coriaria_nepalensis.gff3
CDS file	Coriaria_nepalensis.cds
Protein file	Coriaria_nepalensis.pep

geneID	geneName	chrid	start	end	strand