Elaeagnus_umbellata

Elaeagnus umbellata

Introduction

Elaeagnus umbellata (Elaeagnaceae, Rosales), known as autumn olive, is a shrub with red edible drupes that is also cultivated as a garden plant. It is diploid and has 28 chromosomes.


Genus	Elaeagnus
Species	Elaeagnus umbellata Thunb.
Assembly Level	Chromosome
Chromosome Number	2n=28
Genome Size	625.9 Mb
Contig N50	5.3 Mb
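The Contig N50 value above is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly. A minimal sketch of that calculation follows; the function names `n50` and `fasta_lengths` are illustrative and are not part of the pipeline described on this page. Note that a contig N50 has to be computed from the contig-level sequences, not from the chromosome-scale FASTA file.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to at least
        # half of the assembly defines the N50.
        if running >= half:
            return length
    return 0


def fasta_lengths(lines):
    """Yield the length of each record from an iterable of FASTA lines."""
    length = None
    for line in lines:
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line.strip())
    if length is not None:
        yield length


# Toy example: total = 20, half = 10, and 8 + 5 = 13 reaches it at length 5.
toy_n50 = n50([8, 5, 4, 3])
```

With a real contig file (hypothetical name `contigs.fa`), the call would look like `n50(fasta_lengths(open("contigs.fa")))`.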

Assembly

This assembly was based on 253.94 Gb of PacBio CLR reads, generated on the Sequel II sequencer (PacBio, USA) with 900-minute movies each by Frasergen Bioinformatics Co., Ltd. (Wuhan, China). Contigs were constructed from the CLR reads using Falcon v1.8.1. Two rounds of polishing were performed to correct potential errors in the initial assembly. First, CLR subreads were aligned to the contigs with pbmm2 v1.2.1 and errors were corrected with GCPP v1.9.0-SL-release-8.0.0+1-37-gd7b188d. Second, the contigs were polished with 105.40 Gb of short reads generated on the MGISEQ-2000 platform. The short reads were cleaned with Trimmomatic v0.38 under stringent criteria to clip low-quality regions and avoid introducing Ns into the contigs. The cleaned short reads were then mapped to the assembly with bwa v0.7.17-r1188 and used for polishing with NextPolish v1.1.0. Finally, the assembly was scaffolded into chromosomes using 85.47 Gb of Hi-C data (MGISEQ-2000) with the Juicer pipeline.
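To put the data volumes above in context, approximate sequencing depth can be estimated as total sequenced bases divided by genome size. The sketch below uses the assembled genome size of 625.9 Mb as the denominator, which is an assumption: the true genome size may differ, and the reported volumes may refer to raw rather than cleaned data.

```python
GENOME_SIZE_MB = 625.9  # assembled genome size from the summary table

# Data volumes (in Gb) as reported in the assembly description.
DATASETS_GB = {
    "PacBio CLR": 253.94,
    "MGISEQ-2000 short reads": 105.40,
    "Hi-C": 85.47,
}


def depth(total_gb, genome_mb):
    """Approximate fold coverage: total sequenced bases / genome size."""
    return (total_gb * 1000) / genome_mb


coverage = {name: depth(gb, GENOME_SIZE_MB) for name, gb in DATASETS_GB.items()}

for name, fold in coverage.items():
    print(f"{name}: ~{fold:.0f}x")
```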

Annotation

Gene annotation was performed using the MAKER v3.01.02 pipeline, combining evidence-based and ab initio approaches. Repeats were masked using two libraries: a Viridiplantae repeat library and a de novo library constructed with RepeatModeler2. Transcript evidence was generated from Iso-Seq and RNA-Seq data, and protein evidence came from the protein sequences of UniProt/Swiss-Prot (taxonomy: Viridiplantae), Arabidopsis thaliana (TAIR10), and Medicago truncatula (MtrunA17r5.0). Three rounds of gene prediction were then conducted, primarily to train the ab initio gene predictors. The first round used transcript and homology evidence to predict genes and to train Augustus and SNAP. The second round used the trained Augustus and SNAP models, with est2genome and protein2genome set to 0 so that gene models were no longer inferred directly from the evidence alignments. The retrained SNAP and Augustus models were used for the final round of gene annotation. Finally, gene structures were polished using PASA with high-quality FLNC reads.
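The three MAKER rounds described above differ mainly in a handful of control-file settings. The sketch below renders those per-round overrides as `key=value` lines of the kind found in `maker_opts.ctl`. The option keys (`est2genome`, `protein2genome`, `snaphmm`, `augustus_species`) are standard MAKER options, but everything else is an assumption for illustration: the HMM file names and Augustus species names are hypothetical, the round 1 values of 1 are inferred from the statement that evidence was used to predict genes, and the round 3 values of 0 are not stated in the text.

```python
# Per-round overrides; file and species names are hypothetical placeholders.
ROUNDS = {
    "round1": {
        "est2genome": 1,        # assumed: infer gene models from transcripts
        "protein2genome": 1,    # assumed: infer gene models from proteins
        "snaphmm": "",          # no trained SNAP model yet
        "augustus_species": "", # no trained Augustus model yet
    },
    "round2": {
        "est2genome": 0,        # stated in the text
        "protein2genome": 0,    # stated in the text
        "snaphmm": "round1.snap.hmm",
        "augustus_species": "eumb_round1",
    },
    "round3": {
        "est2genome": 0,        # assumed: not stated in the text
        "protein2genome": 0,    # assumed: not stated in the text
        "snaphmm": "round2.snap.hmm",
        "augustus_species": "eumb_round2",
    },
}


def render_overrides(settings):
    """Format one round's settings as control-file style key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


for name, settings in ROUNDS.items():
    print(f"# {name}")
    print(render_overrides(settings))
```

Keeping the rounds in one table makes the progression explicit: evidence-derived models seed the first training, and each later round swaps in the models trained on the previous round's output.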

Download


Fasta file	Elaeagnus_umbellata.fa
GFF3 file	Elaeagnus_umbellata.gff3
CDS file	Elaeagnus_umbellata.cds
Protein file	Elaeagnus_umbellata.pep
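A gene table with the columns geneID, geneName, chrid, start, end and strand can be derived from the GFF3 file listed above. The following is a minimal sketch, assuming that gene features carry an `ID` attribute and optionally a `Name` attribute, as is conventional in GFF3; the helper names `parse_attributes` and `gene_rows` are illustrative, and the attribute layout of Elaeagnus_umbellata.gff3 has not been verified here.

```python
from urllib.parse import unquote


def parse_attributes(field):
    """Parse a GFF3 attributes column (key=value pairs separated by ';')."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            # GFF3 percent-encodes reserved characters in attribute values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gene_rows(gff3_lines):
    """Yield one dict per 'gene' feature from an iterable of GFF3 lines."""
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("ID", "")
        yield {
            "geneID": gene_id,
            "geneName": attrs.get("Name", gene_id),  # fall back to the ID
            "chrid": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
        }


# Made-up records for illustration only; not taken from the real annotation.
example = [
    "##gff-version 3\n",
    "chr1\tmaker\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=demo\n",
    "chr1\tmaker\tmRNA\t100\t900\t.\t+\t.\tID=gene0001.1;Parent=gene0001\n",
    "chr2\tmaker\tgene\t50\t400\t.\t-\t.\tID=gene0002\n",
]
rows = list(gene_rows(example))
```

For the real file, the same generator can be fed an open file handle, for example `gene_rows(open("Elaeagnus_umbellata.gff3"))`.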

geneID	geneName	chrid	start	end	strand