R&D CENTER

Three genome sequences of Coffea arabica var. typica (Rubiaceae) with the Coffee Genome Database

Jongsun Park*, Hong Xi, Yongsung Kim, Deokgyu Lee, and Jongwook Woo
Coffee is one of the most popular drinks in the world. The leading coffee-producing countries are Brazil, Vietnam, and Indonesia. Recently, Coffea canephora, the paternal species of Coffea arabica, was successfully sequenced. Its genome presents the overall features of the coffee genome; however, it is not a species cultivated for coffee beans. With the aid of next-generation sequencing technologies, we sequenced three trees: a cold-resistant coffee tree (CR), which can survive at -2 °C without shedding its leaves on Jeju Island, Korea; another coffee tree (HP), which produces a large amount of coffee beans; and C. arabica var. typica (TY). The genome of C. arabica, a tetraploid derived from C. canephora and C. eugenioides, was estimated at around 1.3 Gbp to 1.4 Gbp based on the genome sizes of the two parental species. Around 246 Gbp of raw data (189.2x coverage) were generated from CR for de novo assembly, and 49 Gbp and 98 Gbp were generated from HP and TY, respectively. The currently assembled CR sequence has a total length of 1.285 Gbp (N50 of 10,734 bp), and its longest contig is 278 kb. The TY genome was also assembled de novo, yielding a total length of 1.024 Gbp (N50 of 2,586 bp) with a longest contig of 144 kb. Alignment of HP reads against the CR genome identified 823,724 SNPs and 50,369 INDELs. For efficient comparative genomic analyses of the coffee genomes we sequenced, we established the web-based Coffee Genome Database (CGD; http://www.coffeegenome.info/), which contains the CR genome, SNVs identified from several coffee trees, and publicly available EST and RNA-Seq data. It also provides InterPro statistics for analyzing functional domains detected by InterProScan, BLAST search, a basic Genome Browser for navigating genomes, and an SNV browser for single nucleotide variations. CGD additionally provides a personalized space for collecting sequences for further diverse analyses on the web. It will serve as an integrated analysis platform for coffee comparative genomics.
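For readers unfamiliar with the assembly statistics quoted above, the following is a minimal sketch of how sequencing coverage and N50 are conventionally computed. It is not the pipeline used for these assemblies; the function names `coverage` and `n50` and the toy contig lengths are hypothetical and chosen only for illustration. The coverage check assumes a genome size of 1.3 Gbp, the lower end of the estimate given in the abstract.

```python
def coverage(total_bases, genome_size):
    """Sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of the assembly."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# 246 Gbp of CR reads over an assumed 1.3 Gbp genome (assumption: lower end of the estimate)
cr_depth = coverage(246e9, 1.3e9)
print(f"CR coverage: {cr_depth:.1f}x")

# Toy contig lengths (hypothetical, not taken from the real assembly)
toy_contigs = [10, 8, 6, 4, 2]
print("Toy N50:", n50(toy_contigs))
```

With these inputs the coverage comes out close to the reported 189.2x, which suggests the reported figure was computed against a genome size of about 1.3 Gbp.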