Genome Archive: A Standardized Whole Genome Database

Jongsun Park1,2,* and Hong Xi1,2
Owing to Next Generation Sequencing (NGS) technologies, more than 15,000 whole genomes, ranging from bacteria to human, have been sequenced to date. Several major data archives, such as NCBI, DDBJ, and EBI, collect these genomes. These databases provide structured files of genome information, such as FASTA and GFF3 files, web interfaces for accessing the data, and analysis tools, including BLAST. Other databases archive genome sequences with their own pipelines; for example, Ensembl covers many eukaryotic species, and Phytozome archives plant genomes, most of which were sequenced by the Joint Genome Institute. These databases provide better bioinformatics tools for dissecting the genomes efficiently. However, none of these databases contains all available genomes, so additional effort is required to analyze genomes drawn from different repositories. In particular, GFF3 files from different databases have slightly different formats, which is another hurdle to integrating them for research. To overcome this problem, we developed a standardized genome database named GenomeArchive® (http://www.genomearchive.info). It currently contains six genome databases: the Plant Genome Database, Fish Genome Database, Insect Genome Database, Fungal Genome Database, Nematode Genome Database, and Bacterial Genome Database. Because genome sequences in GenomeArchive® are stored in a standardized form, these genome databases can freely share genome data through GlobalScrap®, a simple online cart for collecting and analyzing sequences. Moreover, GenomeArchive® has been used to build customized genome databases, including the Pseudostellaria Database (http://www.pseudostellaria.net/), Salix Genome Database (http://www.salixgenome.info/), and Coffee Genome Database (http://www.coffeegenome.info/). GenomeArchive® can serve as a general platform not only for supporting diverse bioinformatics analyses but also for presenting genome data efficiently through web interfaces.
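The abstract does not describe how GenomeArchive® standardizes GFF3 files. As a minimal, hypothetical sketch, assuming the differences between sources lie mainly in column 9 (standard GFF3 `tag=value` pairs versus GTF-like `tag "value"` pairs, percent-encoded values, trailing semicolons), the Python below parses one feature line into a common record and writes it back in a single canonical GFF3 form. The names `Feature`, `parse_line`, and `to_gff3` are illustrative assumptions, not part of GenomeArchive®, and the attribute heuristic is deliberately simplified.

```python
# Hypothetical sketch of GFF3 line normalization; NOT GenomeArchive®'s actual pipeline.
from dataclasses import dataclass, field
from typing import Dict, List, Optional
from urllib.parse import unquote


@dataclass
class Feature:
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: Optional[float]
    strand: str
    phase: Optional[int]
    attributes: Dict[str, List[str]] = field(default_factory=dict)


def parse_attributes(column: str) -> Dict[str, List[str]]:
    """Parse column 9, accepting GFF3 'tag=value' and GTF-like 'tag "value"' pairs."""
    attrs: Dict[str, List[str]] = {}
    for pair in column.strip().split(";"):
        pair = pair.strip()
        if not pair or pair == ".":
            continue  # skips empty pieces left by trailing semicolons
        tag, sep, value = pair.partition("=")
        if sep and not any(ch.isspace() for ch in tag):
            # GFF3 style: comma-separated, percent-encoded values
            values = [unquote(v) for v in value.split(",")]
        else:
            # GTF-like variant: tag, whitespace, then an optionally quoted value
            parts = pair.split(None, 1)
            tag = parts[0]
            values = [parts[1].strip('"')] if len(parts) > 1 else [""]
        attrs.setdefault(tag, []).extend(values)
    return attrs


def parse_line(line: str) -> Optional[Feature]:
    """Return a Feature for a data line, or None for blank and comment/directive lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols
    return Feature(
        seqid=seqid,
        source=source,
        type=ftype,
        start=int(start),
        end=int(end),
        score=None if score == "." else float(score),
        strand=strand,
        phase=None if phase == "." else int(phase),
        attributes=parse_attributes(attrs),
    )


# Characters that GFF3 reserves inside attribute values, with their escapes.
_RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
             ",": "%2C", "\t": "%09", "\n": "%0A"}


def _escape(value: str) -> str:
    # Character-by-character mapping, so '%' is never double-escaped.
    return "".join(_RESERVED.get(ch, ch) for ch in value)


def to_gff3(f: Feature) -> str:
    """Serialize a Feature back into one canonical GFF3 line."""
    score = "." if f.score is None else f"{f.score:g}"
    phase = "." if f.phase is None else str(f.phase)
    attrs = ";".join(
        f"{tag}={','.join(_escape(v) for v in values)}"
        for tag, values in f.attributes.items()
    ) or "."
    return "\t".join([f.seqid, f.source, f.type, str(f.start), str(f.end),
                      score, f.strand, phase, attrs])
```

For example, a line ending in `gene_id "g1"; transcript_id "t1";` and one ending in `ID=gene1;Name=abc%3B1;` both come out of `to_gff3` as `tag=value` pairs without a trailing semicolon, so downstream code only has to handle one attribute syntax.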
