Utilization of Medicinal Plant Genomes by Identification of Secondary Metabolites from Genomes
Jongsun Park, Kiman Lee, Suhyeon Park, Woochan Kwon, Hong Xi, Janghyuk Son, Taejin Kang
Owing to the rapid development of sequencing technologies, at least 2,174 plant
genomes from 713 species have been sequenced and are available in the recently
released Plant Genome Database (PGD; Release 2.7; http://www.plantgenome.info/).
In the PGD, 27 of 162 widely used herbal plant species and 52 of 559 herbal
plants listed in the ‘herbal medicine list’ presented by the Ministry of Food and Drug
Safety have at least one genome available. The genome sizes of these medicinal
plants range from 145 Mb (Spirodela polyrhiza) to 10.6 Gb (Ginkgo biloba), a
roughly 73-fold difference. Interestingly, Cannabis sativa, containing various useful
secondary metabolites including α-pinene, myrcene, and linalool, has eight genomes sequenced from different strains, making it a good example for understanding intraspecific differences in secondary metabolites at the genomic level. MetaPre-AI® is a bioinformatic pipeline that applies machine learning and artificial intelligence algorithms to whole genome sequences; its performance was demonstrated in a study in which acteoside was predicted from the Abeliophyllum distichum genome and then confirmed by HPLC. MetaPre-AI® can be used to profile possible secondary metabolites from medicinal plant genomes, which can reduce the costs and uncertainty of screening experiments. Given the pace at which new plant genomes are released and available biochemical pathways accumulate, MetaPre-AI® will become an increasingly efficient pipeline for investigating the secondary metabolites of various plant species.
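The summary figures quoted above can be rechecked with a few lines of arithmetic. The following is only a minimal sketch: the function names are hypothetical, the inputs are the numbers stated in this abstract, and the conversion 1 Gb = 1000 Mb is an assumption. It does not query the PGD and is not part of MetaPre-AI®.

```python
# Figures quoted in the abstract (PGD Release 2.7); nothing is fetched from the database.
SMALLEST_GENOME_MB = 145.0         # Spirodela polyrhiza
LARGEST_GENOME_MB = 10.6 * 1000.0  # Ginkgo biloba, 10.6 Gb (assumes 1 Gb = 1000 Mb)


def fold_difference(largest: float, smallest: float) -> float:
    """Ratio of the largest to the smallest genome size."""
    return largest / smallest


def coverage_percent(with_genome: int, total: int) -> float:
    """Percentage of listed species that have at least one genome in the database."""
    return 100.0 * with_genome / total


fold = fold_difference(LARGEST_GENOME_MB, SMALLEST_GENOME_MB)
widely_used_pct = coverage_percent(27, 162)  # widely used herbal plants
listed_pct = coverage_percent(52, 559)       # 'herbal medicine list' species
genomes_per_species = 2174 / 713             # average over all sequenced species

print(f"Genome size range: {fold:.1f}-fold")
print(f"Widely used herbal plants with a genome: {widely_used_pct:.1f}%")
print(f"Listed herbal plants with a genome: {listed_pct:.1f}%")
print(f"Average genomes per sequenced species: {genomes_per_species:.2f}")
```

Under these assumptions the size ratio comes out at about 73, matching the figure in the text, and both coverage values are well below 20%, which shows how many medicinal species still lack a sequenced genome.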