Latent environment allocation of microbial community data
Koichi Higashi, Shinya Suzuki, Shin Kurosawa, Hiroshi Mori and Ken Kurokawa
PLOS Computational Biology. Published: June 6, 2018. DOI:10.1371/journal.pcbi.1006143
As data on microbial community structures from various environments have accumulated, studies have examined the relationship between the environmental labels assigned to microbial samples and their community structures. However, because environments change continuously over time and space, mixed states of environments and their effects on community formation should be considered, rather than only the effects of discrete environmental categories. Here we applied a hierarchical Bayesian model to paired datasets containing more than 30,000 samples of microbial community structures and sample description documents. From the training results, we extracted latent environmental topics that associate co-occurring microbes with co-occurring word sets among samples. Topics are the core elements of environmental mixtures, and visualizing samples by their topic composition clarifies the connections among various environments. Based on the model training results, we developed a web application, LEA (Latent Environment Allocation), which provides a way to evaluate the typicality and heterogeneity of microbial communities in newly obtained samples without restricting the environmental categories to be compared. Because topics link words and microbes, LEA also enables semantic search for samples related to a query among the 30,000 microbiome samples.
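To make the idea of a sample as a mixture of latent environmental topics more concrete, the following Python sketch may help. It is a toy illustration under strong assumptions, not the model used in the paper: the two topics, their taxa, words, and probabilities are invented, and the function names `infer_topic_mixture` and `word_distribution` are hypothetical. With topics held fixed, it estimates a new sample's topic proportions by a simple EM procedure and then uses those proportions to score description words, which is the kind of link between microbes and words that topics provide.

```python
# Toy illustration only: topics, taxa, words, and probabilities are invented,
# and fixed-topic EM is a simplified stand-in for the paper's Bayesian model.

# Each topic is a probability distribution over taxa ...
TOPIC_TAXA = [
    {"Bacteroides": 0.70, "Prevotella": 0.25, "Pelagibacter": 0.05},  # "gut-like"
    {"Bacteroides": 0.05, "Prevotella": 0.05, "Pelagibacter": 0.90},  # "marine-like"
]
# ... paired with a distribution over words from sample descriptions.
TOPIC_WORDS = [
    {"gut": 0.60, "feces": 0.30, "seawater": 0.10},
    {"gut": 0.05, "feces": 0.05, "seawater": 0.90},
]


def infer_topic_mixture(counts, topic_taxa, n_iter=200):
    """Estimate topic proportions of one sample by EM, holding topics fixed."""
    k = len(topic_taxa)
    theta = [1.0 / k] * k  # start from a uniform mixture
    for _ in range(n_iter):
        new_theta = [0.0] * k
        for taxon, count in counts.items():
            # E-step: how strongly each topic explains this taxon.
            joint = [theta[j] * topic_taxa[j].get(taxon, 0.0) for j in range(k)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # taxon unknown to every topic; ignore it
            for j in range(k):
                new_theta[j] += count * joint[j] / norm
        total = sum(new_theta)
        if total == 0.0:
            return theta  # no usable taxa; keep the current estimate
        # M-step: renormalise the expected counts into proportions.
        theta = [v / total for v in new_theta]
    return theta


def word_distribution(theta, topic_words):
    """Mix the per-topic word distributions using the sample's topic proportions."""
    vocab = set()
    for words in topic_words:
        vocab.update(words)
    return {
        w: sum(t * words.get(w, 0.0) for t, words in zip(theta, topic_words))
        for w in vocab
    }


if __name__ == "__main__":
    # A hypothetical sample with both gut-like and marine-like taxa.
    mixed_sample = {"Bacteroides": 40, "Prevotella": 15, "Pelagibacter": 45}
    theta = infer_topic_mixture(mixed_sample, TOPIC_TAXA)
    print("topic proportions:", [round(t, 3) for t in theta])
    scores = word_distribution(theta, TOPIC_WORDS)
    print("words ranked:", sorted(scores, key=scores.get, reverse=True))
```

In this sketch, a sample dominated by one topic would count as typical of that environment, while a sample with spread-out proportions would be heterogeneous. The paper's actual model and the LEA application should be consulted for how these quantities are really defined.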
Source: Koichi Higashi et al. (2018), PLOS Computational Biology 14, e1006143, DOI:10.1371/journal.pcbi.1006143