Wednesday, July 16, 2014

2009,EM,Metagenomic signatures of 86 microbial and viral metagenomes


Metagenomic signatures of 86 microbial and viral metagenomes

Abstract

Previous studies have shown that dinucleotide abundances capture the majority of variation in genome signatures and are useful for quantifying lateral gene transfer and building molecular phylogenies. Metagenomes contain a mixture of individual genomes, and might be expected to lack compositional signatures. In many metagenomic data sets the majority of sequences have no significant similarities to known sequences and are effectively excluded from subsequent analyses. To circumvent this limitation, di-, tri- and tetranucleotide abundances of 86 microbial and viral metagenomes consisting of short pyrosequencing reads were analysed to provide a method which includes all sequences that can be used in combination with other analysis to increase our knowledge about microbial and viral communities. Both principal component analysis and hierarchical clustering showed definitive groupings of metagenomes drawn from similar environments. Together these analyses showed that dinucleotide composition, as opposed to tri- and tetranucleotides, defines a metagenomic signature which can explain up to 80% of the variance between biomes, which is comparable to that obtained by functional genomics. Metagenomes with anomalous content were also identified using dinucleotide abundances. Subsequent analyses determined that these metagenomes were contaminated with exogenous DNA, suggesting that this approach is a useful metric for quality control. The predictive strength of the dinucleotide composition also opens the possibility of assigning ecological classifications to unknown fragments. Environmental selection may be responsible for this dinucleotide signature through direct selection of specific compositional signals; however, simulations suggest that the environment may select indirectly by promoting the increased abundance of a few dominant taxa.
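To make the "di-, tri- and tetranucleotide abundances" concrete, here is a minimal sketch of how such a profile could be computed from a set of reads. The function name `kmer_frequencies` is my own, and it uses plain relative frequencies pooled over all reads; the paper may well normalize differently (for example, against mononucleotide composition), so treat this as an illustration rather than the authors' method.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(reads, k=2):
    """Relative frequency of every k-mer over A/C/G/T, pooled across reads.

    Hypothetical helper for illustration; k-mers containing any other
    character (e.g. N) are skipped.
    """
    alphabet = "ACGT"
    counts = Counter()
    for read in reads:
        seq = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in alphabet for base in kmer):
                counts[kmer] += 1

    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    if total == 0:
        return {kmer: 0.0 for kmer in all_kmers}
    # Report all 4**k k-mers so every metagenome gets a vector of equal length.
    return {kmer: counts[kmer] / total for kmer in all_kmers}
```

With `k=2` this yields a 16-dimensional vector per metagenome, with `k=3` 64 dimensions, and with `k=4` 256. Vectors of this kind are what one would then feed into principal component analysis or hierarchical clustering.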

This paper looks for signatures in 86 metagenomes using GC content and di-, tri-, and tetranucleotide frequencies. The result: dinucleotide frequency turned out to reflect the environment best.
This should make faster detection of metagenome contamination possible.
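As a rough idea of what such a contamination check might look like, the sketch below flags any metagenome whose dinucleotide profile sits far from the mean profile of the other samples from the same biome. It assumes each metagenome has already been reduced to a frequency vector. The function `flag_outliers`, the Euclidean distance, and the fixed `threshold` are all my own assumptions; the paper spotted anomalous metagenomes through its PCA and clustering, not through this rule.

```python
import math


def flag_outliers(profiles, biomes, threshold):
    """Return names of samples whose profile is farther than `threshold`
    (Euclidean distance) from the mean profile of the other samples
    in the same biome.

    profiles: dict of sample name -> list of k-mer frequencies
    biomes:   dict of sample name -> biome label
    All names and the threshold are hypothetical, for illustration only.
    """
    flagged = []
    for name, vec in profiles.items():
        # Leave the sample itself out so it cannot pull the centroid toward itself.
        peers = [profiles[other] for other in profiles
                 if other != name and biomes[other] == biomes[name]]
        if not peers:
            continue  # nothing to compare against
        centroid = [sum(col) / len(peers) for col in zip(*peers)]
        if math.dist(vec, centroid) > threshold:
            flagged.append(name)
    return flagged
```

A flagged sample is only a candidate: as the abstract notes, the authors confirmed contamination with subsequent analyses, so a compositional outlier still needs follow-up before it is called contaminated.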

1 comment:

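  1. For individual genomes, it would be interesting to compare whether the variation in 2- to 4-nucleotide frequencies follows taxonomy more closely or habitat more closely, and also at which level (species, genus, or family) the genomes cluster. For metagenomes, as the abstract itself says, the differences may ultimately just come from differing taxonomic composition. But then where did the sequences that this paper identified as contamination come from? And how did they confirm that it was contamination?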
