Thursday, July 24, 2014

2014, Nature Biotechnology, Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples

  • Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples

  • Kristian Cibulskis,
  • Michael S Lawrence,
  • Scott L Carter,
  • Andrey Sivachenko,
  • David Jaffe,
  • Carrie Sougnez,
  • Stacey Gabriel,
  • Matthew Meyerson,
  • Eric S Lander,
  • Gad Getz
  • Abstract
    Detection of somatic point substitutions is a key step in characterizing the cancer genome. However, existing methods typically miss low-allelic-fraction mutations that occur in only a subset of the sequenced cells owing to either tumor heterogeneity or contamination by normal cells. Here we present MuTect, a method that applies a Bayesian classifier to detect somatic mutations with very low allele fractions, requiring only a few supporting reads, followed by carefully tuned filters that ensure high specificity. We also describe benchmarking approaches that use real, rather than simulated, sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.

    Link

    Comment
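    MuTect is the publicly released version of the somatic point mutation callers developed at the Broad Institute (the better ones are not public). The paper describes the tool's algorithm and explains why MuTect performs better than other tools. Frankly, few other tools exist and this is a field where performance is hard to test (because of the amount of cancer data required), so it is unclear how good the performance really is. (Note: this is the kind of tool our professor would like to build.)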
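
The abstract describes a Bayesian classifier that can call a mutation from only a few supporting reads. As a rough illustration of that idea (a sketch, not MuTect's actual code), the snippet below compares the likelihood of the observed bases under "variant present at the observed allele fraction" against "no variant, only sequencing error". The function names and the assumption that errors are spread evenly over the three other bases are illustrative choices; the exact model and thresholds in the paper may differ.

```python
import math


def read_likelihood(base, error_prob, ref, alt, f):
    """P(observed base | alt allele fraction f).

    Assumption: a sequencing error turns the true base into each of the
    three other bases with equal probability.
    """
    if base == ref:
        return f * error_prob / 3 + (1 - f) * (1 - error_prob)
    if base == alt:
        return f * (1 - error_prob) + (1 - f) * error_prob / 3
    return error_prob / 3


def lod_score(bases, quals, ref, alt):
    """log10 likelihood ratio of 'variant at observed fraction' vs 'no variant'.

    bases: non-empty sequence of observed bases at one site
    quals: Phred base qualities, one per base
    """
    f_hat = sum(b == alt for b in bases) / len(bases)
    lod = 0.0
    for base, q in zip(bases, quals):
        e = 10 ** (-q / 10)  # Phred quality -> error probability
        lod += math.log10(read_likelihood(base, e, ref, alt, f_hat))
        lod -= math.log10(read_likelihood(base, e, ref, alt, 0.0))
    return lod


# 3 alt reads out of 30 (allele fraction 0.1), all at Q30
print(lod_score("A" * 27 + "T" * 3, [30] * 30, "A", "T"))
```

A site with no alt reads scores exactly zero under this sketch, while a handful of high-quality alt reads gives a clearly positive score, which is why low allele fractions remain detectable at sufficient depth.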

    Wednesday, July 23, 2014

    2008, Nucleic Acids Research, eggNOG: automated construction and annotation of orthologous groups of genes


    eggNOG: automated construction and annotation of orthologous groups of genes

    Lars Juhl Jensen1, Philippe Julien1, Michael Kuhn1, Christian von Mering2,
    Jean Muller1, Tobias Doerks1 and Peer Bork1,3,*
    1European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, 2University of Zurich
    and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland and 3Max-Delbrück-
    Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany

    ABSTRACT
    The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database (‘evolutionary genealogy of genes: Non-supervised Orthologous Groups’), which contains orthologous groups constructed from Smith–Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 coarse-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.de.
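
The abstract mentions reciprocal best matches and triangular linkage. The sketch below illustrates those two ideas on a toy score table; it is not the eggNOG pipeline. The input layout (a dict of pairwise alignment scores keyed by `(genome, gene_id)` tuples) and the function names are assumptions made for the example, and the real procedure includes further steps (for example handling of in-paralogs) that are not shown.

```python
from itertools import combinations


def reciprocal_best_hits(scores):
    """Return the set of reciprocal best-hit gene pairs.

    scores: dict mapping (query, subject) -> alignment score, where each
    gene is a (genome, gene_id) tuple. Hypothetical input format.
    """
    best = {}  # (query gene, target genome) -> (best subject, score)
    for (query, subject), score in scores.items():
        if query[0] == subject[0]:
            continue  # ignore hits within the same genome
        key = (query, subject[0])
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)

    pairs = set()
    for (query, _genome), (subject, _score) in best.items():
        back = best.get((subject, query[0]))
        if back is not None and back[0] == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def triangles(pairs):
    """Gene triples in which every pair is a reciprocal best hit."""
    genes = sorted({g for pair in pairs for g in pair})
    return [t for t in combinations(genes, 3)
            if all(frozenset(edge) in pairs for edge in combinations(t, 2))]
```

In a triangular-linkage scheme, triangles like these would serve as seeds that are merged into larger groups when they share an edge; that merging step is omitted here.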

    Monday, July 21, 2014

    2014, BMC Systems Biology, A bioinformatics approach reveals novel interactions of the OVOL transcription factors in the regulation of epithelial-mesenchymal cell reprogramming and cancer progression




    A bioinformatics approach reveals novel interactions of the OVOL transcription factors in the regulation of epithelial – mesenchymal cell reprogramming and cancer progression

    Hernan Roca, Manjusha Pande, Jeffrey S Huo, James Hernandez, James D Cavalcoli, Kenneth J Pienta and Richard C McEachin

    Abstract

    Background

    Mesenchymal to Epithelial Transition (MET) plasticity is critical to cancer progression, and we recently showed that the OVOL transcription factors (TFs) are critical regulators of MET. Results of that work also posed the hypothesis that the OVOLs impact MET in a range of cancers. We now test this hypothesis by developing a model, OVOL Induced MET (OI-MET), and sub-model (OI-MET-TF), to characterize differential gene expression in MET common to prostate cancer (PC) and breast cancer (BC).

    Results

    In the OI-MET model, we identified 739 genes differentially expressed in both the PC and BC models. For this gene set, we found significant enrichment of annotation for BC, PC, cancer, and MET, as well as regulation of gene expression by AP1, STAT1, STAT3, and NFKB1. Focusing on the target genes for these four TFs plus the OVOLs, we produced the OI-MET-TF sub-model, which shows even greater enrichment for these annotations, plus significant evidence of cooperation among these five TFs. Based on known gene/drug interactions, we prioritized targets in the OI-MET-TF network for follow-on analysis, emphasizing the clinical relevance of this work. Reflecting these results back to the OI-MET model, we found that binding motifs for the TF pair AP1/MYC are more frequent than expected and that the AP1/MYC pair is significantly enriched in binding in cancer models, relative to non-cancer models, in these promoters. This effect is seen in both MET models (solid tumors) and in non-MET models (leukemia). These results are consistent with our hypothesis that the OVOLs impact cancer susceptibility by regulating MET, and extend the hypothesis to include mechanisms not specific to MET.

    Conclusions

    We find significant evidence of the OVOL, AP1, STAT1, STAT3, and NFKB1 TFs having important roles in MET, and more broadly in cancer. We prioritize known gene/drug targets for follow-up in the clinic, and we show that the AP1/MYC TF pair is a strong candidate for intervention.
    Keywords: 
    Metastasis; Migration; Tumor progression; Systems biology; Transcription factors; Signal transduction; Therapeutics

    Article Links :
    http://www.biomedcentral.com/content/pdf/1752-0509-8-29.pdf

    Wednesday, July 16, 2014

    2014, Nature Reviews Cancer, Principles and methods of integrative genomic analyses in cancer

    VOLUME 14 | MAY 2014
    NATURE REVIEWS | CANCER

    Principles and methods of integrative genomic analyses in cancer
    Vessela N. Kristensen, Ole Christian Lingjærde, Hege G. Russnes,
    Hans Kristian M. Vollan, Arnoldo Frigessi and Anne-Lise Børresen-Dale


    Abstract | Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.


    Article links :
    http://www.nature.com/nrc/journal/v14/n5/abs/nrc3721.html


    Comment :
    An overview of the cancer integrative genomics research carried out to date.

    2009, EM, Metagenomic signatures of 86 microbial and viral metagenomes


    Metagenomic signatures of 86 microbial and viral metagenomes

    Abstract

    Previous studies have shown that dinucleotide abundances capture the majority of variation in genome signatures and are useful for quantifying lateral gene transfer and building molecular phylogenies. Metagenomes contain a mixture of individual genomes, and might be expected to lack compositional signatures. In many metagenomic data sets the majority of sequences have no significant similarities to known sequences and are effectively excluded from subsequent analyses. To circumvent this limitation, di-, tri- and tetranucleotide abundances of 86 microbial and viral metagenomes consisting of short pyrosequencing reads were analysed to provide a method which includes all sequences that can be used in combination with other analysis to increase our knowledge about microbial and viral communities. Both principal component analysis and hierarchical clustering showed definitive groupings of metagenomes drawn from similar environments. Together these analyses showed that dinucleotide composition, as opposed to tri- and tetranucleotides, defines a metagenomic signature which can explain up to 80% of the variance between biomes, which is comparable to that obtained by functional genomics. Metagenomes with anomalous content were also identified using dinucleotide abundances. Subsequent analyses determined that these metagenomes were contaminated with exogenous DNA, suggesting that this approach is a useful metric for quality control. The predictive strength of the dinucleotide composition also opens the possibility of assigning ecological classifications to unknown fragments. Environmental selection may be responsible for this dinucleotide signature through direct selection of specific compositional signals; however, simulations suggest that the environment may select indirectly by promoting the increased abundance of a few dominant taxa.

    A paper that looks for signatures in 86 metagenomes using GC content and di-, tri- and tetranucleotide frequencies. The analysis found that dinucleotide frequency reflects the environment best.
    This should make somewhat faster detection of metagenome contamination possible.
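
To make the dinucleotide signature concrete, here is a minimal sketch that turns a set of reads into a 16-value vector of relative dinucleotide frequencies. The function name and the decision to skip any pair containing a non-ACGT character are assumptions for this example; the paper may normalise abundances differently (for instance against mononucleotide frequencies).

```python
from collections import Counter
from itertools import product


def dinucleotide_frequencies(reads):
    """Relative frequency of each of the 16 dinucleotides across all reads."""
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - 1):
            pair = seq[i:i + 2]
            if set(pair) <= set("ACGT"):  # skip pairs with N or other symbols
                counts[pair] += 1
    total = sum(counts.values())
    all_pairs = ["".join(p) for p in product("ACGT", repeat=2)]
    return {p: (counts[p] / total if total else 0.0) for p in all_pairs}


signature = dinucleotide_frequencies(["ACGT", "AANA"])
print(signature["AC"], signature["TT"])
```

Vectors like this, one per metagenome, are the kind of input that could then be passed to principal component analysis or hierarchical clustering as described in the abstract.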

    2014, MBE, Genome Scans for Detecting Footprints of Local Adaptation Using a Bayesian Factor Model

    June, 2014
    Molecular Biology and Evolution

    Genome Scans for Detecting Footprints of Local Adaptation Using a Bayesian Factor Model



    Abstract
    There is a considerable impetus in population genomics to pinpoint loci involved in local adaptation. A powerful approach to find genomic regions subject to local adaptation is to genotype numerous molecular markers and look for outlier loci. One of the most common approaches for selection scans is based on statistics that measure population differentiation such as FST. However, there are important caveats with approaches related to FST because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here, we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. In order to identify outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that it can achieve a 2-fold or more reduction of false discovery rate compared with the software BayeScan or with an FST approach. We show that our software can handle large data sets by analyzing the single nucleotide polymorphisms of the Human Genome Diversity Project. The Bayesian factor model is implemented in the open-source PCAdapt software.


    Article Link : http://mbe.oxfordjournals.org/content/early/2014/06/24/molbev.msu182.long
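
The abstract names FST-based outlier scans as the common baseline that the factor model is compared against. The sketch below shows that baseline only, not the Bayesian factor model in PCAdapt: a per-locus FST for a biallelic marker computed from heterozygosities. Equal weighting of populations and the function name are assumptions made for illustration.

```python
def fst(allele_freqs):
    """Per-locus FST = (H_T - H_S) / H_T for one biallelic marker.

    allele_freqs: frequency of the same allele in each population
    (populations weighted equally; a simplifying assumption).
    """
    n = len(allele_freqs)
    p_bar = sum(allele_freqs) / n
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0  # locus is monomorphic everywhere
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n
    return (h_total - h_within) / h_total


# Rank loci by differentiation; unusually high values are outlier candidates.
loci = {"snp1": [0.5, 0.5], "snp2": [0.2, 0.8], "snp3": [0.0, 1.0]}
for name in sorted(loci, key=lambda k: fst(loci[k]), reverse=True):
    print(name, round(fst(loci[name]), 3))
```

Note that this calculation needs individuals to be assigned to populations up front, which is exactly the requirement the abstract lists as a caveat of FST-based approaches.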


    2014, BMC Bioinformatics, A multivariate approach to the integration of multi-omics datasets

    May 2014
    BMC Bioinformatics

    A multivariate approach to the integration of multi-omics datasets



    Background

    To leverage the potential of multi-omics studies, exploratory data analysis methods that provide systematic integration and comparison of multiple layers of omics information are required. We describe multiple co-inertia analysis (MCIA), an exploratory data analysis method that identifies co-relationships between multiple high dimensional datasets. Based on a covariance optimization criterion, MCIA simultaneously projects several datasets into the same dimensional space, transforming diverse sets of features onto the same scale, to extract the most variant from each dataset and facilitate biological interpretation and pathway analysis.

    Results

    We demonstrate integration of multiple layers of information using MCIA, applied to two typical “omics” research scenarios. The integration of transcriptome and proteome profiles of cells in the NCI-60 cancer cell line panel revealed distinct, complementary features, which together increased the coverage and power of pathway analysis. Our analysis highlighted the importance of the leukemia extravasation signaling pathway in leukemia that was not highly ranked in the analysis of any individual dataset. Secondly, we compared transcriptome profiles of high grade serous ovarian tumors that were obtained on two different microarray platforms and next generation RNA-sequencing, to identify the most informative platform and extract robust biomarkers of molecular subtypes. We discovered that RNA-sequencing data processed using RPKM had greater variance than data processed with MapSplice and RSEM. We provided novel markers highly associated to tumor molecular subtype combined from four data platforms. MCIA is implemented and available in the R/Bioconductor “omicade4” package.

    Conclusion

    We believe MCIA is an attractive method for data integration and visualization of several datasets of multi-omics features observed on the same set of individuals. The method is not dependent on feature annotation, and thus it can extract important features even when they are not present across all datasets. MCIA provides simple graphical representations for the identification of relationships between large datasets.


    Article Link : http://www.biomedcentral.com/1471-2105/15/162


    2014, PNAS, Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses

    July 2014
    PNAS

    Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses


    Abstract
    Long-standing questions in marine viral ecology are centered on understanding how viral assemblages change along gradients in space and time. However, investigating these fundamental ecological questions has been challenging due to incomplete representation of naturally occurring viral diversity in single gene- or morphology-based studies and an inability to identify up to 90% of reads in viral metagenomes (viromes). Although protein clustering techniques provide a significant advance by helping organize this unknown metagenomic sequence space, they typically use only ∼75% of the data and rely on assembly methods not yet tuned for naturally occurring sequence variation. Here, we introduce an annotation- and assembly-free strategy for comparative metagenomics that combines shared k-mer and social network analyses (regression modeling). This robust statistical framework enables visualization of complex sample networks and determination of ecological factors driving community structure. Application to 32 viromes from the Pacific Ocean Virome dataset identified clusters of samples broadly delineated by photic zone and revealed that geographic region, depth, and proximity to shore were significant predictors of community structure. Within subsets of this dataset, depth, season, and oxygen concentration were significant drivers of viral community structure at a single open ocean station, whereas variability along onshore–offshore transects was driven by oxygen concentration in an area with an oxygen minimum zone and not depth or proximity to shore, as might be expected. Together these results demonstrate that this highly scalable approach using complete metagenomic network-based comparisons can both test and generate hypotheses for ecological investigation of viral and microbial communities in nature.



    Article Link:   http://www.pnas.org/content/early/2014/07/03/1319778111.abstract
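
The abstract describes an annotation- and assembly-free comparison based on shared k-mers. As a simplified illustration of that idea (not the authors' method), the sketch below scores two samples by the fraction of distinct k-mers they have in common (a Jaccard index). The function names, the default `k=4`, and the use of k-mer sets rather than per-read sharing counts are assumptions for the example.

```python
def kmers(reads, k):
    """Set of all distinct k-mers found in a collection of reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def shared_kmer_fraction(reads_a, reads_b, k=4):
    """Jaccard index of the k-mer sets of two samples (0 = none shared)."""
    a, b = kmers(reads_a, k), kmers(reads_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(shared_kmer_fraction(["ACGTAC"], ["CGTACG"], k=4))
```

Computing this score for every pair of samples gives a similarity matrix, which could serve as the edge weights of a sample network of the kind the abstract analyses.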

    2014, Genetics, A Bayesian Approach to Inferring the Phylogenetic Structure of Communities from Metagenomic Data

    2014/05 Genetics

    A Bayesian Approach to Inferring the Phylogenetic Structure of Communities from Metagenomic Data.


    Abstract
    Metagenomics provides a powerful new tool set for investigating evolutionary interactions with the environment. However, an absence of model-based statistical methods means that researchers are often not able to make full use of this complex information. We present a Bayesian method for inferring the phylogenetic relationship among related organisms found within metagenomic samples. Our approach exploits variation in the frequency of taxa among samples to simultaneously infer each lineage haplotype, the phylogenetic tree connecting them, and their frequency within each sample. Applications of the algorithm to simulated data show that our method can recover a substantial fraction of the phylogenetic structure even in the presence of high rates of migration among sample sites. We provide examples of the method applied to data from green sulfur bacteria recovered from an Antarctic lake, plastids from mixed Plasmodium falciparum infections, and virulent Neisseria meningitidis samples.


    Article URL:  http://www.genetics.org/content/197/3/925.long