Thursday, July 24, 2014

2013, Nature Biotechnology, Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples

Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples

Kristian Cibulskis, Michael S Lawrence, Scott L Carter, Andrey Sivachenko, David Jaffe, Carrie Sougnez, Stacey Gabriel, Matthew Meyerson, Eric S Lander & Gad Getz

Abstract

Detection of somatic point substitutions is a key step in characterizing the cancer genome. However, existing methods typically miss low-allelic-fraction mutations that occur in only a subset of the sequenced cells owing to either tumor heterogeneity or contamination by normal cells. Here we present MuTect, a method that applies a Bayesian classifier to detect somatic mutations with very low allele fractions, requiring only a few supporting reads, followed by carefully tuned filters that ensure high specificity. We also describe benchmarking approaches that use real, rather than simulated, sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.
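The classification step described in the abstract can be illustrated with a log-odds score that compares "a variant is present at allele fraction f" against "every non-reference base is a sequencing error". The following is a minimal sketch only, not MuTect's actual implementation: the per-read mixture likelihood, the choice of the observed alternate-read fraction as f, and the function names `base_likelihood` and `lod_score` are all assumptions made for illustration, and the post-classification filters are omitted.

```python
import math

def base_likelihood(base, ref, alt, f, err):
    """P(observed base | alt allele fraction f, per-base error probability err).

    Assumed model: a read carries the alt allele with probability f, and a
    sequencing error turns the true base into any of the 3 others uniformly.
    """
    if base == ref:
        return f * err / 3 + (1 - f) * (1 - err)
    if base == alt:
        return f * (1 - err) + (1 - f) * err / 3
    return err / 3

def lod_score(bases, quals, ref, alt):
    """log10 likelihood ratio of 'variant at observed fraction' vs. 'no variant'.

    bases: observed base calls at one site; quals: Phred qualities (assumed > 0).
    """
    f_hat = sum(b == alt for b in bases) / len(bases)
    lod = 0.0
    for b, q in zip(bases, quals):
        err = 10 ** (-q / 10)  # Phred quality -> error probability
        lod += math.log10(base_likelihood(b, ref, alt, f_hat, err))
        lod -= math.log10(base_likelihood(b, ref, alt, 0.0, err))
    return lod

# Hypothetical site: 30 reads, 3 of them supporting the alternate allele T.
score = lod_score("A" * 27 + "T" * 3, [30] * 30, ref="A", alt="T")
print(f"LOD at allele fraction 0.1: {score:.2f}")
```

In a sketch like this, a site would be called a candidate when the score exceeds a chosen threshold; with high base qualities even a few supporting reads can give a positive score, which is the intuition behind detecting low-allelic-fraction mutations.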

Link

Web: http://www.nature.com/nbt/journal/v31/n3/full/nbt.2514.html#methods

PDF: http://www.nature.com/nbt/journal/v31/n3/pdf/nbt.2514.pdf

Comment

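MuTect is the publicly released one among the somatic point mutation callers developed at the Broad Institute (the better ones are not public). The paper describes the tool's algorithm and explains why MuTect outperforms other tools. Frankly, there are few competing tools, and this is a field where performance is hard to benchmark (because of the amount of cancer data needed), so it is unclear how good its performance really is. (Note: this is the kind of tool our professor would like to build.)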

Wednesday, July 23, 2014

2008, Nucleic Acids Research, eggNOG: automated construction and annotation of orthologous groups of genes

eggNOG: automated construction and annotation of orthologous groups of genes

Lars Juhl Jensen¹, Philippe Julien¹, Michael Kuhn¹, Christian von Mering², Jean Muller¹, Tobias Doerks¹ and Peer Bork¹,³,*

¹European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany
²University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland
³Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany

Abstract

The identification of orthologous genes forms the basis for most comparative genomics studies. Existing approaches either lack functional annotation of the identified orthologous groups, hampering the interpretation of subsequent results, or are manually annotated and thus lag behind the rapid sequencing of new genomes. Here we present the eggNOG database (‘evolutionary genealogy of genes: Non-supervised Orthologous Groups’), which contains orthologous groups constructed from Smith–Waterman alignments through identification of reciprocal best matches and triangular linkage clustering. Applying this procedure to 312 bacterial, 26 archaeal and 35 eukaryotic genomes yielded 43 582 coarse-grained orthologous groups of which 9724 are extended versions of those from the original COG/KOG database. We also constructed more fine-grained groups for selected subsets of organisms, such as the 19 914 mammalian orthologous groups. We automatically annotated our non-supervised orthologous groups with functional descriptions, which were derived by identifying common denominators for the genes based on their individual textual descriptions, annotated functional categories, and predicted protein domains. The orthologous groups in eggNOG contain 1 241 751 genes and provide at least a broad functional description for 77% of them. Users can query the resource for individual genes via a web interface or download the complete set of orthologous groups at http://eggnog.embl.de.
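The two construction steps named in the abstract, reciprocal best matches and triangular linkage clustering, can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the eggNOG pipeline itself: genes are represented as hypothetical `(species, name)` tuples, alignment scores are supplied as a plain dictionary, triangles that share an edge are merged with a small union-find, and refinements such as in-paralog handling are left out. The function names are invented for this example.

```python
from itertools import combinations

def reciprocal_best_hits(scores):
    """scores[(a, b)] is the alignment score of query gene a against gene b
    from another species. Returns the set of reciprocal best-hit pairs."""
    best = {}  # (query gene, target species) -> (score, best target gene)
    for (a, b), s in scores.items():
        key = (a, b[0])
        if key not in best or s > best[key][0]:
            best[key] = (s, b)
    rbh = set()
    for (a, _species), (_score, b) in best.items():
        back = best.get((b, a[0]))
        if back is not None and back[1] == a:
            rbh.add(frozenset((a, b)))
    return rbh

def triangle_clusters(rbh):
    """Seed groups from triangles of reciprocal best hits, then merge any
    two triangles that share an edge (two genes)."""
    genes = sorted({g for pair in rbh for g in pair})
    triangles = [frozenset(t) for t in combinations(genes, 3)
                 if all(frozenset(p) in rbh for p in combinations(t, 2))]
    parent = {t: t for t in triangles}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path compression
            t = parent[t]
        return t

    for t1, t2 in combinations(triangles, 2):
        if len(t1 & t2) == 2:
            parent[find(t1)] = find(t2)

    groups = {}
    for t in triangles:
        groups.setdefault(find(t), set()).update(t)
    return list(groups.values())
```

Requiring a full triangle before seeding a group is stricter than linking single reciprocal pairs, which is presumably why it helps keep spurious pairwise matches out of the groups.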

Monday, July 21, 2014

2014, BMC Systems Biology, A bioinformatics approach reveals novel interactions of the OVOL transcription factors in the regulation of epithelial – mesenchymal cell reprogramming and cancer progression

A bioinformatics approach reveals novel interactions of the OVOL transcription factors in the regulation of epithelial – mesenchymal cell reprogramming and cancer progression

Hernan Roca¹, Manjusha Pande², Jeffrey S Huo³, James Hernandez⁴, James D Cavalcoli², Kenneth J Pienta⁴* and Richard C McEachin²*

Abstract

Background

Mesenchymal to Epithelial Transition (MET) plasticity is critical to cancer progression, and we recently showed that the OVOL transcription factors (TFs) are critical regulators of MET. Results of that work also posed the hypothesis that the OVOLs impact MET in a range of cancers. We now test this hypothesis by developing a model, OVOL Induced MET (OI-MET), and sub-model (OI-MET-TF), to characterize differential gene expression in MET common to prostate cancer (PC) and breast cancer (BC).

Results

In the OI-MET model, we identified 739 genes differentially expressed in both the PC and BC models. For this gene set, we found significant enrichment of annotation for BC, PC, cancer, and MET, as well as regulation of gene expression by AP1, STAT1, STAT3, and NFKB1. Focusing on the target genes for these four TFs plus the OVOLs, we produced the OI-MET-TF sub-model, which shows even greater enrichment for these annotations, plus significant evidence of cooperation among these five TFs. Based on known gene/drug interactions, we prioritized targets in the OI-MET-TF network for follow-on analysis, emphasizing the clinical relevance of this work. Reflecting these results back to the OI-MET model, we found that binding motifs for the TF pair AP1/MYC are more frequent than expected and that the AP1/MYC pair is significantly enriched in binding in cancer models, relative to non-cancer models, in these promoters. This effect is seen in both MET models (solid tumors) and in non-MET models (leukemia). These results are consistent with our hypothesis that the OVOLs impact cancer susceptibility by regulating MET, and extend the hypothesis to include mechanisms not specific to MET.

Conclusions

We find significant evidence of the OVOL, AP1, STAT1, STAT3, and NFKB1 TFs having important roles in MET, and more broadly in cancer. We prioritize known gene/drug targets for follow-up in the clinic, and we show that the AP1/MYC TF pair is a strong candidate for intervention.

Keywords:

Metastasis; Migration; Tumor progression; Systems biology; Transcription factors; Signal transduction; Therapeutics

Article Links :

http://www.biomedcentral.com/content/pdf/1752-0509-8-29.pdf

Wednesday, July 16, 2014

2014, Nature Reviews Cancer, Principles and methods of integrative genomic analyses in cancer

VOLUME 14 | MAY 2014
NATURE REVIEWS | CANCER

Principles and methods of integrative genomic analyses in cancer
Vessela N. Kristensen, Ole Christian Lingjærde, Hege G. Russnes,
Hans Kristian M. Vollan, Arnoldo Frigessi and Anne-Lise Børresen-Dale

Abstract | Combined analyses of molecular data, such as DNA copy-number alteration, mRNA and protein expression, point to biological functions and molecular pathways being deregulated in multiple cancers. Genomic, metabolomic and clinical data from various solid cancers and model systems are emerging and can be used to identify novel patient subgroups for tailored therapy and monitoring. The integrative genomics methodologies that are used to interpret these data require expertise in different disciplines, such as biology, medicine, mathematics, statistics and bioinformatics, and they can seem daunting. The objectives, methods and computational tools of integrative genomics that are available to date are reviewed here, as is their implementation in cancer research.

Article links :
http://www.nature.com/nrc/journal/v14/n5/abs/nrc3721.html

Comment :
An overview of the cancer integrative genomics research carried out to date.

2009, EM, Metagenomic signatures of 86 microbial and viral metagenomes

Metagenomic signatures of 86 microbial and viral metagenomes

Abstract

Previous studies have shown that dinucleotide abundances capture the majority of variation in genome signatures and are useful for quantifying lateral gene transfer and building molecular phylogenies. Metagenomes contain a mixture of individual genomes, and might be expected to lack compositional signatures. In many metagenomic data sets the majority of sequences have no significant similarities to known sequences and are effectively excluded from subsequent analyses. To circumvent this limitation, di-, tri- and tetranucleotide abundances of 86 microbial and viral metagenomes consisting of short pyrosequencing reads were analysed to provide a method which includes all sequences that can be used in combination with other analysis to increase our knowledge about microbial and viral communities. Both principal component analysis and hierarchical clustering showed definitive groupings of metagenomes drawn from similar environments. Together these analyses showed that dinucleotide composition, as opposed to tri- and tetranucleotides, defines a metagenomic signature which can explain up to 80% of the variance between biomes, which is comparable to that obtained by functional genomics. Metagenomes with anomalous content were also identified using dinucleotide abundances. Subsequent analyses determined that these metagenomes were contaminated with exogenous DNA, suggesting that this approach is a useful metric for quality control. The predictive strength of the dinucleotide composition also opens the possibility of assigning ecological classifications to unknown fragments. Environmental selection may be responsible for this dinucleotide signature through direct selection of specific compositional signals; however, simulations suggest that the environment may select indirectly by promoting the increased abundance of a few dominant taxa.

Comment: This paper looks for signatures of 86 metagenomes using GC content and di-, tri- and tetranucleotide frequencies. The dinucleotide frequency turned out to reflect the environment best. This should make faster detection of metagenome contamination possible.
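As a rough illustration of the kind of signature the paper works with, the sketch below pools all reads of one metagenome and computes the relative abundance of the 16 dinucleotides. It is only a minimal example under assumptions: plain relative frequencies are used (the paper's exact normalisation is not given here), ambiguous bases are skipped, and the function name `dinucleotide_frequencies` is hypothetical. The resulting 16-value vectors are what one would then feed into principal component analysis or hierarchical clustering.

```python
from collections import Counter
from itertools import product

def dinucleotide_frequencies(reads):
    """Relative abundance of the 16 dinucleotides pooled over all reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - 1):
            pair = read[i:i + 2]
            if set(pair) <= set("ACGT"):  # skip N and other ambiguity codes
                counts[pair] += 1
    total = sum(counts.values())
    return {a + b: (counts[a + b] / total if total else 0.0)
            for a, b in product("ACGT", repeat=2)}

# Hypothetical toy "metagenome" of two short reads.
signature = dinucleotide_frequencies(["ACGT", "ACNA"])
print(signature["AC"], signature["CG"])
```

Because every read contributes regardless of whether it matches a known sequence, a signature like this uses the whole data set, which is the property the abstract emphasises.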

2014, MBE, Genome Scans for Detecting Footprints of Local Adaptation Using a Bayesian Factor Model

June, 2014
Molecular Biology and Evolution

Genome Scans for Detecting Footprints of Local Adaptation Using a Bayesian Factor Model

Abstract

There is a considerable impetus in population genomics to pinpoint loci involved in local adaptation. A powerful approach to find genomic regions subject to local adaptation is to genotype numerous molecular markers and look for outlier loci. One of the most common approaches for selection scans is based on statistics that measure population differentiation such as F_ST. However, there are important caveats with approaches related to F_ST because they require grouping individuals into populations and they additionally assume a particular model of population structure. Here, we implement a more flexible individual-based approach based on Bayesian factor models. Factor models capture population structure with latent variables called factors, which can describe clustering of individuals into populations or isolation-by-distance patterns. Using hierarchical Bayesian modeling, we both infer population structure and identify outlier loci that are candidates for local adaptation. In order to identify outlier loci, the hierarchical factor model searches for loci that are atypically related to population structure as measured by the latent factors. In a model of population divergence, we show that it can achieve a 2-fold or more reduction of false discovery rate compared with the software BayeScan or with an F_ST approach. We show that our software can handle large data sets by analyzing the single nucleotide polymorphisms of the Human Genome Diversity Project. The Bayesian factor model is implemented in the open-source PCAdapt software.

Article Link : http://mbe.oxfordjournals.org/content/early/2014/06/24/molbev.msu182.long

2014, BMC Bioinformatics, A multivariate approach to the integration of multi-omics datasets

May 2014
BMC Bioinformatics

A multivariate approach to the integration of multi-omics datasets

Background

To leverage the potential of multi-omics studies, exploratory data analysis methods that provide systematic integration and comparison of multiple layers of omics information are required. We describe multiple co-inertia analysis (MCIA), an exploratory data analysis method that identifies co-relationships between multiple high dimensional datasets. Based on a covariance optimization criterion, MCIA simultaneously projects several datasets into the same dimensional space, transforming diverse sets of features onto the same scale, to extract the most variant from each dataset and facilitate biological interpretation and pathway analysis.

Results

We demonstrate integration of multiple layers of information using MCIA, applied to two typical “omics” research scenarios. The integration of transcriptome and proteome profiles of cells in the NCI-60 cancer cell line panel revealed distinct, complementary features, which together increased the coverage and power of pathway analysis. Our analysis highlighted the importance of the leukemia extravasation signaling pathway in leukemia that was not highly ranked in the analysis of any individual dataset. Secondly, we compared transcriptome profiles of high grade serous ovarian tumors that were obtained on two different microarray platforms and next generation RNA-sequencing, to identify the most informative platform and extract robust biomarkers of molecular subtypes. We discovered that RNA-sequencing data processed using RPKM had greater variance than data processed with MapSplice and RSEM. We provided novel markers highly associated to tumor molecular subtype combined from four data platforms. MCIA is implemented and available in the R/Bioconductor “omicade4” package.

Conclusion

We believe MCIA is an attractive method for data integration and visualization of several datasets of multi-omics features observed on the same set of individuals. The method is not dependent on feature annotation, and thus it can extract important features even when they are not present across all datasets. MCIA provides simple graphical representations for the identification of relationships between large datasets.

Article Link : http://www.biomedcentral.com/1471-2105/15/162

2014, PNAS, Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses

July 2014
PNAS

Modeling ecological drivers in marine viral communities using comparative metagenomics and network analyses

Abstract

Long-standing questions in marine viral ecology are centered on understanding how viral assemblages change along gradients in space and time. However, investigating these fundamental ecological questions has been challenging due to incomplete representation of naturally occurring viral diversity in single gene- or morphology-based studies and an inability to identify up to 90% of reads in viral metagenomes (viromes). Although protein clustering techniques provide a significant advance by helping organize this unknown metagenomic sequence space, they typically use only ∼75% of the data and rely on assembly methods not yet tuned for naturally occurring sequence variation. Here, we introduce an annotation- and assembly-free strategy for comparative metagenomics that combines shared k-mer and social network analyses (regression modeling). This robust statistical framework enables visualization of complex sample networks and determination of ecological factors driving community structure. Application to 32 viromes from the Pacific Ocean Virome dataset identified clusters of samples broadly delineated by photic zone and revealed that geographic region, depth, and proximity to shore were significant predictors of community structure. Within subsets of this dataset, depth, season, and oxygen concentration were significant drivers of viral community structure at a single open ocean station, whereas variability along onshore–offshore transects was driven by oxygen concentration in an area with an oxygen minimum zone and not depth or proximity to shore, as might be expected. Together these results demonstrate that this highly scalable approach using complete metagenomic network-based comparisons can both test and generate hypotheses for ecological investigation of viral and microbial communities in nature.
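The annotation- and assembly-free comparison can be pictured as building a sample-by-sample similarity network directly from shared k-mers. The sketch below is a deliberately simplified stand-in, not the authors' pipeline: it scores each ordered pair of samples by the fraction of one sample's distinct k-mers that also occur in the other, uses a tiny k for readability, and omits the regression modelling of ecological factors. The names `kmer_set` and `shared_kmer_network` are hypothetical.

```python
def kmer_set(reads, k):
    """All distinct k-mers occurring in a collection of reads."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}

def shared_kmer_network(samples, k=4):
    """Directed edge weights between samples.

    samples: dict mapping sample name -> list of reads.
    Weight (a, b) = fraction of a's distinct k-mers also found in b.
    """
    sets = {name: kmer_set(reads, k) for name, reads in samples.items()}
    edges = {}
    for a in sets:
        for b in sets:
            if a != b:
                shared = len(sets[a] & sets[b])
                edges[(a, b)] = shared / len(sets[a]) if sets[a] else 0.0
    return edges

# Hypothetical toy viromes.
edges = shared_kmer_network({"surface": ["ACGTA"], "deep": ["ACGTT"]}, k=4)
print(edges)
```

The edge weights form an asymmetric matrix over samples; in the approach the abstract describes, a matrix of this kind is what the network analysis and regression on depth, season or oxygen concentration would operate on.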

Article Link: http://www.pnas.org/content/early/2014/07/03/1319778111.abstract

2014, Genetics, A Bayesian Approach to Inferring the Phylogenetic Structure of Communities from Metagenomic Data

2014/05 Genetics

A Bayesian Approach to Inferring the Phylogenetic Structure of Communities from Metagenomic Data.

Abstract

Metagenomics provides a powerful new tool set for investigating evolutionary interactions with the environment. However, an absence of model-based statistical methods means that researchers are often not able to make full use of this complex information. We present a Bayesian method for inferring the phylogenetic relationship among related organisms found within metagenomic samples. Our approach exploits variation in the frequency of taxa among samples to simultaneously infer each lineage haplotype, the phylogenetic tree connecting them, and their frequency within each sample. Applications of the algorithm to simulated data show that our method can recover a substantial fraction of the phylogenetic structure even in the presence of high rates of migration among sample sites. We provide examples of the method applied to data from green sulfur bacteria recovered from an Antarctic lake, plastids from mixed Plasmodium falciparum infections, and virulent Neisseria meningitidis samples.

Article URL: http://www.genetics.org/content/197/3/925.long

Journal Readers @ Lab of Evolutionary Bioinformatics
