Analysis of Functional Annotation and Functional Gene Set description

The table below briefly describes each group of Functional Gene Sets (FGS) used to perform the Analysis of Functional Annotation (AFA), such as Gene Ontology or KEGG pathways, along with its source or database of origin.

The AFA is usually performed with a one-sided Wilcoxon rank-sum test to identify the biological concepts associated with the phenotypes and comparisons of interest. For each FGS, defined by a specific functional annotation, the test computes a p-value for the hypothesis that the genes in the set tend to rank higher in an ordered gene list than the remaining genes. In general, the individual genes on the arrays are ranked by either their absolute or their signed moderated t-statistics.
Ranking by the absolute moderated t-statistics enables the investigation of gene set enrichment irrespective of the direction of differential expression (up- or down-regulation), while ranking by the signed t-statistics enables the investigation of enrichment driven specifically by up-regulated or by down-regulated genes.
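
The ranking and testing step described above can be sketched as follows. This is a minimal illustration, not the implementation used to produce the results: it uses the normal approximation to the rank-sum statistic with a tie correction and no continuity correction, and the gene names, t-statistics, and function names (midranks, rank_sum_greater) are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values in ascending order (1-based), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_sum_greater(scores, in_set):
    """One-sided Wilcoxon rank-sum p-value (normal approximation).

    Tests whether the genes flagged in `in_set` tend to have larger
    scores than the remaining genes.
    """
    n1 = sum(1 for flag in in_set if flag)
    n2 = len(scores) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both the gene set and its complement must be non-empty")
    ranks = midranks(scores)
    w = sum(r for r, flag in zip(ranks, in_set) if flag)  # rank sum of the set
    u = w - n1 * (n1 + 1) / 2.0                           # Mann-Whitney U
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(scores).values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return 1.0  # all scores tied: no evidence of enrichment
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))            # upper-tail probability


# Hypothetical moderated t-statistics and one hypothetical gene set.
t_stats = {"g1": 4.2, "g2": -3.9, "g3": 3.1, "g4": 0.3, "g5": -0.2,
           "g6": 0.8, "g7": -1.1, "g8": 0.1, "g9": 2.7, "g10": -0.5}
gene_set = {"g1", "g2", "g3"}

genes = sorted(t_stats)
flags = [g in gene_set for g in genes]

p_abs = rank_sum_greater([abs(t_stats[g]) for g in genes], flags)   # either direction
p_up = rank_sum_greater([t_stats[g] for g in genes], flags)         # up-regulation
p_down = rank_sum_greater([-t_stats[g] for g in genes], flags)      # down-regulation
```

In this toy example the set mixes strongly up- and down-regulated genes, so the absolute ranking yields a much smaller p-value than either signed ranking.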

All statistical tests use all annotated genes as the reference population.
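
One way to build such a reference population is sketched below. It assumes that "annotated" means a gene belongs to at least one FGS in the collection being tested; this reading, the function name restrict_to_annotated, and all gene and set names are assumptions made for illustration only.

```python
def restrict_to_annotated(t_stats, gene_sets):
    """Keep only measured genes that appear in at least one gene set.

    Returns the reduced statistics and the gene sets trimmed to the
    genes that were actually measured.
    """
    annotated = set().union(*gene_sets.values())
    universe = {g: t for g, t in t_stats.items() if g in annotated}
    trimmed = {name: members & set(universe) for name, members in gene_sets.items()}
    return universe, trimmed


# Hypothetical input: g4 and g5 have no annotation, gX was not measured.
t_stats = {"g1": 2.5, "g2": -1.7, "g3": 0.4, "g4": 3.3, "g5": -0.9}
gene_sets = {"setA": {"g1", "g2", "gX"}, "setB": {"g2", "g3"}}

universe, trimmed = restrict_to_annotated(t_stats, gene_sets)
```

Here the reference population shrinks to g1, g2, and g3, and each gene set is tested only against those genes.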

After the statistical tests are performed, the false discovery rate is controlled (correction for multiple hypothesis testing) by applying the Benjamini and Hochberg method.
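
The Benjamini and Hochberg adjustment can be written in a few lines. The sketch below is a generic step-up implementation given for illustration; the p-values are hypothetical and the function name benjamini_hochberg is not taken from the analysis code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are monotone non-decreasing in the original p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per gene set.
p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
```

A gene set is then reported as enriched when its adjusted p-value falls below the chosen false discovery rate threshold.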





| FGS scope name | FGS scope description | FGS scope source |
| --- | --- | --- |
| allPPIa2hsa | Protein-protein interaction data from the Entrez Gene database | NCBI Entrez Gene |
| c1allV2V5 | Gene sets corresponding to each human chromosome and each cytogenetic band that has at least one gene | Molecular Signatures Database (MSigDB) |
| c2allV2V5 | Gene sets collected from various sources such as online pathway databases, publications in PubMed, and knowledge of domain experts | Molecular Signatures Database (MSigDB) |
| c2biocartaV2V5 | Gene sets collected from BioCarta | Molecular Signatures Database (MSigDB) |
| c2cgpV2V5 | Gene sets that represent gene expression signatures of genetic and chemical perturbations | Molecular Signatures Database (MSigDB) |
| c2cpV2V5 | Gene sets from the pathway databases. Usually, these gene sets are canonical representations of a biological process compiled by domain experts | Molecular Signatures Database (MSigDB) |
| c2genmappV2V5 | Gene sets collected from GenMAPP | Molecular Signatures Database (MSigDB) |
| c2keggV2V5 | Gene sets collected from KEGG | Molecular Signatures Database (MSigDB) |
| c3allV2V5 | Gene sets that contain genes that share a cis-regulatory motif that is conserved across the human, mouse, rat, and dog genomes. The motifs are catalogued in Xie et al. (2005, Nature 434, 338–345) and represent known or likely regulatory elements in promoters and 3'-UTRs | Molecular Signatures Database (MSigDB) |
| c3mirV2V5 | Gene sets that contain genes that share a 3'-UTR microRNA binding motif (see c3allV2V5 above) | Molecular Signatures Database (MSigDB) |
| c3tftV2V5 | Gene sets that contain genes that share a transcription factor binding site defined in the TRANSFAC database (version 7.4) | Molecular Signatures Database (MSigDB) |
| c4allV2V5 | Computational gene sets defined by mining large collections of cancer-oriented microarray data | Molecular Signatures Database (MSigDB) |
| c4cgnV2V5 | Gene sets defined by expression neighborhoods centered on 380 cancer-associated genes (Brentani, Caballero et al. 2003) | Molecular Signatures Database (MSigDB) |
| c4cmV2V5 | Gene sets defined by Segal et al. (Nature Genetics 36, 1090-1098, 2004). Briefly, the authors compiled gene sets ('modules') from a variety of resources such as KEGG, GO, and others. By mining a large compendium of cancer-related microarray data, they identified 456 such modules as significantly changed in a variety of cancer conditions | Molecular Signatures Database (MSigDB) |
| c5allV2V5 | All Gene Ontology terms together; gene sets are named by GO term and contain genes annotated by that term | Molecular Signatures Database (MSigDB) |
| c5bpV2V5 | Gene sets derived from the Biological Process Gene Ontology (see guidelines) | Molecular Signatures Database (MSigDB) |
| c5mfV2V5 | Gene sets derived from the Molecular Function Gene Ontology (see guidelines) | Molecular Signatures Database (MSigDB) |
| c5ccV2V5 | Gene sets derived from the Cellular Component Gene Ontology (see guidelines) | Molecular Signatures Database (MSigDB) |
| cancerModulesSegal | Cancer module gene sets from Eran Segal's lab | Rob Tibshirani webpage |
| ChromosomeArms | Chromosome arms from Stanford Microarray Database; a description is available on the Synthetic genes page on SMD | Rob Tibshirani webpage |
| cytobandsStanford | Cytobands from Stanford Microarray Database; a description is available on the Synthetic genes page on SMD | Rob Tibshirani webpage |
| mb5ChromosomalTiles | 5 Mb chromosomal tiles from Stanford Microarray Database; a description is available on the Synthetic genes page on SMD | Rob Tibshirani webpage |
| ENZYME | Enzyme Commission number; these FGS group genes annotated to the same Enzyme Commission number, as obtained from the R/Bioconductor metadata package | Bioconductor metadata annotation packages |
| GO | Gene Ontology; FGS defined based on the Gene Ontology annotation obtained from the R/Bioconductor metadata package (complete collection, since each gene inherits all parent GO terms) | Bioconductor metadata annotation packages |
| KEGG | KEGG; FGS defined based on the KEGG pathway annotation obtained from the R/Bioconductor metadata package | Bioconductor metadata annotation packages |
| PMID | PubMed ID; FGS defined based on the PubMed identifiers obtained from the R/Bioconductor metadata package | Bioconductor metadata annotation packages |
| processes | Cellular processes gene sets from Stanford Microarray Database; a description is available on the Synthetic genes page on SMD | Rob Tibshirani webpage |
| tfbsK15Z164EGID | Transcription factor binding site (TFBS); the genes in this collection have a TFBS in the genomic region around their transcription start site (TSS). The genomic window considered spans from 15 kb before the TSS to 15 kb after the TSS, with a Z-score for conservation of 1.64, corresponding to a False Discovery Rate of 5%. Details are available from the UCSC Genome Browser | Based on TRANSFAC TFBS tracks available through UCSC Genome Browser |
| tfbsK15Z233EGID | Transcription factor binding site (TFBS); the genes in this collection have a TFBS in the genomic region around their transcription start site (TSS). The genomic window considered spans from 15 kb before the TSS to 15 kb after the TSS, with a Z-score for conservation of 2.33, corresponding to a False Discovery Rate of 1%. Details are available from the UCSC Genome Browser | Based on TRANSFAC TFBS tracks available through UCSC Genome Browser |
| tfbsK5Z164EGID | Transcription factor binding site (TFBS); the genes in this collection have a TFBS in the genomic region around their transcription start site (TSS). The genomic window considered spans from 5 kb before the TSS to 5 kb after the TSS, with a Z-score for conservation of 1.64, corresponding to a False Discovery Rate of 5%. Details are available from the UCSC Genome Browser | Based on TRANSFAC TFBS tracks available through UCSC Genome Browser |
| pathsNetPathEGID | Pathway members; the genes in this collection are signaling pathway members. The gene lists were manually curated and are available from the HPRD database | Based on manually curated pathways that are part of the NetPath initiative |
| upNetPathEGID | Genes up-regulated by pathway activation; the genes in this collection are the up-regulated targets induced by the activation of the signaling pathway. The gene lists were manually curated from evidence available in the literature and are available from the HPRD database | Based on manually curated pathways that are part of the NetPath initiative |
| dwNetPathEGID | Genes down-regulated by pathway activation; the genes in this collection are the down-regulated targets induced by the activation of the signaling pathway. The gene lists were manually curated from evidence available in the literature and are available from the HPRD database | Based on manually curated pathways that are part of the NetPath initiative |
| updwNetPathEGID | Genes up- and down-regulated by pathway activation; the genes in this collection are the up- and down-regulated targets induced by the activation of the signaling pathway. The gene lists were manually curated from evidence available in the literature and are available from the HPRD database | Based on manually curated pathways that are part of the NetPath initiative |
| tissues | Tissue gene sets from Stanford Microarray Database; a description is available on the Synthetic genes page on SMD | Rob Tibshirani webpage |
| mirInt2X | Gene sets that contain genes that share a 3'-UTR microRNA binding motif (see c3allV2V5 above). Gene sets in this collection account for the INTERSECTION of targets predicted by TWO programs (PicTar, miRanda, DIANA-microT, or TargetScanS). Intersections can provide more specific searches; see the miRGen target web page | miRGen database |
| mirInt3X | Gene sets that contain genes that share a 3'-UTR microRNA binding motif (see c3allV2V5 above). Gene sets in this collection account for the INTERSECTION of targets predicted by THREE programs (PicTar, miRanda, DIANA-microT, or TargetScanS). Intersections can provide more specific searches; see the miRGen target web page | miRGen database |
| mirUnion | Gene sets that contain genes that share a 3'-UTR microRNA binding motif (see c3allV2V5 above). Gene sets in this collection account for the UNION of targets predicted by the programs (PicTar, miRanda, DIANA-microT, and TargetScanS). The union can provide more sensitive searches since the gene lists are larger; see the miRGen target web page | miRGen database |
| miranda | miRNA targets as predicted by the miRanda algorithm; FGS were obtained from the R/Bioconductor 'microRNA' package | Bioconductor 'microRNA' package |
| mirbase | miRNA targets as obtained from the miRBase database; FGS were obtained from the R/Bioconductor 'microRNA' package | Bioconductor 'microRNA' package |
| mirtarget2 | miRNA targets as obtained from the MirTarget2 algorithm; FGS were obtained from the R/Bioconductor 'microRNA' package | Bioconductor 'microRNA' package |
| pictar | miRNA targets as obtained from the PicTar algorithm; FGS were obtained from the R/Bioconductor 'microRNA' package | Bioconductor 'microRNA' package |
| tarbase | miRNA targets as obtained from the TarBase database; FGS were obtained from the R/Bioconductor 'microRNA' package | Bioconductor 'microRNA' package |
| targetscan | miRNA targets as obtained from the TargetScan algorithm; FGS were obtained from the R/Bioconductor 'microRNA' package | Bioconductor 'microRNA' package |
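
The Z-score cut-offs quoted for the tfbs collections above (1.64 and 2.33) can be related to the 5% and 1% figures with a short calculation. The sketch below assumes that these percentages correspond to the one-sided upper tail of a standard normal distribution; the table does not state how the thresholds were derived, so this is an interpretation rather than a documented fact, and the function name upper_tail is hypothetical.

```python
import math


def upper_tail(z):
    """One-sided upper-tail probability of a standard normal at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Conservation Z-score thresholds used in the tfbs collection names.
tail_z164 = upper_tail(1.64)  # close to 0.05
tail_z233 = upper_tail(2.33)  # close to 0.01
```

Under this assumption, the Z164 collections use the more permissive conservation threshold and the Z233 collections the stricter one.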