Development of this software was supported in part by the ENDGAME (Enhancing Development of Genome-wide Association Methods) project U01 CA 125489, "Dissecting complex traits with diverse resources." Investigators on this project are James Dai, Li Hsu, Charles Kooperberg, Michael LeBlanc, Hua Tang, and Yingye Zheng.
Charles Kooperberg has developed code for approximate power calculations for identifying gene x gene and gene x environment interactions in genome-wide association studies using a two-stage analysis. The code is bundled in the R package powerGWASinteraction, available from CRAN.
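To give a rough sense of why a two-stage analysis can help with interaction testing, the following Python sketch compares the multiple-testing burden of an exhaustive pairwise search with that of a search restricted to SNPs retained by a first stage. It is not the powerGWASinteraction calculation. It assumes, hypothetically, that the first stage is a screen that passes a fixed number of SNPs and that a Bonferroni correction is applied in the second stage. The function name and the SNP counts are made up for illustration.

```python
from math import comb


def bonferroni_thresholds(n_snps, n_passing, alpha=0.05):
    """Per-test Bonferroni thresholds for pairwise interaction tests.

    Assumption (not from powerGWASinteraction): stage 1 keeps `n_passing`
    of the `n_snps` SNPs, and stage 2 tests every pair among those kept.
    Both counts must be at least 2 so that there is at least one pair.
    """
    all_pairs = comb(n_snps, 2)          # exhaustive search over every SNP pair
    screened_pairs = comb(n_passing, 2)  # pairs among SNPs that pass stage 1
    return {
        "all_pairs": all_pairs,
        "screened_pairs": screened_pairs,
        "threshold_all": alpha / all_pairs,
        "threshold_screened": alpha / screened_pairs,
    }


# Hypothetical study: 500,000 genotyped SNPs, 1,000 retained after stage 1.
result = bonferroni_thresholds(n_snps=500_000, n_passing=1_000)
for name, value in result.items():
    print(name, value)
```

The per-test threshold after screening is several orders of magnitude less stringent than for the exhaustive search. An actual power calculation must also account for the chance that a truly interacting pair fails the first-stage screen, which this sketch ignores.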
Hua Tang has a variety of genomic software programs on her web site. In particular, the SABER program, a computationally efficient, R-based program that infers locus-specific ancestry in admixed individuals, taking into account background LD within ancestral populations, was developed as part of our ENDGAME project.
Li Hsu has developed a program, hybrid.r, which provides simultaneous estimation of the effects of environmental risk factors, candidate genes, and their interactions. The program outputs log-odds ratio estimates, standard error estimates, and p-values for all covariates using data on case families only, case-unrelated controls only, and combined case families and unrelated controls.
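For readers unfamiliar with the three quantities reported for each covariate, the Python sketch below computes a log-odds ratio, its standard error, and a two-sided Wald p-value for a single binary exposure in a simple 2x2 case-control table. This is the textbook calculation, not the estimation method used by hybrid.r, which also handles family data and multiple covariates. The function name and the counts are hypothetical.

```python
from math import erfc, log, sqrt


def log_odds_ratio(exposed_cases, unexposed_cases,
                   exposed_controls, unexposed_controls):
    """Return (log odds ratio, standard error, two-sided Wald p-value).

    Assumes all four cell counts are positive; illustrative only.
    """
    log_or = log((exposed_cases * unexposed_controls)
                 / (unexposed_cases * exposed_controls))
    # Large-sample standard error of the log odds ratio.
    se = sqrt(1 / exposed_cases + 1 / unexposed_cases
              + 1 / exposed_controls + 1 / unexposed_controls)
    z = log_or / se
    # Two-sided p-value under a standard normal reference distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return log_or, se, p_value


# Hypothetical counts: 20 of 100 cases and 10 of 100 controls are exposed.
estimate, std_err, p = log_odds_ratio(20, 80, 10, 90)
print(f"log OR = {estimate:.3f}, SE = {std_err:.3f}, p = {p:.3f}")
```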
Mike LeBlanc has developed software for Adaptively Weighted Association Statistics (AWAS). This program implements adaptive selection and weighting to potentially improve the power of tests of association between genetic factors and disease outcome. The strategy is based on the often plausible assumption that genetic associations may be stronger within subgroups of subjects in epidemiologic or clinical studies. The least angle regression (LAR) method (Efron et al., 2004) is used to adaptively select or weight the score test statistics.
James Dai has developed software for SNP-Haplotype Adaptive Regression (SHARE) to perform multi-locus analysis that accounts for the LD patterns observed in the human genome. The challenge is to choose a model that exploits the local dependence of SNPs without incurring too many parameters. SHARE uses a novel strategy, set in a statistical learning framework, to select an optimal set of SNPs that captures the genetic association in the targeted region. The model search process resembles CART. Depending on the evolutionary history of the disease mutation and the markers, the optimal set may contain a single SNP, or several SNPs that lay the foundation for a haplotype analysis. The algorithm is implemented in the R package SHARE, available from CRAN.
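The trade-off between exploiting local dependence and limiting the number of parameters can be illustrated with a small Python sketch. It counts how many distinct haplotypes remain when only a subset of SNPs is kept, since each distinct haplotype roughly corresponds to one parameter in a haplotype-based model. This is not the SHARE algorithm: it enumerates every subset instead of running a CART-like search, and the phased haplotypes and function names are made up for illustration.

```python
from itertools import combinations

# Hypothetical phased haplotypes over four SNPs (0/1 = allele).
# SNPs 0 and 1 are in perfect LD with each other, as are SNPs 2 and 3.
haplotypes = [
    (0, 0, 0, 0),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 1, 1),
    (0, 0, 0, 0),
    (1, 1, 1, 1),
]


def distinct_haplotypes(haps, snp_subset):
    """Count distinct haplotypes when only the SNPs in snp_subset are kept."""
    return len({tuple(h[i] for i in snp_subset) for h in haps})


def subset_sizes(haps):
    """Map every non-empty SNP subset to its number of distinct haplotypes."""
    n_snps = len(haps[0])
    return {
        subset: distinct_haplotypes(haps, subset)
        for size in range(1, n_snps + 1)
        for subset in combinations(range(n_snps), size)
    }


counts = subset_sizes(haplotypes)
for subset, n_distinct in counts.items():
    print(subset, n_distinct)
```

In this toy data set, SNPs 0 and 2 together distinguish as many haplotypes as all four SNPs do, so a two-SNP model loses no haplotype information, while adding SNP 1 to SNP 0 adds nothing.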