-----------------------------------------------------------------------------------------------------------------------------------------
Alabama Genomic Health Initiative (AGHIpop)
-----------------------------------------------------------------------------------------------------------------------------------------

Analytical pipeline for analyzing data from the Alabama Genomic Health Initiative population-based cohort. It is a tool for performing Genome-Wide Association Study (GWAS) analysis. The pipeline uses PLINK, an open-source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner. Association analysis is run on both the raw genotype data and the imputed data, and a GWAS summary report is generated that includes the following results:

1. Manhattan plots
2. Q-Q plots
3. Tables of the top 10 or significant variants and their closest genes
4. A LocusZoom plot for the variant with the lowest P value
5. A table for the meta-analysis
6. A table of the top 10 or significant genes from the gene-based analysis
7. Gene-set analysis results

# Getting Started

1. Copy the `copy_me` directory to the location where you want to run the GWAS.
2. Fill in the `GWAS.config` file with the required details, based on the number of studies and populations.
3. Create the directory from which you want to launch the pipeline, and give its path in the config file.
4. Place all of your study-specific phenotype files in that directory.

# Prerequisites

## PLINK software and version

PLINK, an open-source whole-genome association analysis toolset designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

## R version and libraries

This pipeline was adapted for the module `R/3.3.1-foss-2016b` on the UAB Cheaha cluster.
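Because the pipeline is tied to R 3.3.1 (see Common usage errors), it can help to confirm which R is active before submitting a job. The following is a minimal POSIX shell sketch, not part of the pipeline itself. The function name `is_expected_r_version` is hypothetical, and the check assumes that the first line printed by `R --version` has the form `R version 3.3.1 (...)`.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with AGHIpop.
# Reads `R --version` output on stdin and succeeds only if it reports R 3.3.1.
is_expected_r_version() {
  # Accept a first line even if it lacks a trailing newline.
  IFS= read -r first_line || [ -n "$first_line" ] || return 1
  case "$first_line" in
    "R version 3.3.1 "*) return 0 ;;  # trailing space rules out e.g. 3.3.10
    *) return 1 ;;
  esac
}

# Usage on Cheaha (commented out so sourcing this file has no side effects;
# assumes the standard `module load` command):
# module load R/3.3.1-foss-2016b
# R --version | is_expected_r_version || echo "R 3.3.1 expected; check the loaded module" >&2
```

Matching on the full `3.3.1 ` prefix, rather than just the major version, reflects the README's warning that the pipeline was written against this specific release.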
All dependent R libraries have been installed, and their library path is registered with:

`.libPaths("/data/project/ubrite/pipelines/AGHIpop/softwares/R_3.3.1_packages")`

# Running the Analysis

## Sample usage

Working directory: `/data/project/ubrite/pipelines/AGHIpop/example`. The results will be saved to `/data/project/ubrite/pipelines/AGHIpop/example/dummy/myWD` and `/data/project/ubrite/pipelines/AGHIpop/example/dummy/myRT`.

Copy the following command to launch the pipeline:

`sbatch /data/project/ubrite/pipelines/AGHIpop/GWAS_Pip/GWAS_Pipeline_main_V2.job GWAS.config`

Job status can be checked with the following command, where `[user-id]` is your BlazerID:

`squeue -u [user-id]`

## Result data files

1. The result data files used to produce the tables and figures in the report are stored in the result directory. The report file `Pipeline_PredmGWAS_[trait_name].html`, however, is always located in the working directory.
2. The result subdirectories `chr1` to `chr22` store the gene-based analysis results chromosome by chromosome, and the subdirectory `allchrs` holds the gene-based analysis for all 22 chromosomes and the gene-set analysis.

## Common usage errors

1. This pipeline was written for R/3.3.1-foss-2016b, so be careful when updating the R version.

## Study data requirements

* Phenotype data: the phenotype file must contain the required columns in the following format: `FID IID TRAIT CO-VARIATE1 CO-VARIATE2 CO-VARIATE3 ...`
* The TRAIT column must always be the third column, followed by the covariates.
* Only the 22 autosomal chromosomes are used in the analysis.

## System issues

So far, only the R/3.3.1-foss-2016b module on the Cheaha server is equipped with most of the required libraries, such as `knitr` and `rmarkdown`, and with comprehensive graphics support.
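The Getting Started steps can be scripted roughly as below. This is a hypothetical POSIX shell sketch, not the pipeline's own tooling: `prepare_gwas_run` and its arguments are illustrative names, the location of `copy_me` is not given in this README, and the sketch assumes `copy_me` contains `GWAS.config`. Filling in `GWAS.config` is left as a manual step because its keys are not documented here.

```shell
#!/bin/sh
# Hypothetical helper for the Getting Started steps.
# prepare_gwas_run COPY_ME_DIR RUN_DIR LAUNCH_DIR PHENO_FILE...
prepare_gwas_run() {
  if [ "$#" -lt 3 ]; then
    echo "usage: prepare_gwas_run COPY_ME_DIR RUN_DIR LAUNCH_DIR PHENO_FILE..." >&2
    return 2
  fi
  copy_me_dir=$1 run_dir=$2 launch_dir=$3
  shift 3
  # Assumption: the template directory ships a GWAS.config to fill in.
  if [ ! -f "$copy_me_dir/GWAS.config" ]; then
    echo "no GWAS.config in $copy_me_dir" >&2
    return 1
  fi
  # Refuse to copy over an existing run directory.
  if [ -e "$run_dir" ]; then
    echo "$run_dir already exists; refusing to overwrite" >&2
    return 1
  fi
  # Step 1: copy the template directory to where the GWAS will be run.
  cp -R "$copy_me_dir" "$run_dir" || return 1
  # Step 3: create the launch directory (its path must also go into GWAS.config).
  mkdir -p "$launch_dir" || return 1
  # Step 4: place every study-specific phenotype file in the launch directory.
  for pheno in "$@"; do
    cp "$pheno" "$launch_dir/" || return 1
  done
  # Step 2 stays manual.
  echo "Now edit $run_dir/GWAS.config and set the launch directory to $launch_dir"
}
```

For example, `prepare_gwas_run /path/to/copy_me "$HOME/gwas_run" "$HOME/gwas_run/launch" bmi_pheno.txt`, where every path is a placeholder.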
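The phenotype layout described under Study data requirements (`FID IID TRAIT` followed by covariates) can be checked before a job is submitted. The sketch below is hypothetical and assumes a whitespace-delimited file whose header row literally begins with `FID` and `IID`. It verifies that layout, checks that every row has as many columns as the header, and prints the trait name taken from the third header field.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of AGHIpop.
# check_pheno_file FILE
# Assumes a whitespace-delimited file with header "FID IID <trait> <covariates...>".
# On success prints the trait name (third header field) and returns 0.
check_pheno_file() {
  awk '
    NR == 1 {
      # Header: FID and IID first, then the trait, then any covariates.
      if (NF < 3 || $1 != "FID" || $2 != "IID") {
        print "header must start with: FID IID <trait>" > "/dev/stderr"
        bad = 1
        exit
      }
      ncol = NF
      trait = $3
      next
    }
    NF == 0 { next }  # ignore blank lines
    NF != ncol {
      printf "line %d has %d columns, expected %d\n", NR, NF, ncol > "/dev/stderr"
      bad = 1
      exit
    }
    END {
      if (NR == 0) { print "empty file" > "/dev/stderr"; exit 1 }
      if (bad) exit 1
      print trait
    }
  ' "$1"
}
```

Running it on each file placed in the launch directory surfaces malformed headers or ragged rows before any cluster time is used.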
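After a run finishes, the layout described under Result data files can be checked quickly. This is a hypothetical sketch: `check_gwas_outputs` is an illustrative name, and it assumes the per-chromosome subdirectories are literally named `chr1` through `chr22` and that the report name follows `Pipeline_PredmGWAS_[trait_name].html`.

```shell
#!/bin/sh
# Hypothetical post-run check, not part of AGHIpop.
# check_gwas_outputs WORK_DIR RESULT_DIR TRAIT
# Reports every missing item on stderr; returns 0 only if nothing is missing.
check_gwas_outputs() {
  work_dir=$1 result_dir=$2 trait=$3
  missing=0
  # The HTML report lives in the working directory, not the result directory.
  report="$work_dir/Pipeline_PredmGWAS_${trait}.html"
  if [ ! -f "$report" ]; then
    echo "missing report: $report" >&2
    missing=$((missing + 1))
  fi
  # Per-chromosome gene-based results (assumed names chr1 .. chr22).
  chr=1
  while [ "$chr" -le 22 ]; do
    if [ ! -d "$result_dir/chr$chr" ]; then
      echo "missing directory: $result_dir/chr$chr" >&2
      missing=$((missing + 1))
    fi
    chr=$((chr + 1))
  done
  # All-chromosome gene-based and gene-set results.
  if [ ! -d "$result_dir/allchrs" ]; then
    echo "missing directory: $result_dir/allchrs" >&2
    missing=$((missing + 1))
  fi
  [ "$missing" -eq 0 ]
}
```

The function keeps going after the first missing item so that one call lists everything that is absent.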
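Because only the 22 autosomes are analyzed, it can be useful to know how many variants in a PLINK `.bim` file fall outside chromosomes 1 to 22 and will therefore be excluded. The helper below is a hypothetical sketch that relies only on the standard `.bim` layout, in which the first column is the chromosome code. PLINK itself provides `--autosome` and `--chr 1-22` for such filtering, but which mechanism this pipeline uses is not stated here.

```shell
#!/bin/sh
# Hypothetical helper, not part of AGHIpop.
# count_non_autosomal_variants BIM_FILE
# Prints how many variants are NOT on chromosomes 1-22 (e.g. 0, X, Y, XY, MT, 23-26).
count_non_autosomal_variants() {
  awk '
    # Column 1 of a .bim file is the chromosome code; drop an optional "chr" prefix.
    { chr = $1; sub(/^chr/, "", chr) }
    !(chr ~ /^[0-9]+$/ && chr + 0 >= 1 && chr + 0 <= 22) { n++ }
    END { print n + 0 }
  ' "$1"
}
```

A non-zero count is expected for most genotype arrays, since they include X, Y and mitochondrial markers.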