Section 1 | Section 2 | Section 3

This page gives an example of using the PSEQ and R interfaces to PLINK/Seq to analyse a GWAS dataset, along with the corresponding PLINK commands.

For more details on PLINK, please see the PLINK website.

Example dataset

This example dataset (available for download here) contains genotype data for 228,694 SNPs on 89 unreleated individuals from the Japanese and Chinese HapMap study.

Inputing and viewing the data


We first read in the PED and MAP files, and convert the these data to the PLINK binary format: ./plink --file ../data/wgas1 --make-bed --out wgas2 We can review the contents of this file as follows: ./plink --bfile wgas2 --out validate

which yields

|        PLINK!       |     v1.08p     |   14/Aug/2009     |
|  (C) 2009 Shaun Purcell, GNU General Public License, v2  |
|  For documentation, citation & bug-report instructions:  |
|        |

 ... details skipped ...

228694 (of 228694) markers to be included from [ ../data/ ]
90 individuals read from [ ../data/wgas1.ped ]
90 individuals with nonmissing phenotypes
Assuming a disease phenotype (1=unaff, 2=aff, 0=miss)
Missing phenotype value is also -9
49 cases, 41 controls and 0 missing
45 males, 45 females, and 0 of unspecified sex
Before frequency and genotyping pruning, there are 228694 SNPs
90 founders and 0 non-founders found
Total genotyping rate in remaining individuals is 0.993346
0 SNPs failed missingness test ( GENO > 1 )
0 SNPs failed frequency test ( MAF < 0 )
After frequency and genotyping pruning, there are 228694 SNPs
After filtering, 49 cases, 41 controls and 0 missing
After filtering, 45 males, 45 females, and 0 of unspecified sex
Writing pedigree information to [ wgas2.fam ]
Writing map (extended format) information to [ wgas2.bim ]
Writing genotype bitfile to [ wgas2.bed ]
Using (default) SNP-major mode

which confirms that these data were correctly read into PLINK


Currently, PLINK/Seq supports input in either VCF or PLINK binary format. To create a new project and load the PLINK binary file created above, we use the pseq command-line tool:

./pseq /path/to/my/example_proj new-project --resources /psych/genetics/pseq/resources/

Here, we need to specify the location of the resources folder that PLINK/Seq requires. Otherwise, this command creates an empty project. To load the PLINK binary file, use the command:

./pseq /path/to/my/example_proj load-plink --file wgas2 We can now view these data via the PSEQ command-line tool, e.g. ./pseq /path/to/my/example_proj g-view --mask loc.subset=refseq,DISC1 --geno To

Basic QC and filtering


./plink --bfile wgas2 --freq --out freq1 ./plink --bfile wgas2 --hardy --out hwe1 ./plink --bfile wgas2 --maf 0.01 --geno 0.05 --mind 0.05 --hwe 1e-3 --make-bed --out wgas3


Running a basic GWAS analysis (case/control association)


./plink --bfile wgas3 --assoc --adjust --out assoc1


Empirical assessment of ancestry (multidimensional scaling)


./plink --bfile wgas3 --indep-pairwise 50 10 0.2 --out prune1 ./plink --bfile wgas3 --extract --genome --out ibs1 ./plink --bfile wgas3 --read-genome ibs1.genome --mds-plot 2 --out strat1


Association analysis, controlling for ancestry (with MDS components)


./plink --bfile wgas3 --logistic --covar strat1.mds --covar-name C1,C2 --adjust --out log1