Usage
#######

The main aim of Simtools is to provide an easy-to-use platform for simulating and manipulating genetic and phenotype data. The package was mainly written as a collection of tools for myself to explore various scenarios. It provides facilities to simulate simple categorical and continuous phenotypes from genetic data, as well as more complicated multiple dependent phenotypes. In addition, the package interfaces with plink to perform tasks such as allelic score computation, clumping, and pruning. I also provide a VCF reader for rare variant analysis. Here I will briefly outline the usage of these classes.

genotypes
------------

The genotypes module provides two classes to read and sample genotypes. For a plink file you can simply::

    import simtools.genotypes as gp

    n = 100    # number of subjects to sample
    p = 1000   # number of SNPs to sample
    seedfile = 'plink_stem'
    plink = gp.ReadPlink(seedfile)
    genotypematrix = plink.sample(n, p)

This randomly chooses n subjects and p SNPs from the given plink file. Similarly, you can also use a VCF file::

    import simtools.genotypes as gp

    n = 100
    p = 1000
    vcf = gp.ReadVCF('example.vcf.gz')
    vcf.read_vcf()
    genotypematrix = vcf.sample(n, p)

simtools
--------

Simtools is the workhorse of the package. It provides two main functions to simulate phenotypes. You can simulate a single phenotype on the basis of an existing genotype matrix::

    import simtools.simtools as st

    heritability = 0.4
    num_causal_snps = 10
    plink_file = 'plink_stem'
    sims = st.Simtools(plink_file)

    # continuous phenotype
    pheno = sims.simple_phenotype(num_causal_snps, heritability)

    # binary phenotype
    n_cases = 100
    n_controls = 100
    liability_threshold = 0.1
    pheno = sims.simple_phenotype(num_causal_snps, heritability,
                                  (n_cases, n_controls, liability_threshold))

In addition to a simple phenotype, the package can also simulate multiple interrelated phenotypes. For example, the module allows you to simulate phenotype A, which is caused by genetic factors as well as by phenotypes B and C.
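To make the idea of a phenotype that depends on other phenotypes concrete, here is a minimal, package-independent sketch. It assumes a simple linear model in which A is the sum of a genetic component, weighted contributions from B and C, and noise. The function name ``simulate_abc``, its parameters, and the effect-size distribution are hypothetical and chosen for illustration only; this is not necessarily how ``multi_phenotype`` works internally.

```python
import random


def simulate_abc(n_subjects, effect_b_on_a, effect_c_on_a,
                 n_snps=10, maf=0.3, seed=1):
    """Toy linear model (an assumption, not the package's implementation).

    Returns a list of (a, b, c) tuples, one per subject, where phenotype A
    depends on a genetic score and on phenotypes B and C.
    """
    rng = random.Random(seed)
    # Hypothetical per-SNP effect sizes on phenotype A.
    snp_effects = [rng.gauss(0.0, 0.1) for _ in range(n_snps)]
    rows = []
    for _ in range(n_subjects):
        # Allele count (0, 1 or 2) at each SNP: two draws with frequency maf.
        genotype = [int(rng.random() < maf) + int(rng.random() < maf)
                    for _ in range(n_snps)]
        genetic = sum(g * e for g, e in zip(genotype, snp_effects))
        # B and C are drawn independently here to keep the sketch small.
        b = rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, 1.0)
        # A = genetic part + contributions of B and C + residual noise.
        a = genetic + effect_b_on_a * b + effect_c_on_a * c + rng.gauss(0.0, 1.0)
        rows.append((a, b, c))
    return rows


rows = simulate_abc(n_subjects=1000, effect_b_on_a=0.1, effect_c_on_a=0.1)
```

In this toy model the covariance between A and B is approximately ``effect_b_on_a``, which gives a quick sanity check on the simulated data.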
To simulate three interrelated phenotypes you can simply::

    import simtools.simtools as st
    import numpy as np

    heritability = 0.4
    num_causal_snps = [10, 10, 10]  # one entry per phenotype
    sims = st.Simtools('plink_stem')

    # adjacency matrix
    lamb = np.eye(3)
    lamb[1, 0] = 0.1
    lamb[2, 0] = 0.1

    # genetic effect matrix
    B = np.eye(3) * heritability

    # simulate phenotypes
    phenos = sims.multi_phenotype(lamb, B, num_causal_snps)

tools
-------

The tools module allows you to perform common tasks. These include:

- GWAS (single or multicore)
- Computation of the genomic inflation factor
- QQ plots
- Randomly choosing adjacency matrices
- Running GWAS, clumping, and pruning, as well as calculating allelic scores, with Plink
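For readers unfamiliar with the genomic inflation factor, the following standalone sketch shows the standard calculation from GWAS p-values: convert each p-value to a 1-df chi-square statistic and divide the observed median by the median expected under the null (about 0.455). The function name ``genomic_inflation_factor`` is hypothetical and uses only the Python standard library; it does not show the interface of the tools module.

```python
from statistics import NormalDist, median


def genomic_inflation_factor(p_values):
    """Median observed 1-df chi-square divided by its expected median.

    Assumes two-sided p-values in the interval (0, 1].
    """
    std_normal = NormalDist()
    # A two-sided p-value p maps to z = Phi^{-1}(p / 2); z squared is a
    # chi-square statistic with one degree of freedom.
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    # Median of a 1-df chi-square distribution (about 0.455).
    expected_median = std_normal.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


# Evenly spread p-values mimic a well-calibrated null scan.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_gc = genomic_inflation_factor(null_p)
```

Values close to 1 indicate well-calibrated test statistics, while values clearly above 1 suggest inflation, for example from population stratification.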