Statistical methods in bioinformatics
April 9, 2018
Exercises day 2
Claus Ekstrøm

These exercises will use a combination of R and specific programmes. The exercises are built around two types of problems: introductory text that you should work through to make sure you understand how to use the tools, and a few additional questions for you to explore.
1 Analysis of SNP data with R
The data we will be working with here can be read into R using the following lines (make sure that you use upper and lower case exactly as listed below). You will need internet access since the data are read directly from a website.
geno <- read.table(url("http://www.biostatistics.dk/Genotypes.txt"))
pheno <- read.table(url("http://www.biostatistics.dk/Phenotypes.txt"))
y <- pheno$V1
The rows in the two files match up, such that the genotypes in row 1 belong to the first phenotype, the genotypes in row 2 to the second phenotype, and so on.
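The row-wise pairing described above can be illustrated with a minimal sketch on simulated stand-in data, so that it runs without internet access. Everything here is an assumption made for illustration: the object names geno_sim, pheno_sim and y_sim, the sizes, the 0/1/2 genotype coding, and the binary phenotype are hypothetical and are not taken from the course files.

```r
# Simulated stand-in data (hypothetical sizes and values, NOT the course data)
set.seed(1)
n_ind <- 50   # hypothetical number of individuals (rows)
n_snp <- 8    # hypothetical number of markers (columns)

# Genotypes coded as 0/1/2 copies of an allele: an assumed coding.
# as.data.frame() on a matrix names the columns V1, V2, ..., much like
# read.table() does for a file without a header.
geno_sim <- as.data.frame(matrix(rbinom(n_ind * n_snp, size = 2, prob = 0.3),
                                 nrow = n_ind, ncol = n_snp))

# One phenotype value per individual, stored in column V1 (assumed binary here)
pheno_sim <- data.frame(V1 = rbinom(n_ind, size = 1, prob = 0.5))
y_sim <- pheno_sim$V1

# Number of rows and columns in the genotype data
dim(geno_sim)

# The pairing only makes sense if both objects have the same number of rows
stopifnot(nrow(geno_sim) == length(y_sim))

# Because row i of the genotypes belongs to phenotype i, a genotype column
# can be cross-tabulated directly against the phenotype
tab <- table(genotype = geno_sim$V1, phenotype = y_sim)
tab
```

The same checks should carry over to the real objects by replacing geno_sim and y_sim with geno and y, provided the files have the layout assumed here.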
- How many genes are in our "small" example?
- How many individuals are in the dataset?
- What type of output variable/phenotype do we have here?
- Analyze the association between the phenotype and the first SNP. Are they associated? [Hint: You can either use a logistic regression model or just analyze this as a Gaussian model here; a logistic regression model is probably the more appropriate choice.]
- Make a full SNP analysis as shown at the lectures.
- Create a Manhattan plot with your results.
- Try to analyze the data using lasso. Compare the results to the previous