Class exercise: Single-nucleotide polymorphism



  1. Class exercise

  2. Single-nucleotide polymorphism
     ● A single-nucleotide polymorphism (SNP, pronounced "snip") is a substitution of a single nucleotide that occurs at a specific position in the genome, where each variation is present at a level of more than 1% in the population.
     ● For example, at a specific base position in the human genome, the C nucleotide may appear in most individuals, but in a minority of individuals the position is occupied by an A. This means that there is a SNP at this specific position, and the two possible nucleotide variations (C or A) are said to be alleles for this position.

  3. Objective
     ● Write a program that, given a sequence of SNP records, computes:
        ● the number of transitions (A ↔ G or C ↔ T) in the data for each chromosome
        ● the number of transversions (any substitution that is not a transition) in the data for each chromosome
     ● BUT FIRST YOU HAVE TO DESIGN THE SOFTWARE BY DEFINING CRC CARDS AND UML CLASS DIAGRAMS
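To make the two definitions concrete before the design work, here is a minimal sketch of the counting itself. It is not the object-oriented design the exercise asks for. The function names, the `(chromosome, ref, alt)` tuple layout, and the sample records are all assumptions made up for illustration.

```python
from collections import defaultdict

# A transition swaps A with G, or C with T; every other substitution
# is treated as a transversion, as defined in the objective above.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_transition(ref, alt):
    """Return True for A<->G or C<->T substitutions (case-insensitive)."""
    return (ref.upper(), alt.upper()) in TRANSITIONS


def count_by_chromosome(records):
    """Count transitions and transversions per chromosome.

    `records` is an iterable of (chromosome, ref, alt) tuples; this
    layout is a hypothetical one chosen for the sketch.
    """
    counts = defaultdict(lambda: {"transitions": 0, "transversions": 0})
    for chromosome, ref, alt in records:
        kind = "transitions" if is_transition(ref, alt) else "transversions"
        counts[chromosome][kind] += 1
    return dict(counts)


# Made-up records, for illustration only (not taken from the dataset).
sample = [("1", "A", "G"), ("1", "C", "A"), ("2", "T", "C")]
result = count_by_chromosome(sample)
print(result)
```

The classes described in the following slides distribute this same logic between SNP objects (classification) and Chromosome objects (counting).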

  4. Input data
     ● A dataset consisting of a VCF file representing a random sampling of SNPs from three people (a mother, a father, and their daughter) compared to the reference human genome.
     ● VCF (Variant Call Format) is a tab-delimited tabular format similar to CSV.
     ● Each row of the dataset describes one SNP.

  5. Input data sample
     [Figure: sample rows of the VCF file with the columns annotated: chromosome number, SNP's position in the chromosome, SNP's ID, reference base at this position, alternative base found]

  6. What to do: SNP class (1)
     ● Implement a SNP class whose objects hold the relevant information about a single line in the VCF file.
     ● The SNP class is derived from AlleleVariation, which is an abstract class.
     ● AlleleVariation declares two abstract methods:
        ● .isTransition() should return True if the variation is a transition and False otherwise, by looking at the two allele instance variables.
        ● .isTransversion() should return True if the variation is not a transition and False otherwise.
     ● Instances of SNP include the following private attributes:
        ● the reference allele (a one-character string in column 4, e.g., "A")
        ● the alternative allele (a one-character string in column 5, e.g., "G")
        ● the name of the chromosome on which it exists (a string in column 1, e.g., "1")
        ● the reference position (an integer in column 2, e.g., 799739)
        ● the ID of the SNP (in column 3, e.g., "rs57181708" or ".")
     ● Because we'll be parsing lines one at a time, all of this information can be provided in the constructor.

  7. What to do: SNP class (2)
     ● SNP objects should be able to answer questions:
        ● isTransition() should return True if the SNP is a transition and False otherwise, by looking at the two allele instance variables. A transition is A/G, G/A, C/T, or T/C.
        ● isTransversion() should return True if the SNP is not a transition and False otherwise.
     ● Use inheritance and overriding to implement these methods, and encapsulation to hide all attributes of SNP.
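One possible shape for the two classes described in slides 6 and 7 is sketched below. The method names `isTransition()` and `isTransversion()` come from the slides. The constructor parameter order (following the VCF column order), the attribute names, and the `getChromosomeName()` accessor are assumptions, not part of the original exercise text.

```python
from abc import ABC, abstractmethod


class AlleleVariation(ABC):
    """Abstract base class: subclasses must say what kind of variation they are."""

    @abstractmethod
    def isTransition(self):
        """Return True if the variation is a transition."""

    @abstractmethod
    def isTransversion(self):
        """Return True if the variation is not a transition."""


class SNP(AlleleVariation):
    # The four (reference, alternative) pairs that count as transitions.
    _TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

    def __init__(self, chromosome, position, snp_id, ref_allele, alt_allele):
        # Double leading underscores trigger name mangling, which keeps
        # the attributes private to the class (encapsulation).
        self.__chromosome = chromosome
        self.__position = int(position)  # column 2 arrives as text
        self.__snp_id = snp_id
        self.__ref_allele = ref_allele
        self.__alt_allele = alt_allele

    def isTransition(self):
        return (self.__ref_allele, self.__alt_allele) in SNP._TRANSITIONS

    def isTransversion(self):
        return not self.isTransition()

    def getChromosomeName(self):
        # Hypothetical accessor, handy when grouping SNPs by chromosome.
        return self.__chromosome


# Values taken from the examples in slide 6.
snp = SNP("1", "799739", "rs57181708", "A", "G")
print(snp.isTransition(), snp.isTransversion())
```

Defining `isTransversion()` as the negation of `isTransition()` keeps the two answers consistent by construction, which matches the slide's definition of a transversion as anything that is not a transition.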

  8. What to do: Chromosome class
     ● Implement a Chromosome class that provides four methods:
        ● count_transitions(), which returns the number of transition SNPs
        ● count_transversions(), which returns the number of transversion SNPs
        ● addSNP(), which adds a SNP object to the array of SNPs associated with the current Chromosome
        ● getName(), which returns the string representing the name of the Chromosome
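The Chromosome class described above could look roughly like the following sketch. The four method names come from the slide. The constructor taking the chromosome name is an assumption, and `_FakeSNP` is a hypothetical stand-in used only so the block runs on its own; in the exercise the real SNP objects would be added instead.

```python
class Chromosome:
    def __init__(self, name):
        # Assumption: the name is supplied at construction time.
        self.__name = name
        self.__snps = []

    def addSNP(self, snp):
        """Store one SNP object belonging to this chromosome."""
        self.__snps.append(snp)

    def getName(self):
        return self.__name

    def count_transitions(self):
        return sum(1 for snp in self.__snps if snp.isTransition())

    def count_transversions(self):
        return sum(1 for snp in self.__snps if snp.isTransversion())


class _FakeSNP:
    """Stand-in for the SNP class, present only to make the sketch runnable."""

    def __init__(self, transition):
        self._transition = transition

    def isTransition(self):
        return self._transition

    def isTransversion(self):
        return not self._transition


chr1 = Chromosome("1")
for flag in (True, True, False):
    chr1.addSNP(_FakeSNP(flag))

print(chr1.getName(), chr1.count_transitions(), chr1.count_transversions())
# prints: 1 2 1
```

Because Chromosome only calls `isTransition()` and `isTransversion()`, it works with any AlleleVariation subclass, not just SNP.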

  9. Where to get the dataset
     ● The dataset can be downloaded here: https://raw.githubusercontent.com/anuzzolese/genomics-unibo/master/2019-2020/data/trio.sample.vcf

  10. How to read the dataset:

      import csv

      with open('trio.sample.vcf') as csv_file:
          # VCF columns are tab-separated
          csv_reader = csv.reader(csv_file, delimiter='\t')
          for row in csv_reader:
              # Skip blank lines and header/meta lines, which start with '#'
              if not row or row[0].startswith('#'):
                  continue
              chromosomeName = row[0]
              snpPosition = row[1]
              snpId = row[2]
              refAllele = row[3]
              altAllele = row[4]
              print(chromosomeName + ", " + snpPosition + ", " + snpId + ", " + refAllele + ", " + altAllele)

      Source: https://github.com/anuzzolese/genomics-unibo/blob/master/2019-2020/exercises/trio-sample-vcf-reader.py
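The reading step can also be wrapped in a small generator that yields typed records, which makes it easier to hand each row to a SNP constructor. This is only a sketch: `VcfRecord` and `read_records` are hypothetical names, and the in-memory sample below is made up for illustration (its first data row reuses the example values from slide 6).

```python
import csv
import io
from typing import Iterator, NamedTuple


class VcfRecord(NamedTuple):
    """The first five VCF columns, in file order (hypothetical helper type)."""
    chromosome: str
    position: int
    snp_id: str
    ref_allele: str
    alt_allele: str


def read_records(lines) -> Iterator[VcfRecord]:
    """Yield one VcfRecord per data row from an iterable of text lines."""
    reader = csv.reader(lines, delimiter="\t")
    for row in reader:
        # Blank lines give an empty row; header/meta lines start with '#'.
        if not row or row[0].startswith("#"):
            continue
        yield VcfRecord(row[0], int(row[1]), row[2], row[3], row[4])


# Made-up sample standing in for the real file.
sample = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t799739\trs57181708\tA\tG\n"
    "1\t800000\t.\tC\tA\n"
)
records = list(read_records(sample))
for record in records:
    print(record)
```

With the real dataset, an open file object can be passed in place of the `io.StringIO` sample, for example `read_records(open('trio.sample.vcf'))` inside a `with` block.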
