Class exercise Single-nucleotide polymorphism A single-nucleotide - - PowerPoint PPT Presentation



SLIDE 1

Class exercise

SLIDE 2

Single-nucleotide polymorphism

  • A single-nucleotide polymorphism (SNP, pronounced "snip") is a substitution of a single nucleotide at a specific position in the genome, where each variant occurs at a frequency above 1% in the population.
  • For example, at a specific base position in the human genome, most individuals may carry the nucleotide C, while in a minority of individuals the position is occupied by an A. This means that there is a SNP at this position, and the two possible nucleotide variants, C or A, are said to be alleles for this position.

SLIDE 3

Objective

  • Write software that, given a sequence of SNP records, computes for each chromosome:
    • the number of transitions (A/G or C/T substitutions, in either direction)
    • the number of transversions (any substitution that is not a transition)
  • BUT FIRST, YOU HAVE TO DESIGN THE SOFTWARE BY DEFINING CRC (CLASS-RESPONSIBILITY-COLLABORATOR) CARDS AND UML CLASS DIAGRAMS
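As a starting point for the design, the transition/transversion rule can be captured in two small functions. This is only a sketch: the names `is_transition` and `is_transversion` are hypothetical, and it assumes `ref` and `alt` are two different single bases, as in a SNP record, so that "not a transition" can safely mean "transversion".

```python
# Transitions swap a purine for a purine (A/G) or a pyrimidine for a
# pyrimidine (C/T). frozenset ignores order, so A/G and G/A both match.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def is_transition(ref: str, alt: str) -> bool:
    """True if ref -> alt is A/G, G/A, C/T or T/C (case-insensitive)."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def is_transversion(ref: str, alt: str) -> bool:
    """True for any substitution that is not a transition (slide 3 definition).

    Assumes ref and alt are two different single bases, as in a SNP record.
    """
    return not is_transition(ref, alt)


for ref, alt in [("A", "G"), ("T", "C"), ("C", "A"), ("G", "T")]:
    print(ref, alt, is_transition(ref, alt), is_transversion(ref, alt))
```

Using an unordered `frozenset` as the key means each transition pair is listed once instead of four times.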

SLIDE 4

Input data

  • A dataset consisting of a VCF file that represents a random sample of SNPs from three people (a mother, a father, and their daughter), compared against the reference human genome.
  • VCF (Variant Call Format) is a tab-separated tabular format, similar to CSV; its header and meta-information lines start with "#"
  • Each data row of the dataset describes one SNP
SLIDE 5

Input data sample

Columns of each data row:
  • Column 1: chromosome number
  • Column 2: SNP's position in the chromosome
  • Column 3: SNP's ID
  • Column 4: reference base at this position
  • Column 5: alternative base found
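Mapping these five columns onto Python values could look like the sketch below, which splits a line on tabs. The `SnpRecord` type and the `parse_vcf_line` helper are hypothetical names, and the example line is assembled from the sample values quoted on slide 6 rather than copied from the dataset. Real VCF rows carry more columns (QUAL, FILTER, INFO and so on), which this sketch simply ignores.

```python
from typing import NamedTuple, Optional


class SnpRecord(NamedTuple):
    """The five VCF columns used in this exercise (hypothetical helper type)."""
    chromosome: str   # column 1, e.g. "1"
    position: int     # column 2, e.g. 799739
    snp_id: str       # column 3, e.g. "rs57181708", or "." when missing
    ref: str          # column 4, reference base
    alt: str          # column 5, alternative base


def parse_vcf_line(line: str) -> Optional[SnpRecord]:
    """Parse one tab-separated VCF line; return None for header or blank lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None  # '##' meta-information lines and the '#CHROM' header
    fields = line.split("\t")
    # Only the first five columns matter here; extra columns are ignored.
    chrom, pos, snp_id, ref, alt = fields[:5]
    return SnpRecord(chrom, int(pos), snp_id, ref, alt)


# Illustrative line assembled from the example values on slide 6.
example = "1\t799739\trs57181708\tA\tG"
print(parse_vcf_line(example))
```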

SLIDE 6

What to do: SNP class (1)

  • Implement a SNP class whose objects hold the relevant information from a single line of the VCF file.
  • SNP is a derived class of AlleleVariation, which is an abstract class.
  • AlleleVariation declares two abstract methods:
    • .isTransition() should return True if the variation is a transition and False otherwise, by looking at the two allele instance variables.
    • .isTransversion() should return True if the variation is not a transition and False otherwise.
  • Instances of SNP include the following private attributes:
    • the reference allele (a one-character string in column 4, e.g., "A")
    • the alternative allele (a one-character string in column 5, e.g., "G")
    • the name of the chromosome on which it exists (a string in column 1, e.g., "1")
    • the reference position (an integer in column 2, e.g., 799739)
    • the ID of the SNP (a string in column 3, e.g., "rs57181708", or "." when VCF records no ID).
  • Because lines are parsed one at a time, all of this information can be provided in the constructor.
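One way to express this design in Python is with the standard `abc` module: `AlleleVariation` declares the two abstract methods and `SNP` overrides them. This is a sketch under stated assumptions: "private" is modeled with double-underscore attributes (Python enforces this only through name mangling), the transition table is a class attribute, and `getChromosomeName()` is a hypothetical accessor, not required by the slides, added so that other code can read the chromosome without touching the attribute.

```python
from abc import ABC, abstractmethod


class AlleleVariation(ABC):
    """Abstract base class: subclasses must implement both methods."""

    @abstractmethod
    def isTransition(self) -> bool:
        ...

    @abstractmethod
    def isTransversion(self) -> bool:
        ...


class SNP(AlleleVariation):
    """One data line of the VCF file (columns 1-5)."""

    # Pairs of bases that form a transition, in either order.
    _TRANSITIONS = {frozenset("AG"), frozenset("CT")}

    def __init__(self, chromosome: str, position: int, snp_id: str,
                 ref: str, alt: str) -> None:
        # Double underscores make the attributes "private" via name mangling.
        self.__chromosome = chromosome
        self.__position = int(position)
        self.__id = snp_id
        self.__ref = ref
        self.__alt = alt

    def isTransition(self) -> bool:
        # Overrides the abstract method: A/G, G/A, C/T or T/C.
        return frozenset((self.__ref, self.__alt)) in SNP._TRANSITIONS

    def isTransversion(self) -> bool:
        # Overrides the abstract method: anything that is not a transition.
        return not self.isTransition()

    def getChromosomeName(self) -> str:
        # Hypothetical accessor, not required by the slides.
        return self.__chromosome


snp = SNP("1", 799739, "rs57181708", "A", "G")
print(snp.isTransition(), snp.isTransversion())
```

Calling `AlleleVariation()` directly raises `TypeError`, because its abstract methods are not implemented, and `snp.__ref` raises `AttributeError` outside the class, because the attribute is stored under the mangled name `_SNP__ref`.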

SLIDE 7

What to do: SNP class (2)

  • SNP objects should be able to answer two questions:
    • isTransition() should return True if the SNP is a transition and False otherwise, by looking at the two allele instance variables. A transition is A/G, G/A, C/T, or T/C.
    • isTransversion() should return True if the SNP is not a transition and False otherwise.
  • Use inheritance and method overriding to implement these methods, and encapsulation to hide all attributes of SNP.

SLIDE 8

What to do: Chromosome class

  • Implement a Chromosome class that provides four methods:
    • count_transitions(), which returns the number of transition SNPs
    • count_transversions(), which returns the number of transversion SNPs
    • addSNP(), which adds a SNP object to the list of SNPs associated with the current Chromosome
    • getName(), which returns the name of the Chromosome as a string
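A possible Chromosome implementation is sketched below. It relies only on `isTransition()` and `isTransversion()`, so it works with any object that provides them, such as a SNP object as specified on slides 6 and 7. To keep the block self-contained it uses a hypothetical `_DemoSNP` stand-in and a `typing.Protocol` named `VariationLike`, neither of which comes from the exercise; the SNPs are kept in a Python list.

```python
from typing import List, Protocol


class VariationLike(Protocol):
    """Anything that can answer the two SNP questions (e.g. a SNP object)."""

    def isTransition(self) -> bool: ...
    def isTransversion(self) -> bool: ...


class Chromosome:
    """Collects the SNPs found on one chromosome and counts them by type."""

    def __init__(self, name: str) -> None:
        self.__name = name
        self.__snps: List[VariationLike] = []

    def addSNP(self, snp: VariationLike) -> None:
        self.__snps.append(snp)

    def getName(self) -> str:
        return self.__name

    def count_transitions(self) -> int:
        # Booleans sum as 0/1, so this counts the transition SNPs.
        return sum(snp.isTransition() for snp in self.__snps)

    def count_transversions(self) -> int:
        return sum(snp.isTransversion() for snp in self.__snps)


class _DemoSNP:
    """Hypothetical stand-in for SNP, used only to exercise Chromosome."""

    def __init__(self, ref: str, alt: str) -> None:
        self.ref, self.alt = ref, alt

    def isTransition(self) -> bool:
        return {self.ref, self.alt} in ({"A", "G"}, {"C", "T"})

    def isTransversion(self) -> bool:
        return not self.isTransition()


chrom = Chromosome("1")
for ref, alt in [("A", "G"), ("C", "T"), ("C", "A")]:
    chrom.addSNP(_DemoSNP(ref, alt))
print(chrom.getName(), chrom.count_transitions(), chrom.count_transversions())
```

With real data, one option is a dictionary from chromosome name to Chromosome object, calling addSNP() on the matching entry for each parsed line.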

SLIDE 9

Where to get the dataset

  • The dataset can be downloaded here:

https://raw.githubusercontent.com/anuzzolese/genomics-unibo/master/2019-2020/data/trio.sample.vcf
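If you prefer to fetch the file from a script, a standard-library sketch could look like this. The function name `download_dataset` and the default destination file name are assumptions; the URL is the one given on this slide.

```python
import os
import shutil
import urllib.request

# Dataset URL from slide 9.
DATASET_URL = ("https://raw.githubusercontent.com/anuzzolese/genomics-unibo/"
               "master/2019-2020/data/trio.sample.vcf")


def download_dataset(dest: str = "trio.sample.vcf", url: str = DATASET_URL) -> str:
    """Download the VCF file to `dest` unless it already exists; return its path."""
    if not os.path.exists(dest):
        # Stream the HTTP response straight into a local binary file.
        with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
            shutil.copyfileobj(response, out)
    return dest
```

Calling `download_dataset()` once before running the reader script leaves `trio.sample.vcf` in the working directory; later calls skip the download.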

SLIDE 10

How to read the dataset:

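import csv

# Read the VCF file as tab-separated rows
with open('trio.sample.vcf', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='\t')
    for row in csv_reader:
        # Skip empty rows and the '#'-prefixed VCF header lines
        if not row or row[0].startswith('#'):
            continue
        chromosomeName = row[0]
        snpPosition = row[1]  # a string; use int(row[1]) to get a number
        snpId = row[2]
        refAllele = row[3]
        altAllele = row[4]
        print(chromosomeName + ", " + snpPosition + ", " + snpId + ", " + refAllele + ", " + altAllele)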

https://github.com/anuzzolese/genomics-unibo/blob/master/2019-2020/exercises/trio-sample-vcf-reader.py
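To check the results of a class-based solution, a compact non-OO sketch can compute the two counts per chromosome directly, reusing the csv-based reading approach of slide 10. It is a hypothetical cross-check, not a replacement for the CRC/UML design the exercise asks for. The function name `count_by_chromosome` is an assumption, and so is the choice to skip rows whose REF or ALT is not a single base (for example multi-allelic ALT values such as "A,T"). The demo lines are made up for illustration.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TRANSITIONS = {frozenset("AG"), frozenset("CT")}
BASES = set("ACGT")


def count_by_chromosome(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map chromosome name -> (transitions, transversions) for VCF text lines."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    # QUOTE_NONE: VCF fields are never quoted, so '"' must not be special.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank rows and VCF header lines
        chrom, ref, alt = row[0], row[3].upper(), row[4].upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # assumption: ignore indels and multi-allelic ALT values
        if frozenset((ref, alt)) in TRANSITIONS:
            counts[chrom][0] += 1
        else:
            counts[chrom][1] += 1
    return {name: (ts, tv) for name, (ts, tv) in counts.items()}


demo = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t100\t.\tA\tG",
    "1\t200\t.\tC\tA",
    "2\t300\t.\tT\tC",
]
print(count_by_chromosome(demo))
```

With the real file, pass the open file object instead: `with open('trio.sample.vcf', newline='') as f: print(count_by_chromosome(f))`.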