CS-5630 / CS-6630 Visualization for Data Science Set Visualization
Alexander Lex alex@sci.utah.edu
[xkcd]
Design Workshop
Sets: A, B, C
item1: A
item2: A
item3: A, B
item4: A, C
item5: A, B, C
item6: B
item7: B, C
item8: C
Venn diagram (sets A, B, C)
item1: A
item2: A
item3: A, B
item4: A, C
item5: A, B, C
item6: B
item7: B, C
item8: C
…
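Each region of the Venn diagram corresponds to one exact combination of sets. A minimal sketch of how the item list above maps to region counts, using only the eight items shown on the slide (the function name `exclusive_intersections` is made up for illustration):

```python
from collections import Counter

# Item memberships copied from the slide's item list.
memberships = {
    "item1": {"A"},
    "item2": {"A"},
    "item3": {"A", "B"},
    "item4": {"A", "C"},
    "item5": {"A", "B", "C"},
    "item6": {"B"},
    "item7": {"B", "C"},
    "item8": {"C"},
}

def exclusive_intersections(memberships):
    """Count items per exact set combination (one Venn region each)."""
    return Counter(frozenset(sets) for sets in memberships.values())

regions = exclusive_intersections(memberships)

# Print regions ordered by number of participating sets, then by name.
for combo, count in sorted(regions.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(" & ".join(sorted(combo)), count)
```

An item that belongs to A and B only is counted in the "A & B" region and not in "A" or "B", which is what distinguishes a Venn region from the size of the set itself.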
doi:10.1038/nature11241
The banana (Musa acuminata) genome and the evolution of monocotyledonous plants
Angélique D’Hont1*, France Denoeud2,3,4*, Jean-Marc Aury2, Franc-Christophe Baurens1, Françoise Carreel1,5, Olivier Garsmeur1, Benjamin Noel2, Stéphanie Bocs1, Gaëtan Droc1, Mathieu Rouard6, Corinne Da Silva2, Kamel Jabbari2,3,4, Céline Cardi1, Julie Poulain2, Marlène Souquet1, Karine Labadie2, Cyril Jourda1, Juliette Lengellé1, Marguerite Rodier-Goud1, Adriana Alberti2, Maria Bernard2, Margot Correa2, Saravanaraj Ayyampalayam7, Michael R. Mckain7, Jim Leebens-Mack7, Diane Burgess8, Mike Freeling8, Didier Mbéguié-A-Mbéguié9, Matthieu Chabannes5, Thomas Wicker10, Olivier Panaud11, Jose Barbosa11, Eva Hribova12, Pat Heslop-Harrison13, Rémy Habas5, Ronan Rivallan1, Philippe Francois1, Claire Poiron1, Andrzej Kilian14, Dheema Burthia1, Christophe Jenny1, Frédéric Bakry1, Spencer Brown15, Valentin Guignon1,6, Gert Kema16, Miguel Dita19, Cees Waalwijk16, Steeve Joseph1, Anne Dievart1, Olivier Jaillon2,3,4, Julie Leclercq1, Xavier Argout1, Eric Lyons17, Ana Almeida8, Mouna Jeridi1, Jaroslav Dolezel12, Nicolas Roux6, Ange-Marie Risterucci1, Jean Weissenbach2,3,4, Manuel Ruiz1, Jean-Christophe Glaszmann1, Francis Quétier18, Nabila Yahiaoui1 & Patrick Wincker2,3,4
Bananas (Musa spp.), including dessert and cooking types, are giant perennial monocotyledonous herbs of the order Zingiberales, a sister group to the well-studied Poales, which include cereals. Bananas are vital for food security in many tropical and subtropical countries and the most popular fruit in industrialized countries1. The Musa domestication process started some 7,000 years ago in Southeast Asia. It involved hybridizations between diverse species and subspecies, fostered by human migrations2, and selection of diploid and triploid seedless, parthenocarpic hybrids thereafter widely dispersed by vegetative propagation. Half of the current production relies on somaclones derived from a single triploid genotype (Cavendish)1. Pests and diseases have gradually become adapted, representing an imminent danger for global banana production3,4. Here we describe the draft sequence of the 523-megabase genome of a Musa acuminata doubled-haploid genotype, providing a crucial stepping-stone for genetic improvement of banana. We detected three rounds of whole-genome duplications in the Musa lineage, independently of those previously described in the Poales lineage and the one we detected in the Arecales lineage. This first monocotyledon high-continuity whole-genome sequence reported
genome analysis in plants. As such, it clarifies commelinid- sequence errors. The assembly consisted of 24,425 contigs and 7,513 scaffolds with a total length of 472.2 Mb, which represented 90% of the estimated DH-Pahang genome size. Ninety per cent of the assembly was in 647 scaffolds, and the N50 (the scaffold size above which 50% of the total length of the sequence assembly can be found) was 1.3 Mb (Supplementary Text and Supplementary Tables 1–3). We anchored 70% of the assembly (332 Mb) along the 11 Musa linkage groups of the Pahang genetic map. This corresponded to 258 scaffolds and included 98.0% of the scaffolds larger than 1 Mb and 92% of the annotated genes (Supplementary Text, Supplementary Table 4 and Supplementary Fig. 1). We identified 36,542 protein-coding gene models in the Musa genome (Supplementary Tables 1 and 5). A total of 235 microRNAs from 37 families were identified, including only one of the eight microRNA gene (MIR) families found so far solely in Poaceae8 (Supplementary Tables 6 and 7). Viral sequences related to the banana streak virus (BSV) dsDNA plant pararetrovirus were found to be integrated in the Pahang genome, with 24 loci spanning 10 chromosomes (Supplementary Text and Supplementary Fig. 2). They belonged to a badnavirus phylogenetic group that differed from the endogenous BSV species (eBSV) found in M. balbisiana9 and most of them formed a new
Nature 2012
[D’Hont et al., Nature, 2012] [Wiles et al., BMC Systems Biology] [Neale et al., BMC Genome Biology, 2014] [Gibbs et al., Nature, 2004]
https://en.wikipedia.org/wiki/Venn_diagram
Problem with Venn: the size of a region doesn’t correspond to the data.
Creating area-proportional Euler diagrams is hard.
Layout criteria:
  area proportional
  simple curves (circles are best)
  make it easy to identify which sets participate in an intersection
  Gestalt principle: good continuation
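For two sets, an area-proportional layout with circles can be computed directly, which helps show where the difficulty starts. The sketch below is an illustration, not the method used by any tool cited in these slides: it gives each circle an area equal to its set size and uses bisection to find the centre distance at which the overlap area equals the intersection size. The function names and the example sizes (|A| = 5, |B| = 4, |A ∩ B| = 2, taken from the workshop item list) are assumptions for the example.

```python
import math

def lens_area(r1, r2, d):
    """Overlap area of two circles with radii r1, r2 and centre distance d."""
    if d >= r1 + r2:              # circles are disjoint
        return 0.0
    if d <= abs(r1 - r2):         # smaller circle lies inside the larger one
        return math.pi * min(r1, r2) ** 2
    # Clamp to guard against tiny floating-point overshoot.
    c1 = max(-1.0, min(1.0, (d * d + r1 * r1 - r2 * r2) / (2 * d * r1)))
    c2 = max(-1.0, min(1.0, (d * d + r2 * r2 - r1 * r1) / (2 * d * r2)))
    kite = max(0.0, (-d + r1 + r2) * (d + r1 - r2) * (d - r1 + r2) * (d + r1 + r2))
    return r1 * r1 * math.acos(c1) + r2 * r2 * math.acos(c2) - 0.5 * math.sqrt(kite)

def two_circle_layout(size_a, size_b, size_ab, tol=1e-9):
    """Return (r1, r2, d) so that circle and overlap areas match the set sizes."""
    r1 = math.sqrt(size_a / math.pi)
    r2 = math.sqrt(size_b / math.pi)
    lo, hi = abs(r1 - r2), r1 + r2    # overlap shrinks as d grows from lo to hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > size_ab:
            lo = mid                  # too much overlap: move circles apart
        else:
            hi = mid
    return r1, r2, (lo + hi) / 2

r1, r2, d = two_circle_layout(size_a=5, size_b=4, size_ab=2)
print(f"r1={r1:.3f}  r2={r2:.3f}  distance={d:.3f}")
```

With three circles there are seven region areas to match but only six free layout parameters (three radii plus the relative positions), so an exact solution generally does not exist. This is one reason more flexible shapes such as ellipses are used.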
[Alsallakh 2015]
[created with EulerAPE]
Figure labels: 22, 19, 44, 43, 41, 19, 9, 22, 5 [created with EulerAPE]
[Riche 2010]
No Duplicate Nodes, Complex Shapes
Notice the Nesting
Duplicate Nodes, Simple Shapes
https://www.youtube.com/watch?v=Ju2hSThmPWA
[Alper 2011] [Dinkla 2012]
http://mariandoerk.de/pivotpaths/demo/#/1:0_497686
https://vimeo.com/213029678#at=0
[Sadana 2014]
[Rodgers 2015]
https://www.youtube.com/watch?v=UcYRrPqC5A8
[Alsallakh 2013]
vs.
Visualizing Intersections
Visualizing Properties
Attribute Details
Element List & Queries
[Movie Lens Dataset]
A B C, Universal Set
A B C, Universal Set: Must / Must Not
A B C
Cardinality
5 17 7 10 14 20 7 5
A B C Additional Plots
Deviation Attributes
How surprising is the size of an intersection? What’s the distribution of an attribute in an intersection?
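One way to quantify how surprising an intersection's size is: compare its observed share of all items with the share expected if set memberships were independent. The slides do not spell out the formula, so the definition below is an assumption for illustration. It uses the eight workshop items as data, and the name `deviation` is made up for the example.

```python
from math import prod

# Item memberships copied from the workshop item list.
memberships = {
    "item1": {"A"},
    "item2": {"A"},
    "item3": {"A", "B"},
    "item4": {"A", "C"},
    "item5": {"A", "B", "C"},
    "item6": {"B"},
    "item7": {"B", "C"},
    "item8": {"C"},
}
set_names = ["A", "B", "C"]

def deviation(combo, memberships, set_names):
    """Observed minus expected share of items in exactly the sets in `combo`.

    Assumed definition: the expected share multiplies, for every set, the
    fraction of items inside it (if the set is in `combo`) or outside it
    (if not), i.e. the share under independent memberships.
    """
    combo = set(combo)
    n = len(memberships)
    share = {name: sum(name in s for s in memberships.values()) / n
             for name in set_names}
    observed = sum(s == combo for s in memberships.values()) / n
    expected = prod(share[name] if name in combo else 1 - share[name]
                    for name in set_names)
    return observed - expected

for combo in [{"A"}, {"A", "B"}, {"A", "B", "C"}]:
    print(sorted(combo), round(deviation(combo, memberships, set_names), 5))
```

A positive value means the intersection holds more items than independence would predict, a negative value fewer. With only eight items the values are illustrative rather than meaningful.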
Action-Comedy
Drama-Comedy
A B C Which is the biggest intersection? Sort By: Cardinality
How do documentaries compare to adventure movies?
http://setviz.net