SLIDE 1

Machine Learning and Visualisation

Ian T. Nabney

Aston University, Birmingham, UK

March 2015

SLIDE 2

Outline

• The challenge of hidden knowledge
• Data visualisation: latent variable models
• Data visualisation: topographic mappings
• Non-linear modelling and feature selection

SLIDE 3

Acknowledgements

Collaborators: Chris Bishop, Mike Tipping, David Lowe, Markus Svensén, Chris Williams, Peter Tiňo, Yi Sun, Dharmesh Maniyar, John Owen, Phil Laflin, Bruce Williams, Paola Gaolini, Jens Lösel, Martin Schroeder, Ain Abdul Karim, Dan Cornford, Cliff Bailey, Naomi Hubber, Shahzad Mumtaz, Michel Randrianandrasana, Richard Barnes, Colin Smith, Dan Wells.

SLIDE 4

Hidden Knowledge

Understanding the vast quantities of data that surround us is a real challenge, particularly in situations with a lot of variables; we can understand more of it with help. Machine learning is the computer-based generation of models from data. A model is a parameterised function from input attributes to an output prediction. The parameters of the model express the hidden connection between inputs and predictions; they are learned from data.

SLIDE 5

Data Visualisation

What is Visualisation?

The goal of visualisation is to present data in a human-readable way. Visualisation is an important tool for developing a better understanding of large, complex datasets. It is particularly helpful for users, such as research scientists or clinicians, who are not specialists in data modelling. Typical uses:

• Detection of outliers.
• Clustering and segmentation.
• Aid to feature selection.
• Feedback on results of analysis.

There are two aspects: data projection and information visualisation.

SLIDE 6

Data Visualisation

Data Projection

The goal is to project data to a lower-dimensional space (usually 2D) while preserving as much information or structure as possible. Once the projection is done, standard information visualisation approaches can be used to support user interaction. The quantity and complexity of many datasets mean that simple visualisation methods, such as Principal Component Analysis, are not very effective.

SLIDE 7

Data Visualisation

Information Visualisation

Shneiderman's mantra: overview first; zoom and filter; details on demand.

• Overview: provided by the projection.
• Zoom: possible in Matlab plots.
• Filter: by user interaction, e.g. specify a pattern of values that is of interest.
• Details: provided as local information.

We will see more of this later in practical examples.

SLIDE 8

Data Visualisation

Information Visualisation Examples

Word Cloud (www.wordle.net)

SLIDE 9

Data Visualisation

Uncertainty

"Doubt is not a pleasant condition, but certainty is absurd." (Voltaire)

Real data is noisy, so we are forced to deal with uncertainty, yet we need to be quantitative. The optimal formalism for inference in the presence of uncertainty is probability theory. We assume the presence of an underlying regularity to make predictions. Bayesian inference allows us to reason probabilistically about the model as well as the data.

SLIDE 10

Data Visualisation

Data Projection

[Diagram: a parameterised mapping f(y; W) projects points y1, y2, y3 from the data space D to the visualisation space V.]

Define f to optimise some criterion: for PCA the criterion is minimal variance loss; for the Sammon mapping it is minimal stress.

SLIDE 11

Data Visualisation

What can we learn from this?

[Scatter plot of a 2D projection, axes running from −20 to 10, with points labelled Sinus, VEL and VER.]

SLIDE 12

Data Visualisation

Projection

What is the simplest way to project data? A linear map. What is the best way to linearly project data? We want to preserve as much information as possible. If we assume that information is measured by variance, this implies choosing new coordinate axes along directions of maximal variance; these can be found by analysing the covariance matrix of the data. This gives Principal Component Analysis (PCA). For large datasets, the end result is usually a circular blob in the middle of the screen.

SLIDE 13

Data Visualisation

PCA

Let $S$ be the covariance matrix of the data, so that

$$S_{ij} = \frac{1}{N} \sum_n (x_i^n - \bar{x}_i)(x_j^n - \bar{x}_j).$$

The first $q$ principal components are the first $q$ eigenvectors $w_j$ of $S$, ordered by the size of the eigenvalues $\lambda_j$. The percentage of the variance explained by the first $q$ PCs is

$$\frac{\sum_{j=1}^{q} \lambda_j}{\sum_{j=1}^{d} \lambda_j},$$

where the data dimension is $d$. These vectors are orthonormal (perpendicular and of unit length), and the variance when the data is projected onto them is maximal. Plot the sorted principal values: plot(-sort(-eig(cov(data))));
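Putting the pieces together, here is a minimal Matlab sketch of PCA projection to 2D; the variable names (data, q) and the final plot are illustrative assumptions, not code from the talk.

```matlab
% Minimal PCA sketch: project an N x d data matrix onto its first q
% principal components. Variable names are illustrative.
q = 2;                                  % target dimension for visualisation
mu = mean(data);                        % d-dimensional mean
X = data - repmat(mu, size(data,1), 1); % centre the data
[V, L] = eig(cov(X));                   % eigenvectors/values of covariance
[lambda, idx] = sort(diag(L), 'descend');
W = V(:, idx(1:q));                     % first q principal directions
proj = X * W;                           % N x q projected coordinates
explained = sum(lambda(1:q)) / sum(lambda);  % fraction of variance kept
plot(proj(:,1), proj(:,2), '.');
```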

SLIDE 14

Data Visualisation: Topographic Mappings

Topographic Mappings

The basic aim is that distances in the visualisation space are as close as possible to those in the original data space. Given a dissimilarity matrix $d_{ij}$, we want to map data points $x_i$ to points $y_i$ in a feature space such that their dissimilarities in feature space, $\tilde{d}_{ij}$, are as close as possible to the $d_{ij}$. We say that the map preserves similarities. The stress measure is used as the objective function:

$$E = \frac{1}{\sum_{i<j} d_{ij}} \sum_{i<j} \frac{(d_{ij} - \tilde{d}_{ij})^2}{d_{ij}}.$$
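To make the objective concrete, here is a small Matlab sketch of the stress computation; it assumes D and Dt are precomputed dissimilarity matrices and is a sketch, not the talk's code.

```matlab
% Sammon stress between original dissimilarities D and mapped
% dissimilarities Dt (both n x n symmetric matrices, assumed inputs).
function E = sammon_stress(D, Dt)
    mask = triu(true(size(D)), 1);      % use each pair i < j once
    d = D(mask); dt = Dt(mask);
    E = sum((d - dt).^2 ./ d) / sum(d); % normalised squared mismatch
end
```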

SLIDE 15

Data Visualisation: Topographic Mappings

Multi-Dimensional Scaling

Given distances or dissimilarities $d_{rs}$ between every pair of observations, try to preserve these as far as possible in a lower-dimensional space. In classical scaling, the distance between the objects is assumed to be Euclidean; a linear projection then corresponds to PCA. The Sammon mapping is a non-linear multidimensional scaling technique that is more general (and more widely used) than classical scaling. Neuroscale is a neural-network-based scaling technique that has the advantage of actually giving a map that generalises!
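For reference, classical scaling can be written in a few lines of Matlab. This is a sketch assuming a squared-distance matrix D2 is given; it is not code from the talk.

```matlab
% Classical (Torgerson) scaling: embed n points in q dimensions from an
% n x n matrix of squared Euclidean distances D2 (assumed input).
n = size(D2, 1); q = 2;
J = eye(n) - ones(n) / n;              % centring matrix
B = -0.5 * J * D2 * J;                 % double-centred Gram matrix
[V, L] = eig((B + B') / 2);            % symmetrise for numerical safety
[lambda, idx] = sort(diag(L), 'descend');
Y = V(:, idx(1:q)) * diag(sqrt(max(lambda(1:q), 0)));  % n x q embedding
```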

SLIDE 16

Data Visualisation: Topographic Mappings

Neuroscale

SLIDE 17

Data Visualisation: Topographic Mappings

Biological Application: Streptomyces Gene Expression

Data supplied by Colin Smith (Surrey University). Streptomyces coelicolor is a bacterium which undergoes developmental changes correlated with sporulation and the production of antibiotics. Its 7825 genes include more than 20 clusters coding for secondary metabolites, as well as a large proportion of regulatory genes. The dataset consists of ten time points from 16 to 67 hours after inoculation of the growth medium; the analysis is based on the 3067 genes that were significantly expressed. SCO6283, SCO6284, SCO6277 and SCO6278 are co-regulated genes involved in the synthesis of a type I polyketide; SCO3245 is involved in the synthesis of lipid.

SLIDE 18

Data Visualisation: Topographic Mappings

Streptomycin

[Figure: life of streptomycin.] Bioinformatics: measuring the expression levels of thousands of genes over multiple timepoints.

SLIDE 19

Data Visualisation: Topographic Mappings

SCO6283, SCO6284, SCO6277, SCO6278 in cluster 11, SCO3245 in cluster 12.

SLIDE 20

Data Visualisation: Topographic Mappings

Genes involved in the synthesis of two distinct secondary metabolites may be co-regulated by a common network.

SLIDE 21

Data Visualisation: Latent Variable Models

Latent Variable Models

The projection approach is one way of reducing the data complexity. An alternative view is to hypothesise how the data might have been generated.

Hidden Connections: "A hidden connection is stronger than an obvious one." (Heraclitus)

SLIDE 22

Data Visualisation: Latent Variable Models

Latent Variable Models

How is the idea of hidden connections applied to statistical pattern recognition? Separate the observed variables and the latent variables: latent variables generate observations, and we use (probabilistic) inference to deduce what is happening in latent variable space, often via Bayes' theorem:

$$P(L \mid O) = \frac{P(O \mid L)\, P(L)}{P(O)}.$$

Static case: GTM, with two latent variables and a non-linear transformation to observation space. Dynamic cases:

• Hidden Markov Models: discrete state space (speech recognition).
• State space models: continuous state space (tracking).

SLIDE 23

Data Visualisation: Latent Variable Models

Visualisation with Density Models

Construct a generative model for the data, mapping from a low-dimensional latent space $H$ to the data space $D$. The model maps latent variables $r$ to observed variables $x$, giving a probability density $p(x \mid r)$. To visualise the data we want to map from observed variables to latent variables: use Bayes' theorem to compute

$$p(r \mid x) = \frac{p(x \mid r)\, p(r)}{p(x)}.$$

Plot a summary statistic of $p(r_i \mid x_i)$ for each data point $x_i$: usually the mean. If the mapping is linear and there is a single Gaussian noise model, we recover PCA. A sketch of the posterior-mean projection follows.
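Here is a minimal Matlab sketch of the posterior-mean projection for a grid-based latent variable model with an isotropic Gaussian noise model. The inputs R, Ygrid, X and beta, and the uniform prior over grid points, are assumptions for illustration.

```matlab
% Posterior-mean projection. Assumed inputs: latent grid points R (K x 2),
% their images Ygrid (K x d) under the learned mapping, data X (N x d),
% noise precision beta; uniform prior over the grid.
K = size(R, 1); N = size(X, 1);
dist2 = zeros(K, N);                    % squared distances ||y_k - x_n||^2
for k = 1:K
    diff = X - repmat(Ygrid(k,:), N, 1);
    dist2(k,:) = sum(diff.^2, 2)';
end
logp = -0.5 * beta * dist2;             % log p(x_n | r_k) up to a constant
logp = logp - repmat(max(logp), K, 1);  % stabilise before exponentiating
post = exp(logp);
post = post ./ repmat(sum(post), K, 1); % p(r_k | x_n); columns sum to 1
means = post' * R;                      % N x 2 posterior means to plot
plot(means(:,1), means(:,2), '.');
```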

SLIDE 24

Data Visualisation: Latent Variable Models

[Diagram: the non-linear mapping y(z; W) takes the latent space with coordinates z1, z2 to a curved two-dimensional manifold in the data space with coordinates x1, x2, x3.]

SLIDE 25

Data Visualisation: Latent Variable Models

The Generative Topographic Mapping

GTM (Bishop, Svensén and Williams) is a latent variable model with a non-linear RBF mapping $f_M$ from a (usually two-dimensional) latent space $H$ to the data space $D$. The data doesn't live exactly on the manifold, so we smear it with Gaussian noise. Introduce a latent space density $p(x)$, approximated by a data sample; this makes GTM a generative probabilistic model. The model assumes that the data lies close to a two-dimensional manifold; however, this is likely to be too simple a model for interesting data. We can measure the non-linearity of the sheet and use this to understand the visualisation plot. The model is trained in a maximum likelihood framework using an iterative algorithm (EM); a sketch of one iteration follows.
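A minimal Matlab sketch of one EM iteration, following the standard GTM update equations. Phi, W, beta, lambda and the data X are assumed to be already initialised; this is an illustration, not the Netlab implementation.

```matlab
% One EM iteration for GTM. Assumed inputs: X (N x d data), Phi (K x M
% RBF basis activations at the K latent grid points), W (M x d weights),
% beta (noise precision), lambda (weight regulariser).
[N, d] = size(X); K = size(Phi, 1);
Y = Phi * W;                            % images of grid points in data space
dist2 = zeros(K, N);                    % squared distances ||y_k - x_n||^2
for k = 1:K
    diff = X - repmat(Y(k,:), N, 1);
    dist2(k,:) = sum(diff.^2, 2)';
end
% E-step: responsibilities R(k,n) = p(k | x_n)
logR = -0.5 * beta * dist2;
logR = logR - repmat(max(logR), K, 1);  % stabilise
R = exp(logR);
R = R ./ repmat(sum(R), K, 1);
% M-step: solve (Phi' G Phi + (lambda/beta) I) W = Phi' R X for W
G = diag(sum(R, 2));
W = (Phi' * G * Phi + (lambda / beta) * eye(size(Phi, 2))) \ (Phi' * R * X);
% Noise precision update (using the pre-update distances for simplicity)
beta = N * d / sum(sum(R .* dist2));
```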

SLIDE 26

Data Visualisation: Latent Variable Models

Enhancements to GTM

• Curvatures give more information about the shape of the manifold.
• Hierarchy allows the user to drill down into the data; either user-defined or automated (MML) selection of sub-model positions.
• Temporal dependencies in data are handled by GTM Through Time.
• Discrete data is handled by the Latent Trait Model (LTM): all the other goodies work for it as well.
• Can cope with missing data in training and visualisation.
• MML methods for feature selection.
• Structured covariance.
• Mixed data types.

SLIDE 30

Data Visualisation: Latent Variable Models

Local Parallel Coordinates

Parallel coordinates maps the d-dimensional data space onto two display dimensions by using d equidistant axes parallel to the y-axis. Each data point is displayed as a piecewise-linear graph intersecting each axis at the position corresponding to the data value for that dimension. It is impractical to display this for all the data points, so we allow the user to select a region of interest. The user can also interact with the local parallel coordinates plot to obtain detailed information; a sketch of the basic plot is below.
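A minimal Matlab sketch of the basic parallel coordinates drawing. The min-max normalisation and the variable names are illustrative assumptions.

```matlab
% Parallel coordinates for an N x d data matrix X (assumed input):
% normalise each dimension to [0, 1] and draw one polyline per point.
% Assumes no constant columns (hi - lo would be zero).
[N, d] = size(X);
lo = min(X); hi = max(X);
Xn = (X - repmat(lo, N, 1)) ./ repmat(hi - lo, N, 1);
plot(1:d, Xn', '-');                    % each column of Xn' is one data point
set(gca, 'XTick', 1:d); xlabel('dimension'); ylabel('normalised value');
```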

SLIDE 31

Data Visualisation: Latent Variable Models

Hierarchical GTM: Drilling Down

Bishop and Tipping introduced the idea of hierarchical visualisation for probabilistic PCA. We have developed a general framework for arbitrary latent variable models. Because GTM is a generative latent variable model, it is ‘straightforward’ to train hierarchical mixtures of GTMs. We model the whole data set with a GTM at the top level, which is broken down into clusters at deeper levels of the hierarchy. Because the data can be visualised at each level of the hierarchy, the selection of clusters, which are used to train GTMs at the next level down, can be carried out interactively by the user.

SLIDE 32

Data Visualisation: Latent Variable Models

Chemometric Application: HTS Data Exploration

Scientists at Pfizer searching for active compounds can now screen millions of compounds in a fortnight. Goals:

• Gain a better understanding of the results of multiple screens through the use of novel data visualisation and modelling techniques.
• Find clusters of similar compounds (measured in terms of biological activity) and use a representative subset to reduce the number of compounds in a screen.
• Build local prediction models.

SLIDE 33

Data Visualisation: Latent Variable Models

We have taken data from Jens Lösel (Pfizer) consisting of 6912 14-dimensional vectors representing chemical compounds using topological indices developed at Pfizer. The task is to predict logP. The plots segment the data (by responsibility), which can be used to build local predictive models that are often more accurate than global models. Only 14 inputs are needed, compared with c. 1000 for other methods of predicting logP, and the results are comparable with other algorithms for logP.

SLIDE 34

Data Visualisation: Latent Variable Models

SLIDE 35

Data Visualisation: Latent Variable Models

SLIDE 36

Data Visualisation: Latent Variable Models

Gaussian Process Latent Variable Model

SLIDE 37

Non-linear Modelling and Feature Selection

Non-linear Modelling and Feature Selection

Many chemometric problems can best be addressed using non-linear predictive models (e.g. QSAR). Models must be multivariate (there is no single 'silver bullet'), but there are hundreds (thousands, tens of thousands) of possible features (e.g. for small molecules, proteins, ...). Linear models have a constant sensitivity to input variables; non-linear models have a variable sensitivity, with niches of good performance and variable importance.

SLIDE 38

Non-linear Modelling and Feature Selection

GTM-FS

d1 and d2 have high saliency, d3 has low saliency

SLIDE 39

Non-linear Modelling and Feature Selection

Chemometric Data

[Figures: GTM visualisation and GTM-FS visualisation, with magnification factors shown on a log scale.]

SLIDE 40

Non-linear Modelling and Feature Selection

Feature Saliencies

• Both GTM models outperform the Kohonen SOM.
• GTM-FS performs better than GTM on magnification factors (71 to 126) and (subjectively) has more coherent clusters.
• GTM-FS performs worse than GTM on nearest-neighbour error (41% to 38%).

SLIDE 41

Block-structured Covariance

Block GTM

Include prior information about the correlations of variables in a GTM by using a full covariance matrix in the noise model and enforcing a block structure. This results in a reasonably sparse covariance matrix and keeps the number of unknown parameters low, while the additional flexibility allows the model to fit the data more closely. The extension of the learning algorithm is straightforward: the only changes occur in the computation of responsibilities in the E-step and of $\Sigma$ in the M-step.

$$\Sigma = \begin{pmatrix} \Sigma_1 & & & \\ & \Sigma_2 & & \\ & & \ddots & \\ & & & \Sigma_p \end{pmatrix}$$
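A small Matlab sketch of working with a block-diagonal covariance; blkdiag is a standard Matlab function, and the block sizes here are illustrative assumptions.

```matlab
% Build a block-diagonal covariance from per-block covariances and
% evaluate a Gaussian log-density block by block. Block sizes assumed.
S1 = eye(3); S2 = [1 0.8; 0.8 1];       % example blocks Sigma_1, Sigma_2
Sigma = blkdiag(S1, S2);                % 5 x 5 block-diagonal covariance
x = randn(5, 1); mu = zeros(5, 1);
% Exploit the structure: the log-density decomposes over the blocks.
blocks = {1:3, 4:5};
logp = 0;
for b = 1:numel(blocks)
    i = blocks{b}; Sb = Sigma(i, i); v = x(i) - mu(i);
    logp = logp - 0.5 * (length(i) * log(2*pi) + log(det(Sb)) + v' * (Sb \ v));
end
```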

SLIDE 42

Block-structured Covariance

Finding the Blocks: I

Find the block structure by visualising the correlation coefficients as a heat map. For this method to be successful, however, one needs to order the heat map so that highly correlated variables are close to each other (i.e. forming blocks). Generate a dendrogram using hierarchical clustering combined with heuristics to reorder the leaves to reflect their proximity: the tree is ordered in such a way that the distance between neighbouring leaves is minimised. We use a recursive algorithm, Optimal Leaf Ordering (OLO), available in the Matlab Bioinformatics Toolbox, which swaps sub-trees if this reduces the distances to neighbours.
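A sketch of this reordering pipeline in Matlab; linkage and optimalleaforder are toolbox functions, and using 1 - |correlation| as the dissimilarity is an assumption for illustration.

```matlab
% Reorder a correlation heat map so correlated variables form blocks.
% X is an N x d data matrix (assumed input).
C = corr(X);                            % d x d correlation matrix
D = 1 - abs(C);                        % dissimilarity between variables
tree = linkage(squareform(D), 'average');
order = optimalleaforder(tree, squareform(D));  % OLO reordering
imagesc(C(order, order)); colorbar;     % blocks now appear on the diagonal
```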

SLIDE 43

Block-structured Covariance

Finding the Blocks: II

Bayesian Correlation Estimation, based on the paper of Liechty et al. (2004). For the grouping, one is only interested in the off-diagonal elements of the empirical correlation matrix $C$. Assume that $C_{ij} \sim N(\mu, \sigma^2)$ with priors $\mu \sim N(0, \tau^2)$ and $\sigma^2 \sim IG(\alpha, \beta)$, with the hyperparameters known. Extend this to groups with means $\mu_{\theta_i, \theta_j}$, where the posterior $p(\theta_i \mid \cdots)$ defines the groups. The full posterior distribution of $\theta_i$, $\mu$ and $\sigma$ can be sampled using the Metropolis-Hastings algorithm, but this is very slow, so we created a simpler Quick BCE which just estimates $p(\theta_i = k)$.
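For intuition, a minimal Metropolis-Hastings sketch for the single-group model, sampling mu with a random-walk proposal. Here sigma2, tau2 and the proposal width step are fixed assumptions for brevity; the full sampler also updates sigma2 and the group labels.

```matlab
% Random-walk Metropolis-Hastings for mu in C_ij ~ N(mu, sigma2),
% prior mu ~ N(0, tau2). c is the vector of off-diagonal correlations.
% c, sigma2, tau2 are assumed inputs.
nsamp = 5000; step = 0.05;
mu = 0; samples = zeros(nsamp, 1);
logpost = @(m) -0.5 * sum((c - m).^2) / sigma2 - 0.5 * m^2 / tau2;
for t = 1:nsamp
    prop = mu + step * randn;           % symmetric random-walk proposal
    if log(rand) < logpost(prop) - logpost(mu)
        mu = prop;                      % accept
    end
    samples(t) = mu;
end
hist(samples, 50);                      % approximate posterior of mu
```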

SLIDE 44

Block-structured Covariance

Results on Toy Data

[Plot: nearest-neighbour label error with high (ST=20) and low (ST=2) structure for the GTM model with different covariance structures. PCA: blue dotted line with large dots; S-GTM: green constant line with crosses; B-GTM: red dashed line with diamonds; F-GTM: black dash-dotted line.]

SLIDE 45

Block-structured Covariance

Conclusions

• Visualisation is an important tool for all types of user; the domain expert must be involved in the process.
• Interaction with the plots allows the user to query the data more effectively.
• Presenting the data in the right way is key.
• Feature selection is a very important tool.
• Accounting for known structure (e.g. block covariance) improves results.

SLIDE 46

Block-structured Covariance

AgustaWestland

AW has pioneered CVM, the continuous recording of airframe vibration (0-200 Hz), to improve the investigation of unusual occurrences and monitor airframe integrity. The goals are to:

• Develop a probabilistic framework for inferring flight mode and key parameters from multiple streams of vibration data.
• Improve indicators of airframe condition, using the wavelet transform and kernel entropy to assess the dynamics (i.e. non-stationary characteristics) of the vibration signal.
• Provide integrated diagnosis based on probabilistic models of normality, using a belief network to model prior knowledge about the domain and interactions between key variables.

SLIDE 47

Block-structured Covariance

Understanding the Data

8 sensors measuring vibration; 108 frequency bands per sensor.

SLIDE 48

Block-structured Covariance