Computational Forensics: Machine Learning and Predictive Analytics - PowerPoint PPT Presentation

(Internal) Model Complexity 47 MC-0

0th Order Polynomial Regression: estimated model 48 MC-2

1st Order Polynomial 49 MC-2

3rd Order Polynomial 50 MC-3

9th Order: What Happened?! 51 MC-6
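The jump from a low-order to a 9th-order fit can be reproduced with a small sketch. The data here are assumptions for illustration (ten samples of a sine curve with hard-coded noise), not the data behind the slides. With exactly ten points, a 9th-order least-squares fit reduces to the unique polynomial through every point, so the sketch computes it in Lagrange form.

```python
import math

# Hypothetical training set: 10 noisy samples of sin(2*pi*x).
NOISE = [0.10, -0.20, 0.15, -0.10, 0.20, -0.15, 0.10, -0.20, 0.15, -0.10]
train_x = [i / 9 for i in range(10)]
train_y = [math.sin(2 * math.pi * x) + e for x, e in zip(train_x, NOISE)]

# Held-out test points lie halfway between training points, without noise.
test_x = [(i + 0.5) / 9 for i in range(9)]
test_y = [math.sin(2 * math.pi * x) for x in test_x]


def fit_line(xs, ys):
    """1st-order least-squares fit; returns a prediction function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def fit_interpolant(xs, ys):
    """Polynomial of order len(xs) - 1 that passes through every point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def rms_error(model, xs, ys):
    """Root-mean-square error of a model on a data set."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


line = fit_line(train_x, train_y)
poly9 = fit_interpolant(train_x, train_y)

for name, model in [("1st order", line), ("9th order", poly9)]:
    print(name,
          "train RMS = %.3f" % rms_error(model, train_x, train_y),
          "test RMS = %.3f" % rms_error(model, test_x, test_y))
```

The 9th-order model reproduces every training point, so its training error is zero, yet its error on the held-out points is not. That gap between training and testing performance is the overfitting signature.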

Model Complexity • Curse of Dimensionality (Too Much Complexity) • Overfitting 52 MC-7

Training Performance Evaluation 53 MC-8

The Machine Learning Process • Training Data: Preprocessing, Feature Extraction/Selection, Learning/Adaptation • Internal Model • Testing Data: Preprocessing, Feature Extraction/Selection, Classification/Regression • Output Evaluation • Application 54 T&T-1

Training Data, Testing Data & Over-fitting 55 MC-9

A Central Principle in ML • The model complexity drives the training data requirements! 56 MC-10

More Data Can Fix Overfitting Problem • N= 10 Data Points • N= 15 Data Points • N= 100 Data Points 57 MC-11

Curse of Dimensionality (Model Complexity) 58 MC-12

• More complex problems require more complex models • More complex models require more complex feature spaces – Need higher dimensionality to get good class separation • Wood classifier with a 1D feature space? (Figure axes: Grain Prominence, Wood Brightness) 59 MC-13

Distance Metrics 60 DM-0

The Distance Metric • How the similarity of two elements in a set is determined, e.g. – Euclidean Distance – Inner Product (Vector Spaces) – Manhattan Distance – Maximum Norm – Mahalanobis Distance – Hamming Distance – Or any metric you define over the space … 61 DM-1
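Several of the metrics listed above fit in a line or two each. The following is a minimal sketch; the function names are chosen here for illustration and the sample points are hypothetical.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum_norm(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


p, q = (0, 0), (3, 4)
print(euclidean(p, q))              # 5.0
print(manhattan(p, q))              # 7
print(maximum_norm(p, q))           # 4
print(hamming("murder", "merder"))  # 1
```

The same pair of points is 5, 7, or 4 units apart depending on the metric, which is why the choice of metric shapes every similarity-based method that follows.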

Manhattan Distance https://www.quora.com/What-is-the-difference-between-Manhattan-and-Euclidean-distance-measures 62 DM-2

Far From Normal? (Figure: scatter plot of points in the x-y plane) • Center = Mean • Spread = Variance 63 DM-3

Mahalanobis Distance http://www.jennessent.com/arcview/mahalanobis_description.htm 64 DM-4

Mahalanobis Distance http://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance 65 DM-5
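The Mahalanobis distance measures how far a point lies from the mean of a data cloud, scaled by the cloud's spread in each direction. The following is a minimal two-dimensional sketch with the 2x2 covariance inverse written out by hand; the function name and the sample cloud are assumptions for illustration.

```python
import math


def mahalanobis_2d(point, data):
    """Mahalanobis distance from `point` to the mean of 2-D `data`."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mean_x, point[1] - mean_y
    # d^T S^-1 d, with the 2x2 inverse expanded explicitly.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(squared)


# Hypothetical cloud: wide along x, narrow along y, centred on the origin.
cloud = [(-4, 0), (4, 0), (0, -1), (0, 1), (0, 0)]
print(mahalanobis_2d((2, 0), cloud))  # along the wide axis
print(mahalanobis_2d((0, 2), cloud))  # along the narrow axis
```

Both query points are 2 Euclidean units from the mean, but the one displaced along the narrow axis is much farther in Mahalanobis terms, because a displacement of that size is unusual in that direction.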

Unsupervised Learning 66 U-0

Clustering • Partitional • Hierarchical 67 U-C-1

Anomaly Detection with Unlabelled Data (Figure: scatter plot of unlabelled points; axes: Packet Size, Packet Data Size) 68 U-C-1
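One simple way to flag anomalies without labels is to measure how many standard deviations each value lies from the mean. This sketch uses a single feature; the packet sizes, the function name, and the threshold of 2.0 are assumptions for illustration, not values from the slides.

```python
import statistics


def find_anomalies(values, threshold):
    """Return the values lying more than `threshold` std devs from the mean."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical packet sizes in bytes; one is far from the rest.
packet_sizes = [500, 510, 495, 505, 498, 502, 507, 1500]
print(find_anomalies(packet_sizes, threshold=2.0))  # [1500]
```

Note that the outlier itself inflates both the mean and the standard deviation, so in small samples the threshold has to be chosen with care.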

Recap of Wood Classification – 2 Optical Attributes or Features • Brightness • Grain prominence – Yielded a 2-Dimensional Feature Space – We had SUPERVISED learning: • We started with known pieces of wood • Gave each plotted training example its class LABEL – We chose our features well: we saw good clustering/separation of the different classes in the feature space. 69 U-C-2

Unlabelled Data (Figure: scatter plot of unlabelled points; axes: Brightness, Grain Prominence) 70 U-C-3

Partitional Clustering 71 U-C-3

Hierarchical Clustering: Corpus browsing • www.yahoo.com/Science … (30) – agriculture: dairy, crops, agronomy, forestry – biology: botany, cell, evolution – physics: magnetism, relativity – CS: AI, HCI, courses – space: craft, missions U-C-3

Essentials of Clustering • Similarities – Natural Associations – Proximate* • Differences – Distant* *Implies a distance metric 73 U-C-3

Essentials of Clustering • What is a “Good” Cluster? – Members are very “similar” to each other • Within-Cluster Divergence Metric σ_i – Variance also works • Relative Cluster Sizes versus Data Spread 74 U-C-4

Partitional Clustering Methods • K-Means Clustering • Gaussian Mixture Models • Canopy Clustering • Vector Quantization 75 U-C-5
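K-Means, the first method listed above, alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The following is a minimal sketch of that loop (Lloyd's algorithm) on 2-D points; the sample data and starting centroids are hypothetical, and real use would need a considered initialisation strategy.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points, starting from the given centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: (p[0] - centroids[i][0]) ** 2
                + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabelled samples: (brightness, grain prominence).
samples = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centres, groups = k_means(samples, centroids=[(0, 0), (10, 10)])
print(centres)
print(groups)
```

The squared Euclidean distance in the assignment step is the implied distance metric; swapping it for another metric changes which clusters are found.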

Unsupervised Learning/Clustering Self Organizing Maps (SOM) 76 U-C-7

SOMs Topology Preserving Projections http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif 77 U-C-8

http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif 78 U-C-9

Topology Preserving Projections http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif 79 U-C-10

Topology Preserving Projections • How will the distance metric handle polymorphous data? – Units of time (different units of time?) • Sprint performance data: years of age and seconds to finish – Units of space • (meters, lightyears) • Surface area • Volumetric – Units of mass (grams, kilograms, tonnes) – Units of $$$ • NOK • USD 80 U-C-11
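One common answer to the mixed-units question is to standardise each feature before computing distances, so that no feature dominates merely because of its unit. This is a sketch of one option, not necessarily the approach the slides intend; the sprint data and function name are hypothetical.

```python
import statistics


def standardize(rows):
    """Rescale each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    spreads = [statistics.stdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, spreads))
        for row in rows
    ]


# Hypothetical sprint data: (age in years, finishing time in seconds).
sprinters = [(18, 11.2), (25, 10.4), (32, 10.9), (47, 12.8)]
scaled = standardize(sprinters)
for row in scaled:
    print(row)
```

After scaling, a difference of one unit means "one standard deviation" in both columns, so years and seconds contribute comparably to any distance computed on the rows.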

Proximity By Colour and Location Poverty Map of the World (1997) http://www.cis.hut.fi/research/som-research/worldmap.html 81 U-C-12

Map of Labels in Titles From comp.ai.neural-nets-news newsgroup www.cs.hmc.edu/courses/2003/fall/cs152/slides/som.pdf 82 U-C-13

Learning As Search 83 LAS-0

• Exhaustive search – DFS – BFS • Gradient search – Can Get Stuck in a Locally Optimal Solution • Simulated annealing – Can Escape Local Optima • Genetic algorithms 84 LAS-1
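The local-optimum problem of gradient search is easy to reproduce. The sketch below runs plain gradient descent on a one-dimensional function with two minima; the function, learning rate, and starting points are assumptions chosen for illustration.

```python
def f(x):
    """A curve with a shallow minimum near x = 1.13 and a deeper one near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    """Derivative of f."""
    return 4 * x ** 3 - 6 * x + 1


def gradient_descent(x, rate=0.01, steps=2000):
    """Follow the negative gradient from the starting point x."""
    for _ in range(steps):
        x -= rate * df(x)
    return x


right = gradient_descent(2.0)   # starts on the right-hand slope
left = gradient_descent(-2.0)   # starts on the left-hand slope
print("from  2.0: x = %.3f, f(x) = %.3f" % (right, f(right)))
print("from -2.0: x = %.3f, f(x) = %.3f" % (left, f(left)))
```

The run that starts at 2.0 settles in the shallower minimum and never sees the deeper one, because every step only looks at the local slope. Methods such as simulated annealing accept occasional uphill moves precisely to get out of such basins.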

Exact vs Approximate Search • Exact: – Hashing techniques – String matching (“Murder”) • Approximate: – Approximate Hashing – Partial strings – Elastic Search • “murder” • “merder” 85 LAS-7
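The "murder" versus "merder" case can be sketched with an edit-distance match. Levenshtein distance is used here as one common choice for approximate string matching; it is an assumption, not necessarily the technique behind the tools named on the slide, and the word list is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                # delete ca
                current[j - 1] + 1,             # insert cb
                previous[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def approximate_match(query, candidates, max_edits=1):
    """Case-insensitive match allowing up to `max_edits` edits."""
    q = query.lower()
    return [c for c in candidates if edit_distance(q, c.lower()) <= max_edits]


words = ["Murder", "merder", "murmur", "order"]
print([w for w in words if w == "Murder"])  # exact: ['Murder']
print(approximate_match("murder", words))   # ['Murder', 'merder']
```

Exact matching finds only the literal string, while the approximate match also recovers the misspelling at the cost of comparing the query against every candidate.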

Artificial Neural Networks (ANN) 86 ANN-0

Inspired by Natural Neural Nets 87 ANN-1

Perceptron (1950s) 88 ANN-2

Perceptron Can Learn Simple Boolean Logic Single Boundary, Linearly Separable 89 ANN-03

Perceptron Cannot Learn XOR 90 ANN-4
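Both claims, that a perceptron learns linearly separable Boolean functions and that it cannot learn XOR, can be checked with a short sketch of the classic perceptron learning rule. The learning rate of 1, zero initial weights, and the epoch limit are assumptions for illustration.

```python
def train_perceptron(samples, epochs=100):
    """Perceptron learning rule with a step activation and learning rate 1."""
    w1 = w2 = bias = 0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            error = target - output
            if error:
                # Nudge the boundary towards the misclassified sample.
                w1 += error * x1
                w2 += error * x2
                bias += error
                mistakes += 1
        if mistakes == 0:
            break  # every sample is classified correctly
    return w1, w2, bias


def accuracy(weights, samples):
    """Fraction of samples the trained perceptron classifies correctly."""
    w1, w2, bias = weights
    hits = sum(
        (1 if w1 * x1 + w2 * x2 + bias > 0 else 0) == target
        for (x1, x2), target in samples
    )
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND accuracy:", accuracy(train_perceptron(AND), AND))
print("XOR accuracy:", accuracy(train_perceptron(XOR), XOR))
```

AND converges to a perfect boundary within a few epochs. XOR never does, however long training runs, because no single straight line separates its two classes.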

Multi-Layer Perceptron Error Back-Propagation Network MLP-BP 91 ANN-5
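A hidden layer is what makes XOR representable. The sketch below wires a two-layer network by hand to show this; the weights are chosen manually for illustration and are not learned by back-propagation, which would normally find such weights from data.

```python
def step(value):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if value > 0 else 0


def xor_network(x1, x2):
    """Two hidden units (OR and NAND) feeding one output unit (AND)."""
    h_or = step(x1 + x2 - 0.5)        # fires if at least one input is 1
    h_nand = step(-x1 - x2 + 1.5)     # fires unless both inputs are 1
    return step(h_or + h_nand - 1.5)  # fires only if both hidden units fire


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_network(a, b))
```

Each hidden unit draws one linear boundary, and the output unit combines them, which a single perceptron cannot do.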

MLP-BP Internal Model Building Block 5 MLP-BP Neurons 92 ANN-7

MLP-BP “Universal Voxel” 93 ANN-8

NeuroFuzzy Methods 94 NF-0

Neuro Fuzzy Overview • Neuro-Fuzzy (NF) is a hybrid intelligence / soft computing approach – (*Soft?) • A combination of Artificial Neural Networks (ANN) and Fuzzy Logic (FL) • The opposite of fuzzy logic is – Crisp – Sharp • ANNs are black-box statistical models, designed to simulate the activity of biological neurons • FL extracts human-explainable linguistic fuzzy rules • Applications in Decision Support Systems and Expert Systems 95 NF-1

Fuzzy Basics • FL uses linguistic variables that can contain several linguistic terms • Temperature (linguistic variable) – Hot (linguistic terms) – Warm – Cold • Consistency (linguistic variable) – Watery (linguistic terms) – Gooey – Soft – Firm – Hard – Crunchy – Crispy 96 NF-2

Triangular Fuzzy Membership Functions http://sci2s.ugr.es/keel/links.php 97 NF-3
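A triangular membership function maps a crisp value to a degree of membership between 0 and 1 for a linguistic term. This is a minimal sketch for a temperature variable; the breakpoints for "cold", "warm", and "hot" are hypothetical values in degrees Celsius.

```python
def triangular(x, left, peak, right):
    """Membership rises linearly from left to peak, then falls to right."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Hypothetical (left, peak, right) breakpoints in degrees Celsius.
TERMS = {"cold": (-10, 0, 15), "warm": (5, 20, 30), "hot": (25, 35, 50)}


def fuzzify(temperature):
    """Degree of membership of a crisp temperature in each linguistic term."""
    return {term: triangular(temperature, *abc) for term, abc in TERMS.items()}


print(fuzzify(10))
```

Because neighbouring triangles overlap, a single reading can be partly "cold" and partly "warm" at once, which is what distinguishes fuzzy terms from crisp categories.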

Fuzzy Inference ● Sharp antecedent: “If the tomato is red, then it is sweet” ● Fuzzy antecedent: “If the piece of wood is more or less dark (μ_dark = 0.7)” ● Fuzzy consequent(s): ● “The piece of wood is more or less pine (μ_pine = 0.64)” ● “The piece of wood is more or less birch (μ_birch = 0.36)” http://ispac.diet.uniroma1.it/scarpiniti/files/NNs/Less9.pdf 98 NF-4

Combining ANN/FL ● ANN black box approach requires sufficient data to find the structure (generalization learning) ● NO PRIORS required ● But cannot extract linguistically meaningful rules from trained ANN ● Fuzzy rules require prior knowledge ● Based on linguistically meaningful rules http://www.scholarpedia.org/article/Fuzzy_neural_network 99 NF-5

Combining ANN/FL ● Combining the two gives us a higher level of system intelligence ● Intelligence(?) ● Can handle the usual ML tasks (regression, classification, etc.) http://www.scholarpedia.org/article/Fuzzy_neural_network 100 NF-6

Fundamentals of Computational Forensics: Machine Learning and Predictive Analytics Carl Stuart Leichter PhD carl.leichter@ntnu.no NTNU Testimon Digital Forensics Group NTNU Testimon Digital Forensics Group Cyber Threat Intelligence and
