# Computational Forensics: Machine Learning and Predictive Analytics

## Fundamentals of Computational Forensics: Machine Learning and Predictive Analytics. Carl Stuart Leichter PhD, carl.leichter@ntnu.no, NTNU Testimon Digital Forensics Group, Cyber Threat Intelligence and

1. (Internal) Model Complexity

2. 0th-Order Polynomial Regression (estimated model)

3. 1st-Order Polynomial

4. 3rd-Order Polynomial

5. 9th-Order Polynomial: What Happened?!

6. Model Complexity
    - Curse of Dimensionality (Too Much Complexity)
    - Overfitting

7. Training Performance Evaluation

8. The Machine Learning Process (flow diagram): Training Data → Preprocessing → Feature Extraction/Selection → Learning/Adaptation → Internal Model, with Output Evaluation; Testing Data → Preprocessing → Feature Extraction/Selection → Classification/Regression → Application

9. Training Data, Testing Data & Over-fitting

10. A Central Principle in ML
    - The model complexity drives the training data requirements!

11. More Data Can Fix the Overfitting Problem
    - N = 10 Data Points
    - N = 15 Data Points
    - N = 100 Data Points
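
    The slides' point can be sketched numerically (my own illustration, not the lecture's code): fit the same 9th-order polynomial to N = 10, 15, and 100 noisy samples of a sine curve and measure error on a held-out test grid. The overly complex model chases noise at N = 10 but generalizes at N = 100.

    ```python
    # Sketch: a 9th-order polynomial fit to noisy samples of sin(2*pi*x).
    # More training data tames the over-complex model.
    import numpy as np

    rng = np.random.default_rng(0)
    x_test = np.linspace(0, 1, 200)
    y_true = np.sin(2 * np.pi * x_test)

    results = {}
    for n in (10, 15, 100):
        x = np.linspace(0, 1, n)
        y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)   # noisy training data
        coeffs = np.polyfit(x, y, deg=9)                     # 9th-order least-squares fit
        rmse = float(np.sqrt(np.mean((np.polyval(coeffs, x_test) - y_true) ** 2)))
        results[n] = rmse
        print(f"N={n:3d}  test RMSE={rmse:.3f}")
    ```

    The test RMSE drops sharply as N grows, even though the model class never changes.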

12. Curse of Dimensionality (Model Complexity)

13. - More complex problems require more complex models
    - More complex models require more complex feature spaces: need higher dimensionality to get good class separation
    - Wood classifier with a 1-D feature space? (axes: Grain Prominence, Wood Brightness)

14. Distance Metrics

15. The Distance Metric
    - How the similarity of two elements in a set is determined, e.g.:
      - Euclidean Distance
      - Inner Product (Vector Spaces)
      - Manhattan Distance
      - Maximum Norm
      - Mahalanobis Distance
      - Hamming Distance
      - Or any metric you define over the space …
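
    A minimal plain-Python sketch of four of the listed metrics (my own illustration, not the lecture's code):

    ```python
    # Four distance metrics over simple sequences/points.
    import math

    def euclidean(a, b):
        # straight-line distance
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def manhattan(a, b):
        # sum of absolute coordinate differences ("city block" distance)
        return sum(abs(x - y) for x, y in zip(a, b))

    def max_norm(a, b):
        # largest single coordinate difference (Chebyshev / maximum norm)
        return max(abs(x - y) for x, y in zip(a, b))

    def hamming(a, b):
        # number of positions at which two equal-length sequences differ
        return sum(x != y for x, y in zip(a, b))

    print(euclidean((0, 0), (3, 4)))      # 5.0
    print(manhattan((0, 0), (3, 4)))      # 7
    print(max_norm((0, 0), (3, 4)))       # 4
    print(hamming("karolin", "kathrin"))  # 3
    ```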

16. Manhattan Distance https://www.quora.com/What-is-the-difference-between-Manhattan-and-Euclidean-distance-measures

17. Far From Normal? (scatter plot; Center = Mean, Spread = Variance)

18. Mahalanobis Distance http://www.jennessent.com/arcview/mahalanobis_description.htm

19. Mahalanobis Distance http://stats.stackexchange.com/questions/62092/bottom-to-top-explanation-of-the-mahalanobis-distance
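
    The Mahalanobis distance d(x) = sqrt((x − μ)ᵀ S⁻¹ (x − μ)), with S the sample covariance, measures "far from normal" in units of the data's own spread. A small hypothetical sketch (the data points are made up for illustration):

    ```python
    # Mahalanobis distance of a point from a 2-D sample.
    import numpy as np

    data = np.array([[2.0, 2.1], [2.2, 1.9], [1.8, 2.0], [2.1, 2.2],
                     [1.9, 1.8], [2.0, 2.0], [2.3, 2.1], [1.7, 1.9]])
    mu = data.mean(axis=0)                 # sample mean
    cov_inv = np.linalg.inv(np.cov(data, rowvar=False))  # inverse covariance

    def mahalanobis(x):
        d = x - mu
        return float(np.sqrt(d @ cov_inv @ d))

    print(mahalanobis(np.array([2.0, 2.0])))  # near the mean: small distance
    print(mahalanobis(np.array([3.0, 1.0])))  # far from the cloud: large distance
    ```

    Unlike Euclidean distance, a point that is "far" along a low-variance direction scores as more anomalous than one equally far along a high-variance direction.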

20. Unsupervised Learning

21. Clustering
    - Partitional
    - Hierarchical

22. Anomaly Detection with Unlabelled Data (scatter plot; axes: Packet Size vs. Packet Data Size)

23. Recap of Wood Classification
    - 2 Optical Attributes or Features:
      - Brightness
      - Grain prominence
    - Yielded a 2-dimensional feature space
    - We had SUPERVISED learning:
      - We started with known pieces of wood
      - Gave each plotted training example its class LABEL
    - Because we chose our features well, we saw good clustering/separation of the different classes in the feature space.

24. Unlabelled Data (scatter plot; axes: Grain Prominence from 0 to 1 vs. Brightness from 0 to 10)

25. Partitional Clustering

26. Hierarchical Clustering: Corpus browsing (www.yahoo.com/Science directory tree (30): agriculture, biology, physics, CS, space, …, with subtopics such as dairy, crops, agronomy; botany, cell, evolution; magnetism, relativity; AI, HCI, courses, craft; missions; forestry)

27. Essentials of Clustering
    - Similarities
      - Natural Associations
      - Proximate*
    - Differences
      - Distant*
    - *Implies a distance metric

28. Essentials of Clustering
    - What is a “Good” Cluster?
      - Members are very “similar” to each other
    - Within-Cluster Divergence Metric σ_i
      - Variance also works
    - Relative Cluster Sizes versus Data Spread

29. Partitional Clustering Methods
    - K-Means Clustering
    - Gaussian Mixture Models
    - Canopy Clustering
    - Vector Quantization
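
    Of the methods listed, K-Means is the simplest to sketch. Below is a minimal Lloyd's-algorithm version on two synthetic blobs (my own illustration; the data and k = 2 are assumptions, not from the slides):

    ```python
    # Minimal k-means (Lloyd's algorithm) on two well-separated 2-D blobs.
    import numpy as np

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.5, (20, 2)),   # blob near (0, 0)
                   rng.normal(5, 0.5, (20, 2))])  # blob near (5, 5)

    def kmeans(X, k, iters=20):
        centers = X[rng.choice(len(X), k, replace=False)]  # init from data points
        for _ in range(iters):
            # assign each point to its nearest center (Euclidean distance)
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # recompute each center as the mean of its assigned points
            centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        return labels, centers

    labels, centers = kmeans(X, 2)
    print(np.round(centers, 1))   # one center near (0, 0), one near (5, 5)
    ```

    Note the two alternating steps, assignment and re-estimation, which is the defining pattern of partitional clustering.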

30. Unsupervised Learning/Clustering: Self-Organizing Maps (SOM)

31. SOMs: Topology Preserving Projections http://www.cita.utoronto.ca/~murray/GLG130/Exercises/F2.gif

32. Topology Preserving Projections

33. Topology Preserving Projections

34. Topology Preserving Projections
    - How will the distance metric handle polymorphous data?
      - Units of time (different units of time?)
        - Sprint performance data: years of age and seconds to finish
      - Units of space (meters, lightyears), surface area, volumetric
      - Units of mass (grams, kilograms, tonnes)
      - Units of $$$ (NOK, USD)
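
    One common answer to the polymorphous-data question is to standardize each feature before computing distances. A sketch using the slide's sprint example (the runner data below is made up for illustration):

    ```python
    # z-score standardization so features in different units (years vs. seconds)
    # contribute comparably to a Euclidean distance.
    import math
    import statistics as stats

    # hypothetical sprint data: (age in years, 100 m time in seconds)
    runners = [(18, 11.2), (24, 10.5), (31, 10.9), (40, 12.3), (22, 10.7)]

    def zscores(xs):
        mu, sd = stats.mean(xs), stats.stdev(xs)
        return [(x - mu) / sd for x in xs]

    ages = zscores([a for a, _ in runners])
    times = zscores([t for _, t in runners])
    scaled = list(zip(ages, times))

    def euclid(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    raw_d = euclid(runners[0], runners[3])    # dominated by the age difference
    scaled_d = euclid(scaled[0], scaled[3])   # both units weighted fairly
    print(raw_d, scaled_d)
    ```

    On the raw data the 22-year age gap swamps the 1.1-second time gap; after standardization both dimensions matter.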

35. Proximity by Colour and Location: Poverty Map of the World (1997) http://www.cis.hut.fi/research/som-research/worldmap.html

36. Map of Labels in Titles from the comp.ai.neural-nets-news newsgroup www.cs.hmc.edu/courses/2003/fall/cs152/slides/som.pdf

37. Learning As Search

38. - Exhaustive search
      - DFS
      - BFS
    - Gradient search
      - Can get stuck in a local optimum
    - Simulated annealing
      - Avoids local optima
    - Genetic algorithms
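
    The gradient-search vs. simulated-annealing contrast can be sketched on a 1-D function with one local and one global minimum (the function and all parameters below are my own assumptions, not from the slides):

    ```python
    # Greedy hill-climbing stalls in the nearest basin; simulated annealing
    # sometimes accepts worse moves and can escape local optima.
    import math
    import random

    def f(x):
        # global minimum near x ≈ -1.47, local minimum near x ≈ 1.35
        return x**4 - 4 * x**2 + x

    def hill_climb(x, step=0.01, iters=5000):
        # greedy: only move if a neighbor is strictly better
        for _ in range(iters):
            for cand in (x - step, x + step):
                if f(cand) < f(x):
                    x = cand
        return x

    def anneal(x, temp=5.0, cooling=0.999, iters=5000, seed=0):
        rng = random.Random(seed)
        best = x
        for _ in range(iters):
            cand = x + rng.gauss(0, 0.5)
            delta = f(cand) - f(x)
            # always accept improvements; accept worse moves with prob exp(-delta/T)
            if delta < 0 or rng.random() < math.exp(-delta / temp):
                x = cand
            if f(x) < f(best):
                best = x
            temp *= cooling
        return best

    hill_x = hill_climb(2.0)   # stays in the local basin near x ≈ 1.35
    sa_x = anneal(2.0)         # typically wanders out and finds a deeper minimum
    print(round(hill_x, 2), round(sa_x, 2))
    ```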

39. Exact vs Approximate Search
    - Exact:
      - Hashing techniques
      - String matching (“Murder”)
    - Approximate:
      - Approximate Hashing
      - Partial strings
      - Elastic Search
        - “murder”
        - “merder”
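
    Approximate string matching of the "murder"/"merder" kind is commonly done with an edit (Levenshtein) distance; a minimal sketch (my own illustration, not tied to any particular search engine):

    ```python
    # Levenshtein edit distance: minimum number of single-character
    # insertions, deletions, or substitutions to turn a into b.
    def levenshtein(a, b):
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                 # deletion
                               cur[j - 1] + 1,              # insertion
                               prev[j - 1] + (ca != cb)))   # substitution
            prev = cur
        return prev[-1]

    print(levenshtein("murder", "merder"))  # 1 (u -> e)
    print(levenshtein("murder", "murder"))  # 0 (exact match)
    ```

    A query then matches any indexed term within some edit-distance threshold, so a misspelled keyword still finds the evidence.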

40. Artificial Neural Networks (ANN)

41. Inspired by Natural Neural Nets

42. Perceptron (1950s)

43. Perceptron Can Learn Simple Boolean Logic (single boundary, linearly separable)

44. Perceptron Cannot Learn XOR
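
    Both slides can be demonstrated in a few lines (my own sketch): the same perceptron training rule reaches 100% accuracy on the linearly separable AND function but can never reach it on XOR, which needs two decision boundaries.

    ```python
    # A single perceptron with the classic error-driven update rule.
    def train_perceptron(samples, epochs=50, lr=0.1):
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            for (x1, x2), target in samples:
                out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
                err = target - out
                w[0] += lr * err * x1
                w[1] += lr * err * x2
                b += lr * err
        return w, b

    def accuracy(samples, w, b):
        hits = sum((1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == t
                   for (x1, x2), t in samples)
        return hits / len(samples)

    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    AND = [(p, int(p[0] and p[1])) for p in inputs]
    XOR = [(p, p[0] ^ p[1]) for p in inputs]

    accs = {}
    for name, data in (("AND", AND), ("XOR", XOR)):
        w, b = train_perceptron(data)
        accs[name] = accuracy(data, w, b)
        print(name, accs[name])
    ```

    No amount of extra training helps on XOR: a single linear boundary cannot separate {(0,1),(1,0)} from {(0,0),(1,1)}, which is exactly why multi-layer networks were needed.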

45. Multi-Layer Perceptron: Error Back-Propagation Network (MLP-BP)

46. MLP-BP Internal Model Building Block: 5 MLP-BP Neurons

47. MLP-BP “Universal Voxel”

48. NeuroFuzzy Methods

49. Neuro-Fuzzy Overview
    - Neuro-Fuzzy (NF) is a hybrid intelligence / soft computing approach (*soft?)
    - A combination of Artificial Neural Networks (ANN) and Fuzzy Logic (FL)
    - The opposite of fuzzy logic is crisp (sharp) logic
    - ANNs are black-box statistics, modelled to simulate the activity of biological neurons
    - FL extracts human-explainable linguistic fuzzy rules
    - Applications in Decision Support Systems and Expert Systems

50. Fuzzy Basics
    - FL uses linguistic variables that can contain several linguistic terms
    - Temperature (linguistic variable)
      - Hot, Warm, Cold (linguistic terms)
    - Consistency (linguistic variable)
      - Watery, Gooey, Soft, Firm, Hard, Crunchy, Crispy (linguistic terms)

51. Triangular Fuzzy Membership Functions http://sci2s.ugr.es/keel/links.php
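
    A triangular membership function μ(x; a, b, c) rises linearly from 0 at a to 1 at the peak b, then falls back to 0 at c. A sketch using the Temperature variable from the previous slide (the breakpoints for Cold/Warm/Hot are made-up values for illustration):

    ```python
    # Triangular fuzzy membership function and three temperature terms.
    def tri(x, a, b, c):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)   # rising edge
        return (c - x) / (c - b)       # falling edge

    # hypothetical breakpoints (degrees Celsius)
    cold = lambda t: tri(t, -10, 0, 15)
    warm = lambda t: tri(t, 5, 18, 28)
    hot = lambda t: tri(t, 22, 32, 45)

    t = 12
    print({"cold": round(cold(t), 2), "warm": round(warm(t), 2), "hot": round(hot(t), 2)})
    ```

    Note that 12 °C is simultaneously somewhat Cold and somewhat Warm: overlapping memberships are the point of fuzzy terms, unlike crisp thresholds.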

52. Fuzzy Inference
    - Sharp antecedent: “If the tomato is red, then it is sweet”
    - Fuzzy antecedent: “If the piece of wood is more or less dark (μ_dark = 0.7)”
    - Fuzzy consequents:
      - “The piece of wood is more or less pine (μ_pine = 0.64)”
      - “The piece of wood is more or less birch (μ_birch = 0.36)”
    http://ispac.diet.uniroma1.it/scarpiniti/files/NNs/Less9.pdf
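
    The general shape of such an inference can be sketched as: each rule fires with a strength given by the antecedent membership (min acting as fuzzy AND), and the consequent memberships are normalized across classes. The rule weights below are made up for illustration and do not reproduce the slide's exact 0.64/0.36 figures:

    ```python
    # Sketch of fuzzy rule firing for a wood-classification antecedent.
    mu_dark = 0.7  # antecedent membership from the slide

    # hypothetical rule weights linking "dark" to each wood class
    rules = {"pine": 0.9, "birch": 0.5}

    fired = {cls: min(mu_dark, w) for cls, w in rules.items()}  # min = fuzzy AND
    total = sum(fired.values())
    mu = {cls: round(v / total, 2) for cls, v in fired.items()}  # normalize
    print(mu)
    ```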

53. Combining ANN/FL
    - The ANN black-box approach requires sufficient data to find the structure (generalization learning)
      - NO PRIORS required
      - But linguistically meaningful rules cannot be extracted from a trained ANN
    - Fuzzy rules require prior knowledge
      - Based on linguistically meaningful rules
    http://www.scholarpedia.org/article/Fuzzy_neural_network

54. Combining ANN/FL
    - Combining the two gives us a higher level of system intelligence (?)
    - Can handle the usual ML tasks (regression, classification, etc.)
    http://www.scholarpedia.org/article/Fuzzy_neural_network