Carlos Ramos Carreño, Grupo de Aprendizaje Automático, Department of Computer Science, Universidad Autónoma de Madrid (UAM)

SLIDE 1

Carlos Ramos Carreño

Grupo de Aprendizaje Automático, Department of Computer Science, Universidad Autónoma de Madrid (UAM)

SLIDE 2

Who are we?

  • Carlos Ramos Carreño (carlos.ramos@uam.es)¹
  • José Luis Torrecilla Noguerales (joseluis.torrecilla@uam.es)²
  • Alberto Suárez (alberto.suarez@uam.es)¹
  • Miguel Carbajo Berrocal
  • Pablo Marcos Manchón
  • Amanda Hernando Bernabé
  • Pablo Pérez Manso

¹ Department of Computer Science, Universidad Autónoma de Madrid (UAM)
² Department of Mathematics, Universidad Autónoma de Madrid (UAM)

SLIDE 3

What is scikit-fda?

  • A software package for Functional Data Analysis (FDA)
  • Preprocessing, exploration and machine learning tools
  • Fully integrated into the scientific Python ecosystem
  • Efficient, flexible and easy to use

SLIDE 4

Which other tools for FDA are available?

Mainly R packages:

  • General purpose
      ○ fda
      ○ fda.usc
      ○ tidyfun
  • Representation
      ○ funData
  • Registration
      ○ fdasrvf
  • Robust analysis
      ○ roahd
  • FPCA
      ○ fdapace
      ○ MFPCA
  • Regression
      ○ refund
      ○ refund.wave
      ○ fdaPDE
      ○ sparseFLMM
      ○ FDboost

SLIDE 5

Which other tools for FDA are available?

Mainly R packages:

  • Visualization
      ○ rainbow
  • Variable selection
      ○ RFgroove
  • Time series
      ○ fds
      ○ ftsa
  • Clustering
      ○ Funclustering
      ○ funcy
      ○ funFEM
      ○ funHDDC

SLIDE 6

Why Python?

  • Powerful, easy-to-use, general-purpose programming language
  • The SciPy ecosystem:
      ○ NumPy: N-dimensional arrays and linear algebra
      ○ SciPy: utilities (statistics, integration, formats…)
      ○ Matplotlib: plotting
      ○ Jupyter: interactive notebooks
      ○ and much more...

SLIDE 7

Why scikit?

  • SciPy Toolkits (SciKits)
  • Specialized scientific packages

SLIDE 8

scikit-fda

  • Representation
  • Preprocessing
  • Exploratory analysis
  • Statistical inference
  • Machine learning

SLIDE 9

Representation

  • Discretized representation
      ○ Regularly sampled
      ○ Irregularly sampled
  • Basis representation

SLIDE 10

Discretized representation

Each curve is evaluated at the same points
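
As a minimal sketch in plain NumPy (not scikit-fda's API; in scikit-fda this role is played by the `FDataGrid` class), a discretized sample can be stored as one array with a row per curve and a column per grid point. The sample curves and the helper `evaluate` are hypothetical.

```python
import numpy as np

# Common grid shared by every curve.
grid = np.linspace(0.0, 1.0, 11)

# Hypothetical sample of 3 curves: row i holds curve i evaluated on the grid,
# so the whole sample is a single (n_samples, n_points) array.
data_matrix = np.vstack([
    np.sin(2 * np.pi * grid),
    np.cos(2 * np.pi * grid),
    grid ** 2,
])

# Since all curves share the grid, pointwise statistics are column-wise operations.
mean_curve = data_matrix.mean(axis=0)
std_curve = data_matrix.std(axis=0)


def evaluate(data_matrix, grid, t):
    """Evaluate every curve at a point t by linear interpolation between grid points."""
    return np.array([np.interp(t, grid, row) for row in data_matrix])


print(data_matrix.shape)  # (3, 11)
print(evaluate(data_matrix, grid, 0.25))
```

Sharing the grid is what makes this layout convenient: statistics such as the mean curve reduce to ordinary array operations along the sample axis.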

SLIDE 11

Basis representation

Expansion in a truncated basis of functions
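
A basis representation stores, for each curve x_i, the coefficients c_ik of x_i(t) ≈ Σ_k c_ik φ_k(t) for a finite set of basis functions φ_1, ..., φ_K. The sketch below assumes a Fourier basis and a least-squares fit on the grid. Both are illustrative choices (B-splines are another common basis), and the helper names are hypothetical rather than scikit-fda's `FDataBasis` API.

```python
import numpy as np


def fourier_basis(t, n_basis, period=1.0):
    """Columns 1, sin(wt), cos(wt), sin(2wt), cos(2wt), ... with w = 2*pi/period."""
    t = np.asarray(t, dtype=float)
    omega = 2 * np.pi / period
    columns = [np.ones_like(t)]
    k = 1
    while len(columns) < n_basis:
        columns.append(np.sin(k * omega * t))
        if len(columns) < n_basis:
            columns.append(np.cos(k * omega * t))
        k += 1
    return np.column_stack(columns)  # shape (len(t), n_basis)


def fit_coefficients(data_matrix, grid, n_basis):
    """Least-squares coefficients, one row per curve: x_i(t) ~ sum_k c_ik phi_k(t)."""
    phi = fourier_basis(grid, n_basis)
    coefs, *_ = np.linalg.lstsq(phi, np.atleast_2d(data_matrix).T, rcond=None)
    return coefs.T  # shape (n_samples, n_basis)


def evaluate(coefs, t):
    """Evaluate the truncated expansion at arbitrary points t."""
    return coefs @ fourier_basis(t, coefs.shape[1]).T


grid = np.linspace(0.0, 1.0, 50)
x = 2 + np.sin(2 * np.pi * grid) - 0.5 * np.cos(4 * np.pi * grid)
coefs = fit_coefficients(x, grid, n_basis=5)
print(coefs)  # approximately [[2, 1, 0, 0, -0.5]]
```

Because this x lies exactly in the span of the first five basis functions, the fit recovers its coefficients; for real data the truncation also acts as a form of smoothing.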

SLIDE 12

Preprocessing

  • Smoothing
  • Registration
  • Dimensionality reduction

SLIDE 13

Registration

  • Alignment of the curves, so that common features (peaks, valleys...) are at the same points
  • Typically, a warping function is used to transform the input
  • Several methods:
      ○ Shift registration
      ○ Landmark registration
      ○ Elastic registration
      ○ ...

SLIDE 14

Shift registration

  • Warpings are translations
  • The shifts are chosen to minimize a least-squares criterion
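
The least-squares criterion can be read as choosing, for each curve x_i, the shift δ_i that minimizes ∫ (x_i(t + δ_i) - μ(t))² dt, where μ is a template such as the mean of the registered curves. The sketch below is a simplified illustration that assumes periodic curves on a common uniform grid and a plain grid search over candidate shifts. The function name is hypothetical, and scikit-fda's implementation may use a different optimizer and boundary handling.

```python
import numpy as np


def shift_registration(data_matrix, grid, candidate_shifts, n_iter=5, period=1.0):
    """Grid-search sketch of least-squares shift registration for periodic curves.

    Curve i is registered as x_i(t + delta_i); delta_i minimizes the discretized
    criterion sum_t (x_i(t + delta) - template(t))**2 over candidate_shifts.
    """
    candidate_shifts = np.asarray(candidate_shifts, dtype=float)
    shifts = np.zeros(len(data_matrix))
    registered = np.array(data_matrix, dtype=float)
    for _ in range(n_iter):
        # Template: pointwise mean of the currently registered curves.
        template = registered.mean(axis=0)
        for i, row in enumerate(data_matrix):
            # On a uniform grid the sum of squares approximates the integral criterion.
            errors = [
                np.sum((np.interp(grid + d, grid, row, period=period) - template) ** 2)
                for d in candidate_shifts
            ]
            shifts[i] = candidate_shifts[int(np.argmin(errors))]
            # Periodic interpolation handles t + delta falling outside the grid.
            registered[i] = np.interp(grid + shifts[i], grid, row, period=period)
    return registered, shifts


grid = np.linspace(0.0, 1.0, 100, endpoint=False)
true_shifts = np.array([0.0, 0.05, -0.08, 0.12])
curves = np.sin(2 * np.pi * (grid[None, :] - true_shifts[:, None]))
registered, shifts = shift_registration(curves, grid, np.linspace(-0.25, 0.25, 101))
print(shifts - shifts[0])  # close to true_shifts - true_shifts[0]
```

Shifts are only identified up to a common translation (the template's own position is arbitrary), which is why the example compares differences between shifts.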

SLIDE 15

Landmark registration

  • Warping functions move predefined landmarks to fixed positions
  • The landmarks must be specified by the user
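
A minimal sketch of landmark registration, assuming piecewise-linear warping functions. Each warping h_i fixes the endpoints of the domain and maps the target positions to the landmarks of curve i, so the registered curve x_i(h_i(t)) has its landmarks at the target positions. Using the mean landmark positions as targets is an assumption of this sketch, the function name is hypothetical, and scikit-fda may construct smoother warpings.

```python
import numpy as np


def landmark_registration(data_matrix, grid, landmarks, target=None):
    """Piecewise-linear landmark registration sketch.

    landmarks[i] lists the (user-specified) landmark positions of curve i,
    in increasing order and strictly inside the domain.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    if target is None:
        target = landmarks.mean(axis=0)  # assumption: align to the mean positions
    a, b = grid[0], grid[-1]
    registered = np.empty_like(data_matrix, dtype=float)
    for i, row in enumerate(data_matrix):
        # Warping h_i: piecewise linear, fixes the endpoints, maps target -> landmarks[i].
        warp = np.interp(grid,
                         np.concatenate(([a], target, [b])),
                         np.concatenate(([a], landmarks[i], [b])))
        # Registered curve x_i(h_i(t)) has its landmarks at the target positions.
        registered[i] = np.interp(warp, grid, row)
    return registered


grid = np.linspace(0.0, 1.0, 201)
peaks = np.array([0.3, 0.5, 0.6])  # one landmark (the peak) per curve
curves = np.exp(-((grid[None, :] - peaks[:, None]) / 0.05) ** 2)
registered = landmark_registration(curves, grid, peaks[:, None])
print(grid[registered.argmax(axis=1)])  # every peak now close to the mean position
```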

SLIDE 16

Elastic registration

  • Uses the square root velocity framework (Srivastava et al., 2011 <arXiv:1103.3817>; Tucker et al., 2014 <doi:10.1016/j.csda.2012.12.001>)
  • Also available in the R package fdasrvf
  • Unsupervised method
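
The square root velocity function (SRVF) of f is q(t) = sign(f'(t)) sqrt(|f'(t)|). One property that makes it suitable for elastic registration is that warping f by γ turns q into (q∘γ)·sqrt(γ'), which has the same L2 norm, so ∫ q(t)² dt (which equals ∫ |f'(t)| dt) does not change under warping. The sketch below only checks this numerically with finite differences; it is not the full elastic registration algorithm, which additionally searches for optimal warpings. The helper names are hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule (written out to avoid depending on a specific NumPy version)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)


def srvf(f_values, grid):
    """Square root velocity function q = sign(f') * sqrt(|f'|), with f' by finite differences."""
    derivative = np.gradient(f_values, grid)
    return np.sign(derivative) * np.sqrt(np.abs(derivative))


grid = np.linspace(0.0, 1.0, 2001)
f = np.sin(2 * np.pi * grid)
gamma = grid ** 2                     # a warping: increasing, gamma(0) = 0, gamma(1) = 1
f_warped = np.sin(2 * np.pi * gamma)  # the composition f(gamma(t))

norm_sq = trapezoid(srvf(f, grid) ** 2, grid)
norm_sq_warped = trapezoid(srvf(f_warped, grid) ** 2, grid)
print(norm_sq, norm_sq_warped)  # both close to 4, the total variation of f on [0, 1]
```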

SLIDE 17

Exploratory analysis

  • Descriptive statistics
  • Depth
  • Outliers
  • Visualization

SLIDE 18

Functional data boxplot

  • Similar to the boxplot of univariate data
  • A depth function must be chosen
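
A minimal sketch of a functional boxplot, assuming the modified band depth (bands formed by pairs of curves) as the chosen depth function. The deepest curve plays the role of the median, the envelope of the deepest half of the sample is the central region, and curves that leave the central region inflated by 1.5 times its width are flagged as outliers, mirroring the univariate 1.5 IQR rule. The function names are hypothetical, not scikit-fda's API.

```python
import itertools
import math

import numpy as np


def modified_band_depth(data_matrix):
    """Modified band depth of each curve (row), using bands formed by pairs of curves."""
    n_curves = data_matrix.shape[0]
    depth = np.zeros(n_curves)
    pairs = list(itertools.combinations(range(n_curves), 2))
    for j, k in pairs:
        lower = np.minimum(data_matrix[j], data_matrix[k])
        upper = np.maximum(data_matrix[j], data_matrix[k])
        # Fraction of grid points at which each curve lies inside this band.
        inside = (data_matrix >= lower) & (data_matrix <= upper)
        depth += inside.mean(axis=1)
    return depth / len(pairs)


def functional_boxplot(data_matrix, factor=1.5):
    """Return the median curve, the central 50% envelope and the indices of outliers."""
    depth = modified_band_depth(data_matrix)
    order = np.argsort(-depth)  # deepest curves first
    median = data_matrix[order[0]]
    deepest_half = data_matrix[order[: math.ceil(len(data_matrix) / 2)]]
    lower, upper = deepest_half.min(axis=0), deepest_half.max(axis=0)
    # Inflate the central region by `factor` times its width (analogue of the 1.5 IQR rule).
    width = upper - lower
    outside = (data_matrix < lower - factor * width) | (data_matrix > upper + factor * width)
    outliers = np.flatnonzero(outside.any(axis=1))
    return median, lower, upper, outliers


grid = np.linspace(0.0, 2 * np.pi, 50)
offsets = np.array([0.0, 1.0, 2.0, 3.0, 10.0])  # the last curve is an obvious outlier
curves = offsets[:, None] + np.sin(grid)[None, :]
median, lower, upper, outliers = functional_boxplot(curves)
print(outliers)  # [4]
```

Swapping in a different depth function only requires replacing `modified_band_depth`; the rest of the construction stays the same.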

SLIDE 19

Statistical inference

  • Estimation
  • Confidence intervals
  • Statistical hypothesis testing

SLIDE 20

Machine learning

  • Clustering
  • Regression
  • Classification

SLIDE 21

K-means clustering

  • The number of clusters is fixed in advance
  • Iteratively places the cluster centroids so as to minimize the within-cluster sum of squared distances
  • A functional metric must be chosen
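
A minimal K-means sketch, assuming curves discretized on a common grid and the L2 metric approximated by the trapezoidal rule. With this metric the centroid update is the pointwise mean of the curves in each cluster. The helper names are hypothetical; scikit-fda provides its own clustering estimators.

```python
import numpy as np


def l2_distance(curves, centroid, grid):
    """L2 distance between each curve (row) and a centroid, via the trapezoidal rule."""
    sq = (curves - centroid) ** 2
    integral = np.sum((sq[..., 1:] + sq[..., :-1]) * np.diff(grid), axis=-1) / 2
    return np.sqrt(integral)


def functional_kmeans(data_matrix, grid, n_clusters, n_iter=100, seed=0):
    """K-means on discretized curves with the L2 metric."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with randomly chosen observed curves.
    centroids = data_matrix[rng.choice(len(data_matrix), n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each curve goes to its nearest centroid in L2.
        dist = np.stack([l2_distance(data_matrix, c, grid) for c in centroids], axis=1)
        labels = dist.argmin(axis=1)
        # Update step: each centroid becomes the pointwise mean of its curves
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.stack([
            data_matrix[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


grid = np.linspace(0.0, 1.0, 50)
offsets = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
curves = np.array([np.sin(2 * np.pi * grid) + o for o in offsets])
labels, centroids = functional_kmeans(curves, grid, n_clusters=2)
print(labels)
```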

SLIDE 22

Fuzzy K-means

  • Fuzzy version of K-means
  • Each observation does not necessarily belong to a single cluster: it has a degree of membership to each of them
  • The degrees of membership of each observation add up to one
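
In the usual fuzzy K-means (fuzzy c-means) formulation with fuzzifier m > 1, the membership of curve i in cluster k is u_ik = 1 / Σ_j (d_ik / d_ij)^(2/(m-1)), where d_ik is the distance from curve i to centroid k. This makes the memberships of each curve add up to one, and the centroids are means weighted by u_ik^m. The sketch below assumes this formulation, the L2 metric and m = 2; the helper names are hypothetical, not scikit-fda's API.

```python
import numpy as np


def l2_distance(curves, centroid, grid):
    """L2 distance between each curve (row) and a centroid, via the trapezoidal rule."""
    sq = (curves - centroid) ** 2
    integral = np.sum((sq[..., 1:] + sq[..., :-1]) * np.diff(grid), axis=-1) / 2
    return np.sqrt(integral)


def fuzzy_kmeans(data_matrix, grid, n_clusters, m=2.0, n_iter=100, seed=0):
    """Fuzzy K-means (fuzzy c-means) sketch with the L2 metric and fuzzifier m > 1."""
    rng = np.random.default_rng(seed)
    centroids = data_matrix[rng.choice(len(data_matrix), n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = np.stack([l2_distance(data_matrix, c, grid) for c in centroids], axis=1)
        dist = np.fmax(dist, 1e-12)  # avoid dividing by zero when a curve hits a centroid
        # u[i, k] = 1 / sum_j (d[i, k] / d[i, j]) ** (2 / (m - 1)); each row sums to one.
        ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
        memberships = 1.0 / ratio.sum(axis=2)
        # Centroids: pointwise means of all curves weighted by u ** m.
        weights = memberships ** m
        new_centroids = (weights.T @ data_matrix) / weights.sum(axis=0)[:, None]
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return memberships, centroids


grid = np.linspace(0.0, 1.0, 50)
offsets = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
curves = np.array([np.sin(2 * np.pi * grid) + o for o in offsets])
memberships, centroids = fuzzy_kmeans(curves, grid, n_clusters=2)
print(memberships.sum(axis=1))  # each entry is 1 up to rounding
```

Taking the argmax of each row recovers a hard K-means-style assignment, while the membership values themselves quantify how ambiguous each curve is.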

SLIDE 23

Documentation

  • Up to date and available online
  • Easily searchable
  • Cross-referenced
  • Detailed examples and interactive notebooks
  • Examples downloadable as Python source files or Jupyter notebooks

SLIDE 24

Where can I find more?

  • PyPI: https://pypi.org/project/scikit-fda/
  • GitHub: https://github.com/GAA-UAM/scikit-fda/
  • Documentation: https://fda.readthedocs.io

SLIDE 25

Thanks for your attention!!
