Data Mining meets Football (soccer), Ulf Brefeld


SLIDE 1

Data Mining meets Football (soccer)

Ulf Brefeld

Knowledge Mining & Assessment, TU Darmstadt / DIPF

brefeld@cs.tu-darmstadt.de

SLIDE 2

Data Mining meets Football (soccer)

Ulf Brefeld

Machine Learning Group, Leuphana University of Lüneburg

SLIDE 3

Ulf Brefeld Knowledge Mining & Assessment Group 3

Personalisation, Recommendations, Information Extraction & Aggregation

Machine Learning / Data Mining

SLIDE 4

Personalisation, Recommendations, Information Extraction & Aggregation

Machine Learning / Data Mining

Sports Analytics

SLIDE 5

German Bundesliga

On average 43,502 attendees per game; 13.31m attendees per season

Image source: http://www.ruhrnachrichten.de/storage/pic/mdhl/artikelbilder/sport/4081417_1_Bayern1.jpg?version=1387208424

SLIDE 6

Monetary Aspects

Revenue of European soccer market: €19.90bn
Revenue of German Bundesliga: €2,172.59m
German Bundesliga, total value of player assets: €413.77m
FC Bayern Munich brand value: €794.60m
FC Bayern Munich profit after tax: €14.00m

Source: http://www.statista.com/topics/1774/bundesliga/

SLIDE 7

Traditional Sports Analytics

๏ Monetary aspects
๏ Statistics to serve information needs…

SLIDE 8

Descriptive Statistics

[Figures: number of players per season; number of goals per season]

SLIDE 9

Distribution of Goals

[Figure: goals of home team vs. away team]

SLIDE 10

Yellow Cards

SLIDE 11

Average Player Value

[Figure: average player value in € per season; incomplete data]

SLIDE 12

๏ Yeah, interesting… but what does it tell us?

SLIDE 13

SLIDE 14

“B. Charlton v F. Beckenbauer”, David Marsh

1966 World Cup Final, England vs. W. Germany

SLIDE 15

Trajectories and Tactics

๏ Understanding player movements is a precondition for analysing game strategy (i.e., tactics)

SLIDE 16

Player Trajectory Data

๏ Cameras capture positions of players and ball*
  ๏ x, y, (z) coordinates
  ๏ ≥24 frames per second
๏ Manually denoised (corners, mass confrontations, …)
๏ Players annotated
๏ Perfect data for analysing movements, coordination, tactics, etc.

* The referee is also tracked and recorded, but this data is usually kept private

SLIDE 17

Ball touches of Franck Ribery (FCB vs BMG, season 2013/14)

SLIDE 18

Shots leading to Goals (seasons 2009/10 - 2013/14)

SLIDE 19

Goalmouth Coordinates (penalties)

SLIDE 20

๏ Hm… still, what does it tell us?

SLIDE 21

Use Cases

๏ Analyse opponent tactics
๏ Detect strengths/weaknesses in strategy
๏ Automatic game plans
๏ Serious games / training
๏ Player scouting
๏ Improved media coverage
๏ …

SLIDE 22

Identifying Patterns

๏ Pattern = “interesting” event
๏ E.g., A plays a 1-2 with B and crosses to C

[Figure: players A, B and C]

SLIDE 23

Why is it difficult?

๏ >3 million positions per game
๏ Every player generates ≈135,000 positions per game
๏ There are ≈135,000^23 different candidate patterns*
๏ This is considerably larger than the number of atoms in our galaxy**
๏ Explicit enumeration is infeasible
๏ What similarity measure should be used?

* Ignoring the fact that patterns are of different lengths
** Dark and exotic matter already included
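
The orders of magnitude on this slide can be checked with a quick calculation. It assumes 25 frames per second (the slides only state ≥24), a 90-minute game and 23 tracked objects (22 players plus the ball); these values are assumptions, chosen because they reproduce the slide's ≈135,000 positions per player.

```latex
\begin{aligned}
90 \cdot 60\,\mathrm{s} \cdot 25\,\mathrm{s}^{-1} &= 135{,}000 \quad \text{positions per player and game}\\
23 \cdot 135{,}000 &= 3{,}105{,}000 > 3 \text{ million positions per game}\\
135{,}000^{23} = 10^{23 \log_{10} 135{,}000} &\approx 10^{23 \cdot 5.13} \approx 10^{118}
\end{aligned}
```

For comparison, the number of atoms in the entire observable universe is commonly estimated at around 10^80.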

SLIDE 24

Identifying Patterns

SLIDE 25

Identifying Patterns

๏ Pattern = “interesting” event
๏ E.g., A plays a 1-2 with B and crosses to C

[Figure: players A, B and C]

๏ frequent
๏ rare (anomalies/outliers)
๏ predefined (e.g., match plan, training)
๏ …

SLIDE 26

Identifying Patterns

SLIDE 27

Representation

๏ Position = player coordinates on the pitch
๏ A game of soccer = positional data stream
๏ Player trajectory = sequence of consecutive positions
๏ Positions represented by angles with respect to a reference vector v_ref (translation, rotation and scale invariant)

$$\alpha_i = \operatorname{sign}(v_i, v_{\mathrm{ref}})\,\cos^{-1}\!\left(\frac{v_i^{\top} v_{\mathrm{ref}}}{\lVert v_i\rVert\,\lVert v_{\mathrm{ref}}\rVert}\right)$$

Vlachos et al. (KDD, 2004)
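
A sketch of the angle representation in Python. The slide does not spell out what v_i is or how the sign is chosen, so this assumes 2-D positions, movement vectors v_i = p_{i+1} − p_i, the first movement vector as default v_ref, and sign(v_i, v_ref) taken as the sign of the 2-D cross product. The function name `to_angles` is hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float]


def to_angles(traj: Sequence[Vec], v_ref: Optional[Vec] = None) -> List[float]:
    """Map a 2-D trajectory to signed angles alpha_i between its movement
    vectors v_i and a reference vector v_ref.

    Assumptions (not from the slide): v_i = p_{i+1} - p_i, v_ref defaults to
    the first movement vector, and sign(v_i, v_ref) is the sign of the 2-D
    cross product v_ref x v_i (counter-clockwise = positive)."""
    vs = [(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:])]
    if not vs:
        return []
    rx, ry = v_ref if v_ref is not None else vs[0]
    r_norm = math.hypot(rx, ry)
    angles = []
    for vx, vy in vs:
        v_norm = math.hypot(vx, vy)
        if v_norm == 0.0 or r_norm == 0.0:
            angles.append(0.0)  # no movement: angle undefined, use 0 (assumption)
            continue
        cos_a = (vx * rx + vy * ry) / (v_norm * r_norm)
        cos_a = max(-1.0, min(1.0, cos_a))  # guard acos against rounding errors
        sign = -1.0 if rx * vy - ry * vx < 0 else 1.0
        angles.append(sign * math.acos(cos_a))
    return angles


# Usage: a rotated, scaled and translated copy yields the same angles.
traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, 3.0)]
moved = [(5 - 2 * y, 5 + 2 * x) for x, y in traj]  # rotate 90°, scale by 2, shift
```

Rotation invariance relies on deriving v_ref from the trajectory itself; with a fixed global v_ref only translation and scale invariance remain.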

SLIDE 28

Dynamic Time Warping

Rabiner & Juang (1993)

๏ Movements should be independent of player speed
๏ Dynamic time warping compensates for phase shifts
๏ Distance measure
๏ DTW for sequences s = ⟨s_1, …, s_n⟩ and q = ⟨q_1, …, q_m⟩ defined recursively:

$$
\begin{aligned}
g(\emptyset, \emptyset) &= 0, \qquad g(s, \emptyset) = g(\emptyset, q) = \infty,\\
g(s, q) &= \mathrm{dist}(s_1, q_1) + \min\left\{\begin{array}{l}
g(s, \langle q_2, \ldots, q_m\rangle)\\
g(\langle s_2, \ldots, s_n\rangle, q)\\
g(\langle s_2, \ldots, s_n\rangle, \langle q_2, \ldots, q_m\rangle)
\end{array}\right\}
\end{aligned}
$$

with a function $\mathrm{dist}: \mathbb{R}\times\mathbb{R} \to \mathbb{R}$ (e.g., …)

SLIDE 29

Dynamic Time Warping

Rabiner & Juang (1993)

๏ Computing g(s, q) takes O(|s||q|) time
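
A minimal Python sketch of the DTW recursion, evaluated bottom-up over suffixes so that each subproblem g(⟨s_i, …⟩, ⟨q_j, …⟩) is computed once; this is what gives the O(|s||q|) run time. The default `dist(a, b) = |a - b|` and the optional `window` parameter (a Sakoe-Chiba band, not mentioned on the slides) are assumptions.

```python
import math
from typing import Callable, Optional, Sequence


def dtw(s: Sequence[float], q: Sequence[float],
        dist: Callable[[float, float], float] = lambda a, b: abs(a - b),
        window: Optional[int] = None) -> float:
    """DTW distance g(s, q) from the recursive definition, computed bottom-up.

    G[i][j] holds g(<s_{i+1}, ..., s_n>, <q_{j+1}, ..., q_m>) (0-based suffixes).
    G[n][m] = g(empty, empty) = 0; every other cell with one empty suffix is
    infinite. `window` optionally restricts the path to |i - j| <= window."""
    n, m = len(s), len(q)
    G = [[math.inf] * (m + 1) for _ in range(n + 1)]
    G[n][m] = 0.0
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if window is not None and abs(i - j) > window:
                continue  # outside the band: cell stays infinite
            G[i][j] = dist(s[i], q[j]) + min(
                G[i][j + 1],      # g(s, <q_2, ..., q_m>)
                G[i + 1][j],      # g(<s_2, ..., s_n>, q)
                G[i + 1][j + 1],  # g(<s_2, ..., s_n>, <q_2, ..., q_m>)
            )
    return G[0][0]
```

For example, `dtw([1, 2, 3], [1, 1, 2, 3])` is 0, because the repeated value is absorbed by warping rather than counted as a mismatch.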

SLIDE 30

Approximate DTW

๏ Approximate DTW by lower bounds, i.e., f(s, q) ≤ g(s, q)
๏ Focus on characteristic values
  ๏ Kim et al. (ICDE, 2001): first, last, greatest and smallest value
  ๏ Keogh (VLDB, 2002): minimum/maximum values of subsequences
๏ Complexity in O(|s|)
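
The two lower bounds can be sketched as follows, assuming one-dimensional sequences (such as the angle representation) and `dist(a, b) = |a - b|`. `lb_kim` follows the four-feature idea of Kim et al.; `lb_keogh` follows Keogh's envelope idea, which bounds DTW restricted to a warping band of width `r` and needs equal-length sequences. Function names and the band parameter are illustrative assumptions, not the authors' code.

```python
from typing import List, Sequence, Tuple


def lb_kim(s: Sequence[float], q: Sequence[float]) -> float:
    """Compare first, last, greatest and smallest values. Each difference
    bounds the cost of at least one cell on every warping path, so their
    maximum is a lower bound on DTW with dist(a, b) = |a - b|."""
    fs = (s[0], s[-1], max(s), min(s))
    fq = (q[0], q[-1], max(q), min(q))
    return max(abs(a - b) for a, b in zip(fs, fq))


def envelope(q: Sequence[float], r: int) -> Tuple[List[float], List[float]]:
    """Upper/lower envelope: max/min of q within a window of +-r positions."""
    n = len(q)
    upper = [max(q[max(0, i - r): i + r + 1]) for i in range(n)]
    lower = [min(q[max(0, i - r): i + r + 1]) for i in range(n)]
    return upper, lower


def lb_keogh(s: Sequence[float], q: Sequence[float], r: int) -> float:
    """Sum of distances of s_i to q's envelope; lower-bounds DTW restricted to
    |i - j| <= r for equal-length sequences and dist(a, b) = |a - b|."""
    if len(s) != len(q):
        raise ValueError("LB_Keogh assumes equal-length sequences")
    upper, lower = envelope(q, r)
    total = 0.0
    for x, u, l in zip(s, upper, lower):
        if x > u:
            total += x - u
        elif x < l:
            total += l - x
    return total
```

The naive envelope above costs O(|s|·r); a sliding-window minimum/maximum (e.g., with a monotonic deque) brings it down to O(|s|), after which the bound itself is a single linear pass.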

SLIDE 31

Locality Sensitive Hashing

Athitsos et al. (2008), Gionis et al. (1999)

๏ Distance-based hash function h: D → ℝ

$$h_{s_1,s_2}(s) = \frac{\mathrm{dist}(s, s_1)^2 + \mathrm{dist}(s_1, s_2)^2 - \mathrm{dist}(s, s_2)^2}{2\,\mathrm{dist}(s_1, s_2)}, \qquad s \in D$$

  ๏ use Kim et al. (ICDE, 2001) as distance function
  ๏ s_1 and s_2 randomly drawn from the database
๏ Bucket determined by

$$h^{[t_1,t_2]}_{s_1,s_2}(s) = \begin{cases} 1 & \text{if } h_{s_1,s_2}(s) \in [t_1, t_2]\\ 0 & \text{otherwise}\end{cases}$$

๏ Set of admissible intervals

$$T(s_1, s_2) = \left\{[t_1, t_2] : \Pr_D\!\left(h^{[t_1,t_2]}_{s_1,s_2}(s) = 0\right) = \Pr_D\!\left(h^{[t_1,t_2]}_{s_1,s_2}(s) = 1\right)\right\}$$
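
A sketch of distance-based hashing as defined on this slide: two random pivots define the projection h, and an interval [t1, t2] between the 25th and 75th percentile of the projected database makes each bit roughly balanced, i.e., approximately admissible. Concatenating several bits gives a bucket key. The percentile choice, the function names and the toy distance are assumptions; the slide uses the Kim et al. bound as `dist`, for which any callable can stand in. Practical systems build several hash tables; only one is built here.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Item = Sequence[float]
Dist = Callable[[Item, Item], float]


def make_dbh_bit(database: List[Item], dist: Dist,
                 rng: random.Random) -> Callable[[Item], int]:
    """One binary hash h^{[t1,t2]}_{s1,s2} built from random pivots s1, s2."""
    for _ in range(100):  # draw pivots until they are at positive distance
        s1, s2 = rng.sample(database, 2)
        d12 = dist(s1, s2)
        if d12 > 0:
            break
    else:
        raise ValueError("could not find two distinct pivot objects")

    def project(s: Item) -> float:
        # h_{s1,s2}(s): projection of s onto the "line" through s1 and s2
        return (dist(s, s1) ** 2 + d12 ** 2 - dist(s, s2) ** 2) / (2 * d12)

    # [t1, t2] spans the middle half of the projections, so roughly half of
    # the database hashes to 1 (an approximately admissible interval).
    values = sorted(project(s) for s in database)
    t1, t2 = values[len(values) // 4], values[(3 * len(values)) // 4]
    return lambda s: 1 if t1 <= project(s) <= t2 else 0


def build_index(database: List[Item], dist: Dist, n_bits: int, seed: int = 0):
    """Concatenate n_bits binary hashes into a bucket key; fill one hash table."""
    rng = random.Random(seed)
    bits = [make_dbh_bit(database, dist, rng) for _ in range(n_bits)]

    def key(s: Item) -> Tuple[int, ...]:
        return tuple(bit(s) for bit in bits)

    buckets: Dict[Tuple[int, ...], List[int]] = {}
    for idx, s in enumerate(database):
        buckets.setdefault(key(s), []).append(idx)
    return key, buckets
```

A query is then compared only against `buckets.get(key(query), [])`.
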

SLIDE 32

Computing Similarities

๏ The remaining candidates need a test for identity
๏ Use outcomes of
  ๏ Dynamic time warping
  ๏ Approximate DTW
  ๏ Locality sensitive hashing (buckets)
  ๏ … together with a similarity threshold
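
The combination of these outcomes can be sketched as a filter cascade: cheap lower bounds are tried first and the exact measure only runs on the survivors. Because every lower bound satisfies f(s, q) ≤ g(s, q), a candidate whose bound already exceeds the threshold cannot be similar, so no true match is lost. The function name and the per-stage pruning counters are hypothetical; the candidates would typically come from an LSH bucket.

```python
from typing import Callable, List, Sequence, Tuple

Seq = Sequence[float]
Measure = Callable[[Seq, Seq], float]


def similar_candidates(query: Seq, candidates: Sequence[Tuple[int, Seq]],
                       lower_bounds: List[Measure], exact: Measure,
                       threshold: float) -> Tuple[List[Tuple[int, float]], List[int]]:
    """Return (index, distance) of candidates with exact distance <= threshold,
    plus how many candidates each lower bound pruned.

    lower_bounds must each satisfy lb(q, c) <= exact(q, c) and should be
    ordered from cheapest to most expensive."""
    kept: List[Tuple[int, float]] = []
    pruned = [0] * len(lower_bounds)
    for idx, cand in candidates:
        for k, lb in enumerate(lower_bounds):
            if lb(query, cand) > threshold:
                pruned[k] += 1  # safe: exact >= lb > threshold
                break
        else:  # no bound could rule the candidate out: compute exact distance
            d = exact(query, cand)
            if d <= threshold:
                kept.append((idx, d))
    return kept, pruned
```

With `lower_bounds=[lb_kim, lb_keogh]` and `exact=dtw`, the `pruned` counters correspond to the per-method pruning rates reported later in the talk.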

SLIDE 33

Episode Discovery

๏ Apriori-based algorithms
๏ Approach based on Achar et al. (2012)
๏ Distributed implementation scheme (Hadoop)
๏ Two phases
  ๏ Candidate generation (Mapper)
  ๏ Counting (Reducer)

SLIDE 34

Empirical Evaluation

๏ DEBS Grand Challenge
  ๏ 8 vs. 8 soccer game recorded by Fraunhofer IIS
  ๏ In total 33 sensors
  ๏ 1 sensor per shoe (200 Hz)
  ๏ 1 sensor in the ball (2000 Hz)
  ๏ 15,000 positions per second (3-dimensional)

http://www.orgs.ttu.edu/debs2013/index.php?goto=cfchallengedetails

SLIDE 35

Representation

๏ Further preprocessing:
  ๏ Discarding positions outside of the pitch
  ๏ Removing the half-time effect of changing sides
  ๏ Averaging player positions over 100 ms
  ๏ Trajectory windows of size 10

[Figure: trajectory windows at t+1, t+2]
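
The four preprocessing steps can be sketched as below for the samples of a single player. Assumptions not stated on the slide: coordinates in metres with the centre spot at (0, 0), so the change of sides is undone by reflecting second-half positions through the centre; a single hypothetical `halftime_ms` cut-off; and sliding windows with stride 1. Bins without samples are simply skipped.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[int, float, float]  # (time in ms, x in m, y in m); centre spot = (0, 0)
Pos = Tuple[float, float]


def preprocess(samples: Sequence[Sample], half_length: float, half_width: float,
               halftime_ms: int, bin_ms: int = 100, window: int = 10) -> List[List[Pos]]:
    # 1) discard positions outside of the pitch
    inside = [(t, x, y) for t, x, y in samples
              if abs(x) <= half_length and abs(y) <= half_width]
    # 2) undo the change of sides: reflect second-half positions through the centre
    unified = [(t, x, y) if t < halftime_ms else (t, -x, -y) for t, x, y in inside]
    # 3) average positions within each bin of bin_ms milliseconds
    bins: Dict[int, List[Pos]] = defaultdict(list)
    for t, x, y in unified:
        bins[t // bin_ms].append((x, y))
    averaged = [(sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
                for _, pts in sorted(bins.items())]
    # 4) overlapping trajectory windows of `window` consecutive positions (stride 1)
    return [averaged[i:i + window] for i in range(len(averaged) - window + 1)]
```

At 200 Hz, each 100 ms bin averages 20 raw samples, so a window of 10 covers one second of movement.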

SLIDE 36

Evaluation

๏ Given: a query trajectory
๏ Task: find near-duplicates (i.e., the N = 1000 most similar trajectories)
๏ Focus on 15k consecutive positions of one player (for baseline comparisons)
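
For this task (the N most similar trajectories rather than all trajectories under a fixed threshold), the threshold can be replaced by the distance of the current N-th best match. A sketch under that assumption, using Python's `heapq` as a bounded max-heap; `lower_bound` and `exact` are placeholders for, e.g., LB_Keogh and DTW, and `top_n` is a hypothetical name.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Seq = Sequence[float]


def top_n(query: Seq, database: Sequence[Seq], n: int,
          lower_bound: Callable[[Seq, Seq], float],
          exact: Callable[[Seq, Seq], float]) -> List[Tuple[float, int]]:
    """Return the n entries closest to query as sorted (distance, index) pairs.
    lower_bound must never exceed exact; it skips entries that cannot beat
    the current n-th best distance."""
    if n <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # max-heap via negated distances
    for idx, item in enumerate(database):
        if len(heap) == n and lower_bound(query, item) >= -heap[0][0]:
            continue  # exact >= lower bound >= current worst: cannot improve
        d = exact(query, item)
        if len(heap) < n:
            heapq.heappush(heap, (-d, idx))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, idx))  # evict the current worst
    return sorted((-neg_d, idx) for neg_d, idx in heap)
```

The pruning becomes more effective as the heap fills with close matches, since the bar that a candidate's lower bound has to clear keeps dropping.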

SLIDE 37

Run-time

๏ Exact computation infeasible
๏ Dynamic time warping very effective
๏ LSH adds only a little

SLIDE 38

LSH Accuracy

๏ On average, LSH is very accurate
๏ However, its worst cases are clearly inappropriate

SLIDE 39

Exemplary Retrieval

SLIDE 40

Exemplary Pattern

๏ Ball is played towards the opponent goal (black)
๏ Trajectories in the pattern visualised by thick lines (dot indicates beginning)
๏ Players 1, 2, 3, 6 and 7 move in the direction of the ball

[Figure: arrow indicating the direction of play]

SLIDE 41

Identifying Patterns

SLIDE 42

Identifying Patterns

SLIDE 43

Identifying Patterns

SLIDE 44

Patterns / Events

๏ Individual level
๏ Group level
๏ Team level

SLIDE 45

Patterns / Events

๏ Individual level
๏ Group level
๏ Team level

๏ 4 defence players ⇾ game initiations
๏ 4 offence players ⇾ scoring opportunities

SLIDE 46

Spatio-temporal Convolution Kernels

๏ Tailored similarity measure for multi-trajectory scenarios
๏ Separates data from algorithm, e.g., works with every kernel machine (SVMs, kPCA, kernel k-means, etc.)
๏ But: complexity O(N²L²), with N the number of trajectories and L the length of trajectories

[Figure: kernel composed of a cheap temporal kernel and an expensive spatial kernel]

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015

SLIDE 47

Approximate STCKs

๏ Efficient approximation of the exact kernel
๏ Idea: use the cheap temporal kernel as a filter
๏ Evaluate the spatial kernel by a percentage-based approximation

[Figure: time in seconds vs. length of trajectories for the approximate kernel, the exact kernel and baselines]

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015
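
A minimal sketch of the general structure, assuming a convolution kernel that sums, over all pairs of time steps, the product of a Gaussian temporal kernel and a Gaussian spatial kernel on single (x, y) positions. This is a simplification and not the exact kernel of Knauf, Memmert & Brefeld: there the spatial kernel is the expensive part, whereas here both are cheap Gaussians, and the threshold `min_temporal` is a hypothetical stand-in for their approximation. The filtered variant is not guaranteed to be positive semi-definite.

```python
import math
from typing import Sequence, Tuple

Obs = Tuple[float, float, float]  # (time relative to trajectory start, x, y)


def rbf(sq_dist: float, gamma: float) -> float:
    """Gaussian (RBF) kernel on a precomputed squared distance."""
    return math.exp(-gamma * sq_dist)


def conv_kernel(a: Sequence[Obs], b: Sequence[Obs], gamma_t: float = 1.0,
                gamma_s: float = 1.0, min_temporal: float = 0.0) -> float:
    """Convolution kernel sum_{i,j} k_t(t_i, t'_j) * k_s(x_i, x'_j).

    With min_temporal = 0 every pair is evaluated (|a| * |b| kernel products).
    With min_temporal > 0 the cheap temporal kernel acts as a filter and the
    spatial kernel is skipped for pairs whose temporal similarity is too low."""
    total = 0.0
    for ta, xa, ya in a:
        for tb, xb, yb in b:
            kt = rbf((ta - tb) ** 2, gamma_t)  # cheap temporal kernel
            if kt < min_temporal:
                continue  # filtered out: spatial kernel not evaluated
            ks = rbf((xa - xb) ** 2 + (ya - yb) ** 2, gamma_s)  # spatial kernel
            total += kt * ks
    return total
```

Each call costs O(L²) for trajectories of length L, so a full Gram matrix over N trajectories needs N² calls, which is where O(N²L²) comes from.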

SLIDE 48

Empirical Results

๏ VIS.TRACK data, Bundesliga season 2011/12
๏ Two teams (5 games each)
๏ Cluster analysis with k-medoids
  ๏ Game initiations (start: goalkeeper has the ball)
  ๏ Scoring opportunities (end: ball in dangerous zone)

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015
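
A sketch of k-medoids on a precomputed distance matrix. It uses the simple alternating (Voronoi-iteration) variant rather than PAM, and the kernel-induced distance d(i, j) = sqrt(K_ii + K_jj − 2K_ij) is one assumed way to turn an STCK Gram matrix K into distances; the slides do not say which variant or distance was used.

```python
import math
from typing import List, Optional, Sequence, Tuple


def kernel_distance(K: Sequence[Sequence[float]], i: int, j: int) -> float:
    """Distance in the kernel's feature space, derived from a Gram matrix K."""
    return math.sqrt(max(0.0, K[i][i] + K[j][j] - 2.0 * K[i][j]))


def k_medoids(D: Sequence[Sequence[float]], k: int,
              init: Optional[Sequence[int]] = None,
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate between assigning points to the nearest medoid and choosing,
    per cluster, the member with minimal total distance to the others."""
    n = len(D)
    medoids = list(init) if init is not None else list(range(k))
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
        new = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c] or [medoids[c]]
            new.append(min(members, key=lambda m: sum(D[m][i] for i in members)))
        if new == medoids:
            break  # converged: medoids no longer change
        medoids = new
    labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
    return medoids, labels
```

Medoids are actual data points, which suits trajectory clustering: each cluster is represented by a real game sequence rather than an average that may not correspond to any play.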

SLIDE 49

Example

[Figure: median trajectories of the ball, colours encode clusters; distribution of trajectory lengths]

๏ Optimal number of clusters determined by multiple criteria

SLIDE 50

Game Initiations

๏ Team A known for
  ๏ Transporting the ball to the opposing half with few but rehearsed short game initiations
  ๏ Many ball contacts, different players integrated
๏ Team B’s strategy
  ๏ Focused on increasingly long and straight balls
  ๏ Few players involved on average

SLIDE 51

Game Initiations

[Figure: clusters for Bundesliga Team A and Bundesliga Team B; x-axis: length]

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015

SLIDE 52

Game Initiations

[Figure: clusters for Bundesliga Team A and Bundesliga Team B; x-axis: length]

Team A: many players, short passes

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015

SLIDE 53

Game Initiations

[Figure: clusters for Bundesliga Team A and Bundesliga Team B; x-axis: length]

Team A: many players, short passes
Team B: short trajectories, long straight balls

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015

SLIDE 54

Scoring Opportunities

๏ Team A:
  ๏ Aimed at quickly scoring a goal in the opposing half, i.e., few ball contacts and faster ball transport into the zone of danger
๏ Team B:
  ๏ Many ball contacts; took their time waiting for a mistake of the opponent and only then played into the zone of danger to score

SLIDE 55

Scoring Opportunities

[Figure: clusters for Bundesliga Team A and Bundesliga Team B; x-axis: length]

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015

SLIDE 56

Scoring Opportunities

[Figure: clusters for Bundesliga Team A and Bundesliga Team B; x-axis: length]

Team A: short trajectories, few ball contacts

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015

SLIDE 57

Scoring Opportunities

[Figure: clusters for Bundesliga Team A and Bundesliga Team B; x-axis: length]

Team A: short trajectories, few ball contacts
Team B: longer trajectories, waiting for mistakes

Knauf, Memmert & Brefeld, Spatio-temporal Convolution Kernels, Machine Learning Journal, 2015

SLIDE 58

DMKD Special Issue on Sports Analytics (together with Albrecht Zimmermann)

๏ Goal is to publish the special issue in 2016
๏ Call for papers: end of September 2015
๏ Submission deadline: end of December 2015
๏ Inquiries:
  ๏ albrecht.zimmermann@insa-lyon.fr
  ๏ brefeld@cs.tu-darmstadt.de

SLIDE 59

Wrap-Up: Trajectory Data

๏ Analysing trajectories of players is the key to analysing coordination in team sports
๏ Potential use cases go far beyond heat maps
๏ Inherent complexity renders tasks challenging
๏ Adapt existing large-scale algorithms to the data
๏ Exploit prior knowledge

SLIDE 60

Mapper: Candidate Generation

๏ Combine existing episodes that differ only in a single position
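
A single-machine sketch of candidate generation for serial episodes (ordered tuples of event types). It assumes the usual Apriori join, in which two frequent k-episodes that agree on their first k−1 events (and thus differ only in the last position) are combined, followed by Apriori pruning. In the distributed scheme this logic would run inside the mapper; the Hadoop plumbing is omitted and `generate_candidates` is a hypothetical name.

```python
from typing import Set, Tuple

Episode = Tuple[str, ...]  # serial episode: ordered tuple of event types


def generate_candidates(frequent: Set[Episode]) -> Set[Episode]:
    """Join frequent k-episodes into (k+1)-candidates (Apriori style)."""
    if not frequent:
        return set()
    k = len(next(iter(frequent)))
    candidates: Set[Episode] = set()
    for a in frequent:
        for b in frequent:
            if a[:-1] != b[:-1]:
                continue  # must agree on the first k-1 events
            cand = a + (b[-1],)
            # Apriori pruning: dropping any single event must leave a frequent episode
            if all(cand[:i] + cand[i + 1:] in frequent for i in range(k + 1)):
                candidates.add(cand)
    return candidates
```

For example, from the frequent 2-episodes {AB, BC, AC} only ABC survives; ACB is discarded because its sub-episode CB is not frequent.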

SLIDE 61

Reducer: Counting

๏ A finite state automaton (FSA) for every possible realisation of a known episode
๏ An additional FSA always remains in the initial state
๏ Similar to Laxman et al. (2005)
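
A sketch of non-overlapped occurrence counting for a serial episode with a maximum span, in the spirit of Laxman et al. (2005). Instead of storing every automaton, it keeps only the most recent start time per state, since an older automaton in the same state is dominated by a newer one; state 0 is implicit, mirroring the additional FSA that always waits in the initial state. The function name and the `max_span` parameter are assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def count_non_overlapped(episode: Sequence[str],
                         events: Sequence[Tuple[str, float]],
                         max_span: float) -> int:
    """Count non-overlapped occurrences of a serial episode in a time-sorted
    event stream, requiring each occurrence to span at most max_span."""
    k = len(episode)
    # start[j] (j >= 1): start time of the most recently started automaton that
    # has matched episode[:j]; None means no automaton in state j. State 0 is
    # implicit: a fresh automaton always waits in the initial state.
    start: List[Optional[float]] = [None] * k
    count = 0
    for sym, t in events:
        # Visit states from last to first so that one event moves each
        # automaton forward at most once.
        for j in range(k - 1, -1, -1):
            if episode[j] != sym:
                continue
            src = t if j == 0 else start[j]
            if src is None:
                continue
            if j == k - 1:  # this automaton reaches the final state
                if t - src <= max_span:
                    count += 1
                    start = [None] * k  # non-overlapped: restart all automata
                    break
                if j > 0:
                    start[j] = None  # occurrence too long: discard the automaton
                continue
            prev = start[j + 1]
            start[j + 1] = src if prev is None else max(prev, src)
            if j > 0:
                start[j] = None  # the automaton has left state j
    return count
```

Resetting all automata after each completed occurrence is what makes the counted occurrences non-overlapping.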

SLIDE 62

Pruned Trajectories

๏ Effectiveness of DBH depends only on the data
๏ Approximations effective for constant N

No. of trajectories | Kim    | Keogh  | LSH    | Total
1000                | 0%     | 0%     | 11.42% | 11.42%
5000                | 0.28%  | 34%    | 16.33% | 50.61%
10000               | 9.79%  | 41.51% | 17.8%  | 60.1%
15000               | 17.5%  | 46.25% | 11.82% | 75.57%

SLIDE 63

Similarity Threshold

๏ The number of generated/frequent episodes depends strongly on the similarity threshold