SLIDE 1

Trip Report

FINAL MEETING AND SUMMER SCHOOL OF DFG PRIORITY PROGRAM

ALGORITHM ENGINEERING

slide-2
SLIDE 2

DFG PP 1307: Algorithm Engineering

DFG Priority Program: nationwide funding program over 6 years for up to 30 individual projects

PP 1307: Algorithm Engineering

  • 28 research projects
  • 267 publications
  • 17 software projects, e.g.:
      • Multi-Core STL (MCSTL) – now gcc parallel mode
      • STL for Extra Large Datasets (STXXL)

2014-10-27 TRIP REPORT: ALGORITHM ENGINEERING

SLIDE 3

Recap: Algorithm Engineering

1. Realistic models of hardware and problem
2. Design efficient, implementable algorithms
3. Analyze beyond worst-case
4. Implement with hardware peculiarities in mind
5. Experiment: repeatable, with thorough interpretation

“The distance between theory and practice is closer in theory than in practice”

[Y. Matias (Google) in his invited talk at ESA ’12]

SLIDE 4

Final Meeting (17.09.2014)

9 talks covering a wide range of topics:

  • route planning in road and public transport networks
  • graph clustering and partitioning
  • data compression
  • linear and mixed integer optimization
  • sequence analysis

No Indico used; slides only partially available

SLIDE 5

Summer School (18.-19.09.2014)

Two days of lectures and hands-on sessions

  • data compression (lecture only)
  • linear and mixed integer optimization
  • network analysis - graph clustering and partitioning
  • shortest paths algorithms (lecture only)

About 30 PhD students

Lots of discussion among students and lecturers

SLIDE 6

Selected Topics

SLIDE 7

Network Analysis

Networks are everywhere

  • Computer networks
  • Social networks

SLIDE 8

Network Analysis

Network analysis mainly concerned with complex networks

  • Small diameter
  • Varying degree distribution
  • Lots of triangles

SLIDE 9

Network Analysis

GRAPH CLUSTERING

  • Find (non-overlapping) internally dense, externally sparse subgraphs
  • Unknown: Number of subgraphs, their size
  • Goals / Applications:
      • Uncover community structure (analysis, ...)
      • Prepartition network (distributed storage, ...)

GRAPH PARTITIONING

  • Partition vertex set into k (nearly) equally sized blocks
  • Objective functions aim at small interfaces
  • Applications:
      • Numerical simulations
      • Route planning
      • Distributed graph algorithms

SLIDE 10

Network Analysis

GRAPH CLUSTERING

Algorithms:

  • Label propagation algorithm
  • Louvain greedy method

Many different metrics:

  • Conductance
  • Expansion
  • Modularity

GRAPH PARTITIONING

Algorithms:

  • Size-constrained label propagation
  • Diffusion-based partitioning
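
To make two of the clustering metrics listed on this slide concrete, here is a minimal Python sketch that computes modularity and conductance for an undirected, unweighted graph. It is an illustration only and not taken from the talk or from NetworKit; the function names, the edge-list representation and the toy graph (two triangles joined by one edge) are assumptions.

```python
def modularity(edges, community):
    """Newman modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community id.
    Q = sum over communities of (internal_edges / m) - (volume / 2m)^2.
    """
    m = len(edges)
    internal = {}  # number of edges with both endpoints in the community
    volume = {}    # sum of vertex degrees per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (vol / (2 * m)) ** 2
               for c, vol in volume.items())


def conductance(edges, subset):
    """Cut size divided by the smaller of the two volumes (degree sums)."""
    cut = sum(1 for u, v in edges if (u in subset) != (v in subset))
    vol_in = sum((u in subset) + (v in subset) for u, v in edges)
    vol_out = 2 * len(edges) - vol_in
    return cut / min(vol_in, vol_out)


# Hypothetical toy graph: two triangles connected by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {node: "A" for node in range(6)}

q_split = modularity(edges, two_clusters)
q_lumped = modularity(edges, one_cluster)
phi = conductance(edges, {0, 1, 2})
print(q_split, q_lumped, phi)
```

Splitting the toy graph into its two triangles scores a higher modularity than putting all vertices into one cluster, which is the kind of signal a greedy method such as Louvain optimizes.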

SLIDE 11

Network Analysis

NetworKit:

  • Toolkit developed during the project for network analysis – C++ with Python bindings
  • Includes wide range of tools for graph analysis
  • Excellent IPython notebook-based tutorial
  • Includes algorithms proposed for evolving networks
  • Analyze changing social networks – e.g. ITI email graph

Interest for CERN:

  • Community detection on the grid → planning of file transfers
  • Track reconstruction → ongoing work

SLIDE 12

Shortest Paths and Routing

Problem: find shortest path between s and t in weighted graph G

Algorithms:

  • Dijkstra’s algorithm: too slow for large graphs
  • Many speedup techniques [survey]:
      • A∗: search with Euclidean bounds (classic)
      • ALT: A∗ search with landmarks, preprocessing computes distances to landmarks
      • Contraction Hierarchies: introduce shortcuts between “important” vertices of the graph
      • Hub Labeling: every vertex stores distance to several hubs, covering the graph
  • Most techniques rely on (more or less) expensive pre-computations
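
As an illustration of the split between pre-computation and query, the following Python sketch pairs plain Dijkstra with an ALT-style query. It is a simplified sketch under stated assumptions, not the implementation presented at the school: the graph is undirected and connected, the landmark choice is arbitrary, and all names (`dijkstra`, `alt_query`, `grid_graph`) are hypothetical. The lower bound used is the triangle-inequality bound |d(L,t) − d(L,v)| ≤ d(v,t).

```python
import heapq


def dijkstra(graph, source):
    """Distances from source to every reachable vertex."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def alt_query(graph, landmark_dists, s, t):
    """A* from s to t guided by landmark lower bounds.

    Returns (distance, number of settled vertices).
    """
    def bound(v):
        return max((abs(ld[t] - ld[v]) for ld in landmark_dists), default=0)

    dist = {s: 0}
    heap = [(bound(s), s)]
    settled = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u == t:
            return dist[u], len(settled)
        if u in settled:
            continue
        settled.add(u)
        for v, w in graph[u]:
            nd = dist[u] + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd + bound(v), v))
    return float("inf"), len(settled)


def grid_graph(n):
    """Hypothetical test network: n x n grid with unit edge weights."""
    graph = {(r, c): [] for r in range(n) for c in range(n)}
    for (r, c) in graph:
        for dr, dc in ((1, 0), (0, 1)):
            nb = (r + dr, c + dc)
            if nb in graph:
                graph[(r, c)].append((nb, 1))
                graph[nb].append(((r, c), 1))
    return graph


graph = grid_graph(5)
# Pre-computation: one Dijkstra run per landmark (two opposite corners here).
landmark_dists = [dijkstra(graph, (0, 0)), dijkstra(graph, (4, 4))]
distance, settled_count = alt_query(graph, landmark_dists, (0, 0), (4, 4))
print(distance, settled_count)
```

The landmark tables are computed once and reused by every query, which is the pattern behind the remark that most techniques rely on pre-computations.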

SLIDE 13

Shortest Paths and Routing

Problem: User-defined cost functions render pre-computations futile

Solution: Three-stage processing [Delling et al. 2013]

  • 1. Metric-independent pre-processing (≈ hr)
       Recursively partition graph; generate arcs between entry and exit nodes to neighboring partitions
  • 2. Metric-dependent pre-processing (≈ s)
       Compute metric for all shortcut arcs
  • 3. Query (≈ μs)
       Find shortest path in contracted graph and unpack it in the original one

SLIDE 14

Shortest Paths and Routing

Routing in public transport networks is a much harder problem

  • Inherent time-dependence
  • Solved using (potentially huge!) event-activity networks

Interest for CERN:

  • Grid tiers already define contraction hierarchy
    → examine actual data flows for missing/misplaced hubs

SLIDE 15

Data Compression

Requirements:

  • Compressed space
  • Decompression time
  • Compression time is not much of an issue

Compressor on dataset MINGW (1 GB)    Compressed space (MB)    Decompression time (secs)
Gzip                                  344                      5.5
Lzma                                  188                      8.3
Snappy                                461                      0.9

Trade-off: compressed space vs. decompression time

“Snappy is widely used inside Google, in everything from BigTable and MapReduce …”

Problem: compress once, decompress many times
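
The space versus decompression-time trade-off can be measured with a few lines of Python. The sketch below is an assumption-laden illustration rather than the benchmark from the talk: it uses only the standard-library codecs `zlib` (DEFLATE, the algorithm behind gzip) and `lzma`, leaves out Snappy because it is not in the standard library, and runs on a small synthetic input instead of the MINGW dataset, so its numbers are not comparable to the table.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Return compressed size and decompression time for one codec."""
    packed = compress(data)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    if restored != data:
        raise ValueError(f"{name}: round trip failed")
    return {"name": name, "bytes": len(packed), "secs": elapsed}


# Hypothetical sample input: highly repetitive text, roughly 200 kB.
data = b"algorithm engineering bridges theory and practice. " * 4000

results = [
    measure("zlib", zlib.compress, zlib.decompress, data),
    measure("lzma", lzma.compress, lzma.decompress, data),
]
for r in results:
    print(f"{r['name']:5s} {r['bytes']:8d} bytes  {r['secs'] * 1e3:8.3f} ms to decompress")
```

For a compress-once, decompress-many workload only the two reported columns matter, which is why compression time is deliberately not measured here.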

SLIDE 16

Data Compression

Reminder: Lempel-Ziv compression

Text: a a c a a c a b c a a d a a a

[Figure: LZ parsing of the text. The prefix is marked “This part has already been compressed”; the phrases shown are <6,3>, <0,d>, <3,2> and <11,3>.]

Greedy approach only optimal if every pair takes constant space

  • but variable number of bits required for distances → non-optimal

Bit-optimal LZ parsing [Ferragina et al. 2013]

  • Solve shortest path problem on DAG describing possible compression pairs
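
For reference, the greedy parsing that the bit-optimal approach improves upon can be sketched in a few lines of Python. This is a naive quadratic illustration, not the algorithm of Ferragina et al.: phrases are encoded as <distance, length> copies or <0, character> literals as on the slide, ties between equally long matches go to the nearest one, and the function names are hypothetical.

```python
def lz77_greedy(text):
    """Greedy LZ parse: at each position take the longest earlier match."""
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_dist = 0, 0
        # Try every earlier start, nearest first; a copy may overlap position i.
        for start in range(i - 1, -1, -1):
            k = 0
            while i + k < n and text[start + k] == text[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - start
        if best_len == 0:
            phrases.append((0, text[i]))           # literal: <0, character>
            i += 1
        else:
            phrases.append((best_dist, best_len))  # copy: <distance, length>
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the text by replaying literals and copies."""
    out = []
    for first, second in phrases:
        if first == 0:
            out.append(second)
        else:
            for _ in range(second):
                out.append(out[-first])
    return "".join(out)


text = "aacaacabcaadaaa"  # the example string from the slide
phrases = lz77_greedy(text)
print(phrases)
```

The greedy parse minimizes the number of phrases; once distances are stored with a variable number of bits, a parse with more phrases but shorter distances can be smaller, which is what the shortest-path formulation on the DAG exploits.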

SLIDE 17

Data Compression

Bi-criteria Compression [Farruggia et al. 2014]:

  • Space and decompression time as edge weights in the DAG
  • Fix space constraint, search for lowest decompression time and vice versa

SLIDE 18

Data Compression

Different approach to compression: Burrows-Wheeler Transform [introduction]

SLIDE 19

Data Compression

Different approach to compression: Burrows-Wheeler Transform

  • Yields smaller compressed size but longer decompression time
  • Construction of BWT closely related to suffix-array construction
  • Allows decompression of any substring

FM index [Ferragina and Manzini 2000]

  • Uses BWT and auxiliary data structures to answer count and locate queries on compressed text
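
A small Python sketch may help to see how count queries work on the transformed text. It is a teaching illustration under several assumptions, not the data structure of Ferragina and Manzini: the BWT is built naively by sorting all rotations, the sentinel "$" is assumed to be absent from the text and smaller than every text character, and the occurrence tables are stored uncompressed. The function names are hypothetical.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler Transform via sorted rotations (naive construction)."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def fm_count(last, pattern):
    """Count occurrences of pattern using backward search on the BWT."""
    alphabet = sorted(set(last))
    # smaller[c]: number of characters in the text strictly smaller than c
    smaller, total = {}, 0
    for c in alphabet:
        smaller[c] = total
        total += last.count(c)
    # occ[c][i]: number of occurrences of c in last[:i]
    occ = {c: [0] * (len(last) + 1) for c in alphabet}
    for i, ch in enumerate(last):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    lo, hi = 0, len(last)  # half-open range of matching sorted rotations
    for ch in reversed(pattern):
        if ch not in smaller:
            return 0
        lo = smaller[ch] + occ[ch][lo]
        hi = smaller[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


transformed = bwt("abracadabra")
print(transformed, fm_count(transformed, "abra"))
```

Each pattern character costs two table lookups, so the query time depends on the pattern length and not on the text length; a locate query would additionally need a sampled suffix array, which is omitted here.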

Interest for CERN:

  • Compression of ROOT files + access of individual entries
  • Compression of and search in dictionaries

SLIDE 20

Miscellaneous

Linear programming

  • Disproof of Hirsch conjecture poses threat to simplex method
    → still works well in practice
  • Anecdote: interior point method patented by AT&T
    → circumvent patent by polar transformation of problem and usage of barrier method

SeqAn

  • Package for analysis of (genome) sequences
  • Developers face similar problems as HEP: bridge gap between computer science and real-world problems

External memory algorithms

  • Flow computations for massive LiDAR terrain data sets
  • General trick of time-forward processing to reduce I/O

SLIDE 21

Conclusions

  • Final meeting gave good overview of broad activity in DFG PP 1307 “Algorithm Engineering”
  • Summer school expanded on four focus topics of the PP
  • Similar research continues in DFG PP 1736 “Algorithms for Big Data”
  • Funding period 2013-2019
  • Currently 16 projects covering graph analysis, energy efficient scheduling, search and text indexing, genome assembly,…
  • Most projects concerned with computer science problems
  • Computational biology problems present in both PPs

HEP community needs to explore how to exploit this resource of expertise and funding