Introduction External memory algorithms for well known problems A - PowerPoint PPT Presentation

Introduction
• External-memory algorithms for well-known graph problems
• A basic breadth-first search (BFS) algorithm
• A more advanced BFS algorithm
• A connected components algorithm
– p.1/26

Assumptions
• There is a main memory of size $M$
• The block size is $B$
• One I/O operation moves one block of $B$ consecutive data items between disk and main memory
• Graphs are stored in adjacency-list format
– p.2/26

Tools
• The main tools used in the algorithms are sorting and scanning
• We denote by $\mathrm{sort}(x)$ the number of I/Os needed to sort $x$ elements: $\mathrm{sort}(x) = \Theta\!\left(\frac{x}{B}\log_{M/B}\frac{x}{B}\right)$
• We denote by $\mathrm{scan}(x)$ the number of I/Os needed to read or write $x$ consecutive elements: $\mathrm{scan}(x) = \Theta\!\left(\frac{x}{B}\right)$
– p.3/26
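As a rough numerical illustration, the two cost measures can be evaluated directly. This is a sketch that ignores constant factors; the names `scan_ios`, `sort_ios` and `merge_passes` are hypothetical, and `sort_ios` assumes a multiway merge sort with fan-out $M/B$ that is charged one I/O per block per pass (a real implementation reads and writes every block, roughly doubling the count).

```python
def scan_ios(x: int, B: int) -> int:
    """scan(x): number of blocks touched when reading or writing x consecutive elements."""
    return (x + B - 1) // B  # integer ceiling of x / B


def merge_passes(blocks: int, fan_out: int) -> int:
    """Smallest p with fan_out**p >= blocks, i.e. ceil(log_fan_out(blocks)), without floats."""
    passes, reach = 0, 1
    while reach < blocks:
        reach *= fan_out
        passes += 1
    return passes


def sort_ios(x: int, M: int, B: int) -> int:
    """sort(x) up to constants: x/B blocks times ceil(log_{M/B}(x/B)) passes (at least one)."""
    fan_out = M // B  # number of runs merged at once; assumed to be at least 2
    blocks = scan_ios(x, B)
    return blocks * max(1, merge_passes(blocks, fan_out))


# Example: 10^6 elements, blocks of 100 elements, room for 10^4 elements in memory.
print(scan_ios(10**6, 100))         # 10000
print(sort_ios(10**6, 10**4, 100))  # 20000
```

With these parameters the fan-out is 100, so two merge passes suffice and sorting costs about twice a scan, which illustrates why $\mathrm{sort}(x)$ is only a small factor above $\mathrm{scan}(x)$ in practice.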

BFS
• We want to partition the nodes of a graph into levels $L(i)$ such that the nodes in level $i$ have distance $i$ from the source node $s$; $L(0) = \{s\}$
– p.4/26

BFS algorithm in internal memory
• Keep the nodes to be visited in a FIFO queue, starting with the marked source $s$. Whenever we extract a node, insert all of its unmarked neighbors into the queue and mark them, so that every node enters the queue exactly once
– p.5/26
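A minimal Python sketch of this queue-based BFS, assuming the graph is given as a dictionary mapping each node to its neighbor list (a hypothetical in-memory representation; `bfs_levels` is likewise an illustrative name). A node is marked when it is inserted, so it is enqueued at most once; the returned dictionary maps every reachable node to its level.

```python
from collections import deque


def bfs_levels(adj: dict[int, list[int]], s: int) -> dict[int, int]:
    """Return level[v] = distance from s for every node reachable from s."""
    level = {s: 0}  # doubles as the set of marked nodes
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:  # mark on insertion, so v is queued only once
                level[v] = level[u] + 1
                queue.append(v)
    return level


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```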

MR_BFS (Munagala and Ramachandran)
[Figure: example with the levels $L(t-2)$ and $L(t-1)$ marked]
• Let $L(t)$ be the set of nodes in level $t$, with $L(0) = \{s\}$
• We denote by $A(t)$ the multiset of neighbors of the nodes in $L(t-1)$
– p.6/26

MR_BFS (cont.)
• To build $A(t)$ we access, for every node in $L(t-1)$, its adjacency list
• We need one I/O per node to get the pointer to its adjacency list
• Then we read the adjacency lists and write the node indices they contain to $A(t)$. By scanning, this takes $O(|N(L(t-1))|/B)$ I/Os
• Thus this step needs $O(|L(t-1)| + |N(L(t-1))|/B)$ I/Os
– p.7/26

MR_BFS (cont.)
• Next we remove the duplicates from $A(t)$
• This can be done by sorting $A(t)$ and then scanning it, since duplicates end up next to each other
• The scan is dominated by the sort, so this step requires $O(\mathrm{sort}(|A(t)|))$ I/Os
– p.8/26

MR_BFS (cont.)
• In the third step we remove from $A(t)$ all nodes that already occur in $L(t-1)$ or $L(t-2)$
• Since $A(t)$, $L(t-1)$ and $L(t-2)$ are all sorted, this can be done by scanning them simultaneously, which costs $O((|A(t)| + |L(t-1)| + |L(t-2)|)/B)$ I/Os
• So, in total, building $L(t)$ costs $O\big(|L(t-1)| + (|L(t-1)| + |L(t-2)|)/B + \mathrm{sort}(|N(L(t-1))|)\big)$ I/Os
– p.9/26
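The three steps can be simulated in memory to check the level structure. In this sketch (function names hypothetical, adjacency given as a dictionary with an entry for every node) ordinary Python lists stand in for the external files: $A(t)$ is built from the neighbor lists of $L(t-1)$, sorted and deduplicated with one scan, and then filtered against the sorted lists $L(t-1)$ and $L(t-2)$ with simultaneous scans. It models the data flow only, not the I/O counts.

```python
def subtract_sorted(a: list[int], b: list[int]) -> list[int]:
    """Elements of sorted list a that do not occur in sorted list b (one simultaneous scan)."""
    out, j = [], 0
    for v in a:
        while j < len(b) and b[j] < v:
            j += 1
        if j == len(b) or b[j] != v:
            out.append(v)
    return out


def mr_bfs(adj: dict[int, list[int]], s: int) -> list[list[int]]:
    """In-memory simulation of MR_BFS; returns the levels L(0), L(1), ... as sorted lists."""
    levels = [[s]]
    prev2: list[int] = []   # L(t-2)
    prev1: list[int] = [s]  # L(t-1)
    while prev1:
        # Step 1: A(t) = multiset of the neighbors of all nodes in L(t-1).
        a = [v for u in prev1 for v in adj[u]]
        # Step 2: sort A(t), then drop duplicates in one scan (they are now adjacent).
        a.sort()
        dedup = [v for i, v in enumerate(a) if i == 0 or v != a[i - 1]]
        # Step 3: remove nodes already in L(t-1) or L(t-2) by scanning the sorted lists.
        cur = subtract_sorted(subtract_sorted(dedup, prev1), prev2)
        if cur:
            levels.append(cur)
        prev2, prev1 = prev1, cur
    return levels


adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(mr_bfs(adj, 0))  # [[0], [1, 2], [3], [4]]
```

The loop stops as soon as a level comes out empty, which is also the stopping rule the connected-components algorithm relies on later.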

MR_BFS (cont.)
• Since $\sum_t |N(L(t))| = O(|E|)$ and $\sum_t |L(t)| = O(|V|)$, the number of I/Os needed for building all levels $L(t)$ is $O(|V| + \mathrm{sort}(|V| + |E|))$
• It suffices to remove the nodes of $L(t-1)$ and $L(t-2)$: a node in $L(t-1)$ has no neighbors in levels below $L(t-2)$, since otherwise it would have distance less than $t-1$ from $s$
– p.10/26
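The summation behind the first bullet, written out. Every node lies in exactly one level and every adjacency list is read once, so in an undirected graph the lists read add up to $2|E|$ entries. The third line additionally uses that $x \mapsto \frac{x}{B}\log_{M/B}\frac{x}{B}$ is superadditive for $x \ge B$, while each level with less than one block of neighbors costs $O(1)$ I/Os, which the $|V|$ term absorbs.

```latex
\begin{aligned}
\sum_{t \ge 1} \Big( |L(t-1)| + \frac{|L(t-1)| + |L(t-2)|}{B} \Big) &\le |V| + \frac{2|V|}{B}, \\
\sum_{t \ge 1} |N(L(t-1))| &\le 2|E|, \\
\sum_{t \ge 1} \mathrm{sort}\big(|N(L(t-1))|\big) &\le \mathrm{sort}(2|E|) + O(|V|), \\
\text{total} &= O\big(|V| + \mathrm{sort}(|V| + |E|)\big).
\end{aligned}
```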

Fast BFS
• We split the graph into subgraphs $S_i$, each of diameter (the maximum shortest-path distance between any two of its nodes) at most $2/c$, where $0 < c \le 1$ is a parameter chosen later
• First we find all nodes of $G$ in the same connected component $C_s$ as $s$
• This can be done with a deterministic connected-components algorithm using $O\big((1 + \log\log(B|V|/|E|))\,\mathrm{sort}(|V| + |E|)\big)$ I/Os, which is $O\big(\sqrt{|V|\,\mathrm{scan}(|V| + |E|)} + \mathrm{sort}(|V| + |E|)\big)$
– p.11/26

Fast BFS (cont.)
[Figure: a spanning tree on nodes 0–8 and its Euler tour 0 4 2 4 7 4 0 6 1 8 1 5 1 3 1 6 0]
• With the same number of I/Os we can compute a spanning tree $T_s$ of $C_s$
– p.12/26

Fast BFS (cont.)
• We can construct an Euler tour around $T_s$ and break it into pieces of $2/c$ consecutive nodes. This needs a constant number of sorts and scans
• Consecutive nodes of the tour are adjacent in $T_s$, so the nodes of every such piece are at distance at most $2/c - 1$ from each other in $G$
• A node of degree $d$ in $T_s$ occurs at most $d + 1$ times in the tour, and therefore in at most $d + 1$ different pieces
• With a constant number of sorting steps we can make sure that each node in $C_s$ is part of exactly one $S_i$, e.g. by keeping only its first occurrence
– p.13/26
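An in-memory sketch of the clustering step, using a depth-first traversal in place of the external-memory Euler-tour construction (which the slide performs with sorts and scans). The tree is given as a hypothetical `children` dictionary, `mu` stands for the piece length $2/c$ (assumed to be an integer), and each node is assigned to the piece containing its first occurrence. The example tree is an assumption: it is one whose Euler tour matches the node sequence 0 4 2 4 7 4 0 6 1 8 1 5 1 3 1 6 0 from the slide 12 figure.

```python
def euler_tour(children: dict[int, list[int]], root: int) -> list[int]:
    """Node sequence of an Euler tour of a rooted tree, via iterative DFS."""
    tour = [root]
    stack = [(root, iter(children.get(root, [])))]
    while stack:
        _, pending = stack[-1]
        child = next(pending, None)
        if child is None:
            stack.pop()
            if stack:  # walk back up to the parent
                tour.append(stack[-1][0])
        else:
            tour.append(child)
            stack.append((child, iter(children.get(child, []))))
    return tour


def pieces_to_clusters(tour: list[int], mu: int) -> dict[int, int]:
    """Cut the tour into pieces of mu consecutive entries and put each node into
    the piece of its first occurrence, so that it belongs to exactly one S_i."""
    cluster: dict[int, int] = {}
    for pos, v in enumerate(tour):
        cluster.setdefault(v, pos // mu)
    return cluster


children = {0: [4, 6], 4: [2, 7], 6: [1], 1: [8, 5, 3]}
tour = euler_tour(children, 0)
print(" ".join(map(str, tour)))      # 0 4 2 4 7 4 0 6 1 8 1 5 1 3 1 6 0
print(pieces_to_clusters(tour, 4))  # {0: 0, 4: 0, 2: 0, 7: 1, 6: 1, 1: 2, 8: 2, 5: 2, 3: 3}
```

With `mu = 4`, piece 1 covers the tour entries 7 4 0 6; its new nodes 7 and 6 are at tree distance 3 = `mu` − 1, matching the diameter bound. Pieces whose nodes all occurred earlier simply end up empty.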

Fast BFS (cont.)
• For every subgraph $S_i$ we create a file $F_i$ containing the adjacency lists of all nodes in $S_i$
• This takes $O(\mathrm{sort}(|V| + |E|))$ I/Os
– p.14/26
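Building the files $F_i$ amounts to one sort of the edge list by (subgraph, node) followed by one scan that cuts it into consecutive runs. A minimal in-memory sketch; `cluster` (node to subgraph index) and `build_cluster_files` are hypothetical names, and each file is represented as a sorted list of (node, neighbor) pairs.

```python
from itertools import groupby


def build_cluster_files(adj: dict[int, list[int]],
                        cluster: dict[int, int]) -> dict[int, list[tuple[int, int]]]:
    """F[i] holds the (node, neighbor) pairs of all nodes in S_i, sorted by node."""
    # One sort by (subgraph, node, neighbor) ...
    triples = sorted((cluster[u], u, v) for u in adj for v in adj[u])
    # ... followed by one scan that splits the sorted sequence into the files F_i.
    return {i: [(u, v) for _, u, v in run]
            for i, run in groupby(triples, key=lambda t: t[0])}


adj = {0: [1], 1: [0, 2], 2: [1]}
print(build_cluster_files(adj, {0: 0, 1: 0, 2: 1}))
# {0: [(0, 1), (1, 0), (1, 2)], 1: [(2, 1)]}
```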

The BFS phase
• Similar to MR_BFS
• The main difference is that we use an external file $H$, kept sorted by node index, that contains (at least) the adjacency lists of all nodes in the current level
• $H$ is initialized with $F_0$, the file of the subgraph containing $s$
• Instead of accessing every adjacency list individually as MR_BFS does, FAST_BFS scans $L(t-1)$ and $H$ simultaneously to extract $N(L(t-1))$
– p.15/26

The BFS phase (cont.)
• Whenever we write a node to $N(L(t-1))$ whose file $F_i$ is not yet in $H$, we merge $F_i$ into $H$; an adjacency list is removed from $H$ once it has been used
• Each adjacency list stays in $H$ for at most $2/c$ BFS levels: if $F_i$ is merged into $H$ while building level $L(t)$, then every node of $S_i$ is within distance $2/c - 1$ of a node in $L(t)$, so its BFS level is at most $t + 2/c - 1$
– p.16/26
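An in-memory simulation of the BFS phase, under the assumptions that a hypothetical `cluster` mapping assigns every node of $C_s$ to one subgraph and that the graph is an adjacency dictionary with an entry for every node. The dictionary `hot` plays the role of $H$: a subgraph's adjacency lists are merged into it the first time one of its nodes is reached, and each list is removed when it is used. The function also reports the largest number of levels any list waited in `hot`, which for pieces of $2/c$ nodes should stay below $2/c$, consistent with the bound on the slide.

```python
def fast_bfs(adj: dict[int, list[int]], cluster: dict[int, int],
             s: int) -> tuple[list[list[int]], int]:
    """Simulate the FAST_BFS level loop; returns (levels, max_wait), where max_wait
    is the largest number of levels an adjacency list waited in H before use."""
    files: dict[int, dict[int, list[int]]] = {}  # F_i: node -> adjacency list
    for u, nbrs in adj.items():
        if u in cluster:
            files.setdefault(cluster[u], {})[u] = nbrs
    hot: dict[int, list[int]] = {}               # the hot pool H
    merged_at: dict[int, int] = {}               # subgraph i -> level when F_i entered H

    def merge_into_hot(i: int, t: int) -> None:
        if i not in merged_at:                   # each F_i is merged only once
            merged_at[i] = t
            hot.update(files[i])

    merge_into_hot(cluster[s], 0)
    levels, prev2, prev1 = [[s]], set(), {s}
    t, max_wait = 0, 0                           # t = index of the level prev1
    while prev1:
        neighbors: list[int] = []
        for u in sorted(prev1):
            max_wait = max(max_wait, t - merged_at[cluster[u]])
            neighbors.extend(hot.pop(u))         # use the list, then drop it from H
        t += 1
        cur = sorted(set(neighbors) - prev1 - prev2)  # L(t)
        for v in cur:
            merge_into_hot(cluster[v], t)
        if cur:
            levels.append(cur)
        prev2, prev1 = prev1, set(cur)
    return levels, max_wait


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
clusters = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}  # pieces of 2 nodes, i.e. 2/c = 2
print(fast_bfs(path, clusters, 0))  # ([[0], [1], [2], [3], [4], [5]], 1)
```

Every adjacency list is read from `hot`, never fetched individually, which is the point of the method: the per-node I/O of MR_BFS is replaced by one merge per file $F_i$.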

The BFS phase (cont.)
• The total number of I/Os to handle $H$ and the files $F_i$ is then $O\big(c|V| + \mathrm{sort}(|V| + |E|) + \tfrac{1}{c}\,\mathrm{scan}(|V| + |E|)\big)$
• Setting $c = \min\{1, \sqrt{\mathrm{scan}(|V| + |E|)/|V|}\}$ we get $O\big(\sqrt{|V|\,\mathrm{scan}(|V| + |E|)} + \mathrm{sort}(|V| + |E|)\big)$ I/Os
– p.17/26
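The choice of $c$ balances the two terms that depend on it: $c|V|$ pays for merging each of the $O(c|V|)$ files $F_i$ into $H$ at least once, and $\frac{1}{c}\,\mathrm{scan}(|V|+|E|)$ pays for keeping every adjacency list in $H$ for $O(1/c)$ levels. Equating them:

```latex
\begin{aligned}
c\,|V| = \frac{1}{c}\,\mathrm{scan}(|V|+|E|)
  &\iff c = \sqrt{\frac{\mathrm{scan}(|V|+|E|)}{|V|}}, \\
\text{and then}\quad c\,|V| = \frac{1}{c}\,\mathrm{scan}(|V|+|E|)
  &= \sqrt{|V|\,\mathrm{scan}(|V|+|E|)}.
\end{aligned}
```

If $\mathrm{scan}(|V|+|E|) > |V|$, the cap $c = 1$ applies instead; then $c|V| = |V| < \mathrm{scan}(|V|+|E|) \le \mathrm{sort}(|V|+|E|)$, so the stated bound still holds.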

Connected components
• We want to label each node with the index of the connected component it belongs to
• This can be done with a BFS algorithm
• Start a BFS from a node $s$. If $L(t-1)$ is empty for some $t$, the nodes in $L(0) \cup L(1) \cup \dots \cup L(t-2)$ form the connected component of $s$
• Label these nodes and start a new BFS from an unvisited node
– p.18/26
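A minimal in-memory sketch of this labelling loop. The function name is hypothetical, and the graph is assumed to be an undirected adjacency dictionary with an entry for every node, including isolated ones. Each BFS advances level by level, removing the previous two levels as in MR_BFS, and stops when a level is empty.

```python
def bfs_components(adj: dict[int, list[int]]) -> dict[int, int]:
    """Label every node with the index of its connected component."""
    label: dict[int, int] = {}
    comp = 0
    for s in sorted(adj):           # scan for the next unvisited node
        if s in label:
            continue
        label[s] = comp
        prev2, prev1 = set(), {s}
        while prev1:                # stop once a level is empty
            nxt = {v for u in prev1 for v in adj[u]} - prev1 - prev2
            for v in nxt:
                label[v] = comp
            prev2, prev1 = prev1, nxt
        comp += 1
    return label


adj = {0: [1], 1: [0], 2: [3], 3: [2, 4], 4: [3], 5: []}
print(bfs_components(adj))  # {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 2}
```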

Connected components (cont.)
• Labelling the nodes and finding unvisited nodes can be done with $O(|V|)$ I/Os
• Thus the complexity is still $O(|V| + \mathrm{sort}(|V| + |E|))$
• If $|V| \le |E|/B$, the $O(|V|)$ term is dominated by $\mathrm{scan}(|E|)$ and the number of I/Os improves to $O(\mathrm{sort}(|V| + |E|))$
– p.19/26

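Node reduction
[Figure: example graph on nodes 1–9]
• To reduce the number of nodes, select for each node $u$ its smallest neighbor $v$, i.e. the neighbor with the smallest index, giving a pointer $u \to v$
– p.20/26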

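Node reduction (cont.)
• This can be done by sorting two copies of the edges, one by source node and one by target node, and then scanning both copies simultaneously to find the smallest neighbor of each node
• The selected pointers partition the nodes into cycles with trees converging into them. Every cycle has length two: if $u \to v \to w$, then $w \le u$ because $u$ is a neighbor of $v$, so a longer cycle would give a strictly decreasing sequence of indices that returns to its starting node
– p.21/26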

Node reduction (cont.)
• Each such cycle has exactly one edge whose source id is lower than its target id
• We remove this edge and choose its source as the leader; this turns every component into a tree rooted at its leader, so the pointers form a forest
– p.22/26

Node reduction (cont.)
• Removing these edges can be done by a scan through the edges
• Replace each edge $(u, v) \in E$ by the edge $(R(u), R(v))$, where $R(v)$ is the leader of the tree that $v$ belongs to
• This can be done with $O(\mathrm{sort}(|E|))$ I/Os
– p.23/26

Node reduction (cont.)
• Finally we remove all isolated nodes (each of them is already a complete component), parallel edges and self-loops
• This requires a constant number of sorts and scans of the edges
• The total number of I/Os needed for one reduction step is then $O(\mathrm{sort}(|E|))$
– p.24/26
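One reduction step, sketched in memory. Assumptions: the graph is a set of directed pairs containing both $(u, v)$ and $(v, u)$ for every undirected edge and no self-loops; the function name is hypothetical; and leaders are found by plain pointer chasing, a stand-in for the external-memory computation of $R(\cdot)$ that the slides cost at $O(\mathrm{sort}(|E|))$ I/Os. Nodes without edges never get a pointer and are not part of the result.

```python
def reduce_once(edges: set[tuple[int, int]]):
    """One node-reduction step. Returns (leader_of, new_nodes, new_edges), where
    leader_of maps every non-isolated node u to R(u)."""
    # Smallest neighbor: after sorting by (source, target), the first pair of
    # each source holds its smallest neighbor.
    succ: dict[int, int] = {}
    for u, v in sorted(edges):
        succ.setdefault(u, v)

    def leader(u: int) -> int:
        # Follow pointers until reaching the smaller node of the 2-cycle; the
        # edge leaving that node is the one removed, so it is the tree's root.
        while not (succ[succ[u]] == u and u < succ[u]):
            u = succ[u]
        return u

    leader_of = {u: leader(u) for u in succ}
    # Relabel edges; the filter drops self-loops and the set drops parallel edges.
    new_edges = {(leader_of[u], leader_of[v]) for u, v in edges
                 if leader_of[u] != leader_of[v]}
    # Supernodes left without edges are isolated and leave the next round.
    new_nodes = {x for e in new_edges for x in e}
    return leader_of, new_nodes, new_edges


edges = {(1, 5), (5, 1), (2, 6), (6, 2), (5, 6), (6, 5)}
leader_of, nodes, new_edges = reduce_once(edges)
print(leader_of)  # {1: 1, 2: 2, 5: 1, 6: 2}
```

Here the four nodes shrink to the two supernodes 1 and 2, joined by one undirected edge, in line with the halving argument.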

Node reduction (cont.)
• Each iteration at least halves the number of nodes, because every remaining tree contains at least two nodes. Hence after at most $\log_2(|V|B/|E|)$ iterations we have $|V| \le |E|/B$
• So we need $O(\mathrm{sort}(|E|) \log_2(|V|B/|E|))$ I/Os in total
• After the node reduction we apply BFS to the contracted graph until each node of the contracted graph has a label
– p.25/26
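The iteration bound follows from the halving property: after $k$ iterations at most $|V|/2^k$ nodes remain, while the number of edges never grows.

```latex
\frac{|V|}{2^k} \le \frac{|E|}{B}
\iff 2^k \ge \frac{|V|\,B}{|E|}
\iff k \ge \log_2 \frac{|V|\,B}{|E|}
```

With $|V'| \le |E|/B$ nodes left, the BFS-based connected-components algorithm needs $O(|V'| + \mathrm{sort}(|V'| + |E|))$ I/Os on the contracted graph, and $|V'| \le |E|/B$ makes this $O(\mathrm{sort}(|E|))$.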

Node reduction (cont.)
• The labels are then propagated back to the original nodes. This can be done by sorting the list of nodes by the id of the supernode each node was contracted to, sorting the list of component labels by supernode id, and then scanning both lists simultaneously
• The cost is dominated by sorting, i.e. $O(\mathrm{sort}(|V|))$ I/Os per phase; since there are $\log_2(|V|B/|E|)$ phases, the number of I/Os needed is $O(\log_2(|V|B/|E|)\,\mathrm{sort}(|V|))$
– p.26/26
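The label-propagation step is a merge join on the supernode id. A minimal sketch with hypothetical names, assuming `contracted_to` holds (node, supernode) pairs and `super_label` holds one (supernode, label) pair for every supernode; with several reduction phases it would be applied once per phase, in reverse order of the contractions.

```python
def propagate_labels(contracted_to: list[tuple[int, int]],
                     super_label: list[tuple[int, int]]) -> dict[int, int]:
    """Give every node the component label of the supernode it was contracted to."""
    nodes = sorted(contracted_to, key=lambda p: p[1])  # sort nodes by supernode id
    labels = sorted(super_label)                       # sort labels by supernode id
    result: dict[int, int] = {}
    j = 0
    for node, sup in nodes:        # scan both sorted lists simultaneously
        while labels[j][0] < sup:  # advance the label list to this supernode
            j += 1
        result[node] = labels[j][1]
    return result


pairs = [(1, 1), (5, 1), (2, 2), (6, 2), (7, 7)]
print(propagate_labels(pairs, [(7, 70), (1, 10), (2, 20)]))
# {1: 10, 5: 10, 2: 20, 6: 20, 7: 70}
```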
