Agni: An Efficient Dual-access File System over Object Storage



slide-1
SLIDE 1


23 November, SoCC’19 - Santa Cruz

Kunal Lillaney, Vasily Tarasov, David Pease, Randal Burns

Agni: An Efficient Dual-access File System over Object Storage

slide-2
SLIDE 2

SoCC’19

What is Dual-Access?

32

slide-3
SLIDE 3

SoCC’19

Dual Access

33

Data

slide-4
SLIDE 4

SoCC’19

Dual Access

34

File Interface Application Data

slide-5
SLIDE 5

SoCC’19

Dual Access

35

File Interface Object Interface Application Application Data

slide-6
SLIDE 6

SoCC’19

Dual Access

36

File Interface Object Interface Transparently Application Application Data

slide-7
SLIDE 7

SoCC’19

37

Use Cases

slide-8
SLIDE 8

SoCC’19

38

Use Cases Media

slide-9
SLIDE 9

SoCC’19

39

Life Science Use Cases Media

slide-10
SLIDE 10

SoCC’19

40

Life Science Geo-Informatics Use Cases Media

slide-11
SLIDE 11

SoCC’19

41

Life Science Neuroscience Geo-Informatics Use Cases Media

slide-12
SLIDE 12

SoCC’19

Media Transcoding, Editing, Analytics

42

slide-13
SLIDE 13

SoCC’19

Media Transcoding, Editing, Analytics

43

Object Interface

slide-14
SLIDE 14

SoCC’19

Media Transcoding, Editing, Analytics

44

Object Interface File Interface

slide-15
SLIDE 15

SoCC’19

Media Transcoding, Editing, Analytics

45

Object Interface Object Interface File Interface

slide-16
SLIDE 16

SoCC’19

File Systems vs Object Storage

46

slide-17
SLIDE 17

SoCC’19

File Systems vs Object Storage

47

Partial Writes

slide-18
SLIDE 18

SoCC’19

File Systems vs Object Storage

49

Namespace Partial Writes

slide-19
SLIDE 19

SoCC’19

File Systems vs Object Storage

50

Namespace Partial Writes

slide-20
SLIDE 20

SoCC’19

File Systems vs Object Storage

51

Namespace Partial Writes Interfaces

slide-21
SLIDE 21

SoCC’19

File Systems vs Object Storage

52

Namespace Partial Writes Interfaces

slide-22
SLIDE 22

SoCC’19

Outline

▸Design considerations ▸Existing systems ▸Agni

https://www.greenbiz.com/sites/default/files/styles/gbz_article_primary_breakpoints_kalapicture_screenmd_1x/public/images/articles/featured/datacenter_0.jpg?itok=iJm7ezgB&timestamp=1483504030

slide-23
SLIDE 23

SoCC’19

Outline

▸Design considerations

https://www.greenbiz.com/sites/default/files/styles/gbz_article_primary_breakpoints_kalapicture_screenmd_1x/public/images/articles/featured/datacenter_0.jpg?itok=iJm7ezgB&timestamp=1483504030

slide-24
SLIDE 24

SoCC’19

Object based Generic Efficient Interfaces Distributed Coherent Namespace

Design Considerations

55

slide-25
SLIDE 25

SoCC’19

Design Considerations

56

Generic Efficient Interfaces Coherent Namespace

slide-26
SLIDE 26

SoCC’19

Generic → Object Store Agnostic

57

slide-27
SLIDE 27

SoCC’19

Generic → Object Store Agnostic

58

Dual Access System

slide-28
SLIDE 28

SoCC’19

Generic → Object Store Agnostic

59

GET PUT DELETE MULTI-PART EVENTS

Dual Access System

slide-29
SLIDE 29

SoCC’19

Generic → Object Store Agnostic

60

GET PUT DELETE MULTI-PART EVENTS

Dual Access System

slide-30
SLIDE 30

SoCC’19

Generic → Object Store Agnostic

61

GET PUT DELETE MULTI-PART EVENTS

Dual Access System

slide-31
SLIDE 31

SoCC’19

Generic → Object Store Agnostic

62

GET PUT DELETE MULTI-PART EVENTS

Dual Access System

slide-32
SLIDE 32

SoCC’19

Generic → Object Store Agnostic

63

GET PUT DELETE MULTI-PART EVENTS

Dual Access System

slide-33
SLIDE 33

SoCC’19

Generic → Object Store Agnostic

64

GET PUT DELETE MULTI-PART EVENTS

Dual Access System
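The five operations listed on these slides (GET, PUT, DELETE, MULTI-PART, EVENTS) can be read as the whole contract a store-agnostic dual-access system depends on. The sketch below is only an illustration of that idea: the class names, method names, and the in-memory backend are hypothetical and are not taken from Agni.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Listener = Callable[[str, str], None]  # (event name, object key)


class ObjectStore(ABC):
    """Hypothetical minimal interface: only the five operations named on the slide."""

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def multipart_put(self, key: str, parts: List[bytes]) -> None: ...

    @abstractmethod
    def subscribe(self, listener: Listener) -> None: ...


class InMemoryStore(ObjectStore):
    """Toy backend used only to exercise the interface."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._listeners: List[Listener] = []

    def _emit(self, event: str, key: str) -> None:
        for listener in self._listeners:
            listener(event, key)

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)
        self._emit("put", key)

    def delete(self, key: str) -> None:
        del self._objects[key]
        self._emit("delete", key)

    def multipart_put(self, key: str, parts: List[bytes]) -> None:
        # A real store uploads parts separately; here they are simply joined.
        self.put(key, b"".join(parts))

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)
```

Any backend that can implement these five methods could, under this assumption, sit underneath the same dual-access layer.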

slide-34
SLIDE 34

SoCC’19

File Object

Dual Access: File to Object Mapping

65

slide-35
SLIDE 35

SoCC’19

File Object

Dual Access: File to Object Mapping

66

A A

1→1

slide-36
SLIDE 36

SoCC’19

File Object

Dual Access: File to Object Mapping

67

A A1 A2 A3 A A

1→1 1→N

slide-37
SLIDE 37

SoCC’19

File Object

Dual Access: File to Object Mapping

68

A A1 A2 A3 A A C A B A B C

1→1 1→N N→1

slide-38
SLIDE 38

SoCC’19

File Object

Dual Access: File to Object Mapping

69

A A1 A2 A3 A A C A B A B C

1→1 1→N N→1

slide-39
SLIDE 39

SoCC’19

File Object

Efficiency: File to Object Mapping

70

A A1 A2 A3 A A C A B A B C

1→1 1→N N→1

slide-40
SLIDE 40

SoCC’19

File Object

Efficiency: File to Object Mapping

71

A A1 A2 A3 A A C A B A B C

1→1 1→N N→1

A

slide-41
SLIDE 41

SoCC’19

File Object

Efficiency: File to Object Mapping

72

A A1 A2 A3 A A C A B A B C

1→1 1→N N→1

A A2

slide-42
SLIDE 42

SoCC’19

File Object

Efficiency: File to Object Mapping

73

A A1 A2 A3 A A C A B A B C

1→1 1→N N→1

A A2 A B C

slide-43
SLIDE 43

SoCC’19

File Object

Efficiency: File to Object Mapping

74

A A1 A2 A3 A A C A B A B C

1→1 1→N N→1

A A2 A B C
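One way to read the efficiency comparison above: a small in-place write to file A forces a rewrite of the whole object under 1→1, only the touched chunk (A2) under 1→N, and the entire packed object (A, B, C) under N→1. The sketch below turns that reading into arithmetic. The function names, sizes, and chunking scheme are assumptions made for illustration, not measurements from the talk.

```python
from typing import List


def rewrite_cost_one_to_one(file_size: int) -> int:
    """1->1: the single object holding the file is rewritten in full."""
    return file_size


def rewrite_cost_one_to_n(file_size: int, chunk_size: int,
                          offset: int, length: int) -> int:
    """1->N: only the chunk objects overlapping the write are rewritten.

    Assumes length >= 1 and that the write lies inside the file.
    """
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    total = 0
    for i in range(first, last + 1):
        start = i * chunk_size
        total += min(chunk_size, file_size - start)  # last chunk may be short
    return total


def rewrite_cost_n_to_one(packed_file_sizes: List[int]) -> int:
    """N->1: the object packing several files is rewritten in full."""
    return sum(packed_file_sizes)


# A 10-byte write at offset 120 into a 300-byte file A.
cost_1_1 = rewrite_cost_one_to_one(300)
cost_1_n = rewrite_cost_one_to_n(300, chunk_size=100, offset=120, length=10)
cost_n_1 = rewrite_cost_n_to_one([300, 200, 500])  # A packed with B and C
```

Under these assumed sizes, 1→N rewrites the least data, but it gives up the property that one file is visible as one object.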

slide-44
SLIDE 44

SoCC’19

Namespace: Example of Incoherency

75

A/ C/ B D

slide-45
SLIDE 45

SoCC’19

Namespace: Example of Incoherency

76

A/ C/ B D A/ A/C/ A/C/D A/B

slide-46
SLIDE 46

SoCC’19

Namespace: Example of Incoherency

77

A/ C/ B D A/ A/C/ A/C/D A/B A/C
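The incoherency in this example appears to come from the flat object namespace: keys `A/C/` and `A/C` can coexist in an object store, but a file system cannot hold both a directory and a file named C under A. A minimal sketch of detecting such clashes follows; the function name and the convention that a trailing slash marks a directory are assumptions for illustration.

```python
from typing import Iterable, List


def find_namespace_conflicts(keys: Iterable[str]) -> List[str]:
    """Return paths that are used both as a file and as a directory."""
    files = set()
    dirs = set()
    for key in keys:
        parts = key.rstrip("/").split("/")
        if key.endswith("/"):
            dir_depth = len(parts)      # the key itself names a directory
        else:
            dir_depth = len(parts) - 1  # only its parents are directories
            files.add(key)
        for i in range(1, dir_depth + 1):
            dirs.add("/".join(parts[:i]))
    return sorted(files & dirs)


coherent = ["A/", "A/C/", "A/C/D", "A/B"]
incoherent = coherent + ["A/C"]
```

With the first key set the function returns an empty list; after the object `A/C` is added it reports `A/C` as a clash.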

slide-47
SLIDE 47

SoCC’19

Outline

▸Existing systems

https://www.greenbiz.com/sites/default/files/styles/gbz_article_primary_breakpoints_kalapicture_screenmd_1x/public/images/articles/featured/datacenter_0.jpg?itok=iJm7ezgB&timestamp=1483504030

slide-48
SLIDE 48

SoCC’19

File Systems Paired With Object Storage

79

Object Storage Object Interface

slide-49
SLIDE 49

SoCC’19

File Systems Paired With Object Storage

80

Object Storage File System Object Interface

slide-50
SLIDE 50

SoCC’19

File Systems Paired With Object Storage

81

Object Storage File Interface File System Object Interface

slide-51
SLIDE 51

SoCC’19

Object Storage File System

82

Object Storage

slide-52
SLIDE 52

SoCC’19

Object Storage File System

83

Object Storage File System Abstraction File Interface

slide-53
SLIDE 53

SoCC’19

Existing Systems

84

Object Storage File Systems

slide-54
SLIDE 54

SoCC’19

Existing Systems

85

Object Store Like Object Storage File Systems

S3FS, GoogleFuse

slide-55
SLIDE 55

SoCC’19

Existing Systems

86

Object Store Like Object Storage File Systems File System Like

S3FS, GoogleFuse CephFS, MarFS

slide-56
SLIDE 56

SoCC’19

Existing Systems

87

Object Store Like Object Storage File Systems File System Like Customized Hybrid

S3FS, GoogleFuse CephFS, MarFS ProxyFS, OpenIOFS

slide-57
SLIDE 57

SoCC’19

Outline

▸Agni

slide-58
SLIDE 58

SoCC’19

Architecture

89

Agni

slide-59
SLIDE 59

SoCC’19

Architecture

90

Agni Object Storage

slide-60
SLIDE 60

SoCC’19

Architecture

91

Agni Object Storage Key-Value Storage

slide-61
SLIDE 61

SoCC’19

Architecture

92

Agni Object Storage Key-Value Storage Data

slide-62
SLIDE 62

SoCC’19

Architecture

93

Agni Object Storage Key-Value Storage Data Metadata

slide-63
SLIDE 63

SoCC’19

Architecture

94

Agni Object Storage Key-Value Storage Data Metadata

slide-64
SLIDE 64

SoCC’19

Our Design Choices

95

File Object

slide-65
SLIDE 65

SoCC’19

Our Design Choices

96

File Object

A A

1→1

A

What about inefficient writes to immutable objects?
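To make the question concrete: with a plain 1→1 mapping and immutable objects, even a tiny in-place update becomes a full GET followed by a full PUT. The sketch below models that read-modify-write cycle with a dictionary standing in for the object store; `partial_write` is a hypothetical helper, not part of Agni.

```python
from typing import Dict


def partial_write(store: Dict[str, bytes], key: str,
                  offset: int, data: bytes) -> int:
    """Apply a partial write to an immutable object; return bytes transferred."""
    old = store[key]                          # GET: download the whole object
    buf = bytearray(old)
    end = offset + len(data)
    if len(buf) < end:                        # the write extends the object
        buf.extend(b"\0" * (end - len(buf)))
    buf[offset:end] = data                    # modify locally
    store[key] = bytes(buf)                   # PUT: upload the whole object
    return len(old) + len(buf)


store = {"A": b"x" * 1000}
moved = partial_write(store, "A", offset=10, data=b"yy")
```

Here a 2-byte update moves 2000 bytes, which is the inefficiency the multi-tier design on the following slides is meant to avoid.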

slide-66
SLIDE 66

SoCC’19

Multi-Tier Data Structure

97

slide-67
SLIDE 67

SoCC’19

Multi-Tier Data Structure

98

Agni File Interface

slide-68
SLIDE 68

SoCC’19

Multi-Tier Data Structure

99

Agni Cache Memory File Interface

slide-69
SLIDE 69

SoCC’19

Multi-Tier Data Structure

10

Agni Cache Log Memory File Interface

slide-70
SLIDE 70

SoCC’19

Multi-Tier Data Structure

10 1

Agni Cache Log Base Memory Object Storage File Interface

slide-71
SLIDE 71

SoCC’19

Multi-Tier Data Structure

10 2

Agni Cache Log Base Memory Object Storage File Interface

slide-72
SLIDE 72

SoCC’19

Multi-Tier Data Structure

10 3

Agni Cache Log Base Memory Object Storage File Interface Write

slide-73
SLIDE 73

SoCC’19

Multi-Tier Data Structure

10 4

Agni Cache Log Base Memory Object Storage File Interface Flush

slide-74
SLIDE 74

SoCC’19

Multi-Tier Data Structure

10 5

Agni Cache Log Base Memory Object Storage File Interface Merge

slide-75
SLIDE 75

SoCC’19

Multi-Tier Data Structure

10 6

Agni Cache Log Base Memory Object Storage File Interface Object Interface
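The build-up above suggests a three-step path for data: a write lands in the in-memory cache, a flush moves it to a log in object storage, and a merge folds the log into the base object that the object interface sees. The following is a minimal sketch of that flow under simplifying assumptions (whole-block writes, dictionaries in place of real objects, cache emptied on flush); the class and method names are hypothetical.

```python
from typing import Dict, List


class MultiTierFile:
    """Toy model of one file spread over cache, log, and base tiers."""

    def __init__(self, base: bytes, block_size: int = 4) -> None:
        self.block_size = block_size
        self.cache: Dict[int, bytes] = {}      # memory tier
        self.log: List[Dict[int, bytes]] = []  # each dict stands in for a log object
        self.base = bytes(base)                # the single base object

    def _apply(self, buf: bytearray, blocks: Dict[int, bytes]) -> None:
        for index, data in blocks.items():
            start = index * self.block_size
            end = start + self.block_size
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))
            buf[start:end] = data

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("this sketch only accepts whole blocks")
        self.cache[index] = bytes(data)

    def flush(self) -> None:
        """Move dirty cache blocks into a new log object."""
        if self.cache:
            self.log.append(dict(self.cache))
            self.cache.clear()

    def merge(self) -> None:
        """Fold log objects, oldest first, into the base object."""
        buf = bytearray(self.base)
        for log_object in self.log:
            self._apply(buf, log_object)
        self.base = bytes(buf)
        self.log.clear()

    def read_file_view(self) -> bytes:
        """File interface: base overlaid with log, then cache."""
        buf = bytearray(self.base)
        for layer in self.log + [self.cache]:
            self._apply(buf, layer)
        return bytes(buf)

    def read_object_view(self) -> bytes:
        """Object interface: only the base object is visible."""
        return self.base
```

In this model the object view lags the file view until `merge` runs, which matches the visibility lag discussed later in the talk.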

slide-76
SLIDE 76

SoCC’19

Data Layout

10 7

File

slide-77
SLIDE 77

SoCC’19

Data Layout

10 8

File Logical File View A

slide-78
SLIDE 78

SoCC’19

Data Layout

10 9

File Logical File View

1 2 3 4

A Blocks

slide-79
SLIDE 79

SoCC’19

Data Layout

11

File Memory Cache Logical File View

1 2 3 4

A Blocks

slide-80
SLIDE 80

SoCC’19

Data Layout

11 1

File Memory Cache Logical File View

1 2 3 4 1 2 3 4

A Blocks Cache Blocks

slide-81
SLIDE 81

SoCC’19

Data Layout

11 2

File Memory Cache Log Logical File View

1 2 3 4 1 2 3 4

A Blocks Cache Blocks

slide-82
SLIDE 82

SoCC’19

Data Layout

11 3

File Memory Cache Log Logical File View

1 2 3 4 1 4 1 2 3 4

A Blocks Cache Blocks Log Objects A1

slide-83
SLIDE 83

SoCC’19

Data Layout

11 4

File Memory Cache Log Logical File View

1 2 3 4 1 4 1 3 4 1 2 3 4

A Blocks Cache Blocks Log Objects A1 A2

slide-84
SLIDE 84

SoCC’19

Data Layout

11 5

File Memory Object Storage Cache Log Base Logical File View

1 2 3 4 1 4 1 3 4 1 2 3 4

A Blocks Cache Blocks Log Objects A1 A2 Base Object

slide-85
SLIDE 85

SoCC’19

Data Layout

11 6

File Memory Object Storage Cache Log Base Logical File View

1 2 3 4 1 4 1 3 4 1 2 3 4 A

A Blocks Cache Blocks Log Objects A1 A2 Base Object
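In the layout above, log object A1 appears to hold blocks 1 and 4, log object A2 holds blocks 1, 3, and 4, and the base object holds all four blocks. Assuming newer log objects supersede older ones, the short sketch below works out which object supplies each block on a read. The function name and the ordering assumption are illustrative only.

```python
from typing import Dict, List, Tuple


def resolve_blocks(base_blocks: List[int],
                   log_objects: List[Tuple[str, List[int]]]) -> Dict[int, str]:
    """Map each block number to the newest object that contains it.

    log_objects must be ordered oldest to newest.
    """
    source = {block: "base" for block in base_blocks}
    for name, blocks in log_objects:
        for block in blocks:
            source[block] = name  # a newer log object overrides older data
    return source


layout = resolve_blocks([1, 2, 3, 4], [("A1", [1, 4]), ("A2", [1, 3, 4])])
```

Under this assumption only block 2 is still served from the base object, and everything written to A1 has been superseded by A2.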

slide-86
SLIDE 86

SoCC’19

We Call This Approach Eventual 1→1 Mapping

117

File Object

slide-87
SLIDE 87

SoCC’19

We Call This Approach Eventual 1→1 Mapping

118

File Object

slide-88
SLIDE 88

SoCC’19

We Call This Approach Eventual 1→1 Mapping

119

File Object

slide-89
SLIDE 89

SoCC’19

We Call This Approach Eventual 1→1 Mapping

120

File Object

File to Object Visibility Lag

slide-90
SLIDE 90

SoCC’19

Eventual 1→1 Mapping Works Both Ways

121

Object File

slide-91
SLIDE 91

SoCC’19

Eventual 1→1 Mapping Works Both Ways

122

Object File

slide-92
SLIDE 92

SoCC’19

Eventual 1→1 Mapping Works Both Ways

123

Object File

slide-93
SLIDE 93

SoCC’19

Eventual 1→1 Mapping Works Both Ways

124

Object File

Object to File Visibility Lag

slide-94
SLIDE 94

SoCC’19

Design: Object to File

12 5

Agni Object Storage Key-Value Storage Data Metadata File Interface

slide-95
SLIDE 95

SoCC’19

Design: Object to File

12 6

Agni Object Storage Key-Value Storage Data Metadata File Interface Object Interface

slide-96
SLIDE 96

SoCC’19

Design: Object to File

12 7

Agni Object Storage Key-Value Storage Data Metadata Notification Processor File Interface Object Interface Visible
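In the object-to-file direction, the diagram places a notification processor between the object store's events and the metadata in key-value storage. A plausible reading is that an object written through the object interface becomes visible to the file interface only after its event is processed. The sketch below models that with a queue; all names and the metadata fields are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class NotificationProcessor:
    """Toy processor that turns object-store events into file metadata."""

    def __init__(self) -> None:
        self.pending: Deque[Tuple[str, str, int]] = deque()
        self.metadata: Dict[str, Dict[str, int]] = {}  # stand-in for the KV store

    def on_event(self, event: str, key: str, size: int = 0) -> None:
        """Called by the object store; the event is queued, not applied."""
        self.pending.append((event, key, size))

    def process_all(self) -> None:
        """Drain the queue and update metadata so the files become visible."""
        while self.pending:
            event, key, size = self.pending.popleft()
            if event == "put":
                self.metadata[key] = {"size": size}
            elif event == "delete":
                self.metadata.pop(key, None)

    def visible_files(self) -> List[str]:
        return sorted(self.metadata)


processor = NotificationProcessor()
processor.on_event("put", "movie/avatar", size=1024)
before = processor.visible_files()  # event queued but not yet processed
processor.process_all()
after = processor.visible_files()
```

The gap between `before` and `after` is the object-to-file visibility lag in miniature.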

slide-97
SLIDE 97

SoCC’19

Namespace Management

12 8

slide-98
SLIDE 98

SoCC’19

Namespace Management

12 9

/ movie/ avatar

slide-99
SLIDE 99

SoCC’19

Namespace Management

13

Key Value Master index / movie/ avatar

slide-100
SLIDE 100

SoCC’19

Namespace Management

13 1

Key Value Master index / movie/ avatar 5 1 8

slide-101
SLIDE 101

SoCC’19

Namespace Management

13 2

Key Value Master index 5 inode metadata / movie/ avatar 5 1 Inode key 8

slide-102
SLIDE 102

SoCC’19

Namespace Management

13 3

Key Value Master index 5 inode metadata 1→movie 5 / movie/ avatar 5 1 Lookup key Inode key 8

slide-103
SLIDE 103

SoCC’19

Namespace Management

13 4

Key Value Master index 5 inode metadata 1→movie 5 5→Children [<8,avatar>] Directory metadata / movie/ avatar 5 1 Lookup key Inode key Children key 8

slide-104
SLIDE 104

SoCC’19

Namespace Management

13 5

Key Value Master index 5 inode metadata 1→movie 5 5→Children [<8,avatar>] 8 inode metadata Directory metadata / movie/ avatar 5 1 Lookup key Inode key Children key Inode key 8

slide-105
SLIDE 105

SoCC’19

Namespace Management

13 6

Key Value Master index 5 inode metadata 1→movie 5 5→Children [<8,avatar>] 8 inode metadata Directory metadata 5→avatar 8 File metadata / movie/ avatar 5 1 Lookup key Inode key Children key Inode key Lookup key 8
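The master index on these slides can be read as a flat key-value table with three kinds of keys: lookup keys (parent inode and name, giving a child inode), inode keys (giving metadata), and children keys (giving a directory listing). The sketch below rebuilds the `/movie/avatar` example with inode numbers 1, 5, and 8 as shown. The tuple key encoding, the metadata fields, and the explicit root entry are assumptions made for illustration.

```python
from typing import Any, Dict, Tuple

ROOT_INODE = 1

# Keys are tuples here; the actual key encoding is not shown on the slides.
index: Dict[Tuple[Any, ...], Any] = {
    ("inode", 1): {"type": "dir"},     # root "/" (assumed entry)
    ("lookup", 1, "movie"): 5,         # 1→movie = 5
    ("inode", 5): {"type": "dir"},     # inode metadata for movie/
    ("children", 5): [(8, "avatar")],  # 5→Children = [<8,avatar>]
    ("lookup", 5, "avatar"): 8,        # 5→avatar = 8
    ("inode", 8): {"type": "file"},    # inode metadata for avatar
}


def resolve(table: Dict[Tuple[Any, ...], Any], path: str) -> int:
    """Walk the path one component at a time using lookup keys."""
    inode = ROOT_INODE
    for name in (part for part in path.split("/") if part):
        inode = table[("lookup", inode, name)]
    return inode


avatar_inode = resolve(index, "/movie/avatar")
avatar_meta = index[("inode", avatar_inode)]
```

Each path component costs one key-value lookup in this model, and a directory listing is a single read of the children key.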

slide-106
SLIDE 106

SoCC’19

Fragment Map

13 7

Key Value Master index

slide-107
SLIDE 107

SoCC’19

Fragment Map

13 8

Key Value Master index 8 inode metadata 5→avatar 8 File metadata

slide-108
SLIDE 108

SoCC’19

Fragment Map

13 9

Key Value Master index 8 inode metadata 5→avatar 8 <8, #1> Fragment map <8, #2> Fragment map <8, #3> Fragment map <8, #N> Fragment map … … File metadata Block pointers

slide-109
SLIDE 109

SoCC’19

Fragment Map

14

Key Value Master index 8 inode metadata 5→avatar 8 <8, #1> Fragment map <8, #2> Fragment map <8, #3> Fragment map <8, #N> Fragment map … … Fragment map

slide-110
SLIDE 110

SoCC’19

Fragment Map

14 1

Key Value Master index 8 inode metadata 5→avatar 8 <8, #1> Fragment map <8, #2> Fragment map <8, #3> Fragment map <8, #N> Fragment map … … Fragment map T0 Base entry

slide-111
SLIDE 111

SoCC’19

Fragment Map

14 2

Key Value Master index 8 inode metadata 5→avatar 8 <8, #1> Fragment map <8, #2> Fragment map <8, #3> Fragment map <8, #N> Fragment map … … Fragment map T0 Base entry T1 Log entry #1 T2 Log entry #2 … … … …

slide-112
SLIDE 112

SoCC’19

Fragment Map

14 3

Key Value Master index 8 inode metadata 5→avatar 8 <8, #1> Fragment map <8, #2> Fragment map <8, #3> Fragment map <8, #N> Fragment map … … Fragment map T2 Cache entry T0 Base entry T1 Log entry #1 T2 Log entry #2 … … … …

slide-113
SLIDE 113

SoCC’19

Fragment Map

14 4

Key Value Master index 8 inode metadata 5→avatar 8 <8, #1> Fragment map <8, #2> Fragment map <8, #3> Fragment map <8, #N> Fragment map … … Fragment map T2 Cache entry T0 Base entry T1 Log entry #1 T2 Log entry #2 … … … …

Temporally indexed
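"Temporally indexed" suggests that each fragment map keeps its base, log, and cache entries ordered by timestamp, so the newest entry can be found quickly. A minimal sketch of such a structure, using Python's standard `bisect` module, is shown below. The class, the entry fields, and the `as_of` query are hypothetical and only illustrate the idea.

```python
import bisect
from typing import List, Tuple

Entry = Tuple[str, str]  # (tier, location), e.g. ("log", "A1")


class FragmentMap:
    """Entries for one file fragment, kept sorted by timestamp."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._entries: List[Entry] = []

    def add(self, timestamp: int, tier: str, location: str) -> None:
        # bisect_right keeps insertion order among equal timestamps.
        i = bisect.bisect_right(self._times, timestamp)
        self._times.insert(i, timestamp)
        self._entries.insert(i, (tier, location))

    def latest(self) -> Entry:
        """Newest entry: where a read of this fragment should go."""
        return self._entries[-1]

    def as_of(self, timestamp: int) -> Entry:
        """Newest entry with a timestamp at or before the given time."""
        i = bisect.bisect_right(self._times, timestamp)
        if i == 0:
            raise KeyError("no entry at or before this timestamp")
        return self._entries[i - 1]


fragment = FragmentMap()
fragment.add(0, "base", "A")   # T0 base entry
fragment.add(2, "log", "A2")   # T2 log entry #2, added out of order on purpose
fragment.add(1, "log", "A1")   # T1 log entry #1
```

Keeping entries time-ordered also makes a merge straightforward in this model: replay everything after the base entry in order, then replace the list with a single new base entry.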

slide-114
SLIDE 114

SoCC’19

▸ ffmpeg: 320 GB of MPEG to MOV files ▸ bowtie: 80 GB genome files ▸ Agni+Merge denotes when dual access is enabled

[Figure: run time (minutes) vs. number of nodes for ffmpeg and bowtie, comparing Manual, S3FS, S3QL, Agni, and Agni+Merge]

14 5

Evaluation: Applications

🙃 ☹

slide-115
SLIDE 115

SoCC’19

▸ ffmpeg: 320 GB of MPEG to MOV files ▸ bowtie: 80 GB genome files ▸ Agni+Merge denotes when dual access is enabled

[Figure: run time (minutes) vs. number of nodes for ffmpeg and bowtie, comparing Manual, S3FS, S3QL, Agni, and Agni+Merge]

14 6

60% faster

Evaluation: Applications

🙃 ☹

slide-116
SLIDE 116

SoCC’19

▸ ffmpeg: 320 GB of MPEG to MOV files ▸ bowtie: 80 GB genome files ▸ Agni+Merge denotes when dual access is enabled

[Figure: run time (minutes) vs. number of nodes for ffmpeg and bowtie, comparing Manual, S3FS, S3QL, Agni, and Agni+Merge]

14 7

Evaluation: Applications

40% faster

🙃 ☹

slide-117
SLIDE 117

SoCC’19

▸ ffmpeg: 320 GB of MPEG to MOV files ▸ bowtie: 80 GB genome files ▸ Agni+Merge denotes when dual access is enabled

[Figure: run time (minutes) vs. number of nodes for ffmpeg and bowtie, comparing Manual, S3FS, S3QL, Agni, and Agni+Merge]

14 8

Evaluation: Applications

🙃 ☹

40% faster

slide-118
SLIDE 118

SoCC’19

▸ ffmpeg: 320 GB of MPEG to MOV files ▸ bowtie: 80 GB genome files ▸ Agni+Merge denotes when dual access is enabled

[Figure: run time (minutes) vs. number of nodes for ffmpeg and bowtie, comparing Manual, S3FS, S3QL, Agni, and Agni+Merge]

14 9

Evaluation: Applications

20% faster

🙃 ☹

slide-119
SLIDE 119

SoCC’19

Summary

151

▸Complete dual access with all desired features ▸Cloud Neutral ▸Outperforms existing dual-access systems ▸Adding unified access control on roadmap

slide-120
SLIDE 120

SoCC’19

Thank you!

155

https://github.com/objectfs/objectfs
Contact: lillaney@jhu.edu