Agni: An Efficient Dual-access File System over Object Storage
Kunal Lillaney, Vasily Tarasov, David Pease, Randal Burns
23 November, SoCC’19 - Santa Cruz
SoCC’19
What is Dual-Access?
Dual Access
[Diagram: two applications transparently share the same data, one through the File Interface and one through the Object Interface.]
Use Cases
▸Media ▸Life Science ▸Neuroscience ▸Geo-Informatics
Media Transcoding, Editing, Analytics
[Diagram labels: Object Interface, Object Interface, File Interface]
File Systems vs Object Storage
▸Partial Writes ▸Namespace ▸Interfaces
[Marks shown across the slide builds: ❌ ✅ ❌]
Outline
▸Design considerations ▸Existing systems ▸Agni
https://www.greenbiz.com/sites/default/files/styles/gbz_article_primary_breakpoints_kalapicture_screenmd_1x/public/images/articles/featured/datacenter_0.jpg?itok=iJm7ezgB&timestamp=1483504030
Design Considerations
Object based Generic Efficient Interfaces Distributed Coherent Namespace
Generic Efficient Interfaces Coherent Namespace
Generic → Object Store Agnostic
Dual Access System
▸GET ▸PUT ▸DELETE ▸MULTI-PART ▸EVENTS
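The five operations listed on this slide can be read as the minimal contract a dual-access layer needs from any object store. The following is a minimal sketch under that reading. The names `ObjectStore`, `InMemoryStore`, `multipart_put`, and `subscribe` are hypothetical and are not taken from Agni or from any vendor SDK.

```python
from typing import Callable, Dict, List, Protocol, Tuple

Event = Tuple[str, str]  # (operation, key), e.g. ("PUT", "movie/avatar")


class ObjectStore(Protocol):
    """Hypothetical minimal contract: GET, PUT, DELETE, MULTI-PART, EVENTS."""

    def get(self, key: str) -> bytes: ...
    def put(self, key: str, data: bytes) -> None: ...
    def delete(self, key: str) -> None: ...
    def multipart_put(self, key: str, parts: List[bytes]) -> None: ...
    def subscribe(self, callback: Callable[[Event], None]) -> None: ...


class InMemoryStore:
    """Toy backend used only to exercise the contract."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._listeners: List[Callable[[Event], None]] = []

    def _notify(self, op: str, key: str) -> None:
        # Deliver the event to every registered listener.
        for listener in self._listeners:
            listener((op, key))

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
        self._notify("PUT", key)

    def delete(self, key: str) -> None:
        del self._objects[key]
        self._notify("DELETE", key)

    def multipart_put(self, key: str, parts: List[bytes]) -> None:
        # Parts are concatenated in order and become visible as one object.
        self._objects[key] = b"".join(parts)
        self._notify("PUT", key)

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._listeners.append(callback)
```

Any backend that satisfies such a contract could in principle sit under the dual-access layer, which is the sense of "object store agnostic" here.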
Dual Access: File to Object Mapping
Mapping  File     Object
1→1      A        A
1→N      A        A1 A2 A3
N→1      A B C    A B C
Efficiency: File to Object Mapping
Mapping  File     Object      Marked on slide
1→1      A        A           A
1→N      A        A1 A2 A3    A2
N→1      A B C    A B C       A B C
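One way to read the efficiency comparison: object stores replace whole objects, so the cost of a small file update depends on how much object data must be uploaded again. The sketch below works under that assumption. The function name and all sizes are hypothetical and are not measurements from the talk.

```python
def bytes_rewritten(mapping: str, file_size: int, fragment_size: int,
                    files_per_object: int) -> int:
    """Bytes uploaded again to change a small range inside one file.

    Assumes objects are immutable, so any change re-uploads every object
    that holds part of the changed range, and that the change fits within
    a single fragment.
    """
    if mapping == "1→1":
        # The file is one object: the whole object is replaced.
        return file_size
    if mapping == "1→N":
        # The file is split into fragments: only the touched fragment is replaced.
        return min(fragment_size, file_size)
    if mapping == "N→1":
        # Several equally sized files are packed into one object:
        # the packed object is replaced.
        return file_size * files_per_object
    raise ValueError(f"unknown mapping: {mapping}")


MIB = 1024 * 1024
for name in ("1→1", "1→N", "N→1"):
    cost = bytes_rewritten(name, file_size=64 * MIB, fragment_size=4 * MIB,
                           files_per_object=3)
    print(name, cost // MIB, "MiB")
```

With these hypothetical sizes, 1→1 re-uploads 64 MiB, 1→N re-uploads 4 MiB, and N→1 re-uploads 192 MiB for the same small update.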
Namespace: Example of Incoherency
File system tree: A/ contains C/ and B; C/ contains D
Object keys: A/ A/C/ A/C/D A/B
Object key added through the object interface: A/C
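The incoherency in this example can be detected mechanically: a flat object namespace accepts both `A/C` and `A/C/D`, but a file system cannot present `C` as a file and a directory at once. The following is a small sketch of such a check. The function name is hypothetical and the check is illustrative only, not Agni's actual mechanism.

```python
from typing import Iterable, List


def incoherent_keys(keys: Iterable[str]) -> List[str]:
    """Return keys that name a file where another key implies a directory."""
    key_set = set(keys)
    directories = set()
    for key in key_set:
        parts = key.rstrip("/").split("/")
        # Every proper prefix of a key is an implied directory.
        for i in range(1, len(parts)):
            directories.add("/".join(parts[:i]))
        # A key ending in "/" is an explicit directory marker.
        if key.endswith("/"):
            directories.add(key.rstrip("/"))
    files = {k for k in key_set if not k.endswith("/")}
    return sorted(files & directories)


print(incoherent_keys(["A/", "A/C/", "A/C/D", "A/B"]))         # []
print(incoherent_keys(["A/", "A/C/", "A/C/D", "A/B", "A/C"]))  # ['A/C']
```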
Outline
▸Existing systems
File Systems Paired With Object Storage
[Diagram: Object Storage exposes an Object Interface; a paired File System exposes a File Interface.]
Object Storage File System
[Diagram: a File System Abstraction over Object Storage exposes a File Interface.]
Existing Systems
Object Storage File Systems
▸Object Store Like: S3FS, GoogleFuse ▸File System Like: CephFS, MarFS ▸Customized Hybrid: ProxyFS, OpenIOFS
Outline
▸Agni
Architecture
[Diagram: Agni keeps Data in Object Storage and Metadata in Key-Value Storage.]
Our Design Choices
1→1 mapping: File A → Object A
What about inefficient writes to immutable objects?
Multi-Tier Data Structure
▸Tiers: Cache, Log, Base ▸Storage: Memory, Object Storage ▸Interfaces: File Interface, Object Interface ▸Operations: Write, Flush, Merge
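The Write, Flush, and Merge steps can be sketched as a toy model of a single file. This is an illustration under assumptions: writes land in the cache, a flush turns the dirty cache into one log entry, and a merge folds log entries into the base that object clients read. The class and method names are hypothetical and do not come from the Agni code base.

```python
from typing import Dict, List


class MultiTierFile:
    """Toy model of one file: cache (memory), log, and base object.

    Blocks are keyed by block number.
    """

    def __init__(self, base: Dict[int, bytes]) -> None:
        self.base: Dict[int, bytes] = dict(base)   # what object clients read
        self.log: List[Dict[int, bytes]] = []      # flushed, not yet merged
        self.cache: Dict[int, bytes] = {}          # dirty blocks in memory

    def write(self, block: int, data: bytes) -> None:
        # Writes from the file interface land in the cache only.
        self.cache[block] = data

    def flush(self) -> None:
        # Dirty cache blocks are persisted together as one log entry.
        if self.cache:
            self.log.append(self.cache)
            self.cache = {}

    def merge(self) -> None:
        # Log entries are applied oldest first, producing a new base.
        for entry in self.log:
            self.base.update(entry)
        self.log = []

    def read(self, block: int) -> bytes:
        # File reads see the newest copy: cache, then log (newest first), then base.
        if block in self.cache:
            return self.cache[block]
        for entry in reversed(self.log):
            if block in entry:
                return entry[block]
        return self.base[block]
```

In this model a file write is immediately readable through `read`, while `base` changes only after `merge`, so an object client sees the update with some delay.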
Data Layout
File A, Logical File View: Blocks 1 2 3 4
Cache (Memory): Cache Blocks 1 2 3 4
Log: Log Objects A1 (blocks 1 4), A2 (blocks 1 3 4)
Base (Object Storage): Base Object A
We Call This Approach Eventual 1→1 Mapping
File → Object: File to Object Visibility Lag
Eventual 1→1 Mapping Works Both Ways
Object → File: Object to File Visibility Lag
Design: Object to File
[Diagram: Agni with Object Storage (Data) and Key-Value Storage (Metadata). Writes arriving through the Object Interface pass through a Notification Processor and become Visible through the File Interface.]
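The role of the Notification Processor can be illustrated with a small sketch: object-store events are consumed and applied to the metadata that the file interface consults. The event shape and the function name are assumptions made for illustration; the slides do not specify them.

```python
from typing import Iterable, Set, Tuple

Event = Tuple[str, str]  # (operation, object key); hypothetical event shape


def process_notifications(events: Iterable[Event],
                          visible_files: Set[str]) -> Set[str]:
    """Apply object-store events to the set of paths the file interface lists."""
    for operation, key in events:
        if operation == "PUT":
            visible_files.add(key)        # new or replaced object becomes visible
        elif operation == "DELETE":
            visible_files.discard(key)    # removed object disappears
    return visible_files
```

Until an event has been processed, the file interface does not yet show the change, which corresponds to the object to file visibility lag.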
Namespace Management
Path: / (inode 1), movie/ (inode 5), avatar (inode 8)
Master index
Key          Value            Key type       Group
5            inode metadata   Inode key      Directory metadata
1→movie      5                Lookup key     Directory metadata
5→Children   [<8,avatar>]     Children key   Directory metadata
8            inode metadata   Inode key      File metadata
5→avatar     8                Lookup key     File metadata
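Path resolution over the master index can be sketched with a plain dictionary standing in for the key-value store. The entries mirror the `/movie/avatar` example on the slide. The string encoding of the keys, the root inode constant, and the function names are assumptions for this sketch.

```python
from typing import Dict, List, Tuple, Union

Value = Union[int, str, List[Tuple[int, str]]]

ROOT_INODE = 1

# Master index for /movie/avatar, with keys flattened to strings for the sketch.
master_index: Dict[str, Value] = {
    "5": "inode metadata",             # inode key (directory movie/)
    "1→movie": 5,                      # lookup key: parent inode → name
    "5→Children": [(8, "avatar")],     # children key
    "8": "inode metadata",             # inode key (file avatar)
    "5→avatar": 8,                     # lookup key
}


def resolve(path: str, index: Dict[str, Value]) -> int:
    """Walk lookup keys from the root inode to the inode of `path`."""
    inode = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:
            continue  # the path was "/" itself
        inode = index[f"{inode}→{name}"]  # KeyError if the name does not exist
    return inode


def list_directory(inode: int, index: Dict[str, Value]) -> List[str]:
    """Names under a directory, read from its children key."""
    return [name for _, name in index.get(f"{inode}→Children", [])]


print(resolve("/movie/avatar", master_index))  # 8
print(list_directory(5, master_index))         # ['avatar']
```

Each path component costs one key-value lookup, and a directory listing reads a single children key.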
Fragment Map
Master index
Key        Value            Group
8          inode metadata   File metadata
5→avatar   8                File metadata
<8, #1>    Fragment map     Block pointers
<8, #2>    Fragment map     Block pointers
<8, #3>    Fragment map     Block pointers
…          …
<8, #N>    Fragment map     Block pointers
Fragment map (temporally indexed)
T2   Cache entry
T0   Base entry
T1   Log entry #1
T2   Log entry #2
…    …
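A temporally indexed fragment map can be sketched as a list of timestamped entries from which a reader picks the newest one. The field names, the use of integers for T0, T1, T2, and the tie-breaking rule between tiers are assumptions for this sketch; the slides show only the entries and their timestamps.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entry:
    timestamp: int   # T0, T1, T2, ... as integers
    tier: str        # "base", "log", or "cache"
    location: str    # hypothetical pointer, e.g. an object name


# Tiers closer to the writer win a timestamp tie (assumption for this sketch).
TIER_RANK = {"base": 0, "log": 1, "cache": 2}


def newest(entries: List[Entry]) -> Entry:
    """Pick the entry that holds the current copy of a fragment."""
    return max(entries, key=lambda e: (e.timestamp, TIER_RANK[e.tier]))


fragment_map = [
    Entry(2, "cache", "memory"),
    Entry(0, "base", "A"),
    Entry(1, "log", "A1"),
    Entry(2, "log", "A2"),
]

print(newest(fragment_map).tier)  # cache
```

Because every entry carries a timestamp, the same list can serve a read from the cache when one exists and fall back to the latest log or base entry otherwise.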
Evaluation: Applications
▸ ffmpeg: 320 GB of MPEG to MOV files ▸ bowtie: 80 GB genome files ▸ Agni+Merge denotes when dual access is enabled
[Figure: run time (minutes) versus number of nodes for ffmpeg and bowtie, comparing Manual, S3FS, S3QL, Agni, and Agni+Merge. Callouts across the slide builds: 60% faster, 40% faster, 40% faster, 20% faster.]
Summary
▸Complete dual access with all desired features ▸Cloud Neutral ▸Outperforms existing dual-access systems ▸Adding unified access control on roadmap
Thank you!