Supercomputing 2009
SmartStore: A New Metadata Organization Paradigm with Semantic-Awareness for Next-Generation File Systems
Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian
1
Outline
Motivations
SmartStore System
Key Issues
Performance Evaluation
Discussion and Conclusion
2
Storage capacity → Exabyte (or even larger)
Number of files → Billions
Metadata-based transactions → over 50%
Hierarchical directory tree → Performance Bottleneck
Static and inflexible I/O interfaces
Linear brute-force searching
Lack of full utilization of semantics
3
Millions of files under each directory
4
Quickly return queried results with acceptable tradeoff
Obtain interested knowledge from data ocean to guide
Query for high-dimensional data
Scalability
Reliability
Performance improvements
5
Reduce search space
Not entire large-scale file system
Search correlated metadata
Configure a context related to queries
Desirable interfaces
6
Such as range query and top-k query, i.e., complex queries;
7
Semantic: correlation represented by multi-
Group files based on metadata semantic correlations
Query and other relevant operations can be completed
8
9
Node Vector
Design Objectives
Group sizes are approximately equal.
A file in a group has a higher correlation with other files
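The two design objectives can be illustrated with a small sketch. It is not SmartStore's grouping algorithm: it swaps in plain cosine similarity over attribute vectors and a greedy, capacity-bounded assignment. The function names, the vector layout, and the seeding rule are all assumptions made for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def group_files(vectors, num_groups):
    """Greedy grouping into num_groups groups of near-equal size.

    vectors: dict mapping file name -> attribute vector (hypothetical layout).
    Each file joins the non-full group whose seed it correlates with most.
    """
    names = sorted(vectors)
    capacity = math.ceil(len(names) / num_groups)
    # Assumption: seed the groups with the first num_groups files in name order.
    seeds = names[:num_groups]
    groups = {s: [s] for s in seeds}
    for name in names[num_groups:]:
        # Only groups below capacity are candidates, which keeps sizes balanced.
        open_seeds = [s for s in seeds if len(groups[s]) < capacity]
        best = max(open_seeds, key=lambda s: cosine(vectors[name], vectors[s]))
        groups[best].append(name)
    return list(groups.values())
```

The capacity bound enforces the first objective (near-equal group sizes), and the similarity-based choice approximates the second (higher correlation within a group).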
11
Semantic Grouping: grouping correlated metadata into storage and index units based on Latent Semantic Indexing
Construction of semantic R-trees in a distributed environment
Multiple operations: Insertion, Deletion, Range Query, Top-K NN Query, Point Query
12
Semantic R-tree leaf nodes as storage units
The non-leaf nodes as index units
MBR representation for local metadata
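A minimal sketch of how leaf storage units, non-leaf index units, and MBRs (minimum bounding rectangles) could fit together is given here. The class names, the two-level tree, and the record layout are assumptions for illustration and are not taken from the SmartStore prototype.

```python
from dataclasses import dataclass, field

@dataclass
class MBR:
    """Minimum bounding rectangle: per-dimension lower and upper bounds."""
    low: tuple
    high: tuple

    @staticmethod
    def of_points(points):
        dims = range(len(points[0]))
        return MBR(tuple(min(p[d] for p in points) for d in dims),
                   tuple(max(p[d] for p in points) for d in dims))

    @staticmethod
    def union(boxes):
        dims = range(len(boxes[0].low))
        return MBR(tuple(min(b.low[d] for b in boxes) for d in dims),
                   tuple(max(b.high[d] for b in boxes) for d in dims))

    def intersects(self, other):
        return all(l <= oh and ol <= h for l, h, ol, oh
                   in zip(self.low, self.high, other.low, other.high))

@dataclass
class StorageUnit:
    """Leaf node: holds metadata records as (name, attribute point) pairs."""
    records: list
    mbr: MBR = field(init=False)

    def __post_init__(self):
        self.mbr = MBR.of_points([p for _, p in self.records])

@dataclass
class IndexUnit:
    """Non-leaf node: holds children and the union of their MBRs."""
    children: list
    mbr: MBR = field(init=False)

    def __post_init__(self):
        self.mbr = MBR.union([c.mbr for c in self.children])

def range_query(node, box):
    """Return names of records inside box, skipping subtrees whose MBR misses it."""
    if not node.mbr.intersects(box):
        return []
    if isinstance(node, StorageUnit):
        return [n for n, p in node.records
                if all(l <= x <= h for x, l, h in zip(p, box.low, box.high))]
    return [n for child in node.children for n in range_query(child, box)]
```

Because each index unit only stores the bounding box of its children, a range query can discard a whole subtree with a single intersection test, which is how the search space is reduced.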
13
Insertion
Deletion
On-line Query Approaches: Range Query, Top-K Query, Point Query
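As one illustration of the query types listed, a top-k nearest-neighbor query over MBR-summarized storage units might look like the following. This is a generic best-first pruning sketch, not the algorithm from the SmartStore prototype; the flat list of units and the function names are hypothetical.

```python
import heapq
import math

def min_dist(point, low, high):
    """Smallest Euclidean distance from point to the box [low, high]."""
    return math.sqrt(sum(max(l - x, 0, x - h) ** 2
                         for x, l, h in zip(point, low, high)))

def top_k(units, point, k):
    """Return the k records nearest to point as sorted (distance, name) pairs.

    units: list of (low, high, records) tuples, where records is a list of
    (name, attribute point) pairs. This flat layout is an assumption.
    """
    # Visit storage units in order of their MBR's minimum distance.
    order = sorted(units, key=lambda u: min_dist(point, u[0], u[1]))
    best = []  # max-heap via negated distances, holds at most k entries
    for low, high, records in order:
        if len(best) == k and min_dist(point, low, high) > -best[0][0]:
            break  # no record in this or any later unit can improve the result
        for name, p in records:
            d = math.dist(point, p)
            if len(best) < k:
                heapq.heappush(best, (-d, name))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, name))
    return sorted((-nd, name) for nd, name in best)
```

The early `break` is what makes the MBR summary useful: once the k-th best distance is smaller than the minimum distance to every remaining unit, those units need not be contacted.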
14
Point Query
Accelerate queries
Off-line pre-processing
Each storage unit locally maintains a replica of the
Lazy updating to deal with information staleness
15
Figure: query flow. At each node: Matching? Query : Forward. (4) if fail, continue to forward
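The "Matching? Query : Forward" decision can be read as a per-node check. The sketch below is one hedged interpretation: each node tests the request against a local summary and either answers it or forwards it to the next node. The ring-style forwarding order, the `Node` class, and the one-dimensional range summary are assumptions, not details taken from the slides.

```python
class Node:
    """A storage unit with a 1-D attribute range summary (hypothetical)."""

    def __init__(self, name, low, high, records):
        self.name, self.low, self.high = name, low, high
        self.records = records  # dict: file name -> attribute value

    def matches(self, value):
        return self.low <= value <= self.high

def route(nodes, start, value):
    """Forward a point request from node to node until one matches.

    Returns (names of nodes visited, matching file names).
    """
    visited = []
    for step in range(len(nodes)):
        node = nodes[(start + step) % len(nodes)]
        visited.append(node.name)
        if node.matches(value):  # Matching? then query locally
            hits = sorted(f for f, v in node.records.items() if v == value)
            return visited, hits
        # Otherwise forward to the next node in the (assumed) ring order.
    return visited, []  # every node failed to match
```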
A newly created version attached to its correlated
SmartStore removes attached versions when
The frequency of reconfiguration depends on the user
17
Our mapping is based on a simple bottom-up approach
18
Prototype Implementation
Large file system-level traces, including HP, MSN, and
Compared with typical DBMS and R-tree
Query latency reduction: 1000 times
Space savings: 20 times
19
20
Top-8 NN Query
Range Query
21
Figure: Latency (ms) and Message Number (1000) versus Number of Data Nodes, for HP, MSN, and EECS (on-line and off-line)
22
Pay-only-once: configuration efficiency for a long time
Rich semantics of multi-dimensional attributes to
Lack of semantics, such as uniform distribution;
Quick and dynamic evolution of semantics;
Explicit scatter of dimension increments;
23
Users’ views
Range query and top-k query
System views
De-duplication
Caching
Pre-fetching
24
Exploit file semantics
Complex queries
Enhance system scalability and functionality.
Semantic aggregation
Decrease search space
25
NSFC under Grant 60703046
National Basic Research 973 Program under Grant
NSF CCF-0621526, NSF CCF-0937993, NSF CCF-0937988 and
HUST-SRF No.2007Q021B
The Program for Changjiang Scholars and Innovative Research
26
27