Peter Schwaller – Senior Director Server Engineering, Percona
Santa Clara, California | April 23rd – 25th, 2018
Time-Series Data in MongoDB on a Budget
What is Time-Series Data?
Characteristics:
$currentDate: { recordedTime: true }
Allows applications to reliably process each document once and only once.
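One way to read this: if every write lets the server stamp recordedTime, a downstream consumer can remember the newest recordedTime it has handled and ask only for later documents. The sketch below builds the update and filter documents as plain Python dictionaries. The helper names and the checkpoint approach are assumptions for illustration and do not come from the slides; a real consumer would also have to handle documents that share the checkpoint timestamp.

```python
from datetime import datetime

def stamped_update(fields):
    """Update document: set the fields and let the server stamp recordedTime."""
    return {"$set": dict(fields), "$currentDate": {"recordedTime": True}}

def unprocessed_filter(checkpoint=None):
    """Query filter for documents recorded after the last processed timestamp."""
    if checkpoint is None:
        return {}  # nothing processed yet: match everything
    return {"recordedTime": {"$gt": checkpoint}}

def advance_checkpoint(checkpoint, batch):
    """Return the newest recordedTime seen so far (None if nothing was seen)."""
    times = [doc["recordedTime"] for doc in batch]
    if checkpoint is not None:
        times.append(checkpoint)
    return max(times) if times else None

# Illustrative values only (hypothetical data).
example_checkpoint = datetime(2018, 4, 23)
example_filter = unprocessed_filter(example_checkpoint)
```

The update document would be passed to an update call with upsert enabled, and the filter to a find sorted by recordedTime.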
It’s only *mostly* write-only.
{ partialFilterExpression: { speed: { $gt: 75.0 } } }
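As a rough sketch of how this option might be passed from Python: PyMongo's create_index accepts partialFilterExpression as a keyword argument. Indexing the speed field itself is an assumption here, since the slide shows only the filter expression, and the function name is hypothetical.

```python
def create_fast_speed_index(collection, threshold=75.0):
    """Create an index on speed covering only documents above the threshold.

    `collection` is expected to be a PyMongo Collection (or any object
    with a compatible create_index method).
    """
    return collection.create_index(
        [("speed", 1)],  # assumed index key; ascending order
        partialFilterExpression={"speed": {"$gt": threshold}},
    )
```

Because only matching documents are indexed, the index stays small. A query can use it only when its own predicate guarantees a match with the filter, for example speed greater than 80.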
db.collection.aggregate( [ { $indexStats: {}}])
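The $indexStats stage returns one document per index, including a name and an accesses.ops counter. A minimal sketch of using that output to list indexes that have not been used since the counters were last reset (the function name is hypothetical, and skipping the default _id index is a choice made for this example):

```python
def unused_indexes(collection):
    """Return names of indexes whose access counter is still zero.

    `collection` is expected to be a PyMongo Collection (or any object
    with a compatible aggregate method).
    """
    stats = collection.aggregate([{"$indexStats": {}}])
    return sorted(
        stat["name"]
        for stat in stats
        if stat["name"] != "_id_" and stat["accesses"]["ops"] == 0
    )
```

The counters restart when the server restarts, so a zero only means "unused since then".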
Getting the Speed You Need
Insert Array:
database[collection].insert(doc_array)

Insert Unordered Bulk:
bulk = database[collection].initialize_unordered_bulk_op()
for doc in doc_array:
    bulk.insert(doc)
bulk.execute()

Update Unordered Bulk:
bulk = database[collection].initialize_unordered_bulk_op()
for doc in doc_array:
    bulk.find({"_id": doc["_id"]}).upsert().update_one({"$set": doc})
bulk.execute()

Insert Single:
database[collection].insert(doc)

Update Single:
database[collection].update_one({"_id": doc["_id"]}, {"$set": doc}, upsert=True)
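The calls above use the older PyMongo API. In current PyMongo the array insert is written as insert_many, and passing ordered=False lets the server continue past individual failures. A minimal sketch of feeding a stream of documents to it in fixed-size batches follows; the helper names and the batch size of 1000 are assumptions, not values from the talk.

```python
from itertools import islice

def batches(docs, size):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def insert_in_batches(collection, docs, batch_size=1000):
    """Insert documents with one unordered insert_many call per batch.

    `collection` is expected to be a PyMongo Collection (or any object
    with a compatible insert_many method). Returns the number inserted.
    """
    inserted = 0
    for chunk in batches(docs, batch_size):
        result = collection.insert_many(chunk, ordered=False)
        inserted += len(result.inserted_ids)
    return inserted
```

Batching keeps the number of round trips low without holding the whole data set in memory.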
[Chart: Comparison of API Methods. Docs/Sec (axis 5,000 to 40,000) for Insert Array, Insert Unordered Bulk, Update Unordered Bulk, Insert Single, and Update Single.]
Answering, “Why can’t I just use a gigantic HDD RAID array?”
[Chart: Inserts/Sec, SSD vs. HDD (axis 1,000 to 10,000).]
[Chart: Inserts/Sec, SSD vs. HDD (axis 1,000 to 10,000).]
replica set)
“Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.”
… let’s take advantage of that.
use admin
sh.enableSharding('DBName')
sh.shardCollection('DBName.TimeSeries', { time: 1 })
sh.addShardTag('rsmain', 'future')
sh.addShardTag('rs001', 'ts001')
sh.addTagRange('DBName.TimeSeries', {time: new Date("2099-01-01")}, {time: MaxKey}, 'future')
sh.addTagRange('DBName.TimeSeries', {time: MinKey}, {time: new Date("2099-01-01")}, 'ts001')
// sh.splitAt('DBName.TimeSeries', {"time": new Date("2099-01-01")})
use admin
db.runCommand({addShard: "rs002/hostname:port", name: "rs002"})
sh.addShardTag('rs002', 'ts002')
var configdb = db.getSiblingDB("config");
configdb.tags.update({tag: "ts001"}, {$set: {'max.time': new ISODate("2018-04-26")}})
sh.addTagRange('DBName.TimeSeries', {time: new Date("2018-04-26")}, {time: new Date("2099-01-01")}, 'ts002')
// sh.splitAt('DBName.TimeSeries', {"time": new ISODate("2018-04-26")})
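The commands above appear to cap the older tag's range at a cutover date and give the newly added shard everything from that date up to the 2099 boundary. The resulting layout can be sketched as a small planning function. It does not talk to MongoDB; the function name and the "MinKey"/"MaxKey" strings are stand-ins chosen for this example, and the tag names mirror those in the commands.

```python
from datetime import datetime

MIN_KEY, MAX_KEY = "MinKey", "MaxKey"  # stand-ins for the BSON sentinels
FUTURE = datetime(2099, 1, 1)          # boundary date used in the commands

def plan_tag_ranges(tags, cutovers, future=FUTURE):
    """Return (tag, lower, upper) ranges for time-based tags.

    tags:     archive tags, oldest first, e.g. ["ts001", "ts002"]
    cutovers: dates at which each newer tag took over (len(tags) - 1)
    """
    if len(cutovers) != len(tags) - 1:
        raise ValueError("need exactly one cutover between adjacent tags")
    if list(cutovers) != sorted(cutovers) or any(c >= future for c in cutovers):
        raise ValueError("cutovers must be ascending and before the future boundary")
    bounds = [MIN_KEY, *cutovers, future]
    ranges = [(tag, bounds[i], bounds[i + 1]) for i, tag in enumerate(tags)]
    ranges.append(("future", future, MAX_KEY))  # catch-all range on the main shard
    return ranges
```

For example, plan_tag_ranges(["ts001", "ts002"], [datetime(2018, 4, 26)]) yields the three ranges that exist after the second set of commands. Each range includes its lower bound and excludes its upper bound.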
(cheaper) servers