ToroDB Stampede: Open-source BI connector for MongoDB
Álvaro Hernández Tortosa <aht@8kdata.com> Percona live 2017
ToroDB Stampede: Open-source BI connector for MongoDB lvaro - - PowerPoint PPT Presentation
ToroDB Stampede: Open-source BI connector for MongoDB lvaro Hernndez Tortosa <aht@8kdata.com> Percona live 2017 Who I am CEO, 8Kdata.com What we do @8Kdata: ALVARO HERNANDEZ Creators of ToroDB.com, NoSQL & SQL
Álvaro Hernández Tortosa <aht@8kdata.com> Percona live 2017
ALVARO HERNANDEZ
CEO, 8Kdata.com
Founder, President Spanish Postgres User Group postgrespana.es ~ 750 members
✓Creators of ToroDB.com, NoSQL & SQL database ✓Database R&D, product development ✓Training and consulting in PostgreSQL ✓PostgreSQL Support
Linkedin: http://es.linkedin.com/in/alvarohernandeztortosa/ Twitter: @ahachete
connector/master/
SQL-ish way But...
may or may not fit you
paying for the whole solution
https://blog.meteor.com/mongodb-queries-dont-always-return-all-matching-documents-654b6594a827#.fplxodagr
{ name: “Alice”, address: { street: “Wonderland”, number: 42 }, hobbies: [ “reading”, “talking to rabbits” ] }
★There’s no need to generate it, it’s implicit!
implicit schema
metadata (schema) from data (tuples)
MongoDB collection
database
did name_s _id_x 1 Alice \x01d4a63dc6944f37a61cf3d6
SELECT * FROM torodb.people; SELECT * FROM torodb.people_address;
did name_s seq number_d street_s 1 1 42 Wonderland
SELECT * FROM torodb.people_hobbies;
did rid seq value_s 1 1 “reading” 1 2 1 “talking to rabbits”
database: torodb collection: people { name: “Alice”, _id: (implicit val), address: { street: “Wonderland”, number: 42 }, hobbies: [ “reading”, “talking to rabbits” ] }
[ { name: “Alice” }, { name: “Harry”, surname: “Potter” } ]
★What happens if a new document in the collection brings new fields?
★ToroDB automatically performs: ✓ALTER TABLE ... ADD COLUMN surname_s text
did _id_x name_s surname_s 1 \x01d4a63dc6944f37a61cf3d6 Alice 2 \x21f4f563dc6944f37a61cf436 Harry Potter
[ { name: “Harry”, surname: “Potter” } { name: “Henry”, surname: 8 }, ]
★What happens if a new document in the collection brings type conflicts?
★ Type conflicts are resolved by suffixing column names with the type of the
data: "s" for text, "i" for integer and so forth
did _id_x name_s surname_s surname_i 1 \x01d4a63dc6944f37a61cf3d6 Harry Potter 2 \x21f4f563dc6944f37a61cf436 Henry 8
ToroDB ToroDB works as a secondary node on a MongoDB replica set
This is your SQL replica!
✓ First, it imports all MongoDB data into PostgreSQL ✓ Then it stays tuned to changes in MongoDB Oplog and applies them in PostgreSQL
MongoDB*, Couchbase, Cassandra… Here is where ‘the magic’ happens transforming Docs into Tables and Rows{ PostgreSQL*, MySQL, SQL Server, Oracle…
ToroDB offers the interface of any NoSQL database, but persisting data in tables and rows within a SQL database.
* currently available
Github Archive: top 10 actors (1,4GB dataset) Disk Read (MB) 125 250 375 500 MongoDB PostgreSQL PostgreSQL + cstore
Exactly 100% of the storageSize!
MongoDB storageSize: 536.37 MB
db.githubarchive.aggregate([ { $group: { _id: '$actor.login', events: { $sum: 1 } } }, { $sort: { events: -1 }}, { $limit: 10 } ]) SELECT count(*), login FROM actor GROUP BY login ORDER BY 1 DESC LIMIT 10;
MongoDB size: 1410,35 MB
✓ MongoDB ✓ MongoDB with 3 shards ✓ PostgreSQL (obtained by migrating MongoDB environment via ToroDB)
equivalent to MongoDB queries) ✓ Available at https://gist.github.com/ahachete
A B C D E F
Seconds (Less is better)
4.000 8.000 12.000 16.000
1.021 1.418 3.691 1.191 1.184 621 4.126 4.732 14.504 4.253 4.333 1.851 402 1.650 254 239 83 880
PostgreSQL MongoDB MongoDB (3 Shards)
57x
A B C D E F
Seconds (Less is better)
750 1.500 2.250 3.000
117 193 383 163 194 89 664 768 2.405 747 750 361 53 240 9 38 13 34
PostgreSQL MongoDB MongoDB (3 Shards)
That’s 267x faster!
Important because many BI queries don’t use indexes A B C D E F
Seconds (Less is better)
200 400 600 800
164 211 158 193 209 199 667 793 658 668 694 715 53 241 7 37 13 31
PostgreSQL MongoDB MongoDB (3 Shards) 94x
ToroDB Stampede
MongoDB connector for BI
ToroDB Stampede
wget https://www.torodb.com/download/torodb-stampede-1.0.0- beta1.tar.bz2 tar xjf torodb-stampede-1.0.0-beta1.tar.bz2
MongoDB connector for BI
wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64- ubuntu1604-3.4.3.tgz tar xjf mongodb-linux-x86_64-ubuntu1604-3.4.3.tgz sudo install -m755 bin/mongo* /usr/local/bin/
MongoDB connector for BI
each database (or collection).
mongodrdl --host example.com
... It’s something like this
schema:
tables:
collection: name_collection pipeline: [] columns:
MongoType: bson.ObjectId SqlName: _id SqlType: varchar
MongoType: type SqlName: sqlname SqlType: sqltype
ToroDB Stampede
PostgreSQL.
host:port:torod:torodb:<password>
torodb-stampede -l
ToroDB Stampede
$TOROHOME/bin/torodb-stampede
MongoDB connector for BI
mongosqld --schema schema.drdl --mongo-uri <your.mongohost.com>
ToroDB Stampede
tables if it’s necessary
MongoDB connector for BI
manually the mongodrdl schema: ➡ mongodrdl infers the schema by sampling
Insert Documents with new fields
ToroDB Stampede
a postfix, indicating the data type ➡ If one path contains two different data types then two different columns are created.
MongoDB connector for BI
again, but surely you don’t have the new type, so: ➡ You have to edit the schema and add a new field with different sql name.
Type Conflicts
Array / Document exploding MongoDB Connector for BI
database: stampede collection: restaurants { _id: (implicit val), name: “Glorious Food” address: { street:”Wonderland”, number: 42, coord: [ -73.9832324, 40.6563633] } }
_id name address.street address.number
01d4a63 dc6944f
“Glorious Food” “Wonderland” 42
stampede.restaurants stampede.restaurants_address_coord
_id address.coord address.coord_idx
01d4a63 dc6944f
01d4a63 dc6944f
40.6563633 1
Array / Document exploding ToroDB Stampede
did _id_x name_s address_e
\x01d4a “Glorious Food”
f
stampede.restaurants stampede.restaurants_address
did rid seq street_s number_d coord_e “Wonderland” 42 t
database: stampede collection: restaurants { _id: (implicit val), name: “Glorious Food” address: { street:”Wonderland”, number: 42, coord: [ -73.9832324, 40.6563633] } }
stampede.restaurants_address_coord
did rid pid seq v_d
1 1 40.6563633
M
g
B B I 1 . r
s 8 2 . 4 1 2 m s ( 8 x ! ! ) PostgreSQL 100.000 rows 1.008 ms
SELECT * FROM twitter.message INNER JOIN twitter.message_user ON twitter.message.did = twitter.message_user.did LIMIT 100000 SELECT * FROM `twitter.message` LIMIT 100000
Top ten users sending messages
We ran the following query:
SELECT screen_name_s, count(1) FROM (SELECT screen_name_s FROM twitter.message_user LIMIT 1000000) AS users GROUP BY 1 ORDER BY 2 desc LIMIT 10
PostgreSQL 3.791 ms MongoDB 7.107 ms (1,8x)
Top user locations
We ran the with the following query:
SELECT location_s, count(1) FROM (SELECT location_s FROM twitter.message_user LIMIT 10000000) AS locations GROUP BY 1 ORDER BY 2 desc LIMIT 20
PostgreSQL 15.452 ms MongoDB 72.536 ms (4,7x)
Top Hashtags
We did the with the following query:
SELECT text_s, count(1) FROM (SELECT text_s FROM twitter.message_entities_hashtags LIMIT 10000000)) AS hashtags GROUP BY 1 ORDER BY 2 desc LIMIT 20
PostgreSQL 4.819 ms MongoDB 305.021 ms (64x !!!)
MongoDB Connector for BI ToroDB Stampede Installation Easy and fast Easy and fast Configuration Need to create schema Easy and fast Querying Indirect (convert to NoSQL) Direct (Keeps data synchronized) Adaptability Manual update schema for new fields or types Adapts automagically to changes in schema Query capabilities Limited, requires many self- joins (flattened schema) Advance, schema looks very natural Speed x Up to 100x faster License MongoDB Enterprise Open Source
Tables:
www.8kdata.com info@8kdata.com