torodb stampede open source bi connector for mongodb
play

ToroDB Stampede: Open-source BI connector for MongoDB lvaro - PowerPoint PPT Presentation

ToroDB Stampede: Open-source BI connector for MongoDB lvaro Hernndez Tortosa <aht@8kdata.com> Percona live 2017 Who I am CEO, 8Kdata.com What we do @8Kdata: ALVARO HERNANDEZ Creators of ToroDB.com, NoSQL & SQL


  1. ToroDB Stampede: Open-source BI connector for MongoDB Álvaro Hernández Tortosa <aht@8kdata.com> Percona live 2017

  2. Who I am CEO, 8Kdata.com • What we do @8Kdata: ALVARO HERNANDEZ ✓ Creators of ToroDB.com, NoSQL & SQL database ✓ Database R&D, product development ✓ Training and consulting in PostgreSQL ✓ PostgreSQL Support Twitter: @ahachete Linkedin: Founder, President Spanish Postgres User Group http://es.linkedin.com/in/alvarohernandeztortosa/ postgrespana.es ~ 750 members

  3. MongoDB Connector for BI

  4. MongoDB BI Connector •https://docs.mongodb.com/bi- connector/master/ •Developed by MongoDB themselves •Effectively allows to query mongo in a SQL-ish way But...

  5. MongoDB BI Connector •How do they do it? You don’t know it. •Harder to detect bugs •They’re part of bigger solutions which may or may not fit you •You need Enterprise version, you are paying for the whole solution

  6. MongoDB does *not* have consistent reads https://blog.meteor.com/mongodb-queries-dont-always-return-all-matching-documents-654b6594a827#.fplxodagr

  7. ToroDB Stampede

  8. ToroDB: from MongoDB to PostgreSQL

  9. How do we generate the schema? ★ There’s no need to generate it, it’s implicit! { name: “Alice”, address: { street: “Wonderland”, number: 42 }, hobbies: [ “reading”, “talking to rabbits” ] } implicit schema

  10. How do we generate the schema? •ToroDB analyzes every incoming document and separates metadata (schema) from data (tuples) •With that schema info, ToroDB creates 1+ tables per MongoDB collection •ToroDB also creates a RDBMS catalog schema per MongoDB database

  11. How do we generate the schema? SELECT * FROM torodb.people; database: torodb did name_s _id_x collection: people { 1 Alice \x01d4a63dc6944f37a61cf3d6 name: “Alice”, _id: (implicit val), SELECT * FROM torodb.people_address; address: { street: did name_s seq number_d street_s “Wonderland”, number: 42 1 1 42 Wonderland }, hobbies: [ SELECT * FROM torodb.people_hobbies; “reading”, “talking to did rid seq value_s rabbits” 1 1 0 “reading” ] } 1 2 1 “talking to rabbits”

  12. How to react to dynamic changes in the schema? [ { name: “Alice” }, { name: “Harry”, surname: “Potter” } ] ★ What happens if a new document in the collection brings new fields?

  13. How to react to dynamic changes in the schema? did _id_x name_s surname_s 1 \x01d4a63dc6944f37a61cf3d6 Alice 2 \x21f4f563dc6944f37a61cf436 Harry Potter ★ ToroDB automatically performs: ✓ ALTER TABLE ... ADD COLUMN surname_s text

  14. How to react to dynamic changes in the schema? [ { name: “Harry”, surname: “Potter” } { name: “Henry”, surname: 8 }, ] ★ What happens if a new document in the collection brings type conflicts?

  15. How to react to dynamic changes in the schema? did _id_x name_s surname_s surname_i 1 \x01d4a63dc6944f37a61cf3d6 Harry Potter 2 \x21f4f563dc6944f37a61cf436 Henry 8 ★ Type conflicts are resolved by suffixing column names with the type of the data: "s" for text, "i" for integer and so forth

  16. ToroDB Architecture This is your SQL replica! ToroDB ToroDB works as a secondary node on a MongoDB replica set

  17. ToroDB Architecture • ToroDB works as a secondary node on a replica set • This node has two main modes: ✓ First, it imports all MongoDB data into PostgreSQL ✓ Then it stays tuned to changes in MongoDB Oplog and applies them in PostgreSQL

  18. ToroDB Architecture Protocols MongoDB*, Couchbase, Cassandra… Abstract Doc Tables and Rows { Here is where ‘the magic’ happens transforming Docs into Data2Relational Backends PostgreSQL*, MySQL, SQL Server, Oracle… * currently available ToroDB offers the interface of any NoSQL database, but persisting data in tables and rows within a SQL database.

  19. Measuring I/O with iotop required to answer a query Github Archive: top 10 actors (1,4GB dataset) 500 MongoDB storageSize: 536.37 MB MongoDB size: 1410,35 MB 375 Exactly 100% of the storageSize! db.githubarchive.aggregate([ Disk Read (MB) { $group: { _id: '$actor.login', events: { $sum: 1 } 536,36 250 } }, { $sort: { events: -1 }}, { $limit: 10 } ]) 125 SELECT count(*), login FROM actor GROUP BY login 87,93 6,5 ORDER BY 1 DESC LIMIT 10; 0 MongoDB PostgreSQL PostgreSQL + cstore

  20. Building the benchmarks • Data from Github archive • Timing three different environments that contain the same data: ✓ MongoDB ✓ MongoDB with 3 shards ✓ PostgreSQL (obtained by migrating MongoDB environment via ToroDB) •We run 6 queries (PostgreSQL queries being the relational equivalent to MongoDB queries) ✓ Available at https://gist.github.com/ahachete

  21. Github 500GB PostgreSQL MongoDB MongoDB (3 Shards) 880 A 1.851 621 83 B 4.333 1.184 57x 239 C 4.253 1.191 254 D 14.504 3.691 1.650 E 4.732 1.418 402 F 4.126 1.021 0 4.000 8.000 12.000 16.000 Seconds (Less is better)

  22. Github 100GB PostgreSQL MongoDB MongoDB (3 Shards) 34 A 361 89 13 B 750 That’s 267x 194 faster! 38 C 747 163 9 D 2.405 383 240 E 768 193 53 F 664 117 0 750 1.500 2.250 3.000 Seconds (Less is better)

  23. Github 100GB no idx PostgreSQL MongoDB MongoDB (3 Shards) 31 A 715 199 13 B 694 209 Important 37 because 94x C 668 many BI 193 queries 7 don’t use D 658 indexes 158 241 E 793 211 53 F 667 164 0 200 400 600 800 Seconds (Less is better)

  24. What's on my MongoDB database?

  25. Stampede infers and clearly shows you your data

  26. Or use SQL tools to show your schema

  27. Installation Prerequisites ToroDB Stampede MongoDB connector for BI • PostgreSQL 
 • MongoDB Enterprise 3.2+ 
 • Java SE 8 • OpenSSL

  28. Installation MongoDB connector for BI wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64- ubuntu1604-3.4.3.tgz tar xjf mongodb-linux-x86_64-ubuntu1604-3.4.3.tgz sudo install -m755 bin/mongo* /usr/local/bin/ ToroDB Stampede wget https://www.torodb.com/download/torodb-stampede-1.0.0- beta1.tar.bz2 tar xjf torodb-stampede-1.0.0-beta1.tar.bz2

  29. Configuration MongoDB connector for BI ... It’s something like this schema: • You have to create an schema for - db: database each database (or collection). tables: - table: name_table collection: name_collection pipeline: [] mongodrdl --host example.com columns: -d dbname [-c collname] - Name: _id MongoType: bson.ObjectId -o schema.drdl SqlName: _id SqlType: varchar • You can edit manually… - Name: column_name MongoType: type SqlName: sqlname SqlType: sqltype

  30. Configuration ToroDB Stampede • Requires a new user ( ‘torodb' ) and a new database ( ’torod' ) in PostgreSQL. • Access configuration to PostgreSQL detailed in .toropass : host:port:torod:torodb:<password> • You can also check the configuration used by ToroDB Stampede using the -l parameter: torodb-stampede -l

  31. Start! MongoDB connector for BI mongosqld --schema schema.drdl --mongo-uri <your.mongohost.com> • You need one schema per database • It enables a MySQL connection ToroDB Stampede $TOROHOME/bin/torodb-stampede • And it starts the automagically replication

  32. Problem resolution Insert Documents with new fields MongoDB connector for BI ToroDB Stampede • You will surely have to edit • Creates new columns and manually the mongodrdl tables if it’s necessary schema: ➡ mongodrdl infers the schema by sampling

  33. Problem resolution Type Conflicts MongoDB connector for BI ToroDB Stampede • You should execute mongodrdl • The column names contain again, but surely you don’t have a postfix, indicating the the new type, so: data type ➡ You have to edit the schema ➡ If one path contains and add a new field with two different data types different sql name. then two different columns are created.

  34. Problem resolution Array / Document exploding MongoDB Connector for BI database: stampede collection: restaurants stampede.restaurants { _id name address.street address.number _id: (implicit val), 01d4a63 “Glorious Food” “Wonderland” 42 name: “Glorious Food” dc6944f address: { street:”Wonderland”, number: 42, coord: [ -73.9832324, 40.6563633] stampede.restaurants_address_coord } _id address.coord address.coord_idx } 01d4a63 -73.9832324 0 dc6944f 01d4a63 40.6563633 1 dc6944f

  35. Problem resolution Array / Document exploding ToroDB Stampede stampede.restaurants database: stampede did _id_x name_s address_e collection: restaurants 0 \x01d4a “Glorious Food” f { _id: (implicit val), name: “Glorious Food” address: { stampede.restaurants_address street:”Wonderland”, did rid seq street_s number_d coord_e number: 42, 0 0 “Wonderland” 42 t coord: [ -73.9832324, 40.6563633] } stampede.restaurants_address_coord } did rid pid seq v_d 0 0 0 0 -73,9832324 0 1 0 1 40.6563633

  36. Use Case with… PostgreSQL 100.000 rows M 1.008 ms o n g o 1 D 0 B 0 B . 0 I 0 0 8 r o 2 w . 4 s 1 2 m ( s 8 0 x ! ! ) SELECT * FROM twitter.message INNER JOIN twitter.message_user ON SELECT * FROM `twitter.message` twitter.message.did = LIMIT 100000 twitter.message_user.did LIMIT 100000

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend