Large Datasets on Amazon EC2
Anders Karlsson Database Architect, Recorded Future anders@recordedfuture.com
Agenda
About Anders Karlsson
About Recorded Future
What’s the deal with the Cloud
How Recorded Future Works
How …
Oracle, Informix, MySQL / Sun / Oracle etc.
Engineer and in many other roles
(www.papablues.com), develop Open Source software (MyQuery, ndbtop, etc.), am a keen photographer and drive sub-standard cars, among other things
Ventures and others
Intelligence markets, for example In-Q-Tel
analyzing the past” (Predictive Analytics)
content and more
and compute a “momentum” for an entity
momentum to compute a relevance
based free service
to export data and possibly integrate it with their
Master database
preprocessing is done, as much as can be done at this stage
loading; some processing, such as momentum computation, is applied to larger parts of the data set
slaves for further processing, and is then copied to user-focusing databases:
and change what we are doing today
have paying customers, you know!
like that
Containers or stuff like that!
now and move to a more modern, cost-effective and high-performance environment
prepared to change.
special job 47 and tomorrow 2. Without downtime!
using Foreign Keys
columns, of which 10 are BLOB / TEXT columns
used for searching
performance compensates for all that
shards
Ubuntu
8 GB). We have 16 of these
RAM). We have 12 of these currently
snapshots
systems, allowing striped disks to be consistently backed up with EC2 snapshots
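Backing up a striped set is only consistent if every member volume is captured at the same moment. A minimal sketch of the freeze, snapshot, thaw sequence follows. The function name and the injected callables are hypothetical, chosen so the ordering logic can be shown without AWS credentials; this illustrates the general technique and is not Recorded Future's actual backup script.

```python
from typing import Callable, List


def consistent_striped_snapshot(
    volume_ids: List[str],
    freeze: Callable[[], None],
    snapshot: Callable[[str], str],
    thaw: Callable[[], None],
) -> List[str]:
    """Snapshot every volume of a striped set while the file system is frozen.

    Hypothetical helper: all member volumes must be captured at the same point
    in time, otherwise the stripe cannot be reassembled from the snapshots.
    Returns the snapshot ids in the same order as volume_ids.
    """
    # Block new writes so no volume changes while the snapshots are started.
    freeze()
    try:
        # Start one snapshot per member volume of the stripe.
        return [snapshot(vol) for vol in volume_ids]
    finally:
        # Always thaw, even if a snapshot call raised, so the database
        # is never left stuck on a frozen file system.
        thaw()
```

In a real setup, `freeze` and `thaw` might run `xfs_freeze -f <mount>` and `xfs_freeze -u <mount>` through `subprocess.run(..., check=True)`, and `snapshot` might call boto3's `ec2.create_snapshot(VolumeId=vol, Description=...)` and return the `SnapshotId` from the response. EBS fixes a snapshot's point in time when it is initiated, so the file system can be thawed as soon as all the calls have returned rather than waiting for the snapshots to complete.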
databases in general
scripting
Slavereadahead utility to speed up the slaves
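The slide does not say how the Slavereadahead utility works. One common way to speed up a single-threaded MySQL replication slave is to read ahead in the relay log and run a SELECT version of each upcoming UPDATE or DELETE, so the pages the change needs are more likely to be cached by the time the SQL thread applies it. The sketch below shows only that statement-rewrite step, assuming simple single-table statements; the function name `to_prefetch_select` is hypothetical and this is not the actual utility.

```python
import re
from typing import Optional

# Simple single-table forms only: UPDATE t SET ... WHERE ... and DELETE FROM t WHERE ...
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(\S+)\s+SET\s+.+?\s+WHERE\s+(.+?);?\s*$", re.IGNORECASE | re.DOTALL
)
_DELETE_RE = re.compile(
    r"^\s*DELETE\s+FROM\s+(\S+)\s+WHERE\s+(.+?);?\s*$", re.IGNORECASE | re.DOTALL
)


def to_prefetch_select(statement: str) -> Optional[str]:
    """Rewrite a write statement into a read that touches the same rows.

    Hypothetical helper: returns None for anything it cannot safely rewrite
    (INSERTs, statements without WHERE, multi-table forms), which a
    prefetcher would simply skip.
    """
    for pattern in (_UPDATE_RE, _DELETE_RE):
        match = pattern.match(statement)
        if match:
            table, where = match.group(1), match.group(2)
            # The SELECT only reads, so it is harmless to run ahead of replication.
            return f"SELECT 1 FROM {table} WHERE {where}"
    return None
```

For example, `to_prefetch_select("UPDATE entity SET momentum = 3 WHERE id = 42;")` returns `"SELECT 1 FROM entity WHERE id = 42"`. Prefetching of this kind only pays off when the slave is I/O-bound, and the rewritten reads have to run against the slave a little ahead of the SQL thread.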
way of managing disk space
performance
way too much
mess of things, and some software doesn’t cope well with the way the network is set up
latency even worse, and varies way too much!
servers and most of the software
standard Chef recipes, and many are written from scratch
not so sure about the implementation. I like it better now than when I first started using it
scripts are used
databases and operating systems, as well as application specific data is monitored
changed somehow
cases, as our processing makes a lot of references to the database, and there is no good natural sharding key
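The sharding problem can be made concrete. With hash-based sharding, each row is placed by hashing its key, so a unit of work that references many entities has to visit every shard those entities hash to. The sketch below uses hypothetical names and a simple hash-modulo placement (an assumption, not Recorded Future's scheme) to count the shards a batch of entity references touches.

```python
import hashlib
from typing import Iterable, Set


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number in [0, num_shards).

    Uses SHA-256 rather than Python's built-in hash(), whose value for
    strings changes between processes, so placement stays stable.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shards_touched(entity_ids: Iterable[str], num_shards: int) -> Set[int]:
    """Return the set of shards a unit of work referencing these entities must query."""
    return {shard_for(e, num_shards) for e in entity_ids}
```

If a typical processing step references entities spread across the whole data set, nearly every step ends up touching most of the shards, and the cross-shard traffic eats the benefit of sharding. A good natural sharding key is one that keeps the rows a single unit of work needs on the same shard.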
technologies
also, we are looking at InfoBright or similar for aggregates
data, and to balance frequently used data with not-so-frequently used data
manage disks and instances! This is getting expensive!
cloud, is just a Virtual Environment, and nutin’ else
fine on VMWare”
almost certainly work on EC2, but:
getting a bigger EC2 instance / server
idea, unless some caution has been taken
vendors understand how proper cloud computing works. And that’s pretty much OK!
understand it
to reap the benefit of that, that is left to you!
because it’s in a cloud! There’s more to it than that! Much more!
anders@recordedfuture.com