Redundant Storage Cluster For When It's Just Too Big Bob Burgess - - PowerPoint PPT Presentation

redundant storage cluster
SMART_READER_LITE
LIVE PREVIEW

Redundant Storage Cluster For When It's Just Too Big Bob Burgess - - PowerPoint PPT Presentation

Redundant Storage Cluster For When It's Just Too Big Bob Burgess radian 6 Technologies MySQL User Conference 2009 Scope Fundamentals of MySQL Proxy Fundamentals of LuaSQL Description of the Redundant Storage Cluster Architecture


slide-1
SLIDE 1

Redundant Storage Cluster

For When It's Just Too Big

Bob Burgess radian6 Technologies MySQL User Conference 2009

slide-2
SLIDE 2

Scope

 Fundamentals of MySQL Proxy  Fundamentals of LuaSQL  Description of the

Redundant Storage Cluster

  • Architecture
  • Sample code
slide-3
SLIDE 3

Scope – NOT

 Complete Course on MySQL Proxy  Lua Programming Course  Amazing Lua Tricks  Complete code listing of cluster

(Posted online)

slide-4
SLIDE 4

The Problem

slide-5
SLIDE 5

Growth!

2 4 6 8 10 12

Blog Posts – millions per day

slide-6
SLIDE 6

Current Solution

month table month table month table month table

. . .

myisam merge table

 Data over NFS can be unreliable  Separate copy for each DB that could

use it (Master / Replicas)

slide-7
SLIDE 7

Goals

 Single place for all content  Redundancy without complete

duplication

 Add storage by adding nodes  Survive a server failure

slide-8
SLIDE 8

Existing Products

 HiveDB

  • Java/Hibernate-based
  • No redundancy

 Spock Proxy

  • No redundancy

 MySQL Cluster

  • All indexes in RAM
slide-9
SLIDE 9

Other DBs Other DBs

Cluster

Master DB content

(federated)

Other DBs content

(federated)

Load Balancer 1

content 1

MySQL Proxy

Lua 2

content 2

MySQL Proxy

Lua

n

content n

MySQL Proxy

Lua

...

dir dir dir

slide-10
SLIDE 10

Component Walk-Through

 Federated Engine  Load Balancer  MySQL Proxy  Lua  LuaSQL

slide-11
SLIDE 11

Federated Engine

 Just a pointer to another table

create table sample ( id int primary key, value varchar(100) ) engine=federated connection= 'mysql://user:password@host:9999/schema/sample'

slide-12
SLIDE 12

Federated Engine

create table sample ( id int primary key, val int, name varchar(50) ) select id,name from sample; SELECT `id`, `val`, `name` FROM `sample`; select id,name from sample where id=2; SELECT `id`, `val`, `name` FROM sample WHERE `id` = '2'; select val,name from sample where val=2; SELECT `id`, `val`, `name` FROM `sample`; select * from sample limit 10; SELECT `id`, `val`, `name` FROM `sample`;

slide-13
SLIDE 13

Federated Engine

select max(id)from sample; SELECT `id`, `val`, `name` FROM `sample`; select count(*) from sample; SELECT `id`, `val`, `name` FROM `sample`; insert into sample (id,name) values (5,'bob'); INSERT INTO `sample` (`id`, `val`, `name`) VALUES ('5', NULL, 'bob'); insert into sample values (5,10,'bob'); INSERT INTO `sample` (`id`, `val`, `name`) VALUES ('5', '10', 'bob');

slide-14
SLIDE 14

Load Balancer Options

 MySQL Proxy Load Balancer

  • For read balancing only
  • For traditional master/slave architecture

 Custom load balancer

– MySQL Proxy & Lua script

  • Still an option, depending on future

architecture

slide-15
SLIDE 15

Load Balancer Options

 Linux Networking – NAT

  • LB stays involved in session
  • Limited Scalability
  • Slower option

 Linux Networking – Direct Routing

  • LB hands off session
  • Node answers client directly
  • Much more scalable
slide-16
SLIDE 16

MySQL Proxy

 Communicates with a MySQL client

using MySQL Network Protocol

 Provides an API to a script

environment

  • Pass query from client
  • Return result set to the client

 Uses Lua scripting language

slide-17
SLIDE 17

MySQL Proxy – Supplied Constants and Methods

 global

proxy.global.connect_retry=3 proxy.global.conRemote={} proxy.global.MySQLenv=luasql.mysql()

proxy.global.conRemote[tonumber(details.nodeId)] = proxy.global.MySQLenv:connect(...)

 queries – list of queries going to the

server

proxy.queries:reset()

slide-18
SLIDE 18

MySQL Proxy – Supplied Constants and Methods

 response – response from server to

client

proxy.response.type = proxy.MYSQLD_PACKET_OK proxy.response.resultset = {...}

 Constants

MYSQL_PACKET_OK MYSQL_PACKET_ERR MYSQL_TYPE_LONG PROXY_SEND_RESULT

slide-19
SLIDE 19

MySQL Proxy

MySQL client MySQL server 3306 MySQL client MySQL server 3306 MySQL Proxy 4040 MySQL client MySQL server MySQL Proxy 4040

Lua LuaSQL

3306

slide-20
SLIDE 20

MySQL Proxy

MySQL client MySQL server MySQL Proxy

Lua

read_query( ) read_query_result( )

slide-21
SLIDE 21

MySQL Proxy

MySQL client MySQL server MySQL Proxy

Lua

LuaSQL calls

slide-22
SLIDE 22

LuaSQL

 Connect directly to databases from

Lua scripts

environment connection

:connect

cursor

:execute (select) :execute (ins/upd/del)

return code & error msg

:fetch

return set

slide-23
SLIDE 23

LuaSQL Connection Timeouts

 Connection drops after sleeping 10s

call the execute method if cursor object is nil: call connect method for the environment call execute again

slide-24
SLIDE 24

LuaSQL Connection Timeouts

function select_db (node,sql) local reconnects = 0 local cur local err repeat cur, err = proxy.global.conRemote[tonumber(node)]:execute(sql) if cur == nil then print ('Reconnecting select_db. Error='..err) reopen_remote_db (node) end reconnects = reconnects+1 until (cur ~= nil or reconnects==proxy.global.connect_retry) if cur==nil then error("Could not reconnect / No result set in select_db",2) else return cur:fetch() end end

slide-25
SLIDE 25

LuaSQL Connection Timeouts

function execute_db (node,sql) local reconnects = 0 local LOST_CONNECTION = "MySQL server has gone away" local rc local err repeat rc, err = proxy.global.conRemote[tonumber(node)]:execute(sql) if rc ~= nil then print ("execute_db RC="..rc); end if err~=nil then print ("execute_db error="..err); end if rc == nil and err:find(LOST_CONNECTION) then print ('Reconnecting execute_db. Error='..err) reopen_remote_db (node) end reconnects = reconnects+1 until (rc ~= nil or reconnects==proxy.global.connect_retry or not err:find(LOST_CONNECTION) ) if rc == nil and err:find(LOST_CONNECTION) then error("Could not reconnect in execute_db.",2) else return rc,err end end

slide-26
SLIDE 26

System Info Script

 Keeps cluster up to date on disk

usage

 Update Node table on all nodes with

disk size & free of this node

slide-27
SLIDE 27

Accepting Queries from Client

 Query comes in to Proxy,

read_query is called

 The query appears in read_query's

parameter

 First byte of variable indicates the type

  • f query:

 proxy.COM_QUERY (Query)

proxy.COM_PROCESS_INFO (Process List ) proxy.COM_CONNECT (Connect) proxy.COM_PROCESS_KILL (Kill)

slide-28
SLIDE 28

Returning a result set

 Two response types

  • "OK"
  • Error

 Set properties of the "response" object

in the "proxy" environment

slide-29
SLIDE 29

Returning a result set: Error

 Array:

  • type
  • errmsg
  • errcode
  • sqlstate

proxy.response={ type= proxy.MYSQLD_PACKET_ERR, errmsg= "Malformed INSERT statement.", errcode= 1064, sqlstate= "42000" } return proxy.PROXY_SEND_RESULT

slide-30
SLIDE 30

Returning a result set: OK

 Empty Result Set  Array:

  • type

proxy.response.type=proxy.MYSQLD_PACKET_OK return proxy.PROXY_SEND_RESULT

slide-31
SLIDE 31

Returning a result set: OK

 Full result set  Array:

  • type (value)
  • resultset (table)

 fields (table)  rows (table)

proxy resultset (etc.) rows fields name type

value value value value value value

slide-32
SLIDE 32

Returning a result set: OK

rows fields name type

value value value value value value proxy.response.type = proxy.MYSQLD_PACKET_OK proxy.response.resultset = { fields = { {type=proxy.MYSQL_TYPE_LONGLONG, name="blogPostID" }, {type=proxy.MYSQL_TYPE_LONG, name="partitionKey"}, {type=proxy.MYSQL_TYPE_VAR_STRING, name="rawContent" } }, rows = { { tonumber(itemId), tonumber(partKey), contentValue } } } return proxy.PROXY_SEND_RESULT

slide-33
SLIDE 33

Cluster Operation Overview

slide-34
SLIDE 34

Cluster: Librarian

 Talks to the client  Accepts items to store  Retrieves items / gives them to client  The single Lua script that runs under

Proxy

slide-35
SLIDE 35

Cluster: Librarian

 Directory

  • itemId
  • nodeId
  • partitionKey

 Event table

  • serial no.
  • event type
  • event data
slide-36
SLIDE 36

Cluster: Librarian

 Content_partitionKey

  • itemId
  • item – compressed into largeblob

 Node table

  • nodeId
  • connection / authentication info
  • system status (disk)
  • capacity factor
slide-37
SLIDE 37

Cluster: Librarian

insert no yes syntax bad good error exists yes no error

store on this node Error: table doesn't exist create table store on this node update ALL directories store Event return OK to client

slide-38
SLIDE 38

Cluster: Librarian

 Returning an error

proxy.response={ type= proxy.MYSQLD_PACKET_ERR, errmsg= "Duplicate entry '"..itemId.."' for key 1", errcode= 1062, sqlstate= "23000"} return proxy.PROXY_SEND_RESULT

Content Tables

  • MyISAM (concurrent_insert=2)
  • One table per partition key
  • For us: 300 MB / hour
slide-39
SLIDE 39

Cluster: Librarian

select no yes syntax bad good error 1=0 yes no

return result set to client return empty result set to client

info table

yes no

return result set to client

calculate max(ID) and count(*)

read dir

local remote doesn't exist get item from remote db get item from this db return empty result set to client

slide-40
SLIDE 40

Cluster: Librarian

 Federated

select max(id)from sample; SELECT `id`, `val`, `name` FROM `sample`;

 Client

create table _content_info ( _max_id bigint unsigned, _count bigint unsigned) engine=federated connection=(...schema/_content_info...);

 Target

create view _content_info as select max(id) _max_id, count(*) _count from _content;

slide-41
SLIDE 41

Cluster: Librarian

 show table status like `_content`;

Auto_increment

 cluster node add

cluster node status cluster node offline

slide-42
SLIDE 42

Cluster: Rebalancer

 Enforces redundancy policy

  • for new inserts
  • to heal from node loss

 Rebalance

  • node added
  • equal disk usage
slide-43
SLIDE 43

Cluster: Rebalancer

Policy enforce for new items get earliest new item next step

no new items

find another node with the most free space copy the item there update all directories

slide-44
SLIDE 44

Cluster: Rebalancer

Policy enforce for other items

get one item from the Directory which exists on an insufficient number of nodes and has "me" as the first listed node

next step

no items copy the item to the node with the most free space that doesn't already have it

update all directories

slide-45
SLIDE 45

Cluster: Rebalancer

Rebalancer done

move the item from that node to me

update all directories

do I have the most free space of all the nodes no yes find the node with the least free space find an item on that node that's not on this node (that

  • beys the redundancy policy)
slide-46
SLIDE 46

Member Add

 Add new node to Node table  Get a copy of the directory from all

nodes (a piece from each)

slide-47
SLIDE 47

Member Remove

 Set free-space margin to 0  Force "free disk space" for this node

to 0

 Wait for Rebalance to copy everything

  • ff

 Restore the free-space margin and

update the Node list

slide-48
SLIDE 48

Member Fail

 Update node list  Remove all directory entries for this

node

slide-49
SLIDE 49

Information Age-Out

 Drop tables for obsolete partition keys  Remove directory entries for those

partitions

slide-50
SLIDE 50

Table Optimizer

 information_schema.tables

compare data_free to data_length

 "Lock" that partition on that node

(row in PartitionLock table)

 Run optimize table but abandon if

any node fails

 Unlock table

slide-51
SLIDE 51

Backup

 "Write Lock" one partition across all

nodes

 Copy that partition table for all nodes  Copy directory entries for that node  Unlock

slide-52
SLIDE 52

Development Directions

 Bulletproof error handling  Performance tuning

slide-53
SLIDE 53

Alternative Architectures

 Load Balancer choices  Move SQL parsing to a complex

proxy-based load balancer, communicate with nodes on network sockets

 Librarian in Perl, Java, C  Alternative databases

slide-54
SLIDE 54

Resources

 MySQL Documentation  http://forge.mysql.com/wiki/MySQL_Proxy  http://forge.mysql.com/tools/search.php?t=tag&k=mysqlproxy  http://www.lua.org  http://www.keplerproject.org/luasql  http://lua-users.org  http://jan.kneschke.de/projects/mysql/mysql-proxy  http://www.linuxvirtualserver.org

slide-55
SLIDE 55

Thank you!

Bob Burgess bob.burgess@radian6.com www.radian6.com/mysql