Big Data Without a Big Database Kate Matsudaira popforms @katemats - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Big Data Without a Big Database

Kate Matsudaira
popforms
@katemats

slide-2
SLIDE 2

Two kinds of data

User data
  • nicknames: “user”, “transactional”
  • examples: user accounts, shopping cart/orders, user messages
  • created/modified by: users
  • sensitivity to staleness: high
  • plan for growth: hard
  • access optimization: read/write

Reference data
  • nicknames: “reference”, “non-transactional”
  • examples: product/offer catalogs, service catalogs, static geolocation data, dictionaries
  • created/modified by: business (you)
  • sensitivity to staleness: low
  • plan for growth: easy
  • access optimization: mostly read

slide-3
SLIDE 3


slide-4
SLIDE 4
slide-5
SLIDE 5

reference data

slide-6
SLIDE 6

reference data

user data

slide-7
SLIDE 7
slide-8
SLIDE 8

reference data

slide-9
SLIDE 9

reference data

user data

slide-10
SLIDE 10

Performance Reminder

  • main memory read: 0.0001 ms (100 ns)
  • network round trip: 0.5 ms (500,000 ns)
  • disk seek: 10 ms (10,000,000 ns)

source: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf

slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13
slide-14
SLIDE 14

The Beginning

webapp webapp load balancer load balancer BIG
 DATABASE service service service data loader

slide-15
SLIDE 15

The Beginning

webapp webapp load balancer load balancer BIG
 DATABASE service service service data loader

availability problems

slide-16
SLIDE 16

The Beginning

webapp webapp load balancer load balancer BIG
 DATABASE service service service data loader

availability problems performance problems

slide-17
SLIDE 17

The Beginning

webapp webapp load balancer load balancer BIG
 DATABASE service service service data loader

availability problems performance problems scalability problems

slide-18
SLIDE 18

Replication

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

slide-19
SLIDE 19

REPLICA

Replication

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

slide-20
SLIDE 20

REPLICA

Replication

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

scalability problems

slide-21
SLIDE 21

REPLICA

Replication

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

scalability problems performance problems

slide-22
SLIDE 22

REPLICA

Replication

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

scalability problems performance problems

operational overhead
slide-23
SLIDE 23

REPLICA

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

Local Caching

slide-24
SLIDE 24

REPLICA

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

Local Caching

cache cache cache

slide-25
SLIDE 25

REPLICA

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

Local Caching

scalability problems

cache cache cache

slide-26
SLIDE 26

REPLICA

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

Local Caching

scalability problems

operational overhead

cache cache cache

slide-27
SLIDE 27

REPLICA

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

Local Caching

scalability problems performance problems

operational overhead

cache cache cache

slide-28
SLIDE 28

REPLICA

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

Local Caching

scalability problems performance problems

operational overhead

cache cache cache

consistency problems

slide-29
SLIDE 29

REPLICA

webapp webapp load balancer load balancer

BIG
 DATABASE

service service service data loader

Local Caching

scalability problems performance problems

operational overhead

cache cache cache

consistency problems long tail performance problems

slide-30
SLIDE 30

The Long Tail Problem

80% of requests query 10% of entries (head)
20% of requests query remaining 90% of entries (tail)

slide-31
SLIDE 31

webapp webapp load balancer load balancer replica service service service data loader BIG CACHE database preload

Big Cache

slide-32
SLIDE 32

webapp webapp load balancer load balancer replica service service service data loader BIG CACHE database preload

Big Cache

operational overhead
slide-33
SLIDE 33

webapp webapp load balancer load balancer replica service service service data loader BIG CACHE database preload

Big Cache

performance problems

operational overhead
slide-34
SLIDE 34

webapp webapp load balancer load balancer replica service service service data loader BIG CACHE database preload

Big Cache

scalability problems performance problems

operational overhead
slide-35
SLIDE 35

webapp webapp load balancer load balancer replica service service service data loader BIG CACHE database preload

Big Cache

scalability problems performance problems consistency problems

operational overhead
slide-36
SLIDE 36

webapp webapp load balancer load balancer replica service service service data loader BIG CACHE database preload

Big Cache

scalability problems performance problems consistency problems long tail performance problems

operational overhead
slide-37
SLIDE 37

slide-38
SLIDE 38

Big Cache Technologies

slide-39
SLIDE 39

Big Cache Technologies

memcached(b)

slide-40
SLIDE 40

Big Cache Technologies

ElastiCache (AWS) memcached(b) Do I look like I need a cache?

slide-41
SLIDE 41

Big Cache Technologies

ElastiCache (AWS) memcached(b) Oracle Coherence Do I look like I need a cache?

slide-42
SLIDE 42
slide-43
SLIDE 43

Targeted generic data/ use cases.

slide-44
SLIDE 44

Targeted generic data/ use cases. Dynamically assign keys to the “nodes”

slide-45
SLIDE 45

Targeted generic data/ use cases. Scales horizontally Dynamically assign keys to the “nodes”

slide-46
SLIDE 46

Targeted generic data/ use cases. Scales horizontally Dynamically rebalances data Dynamically assign keys to the “nodes”

slide-47
SLIDE 47

  • Targeted generic data/use cases
  • Scales horizontally
  • Dynamically rebalances data
  • Poor performance on cold starts
  • Dynamically assign keys to the “nodes”

slide-48
SLIDE 48

  • Targeted generic data/use cases
  • Scales horizontally
  • Dynamically rebalances data
  • Poor performance on cold starts
  • No assumptions about loading/updating data
  • Dynamically assign keys to the “nodes”

slide-49
SLIDE 49
  • Extra network hop
  • Slow scanning
  • Additional deserialization
  • Additional hardware
  • Additional configuration
  • Additional monitoring

Big Cache Technologies

slide-50
SLIDE 50
Big Cache Technologies

  • Extra network hop
  • Slow scanning
  • Additional deserialization

operational overhead:
  • Additional hardware
  • Additional configuration
  • Additional monitoring

slide-51
SLIDE 51
Big Cache Technologies

performance:
  • Extra network hop
  • Slow scanning
  • Additional deserialization

operational overhead:
  • Additional hardware
  • Additional configuration
  • Additional monitoring

slide-52
SLIDE 52

NoSQL Replica webapp webapp load balancer load balancer NoSQL Database service service service data loader

NoSQL to The Rescue?

slide-53
SLIDE 53

NoSQL Replica webapp webapp load balancer load balancer NoSQL Database service service service data loader

NoSQL to The Rescue?

some performance problems

slide-54
SLIDE 54

NoSQL Replica webapp webapp load balancer load balancer NoSQL Database service service service data loader

NoSQL to The Rescue?

some performance problems some scalability problems

slide-55
SLIDE 55

NoSQL Replica webapp webapp load balancer load balancer NoSQL Database service service service data loader

NoSQL to The Rescue?

some operational overhead

some performance problems some scalability problems

slide-56
SLIDE 56

Remote Store Retrieval Latency

remote store network client network

slide-57
SLIDE 57

Remote Store Retrieval Latency

  • TCP request: 0.5 ms
  • Lookup/write response: 0.5 ms
  • TCP response: 0.5 ms
  • read/parse response: 0.25 ms

remote store network client network

slide-58
SLIDE 58

Remote Store Retrieval Latency

  • TCP request: 0.5 ms
  • Lookup/write response: 0.5 ms
  • TCP response: 0.5 ms
  • read/parse response: 0.25 ms

remote store network client network

Total time to retrieve single value: 1.75 ms

slide-59
SLIDE 59

Total time to retrieve a single value:
  • from remote store: 1.75 ms
  • from memory: 0.001 ms (10 main memory reads)

slide-60
SLIDE 60

Total time to retrieve a single value:
  • from remote store: 1.75 ms
  • from memory: 0.001 ms (10 main memory reads)

Sequential access of 1 million random keys:
  • from remote store: 30 minutes
  • from memory: 1 second
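The one-million-key comparison follows from multiplying the per-lookup costs quoted on the earlier slides. A minimal sketch of that arithmetic, where the class and method names are illustrative and not from the talk:

```java
/** Back-of-the-envelope check of the slide's numbers; all names here are hypothetical. */
final class LatencyEstimate {
    // Per-operation costs quoted on the slides, in nanoseconds.
    static final long MEMORY_READ_NS = 100;        // one main memory read
    static final long REMOTE_FETCH_NS = 1_750_000; // 0.5 + 0.5 + 0.5 + 0.25 ms

    /** Total time in nanoseconds for sequential lookups at a fixed per-lookup cost. */
    static long totalNanos(long lookups, long perLookupNs) {
        return lookups * perLookupNs;
    }

    public static void main(String[] args) {
        long keys = 1_000_000;
        long remote = totalNanos(keys, REMOTE_FETCH_NS);
        long local = totalNanos(keys, 10 * MEMORY_READ_NS); // 10 memory reads per lookup
        System.out.printf("remote store: %.1f minutes%n", remote / 60e9);
        System.out.printf("memory: %.1f seconds%n", local / 1e9);
    }
}
```

The remote total works out to 1,750 seconds, which the slide rounds to 30 minutes.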

slide-61
SLIDE 61

The Truth About 
 Databases

slide-62
SLIDE 62

“What I'm going to call as the hot data cliff: As the size of your hot data set (data frequently read at sustained rates above disk I/O capacity) approaches available memory, write operation bursts that exceeds disk write I/O capacity can create a trashing death spiral where hot disk pages that MongoDB desperately needs are evicted from disk cache by the OS as it consumes more buffer space to hold the writes in memory.”

MongoDB

Source: http://www.quora.com/Is-MongoDB-a-good-replacement-for-Memcached

slide-63
SLIDE 63

“Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory.”


Redis

source: http://redis.io/topics/faq

slide-64
SLIDE 64

They are fast if everything 
 fits into memory.

slide-65
SLIDE 65
slide-66
SLIDE 66

Can you keep it in memory yourself?

webapp webapp load balancer load balancer service full cache data loader service full cache data loader service full cache data loader BIG
 DATABASE

slide-67
SLIDE 67

Can you keep it in memory yourself?

webapp webapp load balancer load balancer service full cache data loader service full cache data loader service full cache data loader BIG
 DATABASE

operational relief

slide-68
SLIDE 68

Can you keep it in memory yourself?

webapp webapp load balancer load balancer service full cache data loader service full cache data loader service full cache data loader BIG
 DATABASE

operational relief
scales infinitely

slide-69
SLIDE 69

Can you keep it in memory yourself?

webapp webapp load balancer load balancer service full cache data loader service full cache data loader service full cache data loader BIG
 DATABASE

operational relief
scales infinitely
performance gain

slide-70
SLIDE 70

Can you keep it in memory yourself?

webapp webapp load balancer load balancer service full cache data loader service full cache data loader service full cache data loader BIG
 DATABASE

operational relief
scales infinitely
performance gain
consistency problems

slide-71
SLIDE 71

Fixing Consistency

webapp load balancer service full cache data loader

deployment cell

service full cache data loader webapp webapp service full cache data loader

slide-72
SLIDE 72

Fixing Consistency

webapp load balancer service full cache data loader

deployment cell

service full cache data loader webapp webapp service full cache data loader

1. Deployment “Cells”
2. Sticky user sessions

slide-73
SLIDE 73

credit: http://www.fruitshare.ca/wp-content/uploads/2011/08/car-full-of-apples.jpeg

How do you fit all of that data into memory?

slide-74
SLIDE 74

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered.

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.”
- Donald Knuth
slide-75
SLIDE 75

How do you fit all that data in memory?

slide-76
SLIDE 76

The Answer

1 2 3 4 5

slide-77
SLIDE 77

Domain Model Design

“Domain Layer (or Model Layer): Responsible for representing concepts of the business, information about the business situation, and business rules. State that reflects the business situation is controlled and used here, even though the technical details of storing it are delegated to the infrastructure. This layer is the heart of business software.”
- Eric Evans, Domain-Driven Design, 2003

1 2 3 4 5

slide-78
SLIDE 78

Domain Model Design Guidelines

http://alloveralbany.com/images/bumper_gawking_dbgeek.jpg

slide-79
SLIDE 79

Domain Model Design Guidelines

#1 Keep it immutable

http://alloveralbany.com/images/bumper_gawking_dbgeek.jpg

slide-80
SLIDE 80

Domain Model Design Guidelines

#1 Keep it immutable
#2 Use independent hierarchies

http://alloveralbany.com/images/bumper_gawking_dbgeek.jpg

slide-81
SLIDE 81

Domain Model Design Guidelines

#1 Keep it immutable
#2 Use independent hierarchies
#3 Optimize Data

Help! I am in the trunk!

http://alloveralbany.com/images/bumper_gawking_dbgeek.jpg

slide-82
SLIDE 82

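intern() your immutables

[Diagram: keys K1 and K2 point to values V1 (A, B, C, D, E) and V2 (F, B’, C’, D’, E’), shown before and after interning; after interning, V2 no longer holds its own copies B’ to E’]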

slide-83
SLIDE 83

// One weak canonicalization map per class: equal instances collapse to one shared object.
private final Map<Class<?>, Map<Object, WeakReference<Object>>> cache =
    new ConcurrentHashMap<Class<?>, Map<Object, WeakReference<Object>>>();

public <T> T intern(T o) {
  if (o == null)
    return null;
  Class<?> c = o.getClass();
  Map<Object, WeakReference<Object>> m = cache.get(c);
  if (m == null)
    // synchronizedMap is a static import of java.util.Collections.synchronizedMap
    cache.put(c, m = synchronizedMap(new WeakHashMap<Object, WeakReference<Object>>()));
  WeakReference<Object> r = m.get(o);
  @SuppressWarnings("unchecked")
  T v = (r == null) ? null : (T) r.get();
  if (v == null) {
    // First time this value is seen (or the old copy was collected): make it the canonical one.
    v = o;
    m.put(v, new WeakReference<Object>(v));
  }
  return v;
}

slide-84
SLIDE 84

Use Independent Hierarchies

Before (one hierarchy):
  Product: id = …, title = …
    Offers, Specifications, Description, Reviews, Rumors, Model History

After (independent hierarchies, each keyed by productId):
  Product Summary: productId = …
  Offers: productId = …
  Specifications: productId = …
  Description: productId = …
  Reviews: productId = …
  Rumors: productId = …
  Model History: productId = …
  Product Info

slide-85
SLIDE 85

Collection 
 Optimization

1 3 4 5 2

slide-86
SLIDE 86

Leverage Primitive Keys/Values

Trove (“High Performance Collections for Java”)

collection with 10,000 elements [0 .. 9,999]    size in memory
java.util.ArrayList<Integer>                    200K
java.util.HashSet<Integer>                      546K
gnu.trove.list.array.TIntArrayList              40K
gnu.trove.set.hash.TIntHashSet                  102K

slide-87
SLIDE 87

Optimize small immutable collections

Collections with small number of entries (up to ~20):

class ImmutableMap<K, V> implements Map<K,V>, Serializable { ... }

class MapN<K, V> extends ImmutableMap<K, V> {
  final K k1, k2, ..., kN;
  final V v1, v2, ..., vN;
  @Override public boolean containsKey(Object key) {
    if (eq(key, k1)) return true;
    if (eq(key, k2)) return true;
    ...
    return false;
  }
  ...
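To make the MapN template concrete, here is a minimal sketch of the N = 2 case. The class name Map2 and its reduced API (it does not implement java.util.Map) are assumptions chosen to keep the example short, not the talk's actual code:

```java
import java.util.Objects;

/** Hypothetical two-entry immutable map: keys and values live in plain fields, no table, no entry objects. */
final class Map2<K, V> {
    private final K k1, k2;
    private final V v1, v2;

    Map2(K k1, V v1, K k2, V v2) {
        this.k1 = k1; this.v1 = v1;
        this.k2 = k2; this.v2 = v2;
    }

    /** Linear scan over the fields; for a handful of entries this is cheap. */
    boolean containsKey(Object key) {
        return Objects.equals(key, k1) || Objects.equals(key, k2);
    }

    /** Returns the value for the key, or null when the key is absent. */
    V get(Object key) {
        if (Objects.equals(key, k1)) return v1;
        if (Objects.equals(key, k2)) return v2;
        return null;
    }

    int size() { return 2; }
}
```

The design trades lookup generality for footprint: each instance pays only for its object header and four references.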

slide-88
SLIDE 88

Space Savings

  • java.util.HashMap: 128 bytes + 32 bytes per entry
  • compact immutable map: 24 bytes + 8 bytes per entry

slide-89
SLIDE 89

Numeric Data Optimization

1 2 5 3 4

slide-90
SLIDE 90

Price History Example

slide-91
SLIDE 91

Example: Price History

Problem:

  • Store daily prices for 1M products, 2 offers per product
  • Average price history length per product ~2 years
  • Total price points: (1M + 2M) * 730 = ~2 billion

slide-92
SLIDE 92

Price History: First attempt

TreeMap<Date, Double>

  • 88 bytes per entry * 2 billion = ~180 GB

slide-93
SLIDE 93

Typical Shopping Price History

[Chart: price vs. days; day axis ticks at 20, 60, 70, 90, 100, 120, 121; price level marked at $100]

slide-94
SLIDE 94


slide-95
SLIDE 95


slide-96
SLIDE 96

Run Length Encoding

input:   a a a a a a b b b c c c c c c
encoded: 6 a 3 b 6 c
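The encoding above can be sketched in a few lines. This is a generic illustration; the class name and the string-based output format are assumptions, and the input is assumed to contain no digits so that counts and symbols stay distinguishable:

```java
/** Minimal run-length encoder over characters; hypothetical helper, not from the talk. */
final class RunLength {
    /** Turns each run of a repeated character into its length followed by the character. */
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int run = 1;
            // Extend the run while the next character repeats.
            while (i + run < s.length() && s.charAt(i + run) == c) run++;
            out.append(run).append(c);
            i += run;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("aaaaaabbbcccccc")); // prints 6a3b6c
    }
}
```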

slide-97
SLIDE 97

Price History Optimization

  • Drop pennies
  • Store prices in primitive short (use scale factor to represent prices greater than Short.MAX_VALUE)

Meaning of each short value:
  • positive: price (adjusted to scale)
  • negative: run length (precedes price)
  • zero: unavailable

Encoded example (values as shown on the slide): -20 100 -40 150 -10 140 -20 100 -10 90 -20 100 -9 80

Memory: 15 * 2 + 16 (array) + 24 (start date) + 4 (scale factor) = 74 bytes
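A minimal sketch of this short[] encoding follows. It is not the talk's implementation. It assumes whole-dollar int prices that fit in a short (scale factor fixed at 1), that 0 marks an unavailable day, that a one-day run is written without a run-length prefix, and that runs of unavailable days may carry a run length like any other value. The class name PriceRuns is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical run-length codec for daily prices; see the assumptions stated above. */
final class PriceRuns {
    /** Encodes daily prices: a negative short is a run length, followed by the price (0 = unavailable). */
    static short[] encode(int[] daily) {
        short[] buf = new short[daily.length * 2]; // generous upper bound, trimmed below
        int n = 0;
        int i = 0;
        while (i < daily.length) {
            int price = daily[i];
            if (price < 0 || price > Short.MAX_VALUE)
                throw new IllegalArgumentException("price out of range: " + price);
            int run = 1;
            while (i + run < daily.length && daily[i + run] == price && run < Short.MAX_VALUE) run++;
            if (run > 1) buf[n++] = (short) -run; // run length precedes the price
            buf[n++] = (short) price;
            i += run;
        }
        return Arrays.copyOf(buf, n);
    }

    /** Expands the encoded form back into one price per day. */
    static int[] decode(short[] encoded) {
        int days = 0;
        for (int i = 0; i < encoded.length; i++) {
            if (encoded[i] < 0) { days += -encoded[i]; i++; } else { days++; }
        }
        int[] out = new int[days];
        int d = 0;
        for (int i = 0; i < encoded.length; i++) {
            int run = 1;
            if (encoded[i] < 0) { run = -encoded[i]; i++; } // step from run length to its price
            Arrays.fill(out, d, d + run, encoded[i]);
            d += run;
        }
        return out;
    }
}
```

Because shopping prices change rarely, most of a multi-year history collapses into a few (run length, price) pairs, which is where the savings come from.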
slide-98
SLIDE 98

Space Savings

Reduction compared to TreeMap<Date, Double>: 155 times

Estimated memory for 2 billion price points: 1.2 GB

slide-99
SLIDE 99

Space Savings

Reduction compared to TreeMap<Date, Double>: 155 times

Estimated memory for 2 billion price points: 1.2 GB << 180 GB

slide-100
SLIDE 100

Price History Model

public class PriceHistory {

  private final Date startDate; // or use org.joda.time.LocalDate
  private final short[] encoded;
  private final int scaleFactor;

  public PriceHistory(SortedMap<Date, Double> prices) { … } // encode
  public SortedMap<Date, Double> getPricesByDate() { … } // decode
  public Date getStartDate() { return startDate; }

  // Below computations implemented directly against encoded data
  public Date getEndDate() { … }
  public Double getMinPrice() { … }
  public int getNumChanges(double minChangeAmt, double minChangePct, boolean abs) { … }
  public PriceHistory trim(Date startDate, Date endDate) { … }
  public PriceHistory interpolate() { … }
}

slide-101
SLIDE 101

Know Your 
 Data

slide-102
SLIDE 102

Compress text

1 2 3 4 5

slide-103
SLIDE 103

String Compression: byte arrays

  • Use the minimum character set encoding

static Charset UTF8 = Charset.forName("UTF-8");

String s = "The quick brown fox jumps over the lazy dog"; // 43 chars, 136 bytes
byte[] b = "The quick brown fox jumps over the lazy dog".getBytes(UTF8); // 64 bytes
String s1 = "Hello"; // 5 chars, 64 bytes
byte[] b1 = "Hello".getBytes(UTF8); // 24 bytes

byte[] toBytes(String s) { return s == null ? null : s.getBytes(UTF8); }
String toString(byte[] b) { return b == null ? null : new String(b, UTF8); }
slide-104
SLIDE 104
String Compression: shared prefix

  • Great for URLs

public class PrefixedString {
  private PrefixedString prefix;
  private byte[] suffix;

  . . .

  @Override public int hashCode() { … }
  @Override public boolean equals(Object o) { … }
}
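The slide elides the method bodies, so the following is only one possible completion. The class name PrefixedText, the constructor, and the choice to define equality on the full reconstructed text are all assumptions made for illustration:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical shared-prefix string: many instances can point at one common prefix object. */
final class PrefixedText {
    private final PrefixedText prefix; // shared between instances; null for a root
    private final byte[] suffix;       // UTF-8 bytes of the part unique to this string

    PrefixedText(PrefixedText prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix.getBytes(StandardCharsets.UTF_8);
    }

    /** Rebuilds the full string by walking the prefix chain. */
    @Override public String toString() {
        String tail = new String(suffix, StandardCharsets.UTF_8);
        return prefix == null ? tail : prefix.toString() + tail;
    }

    /** Equality is on the full text, so different splits of the same string compare equal. */
    @Override public boolean equals(Object o) {
        return o instanceof PrefixedText && toString().equals(o.toString());
    }

    @Override public int hashCode() {
        return toString().hashCode();
    }
}
```

This version favors simplicity: equals and hashCode materialize the full string on each call, which a production version would likely avoid.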

slide-105
SLIDE 105

String Compression: short alphanumeric case-insensitive strings

public abstract class AlphaNumericString {
  public static AlphaNumericString make(String s) {
    try {
      // Strings that parse as base-36 numbers are packed into a single long.
      return new Numeric(Long.parseLong(s, Character.MAX_RADIX));
    } catch (NumberFormatException e) {
      return new Alpha(s.getBytes(UTF8));
    }
  }

  protected abstract String value();

  @Override public String toString() { return value(); }

  private static class Numeric extends AlphaNumericString {
    long value;
    Numeric(long value) { this.value = value; }
    @Override protected String value() { return Long.toString(value, Character.MAX_RADIX); }
    @Override public int hashCode() { … }
    @Override public boolean equals(Object o) { … }
  }

  private static class Alpha extends AlphaNumericString {
    byte[] value;
    Alpha(byte[] value) { this.value = value; }
    @Override protected String value() { return new String(value, UTF8); }
    @Override public int hashCode() { … }
    @Override public boolean equals(Object o) { … }
  }
}

slide-106
SLIDE 106

String Compression: Large Strings

Image source:https://www.facebook.com/note.php?note_id=80105080079image

slide-107
SLIDE 107

String Compression: Large Strings

Gzip Become the master of your strings!

Image source:https://www.facebook.com/note.php?note_id=80105080079image

slide-108
SLIDE 108

String Compression: Large Strings

Gzip bzip2 Become the master of your strings!

Image source:https://www.facebook.com/note.php?note_id=80105080079image

slide-109
SLIDE 109

String Compression: Large Strings

  • Gzip
  • bzip2
  • Just convert to byte[] first, then compress

Become the master of your strings!
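As a sketch of "convert to byte[] first, then compress", the snippet below uses the JDK's built-in gzip streams. The class and method names are illustrative, and wrapping IOException in UncheckedIOException is a convenience choice, not something the talk prescribes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: stores a large string as gzip-compressed UTF-8 bytes. */
final class GzipText {
    static byte[] compress(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8)); // byte[] first, then compress
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Gzip carries a fixed header and trailer, so this only pays off for large strings; short ones can come out bigger than the input.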

Image source:https://www.facebook.com/note.php?note_id=80105080079image

slide-110
SLIDE 110

JVM Tuning

1 2 3 4 5

slide-111
SLIDE 111

JVM Tuning

Image source: http://foro-cualquiera.com

slide-112
SLIDE 112

JVM Tuning

  • make sure to use compressed pointers (-XX:+UseCompressedOops)

Image source: http://foro-cualquiera.com

slide-113
SLIDE 113

JVM Tuning

  • make sure to use compressed pointers (-XX:+UseCompressedOops)
  • use low pause GC (Concurrent Mark Sweep, G1)

Image source: http://foro-cualquiera.com

This s#!% is heavy!

slide-114
SLIDE 114

JVM Tuning

  • make sure to use compressed pointers (-XX:+UseCompressedOops)
  • use low pause GC (Concurrent Mark Sweep, G1)
  • Overprovision heap by ~30%
  • Adjust generation sizes/ratios

Image source: http://foro-cualquiera.com

This s#!% is heavy!

slide-115
SLIDE 115

JVM Tuning

slide-116
SLIDE 116

JVM Tuning

slide-117
SLIDE 117

JVM Tuning

Print garbage collection

slide-118
SLIDE 118

JVM Tuning

  • Print garbage collection
  • If GC pauses are still prohibitive, consider partitioning

slide-119
SLIDE 119

In summary

slide-120
SLIDE 120

Know your Business

slide-121
SLIDE 121

Know your data

slide-122
SLIDE 122

Know when to optimize

slide-123
SLIDE 123

The End

My company: https://popforms.com
My website: http://katemats.com

And much of the credit for this talk goes to Leon Stein for developing the technology. Thank you, Leon.
slide-124
SLIDE 124

How do you load the data?

slide-125
SLIDE 125

Cache Loading

webapp webapp load balancer service full cache data loader service full cache data loader service full cache data loader webapp reliable file store (S3) “cooked” datasets

slide-126
SLIDE 126

Cache loading tips & tricks

slide-127
SLIDE 127

Cache loading tips & tricks

  • Final datasets should be compressed and stored (e.g. S3)

slide-128
SLIDE 128

Cache loading tips & tricks

  • Final datasets should be compressed and stored (e.g. S3)
  • Keep the format simple (CSV, JSON)

Help! I am in the trunk!

slide-129
SLIDE 129

Cache loading tips & tricks

  • Final datasets should be compressed and stored (e.g. S3)
  • Keep the format simple (CSV, JSON)
  • Poll for updates
  • Poll frequency == data inconsistency threshold

Help! I am in the trunk!

slide-130
SLIDE 130

Cache Loading
 Time Sensitivity

slide-131
SLIDE 131

Cache Loading: Low Time Sensitivity Data

/tax-rates
  /date=2012-05-01
    tax-rates.2012-05-01.csv.gz
  /date=2012-06-01
    tax-rates.2012-06-01.csv.gz
  /date=2012-07-01
    tax-rates.2012-07-01.csv.gz

slide-132
SLIDE 132

Cache Loading: Medium/High Time Sensitivity

/prices
  /full
    /date=2012-07-01
      price-obs.2012-07-01.csv.gz
    /date=2012-07-02
  /inc
    /date=2012-07-01
      2012-07-01T00-10-00.csv.gz
      2012-07-01T00-20-00.csv.gz

slide-133
SLIDE 133

Swap

Cache Loading Strategy

Image src:http://static.fjcdn.com/pictures/funny_22d73a_372351.jpg

slide-134
SLIDE 134

Swap

Cache Loading Strategy

Cache is immutable, so no locking is required

Image src:http://static.fjcdn.com/pictures/funny_22d73a_372351.jpg

slide-135
SLIDE 135

Cache Loading Strategy: Swap

  • Cache is immutable, so no locking is required
  • Works well for infrequently updated data sets
  • And for datasets that need to be refreshed each update

meow
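One way to realize the swap strategy is to build a fresh immutable snapshot off to the side and publish it with a single volatile write. The sketch below assumes a plain java.util map is an acceptable container; the class name and API are hypothetical:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical swappable cache: readers never lock, loaders replace the whole snapshot. */
final class SwappableCache<K, V> {
    // volatile makes a newly published snapshot visible to all reader threads.
    private volatile Map<K, V> current = Collections.emptyMap();

    V get(K key) {
        return current.get(key);
    }

    int size() {
        return current.size();
    }

    /** Copies the fresh data into a private unmodifiable map, then swaps the reference. */
    void swap(Map<K, V> freshData) {
        current = Collections.unmodifiableMap(new HashMap<>(freshData));
    }
}
```

Readers that picked up the old snapshot keep using it until they finish; it is garbage collected once no one references it. The cost is that both old and new datasets sit in memory during the swap.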

Image src:http://static.fjcdn.com/pictures/funny_22d73a_372351.jpg

slide-136
SLIDE 136

Cache Loading Strategy: CRUD

http://www.lostwackys.com/wacky-packages/WackyAds/capn-crud.htm

slide-137
SLIDE 137

Cache Loading Strategy: CRUD

Deletions can be tricky YARRRRRR!

http://www.lostwackys.com/wacky-packages/WackyAds/capn-crud.htm

slide-138
SLIDE 138

Cache Loading Strategy: CRUD

Deletions can be tricky Avoid full synchronization YARRRRRR!

http://www.lostwackys.com/wacky-packages/WackyAds/capn-crud.htm

slide-139
SLIDE 139

Cache Loading Strategy: CRUD

  • Deletions can be tricky
  • Avoid full synchronization
  • Consider loading cache in small batches. Use one container per partition

YARRRRRR!

http://www.lostwackys.com/wacky-packages/WackyAds/capn-crud.htm

slide-140
SLIDE 140

Concurrent Locking with Trove Map

public class LongCache<V> {
  private TLongObjectMap<V> map = new TLongObjectHashMap<V>();
  private ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private Lock r = lock.readLock(), w = lock.writeLock();

  // Many readers may hold the read lock at once.
  public V get(long k) {
    r.lock();
    try { return map.get(k); } finally { r.unlock(); }
  }

  // Writers take the exclusive write lock.
  public V update(long k, V v) {
    w.lock();
    try { return map.put(k, v); } finally { w.unlock(); }
  }

  public V remove(long k) {
    w.lock();
    try { return map.remove(k); } finally { w.unlock(); }
  }
}

slide-141
SLIDE 141

Cache loading optimizations

slide-142
SLIDE 142

Cache loading optimizations

  • Keep local copies
  • Periodically generate serialized data/state

I am “cooking” the data sets. Ha!

slide-143
SLIDE 143

Cache loading optimizations

  • Keep local copies
  • Periodically generate serialized data/state
  • Validate with CRC or hash

I am “cooking” the data sets. Ha!
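Validating a local copy against a published checksum can be done with the JDK's CRC32 class. A minimal sketch, assuming the expected checksum is shipped alongside the dataset (the class and method names are illustrative):

```java
import java.util.zip.CRC32;

/** Hypothetical helper for checking a cached dataset file before loading it. */
final class DatasetChecksum {
    /** Computes the CRC-32 of the given bytes. */
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** True when the data matches the checksum published with the dataset. */
    static boolean isValid(byte[] data, long expectedCrc) {
        return crc32(data) == expectedCrc;
    }
}
```

CRC-32 catches accidental corruption such as truncated downloads; it is not a defense against deliberate tampering, for which a cryptographic hash is the better fit.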

slide-144
SLIDE 144

Dependent Caches

service instance

product summary

offers

matching predictions

service status aggregator (servlet) dependencies load balancer health check

slide-145
SLIDE 145


slide-146
SLIDE 146


slide-147
SLIDE 147


slide-148
SLIDE 148


slide-149
SLIDE 149

Deployment Cell Status

deployment cell cell status aggregator load balancer health check webapp status aggregator service 1 status aggregator service 2 status aggregator HTTP or JMX

slide-150
SLIDE 150

Hierarchical Status Aggregation