slide-1
SLIDE 1

Tailor-S: Look What You Made Me Do!

Vadim Semenov Software Engineer @ Datadog vadim@datadoghq.com

1

slide-2
SLIDE 2

2

slide-3
SLIDE 3

3

slide-4
SLIDE 4

4

slide-5
SLIDE 5

5

slide-6
SLIDE 6

Table of contents

1. The original system and issues with it 2. Requirements for the new system 3. Decoupling of state and compute 4. State: Kafka-Connect 5. Compute: Spark 6. Testing 7. Sharding 8. Migrations 9. Results 10. In conclusion

6

slide-7
SLIDE 7

Table of contents

7

Welcome to New York It's been waitin' for you Welcome to New York, welcome to New York

slide-8
SLIDE 8

Payloads

Map (org_id, metric_id) → Kafka Topic/Partition

8

org_id, metric_id, timestamps, values, metadata

  • 1. The original system
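To make the payload model concrete, here is a minimal sketch of a payload record and of one plausible deterministic (org_id, metric_id) → topic/partition mapping. The field names and the hash-based routing are illustrative assumptions, not the exact production schema or partitioner.

    // Illustrative sketch; field names and the hash-based routing are assumptions.
    case class Payload(
      orgId: Long,
      metricId: Long,
      timestamps: Array[Long],
      values: Array[Double],
      metadata: Map[String, String]
    )

    // Every payload of a given (org_id, metric_id) lands on the same Kafka
    // topic/partition, so a single consumer owns all data for that metric.
    def partitionFor(orgId: Long, metricId: Long, numPartitions: Int): Int =
      Math.floorMod((orgId, metricId).hashCode, numPartitions)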
slide-9
SLIDE 9

Kafka Topic/Partition 0 Kafka Topic/Partition 1 Kafka Topic/Partition 2 Kafka Topic/Partition 3

Host/Consumer

File Descriptor per Metric ID File Descriptor per Metric ID File Descriptor per Metric ID

Encode & Compress Write Custom Binary File Format to S3 Every X hours

9

  • 1. The original system
slide-10
SLIDE 10
  • 1. The original system

10

slide-11
SLIDE 11

Host/Consumer

File Descriptor per Metric ID File Descriptor per Metric ID File Descriptor per Metric ID Encode & Compress Write Custom Binary File Format to S3

Max 1M file descriptors per host

11

  • 1. The original system
slide-12
SLIDE 12

Kafka Topic/Partition 0 Kafka Topic/Partition 1 Kafka Topic/Partition 2 Kafka Topic/Partition 3

Host/Consumer 0 Host/Consumer 1 Must set when previous consumer should stop and new start consuming, prone to mistakes

12

  • 1. The original system
slide-13
SLIDE 13

Kafka Topic/Partition 0 Kafka Topic/Partition 1 Kafka Topic/Partition 2 Kafka Topic/Partition 3

Host/Consumer

13

  • 1. The original system
slide-14
SLIDE 14

Kafka Topic/Partition 0 Kafka Topic/Partition 1 Kafka Topic/Partition 2 Kafka Topic/Partition 3

Host/Consumer 0 Host/Consumer 1

Underutilization

14

  • 1. The original system
slide-15
SLIDE 15

Kafka Topic/Partition 0

Host/Consumer 0

Once you get to one partition per host and 1M of file descriptors, there's pretty much no room to upscale

15

  • 1. The original system
slide-16
SLIDE 16

Kafka Topic/Partition 0

Host/Consumer 0

Have to start a new instance, reset offsets, replay data for the past X hours

16

  • 1. The original system
slide-17
SLIDE 17
  • 1. The original system
org_id, metric_id, timestamps, values, metadata

Payloads

Map (org_id, metric_id) → Kafka Topic/Partition

Difficult to know which orgs/metrics will be big, so this model is prone to creating hot/big topics/partitions

17

slide-18
SLIDE 18
  • 1. The original system
org_id, metric_id, timestamps, values, metadata

Payloads

Service (org_id, metric_id) → Kafka Topic/Partition 0, Kafka Topic/Partition 1. Automatically redirects payloads so that each Kafka topic/partition is equally sized

We have to consume all topics/partitions to get all data for a metric id

18

slide-19
SLIDE 19
  • 2. Requirements for the new system

Conceptual:

  • 1. Must work with the new partitioning schema

19

slide-20
SLIDE 20
  • 2. Requirements for the new system

Conceptual:

  • 1. Must work with the new partitioning schema
  • 2. Must be able to handle 10x growth (2x every year = 3

years)

20

slide-21
SLIDE 21
  • 2. Requirements for the new system

Conceptual:

  • 1. Must work with the new partitioning schema
  • 2. Must be able to handle 10x growth (2x every year = 3

years)

  • 3. Keep the cost at the same level as the existing system

21

slide-22
SLIDE 22
  • 2. Requirements for the new system

Conceptual:

  • 1. Must work with the new partitioning schema
  • 2. Must be able to handle 10x growth (2x every year = 3

years)

  • 3. Keep the cost at the same level as the existing system
  • 4. Must be as fast as the existing system

22

slide-23
SLIDE 23
  • 2. Requirements for the new system

Operational:

  • 1. Easily scalable without much manual intervention

23

slide-24
SLIDE 24
  • 2. Requirements for the new system

Operational:

  • 1. Easily scalable without much manual intervention
  • 2. Minimize impact on kafka (reduce data retention time)

24

slide-25
SLIDE 25
  • 2. Requirements for the new system

Operational:

  • 1. Easily scalable without much manual intervention
  • 2. Minimize impact on kafka (reduce data retention time)
  • 3. Be able to replay data easily

25

slide-26
SLIDE 26
  • 2. Requirements for the new system (RFC)

26

slide-27
SLIDE 27
  • 3. Decoupling state and compute

We need to load all topics/partitions to compose a single timeseries. Why not offload Kafka to somewhere else and then load the whole dataset with Spark?

27

  • Taylor Swift

photo by Jana Beamer https://www.flickr.com/photos/94347223@N07/

slide-28
SLIDE 28

Host/Consumer

File Descriptor per Metric ID File Descriptor per Metric ID File Descriptor per Metric ID Encode & Compress Write Custom Binary File Format to S3

28

  • 3. Decoupling state and compute
slide-29
SLIDE 29

Host/Consumer

File Descriptor per Metric ID File Descriptor per Metric ID File Descriptor per Metric ID Encode & Compress Write Custom Binary File Format to S3

State Compute

29

  • 3. Decoupling state and compute
slide-30
SLIDE 30

Kafka

Encode & Compress Write Custom Binary File Format to S3

State Compute

Storage Storage

30

  • 3. Decoupling state and compute
slide-31
SLIDE 31

Kafka

Encode & Compress Write Custom Binary File Format to S3

S3 S3 Kafka-Connect Spark

31

  • 3. Decoupling state and compute

State Compute

slide-32
SLIDE 32

Kafka

Encode & Compress Write Custom Binary File Format to S3

Tailors Secondary resolution data

S3 S3 Kafka-Connect Spark

32

  • 3. Decoupling state and compute
slide-33
SLIDE 33
  • 4. State: Kafka-Connect

https://docs.confluent.io/current/connect/index.html

A really simple consumer that writes payloads as-is to S3 every 10 minutes or once it hits 100k payloads. The goal is to deliver them to S3 as soon as possible with minimal overhead.

33
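For illustration only, a Confluent S3 sink connector configured roughly like this would behave as described above; the keys are standard settings of that connector, while the values and the exact format/bucket layout used by Tailor-S are assumptions.

    # Illustrative sketch, not the production config
    connector.class=io.confluent.connect.s3.S3SinkConnector
    topics=points-topic-0,points-topic-1
    storage.class=io.confluent.connect.s3.storage.S3Storage
    format.class=io.confluent.connect.s3.format.bytearray.ByteArrayFormat
    s3.bucket.name=example-metrics-payloads
    # 5 MiB multipart-upload buffers (see the s3.part.size note later on)
    s3.part.size=5242880
    # flush after 100k payloads or every 10 minutes, whichever comes first
    flush.size=100000
    rotate.schedule.interval.ms=600000
    timezone=UTC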

slide-34
SLIDE 34
  • 4. State: Kafka-Connect

Easy to operate: 1. "topics": "points-topic-0,points-topic-1" — simply add/remove topics and kafka-connect will rebalance everything across workers automatically.

34

slide-35
SLIDE 35
  • 4. State: Kafka-Connect

Easy to operate: 1. "topics": "points-topic-0,points-topic-1" — simply add/remove topics and kafka-connect will rebalance everything across workers automatically. 2. Add/remove workers and it rebalances itself

35

slide-36
SLIDE 36
  • 4. State: Kafka-Connect

Easy to operate:

1. "topics": "points-topic-0,points-topic-1" — simply add/remove topics and kafka-connect will rebalance everything across workers automatically.
2. Add/remove workers and it rebalances itself.
3. Stopping the system will push it back 10 minutes only — we can reduce kafka retention.

36

slide-37
SLIDE 37
  • 4. State: Kafka-Connect

Keeping an eye on memory and GC

37

slide-38
SLIDE 38
  • 4. State: Kafka-Connect

Every 10 minutes we write a lot of data

38

slide-39
SLIDE 39
  • 4. State: Kafka-Connect

Had to optimize writes: 1. Randomized key prefixes, to avoid having hot underlying S3 partitions

39

slide-40
SLIDE 40
  • 4. State: Kafka-Connect

Had to optimize writes: 1. Randomized key prefixes, to avoid having hot underlying S3 partitions 2. Parallelize multipart uploads (https://github.com/confluentinc/kafka-connect-storage-cloud/pull/231)

40

slide-41
SLIDE 41
  • 4. State: Kafka-Connect

Had to optimize writes:

1. Randomized key prefixes, to avoid having hot underlying S3 partitions
2. Parallelize multipart uploads (https://github.com/confluentinc/kafka-connect-storage-cloud/pull/231)
3. Figure out optimal size of buffers to avoid OOMs (we run with s3.part.size=5MiB)

41

slide-42
SLIDE 42
  • 4. State: Kafka-Connect

Had to optimize writes:

1. Randomized key prefixes, to avoid having hot underlying S3 partitions (sketched below)
2. Parallelize multipart uploads (https://github.com/confluentinc/kafka-connect-storage-cloud/pull/231)
3. Figure out optimal size of buffers to avoid OOMs (we run with s3.part.size=5MiB)
4. Still have lots of 503 Slow Down from S3, so we have exponential backoff for that and monitor retries

42
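As a sketch of optimization 1: prefix each object key with a couple of characters derived from a hash of the rest of the key, so that writes spread across S3's internal partitions instead of hammering one lexicographic range. The key layout below is hypothetical; only the hashed-prefix idea matters.

    import java.security.MessageDigest

    // Hypothetical key layout with a 2-byte (4 hex chars) randomized prefix.
    def s3Key(topic: String, partition: Int, startOffset: Long): String = {
      val base   = f"$topic/partition=$partition/offset=$startOffset%020d.zst"
      val md5    = MessageDigest.getInstance("MD5").digest(base.getBytes("UTF-8"))
      val prefix = md5.take(2).map(b => f"${b & 0xff}%02x").mkString
      s"$prefix/$base"
    }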

slide-43
SLIDE 43
  • 4. State: Kafka-Connect

43

slide-44
SLIDE 44
  • 5. Compute: Spark

Lots of unknowns: reading 10T points is very difficult:

  • 1. Lots of objects, so we need to minimize GC

44

slide-45
SLIDE 45
  • 5. Compute: Spark

Lots of unknowns: reading 10T points is very difficult:

  • 1. Lots of objects, so we need to minimize GC
  • 2. Figure out how to utilize internal APIs of Spark

45

slide-46
SLIDE 46
  • 5. Compute: Spark

Lots of unknowns: reading 10T points is very difficult:

  • 1. Lots of objects, so we need to minimize GC
  • 2. Figure out how to utilize internal APIs of Spark
  • 3. Is it even possible with Spark??

46

slide-47
SLIDE 47
  • 5. Compute: Spark

Lots of unknowns: reading 10T points is very difficult:

  • 1. Lots of objects, so we need to minimize GC
  • 2. Figure out how to utilize internal APIs of Spark
  • 3. Is it even possible with Spark??
  • 4. Make it cost-efficient

47

slide-48
SLIDE 48
  • 5. Compute: Spark (Minimizing GC)

Reusing objects:

  • 1. Allocate a 1MiB ByteBuffer once we open a file

48

slide-49
SLIDE 49
  • 5. Compute: Spark (Minimizing GC)

Reusing objects:

  • 1. Allocate a 1MiB ByteBuffer once we open a file
  • 2. Keep decoding payloads (ZSTD) into the allocated

memory

49

slide-50
SLIDE 50
  • 5. Compute: Spark (Minimizing GC)

Reusing objects:

  • 1. Allocate a 1MiB ByteBuffer once we open a file
  • 2. Keep decoding payloads (ZSTD) into the allocated

memory

  • 3. Get data from the same byte buffer

50
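A minimal sketch of that reuse pattern (not the actual reader): a single 1 MiB buffer is allocated when the file is opened and every payload is decoded into it. decompressInto stands in for whatever ZSTD binding is used and is a hypothetical helper here.

    import java.nio.ByteBuffer

    // Sketch only; `decompressInto` is a hypothetical stand-in for the ZSTD binding.
    final class ReusingPayloadReader(decompressInto: (Array[Byte], ByteBuffer) => Int) {
      // 1. Allocate a 1 MiB buffer once, when the file is opened.
      private val buf = ByteBuffer.allocate(1 << 20)

      def foreachPayload(compressed: Iterator[Array[Byte]])(f: ByteBuffer => Unit): Unit =
        compressed.foreach { bytes =>
          buf.clear()
          // 2. Keep decoding ZSTD payloads into the same pre-allocated memory.
          val decodedLength = decompressInto(bytes, buf)
          buf.position(0)
          buf.limit(decodedLength)
          // 3. Read fields straight out of the shared buffer: no per-payload garbage.
          f(buf)
        }
    }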

slide-51
SLIDE 51
  • 5. Compute: Spark (FileFormat)
Implement org.apache.spark.sql.execution.datasources.FileFormat to provide a reader of org.apache.spark.sql.catalyst.InternalRow, then point the InternalRow directly to regions of memory in the allocated buffer

51
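One way to realize this is with UnsafeRow, Spark's binary InternalRow implementation: keep one reusable row object and re-point it at regions of the buffer instead of materializing objects. This assumes the region already holds data in UnsafeRow's layout; the actual reader could equally implement its own InternalRow subclass.

    import org.apache.spark.sql.catalyst.expressions.UnsafeRow
    import org.apache.spark.unsafe.Platform

    // Reuse one UnsafeRow wrapper and re-point it at regions of the decode buffer.
    final class RowView(numFields: Int) {
      private val row = new UnsafeRow(numFields)

      def at(buf: Array[Byte], offset: Int, length: Int): UnsafeRow = {
        // No copy: the row reads primitives directly out of `buf`.
        row.pointTo(buf, Platform.BYTE_ARRAY_OFFSET + offset, length)
        row
      }
    }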

slide-52
SLIDE 52
  • 5. Compute: Spark (FileFormat)

52

slide-53
SLIDE 53
  • 5. Compute: Spark (FileFormat)

Directly delivers primitives to Spark's memory, bypassing object creation completely

53

slide-54
SLIDE 54
  • 5. Compute: Spark (FileFormat)

54

slide-55
SLIDE 55
  • 5. Compute: Spark (FileFormat)

55

slide-56
SLIDE 56

Can't read files bigger than 2GiB into memory because arrays in java can't have more than 2^31 - 8 elements. And sometimes kafka-connect produces very big files

56

  • 5. Compute: Spark (Files > 2GiB)
slide-57
SLIDE 57
  • 5. Compute: Spark (Files > 2GiB)
  • 1. Copy a file locally

57

slide-58
SLIDE 58
  • 5. Compute: Spark (Files > 2GiB)
  • 1. Copy a file locally
  • 2. MMap it using com.indeed.util.mmap.MMapBuffer, i.e.

map the file into the virtual memory

58

slide-59
SLIDE 59
  • 5. Compute: Spark (Files > 2GiB)
  • 1. Copy a file locally
  • 2. MMap it using com.indeed.util.mmap.MMapBuffer
  • 3. Allocate an empty ByteBuffer using Java reflection

59

slide-60
SLIDE 60
  • 1. Copy a file locally
  • 2. MMap it using com.indeed.util.mmap.MMapBuffer
  • 3. Allocate an empty ByteBuffer using Java reflection
  • 4. Point ByteBuffer to a region of memory inside the

MMapBuffer

  • 5. Compute: Spark (Files > 2GiB)

60

slide-61
SLIDE 61
  • 5. Compute: Spark (Files > 2GiB)
  • 1. Copy a file locally
  • 2. MMap it using com.indeed.util.mmap.MMapBuffer
  • 3. Allocate an empty ByteBuffer using Java reflection
  • 4. Point ByteBuffer to a region of memory inside the

MMapBuffer

  • 5. Give ByteBuffer to ZSTD decompress

61

slide-62
SLIDE 62
  • 5. Compute: Spark (Files > 2GiB)
  • 1. Copy a file locally
  • 2. MMap it using com.indeed.util.mmap.MMapBuffer
  • 3. Allocate an empty ByteBuffer using Java reflection
  • 4. Point ByteBuffer to a region of memory inside the

MMapBuffer

  • 5. Give ByteBuffer to ZSTD decompress
  • 6. Everything thinks that it's a regular ByteBuffer but it's

actually a MMap'ed file

62
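A sketch of steps 3 and 4: the private java.nio.DirectByteBuffer(long, int) constructor can wrap an arbitrary memory address, such as a region inside the mmap'ed file, in what looks like a regular ByteBuffer. Obtaining the base address from com.indeed.util.mmap.MMapBuffer is elided here, and newer JDKs additionally need --add-opens java.base/java.nio=ALL-UNNAMED; treat this as an illustration, not the exact Tailor-S code.

    import java.nio.ByteBuffer

    // Wrap an arbitrary address (e.g. inside an mmap'ed region) in a ByteBuffer.
    def byteBufferAt(address: Long, capacity: Int): ByteBuffer = {
      val ctor = Class.forName("java.nio.DirectByteBuffer")
        .getDeclaredConstructor(classOf[Long], classOf[Int])
      ctor.setAccessible(true)
      // Downstream code (e.g. the ZSTD decompressor) sees a normal ByteBuffer.
      ctor.newInstance(Long.box(address), Int.box(capacity)).asInstanceOf[ByteBuffer]
    }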

slide-63
SLIDE 63
  • 5. Compute: Spark (Files > 2GiB)

63

slide-64
SLIDE 64
  • 5. Compute: Spark (Files > 2GiB)

Some files are very big, so we need to read them in parallel.

  • 1. Set spark.sql.files.maxPartitionBytes=1GB

64

slide-65
SLIDE 65
  • 5. Compute: Spark (Files > 2GiB)

Some files are very big, so we need to read them in parallel.

  • 1. Set spark.sql.files.maxPartitionBytes=1GB
  • 2. Write length,payload,length,payload,length,payload

65

slide-66
SLIDE 66
  • 5. Compute: Spark (Files > 2GiB)

Some files are very big, so we need to read them in parallel.

  • 1. Set spark.sql.files.maxPartitionBytes=1GB
  • 2. Write length,payload,length,payload,length,payload
  • 3. Each reader will have startByte/endByte

66

slide-67
SLIDE 67
  • 5. Compute: Spark (Files > 2GiB)

Some files are very big, so we need to read them in parallel.

  • 1. Set spark.sql.files.maxPartitionBytes=1GB
  • 2. Write length,payload,length,payload,length,payload
  • 3. Each reader will have startByte/endByte
  • 4. Keep skipping payloads until >= startByte

67
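A sketch of how such a split reader can work over the length-prefixed layout. The 4-byte length prefix and the ownership rule (a record belongs to the split that contains its first byte) are assumptions about the format.

    import java.io.DataInputStream

    // Handle only the records whose first byte falls inside [startByte, endByte).
    def readSplit(in: DataInputStream, startByte: Long, endByte: Long)
                 (handle: Array[Byte] => Unit): Unit = {
      var pos = 0L
      while (pos < endByte) {
        val len = in.readInt()                    // length prefix
        if (pos >= startByte) {
          val payload = new Array[Byte](len)      // record owned by this split
          in.readFully(payload)
          handle(payload)
        } else {
          var toSkip = len.toLong                 // not ours yet: keep skipping
          while (toSkip > 0) toSkip -= in.skip(toSkip)
        }
        pos += 4L + len
      }
    }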

slide-68
SLIDE 68
  • 5. Compute: Spark (Files > 2GiB)

Because of all these tricks we have to track allocation/deallocation of memory in our custom reader. It's very memory-efficient: it doesn't use more than 4GiB per executor

68

slide-69
SLIDE 69
  • 5. Compute: Spark (Internal APIs)

DataSet.map(obj => …)

  • 1. must create objects

69

slide-70
SLIDE 70
  • 5. Compute: Spark (Internal APIs)

DataSet.map(obj => …)

  • 1. must create objects
  • 2. copies primitives from Spark Memory (internal spark

representation)

70

slide-71
SLIDE 71
  • 5. Compute: Spark (Internal APIs)

DataSet.map(obj => …)

  • 1. must create objects
  • 2. copies primitives from Spark Memory (internal spark

representation)

  • 3. has schema

71

slide-72
SLIDE 72
  • 5. Compute: Spark (Internal APIs)

DataSet.map(obj => …)

  • 1. must create objects
  • 2. copies primitives from Spark Memory (internal spark

representation)

  • 3. has schema
  • 4. type-safe

72

slide-73
SLIDE 73
  • 5. Compute: Spark (Internal APIs)

DataSet.queryExecution.toRdd (InternalRow => …)

  • 1. doesn't create objects

73

slide-74
SLIDE 74
  • 5. Compute: Spark (Internal APIs)

DataSet.queryExecution.toRdd (InternalRow => …)

  • 1. doesn't create objects
  • 2. doesn't copy primitives

74

slide-75
SLIDE 75
  • 5. Compute: Spark (Internal APIs)

DataSet.queryExecution.toRdd (InternalRow => …)

  • 1. doesn't create objects
  • 2. doesn't copy primitives
  • 3. has no schema

75

slide-76
SLIDE 76
  • 5. Compute: Spark (Internal APIs)

DataSet.queryExecution.toRdd (InternalRow => …)

  • 1. doesn't create objects
  • 2. doesn't copy primitives
  • 3. has no schema
  • 4. not type-safe, you need to know position of all fields,

easy to shoot yourself in the foot

76

slide-77
SLIDE 77
  • 5. Compute: Spark (Internal APIs)

DataSet.queryExecution.toRdd (InternalRow => …)

  • 1. doesn't create objects
  • 2. doesn't copy primitives
  • 3. has no schema
  • 4. not type-safe, you need to know position of all fields
  • 5. InternalRow has direct access to Spark memory

77
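For illustration, both APIs side by side over an assumed schema (org_id BIGINT, metric_id BIGINT, value DOUBLE); the field positions below are therefore assumptions.

    import org.apache.spark.sql.{Dataset, Encoders, Row}

    def sumWithObjects(ds: Dataset[Row]): Double =
      // Dataset.map: type-safe, but deserializes every row into objects.
      ds.map(_.getDouble(2))(Encoders.scalaDouble).reduce(_ + _)

    def sumWithInternalRows(ds: Dataset[Row]): Double =
      // queryExecution.toRdd: RDD[InternalRow] reading straight from Spark's
      // memory format; no objects, no copies, but you must know field positions.
      ds.queryExecution.toRdd.map(_.getDouble(2)).reduce(_ + _)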

slide-78
SLIDE 78
  • 5. Compute: Spark (Internal APIs)

78

slide-79
SLIDE 79
  • 5. Compute: Spark (Memory)

spark.executor.memory = 150g
spark.yarn.executor.memoryOverhead = 70g
spark.memory.offHeap.enabled = true
spark.memory.offHeap.size = 100g

79

slide-80
SLIDE 80
  • 5. Compute: Spark (GC)
  • offheap=false (default setting): almost 50% of time is spent in GC
  • offheap=true: GC time drops down to 20%

80

Here we only compare the ratio of GC time to task time; the screenshots were not taken at the same point within the job

slide-81
SLIDE 81
  • 5. Compute: Spark (GC)

81

time spent in GC = 63.8/1016.3 = 6.2%

slide-82
SLIDE 82
  • 5. Compute: Spark (GC)
Overall, GC is now ~0.3% of overall CPU time

82

slide-83
SLIDE 83

83

Water break

slide-84
SLIDE 84
  • 6. Testing
  • 1. Unit tests

84

slide-85
SLIDE 85
  • 6. Testing
  • 1. Unit tests

85

slide-86
SLIDE 86
  • 6. Testing
  • 1. Unit tests
  • 2. Integration tests

86

slide-87
SLIDE 87
  • 6. Testing
  • 1. Unit tests
  • 2. Integration tests

87

slide-88
SLIDE 88
  • 6. Testing
  • 1. Unit tests
  • 2. Integration tests
  • 3. Staging environment

88

slide-89
SLIDE 89
  • 6. Testing
  • 1. Unit tests
  • 2. Integration tests
  • 3. Staging environment
  • 4. Load-testing

89

slide-90
SLIDE 90
  • 6. Testing
  • 1. Unit tests
  • 2. Integration tests
  • 3. Staging environment
  • 4. Load-testing
  • 5. Slowest parts

90

slide-91
SLIDE 91
  • 6. Testing
  • 1. Unit tests
  • 2. Integration tests
  • 3. Staging environment
  • 4. Load-testing
  • 5. Slowest parts
  • 6. Checking data correctness

91

slide-92
SLIDE 92
  • 6. Testing
  • 1. Unit tests
  • 2. Integration tests
  • 3. Staging environment
  • 4. Load-testing
  • 5. Slowest parts
  • 6. Checking data correctness
  • 7. Game days

92

slide-93
SLIDE 93
  • 6. Testing (Load testing)

Once we had a working prototype, we started doing load testing to make sure that the new system is going to work for the next 3 years.

  • 1. Throw 10x data
  • 2. See what is slow/what breaks, write it down
  • 3. Estimate cost

93

slide-94
SLIDE 94
  • 6. Testing (Slowest parts)

Have a good understanding of the slowest/most skewed parts of the job, put timers around them, and have historical data to compare against. We also know the limits of those parts and when to start optimizing them.

94
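As a sketch of what "timers around the slowest parts" can look like inside the job (the println is a stand-in for a real metric):

    // Wrap a named stage, measure it, and emit the duration so it can be
    // compared against historical runs.
    def timed[T](stage: String)(body: => T): T = {
      val start = System.nanoTime()
      try body
      finally {
        val millis = (System.nanoTime() - start) / 1e6
        println(f"stage=$stage duration_ms=$millis%.1f")   // stand-in for a real metric
      }
    }

    // usage (hypothetical stage name): timed("decode-payloads") { decodeAllFiles() }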

slide-95
SLIDE 95
  • 6. Testing (Slowest parts)

95

slide-96
SLIDE 96
  • 6. Testing (Easter egg)

96

slide-97
SLIDE 97
  • 6. Testing (Data correctness)

We ran the new system using all the data that we have and then did a one-to-one join to see which points were missing/different. This allowed us to find some edge cases, which we were able to eliminate

97
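A sketch of that comparison in Spark: full-outer-join the old and new outputs on the identifying columns and keep the rows that are missing on one side or differ. Column names are assumptions.

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    def mismatches(oldOut: DataFrame, newOut: DataFrame): DataFrame = {
      val keys = Seq("org_id", "metric_id", "timestamp")
      oldOut.as("o").join(newOut.as("n"), keys, "full_outer")
        .where(col("o.value").isNull || col("n.value").isNull ||
               col("o.value") =!= col("n.value"))
    }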

slide-98
SLIDE 98
  • 6. Testing (Game Days)

"Game days" are when we test that our systems are resilient to errors in the ways we expect, and that we have proper monitoring of these situations. If you're not familiar with this idea, https://stripe.com/blog/game-day-exercises-at-stripe is a good intro. 1. Come up with scenarios (a node is down, the whole service is down, etc.) 2. Expected behavior? 3. Run scenarios 4. Write down what happened 5. Summarize key lessons

98

slide-99
SLIDE 99
  • 6. Testing (Game Days)

99

slide-100
SLIDE 100
  • 6. Testing (Game Days)

100

slide-101
SLIDE 101
  • 7. Sharding

Once we confirmed that our prototype works using the whole volume of data, we decided to split the job into shards:

  • 1. We use spot instances, so losing a single job for a shard

will not result in losing all progress.

101

slide-102
SLIDE 102
  • 7. Sharding

Once we confirmed that our prototype works using the whole volume of data, we decided to split the job into shards:

  • 1. We use spot instances, so losing a single job for a shard

will not result in losing all progress.

  • 2. If for some reason there's an edge case, it'll only affect

a single shard.

102

slide-103
SLIDE 103
  • 7. Sharding

Once we confirmed that our prototype works using the whole volume of data, we decided to split the job into shards:

  • 1. We use spot instances, so losing a single job for a shard

will not result in losing all progress.

  • 2. If for some reason there's an edge case, it'll only affect

a single shard.

  • 3. Ability to process shards on completely separate

clusters.

103

slide-104
SLIDE 104
  • 7. Sharding

We need to identify independent blocks of data; in our case that's the org level, since one org's data doesn't depend on another org's data.

Kafka-Connect, using a config file, decides which shard an org goes to:

  • 1. org-mod-X (we have 64 shared shards)
  • 2. org-X (the org's own shard)

104
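A sketch of that routing rule; the real mapping lives in the Kafka-Connect config file, so this function is purely illustrative.

    // Orgs with a dedicated shard get "org-<id>"; everyone else goes to one of
    // the 64 shared "org-mod-<n>" shards.
    def shardFor(orgId: Long, dedicatedOrgs: Set[Long], sharedShards: Int = 64): String =
      if (dedicatedOrgs.contains(orgId)) s"org-$orgId"
      else s"org-mod-${orgId % sharedShards}"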

slide-105
SLIDE 105
  • 7. Sharding

We know that a single job can process all the data we have, and now we have 64 shards, which means a single shard can grow up to 64x before we reach the same volume again. If our data volume continues doubling every year, that is enough for the next 6 years (2^6 = 64), after which we can increase the number of shards.

105

slide-106
SLIDE 106
  • 8. Migrations

In order to replace the existing system we need to do lots of things:

  • 1. Run both systems alongside

106

slide-107
SLIDE 107
  • 8. Migrations

In order to replace the existing system we need to do lots of things:

  • 1. Run both systems alongside
  • 2. Figure out a release plan and a rollback plan

107

slide-108
SLIDE 108
  • 8. Migrations

In order to replace the existing system we need to do lots of things:

  • 1. Run both systems alongside
  • 2. Figure out a release plan and a rollback plan
  • 3. Make sure that systems that depend on our data work

fine with both

108

slide-109
SLIDE 109
  • 8. Migrations

In order to replace the existing system we need to do lots of things:

  • 1. Run both systems alongside
  • 2. Figure out a release plan and a rollback plan
  • 3. Make sure that systems that depend on our data work

fine with both

  • 4. Do partial migrations of customers

109

slide-110
SLIDE 110
  • 8. Migrations

In order to replace the existing system we need to do lots of things:

  • 1. Run both systems alongside
  • 2. Figure out a release plan and a rollback plan
  • 3. Make sure that systems that depend on our data work

fine with both

  • 4. Do partial migrations of customers
  • 5. Check everything

110

slide-111
SLIDE 111
  • 8. Migrations

In order to replace the existing system we need to do lots of things:

  • 1. Run both systems alongside
  • 2. Figure out a release plan and a rollback plan
  • 3. Make sure that systems that depend on our data work

fine with both

  • 4. Do partial migrations of customers
  • 5. Check everything
  • 6. Do final migration

111

slide-112
SLIDE 112
  • 8. Migrations (Run both systems alongside)
  • 1. As close as possible to production, same volume of

data

112

slide-113
SLIDE 113
  • 8. Migrations (Run both systems alongside)
  • 1. As close as possible to production, same volume of

data

  • 2. Output to a completely separate location, no one uses

this data yet

113

slide-114
SLIDE 114
  • 8. Migrations (Run both systems alongside)
  • 1. As close as possible to production, same volume of

data

  • 2. Output to a completely separate location, no one uses

this data yet

  • 3. Make sure that there's no discrepancies with existing

data

114

slide-115
SLIDE 115
  • 8. Migrations (Run both systems alongside)
  • 1. As close as possible to production, same volume of

data

  • 2. Output to a completely separate location, no one uses

this data yet

  • 3. Make sure that there's no discrepancies with existing

data

  • 4. Treat every incident as a real production incident

115

slide-116
SLIDE 116
  • 8. Migrations (Run both systems alongside)
  • 1. As close as possible to production, same volume of

data

  • 2. Output to a completely separate location, no one uses

this data yet

  • 3. Make sure that there's no discrepancies with existing

data

  • 4. Treat every incident as a real production incident
  • 5. Write postmortems

116

slide-117
SLIDE 117
  • 8. Migrations (Run both systems alongside)

This approach allowed us:

  • 1. Find bottlenecks that we previously didn't see/know

about

117

slide-118
SLIDE 118
  • 8. Migrations (Run both systems alongside)

This approach allowed us:

  • 1. Find bottlenecks that we previously didn't see/know

about

  • 2. Figure out what kind of monitoring we were missing

118

slide-119
SLIDE 119
  • 8. Migrations (Run both systems alongside)

This approach allowed us:

  • 1. Find bottlenecks that we previously didn't see/know

about

  • 2. Figure out what kind of monitoring we were missing
  • 3. Get people familiar with operating the system without

affecting production yet

119

slide-120
SLIDE 120
  • 8. Migrations (Run both systems alongside)

This approach allowed us:

  • 1. Find bottlenecks that we previously didn't see/know

about

  • 2. Figure out what kind of monitoring we were missing
  • 3. Get people familiar with operating the system without

affecting production yet

  • 4. Figure out what additional tooling we need

120

slide-121
SLIDE 121
  • 8. Migrations (Release/Rollback plans)

Very important to have detailed plans

121

slide-122
SLIDE 122
  • 8. Migrations (Dependent systems)
  • 1. Have a mechanism to switch some customers to new

files and back

122

slide-123
SLIDE 123
  • 8. Migrations (Dependent systems)
  • 1. Have a mechanism to switch some customers to new

files and back

  • 2. Have a way for dependent pipelines to load some data

from the old system and some from the new system

123

slide-124
SLIDE 124
  • 8. Migrations (Dependent systems)
  • 1. Have a mechanism to switch some customers to new

files and back

  • 2. Have a way for dependent pipelines to load some data

from the old system and some from the new system

  • 3. Make sure that outputs of dependent pipelines are as

expected (we had to run those pipelines separately and then compare outputs)

124

slide-125
SLIDE 125
  • 8. Migrations (Partial migrations of customers)
  • 1. It's very expensive to run both systems alongside

125

slide-126
SLIDE 126
  • 8. Migrations (Partial migrations of customers)
  • 1. It's very expensive to run both systems alongside
  • 2. We decided to migrate some customers from old

system to the new one

  • a. Our org completely for a month and see how it goes
  • b. Big customer completely after a month

126

slide-127
SLIDE 127
  • 8. Migrations (Partial migrations of customers)
  • 1. It's very expensive to run both systems alongside
  • 2. We decided to migrate some customers from old

system to the new one

  • a. Our org completely for a month and see how it goes
  • b. Big customer completely after a month
  • 3. Had to build a way for old/new systems to stop/start

writing data for certain customers after certain timestamps

127
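A sketch of what such a per-org cutover can look like (the representation is an assumption): every migrated org gets a timestamp, the old system keeps points strictly before it, and the new system owns points from it onwards.

    // cutovers: org_id -> migration timestamp (ms). Orgs absent from the map
    // are still fully on the old system.
    def ownedByNewSystem(orgId: Long, pointTsMs: Long, cutovers: Map[Long, Long]): Boolean =
      cutovers.get(orgId).exists(pointTsMs >= _)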

slide-128
SLIDE 128
  • 8. Migrations (Partial migrations of customers)
  • 1. Difficult to implement and maintain migration

timestamps for each org

128

slide-129
SLIDE 129
  • 8. Migrations (Partial migrations of customers)
  • 1. Difficult to implement and maintain migration

timestamps for each org

  • 2. Certain things didn't have versioning, so we had to add

it

129

slide-130
SLIDE 130
  • 8. Migrations (Partial migrations of customers)
  • 1. Difficult to implement and maintain migration

timestamps for each org

  • 2. Certain things didn't have versioning, so we had to add

it

  • 3. For downstream pipelines everything must look like

nothing happened

130

slide-131
SLIDE 131
  • 8. Migrations (Partial migrations of customers)
  • 1. Difficult to implement and maintain migration

timestamps for each org

  • 2. Certain things didn't have versioning, so we had to add

it

  • 3. For downstream pipelines everything must look like

nothing happened

  • 4. Lots of integration tests with migration timestamps

131

slide-132
SLIDE 132
  • 8. Migrations (Final migration)
  • 1. Picked a date, added additional integration tests
  • 2. Tested on staging
  • 3. Rolled in production
  • 4. Let the old system run for a week
  • 5. Kill the old system
  • 6. Cleanup

132

slide-133
SLIDE 133
  • 9. Results (Cost)

133

Old system:                       100%
New system:
  Kafka Connect compute costs:     13%
  Kafka Connect storage costs:     39%
  Spark compute costs:             77%
  Kafka retention savings:       -163%
  Total without Kafka savings:    129%
  Total:                          -34%
Savings:                          134%

slide-134
SLIDE 134
  • 9. Results (Speed)

134

slide-135
SLIDE 135
  • 9. Results (high-level)
  • 1. ✅ Must work with new partitioning schema

135

slide-136
SLIDE 136
  • 9. Results (high-level)
  • 1. ✅ Must work with new partitioning schema
  • 2. ✅ Must be able to handle 10x growth (2x every year =

3 years)

136

slide-137
SLIDE 137
  • 9. Results (high-level)
  • 1. ✅ Must work with new partitioning schema
  • 2. ✅ Must be able to handle 10x growth (2x every year =

3 years)

  • 3. ✅ Keep the cost at the same level as the existing

system

137

slide-138
SLIDE 138
  • 9. Results (high-level)
  • 1. ✅ Must work with new partitioning schema
  • 2. ✅ Must be able to handle 10x growth (2x every year =

3 years)

  • 3. ✅ Keep the cost at the same level as the existing

system

  • 4. ✅ Must be as fast as the existing system

138

slide-139
SLIDE 139
  • 9. Results (Operational)

139

1. ✅ Easily scalable without much manual intervention
   a. Both storage and compute can scale independently

slide-140
SLIDE 140
  • 9. Results (Operational)

140

1. ✅ Easily scalable without much manual intervention
   a. Both storage and compute can scale independently
2. ✅ Minimize impact on kafka
   a. We reduced data retention in kafka
   b. We actually store kafka data in S3 2x longer, so we actually increased retention

slide-141
SLIDE 141
  • 9. Results (Operational)

141

1. ✅ Easily scalable without much manual intervention
   a. Both storage and compute can scale independently
2. ✅ Minimize impact on kafka
   a. We reduced data retention in kafka
   b. We actually store kafka data in S3 2x longer, so we actually increased retention
3. ✅ Be able to replay data easily
   a. We had to replay kafka-connect and spark jobs many times and it was easy

slide-142
SLIDE 142
  • 9. Results (Operational)

142

slide-143
SLIDE 143
  • 10. In conclusion

143

  • 1. Documents/RFCs/Plans
slide-144
SLIDE 144
  • 10. In conclusion

144

  • 1. Documents/RFCs/Plans
  • 2. Lots of testing
slide-145
SLIDE 145
  • 10. In conclusion

145

  • 1. Documents/RFCs/Plans
  • 2. Lots of testing
  • 3. Difficult migrations
slide-146
SLIDE 146
  • 10. In conclusion

146

  • 1. Documents/RFCs/Plans
  • 2. Lots of testing
  • 3. Difficult migrations
  • 4. Many engineering obstacles
slide-147
SLIDE 147
  • 10. In conclusion

147

  • 1. Documents/RFCs/Plans
  • 2. Lots of testing
  • 3. Difficult migrations
  • 4. Many engineering obstacles
  • 5. Constant cost/speed forecasting
slide-148
SLIDE 148

Vadim Semenov

148

email1: vadim@datadoghq.com
email2: _@databuryat.com
linkedin/twitter: databuryat
venmo: vados