SLIDE 1

The Game of Big Data

Analytics Infrastructure at KIXEYE

Randy Shoup

@randyshoup linkedin.com/in/randyshoup

QCon New York, June 13 2014

SLIDE 2

Free-to-Play Real-time Strategy Games

  • Web and mobile
  • Strategy and tactics
  • Really real-time ☺
  • Deep relationships with players
  • Constantly evolving gameplay, feature set, economy, balance
  • >500 employees worldwide
SLIDE 3

Intro: Analytics at KIXEYE

  • User Acquisition
  • Game Analytics
  • Retention and Monetization
  • Analytic Requirements

SLIDE 4

User Acquisition

Goal: ELTV > acquisition cost

  • User’s estimated lifetime value is more than it costs to acquire that user
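As a minimal illustration of that inequality (the numbers and the geometric-decay retention model below are assumptions for the sketch, not KIXEYE figures):

```python
# Back-of-the-envelope ELTV vs. acquisition-cost check.
# ARPU, retention, and CAC values are purely illustrative.

def estimated_ltv(arpu_per_month: float, monthly_retention: float) -> float:
    """Geometric-decay lifetime value: sum of arpu * retention^n
    over n = 0, 1, 2, ..., which closes to arpu / (1 - retention)."""
    return arpu_per_month / (1.0 - monthly_retention)

def worth_acquiring(arpu: float, retention: float, cac: float) -> bool:
    """Acquire only when estimated lifetime value exceeds the cost."""
    return estimated_ltv(arpu, retention) > cac

# $1.50 ARPU/month at 60% month-over-month retention -> $3.75 ELTV,
# so a $3.00 acquisition cost pays back:
print(estimated_ltv(1.50, 0.60))          # 3.75
print(worth_acquiring(1.50, 0.60, 3.00))  # True
```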

Mechanisms

  • Publisher Campaigns
  • On-Platform Recommendations
SLIDE 5

Game Analytics

Goal: Measure and Optimize “Fun”

  • Difficult to define
  • Includes gameplay, feature set, performance, bugs
  • All metrics are just proxies for fun (!)

Mechanisms

  • Game balance
  • Match balance
  • Economy management
  • Player typology
SLIDE 6

Retention and Monetization

Goal: Sustainable Business

  • Monetization drivers
  • Revenue recognition

Mechanisms

  • Pricing and Bundling
  • Tournament (“Event”) Design
  • Recommendations
SLIDE 7

Analytic Requirements

  • Data Integrity and Availability
  • Cohorting
  • Controlled experiments
  • Deep ad-hoc analysis
SLIDE 8

“Deep Thought”

  • V1 Analytic System
  • Goals
  • Core Capabilities
  • Implementation

SLIDE 9

“Deep Thought”

  • V1 Analytic System
  • Goals
  • Core Capabilities
  • Implementation

SLIDE 10

V1 Analytic System

Grew Organically

  • Built originally for user acquisition
  • Progressively grown to much more

Idiosyncratic mix of languages, systems, tools

  • Log files -> Chukwa -> Hadoop -> Hive -> MySQL
  • PHP for reports and ETL
  • Single massive table with everything
SLIDE 11

V1 Analytic System

Many Issues

  • Very slow to query
  • No data standardization or validation
  • Very difficult to add a new game, report, ETL
  • Extremely difficult to backfill on error or outage
  • Difficult for analysts to use; impossible for PMs, designers, etc.

… but we survived (!)

SLIDE 12

“Deep Thought”

  • V1 Analytic System
  • Goals
  • Core Capabilities
  • Implementation

SLIDE 13

Goals of Deep Thought

Independent Scalability

  • Logically separate, independently scalable tiers

Stability and Outage Recovery

  • Tiers can completely fail with no data loss
  • Every step idempotent and replayable

Standardization

  • Standardized event types, fields, queries, reports
SLIDE 14

Goals of Deep Thought

In-Stream Event Processing

  • Sessionalization, Dimensionalization, Cohorting

Queryability

  • Structures are simple to reason about
  • Simple things are simple
  • Analysts, Data Scientists, PMs, Game Designers, etc.

Extensibility

  • Easy to add new games, events, fields, reports
SLIDE 15

“Deep Thought”

  • V1 Analytic System
  • Goals
  • Core Capabilities
  • Implementation

SLIDE 16

Core Capabilities

  • Sessionalization
  • Dimensionalization
  • Cohorting
SLIDE 17

Sessionalization

All events are part of a “session”

  • Explicit start event, optional stop event
  • Game-defined semantics

Event Batching

  • Events arrive in batch, associated with session
  • Pipeline computes batch-level metrics, disaggregates events
  • Can optionally attach batch-level metrics to each event
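A minimal sketch of the batching flow, assuming an illustrative batch shape (the real wire format isn't shown in the talk): batch-level metrics are computed once, then each event is emitted with its session id and, optionally, those metrics attached.

```python
# Disaggregate a session-associated batch into individual events,
# stamping batch-level metrics onto each one. Field names are
# illustrative assumptions.
def disaggregate(batch: dict) -> list[dict]:
    events = batch["events"]
    batch_metrics = {
        "batch_size": len(events),
        "batch_span_ms": events[-1]["ts"] - events[0]["ts"] if events else 0,
    }
    return [{**event, "session_id": batch["session_id"], **batch_metrics}
            for event in events]

batch = {"session_id": "s-42",
         "events": [{"ts": 1000, "type": "attack"},
                    {"ts": 4000, "type": "build"}]}
for event in disaggregate(batch):
    print(event)
```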

SLIDE 18

Sessionalization

Time-Series Aggregations

  • Configurable metrics
    • 1-day X, 7-day X, lifetime X
    • Total attacks, total time played
  • Accumulated in-stream
    • Prior aggregate + batch delta
    • Faster to calculate in-stream vs. Map-Reduce
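A sketch of the accumulation rule (new aggregate = prior aggregate + batch delta), here only for lifetime totals; windowed 1-day/7-day variants would need per-day state. Metric names are assumptions.

```python
from collections import defaultdict

# player_id -> metric -> running lifetime aggregate
lifetime = defaultdict(lambda: defaultdict(float))

def apply_batch(player_id: str, batch_deltas: dict) -> dict:
    """Update running aggregates in-stream: prior value + batch delta.
    No MapReduce recomputation over history is needed."""
    for metric, delta in batch_deltas.items():
        lifetime[player_id][metric] += delta
    return dict(lifetime[player_id])

apply_batch("p-7", {"total_attacks": 3, "seconds_played": 410})
print(apply_batch("p-7", {"total_attacks": 1, "seconds_played": 95}))
# {'total_attacks': 4.0, 'seconds_played': 505.0}
```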
SLIDE 19

Dimensionalization

Pipeline assigns unique numeric id to string enums

  • E.g., “twigs” resource → id 1234

Automatic mapping and assignment

  • Games log strings
  • Pipeline generates and maps ids
  • No configuration necessary

Fast dimensional queries

  • Join on integers, not strings
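A minimal sketch of the auto-assignment, assuming ids are handed out on first sight and strings are normalized before lookup (the actual normalization rules are not specified in the talk):

```python
class DimensionRegistry:
    """Assigns a stable numeric id to each string enum it sees;
    no per-game configuration required."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._next_id = 1

    def id_for(self, value: str) -> int:
        key = value.strip().lower()         # assumed normalization
        if key not in self._ids:
            self._ids[key] = self._next_id  # first sighting: new id
            self._next_id += 1
        return self._ids[key]

dims = DimensionRegistry()
print(dims.id_for("twigs"))   # 1
print(dims.id_for("TWIGS"))   # 1 - case variants share one id
print(dims.id_for("stone"))   # 2
# Downstream fact tables store and join on these integers, not strings.
```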
SLIDE 20

Dimensionalization

Metadata enumeration and manipulation

  • Easily enumerate all values for a field
  • Merge multiple values
    • “TWIGS” == “Twigs” == “twigs”

Metadata tagging

  • Can assign arbitrary tags to metadata
    • E.g., “Panzer 05” is {tank, mechanized infantry, event prize}
  • Enables custom views
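A sketch of metadata tagging, where the {tank, mechanized infantry, event prize} tags come from the slide but the storage shape and the numeric id are assumptions:

```python
# Map a dimension's numeric id to an arbitrary set of tags.
tags: dict[int, set[str]] = {}

def tag(dim_id: int, *labels: str) -> None:
    tags.setdefault(dim_id, set()).update(labels)

def ids_with_tag(label: str) -> list[int]:
    """A custom view: every dimension id carrying a given tag."""
    return [d for d, t in tags.items() if label in t]

PANZER_05 = 1234  # id previously assigned by the pipeline (illustrative)
tag(PANZER_05, "tank", "mechanized infantry", "event prize")
print(ids_with_tag("event prize"))  # [1234]
```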
SLIDE 21

Cohorting

Group players along any dimension / metric

  • Well beyond classic age-based cohorts

Core analytical building block

  • Experiment groups
  • User acquisition campaign tracking
  • Prospective modeling
  • Retrospective analysis
SLIDE 22

Cohorting

Set-based

  • Overlapping groups: >100, >200, etc.
  • Exclusive groups: (100-200), (200-500), etc.

Time-based

  • E.g., people who played in last 3 days
  • E.g., “whale” == ($$ > X) in last N days
  • Autoexpire from a group without explicit intervention
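A minimal sketch of a time-based cohort rule with auto-expiry, using the “whale” definition from the slide; thresholds and data shapes are illustrative:

```python
from datetime import datetime, timedelta

# (player_id, purchase time, amount) - toy purchase history
purchases = [
    ("p-1", datetime(2014, 6, 1), 120.0),
    ("p-2", datetime(2014, 5, 1), 500.0),   # old spend, outside the window
]

def is_whale(player_id: str, now: datetime,
             min_spend: float = 100.0, window_days: int = 30) -> bool:
    """'whale' == ($$ > X) in last N days; membership is evaluated
    from recent history, so players auto-expire from the cohort."""
    cutoff = now - timedelta(days=window_days)
    recent = sum(amount for pid, ts, amount in purchases
                 if pid == player_id and ts >= cutoff)
    return recent > min_spend

now = datetime(2014, 6, 13)
print(is_whale("p-1", now))  # True  - $120 inside the 30-day window
print(is_whale("p-2", now))  # False - spend aged out, no explicit removal
```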

SLIDE 23

“Deep Thought”

  • V1 Analytic System
  • Goals
  • Core Capabilities
  • Implementation

SLIDE 24

Implementation of Pipeline

  • Ingestion
  • Event Log
  • Transformation
  • Data Storage
  • Analysis and Visualization

[Pipeline diagram: Logging Service → Kafka → Importer / Session Store → Hadoop 2 → Hive / Redshift]

SLIDE 25

Ingestion: Logging Service

HTTP / JSON Endpoint

  • Play framework
  • Non-blocking, event-driven

Responsibilities

  • Message integrity via checksums
  • Durability via local disk persistence
  • Async batch writes to Kafka topics
    • {valid, invalid, unauth}
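A sketch of the routing logic only (the real endpoint is a Play application; the checksum algorithm, topic names, and auth check here are assumptions, and local-disk buffering plus the async Kafka write are omitted):

```python
import hashlib
import json

VALID, INVALID, UNAUTH = "events-valid", "events-invalid", "events-unauth"

def route(api_key: str, body: bytes, claimed_md5: str,
          known_keys: set) -> tuple:
    """Pick the Kafka topic for an incoming event batch."""
    if api_key not in known_keys:
        return UNAUTH, body
    if hashlib.md5(body).hexdigest() != claimed_md5:   # message integrity
        return INVALID, body
    try:
        json.loads(body)                               # well-formed JSON?
    except ValueError:
        return INVALID, body
    return VALID, body   # durably buffered, then batch-written to Kafka

body = b'{"session_id": "s-42", "events": []}'
print(route("k1", body, hashlib.md5(body).hexdigest(), {"k1"}))
```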


SLIDE 26

Event Log: Kafka

Persistent, replayable pipe of events

  • Events stored for 7 days

Responsibilities

  • Durability via replication and local disk streaming
  • Replayability via commit log
  • Scalability via partitioned brokers
  • Segment data for different types of processing
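A sketch of replay from the commit log using the kafka-python client (topic, partition, and broker address are assumptions; the original consumers were not necessarily Python):

```python
from kafka import KafkaConsumer, TopicPartition

def process(payload: bytes) -> None:
    """Stand-in for an idempotent downstream handler."""
    print(payload[:60])

consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
                         enable_auto_commit=False)
tp = TopicPartition("events-valid", 0)
consumer.assign([tp])
consumer.seek_to_beginning(tp)  # rewind into the 7-day retained history

# Re-consume; safe only because every downstream step is idempotent.
for _, records in consumer.poll(timeout_ms=1000).items():
    for record in records:
        process(record.value)
```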


SLIDE 27

Transformation: Importer

Consume Kafka topics, rebroadcast

  • E.g., consume batches, rebroadcast events

Responsibilities

  • Batch validation against JSON schema
    • Syntactic validation
    • Semantic validation (is this event possible?)
  • Batches -> events
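A sketch of batch validation with the jsonschema library; the schema is an illustrative stand-in for the real one, and the semantic check is a toy:

```python
from jsonschema import ValidationError, validate

BATCH_SCHEMA = {
    "type": "object",
    "required": ["session_id", "events"],
    "properties": {
        "session_id": {"type": "string"},
        "events": {"type": "array",
                   "items": {"type": "object", "required": ["type", "ts"]}},
    },
}

def import_batch(batch: dict) -> list:
    """Syntactic validation via schema, a toy semantic check, then
    disaggregation of the batch into individual events."""
    validate(instance=batch, schema=BATCH_SCHEMA)  # raises on bad shape
    for event in batch["events"]:
        if event["ts"] < 0:                        # "is this event possible?"
            raise ValidationError("negative timestamp")
    return [{**e, "session_id": batch["session_id"]}
            for e in batch["events"]]

print(import_batch({"session_id": "s-42",
                    "events": [{"type": "attack", "ts": 1000}]}))
```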


SLIDE 28

Transformation: Importer

Responsibilities (cont.)

  • Sessionalization
    • Assign event to session
    • Calculate time-series aggregates
  • Dimensionalization
    • String enum -> numeric id
    • Merge / coalesce different string representations into single id
  • Player metadata
    • Join player metadata from session store


SLIDE 29

Transformation: Importer

Responsibilities (cont.)

  • Cohorting
    • Process enter-cohort, exit-cohort events
    • Process A/B testing events
    • Evaluate cohort rules (e.g., spend thresholds)
    • Decorate events with cohort tags


SLIDE 30

Transformation: Session Store

Key-value store (Couchbase)

  • Fast, constant-time access to sessions, players

Responsibilities

  • Store Sessions, Players, Dimensions, Config
  • Lookup
  • Idempotent update
  • Store accumulated session-level metrics
  • Store player history
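A sketch of the idempotent-update idea keyed on a monotonically increasing session-version; a dict stands in for Couchbase, where a CAS-style operation would give the same guarantee:

```python
store: dict = {}   # session_id -> {"version": int, "metrics": dict}

def apply_update(session_id: str, version: int, metrics: dict) -> bool:
    """Apply a batch's session-level metrics exactly once; replaying
    the same (or an older) version is a no-op."""
    current = store.get(session_id, {"version": 0, "metrics": {}})
    if version <= current["version"]:
        return False                    # already applied: idempotent no-op
    for name, delta in metrics.items():
        current["metrics"][name] = current["metrics"].get(name, 0) + delta
    current["version"] = version
    store[session_id] = current
    return True

print(apply_update("s-42", 1, {"attacks": 3}))  # True  - applied
print(apply_update("s-42", 1, {"attacks": 3}))  # False - replay ignored
print(store["s-42"])
```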


SLIDE 31

Storage: Hadoop 2

Camus MR

  • Kafka -> HDFS every 3 minutes

append_events table

  • Append-only log of events
  • Each event has session-version for deduplication
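The dedup rule, sketched in Python although it would actually run as an MR/Hive job; key field names are assumptions:

```python
def dedupe(append_events: list) -> list:
    """Keep only the highest session-version row per (session, event),
    so replays of the append-only log collapse to one copy."""
    latest: dict = {}
    for row in append_events:
        key = (row["session_id"], row["event_id"])
        if (key not in latest
                or row["session_version"] > latest[key]["session_version"]):
            latest[key] = row
    return list(latest.values())

rows = [
    {"session_id": "s-42", "event_id": "e-1", "session_version": 1},
    {"session_id": "s-42", "event_id": "e-1", "session_version": 2},  # replay
]
print(dedupe(rows))   # only the session_version=2 row survives
```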


SLIDE 32

Storage: Hadoop 2

append_events -> base_events MR

  • Logical update of base_events
    • Update events with new metadata
    • Swap old partition for new partition
  • Replayable from beginning without duplication
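One way the logical update could look in HiveQL (table, column, and partition names are assumptions): rebuild a partition from append_events, deduplicating on session-version, then overwrite it atomically so a full replay creates no duplicates.

```python
# Illustrative HiveQL for the rebuild-and-swap of one day's partition.
hiveql = """
INSERT OVERWRITE TABLE base_events PARTITION (dt = '2014-06-13')
SELECT event.*   -- a real job would project away the helper rn column
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY session_id, event_id
                            ORDER BY session_version DESC) AS rn
  FROM append_events
  WHERE dt = '2014-06-13'
) event
WHERE event.rn = 1;
"""
print(hiveql)
```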


SLIDE 33

Storage: Hadoop 2

base_events table

  • Denormalized table of all events
  • Stores original JSON + decoration
  • Custom SerDes to query / extract JSON fields without materializing entire rows
  • Standardized event types → lots of functionality for free
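The custom SerDes aren't shown in the talk; Hive's built-in get_json_object illustrates the same idea of extracting one JSON field without materializing the whole row (column and event-type names are assumptions):

```python
hiveql = """
SELECT get_json_object(raw_json, '$.payload.resource') AS resource,
       COUNT(*) AS spends
FROM base_events
WHERE dt = '2014-06-13'
  AND event_type = 'spend'   -- a standardized event type
GROUP BY get_json_object(raw_json, '$.payload.resource');
"""
print(hiveql)
```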


SLIDE 34

Analysis and Visualization

Hive Warehouse

  • Normalized event-specific, game-specific stores
  • Aggregate metric data for reporting, analysis
  • Maintained through custom ETL
    • MR
    • Hive queries


SLIDE 35

Analysis and Visualization

Amazon Redshift

  • Fast ad-hoc querying

Tableau

  • Simple, powerful reporting


SLIDE 36

Come Join Us!

KIXEYE is hiring in SF, Seattle, Victoria, Brisbane, Amsterdam. rshoup@kixeye.com · @randyshoup

Deep Thought Team:

  • Mark Weaver
  • Josh McDonald
  • Ben Speakmon
  • Snehal Nagmote
  • Mark Roberts
  • Kevin Lee
  • Woo Chan Kim
  • Tay Carpenter
  • Tim Ellis
  • Kazue Watanabe
  • Erica Chan
  • Jessica Cox
  • Casey DeWitt
  • Steve Morin
  • Lih Chen
  • Neha Kumari