Migrating from Oracle to Espresso
David Max
Senior Software Engineer LinkedIn
About LinkedIn New York Engineering
Located in the Empire State Building
Approximately 100 engineers and 1000 employees total
Multiple teams, including front end and data science
since 2015
Thursday 11:30-12:00
www.linkedin.com/in/davidpmax/
Content Ingestion Babylonia
url: https://www.youtube.com/watch?v=MS3c9hz0bRg
title: "SATURN 2017 Keynote: Software is Details"
image: https://i.ytimg.com/vi/MS3c9hz0bRg/hqdefault.jpg?sq poaymwEYCKgBEF5IVfKriqkDCwgBFQAAiEIYAXAB\u0026rs=AOn4CLClwjQlBmMeoRCePtHaThN-qXRHqg
public 1st party content
decorating, and embedding content
understanding and relevance models
[Diagram: Content Ingestion (Babylonia) writes to a Database; Data Change Events go to Near Line consumers; ETL copies the Database to HDFS for Offline use]
each URL stored in individual rows
queries on Oracle DB
dataset in Oracle via Babylonia’s Rest.li API
Management System
data change events to near line consumers
consumers
* as of August 1, 2017
documents of the same schema (defined in Avro)
fields, which are defined in the table schema
be distributed over a cluster of machines
horizontally by adding more nodes
integrated with other tools and systems at LinkedIn
Rest.li, which makes it easier to treat Espresso endpoints like other LinkedIn Rest.li endpoints
zero downtime and no coordination with DBA teams
tables required periodic jobs to be run that involved downtime for each server
platform at LinkedIn for data of this type
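The slides say Espresso tables can be distributed over a cluster of machines and scaled horizontally by adding nodes. They do not describe how data is placed. The Java sketch below is for illustration only and assumes a common scheme: each key hashes into a fixed number of partitions, and partitions (not individual keys) are assigned to nodes. `PartitionRouter` and its round-robin assignment are hypothetical and are not Espresso's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: a fixed number of partitions, each owned by one node.
// A key always hashes to the same partition; scaling out only changes which
// node owns which partition.
final class PartitionRouter {
    private final int numPartitions;
    private final List<String> nodes;

    PartitionRouter(int numPartitions, List<String> nodes) {
        if (numPartitions <= 0 || nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one partition and one node");
        }
        this.numPartitions = numPartitions;
        this.nodes = new ArrayList<>(nodes);
    }

    // Key -> partition. This does not depend on how many nodes exist.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Partition -> node, using round-robin ownership (an illustrative assumption).
    String nodeFor(String key) {
        return nodes.get(partitionFor(key) % nodes.size());
    }

    // Adding a node reassigns some partitions. No key moves to a different partition.
    void addNode(String node) {
        nodes.add(node);
    }
}
```

Because the partition count is fixed, growing the cluster means moving whole partitions between machines rather than rehashing every key.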
[Diagram: Babylonia (Content Ingestion) serves Rest.li Endpoints backed by the Oracle Database; Oracle Databus Events go to Near Line consumers; ETL copies data to HDFS for Offline use]
[Diagram: Oracle Rows converted to and from Pegasus Objects (Pegasus Data)]
between Oracle format and Pegasus format
[Diagram: generated Java Objects on each side]
generate Java objects with very similar interfaces
used to auto-generate the Avro schema
definitions are very similar
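The slides note that the Rest.li Pegasus schema is used to auto-generate the Avro schema, and that the two definitions are very similar. The toy Java translator below shows why this works for simple records. Primitive type names line up, and an optional Pegasus field can become an Avro union with `null`. `PegasusField` and `ToyAvroSchemaTranslator` are hypothetical names. This is not the generator Babylonia used, and it handles only primitive fields.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for one field of a Pegasus record schema.
record PegasusField(String name, String primitiveType, boolean optional) {}

// Toy translator that emits an Avro record schema as JSON. It assumes the
// primitive type names are shared by both schema languages, and it maps
// optional fields to a ["null", T] union with a null default.
final class ToyAvroSchemaTranslator {
    private static final List<String> PRIMITIVES =
        List.of("string", "int", "long", "float", "double", "boolean", "bytes");

    static String toAvroJson(String recordName, String namespace, List<PegasusField> fields) {
        String fieldJson = fields.stream()
            .map(ToyAvroSchemaTranslator::fieldToJson)
            .collect(Collectors.joining(","));
        return "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"namespace\":\""
            + namespace + "\",\"fields\":[" + fieldJson + "]}";
    }

    private static String fieldToJson(PegasusField f) {
        if (!PRIMITIVES.contains(f.primitiveType())) {
            throw new IllegalArgumentException("unsupported type in this sketch: " + f.primitiveType());
        }
        String type = "\"" + f.primitiveType() + "\"";
        if (f.optional()) {
            // Avro requires a union field's default to match the first branch,
            // so "null" comes first.
            return "{\"name\":\"" + f.name() + "\",\"type\":[\"null\"," + type + "],\"default\":null}";
        }
        return "{\"name\":\"" + f.name() + "\",\"type\":" + type + "}";
    }
}
```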
[Diagram: Babylonia (Content Ingestion) serves Rest.li Endpoints backed by the Espresso Database; Espresso Brooklin Events go to Near Line consumers; ETL copies data to HDFS for Offline use]
[Diagram: Espresso Avro records converted to and from Pegasus Objects (Pegasus Data)]
between Avro format and Pegasus format
coordinate with DBAs
downtime
lengths to avoid the hassle
automatically as part of the Babylonia deployment process
existing data does not need to be transformed
Rest.li Pegasus schema
Espresso
consumers time to migrate
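One plausible reason existing data does not need to be transformed is Avro schema resolution. Data written with an older schema is read through the newer schema, and any field missing from the old data takes the new schema's default. The sketch below models only that rule, plus dropping fields the reader no longer declares, using plain Java maps. `ReaderField` and `ToySchemaResolver` are hypothetical. Real Avro resolution also covers type promotion, aliases, and unions.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader-schema field: a name plus an optional default value.
record ReaderField(String name, boolean hasDefault, Object defaultValue) {}

// Toy model of part of Avro schema resolution (illustration only):
// - fields present in the stored record are copied,
// - fields the reader expects but the writer never wrote take the reader's default,
// - fields the reader does not declare are dropped.
final class ToySchemaResolver {
    static Map<String, Object> resolve(Map<String, Object> stored, List<ReaderField> readerSchema) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (ReaderField field : readerSchema) {
            if (stored.containsKey(field.name())) {
                result.put(field.name(), stored.get(field.name()));
            } else if (field.hasDefault()) {
                result.put(field.name(), field.defaultValue());
            } else {
                // Avro also rejects this case: a new field with no default cannot be filled in.
                throw new IllegalStateException("no value or default for field " + field.name());
            }
        }
        return result;
    }
}
```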
[Diagram: Babylonia's dependencies on Oracle: Oracle Database, Oracle Databus Events (Near Line), ETL to HDFS (Offline), Rest.li Endpoints, and Rest.li Calls]
tightly coupled to the database
be reimplemented for Espresso, and which code should be decoupled or eliminated.
paths to migrate
The easiest lines of code to migrate are the lines of code that don’t exist
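One common way to decouple code from the database is to put storage behind an interface, so that only one class knows which database it talks to. Oracle-specific code then shrinks to a single implementation that can be replaced with an Espresso one. The Java sketch below is illustrative only. `UrlMetadata`, `UrlMetadataStore`, and `IngestionService` are hypothetical names, and an in-memory map stands in for both databases.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain type for one ingested URL.
record UrlMetadata(String url, String title) {}

// Storage boundary: callers depend only on this interface, so an Oracle-backed
// implementation can be swapped for an Espresso-backed one.
interface UrlMetadataStore {
    Optional<UrlMetadata> get(String url);
    void put(UrlMetadata metadata);
}

// In-memory stand-in used here in place of a real database client.
final class InMemoryUrlMetadataStore implements UrlMetadataStore {
    private final Map<String, UrlMetadata> rows = new HashMap<>();

    public Optional<UrlMetadata> get(String url) {
        return Optional.ofNullable(rows.get(url));
    }

    public void put(UrlMetadata metadata) {
        rows.put(metadata.url(), metadata);
    }
}

// Example caller: contains no Oracle- or Espresso-specific code.
final class IngestionService {
    private final UrlMetadataStore store;

    IngestionService(UrlMetadataStore store) {
        this.store = store;
    }

    // Returns the stored title, falling back to the URL itself.
    String titleOrUrl(String url) {
        return store.get(url).map(UrlMetadata::title).orElse(url);
    }
}
```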
[Diagram, bulk load path: Oracle Database, ETL, HDFS, Offline Convert Job, Avro Data File, Espresso Bulk Loader, Espresso Database]
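The Offline Convert Job turns the Oracle data copied to HDFS into an Avro Data File for the Espresso Bulk Loader. The sketch below shows only the row-to-record mapping step, and it uses plain Java maps instead of Avro. The column names, field names, and the rule of skipping rows without a key are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical convert step: maps rows exported from Oracle (column -> value)
// to records shaped for the Espresso table schema (field -> value).
final class ToyConvertJob {
    // Invented column-to-field mapping, for illustration only.
    static final Map<String, String> COLUMN_TO_FIELD = Map.of(
        "URL", "url",
        "TITLE", "title",
        "IMAGE_URL", "image");

    static List<Map<String, Object>> convert(List<Map<String, Object>> oracleRows) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map<String, Object> row : oracleRows) {
            if (row.get("URL") == null) {
                continue; // no key: skip (a real job might count or report these rows)
            }
            Map<String, Object> record = new LinkedHashMap<>();
            for (Map.Entry<String, String> e : COLUMN_TO_FIELD.entrySet()) {
                record.put(e.getValue(), row.get(e.getKey())); // missing columns become null
            }
            records.add(record);
        }
        return records;
    }
}
```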
[Diagram: Oracle Database, ETL, HDFS, Espresso Database, Shadow Read Validation]
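Shadow read validation compares what the new store returns with what the old store returns before the new store is trusted. The following is a minimal Java sketch, assuming reads are plain key lookups. `ShadowReader` is a hypothetical name. It always returns the Oracle result, also reads Espresso, and counts mismatches and shadow failures instead of surfacing them to callers.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical shadow-read wrapper. The primary (old) store serves every read.
// The shadow (new) store is read only for comparison, so a bad shadow copy
// cannot affect callers.
final class ShadowReader<K, V> {
    private final Function<K, V> primary;
    private final Function<K, V> shadow;
    private final AtomicLong reads = new AtomicLong();
    private final AtomicLong mismatches = new AtomicLong();

    ShadowReader(Function<K, V> primary, Function<K, V> shadow) {
        this.primary = primary;
        this.shadow = shadow;
    }

    V read(K key) {
        V expected = primary.apply(key);
        reads.incrementAndGet();
        try {
            V actual = shadow.apply(key);
            if (!Objects.equals(expected, actual)) {
                mismatches.incrementAndGet(); // a real system would also log the key
            }
        } catch (RuntimeException e) {
            mismatches.incrementAndGet(); // shadow failures are counted, never propagated
        }
        return expected;
    }

    long reads() { return reads.get(); }

    long mismatches() { return mismatches.get(); }
}
```

A production version would probably run the shadow read asynchronously, or only on a sample of traffic, so that it adds no latency. The slides do not say how this was done.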
[Diagram: Oracle Database, Oracle Databus Events, Databus Listener, Espresso Database]
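The Databus Listener keeps Espresso in sync by replaying Oracle Databus Events. The sketch below assumes that each event carries a monotonically increasing sequence number. Under that assumption, the listener applies an event only if it is newer than the last one applied, so replaying events after a restart is harmless. `ChangeEvent` and `ReplicatingListener` are hypothetical, and a map stands in for Espresso.

```java
import java.util.Map;

// Hypothetical change event: key, new value (null means delete), and a
// sequence number that increases in source commit order.
record ChangeEvent(String key, String value, long sequence) {}

// Toy listener that replays change events into a target store. It remembers
// the last applied sequence number and ignores anything at or below it.
final class ReplicatingListener {
    private final Map<String, String> target;
    private long lastApplied = -1;

    ReplicatingListener(Map<String, String> target) {
        this.target = target;
    }

    void onEvent(ChangeEvent event) {
        if (event.sequence() <= lastApplied) {
            return; // already applied, e.g. redelivered after a restart
        }
        if (event.value() == null) {
            target.remove(event.key());
        } else {
            target.put(event.key(), event.value());
        }
        lastApplied = event.sequence();
    }

    long lastApplied() { return lastApplied; }
}
```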
[Diagram: Oracle Database, Oracle Databus Events, Databus Listener, Espresso Database, Shadow Read Validation, Direct Write]
[Diagram: Oracle Databus Events, Databus Listener, Espresso Database, Direct Write]
and Babylonia updating same record
field added to schema indicating which process wrote the record: Bulk Loader, Databus Listener, or Babylonia
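The slides only say that a field was added to record which process wrote each record. The sketch below shows one way such a field could resolve the race between the Databus Listener and Babylonia. Replicated writes, from the Bulk Loader or the Databus Listener, do not overwrite a record that Babylonia wrote directly. This policy, `WrittenBy`, and `WriterAwareStore` are assumptions for illustration, not the documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// The three writers named in the slides; this enum is a hypothetical encoding.
enum WrittenBy { BULK_LOADER, DATABUS_LISTENER, BABYLONIA }

record TaggedRecord(String value, WrittenBy writtenBy) {}

// Toy store illustrating one possible use of the writer field (an assumption):
// replicated writes never overwrite a record Babylonia wrote directly.
final class WriterAwareStore {
    private final Map<String, TaggedRecord> records = new HashMap<>();

    // Returns true if the write was applied.
    boolean write(String key, String value, WrittenBy writer) {
        TaggedRecord existing = records.get(key);
        boolean replicated = writer != WrittenBy.BABYLONIA;
        if (replicated && existing != null && existing.writtenBy() == WrittenBy.BABYLONIA) {
            return false; // keep Babylonia's direct write
        }
        records.put(key, new TaggedRecord(value, writer));
        return true;
    }

    TaggedRecord get(String key) {
        return records.get(key);
    }
}
```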
[Diagram: Oracle Database and Oracle Databus Events (marked Deprecated); Espresso Database with Direct Read/Write; Dual Writes; Espresso Brooklin Events]
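In the dual-write phase the diagram shows direct reads and writes against Espresso, while writes still go to Oracle as well. The Java sketch below assumes Espresso is the primary, so its write must succeed. The Oracle write is treated as best-effort, and failed keys are recorded for later repair. The write ordering, the failure policy, and the name `DualWriter` are assumptions; the slides do not specify them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical dual-write wrapper for the transition period. The primary store
// (assumed to be Espresso) must succeed. The secondary (assumed to be the
// deprecated Oracle copy kept for remaining consumers) is best-effort.
final class DualWriter {
    private final BiConsumer<String, String> primary;
    private final BiConsumer<String, String> secondary;
    private final List<String> failedSecondaryKeys = new ArrayList<>();

    DualWriter(BiConsumer<String, String> primary, BiConsumer<String, String> secondary) {
        this.primary = primary;
        this.secondary = secondary;
    }

    void write(String key, String value) {
        primary.accept(key, value); // an exception here fails the whole write
        try {
            secondary.accept(key, value);
        } catch (RuntimeException e) {
            failedSecondaryKeys.add(key); // recorded so a repair job could reconcile later
        }
    }

    List<String> failedSecondaryKeys() {
        return List.copyOf(failedSecondaryKeys);
    }
}
```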
[Diagram, final state: Espresso Database with Direct Read/Write; Espresso Brooklin Events]