Tracking Data Lineage at Stitch Fix
Neelesh Srinivas Salian
Strata Data Conference - New York September 12, 2018
Stitch Fix
○ Personalized styling service serving Men, Women, and Kids
○ Founded in 2011; led by CEO and Founder Katrina Lake
○ Employs more than 5,800 people nationwide (USA)
○ Algorithms + Humans
Resource
ID - Unique identifier
Job
Event
○ Upstream and Downstream to a Resource
○ Schema change
○ Data type modification
○ Journey of a resource
○ Historical information
Owner (User/Team)
Job
Parent Job
Read Resource / Write Resource
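The fields listed above (owner, job, parent job, read and written resources) can be pictured as a single lineage record. The following is a minimal sketch, not the schema used at Stitch Fix: the class name `LineageEvent`, the helper `edges`, and all field and table names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass(frozen=True)
class LineageEvent:
    """One job run and the resources it touched (hypothetical schema)."""
    owner: str                     # user or team that owns the job
    job: str                       # job that read or wrote the resources
    parent_job: Optional[str]      # job that launched this one, if any
    read_resources: List[str] = field(default_factory=list)
    write_resources: List[str] = field(default_factory=list)


def edges(event: LineageEvent) -> Iterator[Tuple[str, str]]:
    """Yield (upstream, downstream) resource pairs implied by one event."""
    for src in event.read_resources:
        for dst in event.write_resources:
            yield (src, dst)


# Example values are made up for illustration.
event = LineageEvent(
    owner="team-a",
    job="daily_rollup",
    parent_job="nightly_flow",
    read_resources=["db.orders", "db.items"],
    write_resources=["db.order_summary"],
)
print(list(edges(event)))
```

Under this assumption, every resource a job reads is treated as upstream of every resource the same job writes, which is the simplest way to turn per-job records into resource relationships.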
○ If any, there needs to be better communication
Ingestion pipeline
○ Behavior
○ Function
pipeline
Hive table
information
information is needed
Resource Attributes
Service Data Attributes
Hive Tables
○ Parent-Child relationship
○ Augmenting various clients
writes as they happen
default FileFormat
pass parentage information
Lineage information
○ Aggregated Metric Extraction
○ Resource Relationships
ETL Postgres DB
○ Showing Upstream and Downstream dependencies
○ Metrics from the Warehouse
○ In-flux changes to Resources
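Showing upstream and downstream dependencies, as listed above, amounts to walking a graph of resource relationships. The sketch below assumes those relationships are available as (upstream, downstream) pairs; the function names `build_graph` and `reachable` and the example table names are hypothetical and do not come from the talk.

```python
from collections import defaultdict, deque


def build_graph(edge_list):
    """Index (upstream, downstream) pairs in both directions."""
    upstream = defaultdict(set)
    downstream = defaultdict(set)
    for src, dst in edge_list:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return upstream, downstream


def reachable(start, neighbors):
    """Breadth-first walk returning every resource reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Made-up relationships for illustration.
pairs = [
    ("raw.events", "staging.events"),
    ("staging.events", "mart.daily"),
    ("raw.users", "mart.daily"),
]
upstream, downstream = build_graph(pairs)
print(sorted(reachable("mart.daily", upstream)))
print(sorted(reachable("raw.events", downstream)))
```

Keeping both an upstream and a downstream index means either direction can be answered with the same traversal, at the cost of storing each relationship twice.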