
Supporting Incremental Re-Computation with Whole System Provenance: Issues and Approaches



  1. Supporting Incremental Re-Computation with Whole System Provenance: Issues and Approaches Ashish Gehani, SRI

  2. Incremental Re-computation
     • Supported via memoization
       – Database query engines
       – Workflow planners
       – Software build systems
     • Optimized with provenance
       – Identify affected subgraph
         • Forward slice from new inputs
       – Identify dependencies
         • Backward slice from affectees
     • Whole system provenance
       – Applies to range of applications
       – Introduces variety of challenges
     [Figure: process execution over time, with File 1 Read and File 2 Read (open() to close()) followed by File 3 Write; an Operation takes Input 1 to Input n and produces Output]
     Ashish Gehani, Ulf Lindqvist, Bonsai: Balanced Lineage Authentication, ACSAC, 2007
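The two slicing steps on this slide can be illustrated with a small graph traversal. This is a minimal sketch, not the authors' implementation: it assumes provenance is available as a list of (source, target) data-flow edges, and the node names (`file1`, `proc`, and so on) are hypothetical stand-ins loosely modeled on the figure.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Index (source, target) data-flow edges in both directions."""
    fwd, bwd = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        fwd[src].add(dst)
        bwd[dst].add(src)
    return fwd, bwd

def reachable(adjacency, starts):
    """Breadth-first search; returns the start nodes plus everything reachable."""
    seen = set(starts)
    queue = deque(starts)
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Hypothetical provenance: a process reads file1 and file2 and writes file3;
# an unrelated process reads file4 and writes file5.
EDGES = [
    ("file1", "proc"), ("file2", "proc"), ("proc", "file3"),
    ("file4", "other_proc"), ("other_proc", "file5"),
]
fwd, bwd = build_graph(EDGES)

# Forward slice from the new input: everything that must be recomputed.
affected = reachable(fwd, {"file1"})
# Backward slice from the affected nodes: everything needed to recompute them.
dependencies = reachable(bwd, affected)

print(sorted(affected))
print(sorted(dependencies))
```

In this sketch the unrelated process and its files never enter either slice, which is the saving that provenance-guided re-computation is after.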

  3. Whole System Provenance
     • Multiple possible approaches
       – Dynamic binary instrumentation (e.g. Pin)
       – Compiler-based transformation (e.g. LLVM)
       – Library call interposition (e.g. LD_PRELOAD)
       – Kernel hooks (e.g. LSM)
     • Global view of monitored system
     • Provenance inferred is sound
     • … but often incomplete
       – may suffice for
         • staging
         • diagnostics
         • profiling
         • authorization
       – challenge for reproducibility
         • partial coverage
         • semantic gap
     [Figure: layers at which provenance can be collected: Manual Curation, Database, Application, Workflow Manager, Operating System, Distributed System]
     Ashish Gehani, Dawood Tariq, SPADE: Support for Provenance Auditing in Distributed Environments, ACM Middleware, 2012

  4. Issue: Ephemeral Intermediates
     • Example:
       – Software builds link objects into final binary
       – Object files are then removed
       – Memoization benefit is lost
       – Provenance still useful, but intermediates must be regenerated
     [Figure: provenance graph of a gcc build. Processes: gcc (pids 2160, 2163, 2169; ppid 2159), cc1, as, collect2, ld. Files: protocol.c, network.c, ccC1p2KN.s, ccfELrl1.s, protocol.o, network.o. Legend: modified artifact; subgraph that needs to be recomputed]
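To make the cost of removed intermediates concrete, here is a minimal sketch of a re-execution planner. It is an illustration under stated assumptions, not the tool behind the slide: the step names, the pairing of each `.s` file with a `.c` file, and the output name `binary` are hypothetical, and `plan_rerun` is a helper invented for this example.

```python
def plan_rerun(steps, changed, on_disk):
    """Return the names of steps that must run again.

    steps:   dict mapping step name -> (input files, output files)
    changed: set of modified source files
    on_disk: set of files that still exist
    """
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}

    # Forward pass: any step reading a dirty file is rerun and dirties its outputs.
    dirty, rerun = set(changed), set()
    progress = True
    while progress:
        progress = False
        for name, (ins, outs) in steps.items():
            if name not in rerun and dirty & set(ins):
                rerun.add(name)
                dirty |= set(outs)
                progress = True

    # Backward pass: an unchanged input that was deleted must be regenerated.
    pending = list(rerun)
    while pending:
        name = pending.pop()
        for inp in steps[name][0]:
            missing = inp not in dirty and inp not in on_disk
            if missing and inp in producer and producer[inp] not in rerun:
                rerun.add(producer[inp])
                pending.append(producer[inp])
    return rerun

# Hypothetical build, loosely following the figure (file pairing is assumed).
STEPS = {
    "cc1_protocol": (["protocol.c"], ["ccC1p2KN.s"]),
    "as_protocol": (["ccC1p2KN.s"], ["protocol.o"]),
    "cc1_network": (["network.c"], ["ccfELrl1.s"]),
    "as_network": (["ccfELrl1.s"], ["network.o"]),
    "ld": (["protocol.o", "network.o"], ["binary"]),
}
SOURCES = {"protocol.c", "network.c"}

# Intermediates removed after the build: the untouched branch is rebuilt too.
removed = plan_rerun(STEPS, {"protocol.c"}, SOURCES | {"binary"})
# Intermediates kept (e.g. by a snapshot): only the affected branch is rebuilt.
kept = plan_rerun(STEPS, {"protocol.c"}, SOURCES | {"binary", "network.o"})

print(sorted(removed))
print(sorted(kept))
```

With the object files gone, a change to `protocol.c` forces all five steps to run again; keeping `network.o` available limits the work to three.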

  5. Approach: Maintain Data History

     Table I. Performance analysis

     Workload   Strategy                                                 Operations   Improvement
     Apache     Complete re-execution                                    63564        -
     Apache     Provenance-based re-execution                            15701        75.3%
     Apache     Snapshotting filesystem + provenance-based re-execution  13595        78.61%
     BLAST      Complete re-execution                                    48602        -
     BLAST      Provenance-based re-execution                            9811         79.8%
     BLAST      Snapshotting filesystem + provenance-based re-execution  8391         82.73%
     PostMark   Complete re-execution                                    57344        -
     PostMark   Provenance-based re-execution                            14305        75.05%
     PostMark   Snapshotting filesystem + provenance-based re-execution  10031        82.5%

     Table II. Storage overhead for provenance metadata

     Strategy                        Apache   BLAST    PostMark
     Provenance-based re-execution   13 MB    8.7 MB   8.9 MB

     Hasnain Lakhani, Rashid Tahir, Azeem Aqil, Fareed Zaffar, Dawood Tariq, Ashish Gehani, Optimized Rollback and Re-computation, IEEE HCSS, 2013
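The Improvement column in Table I appears to be the relative reduction in operations against complete re-execution, that is, 100 × (1 − operations / baseline). The sketch below recomputes it from the operation counts on the slide. The formula is an inference from the numbers, not something the slide states, and the dictionary layout is only for illustration.

```python
# Operation counts copied from Table I.
OPERATIONS = {
    "Apache": {"complete": 63564, "provenance": 15701, "snapshot": 13595},
    "BLAST": {"complete": 48602, "provenance": 9811, "snapshot": 8391},
    "PostMark": {"complete": 57344, "provenance": 14305, "snapshot": 10031},
}

# Improvement percentages as reported on the slide.
REPORTED = {
    "Apache": {"provenance": 75.3, "snapshot": 78.61},
    "BLAST": {"provenance": 79.8, "snapshot": 82.73},
    "PostMark": {"provenance": 75.05, "snapshot": 82.5},
}

def improvement(baseline, operations):
    """Percentage of operations saved relative to complete re-execution."""
    return 100.0 * (1.0 - operations / baseline)

for workload, counts in OPERATIONS.items():
    for strategy in ("provenance", "snapshot"):
        computed = improvement(counts["complete"], counts[strategy])
        print(f"{workload:9s} {strategy:11s} computed={computed:.2f}% "
              f"reported={REPORTED[workload][strategy]}%")
```

The recomputed values agree with the reported ones to within rounding, which supports reading the column this way.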

  6. Issue: Dependency Conflation
     • Often arises when:
       – Instrumentation is at coarser level of abstraction
       – Causality manifests at finer granularity
     • Examples, using system calls:
       – Web server sends a different file to each client
       – Individual element of data archive utilized
     • Implicated dependency subgraph explodes
     • Much re-computation is unnecessarily performed
     [Figure: system-call-level provenance graph. Processes: terminal (pid 2043, ppid 1), bash (pid 2045, ppid 2043), bash (pid 5226, ppid 2045). Files: httpd (/var/httpd, size 14350), file1.html (/var/htdocs/file1.html, size 1205), file2.html (/var/htdocs/file2.html, size 8136), file3.html (/var/htdocs/file3.html, size 7160). Network connections from local_ip 192.168.1.3 to remote_ip 192.168.1.18, 192.168.1.25, and 192.168.1.7]

  7. Approach: Execution Partitioning
     • Utilize finer-grained instrumentation
     • Example, using function calls:
       – Tracks web server’s input file ← output network flow dependency
     [Figure: function-call-level provenance graph for thread 2000. main calls accept_request, which calls serve_file, which calls cat, once per request. Recorded arguments: client (i32, value 5), filename (i8*), resource (%struct._IO_FILE*)]
     Dawood Tariq, Maisem Ali, Ashish Gehani, Towards Automated Collection of Application-Level Data Provenance, TaPP, 2012
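The difference between process-level and partitioned provenance can be shown with a toy event log. This is a sketch under assumptions: the per-request unit labels (`req1` and so on) stand in for individual `serve_file` invocations, the pairing of files with client addresses is hypothetical, and `affected_outputs` is a helper made up for this example.

```python
from collections import defaultdict

# (execution unit, event kind, object). The unit labels and the pairing of
# files with client addresses are assumptions made for illustration.
EVENTS = [
    ("req1", "read", "file1.html"), ("req1", "send", "192.168.1.18"),
    ("req2", "read", "file2.html"), ("req2", "send", "192.168.1.25"),
    ("req3", "read", "file3.html"), ("req3", "send", "192.168.1.7"),
]

def affected_outputs(events, changed_file, unit_of):
    """Network flows that depend on changed_file, given a unit granularity.

    unit_of maps each request label to the unit that provenance attributes
    the event to: one shared unit models process-level tracking, a distinct
    unit per request models execution partitioning.
    """
    reads, sends = defaultdict(set), defaultdict(set)
    for request, kind, obj in events:
        target = reads if kind == "read" else sends
        target[unit_of(request)].add(obj)
    affected = set()
    for unit, files in reads.items():
        if changed_file in files:
            affected |= sends[unit]
    return affected

# Process-level view: every event is attributed to the single server process.
coarse = affected_outputs(EVENTS, "file2.html", lambda request: "httpd")
# Partitioned view: each request is its own unit.
fine = affected_outputs(EVENTS, "file2.html", lambda request: request)

print(sorted(coarse))
print(sorted(fine))
```

At process granularity a change to one file implicates every client flow; with per-request units only the flow that actually carried that file is implicated.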

  8. Issue: Changing Runtime Environment
     • Application context is complex
     • Code dependencies
       – Linked libraries
       – System services
       – Utility programs
     • Environmental dependencies
       – Shell variables
       – Shared memory contents
     • Changes in any can affect output
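One consequence for memoization is that a cache key built only from input files is too narrow. The sketch below folds code and environmental dependencies into the key. It is an assumption-laden illustration rather than a method from the talk: `closure_key`, its parameters, and the example digests are hypothetical, and deciding which environment variables and libraries to track is left to the caller.

```python
import hashlib
import json

def closure_key(command, input_digests, env, tracked_env, library_digests):
    """Hash a command together with the context that can affect its output.

    command:         list of argument strings
    input_digests:   dict of input path -> content digest
    env:             dict of environment variables at execution time
    tracked_env:     names of the variables the computation depends on
    library_digests: dict of linked library path -> content digest
    """
    record = {
        "command": command,
        "inputs": sorted(input_digests.items()),
        "env": sorted((name, env.get(name, "")) for name in tracked_env),
        "libraries": sorted(library_digests.items()),
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# Hypothetical invocation context.
COMMAND = ["sort", "data.txt"]
INPUTS = {"data.txt": "digest-of-data"}
LIBS = {"libc.so.6": "digest-of-libc"}

base = closure_key(COMMAND, INPUTS, {"LC_ALL": "C", "HOME": "/a"}, ["LC_ALL"], LIBS)
# An untracked variable changes: the memoized result can be reused.
same = closure_key(COMMAND, INPUTS, {"LC_ALL": "C", "HOME": "/b"}, ["LC_ALL"], LIBS)
# A tracked variable changes: the result must be recomputed.
locale_changed = closure_key(COMMAND, INPUTS, {"LC_ALL": "en_US.UTF-8"}, ["LC_ALL"], LIBS)
# A linked library changes: the result must be recomputed.
lib_changed = closure_key(COMMAND, INPUTS, {"LC_ALL": "C"}, ["LC_ALL"],
                          {"libc.so.6": "digest-of-new-libc"})

print(base == same, base == locale_changed, base == lib_changed)
```

Anything left out of the key (shared memory contents, for instance) remains a way for a stale result to be reused, which is the incompleteness risk the slide points to.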

  9. Approach: Code and Context Closures
     • Partition code into composable units (e.g. Docker layers)
       – Dependencies minimized, reducing need for re-computation
     • Virtualize application execution (e.g. Linux containers)
       – Replicated runtime environment
     [Figure: Wholly! generates a Dockerfile from a Wholly! recipe, and Docker builds the package (sqlite-3.18) into a container according to the Dockerfile. Layers, bottom to top:
       1 Base build tools: copy base building environment (including Wholly!-built Clang compiler)
       2 Build dependencies: copy Wholly! subpackages that are required as dependencies
       3 Source code: download package source code
       4 Generated build products: execute build invocations (./configure && make && make install)]
     Loic Gelle, Hassen Saidi, Ashish Gehani, Wholly!: A Build System For The Modern Software Stack, Lecture Notes in Computer Science, Vol. , Springer, 2018.

  10. Issue: External Dependencies
      • Assume code / context closures
      • May still face challenges:
        – User input
        – Randomized choices
        – Distributed computation
        – Asynchronous interrupts
      • Baseline strategy constructs models
        – Significant engineering
        – Error-prone
        – May still be incomplete
      [Figure: sequence of system calls interrupted by an exception handling routine]
