Embedded Software Streaming via Block Stream
A Dissertation by
Pramote Kucharoen
Dissertation Advisor
Professor Vincent J. Mooney III
7 April 2004
Package, Download & Remote execution, Software Streaming
[Figure: delivery methods (Direct Download, Java, Software Caching, Function/Module Streaming, Block Streaming, Remote Execution) compared by application load time, application suspension, and resources]
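[Table: two-column comparison; column headers and some row labels lost in extraction]
Data transfer: Send when requested | Send the entire application
(label lost): Low | High
(label lost): application suspension | application load time, resources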
[Figure: a Program File/Data File is divided into Stream Units 1, 2, …; each stream unit consists of Stream-Enabling Info followed by a Block]
[Figure: softstream protocol stack shown against the OSI layers]
Stack (top to bottom): Stream-enabled Application, Softstream Assembly, Softstream Protocol, TCP, IP, Subnetwork
OSI Layers: Application, Presentation, Session, Transport, Network, Link, Physical
[Figure: block streaming flow between server and client]
Server: Divide Binary Image into Blocks, Generate Stream Units, Create a Transmission Profile, Read Transmission Profile, Accept Request, Send Stream Unit
Client: Program Entry Point, Request Stream Unit, Receive Block, Load Stream Block, Link Stream Block, Run the Application, Encounter Off-Block Branch (In Memory / Not in Memory)
[Figure: Source Code (#include <stdio.h> int main() …) is compiled with GCC (step 1) into a Binary Image, which is then turned (step 2) into a Stream-Enabled Application made of Stream Units, each consisting of Stream-Enabling Info followed by a block of the binary image]
comp:
    stwu r1,-31(r1)
    lwz r0,8(r1)
    cmpwi r0,1
    bne .L3
    li r0,0
    stw r0,8(r31)
    b .L4
.L3:
    sc
    li r0,1
    stw r0,8(r31)

    li r3,0
.L4:
    …
    blr
    …

    …
    bl comp
    …
Off-block branch: a branch instruction that may cause the CPU to execute an instruction in a different block
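The definition above can be sketched as an address check. This is only an illustration: it assumes fixed-size blocks laid out contiguously from a base address, and the helper names `block_of` and `is_off_block_branch` are hypothetical, not taken from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: index of the block that contains addr, assuming
 * fixed-size blocks laid out contiguously starting at base. */
static uint32_t block_of(uint32_t addr, uint32_t base, uint32_t block_size)
{
    assert(block_size > 0 && addr >= base);
    return (addr - base) / block_size;
}

/* A branch is off-block when its target lies in a different block than
 * the branch instruction itself. */
static bool is_off_block_branch(uint32_t branch_addr, uint32_t target_addr,
                                uint32_t base, uint32_t block_size)
{
    return block_of(branch_addr, base, block_size) !=
           block_of(target_addr, base, block_size);
}
```

With a base of 0x00010000 and 0x400-byte blocks, a branch at 0x00010010 targeting 0x00010404 would be off-block, while one targeting 0x00010020 would not.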
if (i==1) i=0; else i=1;
    cmpwi r0,1
    bne .L3
    li r0,0
    stw r0,8(r31)
    b .L4

.L3:
    li r0,1
    stw r0,8(r31)
.L4:
    …
    cmpwi r0,1
    bne load2_1
    li r0,0
    stw r0,8(r31)
    b load2_2
load2_1:
    …
load2_2:
    …

    bne .L3

.L3:
    li r0,1
    stw r0,8(r31)
.L4:
    …
load3_0:
    …

    b load3_0
[Figure: graph of blocks 1 to 7 with a Program Entry Point, a Program Exit Point, and numeric labels on the connections]
[Figure: block table indexed by ID (1, 2, 3, …, N-1) with an Address column; entries shown include 0x00010000, 0x00010400, 0x00010800, and 0xFFFFFFFF. Beside it, memory is shown at addresses 0x00010000, 0x00010004, …, 0x00010400, 0x00010404, …, 0x00010800, 0x00010804, …]
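One way to read the addresses above is as a per-block lookup table. The sketch below assumes that 0xFFFFFFFF marks a block that has not been loaded yet; the slides do not state this, and the names `BLOCK_NOT_LOADED`, `block_address`, and `mark_loaded`, as well as the zero-based IDs, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed meaning of the 0xFFFFFFFF entries: block not in memory yet. */
#define BLOCK_NOT_LOADED 0xFFFFFFFFu

/* Hypothetical lookup: returns the load address of block id, or
 * BLOCK_NOT_LOADED when the client still has to request the block. */
static uint32_t block_address(const uint32_t *table, size_t n_blocks, size_t id)
{
    assert(table != NULL && id < n_blocks);
    return table[id];
}

/* Record that block id has been received and placed at addr. */
static void mark_loaded(uint32_t *table, size_t n_blocks, size_t id,
                        uint32_t addr)
{
    assert(table != NULL && id < n_blocks && addr != BLOCK_NOT_LOADED);
    table[id] = addr;
}
```

Under this reading, an off-block branch would consult the table first and request a stream unit only when the entry still holds the sentinel.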
[Figure: blocks 1 to 9 with labels 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1, 1, 1; 1 MB]
[Figure: stream-enabled file I/O flow between server and client]
Server: Divide File into Blocks, Generate Stream Units, Create a Transmission Profile, Read Transmission Profile, Accept Request, Send Stream Unit
Client: sio_open(), sio_read(), sio_write(), sio_lseek(), sio_close(); when a block is Not in Memory: Request Stream Unit, Receive Block, Load Stream Block
fn1:
    stwu 1,-31(1)
    stw 3,8(1)
    lwz 0,8(1)
    …
    blr
fn2:
    stwu 1,-31(1)
    stw 3,8(1)
    li 0,1
    …
    blr
fn3:
    …
int fn1(…) { … x = fn5(a,b); … }
int fn2(…) { … }
int fn3(…) { … }

int fn4(…) { … }
int fn5(…) { … y = fn7(z); … }
int fn6(…) { … }

int fn7(…) { … }
int fn8(…) { … }
int fn9(…) { … }
    cmpwi r0,1
    bne .L3
    li r0,0
    stw r0,8(r31)
    b load2_2
load2_1:
    …
load2_2:
    …

    bne load2_1

.L3:
    li r0,1
    stw r0,8(r31)
.L4:
    …
Execution profile: 6 1 2 3 1 4 1 5 3 4 1 4 3 2 3 1 2 6 1 2
Transmission profile: 6 1 2 3 4 5 1 2 6
Client memory (contents after each block load): [6] [6 1] [6 1 2] [3 1 2] [3 1 4] [3 5 4] [3 1 4] [3 1 2] [6 1 2]
9 occurrences of application suspension with demand loading; potentially 6 occurrences with block streaming
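The count of 9 suspensions can be reproduced with a small demand-loading simulation. The slides do not state the client capacity or the replacement policy. This sketch assumes three block slots and eviction of the resident block whose next use is farthest away, chosen only because that combination reproduces the transmission profile shown. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SLOTS 3  /* assumed client capacity in blocks */

/* Index of the next use of block after position pos, or n if none. */
static size_t next_use(const int *trace, size_t n, size_t pos, int block)
{
    for (size_t i = pos + 1; i < n; i++)
        if (trace[i] == block)
            return i;
    return n;
}

/* Simulate demand loading of an execution profile. Every miss is one
 * application suspension; missed block IDs are written to sent, which
 * must hold at least n entries. Returns the number of misses. */
static size_t demand_load(const int *trace, size_t n, int *sent)
{
    int mem[SLOTS];
    size_t used = 0, misses = 0;

    assert(trace != NULL && sent != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t s;
        for (s = 0; s < used && mem[s] != trace[i]; s++)
            ;
        if (s < used)
            continue;                   /* block already in memory */

        sent[misses++] = trace[i];      /* block must be requested */
        if (used < SLOTS) {
            mem[used++] = trace[i];     /* a free slot is available */
            continue;
        }
        /* Assumed policy: evict the block whose next use is farthest. */
        size_t victim = 0, farthest = 0;
        for (s = 0; s < SLOTS; s++) {
            size_t d = next_use(trace, n, i, mem[s]);
            if (d > farthest) {
                farthest = d;
                victim = s;
            }
        }
        mem[victim] = trace[i];
    }
    return misses;
}
```

Run on the 20-entry execution profile above, `demand_load` reports 9 misses and fills `sent` with 6 1 2 3 4 5 1 2 6. Block streaming can lower the number of suspensions because blocks may arrive in the background before they are needed.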
Tools: VCS, Seamless CVE, XRAY
[Figure: a main processor (MPC750) and an I/O processor (MPC750) connected to Memory over an Address/Data Bus]
MPC750: 400 MHz; Bus: 83 MHz; Memory: 16 MB
[Figure: test setup with an MBX860 board and a Linux PC connected through a Network Cloud; labels: Traffic Shaper, 10 Mbps]
Implementation              C lines
softstream server           ≈ 3400
softstream loader/linker    ≈ 1300
stream-enabled file I/O     ≈ 1500
softstream generator        ≈ 2200
softstream client           ≈ 1400

Server: softstream server, softstream generator
Client: softstream client, softstream loader/linker, stream-enabled file I/O
Type of overhead    Overhead per off-block branch
Bandwidth           4 bytes
Memory              20 bytes
Block size (bytes)    Total # of blocks    Added code/block    Load time (s)
512                   20480                7.0313%             0.03
1K                    10240                3.5156%             0.06
10K                   1024                 0.3516%             0.64
100K                  103                  0.0352%             6.40
0.5M                  20                   0.0069%             32.77
1M                    10                   0.0034%             65.54
2M                    5                    0.0017%             131.07
5M                    2                    0.0007%             327.68
10M                   1                    0.0003%             655.36
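The figures above are consistent with a simple model, sketched here. The three constants are back-computed from the table values and are not stated on the slides: a 10 MB application (with K = 1024 and M = 1024 * 1024), 36 bytes of added code per block, and a link moving 16000 bytes per second (128 kbit/s). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define APP_BYTES      (10u * 1024u * 1024u) /* assumed 10 MB application */
#define ADDED_BYTES    36u                   /* assumed added code per block */
#define LINK_BYTES_SEC 16000.0               /* assumed 128 kbit/s link */

/* Number of blocks needed to cover the application, rounded up. */
static uint32_t total_blocks(uint32_t block_size)
{
    assert(block_size > 0);
    return (APP_BYTES + block_size - 1) / block_size;
}

/* Added code as a percentage of the block size. */
static double added_code_percent(uint32_t block_size)
{
    assert(block_size > 0);
    return 100.0 * ADDED_BYTES / block_size;
}

/* Seconds needed to transfer one block over the assumed link. */
static double block_load_time(uint32_t block_size)
{
    return block_size / LINK_BYTES_SEC;
}
```

For a 512-byte block this model gives 20480 blocks, about 7.03% added code, and about 0.03 s per block. For a single 10 MB block it gives about 655.36 s, which is the trade-off the table shows: smaller blocks shorten the wait for the first block but raise the relative overhead.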
[Figure: Application Load Time (s) versus Block Size (bytes)]
Plotted values: 512: 0.03, 1K: 0.06, 10K: 0.69, 100K: 7.25, 0.5M: 34.06, 1M: 66.28, 2M: 132.49, 5M: 331.27, 10M: 662.52
[Figure: Time (s) for Seq, Rand 1K, Stat, and BSearch under SIO, NFS, and DD; bar labels in extraction order: 61.33, 28.46, 73.76, 1.12, 80.40, 52.18, 110.48, 5.54, 65.01, 61.36, 80.31, 62.47]
Up to 55X faster
Time to acquire a certain amount of data
[Figure: Time (s) versus Data (Kbytes) for SIO, NFS, and DD]
The amount of time it takes to process a 1 MB file
[Figure: Time (s) versus Data Utilization Rate (KB/s) for SIO, NFS, and DD]
[Figure: User Perceived Application Load Time (s) for SIO+SPF, SPF, NFS, and DD; bar labels in extraction order: 22.85, 72.60, 32.38, 97.78]
using software streaming,” to be published in Proceedings of the Mobility Conference & Exhibition, Aug. 2004.
Priority Inheritance,” in Proceedings of the IEEE Real-Time Systems Symposium, pp.246-254, Dec. 2003.
streaming,” in the book Embedded Software for SoC, edited by Jerraya, A., Yoo, S., Verkest, D. and Wehn, N., Boston, MA: Kluwer Academic Publishers,
streaming,” in Proceedings of the Design Automation and Test in Europe, pp. 912-917, Mar. 2003.
for real-time systems,” in Proceedings of the International Conference on Engineering of Reconfigurable Systems and Algorithms, pp. 96-101, June 2003.
extensibility, and flexibility in real-time operating systems,” in Proceedings of the EUROMICRO Symposium on Digital Systems Design, pp. 400-405, Sep. 2001.
embedded systems,” in Proceedings of the 27th EUROMICRO Conference, pp. 264-269, Sep. 2001.
systems for transmitting application software,” U.S. Patent Application 20040006637, Jan. 2004.
“Dynamic
system,” U.S. Patent Application 20030074487, Apr. 2003.
“Debugger operating system for embedded systems,” U.S. Patent Application 20030074650, Apr. 2003.