Technische Universität Dresden
HASHI: An Application-Specific Instruction Set Extension for Hashing
Oliver Arnold, Sebastian Haas, Gerhard Fettweis, Benjamin Schlegel, Thomas Kissinger, Tomas Karnagel, Wolfgang Lehner
Motivation
TU Dresden
Today’s Database Systems
- Fat cores (area & power)
- Few HW adaptations
- CMOS scaling
Database Processors
- Processors built from scratch
- Long development cycle
- High development costs
Our Approach
- HW/SW codesign
- Customizable processor
- Hashing-specific ISA extensions
- Tool flow: short HW development cycles
Application Scenario 1: Integer Hash Function
Bit Extraction
Selection of specific bits in a 32-bit key via arbitrary hash mask
[Figure: <32 Bit Key> -> Bit Selection (32 bit -> n bit) -> Shuffle Network -> Result (n bit)]
[Figure: histogram over bit 0 to bit 31 of the 32-bit keys; sampling, then bit extraction]
Sampling
Scanning a subset of the data set to choose the most efficient hash mask
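The sampling step can be sketched in C as follows. The slides do not say how the "most efficient" mask is chosen, so the criterion here is an assumption: prefer the bit positions whose ones-count in the sample is closest to 50%. The function name `choose_hash_mask` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pick an n-bit hash mask from a sample of keys.
 * Assumed heuristic (not stated in the slides): choose the bit positions
 * whose ones-count in the sample is closest to half the sample size. */
uint32_t choose_hash_mask(const uint32_t *sample, size_t sample_len, int n_bits)
{
    assert(n_bits >= 0 && n_bits <= 32);

    /* Histogram: how often is each of the 32 bit positions set? */
    size_t ones[32] = {0};
    for (size_t i = 0; i < sample_len; i++)
        for (int b = 0; b < 32; b++)
            ones[b] += (sample[i] >> b) & 1u;

    uint32_t mask = 0;
    for (int picked = 0; picked < n_bits; picked++) {
        int best = -1;
        size_t best_dist = 0;
        for (int b = 0; b < 32; b++) {
            if (mask & (1u << b))
                continue;                       /* position already chosen */
            /* Distance from a perfectly balanced bit: |2*ones - sample_len|. */
            size_t twice = 2 * ones[b];
            size_t dist = twice > sample_len ? twice - sample_len
                                             : sample_len - twice;
            if (best < 0 || dist < best_dist) { /* ties go to the lower bit */
                best = b;
                best_dist = dist;
            }
        }
        mask |= 1u << best;
    }
    return mask;
}
```

For a sample in which only bits 0, 1 and 8 vary, a 3-bit request selects exactly those positions (mask 0x103).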
Application Scenario 2
CityHash32
- Non-cryptographic hash function for strings
- Returns 32-bit hash value
Hash Table Operators (Insert, Lookup)
- Operate on 32-bit keys
- Apply integer hash function
    unsigned int CityHash32(char *s, int len){
        int hash = comp_1(s+len-20);
        int i = (len-1)/20;
        do {
            hash = comp_2(s, hash);
            s += 20;
        } while(--i != 0);
        return comp_3(hash);
    }
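The hash table operators listed above (Insert, Lookup on 32-bit keys, using the integer hash function) could look roughly like this in plain C. This is a sketch under assumptions: the slides do not describe the table layout, so open addressing with linear probing, the fixed table size, and the names `HashTable`, `hash_extract`, `ht_insert` and `ht_lookup` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_BITS 4                    /* hypothetical: 16 slots */
#define TABLE_SIZE (1u << TABLE_BITS)

/* Assumed layout: open addressing; zero-initialize before use. */
typedef struct {
    uint32_t keys[TABLE_SIZE];
    bool     used[TABLE_SIZE];
} HashTable;

/* Integer hash function: gather the key bits selected by hash_mask
 * into the low bits of the result (software bit extraction). */
static uint32_t hash_extract(uint32_t key, uint32_t hash_mask)
{
    uint32_t result = 0;
    int out = 0;
    for (int b = 0; b < 32; b++) {
        if (hash_mask & (1u << b)) {
            result |= ((key >> b) & 1u) << out;
            out++;
        }
    }
    return result;
}

/* Insert a key; returns false only if the table is full. */
bool ht_insert(HashTable *t, uint32_t key, uint32_t hash_mask)
{
    assert(t);
    uint32_t slot = hash_extract(key, hash_mask) % TABLE_SIZE;
    for (uint32_t probe = 0; probe < TABLE_SIZE; probe++) {
        uint32_t i = (slot + probe) % TABLE_SIZE;   /* linear probing */
        if (t->used[i] && t->keys[i] == key)
            return true;                            /* already present */
        if (!t->used[i]) {
            t->used[i] = true;
            t->keys[i] = key;
            return true;
        }
    }
    return false;
}

/* Look up a key; returns true if it is stored in the table. */
bool ht_lookup(const HashTable *t, uint32_t key, uint32_t hash_mask)
{
    assert(t);
    uint32_t slot = hash_extract(key, hash_mask) % TABLE_SIZE;
    for (uint32_t probe = 0; probe < TABLE_SIZE; probe++) {
        uint32_t i = (slot + probe) % TABLE_SIZE;
        if (!t->used[i])
            return false;                           /* empty slot ends the probe */
        if (t->keys[i] == key)
            return true;
    }
    return false;
}
```

Two keys with identical selected bits (for example 0x12 and 0x22 under mask 0xF) collide on the same slot and are resolved by probing the next one.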
Customizable Processor Model
Basic Core: Tensilica LX5
[Figure: processor block diagram. Inst. Fetch with Local Memory Inst.; L/S Unit 0 with Local Memory Data0; L/S Unit 1 with Local Memory Data1; Instruction Set: Basic RISC ISA and Hash-Specific ISA; Basic Registers, Hash-Specific Registers, Hash-Specific States; Interconnection; Data Prefetcher]
Integer Hash Function: C code
Pure C code:

    unsigned int hash, shVal, shVal_neg;
    unsigned int mask = 0xFFFFFFFF;
    for(i=0; i<keySize; i++){
        //load key, bit selection
        hash = key[i] & hashFunc;
        //extract bits
        for(j=30; j>=0; j--){
            if(!(hashFunc & (0x1<<j))){
                //partial shift right
                shVal = hash & (mask<<j);
                shVal_neg = hash & ~(mask<<j);
                hash = (shVal>>1) | shVal_neg;
            }
        }
        //store hash value
        hashValue[i] = hash;
    }

C code with new instructions:

    //init pointer, variables
    init_states(key, hashValue, hashFunc);
    LD_0(); LD_1();
    //load keys, extract bits, store hash values
    for(i=0; i<(keySize/16); i++){
        LD_0(); LD_1(); HOP();
        LD_0(); LD_1(); HOP();
        ST_0(); ST_1();
    }
    HOP();
    ST_0(); ST_1();
Slide annotation on the C code with new instructions: "1 cycle", marked three times.
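To make the partial-shift loop of the pure C version concrete, here is a self-contained sketch that wraps it in a function for a single key and checks it against a straightforward bit-by-bit gather. The function names `extract_bits_shift`, `extract_bits_gather` and `worked_example` are illustrative and not from the slides; the loop body itself follows the slide code.

```c
#include <assert.h>
#include <stdint.h>

/* The slide's pure C loop, wrapped as a function for one key. */
uint32_t extract_bits_shift(uint32_t key, uint32_t hashFunc)
{
    const uint32_t mask = 0xFFFFFFFFu;
    uint32_t hash = key & hashFunc;                    /* bit selection */
    for (int j = 30; j >= 0; j--) {
        if (!(hashFunc & (1u << j))) {                 /* bit j is not selected */
            uint32_t shVal     = hash & (mask << j);   /* bits j..31 */
            uint32_t shVal_neg = hash & ~(mask << j);  /* bits 0..j-1 */
            hash = (shVal >> 1) | shVal_neg;           /* close the gap at bit j */
        }
    }
    return hash;
}

/* Reference for comparison: copy the selected bits one by one
 * into consecutive low bits of the result. */
uint32_t extract_bits_gather(uint32_t key, uint32_t hashFunc)
{
    uint32_t result = 0;
    int out = 0;
    for (int b = 0; b < 32; b++) {
        if (hashFunc & (1u << b)) {
            result |= ((key >> b) & 1u) << out;
            out++;
        }
    }
    return result;
}

/* Worked example: key = 0xB6 (1011 0110), hashFunc = 0xCC (1100 1100).
 * Selected key bits 7, 6, 3, 2 are 1, 0, 0, 1, so the result is 1001b = 9. */
void worked_example(void)
{
    assert(extract_bits_shift(0xB6u, 0xCCu) == 0x9u);
    assert(extract_bits_gather(0xB6u, 0xCCu) == 0x9u);
}
```

Both functions compute the same value; the loop from the slides closes one unselected bit position per iteration, working from the top of the word down, which is the work the HOP instruction replaces.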
Integer Hash Function: ISA Extensions
[Figure: dataflow (load, execution, store) for LD_0, LD_1, HOP and ST. Key_0 to Key_7 are read from Local Data Memory 0 and 1 through Load-Store Unit 0 and 1; eight HASH Op. blocks apply Hash Func and produce Result_0 to Result_7]
Integer Hash Function: Pipeline Snippet
[Figure: pipeline diagram over Cycle n to Cycle (n+8), showing overlapped LD_0/LD_1, HOP and ST_0/ST_1 operations]
Latency: 6 cycles
Integer Hash Function: Throughput
Final processor
$\mathrm{Throughput} = \frac{n_{key}}{t}$
n_key: number of keys
t: time to perform the operation
- +1 load-store unit (2x)
- Extended ISA (500x)
- Data bus: 32 -> 128 bit (2x)
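The throughput definition above can be evaluated from a cycle count as in the following sketch. It assumes that the time t is obtained as cycles divided by the clock frequency; the function name and the clock-frequency parameter are hypothetical, and no clock rate is given in the slides.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput = n_key / t, with t derived from a cycle count.
 * f_clk_hz is a caller-supplied assumption, not a figure from the slides. */
double throughput_keys_per_s(uint64_t n_key, uint64_t cycles, double f_clk_hz)
{
    assert(cycles > 0 && f_clk_hz > 0.0);
    double t = (double)cycles / f_clk_hz;   /* time to perform the operation, in s */
    return (double)n_key / t;
}
```

For instance, 1000 keys processed in 500 cycles at an assumed 1 MHz clock take 0.5 ms, which gives 2,000,000 keys per second.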
Results: Throughput
Final processor
Speedup, HASHI vs. 108Mini: 386x, 354x, 2303x, 1288x, 125x
Results: Timing and Area
[Figure: Relative Area Consumption (HASHI), final processor]
Results: Comparison
Measures, HASHI vs. INTEL: 3x/7x lower, 57x/176x lower, 113x/271x lower
Conclusion
Hardware/Software Codesign approach
Results
- High database throughput
- Highly reduced area and power consumption
- 170x less energy consumption than a high-end x86 processor (@ same performance)
Silicon Prototype
- Tape-out April 2014
- 28 nm LP process: Globalfoundries
- ISA: Hash Functions, Hash Table Operators etc.
[1] Nöthen et al., "A 105GOPS 36mm2 Heterogeneous SDR MPSoC with Energy-Aware Dynamic Scheduling and Iterative Detection-Decoding for 4G in 65nm CMOS," ISSCC, 2014.