Mar 10, 2024

SLIDE 1

Institute of Software, Chinese Academy of Sciences

RVTensor: A Lightweight Neural Network Inference Framework Based on the RISC-V Architecture

Pengpeng Hou, Jiageng Yu, Yuxia Miao, Yang Tai, Yanjun Wu, Chen Zhao

*Corresponding author: Jiageng Yu jiageng08@iscas.ac.cn

SLIDE 2

Introduction

• The RISC-V ISA is developing rapidly
  - Open-source ISA

• RISC-V is suitable for IoT scenarios
  - Base instruction set + optional extension instruction sets
  - IoT scenarios are fragmented

[Figure: the RISC-V base instruction set surrounded by optional extensions (extended1 to extended4)]

SLIDE 3

Introduction

• Popular inference frameworks
  - For servers: TensorFlow, MXNet, Caffe
  - For smartphones: TensorFlow Lite, NCNN, MNN

SLIDE 4

Introduction

• Few inference systems exist for RISC-V + IoT
  - Architectural limitations
    - SIMD support (not part of the base ISA)
  - IoT hardware resource limitations
    - Chip performance is weak
    - Memory capacity is small

Security surveillance camera price statistics

  Price       90~150   150~775   >775
  User rate   34%      37%       29%

SLIDE 5

Introduction

• RVTensor: RISC-V Tensor
  - An inference system for RISC-V + IoT scenarios
  - Few third-party library dependencies
    - Only libhdf5.so
  - Lower hardware resource requirements
  - Based on the SERVE.r platform

SLIDE 6

Overview of RVTensor architecture

• RVTensor platform overview
  - Four modules
    - Model analysis
    - Op operators
    - Computation graph construction
    - Computation graph execution

SLIDE 7

Overview of RVTensor architecture

• RVTensor platform overview
  - Model analysis
    - Parses model files such as .pb and extracts information such as the operators and their weight data.

SLIDE 8

Overview of RVTensor architecture

• RVTensor platform overview
  - Op operators
    - Implements each operator, including conv, add, activation, pooling, fc (fully connected) and other operations.
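As a rough illustration of what the op-operator module contains, the sketch below implements two of the listed operators, fc and a ReLU activation, in plain C. The function names (`op_fc`, `op_relu`), the float data type and the row-major weight layout are assumptions made for this example, not RVTensor's actual interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fc operator: out[o] = bias[o] + sum_i w[o * in_len + i] * in[i].
 * Weights are assumed to be row-major, one row per output neuron. */
void op_fc(const float *in, size_t in_len,
           const float *w, const float *bias,
           float *out, size_t out_len)
{
    assert(in != NULL && w != NULL && out != NULL);
    for (size_t o = 0; o < out_len; ++o) {
        float acc = bias != NULL ? bias[o] : 0.0f;
        const float *row = w + o * in_len;
        for (size_t i = 0; i < in_len; ++i)
            acc += row[i] * in[i];
        out[o] = acc;
    }
}

/* Hypothetical in-place ReLU activation: negative values become zero. */
void op_relu(float *data, size_t len)
{
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; ++i)
        if (data[i] < 0.0f)
            data[i] = 0.0f;
}
```

Each kernel is a plain loop over flat buffers, so it needs no third-party library at all.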

SLIDE 9

Overview of RVTensor architecture

• RVTensor platform overview
  - Computation graph construction
    - Builds a computation graph from the outputs of the model analysis and op operator modules.
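Graph construction can be pictured as collecting nodes from model analysis and then deriving an execution order. The minimal sketch below assumes a fixed-size node table and uses Kahn's topological sort; `Graph`, `graph_add_node`, `graph_topo_order` and the `OpType` values are hypothetical names, and the slides do not say how RVTensor actually orders its ops.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64
#define MAX_INPUTS 4

/* Hypothetical operator tags for graph nodes. */
typedef enum { OP_INPUT, OP_CONV, OP_ADD, OP_RELU, OP_POOL, OP_FC } OpType;

/* One graph node: an operator plus the indices of the nodes feeding it. */
typedef struct {
    OpType type;
    int num_inputs;
    int inputs[MAX_INPUTS];
} Node;

typedef struct {
    int num_nodes;
    Node nodes[MAX_NODES];
} Graph;

/* Append a node as it comes out of model analysis. Inputs may refer to nodes
 * added later, since a model file need not list ops in execution order.
 * Returns the new node's index, or -1 if it does not fit. */
int graph_add_node(Graph *g, OpType type, const int *inputs, int num_inputs)
{
    assert(g != NULL && num_inputs >= 0);
    if (g->num_nodes >= MAX_NODES || num_inputs > MAX_INPUTS)
        return -1;
    Node *n = &g->nodes[g->num_nodes];
    n->type = type;
    n->num_inputs = num_inputs;
    for (int i = 0; i < num_inputs; ++i)
        n->inputs[i] = inputs[i];
    return g->num_nodes++;
}

/* Kahn's algorithm: writes an execution order into order[] and returns how
 * many nodes were placed. A result below g->num_nodes means the graph has a
 * cycle or a dangling input index. */
int graph_topo_order(const Graph *g, int *order)
{
    int indegree[MAX_NODES];
    int queue[MAX_NODES];
    int head = 0, tail = 0, count = 0;

    for (int v = 0; v < g->num_nodes; ++v) {
        indegree[v] = g->nodes[v].num_inputs;
        if (indegree[v] == 0)
            queue[tail++] = v;
    }
    while (head < tail) {
        int u = queue[head++];
        order[count++] = u;
        /* Every consumer of u now has one fewer unresolved input. */
        for (int v = 0; v < g->num_nodes; ++v)
            for (int k = 0; k < g->nodes[v].num_inputs; ++k)
                if (g->nodes[v].inputs[k] == u && --indegree[v] == 0)
                    queue[tail++] = v;
    }
    return count;
}
```

For example, an add node listed first with inputs {2, 1}, an input node at index 1 and a ReLU at index 2 reading node 1 are reordered to 1, 2, 0.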

SLIDE 10

Overview of RVTensor architecture

• RVTensor platform overview
  - Computation graph execution
    - Computes the inference results from the input data (such as image data) and the computation graph.
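Once the ops are ordered, execution reduces to calling each op kernel on its input and output tensors in turn. The sketch below assumes a common kernel signature and single-input ops to stay short (a real add op would take two inputs); `Tensor`, `Step`, `graph_execute` and the two toy kernels are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tensor view: a flat float buffer and its element count. */
typedef struct {
    float *data;
    size_t len;
} Tensor;

/* All kernels share one signature so the executor can call them uniformly. */
typedef void (*OpKernel)(const Tensor *in, Tensor *out);

typedef struct {
    OpKernel kernel;
    int input;   /* index of the tensor this step reads  */
    int output;  /* index of the tensor this step writes */
} Step;

/* Run the steps in their (already topologically sorted) order. tensors[0]
 * is assumed to hold the input data, e.g. a preprocessed image. Returns the
 * index of the tensor written by the last step. */
int graph_execute(const Step *steps, size_t num_steps, Tensor *tensors)
{
    assert(steps != NULL && tensors != NULL && num_steps > 0);
    for (size_t i = 0; i < num_steps; ++i)
        steps[i].kernel(&tensors[steps[i].input], &tensors[steps[i].output]);
    return steps[num_steps - 1].output;
}

/* Toy kernels used only to exercise the executor. */
void kernel_relu(const Tensor *in, Tensor *out)
{
    assert(out->len >= in->len);
    for (size_t i = 0; i < in->len; ++i)
        out->data[i] = in->data[i] > 0.0f ? in->data[i] : 0.0f;
}

void kernel_double(const Tensor *in, Tensor *out)
{
    assert(out->len >= in->len);
    for (size_t i = 0; i < in->len; ++i)
        out->data[i] = 2.0f * in->data[i];
}
```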

SLIDE 11

Optimization

• Reducing dependencies on third-party libraries
  - Multi-thread library: Pthread
    - Provides many APIs
    - RVTensor uses only a few of them
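The point that an operator needs only a few Pthread calls can be illustrated with an op parallelized using nothing but `pthread_create` and `pthread_join`. This is a minimal sketch, assuming an element-wise add split into contiguous slices, one per thread; `parallel_add`, `AddTask` and `MAX_THREADS` are hypothetical, and the slides do not list which Pthread APIs RVTensor actually calls.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 8

/* Work item for one thread: the slice [begin, end) of the tensors. */
typedef struct {
    const float *a;
    const float *b;
    float *out;
    size_t begin;
    size_t end;
} AddTask;

static void *add_worker(void *arg)
{
    const AddTask *task = arg;
    for (size_t i = task->begin; i < task->end; ++i)
        task->out[i] = task->a[i] + task->b[i];
    return NULL;
}

/* Element-wise add split across num_threads threads (1..MAX_THREADS).
 * Returns 0 on success, -1 on bad arguments or if a thread failed to start. */
int parallel_add(const float *a, const float *b, float *out,
                 size_t len, int num_threads)
{
    pthread_t threads[MAX_THREADS];
    AddTask tasks[MAX_THREADS];
    int started = 0;

    assert(a != NULL && b != NULL && out != NULL);
    if (num_threads < 1 || num_threads > MAX_THREADS)
        return -1;

    size_t chunk = (len + (size_t)num_threads - 1) / (size_t)num_threads;
    for (int t = 0; t < num_threads; ++t) {
        size_t begin = (size_t)t * chunk;
        size_t end = begin + chunk;
        if (begin > len) begin = len;   /* trailing threads may get no work */
        if (end > len) end = len;
        tasks[t] = (AddTask){ a, b, out, begin, end };
        if (pthread_create(&threads[t], NULL, add_worker, &tasks[t]) != 0)
            break;
        ++started;
    }
    for (int t = 0; t < started; ++t)
        pthread_join(threads[t], NULL);
    return started == num_threads ? 0 : -1;
}
```

Because each thread writes a disjoint slice of `out`, no mutexes or condition variables are needed. Depending on the toolchain, the program may have to be built with `-pthread`.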

SLIDE 12

Optimization

• Improving memory utilization
  - Memory reuse: all ops share one global memory block while running
    - Global memory block size = MAX{memory requirement of each op}
    - Each branch phase is treated as an atomic operation
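The memory-reuse rule can be sketched as follows. Each op's requirement is assumed to be its input + output + scratch bytes, and the global block is sized to the largest one. So that consecutive ops can share the block without overwriting live data, this sketch uses a ping-pong layout (an assumption, not necessarily RVTensor's scheme): even-numbered ops read from the front of the block and write to the back, odd-numbered ops do the reverse. A branch phase, such as a ResNet residual block, is assumed to have been merged into one entry beforehand, in line with treating it as an atomic operation. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-op memory requirement, in bytes. */
typedef struct {
    size_t input_bytes;
    size_t output_bytes;
    size_t scratch_bytes;   /* temporary workspace, e.g. for im2col */
} OpMemReq;

/* Pointers to one op's buffers inside the shared block. */
typedef struct {
    unsigned char *input;
    unsigned char *output;
    unsigned char *scratch;
} OpBuffers;

static size_t op_total(const OpMemReq *r)
{
    return r->input_bytes + r->output_bytes + r->scratch_bytes;
}

/* Global memory block size = MAX over all ops of each op's requirement. */
size_t global_block_size(const OpMemReq *reqs, size_t num_ops)
{
    size_t max = 0;
    for (size_t i = 0; i < num_ops; ++i)
        if (op_total(&reqs[i]) > max)
            max = op_total(&reqs[i]);
    return max;
}

/* Place op number op_index inside the shared block. Assuming op i's
 * input_bytes equals op i-1's output_bytes, each op's input lands exactly
 * where the previous op left its output, and nothing overlaps because
 * block_size >= input + output + scratch for every op. */
OpBuffers op_buffers(unsigned char *block, size_t block_size,
                     const OpMemReq *r, size_t op_index)
{
    OpBuffers b;
    assert(op_total(r) <= block_size);
    if (op_index % 2 == 0) {
        /* Even op: read from the front, write to the back. */
        b.input = block;
        b.output = block + block_size - r->output_bytes;
        b.scratch = block + r->input_bytes;
    } else {
        /* Odd op: read from the back, write to the front. */
        b.input = block + block_size - r->input_bytes;
        b.output = block;
        b.scratch = block + r->output_bytes;
    }
    return b;
}
```

With three ops needing 48, 104 and 12 bytes, only one 104-byte block is allocated instead of 164 bytes of separate buffers.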

SLIDE 13

Evaluation

• Platform: SERVR.r

• Neural network: ResNet-20

• Data set: CIFAR-10

SLIDE 14

Evaluation

• Accuracy
  - RVTensor produces the same results as Keras*

• Performance
  - The average processing time per image is 13.51 seconds

• Executable file size
  - The RVTensor executable is 193 KB

* Keras runs on an x86 platform

SLIDE 15

Future work

• Memory optimization
  - Because memory is limited, swapping data in and out of memory becomes an issue

• Sparse convolution
  - The ReLU op produces many zeros in the data, which makes the convolution inefficient
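The zero-skipping idea behind sparse convolution can be shown on a 1x1 convolution at a single spatial position: an input channel whose activation is exactly zero after ReLU contributes nothing, so its whole weight column can be skipped. This is a minimal sketch of the general technique, not RVTensor's planned design; `conv1x1_skip_zeros` and the `w[o * in_ch + c]` weight layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* 1x1 convolution at one spatial position that skips zero activations.
 * Weights are assumed to be laid out as w[o * in_ch + c].
 * Returns the number of input channels that were skipped. */
size_t conv1x1_skip_zeros(const float *in, size_t in_ch,
                          const float *w, float *out, size_t out_ch)
{
    size_t skipped = 0;
    assert(in != NULL && w != NULL && out != NULL);
    for (size_t o = 0; o < out_ch; ++o)
        out[o] = 0.0f;
    for (size_t c = 0; c < in_ch; ++c) {
        float x = in[c];
        if (x == 0.0f) {          /* zero activation: no contribution */
            ++skipped;
            continue;
        }
        for (size_t o = 0; o < out_ch; ++o)
            out[o] += w[o * in_ch + c] * x;
    }
    return skipped;
}
```

The saving grows with the fraction of zeros: every skipped channel removes `out_ch` multiply-adds.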

• Model pruning
  - Compress the model parameters with pruning techniques to make models more suitable for IoT scenarios
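One widely used pruning technique is magnitude pruning, which zeroes weights whose absolute value falls below a threshold. The slides do not say which pruning method RVTensor would adopt; the sketch below only illustrates the general idea, and `prune_by_magnitude` and its `threshold` parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Magnitude pruning: zero every weight whose absolute value is below
 * `threshold`. Returns the number of zero weights afterwards, i.e. how
 * sparse the layer has become. */
size_t prune_by_magnitude(float *w, size_t n, float threshold)
{
    size_t zeros = 0;
    assert(w != NULL && threshold >= 0.0f);
    for (size_t i = 0; i < n; ++i) {
        float magnitude = w[i] < 0.0f ? -w[i] : w[i];
        if (magnitude < threshold)
            w[i] = 0.0f;
        if (w[i] == 0.0f)
            ++zeros;
    }
    return zeros;
}
```

Zeroing weights alone does not shrink the file; a compact sparse storage format is also needed to turn the zeros into a smaller model.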

• Adapting to the V instruction set
  - Re-implement the op operators with the RISC-V V (vector) extension to improve efficiency

SLIDE 16

Thanks!

*Corresponding author: Jiageng Yu jiageng08@iscas.ac.cn