Enhancements to the pd developer branch initiated by the vibrez project

SLIDE 1

Enhancements to the pd developer branch initiated by the vibrez project

Thomas Grill, Hannes Köcher, Tim Blechmann vibrez.net pd~convention 2007, Montréal

SLIDE 2

Outline

vibrez
pure data CVS
audio and MIDI sub-system
DSP performance
Loader hooks
Background processing

SLIDE 3

vibrez

media performance system based on pd kernel
commercially funded, partly open source
OpenGL GUI, detached from patcher system
many further extensions (pd externals)
multi-threaded (GUI, Python scripting)
targeted for Windows and MacOS

SLIDE 4

SLIDE 5

Does the pd kernel need to be improved?

Audio performance (latency)
DSP performance (support of current CPU architectures)
Low priority processing
Better extensibility (scripted externals)
Fixing bugs

SLIDE 6

pd source situated on the sourceforge CVS

  • one main but multiple devel_0_xx branches

Additional features, improvements

  • a lot of work already put into it
  • many features haven't been promoted to the main branch

devel_0_39 branch used for production

SLIDE 7

Unified audio subsystem

Using portaudio only

  • Host specifics and bug-fixing are outsourced to portaudio

Callback-based scheduler

  • No more ringbuffer (temporary storage)
  • DSP processing done in audio thread
  • Lower latency (for pro gear)

SLIDE 8

Callback-based scheduler

/* pd thread */
int m_scheduler()
{
    sys_lock();
    for(;;) {
        double time, rtime;
        time = sys_getrealtime();
        /* allow the audio callback to run */
        sys_unlock();
        sys_microsleep(sys_sleepgrain);
        sys_lock();
        sys_pollmidiqueue();
        sys_setmiditimediff(0, 1e-6*sys_schedadvance);
        rtime = (sys_getrealtime()-time)*1e6;
        /* calculate remaining time */
        time = sys_schedadvance/4-rtime;
        if(time > 0) run_timed_idle_callbacks(time);
        if(sys_pollgui()) continue;
        /* do graphics updates */
        sched_pollformeters();
    }
    sys_unlock();   /* not reached: the loop above never exits */
    return 0;
}

/* audio thread */
int process(void *input, void *output, int frameCount, ...)
{
    /* how much time do we have? (microseconds) */
    int timeout = (float)frameCount/sys_dacsr*1e6;
    double time, rtime;
    int i;   /* loop counter, not declared on the original slide */
    time = sys_getrealtime();
    if(sys_timedlock(timeout) == ETIMEDOUT) {
        /* we're late */
        sys_lock_timeout_notification();
        return 0;
    }
    for(i = 0; i < frameCount/sys_dacblocksize; ++i) {
        audio_copy(sys_soundin, input, sys_inchannels);
        sched_tick(sys_time + sys_time_per_dsp_tick);
        audio_copy(output, sys_soundout, sys_outchannels);
    }
    rtime = (sys_getrealtime()-time)*1e6;
    /* calculate remaining time */
    time = timeout*0.5-rtime;
    if(time > 0) run_timed_idle_callbacks(time);
    sys_unlock();
    return 0;
}
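The time arithmetic in the two functions above can be isolated as a small sketch. The helper names callback_budget_us and idle_budget_us are hypothetical and not part of pd; they only restate the slide's formulas: the budget of one audio callback is frameCount divided by the sample rate, in microseconds, and idle callbacks get whatever is left of a fixed share of that budget (0.5 in the audio thread) after the elapsed time is subtracted.

```c
#include <assert.h>

/* Hypothetical helper (not pd API): time available to one audio
   callback, in microseconds, for frame_count frames at sample_rate Hz. */
static double callback_budget_us(int frame_count, double sample_rate)
{
    assert(frame_count > 0 && sample_rate > 0.0);
    return (double)frame_count / sample_rate * 1e6;
}

/* Hypothetical helper (not pd API): time left for idle callbacks.
   share is the fraction of the budget reserved for idle work,
   elapsed_us the time already spent. Never returns a negative value. */
static double idle_budget_us(double budget_us, double share, double elapsed_us)
{
    double left = budget_us * share - elapsed_us;
    return left > 0.0 ? left : 0.0;
}
```

For example, 64 frames at 44100 Hz give a budget of about 1451 microseconds, so idle callbacks would only run if DSP processing finished in well under half of that.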

SLIDE 9

Callback-based scheduler

/* pd thread */
int m_scheduler()
{
    sys_lock();
    for(;;) {
        double time, rtime;
        time = sys_getrealtime();

        sys_unlock();
        sys_microsleep(sys_sleepgrain);
        sys_lock();

        rtime = (sys_getrealtime()-time)*1e6;
        /* calculate remaining time */
        time = sys_schedadvance/4-rtime;

        if(time > 0) run_timed_idle_callbacks(time);
    }
    sys_unlock();
}

/* audio thread */
int process(void *input, void *output, int frameCount, ...)
{
    int timeout = (float)frameCount/sys_dacsr*1e6;
    double time, rtime;
    int i;   /* loop counter, not declared on the original slide */
    time = sys_getrealtime();

    sys_lock();

    for(i = 0; i < frameCount/sys_dacblocksize; ++i) {
        audio_copy(sys_soundin, input, sys_inchannels);

        sched_tick(sys_time+sys_time_per_dsp_tick);

        audio_copy(output, sys_soundout, sys_outchannels);
    }
    rtime = (sys_getrealtime()-time)*1e6;
    /* calculate remaining time */
    time = timeout*0.5-rtime;

    if(time > 0) run_timed_idle_callbacks(time);
    sys_unlock();

    return 0;
}

SLIDE 10

Latency measurements

(analog loop-back)

OS / Audio interface           DAD latency
WinXP / RME Fireface 400       ASIO: 6.5 ms
WinXP / RME HDSP Multiface     ASIO: 5.0 ms
WinXP / M-Audio FW410          ASIO: 6.5 ms
OS X.4 / RME Fireface 400      CoreAudio: 13.8 ms

no improvement for MME or DirectSound

SLIDE 11

Changes to MIDI subsystem

Using portmidi only (analogous to portaudio)
MIDI input not working for "exotic" message types with varying byte counts

  • system exclusive, system common, real-time messages, needed e.g. for syncing
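To illustrate why these message types need separate handling, here is a minimal sketch of how the number of data bytes depends on the MIDI status byte. The function name midi_data_bytes is hypothetical and belongs to neither pd nor portmidi; the byte counts follow the MIDI 1.0 message table.

```c
#include <assert.h>

/* Hypothetical helper (not pd or portmidi API): number of data bytes
   that follow a MIDI status byte, or -1 if the length is variable
   (system exclusive) or the status is undefined. */
static int midi_data_bytes(unsigned char status)
{
    assert(status & 0x80);              /* status bytes have the high bit set */
    if (status < 0xF0) {                /* channel voice messages */
        switch (status & 0xF0) {
        case 0xC0:                      /* program change */
        case 0xD0:                      /* channel pressure */
            return 1;
        default:                        /* note off/on, poly pressure, CC, pitch bend */
            return 2;
        }
    }
    switch (status) {
    case 0xF0: return -1;               /* system exclusive: runs until 0xF7 */
    case 0xF1: return 1;                /* MIDI time code quarter frame */
    case 0xF2: return 2;                /* song position pointer */
    case 0xF3: return 1;                /* song select */
    case 0xF4:
    case 0xF5: return -1;               /* undefined */
    default:   return 0;                /* tune request, end of sysex, real-time 0xF8..0xFF */
    }
}
```

A parser that assumes a fixed message length handles the channel messages but mishandles exactly the system common and real-time messages that synchronisation relies on.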

SLIDE 12

Message-based audio and MIDI configuration

Traditionally: command-line or menu-based system configuration
Several new messages to pd receiver allowing configuration and querying of audio and MIDI status
Message feedback from the pd sender
Portaudio only!

SLIDE 13

Audio configuration

SLIDE 14

SIMD processing

Single instruction, multiple data
Supported by current x86 and PPC CPUs
For contiguous, aligned blocks of data

  • Perfect for DSP vectors
  • Considerable speed-ups

SLIDE 15

SIMD processing contd.

Requires reformulation of DSP algorithms

  • Sometimes easy, sometimes very hard
  • "Auto-vectorization" rarely effective

Coded using assembly or "compiler intrinsics"
Needs memory alignment (16 bytes)
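The 16-byte alignment requirement can be sketched as follows. The names is_simd_aligned and alloc_dsp_vector are hypothetical and do not come from the pd sources; the sketch assumes a C11 standard library that provides aligned_alloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SIMD_ALIGN 16   /* bytes, the alignment named on the slide */

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
static int is_simd_aligned(const void *p)
{
    return ((uintptr_t)p % SIMD_ALIGN) == 0;
}

/* Hypothetical helper: allocate n floats on a 16-byte boundary.
   aligned_alloc wants a size that is a multiple of the alignment,
   so the byte count is rounded up. Release the block with free(). */
static float *alloc_dsp_vector(size_t n)
{
    size_t bytes;
    if (n == 0)
        return NULL;
    bytes = n * sizeof(float);
    bytes = (bytes + SIMD_ALIGN - 1) / SIMD_ALIGN * SIMD_ALIGN;
    return aligned_alloc(SIMD_ALIGN, bytes);
}
```

A perform routine could check is_simd_aligned on its input and output vectors at DSP setup time and fall back to the scalar version when the check fails.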

SLIDE 16

SIMD example

/* scalar version, unrolled by 8 */
t_int *plus_perf8(t_int *w)
{
    t_float *in1 = (t_float *)(w[1]);
    t_float *in2 = (t_float *)(w[2]);
    t_float *out = (t_float *)(w[3]);
    int n = (int)(w[4]);
    for (; n; n -= 8, in1 += 8, in2 += 8, out += 8) {
        float f0 = in1[0], f1 = in1[1], f2 = in1[2], f3 = in1[3];
        float f4 = in1[4], f5 = in1[5], f6 = in1[6], f7 = in1[7];
        float g0 = in2[0], g1 = in2[1], g2 = in2[2], g3 = in2[3];
        float g4 = in2[4], g5 = in2[5], g6 = in2[6], g7 = in2[7];
        out[0] = f0 + g0; out[1] = f1 + g1;
        out[2] = f2 + g2; out[3] = f3 + g3;
        out[4] = f4 + g4; out[5] = f5 + g5;
        out[6] = f6 + g6; out[7] = f7 + g7;
    }
    return (w+5);
}

/* SSE version: 4 floats per instruction, INNER instructions per outer step.
   Expects n to be a multiple of 4*INNER and 16-byte aligned vectors. */
#include <xmmintrin.h>   /* SSE intrinsics */

#define INNER 4

t_int *plus_perf_simd(t_int *w)
{
    t_float *in1 = (t_float *)(w[1]);
    t_float *in2 = (t_float *)(w[2]);
    t_float *out = (t_float *)(w[3]);
    int n = (int)(w[4]);
    int i, j;
    for(i = 0; i < n; )
        for(j = 0; j < INNER; ++j, i += 4)
            _mm_store_ps(out+i,
                _mm_add_ps(_mm_load_ps(in1+i), _mm_load_ps(in2+i)));
    return w+5;
}

SLIDE 17

DSP performance

(mix of pd help patches)

System / CPU type            CPU load
WinXP / Core2Duo 2.4GHz      60% → 40%
WinXP / P4 2.53GHz           85% → 63%
MBP / Core2Duo 2.16GHz       69% → 56%
MiniMac / PPC 1.66GHz        59% → 38%

Only simple DSP functions optimized

unfair conditions for SIMD (large DSP graph)

SLIDE 18

Loader hooks

Hooks provide plugin functionality for loading strategies

  • Enables externals written using script languages or virtual machines
  • Allows new loading strategies

Used/extended for the libdir loader of pd-extended
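The general idea of a loader hook chain can be sketched in a few lines. Everything in this sketch is hypothetical: the type loader_hook_t, the functions register_loader_hook and load_class, and the ".py" naming rule are invented for illustration and do not reproduce the actual pd loader interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook type: returns 1 if the hook loaded classname, else 0. */
typedef int (*loader_hook_t)(const char *classname);

#define MAX_HOOKS 8
static loader_hook_t hooks[MAX_HOOKS];
static size_t hook_count = 0;

/* Install a hook at the end of the chain. Returns 1 on success. */
static int register_loader_hook(loader_hook_t hook)
{
    if (hook == NULL || hook_count >= MAX_HOOKS)
        return 0;
    hooks[hook_count++] = hook;
    return 1;
}

/* Ask each hook in turn until one of them claims the class. */
static int load_class(const char *classname)
{
    size_t i;
    for (i = 0; i < hook_count; ++i)
        if (hooks[i](classname))
            return 1;
    return 0;
}

/* Example hook for a scripting language. The ".py" suffix rule is an
   assumption made for this sketch only. */
static int python_hook(const char *classname)
{
    size_t n = strlen(classname);
    return n > 3 && strcmp(classname + n - 3, ".py") == 0;
}
```

With such a chain, a scripting external only has to register its own hook once; the loader itself needs no knowledge of the scripting language.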

SLIDE 19

Background processing

For long-lasting operations
Avoiding CPU spikes that cause dropouts
a) Background thread (running continuously)
b) Idle message processing (called multiple times)

Example patch (object chain): [vasp arrayL arrayR] → [vasp.cfft @detach 1] → [vasp.polar @detach 1] → [vasp.update]

SLIDE 20

Multi-threading

Communication with pd kernel thread

  • Message passing (inlet/outlet, send/receive)
  • Use lock-free FIFO

pd API not easily accessible for background thread

  • sys_lock must be used
  • Symbol lookup (gensym) is lock-free✌

⌛
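A lock-free FIFO of the kind mentioned above can be sketched as a single-producer, single-consumer ring buffer. This is a generic C11 sketch, not the implementation used in the devel branch; the names spsc_fifo, fifo_init, fifo_push and fifo_pop are hypothetical. It is only safe when exactly one thread pushes (for example the background thread) and exactly one thread pops (for example the pd kernel thread).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FIFO_SIZE 64   /* capacity, must be a power of two */

typedef struct {
    void *slots[FIFO_SIZE];
    atomic_size_t head;   /* read counter, advanced only by the consumer */
    atomic_size_t tail;   /* write counter, advanced only by the producer */
} spsc_fifo;

static void fifo_init(spsc_fifo *f)
{
    atomic_init(&f->head, 0);
    atomic_init(&f->tail, 0);
}

/* Producer side. Returns 0 if the FIFO is full, 1 otherwise. */
static int fifo_push(spsc_fifo *f, void *item)
{
    size_t tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (tail - head == FIFO_SIZE)
        return 0;
    f->slots[tail & (FIFO_SIZE - 1)] = item;
    /* release: the slot write becomes visible before the new tail */
    atomic_store_explicit(&f->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer side. Returns 0 if the FIFO is empty, 1 otherwise. */
static int fifo_pop(spsc_fifo *f, void **item)
{
    size_t head = atomic_load_explicit(&f->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head == tail)
        return 0;
    *item = f->slots[head & (FIFO_SIZE - 1)];
    /* release: the slot read is finished before the slot is handed back */
    atomic_store_explicit(&f->head, head + 1, memory_order_release);
    return 1;
}
```

Neither side ever blocks, so the audio or kernel thread cannot be held up by a background thread that is preempted in the middle of an operation.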

SLIDE 21

Idle processing

idle_hook already existing in main pd (run once per scheduler iteration)
We want timed idle callback processing
Callback dynamically installable (thread-safe)
Callback run in pd kernel thread
Can request re-triggering for this or next scheduler iteration (returning 0, 1 or 2)
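The re-triggering protocol can be sketched as below. Several points are assumptions: the slide does not say which return value means what, so the mapping 0 = finished, 1 = call again in this iteration, 2 = call again in the next iteration is a guess; the names add_idle_callback and run_idle_callbacks are hypothetical; the time budget is replaced by a simple call count to keep the sketch deterministic; and the locking needed for thread-safe installation is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed meaning of the return values (not confirmed by the slides). */
enum { IDLE_DONE = 0, IDLE_AGAIN_NOW = 1, IDLE_AGAIN_NEXT = 2 };

typedef int (*idle_fn)(void *data);
typedef struct { idle_fn fn; void *data; } idle_entry;

#define MAX_IDLE 16
static idle_entry idle_list[MAX_IDLE];
static size_t idle_count = 0;

/* Install a callback. Returns 1 on success. Not thread-safe in this sketch. */
static int add_idle_callback(idle_fn fn, void *data)
{
    if (fn == NULL || idle_count >= MAX_IDLE)
        return 0;
    idle_list[idle_count].fn = fn;
    idle_list[idle_count].data = data;
    ++idle_count;
    return 1;
}

/* Run callbacks until the budget (here: a number of calls standing in
   for the microsecond budget) is used up or every entry was visited. */
static void run_idle_callbacks(int max_calls)
{
    size_t i = 0;
    while (i < idle_count && max_calls > 0) {
        int r = idle_list[i].fn(idle_list[i].data);
        --max_calls;
        if (r == IDLE_DONE)
            idle_list[i] = idle_list[--idle_count];   /* drop the entry */
        else if (r == IDLE_AGAIN_NEXT)
            ++i;                                      /* keep it for later */
        /* IDLE_AGAIN_NOW: stay on the same entry while budget remains */
    }
}

/* Example callback: does one unit of work per call. */
static int count_down(void *data)
{
    int *left = data;
    if (*left > 0)
        --*left;
    return *left > 0 ? IDLE_AGAIN_NOW : IDLE_DONE;
}
```

A long computation split this way uses only the time the scheduler has to spare, which is the behaviour the slide asks for.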

SLIDE 22

Retriggering work
(using idle processing)

[Figure: two patches around [pd do_some_work], each labelled "start" and "ready": one retriggers via [delay 1] (marked "?"), the other via [defer]]

work takes some time, depending on CPU
defer calls for work, as long as there is time

SLIDE 23

Conclusion

Enhancements add functionality without compromising compatibility
Discussion needed
Provide patches for the SF patch tracker
devel_0_39 branch ☠
Thanks to Tim Blechmann
