Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication

Konstantina Mitropoulou, Vasileios Porpodas, Xiaochun Zhang and Timothy M. Jones, Computer Laboratory. UKMAC 2016, Edinburgh.


  1. Lynx: Using OS and Hardware Support for Fast Fine-Grained Inter-Core Communication (slide 1 of 30)
     Konstantina Mitropoulou, Vasileios Porpodas, Xiaochun Zhang and Timothy M. Jones
     Computer Laboratory
     UKMAC 2016, Edinburgh
     http://www.cl.cam.ac.uk/~km647/

  2. Outline (slide 2 of 30)
     • Background:
       • Lamport’s queue
       • Multi-section queue
     • Lynx queue
     • Performance evaluation

  7. Lamport’s Queue Bottlenecks (slide 3 of 30)
     [Diagram: queue buffer with enqueue_ptr and dequeue_ptr]
     while (next_enqueue_ptr == dequeue_ptr) { ; }
     Performance degradation due to:
     • Frequent thread synchronisation
     • Cache ping-pong
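
The spin loop on this slide can be placed in context with a minimal sketch of a Lamport-style single-producer, single-consumer ring buffer. This is an illustration under stated assumptions, not the code from the talk: the type `lamport_queue`, the function names, the capacity `QUEUE_SIZE` and the use of C11 atomics are all choices made here. The point it shows is that every enqueue reads `dequeue_ptr` and every dequeue reads `enqueue_ptr`, so the two threads synchronise on each element.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 8 /* assumed capacity; one slot always stays empty */

typedef struct {
    long buffer[QUEUE_SIZE];
    _Atomic size_t enqueue_ptr; /* written by the enqueue thread, read by both */
    _Atomic size_t dequeue_ptr; /* written by the dequeue thread, read by both */
} lamport_queue;

void lamport_init(lamport_queue *q) {
    assert(q != NULL);
    atomic_init(&q->enqueue_ptr, 0);
    atomic_init(&q->dequeue_ptr, 0);
}

/* Returns false when the queue is full instead of spinning. */
bool lamport_try_enqueue(lamport_queue *q, long value) {
    size_t enq = atomic_load_explicit(&q->enqueue_ptr, memory_order_relaxed);
    size_t next_enqueue_ptr = (enq + 1) % QUEUE_SIZE;
    /* Per-element read of the other thread's pointer: the shared access
       behind the synchronisation and cache ping-pong bottlenecks. */
    if (next_enqueue_ptr ==
        atomic_load_explicit(&q->dequeue_ptr, memory_order_acquire))
        return false;
    q->buffer[enq] = value;
    atomic_store_explicit(&q->enqueue_ptr, next_enqueue_ptr,
                          memory_order_release);
    return true;
}

/* Returns false when the queue is empty instead of spinning. */
bool lamport_try_dequeue(lamport_queue *q, long *out) {
    size_t deq = atomic_load_explicit(&q->dequeue_ptr, memory_order_relaxed);
    if (deq == atomic_load_explicit(&q->enqueue_ptr, memory_order_acquire))
        return false;
    *out = q->buffer[deq];
    atomic_store_explicit(&q->dequeue_ptr, (deq + 1) % QUEUE_SIZE,
                          memory_order_release);
    return true;
}

/* Blocking form, equivalent to the slide's
   while (next_enqueue_ptr == dequeue_ptr) { ; } loop. */
void lamport_enqueue(lamport_queue *q, long value) {
    while (!lamport_try_enqueue(q, value)) { ; }
}
```

The non-blocking `try` variants exist only so the sketch can be exercised from a single thread; with two threads, the enqueue thread would call `lamport_enqueue` and the dequeue thread would spin on `lamport_try_dequeue`.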

  9. Cache Ping-Pong (slide 4 of 30)
     [Diagram: cache hierarchy with one L3 cache, two L2 caches and two L1 caches above core 1 and core 2; labels dequeue_ptr and enqueue_ptr]
     while (next_enqueue_ptr == dequeue_ptr) { ; }
     • Queue pointers ping-pong across cache hierarchy

  10. Cache Ping-Pong (slide 5 of 30)
      [Diagram: cache hierarchy with one L3 cache, two L2 caches and two L1 caches above core 1 and core 2; labels dequeue_ptr and enqueue_ptr]
      while (next_dequeue_ptr == enqueue_ptr) { ; }
      • Queue pointers ping-pong across cache hierarchy

  12. Multi-Section Queue (MSQ): state-of-the-art (slide 6 of 30)
      [Diagram: queue split into section 1 and section 2, with enqueue_ptr and dequeue_ptr]
      • Each section is exclusively used by one thread

  14. Multi-Section Queue (MSQ): state-of-the-art (slide 7 of 30)
      [Diagram: queue split into section 1 and section 2, with enqueue_ptr and dequeue_ptr]
      • Enqueue thread cannot access section 1 because dequeue thread still uses it
      • Enqueue thread waits (spins) at the end of section 2

  15. Multi-Section Queue (MSQ): state-of-the-art (slide 8 of 30)
      [Diagram: queue split into section 1 and section 2, with enqueue_ptr and dequeue_ptr]
      • Dequeue thread reached the end of section 1

  16. Multi-Section Queue (MSQ): state-of-the-art (slide 9 of 30)
      [Diagram: queue split into section 1 and section 2, with enqueue_ptr and dequeue_ptr]
      • Dequeue thread reached the end of section 1
      • Enqueue thread enters section 1

  19. Multi-Section Queue (MSQ): state-of-the-art (slide 10 of 30)
      [Diagram: queue split into section 1 and section 2, with enqueue_ptr and dequeue_ptr]
      Performance optimisations:
      • Infrequent boundary checks (less frequent synchronisation)
      • Reduced cache ping-pong
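
The section hand-over described on the MSQ slides can be sketched as follows. This is a simplified illustration, not the MSQ implementation from the talk: the names `msq`, `msq_try_enqueue` and `msq_try_dequeue`, the two shared counters `sections_filled` and `sections_drained`, and the tiny `SECTION_SIZE` are all assumptions made here. Because the functions return instead of spinning, the boundary check is made on entry to a section rather than at the end of the previous one. The counter `enq_boundary_checks` is instrumentation added to show how rarely the enqueue thread reads shared state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SECTIONS 2 /* as on the slides: section 1 and section 2 */
#define SECTION_SIZE 4 /* assumed; kept tiny so the behaviour is easy to trace */
#define NUM_SLOTS (NUM_SECTIONS * SECTION_SIZE)

typedef struct {
    long buffer[NUM_SLOTS];
    size_t enq_count;                /* enqueue-thread private: elements written */
    size_t deq_count;                /* dequeue-thread private: elements read */
    size_t enq_boundary_checks;      /* instrumentation: shared reads by enqueue */
    _Atomic size_t sections_filled;  /* shared: sections completed by enqueue */
    _Atomic size_t sections_drained; /* shared: sections completed by dequeue */
} msq;

void msq_init(msq *q) {
    assert(q != NULL);
    q->enq_count = 0;
    q->deq_count = 0;
    q->enq_boundary_checks = 0;
    atomic_init(&q->sections_filled, 0);
    atomic_init(&q->sections_drained, 0);
}

bool msq_try_enqueue(msq *q, long value) {
    if (q->enq_count % SECTION_SIZE == 0) {
        /* Boundary check: the only point where shared state is read. */
        size_t section = q->enq_count / SECTION_SIZE;
        size_t drained = atomic_load_explicit(&q->sections_drained,
                                              memory_order_acquire);
        q->enq_boundary_checks++;
        if (section - drained >= NUM_SECTIONS)
            return false; /* dequeue thread still uses the section we want */
    }
    q->buffer[q->enq_count % NUM_SLOTS] = value;
    q->enq_count++;
    if (q->enq_count % SECTION_SIZE == 0) /* section complete: hand it over */
        atomic_store_explicit(&q->sections_filled,
                              q->enq_count / SECTION_SIZE,
                              memory_order_release);
    return true;
}

bool msq_try_dequeue(msq *q, long *out) {
    if (q->deq_count % SECTION_SIZE == 0) {
        size_t section = q->deq_count / SECTION_SIZE;
        size_t filled = atomic_load_explicit(&q->sections_filled,
                                             memory_order_acquire);
        if (section >= filled)
            return false; /* enqueue thread has not finished this section */
    }
    *out = q->buffer[q->deq_count % NUM_SLOTS];
    q->deq_count++;
    if (q->deq_count % SECTION_SIZE == 0) /* section drained: release it */
        atomic_store_explicit(&q->sections_drained,
                              q->deq_count / SECTION_SIZE,
                              memory_order_release);
    return true;
}
```

With these sizes, filling both sections takes eight enqueues but only two boundary checks, whereas a Lamport-style queue would read the other thread's pointer eight times. A limitation of this sketch is that elements become visible to the dequeue thread only once their whole section has been filled.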

  20. MSQ Control-Flow Graph and Internals (slide 11 of 30)
      [Figure: control-flow graphs of the enqueue function (basic blocks 1 to 6) and the dequeue function (basic blocks 1 to 5)]

  27. MSQ Control-Flow Graph and Internals (slide 11 of 30)
      [Figure: control-flow graph of the enqueue function, basic blocks 1 to 6]
      Annotations on the graph, in the order they appear:
      • enqueue
      • synchronisation code
      • checks if next section is free
      • spin loop
      • update local variables
      • update shared variable
      • join basic-block
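
One plausible reading of these annotations is sketched below as a blocking enqueue function whose comments reuse the slide's labels in order. The mapping of labels to code regions is an assumption, since the transcript does not record which basic block each label is attached to. The type `cfg_queue`, its fields and the constants are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define CFG_SECTIONS 2     /* assumed: two sections, as on the MSQ slides */
#define CFG_SECTION_SIZE 4 /* assumed; kept tiny for illustration */
#define CFG_SLOTS (CFG_SECTIONS * CFG_SECTION_SIZE)

typedef struct {
    long slots[CFG_SLOTS];
    size_t enq_index;                /* local: next slot to write */
    size_t enq_section;              /* local: count of sections entered so far */
    _Atomic size_t sections_filled;  /* shared: written by the enqueue thread */
    _Atomic size_t sections_drained; /* shared: written by the dequeue thread */
} cfg_queue;

void cfg_init(cfg_queue *q) {
    assert(q != NULL);
    q->enq_index = 0;
    q->enq_section = 0;
    atomic_init(&q->sections_filled, 0);
    atomic_init(&q->sections_drained, 0);
}

void cfg_enqueue(cfg_queue *q, long value) {
    /* enqueue: the common path touches only thread-private state */
    q->slots[q->enq_index] = value;
    q->enq_index++;

    if (q->enq_index % CFG_SECTION_SIZE == 0) {
        /* synchronisation code: checks if next section is free */
        size_t next_section = q->enq_section + 1;

        /* spin loop: wait at the end of the section until the dequeue
           thread has left the section we are about to reuse */
        while (next_section - atomic_load_explicit(&q->sections_drained,
                                                   memory_order_acquire)
               >= CFG_SECTIONS) { ; }

        /* update local variables */
        q->enq_section = next_section;
        q->enq_index %= CFG_SLOTS;

        /* update shared variable: publish the completed section */
        atomic_store_explicit(&q->sections_filled, next_section,
                              memory_order_release);
    }
    /* join basic-block: both paths meet here and the function returns */
}
```

In this reading the slow path runs once per section, so the branch around it is the only extra work on most calls.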
