
Parsing JSON Really Quickly: Lessons Learned



  1. Parsing JSON Really Quickly: Lessons Learned. Daniel Lemire (blog: https://lemire.me, Twitter: @lemire, GitHub: https://github.com/lemire/), professor (Computer Science) at Université du Québec (TÉLUQ), Montreal.

  2. How fast can you read a large file? Are you limited by your disk, or are you limited by your CPU?

  3. An iMac disk: 2.2 GB/s. Faster SSDs (e.g., 5 GB/s) are available.

  4. Reading text lines (CPU only): ~0.6 GB/s on a 3.4 GHz Skylake, in Java.

```java
import java.io.BufferedReader;
import java.io.StringReader;

class LineReader { // hypothetical wrapper class so the methods compile
    long volume = 0; // total number of characters read
    void parseLine(String s) { volume += s.length(); }
    void readString(StringReader data) {
        BufferedReader bf = new BufferedReader(data);
        bf.lines().forEach(s -> parseLine(s));
    }
}
```

Source available. Improved by JDK-8229022.

  5. Reading text lines (CPU only): ~1.5 GB/s on a 3.4 GHz Skylake, in C++ (GNU GCC 8.3).

```cpp
#include <cstddef>
#include <sstream>
#include <string>

size_t sum_line_lengths(char *data, size_t length) {
    std::stringstream is;
    // Use `data` as the stream buffer without copying. The effect of pubsetbuf
    // on a stringbuf is implementation-defined; GCC's libstdc++ honors it.
    is.rdbuf()->pubsetbuf(data, length);
    std::string line;
    size_t sumofalllinelengths{0};
    while (std::getline(is, line)) {
        sumofalllinelengths += line.size();
    }
    return sumofalllinelengths;
}
```

Source available.

  6. [Figure, with source credit]

  7. JSON: specified by Douglas Crockford; RFC 7159 by Tim Bray in 2014. A ubiquitous format to exchange data:

```json
{"Image": {"Width": 800, "Height": 600,
           "Title": "View from 15th Floor",
           "Thumbnail": {"Url": "http://www.example.com/81989943", "Height": 125, "Width": 100}}}
```

  8. "Our backend spends half its time serializing and deserializing JSON"

  9. JSON parsing: read all of the content; check that it is valid JSON; check the Unicode encoding; parse numbers; build the DOM (document object model). Harder than parsing lines?

  10. Jackson JSON speed (Java): twitter.json at 0.35 GB/s on a 3.4 GHz Skylake. Source code available.

| Method | Speed |
| --- | --- |
| Jackson (Java) | 0.35 GB/s |
| readLines (C++) | 1.5 GB/s |
| disk | 2.2 GB/s |

  11. RapidJSON speed (C++): twitter.json at 0.650 GB/s on a 3.4 GHz Skylake.

| Method | Speed |
| --- | --- |
| RapidJSON (C++) | 0.65 GB/s |
| Jackson (Java) | 0.35 GB/s |
| readLines (C++) | 1.5 GB/s |
| disk | 2.2 GB/s |

  12. simdjson speed (C++): twitter.json at 2.4 GB/s on a 3.4 GHz Skylake.

| Method | Speed |
| --- | --- |
| simdjson (C++) | 2.4 GB/s |
| RapidJSON (C++) | 0.65 GB/s |
| Jackson (Java) | 0.35 GB/s |
| readLines (C++) | 1.5 GB/s |
| disk | 2.2 GB/s |

  13. 2.4 GB/s on a 3.4 GHz (+turbo) processor is ~1.5 cycles per input byte.
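The conversion is the clock rate divided by the throughput. At the nominal frequency:

$$
\frac{3.4 \times 10^{9}\ \text{cycles/s}}{2.4 \times 10^{9}\ \text{bytes/s}} \approx 1.42\ \text{cycles/byte}
$$

Turbo boost runs the core above 3.4 GHz, so the same throughput costs somewhat more cycles per byte, which is why the slide rounds to ~1.5.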

  14. Trick #1: avoid hard-to-predict branches.

  15. Write random numbers to an array:

```cpp
while (howmany != 0) {
    out[index] = random();
    index += 1;
    howmany--;
}
```

e.g., ~3 cycles per iteration.

  16. Write only odd random numbers:

```cpp
while (howmany != 0) {
    val = random();
    if ((val & 1) == 1) { // <=== new
        out[index] = val;
        index += 1;
    }
    howmany--;
}
```

  17. From 3 cycles to 15 cycles per value!

  18. Go branchless!

```cpp
while (howmany != 0) {
    val = random();
    out[index] = val;
    index += (val & 1);
    howmany--;
}
```

Back to under 4 cycles! Details and code available.
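A self-contained C++ sketch of the two loops from slides 16 and 18, so the branchy and branchless versions can be compared side by side. As an assumption, the talk's `random()` is replaced by a small xorshift64 generator for reproducibility; the names `XorShift64`, `write_odd_branchy` and `write_odd_branchless` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for random(): classic xorshift64 (shifts 13, 7, 17).
struct XorShift64 {
    uint64_t state;
    explicit XorShift64(uint64_t seed) : state(seed ? seed : 1) {}
    uint64_t next() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }
};

// Branchy version: the `if` depends on a random bit, so the branch
// predictor guesses wrong about half the time.
size_t write_odd_branchy(XorShift64 &rng, uint64_t *out, size_t howmany) {
    size_t index = 0;
    while (howmany != 0) {
        uint64_t val = rng.next();
        if ((val & 1) == 1) {
            out[index] = val;
            index += 1;
        }
        howmany--;
    }
    return index; // number of odd values written
}

// Branchless version: always store, then advance the index by the low bit.
// Every iteration writes, so `out` needs room for `howmany` values.
size_t write_odd_branchless(XorShift64 &rng, uint64_t *out, size_t howmany) {
    size_t index = 0;
    while (howmany != 0) {
        uint64_t val = rng.next();
        out[index] = val;
        index += (val & 1);
        howmany--;
    }
    return index; // number of odd values kept
}
```

With the same seed, both functions keep the same odd values in the same order; the branchless one trades a few wasted stores for the absence of mispredictions.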

  19. What if I keep running the same benchmark? (Same pseudo-random integers from run to run.)

  20. Trick #2: use wide "words". Don't process byte by byte.
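A minimal sketch of the wide-word idea without any SIMD instructions: check that a buffer is pure ASCII by OR-ing 8 bytes at a time into a 64-bit word and testing the top bit of every byte once at the end. The function name `is_ascii_swar` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Returns true if every byte of data[0..len) is ASCII (< 0x80).
// Processes 8 bytes per step as one 64-bit word instead of byte by byte.
bool is_ascii_swar(const char *data, size_t len) {
    uint64_t accumulated = 0;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        std::memcpy(&word, data + i, 8); // safe unaligned 8-byte load
        accumulated |= word;             // no branch inside the loop
    }
    // Any byte >= 0x80 leaves its top bit set somewhere in `accumulated`.
    if ((accumulated & 0x8080808080808080ULL) != 0) return false;
    for (; i < len; i++) { // leftover bytes, one at a time
        if (static_cast<unsigned char>(data[i]) >= 0x80) return false;
    }
    return true;
}
```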

  21. When possible, use SIMD. Available on most commodity processors (ARM, x64). Originally added (Pentium) for multimedia (sound). Adds wider (128-bit, 256-bit, 512-bit) registers. Adds new fun instructions: do 32 table lookups at once.

  22. SIMD instruction sets:

| ISA | Where | Max. register width |
| --- | --- | --- |
| ARM NEON (AArch64) | mobile phones, tablets | 128-bit |
| SSE2 ... SSE4.2 | legacy x64 (Intel, AMD) | 128-bit |
| AVX, AVX2 | mainstream x64 (Intel, AMD) | 256-bit |
| AVX-512 | latest x64 (Intel) | 512-bit |

  23. "Intrinsic" functions (C, C++, Rust, ...) mapping to specific instructions on specific instruction sets; higher-level functions (Swift, C++, ...), e.g., the Java Vector API; autovectorization ("compiler magic") (Java, C, C++, ...); optimized functions (some in Java); assembly (e.g., in crypto).
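As an illustration of intrinsics, here is a sketch that compares 16 bytes against one character using three SSE2 intrinsics (`_mm_loadu_si128`, `_mm_cmpeq_epi8`, `_mm_movemask_epi8`) and returns one bit per byte, the kind of bitmask a SIMD parser can build for quotes or other structural characters. `match_mask16` is a hypothetical name; a scalar fallback is used where SSE2 is not available (e.g., on ARM).

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h> // SSE2 intrinsics (x64)
#endif

// Returns a 16-bit mask with bit i set when block[i] == target.
// Assumes `block` points to at least 16 readable bytes.
uint32_t match_mask16(const char *block, char target) {
#if defined(__SSE2__)
    __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i *>(block));
    __m128i eq = _mm_cmpeq_epi8(bytes, _mm_set1_epi8(target)); // 0xFF where equal
    return static_cast<uint32_t>(_mm_movemask_epi8(eq));       // top bit of each byte
#else
    // Portable fallback: same result, one byte at a time.
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++) {
        if (block[i] == target) mask |= (1u << i);
    }
    return mask;
#endif
}
```

For the 16 bytes of `{"a":1,"b":22}x` plus its terminating NUL, the quote mask is `0x28A` (positions 1, 3, 7 and 9).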

  24. Trick #3: avoid memory/object allocation.

  25. In simdjson, the DOM (document object model) is stored on one contiguous tape.
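To make the tape idea concrete, here is a toy sketch under assumed conventions: a single `std::vector<uint64_t>` where each entry keeps a type tag in the top 8 bits and a 56-bit payload, and an opening `[` stores the index of its matching `]` so a reader can skip the whole array. The layout, tag letters and function names are assumptions for illustration, not simdjson's exact format, and the parser only accepts flat arrays of non-negative integers.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout: [ 8-bit tag | 56-bit payload ] in one 64-bit word.
constexpr uint64_t kPayloadMask = (uint64_t(1) << 56) - 1;

uint64_t make_entry(char tag, uint64_t payload) {
    return (uint64_t(uint8_t(tag)) << 56) | (payload & kPayloadMask);
}
char tag_of(uint64_t entry) { return char(entry >> 56); }
uint64_t payload_of(uint64_t entry) { return entry & kPayloadMask; }

// Parses e.g. "[1, 22, 333]" into one contiguous tape: no per-node allocation.
// Returns false on anything this toy parser does not accept.
bool parse_int_array(const std::string &json, std::vector<uint64_t> &tape) {
    tape.clear();
    size_t i = 0;
    auto skip_ws = [&] {
        while (i < json.size() && std::isspace(static_cast<unsigned char>(json[i]))) i++;
    };
    skip_ws();
    if (i >= json.size() || json[i] != '[') return false;
    i++;
    size_t open = tape.size();
    tape.push_back(make_entry('[', 0)); // payload patched once ']' is known
    skip_ws();
    if (i < json.size() && json[i] == ']') {
        i++; // empty array
    } else {
        while (true) {
            skip_ws();
            if (i >= json.size() || !std::isdigit(static_cast<unsigned char>(json[i]))) return false;
            uint64_t value = 0;
            while (i < json.size() && std::isdigit(static_cast<unsigned char>(json[i]))) {
                value = value * 10 + uint64_t(json[i] - '0');
                if (value > kPayloadMask) return false; // must fit in 56 bits
                i++;
            }
            tape.push_back(make_entry('l', value));
            skip_ws();
            if (i < json.size() && json[i] == ',') { i++; continue; }
            if (i < json.size() && json[i] == ']') { i++; break; }
            return false;
        }
    }
    tape[open] = make_entry('[', tape.size()); // index of the closing entry
    tape.push_back(make_entry(']', open));     // closing entry points back
    skip_ws();
    return i == json.size();
}
```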

  26. Trick #4: measure the performance! Benchmark-driven development.

  27. Continuous integration performance tests: a performance regression is a bug that should be spotted early.

  28. Processor frequencies are not constant, especially on laptops. CPU cycles are not the same as elapsed time, and time can be noisier than CPU cycles.
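A minimal timing sketch with `std::chrono::steady_clock`: it repeats the work and keeps the fastest run, which reduces but does not remove the noise described above. It measures time, not CPU cycles; counting cycles needs platform-specific performance counters, which are not shown. `best_throughput_gbs` is a hypothetical helper.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Runs `work` `repetitions` times over `bytes` bytes of input and returns the
// best (fastest) throughput in GB/s. Returns 0 if the timer saw no elapsed time.
double best_throughput_gbs(const std::function<void()> &work, size_t bytes, int repetitions) {
    double best_seconds = 0;
    for (int r = 0; r < repetitions; r++) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double seconds = std::chrono::duration<double>(stop - start).count();
        if (r == 0 || seconds < best_seconds) best_seconds = seconds;
    }
    if (best_seconds <= 0) return 0; // timer too coarse for this workload
    return double(bytes) / best_seconds / 1e9;
}
```

In a continuous-integration setting, the returned figure could be compared against a stored baseline so that a slowdown fails the build like any other bug.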

  29. Specific examples

  30. Example 1. UTF-8: ASCII characters take 1 byte per code point; other code points take multiple bytes (2, 3 or 4). There are only about 1.1M valid code points.

  31. Validating UTF-8 with if/else/while:

```cpp
if (byte1 < 0x80) {
    return true; // ASCII
}
if (byte1 < 0xE0) {
    if (byte1 < 0xC2 || byte2 > 0xBF) {
        return false;
    }
} else if (byte1 < 0xF0) {
    // Three-byte form.
    if (byte2 > 0xBF
        || (byte1 == 0xE0 && byte2 < 0xA0)
        || (byte1 == 0xED && 0xA0 <= byte2)
        /* ... */) {
        /* ... */
    }
} else {
    // Four-byte form.
    /* ... */
}
```
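For reference, the elided branches can be completed from the well-formed UTF-8 byte sequences listed in the Unicode Standard (Table 3-7). This is a sketch of a complete branchy scalar validator, not the exact code from the talk; the name `validate_utf8_branchy` is hypothetical. Every `if` is a potential misprediction on random input, which is exactly what the SIMD approach avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if data[0..len) is well-formed UTF-8.
bool validate_utf8_branchy(const unsigned char *data, size_t len) {
    size_t i = 0;
    while (i < len) {
        unsigned char byte1 = data[i];
        if (byte1 < 0x80) { i += 1; continue; }              // ASCII
        if (byte1 < 0xC2) return false;                      // stray continuation or overlong lead
        if (byte1 < 0xE0) {                                  // two-byte form
            if (i + 1 >= len) return false;
            if (data[i + 1] < 0x80 || data[i + 1] > 0xBF) return false;
            i += 2;
        } else if (byte1 < 0xF0) {                           // three-byte form
            if (i + 2 >= len) return false;
            unsigned char byte2 = data[i + 1], byte3 = data[i + 2];
            if (byte2 < 0x80 || byte2 > 0xBF) return false;
            if (byte1 == 0xE0 && byte2 < 0xA0) return false; // overlong
            if (byte1 == 0xED && byte2 >= 0xA0) return false; // surrogates
            if (byte3 < 0x80 || byte3 > 0xBF) return false;
            i += 3;
        } else if (byte1 <= 0xF4) {                          // four-byte form
            if (i + 3 >= len) return false;
            unsigned char byte2 = data[i + 1], byte3 = data[i + 2], byte4 = data[i + 3];
            if (byte2 < 0x80 || byte2 > 0xBF) return false;
            if (byte1 == 0xF0 && byte2 < 0x90) return false; // overlong
            if (byte1 == 0xF4 && byte2 > 0x8F) return false; // above U+10FFFF
            if (byte3 < 0x80 || byte3 > 0xBF) return false;
            if (byte4 < 0x80 || byte4 > 0xBF) return false;
            i += 4;
        } else {
            return false;                                    // 0xF5..0xFF never appear
        }
    }
    return true;
}
```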

  32. Using SIMD: load 32-byte registers; use ~20 instructions; no branch, no branch misprediction.

  33. Example: verify that all byte values are no larger than 244. Saturated subtraction clamps at zero, so x - 244 is non-zero if and only if x > 244.

```cpp
_mm256_subs_epu8(current_bytes, _mm256_set1_epi8(static_cast<char>(244)));
```

One instruction checks 32 bytes at once!
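The same saturated-subtraction check, sketched with 128-bit SSE2 (16 bytes per instruction) because SSE2 is present on every x64 processor; with AVX2, `_mm256_subs_epu8` handles 32 bytes per step as on the slide. The function name `all_bytes_at_most_244` is hypothetical, and a scalar loop handles the tail (and the whole input when SSE2 is unavailable).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Returns true if every byte of data[0..len) is <= 244 (0xF4), the largest
// byte value that can appear in valid UTF-8.
bool all_bytes_at_most_244(const uint8_t *data, size_t len) {
    size_t i = 0;
#if defined(__SSE2__)
    const __m128i limit = _mm_set1_epi8(static_cast<char>(244));
    __m128i any_excess = _mm_setzero_si128();
    for (; i + 16 <= len; i += 16) {
        __m128i bytes = _mm_loadu_si128(reinterpret_cast<const __m128i *>(data + i));
        // Each lane becomes 0 if byte <= 244, else byte - 244; OR keeps any excess.
        any_excess = _mm_or_si128(any_excess, _mm_subs_epu8(bytes, limit));
    }
    // All 16 lanes zero <=> the "equals zero" mask has all 16 bits set.
    __m128i is_zero = _mm_cmpeq_epi8(any_excess, _mm_setzero_si128());
    if (_mm_movemask_epi8(is_zero) != 0xFFFF) return false;
#endif
    for (; i < len; i++) {
        if (data[i] > 244) return false; // scalar tail
    }
    return true;
}
```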

  34. Processing random UTF-8:

| Method | Cycles/byte |
| --- | --- |
| branching | 11 |
| simdjson | 0.5 |

20x faster! Source code available.

  35. Example 2. Classifying characters: comma (0x2c) ','; colon (0x3a) ':'; brackets (0x5b, 0x5d, 0x7b, 0x7d) '[', ']', '{', '}'; white space (0x09, 0x0a, 0x0d, 0x20); others. Classify 16, 32 or 64 characters at once!

  36. Divide values into two 'nibbles': 0x2c is 2 (high nibble) and c (low nibble). There are 16 possible low nibbles and 16 possible high nibbles.

  37. ARM NEON and x64 processors have instructions to look up 16-byte tables in a vectorized manner (16 values at a time): pshufb (x64), tbl (ARM).

  38. Start with an array of 4-bit values [1, 1, 0, 2, 0, 5, 10, 15, 7, 8, 13, 9, 0, 13, 5, 1]. Create a lookup table [200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215] (0 → 200, 1 → 201, 2 → 202, ...). Result: [201, 201, 200, 202, 200, 205, 210, 215, 207, 208, 213, 209, 200, 213, 205, 201].
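A sketch of this 16-lane lookup, using `_mm_shuffle_epi8` (the intrinsic for pshufb, SSSE3) when the compiler targets it and a portable loop otherwise. `Bytes16` and `lookup16` are hypothetical names. pshufb returns 0 in any lane whose index byte has its top bit set; the fallback only covers indices 0 to 15, which is all the example needs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#if defined(__SSSE3__)
#include <tmmintrin.h> // _mm_shuffle_epi8 (pshufb)
#endif

using Bytes16 = std::array<uint8_t, 16>;

// result[i] = table[indices[i]] for all 16 lanes; assumes indices are 0..15.
Bytes16 lookup16(const Bytes16 &table, const Bytes16 &indices) {
    Bytes16 result{};
#if defined(__SSSE3__)
    __m128i t = _mm_loadu_si128(reinterpret_cast<const __m128i *>(table.data()));
    __m128i idx = _mm_loadu_si128(reinterpret_cast<const __m128i *>(indices.data()));
    _mm_storeu_si128(reinterpret_cast<__m128i *>(result.data()), _mm_shuffle_epi8(t, idx));
#else
    for (int i = 0; i < 16; i++) result[i] = table[indices[i] & 0x0F];
#endif
    return result;
}
```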

  39. Find two tables H1 and H2 such that the bitwise AND of the lookups classifies the characters: H1(low(c)) & H2(high(c)). Comma (0x2c): 1; colon (0x3a): 2; brackets (0x5b, 0x5d, 0x7b, 0x7d): 4; most white space (0x09, 0x0a, 0x0d): 8; white space (0x20): 16; others: 0.
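One possible pair of tables for these classes, worked out by hand: H1 (indexed by the low nibble) ORs together the classes of the target characters that have that low nibble, and H2 does the same for the high nibble. For this particular character set no unwanted bits overlap, so the AND gives exactly the class for all 256 byte values. These values are derived here and may differ from the tables simdjson actually uses; the scalar `classify` (a hypothetical name) does per byte what two pshufb/tbl lookups and one AND do for 16 bytes at once.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Classes: 1 = comma, 2 = colon, 4 = brackets, 8 = tab/LF/CR, 16 = space, 0 = other.
// Low nibble: 0 -> space(16); 9 -> tab(8); a -> colon|LF (2|8); b -> '[' '{' (4);
//             c -> comma(1); d -> ']' '}' | CR (4|8).
constexpr std::array<uint8_t, 16> H1 = {16, 0, 0, 0, 0, 0, 0, 0,
                                        0, 8, 10, 4, 1, 12, 0, 0};
// High nibble: 0 -> tab/LF/CR (8); 2 -> comma|space (1|16); 3 -> colon (2);
//              5 -> '[' ']' (4); 7 -> '{' '}' (4).
constexpr std::array<uint8_t, 16> H2 = {8, 0, 17, 2, 0, 4, 0, 4,
                                        0, 0, 0, 0, 0, 0, 0, 0};

uint8_t classify(uint8_t c) {
    return H1[c & 0x0F] & H2[c >> 4];
}
```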
