VECTOR FUNCTIONAL PROGRAMMING A tour of the Q programming language - - PowerPoint PPT Presentation



slide-1
SLIDE 1

DATA ANALYSIS WITH

VECTOR FUNCTIONAL PROGRAMMING

A tour of the Q programming language

slide-2
SLIDE 2

HISTORY OF VECTOR LANGUAGES

➤ Vectors (arrays), not scalars, are the principal data type
➤ Not a new idea (APL, 1965)
➤ Ok… maybe new compared to functional programming (λ-calculus, 1930s)
➤ Ken Iverson’s Iverson Notation
➤ Notation as a tool of thought
➤ Notation for people first, computers later
➤ Influenced: Mathematica, Matlab, R, Julia
➤ Descendants: I.N. → APL, J, A+, K, Q

slide-3
SLIDE 3

Q PRIMER

The basic concepts

slide-4
SLIDE 4

FUNCTION APPLICATION

➤ Monadic functions have a word name and take their argument on the right

    abs -1 1
    til 10
    0 1 2 3 4 5 6 7 8 9

➤ Dyadic verbs appear between the arguments

    1 + 2
    3
    9 mod 3

➤ Function application is a verb

    abs @ -1 1
    (-) . 1 2
    -1

slide-5
SLIDE 5

ATOMIC FUNCTIONS

➤ Primitive functions (and verbs) are atomic (apply to atoms)
➤ Evaluation is always right-to-left
➤ Typically read top-down (left-to-right)

    5 * 10 + til 5
    50 55 60 65 70
    -1 * 0 1 2 3 4
    0 -1 -2 -3 -4
    5 * (1; 2 3; (4; 5 6); 7 8; 9)
    (5; 10 15; (20; 25 30); 35 40; 45)
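Right-to-left evaluation also means there is no operator precedence to remember. A minimal sketch with made-up numbers (not from the slides) shows this rule next to atomic application:

```q
/ No operator precedence: everything to the right of a verb is evaluated first
2*3+4        / 14, not 10
(2*3)+4      / 10: parentheses force the product first
/ Atomic verbs pair an atom with every item, or two lists item by item
10+1 2 3     / 11 12 13
1 2 3*4 5 6  / 4 10 18
```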

slide-6
SLIDE 6

LIST VERBS

➤ List primitives (we have them too, just use fewer characters):

    take (#)
    2#til 10
    0 1
    -2#til 10
    8 9

    join (,)
    (til 4) , til 4
    0 1 2 3 0 1 2 3

    split (_)
    0 3 6 _ til 9
    0 1 2
    3 4 5
    6 7 8

slide-7
SLIDE 7

MAPPING A LIST - FP 101

    0 3 6 _ til 9
    0 1 2
    3 4 5
    6 7 8
    count each 0 3 6 _ til 9
    3 3 3

    3#0
    0 0 0
    3 3 3#'0 1 2
    0 0 0
    1 1 1
    2 2 2

➤ If dyadic, combine with an adverb (a pairing operator)
➤ e.g., each-both ('): take (#) + each-both (') = take-each-both (#')
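As a small sketch of the difference between each (for monadic functions) and each-both (for dyadic verbs), using made-up lists that are not on the slides:

```q
/ each applies a monadic function to every item of a list
count each (1 2 3;4 5;enlist 6)   / 3 2 1
/ each-both (') pairs items from the left and right arguments
2 3#'10 20                        / (10 10;20 20 20)
1 2 3,'4 5 6                      / (1 4;2 5;3 6)
```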

But Wait! There’s More!

slide-8
SLIDE 8

ADVERBS

noun verbadverb noun

3 3 3 #' 0 1 2

slide-9
SLIDE 9

FOLD AND SCAN ARE ADVERBS … MORE FP 101

➤ Fold (/) is an adverb; we call it over

    0 +/ til 5
    10
    A plus reduction over 0 1 2 3 4

➤ Scan (\) returns the incremental values of over (left-to-right)

    0 +\ til 5
    0 1 3 6 10
    Partial sums of 0 1 2 3 4
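Over and scan work with any dyadic verb, not just plus. A short sketch with illustrative values (assumed, not from the slides):

```q
(+/) 1 2 3 4    / 10: sum as a fold
(*/) 1 2 3 4    / 24: product as a fold
(*\) 1 2 3 4    / 1 2 6 24: running products
100 +\ 1 2 3    / 101 103 106: scan with an initial value on the left
```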

slide-10
SLIDE 10

FLEXIBLE MAPPING WITH ADVERBS

➤ Only 6 adverbs, but they come up all the time

    each-left (\:)
    (floor;ceiling) @\: 5.5
    5 6

    each-right (/:)
    max @/: 0 3 6 _ til 9
    2 5 8

    each-prior (':)
    0 -': til 5
    0 1 1 1 1

    compose: each-left-each-right (\:/:)
    (min;max) @\:/: 0 3 6 _ til 9
    0 2
    3 5
    6 8
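To see which side each-left and each-right iterate over, it can help to use a plain dyadic verb. A sketch with made-up arguments:

```q
1 2 3 +\: 10 20   / each-left:  (11 21;12 22;13 23), one row per left item
1 2 3 +/: 10 20   / each-right: (11 12 13;21 22 23), one row per right item
(-':) 1 4 9 16    / each-prior: 1 3 5 7, each item minus its predecessor
```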

slide-11
SLIDE 11

THINKING IN ARRAYS

Prime Numbers

slide-12
SLIDE 12

THINKING IN ARRAYS - NO STINKING LOOPS*

function isPrime (n) {
  if (n < 2) return false;
  var q = Math.floor(Math.sqrt(n));
  for (var i = 2; i <= q; i++) {
    if (n % i == 0) {
      return false;
    }
  }
  return true;
}

* Steve Apter, nsl.com

slide-13
SLIDE 13

THINKING IN ARRAYS

x mod y 1 .. 100

slide-14
SLIDE 14

THINKING IN ARRAYS

x mod y = 0

slide-15
SLIDE 15

THINKING IN ARRAYS

y = 1 y = x

slide-16
SLIDE 16

THINKING IN ARRAYS

primes
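The steps named on these slides (x mod y, compare with 0, keep the numbers divisible only by 1 and themselves) can be traced one expression at a time. This is a sketch on 1..10 rather than 1..100, and the intermediate names r, d and c are assumptions added for readability:

```q
n:1+til 10      / candidates 1..10 (the slides use 1..100)
r:n mod/: n     / row j holds every candidate mod n[j]
d:0=r           / 1b wherever n[j] divides the candidate
c:sum d         / sum down the rows: number of divisors of each candidate
n where 2=c     / exactly two divisors (1 and itself): 2 3 5 7
```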

slide-17
SLIDE 17

THINKING IN ARRAYS

slide-18
SLIDE 18

THE RESULT

➤ Extremely concise, 111 bytes
➤ 29 characters left for emojis when tweeting it!

    p : {n where 2=sum 0=n mod/: n:1+til x}
    rle : {(count;first)@\:/:(where not =':[x])_x}
    expand : {(),/(#).'x}

Only short programs have any hope of being correct

~ Arthur Whitney
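A short usage sketch for the three definitions, with sample inputs chosen here for illustration (the inputs and the name r are not from the slides):

```q
p:{n where 2=sum 0=n mod/: n:1+til x}
rle:{(count;first)@\:/:(where not =':[x])_x}
expand:{(),/(#).'x}

p 20                     / 2 3 5 7 11 13 17 19
r:rle 1 1 2 2 2 3        / (2 1;3 2;1 3): (count;value) pairs
1 1 2 2 2 3~expand r     / 1b: expand undoes rle
```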

slide-19
SLIDE 19

HOW CAN WE USE Q FOR DATA ANALYSIS?

➤ Q has dictionaries (associations) and tables (flipped dictionaries)
➤ Tables are first-class and columnar; operations on columns are fast and efficient
➤ It is actually the scripting language for kdb+
➤ Has an integrated SQL-like query language called q-sql

    select avg price by sym from trades where date > .z.d - 5

➤ Has really nice temporal types, temporal arithmetic, and temporal joins
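A minimal sketch of these ideas on toy data. The table contents below are invented for illustration; only the query itself comes from the slide:

```q
/ A table is a flipped dictionary of equal-length columns
t:flip `sym`price!(`a`b`a;1 2 3f)
t~([] sym:`a`b`a; price:1 2 3f)      / 1b: same table, literal syntax
/ Hypothetical trades table; dates count back from today (.z.d)
trades:([] date:.z.d-til 6; sym:6#`a`b; price:1 2 3 4 5 6f)
/ Temporal arithmetic in the where clause: keep only the last five days
select avg price by sym from trades where date>.z.d-5
```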

slide-20
SLIDE 20

Q FOR DATA ANALYSIS

slide-21
SLIDE 21

STEP 1. GET SOME DATA

// System commands start with \
\wget .../pantheon.tsv
\wget .../pageviews_2008-2013.tsv -O pageviews.tsv
// ETL in Q
people : ("iSiSSSSSffsissssiffiiff"; enlist "\t") 0: `:pantheon.tsv;
pageviews : ("iSSiSisssss",72#"i"; enlist "\t") 0: `:pageviews.tsv;

Monthly page visit information for people on Wikipedia
We have a short fat table, want a long skinny table…
Each month is a single column
File name
Tab separated
Column types

slide-22
SLIDE 22

STEP 2. CLEAN THE DATA!

// All of the months
months : "M"$ssr[;"-";"."] each string 11_cols pageviews;
// Create a new table of the months flattened
monthly : ungroup 2!([] id : pageviews`id; lang : pageviews`lang; month : (count pageviews)#enlist months; clicks : flip pageviews c:11_cols pageviews)
// Left-Join click information with person information
clickinfo : monthly lj `id`lang xkey people;

id  name            occupation lang
--------------------------------------
307 Abraham Lincoln POLITICIAN af
307 Abraham Lincoln POLITICIAN am
307 Abraham Lincoln POLITICIAN an
307 Abraham Lincoln POLITICIAN ang
307 Abraham Lincoln POLITICIAN ar
307 Abraham Lincoln POLITICIAN arz
…

id  lang month   clicks
-----------------------
307 af   2008.01 4
307 af   2008.02 5
307 af   2008.03 0
307 af   2008.04 5
307 af   2008.05 5
307 af   2008.06 1
…

Month values
Long skinny table
4 columns
Left join
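The wide-to-long step can be tried on a toy table. In this sketch the table wide, its columns m1, m2 and m3, and the name skinny are all invented for illustration, and the key (2!) used on the slide is left out for brevity:

```q
/ Hypothetical wide table: one row per (id;lang), one column per month
wide:([] id:307 307; lang:`af`am; m1:4 1; m2:5 0; m3:0 2)
months:2008.01 2008.02 2008.03m
/ Give every row the full list of months and its own list of clicks,
/ then let ungroup flatten the nested columns into one row per month
skinny:ungroup ([] id:wide`id; lang:wide`lang; month:(count wide)#enlist months; clicks:flip wide`m1`m2`m3)
count skinny   / 6: two rows times three months
```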

slide-23
SLIDE 23

STEP 3. ASK SOME QUESTIONS

select from clickinfo where occupation like "COMPUTER SCIENTIST"

slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27

STEP 4…CLEAN THE DATA… AGAIN…

file : {"List_of_Google_Doodles_in_",string `year$x};
wget : {system "wget https://en.wikipedia.org/wiki/",file x};
process : {
  values : (string `January`February`March`April`May`June`July`August`September`October`November`December)!til 12;
  doc : read0 hsym `$file x;
  pars : where doc like\: "<p>*";
  celebrated : `$first @/:/: "\"" vs/:/: (@).' flip (d; where@/: not (d : "title=\"" vs/: doc pars) like\:\: "<p>*");
  headings : {[doc;x] first pos where (doc pos : x + neg til 10) like\: "<h3>*"}[doc] each pars;
  months : x + values first @/: "_" vs/: first @' "\"" vs/: ("id=\"" vs/: doc headings)@'1;
  : raze each celebrated group months;
  };
years : 2010.01 2011.01 2012.01 2013.01m;
wget each years;
results : raze process each years;
doodles : ungroup 1!flip `month`name!(key;value)@\:results;

month   name
----------------------------
2010.01 Isaac Newton
2010.01 Django Reinhard
2010.01 Anton Chekhov
2010.02 2010 Winter Olympics
…

<p>On <b>Tuesday, July 6, 2010</b>, the birth of <a href="/wiki/Frida_Kahlo" title="Frida Kahlo">Frida Kahlo</a> was celebrated with a gold Google logo wrapped with vines, flowers, and a painting of herself in her painting styles.<sup id="cite_ref-18" class="reference"><a href="#cite_note-18">[18]</a></sup></p>

slide-28
SLIDE 28

PARALLELIZATION IN Q


slide-30
SLIDE 30

PARALLELIZATION IN Q

// file, wget and process are unchanged; only the results line differs
years : 2010.01 2011.01 2012.01 2013.01m;
wget each years;
results : raze process peach years;   // was: process each years
doodles : ungroup 1!flip `month`name!(key;value)@\:results;

slide-31
SLIDE 31

PARALLELIZATION IN Q


Done!
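The each-to-peach swap can be tried on any one-argument function. In this sketch, work is a hypothetical stand-in for process. One thing worth knowing: peach only runs in parallel when q is started with secondary threads (for example q -s 4); without them it behaves like each.

```q
/ work is a hypothetical stand-in for process: any one-argument function will do
work:{sum x*til 1000000}
a:work each 1 2 3 4
b:work peach 1 2 3 4
a~b    / 1b: same results, only the scheduling differs
```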

slide-32
SLIDE 32

STEP 5. ASK SOME MORE QUESTIONS!

// Annotate the doodled months in the main table
clickinfo: update doodle:(date,'name) in doodles from clickinfo;
// Get the average and median ratio between the max monthly clicks (with and without
// the doodled month) and the min monthly clicks, excluding 0-click months
(avg;med) @\: {
  exec (%) . (max clicks where doodle; max clicks where not doodle) - min clicks
    from flip x where not clicks = 0
  } each select clicks, doodle by name from clickinfo where name in doodles`name
58.34461 10.30895

Average: 58x
Median: 10x

name                 |
---------------------| --------
Winsor McCay         | 508.5705
Albert Szent-Györgyi | 404.2465
Nicolas Steno        | 360.9331
Gideon Sundback      | 340.303
Mary Leakey          | 337.1806
Dennis Gabor         | 274.4389
Grace Hopper         | 220.8074

slide-33
SLIDE 33

WHY SHOULD YOU CARE?

…and summary

slide-34
SLIDE 34

WHY SHOULD YOU CARE?

➤ High-level expressive notation
➤ Not just someone’s pet project
➤ Developed by Kx Systems (since 1993)
➤ Practical (dicts, tables, q-sql, temporals, etc…)
➤ Very fast
➤ memory is getting larger, vector operations getting faster (SIMD, SSE, AVX2, AVX512, …)
➤ …benchmarks available online
➤ It’s interesting, different, and will change how you think

slide-35
SLIDE 35

THANKS!

l:{(3=not[x]*n)or(or). 3 4=\:x*n:2{flip+':[x]+1_x,0b}/x}

➤ Some references:
➤ Two books:
➤ Q Tips - Nick Psaris
➤ Q for Mortals - Jeff Borror
➤ code.kx.com
➤ kx.com
➤ /software-download.php
➤ /community.php
➤ Notation as a Tool of Thought - K. Iverson’s Turing Award Paper

@timthornton6