 
r..eal: Integrative statistics with R
  Nicos Angelopoulos, Vítor Santos Costa, João Azevedo, Jan Wielemaker, Rui Camacho and Lodewyk Wessels
  n.angelopoulos@nki.nl
  Netherlands Cancer Institute, Amsterdam, Netherlands
  PADL 2013, Rome

overview
  availability
  design philosophy
  syntax
  likely application areas
  basic examples
  page-ranking Aleph
  search and visualisation of biochemical networks

what is r..eal
  interface to the R statistical software
  library that integrates the R language into LP systems
  language for incorporating functional statistics in LP

what is R
  + statistical programming language (S)
  + functional style
  + strong presence in niche areas
  + user contributed packages
  + single implementation
  + major satellite projects (Bioconductor)
  - unclear semantics
  - conglomerate of features

availability
  SWI-Prolog
    ?- pack_install(real).
    % binaries for i386-win32, x64-win64, i386-linux and x86_64-linux
    source compilation via package manager and manually
  Yap
    included in binary distributions
    latest source can be dropped in development sources

design
  minimality
    interactions through a small number of predicates
  R flavour
    it should feel as if we are writing R code
  Prolog flavour
    based on Prolog terms

access predicates
  R uses <- as one of its 2 assignment operators
  r..eal defines predicates <-/1 and <-/2, also as operators
    op(950,fx,<-)
    op(950,yfx,<-)

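The effect of the two operator declarations can be seen without R at all. The sketch below declares the same operators in plain Prolog and uses a hypothetical helper, arrow_parts/2 (not part of r..eal), to show which term each written form is read as.

```prolog
% Same priorities and types as on the slide; the parentheses around <-
% only keep the declarations portable across Prolog systems.
:- op(950, fx,  (<-)).
:- op(950, yfx, (<-)).

% arrow_parts(+Term, -Parts)
% Hypothetical helper for illustration only, not part of r..eal.
% A prefix use is a term of arity 1, an infix use a term of arity 2.
arrow_parts(<-(Rhs),      call(Rhs)).
arrow_parts(<-(Lhs, Rhs), assign(Lhs, Rhs)).
```

With this loaded, arrow_parts((x <- [1,2,3]), P) binds P to assign(x,[1,2,3]), and arrow_parts((<- plot(x)), P) binds P to call(plot(x)). This is why one operator name can serve both for assignment and for calling an R function whose result is not needed.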
r..eal interactions
  Prolog is the top-level
  start R as a shared OS library (.so, .dll, ...)
  C-interface to pass data-structures between R and Prolog
  operations
    copy data across
    apply R functions
    pass results back to Prolog

simple example
  pass some Prolog data to an R variable (x)
    ?- x <- [1,2,3].
    true.
  pass the result of an R function to a Prolog variable (Y)
    ?- Y <- mean(x).
    Y = 2.0.

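The two queries on this slide can also be placed in a file. This is a minimal sketch, assuming the real pack is installed and an R installation is available to it; the predicate name mean_demo/1 is made up for the illustration.

```prolog
:- use_module(library(real)).   % loads r..eal and its <- operators

% mean_demo(-Mean)
% Hypothetical wrapper, not part of r..eal.
mean_demo(Mean) :-
    x <- [1,2,3],        % copy the Prolog list to the R variable x
    Mean <- mean(x).     % evaluate mean(x) in R, pass the result back
```

Calling mean_demo(M) should give the same answer as the second query on the slide.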
call modes
  +Rvar   <- +PLvalue     x <- [1,2,3]
  -PLvar  <- +Rvar        X <- x
  -PLvar  <- +Rexpr       X <- mean(x)
  +Rexpr1 <- +Rexpr2      length(y) <- mean(x)

  PLvar     unbound variable
  PLvalue   (RHS) atomic or list
  Rexpr     non-list term structure
  Rvar      atomic (on the right-hand side: atomic known to R)

data traffic
  Prolog           R
  integer     <->  integer
  float       <->  double
  atom        <->  char
  char         ->  char
  true/false  <->  logical

R expressions syntax
  as..integer(c(1,2))           =>  as.integer(c(1,2))
  dev..off(.)                   =>  dev.off()
  a^[2]                         =>  a[2]
  a^[*,*,2]                     =>  a[,,2]
  a$val                         =>  a$val
  a@val                         =>  a@val
  source(+"String")             =>  source("String")
  source(+'Atom')               =>  source("Atom")
  'Expr', -'Expr', -"Expr"      =>  Expr

hidden variables
  ?- findall(I, between(1,50000,I), Is), time( A <- mean(Is) ).
  % 181 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 75597 Lips)
  Is = [1, 2, 3, 4, 5, 6, 7, 8, 9|...],
  A = 25000.5.

r..eal example
  cars <- [1,3,6,4,9].
  <- plot( cars ).
  <- plot( [1,3,6,4,9] ).
  [Figure: the plot of cars produced by the calls above]

r..eal example II
  <- set..seed(1),
  y <- rnorm(50),
  x <- rnorm(y),
  <- x11(width=5,height=3.5),
  <- plot(x,y),
  X <- x.
  X = [0.39810588036706807, -0.6120263932507712, ...
  [Figure: scatter plot of y against x]

logic programming for biology
  relational knowledge representation
  logical inference
  database integration
  interactive operation
  scripting
  selection as search
  R
    statistics
    visualisation
    user-contributed code culture

sources of biological knowledge
  deluge of data generated by high-throughput technologies
  PPI: protein-protein interactions
  STRING
    5,214,234 proteins
    224,346,017 interactions
    1133 organisms
  HPRD
    39,194 interactions
    Homo sapiens

metabolic TFs in yeast

adhesome library
  [Figure: network graph of the adhesome library; nodes are labelled with gene and protein names such as SRC, RAC1 and ITGB1]

(R)Cytoscape
  Cytoscape
    graph visualisation software
  RCytoscape
    bi-directional R interface to Cytoscape
  rcy
    r..eal-based routines for displaying Prolog graphs in Cytoscape

r..eal information
  http://bioinformatics.nki.nl/~nicos/sware/real
  also on git://www.swi-prolog.org/home/pl/git/packages/real.git

piece-meal prolog bioinformatics
  r..eal         Swi/Yap <-> R interface
  pubmed         access PubMed citation records
  proSQLite      Swi/Yap <-> SQLite interface
  rcy            graph visualisation
  depth search   depth limited reachability
  versus the more holistic
    blip: http://www.blipkit.org/

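To make "depth limited reachability" concrete, here is a minimal sketch in plain Prolog. It is not the code of the depth search library listed above; edge/2 is a toy graph and reachable_within/3 is a name invented for this illustration.

```prolog
% edge(From, To): hypothetical toy graph, a chain a -> b -> c -> d.
edge(a, b).
edge(b, c).
edge(c, d).

% reachable_within(+From, +Limit, -To)
% To can be reached from From by following at most Limit edges.
% The limit guarantees termination on cyclic graphs; a node that is
% reachable along several paths is returned once per path.
reachable_within(From, Limit, To) :-
    Limit > 0,
    edge(From, Next),
    Rest is Limit - 1,
    (   To = Next
    ;   reachable_within(Next, Rest, To)
    ).
```

With a limit of 2, a reaches b and c but not d, which lies three edges away.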
Aleph
  [Figure: "Histogram of r", Frequency against r]
  pagerank(File,nav(Name,Arity,Value)) :-
      parse(File,Graph),
      g <- graph(Graph),
      r <- page..rank(g),
      Scores <- r$vector,
      max_element(Scores, Name, Arity, Value).

bottom-line
  r..eal is an intuitive, efficient, tight integration to R
  Future work
    thread-wrapper
    port to more Prologs
    more applications