r..eal : Integrative statistics with R

SLIDE 1

r..eal : Integrative statistics with R

Nicos Angelopoulos, Vítor Santos Costa, João Azevedo, Jan Wielemaker, Rui Camacho and Lodewyk Wessels

n.angelopoulos@nki.nl

Netherlands Cancer Institute, Amsterdam, Netherlands

PADL 2013, Rome – p.1

SLIDE 2
overview

  • availability
  • design philosophy
  • syntax
  • likely application areas
  • basic examples
  • page-ranking Aleph
  • search and visualisation of biochemical networks

SLIDE 3

what is r..eal

  • interface to the R statistical software
  • library that integrates the R language into LP systems
  • language for incorporating functional statistics in LP

SLIDE 4

what is R

+ statistical programming language (S)
  functional style
  strong presence in niche areas
  user contributed packages
  single implementation
  major satellite projects (Bioconductor)

- unclear semantics
  conglomerate of features

SLIDE 5

availability

SWI-Prolog
  ?- pack_install(real).
  % binaries for i386-win32, x64-win64, i386-linux and x86_64-linux
  source compilation via package manager and manually

Yap
  included in binary distributions
  latest source can be dropped in development sources

SLIDE 6

design

minimality
  interactions through a small number of predicates
R flavour
  it should feel as if we are writing R code
Prolog flavour
  based on Prolog terms

SLIDE 7

access predicates

R uses <- as one of its 2 assignment operators
r..eal defines predicates <-/1 and <-/2, also as operators

  op(950,fx,<-)
  op(950,yfx,<-)
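The effect of the two operator declarations can be sketched in plain Prolog, without loading r..eal or R. The predicates describe/1 and demo/0 below are hypothetical helpers introduced only for illustration. The sketch assumes that the library makes equivalent declarations itself when loaded.

```prolog
% Operator declarations as listed on the slide (assumption: library(real)
% declares the same operators on loading, so this is only needed standalone).
:- op(950, fx, <-).
:- op(950, yfx, <-).

% describe(+Term): print the principal functor of a term (hypothetical helper).
describe(Term) :-
    functor(Term, Name, Arity),
    format("~q has functor ~q/~w~n", [Term, Name, Arity]).

% demo: both uses of <- are ordinary Prolog terms once the operators exist.
demo :-
    describe((x <- [1,2,3])),    % infix use: the term <-(x, [1,2,3])
    describe((<- plot(x))).      % prefix use: the term <-(plot(x))
```

Because both forms parse as plain terms, an interface predicate can inspect them like any other Prolog data before handing them to R.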

SLIDE 8

r..eal interactions

Prolog is the top-level
start R as a shared OS library (.so, .dll, ...)
C-interface to pass data-structures between R and Prolog

operations
  copy data across
  apply R functions
  pass results back to Prolog

SLIDE 9

simple example

pass some Prolog data to an R variable (x)
  ?- x <- [1,2,3].
  true.

pass the result of an R function to a Prolog variable (Y)
  ?- Y <- mean(x).
  Y = 2.0.

SLIDE 10

call modes

+Rvar    <- +PLvalue     x <- [1,2,3]
-PLvar   <- +Rvar        X <- x
-PLvar   <- +Rexpr       X <- mean(x)
+Rexpr1  <- +Rexpr2      length(y) <- mean(x)

PLvar     unbound variable
PLvalue   (RHS) atomic or list
Rexpr     non-list term structure
Rvar      atomic, known to R
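The call modes can be read as one short session. The following is a minimal sketch, assuming the real pack and an R installation are available. call_modes_demo/2 is a hypothetical wrapper, and the vector assigned to y is made up so that the last goal has something to act on. It has not been run here, so no output is shown.

```prolog
:- use_module(library(real)).   % assumes pack_install(real) succeeded and R is installed

% call_modes_demo(-X, -M): exercise the call modes listed on the slide.
call_modes_demo(X, M) :-
    x <- [1,2,3],             % Prolog list copied into the R variable x
    X <- x,                   % R variable x copied back into Prolog variable X
    M <- mean(x),             % R expression evaluated, result bound to M
    y <- [0,0,0,0],           % made-up data, only so that y exists in R
    length(y) <- mean(x).     % R expression on both sides: an assignment inside R
```

The direction of the data flow is decided by what stands on each side: an unbound Prolog variable on the left pulls a value out of R, anything else on the left is treated as an R-side target.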

SLIDE 11

data traffic

Prolog           R
integer     <->  integer
float       <->  double
atom        <->  char
char         ->  char
true/false  <->  logical

SLIDE 12

R expressions syntax

as..integer(c(1,2))        =>  as.integer(c(1,2))
devoff(.)                  =>  dev.off()
a^[2]                      =>  a[2]
a^[*,*,2]                  =>  a[,,2]
a$val                      =>  a$val
a@val                      =>  a@val
source(+"String")          =>  source("String")
source(+'Atom')            =>  source("Atom")
'Expr', -'Expr', -"Expr"   =>  Expr

SLIDE 13

hidden variables

?- findall(I, between(1,50000,I), Is),
   time( A <- mean(Is) ).
% 181 inferences, 0.002 CPU in 0.002 seconds (100% CPU, 75597 Lips)
Is = [1, 2, 3, 4, 5, 6, 7, 8, 9|...],
A = 25000.5.

SLIDE 14

r..eal example

[figure: plot of the cars vector; x-axis ticks 1 to 5]

cars <- [1,3,6,4,9].
<- plot( cars ).
<- plot( [1,3,6,4,9] ).

SLIDE 15

r..eal example II

[figure: scatter plot of y against x]

<- set..seed(1),
y <- rnorm(50),
x <- rnorm(y),
<- x11(width=5,height=3.5),
<- plot(x,y),
X <- x.
X = [0.39810588036706807,
     -0.6120263932507712,

SLIDE 16

logic programming for biology

  • relational knowledge representation
  • logical inference
  • database integration
  • interactive operation
  • scripting
  • selection as search

R
  • statistics
  • visualisation
  • user-contributed code culture

SLIDE 17

sources of biological knowledge

deluge of data generated due to high throughput technologies

PPI: protein-protein interactions

STRING
  5,214,234 proteins
  224,346,017 interactions
  1133 organisms
HPRD
  39,194 interactions
  homo sapiens

SLIDE 18

metabolic TFs in yeast

SLIDE 19

adhesome library

[figure: adhesome interaction network; nodes are labelled with gene and protein names such as EGFR, ITGB1, SRC, PTK2 and RHOA]

SLIDE 20

(R)Cytoscape

Cytoscape
  graph visualisation software
RCytoscape
  R bi-directional interface to Cytoscape
rcy
  r..eal based routines for displaying Prolog graphs in Cytoscape

SLIDE 21

r..eal information

http://bioinformatics.nki.nl/~nicos/sware/real

also on
git://www.swi-prolog.org/home/pl/git/packages/real.git

SLIDE 22

piece-meal prolog bioinformatics

r..eal         Swi/Yap <-> R interface
pubmed         access pubmed citation records
proSQLite      Swi/Yap <-> SQLite interface
rcy            graph visualisation
depth search   depth limited reachability

versus the more holistic blip: http://www.blipkit.org/
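Depth limited reachability, named in the list above, can be sketched in a few lines of plain Prolog. This is an illustration of the general technique only, not the code of the library mentioned on the slide. The edge/2 facts, reachable/3 and reachable_set/3 are hypothetical names, and the toy graph is made up.

```prolog
% Made-up toy graph; edge/2 is an assumed representation.
edge(a, b).
edge(b, c).
edge(c, d).
edge(a, e).

% reachable(+From, ?To, +Depth): To is reachable from From
% using at least one and at most Depth edges.
reachable(From, To, Depth) :-
    Depth > 0,
    edge(From, To).
reachable(From, To, Depth) :-
    Depth > 1,
    edge(From, Mid),
    Depth1 is Depth - 1,
    reachable(Mid, To, Depth1).

% reachable_set(+From, +Depth, -Nodes): sorted, duplicate-free list
% of the nodes found by reachable/3.
reachable_set(From, Depth, Nodes) :-
    findall(To, reachable(From, To, Depth), Tos),
    sort(Tos, Nodes).
```

With the toy graph, `?- reachable_set(a, 2, Nodes).` gives `Nodes = [b,c,e]`. The depth bound also guarantees termination on cyclic graphs, which matters for interaction networks.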

SLIDE 23

Aleph

[figure: histogram of r; x-axis r, y-axis Frequency]

pagerank(File,nav(Name,Arity,Value)) :-
    parse(File,Graph),
    g <- graph(Graph),
    r <- page..rank(g),
    Scores <- r$vector,
    max_element(Scores, Name, Arity, Value).

SLIDE 24

bottom-line

r..eal is an intuitive, efficient, tight integration to R

Future work
  • thread-wrapper
  • port to more Prologs
  • more applications
