Revisiting Parameter Estimation in Biological Networks: Influence of Symmetries


Revisiting Parameter Estimation in Biological Networks: Influence of Symmetries. Jithin K. Sreedharan (Purdue University), Krzysztof Turowski (Purdue), Wojciech Szpankowski (Purdue). BioKDD, August 5, 2019.


SLIDE 1

Jithin K. Sreedharan

Purdue University

Krzysztof Turowski (Purdue), Wojciech Szpankowski (Purdue)

BioKDD August 5, 2019

  • S. Cerevisiae

Revisiting Parameter Estimation in Biological Networks: Influence of Symmetries

SLIDE 2

The Problem: Fitting a model

[Figure: a dynamic graph $G$ on nodes 1 to 12, observed as a data stream or a fixed data set of interactions (X <--> Y, X <--> A, C <--> M). Parameter fitting takes the data and a model and returns estimated parameters.]

$\Pr(G_n \mid G_{n_0}; \theta)$, where $G_n$ is the observed graph, $G_{n_0}$ is the seed graph, and $\theta$ denotes the parameters of the model.

§ Data usually represent a single snapshot of a dynamically evolving graph.
§ Random graph models tailored to specific applications provide deep insights, unlike general learning models.
§ Examples: asymptotic behavior, clustering properties, properties of motifs (subgraphs or lower/higher-order structures), diffusion over the graph, etc.

$G_{\mathrm{obs}} := G_n$

$G_n, G_{n-1}, \ldots, G_{n_0}$

Jithin K. Sreedharan BioKDD'19

SLIDE 3

Why do we need to revisit the estimation methods?

Symmetries of the graph

§ Most existing parameter estimation techniques overlook the critical property of graph symmetry (known formally as graph automorphisms).
§ The estimated parameters give statistically insignificant results concerning the observed network.

Parameter estimation methods

§ Existing methods depend heavily on the steady-state assumption and on asymptotic properties of the graph model.
§ Many of these assumptions have been proven not to hold, or to hold only under strong conditions.

Goal-1: Take into account the number of automorphisms of the observed network to restrict the parameter search to a more meaningful range.
Goal-2: Use exact non-asymptotic relations.

SLIDE 4

Why do we need to revisit the estimation methods?

Maximum likelihood method

§ Direct computation of the likelihood of a dynamic graph model requires $O(n!)$ computations: given one snapshot of the graph, there are $(n - n_0)!$ ways to arrange the order of arrival of nodes.
§ Clever techniques based on importance sampling or expectation-maximization still have huge complexity.
§ For example, for the duplication-divergence graph model the complexity is $\Theta(n^3/\varepsilon^2)$ with a large hidden constant factor ($n$: number of nodes, $\varepsilon$: required resolution).

$$\hat{\theta} = \arg\max_{\theta} \Pr(G_n \mid G_{n_0}; \theta) = \arg\max_{\theta} \sum_{(G_{n_0+1}, \ldots, G_{n-1}, G_n) \in \mathcal{G}(G_{n_0}, G_n)} \; \prod_{k=n_0+1}^{n} \Pr(G_k \mid G_{k-1}; \theta)$$

where $\mathcal{G}(G_{n_0}, G_n)$ is the set of all sequences of graphs that start with $G_{n_0}$ and end at $G_n$.

Seed graph choice

§ Seed graphs play an important role in biological networks.
§ Previous solutions take seed graphs to be cliques.

Goal-3: Achieve $\Theta(n)$ complexity.
Goal-4: Form a seed graph with biological relevance.
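
To get a feel for why summing over all arrival orders is infeasible, the size of $(n - n_0)!$ can be computed on a log scale. This is a minimal sketch; the function name `log10_arrival_orders` is hypothetical, and the node counts in the usage line are the baker's yeast values from the datasets slide.

```python
import math


def log10_arrival_orders(n: int, n0: int) -> float:
    """Return log10 of (n - n0)!, the number of possible node arrival orders."""
    if n < n0:
        raise ValueError("n must be at least n0")
    # lgamma(x + 1) equals ln(x!), so the huge integer is never built.
    return math.lgamma(n - n0 + 1) / math.log(10)


if __name__ == "__main__":
    # Baker's yeast: 6,152 nodes in the observed graph, 548 in the seed graph.
    print(log10_arrival_orders(6152, 548))
```

The result is the number of decimal digits (minus one) of the count of orderings, which is what makes a linear-time alternative attractive.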

SLIDE 5

Duplication-Divergence model (vertex-copying model)

Start with seed graph $G_{n_0}$. At time step $k$:

§ Duplication: Select a node $u$ from $G_k$ uniformly at random. The new node $v$ copies all connections of $u$.
§ Divergence: Each of the newly made connections of $v$ is randomly deleted with probability $1 - p$. For every other node, create a connection with $v$ independently with probability $r/k$.

[Figure: example of a seed graph $G_{n_0}$ and a graph $G_n$ grown from it.]
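
The two steps above can be sketched as a small simulator. This is an illustrative sketch, not the authors' implementation: the names `complete_graph`, `dd_step`, and `dd_model` are hypothetical, nodes are assumed to be labeled 0, 1, 2, ... in arrival order, and the probability $r/k$ is capped at 1 for very small graphs.

```python
import random


def complete_graph(n0):
    """Adjacency sets of the clique on nodes 0..n0-1 (a toy seed graph)."""
    return {i: {j for j in range(n0) if j != i} for i in range(n0)}


def dd_step(adj, p, r, rng):
    """Add one node to `adj` in place using the duplication-divergence rule."""
    k = len(adj)              # current number of nodes
    u = rng.randrange(k)      # duplication: parent chosen uniformly at random
    v = k                     # label of the new node
    parent_nbrs = adj[u]
    new_nbrs = set()
    for w in range(k):
        # Divergence: a copied edge survives with probability p;
        # every other node is linked with probability r / k.
        prob = p if w in parent_nbrs else min(1.0, r / k)
        if rng.random() < prob:
            new_nbrs.add(w)
    adj[v] = new_nbrs
    for w in new_nbrs:
        adj[w].add(v)
    return adj


def dd_model(n, p, r, seed_adj, rng=None):
    """Grow a copy of the seed graph until it has n nodes."""
    rng = rng or random.Random()
    adj = {node: set(nbrs) for node, nbrs in seed_adj.items()}
    while len(adj) < n:
        dd_step(adj, p, r, rng)
    return adj


if __name__ == "__main__":
    g = dd_model(100, 0.4, 1.0, complete_graph(10), random.Random(7))
    print(len(g), sum(len(nbrs) for nbrs in g.values()) // 2)
```

Adjacency sets keep the membership test for "is $w$ a neighbor of $u$" constant-time, so one step costs time proportional to the current number of nodes.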

SLIDE 6

Datasets Used

Protein-protein interaction (PPI) networks of 7 species

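Original graph $G_{\mathrm{obs}}$ and seed graph $G_{n_0}$:

| Organism | Scientific name | $G_{\mathrm{obs}}$: # Nodes | $G_{\mathrm{obs}}$: # Edges | $G_{\mathrm{obs}}$: $\log \lvert\mathrm{Aut}(G)\rvert$ | $G_{n_0}$: # Nodes | $G_{n_0}$: # Edges |
|---|---|---|---|---|---|---|
| Baker’s yeast | Saccharomyces cerevisiae | 6,152 | 531,400 | 267 | 548 | 5,194 |
| Human | Homo sapiens | 17,295 | 296,637 | 3026 | 546 | 2,822 |
| Fruitfly | Drosophila melanogaster | 9,205 | 60,355 | 1026 | 416 | 1,210 |
| Fission yeast | Schizosaccharomyces pombe | 4,177 | 58,084 | 675 | 412 | 226 |
| Mouse-ear cress | Arabidopsis thaliana Columbia | 9,388 | 34,885 | 6696 | 613 | 41 |
| Mouse | Mus musculus | 6,849 | 18,380 | 7827 | 305 | 7 |
| Worm | Caenorhabditis elegans | 3,869 | 7,815 | 3348 | 185 | 15 |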

Selection of seed graph

§ The seed graph is taken as the graph induced in the PPI network by the oldest proteins, i.e., those with the largest phylogenetic age (taxon age).
§ The age of a protein is based on its family’s appearance on a species tree and is estimated via protein family databases and ancestral history reconstruction algorithms.

Protein family database: Princeton Protein Orthology Database (PPOD) along with OrthoMCL and PANTHER. Ancestral history reconstruction algorithm: asymmetric Wagner parsimony.

Data were collected from BioGRID. Self-interactions (self-loops), multiple interactions (multiple edges), and interspecies interactions of proteins were removed.

SLIDE 7

Influence of Parameters on Symmetries of the Model

§ Symmetries are neglected in most prior work.
§ Real-world PPI networks exhibit a large number of symmetries.
§ Erdős–Rényi and preferential attachment models are asymmetric with high probability.
§ Cross-checking with the number of automorphisms of the real-world network forms a null hypothesis test for the model under consideration.

[Figure: a small example graph on five labeled vertices.]

Symmetries of the graph (graph automorphism): An automorphism of $G$ is an adjacency-preserving permutation of the vertices of $G$ (i.e., a form of symmetry). The collection $\mathrm{Aut}(G)$ of automorphisms of $G$ is called the automorphism group of $G$.
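
The definition of an automorphism can be checked directly on tiny graphs by trying every vertex permutation. This brute-force sketch is for illustration only (it takes $n!$ steps and is unusable on PPI-sized networks, where dedicated tools are needed); the function name `count_automorphisms` is hypothetical.

```python
from itertools import permutations


def count_automorphisms(n, edges):
    """Count permutations of vertices 0..n-1 that map every edge to an edge."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for perm in permutations(range(n)):
        # A bijection on vertices that sends edges to edges also sends
        # non-edges to non-edges, so checking edges alone is enough.
        if all(frozenset((perm[a], perm[b])) in edge_set
               for a, b in (tuple(e) for e in edge_set)):
            count += 1
    return count


if __name__ == "__main__":
    path = [(0, 1), (1, 2)]
    star = [(0, 1), (0, 2), (0, 3)]
    print(count_automorphisms(3, path), count_automorphisms(4, star))
```

For example, a path on three vertices has only the identity and the end-swap, while a star with three leaves allows any reordering of its leaves.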

SLIDE 8

Influence of Parameters on Symmetries of the Model

[Figure: heat maps of $E[\log|\mathrm{Aut}(G_n)|]$ for graphs generated from the DD-model with seed graph $G_{n_0} = K_{20}$, for $n = 100$ and $n = 2000$. Axes: $p$ from 0 to 1 and $r$ from 0 to 5; color scale from $10^{-3}$ to $10^{5}$.]

For large ranges of $p$ and $r$, it is impossible to generate graphs with a large number of automorphisms.

Statistical test for significance of the number of symmetries with the estimated parameters

Let $G_n^{(1)}, \ldots, G_n^{(m)}$ be $m$ graphs generated from the DD-model$(n, \hat{p}, \hat{r}, G_{n_0})$ with the parameters estimated using any fitting method. Define

$$\hat{p}_u = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{\log|\mathrm{Aut}(G_n^{(i)})| \geq \log|\mathrm{Aut}(G_{\mathrm{obs}})|\}$$

$$\hat{p}_l = \frac{1}{m} \sum_{i=1}^{m} \mathbf{1}\{\log|\mathrm{Aut}(G_n^{(i)})| \leq \log|\mathrm{Aut}(G_{\mathrm{obs}})|\}$$

Then p-value $= 2 \min\{\hat{p}_u, \hat{p}_l\}$.

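
The test on this slide reduces to two counting passes over the simulated graphs. The sketch below assumes the values $\log|\mathrm{Aut}(G_n^{(i)})|$ have already been computed by some external tool and are passed in as plain numbers; the function name `symmetry_p_value` is hypothetical. It follows the formula as stated, so ties with the observed value can push the result above 1.

```python
def symmetry_p_value(sample_log_aut, observed_log_aut):
    """Two-sided empirical p-value: 2 * min(p_u, p_l)."""
    m = len(sample_log_aut)
    if m == 0:
        raise ValueError("need at least one simulated graph")
    # Fraction of simulated graphs at least as symmetric as the observed one.
    p_u = sum(x >= observed_log_aut for x in sample_log_aut) / m
    # Fraction of simulated graphs at most as symmetric as the observed one.
    p_l = sum(x <= observed_log_aut for x in sample_log_aut) / m
    return 2 * min(p_u, p_l)


if __name__ == "__main__":
    simulated = [10.2, 11.0, 9.7, 12.4]   # made-up illustrative values
    print(symmetry_p_value(simulated, 267.0))
```

A p-value near zero means the fitted model essentially never produces graphs whose symmetry count is comparable to that of the observed network.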
SLIDE 9

Why existing parameter estimation methods fail in practice (contd.)?

Mismatch in the number of symmetries and graph statistics with the mean-field approach

Mean-field relations: $\gamma = 1 + \frac{1}{p} - p^{\gamma-2}$ and $r = \left(\frac{1}{2} - p\right) D(G_{\mathrm{obs}})$, for $p < \frac{1}{2}$, where $\gamma$ is the power-law exponent and $D(G_{\mathrm{obs}})$ is the average degree.

Table: estimated parameters of the DD-model and average number of symmetries using the mean-field approach.

| Organism | $\hat{p}$ | $\hat{r}$ | $E[\log \lvert\mathrm{Aut}(G_n)\rvert]$ | p-value |
|---|---|---|---|---|
| Baker’s yeast | 0.28 | 38.25 | | |
| Human | 0.43 | 2.39 | 10.81 | |
| Fruitfly | 0.44 | 0.75 | 3771.99 | |
| Fission yeast | 0.46 | 1.02 | 897.48 | |
| Mouse-ear cress | 0.44 | 0.43 | 18596.72 | |
| Mouse | 0.48 | 0.12 | 34961.69 | |
| Worm | 0.47 | 0.14 | 15700.26 | |

References

  • R. Pastor-Satorras, Eric Smith, and Ricard V. Solé. Journal of Theoretical Biology, 2003.
  • Fereydoun Hormozdiari, Petra Berenbrink, Nataša Pržulj, and Süleyman Cenk Sahinalp. PLoS Computational Biology, 2007.
  • Mingyu Shao, Yi Yang, Jihong Guan, and Shuigeng Zhou. Briefings in Bioinformatics, 2013.
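
The mean-field relations on this slide can be inverted numerically: given a fitted power-law exponent $\gamma$, solve for $p$, then read off $r$ from the average degree. The sketch below uses plain bisection and assumes $\gamma > 2$, for which the left-hand side minus $\gamma$ is strictly decreasing in $p$ on $(0, 1)$. The function names are hypothetical, and the usage line plugs in the baker's yeast exponent (4.55) and edge and node counts reported elsewhere in the deck.

```python
def solve_p_from_gamma(gamma, tol=1e-10):
    """Solve gamma = 1 + 1/p - p**(gamma - 2) for p in (0, 1) by bisection."""
    if gamma <= 2:
        raise ValueError("this sketch assumes gamma > 2")

    def f(p):
        return 1.0 + 1.0 / p - p ** (gamma - 2.0) - gamma

    lo, hi = 1e-9, 1.0          # f(lo) > 0 and f(hi) = 1 - gamma < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid            # root lies to the right of mid
        else:
            hi = mid            # root lies to the left of mid
    return 0.5 * (lo + hi)


def mean_field_r(p, avg_degree):
    """Return r = (1/2 - p) * D(G_obs); only defined for p < 1/2."""
    if p >= 0.5:
        raise ValueError("the mean-field relation requires p < 1/2")
    return (0.5 - p) * avg_degree


if __name__ == "__main__":
    p_hat = solve_p_from_gamma(4.55)
    avg_degree = 2 * 531400 / 6152      # 2 * edges / nodes
    print(p_hat, mean_field_r(p_hat, avg_degree))
```

The output can be compared against the first row of the table above as a sanity check on the reconstruction of the two relations.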

SLIDE 10

Why existing parameter estimation methods fail in practice (contd.)?

Cutoff neglects a huge percentage of the data

Table: estimated power-law exponent and required cutoff percentile with the mean-field approach.

| Organism | $\hat{\gamma}$ | Cutoff percentile |
|---|---|---|
| Baker’s yeast | 4.55 | 94.98 |
| Human | 2.85 | 92.33 |
| Fruitfly | 2.71 | 88.00 |
| Fission yeast | 2.43 | 88.31 |
| Mouse-ear cress | 2.68 | 93.89 |
| Mouse | 2.29 | 78.58 |
| Worm | 2.41 | 88.23 |

[Figure: complementary cumulative distribution function (CCDF) of baker’s yeast and power-law fitting, on log-log axes ($x$ from $10^0$ to $10^3$, CCDF($x$) from $10^{-3}$ to $10^0$); curves: original, with cutoff, power-law behavior.]

Asymptotic and steady-state assumption

§ There is no theoretical proof of convergence to a steady state.
§ Moreover, steady-state asymptotic results, even when achievable, do not give any bounds on the rate of convergence.
§ The approach assumes that the average degree of the network does not change during the whole evolution.
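
To make the "cutoff percentile" concrete, the sketch below computes an empirical CCDF and the share of observations discarded by a lower cutoff. It assumes the percentile is the percentage of values strictly below the cutoff `x_min`; the slides do not spell out the exact convention, and both function names are hypothetical.

```python
from collections import Counter


def empirical_ccdf(values):
    """Return (x, P(X >= x)) for each distinct value x, in increasing order."""
    n = len(values)
    counts = Counter(values)
    remaining = n                 # number of observations >= current x
    out = []
    for x in sorted(counts):
        out.append((x, remaining / n))
        remaining -= counts[x]
    return out


def cutoff_percentile(values, x_min):
    """Percentage of observations strictly below the cutoff x_min."""
    return 100.0 * sum(v < x_min for v in values) / len(values)


if __name__ == "__main__":
    degrees = [1, 1, 2, 3, 4]     # toy degree sequence
    print(empirical_ccdf(degrees))
    print(cutoff_percentile(degrees, 3))
```

A cutoff percentile near 90 therefore means the power-law fit describes only about a tenth of the nodes.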

SLIDE 11

Our method based on recurrence relations of graph statistics

A set of exact recurrence relations for basic graph statistics, relating their values at times $k$ and $k + 1$ of the graph evolution.

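Theorem 1. If $G_{n+1} \sim \text{DD-model}(n + 1, p, r, G_n)$, then

Mean degree:
\[
E[D(G_{n+1}) \mid G_n] = D(G_n)\left(1 + \frac{2p - 1}{n + 1} - \frac{2r}{n(n + 1)}\right) + \frac{2r}{n + 1}
\]

Mean squared degree:
\[
\begin{aligned}
E[D_2(G_{n+1}) \mid G_n] ={}& D_2(G_n)\left(1 + \frac{2p + p^2 - 1}{n + 1} - \frac{2r(1 + p)}{n(n + 1)} + \frac{r^2}{n^2(n + 1)}\right) \\
&+ D(G_n)\left(\frac{2p - p^2 + 2pr + 2r}{n + 1} - \frac{2r + 2r^2}{n(n + 1)} + \frac{r^2}{n^2(n + 1)}\right) \\
&+ \frac{2r^2 + 2r}{n + 1} - \frac{r^2}{n(n + 1)}
\end{aligned}
\]

No. of triangles:
\[
E[C_3(G_{n+1}) \mid G_n] = C_3(G_n)\left(1 + \frac{3p^2}{n} - \frac{6pr}{n^2} + \frac{3r^2}{n^3}\right) + D_2(G_n)\left(\frac{pr}{n} - \frac{r^2}{n^2}\right) + D(G_n)\,\frac{r^2}{2n}
\]

No. of wedges (paths of length 2):
\[
\begin{aligned}
E[S_2(G_{n+1}) \mid G_n] ={}& S_2(G_n)\left(1 + \frac{2p + p^2}{n} - \frac{2(p + 1)r}{n^2} + \frac{r^2}{n^3}\right) \\
&+ D(G_n)\left(pr + p + r - \frac{pr + r + r^2}{n} + \frac{r^2}{n^2}\right) + \frac{r^2}{2} - \frac{r^2}{2n}
\end{aligned}
\]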
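A minimal sketch of how the mean-degree recurrence of Theorem 1 could be iterated numerically. Because that recurrence is linear in D(G_n), taking expectations on both sides lets it be iterated from the seed graph to obtain E[D(G_n)]. The function names and the seed values (n0 = 10, D = 9, i.e. a K10 seed) are illustrative assumptions, not part of the slides.

```python
def mean_degree_step(d, n, p, r):
    """One step of the mean-degree recurrence: E[D(G_{n+1}) | G_n] given D(G_n) = d."""
    return d * (1 + (2 * p - 1) / (n + 1) - 2 * r / (n * (n + 1))) + 2 * r / (n + 1)


def expected_mean_degree(n, p, r, n0, d0):
    """Iterate the recurrence from a seed with n0 vertices and mean degree d0 up to n vertices."""
    d = d0
    for k in range(n0, n):
        d = mean_degree_step(d, k, p, r)
    return d


if __name__ == "__main__":
    # Seed K10 has mean degree 9; the parameter values below are only examples.
    for p in (0.2, 0.5, 0.8):
        print(p, expected_mean_degree(100, p, 1.5, n0=10, d0=9.0))
```

The same pattern would extend to the other three statistics, with the caveat that their recurrences are coupled to D(G_n) and D_2(G_n) and must be advanced together.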


slide-12
SLIDE 12

12

Our method based on recurrence relations of graph statistics (contd.)

[Figure: E[D(G_n)] versus n (up to n = 100), comparing exact values obtained via Theorem 1 with values obtained via experiments. Panels: (a) G_n ∼ DD-model(100, 0.2, 1.5, K10); (b) G_n ∼ DD-model(100, 0.5, 1.5, K10); (c) G_n ∼ DD-model(100, 0.8, 1.5, K10).]

§ Find the solution set from the recurrence relation of each graph property.
§ If the solution sets concur, a necessary condition for the presence of the duplication-divergence model has been satisfied.
§ Output the converging point as the fitted parameter set.
§ Computational complexity is $\Theta\left(\frac{n}{\varepsilon} \log\frac{1}{\varepsilon}\right)$.
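One way to picture the search for a concurrence: if each graph statistic yields a curve of candidate p values over a common grid of r values, the fitted point is where two curves cross. The sketch below locates such crossings by linear interpolation. The helper name `find_crossings` and the toy curves are hypothetical; the slides do not specify how crossings are located.

```python
def find_crossings(r_values, p_curve_a, p_curve_b):
    """Return (r, p) points where two sampled curves p_a(r) and p_b(r) intersect.

    All three sequences are assumed to have the same length, with r_values sorted.
    Crossings between grid points are estimated by linear interpolation.
    """
    diffs = [a - b for a, b in zip(p_curve_a, p_curve_b)]
    crossings = []
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0.0:
            # The curves touch exactly at a grid point.
            crossings.append((r_values[i], p_curve_a[i]))
        elif d0 * d1 < 0.0:
            # Sign change: interpolate the position of the zero of the difference.
            t = d0 / (d0 - d1)
            r = r_values[i] + t * (r_values[i + 1] - r_values[i])
            p = p_curve_a[i] + t * (p_curve_a[i + 1] - p_curve_a[i])
            crossings.append((r, p))
    if diffs and diffs[-1] == 0.0:
        crossings.append((r_values[-1], p_curve_a[-1]))
    return crossings


if __name__ == "__main__":
    # Toy curves for illustration only.
    print(find_crossings([0, 1, 2, 3], [0.0, 0.2, 0.4, 0.6], [0.5, 0.4, 0.3, 0.2]))
```

With three statistics, the check would be repeated for each pair of curves, and a common crossing region would indicate agreement.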
hZbvCSPeREXRO4FtH32aytVPeY2S7jq17gcylIzVEBc73lTXzMXdz0Z01LmpieYybBn6V6reUQotp+bkfhmn3nP+oH7G7XEhufwGfIqzq4U3btW1sEvRsPyGIzmjzDJVIUKHEwy/jRG/spYboHrwR/K+w8yxI9nBPfW4ROuqzj3EC7/95Kh4jPfGDu7lA3T9nqKyBZ/zY7WNprL/yNYHTzd3mp+ubX90/bG3dv6/wcX1KfqirpKXv5a3aVbTptyqKt+U7+rP9Sf63+t/7P+7/p/wvrGOY35WGV+Pjn/PyuB+OE=</latexit>


slide-13
SLIDE 13

13

Results on synthetic networks: Recurrence-Relation method

[Figure, Recurrence-Relation method: curves for Degree, Wedges, and Triangles, with axes p and r. Panels: G_n^(1) ∼ DD-model(n = 100, p = 0.1, r = 0.3, G_n0 = K20), with p from 0.00 to 0.12 and r from 0.0 to 4.8; G_n^(2) ∼ DD-model(n = 100, p = 0.99, r = 3.0, G_n0 = K20), with p from 0.936 to 0.984 and r from 3 to 21.]

[Figure: Log-likelihood.]

Log-likelihood function of MLE is nearly flat for large values of p, thus MLE returns less reliable estimates.

Columns 3 to 6: RECURRENCE-RELATION. Columns 7 to 10: MLE.

Model parameters | log |Aut(G_obs)| | $\hat{p}$ | $\hat{r}$ | E[log |Aut(G_n)|] | p-value | $\hat{p}$ | $\hat{r}$ | E[log |Aut(G_n)|] | p-value
p = 0.1, r = 0.3 | 81.963 | 0.09 | 0.3 | 81.974 | 0.980 | 0.1 | 0.3 | 78.794 | 0.820
p = 0.99, r = 3.0 | 16.178 | 0.99 | 2.5 | 16.588 | 0.980 | 0.95 | 0.3 | 0.368 |

slide-14
SLIDE 14

14

Results on protein-protein networks: Recurrence-Relation method

[Figure: one panel per organism (Baker’s yeast, Fruitfly, Fission yeast, Mouse-ear cress, Mouse, Human, Worm), each showing the Degree, Triangles, and Wedges curves with axes p and r.]


slide-15
SLIDE 15

15

Results on protein-protein networks (contd.)

Organism | $\hat{p}$ | $\hat{r}$ | E[log |Aut(G_n)|] | p-value
Baker’s yeast | 0.98 | 0.35 | 293.27 | 0.71
Human | 0.64 | 0.49 | 2998.81 | 0.51
Fruitfly | 0.53 | 0.92 | 1073.83 | 0.64
Fission yeast | 0.983 | 0.85 | 705.278 | 0.74
Mouse-ear cress | 0.98 | 0.49 | 6210.36 | 0.13
Mouse | 0.96 | 0.32 | 8067.56 | 0.67
Worm | 0.85 | 0.35 | 3352.91 | 0.48

Parameters of the real-world PPI networks estimated using the Recurrence-Relation method


slide-16
SLIDE 16

16

Conclusions

§ Fitted dynamic biological networks to a probabilistic graph model from a single snapshot of the evolution, with stress on a key characteristic of the networks that is often neglected in modeling: the number of automorphisms
§ Combined the number of automorphisms with a faster method of recurrence relations to narrow down the parameter search space
§ Much lower computational complexity
§ Tested on protein-protein interaction data of 7 species
§ Be extra careful when applying the mean-field approach without strong theoretical guarantees
§ Used up-to-date PPI data so that the fitted parameters in this paper can serve as a benchmark for future studies

Slides, paper, code, and data are available at cs.purdue.edu/homes/jithinks/

Thank You!


slide-17
SLIDE 17

17

Extra Slides

slide-18
SLIDE 18

18

[Figure: histogram with log |Aut(G)| from 200 to 500 on one axis and normalized frequency from 0.000 to 0.007 on the other.]

Figure 2: Normalized histogram of logarithm of number of automorphisms when G_n ∼ DD-model(500, 0.3, 0.4, K20).

slide-19
SLIDE 19

19

Organism | D(G_obs) | E[D(G_n)] | p-value | S_2(G_obs) | E[S_2(G_n)] | p-value | C_3(G_obs) | E[C_3(G_n)] | p-value
Baker’s yeast | 172.76 | 115.10 | | 220.35M | 45.33M | | 9.77M | 370.49K |
Human | 34.30 | 19.39 | | 52.25M | 7.02M | | 1.07M | 105K |
Fruitfly | 13.11 | 7.87 | | 2.94M | 1.45M | | 195.96K | 77.61K |
Fission yeast | 27.64 | 6.72 | | 7.42M | 215.84K | | 223.61K | 1.14K |
Mouse-ear cress | 7.39 | 2.23 | | 2.98M | 44.46K | | 23.34K | 23.27 |
Mouse | 5.35 | 0.82 | | 2.95M | 9.33K | | 10.22K | 0.79 |
Worm | 4.04 | 0.90 | | 346.13K | 5.32K | | 2.41K | 0.49 |

Table 4: Comparison of certain graph statistics of the observed graph and that of the synthetic data with parameters estimated via the mean-field approach.

slide-20
SLIDE 20

20

Why do existing parameter estimation methods fail in practice?

Seed graph choice

§ The seed graph is typically assumed to be the largest clique of the observed graph. Random vertices and edges are then gradually added to the network, preserving the average degree of the final network, until the network reaches a fixed size $n_0$.
§ This choice has no formal theoretical guarantees and no clear justification.

slide-21
SLIDE 21

21

Algorithm 1 Parameter estimation via recurrence relation of D(G_n).

function RECURRENCE-RELATION(n, r, G_n0, D(G_n), ε)
    D_min ← F_D(n, 0, r, G_n0), D_max ← F_D(n, 1, r, G_n0)
    if D_min > D(G_n) or D_max < D(G_n) then
        return “no suitable solution for p”
    p_min ← 0, p_max ← 1
    while p_max − p_min > ε do
        p' ← (p_min + p_max) / 2, D' ← F_D(n, p', r, G_n0)
        if D' < D(G_n) then p_min ← p' else p_max ← p'
    return p_min

We note here that for each graph property under consideration, D, S_2 or C_3, the estimation algorithm returns a curve
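A runnable sketch of the bisection in Algorithm 1, under stated assumptions: F_D is taken to be the mean-degree recurrence of Theorem 1 iterated from the seed graph, and the seed graph is summarized by its size `n0` and mean degree `d0`. The names `mean_degree_at` and `estimate_p`, and the use of `None` for the no-solution case, are illustrative choices.

```python
def mean_degree_at(n, p, r, n0, d0):
    """Assumed form of F_D: iterate the mean-degree recurrence from size n0 to size n."""
    d = d0
    for k in range(n0, n):
        d = d * (1 + (2 * p - 1) / (k + 1) - 2 * r / (k * (k + 1))) + 2 * r / (k + 1)
    return d


def estimate_p(n, r, n0, d0, d_obs, eps=1e-6):
    """Bisection over p in [0, 1] so that the predicted mean degree matches d_obs.

    Returns None when d_obs lies outside the range reachable for this r.
    """
    d_min = mean_degree_at(n, 0.0, r, n0, d0)
    d_max = mean_degree_at(n, 1.0, r, n0, d0)
    if d_min > d_obs or d_max < d_obs:
        return None
    p_min, p_max = 0.0, 1.0
    while p_max - p_min > eps:
        p_mid = (p_min + p_max) / 2
        if mean_degree_at(n, p_mid, r, n0, d0) < d_obs:
            p_min = p_mid
        else:
            p_max = p_mid
    return p_min


if __name__ == "__main__":
    # Round trip on synthetic numbers: generate a target with p = 0.3, then recover p.
    target = mean_degree_at(500, 0.3, 0.4, n0=20, d0=19.0)
    print(estimate_p(500, 0.4, n0=20, d0=19.0, d_obs=target))
```

Each evaluation of the recurrence costs O(n) and the loop runs about log2(1/ε) times; repeating this for roughly 1/ε values of r would be in line with the stated complexity.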

slide-22
SLIDE 22

22

Our estimation procedure can be summarized as follows:

  • We employ the RECURRENCE-RELATION algorithm for solving graph recurrences of the three graph statistics $D$, $S_2$ and $C_3$, and we identify a set of solutions for $p$ and $r$.
  • With $G_n \sim \text{DD-model}(n, \hat{p}, \hat{r}, G_{n_0})$, we find the tolerance interval of $\hat{r}$ using the confidence interval of $D(G_n)$ and $C_3(G_n)$.
  • We look for crossing points of the plots in the figure, and the range of values of $p$ and $r$ where the confidence intervals meet around the crossing point. We call such a range of values the feasible-box.
  • Though any point in the feasible-box is a good estimate of $p$ and $r$, to improve the accuracy, we uniformly sample a fixed number of points from the box and choose the pair that gives the maximum p-value with respect to the number of automorphisms of the given graph $G_{obs}$.
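The final selection step can be sketched as follows, assuming the feasible-box is an axis-aligned rectangle in (p, r) and that a p-value for a candidate pair is available through a user-supplied function. Here `p_value_of` is a hypothetical placeholder: computing the actual p-value with respect to the number of automorphisms is outside this sketch.

```python
import random


def pick_from_feasible_box(p_range, r_range, p_value_of, num_samples=50, seed=0):
    """Uniformly sample (p, r) pairs from the box and keep the pair with the largest p-value.

    p_range and r_range are (low, high) tuples; num_samples is assumed to be at least 1.
    Returns (p, r, p_value) for the best sampled pair.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(num_samples):
        p = rng.uniform(*p_range)
        r = rng.uniform(*r_range)
        score = p_value_of(p, r)
        if best is None or score > best[0]:
            best = (score, p, r)
    return best[1], best[2], best[0]


if __name__ == "__main__":
    # Toy stand-in for the p-value, peaked at (0.5, 1.0); for illustration only.
    toy = lambda p, r: 1.0 - ((p - 0.5) ** 2 + (r - 1.0) ** 2)
    print(pick_from_feasible_box((0.4, 0.6), (0.8, 1.2), toy))
```

A fixed seed keeps the sampled points reproducible, which may help when comparing fits across runs.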