

  1. Network Embedding
     Social and Technological Networks
     Rik Sarkar, University of Edinburgh, 2019

  2. Network Embedding
     • Definition
       – Assignment of a coordinate to each node: f(v) gives the coordinates of node v
       – In d-dimensional space
       – Usually requires a unique coordinate for each vertex
     • Remember: intrinsic and extrinsic metrics
       – Intrinsic metrics: distances that can be measured purely by walking along network edges, e.g. shortest path distance
       – Extrinsic metrics: distances between vertices in the ambient space, i.e. the d-dimensional Euclidean space
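A minimal sketch of the two metrics on a concrete graph, assuming networkx is available; the 6-cycle and its unit-circle embedding are illustrative choices, not from the slides:

    # Intrinsic vs. extrinsic distance on a 6-cycle embedded in the plane.
    import math
    import networkx as nx

    G = nx.cycle_graph(6)
    # f(v): a hypothetical embedding placing the nodes on the unit circle
    f = {v: (math.cos(2 * math.pi * v / 6), math.sin(2 * math.pi * v / 6))
         for v in G.nodes}

    u, v = 0, 3
    print(nx.shortest_path_length(G, u, v))  # intrinsic: 3 hops along edges
    print(math.dist(f[u], f[v]))             # extrinsic: 2.0 in the plane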

  3. Network embedding
     • Usually we are interested in distances between nodes (discrete)
     • In some cases, points on the edges themselves may be relevant (continuous)
       – E.g. road networks

  4. Example: suppose we want to preserve shortest path distances
     • Can we embed:
       – An edge in a chain
       – A triangle in a line
       – A triangle in a 2d plane
       – A square in a 2d plane
       – A cycle in a 2d plane
     (One case is worked out below.)
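For the triangle-in-a-line case, a short argument (not spelled out on the slide) shows why no isometric embedding exists:

    % All pairwise distances in a triangle are 1. On a line, some vertex
    % must lie between the other two, forcing the outer pair apart:
    \[
    |f(a)-f(b)| = |f(b)-f(c)| = 1
    \;\Longrightarrow\;
    |f(a)-f(c)| \in \{0,\,2\} \neq 1 .
    \]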

  5. Dimension
     Examples:
     • Embedding cliques
       – 1d clique: edge
       – 2d clique: triangle
       – 3d clique: tetrahedron
     • "Simplices" (cliques) are the minimal elements of various dimensions

  6. Tree examples
     • Let's take binary trees
     • Can we embed them isometrically?
       – (while preserving all distances)

  7. Challenges
     • Sources of problems: mismatch between intrinsic and extrinsic metrics
       – Cycles
       – Rapid branching and growth
       – High dimensions

  8. Challenges
     • The dimension of a graph is hard to characterize
       – A graph shaped like a triangle (e.g. a subdivided triangle) may contain no 3-cliques
     • Definitions:
       – Subdivision: split an edge into two
       – Homeomorphism: two graphs are homeomorphic if there is a way to subdivide one to obtain the other

  9. Challenges
     • Summary: embedding is hard
       – In general, the metric of the graph may not match any Euclidean metric of fixed dimension, e.g. cycles, spheres, trees
       – The right dimension d of the ambient space may be hard to decide

  10. Theoretical results
      • Smooth spaces (see the Nash embedding theorem)
        – Certain classes (e.g. d-dimensional Riemannian manifolds) have nice (isometric or nearly isometric) embeddings in Euclidean spaces of O(poly(d)) dimensions
      • (This is a math topic, so we are stating it only vaguely. Ignore for exams.)

  11. Distortion
      • In reality, most embeddings are not perfect – they distort the distances
      • Some distances contract, some expand
      • For a metric space X with intrinsic distance d, and distance d' in the ambient (embedding) space:
        – Contraction: \( \max_{x \neq y} \, d(x,y) / d'(x,y) \)
        – Expansion: \( \max_{x \neq y} \, d'(x,y) / d(x,y) \)
        – Distortion = Contraction × Expansion
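A small sketch that computes these quantities directly from the definitions; the 4-cycle embedded as a unit square is an illustrative example, not from the slides:

    # Contraction, expansion and distortion, computed from the definitions
    # above. d and d_prime give intrinsic and embedded distances for a pair.
    import math
    from itertools import combinations

    def distortion(points, d, d_prime):
        pairs = list(combinations(points, 2))
        contraction = max(d(x, y) / d_prime(x, y) for x, y in pairs)
        expansion = max(d_prime(x, y) / d(x, y) for x, y in pairs)
        return contraction * expansion

    # Example: a 4-cycle embedded as a unit square. Edge pairs keep distance
    # 1; diagonal pairs contract from 2 (hops) to sqrt(2).
    coords = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
    hops = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (0, 3): 1, (0, 2): 2, (1, 3): 2}
    d = lambda x, y: hops[tuple(sorted((x, y)))]
    d_prime = lambda x, y: math.dist(coords[x], coords[y])
    print(distortion(coords, d, d_prime))  # sqrt(2) ~ 1.414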

  12. Distortion
      • Distortion = 1 means isometric
      • Nice property: uniform scaling gives distortion = 1
        – Verify (a short derivation follows)
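The verification is one line, using the definitions from the previous slide: if the embedding scales every distance by the same constant c > 0, then

    % Uniform scaling: d'(x,y) = c * d(x,y) for all x, y, with c > 0.
    \[
    \text{Expansion} = \max_{x \neq y} \frac{c\, d(x,y)}{d(x,y)} = c, \qquad
    \text{Contraction} = \max_{x \neq y} \frac{d(x,y)}{c\, d(x,y)} = \frac{1}{c},
    \]
    \[
    \text{Distortion} = c \cdot \frac{1}{c} = 1 .
    \]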

  13. Johnson-Lindenstrauss Lemma
      • A set X of n points in k-dimensional Euclidean space has an embedding in
        – Euclidean space of dimension O((log n) / ε²)
        – with distortion at most (1 + ε)
      • Algorithm:
        – Take O((log n) / ε²) random unit vectors in R^k
        – Project the points of X onto these vectors (take dot products)
        – Now we have an O((log n) / ε²)-dimensional representation of X
        – It has small distortion
      • This is the basis of a lot of modern data science algorithms, including compressed sensing
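A NumPy sketch of this random-projection algorithm; Gaussian directions (rather than exact unit vectors) and the constant 4 in the target dimension are illustrative assumptions:

    # Random projection in the spirit of Johnson-Lindenstrauss.
    import numpy as np

    rng = np.random.default_rng(0)
    n, k, eps = 1000, 500, 0.5
    X = rng.normal(size=(n, k))               # n points in R^k

    m = int(np.ceil(4 * np.log(n) / eps**2))  # O(log n / eps^2) dimensions;
                                              # the constant 4 is illustrative
    R = rng.normal(size=(k, m)) / np.sqrt(m)  # random directions, rescaled
    Y = X @ R                                 # project: dot products

    # With high probability every ||Y[i] - Y[j]|| is within a (1 +/- eps)
    # factor of ||X[i] - X[j]||.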

  14. Random walk based node embedding
      • From each node u, make many random walks of length w
      • Count how many times every other node occurs in these random walks; call these nodes the neighbors N(u)
        – Estimate the probability of each nearby node occurring in these walks
      • Find an embedding z which maximizes
        \[
        \max_{z} \; \sum_{u} \log P(N(u) \mid z_u)
        \]
        – Given node u, predict its neighbor probabilities
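A sketch of the neighborhood-collection step; the walk counts, walk length, and uniform next-step choice are assumptions in the spirit of this approach, not parameters given on the slide:

    # Collect random-walk neighborhoods N(u) for a networkx graph G.
    import random
    from collections import Counter

    def neighborhoods(G, num_walks=10, walk_len=8, seed=0):
        random.seed(seed)
        N = {u: Counter() for u in G.nodes}
        for u in G.nodes:
            for _ in range(num_walks):
                cur = u
                for _ in range(walk_len):
                    cur = random.choice(list(G.neighbors(cur)))
                    N[u][cur] += 1      # count occurrences near u
        return N  # normalizing N[u] estimates the neighbor probabilities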

  15. Turn into a loss minimization
      \[
      \min_{z} \; L = - \sum_{u \in V} \; \sum_{v \in N(u)} \log P(v \mid z_u)
      \]
      • Evaluate P as
        \[
        P(v \mid z_u) = \frac{\exp(z_u^{T} z_v)}{\sum_{n \in V} \exp(z_u^{T} z_n)}
        \]
        – Called the softmax function
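A direct NumPy sketch of this softmax probability and loss, assuming the embeddings are stored as the rows of a matrix Z (one d-dimensional vector per node):

    import numpy as np

    def softmax_prob(Z, u, v):
        s = Z @ Z[u]                  # z_u^T z_n for every node n in V
        s -= s.max()                  # subtract max for numerical stability
        e = np.exp(s)
        return e[v] / e.sum()         # P(v | z_u)

    def loss(Z, N):
        # N[u]: the random-walk neighbors of u (with repetition)
        return -sum(np.log(softmax_prob(Z, u, v)) for u in N for v in N[u])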

  16. Stochastic gradient descent
      • The loss minimization can be done with SGD
      • Take the vertices in random order
        – For each z_u, take the gradient: the direction to move z_u to decrease the loss
        – Move z_u slightly in that direction
      • Repeat with a different random order
      • Until convergence
      • SGD is a standard statistical technique; we will omit the details
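A sketch of one SGD pass over the vertices; for brevity only z_u is updated per step (a full implementation would also update the neighbor vectors), and the learning rate is an arbitrary choice:

    # One SGD pass for the softmax loss above. The gradient of
    # -log P(v | z_u) w.r.t. z_u is (expected neighbor under P) - z_v.
    import numpy as np

    def sgd_pass(Z, N, lr=0.01, rng=np.random.default_rng(0)):
        for u in rng.permutation(len(Z)):
            for v in N[u]:
                s = Z @ Z[u]
                s -= s.max()                 # numerical stability
                p = np.exp(s); p /= p.sum()  # P(. | z_u)
                grad = p @ Z - Z[v]          # expected neighbor minus z_v
                Z[u] -= lr * grad            # move z_u to decrease the loss
        return Z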

  17. Practical considerations
      • Expensive due to the z_u^T z_n term, which requires comparison with all vertices
      • Can be approximated at a reduced cost by suitable sampling
      • SGD can instead be used to train a neural net that suggests coordinates
        – Less storage than storing all coordinates, but also less accurate
      • Paper: DeepWalk, Perozzi et al.
      • Other variants:
        – Different ways of conducting the random walk
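One common sampling approximation is negative sampling, borrowed from word2vec-style models; the slide only says "suitable sampling", so treat this specific scheme as an assumption:

    # Negative sampling: replace the sum over all of V with a few random nodes.
    import numpy as np

    def neg_sampling_loss(Z, u, v, rng, num_neg=5):
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        pos = np.log(sigmoid(Z[u] @ Z[v]))            # pull true neighbor close
        negs = rng.integers(0, len(Z), size=num_neg)  # random "non-neighbors"
        neg = np.log(sigmoid(-Z[negs] @ Z[u])).sum()  # push sampled nodes away
        return -(pos + neg)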

  18. Applications of embedding
      • Embeddings are also called "representations"
        – Representation learning is an important area
      • Representing nodes in a Euclidean space lets us easily apply standard machine learning techniques
        – Most techniques rely on R^d spaces and dot products
      • Classification, clustering, etc. can now be performed on networks
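For instance, with an embedding matrix Z in hand, off-the-shelf scikit-learn models apply directly; the random Z and labels below are placeholders for learned embeddings and known node classes:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    Z = rng.normal(size=(100, 16))          # stand-in for learned embeddings
    labels = rng.integers(0, 2, size=100)   # stand-in for known node classes

    clusters = KMeans(n_clusters=2, n_init=10).fit_predict(Z)  # clustering
    clf = LogisticRegression().fit(Z[:80], labels[:80])        # classification
    print(clf.score(Z[80:], labels[80:]))                      # held-out accuracy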

  19. Embedding of attributed social networks
      • Suppose each node has attributes (e.g. hobbies, interests, etc.)
      • The ideal embedding should:
        – Represent similarity/dissimilarity of attributes
        – Represent similarity/dissimilarity of network position
      • In theory, these can be opposing objectives
      • In practice, homophily means they are correlated

  20. Attributed network embedding
      • Minimize a loss that incorporates the probabilities of the right neighbors as well as attribute similarity (one possible combined loss is sketched below)
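A sketch of one possible combined objective; the slide does not give a specific form, so the weight alpha and the attribute term below are assumptions:

    # Combined loss: network term (as in slide 15) plus an attribute term
    # that pulls nodes with similar attribute vectors together.
    import numpy as np

    def combined_loss(Z, N, A, alpha=0.5):
        def log_p(u, v):                      # softmax log P(v | z_u)
            s = Z @ Z[u]; s -= s.max()
            e = np.exp(s)
            return np.log(e[v] / e.sum())
        net = -sum(log_p(u, v) for u in N for v in N[u])
        attr = sum((A[u] @ A[v]) * np.sum((Z[u] - Z[v]) ** 2)
                   for u in N for v in N[u])  # similar attributes -> be close
        return net + alpha * attr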

  21. Embedding whole graphs
      • Suppose there is a database of molecules
        – Each node has attributes
      • We want to represent each molecule as a point in R^d
        – Such that similar molecules are close
      • Method 1:
        – Embed the nodes of each graph, then take the mean (sketched below)
      • Method 2:
        – In each graph, perform random walks of length w starting at random points
        – Collect the neighborhood sequences in each graph
        – Perform the embedding so that graphs whose random walks see similar attribute sequences end up close
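A minimal sketch of Method 1, assuming the per-node embeddings for one graph are the rows of an array:

    # Method 1: a whole-graph embedding as the mean of its node embeddings.
    import numpy as np

    def graph_embedding(node_embeddings):
        # node_embeddings: (num_nodes x d) array for one graph
        return node_embeddings.mean(axis=0)  # a single point in R^d

    # Similar molecules should then be nearby, e.g.:
    # np.linalg.norm(graph_embedding(Z1) - graph_embedding(Z2))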

  22. • Some authors distinguish between node embedding and graph embedding

  23. Why random walks

  24. Why random walks
      • They save computation: no need to consider all pairs
      • They are known to capture relevant properties of networks, like community structure
        – Highly connected nodes are likely to be close in random walks
        – They are representative of diffusion processes
      • The first methods were inspired by NLP methods for sequences in text
        – Random walks give natural sequences of nodes

  25. Embedding networks into other spaces
      • Embedding into hyperbolic spaces is a popular research area these days
      • There are other significant papers on embedding into trees, distributions over trees, etc.
      • Embedding can be used to compare networks
        – E.g. for networks A and B: if good embeddings A -> B and B -> A exist, then A and B are probably similar
