 
              Network Embedding Social and Technological Networks Rik Sarkar University of Edinburgh, 2019.
Network Embedding • Definition – Assignment of a coordinate to each node • f(v) gives the coordinates of node v – In d-dimensional space – Usually requires a unique coordinate for each vertex • Remember: Intrinsic and extrinsic metrics – Intrinsic metrics: distances that can be measured purely by walking along network edges, e.g. shortest path distance – Extrinsic metrics: distances between vertices in the ambient space, i.e. the d-dimensional Euclidean space
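To make the two kinds of metric concrete, here is a minimal sketch in Python. The three-node path graph and the coordinates in `f` are made-up illustrations, not taken from the lecture: the intrinsic distance is computed by breadth-first search along edges, the extrinsic one as a straight-line distance in the plane.

```python
import math
from collections import deque

def shortest_path_lengths(adj, source):
    """Hop distances from source in an unweighted graph (intrinsic metric)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical example: the path a - b - c, drawn with a right-angle bend.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
f = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0)}  # embedding f(v)

intrinsic = shortest_path_lengths(adj, "a")["c"]  # distance walking along edges
extrinsic = math.dist(f["a"], f["c"])             # Euclidean distance in the plane
print(intrinsic, extrinsic)
```

Every edge keeps length 1 in this drawing, yet the a-to-c distance shrinks in the ambient space, so the embedding is not isometric.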
Network embedding • Usually we are interested in distances between nodes (discrete) • In some cases, points on the edges themselves may be relevant (continuous) – E.g. road networks
Example: suppose we want to preserve shortest path distances • Can we embed: – An edge in a chain – A triangle in a line – A triangle in a 2d plane – A square in a 2d plane – A cycle in a 2d plane
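As one worked answer to the questions above, the following short argument (a standard one, not taken from the slides) shows why the square, i.e. the 4-cycle with its shortest path metric, has no isometric embedding in Euclidean space of any dimension. The vertex names a, b, c, d are just labels for the cycle a-b-c-d-a.

```latex
% 4-cycle a-b-c-d-a with shortest path distance d and embedding f.
\begin{align*}
  &d(a,b) = d(b,c) = 1, \quad d(a,c) = 2
    \;\Rightarrow\; \|f(a)-f(c)\| = \|f(a)-f(b)\| + \|f(b)-f(c)\| \\
  &\Rightarrow\; f(b) = \tfrac{1}{2}\bigl(f(a)+f(c)\bigr)
    \quad\text{(equality in the triangle inequality)} \\
  &\text{By the same argument via } d:\quad f(d) = \tfrac{1}{2}\bigl(f(a)+f(c)\bigr) = f(b) \\
  &\Rightarrow\; \|f(b)-f(d)\| = 0 \neq 2 = d(b,d).
\end{align*}
```

By contrast, a single edge embeds isometrically in a line, and a triangle embeds isometrically in the plane (as an equilateral triangle) but not in a line.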
Dimension Examples: • Embedding cliques • 1d clique: edge • 2d clique: triangle • 3d clique: tetrahedron • “simplices” (cliques) are the minimal elements of various dimensions
Tree examples: • Let’s take binary trees • Can we embed them isometrically? – (while preserving all distances)
Challenges: • Sources of problem: mismatch between intrinsic and extrinsic metrics – Cycles – Rapid branching and growth – High dimensions
Challenges • Dimension of a graph is hard to characterize • A graph shaped like a triangle (e.g. a subdivided one) may not contain any 3-clique • Definition: – Subdivision: Split an edge into two – Homeomorphism: Two graphs are homeomorphic if there is a way to subdivide one to get the other
Challenges • Summary: Embedding is hard – In general, the metric of the graph may not match any Euclidean metric of fixed dimension, e.g. cycles, spheres, trees – The right dimension d of the ambient space may be hard to decide
Theoretical results • Smooth (See the Nash Embedding Theorem) – Certain classes (e.g. Riemannian manifolds of dimension d) have nice (isometric or nearly isometric) embeddings in Euclidean spaces of O(poly(d)) dimensions • (This is a math topic, so we are stating it only vaguely. Ignore for exams.)
Distortion • In reality, most embeddings are not perfect – they distort the distances • Some distances contract, some expand • For a metric space X with intrinsic distance d, an embedding f, and distance d' in the ambient (embedding) space: • Contraction: $\max_{x \neq y \in X} \frac{d(x,y)}{d'(f(x),f(y))}$ • Expansion: $\max_{x \neq y \in X} \frac{d'(f(x),f(y))}{d(x,y)}$ • Distortion = Contraction * Expansion
Distortion • Distortion = 1 means isometric • Nice property: Uniform scaling gives distortion = 1 – Verify
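The "Verify" exercise can be checked numerically. The sketch below assumes the usual definitions: contraction is the largest ratio of intrinsic to embedded distance over all pairs, and expansion is the largest ratio the other way round. The 4-cycle and its unit-square drawing are illustrative choices, and `distortion` is a hypothetical helper name.

```python
import math
from itertools import combinations

def distortion(intrinsic, coords):
    """Return contraction * expansion for an embedding.

    intrinsic: dict mapping frozenset({u, v}) to the graph distance d(u, v)
    coords:    dict mapping each node to its coordinate tuple f(v)
    """
    contraction = expansion = 0.0
    for u, v in combinations(coords, 2):
        d = intrinsic[frozenset((u, v))]
        d_emb = math.dist(coords[u], coords[v])
        contraction = max(contraction, d / d_emb)  # how much a distance shrank
        expansion = max(expansion, d_emb / d)      # how much a distance grew
    return contraction * expansion

# Shortest path distances on the 4-cycle 0-1-2-3-0.
cycle = {frozenset((i, j)): min(j - i, 4 - (j - i))
         for i, j in combinations(range(4), 2)}
square = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
scaled = {v: (5 * x, 5 * y) for v, (x, y) in square.items()}

base = distortion(cycle, square)           # diagonals shrink from 2 to sqrt(2)
after_scaling = distortion(cycle, scaled)  # same drawing, five times larger
print(base, after_scaling)
```

Scaling multiplies every embedded distance by the same factor c, so expansion is multiplied by c and contraction divided by c, and their product is unchanged. In particular, a uniformly scaled isometric embedding still has distortion 1.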
Johnson-Lindenstrauss Lemma • A set X of n points in k-dimensional Euclidean space has an embedding in – Euclidean space of dimension O((log n)/𝜁²) – with distortion at most (1+𝜁) • Algorithm: – Take O((log n)/𝜁²) random unit vectors in R^k – Project the points of X onto these vectors (take dot products) – Now we have an O((log n)/𝜁²)-dimensional representation of X – It has small distortion with high probability • This is the basis of many modern data science algorithms, including compressed sensing
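The projection algorithm can be sketched in a few lines of plain Python. The sizes (15 points, 400 input dimensions, 200 output dimensions), the Gaussian test data, and the function names are all assumptions chosen for illustration. The `sqrt(k / m)` rescaling is also an addition: it keeps projected distances on the original scale, and by the scaling property above it does not change the distortion.

```python
import math
import random

def random_unit_vector(k, rng):
    """A uniformly random direction in R^k (normalised Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def random_projection(points, m, rng):
    """Map each point to its dot products with m random unit vectors."""
    k = len(points[0])
    directions = [random_unit_vector(k, rng) for _ in range(m)]
    scale = math.sqrt(k / m)  # assumption: rescale so lengths are comparable
    return [[scale * sum(a * b for a, b in zip(p, r)) for r in directions]
            for p in points]

rng = random.Random(0)
points = [[rng.gauss(0.0, 1.0) for _ in range(400)] for _ in range(15)]
projected = random_projection(points, 200, rng)

# Ratio of projected to original distance for every pair of points.
ratios = [math.dist(projected[i], projected[j]) / math.dist(points[i], points[j])
          for i in range(15) for j in range(i + 1, 15)]
print(min(ratios), max(ratios))
```

The ratios should cluster around 1, and increasing the number of directions `m` tightens them. The constant hidden in the O(·) bound is not specified here, so the 200 is not a recommended value.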
Random walk based node embedding • From each node u make many random walks of length w • Count how many times every other node occurs in these random walks; call the nodes that occur N(u) (the neighbors of u) – Estimate the probability of each nearby node occurring in these walks • Find the embedding z that maximizes: $\max_z \sum_{u \in V} \log P(N(u) \mid z_u)$ – Given node u, predict its neighbor probabilities
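The walk-and-count step can be sketched as follows. The graph (two triangles joined by one edge), the number of walks, and the helper names are hypothetical; the slides do not fix these details, and published methods differ in how they turn walks into neighbor sets.

```python
import random
from collections import Counter

def random_walk(adj, start, length, rng):
    """A walk of `length` steps, moving to a uniformly random neighbor each step."""
    walk = [start]
    for _ in range(length):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

def neighbor_probabilities(adj, u, num_walks, length, rng):
    """Empirical frequency of every other node in walks started at u."""
    counts = Counter()
    for _ in range(num_walks):
        walk = random_walk(adj, u, length, rng)
        counts.update(v for v in walk[1:] if v != u)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
probs = neighbor_probabilities(adj, 0, num_walks=2000, length=3,
                               rng=random.Random(1))
print(sorted(probs.items()))
```

Nodes in the same triangle as node 0 show up far more often than nodes in the other triangle. This is the signal the embedding is asked to reproduce.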
Turn into a loss minimization: $\min_z L = -\sum_{u \in V} \sum_{v \in N(u)} \log P(v \mid z_u)$ • Evaluate P as $P(v \mid z_u) = \frac{\exp(z_u^T z_v)}{\sum_{n \in V} \exp(z_u^T z_n)}$ – Called the softmax function
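A direct, unoptimised evaluation of the softmax probability and the loss might look like this. It assumes embeddings are stored in a dict `z` from node to coordinate list, and that `neighbors` maps each node u to its list N(u); both layouts are illustrative choices. Subtracting the largest score before exponentiating is a standard numerical-stability step and does not change the result.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax_prob(z, u, v):
    """P(v | z_u) = exp(z_u . z_v) / sum over all n in V of exp(z_u . z_n)."""
    scores = {n: dot(z[u], z[n]) for n in z}
    top = max(scores.values())  # shift scores for numerical stability
    denom = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[v] - top) / denom

def loss(z, neighbors):
    """L = - sum over u in V, v in N(u) of log P(v | z_u)."""
    return -sum(math.log(softmax_prob(z, u, v))
                for u in neighbors for v in neighbors[u])

# Hypothetical 3-node path 0 - 1 - 2 with 2-dimensional coordinates.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
z = {0: [1.0, 0.0], 1: [0.5, 0.5], 2: [0.0, 1.0]}
print(loss(z, neighbors))
```

Each call to `softmax_prob` loops over every node in V, which is the cost that sampling-based approximations aim to avoid.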
Stochastic gradient descent • The loss minimization can be done with SGD • Take vertices in random order – For each z_u, compute the gradient of the loss with respect to z_u – Move z_u slightly in the direction that decreases the loss (against the gradient) • Repeat with a different random order • Until convergence • SGD is a standard statistical technique. We will omit the details
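The update loop can be sketched as below, under several simplifying assumptions. The gradient is estimated by finite differences rather than derived analytically, which keeps the code short but is far too slow for real graphs. A fixed number of passes stands in for a convergence test. The toy path graph, learning rate, and function names are all made up for illustration.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss(z, neighbors):
    """Softmax loss: - sum over u, v in N(u) of log P(v | z_u)."""
    total = 0.0
    for u, nbrs in neighbors.items():
        scores = [dot(z[u], z[n]) for n in z]
        top = max(scores)
        log_denom = top + math.log(sum(math.exp(s - top) for s in scores))
        for v in nbrs:
            total -= dot(z[u], z[v]) - log_denom
    return total

def numerical_gradient(z, neighbors, u, eps=1e-5):
    """Central-difference estimate of the gradient of the loss w.r.t. z_u."""
    grad = []
    for i in range(len(z[u])):
        original = z[u][i]
        z[u][i] = original + eps
        up = loss(z, neighbors)
        z[u][i] = original - eps
        down = loss(z, neighbors)
        z[u][i] = original  # restore the coordinate
        grad.append((up - down) / (2 * eps))
    return grad

def train(z, neighbors, epochs=40, lr=0.05, rng=None):
    rng = rng or random.Random(0)
    nodes = list(z)
    for _ in range(epochs):            # assumption: fixed passes, no stopping rule
        rng.shuffle(nodes)             # visit vertices in a fresh random order
        for u in nodes:
            grad = numerical_gradient(z, neighbors, u)
            z[u] = [x - lr * g for x, g in zip(z[u], grad)]  # step against gradient
    return z

neighbors = {0: [1], 1: [0, 2], 2: [1]}  # hypothetical path 0 - 1 - 2
init = random.Random(42)
z = {u: [init.uniform(-0.5, 0.5) for _ in range(2)] for u in neighbors}
before = loss(z, neighbors)
train(z, neighbors)
after = loss(z, neighbors)
print(before, after)
```

Practical implementations replace the numerical gradient with an analytic one and sample the terms of the loss instead of evaluating all of them at every step.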
Practical considerations • Expensive due to the $z_u^T z_n$ terms in the softmax denominator, which require comparison with all vertices • Can be approximated at a reduced cost by suitable sampling • SGD can instead be used to train a neural net that suggests coordinates – Less storage than storing all coordinates, but also less accurate • Paper: DeepWalk. Perozzi et al. • Other variants: – Different ways of conducting the random walk
Applications of embedding • Also called “representations” • Representation learning is an important area • Representing nodes in a Euclidean space lets us easily apply standard machine learning techniques – Most techniques rely on R^d space and dot products • Classification, clustering, etc. can now be performed on networks
Embedding of attributed social networks • Suppose each node has attributes (e.g. hobbies, interests, etc.) • The ideal embedding should: – Represent similarity/dissimilarity of attributes – Represent similarity/dissimilarity of network position • In theory, these can be opposing objectives • In practice, homophily means these are correlated
Attributed network embedding • Minimize loss that incorporates probabilities of right neighbors as well as similar attributes
Embedding whole graphs • Suppose there is a database of molecules – Each node has attributes • We want to represent each molecule as a point in R^d – Such that similar molecules are close • Method 1: – Embed the nodes of each graph, then take the mean of the node coordinates • Method 2: – In each graph, perform random walks of length w starting at random points – Collect the neighborhood sequences in each graph – Perform embedding so that graphs whose random walks see similar attribute sequences are close
• Some authors distinguish these as node embedding vs. graph embedding
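Method 1 reduces to averaging once node coordinates exist. A minimal sketch, assuming each graph's node embeddings are already available as a list of equal-length coordinate lists (the numbers and the name `graph_embedding` are made up):

```python
def graph_embedding(node_vectors):
    """Mean of the node coordinates: one point in R^d for the whole graph."""
    n = len(node_vectors)
    d = len(node_vectors[0])
    return [sum(vec[i] for vec in node_vectors) / n for i in range(d)]

# Hypothetical node coordinates for one small molecule graph.
molecule = [[0.0, 2.0], [2.0, 4.0]]
point = graph_embedding(molecule)
print(point)
```

Averaging discards how the nodes are arranged, so two structurally different graphs can land on the same point. Method 2 is one way to retain more of that structure.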
Why random walks • Saves computation: no need to consider all pairs • Known to capture relevant properties of networks like community structure – Highly connected nodes are likely to be close in random walks – Representative of diffusion processes • First methods were inspired by NLP methods of sequences in text – random walk gives natural sequences
Embedding networks into other spaces • Embedding into hyperbolic spaces is a popular research area these days • Other significant papers on embedding into trees, distributions over trees etc • Embedding can be used to compare networks • E.g. for A and B – If good embeddings A -> B and B -> A exist, then A and B are probably similar.