

SLIDE 1

Network Embedding

Social and Technological Networks

Rik Sarkar

University of Edinburgh, 2019.

SLIDE 2

Network Embedding

  • Definition

– Assignment of a coordinate to each node

  • f(v) gives the coordinates of node v

– In d-dimensional space
– Usually requires a unique coordinate for each vertex

  • Remember: Intrinsic and extrinsic metrics

– Intrinsic metrics: distances that can be measured purely by walking along network edges, e.g. shortest path distance
– Extrinsic: distances between vertices in the ambient space, i.e. the d-dimensional Euclidean space
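To make the two metrics concrete, here is a minimal sketch contrasting them on a small cycle. It assumes the networkx library is available; the circular layout plays the role of a hypothetical embedding f.

    # Sketch: intrinsic vs. extrinsic distance on a 6-cycle (assumes networkx).
    import math
    import networkx as nx

    G = nx.cycle_graph(6)                 # nodes 0..5 joined in a ring
    f = nx.circular_layout(G)             # hypothetical embedding f(v) in 2d

    u, v = 0, 3
    intrinsic = nx.shortest_path_length(G, u, v)   # hops along edges: 3
    extrinsic = math.dist(f[u], f[v])              # straight-line distance
    print(intrinsic, extrinsic)                    # the two metrics disagree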

SLIDE 3

Network embedding

  • Usually we are interested in distances between nodes (discrete)
  • In some cases, points on the edges themselves may be relevant (continuous)

– E.g. road networks

SLIDE 4

Example: suppose we want to preserve shortest path distances

  • Can we embed:

– An edge in a chain
– A triangle in a line
– A triangle in a 2d plane
– A square in a 2d plane
– A cycle in a 2d plane
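The triangle-in-a-line case already fails: for any three reals a ≤ b ≤ c, |a − c| = |a − b| + |b − c|, so three pairwise graph distances of 1 cannot all be preserved. A small sketch of the best we can do (hypothetical coordinates; distortion as defined on a later slide):

    # Sketch: best-effort placement of a triangle's vertices on a line.
    # All graph distances are 1, but |a-c| = |a-b| + |b-c| forces distortion.
    from itertools import combinations

    def line_distortion(coords):
        ratios = [abs(x - y) for x, y in combinations(coords, 2)]  # d'/d, with d = 1
        expansion = max(ratios)                      # worst stretched pair
        contraction = max(1.0 / r for r in ratios)   # worst shrunk pair
        return expansion * contraction

    print(line_distortion([0.0, 0.5, 1.0]))  # 2.0 -- no line placement does better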

SLIDE 5

Dimension Examples:

  • Embedding cliques
  • 1d clique: edge
  • 2d clique: triangle
  • 3d clique: tetrahedron
  • “simplices” (cliques) are the minimal elements of various dimensions
SLIDE 6

Tree examples:

  • Let’s take binary trees
  • Can we embed them isometrically?

– (while preserving all distances)

SLIDE 7

Challenges:

  • Source of problems: mismatch between intrinsic and extrinsic metrics
– Cycles
– Rapid branching and growth
– High dimensions

SLIDE 8

Challenges

  • Dimension of a graph is hard to characterize
  • A graph shaped like a triangle (e.g. a subdivided one) may not contain 3-cliques

  • Definition:

– Subdivision: split an edge into two
– Homeomorphism: two graphs are homeomorphic if there is a way to subdivide one to get the other

SLIDE 9

Challenges

  • Summary: Embedding is hard

– In general, the metric of the graph may not match any Euclidean metric of fixed dimension, e.g. cycles, spheres, trees
– The right dimension d of the ambient space may be hard to decide

SLIDE 10

Theoretical results

  • Smooth case (see the Nash Embedding Theorem)

– Certain classes (e.g. Riemannian manifolds of d dimensions) have nice (isometric or nearly isometric) embeddings in Euclidean spaces of O(poly(d)) dimensions

  • (This is a math topic, so we are stating it only vaguely. Ignore for exams.)
SLIDE 11

Distortion

  • In reality, most embeddings are not perfect – they distort the distances
  • Some distances contract, some expand
  • For a metric space X with intrinsic distance d, and distance d’ in the ambient (embedding) space:
  • Contraction: max over pairs x, y of d(x, y) / d’(x, y)
  • Expansion: max over pairs x, y of d’(x, y) / d(x, y)
  • Distortion = Contraction * Expansion
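A minimal sketch of these definitions in code, assuming the intrinsic distances d (e.g. all-pairs shortest paths) are precomputed and f maps each node to its coordinates:

    # Sketch: contraction, expansion, and distortion of an embedding f,
    # given intrinsic distances d[u][v] (assumed precomputed and positive).
    import math
    from itertools import combinations

    def distortion(nodes, d, f):
        contraction = expansion = 0.0
        for u, v in combinations(nodes, 2):
            d_prime = math.dist(f[u], f[v])        # extrinsic distance (nonzero
            contraction = max(contraction, d[u][v] / d_prime)   # if coords unique)
            expansion = max(expansion, d_prime / d[u][v])
        return contraction * expansion             # equals 1 for an isometry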
SLIDE 12

Distortion

  • Distortion = 1 means isometric
  • Nice property: uniform scaling gives distortion = 1

– Verify: if d’ = c·d, then expansion = c and contraction = 1/c, so their product is 1

SLIDE 13

Johnson Lindenstrauss Lemma

  • A set X of n points in k-dim Euclidean space has an embedding in

– Euclidean space of dim O((log n)/𝜁²)
– with distortion at most (1+𝜁)

  • Algorithm (a code sketch follows below):

– Take O((log n)/𝜁²) random unit vectors in R^k
– Project (take the dot product of) the points of X onto these vectors
– Now we have an O((log n)/𝜁²)-dim representation of X
– It has small distortion

  • This is the basis of a lot of modern data science algorithms, including compressed sensing
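A minimal sketch of the projection step, assuming numpy. The constant 4 in the target dimension is illustrative rather than the lemma's exact constant, and Gaussian directions are used in place of exact unit vectors (a standard variant):

    # Sketch: Johnson-Lindenstrauss-style random projection.
    import numpy as np

    def jl_project(X, zeta, seed=0):
        n, k = X.shape
        m = int(np.ceil(4 * np.log(n) / zeta**2))   # target dim O((log n)/zeta^2)
        rng = np.random.default_rng(seed)
        R = rng.normal(size=(k, m)) / np.sqrt(m)    # random directions, rescaled
        return X @ R                                # dot products = projections

    X = np.random.default_rng(1).normal(size=(1000, 500))
    print(jl_project(X, zeta=0.5).shape)            # (1000, 111): far fewer dims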

SLIDE 14

Random walk based node embedding

  • From each node u make many random walks of length w
  • Count how many times every other node occurs in these random walks N(u) (call them neighbors)

– Estimate the probability of each nearby node occurring in these walks.

  • Find the embedding z which maximizes:

max_z Σ_u log P(N(u) | z_u)

Given node u, predict its neighbor probabilities
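A minimal sketch of the walk-and-count step, assuming the graph is given as an adjacency list (a dict from node to list of neighbors); the function name and parameters are illustrative:

    # Sketch: estimate P(v in N(u)) from random walks started at u.
    import random
    from collections import Counter

    def neighbor_probs(adj, u, walk_len=10, num_walks=200, seed=0):
        rng = random.Random(seed)
        counts = Counter()
        for _ in range(num_walks):
            node = u
            for _ in range(walk_len):
                node = rng.choice(adj[node])   # step to a uniform random neighbor
                counts[node] += 1
        total = sum(counts.values())
        return {v: c / total for v, c in counts.items()}   # empirical probabilities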

SLIDE 15

Turn into a loss minimization

  • Evaluate P as the softmax function:

P(v | z_u) = exp(z_u^T z_v) / Σ_{n ∈ V} exp(z_u^T z_n)

  • Maximizing the log-likelihood is then equivalent to minimizing the loss:

min L = − Σ_{u ∈ V} Σ_{v ∈ N(u)} log P(v | z_u)
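A minimal sketch of this loss, assuming numpy, with Z an n × d array whose row u is z_u and neighbors a dict mapping u to N(u) (both names hypothetical):

    # Sketch: the softmax probability and the loss L from the slide.
    import numpy as np

    def log_p(v, u, Z):
        scores = Z @ Z[u]                        # z_u^T z_n for every node n
        scores -= scores.max()                   # for numerical stability
        return scores[v] - np.log(np.exp(scores).sum())

    def loss(Z, neighbors):
        return -sum(log_p(v, u, Z)
                    for u, N in neighbors.items() for v in N)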
SLIDE 16

Stochastic gradient descent

  • The loss minimization can be done as SGD
  • Take vertices in random order

– For each z_u, take the gradient: the direction to move z_u to decrease the loss
– Move z_u slightly in that direction

  • Repeat with a different random order
  • Until convergence
  • SGD is a standard statistics technique; we will omit the details (a minimal sketch follows below)
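A minimal sketch of one SGD epoch for this loss, assuming numpy and the Z / neighbors layout above. For brevity only z_u is updated per term; a full implementation would also update the other vectors.

    # Sketch: one pass of SGD over vertices in random order.
    import numpy as np

    def sgd_epoch(Z, neighbors, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        for u in rng.permutation(len(Z)):        # vertices in random order
            for v in neighbors.get(u, []):
                scores = Z @ Z[u]
                p = np.exp(scores - scores.max())
                p /= p.sum()                     # softmax: P(n | z_u) for all n
                grad = -(Z[v] - p @ Z)           # gradient of -log P(v | z_u) in z_u
                Z[u] -= lr * grad                # small step to decrease the loss
        return Z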

SLIDE 17
SLIDE 18

Practical considerations

  • Expensive due to the z_u^T z_n term, which requires comparison with all vertices
  • Can be approximated at a reduced cost by suitable sampling (a sampling sketch follows at the end of this slide)

  • SGD can instead be used to train a neural net that suggests coordinates

– Less storage than storing all coordinates, but also less accurate

  • Paper: DeepWalk, Perozzi et al.
  • Other variants:

– Different ways of conducting the random walk
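One common sampling approximation is word2vec-style negative sampling, as used by DeepWalk-family methods (exact variants differ by paper): compare against a few random "negative" nodes instead of the full denominator. A minimal numpy sketch:

    # Sketch: negative-sampling approximation of one loss term.
    import numpy as np

    def sampled_loss_term(Z, u, v, num_neg=5, seed=0):
        rng = np.random.default_rng(seed)
        sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
        pos = np.log(sigmoid(Z[u] @ Z[v]))              # pull the true neighbor close
        negs = rng.integers(0, len(Z), size=num_neg)    # a few random nodes
        neg = np.log(sigmoid(-(Z[negs] @ Z[u]))).sum()  # push the sampled nodes away
        return -(pos + neg)                  # touches O(num_neg) nodes, not all of V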

SLIDE 19

Applications of embedding

  • Also called “representations”
  • Representation learning is an important area
  • Representing nodes in a Euclidean space lets us easily apply standard machine learning techniques

– Most techniques rely on R^d space and dot products

  • Classification, clustering, etc. can now be performed on networks (see the sketch below)
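A minimal sketch of this, assuming scikit-learn, with stand-in embeddings and labels in place of real ones:

    # Sketch: standard ML directly on node embeddings in R^d.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    Z = np.random.default_rng(0).normal(size=(100, 16))   # stand-in embeddings
    y = (Z[:, 0] > 0).astype(int)                         # stand-in node labels

    clf = LogisticRegression().fit(Z, y)                  # node classification
    communities = KMeans(n_clusters=4, n_init=10).fit_predict(Z)  # clustering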

SLIDE 20

Embedding of attributed social networks

  • Suppose each node has attributes (e.g. hobbies, interests, etc.)

  • The ideal embedding should:

– Represent similarity/dissimilarity of attributes
– Represent similarity/dissimilarity of network position

  • In theory, these can be opposing objectives
  • In practice, homophily means these are correlated

SLIDE 21

Attributed network embedding

  • Minimize a loss that incorporates probabilities of right neighbors as well as similar attributes
SLIDE 22

Embedding whole graphs

  • Suppose there is a database of molecules

– Each node has attributes

  • We want to represent each as a point in R^d

– Such that similar molecules are close

  • Method 1:

– Embed each graph’s nodes, then take the mean (sketched after this list)

  • Method 2:

– In each graph, perform random walks of length w starting at random points
– Collect the neighborhood sequences in each graph
– Perform the embedding so that attribute sequences seen in random walks are close
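A minimal sketch of Method 1's pooling step, assuming numpy and that Z_g holds the node embeddings of one graph:

    # Sketch: a whole-graph representation as the mean of its node embeddings.
    import numpy as np

    def graph_embedding(Z_g):
        return Z_g.mean(axis=0)          # one point in R^d per graph

    # Similar molecules should then land close together, e.g.:
    # np.linalg.norm(graph_embedding(Z_a) - graph_embedding(Z_b))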

SLIDE 23
  • Some authors like to distinguish these as node embedding vs. graph embedding

SLIDE 24

Why random walks

SLIDE 25

Why random walks

  • Saves computation: no need to consider all pairs
  • Known to capture relevant properties of networks, like community structure

– Highly connected nodes are likely to be close in random walks
– Representative of diffusion processes

  • The first methods were inspired by NLP methods for sequences in text – a random walk gives natural sequences

SLIDE 26

Embedding networks into other spaces

  • Embedding into hyperbolic spaces is a popular research area these days
  • There are other significant papers on embedding into trees, distributions over trees, etc.

  • Embedding can be used to compare networks
  • E.g. for networks A and B:

– If good embeddings A -> B and B -> A exist, then A and B are probably similar