BootcampR AN INTRODUCTION TO R Jason A. Heppler, PhD University of - - PowerPoint PPT Presentation

slide-1
SLIDE 1

BootcampR

AN INTRODUCTION TO R

Jason A. Heppler, PhD University of Nebraska at Omaha March 10, 2020 @jaheppler

slide-2
SLIDE 2
Hi. I'm Jason.

I like to gesture at screens.

  • Digital Engagement Librarian, University of Nebraska at Omaha
  • Mentor, Mozilla Open Leaders
  • Researcher, Humanities+Design, Stanford University

slide-3
SLIDE 3

Schedule

  • March 17, 1:30-3: Making Maps in CL 112
  • March 31, 1:30-3: Clustering and Classifying in CL 112

slide-4
SLIDE 4

Today's plan

  • Introduction to networks
  • Intro to ggraph and tidygraph
  • Hands-on!

Open up RStudio. We'll start doing a few things together soon.

slide-5
SLIDE 5

"The bad news is that whenever you learn a new skill you’re going to suck. It’s going to be frustrating. The good news is that is typical and happens to everyone and it is only temporary.

You can’t go from knowing nothing to becoming an expert without going through a period of great frustration and great suckiness."

—Hadley Wickham

slide-6
SLIDE 6

Networks

slide-7
SLIDE 7

Kabbalistic tree of life
Athanasius Kircher, Oedipus Ægyptiacus, 1652-55

slide-8
SLIDE 8

Tree of Life
Charles Darwin, On the Origin of Species by Means of Natural Selection, 1859

slide-9
SLIDE 9

Robust Action and the Rise of the Medici, 1400-1434
John Padgett and Christopher Ansell
American Journal of Sociology 98:6 (May 1993)

slide-10
SLIDE 10

O Say Can You See: Early Washington, D.C., Law & Family
William G. Thomas III, Jennifer Guiliano, Trevor Muñoz
http://earlywashingtondc.org/

slide-11
SLIDE 11

ORBIS: The Stanford Geospatial Network Model of the Roman World
Walter Scheidel and Elijah Meeks
http://orbis.stanford.edu/

slide-12
SLIDE 12

Shakespeare Tragedy
Martin Grandjean
http://www.martingrandjean.ch

slide-13
SLIDE 13
slide-14
SLIDE 14

What is a network?

slide-15
SLIDE 15

Network = graph

Humanities scholars call these networks; mathematicians and network scientists call them graphs. A graph is defined as

  • a set of nodes (or vertices), and
  • a set of edges (or links) that connect nodes
slide-16
SLIDE 16

Nodes and edges

slide-17
SLIDE 17

What do nodes and edges mean?

Nodes          Edges
People         Letters
People         Membership
Publications   Citations
Cities         Railways
Cities         Imports/exports

slide-18
SLIDE 18

Directed vs. undirected

Directed networks have asymmetrical relationships: each edge runs from one node to another and is usually drawn as an arrow (for example, a letter sent from one person to another). Undirected networks, by contrast, have symmetrical relationships: an edge simply connects two nodes, with no direction.
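The difference shows up as soon as you count connections. The sketch below is only an illustration: the three-letter edge list is made up, and it uses igraph directly (the library the slides later say sits underneath tidygraph).

```r
library(igraph)

# Hypothetical data: who sent a letter to whom.
letters_sent <- data.frame(
  from = c("A", "A", "B"),
  to   = c("B", "C", "C")
)

# The same edge list read two ways.
directed_graph   <- graph_from_data_frame(letters_sent, directed = TRUE)
undirected_graph <- graph_from_data_frame(letters_sent, directed = FALSE)

# Directed: letters received and letters sent are different counts.
degree(directed_graph, mode = "in")   # A = 0, B = 1, C = 2
degree(directed_graph, mode = "out")  # A = 2, B = 1, C = 0

# Undirected: a connection is just a connection.
degree(undirected_graph)              # A = 2, B = 2, C = 2
```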

slide-19
SLIDE 19

What does network data look like?

Nodes table

names   size   color
A       12     red
B       10     blue
C       4      red
D       18     red
E       15     blue

slide-20
SLIDE 20

What does network data look like?

Edges table

source   target   weight
A        B        3
B        C        6
C        D        9
C        A        4
D        B        4
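In R, the node and edge tables from these two slides are just two data frames. The sketch below is one possible way to hold them and turn them into a graph object. It assumes the edges are directed (the slide's source/target column names suggest this but do not say so) and writes every color in lowercase.

```r
library(tidygraph)

# Node table from the slide: one row per node.
nodes <- data.frame(
  names = c("A", "B", "C", "D", "E"),
  size  = c(12, 10, 4, 18, 15),
  color = c("red", "blue", "red", "red", "blue")
)

# Edge table from the slide: one row per connection.
edges <- data.frame(
  source = c("A", "B", "C", "C", "D"),
  target = c("B", "C", "D", "A", "B"),
  weight = c(3, 6, 9, 4, 4)
)

# Sanity check: every edge endpoint appears in the node table.
stopifnot(all(c(edges$source, edges$target) %in% nodes$names))

# igraph reads the first two edge columns as endpoints and the first
# vertex column as the node name; as_tbl_graph() wraps the result.
graph <- as_tbl_graph(
  igraph::graph_from_data_frame(edges, directed = TRUE, vertices = nodes)
)
graph
```

Note that node E appears in the node table but in no edge, so it ends up as an isolated node in the graph.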

slide-21
SLIDE 21

How do we interpret networks?

slide-22
SLIDE 22

Spaghetti plot

slide-23
SLIDE 23

Problems

  • Networks are often incomplete (e.g., ego networks)
  • Networks are difficult to visualize
  • Networks are hard to scale
  • Layouts are imposed, not inherent. Graphs can be topologically identical but laid out entirely differently
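The last point can be checked in code. This is a small sketch using igraph with two made-up four-node graphs: their edge lists look different, but both are a single four-node cycle, and any layout is only a set of coordinates assigned afterwards.

```r
library(igraph)

# Two hypothetical graphs whose edge lists look different...
graph_one <- make_graph(c(1, 2, 2, 3, 3, 4, 4, 1), directed = FALSE)
graph_two <- make_graph(c(1, 3, 3, 2, 2, 4, 4, 1), directed = FALSE)

# ...but both are one 4-node cycle, so their topology is the same.
isomorphic(graph_one, graph_two)  # TRUE

# A layout is just a matrix of x/y coordinates, one row per node.
# The same graph can be given very different coordinates.
circle_coords <- layout_in_circle(graph_one)
grid_coords   <- layout_on_grid(graph_one)
```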

slide-24
SLIDE 24

These graphs are the same

slide-25
SLIDE 25

...so are these

slide-26
SLIDE 26

Network measures

Degree measures

  • Degree: how many edges does a node have?
slide-27
SLIDE 27

Network measures - Degree

slide-28
SLIDE 28

Network measures

Degree measures

  • Degree: how many edges does a node have?
  • Strength/weighted degree: degree taking into account the weights of edges

Centrality measures

  • Betweenness centrality: how often a node sits on the shortest paths between other nodes (nodes that could be hubs)
  • Closeness centrality: how near a node is to all other nodes (the center of the graph)
  • Eigenvector centrality: nodes connected to other central nodes (e.g., PageRank)
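As a rough sketch, these measures can be computed with tidygraph's centrality helpers. The data is the node and edge table from the earlier slides, with the columns renamed to name, from, and to (a choice made here for convenience, not something the slides specify). The edges are treated as directed, and connections are counted in both directions where a mode is needed.

```r
library(tidygraph)
library(dplyr)

nodes <- data.frame(name = c("A", "B", "C", "D", "E"))
edges <- data.frame(
  from   = c("A", "B", "C", "C", "D"),
  to     = c("B", "C", "D", "A", "B"),
  weight = c(3, 6, 9, 4, 4)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

measures <- graph %>%
  activate(nodes) %>%
  mutate(
    # Number of edges touching each node, incoming or outgoing.
    degree      = centrality_degree(mode = "all"),
    # Same count, but summing the edge weights instead.
    strength    = centrality_degree(weights = weight, mode = "all"),
    betweenness = centrality_betweenness(directed = FALSE),
    closeness   = centrality_closeness(mode = "all"),
    eigenvector = centrality_eigen()
  ) %>%
  as_tibble()

measures$degree    # 2 3 3 2 0
measures$strength  # 7 13 19 13 0
```

Node E has no edges, so its degree and strength are 0; closeness is not meaningful for an isolated node and may come back as NaN or with a warning, depending on the igraph version.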

slide-29
SLIDE 29

Centrality

A: Betweenness centrality
B: Closeness centrality
C: Eigenvector centrality
D: Degree centrality
E: Harmonic centrality
F: Katz centrality

slide-30
SLIDE 30

Network measures

Degree measures

  • Degree: how many edges does a node have?
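  • Strength/weighted degree: degree taking into account the weights of edges

Centrality measures

  • Betweenness centrality: how often a node sits on the shortest paths between other nodes (nodes that could be hubs)
  • Closeness centrality: how near a node is to all other nodes (the center of the graph)
  • Eigenvector centrality: nodes connected to other central nodes (e.g., PageRank)

Community

  • Modularity/community: groups of nodes that are more densely connected to each other than to the rest of the network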
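Community detection can be sketched with one of tidygraph's grouping helpers. The example below uses the Louvain method, which optimizes modularity; that is one algorithm among several and not necessarily the one used for the slides. The six-node graph (two triangles joined by a single edge) is made up for illustration.

```r
library(tidygraph)
library(dplyr)

# Hypothetical graph: triangle A-B-C, triangle D-E-F, one bridge C-D.
edges <- data.frame(
  from = c("A", "A", "B", "D", "D", "E", "C"),
  to   = c("B", "C", "C", "E", "F", "F", "D")
)

# Louvain community detection works on undirected graphs.
graph <- as_tbl_graph(edges, directed = FALSE)

communities <- graph %>%
  activate(nodes) %>%
  mutate(community = group_louvain()) %>%
  as_tibble()

# Each triangle should come out as its own community.
communities
```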
slide-31
SLIDE 31

Network measures - modularity

slide-32
SLIDE 32

Bipartite networks

slide-33
SLIDE 33

Bipartite networks

Most basic networks support only one type of node: think of connections among students who take courses together. Such a network would not, however, include both courses and students as nodes. Network theory generally assumes that all nodes in a network are of the same type.

slide-34
SLIDE 34

Bipartite networks

Bimodal (bipartite) networks support two node types, but edges in these networks may only connect nodes of different types, never nodes of the same type.

  • Bipartite networks have two kinds of nodes
  • Bipartite networks can be projected into unipartite networks with only one type of node
  • Each bipartite network will have two projections, one for each type of node
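The two projections can be worked out with plain matrix multiplication in base R. This is a minimal sketch with invented students and courses: rows are students, columns are courses, and a 1 marks an enrollment.

```r
# Hypothetical enrollment (incidence) matrix: students x courses.
enrollment <- matrix(
  c(1, 1, 0,
    1, 0, 1,
    0, 1, 1,
    0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(
    student = c("Ana", "Ben", "Cam", "Dee"),
    course  = c("HIST101", "ENGL210", "GEOG305")
  )
)

# Projection to students: entry [i, j] counts the courses that
# students i and j share.
student_projection <- enrollment %*% t(enrollment)
diag(student_projection) <- 0  # drop self-connections

# Projection to courses: entry [i, j] counts the students that
# courses i and j share.
course_projection <- t(enrollment) %*% enrollment
diag(course_projection) <- 0

student_projection
course_projection
```

Here Ana and Dee share no course, so they are not connected in the student projection, while every pair of courses shares exactly one student.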
slide-35
SLIDE 35

Bipartite networks

slide-36
SLIDE 36

Bipartite projected to students

slide-37
SLIDE 37

Bipartite projected to courses

slide-38
SLIDE 38

Networks in R

slide-39
SLIDE 39

What's hard about networks in R?

  • It's a completely different data concept
  • It's kind of messy and very untidy
  • It makes impressive-looking plots
  • It has its own semantics and algorithms
slide-40
SLIDE 40

The network workflow

slide-41
SLIDE 41

The network workflow

Step                      Package
Import                    readr
Convert to graph object   tidygraph
Transform                 tidygraph
Visualize                 ggraph
Model
Communicate
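The first two steps might look like the sketch below. To keep it self-contained, the CSV is supplied as inline text (wrapped in I() so readr treats it as data, which needs readr 2.0 or later); in practice you would pass a file path instead. The column names from, to, and weight are an assumption made for this example.

```r
library(readr)
library(tidygraph)

# Import: stand-in for read_csv("path/to/edges.csv").
edges_csv <- I("from,to,weight\nA,B,3\nB,C,6\nC,D,9\nC,A,4\nD,B,4\n")
edges <- read_csv(edges_csv, show_col_types = FALSE)

# Convert to graph object: an edge list data frame becomes a tbl_graph,
# with the node table built from the names found in from/to.
graph <- as_tbl_graph(edges, directed = TRUE)
graph
```

From here, the transform step uses tidygraph verbs on the graph and the visualize step hands it to ggraph.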

slide-42
SLIDE 42

tidygraph

  • An adaptation and extension of dplyr verbs for working with network data
  • Tidyfication of (almost) all algorithms provided by igraph
  • Unified API for all relational data structures
  • igraph underneath
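A short sketch of what "dplyr verbs for network data" means in practice: you choose which table to work on with activate(), then use the usual verbs. The data reuses the small example network from the earlier slides, with columns renamed to name, from, and to for this example.

```r
library(tidygraph)
library(dplyr)

nodes <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  size  = c(12, 10, 4, 18, 15),
  color = c("red", "blue", "red", "red", "blue")
)
edges <- data.frame(
  from   = c("A", "B", "C", "C", "D"),
  to     = c("B", "C", "D", "A", "B"),
  weight = c(3, 6, 9, 4, 4)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

# Work on the node table: keep only red nodes. Edges that touch a
# removed node are dropped along with it.
red_only <- graph %>%
  activate(nodes) %>%
  filter(color == "red")

# Work on the edge table: keep only the heavier edges. All nodes stay.
heavy_edges <- graph %>%
  activate(edges) %>%
  filter(weight > 3)
```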
slide-43
SLIDE 43

ggraph

  • An adaptation of relational data to ggplot -- not just node-link diagrams
  • Layouts, everything from igraph and more.
  • Dedicated geoms for nodes and edges
  • New facets, guides, and themes.
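A minimal ggraph sketch, assuming the same small example network as the earlier slides (columns renamed to name, from, and to). It is not the graph built in the hands-on part of the workshop; the layout, sizes, and colors are arbitrary choices for illustration.

```r
library(tidygraph)
library(ggplot2)
library(ggraph)

nodes <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  size  = c(12, 10, 4, 18, 15),
  color = c("red", "blue", "red", "red", "blue")
)
edges <- data.frame(
  from   = c("A", "B", "C", "C", "D"),
  to     = c("B", "C", "D", "A", "B"),
  weight = c(3, 6, 9, 4, 4)
)
graph <- tbl_graph(nodes = nodes, edges = edges, directed = TRUE)

network_plot <- ggraph(graph, layout = "circle") +
  # Edge geoms read from the edge table.
  geom_edge_link(aes(edge_width = weight), edge_colour = "grey60") +
  scale_edge_width(range = c(0.5, 2)) +
  # Node geoms read from the node table.
  geom_node_point(aes(size = size, colour = color)) +
  scale_colour_identity() +  # use the color names as literal colors
  geom_node_text(aes(label = name), vjust = -1) +
  theme_graph(base_family = "sans")

network_plot
```

Swapping layout = "circle" for another layout (for example "fr") changes where the nodes sit but not the network itself.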
slide-44
SLIDE 44

Let's make this graph together

slide-45
SLIDE 45

https://tinyurl.com/unogot

Let's make this graph together

slide-46
SLIDE 46

Questions? Troubleshooting?

Next workshop: March 17, 1:30p-3p: Making Networks (CL 112)