slide-1
SLIDE 1

Agent-Based Modeling and Simulation of Collaborative Social Networks

Research in Progress

Greg Madey, Yongqin Gao (Computer Science & Engineering, University of Notre Dame)
Vincent Freeh (Computer Science, North Carolina State University)
Renee Tynan, Chris Hoffman (Department of Management, University of Notre Dame)

Supported in part by the National Science Foundation - Digital Society & Technology Program

AMCIS2003 Tampa, FL August 2003

slide-2
SLIDE 2

Outline

  • Definitions: agents, models, simulations, collaborative social networks, computer experiments
  • Phenomenon: Free/Open Source Software (F/OSS)
  • Conceptual models
    – ER model
    – BA model
    – BA model with constant fitness
    – BA model with dynamic fitness
  • Experiments and results
  • Summary
  • Some discussion questions

slide-3
SLIDE 3

Agent-Based Modeling and Simulation

  • Conceptual models of a phenomenon
  • Simulations are computer implementations of the conceptual models
  • Agents in models and simulations are distinct entities (instantiated objects)
    – Tend to be simple, but with large numbers of them (thousands or more), i.e., swarm intelligence
    – Contrasted with higher-level “intelligent agents”
  • Foundations in complexity theory
    – Self-organization
    – Emergence

slide-4
SLIDE 4

Collaborative Social Networks

  • Research-paper co-authorship, small world phenomenon, e.g., Erdős number (Barabási 2001, Newman 2001)
  • Movie actors, small world phenomenon, e.g., Kevin Bacon number (Watts 1999, 2003)
  • Interlocking corporate directorships
  • Open-source software developers (Madey et al., AMCIS 2002)
  • Collaborators are nodes in a graph, and collaborative relationships are the edges of the graph

slide-5
SLIDE 5

Classical Scientific Method

  1. Observe the world
     a) Identify a puzzling phenomenon
  2. Generate a falsifiable hypothesis (K. Popper)
  3. Design and conduct an experiment with the goal of disproving the hypothesis
     a) If the experiment “fails”, then the hypothesis is accepted (until replaced)
     b) If the experiment “succeeds”, then reject the hypothesis, but additional insight into the phenomenon may be obtained and steps 2-3 repeated

slide-6
SLIDE 6

The Computer Experiment

slide-7
SLIDE 7

Agent-Based Simulation as a Component of the Scientific Method

[Diagram: Modeling (Hypothesis), Agent-Based Simulation (Experiment), Observation]

slide-8
SLIDE 8

Agent-Based Simulation as a Component of the Scientific Method

[Diagram, annotated for this study]

  • Modeling (Hypothesis): Social Network Model of F/OSS
  • Agent-Based Simulation (Experiment): Grow Artificial SourceForge
  • Observation: Analysis of SourceForge Data

slide-9
SLIDE 9

Open Source Software (OSS)

  • Free ...
    – to view source
    – to modify
    – to share
    – of cost
  • Examples
    – Apache
    – Perl
    – GNU
    – Linux
    – Sendmail
    – Python
    – KDE
    – GNOME
    – Mozilla
    – Thousands more

[Logos: Linux, GNU, Savannah]

slide-10
SLIDE 10

Free Open Source Software (F/OSS)

  • Development
    – Mostly volunteer
    – Global teams
    – Virtual teams
    – Self-organized, often a peer-based meritocracy
    – Self-managed, but often with a “charismatic” leader
    – Often large numbers of developers, testers, support help, end user participation
    – Rapid, frequent releases
    – Mostly unpaid

slide-11
SLIDE 11

F/OSS Developers

  • Linus Torvalds: Linux
  • Larry Wall: Perl
  • Richard Stallman: GNU, GNU Manifesto
  • Eric Raymond: The Cathedral and the Bazaar

slide-12
SLIDE 12

F/OSS: A Puzzling Phenomenon

  • Contradicts traditional wisdom:
    – Software engineering
    – Coordination, large numbers
    – Motivation of developers
    – Quality
    – Security
    – Business strategy
  • Almost everything is done electronically and available in digital form
  • Opportunity for IS Research: large amounts of online data available
  • Research issues:
    – Understanding motives
    – Understanding processes
    – Intellectual property
    – Digital divide
    – Self-organization
    – Government policy
    – Impact on innovation
    – Ethics
    – Economic models
    – Cultural issues
    – International factors

slide-13
SLIDE 13

SourceForge

  • VA Software
  • Part of OSDN
  • Started 12/1999
  • Collaboration tools
  • 58,685 Projects
  • 80,000 Developers
  • 590,00 Registered Users

slide-14
SLIDE 14

Savannah

  • Uses SourceForge Software
  • Free Software Foundation
  • 1,508 Projects
  • 15,265 Registered Users

slide-15
SLIDE 15

F/OSS: Importance

  • Major component of e-Technology infrastructure, with major presence in
    – e-Commerce
    – e-Science
    – e-Government
    – e-Learning
  • Apache has over 65% market share of Internet Web servers
  • Linux on over 7 million computers
  • Most Internet e-mail runs on Sendmail
  • Tens of thousands of quality products
  • Part of product offerings of companies like IBM, Apple
    – Apache in WebSphere, Linux on mainframe, FreeBSD in OSX
  • Corporate employees participating on OSS projects

slide-16
SLIDE 16

Free/Open Source Software

  • Seems to challenge traditional economic assumptions
  • Model for software engineering
  • New business strategies
    – Cooperation with competitors
    – Beyond trade associations, shared industry research, and standards processes: shared product development!
  • Virtual, self-organizing and self-managing teams
  • Social issues, e.g., digital divide, international participation
  • Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property

slide-17
SLIDE 17

Research Model

[Diagram: three linked research components]

  • Understanding the Social and Task Dynamics that Predict Developer Behaviors
  • Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment
  • Conceptual Explanatory Model of OSS: Agent-Based Modeling and Simulation

Link labels in the diagram: Parameter Values, Structural Features, Cross Validation, Combined Data Mining

slide-18
SLIDE 18

Observations

  • Web mining
  • Web crawler (scripts)
    – Python
    – Perl
    – AWK
    – Sed
  • Monthly
  • Since Jan 2001
  • ProjectID
  • DeveloperID
  • Almost 2 million records
  • Relational database

PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
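
The PROJ|DEVELOPER rows above are enough to build a developer collaboration graph. The following is a minimal sketch, not the authors' pipeline: the function names `parse_records` and `collaboration_graph` are hypothetical, and it assumes two developers are linked whenever they share at least one project.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the slide; the full table has almost 2 million records.
RECORDS = """\
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
"""

def parse_records(text):
    """Return (project_id, developer_id) pairs from PROJ|DEVELOPER rows."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        project, developer = line.split("|")
        pairs.append((project, developer))
    return pairs

def collaboration_graph(pairs):
    """Link two developers whenever they appear on the same project (assumption)."""
    members = defaultdict(set)
    for project, developer in pairs:
        members[project].add(developer)
    graph = defaultdict(set)
    for devs in members.values():
        for dev in devs:
            graph[dev]  # touch the key so solo developers still appear as nodes
        for a, b in combinations(sorted(devs), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

graph = collaboration_graph(parse_records(RECORDS))
```

In this sample, dev8975 sits on projects 8001 and 8008, so it is the only developer who could connect two projects.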

slide-19
SLIDE 19

Models of the F/OSS Social Network (Alternative Hypotheses)

  • General model features
    – Agents are nodes on a graph (developers or projects)
    – Behaviors: create, join, abandon and idle
    – Edges are relationships (joint project participation)
    – Growth of network: random or types of preferential attachment, formation of clusters
    – Fitness
    – Network attributes: diameter, average degree, degree distribution, clustering coefficient
  • Four specific models
    – ER (random graph) - (1960)
    – BA (preferential attachment) - (1999)
    – BA (+ constant fitness) - (2001)
    – BA (+ dynamic fitness) - (2003)
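
The network attributes listed above can be computed directly from an adjacency map. The sketch below uses common textbook definitions (mean local clustering, diameter as the longest shortest path); the slides do not say which variants the authors used, so these choices and all function names are assumptions.

```python
from collections import Counter, deque

def average_degree(graph):
    """Mean number of neighbours per node."""
    return sum(len(nbrs) for nbrs in graph.values()) / len(graph)

def degree_distribution(graph):
    """Map each degree k to the number of nodes with that degree."""
    return Counter(len(nbrs) for nbrs in graph.values())

def clustering_coefficient(graph):
    """Mean local clustering; nodes with fewer than two neighbours count as 0."""
    total = 0.0
    for nbrs in graph.values():
        k = len(nbrs)
        if k < 2:
            continue
        nbr_list = list(nbrs)
        links = sum(
            1
            for i, a in enumerate(nbr_list)
            for b in nbr_list[i + 1:]
            if b in graph[a]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)

def eccentricity(graph, source):
    """Longest shortest-path length from source inside its component (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())

def diameter(graph):
    """Largest eccentricity; on a disconnected graph, the max over components."""
    return max(eccentricity(graph, node) for node in graph)

# Toy graph: triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
```

The breadth-first diameter is quadratic in the number of nodes, which is fine for toy graphs but would need sampling or a graph library at SourceForge scale.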

slide-20
SLIDE 20

[Figure: F/OSS Developers - Collaboration Social Network. Developers are nodes / Projects are links. 24 Developers, 5 Projects (6882, 9859, 7597, 7028, 15850), 2 Linchpin Developers, 1 Cluster.]

Edge list (project, developer, developer):
15850 dev[46] dev[83]
15850 dev[46] dev[48]
15850 dev[46] dev[56]
15850 dev[46] dev[58]
6882 dev[58] dev[47]
6882 dev[47] dev[79]
6882 dev[47] dev[52]
6882 dev[47] dev[55]
7028 dev[46] dev[99]
7028 dev[46] dev[51]
7028 dev[46] dev[57]
7597 dev[46] dev[45]
7597 dev[46] dev[72]
7597 dev[46] dev[55]
7597 dev[46] dev[58]
7597 dev[46] dev[61]
7597 dev[46] dev[64]
7597 dev[46] dev[67]
7597 dev[46] dev[70]
9859 dev[46] dev[49]
9859 dev[46] dev[53]
9859 dev[46] dev[54]
9859 dev[46] dev[59]

slide-21
SLIDE 21

Computer Experiments

  • Agent-based simulations
  • Java programs using Swarm class library
    – Validation (docking) exercises using Java/Repast
  • Grow artificial SourceForges (Epstein & Axtell, 1996)
    – Parameterized with observed data, e.g., developer behaviors
      • Join rates
      • New project additions
      • Leave projects
    – Evaluation of four models (hypotheses)
    – Verification/validation
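
To make "growing" an artificial SourceForge concrete, here is a minimal sketch of one such simulation loop. It is written in Python rather than the Java/Swarm code the slides describe, the function `grow_network` is hypothetical, and the probabilities are placeholders rather than rates measured from SourceForge. Joining picks a project uniformly at random, which corresponds to the ER-style hypothesis.

```python
import random

def grow_network(steps, p_create=0.1, p_join=0.3, p_leave=0.05, seed=0):
    """Grow a project/developer network for a number of time steps.

    Each step one new developer arrives, then every developer either
    creates a project, joins one, leaves one, or idles. All rates are
    illustrative placeholders, not observed values.
    """
    rng = random.Random(seed)
    projects = {0: set()}   # project id -> set of developer ids
    developers = []
    for step in range(steps):
        developers.append(step)
        for dev in developers:
            roll = rng.random()
            if roll < p_create:
                # Projects are never deleted, so len(projects) is a fresh id.
                projects[len(projects)] = {dev}
            elif roll < p_create + p_join:
                # Uniform choice over projects: the random (ER-style) hypothesis.
                rng.choice(list(projects.values())).add(dev)
            elif roll < p_create + p_join + p_leave:
                mine = [members for members in projects.values() if dev in members]
                if mine:
                    rng.choice(mine).discard(dev)
            # Otherwise the developer idles this step.
    return projects

projects = grow_network(50, seed=1)
```

Swapping the uniform `rng.choice` for a weighted choice is the only change needed to test a preferential-attachment hypothesis in the same loop.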

slide-22
SLIDE 22

Four Cycles of Modeling & Simulation

  • Modeling (Hypothesis): Social Network Models
    – ER => BA => BA+Fitness => BA+Dynamic Fitness
  • Agent-Based Simulation (Experiment): Grow Artificial SourceForge
  • Observation: Analysis of SourceForge Data
    – Degree Distribution
    – Average Degree
    – Diameter
    – Clustering Coefficient
    – Cluster Size Distribution

slide-23
SLIDE 23

ER model – degree distribution

  • Degree distribution is binomial, while it is a power law in the empirical data
  • Fit fails
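
For reference, the two distribution shapes being compared can be written out. This uses the textbook Erdős–Rényi graph with n nodes and edge probability p; the symbols n, p and the exponent γ are standard notation, not values taken from the slides.

```latex
P_{\mathrm{ER}}(k) = \binom{n-1}{k}\, p^{k} (1-p)^{\,n-1-k}
\qquad\text{versus}\qquad
P_{\mathrm{empirical}}(k) \propto k^{-\gamma}
```

The binomial concentrates around its mean p(n-1), so it produces almost no very highly connected nodes, whereas a power law has a heavy tail that allows a few such hubs.
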
slide-24
SLIDE 24

ER model - diameter

  • Average degree is decreasing, while it is increasing in the empirical data
  • Diameter is increasing, while it is decreasing in the empirical data
  • Fit fails

slide-25
SLIDE 25

ER model – clustering coefficient

  • Clustering coefficient is relatively low, around 0.4, while it is around 0.7 in the empirical data
  • Clustering coefficient is decreasing, while it is increasing in the empirical data
  • Fit fails

slide-26
SLIDE 26

ER model – cluster distribution

  • Cluster distribution in the ER model also follows a power law, with R² of 0.6667 (0.9953 without the major cluster), while R² in the empirical data is 0.7457 (0.9797 without the major cluster)
  • The actual distribution is different from the empirical data
  • The later models (BA and further models) have similar behaviors
  • Fit fails

slide-27
SLIDE 27

BA model – degree distribution

  • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data).
  • For developer distribution: simulated data has R² of 0.9798 and empirical data has R² of 0.9712.
    – Fit succeeds
  • For project distribution: simulated data has R² of 0.6650 and empirical data has R² of 0.9815.
    – Fit fails
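
The slides report R² values for power-law fits without saying how they were computed. One common approach, assumed here, is a least-squares line through the log-log degree distribution; the function name `power_law_r2` is hypothetical.

```python
import math

def power_law_r2(degree_counts):
    """Fit log(count) = a + b * log(k) by least squares.

    degree_counts maps degree k to the number of nodes with that degree.
    Returns (b, r_squared); b estimates the power-law exponent (negative
    for a decaying distribution). Assumes at least two distinct degrees
    with differing counts.
    """
    pts = [
        (math.log(k), math.log(c))
        for k, c in degree_counts.items()
        if k > 0 and c > 0
    ]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared

# Synthetic check: counts that follow an exact power law with exponent -2.
counts = {k: 1000.0 * k ** -2 for k in range(1, 11)}
slope, r2 = power_law_r2(counts)
```

On exact power-law input the fit recovers the exponent and an R² of essentially 1; real degree data are noisier, especially in the sparse tail.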

slide-28
SLIDE 28

BA model – diameter and CC

  • Small diameter and high clustering coefficient, like the empirical data
  • Diameter and clustering coefficient are both decreasing, like the empirical data
  • Fit succeeds

slide-29
SLIDE 29

BA model with constant fitness

  • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data).
  • For developer distribution: simulated data has R² of 0.9742 and empirical data has R² of 0.9712.
    – Fit succeeds
  • For project distribution: simulated data has R² of 0.7253 and empirical data has R² of 0.9815.
    – Fit fails
  • Diameter and CC are similar to the simple BA model.
    – Fit succeeds

slide-30
SLIDE 30

Discovery: BA with dynamic fitness

  • Problem with BA with constant fitness
    – Intuition: project fitness might change with time.
  • Data mining observation: project “life cycle” property - fitness generally decreases with time
  • New model not in the literature
    – Hypothesis: BA with dynamic fitness of projects
    – Computer experiment
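
A sketch of how dynamic fitness could enter the attachment rule: a project's chance of attracting a developer is its size times a fitness that decays with age. The geometric decay, the `size + 1` weighting and all names below are assumptions for illustration; the slides state only that fitness generally decreases with time.

```python
import random

def fitness(age, initial=1.0, decay=0.9):
    """Hypothetical life-cycle curve: fitness shrinks geometrically with age."""
    return initial * decay ** age

def join_weights(sizes, ages, dynamic=True):
    """Attachment weight per project: (current size + 1) * fitness.

    With dynamic=False fitness is held at its initial value, so the
    weights depend on project size alone.
    """
    weights = []
    for size, age in zip(sizes, ages):
        f = fitness(age) if dynamic else fitness(0)
        weights.append((size + 1) * f)
    return weights

def choose_project(sizes, ages, rng, dynamic=True):
    """Pick the index of the project a developer joins, proportional to weight."""
    weights = join_weights(sizes, ages, dynamic)
    return rng.choices(range(len(sizes)), weights=weights, k=1)[0]

rng = random.Random(42)
sizes, ages = [9, 9, 0], [0, 10, 0]
picks = [choose_project(sizes, ages, rng) for _ in range(1000)]
```

With these inputs, two equally large projects get different weights because one is ten steps older, which is the effect the constant-fitness model cannot express.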

slide-31
SLIDE 31

BA model with dynamic fitness

  • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data).
  • For developer distribution: simulated data has R² of 0.9695 and empirical data has R² of 0.9712.
    – Fit succeeds (as before)
  • For project distribution: simulated data has R² of 0.8051 and empirical data has R² of 0.9815.
    – Fit is better, but more work needed

slide-32
SLIDE 32

Agent-Based Modeling and Simulation as Components of the Scientific Method

[Diagram: Observation, Hypothesis, Experiment]

slide-33
SLIDE 33

Summary

  • Why Agent-Based Modeling and Simulation?
    – Can be used as components of the Scientific Method
    – A research approach for studying socio-technical systems
  • Case study: F/OSS - Collaboration Social Networks
    – SourceForge conceptual models: ER, BA, BA with constant fitness and BA with dynamic fitness.
    – Simulations
      • Computer experiments that tested conceptual models
      • Provided insight into the phenomenon under study and guided data mining of collected observations

slide-34
SLIDE 34

Discussion

“The social sciences are, in fact, the ‘hard’ sciences”, Herbert Simon (1987)

  • Computational social science: agent-based modeling and simulation
  • Kuhn’s periods of “Normal Science” punctuated by “Paradigm shifts”
  • Karl Popper’s “theory-testing through falsification”
  • Relevant literature on the role of simulation in the process of scientific discovery

slide-35
SLIDE 35

Thank you