Conceptual Framework for Agent-Based Modeling and Simulation: The Computer Experiment


SLIDE 1

Conceptual Framework for Agent-Based Modeling and Simulation: The Computer Experiment

Yongqin Gao, CSE Department, University of Notre Dame
Vincent Freeh, CS Department, NCSU
Greg Madey, CSE Department, University of Notre Dame

NAACSOS Conference, Pittsburgh, PA, June 25, 2003

Supported in part by the National Science Foundation - Digital Society & Technology Program

SLIDE 2

The Computer Experiment

SLIDE 3

Agent-Based Simulation as a Component of the Scientific Method

[Diagram labels: Modeling (Hypothesis); Agent-Based Simulation (Experiment); Observation; Social Network Model of F/OSS; Grow Artificial SourceForge; Analysis of SourceForge Data]

SLIDE 4

Outline

  • Investigation: Free/Open Source Software (F/OSS)
  • Conceptual framework(s)
  • Model description
  • ER model
  • BA model
  • BA model with constant fitness
  • BA model with dynamic fitness
  • Summary

SLIDE 5

Open Source Software (OSS)

  • Free …
    – to view source
    – to modify
    – to share
    – of cost
  • Examples
    – Apache
    – Perl
    – GNU
    – Linux
    – Sendmail
    – Python
    – KDE
    – GNOME
    – Mozilla
    – Thousands more

[Image labels: Linux, GNU, Savannah]

SLIDE 6

Free Open Source Software (F/OSS)

  • Development
    – Mostly volunteer
    – Global teams
    – Virtual teams
    – Self-organized - often peer-based meritocracy
    – Self-managed - but often a “charismatic” leader
    – Often large numbers of developers, testers, support help, end user participation
    – Rapid, frequent releases
    – Mostly unpaid

SLIDE 7

Typical Charismatic Leaders?

  • Linus Torvalds: Linux
  • Larry Wall: Perl
  • Richard Stallman: GNU Manifesto
  • Eric Raymond: Cathedral and Bazaar

SLIDE 8

F/OSS: Significance

  • Contradicts traditional wisdom:
    – Software engineering
    – Coordination, large numbers
    – Motivation of developers
    – Quality
    – Security
    – Business strategy
  • Almost everything is done electronically and available in digital form
  • Opportunity for Social Science Research -- large amounts of online data available
  • Research issues:
    – Understanding motives
    – Understanding processes
    – Intellectual property
    – Digital divide
    – Self-organization
    – Government policy
    – Impact on innovation
    – Ethics
    – Economic models
    – Cultural issues
    – International factors

SLIDE 9

SourceForge

  • VA Software
  • Part of OSDN
  • Started 12/1999
  • Collaboration tools
  • 58,685 Projects
  • 80,000 Developers
  • 590,00 Registered Users

SLIDE 10

Savannah

  • Uses SourceForge Software
  • Free Software Foundation
  • 1,508 Projects
  • 15,265 Registered Users

SLIDE 11

F/OSS: Importance

  • Major component of e-Technology infrastructure, with major presence in
    – e-Commerce
    – e-Science
    – e-Government
    – e-Learning
  • Apache has over 65% market share of Internet Web servers
  • Linux on over 7 million computers
  • Most Internet e-mail runs on Sendmail
  • Tens of thousands of quality products
  • Part of product offerings of companies like IBM, Apple
    – Apache in WebSphere, Linux on mainframe, FreeBSD in OSX
  • Corporate employees participating on OSS projects

SLIDE 12

Free/Open Source Software

  • Seems to challenge traditional economic assumptions
  • Model for software engineering
  • New business strategies
    – Cooperation with competitors
    – Beyond trade associations, shared industry research, and standards processes - shared product development!
  • Virtual, self-organizing and self-managing teams
  • Social issues, e.g., digital divide, international participation
  • Government policy issues, e.g., US software industry, impact on innovation, security, intellectual property

SLIDE 13

Research Model

[Diagram components]
  • Understanding the Social and Task Dynamics that Predict Developer Behaviors
  • Social Network Analysis: Longitudinal Study of Preferential Attachment and Dynamic Attachment
  • Conceptual Explanatory Model of OSS: Agent-Based Modeling and Simulation

[Connector labels: Parameter Values, Structural Features, Cross Validation, Combined Data Mining]

SLIDE 14

Data Collection - Monthly

  • Web crawler (scripts)
    – Python
    – Perl
    – AWK
    – Sed
  • Monthly
  • Since Jan 2001
  • ProjectID
  • DeveloperID
  • Almost 2 million records
  • Relational database

PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975
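The sample rows above can be turned into a developer network in a few lines. The following is a minimal sketch, not the authors' crawler or database code: it assumes the records arrive as pipe-delimited text with a header row, and the function names `parse_records` and `developer_edges` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the slide (PROJ|DEVELOPER).
RECORDS = """PROJ|DEVELOPER
8001|dev378
8001|dev8975
8001|dev9972
8002|dev27650
8005|dev31351
8006|dev12509
8007|dev19395
8007|dev4622
8007|dev35611
8008|dev8975"""


def parse_records(text):
    """Return {project_id: set of developer ids} from PROJ|DEVELOPER rows."""
    members = defaultdict(set)
    for line in text.splitlines()[1:]:  # skip the header row
        project, developer = line.strip().split("|")
        members[project].add(developer)
    return members


def developer_edges(members):
    """Link every pair of developers who share at least one project."""
    edges = set()
    for devs in members.values():
        for a, b in combinations(sorted(devs), 2):
            edges.add((a, b))
    return edges


members = parse_records(RECORDS)
edges = developer_edges(members)
print(len(members), "projects,", len(edges), "developer links")
```

In this sample, dev8975 appears on both project 8001 and project 8008, which is the kind of shared membership that connects projects in the network.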

SLIDE 15

F/OSS Developers - Social Network Component

Developers are nodes / Projects are links

Project  Developer  Developer
15850    dev[46]    dev[83]
15850    dev[46]    dev[48]
15850    dev[46]    dev[56]
15850    dev[46]    dev[58]
6882     dev[58]    dev[47]
6882     dev[47]    dev[79]
6882     dev[47]    dev[52]
6882     dev[47]    dev[55]
7028     dev[46]    dev[99]
7028     dev[46]    dev[51]
7028     dev[46]    dev[57]
7597     dev[46]    dev[45]
7597     dev[46]    dev[72]
7597     dev[46]    dev[55]
7597     dev[46]    dev[58]
7597     dev[46]    dev[61]
7597     dev[46]    dev[64]
7597     dev[46]    dev[67]
7597     dev[46]    dev[70]
9859     dev[46]    dev[49]
9859     dev[46]    dev[53]
9859     dev[46]    dev[54]
9859     dev[46]    dev[59]

[Network diagram: 24 Developers, 5 Projects (6882, 7028, 7597, 9859, 15850), 2 Linchpin Developers, 1 Cluster]
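One way to check the "1 Cluster" reading of this edge list is to count connected components. The sketch below assumes that a cluster means a connected component, which the slide does not state explicitly; the rows are copied from the listing, with developer ids reduced to their numbers, and `build_graph` and `components` are hypothetical helper names.

```python
from collections import defaultdict, deque

# (project, developer, developer) rows as listed on the slide.
ROWS = [
    (15850, 46, 83), (15850, 46, 48), (15850, 46, 56), (15850, 46, 58),
    (6882, 58, 47), (6882, 47, 79), (6882, 47, 52), (6882, 47, 55),
    (7028, 46, 99), (7028, 46, 51), (7028, 46, 57),
    (7597, 46, 45), (7597, 46, 72), (7597, 46, 55), (7597, 46, 58),
    (7597, 46, 61), (7597, 46, 64), (7597, 46, 67), (7597, 46, 70),
    (9859, 46, 49), (9859, 46, 53), (9859, 46, 54), (9859, 46, 59),
]


def build_graph(rows):
    """Undirected adjacency sets: developers are nodes, shared projects are links."""
    adj = defaultdict(set)
    for _project, a, b in rows:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def components(adj):
    """Return the connected components as a list of node sets (breadth-first search)."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, comp = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    comp.add(nxt)
                    queue.append(nxt)
        result.append(comp)
    return result


adj = build_graph(ROWS)
clusters = components(adj)
projects = {project for project, _a, _b in ROWS}
print(len(projects), "projects,", len(clusters), "cluster(s)")
```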

SLIDE 16

Models of the F/OSS Social Network (Alternative Hypotheses)

  • General model features
    – Agents are nodes on a graph (developers or projects)
    – Behaviors: Create, join, abandon and idle
    – Edges are relationships (joint project participation)
    – Growth of network: random or types of preferential attachment, formation of clusters
    – Fitness
    – Network attributes: diameter, average degree, power law, clustering coefficient
  • Four specific models
    – ER (random graph)
    – BA (scale free)
    – BA (+ constant fitness)
    – BA (+ dynamic fitness)
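The four behaviors listed above (create, join, abandon, idle) can be sketched as a single agent step. This is an illustration only, not the authors' simulation: the action weights are hypothetical values, the `step` function is an assumed name, and the join rule shown is the random (ER-style) choice in which every project is equally likely.

```python
import random

ACTIONS = ("create", "join", "abandon", "idle")


def step(projects, developer, rng, weights=(0.1, 0.3, 0.1, 0.5)):
    """Apply one randomly chosen behavior for `developer`.

    `projects` maps project id -> set of developer ids. The weights are
    hypothetical, not values from the slides.
    """
    action = rng.choices(ACTIONS, weights=weights)[0]
    if action == "create":
        # Projects are never deleted here, so len(projects) is a fresh id.
        projects[len(projects)] = {developer}
    elif action == "join" and projects:
        # Random (ER-style) choice: every project is equally likely.
        projects[rng.choice(sorted(projects))].add(developer)
    elif action == "abandon":
        joined = [p for p, devs in sorted(projects.items()) if developer in devs]
        if joined:
            projects[rng.choice(joined)].discard(developer)
    # "idle" changes nothing.
    return action


rng = random.Random(0)
projects = {}
history = [step(projects, developer=t % 20, rng=rng) for t in range(200)]
print(len(projects), "projects after", len(history), "steps")
```

Swapping the uniform choice in the join branch for a degree-weighted or fitness-weighted choice is what distinguishes the other model variants.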

SLIDE 17

ER model - degree distribution

  • Degree distribution is binomial in the ER model, while it follows a power law in the empirical data
  • R² = 0.9712 for the empirical developer network
  • R² = 0.9815 for the empirical project network
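The R² values quoted here can be read as goodness-of-fit scores for a power law. A minimal sketch follows, assuming R² is the coefficient of determination of a straight-line fit on log-log axes (the slides do not say how it was computed); `loglog_fit` is a hypothetical name and the input data is synthetic.

```python
import math


def loglog_fit(degrees, counts):
    """Least-squares line through (log k, log count).

    Returns (slope, intercept, r_squared).
    """
    xs = [math.log(k) for k in degrees]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic data that follows count = 1000 * k^-2 exactly.
degrees = list(range(1, 11))
counts = [1000 * k ** -2 for k in degrees]
slope, intercept, r2 = loglog_fit(degrees, counts)
print(round(slope, 3), round(r2, 3))
```

A binomial degree distribution, as produced by the ER model, bends away from a straight line on log-log axes, so the same fit would score a lower R².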

SLIDE 18

ER model - diameter

  • Average degree is decreasing, while it is increasing in the empirical data
  • Diameter is increasing, while it is decreasing in the empirical data
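Both attributes compared on this slide can be computed directly from an adjacency structure. The sketch below assumes diameter means the longest shortest path in a connected graph; the slides do not give their exact definition, and the function names and toy graph are illustrative.

```python
from collections import deque


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def eccentricity(adj, source):
    """Largest shortest-path distance from `source`, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path; assumes the graph is connected."""
    return max(eccentricity(adj, node) for node in adj)


# Toy path graph 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(average_degree(path), diameter(path))
```

Tracking these two numbers after every simulated month gives the trend lines that the slide compares against the empirical data.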

SLIDE 19

ER model - clustering coefficient

  • Clustering coefficient is relatively low, around 0.4, while it is around 0.7 in the empirical data.
  • Clustering coefficient is decreasing, while it is increasing in the empirical data
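For reference, a clustering coefficient can be computed as the average, over all nodes, of the fraction of a node's neighbor pairs that are themselves linked. This is the common Watts-Strogatz definition and is an assumption here, since the slides do not say which variant was used; the toy graph is illustrative.

```python
from itertools import combinations


def local_clustering(adj, node):
    """Fraction of pairs of `node`'s neighbors that are directly linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, node) for node in adj) / len(adj)


# Toy graph: triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cc = average_clustering(toy)
print(round(cc, 4))
```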

SLIDE 20

ER model - cluster distribution

  • Cluster distribution in the ER model also follows a power law, with R² of 0.6667 (0.9953 without the major cluster), while R² in the empirical data is 0.7457 (0.9797 without the major cluster)
  • The actual distribution is different from the empirical data
  • The later models (BA and further models) have similar behaviors

SLIDE 21

BA model - degree distribution

  • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data).
  • For developer distribution: simulated data has R² of 0.9798 and empirical data has R² of 0.9712.
  • For project distribution: simulated data has R² of 0.6650 and empirical data has R² of 0.9815.
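The power-law degree distribution of the BA model comes from preferential attachment: new nodes link to existing nodes with probability proportional to degree. The sketch below is the textbook Barabási-Albert growth rule on a plain graph, not the authors' developer/project simulation; `grow_ba`, its parameters, and the seed graph are assumptions.

```python
import random


def grow_ba(n_nodes, m, rng):
    """Grow a graph where each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    # Seed: a complete graph on m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from the list is a degree-proportional draw.
    stubs = [i for i, nbrs in adj.items() for _ in nbrs]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            stubs.extend((new, t))
    return adj


adj = grow_ba(200, 2, random.Random(42))
max_degree = max(len(nbrs) for nbrs in adj.values())
print(len(adj), "nodes, highest degree", max_degree)
```

Early, well-connected nodes keep attracting links, which is what produces the heavy tail that the ER model lacks.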

SLIDE 22

BA model - diameter and CC

  • Small diameter and high clustering coefficient, like the empirical data
  • Diameter and clustering coefficient are both decreasing, like the empirical data

SLIDE 23

BA model with constant fitness

  • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data).
  • For developer distribution: simulated data has R² of 0.9742 and empirical data has R² of 0.9712.
  • For project distribution: simulated data has R² of 0.7253 and empirical data has R² of 0.9815.
  • Diameter and CC are similar to the simple BA model.
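A common way to add constant fitness to preferential attachment is to weight each node by fitness times degree, with fitness fixed when the node is created. The sketch below follows that reading; the slides do not give the authors' exact rule, and the uniform fitness range, `grow_with_fitness`, and its parameters are all assumptions.

```python
import random


def grow_with_fitness(n_nodes, m, rng):
    """Preferential attachment where the attachment weight of a node is
    fitness * degree, and fitness never changes after node creation."""
    # Seed: a complete graph on m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    # Hypothetical fitness distribution: uniform on [0.1, 1.0].
    fitness = {i: rng.uniform(0.1, 1.0) for i in adj}
    for new in range(m + 1, n_nodes):
        nodes = list(adj)
        weights = [fitness[i] * len(adj[i]) for i in nodes]
        targets = set()
        while len(targets) < m:
            targets.add(rng.choices(nodes, weights=weights)[0])
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
        fitness[new] = rng.uniform(0.1, 1.0)
    return adj, fitness


adj, fitness = grow_with_fitness(100, 2, random.Random(7))
print(len(adj), "nodes,", sum(len(n) for n in adj.values()) // 2, "edges")
```

With fitness in the weight, a late but fit node can overtake older nodes, which pure degree-based attachment does not allow.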

SLIDE 24

BA model with dynamic fitness

  • Power laws in degree distribution, similar to empirical data (+ for simulated data and x for empirical data).
  • For developer distribution: simulated data has R² of 0.9695 and empirical data has R² of 0.9712.
  • For project distribution: simulated data has R² of 0.8051 and empirical data has R² of 0.9815.

SLIDE 25

Advantage of BA with dynamic fitness

  • Intuition: Fitness should decrease with time.
  • Statistics: projects have life cycle behavior, which cannot be replicated by the BA model with constant fitness but can be replicated by the BA model with dynamic fitness
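The intuition that fitness should decrease with time can be illustrated with a decaying fitness term in the attachment weight. The half-life decay below is a hypothetical choice made for illustration; the slides do not specify the decay function, and `dynamic_fitness`, `attachment_weights`, and the sample projects are assumed names and values.

```python
def dynamic_fitness(initial, age, half_life=12.0):
    """Fitness after `age` time steps, halving every `half_life` steps
    (hypothetical decay form)."""
    return initial * 0.5 ** (age / half_life)


def attachment_weights(projects, now):
    """Weight of each project as (decayed fitness) * degree.

    `projects` maps id -> (initial_fitness, birth_step, degree).
    """
    return {
        pid: dynamic_fitness(f0, now - born) * degree
        for pid, (f0, born, degree) in projects.items()
    }


# Two projects with equal initial fitness and degree; only their age differs.
projects = {"old": (1.0, 0, 10), "new": (1.0, 24, 10)}
weights = attachment_weights(projects, now=24)
print(weights)
```

Under this rule an aging project attracts fewer new developers even if it is already large, which is one way to reproduce a project life cycle.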

SLIDE 26

Conceptual Framework

Agent-Based Modeling and Simulation as Components of the Scientific Method

[Diagram labels: Observation, Hypothesis, Experiment]

SLIDE 27

Summary

  • We use ABM to model and simulate the SourceForge collaboration network.
  • A conceptual framework is proposed for agent-based modeling and simulation.
  • Case study of this framework: SourceForge study through ER, BA, BA with constant fitness and BA with dynamic fitness.

SLIDE 28

Thank you