Artificial Immune Systems and Data Mining: Bridging the Gap with Scalability and Improved Learning



slide-1
SLIDE 1

NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD

Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining

Artificial Immune Systems and Data Mining: Bridging the Gap with Scalability and Improved Learning

Olfa Nasraoui, Fabio González, Cesar Cardona, Dipankar Dasgupta
The University of Memphis

A Demo/Poster at the National Science Foundation Workshop on Next Generation Data Mining, Nov. 2002

slide-2
SLIDE 2

Inspired by Nature…

  • Living organisms exhibit extremely sophisticated learning and processing abilities that allow them to survive and proliferate.
  • Nature has always served as inspiration for scientific and technological developments, e.g., neural networks and evolutionary computation.
  • The immune system is a parallel and distributed adaptive system with tremendous potential in many intelligent computing applications.

slide-3
SLIDE 3

What is the Immune System?

  • Protects our bodies from foreign pathogens (viruses/bacteria)
  • Innate immune system (initial, limited; e.g., skin, tears, etc.)
  • Acquired immune system (learns how to respond to NEW threats adaptively)
      • Primary immune response
          • First response to invading pathogens
      • Secondary immune response
          • Encountering a similar pathogen a second time
          • Remembers past encounters
          • Faster and stronger response than the primary response

slide-4
SLIDE 4

Points of Strength of the Immune System

  • Recognition (anomaly detection, noise tolerance)
  • Robustness (noise tolerance)
  • Feature extraction
  • Diversity (can face an entire repertoire of foreign invaders)
  • Reinforcement learning
  • Memory (remembers past encounters: basis for vaccines)
  • Distributed detection (no single central system)
  • Multi-layered (defense mechanisms at multiple levels)
  • Adaptive (self-regulated)

slide-5
SLIDE 5

Major Players: B-Cells

  • Through a process of recognition and stimulation, B-cells clone and mutate to produce a diverse set of antibodies adapted to different antigens.
  • B-cells secrete antibodies with paratopes that can bind to specific antigens (epitopes) and destroy their host invading agent through a KILL, SUICIDE, or INGEST signal.
  • B-cell antibody paratopes can also bind to antibody idiotopes on other B-cells, sending a STIMULATE or SUPPRESS signal; hence the network memory.

slide-6
SLIDE 6

Requirements for Clustering Data Streams (Barbara, 02)

  • Compactness of representation
      • Network of B-cells: each cell can recognize several antigens
      • B-cells compressed into clusters/sub-networks
  • Fast incremental processing of new data points
      • A new antigen influences only the activated sub-network
      • Activated cells are updated incrementally
      • The proposed approach learns in 1 pass
  • Clear and fast identification of “outliers”
      • A new antigen that does not activate any subnetwork is a potential outlier, so a new B-cell is created to recognize it
      • This new B-cell could grow into a subnetwork (if it is stimulated by a new trend) or die/move to disk (if outlier)
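
The one-pass behavior described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `BCell` class, the `present_antigen` function, the Euclidean distance, and the fixed `radius` used as a stand-in for the activation test are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class BCell:
    center: list   # hypothetical prototype vector for the cell
    hits: int = 1  # number of antigens absorbed so far


def distance(a, b):
    """Euclidean distance (an assumed dissimilarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def present_antigen(cells, antigen, radius):
    """Process one data point in a single pass.

    If the nearest cell is within `radius`, only that cell is updated
    (running mean). Otherwise the antigen is a potential outlier and a
    new cell is created to recognize it.
    """
    nearest = min(cells, key=lambda c: distance(c.center, antigen), default=None)
    if nearest is not None and distance(nearest.center, antigen) <= radius:
        nearest.hits += 1
        rate = 1.0 / nearest.hits
        nearest.center = [c + rate * (x - c) for c, x in zip(nearest.center, antigen)]
        return nearest
    new_cell = BCell(center=list(antigen))
    cells.append(new_cell)
    return new_cell


cells = []
for point in [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (0.1, 0.1)]:
    present_antigen(cells, point, radius=1.0)
```

In this toy run the point (5.0, 5.0) activates no existing cell and gets its own cell; whether it later grows into a subnetwork or is discarded would depend on how many further points it attracts.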

slide-7
SLIDE 7

General Architecture

[Figure: block diagram. Labels: Evolving data; 1-Pass Adaptive Immune Learning; Evolving Immune Network (compressed into subnetworks); Immune network information system; Stimulation (competition & memory); Age (old vs. new); Outliers (based on activation)]

slide-8
SLIDE 8

Internal and External Immune Interactions: Before & After

[Figure labels: Internal Immune Interactions; Internal Stimulation; External Stimulation; Lifeline of B-cell]

slide-9
SLIDE 9

Continuous Immune Learning

[Flowchart; the connecting arrows did not survive extraction, so the box order below is approximate]

Process boxes:
  • Initialize ImmuNet and MaxLimit
  • Trap Initial Data
  • Compress ImmuNet into K subNets
  • Present NEW antigen data
  • Identify nearest subNet*
  • Compute soft activations in subNet*
  • Update subNet*’s ARB influence range/scale
  • Update subNet*’s ARBs’ stimulations
  • Clone and Mutate ARBs
  • Clone antigen
  • Kill lethal ARBs
  • Compress ImmuNet
  • Kill extra ARBs (based on age/stimulation strategy) OR increase acuteness of competition OR move oldest patterns to aux. storage

Decision boxes (Yes/No):
  • Activates ImmuNet?
  • Outlier?
  • #ARBs > MaxLimit?

Other labels: Start/Reset; Memory Constraints; Domain Knowledge Constraints; Secondary storage; ImmuNet Stats & Visualization
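
The memory-constraint branch of the flowchart ("#ARBs > MaxLimit?" followed by killing extra ARBs or moving old patterns to auxiliary storage) could look something like the sketch below. The slides do not specify the age/stimulation strategy, so the ranking rule here (keep the most stimulated ARBs, break ties in favor of younger ones) is an assumption, as are the `ARB` fields and the `enforce_max_limit` name.

```python
from dataclasses import dataclass


@dataclass
class ARB:
    name: str           # hypothetical label, for illustration only
    stimulation: float  # current stimulation level
    age: int            # number of antigens seen since creation


def enforce_max_limit(arbs, max_limit):
    """Return (kept, evicted) so that at most `max_limit` ARBs are kept.

    Assumed strategy: rank by stimulation (highest first), then by age
    (youngest first). Evicted ARBs could be deleted or written to
    secondary storage by the caller.
    """
    if len(arbs) <= max_limit:
        return list(arbs), []
    ranked = sorted(arbs, key=lambda a: (-a.stimulation, a.age))
    return ranked[:max_limit], ranked[max_limit:]


population = [
    ARB("a", 0.9, 5),
    ARB("b", 0.1, 2),
    ARB("c", 0.5, 9),
    ARB("d", 0.5, 1),
]
kept, evicted = enforce_max_limit(population, max_limit=2)
```

Returning the evicted ARBs instead of discarding them leaves the choice between "kill" and "move to aux. storage" to the caller.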

slide-10
SLIDE 10

Model for Artificial Immune Cell

  • Antigens represent data and the B-cells represent clusters or patterns to be learned/extracted
  • ARB/B-cell object:
      • Represents not just a single item, but a fuzzy set
      • Better approximate reasoning abilities
      • Each ARB is allowed to have its own zone of influence with size/scale σ_i
      • ARBs dynamically adapt their influence zones, and hence their stimulation levels, in a struggle for survival
      • Membership function dynamically adapts to data
      • Outliers are easily detected through weak activations
      • No more dependence on hard threshold-cuts to establish the network
      • Can include most probabilistic and possibilistic models of uncertainty
      • Flexible for different attribute types (numerical, categorical, etc.)
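
To make the idea of a per-ARB influence zone concrete, here is a small sketch. The slides do not give the membership function, so the Gaussian-shaped form below is an assumption (one common choice for fuzzy memberships), and `activation`, `is_potential_outlier`, and the `min_activation` cutoff are hypothetical names introduced for this example.

```python
import math


def activation(center, sigma2, antigen):
    """Soft activation of an ARB by an antigen.

    Assumed form: exp(-d^2 / (2 * sigma_i^2)), where d is the Euclidean
    distance and sigma2 is the ARB's own squared scale sigma_i^2.
    """
    d2 = sum((c - x) ** 2 for c, x in zip(center, antigen))
    return math.exp(-d2 / (2.0 * sigma2))


def is_potential_outlier(arbs, antigen, min_activation=1e-3):
    """An antigen that only weakly activates every ARB is flagged."""
    best = max(activation(center, sigma2, antigen) for center, sigma2 in arbs)
    return best < min_activation


# Two ARBs at the same location but with different influence zones.
tight = ((0.0, 0.0), 0.25)  # sigma_i^2 = 0.25
broad = ((0.0, 0.0), 4.0)   # sigma_i^2 = 4.0

near = (1.0, 0.0)
far = (10.0, 10.0)

tight_response = activation(*tight, near)  # exp(-2)
broad_response = activation(*broad, near)  # exp(-1/8)
far_flagged = is_potential_outlier([tight, broad], far)
```

The same antigen produces a much stronger response in the ARB with the wider zone, which is why letting each ARB carry its own σ_i matters for which cells count as activated.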

slide-11
SLIDE 11

Immune Based Learning of Web Profiles

  • The Web server plays the role of the human body, and the incoming requests play the role of antigens that need to be detected
  • The input data is similar to web log data (a record of all files/URLs accessed by users on a Web site)
  • The data is pre-processed to produce session lists:
      • A session list S_i for user #i is a list of URLs visited by the same user
      • In discovery mode, a session is fed to the learning system as soon as it is available
  • B-cell_i: i-th candidate profile:
      • List of URLs
      • Historic Evidence/Support: list of supporting cumulative conditional probabilities (URL_k, prob(URL_k)) with prob(URL_k) = prob(URL_k | B-cell_i)
  • Each profile has its own influence zone defined by σ_i
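
A rough sketch of the pre-processing and profile representation described above is given below. It is illustrative only: the (user_id, url) record format, the use of sets for sessions, the function names, and the example URLs are assumptions, and prob(URL_k | B-cell_i) is estimated here simply as the fraction of a profile's supporting sessions that contain URL_k.

```python
from collections import defaultdict


def build_sessions(log_records):
    """Group (user_id, url) log records into one set of URLs per user."""
    sessions = defaultdict(set)
    for user_id, url in log_records:
        sessions[user_id].add(url)
    return dict(sessions)


def profile_from_sessions(supporting_sessions):
    """Estimate prob(URL_k | profile) from the sessions that support it."""
    total = len(supporting_sessions)
    if total == 0:
        return {}
    counts = defaultdict(int)
    for urls in supporting_sessions:
        for url in urls:
            counts[url] += 1
    return {url: n / total for url, n in counts.items()}


# Hypothetical log records: (user_id, url)
log = [
    ("u1", "/a"), ("u1", "/b"),
    ("u2", "/a"),
    ("u3", "/a"), ("u3", "/c"),
]
sessions = build_sessions(log)
profile = profile_from_sessions(list(sessions.values()))
```

Here all three sessions are treated as supporting a single profile; in a real system each session would be assigned to whichever profiles it activates.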