NSF-NGDM, Nov. 1-3, 2002, Baltimore, MD
Nasraoui, Gonzalez, Cardona, Dasgupta: Scalable Artificial Immune System Based Data Mining
Artificial Immune Systems and Data Mining: Bridging the Gap with Scalability and Improved Learning
Olfa Nasraoui, Fabio Gonzalez, Cesar
Through a process of recognition and stimulation, B-Cells will clone and mutate to produce a diverse set of antibodies adapted to different antigens
B-Cells secrete antibodies with paratopes that can bind to specific antigens (epitopes) and destroy their host invading agent through a KILL, SUICIDE, or INGEST signal.
B-Cells' antibody paratopes can also bind to antibody idiotopes on B-Cells, hence sending a STIMULATE or SUPPRESS signal; hence the Network, hence Memory
Compactness of representation
Network of B-cells: each cell can recognize several antigens
B-cells compressed into clusters/sub-networks
Fast incremental processing of new data points
New antigen influences only activated sub-network
Activated cells updated incrementally
Proposed approach learns in 1 pass.
Clear and fast identification of “outliers”
New antigen that does not activate any subnetwork is a potential outlier: create new B-cell to recognize it
This new B-cell could grow into a subnetwork (if it is stimulated by a new trend) or die/move to disk (if outlier)
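The single-pass behavior listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian activation, the `min_activation` threshold, the running-mean update, and the names `BCell` and `present_antigen` are all introduced here for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BCell:
    """Hypothetical B-cell: a prototype point with an influence scale."""
    center: list
    sigma: float = 1.0
    hits: int = 1  # number of antigens absorbed so far

    def activation(self, antigen):
        # Assumed Gaussian activation; the slides do not give a formula.
        d2 = sum((a - c) ** 2 for a, c in zip(antigen, self.center))
        return math.exp(-d2 / (2.0 * self.sigma ** 2))

    def absorb(self, antigen):
        # Incremental running-mean update, so past data is never revisited.
        self.hits += 1
        self.center = [c + (a - c) / self.hits
                       for a, c in zip(antigen, self.center)]


def present_antigen(network, antigen, min_activation=0.1):
    """Process one antigen once; return True if it was a potential outlier."""
    best = max(network, key=lambda b: b.activation(antigen), default=None)
    if best is not None and best.activation(antigen) >= min_activation:
        best.absorb(antigen)  # only the activated cell is updated
        return False
    # Nothing was activated: create a new B-cell to recognize this antigen.
    network.append(BCell(center=list(antigen)))
    return True


network = []
stream = [(0.0, 0.0), (0.2, 0.1), (9.0, 9.0)]
flags = [present_antigen(network, x) for x in stream]
```

Each antigen is seen exactly once. A newly created B-cell either keeps absorbing nearby antigens (a new trend) or stays isolated (an outlier); pruning of isolated cells is not shown in this sketch.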
1-Pass Adaptive Immune Learning
[Diagram labels: Evolving data; Immune network information system; Evolving Immune Network (compressed into subnetworks); Stimulation (competition & memory); Age (old vs. new); Outliers (based on activation)]
Internal and External Immune Interactions: Before & After Internal Immune Interactions
[Figure labels: Internal Stimulation; External Stimulation; Lifeline of B-cell]
Continuous Immune Learning
[Flowchart. Box labels:]
Start/Reset
Initialize ImmuNet and MaxLimit
Trap Initial Data
Compress ImmuNet into K subNet's
Present NEW antigen data
Identify nearest subNet*
Compute soft activations in subNet*
Update subNet*'s ARB influence range/scale
Update subNet*'s ARBs' stimulations
Activates ImmuNet? (Yes/No)
Clone antigen
Clone and Mutate ARBs
Kill lethal ARBs
Outlier? (Yes/No)
#ARBs > MaxLimit? (Yes/No)
Kill extra ARBs (based on age/stimulation strategy) OR increase acuteness of competition OR move oldest patterns to aux. storage
Compress ImmuNet
[Side labels: Memory Constraints; Domain Knowledge Constraints; Secondary storage; ImmuNet Stat's & Visualization]
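The "#ARBs > MaxLimit?" branch of the flowchart can be illustrated with a short Python sketch. The slides only say that extra ARBs are killed "based on age/stimulation strategy" or that old patterns are moved to auxiliary storage; the ranking rule below (highest stimulation first, younger ARBs winning ties) and the names `ARB` and `enforce_max_limit` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ARB:
    """Hypothetical ARB record holding only the fields needed for pruning."""
    name: str
    stimulation: float
    age: int


def enforce_max_limit(arbs, max_limit, aux_storage):
    """Keep at most max_limit ARBs and move the rest to auxiliary storage.

    Assumed strategy: rank by stimulation (higher is better) and break
    ties in favour of younger ARBs.
    """
    if len(arbs) <= max_limit:
        return arbs
    ranked = sorted(arbs, key=lambda a: (-a.stimulation, a.age))
    aux_storage.extend(ranked[max_limit:])  # moved out rather than discarded
    return ranked[:max_limit]


aux = []
net = [ARB("a", 0.9, 5), ARB("b", 0.2, 1), ARB("c", 0.2, 7), ARB("d", 0.6, 2)]
net = enforce_max_limit(net, max_limit=2, aux_storage=aux)
```

With a limit of 2, the two most stimulated ARBs stay in memory and the other two go to `aux`, which stands in for the secondary storage box in the flowchart.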
Antigens represent data and the B-Cells represent clusters or patterns to be learned/extracted
ARB/B-cell object:
Represents not just a single item, but a fuzzy set
Better Approximate Reasoning abilities
Each ARB is allowed to have its own zone of influence with size/scale: σ_i
ARBs dynamically adapt their influence zones, and hence their stimulation level, in a struggle for survival.
Membership function dynamically adapts to data
Outliers are easily detected through weak activations
No more dependence on hard threshold-cuts to establish network
Can include most probabilistic and possibilistic models of uncertainty
Flexible for different attribute types (numerical, categorical, etc.)
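A small Python sketch may help make the fuzzy-set view of an ARB concrete. The slides do not state the membership function or the scale update, so both are assumptions here: a Gaussian membership with scale σ_i, and a re-estimate of σ_i as the membership-weighted root-mean-square distance of the data to the ARB center. The function names are hypothetical.

```python
import math


def membership(x, center, sigma):
    """Assumed Gaussian membership of point x in an ARB's fuzzy set."""
    d2 = sum((a - c) ** 2 for a, c in zip(x, center))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def adapt_sigma(points, center, sigma):
    """Re-estimate the influence scale from membership-weighted distances."""
    weights = [membership(p, center, sigma) for p in points]
    sq_dists = [sum((a - c) ** 2 for a, c in zip(p, center)) for p in points]
    total = sum(weights)
    if total == 0.0:
        return sigma  # nothing activates this ARB; leave its scale unchanged
    weighted = sum(w * d for w, d in zip(weights, sq_dists))
    return math.sqrt(weighted / total)


center = (0.0, 0.0)
data = [(2.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
sigma_new = adapt_sigma(data, center, sigma=1.0)
weak = membership((10.0, 10.0), center, sigma_new)
```

In this toy example the scale grows from 1.0 to about 2.0 to fit the two nearby points, while the distant point contributes almost nothing to the estimate and keeps a very weak activation. No hard distance threshold is involved.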
The Web server plays the role of the human body, and the incoming requests play the role of antigens that need to be detected
The input data is similar to web log data (a record of all files/URLs accessed by users on a Web site)
The data is pre-processed to produce session lists:
A session list S_i for user #i is a list of URLs visited by same user
In discovery mode, a session is fed to the learning system as soon as it is available
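The pre-processing step can be sketched as follows. The log format is an assumption: each record is taken to be a `(user_id, url)` pair in time order, and repeated URLs within a user's list are dropped. The URLs and the function name `build_session_lists` are made up for illustration.

```python
from collections import defaultdict


def build_session_lists(log_records):
    """Group (user_id, url) records into one URL list per user."""
    sessions = defaultdict(list)
    for user_id, url in log_records:
        # Assumption: a session list keeps each URL once, in first-visit order.
        if url not in sessions[user_id]:
            sessions[user_id].append(url)
    return dict(sessions)


log = [
    (1, "/index.html"),
    (2, "/courses.html"),
    (1, "/research.html"),
    (1, "/index.html"),
    (2, "/index.html"),
]
session_lists = build_session_lists(log)
```

Each resulting list plays the role of S_i and would be presented to the learning system as one antigen.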
B-cell_i: ith candidate profile:
List of URLs
Historic Evidence/Support: List of supporting cumulative conditional probabilities (URL_k, prob(URL_k)) with prob(URL_k) = prob(URL_k | B-cell_i)
Each profile has its own influence zone defined by σ_i
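One way to picture such a profile is as a running tally of the sessions assigned to it. In the sketch below, prob(URL_k | B-cell_i) is estimated as the fraction of assigned sessions that contain URL_k. That estimator is an assumption, and so are the class name `Profile`, its fields, and the sample URLs; the slides do not specify how the cumulative probabilities are computed.

```python
from collections import Counter


class Profile:
    """Hypothetical candidate profile (B-cell_i) built from assigned sessions."""

    def __init__(self, sigma=1.0):
        self.sigma = sigma          # influence zone scale of this profile
        self.n_sessions = 0         # sessions assigned so far
        self.url_counts = Counter() # sessions containing each URL

    def add_session(self, session):
        # Cumulative update: counts grow as sessions arrive, one at a time.
        self.n_sessions += 1
        self.url_counts.update(set(session))

    def url_probs(self):
        # Assumed estimate of prob(URL_k | B-cell_i).
        return {url: count / self.n_sessions
                for url, count in self.url_counts.items()}


profile = Profile()
for s in (["/a", "/b"], ["/a"], ["/a", "/c"], ["/b", "/a"]):
    profile.add_session(s)
probs = profile.url_probs()
```

Here `/a` appears in all four sessions, `/b` in two, and `/c` in one, so the profile's support list is (`/a`, 1.0), (`/b`, 0.5), (`/c`, 0.25).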