
5.2 Learning Bayesian networks
Machine Learning, J. Denzinger (2016-02-10)



General idea

See Witten et al. 2011. Bayesian (belief) networks aim at representing probability distributions of features and how to combine them to make predictions about the likelihood that a particular example has a particular feature value. Nodes in such networks stand for features (a concept can be treated as a feature). Within a node, the probability distribution over its values, given the values of the connected features, is recorded. Predictions are performed by traversing the graph. Learning is searching for the best network structure.

Learning phase: Representing and storing the knowledge

The network we want to learn has to have a node for every feature that we have, resp. want to have represented. A node contains a table that holds the probabilities of the different feature values for the different combinations of incoming node values (the probabilities in each row have to sum up to 1). While the table in a node is relatively easy to compute, the directed edges between the nodes (i.e. the network structure) are what we are really after. Note that we would like to avoid simply connecting every node with every other node, since this usually results in overfitting the network to the given data.

Learning phase: What or whom to learn from

As for many other methods, Bayesian networks are learned from example feature vectors:

ex_1: (val_11, ..., val_1n)
...
ex_k: (val_k1, ..., val_kn)

Learning phase: Learning method

One of the simplest search algorithms for the network structure is K2, a local hill-climbing search among possible network structures. It uses an ordering on the features and, for the currently worked-on node, keeps adding a link from a previously added node until no improvement of the network evaluation can be achieved. Often, the number of links into a node is also limited (again, to avoid overfitting). The evaluation of a network usually combines a measure of the quality of the network's predictions with a measure of the complexity of the network.

Learning phase: Learning method (cont.)

One often used evaluation measure for a network W is the Akaike Information Criterion (AIC), which is minimized:

AIC(W) = -LL(W) + K(W),

where K(W) is the total number of independent probabilities in all tables of all nodes of W (the independent probabilities of a table are the number of table entries minus the number of entries in the last column, which is always determined by the entries in the previous columns since each row has to add up to 1), and LL(W) is the so-called log-likelihood of the network.

Learning phase: Learning method (cont.)

The prediction of the network for the concept of a particular given example ex is computed by multiplying the probabilities of the parent nodes for the particular feature values of ex (recursively, if a feature node has parents itself). This is converted into a probability for each concept value by taking the computed prediction for each value and dividing it by the sum of the predictions over all values (see the example later). The quality of the predictions of the network for all learning examples is the product of all these per-example predictions, which is usually much too small a number. Therefore LL is defined as the sum of the binary logarithms of these predictions.
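To make these pieces concrete, here is a minimal Python sketch (my own illustration, not code from the lecture; names such as estimate_tables, prediction and aic are invented): node tables are filled with conditional probabilities estimated from the examples, a per-example prediction is the product of the selected table entries, LL is the sum of their binary logarithms, K counts the independent probabilities per table, and AIC = -LL + K is the score a structure search such as K2 would try to minimize.

# Illustrative sketch only: scoring one fixed candidate structure with
# AIC(W) = -LL(W) + K(W).  "parents" maps each feature to the list of
# features it receives links from in the candidate network W.
from math import log2
from itertools import product

def estimate_tables(examples, values, parents):
    """One table per feature: P(value | parent values) as the relative
    frequency of that combination in the examples."""
    tables = {}
    for f in values:
        par = parents.get(f, [])
        table = {}
        for pv in product(*(values[p] for p in par)):   # one table row per pv
            rows = [ex for ex in examples
                    if all(ex[p] == v for p, v in zip(par, pv))]
            for v in values[f]:
                n = sum(1 for ex in rows if ex[f] == v)
                table[pv + (v,)] = n / len(rows) if rows else 1 / len(values[f])
        tables[f] = table
    return tables

def prediction(ex, parents, tables):
    """Product over all nodes of the table entry selected by ex's values."""
    result = 1.0
    for f, table in tables.items():
        key = tuple(ex[p] for p in parents.get(f, [])) + (ex[f],)
        result *= table[key]
    return result

def aic(examples, values, parents):
    tables = estimate_tables(examples, values, parents)
    # LL(W): sum of the binary logarithms of the per-example predictions
    ll = sum(log2(prediction(ex, parents, tables)) for ex in examples)
    # K(W): independent probabilities = rows * (number of values - 1) per node
    k = sum((len(values[f]) - 1) *
            len(list(product(*(values[p] for p in parents.get(f, [])))))
            for f in values)
    return -ll + k

# Tiny usage with the height / eye color / hair color data from the example
# later in this lecture; the candidate structure is Height -> Eye -> Hair.
values = {"height": ["small", "big"],
          "eye": ["blue", "brown"],
          "hair": ["blond", "dark", "red"]}
examples = [dict(zip(values, t)) for t in [
    ("small", "blue", "blond"), ("big", "blue", "red"),
    ("big", "blue", "blond"),  ("big", "brown", "blond"),
    ("small", "blue", "dark"), ("big", "blue", "dark"),
    ("big", "brown", "dark"),  ("small", "brown", "blond")]]
print(aic(examples, values, {"eye": ["height"], "hair": ["eye"]}))  # about 32.6

K2 would call such an evaluation repeatedly, comparing the score of the current structure with the score obtained after adding one more link into the node currently being worked on.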

Learning phase: Learning method (cont.)

The probabilities in the tables of the nodes are computed as the relative frequencies of the associated combinations of feature values in the training examples.

Application phase: How to detect applicable knowledge

As in so many other approaches, there is only one structure that can (always) be applied.

Application phase: How to apply knowledge

As stated before, we compute the probability of a particular example ex having the feature value feat-val (out of the possible concept values feat-val_1, ..., feat-val_s) by multiplying the table entries with the probabilities coming from the parent nodes.

Application phase: Detect/deal with misleading knowledge

As for the previous learning methods, this is not part of the process. But examples that are not predicted correctly can be used to update the tables in the current network, although the network structure might then no longer be good, so that after several new (badly handled) examples a total re-learning will definitely be necessary.

General questions: Generalize/detect similarities?

Using probabilities usually means that we are not aiming at generalizing, resp. at exploiting similarities between examples.

General questions: Dealing with knowledge from other sources

Some of the search methods for the network structure require a start network, which obviously has to come from other sources. Bayesian belief networks are also used in decision support without learning them, by having human experts create them. In this case, such a network could first be learned (if enough examples are available) and then be modified by these human experts.
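As a small illustration of the application-phase step described above (again my own sketch with invented names, not the lecture's code), the raw products obtained for the different concept values are turned into probabilities by dividing each by the sum over all concept values:

# Hypothetical helper: normalize the raw per-concept-value products into
# probabilities (they then sum to 1).
def normalize(raw_predictions):
    total = sum(raw_predictions.values())
    return {value: p / total for value, p in raw_predictions.items()}

# e.g. with made-up raw products for a two-valued concept:
print(normalize({"yes": 0.06, "no": 0.02}))   # {'yes': 0.75, 'no': 0.25}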

(Conceptual) Example

We will use the same example (resp. a subset of it) as for decision trees, with 3 features:

Height:     feat_Height = {big, small}
Eye color:  feat_eye = {blue, brown}
Hair color: feat_hair = {blond, dark, red}

(Conceptual) Example

The examples we learn from are:

ex_1: (small, blue, blond)
ex_2: (big, blue, red)
ex_3: (big, blue, blond)
ex_4: (big, brown, blond)
ex_5: (small, blue, dark)
ex_6: (big, blue, dark)
ex_7: (big, brown, dark)
ex_8: (small, brown, blond)

(Conceptual) Example

We order the features by Height < Eye color < Hair color. Then we start the learning process with the following node:

Height:
  small   big
  0.375   0.625

(Conceptual) Example

For adding the Eye color node we do not have a lot of choice regarding the network structure: the only previously added node is Height, so Eye color gets its link from Height.

Eye color (link from Height):
  Height   blue    brown
  small    0.667   0.333
  big      0.6     0.4

Height:
  small   big
  0.375   0.625

(Conceptual) Example

For adding the Hair color node we now have choices: we can add a link from either Height or Eye color. [The slide shows the two resulting candidate networks, labelled AIC(W1) and AIC(W2), each with its Hair color table (values blond, red, dark), the Eye color table given Height from above, and the Height table.]

(Conceptual) Example

AIC(W1) = -(log2(0.375*0.667*0.4) + log2(0.625*0.6*0.2) +
            log2(0.625*0.6*0.4) + log2(0.625*0.4*0.667) +
            log2(0.375*0.667*0.4) + log2(0.625*0.6*0.4) +
            log2(0.625*0.4*0.333) + log2(0.375*0.333*0.667)) + (4+3)
        = 25.592 + 7 = 32.592

AIC(W2) = -(log2(0.375*0.5) + log2(0.625*0.333) +
            log2(0.625*0.333) + log2(0.625*0.5) +
            log2(0.375*0.5) + log2(0.625*0.334) +
            log2(0.625*0.5) + log2(0.375*1)) + (8+3)
        = 16.389 + 11 = 27.389
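The numbers above can be re-derived directly from the eight examples. The short check below is my own code (not part of the slides); it takes W1 to be the structure in which Hair color receives its link from Eye color, which matches the printed factors (e.g. 0.4 = P(blond | blue)), and its result comes out close to the 32.592 printed above (the table entries on the slide are rounded).

# Re-deriving the Height and Eye color tables and AIC(W1) from the examples
# (illustrative only; W1 = Height -> Eye color -> Hair color).
from math import log2
from fractions import Fraction as F

examples = [("small", "blue", "blond"), ("big", "blue", "red"),
            ("big", "blue", "blond"),  ("big", "brown", "blond"),
            ("small", "blue", "dark"), ("big", "blue", "dark"),
            ("big", "brown", "dark"),  ("small", "brown", "blond")]

def p(pred, cond=lambda ex: True):
    """Relative frequency of pred among the examples that satisfy cond."""
    pool = [ex for ex in examples if cond(ex)]
    return F(sum(1 for ex in pool if pred(ex)), len(pool))

print(p(lambda ex: ex[0] == "small"))                                   # 3/8 = 0.375
print(p(lambda ex: ex[1] == "blue", cond=lambda ex: ex[0] == "small"))  # 2/3 ~ 0.667

def prediction_W1(ex):
    height, eye, hair = ex
    return (p(lambda x: x[0] == height)
            * p(lambda x: x[1] == eye, cond=lambda x: x[0] == height)
            * p(lambda x: x[2] == hair, cond=lambda x: x[1] == eye))

ll = sum(log2(prediction_W1(ex)) for ex in examples)
print(-ll + (4 + 3))   # about 32.6 (the slide reports 32.592)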

