Bayesian Networks and ITS: Overview (PowerPoint presentation)



SLIDE 1

Bayesian Networks and ITS


SLIDE 2


Overview

  • Knowledge acquisition is hard in general, and not well understood.
  • It is time consuming, when everything is to be hand-coded.
  • Can the machine automatically gather the needed information?
  • Machine Learning
  • A number of approaches now available.
  • Do they work well?
SLIDE 3

Core System

[Diagram: a Core System with Inputs and Outputs, plus Other Observables; a Control Box asks "Quality?" and "What to Do?" and makes Changes; Machine learning drives the control loop]

SLIDE 4

Machine Learning

  • Rote Learning (being told)
  • Fully Automated Discovery
  • Rule Induction
  • Neural Networks
  • Reinforcement Learning
  • Example-Based Learning
  • Inductive Logic Programming
  • Version Space Learning
  • Case-Based Systems
  • …......

SLIDE 5

Learn What?

  • Domain knowledge

    – Correct knowledge – concepts, dependencies, rules, etc.
    – Misconceptions
    – Perturbation model
    – Interventions

  • Student model
  • Tutoring model

SLIDE 6

….

  • Student model: inducing from behaviour records, test records, etc.
  • Interventions can improve with records of past cases.
  • Perturbation model used to generate misconceptions.

    – Machine learning?

  • Bayesian networks – a probability model of a domain; probabilities change with time...

    – Learning!

SLIDE 7

Overview...

  • Uncertainty is fundamental to education!
  • Our knowledge of the learning process, access to the learner's state of knowledge, and also where he/she is going.
  • The strength of an ITS is in effective prediction of the learner's next step, and choosing the right action.
  • Modelling uncertainty is critical.

    – What kind of uncertainty?
    – What kind of model?

SLIDE 8

Uncertainty

  • Fuzzy Logic
  • Certainty Factors
  • Probability models – Bayesian networks
  • Non-numeric Models
  • Non-monotonic Logics & reasoning
  • Dependency Networks
  • Dempster-Shafer theory

SLIDE 9

Uncertainty

  • Given the current state of the world, form beliefs on the student's knowledge level.
  • Given the knowledge level, decide an action against a situation.
  • Selection of the next problem.
  • Probability models may be a good start.

    – Bayesian networks

SLIDE 10

Bayesian Networks

  • Causal concept networks with attached probabilities.
  • Bayesian methods are capable of handling noisy and incomplete information.
  • Bayes' theorem saves us from massive probability computations.

SLIDE 11

Basic rules

Probability:
P(A) ≥ 0;  P(A) = 1 – P(¬A)

Conditional probability:
P(A|B) = P(A∧B) / P(B), if P(B) ≠ 0

Product rule:
P(A∧B) = P(A|B) P(B)

Bayes' Rule (from P(A|B) = P(A∧B) / P(B)):
P(A|B) = P(B|A) P(A) / P(B)
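These rules are easy to check numerically. A minimal Python sketch; all probability values here are illustrative, not from the slides:

```python
def conditional(p_a_and_b, p_b):
    """Conditional probability: P(A|B) = P(A∧B) / P(B), for P(B) != 0."""
    return p_a_and_b / p_b

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' Rule: P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# With illustrative numbers P(B|A)=0.8, P(A)=0.1, P(B)=0.2,
# Bayes' Rule gives P(A|B) = 0.8 * 0.1 / 0.2 = 0.4
posterior = bayes(0.8, 0.1, 0.2)
```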


SLIDE 12

P(Cavity | Toothache) = ?

Computing the posterior probability from the full joint distribution

SLIDE 13

...

P(A|B) = P(A∧B) / P(B)

P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)

P(Cavity ∧ Toothache) = 0.04

P(Toothache) = ?

P(C|T) = ?
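The slide leaves P(Toothache) open; it falls out by marginalising the full joint table. Only P(Cavity ∧ Toothache) = 0.04 comes from the slide; the other joint entries below are illustrative assumptions:

```python
# Full joint distribution over (Cavity, Toothache).
# Only the (True, True) entry is from the slide; the rest are assumed.
joint = {
    (True,  True):  0.04,
    (True,  False): 0.06,
    (False, True):  0.01,
    (False, False): 0.89,
}

def p_toothache():
    # Marginalise over Cavity: sum every entry where Toothache is true.
    return sum(p for (cav, tooth), p in joint.items() if tooth)

def p_cavity_given_toothache():
    # P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
    return joint[(True, True)] / p_toothache()
```

With these assumed entries, P(Toothache) = 0.05 and P(Cavity | Toothache) = 0.8.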

SLIDE 14

The problem

  • P(I1|o1,o2,o3,o4...,on) – given the current state of the network, what is my estimate of using intervention I1?
  • I need the full joint probability of all variables.
  • 1) Cannot derive P(A|B,C) from P(A|B) and P(A|C)

    – Unless....

  • 2) P(A|B) not easy in general

    – But P(B|A) may be easier

SLIDE 15

Independence

  • Two random variables A, B are (absolutely) independent iff P(A∧B) = P(A)P(B)

    – If n Boolean variables are independent, the full joint is P(X1,…,Xn) = Πi P(Xi)

  • Two random variables A, B given C are conditionally independent iff P(A∧B|C) = P(A|C) P(B|C)
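The product form of the full joint for independent Booleans can be sketched directly. A minimal Python version, where each variable is given only by its marginal P(Xi = true):

```python
from itertools import product

def full_joint(marginals):
    """Full joint over independent Boolean variables:
    P(x1,...,xn) = prod_i P(xi), where P(Xi=false) = 1 - P(Xi=true)."""
    joint = {}
    for assignment in product([True, False], repeat=len(marginals)):
        p = 1.0
        for xi, p_true in zip(assignment, marginals):
            p *= p_true if xi else 1.0 - p_true
        joint[assignment] = p
    return joint
```

For example, with marginals [0.5, 0.1] the entry for (True, True) is 0.5 × 0.1 = 0.05, and the whole table sums to 1.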

SLIDE 16

….

P(I|A,B) = P(A,B|I) P(I) / P(A,B) = P(A|I) P(B|I) P(I) / (P(A) P(B))
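In practice, rather than estimating P(A) P(B) in the denominator, one computes the numerator P(A|I) P(B|I) P(I) for each value of I and normalises. A sketch of that naive-Bayes-style variant of the slide's formula (the conditional-independence assumption is as on the slide; the numbers in use are illustrative):

```python
def posterior_i(p_a_given_i, p_b_given_i, prior_i):
    """P(I|A,B) ∝ P(A|I) P(B|I) P(I), assuming A and B are conditionally
    independent given I; normalised over I ∈ {True, False}."""
    unnorm = {i: p_a_given_i[i] * p_b_given_i[i] * prior_i[i]
              for i in (True, False)}
    z = sum(unnorm.values())          # normalising constant = P(A,B)
    return {i: p / z for i, p in unnorm.items()}
```

Normalising over the values of I sidesteps the need for separate estimates of P(A) and P(B).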

SLIDE 17

Data availability

  • P(solves_p1 | knows_c) is easier to obtain than P(knows_c | solves_p1)
  • Bayes' theorem provides a way to build one from the other.

SLIDE 18

Bayesian Network

  • Network of probability influences!
  • Nothing else will influence: the Markov assumption.

    – => all else are conditionally independent.

  • Every node has an associated conditional probability (CP) distribution as a function of its parents.

    – P(~X) = 1 – P(X)

  • Information can flow in any direction.
SLIDE 19

Network

Each concept is represented by a node in the graph.

A directed edge from one concept to another is added if knowledge of the former is a prerequisite for understanding the latter.

SLIDE 20

P(For-Loop | Variable Asgn, Rel Ops, Incr/Decr Oper)

CPD for For-Loop
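A CPD like this is just a table indexed by the truth values of the parents. A minimal sketch; the slide does not give the actual table, so every probability below is an illustrative assumption:

```python
# CPD for For-Loop given its three parents, as a lookup table:
# (VariableAsgn, RelOps, IncrDecrOper) -> P(For-Loop known).
# All values are illustrative assumptions, not from the slides.
cpd_for_loop = {
    (True,  True,  True):  0.90,
    (True,  True,  False): 0.60,
    (True,  False, True):  0.55,
    (True,  False, False): 0.30,
    (False, True,  True):  0.50,
    (False, True,  False): 0.25,
    (False, False, True):  0.20,
    (False, False, False): 0.05,
}

def p_for_loop(var_asgn, rel_ops, incr_decr):
    """Look up P(For-Loop known | parents) in the table."""
    return cpd_for_loop[(var_asgn, rel_ops, incr_decr)]
```

With three Boolean parents the table needs 2³ = 8 rows, one per combination of parent values.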

SLIDE 21

Belief network example

Neighbours John and Mary promised to call if the alarm goes off. Sometimes the alarm starts because of an earthquake. If the alarm went off, what is the probability of a burglary?

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls (n = 5 variables)

Network topology reflects “causal” knowledge.

SLIDE 22


Belief network example – cont.

SLIDE 23

Semantics in belief networks

  • In a BN, the full joint distribution is defined as the product of the local conditional distributions:

    P(X1,…,Xn) = Π P(Xi | Parents(Xi)) for i = 1 to n

    e.g. P(J∧M∧A∧¬B∧¬E) is given by

    = P(¬B) P(¬E) P(A|¬B∧¬E) P(J|A) P(M|A)
    = 0.999 × 0.998 × 0.001 × 0.90 × 0.70
    ≈ 0.000628

  • Each node is conditionally independent of its descendants given its parents.
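The product can be computed directly from the CPT values the slide lists:

```python
def joint_alarm_case():
    """P(J∧M∧A∧¬B∧¬E) as the product of the local conditionals,
    using the alarm-network CPT values on the slide."""
    p_not_b, p_not_e = 0.999, 0.998
    p_a = 0.001                  # P(A | ¬B∧¬E)
    p_j, p_m = 0.90, 0.70        # P(J|A), P(M|A)
    return p_not_b * p_not_e * p_a * p_j * p_m
```

The product comes to about 0.000628.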

SLIDE 24

Example...

  • P(M|B)?
  • = P(M|A) P(A|B) + P(M|~A) P(~A|B)
  • P(A) = P(A|B,E) P(B) P(E) + P(A|B,~E) P(B) P(~E) + P(A|~B,E) P(~B) P(E) + P(A|~B,~E) P(~B) P(~E)
  • Given B, this reduces to: P(A|B) = P(A|B,E) P(E) + P(A|B,~E) P(~E)
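Plugging numbers into this expansion requires CPT entries the slide leaves implicit. The sketch below assumes the standard textbook alarm-network values for P(E), P(A|B,E), P(A|B,~E), and P(M|~A); only P(M|A) = 0.70 appears elsewhere in these slides:

```python
def p_m_given_b():
    """P(M|B) via the slide's expansion. CPT values beyond the slides
    (P(E), P(A|B,E), P(A|B,~E), P(M|~A)) are the textbook alarm-network
    numbers, assumed here for illustration."""
    p_e = 0.002
    p_a_be, p_a_bne = 0.95, 0.94        # P(A|B,E), P(A|B,~E)
    p_m_a, p_m_na = 0.70, 0.01          # P(M|A),   P(M|~A)
    p_a_b = p_a_be * p_e + p_a_bne * (1 - p_e)    # P(A|B)
    return p_m_a * p_a_b + p_m_na * (1 - p_a_b)   # P(M|B)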

SLIDE 25


Building a BBN

  • Expert centric

    – A human expert creates the structure and the probability values.
    – Can guess, where a real value is not available.
    – Hidden nodes are a problem!

  • Data centric

    – Use population data from real trials, etc.
    – Approaches vary on what is constructed from data.

  • Efficiency centric

    – A combination, using domain knowledge to increase efficiency.

SLIDE 26

Using a B.N.

  • Diagnostic reasoning

    – Given leaf nodes, predict the probability of intermediate or root nodes.

  • Predictive reasoning

    – Given root nodes, etc., predict the probability of intermediate nodes and leaf nodes.

  • Explaining away

    – Sibling propagation – earthquake knowledge helps “reduce” the probability of burglary, given the alarm.

SLIDE 27

Andes’ Bayesian network

  • Andes’ Bayesian networks encode two kinds of knowledge:

    – domain-general knowledge: encompassing general concepts and procedures that define proficiency in Newtonian physics. Needs to stay across sessions.
    – task-specific knowledge: encompassing knowledge related to a student's performance on a specific problem or example. Can be removed at the end of the task.

SLIDE 28

The domain-general part

 The domain-general part of the student model consists of

   Rule nodes
   Context-Rule nodes

 A student has mastered a rule when he/she is able to apply it correctly in all possible contexts (problems).

 Rule nodes have binary values T and F, indicating the probability that each rule is mastered or not.

 Context-Rule nodes represent mastery of physics rules in specific problem-solving contexts.

SLIDE 29


The task-specific part

  • The task-specific part of the Bayesian student model contains four types of nodes:

    – Fact,
    – Goal,
    – Rule-application and
    – Strategy nodes

  • Fact and Goal nodes represent information that is derived while solving a problem by applying rules from the knowledge base.
  • Goal and Fact nodes have binary values T and F indicating whether they are do-able (by the student).
  • They have as many parents as there are ways to derive them.

SLIDE 30

In Andes

  • Andes uses its rules to solve each physics problem in all possible ways, and accumulates all possible derivations of the correct answer.
  • The derivations are collected in a data structure called the solution graph.
  • The consolidated solution graph for the full Andes system runs into thousands of nodes... too heavy for BN update, etc.
  • A BN can handle only propositional information, while general solution-graph nodes are first-order.

SLIDE 31

Dynamic BN

  • A dynamic BN is one where the node scenario changes with time.
  • Andes uses a version of BN for solving this problem.
  • Each problem is mapped to a different solution graph, and hence a different Bayesian network.

    – Fully propositional!

  • The Bayesian networks for different tasks are completely distinct and share no nodes.
  • However, the prior probabilities of the domain-general nodes are set to the probabilities of the domain-general nodes from the network of the preceding exercise.
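The carry-over of domain-general probabilities between task networks can be sketched as a simple dictionary merge. The node names here are hypothetical, purely for illustration:

```python
def carry_over_priors(prev_posteriors, new_priors):
    """Seed the next task's network: for every domain-general node that
    appears in both networks, replace its default prior with the
    posterior from the preceding exercise. Node names are hypothetical."""
    updated = dict(new_priors)
    for node, p in prev_posteriors.items():
        if node in updated:
            updated[node] = p
    return updated

# e.g. the rule node's posterior (0.8) overrides the default prior (0.5),
# while task-specific nodes keep their defaults.
priors = carry_over_priors({"rule_newton2": 0.8},
                           {"rule_newton2": 0.5, "fact_f1": 0.1})
```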

SLIDE 32

Rule-application nodes

  • Rule-application nodes connect Context-Rule nodes, Strategy nodes and Proposition nodes to new derived Proposition nodes.
  • The nodes have values indicating whether they are Doable or Not-doable.
  • The node is Doable if the student has applied or can apply the corresponding Context-Rule correctly.

SLIDE 33

Strategy nodes

Strategy nodes represent points where the student can choose among alternative plans to solve a problem.

These are the only non-binary nodes in the network: they have as many values as there are alternative plans.

The node is always paired with a Goal node, and it is used when there is more than one mutually exclusive way to address the goal.

SLIDE 34

A physics problem and a segment of the corresponding solution graph


SLIDE 35

Probabilistic student modeling


SLIDE 36

Updating SM

  • Values may be changed depending on the number and type of hints used.
  • Number of mistakes made.

    – “Guess” probability?

  • Skipped steps – how to attribute credit for the involved knowledge elements?

    – Use of other related elements can help.

  • Multiple rule applications for a node that has been reached.

    – Sharing credit among them...

SLIDE 37

...

  • Leaky AND, leaky OR

    – To capture “leak” knowledge such as guesses, slips, etc.
    – Even if I know all the conditions and the rule, I may not apply it (correctly).

  • Derive updated values of domain-knowledge competence, prediction of the appropriate next problem, adjustment against supports like hints, etc.

    – All can benefit from a probabilistic student model.
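The leaky-OR and leaky-AND gates mentioned above are often implemented as noisy-OR/noisy-AND with leak and slip terms. A minimal sketch; the leak and slip values are illustrative assumptions:

```python
def leaky_or(cause_probs, leak=0.05):
    """Noisy-OR with a leak: the effect (e.g. a correct answer) can occur
    even when no modelled cause is active -- a lucky guess. The leak
    value is an illustrative assumption."""
    p_none = 1.0 - leak
    for p in cause_probs:
        p_none *= 1.0 - p        # each active cause independently fails
    return 1.0 - p_none

def leaky_and(condition_probs, slip=0.1):
    """Noisy-AND with a slip: even when every condition holds and the
    rule is known, the student may fail to apply it correctly."""
    p_all = 1.0 - slip
    for p in condition_probs:
        p_all *= p               # all conditions must hold
    return p_all
```

With no active causes, leaky_or returns just the leak (the guess probability); with all conditions certain, leaky_and returns 1 minus the slip.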

SLIDE 38

Wrapping up

  • Machine learning and uncertainty handling are important aspects for ITS in the long run.
  • Many approaches and techniques.
  • Bayesian networks are good, given the formal basis in probability theory and their ability to control complexity.
  • BN does not always mean Bayesian networks – mostly belief networks!
  • But not flexible for many applications... We will see some other models later.

SLIDE 39

Fuzzy Logic

  • Another aspect of uncertainty handling
  • Fuzziness in many concepts

    – probability not appropriate to capture it
    – tall, short, warm, etc.

  • Fuzzy rules

    – Not easy to provide complete interdependencies
    – If clothes are dirty, detergent = high
    – If cloth_quantity is high, detergent is high

SLIDE 40

  • Generalise from crisp sets to fuzzy sets
  • Usually, we follow sets, with fuzzy boundaries

    – Tall, medium, short, costly, cheap, etc.

  • But mathematical sets are crisp.

    – x in A => x not in ~A
    – A U ~A = UnivSet
    – A ^ ~A = NullSet

  • Degree of membership.
SLIDE 41

Membership

[Figure: membership vs. marks, with curves labelled poor, average, good, excellent]

SLIDE 42

Membership

  • Each variable has multiple values, with overlap.
  • Suitable membership functions as per the domain.
  • John = 55 marks

    – good: 70, average: 50
    – Can have all values also.

  • Fuzzification of variables.
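Fuzzification maps a crisp value like "55 marks" to membership degrees in each fuzzy set. A sketch with triangular membership functions; the breakpoints (and hence the resulting degrees) are illustrative assumptions, not the slide's exact numbers:

```python
def triangular(x, a, b, c):
    """Triangular membership function rising from a, peaking at b,
    falling to zero at c. Breakpoints are illustrative."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Fuzzify John's crisp 55 marks against three overlapping sets.
marks = 55
memberships = {
    "poor":    triangular(marks, 0, 20, 45),
    "average": triangular(marks, 30, 50, 70),
    "good":    triangular(marks, 50, 70, 90),
}
```

Because the sets overlap, 55 marks belongs to both "average" and "good" with non-zero degrees at once, which is exactly the point of fuzzy sets.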
SLIDE 43

Fuzzy rules

  • If problem-1 is solved, rule-33 confidence is high
  • If problem-3 is solved, rule-33 confidence is marginal
  • Such simple rules can be used to build a student model
  • Similar rules for tutoring decisions.
SLIDE 44

….

  • An aggregator function combines the contributions from the various rules for an overall decision.
  • This will also be a fuzzy value

    – r33-confid marginal (0.3), high (0.7), etc.

  • A defuzzification process converts this into a crisp value

    – r33-confid adequate or not.
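The aggregate-then-defuzzify pipeline can be sketched with max-aggregation and a weighted-average (centroid-style) defuzzifier. The crisp value attached to each linguistic label is an illustrative assumption:

```python
def aggregate_max(rule_outputs):
    """Combine rule outputs for one variable by taking, per label,
    the maximum degree any rule assigned to it."""
    agg = {}
    for label, degree in rule_outputs:
        agg[label] = max(agg.get(label, 0.0), degree)
    return agg

def defuzzify(agg, crisp_values):
    """Weighted-average defuzzification: each label contributes its
    (assumed) representative crisp value, weighted by its degree."""
    num = sum(agg[label] * crisp_values[label] for label in agg)
    den = sum(agg.values())
    return num / den

# e.g. r33-confid: marginal (0.3), high (0.7) -> one crisp confidence
agg = aggregate_max([("marginal", 0.3), ("high", 0.7)])
confidence = defuzzify(agg, {"marginal": 0.3, "high": 0.9})
```

The crisp output can then be thresholded to decide whether r33-confid is adequate or not.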

SLIDE 45

Thank you....

SLIDE 46

Every CPD for the entire BN can be calculated.

If the student answered the question correctly, then consider the concept known.

Similarly, if the student answered the question incorrectly, then consider the concept unknown.

The probability of each concept being known, p(ai = known), can then be determined.

Moreover, we can also compute p(ai=known,