Quantitative Methods Assignment 1

Instructor: Xi Chen Due date: Oct. 17

  • 1. Consider the training examples shown in Figure 1 for a binary classification problem.

Figure 1: Data set for Exercise 1

(a) Compute the Gini index for the overall collection of training examples.
(b) Compute the Gini index for the Customer ID attribute.
(c) Compute the Gini index for the Gender attribute.
(d) Compute the Gini index for the Car Type attribute.
(e) Compute the Gini index for the Shirt Size attribute.
(f) Which attribute is better: Gender, Car Type, or Shirt Size?
(g) Explain why Customer ID should not be used as the attribute test condition even though it has the lowest Gini index.
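The quantities in parts (a)–(e) can be checked numerically once the table is transcribed. A minimal Python sketch, using made-up placeholder labels rather than the actual data in Figure 1 (which is not reproduced here):

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_of_split(values, labels):
    """Weighted average Gini index of the partitions induced by an attribute."""
    n = len(labels)
    total = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        total += len(subset) / n * gini(subset)
    return total

# Hypothetical toy data (NOT the data from Figure 1):
labels = ["+", "+", "-", "-", "-", "-"]
gender = ["M", "M", "M", "F", "F", "F"]
print(round(gini(labels), 4))                   # → 0.4444 (overall collection)
print(round(gini_of_split(gender, labels), 4))  # → 0.2222 (split on Gender)
```

The same `gini_of_split` call works for every nominal attribute in the table, including Customer ID, which is what makes part (g) interesting.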

  • 2. Consider the training examples shown in Figure 2 for a binary classification problem.

Figure 2: Data set for Exercise 2

(a) What is the entropy of this collection of training examples with respect to the positive class?
(b) What are the information gains of a1 and a2 relative to these training examples?
(c) For a3, which is a continuous attribute, compute the information gain for every possible split.
(d) What is the best split (among a1, a2, and a3) according to the information gain?
(e) What is the best split (between a1 and a2) according to the classification error rate?
(f) What is the best split (between a1 and a2) according to the Gini index?
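The entropy and information-gain computations, including the threshold scan needed for the continuous attribute a3, can be sketched as follows. The arrays below are hypothetical placeholders, not the values from Figure 2:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: -sum of p * log2(p) over class proportions."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def best_threshold(xs, labels):
    """Scan candidate thresholds (midpoints between sorted distinct values)."""
    cand = sorted(set(xs))
    best_gain, best_t = -1.0, None
    for lo, hi in zip(cand, cand[1:]):
        t = (lo + hi) / 2
        g = info_gain([x <= t for x in xs], labels)
        if g > best_gain:
            best_gain, best_t = g, t
    return best_gain, best_t

# Hypothetical toy data (NOT the data from Figure 2):
labels = ["+", "+", "-", "-"]
a1 = ["T", "T", "T", "F"]
a3 = [1.0, 3.0, 4.0, 5.0]
print(round(entropy(labels), 4))   # → 1.0
print(round(info_gain(a1, labels), 4))
print(best_threshold(a3, labels))  # → (1.0, 3.5)
```

For part (c), calling `info_gain([x <= t for x in xs], labels)` at each candidate threshold reproduces the per-split table the question asks for.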

  • 3. Consider the following data set in Figure 3 for a binary classification problem.

Figure 3: Data set for Exercise 3

(a) Calculate the information gain when splitting on A and B. Which attribute would the decision tree induction algorithm choose?


(b) Calculate the gain in the Gini index when splitting on A and B. Which attribute would the decision tree induction algorithm choose?

(c) In the lecture we showed that entropy and the Gini index are both monotonically increasing on the range [0, 0.5] and both monotonically decreasing on the range [0.5, 1]. Is it possible that information gain and the gain in the Gini index favor different attributes? Explain.
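For parts (a) and (b), both gains have the same "parent impurity minus weighted child impurity" form, so a single helper parameterized by the impurity measure suffices. A sketch with placeholder data (not the data set in Figure 3):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gain(values, labels, impurity):
    """Impurity of the parent minus the weighted impurity of the children."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * impurity(subset)
    return impurity(labels) - remainder

# Hypothetical toy data (NOT the data set in Figure 3):
labels = ["+", "+", "+", "-", "-", "-"]
A = ["T", "T", "F", "F", "F", "T"]
B = ["T", "T", "T", "F", "F", "F"]
for name, attr in [("A", A), ("B", B)]:
    print(name,
          round(gain(attr, labels, entropy), 4),  # information gain
          round(gain(attr, labels, gini), 4))     # gain in Gini index
```

Running both criteria side by side on the actual data from Figure 3 is a quick way to check the answer to part (c) empirically.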

  • 4. (Bonus Question) Show that the entropy of a node never increases after splitting it into smaller successor nodes.
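One standard route for the bonus question, stated here as a hint rather than a full proof: entropy is concave in the class-distribution vector, and the parent's distribution is the weighted mixture of its children's distributions, so Jensen's inequality applies.

```latex
% Split a node with class distribution p into children with
% distributions p_i and weights w_i = n_i / n, so that p = \sum_i w_i p_i.
% Concavity of H together with Jensen's inequality gives
H(p) \;=\; H\!\Big(\sum_i w_i\, p_i\Big) \;\ge\; \sum_i w_i\, H(p_i),
% i.e. the weighted entropy after the split never exceeds the
% entropy of the parent node.
```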