COMP4418, – Assignment 3

Worth: 15%

1. [20 Marks] (Social Choice and Game Theory)

(a)

a b c

d

e f g

In the tournament in the above Figure, assuming all the arcs missing from the figure are

downward arc, list

• the uncovered set;

• the top cycle;

• the set of Copeland winners;

• the set of Banks winners; and

• the set of Condorcet winners.

(b) Compute all the Nash equilibria of the following two player game.

D E

A 2, 4 8, 5

B 6, 6 4, 4

1

2. [30 Marks] (Decision Making)

(a) For each of the following games, choose the best model among (A) Markov chain (Markov

process); (B) Markov decision process (MDP); (C) Hidden Markov model (HMM); (D)

Partially-observable Markov decision process (POMDP); and (E) None/Other.

• Blackjack

• Candy Crush

• Chess

• Minesweeper

• Snakes and Ladders

• Texas Hold ’em Poker

For the next questions, consider the Markov Decision Process depicted below. Edges are labelled

“name of the action (reward associated), probability of the transition”.

Stay (1), 1 s1 s2 Stay (0), 1 s3

Leave (0), 1

Leave (5), 0.5

Leave (5), 0.5 Stay (−2), 1

Leave (−2), 1

(b) Using your intuition, give an optimal policy for situations where the discount factor is very

high (for instance, δ = 0.999)? Explain your reasoning in two or three sentences.

(c) Using your intuition, give an optimal policy for situations where the discount factor is very

low (for instance, δ = 0.001)? Explain your reasoning in two or three sentences.

(d) Represent the values computed during the first three iterations of the Value Iteration algorithm

using the following format where L represents the action Leave and S represents the action

Stay. Use a discounting factor of 0.6.

V0(s) V0(s, S) V0(s, L) V1(s) V1(s, S) V1(s, L) V2(s) V2(s, S) V2(s, L) V3(s)

s1 0 1 . . .

s2 0 . . .

s3 0

(e) Let π be the following policy: π(s1) = L, π(s2) = L, π(s3) = S. If π is assumed to hold, the

MDP turns into a Markov Chain. Represent this Markov Chain / Markov Process.

(f) Assuming the agent uses π, express the value associated to each state as a function of the

discount factor δ. Provide the formal derivation of the result as part of your answer. Elaborate

on whether the computations of this question support the intuition of questions 2b and 2c.

2

Submission

• Put your written solutions in a single PDF file assn3.pdf

• Submit using the command: give cs4418 assn3 assn3.pdf

Late Submissions

Due to the assignment due date being extended to the 22nd November there will be no late submissions

allowed.