Question 1.

(1)

Agent Type / Performance Measure / Environment / Actuators / Sensors
Robot soccer player / Winning game, goals for/against / Field, ball, own team, other team, own body / Devices (e.g., legs) for locomotion and kicking / Camera, touch sensors, accelerometers, orientation sensors, wheel/joint encoders
Internet book-shopping agent / Obtain requested/interesting books, minimize expenditure / Internet / Follow link, enter/submit data in fields, display to user / Web pages, user requests

Agent types and their PEAS descriptions

(2)

Environment properties are given in the table below. Suitable agent types:

a. A model-based reflex agent would suffice for most aspects; for tactical play, a utility-based agent with lookahead would be useful.

b. A goal-based agent would be appropriate for specific book requests. For more open-ended tasks—e.g., “Find me something interesting to read”—tradeoffs are involved and the agent must compare utilities for various possible purchases.

Task Environment / Observable / Deterministic / Episodic / Static / Discrete / Agents
Robot soccer / Partially / Stochastic / Sequential / Dynamic / Continuous / Multi
Internet book-shopping / Partially / Deterministic / Sequential / Static / Discrete / Single

Environment properties

Question 2.

a. The algorithm expands nodes in order of increasing path cost; therefore the first goal it encounters will be the goal with the cheapest cost.

b. It will behave just like iterative deepening: d iterations are required, during which O(b^d) nodes are generated.

c. d/ε
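Parts a–c describe an iterative, cost-limited analog of iterative deepening: each pass is a depth-first search bounded by a path-cost limit, and the next limit is the smallest path cost pruned in the current pass. A minimal Python sketch under that reading (the graph encoding, a successors function returning (child, step-cost) pairs, is assumed purely for illustration):

import math

def iterative_lengthening(start, is_goal, successors):
    """Depth-first search repeated with an increasing path-cost limit.
    The next limit is the smallest path cost pruned in the current pass,
    so the first goal found within a limit is a cheapest goal."""
    limit = 0.0
    while limit < math.inf:
        next_limit = math.inf

        def dfs(node, g):
            nonlocal next_limit
            if g > limit:
                next_limit = min(next_limit, g)   # remember cheapest pruned cost
                return None
            if is_goal(node):
                return g                          # cost of the goal found
            for child, step in successors(node):
                found = dfs(child, g + step)
                if found is not None:
                    return found
            return None

        found = dfs(start, 0.0)
        if found is not None:
            return found
        limit = next_limit
    return None                                   # no goal reachable

# Hypothetical toy tree: successors(n) yields (child, step-cost) pairs.
tree = {'S': [('A', 1), ('B', 5)], 'A': [('G', 4)], 'B': [('G', 1)], 'G': []}
print(iterative_lengthening('S', lambda n: n == 'G', lambda n: tree[n]))  # 5

Because the next limit is the smallest pruned cost, any goal found inside a limit is a cheapest goal, which is the optimality argument in part a.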

Question 3.

A heuristic is consistent iff, for every node n and every successor n′ of n generated by any action a,

h(n) ≤ c(n, a, n′) + h(n′)

One simple proof is by induction on the number k of nodes on the shortest path to any goal from n. For k = 1, let n′ be the goal node; then, since h(n′) = 0 for a goal, consistency gives h(n) ≤ c(n, a, n′) = h*(n). For the inductive case, assume n′ is on the shortest path k steps from the goal and that h(n′) is admissible by the inductive hypothesis; then

h(n) ≤ c(n, a, n′) + h(n′) ≤ c(n, a, n′) + h*(n′) = h*(n)

so h(n) at k + 1 steps from the goal is also admissible.
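As a numerical illustration of the claim (not a replacement for the proof), one can check both properties on a toy graph; the graph and heuristic values below are invented for the example:

import heapq

# Toy weighted digraph: edges[n] = list of (successor, step_cost) pairs.
edges = {
    'A': [('B', 1), ('C', 4)],
    'B': [('C', 2), ('G', 5)],
    'C': [('G', 1)],
    'G': [],
}
h = {'A': 3, 'B': 2, 'C': 1, 'G': 0}  # consistent by construction

def true_costs(goal):
    """h*(n): cheapest cost from each node to the goal (Dijkstra on reversed edges)."""
    rev = {n: [] for n in edges}
    for n, succs in edges.items():
        for m, c in succs:
            rev[m].append((n, c))
    dist = {goal: 0}
    pq = [(0, goal)]
    while pq:
        d, n = heapq.heappop(pq)
        if d > dist.get(n, float('inf')):
            continue
        for m, c in rev[n]:
            if d + c < dist.get(m, float('inf')):
                dist[m] = d + c
                heapq.heappush(pq, (d + c, m))
    return dist

consistent = all(h[n] <= c + h[m] for n, ss in edges.items() for m, c in ss)
hstar = true_costs('G')
admissible = all(h[n] <= hstar[n] for n in hstar)
print(consistent, admissible)  # True True: consistency implies admissibility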

Question 4.

The exact steps depend on certain choices you are free to make.

a. Choose the X3 variable. Its domain is {0, 1}.

b. Choose the value 1 for X3. (We can’t choose 0; it wouldn’t survive forward checking, because it would force F to be 0, and the leading digit of the sum must be non-zero.)

c. Choose F, because it has only one remaining value.

d. Choose the value 1 for F.

e. Now X2 and X1 are tied for minimum remaining values at 2; let’s choose X2.

f. Either value survives forward checking; let’s choose 0 for X2.

g. Now X1 has the minimum remaining values.

h. Again, arbitrarily choose 0 for the value of X1.

i. The variable O must be an even number (because it is the sum of T + T) less than 5 (because O + O = R + 10 × 0). That makes it most constrained.

j. Arbitrarily choose 4 as the value of O.

k. R now has only 1 remaining value.

l. Choose the value 8 for R.

m. T now has only 1 remaining value.

n. Choose the value 7 for T.

o. U must be an even number less than 9; choose U.

p. The only value for U that survives forward checking is 6.

q. The only variable left is W.

r. The only value left for W is 3.

s. This is a solution.

This is a rather easy (under-constrained) puzzle, so it is not surprising that we arrive at a solution with no backtracking (given that we are allowed to use forward checking).
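The puzzle here is TWO + TWO = FOUR, with carry variables X1–X3 and non-zero leading digits T and F. A brute-force enumeration, offered purely as a cross-check on the walkthrough above, confirms the assignment reached in step s (734 + 734 = 1468) and shows that several solutions exist, which is consistent with the puzzle being under-constrained:

from itertools import permutations

# Cross-check: enumerate all digit assignments for TWO + TWO = FOUR.
for t, w, o, f, u, r in permutations(range(10), 6):
    if t == 0 or f == 0:                      # leading digits must be non-zero
        continue
    two = 100 * t + 10 * w + o
    four = 1000 * f + 100 * o + 10 * u + r
    if two + two == four:
        print(f"{two} + {two} = {four}")      # includes 734 + 734 = 1468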

Question 5.

a. The game tree, complete with annotations of all minimax values, is shown in the following figure. The “?” values are handled by assuming that an agent with a choice between winning the game and entering a “?” state will always choose the win. That is, min(–1,?) is –1 and max(+1,?) is +1. If all successors are “?”, the backed-up value is “?”.

A game tree for the four-square game. Terminal states are in single boxes, loop states in double boxes. Each state is annotated with its minimax value in a circle.

b. Standard minimax is depth-first and would go into an infinite loop. It can be fixed by comparing the current state against the stack and, if the state is repeated, returning a “?” value. Propagation of “?” values is handled as above. Although this works here, it does not always work, because it is not clear how to compare “?” with a drawn position; nor is it clear how to handle the comparison when there are wins of different degrees (as in backgammon). Finally, in games with chance nodes, it is unclear how to compute the average of a number and a “?”. Note that it is not correct to treat repeated states automatically as drawn positions; in this example, both (1,4) and (2,4) repeat in the tree but they are won positions.
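A short sketch of this fix, with “?” represented explicitly. The state encoding is abstract, and the code completes the propagation rules from part a in one natural way (each player prefers a win to a loop, and a loop to a loss):

UNKNOWN = '?'

def minimax(state, player, stack, successors, terminal_value):
    """player is +1 (MAX) or -1 (MIN); returns +1, -1, or UNKNOWN."""
    if state in stack:                        # state repeated on the current path
        return UNKNOWN
    v = terminal_value(state)                 # +1 or -1 at terminals, else None
    if v is not None:
        return v
    values = [minimax(s, -player, stack | {state}, successors, terminal_value)
              for s in successors(state, player)]
    if player in values:                      # a guaranteed win is available
        return player
    if UNKNOWN in values:                     # no win, but a loop can be forced
        return UNKNOWN
    return -player                            # every successor is a loss

# Tiny hypothetical game: from 'A', MAX can win at 'W' or enter a loop via 'L'.
succ = {'A': ['W', 'L'], 'L': ['A'], 'W': []}
term = lambda s: +1 if s == 'W' else None
print(minimax('A', +1, set(), lambda s, p: succ[s], term))   # prints 1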

What is really happening is that each state has a well-defined but initially unknown value. These unknown values are related by the minimax equation at the bottom of page 163. If the game tree is acyclic, then the minimax algorithm solves these equations by propagating from the leaves. If the game tree has cycles, then a dynamic programming method must be used, as explained in Chapter 17. (Exercise 17.8 studies this problem in particular.) These algorithms can determine whether each node has a well-determined value (as in this example) or is really an infinite loop in that both players prefer to stay in the loop (or have no choice). In such a case, the rules of the game will need to define the value (otherwise the game will never end). In chess, for example, a state that occurs 3 times (and hence is assumed to be desirable for both players) is a draw.
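A minimal sketch of that dynamic-programming idea for win/loss games with cycles, again with a hypothetical encoding: start from the terminal values and repeatedly apply the minimax equations until nothing changes; states still unvalued at the fixed point are exactly the infinite-loop states.

def solve_cyclic_game(states, successors, player, terminal_value):
    """player(s) is +1 (MAX to move) or -1 (MIN); returns a dict of values."""
    value = {s: terminal_value(s) for s in states}    # None where still unknown
    changed = True
    while changed:
        changed = False
        for s in states:
            if value[s] is not None:
                continue
            vals = [value[t] for t in successors(s)]
            if player(s) == +1 and +1 in vals:        # MAX can move to a won state
                value[s], changed = +1, True
            elif player(s) == -1 and -1 in vals:      # MIN can move to a won state
                value[s], changed = -1, True
            elif vals and all(v is not None for v in vals):
                value[s] = max(vals) if player(s) == +1 else min(vals)
                changed = True
    return value       # states still None are loops both players accept

# Example with a cycle: A (MAX) -> {W, B}; B (MIN) -> {A}; W is a MAX win.
states = ['A', 'B', 'W']
succ = {'A': ['W', 'B'], 'B': ['A'], 'W': []}
vals = solve_cyclic_game(states, lambda s: succ[s],
                         lambda s: +1 if s == 'A' else -1,
                         lambda s: +1 if s == 'W' else None)
print(vals)   # {'A': 1, 'B': 1, 'W': 1}: B's only move returns to A, where MAX wins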