Minor change on greedy policy variable usage

Chapter 18: in 'epsilon_greedy_policy', use the 'n_outputs' variable defined earlier instead of the hardcoded '2'.
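
To illustrate the intent of the change, here is a minimal, self-contained sketch. It assumes 'n_outputs' holds the number of discrete actions (set to 2 here purely for illustration), and it passes in a hypothetical 'dummy_q_values' function where the notebook calls 'model.predict'; the 'q_values_fn' and 'n_actions' parameters are not in the notebook and are added only so the snippet runs on its own.

```python
import numpy as np

# Assumption: n_outputs is the number of discrete actions, defined once earlier.
n_outputs = 2


def epsilon_greedy_policy(q_values_fn, state, epsilon=0.0, n_actions=n_outputs):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.rand() < epsilon:
        # Explore: sample from the full action range rather than a hardcoded 2.
        return int(np.random.randint(n_actions))
    # Exploit: add a batch dimension, then take the action with the highest Q-value.
    q_values = q_values_fn(state[np.newaxis])
    return int(np.argmax(q_values[0]))


def dummy_q_values(batch):
    # Hypothetical stand-in for model.predict: the Q-value grows with the action index.
    return np.tile(np.arange(n_outputs, dtype=float), (len(batch), 1))
```

With this shape, changing the size of the action space only requires updating 'n_outputs' in one place; the exploration branch follows automatically.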
B D 2021-02-28 12:02:23 +01:00 committed by GitHub
parent 0eb31f77c2
commit 64f0e05a94
1 changed file with 1 addition and 1 deletion

@@ -1306,7 +1306,7 @@
 "source": [
 "def epsilon_greedy_policy(state, epsilon=0):\n",
 "    if np.random.rand() < epsilon:\n",
-"        return np.random.randint(2)\n",
+"        return np.random.randint(n_outputs)\n",
 "    else:\n",
 "        Q_values = model.predict(state[np.newaxis])\n",
 "        return np.argmax(Q_values[0])"