From 80f6cb27c080282e75a6036991c99736f6b13bba Mon Sep 17 00:00:00 2001
From: 8bitmp3 <19637339+8bitmp3@users.noreply.github.com>
Date: Sat, 17 Oct 2020 15:04:51 +0100
Subject: [PATCH] Update (small) the reinforcement learning chapter

---
 18_reinforcement_learning.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/18_reinforcement_learning.ipynb b/18_reinforcement_learning.ipynb
index e6d3717..726a137 100644
--- a/18_reinforcement_learning.ipynb
+++ b/18_reinforcement_learning.ipynb
@@ -565,7 +565,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Let's create a neural network that will take observations as inputs, and output the action to take for each observation. To choose an action, the network will estimate a probability for each action, then we will select an action randomly according to the estimated probabilities. In the case of the Cart-Pole environment, there are just two possible actions (left or right), so we only need one output neuron: it will output the probability `p` of the action 0 (left), and of course the probability of action 1 (right) will be `1 - p`."
+    "Let's create a neural network that will take observations as inputs, and output the probabilities of actions to take for each observation. To choose an action, the network will estimate a probability for each action, then we will select an action randomly according to the estimated probabilities. In the case of the Cart-Pole environment, there are just two possible actions (left or right), so we only need one output neuron: it will output the probability `p` of the action 0 (left), and of course the probability of action 1 (right) will be `1 - p`."
    ]
   },
   {
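For reference, a minimal sketch (not part of the patch) of the kind of policy network the edited cell describes, assuming TensorFlow/Keras as used elsewhere in the notebook; the hidden-layer size and the placeholder observation are illustrative choices, not taken from the chapter:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

n_inputs = 4  # Cart-Pole observation: [position, velocity, angle, angular velocity]

# One sigmoid output neuron estimating p, the probability of action 0 (left);
# the probability of action 1 (right) is then 1 - p.
model = keras.models.Sequential([
    keras.layers.Dense(5, activation="elu", input_shape=[n_inputs]),  # illustrative size
    keras.layers.Dense(1, activation="sigmoid"),
])

# Select an action randomly according to the estimated probability.
obs = np.zeros(n_inputs, dtype=np.float32)  # placeholder observation (assumption)
p_left = float(model(obs[np.newaxis])[0, 0])
action = 0 if np.random.rand() < p_left else 1  # 0 = left, 1 = right
```

Sampling rather than always taking the most probable action lets the agent keep exploring, which is why the cell selects actions randomly according to the estimated probabilities.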