Fix auto-fire, add exercises, explain Space Invaders delta

main
Aurélien Geron 2021-03-18 22:16:38 +13:00
parent 86702573c6
commit e9b5dce122
1 changed file with 94 additions and 8 deletions


@@ -2055,14 +2055,15 @@
"\n",
"class AtariPreprocessingWithAutoFire(AtariPreprocessing):\n",
" def reset(self, **kwargs):\n",
" super().reset(**kwargs)\n",
" return self.step(1)[0] # FIRE to start\n",
" obs = super().reset(**kwargs)\n",
" super().step(1) # FIRE to start\n",
" return obs\n",
" def step(self, action):\n",
" lives_before_action = self.ale.lives()\n",
" out = super().step(action)\n",
" obs, rewards, done, info = super().step(action)\n",
" if self.ale.lives() < lives_before_action and not done:\n",
" out = super().step(1) # FIRE to start after life lost\n",
" return out\n",
" super().step(1) # FIRE to start after life lost\n",
" return obs, rewards, done, info\n",
"\n",
"env = suite_atari.load(\n",
" environment_name,\n",
@@ -2791,12 +2792,97 @@
"time_step"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise Solutions"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. to 7.\n",
"\n",
"See Appendix A."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8.\n",
"_Exercise: Use policy gradients to solve OpenAI Gym's LunarLander-v2 environment. You will need to install the Box2D dependencies (`python3 -m pip install -U gym[box2d]`)._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TODO"
]
},
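{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the meantime, here is a minimal REINFORCE-style sketch of one possible approach (not the book's solution). It assumes TensorFlow 2 and `gym[box2d]` are installed, it plays a single episode per training iteration, and it will need many iterations, and probably some tuning, before it comes close to solving the environment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import gym\n",
"import numpy as np\n",
"import tensorflow as tf\n",
"from tensorflow import keras\n",
"\n",
"env = gym.make(\"LunarLander-v2\") # requires gym[box2d]\n",
"n_actions = env.action_space.n # 4 discrete actions\n",
"\n",
"# small policy network: observation (8 floats) -> action probabilities\n",
"model = keras.models.Sequential([\n",
"    keras.layers.Dense(64, activation=\"relu\",\n",
"                       input_shape=env.observation_space.shape),\n",
"    keras.layers.Dense(64, activation=\"relu\"),\n",
"    keras.layers.Dense(n_actions, activation=\"softmax\")\n",
"])\n",
"optimizer = keras.optimizers.Adam(learning_rate=0.005)\n",
"\n",
"def discounted_normalized_returns(rewards, gamma=0.99):\n",
"    returns = np.array(rewards, dtype=np.float32)\n",
"    for step in range(len(returns) - 2, -1, -1):\n",
"        returns[step] += gamma * returns[step + 1]\n",
"    return (returns - returns.mean()) / (returns.std() + 1e-8)\n",
"\n",
"for iteration in range(1000):\n",
"    states, actions, rewards = [], [], []\n",
"    obs, done = env.reset(), False\n",
"    while not done: # play one full episode with the current policy\n",
"        probs = model(obs[np.newaxis]).numpy()[0]\n",
"        probs /= probs.sum() # guard against float32 rounding\n",
"        action = np.random.choice(n_actions, p=probs)\n",
"        states.append(obs)\n",
"        actions.append(action)\n",
"        obs, reward, done, info = env.step(action)\n",
"        rewards.append(reward)\n",
"    returns = discounted_normalized_returns(rewards)\n",
"    indices = np.stack([np.arange(len(actions)), actions], axis=1)\n",
"    with tf.GradientTape() as tape:\n",
"        all_probs = model(np.array(states, dtype=np.float32))\n",
"        log_probs = tf.math.log(tf.gather_nd(all_probs, indices) + 1e-8)\n",
"        # REINFORCE: push up the log-probability of actions that led to\n",
"        # above-average returns, push it down otherwise\n",
"        loss = -tf.reduce_mean(log_probs * returns)\n",
"    grads = tape.gradient(loss, model.trainable_variables)\n",
"    optimizer.apply_gradients(zip(grads, model.trainable_variables))"
]
},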
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9.\n",
"_Exercise: Use TF-Agents to train an agent that can achieve a superhuman level at SpaceInvaders-v4 using any of the available algorithms._"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Please follow the steps in the [Using TF-Agents to Beat Breakout](http://localhost:8888/notebooks/18_reinforcement_learning.ipynb#Using-TF-Agents-to-Beat-Breakout) section above, replacing `\"Breakout-v4\"` with `\"SpaceInvaders-v4\"`. There will be a few things to tweak, however. For example, the Space Invaders game does not require the user to press FIRE to begin the game. Instead, the player's laser cannon blinks for a few seconds then the game starts automatically. For better performance, you may want to skip this blinking phase (which lasts about 40 steps) at the beginning of each episode and after each life lost. Indeed, it's impossible to do anything at all during this phase, and nothing moves. One way to do this is to use the following custom environment wrapper, instead of the `AtariPreprocessingWithAutoFire` wrapper:"
]
},
{
"cell_type": "code",
"execution_count": null,
"execution_count": 132,
"metadata": {},
"outputs": [],
"source": []
"source": [
"class AtariPreprocessingWithSkipStart(AtariPreprocessing):\n",
" def skip_frames(self, num_skip):\n",
" for _ in range(num_skip):\n",
" super().step(0) # NOOP for num_skip steps\n",
" def reset(self, **kwargs):\n",
" obs = super().reset(**kwargs)\n",
" self.skip_frames(40)\n",
" return obs\n",
" def step(self, action):\n",
" lives_before_action = self.ale.lives()\n",
" obs, rewards, done, info = super().step(action)\n",
" if self.ale.lives() < lives_before_action and not done:\n",
" self.skip_frames(40)\n",
" return obs, rewards, done, info"
]
},
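{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can then load the environment just like in the Breakout section, simply passing `AtariPreprocessingWithSkipStart` instead of `AtariPreprocessingWithAutoFire`. A sketch, assuming the same `max_episode_steps` and the `NoFrameskip` variant of the environment, as in the Breakout section:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from tf_agents.environments import suite_atari\n",
"from tf_agents.environments.atari_preprocessing import AtariPreprocessing\n",
"from tf_agents.environments.atari_wrappers import FrameStack4\n",
"\n",
"max_episode_steps = 27000 # <=> 108k ALE frames since 1 step = 4 frames\n",
"environment_name = \"SpaceInvadersNoFrameskip-v4\"\n",
"\n",
"env = suite_atari.load(\n",
"    environment_name,\n",
"    max_episode_steps=max_episode_steps,\n",
"    gym_env_wrappers=[AtariPreprocessingWithSkipStart, FrameStack4])"
]
},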
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Moreover, you should always ensure that the preprocessed images contain enough information to play the game. For example, the blasts from the laser cannon and from the aliens should still be visible despite the limited resolution. In this particular case, the preprocessing we did for Breakout still works fine for Space Invaders, but that's something you should always check if you want try other games. To do this, you can let the agent play randomly for a while, and record the preprocessed frames, then play the animation and ensure the game still looks playable.\n",
"\n",
"You will also need to let the agent train for quite a long time to get good performance. Sadly, the DQN algorithm is not able to reach superhuman level on Space Invaders, likely because humans are able to learn efficient long term strategies in this game, whereas DQN can only master fairly short strategies. But there has been a lot of progress over the past few years, and now many other RL algorithms are able to surpass human experts at this game. Check out the [State-of-the-Art for Space Invaders on paperswithcode.com](https://paperswithcode.com/sota/atari-games-on-atari-2600-space-invaders)."
]
},
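{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, here is a rough sketch of such a check. It assumes the `env` built above, plays 200 random steps (an arbitrary number), and animates the most recent frame of each stacked observation with Matplotlib:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.animation as animation\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"\n",
"frames = []\n",
"time_step = env.reset()\n",
"for _ in range(200):\n",
"    action = np.random.randint(env.action_spec().maximum + 1)\n",
"    time_step = env.step(action)\n",
"    if time_step.is_last():\n",
"        time_step = env.reset()\n",
"    # keep only the most recent of the 4 stacked 84x84 frames\n",
"    frames.append(time_step.observation[..., -1].copy())\n",
"\n",
"fig = plt.figure()\n",
"image = plt.imshow(frames[0], cmap=\"gray\")\n",
"plt.axis(\"off\")\n",
"\n",
"def update(frame):\n",
"    image.set_data(frame)\n",
"    return [image]\n",
"\n",
"anim = animation.FuncAnimation(fig, update, frames=frames, interval=30)\n",
"plt.show()"
]
},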
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.\n",
"_Exercise: If you have about $100 to spare, you can purchase a Raspberry Pi 3 plus some cheap robotics components, install TensorFlow on the Pi, and go wild! For an example, check out this [fun post](https://homl.info/2) by Lukas Biewald, or take a look at GoPiGo or BrickPi. Start with simple goals, like making the robot turn around to find the brightest angle (if it has a light sensor) or the closest object (if it has a sonar sensor), and move in that direction. Then you can start using Deep Learning: for example, if the robot has a camera, you can try to implement an object detection algorithm so it detects people and moves toward them. You can also try to use RL to make the agent learn on its own how to use the motors to achieve that goal. Have fun!_"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's your turn now: go crazy, be creative, but most of all, be patient and move forward step by step, you can do it!"
]
}
],
"metadata": {
@@ -2815,7 +2901,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.10"
"version": "3.7.9"
}
},
"nbformat": 4,