Frozen Lake

In this programming assignment we will be using the gym library from OpenAI (gym.openai.com) to implement some algorithms for Reinforcement Learning. The gym library has several environments that can be used to test and design reinforcement learning algorithms. In this lab we will use two of them, FrozenLake-v1 and FrozenLake8x8-v1. The problem is the same in both; only the size of the state space differs (4x4 versus 8x8). The problem is the following:

"Winter is still here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend. An episode ends when you reach the goal or fall in a hole. You receive a reward of 1 if you reach the goal, and zero otherwise."

The state space in FrozenLake-v1 is the following 4x4 grid:

SFFF
FHFH
FFFH
HFFG

- S is the start state; it is safe.
- F is frozen surface; it is safe.
- H is a hole; if you fall into one, the game is over.
- G is the goal state; if you reach it, you get the frisbee back and the game is over.

The red square indicates the current position in the state space.

The possible actions are:

LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3

However, because the ice is slippery, choosing RIGHT does not guarantee that you end up in the state to your right. You have to use Reinforcement Learning to find the best policy for retrieving your frisbee. Because your actions are not deterministic, it is really hard to find such a policy, but you must be able to find a policy that does better than a random walk.
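The slippery dynamics and the "better than a random walk" requirement can be illustrated without gym at all. The following is a minimal, self-contained sketch, not gym's code and not the required solution. It assumes that the intended move and each of the two perpendicular moves happen with probability 1/3, which is modelled on the slippery setting of gym's FrozenLake. It also assumes that walking into the edge of the grid leaves the agent in place and that episodes are capped at 100 steps. The helpers `build_transitions`, `step`, `success_rate` and `q_learning` are hypothetical names. The learner is plain tabular Q-learning with epsilon-greedy exploration, one standard RL method among several, and its hyperparameters (`alpha`, `gamma`, `epsilon`, episode counts) are illustrative choices only.

```python
import random

# 4x4 map from the assignment (row-major; state index = row * 4 + col).
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
# Row/column offset for each action.
MOVES = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def build_transitions(grid):
    """Return P[s][a] = list of (prob, next_state, reward, done).

    Assumed slippery dynamics: the intended action and the two actions
    perpendicular to it each happen with probability 1/3. Moving into a
    wall leaves the agent where it is.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    P = {}
    for s in range(n_rows * n_cols):
        r, c = divmod(s, n_cols)
        P[s] = {}
        for a in range(4):
            if grid[r][c] in "HG":
                # Terminal cell: the episode is already over.
                P[s][a] = [(1.0, s, 0.0, True)]
                continue
            outcomes = []
            # With the order LEFT, DOWN, RIGHT, UP, a-1 and a+1 (mod 4)
            # are exactly the two perpendicular directions.
            for actual in ((a - 1) % 4, a, (a + 1) % 4):
                dr, dc = MOVES[actual]
                nr = min(max(r + dr, 0), n_rows - 1)
                nc = min(max(c + dc, 0), n_cols - 1)
                cell = grid[nr][nc]
                reward = 1.0 if cell == "G" else 0.0
                outcomes.append((1.0 / 3.0, nr * n_cols + nc, reward, cell in "HG"))
            P[s][a] = outcomes
    return P


def step(P, s, a, rng):
    """Sample (next_state, reward, done) from the outcomes listed in P[s][a]."""
    outcomes = P[s][a]
    _, s2, reward, done = rng.choices(outcomes, weights=[o[0] for o in outcomes])[0]
    return s2, reward, done


def success_rate(P, policy, episodes, rng, max_steps=100):
    """Fraction of episodes (starting from S = state 0) that reach G."""
    wins = 0
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            s, reward, done = step(P, s, policy(s), rng)
            if done:
                if reward > 0:
                    wins += 1
                break
    return wins / episodes


def q_learning(P, n_states, episodes, rng, alpha=0.1, gamma=0.99,
               epsilon=0.1, max_steps=100):
    """Tabular Q-learning with epsilon-greedy exploration (random tie-breaks)."""
    Q = [[0.0] * 4 for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                a = rng.randrange(4)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(4) if Q[s][i] == best])
            s2, reward, done = step(P, s, a, rng)
            # Terminal states have no future value.
            target = reward if done else reward + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
            if done:
                break
    return Q


if __name__ == "__main__":
    rng = random.Random(0)
    P = build_transitions(MAP_4X4)
    # RIGHT from the start: right (state 1), down (state 4) or bump the top wall (state 0).
    print(sorted(s2 for _, s2, _, _ in P[0][RIGHT]))
    random_rate = success_rate(P, lambda s: rng.randrange(4), 2000, rng)
    Q = q_learning(P, 16, 10000, rng)
    learned_rate = success_rate(P, lambda s: max(range(4), key=lambda a: Q[s][a]), 2000, rng)
    print(f"random walk: {random_rate:.3f}  greedy Q-learning: {learned_rate:.3f}")
```

Running the file prints both success rates. The exact numbers depend on the seed and hyperparameters, but the greedy policy read off the learned Q-table should land well above the random walk. The 8x8 lake would only need a different map, `n_states=64`, and probably a larger episode budget and step cap.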
