RL and Deep RL
RL + Deep Learning = Deep RL
Reference: David Silver’s tutorial on deep reinforcement learning
RL is a framework for sequential decision making (its mathematical foundation is the Bellman equation). The core idea is to select the agent’s action that is most likely to achieve the highest cumulative reward, given the current environment state.
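For reference, in standard notation the Bellman optimality equation for the optimal state-value function can be written as

$$
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma\, v_*(s') \,\bigr],
$$

where $p$ is the environment dynamics and $\gamma \in [0, 1)$ is the discount factor.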
Deep Learning is a framework for representation learning. The core idea is to automatically learn the (usually highly complex) representation of the raw input that is required to achieve a given objective.
Three key components of RL are
- Policy: how the agent selects an action in a given state
- Value function: how good a state or an action is
- Model: how the agent represents the environment
Each of these three components can adopt the deep learning framework to learn its representation, as elaborated below.
Policy-based Deep RL
One good introduction to policy-based deep RL is Karpathy’s post.
Core idea: Use a neural network to estimate the policy (input: state; output: the probability of each action).
Start from a policy → compute the expected final reward → compute the gradient of the expected reward with respect to the policy parameters → update the policy in the gradient direction.
Key weapon: Policy Gradients
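As a rough illustration, here is a minimal REINFORCE-style policy gradient sketch in NumPy. The toy “corridor” environment, the one-hot state features, and all hyperparameters are made up for illustration; they do not come from Karpathy’s post or Silver’s tutorial.

```python
# Minimal REINFORCE sketch (policy gradients) on a toy "corridor" MDP.
import numpy as np

N_STATES, N_ACTIONS, GAMMA, LR = 4, 2, 0.99, 0.1
rng = np.random.default_rng(0)
W = np.zeros((N_ACTIONS, N_STATES))          # policy parameters (softmax logits)

def policy(s):
    """Action probabilities for state s (softmax over linear logits)."""
    logits = W[:, s]
    e = np.exp(logits - logits.max())
    return e / e.sum()

def step(s, a):
    """Toy dynamics: action 1 moves right, action 0 moves left; reward 1 at the end."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(500):
    # 1) Run one episode with the current policy.
    s, trajectory, done, t = 0, [], False, 0
    while not done and t < 50:
        a = rng.choice(N_ACTIONS, p=policy(s))
        s_next, r, done = step(s, a)
        trajectory.append((s, a, r))
        s, t = s_next, t + 1

    # 2) Compute the discounted return G_t from each time step.
    G, returns = 0.0, []
    for (_, _, r) in reversed(trajectory):
        G = r + GAMMA * G
        returns.append(G)
    returns.reverse()

    # 3) Ascend the gradient of the expected return:
    #    for a softmax policy, grad log pi(a_t|s) over the logits is
    #    (onehot(a_t) - pi(.|s)), applied to the column of W indexed by s.
    for (s_t, a_t, _), G_t in zip(trajectory, returns):
        grad_log = -policy(s_t)
        grad_log[a_t] += 1.0
        W[:, s_t] += LR * G_t * grad_log

print("Learned action probabilities per state:")
print(np.array([policy(s) for s in range(N_STATES)]).round(2))
```

The key step is (3): the policy is nudged so that actions followed by high returns become more probable.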
Value-based Deep RL
One good introduction to value-based deep RL is the post from Nervana.
Core idea: Use a neural network to estimate the optimal value function, i.e., the maximum value achievable under any policy.
Key weapon: Q-network
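Below is a similarly minimal Q-network sketch: a one-hidden-layer network trained with the Q-learning (TD) target on the same toy corridor environment as above. The network size, learning rate, and epsilon are illustrative guesses, not DQN’s actual settings.

```python
# Minimal Q-network sketch: one hidden layer, trained with the TD target.
import numpy as np

N_STATES, N_ACTIONS, HIDDEN, GAMMA, LR, EPS = 4, 2, 16, 0.99, 0.05, 0.1
rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (HIDDEN, N_STATES)); b1 = np.zeros(HIDDEN)
W2 = rng.normal(0, 0.1, (N_ACTIONS, HIDDEN)); b2 = np.zeros(N_ACTIONS)

def onehot(s):
    x = np.zeros(N_STATES); x[s] = 1.0; return x

def q_values(s):
    """Forward pass: state -> Q-value for every action."""
    h = np.maximum(0.0, W1 @ onehot(s) + b1)      # ReLU hidden layer
    return W2 @ h + b2, h

def step(s, a):
    """Same toy dynamics as in the policy gradient sketch."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(2000):
    s, done, t = 0, False, 0
    while not done and t < 50:
        q, h = q_values(s)
        a = rng.integers(N_ACTIONS) if rng.random() < EPS else int(q.argmax())
        s_next, r, done = step(s, a)

        # TD target: r + gamma * max_a' Q(s', a'), zero bootstrap at terminal.
        q_next, _ = q_values(s_next)
        target = r + (0.0 if done else GAMMA * q_next.max())
        td_error = target - q[a]                  # only the taken action is updated

        # Manual backprop of 0.5 * td_error^2 through the taken action's output.
        grad_out = np.zeros(N_ACTIONS); grad_out[a] = -td_error
        grad_W2 = np.outer(grad_out, h)
        grad_pre = (W2.T @ grad_out) * (h > 0)    # ReLU derivative
        grad_W1 = np.outer(grad_pre, onehot(s))

        W2 -= LR * grad_W2; b2 -= LR * grad_out
        W1 -= LR * grad_W1; b1 -= LR * grad_pre
        s, t = s_next, t + 1

print("Greedy action per state:", [int(q_values(s)[0].argmax()) for s in range(N_STATES)])
```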
Model-based Deep RL
Core idea: Learn a model of the environment, then plan using that model.
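A minimal sketch of this idea on the same toy environment: estimate the transition and reward model from experience, then plan (value iteration) on the learned model. In deep model-based RL the learned model would itself be a neural network; a tabular model is used here only to keep the sketch short.

```python
# Minimal model-based sketch: fit a model from experience, then plan on it.
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.99
rng = np.random.default_rng(0)

def step(s, a):
    """Same toy dynamics as in the sketches above."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

# 1) Collect experience with a random policy and fit the model by counting.
counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
reward_sum = np.zeros((N_STATES, N_ACTIONS))
for _ in range(5000):
    s, a = rng.integers(N_STATES - 1), rng.integers(N_ACTIONS)
    s_next, r = step(s, a)
    counts[s, a, s_next] += 1
    reward_sum[s, a] += r
counts[N_STATES - 1, :, N_STATES - 1] = 1.0            # terminal: absorbing, reward 0

P = counts / counts.sum(axis=2, keepdims=True)         # estimated P(s' | s, a)
R = reward_sum / counts.sum(axis=2)                     # estimated E[r | s, a]

# 2) Plan: value iteration on the learned model.
V = np.zeros(N_STATES)
for _ in range(100):
    Q = R + GAMMA * (P @ V)                             # Q[s, a] under the model
    V = Q.max(axis=1)

print("Planned greedy action per non-terminal state:", Q.argmax(axis=1)[:-1])
```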