
RL and Deep RL

05 Sep 2016



RL + Deep Learning = Deep RL

Reference: David Silver’s tutorial

RL is a framework for decision making (its mathematical foundation is the Bellman equation). The core idea is to select the agent’s action that is most likely to achieve the highest ultimate reward, given the current state of the environment.
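For concreteness, the Bellman optimality equation behind this framework can be written as follows (standard notation assumed here, not taken from the tutorial: R is the reward, γ the discount factor, and P the state-transition probability):

```latex
V^*(s) = \max_{a} \Big[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V^*(s') \Big]
```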

Deep Learning is a framework for representation learning. The core idea is to automatically learn the (usually highly complex) representation of the raw input that is required to achieve a given objective.
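As a minimal sketch of what “learning the representation” means (the shapes and random weights below are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=4)           # raw input, e.g. a flattened observation
W1 = rng.normal(size=(16, 4))    # first layer: learns an intermediate representation
W2 = rng.normal(size=(2, 16))    # second layer: maps the representation to the objective

h = np.maximum(0.0, W1 @ x)      # hidden representation (ReLU)
y = W2 @ h                       # task-specific output, e.g. action scores
print(h.shape, y.shape)          # (16,) (2,)
```

In a real system the weights are trained end to end, so the representation h is shaped by the objective rather than hand-designed.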

The three key components of RL are:

  • Policy: how to select an action in a given state
  • Value function: how good a state or an action is
  • Model: how to represent the environment

Each of these three components can use deep learning to learn its representation, as elaborated below.
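To make the separation concrete, here is a minimal sketch of the three components as interfaces (the class and method names are hypothetical, not from the tutorial):

```python
class Policy:
    def act(self, state):
        """Return a probability distribution over actions for this state."""
        raise NotImplementedError

class ValueFunction:
    def value(self, state, action=None):
        """Return how good the state (or state-action pair) is."""
        raise NotImplementedError

class Model:
    def step(self, state, action):
        """Predict the next state and reward, i.e. represent the environment."""
        raise NotImplementedError
```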

Policy-based Deep RL

One good introduction to policy-based deep RL is Karpathy’s post.

Core idea: use a neural network to estimate the policy (input: the state; output: a probability for each action).

A policy -> its expected final reward -> the gradient of the expected reward -> update the policy in the gradient direction.

Key weapon: Policy Gradients
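A minimal sketch of that chain, using REINFORCE on a toy two-armed bandit rather than the Pong example from Karpathy’s post (the problem and all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # hidden expected reward of each action
theta = np.zeros(2)                 # policy parameters (action preferences)
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(2000):
    probs = softmax(theta)                 # policy: P(action | theta)
    a = rng.choice(2, p=probs)             # sample an action
    r = rng.normal(true_means[a], 0.1)     # observe a reward
    grad_log_pi = -probs                   # gradient of log pi(a | theta) ...
    grad_log_pi[a] += 1.0                  # ... is one_hot(a) - probs for a softmax policy
    theta += lr * r * grad_log_pi          # step toward higher expected reward

print(softmax(theta))   # probability mass should concentrate on action 1
```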

Value-based Deep RL

One good introduction to value-based deep RL is the post from Nervana.

Core idea: use a neural network to estimate the optimal value function, i.e., the maximum value that can be achieved under any policy.

Key weapon: Q-network
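A minimal sketch of the update such a network is trained on, here with a linear function approximator standing in for the deep network (the features and the transition are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 2
W = np.zeros((n_actions, n_features))   # Q(s, a) = W[a] @ phi(s)
gamma, lr = 0.99, 0.01

def q_values(phi):
    return W @ phi

def td_update(phi, a, r, phi_next, done):
    """Move Q(s, a) toward the target r + gamma * max_a' Q(s', a')."""
    target = r if done else r + gamma * q_values(phi_next).max()
    td_error = target - q_values(phi)[a]
    W[a] += lr * td_error * phi          # semi-gradient step on the squared TD error

phi, phi_next = rng.normal(size=4), rng.normal(size=4)   # one made-up transition
td_update(phi, a=0, r=1.0, phi_next=phi_next, done=False)
print(q_values(phi))
```

A deep Q-network replaces the linear map with a neural network and adds tricks such as experience replay and a target network to keep the updates stable.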

Model-based Deep RL

Tutorial

Core idea: model the environment, then plan using the learned model.
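A minimal sketch of the idea on a toy linear environment: fit a one-step dynamics model from sampled transitions, then plan by simulating random action sequences through the learned model (the environment, reward, and all parameters are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def true_env_step(s, a):                 # the real dynamics, unknown to the agent
    return 0.9 * s + np.array([a, -a])

# 1) Learn the model: least squares fit of (state, action) -> next state.
X, Y = [], []
for _ in range(200):
    s, a = rng.normal(size=2), rng.uniform(-1, 1)
    X.append(np.append(s, a))
    Y.append(true_env_step(s, a))
M, *_ = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)

def model_step(s, a):
    return np.append(s, a) @ M

# 2) Plan with the model: pick the action sequence whose simulated rollout
#    keeps the state closest to the origin (a made-up reward).
def plan(s, horizon=5, n_candidates=100):
    best_a, best_ret = 0.0, -np.inf
    for _ in range(n_candidates):
        seq = rng.uniform(-1, 1, size=horizon)
        sim, ret = s.copy(), 0.0
        for a in seq:
            sim = model_step(sim, a)
            ret -= np.linalg.norm(sim)
        if ret > best_ret:
            best_a, best_ret = seq[0], ret
    return best_a

print(plan(np.array([1.0, -1.0])))      # first action of the best simulated plan
```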



Tags: Reinforcement Learning, Deep Learning, Study