---
layout: post
title: "Introduction to RNN"
date: 2016-04-26
excerpt: "A Gentle Introduction to Recurrent Neural Network"
tag:
- RNN
- Deep Learning Study
comments: true
---
In this material, I will try to answer two questions: (1) what an RNN is, and (2) how it works, using examples from natural language processing (NLP) and computer vision (CV).
Other key points will be covered:
- What is the difference between RNN and classic neural networks?
- Why should I care about RNN?
- What are LSTM and GRU?
- How to train an RNN?
- Is there ready-to-use code to try out your ideas?
- Where to learn the theory of RNN?
- Where does the famous "seq-2-seq" learning fit in?
- …
(Note: All pictures are from the Internet)
What is a Recurrent Neural Network (RNN)?
====
(Figures: NN, Deep NN, RNN, unfolded RNN, Deep RNN)
In short, an RNN:
- takes a sequence of inputs (length >= 1)
- keeps applying the same set of operations to each input along the sequence
- carries an internal state to represent or remember the underlying patterns/relations/states in the sequence
- generates a sequence of outputs (length >= 1)
Key feature: shared parameters
How does an RNN work?
====
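To make the recurrence concrete, here is a minimal NumPy sketch of the plain RNN described above. The weight names (`W_xh`, `W_hh`, `W_hy`) and the sizes in the toy usage are my own illustrative choices, not taken from any particular library; the point is that the same parameters are applied at every time step while the hidden state is carried along.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Apply the SAME parameters at every time step and carry a hidden state."""
    h = np.zeros(W_hh.shape[0])                  # internal state, starts at zero
    ys = []
    for x in xs:                                 # xs: a list of input vectors
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # new state from current input + previous state
        ys.append(W_hy @ h + b_y)                # output at this time step
    return ys, h

# Toy usage: a length-4 sequence of 3-dim inputs, a 5-dim hidden state, 2-dim outputs.
np.random.seed(0)
xs = [np.random.randn(3) for _ in range(4)]
W_xh, W_hh, W_hy = np.random.randn(5, 3), np.random.randn(5, 5), np.random.randn(2, 5)
b_h, b_y = np.zeros(5), np.zeros(2)
ys, h_last = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y)
```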
LSTM: the actual RNN in practice
Due to the vanishing gradient problem, we use long short-term memory (LSTM) units as the actual RNN layer in practice.
(The vanishing gradient problem is an issue that can make RNNs hard to train. For details, check this awesome video.)
Understanding LSTM can help us understand:
- How to capture long-term dependency (i.e. the result at time step T depends on the input at time step T-k)
- The core mechanism of RNN
- How to interpret "an RNN knows when to remember and when to forget"
Let's move on to this awesome blog.
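As a rough companion to that blog, here is a sketch of a single LSTM step in NumPy, assuming the standard formulation with forget, input, and output gates (the variable names and the stacked-weight layout are my own assumptions). The forget and input gates are exactly where the network learns "when to remember and when to forget".

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps the concatenated [x, h_prev] to four stacked gate pre-activations."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])            # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2 * H])        # input gate: what new information to write
    o = sigmoid(z[2 * H:3 * H])    # output gate: what part of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate content to write into the cell
    c = f * c_prev + i * g         # "forget" old content and/or "remember" new content
    h = o * np.tanh(c)             # hidden state read out from the gated cell
    return h, c

# Toy usage: 3-dim input, 5-dim hidden/cell state.
np.random.seed(0)
x, h, c = np.random.randn(3), np.zeros(5), np.zeros(5)
W, b = np.random.randn(4 * 5, 3 + 5), np.zeros(4 * 5)
h, c = lstm_step(x, h, c, W, b)
```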
Training: back propagation through time (BPTT)
In terms of hardware acceleration, it is important to understand BPTT.
The computation in an RNN consists of two parts: the forward pass and the backward pass. In the forward pass, the computation is mostly additions, multiplications, and memory operations. The heavy part is the backward pass: we need to compute the difference between the output and the target, and propagate the gradient all the way back to the beginning of the sequence. (The new cuDNN library (v5) from NVIDIA claims a 6x speedup for LSTM.)
Let’s enjoy this nice blog.
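For a rough picture of what the backward pass actually computes, here is a small BPTT sketch for the plain tanh RNN from the earlier snippet, assuming a squared-error loss at every step (the loss choice and all names are illustrative assumptions, not a reference implementation).

```python
import numpy as np

def bptt(xs, targets, W_xh, W_hh, W_hy, b_h, b_y):
    # Forward pass: mostly cheap elementwise ops; keep activations for the backward pass.
    hs, ys = [np.zeros(W_hh.shape[0])], []
    for x in xs:
        hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1] + b_h))
        ys.append(W_hy @ hs[-1] + b_y)
    # Backward pass: walk the sequence in reverse; gradients flow back through the state.
    dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
    db_h, db_y = np.zeros_like(b_h), np.zeros_like(b_y)
    dh_next = np.zeros_like(hs[0])
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]                   # gradient of 0.5 * ||y_t - target_t||^2
        dW_hy += np.outer(dy, hs[t + 1]); db_y += dy
        dh = W_hy.T @ dy + dh_next                # from this step's output and from the future
        da = dh * (1.0 - hs[t + 1] ** 2)          # back through tanh
        dW_xh += np.outer(da, xs[t]); dW_hh += np.outer(da, hs[t]); db_h += da
        dh_next = W_hh.T @ da                     # carried one step further back in time
    return dW_xh, dW_hh, dW_hy, db_h, db_y
```

The repeated multiplication by `W_hh` as the gradient is carried back through time is exactly where it can vanish or explode over long sequences, which is why LSTM units are preferred in practice.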
Dealing with inputs and outputs of variable length
Three key RNN architectures:
- Variable length to fixed length (Video Classification)
- Fixed length to variable length (Image Caption, RCNN)
- Variable length to variable length, i.e. seq-2-seq (Translation)
See Figures 10.9, 10.10, and 10.12 in Google Brain's book.
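To show how the same recurrence is wired up for each of the three patterns, here is a toy sketch; the `step`, `readout`, and related callables are placeholders of my own, not a real API.

```python
def many_to_one(xs, step, h0, readout):
    """Variable-length input -> fixed-length output (e.g. video classification):
    run the recurrence over the whole sequence and read out from the LAST state."""
    h = h0
    for x in xs:
        h = step(x, h)
    return readout(h)

def one_to_many(x, step, h0, readout, length):
    """Fixed-length input -> variable-length output (e.g. image captioning):
    condition the state on a single input, then feed each output back in."""
    h = step(x, h0)
    outputs = []
    for _ in range(length):
        y = readout(h)
        outputs.append(y)
        h = step(y, h)
    return outputs

def seq_to_seq(xs, enc_step, dec_step, h0, readout, start, length):
    """Variable -> variable length, i.e. seq-2-seq (e.g. translation): an encoder
    compresses the source sequence into its final state, which initializes a
    decoder that emits the target sequence one symbol at a time."""
    h = h0
    for x in xs:                  # encoder
        h = enc_step(x, h)
    y, outputs = start, []
    for _ in range(length):       # decoder, fed with its own previous output
        h = dec_step(y, h)
        y = readout(h)
        outputs.append(y)
    return outputs
```

In real systems the decoder is usually fed an embedding of its previous output (or the ground-truth symbol during training), but the wiring pattern is the same.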
Extension: BD-RNN and Grid-RNN
Bi-directional RNN: exploring the context from two directions
Grid-LSTM: extra connections are added to the LSTM in order to further explore the context. paper
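For the bi-directional case, here is a small sketch of the idea (again with placeholder `step` functions of my own, not the Grid-LSTM connections, which are described in the paper linked above): one recurrence reads the sequence left-to-right, another right-to-left, and the two states are concatenated at every position so each output sees context from both sides.

```python
import numpy as np

def bidirectional(xs, step_fwd, step_bwd, h0_fwd, h0_bwd):
    """Concatenate forward and backward hidden states at every time step."""
    hs_fwd, h = [], h0_fwd
    for x in xs:                   # forward direction: past context
        h = step_fwd(x, h)
        hs_fwd.append(h)
    hs_bwd, h = [], h0_bwd
    for x in reversed(xs):         # backward direction: future context
        h = step_bwd(x, h)
        hs_bwd.append(h)
    hs_bwd.reverse()               # realign with the original time order
    return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]
```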
Applications
====
Want to learn more applications? Check out this awesome list.