---
layout: post
title: "Introduction to RNN"
date: 2016-04-26
excerpt: "A Gentle Introduction to Recurrent Neural Network"
tag:
- RNN
- Deep Learning Study
comments: true
---
In this material, I will try to answer two questions: (1) what an RNN is, and (2) how it works, using examples from natural language processing (NLP) and computer vision (CV).
Other key points will be covered:
- What is the difference between RNN and classic neural networks?
- Why should I care about RNN?
- What are LSTM and GRU?
- How to train an RNN?
- Is there ready-to-use code to try out your ideas?
- Where to learn the theory of RNN?
- Where does the famous "seq-2-seq" learning fit in?
- …
(Note: All pictures are from the Internet)
What is a Recurrent Neural Network (RNN)?
====
(Figures: NN, Deep NN, RNN, unfolded RNN, Deep RNN)
In short, an RNN:
- takes a sequence of inputs (length >= 1)
- keeps applying the same set of operations to each input along the sequence
- carries an internal state to represent or remember the underlying patterns/relations/states in the sequence
- generates a sequence of outputs (length >= 1)
Key feature: shared parameters
How does an RNN work?
====
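To make the recurrence concrete, here is a minimal NumPy sketch of the plain RNN described above. The weight names (`W_xh`, `W_hh`, `W_hy`) and the sizes in the toy usage are my own illustrative choices, not taken from any particular library; the point is that the same parameters are applied at every time step while the hidden state is carried along.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Apply the SAME parameters at every time step and carry a hidden state."""
    h = np.zeros(W_hh.shape[0])                  # internal state, starts at zero
    ys = []
    for x in xs:                                 # xs: a list of input vectors
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # new state from current input + previous state
        ys.append(W_hy @ h + b_y)                # output at this time step
    return ys, h

# Toy usage: a length-4 sequence of 3-dim inputs, a 5-dim hidden state, 2-dim outputs.
np.random.seed(0)
xs = [np.random.randn(3) for _ in range(4)]
W_xh, W_hh, W_hy = np.random.randn(5, 3), np.random.randn(5, 5), np.random.randn(2, 5)
b_h, b_y = np.zeros(5), np.zeros(2)
ys, h_last = rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y)
```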
LSTM: the actual RNN in practice
Due to the vanishing gradient problem, we use long short-term memory (LSTM) units as the actual RNN layer in practice.
(The vanishing gradient problem is an issue that can make RNNs hard to train. For details, check this awesome video.)
Understanding LSTM can help us understand:
- How to capture long-term dependency (i.e. the result at time step T depends on the input at time step T-k)
- The core mechanism of RNN
- How to interpret "an RNN knows when to remember and when to forget"
Let's move on to this awesome blog.
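As a rough companion to that blog, here is a sketch of a single LSTM step in NumPy, assuming the standard formulation with forget, input, and output gates (the variable names and the stacked-weight layout are my own assumptions). The forget and input gates are exactly where the network learns "when to remember and when to forget".

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps the concatenated [x, h_prev] to four stacked gate pre-activations."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    f = sigmoid(z[0:H])            # forget gate: what to erase from the cell state
    i = sigmoid(z[H:2 * H])        # input gate: what new information to write
    o = sigmoid(z[2 * H:3 * H])    # output gate: what part of the cell to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate content to write into the cell
    c = f * c_prev + i * g         # "forget" old content and/or "remember" new content
    h = o * np.tanh(c)             # hidden state read out from the gated cell
    return h, c

# Toy usage: 3-dim input, 5-dim hidden/cell state.
np.random.seed(0)
x, h, c = np.random.randn(3), np.zeros(5), np.zeros(5)
W, b = np.random.randn(4 * 5, 3 + 5), np.zeros(4 * 5)
h, c = lstm_step(x, h, c, W, b)
```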
Training: back propagation through time (BPTT)
In terms of hardware acceleration, it is important to understand BPTT.
The computation in an RNN consists of two parts: the forward pass and the backward pass. In the forward pass, the computation is mostly additions, multiplications, and memory operations. The heavy part is the backward pass: we need to compute the difference between the output and the target, and propagate the gradient all the way back to the beginning of the sequence. (The new cuDNN library (v5) from NVIDIA claims a 6x speedup for LSTM.)
Let’s enjoy this nice blog.
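For a rough picture of what the backward pass actually computes, here is a small BPTT sketch for the plain tanh RNN from the earlier snippet, assuming a squared-error loss at every step (the loss choice and all names are illustrative assumptions, not a reference implementation).

```python
import numpy as np

def bptt(xs, targets, W_xh, W_hh, W_hy, b_h, b_y):
    # Forward pass: mostly cheap elementwise ops; keep activations for the backward pass.
    hs, ys = [np.zeros(W_hh.shape[0])], []
    for x in xs:
        hs.append(np.tanh(W_xh @ x + W_hh @ hs[-1] + b_h))
        ys.append(W_hy @ hs[-1] + b_y)
    # Backward pass: walk the sequence in reverse; gradients flow back through the state.
    dW_xh, dW_hh, dW_hy = np.zeros_like(W_xh), np.zeros_like(W_hh), np.zeros_like(W_hy)
    db_h, db_y = np.zeros_like(b_h), np.zeros_like(b_y)
    dh_next = np.zeros_like(hs[0])
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]                   # gradient of 0.5 * ||y_t - target_t||^2
        dW_hy += np.outer(dy, hs[t + 1]); db_y += dy
        dh = W_hy.T @ dy + dh_next                # from this step's output and from the future
        da = dh * (1.0 - hs[t + 1] ** 2)          # back through tanh
        dW_xh += np.outer(da, xs[t]); dW_hh += np.outer(da, hs[t]); db_h += da
        dh_next = W_hh.T @ da                     # carried one step further back in time
    return dW_xh, dW_hh, dW_hy, db_h, db_y
```

The repeated multiplication by `W_hh` as the gradient is carried back through time is exactly where it can vanish or explode over long sequences, which is why LSTM units are preferred in practice.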
Dealing with inputs and outputs of variable length
Three key RNN architectures:
- Variable length to fixed length (Video Classification)
- Fixed length to variable length (Image Caption, RCNN)
- Variable length to variable length, i.e. seq-2-seq (Translation)
See Figures 10.9, 10.10, and 10.12 in Google Brain's book.
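To show how the same recurrence is wired up for each of the three patterns, here is a toy sketch; the `step`, `readout`, and related callables are placeholders of my own, not a real API.

```python
def many_to_one(xs, step, h0, readout):
    """Variable-length input -> fixed-length output (e.g. video classification):
    run the recurrence over the whole sequence and read out from the LAST state."""
    h = h0
    for x in xs:
        h = step(x, h)
    return readout(h)

def one_to_many(x, step, h0, readout, length):
    """Fixed-length input -> variable-length output (e.g. image captioning):
    condition the state on a single input, then feed each output back in."""
    h = step(x, h0)
    outputs = []
    for _ in range(length):
        y = readout(h)
        outputs.append(y)
        h = step(y, h)
    return outputs

def seq_to_seq(xs, enc_step, dec_step, h0, readout, start, length):
    """Variable -> variable length, i.e. seq-2-seq (e.g. translation): an encoder
    compresses the source sequence into its final state, which initializes a
    decoder that emits the target sequence one symbol at a time."""
    h = h0
    for x in xs:                  # encoder
        h = enc_step(x, h)
    y, outputs = start, []
    for _ in range(length):       # decoder, fed with its own previous output
        h = dec_step(y, h)
        y = readout(h)
        outputs.append(y)
    return outputs
```

In real systems the decoder is usually fed an embedding of its previous output (or the ground-truth symbol during training), but the wiring pattern is the same.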
Extension: BD-RNN and Grid-RNN
Bi-directional RNN: exploring the context from two directions
Grid-LSTM: extra connections are added to the LSTM in order to further explore the context. paper
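For the bi-directional case, here is a small sketch of the idea (again with placeholder `step` functions of my own, not the Grid-LSTM connections, which are described in the paper linked above): one recurrence reads the sequence left-to-right, another right-to-left, and the two states are concatenated at every position so each output sees context from both sides.

```python
import numpy as np

def bidirectional(xs, step_fwd, step_bwd, h0_fwd, h0_bwd):
    """Concatenate forward and backward hidden states at every time step."""
    hs_fwd, h = [], h0_fwd
    for x in xs:                   # forward direction: past context
        h = step_fwd(x, h)
        hs_fwd.append(h)
    hs_bwd, h = [], h0_bwd
    for x in reversed(xs):         # backward direction: future context
        h = step_bwd(x, h)
        hs_bwd.append(h)
    hs_bwd.reverse()               # realign with the original time order
    return [np.concatenate([f, b]) for f, b in zip(hs_fwd, hs_bwd)]
```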
Applications
====
Want to learn more applications? Check out this awesome list.