Week 1
RNN - Introduction, Notation and Structure
Sequence Data
Notation
The following symbols are used in the notation:
- $x^{<t>}$ or $y^{<t>}$
- $<>$ are used to denote the index of the word in the input $x$, or output $y$
- $x^{<1>}$ is the first word in the input sequence
  - $t$ indexes the position in the sequence in the general form; $t$ stands for "temporal"
- $x^{(i)<t>}$ or $y^{(i)<t>}$
  - Among all training examples, a particular input $x$ and output $y$ are indexed with the superscript $(i)$. A particular word in that example's sequence is denoted by $<t>$
- Superscript $[l]$ denotes the object in layer $l$
- Subscript $i$ denotes the $i$th entry of a vector. E.g. if the one-hot encoded vector of a word has 1000 entries, its $i$th entry is denoted $x_{i}$ (or $a_{i}$ for an activation)
- $a_{4}^{(12)[3]<5>}$ refers to the 4th entry of the activation vector of training example $(12)$, in layer $[3]$, at time step $<5>$
- $T_{x}$ or $T_{y}$
  - $T$ denotes the length of a temporal sequence; the subscript $x$ or $y$ indicates whether it is the input or the output sequence
- $T_{x}^{(i)}$ or $T_{y}^{(i)}$
  - The lengths of the input and output sequences can differ across training examples; not all sentences contain the same number of words, after all
  - The superscript $(i)$ picks out the sequence length of a particular training example
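The indexing conventions above can be sketched in code. This is a minimal illustration with made-up sentences, not part of the course material; note that the notation is 1-indexed while Python lists are 0-indexed.

```python
# Two training examples (sentences) of different length.
# x^{(i)<t>} is word t of example i; T_x^{(i)} is the length of example i.
examples = [
    ["harry", "potter", "and", "hermione"],     # example i = 1
    ["the", "quick", "brown", "fox", "jumps"],  # example i = 2
]

# T_x^{(1)} = 4, T_x^{(2)} = 5: lengths differ across examples.
T_x = [len(sentence) for sentence in examples]

# x^{(1)<2>}: the 2nd word of the 1st example (shift by 1 for 0-indexing).
x_1_2 = examples[1 - 1][2 - 1]  # "potter"
```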
Word Representation
Words are denoted by $x^{<t>}$, but they can be represented/stored in several different ways; one of them is one-hot encoding.
A few important terms:
Image 2: Word encoding
- Vocabulary/Dictionary: The set of all the words that will be used in the representation. Vocabularies usually contain tens of thousands ($10^4$) of words. Commercial internet-scale companies use dictionaries/vocabularies of millions of words ($10^6$ or $10^7$ entries)
- Each entry in the dictionary is a word
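One-hot encoding can be sketched as follows: each word maps to a vector of vocabulary length that is all zeros except for a 1 at the word's index. The tiny vocabulary here is a made-up example; real vocabularies have $10^4$ to $10^7$ entries.

```python
import numpy as np

# Hypothetical tiny vocabulary; each entry in the dictionary is a word.
vocab = ["a", "aaron", "and", "harry", "potter", "zulu"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return the one-hot column vector x^<t> for a word."""
    vec = np.zeros((len(vocab), 1))
    vec[word_to_index[word]] = 1.0
    return vec

x = one_hot("harry")  # entry at index 3 is 1, all other entries are 0
```

In practice the one-hot vector is never stored densely for large vocabularies; frameworks keep only the integer index and look rows up as needed.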