Week 1
RNN - Introduction, Notation and Structure
Sequence Data
Notation
The following symbols are used in the notation:
- $x^{<t>}$ or $y^{<t>}$
- $<>$ are used to denote the index of the word in the input $x$, or output $y$
- $x^{<1>}$ is the first word in the input sequence
  - $t$ indexes the position in the sequence in the general form; $t$ stands for "temporal"
- $x^{(i)<t>}$ or $y^{(i)<t>}$
  - Among all training examples, a particular input $x$ and output $y$ are indexed with the superscript $(i)$. A particular word in that example's sequence is denoted by $<t>$
- Superscript $[l]$ denotes the object in layer $l$
- Subscript $i$ denotes the $i$th entry of a vector. E.g. if the one-hot encoded vector of a word has 1000 entries, its $i$th entry is denoted $x_{i}$ (or $a_{i}$ for an activation)
- $a_{4}^{(12)[3]<5>}$ refers to the 4th entry of the activation vector of training example $(12)$, in layer $[3]$, at time step $<5>$
- $T_{x}$ or $T_{y}$
  - $T$ denotes the length of a temporal sequence; the subscript $x$ or $y$ indicates whether it is the input or the output sequence
- $T_{x}^{(i)}$ or $T_{y}^{(i)}$
  - The lengths of the input and output sequences can differ across training examples; not all sentences contain the same number of words, after all
  - The superscript $(i)$ picks out the sequence length of a particular training example
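The indexing conventions above can be sketched in code. This is a minimal illustration with made-up sentences, not part of the course material; note that the notation is 1-indexed while Python lists are 0-indexed.

```python
# Two training examples (sentences) of different length.
# x^{(i)<t>} is word t of example i; T_x^{(i)} is the length of example i.
examples = [
    ["harry", "potter", "and", "hermione"],     # example i = 1
    ["the", "quick", "brown", "fox", "jumps"],  # example i = 2
]

# T_x^{(1)} = 4, T_x^{(2)} = 5: lengths differ across examples.
T_x = [len(sentence) for sentence in examples]

# x^{(1)<2>}: the 2nd word of the 1st example (shift by 1 for 0-indexing).
x_1_2 = examples[1 - 1][2 - 1]  # "potter"
```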
Word Representation
Words are denoted by $x^{<t>}$, but they can be represented/stored in several different ways; one of them is one-hot encoding.
A few important terms:
Image 2: Word encoding
- Vocabulary/Dictionary: The set of all the words that will be used in the representation. Vocabularies usually contain tens of thousands ($10^4$) of words. Commercial internet-scale companies use dictionaries/vocabularies of millions of words ($10^6$ or $10^7$ entries)
- Each entry in the dictionary is a word
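One-hot encoding can be sketched as follows: each word maps to a vector of vocabulary length that is all zeros except for a 1 at the word's index. The tiny vocabulary here is a made-up example; real vocabularies have $10^4$ to $10^7$ entries.

```python
import numpy as np

# Hypothetical tiny vocabulary; each entry in the dictionary is a word.
vocab = ["a", "aaron", "and", "harry", "potter", "zulu"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return the one-hot column vector x^<t> for a word."""
    vec = np.zeros((len(vocab), 1))
    vec[word_to_index[word]] = 1.0
    return vec

x = one_hot("harry")  # entry at index 3 is 1, all other entries are 0
```

In practice the one-hot vector is never stored densely for large vocabularies; frameworks keep only the integer index and look rows up as needed.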