
wordEmbeddingLayer

Word embedding layer for deep learning neural networks

Description

A word embedding layer maps word indices to vectors.

Use a word embedding layer in a deep learning long short-term memory (LSTM) network. An LSTM network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. A word embedding layer maps a sequence of word indices to embedding vectors and learns the word embedding during training.

This layer requires Deep Learning Toolbox™.

Creation

Description


layer = wordEmbeddingLayer(dimension,numWords) creates a word embedding layer and specifies the embedding dimension and vocabulary size.


layer = wordEmbeddingLayer(dimension,numWords,Name,Value) sets optional properties using one or more name-value pairs. Enclose each property name in single quotes.
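
For example, a minimal sketch (with hypothetical sizes and layer name) that sets the Name property in the same call:

layer = wordEmbeddingLayer(300,5000,'Name','emb');   % 300-dimensional embedding, 5000-word vocabulary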

Properties


Word Embedding

Dimension — Dimension of the word embedding, specified as a positive integer.

Example: 300

NumWords — Number of words in the model, specified as a positive integer. If the number of unique words in the training data is greater than NumWords, then the layer maps the out-of-vocabulary words to the same vector.

Since R2023b

OOVMode — Out-of-vocabulary word handling mode, specified as one of these values:

  • "map-to-last" — Map out-of-vocabulary words to the last embedding vector in Weights.

  • "error" — Throw an error when layer receives out-of-vocabulary words. Use this option for models that already have an out-of-vocabulary token in its vocabulary, such as BERT.

Parameters and Initialization

WeightsInitializer — Function to initialize the weights, specified as one of the following:

  • 'narrow-normal' – Initialize the weights by independently sampling from a normal distribution with zero mean and standard deviation 0.01.

  • 'glorot' – Initialize the weights with the Glorot initializer [1] (also known as Xavier initializer). The Glorot initializer independently samples from a uniform distribution with zero mean and variance 2/(numIn + numOut), where numIn = NumWords + 1 and numOut = Dimension.

  • 'he' – Initialize the weights with the He initializer [2]. The He initializer samples from a normal distribution with zero mean and variance 2/numIn, where numIn = NumWords + 1.

  • 'orthogonal' – Initialize the input weights with Q, the orthogonal matrix given by the QR decomposition of Z = QR for a random matrix Z sampled from a unit normal distribution [3].

  • 'zeros' – Initialize the weights with zeros.

  • 'ones' – Initialize the weights with ones.

  • Function handle – Initialize the weights with a custom function. If you specify a function handle, then the function must be of the form weights = func(sz), where sz is the size of the weights.

The layer only initializes the weights when the Weights property is empty.

Data Types: char | string | function_handle
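
For example, a minimal sketch (the sampling range is only illustrative) that passes a custom function handle of the form weights = func(sz):

initFcn = @(sz) 0.1*rand(sz) - 0.05;                                % sample uniformly from [-0.05, 0.05]
layer = wordEmbeddingLayer(300,5000,'WeightsInitializer',initFcn);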

Weights — Layer weights, specified as a Dimension-by-NumWords array or a Dimension-by-(NumWords+1) array.

If Weights is a Dimension-by-NumWords array, then the software automatically appends an extra column for out-of-vocabulary input when training a network using the trainNetwork function or when initializing a dlnetwork object.

For an input integer i less than or equal to NumWords, the layer outputs the vector Weights(:,i). Otherwise, the layer outputs the vector Weights(:,NumWords+1).
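
As an illustration of this lookup rule, a minimal sketch (hypothetical sizes and random weights) that computes the column the layer selects for a given index:

dimension = 3;
numWords = 5;
W = rand(dimension,numWords+1,'single');   % column numWords+1 handles out-of-vocabulary input
i = 7;                                     % index greater than numWords (out-of-vocabulary)
vec = W(:,min(i,numWords+1));              % vector the layer outputs for index i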

Learn Rate and Regularization

WeightLearnRateFactor — Learning rate factor for the weights, specified as a nonnegative scalar.

The software multiplies this factor by the global learning rate to determine the learning rate for the weights in this layer. For example, if WeightLearnRateFactor is 2, then the learning rate for the weights in this layer is twice the current global learning rate. The software determines the global learning rate based on the settings you specify using the trainingOptions (Deep Learning Toolbox) function.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

WeightL2Factor — L2 regularization factor for the weights, specified as a nonnegative scalar.

The software multiplies this factor by the global L2 regularization factor to determine the L2 regularization for the weights in this layer. For example, if WeightL2Factor is 2, then the L2 regularization for the weights in this layer is twice the global L2 regularization factor. You can specify the global L2 regularization factor using the trainingOptions (Deep Learning Toolbox) function.

Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
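
For example, a minimal sketch (hypothetical sizes and factors) that doubles the learning rate for the layer weights and halves their L2 regularization relative to the global settings:

layer = wordEmbeddingLayer(300,5000, ...
    'WeightLearnRateFactor',2, ...
    'WeightL2Factor',0.5);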

Layer

Name — Layer name, specified as a character vector or a string scalar. For Layer array input, the trainnet (Deep Learning Toolbox), trainNetwork (Deep Learning Toolbox), assembleNetwork (Deep Learning Toolbox), layerGraph (Deep Learning Toolbox), and dlnetwork (Deep Learning Toolbox) functions automatically assign names to layers with the name "".

The WordEmbeddingLayer object stores this property as a character vector.

Data Types: char | string

This property is read-only.

NumInputs — Number of inputs to the layer, returned as 1. This layer accepts a single input only.

Data Types: double

This property is read-only.

InputNames — Input names, returned as {'in'}. This layer accepts a single input only.

Data Types: cell

This property is read-only.

NumOutputs — Number of outputs from the layer, returned as 1. This layer has a single output only.

Data Types: double

This property is read-only.

OutputNames — Output names, returned as {'out'}. This layer has a single output only.

Data Types: cell

Examples


Create a word embedding layer with embedding dimension 300 and 5000 words.

layer = wordEmbeddingLayer(300,5000)
layer = 
  WordEmbeddingLayer with properties:

         Name: ''
      OOVMode: 'map-to-last'

   Hyperparameters
    Dimension: 300
     NumWords: 5000

   Learnable Parameters
      Weights: []

Use properties method to see a list of all properties.

Include a word embedding layer in an LSTM network.

inputSize = 1;
embeddingDimension = 300;
numWords = 5000;
numHiddenUnits = 200;
numClasses = 10;

layers = [
    sequenceInputLayer(inputSize)
    wordEmbeddingLayer(embeddingDimension,numWords)
    lstmLayer(numHiddenUnits,'OutputMode','last')
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer]
layers = 
  6x1 Layer array with layers:

     1   ''   Sequence Input          Sequence input with 1 dimensions
     2   ''   Word Embedding Layer    Word embedding layer with 300 dimensions and 5000 unique words
     3   ''   LSTM                    LSTM with 200 hidden units
     4   ''   Fully Connected         10 fully connected layer
     5   ''   Softmax                 softmax
     6   ''   Classification Output   crossentropyex

To initialize a word embedding layer in a deep learning network with the weights from a pretrained word embedding, use the word2vec function to extract the layer weights and set the 'Weights' name-value pair of the wordEmbeddingLayer function. The word embedding layer expects columns of word vectors, so you must transpose the output of the word2vec function.

emb = fastTextWordEmbedding;

words = emb.Vocabulary;
dimension = emb.Dimension;
numWords = numel(words);

layer = wordEmbeddingLayer(dimension,numWords,...
    'Weights',word2vec(emb,words)')
layer = 
  WordEmbeddingLayer with properties:

         Name: ''

   Hyperparameters
    Dimension: 300
     NumWords: 999994

   Learnable Parameters
      Weights: [300×999994 single]


To create the corresponding word encoding from the word embedding, input the word embedding vocabulary to the wordEncoding function as a list of words.

enc = wordEncoding(words)
enc = 
  wordEncoding with properties:

      NumWords: 999994
    Vocabulary: [1×999994 string]

References

[1] Glorot, Xavier, and Yoshua Bengio. "Understanding the Difficulty of Training Deep Feedforward Neural Networks." In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 249–256. Sardinia, Italy: AISTATS, 2010. https://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf

[2] He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification." In 2015 IEEE International Conference on Computer Vision (ICCV), 1026–34. Santiago, Chile: IEEE, 2015. https://doi.org/10.1109/ICCV.2015.123

[3] Saxe, Andrew M., James L. McClelland, and Surya Ganguli. "Exact Solutions to the Nonlinear Dynamics of Learning in Deep Linear Neural Networks." Preprint, submitted February 19, 2014. https://arxiv.org/abs/1312.6120

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Version History

Introduced in R2018b
