onehotencode

Encode data labels into one-hot vectors

Since R2020b

    Description

    B = onehotencode(A,featureDim) encodes data labels in categorical array A into a one-hot encoded array B. The function replaces each element of A with a numeric vector of length equal to the number of unique classes in A along the dimension specified by featureDim. The vector contains a 1 in the position corresponding to the class of the label in A, and a 0 in every other position. Any <undefined> values are encoded to NaN values.

    tblB = onehotencode(tblA) encodes categorical data labels in table tblA into a table of one-hot encoded numeric values. The function replaces the single variable of tblA with as many variables as the number of unique classes in tblA. Each row in tblB contains a 1 in the variable corresponding to the class of the label in tblA, and a 0 in all other variables.

    ___ = onehotencode(___,typename) encodes the labels into numeric values of data type typename. Use this syntax with any of the input and output arguments from the previous syntaxes.

    ___ = onehotencode(___,'ClassNames',classes) also specifies the names of the classes to use for encoding. Use this syntax when A or tblA does not contain categorical values, when you want to exclude any class labels from being encoded, or when you want to encode the vector elements in a specific order. Any label in A or tblA of a class that does not exist in classes is encoded to a vector of NaN values.

    Examples

    Encode a categorical vector of class labels into one-hot vectors representing the labels.

    Create a column vector of labels, where each row of the vector represents a single observation. Convert the labels to a categorical array.

    labels = ["red"; "blue"; "red"; "green"; "yellow"; "blue"];
    labels = categorical(labels);

    View the order of the categories.

    categories(labels)
    ans = 4x1 cell
        {'blue'  }
        {'green' }
        {'red'   }
        {'yellow'}
    
    

    Encode the labels into one-hot vectors. Expand the labels into vectors in the second dimension to encode the classes.

    labels = onehotencode(labels,2)
    labels = 6×4
    
         0     0     1     0
         1     0     0     0
         0     0     1     0
         0     1     0     0
         0     0     0     1
         1     0     0     0
    
    

    Each observation in labels is now a row vector with a 1 in the position corresponding to the category of the class label and 0 in all other positions. The function encodes the labels in the same order as the categories, such that a 1 in position 1 represents the first category in the list, in this case, 'blue'.
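
    The same encoding can be reproduced by hand, which may help clarify what the function computes. The following is a minimal sketch, not the function's actual implementation. It assumes that the labels contain no <undefined> values and relies on double returning, for each label, its index into categories(labels). The variable names idx, numClasses, and manual are illustrative.

```matlab
% Recreate the categorical labels from the example above.
labels = categorical(["red"; "blue"; "red"; "green"; "yellow"; "blue"]);

% double() of a categorical array returns the category index of each element.
idx = double(labels);                    % 6-by-1 vector of category indices
numClasses = numel(categories(labels));  % number of categories (4 here)

% Compare each index against 1:numClasses using implicit expansion. Each row
% then has a single 1 in the column that matches the label's category.
manual = double(idx == 1:numClasses);
```

    Here manual is a 6-by-4 matrix with the same values as the encoded output shown above.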

    One-hot encode a table of categorical values.

    Create a table of categorical data labels. Each row in the table holds a single observation.

    color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"];
    color = categorical(color);
    color = table(color);

    One-hot encode the table of class labels.

    color = onehotencode(color)
    color=6×4 table
        blue    green    red    yellow
        ____    _____    ___    ______
    
         1        0       0       0   
         0        0       1       0   
         1        0       0       0   
         0        1       0       0   
         0        0       0       1   
         0        0       1       0   
    
    

    Each column of the table represents a class. The function encodes the data labels with a 1 in the column of the corresponding class, and 0 everywhere else.

    If not all classes in the data are relevant, encode the data labels using only a subset of the classes.

    Create a row vector of data labels, where each column of the vector represents a single observation.

    pets = ["dog" "fish" "cat" "dog" "cat" "bird"];

    Define the list of classes to encode. These classes are a subset of those present in the observations.

    animalClasses = ["bird"; "cat"; "dog"];

    One-hot encode the observations into the first dimension. Specify the classes to encode.

    encPets = onehotencode(pets,1,"ClassNames",animalClasses)
    encPets = 3×6
    
         0   NaN     0     0     0     1
         0   NaN     1     0     1     0
         1   NaN     0     1     0     0
    
    

    Observations of a class not present in the list of classes to encode are encoded to a vector of NaN values.
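
    Because excluded observations are encoded as NaN, you can locate them afterward with isnan. The following is a minimal sketch that repeats the encoding above and then flags and drops the excluded observations. The variable names isExcluded and encKnown are illustrative.

```matlab
% Repeat the encoding from the example above.
pets = ["dog" "fish" "cat" "dog" "cat" "bird"];
animalClasses = ["bird"; "cat"; "dog"];
encPets = onehotencode(pets,1,"ClassNames",animalClasses);

% An observation was excluded if its encoded column contains NaN values.
isExcluded = any(isnan(encPets),1);

% Keep only the observations whose class appears in animalClasses.
encKnown = encPets(:,~isExcluded);
```

    In this case, isExcluded is true only for the second observation ("fish"), and encKnown is a 3-by-5 matrix.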

    Use onehotencode to encode a matrix of class labels, such as a semantic segmentation of an image.

    Define a simple 15-by-15 pixel segmentation matrix of class labels.

    A = "blue";
    B = "green";
    C = "black";
    
    A = repmat(A,8,15);
    B = repmat(B,7,5);
    C = repmat(C,7,5);
    
    seg = [A;B C B];

    Convert the segmentation matrix into a categorical array.

    seg = categorical(seg);

    One-hot encode the segmentation matrix into an array of type single. Expand the encoded labels into the third dimension.

    encSeg = onehotencode(seg,3,"single");

    Check the size of the encoded segmentation.

    size(encSeg)
    ans = 1×3
    
        15    15     3
    
    

    The three possible classes of the pixels in the segmentation matrix are encoded as vectors in the third dimension.
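
    As a quick sanity check, you can confirm that every pixel has exactly one active channel and extract the channel for a single class. The following is a minimal sketch that rebuilds the same segmentation. The variable names classNames, isOneHot, and blueMask are illustrative.

```matlab
% Rebuild the 15-by-15 segmentation from the example above.
top = repmat("blue",8,15);
bottom = [repmat("green",7,5) repmat("black",7,5) repmat("green",7,5)];
seg = categorical([top; bottom]);

encSeg = onehotencode(seg,3,"single");

% The channels in the third dimension follow the order of categories(seg).
classNames = categories(seg);

% Every pixel should have exactly one channel set to 1.
isOneHot = all(sum(encSeg,3) == 1,"all");

% Binary mask of the pixels labeled "blue".
blueMask = encSeg(:,:,strcmp(classNames,"blue"));
```

    Here blueMask is a 15-by-15 array of type single that is 1 in the top eight rows and 0 elsewhere.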

    If your data is a table that contains several types of class variables, you can encode each variable separately.

    Create a table of observations of several types of categorical data.

    color = ["blue"; "red"; "blue"; "green"; "yellow"; "red"];
    color = categorical(color);
    
    pets = ["dog"; "fish"; "cat"; "dog"; "cat"; "bird"];
    pets = categorical(pets);
    
    location = ["USA"; "CAN"; "CAN"; "USA"; "AUS"; "USA"];
    location = categorical(location);
    
    data = table(color,pets,location)
    data=6×3 table
        color     pets    location
        ______    ____    ________
    
        blue      dog       USA   
        red       fish      CAN   
        blue      cat       CAN   
        green     dog       USA   
        yellow    cat       AUS   
        red       bird      USA   
    
    

    Use a for-loop to one-hot encode each table variable and append it to a new table containing the encoded data.

    encData = table();
    
    for i = 1:width(data)
        encData = [encData onehotencode(data(:,i))];
    end
    
    encData
    encData=6×11 table
        blue    green    red    yellow    bird    cat    dog    fish    AUS    CAN    USA
        ____    _____    ___    ______    ____    ___    ___    ____    ___    ___    ___
    
         1        0       0       0        0       0      1      0       0      0      1 
         0        0       1       0        0       0      0      1       0      1      0 
         1        0       0       0        0       1      0      0       0      1      0 
         0        1       0       0        0       0      1      0       0      0      1 
         0        0       0       1        0       1      0      0       1      0      0 
         0        0       1       0        1       0      0      0       0      0      1 
    
    

    Each row of encData encodes the three different categorical classes for each observation.

    Input Arguments

    A is the array of data labels to encode, specified as a categorical array, a numeric array, or a string array.

    • If A is a categorical array, the elements of the one-hot encoded vectors follow the order of categories(A).

    • If A is not a categorical array, you must specify the classes to encode using the 'ClassNames' name-value argument. The function encodes the vectors in the order that the classes appear in classes.

    • If A contains undefined values or values not present in classes, the function encodes those values as a vector of NaN values. Because only floating-point types can represent NaN, typename must be 'double' or 'single' in this case.

    Data Types: categorical | numeric | string

    tblA is the table of data labels to encode, specified as a table. The table must contain a single variable and one row for each observation. Each entry must contain a categorical scalar, a numeric scalar, or a string scalar.

    • If tblA contains categorical values, the elements of the one-hot encoded vectors match the order of the categories; for example, the same order as categories(tblA{:,1}).

    • If tblA does not contain categorical values, you must specify the classes to encode using the 'ClassNames' name-value argument. The function encodes the vectors in the order that the classes appear in classes.

    • If tblA contains undefined values or values not present in classes, the function encodes those values as NaN values. Because only floating-point types can represent NaN, typename must be 'double' or 'single' in this case.

    Data Types: table

    featureDim is the dimension to expand to encode the labels, specified as a positive integer.

    featureDim must specify a singleton dimension of A, or be larger than n where n is the number of dimensions of A.
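
    The effect of this rule on the output size can be seen with a small vector. The following is a minimal sketch with illustrative variable names. It encodes the same four labels, which belong to three classes, along each valid dimension.

```matlab
% Column vector of four labels belonging to three classes.
labels = categorical(["red"; "blue"; "red"; "green"]);

% Dimension 2 is a singleton dimension of the 4-by-1 vector.
encDim2 = onehotencode(labels,2);
size(encDim2)   % 4-by-3

% Dimension 3 is larger than ndims(labels), so it is also valid.
encDim3 = onehotencode(labels,3);
size(encDim3)   % 4-by-1-by-3

% The transposed 1-by-4 row vector has a singleton first dimension.
encDim1 = onehotencode(labels',1);
size(encDim1)   % 3-by-4
```

    By the same rule, specifying dimension 1 for the 4-by-1 column vector is not valid, because that dimension is not a singleton.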

    typename is the data type of the encoded labels, specified as a character vector or a string scalar.

    • If the classification label input is a categorical array, a numeric array, or a string array, then the encoded labels are returned as an array of data type typename.

    • If the classification label input is a table, then the encoded labels are returned as a table where each entry has data type typename.

    Valid values of typename are floating-point, signed and unsigned integer, and logical types.

    Example: 'int64'

    Data Types: char | string
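
    As an illustration of the typename argument, the following minimal sketch encodes the same labels as unsigned 8-bit integers and as logical values. The variable names are illustrative, and the labels contain no undefined values, so types that cannot represent NaN are acceptable here.

```matlab
labels = categorical(["cat"; "dog"; "cat"]);

% Encode as unsigned 8-bit integers.
encUint8 = onehotencode(labels,2,"uint8");

% Encode as logical values, for example to use the result as a mask.
encLogical = onehotencode(labels,2,"logical");

class(encUint8)
class(encLogical)
```

    A compact integer or logical type can reduce memory use for large label arrays, at the cost of not being able to represent NaN.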

    classes is the list of classes to encode, specified as a cell array of character vectors, a string vector, a numeric vector, or a two-dimensional character array.

    • If the input A or tblA does not contain categorical values, then you must specify classes. You can also use the classes argument to exclude any class labels from being encoded, or to encode the vector elements in a specific order.

    • If A or tblA contains undefined values or values not present in classes, the function encodes those values to a vector of NaN values. Because only floating-point types can represent NaN, typename must be 'double' or 'single' in this case.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | string | cell
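
    To illustrate how classes controls the order of the encoded elements, the following minimal sketch encodes string labels and numeric labels that are not categorical. The variable names and label values are illustrative.

```matlab
% String labels are not categorical, so ClassNames is required.
% The columns of the result follow the order of sizeClasses.
sizes = ["small"; "large"; "medium"; "small"];
sizeClasses = ["small" "medium" "large"];
encSizes = onehotencode(sizes,2,"ClassNames",sizeClasses);

% Numeric labels work the same way with a numeric vector of classes.
nums = [3; 1; 2];
encNums = onehotencode(nums,2,"ClassNames",[1 2 3]);
```

    Here the first column of encSizes corresponds to "small", the second to "medium", and the third to "large", regardless of the alphabetical order of the labels.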

    Output Arguments

    B contains the encoded labels, returned as a numeric array.

    tblB contains the encoded labels, returned as a table.

    Each row of tblB contains the one-hot encoded label for a single observation, in the same order as in tblA. Each row contains a 1 in the variable corresponding to the class of the label in tblA, and a 0 in all other variables.

    Version History

    Introduced in R2020b