categorical

Array that contains values assigned to categories

Description

categorical is a data type that assigns values to a finite set of discrete categories, such as High, Med, and Low. These categories can have a mathematical ordering that you specify, such as High > Med > Low, but it is not required. A categorical array provides efficient storage and convenient manipulation of nonnumeric data, while also maintaining meaningful names for the values. A common use of categorical arrays is to specify groups of rows in a table.

Creation

Description

B = categorical(A) creates a categorical array from the array A. The categories of B are the sorted unique values from A.
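
As a minimal sketch of this default behavior (the input values and variable names are illustrative, not taken from the examples on this page), the categories come out in sorted order regardless of the order in which the values appear in A:

```matlab
% Illustrative input: four labels, three of them unique
A = {'medium','low','high','low'};
B = categorical(A);

% categories returns the category names as a column cell array.
% Here they are the sorted unique values of A: 'high', 'low', 'medium'.
cats = categories(B)
```

Because the sort is alphabetical for text input, the resulting order may not match the intended meaning of the labels. Supplying valueset is one way to control the order.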

B = categorical(A,valueset) creates one category for each value in valueset. The categories of B are in the same order as the values of valueset.

You can use valueset to include categories for values not present in A. Conversely, if A contains any values not present in valueset, then the corresponding elements of B are undefined.
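
The following sketch illustrates both directions, assuming a small made-up set of color labels: 'blue' is a category with no elements, and 'purple' has no category, so it becomes undefined. The isundefined function is used here to locate the undefined elements.

```matlab
% Illustrative data: 'purple' is not in valueset, 'blue' is not in A
A = {'red','green','purple'};
valueset = {'red','green','blue'};
B = categorical(A,valueset);

% Categories follow the order of valueset, including the unused 'blue'
cats = categories(B)

% Logical array that is true where B is <undefined> (the 'purple' element)
tf = isundefined(B)
```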

B = categorical(A,valueset,catnames) names the categories in B by matching the category values in valueset with the names in catnames.

B = categorical(A,___,Name,Value) creates a categorical array with additional options specified by one or more Name,Value pair arguments. You can include any of the input arguments in previous syntaxes.

For example, to indicate that the categories have a mathematical ordering, specify 'Ordinal',true.

Input Arguments

A: Input array, specified as a numeric array, logical array, categorical array, datetime array, duration array, string array, or cell array of character vectors.

categorical removes leading and trailing spaces from input values that are strings or character vectors.
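
A short sketch of the effect, using made-up labels: two inputs that differ only in surrounding spaces should end up in the same category.

```matlab
% Illustrative input: ' north' and 'north ' differ only by surrounding spaces
A = {' north','north ','south'};
B = categorical(A);

% After trimming, only two distinct categories remain
cats = categories(B)
```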

If A contains missing values, then the corresponding element of B is undefined and displays as <undefined>. The categorical function converts the following values to undefined categorical values:

• NaN in numeric and duration arrays

• The missing string (<missing>) or the empty string ("") in string arrays

• The empty character vector ('') in cell arrays of character vectors

• NaT in datetime arrays

• Undefined values (<undefined>) in categorical arrays

B does not have a category for undefined values. To create an explicit category for missing or undefined values, you must include the desired category name in catnames, and a missing value as the corresponding value in valueset.

A can also be an array of objects whose class defines the following methods:

• unique

• eq

valueset: Categories, specified as a vector of unique values. The data type of valueset and the data type of A must be the same, except when A is a string array. In that case, valueset can be either a string array or a cell array of character vectors.

categorical removes leading and trailing spaces from elements of valueset that are strings or character vectors.

catnames: Category names, specified as a cell array of character vectors or a string array. If you do not specify the catnames input argument, then categorical uses the values in valueset as category names.

To merge multiple distinct values in A into a single category in B, include duplicate names corresponding to those values.
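
For instance, the sketch below (with illustrative values and names) maps four numeric codes onto two categories by repeating names in catnames. It assumes that merged categories keep the order in which their names first appear.

```matlab
% Illustrative data: codes 1 and 2 map to 'low', codes 3 and 4 map to 'high'
A = [1 2 3 4 2];
valueset = [1 2 3 4];
catnames = {'low','low','high','high'};
B = categorical(A,valueset,catnames);

% Only two categories remain after merging
cats = categories(B)

% Count the elements that fell into the merged 'low' category
numLow = nnz(B == 'low')
```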

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Ordinal',true specifies that the categories have a mathematical ordering.

'Ordinal': Ordinal variable indicator, specified as the comma-separated pair consisting of 'Ordinal' and either false (0) or true (1).

false (0): categorical creates a categorical array that is not ordinal, which is the default behavior. The categories of B have no mathematical ordering. Therefore, you can compare the values in B for equality only. You cannot compare the values using any other relational operator.

true (1): categorical creates an ordinal categorical array. The categories of B have a mathematical ordering, such that the first category specified is the smallest and the last category is the largest. You can compare the values in B using relational operators, such as less than and greater than, in addition to comparing the values for equality. You also can use the min and max functions on an ordinal categorical array.
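
As a rough illustration of what the ordinal setting enables (the variable names and labels are made up for this sketch), relational operators and max follow the order given in valueset:

```matlab
% Illustrative ordinal array with the ordering Low < Med < High
valueset = {'Low','Med','High'};
sizes = categorical({'Med','Low','High','Low'},valueset,'Ordinal',true);

% Relational comparison against a category name
tf = sizes < 'High'

% Largest element according to the category order
largest = max(sizes)

% isordinal reports whether the array is ordinal
ord = isordinal(sizes)
```

Without 'Ordinal',true, the comparison with < is expected to produce an error, because only equality comparisons are defined for nonordinal arrays.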

'Protected': Protected categories indicator, specified as the comma-separated pair consisting of 'Protected' and either false (0) or true (1). The categories of ordinal categorical arrays are always protected. The default value is true when you specify 'Ordinal',true. Otherwise, the value is false.

false (0): When you assign new values to B, the categories update automatically. Therefore, you can combine (nonordinal) categorical arrays that have different categories. The categories can update accordingly to include the categories from both arrays.

true (1): When you assign new values to B, the values must belong to one of the existing categories. Therefore, you can only combine arrays that have the same categories. To add new categories to B, you must use the function addcats.
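
The difference can be sketched as follows, using made-up labels. The try/catch block and the assignmentFailed flag are only there so that the sketch runs to completion; they are not part of the categorical interface.

```matlab
% Nonprotected (default): assigning a new value adds a category
C = categorical({'a','b'});
C(3) = 'c';
catsC = categories(C)

% Protected: assigning a value outside the existing categories errors
P = categorical({'a','b'},{'a','b'},'Protected',true);
try
    P(3) = 'c';
    assignmentFailed = false;
catch
    assignmentFailed = true;   % expected path for a protected array
end

% Add the category explicitly with addcats, then assign
P = addcats(P,'c');
P(3) = 'c';
catsP = categories(P)
```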

Examples

Create a categorical array that has weather station labels. Add it to a table of temperature readings. Then use the categories to select temperature readings by station.

First, create arrays containing temperature readings, dates, and station labels.

Temps = [58; 72; 56; 90; 76];
Dates = {'2017-04-17';'2017-04-18';'2017-04-30';'2017-05-01';'2017-04-27'};
Stations = {'S1';'S2';'S1';'S3';'S2'};

Convert Stations to a categorical array.

Stations = categorical(Stations)
Stations = 5x1 categorical
S1
S2
S1
S3
S2

Display the categories. The three station labels are the categories.

categories(Stations)
ans = 3x1 cell
{'S1'}
{'S2'}
{'S3'}

Create a table that contains the temperatures, dates, and station labels.

T = table(Temps,Dates,Stations)
T=5×3 table
Temps        Dates         Stations
_____    ______________    ________

58      {'2017-04-17'}       S1
72      {'2017-04-18'}       S2
56      {'2017-04-30'}       S1
90      {'2017-05-01'}       S3
76      {'2017-04-27'}       S2

Display the readings taken from station S2. You can use the == operator to find the values of Stations that equal S2. Then use logical indexing to select the table rows that have data from station S2.

TF = (T.Stations == 'S2');
T(TF,:)
ans=2×3 table
Temps        Dates         Stations
_____    ______________    ________

72      {'2017-04-18'}       S2
76      {'2017-04-27'}       S2

Convert the cell array of character vectors A to a categorical array. Specify a list of categories that includes values that are not present in A.

Create a cell array of character vectors.

A = {'republican' 'democrat'; 'democrat' 'democrat'; 'democrat' 'republican'};

Convert A to a categorical array. Add a category for independent.

valueset = {'democrat' 'republican' 'independent'};
B = categorical(A,valueset)
B = 3x2 categorical
republican      democrat
democrat        democrat
democrat        republican

Display the categories of B.

categories(B)
ans = 3x1 cell
{'democrat'   }
{'republican' }
{'independent'}

Create a numeric array.

A = [1 3 2; 2 1 3; 3 1 2]
A = 3×3

1     3     2
2     1     3
3     1     2

Convert A to categorical array B and specify category names.

B = categorical(A,[1 2 3],{'red' 'green' 'blue'})
B = 3x3 categorical
red        blue      green
green      red       blue
blue       red       green

Display the categories of B.

categories(B)
ans = 3x1 cell
{'red'  }
{'green'}
{'blue' }

B is not an ordinal categorical array. Therefore, you can compare the values in B only using the equality operators, == and ~=.

Find the elements that belong to the category 'red'. Access those elements using logical indexing.

TF = (B == 'red');
B(TF)
ans = 3x1 categorical
red
red
red

Create a 5-by-2 numeric array.

A = [3 2;3 3;3 2;2 1;3 2]
A = 5×2

3     2
3     3
3     2
2     1
3     2

Convert A to an ordinal categorical array where 1, 2, and 3 represent the categories child, adult, and senior, respectively.

valueset = [1:3];
catnames = {'child' 'adult' 'senior'};

B = categorical(A,valueset,catnames,'Ordinal',true)
B = 5x2 categorical
senior      adult
senior      senior
senior      adult
adult       child
senior      adult

Since B is ordinal, the categories of B have a mathematical ordering, child < adult < senior.

Starting in R2017a, you can create string arrays using double quotes. Also, a string array can have missing values, displayed as <missing>, without quotation marks.

str = ["plane","jet","plane","helicopter",missing,"jet"]
str = 1x6 string
"plane"    "jet"    "plane"    "helicopter"    <missing>    "jet"

Convert string array str to a categorical array. The categorical function converts missing strings to undefined categorical values, displayed as <undefined>.

C = categorical(str)
C = 1x6 categorical
plane      jet      plane      helicopter      <undefined>      jet

Use the discretize function (instead of categorical) to bin 100 random numbers into three categories.

x = rand(100,1);
y = discretize(x,[0 .25 .75 1],'categorical',{'small','medium','large'});
summary(y)
small       22
medium      46
large       32

Tips

• For a list of functions that accept or return categorical arrays, see Categorical Arrays.

• If the input array has numeric, datetime, or duration values that are too close together, then the categorical function truncates them to duplicate values. For example, categorical([1 1.00001]) truncates the second element of the input array. To create categories from numeric data, use the discretize function.

Alternatives

You also can group numeric data into categories using discretize.

Introduced in R2013b