# Calculations When Tables Have Both Numeric and Nonnumeric Data

This example shows how to perform calculations on data in tables when they have both numeric and nonnumeric data. After you identify the table variables that contain numeric data, you can access the data in those variables by using either curly braces or dot notation. Then you can perform arithmetic operations or call functions on the numeric data and assign the result back into the table, all in one line of code. You also can use the `rowfun`

function for calculations across the rows of a table and the `varfun`

function for calculations along the variables. If your table has groups of data within it, you can use the `groupsummary`

, `rowfun`

, and `varfun`

functions to perform calculations for each group in the table.

### Read Sample Data into Table

Read data from a CSV (comma-separated values) file, `testScores.csv`

, into a table by using the `readtable`

function. The sample file contains test scores for 10 students who attend two different schools. The output table contains variables that have numeric data and other variables that have text data. One of these variables, `School`

, has a fixed set of values or categories. These categories denote two groups of students within this table. Convert `School`

to a `categorical`

variable.

scores = readtable("testScores.csv","TextType","string"); scores.School = categorical(scores.School)

`scores=`*10×5 table*
LastName School Test1 Test2 Test3
__________ __________ _____ _____ _____
"Jeong" XYZ School 90 87 93
"Collins" XYZ School 87 85 83
"Torres" XYZ School 86 85 88
"Phillips" ABC School 75 80 72
"Ling" ABC School 89 86 87
"Ramirez" ABC School 96 92 98
"Lee" XYZ School 78 75 77
"Walker" ABC School 91 94 92
"Garcia" ABC School 86 83 85
"Chang" XYZ School 79 76 82

### Create Subtable with Numeric Data

One straightforward way to work with the numeric data is to create a subtable that has only the numeric variables. You can create a subtable by indexing into a table using parentheses and specifying rows and variables. The subtable is a new, smaller table that contains only the specified rows and variables from the old table.

For example, create a subtable from `scores`

that has only the test scores. Because the first two variables have nonnumeric data, you can index into this table specifying the other variables.

numericScores = scores(:,3:end)

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
90 87 93
87 85 83
86 85 88
75 80 72
89 86 87
96 92 98
78 75 77
91 94 92
86 83 85
79 76 82

Another way to specify variables is to use the `vartype`

function to specify them by data type. This function is useful when you have a large table with many variables that have different data types. It returns a subscript that you can use to specify table variables.

`numericVars = vartype("numeric")`

numericVars = table vartype subscript: Select table variables matching the type 'numeric' See Access Data in a Table.

numericScores = scores(:,numericVars)

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
90 87 93
87 85 83
86 85 88
75 80 72
89 86 87
96 92 98
78 75 77
91 94 92
86 83 85
79 76 82

### Calculate When Subtable Has Only Numeric Data

*(Since R2023a)* You can perform operations on a table directly, as long as all its variables have data types that support the operations. For more information, see Direct Calculations on Tables and Timetables.

For example, scale the numeric data so the test scores are on a 25-point scale.

numericScores = numericScores .* 0.25

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
22.5 21.75 23.25
21.75 21.25 20.75
21.5 21.25 22
18.75 20 18
22.25 21.5 21.75
24 23 24.5
19.5 18.75 19.25
22.75 23.5 23
21.5 20.75 21.25
19.75 19 20.5

Before R2023a, you cannot use this syntax. Instead, index into the table using curly braces, or use the `Variables`

affordance to specify all table rows and variables. These syntaxes return the same result as the previous operation and work in all releases.

numericScores{:,:} = numericScores{:,:} .* 0.25 numericScores.Variables = numericScores.Variables .* 0.25

When you use these syntaxes, they extract the table contents and concatenate them into an array, perform the calculation, and assign the results back into the table. The only requirement is that the variables must all have data types that allow them to be concatenated.

With curly braces, you can also specify a subset of rows and variables, as in

`numericScores{1:5,["Test1","Test3"]}`

.With

`Variables`

, you always get all rows and all variables concatenated into an array.

*(Since R2023a)* You can also call many mathematical and statistical functions on a table directly. For example, subtract the minimum value within each table variable from that variable.

numericScores = numericScores - min(numericScores)

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
3.75 3 5.25
3 2.5 2.75
2.75 2.5 4
0 1.25 0
3.5 2.75 3.75
5.25 4.25 6.5
0.75 0 1.25
4 4.75 5
2.75 2 3.25
1 0.25 2.5

Again, before R2023a you cannot use this syntax. Instead, use either of the following syntaxes. They return the same result and work in all releases.

numericScores{:,:} = numericScores{:,:} - min(numericScores{:,:}) numericScores.Variables = numericScores.Variables - min(numericScores.Variables)

### Calculate on One Variable in Any Table

*In all releases*, you can also perform calculations on one variable at a time by using dot notation and variable names. For example, add a correction worth five points to the last set of scores in `Test3`

.

Because the other table variables are unaffected by operations on an individual variable, you can perform this kind of calculation in any table. It does not matter whether the other variables have numeric or nonnumeric data.

numericScores.Test3 = numericScores.Test3 + 5

`numericScores=`*10×3 table*
Test1 Test2 Test3
_____ _____ _____
3.75 3 10.25
3 2.5 7.75
2.75 2.5 9
0 1.25 5
3.5 2.75 8.75
5.25 4.25 11.5
0.75 0 6.25
4 4.75 10
2.75 2 8.25
1 0.25 7.5

### Calculate Across Rows in Full Table

The full table, `scores`

, has numeric and nonnumeric variables. In all releases, use curly-brace indexing or dot notation to perform calculations on specified rows and variables within tables.

For example, find the mean, minimum, and maximum values of the test scores for each student. Calculate these values across each row. Assign them to `scores`

as new table variables.

One simple, useful way is to extract the data into a matrix, call a function on it, and then assign the output to a new table variable. For example, calculate the mean test scores across each row. Then add them to `scores`

in a new table variable, `TestMean`

. Use curly braces to extract the numeric data from `Test1`

, `Test2`

, and `Test3`

into a matrix. To calculate the mean across rows, specify the dimension as `2`

when you call `mean`

.

vars = ["Test1","Test2","Test3"]; scores.TestMean = mean(scores{:,vars},2)

`scores=`*10×6 table*
LastName School Test1 Test2 Test3 TestMean
__________ __________ _____ _____ _____ ________
"Jeong" XYZ School 90 87 93 90
"Collins" XYZ School 87 85 83 85
"Torres" XYZ School 86 85 88 86.333
"Phillips" ABC School 75 80 72 75.667
"Ling" ABC School 89 86 87 87.333
"Ramirez" ABC School 96 92 98 95.333
"Lee" XYZ School 78 75 77 76.667
"Walker" ABC School 91 94 92 92.333
"Garcia" ABC School 86 83 85 84.667
"Chang" XYZ School 79 76 82 79

Another way to perform calculations across rows is to use the `rowfun`

function. You do not need to extract data from the table when using `rowfun`

. Instead, pass the table and a function to apply to the data as input arguments to `rowfun`

. While the syntax is a little more complex, `rowfun`

can be useful when the function that you apply takes multiple input arguments or returns multiple output arguments.

For example, use the `bounds`

function to find the minimum and maximum test scores. The `bounds`

function returns two output arguments, so apply it to `scores`

by using `rowfun`

. The output of `rowfun`

is a new table that has `TestMin`

and `TestMax`

variables. In this case, also specify `"SeparateInputs"`

as `false`

so that values across each row are combined into a vector before being passed to `bounds`

.

minmaxTest = rowfun(@bounds, ... scores, ... "InputVariables",vars, ... "OutputVariableNames",["TestMin","TestMax"], ... "SeparateInputs",false)

`minmaxTest=`*10×2 table*
TestMin TestMax
_______ _______
87 93
83 87
85 88
72 80
86 89
92 98
75 78
91 94
83 86
76 82

Concatenate `scores`

and `minmaxTest`

so that these values are in one table.

scores = [scores minmaxTest]

`scores=`*10×8 table*
LastName School Test1 Test2 Test3 TestMean TestMin TestMax
__________ __________ _____ _____ _____ ________ _______ _______
"Jeong" XYZ School 90 87 93 90 87 93
"Collins" XYZ School 87 85 83 85 83 87
"Torres" XYZ School 86 85 88 86.333 85 88
"Phillips" ABC School 75 80 72 75.667 72 80
"Ling" ABC School 89 86 87 87.333 86 89
"Ramirez" ABC School 96 92 98 95.333 92 98
"Lee" XYZ School 78 75 77 76.667 75 78
"Walker" ABC School 91 94 92 92.333 91 94
"Garcia" ABC School 86 83 85 84.667 83 86
"Chang" XYZ School 79 76 82 79 76 82

### Calculate Along Specified Variables in Full Table

Find the mean score for each test. Calculate these values along the table variables.

The simplest way is to use `mean`

. First use curly braces to extract the numeric data from `Test1`

, `Test2`

, and `Test3`

into a matrix. Then call `mean`

to calculate the mean of each column of the matrix. The output is a numeric vector where each element is the mean of a table variable.

vars = ["Test1","Test2","Test3"]; meanOfEachTest = mean(scores{:,vars})

`meanOfEachTest = `*1×3*
85.7000 84.3000 85.7000

Another way to perform calculations along table variables is to use the `varfun`

function. You do not need to extract data from the table when using `varfun`

. Instead, pass the table and a function to apply to the data as input arguments to `varfun`

.

Calculate the mean scores using `varfun`

. The output is a new table with meaningful names for the table variables.

meanOfEachTest = varfun(@mean, ... scores, ... "InputVariables",vars)

`meanOfEachTest=`*1×3 table*
mean_Test1 mean_Test2 mean_Test3
__________ __________ __________
85.7 84.3 85.7

### Calculate Using Groups of Data Within Full Table

If your table has one or more *grouping variables*, then you can perform calculations on groups of data within the table. You can use the values in a grouping variable to specify the groups that the rows belong to.

For example, the `School`

variable in `scores`

has two values, `ABC School`

and `XYZ School`

. You can think of these two values as categories that denote groups of data in `scores`

. In this case, you can perform calculations by school.

To apply a function and use grouping variables, you can use the `varfun`

function. You can specify a function, such as `mean`

, and then use `varfun`

to apply it to each table variable that you specify. When you also specify grouping variables, `varfun`

applies the function to each group within each table variable.

Calculate the mean score for each test by school.

vars = ["Test1","Test2","Test3"]; meanScoresBySchool = varfun(@mean, ... scores, ... "InputVariables",vars, ... "GroupingVariables","School")

`meanScoresBySchool=`*2×5 table*
School GroupCount mean_Test1 mean_Test2 mean_Test3
__________ __________ __________ __________ __________
ABC School 5 87.4 87 86.8
XYZ School 5 84 81.6 84.6

Starting in R2018a, you also can use the `groupsummary`

function to perform calculations on groups of data in each table variable.

meanScoresBySchool = groupsummary(scores,"School","mean",vars)

`meanScoresBySchool=`*2×5 table*
School GroupCount mean_Test1 mean_Test2 mean_Test3
__________ __________ __________ __________ __________
ABC School 5 87.4 87 86.8
XYZ School 5 84 81.6 84.6

The syntax for `groupsummary`

is a bit simpler. Also, you can use `groupsummary`

to specify multiple methods at once. For example, find both the minimum and maximum scores of each test by school.

minmaxBySchool = groupsummary(scores,"School",["min","max"],vars)

`minmaxBySchool=`*2×8 table*
School GroupCount min_Test1 max_Test1 min_Test2 max_Test2 min_Test3 max_Test3
__________ __________ _________ _________ _________ _________ _________ _________
ABC School 5 75 96 80 94 72 98
XYZ School 5 78 90 75 87 77 93

To use all the predefined methods of `groupsummary`

, specify `"all"`

as the method. Calculate all statistics on the mean test score by school.

allStatsBySchool = groupsummary(scores,"School","all","TestMean")

`allStatsBySchool=`*2×14 table*
School GroupCount mean_TestMean sum_TestMean min_TestMean max_TestMean range_TestMean median_TestMean mode_TestMean var_TestMean std_TestMean nummissing_TestMean nnz_TestMean numunique_TestMean
__________ __________ _____________ ____________ ____________ ____________ ______________ _______________ _____________ ____________ ____________ ___________________ ____________ __________________
ABC School 5 87.067 435.33 75.667 95.333 19.667 87.333 75.667 57.967 7.6136 0 5 5
XYZ School 5 83.4 417 76.667 90 13.333 85 76.667 29.856 5.464 0 5 5

Sometimes you might want to find a particular value in one table variable and then find the corresponding value in another table variable. In such cases use `rowfun`

.

For example, find the student in each school who had the highest mean test score. The attached supporting function, `findNameAtMax`

, returns both the highest score and the name of the student who had that score. Use `rowfun`

to apply `findNameAtMax`

to each group of students. The `rowfun`

function is suitable because `findNameAtMax`

has multiple input arguments (last names and test scores) and also returns multiple output arguments.

maxScoresBySchool = rowfun(@findNameAtMax, ... scores, ... "InputVariables",["LastName","TestMean"], ... "GroupingVariables","School", ... "OutputVariableNames",["max_TestMean","LastName"])

`maxScoresBySchool=`*2×4 table*
School GroupCount max_TestMean LastName
__________ __________ ____________ _________
ABC School 5 95.333 "Ramirez"
XYZ School 5 90 "Jeong"

### Supporting Function

function [maxValue,lastName] = findNameAtMax(names,values) % Return maximum value and the last name % from the row at which the maximum value occurred [maxValue,maxIndex] = max(values); lastName = names(maxIndex); end

## See Also

`table`

| `rowfun`

| `varfun`

| `groupsummary`

| `readtable`

| `vartype`