Main Content

matlab.io.RowFilter

Selectively import rows of interest

Description

A matlab.io.RowFilter object enables you to specify how MATLAB® imports tabular data. The object contains properties that control the data import process by specifying conditions that must be satisfied for rows to be included. You can create a matlab.io.RowFilter object by using the rowfilter function.

Creation

Description

example

rf = rowfilter(varnames) filters rows within your data set using the variables specified in varnames. You can use the returned matlab.io.RowFilter object rf as an input to parquetread and parquetDatastore to filter data before importing.

Each variable name within the input argument becomes an individual property of the RowFilter object. Use tab-completion for efficient access to available variable names. Use relational operators to express conditions on these variables to target your rows of interest. These relational operators are supported: <, <=, >, >=, ==, and ~=. To express multiple filtering constraints, combine RowFilter objects using the &, |, or ~ operators. The ~ operator can also be used with individual constraints.

example

rf = rowfilter(info) uses the variable names from the VariableNames property of a ParquetInfo object.

example

rf = rowfilter(pds) uses the variable names from the VariableNames property of a ParquetDatastore object.

Input Arguments

expand all

Names of variables used to filter rows, specified as a string scalar, character vector, string array, or cell array of character vectors.

Example: ["OutageTime" "Region" "Cause"]

ParquetInfo object used to filter rows, specified as a ParquetInfo object. The variables identified by the VariableNames property of the ParquetInfo object are used to filter rows.

ParquetDatastore object used to filter rows, specified as a ParquetDatastore object. The variables identified by the VariableNames property of the ParquetDatastore object are used to filter rows.

Examples

collapse all

Import a subset of data by specifying variables and rows to import by using a row filter.

To import a subset of the outages.parquet file, create a filter to import only the OutageTime, Region, and Cause variables. Then, refine the filter to import only rows with values that meet certain conditions.

rf = rowfilter(["OutageTime" "Region" "Cause"]);
rf2 = (rf.OutageTime > datetime("2013-02-01")) & (rf.Region == "NorthEast") & (rf.Cause == "winter storm");
d = parquetread("outages.parquet",RowFilter=rf2,SelectedVariableNames=["OutageTime" "Region" "Cause"])
d=6×3 table
         OutageTime           Region           Cause     
    ____________________    ___________    ______________

    09-Feb-2013 00:55:00    "NorthEast"    "winter storm"
    13-Feb-2013 01:44:00    "NorthEast"    "winter storm"
    25-Dec-2013 11:24:00    "NorthEast"    "winter storm"
    30-Dec-2013 11:40:00    "NorthEast"    "winter storm"
    22-Feb-2013 02:17:00    "NorthEast"    "winter storm"
    23-Feb-2013 01:53:00    "NorthEast"    "winter storm"

The resulting subset of filtered data contains only the 6 rows that meet the filter conditions and the 3 specified variables.

Import the rows of interest from a data set using a row filter and ParquetInfo object.

Create a ParquetInfo object from the outages.parquet file. Create a row filter using the ParquetInfo object. Then, use the row filter to import rows with Loss values less than 100.

info = parquetinfo("outages.parquet");
rf = rowfilter(info);
data = parquetread("outages.parquet",RowFilter=rf.Loss<100)
data=321×6 table
      Region            OutageTime          Loss     Customers       RestorationTime             Cause       
    ___________    ____________________    ______    __________    ____________________    __________________

    "West"         18-Jun-2003 02:49:00         0             0    18-Jun-2003 10:54:00    "attack"          
    "SouthEast"    05-Sep-2004 17:48:00    73.387         36073    05-Sep-2004 20:46:00    "equipment fault" 
    "SouthEast"    01-Sep-2002 18:22:00    95.917         36759    01-Sep-2002 19:12:00    "severe storm"    
    "NorthEast"    18-Sep-2004 05:54:00         0             0                     NaT    "equipment fault" 
    "SouthEast"    12-Dec-2002 18:08:00    46.918    1.0698e+05    14-Dec-2002 18:43:00    "winter storm"    
    "West"         16-Dec-2002 13:43:00    70.752    4.8193e+05    19-Dec-2002 09:38:00    "winter storm"    
    "SouthEast"    24-Feb-2003 06:13:00         0             0    24-Feb-2003 21:18:00    "attack"          
    "SouthEast"    07-May-2005 00:07:00     65.95         47140    07-May-2005 09:26:00    "thunder storm"   
    "MidWest"      25-Jun-2003 14:03:00    77.432    1.2328e+05    28-Jun-2003 07:02:00    "thunder storm"   
    "SouthEast"    14-Jul-2002 21:32:00     90.83         60133    14-Jul-2002 23:53:00    "thunder storm"   
    "MidWest"      05-Mar-2002 17:53:00    96.563    2.8666e+05    10-Mar-2002 14:41:00    "wind"            
    "MidWest"      26-Sep-2004 01:14:00    58.656    1.2288e+05    01-Oct-2004 01:36:00    "wind"            
    "SouthEast"    21-Jan-2006 21:27:00    19.513         10994    21-Jan-2006 23:25:00    "equipment fault" 
    "MidWest"      19-Jan-2004 12:37:00     16.03        2.2296    20-Jan-2004 02:50:00    "equipment fault" 
    "NorthEast"    14-Jan-2004 20:01:00     87.08         14979    16-Jan-2004 00:52:00    "energy emergency"
    "NorthEast"    04-Feb-2005 00:53:00    32.061         46182    09-Feb-2005 02:42:00    "winter storm"    
      ⋮

The filtered data subset contains 321 rows and 6 variables.

Import the rows of interest from a data set using a row filter and ParquetDatastore object.

Create a ParquetDatastore object from the outages.parquet file. Create a row filter using the ParquetDatastore object. Then, use the row filter to select rows with Customer values greater than 200,000 and Region values of "MidWest".

pds = parquetDatastore("outages.parquet");
rf = rowfilter(pds);
pds.RowFilter = rf.Customers > 2e6 & rf.Region == "MidWest";
data = readall(pds)
data=3×6 table
     Region           OutageTime          Loss     Customers       RestorationTime           Cause     
    _________    ____________________    ______    __________    ____________________    ______________

    "MidWest"    10-Dec-2002 10:45:00     14493    3.0879e+06    11-Dec-2002 18:06:00    "unknown"     
    "MidWest"    16-Jul-2006 00:05:00    1817.9     3.295e+06    27-Jul-2006 14:42:00    "severe storm"
    "MidWest"    07-Sep-2008 23:35:00       NaN     3.972e+06    19-Sep-2008 17:19:00    "severe storm"

The filtered data subset contains 3 rows and 6 variables.

Version History

Introduced in R2022a