Apache Parquet Data Type Mappings
MATLAB® represents column-oriented data with tables and timetables. Each variable in a table or timetable can have a different data type and any number of columns. Column vectors are the most common shape of table and timetable variables.
The Apache™ Parquet file format is used for column-oriented heterogeneous data. Similar to MATLAB tables and timetables, each of the columns in a Parquet file can have different data types.
Despite their similarity, the permitted data types in MATLAB tables and timetables do not always map perfectly to the permitted data types in Parquet files. In some cases, it is necessary for MATLAB to perform data type conversions to retain information in the data (such as missing values). This conversion can sometimes result in a loss of precision in the data.
In general, MATLAB tables and timetables have these behaviors when they are written to or read from Parquet files:
- Table properties set on the original table are not saved.
- Table row names or timetable row times are converted into a new table variable before being written.
- When variable names are read from a Parquet file, names that are not valid table variable names are converted to valid ones.
The following tables summarize the data types that MATLAB tables and timetables can represent and how those variables are represented in Parquet files. Unless otherwise noted, these mappings apply in both directions (MATLAB → Parquet and Parquet → MATLAB).
Parquet files use a small number of primitive (or physical) data types. Logical types extend the physical types by specifying how the stored values should be interpreted. Parquet data types not covered here (JSON, BSON, binary, and so on) are not supported for reading from or writing to Parquet files.
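The relationship between physical and logical types can be illustrated with a minimal Python sketch. It assumes 4-byte little-endian storage for a 32-bit integer physical type, and the helper name `read_int32` is hypothetical. The sketch shows only the general idea that a logical annotation changes how the same stored bytes are interpreted; it is not how MATLAB reads Parquet files.

```python
import struct

def read_int32(raw: bytes, unsigned: bool = False) -> int:
    """Decode 4 little-endian bytes stored under a 32-bit integer physical type.

    `unsigned` stands in for a logical-type annotation (hypothetical flag):
    it does not change the stored bytes, only their interpretation.
    """
    fmt = "<I" if unsigned else "<i"
    return struct.unpack(fmt, raw)[0]

# The same four bytes on disk...
raw = struct.pack("<I", 4_000_000_000)

# ...read as a plain signed 32-bit integer (no logical annotation)
plain = read_int32(raw)

# ...read with an unsigned 32-bit logical annotation
annotated = read_int32(raw, unsigned=True)

print(plain, annotated)
```

Without the annotation the value reads back as a negative number, which is why a reader needs the logical type in addition to the physical type.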
Numeric Data Types
MATLAB Table or Timetable Variable Type | Parquet Physical Type | Parquet Logical Type | Notes
---|---|---|---
 | | | MATLAB converts any missing floating-point numbers in a Parquet file into
 | | | When reading a Parquet file, if an array with integral type contains missing values, then the array is converted into the MATLAB
 | | | For 64-bit integers, this conversion can result in truncation of values that are larger in magnitude than
 | | | When reading a Parquet file, if an array with
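The truncation risk for 64-bit integers can be sketched in Python. This example assumes that the converted type is IEEE 754 double precision and that missing values become NaN. Both are assumptions made for illustration, and the helper `to_float_with_missing` is hypothetical. A double has a 53-bit significand, so integers larger in magnitude than 2^53 are not all exactly representable.

```python
import math

def to_float_with_missing(values):
    """Convert integers to floats, mapping missing entries (None) to NaN."""
    return [math.nan if v is None else float(v) for v in values]

# A 64-bit integer column with one missing entry
column = [1, None, 2**53 + 1, -(2**63)]
converted = to_float_with_missing(column)

# Collect the original values that no longer round-trip exactly
lossy = [
    original
    for original, value in zip(column, converted)
    if original is not None and int(value) != original
]

print(converted)
print(lossy)
```

Here only `2**53 + 1` fails to round-trip. The value `-(2**63)` is a power of two and survives exactly, so magnitude alone does not determine which values are affected.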
Text Data Types
MATLAB Table or Timetable Variable Type | Parquet Physical Type | Parquet Logical Type | Notes
---|---|---|---
Date and Time Data Types
MATLAB Table or Timetable Variable Type | Parquet Physical Type | Parquet Logical Type | Notes
---|---|---|---
 | | | MATLAB datetime arrays written to a Parquet file use
 | | | MATLAB duration arrays written to a Parquet file use