Replacing values in a table

22 views (last 30 days)
I have a table created from a database query using T = select(conn,selectquery); Missing database entries for numeric integer types are read as -2147483648 for long integer types (int32) and -32768 (int16) for short integer types. I would like to replace these with NaN throughout the entire table. If I use I=ismissing(T,[-2147483648 -32768]) I returns logical array with all the table indices where there are missing values as 1=true, in other words, this works. However, if I try to use T = standardizeMissing(T,[-32768 -2147483648]) or any variation, the values are not replaced with NaN as the command would suggest should happen.
  3 Comments
Bruce MacWilliams
Bruce MacWilliams on 11 Apr 2018
Thanks for the reply. Your suggestion was my initial try, and what I was referring to when I mentioned "any variation" of the standardizeMissing() command. So, in short, using the "one-at-a-time" approach does not solve this issue.
dpb
dpb on 11 Apr 2018
OK, I see Peter already identified what was going to be my second thought (that didn't come to me until after the posting, I'll admit)

Sign in to comment.

Accepted Answer

Peter Perkins
Peter Perkins on 11 Apr 2018
Are these actual integer types, and not just doubles with integer values in them? Because you can't put NaNs into int16 and int32. standardizeMissing completely ignores integer types:
>> T = table([1;-32768;3;4],[5;6;-2147483648;8])
T =
4×2 table
Var1 Var2
______ ___________
1 5
-32768 6
3 -2.1475e+09
4 8
>> T = standardizeMissing(T,[-32768 -2147483648])
T =
4×2 table
Var1 Var2
____ ____
1 5
NaN 6
3 NaN
4 8
>> T = table(int16([1;-32768;3;4]),int32([5;6;-2147483648;8]))
T =
4×2 table
Var1 Var2
______ ___________
1 5
-32768 6
3 -2147483648
4 8
>> T = standardizeMissing(T,[-32768 -2147483648])
T =
4×2 table
Var1 Var2
______ ___________
1 5
-32768 6
3 -2147483648
4 8
  2 Comments
Bruce MacWilliams
Bruce MacWilliams on 11 Apr 2018
Yes, that is the case, they are int data types, and I was wondering about that but the documentation seems to suggest that int types are acceptable:
Data Types: double | single | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical | char | string | cell | table | timetable | categorical | datetime | duration
Is there a suggested work around? Convert the columns to a different data type first?
Peter Perkins
Peter Perkins on 13 Apr 2018
I'll make a note to tweak the doc.
There is no "work-around" per se because integer types just don't have the notion of a NaN value. The thing to do is to convert to either 1) floating point, maybe single if memory is an issue 2) categorical, if these are really categorical data (categorical has undefined analogous to NaN)

Sign in to comment.

More Answers (1)

David Fletcher
David Fletcher on 11 Apr 2018
tab =
10×2 table
C1 C2
__ ___________
1 -32768
2 9
3 3
4 2
5 1
6 -2.1475e+09
7 7
8 9
9 1
10 2
standardizeMissing(tab,{-32768,-2147483648},'DataVariables',{'C2'})
ans =
10×2 table
C1 C2
__ ___
1 NaN
2 9
3 3
4 2
5 1
6 NaN
7 7
8 9
9 1
10 2
  2 Comments
David Fletcher
David Fletcher on 11 Apr 2018
See the answer below - these weren't expressly cast to ints so would have been doubles. I still have a nasty habit of typing what I assume is an int and forgetting that Matlab defaults to a double type.
Bruce MacWilliams
Bruce MacWilliams on 11 Apr 2018
David: Thanks for your assistance, when I first saw this your post I tried it, but it did not work and I'm pretty sure I had already attempted the 'DataVariables' argument. Conversion of the table column/variable from int to double solves the problem, so that is what I will do.

Sign in to comment.

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!