Trouble using strcmp in Matlab

BA on 20 Jul 2022
Commented: Voss on 20 Jul 2022
Background
I have two datasets, and I want to write a function that keeps only the data for users that appear in both.
Right now I only have data for a few people, and I will get data for the other users later. I might never get data for all of them, so for now the script needs to use only the users I actually have data for.
In addition, some stray users show up in the dataset that must not be in the final dataset, which is why I'm going out of my way to handle this now. The dataset will get very large as we add more users.
Introducing... the two datasets I have:
"Users.xlsx"
This dataset has information on ALL the users possible.
"Task.xlsx"
This dataset only has information on a few of the users.
Code
This is the code that I've been trying so far and I haven't been able to get it to work:
root_folder = ''; % put the file path between the quotes
users = readtable([root_folder filesep 'Users.xlsx']);
task = readtable([root_folder filesep 'Task.xlsx']);
for sub=1:length(users.UserId)
Opt_Sub = task(strcmp(task.UserId,users.UserId{sub}),:);
end
Each time I run this code, I get an empty table. What I'm trying to do is use strcmp to compare the UserId columns of the two tables and get 1s and 0s (1 where a user appears in both). Where there is a match, I use task(...) to pull out all the matching rows.
If anyone has a solution, please let me know. Thanks in advance.

Accepted Answer

Voss on 20 Jul 2022
Assigning to Opt_Sub on each iteration of the for loop overwrites whatever was assigned to Opt_Sub on previous iterations of the loop.
If, instead, you make Opt_Sub a cell array and assign to one cell of it each time, then the code seems to do what you intend:
root_folder = '.';
users = readtable(fullfile(root_folder,'Users.xlsx'));
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
task = readtable(fullfile(root_folder,'Task.xlsx'));
N_users = size(users,1);
Opt_Sub = cell(N_users,1);
for sub = 1:N_users
Opt_Sub{sub} = task(strcmp(task.UserId,users.UserId{sub}),:);
end
Opt_Sub
Opt_Sub = 12×1 cell array
    {29×6 table}
    {29×6 table}
    {32×6 table}
    { 0×6 table}
    { 0×6 table}
    { 0×6 table}
    { 0×6 table}
    { 0×6 table}
    { 0×6 table}
    { 0×6 table}
    { 0×6 table}
    { 0×6 table}
(You were seeing only the table from the last iteration, which is empty.)
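If the empty cells are not needed, one possible way to drop them is sketched below. The small users and task tables here are made-up stand-ins for the two spreadsheets, and the Score variable is hypothetical; only the UserId name comes from the code above. Treat it as an illustration under those assumptions rather than a drop-in fix.

```matlab
% Made-up stand-in data (assumption); the real tables come from readtable.
users = table({'u1';'u2';'u3'}, 'VariableNames', {'UserId'});
task  = table({'u1';'u1';'u3';'x9'}, [10;20;30;40], ...
              'VariableNames', {'UserId','Score'});   % 'Score' is hypothetical

N_users = height(users);
Opt_Sub = cell(N_users,1);
for sub = 1:N_users
    % One cell per user, so earlier results are not overwritten.
    Opt_Sub{sub} = task(strcmp(task.UserId, users.UserId{sub}), :);
end

% Keep only the users that actually have rows in task.
has_data = cellfun(@(t) height(t) > 0, Opt_Sub);
Opt_Sub_nonempty = Opt_Sub(has_data);      % cell array without the empty tables
users_with_data  = users.UserId(has_data); % matching UserId for each kept cell
```

Keeping users_with_data alongside the filtered cell array preserves which table belongs to which user once the empty entries are gone.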
However, for your purpose it may be more convenient to split the task table according to the unique UserId values it contains (i.e., maybe you don't need to use the users table at all for this):
[uid,~,jj] = unique(task.UserId);
N_users = numel(uid);
Opt_Sub = cell(N_users,1);
for ii = 1:N_users
Opt_Sub{ii} = task(jj == ii,:);
end
Opt_Sub
Opt_Sub = 5×1 cell array
    {32×6 table}
    { 1×6 table}
    { 1×6 table}
    {29×6 table}
    {29×6 table}
(Note that the Opt_Sub from this method contains two 1×6 tables, corresponding to the two task.UserId values that are not in users.UserId.)
I don't know which way makes more sense for what you're ultimately trying to do.
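If the stray users should be excluded but splitting by unique values is still convenient, the two ideas might be combined by filtering task with ismember first. This is only a sketch with made-up data (the Score variable and the ID strings are hypothetical), and it assumes UserId is read in as a cell array of character vectors in both tables, as the strcmp code above implies.

```matlab
% Made-up stand-in data (assumption); the real tables come from readtable.
users = table({'u1';'u2';'u3'}, 'VariableNames', {'UserId'});
task  = table({'u3';'u1';'x9';'u1'}, [30;10;40;20], ...
              'VariableNames', {'UserId','Score'});   % 'Score' is hypothetical

% True for each task row whose UserId also appears in users.
in_users   = ismember(task.UserId, users.UserId);
task_known = task(in_users, :);            % drops the stray 'x9' row

% Split the remaining rows by user, as in the second approach.
[uid,~,jj] = unique(task_known.UserId);
Opt_Sub = cell(numel(uid),1);
for ii = 1:numel(uid)
    Opt_Sub{ii} = task_known(jj == ii, :);
end
```

With this ordering, uid lists only users present in both tables, so Opt_Sub has no empty cells and no entries for users missing from Users.xlsx.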
  2 Comments
BA on 20 Jul 2022
Hey, thanks a lot for the solution, I really appreciate it!
I tried running the code you sent me, but for some reason, each time, it gives me a table that has multiple tables within each cell. Do you know if there is a way to fix that?
Also, about this code:
[uid,~,jj] = unique(task.UserId);
N_users = numel(uid);
Opt_Sub = cell(N_users,1);
for ii = 1:N_users
Opt_Sub{ii} = task(jj == ii,:);
end
Opt_Sub
I haven't tried this yet, but it might not work, because a UserId typically doesn't show up as just an isolated instance. Users might show up for 3-4 days instead of just one day as shown in the dataset. I'm not sure I'm reading the code right, but I believe that is what this code is doing. So the first way is likely better, but I haven't been able to get any results out of it.
Voss on 20 Jul 2022
You're welcome!
Both of those approaches give you a cell array of tables, where each cell contains one table corresponding to one UserId. If I understand what you're trying to do, you'll end up with multiple tables one way or another, right? How would you prefer those tables to be stored? They don't have to be in a cell array, but to me that seemed like the most natural way.
Neither of those approaches depends on any days or dates. They'll each do whatever they do, regardless of whether the users show up for multiple days or one day. Try to run the second approach and see if what you get makes sense.
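If a single table turns out to be preferable to a cell array, the per-user tables can be stacked back together, since they all share the same variables. A minimal sketch with made-up data follows; the ids list and the Score variable are hypothetical stand-ins, not values from the actual spreadsheets.

```matlab
% Made-up stand-in data (assumption); the real table comes from readtable.
task = table({'u1';'u3';'u1'}, [10;30;20], ...
             'VariableNames', {'UserId','Score'});    % 'Score' is hypothetical
ids  = {'u1';'u2';'u3'};                              % hypothetical user list

% Build the cell array of per-user tables, as in the first approach.
Opt_Sub = cell(numel(ids),1);
for k = 1:numel(ids)
    Opt_Sub{k} = task(strcmp(task.UserId, ids{k}), :);
end

% Stack every per-user table into one table; empty tables add no rows.
combined = vertcat(Opt_Sub{:});
```

The rows of combined come out grouped by user in the order of ids, and users with no matching rows simply contribute nothing.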

Release

R2022a