How do I create 2 separate matrices which are pair-matched to each other in the corresponding row from 2 original CSV files?
I currently have two datasets of samples in two separate CSV files, on which I want to run a pair-matched medical network analysis. Both datasets contain the same medical variables but different numbers of samples. For example, my first dataset contains 22 samples of ill patients with 18 medical variables. My second dataset contains 109 samples of recovered patients with the same 18 medical variables.
I want to be able to pair-match these samples yet keep them intact in their individual datasets so I can run individual network graph analyses on them. So, ideally my end result would be: my first dataset would contain 22 samples (ill), and my second dataset would contain 22 samples (recovered). The sample in row A of dataset 1 and the sample in row A of dataset 2 would be matched by a variable (e.g. variable X is the same in both), the samples in row B of dataset 1 and dataset 2 would be matched by that same variable, and so on.
I’ve written out the logic of code, but I’m having trouble arriving at the actual code (novice issues):
- Import ill dataset (22 samples) as a matrix of 22x18, name dataset “A”.
- Import recovered dataset (109 samples) as a matrix of 109x18, name dataset “B”.
- The variable to be pair matched is column 6.
- Select ID1 of A (row 1; for this logical argument, assume each row is identified by an individual identifier tag) and compare its column 6 variable against the column 6 variable of ID1-ID109 of B.
- If A-ID1-6 = B-IDX-6 with only a single return (where IDX is the row returned with the matching column 6 variable), then place A-ID1 in row 1 of C (a new 22-row NaN matrix for pair-matched ill) and place B-IDX in row 1 of D (a new 22-row NaN matrix for pair-matched recovered), removing B-IDX from the original matrix.
- If A-ID1-6 = B-IDX-6 with multiple returns, then place A-ID1 in row 1 of C, randomly select one of the returned B-IDX rows and place it in row 1 of D, removing that B-IDX from the original matrix.
- If A-ID1-6 = B-IDX-6 with no returns, then select the closest B-IDX-6 within +/-0.5 of the column 6 variable, placing A-ID1 in row 1 of C and B-IDX in row 1 of D, and removing B-IDX from the original matrix. If there is no +/-0.5 match for the variable in column 6, return NaN in row 1 of both C and D and remove both from the original matrices.
- Loop for A-ID2 to A-ID22 until C and D have 22 rows each which are pair-matched.
- These new data tables will then be individually used to generate network maps.
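The import and setup steps in the list above could be sketched as follows. This is only a sketch under stated assumptions: the file names are hypothetical, the demo files are tiny stand-ins written to a temporary folder so the snippet runs on its own, and `readmatrix`/`writematrix` require MATLAB R2019a or later.

```matlab
% Hypothetical demo files standing in for the real ill/recovered CSVs.
illFile       = fullfile(tempdir, 'ill_demo.csv');
recoveredFile = fullfile(tempdir, 'recovered_demo.csv');
writematrix(magic(6), illFile);                        % 6 "ill" samples
writematrix([magic(6); magic(6) + 1], recoveredFile);  % 12 "recovered" samples

% Import both datasets as numeric matrices.
A = readmatrix(illFile);         % the real data would be 22x18
B = readmatrix(recoveredFile);   % the real data would be 109x18

% Row-wise pairing only makes sense if both share the same variables.
assert(size(A,2) == size(B,2), 'A and B must have the same number of columns');

% Preallocate the pair-matched outputs with NaN, one row per ill sample.
C = NaN(size(A));   % pair-matched ill samples
D = NaN(size(A));   % pair-matched recovered samples
```

Preallocating C and D with NaN means that any sample left unmatched later is already represented by a NaN row, with no extra bookkeeping.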
I do apologize for the lengthy explanation. It is frustrating, as I can see the logic but can't turn it into code that works. Please don't hesitate to ask if anything is unclear, and thank you in advance to anyone who can help me out.
Accepted Answer
Bob Thompson
on 28 Aug 2019
This is a first cut at how I would set up the loop and arguments.
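C = NaN(size(A));                            % pair-matched ill samples
D = NaN(size(A));                            % pair-matched recovered samples
for i = 1:size(A,1)
    idx = find(B(:,6) == A(i,6));            % rows of B with an exact match in column 6
    if isempty(idx)                          % no exact match: relax the check
        [d, nearest] = min(abs(B(:,6) - A(i,6)));   % closest remaining row of B
        if d <= 0.5                          % accept it only within +/-0.5
            idx = nearest;
        end
    end
    if isempty(idx)
        continue                             % no match at all: rows i stay NaN
    end
    r = idx(randi(numel(idx)));              % pick one at random if there are several
    C(i,:) = A(i,:);
    D(i,:) = B(r,:);
    B(r,:) = [];                             % remove the row so it cannot be matched again
end
If neither the exact check nor the relaxed +/-0.5 check finds a match, row i of both C and D is left as a row of NaNs, because both matrices are preallocated with NaN.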
I have not tested this, so there may be some slight errors. Feel free to debug as necessary.
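Since the loop is untested, a quick sanity check on the result may be worthwhile before building the network maps. The following is a minimal sketch, assuming C and D have the layout described in the question; the small matrices and the names `matchCol`, `tol`, `Cm` and `Dm` are made up for illustration.

```matlab
% Stand-in results; in practice C and D come out of the matching loop.
C = [1 2 3 4 5 10.0; 1 2 3 4 5 12.0; NaN(1,6)];
D = [9 8 7 6 5 10.0; 9 8 7 6 5 12.4; NaN(1,6)];
matchCol = 6;     % column used for pair matching
tol = 0.5;        % allowed difference for a relaxed match

% A pair is unmatched when its row in D is entirely NaN.
unmatched = all(isnan(D), 2);

% Every matched pair should agree on the matching variable within tol.
gap = abs(C(:,matchCol) - D(:,matchCol));
assert(all(gap(~unmatched) <= tol), 'A matched pair exceeds the tolerance');

fprintf('%d of %d samples were pair-matched\n', nnz(~unmatched), size(C,1));

% Drop unmatched rows from both matrices so they stay aligned row by row.
Cm = C(~unmatched, :);
Dm = D(~unmatched, :);
```

Removing the same rows from both matrices keeps row k of `Cm` paired with row k of `Dm`, which is what the individual network analyses rely on.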
Bob Thompson
on 30 Aug 2019
The two things that have helped me get better at MATLAB, besides projects, are being active on these forums and using 'Cody' here on the MathWorks website. The first has introduced me to a number of new concepts, or better ways of doing what I already know, while the second has offered me a wide variety of small challenges that I can learn how to solve without having to jump into some major project.
Other than that, reading the documentation, coupled with the forums, has helped teach me the vocabulary that the MATLAB world tends to use, which makes it much easier to look at the documentation for new functions and pick them up quickly.