Removing empty cells with non-zero dimensions

My code needs to deal with a cell array X, each cell of which is itself a cell array whose cells each contain a double array. For example, X could look as follows:
X = cell(N,1);
for i=1:N
X{i}=cell(1,10);
for j=1:10
X{i}{j} = randi(10, 5,2); %each cell contains a double array of size (5,2)
end
end
While my code manipulates X, some rows of these double arrays might get removed. For example:
for i=1:N
for j=1:10
X{i}{j}(X{i}{j}(:,1) < 3,:) = [];
end
end
In some cases, all rows of a double array get removed, resulting in a 0×2 empty double matrix. This nonzero size is causing problems elsewhere in my code. How do I efficiently replace these with 0×0 empty arrays?
My current approach is to call the following for-loop after each set of manipulations that might result in empty arrays with nonzero size.
for i=1:N
for j=1:10
if isempty(X{i}{j})
X{i}{j} = [];
end
end
end
However, I'm fairly certain that there is a better way of doing this. Any suggestions?
Edit: I want to emphasize that I do not want to remove the empty cells. What I do want is to replace any 0x2 empty double matrices with 0x0 matrices.
The 10 cells inside each X{i} represent "physical" lattice sites in my simulation. An empty cell does have a meaning, and should not be removed.

3 Comments

That would just change the empty cell from a 0x2 to a 0x0. Is your goal to remove the empty cells completely? Note that the 2nd layer of cells may no longer all be the same length.
No, I explicitly want to keep the empty cells, I just don't want them to have a non-zero size if they are empty.
The 10 cells inside each X{i} represent "physical" lattice sites in my simulation. An empty cell does have a meaning, and should not be removed.
Adam Danz on 24 Aug 2020
Edited: Adam Danz on 24 Aug 2020
I see. I'll update my answer.
Note that the isempty function will return the same results whether the cell is 0xn, nx0 or 0x0 but if you're using the cell size for any reason, then it matters what the empty dimensions are.
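To illustrate the point about isempty versus size, here is a minimal sketch. The variable names A, B, C, allEmpty, sizes and nCols are made up for this example and are not from the question.

```matlab
% Three different empty arrays: all are "empty", but their sizes differ
A = zeros(0,2);   % 0x2, like an array left over after deleting all of its rows
B = zeros(3,0);   % 3x0
C = [];           % 0x0

% isempty does not care which dimension is zero
allEmpty = [isempty(A), isempty(B), isempty(C)];

% size does: one row per array, columns are [rows cols]
sizes = [size(A); size(B); size(C)];

% Code that branches on the number of columns sees the difference
nCols = [size(A,2), size(B,2), size(C,2)];
```

So any code that asks for size(...,2) will still see 2 columns for the 0x2 array, even though isempty reports it as empty.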


 Accepted Answer

Adam Danz on 24 Aug 2020
Edited: Adam Danz on 24 Aug 2020
How to remove empty cells
To remove all empty cells in the 2nd layer of a nested cell array named X,
for i = 1:numel(X)
X{i}(cellfun(@isempty,X{i})) = [];
end
Or, in 1 line,
X = cellfun(@(C){C(~cellfun(@isempty,C))},X);
That may eliminate all of the nested cells in the 2nd layer, in which case some cells of the first layer become empty. If you'd like to eliminate those as well (i.e., all cells where all nested cells were removed),
X(cellfun(@isempty, X)) = [];
How to replace 0xn or nx0 empty cells with 0x0
To replace all 0xn or nx0 cells in the 2nd layer of a nested cell array named X,
for i = 1:numel(X)
X{i}(cellfun(@isempty,X{i})) = {[]};
end
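As a quick sanity check of the replacement loop, here is a small sketch on made-up data (the contents of X below, and the names nSites and allOK, are hypothetical). It confirms that no inner cells are removed and that every empty one ends up 0x0.

```matlab
% Hypothetical test data: two outer cells, each with three inner cells
X = {{zeros(0,2), [1 2; 3 4], zeros(0,2)}; ...
     {[5 6],      zeros(0,2), [7 8]}};

% Replace every empty array (0xn or nx0) in the 2nd layer with a 0x0 array
for i = 1:numel(X)
    X{i}(cellfun(@isempty, X{i})) = {[]};
end

% The number of inner cells is unchanged
nSites = cellfun(@numel, X);

% Each inner array is either non-empty or exactly 0x0
allOK = cellfun(@(C) all(cellfun(@(a) ~isempty(a) || isequal(size(a), [0 0]), C)), X);
```

Note the {[]} on the right-hand side: assigning a cell containing [] replaces the contents, whereas assigning a bare [] with parentheses indexing would delete the cells.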

1 Comment

I'm guessing that your workflow uses size(), which is why it's a problem when a cell is 0x2. If that's the case, you could avoid this entire process by using isempty() within your workflow instead of size(). If the sizes of the arrays are already stored somewhere as sz, you could use something like if any(sz==0).
Also, if the second block of code in your question resembles what you're actually doing, you could shave off some time by fixing the problem within that section rather than adding another set of loops to convert 0x2 to 0x0. This is the fastest method yet, I believe (not that it matters at this point).
% Replace the 2nd block of code in your question with this
for i=1:N
Xi = X{i};
for j=1:10
rmIdx = Xi{j}(:,1) < 3;
if all(rmIdx)
Xi{j} = [];
else
Xi{j}(rmIdx,:) = [];
end
end
X{i} = Xi;
end
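One caveat, if the removal step runs more than once: after a cell has been set to [], the expression Xi{j}(:,1) asks for column 1 of a 0x0 array, which throws an indexing error. A possible guard is sketched below. The test data, the forced all-removed cell X{1}{1} and the pass loop are assumptions added only to demonstrate the repeat case.

```matlab
% Hypothetical test data in the same layout as the question
N = 3;
X = cell(N,1);
for i = 1:N
    X{i} = cell(1,10);
    for j = 1:10
        X{i}{j} = randi(10, 5, 2);
    end
end
X{1}{1} = ones(5,2);   % force one cell whose rows all get removed

for pass = 1:2         % run the removal twice to show it is safe to repeat
    for i = 1:N
        Xi = X{i};
        for j = 1:10
            if isempty(Xi{j})
                continue   % already empty; (:,1) would error on a 0x0 array
            end
            rmIdx = Xi{j}(:,1) < 3;
            if all(rmIdx)
                Xi{j} = [];            % store 0x0 instead of 0x2
            else
                Xi{j}(rmIdx,:) = [];
            end
        end
        X{i} = Xi;
    end
end
```

If the removal only ever runs once on freshly filled arrays, the guard is unnecessary and the version above it is fine as is.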


More Answers (1)

I like your for-loop; you might speed it up a little bit:
for i=1:N
Xi = X{i};
Xi(cellfun('isempty',Xi)) = {[]}; % switch to string from Rik's remark
X{i} = Xi;
end

13 Comments

You can replace the outer for-loop with cellfun
X = cellfun(@ReplaceEmpty, X, 'UniformOutput', false);
function Xi = ReplaceEmpty(Xi)
Xi(cellfun('isempty',Xi)) = {[]}; % switch to string from Rik's remark
end
Adam Danz on 24 Aug 2020
Edited: Adam Danz on 24 Aug 2020
The OP's original nested loops are actually 1.99x faster than the loop in your answer and 1.84x faster than the one in my answer, on average; the difference is mainly due to cellfun.
Each was timed 1000 times, comparing the median values.
Your loop isn't really different from mine. It unpacks and repacks the cell array, which adds a tiny bit more time.
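For anyone who wants to reproduce a comparison like this, here is a rough sketch of a "time it 1000 times, compare medians" test using tic/toc. This is not the exact script used for the numbers above. The test data (every inner array 0x2) and the names nReps, tLoop, tCellfun and ratio are assumptions, and the ratio will vary by machine and release.

```matlab
% Hypothetical worst-case data: every inner array is a 0x2 empty
N = 100;
X = cell(N,1);
for i = 1:N
    X{i} = repmat({zeros(0,2)}, 1, 10);
end

nReps    = 1000;
tLoop    = zeros(nReps,1);
tCellfun = zeros(nReps,1);
for r = 1:nReps
    % Nested loops from the question
    Y = X;
    tic
    for i = 1:N
        for j = 1:10
            if isempty(Y{i}{j})
                Y{i}{j} = [];
            end
        end
    end
    tLoop(r) = toc;

    % Loop plus cellfun with a function handle
    Z = X;
    tic
    for i = 1:numel(Z)
        Z{i}(cellfun(@isempty, Z{i})) = {[]};
    end
    tCellfun(r) = toc;
end

ratio = median(tCellfun) / median(tLoop);
fprintf('cellfun version takes %.2fx the time of the nested loops\n', ratio);
```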
AS on 24 Aug 2020
Edited: AS on 24 Aug 2020
Wait, are you saying my original method is the fastest approach? I expected something using cellfun to be faster, I just didn't get it to work properly without some help.
edit: some testing suggests that it is indeed quite a lot faster. I assumed that arrayfun and cellfun would speed things up, but that turns out not to be true.
Yeah, that's why I first stated that I like OP's for-loop.
I'm still out there looking for an example where CELLFUN/ARRAYFUN beats a FOR-LOOP.
"I expected something using cellfun to be faster"
I don't understand where a lot of people get this expectation from. CELLFUN/ARRAYFUN is a scam. It does provide compact code, that's all.
Adam Danz on 24 Aug 2020
Edited: Adam Danz on 24 Aug 2020
"CELLFUN/ARRAYFUN is a scam" 😄
Generally vectorization is faster than loops which initially gave for-loops a bad rep. But speed has generally increased, especially with Matlab's JIT compilation. cellfun, arrayfun, etc all have internal loops anyway. Their main attraction is the reduction of lines of code and, sometimes, improved readability (certainly not always; sometimes they are very difficult to interpret). For simple operations, loops, even nested loops, are often faster.
Though in this case the main slowdown is due to your use of the handle style, instead of the char input to cellfun:
N=100;
X = cell(N,1);for i=1:N,X{i}=cell(1,10);for j=1:10,X{i}{j}=randi(10,5,2);end,end
for i=1:N,for j=1:10,X{i}{j}(X{i}{j}(:,1)<3,:)=[];end,end
[timeit(@()cellfun_handle(X)) %42 microseconds
timeit(@()cellfun_str(X)) % 2.1 microseconds
timeit(@()for_fun(X))] % 1.5 microseconds
function out=cellfun_handle(X)
out=cellfun(@isempty, X);
end
function out=cellfun_str(X)
out=cellfun('isempty', X);
end
function out=for_fun(X)
out=false(size(X));
for n=1:numel(X)
out(n)=isempty(X{n});
end
end
This is the fastest according to my benchmark
for i=1:N
Xi = X{i};
for j=1:10
if isempty(Xi{j})
Xi{j} = [];
end
end
X{i} = Xi;
end
Rik on 24 Aug 2020
Edited: Rik on 25 Aug 2020
If you look at the numbers I posted: I agree. Using a for loop is faster. The thing I pointed out there is that it isn't much faster than cellfun('isempty',X), while cellfun(@isempty,X) is a lot slower.
Adam Danz on 24 Aug 2020
Edited: Adam Danz on 24 Aug 2020
Great point, Rik!
I suppose that extra time is saved by not sorting through overloaded versions of the function. Thanks for that reminder!
@Bruno Luong, good idea adding the condition to check for empties.
@Rik, historically CELLFUN has had a special, speedy implementation for a small number of functions, which is invoked by passing the name as a string 'xx' rather than a handle @xx. 'isempty' is among them.
At some point TMW recommended not using the string form. I would have thought they had moved the special implementation over to the @xx syntax; obviously not. So thanks for reminding us, and TMW should get to work and implement what they left unfinished.
AS on 24 Aug 2020
Edited: AS on 24 Aug 2020
@Bruno Luong, would you mind explaining why defining and then using Xi = X{i}; inside the first loop speeds things up? It's more than twice as fast on my machine.
Well, a very simple explanation:
With X{i}{j} you tell MATLAB to index twice, first with i and then with j.
With Xi{j} there is only one indexing operation, with j, since Xi is already a variable. Inside the for-loop this makes a difference.
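A rough way to see the effect of pulling X{i} out of the inner loop is sketched below. The test data and the names nReps, tNested and tHoisted are made up for this example, and the size of the gap depends on the machine and MATLAB release.

```matlab
% Hypothetical data: every inner array is a 0x2 empty
N = 200;
X = cell(N,1);
for i = 1:N
    X{i} = repmat({zeros(0,2)}, 1, 10);
end

nReps    = 200;
tNested  = zeros(nReps,1);
tHoisted = zeros(nReps,1);
for r = 1:nReps
    A = X;
    tic
    for i = 1:N
        for j = 1:10
            if isempty(A{i}{j}), A{i}{j} = []; end   % two-level indexing every time
        end
    end
    tNested(r) = toc;

    B = X;
    tic
    for i = 1:N
        Bi = B{i};                                   % index the outer cell once
        for j = 1:10
            if isempty(Bi{j}), Bi{j} = []; end       % one-level indexing inside
        end
        B{i} = Bi;                                   % write back once per i
    end
    tHoisted(r) = toc;
end

fprintf('median nested: %.3g s, median hoisted: %.3g s\n', ...
    median(tNested), median(tHoisted));
```

Both variants produce the same result; only the amount of indexing work inside the inner loop differs.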

Release: R2018a
Asked: AS on 24 Aug 2020
Edited: 25 Aug 2020
