extract numbers from an image
Howdy,
I would like to extract the numbers inside the squares. How do you recommend I do it?

5 Comments
Walter Roberson
on 14 Jan 2025
You could try ocr(), but I predict that it is going to be difficult. Maybe start with creating a binary image
BW = all(TheImage == 0, 3);
and then ocr() that.
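To make that suggestion concrete, here is a minimal sketch (it assumes you have Computer Vision Toolbox for ocr(); "TheImage" is a placeholder for the original RGB image):

```matlab
% sketch only: build a mask of pure-black pixels and try OCR on it,
% restricting the character set to digits and the decimal point
TheImage = imread('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1823310/image.png');
BW = all(TheImage == 0, 3);   % true only where all three channels are exactly 0
results = ocr(BW, 'CharacterSet', '0123456789.');
results.Words                 % expect this to be sparse and unreliable
```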
This is about as good as I can manage.
inpict = imread('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1823310/image.png');
mx = max(inpict,[],3);
mn = min(inpict,[],3);
C = mx - mn;
Cm = imclose(C,ones(11));
outpict = Cm - C;
outpict = imadjust(outpict,[0 0.5],[0 1],0.2);
mk = all(inpict < 20,3);
mk = bwareaopen(mk,100);
mk = imdilate(mk,ones(3));
outpict = outpict - im2uint8(mk);
imshow(outpict)
The text is microscopic and poorly separated from the background. Binarization takes "barely readable" and turns it into "thoroughly unreadable".
As much as I don't want to say it, the expected error might actually be smaller if the color itself were used as a proxy for the numeric values, instead of gambling on OCR not to get things wildly wrong. That said, there's no legend or colormap to invert, and building one programmatically would require reading all the numbers anyway.
Stephen23
on 15 Jan 2025
How many such images do you have to convert? If there are only a handful then doing this by hand is an option.
On further inspection, it does appear that there are some limitations on the resolution of the representation -- and I don't just mean that the numbers are tiny. There are multiple locations where identical numbers are represented with different colors, but there are also cases where there are identical colors used for slightly different numbers. The numbers are rounded (or truncated) to three digits, but the colors are also quantized, so there's error in both representations. Which one is less wrong?
This is why I try to emphasize that graphs are low-fidelity visualizations of data and should not be treated as stores of data unless there is no alternative.
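Back-of-envelope numbers for the two quantization errors (a rough sketch; the range endpoints 0.83 and 1.21 are the anchor values used in the code below, and the label step assumes values printed to two decimals):

```matlab
% rough comparison: printed-label quantization vs colormap quantization
labelStep = 0.01;                      % labels like 0.97, 0.98 -- two decimals
mapStep   = (1.21 - 0.83)/(256 - 1);   % 256-entry interpolated colormap
fprintf('label step: %.4g, colormap step: %.4g\n', labelStep, mapStep)
% worst-case error is half a step either way, so the colors are nominally
% finer -- but antialiasing and palette reuse eat into that margin
```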
inpict = imread('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1823310/image.png');
% try to isolate text
mx = max(inpict,[],3);
mn = min(inpict,[],3);
C = mx - mn;
Cm = imclose(C,ones(11));
tpict = Cm - C;
tpict = imadjust(tpict,[0 0.5],[0 1],0.2);
% try to clean up the text
mk = all(inpict < 20,3);
mk = bwareaopen(mk,100);
tmk = imdilate(mk,ones(3));
tpict = max(im2double(tpict) - tmk,0);
% try to clean up the color
cpict = imclose(inpict,ones(11));
mx = max(cpict,[],3);
mn = min(cpict,[],3);
cmk = (mx - mn) > 50;
cmk = cmk & ~mk;
% yellow maps to both 0.98 and 0.99
% identical numbers map to different colors
% identical colors map to different numbers
CT0 = [0 0.4392 0.7529; 1 0.9961 0; 1 0 0];
x0 = [0.83; 0.985; 1.21];
% try to remap color back to data units
maplen = 256;
n0 = linspace(0,1,numel(x0));
n = linspace(0,1,maplen);
x = interp1(n0,x0,n,'linear');
CT = interp1(n0,CT0,n,'linear');
% apply the CT, interpolate
X = mat2gray(rgb2ind(cpict,CT,'nodither'));
X = interp1(n,x,X);
X(~cmk) = NaN; % clean up
% overlay some text purely for visualization
% disable this for a clean data image
X = X.*(1-im2double(tpict));
% visualize
% use the data cursor tool to observe the correspondence
% between the image values and the text labels
hi = imshow(X,[]);
% slap down a datatip for the forum demo
datatip(hi,411,332); % 0.9669 vs 0.97
You could try OCR, but I don't think the OCR tools are really meant to work on microscopic text. The text's stroke is no more than 1px wide, but the strokes are never grid-aligned. As a result, there's nothing left of the text but antialiasing. That's why it's difficult to isolate from the background. Out of the entire image, there are only 14 pixels left which are strictly black text and not partially blended with the background color. Upsampling and filtering might help, but I'm not sure. I don't have CVT (Computer Vision Toolbox), so I'm not going to mess with that.
% ... or you can try to make OCR work, but good luck.
% in this case, ocr() can't even _find_ the text, let alone read it.
OC = ocr(tpict,'characterset','0123456789.');
OC.Words % 352 numbers in, 52 random bits of gibberish out
% even if we tell it where to look, the result is still useless
wordmk = tpict>0.5;
wordmk = imdilate(wordmk,ones(5));
S = regionprops(wordmk,'boundingbox');
OC = ocr(tpict,vertcat(S.BoundingBox),'characterset','0123456789.');
vertcat(OC.Words) % different garbage
This is better. It's still unusable garbage, but it's improved garbage.
% the original image
inpict = imread('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1823310/image.png');
% isolate the low-res text
% this is easier to do at one scale
% than to try to make this filtering
% and cleanup independent of scale
mx = max(inpict,[],3);
mn = min(inpict,[],3);
C = mx - mn;
Cm = imclose(C,ones(11));
tpict = Cm - C;
tpict = imadjust(tpict,[0 0.5],[0 1],0.2);
% clean it up as before
mk = all(inpict < 20,3);
mk = bwareaopen(mk,100);
tmk = imdilate(mk,ones(3));
tpict = max(im2double(tpict) - tmk,0);
% upscale everything after the fact
k = 2;
inpict = imresize(inpict,k);
wordmk = imdilate(tpict>0.5,ones(5));
wordmk = imresize(wordmk,k,'nearest');
S = regionprops(wordmk,'boundingbox');
% try to apply OCR to the upscaled original image
% using bbox info from low-res copy
OC = ocr(inpict,vertcat(S.BoundingBox),'characterset','0123456789.');
vertcat(OC.Words) % improved garbage
Other than the missing decimal points, I haven't checked how many of these are wrong, since I can only run this on the forum.
At least the color processing approach yields a value for each cell. I still haven't managed to get OCR to do that.
Answers (0)