How can I extract text from an image while respecting the distance between words?

I am trying to extract text from an image of printed text. Along with the text itself, I also want to detect the spaces between the words. The spacing between words is deliberately inconsistent, and that variation is what I want to detect.
To achieve this, I first had to extract the text lines, which I did using the attached projection profile code (copied from one of the answers by ImageAnalyst). The attached image shows the output of this code.
One approach I considered was counting the white pixels between words: if I know the number of pixels taken up by a single space (say n), I could determine the number of spaces by dividing the white-pixel count between two words by n.
I tried that, but it did not go as planned: the results are inconsistent, even when compared against known ground-truth values. Determining a baseline for every text line is proving difficult, and for a single space between two words I get different pixel counts. This is because counting the white pixels from the letter d to b gives a different result than counting from c to s (the white pixels within the curve of the c are sometimes counted as well).
Any guidance or suggestions would be greatly appreciated.
Thank you

Accepted Answer

Ben Drebing on 18 Dec 2017
Try looking at this answer:
This code can only process 1 line of text at a time. But I was able to get fairly good results on the first line of your data when I set
fontSize = 6;
Ben Drebing on 19 Dec 2017
I think you can modify that Untitled2.m file to concatenate all the lines into one long line, and then you can use our solution on that image. You can create a variable called "allOneLine" that holds all the text in a single line. Maybe something like:
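img = imread('Lines.png');
[rows, columns, dim] = size(img);
if dim > 1
    % Color image: use the green channel as the grayscale image.
    grayImage = img(:, :, 2);
else
    grayImage = img;
end
% Threshold the image so that dark (text) pixels are true.
binaryImage = grayImage < 210;
% Get rid of small areas of 14 pixels or less, then invert so that
% the background is true (white) and the text is false (black).
binaryImage = ~bwareaopen(binaryImage, 15);
% Vertical profile: number of white pixels in each row.
verticalProfile = sum(binaryImage, 2);
% Rows with fewer than 600 white pixels contain text.
% (This threshold depends on the width of the image.)
rowsWithText = verticalProfile < 600;
% Find top and bottom lines
topLines = find(diff(rowsWithText) == 1);
bottomLines = find(diff(rowsWithText) == -1);
% Horizontal concatenation needs equal heights, so pad to the tallest line.
lineHeight = max(bottomLines - topLines) + 1;
allOneLine = [];
for j = 1 : length(topLines)
    topRow = topLines(j);
    bottomRow = bottomLines(j);
    thisLine = binaryImage(topRow:bottomRow, :);
    thisLine(end+1:lineHeight, :) = true; % Pad with white background rows
    allOneLine = [allOneLine, thisLine]; % Here we append the lines
end
imshow(allOneLine); % Show our new image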
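Once the text is in a single line, one possible way to measure the gaps is a column profile: a column counts as blank only if it has no text pixel anywhere in the line's height, so white pixels inside the curve of a letter like c should not be counted. The following is only a rough sketch on a synthetic image, not a tested solution for your data. It assumes text pixels are true (so for allOneLine, where the background is true, you would pass ~allOneLine), and the names singleSpaceWidth and minWordGap and their values are made up for illustration.

```matlab
% Synthetic one-line image: true = text pixel, false = background.
lineImage = false(5, 32);
lineImage(:, 1:2)   = true;   % word 1, first letter
lineImage(:, 4:5)   = true;   % word 1, second letter (1 blank column between letters)
lineImage(:, 9:13)  = true;   % word 2 (3 blank columns after word 1)
lineImage(:, 23:27) = true;   % word 3 (9 blank columns after word 2)

singleSpaceWidth = 3;  % hypothetical: pixel width of one space (your n)
minWordGap = 2;        % hypothetical: narrower gaps are treated as gaps between letters

% A column is blank only if no row contains a text pixel in it.
blankColumns = ~any(lineImage, 1);

% Find runs of consecutive blank columns by padding with zeros and differencing.
edges = diff([0, blankColumns, 0]);
runStarts = find(edges == 1);
runEnds = find(edges == -1) - 1;
runWidths = runEnds - runStarts + 1;

% Discard runs touching the left or right edge (page margins, not gaps).
isInterior = runStarts > 1 & runEnds < size(lineImage, 2);
gapWidths = runWidths(isInterior);

% Keep only the gaps wide enough to separate words.
wordGapWidths = gapWidths(gapWidths >= minWordGap);

% Estimate the number of spaces in each word gap.
spaceCounts = round(wordGapWidths / singleSpaceWidth);

disp(wordGapWidths);
disp(spaceCounts);
```

On real scans you would need to calibrate both values against lines where you already know the spacing, and italic or kerned text can make neighbouring letters overlap in the column profile.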
