- Segment the text regions of the image: for text region detection, you can apply morphological operations, such as dilation and erosion, to the binary image.
- Filter the candidate regions by area, aspect ratio, or any other characteristic that is relevant to your test cases.
- Measure font size: once the text regions have been identified, you can analyze the bounding box dimensions to estimate the font size. You can use the height, width, or diagonal length of the bounding box to approximate the font size in pixels.
How do you know the font size in an image?
Editor's note:
This was the typical spam for a font website and related mobile apps. I'm leaving it since it's been answered, but I've deleted the advertisement and link.
Answers (2)
KALYAN ACHARJYA
on 8 Jun 2023
Edited: KALYAN ACHARJYA
on 9 Jun 2023
One Way (Limitation based on Input Data): Steps
Please note that the segmentation steps (such as the morphological operations) and the text region identification may vary depending on your image and the results you want. You may have to adjust those steps or their parameters accordingly.
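The segment, filter, and measure steps of this answer can be sketched roughly as follows. This is a minimal pure-Python illustration on a tiny hand-made binary image, not a tested pipeline: the helper names (`dilate`, `label_regions`, `text_region_boxes`) and the parameters `radius` and `min_pixels` are hypothetical, and they only stand in for the role that functions such as `imdilate` and `regionprops` would play in MATLAB. Note that the result is the pixel extent of the ink in each region, which is not the same thing as the nominal font size.

```python
from collections import deque

def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out

def label_regions(mask):
    """Label 8-connected foreground blobs (0 = background)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for yy in range(max(0, y - 1), min(h, y + 2)):
                        for xx in range(max(0, x - 1), min(w, x + 2)):
                            if mask[yy][xx] and not labels[yy][xx]:
                                labels[yy][xx] = count
                                queue.append((yy, xx))
    return labels, count

def text_region_boxes(binary, radius=1, min_pixels=3):
    """Return (top, left, height, width) for each text-like region."""
    # Step 1: dilation merges nearby characters into one region.
    labels, count = label_regions(dilate(binary, radius))
    # Collect the ORIGINAL ink pixels per region so the box is not inflated.
    pixels = {k: [] for k in range(1, count + 1)}
    for r, row in enumerate(binary):
        for c, value in enumerate(row):
            if value:
                pixels[labels[r][c]].append((r, c))
    boxes = []
    for pts in pixels.values():
        if len(pts) < min_pixels:  # Step 2: area filter drops specks.
            continue
        rows = [p[0] for p in pts]
        cols = [p[1] for p in pts]
        # Step 3: bounding box dimensions in pixels.
        boxes.append((min(rows), min(cols),
                      max(rows) - min(rows) + 1, max(cols) - min(cols) + 1))
    return boxes

# Two strokes one pixel apart, plus a single noise pixel at the top right.
ROWS = [
    "............",
    "..........#.",
    "..#.........",
    "..#.#.......",
    "..#.#.......",
    "..#.#.......",
    "..#.#.......",
    "............",
    "............",
]
IMAGE = [[1 if ch == "#" else 0 for ch in row] for row in ROWS]
print(text_region_boxes(IMAGE))  # [(2, 2, 5, 3)]
```

With `radius=0` the two strokes are reported as separate regions, which shows how strongly the measured box depends on the morphological parameters.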
DGM
on 8 Jun 2023
Edited: DGM
on 8 Jun 2023
The short answer is that you probably can't -- at least not accurately, consistently, or in the manner that's probably expected. The lack of stated constraints or requirements only makes the problem more complicated and less likely to have a satisfactory solution.
You have text in an image. That could be any number of things.
It could be plain grid-aligned text, but that doesn't mean that it's necessarily going to be easy to segment.
There's nothing saying it's grid-aligned. The cases where one wants to know about the fonts used in a design are also cases where it's likely that the designer did something awful with the text.
There's also nothing saying that the text is a synthetic component of the image. Was this paper printed in 14pt or 16pt?
Or maybe you expect a miracle.
In all cases, you need to be able to:
1. isolate the text from the background
2. orient the text so that the characters have consistent orientation and scale
3. get information about its scale with respect to whatever the desired "size" metric is
It's easy to come up with cases where #1 is difficult.
Orienting the text might be possible, but probably difficult. Would you be able to tell the difference between text that's been shaped by baseline displacement and text that's been shaped by deforming the characters?
If the text has been scaled, would you know? Would you have any reference to indicate what it once was? If you had some clean synthetic text like this, which part of the text is indicative of the "size"? Is it 40pt text that's been scaled up at one end, or is it 56pt text that's been scaled down at one end? Is it 72pt text that's been scaled both ways and then resized afterward? Similarly, if it's a photo, is there any spatial calibration information available?
Would you be able to do any of these things automatically with any degree of reliability?
Let's now assume all our images can be reduced to clean binarized text that's grid-aligned. Now it's worth asking at this point what "size" actually means. What are the units (px, pt, in, mm)? Are we talking about the size of the text within the image space, or the size of the text within some physical space represented in a photo? Are we talking about the nominal size of the font (i.e. the em height), or are we talking about the size of the particular text (i.e. the bounding box of a word)?
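To make the unit question concrete, here is a small sketch of how a height measured in pixels maps to points or millimeters. It assumes the desktop-publishing point (72 pt per inch) and, more importantly, that the resolution `ppi` is known. That value is an assumption supplied by the caller. The function names are hypothetical.

```python
POINTS_PER_INCH = 72.0  # desktop-publishing point
MM_PER_INCH = 25.4

def px_to_pt(px, ppi):
    """Convert a pixel length to points, given pixels per inch."""
    return px * POINTS_PER_INCH / ppi

def px_to_mm(px, ppi):
    """Convert a pixel length to millimeters, given pixels per inch."""
    return px * MM_PER_INCH / ppi

def pt_to_px(pt, ppi):
    """Convert a point size to pixels, given pixels per inch."""
    return pt * ppi / POINTS_PER_INCH

# The same 40 px measurement under three assumed resolutions.
for ppi in (72, 96, 300):
    print(f"{ppi:>3} ppi: 40 px = {px_to_pt(40, ppi):.1f} pt")
```

The same 40 px measurement corresponds to 40 pt, 30 pt, or 9.6 pt depending on the assumed resolution, so a pixel measurement alone cannot answer a question asked in points.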
Let's make another simplifying assumption. Let's say that our text is purely synthetic and all we care about is the em height in pixels in the image space. We don't need any resolution information or anything. We should just be able to measure the height of the characters directly in the image, right? ... right?
The answer is no. The "size" of a font is its body height or em height. This is often approximately the height between descending and ascending features (or approximately 1.4 times the cap height), but that's often not the case, and the relationship between the em height and character features is inconsistent between fonts. In order to get a better estimate, you would need to identify the font, and you would need knowledge of that particular font's characteristics. Can it be done? Sure, but it's not a simple task.
If instead you wanted to settle for a simpler approximation based on ascender-descender distance or cap size, then you would need a large enough sample of characters to even get that information. Would you be able to programmatically determine whether you do have enough characters? Would you be able to tell if a bounding box is defined by cap height or ascender/descender height? Do you know whether the ascenders rise above cap height for the given font? Do you know where the baseline is?
Open up the following image.
These are three fairly mundane fonts of the same size and weight. The height of the yellow rectangle is 1 em. The other four rectangles describe the nominal distance between ascenders and descenders (green), the height of ascenders above the baseline (blue), the cap height (purple), and the x-height (orange). These are sized relative only to the first font sample.
Note that ascender and descender heights may vary within a font, and the relationships between cap height, x-height, and ascender height will vary between fonts. None of these things are equal to em height, and while the differences seem subtle in these three examples, the introduction of a script font will throw all subtlety out the window.
I'm aware that the original question was insincere, and that those motives justified its complete lack of specific details. That said, I'll take one last swing and point out that nobody said we were talking strictly about Latin script.
2 Comments
Walter Roberson
on 9 Jun 2023
A further challenge is that the font size that was used to create the output does not necessarily correspond to sizes in an image.
For example, if you take a picture of your monitor, then it is still meaningful to ask what the font size was on the screen, even though by changing the position of the camera relative to the screen you could clearly change how many pixels any particular letter occupies in the image.
Suppose you have a 'j' that you identify as being 12 pixels high in the image. Does it follow that it was a 12 pixel font? No! It might be a smaller font seen from closer, or a larger font seen from a little further away. It might have been a large font, but the image might have been imresize()'d.
So, how does one determine whether a character of a particular height (in pixels) was originally drawn that height or a lesser size or a greater size? At the very least you would need to figure out which font it was, and then get busy examining fine details. For example a 6 point font would have a smaller ratio of middle hole in the 'g' compared to a larger font, or a lesser font might anti-alias differently.
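The ambiguity described in this comment is simple arithmetic. In the sketch below (hypothetical names, and a deliberately idealized model in which scaling is uniform and the glyph height scales linearly), several different combinations of drawn height and scale factor all produce the same 12 pixel glyph in the image.

```python
def observed_height_px(drawn_height_px, scale):
    """Height of a glyph in the image after uniform scaling."""
    return drawn_height_px * scale

# (drawn height in px, scale factor from resizing or camera distance)
candidates = [(6, 2.0), (12, 1.0), (24, 0.5), (48, 0.25)]

for drawn, scale in candidates:
    print(f"drawn {drawn:>2} px, scale {scale:<4} -> "
          f"{observed_height_px(drawn, scale):.1f} px in the image")
```

Every row prints 12.0 px, so the observed height by itself cannot distinguish between these cases. The scale has to come from somewhere else, such as spatial calibration or knowledge of the rendering.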
DGM
on 10 Jun 2023
I was bored, so I decided to explore the assumptions regarding approximations of em height. I never keep anything other than a handful of default fonts, so I decided to grab a small font pack and do a little analysis. All the fonts are fairly mundane (i.e. no script fonts or novelty/symbol fonts). In total, I processed 30 fonts, considering only A-Z,a-z,0-9.
I'm working on the assumption of naive image processing approaches (perhaps with the assistance of OCR). For the purposes of measuring cap height, I assume that you can find the baseline of the text and that you can identify capital letters. Instead of discerning some nominal ascender (or descender) height, I'm assuming that you'll find that simply from the text extents (the bounding box). I assume that you can identify and ignore any special characters that might cause problems. All this is already quite a challenge, but let's assume it's so.
I said that the em height is approximately the span between ascenders and descenders and that it's roughly 1.4 times the cap height. How close are those suggested factors?
Factors (first column: em height / text extent height; second column: em height / cap height):
- assumed 1.0000 1.4000
- min 0.8464 1.1344
- mean 1.0456 1.4536
- max 1.3234 1.6189
So our assumed factors are a bit off from the mean, but not by a large amount. We knew that the first factor would be an underestimate anyway.
That said, look at the range within this small sample of fonts. How much would that influence our results if we tried to use some assumed value for these factors in our estimation of em height? Let's assume that our factors are [1.05 1.45] respectively. This is the error with respect to the actual em height.
Error in estimated em height (first column: from text extent; second column: from cap height):
- min -20.66% -10.43%
- mean 1.27% 0.25%
- max 24.05% 27.83%
While we would get a good approximation on average, we would routinely get terrible approximations without knowing it. At ~20% error, we can't assume that our estimate can simply be rounded to the nearest common point size. A larger sampling of fonts will only make this error spread worse.
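As a rough check of the error spread, the extremes can be recomputed from the factors reported for this 30-font sample. If the true em height is `true_factor * measured` and we estimate it as `assumed_factor * measured`, the relative error is `(assumed_factor - true_factor) / true_factor`. The function names and the list of "common" point sizes below are illustrative assumptions, not part of the original analysis. The mean error cannot be recovered this way, since it depends on the per-font values.

```python
def em_estimate_error(assumed_factor, true_factor):
    """Relative error of an em-height estimate made with assumed_factor."""
    return (assumed_factor - true_factor) / true_factor

# Assumed factor and the min/max true factors reported in the thread.
REPORTED = {
    "text extent": {"assumed": 1.05, "min": 0.8464, "max": 1.3234},
    "cap height": {"assumed": 1.45, "min": 1.1344, "max": 1.6189},
}

for name, f in REPORTED.items():
    worst_under = em_estimate_error(f["assumed"], f["max"])
    worst_over = em_estimate_error(f["assumed"], f["min"])
    print(f"{name}: {worst_under:+.2%} to {worst_over:+.2%}")

# Hypothetical list of common point sizes, for illustration only.
COMMON_SIZES = (8, 9, 10, 11, 12, 14, 16, 18, 20, 24)

def snap_to_common_size(estimate):
    """Round an estimate to the nearest entry of COMMON_SIZES."""
    return min(COMMON_SIZES, key=lambda size: abs(size - estimate))

# What a true 12 pt font could be reported as at the text-extent extremes.
extent = REPORTED["text extent"]
for true_factor in (extent["max"], extent["min"]):
    estimate = 12 * (1 + em_estimate_error(extent["assumed"], true_factor))
    print(f"estimate {estimate:.2f} pt -> snapped to {snap_to_common_size(estimate)} pt")
```

The recomputed extremes agree with the reported errors to within the rounding of the printed factors. The last loop shows why rounding does not rescue the estimate: at the text-extent extremes, a true 12 pt font would snap to 10 pt or 14 pt rather than 12 pt.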