How to detect text regions in a document image?

I have a document image, such as a scanned newspaper or magazine page. I want to remove all or most of the text and keep the images in the document. Does anyone know how to detect the text regions in such a document? Below is an example. Thanks in advance!

Answers (2)

Try using the ocr function in the Computer Vision System Toolbox.
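
As a rough illustration of this suggestion, here is a minimal sketch that paints over the word boxes ocr reports and leaves the rest of the page alone. The file names, the 0.5 confidence cut-off, and the blankBoxes helper are assumptions made for the example, and it assumes a uint8 image. ocr also accepts a 'Language' name-value pair for non-English text.

```matlab
% Sketch only: needs the Computer Vision System Toolbox for ocr.
I = imread('newspaper.png');                 % placeholder file name (assumption)
results = ocr(I);                            % add 'Language', ... for other languages
keep = results.WordConfidences > 0.5;        % 0.5 is an arbitrary cut-off (assumption)
boxes = results.WordBoundingBoxes(keep, :);  % one [x y width height] row per word
cleaned = blankBoxes(I, boxes);
imwrite(cleaned, 'cleaned.png');             % placeholder output name (assumption)

function out = blankBoxes(img, boxes)
% Hypothetical helper: paint each [x y width height] box white (uint8 image assumed).
out = img;
[h, w, ~] = size(img);
boxes = round(boxes);
for k = 1:size(boxes, 1)
    c1 = max(1, boxes(k, 1));                    % clamp the box to the image
    r1 = max(1, boxes(k, 2));
    c2 = min(w, boxes(k, 1) + boxes(k, 3) - 1);
    r2 = min(h, boxes(k, 2) + boxes(k, 4) - 1);
    out(r1:r2, c1:c2, :) = 255;                  % white in every channel
end
end
```

ocr may also report spurious "words" inside photographs, which is why the sketch drops low-confidence detections; the cut-off would need tuning on real pages.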

2 Comments

It needs the Computer Vision System Toolbox, and it only supports English. I want to detect text regions in languages other than English.
Any other ideas? Thanks!
Actually, ocr supports many languages. But it does require the Computer Vision System Toolbox.

If you don't want to use the Computer Vision System Toolbox, see this bibliography: http://www.visionbib.com/bibliography/contentschar.html#OCR,%20Document%20Analysis%20and%20Character%20Recognition%20Systems. It lists many papers with algorithms that can handle text in many languages. You'd have to write the code for those papers yourself - we don't have code for any of them.
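
To give a feel for what writing such code involves, here is a minimal sketch of run-length smoothing, a classic idea from the document-analysis literature. It is not taken from this thread or from any specific paper in the list above. It uses base MATLAB only and assumes the page has already been binarized into a logical matrix where true means ink; rlsaRows and maxGap are hypothetical names chosen for the example.

```matlab
function mask = rlsaRows(bw, maxGap)
% Run-length smoothing along rows (sketch).
% bw     : logical matrix, true = ink (assumed already binarized)
% maxGap : longest background run, in pixels, that gets filled (assumption: tune per scan)
mask = bw;
for r = 1:size(bw, 1)
    cols = find(bw(r, :));                  % columns of the ink pixels in this row
    for k = 1:numel(cols) - 1
        gap = cols(k + 1) - cols(k) - 1;    % background pixels between two ink pixels
        if gap <= maxGap
            mask(r, cols(k):cols(k + 1)) = true;   % bridge the short gap
        end
    end
end
end
```

Calling rlsaRows(bw, 30) merges characters into line-shaped blobs, and rlsaRows(bw.', 30).' does the same along columns; combining the two results with & gives block-shaped regions. Deciding which blocks are text and which are pictures still needs extra rules, for example on block height or ink density, and the gap of 30 pixels is only a placeholder.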

1 Comment

Thanks! I will take a look at it. Are there any other, simpler methods to do this? I don't need to know the content of the text; I only need to know the location of the text regions. :-)

Asked: 14 Nov 2014

Commented: 17 Nov 2014