Posted: April 15th, 2023

Study of Document Layout Analysis Algorithms

Relative Study of Document Layout Analysis Algorithms for Printed Document Images

  • Divya Kamat, Divya Sharma, Parag Chitale, Prateek Dasgupta

 

ABSTRACT

This survey paper studies algorithms that can be used for document layout analysis and compares their results. For the removal of images, Bloomberg’s algorithm and CRLA are described. For text segmentation, we study the Recursive XY Cut, RLSA and RLSO algorithms.

  1. Introduction

Physical layout analysis of printed document images is the first step of OCR conversion. For the OCR to work effectively, its input must contain only text, i.e. no images may be present in the document image. If the images are not removed properly, the OCR returns garbage values. To avoid this, we discuss two algorithms, Bloomberg’s algorithm and CRLA, that can be used to remove images from document images.

The next step is text segmentation, in which we find the text blocks inside the document. The coordinates of these text blocks are then passed as input to the OCR. To perform this segmentation, we discuss the recursive XY cut algorithm and the RLSA and RLSO algorithms.

  2. Removal of Image from Document

The first step in document layout analysis is to remove the images present in the original document. We discuss Bloomberg’s algorithm, along with its variations, and the CRLA algorithm for image removal.

  2.1. Bloomberg’s Algorithm

Bloomberg’s algorithm is primarily used to find the image mask of halftone images. Its implementation uses basic morphological operations. The algorithm has the following steps:

  1. The input image is binarized.
  2. 4×1 threshold reduction is performed twice using threshold T=1.
  3. 4×1 threshold reduction is performed using T=4.
  4. 4×1 threshold reduction is performed using T=3.
  5. The image is opened with a structuring element of size 5×5.
  6. 1×4 expansion of the image is performed twice.
  7. The union of the overlapping components of the seed image obtained from step 6 and the image obtained from step 2 is computed.
  8. The result is dilated with a 3×3 structuring element, and 1×4 expansion is then performed twice.
  9. The halftone mask obtained from step 8 is subtracted from the binarized input image.
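
The threshold reduction used in steps 2 to 4 can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the name `threshold_reduce` and its parameters are hypothetical, pixels are taken to be 1 for black and 0 for white, and the arrangement of the four source pixels is assumed to be a 2×2 block by default because the text does not spell it out (the block shape is left as a parameter).

```python
from typing import List

Image = List[List[int]]  # assumption: 1 = black (ON) pixel, 0 = white (OFF) pixel


def threshold_reduce(img: Image, threshold: int,
                     block_h: int = 2, block_w: int = 2) -> Image:
    """Shrink the image by mapping each block_h x block_w block to one pixel.

    The output pixel is ON when at least `threshold` pixels of the source
    block are ON. The default 2x2 block shape is an assumption.
    """
    rows, cols = len(img) // block_h, len(img[0]) // block_w
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Count the ON pixels inside the current source block.
            on = sum(img[r * block_h + dr][c * block_w + dc]
                     for dr in range(block_h) for dc in range(block_w))
            row.append(1 if on >= threshold else 0)
        out.append(row)
    return out


sample = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
loose = threshold_reduce(sample, threshold=1)   # keeps any block with ink
strict = threshold_reduce(sample, threshold=4)  # keeps only solid blocks
```

A low threshold (T=1) preserves every block that contains any black pixel, while a high threshold (T=4) keeps only fully black blocks, which is why dense halftone regions survive the later reductions and thin text strokes tend not to.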

The main issue with Bloomberg’s algorithm is that it is unable to distinguish between text and sketches (i.e. line drawings) in a printed document image.

  2.2. Enhanced CRLA Algorithm

CRLA stands for Constrained Run Length Algorithm. In this algorithm, horizontal and vertical smoothing are applied to the document image to obtain a clear separation between text and images in the document.

Enhanced CRLA smooths only the text part of the image and avoids smoothing the non-textual part of the document image.

Algorithm:

  1. Label the connected components in the document image.
  2. Classify the components with respect to their heights as follows:
    1. Height less than or equal to 1 cm: label it as 1.
    2. Height between 1 and 3 cm: label it as 2.
    3. Height greater than 3 cm: label it as 3.
  3. Apply horizontal smoothing to the components with label 1 only.
  4. Apply vertical smoothing to the components with label 1 only.
  5. Logically AND the two images obtained previously.
  6. Apply horizontal smoothing to the output image of the AND operation.
  7. Calculate the Mean Black Run Length (MBRL):
    1. Calculate the Black Run Length (BRL) row-wise for the region under consideration.
    2. Maintain a Black-White Transition Count (TC) for the region.
    3. Calculate the mean BRL as MBRL = BRL / TC.
  8. Calculate the Mean Transition Count (MTC):
    1. Maintain a Black-White Transition Count (TC) for the region.
    2. Calculate W, the width of the region.
    3. Calculate the mean TC as MTC = TC / W.
  9. Extract the components with label 1 whose MBRL and MTC values lie in the acceptable range for a typical document image.
  10. Apply horizontal smoothing to the components with label 2 only.
  11. Apply vertical smoothing to the components with label 2 only.
  12. Logically AND the two images obtained previously.
  13. Apply horizontal smoothing to the output image of the AND operation.
  14. Calculate MBRL and MTC.
  15. Extract the components with labels 2 and 3 whose MBRL and MTC values lie in the acceptable range for a typical document image.

At step 9 we extract the text part of the document image and at step 15 we extract the non-text part of the document image.
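
The two region statistics, MBRL = BRL / TC and MTC = TC / W, can be computed along the following lines. This is a minimal sketch under assumptions: the function name `mbrl_mtc` is hypothetical, pixels are 1 for black and 0 for white, and a "black-white transition" is counted once per black run (at the point where a run starts), which the text does not specify.

```python
from typing import List, Tuple

Region = List[List[int]]  # assumption: 1 = black pixel, 0 = white pixel


def mbrl_mtc(region: Region) -> Tuple[float, float]:
    """Return (MBRL, MTC) for a rectangular region scanned row-wise."""
    brl = 0  # total length of all black runs (BRL)
    tc = 0   # transition count (TC): one per black run, by assumption
    for row in region:
        prev = 0
        for px in row:
            if px == 1:
                brl += 1
                if prev == 0:
                    tc += 1  # a new black run starts here
            prev = px
    width = len(region[0])            # W, the width of the region
    mbrl = brl / tc if tc else 0.0    # MBRL = BRL / TC
    mtc = tc / width                  # MTC = TC / W
    return mbrl, mtc


region = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
mbrl, mtc = mbrl_mtc(region)  # BRL = 6, TC = 3, W = 4
```

A component would then be kept or discarded by comparing `mbrl` and `mtc` against an acceptable range; the range itself is not given in the text and would have to be chosen for the documents at hand.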

The main advantage of the CRLA algorithm is that it clearly separates the text and non-text parts of the document image. It works effectively for sketches as well as halftones. Its complexity is also considerably lower, because smoothing is applied selectively.

However, after the removal of the non-textual part of the document image, some stray pixels remain in the image. Connected components of a halftone image whose height is less than 1 cm are treated as text elements by the algorithm, which leaves unwanted components in the final image.

  3. Text Segmentation

The next step in document layout analysis is the segmentation of the text into text blocks that can be provided as input to the OCR. The following algorithms have been studied for this:

  3.1. Recursive XY Cut Algorithm

The recursive XY cut algorithm is used to obtain text blocks from a document image from which the pictures of the original printed document have already been removed. The XY cut algorithm works in the following way:

  1. The bounding boxes of the image are calculated.
  2. The horizontal and vertical projections of the image are calculated.
  3. X cuts are performed at all the valleys in the horizontal projection whose width is greater than the threshold th.
  4. Between these X cuts, Y cuts are performed at all the valleys in the vertical projection whose width is greater than the threshold tv.
  5. Steps 3 and 4 are repeated until no further X or Y cuts are possible in a region.

One of the problems with the XY cut algorithm is that no single threshold works for all documents. Instead, a new threshold needs to be determined for each document, and this cannot be done without manual intervention.

Another major issue with the recursive XY cut algorithm is its time complexity: it takes a long time to complete execution. Despite these disadvantages, the algorithm successfully separates the text blocks, provided that a manually chosen threshold is supplied.

  3.2. RLSA

The Run-Length Smoothing Algorithm (RLSA) works on black-and-white scanned images of documents. It finds runs of white pixels and converts them into black pixels whenever a run is shorter than a given threshold. The RLSA works in four steps:

  1. In the first step, we perform horizontal smoothing. For this, we scan the image row-wise and replace runs of white pixels by black pixels if they are shorter than a threshold th.
  2. In the second step, we perform vertical smoothing. For this, we scan the image column-wise and replace runs of white pixels by black pixels if they are shorter than a threshold tv.
  3. Next, we perform a logical AND of the images obtained from the first and second steps.
  4. Then we perform horizontal smoothing on the image obtained from step 3 with a threshold ta.
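
The four RLSA steps can be sketched as follows. This is a minimal illustration under assumptions, not a reference implementation: the function names are hypothetical, pixels are 1 for black and 0 for white, and only white runs enclosed by black pixels on both sides are filled (how runs touching the image border are treated is not stated in the text).

```python
from typing import List

Image = List[List[int]]  # assumption: 1 = black pixel, 0 = white pixel


def smooth_line(line: List[int], threshold: int) -> List[int]:
    """Fill white runs shorter than `threshold` that lie between black pixels."""
    out = list(line)
    last_black = None
    for i, px in enumerate(line):
        if px == 1:
            if last_black is not None and 0 < i - last_black - 1 < threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def smooth_horizontal(img: Image, threshold: int) -> Image:
    return [smooth_line(row, threshold) for row in img]


def smooth_vertical(img: Image, threshold: int) -> Image:
    # Transpose, smooth each column as a line, then transpose back.
    cols = [smooth_line(list(col), threshold) for col in zip(*img)]
    return [list(row) for row in zip(*cols)]


def rlsa(img: Image, th: int, tv: int, ta: int) -> Image:
    horizontal = smooth_horizontal(img, th)                 # step 1
    vertical = smooth_vertical(img, tv)                     # step 2
    combined = [[a & b for a, b in zip(ra, rb)]             # step 3: logical AND
                for ra, rb in zip(horizontal, vertical)]
    return smooth_horizontal(combined, ta)                  # step 4


image = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
result = rlsa(image, th=2, tv=2, ta=2)
```

In this toy image the AND in step 3 discards every pixel that was filled by only one of the two smoothing passes, and the final horizontal pass then reconnects the pixels within each row.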

  3.3. RLSO

A simplified version of the RLSA, RLSO (Run-Length Smoothing with OR) works as follows:

  1. In the first step, we perform horizontal smoothing. For this, we scan the image row-wise and replace runs of white pixels by black pixels if they are shorter than a threshold th.
  2. In the second step, we perform vertical smoothing. For this, we scan the image column-wise and replace runs of white pixels by black pixels if they are shorter than a threshold tv.
  3. Next, we perform a logical OR of the images obtained from the first and second steps.
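
A compact sketch of these three steps is given below. It is an illustration only: the names `smooth_line` and `rlso` are hypothetical, pixels are assumed to be 1 for black and 0 for white, and, as an assumption not stated in the text, only white runs bounded by black pixels on both sides are filled.

```python
from typing import List

Image = List[List[int]]  # assumption: 1 = black pixel, 0 = white pixel


def smooth_line(line: List[int], threshold: int) -> List[int]:
    """Fill white runs shorter than `threshold` that lie between black pixels."""
    out = list(line)
    last_black = None
    for i, px in enumerate(line):
        if px == 1:
            if last_black is not None and 0 < i - last_black - 1 < threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def rlso(img: Image, th: int, tv: int) -> Image:
    # Step 1: horizontal smoothing, row by row.
    horizontal = [smooth_line(row, th) for row in img]
    # Step 2: vertical smoothing, column by column (via transposition).
    cols = [smooth_line(list(col), tv) for col in zip(*img)]
    vertical = [list(row) for row in zip(*cols)]
    # Step 3: logical OR of the two smoothed images.
    return [[a | b for a, b in zip(ra, rb)]
            for ra, rb in zip(horizontal, vertical)]


image = [
    [1, 0, 1],
    [0, 0, 0],
    [1, 0, 1],
]
merged = rlso(image, th=2, tv=2)
```

Because the OR keeps every pixel filled by either pass, no final smoothing pass is needed, and the merged regions are not forced into rectangular frames.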

The RLSA algorithm returns rectangular frames and therefore suits documents with Manhattan layouts. The RLSO algorithm, on the other hand, also works well with non-Manhattan layouts. The problem with both RLSA and RLSO is that the smoothing thresholds need to be determined manually. Moreover, the thresholds differ for each document image, which makes manual determination impractical.

  4. Conclusion

We have compared the algorithms described above for document layout analysis. During our research we found that, while Bloomberg’s algorithm faces problems with images that contain sketches, CRLA faces problems with images that contain extremely small non-textual elements.

We also observed that neither the recursive XY Cut algorithm nor RLSA works on printed documents with non-Manhattan layouts. The RLSO algorithm, on the other hand, gives comparatively better results for Manhattan as well as non-Manhattan layouts. However, all three algorithms share the problem that their thresholds are document-specific and have to be determined manually.

  5. References

  1. Syed Saqib Bukhari, Faisal Shafait and Thomas M. Breuel, “Improved Document Image Segmentation Algorithm using Multiresolution Morphology”.
  2. Jaekyu Ha, Robert M. Haralick and Ihsin T. Phillips, “Recursive XY Cut using Bounding Boxes of Connected Components”, Third International Conference on Document Analysis and Recognition (ICDAR), 1995.
  3. Stefano Ferilli, Teresa M.A. Basile and Floriana Esposito, “A histogram-based Technique for Automatic Threshold Assessment in a Run Length Smoothing-based Algorithm”, ACM, 2010.
  4. Hung-Ming Sun, “Enhanced Constrained Run-Length Algorithm for Complex Layout Document Processing”, International Journal of Applied Science and Engineering, 2006.
