Document Scanning and Processing System

The goal of this project was to develop an automated system for scanning and digitizing documents from raw smartphone images. The system aims to bridge the gap between a casual photograph and a high-quality digital scan by addressing common real-world challenges:

  • Perspective Distortions: Correcting the angled view of a document.
  • Uneven Lighting: Managing shadows and varying brightness.
  • Background Noise: Distinguishing the document from its surroundings.

Methodology: Two Core Approaches

  • The Naive Approach: Relies on edge detection and linear geometry. It involves pre-processing images in various colour spaces (RGB, HSV, Grayscale), applying Canny Edge Detection, and using the Hough Transform to find lines and intersections.
  • The Segmentation Approach: A topographic method (the idea behind watershed segmentation) that treats the image as a 3D surface where pixel intensity represents elevation. By “flooding” the image from background seeds at the edges, the algorithm creates “dams” that form the document’s boundaries.
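
The intersection step of the naive approach can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each detected line is given in the (rho, theta) normal form that Hough-based line detectors commonly return, where a point (x, y) lies on the line when x·cos(theta) + y·sin(theta) = rho. The function name `hough_intersection` and the `eps` tolerance are hypothetical.

```python
import math


def hough_intersection(line_a, line_b, eps=1e-9):
    """Intersect two lines given in (rho, theta) normal form.

    Each line satisfies x*cos(theta) + y*sin(theta) = rho.
    Returns (x, y), or None when the lines are (nearly) parallel.
    """
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    a1, b1 = math.cos(theta1), math.sin(theta1)
    a2, b2 = math.cos(theta2), math.sin(theta2)

    # Determinant of the 2x2 system; zero means the lines are parallel.
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None

    # Cramer's rule for the two unknowns x and y.
    x = (rho1 * b2 - rho2 * b1) / det
    y = (a1 * rho2 - a2 * rho1) / det
    return x, y


# Usage: a vertical line x = 5 and a horizontal line y = 3.
corner = hough_intersection((5.0, 0.0), (3.0, math.pi / 2))
```

Intersections that fall far outside the image bounds would normally be discarded before any corner selection, since they cannot be document corners.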

The Processing Pipeline

  1. Pre-processing & Filtering: Utilizing the HSV colour space (specifically the Saturation channel) and Bilateral Filtering to preserve edges while reducing noise.
  2. Edge & Corner Detection: Identifying the four most likely corners of the document.
  3. Perspective Transformation: Using a transformation matrix to rectify the image into a flat, top-down “bird’s-eye” view.
  4. Enhancement: Applying the Otsu Thresholding algorithm to create a high-contrast binary image for better readability.
  5. OCR: Using the EasyOCR engine to extract and digitize the textual content.
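
To make the enhancement step concrete, here is a small pure-Python sketch of Otsu's method. It is an illustration under stated assumptions rather than the project's implementation: the input is assumed to be a flat list of 8-bit grayscale values, and the names `otsu_threshold` and `binarize` are hypothetical. The method picks the threshold that maximizes the between-class variance of the two resulting pixel groups.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 0..255 integers."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of intensities of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue        # no pixels in the dark class yet
        w1 = total - w0
        if w1 == 0:
            break           # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]


# Usage: dark "ink" pixels and bright "paper" pixels.
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
binary = binarize(sample, t)
```

A single global threshold like this assumes reasonably even illumination, which is one reason the earlier filtering and rectification steps matter.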

Key Findings & Challenges

  • The Segmentation Bottleneck: Robustly isolating the page from the background remains the most difficult phase. No single algorithm worked perfectly for every image in the dataset.
  • Computational Complexity: The geometric approach scales poorly as the number of detected lines grows: n lines can produce up to n(n-1)/2 intersections, and every group of four intersections is a potential set of corners.
  • The Importance of Pre-processing: The success of the final OCR depends heavily on the quality of the initial noise reduction and colour space selection.
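
The complexity concern can be illustrated with a quick count. This is a worst-case sketch, assuming no two lines are parallel and every intersection is kept as a corner candidate; the function name `candidate_counts` is hypothetical.

```python
from math import comb


def candidate_counts(n_lines):
    """Worst-case number of intersections and four-corner candidates."""
    intersections = comb(n_lines, 2)   # every pair of lines may cross once
    quads = comb(intersections, 4)     # every choice of four corner points
    return intersections, quads


# Usage: growth in candidates as more lines are detected.
for n in (5, 10, 20):
    print(n, candidate_counts(n))
```

In practice, filtering lines before computing intersections (for example, by merging near-duplicates) keeps these counts manageable.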