Back to work
2018
Computer Vision

Hindu-Arabic Handwritten Math OCR

Snap a photo of a handwritten Hindu-Arabic equation. Get the answer. A web and mobile system that recognizes the characters, reconstructs the equation's structure, and solves it end to end.

TensorFlowCNNOpenCVPILSymPyFlaskAWS
99%
Validation accuracy
14
Character classes
300+
Students sampled
100k+
Training images

The Problem

Off-the-shelf OCR breaks here.

Optical character recognition for printed Latin text is mature and well served by existing tools. Handwritten Hindu-Arabic mathematical equations are not. The script reads right to left, the digits are Hindu-Arabic (٠١٢٣٤٥٦٧٨٩), and the spatial conventions for powers and division differ from Latin notation. Existing solvers expect Latin left-to-right input.

No pipeline took a photo of a handwritten Hindu-Arabic equation and returned a solved answer. So I built one, end to end: ingestion, preprocessing, segmentation, classification, structural parsing, and symbolic solving.

How it works

A five-stage pipeline.

01

Preprocessing

Two strategies based on the source. Web uploads with a transparent background get composited onto white. Mobile photos run through Non-Local Means denoising, grayscale conversion, and adaptive thresholding to produce a clean binary image ready for segmentation.

02

Segmentation

A two-pass connected-components labeling algorithm walks the binary image and groups touching pixels into individual character regions. Each region is cropped and inverted into its own image, ready for the classifier.

03

Character Classification

Each character image is fed through a Convolutional Neural Network that outputs one of fourteen classes: digits 0-9, +, -, ÷, and the letter س (Arabic for x). The architecture is adapted from a public >99%-on-MNIST model, scaled up for 45×45 input and the larger class set.

04

Structure Recognition

Recognizing characters is not enough. Equations have spatial structure: the three pieces of a division symbol, an exponent floating above its base. A 'ratio logic' pass uses bounding-box geometry between recognized characters to identify ÷, =, and powers, grouping their pieces into single semantic symbols.

05

Translation and Solve

Hindu-Arabic equations read right-to-left and place exponents on the upper-left of their base, but SymPy expects standard left-to-right Latin notation. The pipeline reorders and mirrors the symbol stream into a SymPy-parseable form, computes the answer, and returns both the original Hindu-Arabic equation and the result to the client.

Data

The model is only as good as the data.

Three Kaggle datasets covered the basic shapes: Hindu-Arabic numerals (70,000 images), isolated Arabic letters (16,800 images), and math operators (around 100,000 per symbol). But each came in a different size, color depth, line thickness, and rotation convention, and none of them looked like handwriting from the people who would actually use this app.

To fix that, we collected handwriting from over 300 students. Each contributed digits, the letter س, and operators on standard paper. The pages were scanned, segmented, normalized, and merged with the public datasets.

Unified to one specification

Every sample was normalized to a 45×45 binary image with a black background and medium-thickness strokes. Smaller datasets were upsampled with PIL bicubic resizing; thicker numerals were thinned with OpenCV erode.

Augmented for diversity

Math symbols were dilated and randomly cropped at the edges to simulate variation and reduce overfitting. The final dataset was serialized as a single CSV of pixel values for fast load and minimal memory footprint.

Result

99% validation accuracy.

A working full-stack system: photo in, structured Hindu-Arabic equation and solved answer out. Built with Flask on AWS, served to both a web client and a mobile client.