Snap a photo of a handwritten Hindu-Arabic equation. Get the answer. A web and mobile system that recognizes the characters, reconstructs the equation's structure, and solves it end to end.
The Problem
Optical character recognition for printed Latin text is mature and well served by existing tools. Handwritten Hindu-Arabic mathematical equations are not. The script reads right to left, the digits are Hindu-Arabic (٠١٢٣٤٥٦٧٨٩), and the spatial conventions for powers and division differ from Latin notation. Existing solvers expect Latin left-to-right input.
No pipeline took a photo of a handwritten Hindu-Arabic equation and returned a solved answer. So I built one, end to end: ingestion, preprocessing, segmentation, classification, structural parsing, and symbolic solving.
How it works
Preprocessing follows one of two strategies depending on the source. Web uploads with a transparent background are composited onto white. Mobile photos run through Non-Local Means denoising, grayscale conversion, and adaptive thresholding to produce a clean binary image ready for segmentation.
A two-pass connected-components labeling algorithm walks the binary image and groups touching pixels into individual character regions. Each region is cropped and inverted into its own image, ready for the classifier.
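The two-pass approach can be sketched in plain NumPy with a union-find over provisional labels; `two_pass_label` and the choice of 4-connectivity are assumptions for illustration:

```python
import numpy as np

def two_pass_label(binary):
    """Two-pass connected-components labeling, 4-connectivity.
    binary: 2-D array, nonzero pixels are foreground."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    parent = {}  # union-find over provisional labels

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    next_label = 1
    # Pass 1: assign provisional labels, record equivalences
    for y in range(h):
        for x in range(w):
            if not binary[y, x]:
                continue
            up = labels[y - 1, x] if y else 0
            left = labels[y, x - 1] if x else 0
            neighbors = [n for n in (up, left) if n]
            if not neighbors:
                labels[y, x] = next_label
                parent[next_label] = next_label
                next_label += 1
            else:
                m = min(neighbors)
                labels[y, x] = m
                for n in neighbors:
                    ra, rb = find(m), find(n)
                    if ra != rb:
                        parent[max(ra, rb)] = min(ra, rb)
    # Pass 2: resolve equivalences into compact final labels
    remap = {}
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                root = find(labels[y, x])
                labels[y, x] = remap.setdefault(root, len(remap) + 1)
    return labels, len(remap)
```

Each final label then yields a bounding box, which is cropped and inverted into a per-character image.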
Each character image is fed through a convolutional neural network that outputs one of fourteen classes: the digits 0-9, the operators +, -, and ÷, and the letter س (the Arabic equivalent of x). The architecture is adapted from a public model that scores above 99% on MNIST, scaled up for 45×45 input and the larger class set.
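A minimal Keras sketch of such a network is below; the layer widths and dropout rate are illustrative assumptions, not the trained model's actual configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(num_classes=14, side=45):
    """MNIST-style CNN scaled to 45x45 single-channel input
    and a 14-way softmax output."""
    model = models.Sequential([
        layers.Input(shape=(side, side, 1)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```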
Recognizing characters is not enough. Equations have spatial structure: the three pieces of a division symbol, an exponent floating above its base. A 'ratio logic' pass uses bounding-box geometry between recognized characters to identify ÷, =, and powers, grouping their pieces into single semantic symbols.
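A toy version of one such geometric check, exponent detection, is shown below; the function name and the ratio thresholds are hypothetical, chosen only to illustrate the bounding-box reasoning:

```python
def is_exponent(base, cand):
    """Heuristic sketch. base and cand are (x, y, w, h) boxes,
    origin at the top-left of the image."""
    bx, by, bw, bh = base
    cx, cy, cw, ch = cand
    # An exponent is noticeably smaller than its base...
    smaller = ch < 0.8 * bh
    # ...its vertical centre sits above the base's midline...
    raised = (cy + ch / 2) < (by + 0.4 * bh)
    # ...and, in right-to-left notation, it sits to the base's upper-LEFT
    left_of = (cx + cw) <= bx + 0.3 * bw
    return smaller and raised and left_of
```

Analogous ratio checks group the bar and two dots of ÷, and the two parallel strokes of =, into single semantic symbols.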
Hindu-Arabic equations read right-to-left and place exponents on the upper-left of their base, but SymPy expects standard left-to-right Latin notation. The pipeline reorders and mirrors the symbol stream into a SymPy-parseable form, computes the answer, and returns both the original Hindu-Arabic equation and the result to the client.
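The reordering step can be sketched as follows, assuming the ratio-logic pass has already merged composite symbols like =; `solve_equation` and the glyph table are illustrative, not the production code:

```python
import sympy as sp

# Map Eastern Arabic glyphs to the ASCII forms SymPy parses
EASTERN = {"٠": "0", "١": "1", "٢": "2", "٣": "3", "٤": "4",
           "٥": "5", "٦": "6", "٧": "7", "٨": "8", "٩": "9",
           "س": "x", "+": "+", "-": "-", "÷": "/", "=": "="}

def solve_equation(boxes):
    """boxes: list of (x_coordinate, glyph) from segmentation.
    Sorting by x descending recovers the right-to-left reading
    order, which is the logical token order SymPy expects."""
    tokens = [g for _, g in sorted(boxes, key=lambda b: -b[0])]
    expr = "".join(EASTERN[t] for t in tokens)
    lhs, rhs = expr.split("=")
    x = sp.symbols("x")
    return sp.solve(sp.Eq(sp.sympify(lhs), sp.sympify(rhs)), x)
```

For example, the equation س + ٥ = ٩ segmented as boxes at descending x-positions reduces to `x+5=9` and solves to 4.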
Data
Three Kaggle datasets covered the basic shapes: Hindu-Arabic numerals (70,000 images), isolated Arabic letters (16,800 images), and math operators (around 100,000 per symbol). But each came in a different size, color depth, line thickness, and rotation convention, and none of them looked like handwriting from the people who would actually use this app.
To fix that, we collected handwriting from over 300 students. Each contributed digits, the letter س, and operators on standard paper. The pages were scanned, segmented, normalized, and merged with the public datasets.
Every sample was normalized to a 45×45 binary image with a black background and medium-thickness strokes. Smaller datasets were upsampled with PIL bicubic resizing; thicker numerals were thinned with OpenCV erode.
Math symbols were dilated and randomly cropped at the edges to simulate variation and reduce overfitting. The final dataset was serialized as a single CSV of pixel values for fast load and minimal memory footprint.
Result
A working full-stack system: photo in, structured Hindu-Arabic equation and solved answer out. Built with Flask on AWS, served to both a web client and a mobile client.