Credit Card Optical Character Recognition with OpenCV + Tesseract

Sep 7, 2025 · 6 min read

Link: https://github.com/philippe-heitzmann/Credit_Card_Reader_OpenCV_Tesseract

Introduction

Accurately extracting a card’s 16-digit number (PAN) and cardholder name from images is foundational for building scalable card databases, improving fraud controls, and enabling smoother checkout. The goal here is a ≤ 0.5s per image pipeline using OpenCV and Pytesseract.

Because large, public datasets with both PAN and name are scarce, we tested on a 23-image set collected from Google Images and hand-annotated. On this set, the pipeline reached 48% recall on the 16-digit number and 65% recall on the cardholder name. (Full dataset and code are in the repo.)

Figure 1. Sample dummy credit card images from the dataset

Sample card 1

Sample card 2

Methodology

We test two out-of-the-box components:

i) Digit recognition

Template matching creates a reference image for each digit (0–9) and slides it across candidate regions, scoring similarity—analogous to a fixed kernel in a CNN.

Figure 2. Sample base credit card image

Base card image

We first build ten digit templates from an OCR reference sheet by finding contours and cropping each digit ROI into a dictionary.

Figure 3. OCR reference used to build digit templates

OCR reference

To isolate candidate digit regions on the card, we apply the following steps (sketched in code after the list):

  1. Apply a tophat transform to highlight light digits against darker backgrounds.
    Figure 4.

    Tophat

  2. Run a Sobel operator to emphasize edges.

  3. Convert to binary with Otsu thresholding.
    Figure 5.

    Otsu
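Taken together, a minimal sketch of these three preprocessing steps might look like the following. It assumes the card image has already been loaded and resized to the standardized 190-pixel height; the rectangular kernel size is an assumption, not a value from the repo.

import cv2
import imutils
import numpy as np

def preprocess_card(gray):
    # rectangular kernel size is an assumed value; tune it to your card resolution
    rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 3))
    # 1. tophat transform: keeps light digits against the darker background
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, rect_kernel)
    # 2. Sobel operator along x to emphasize vertical digit edges
    grad_x = cv2.Sobel(tophat, ddepth=cv2.CV_32F, dx=1, dy=0, ksize=-1)
    grad_x = np.absolute(grad_x)
    (min_val, max_val) = (grad_x.min(), grad_x.max())
    grad_x = (255 * ((grad_x - min_val) / (max_val - min_val))).astype("uint8")
    # 3. Otsu thresholding: let OpenCV pick the binarization threshold
    thresh = cv2.threshold(grad_x, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
    return thresh

card = cv2.imread('sample_card.png')        # hypothetical path
card = imutils.resize(card, height=190)     # standardized 190-pixel height
gray = cv2.cvtColor(card, cv2.COLOR_BGR2GRAY)
binary = preprocess_card(gray)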

Code: build digit template map

import cv2
import imutils
import numpy as np
from imutils import contours

def read_ocr(ocr_path):
    ref = cv2.imread(ocr_path)
    ref = cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY)
    ref = cv2.threshold(ref, 10, 255, cv2.THRESH_BINARY_INV)[1]
    refCnts = cv2.findContours(ref.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    refCnts = imutils.grab_contours(refCnts)
    refCnts = contours.sort_contours(refCnts, method="left-to-right")[0]
    return ref, refCnts

def get_digits(ref, refCnts):
    digits = {}
    for (i, c) in enumerate(refCnts):
        (x, y, w, h) = cv2.boundingRect(c)
        roi = ref[y:y + h, x:x + w]
        roi = cv2.resize(roi, (57, 88))
        digits[i] = roi
    return digits

ocr_path = 'ocr_a_reference.png'
ref, refCnts = read_ocr(ocr_path)
digits = get_digits(ref, refCnts)

With the Otsu binarization applied, the next step is to extract the contours of the light regions in the image using OpenCV’s findContours() and imutils’ grab_contours() functions. We then compute a rectangular bounding box for each contour with OpenCV’s boundingRect() and filter the boxes in three passes:

  1. Keep only boxes whose y-coordinates fall in the 85–145 pixel range, since this band is empirically observed to contain all 16 card digits once images are standardized to a maximum height of 190 pixels.

  2. Keep only boxes with an aspect ratio (width / height) in the [2.5, 4.0] range, since each of the four 4-digit groupings making up the 16-digit number is observed to fall in this range.

  3. Keep only boxes whose width lies in the [40, 55] pixel range and whose height lies in the [10, 20] pixel range, as these ranges are observed to contain all 4-digit bounding boxes in our dataset.

A final view of the four extracted bounding boxes for our image is shown in Figure 6, with a code sketch of this filtering after the figure.

Digit bounding boxes
Digits identified

Figure 6. Extracted 4-digit bounding boxes sample view
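As a hedged sketch (not the repo’s exact code), this filtering step could be written as follows, using the binarized image from the earlier sketch and the pixel thresholds quoted above:

def find_digit_groups(binary):
    # find external contours of the light regions in the binarized card image
    cnts = cv2.findContours(binary.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    locs = []
    for c in cnts:
        (x, y, w, h) = cv2.boundingRect(c)
        aspect_ratio = w / float(h)
        # keep boxes in the digit band of a 190-pixel-tall card image
        if not (85 <= y <= 145):
            continue
        # each 4-digit grouping has an aspect ratio in [2.5, 4.0]
        if not (2.5 <= aspect_ratio <= 4.0):
            continue
        # and a width in [40, 55] px and a height in [10, 20] px
        if 40 <= w <= 55 and 10 <= h <= 20:
            locs.append((x, y, w, h))
    # sort the groupings left to right so the digits are read in order
    return sorted(locs, key=lambda box: box[0])

groups = find_digit_groups(binary)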

Once these 4-digit bounding box regions are extracted, we iteratively score how closely each pixel neighborhood in the extracted 4-digit sub-image matches the 0–9 digit templates built from Figure 3, using OpenCV’s matchTemplate() function. The maximum score and the location of the corresponding pixel are then retrieved with OpenCV’s minMaxLoc() function, giving the precise location where a digit match was found in the card image.

# loop over the digit contours within one 4-digit group
for c in digitCnts:
    # compute the bounding box of the individual digit, extract
    # the digit, and resize it to the same fixed size as the
    # reference OCR-A templates
    (x, y, w, h) = cv2.boundingRect(c)
    roi = group[y:y + h, x:x + w]
    roi = cv2.resize(roi, (57, 88))
    # initialize a list of template matching scores
    scores = []
    # loop over the reference digit names and digit ROIs
    for (digit, digitROI) in digits.items():
        # apply correlation-based template matching, take the
        # maximum score, and update the scores list
        result = cv2.matchTemplate(roi, digitROI, cv2.TM_CCOEFF)
        (_, score, _, _) = cv2.minMaxLoc(result)
        scores.append(score)
    # the predicted digit is the template with the highest matching score
    groupOutput.append(str(np.argmax(scores)))

The final result of our digit template matching can be seen below, with each of our sixteen card number digits correctly identified by OpenCV’s template matching functionality:

Final digit recognition output

Figure 7. Final digit recognition output for sample image

ii) Cardholder text recognition

To detect the cardholder text, we similarly select the portion of each image falling between 150 and 190 pixels in height and between 15 and 200 pixels in width, capturing the bottom left of the credit card image, and apply the same binary Otsu thresholding to the grayscale crop to make the white cardholder characters stand out against the darker card background. The resulting transformed sub-image is shown below in Figure 8.

Transformed cardholder image

Figure 8. Transformed cardholder image
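A minimal sketch of this crop-and-threshold step, assuming the same grayscale card image standardized to a 190-pixel height (the pixel bounds are the ones quoted above):

# crop the bottom-left region that typically contains the cardholder name
name_region = gray[150:190, 15:200]
# Otsu thresholding makes the light characters salient against the dark card
name_binary = cv2.threshold(name_region, 0, 255,
                            cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]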

Each selected sub-image is then passed through the get_chars() function below, which returns the detected text using Pytesseract’s image_to_string() function (Pytesseract is a Python wrapper around Google’s Tesseract OCR engine). Under the hood, Tesseract runs the image through a pre-trained neural-network-based OCR engine and returns the recognized text; in our case it outputs the string ‘CARDHOLDER,’ for this image, as shown in Figure 9 below.

import cv2
import matplotlib.pyplot as plt
import pytesseract as tess
from PIL import Image

def get_chars(img, show_image=False, **kwargs):
    # accept either an image path or an already-loaded OpenCV image
    if isinstance(img, str):
        img = cv2.imread(img)
    if show_image:
        plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        plt.title('Test Image'); plt.show()
    # convert the OpenCV array to a PIL image and run Tesseract OCR
    image = Image.fromarray(img)
    text = tess.image_to_string(image, **kwargs)
    print("PyTesseract Detected the following text: ", text)
    return text
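As a usage sketch (the file path is hypothetical), any extra keyword arguments are forwarded to Pytesseract, so a page-segmentation hint can be passed for single-line text such as a cardholder name:

# --psm 7 tells Tesseract to treat the crop as a single line of text
text = get_chars('cardholder_crop.png', config='--psm 7')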

Detected cardholder text output

Figure 9. Detected cardholder text output

Lastly, to remove any extraneous punctuation that Pytesseract may erroneously detect, the text above is passed through the function below, which returns the first line of the uppercased string with all punctuation stripped using Python’s re module. The final output of ‘CARDHOLDER’ therefore matches the ground truth and counts as a correct text prediction.

import re

def process_str(string):
    # uppercase, strip punctuation, and keep only the first line of text
    string = string.upper()
    string = re.sub(r'[^\w\s]', '', string)
    lines = string.splitlines()
    if len(lines) == 0:
        return ''
    return lines[0]
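For example, given noisy OCR output the function keeps only the cleaned first line:

print(process_str("cardholder,\nvalid thru 12/24"))  # -> 'CARDHOLDER'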

Full Dataset Results

Applying the above pipeline across our full 23-image dataset produces 48% recall on the 16-digit number and 65% recall on the cardholder text. For text recognition, the minimum pixel height used to select the cardholder sub-image has a noticeable impact on performance, since cardholder names vary in position across images; an exhaustive search over the [140, 160] pixel range showed 150 pixels to be the optimal minimum height for locating the cardholder text.
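A hedged sketch of that search, assuming a hypothetical run_pipeline() helper (which runs the text-recognition pipeline with a given minimum height and returns the predicted name) and a hypothetical list of (image_path, ground_truth_name) pairs:

best_height, best_recall = None, 0.0
for min_height in range(140, 161):
    # run_pipeline() and labeled_images are hypothetical stand-ins here
    correct = sum(run_pipeline(path, min_height=min_height) == truth
                  for path, truth in labeled_images)
    recall = correct / len(labeled_images)
    if recall > best_recall:
        best_height, best_recall = min_height, recall
print(best_height, best_recall)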

An additional avenue for improving the system would be to train our own optical character recognition deep learning models, fitted on publicly available datasets such as MNIST for digit recognition and DDI-100 for text recognition, in order to push past the 48–65% recall benchmark reported here.

Thanks for reading!