Smart Video Analytics: Motion Detection for Security & Monitoring

Oct 24, 2023 · 6 min read

Link: https://github.com/philippe-heitzmann/Video_Background_Subtraction_OpenCV

Introduction

Motion detection in video footage is essential for many smart monitoring applications like security systems, parking occupancy tracking, and retail analytics. This project evaluated different motion detection algorithms to find the most reliable solution for real-world deployment.

We used the publicly available CDTNet-14 dataset to test various motion detection algorithms across different lighting conditions, including challenging scenarios with shadows and variable lighting.

CDTNet-14 Dataset

The CDTNet-14 dataset was developed for the 2014 Change Detection Workshop and contains 53 videos with roughly 140,000 frames covering various indoor and outdoor monitoring scenarios. Its videos are grouped into categories by difficulty; the two categories used in this analysis are:

  • Baseline: Videos with minimal background movement and good lighting
  • Shadows: Videos with moving objects that cast shadows, creating detection challenges

We tested our algorithms on highway traffic and office foot traffic videos from both categories to compare performance across different environments.

Baseline highway video
Shadows cubicle video
Highway driving video

Figure 1. CDTNet-14 Baseline highway.avi and Shadows cubicle.avi videos, plus a publicly available highway traffic recording, used in the analysis

Methodology

We implemented two primary motion detection algorithms using OpenCV in C++:

  1. k-Nearest Neighbors (kNN) - A non-parametric approach that classifies each pixel against samples drawn from recent video frames
  2. MOG2 Adaptive Gaussian Mixture - A model that represents each pixel as a mixture of Gaussians and adapts to gradual background changes

The C++ implementation takes a video file and algorithm type as input:

#include <iostream>
#include <sstream>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/video.hpp>

using namespace cv;
using namespace std;

const char* params
= "{ help h         |           | Print usage }"
"{ input          | highway_traffic.mp4 | Path to a video or a sequence of images }"
"{ algo           | KNN      | Background subtraction method (KNN, MOG2) }";

The main function creates a motion detection model and processes each video frame:

int main(int argc, char* argv[])
{
    CommandLineParser parser(argc, argv, params);
    parser.about("This program shows how to use background subtraction methods provided by "
        "OpenCV. You can process both videos and images.\n");
    if (parser.has("help"))
    {
        //print help information and exit
        parser.printMessage();
        return 0;
    }

    //! [create]
    //create Background Subtractor objects
    Ptr<BackgroundSubtractor> pBackSub;
    if (parser.get<String>("algo") == "MOG2")
        pBackSub = createBackgroundSubtractorMOG2();
    else
        pBackSub = createBackgroundSubtractorKNN();
    //! [create]

    //! [capture]
    VideoCapture capture(samples::findFile(parser.get<String>("input")));
    if (!capture.isOpened()) {
        //error opening the video input: report and exit with failure
        cerr << "Unable to open: " << parser.get<String>("input") << endl;
        return -1;
    }
    //! [capture]

    Mat frame, fgMask;
    while (true) {
        capture >> frame;
        if (frame.empty())
            break;

        //! [apply]
        //update the background model
        pBackSub->apply(frame, fgMask);
        //! [apply]

        //! [display_frame_number]
        //get the frame number and write it on the current frame
        rectangle(frame, cv::Point(10, 2), cv::Point(100, 20),
            cv::Scalar(255, 255, 255), -1);
        stringstream ss;
        ss << capture.get(CAP_PROP_POS_FRAMES);
        string frameNumberString = ss.str();
        putText(frame, frameNumberString.c_str(), cv::Point(15, 15),
            FONT_HERSHEY_SIMPLEX, 0.5, cv::Scalar(0, 0, 0));
        //! [display_frame_number]

        //! [show]
        //show the current frame and the fg masks
        imshow("Frame", frame);
        imshow("FG Mask", fgMask);
        //! [show]

        //get the input from the keyboard
        int keyboard = waitKey(30);
        if (keyboard == 'q' || keyboard == 27)
            break;
    }

    return 0;
}

Running this program displays two windows: the original footage and a foreground mask showing moving objects in white against a black background.

Original highway video
KNN motion detection output

Figure 2. Sample KNN-based Background Subtractor output using OpenCV

Original highway video
MOG2 motion detection output

Figure 3. Sample MOG2-based Background Subtractor output using OpenCV
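In the masks above, moving objects appear white, but OpenCV's KNN and MOG2 subtractors (with their default shadow detection enabled) mark suspected shadow pixels in gray (value 127). A minimal sketch, using an illustrative toy mask rather than the project's actual footage, of dropping those gray pixels with a simple threshold:

```python
import numpy as np

# Illustrative foreground mask as produced by KNN/MOG2 with shadow
# detection enabled: 0 = background, 127 = shadow, 255 = foreground.
fg_mask = np.array([[0, 127, 255],
                    [127, 255, 0],
                    [0,   0, 255]], dtype=np.uint8)

# Keep only confident foreground pixels; drop the gray shadow pixels.
# (Equivalent to cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY).)
binary_mask = np.where(fg_mask > 200, 255, 0).astype(np.uint8)

print(binary_mask.tolist())
```

Note this only removes pixels the model already flags as shadow; shadows misclassified as full foreground survive the threshold, which is why shadow-robust algorithms are still worth evaluating.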

Both algorithms successfully detect moving cars, but they also capture shadows, which could cause false alarms in security applications. To find better options for shadow-heavy environments, we evaluated eight additional algorithms from the extended background subtraction (bgsegm) module in opencv_contrib:

  • MOG
  • GMG
  • LSBP-vanilla
  • LSBP-speed
  • LSBP-quality
  • LSBP-comp
  • GSOC
  • GSOC-comp
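These eight subtractors live in the opencv_contrib bgsegm module rather than in core OpenCV. A hypothetical Python mapping from the labels above to their factory functions (assumptions: opencv-contrib-python is installed so that cv2.bgsegm exists, and the -speed/-quality/-comp variants reuse the base factory with different constructor parameters, so only default-parameter construction is sketched here):

```python
# Hypothetical mapping from the algorithm labels above to factory
# function names in cv2.bgsegm (opencv_contrib). The variant labels
# (-speed, -quality, -comp) are assumed to share a base factory and
# differ only in constructor parameters, which are omitted here.
FACTORIES = {
    "MOG": "createBackgroundSubtractorMOG",
    "GMG": "createBackgroundSubtractorGMG",
    "LSBP-vanilla": "createBackgroundSubtractorLSBP",
    "LSBP-speed": "createBackgroundSubtractorLSBP",
    "LSBP-quality": "createBackgroundSubtractorLSBP",
    "LSBP-comp": "createBackgroundSubtractorLSBP",
    "GSOC": "createBackgroundSubtractorGSOC",
    "GSOC-comp": "createBackgroundSubtractorGSOC",
}

def make_subtractor(label):
    """Instantiate a background subtractor by label, default parameters."""
    import cv2  # requires opencv-contrib-python for cv2.bgsegm
    return getattr(cv2.bgsegm, FACTORIES[label])()

print(sorted(set(FACTORIES.values())))
```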

We used a Python evaluation script to test all algorithms on both Baseline and Shadows datasets:

import argparse

import numpy as np

# find_relevant_dirs, evaluate_on_sequence and ALGORITHMS_TO_EVALUATE
# are defined earlier in the full script (see the linked repository)
def main():
    #parse command line arguments
    parser = argparse.ArgumentParser(description='Evaluate all background subtractors using Change Detection 2014 dataset')
    parser.add_argument('--dataset_path', help='Path to the directory with dataset. It may contain multiple inner directories. It will be scanned recursively.', required=True)
    parser.add_argument('--algorithm', help='Test particular algorithm instead of all.')

    args = parser.parse_args()
    #get groundtruth and input data dirs
    dataset_dirs = find_relevant_dirs(args.dataset_path)
    assert len(dataset_dirs) > 0, ("Passed directory must contain at least one sequence from the Change Detection dataset. There are no relevant directories in %s. Check that this directory is correct." % (args.dataset_path))
    if args.algorithm is not None:
        global ALGORITHMS_TO_EVALUATE
        #restrict evaluation to the single requested algorithm
        ALGORITHMS_TO_EVALUATE = [algo_tuple for algo_tuple in ALGORITHMS_TO_EVALUATE if algo_tuple[1].lower() == args.algorithm.lower()]
    summary = {}
    #calculating pixel-level recall, precision and f1-score performance metrics of our model vs groundtruth 
    for seq in dataset_dirs:
        evaluate_on_sequence(seq, summary)
    
    #compiling performance metrics of our models 
    for category in summary:
        for algo_name in summary[category]:
            summary[category][algo_name] = np.mean(summary[category][algo_name], axis=0)
    #printing performance summaries of our models 
    for category in summary:
        print('=== SUMMARY for %s (Precision, Recall, F1, Accuracy) ===' % category)
        for algo_name in summary[category]:
            print('%05s: %.3f %.3f %.3f %.3f' % ((algo_name,) + tuple(summary[category][algo_name])))

if __name__ == '__main__':
    main()

The evaluation process:

  1. Parse command line arguments for dataset path and algorithm selection
  2. Load ground truth data and input videos
  3. Create algorithm objects as specified
  4. Calculate accuracy metrics (recall, precision, F1-score) by comparing predictions to ground truth
  5. Compile performance summaries for all models across different categories
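Step 4 above reduces to pixel-wise counting: comparing each predicted mask pixel against the ground truth yields true positives, false positives, and false negatives, from which precision, recall, and F1 follow. A minimal sketch with NumPy on toy binary masks (the arrays are illustrative; the actual script compares full CDTNet-14 frames):

```python
import numpy as np

def pixel_metrics(pred, gt):
    """Precision, recall, and F1 for binary foreground masks (values 0/1)."""
    tp = np.sum((pred == 1) & (gt == 1))  # foreground correctly detected
    fp = np.sum((pred == 1) & (gt == 0))  # background flagged as foreground
    fn = np.sum((pred == 0) & (gt == 1))  # foreground missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

gt   = np.array([[1, 1, 0, 0]])
pred = np.array([[1, 0, 1, 0]])  # one hit, one miss, one false alarm
p, r, f1 = pixel_metrics(pred, gt)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.5 0.5 0.5
```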

Results

Our evaluation revealed significant performance differences between ideal and challenging lighting conditions. The best-performing GSOC algorithm achieved:

  • 96% recall and 99% precision in ideal lighting conditions (Baseline dataset)
  • 82% recall and 52% precision in shadow-heavy environments (Shadows dataset)

In shadow environments, then, the system would miss roughly 20% of actual foreground objects, and nearly half of its detections would be false alarms: unacceptable for security applications requiring high reliability.
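Collapsing each precision/recall pair into a single F1 score (their harmonic mean) makes the gap explicit; the figures below are computed directly from the percentages reported above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# GSOC results reported above, as fractions
baseline_f1 = f1(0.99, 0.96)  # ideal lighting (Baseline dataset)
shadows_f1 = f1(0.52, 0.82)   # shadow-heavy scenes (Shadows dataset)
print(round(baseline_f1, 3), round(shadows_f1, 3))  # 0.975 0.636
```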

Visual comparison of different algorithms on a challenging shadow frame:

Shadows input frame

Figure 5. Shadows input frame #2450

Ground truth mask

Figure 6. Ground truth mask on Shadows input frame #2450

GSOC prediction

Figure 7. GSOC prediction mask on Shadows input frame #2450

GSOC-comp prediction

Figure 8. GSOC-comp prediction mask on Shadows input frame #2450

GMG prediction

Figure 9. GMG prediction mask on Shadows input frame #2450

MOG prediction

Figure 10. MOG prediction mask on Shadows input frame #2450

LSBP-vanilla prediction

Figure 11. LSBP-vanilla prediction mask on Shadows input frame #2450

LSBP-comp prediction

Figure 12. LSBP-comp prediction mask on Shadows input frame #2450

LSBP-quality prediction

Figure 13. LSBP-quality prediction mask on Shadows input frame #2450

LSBP-speed prediction

Figure 14. LSBP-speed prediction mask on Shadows input frame #2450

Conclusion

This project demonstrated that while motion detection algorithms can achieve excellent performance in ideal conditions, they struggle significantly in challenging lighting environments with shadows. The GSOC algorithm performed best overall but still showed concerning reliability issues in shadow-heavy scenarios.

For production deployment, these results suggest that:

  • High-security applications requiring near-perfect accuracy should avoid shadow-heavy environments or implement additional preprocessing
  • General monitoring applications in well-lit environments can achieve reliable performance with the GSOC algorithm
  • Future development should focus on shadow-resistant algorithms or multi-sensor fusion approaches

Thanks for reading!