
Mastering Hand Gesture Recognition with Python: Build Your Own Touchless Systems

From Basic Gesture Detection to Advanced Product Selection — A Step-by-Step Guide to Unlocking the Power of Hand Gesture Recognition

Jay McBride
  • 11 min read
[Image: a hand overlaid with motion-capture lines and dots, the Python logo, and the title "Master Gesture Control"]
Image by Jay's Tech Bites via Canva

As touchless technology becomes more integral in our daily lives, Hand Gesture Recognition (HGR) is gaining significant traction across industries. From retail to healthcare, it enables more intuitive and accessible ways to interact with systems through hand movements. In this article, we’ll guide you through the fundamental concepts of HGR, explain the roles of OpenCV and MediaPipe, and help you build two hands-on projects: a basic hand gesture recognizer and a product selection system using Python.


What is Hand Gesture Recognition (HGR)?

Hand Gesture Recognition (HGR) is a computer vision technology that detects hand movements and interprets them as commands to enable touch-free interaction with digital devices. By analyzing the position and motion of fingers and hands in real-time, HGR systems allow users to perform gestures—like swiping, clicking, or grabbing—without physically touching a screen or button.

The basic process of HGR involves three key steps:

  1. Hand Detection: A camera captures the scene, and computer vision techniques segment the hand from the background.
  2. Hand Tracking: Hand landmarks, such as fingertips and joints, are mapped in real time using frameworks like MediaPipe.
  3. Gesture Recognition: Specific hand movements (e.g., a fist or open hand) are recognized based on pre-trained models, enabling gesture-based controls.
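
To make these three steps concrete, here is a minimal sketch of how they map onto OpenCV and MediaPipe calls; Project 1 below turns this into a full working recognizer:

import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1)  # Steps 1-2: detection and landmark tracking
cap = cv2.VideoCapture(0)

success, img = cap.read()
if success:
    results = hands.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        # Step 3: a recognizer classifies the layout of these 21 landmarks
        print(len(results.multi_hand_landmarks[0].landmark), "landmarks detected")
cap.release()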

Why Use OpenCV and MediaPipe for HGR?

We’re using two primary libraries for building HGR systems:

  • OpenCV (Open Source Computer Vision Library): A powerful tool for image and video processing, OpenCV enables us to capture and process webcam feeds, manipulate images, and visualize results.

  • MediaPipe: Developed by Google, MediaPipe is a specialized library that tracks hand landmarks with high accuracy. It’s built for real-time applications and integrates seamlessly with OpenCV, making it ideal for tasks like hand detection and gesture recognition.

Together, OpenCV and MediaPipe make it possible to build flexible, real-time hand gesture systems that are both highly accurate and efficient.


Expanding Hand Gesture Recognition’s Real-World Use Cases

While HGR is often associated with gaming and retail, its potential applications stretch far beyond these fields:

  • Healthcare: Surgeons can manipulate medical images using gestures, keeping their hands sterile and reducing the risk of contamination in operating rooms.

  • Sign Language Recognition: HGR is being explored as a tool to translate sign language into text or speech, providing new communication methods for the deaf and hard of hearing.

  • Augmented Reality (AR): Hand gestures provide an intuitive interface for interacting with virtual objects, commonly seen in AR gaming and design tools.

  • Automotive Interfaces: Drivers can control in-car systems (like adjusting volume or navigation) using gestures, without taking their hands off the steering wheel—improving both safety and convenience.

  • Smart Home Controls: Homeowners can manage smart home devices, like lights or appliances, with simple gestures, even when their hands are busy or dirty.


Project 1: Basic Hand Gesture Recognition

Let’s start with a simple project: a hand gesture recognition system that detects whether your hand is open or closed.

Step 1: Install Required Libraries

Run the following commands to install the necessary Python libraries:

pip install opencv-python
pip install mediapipe
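
To confirm both installs succeeded, a quick sanity check from a Python prompt (version numbers will vary with your setup):

import cv2
import mediapipe as mp

print("OpenCV:", cv2.__version__)
print("MediaPipe:", mp.__version__)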

Section 1: Initialization and Setup

This section initializes MediaPipe for hand detection and defines the function that recognizes hand gestures.

import cv2
import mediapipe as mp

# Initialize MediaPipe for hands detection
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1)
mp_draw = mp.solutions.drawing_utils

# Function to recognize hand gestures
def recognize_gesture(hand_landmarks):
    thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    
    # In image coordinates, y increases downward: with an upright hand, an
    # extended index finger sits above the thumb, so a thumb tip below the
    # index tip suggests an open hand
    if thumb_tip.y > index_tip.y:
        return "Open Hand"
    else:
        return "Closed Fist"

Section 2: Capturing Video

This section sets up the video capture from the webcam, preparing the system to read the video feed for gesture recognition.

# Capture video from webcam
cap = cv2.VideoCapture(0)
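
If the default camera at index 0 fails to open (common on machines with multiple or virtual cameras), you can probe the first few indices instead. A small optional helper, assuming at least one working device:

def open_first_camera(max_index=4):
    # Try indices 0..max_index-1 and return the first camera that opens
    for i in range(max_index):
        cap = cv2.VideoCapture(i)
        if cap.isOpened():
            return cap
        cap.release()
    raise RuntimeError("No working camera found")

cap = open_first_camera()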

Section 3: Processing and Hand Gesture Recognition

Here’s the logic for processing the video feed and detecting hand gestures. It includes drawing the hand landmarks and displaying the gesture.

while True:
    success, img = cap.read()
    if not success:
        break

    # Convert the image to RGB for MediaPipe
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    # Process the image and detect hands
    results = hands.process(img_rgb)

    # If hand is detected, draw landmarks and recognize gestures
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            gesture = recognize_gesture(hand_landmarks)
            cv2.putText(img, gesture, (10, 70), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)

Section 4: Display and Termination

This section handles displaying the video feed with recognized gestures and lets the user quit by pressing ‘q’.

    # Display the video feed
    cv2.imshow("Hand Gesture Recognition", img)
    
    # Press 'q' to quit the video feed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video capture and close windows
cap.release()
cv2.destroyAllWindows()

Full Version of the Code

Here’s the full version for easy reference and implementation:

import cv2
import mediapipe as mp

# Initialize MediaPipe for hands detection
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(max_num_hands=1)
mp_draw = mp.solutions.drawing_utils

# Function to recognize hand gestures
def recognize_gesture(hand_landmarks):
    thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    
    # In image coordinates, y increases downward: with an upright hand, an
    # extended index finger sits above the thumb, so a thumb tip below the
    # index tip suggests an open hand
    if thumb_tip.y > index_tip.y:
        return "Open Hand"
    else:
        return "Closed Fist"

# Capture video from webcam
cap = cv2.VideoCapture(0)

while True:
    success, img = cap.read()
    if not success:
        break

    # Convert the image to RGB for MediaPipe
    img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    
    # Process the image and detect hands
    results = hands.process(img_rgb)

    # If hand is detected, draw landmarks and recognize gestures
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(img, hand_landmarks, mp_hands.HAND_CONNECTIONS)
            gesture = recognize_gesture(hand_landmarks)
            cv2.putText(img, gesture, (10, 70), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)
    
    # Display the video feed
    cv2.imshow("Hand Gesture Recognition", img)
    
    # Press 'q' to quit the video feed
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# Release the video capture and close windows
cap.release()
cv2.destroyAllWindows()
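
The thumb-versus-index heuristic in recognize_gesture is deliberately simple and assumes an upright hand. If you want something less posture-sensitive, a common alternative is to count extended fingers by comparing each fingertip to the joint below it. This is a sketch, not part of the project code above:

import mediapipe as mp

mp_hands = mp.solutions.hands

def count_extended_fingers(hand_landmarks):
    # Rough count of extended fingers for an upright hand (heuristic)
    lm = hand_landmarks.landmark
    finger_pairs = [
        (mp_hands.HandLandmark.INDEX_FINGER_TIP, mp_hands.HandLandmark.INDEX_FINGER_PIP),
        (mp_hands.HandLandmark.MIDDLE_FINGER_TIP, mp_hands.HandLandmark.MIDDLE_FINGER_PIP),
        (mp_hands.HandLandmark.RING_FINGER_TIP, mp_hands.HandLandmark.RING_FINGER_PIP),
        (mp_hands.HandLandmark.PINKY_TIP, mp_hands.HandLandmark.PINKY_PIP),
    ]
    # A finger counts as extended when its tip sits above its PIP joint (smaller y)
    count = sum(1 for tip, pip in finger_pairs if lm[tip].y < lm[pip].y)
    # Thumb: horizontal offset from the IP joint; the 0.04 threshold is a guess to tune
    if abs(lm[mp_hands.HandLandmark.THUMB_TIP].x - lm[mp_hands.HandLandmark.THUMB_IP].x) > 0.04:
        count += 1
    return count

A count of 4 or 5 maps naturally to "Open Hand" and 0 or 1 to "Closed Fist".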

Project 2: Product Selection Using Hand Gestures

For this more advanced project, we’ll build a system that lets users select products by moving their hand over different regions on the screen. When the hand hovers over a product’s area, the system will register the selection.

Section 1: Initialization

In this section, you import libraries and initialize MediaPipe, the video capture, and the fullscreen dimensions.

import cv2
import mediapipe as mp
import numpy as np

# Initialize MediaPipe Hands module
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.8, max_num_hands=1)
mp_draw = mp.solutions.drawing_utils

# Fullscreen dimensions
screen_width = 1920
screen_height = 1080

# Initialize video capture
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, screen_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, screen_height)

Section 2: Defining Product ROIs

Here, you define the regions of interest (ROIs) for the products, including their positions and sizes.

# Define the number of products, dimensions of ROIs, and gap between them
num_products = 5
roi_width = 300
roi_height = 300
gap = (screen_width - num_products * roi_width) // (num_products + 1)

# Define ROIs for products with even spacing
rois = {
    f'Product{i+1}': [
        gap + i * (roi_width + gap),  # x-coordinate
        (screen_height - roi_height - 100),  # y-coordinate
        roi_width,
        roi_height
    ]
    for i in range(num_products)
}

# Set of expected products and selection-tracking state
correct_products = {'Product1', 'Product3', 'Product5'}
selected_products = set()
correct_count = 0
show_success_message = False
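
With the defaults above, the spacing works out to gap = (1920 - 5 * 300) // 6 = 70 pixels, so Product1's box starts at x = 70, Product2 at x = 440, and every box sits at y = 1080 - 300 - 100 = 680. You can verify the layout without a camera:

for product, (x, y, w, h) in rois.items():
    print(f"{product}: x={x}, y={y}, size={w}x{h}")
# Product1: x=70, y=680, size=300x300
# Product2: x=440, y=680, size=300x300
# ...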

Section 3: Hand Gesture Recognition and ROI Interaction

This section includes the logic to recognize gestures and determine if a hand is over a product’s ROI.

# Helper function to determine if a hand is over a product's ROI
def is_hand_over_product(hand_x, hand_y, roi):
    x, y, w, h = roi
    return x < hand_x < x + w and y < hand_y < y + h

# Function to recognize hand gestures
def recognize_gesture(hand_landmarks):
    thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    
    if thumb_tip.y > index_tip.y:
        return "Open Hand"
    else:
        return "Closed Fist"

Section 4: Main Program Loop

Here’s the main loop to detect hands, recognize gestures, and select products.

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)  # Mirror horizontally
    frame = cv2.resize(frame, (screen_width, screen_height))

    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)

    hand_touching_correct = False
    hand_touching_incorrect = False

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)

            gesture = recognize_gesture(hand_landmarks)
            cv2.putText(frame, f"Gesture: {gesture}", (10, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)

            for landmark in hand_landmarks.landmark:
                landmark_x, landmark_y = int(landmark.x * screen_width), int(landmark.y * screen_height)
                cv2.circle(frame, (landmark_x, landmark_y), 10, (0, 0, 255), -1)

                for product, roi in rois.items():
                    if is_hand_over_product(landmark_x, landmark_y, roi):
                        if product not in selected_products:
                            selected_products.add(product)
                            if product in correct_products:
                                correct_count += 1

                        if product in correct_products:
                            hand_touching_correct = True
                        else:
                            hand_touching_incorrect = True

Section 5: Drawing Feedback and Final Selection

This section handles the visual feedback (highlighting selected products) and final messages.

    if hand_touching_correct:
        cv2.putText(frame, "Correct selection!", (10, 150), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 3)
    elif hand_touching_incorrect:
        cv2.putText(frame, "Incorrect selection!", (10, 150), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 3)

    for product, roi in rois.items():
        if product in selected_products:
            if product in correct_products:
                frame = draw_transparent_rect(frame, roi[0], roi[1], roi[2], roi[3], (255, 0, 0), alpha=0.5)
            else:
                frame = draw_transparent_rect(frame, roi[0], roi[1], roi[2], roi[3], (0, 0, 255), alpha=0.5)
        cv2.rectangle(frame, (roi[0], roi[1]), (roi[0] + roi[2], roi[1] + roi[3]), (0, 255, 0), 2)
        
        # Draw product name
        text_size = cv2.getTextSize(product, cv2.FONT_HERSHEY_SIMPLEX, 1.5, 2)[0]
        text_x = roi[0] + (roi[2] - text_size[0]) // 2
        text_y = roi[1] + roi[3] + 40
        cv2.putText(frame, product, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 255), 2)

    if correct_count == len(correct_products) and not show_success_message:
        show_success_message = True

    if show_success_message:
        cv2.putText(frame, "Correct Products Have Been Picked", (screen_width // 2 - 400, 150), 
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 4)

    cv2.imshow('Product Selection and Gesture Recognition System', frame)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Full Version of the Code

Here’s the full version for easy reference and implementation:

import cv2
import mediapipe as mp
import numpy as np

# Initialize MediaPipe Hands module
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.8, max_num_hands=1)
mp_draw = mp.solutions.drawing_utils

# Fullscreen dimensions
screen_width = 1920
screen_height = 1080

# Initialize video capture
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, screen_width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, screen_height)

# Define the number of products, dimensions of ROIs, and gap between them
num_products = 5
roi_width = 300
roi_height = 300
gap = (screen_width - num_products * roi_width) // (num_products + 1)

# Define ROIs for products with even spacing
rois = {
    f'Product{i+1}': [
        gap + i * (roi_width + gap),  # x-coordinate
        (screen_height - roi_height - 100),  # Moved 100 pixels from the bottom
        roi_width,
        roi_height
    ]
    for i in range(num_products)
}

# Set of expected products
correct_products = {'Product1', 'Product3', 'Product5'}
selected_products = set()
correct_count = 0
show_success_message = False

# Helper function to determine if a hand is over a product's ROI
def is_hand_over_product(hand_x, hand_y, roi):
    x, y, w, h = roi
    return x < hand_x < x + w and y < hand_y < y + h

# Function to recognize hand gestures
def recognize_gesture(hand_landmarks):
    thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    
    # Recognize open hand or closed fist based on thumb and index finger position
    if thumb_tip.y > index_tip.y:
        return "Open Hand"
    else:
        return "Closed Fist"

# Function to draw semi-transparent rectangles
def draw_transparent_rect(frame, x, y, w, h, color, alpha=0.5):
    overlay = frame.copy()
    cv2.rectangle(overlay, (x, y), (x + w, y + h), color, -1)
    return cv2.addWeighted(overlay, alpha, frame, 1 - alpha, 0)

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)  # Mirror horizontally
    frame = cv2.resize(frame, (screen_width, screen_height))  # Fit to fullscreen

    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)

    # Initialize flags to detect if the hand is touching any of the products
    hand_touching_correct = False
    hand_touching_incorrect = False

    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            # Draw hand landmarks
            mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)

            # Recognize the hand gesture
            gesture = recognize_gesture(hand_landmarks)
            cv2.putText(frame, f"Gesture: {gesture}", (10, 100), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2)

            # Check if the hand is over any product ROI
            for landmark in hand_landmarks.landmark:
                landmark_x, landmark_y = int(landmark.x * screen_width), int(landmark.y * screen_height)
                cv2.circle(frame, (landmark_x, landmark_y), 10, (0, 0, 255), -1)

                for product, roi in rois.items():
                    if is_hand_over_product(landmark_x, landmark_y, roi):
                        if product not in selected_products:
                            selected_products.add(product)
                            if product in correct_products:
                                correct_count += 1

                        if product in correct_products:
                            hand_touching_correct = True
                        else:
                            hand_touching_incorrect = True

    # Display text feedback based on hand interaction with products
    if hand_touching_correct:
        cv2.putText(frame, "Correct selection!", (10, 150), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 255, 0), 3)
    elif hand_touching_incorrect:
        cv2.putText(frame, "Incorrect selection!", (10, 150), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 3)

    # Draw ROI boxes and manage selection display with semi-transparency
    for product, roi in rois.items():
        if product in selected_products:
            if product in correct_products:
                # Blue with alpha for correct selection
                frame = draw_transparent_rect(frame, roi[0], roi[1], roi[2], roi[3], (255, 0, 0), alpha=0.5)
            else:
                # Red with alpha for incorrect selection
                frame = draw_transparent_rect(frame, roi[0], roi[1], roi[2], roi[3], (0, 0, 255), alpha=0.5)
        # Draw green rectangle border around the ROI
        cv2.rectangle(frame, (roi[0], roi[1]), (roi[0] + roi[2], roi[1] + roi[3]), (0, 255, 0), 2)

        # Draw product name centered at the bottom of each ROI
        product_text = product
        text_size = cv2.getTextSize(product_text, cv2.FONT_HERSHEY_SIMPLEX, 1.5, 2)[0]
        text_x = roi[0] + (roi[2] - text_size[0]) // 2
        text_y = roi[1] + roi[3] + 40
        cv2.rectangle(frame, (text_x - 10, text_y - 30), (text_x + text_size[0] + 10, text_y + 10), (0, 0, 0), -1)
        cv2.putText(frame, product_text, (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 255, 255), 2)

    # Display the count of correctly selected products
    cv2.putText(frame, f"Correct Count: {correct_count}", (10, 200), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    # Latch the success message once all correct products have been selected
    if correct_count == len(correct_products) and not show_success_message:
        show_success_message = True

    if show_success_message:
        # Display the success message centered near the top of the screen
        cv2.putText(frame, "Correct Products Have Been Picked", (screen_width // 2 - 400, 150), 
                    cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 4)

    cv2.imshow('Product Selection and Gesture Recognition System', frame)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
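
As written, a product is selected the moment any hand landmark crosses its box, so a hand merely passing by can trigger a pick. A natural extension is to require a deliberate gesture, such as a pinch, before registering the selection. Here is a sketch of that idea; the 0.05 threshold in normalized coordinates is an assumption you would tune for your camera:

import math
import mediapipe as mp

mp_hands = mp.solutions.hands

def is_pinching(hand_landmarks, threshold=0.05):
    # True when the thumb and index fingertips nearly touch (normalized coords)
    thumb = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    index = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    return math.hypot(thumb.x - index.x, thumb.y - index.y) < threshold

# In the main loop, gate the selection on the gesture:
#     if is_hand_over_product(landmark_x, landmark_y, roi) and is_pinching(hand_landmarks):
#         selected_products.add(product)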

Conclusion: The Future of Hand Gesture Recognition

Hand Gesture Recognition is rapidly evolving and finding applications across industries, from healthcare to smart home automation. By building systems like these, you can create more intuitive and accessible user interfaces that reduce reliance on traditional input methods.

Now it’s your turn to build! Try out these projects, experiment with different gestures, and share your customizations in the comments below.

Have you tried building an HGR system? Leave a comment and let me know how it went or what challenges you faced!