Mastering the Fundamentals of Computer Vision With Python
Computer vision is a field of computer science that uses AI to enable computers to understand and recognize people and objects in images and videos. Given a photograph, for example, a computer vision system can analyze the image and determine which regions appear to be a person. Let’s work together on a simple yet detailed project where we’ll provide the computer with my picture to see whether it recognizes me. But first, let’s cover a few important terms you need to understand.
Bounding box: A rectangular border that the computer draws around an identified object in an image or video. To identify a person in a photograph, the computer would draw a bounding box around each person it detects. We will see this in action later.
Intersection over union (IoU): This metric measures how accurately the predicted bounding box aligns with the actual object, known as the ground truth. The ground truth refers to the true object we aim to identify. IoU helps us determine how closely the computer’s prediction matches this true object.
The IoU value ranges from 0 to 1. If the predicted bounding box perfectly overlaps the ground truth, the IoU is 1. Conversely, if there is no overlap, the IoU is 0. For partial overlaps, the IoU is calculated using the formula: IoU = Area of intersection / Area of union of the predicted bounding box and the ground truth.
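To make the formula concrete, here’s a minimal sketch (separate from the project code) that computes IoU for two boxes given in the (x, y, width, height) format OpenCV uses:

```python
def iou(box_a, box_b):
    """Compute intersection over union for two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b

    # Corners of the intersection rectangle
    left = max(ax, bx)
    top = max(ay, by)
    right = min(ax + aw, bx + bw)
    bottom = min(ay + ah, by + bh)

    # No overlap yields zero intersection area
    intersection = max(0, right - left) * max(0, bottom - top)
    union = aw * ah + bw * bh - intersection
    return intersection / union

# Two 100x100 boxes, the second shifted 50 pixels right: 5000 / 15000
print(iou((0, 0, 100, 100), (50, 0, 100, 100)))  # 0.333...
```

An IoU of 0.5 is a common cutoff for counting a predicted bounding box as a correct detection.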
There are many more terms that are critical to computer vision, but you can familiarize yourself with them here. You can also find the complete code for this project here.
Prerequisites
Before you begin, ensure you have the following:
- Python installed: This article uses Python 3.12.4. You can check your Python version by running the command:
python --version
If you encounter an error, ensure Python is installed correctly. You can download Python from the official website.
- Text editor: This tutorial uses Visual Studio Code (VS Code) as the text editor. You can download VS Code here. However, feel free to use any text editor you choose.
Before diving into our project, it’s essential to set up a clean working environment. Here’s how to do it step by step:
- Create a project folder: First, choose a location for your project folder. For this tutorial, we will create it on the desktop.
On macOS:
- Navigate to your desktop.
- Create a new folder named, for example, “facial-recognition.”
- Open a Terminal window in this folder: right-click the folder and choose New Terminal at Folder (if you don’t see it, enable the service under System Settings > Keyboard > Keyboard Shortcuts > Services > Files and Folders).
On Windows:
- Navigate to your desktop.
- Create a new folder, for example, “facial-recognition.”
- Right-click the folder and select “Open in Terminal” or “Open PowerShell window here.”
- Create and activate a virtual environment: This helps keep project dependencies isolated from the global Python installation.
- Create a virtual environment:
In your terminal, run the following command to create a virtual environment named venv inside the project folder:
python -m venv venv
- Activate the virtual environment:
To activate the virtual environment, use the following commands based on your operating system:
source venv/bin/activate   # activate the virtual environment on macOS
.\venv\Scripts\activate    # activate the virtual environment on Windows
Once the virtual environment is activated, your terminal prompt will be prefixed with (venv).
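When you’re done working on the project, you can leave the virtual environment at any time by running:
deactivate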
NOTE: On my machine, I create the virtual environment using python3 instead of python. This is necessary because I have two different Python installations on my computer, so I need to specify the version I am using.
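If you’re in the same situation, the equivalent command is:
python3 -m venv venv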
With the setup complete, we can now create our main.py file, where we will write our code step by step. Additionally, we’ll place the image in a folder called “images” within the same directory. This image will be used later to identify me from the webcam feed.
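If you’d like to confirm the layout before moving on, a quick check like this will do (it assumes the reference image is saved as images/profile-headshot.jpg, the filename used later in this tutorial):

```python
import os

# Both should print True if the project is laid out as described
print(os.path.exists("main.py"))
print(os.path.exists("images/profile-headshot.jpg"))
```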
We are now going to install a library called OpenCV. This is a crucial library because it significantly reduces the amount of code we need to write. OpenCV provides numerous prebuilt functions that we can use without having to rewrite the underlying logic. Run the command below to install OpenCV:
pip install opencv-python
```python
# Import OpenCV
import cv2
import time

# Here, we load the pre-built Haar Cascade model
face_classifier = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def detect_faces():
    # Access the webcam
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Error: Could not open webcam.")
        return

    # Allow the webcam some time to initialize
    time.sleep(2)

    while True:
        # Read the frame from the webcam
        ret, frame = cap.read()

        # Check if the frame was read successfully
        if not ret:
            print("Error: Failed to capture image from webcam")
            break

        # Check if the frame is empty
        if frame is None:
            print("Error: Received empty frame from webcam")
            continue

        try:
            # Convert the frame to grayscale
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

            # Detect faces in the grayscale frame
            faces = face_classifier.detectMultiScale(gray, 1.1, 5)

            # Draw rectangles around the faces
            for (x, y, w, h) in faces:
                cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)

            # Display the frame with face detections
            cv2.imshow("Face Detection", frame)
        except cv2.error as e:
            print(f"OpenCV Error: {e}")
            break

        # Exit loop if 'q' key is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the webcam and close all OpenCV windows
    cap.release()
    cv2.destroyAllWindows()

# Call the function to start the webcam feed
detect_faces()
```
Copy and paste the code above into your main.py file. Now, let’s break down the code to understand what’s happening.
At the top of the file, we import the OpenCV library that we installed earlier, which gives us access to its prebuilt functions and classifiers. We also import the time module to handle delays in our program.
Next, we load a prebuilt Haar Cascade classifier named haarcascade_frontalface_default.xml. This classifier is specifically designed for detecting frontal faces in visual input.
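One thing worth knowing: if the XML path is wrong, CascadeClassifier raises no error and simply detects nothing, so a quick sanity check right after loading can save debugging time:

```python
# empty() returns True when the cascade XML failed to load
if face_classifier.empty():
    raise IOError("Could not load haarcascade_frontalface_default.xml")
```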
After loading the model, we define a function called detect_faces. Inside this function, we first access the webcam feed and assign it to a variable called cap using cap = cv2.VideoCapture(0). We then check whether the webcam stream is open with if not cap.isOpened():. If it isn’t, we print an error message and stop execution.
To give the webcam time to initialize properly, we add a delay with time.sleep(2) before entering a while True loop. Inside this loop, we continuously read frames from the webcam feed and perform several checks. If a frame cannot be read (if not ret:), we print an error message and exit the loop. If the frame comes back empty (if frame is None:), we print an error message and skip to the next iteration; otherwise, we continue with our execution.
Inside the try block, our first step is to convert each frame to grayscale. Then, using the previously loaded classifier, we detect all faces present in the grayscale image. A for loop iterates over each detected face, drawing a blue bounding box around it, and the annotated frame is displayed in a window. If an OpenCV error occurs during this process, it is printed for visibility. The loop exits when the “q” key is pressed. Finally, we release the webcam and close all active OpenCV windows, and the last line calls detect_faces() to start the whole process.
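One detail the breakdown glosses over is the pair of numbers passed to detectMultiScale. Written with named parameters, the same call looks like this:

```python
# The same detection call with its parameters spelled out
faces = face_classifier.detectMultiScale(
    gray,
    scaleFactor=1.1,  # shrink the search scale by 10% per pass
    minNeighbors=5,   # a region needs 5 overlapping hits to count as a face
)
```

Raising minNeighbors filters out false positives at the cost of occasionally missing real faces.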
With the code explained, here’s what you can expect as the outcome: your webcam feed opens with a blue bounding box drawn around each face it detects.
Now that face detection is working, the next step is to compare the detected face with the image stored earlier in the “images” folder. If there’s a match, my name will be displayed alongside the bounding box; otherwise, no additional information will be displayed.
To achieve this, we’ll need to make some modifications to the code. First, install a library called face_recognition along with the models it requires. Next, assign the path of the reference image to a variable, load the image, and encode the reference face. In the for loop, we’ll resize the face region for facial recognition, which, while optional, is recommended. After resizing, we’ll encode the detected face. To handle errors, we’ll include a check to ensure that the face_encoding list contains at least one item. If it isn’t empty, we compare it with the reference face encoding. If a match is found, we display “Raymond” next to the bounding box; otherwise, nothing extra is drawn.
pip install face_recognition
pip install git+https://github.com/ageitgey/face_recognition_models
pip install setuptools  # fixes face_recognition_models not being recognized as installed
```python
import cv2
import face_recognition
import time

# Path to the reference image in the images folder
reference_image_path = 'images/profile-headshot.jpg'

# Load the reference image and encode the reference face
reference_image = face_recognition.load_image_file(reference_image_path)
reference_encoding = face_recognition.face_encodings(reference_image)[0]

# Load the pre-built Haar Cascade model for face detection
face_classifier = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)

def accessWebcam():
    # Access the webcam
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():
        print("Error: Could not open webcam.")
        return

    # Allow the webcam some time to initialize
    time.sleep(2)

    while True:
        # Read the frame from the webcam
        ret, frame = cap.read()

        # Check if the frame was read successfully
        if not ret:
            print("Error: Failed to capture image from webcam")
            break

        # Convert the frame to grayscale for face detection
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect faces in the grayscale frame
        faces = face_classifier.detectMultiScale(gray, 1.1, 5)

        # Process each detected face
        for (x, y, w, h) in faces:
            # Extract the face region from the frame
            face_region = frame[y:y+h, x:x+w]

            # face_recognition expects RGB images, while OpenCV uses BGR
            face_region_rgb = cv2.cvtColor(face_region, cv2.COLOR_BGR2RGB)

            # Resize the face region for face recognition (optional but recommended)
            face_region_resized = cv2.resize(face_region_rgb, (128, 128))

            # Encode the detected face
            face_encoding = face_recognition.face_encodings(face_region_resized)

            # Compare the detected face encoding with the reference face encoding
            if len(face_encoding) > 0:
                match = face_recognition.compare_faces(
                    [reference_encoding], face_encoding[0], tolerance=0.5
                )

                # If a match is found, draw the box and display the name above it
                if match[0]:
                    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                    cv2.putText(frame, "Raymond", (x, y - 10),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.9, (36, 255, 12), 2)

        # Display the frame with any recognitions drawn on it
        cv2.imshow("Face Recognition", frame)

        # Exit loop if 'q' key is pressed
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the webcam and close all OpenCV windows
    cap.release()
    cv2.destroyAllWindows()

# Call the function to start the webcam feed
accessWebcam()
```

Note that face_recognition works on RGB images while OpenCV captures frames in BGR, so the face region is converted before encoding. The end of the function mirrors the first script: the annotated frame is displayed, the loop exits on “q”, and the webcam is released.
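With the virtual environment still active, run the script from the project folder:
python main.py
When the face in the webcam feed matches the reference image, a green bounding box appears with “Raymond” displayed above it; unmatched faces are left unannotated.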