Whether you want to build a complex deep learning model for a self-driving car, a live face recognition program, or making your image processing software for your graduate project, you will have to learn OpenCV along the way.
OpenCV is a huge image and video processing library designed to work with many languages such as python, C/C++, Java, and more. It is so popular and powerful that it is the foundation for many of the applications you know that deal with image processing.
Getting started with OpenCV can be challenging, primarily if you rely on its official documentation, which is known for being cumbersome and hard to understand.
Today we will learn how to work with OpenCV, and I’ll do my best to keep it simple.
Now it’s the time to install OpenCV using only one simple command:
pip3 install opencv-python
Importing a simple image
The first thing you will need to learn is importing a simple image and displaying it using OpenCV.
The code is straightforward:
import cv2 # Read the image img = cv2.imread("image.jpg") # Display the image cv2.imshow("Image", img) # Wait for a keypress cv2.waitKey(0) # Clean up cv2.destroyAllWindows()
After reading the code, if you think that we are doing more than just loading the image, you are right. After all, loading an image with OpenCV goes down to only one line of python code:
img = cv2.imread("image.jpg")
So, what about the rest? Well… the first thing we have to do is to import the library. Only then we can read the image using the
imread method and pass the image’s path as the only parameter.
If we stop the program now, we would have loaded the image but done nothing with it, so instead, let’s at least present the image into a new window so the user can see the result. For that, we will use
cv2.imshow and passing the window name and the image as arguments.
Lastly, we tell Python not to exit the program until we press a key or close the window. Then we clean everything up by destroying all windows we opened.
OpenCV is great at dealing not only with images but also with videos. The video streams can be loaded from a video file or directly from a video source such as a webcam.
In the next example, we will load a video from the webcam and present it on a new window:
import cv2 # Load the video stream video = cv2.VideoCapture(0) while(True): # Capture each frame as an image ret, frame = video.read() # show the image on the screen cv2.imshow('frame', frame) # Stop the playback when pressing ‘q’ if cv2.waitKey(1) == ord('q'): Break # Release the video from memory video.release() # Clean up cv2.destroyAllWindows()
The code is self-explanatory, but let’s review it in detail. We use the method
VideoCapture to load the video resource. The first argument defines what input we are reading. Passing a
0, we are referring to the main webcam (if existent). In case you have multiple webcams connect, you can use
2, etc. If your video is captured and saved in a file, you can pass a string with the path to the file.
Next, we start a loop that will only end on user command, but more on that later. What’s important here is what happens inside the loop. The first thing we are doing is asking our
VideoCapture to read a frame of the video. In the case of the camera, it will be a snapshot of the camera at that time, and in the case of a video file, it will be the current video frame.
Each frame we read from a video load in the same way as an image is crucial because it means that we have the entire arsenal of OpenCV functions at our disposal when dealing with videos.
For example, the frame’s output captured with
read can be passed to the method
imshow exactly as we did in the previous example of working with images.
Now the video is playing, but there’s no way out of the
while loop, so let’s build an exit strategy by detecting if the
q key has been pressed. If it is, then we exit the loop for the cleanup activities.
We have an additional step for clean up, which releases the camera or video file because even if we are not reading any more frames, we still have the objects open in memory. We can do that by using the method
release from the
Changing image sizes has a wide range of applications, from optimizing for sizes, zooming or even feeding a neural network to perform some magic. If resizing an image is what you want, OpenCV got you covered.
Let’s now see an example of how to resize an image:
import cv2 img = cv2.imread("image.jpg") scale = 60 width = int(img.shape * scale / 100) height = int(img.shape * scale / 100) dim = (width, height) resized_img = cv2.resize(img, dim, cv2.INTER_AREA) cv2.imshow("Resized_Image", resized_img) cv2.waitKey(0) cv2.destroyAllWindows()
It’s pretty simple, so we added some flavor to it, and instead of simply resizing the image to a specific size, we are scaling down the image by a factor of X (60% for the example). Note that the code will be simpler if we are targeting specific dimensions.
resize expects at least two arguments, the image to be resized and the new dimensions (in x and y as a tuple). Optionally we can pass the third argument to define the interpolation as described on the resize function docs
Switching color spaces
When we read an image with OpenCV we think of colors as channels or the depth of the image array where each channel or dimension corresponds to a color. The most common color space and the one you probably already know is RGB, consisting of 3 channels: red, green, and blue. But other systems can represent a color on an image, like LAB, YCrCb, HLS, and HSV, among others. Each of them with different characteristics worth studying and learning.
A more popular option for setting colors on an image is grayscale, where only one channel defines each pixel. Let’s see an example of how we can transform a color image into a greyscale one.
import cv2 img = cv2.imread("image.jpg") gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) cv2.imshow("Gray", gray) cv2.waitKey(0) cv2.destroyAllWindows()
The function where all the magic happens is
cvtColor, which expects two arguments, the image and the color space, and returns the new image without altering the original. Fortunately, OpenCV has defined values for each known color space transformation. In our case, we use
COLOR_BGR2GRAY, which transforms BGR to GRAY.
So what’s BGR? It’s the default way that OpenCV loads images.
We often need to save the image result after processing it, maybe after changing its color space, making image transformation, or whatever operation we do on that image.
The following code shows you how to save an image after changing its color to grayscale:
import cv2 img = cv2.imread("image.jpg") gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) cv2.imshow("Gray", gray) cv2.imwrite("image_gray.jpg", gray) cv2.waitKey(0) cv2.destroyAllWindows()
You can see that we’ve used a function called
imwrite responsible for saving the gray image in our computer after specifying the path for the newly saved image. The actual image that we need to save is the variable gray in this case.
OpenCV offers tools to smooth an image and help reduce the noise in it. The process is fully automated, and all the complexity of how it works is encapsulated for us on a single, simple to use function.
Smoothing an image intends to improve its quality, though not perfect, in some scenarios, it can be a substantial change and key for using the image on further processes.
Here is an example of how to smooth an image:
import cv2 img = cv2.imread("early_1800.jpg") blur = cv2.blur(img, (5, 5)) cv2.imshow("Blur", blur) cv2.waitKey(0) cv2.destroyAllWindows()
The image looks much better, but how does it work? Through the method
blur on the OpenCV library, which expects the image and the kernel size as arguments. The kernel size being a tuple to reflect the x and y axis. Note that different values of x and y will result in different outputs, so you will have to play around with those values for your images.
The kernel size works by taking a small pixel area (5x5 in our case), taking the average value of those pixels, and replacing the real one (pixel) to get the new little noisy image.
There are other ways to smooth an image using for example,
medianBlur that works similarly.
Drawing On Images
So far, we have been playing with the images without adding anything new to them. It’s time we change that. OpenCV allows us not only to perform transformations and effects to the images but to change them or draw on them.
Drawing on images can be useful if, for example, you are trying to make an object tracking program or face recognition program where you want to draw a square or shape to highlight the identified objects.
Let’s draw a few geometric shapes on images to show how it works.
Drawing a Line
We will try to draw a line on an image using the
import cv2 img = cv2.imread("image.jpg") line = cv2.line(img, (20, 20), (150, 150), (255, 0, 0), 5) cv2.imshow("Line", line) cv2.waitKey(0) cv2.destroyAllWindows()
line function expects the image and four more arguments: the start of the line in (x1, y1), the end of the line in (x2, y2), the line’s color (in BGR for our image), and its thickness in pixels.
Drawing a Rectangle
I think rectangles are the most used shape, at least in the AI world, as they are commonly used to track objects like faces, cars, or traffic signs on the images. They are also super easy to use. Here is an example:
import cv2 as cv img = cv2.imread("image.jpg") rectangle = cv2.rectangle(img, (200, 200), (450, 450), (255, 0, 0), 5) cv2.imshow("Rectangle", rectangle) cv2.waitKey(0) cv2.destroyAllWindows()
rectangle function is very similar to the
line function. It expects the image and four more arguments: the top-left corner of the rectangle in (x1, y1), the bottom-right of it in (x2, y2), the line’s color (in BGR for our image), and its thickness in pixels.
###. Drawing a Circle
The last thing we will draw is a small circle on the image which is sometimes useful when you are tracking a circle object like a ball.
import cv2 img = cv2.imread("image.jpg") circle = cv2.circle(img, (300, 300), 50, (255, 0, 0), 5) cv2.imshow("Circle", circle) cv2.waitKey(0) cv2.destroyAllWindows()
Again all these functions are pretty similar. To render a circle on an image, we use the
circle function that expects the image and four more arguments: the center point of the circle in (x, y), the radius in pixels, the color of it, and the line thickness.
OpenCV is an exciting and powerful library for dealing with images and videos. Its wide range of uses goes from a simple helper library to perform image manipulations to implement state-of-the-art computer vision algorithms.
Today we covered just a small percentage of what this library is capable of. If you enjoy this reading, I recommend checking my article on essential OpenCV functions , which will extend your knowledge of the library a bit more.
Computer vision is a topic that fascinates me, and I’ll write more about OpenCV in the future, so stay tuned.
Thanks for reading!