Few of us go very long without watching some form of video. It’s as common in our lives as texting, online shopping and social media. As video technology becomes more embedded in our daily routines, it’s converging with online commerce in ways that were inconceivable until very recently.
Advertisers and retailers are always looking for ways to make online shopping (or e-commerce) easier and more relevant for their customers. Likewise, those customers — burdened as they often are with too many choices — frequently look to trusted sources for recommendations to help guide their purchasing decisions.
For instance, if you’re looking for a new skillet, you might look to buy the same one that’s used by the chef on your favorite cooking show. Product placement, of course, has a long history, and that skillet likely didn’t get there by accident. Chances are, your favorite chef is using that particular skillet because of a marketing campaign.
And until recently, if you wanted to buy that skillet, you either had to know what it was (the manufacturer, model and size) or the chef would need to mention the type and where you could buy it. Then, separately, you would need to start a search and complete a transaction with the manufacturer or a retailer. It was a multi-step process.
This is changing. Here at Adeia, we’ve developed a technology that streamlines purchasing directly from a video. Imagine watching a cooking show, clicking on the skillet in the video itself and getting a pop-up that directs you to a purchase page. This is what we’re calling “clickable video,” and a couple of things happen in the background to make it work.
Object Detection for Background Recognition
The first is object detection. Products show up throughout videos, movies and TV shows. In a typical cooking show, the chef uses not only a skillet but also utensils, measuring tools, appliances and more. With clickable video, every frame of that cooking show can be analyzed using object detection algorithms.
Object detection is an application of computer vision technology, where images and video are scanned and objects are recognized, identified and tagged by a computer. For object detection to work properly, the computer must be trained to recognize the various objects it may encounter.
In one cooking show, for instance, hundreds of different objects appear. Every cup, plate, bowl, spoon, spatula and skillet is a different object, and as humans we recognize them because we have learned over our lifetimes what their shapes and uses are. The computer must be trained to detect and recognize them, too.
Once recognized, items can be identified and tagged by the computer, and then tracked from frame to frame. Object detection and tracking is one of the major fields of computer vision, and it’s becoming increasingly relevant for video, because every recognizable object in each frame can be tagged, including those in the background. The show’s producers and advertisers might be marketing a skillet, but you might be interested in the spatula instead, or in a beautiful flower vase in the background, which can also be identified and tagged.
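To make frame-to-frame tracking a little more concrete, here is a minimal sketch of one common approach: matching each new detection to the previous frame’s boxes by overlap (intersection over union, or IoU). This is an illustration only, not a description of Adeia’s implementation; the names `Box`, `iou` and `associate`, and the 0.3 threshold, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes, a value between 0 and 1."""
    overlap_w = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    overlap_h = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = overlap_w * overlap_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: dict[int, Box], detections: list[Box],
              threshold: float = 0.3) -> dict[int, Box]:
    """Carry track IDs from the previous frame over to the current detections.

    Greedy matching: the best-overlapping (track, detection) pairs are linked
    first. Detections that match nothing start a new track; tracks that match
    nothing are dropped.
    """
    candidates = sorted(
        ((iou(box, det), track_id, i)
         for track_id, box in tracks.items()
         for i, det in enumerate(detections)),
        reverse=True,
    )
    updated: dict[int, Box] = {}
    used: set[int] = set()
    for score, track_id, i in candidates:
        if score < threshold:
            break  # remaining pairs overlap too little to be the same object
        if track_id in updated or i in used:
            continue
        updated[track_id] = detections[i]
        used.add(i)

    # Simplification: a real tracker would keep a global ID counter so that
    # IDs of tracks dropped in earlier frames are never reused.
    next_id = max(tracks, default=-1) + 1
    for i, det in enumerate(detections):
        if i not in used:
            updated[next_id] = det
            next_id += 1
    return updated
```

Called once per frame with the previous result, `associate` keeps the skillet’s ID stable as it moves across the countertop, so a tag (and its link) can follow the object.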
Tags are represented as rectangular “regions of interest” (ROIs) and are encoded into the video as background data, readable by the computer but not visible to the person watching. The video can then be processed so that a click on the object (i.e., inside the rectangle) links to an e-commerce site where the item can be purchased.
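As a rough sketch of how a click could be resolved against these rectangles, the example below stores ROIs per frame and returns the link for the ROI under the cursor. The `Roi` type, the `resolve_click` function, the per-frame dictionary and the example.com URLs are all hypothetical; the actual encoding of ROIs in the video is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Roi:
    """A tagged rectangular region of interest (hypothetical layout)."""
    label: str
    x: int       # left edge, in pixels
    y: int       # top edge, in pixels
    width: int
    height: int
    url: str     # where a click on this object should lead

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) falls inside the rectangle."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def resolve_click(frame_rois: dict[int, list[Roi]], frame: int,
                  px: int, py: int) -> Optional[str]:
    """Return the link for the ROI clicked in the given frame, if any."""
    hits = [roi for roi in frame_rois.get(frame, []) if roi.contains(px, py)]
    if not hits:
        return None
    # Assumption: when rectangles overlap, the smallest one is the most
    # specific target (e.g., a spatula resting inside a skillet).
    return min(hits, key=lambda roi: roi.width * roi.height).url


# Example metadata for a single frame (placeholder values).
frame_rois = {
    120: [
        Roi("skillet", 400, 300, 200, 120, "https://example.com/skillet"),
        Roi("spatula", 450, 320, 60, 30, "https://example.com/spatula"),
    ]
}
```

Preferring the smallest rectangle is just one way to break ties; depth ordering or detection confidence would be reasonable alternatives.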
Halos for Highlighting In-Frame
To become “clickable,” each ROI needs to be indicated to the viewer, without becoming a distraction. As you can see in the frame above, a rectangle outlining every object in a video would quickly become very distracting to the viewer.
Our technology highlights objects with what we call a “halo,” which indicates to viewers that the object is clickable. In the image below, you can see a yellowish “cloud” that appears on or around objects that are clickable — not only the skillet on the countertop, but also a cookbook on the shelf behind the host, bottles of oil and vinegar, mixing bowls and so on.
Clicking on any of those halos links to an appropriate site with more information about the product or where it can be purchased, while clicking outside a halo area highlights the other halos in the frame.
Content producers are very interested in applications like clickable video because they create new opportunities to monetize, using referral links and other programs to earn bonuses for items purchased through their videos. Over time, there will be demand from consumers as well: as they become more accustomed to this technology, they’ll come to expect that every video is clickable.
Even outside an e-commerce context, some clickable objects may simply embed more information in the videos we watch. Perhaps one day, you’ll click on a painting hanging on a wall in a TV show, and while that painting might not be for sale, you may be able to learn more about it and the artist who painted it. Until now, you would have had to search online for information about the painting, which in many cases would have been difficult to search for.
This technology will become both easier to use and more capable as its components improve. As object detection algorithms get better, for instance, more objects can be detected and tagged automatically.
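The two click outcomes described above can be sketched as a small decision function: a click inside a halo produces a link to open, and a click anywhere else produces a request to highlight the halos in the frame. The types `Halo`, `OpenLink` and `HighlightAll` are hypothetical names for this illustration, and each halo is approximated by its bounding rectangle.

```python
from typing import NamedTuple, Union


class Halo(NamedTuple):
    """A clickable halo, approximated by its bounding rectangle (assumption)."""
    label: str
    left: int
    top: int
    right: int
    bottom: int
    url: str


class OpenLink(NamedTuple):
    """Action: open the page associated with the clicked halo."""
    url: str


class HighlightAll(NamedTuple):
    """Action: draw attention to every halo in the current frame."""
    labels: list[str]


def handle_click(halos: list[Halo], x: int, y: int) -> Union[OpenLink, HighlightAll]:
    """Decide what a click at (x, y) should do in the current frame."""
    for halo in halos:
        if halo.left <= x < halo.right and halo.top <= y < halo.bottom:
            return OpenLink(halo.url)
    # The click missed every halo, so reveal what is clickable instead.
    return HighlightAll([halo.label for halo in halos])
```

Returning an action object rather than opening the link directly keeps the decision separate from the player, which would be responsible for pausing playback, showing the pop-up or animating the halos.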