by Paul Melcher

There are two elements of a photograph: the image itself and its associated information. Without a name, a credit, a description or keywords, the life of an image is greatly endangered, if not doomed. In our text-based search world, an image without its information loses the ability to be found, forever. In order to prevent such a fate, processes are in place to add relevant metadata to every single image. And while adding information was already a tedious task with chemical film, the process has become nearly impossible with digital files. The reason? Volume.

It is estimated that 3.934 trillion digital photos will be stored on hard drives and other storage systems this year. With all the good will (and time) in the world, keywording each one of them is an impossible task: if you spent only one second keywording each one, it would still take about 125,000 years.
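The arithmetic behind that figure is simple to check, using the article's 3.934 trillion estimate:

```python
# Back-of-the-envelope check of the keywording estimate above.
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # 31,536,000 seconds

photos = 3.934e12              # estimated photos stored this year
seconds_needed = photos * 1    # one second of keywording per photo
years = seconds_needed / SECONDS_PER_YEAR
print(f"{years:,.0f} years")   # roughly 125,000 years
```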


Enter image recognition. Because photos are now digital (a file of binary data), it is possible for computers to process them and extract information. Image recognition does this in two ways: by searching for patterns and by learning what each pattern corresponds to. Processing huge amounts of data is what machines excel at, while learning is where artificial intelligence steps in. Similar to how we teach our children to identify new objects, image recognition programs learn by being shown examples. But unlike humans, who can recognize a chair in many new configurations after seeing just one example, machines need to see many examples (anywhere from 300 to 1,000). Once a machine has been taught, it can recognize the new object in practically any situation.
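The train-then-recognize loop described above can be sketched with a toy nearest-centroid classifier. This only illustrates the idea of learning from labeled examples; production systems use deep neural networks, and the two-value "images" and labels here are invented for the example:

```python
# Toy "learning from examples": a nearest-centroid classifier.
# Each "image" is reduced to two brightness values; real systems learn
# from hundreds of full images per class, as noted in the text.

def train(examples):
    """examples: {label: [feature_vectors]} -> one centroid per label."""
    centroids = {}
    for label, vecs in examples.items():
        n = len(vecs)
        centroids[label] = [sum(v[i] for v in vecs) / n
                            for i in range(len(vecs[0]))]
    return centroids

def recognize(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(centroids[label], vec))

examples = {
    "bright": [[0.9, 0.8], [0.8, 1.0], [1.0, 0.9]],
    "dark":   [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0]],
}
model = train(examples)
print(recognize(model, [0.85, 0.95]))  # -> bright
```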

It is now possible to automatically keyword very large numbers of images in a minimum amount of time. 10,000 images, for example, can easily be tagged with multiple keywords in about two hours. Furthermore, it can be done at a fraction of the current cost: most image recognition vendors charge as little as $15 to process 10,000 images, or $0.0015 (0.15 cents) per image.
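That per-image price follows directly from the batch price (the $15 figure is the article's example vendor rate):

```python
batch_cost = 15.00           # example vendor price for a batch, in dollars
batch_size = 10_000          # images per batch
per_image = batch_cost / batch_size
print(per_image)             # 0.0015 dollars, i.e. 0.15 cents per image
```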


But keywording (or auto-tagging) is only the beginning of what image recognition can do. Since image recognition programs can be trained to recognize almost anything, they are a powerful tool for moderation. The software can easily flag nudity or violence so that such images can be blocked from public view. Or, for a stock photo agency, image recognition can pinpoint a face or monument in order to check for a related release.

Image recognition companies like Imagga also provide content-aware cropping. This service automatically identifies the main subject of an image, regardless of its position in the frame, in order to create a perfect crop. Others, like EyeEm, can determine the aesthetic quality of images, even allowing a personalized ranking based on any specific visual requirement.

Marketing companies also use image recognition to identify the content of the visual conversations being held on social media. By analyzing the millions of images being shared on Facebook, Instagram, Twitter and others, these companies help brands pinpoint the perfect targets of their next campaigns.

But image recognition has even broader applications. The technology helps doctors analyze medical imaging and suggest a diagnosis. It helps farmers survey thousands of acres of crops via drones and automatically spot the areas that need more water or are affected by disease. It helps self-driving cars know the difference between a lamppost and a pedestrian about to cross the street. Image recognition can also help your fridge know exactly what is inside its door and alert you if your fruit is about to go bad.


Image recognition company Imagga offers an API that can be plugged into any application.
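As a rough illustration of how such an API is typically called, the sketch below only builds a tagging request. The endpoint, parameter name, and response shape are assumptions based on common REST conventions, so check Imagga's current API documentation before relying on them:

```python
import urllib.parse

# Assumed endpoint and parameter name -- verify against the vendor's docs.
API_ENDPOINT = "https://api.imagga.com/v2/tags"
image_url = "https://example.com/photo.jpg"

query = urllib.parse.urlencode({"image_url": image_url})
request_url = f"{API_ENDPOINT}?{query}"
print(request_url)

# Sending this with an HTTP client and Basic auth (API key and secret)
# would typically return JSON tags with confidence scores, e.g.
# {"result": {"tags": [{"tag": {"en": "cat"}, "confidence": 93.4}]}}
```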


Because our world is increasingly visual (by 2020, there will be 45 billion cameras), it is estimated that processing all this visual data via image recognition will amount to an annual $38.92 billion industry by 2021. This financial projection is why the biggest companies in the world have arms entirely dedicated to it. IBM, Microsoft, Amazon, and Google all offer access to their technology via API. Very specialized companies like Imagga or Clarifai also offer dedicated services, and unlike the Fortune 500s, they offer more personalized and custom-made solutions with the same quality of results. Still others like Adobe, Shutterstock, or Getty prefer to keep it for themselves. They use image recognition to process the thousands of new images they receive daily, as well as for re-indexing their archives.

While integration requires a rather advanced knowledge of coding, more and more general-purpose products include image recognition out of the box. Digital asset management (DAM) providers, for example, are currently deploying it in their offerings, and companies like Imagga offer a free Lightroom plug-in for smaller collections and individual photographers.


Pretty soon all cameras, including our phones, will have image recognition and will automatically add keywords as we take pictures. © Steve Jurvetson


Today, image recognition is a very powerful and cost-effective tool to quickly categorize very large amounts of images, but it is still in its infancy. While very accurate, it doesn't offer 100% perfect results (but then again, humans don't either). Variations in lighting or perspective, or image imperfections, can throw off its parameters and return inaccurate results. However, the technology is progressing very fast—so fast that the real limitation to its growth is processing power.

We can expect all our cameras to have built-in auto-tagging features in the near future. Each image taken will automatically contain all the appropriate keywords, down to the name of the person in the frame, without any human action involved in the process.

Another area of future progress will be context understanding. While image recognition programs are experts at recognizing content, they currently have a very hard time making sense of context. For example, a photo of a cat seen in the reflection of a mirror will return "cat" but not "mirror." With context understanding, image recognition will be able to go beyond just describing what it sees to revealing how objects interact with each other, adding a level of understanding that may go beyond human perception.

Paul Melcher is managing director at MelcherSystem. He is an entrepreneur, advisor, and consultant with a rich background in visual tech, content licensing, and technology, and more than 20 years of experience in developing world-renowned photo-based companies. In his spare time, he writes about photography on Thoughts of a Bohemian and visual tech on Kaptur.
