A New Technology To Help Visually Impaired Shoppers Is Being Tested By IBM in Brazil

Imagine you are a tourist in a country that uses a different alphabet, say Japan. You see a vending machine, but you can't tell which products it sells because you can't read their names.

Or you're in an airport or a train station in the same country, and you can't read the board to find out whether your flight is late or on time. The same thing happens at home if you are one of the many visually impaired customers who have trouble accessing information displayed in public spaces.

So, what do these situations have in common? Each involves a fixed layout (the vending machine or the flight board) whose content changes dynamically and, for one reason or another, can't be accessed the usual way. One solution could be to use QR codes to identify the products, or an iPhone app like VizWiz, which crowdsources the recognition to an online community. Each of these approaches has its pros and cons: QR codes, for instance, require connectivity, while the effectiveness of crowdsourcing depends on how fast the community responds.

IBM's researchers in Brazil are trying a different path: instead of focusing on the content, which could be too time-consuming or require too much computational power, not to mention connectivity, they are focusing on the layout. The secret is to place identifying markers around a display and to simplify the identification of its content by matching it against training templates.


"It works like this," researcher Andrea Britto Mattos says. "We add four markers to the scene in which we want to recognize image or textual content. Then, we take a picture, in the frontal position, where the four markers and the target container are visible, computing their position and storing the template model."

When an image is provided as input by the user, through a smartphone app, instead of trying to locate the target objects, the algorithm locates the markers, which is easier. Based on their position, it is possible to infer the location of the target containers. Then, once the target objects are located, any recognition method can be applied.
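The article does not say how the prototype gets from marker positions to container positions. One common way to do it, sketched here purely as an assumption, is to compute the projective transform (homography) that maps the four marker positions stored in the frontal template onto the four marker positions detected in the user's photo, then apply the same transform to the container's corners. All names and coordinates below are made up for illustration.

```python
# Sketch only: the homography approach, names, and coordinates are assumptions,
# not details of IBM's prototype.

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("markers are degenerate (for example, collinear)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 projective transform mapping four src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(h, point):
    """Apply the transform h to a single (x, y) point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Frontal template: four marker positions and one product slot (hypothetical).
template_markers = [(0, 0), (100, 0), (100, 200), (0, 200)]
template_slot = [(20, 30), (80, 30), (80, 60), (20, 60)]

# Marker positions detected in the user's photo (hypothetical).
photo_markers = [(10, 20), (210, 20), (210, 420), (10, 420)]

h = homography(template_markers, photo_markers)
slot_in_photo = [project(h, corner) for corner in template_slot]
```

Once `slot_in_photo` is known, the region can be cropped and handed to whatever recognition method is in use. The same transform, inverted, is also one way to undo perspective distortion in a photo taken at an angle.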

The researchers propose one based on supervised learning, in which the target objects are matched to training templates. "We can assign one unique set of four markers," Britto Mattos explains, "to some fixed set of objects, therefore reducing the size of the training set. Hence, once the markers are identified, the matching operation can be done on a restricted universe, making the recognition problem computationally easier."
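The idea of a restricted universe can be illustrated with a minimal sketch. It assumes, hypothetically, that each set of four marker IDs is registered against a small group of product templates and that products are compared as feature vectors by nearest-neighbor distance. The registry, the feature values, and the function name are invented for the example; the article does not describe the actual classifier.

```python
import math

# Hypothetical registry: each set of four marker IDs is tied to a small,
# fixed group of product templates. Feature vectors are made up.
TEMPLATES_BY_MARKER_SET = {
    frozenset({1, 2, 3, 4}): {"cola": [0.9, 0.1, 0.1], "water": [0.1, 0.2, 0.9]},
    frozenset({5, 6, 7, 8}): {"chips": [0.8, 0.7, 0.1], "gum": [0.2, 0.9, 0.3]},
}


def recognize(marker_ids, features):
    """Match features only against templates registered for these markers."""
    candidates = TEMPLATES_BY_MARKER_SET.get(frozenset(marker_ids))
    if candidates is None:
        return None  # unknown marker set: nothing to match against
    # Nearest template by Euclidean distance (an assumed, simple matcher).
    return min(candidates, key=lambda name: math.dist(features, candidates[name]))


same_features = [0.85, 0.15, 0.2]
on_machine_a = recognize([4, 3, 2, 1], same_features)
on_machine_b = recognize([5, 6, 7, 8], same_features)
```

The same feature vector resolves to a different product depending on which markers surround it, because only the templates tied to that marker set are ever compared.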

In a test, 700 product images were evaluated this way, with a recognition rate of 88.85%.

An advantage of this technique is that it allows perspective correction. Unlike a QR code reader, which displays an error message if the image is not taken correctly, the IBM solution prompts the customer to retake the picture and provides verbal instructions on how to capture the image effectively.

"Since we use four markers," Britto Mattos says, "it's hard for a blind person to take a picture in which all of them are visible. But we can use the markers' IDs to help the visually impaired user put their camera in a proper position. For example, if you take a picture in which only the left markers are visible, we are able to get this information from the markers' IDs and we can output a message telling the user to move their camera to the right."
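The guidance logic in that example can be sketched in a few lines. The marker numbering, the message wording, and the vertical hints are assumptions added for illustration; only the "left markers visible, so move right" case comes from the researcher's description.

```python
# Hypothetical mapping from marker ID to the corner where it is placed.
CORNER_OF = {
    0: ("top", "left"),
    1: ("top", "right"),
    2: ("bottom", "left"),
    3: ("bottom", "right"),
}


def guidance(visible_ids):
    """Turn the set of detected marker IDs into a spoken instruction."""
    visible = [CORNER_OF[i] for i in visible_ids if i in CORNER_OF]
    if not visible:
        return "No markers found. Point the camera at the display."
    if len(set(visible)) == len(CORNER_OF):
        return "All markers visible. Hold still."
    rows = {row for row, _ in visible}
    sides = {side for _, side in visible}
    moves = []
    if sides == {"left"}:      # only left-hand markers seen
        moves.append("right")
    elif sides == {"right"}:   # only right-hand markers seen
        moves.append("left")
    if rows == {"top"}:        # only top markers seen
        moves.append("down")
    elif rows == {"bottom"}:   # only bottom markers seen
        moves.append("up")
    if moves:
        return "Move the camera " + " and ".join(moves) + "."
    return "Step back so all four markers fit in the picture."


message = guidance({0, 2})  # only the two left markers were detected
```

In an app, the returned string would be passed to the phone's text-to-speech engine before the user retakes the picture.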

Once the picture is taken correctly, to return to the vending machine case, visually impaired users could receive an audio playback of the item associated with each number on the keypad, allowing them to confidently identify and purchase the item they want without having to actually see it. This method is just a prototype at the moment, but it does hold promise for the near future.

"Right now," the researcher says, "we are trying to arrange experiments here in Brazil with some institutions that work with systems of blind people because there are some interface details about which we believe that they can give us some very useful inputs."

And it's not only blind people who could benefit from this technique: marketers could use it to optimize displays for all customers who enter their stores and to gather information about the content of shelves with fixed layouts.

"In terms of commercial application," Britto Mattos tells me, "we could use this for retail monitoring because another problem that we have here is that sometimes, the retail companies, they need to know what's being displayed on their shelves for providing analytics of replenishment and avoiding products that should be out of shelf and this kind of thing." There's still some fine-tuning to be done, however, before this scenario becomes reality. And it's still too early to tell whether this method could really have a commercial future.

"We don't really have a very fixed chronogram right now, so our next step is to do these experiments with the blind users. We are also arranging with a retail company here in Brazil to do some experiments in a real supermarket. We believe that next year, we will probably have some more interesting results with real scenarios and real users."
