Trying the Microsoft Computer Vision API

I've been working on a project recently for which I hope to leverage some 3rd party APIs to do some basic image recognition tasks.

Image recognition is difficult. Although I know the methods and techniques to build state-of-the-art computer vision systems, such systems always take a fair amount of fine-tuning and testing. Further, the best approaches (I'm speaking of deep learning, of course) require significant computational resources and time. I need something relatively scrappy to bootstrap this problem.

Ideally, I would have preferred to leverage transfer learning directly. My use case is a bit specialized, but I should be able to use this powerful technique to quite figuratively stand on the shoulders of giants. Transfer learning is the approach of taking a neural network that has been trained by someone else, replacing its input and output layers with your own, and beginning your own training on this "parboiled" network.

I would like to live in a world where there's a marketplace of open source and potentially commercial starter networks that anyone could easily fork to begin their own vision projects. We're not quite there yet, and I actually need something that's just good enough for my scrappy new project, so I decided to explore what API services could do for me.

I want to find a 3rd party API that is just good enough for me to not need to create my own. I have been aware of Microsoft Cognitive Services for some time. I helped deploy their speech APIs and text analytics APIs to clients of mine on separate occasions. The results were just good enough that I thought I'd start with a test of Microsoft Cognitive Services Computer Vision API to see if it could meet my needs.

I spent a few minutes wandering about the house and taking some photos to use as test cases. You can see my test photos via the link below.

https://photos.google.com/share/AF1QipMxbdarrZ8N-Axst2lnYLur5TWHFYy2ZN6g5-jFNluBMlJCUFRQKf2Ed5ZLy4jUng?key=clZ5TFJWN1pMeW5WOEYwcVlmZ2FWTG1FMTdkYWZ3

In case you're curious, these were all taken with my brand new Google Pixel phone, although any poor quality is likely a reflection of my lack of photography skills rather than any hardware limitation.

Conclusion

This quick trial leaves me a little lukewarm on the current state of the service. On the plus side, it makes very few boneheaded errors: the API seems more likely to return no result than a spurious one. That's good. I'm willing to overlook a yellow squash being labelled as a banana, although there are more than a few weird ones in my photo set.

Overall, I think this Cognitive Services API can be useful to me in situations where I'm going to make suggestions to a user but always require their (perhaps single click) approval and the opportunity to edit the results. Perhaps by the time I get back around to building my own system, the API will have improved without breaking :)

You can review the details of my API calls in the rest of this post.

Some setup code below

%matplotlib inline
import matplotlib.pyplot as plt
import os
import json
import numpy as np
import math
import httplib, urllib, base64
from IPython.core.display import HTML
photos = os.listdir('./photos')
# This is how I am storing my keys for my Cognitive Services tests; alternatively,
# just set your own value for `key` in the function below.
with open('azure.keys', 'r') as f:
    keys = json.load(f)
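
For reference, the only entry this post reads from that file is 'computer-vision', so a minimal azure.keys might look like the sketch below. The placeholder key value and the temporary directory are assumptions for illustration; the real file may well hold keys for other services too.

```python
import json
import os
import tempfile

# Hypothetical contents of azure.keys: a JSON object mapping a service name
# to its subscription key. Only 'computer-vision' is read later in this post.
example_keys = {'computer-vision': 'YOUR-SUBSCRIPTION-KEY'}

# Write the example into a temporary directory so a real azure.keys file
# is never overwritten.
path = os.path.join(tempfile.mkdtemp(), 'azure.keys')
with open(path, 'w') as f:
    json.dump(example_keys, f, indent=2)

# Read it back, as the setup code does.
with open(path, 'r') as f:
    keys = json.load(f)

print(keys['computer-vision'])
```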

Calling the API

The function below is pretty close to the boilerplate code provided by Microsoft. The API offers a few types of feature extraction options. I decided to get the category, tags, and description. I threw in 'adult' out of curiosity. I expect none of my photos to be regarded as adult content. Trust, but verify.

def get_vision_api_response(url):
    key = keys['computer-vision']
    headers = {'Content-Type': 'application/json', 'Ocp-Apim-Subscription-Key': key}

    params = urllib.urlencode({
        # Request parameters
        'visualFeatures': 'categories,tags,description,adult',
        'language': 'en',
    })
    body = json.dumps({'url': url})
    try:
        conn = httplib.HTTPSConnection('api.projectoxford.ai')
        conn.request("POST", "/vision/v1.0/analyze?%s" % params, body, headers)
        response = conn.getresponse()
        data = response.read()
        o = json.loads(data)
        conn.close()
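    except Exception as e:
        # Not every exception carries errno/strerror (a JSON decoding error
        # does not, for example), so print the exception itself.
        print("Request failed: {0}".format(e))
        return None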
    return o
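
The function above relies on Python 2's httplib and urllib.urlencode. On Python 3 those live in http.client and urllib.parse, so a rough equivalent might look like the sketch below. The helper names build_analyze_request and analyze_image are my own, not Microsoft's; the host, path, headers, and parameters are copied from the function above, and I have not checked them against the current service.

```python
import http.client
import json
from urllib.parse import urlencode

API_HOST = 'api.projectoxford.ai'  # host taken from the function above


def build_analyze_request(image_url, key):
    """Return the (path, body, headers) triple for one analyze call."""
    params = urlencode({
        'visualFeatures': 'categories,tags,description,adult',
        'language': 'en',
    })
    path = '/vision/v1.0/analyze?%s' % params
    body = json.dumps({'url': image_url})
    headers = {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': key,
    }
    return path, body, headers


def analyze_image(image_url, key):
    """Send the request; return the parsed JSON, or None on failure."""
    path, body, headers = build_analyze_request(image_url, key)
    conn = http.client.HTTPSConnection(API_HOST)
    try:
        conn.request('POST', path, body, headers)
        data = conn.getresponse().read()
        return json.loads(data.decode('utf-8'))
    except (OSError, http.client.HTTPException, ValueError) as e:
        print('Request failed: {0}'.format(e))
        return None
    finally:
        # Close the connection whether or not the request succeeded.
        conn.close()
```

Splitting the request construction from the network call keeps the first half testable without a subscription key.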
results = {}
bucket = 'https://s3.amazonaws.com/microsoft-vision-test/'
for photo in photos:
    url = bucket + photo
    if photo not in results:
        o = get_vision_api_response(url)
        results[photo] = o
rows = []
for photo in photos:
    # A failed request is stored as None; treat it like an empty response.
    o = results[photo] or {}
    categories = []
    if 'categories' in o:
        categories = [c['name'] for c in o['categories']]
    adultScore = 'N/A'
    racyScore = 'N/A'
    if 'adult' in o:
        adultScore = str(math.floor(o['adult']['adultScore'] * 1000) / 10) + '%'
        racyScore = str(math.floor(o['adult']['racyScore'] * 1000) / 10) + '%'
    row = '<div>'
    row += '<p><img src="' + bucket + 'thumb_' + photo + '" /></p>'
    if 'description' in o:
        tags = []
        if 'tags' in o['description']['captions'][0]:
            tags = [t['name'] for t in o['description']['captions'][0]['tags']]
        tags.extend(t['name'] for t in o['tags'])
        row += '<p><b>Caption:</b> ' + o['description']['captions'][0]['text'] + '</p>'
        row += '<p><b>Tags:</b> ' + ', '.join(tags) + '</p>'
        row += '<p><b>Categories:</b> ' + ', '.join(categories) + '</p>'
        row += '<p><b>Adult Score:</b> ' + adultScore + ', <b>Racy Score:</b> ' + racyScore + '</p>'
        row += '<br/><br/><br/>'
    else:
        row += '<p>N/A</p>'
        row += '<br/><br/><br/>'
    row += '</div>'
    rows.append(row)
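
To make the row-building loop above easier to follow, here is a sketch of the fields it reads from a single response. The sample dictionary is hand-written from the keys the loop accesses, with raw scores I invented for illustration; it is not a captured API response. The helpers as_percent and summarize are hypothetical names of my own.

```python
import math

# Hand-written sample, NOT a real API response: it contains only the keys the
# loop above reads, and the raw scores are invented for illustration.
sample = {
    'categories': [{'name': 'abstract_'}, {'name': 'indoor_room'}],
    'tags': [{'name': 'indoor'}, {'name': 'parrot'}, {'name': 'bird'}],
    'description': {'captions': [{'text': 'a parrot sitting on a table'}]},
    'adult': {'adultScore': 0.0113, 'racyScore': 0.0094},
}


def as_percent(score):
    """Truncate a 0-1 score to a percentage with one decimal place."""
    return str(math.floor(score * 1000) / 10) + '%'


def summarize(o):
    """Pull the caption, tags, categories, and scores out of one response."""
    captions = o.get('description', {}).get('captions', [])
    adult = o.get('adult')
    return {
        'caption': captions[0]['text'] if captions else 'N/A',
        'tags': [t['name'] for t in o.get('tags', [])],
        'categories': [c['name'] for c in o.get('categories', [])],
        'adult': as_percent(adult['adultScore']) if adult else 'N/A',
        'racy': as_percent(adult['racyScore']) if adult else 'N/A',
    }


summary = summarize(sample)
empty = summarize({})  # what a response with no usable fields turns into
print(summary)
print(empty)
```

Unlike the loop above, this version also tolerates an empty captions list, which seemed a cheap guard to add.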

Results

HTML(''.join(rows))

Caption: a building with a wooden floor

Tags: floor, indoor, building, wall, wood

Categories: abstract_, others_

Adult Score: 0.8%, Racy Score: 1.1%




N/A

Caption: a brown leather couch sitting on a table

Tags: indoor, wall, floor, living, sofa, seat, leather

Categories: abstract_, indoor_room, others_

Adult Score: 1.6%, Racy Score: 3.3%




Caption: a parrot sitting on a table

Tags: indoor, parrot, bird

Categories: abstract_, indoor_room

Adult Score: 1.1%, Racy Score: 0.9%




Caption: a parrot sitting on a table

Tags: parrot, bird, sitting, animal

Categories: abstract_, others_

Adult Score: 3.9%, Racy Score: 2.0%




Caption: a close up of a laptop

Tags: indoor

Categories: others_, outdoor_, text_sign

Adult Score: 2.6%, Racy Score: 1.6%




N/A

Caption: a bouquet of flowers in a vase on a table

Tags: wall, flower, plant, bouquet, indoor, white, different, colored, several

Categories: plant_flower

Adult Score: 1.8%, Racy Score: 1.8%




Caption: a blender sitting on a counter

Tags:

Categories:

Adult Score: 1.7%, Racy Score: 6.1%




N/A

Caption: a close up of a microwave

Tags: indoor, oven

Categories: abstract_shape, others_

Adult Score: 2.6%, Racy Score: 5.3%




N/A

Caption: person sitting on a table next to a palm tree

Tags: plant

Categories: others_, outdoor_

Adult Score: 1.0%, Racy Score: 1.8%




Caption: a bag of food

Tags:

Categories: others_, outdoor_

Adult Score: 2.6%, Racy Score: 3.8%




Caption: a parking meter sitting next to a fire hydrant

Tags: ground, outdoor, meter, device

Categories: abstract_, others_, outdoor_

Adult Score: 2.1%, Racy Score: 10.5%




N/A

Caption: a suitcase sitting on a bench

Tags:

Categories: abstract_, abstract_rect, others_, outdoor_

Adult Score: 0.6%, Racy Score: 0.8%




Caption: a toothbrush is sitting next to a sink

Tags: indoor, sink

Categories: others_, outdoor_

Adult Score: 2.1%, Racy Score: 10.4%




Caption: a close up of a sink

Tags: indoor, sink

Categories: others_, outdoor_

Adult Score: 3.3%, Racy Score: 3.7%




Caption: a green plant

Tags: wall, plant

Categories:

Adult Score: 1.0%, Racy Score: 1.6%




Caption: a cat sitting on a table

Tags: indoor

Categories: others_

Adult Score: 1.8%, Racy Score: 2.4%




Caption: a bed with a book

Tags: clothing, indoor, underpants

Categories: others_

Adult Score: 4.8%, Racy Score: 11.9%




Caption: a white toilet sitting next to a sink

Tags: floor, indoor, toilet, sink, tiled, tile, bathroom

Categories: abstract_, abstract_shape, others_, outdoor_

Adult Score: 1.7%, Racy Score: 1.5%




N/A

Caption: an old photo of a large mirror

Tags: gallery, room, picture frame

Categories: abstract_rect, others_

Adult Score: 4.4%, Racy Score: 7.9%




Caption: a close up of a box

Tags: indoor, picture frame, case, box

Categories: abstract_rect

Adult Score: 5.4%, Racy Score: 4.0%




Caption: a clock on the wall

Tags: indoor

Categories: others_

Adult Score: 0.6%, Racy Score: 0.9%




Caption: a banana sitting on a wooden cutting board

Tags: wooden, fruit, wood, half, sliced

Categories: abstract_, others_

Adult Score: 1.6%, Racy Score: 1.9%




Adult Content Detection

This is a situation in which most people have a low tolerance for false positives. Strangely, for a few photos, the system was unable to provide any score.

While this is by no means a good test of this service's ability to detect adult content, the system produces a correct result by rating all of this content as very unlikely to be adult or racy. Interestingly, the highest scoring photo is a picture of a t-shirt on the floor. I don't personally find it suggestive, but hey, to each their own.
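
As a quick check on that claim, the scores printed in the Results section can be tabulated by hand. The sketch below transcribes the 22 scored photos (caption, adult %, racy %) from the output above and finds the highest value in each column. The variable names are just for illustration, and the photos that came back as N/A are left out.

```python
# (caption, adult %, racy %) transcribed by hand from the Results output.
scores = [
    ('a building with a wooden floor', 0.8, 1.1),
    ('a brown leather couch sitting on a table', 1.6, 3.3),
    ('a parrot sitting on a table', 1.1, 0.9),
    ('a parrot sitting on a table', 3.9, 2.0),
    ('a close up of a laptop', 2.6, 1.6),
    ('a bouquet of flowers in a vase on a table', 1.8, 1.8),
    ('a blender sitting on a counter', 1.7, 6.1),
    ('a close up of a microwave', 2.6, 5.3),
    ('person sitting on a table next to a palm tree', 1.0, 1.8),
    ('a bag of food', 2.6, 3.8),
    ('a parking meter sitting next to a fire hydrant', 2.1, 10.5),
    ('a suitcase sitting on a bench', 0.6, 0.8),
    ('a toothbrush is sitting next to a sink', 2.1, 10.4),
    ('a close up of a sink', 3.3, 3.7),
    ('a green plant', 1.0, 1.6),
    ('a cat sitting on a table', 1.8, 2.4),
    ('a bed with a book', 4.8, 11.9),
    ('a white toilet sitting next to a sink', 1.7, 1.5),
    ('an old photo of a large mirror', 4.4, 7.9),
    ('a close up of a box', 5.4, 4.0),
    ('a clock on the wall', 0.6, 0.9),
    ('a banana sitting on a wooden cutting board', 1.6, 1.9),
]

# Pick the row with the highest adult score and the row with the highest racy score.
top_adult = max(scores, key=lambda row: row[1])
top_racy = max(scores, key=lambda row: row[2])

print('Highest adult score: {0}% ({1})'.format(top_adult[1], top_adult[0]))
print('Highest racy score: {0}% ({1})'.format(top_racy[2], top_racy[0]))
```

Run as written, this reports a maximum adult score of 5.4% and a maximum racy score of 11.9%.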