Experimenting With the Google Cloud Vision API for Landmark Detection in Images
This blog post is a detailed, long-form version of the Codelabs tutorial on the same topic: https://codelabs.developers.google.com/codelabs/cloud-vision-intro/index.html?index=..%2F..index#8
In exploring Google Cloud Platform, I wanted to try out the Vision API to see what it can do.
Here’s a link to the demo: https://cloud.google.com/vision/docs/drag-and-drop
This is pretty cool, as it detected the landmark [Atlantis hotel] and objects [bridge, building], along with labels associated with the image. The full list of features is here:
https://cloud.google.com/vision/docs/features-list
Face detection
Landmark detection
Logo detection
Label detection
Text detection
Document text detection [for example, from a PDF]
Image properties [dominant colours]
Crop hint detection
Web entities
Let’s try it with the below sample public image of the Dubai Frame.
In order to run this exercise:
Create a GCP project and enable the Vision API [you can just search for it from the top bar; after you have created the project, all you need to do is enable the API and create an API key for the project].
Go to Storage in the left navigation pane and create a storage bucket where your files will be hosted in GCP. You also need to make the images public for the API to read them [I’m not sure why; still trying to figure that out].
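If you’d rather script the bucket setup than click through the console, the same steps can be done with gsutil in Cloud Shell. The sketch below just prints the commands to run [it doesn’t execute them]; the bucket and file names are my examples, so swap in your own — bucket names must be globally unique.

```python
# Sketch: the gsutil commands to create a bucket, upload an image, and make
# it public-read, as you would run them in Cloud Shell.
import shlex

BUCKET = "visionapitestadil"  # my bucket name; use your own

def gsutil_steps(bucket: str, local_path: str, object_name: str) -> list:
    """Return the shell commands for bucket creation, upload, and public ACL."""
    return [
        f"gsutil mb gs://{bucket}",  # make the bucket
        f"gsutil cp {shlex.quote(local_path)} gs://{bucket}/{shlex.quote(object_name)}",
        # grant AllUsers read access so the Vision API can fetch the object
        f"gsutil acl ch -u AllUsers:R gs://{bucket}/{shlex.quote(object_name)}",
    ]

for cmd in gsutil_steps(BUCKET, "dubai_frame.jpg", "Landmark/dubai_frame.jpg"):
    print(cmd)
```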
Once you open Cloud Shell [top right panel], you can create a request.json file containing a link to the Cloud Storage file in the bucket.
In the request.json below, I’ve asked the Vision API to detect the landmark and return 3 labels for the image [in the same request]:
{
  "requests": [
    {
      "image": {
        "source": {
          "gcsImageUri": "gs://visionapitestadil/Landmark/Dubai frame in background.JPG"
        }
      },
      "features": [
        { "type": "LABEL_DETECTION", "maxResults": 3 },
        { "type": "LANDMARK_DETECTION", "maxResults": 1 }
      ]
    }
  ]
}
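If you want to generate this file instead of typing the JSON by hand, a small Python sketch can build the same request body [the gs:// path is the one from this post; replace it with your own object]:

```python
# Sketch: build the Vision API request body and write it to request.json.
import json

def build_request(gcs_uri: str, max_labels: int = 3) -> dict:
    """Annotate request asking for labels + landmark detection in one call."""
    return {
        "requests": [
            {
                "image": {"source": {"gcsImageUri": gcs_uri}},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_labels},
                    {"type": "LANDMARK_DETECTION", "maxResults": 1},
                ],
            }
        ]
    }

body = build_request("gs://visionapitestadil/Landmark/Dubai frame in background.JPG")
with open("request.json", "w") as f:
    json.dump(body, f, indent=2)
```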
You then go back to the Terminal in Cloud Shell and input:
curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json https://vision.googleapis.com/v1/images:annotate?key=${API_KEY}
Your API key goes at the end [here it’s read from an API_KEY environment variable, which you can set with export API_KEY=<your key>].
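The same call can be made from Python using only the standard library. This is a sketch, not the codelab’s approach: it assumes the API key is in an API_KEY environment variable and that request.json exists in the current directory; the network call only fires when the key is set.

```python
# Sketch: POST request.json to the Vision API annotate endpoint via urllib.
import json
import os
import urllib.request

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def annotate_url(api_key: str) -> str:
    """Endpoint URL with the API key appended, mirroring the curl call."""
    return f"{ENDPOINT}?key={api_key}"

def annotate(request_path: str, api_key: str) -> dict:
    """Send the JSON request body and return the parsed response."""
    with open(request_path, "rb") as f:
        data = f.read()
    req = urllib.request.Request(
        annotate_url(api_key),
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if os.environ.get("API_KEY"):  # only call out when a key is configured
    print(json.dumps(annotate("request.json", os.environ["API_KEY"]), indent=2))
```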
Once the response comes back, you can check that the location was correctly detected as Dubai Frame; the response also includes the lat/long for the tourist spot.
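To pull just the landmark name and coordinates out of the response, a short helper works. The sample_response below is a trimmed-down illustration in the shape the API returns [landmarkAnnotations, locations, latLng]; the score and lat/long values are illustrative, not the real API output.

```python
# Sketch: extract the first landmark's name and lat/long from a response.

def first_landmark(response: dict):
    """Return (description, lat, lng) of the first landmark, or None."""
    annotations = response["responses"][0].get("landmarkAnnotations", [])
    if not annotations:
        return None
    top = annotations[0]
    latlng = top["locations"][0]["latLng"]
    return top["description"], latlng["latitude"], latlng["longitude"]

# Trimmed sample in the shape the API returns (values are illustrative):
sample_response = {
    "responses": [
        {
            "landmarkAnnotations": [
                {
                    "description": "Dubai Frame",
                    "score": 0.87,
                    "locations": [
                        {"latLng": {"latitude": 25.2346, "longitude": 55.3001}}
                    ],
                }
            ]
        }
    ]
}

print(first_landmark(sample_response))  # ('Dubai Frame', 25.2346, 55.3001)
```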
Let’s try with a slightly harder example:
In this image, the labels will surely detect a person, but let’s check whether the API can pick up the Burj Khalifa in the background. Go back to the request.json file, change the file location to the new image, and rerun the API call in the terminal.
Hmmm… the landmark came back as Ras Al Khor Wildlife Sanctuary. Not quite what I was expecting. Also, this location is most probably the Dubai Canal [but that would be a tough one to detect].
So, what we used just now was the default Vision API, without any training on our own dataset. To improve on this, I’ll cover AutoML Vision in a separate post, since it lets us add training datasets with more labels so the API can better detect such images.