New Age DAM APIs to Simplify Your Media Workflows

(This is a sponsored post.)

High-velocity online businesses produce many digital assets, such as banners, images, videos, and PDFs, to promote themselves online. For such businesses, Digital Asset Management (DAM) solutions are essential. These solutions help centrally store, manage, organize, search, and track digital assets. A central repository of assets speeds up campaign execution and improves cross-functional collaboration.

But for an organization operating at scale and dealing with millions of digital assets flowing in from multiple sources, certain parts of the asset management workflow cannot be done manually through a UI. For example, how do you upload thousands of images into the correct folders every day? Or integrate an internal CMS so that product SKU IDs get added as tags on the product images in the DAM?

This is why leading DAM solutions come with APIs to allow you to integrate them into your existing workflows and get the benefits of a DAM system at scale. Let’s first understand what an API is before getting to some common examples and use cases you can solve with them.

What is an API?

API stands for Application Programming Interface. An API allows two pieces of software, or applications, to communicate using a common definition.

A physical-world analogy: when you order a dish in a restaurant, the chef understands what you ordered and prepares it. Here, the menu with the dish’s name serves as the common language between you (one party) and the chef (the other party).

Let’s look at an example of an API in an e-commerce application. To check the delivery time to your location, you enter your pin code, and in a second or two, the time appears on your mobile screen. Here, your app (one piece of software) is talking to the server (the other piece of software), asking for delivery times for a pin code (the definition, or common language, between the two). The delivery time returned by the server is called the API’s “response.”
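To make this concrete, here is a minimal sketch of what such a request could look like in Python. The endpoint and parameter names are hypothetical, invented purely for illustration:

import requests

# Hypothetical delivery-time API: the app sends a pin code and the
# server responds with an estimated delivery time.
response = requests.get(
    "https://api.example-store.com/v1/delivery-estimate",  # hypothetical endpoint
    params={"pincode": "110001"},
)
print(response.json())  # e.g. {"pincode": "110001", "deliveryDays": 2}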

What is a DAM API?

Continuing the explanation above, DAM APIs allow you to communicate with the Digital Asset Management system using a defined language. These APIs expose all or most of the features of a DAM system, but instead of using them via the user interface in a browser, you use them from a software program.

For example, a DAM’s user interface lets you drag and drop an image to upload it. The same DAM system could also offer an API to upload images from your users’ Android app. Here, the Android app is one piece of software, the DAM system itself is the other, and the upload API defines what to upload and how. Once the upload completes, the API responds with information about the uploaded image.

What’s ImageKit? What’s its DAM offering?

ImageKit is a leading Digital Asset Management solution. It comes with standard DAM features like storage, management, AI tagging, custom metadata, and advanced search. It also has optimized asset delivery integrated into the system.

While ImageKit’s DAM system comes with a user-friendly UI, like all leading players in this space, it also offers media APIs to use all of its features programmatically.

Use cases you can solve with DAM APIs

Before jumping into the APIs themselves, here are some common use cases a DAM system’s APIs can solve.

  1. If you have an app or website where users can upload images, videos, or other content, you can use the DAM API to upload them directly to the DAM system.
  2. Suppose you build a product that offers integrated media storage to its users. Instead of exposing your users to the DAM system directly, you would want to integrate it into your product natively (or white-label it). You can combine the upload APIs, list and search APIs, and file detail APIs to build this asset library for your product’s users.
  3. Suppose your team uses an existing CMS or any other system to manage internal data. You can use the DAM as the underlying file storage and use its advanced management and search features via its APIs. Your team never has to leave their existing CMS while still leveraging all the features of the DAM system.
  4. If you require it and your DAM solution supports it, you can use real-time image and video optimization APIs to deliver the assets to your users on different platforms. ImageKit is one such DAM; it supports optimized delivery of any asset uploaded to its media library.

Common Digital Asset Management APIs

Let’s look at some of the standard APIs that most DAM systems offer. For demonstration and examples, we will use ImageKit’s DAM APIs.

1. API for uploading a file

This is the most basic API of all — before you use the DAM system, you need to upload files to it.

ImageKit’s Upload API allows you to upload an actual file from your file system or a web URL. You can use this API in a front-end application, like a mobile app, or a back-end application, like your application server. Here is an example of uploading an image from a back-end application.

curl -X POST "https://upload.imagekit.io/api/v1/files/upload" \

-u your_private_api_key: \

-F 'file=@/Users/username/Desktop/my_file_name.jpg;type=image/jpg' \

-F 'fileName=my_file_name.jpg'

You would get some information about the uploaded file in the API response. For example, you would usually get a unique ID for your file, which would be super valuable for subsequent APIs, along with other information like the file’s format, size, upload time, etc.

{
  "fileId": "598821f949c0a938d57563bd",
  "name": "my_file_name.jpg",
  "url": "https://ik.imagekit.io/your_imagekit_id/images/products/file1.jpg",
  "height": 300,
  "width": 200,
  "size": 83622,
  // other information...
}
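If your back end is in Python, the same upload can be made with any HTTP client. Here is a minimal sketch using the requests library, with placeholder values for the key and file path:

import requests

# Upload a local file to ImageKit; the private API key is sent as the
# basic-auth username with an empty password, as in the curl example above.
with open("/Users/username/Desktop/my_file_name.jpg", "rb") as f:
    response = requests.post(
        "https://upload.imagekit.io/api/v1/files/upload",
        auth=("your_private_api_key", ""),
        files={"file": f},
        data={"fileName": "my_file_name.jpg"},
    )

upload_info = response.json()
print(upload_info["fileId"])  # keep this ID handy for subsequent API calls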

2. Moving, copying, and deleting a file

After uploading a file to the DAM system, you might want to remove it or move it between folders. This can also be done programmatically via APIs.

For example, in ImageKit, to move a file from one folder to another, you pass the file’s path (sourceFilePath) and the destination folder path (destinationPath) to the API.

curl -X POST "https://api.imagekit.io/v1/files/move" \
-H 'Content-Type: application/json' \    
-u your_private_key: -d '    
{
  "sourceFilePath" : "/path/to/file.jpg",
  "destinationPath" : "/folder/to/move/into/"
}'
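Deleting works along the same lines. The sketch below assumes ImageKit’s delete endpoint, which takes the fileId returned at upload time; check your DAM’s API reference for the exact route:

import requests

# Delete a file by the fileId returned in the upload response.
file_id = "598821f949c0a938d57563bd"  # example fileId from the upload response above
response = requests.delete(
    f"https://api.imagekit.io/v1/files/{file_id}",
    auth=("your_private_api_key", ""),
)
print(response.status_code)  # a 2xx status indicates success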

3. Updating a file with manual and AI Tags

File naming conventions and a correct folder structure are often insufficient for organizing and finding content in a growing repository of digital assets.

Associating custom metadata or tags with an asset helps build another layer of organization for your content. For example, you could assign values to fields such as “Product Category” (Shoe, Shirt, Jeans, etc.), “Platform” (Facebook, Instagram, etc.), “Sale Name” (Thanksgiving, Black Friday, etc.) to the files in your DAM system, to build a more business-specific organization.
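As a hedged sketch of what this could look like with ImageKit’s update-file-details API (assuming the custom metadata fields, here ProductCategory and SaleName, have already been defined in the account):

import requests

# Attach business-specific metadata to an existing file. The field names
# are illustrative; they must be created in the DAM before they can be set.
file_id = "598821f949c0a938d57563bd"  # example fileId from an upload response
response = requests.patch(
    f"https://api.imagekit.io/v1/files/{file_id}/details",
    auth=("your_private_api_key", ""),
    json={"customMetadata": {"ProductCategory": "Shoe", "SaleName": "Black Friday"}},
)
print(response.json())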

AI services like Google Cloud Vision can help speed up asset tagging workflows and reduce errors. Good DAM systems provide APIs to associate such tags with your assets.

For example, ImageKit allows you to add AI-inferred tags, powered by Google Cloud Vision, to your asset, as shown in the code below (replace {fileId} with the unique ID returned when the file was uploaded).

curl -X PATCH "https://api.imagekit.io/v1/files/{fileId}/details" \
-H 'Content-Type: application/json' \
-u your_private_key: -d '
{
  "extensions": [
    {
      "name": "google-auto-tagging",
      "maxTags": 5,
      "minConfidence": 95
    }
  ]
}'

While the above API adds tags to an existing file, you can also do this when the file is first uploaded.
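For instance, ImageKit’s upload API accepts a tags field for manual tags and the same extensions configuration (passed as a JSON string) alongside the file. A minimal Python sketch, with placeholder values:

import json
import requests

# Upload a file and tag it in one call: manual tags via "tags",
# AI tags via the "extensions" configuration shown above.
with open("/Users/username/Desktop/my_file_name.jpg", "rb") as f:
    response = requests.post(
        "https://upload.imagekit.io/api/v1/files/upload",
        auth=("your_private_api_key", ""),
        files={"file": f},
        data={
            "fileName": "my_file_name.jpg",
            "tags": "shoe,black-friday",  # manual tags, comma-separated
            "extensions": json.dumps(
                [{"name": "google-auto-tagging", "maxTags": 5, "minConfidence": 95}]
            ),
        },
    )
print(response.json())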

4. Searching for a file using search APIs

The most significant advantage of using a DAM is being able to find the exact asset among thousands of them. Therefore, a good search API is necessary for any DAM system. It should allow searching on all the possible parameters associated with an asset, including the custom tags and metadata we add to create a business-specific organization for ourselves.

ImageKit provides a very flexible search API that lets you construct complex search queries to pinpoint the exact resource you need. The example below finds all assets created more than seven days ago that are larger than 2 MB.

curl -X GET "https://api.imagekit.io/v1/files" \
-G --data-urlencode "searchQuery=createdAt >= \"7d\" AND size > \"2mb\"" \
-u your_private_api_key:
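The same query in Python is just a matter of URL-encoding the searchQuery parameter, which requests handles for you:

import requests

# Find assets created more than seven days ago that are larger than 2 MB.
response = requests.get(
    "https://api.imagekit.io/v1/files",
    auth=("your_private_api_key", ""),
    params={"searchQuery": 'createdAt >= "7d" AND size > "2mb"'},
)
for asset in response.json():
    print(asset["name"], asset["size"])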

5. The image and video delivery API

Once your team starts managing and collaborating on assets in the DAM, the next obvious step is to use those assets on the web: sharing them via URLs and embedding them in your website, apps, emails, and so on.

Leading DAM solutions like ImageKit provide ready-to-use URLs for any file stored with them. ImageKit’s delivery API also has built-in automatic optimization and real-time manipulation for images and videos, ensuring optimized asset delivery every time.

https://ik.imagekit.io/ikmedia/default-image.jpg?tr=w-200,h-200

The above example resizes the original image to a 200×200 square thumbnail while compressing it and optimizing its format. You can do the same for videos using a similar URL-based API. Read more about ImageKit’s media APIs.
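Since these transformations are plain query parameters, generating such URLs in code is trivial. A tiny illustrative helper, following the tr parameter format shown above:

# Build an ImageKit-style resize URL by appending the "tr" query parameter.
def thumbnail_url(base_url: str, width: int, height: int) -> str:
    return f"{base_url}?tr=w-{width},h-{height}"

print(thumbnail_url("https://ik.imagekit.io/ikmedia/default-image.jpg", 200, 200))
# -> https://ik.imagekit.io/ikmedia/default-image.jpg?tr=w-200,h-200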

Conclusion

Apart from the basic APIs explained above, DAM solutions offer several other APIs that let you manage folders, get file details, control the shareability of assets, and more. The possibilities for integrating these APIs to simplify and automate your existing workflows are endless. Using a DAM solution like ImageKit, with the extensive media management APIs described here, will bring your marketing, creative, and technology teams onto the same page and help them execute campaigns faster. Sign up for ImageKit’s forever-free DAM plan and start optimizing your media workflows.



Google Cloud Vision With Spring Boot

In this post, we will take a look at how we can use Google Cloud Vision from a Spring Boot application. Google Cloud Vision can derive all kinds of information from images, such as labels, faces, and text. As a bonus, some examples with Python are provided too.

1. Introduction

A good place to start experimenting with Cloud Vision is the Cloud Vision API documentation. The documentation is comprehensive, and the examples actually do work ;-). In the next paragraphs, we will explore some of the image-processing capabilities.

Powerful Image Analysis With Google Cloud Vision And Python

Bartosz Biskupski

Quite recently, I built a web app to manage users’ personal expenses. Its main features are scanning shopping receipts and extracting data for further processing. The Google Vision API turned out to be a great tool for getting text from a photo. In this article, I will guide you through the development process with Python in a sample project.

If you’re a novice, don’t worry. You will only need very basic knowledge of this programming language; no other skills are required.

Let’s get started, shall we?

Never Heard Of Google Cloud Vision?

It’s an API that allows developers to analyze the content of an image through extracted data. For this purpose, Google utilizes machine learning models trained on a large dataset of images. All of that is available with a single API request. The engine behind the API classifies images, detects objects and people’s faces, and recognizes printed words within images.

To give you an example, let’s bring up the well-liked Giphy. They’ve adopted the API to extract caption data from GIFs, which resulted in a significant improvement in user experience. Another example is realtor.com, which uses the Vision API’s OCR to extract text from images of “For Sale” signs taken in its mobile app to provide more details on the properties.

Machine Learning At A Glance

Let’s start by answering the question many of you have probably heard before: what is machine learning?

The broad idea is to develop a programmable model that finds patterns in the data it’s given. The higher the quality of the data you deliver and the better the design of the model you use, the smarter the outcome will be. With “friendly machine learning” (as Google calls its machine-learning-through-API services), you can easily incorporate a chunk of artificial intelligence into your applications.

Recommended reading: Getting Started With Machine Learning

How To Get Started With Google Cloud

Let’s start with registering for Google Cloud. Google requires authentication, but it’s simple and painless: you’ll only need to store a JSON file that includes your API key, which you can get directly from the Google Cloud Platform.

Download the file and add its path to your environment variables:

export GOOGLE_APPLICATION_CREDENTIALS=/path/to/your/apikey.json

Alternatively, in development, you can support yourself with the from_service_account_json() method, which I’ll describe further in this article. To learn more about authentication, check out Cloud’s official documentation.

Google provides a Python package to deal with the API. Let’s add the latest version of google-cloud-vision (0.33 at the time of writing) to your app. Time to code!

How To Combine Google Cloud Vision With Python

Firstly, let’s import classes from the library.

from google.cloud import vision
from google.cloud.vision import types

With that taken care of, you’ll need an instance of a client. Here, you’re going to use the text recognition feature.

client = vision.ImageAnnotatorClient()

If you don’t store your credentials in environment variables, at this stage you can add them directly to the client.

client = vision.ImageAnnotatorClient.from_service_account_file(
'/path/to/apikey.json'
)

Assuming that you store the images to be processed in a folder named ‘images’ inside your project directory, let’s open one of them.

[Image: An example of a simple receipt that could be processed by Google Cloud Vision.]

image_to_open = 'images/receipt.jpg'

with open(image_to_open, 'rb') as image_file:
    content = image_file.read()

The next step is to create a Vision object, which will allow you to send a request to proceed with text recognition.

image = vision.types.Image(content=content)

text_response = client.text_detection(image=image)

The response consists of detected words stored as description keys, their locations in the image, and a language prediction. For example, let’s take a closer look at the first word:

[
...
description: "SHOPPING"
bounding_poly {
  vertices {
    x: 1327
    y: 1513
  }
  vertices {
    x: 1789
    y: 1345
  }
  vertices {
    x: 1821
    y: 1432
  }
  vertices {
    x: 1359
    y: 1600
  }
}
...
]

As you can see, to filter out the text only, you need to get the description from all the elements. Luckily, Python’s powerful list comprehension comes to the rescue.

texts = [text.description for text in text_response.text_annotations]

['SHOPPING STORE\nREG 12-21\n03:22 PM\nCLERK 2\n618\n1 MISC\n1 STUFF\n$0.49\n$7.99\n$8.48\n$0.74\nSUBTOTAL\nTAX\nTOTAL\nCASH\n6\n$9. 22\n$10.00\nCHANGE\n$0.78\nNO REFUNDS\nNO EXCHANGES\nNO RETURNS\n', 'SHOPPING', 'STORE', 'REG', '12-21', '03:22', 'PM', 'CLERK', '2', '618', '1', 'MISC', '1', 'STUFF', '$0.49', '$7.99', '$8.48', '$0.74', 'SUBTOTAL', 'TAX', 'TOTAL', 'CASH', '6', '$9.', '22', '$10.00', 'CHANGE', '$0.78', 'NO', 'REFUNDS', 'NO', 'EXCHANGES', 'NO', 'RETURNS']

If you look carefully, you can see that the first element of the list contains all the text detected in the image stored as a single string, while the others are the separate words. Let’s print it out.

print(texts[0])

SHOPPING STORE
REG 12-21
03:22 PM
CLERK 2
618
1 MISC
1 STUFF
$0.49
$7.99
$8.48
$0.74
SUBTOTAL
TAX
TOTAL
CASH
6
$9. 22
$10.00
CHANGE
$0.78
NO REFUNDS
NO EXCHANGES
NO RETURNS

Pretty accurate, right? And obviously quite useful, so let’s play more.

What Can You Get From Google Cloud Vision?

As I’ve mentioned above, Google Cloud Vision isn’t only about recognizing text; it also lets you discover faces, landmarks, image properties, and web connections. With that in mind, let’s find out what it can tell you about the web associations of the image.

web_response = client.web_detection(image=image)

Okay Google, do you actually know what is shown in the image you received?

web_content = web_response.web_detection
web_content.best_guess_labels
>>> [label: "Receipt"]

Good job, Google! It’s a receipt indeed. But let’s give you a bit more exercise: can you see anything else? How about more predictions, expressed as percentages?

predictions = [
    (entity.description, '{:.2%}'.format(entity.score)) for entity in web_content.web_entities
]

>>> [('Receipt', '70.26%'), ('Product design', '64.24%'), ('Money', '56.54%'), ('Shopping', '55.86%'), ('Design', '54.62%'), ('Brand', '54.01%'), ('Font', '53.20%'), ('Product', '51.55%'), ('Image', '38.82%')]

Lots of valuable insights, well done, my almighty friend! Can you also find out where the image comes from and whether it has any copies?

web_content.full_matching_images
>>> [
url: "http://www.rcapitalassociates.com/wp-content/uploads/2018/03/receipts.jpg",
url: "https://media.istockphoto.com/photos/shopping-receipt-picture-id901964616?k=6&m=901964616&s=612x612&w=0&h=RmFpYy9uDazil1H9aXkkrAOlCb0lQ-bHaFpdpl76o9A=",
url: "https://www.pakstat.com.au/site/assets/files/1172/shutterstock_573065707.500x500.jpg"
]

I’m impressed. Thanks, Google! But one is not enough; can you please give me three examples of similar images?

web_content.visually_similar_images[:3]
>>> [
url: "https://thumbs.dreamstime.com/z/shopping-receipt-paper-sales-isolated-white-background-85651861.jpg",
url: "https://thumbs.dreamstime.com/b/grocery-receipt-23403878.jpg",
url: "https://image.shutterstock.com/image-photo/closeup-grocery-shopping-receipt-260nw-95237158.jpg"
]

Sweet! Well done.

Is There Really Artificial Intelligence In Google Cloud Vision?

As you can see in the image below, dealing with receipts can get a bit emotional.

[Image: A man screaming and looking stressed while holding a long receipt. An example of the stress you can experience while getting a receipt.]

Let’s have a look at what the Vision API can tell you about this photo.

image_to_open = 'images/face.jpg'

with open(image_to_open, 'rb') as image_file:
    content = image_file.read()
image = vision.types.Image(content=content)

face_response = client.face_detection(image=image)
face_content = face_response.face_annotations

face_content[0].detection_confidence
>>> 0.5153166651725769

Not too bad: the algorithm is more than 50% sure that there is a face in the picture. But can you learn anything about the emotions behind it?

face_content[0]
>>> [
...
joy_likelihood: VERY_UNLIKELY
sorrow_likelihood: VERY_UNLIKELY
anger_likelihood: UNLIKELY
surprise_likelihood: POSSIBLE
under_exposed_likelihood: VERY_UNLIKELY
blurred_likelihood: VERY_UNLIKELY
headwear_likelihood: VERY_UNLIKELY
...
]

Surprisingly, with a simple command, you can check the likelihood of some basic emotions, as well as headwear or photo properties.
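If you want readable names instead of raw values, the numeric likelihoods can be mapped through the library’s Likelihood enum. A small sketch, continuing from the snippet above and assuming the enums module shipped with the google-cloud-vision version used here:

from google.cloud.vision import enums

# Map the numeric likelihood values of the first detected face to
# their human-readable enum names (VERY_UNLIKELY, POSSIBLE, and so on).
face = face_content[0]
for emotion in ("joy", "sorrow", "anger", "surprise"):
    value = getattr(face, emotion + "_likelihood")
    print(emotion, enums.Likelihood(value).name)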

When it comes to the detection of faces, I need to direct your attention to some of the potential issues you may encounter. You need to remember that you’re handing a photo over to a machine, and although Google’s API utilizes models trained on huge datasets, it’s possible that it will return some unexpected and misleading results. Online, you can find photos showing how easily artificial intelligence can be tricked when it comes to image analysis. Some of them may seem funny, but there is a fine line between innocent and offensive mistakes, especially when a mistake concerns a human face.

Without a doubt, Google Cloud Vision is a robust tool. Moreover, it’s fun to work with. The API’s REST architecture and the widely available Python package make it accessible to everyone, regardless of how advanced you are in Python development. Just imagine how significantly you can improve your app by utilizing its capabilities!

Recommended reading: Applications Of Machine Learning For Designers

How Can You Broaden Your Knowledge On Google Cloud Vision

The scope of possibilities for applying the Google Cloud Vision service is practically endless. With the Python library available, you can utilize it in any project based on the language, whether it’s a web application or a scientific project. It can certainly help you develop a deeper interest in machine learning technologies.

The Google documentation provides some great ideas on how to apply the Vision API features in practice, as well as opportunities to learn more about machine learning. I especially recommend checking out the guide on how to build an advanced image search app.

One could say that what you’ve seen in this article is like magic. After all, who would’ve thought that a simple and easily accessible API is backed by such a powerful, scientific tool? All that’s left to do is write a few lines of code, unleash your imagination, and experience the boundless potential of image analysis.
