Automating OCR and Translation with Google Cloud Functions: A Step-by-Step Guide

This article is an excerpt from the book, "Google Cloud Associate Cloud Engineer Certification and Implementation Guide", by Agnieszka Koziorowska, Wojciech Marusiak. This book serves as a guide for students preparing for ACE certification, offering invaluable practical knowledge and hands-on experience in implementing various Google Cloud Platform services. By actively engaging with the content, you’ll gain the confidence and expertise needed to excel in your certification journey.

Introduction 

In this article, we will walk you through an example of implementing Google Cloud Functions for optical character recognition (OCR) on Google Cloud Platform. This tutorial will demonstrate how to automate the process of extracting text from an image, translating the text, and storing the results using Cloud Functions, Pub/Sub, and Cloud Storage. By leveraging Google Cloud Vision and Translation APIs, we can create a workflow that efficiently handles image processing and text translation. The article provides detailed steps to set up and deploy Cloud Functions using Golang, covering everything from creating storage buckets to deploying and running your function to translate text. 

Google Cloud Functions Example 

Now that you’ve learned what Cloud Functions is, we would like to show you how to implement a sample Cloud Function. 

We will guide you through optical character recognition (OCR) on Google Cloud Platform with Cloud Functions. 

Our use case is as follows: 

1. An image with text is uploaded to Cloud Storage. 

2. A triggered Cloud Function utilizes the Google Cloud Vision API to extract the text and identify the source language. 

3. The text is queued for translation by publishing a message to a Pub/Sub topic. 

4. A Cloud Function employs the Translation API to translate the text and publishes the result to a results queue. 

5. Another Cloud Function saves the translated text from the results queue to Cloud Storage. 

6. The translated results are available in Cloud Storage as individual text files for each translation. 
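The hand-off between these stages can be sketched as a small message type that is serialized before being published to a queue. This is a minimal illustration using only the Go standard library: the `ocrMessage` type, its field names, and the helper functions are assumptions made for this example, not necessarily what the sample repository uses.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ocrMessage is a hypothetical payload passed between pipeline stages.
// The field names are assumptions made for illustration only.
type ocrMessage struct {
	Text     string `json:"text"`     // extracted (or translated) text
	FileName string `json:"fileName"` // name of the uploaded image
	Lang     string `json:"lang"`     // target language code, e.g. "en"
	SrcLang  string `json:"srcLang"`  // language detected in the image
}

// translationRequests builds one message per target language (use case step 3).
func translationRequests(text, fileName, srcLang string, targets []string) []ocrMessage {
	msgs := make([]ocrMessage, 0, len(targets))
	for _, target := range targets {
		msgs = append(msgs, ocrMessage{Text: text, FileName: fileName, Lang: target, SrcLang: srcLang})
	}
	return msgs
}

// encode serializes a message into bytes that could be used as a queue message body.
func encode(m ocrMessage) ([]byte, error) {
	return json.Marshal(m)
}

// decode restores a message on the receiving side of the queue.
func decode(data []byte) (ocrMessage, error) {
	var m ocrMessage
	err := json.Unmarshal(data, &m)
	return m, err
}

func main() {
	// "sign.png" and the sample text are made-up inputs.
	for _, m := range translationRequests("Dzień dobry", "sign.png", "pl", []string{"es", "en"}) {
		data, err := encode(m)
		if err != nil {
			fmt.Println("encode failed:", err)
			return
		}
		fmt.Println(string(data))
	}
}
```

Publishing one message per target language keeps each translation independent, which is why the results end up as one text file per language.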

We need to download the samples first; we will use Golang as the programming language. The source files can be downloaded from https://github.com/GoogleCloudPlatform/golang-samples. Before working with the OCR function sample, we recommend enabling the Cloud Translation API and the Cloud Vision API. If they are not enabled, your functions will throw errors, and the process will not complete. Let’s start by deploying the functions: 

1. We need to create a Cloud Storage bucket for the uploaded images. Create your own bucket with a globally unique name; please refer to the documentation on bucket naming at https://cloud.google.com/storage/docs/buckets. We will use the following command: 

gsutil mb gs://wojciech_image_ocr_bucket 

2. We also need to create a second bucket to store the results: 

gsutil mb gs://wojciech_image_ocr_bucket_results 
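Before running `gsutil mb`, it can help to check a candidate bucket name locally. The sketch below covers only a subset of the documented naming rules as we understand them (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit; no `goog` prefix and no `google` substring). The `checkBucketName` helper is hypothetical, it ignores the extra rules for names containing dots, and it cannot tell whether a name is already taken.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// bucketNameRe encodes a subset of the naming rules: 3-63 characters,
// lowercase letters, digits, '-', '_' and '.', starting and ending with
// a letter or digit.
var bucketNameRe = regexp.MustCompile(`^[a-z0-9][a-z0-9_.-]{1,61}[a-z0-9]$`)

// checkBucketName is a hypothetical local pre-check. It does not verify
// global uniqueness, which only the service itself can decide.
func checkBucketName(name string) error {
	if !bucketNameRe.MatchString(name) {
		return fmt.Errorf("%q has an invalid length or invalid characters", name)
	}
	if strings.HasPrefix(name, "goog") || strings.Contains(name, "google") {
		return fmt.Errorf("%q uses a reserved prefix or word", name)
	}
	return nil
}

func main() {
	for _, name := range []string{"wojciech_image_ocr_bucket", "My_Bucket", "ab"} {
		if err := checkBucketName(name); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("looks valid:", name)
	}
}
```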

3. We must create a Pub/Sub topic to which the translation requests are published. The general form of the command is gcloud pubsub topics create YOUR_TOPIC_NAME. We used the following command to create it: 

gcloud pubsub topics create wojciech_translate_topic 

4. We also need a second Pub/Sub topic to publish the finished translation results. We can use the following command to create it: 

gcloud pubsub topics create wojciech_translate_topic_results 

5. Next, we will clone the Google Cloud GitHub repository with the Go sample code: 

git clone https://github.com/GoogleCloudPlatform/golang-samples 

6. From the repository, we need to go to the golang-samples/functions/ocr/app/ directory to be able to deploy the desired Cloud Function. 

7. We recommend reviewing the included Go files to understand the code in more detail. Please replace the storage bucket and Pub/Sub topic names with your own. 

8. We will deploy the first function to process images. We will use the following command: 

gcloud functions deploy ocr-extract-go --runtime go119 --trigger-bucket wojciech_image_ocr_bucket --entry-point ProcessImage --set-env-vars "^:^GCP_PROJECT=wmarusiak-book-351718:TRANSLATE_TOPIC=wojciech_translate_topic:RESULT_TOPIC=wojciech_translate_topic_results:TO_LANG=es,en,fr,ja"
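The `^:^` prefix in the `--set-env-vars` value above switches the separator between `KEY=VALUE` pairs from a comma to a colon, so that the comma-separated `TO_LANG` list survives as a single value. The following sketch imitates that splitting behaviour to show why the prefix matters. `parseEnvVarsFlag` is a hypothetical helper written for this example and is not gcloud's own implementation; `my-project` is a placeholder project ID.

```go
package main

import (
	"fmt"
	"strings"
)

// parseEnvVarsFlag imitates how a value such as "^:^A=1:B=x,y" is split
// into key/value pairs. It is an illustrative sketch only.
func parseEnvVarsFlag(value string) (map[string]string, error) {
	delim := "," // default separator between KEY=VALUE pairs
	if strings.HasPrefix(value, "^") {
		// The custom delimiter sits between the first two '^' characters.
		end := strings.Index(value[1:], "^")
		if end <= 0 {
			return nil, fmt.Errorf("malformed delimiter prefix in %q", value)
		}
		delim = value[1 : 1+end]
		value = value[end+2:]
	}
	vars := make(map[string]string)
	for _, pair := range strings.Split(value, delim) {
		key, val, ok := strings.Cut(pair, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("%q is not a KEY=VALUE pair", pair)
		}
		vars[key] = val
	}
	return vars, nil
}

func main() {
	// With the custom delimiter, TO_LANG keeps its commas.
	vars, err := parseEnvVarsFlag("^:^GCP_PROJECT=my-project:TO_LANG=es,en,fr,ja")
	fmt.Println(vars["TO_LANG"], err)

	// Without it, "en" is treated as its own (invalid) pair.
	_, err = parseEnvVarsFlag("GCP_PROJECT=my-project,TO_LANG=es,en,fr,ja")
	fmt.Println(err)
}
```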

9. After deploying the first Cloud Function, we must deploy the second one to translate the text. We can use the following command: 

gcloud functions deploy ocr-translate-go --runtime go119 --trigger-topic wojciech_translate_topic --entry-point TranslateText --set-env-vars "GCP_PROJECT=wmarusiak-book-351718,RESULT_TOPIC=wojciech_translate_topic_results"

10. The last part of the complete solution is a third Cloud Function that saves results to Cloud Storage. We will use the following snippet of code to do so: 

gcloud functions deploy ocr-save-go --runtime go119 --trigger-topic wojciech_translate_topic_results --entry-point SaveResult --set-env-vars "GCP_PROJECT=wmarusiak-book-351718,RESULT_BUCKET=wojciech_image_ocr_bucket_results"
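Each deployed function receives its settings through the environment variables passed with `--set-env-vars`. A function could verify them at startup along the following lines. This is a sketch under assumptions: `missingEnv` and `targetLanguages` are hypothetical helpers written for this example, and the sample code may read its configuration differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// missingEnv returns the names from required for which getenv yields an
// empty value. getenv is injected so the check can be exercised without
// touching the real process environment.
func missingEnv(getenv func(string) string, required ...string) []string {
	var missing []string
	for _, name := range required {
		if strings.TrimSpace(getenv(name)) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

// targetLanguages splits a TO_LANG value such as "es,en,fr,ja" into
// individual language codes, skipping empty entries.
func targetLanguages(value string) []string {
	var langs []string
	for _, code := range strings.Split(value, ",") {
		if code = strings.TrimSpace(code); code != "" {
			langs = append(langs, code)
		}
	}
	return langs
}

func main() {
	// Variable names taken from the ocr-extract-go deploy command.
	required := []string{"GCP_PROJECT", "TRANSLATE_TOPIC", "RESULT_TOPIC", "TO_LANG"}
	if missing := missingEnv(os.Getenv, required...); len(missing) > 0 {
		fmt.Println("missing environment variables:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("target languages:", targetLanguages(os.Getenv("TO_LANG")))
}
```

Failing fast on a missing variable makes a misconfigured deployment visible in the first log entry rather than when the first image is uploaded.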

11. We are now free to upload any image containing text. It will be processed first, then translated and saved into our Cloud Storage bucket. 

12. We uploaded four sample images containing text that we downloaded from the Internet. We can see many entries in the ocr-extract-go Cloud Function’s logs. Some log entries show the language detected in the image, and others show the extracted text: 

Figure 7.22 – Cloud Function logs from the ocr-extract-go function 

13. ocr-translate-go translates the text detected by the previous function: 

Figure 7.23 – Cloud Function logs from the ocr-translate-go function 

14. Finally, ocr-save-go saves the translated text into the Cloud Storage bucket: 

Figure 7.24 – Cloud Function logs from the ocr-save-go function 

15. If we go to the Cloud Storage bucket, we’ll see the saved translated files: 

Figure 7.25 – Translated images saved in the Cloud Storage bucket 

16. We can view the content directly from the Cloud Storage bucket by clicking Download next to the file, as shown in the following screenshot: 

Figure 7.26 – Translated text from Polish to English stored in the Cloud Storage bucket 

Cloud Functions is a powerful and fast way to code, deploy, and use advanced features. We encourage you to try out and deploy Cloud Functions to understand the process of using them better. 

At the time of writing, Google Cloud Free Tier offers a generous number of free resources we can use. Cloud Functions offers the following with its free tier: 

  • 2 million invocations per month (this includes both background and HTTP invocations) 
  • 400,000 GB-seconds and 200,000 GHz-seconds of compute time 
  • 5 GB network egress per month 
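To get a feel for these limits, memory-time usage can be estimated as invocations multiplied by the memory allocated (in GB) and by the average run time (in seconds). The sketch below applies that arithmetic to a made-up workload of one million invocations at 256 MB and 0.5 seconds each. The workload figures are assumptions, the conversion treats 1 GB as 1024 MB, and GHz-seconds and egress are left out for brevity.

```go
package main

import "fmt"

// Free-tier limits quoted in the list above.
const (
	freeInvocations = 2_000_000
	freeGBSeconds   = 400_000.0
)

// gbSeconds estimates memory-time usage: invocations x memory (GB) x seconds.
// It assumes 1 GB = 1024 MB.
func gbSeconds(invocations, memoryMB int, avgSeconds float64) float64 {
	return float64(invocations) * float64(memoryMB) / 1024 * avgSeconds
}

// withinFreeTier checks only the invocation and GB-second limits.
func withinFreeTier(invocations int, usedGBSeconds float64) bool {
	return invocations <= freeInvocations && usedGBSeconds <= freeGBSeconds
}

func main() {
	// Hypothetical workload: 1,000,000 invocations, 256 MB, 0.5 s each.
	invocations, memoryMB, avgSeconds := 1_000_000, 256, 0.5
	used := gbSeconds(invocations, memoryMB, avgSeconds)
	fmt.Printf("%.0f GB-seconds used, within free tier: %t\n",
		used, withinFreeTier(invocations, used))
}
```

For this workload the estimate is 1,000,000 × 0.25 GB × 0.5 s = 125,000 GB-seconds, which stays below the 400,000 GB-second allowance.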

Google Cloud has comprehensive tutorials that you can try to deploy. Go to https://cloud.google.com/functions/docs/tutorials to follow one. 

Conclusion 

In conclusion, Google Cloud Functions offers a powerful and scalable solution for automating tasks like optical character recognition and translation. Through this example, we have demonstrated how to use Cloud Functions, Pub/Sub, and the Google Cloud Vision and Translation APIs to build an end-to-end OCR and translation pipeline. By following the provided steps and commands, you can replicate this process for your own use cases. Google Cloud's generous Free Tier resources make it easy to get started with Cloud Functions. We encourage you to explore more by deploying your own Cloud Functions and leveraging the full potential of Google Cloud Platform for serverless computing. 

Author Bio

Agnieszka is an experienced Systems Engineer who has been in the IT industry for 15 years. She is dedicated to supporting enterprise customers in the EMEA region with their transition to the cloud and hybrid cloud infrastructure by designing and architecting solutions that meet both business and technical requirements. Agnieszka is highly skilled in AWS, Google Cloud, and VMware solutions and holds certifications as a specialist in all three platforms. She strongly believes in the importance of knowledge sharing and learning from others to keep up with the ever-changing IT industry.

With over 16 years in the IT industry, Wojciech is a seasoned and innovative IT professional with a proven track record of success. Leveraging extensive work experience in large and complex enterprise environments, Wojciech brings valuable knowledge to help customers and businesses achieve their goals with precision, professionalism, and cost-effectiveness. Holding leading certifications from AWS, Alibaba Cloud, Google Cloud, VMware, and Microsoft, Wojciech is dedicated to continuous learning and sharing knowledge, staying abreast of the latest industry trends and developments.