Analytics Log - Adil Khan


Use Google Cloud Translation API To Run Translations At Scale Via googleLanguageR Package In R Programming

googleLanguageR is an easy-to-use R package for working with the Natural Language, Translation, Speech-to-Text and Text-to-Speech APIs on Google Cloud Platform. For this post, I'll use a sample of Airbnb review data. The ultimate objective of the upcoming posts is to re-create the Entity/Sentiment analysis using Google Cloud Platform. This post focuses on the first step: using the Google Cloud Translation API to convert review text from other languages into English.

The Google Cloud Translation API can translate from and to any of its supported languages; the full list is on the Cloud Translation language support page.
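googleLanguageR can also fetch that list from within R. A minimal sketch, assuming the package is installed and already authenticated; the wrapper name `list_supported_languages` is hypothetical, while `gl_translate_languages()` is the package's own function:

```r
# Hypothetical wrapper: ask the Translation API which languages it supports,
# with the language names returned in English (target = "en").
# Requires googleLanguageR to be installed and authenticated before calling.
list_supported_languages <- function() {
  googleLanguageR::gl_translate_languages(target = "en")
}

# Usage (makes a live API call):
# langs <- list_supported_languages()
# head(langs)
```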

The googleLanguageR package was developed by Mark Edmondson [also of googleAnalyticsR fame].

In the sample data, we're interested in the 'comments' column. Rows 4 and 5 are in French and Spanish, respectively.


The code (embedded in the original post) applies the gl_translate function to the 'comments' column with target = "en" [English], format = "text" [the other option is "html"] and source = "" [i.e. auto-detect the language of each cell, then run the Translation API].
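A minimal sketch of that call, assuming the reviews sit in a data frame named `reviews` (a hypothetical name, not necessarily the one used in the original post) with a character column `comments`, and that googleLanguageR is installed and authenticated. The wrapper function `translate_comments` is also hypothetical; the arguments are those described above.

```r
# Hypothetical wrapper around googleLanguageR::gl_translate().
# Translates every review in `reviews$comments` into English.
translate_comments <- function(reviews) {
  googleLanguageR::gl_translate(
    reviews$comments,
    target = "en",    # translate into English
    format = "text",  # plain-text input; "html" is the other option
    source = ""       # empty string: let the API detect each source language
  )
}

# Usage (makes a live API call). Authenticate first, e.g. with a
# service-account key; the path below is a placeholder:
# googleLanguageR::gl_auth("path/to/service-account-key.json")
# translated <- translate_comments(reviews)
```

Wrapping the call in a function keeps the API request out of the top level of the script, so the rest of the analysis can be re-run without re-translating every time.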

Note: Before you run the gl_translate function, you need to enable the Cloud Translation API in your Google Cloud Platform project and authenticate googleLanguageR [e.g. with a service-account JSON key].

The API output has three columns: translatedText [the translation], detectedSourceLanguage and text [the original text]. I particularly like the fact that you don't need to specify the original language.

You can now use the cbind function to append the Translation API output to the original dataset.

Next, the subset function removes the 'text' column, since it duplicates 'comments'.
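Those two steps can be sketched on small stand-in data. Both data frames below are hypothetical: `reviews` mimics the original dataset and `translated` mimics the three-column output of gl_translate described above. The review text and translations are made up for illustration and are not API results.

```r
# Hypothetical stand-in for the original reviews data.
reviews <- data.frame(
  id = 1:3,
  comments = c("Great place to stay!",
               "Appartement propre et calme.",
               "Muy buena casa."),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the gl_translate() output: one row per input,
# in the same order, with the three columns the API returns.
translated <- data.frame(
  translatedText = c("Great place to stay!",
                     "Clean and quiet apartment.",
                     "Very nice house."),
  detectedSourceLanguage = c("en", "fr", "es"),
  text = reviews$comments,
  stringsAsFactors = FALSE
)

# cbind() glues columns side by side by position, so first confirm
# the rows line up: the echoed `text` must match `comments` exactly.
stopifnot(identical(translated$text, reviews$comments))
combined <- cbind(reviews, translated)

# `text` now duplicates `comments`, so drop it via subset()'s `select`.
combined <- subset(combined, select = -text)

print(names(combined))
```

Because cbind matches rows by position rather than by a key, the `identical()` check on the echoed `text` column is a cheap way to catch a misaligned join before it silently pairs a review with the wrong translation.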

In the next post, I'll run the Entity/Sentiment analysis on the translated comments and then visualize the converted text data.
