Skip to content

Automate Your R Tasks on Google Cloud Platform – Part 1.

I’m a big fan of Google Cloud Platform. It can help your work in many ways from data storage (Google Big Query, Storage Bucket) to automate tasks (Cloud Scheduler) and a lot of other tasks.

Believe me, it is a rabbit hole. GCP has a lot of tools you can use in your daily tasks.

In this post, I go through how you can connect to automate your R tasks and save the result to Google Cloud Stora.

The Background

We need a few things. First of all, a simple task we would like to run on schedule. And we would like to make a cron job and finally save the output to GCS.

Task is to save the current weather data every hour. We use this API: https://openweathermap.org/current– I know it is not the best way to get weather data, but keep in mind it is just a demo, you can subsitute the task 🙂

We would like to run the API call every hour and save the output to Google Cloud Storage as a csv file to use it later.

Weather API call with R

First, sign up and get your API key. This service is free, until you reach the limit. Under 1 million calls/month the service is free, see more: https://openweathermap.org/price

If you are eager as I was right after the registration, you face this error message on the first api call:

{"cod":401, "message": "Invalid API key. Please see http://openweathermap.org/faq#error401 for more info."}

Don’t worry, it just take time to activate your api key:

Your API key is not activated yet. Within the next couple of hours, it will be activated and ready to use.

https://openweathermap.org/faq#error401

After the activation, you can make your first api call. I made it for Budapest:

http://api.openweathermap.org/data/2.5/weather?q=Budapest&appid={my_app_id}

The result is:

{"coord":{"lon":19.0399,"lat":47.498},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"base":"stations","main":{"temp":274.79,"feels_like":269.86,"temp_min":273.15,"temp_max":275.93,"pressure":1011,"humidity":86},"visibility":10000,"wind":{"speed":4.12,"deg":300},"clouds":{"all":0},"dt":1615953695,"sys":{"type":1,"id":6663,"country":"HU","sunrise":1615956774,"sunset":1615999891},"timezone":3600,"id":3054643,"name":"Budapest","cod":200}

It contains a lot of data in JSON, but we are able to get the data in XML and HTML, too.

For this example only the temp data is necessary. But it is Kelvin by default. Only one modification need to make it Celsius (or Fahrenheit). Change the unit with this parameter:

&units=metric

If you would like to use Fahrenheit, jut replace the metric to imperial.

The whole (and very complex 🙂 ) api call looks like this:

http://api.openweathermap.org/data/2.5/weather?q=Budapest&appid={my_app_id}&units=metric

And the result is:

{"coord":{"lon":19.0399,"lat":47.498},"weather":[{"id":800,"main":"Clear","description":"clear sky","icon":"01n"}],"base":"stations","main":{"temp":1.48,"feels_like":-3.48,"temp_min":-1,"temp_max":2.78,"pressure":1011,"humidity":86},"visibility":10000,"wind":{"speed":4.12,"deg":310},"clouds":{"all":0},"dt":1615954501,"sys":{"type":1,"id":6663,"country":"HU","sunrise":1615956774,"sunset":1615999891},"timezone":3600,"id":3054643,"name":"Budapest","cod":200}

We know how the api works, just make the api call in R

The R Code

For the weather api call we only have to use 1 library, the ‘httr’

library(httr)

It makes the http requests, let’s how.

weatherURL <- "http://api.openweathermap.org/data/2.5/weather"
weatherAppID <- "123myappid456"

weatherData <- GET(weatherURL,
                   query = list(
                     q = "Budapest",
                     appid = weatherAppID,
                     units = "metric",
                     mode="xml"
                   ))

It is just a simple GET request, similar what we use in the AdForm API tutorial (but it is a POST request)

The first line is the URL, it stored in a variable (weatherURL).

The second parameter in the GET request is the query. This is the part of the whole URL after the ‘?’: q=Budapest&appid={my_app_id}&units=metric

The query contains everything we need to get the data:

  • q – the name of the city
  • appid – the app ID, I stored in a variable
  • units – royal (Fahrenheit) or metric (Celsius), if you don’t add it the default value is Kelvin
  • mode – the output of the GET request, I choose XML, but if you don’t use it the output is JSON by default. For me XML output is suitable and the whole tutorial based on XML output

If everything works fine you get something like this:

Response [http://api.openweathermap.org/data/2.5/weather?q=Budapest&appid=123myappid456&units=metric&mode=xml]
  Date: 2021-03-20 07:51
  Status: 200
  Content-Type: application/xml; charset=utf-8
  Size: 860 B
<BINARY BODY>

So far it is not so informative, let’s encrypt it 🙂

Read the XML output

Since the output is XML we should call a new library XML (what a big surprise)

library(XML)

You can read the description of the package here: https://cran.r-project.org/web/packages/XML/index.html

First of all, for better understanding and cleaner code go back to the GET request and save the whole request in a variable, like this:

weatherData <- GET(weatherURL,
                   query = list(
                     q = "Budapest",
                     appid = weatherAppID,
                     units = "metric",
                     mode="xml"
                   ))

The ‘weatherData’ stores the XML output, now we can encrypt it:

xmlParse(weatherData)

This simple line of code make the XML output readable in R. The output looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<current>
  <city id="3054643" name="Budapest">
    <coord lon="19.0399" lat="47.498"/>
    <country>HU</country>
    <timezone>3600</timezone>
    <sun rise="2021-03-20T04:46:48" set="2021-03-20T16:55:46"/>
  </city>
  <temperature value="2.36" min="1.67" max="2.78" unit="celsius"/>
  <feels_like value="-2.88" unit="celsius"/>
  <humidity value="69" unit="%"/>
  <pressure value="1021" unit="hPa"/>
  <wind>
    <speed value="4.12" unit="m/s" name="Gentle Breeze"/>
    <gusts/>
    <direction value="100" code="E" name="East"/>
  </wind>
  <clouds value="75" name="broken clouds"/>
  <visibility value="10000"/>
  <precipitation mode="no"/>
  <weather number="803" value="broken clouds" icon="04d"/>
  <lastupdate value="2021-03-20T07:51:22"/>
</current>

The next step is to convert it as a list, it is easy with this line of code:

xmlToList(xmlParse(weatherData))

Just add ‘xmlToList’ and a xml changes to a simple list:

$city
$city$coord
      lon       lat 
"19.0399"  "47.498" 

$city$country
[1] "HU"

$city$timezone
[1] "3600"

$city$sun
                 rise                   set 
"2021-03-20T04:46:48" "2021-03-20T16:55:46" 

$city$.attrs
        id       name 
 "3054643" "Budapest" 


$temperature
    value       min       max      unit 
   "2.36"    "1.67"    "2.78" "celsius" 

$feels_like
    value      unit 
  "-2.88" "celsius" 

$humidity
value  unit 
 "69"   "%" 

$pressure
 value   unit 
"1021"  "hPa" 

$wind
$wind$speed
          value            unit            name 
         "4.12"           "m/s" "Gentle Breeze" 

$wind$gusts
NULL

$wind$direction
 value   code   name 
 "100"    "E" "East" 


$clouds
          value            name 
           "75" "broken clouds" 

$visibility
  value 
"10000" 

$precipitation
mode 
"no" 

$weather
         number           value            icon 
          "803" "broken clouds"           "04d" 

$lastupdate
                value 
"2021-03-20T07:51:22" 

This is a big nested list, contains a lot of useful data, but we only need the temperature:

xmlToList(xmlParse(weatherData))["temperature"]

The output is a list, since the whole list is a nested list.

$temperature
    value       min       max      unit 
    "0.3"      "-3"    "2.78" "celsius"  

We have two options. The first one is to use this output and retrive the data we would like to use. This is tha ‘value’ parameter, so get it with this line:

xmlToList(xmlParse(weatherData))["temperature"][[1]][[1]]

The output is the value itself:

[1] "0.3"

If you would like to use other data than the value, maybe you should convert the whole ‘temperature’ list as a dataFrame. For me it is a bit easier:

as.data.frame(xmlToList(xmlParse(weatherData))["temperature"])
      temperature
value        2.36
min          1.67
max          2.78
unit      celsius

But now we only would like to use the ‘value’ which is the actual temperature on the given city.

Create a new data frame with the actual date and the value. It’s pretty straighforward, just get the items (date and the value). But first we have to create a date variable.

dateForDf <- Sys.Date()
data.frame(date = dateForDf,
           value = actualTemp)

The result is a simple data frame with the actual date and value:

        date value
1 2021-03-26   0.3

Convert it as a csv

We have the data, we should convert it as a csv file we can upload to Google Cloud Storage

fileName <- paste("weather_data_",dateForDf, ".csv", sep = "")

write.csv(weatherDataDF, file = fileName, row.names=FALSE)

The first line generates the filename. It just add the weather_data_ prefix before the actual date and the .csv. It ends up a filename like this: weather_data_20210229.csv.

The script runs every day, so every day it generates a file with the actual date on the name and of course the data is also up to date 🙂

The second line converts the weatherDataDF (it contains the actual data), to a csv file with the filename we generated on the first line.

The ‘row.names=FALSE’ parameter drop the row names. In this example the row names are just row indexes which is useless for now, so just drop it.

If you run this code, you’ll get the csv in the working directory.

Next Steps

We have the data for the automation. In the next post we’ll go through how to upload it to Google Cloud Storage, schedule it to run every morning and load the csv to Google Big Query.

The scheduled automation of the R script and the BigQuery setup coming on the next posts.

Published inAnalyticsGoogle BigQuery

2 Comments

  1. Charles Charles

    Is there going to be a part 2 to this post? Looking to do a similar thing – thanks

    • meheszlev meheszlev

      Hi Charles, yes, it is on the roadmap 🙂
      If you have specific questions, ideas, just let me know and I’ll add it

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.