Scraping cards and dashboards

For this demo, we’ve set up what looks like a state government dashboard, with some statistics for each of the 100 most populous suburbs in Melbourne, pulled from the Australian Bureau of Statistics. Each card has a number of relevant pieces of information.

How would we go about extracting the info from these cards into Google Sheets? If it were a <table>, we could just use the importhtml() function to get the whole thing in one go.
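
That would look something like this (the URL here is just a placeholder, and the final 1 asks for the first table on the page):

=importhtml(
  "https://example.com/a-page-with-a-table",
  "table",
  1)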

We can’t do that this time—it isn’t a table.

I’ve cheated and created this listing from a spreadsheet myself, but it mimics the sorts of listings you’ll see on product pages and dashboards all over the internet. If you’re interested, the code here shows how I collected and transformed ABS spreadsheets to make this exercise.

Step 1: Downloading Census DataPacks

library(tidyverse)
library(readxl)
library(scales)
library(yaml)
library(here)

# download and unpack the 2021 Census GCP DataPack for Victorian suburbs and localities (SAL)
zip_path <- here("lessons", "scraping-cards", "vic-stats.zip")
zip_url <- paste0(
  "https://www.abs.gov.au/",
  "census/find-census-data/datapacks/download/",
  "2021_GCP_SAL_for_VIC_short-header.zip")
download.file(zip_url, zip_path)
unzip(zip_path, exdir = here("lessons", "scraping-cards", "data"))
file.rename(
  here("lessons", "scraping-cards", "data", "2021 Census GCP Suburbs and Localities for VIC"),
  here("lessons", "scraping-cards", "data", "Responses"))

Step 2: Loading Census tables and tidying them up

# suburb codes and names
here("lessons", "scraping-cards", "data", "Metadata",
  "2021Census_geog_desc_1st_2nd_3rd_release.xlsx") |>
  read_excel(sheet = "2021_ASGS_Non_ABS_Structures") |>
  filter(ASGS_Structure == "SAL") |>
  select(Code = Census_Code_2021, Name = Census_Name_2021) ->
suburb_map

# total population
here("lessons", "scraping-cards", "data", "Responses", "2021Census_G01_VIC_SAL.csv") |>
  read_csv(col_select = c(SAL_CODE_2021, Tot_P_P)) |>
  mutate(`Total population` = as.integer(Tot_P_P)) |>
  select(SAL_CODE_2021, `Total population`) ->
population

# income and rent (format as currencies)
here("lessons", "scraping-cards", "data", "Responses", "2021Census_G02_VIC_SAL.csv") |>
  read_csv(col_select =
    c(SAL_CODE_2021, Median_rent_weekly, Median_tot_fam_inc_weekly)) |>
  mutate(
    Median_rent_weekly = label_dollar(accuracy = 1)(Median_rent_weekly),
    Median_tot_fam_inc_weekly =
      label_dollar(accuracy = 1)(Median_tot_fam_inc_weekly)) |>
  rename(
    `Median weekly rent` = Median_rent_weekly,
    `Median weekly family income` = Median_tot_fam_inc_weekly) ->
income_and_rent

# commuting:
# we just want the most popular commute method for each area,
# which i'll encode as emoji
here("lessons", "scraping-cards", "data", "Responses", "2021Census_G62_VIC_SAL.csv") |>
  read_csv(
    col_types = cols(SAL_CODE_2021 = col_character(), .default = col_integer()),
    col_select = c(SAL_CODE_2021, ends_with("_P"))) |>
  select(SAL_CODE_2021, matches("One_method"), matches("Two_methods"),
    matches("Three_meth"), -matches("Tot")) |>
  pivot_longer(-SAL_CODE_2021, names_to = "method", values_to = "count") |>
  filter(count > 0) |>
  group_by(SAL_CODE_2021) |>
  slice_max(count, n = 1) |>
  ungroup() |>
  mutate(
    "Most popular commute method" = str_replace_all(method, c(
      "Train" = "🚂 Train",
      "Trn" = "🚂 Train",
      "Bus" = "🚌 Bus",
      "Ferry" = "⛴️ Ferry",
      "Car_as_driver" = "🚗 Driving",
      "Car_as_drvr" = "🚗 Driving",
      "Car_as_passenger" = "🚗 Passenger",
      "Car_as_pass" = "🚗 Passenger",
      "Truck" = "🚚 Truck",
      "Motorbike_scootr" = "🛵 Motorbike or scooter",
      "Other" = "❓ Other",
      "Walked_only" = "🚶 Walk",
      "_P" = "",
      "Tr_2_oth_meth" = "🚂 Train and two other methods",
      "Othr_three_meth" = "❓ Three other methods",
      "One_method" = "",
      "Two_methods" = "",
      "Three_meth" = "",
      "_" = " "))) |>
  select(-method, -count) ->
commuting_mostpopular

Step 3: Joining the tidied tables and writing them out as YAML (to make the listing below)

# join and write out biggest 100 suburbs to yaml (so we can make a listing)
population |>
  left_join(income_and_rent, join_by(SAL_CODE_2021)) |>
  left_join(commuting_mostpopular, join_by(SAL_CODE_2021)) |>
  left_join(suburb_map, join_by(SAL_CODE_2021 == Code)) |>
  select(Name, Code = SAL_CODE_2021, everything()) |>
  mutate(`Total population` = as.integer(`Total population`)) |>
  rename(title = Name) |>
  replace_na(list(`Most popular commute method` = "")) |>
  slice_max(`Total population`, n = 100) ->
joined

write_yaml(joined, here("lessons", "scraping-cards", "suburbs.yml"), column.major = FALSE)

We can use Google Sheets’ more general importxml() function to get information that is in all sorts of structures—not just tables!

The importxml() function takes a page address too, but the second thing we have to tell it is called an XPath. An XPath is a kind of address for looking up content on a web page.

For example, to extract the title from each of the cards below, we would write the following:

=importxml(
  "https://360-info.github.io/training-datajournalism/lessons/scraping-cards",
  "//h5")

The first part is the URL of this page; the second tells the scraper to look for fifth-level headings (h5). In other words, the heading from each card. (We’d want to make sure there weren’t any other fifth-level headings on the page, or else we’d want to be more specific!)
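
If we did need to be more specific, one option is to only look at headings inside the cards themselves. Here’s a sketch that matches any element whose class contains card (see the note on classes below) and then looks for h5 headings inside it:

=importxml(
  "https://360-info.github.io/training-datajournalism/lessons/scraping-cards",
  "//*[contains(@class, 'card')]//h5")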

To get the rent, we would use:

=importxml(
  "https://360-info.github.io/training-datajournalism/lessons/scraping-cards",
  "//td[@class="Median weekly rent"]")

This is pretty similar, but instead of looking for headings inside cards, we’re looking for table cells (td) that have the class Median weekly rent. That’s because each card on this page has a little table inside it.

Elements on web pages can have a unique id, as well as one or more classes to help describe them. The cards on this page have the class card, and the pieces of information on each card have a class named for the data (its ‘column’, if this were a spreadsheet).
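
Because each piece of information has a class named for its ‘column’, pulling a different column is mostly a matter of swapping the class name in the XPath. For example, to grab the population figures instead (assuming the class matches the column name Total population exactly):

=importxml(
  "https://360-info.github.io/training-datajournalism/lessons/scraping-cards",
  "//td[@class='Total population']")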

Every web page is arranged differently!

Learning how to write XPath can take time, and it involves learning about how web pages are structured. But this demo shows you the power you have with common tools to extract data, even from places where the authors haven’t made it easy to access!
