(This article was first published on R-exercises, and kindly contributed to R-bloggers)
[Before attempting these exercises, read the help for the rvest package and for SelectorGadget.]
Answers to the exercises are available here.
Exercise 1
Consider the url ‘http://statbel.fgov.be/en/statistics/figures/economy/indicators/prix_prod_con/’
Extract all the information contained in the table ‘Third Quarter 2016’.
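As a hint for this kind of task, here is a minimal sketch of table extraction with rvest. It does not touch the real page: the inline HTML, the table id `q3`, and the values in it are made up for illustration, so the selector you need on the live site will differ.

```r
library(rvest)

# Hypothetical stand-in for the real page: one small table with a header row.
page <- read_html('<html><body>
  <table id="q3">
    <tr><th>Indicator</th><th>Value</th></tr>
    <tr><td>Producer prices</td><td>101.2</td></tr>
    <tr><td>Consumer prices</td><td>103.4</td></tr>
  </table>
</body></html>')

# Pick the table with a CSS selector, then convert it to a data frame.
q3 <- page %>%
  html_node("table#q3") %>%
  html_table()

print(q3)
```

On the live page, SelectorGadget can help you find a selector that isolates the one table you want.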
Exercise 2
Consider the url ‘http://www2.sas.com/proceedings/sugi30/toc.html’
Extract the names of all the papers, from 001-30 to 268-30.
Exercise 3
Consider the url ‘http://www.gibbon.se/Retailer/Map.aspx?SectionId=832’
Extract all the options (countries) available in the select (drop-down) menu.
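A possible starting point, sketched on made-up markup: the `select` id, the option values, and the placeholder entry below are assumptions, not the structure of the real page.

```r
library(rvest)

# Hypothetical drop-down; the real page will use different ids and values.
page <- read_html('<html><body>
  <select id="country">
    <option value="">Choose a country</option>
    <option value="SE">Sweden</option>
    <option value="NO">Norway</option>
  </select>
</body></html>')

# Every <option> inside the chosen <select>.
opts <- html_nodes(page, "select#country option")

countries <- data.frame(
  value = html_attr(opts, "value"),
  label = html_text(opts, trim = TRUE),
  stringsAsFactors = FALSE
)

# Drop the placeholder entry, which has an empty value in this example.
countries <- countries[countries$value != "", ]
print(countries)
```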
Exercise 4
Consider the url ‘http://r-exercises.com/start-here-to-learn-r/’
Extract all the topics available on the url.
Exercise 5
Consider the url ‘http://www.immobiliare.it/Roma/agenzie_immobiliari_provincia-Roma.html’
Extract the names of all the real estate agencies listed on the first page.
Exercise 6
Consider the url ‘http://www.gibbon.se/Retailer/Map.aspx?SectionId=832’.
Extract the links to the detailed information of each row on the table.
For example, for the first address, Karlbergsvägen 32, 113 27 stockholm, the details are
A.E.N HUND I STAN AB
ADRESS OCH ÖPPETTIDER
Karlbergsvägen 32
113 27 STOCKHOLM
Öppettider:
Telefon: 08-313058
Mail-adress: info@hundistan.eu
Hemsida:
The link to those details (reached by clicking on Karlbergsvägen 32, 113 27 stockholm) is http://www.gibbon.se/Retailer/Retailer.aspx?ItemId=45128.
You have to extract all the links available, one per row.
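One way to approach link extraction, sketched offline: read the `href` attribute of each anchor and resolve it against the page URL. The inline table, the assumption that the hrefs are relative, and the second `ItemId` are hypothetical; only the first link comes from the example above.

```r
library(rvest)

base_url <- "http://www.gibbon.se/Retailer/Map.aspx?SectionId=832"

# Hypothetical markup: one anchor per table row, with relative hrefs.
page <- read_html('<html><body><table>
  <tr><td><a href="Retailer.aspx?ItemId=45128">First address</a></td></tr>
  <tr><td><a href="Retailer.aspx?ItemId=99999">Second address</a></td></tr>
</table></body></html>')

# Take the href of every link found inside a table cell.
hrefs <- page %>%
  html_nodes("table td a") %>%
  html_attr("href")

# Turn relative hrefs into full URLs, using the page URL as the base.
links <- xml2::url_absolute(hrefs, base_url)
print(links)
```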
Exercise 7
Consider the url ‘https://www.bkk-klinikfinder.de/suche/suchergebnis.php?next=1’
Extract the links to the detailed information of each hospital. For example, for the hospital
Krankenhaus Dresden-Friedrichstadt Städtisches Klinikum, the details are available on the link:
https://www.bkk-klinikfinder.de/krankenhaus/index.php?id=26140094900
Exercise 8
Consider the url scraped in Exercise 7.
Extract the links to ‘Details’ for each hospital displayed on the first four pages.
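A rough sketch of how several result pages might be handled. It assumes that the `next` query parameter selects the page number, which you should check against the site. The helper `get_detail_links` and the demo markup are hypothetical.

```r
library(rvest)

# Assumption: ?next=1, ?next=2, ... address successive result pages.
base_url <- "https://www.bkk-klinikfinder.de/suche/suchergebnis.php?next="
page_urls <- paste0(base_url, 1:4)

# Hypothetical helper: return the href of every link whose text is "Details".
get_detail_links <- function(page) {
  anchors <- html_nodes(page, "a")
  is_details <- html_text(anchors, trim = TRUE) == "Details"
  html_attr(anchors[is_details], "href")
}

# Offline demo on made-up markup, so the helper can be tried without the site.
demo_page <- read_html('<html><body>
  <a href="/krankenhaus/index.php?id=1">Details</a>
  <a href="/impressum.php">Impressum</a>
  <a href="/krankenhaus/index.php?id=2">Details</a>
</body></html>')
demo_links <- get_detail_links(demo_page)
print(demo_links)

# Against the live site (not run here):
# all_links <- unlist(lapply(page_urls, function(u) get_detail_links(read_html(u))))
```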
Exercise 9
Consider the url ‘http://www.dictionary.com/browse/’ and the words ‘handy’, ‘whisper’, ‘lovely’, ‘scrape’.
Build a data frame where the first variable is “Word” and the second variable is “definitions”. Scrape the definitions from the url.
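The overall shape of a solution could look like the sketch below: build one URL per word, extract the definition text from each page, and bind the results into a data frame. The selector `.def-content`, the helper `get_definitions`, and the placeholder definitions are assumptions for illustration. Inspect the live page to find the real selector.

```r
library(rvest)

base_url <- "http://www.dictionary.com/browse/"
words <- c("handy", "whisper", "lovely", "scrape")
word_urls <- paste0(base_url, words)

# Hypothetical helper: collapse all matching definition nodes into one string.
get_definitions <- function(page, selector = ".def-content") {
  defs <- page %>% html_nodes(selector) %>% html_text(trim = TRUE)
  paste(defs, collapse = "; ")
}

# Offline demo pages with placeholder text instead of real definitions.
demo_pages <- list(
  handy = read_html('<html><body>
    <div class="def-content">placeholder definition A</div>
    <div class="def-content">placeholder definition B</div>
  </body></html>'),
  whisper = read_html('<html><body>
    <div class="def-content">placeholder definition C</div>
  </body></html>')
)

result <- data.frame(
  Word = names(demo_pages),
  definitions = vapply(demo_pages, get_definitions, character(1)),
  stringsAsFactors = FALSE,
  row.names = NULL
)
print(result)

# Against the live site (not run here):
# pages <- lapply(word_urls, read_html)
```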
Exercise 10
Consider the url ‘http://www.gibbon.se/Retailer/Map.aspx?SectionId=832’.
Build a data frame with all the information available for each row.
For example, for the first address, Karlbergsvägen 32, 113 27 stockholm, the details are
A.E.N HUND I STAN AB
ADRESS OCH ÖPPETTIDER
Karlbergsvägen 32
113 27 STOCKHOLM
Öppettider:
Telefon: 08-313058
Mail-adress: info@hundistan.eu
Hemsida:
For the second row, Inedalsgatan 5, 112 33 stockholm, the details are
ARKENZOO KUNGSHOLMEN A
ADRESS OCH ÖPPETTIDER
Kungs Zoo AB
Inedalsgatan 5
112 33 STOCKHOLM
Öppettider:
Telefon: 08-7248110
Mail-adress: kungsholmen@arkenzoo.se
Hemsida: www.arkenzoo.se
These details will be saved as the first two rows of the data.frame:
  Website address  Name of store           Phone Number  Email address            City       Country
1                  A.E.N Hund i Stan AB    08-313058     info@hundistan.eu        Stockholm  Sweden
2 www.arkenzoo.se  ArkenZoo Kungsholmen A  08-7248110    kungsholmen@arkenzoo.se  Stockholm  Sweden