Programmatically drive a web browser
Whenever you need to programmatically drive a web browser.
Most often:
Prerequisites: JRE or JDK installed on your system, Mozilla Firefox
install.packages(c("RSelenium", "keyring", "rvest", "magrittr"))
Download selenium-server-standalone-4.0.0-alpha-2.jar (or whatever is the latest ‘selenium-server-standalone’ file)
Download the latest Mozilla geckodriver release, and place it in the same directory as the jar file
At the terminal, first cd to the directory where your two new files are saved, then run:
java -jar selenium-server-standalone-4.0.0-alpha-2.jar
The Selenium server must be up and running before you execute the R code below.
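As a rough pre-flight check, something like the following can confirm from R that a process is listening on the server port before the script runs. This is only a sketch: `selenium_is_up` is a hypothetical helper name (not part of RSelenium), it assumes the default address `localhost:4444` used in the script, and it only tests that the TCP port accepts connections, not that the listener is actually Selenium.

```r
# Hypothetical helper, not an RSelenium function.
# Returns TRUE if something accepts TCP connections on host:port, else FALSE.
selenium_is_up <- function(host = "localhost", port = 4444L, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host = host, port = port, open = "r+",
                       blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused or timed out
  )
  if (is.null(con)) return(FALSE)
  close(con)
  TRUE
}

if (!selenium_is_up()) {
  message("Nothing is listening on localhost:4444; start the Selenium jar first.")
}
```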
library(RSelenium)
library(keyring)
library(rvest)
library(magrittr)
# Start Selenium Session
remDr <- remoteDriver(
  remoteServerAddr = "localhost",
  port = 4444L,
  browserName = "firefox"
)
remDr$open()
# Navigate to login page
remDr$navigate("https://website.com/login")
Sys.sleep(5) # Give page time to load
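# Find 'username' element and send the value stored under 'saved_user' in the
# system keyring (store it once beforehand, e.g. with keyring::key_set("saved_user"))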
webElem1 <- remDr$findElement(using = "xpath", "//input[@name = 'username']")
webElem1$sendKeysToElement(list(key_get("saved_user")))
# Find 'password' element and send 'saved_pass' and 'enter' keystroke as input
webElem2 <- remDr$findElement(using = "xpath", "//input[@name = 'password']")
webElem2$sendKeysToElement(list(key_get("saved_pass"), key = "enter"))
Sys.sleep(5) # Give page time to load
# Navigate to desired page and download source
remDr$navigate("https://website.com/somepage")
Sys.sleep(5) # Give page time to load
html <- remDr$getPageSource()[[1]] %>% read_html()
# Use further rvest commands to extract required data
# ...
# End Selenium Session
remDr$close()
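The extraction step left open in the script depends entirely on the page being scraped. As an illustration only, the sketch below runs a few rvest calls on a made-up HTML string standing in for the output of `remDr$getPageSource()[[1]]`. The markup, the `page_source` name, and the selectors are all assumptions, and `html_element()` / `html_text2()` assume rvest 1.0.0 or later.

```r
library(rvest)

# Hypothetical page source standing in for remDr$getPageSource()[[1]]
page_source <- "<html><body>
  <h1>Account summary</h1>
  <table>
    <tr><th>item</th><th>value</th></tr>
    <tr><td>a</td><td>1</td></tr>
    <tr><td>b</td><td>2</td></tr>
  </table>
</body></html>"

html <- read_html(page_source)

# Text of the first <h1> element
heading <- html %>% html_element("h1") %>% html_text2()

# First <table> on the page, parsed into a data frame (tibble)
tbl <- html %>% html_element("table") %>% html_table()
```

On a real page, the CSS selectors would be replaced with ones matching the elements of interest, found for example with the browser's inspector.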
Basic vignette: https://docs.ropensci.org/RSelenium/articles/basics.html
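The fixed `Sys.sleep(5)` pauses in the script are simple but either waste time or prove too short on a slow connection. One possible alternative is to poll until the element appears. The sketch below assumes that `findElement()` signals an error while the element is absent; `wait_for` is a hypothetical helper, not an RSelenium function, and the timeout values are arbitrary.

```r
# Hypothetical helper, not an RSelenium function.
# Calls `find()` repeatedly until it returns without error,
# or stops with an error once `timeout` seconds have elapsed.
wait_for <- function(find, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(find(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (Sys.time() >= deadline) stop("Timed out waiting for element")
    Sys.sleep(interval)
  }
}

# Usage with the session from the script (requires a running server):
# webElem1 <- wait_for(function() {
#   remDr$findElement(using = "xpath", "//input[@name = 'username']")
# })
```

Passing the lookup as a function keeps the helper independent of RSelenium, so the same wrapper can be reused for any step that may fail until the page has finished loading.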
For attribution, please cite this work as
shikokuchuo (2021, May 3). shikokuchuo{net}: R | Selenium. Retrieved from https://shikokuchuo.net/posts/03-rselenium/
BibTeX citation
@misc{shikokuchuo2021r,
  author = {shikokuchuo, },
  title = {shikokuchuo{net}: R | Selenium},
  url = {https://shikokuchuo.net/posts/03-rselenium/},
  year = {2021}
}