Advanced DBI Usage

James Wondrasek, Kirill Müller

17/03/2020

Who this tutorial is for

This tutorial is for you if you need to use a richer set of SQL features such as data manipulation queries, parameterized queries and queries performed using SQL’s transaction features. See vignette("DBI", package = "DBI") for a more basic tutorial covering connecting to DBMS and executing simple queries.

How to run more complex queries using DBI

dbGetQuery() works by calling a number of functions behind the scenes. If you need more control you can manually build your own query, retrieve results at your selected rate, and release the resources involved by calling the same functions.

These functions are:

library(DBI)

con <- dbConnect(
  RMariaDB::MariaDB(),
  host = "db.relational-data.org",
  port = 3306,
  username = "guest",
  password = "relational",
  dbname = "sakila"
)

res <- dbSendQuery(con, "SELECT * FROM film WHERE rating = 'G'")
df <- dbFetch(res, n = 3)
dbClearResult(res)

head(df, 3)
##   film_id            title
## 1       2   ACE GOLDFINGER
## 2       4 AFFAIR PREJUDICE
## 3       5      AFRICAN EGG
##                                                                                                             description
## 1                  A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China
## 2                          A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank
## 3 A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico
##   release_year language_id original_language_id rental_duration rental_rate
## 1         2006           1                   NA               3        4.99
## 2         2006           1                   NA               5        2.99
## 3         2006           1                   NA               6        2.99
##   length replacement_cost rating               special_features
## 1     48            12.99      G        Trailers,Deleted Scenes
## 2    117            26.99      G Commentaries,Behind the Scenes
## 3    130            22.99      G                 Deleted Scenes
##           last_update
## 1 2006-02-15 04:03:42
## 2 2006-02-15 04:03:42
## 3 2006-02-15 04:03:42

How to read part of a table from a database

If your dataset is large you may want to fetch a limited number of rows at a time. As demonstrated below, this can be accomplished by using a while loop where the function dbHasCompleted() is used to check for ongoing rows, and dbFetch() is used with the n = X argument, specifying how many rows to return on each iteration. Again, we call dbClearResult() at the end to release resources.

res <- dbSendQuery(con, "SELECT * FROM film")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 300)
  print(nrow(chunk))
}
## [1] 300
## [1] 300
## [1] 300
## [1] 100
dbClearResult(res)

How to use parameters (safely) in SQL queries

dbSendQuery() can be used with parameterized SQL queries. DBI supports two ways to avoid SQL injection attacks from user-supplied parameters: quoting and parameterized queries.

Quoting

Quoting of parameter values is performed using the function dbQuoteLiteral(), which supports many R data types, including date and time.1

In cases where users may be supplying table or column names to use in the query for data retrieval, those names or identifiers must also be escaped. As there may be DBMS-specific rules for escaping these identifiers, DBI provides the function dbQuoteIdentifier() to generate a safe string representation.

safe_id <- dbQuoteIdentifier(con, "rating")
safe_param <- dbQuoteLiteral(con, "G")

query <- paste0("SELECT title, ", safe_id, " FROM film WHERE ", safe_id, " = ", safe_param)
query
## [1] "SELECT title, `rating` FROM film WHERE `rating` = 'G'"
res <- dbSendQuery(con, query)
dbFetch(res)
##              title rating
## 1   ACE GOLDFINGER      G
## 2 AFFAIR PREJUDICE      G
## 3      AFRICAN EGG      G
## Showing 3 out of 178 rows.
dbClearResult(res)

The same result can be had by using glue::glue_sql(). It performs the same safe quoting on any variable or R statement appearing between braces within the query string.

id <- "rating"
param <- "G"
query <- glue::glue_sql("SELECT title, {`id`} FROM film WHERE {`id`} = {param}", .con = con)

df <- dbGetQuery(con, query)
head(df, 3)
##              title rating
## 1   ACE GOLDFINGER      G
## 2 AFFAIR PREJUDICE      G
## 3      AFRICAN EGG      G

Parameterized queries

Rather than performing the parameter substitution ourselves, we can push it to the DBMS by including placeholders in the query. Different DBMS use different placeholder schemes, DBI passes through the SQL expression unchanged.

MariaDB uses a question mark (?) as placeholder and expects an unnamed list of parameter values. Other DBMS may use named parameters. We recommend consulting the documentation for the DBMS you are using. As an example, a web search for “mariadb parameterized queries” leads to the documentation for the PREPARE statement which mentions:

Within the statement, “?” characters can be used as parameter markers to indicate where data values are to be bound to the query later when you execute it.

Currently there is no list of which placeholder scheme a particular DBMS supports.

Placeholders only work for literal values. Other parts of the query, e.g. table or column identifiers, still need to be quoted with dbQuoteIdentifier().

For a single set of parameters, the params argument to dbSendQuery() or dbGetQuery() can be used. It takes a list and its members are substituted in order for the placeholders within the query.

params <- list("G")
safe_id <- dbQuoteIdentifier(con, "rating")

query <- paste0("SELECT * FROM film WHERE ", safe_id, " = ?")
query
## [1] "SELECT * FROM film WHERE `rating` = ?"
res <- dbSendQuery(con, query, params = params)
dbFetch(res, n = 3)
##   film_id            title
## 1       2   ACE GOLDFINGER
## 2       4 AFFAIR PREJUDICE
## 3       5      AFRICAN EGG
##                                                                                                             description
## 1                  A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China
## 2                          A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank
## 3 A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico
##   release_year language_id original_language_id rental_duration rental_rate
## 1         2006           1                   NA               3        4.99
## 2         2006           1                   NA               5        2.99
## 3         2006           1                   NA               6        2.99
##   length replacement_cost rating               special_features
## 1     48            12.99      G        Trailers,Deleted Scenes
## 2    117            26.99      G Commentaries,Behind the Scenes
## 3    130            22.99      G                 Deleted Scenes
##           last_update
## 1 2006-02-15 04:03:42
## 2 2006-02-15 04:03:42
## 3 2006-02-15 04:03:42
dbClearResult(res)

Below is an example query using multiple placeholders with the MariaDB driver. The placeholders are supplied as a list of values ordered to match the position of the placeholders in the query.

q_params <- list("G", 90)
query <- "SELECT title, rating, length FROM film WHERE rating = ? AND length >= ?"

res <- dbSendQuery(con, query, params = q_params)
dbFetch(res, n = 3)
##              title rating length
## 1 AFFAIR PREJUDICE      G    117
## 2      AFRICAN EGG      G    130
## 3  ALAMO VIDEOTAPE      G    126
dbClearResult(res)

When you wish to perform the same query with different sets of parameter values, dbBind() is used. There are two ways to use dbBind(). Firstly, it can be used multiple times with same query.

res <- dbSendQuery(con, "SELECT * FROM film WHERE rating = ?")
dbBind(res, list("G"))
dbFetch(res, n = 3)
##   film_id            title
## 1       2   ACE GOLDFINGER
## 2       4 AFFAIR PREJUDICE
## 3       5      AFRICAN EGG
##                                                                                                             description
## 1                  A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China
## 2                          A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank
## 3 A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico
##   release_year language_id original_language_id rental_duration rental_rate
## 1         2006           1                   NA               3        4.99
## 2         2006           1                   NA               5        2.99
## 3         2006           1                   NA               6        2.99
##   length replacement_cost rating               special_features
## 1     48            12.99      G        Trailers,Deleted Scenes
## 2    117            26.99      G Commentaries,Behind the Scenes
## 3    130            22.99      G                 Deleted Scenes
##           last_update
## 1 2006-02-15 04:03:42
## 2 2006-02-15 04:03:42
## 3 2006-02-15 04:03:42
dbBind(res, list("PG"))
dbFetch(res, n = 3)
##   film_id            title
## 1       1 ACADEMY DINOSAUR
## 2       6     AGENT TRUMAN
## 3      12   ALASKA PHANTOM
##                                                                                        description
## 1 A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies
## 2        A Intrepid Panorama of a Robot And a Boy who must Escape a Sumo Wrestler in Ancient China
## 3               A Fanciful Saga of a Hunter And a Pastry Chef who must Vanquish a Boy in Australia
##   release_year language_id original_language_id rental_duration rental_rate
## 1         2006           1                   NA               6        0.99
## 2         2006           1                   NA               3        2.99
## 3         2006           1                   NA               6        0.99
##   length replacement_cost rating                 special_features
## 1     86            20.99     PG Deleted Scenes,Behind the Scenes
## 2    169            17.99     PG                   Deleted Scenes
## 3    136            22.99     PG      Commentaries,Deleted Scenes
##           last_update
## 1 2006-02-15 04:03:42
## 2 2006-02-15 04:03:42
## 3 2006-02-15 04:03:42
dbClearResult(res)

Secondly, dbBind() can be used to execute the same statement with multiple values at once.

res <- dbSendQuery(con, "SELECT * FROM film WHERE rating = ?")
dbBind(res, list(c("G", "PG")))
dbFetch(res, n = 3)
##   film_id            title
## 1       2   ACE GOLDFINGER
## 2       4 AFFAIR PREJUDICE
## 3       5      AFRICAN EGG
##                                                                                                             description
## 1                  A Astounding Epistle of a Database Administrator And a Explorer who must Find a Car in Ancient China
## 2                          A Fanciful Documentary of a Frisbee And a Lumberjack who must Chase a Monkey in A Shark Tank
## 3 A Fast-Paced Documentary of a Pastry Chef And a Dentist who must Pursue a Forensic Psychologist in The Gulf of Mexico
##   release_year language_id original_language_id rental_duration rental_rate
## 1         2006           1                   NA               3        4.99
## 2         2006           1                   NA               5        2.99
## 3         2006           1                   NA               6        2.99
##   length replacement_cost rating               special_features
## 1     48            12.99      G        Trailers,Deleted Scenes
## 2    117            26.99      G Commentaries,Behind the Scenes
## 3    130            22.99      G                 Deleted Scenes
##           last_update
## 1 2006-02-15 04:03:42
## 2 2006-02-15 04:03:42
## 3 2006-02-15 04:03:42
dbClearResult(res)

Use a list of vectors if your query has multiple parameters:

q_params <- list(c("G", "PG"), c(90, 120))
query <- "SELECT title, rating, length FROM film WHERE rating = ? AND length >= ?"

res <- dbSendQuery(con, query, params = q_params)
dbFetch(res, n = 3)
##              title rating length
## 1 AFFAIR PREJUDICE      G    117
## 2      AFRICAN EGG      G    130
## 3  ALAMO VIDEOTAPE      G    126
dbClearResult(res)

Always disconnect from the database when done.

dbDisconnect(con)

SQL data manipulation - UPDATE, DELETE and friends

For SQL queries that affect the underlying database, such as UPDATE, DELETE, INSERT INTO, and DROP TABLE, DBI provides two functions. dbExecute() passes the SQL statement to the DBMS for execution and returns the number of rows affected. dbSendStatement() performs in the same manner, but returns a result object. Call dbGetRowsAffected() with the result object to get the count of the affected rows. You then need to call dbClearResult() with the result object afterwards to release resources.

In actuality, dbExecute() is a convenience function that calls dbSendStatement(), dbGetRowsAffected(), and dbClearResult(). You can use these functions if you need more control over the query process.

The subsequent examples use an in-memory SQL database provided by RSQLite::SQLite(), because the remote database used in above examples does not allow writing.

library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "cars", head(cars, 3))

dbExecute(
  con,
  "INSERT INTO cars (speed, dist) VALUES (1, 1), (2, 2), (3, 3)"
)
## [1] 3
rs <- dbSendStatement(
  con,
  "INSERT INTO cars (speed, dist) VALUES (4, 4), (5, 5), (6, 6)"
)
dbGetRowsAffected(rs)
## [1] 3
dbClearResult(rs)

dbReadTable(con, "cars")
##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## Showing 3 out of 9 rows.

Do not forget to disconnect from the database at the end.

dbDisconnect(con)

SQL transactions with DBI

DBI allows you to group multiple queries into a single atomic transaction. Transactions are initiated with dbBegin() and either made persistent with dbCommit() or undone with dbRollback(). The example below updates two tables and ensures that either both tables are updated, or no changes are persisted to the database and an error is thrown.

con <- dbConnect(RSQLite::SQLite(), ":memory:")

dbWriteTable(con, "cash", data.frame(amount = 100))
dbWriteTable(con, "account", data.frame(amount = 2000))

withdraw <- function(amount) {
  # All operations must be carried out as logical unit:
  dbExecute(con, "UPDATE cash SET amount = amount + ?", list(amount))
  dbExecute(con, "UPDATE account SET amount = amount - ?", list(amount))
}

withdraw_transacted <- function(amount) {
  # Ensure atomicity
  dbBegin(con)

  # Perform operation
  withdraw(amount)

  # Persist results
  dbCommit(con)
}

withdraw_transacted(300)

After withdrawing 300 credits, the cash is increased and the account is decreased by this amount. The transaction ensures that either both operations succeed, or no change occurs.

dbReadTable(con, "cash")
##   amount
## 1    400
dbReadTable(con, "account")
##   amount
## 1   1700

We can roll back changes manually if necessary. Do not forget to call dbRollback() in case of error, otherwise the transaction remains open indefinitely.

withdraw_if_funds <- function(amount) {
  dbBegin(con)
  withdraw(amount)
  # Rolling back after detecting negative value on account:
  if (dbReadTable(con, "account")$amount >= 0) {
    dbCommit(con)
    TRUE
  } else {
    message("Insufficient funds")
    dbRollback(con)
    FALSE
  }
}

withdraw_if_funds(5000)
## Insufficient funds
## [1] FALSE
dbReadTable(con, "cash")
##   amount
## 1    400
dbReadTable(con, "account")
##   amount
## 1   1700

dbWithTransaction() simplifies using transactions. Pass it a connection and the code you want to run as a transaction. It will execute the code and call dbCommit() on success and call dbRollback() if an error is thrown.

withdraw_safely <- function(amount) {
  dbWithTransaction(con, {
    withdraw(amount)
    if (dbReadTable(con, "account")$amount < 0) {
      stop("Error: insufficient funds", call. = FALSE)
    }
  })
}

withdraw_safely(5000)
## Error: Error: insufficient funds
dbReadTable(con, "cash")
##   amount
## 1    400
dbReadTable(con, "account")
##   amount
## 1   1700

As usual, do not forget to disconnect from the database when done.

dbDisconnect(con)

Conclusion

That concludes the major features of DBI. For more details on the library functions covered in this tutorial and the vignette("DBI", package = "DBI") introductory tutorial see the DBI specification at vignette("spec", package = "DBI"). If you are after a data manipulation library that works at a higher level of abstraction, check out dplyr. It is a grammar of data manipulation that can work with local dataframes and remote databases and uses DBI under the hood.


  1. An older method, dbQuoteString(), was used to quote string values only. The dbQuoteLiteral() method forwards to dbQuoteString() for character vectors. Users do not need to distinguish between these two cases.↩︎