tidyquery runs SQL queries on R data frames.
It uses queryparser to translate SQL queries into R expressions, then it uses dplyr to evaluate these expressions and return results. tidyquery does not load data frames into a database; it queries them in place.
Install the released version of tidyquery from CRAN with:
Or install the development version from GitHub with:
# install.packages("devtools") devtools::install_github("ianmcook/tidyquery")
tidyquery exports two functions: query() and show_dplyr().
To run a SQL query on an R data frame, call the function query(), passing a SELECT statement enclosed in quotes as the first argument. The table names in the FROM clause should match the names of data frames in your current R session:
library(tidyquery) library(nycflights13) query( " SELECT origin, dest, COUNT(flight) AS num_flts, round(SUM(seats)) AS num_seats, round(AVG(arr_delay)) AS avg_delay FROM flights f LEFT OUTER JOIN planes p ON f.tailnum = p.tailnum WHERE distance BETWEEN 200 AND 300 AND air_time IS NOT NULL GROUP BY origin, dest HAVING num_flts > 3000 ORDER BY num_seats DESC, avg_delay ASC LIMIT 2;" ) #> # A tibble: 2 x 5 #> origin dest num_flts num_seats avg_delay #> <chr> <chr> <int> <dbl> <dbl> #> 1 LGA DCA 4468 712643 6 #> 2 EWR BOS 5247 611192 5
Alternatively, for single-table queries, you can pass a data frame as the first argument and a SELECT statement as the second argument, omitting the FROM clause. This allows query() to function like a dplyr verb:
library(dplyr) airports %>% query("SELECT name, lat, lon ORDER BY lat DESC LIMIT 5") #> # A tibble: 5 x 3 #> name lat lon #> <chr> <dbl> <dbl> #> 1 Dillant Hopkins Airport 72.3 42.9 #> 2 Wiley Post Will Rogers Mem 71.3 -157. #> 3 Wainwright Airport 70.6 -160. #> 4 Wainwright As 70.6 -160. #> 5 Atqasuk Edward Burnell Sr Memorial Airport 70.5 -157.
You can chain dplyr verbs before and after query():
planes %>% filter(engine == "Turbo-fan") %>% query("SELECT manufacturer AS maker, COUNT(*) AS num_planes GROUP BY maker") %>% arrange(desc(num_planes)) %>% head(5) #> # A tibble: 5 x 2 #> maker num_planes #> <chr> <int> #> 1 BOEING 1276 #> 2 BOMBARDIER INC 368 #> 3 AIRBUS 331 #> 4 EMBRAER 298 #> 5 AIRBUS INDUSTRIE 270
In the SELECT statement, the names of data frames and columns are case-sensitive (like in R) but keywords and function names are case-insensitive (like in SQL).
In addition to R data frames and tibbles (tbl_df objects), query() can be used to query other data frame-like objects, including:
tbl_sql objects created with dbplyr or a dbplyr backend package, enabling you to write SQL which is translated to dplyr then translated back to SQL and run in a database 🤪
tidyquery works by generating dplyr code. To print the dplyr code instead of running it, use show_dplyr():
show_dplyr( " SELECT manufacturer, COUNT(*) AS num_planes FROM planes WHERE engine = 'Turbo-fan' GROUP BY manufacturer ORDER BY num_planes DESC;" ) #> planes %>% #> filter(engine == "Turbo-fan") %>% #> group_by(manufacturer) %>% #> summarise(num_planes = dplyr::n()) %>% #> ungroup() %>% #> arrange(dplyr::desc(num_planes))
tidyquery also has the following additional limitations:
Joins involving three or more tables are not supported.
Because joins in dplyr currently work in a fundamentally different way than joins in SQL, some other types of join queries are not supported. Examples of unsupported join queries include non-equijoin queries and outer join queries with qualified references to the join column(s). Planned changes in dplyr will enable future versions of tidyquery to support more types of joins.
In the code printed by show_dplyr(), calls to functions with more than five arguments might be truncated, with arguments after the fifth replaced with .... This is caused by a current limitation of the rlang package that is expected to be resolved in a future version.
The sqldf package (CRAN, GitHub) runs SQL queries on R data frames by transparently setting up a database, loading data from R data frames into the database, running SQL queries in the database, and returning results as R data frames.