--- title: "getting-started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{getting-started} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # metricminer `metricminer` is an R package that helps you mine metrics on common places on the web through the power of their APIs. It also helps make the data in a format that is easily used for a dashboard or other purposes. It will have an associated dashboard template and tutorials to help you fully use the data you retrieve with `metricminer` (but these are still under development!) You can [read the metricminer package documentation here](https://hutchdatascience.org/metricminer/). ## Apps supported Currently `metricminer` supports mining data from: - [Calendly](https://calendly.com/) - [GitHub](https://github.com/) - [Google Analytics](https://developers.google.com/analytics) - [Google Forms](https://www.google.com/forms/about/) - [Slido](https://admin.sli.do/events) export files stored on Googledrive ## Data format options `metricminer` attempts to retrieve API data for you and give you it to you in a format that is a tidy data.frame. this means metricminer has to be opinionated about what metrics it returns so it fits in a useful and human ready to read data frame. If you find that the data returned is not what you need you have two options (these options can be pursued concurrently): 1. You can set the `dataformat` argument to `"raw"` to see the original, unedited JSON formatted data as it was returned from the API. Then you can personally look for the data that you want and extract it. 2. You can post a GitHub issue to explain why the metric missing from the data frame formatted data should be included. And if possible and reasonable, we can work on including that data in the next version of `metricminer`. ## How to install You can install metricminer from CRAN. ``` install.packages("metricminer") ``` If you want the development version (not advised) you can install using the `remotes` package to install from GitHub. ``` if (!("remotes" %in% installed.packages())) { install.packages("remotes") } remotes::install_github("fhdsl/metricminer") ``` ``` library(metricminer) ``` ## Basic Usage To start, you need to `authorize()` the package to access your data. If you run `authorize()` you will be asked which app you'd like to authorize and whether you'd like to cache that auth information. If you already know which app you'd like to authorize, like `google` for example, you can run `authorize("google")`. Then follow the instructions on the upcoming screens and select the scopes you feel comfortable sharing (you generally just need read permissions for metricminer to be able to collect data). ``` authorize() ``` If you want to clear out authorizations and caches stored by `metricminer` you can run: ``` delete_creds() ``` ### GitHub You can retrieve metrics from a repository on GitHub doing this: ``` authorize("github") metrics <- get_github_repo_summary(repo = "fhdsl/metricminer") ``` ``` authorize("github") metrics <- get_github_repo_timecourse(repo = "fhdsl/metricminer") ``` ### Calendly You can retrieve calendly events information using this type of workflow: ``` authorize("calendly") user <- get_calendly_user() events <- list_calendly_events(user = user$resource$uri) ``` ### Google Analytics You can retrieve Google Analytics data for websites like this. First you have to retrieve your account information after you've authorized. ``` authorize("google") accounts <- get_ga_user() ``` Then you need to retrieve the properties (aka usually the websites you are tracking) underneath that account. ``` properties_list <- get_ga_properties(account_id = accounts$id[1]) ``` Just need to shave off the `properties/` bit from this string. ``` property_id <- gsub("properties/", "", properties_list$properties$name[1]) ``` Now we can collect some stats. In Google Analytics `metrics` are your basic numbers (how many visits to your website, etc.). ``` metrics <- get_ga_stats(property_id, stats_type = "metrics") ``` Whereas `dimensions` are more a list of events that have happened. So here's a list of people that have logged on. ``` dimensions <- get_ga_stats(property_id, stats_type = "dimensions") ``` Lastly, we have a third option of collecting `link_clicks` and the links they have clicked. This is also known as a dimension according to Google analytics, but often it isn't compatible for us to download link click data at the same time as other dimension data so in `metricminer` we collect them separately. ``` link_clicks <- get_ga_stats(property_id, stats_type = "link_clicks") ``` ### Google Forms You can retrieve Google form information and responses like this: ``` authorize("google") form_url <- "https://docs.google.com/forms/d/1Z-lMMdUyubUqIvaSXeDu1tlB7_QpNTzOk3kfzjP2Uuo/edit" form_info <- get_google_form(form_url) ``` ### Slido If you have used Slido for interactive slide sessions and collected that info and exported it to your googledrive you can use `metricminer` to collect that data as well. ``` drive_id <- "https://drive.google.com/drive/folders/0AJb5Zemj0AAkUk9PVA" slido_data <- get_slido_files(drive_id) ``` ### YouTube If you have a YouTube channel and the URL is https://www.youtube.com/watch?v=oMVVeZjHJ48 Then you can extract stats for the videos on that YouTube channel using that URL. ``` authorize("google") youtube_video_stats <- get_youtube_video_stats("oMVVeZjHJ48") youtube_playlist_stats <- get_youtube_playlist_stats("PL9bqxQvtZgAMblZJhg7e0_ThDD-pN4UqA") ``` ## Bulk Retrievals Maybe you just want to retrieval it ALL. We have som wrapper functions that will attempt to do this for you. These functions are a bit more precarious/risky in that there may be reasons certain websites/repos/events/data may not be able to be collected. So collecting repositories one by one will allow you more insight into what is happening. However, these bulk retrieval functions may help you if you want to grab ALL of your accounts data in one swoop. Just make sure to carefully look over and curate that data after it is attempted to be collected. You may find some retrievals are empty for potentially good reasons (for example if a google form has no responses to collect it will show up with "no responses" in the respective part of the list). ### GitHub bulk From GitHub you can attempt to collect repository metrics from all repositories from an account. ``` authorize("github") all_repos_metrics <- get_multiple_repos_metrics(owner = "fhdsl") ``` If you want to do this by giving a list of specific repositories you want data from you can just provide a vector of those repository's names like this: ``` repo_names <- c("fhdsl/metricminer", "jhudsl/OTTR_Template") some_repos_metrics <- get_multiple_repos_metrics(repo_names = repo_names) ``` ### Google Analytics bulk Similar to single website retrieval we need to authorize the package. ``` authorize("google") accounts <- get_ga_user() ``` Then we can provide the account id to `get_multiple_ga_metrics` and it will attempt to grab all stats for all website properties underneath the provided account. ``` account_stats_list <- get_multiple_ga_metrics(account_id = 209776907) stats_list <- stats_list <- get_multiple_ga_metrics(property_ids = c(422671031, 422558989)) ``` ### Google Forms bulk As always, we need to authorize the app. ``` authorize("google") ``` We can retrieve a list of form ids using `googledrive` R package. ``` form_list <- googledrive::drive_find( shared_drive = googledrive::as_id("0AJb5Zemj0AAkUk9PVA"), type = "form") ``` Now we can provide this vector of form ids to `get_multiple_forms` ``` multiple_forms <- get_multiple_forms(form_ids = form_list$id) ``` ## Non-interactive authorizing from secrets If you'd like to authorize non-interactively (whether on GitHub actions or locally) you can set your tokens using `Sys.setenv()` ### Setting Calendly auth from secret You can [go here to get an API key](https://calendly.com/integrations/api_webhooks). You likely will have to login first. Then you can store this by putting your API key in this type of command: ``` Sys.setenv(METRICMINER_CALENDLY = "Put calendly token here") ``` Now in your script if you run the following, you will have authorization to Calendly. ``` auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY")) ``` ### Setting GitHub auth from secret Similar steps can be done for the GitHub personal access token. First [go here to get a GitHub PAT](https://github.com/settings/tokens/new?description=metricminer&scopes=repo,read:packages,read:org). You will likely have to login first. Then you can run this command but put your GitHub PAT there. ``` Sys.setenv(METRICMINER_GITHUB_PAT = "Put GitHub PAT here") ``` Now in your script if you run the following, you will have authorization to GitHub. ``` # Authorize GitHub auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT")) ``` ### Setting Google auth from secret For Google you can authorize from secret by doing the normal interactive way using `authorize("google")` but storing the result like this: ``` token <- authorize("google") ``` Then you can use this object to extract two secrets by printing them out like this: `token$credentials$access_token` `token$credentials$refresh_token` Then you can set these in your environment doing the same steps as before: ``` Sys.setenv(METRICMINER_GOOGLE_ACCESS = "Google access token here") Sys.setenv(METRICMINER_GOOGLE_REFRESH = "Google refresh token here") ``` Now in your script if you run the following you will have authorization to Google Apps. ``` # Authorize Google auth_from_secret("google", refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"), access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"), cache = TRUE ) ``` ## Authorizing on GitHub Actions In GitHub you can run `metricminer` using authorization if you use the above steps to retrieve the necessary keys but then store them each as GitHub Secrets. [Read here about how to store GitHub secrets](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions#creating-secrets-for-a-repository) You'll need the secrets to be stored as the respective key name we've referenced above: ``` METRICMINER_CALENDLY METRICMINER_GITHUB_PAT METRICMINER_GOOGLE_REFRESH METRICMINER_GOOGLE_ACCESS ``` Then in your GitHub action yaml you'll need something like this to extract and authorize these secrets in the environment. ``` - name: Authorize metricminer env: METRICMINER_CALENDLY: ${{ secrets.METRICMINER_CALENDLY }} METRICMINER_GITHUB_PAT: ${{ secrets.METRICMINER_GITHUB_PAT }} METRICMINER_GOOGLE_ACCESS: ${{ secrets.METRICMINER_GOOGLE_ACCESS }} METRICMINER_GOOGLE_REFRESH: ${{ secrets.METRICMINER_GOOGLE_REFRESH }} run: | # Authorize Calendly auth_from_secret("calendly", token = Sys.getenv("METRICMINER_CALENDLY")) # Authorize GitHub auth_from_secret("github", token = Sys.getenv("METRICMINER_GITHUB_PAT")) # Authorize Google auth_from_secret("google", refresh_token = Sys.getenv("METRICMINER_GOOGLE_REFRESH"), access_token = Sys.getenv("METRICMINER_GOOGLE_ACCESS"), cache = TRUE ) ### Now run the R commands you want here or call an R script in a later step. shell: Rscript {0} ``` ### Session info ```{r} sessionInfo() ```