--- title: "Creating Migrations" author: "James P. Gilbert" date: "`r Sys.Date()`" output: pdf_document: toc: yes html_document: number_sections: yes toc: yes vignette: > %\VignetteIndexEntry{Creating Migrations} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} --- # Introduction Migrating existing data models can be a tricky process that often creates incompatibility between result viewers and existing result sets. This guide aims to show how to use the `ResultModelManager` class to create migrations for a given result model, either using a package or file structure. Please see the [HADES library](https://ohdsi.github.io/hades) for more information on HADES packages. # Assumptions This package assumes that you are familiar with R and OHDSI Hades packages in general. These examples make use of `DatabaseConnector` and `SqlRender`. The management of data integrity is left to the user, migrations should be designed and tested before deployment. Steps to maintain the data (such as backup plans) should be made prior to performing migrations in case of data corruption. # Creating the required file structure The first step is creating a proper folder structure for migrations. The chosen path is dependent on the structure used, the most consistent and recommended way is to expose a function within an R package to allow users to upgrade a data model. However, a flat folder structure that does not require an R package to be installed is also supported. ## In an R package Data migrations should exist in an isolated folder within the `/inst/sql/` directory of a package. The recommended convention is to use `migrations` across all Hades package. As migrations are supported by multiple database platforms this folder should exist within the generic (and SqlRender OHDSI common sql) `sql_server` folder, `inst/sql/sql_server/migrations`. For any database specific migrations they should be in the approprate sub directory. For example: ```{bash eval = FALSE} inst/sql/ sql_server/migrations/Migration_1-create.sql sqlite/migrations/Migration_1-create.sql redshift/migrations/Migration_1-create.sql ``` ## Using folder structure A folder structure requires a slightly different set up. Here, the migrations should be split via database platform within the `migration` path. For example. ```{bash eval = FALSE} migrations/ sql_server/Migration_1-create.sql sqlite/Migration_1-create.sql redshift/Migration_1-create.sql ``` # Adding a migration All data migrations are assumed to be in OHSI SQL and stored within a migration folder (see above for set up). Inside this folder only migrations that conform to a regular expression such as `(Migration_[0-9]+)-(.+).sql`. Explicitly, this encodes several things: * That the file is a migration and only intended to be executed once and by a DMM instance * The position in the sequence in which the migration will be executed (i.e. a natural number) * The string name of the migration * The fact that its an sql file For example, the following file names will work: ``` Migration_2-MyMigration.sql Migration_2-v3.2whaterver.sql Migration_4-TEST.sql Migration_4-2018922-vAAAA.sql ``` However, the following would be invalid: ``` MyMigration.sql # Does not include Migration_1 Migration_2v3.2whaterver.sql # missing - -TEST_Migration_1.sql # Wrong order Migraton_4-a.sql # Migration spelt wrong ``` # Adding migrator Each package/project should expose an instantiated DMM with the package specfic considerations. For example, for the package `CohortDiagnostics` a function such as the following may be written: ```{r, eval = FALSE} #' @export getDataMigrator <- function(connectionDetails, databaseSchema, tablePrefix) { ResultModelManager::DataMigrationManager$new( connectionDetails = connectionDetails, databaseSchema = databaseSchema, tablePrefix = tablePrefix, migrationPath = "migrations", packageName = "CohortDiagnostics" ) } ``` This will return an instance of data migrator that will expose teh functionality on a given data set. Naturally, the package is not strictly required for creating a migration manager (should the directory structure conform to the above outline) here you should set it according to your project's set up. Loading the migrator is then straightforward: ```{r eval = FALSE} connectionDetails <- DatabaseConnector::createConnectionDetails(MySettings) migrator <- getDataMigrator(connectionDetails = connectionDetails, databaseSchema = "mySchema", tablePrefix = "cd_") ``` To check migrations are valid ```{r eval = FALSE} migrator$check() # Will return false and display any eronious files ``` To get the status of all migrations ```{r eval = FALSE} migrator$getStatus() # Will return data frame of all sql migrations and if they have been executed or not ``` To run the migrations: ```{r eval = FALSE} ## It is strongly recommended that you create some form of backup before doing this migrator$executeMigrations() ``` # Unit testing No specific advice is given for how to write unit tests for migrations, however, it is strongly advised that migrations are unit tested. # Common issues The following is a list of expected issues when handling Data migrations: ## Supporting all database platforms It is likely a challenge to support all `SqlRender/DatabaseConnector` supported dbmses. Therefore, careful consideration with regards to supported platforms should be made. At the time of writing, for results handling, we recommend supporting the open source platforms of SqlRender and Postgresql. This decision is left to the package author. ## SQLite column types It is a not possible to change a data type within an Sqlite table (the `ALTER TABLE` command does not work). Consequently, you will likely have to rename the existing table, create a new table with the modified DDL and then copy the existing data across (using appropriate data transformations/casting). For example, changing an `INT` column in the table `foo` to a float requires the sqlite specific transformation: ```{sqlite-sql} {DEFAULT @foo = foo} ALTER TABLE @database_schema.@table_prefix@foo RENAME TO _foo_old; CREATE TABLE @database_schema.@table_prefix@foo ( id bigint, foo float ); INSERT INTO @database_schema.@table_prefix@foo (id, foo) SELECT * FROM _foo_old; ``` ## Non-existent data The presence of a data model does not mean data is present. As packages are developed, it is expected that new data formats will be created. The recommended pattern for this case is to allow existing data to be upgraded but to handle the use case of missing data in downstream reports/web applications.