Data migration in NestJS
Here’s a pain-free way to change the structure of data in NestJS
Coming to NestJS from a Ruby on Rails background, there’s a lot to like, a lot that’s familiar and also some stuff I miss. But I have a great time trying to replicate those missing features.
Take, for example, data migrations. When you’re working on a live project, you might want to change aspects of the data structure. A simple example is a books database where each book has a single author.
Then, one day, your client informs you that some books have multiple authors, and has asked if you can make it possible for a book to reflect this.
To a non-technical client, this seems like a simple request, but we devs know it’s not. We’ll have to set up a new relation (probably through a join table) and transfer the existing data. Setting up the new relation is easy enough, but how do we move that data from the old relation to the new relation?
A naïve solution might be to make that change directly in the database migration, but, dear reader, here be dragons. Writing our data migrations in this way means that they stay around forever. If something in our codebase changes and a new developer runs the whole migration suite, this could cause the migrations to fail. The full pitfalls of this approach are outlined in this great blog post by Elle Meredith at Thoughtbot.
Another solution might be to add automated data migrations that work in a similar way to database migrations: they run at deploy time and update a migrations table to let the codebase know when a migration has run.
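To make that idea concrete, here is a rough sketch of what such a runner does under the hood. Everything in it is made up for illustration: the `TrackedMigration` shape, the `runPendingMigrations` function, and the `Set` that stands in for the migrations table. It is not how any particular library implements this.

```typescript
// Hypothetical shape of an automated data migration.
interface TrackedMigration {
  id: string; // e.g. a timestamped filename
  run(): Promise<void>;
}

// Runs every migration whose id is not yet recorded, in id order,
// and records each id only after its run() has succeeded.
async function runPendingMigrations(
  migrations: TrackedMigration[],
  completedIds: Set<string>, // stands in for the migrations table
): Promise<string[]> {
  const ranNow: string[] = [];
  const ordered = [...migrations].sort((a, b) => a.id.localeCompare(b.id));
  for (const migration of ordered) {
    if (completedIds.has(migration.id)) {
      continue; // already recorded as done
    }
    await migration.run();
    completedIds.add(migration.id);
    ranNow.push(migration.id);
  }
  return ranNow;
}
```

Note that if `run()` throws halfway through, its id is never recorded, yet any writes it already made stay in place, so the next attempt starts from a half-migrated state.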
In the past, we’ve used data-migrate for this, but this comes with pitfalls. What happens if the deploy fails? We can often end up with strange errors and changes in state, so it’s nicer to have a bit more control.
My recent and preferred approach is having small, short-lived scripts in the codebase that we run manually after deploying. In the example of changing our books table to have multiple authors, we would do the following:
- add a new authors relation
- write a data migration to move the author relation to the authors relation
- deploy the code
- run the migration in the production environment
- remove the old relation
- deploy that code
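The data-moving step in that list boils down to a small transformation per record. As a rough sketch, assuming a hypothetical `BookRecord` shape with the old `author` field and the new `authors` array living side by side (the real entities will depend on your ORM), it might look like this:

```typescript
// Hypothetical shapes for illustration; the real entities depend on your ORM.
interface AuthorRecord {
  id: number;
  name: string;
}

interface BookRecord {
  id: number;
  title: string;
  author?: AuthorRecord; // old single-author relation, removed in the final step
  authors: AuthorRecord[]; // new multi-author relation
}

// Copies the old author into the new relation.
// Safe to run more than once: an author that is already present is not added again.
function moveAuthorToAuthors(book: BookRecord): BookRecord {
  const legacyAuthor = book.author;
  if (!legacyAuthor) {
    return book; // nothing to move
  }
  const alreadyMoved = book.authors.some((a) => a.id === legacyAuthor.id);
  if (alreadyMoved) {
    return book;
  }
  return { ...book, authors: [...book.authors, legacyAuthor] };
}
```

Keeping the old relation in place until the last step means the script can be re-run if something goes wrong partway through.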
In Rails, we’ve added a little generator that generates a file in db/data_migrate that we can then run with rails runner. The beauty of rails runner is that any code that’s run with that command has access to the whole of the Rails environment, so we have access to the database and the rest of the application.
In NestJS, we don’t have the same luxury. However, I have found a neat solution that works just as well. For the uninitiated, the main entrypoint to a Nest app is main.ts. This is where the application is loaded, bound to Express and booted:
    import { NestFactory } from '@nestjs/core';
    import { AppModule } from './app.module';

    async function bootstrap() {
      const app = await NestFactory.create(AppModule);
      await app.listen(3000);
    }

    bootstrap();
We can steal this approach for our data migrations, which could then look like the following.
    import { NestFactory } from '@nestjs/core';
    import { AppModule } from './app.module';

    async function migrate() {
      const application = await NestFactory.createApplicationContext(AppModule);

      // Add migration code here

      await application.close();
      process.exit(0);
    }

    migrate();
Notice the use of NestFactory.createApplicationContext. This boots the application without starting an HTTP server and gives us a reference to it, so we can fetch things like services from the application like so:
    const application = await NestFactory.createApplicationContext(AppModule);
    const booksService = application.get(BooksService);
In our books example above, we could then do something like this:
    async function migrate() {
      const application = await NestFactory.createApplicationContext(AppModule);
      const bookService = application.get(BooksService);

      const books = await bookService.getAll();
      console.log(`Updating ${books.length} books`);

      for (const book of books) {
        book.authors = [book.author];
        // Wait for each save to finish, otherwise process.exit below
        // could end the script while writes are still in flight
        await bookService.save(book);
      }

      console.log('Done!');
      await application.close();
      process.exit(0);
    }

    migrate();
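One possible refinement is to pull the loop out into a function that only depends on a small interface, so it can be tried out without booting Nest. This is only a sketch: it assumes the service exposes `getAll` and `save` as in the example above, and the `BookStore` interface, the `MigratableBook` shape (with authors simplified to strings) and the in-memory stand-in are all hypothetical.

```typescript
// Hypothetical minimal shapes; in the real script these come from your app.
interface MigratableBook {
  id: number;
  author?: string;
  authors: string[];
}

interface BookStore {
  getAll(): Promise<MigratableBook[]>;
  save(book: MigratableBook): Promise<void>;
}

// Moves each book's single author into the authors list, one save at a time.
// Returns how many books were changed, which is handy for logging.
async function migrateBooks(store: BookStore): Promise<number> {
  const books = await store.getAll();
  let updated = 0;
  for (const book of books) {
    if (book.author === undefined || book.authors.indexOf(book.author) !== -1) {
      continue; // nothing to move, or already moved on a previous run
    }
    book.authors = [...book.authors, book.author];
    await store.save(book); // wait for the write before moving on
    updated += 1;
  }
  return updated;
}

// In-memory stand-in that records which book ids were saved.
function createInMemoryStore(books: MigratableBook[]): BookStore & { savedIds: number[] } {
  const savedIds: number[] = [];
  return {
    savedIds,
    getAll: async () => books,
    save: async (book) => {
      savedIds.push(book.id);
    },
  };
}
```

In the migration script itself, you would pass the real service instead, along the lines of `await migrateBooks(application.get(BooksService))`, provided its methods match the interface.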
We can then save that file to a known location, and run it with ts-node:

    ts-node --transpile-only path/to/your/migration.ts
(Here we’re adding --transpile-only to skip type checking, which is faster, has a lower memory footprint and prevents any weirdness in production.)
So, this is all great, but copying and pasting boilerplate code is boring. How about we add a generator to generate our data migrations in a known place, with some guidance on what to do next? This could look something like this:
    import * as fs from 'fs';
    import { format } from 'date-fns';

    async function generateDataMigration() {
      // date-fns tokens: MM = month, dd = day, HH = hour, mm = minute, ss = second
      const timestamp = format(new Date(), 'yyyyMMddHHmmss');
      // Expects a PascalCase name as the first argument,
      // e.g. MoveAuthorToAuthors becomes move-author-to-authors
      const name = process.argv[2]
        .replace(/[A-Z]/g, (m) => '-' + m.toLowerCase())
        .substring(1);
      const filename = `${timestamp}-${name}`;
      const path = __dirname + `/../src/db/data/${filename}.ts`;
      const body = `// Run me with \`ts-node --transpile-only ./src/db/data/${filename}.ts\`
    import { NestFactory } from '@nestjs/core';
    import { AppModule } from '../../app.module';

    async function migrate() {
      const application = await NestFactory.createApplicationContext(AppModule);

      // Add migration code here

      await application.close();
      process.exit(0);
    }

    migrate();
    `;

      fs.writeFile(path, body, { flag: 'a+' }, function (err) {
        if (err) {
          return console.error(err);
        }
        console.log(`Data migration ${filename} created!`);
      });
    }

    generateDataMigration();
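The generator above assumes the name is passed in PascalCase and that an argument is always given. A slightly more defensive take on the filename logic might look like the sketch below. The helper names (`toKebabCase`, `formatTimestamp`, `buildMigrationFilename`) are made up for illustration, and the timestamp is built with plain `Date` methods rather than date-fns purely so the snippet has no dependencies.

```typescript
// Converts PascalCase or camelCase to kebab-case,
// e.g. "MoveAuthorToAuthors" becomes "move-author-to-authors".
function toKebabCase(input: string): string {
  return input.replace(/([a-z0-9])([A-Z])/g, '$1-$2').toLowerCase();
}

// Formats a date as yyyyMMddHHmmss in local time, zero-padded.
function formatTimestamp(date: Date): string {
  const pad = (value: number): string => (value < 10 ? `0${value}` : `${value}`);
  return (
    `${date.getFullYear()}` +
    pad(date.getMonth() + 1) + // getMonth() is zero-based
    pad(date.getDate()) +
    pad(date.getHours()) +
    pad(date.getMinutes()) +
    pad(date.getSeconds())
  );
}

// Builds "<timestamp>-<kebab-name>" and fails loudly when no name was given.
function buildMigrationFilename(rawName: string | undefined, date: Date): string {
  if (!rawName) {
    throw new Error('Please pass a migration name, e.g. MoveAuthorToAuthors');
  }
  return `${formatTimestamp(date)}-${toKebabCase(rawName)}`;
}
```

Taking the date as a parameter rather than calling `new Date()` inside keeps the function easy to check by hand.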
This generates a boilerplate migration in a known place with a timestamp. Assuming your script is at ./utils/generate-data-migration.ts, we can then run it in development, passing the migration name in PascalCase (MoveAuthorToAuthors here is just an example name):

    ts-node ./utils/generate-data-migration.ts MoveAuthorToAuthors
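You could even add a data:migrate:generate script to your package.json like so (with npm, the migration name then goes after a double dash, for example npm run data:migrate:generate -- MoveAuthorToAuthors):

    "data:migrate:generate": "ts-node ./utils/generate-data-migration.ts",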
And that’s it! Now you can safely and carefully create data migrations on your project whenever you need to!