VictorOps is now Splunk On-Call! Learn More.
The connection between an application and SQL database raises some interesting and complicated challenges when the time comes to update your schema and maintain a consistent user experience. In this article, we’ll talk about these challenges and explore patterns and best practices you can implement to overcome these obstacles.
When we talk about maintaining a consistent user experience, our ultimate goal is to manage necessary updates with zero downtime. We’ll begin by talking about how you can invest in mitigating these problems as you begin a new project, and we’ll wrap it up by talking about how to manage changes with existing projects.
By investing in sound designs and architecture at the outset of a project, you can avoid many of the problems and headaches right from the beginning. One of the smartest investments you can make is to design and carefully review an interface before coding even begins. Once your interface is in place, coding to that interface reduces many problems down the road.
When we talk about an interface, we’re referring to an Application Program Interface (API) or Command Line Interface (CLI). These interfaces function as a contract between services and between your services and your relational database. Optimizing your code, resolving bugs, and improving logic within the confines of the agreed-upon interface or contract, positively impacts the lives of everyone involved.
Another investment you should consider is to use object-relational mapping (ORM) in your application. ORM isn’t always the best approach for an application but it can provide a few benefits to your team. An ORM abstracts many of the complexities involved in mapping objects to and from your relational database, making it easier to interact with code and make updates to the database schema in a unified manner.
Even with the upfront investment in design and coding standards, change is inevitable. We refer to these changes as schema evolutions. Schema evolutions require the same level, or more, of care to implement and deploy than the code they support. In the past, utilities such as Liquibase and Flyway provided a way to codify changes and implement them concurrently with new code. Unfortunately, in a microservices world, this approach might not be possible or feasible.
Zero downtime schema evolutions require us to break each change down into a series of smaller steps. They also require us to understand the impact that each change might have on the database. Two areas of focus should be:
The effect of the change on the locking mechanisms within the database management system (DBMS).
Ensuring that all changes maintain forward and backward compatibility with the change in your codebase.
In a world where multiple users may be accessing and manipulating different records throughout a database, the management systems use locking to prevent problems that may be introduced by inconsistent data. The DBMS might lock a single record, a column, or an entire table within the database. The DBMS might also include different hierarchies of locks that allow critical changes to take precedence over less critical changes.
How the system implements locks varies between database vendors, and even between versions of the same system. Therefore, it’s essential to understand the locking mechanisms in your specific database before making changes. It would be best if you also investigated whether your actions might result in unintentional locking. Even a change as simple as indexing a new column could result in locking the entire table for some time, and negatively impacting your zero downtime goal.
(Image source: pixabay.com)
Once we understand locking, we need to design our changes in such a way as to reduce downtime and maintain compatibility with all versions of our software. Let’s consider the following example:
Let’s say we have a table that stores order data. Within the table, we have a column to store the total amount for each order. And, let’s pretend we need to change the name of the column from orderTotal to pretaxOrderAmount.
A brand new intern, fresh out of their college SQL course, might suggest the following:
ALTER TABLE orders RENAME COLUMN orderTotal TO pretaxOrderAmount;
That statement might accomplish our objective but if an older version of our application attempts to retrieve data from orderTotal, we’re going to return an error. We’re also not entirely sure of the effect this change might have on table access while we execute it.
Let’s try the following approach instead:
ALTER TABLE orders ADD COLUMN pretaxOrderAmount FLOAT;
Our table now has both columns which protects both our new and old application versions from receiving an error. However, we have a few more steps to complete. We need to ensure the data is available in both columns but we’re also aware that, if the orders table is vast, we might run the risk of locking it for an extended period.
We’ll solve the locking problem by using a technique called sharding. The concept of sharding is that we break a large dataset up into smaller sets of more manageable data. So, if our orders table has 358 records, we might consider updating it with statements like the following:
UPDATE orders SET pretaxOrderAmount = orderTotal WHERE ID BETWEEN 1 and 100; UPDATE orders SET pretaxOrderAmount = orderTotal WHERE ID BETWEEN 101 and 200; UPDATE orders SET pretaxOrderAmount = orderTotal WHERE ID BETWEEN 201 and 300; UPDATE orders SET pretaxOrderAmount = orderTotal WHERE ID > 300;
Within your application, you should continue to write data to both columns – ensuring you maintain data integrity for all versions.
With subsequent updates, you can remove all instructions from your application that read the value from the original column. Once you remove all those references, you can also begin removing instructions to write values to the original column, and you might need to remove any constraints that might prevent a NULL value from being written to the column.
You should carry out the process of deleting a column, only after an exhaustive testing and maintenance period. An organization might also decide to quarantine a column for some time, before deleting it. Deleting a column should only be attempted after you are sure that you are capturing all data elsewhere and that nothing in your ecosystem references the column.
Even with the best of intentions and planning, mistakes can, and do, happen. While developing a system and standards to attain the goal of zero downtime, you must rehearse your changes and test them before trying anything in your production environment. Finally, an essential best practice is to monitor your system before, during, and after any change. Monitoring is also a crucial part of any production application and its relational databases.
Even if you do everything in your power to avoid SQL database impact when deploying code to production, incidents will still occcur. Check out a 14-day, free trial of VictorOps to learn more about the modern approach to on-call alerting and real-time incident response and remediation.
Mike Mackrory is a Global citizen who has settled down in the Pacific Northwest — for now. By day he works as a Lead Engineer on a DevOps team, and by night, he writes and tinkers with other technology projects. When he’s not tapping on the keys, he can be found hiking, fishing and exploring both the urban and rural landscape with his kids. Always happy to help out another developer, he has a definite preference for helping those who bring gifts of gourmet donuts, craft beer and/or single-malt Scotch.