Recently, We have done a PostgreSQL version upgrade and faced few performance issues afterward. I thought to share it in this short blog post so that It can help others plan better.
We are using the RDS service from AWS to manage our production database. So all we had to do is modify database settings to select the newer version and apply changes.
During major version upgrades, RDS used pg_upgrade utility command from Postgres to do the upgrade. The internal format of system tables, data files, and internal data storage format also changes.
In both minor and major version upgrades, RDS completes the following steps:
1. Takes a pre-upgrade snapshot (if configured for backups). You can use this snapshot for rollbacks.
2. Shuts down the instance and prepares it for the upgrade.
3. Uses the
pg_upgrade
utility to run the upgrade job on the instance.4. Takes a post-upgrade snapshot. Networking is now reconfigured on the instance.
So when we upgraded our DB, all these steps were done by RDS successfully and we were back live after a very short planned downtime. We were doing this upgrade over the weekend because of the planned downtime. All post-upgrade checks were done. All backend services were up and running fine.
But, we realized that some APIs had become significantly slow. So we started doing an investigation. We found out that API became slow because SQL queries being called during that API were slow. CPU utilization of our database was shooting up to 98–100%.
Trust me, No one wants to see that much CPU utilization for the production database. Not even during the planned maintenance window. Not for the 1-minute duration.
It turns out that we had to run one Postgres command which we actually expected that RDS would have handled. But during major version upgrades, RDS does not run “analyze”. So keep this in mind if you are upgrading a major version with RDS.
ANALYZE VERBOSE;
This command collects statistics about the contents of tables in the database, and stores the results in the pg_statistic system catalog. Subsequently, the query planner uses these statistics to help determine the most efficient execution plans for queries. Accurate statistics will help the planner to choose the most appropriate query plan, and thereby improve the speed of query processing.
So As soon as we ran this command, all our DB queries started performing as well as we expected.
Also read: Zero Downtime: Auto-Scaling Scale-Out without Disruptions.
That’s all I wanted to share. Thank you for reading.
References: