The Sequelize Saga
Recently I found the source to a very befuddling problem that I had been grappling with on and off again for weeks. As a warning to other developers, I’d like to recount my tale of woe. The subject here is Sequelize, which is an ORM (Objet Relational Mapping) tool which can be used to query a database from the backend of a web application. Sequelize is quite versatile, being available for use with Postgres, MySQL, MariaDB, SQLite and Microsoft SQL Server. In this saga, Sequelize is querying a Postgres database.
The utility of a tool like Sequelize is that it can abstract the database to models and functions. Models can be defined, and these models represent tables. These models then expose simple functions, which represent types of queries. One of the more useful examples is the bulkCreate
function, which can insert multiple records into a table in just one function call. This type of function is especially useful when performing operations on large sets of data, such as a data migration. We can simply construct an array of models and pass that array to the function. The function inserts each model as a record in the specified table.
But what does it return? In Postgres, the RETURNING
keyword can be used to define what values are returned from the query. Therefore, Sequelize has the returning
option, which activates this option. Postgres even has extra functionality for this feature, allowing an array of fields to return to be passed instead of a simple boolean if desired. But when I search the internet for “bulkCreate”, I can see that there may be issues, based on the fact that the top five results are all for Sequelize, and two of those results are for issues. One of these stemmed from a simple lack of knowledge of this option, and the other is a real issue logged on the site for the GitHub repository.
Speaking of GitHub issues, the Sequelize currently has over a thousand opened issues. That speaks partly to the popularity of the tool. The GitHub repository has over twenty thousand stars. It also demonstrates that the application has some issues. Currently, there are almost fifty open issues related to bulkCreate
. One of the main topics is the same issue that I ran into: namely, that on one environment, Sequelize returned the IDs of newly-created records from bulkCreate
, while on another environment, the exact same line of code did not. The call did not pass returning: true
, meaning that according to the current documentation, the default value used is false
. So why did one instance of the code magically return the ID anyway?
Well, according to this issue, the documentation is wrong! The default value is actually true
. But if that is the case, why does one version of the code not return the newly-created IDs? To explain that, we first have to acknowledge that the code should explicitly pass returning: true
, as the documentation suggests, if it expects to use these IDs. Expecting the function to return these IDs by magic is unacceptable, despite it being the admitted behavior as stated in the GitHub issue. Next, we have to examine our use of package versions to be sure that we are using the same version of Sequelize in both cases. Given that we performed an npm install
on both environments, we would hope that the same version would be installed in both cases.
Alas, the way that we have defined our package dependencies leaves room for error. If we define the Sequelize dependency as ^5.1.0
, then an npm install
will check to see if we have at least that major and minor version, and if we do, it won’t upgrade the package. In the case that it does need to upgrade, it will get the latest minor version, such as 5.22. Unfortunately, version 5.22 works as explained in the GitHub issue, whereas version 5.3 works as the documentation describes. Whether or not this is a breaking change is debatable, given that it is difficult to imagine a scenario in which we are creating multiple records, and expecting to not have the IDs returned, and that having them returned breaks our logic. But it certainly should be documented behavior.
The bottom line is that combining mistakes in our version logic, the code, and the Sequelize repository leads to a big headache. If we want to allow our application a margin of error for these types of things, it may be better to use an ORM which at least behaves as documented. Ultimately, the application should define specific version of packages, especially in the case of Sequelize, such that extra steps such as removing the node_modules
directory are not needed to ensure that two different environments have the same packages. And finally, we have to write code that makes sense. If we expect bulkCreate
to return something, then we need to tell it to do that.