In this blog post, I will explain how I implemented a graph algorithm to solve the database migration problem in an application pluggable module system.
Prerequisites:
Gavika Web Framework has a pluggable module system. The modules can be developed independently. They can be installed, upgraded and removed from the main application. Gavika Web Framework is written using Python, Flask, SQLAlchemy and a bunch of other technologies and libraries.
The main application uses Alembic to manage database migrations.
Each module has its own database schema that resides alongside of the main application's database schema. Typically, modules have their own tables. As a module evolves it has needs to add or remove tables or columns. There is a migration system in place to handle such database schema changes on a module level. The framework uses more or less the same file system structure and conventions as Alembic.
migrations/
├── __init__.py
└── versions
├── 1_.py
├── 2_.py
├── 3_.py
└── __init__.py
This is the directory structure inside a module. The files 1_.py
, 2_.py
and 3_.py
contain the actual migrations.
Contents of 1_.py
:
revision = "1"
down_revision = None
def upgrade():
...
Contents of 2_.py
revision = "2"
down_revision = "1"
def upgrade():
...
Every revision file contains revision
and down_revision
at minimum. The revision file 2_.py
indicates that the revision is "2" and it comes after "1". Notice that the revision is a string. The number is used in the string for convenience. The revision could be something like aasdeeeedd
or some kind of UUID or hash that do not follow a sequence. A sequence of numbers is not suitable for software that uses different branches. For the purpose of brevity, we demonstrate a simple use case. Therefore the revisions are 1
, 2
and 3
. In addition to the revision information, the revision file contains the upgrade
method. This method contains the actual code to modify the schema. Typically they are calls to op.create_table
, op.alter_table
, etc.
Networkx
library is used to implement the graph algorithm.
edge
in the graph. Add the edge
to the edges
list.sink
of the graph.upgrade
method module_revisions_path = (
f"{{module_name}/migrations/versions/"
)
if os.path.exists(module_revisions_path):
G = nx.DiGraph()
edges = []
revision_modules = {}
for revision_file in os.listdir(module_revisions_path):
if revision_file in ["__init__.py", "__pycache__"]:
continue
loader = importlib.machinery.SourceFileLoader(
"revision",
f"{module_name}/migrations/versions/{revision_file}",
)
python_module = types.ModuleType(loader.name)
loader.exec_module(python_module)
down_revision = python_module.down_revision
if down_revision is None:
down_revision = "START"
edges.append((down_revision, python_module.revision))
revision_modules[python_module.revision] = python_module
G.add_edges_from(edges)
sink = list(topological_sort(G))[-1:][0]
if not module.last_revision:
last_revision = "START"
else:
last_revision = module.last_revision
revision_path = shortest_path(G, last_revision, sink)
revision_path = revision_path[
1:
] # The first node is current/source node, omit it
for revision_path_item in revision_path:
revision_python_module = revision_modules[revision_path_item]
revision_python_module.upgrade()
module.last_revision = revision_python_module.revision
db.session.commit()
Mathematically, the revisions can be thought of as a directed path graph. The shortest path from source to sink becomes obvious in the directed path graph. There is only one way to traverse the path. There is no need to compute shortest path. All we need is a path. The closest solution I found in NetworkX
was the shortest_path
method. I do not know whether there is a better alternative to compute the path of the path graph in NetworkX or otherwise. That is an open question I would like to pursue at some point in future.