If your backend is in full swing and the database load is getting hot, you may consider offloading part of the work to another database. You can shard, or move parts of the data entirely to an additional database chosen specifically for your needs: you may store most of your data in Postgres while keeping your analytics in ClickHouse. In this short article we are going to cover how to connect a second database to your Django application and use it.
Django Setup
Let’s create Book and Author models and add a books database (in models.py and settings.py respectively):
# file: models.py
from django.db import models

class Author(models.Model):
    first_name = models.TextField()
    last_name = models.TextField()

class Book(models.Model):
    content = models.TextField()
# file: settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "db.sqlite3",
    },
    "books": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": BASE_DIR / "books.sqlite3",
    },
}
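We stick to SQLite here to keep the example self-contained, but any supported backend works. As a hypothetical variant (the engine path is real, the connection values are placeholders), the books entry could point at Postgres instead:

# file: settings.py (hypothetical variant: "books" backed by Postgres)
DATABASES["books"] = {
    "ENGINE": "django.db.backends.postgresql",
    "NAME": "books",
    "USER": "books_user",      # placeholder credentials
    "PASSWORD": "secret",
    "HOST": "localhost",
    "PORT": "5432",
}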
It’s time to tell Django to store the Book model in the books database. That is accomplished by creating a database router and registering it in the settings:
# file: router.py
class BookRouter:
    books_db = "books"
    default_db = "default"

    def db_for_read(self, model, **hints):
        if model._meta.model_name == "book":
            return self.books_db
        return None

    def db_for_write(self, model, **hints):
        if model._meta.model_name == "book":
            return self.books_db
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if model_name == "book":
            return db == self.books_db
        return db == self.default_db
# file: settings.py
DATABASE_ROUTERS = [
    "router.BookRouter",
]
With this router in place, the Book model will be read from and written to the books database. For any other model (Author in our case) Django will iterate over all configured routers to get the database for the model; if all routers return None, the default database is used.
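A quick way to see the routing in action is from python manage.py shell. A minimal sketch, assuming the Book and Author models from above are imported; the db attribute of a queryset reports the alias it will use:

# a quick routing check (sketch)
Book.objects.create(content="...")                      # written to the "books" database
Author.objects.create(first_name="Leo", last_name="T")  # written to "default"

print(Book.objects.all().db)    # -> "books"
print(Author.objects.all().db)  # -> "default"

# .using() bypasses the router when you need to pin a query to an alias:
Book.objects.using("books").count()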
Running Migrations
It turns out that the migrate command accepts a --database CLI parameter, which defaults to default, so it’s easy to run migrations on another database:
python manage.py migrate --database=books
In our case we need to run migrations on both databases. Luckily we can create a script to run them in parallel and wait until both are done:
# file: migrate.sh
python manage.py migrate &
python manage.py migrate --database=books &
wait
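Other migration commands accept the same parameter; for example, to check which migrations have been applied to the books database:

python manage.py showmigrations --database=books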
A migration file can contain multiple types of operations: model operations, Python operations, and SQL operations:
# file: 0002_mix.py
from django.db import migrations, models

def migration_code(apps, schema_editor):
    ...

sql_code = "..."

class Migration(migrations.Migration):
    dependencies = [
        ('books', '0001_initial'),
    ]

    operations = [
        migrations.AddField(model_name='book', name='short_description', field=models.TextField(null=True)),
        migrations.AddField(model_name='author', name='biography', field=models.TextField(null=True)),
        migrations.RunPython(code=migration_code, reverse_code=migrations.RunPython.noop),
        migrations.RunSQL(sql=sql_code, reverse_sql=migrations.RunSQL.noop),
    ]
For each operation in a migration file Django will figure out the operation’s target database; if it matches the database the migration is being executed against (python manage.py migrate --database=?), the operation is executed, otherwise it is skipped.
Thus each operation should be associated with a database. In router.py we defined that the Author model is stored in the default database, while the Book model is stored in the books database. However, RunPython and RunSQL operations are routed to the default database and can fail if they touch data in another database.
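One alternative, shown in the Django documentation, is to make the function itself alias-aware; this helps when allow_migrate lets the operation run against every database. A minimal sketch (assuming the app is labelled books, matching the dependencies above):

def migration_code(apps, schema_editor):
    # do nothing unless this run targets the books database
    if schema_editor.connection.alias != "books":
        return
    Book = apps.get_model("books", "Book")
    # backfill the column added earlier in this migration
    Book.objects.using(schema_editor.connection.alias).update(short_description="")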
Thankfully there is a way to route both RunPython and RunSQL to the database of our choice: they accept a hints dictionary which is later passed to the allow_migrate function:
# file: 0002_mix.py
operations = [
    migrations.AddField(model_name='book', name='short_description', field=models.TextField(null=True)),
    migrations.AddField(model_name='author', name='biography', field=models.TextField(null=True)),
    migrations.RunPython(hints={"db": "books"}, code=migration_code, reverse_code=migrations.RunPython.noop),
    migrations.RunSQL(hints={"db": "books"}, sql=sql_code, reverse_sql=migrations.RunSQL.noop),
]
# file: router.py
class BookRouter:
    def allow_migrate(self, db, app_label, model_name=None, **hints):
        if hints.get("db") is not None:
            return db == hints["db"]
        if model_name == "book":
            return db == "books"
        return db == "default"
Now only the AddField operation on the Author model will be routed to the default database; all the other operations will be routed to the books database.
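Running our migrate.sh from above now applies every operation exactly once, each on the database it was routed to:

python manage.py migrate                   # applies the AddField on Author
python manage.py migrate --database=books  # applies the AddField on Book, plus RunPython and RunSQL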
Transactions
Using atomic will start a transaction on the default database unless you explicitly pass the database name in the using parameter:
from django.db import transaction

with transaction.atomic(using="books"):
    # a transaction on the books database is started
    pass
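Keep in mind that transactions on different databases are independent: Django does not turn nested atomic blocks into a distributed transaction. A sketch of the pitfall:

with transaction.atomic(using="default"):
    with transaction.atomic(using="books"):
        # write to the books database here
        pass
    # if an exception is raised here, the default transaction rolls back,
    # but the already committed write to "books" stays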
The transaction commit hook works the same way: by default it’s executed after a commit to the default database. This is overridden with the using parameter:
def send_email():
    pass

transaction.on_commit(send_email, using="books")
Conclusion
In this article we briefly covered the main aspects of using multiple databases:
- routing models to a database
- routing RunSQL and RunPython migration operations to a database
- running migrations
- working with transactions
It looks like a lot of work to set up, but if it’s done right it may bring a lot of benefits:
- leverage the most suitable database for each particular task
- spread the data between multiple machines with different CPU, RAM and storage types
- shard the data (even if the database doesn’t support it natively)