
Django's admin interface is a big part of why Caktus uses Django, but the admin's ability to log database changes is limited. For example, it shows only changes made via the Django admin, not via other parts of the site.
We've written previously on the Caktus blog about django-simple-history, a tool we use to track model changes in the admin and other parts of our Django projects. django-simple-history works well for some cases, but as a Python solution, it is not able to track changes made directly in the database with raw SQL.
Over the last year, we've been using yet another tool, django-pghistory, to track data changes in Postgres tables with 5+ million records, so I thought I'd write a short post with some of the things we've learned over this time.
Track changes selectively
django-pghistory works using Postgres triggers, which are a great solution for tracking and recording changes at a low level in the database (no matter what initiated the changes). That said, there are two caveats to this approach which are worth noting:
- The triggers need to be removed and re-added during schema changes. django-pghistory handles this for you; however, we found that the extra queries required make database migrations longer and harder to read during code reviews. It's an expense worth incurring, but we recommend employing django-pghistory only on the models that really need it (probably not on every model in your project).
- django-pghistory includes an event viewer in the Django admin that shows you all the changes across all tracked models. This is great for small and simple projects, but it can quickly get out of hand for large projects with many tracked models. For this reason, we again recommend limiting tracked models only to those that really need it. For particularly large projects, it may be helpful to disable the "all events" viewer. This can be done by adding PGHISTORY_ADMIN_ALL_EVENTS = False to your Django settings file.
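As a minimal sketch, the setting named above would sit in the project's settings module. The file name settings.py is an assumption about a typical project layout; only the setting itself comes from django-pghistory.

```python
# settings.py (hypothetical location; use whichever settings module your project loads)

# Disable django-pghistory's "all events" viewer in the Django admin, which can
# get out of hand when many models are tracked.
PGHISTORY_ADMIN_ALL_EVENTS = False
```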
Use EventModelAdmin correctly
At first, we misinterpreted how to use EventModelAdmin and applied it to the admin classes for our tracked models. We observed that this disabled adding, changing, and deleting instances of those models in the admin. On further investigation, and after asking a question on the repo, we found that this class is intended only for customizing the model admin classes for the events themselves (which also appear in the admin), not for the tracked models. Adding, changing, and deleting events is disabled because events are created automatically whenever a tracked model is changed and should not be edited directly. In the course of making this mistake, we also helped fix a bug; the fix was released in v3.5.5.
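The split might look something like the sketch below. It assumes the admin integration is already set up as described in the documentation, and that an app labelled myapp has a tracked MyModel whose generated event model uses the default name MyModelEvent. The app label and both admin class names are hypothetical.

```python
# admin.py (a sketch; "myapp" and the admin class names are hypothetical)
from django.apps import apps
from django.contrib import admin
from pghistory.admin import EventModelAdmin

from myapp.models import MyModel

# Look up the event model that django-pghistory generated for MyModel.
MyModelEvent = apps.get_model("myapp", "MyModelEvent")


@admin.register(MyModel)
class MyModelAdmin(admin.ModelAdmin):
    """Tracked model: a regular ModelAdmin, so add/change/delete keep working."""


@admin.register(MyModelEvent)
class MyModelEventAdmin(EventModelAdmin):
    """Event model: EventModelAdmin, since events should not be edited by hand."""
```

Keeping the two admin classes separate means the tracked model stays editable while its history remains read-only.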
Taking an initial snapshot
For existing projects with large tables, it may be helpful to record an initial snapshot of a record before changing the data. django-pghistory works by recording the changed data at the time it is changed, so the first change to a record after adding django-pghistory tracking may not be saved without additional effort. This is described as manual tracking in the documentation.
You can configure django-pghistory with a custom event tracker by adding it to the @pghistory.track() decorator, like so:
import pghistory
from django.db import models


@pghistory.track(
    # Default trackers
    pghistory.InsertEvent(),
    pghistory.UpdateEvent(),
    # Manual event to record an initial snapshot
    pghistory.ManualEvent(label="initial_snapshot"),
    exclude=["modification_date"],
)
class MyModel(models.Model):
    # ...
Then, in parts of the code that change instances of MyModel, you can manually record a snapshot of the data before writing the actual changes to the database:
if not MyModelEvent.objects.filter(pgh_obj=my_obj).exists():
    pghistory.create_event(my_obj, label="initial_snapshot")
my_obj.foo = "bar"
my_obj.save()  # this line will now record the change/diff successfully
If preferred, it is also possible to backfill events in bulk. In either case, it's worth noting that django-pghistory works by writing the changed data to the history at the time that it's made, not by saving a copy of the record immediately before the change.
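To make that distinction concrete, here is a toy, in-memory simulation in plain Python. It does not use django-pghistory or Postgres at all; ToyTrackedTable and foo_history are hypothetical names invented for this sketch, and the "trigger" is just a method that appends the row's post-change state to a list.

```python
class ToyTrackedTable:
    """In-memory stand-in for a table with an after-update history trigger."""

    def __init__(self, rows):
        # rows: {primary key: {column: value}}, i.e. data that predates tracking
        self.rows = {pk: dict(values) for pk, values in rows.items()}
        self.events = []  # (label, primary key, copy of the row) tuples

    def has_events(self, pk):
        return any(event_pk == pk for _, event_pk, _ in self.events)

    def snapshot(self, pk, label="initial_snapshot"):
        # Manual event: store the row exactly as it is right now.
        self.events.append((label, pk, dict(self.rows[pk])))

    def update(self, pk, **changes):
        # "Trigger": the event holds the row as it looks after the write.
        self.rows[pk].update(changes)
        self.events.append(("update", pk, dict(self.rows[pk])))


def foo_history(table, pk):
    """Return the recorded values of the "foo" column for one row, oldest first."""
    return [row["foo"] for _, event_pk, row in table.events if event_pk == pk]


# Without a snapshot, the pre-tracking value "old" never reaches the history.
plain = ToyTrackedTable({1: {"foo": "old"}})
plain.update(1, foo="bar")
print(foo_history(plain, 1))  # ['bar']

# With a snapshot taken before the first change, both values are kept.
guarded = ToyTrackedTable({1: {"foo": "old"}})
if not guarded.has_events(1):
    guarded.snapshot(1)
guarded.update(1, foo="bar")
print(foo_history(guarded, 1))  # ['old', 'bar']
```

The guarded case mirrors the "check for existing events, then snapshot, then save" pattern shown earlier: the snapshot is what preserves the value that existed before tracking began.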
Conclusion
Used appropriately, django-pghistory is an incredibly powerful tool for tracking database changes in Django models. For further reading, we recommend the installation guide and admin integration section of the documentation (the latter is not enabled by default). Once you have it set up, the django-pghistory FAQ provides answers to a lot of good questions (some of which you might not have known you had!), and is a must-read when using the project. Good luck, and happy change logging!