One Thing to Look Out For While Testing django-import-export

Often the applications we build at Caktus deal with large sets of Django objects. The attributes of these objects can vary, and may need updating in certain instances. One of the best ways to manage this data is via django-import-export, which is a tool that creates a way to perform these bulk updates with a file upload such as a .csv or .xlsx file.

This week my team integrated this tool in one of our projects and I encountered an interesting bug specifically around importing .xlsx files.

Below is an example .xlsx file I put together to test if the import feature was working as expected. I edited 2 objects to have values for all available attributes (rows 2-3), and rows 4-5 are 2 objects that only have a subset of values populated for the available attributes. The intent here was to test if I could be selective of what attributes I wanted to update during an import.

clean google sheet

After constructing this .xlsx file of objects that I wanted to import/update, I attempted to upload this file only to be met with an error.

django import error

Error Message: “An error was encountered while trying to read the file. Ensure you have chosen the correct format for the file”

My initial thought was that we were not properly supporting .xlsx files, given that the same file saved as a .csv could be successfully imported. However, this was debunked after Simon, a developer on our team, looked into this issue and confirmed that he could successfully import .xlsx files without any problems. This left me confused, but after a bit of pair-debugging, we discovered that the issue stemmed from an unexpected reason.

The application I used to save my file versus the application the developer used to save theirs affected how the importer processed the file. A file saved from Google Sheets triggered an InvalidDimensions error in the logs, while the same file saved via Microsoft Excel imported without issues.

We discovered that when a .xlsx file is edited and saved in Google Sheets, its format changes slightly. Rows with empty columns as shown in the diagram below (e.g., columns C-G in rows 4-5) don’t get filled with empty values when the file is saved. As a result, Django Import-Export complains because it sees those rows as having only 2 columns when it expects 7 (e.g, rows 2-3). This error occurs when a row in the import file does not have the same number of columns as the header row.

google sheet file

Dev Console Error:

Traceback (most recent call last):
File "/home/simon/work/caktus/odk-publish/apps/odk_publish/forms.py", line 159, in clean
    self.dataset = import_format.create_dataset(data)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/simon/work/caktus/odk-publish/.direnv/python-3.12.3/lib/python3.12/site-packages/import_export/formats/base_formats.py", line 213, in create_dataset
    dataset.append(row_values)
File "/home/simon/work/caktus/odk-publish/.direnv/python-3.12.3/lib/python3.12/site-packages/tablib/core.py", line 476, in append
    self.rpush(row, tags)
File "/home/simon/work/caktus/odk-publish/.direnv/python-3.12.3/lib/python3.12/site-packages/tablib/core.py", line 462, in rpush
    self.insert(self.height, row=row, tags=tags)
File "/home/simon/work/caktus/odk-publish/.direnv/python-3.12.3/lib/python3.12/site-packages/tablib/core.py", line 450, in insert
    self._validate(row)
File "/home/simon/work/caktus/odk-publish/.direnv/python-3.12.3/lib/python3.12/site-packages/tablib/core.py", line 267, in _validate
    raise InvalidDimensions
tablib.exceptions.InvalidDimensions

After we discovered this, we updated the importer to account for this edge case. This experience was a good reminder that files that seem identical can behave very differently, depending on how they’re processed when saved.

New Call-to-action
blog comments powered by Disqus
Times
Check

Success!

Times

You're already subscribed

Times