I needed to implement a user data export feature in a Django project. Static files and user media are stored in Backblaze B2 cloud storage (similar to Amazon S3), which also supports server-side encryption, so the data can be encrypted at rest.
Usually when working with files in Django you want to use the native storage API and storage backends. There are several backends for B2 as well, and the project is configured to use one, but this one specific file contains PII, which made it a special enough case that I decided to write custom handlers for it.
Before going on, a reminder that this is just one example that happened to work for this specific project and its data. It most likely won't work well for large amounts of data, for example. YMMV.
High Level Overview
Here’s the use case in a nutshell:
- User triggers a data export
- A Celery task then
  - Collects the data
  - Bundles it into an in-memory zip file
  - Uploads the zip to an encrypted B2 bucket
  - Saves the metadata of the export file to the database
  - Informs the user that the data is now available for download
- User clicks a download button
- A custom download view
  - Fetches the export file from B2
  - Writes it into an HTTP response as a downloadable file
- A periodic Celery task removes the export metadata from the database after it has expired. (The file itself is automatically deleted from the B2 bucket after the expiry.)
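The periodic cleanup in the last step isn't shown in code in this post. In Django it would likely come down to a single queryset call such as `DataExportItem.objects.filter(expires_at__lte=timezone.now()).delete()`. The sketch below shows the same expiry rule on plain Python objects so it runs without a database; `ExportRecord`, `new_record` and `purge_expired` are hypothetical names, and the seven-day lifetime mirrors the `timedelta(days=7)` used in the export code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the 7-day expiry used in the export code
EXPORT_LIFETIME = timedelta(days=7)


@dataclass
class ExportRecord:
    """Hypothetical stand-in for a DataExportItem metadata row."""
    user_id: int
    expires_at: datetime


def new_record(user_id: int, now: datetime) -> ExportRecord:
    # An export is valid for a fixed period from the moment it is created
    return ExportRecord(user_id=user_id, expires_at=now + EXPORT_LIFETIME)


def purge_expired(records: list[ExportRecord], now: datetime) -> list[ExportRecord]:
    # Keep only records that have not expired yet; this is the
    # equivalent of deleting rows where expires_at <= now
    return [r for r in records if r.expires_at > now]


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [new_record(1, now - timedelta(days=8)), new_record(2, now)]
remaining = purge_expired(records, now)  # only user 2's export is still valid
```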
Collecting And Uploading
The custom user model has two methods for collecting and uploading the data.
A method that does all the work:
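# Module-level imports in models.py
from datetime import timedelta
from io import BytesIO
from zipfile import ZipFile
import orjson
from django.conf import settings
from django.utils import timezone
# b2_api is assumed to be a configured b2sdk B2Api instance defined elsewhere
# in the project; DataExportItem is the export metadata model.

def _build_and_upload_data_export(self):
    from .serializers import UserDataExportSerializer
    # Only one export per user: remove any previous one first
    self._delete_data_export()
    serializer = UserDataExportSerializer(self)
    expires = timezone.now() + timedelta(days=7)
    export_item: DataExportItem = DataExportItem.objects.create(
        user=self,
        expires_at=expires,
    )
    # Create the zip file in memory; the context manager finalizes the archive
    in_memory = BytesIO()
    with ZipFile(in_memory, mode="w") as zf:
        zf.writestr(export_item.file_name, orjson.dumps(serializer.data))
    in_memory.seek(0)
    # Upload the file to B2
    bucket = b2_api.get_bucket_by_name(settings.B2_ENCRYPTED_BUCKET_NAME)
    uploaded = bucket.upload_bytes(
        data_bytes=in_memory.read(), file_name=export_item.b2_file_name
    )
    # Record the uploaded size and mark the export as ready for download
    export_item.size = uploaded.size
    export_item.is_ready = True
    export_item.save()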
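The in-memory zip step can be tried out on its own with nothing but the standard library. This sketch swaps orjson for the stdlib `json` module (which returns `str`, hence the `.encode()`), and `build_export_zip` and the member name are hypothetical. It also passes `ZIP_DEFLATED`: `ZipFile` stores members uncompressed by default, and JSON usually compresses well.

```python
import json
from io import BytesIO
from zipfile import ZIP_DEFLATED, ZipFile


def build_export_zip(member_name: str, data: dict) -> bytes:
    """Serialize `data` to JSON and return it as a single-file zip archive."""
    buffer = BytesIO()
    # Leaving the with-block writes the zip central directory
    with ZipFile(buffer, mode="w", compression=ZIP_DEFLATED) as zf:
        zf.writestr(member_name, json.dumps(data).encode("utf-8"))
    # getvalue() returns the whole buffer regardless of the current position
    return buffer.getvalue()


archive = build_export_zip("export.json", {"email": "user@example.com", "posts": []})

# Reading the archive back gives the original payload
with ZipFile(BytesIO(archive)) as zf:
    restored = json.loads(zf.read("export.json"))
```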
A few things to note here:
- I'm using a DataExportItem Django model to collect the metadata. To make sure there is only one export per user at any given time, any previous one is deleted before a new export starts.
- All data collection is handled by a Django REST Framework serializer class.
- Orjson works great here because it's fast and it serializes straight to bytes.
- The B2 bucket has server-side encryption and lifecycle rules set to match the project's needs.
- Depending on the amount of user data and the server environment, this method can be slow to execute. You'll want to run it in a background process, detached from the Django request-response cycle.
- And again, if your data is big, you probably won't want to process it in memory.
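One stdlib option for bigger exports is `tempfile.SpooledTemporaryFile`, which keeps data in memory until it grows past `max_size` bytes and then transparently moves it to a temporary file on disk. The sketch below is an assumption about how that could replace the `BytesIO` buffer; `build_export_archive` and the member names are hypothetical, and getting the resulting file to B2 would need one of b2sdk's file or stream upload methods rather than `upload_bytes`, which is not shown here.

```python
import json
import tempfile
from zipfile import ZIP_DEFLATED, ZipFile


def build_export_archive(members: dict[str, dict], max_size: int = 10 * 1024 * 1024):
    """Write each member as a JSON file into a zip held in a spooled temp file.

    The archive stays in memory until it exceeds `max_size` bytes, after
    which SpooledTemporaryFile rolls it over to a temporary file on disk.
    """
    spooled = tempfile.SpooledTemporaryFile(max_size=max_size)
    with ZipFile(spooled, mode="w", compression=ZIP_DEFLATED) as zf:
        for name, payload in members.items():
            # writestr encodes str data as UTF-8
            zf.writestr(name, json.dumps(payload))
    spooled.seek(0)  # rewind so the caller can read or upload from the start
    return spooled
```

The caller is responsible for closing the returned file once it has been uploaded, which also removes any on-disk temporary file.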
The public method for starting the export just triggers the background Celery task:
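def export_data(self):
    # Imported here to avoid a circular import (assumes the task lives in this app's tasks.py)
    from .tasks import build_data_export
    # Pass a plain string so the task argument serializes cleanly
    build_data_export.delay(str(self.uid))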
The Celery task itself is also very simple:
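from celery import shared_task

@shared_task
def build_data_export(uid: str):
    # Imported inside the task to avoid a circular import with models.py
    from .models import User
    user = User.objects.get(uid=uid)
    user._build_and_upload_data_export()
    # handle any user notifications here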
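The periodic cleanup task from the overview also has to be scheduled. Assuming Celery is configured from Django settings with the common `CELERY` namespace, a Celery beat entry might look like the sketch below; the task path `users.tasks.purge_expired_data_exports` and the hourly interval are hypothetical.

```python
from datetime import timedelta

# Read by Celery beat when the app is set up with
# app.config_from_object("django.conf:settings", namespace="CELERY")
CELERY_BEAT_SCHEDULE = {
    "purge-expired-data-exports": {
        # Hypothetical task that deletes expired DataExportItem rows
        "task": "users.tasks.purge_expired_data_exports",
        # A timedelta works as a simple fixed-interval schedule
        "schedule": timedelta(hours=1),
    },
}
```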
Handling The Download
The custom user model has a method for fetching the export from B2. It returns either a b2sdk DownloadedFile or None:
def _get_data_export(self):
    """Return the b2sdk DownloadedFile for this user's export, or None.

    The DownloadedFile can be written to a file-like object with save().
    """
    try:
        export: DataExportItem = self.dataexport  # type: ignore
    except DataExportItem.DoesNotExist:
        return None
    bucket = b2_api.get_bucket_by_name(settings.B2_ENCRYPTED_BUCKET_NAME)
    return bucket.download_file_by_name(export.b2_file_name)
Finally there’s a Django view that passes the file to the user:
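from io import BytesIO
from django.contrib.auth.decorators import login_required
from django.http import HttpResponse, HttpResponseNotFound
from .models import DataExportItem

@login_required
def download_data_export(request):
    try:
        export: DataExportItem = request.user.dataexport  # type: ignore
    except DataExportItem.DoesNotExist:
        return HttpResponseNotFound()
    # The export row exists while the upload is still running, so check it is ready
    if not export.is_ready:
        return HttpResponseNotFound()
    export_file = request.user._get_data_export()
    if export_file is None:
        return HttpResponseNotFound()
    # Write the downloaded B2 file into memory and send it as an attachment
    in_memory_file = BytesIO()
    export_file.save(in_memory_file)
    response = HttpResponse(in_memory_file.getvalue(), content_type="application/zip")
    response["Content-Length"] = export.size
    response["Content-Disposition"] = f'attachment; filename="{export.download_file_name}"'
    return response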
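Putting the file name straight into `Content-Disposition` is fine for plain ASCII names, but names with quotes or non-ASCII characters need escaping. Below is a hedged stdlib sketch of the RFC 6266 convention: a quoted ASCII fallback plus an RFC 5987 `filename*` parameter. `attachment_header` is a hypothetical helper; Django 4.2+ also ships `django.utils.http.content_disposition_header()`, and `FileResponse(..., as_attachment=True, filename=...)` sets this header itself, so either is worth checking before rolling your own.

```python
from urllib.parse import quote


def attachment_header(filename: str) -> str:
    """Build a Content-Disposition value that is safe for any file name."""
    # ASCII fallback for older clients: non-ASCII characters become "?",
    # and backslashes and double quotes are escaped inside the quoted string
    fallback = filename.encode("ascii", "replace").decode("ascii")
    fallback = fallback.replace("\\", "\\\\").replace('"', '\\"')
    # RFC 5987 extended value: UTF-8 bytes, percent-encoded, no "safe" characters
    encoded = quote(filename, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

In the view, the header line would then read `response["Content-Disposition"] = attachment_header(export.download_file_name)`.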
Conclusion
Implementing these simple-sounding "let's export the application data to the user" features takes a lot of work. Luckily we have great tools for doing it safely, without exposing the data to anyone who shouldn't see it. The method described here doesn't work for every case, but when it does, it is pretty simple and straightforward. Storing user data encrypted at rest and inaccessible without proper authentication helps me sleep better at night.
One important thing I intentionally left out here is testing. These kinds of things can be tricky to test properly, but as long as you keep the individual moving parts small and simple, it's not impossible either.
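As one example of keeping the moving parts small: if the B2 upload is pulled out into a tiny function that receives the bucket as an argument, it can be tested with `unittest.mock` without network access or credentials. `upload_export` is a hypothetical refactoring, not code from the project; the mock only checks that `upload_bytes` is called the way the export code above calls it.

```python
import unittest
from unittest.mock import MagicMock


def upload_export(bucket, file_name: str, data: bytes) -> int:
    """Upload the archive bytes and return the size reported by B2."""
    uploaded = bucket.upload_bytes(data_bytes=data, file_name=file_name)
    return uploaded.size


class UploadExportTests(unittest.TestCase):
    def test_uploads_bytes_and_returns_size(self):
        bucket = MagicMock()
        # Fake the object returned by upload_bytes; only .size is used
        bucket.upload_bytes.return_value.size = 123

        size = upload_export(bucket, "exports/user-1.zip", b"PK...")

        bucket.upload_bytes.assert_called_once_with(
            data_bytes=b"PK...", file_name="exports/user-1.zip"
        )
        self.assertEqual(size, 123)
```

Run it with `python -m unittest` like any other test module.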