At Caktus, we deploy Python/Django apps to a wide variety of hosting providers. Our django-project-template includes a Salt configuration to set up an Ubuntu virtual machine on just about any hosting provider, from scratch. We've also modified this configuration a number of times when customers required the applications we built to be hosted on hardware they control. In the past, we also built our own tool for creating and managing EC2 instances automatically via the Amazon Web Services (AWS) APIs. In March, my colleague Dan Poirier wrote an excellent post about deploying Django applications to Elastic Beanstalk, demonstrating how we've used that service.
AWS has added many managed services that ease the process of hosting web applications on AWS. The most important addition to the AWS stack (for us) was undoubtedly Amazon RDS for Postgres, launched in November 2013. As long-time Postgres advocates, we saw this addition to the AWS suite as the final puzzle piece needed to build an AWS infrastructure for a typical Django app that requires little to no manual management. Still, the suite of AWS tools and services is immense, and configuring them manually is time-consuming and error-prone; despite everything the platform offers, setting up "one-click" deploys to AWS (à la Heroku) is still a complex challenge.
In this post, I'll be discussing another approach to hosting Python/Django apps and managing server infrastructure on AWS. In particular, we'll be looking at a Python library called troposphere that allows you to describe AWS resources using Python and generate CloudFormation templates to upload to AWS. We'll also look at a sample collection of troposphere scripts I compiled as part of the preparation for this post, which I've named (at least for now) AWS Container Basics.
Introduction to CloudFormation and Troposphere
CloudFormation is Amazon's answer to automated resource provisioning. A CloudFormation template is simply a JSON file that describes AWS resources and the relationships between them. It allows you to define Parameters (inputs) to the template and even includes a small set of intrinsic functions for more complex use cases. Relationships between resources are defined using the Ref function.
Troposphere allows you to accomplish all of the same things, but with the added benefit of writing Python code rather than JSON. To give you an idea of how Troposphere works, here's a quick example that creates an S3 bucket for hosting (public) static assets for your application (e.g., in the event you wanted to host your Django static media on S3):
from troposphere import Join, Template
from troposphere.s3 import (
    Bucket,
    CorsConfiguration,
    CorsRules,
    PublicRead,
    VersioningConfiguration,
)

template = Template()
domain_name = "myapp.com"

template.add_resource(
    Bucket(
        "AssetsBucket",
        AccessControl=PublicRead,
        VersioningConfiguration=VersioningConfiguration(Status="Enabled"),
        DeletionPolicy="Retain",
        CorsConfiguration=CorsConfiguration(
            CorsRules=[CorsRules(
                AllowedOrigins=[Join("", ["https://", domain_name])],
                AllowedMethods=["POST", "PUT", "HEAD", "GET"],
                AllowedHeaders=["*"],
            )]
        ),
    )
)

print(template.to_json())
This generates a JSON dump that looks very similar to the corresponding Python code, and it can be uploaded to CloudFormation to create and manage this S3 bucket. Why not just write the JSON directly, one might ask? The advantages of using Troposphere are that:
- it gives you all the power of Python to describe or create resources conditionally (e.g., to easily provide multiple versions of the same template),
- it provides compile-time detection of naming or syntax errors, e.g., via flake8 or Python itself, and
- it also validates (most of) the structure of a template, e.g., ensuring that the correct object types are provided when creating a resource.
Troposphere does not detect all possible errors you might encounter when building a template for CloudFormation, but it does significantly improve one's ability to detect and fix errors quickly, without the need to upload the template to CloudFormation for a live test.
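The Parameters and Ref function mentioned earlier map directly to Troposphere as well. Here's a minimal, hypothetical sketch (the AMI ID is a placeholder) showing how a template parameter might be wired into a resource:

from troposphere import Parameter, Ref, Template
from troposphere.ec2 import Instance

template = Template()

# Parameters become inputs you fill in when creating the stack
key_name = template.add_parameter(Parameter(
    "KeyName",
    Type="AWS::EC2::KeyPair::KeyName",
    Description="Name of an existing EC2 key pair",
))

# Ref() wires the parameter (or another resource) into this resource
template.add_resource(Instance(
    "AppServer",
    ImageId="ami-0123456789abcdef0",  # placeholder; use a real AMI ID
    InstanceType="t2.micro",
    KeyName=Ref(key_name),
))

print(template.to_json())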
Supported resources
Creating an S3 bucket is a simple example, and you don't really need Troposphere to do that. How does this scale to larger, more complex infrastructure requirements?
As of the time of this post, Troposphere includes support for 39 different resource types (such as EC2, ECS, RDS, and Elastic Beanstalk). Perhaps most importantly, within its EC2 package, Troposphere includes support for creating VPCs, subnets, routes, and related network infrastructure. This means you can easily create a template for a VPC that is split across availability zones, and then programmatically define resources inside those subnets/VPCs. A stack for hosting an entire, self-contained application can be templated and easily duplicated for different application environments such as staging and production.
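For instance, here's a hedged sketch of the networking piece, with a VPC split across two availability zones (the CIDR blocks and AZ names are arbitrary placeholders):

from troposphere import Ref, Template
from troposphere.ec2 import VPC, Subnet

template = Template()

vpc = template.add_resource(VPC("AppVPC", CidrBlock="10.0.0.0/16"))

# One subnet per availability zone, both inside the VPC above
template.add_resource(Subnet(
    "PrimarySubnet",
    VpcId=Ref(vpc),
    CidrBlock="10.0.0.0/24",
    AvailabilityZone="us-east-1a",  # placeholder AZ
))
template.add_resource(Subnet(
    "SecondarySubnet",
    VpcId=Ref(vpc),
    CidrBlock="10.0.1.0/24",
    AvailabilityZone="us-east-1b",  # placeholder AZ
))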
AWS managed services for a typical web app
AWS includes a wide array of managed services. Beyond EC2, what are some of the services one might need to host a Dockerized web application on AWS? Although each application is unique and will have differing managed service needs, some of the services one is likely to encounter when hosting a Python/Django (or any other) web application on AWS are (see the sketch following this list for how one of these might be declared in a template):
- S3 for storing and serving static and/or uploaded media
- RDS for a Postgres (or MySQL) database
- ElastiCache, which supports both Memcached and Redis, for a cache, session store, and/or message broker
- CloudFront, which provides edge servers for faster serving of static resources
- Certificate Manager, which provides a free SSL certificate for your AWS-provided load balancer and supports automatic renewal
- Virtual Private Clouds (VPCs) for overall network management
- Elastic Load Balancers (ELBs), which allow you to transparently spread traffic across Availability Zones (AZs). These are managed by AWS and the underlying IPs may change over time.
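To give a flavor of how such a service is declared in a template, here is a minimal, hypothetical troposphere sketch of an RDS Postgres instance. Every value shown is a placeholder, and in a real template the password would come from a Parameter (with NoEcho set) rather than being hardcoded:

from troposphere import Template
from troposphere.rds import DBInstance

template = Template()

template.add_resource(DBInstance(
    "Database",
    AllocatedStorage="5",  # GB
    DBInstanceClass="db.t2.micro",
    Engine="postgres",
    EngineVersion="9.6.2",
    MasterUsername="myapp",
    MasterUserPassword="change-me",  # placeholder; use a NoEcho Parameter in practice
))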
Provisioning your application servers
For hosting a Python/Django application on AWS, you have essentially four options:
- Configure your application as a set of task definitions and/or services using the AWS Elastic Container Service (ECS). This is a complex service, and I don't recommend it as a starting point.
- Create an Elastic Beanstalk Multicontainer Docker environment (which actually creates and manages an ECS Cluster for you behind the scenes). This provides much of the flexibility of ECS, but decouples the deployment and container definitions from the infrastructure. This makes it easier to set up your infrastructure once and be confident that you can continue to use it as your requirements for running additional tasks (e.g., background tasks via Celery) change over the lifetime of a project.
- Configure an array of EC2 instances yourself, either by creating an AMI of your application or manually configuring EC2 instances with Salt, Ansible, Chef, Puppet, or another such tool. This is an option that facilitates migration for legacy applications that might already have all the tools in place to provision application servers, and it's typically fairly simple to modify these setups to point your application configuration to external database and cache servers. This is the only option available for projects using AWS GovCloud, which at the time of this post supports neither ECS nor EB.
- Create an Elastic Beanstalk Python environment. This option is similar to configuring an array of EC2 instances yourself, but AWS manages provisioning the servers for you, based on the instructions you provide. This is the approach described in Dan's blog post on Amazon Elastic Beanstalk.
Putting it all together
This was originally a hobby / weekend learning project for me. I'm much indebted to the blog post by Jean-Philippe Serafin (no relation to Caktus) titled How to build a scalable AWS web app stack using ECS and CloudFormation, which I recommend reading to see how one can construct a comprehensive set of managed AWS resources in a single CloudFormation stack. Rather than repeat all of that here, however, I'm going to focus on some of the outcomes and potential uses for this project.
Jean-Philippe Serafin provided all the code for his blog post on GitHub. Starting from that, I've updated and released another project -- a workable solution for hosting fully-featured Python/Django apps, relying entirely on AWS managed services -- on GitHub under the name AWS Container Basics. It includes several configuration variants (thanks to Troposphere) that support stacks with and without NAT gateways as well as three of the application server hosting options outlined above (ECS, EB Multicontainer Docker, or EC2). Contributions are also welcome!
Setting up a demo
To learn more about how AWS works, I recommend creating a stack of your own to play with. You can do so for free if you have an account that's still within the 12-month free tier. If you don't have an account, or yours is past its free tier window, you can create a new one at aws.amazon.com (AWS doesn't frown on individuals or companies having multiple accounts; in fact, it's encouraged as an approach to keeping different applications or even environments properly isolated). Once you have an account ready:
Make sure you have your preferred region selected in the console via the menu in the top right corner. Sometimes AWS selects an unintuitive default, even after you have resources created in another region.
If you haven't already, you'll need to upload your SSH public key to EC2 (or create a new key pair). You can do so from the Key Pairs section of the EC2 Console.
Next, create a new stack from the CloudFormation console:
On the Select Template page:
- Select the "Specify an Amazon S3 template URL" option and enter "https://s3.amazonaws.com/aws-container-basics/eb-no-nat.json" as the template URL
- Click Next
On the Specify Details page:
- Enter a Stack Name of your choosing. Names that can be distinguished via the first 5 characters are better, because the name will be trimmed when generating names for the underlying AWS resources.
- Change the instance types if you wish, however, note that the t2.micro instance type is available within the AWS free tier for EC2, RDS, and ElastiCache.
- Enter a DatabaseEngineVersion. I recommend using the latest version of Postgres supported by RDS; as of the time of this post, that is 9.6.2.
- Generate and add a random DatabasePassword for RDS (see the snippet after this list for one way to generate suitable random values). While the stack is configured to pass this to your application automatically (via DATABASE_URL), RDS and CloudFormation do not support generating their own passwords at this time.
- Enter a DomainName. This should be the fully-qualified domain name, e.g., myapp.mydomain.com. Your email address (or one you have access to) should be listed in the Whois database for the domain. The domain name will be used for several things, including generation of a free SSL certificate via the AWS Certificate Manager. When you create the stack, you will receive an email asking you to approve the certificate (which you must do before the stack will finish creating). The DNS for this domain doesn't need to exist yet (you'll update this later).
- For the KeyName, select the key you created or uploaded in the prior step.
- For the PrimaryAZ and SecondaryAZ parameters, select two different availability zones in which to place your instances (it doesn't matter which ones you choose, so long as they're different).
- For the SecretKey, generate a random SECRET_KEY which will be added to the environment (for use by Django, if needed). If your application doesn't need a SECRET_KEY, enter a dummy value here. This can be changed later, if needed.
- Once you're happy with the values, click Next.
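If you need a quick way to generate random values for DatabasePassword and SecretKey, Python's standard-library secrets module (Python 3.6+) works well. URL-safe tokens also avoid the characters RDS disallows in passwords:

import secrets

# Tokens built from [A-Za-z0-9_-] avoid characters RDS rejects ("/", "@", '"', spaces)
print("DatabasePassword:", secrets.token_urlsafe(32))
print("SecretKey:", secrets.token_urlsafe(50))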
On the Options page, click Next (no additional tags, permissions, or notifications are necessary, so these can all be left blank).
On the Review page, double check that everything is correct, check the "I acknowledge that AWS CloudFormation might create IAM resources." box, and click Create.
The stack will take about 30 minutes to create, and you can monitor its progress by selecting the stack on the CloudFormation Stacks page and monitoring the Resources and/or Events tabs.
Using the demo
When it is finished, you'll have an Elastic Beanstalk Multicontainer Docker environment running inside a dedicated VPC, along with an S3 bucket for static assets (including an associated CloudFront distribution), a private S3 bucket for uploaded media, a Postgres database, and a Redis instance for caching, session storage, and/or use as a task broker. The environment variables provided to your container are as follows (a sketch of consuming them from Django settings follows the list):
- AWS_STORAGE_BUCKET_NAME: The name of the S3 bucket in which your application should store static assets.
- AWS_PRIVATE_STORAGE_BUCKET_NAME: The name of the S3 bucket in which your application should store private/uploaded files or media (make sure you configure your storage backend to require authentication to read objects and encrypt them at rest, if needed).
- CDN_DOMAIN_NAME: The domain name of the CloudFront distribution connected to the above S3 bucket; you should use this (or the S3 bucket URL directly) to refer to static assets in your HTML.
- DOMAIN_NAME: The domain name you specified when creating the stack, which will be associated with the automatically-generated SSL certificate.
- SECRET_KEY: The secret key you specified when creating this stack
- DATABASE_URL: The URL to the RDS instance created as part of this stack.
- REDIS_URL: The URL to the Redis instance created as part of this stack (which may be used, e.g., as a cache or session store). Note that Redis supports multiple databases, and no database ID is included in the URL, so you should append a forward slash and the integer index of the database you want, e.g., /0.
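As an illustration (not something the stack itself requires), a Django settings module might consume these variables like so, assuming the dj-database-url and django-redis packages are installed:

import os

import dj_database_url

# Parse DATABASE_URL into Django's DATABASES setting
DATABASES = {"default": dj_database_url.config()}

# Point the cache at the Redis instance, appending the database index as noted above
CACHES = {
    "default": {
        "BACKEND": "django_redis.cache.RedisCache",
        "LOCATION": os.environ["REDIS_URL"] + "/0",
    }
}

# Buckets for static assets and private/uploaded media, e.g., for django-storages
AWS_STORAGE_BUCKET_NAME = os.environ["AWS_STORAGE_BUCKET_NAME"]
AWS_PRIVATE_STORAGE_BUCKET_NAME = os.environ["AWS_PRIVATE_STORAGE_BUCKET_NAME"]

# Serve static assets via the CloudFront distribution
STATIC_URL = "https://%s/" % os.environ["CDN_DOMAIN_NAME"]

SECRET_KEY = os.environ["SECRET_KEY"]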
Optional: Uploading your Docker image to the EC2 Container Registry
One of the AWS resources created by AWS Container Basics is an EC2 Container Registry (ECR) repository. If you're using Docker and don't have a place to store images already (or would prefer to consolidate hosting at AWS to simplify authentication), you can push your Docker image to ECR. You can build and push your Docker image as follows:
DOCKER_TAG=$(git rev-parse HEAD) # or "latest", if you prefer
$(aws ecr get-login --region <region>)
docker build -t <stack-name> .
docker tag <stack-name>:$DOCKER_TAG <account-id>.dkr.ecr.<region>.amazonaws.com/<stack-name>:$DOCKER_TAG
docker push <account-id>.dkr.ecr.<region>.amazonaws.com/<stack-name>:$DOCKER_TAG
You will need to replace <stack-name> with the name of the stack you entered above, <account-id> with your AWS Account ID, and <region> with your AWS region. You can also see these commands with the appropriate variables filled in by clicking the "View Push Commands" button on the Amazon ECS Repository detail page in the AWS console (note that AWS defaults to using a DOCKER_TAG of latest instead of using the Git commit SHA).
Updating existing stacks
CloudFormation (and by extension Troposphere) also supports the concept of "updating" existing stacks. This means you can take an existing CloudFormation template such as AWS Container Basics, fork and tweak it to your needs, and upload the new template to CloudFormation. CloudFormation will calculate the minimum changes necessary to implement the update, inform you of what those are, and give you the option to proceed or decline. Some changes can be applied in place, whereas other, more significant changes (such as enabling encryption on an RDS instance or changing the solution stack for an Elastic Beanstalk environment) require destroying and recreating the underlying resource. CloudFormation will inform you if it needs to do this, so inspect the proposed change list carefully.
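As a minimal sketch (my_stack here is a hypothetical module that builds your troposphere Template), an update could be pushed with boto3 like so:

import boto3

from my_stack import template  # hypothetical module exposing a troposphere Template

client = boto3.client("cloudformation")
client.update_stack(
    StackName="my-app",  # the stack name chosen at creation time
    TemplateBody=template.to_json(),
    Capabilities=["CAPABILITY_IAM"],  # required because the stack creates IAM resources
)

Note that calling update_stack directly applies the change immediately; uploading the template through the CloudFormation console instead lets you review the proposed changes before proceeding.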
Coming Soon: Deployment
In the next post, I'll go over several options for deploying to your newly created stack. In the meantime, the AWS Container Basics README describes one simple option.