Introduction
Amazon Web Services (AWS)' Elastic Beanstalk is a service that bundles up a number of their lower-level services to manage many details for you when deploying a site. We particularly like it for deploys and autoscaling.
We were first introduced to Elastic Beanstalk when taking over an existing project that used it. It's not without its shortcomings, but we've generally been happy enough with it to stick with it for the project and consider it for others.
Basics
Elastic Beanstalk can handle a number of technologies that people use to build web sites, including Python (2.7 and 3.4), Java, PHP, .Net, Node.js, and Ruby. More are probably in the works.
You can also deploy containers, with whatever you want in them.
To add a site to Elastic Beanstalk, you create a new Elastic Beanstalk application, set some configuration, and upload the source for your application. Then Elastic Beanstalk will provision the necessary underlying resources, such as EC2 (virtual machine) instances, load balancers, autoscaling groups, DNS, etc, and install your application appropriately.
Elastic Beanstalk can monitor the load and automatically scale underlying resources as needed.
When your application needs updating, you upload the updated source, and Elastic Beanstalk updates the underlying resources. You can choose from multiple update strategies.
For example, on our staging server, we have Elastic Beanstalk update the application on our existing web servers, one at a time. In production, we have Elastic Beanstalk start setting up a new set of servers, check their health, and only start directing traffic to them as they're up and healthy. Then it shuts down the previous servers.
Elastic Beanstalk and Python
As you'd expect, when deploying a Python application, Elastic Beanstalk will create a virtual environment and install whatever's in your requirements.txt file.
A lot of other configuration can be done. Here are some example configuration file snippets from Amazon's documentation on Elastic Beanstalk for Python:
option_settings: aws:elasticbeanstalk:application:environment: DJANGO_SETTINGS_MODULE: production.settings aws:elasticbeanstalk:container:python:staticfiles: "/images/": "staticimages/" aws:elasticbeanstalk:container:python: WSGIPath: ebdjango/wsgi.py NumProcesses: 3 NumThreads: 20 packages: yum: libmemcached-devel: '0.31' container_commands: 00collectstatic: command: "django-admin.py collectstatic --noinput" 01syncdb: command: "django-admin.py syncdb --noinput" leader_only: true 02migrate: command: "django-admin.py migrate" leader_only: true
Here are some things we can note:
- We can define environment variables, like DJANGO_SETTINGS_MODULE.
- We can tell Elastic Beanstalk to serve static files for us.
- We ask Elastic Beanstalk to run our Django application using WSGI.
- We can install additional packages, like memcached.
- We can run commands during deploys.
- Using leader_only, we can tell Elastic Beanstalk that some commands only need to be run on one instance during the deploy.
This is also where we could set parameters for autoscaling, configure the deployment strategy, set up notifications, and many other things.
"leader_only"
The leader_only feature is great for doing something on only one of your servers during a deploy. But don't make the mistake we made of trying to use that to configure one server differently from the others, for example to run some background task periodically. Once the deploy is done, there's nothing special about the server that ran the leader_only commands during the deploy, and that server is as likely as any other to be terminated during autoscaling.
Right now, Elastic Beanstalk doesn't provide any way to readily differentiate servers so you can, for example, run things on only one server at a time. You'll have to do that yourselves.
Our solution for the situation where we ran into this was to use select_for_update to "lock" the records we were updating.
Database
You can have Elastic Beanstalk manage RDS (Amazon's hosted database service) for you, but so far we've preferred to set up RDS outside of Elastic Beanstalk. If Elastic Beanstalk was managing it, then Elastic Beanstalk would provide pointers to the database server in environment variables, which would be more convenient than keeping track of it ourselves. On the other hand, with our database outside of Elastic Beanstalk, we know our data is safe, even if we make some terrible mistake in our Elastic Beanstalk configuration.
Migrations
As always, you have to think carefully when deploying an update that includes migrations. Both of the deploy strategies we mentioned earlier will result in the migrations running on the database while some servers are still running the previous code, so you need to be sure that'll work. But that's a problem anytime you have multiple web servers and deserves a blog post of its own.
Time to deploy
One thing we're not happy with is the time it takes for an update deploy - over 20 minutes for our staging environment, and over 40 minutes for our production environment with its more conservative deploy strategy.
Some of that time is under our control. For example, it still takes longer than we'd like to set up the environment on each server, including building new environments for both Python and Node.js. We've already made some speedups there, and will continue working on that.
Other parts of the time are not under our control, especially some parts unique to production deploys. It takes AWS several minutes to provision and start a new EC2 instance, even before starting to set up our application's specific environment. It does that for one new server, waits until it's completely ready and tests healthy (several more minutes), and then starts the process all over again for the rest of the new servers it needs. (Those are done in parallel.)
When all the traffic is going to the new servers, it starts terminating the previous servers at 3 minute intervals, and waiting for all those to finish before declaring the deploy complete.
There are clear advantages to doing things this way: Elastic Beanstalk won't publish a completely broken version of our application, and the site never has any downtime during a production deploy. There are CLI tools that help you manage your deploys, plus a nice web interface to monitor what’s going on. We just wish the individual steps didn't take so long. Hopefully Elastic Beanstalk will find ways over time to improve things.
Conclusion
Despite some of its current shortcomings, we're quite happy to have Elastic Beanstalk in our deployment and hosting toolbox. We previously built our own tool for deploying and managing Django projects in an AWS autoscaling environment (before many of the more recent additions to the AWS suite, such as RDS for Postgres), and we know how much work it is to design, build, and maintain such a platform.