jonathanquail.com

Cloud architecture, Chef, AWS, and Python stuff

Chef Tools

So Fabric is still my go-to tool for automating any remote commands. And despite the fact I am a huge Chef fan, I still see a use for both tools.

In fact, Fabric makes a great tool for automating the installation of chef-client, configuring it, and then executing the initial chef-client run.

It can also be handy to execute specific recipes or run_lists on a remote host.
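
Here is a rough sketch of what that bootstrap can look like (the hostname, the local chef/ files, and the paths are just placeholders – adjust for your own setup):

# fabfile.py - a minimal sketch of bootstrapping chef-client with Fabric
from fabric.api import env, put, sudo

env.hosts = ['new-server.example.com']   # placeholder host
env.user = 'ubuntu'

def bootstrap_chef():
    # Install chef-client via the omnibus installer
    sudo('curl -L https://www.opscode.com/chef/install.sh | bash')

    # Push the client configuration, validation key and initial run_list
    sudo('mkdir -p /etc/chef')
    put('chef/client.rb', '/etc/chef/client.rb', use_sudo=True)
    put('chef/validation.pem', '/etc/chef/validation.pem', use_sudo=True)
    put('chef/first-boot.json', '/etc/chef/first-boot.json', use_sudo=True)

    # Kick off the initial chef-client run
    sudo('chef-client -j /etc/chef/first-boot.json')

Then $ fab bootstrap_chef does the whole thing in one shot.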

Chef-Sugar

Chef-Sugar provides a lot of nice syntactic sugar for use in your Chef cookbooks. Instead of writing this:

if node['platform_family'] == 'rhel'
    execute 'yum update'
end

With chef-sugar, this is so much nicer:

if rhel?
    execute 'yum update'
end

Also, the shorthand for compile-time dependencies is much easier for non-Chef users to read:

package 'apache2' do
    action :nothing
end.run_action(:install)

It was never obvious why it was action :nothing followed by run_action(:install).

But with chef-sugar…

compile_time do
    package 'apache2'
end

…it is clear that this is a compile-time action.

That leaves out the obvious question a new Chef user might ask: “Ruby doesn’t need to be compiled… what is this?” To which I would say – best to read About the chef-client Run. Compile-time resources are executed in the step identified as “Identify resources, build the resource collection”, as are any blocks of Ruby code not wrapped in a ruby_block.

Chef-Rewind

I am a huge fan of Bryan Berry and the Food Fight Show podcast. But I view chef-rewind as a “last resort”. chef-rewind lets you monkey-patch chef resources. And I am not a huge fan of monkey-patching unless absolutely necessary.

The primary use case is making a change to a resource defined in an upstream cookbook that you don’t control and don’t want to fork and maintain.

For example – if a cookbook you are using has a template you need to customize, you can use chef-rewind to modify the template resource from the upstream recipe to look in your cookbook instead.

Here is the example from the chef-rewind docs:

# file postgresql/recipes/server.rb
template "/var/pgsql/data/postgresql.conf" do
  source  "postgresql.conf.erb"
  owner "postgres"
end

# file my-postgresql/recipes/server.rb
chef_gem "chef-rewind"
require 'chef/rewind'

include_recipe "postgresql::server"
# my-postgresql.conf.erb located inside my-postgresql/templates/default/my-postgresql.conf.erb
rewind :template => "/var/pgsql/data/postgresql.conf" do
  source "my-postgresql.conf.erb"
  cookbook_name "my-postgresql"
end

Berkshelf

The Berkshelf Way. Nuff said.

Vagrant + Bento

Vagrant is the best way to develop and test cookbooks. Bring up a virtual machine in VirtualBox or VMware, test your cookbooks, then destroy it and try again. It’s also really nice to be able to bring up a base Linux machine to play around with something and not have to worry about cleaning up afterwards.

A newer tool from Chef (the company formerly known as Opscode): Bento uses Packer to build Vagrant base boxes for VirtualBox and VMware.

Omnibus

Ok so this last one isn’t exactly a Chef tool…

John Vincent (@lusis) wrote an excellent blog post for Sysadvent 2013 on Omnibus – Sysadvent 2013 – Day 16 – omnibusing your way to happiness.

I have played with it a bit, but want to spend more time with it over the next few months.

The idea behind Omnibus (created by Chef) is to install everything an application requires – packages, libraries, etc. – into a single location and then package that up for redistribution. This includes any libraries that need to be linked against.

This avoids all of the pain of package managers, dependencies, differences between OS distros, and so on.

For example – you write an awesome new Python application that requires Python 3. But you need to run it on an OS that still only ships with Python 2.6 (if you are lucky…).

So you use Omnibus to create an RPM/PKG that contains Python 3, all of your Python modules, a Postgres database, nginx, gunicorn, and your code – all rolled up into a single OS package you can install in one step.

And since it uses Vagrant, the whole thing can be scripted on a Jenkins server, and you can build OS packages that target different platforms at the same time.

It is pretty awesome. Definitely a good read.

Python Tools

This is a list of some of the python tools/libraries that I tend to use regularly. I am sure I forgot a few.

Virtualenv + virtualenvwrapper

This needs no introduction: if you are developing in Python, you should be using virtualenv to create isolated environments.

One thing I like about virtualenv that was not immediately obvious: you don’t need to run $ source bin/activate if you are calling Python from a shell script, cron, or supervisor. You can simply provide the full path to the virtualenv’s Python.

From the virtualenv documentation:

If you directly run a script or the python interpreter from the virtualenv’s bin/ directory (e.g. path/to/env/bin/pip or /path/to/env/bin/python script.py) there’s no need for activation.

I also use virtualenvwrapper to add some helpers to make life easier.

It organizes all of my virtualenvs in a single location, and then I can simply execute $ workon <env_name> to activate the virtualenv and cd to the directory I set for the project.

boto

Ok – if you aren’t using AWS, boto won’t really help you. It is the Python SDK for AWS, and it also supports MWS (although I haven’t tried that out yet).
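
As a quick example, listing your EC2 instances only takes a few lines – boto will pick up your credentials from environment variables or your boto config (more on that below):

# A small sketch using boto 2.x to list the EC2 instances in one region
import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        print instance.id, instance.state, instance.ip_address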

Docopt

Writing command line tools always starts out the same: look up the optparse docs and write the boilerplate code to add the arguments and (maybe) the help text. Docopt is a great idea – instead of writing any boilerplate code, you just write the usage documentation at the start of the file.

Once that is defined, you pass the __doc__ value into the library and then you are done.

"""
Usage:
    script.py move --city=<city_name>
    script.py stay [--city=<city_name>]
    script.py (-h|--help)
    script.py --version

Options:
    -h --help            Show this screen.
    --version            Show version.
    --city=<city_name>   City name [default: Victoria]
"""

from docopt import docopt

args = docopt(__doc__, version='1.0.0')

city = args['--city']

if args['move']:
    print "I'm moving to {0}!".format(city)
elif args['stay']:
    print "I'll stay here in {0} then".format(city)

There is a reference implementation in Python, but there are also implementations in many other languages such as Ruby, Go, PHP, Bash, and others (the full list is at the bottom of the docopt page).

Try out docopt in your browser here

Fabric

Fabric is an extremely useful tool for automating common tasks on remote servers.

If you have a text file of shell commands that you copy and paste onto a remote server, turn that into a Fabric script. Not only will it be easier to execute, it will also let you check that box for “Infrastructure as Code”.
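
A minimal fabfile for that kind of task might look like this (the host, path, and service name below are just placeholders):

# fabfile.py - hosts and commands below are placeholders
from fabric.api import cd, env, run, sudo

env.hosts = ['web1.example.com']

def update_app():
    # Pull the latest code and restart the (hypothetical) service
    with cd('/srv/myapp'):
        run('git pull')
    sudo('service myapp restart')

Running $ fab update_app executes the whole sequence on every host in env.hosts.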

Why IAM Roles for EC2 Is a Big Deal

Yesterday – Amazon announced IAM roles for EC2 instances (IAM roles for EC2 instances – Simplified Secure Access to AWS service APIs from EC2). It may not seem like a shiny new feature. But it is pretty fantastic.

Before IAM roles were released, anyone writing an application for AWS and deploying it on EC2 had to deal with getting their AWS Access Key and Secret Key into the application. These are just as important as a database password and should never be put into version control.

Some of the approaches I’ve seen used were:

  • A configuration file placed on the EC2 instance with the keys.
  • Putting the keys into the instance user-data at launch.
  • Pushing the keys to the application on first launch.

Sure, these approaches work – but none of them works well if you have to rotate your keys (which is a best practice for any application regardless of where it is hosted – especially when someone who knows the keys/passwords leaves the organization). They can also be problematic when auto-scaling is introduced, especially if the keys must be pushed to your application from another server.

With IAM roles for EC2, you define your permissions much like you would for an IAM user (which you were already using… right?!) and then simply attach the role to an instance when you launch it.

The EC2 instance has a special internally accessible URL you can query to get instance metadata, and this is where the AWS Access Key and Secret Key are made available.
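
If you are curious what the SDKs do under the hood, here is a rough sketch in Python (error handling omitted – the SDKs also take care of caching and refreshing these credentials for you):

# Fetch the temporary credentials provided by the instance's IAM role
import json
import urllib2

METADATA = 'http://169.254.169.254/latest/meta-data/iam/security-credentials/'

role_name = urllib2.urlopen(METADATA).read().strip()
creds = json.loads(urllib2.urlopen(METADATA + role_name).read())

access_key = creds['AccessKeyId']
secret_key = creds['SecretAccessKey']
token = creds['Token']        # temporary credentials include a session token
expiry = creds['Expiration']  # ...and an expiry, after which new ones are issued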

How is this more secure than the existing approaches?

Amazon will automatically rotate these credentials multiple times a day. You don’t ever need to worry about rotating your keys again. Even when an employee leaves the organization.

Here is an example of how the code would have looked before, when accessing an AWS resource. Note: this example is from the AWS Blog and is in Java. The boto library for Python hasn’t yet been updated to include support for IAM roles, although it appears to be in progress and should be updated soon.

AWSCredentials creds = new BasicAWSCredentials(
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY");
CredentialProvider session = new STSSessionCredentialsProvider(creds);
AmazonDynamoDB dynamo = new AmazonDynamoDBClient(session);

Here is how the new code will look with the latest AWS SDKs:

AmazonDynamoDB dynamo = new AmazonDynamoDBClient();

It’s easier to code – and deployment is a breeze. You don’t have to try to get those credentials into your application via OS environment variables, config files, etc.

When developing locally you will not have access to the instance metadata (unless you develop on EC2), so take advantage of the SDK’s default credential lookup when credentials are not provided explicitly.

For Python/Boto – when credentials are not specified, boto looks for these environment variables:

AWS_ACCESS_KEY_ID - Your AWS Access Key ID
AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key

Alternatively – you can use a boto config file located in either:

  • /etc/boto.cfg (for site-wide settings, all users)
  • ~/.boto (for user-specific settings)

[Credentials]
aws_access_key_id = <your access key>
aws_secret_access_key = <your secret key>

More details on configuring boto: BotoConfig

This feature may not seem like a huge savings for configuration/deployment. But it’s huge for security and longer-term maintenance. And that helps you get back to writing code for new stuff.

Restricting Access to Servers Behind an Elastic Load Balancer

We’re building a new system at work and I’ve been doing much of the deployment work. This new system will be deployed completely in AWS.

Part of this new system is a RESTful API that we only want to expose to our existing Application servers hosted in a datacenter elsewhere.

Attempt 1 – EC2 Security Groups

The security groups in AWS are great – and you should use them to restrict access to all of your servers so only the specific ports you need are open to the specific addresses (or other security groups) that need them.

Naturally my first stop was to create a nice restrictive security group to allow only SSH and HTTP access from our office for testing. Perfect – problem solved! No one outside the office could access the machine.

Now, being the smart engineer that I am deploying to AWS, I started a few instances across more than one availability zone and then created an Elastic Load Balancer.

The ELB uses a health check to determine when an EC2 instance is healthy – I set it up to hit a part of our API that would prove to me the server was available.

Immediately, all of the EC2 servers reported as unhealthy.

Off to Google I go – and after far more searching than I should have had to do, I discovered that you need to add a special “amazon-elb” security group to your EC2 security group to allow the ELB to communicate with it.

(Note: the name of this group should be “amazon-elb”; you can also find it in the Elastic Load Balancer tab of the AWS Management Console.)

So I amended the SG, and the health checks were all green. My scalable application was ready. I sent the URL to my wife at her office and expected her to say “Nope – doesn’t work”.

But she didn’t – it worked for her.

See, Elastic Load Balancers forward all traffic to your EC2 instances as if it’s coming from them, bypassing your perfectly crafted SG on the EC2 instance.

Second Try – Security Group on your ELB!

This one is short. Elastic Load Balancers don’t allow you to specify a security group (except in a VPC – more on that later). The ELB is designed to be completely open and available to the world. Not great if you want to use them for internal-facing applications.

Third time’s a charm – Apache Deny/Allow

Elastic Load Balancers do set the X-Forwarded-For header on every request. So my next thought was to check this header and if the IP address was in my white-list, let the user through.

(Note: the X-Forwarded-For header could be faked, so it’s not a perfect solution.)

Because of the way ELBs work, you may see multiple ELB IP addresses in the header: if the request is routed to us-east-1a and you have no healthy instances in that AZ, it will get sent to an ELB in us-east-1b where you may have a healthy instance. In that case the header would look like this:

X-Forwarded-For: 251.21.25.42,81.212.52.12,81.214.55.16

Where 81.212.52.12 and 81.214.55.16 are ELB IPs, and 251.21.25.42 is the client’s IP.

So, using a regex to extract the client’s IP:

SetEnvIf REMOTE_ADDR "(.+)" CLIENTIP=$1
SetEnvIf X-Forwarded-For "^([0-9.]+)" CLIENTIP=$1

Now you can check that IP against your white-list like this:

SetEnvIf CLIENTIP "192.152.52.12" allowed_in

And use the Apache Allow/Deny like this:

Order deny,allow
Deny from all
Allow from env=allowed_in

The problem with this approach is that, when deployed, traffic from non-white-listed IPs still reaches your Apache server and could still overload it. And of course this doesn’t play nicely with Elastic Load Balancers: their health check requests don’t carry an X-Forwarded-For header for any client (let alone one on the white-list), so they would be denied.

Virtual Private Cloud

Using a VPC is one way to go. Not only does it let you assign a security group to your ELB (!) but you can also control both incoming AND outgoing network traffic. Regular EC2 security groups can only limit incoming traffic.

VPCs don’t cost anything above regular EC2 costs (unless you use a VPN device). However, I have some concerns about the robustness of using a VPC. All traffic that routes outside the VPC (such as to AWS resources – S3, SimpleDB, DynamoDB, etc.) must go through a single NAT instance. You can only have a single NAT for your VPC, and it lives in one of the AZs of the region. If you lose that NAT instance, all traffic in every AZ that must reach the outside world will fail until you can bring a new NAT instance online.

My solution

A combination of the ideas above. Use security groups on your EC2 instances that only allow traffic from an ELB. Then use Apache to only allow traffic whose X-Forwarded-For header contains a white-listed IP. You also need to configure Apache to let through any traffic that has no X-Forwarded-For header at all – that is what lets the health check through. If someone tries to access the instance directly (not through the ELB), the EC2 Security Group will stop them.

The Apache configuration would look like this:

SetEnvIf REMOTE_ADDR "(.+)" CLIENTIP=$1
SetEnvIf X-Forwarded-For "^([0-9.]+)" CLIENTIP=$1
SetEnvIf X-Forwarded-For "^$" is_not_forwarded
...
# Whitelist
SetEnvIf CLIENTIP "192.152.52.12" allowed_in

And the Apache Allow/Deny:

Order deny,allow
Deny from all
Allow from env=allowed_in
Allow from env=is_not_forwarded