I am a strong proponent of Infrastructure as Code. I automate my infrastructure provisioning with CloudFormation and Terraform templates. My CloudFormation templates are written in YAML.
I used to write everything in CloudFormation but now write everything I can in Terraform. My original reason was to have templates that could provision resources beyond AWS, e.g. Azure, Google Cloud, or Akamai. I have never really needed anything outside of what AWS provides, but I continue to use Terraform because I like the template syntax better and find its variables and modules easier to work with than CloudFormation’s nested stacks.
I write both CloudFormation and Terraform templates in WebStorm because WebStorm has built-in support for YAML and a Terraform plugin which provides syntax highlighting and other goodies. I can also set up a WebStorm configuration which runs terraform apply on the current template with a keybinding.
I am also a proponent of Immutable Servers where compute instances / servers / containers are deployed once and, essentially, never patched or updated, avoiding Configuration Drift. Rather than apply a patch, new instances are deployed, traffic gradually moved to the new instances, and the old instances terminated. I have done Green/Blue deployments using either load balancers or a Weighted Round-Robin routing policy with Route 53 DNS to gradually move traffic. CodeDeploy has also recently gotten built-in support for Green/Blue deployments and I am actively exploring that option.
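A minimal sketch of the arithmetic behind a gradual weighted shift, assuming two weighted DNS records (one for the old fleet, one for the new) and equal-sized steps. The function names and the choice of 100 as the total weight are illustrative assumptions; Route 53 accepts integer weights from 0 to 255 and answers queries in proportion to each record's share of the total weight.

```python
def weight_schedule(steps, total=100):
    """Return (old_weight, new_weight) pairs that move traffic in equal steps.

    `total` is an arbitrary illustrative value; it must stay within the
    0-255 range that Route 53 allows for a record weight.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if not 0 < total <= 255:
        raise ValueError("total must be between 1 and 255")
    schedule = []
    for i in range(steps + 1):
        new = round(total * i / steps)
        schedule.append((total - new, new))
    return schedule


def traffic_share(old_weight, new_weight):
    """Fraction of DNS answers expected to point at the new fleet."""
    total = old_weight + new_weight
    return new_weight / total if total else 0.0


if __name__ == "__main__":
    for old, new in weight_schedule(4):
        print(f"old={old:3d} new={new:3d} share={traffic_share(old, new):.0%}")
```

Each pair would be applied to the two weighted records in turn, with a pause between steps to watch alarms before shifting more traffic. DNS caching means the real split lags the configured weights by up to the record's TTL.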
With Immutable Servers, I don’t make a lot of use of Chef, Puppet, or Ansible. I don’t care for Chef and Ansible because they are procedural rather than declarative, and I don’t like Puppet because advanced tasks require writing Ruby code. And, for the most part, I don’t like any of them because they default to a client/server architecture, which just adds more infrastructure that needs to be managed.
Instead, I’ve become a huge fan of containerization, with Docker being my primary solution. I combine Docker with HashiCorp’s Packer to create Amazon Machine Images (AMIs) which are, currently, the final artifact deployed to production.
I’m looking into using Amazon Elastic Container Service (ECS) or Amazon Elastic Kubernetes Service (EKS) to manage elasticity of container clusters in production. Containers are the future and managed elasticity is the way to go. The new AWS Fargate also provides some interesting new capabilities.
Sandbox, development, and QA environments use very small MySQL instances in RDS or local instances of MySQL. Production uses a highly available Aurora cluster spread across multiple Availability Zones (AZs). I used to use DynamoDB for writing events and triggering Lambda functions, but I found Aurora to be more cost-effective in production, and Aurora can now trigger Lambda functions as well. I’m not the only one to notice how cost-effective Aurora is.
I’m also comfortable setting up a Continuous Integration / Continuous Deployment (CI/CD) workflow with Jenkins, CircleCI, or CodeBuild using GitHub or CodeCommit triggers. AWS CodePipeline pulls the code from CodeCommit and invokes AWS CodeBuild to build the app, run tests, and integrate with Packer to create an AMI. CodeDeploy can then apply a CloudFormation or Terraform template to spin up an EC2 instance with the AMI.
I make liberal use of CloudWatch alarms along with SNS notifications to monitor for problems in production. I’ve recently been introduced to New Relic and am finding it very helpful analyzing metrics and identifying problematic areas.
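A minimal sketch of what one such alarm definition might look like, written as the keyword arguments accepted by the CloudWatch PutMetricAlarm API (exposed in boto3 as put_metric_alarm). The alarm name, threshold, evaluation window, instance ID, and topic ARN are all illustrative assumptions, not values from any real environment. The block only builds the parameters and makes no AWS calls.

```python
def cpu_alarm_params(instance_id, topic_arn, threshold=80.0):
    """Build PutMetricAlarm parameters for a high-CPU alarm on one instance.

    The alarm fires when average CPU stays above `threshold` percent for
    two consecutive 5-minute periods, then notifies the given SNS topic.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,               # seconds per evaluation period
        "EvaluationPeriods": 2,      # consecutive breaching periods required
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # SNS topic that receives the notification
    }


if __name__ == "__main__":
    params = cpu_alarm_params(
        "i-0123456789abcdef0",                            # placeholder instance ID
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # placeholder topic ARN
    )
    for key, value in params.items():
        print(f"{key}: {value}")
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.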
Logging is always a challenge in production environments. CloudWatch Logs does the job for the most part, but I’ve started working with third-party tools like Papertrail for real-time offsite logging and Logstash and Elasticsearch for log analysis.
I’ve worked with VictorOps to manage on-call schedules and integrate with other services like Slack. I’ve tinkered a bit with its transmogrification feature to make AWS alarms more readable and actionable.
I use CloudTrail to track user activity and API calls within the environment. It also comes in handy when you need to roll something back to a previous value that no one wrote down before changing it.
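A minimal sketch of the rollback idea, assuming the trail's event records have already been downloaded as a list of dictionaries. The field names eventTime, eventName, and requestParameters are standard CloudTrail record fields. The sample records, the parameter key, and the helper names are invented for illustration. It only recovers a value that was itself set through a recorded API call.

```python
from datetime import datetime


def parse_time(stamp):
    """Parse a CloudTrail eventTime such as '2018-03-01T12:00:00Z'."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")


def value_history(records, event_name, parameter):
    """Return (eventTime, value) pairs for one request parameter, oldest first."""
    matches = [
        r for r in records
        if r.get("eventName") == event_name
        and parameter in (r.get("requestParameters") or {})
    ]
    matches.sort(key=lambda r: parse_time(r["eventTime"]))
    return [(r["eventTime"], r["requestParameters"][parameter]) for r in matches]


def previous_value(records, event_name, parameter):
    """Return the value set by the second most recent matching call, if any."""
    history = value_history(records, event_name, parameter)
    return history[-2][1] if len(history) >= 2 else None


# Invented sample data: two changes to the same hypothetical setting.
SAMPLE_RECORDS = [
    {"eventTime": "2018-03-05T09:30:00Z", "eventName": "ModifyDBInstance",
     "requestParameters": {"dBInstanceClass": "db.t2.medium"}},
    {"eventTime": "2018-03-01T12:00:00Z", "eventName": "ModifyDBInstance",
     "requestParameters": {"dBInstanceClass": "db.t2.small"}},
    {"eventTime": "2018-03-02T08:00:00Z", "eventName": "DescribeDBInstances",
     "requestParameters": None},
]

if __name__ == "__main__":
    print(previous_value(SAMPLE_RECORDS, "ModifyDBInstance", "dBInstanceClass"))
```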
I proactively monitor and analyze costs with Cost Explorer to find opportunities to reduce AWS expenses. I use Spot Instances whenever possible for batch jobs, CI/CD, and other intermittent activities.
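A back-of-the-envelope sketch of the comparison behind that choice, assuming a workload that runs a fixed number of hours per day. The hourly prices in the usage example are made-up placeholders, not real AWS prices. Spot prices vary by instance type, region, and time, so real figures would come from the billing data.

```python
def monthly_cost(hourly_price, hours_per_day, days=30):
    """Cost of one instance running `hours_per_day` for `days` days."""
    return hourly_price * hours_per_day * days


def spot_savings(on_demand_price, spot_price, hours_per_day, days=30):
    """Return (amount_saved, percent_saved) for Spot versus On-Demand."""
    on_demand = monthly_cost(on_demand_price, hours_per_day, days)
    spot = monthly_cost(spot_price, hours_per_day, days)
    saved = on_demand - spot
    percent = 100 * saved / on_demand if on_demand else 0.0
    return saved, percent


if __name__ == "__main__":
    # Placeholder prices in dollars per hour for a 4-hour nightly batch job.
    saved, percent = spot_savings(on_demand_price=0.10, spot_price=0.03, hours_per_day=4)
    print(f"saved ${saved:.2f} per month ({percent:.0f}%)")
```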
I use Lucidchart to create network diagrams of my AWS infrastructure and Cloudcraft to create cool 3D diagrams of the entire infrastructure.