About the author: I'm Charles Sieg, a cloud architect and platform engineer who builds apps, services, and infrastructure for Fortune 1000 clients through Vantalect. If your organization is rethinking its software strategy in the age of AI-assisted engineering, let's talk.
Every multi-account AWS organization needs a baseline IAM role in every member account. Cross-account access for security tooling, centralized billing queries, incident response, compliance scanning: the use cases pile up fast. I have deployed this pattern across six enterprise organizations, each with 50 to 400 member accounts. The approach that survives at scale is Terraform managing a CloudFormation StackSet from the management account, with service-managed permissions and auto-deployment enabled. New accounts get the role automatically. No tickets. No manual steps. No drift.
This is the reference for engineers who need to ship this pattern in production. Not a getting-started walkthrough. I assume you already run Terraform against your management account and understand IAM role trust policies. What follows is the architecture, the full Terraform configuration, the CloudFormation template, the operational gotchas, and the failure modes that will bite you if you skip them.
Why StackSets for Cross-Account IAM
The Problem
You have an AWS Organization with dozens or hundreds of member accounts. You need an IAM role in every single one. The role trusts the management account (or a dedicated security account) so that a central automation can assume it. Common scenarios:
- Security Hub aggregation: a central Lambda assumes a role in each member account to pull findings.
- Cost Explorer queries: a billing dashboard assumes roles across accounts for consolidated reporting.
- Incident response: an IR automation assumes roles in compromised accounts to isolate resources.
- Compliance scanning: AWS Config or a third-party tool needs read access in every account.
You could create these roles manually. You could write a script that iterates over every account and calls aws iam create-role. You could use Terraform with multiple provider aliases. All of these approaches break the moment someone creates a new account and forgets to run the script.
Why Not Pure Terraform
The natural instinct for a Terraform shop is to declare an aws_iam_role for each account ID, each bound to its own provider alias configured via assume_role. This works at small scale. At 50+ accounts, it falls apart:
| Approach | Accounts Supported | Auto-Deploy New Accounts | Terraform State Size | Provider Configuration |
|---|---|---|---|---|
| Pure Terraform with provider aliases | ~20 before pain | No | Grows linearly | One provider block per account |
| Terraform + StackSet (self-managed) | 1,000+ | No | Minimal | Single management account provider |
| Terraform + StackSet (service-managed) | 100,000+ | Yes | Minimal | Single management account provider |
The provider alias approach requires a provider block for every account. Terraform has to initialize credentials for every provider on every plan/apply. State files balloon. Adding a new account means editing Terraform code. It does not scale.
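To make that cost concrete, here is a minimal sketch of the provider-alias approach for just two accounts. The account IDs are placeholders, and the sketch assumes each member account still has the OrganizationAccountAccessRole that Organizations creates for accounts it provisions. The provider meta-argument on a resource is resolved statically, so the role block is repeated per alias rather than driven by a single loop.

```hcl
# Sketch only: account IDs are placeholders, and OrganizationAccountAccessRole
# is assumed to exist in each member account.
provider "aws" {
  alias  = "account_111111111111"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::111111111111:role/OrganizationAccountAccessRole"
  }
}

provider "aws" {
  alias  = "account_222222222222"
  region = "us-east-1"

  assume_role {
    role_arn = "arn:aws:iam::222222222222:role/OrganizationAccountAccessRole"
  }
}

locals {
  # 999999999999 stands in for the trusted (management or security) account.
  trust_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { AWS = "arn:aws:iam::999999999999:root" }
    }]
  })
}

# One resource block per account, each pinned to its own provider alias.
resource "aws_iam_role" "audit_111111111111" {
  provider           = aws.account_111111111111
  name               = "OrganizationSecurityAuditRole"
  assume_role_policy = local.trust_policy
}

resource "aws_iam_role" "audit_222222222222" {
  provider           = aws.account_222222222222
  name               = "OrganizationSecurityAuditRole"
  assume_role_policy = local.trust_policy
}
```

Every additional account adds another provider block and another resource block, which is the linear growth the table describes.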
StackSets solve this. You define the IAM role once in a CloudFormation template, wrap it in a StackSet, target the entire organization or specific OUs, and every current and future account gets the role. Terraform manages the StackSet itself. CloudFormation handles the fan-out.
Permission Models: Self-Managed vs. Service-Managed
The most important decision when creating a StackSet is the permission model. Get this wrong and you will spend hours debugging role assumption failures.
| Aspect | Self-Managed | Service-Managed |
|---|---|---|
| IAM role creation | You create AWSCloudFormationStackSetAdministrationRole in the management account and AWSCloudFormationStackSetExecutionRole in every target account | AWS creates roles automatically via Organizations integration |
| Target specification | List of account IDs | Organizational Units (OUs) or entire organization |
| Auto-deployment to new accounts | No | Yes (configurable) |
| Delegated administration | No | Yes |
| Terraform permission_model | "SELF_MANAGED" (default) | "SERVICE_MANAGED" |
| Prerequisites | Manual role creation in every target account | Trusted access enabled for CloudFormation in Organizations |
When to Use Self-Managed
Self-managed permissions make sense when you deploy to a small, fixed set of accounts outside your organization, or when you need granular control over which execution role the StackSet assumes in each account. I rarely use this model in production. The overhead of pre-creating roles in every target account defeats the purpose of using StackSets in the first place.
When to Use Service-Managed
Service-managed is the right choice for organizational deployments. AWS handles the role plumbing. You target OUs instead of individual account IDs. New accounts added to the OU get stack instances automatically. This is the model I use for every engagement.
Enabling Trusted Access
Before you can use service-managed permissions, you need to enable trusted access for CloudFormation StackSets in AWS Organizations. This is a one-time operation from the management account:
    aws organizations enable-aws-service-access \
        --service-principal member.org.stacksets.cloudformation.amazonaws.com

Or in Terraform:

    resource "aws_organizations_organization" "this" {
      aws_service_access_principals = [
        "member.org.stacksets.cloudformation.amazonaws.com",
      ]
      feature_set = "ALL"
    }

Skip this step and every StackSet creation will fail with a cryptic error about insufficient permissions. I have watched three different teams burn hours on this.
The Architecture
Here is how the pieces fit together. Terraform runs in the management account, creates a StackSet with a CloudFormation template that defines an IAM role, and targets the root OU (or specific OUs). CloudFormation deploys a stack instance to every member account in those OUs. Each stack instance creates the IAM role locally in that member account.
The Management Account
Terraform runs here. It creates the aws_cloudformation_stack_set resource and the aws_cloudformation_stack_instances resource. The management account holds the StackSet definition and orchestrates deployments. It does not receive a stack instance itself; StackSets with service-managed permissions skip the management account.
If you need the role in the management account as well, create it separately with a standard aws_iam_role resource.
Auto-Deployment
With auto_deployment enabled, any new account added to a targeted OU automatically receives a stack instance. The IAM role appears in the new account within minutes. No Terraform run required. No human intervention. This is the behavior that makes StackSets worth the complexity over pure Terraform.
You also configure what happens when an account leaves the OU. Setting retain_stacks_on_account_removal = true keeps the IAM role in the account even after removal. I recommend this for security roles: you do not want to lose your incident response access just because someone moved an account to a different OU.
The Terraform Configuration
Here is the complete Terraform configuration. I will walk through each resource.
The StackSet Resource
    # Resolves the account Terraform runs in; used as the trusted account below.
    data "aws_caller_identity" "current" {}

    resource "aws_cloudformation_stack_set" "cross_account_role" {
      name             = "cross-account-security-role"
      description      = "Deploys a cross-account IAM role to all member accounts"
      permission_model = "SERVICE_MANAGED"

      auto_deployment {
        enabled                          = true
        retain_stacks_on_account_removal = true
      }

      operation_preferences {
        failure_tolerance_percentage = 10
        max_concurrent_percentage    = 25
        region_concurrency_type      = "PARALLEL"
      }

      capabilities  = ["CAPABILITY_NAMED_IAM"]
      template_body = file("${path.module}/templates/cross-account-role.yaml")

      parameters = {
        TrustedAccountId = data.aws_caller_identity.current.account_id
        RoleName         = "OrganizationSecurityAuditRole"
        ExternalId       = var.external_id
      }

      lifecycle {
        ignore_changes = [administration_role_arn]
      }
    }

Key decisions in this configuration:
- permission_model = "SERVICE_MANAGED" enables Organizations integration. AWS handles all the cross-account role plumbing.
- auto_deployment.enabled = true ensures new accounts get the role automatically.
- retain_stacks_on_account_removal = true keeps the role when accounts move between OUs.
- capabilities = ["CAPABILITY_NAMED_IAM"] is required because the CloudFormation template creates a named IAM role. Omit this and CloudFormation rejects the operation with an InsufficientCapabilities error.
- lifecycle.ignore_changes on administration_role_arn prevents a Terraform update loop. When using service-managed permissions, AWS sets this field automatically, and Terraform tries to clear it on every apply.
The Stack Instances
    resource "aws_cloudformation_stack_instances" "all_accounts" {
      stack_set_name = aws_cloudformation_stack_set.cross_account_role.name

      deployment_targets {
        organizational_unit_ids = [data.aws_organizations_organization.current.roots[0].id]
      }

      regions = var.target_regions

      operation_preferences {
        failure_tolerance_percentage = 10
        max_concurrent_percentage    = 25
        region_concurrency_type      = "PARALLEL"
      }
    }

I target the organization root to cover every account. If you only need the role in specific OUs (production accounts, security accounts), replace the root ID with those OU IDs.
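The switch between root and specific OUs can be expressed as a small conditional. This is a sketch under two assumptions: Terraform runs in the management account, where the organization data source can list the root, and the local name deployment_ou_ids is a hypothetical helper. The target_ou_ids variable is repeated here only so the snippet stands alone.

```hcl
# Assumes Terraform runs in the management account.
data "aws_organizations_organization" "current" {}

variable "target_ou_ids" {
  type        = list(string)
  default     = []
  description = "OU IDs to target. Empty targets the organization root."
}

locals {
  # Hypothetical helper: fall back to the organization root when no OUs are given.
  deployment_ou_ids = (
    length(var.target_ou_ids) > 0
    ? var.target_ou_ids
    : [data.aws_organizations_organization.current.roots[0].id]
  )
}
```

With that in place, the deployment_targets block would set organizational_unit_ids = local.deployment_ou_ids instead of hard-coding the root ID.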
The regions parameter specifies which regions get stack instances. IAM is global, so you only need one region. I use ["us-east-1"] for IAM-only StackSets. If your template includes regional resources (CloudWatch alarms, Config rules), specify every region you operate in.
Operation Preferences
Operation preferences control how fast the StackSet deploys and how many failures it tolerates before stopping.
| Parameter | Description | Recommended Value |
|---|---|---|
| failure_tolerance_percentage | Percentage of accounts that can fail before the operation stops | 10% for initial deploy, 0% for updates |
| max_concurrent_percentage | Percentage of accounts deployed to simultaneously | 25% for large orgs, 100% for small orgs |
| region_concurrency_type | Deploy regions in parallel or sequentially | PARALLEL for IAM (global resource), SEQUENTIAL for regional resources |
| region_order | Order of region deployment | Only needed with SEQUENTIAL |
For an IAM-only deployment, I run parallel with 25% concurrency and 10% failure tolerance. This deploys to a quarter of your accounts simultaneously and stops if more than 10% fail. Aggressive enough to finish in minutes for a 200-account org; conservative enough to catch systemic problems before they hit every account.
The CloudFormation Template
This is the template that CloudFormation deploys to every member account. It creates a single IAM role with a trust policy that allows the management account (or any account you specify) to assume it.
    AWSTemplateFormatVersion: "2010-09-09"
    Description: >
      Cross-account IAM role deployed via StackSet.
      Grants read-only security audit access to a trusted account.

    Parameters:
      TrustedAccountId:
        Type: String
        Description: AWS account ID allowed to assume this role
      RoleName:
        Type: String
        Default: OrganizationSecurityAuditRole
        Description: Name of the IAM role to create
      ExternalId:
        Type: String
        Description: External ID for additional assume-role security

    Resources:
      CrossAccountRole:
        Type: AWS::IAM::Role
        Properties:
          RoleName: !Ref RoleName
          MaxSessionDuration: 3600
          AssumeRolePolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Principal:
                  AWS: !Sub "arn:aws:iam::${TrustedAccountId}:root"
                Action: "sts:AssumeRole"
                Condition:
                  StringEquals:
                    "sts:ExternalId": !Ref ExternalId
          ManagedPolicyArns:
            - "arn:aws:iam::aws:policy/SecurityAudit"
            - "arn:aws:iam::aws:policy/ReadOnlyAccess"
          Tags:
            - Key: ManagedBy
              Value: StackSet
            - Key: Purpose
              Value: CrossAccountSecurityAudit

    Outputs:
      RoleArn:
        Description: ARN of the created cross-account role
        Value: !GetAtt CrossAccountRole.Arn

Trust Policy Design
The trust policy is the most security-sensitive part of this configuration. Three elements matter:
| Element | Purpose | Recommendation |
|---|---|---|
| Principal | Who can assume this role | Specify the exact account ID, never use * |
| Condition: sts:ExternalId | Prevents confused deputy attacks | Always include for cross-account roles |
| MaxSessionDuration | How long assumed sessions last | 3600 seconds (1 hour) for automated tooling |
The sts:ExternalId condition is optional but strongly recommended. Without it, any principal in the trusted account that has sts:AssumeRole permissions can assume your role. With it, the caller must know the external ID. This prevents confused deputy scenarios where a compromised service in the trusted account pivots into your member accounts.
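The other half of that control lives in the trusted account: only principals whose identity policy allows sts:AssumeRole on the member role can use it. A minimal sketch of such a policy follows, assuming the role name used in this article. The policy name is hypothetical, and the wildcard in the account segment lets one statement cover every member account.

```hcl
data "aws_iam_policy_document" "assume_audit_role" {
  statement {
    sid     = "AssumeOrgAuditRole"
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    # Matches the StackSet-deployed role in any account.
    resources = ["arn:aws:iam::*:role/OrganizationSecurityAuditRole"]
  }
}

resource "aws_iam_policy" "assume_audit_role" {
  name        = "AssumeOrganizationSecurityAuditRole" # hypothetical name
  description = "Allows assuming the StackSet-deployed audit role in member accounts"
  policy      = data.aws_iam_policy_document.assume_audit_role.json
}
```

Attaching this policy only to the automation's own role keeps the set of callers small. The caller still has to supply the external ID on every AssumeRole call.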
Choosing Managed Policies
For a security audit role, SecurityAudit and ReadOnlyAccess cover most use cases. SecurityAudit grants the permissions that AWS Security Hub, GuardDuty, and Config need. ReadOnlyAccess adds read permissions for services that SecurityAudit misses.
If you need write access (incident response, automated remediation), create a custom policy with precisely scoped permissions. Never attach AdministratorAccess to a StackSet-deployed role. One compromised credential in the trusted account gives an attacker admin access to every member account in the organization.
Delegated Administration
Running Terraform against the management account is a security concern. The management account has god-mode permissions over the entire organization. Limiting direct access to it is an IAM best practice. StackSets support delegated administration: you register a member account (typically a dedicated security or infrastructure account) as a delegated administrator for CloudFormation StackSets.
    resource "aws_organizations_delegated_administrator" "stacksets" {
      account_id        = var.delegated_admin_account_id
      service_principal = "member.org.stacksets.cloudformation.amazonaws.com"
    }

Once registered, you run Terraform from the delegated admin account with call_as = "DELEGATED_ADMIN":
    resource "aws_cloudformation_stack_set" "cross_account_role" {
      name             = "cross-account-security-role"
      permission_model = "SERVICE_MANAGED"
      call_as          = "DELEGATED_ADMIN"
      # ... rest of configuration
    }

This moves StackSet management out of the management account entirely. I recommend this for every organization with more than 10 accounts. The management account should be a locked vault, not a daily driver.
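The stack instances resource accepts the same call_as argument, and both resources need it when Terraform runs from the delegated admin account. The sketch below shows how the earlier stack instances resource might change. It assumes the OU IDs arrive through a hypothetical variable rather than a lookup of the organization root, and it refers to the StackSet by the name used in this article.

```hcl
variable "delegated_target_ou_ids" {
  type        = list(string)
  description = "Hypothetical input: OU IDs to deploy to from the delegated admin account"
}

resource "aws_cloudformation_stack_instances" "all_accounts" {
  stack_set_name = "cross-account-security-role"
  call_as        = "DELEGATED_ADMIN"
  regions        = ["us-east-1"]

  deployment_targets {
    organizational_unit_ids = var.delegated_target_ou_ids
  }
}
```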
Operational Considerations
Drift Detection
StackSets support drift detection across all stack instances. If someone manually modifies the IAM role in a member account (adds a policy, changes the trust policy), drift detection catches it.
    aws cloudformation detect-stack-set-drift \
        --stack-set-name cross-account-security-role \
        --call-as DELEGATED_ADMIN

Run this on a schedule. I trigger it weekly via an Amazon EventBridge (formerly CloudWatch Events) rule. Drift in IAM roles is a security finding: someone bypassed the managed deployment to make a manual change. Investigate every instance.
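One way to schedule the call from Terraform is sketched below. It is not the event-rule setup described above: it swaps in EventBridge Scheduler, whose universal targets can call the CloudFormation API directly. The resource names are hypothetical. The IAM permission shown is an assumption to verify in your environment, and the CallAs value applies only when the schedule lives in the delegated admin account.

```hcl
# Trust policy letting EventBridge Scheduler assume the execution role.
data "aws_iam_policy_document" "scheduler_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["scheduler.amazonaws.com"]
    }
  }
}

# Assumed minimum permission; additional actions may be required.
data "aws_iam_policy_document" "detect_drift" {
  statement {
    effect    = "Allow"
    actions   = ["cloudformation:DetectStackSetDrift"]
    resources = ["arn:aws:cloudformation:*:*:stackset/cross-account-security-role:*"]
  }
}

resource "aws_iam_role" "drift_scheduler" {
  name               = "stackset-drift-scheduler" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.scheduler_trust.json
}

resource "aws_iam_role_policy" "drift_scheduler" {
  name   = "detect-stack-set-drift"
  role   = aws_iam_role.drift_scheduler.id
  policy = data.aws_iam_policy_document.detect_drift.json
}

resource "aws_scheduler_schedule" "weekly_drift_detection" {
  name                = "weekly-stackset-drift-detection" # hypothetical name
  schedule_expression = "rate(7 days)"

  flexible_time_window {
    mode = "OFF"
  }

  target {
    # Universal target: calls the DetectStackSetDrift API directly.
    arn      = "arn:aws:scheduler:::aws-sdk:cloudformation:detectStackSetDrift"
    role_arn = aws_iam_role.drift_scheduler.arn

    input = jsonencode({
      StackSetName = "cross-account-security-role"
      CallAs       = "DELEGATED_ADMIN"
    })
  }
}
```

The schedule only starts detection. Reading the results and alerting on drifted instances still needs a separate step.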
Updates and Rollbacks
When you update the CloudFormation template (change the managed policies, modify the trust policy), the StackSet rolls the update across all accounts. The operation_preferences control the rollout speed and failure tolerance.
For IAM changes, I recommend:
- Set failure_tolerance_percentage = 0 for updates. IAM changes should succeed everywhere or nowhere.
- Set max_concurrent_percentage = 10 for updates. Roll slowly. If something is wrong with the new template, you want to catch it early.
- Test the template change in a sandbox account first by deploying a standalone stack, before pushing through the StackSet.
CloudFormation handles rollbacks per-account. If a stack instance fails to update, it rolls back to the previous template. The StackSet operation continues to other accounts until the failure tolerance is exceeded. You will see OUTDATED status on the failed instances, indicating they are running the old template.
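The update settings recommended above differ from the initial-deploy values used earlier in the configuration. One way to keep both sets in code is a small lookup keyed by rollout phase. This is a sketch: the rollout_phase variable and the local names are hypothetical, and the numbers are the ones recommended in this article.

```hcl
variable "rollout_phase" {
  type        = string
  default     = "update"
  description = "Hypothetical switch: initial for the first deployment, update for template changes"

  validation {
    condition     = contains(["initial", "update"], var.rollout_phase)
    error_message = "The rollout_phase value must be either initial or update."
  }
}

locals {
  # Values taken from the recommendations in this article.
  rollout_preferences = {
    initial = {
      failure_tolerance_percentage = 10
      max_concurrent_percentage    = 25
    }
    update = {
      failure_tolerance_percentage = 0
      max_concurrent_percentage    = 10
    }
  }

  rollout = local.rollout_preferences[var.rollout_phase]
}

# Inside the operation_preferences block of either resource:
#   failure_tolerance_percentage = local.rollout.failure_tolerance_percentage
#   max_concurrent_percentage    = local.rollout.max_concurrent_percentage
#   region_concurrency_type      = "PARALLEL"
```

Defaulting to the update values means a forgotten override errs on the slow, zero-tolerance side.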
Account Removal Behavior
When retain_stacks_on_account_removal = true, removing an account from a targeted OU leaves the IAM role in place. The stack instance is removed from the StackSet, but the IAM role persists in the account. This is the correct behavior for security roles.
When retain_stacks_on_account_removal = false, CloudFormation deletes the stack (and the IAM role) when the account leaves the OU. Use this for non-critical resources where cleanup matters more than continuity.
Limits and Quotas
StackSets have specific limits that matter at scale:
| Quota | Default Value | Adjustable |
|---|---|---|
| Stack sets per administrator account | 100 | Yes |
| Stack instances per stack set | 100,000 | Yes |
| Concurrent operations per Region per administrator | 10,000 | Yes |
| Stack sets per delegated administrator | 100 | Yes |
| Maximum MaxConcurrentCount | Varies by org size | No |
| Concurrent StackSet operations | 1 per StackSet | No |
| Drift detection operations | 1 per StackSet at a time | No |
The single concurrent operation limit per StackSet is the most operationally constraining. If a StackSet update is in progress and you try to run drift detection (or another update), the second operation fails. Sequence your operations carefully, especially in CI/CD pipelines where multiple Terraform applies might target the same StackSet.
Common Failure Modes
After deploying this pattern across multiple organizations, these are the failures I have seen most frequently:
| Failure | Cause | Fix |
|---|---|---|
| CAPABILITY_NAMED_IAM error | Template creates a named IAM resource but the capability is not declared | Add capabilities = ["CAPABILITY_NAMED_IAM"] to the StackSet resource |
| Trust policy rejection | Trusted account ID is wrong or the external ID does not match | Verify parameters passed to the CloudFormation template |
| StackSetNotFoundException | Targeting an OU that does not exist or that the StackSet does not cover | Verify OU IDs with aws organizations list-organizational-units-for-parent |
| Perpetual Terraform diff on administration_role_arn | Service-managed StackSets auto-populate this field; Terraform tries to reset it | Add lifecycle { ignore_changes = [administration_role_arn] } |
| Deployment skips management account | By design: service-managed StackSets never deploy to the management account | Create the role separately in the management account with aws_iam_role |
| New account does not get the role | Account was added to an OU not targeted by the StackSet, or auto-deployment is disabled | Verify auto_deployment.enabled = true and the account's OU is in deployment_targets |
| Drift detection times out | Large number of accounts with many resources per stack | Increase detection timeout; run during off-peak hours |
The lifecycle.ignore_changes on administration_role_arn deserves emphasis. This is a known issue in the Terraform AWS provider. Service-managed StackSets set this field automatically. Without the lifecycle block, every terraform plan shows a diff, and every terraform apply triggers an unnecessary update. I have seen this cause CI/CD pipelines to run StackSet updates on every commit.
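For the management-account row in the table above, a standalone role that mirrors the StackSet template might look like the following sketch. The variable names are hypothetical stand-ins for the module inputs, and trusted_account_id is assumed to be whichever account runs the central automation.

```hcl
variable "trusted_account_id" {
  type        = string
  description = "Hypothetical input: account allowed to assume the role"
}

variable "external_id" {
  type        = string
  sensitive   = true
  description = "Same external ID passed to the StackSet"
}

# Same trust policy shape as the CloudFormation template.
data "aws_iam_policy_document" "audit_trust" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.trusted_account_id}:root"]
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [var.external_id]
    }
  }
}

resource "aws_iam_role" "management_audit" {
  name                 = "OrganizationSecurityAuditRole"
  max_session_duration = 3600
  assume_role_policy   = data.aws_iam_policy_document.audit_trust.json

  tags = {
    ManagedBy = "Terraform"
    Purpose   = "CrossAccountSecurityAudit"
  }
}

# Attach the same two managed policies the template uses.
resource "aws_iam_role_policy_attachment" "management_audit" {
  for_each = toset([
    "arn:aws:iam::aws:policy/SecurityAudit",
    "arn:aws:iam::aws:policy/ReadOnlyAccess",
  ])

  role       = aws_iam_role.management_audit.name
  policy_arn = each.value
}
```

Keeping the role name identical to the StackSet-deployed one lets the central automation treat the management account like any other target.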
Putting It All Together
Here is the complete Terraform module structure:
    modules/stackset-iam-role/
        main.tf                        # StackSet and instance resources
        variables.tf                   # Input variables
        templates/
            cross-account-role.yaml    # CloudFormation template

The variables file:
    variable "role_name" {
      type        = string
      default     = "OrganizationSecurityAuditRole"
      description = "Name of the IAM role to create in each member account"
    }

    variable "external_id" {
      type        = string
      description = "External ID for assume-role condition"
      sensitive   = true
    }

    variable "target_regions" {
      type        = list(string)
      default     = ["us-east-1"]
      description = "Regions to deploy stack instances"
    }

    variable "target_ou_ids" {
      type        = list(string)
      default     = []
      description = "OU IDs to target. Empty targets the organization root."
    }

    variable "managed_policy_arns" {
      type = list(string)
      default = [
        "arn:aws:iam::aws:policy/SecurityAudit",
        "arn:aws:iam::aws:policy/ReadOnlyAccess"
      ]
      description = "Managed policy ARNs to attach to the role"
    }

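Calling the module from a root configuration might then look like the sketch below. The source path follows the module layout shown above. The security_audit_external_id variable is a hypothetical root-level input, and the remaining arguments simply restate the module defaults for clarity.

```hcl
variable "security_audit_external_id" {
  type        = string
  sensitive   = true
  description = "Hypothetical root-module input passed through to the module"
}

module "security_audit_role" {
  source = "./modules/stackset-iam-role"

  external_id    = var.security_audit_external_id
  target_regions = ["us-east-1"]

  # An empty list targets the organization root, per the variable description.
  target_ou_ids = []
}
```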
This module deploys to six organizations I manage. The CloudFormation template has not changed in over a year. New accounts get the role within five minutes of creation. Zero tickets. Zero manual steps.
Cross-reference this with the AWS IAM: An Architecture Deep-Dive for deeper coverage of IAM policy evaluation, and the Infrastructure as Code: CloudFormation, CDK, Terraform, and Pulumi Compared for a broader comparison of IaC tools.
Key Patterns
- Use service-managed permissions. Self-managed requires pre-creating roles in every target account. Service-managed handles it through Organizations integration. Always service-managed for organizational deployments.
- Enable auto-deployment. The whole point of StackSets is removing manual steps. Auto-deploy ensures new accounts get the role without any human intervention.
- Retain stacks on account removal. For security roles, losing access when an account moves between OUs is the wrong behavior. Retain the stacks.
- Use the lifecycle.ignore_changes block. The administration_role_arn diff is a known provider issue. Prevent unnecessary StackSet updates.
- Always include an external ID. Cross-account role assumption without an external ID is a confused deputy vulnerability. It costs nothing to include.
- Move to delegated administration. Keep the management account locked down. Run StackSet operations from a delegated admin account.
- Run drift detection on a schedule. Manual IAM changes in member accounts are a security risk. Catch them early.
- Set failure tolerance to zero for updates. IAM changes should be all-or-nothing. A partial rollout where half your accounts have different permissions is worse than a failed deployment.
Additional Resources
- AWS CloudFormation StackSets Documentation
- StackSets Prerequisites
- Service-Managed Permissions for StackSets
- CloudFormation StackSets Delegated Administration
- Terraform aws_cloudformation_stack_set Resource
- Terraform aws_cloudformation_stack_instances Resource
- Implementing Least Privilege with CloudFormation StackSets
- StackSets Quotas and Limits