Enabling self-service infrastructure provisioning means empowering developers (or other internal teams) to provision, modify, and destroy infrastructure (like databases, environments, S3 buckets, etc.) on-demand, safely, and without manual ticketing — while still respecting security, cost, and compliance boundaries.
It's one of the key pillars of Platform Engineering and Internal Developer Platforms.
Let's break it down with practical, real-world guidance.
The Goal:
Developers can provision infrastructure (compute, storage, databases, queues, etc.) independently, through a safe, auditable, and policy-driven interface — without needing to know Terraform or cloud internals.
✅ What Self-Service Infra Looks Like (From the Developer's POV):
A developer should be able to:
- Choose a template (e.g., “PostgreSQL + S3 bucket + Redis”)
- Fill in required inputs (e.g., team name, region, size)
- Click a button or run a CLI command
- Wait a few minutes and get everything provisioned
- Have ownership, logs, and cost attribution automatically set
- Do all of this without opening a Jira ticket or messaging the DevOps team
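Behind the "choose a template, fill in inputs" step sits a catalog that declares what each template contains and which inputs it requires. The sketch below models that catalog and rejects bad requests before anything is provisioned. It is a minimal sketch: the template name `postgres-s3-redis`, the allowed regions and the size names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Template:
    """A self-service template: a named bundle of components plus required inputs."""
    name: str
    components: tuple[str, ...]
    required_inputs: tuple[str, ...]
    allowed_values: dict[str, frozenset[str]] = field(default_factory=dict)


# Hypothetical catalog entry mirroring the "PostgreSQL + S3 bucket + Redis" example.
CATALOG = {
    "postgres-s3-redis": Template(
        name="postgres-s3-redis",
        components=("postgresql", "s3-bucket", "redis"),
        required_inputs=("team", "region", "size"),
        allowed_values={
            "region": frozenset({"eu-west-1", "us-east-1"}),
            "size": frozenset({"small", "medium", "large"}),
        },
    ),
}


def validate_request(template_name: str, inputs: dict[str, str]) -> list[str]:
    """Return human-readable errors; an empty list means the request can proceed."""
    template = CATALOG.get(template_name)
    if template is None:
        return [f"unknown template: {template_name}"]
    errors = []
    for key in template.required_inputs:
        value = inputs.get(key, "").strip()
        if not value:
            errors.append(f"missing required input: {key}")
        elif key in template.allowed_values and value not in template.allowed_values[key]:
            allowed = ", ".join(sorted(template.allowed_values[key]))
            errors.append(f"{key}={value!r} not allowed (choose from: {allowed})")
    return errors
```

Returning every error at once, rather than stopping at the first, lets a portal form or CLI show the developer everything they need to fix in one round trip.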
Key Components to Enable Self-Service Provisioning
1. Infrastructure as Code (IaC)
You need reproducible, version-controlled infrastructure definitions.
Popular tools:
- Terraform – most common choice
- Pulumi – IaC using general-purpose programming languages (e.g., TypeScript, Python, Go)
- Crossplane – Kubernetes-native provisioning
- CloudFormation – AWS-native (but less portable)
These IaC modules should be:
- Reusable (modular)
- Parameterized (using variables)
- Versioned and stored in Git
Example: a terraform-module-rds-postgres module with inputs such as DB size, environment, and tags.
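A platform backend usually does not hand-write HCL for each request. One option is to generate a small file that calls the versioned module with the developer's inputs. Terraform also accepts JSON-syntax configuration in `.tf.json` files, where a module call is an object under a top-level `module` key. The sketch below generates such a file. The registry address `app.terraform.io/acme/rds-postgres/aws` and the input names are hypothetical.

```python
import json


def render_module_call(name: str, source: str, version: str, inputs: dict) -> str:
    """Render a Terraform JSON-syntax (.tf.json) document that calls a versioned module.

    In JSON syntax, a module call is an object under the top-level "module" key,
    keyed by the module's local name; module inputs sit next to "source".
    """
    reserved = inputs.keys() & {"source", "version"}
    if reserved:
        raise ValueError(f"inputs may not override {sorted(reserved)}")
    module_block = {"source": source, "version": version, **inputs}
    return json.dumps({"module": {name: module_block}}, indent=2, sort_keys=True)


# Hypothetical private-registry address and input names for terraform-module-rds-postgres.
config = render_module_call(
    name="orders_db",
    source="app.terraform.io/acme/rds-postgres/aws",
    version="~> 1.4",
    inputs={
        "instance_class": "db.t3.medium",
        "env": "staging",
        "tags": {"team": "finance", "owner": "alice"},
    },
)
print(config)  # e.g. write this to orders_db.tf.json in the team's workspace
```

The `version` argument only applies to modules from a registry. A Git-sourced module would instead pin a tag in the source URL (e.g. `?ref=v1.4.0`).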
2. Workflow Automation Engine
Something needs to run the IaC logic based on user inputs.
Options:
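- GitOps-based: a developer submits a PR with the infra request; once it is merged, Argo CD or Flux syncs the change (usually paired with a Kubernetes-native provisioner such as Crossplane)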
- Workflow-based: Use tools like Terraform Cloud, Atlantis, Spacelift, GitHub Actions
- Custom: a portal or CLI calls a backend service that triggers the job
The workflow:
- Validates inputs
- Runs plan + apply
- Notifies the user (via Slack, email, dashboard)
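The validate → plan → apply → notify loop can be sketched as a small orchestration function. It uses the standard Terraform CLI commands and applies the saved plan file, so `terraform apply` does not prompt for approval. The `validate` and `notify` callbacks are hypothetical hooks: in practice `notify` might post to a Slack webhook. The command runner is injectable so the flow can be tested without Terraform installed.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str], str], int]


def run_command(cmd: Sequence[str], workdir: str) -> int:
    """Default runner: execute a CLI command in workdir and return its exit code."""
    return subprocess.run(list(cmd), cwd=workdir, check=False).returncode


def provision(workdir: str,
              validate: Callable[[], list[str]],
              notify: Callable[[str], None],
              run: Runner = run_command) -> bool:
    """Validate inputs, run plan + apply, and notify the requester of the outcome."""
    errors = validate()
    if errors:
        notify("Request rejected: " + "; ".join(errors))
        return False
    steps = [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", "-out=tfplan"],
        # Applying a saved plan file applies exactly what was planned, without a prompt.
        ["terraform", "apply", "-input=false", "tfplan"],
    ]
    for cmd in steps:
        code = run(cmd, workdir)
        if code != 0:
            notify(f"Step failed ({' '.join(cmd)}), exit code {code}")
            return False
    notify("Provisioning finished successfully")
    return True
```

Saving the plan to a file also leaves room for a human or policy approval step between `plan` and `apply`.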
3. Abstraction Layer / Developer Interface
Devs shouldn't need to touch raw Terraform or YAML.
Options:
- Developer portal (e.g., Backstage) – Select modules via UI
- Custom Web UI – Form-based, simple dropdowns
- Internal CLI – e.g., platform create-db --team finance --env staging
- Slack bots – For lightweight use cases (e.g., ephemeral test envs)
Keep the experience frictionless and intuitive.
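An internal CLI like `platform create-db --team finance --env staging` can be a thin wrapper that turns flags into a request payload for the platform backend. The sketch below uses Python's `argparse`. The allowed environments, the `--size` flag and the payload shape are hypothetical.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="platform", description="Self-service infrastructure CLI (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)
    create_db = sub.add_parser("create-db", help="Request a managed PostgreSQL database")
    create_db.add_argument("--team", required=True)
    create_db.add_argument("--env", required=True, choices=["dev", "staging", "prod"])
    create_db.add_argument("--size", default="small", choices=["small", "medium", "large"])
    return parser


def to_request(args: argparse.Namespace) -> dict:
    """Translate parsed flags into the payload a (hypothetical) platform API would accept."""
    return {"template": "postgres", "inputs": {"team": args.team, "env": args.env, "size": args.size}}


def main(argv: Optional[Sequence[str]] = None) -> dict:
    return to_request(build_parser().parse_args(argv))
```

Putting `choices` on the flags gives developers immediate feedback and a useful `--help`, before any request reaches the backend.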
4. Policy Enforcement (Guardrails)
You must make it safe. Use Policy as Code to enforce:
- Naming conventions
- Cost limits (e.g., instance sizes, quotas)
- Tagging standards (owner, environment, cost center)
- Region restrictions
- Allowed services and versions
Tools:
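- OPA / Gatekeeper (Kubernetes admission control)
- Conftest (OPA policies run against config files, including Terraform plans)
- Checkov, tfsec (now folded into Trivy), Sentinel (Terraform Cloud/Enterprise)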
Example: No team can provision unencrypted S3 buckets.
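Most plan-time guardrails work on the JSON form of a Terraform plan, produced with `terraform show -json tfplan`. Its `resource_changes` list carries each resource's `address`, `type`, planned `actions` and `after` values. In practice you would write these rules in Rego and run them with Conftest or OPA. The Python stand-in below only illustrates the logic. The tag names follow the tagging standard above, while the resource-type list and instance-class allowlist are hypothetical.

```python
import json

# Hypothetical organisation-wide guardrails.
REQUIRED_TAGS = {"owner", "environment", "cost_center"}
TAGGED_TYPES = {"aws_db_instance", "aws_s3_bucket", "aws_elasticache_cluster"}
ALLOWED_DB_CLASSES = {"db.t3.micro", "db.t3.small", "db.t3.medium"}  # cost limit


def check_plan(plan: dict) -> list[str]:
    """Return policy violations found in `terraform show -json <planfile>` output."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if "create" not in actions and "update" not in actions:
            continue  # no-ops, reads and pure deletes are not checked
        after = rc["change"].get("after") or {}
        address = rc["address"]
        if rc["type"] in TAGGED_TYPES:
            present = set((after.get("tags") or {}).keys())
            missing = REQUIRED_TAGS - present
            if missing:
                violations.append(f"{address}: missing tags {sorted(missing)}")
        if rc["type"] == "aws_db_instance":
            instance_class = after.get("instance_class")
            if instance_class not in ALLOWED_DB_CLASSES:
                violations.append(f"{address}: instance_class {instance_class!r} is not allowed")
    return violations


def check_plan_file(path: str) -> list[str]:
    """Load a plan exported with `terraform show -json tfplan > plan.json` and check it."""
    with open(path) as f:
        return check_plan(json.load(f))
```

Run it between `plan` and `apply`, and fail the pipeline if the list is non-empty.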
5. Secrets & Identity Integration
When provisioning things like DBs, queues, or VMs, secrets and credentials must be handled securely.
- Use Vault, AWS Secrets Manager, or External Secrets Operator
- Bind access to the provisioning user's identity/team
- Never hardcode credentials in IaC
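One way to keep credentials out of IaC is for the platform to generate them, write them straight to a secret store, and pass only the secret's path into Terraform outputs and app config. The sketch below does that behind a small `SecretStore` interface, with an in-memory test double. The per-team path convention and the username format are hypothetical. A production `put` might wrap Vault, or boto3's Secrets Manager client (`create_secret(Name=..., SecretString=...)`).

```python
import secrets
import string
from typing import Protocol


class SecretStore(Protocol):
    def put(self, path: str, value: str) -> None: ...


class InMemoryStore:
    """Test double; a real implementation would wrap Vault or AWS Secrets Manager."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def put(self, path: str, value: str) -> None:
        self.data[path] = value


def create_db_credentials(store: SecretStore, team: str, env: str, db_name: str) -> dict:
    """Generate a password, store it, and return only a reference to it."""
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(32))  # CSPRNG-backed
    path = f"{team}/{env}/{db_name}/master"  # hypothetical path convention, scoped per team
    store.put(path, password)
    # Only the reference leaves this function; the password itself never reaches IaC or logs.
    return {"secret_path": path, "username": f"{db_name}_admin"}
```

Scoping the path by team makes it straightforward to bind read access to the requesting team's identity in the secret store's own policies.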
6. Auditing, Ownership & Cost Attribution
You want traceability: who provisioned what, when, and at what cost.
Best practices:
- Auto-tag resources (team, owner, env, project_id, cost_center)
- Store logs of all provisioning activities
- Show usage and cost reports via dashboards (e.g., in Backstage, Grafana, or FinOps tools)
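Auto-tagging and audit logging can live in the same step. The platform computes the mandatory tags from the request, so developers cannot omit or override them, and it emits one structured log line per action. The sketch below uses the tag keys listed above. The request shape and the JSON-lines audit format are assumptions.

```python
import json
from datetime import datetime, timezone
from typing import Optional

MANDATORY_TAGS = ("team", "owner", "env", "project_id", "cost_center")


def build_tags(request: dict, user_tags: Optional[dict] = None) -> dict:
    """Merge user tags with mandatory ones; platform-supplied values always win."""
    tags = dict(user_tags or {})
    for key in MANDATORY_TAGS:
        if not request.get(key):
            raise ValueError(f"cannot provision without {key!r}")
        tags[key] = request[key]
    return tags


def audit_record(actor: str, action: str, resource: str, tags: dict,
                 now: Optional[datetime] = None) -> str:
    """Serialize one provisioning event as a JSON line for an append-only audit log."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return json.dumps({"ts": ts, "actor": actor, "action": action,
                       "resource": resource, "tags": tags}, sort_keys=True)
```

Because every resource then carries `team` and `cost_center`, cost dashboards can group spend by those tags without extra bookkeeping.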
7. Lifecycle Management
Provisioning is only half the story. You also need to support:
- Updates (e.g., increase DB size)
- Deletions (e.g., clean up dev environments)
- TTL policies (e.g., destroy after 72h)
Automate expiry, garbage collection, and drift detection.
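A TTL policy reduces to a periodic job that finds environments past their expiry time and queues them for teardown (e.g. `terraform destroy` on their workspace). The sketch below shows only the selection logic. The `Environment` record and the convention that `ttl_hours=None` means "never expires" are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Environment:
    name: str
    created_at: datetime
    ttl_hours: Optional[int]  # None = no expiry (e.g., production)


def expired(envs: list[Environment], now: datetime) -> list[str]:
    """Return names of environments whose TTL has elapsed at `now`."""
    return [
        e.name
        for e in envs
        if e.ttl_hours is not None and now >= e.created_at + timedelta(hours=e.ttl_hours)
    ]
```

Passing `now` in explicitly, instead of reading the clock inside the function, keeps the garbage-collection job deterministic and easy to test. A real job would also notify owners shortly before deletion.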