Reference answer
Operators extend Kubernetes functionality by combining Custom Resource Definitions (CRDs) with custom controllers that implement domain-specific logic:
Custom Resource (Desired State) → Controller (Reconciliation Logic) → Kubernetes Resources (Actual State)
Design Philosophy
Declarative API Design: Users describe what they want (desired state) rather than how to achieve it (imperative commands).
Controller Pattern: Continuously observe the current state and take actions to make it match the desired state.
Kubernetes-Native Integration: Leverage existing Kubernetes primitives and patterns for consistency and reliability.
CRD Design Principles
1. Resource Modeling
Define clear abstractions that map to your domain:
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: webapps.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: webapps
    singular: webapp
    kind: WebApp
  versions:
    - name: v1
      served: true
      storage: true
      subresources:
        status: {}   # enables the /status endpoint the controller writes to
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec: # Desired state
              type: object
              properties:
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 10
                image:
                  type: string
                database:
                  type: object
                  properties:
                    host:
                      type: string
                    port:
                      type: integer
                  required: ["host", "port"]
              required: ["replicas", "image", "database"]
            status: # Observed state
              type: object
              properties:
                ready:
                  type: boolean
                replicas:
                  type: integer
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                      lastTransitionTime:
                        type: string
                        format: date-time
                      reason:
                        type: string
                      message:
                        type: string
Key design elements:
- spec: User-defined desired state with validation constraints
- status: Controller-managed observed state and conditions
- Validation: OpenAPI schema ensures data integrity
- Versioning: Support for API evolution and backward compatibility
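The spec/status split and the schema constraints above can be mirrored in Go types. The following is a minimal sketch, not the operator's actual API package: the type names and ValidateSpec are hypothetical, and the emptiness checks only approximate the schema's required rules (OpenAPI "required" tests for key presence, not for a non-zero value).

```go
package main

import (
	"errors"
	"fmt"
)

// DatabaseSpec mirrors spec.database in the CRD schema.
type DatabaseSpec struct {
	Host string `json:"host"`
	Port int32  `json:"port"`
}

// WebAppSpec is the user-defined desired state.
type WebAppSpec struct {
	Replicas int32        `json:"replicas"`
	Image    string       `json:"image"`
	Database DatabaseSpec `json:"database"`
}

// WebAppStatus is the controller-managed observed state.
type WebAppStatus struct {
	Ready    bool  `json:"ready"`
	Replicas int32 `json:"replicas"`
}

// ValidateSpec (hypothetical helper) approximates the schema constraints:
// replicas in [1, 10], and image, database.host, database.port set.
// It returns nil when the spec passes every check.
func ValidateSpec(s WebAppSpec) error {
	var errs []error
	if s.Replicas < 1 || s.Replicas > 10 {
		errs = append(errs, fmt.Errorf("spec.replicas: %d is outside [1, 10]", s.Replicas))
	}
	if s.Image == "" {
		errs = append(errs, errors.New("spec.image: required"))
	}
	if s.Database.Host == "" {
		errs = append(errs, errors.New("spec.database.host: required"))
	}
	if s.Database.Port == 0 {
		errs = append(errs, errors.New("spec.database.port: required"))
	}
	// errors.Join returns nil when errs is empty (Go 1.20+).
	return errors.Join(errs...)
}
```

In a real cluster the API server enforces the schema on admission, so a check like this is mainly useful in unit tests or client-side tooling.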
2. Status and Conditions Design
Follow Kubernetes conventions for status reporting:
status:
  ready: true
  replicas: 3
  conditions:
    - type: "Available"
      status: "True"
      lastTransitionTime: "2023-10-01T10:00:00Z"
      reason: "MinimumReplicasAvailable"
      message: "Deployment has minimum availability"
    - type: "Progressing"
      status: "True"
      lastTransitionTime: "2023-10-01T10:00:00Z"
      reason: "NewReplicaSetAvailable"
      message: "ReplicaSet has successfully progressed"
Controller Logic Design
1. Reconciliation Loop Pattern
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the custom resource; NotFound means it was deleted, so there is nothing to do
	webapp := &webappv1.WebApp{}
	if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// 2. Determine desired state from spec
	// (the Service from r.buildService(webapp) is reconciled the same way; omitted for brevity)
	desiredDeployment := r.buildDeployment(webapp)
	// 3. Get current state
	currentDeployment := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: webapp.Name, Namespace: webapp.Namespace}, currentDeployment)
	// 4. Reconcile differences (errors is k8s.io/apimachinery/pkg/api/errors)
	if errors.IsNotFound(err) {
		// Create new deployment
		err = r.Create(ctx, desiredDeployment)
	} else if err == nil && !r.deploymentEqual(currentDeployment, desiredDeployment) {
		// Update the fetched object so its resourceVersion is sent with the request
		currentDeployment.Spec = desiredDeployment.Spec
		err = r.Update(ctx, currentDeployment)
	}
	if err != nil {
		// Returning the error requeues the request with backoff
		return ctrl.Result{}, err
	}
	// 5. Update status based on current state
	if err := r.updateStatus(ctx, webapp); err != nil {
		return ctrl.Result{}, err
	}
	// 6. Return reconciliation result: resync periodically to catch drift
	return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}
Key reconciliation principles:
- Idempotency: Reconciling the same state repeatedly must produce the same result as reconciling it once
- Error handling: Distinguish between retriable and permanent errors
- Status updates: Always reflect current observed state
- Requeue strategy: Balance responsiveness with resource usage
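To make the idempotency principle concrete, here is a minimal sketch that reconciles against an in-memory map instead of a real API server. Cluster, Deployment, and Reconcile are hypothetical toy types for illustration only; the point is that a second pass over unchanged state issues no writes, while drift is corrected.

```go
package main

// Deployment is a toy stand-in for appsv1.Deployment.
type Deployment struct {
	Image    string
	Replicas int32
}

// Cluster is an in-memory stand-in for the API server, keyed by name.
type Cluster struct {
	Deployments map[string]Deployment
	Writes      int // number of create/update calls issued
}

// Reconcile drives the named Deployment toward desired and reports the
// action taken: "created", "updated", or "unchanged".
func Reconcile(c *Cluster, name string, desired Deployment) string {
	current, found := c.Deployments[name]
	switch {
	case !found:
		c.Deployments[name] = desired
		c.Writes++
		return "created"
	case current != desired:
		// Observed state drifted from desired state: correct it.
		c.Deployments[name] = desired
		c.Writes++
		return "updated"
	default:
		// Already converged: issue no write at all.
		return "unchanged"
	}
}
```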
2. Owner References for Resource Management
// Set owner reference for garbage collection
err = ctrl.SetControllerReference(webapp, deployment, r.Scheme)
if err != nil {
	return ctrl.Result{}, err
}
Benefits of owner references:
- Automatic cleanup when custom resource is deleted
- Clear resource ownership hierarchy
- Prevents orphaned resources
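The cleanup behavior can be pictured as a sweep that removes any object whose owner no longer exists. The sketch below is a toy model under simplifying assumptions (one owner per object, a plain map as the object store, hypothetical names); the real Kubernetes garbage collector is considerably more involved.

```go
package main

// Object is a toy stand-in for a Kubernetes object with at most one owner.
type Object struct {
	UID      string
	OwnerUID string // empty for top-level objects
}

// CollectGarbage deletes every object whose owner is missing from the map,
// repeating until no more objects can be removed so that whole ownership
// chains (WebApp -> Deployment -> ReplicaSet) are cleaned up.
// It returns the number of objects removed.
func CollectGarbage(objects map[string]Object) int {
	removed := 0
	for {
		progress := false
		for uid, obj := range objects {
			if obj.OwnerUID == "" {
				continue // top-level object, never collected
			}
			if _, ownerExists := objects[obj.OwnerUID]; !ownerExists {
				delete(objects, uid) // deleting during range is safe in Go
				removed++
				progress = true
			}
		}
		if !progress {
			return removed
		}
	}
}
```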
Advanced Controller Patterns
1. Multi-Resource Coordination
Complex applications often require coordinating multiple Kubernetes resources:
func (r *WebAppReconciler) reconcileDatabase(ctx context.Context, webapp *webappv1.WebApp) error {
	// Create database secret
	secret := r.buildDatabaseSecret(webapp)
	err := r.reconcileResource(ctx, secret)
	if err != nil {
		return err
	}
	// Create database deployment
	deployment := r.buildDatabaseDeployment(webapp)
	err = r.reconcileResource(ctx, deployment)
	if err != nil {
		return err
	}
	// Create database service
	service := r.buildDatabaseService(webapp)
	return r.reconcileResource(ctx, service)
}
2. Condition-Based State Management
func (r *WebAppReconciler) updateStatus(ctx context.Context, webapp *webappv1.WebApp) error {
	// Check deployment readiness
	deployment := &appsv1.Deployment{}
	err := r.Get(ctx, types.NamespacedName{Name: webapp.Name, Namespace: webapp.Namespace}, deployment)
	switch {
	case errors.IsNotFound(err):
		// Deployment not found - update condition
		r.setCondition(webapp, "Available", metav1.ConditionFalse, "DeploymentNotFound", "Deployment does not exist")
		webapp.Status.Ready = false
	case err != nil:
		// Any other read failure says nothing about availability; let the caller retry
		return err
	case deployment.Spec.Replicas != nil && deployment.Status.ReadyReplicas == *deployment.Spec.Replicas:
		// Deployment ready
		r.setCondition(webapp, "Available", metav1.ConditionTrue, "MinimumReplicasAvailable", "All replicas are ready")
		webapp.Status.Ready = true
	default:
		// Deployment not ready
		r.setCondition(webapp, "Available", metav1.ConditionFalse, "InsufficientReplicas", "Not all replicas are ready")
		webapp.Status.Ready = false
	}
	// Requires the status subresource to be enabled on the CRD
	return r.Status().Update(ctx, webapp)
}
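The mapping from replica counts to a condition can be pulled out into a pure function, which makes it easy to unit test without a client. The sketch below is a hypothetical helper: Availability is not part of any library, and treating a nil replica count as 1 is an assumption based on the Deployment API default.

```go
package main

import "fmt"

// Availability derives the fields of the "Available" condition from the
// number of ready replicas and the desired count. desired is a pointer
// because Deployment.Spec.Replicas is *int32; nil is treated as 1.
func Availability(ready int32, desired *int32) (status, reason, message string) {
	want := int32(1)
	if desired != nil {
		want = *desired
	}
	if ready >= want {
		return "True", "MinimumReplicasAvailable", "All replicas are ready"
	}
	return "False", "InsufficientReplicas",
		fmt.Sprintf("%d of %d replicas are ready", ready, want)
}
```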
Error Handling and Reliability
1. Retry Strategy
func (r *WebAppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// ... reconciliation logic ...
	if err != nil {
		// Classify error type
		if isRetriableError(err) {
			// Return the error: controller-runtime requeues the request through
			// its rate-limited work queue, which backs off exponentially per item
			return ctrl.Result{}, err
		}
		// Log permanent errors but don't retry; the next change to the
		// resource triggers a fresh reconcile
		r.Log.Error(err, "Permanent error during reconciliation")
		return ctrl.Result{}, nil
	}
	return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
}
2. Event Recording
// Record events for user visibility
r.Recorder.Event(webapp, "Normal", "Created", "Successfully created deployment")
r.Recorder.Event(webapp, "Warning", "Failed", "Failed to create service")
Testing Strategy
1. Unit Testing Controller Logic
func TestWebAppReconciler_Reconcile(t *testing.T) {
	// Setup test environment
	scheme := runtime.NewScheme()
	_ = webappv1.AddToScheme(scheme)
	_ = appsv1.AddToScheme(scheme)
	// Create test custom resource
	webapp := &webappv1.WebApp{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "test-webapp",
			Namespace: "default",
		},
		Spec: webappv1.WebAppSpec{
			Replicas: 3,
			Image:    "nginx:latest",
		},
	}
	// Seed the fake client with the resource; otherwise Reconcile finds nothing and returns early.
	// WithStatusSubresource (controller-runtime v0.15+) lets Status().Update succeed.
	// The variable is not named "client" to avoid shadowing the client package.
	k8sClient := fake.NewClientBuilder().
		WithScheme(scheme).
		WithObjects(webapp).
		WithStatusSubresource(webapp).
		Build()
	reconciler := &WebAppReconciler{
		Client: k8sClient,
		Scheme: scheme,
	}
	// Test reconciliation
	_, err := reconciler.Reconcile(context.TODO(), ctrl.Request{
		NamespacedName: types.NamespacedName{
			Name:      "test-webapp",
			Namespace: "default",
		},
	})
	assert.NoError(t, err)
	// Verify expected resources were created
	deployment := &appsv1.Deployment{}
	err = k8sClient.Get(context.TODO(), types.NamespacedName{Name: "test-webapp", Namespace: "default"}, deployment)
	assert.NoError(t, err)
	assert.Equal(t, int32(3), *deployment.Spec.Replicas)
}
2. Integration Testing
Test operators in real Kubernetes environments using frameworks like:
- Ginkgo/Gomega: BDD-style testing framework
- envtest: Runs a local kube-apiserver and etcd (no kubelet or built-in controllers) for fast API-level tests
- Kind/minikube: Full cluster testing environments
Operational Considerations
1. Metrics and Monitoring
Implement controller-specific metrics:
- Reconciliation duration and frequency
- Error rates and types
- Custom resource creation/update/deletion rates
- Resource drift detection
2. Security and RBAC
Define minimal required permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: webapp-operator
rules:
  - apiGroups: ["example.com"]
    resources: ["webapps"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["example.com"]
    resources: ["webapps/status"]   # needed for status subresource updates
    verbs: ["get", "update", "patch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["services", "secrets"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: [""]
    resources: ["events"]   # needed for event recording
    verbs: ["create", "patch"]
Security principles:
- Grant only necessary permissions
- Use namespace-scoped roles when possible
- Audit and review permissions regularly