Autoscaling¶
Horizontal Pod Autoscaler (HPA) configuration for eoAPI services. Autoscaling requires monitoring components to be enabled in the main chart.
Prerequisites¶
Enable monitoring in your main eoapi installation:
monitoring:
  prometheus:
    enabled: true
  prometheusAdapter:
    enabled: true # Required for request-rate scaling
  metricsServer:
    enabled: true # Required for CPU scaling
Configuration¶
Basic Autoscaling¶
The following instructions assume you have completed the AWS or GCP cluster setup and installed the eoapi chart.
stac:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    type: "requestRate" # Options: "cpu", "requestRate", "both"
    targets:
      requestRate: 50000m # 50 requests/second
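The requestRate target uses the Kubernetes milli-unit suffix, so 50000m means 50 requests per second. A minimal sketch of that conversion follows; milli_to_rps is a hypothetical helper written for illustration, not something shipped with the chart, and it truncates to whole requests per second.

```shell
# Hypothetical helper: convert a quantity such as "50000m" to whole requests/second.
milli_to_rps() {
  case "$1" in
    *m) echo $(( ${1%m} / 1000 )) ;;  # strip the "m" suffix, then divide by 1000
    *)  echo "$1" ;;                  # no suffix: already in requests/second
  esac
}

milli_to_rps 50000m    # prints 50
milli_to_rps 100000m   # prints 100
```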
Concurrency settings¶
Each main eoAPI service has WEB_CONCURRENCY and database pool settings that should be adjusted based on your scaling strategy:
Without autoscaling (default)¶
Higher concurrency per pod to handle considerable load:
stac:
  settings:
    envVars:
      WEB_CONCURRENCY: "10" # More workers per pod
      DB_MIN_CONN_SIZE: "1"
      DB_MAX_CONN_SIZE: "5" # Total: 10-50 connections per pod
With autoscaling enabled¶
Lower concurrency for predictable resource usage:
stac:
  autoscaling:
    enabled: true
  settings:
    envVars:
      WEB_CONCURRENCY: "4" # Fewer workers, let HPA scale pods
      DB_MIN_CONN_SIZE: "1"
      DB_MAX_CONN_SIZE: "3" # Total: 4-12 connections per pod
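The per-pod totals in the comments above are workers multiplied by pool size. Multiplying again by the pod count gives a rough upper bound on connections to the database, which is worth comparing with the PostgreSQL max_connections setting. The sketch below assumes each worker holds its own pool, as the per-pod totals imply; max_db_connections is a hypothetical helper, and the pod counts are examples only.

```shell
# Hypothetical helper: upper bound on DB connections opened by one service.
# Arguments: workers per pod, DB_MAX_CONN_SIZE, number of pods.
max_db_connections() {
  echo $(( $1 * $2 * $3 ))
}

max_db_connections 10 5 1   # one pod, no autoscaling: prints 50
max_db_connections 4 3 20   # autoscaling at maxReplicas 20: prints 240
```

Under these assumptions, a service scaled out to 20 pods can hold far more connections than a single high-concurrency pod, even though each pod uses fewer.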
Service-specific recommendations¶
| Service | WEB_CONCURRENCY (no autoscaling) | WEB_CONCURRENCY (with autoscaling) | Rationale |
|---|---|---|---|
| STAC | 10 | 4-6 | High request volume, DB intensive |
| Raster | 4 | 2-3 | CPU intensive image operations |
| Vector | 8 | 4-5 | Complex spatial queries |
Scaling Policies¶
- Go to the releases section of this repository and find the latest eoapi-support-<version> version to install, or use the following command to get the latest version:
# Get latest eoapi-support chart version
export SUPPORT_VERSION=$(helm search repo eoapi/eoapi-support --versions | head -2 | tail -1 | awk '{print $2}')
stac:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    type: "both"
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300 # 5min cooldown
        policies:
          - type: Percent
            value: 50 # Max 50% pods removed per period
            periodSeconds: 300
      scaleUp:
        stabilizationWindowSeconds: 60 # 1min cooldown
        policies:
          - type: Percent
            value: 100 # Max 100% pods added per period
            periodSeconds: 60
    targets:
      cpu: 70
      requestRate: 50000m
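To see what a Percent scale-up policy allows, the sketch below prints the replica ceiling after each period, starting from minReplicas. It assumes demand stays high enough to request the maximum every period and ignores stabilization windows; scale_up_steps is a hypothetical helper that rounds up and caps at maxReplicas.

```shell
# Hypothetical helper: replica ceiling after each scale-up period.
# Arguments: starting replicas, maxReplicas, policy percent value.
scale_up_steps() {
  sus_replicas=$1; sus_max=$2; sus_percent=$3
  sus_out=$sus_replicas
  # A non-positive percent would never grow, so return the start value only.
  if [ "$sus_percent" -le 0 ]; then echo "$sus_out"; return 0; fi
  while [ "$sus_replicas" -lt "$sus_max" ]; do
    # Grow by the allowed percentage (rounded up), then cap at maxReplicas.
    sus_replicas=$(( (sus_replicas * (100 + sus_percent) + 99) / 100 ))
    if [ "$sus_replicas" -gt "$sus_max" ]; then sus_replicas=$sus_max; fi
    sus_out="$sus_out $sus_replicas"
  done
  echo "$sus_out"
}

scale_up_steps 2 20 100   # prints: 2 4 8 16 20
```

With periodSeconds: 60, going from 2 to 20 replicas therefore takes at least four periods under this policy.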
Metrics Types¶
CPU-based Scaling¶
type: "cpu"
targets:
  cpu: 70
Request Rate Scaling¶
type: "requestRate"
targets:
  requestRate: 50000m # 50 requests/second
Combined Scaling¶
type: "both"
targets:
  cpu: 70
  requestRate: 100000m # 100 requests/second
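For intuition about how these targets become replica counts, the Kubernetes HPA computes desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue) for each metric and, when several metrics are configured, uses the largest result. The sketch below reproduces that arithmetic with integers only. It ignores the HPA tolerance band and the minReplicas/maxReplicas bounds, the helper names are hypothetical, and how the chart wires the request-rate metric is not shown here, so observed values are assumed to be in the same units as the target.

```shell
# Hypothetical helper: ceil(currentReplicas * observed / target), integer math.
# Arguments: current replicas, observed metric value, target metric value.
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# With type "both", the larger of the two per-metric results is used.
# Arguments: replicas, observed CPU %, target CPU %, observed rate, target rate.
desired_for_both() {
  dfb_cpu=$(desired_replicas "$1" "$2" "$3")
  dfb_rate=$(desired_replicas "$1" "$4" "$5")
  if [ "$dfb_cpu" -gt "$dfb_rate" ]; then echo "$dfb_cpu"; else echo "$dfb_rate"; fi
}

# 4 replicas, CPU at 90% against a 70% target, request rate at 100 against a target of 50:
desired_for_both 4 90 70 100 50   # prints 8
```

Here CPU alone would ask for 6 replicas, while the request rate asks for 8, so the request rate drives the decision.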
Custom Metrics Configuration¶
When using request rate scaling, the prometheus-adapter needs to be configured to expose custom metrics. This is handled automatically when you enable monitoring in the main chart:
# In your main eoapi values file
ingress:
  host: your-domain.com
monitoring:
  prometheusAdapter:
    enabled: true
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 128Mi
Service-Specific Examples¶
STAC (High throughput)¶
stac:
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 20
    type: "requestRate"
    targets:
      requestRate: 40000m
Raster (Resource intensive)¶
raster:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 8
    type: "cpu"
    behavior:
      scaleDown:
        stabilizationWindowSeconds: 300
    targets:
      cpu: 75
Vector (Balanced)¶
vector:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 12
    type: "both"
    targets:
      cpu: 70
      requestRate: 75000m
STAC Auth Proxy¶
When STAC Auth Proxy is enabled, ingress routes STAC traffic through the proxy. Under load, the proxy can become the bottleneck while stac CPU utilization stays low, so enable proxy autoscaling in addition to (or instead of) relying on the STAC HPA alone.
Autoscaling is provided by the stac-auth-proxy subchart. Configure it under stac-auth-proxy.autoscaling (CPU only; request-rate/both types apply to main eoAPI services with nginx ingress metrics).
stac-auth-proxy:
  enabled: true
  resources:
    requests:
      cpu: 500m
    limits:
      cpu: 2000m
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 75
When autoscaling.enabled is true, replicaCount is ignored and the HPA manages replica count. Requires metrics-server (or your cluster's equivalent) for CPU metrics.
The HPA resource name is {{ .Release.Name }}-stac-auth-proxy (subchart fullname). Check status with:
kubectl get hpa -n <namespace> | grep stac-auth-proxy
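HPA CPU utilization is measured relative to the container's CPU request, not its limit. The sketch below works through the values from the example above; the helper names are hypothetical and the 600m usage figure is an illustrative number, not a measurement.

```shell
# Hypothetical helper: average CPU usage (millicores) at which the HPA target is reached.
# Arguments: CPU request in millicores, target utilization percent.
cpu_target_millicores() {
  echo $(( $1 * $2 / 100 ))
}

# Hypothetical helper: utilization percent as the HPA sees it.
# Arguments: observed usage in millicores, CPU request in millicores.
cpu_utilization_percent() {
  echo $(( $1 * 100 / $2 ))
}

cpu_target_millicores 500 75      # prints 375
cpu_utilization_percent 600 500   # prints 120
```

With a 500m request and a 2000m limit, a pod can report well over 100% utilization before it is throttled, so the 75% target is crossed at an average of about 375m per pod.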
Configuration Examples¶
For complete configuration examples, see the production profile.
Resource Requirements¶
Autoscaling Components¶
- metrics-server: ~100m CPU, ~300Mi memory per node
- prometheus-adapter: ~250m CPU, ~256Mi memory
- prometheus-server: ~500m CPU, ~512Mi memory (varies with retention)
Verification¶
Check HPA Status¶
# Check HPA status for all services
kubectl get hpa -n eoapi
# Get detailed HPA information
kubectl describe hpa eoapi-stac -n eoapi
Verify Custom Metrics API¶
# Check if custom metrics API is available
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
# Check specific request rate metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/eoapi/ingresses/*/requests_per_second" | jq .
Check Prometheus Adapter¶
# Check prometheus-adapter logs
kubectl logs -l app.kubernetes.io/name=prometheus-adapter -n eoapi
Load Testing¶
For load testing your autoscaling setup:
ingress:
  host: your-test-domain.com
- Check ingress configuration:
kubectl get ingress -n eoapi
Troubleshooting¶
HPA Shows "Unknown" Metrics¶
If HPA shows "unknown" for custom metrics:
- Verify prometheus-adapter is running:
kubectl get pods -l app.kubernetes.io/name=prometheus-adapter -n eoapi
- Check prometheus-adapter logs:
kubectl logs -l app.kubernetes.io/name=prometheus-adapter -n eoapi
- Verify metrics are available in Prometheus:
# Port forward to access Prometheus
kubectl port-forward service/eoapi-prometheus-server 9090:80 -n eoapi
# Then check metrics at http://localhost:9090
Default Configuration¶
Default autoscaling configuration:
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 5
  # Type can be "cpu", "requestRate", or "both"
  type: "cpu"
  # Custom scaling behavior (optional)
  behavior: {}
  # Scaling targets
  targets:
    # CPU target percentage (when type is "cpu" or "both")
    cpu: 80
    # Request rate target in millirequests per second (when type is "requestRate" or "both")
    requestRate: 30000m
No Scaling Activity¶
If pods aren't scaling:
- Check HPA events:
kubectl describe hpa eoapi-stac -n eoapi
- Verify metrics are being collected:
kubectl top pods -n eoapi
- Check resource requests are set:
kubectl describe pod eoapi-stac-xxx -n eoapi | grep -A 10 "Requests"
Install or Upgrade Autoscaling Changes to eoapi Chart¶
When enabling autoscaling, ensure monitoring is also enabled:
# Enable monitoring first
monitoring:
  prometheus:
    enabled: true
  prometheusAdapter:
    enabled: true
# Then enable autoscaling
stac:
  autoscaling:
    enabled: true
    type: "requestRate"
    targets:
      requestRate: 50000m
  # Configure resources for proper scaling metrics
  settings:
    resources:
      limits:
        cpu: 1000m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 128Mi
Custom Metrics Not Working¶
If request rate metrics aren't working:
- Verify nginx ingress controller has metrics enabled
- Check prometheus is scraping ingress metrics
- Confirm prometheus-adapter configuration
- Validate ingress annotations for metrics
Scaling Too Aggressive/Slow¶
Adjust scaling behavior:
autoscaling:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60 # Faster scaling up
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300 # Slower scaling down
      policies:
        - type: Percent
          value: 25 # More conservative scale down
          periodSeconds: 300
Best Practices¶
- Set appropriate resource requests: HPA needs resource requests to calculate CPU utilization
- Use stabilization windows: Prevent thrashing with appropriate cooldown periods
- Monitor costs: Autoscaling can increase costs rapidly
- Test thoroughly: Validate scaling behavior under realistic load
- Set reasonable limits: Use maxReplicas to prevent runaway scaling
- Use multiple metrics: Combine CPU and request rate for better scaling decisions
Example ingress configuration for load testing:
# For AWS ALB
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: eoapi-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  ingressClassName: alb
  rules:
    - host: your-domain.com
      http:
        paths: [...]

# For nginx ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: eoapi-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: abc5929f88f8c45c38f6cbab2faad43c-776419634.us-west-2.elb.amazonaws.com
      http:
        paths: [...]
Load Testing¶
Load Testing with hey¶
The hey tool is a simple HTTP load testing tool.
Install and Run Load Tests¶
- Install hey:
# macOS
brew install hey
# Linux
go install github.com/rakyll/hey@latest
# Or download from releases
wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
chmod +x hey_linux_amd64
sudo mv hey_linux_amd64 /usr/local/bin/hey
- Run basic load test:
# Test STAC endpoint
hey -z 5m -c 10 https://your-domain.com/stac/collections
# Test with higher concurrency
hey -z 10m -c 50 https://your-domain.com/stac/search
- Monitor during load test:
# Watch HPA scaling
watch kubectl get hpa -n eoapi
# Monitor pods
watch kubectl get pods -n eoapi
Load Testing Best Practices¶
- Start small: Begin with low concurrency and short duration
- Monitor resources: Watch CPU, memory, and network usage
- Test realistic scenarios: Use actual API endpoints and payloads
- Gradual increase: Slowly increase load to find breaking points
- Test different endpoints: Each service may have different characteristics
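One way to apply the "start small" and "gradual increase" advice is to run hey in short stages with rising concurrency. The sketch below is an illustration under stated assumptions: ramp_levels and the TARGET_URL variable are hypothetical names, and the doubling schedule and one-minute stage length are arbitrary choices. If hey is not installed or TARGET_URL is unset, it only prints the plan.

```shell
# Hypothetical helper: print a doubling sequence of concurrency levels.
# Arguments: starting concurrency (>= 1), maximum concurrency.
ramp_levels() {
  rl_level=$1
  # Guard against a non-positive start, which would never grow.
  if [ "$rl_level" -lt 1 ]; then return 1; fi
  while [ "$rl_level" -le "$2" ]; do
    echo "$rl_level"
    rl_level=$(( rl_level * 2 ))
  done
}

# Run one short stage per level, or print the plan when hey or TARGET_URL is missing.
for c in $(ramp_levels 10 80); do
  if command -v hey >/dev/null 2>&1 && [ -n "${TARGET_URL:-}" ]; then
    hey -z 1m -c "$c" "$TARGET_URL"
  else
    echo "stage: hey -z 1m -c $c <url>"
  fi
done
```

Watching the HPA between stages makes it easier to tell which concurrency level first triggers scaling.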
Troubleshooting Load Tests¶
- High response times: May indicate need for more replicas or resources
- Error rates: Could suggest database bottlenecks or resource limits
- No scaling: Check HPA metrics and thresholds
Advanced Load Testing¶
For more comprehensive testing, consider:
- Artillery: Feature-rich load testing toolkit
- k6: Developer-centric performance testing
- Locust: Python-based distributed load testing
For monitoring and observability setup, see observability.md.