Autoscaling¶
Horizontal Pod Autoscaler (HPA) configuration for eoAPI services. Autoscaling requires monitoring components to be enabled in the main chart.
Prerequisites¶
Enable monitoring in your main eoapi installation:
monitoring:
  prometheus:
    enabled: true
  prometheusAdapter:
    enabled: true  # Required for request-rate scaling
  metricsServer:
    enabled: true  # Required for CPU scaling
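As a minimal sketch, these values can be applied with a standard Helm upgrade; the release name, chart reference, and namespace below are illustrative and should match your own installation:
# Apply the monitoring settings from your values file to an existing release
helm upgrade --install eoapi eoapi/eoapi -n eoapi -f values.yaml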
Configuration¶
Basic Autoscaling¶
The following instructions assume you've gone through the AWS or GCP cluster setup and installed the eoapi chart.
stac:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    type: "requestRate"  # Options: "cpu", "requestRate", "both"
    targets:
      requestRate: 50000m  # 50 requests/second
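If you prefer not to edit a values file, the same settings can be passed as --set flags; the release name and chart reference are assumptions and should match your installation:
# Equivalent --set flags for the values above
helm upgrade eoapi eoapi/eoapi -n eoapi \
  --set stac.autoscaling.enabled=true \
  --set stac.autoscaling.minReplicas=2 \
  --set stac.autoscaling.maxReplicas=20 \
  --set stac.autoscaling.type=requestRate \
  --set stac.autoscaling.targets.requestRate=50000m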
Scaling Policies¶
stac:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 20
    type: "both"
    behaviour:
      scaleDown:
        stabilizationWindowSeconds: 300  # 5min cooldown
        policies:
          - type: Percent
            value: 50  # Max 50% of pods removed per period
            periodSeconds: 300
      scaleUp:
        stabilizationWindowSeconds: 60  # 1min cooldown
        policies:
          - type: Percent
            value: 100  # Max 100% of pods added per period
            periodSeconds: 60
    targets:
      cpu: 70
      requestRate: 50000m
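To make the percentages concrete: Percent policies are evaluated against the current replica count, so with 20 replicas the scale-down policy above removes at most 10 pods per 300-second window, while scale-up can double the replica count every 60 seconds. Once applied, the rendered settings can be read back from the live HPA (the HPA name follows the naming used elsewhere on this page; note that the Kubernetes API spells the field behavior):
# Show the scale-up/scale-down behaviour the HPA is actually using
kubectl get hpa eoapi-stac -n eoapi -o yaml | grep -A 20 'behavior:'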
Metrics Types¶
CPU-based Scaling¶
type: "cpu"
targets:
cpu: 70
Request Rate Scaling¶
type: "requestRate"
targets:
requestRate: 50000m # 50 requests/second
Combined Scaling¶
type: "both"
targets:
cpu: 70
requestRate: 100000m # 100 requests/second
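With type: "both" the HPA evaluates every configured metric and scales to the largest replica count any single metric requests, so CPU can drive scaling during heavy processing while request rate covers bursts of cheap requests. The current value of each metric appears in the TARGETS column:
# Each configured metric shows up as current/target in the TARGETS column
kubectl get hpa -n eoapi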
Custom Metrics Configuration¶
When using request rate scaling, the prometheus-adapter needs to be configured to expose custom metrics. This is handled automatically when you enable monitoring in the main chart:
# In your main eoapi values file
ingress:
  host: your-domain.com

monitoring:
  prometheusAdapter:
    enabled: true
    resources:
      limits:
        cpu: 250m
        memory: 256Mi
      requests:
        cpu: 100m
        memory: 128Mi
Service-Specific Examples¶
STAC (High throughput)¶
stac:
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 20
    type: "requestRate"
    targets:
      requestRate: 40000m
Raster (Resource intensive)¶
raster:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 8
    type: "cpu"
    behaviour:
      scaleDown:
        stabilizationWindowSeconds: 300
    targets:
      cpu: 75
Vector (Balanced)¶
vector:
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 12
    type: "both"
    targets:
      cpu: 70
      requestRate: 75000m
Resource Requirements¶
Autoscaling Components¶
- metrics-server: ~100m CPU, ~300Mi memory per node
- prometheus-adapter: ~250m CPU, ~256Mi memory
- prometheus-server: ~500m CPU, ~512Mi memory (varies with retention)
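These figures are rough baselines. To compare them against actual consumption, the commands below assume the Prometheus components run in the eoapi namespace (as in the port-forward example later on this page) and that metrics-server lives in kube-system; adjust the namespaces to your install:
# Observed CPU/memory of the monitoring pods
kubectl top pods -n eoapi | grep -E 'prometheus|adapter'
kubectl top pods -n kube-system | grep metrics-server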
Verification¶
Check HPA Status¶
# Check HPA status for all services
kubectl get hpa -n eoapi
# Get detailed HPA information
kubectl describe hpa eoapi-stac -n eoapi
Verify Custom Metrics API¶
# Check if custom metrics API is available
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
# Check specific request rate metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/eoapi/ingresses/*/requests_per_second" | jq .
Check Prometheus Adapter¶
# Check prometheus-adapter logs
kubectl logs -l app.kubernetes.io/name=prometheus-adapter -n eoapi
Load Testing¶
For load testing your autoscaling setup:
- Expose the services through an ingress host:
ingress:
  host: your-test-domain.com
- Check the ingress configuration:
kubectl get ingress -n eoapi
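Before generating load, it can help to confirm that the endpoint actually answers at the ingress host (the domain is a placeholder):
# Expect a 200 from the STAC collections endpoint
curl -s -o /dev/null -w "%{http_code}\n" https://your-test-domain.com/stac/collections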
Troubleshooting¶
HPA Shows "Unknown" Metrics¶
If HPA shows "unknown" for custom metrics:
- Verify prometheus-adapter is running:
kubectl get pods -l app.kubernetes.io/name=prometheus-adapter -n eoapi
- Check prometheus-adapter logs:
kubectl logs -l app.kubernetes.io/name=prometheus-adapter -n eoapi
- Verify metrics are available in Prometheus:
# Port forward to access Prometheus
kubectl port-forward service/eoapi-prometheus-server 9090:80 -n eoapi
# Then check metrics at http://localhost:9090
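With the port-forward in place, a PromQL query along these lines can confirm that request-rate data exists; the metric name assumes the standard ingress-nginx exporter and may differ in your setup:
# Per-ingress request rate over the last 2 minutes
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(nginx_ingress_controller_requests[2m])) by (ingress)'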
Default Configuration¶
Default autoscaling configuration:
autoscaling:
  enabled: false
  minReplicas: 1
  maxReplicas: 5
  # Type can be "cpu", "requestRate", or "both"
  type: "cpu"
  # Custom scaling behavior (optional)
  behaviour: {}
  # Scaling targets
  targets:
    # CPU target percentage (when type is "cpu" or "both")
    cpu: 80
    # Request rate target in millirequests per second (when type is "requestRate" or "both")
    requestRate: 30000m
No Scaling Activity¶
If pods aren't scaling:
- Check HPA events:
kubectl describe hpa eoapi-stac -n eoapi
- Verify metrics are being collected:
kubectl top pods -n eoapi
- Check resource requests are set:
kubectl describe pod eoapi-stac-xxx -n eoapi | grep -A 10 "Requests"
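The HPA's status conditions usually state exactly why it refuses to scale (for example, missing resource requests or an unavailable metrics API):
# Print each condition's type, status, and message
kubectl get hpa eoapi-stac -n eoapi \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'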
Install or Upgrade Autoscaling Changes to the eoapi Chart¶
When enabling autoscaling, ensure monitoring is also enabled:
# Enable monitoring first
monitoring:
  prometheus:
    enabled: true
  prometheusAdapter:
    enabled: true

# Then enable autoscaling
stac:
  autoscaling:
    enabled: true
    type: "requestRate"
    targets:
      requestRate: 50000m
  # Configure resources for proper scaling metrics
  settings:
    resources:
      limits:
        cpu: 1000m
        memory: 512Mi
      requests:
        cpu: 100m
        memory: 128Mi
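After upgrading the release with these values, a quick sanity check is to confirm the HPA reflects the new type and targets (the HPA name is an assumption based on the naming used elsewhere on this page):
kubectl get hpa -n eoapi
kubectl describe hpa eoapi-stac -n eoapi | grep -i -A 10 'metrics'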
Custom Metrics Not Working¶
If request rate metrics aren't working:
- Verify nginx ingress controller has metrics enabled
- Check prometheus is scraping ingress metrics
- Confirm prometheus-adapter configuration
- Validate ingress annotations for metrics
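A rough sketch of working through that checklist from the command line; the namespaces and resource names are assumptions and will vary with how ingress-nginx and the adapter were installed:
# 1. Is the ingress controller exposing a metrics endpoint?
kubectl get svc -n ingress-nginx
# 2. Is Prometheus scraping it? Check the Targets page via the port-forward shown earlier.
# 3. Review the prometheus-adapter rules
kubectl get configmap -n eoapi | grep -i adapter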
Scaling Too Aggressive/Slow¶
Adjust scaling behavior:
autoscaling:
  behaviour:
    scaleUp:
      stabilizationWindowSeconds: 60  # Faster scaling up
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300  # Slower scaling down
      policies:
        - type: Percent
          value: 25  # More conservative scale-down
          periodSeconds: 300
Best Practices¶
- Set appropriate resource requests: HPA needs resource requests to calculate CPU utilization
- Use stabilization windows: Prevent thrashing with appropriate cooldown periods
- Monitor costs: Autoscaling can increase costs rapidly
- Test thoroughly: Validate scaling behavior under realistic load
- Set reasonable limits: Use maxReplicas to prevent runaway scaling
- Use multiple metrics: Combine CPU and request rate for better scaling decisions
Example ingress configuration for load testing:
# For AWS ALB
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: eoapi-ingress
  annotations:
    kubernetes.io/ingress.class: alb  # legacy annotation; ingressClassName below is the current field
    alb.ingress.kubernetes.io/scheme: internet-facing
spec:
  ingressClassName: alb
  rules:
    - host: your-domain.com
      http:
        paths: [...]
# For nginx ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: eoapi-ingress
spec:
  ingressClassName: nginx
  rules:
    - host: abc5929f88f8c45c38f6cbab2faad43c-776419634.us-west-2.elb.amazonaws.com
      http:
        paths: [...]
Load Testing¶
Load Testing with hey¶
The hey tool is a simple HTTP load testing tool.
Install and Run Load Tests¶
- Install hey:
# macOS
brew install hey
# Linux
go install github.com/rakyll/hey@latest
# Or download from releases
wget https://hey-release.s3.us-east-2.amazonaws.com/hey_linux_amd64
chmod +x hey_linux_amd64
sudo mv hey_linux_amd64 /usr/local/bin/hey
- Run basic load test:
# Test STAC endpoint
hey -z 5m -c 10 https://your-domain.com/stac/collections
# Test with higher concurrency
hey -z 10m -c 50 https://your-domain.com/stac/search
- Monitor during load test:
# Watch HPA scaling
watch kubectl get hpa -n eoapi
# Monitor pods
watch kubectl get pods -n eoapi
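Scaling events can also be streamed while the test runs, which makes it easy to correlate load with HPA decisions:
# Stream HPA-related events as they happen
kubectl get events -n eoapi --field-selector involvedObject.kind=HorizontalPodAutoscaler -w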
Load Testing Best Practices¶
- Start small: Begin with low concurrency and short duration
- Monitor resources: Watch CPU, memory, and network usage
- Test realistic scenarios: Use actual API endpoints and payloads
- Gradual increase: Slowly increase load to find breaking points
- Test different endpoints: Each service may have different characteristics
Troubleshooting Load Tests¶
- High response times: May indicate need for more replicas or resources
- Error rates: Could suggest database bottlenecks or resource limits
- No scaling: Check HPA metrics and thresholds
Advanced Load Testing¶
For more comprehensive testing, consider:
- Artillery - Feature-rich load testing toolkit
- k6 - Developer-centric performance testing
- Locust - Python-based distributed load testing
For monitoring and observability setup, see observability.md.