Scaling operations across tenants in the cloud

Currently, with the tenant-per-namespace deployment model, operational procedures are difficult to scale to many tenants. Typical actions such as patching, upgrading, stopping, and starting must be initiated as pipeline jobs, once per tenant, and each job must be watched for successful execution. This is labor-intensive, error-prone (the same input parameters have to be re-entered for every pipeline job), and tedious to manage. Scaling is therefore difficult in the model's current form.

For this model to scale, tooling is needed that takes a single specification of intent as input to an automated workflow, which then performs the required action across every applicable namespace (tenant). The intended action may be as simple as `kubectl patch`, or it may be a very complex job (upgrade all resources). The workflow would coordinate the parallel execution of these actions against their respective namespaces (identified either by label or by a list of names), possibly throttling concurrency to avoid resource contention, and would report output for status monitoring and troubleshooting. This would reduce the operator effort of deploying patches and upgrades from O(n) to approximately O(1) for n tenants, even though the underlying work still runs once per namespace.
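As a rough illustration of such a workflow, the sketch below fans one action out over a set of namespaces with a cap on concurrency and collects a per-namespace result for reporting. It is only a minimal sketch, not an existing tool: the names `run_across_tenants`, `kubectl_action`, `namespaces_by_label`, and `Result` are hypothetical, and it assumes `kubectl` is on the PATH and already pointed at the right cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Result:
    """Outcome of one action against one namespace (hypothetical type)."""
    namespace: str
    ok: bool
    output: str


def namespaces_by_label(selector: str) -> List[str]:
    """List namespace names matching a label selector, e.g. 'tenant=true'."""
    proc = subprocess.run(
        ["kubectl", "get", "namespaces", "-l", selector,
         "-o", "jsonpath={.items[*].metadata.name}"],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.split()


def kubectl_action(args: Sequence[str]) -> Callable[[str], str]:
    """Build an action that runs `kubectl <args> -n <namespace>`."""
    def run(namespace: str) -> str:
        proc = subprocess.run(
            ["kubectl", *args, "-n", namespace],
            capture_output=True, text=True,
        )
        if proc.returncode != 0:
            # Surface kubectl's own error text for troubleshooting.
            raise RuntimeError(proc.stderr.strip())
        return proc.stdout.strip()
    return run


def run_across_tenants(
    namespaces: Sequence[str],
    action: Callable[[str], str],
    max_parallel: int = 4,
) -> List[Result]:
    """Apply one action to every namespace, at most max_parallel at a time."""
    def attempt(namespace: str) -> Result:
        try:
            return Result(namespace, True, action(namespace))
        except Exception as exc:
            # One failing tenant should not abort the rest of the run.
            return Result(namespace, False, str(exc))

    # The pool size is the throttle; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(attempt, namespaces))
```

A single invocation might then look like `run_across_tenants(namespaces_by_label("tenant=true"), kubectl_action(["patch", "deployment", "web", "--type=merge", "-p", '{"spec":{"replicas":2}}']))`, where the label `tenant=true` and the deployment name `web` are placeholders. Because the action is just a callable that takes a namespace, a more complex job such as a full upgrade could be plugged in the same way, and the returned list gives one status line per tenant to review instead of one pipeline job to watch.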
