Command and Control

Learn about instances in containers, sending control signals to running instances, and sending commands to an admin API over HTTP.

Containers and instances

Live control is only necessary if it takes our instances a long time to be ready to run. As a thought experiment, imagine that any configuration change took ten milliseconds to roll out and that each instance could be restarted in another hundred milliseconds. In that world, live control would be more trouble than it was worth. Whenever an instance needed to be modified, it would be simpler to just kill the instance and let the scheduler start a new one. If our instances run in containers and get their configuration from a configuration service, then that is exactly the world we live in. Containers start very quickly, and new configuration would be used immediately.

Sadly, not every service is made of instances that start up so quickly. Anything based on Oracle’s JVM (or OpenJDK for that matter) needs a warm-up period before the JIT really kicks in and makes it fast. Many services need to hold a lot of data in cache before they perform well enough, which also adds to the startup time. If the underlying infrastructure uses virtual machines instead of containers, then it can take several minutes to restart.

Controls to offer

In those cases, we need to look at ways to send control signals to running instances. Here is a brief checklist of controls to plan for:

  • Reset circuit breakers.
  • Adjust connection pool sizes and timeouts.
  • Disable specific outbound integrations.
  • Reload configuration.
  • Start or stop accepting load.
  • Toggle features.

Not every service will need all of these controls. They should give you a place to start, though.
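To make the checklist concrete, here is a minimal sketch of an in-process control surface for one instance. Everything in it is an assumption made for illustration: the class name InstanceControls, the command names, and the state fields do not come from any particular framework. It covers most of the checklist (reloading configuration is left out to keep it short) and says nothing about how commands reach the instance, which might be an admin endpoint, a message queue, or something else.

```python
import threading


class InstanceControls:
    """Hypothetical control surface for a single running instance."""

    def __init__(self, pool_size=10, timeout_s=5.0):
        self._lock = threading.Lock()
        self.accepting_load = True
        self.pool_size = pool_size
        self.timeout_s = timeout_s
        self.open_breakers = set()           # names of tripped circuit breakers
        self.disabled_integrations = set()   # outbound integrations switched off
        self.features = {}                   # feature name -> enabled flag

    def _reset_breaker(self, name):
        self.open_breakers.discard(name)

    def _resize_pool(self, size, timeout_s=None):
        if size < 1:
            raise ValueError("pool size must be at least 1")
        self.pool_size = size
        if timeout_s is not None:
            self.timeout_s = timeout_s

    def _disable_integration(self, name):
        self.disabled_integrations.add(name)

    def _set_accepting_load(self, accepting):
        self.accepting_load = bool(accepting)

    def _set_feature(self, name, enabled):
        self.features[name] = bool(enabled)

    def handle(self, command, **args):
        """Apply one named control command and return the resulting state."""
        handlers = {
            "reset_breaker": self._reset_breaker,
            "resize_pool": self._resize_pool,
            "disable_integration": self._disable_integration,
            "set_accepting_load": self._set_accepting_load,
            "set_feature": self._set_feature,
        }
        handler = handlers.get(command)
        if handler is None:
            # Only listed commands are accepted; anything else is rejected.
            raise ValueError(f"unknown control: {command}")
        with self._lock:
            handler(**args)
            return self.snapshot()

    def snapshot(self):
        """Return a plain-dict copy of the current control state."""
        return {
            "accepting_load": self.accepting_load,
            "pool_size": self.pool_size,
            "timeout_s": self.timeout_s,
            "open_breakers": sorted(self.open_breakers),
            "disabled_integrations": sorted(self.disabled_integrations),
            "features": dict(self.features),
        }


if __name__ == "__main__":
    controls = InstanceControls()
    controls.handle("set_accepting_load", accepting=False)
    print(controls.handle("resize_pool", size=25, timeout_s=2.0))
```

Dispatching through an explicit table of handlers means the instance only responds to controls that were deliberately planned for, which fits the idea of choosing a small set from the checklist rather than exposing everything.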

Many services also expose controls to update the database schema, or even to delete all data and reseed it. These are presumably helpful in test environments but extremely hazardous in production. These controls result from a breakdown in roles. Developers don’t trust operations to deploy the software and run the scripts correctly, and the operations team doesn’t allow developers to log in to the production machines to update the schemata. That breakdown is itself a problem to fix. Don’t build a self-destruct button into the production code!

Another common control is the “flush cache” button. This is also quite hazardous. It may not be a self-destruct button, but it’s the button that vents all your atmosphere into space. An instance that flushes a cache will have really bad performance for the next several minutes.
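The cost of a flush can be seen in a toy model. The sketch below is illustrative only: CountingCache and misses_for are hypothetical names, and the loader stands in for whatever slow lookup (a database query, a remote call) the cache normally shields.

```python
class CountingCache:
    """Hypothetical read-through cache that counts hits and misses."""

    def __init__(self, loader):
        self._loader = loader   # slow lookup the cache is protecting
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self.hits += 1
        else:
            # A miss goes to the slow loader and stores the result.
            self.misses += 1
            self._data[key] = self._loader(key)
        return self._data[key]

    def flush(self):
        """Drop every cached entry, as a 'flush cache' control would."""
        self._data.clear()


def misses_for(cache, keys):
    """Return how many of the given lookups had to go to the loader."""
    before = cache.misses
    for key in keys:
        cache.get(key)
    return cache.misses - before


if __name__ == "__main__":
    keys = list(range(100))
    cache = CountingCache(loader=lambda key: key * 2)
    print("cold start:", misses_for(cache, keys))
    print("warm:", misses_for(cache, keys))
    cache.flush()
    print("after flush:", misses_for(cache, keys))
```

In this model a flushed instance behaves exactly like a freshly started one: every request for a previously cached key goes back to the slow path until the cache refills.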
