A marginally related point, but I do not know if others have faced the following situation: I worked at a place where the CI pipeline took ~25 minutes, with the unit/integration tests (3000+) accounting for 18 of those minutes.
Whenever something happened in production we ended up adding more tests, and of course when things went south at least 50 minutes were needed to recover.
After a lot of consideration we decided to relax and simplify some tests and focus on recovery (i.e. have the full thing run in less than 5 minutes), combined with canary as the deployment strategy (instead of rolling updates).
At least for us it was a refreshing experience, though it sounded wrong in some ways.
I’ve often said that it is the speed of deployment that matters. If it takes you 50 minutes to deploy, it takes you 50 minutes to fix a problem. If it takes you 50 seconds to deploy, it takes you 50 seconds to fix a problem.
Of course all kinds of things are rolled up in that speed to deploy, but almost all of them are good.