Hacker News

I understand the idea of treating them like pixels: if a fan or a NIC dies, no problem, just stop using that Mini. But what about memory corruption or other issues that are harder to detect? Server hardware normally has things like ECC memory to prevent these issues, but here a Mini with bad RAM could intermittently corrupt data for a long time before anyone notices (if ever).


The machines are for testing, so those issues get detected through secondary means. A faulty machine produces one of two likely outcomes: (1) faulty software registers as faulty; (2) good software registers as faulty. The third case, faulty software marked as good, is very unlikely, and whenever it does happen, a later bug report will give a hint.

A test failure will probably bring in an engineer to track down the issue, and a re-test will inevitably follow. The faulty machine will eventually (hopefully) get labeled flaky and be repaired.

Of course, nobody may bother with any of that and instead just run every test twice, on different machines, to verify that an executable is good.
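That double-test idea can be sketched roughly like this (a minimal illustration, not anyone's actual CI code; `run_test`, the machine names, and the tuple return shape are all hypothetical):

```python
import random

def verified_result(build, machines, run_test, runs=2):
    """Run the same test on `runs` distinct machines and accept a
    verdict only when every run agrees; a split result returns no
    verdict and flags the machines involved for a hardware check."""
    chosen = random.sample(machines, runs)
    results = [run_test(build, m) for m in chosen]
    if all(results) or not any(results):
        return results[0], []   # unanimous pass or fail
    return None, chosen         # disagreement: suspect flaky hardware
```

The point is that a single Mini with bad RAM can flip one run, but it's very unlikely two independently chosen machines corrupt the same test the same way, so a unanimous result is trustworthy and a split result points straight at hardware.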


> A test failure will probably bring in an engineer to track down the issue, and a re-test will inevitably follow. The faulty machine will eventually (hopefully) get labeled flaky and be repaired.

That depends on how valuable the engineers' time is. I have seen this play out: hardware gets blamed last, after hours or days of testing have been wasted. Tests are run and re-run, blame goes all around, until it is finally determined that maybe it is the hardware after all.

In the end, an engineer's time is worth far more than the savings from running flaky but cheaper hardware.



