"Free Will and Determinism in Seventeenth-Century Philosophy," The Routledge Companion to Seventeenth-Century Philosophy, ed. Dan Kaufman (Routledge, 2017), 117-142 With Matthew Shea, "God, Evil and ...
Achieving bit-for-bit determinism across different GPU architectures is EXTREMELY hard, if not completely impossible. In my experience, training a model on an A100 vs. a V100, for example, with the same hyperparameters, seeds, etc., can and more often than not will yield different results.
According to the accepted answer to "floating point processor non-determinism?", C++ floating point is not non-deterministic: the same sequence of instructions will give the same results.
The specified version string contains wildcards, which are not compatible with determinism. Either remove wildcards from the version string, or disable determinism for this compilation
TL;DR: Non-determinism in a priori deterministic operations comes from concurrent (multi-threaded) implementations. Despite constant progress on that front, TensorFlow does not currently guarantee determinism for all of its operations. After a quick search on the internet, it seems that the situation is similar for the other major toolkits.
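A minimal sketch of why concurrency can break determinism: floating-point addition is not associative, so a reduction whose accumulation order depends on thread scheduling can produce different bits from run to run. The helper `sum_in_order` is a hypothetical name used only for illustration. It stands in for a parallel reduction by making the order explicit, and it is not how TensorFlow or any other toolkit implements its kernels.

```python
import math


def sum_in_order(values, order):
    """Accumulate values left to right, visiting indices in the given order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


values = [0.1, 0.2, 0.3]

# Two accumulation orders, as two different thread schedules might produce.
forward = sum_in_order(values, [0, 1, 2])   # (0.1 + 0.2) + 0.3
backward = sum_in_order(values, [2, 1, 0])  # (0.3 + 0.2) + 0.1

print(forward == backward)  # False: the order changed the result

# Repeating the same order reproduces the same bits.
print(sum_in_order(values, [0, 1, 2]) == forward)  # True

# math.fsum returns the correctly rounded sum, so it does not depend on order.
print(math.fsum(values) == math.fsum(reversed(values)))  # True
```

Fixing the reduction order, or using an order-independent summation, restores reproducibility in this toy case, usually at some cost in speed.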
Can floats that the tensors are made of be fully deterministic? :) For "full determinism" there would have to be a KV cache on their side. It might be possible to implement it yourself (depending on your scenario) to achieve what you are looking for.
Non-determinism can also be caused by accidentally using different FP rounding modes, though if I understood correctly this is mostly a solved issue. I've also gotten the impression that SSE(2) instructions do not suffer from the truncation issue, as they perform all floating-point arithmetic in 32- or 64-bit precision without a higher-precision register.
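The truncation issue can be sketched in Python, under the assumption that rounding through `struct` is an acceptable stand-in for storing a value in a 32-bit register. The helper `to_f32` is hypothetical, and Python's 64-bit float plays the role of the wider intermediate register here. This is an illustration of the effect, not a model of any specific hardware.

```python
import struct


def to_f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


# All three inputs are exactly representable as 32-bit floats.
a, b, c = to_f32(1e8), to_f32(1.0), to_f32(-1e8)

# Every intermediate is rounded to 32 bits: a + b loses the 1.0.
narrow = to_f32(to_f32(a + b) + c)

# The intermediate stays in 64 bits and is rounded only at the end.
wide = to_f32((a + b) + c)

print(narrow, wide)  # 0.0 1.0
```

The same source expression therefore yields different results depending on whether intermediates are kept at higher precision, which is why arithmetic done uniformly at the declared width is easier to reproduce.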