TensorFloat-32

Number format in Nvidia hardware

TensorFloat-32, or TF32, is a floating-point number format designed for the Tensor Cores of certain Nvidia GPUs.

Format

The binary format is:

  • 1 sign bit
  • 8 exponent bits
  • 10 fraction bits (also called mantissa, or precision bits)
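
Assuming the fields are interpreted like IEEE 754 binary32 (a bias-127 exponent and an implicit leading 1), a normal TF32 value with sign bit s, exponent field e, and fraction field f decodes as

    x = (-1)^s \times 2^{e-127} \times \left(1 + \frac{f}{2^{10}}\right)

giving TF32 the dynamic range of FP32 (8 exponent bits) with the precision of FP16 (10 fraction bits).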

The total of 19 bits fits within a 32-bit word, and while it has less precision than a standard 32-bit IEEE 754 floating-point number (which carries 23 fraction bits), it provides much faster computation, up to 8 times faster on an A100 compared with FP32 on a V100.
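
The narrowing from FP32 to TF32 can be emulated in software by rounding the 23 fraction bits of a binary32 value down to 10. Below is a minimal C sketch; round_to_tf32 is a hypothetical name rather than an Nvidia API, round-to-nearest-even is assumed, and NaN inputs are not special-cased:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper, not an Nvidia API: keep only the top 10 of the
       23 fraction bits of a binary32 value, rounding to nearest even.
       NaN payloads are not special-cased in this sketch. */
    static float round_to_tf32(float x)
    {
        uint32_t bits;
        memcpy(&bits, &x, sizeof bits);        /* reinterpret the IEEE 754 encoding */

        uint32_t keep_lsb = (bits >> 13) & 1;  /* LSB of the kept fraction (tie-break) */
        bits += 0x0FFFu + keep_lsb;            /* rounding bias: nearest, ties to even */
        bits &= ~0x1FFFu;                      /* clear the 13 discarded fraction bits */

        memcpy(&x, &bits, sizeof x);
        return x;
    }

    int main(void)
    {
        float x = 0.1f;
        printf("%.9f -> %.9f\n", x, round_to_tf32(x));  /* 0.100000001 -> 0.099975586 */
        return 0;
    }

Adding the bias 0x0FFF plus the kept LSB and then masking implements round-to-nearest-even with a single integer addition; the same trick is commonly used to emulate FP32-to-bfloat16 conversion.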


