#[repr(transparent)]
pub struct f16(_);
A 16-bit floating point type implementing the IEEE 754-2008 standard binary16 a.k.a. "half" format.
This 16-bit floating point type is intended for efficient storage where the full range and precision of a larger floating point value is not required. Because f16 is primarily for efficient storage, floating point operations such as addition, multiplication, etc. are not implemented. Operations should be performed with f32 or higher-precision types and converted to/from f16 as necessary.
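For illustration, the following std-only Rust sketch (it does not use the half crate) splits a raw binary16 bit pattern into its 1-bit sign, 5-bit exponent (bias 15) and 10-bit mantissa, and evaluates a normal value from those fields. The helper names split_fields and normal_value are hypothetical. The pattern 0x4A40 encodes 12.5, the same value the byte-order examples for this type use.

```rust
/// Hypothetical helper (not part of the `half` API): splits raw IEEE 754
/// binary16 bits into (sign, biased exponent, mantissa).
fn split_fields(bits: u16) -> (u16, u16, u16) {
    let sign = bits >> 15; // 1 bit
    let exponent = (bits >> 10) & 0x1F; // 5 bits, biased by 15
    let mantissa = bits & 0x03FF; // 10 bits; normal values add an implicit leading 1
    (sign, exponent, mantissa)
}

/// Hypothetical helper: evaluates a *normal* binary16 value from its fields as
/// (-1)^sign * (1 + mantissa / 1024) * 2^(exponent - 15).
/// Zero, subnormals, infinities and NaN are not handled here.
fn normal_value(bits: u16) -> f32 {
    let (sign, exponent, mantissa) = split_fields(bits);
    let magnitude = (1.0 + mantissa as f32 / 1024.0) * 2f32.powi(exponent as i32 - 15);
    if sign == 1 { -magnitude } else { magnitude }
}
```

For example, split_fields(0x4A40) returns (0, 18, 576), giving (1 + 576/1024) * 2^(18 - 15) = 1.5625 * 8 = 12.5. The exponent values 0 and 31 are reserved for zero/subnormals and infinity/NaN, which is why the sketch restricts itself to normal values.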
Implementations
impl f16
pub const fn from_bits(bits: u16) -> f16
Constructs a 16-bit floating point value from the raw bits.
pub fn from_f32(value: f32) -> f16
Constructs a 16-bit floating point value from a 32-bit floating point value.
If the 32-bit value is too large to fit in 16 bits, ±∞ will result. NaN values are preserved. 32-bit subnormal values are too tiny to be represented in 16 bits and result in ±0. Exponents that underflow the minimum 16-bit exponent will result in 16-bit subnormals or ±0. All other values are truncated and rounded to the nearest representable 16-bit value.
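The conversion rules above can be sketched at the bit level in std-only Rust. This is a minimal illustration, not the half crate's implementation: the function name f32_to_f16_bits is hypothetical, it assumes round-to-nearest with ties to even for the rounding step, and for NaN it assumes the top payload bits are kept with the quiet bit set.

```rust
/// Hypothetical sketch: converts an f32 to IEEE 754 binary16 bits using
/// round-to-nearest, ties-to-even. Not the `half` crate's implementation.
fn f32_to_f16_bits(value: f32) -> u16 {
    let x = value.to_bits();
    let sign = ((x >> 16) & 0x8000) as u16;
    let exp = ((x >> 23) & 0xFF) as i32;
    let man = x & 0x007F_FFFF;

    if exp == 0xFF {
        // Infinity stays infinity; NaN keeps its top payload bits and gets the quiet bit.
        return if man == 0 {
            sign | 0x7C00
        } else {
            sign | 0x7E00 | (man >> 13) as u16
        };
    }
    if exp == 0 {
        // f32 zero or subnormal: far below half of the smallest binary16 subnormal.
        return sign;
    }

    // Re-bias the exponent from 127 (f32) to 15 (binary16).
    let e = exp - 127 + 15;
    if e >= 0x1F {
        return sign | 0x7C00; // too large: ±infinity
    }
    if e >= 1 {
        // Normal result: keep the top 10 mantissa bits, round on the 13 dropped bits.
        let mut bits = sign | ((e as u16) << 10) | (man >> 13) as u16;
        let rest = man & 0x1FFF;
        if rest > 0x1000 || (rest == 0x1000 && (bits & 1) == 1) {
            bits += 1; // a carry can ripple into the exponent, up to ±infinity
        }
        return bits;
    }

    // Subnormal or zero result. With the implicit 1 restored, the value is
    // m * 2^(e - 14) units of the smallest subnormal (2^-24).
    let m = man | 0x0080_0000;
    let shift = (14 - e) as u32;
    if shift > 24 {
        return sign; // below half of the smallest subnormal: ±0
    }
    let mut half_man = (m >> shift) as u16;
    let rest = m & ((1u32 << shift) - 1);
    let halfway = 1u32 << (shift - 1);
    if rest > halfway || (rest == halfway && (half_man & 1) == 1) {
        half_man += 1; // may reach 0x0400, the smallest normal value
    }
    sign | half_man
}
```

Under these assumptions, 12.5 maps to 0x4A40, 65504.0 (the largest finite binary16 value) maps to 0x7BFF, 65520.0 lies exactly halfway to the next power of two and rounds to +∞ (0x7C00), and 2^-24 maps to the smallest subnormal, 0x0001.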
pub const fn from_f32_const(value: f32) -> f16
Constructs a 16-bit floating point value from a 32-bit floating point value.
This function is identical to from_f32 except that it never uses hardware intrinsics, which allows it to be const. from_f32 should be preferred in any non-const context.
If the 32-bit value is too large to fit in 16 bits, ±∞ will result. NaN values are preserved. 32-bit subnormal values are too tiny to be represented in 16 bits and result in ±0. Exponents that underflow the minimum 16-bit exponent will result in 16-bit subnormals or ±0. All other values are truncated and rounded to the nearest representable 16-bit value.
pub fn from_f64(value: f64) -> f16
Constructs a 16-bit floating point value from a 64-bit floating point value.
If the 64-bit value is too large to fit in 16 bits, ±∞ will result. NaN values are preserved. 64-bit subnormal values are too tiny to be represented in 16 bits and result in ±0. Exponents that underflow the minimum 16-bit exponent will result in 16-bit subnormals or ±0. All other values are truncated and rounded to the nearest representable 16-bit value.
pub const fn from_f64_const(value: f64) -> f16
Constructs a 16-bit floating point value from a 64-bit floating point value.
This function is identical to from_f64 except that it never uses hardware intrinsics, which allows it to be const. from_f64 should be preferred in any non-const context.
If the 64-bit value is too large to fit in 16 bits, ±∞ will result. NaN values are preserved. 64-bit subnormal values are too tiny to be represented in 16 bits and result in ±0. Exponents that underflow the minimum 16-bit exponent will result in 16-bit subnormals or ±0. All other values are truncated and rounded to the nearest representable 16-bit value.
pub const fn to_le_bytes(self) -> [u8; 2]
Returns the memory representation of the underlying bit representation as a byte array in little-endian byte order.
Examples
let bytes = f16::from_f32(12.5).to_le_bytes();
assert_eq!(bytes, [0x40, 0x4A]);
pub const fn to_be_bytes(self) -> [u8; 2]
Returns the memory representation of the underlying bit representation as a byte array in big-endian (network) byte order.
Examples
let bytes = f16::from_f32(12.5).to_be_bytes();
assert_eq!(bytes, [0x4A, 0x40]);
pub const fn to_ne_bytes(self) -> [u8; 2]
Returns the memory representation of the underlying bit representation as a byte array in native byte order.
As the target platform’s native endianness is used, portable code should use to_be_bytes or to_le_bytes, as appropriate, instead.
Examples
let bytes = f16::from_f32(12.5).to_ne_bytes();
assert_eq!(bytes, if cfg!(target_endian = "big") {
[0x4A, 0x40]
} else {
[0x40, 0x4A]
});
pub const fn from_le_bytes(bytes: [u8; 2]) -> f16
Creates a floating point value from its representation as a byte array in little-endian byte order.
Examples
let value = f16::from_le_bytes([0x40, 0x4A]);
assert_eq!(value, f16::from_f32(12.5));
pub const fn from_be_bytes(bytes: [u8; 2]) -> f16
Creates a floating point value from its representation as a byte array in big-endian (network) byte order.
Examples
let value = f16::from_be_bytes([0x4A, 0x40]);
assert_eq!(value, f16::from_f32(12.5));
pub const fn from_ne_bytes(bytes: [u8; 2]) -> f16
Creates a floating point value from its representation as a byte array in native byte order.
As the target platform’s native endianness is used, portable code likely wants to use from_be_bytes or from_le_bytes, as appropriate, instead.
Examples
let value = f16::from_ne_bytes(if cfg!(target_endian = "big") {
[0x4A, 0x40]
} else {
[0x40, 0x4A]
});
assert_eq!(value, f16::from_f32(12.5));
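Because f16 is a transparent wrapper, its byte representation can be treated as that of its 16-bit pattern (this sketch assumes the wrapped field is the u16 bit pattern). A minimal std-only sketch of packing many values into a portable little-endian buffer and reading them back, using the hypothetical helpers pack_le and unpack_le:

```rust
/// Hypothetical helper: packs binary16 bit patterns into a little-endian byte
/// buffer, a portable layout that does not depend on the host's endianness.
fn pack_le(values: &[u16]) -> Vec<u8> {
    values.iter().flat_map(|bits| bits.to_le_bytes()).collect()
}

/// Hypothetical helper: unpacks a little-endian byte buffer back into bit
/// patterns. A trailing odd byte, if any, is ignored.
fn unpack_le(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}
```

pack_le(&[0x4A40, 0x3C00]) yields [0x40, 0x4A, 0x00, 0x3C], i.e. 12.5 followed by 1.0, consistent with the to_le_bytes example for 12.5, and unpack_le reverses it.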
pub fn to_f32(self) -> f32
Converts an f16 value into an f32 value.
This conversion is lossless, as all 16-bit floating point values can be represented exactly in 32-bit floating point.
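Widening is lossless because every binary16 field fits directly into the wider f32 fields. A minimal std-only sketch of that bit mapping (the function name f16_bits_to_f32 is hypothetical and this is not the half crate's code): normal values only need their exponent re-biased from 15 to 127, and binary16 subnormals become normal f32 values.

```rust
/// Hypothetical sketch (not the `half` implementation): widens raw binary16
/// bits to f32 exactly by moving each field into its f32 position.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = ((bits & 0x8000) as u32) << 16;
    let exp = ((bits >> 10) & 0x1F) as u32;
    let man = (bits & 0x03FF) as u32;

    let out = match (exp, man) {
        (0, 0) => sign, // ±0
        (0, _) => {
            // Subnormal: man * 2^-24 is a normal f32, so renormalize around the top set bit.
            let top = 31 - man.leading_zeros(); // 0..=9
            sign | ((top + 103) << 23) | ((man << (23 - top)) & 0x007F_FFFF)
        }
        (0x1F, _) => sign | 0x7F80_0000 | (man << 13), // ±infinity, or NaN with payload kept
        _ => sign | ((exp + 112) << 23) | (man << 13), // normal: re-bias 15 -> 127
    };
    f32::from_bits(out)
}
```

For instance, 0x4A40 widens to 12.5, 0x0001 to 2^-24, and 0x8000 to -0.0 with its sign bit intact.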
pub const fn to_f32_const(self) -> f32
Converts an f16 value into an f32 value.
This function is identical to to_f32 except that it never uses hardware intrinsics, which allows it to be const. to_f32 should be preferred in any non-const context.
This conversion is lossless, as all 16-bit floating point values can be represented exactly in 32-bit floating point.
pub fn to_f64(self) -> f64
Converts an f16 value into an f64 value.
This conversion is lossless, as all 16-bit floating point values can be represented exactly in 64-bit floating point.
pub const fn to_f64_const(self) -> f64
Converts an f16 value into an f64 value.
This function is identical to to_f64 except that it never uses hardware intrinsics, which allows it to be const. to_f64 should be preferred in any non-const context.
This conversion is lossless, as all 16-bit floating point values can be represented exactly in 64-bit floating point.
pub const fn is_nan(self) -> bool
Returns true if this value is NaN and false otherwise.
Examples
let nan = f16::NAN;
let f = f16::from_f32(7.0_f32);
assert!(nan.is_nan());
assert!(!f.is_nan());
pub const fn is_infinite(self) -> bool
Returns true if this value is ±∞ and false otherwise.
Examples
let f = f16::from_f32(7.0f32);
let inf = f16::INFINITY;
let neg_inf = f16::NEG_INFINITY;
let nan = f16::NAN;
assert!(!f.is_infinite());
assert!(!nan.is_infinite());
assert!(inf.is_infinite());
assert!(neg_inf.is_infinite());
pub const fn is_finite(self) -> bool
Returns true if this number is neither infinite nor NaN.
Examples
let f = f16::from_f32(7.0f32);
let inf = f16::INFINITY;
let neg_inf = f16::NEG_INFINITY;
let nan = f16::NAN;
assert!(f.is_finite());
assert!(!nan.is_finite());
assert!(!inf.is_finite());
assert!(!neg_inf.is_finite());
pub const fn is_normal(self) -> bool
Returns true if the number is neither zero, infinite, subnormal, nor NaN.
Examples
let min = f16::MIN_POSITIVE;
let max = f16::MAX;
let lower_than_min = f16::from_f32(1.0e-5_f32);
let zero = f16::from_f32(0.0_f32);
assert!(min.is_normal());
assert!(max.is_normal());
assert!(!zero.is_normal());
assert!(!f16::NAN.is_normal());
assert!(!f16::INFINITY.is_normal());
// Values between `0` and `min` are Subnormal.
assert!(!lower_than_min.is_normal());
pub const fn classify(self) -> FpCategory
Returns the floating point category of the number.
If only one property is going to be tested, it is generally faster to use the specific predicate instead.
Examples
use std::num::FpCategory;
let num = f16::from_f32(12.4_f32);
let inf = f16::INFINITY;
assert_eq!(num.classify(), FpCategory::Normal);
assert_eq!(inf.classify(), FpCategory::Infinite);
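The predicates above depend only on the exponent and mantissa fields, so they can be computed with integer operations alone. A minimal std-only sketch on raw bits (classify_bits, is_nan_bits and is_finite_bits are hypothetical names, not the half API); because it uses no floating point operations, it also works as a const fn:

```rust
use std::num::FpCategory;

/// Hypothetical sketch: classifies raw binary16 bits with integer operations only.
const fn classify_bits(bits: u16) -> FpCategory {
    let exp = bits & 0x7C00; // 5 exponent bits, left in place
    let man = bits & 0x03FF; // 10 mantissa bits
    match (exp, man) {
        (0, 0) => FpCategory::Zero,
        (0, _) => FpCategory::Subnormal,
        (0x7C00, 0) => FpCategory::Infinite,
        (0x7C00, _) => FpCategory::Nan,
        _ => FpCategory::Normal,
    }
}

/// Single-property predicates, cheaper than a full classification.
/// NaN: all exponent bits set and a non-zero mantissa, i.e. magnitude > +infinity.
const fn is_nan_bits(bits: u16) -> bool {
    (bits & 0x7FFF) > 0x7C00
}

/// Finite: the exponent field is not all ones.
const fn is_finite_bits(bits: u16) -> bool {
    (bits & 0x7C00) != 0x7C00
}

/// Evaluated at compile time: the smallest positive pattern is a subnormal.
const SMALLEST_IS_SUBNORMAL: bool = matches!(classify_bits(0x0001), FpCategory::Subnormal);
```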
pub const fn signum(self) -> f16
Returns a number that represents the sign of self.
- 1.0 if the number is positive, +0.0 or INFINITY
- -1.0 if the number is negative, -0.0 or NEG_INFINITY
- NAN if the number is NaN
Examples
let f = f16::from_f32(3.5_f32);
assert_eq!(f.signum(), f16::from_f32(1.0));
assert_eq!(f16::NEG_INFINITY.signum(), f16::from_f32(-1.0));
assert!(f16::NAN.signum().is_nan());
pub const fn is_sign_positive(self) -> bool
Returns true if and only if self has a positive sign, including +0.0, NaNs with a positive sign bit and +∞.
Examples
let nan = f16::NAN;
let f = f16::from_f32(7.0_f32);
let g = f16::from_f32(-7.0_f32);
assert!(f.is_sign_positive());
assert!(!g.is_sign_positive());
// `NaN` can be either positive or negative
assert!(nan.is_sign_positive() != nan.is_sign_negative());
pub const fn is_sign_negative(self) -> bool
Returns true if and only if self has a negative sign, including -0.0, NaNs with a negative sign bit and −∞.
Examples
let nan = f16::NAN;
let f = f16::from_f32(7.0f32);
let g = f16::from_f32(-7.0f32);
assert!(!f.is_sign_negative());
assert!(g.is_sign_negative());
// `NaN` can be either positive or negative
assert!(nan.is_sign_positive() != nan.is_sign_negative());
pub const fn copysign(self, sign: f16) -> f16
Returns a number composed of the magnitude of self and the sign of sign.
Equal to self if the sign of self and sign are the same, otherwise equal to -self. If self is NaN, then NaN with the sign of sign is returned.
Examples
let f = f16::from_f32(3.5);
assert_eq!(f.copysign(f16::from_f32(0.42)), f16::from_f32(3.5));
assert_eq!(f.copysign(f16::from_f32(-0.42)), f16::from_f32(-3.5));
assert_eq!((-f).copysign(f16::from_f32(0.42)), f16::from_f32(3.5));
assert_eq!((-f).copysign(f16::from_f32(-0.42)), f16::from_f32(-3.5));
assert!(f16::NAN.copysign(f16::from_f32(1.0)).is_nan());
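signum, is_sign_positive/is_sign_negative and copysign all act on bit 15, the sign bit. A std-only sketch on raw bits with hypothetical helper names, assuming the documented behavior (copysign keeps the magnitude even for NaN; signum passes NaN through unchanged and maps ±0 and ±∞ to ±1.0):

```rust
/// Bit 15 of a binary16 value is the sign; bits 0..=14 are the magnitude.
const SIGN_MASK: u16 = 0x8000;
/// binary16 encoding of 1.0.
const ONE: u16 = 0x3C00;

/// Hypothetical: magnitude of `magnitude_from` with the sign of `sign_from` (also for NaN).
fn copysign_bits(magnitude_from: u16, sign_from: u16) -> u16 {
    (magnitude_from & !SIGN_MASK) | (sign_from & SIGN_MASK)
}

/// Hypothetical: true when the sign bit is set, including -0.0 and negative NaNs.
fn is_sign_negative_bits(bits: u16) -> bool {
    (bits & SIGN_MASK) != 0
}

/// Hypothetical: NaN stays NaN; every other value, including ±0 and ±infinity, becomes ±1.0.
fn signum_bits(bits: u16) -> u16 {
    let is_nan = (bits & !SIGN_MASK) > 0x7C00;
    if is_nan { bits } else { copysign_bits(ONE, bits) }
}
```

With 3.5 encoded as 0x4300, copysign_bits(0x4300, 0x8000) gives 0xC300 (-3.5), matching the copysign example above.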
pub fn max(self, other: f16) -> f16
Returns the maximum of the two numbers.
If one of the arguments is NaN, then the other argument is returned.
Examples
let x = f16::from_f32(1.0);
let y = f16::from_f32(2.0);
assert_eq!(x.max(y), y);
pub fn min(self, other: f16) -> f16
Returns the minimum of the two numbers.
If one of the arguments is NaN, then the other argument is returned.
Examples
let x = f16::from_f32(1.0);
let y = f16::from_f32(2.0);
assert_eq!(x.min(y), x);
pub fn clamp(self, min: f16, max: f16) -> f16
Restricts a value to a certain interval unless it is NaN.
Returns max if self is greater than max, and min if self is less than min. Otherwise this returns self.
Note that this function returns NaN if the initial value was NaN as well.
Panics
Panics if min > max, min is NaN, or max is NaN.
Examples
assert!(f16::from_f32(-3.0).clamp(f16::from_f32(-2.0), f16::from_f32(1.0)) == f16::from_f32(-2.0));
assert!(f16::from_f32(0.0).clamp(f16::from_f32(-2.0), f16::from_f32(1.0)) == f16::from_f32(0.0));
assert!(f16::from_f32(2.0).clamp(f16::from_f32(-2.0), f16::from_f32(1.0)) == f16::from_f32(1.0));
assert!(f16::NAN.clamp(f16::from_f32(-2.0), f16::from_f32(1.0)).is_nan());
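The NaN rules for max, min and clamp are worded like the documentation of Rust's std f32::max, f32::min and f32::clamp. Assuming the f16 methods behave the same way (the source does not say so explicitly), the rules can be checked on f32 with std alone; nan_rules_demo and clamp_panics are hypothetical helpers:

```rust
/// Demonstrates the NaN rules on f32:
/// max/min ignore a NaN argument, clamp propagates a NaN receiver.
fn nan_rules_demo() -> (f32, f32, bool) {
    let max = f32::NAN.max(2.0); // NaN argument ignored -> 2.0
    let min = 1.0f32.min(f32::NAN); // NaN argument ignored -> 1.0
    let clamped_nan = f32::NAN.clamp(-2.0, 1.0).is_nan(); // NaN in, NaN out
    (max, min, clamped_nan)
}

/// Returns true if clamp panics for the given bounds
/// (documented to happen when min > max or either bound is NaN).
fn clamp_panics(min: f32, max: f32) -> bool {
    std::panic::catch_unwind(move || 0.0f32.clamp(min, max)).is_err()
}
```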
pub fn total_cmp(&self, other: &Self) -> Ordering
Returns the ordering between self and other.
Unlike the standard partial comparison between floating point numbers, this comparison always produces an ordering in accordance with the totalOrder predicate as defined in the IEEE 754 (2008 revision) floating point standard. The values are ordered in the following sequence:
- negative quiet NaN
- negative signaling NaN
- negative infinity
- negative numbers
- negative subnormal numbers
- negative zero
- positive zero
- positive subnormal numbers
- positive numbers
- positive infinity
- positive signaling NaN
- positive quiet NaN.
The ordering established by this function does not always agree with the PartialOrd and PartialEq implementations of f16. For example, they consider negative and positive zero equal, while total_cmp doesn’t.
The interpretation of the signaling NaN bit follows the definition in the IEEE 754 standard, which may not match the interpretation by some of the older, non-conformant (e.g. MIPS) hardware implementations.
Examples
let mut v: Vec<f16> = vec![];
v.push(f16::ONE);
v.push(f16::INFINITY);
v.push(f16::NEG_INFINITY);
v.push(f16::NAN);
v.push(f16::MAX_SUBNORMAL);
v.push(-f16::MAX_SUBNORMAL);
v.push(f16::ZERO);
v.push(f16::NEG_ZERO);
v.push(f16::NEG_ONE);
v.push(f16::MIN_POSITIVE);
v.sort_by(|a, b| a.total_cmp(b));
assert!(v
.into_iter()
.zip(
[
f16::NEG_INFINITY,
f16::NEG_ONE,
-f16::MAX_SUBNORMAL,
f16::NEG_ZERO,
f16::ZERO,
f16::MAX_SUBNORMAL,
f16::MIN_POSITIVE,
f16::ONE,
f16::INFINITY,
f16::NAN
]
.iter()
)
.all(|(a, b)| a.to_bits() == b.to_bits()));
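One common way to implement totalOrder, and the approach Rust's std takes for f32::total_cmp, is to reinterpret the bits as a signed integer and flip the magnitude bits of negative values. A std-only sketch of that technique on binary16 bits (total_order_key and total_cmp_bits are hypothetical names; the half crate may implement it differently):

```rust
use std::cmp::Ordering;

/// Hypothetical sketch: maps binary16 bits to an i16 whose integer order is
/// IEEE 754 totalOrder. Sign-magnitude becomes two's-complement order by
/// flipping the 15 magnitude bits of negative values only.
fn total_order_key(bits: u16) -> i16 {
    let x = bits as i16;
    // Arithmetic shift gives 0 for positives and -1 (all ones) for negatives;
    // shifting that right by one as u16 yields the 0x7FFF mask for negatives only.
    x ^ ((((x >> 15) as u16) >> 1) as i16)
}

fn total_cmp_bits(a: u16, b: u16) -> Ordering {
    total_order_key(a).cmp(&total_order_key(b))
}
```

Sorting the bit patterns of the values in the example above with total_cmp_bits gives the same order, and a negative NaN such as 0xFE00 sorts before -∞.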
pub const EPSILON: f16 = _
f16 machine epsilon value
This is the difference between 1.0 and the next larger representable number.
pub const MANTISSA_DIGITS: u32 = 11u32
Number of f16 significant digits in base 2
pub const MAX_10_EXP: i32 = 4i32
Maximum possible f16 power of 10 exponent
pub const MIN_10_EXP: i32 = -4i32
Minimum possible normal f16 power of 10 exponent
pub const MIN_EXP: i32 = -13i32
One greater than the minimum possible normal f16 power of 2 exponent
pub const MIN_POSITIVE: f16 = _
Smallest positive normal f16 value
pub const NEG_INFINITY: f16 = _
f16 negative infinity (-∞)
pub const MIN_POSITIVE_SUBNORMAL: f16 = _
Minimum positive subnormal f16 value
pub const MAX_SUBNORMAL: f16 = _
Maximum subnormal f16 value
pub const FRAC_1_SQRT_2: f16 = _
f16 1/√2
pub const FRAC_2_SQRT_PI: f16 = _
f16 2/√π
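The constants above follow from the binary16 parameters: 11 significant bits (10 stored plus an implicit leading 1) and a minimum normal exponent of -14 (1 minus the bias of 15). A sketch of that arithmetic in f64, where every value below is exact; the names PRECISION, EMIN, EMAX and binary16_limits are hypothetical, and the largest finite value, 65504, is included for comparison although no constant for it appears in this listing.

```rust
/// binary16 parameters (hypothetical names for this sketch).
const PRECISION: i32 = 11; // significant bits, including the implicit leading 1
const EMIN: i32 = -14; // exponent of the smallest normal value (1 - bias)
const EMAX: i32 = 15; // exponent of the largest finite value (30 - bias)

/// Returns [epsilon, min_positive, min_positive_subnormal, max_subnormal, max].
fn binary16_limits() -> [f64; 5] {
    let two = 2.0f64;
    let epsilon = two.powi(1 - PRECISION); // 2^-10: gap between 1.0 and the next value
    let min_positive = two.powi(EMIN); // 2^-14: smallest positive normal
    let min_positive_subnormal = two.powi(EMIN + 1 - PRECISION); // 2^-24
    let max_subnormal = min_positive - min_positive_subnormal; // 1023 * 2^-24
    let max = (two - epsilon) * two.powi(EMAX); // 65504
    [epsilon, min_positive, min_positive_subnormal, max_subnormal, max]
}
```

The integer constants follow as well: MIN_EXP is EMIN + 1 = -13, MAX_10_EXP is floor(log10(65504)) = 4, and MIN_10_EXP is ceil(log10(2^-14)) = -4.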
Trait Implementations
impl AddAssign<&f16> for f16
fn add_assign(&mut self, rhs: &f16)
Performs the += operation.
impl AddAssign<f16> for f16
fn add_assign(&mut self, rhs: Self)
Performs the += operation.
impl DivAssign<&f16> for f16
fn div_assign(&mut self, rhs: &f16)
Performs the /= operation.
impl DivAssign<f16> for f16
fn div_assign(&mut self, rhs: Self)
Performs the /= operation.
impl MulAssign<&f16> for f16
fn mul_assign(&mut self, rhs: &f16)
Performs the *= operation.
impl MulAssign<f16> for f16
fn mul_assign(&mut self, rhs: Self)
Performs the *= operation.
impl PartialOrd<f16> for f16
impl RemAssign<&f16> for f16
fn rem_assign(&mut self, rhs: &f16)
Performs the %= operation.
impl RemAssign<f16> for f16
fn rem_assign(&mut self, rhs: Self)
Performs the %= operation.
impl SubAssign<&f16> for f16
fn sub_assign(&mut self, rhs: &f16)
Performs the -= operation.
impl SubAssign<f16> for f16
fn sub_assign(&mut self, rhs: Self)
Performs the -= operation.