#include "caffe2/operators/fused_rowwise_8bit_conversion_ops.h"
#include "caffe2/core/registry.h"

REGISTER_CPU_OPERATOR(
    FloatToFused8BitRowwiseQuantized,
    FloatToFused8BitRowwiseQuantizedOp<CPUContext>);
OPERATOR_SCHEMA(FloatToFused8BitRowwiseQuantized)
    .NumInputs(1)
    .NumOutputs(1)
    .SetDoc(R"DOC(
Applies 8-bit row-wise quantization by determining the range
(maximum - minimum) and offset (minimum value) of each row in the input
matrix, and then scaling each element to an 8-bit number between 0 and
255. To later de-quantize values, the scale (range / 255) and offset
(bias) are stored alongside the data. More precisely, the first 4 bytes
of each row in the output matrix are a 32-bit float storing the scale,
the next 4 bytes store the bias as a 32-bit float, and all remaining
bytes in the row encode single quantized values.
)DOC")
    .Input(0, "input", "Float32 input data")
    .Output(0, "output", "Fused scale, bias and quantized data");
NO_GRADIENT(FloatToFused8BitRowwiseQuantized);
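
The quantization described in the schema above can be sketched for a single row as follows. This is an illustrative, standalone sketch and not the operator's implementation: `QuantizeRowSketch` is a hypothetical helper, it assumes the leading scale/bias layout stated in the doc string, rounds to the nearest integer, and maps a constant row (range 0) to all zeros. The actual kernel in the header may differ on any of these points.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of Caffe2. Quantizes one row into
// [scale: 4 bytes][bias: 4 bytes][one uint8 per input element],
// which is the layout the schema text above describes.
std::vector<std::uint8_t> QuantizeRowSketch(const std::vector<float>& row) {
  const std::size_t header = 2 * sizeof(float);
  std::vector<std::uint8_t> out(header + row.size());  // zero-initialized
  if (row.empty()) {
    return out;  // scale and bias stay 0 for an empty row
  }

  const auto min_max = std::minmax_element(row.begin(), row.end());
  const float bias = *min_max.first;                      // offset = row minimum
  const float range = *min_max.second - *min_max.first;   // maximum - minimum
  const float scale = range / 255.0f;

  // Assumption: a constant row (range == 0) quantizes to all zeros,
  // which also avoids dividing by zero.
  const float inverse_scale = range > 0.0f ? 255.0f / range : 0.0f;

  std::memcpy(out.data(), &scale, sizeof(float));
  std::memcpy(out.data() + sizeof(float), &bias, sizeof(float));

  for (std::size_t i = 0; i < row.size(); ++i) {
    // Round to nearest, then clamp into the 8-bit range [0, 255].
    const long q = std::lround((row[i] - bias) * inverse_scale);
    out[header + i] = static_cast<std::uint8_t>(std::min(255L, std::max(0L, q)));
  }
  return out;
}
```

For a row of `N` floats the result is `N + 8` bytes long, so each element takes one byte instead of four, at the cost of eight bytes of per-row metadata.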

REGISTER_CPU_OPERATOR(
    Fused8BitRowwiseQuantizedToFloat,
    Fused8BitRowwiseQuantizedToFloatOp<CPUContext>);
OPERATOR_SCHEMA(Fused8BitRowwiseQuantizedToFloat)
    .NumInputs(1)
    .NumOutputs(1)
    .SetDoc(R"DOC(
De-quantizes the result of the
FloatToFused8BitRowwiseQuantized operator. Each input row is expected to
hold the quantized values first, followed by the scale as a 32-bit float
in the second-to-last 4 bytes and the bias as a 32-bit float in the last
4 bytes. The output is a matrix containing only the de-quantized values.
De-quantization is performed by multiplying each value by its row's scale
and then adding its row's bias. Because quantization rounds each value to
8 bits, the de-quantized values will generally not be exactly equal to
the original, un-quantized floating point values.
)DOC")
    .Input(
        0,
        "scale_bias_quantized_input",
        "Fused scale, bias and quantized data")
    .Output(0, "float_input", "Float32 data");
NO_GRADIENT(Fused8BitRowwiseQuantizedToFloat);
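
As a rough illustration of the de-quantization step, the sketch below recovers floats from one fused row as `value * scale + bias`, the inverse of the scaling described for `FloatToFused8BitRowwiseQuantized`. `DequantizeRowSketch` is a hypothetical helper, not a Caffe2 function. It assumes the layout given in this schema's text (quantized bytes first, then scale and bias in the trailing 8 bytes), which differs from the leading layout described in the quantizing operator's doc string, so the header file should be treated as the authority on the real byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper, not part of Caffe2. Expects one fused row laid out as
// [one uint8 per element][scale: 4 bytes][bias: 4 bytes] and returns the
// de-quantized floats.
std::vector<float> DequantizeRowSketch(
    const std::vector<std::uint8_t>& fused_row) {
  const std::size_t trailer = 2 * sizeof(float);
  if (fused_row.size() < trailer) {
    return {};  // Assumption: a row too short to hold scale and bias is empty.
  }
  const std::size_t num_values = fused_row.size() - trailer;

  // Read scale and bias with memcpy to avoid unaligned float access.
  float scale = 0.0f;
  float bias = 0.0f;
  std::memcpy(&scale, fused_row.data() + num_values, sizeof(float));
  std::memcpy(
      &bias, fused_row.data() + num_values + sizeof(float), sizeof(float));

  std::vector<float> out(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    out[i] = static_cast<float>(fused_row[i]) * scale + bias;
  }
  return out;
}
```

Each recovered value can differ from the original input by up to about half a quantization step (`scale / 2`) when the quantizer rounds to nearest, which is why the round trip is lossy.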