`folly/Conv.h`
-------------

`folly/Conv.h` is a one-stop-shop for converting values across
types. Its main features are simplicity of the API (only the
names `to` and `toAppend` must be memorized), speed
(folly is significantly faster, sometimes by an order of magnitude,
than comparable APIs), and correctness.

### Synopsis
***

All examples below assume that `folly/Conv.h` has been included and
that `using namespace folly;` has been issued:

``` Cpp
    // To format as text and append to a string, use toAppend.
    fbstring str;
    toAppend(2.5, &str);
    CHECK_EQ(str, "2.5");

    // Multiple arguments are okay, too. Just put the pointer to string at the end.
    toAppend(" is ", 2, " point ", 5, &str);
    CHECK_EQ(str, "2.5 is 2 point 5");

    // You don't need to use fbstring (although it's much faster for conversions and in general).
    std::string stdStr;
    toAppend("Pi is about ", 22.0 / 7, &stdStr);
    // In general, just use to<TargetType>(sourceValue). It returns its result by value.
    stdStr = to<std::string>("Variadic ", "arguments also accepted.");

    // to<fbstring> is 2.5x faster than to<std::string> for typical workloads.
    str = to<fbstring>("Variadic ", "arguments also accepted.");
```

### Integral-to-integral conversion
***

Using `to<Target>(value)` to convert one integral type to another
will behave as follows:

* If the target type can accommodate all possible values of the
  source type, the value is implicitly converted. No further
  action is taken. Example:

``` Cpp
    short x = -42;
    unsigned short y = 65535;
    auto a = to<int>(x); // zero overhead conversion
    auto b = to<int>(y); // zero overhead conversion
```

* Otherwise, `to` inserts bounds checks and throws
  `std::range_error` if the target type cannot accommodate the
  source value. Example:

``` Cpp
    short x = 123;
    auto a = to<unsigned short>(x); // fine
    x = -1;
    a = to<unsigned short>(x); // THROWS
    long long z = 2000000000;
    auto b = to<int>(z); // fine
    z += 1000000000;
    b = to<int>(z); // THROWS
    auto c = to<unsigned int>(z); // fine
```
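
The two rules above can be illustrated with a small standard C++17 sketch. `fitsIn` and `checkedIntegralCast` are hypothetical names, not folly functions, and this is not folly's implementation; it only shows how the range check can be dropped at compile time when every source value fits, and otherwise performed at run time with a `std::range_error` on failure.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// True if `value` is representable in Target. Negative and non-negative
// values are handled separately, so no mixed signed/unsigned comparison occurs.
template <typename Target, typename Source>
constexpr bool fitsIn(Source value) {
  if constexpr (std::is_signed_v<Source>) {
    if (value < 0) {
      if constexpr (std::is_signed_v<Target>) {
        return static_cast<std::intmax_t>(value) >=
               static_cast<std::intmax_t>(std::numeric_limits<Target>::min());
      } else {
        return false;  // a negative value never fits in an unsigned type
      }
    }
  }
  return static_cast<std::uintmax_t>(value) <=
         static_cast<std::uintmax_t>(std::numeric_limits<Target>::max());
}

// Hypothetical stand-in for to<Target>(value) between integral types.
template <typename Target, typename Source>
Target checkedIntegralCast(Source value) {
  static_assert(std::is_integral_v<Target> && std::is_integral_v<Source>,
                "integral types only");
  if constexpr (fitsIn<Target>(std::numeric_limits<Source>::min()) &&
                fitsIn<Target>(std::numeric_limits<Source>::max())) {
    // Every Source value fits in Target: plain conversion, no run-time check.
    return static_cast<Target>(value);
  } else {
    if (!fitsIn<Target>(value)) {
      throw std::range_error("integral conversion out of range");
    }
    return static_cast<Target>(value);
  }
}
```

With `short` as the source and `int` as the target, the unchecked branch is chosen at compile time, matching the "zero overhead" case; `long long` to `int` takes the checked branch.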

### Anything-to-string conversion
***

As mentioned, there are two primitives for converting anything to
string: `to` and `toAppend`. They support the same set of source
types, literally by definition (`to` is implemented in terms of
`toAppend` for all types). The call `toAppend(value, &str)`
formats and appends `value` to `str`, whereas
`to<StringType>(value)` formats `value` as a `StringType` and
returns the result by value. Currently, the supported
`StringType`s are `std::string` and `fbstring`.

Both `toAppend` and `to` with a string type as a target support
variadic arguments. Each argument is converted in turn. For
`toAppend`, the last argument in a variadic list must be the
address of a supported string type (no need to specify the string
type as a template argument).
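
A minimal sketch of this "variadic front end plus per-type appender" structure, using only `std::string` and a few source types. `appendOne`, `toAppendSketch` and `toStringSketch` are hypothetical names chosen for illustration; folly's real `toAppend` supports many more types and is organized differently. The point is that the `to`-style function is simply the `toAppend`-style function writing into a fresh string, with the destination pointer taken from the end of the argument list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical single-value formatters; only strings and integers are covered.
inline void appendOne(const char* s, std::string* out) { out->append(s); }
inline void appendOne(const std::string& s, std::string* out) { out->append(s); }
template <typename T, std::enable_if_t<std::is_integral_v<T>, int> = 0>
void appendOne(T value, std::string* out) { out->append(std::to_string(value)); }

// Formats tuple elements 0..N-1 in order.
template <typename Tuple, std::size_t... I>
void appendEach(const Tuple& values, std::string* out, std::index_sequence<I...>) {
  (appendOne(std::get<I>(values), out), ...);
}

// Variadic front end: every argument except the last is formatted in turn;
// the last argument is the destination string pointer.
template <typename... Args>
void toAppendSketch(const Args&... args) {
  static_assert(sizeof...(Args) >= 1, "the destination pointer is required");
  const auto refs = std::forward_as_tuple(args...);
  std::string* out = std::get<sizeof...(Args) - 1>(refs);
  appendEach(refs, out, std::make_index_sequence<sizeof...(Args) - 1>{});
}

// The "to" flavour: format into a fresh string and return it by value.
template <typename... Args>
std::string toStringSketch(const Args&... values) {
  std::string result;
  toAppendSketch(values..., &result);
  return result;
}
```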

#### Integral-to-string conversion

Nothing special here: integrals are converted to strings in
decimal format, with a '-' prefix for negative values. Example:

``` Cpp
    auto a = to<fbstring>(123);
    assert(a == "123");
    a = to<fbstring>(-456);
    assert(a == "-456");
```

The conversion implementation is aggressively optimized. It
converts two digits at a time assisted by fixed-size tables.
Converting a `long` to an `fbstring` is 3.6x faster than using
`boost::lexical_cast` and 2.5x faster than using `sprintf`, even
though the latter is used in conjunction with a stack-allocated
constant-size buffer.
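
To illustrate the two-digits-at-a-time idea, here is a sketch that peels off two decimal digits per division by 100 and copies them from a 200-byte table. The names `kDigitPairs`, `appendUnsigned` and `appendSigned` are hypothetical, and this is not folly's actual code, which applies further optimizations.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>

// "00", "01", ..., "99": the two-character rendering of every value below 100.
constexpr char kDigitPairs[201] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

// Appends the decimal form of `value`, writing digits right to left into a
// stack buffer, two at a time.
inline void appendUnsigned(std::uint64_t value, std::string& out) {
  char buf[20];  // 2^64 - 1 has 20 decimal digits
  char* const end = buf + sizeof(buf);
  char* p = end;
  while (value >= 100) {
    const unsigned pair = static_cast<unsigned>(value % 100);
    value /= 100;
    p -= 2;
    std::memcpy(p, kDigitPairs + 2 * pair, 2);
  }
  if (value >= 10) {
    p -= 2;
    std::memcpy(p, kDigitPairs + 2 * value, 2);
  } else {
    *--p = static_cast<char>('0' + value);
  }
  out.append(p, end);
}

inline void appendSigned(std::int64_t value, std::string& out) {
  if (value < 0) {
    out.push_back('-');
    // Negate in unsigned arithmetic so the minimum int64 value does not overflow.
    appendUnsigned(0 - static_cast<std::uint64_t>(value), out);
  } else {
    appendUnsigned(static_cast<std::uint64_t>(value), out);
  }
}
```

Each loop iteration handles two digits with one `/ 100` step, instead of one digit per `/ 10` step.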

Note that converting integral types to `fbstring` has a
particular advantage compared to converting to `std::string`:
no integral type (<= 64 bits) has more than 20 decimal digits
including sign. Since `fbstring` employs the small string
optimization for up to 23 characters, converting an integral
to `fbstring` is guaranteed to not allocate memory, resulting
in significant speed and memory locality gains. Benchmarks
reveal a 2x gain on a typical workload.
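
The 20-character bound follows from `std::numeric_limits<T>::digits10`: an integral type has at most `digits10 + 1` decimal digits, plus one character for a sign. The sketch below checks this at compile time for 64-bit types. `maxDecimalChars` is a hypothetical helper, and the 23-character inline capacity is the figure quoted in the text above, not something the code measures.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>

// Upper bound on the characters needed to print any value of integral type T
// in decimal: digits10 + 1 digits, plus one for '-' on signed types.
template <typename T>
constexpr int maxDecimalChars() {
  static_assert(std::is_integral_v<T>, "integral types only");
  return std::numeric_limits<T>::digits10 + 1 + (std::is_signed_v<T> ? 1 : 0);
}

static_assert(maxDecimalChars<std::int64_t>() == 20, "e.g. -9223372036854775808");
static_assert(maxDecimalChars<std::uint64_t>() == 20, "e.g. 18446744073709551615");
static_assert(maxDecimalChars<std::uint64_t>() <= 23, "fits a 23-character inline buffer");
```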

#### `char` to string conversion

Although `char` is technically an integral type, most of the time
you want the string representation of `'a'` to be `"a"`, not `97`.
That's why `folly/Conv.h` handles `char` as a special case that
does the expected thing. Note that `signed char` and `unsigned char`
are still considered integral types.
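
A sketch of this dispatch rule, using a hypothetical `appendValue` helper built on `std::string` (not folly's code): only plain `char` is special-cased, so `signed char` and `unsigned char` still take the numeric path. The expected results assume an ASCII execution character set, where `'a'` is 97.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Plain `char` is appended as a character; every other integral type,
// including signed char and unsigned char, is appended as a decimal number.
template <typename T>
void appendValue(T value, std::string& out) {
  static_assert(std::is_integral_v<T>, "sketch handles integral types only");
  if constexpr (std::is_same_v<T, char>) {
    out.push_back(value);             // 'a' -> "a"
  } else {
    out += std::to_string(value);     // (signed char)'a' -> "97"
  }
}
```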

#### Floating point to string conversion

`folly/Conv.h` uses [V8's double conversion](http://code.google.com/p/double-conversion/)
routines. They are accurate and fast; on typical workloads,
`to<fbstring>(doubleValue)` is 1.9x faster than `sprintf` and
5.5x faster than `boost::lexical_cast` (it is also 1.3x faster
than `to<std::string>(doubleValue)`).
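
One way to read "accurate" is: the shortest decimal string that parses back to exactly the same `double`. The brute-force helper below (`shortestRoundTrip`, a hypothetical name) tries 1 to 17 significant digits with `snprintf` until `strtod` reproduces the input; 17 digits always suffice for an IEEE double. It assumes shortest round-trip output is the goal and is far slower than double-conversion, which computes those digits directly.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Prints `d` with the fewest significant digits (1..17) that still parse back
// to the same double. NaN never compares equal, so it falls through to 17.
inline std::string shortestRoundTrip(double d) {
  char buf[32];
  for (int precision = 1; precision <= 17; ++precision) {
    std::snprintf(buf, sizeof(buf), "%.*g", precision, d);
    if (std::strtod(buf, nullptr) == d) {
      break;
    }
  }
  return buf;
}
```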

#### `const char*` to string conversion

For completeness, `folly/Conv.h` supports `const char*`, including
string literals. The "conversion" consists, of course, of
the string itself. Example:

``` Cpp
    auto s = to<fbstring>("Hello, world");
    assert(s == "Hello, world");
```

### Anything from string conversion (i.e. parsing)
***

`folly/Conv.h` includes three kinds of parsing routines:

* `to<Type>(const char* begin, const char* end)` rigidly
  converts the range [begin, end) to `Type`. These routines have
  drastic restrictions (e.g. allow no leading or trailing
  whitespace) and are intended as an efficient back-end for more
  tolerant routines.
* `to<Type>(stringy)` converts `stringy` to `Type`. Value
  `stringy` may be of type `const char*`, `StringPiece`,
  `std::string`, or `fbstring`. (Technically, the requirement is
  that `stringy` implicitly converts to a `StringPiece`.)
* `to<Type>(&stringPiece)` parses with progress information:
  given `stringPiece` of type `StringPiece`, it parses as much
  as possible from it as type `Type` and alters `stringPiece`
  to remove the munched characters. This is easiest clarified
  by an example:

``` Cpp
    fbstring s = " 1234 angels on a pin";
    StringPiece pc(s);
    auto x = to<int>(&pc);
    assert(x == 1234);
    assert(pc == " angels on a pin");
```

Note how the routine ate the leading space but not the trailing one.
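
The progress-reporting behavior can be approximated with the standard library, assuming `std::string_view` in place of `StringPiece`. `parseIntPrefix` is a hypothetical helper: it skips leading whitespace, lets `std::from_chars` (C++17, `<charconv>`) parse as many characters as it can, and then shrinks the view past the consumed characters. Unlike the sign rules described for folly's integral parsing, `std::from_chars` does not accept a leading `'+'`.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <string_view>
#include <system_error>

// Hypothetical analogue of to<int>(&stringPiece): skips leading whitespace,
// parses the longest integer prefix, then advances `sv` past what was consumed.
// On failure, leading whitespace has already been removed from `sv`.
inline int parseIntPrefix(std::string_view& sv) {
  while (!sv.empty() && std::isspace(static_cast<unsigned char>(sv.front()))) {
    sv.remove_prefix(1);
  }
  int value = 0;
  const char* first = sv.data();
  const char* last = sv.data() + sv.size();
  const auto result = std::from_chars(first, last, value);
  if (result.ec != std::errc()) {
    // No digits at the front (invalid_argument) or the value overflowed int.
    throw std::range_error("could not parse an int prefix");
  }
  sv.remove_prefix(static_cast<std::size_t>(result.ptr - first));
  return value;
}
```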

#### Parsing integral types

Parsing integral types is unremarkable: decimal format is
expected, with an optional `'+'` or `'-'` sign for signed types, but no
optional `'+'` is allowed for unsigned types. The one remarkable
element is speed: parsing typical `long` values is 6x faster than
`sscanf`. `folly/Conv.h` uses aggressive loop unrolling and
table-assisted SIMD-style code arrangement that avoids integral
division (slow) and data dependencies across operations
(ILP-unfriendly). Example:

``` Cpp
    fbstring str = " 12345 ";
    assert(to<int>(str) == 12345);
    str = " 12345six seven eight";
    StringPiece pc(str);
    assert(to<int>(&pc) == 12345);
    assert(pc == "six seven eight");
```
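
The sign rules can be made concrete with a strict, whole-string parser in the spirit of the rigid `to<Type>(begin, end)` routines. `parseDecimal` is a hypothetical name; it uses a plain digit-at-a-time loop with an overflow check rather than the unrolled, table-assisted code described above, and it returns `std::nullopt` where folly's `to` would throw.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <string_view>
#include <type_traits>

// All of `s` must be a decimal number. Signed types accept one leading
// '+' or '-'; unsigned types accept no sign. Returns std::nullopt on a
// syntax error or on overflow.
template <typename T>
std::optional<T> parseDecimal(std::string_view s) {
  static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                "non-bool integral types only");
  bool negative = false;
  if constexpr (std::is_signed_v<T>) {
    if (!s.empty() && (s.front() == '+' || s.front() == '-')) {
      negative = (s.front() == '-');
      s.remove_prefix(1);
    }
  }
  if (s.empty()) {
    return std::nullopt;  // no digits at all
  }
  using U = std::make_unsigned_t<T>;
  // Largest magnitude allowed: max() for positives, max() + 1 for negatives.
  const U limit =
      negative ? static_cast<U>(static_cast<U>(std::numeric_limits<T>::max()) + 1u)
               : static_cast<U>(std::numeric_limits<T>::max());
  U magnitude = 0;
  for (char c : s) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    const U digit = static_cast<U>(c - '0');
    if (magnitude > (limit - digit) / 10) {
      return std::nullopt;  // magnitude * 10 + digit would exceed limit
    }
    magnitude = static_cast<U>(magnitude * 10 + digit);
  }
  if constexpr (std::is_signed_v<T>) {
    if (negative && magnitude != 0) {
      // Negate via (magnitude - 1) so that min() is reachable without overflow.
      return static_cast<T>(-static_cast<T>(magnitude - 1) - 1);
    }
  }
  return static_cast<T>(magnitude);
}
```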

#### Parsing floating-point types

`folly/Conv.h` uses, again, [V8's double-conversion](http://code.google.com/p/double-conversion/)
routines as back-end. It is 3x faster than `sscanf` and
1.7x faster than in-house routines such as `parse<double>`. But
the more important detail is accuracy: even if you do code a
routine that works faster than `to<double>`, chances are it is
incorrect and will fail in a variety of corner cases. Using
`to<double>` is strongly recommended.
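
A deliberately simple strict wrapper over `std::strtod` shows the contract that the following paragraphs describe: the whole string must be consumed, the empty string is rejected, and the spellings "nan", "inf" and "infinity" are accepted in any letter case. `parseDoubleStrict` is a hypothetical name; it is not folly's implementation and does not share double-conversion's speed. Note that `std::strtod` also skips leading whitespace and accepts hexadecimal floating-point input, and the sketch does not check `errno` for out-of-range magnitudes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// The whole string must be consumed; otherwise std::range_error is thrown.
// std::strtod itself accepts "nan", "inf" and "infinity" in any letter case.
// Out-of-range magnitudes (errno == ERANGE) are not checked here.
inline double parseDoubleStrict(const std::string& s) {
  if (s.empty()) {
    throw std::range_error("empty string is not a double");
  }
  char* end = nullptr;
  const double d = std::strtod(s.c_str(), &end);
  if (end != s.c_str() + s.size()) {
    throw std::range_error("not a valid double: \"" + s + "\"");
  }
  return d;
}
```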

Note that if the string "NaN" (with any capitalization) is passed to
`to<double>`, then `NaN` is returned, which can be tested for as follows:

``` Cpp
    fbstring str = "nan"; // "NaN", "NAN", etc.
    double d = to<double>(str);
    if (std::isnan(d)) {
      // string was a valid representation of the double value NaN
    }
```

Note that passing "-NaN" (with any capitalization) to `to<double>` also returns
`NaN`.

Note that if the strings "inf" or "infinity" (with any capitalization) are
passed to `to<double>`, then `infinity` is returned, which can be tested for
as follows:

``` Cpp
    fbstring str = "inf"; // "Inf", "INF", "infinity", "Infinity", etc.
    double d = to<double>(str);
    if (std::isinf(d)) {
      // string was a valid representation of one of the double values +Infinity
      // or -Infinity
    }
```

Note that passing "-inf" or "-infinity" (with any capitalization) to
`to<double>` returns `-infinity` rather than `+infinity`. The sign of the
`infinity` can be tested for as follows:

``` Cpp
    fbstring str = "-inf"; // or "inf", "-Infinity", "+Infinity", etc.
    double d = to<double>(str);
    if (d == std::numeric_limits<double>::infinity()) {
      // string was a valid representation of the double value +Infinity
    } else if (d == -std::numeric_limits<double>::infinity()) {
      // string was a valid representation of the double value -Infinity
    }
```

Note that if an unparseable string is passed to `to<double>`, then an exception
is thrown, rather than `NaN` being returned. This can be tested for as follows:

``` Cpp
    fbstring str = "not-a-double"; // Or "1.1.1", "", "$500.00", etc.
    double d;
    try {
      d = to<double>(str);
    } catch (const std::range_error &) {
      // string could not be parsed
    }
```

Note that the empty string (`""`) is an unparseable value, and will cause
`to<double>` to throw an exception.

#### Non-throwing interfaces

`tryTo<T>` is the non-throwing variant of `to<T>`. It returns
an `Expected<T, ConversionCode>`. You can think of `Expected`
as an `Optional<T>` that, if the conversion failed, stores an
error code instead of a `T`.

`tryTo<T>` performs about as well as `to<T>` when the
conversion is successful. On the error path, you can expect
`tryTo<T>` to be roughly three orders of magnitude faster than
the throwing `to<T>` and to completely avoid any lock contention
arising from stack unwinding.

Here is how to use non-throwing conversions:

``` Cpp
    auto t1 = tryTo<int>(str);
    if (t1.hasValue()) {
      use(t1.value());
    }
```

`Expected` has a composability feature to make the above pattern simpler.

``` Cpp
    tryTo<int>(str).then([](int i) { use(i); });
```
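
For comparison, the same non-throwing pattern can be sketched with only the standard library. Here `std::variant<int, std::errc>` plays the role of `Expected<int, ConversionCode>`, and `tryParseInt` and `ifParsed` are hypothetical helpers, not folly APIs. This sketch rejects any whitespace, because `std::from_chars` does not skip it.

```cpp
#include <cassert>
#include <charconv>
#include <string_view>
#include <system_error>
#include <variant>

// Either the parsed int or the std::errc explaining the failure: a rough
// standard-library stand-in for Expected<int, ConversionCode>.
using IntOrError = std::variant<int, std::errc>;

// Non-throwing parser; the whole input must be an integer.
inline IntOrError tryParseInt(std::string_view s) {
  int value = 0;
  const char* first = s.data();
  const char* last = s.data() + s.size();
  const auto result = std::from_chars(first, last, value);
  if (result.ec != std::errc()) {
    return result.ec;  // invalid_argument or result_out_of_range
  }
  if (result.ptr != last) {
    return std::errc::invalid_argument;  // trailing characters
  }
  return value;
}

// Calls f with the value only when parsing succeeded, in the spirit of then().
template <typename F>
void ifParsed(const IntOrError& r, F&& f) {
  if (const int* v = std::get_if<int>(&r)) {
    f(*v);
  }
}
```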