---
title: Xlang Serialization Format
sidebar_position: 0
id: xlang_serialization_spec
license: |
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
---

## Cross-language Serialization Specification

Apache Fory™ xlang serialization enables automatic cross-language object serialization with support for shared references, circular references, and polymorphism. Unlike traditional serialization frameworks that require IDL definitions and schema compilation, Fory serializes objects directly without any intermediate steps.

Key characteristics:

- **Automatic**: No IDL definition, no schema compilation, no manual object-to-protocol conversion
- **Cross-language**: The same binary format works across Java, Python, C++, Rust, Go, JavaScript, and more
- **Reference-aware**: Handles shared references and circular references without duplication or infinite recursion
- **Polymorphic**: Supports object polymorphism with runtime type resolution

This specification defines the Fory xlang binary format. The format is dynamic rather than static, which enables flexibility and ease of use at the cost of additional complexity in the wire format.

## Type Systems

### Data Types

- bool: a boolean value (true or false).
- int8: an 8-bit signed integer.
- int16: a 16-bit signed integer.
- int32: a 32-bit signed integer.
- varint32: a 32-bit signed integer that uses Fory variable-length encoding.
- int64: a 64-bit signed integer.
- varint64: a 64-bit signed integer that uses Fory PVL encoding.
- tagged_int64: a 64-bit signed integer that uses Fory Hybrid encoding.
- uint8: an 8-bit unsigned integer.
- uint16: a 16-bit unsigned integer.
- uint32: a 32-bit unsigned integer.
- var_uint32: a 32-bit unsigned integer that uses Fory variable-length encoding.
- uint64: a 64-bit unsigned integer.
- var_uint64: a 64-bit unsigned integer that uses Fory PVL encoding.
- tagged_uint64: a 64-bit unsigned integer that uses Fory Hybrid encoding.
- float8: an 8-bit floating point number.
- float16: a 16-bit floating point number.
- bfloat16: a 16-bit brain floating point number.
- float32: a 32-bit floating point number.
- float64: a 64-bit floating point number, including NaN and Infinity.
- string: a text string encoded using Latin1/UTF16/UTF-8 encoding.
- enum: a data type consisting of a set of named values. Rust enums with non-predefined field values are not supported as an enum.
- named_enum: an enum whose value is serialized as the registered name.
- struct: a final type serialized by the Fory Struct serializer, i.e. it has no subclasses. When deserializing `List<SomeClass>`, dynamic serializer dispatch can be skipped because `SomeClass` is final.
- compatible_struct: a final type serialized by the Fory compatible Struct serializer.
- named_struct: a `struct` whose type mapping is encoded as a name.
- named_compatible_struct: a `compatible_struct` whose type mapping is encoded as a name.
- ext: a type serialized by a customized serializer.
- named_ext: an `ext` type whose type mapping is encoded as a name.
- list: a sequence of objects.
- set: an unordered set of unique elements.
- map: a map of key-value pairs. Mutable types such as `list/map/set/array` are not allowed as map keys.
- duration: an absolute length of time, independent of any calendar/timezone, as a count of nanoseconds.
- timestamp: a point in time, independent of any calendar/timezone, encoded as seconds (int64) and nanoseconds (uint32) since the epoch at UTC midnight on January 1, 1970.
- date: a naive date without timezone. The count is days relative to an epoch at UTC midnight on Jan 1, 1970.
- decimal: an exact decimal value represented as an integer value in two's complement.
- binary: a variable-length array of bytes.
- array: only one-dimensional numeric components are allowed. Other arrays are treated as List. Implementations should support interoperability between array and list.
  - bool_array: one-dimensional bool array.
  - int8_array: one-dimensional int8 array.
  - int16_array: one-dimensional int16 array.
  - int32_array: one-dimensional int32 array.
  - int64_array: one-dimensional int64 array.
  - float8_array: one-dimensional float8 array.
  - float16_array: one-dimensional float16 array.
  - bfloat16_array: one-dimensional bfloat16 array.
  - float32_array: one-dimensional float32 array.
  - float64_array: one-dimensional float64 array.
- union: a tagged union type that can hold one of several alternative types. The active alternative is identified by an index.
- typed_union: a union value with a registered numeric union type ID.
- named_union: a union value with an embedded union type name or shared TypeDef.
- none: represents an empty/unit value with no data (e.g., for empty union alternatives).

Note:

- Unsigned integer types use the same byte sizes as their signed counterparts; the difference is in value interpretation.

See [Type mapping](xlang_type_mapping.md) for language-specific type mappings.

### Polymorphisms

For polymorphism, if one non-final class is registered and only one subclass is registered, Fory can assume that all elements in a List/Map have the same type, which reduces the cost of runtime type checks.
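
One way to read the rule above is as a registry check: if the declared element type is final, or if it is registered together with exactly one registered subclass, a serializer could resolve the element type once per collection rather than once per element. The sketch below illustrates that reading only. `ElementTypeCheck`, `canAssumeUniformType`, the `Animal`/`Dog`/`Cat` classes, and the plain `Set` used as a registry are hypothetical names for illustration, not Fory APIs, and the rule an actual implementation applies may differ.

```java
import java.lang.reflect.Modifier;
import java.util.Set;

public class ElementTypeCheck {
    // Hypothetical user types used only for this illustration.
    static class Animal {}
    static class Dog extends Animal {}
    static class Cat extends Animal {}

    /**
     * Returns true when elements declared as `declared` may be treated as
     * having one uniform type: either the class is final, or it is
     * registered and exactly one strict subclass of it is registered.
     * This is an assumed interpretation, not the library's algorithm.
     */
    static boolean canAssumeUniformType(Class<?> declared, Set<Class<?>> registered) {
        if (Modifier.isFinal(declared.getModifiers())) {
            return true; // a final class has no subclasses to dispatch on
        }
        long registeredSubclasses = registered.stream()
            .filter(c -> c != declared && declared.isAssignableFrom(c))
            .count();
        return registered.contains(declared) && registeredSubclasses == 1;
    }

    public static void main(String[] args) {
        // Base class plus one registered subclass: prints true
        System.out.println(canAssumeUniformType(
            Animal.class, Set.of(Animal.class, Dog.class)));
        // Two registered subclasses: prints false
        System.out.println(canAssumeUniformType(
            Animal.class, Set.of(Animal.class, Dog.class, Cat.class)));
        // java.lang.String is final: prints true
        System.out.println(canAssumeUniformType(String.class, Set.of()));
    }
}
```

When the check fails, per-element type resolution remains the safe fallback.
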
Collection/Array polymorphism is not fully supported, since some languages such as Go have only one collection type. Users who need to get back exactly the type they passed must supply that type when deserializing or annotate the struct field with it.

### Type disambiguation

Because type systems differ between languages, types cannot be mapped one-to-one across them. When deserializing, Fory uses the target data structure type together with the data type recorded in the data to determine how to deserialize and populate the target data structure. For example:

```java
class Foo {
  int[] intArray;
  Object[] objects;
  List