Generic Binary Keyed Format (*.gbkf)
This is the specification of a compact, extensible binary format designed for fast, schema-flexible storage and transmission of structured data.
Field sizes were chosen to balance storage efficiency against typical enterprise-scale data volumes. In general, the format's limits far exceed the file sizes that most implementations will ever write. For example, the maximum Instance ID is 4,294,967,295.
This document focuses on describing the format itself; detailed technical choices, tests, examples and implementations are not covered here. Such documentation can be found at gbkf.rsm92.fr.
This version was released on August 24, 2025.
Format Specification
Description:
- Header: Main identifier of the file, essential to read the body.
- Body: Container of all the Keyed-Values.
- Footer: SHA-256 digest used to verify the file integrity.
Remark: The minimum valid GBKF file consists of only the header, with zero keyed-values and no footer. This represents a file with no payload and no integrity check. While the footer is optional, its use is strongly recommended to ensure file integrity.
Header
-
gbkf Identifier:
- Type: ASCII Lowercase character sequence
- Encoding: 1 byte per character using 7-bit ASCII (ISO 646)
- Size: 4 bytes
- Value: gbkf
- Description: Format Identifier.
-
gbkf Version:
- Type: Unsigned Integer
- Size: 1 byte
-
Values:
- Min: 0
- Max: 255
- Description: Version of the GBKF Specification. This version is 1.
-
Specification ID:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 4 bytes
-
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Field to distinguish top-level specifications. The list of reserved ranges is here.
-
Specification Version:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 2 bytes
-
Values:
- Min: 0
- Max: 65,535
- Default: 0
- Description: Version of the top-level specification.
-
Maximum Key Size:
- Type: Unsigned Integer
- Size: 1 byte
-
Values:
- Min: 1
- Max: 255
- Default: 1
- Description: Number of bytes allocated to store each key. A key shorter than this size shall be padded with zero bytes; when reading, the key ends at the first null character.
-
Number of Keyed-Values:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 4 bytes
-
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Total number of keyed-values stored before the SHA-256 footer.
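The header fields listed above can be sketched with Python's struct module. This is a minimal illustration, assuming the fields are written back to back in the order listed, with no padding between them (which gives 16 bytes in total). The names HEADER, pack_header and unpack_header are hypothetical and not part of the specification.

```python
import struct

# Assumption: fields are contiguous and in the order listed in the Header section.
# "<" selects little-endian with no padding: 4 + 1 + 4 + 2 + 1 + 4 = 16 bytes.
HEADER = struct.Struct("<4sBIHBI")


def pack_header(spec_id=0, spec_version=0, max_key_size=1,
                n_keyed_values=0, gbkf_version=1):
    """Build the main header; defaults mirror the defaults listed above."""
    if not 1 <= max_key_size <= 255:
        raise ValueError("Maximum Key Size must be between 1 and 255")
    return HEADER.pack(b"gbkf", gbkf_version, spec_id, spec_version,
                       max_key_size, n_keyed_values)


def unpack_header(data):
    """Read the main header from the first 16 bytes of data."""
    magic, version, spec_id, spec_version, max_key_size, count = \
        HEADER.unpack_from(data, 0)
    if magic != b"gbkf":
        raise ValueError("not a GBKF file")
    return {
        "gbkf_version": version,
        "specification_id": spec_id,
        "specification_version": spec_version,
        "maximum_key_size": max_key_size,
        "number_of_keyed_values": count,
    }
```

Called with no arguments, pack_header() produces the minimum valid file described in the Remark: a header with zero keyed-values and no footer.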
Body
The body contains the real payload, and all its content is managed through a Key-(Instance)-Value architecture. It can contain from zero to 4,294,967,295 Key-Instance-Values.
Keyed-Value
Each keyed-value is composed of a common header, followed by a data structure specific to the type (Integer, String, etc.) being stored.
Keyed-Values' Header
This header is common¹ to all Keyed-Values. It defines the data type and size, and makes it possible to build a mapping table that simplifies reading the file.
Additionally, the header stores an Instance ID so that the top-level specification can group data by key or by instance ID.
¹ The header for Strings and Booleans is extended.
-
Key:
- Type: ASCII character sequence
- Encoding: 1 byte per character using 7-bit ASCII (ISO 646)
- Size: <Maximum Key Size> bytes, as defined in the main Header.
- Description: Identifier of the values. It gives a human-friendly way to identify the kind of data stored.
-
Instance ID:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 4 Bytes
-
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Field to distinguish data sharing the same key. This makes it possible to group data across different keys, or to write sequential data under the same key.
- Type of the Values:
-
Number of Values:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 4 Bytes
-
Values:
- Min: 0
- Max: 4,294,967,295
- Default: 0
- Description: Number of values held by the key-instance ID pair.
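The Key field can be illustrated with a short sketch of the zero-padding rule: a key shorter than the Maximum Key Size is padded with zero bytes, and a reader stops at the first null character. The function names are hypothetical. The remaining header fields are left out because the size of the type field is not stated in this section.

```python
def encode_key(key: str, max_key_size: int) -> bytes:
    """Encode a key as 7-bit ASCII, zero-padded to max_key_size bytes."""
    raw = key.encode("ascii")  # raises UnicodeEncodeError on non-ASCII input
    if not raw or len(raw) > max_key_size:
        raise ValueError("key must be 1 to max_key_size bytes long")
    if b"\x00" in raw:
        raise ValueError("key must not contain a null character")
    return raw.ljust(max_key_size, b"\x00")


def decode_key(field: bytes) -> str:
    """Decode a key field; the key ends at the first null character."""
    return field.split(b"\x00", 1)[0].decode("ascii")
```

A key that fills the whole field has no terminating null, which is why decode_key also accepts a field without any zero byte.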
Keyed-Values' Data
Type BLOB
The binary data is stored without transformation or encoding. Each byte is treated as one value, and the "Number of Values" field in the Keyed-Value header represents the total number of bytes in the payload.
Type BOOLEAN
Booleans are packed in groups of 8 bits, each group written as one byte. Because the last byte may be only partly filled, it is necessary to store the number of useful bits in the last byte.
-
Useful bits in the last byte:
- Type: Unsigned Integer
- Size: 1 Byte
- Values: 1 to 8
- Description: Number of useful bits in the last byte.
-
Values:
- Type: Unsigned Integer
- Size: 1 Byte (Per group of 8 booleans)
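The packing described above can be sketched as follows. The bit order inside each byte is not stated in this section, so the sketch assumes the first boolean of each group goes to the least significant bit; a real implementation must follow whatever order the format actually mandates. The function names are hypothetical, and the unpacking function expects a payload that has already been delimited (the "useful bits" byte followed by the packed bytes).

```python
def pack_booleans(flags):
    """Return the 'useful bits' byte followed by the packed boolean bytes."""
    flags = list(flags)
    if not flags:
        # The 'useful bits' field ranges from 1 to 8, so an empty list
        # has no representation in this sketch.
        raise ValueError("at least one boolean is required")
    packed = bytearray()
    for start in range(0, len(flags), 8):
        byte = 0
        for bit, flag in enumerate(flags[start:start + 8]):
            if flag:
                byte |= 1 << bit  # ASSUMPTION: least significant bit first
        packed.append(byte)
    useful_bits = len(flags) % 8 or 8  # a full last byte has 8 useful bits
    return bytes([useful_bits]) + bytes(packed)


def unpack_booleans(data):
    """Inverse of pack_booleans for an already-delimited payload."""
    useful_bits, payload = data[0], data[1:]
    total = (len(payload) - 1) * 8 + useful_bits
    return [bool((payload[i // 8] >> (i % 8)) & 1) for i in range(total)]
```

With ten booleans, two bytes are written and the "useful bits" field is 2, so a reader ignores the upper six bits of the second byte.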
Type STRING
There are two types of strings: fixed-size and dynamic-size. In both cases, they start with the following field:
-
Maximum String Size:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 2 Bytes
- Values:
- 0 for Dynamic-sized strings
- 1 to 65,535 for Fixed-Sized strings
- Description:
- If the value is equal to 0, it means the string has a Dynamic size.
- If the value is equal to or greater than 1, it means the string has a Fixed size.
- The field represents the maximum number of bytes that can be allocated to the string.
- It must not take into account the null character.
- In UTF-8, a character takes at most 4 bytes, so a simple rule is:
Maximum String Size = Max Nb. of characters * 4
Fixed-Size String
Fixed-Size strings are written with a fixed pre-allocated space, initialized with zeros.
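A minimal sketch of fixed-size strings follows. It assumes that each string occupies exactly Maximum String Size bytes, that strings are UTF-8 encoded, and that no extra terminating null is written after the slot; none of these points is spelled out in this section, so treat them as assumptions. The function names are hypothetical.

```python
import struct


def encode_fixed_strings(strings, max_size):
    """Write the Maximum String Size field, then one zero-padded slot per string."""
    if not 1 <= max_size <= 65535:
        raise ValueError("fixed-size strings need a size between 1 and 65,535")
    out = bytearray(struct.pack("<H", max_size))  # 2 bytes, little-endian
    for text in strings:
        raw = text.encode("utf-8")
        if len(raw) > max_size:
            raise ValueError("string does not fit in the allocated size")
        out += raw.ljust(max_size, b"\x00")  # ASSUMPTION: slot is exactly max_size bytes
    return bytes(out)


def decode_fixed_strings(data, count):
    """Read count strings; each one ends at its first null byte, if any."""
    (max_size,) = struct.unpack_from("<H", data, 0)
    result = []
    for index in range(count):
        start = 2 + index * max_size
        slot = data[start:start + max_size]
        result.append(slot.split(b"\x00", 1)[0].decode("utf-8"))
    return result
```

Because every slot has the same size, the n-th string can be located directly at offset 2 + n * Maximum String Size without reading the strings before it.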
Dynamic-Size String
-
Total number of bytes:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 4 Bytes
- Description:
$$\text{Total number of bytes} = \sum_{i=1}^{\text{Number of values}} \text{String}_i\ \text{size}$$
This field is needed to know the total size of the values and to move to the next keyed-value.
-
String Size:
- Type: Unsigned Integer
- Encoding: Little-endian
- Size: 2 Bytes
- Description: Number of bytes needed to store the string (without the null character).
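The dynamic-size layout can be sketched as below. The sketch makes several assumptions that this section does not confirm: the size fields are little-endian like the other integer fields in this document, each String Size field is written immediately before its string, the strings are UTF-8 encoded, and the total counts only the string bytes (not the 2-byte size prefixes), as the sum suggests. The function names are hypothetical.

```python
import struct


def encode_dynamic_strings(strings):
    """Maximum String Size (0), total byte count, then size-prefixed strings."""
    encoded = [text.encode("utf-8") for text in strings]
    for raw in encoded:
        if len(raw) > 0xFFFF:
            raise ValueError("a string size must fit in 2 bytes")
    total = sum(len(raw) for raw in encoded)  # ASSUMPTION: prefixes not counted
    out = bytearray(struct.pack("<HI", 0, total))  # 0 marks dynamic-size strings
    for raw in encoded:
        out += struct.pack("<H", len(raw)) + raw
    return bytes(out)


def decode_dynamic_strings(data, count):
    """Read count strings written by encode_dynamic_strings."""
    max_size, total = struct.unpack_from("<HI", data, 0)
    if max_size != 0:
        raise ValueError("not a dynamic-size string block")
    offset, result = 6, []
    for _ in range(count):
        (size,) = struct.unpack_from("<H", data, offset)
        offset += 2
        result.append(data[offset:offset + size].decode("utf-8"))
        offset += size
    return result
```

Under these assumptions, a reader that only wants to skip the block can jump ahead by the total number of bytes plus 2 bytes per value, without decoding any string.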
Type INTEGER / UNSIGNED INTEGER
All integers are written in sequence and have the same structure. The size and range of each value follow from its Type.
-
Values:
- Type: Integer / Unsigned Integer
- Encoding: Little-endian
- Size:
- 1: for uint8 / int8
- 2: for uint16 / int16
- 4: for uint32 / int32
- 8: for uint64 / int64
-
Ranges:
- int8: −128 to 127
- int16: −32,768 to 32,767
- int32: −2,147,483,648 to 2,147,483,647
- int64: −9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
- uint8: 0 to 255
- uint16: 0 to 65,535
- uint32: 0 to 4,294,967,295
- uint64: 0 to 18,446,744,073,709,551,615
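The sizes and ranges above map directly onto Python's struct format codes. The following sketch assumes signed values use two's complement, which the listed ranges imply but this section does not state. The type names used as dictionary keys are illustrative labels, not the on-disk type codes.

```python
import struct

# Illustrative mapping from type name to struct format code.
INT_FORMATS = {
    "int8": "b", "uint8": "B",
    "int16": "h", "uint16": "H",
    "int32": "i", "uint32": "I",
    "int64": "q", "uint64": "Q",
}


def pack_integers(type_name, values):
    """Write the values in sequence, little-endian; struct.error if out of range."""
    code = INT_FORMATS[type_name]
    return struct.pack(f"<{len(values)}{code}", *values)


def unpack_integers(type_name, data):
    """Read back every value contained in data."""
    code = INT_FORMATS[type_name]
    size = struct.calcsize("<" + code)
    count = len(data) // size
    return list(struct.unpack(f"<{count}{code}", data))
```

struct enforces the ranges itself, so packing 256 as uint8 raises struct.error instead of silently truncating the value.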
Type FLOAT32
Float32 values are read and written in sequence using IEEE 754 single-precision, and only finite normalized values are supported.
-
Values:
- Type: Float32
- Encoding: Little-endian
- Size: 4 Bytes
-
Ranges:
- Min: -3.4028235e+38
- Max: 3.4028235e+38
Type FLOAT64
Float64 values are read and written in sequence using IEEE 754 double-precision, and only finite normalized values are supported.
-
Values:
- Type: Float64
- Encoding: Little-endian
- Size: 8 Bytes
-
Ranges:
- Min: -1.7976931348623157e+308
- Max: 1.7976931348623157e+308
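The "finite normalized" restriction for both float types can be checked before writing, as in the sketch below. It rejects NaN, infinities, values that overflow the target type, and subnormal values. Whether zero counts as supported is not stated here; the sketch accepts it, which is an assumption. The function names are hypothetical.

```python
import math
import struct

# struct format and smallest positive normal value for each type.
FLOAT_FORMATS = {
    "float32": ("<f", 2.0 ** -126),
    "float64": ("<d", 2.0 ** -1022),
}


def is_supported(value, type_name):
    """True if value is finite and not subnormal once stored as type_name."""
    fmt, min_normal = FLOAT_FORMATS[type_name]
    if not math.isfinite(value):
        return False
    if value == 0.0:
        return True  # ASSUMPTION: zero is accepted although it is not a normal number
    try:
        (stored,) = struct.unpack(fmt, struct.pack(fmt, value))
    except OverflowError:
        return False  # too large for the target type
    return math.isfinite(stored) and abs(stored) >= min_normal


def pack_floats(type_name, values):
    """Write the values in sequence, little-endian, after validating each one."""
    fmt, _ = FLOAT_FORMATS[type_name]
    for value in values:
        if not is_supported(value, type_name):
            raise ValueError(f"{value!r} is not a supported {type_name} value")
    return b"".join(struct.pack(fmt, value) for value in values)
```

The check round-trips each value through the target type first, so a number such as 1e-40, which is a normal float64 but a subnormal float32, is rejected only for float32.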
Footer
The footer is a SHA-256 digest (32 bytes) computed over the header and the body. It is used to verify the file integrity.
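Appending and checking the footer can be sketched with Python's hashlib. Since the footer is optional, a reader has to know that one is present before verifying it; this sketch simply assumes it is. The function names are hypothetical.

```python
import hashlib

FOOTER_SIZE = 32  # a SHA-256 digest is 32 bytes long


def add_footer(header_and_body: bytes) -> bytes:
    """Append the SHA-256 digest of the header and body."""
    return header_and_body + hashlib.sha256(header_and_body).digest()


def verify_footer(file_bytes: bytes) -> bool:
    """Check the trailing digest; assumes the file was written with a footer."""
    if len(file_bytes) < FOOTER_SIZE:
        return False
    content, footer = file_bytes[:-FOOTER_SIZE], file_bytes[-FOOTER_SIZE:]
    return hashlib.sha256(content).digest() == footer
```

Changing a single byte anywhere in the header or body makes verify_footer return False.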