Thanks for this reference, it's nice to see new development in this area.
I like the language-agnostic approach, and the format seems to cater for many key concepts. It's also nice that it automatically generates parsers from a specification.
Kaitai Struct seems to be a bit low level, though, probably to make it more machine readable.
Let's take their example for a GIF file format:
Code: Select all
meta:
  id: gif
  file-extension: gif
  endian: le
seq:
  - id: header
    type: header
  - id: logical_screen
    type: logical_screen
types:
  header:
    seq:
      - id: magic
        contents: 'GIF'
      - id: version
        size: 3
  logical_screen:
    seq:
      - id: image_width
        type: u2
      - id: image_height
        type: u2
      - id: flags
        type: u1
      - id: bg_color_index
        type: u1
      - id: pixel_aspect_ratio
        type: u1
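To make it concrete what such a spec describes, here is a rough hand-written Python sketch of the reads involved. This is not the code the Kaitai compiler generates, just my own illustration using the standard struct module; the function name parse_gif_start and the sample bytes are made up for the example.

```python
import struct

# Layout taken from the spec above: 3-byte magic, 3-byte version,
# then u2, u2, u1, u1, u1. "<" means little-endian, no padding.
HEADER = struct.Struct("<3s3s")
LOGICAL_SCREEN = struct.Struct("<HHBBB")


def parse_gif_start(data: bytes) -> dict:
    """Parse the header and the logical screen from the start of a GIF."""
    magic, version = HEADER.unpack_from(data, 0)
    if magic != b"GIF":
        raise ValueError("bad magic, not a GIF file")
    (image_width, image_height, flags,
     bg_color_index, pixel_aspect_ratio) = LOGICAL_SCREEN.unpack_from(
        data, HEADER.size)
    return {
        "header": {"magic": magic, "version": version},
        "logical_screen": {
            "image_width": image_width,
            "image_height": image_height,
            "flags": flags,
            "bg_color_index": bg_color_index,
            "pixel_aspect_ratio": pixel_aspect_ratio,
        },
    }


# Hypothetical sample: only the first 13 bytes of a GIF, not a full file.
sample = b"GIF89a" + bytes([0x02, 0x00, 0x01, 0x00, 0x80, 0x00, 0x00])
parsed = parse_gif_start(sample)
print(parsed["header"]["version"], parsed["logical_screen"])
```

Note that the format strings carry the same information as the type and size keys in the spec, only in a much denser form.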
In more common format specification languages, which are based on programming-language datatype definitions, the same GIF description would look something like this (keeping the meta-info as is):
Code: Select all
meta:
  id: gif
  file-extension: gif
  endian: le
struct header
  ascii_char[3] magic = 'GIF'
  uint8_t[3] version
struct logical_screen
  uint16_t image_width
  uint16_t image_height
  uint8_t flags
  uint8_t bg_color_index
  uint8_t pixel_aspect_ratio
I think the intent is easier to understand when it's written in a way similar to the last example. This is because datatypes are always explicit (in the Kaitai Struct version, magic and version have no explicit type), and the visual structure of the text already suggests the structure of the file format.
If you want to express a hierarchy, you could do so as well, using indentation:
Code: Select all
struct header
  ascii_char[3] magic = 'GIF'
  uint8_t[3] version
  struct logical_screen
    uint16_t image_width
    uint16_t image_height
    uint8_t flags
    uint8_t bg_color_index
    uint8_t pixel_aspect_ratio
Here you have a top-level structure header that has a sub-structure logical_screen. (Kaitai Struct calls this type and sub-type.)
Alternatively, you could add a structure member named logical_screen with the structure type logical_screen_t like this:
Code: Select all
struct header
  ascii_char[3] magic = 'GIF'
  uint8_t[3] version
  struct logical_screen_t logical_screen
    uint16_t image_width
    uint16_t image_height
    uint8_t flags
    uint8_t bg_color_index
    uint8_t pixel_aspect_ratio
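The member variant maps quite directly onto datatype definitions in existing languages. As a rough sketch, here is how it could look with Python's ctypes module. The class names Header and LogicalScreenT and the sample bytes are my own choices for the example, and I'm assuming a byte-packed, little-endian layout as stated in the meta section.

```python
import ctypes


class LogicalScreenT(ctypes.LittleEndianStructure):
    _pack_ = 1  # assumption: no padding, fields sit back to back as in the file
    _fields_ = [
        ("image_width", ctypes.c_uint16),
        ("image_height", ctypes.c_uint16),
        ("flags", ctypes.c_uint8),
        ("bg_color_index", ctypes.c_uint8),
        ("pixel_aspect_ratio", ctypes.c_uint8),
    ]


class Header(ctypes.LittleEndianStructure):
    _pack_ = 1
    _fields_ = [
        ("magic", ctypes.c_char * 3),
        ("version", ctypes.c_uint8 * 3),
        # Member named logical_screen with the structure type LogicalScreenT.
        ("logical_screen", LogicalScreenT),
    ]


# Hypothetical sample: only the first 13 bytes of a GIF, not a full file.
sample = b"GIF89a" + bytes([0x02, 0x00, 0x01, 0x00, 0x80, 0x00, 0x00])
header = Header.from_buffer_copy(sample)

print(ctypes.sizeof(Header))
print(header.magic, bytes(header.version))
print(header.logical_screen.image_width, header.logical_screen.image_height)
```

Unlike the specification text, this does not check that magic equals 'GIF'; a constraint like that would have to be verified separately after loading.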
There are many binary file specification formats. One of the most complete is FlexT (although some of its syntax is a bit hard to parse "mentally"):
http://hmelnov.icc.ru/FlexT/index.eng.html
Its documentation is not complete, and its main description is in Russian. Some documentation can be found here:
http://hmelnov.icc.ru/FlexT/FLEXT_CSCC99.htm
But especially when you look through the format specifications, you see the breadth of concepts that are supported:
http://geos.icc.ru:8080/scripts/WWWBinV.dll/Cat
The link above seems to be dead; this appears to be the current one (as of 15.9.2019):
http://hmelnov.icc.ru/geos/scripts/WWWBinV.dll/Cat