Suggestion for HxD:User templates for data inspector

asmix
Posts: 10
Joined: 02 Dec 2008 02:40

Suggestion for HxD:User templates for data inspector

Post by asmix »

It would be great to have the ability to define user templates for different kinds of data; it would greatly simplify data interpretation.
A simple script language would probably be enough, e.g.:

Code: Select all

Struct         32              ;Size of structure
Offset         128             ;Offset in current block (disk sector) or repetitive 
Magic          4D 41 47 49 43  ;Optional
Uint16         Some data
Uint32         Some other data
Color          #ff3333
String19       Some string
Bit16b         Some bits       ;Wide big-endian bit field
End
It would also be nice to have an option to change the Data inspector's font and colors.

But anyway, it's good to see that progress is being made.
Great work, Maël!

akrutsinger
Posts: 3
Joined: 01 Jun 2016 20:03

Re: Suggestion for HxD:User templates for data inspector

Post by akrutsinger »

Agreed! There is no need to reinvent a method, though; take advantage of a format/structure that is already available: http://kaitai.io/

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: Suggestion for HxD:User templates for data inspector

Post by Maël »

Thanks for this reference, it's nice to see new development in this area.
I like the language-agnostic approach, and the format seems to cater for many key concepts. It's also nice that it automatically creates parsers out of a specification.
Kaitai Struct seems to be a bit low-level, though, probably to make it more machine-readable.

Let's take their example for a GIF file format:

Code: Select all

meta:
  id: gif
  file-extension: gif
  endian: le
seq:
  - id: header
    type: header
  - id: logical_screen
    type: logical_screen
types:
  header:
    seq:
      - id: magic
        contents: 'GIF'
      - id: version
        size: 3
  logical_screen:
    seq:
      - id: image_width
        type: u2
      - id: image_height
        type: u2
      - id: flags
        type: u1
      - id: bg_color_index
        type: u1
      - id: pixel_aspect_ratio
        type: u1
In more common format specification languages, which are based on programming language datatype definitions, this would become something similar to this (when keeping the meta-info as is):

Code: Select all

meta:
  id: gif
  file-extension: gif
  endian: le
  
struct header
  ascii_char[3] magic = 'GIF'
  uint8_t[3] version
  
struct logical_screen
  uint16_t image_width
  uint16_t image_height
  uint8_t flags
  uint8_t bg_color_index
  uint8_t pixel_aspect_ratio
I think it's easier to understand what the intent is, if it's written in ways similar to the one shown last. This is because datatypes are always explicit (magic and version are not in Kaitai Struct), and the visual structure of the text already suggests the structure of the file/file format.
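To illustrate the same layout with explicit types in runnable form, here is a minimal Python sketch that reads the two structures with the standard `struct` module. The function name `parse_gif_header` and the returned dictionary keys are only illustrative assumptions that mirror the field names above; this is not how Kaitai Struct or HxD would implement it.

```python
import struct


def parse_gif_header(data: bytes) -> dict:
    """Parse the 'header' and 'logical_screen' structures (13 bytes in total)."""
    if len(data) < 13:
        raise ValueError("need at least 13 bytes")
    # header: 3-byte magic followed by a 3-byte version
    magic, version = struct.unpack_from("<3s3s", data, 0)
    if magic != b"GIF":
        raise ValueError("magic does not match 'GIF'")
    # logical_screen: two little-endian u2 fields, then three u1 fields
    width, height, flags, bg_color_index, pixel_aspect_ratio = struct.unpack_from(
        "<HHBBB", data, 6
    )
    return {
        "version": version,
        "image_width": width,
        "image_height": height,
        "flags": flags,
        "bg_color_index": bg_color_index,
        "pixel_aspect_ratio": pixel_aspect_ratio,
    }
```

The format string `"<HHBBB"` carries the same information as the five typed fields of `logical_screen`, with `<` standing in for `endian: le`.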

If you want to express a hierarchy, you could do so as well, using indentation:

Code: Select all

struct header
  ascii_char[3] magic = 'GIF'
  uint8_t[3] version
  
  struct logical_screen
    uint16_t image_width
    uint16_t image_height
    uint8_t flags
    uint8_t bg_color_index
    uint8_t pixel_aspect_ratio
Here you have a top-level structure header that has a sub-structure logical_screen. (Kaitai Struct calls this type and sub-type.)

Alternatively, you could add a structure member named logical_screen with the structure type logical_screen_t like this:

Code: Select all

struct header
  ascii_char[3] magic = 'GIF'
  uint8_t[3] version
  
  struct logical_screen_t logical_screen
    uint16_t image_width
    uint16_t image_height
    uint8_t flags
    uint8_t bg_color_index
    uint8_t pixel_aspect_ratio
There are many binary file specification formats; one of the most complete is FlexT (although some of its syntax is a bit hard to parse "mentally"):
http://hmelnov.icc.ru/FlexT/index.eng.html

Its documentation is incomplete, and its main description is in Russian. Some documentation can be found here:
http://hmelnov.icc.ru/FlexT/FLEXT_CSCC99.htm

But especially when you look through the format specifications, you see the breadth of concepts that are supported:
http://geos.icc.ru:8080/scripts/WWWBinV.dll/Cat
The link above seems to be dead; this appears to be the current one (as of 15.9.2019):
http://hmelnov.icc.ru/geos/scripts/WWWBinV.dll/Cat

akrutsinger
Posts: 3
Joined: 01 Jun 2016 20:03

Re: Suggestion for HxD:User templates for data inspector

Post by akrutsinger »

I agree that using explicit types makes the most sense and I personally prefer it to alleviate any potential ambiguity. Most of all I hope you are interested in adding this capability to HxD :D :D

DMan
Posts: 1
Joined: 28 Aug 2017 15:02

Re: Suggestion for HxD:User templates for data inspector

Post by DMan »

Would love to see this feature too. For reference, the commercial "010 Editor" has this feature, which it calls "binary templates":

https://www.sweetscape.com/010editor/#templates

Example of a template:
https://gist.github.com/mattifestation/ ... 45ae02eccb

n8d06
Posts: 1
Joined: 22 Aug 2019 16:08

Re: Suggestion for HxD:User templates for data inspector

Post by n8d06 »

I would also like to vote for this feature! Very helpful when looking at image files, or any "standard" files. Having a simple definition language would be very useful when reverse engineering files or when creating custom file types. Let me know if you need any help implementing this. HxD rocks!

JakeTheDog
Posts: 1
Joined: 15 Feb 2021 10:16

Re: Suggestion for HxD:User templates for data inspector

Post by JakeTheDog »

This would be a great feature, making HxD much more versatile and letting the community contribute templates.
On my Mac I'm using https://ridiculousfish.com/hexfiend/ which has this feature, and it is really helpful.
Maybe this Tcl-based format could be used, given that the project is BSD-licensed?
This way both hex editors could leverage the same templates: https://github.com/HexFiend/HexFiend/tr ... /templates

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: Suggestion for HxD:User templates for data inspector

Post by Maël »

Thanks for your suggestion.
Over time I looked at many different syntaxes, and decided against the common choice of using imperative-style languages.

I think a datatype, which (file) structure definitions are, should be described as declaratively as possible, even if it has dynamic aspects. It will also likely be more expressive/powerful than most structure definition languages I saw so far. Unless of course they are essentially a full programming language.

But people should be able to write translators.

The main advantage is that this allows for declaring the intent, and optimizing the implementation independently of that (always to a degree, of course). The other advantage is that parsers could be generated automatically from the declarative syntax, so you could, for example, quickly define the file structure, then automatically get a file parser in your preferred language as a class/library to include.

A bit like a compiler, but for making structured file readers/modifiers.

It would also allow checking files for consistency, and annotating definitions so that files could be changed/resized while staying valid and consistent.

These are options for the future; for now I will focus on making a read-only structure viewer.

In the next post I'll show what I have currently.

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: HxD structure definition (HSD)

Post by Maël »

As mentioned I am working on a feature for creating a structure viewer/editor.

PE (Portable Executable) files currently serve as the test case for determining the necessary functionality (support for other file formats such as PNG, along with matching features, will be added).

So far you can define dynamic arrays and structures with dynamic size, where other parts/fields in the file define the size, and pointers are dereferenced automatically.

There is also a feature to map pointers using a function (currently only built-in ones). For PE files this allows mapping RVAs (relative virtual addresses) to absolute file offsets.

All of the file structure is given in a declarative language, called HxD structure definition (HSD).

A functional example for parsing PE headers is given below:

Code: Select all

types
  PVirtualAddress = pointer<UInt32, UInt32>


  IMAGE_DATA_DIRECTORY = struct {
    VirtualAddress: UInt32;
    Size: UInt32;
  }

  IMAGE_FILE_HEADER = struct {
    Machine: UInt16;
    NumberOfSections: UInt16;
    TimeDateStamp: UInt32;
    PointerToSymbolTable: UInt32;
    NumberOfSymbols: UInt32;
    SizeOfOptionalHeader: UInt16;
    Characteristics: UInt16;
  }

  IMAGE_OPTIONAL_HEADER32 = struct {
    Magic: UInt16;
    MajorLinkerVersion: UInt8;
    MinorLinkerVersion: UInt8;
    SizeOfCode: UInt32;
    SizeOfInitializedData: UInt32;
    SizeOfUninitializedData: UInt32;
    AddressOfEntryPoint: UInt32;
    BaseOfCode: UInt32;
    BaseOfData: UInt32;

    ImageBase: UInt32;
    SectionAlignment: UInt32;
    FileAlignment: UInt32;
    MajorOperatingSystemVersion: UInt16;
    MinorOperatingSystemVersion: UInt16;
    MajorImageVersion: UInt16;
    MinorImageVersion: UInt16;
    MajorSubsystemVersion: UInt16;
    MinorSubsystemVersion: UInt16;
    Win32VersionValue: UInt32;
    SizeOfImage: UInt32;
    SizeOfHeaders: UInt32;
    CheckSum: UInt32;
    Subsystem: UInt16;
    DllCharacteristics: UInt16;
    SizeOfStackReserve: UInt32;
    SizeOfStackCommit: UInt32;
    SizeOfHeapReserve: UInt32;
    SizeOfHeapCommit: UInt32;
    LoaderFlags: UInt32;
    NumberOfRvaAndSizes: UInt32;
    DataDirectory: IMAGE_DATA_DIRECTORY[:NumberOfRvaAndSizes];
  }

  IMAGE_NT_HEADERS32 = struct {
    Signature: UInt8[4];
    FileHeader: IMAGE_FILE_HEADER;
    OptionalHeader: IMAGE_OPTIONAL_HEADER32;
  }


  PIMAGE_NT_HEADERS32 = pointer<UInt32, IMAGE_NT_HEADERS32>

  IMAGE_DOS_HEADER = struct {
    e_magic: UInt8[2];
    e_cblp: UInt16;
    e_cp: UInt16;
    e_crlc: UInt16;
    e_cparhdr: UInt16;
    e_minalloc: UInt16;
    e_maxalloc: UInt16;
    e_ss: UInt16;
    e_sp: UInt16;
    e_csum: UInt16;
    e_ip: UInt16;
    e_cs: UInt16;
    e_lfarlc: UInt16;
    e_ovno: UInt16;
    e_res: UInt16[4];
    e_oemid: UInt16;
    e_oeminfo: UInt16;
    e_res2: UInt16[10];
    _lfanew: UInt32;
  }

  IMAGE_SECTION_HEADER = struct {
    Name: Char8Ansi[8];
    Misc_PhysicalAddressOrVirtualSize: UInt32;
    VirtualAddress: UInt32;
    SizeOfRawData: UInt32;
    PointerToRawData: UInt32;
    PointerToRelocations: UInt32;
    PointerToLinenumbers: UInt32;
    NumberOfRelocations: UInt16;
    NumberOfLinenumbers: UInt16;
    Characteristics: UInt32;
  }

  IMAGE_IMPORT_DESCRIPTOR = struct {
    OriginalFirstThunk_ImportLookupTable_RVA: UInt32;
    TimeDateStamp: UInt32;
    ForwarderChain: UInt32;
    Name_RVA: UInt32;
    FirstThunk_ImportAddressTable_RVA: UInt32;
  }

  OVERALL_FILE = struct {
    ImageDosHeader: IMAGE_DOS_HEADER;
    ImageNtHeaders32: IMAGE_NT_HEADERS32 @ :ImageDosHeader._lfanew;                                                        
    ImageSectionHeaders: IMAGE_SECTION_HEADER[:ImageNtHeaders32.FileHeader.NumberOfSections];
  }

instances
  $root: OVERALL_FILE
The pictures below show how this is parsed/visualized for my PropEdit.exe (but the solution is generic and works with any 32-bit PE file).
[Attached images: structedit1.png, structedit2.png, structedit3.png, structedit4.png, structedit5.png]

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: Suggestion for HxD:User templates for data inspector

Post by Maël »

What is special (besides being able to define dynamic structures and automatically parse files accordingly) are expressions like this:

Code: Select all

DataDirectory: IMAGE_DATA_DIRECTORY[:NumberOfRvaAndSizes];
As you can see in the fourth picture in the post above, the array size of DataDirectory dynamically depends on the field NumberOfRvaAndSizes, which was defined earlier in the file, and is displayed accordingly. Whereas in a traditional programming language you would need additional code to handle the dynamic nature of the data structure, you can do it declaratively in HSD.

The position of ImageNtHeaders32 is dependent on ImageDosHeader._lfanew, which is a file-dependent offset/pointer and would also require imperative code in traditional languages. In HSD this declaration suffices:

Code: Select all

ImageNtHeaders32: IMAGE_NT_HEADERS32 @ :ImageDosHeader._lfanew;
The result can be seen in the second picture in the post above.

Code: Select all

ImageSectionHeaders: IMAGE_SECTION_HEADER[:ImageNtHeaders32.FileHeader.NumberOfSections];
ImageSectionHeaders is defined dynamically as well, but in such a way that it depends on the two preceding dynamic declarations (their size and position, which in turn depend on sibling/previous fields). This is because ImageSectionHeaders is the third field in OVERALL_FILE and the preceding fields are of dynamic size, so the position of ImageSectionHeaders adapts accordingly. The size of ImageSectionHeaders is also dynamic, as can be seen from the expression between [ and ].
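For comparison, this is roughly the imperative code that such declarations replace. It is only a sketch in Python using the standard `struct` module; the function name `locate_section_headers` is made up for illustration, and the offsets are derived from the field sizes in the PE definitions shown earlier. It locates the section table the way the PE format defines it, via SizeOfOptionalHeader, rather than by summing up the parsed fields.

```python
import struct


def locate_section_headers(data: bytes) -> tuple:
    """Return (file offset of the first section header, number of sections)."""
    # IMAGE_DOS_HEADER._lfanew is the UInt32 at offset 0x3C (60 bytes of fields precede it).
    (lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[lfanew:lfanew + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found at _lfanew")
    # IMAGE_FILE_HEADER starts right after the 4-byte signature and is 20 bytes long.
    file_header = lfanew + 4
    (number_of_sections,) = struct.unpack_from("<H", data, file_header + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", data, file_header + 16)
    # The section headers follow the variable-size optional header.
    section_table = file_header + 20 + size_of_optional_header
    return section_table, number_of_sections
```

Every offset computation here is something the declarative `@ :ImageDosHeader._lfanew` and `[:...NumberOfSections]` expressions state directly.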

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: Suggestion for HxD:User templates for data inspector

Post by Maël »

Later I added the ability to define pointers which automatically use a function to map addresses, with this syntax:

Code: Select all

PVirtualAddress = pointer<UInt32, UInt32, RVAToFilePointer>
The first UInt32 defines the address size (this is settable, since you cannot assume that all pointers in a file have the same bit width or even type, as opposed to pointers in normal code, which always follow the CPU's constraints).
The second one is the datatype of the pointer target; to keep things simple for now, it is just a UInt32 (but it will be a structure type later).
Finally, RVAToFilePointer is the function that does the mapping from the RVAs stored in the file to absolute file offsets.

RVAToFilePointer is a built-in function that gets called whenever a mapping is needed, but I'll expand this to allow for simple functions, that can be declared in HSD, as well.
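As a rough idea of what such a mapping function has to do, here is a minimal Python sketch based on the general PE rule that an RVA is resolved through the section table. It is not the actual implementation of the built-in RVAToFilePointer; the names `Section` and `rva_to_file_offset` are hypothetical, and the fields correspond to IMAGE_SECTION_HEADER members from the definition above.

```python
from typing import NamedTuple, Optional


class Section(NamedTuple):
    virtual_address: int      # IMAGE_SECTION_HEADER.VirtualAddress
    size_of_raw_data: int     # IMAGE_SECTION_HEADER.SizeOfRawData
    pointer_to_raw_data: int  # IMAGE_SECTION_HEADER.PointerToRawData


def rva_to_file_offset(rva: int, sections: list) -> Optional[int]:
    """Map an RVA to an absolute file offset, or None if no section backs it."""
    for section in sections:
        delta = rva - section.virtual_address
        # Only the part of a section that has raw data exists in the file.
        if 0 <= delta < section.size_of_raw_data:
            return section.pointer_to_raw_data + delta
    return None
```

For example, with a section at RVA 0x2000 whose raw data starts at file offset 0xA00, the RVA 0x2010 maps to 0xA10.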

With this new ability, the data directories in the PE file can now be defined as follows, and will be properly found in the structure viewer:

Code: Select all

IMAGE_DATA_DIRECTORY = struct {
  VirtualAddress: PVirtualAddress;
  Size: UInt32;
}

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: Suggestion for HxD:User templates for data inspector

Post by Maël »

Support variable-width datatypes in structures

HSD should also support basic datatypes of variable width, such as (U)LEB128 or VINT (EBML), and dynamically adapt the size of structs that include them, just as it already does with byte arrays of dynamic size. (There are many kinds of variable-width integers, which is why EBML is added in parentheses to clarify which kind is meant.)

This is useful, for example, for Matroska/WebM files.
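To show why such a type makes the enclosing struct's size data-dependent, here is a small Python sketch of an unsigned LEB128 decoder. The function name `decode_uleb128` is just an illustration; the point is that the decoder returns the number of bytes consumed, which is exactly what a structure viewer would need to place the next field.

```python
def decode_uleb128(data: bytes, offset: int = 0) -> tuple:
    """Decode an unsigned LEB128 integer; return (value, number of bytes consumed)."""
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise ValueError("truncated ULEB128 value")
        byte = data[pos]
        pos += 1
        # The low 7 bits carry data, least significant group first.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte.
        if byte & 0x80 == 0:
            break
        shift += 7
    return result, pos - offset
```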

VINT(EBML):
https://github.com/ietf-wg-cellar/ebml- ... ze-integer
https://stackoverflow.com/questions/605 ... ile-binary

Another summary with some simple examples of the essentials of the Matroska File Format:
https://matroska.org/files/matroska_fil ... er_noe.pdf
One piece of information is stored the following way:

Code: Select all

typedef struct {
  vint       ID    // EBML-ID
  vint       size  // size of element
  char[size] data  // data
} EBML_ELEMENT;
VINT is essentially a big-endian encoded integer, with the leading bits reserved for specifying the length of the integer in octets/bytes.
So essentially a UInt8, UInt16, UInt24, UInt32, UInt40, UInt48, UInt56, UInt64 with the leading bits carrying the length information, and the remaining bits usable for the actual integer data (encoded in big endian order).
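The description above can be sketched in Python as follows. This is a minimal illustration, not code from libwebm or HxD; `decode_vint` and `read_element` are hypothetical names, and the sketch strips the length marker from both the ID and the size (EBML IDs are conventionally written with the marker bits included, e.g. 0x63A2).

```python
def decode_vint(data: bytes, offset: int = 0) -> tuple:
    """Decode an EBML VINT; return (value without length marker, length in octets)."""
    first = data[offset]
    if first == 0:
        raise ValueError("VINTs longer than 8 octets are not supported")
    # The number of leading zero bits plus one gives the length in octets.
    length = 9 - first.bit_length()
    if offset + length > len(data):
        raise ValueError("truncated VINT")
    # Mask off the leading zeros and the marker bit, keep the data bits.
    value = first & (0xFF >> length)
    for byte in data[offset + 1:offset + length]:
        value = (value << 8) | byte  # remaining octets are big-endian
    return value, length


def read_element(data: bytes, offset: int = 0) -> tuple:
    """Read one EBML_ELEMENT; return (id, payload bytes, offset of the next element)."""
    element_id, id_len = decode_vint(data, offset)
    size, size_len = decode_vint(data, offset + id_len)
    start = offset + id_len + size_len
    return element_id, data[start:start + size], start + size
```

Because `read_element` returns the offset of the next element, a sequence of elements can be walked without knowing any sizes in advance.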

GetCodedUIntSize code given here:
https://github.com/webmproject/libwebm/ ... uxerutil.h
https://chromium.googlesource.com/webm/ ... erutil.cpp
https://docs.rs/crate/webm-sys/0.1.0/so ... erutil.cpp

Matroska standard https://tools.ietf.org/id/draft-lhomme- ... ka-04.html

From a user request (with some slight edits): a typical structure in a Matroska/WEBM file could look like this:

Code: Select all

struct {
  int16 struct_key;
  VINT size_of_following_integer;
  VINT some_vinteger;
  byte somedata[];
}
// followed by another block, immediately
// (so offsets of the following struct depend on this struct's size -- which is dynamic, due to VINT fields)

Code: Select all

// Typical code pattern (from the test WebM muxer) for writing the 'Audio Codec Private' bytes.
// scratchbuf      = pointer to the output byte stream
// scratchbuf_used = count of bytes written so far

uint8_t codecprivate_hdr[2] = { 0x63, 0xA2 };
// serialize the header
memcpy(scratchbuf + scratchbuf_used, codecprivate_hdr, sizeof(codecprivate_hdr));
scratchbuf_used += sizeof(codecprivate_hdr);
// calculate how many bytes the size value needs
valbytesize = GetCodedUIntSize(codecpriv_sz);
// serialize the size
WriteUIntSize(scratchbuf + scratchbuf_used, codecpriv_sz, valbytesize);
scratchbuf_used += valbytesize;
// serialize the value
memcpy(scratchbuf + scratchbuf_used, codecpriv, codecpriv_sz);
scratchbuf_used += codecpriv_sz;
https://chromium.googlesource.com/webm/ ... r_tests.cc

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: Suggestion for HxD:User templates for data inspector

Post by Maël »

Other examples of variable-width integers are:

https://github.com/AljoschaMeyer/varu64
https://github.com/multiformats/unsigned-varint

possibly relevant issues:
https://github.com/multiformats/unsigne ... /issues/12
https://github.com/AljoschaMeyer/varu64/issues/1

Similarly, UTF-8 code points (which are also variable-width encodings) would have to be treated like dynamically sized arrays: the size of their parent elements/structs and the offsets of the following sibling fields would have to adapt accordingly.
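As a small illustration of the UTF-8 case, the following Python sketch derives the width of each code point from its lead byte and computes where each one starts. The function names are hypothetical, and continuation bytes are not validated here.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return the encoded length in bytes implied by a UTF-8 lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#04x}")


def code_point_offsets(data: bytes) -> list:
    """Return the start offset of every code point in a UTF-8 byte string."""
    offsets = []
    pos = 0
    while pos < len(data):
        offsets.append(pos)
        pos += utf8_sequence_length(data[pos])  # each width shifts all later offsets
    return offsets
```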

dukk
Posts: 2
Joined: 24 Feb 2021 14:39

Re: Suggestion for HxD:User templates for data inspector

Post by dukk »

The problem with the EBML (WebM) structure is not only the VINTs...

A structure definition has to deal with a "main" element (call it the header) and its corresponding lower-level sub-elements, like a matryoshka doll. Think of it as container-in-container.
In EBML terms, you serialize the sizes of the seek entries KaxInfo and KaxTracks (upper elements!) only after you serialize the lower elements (codec data).

Code: Select all

struct a {
  vint size_of_a;
  
  struct b {
    vint size_of_b;
  };
  
  struct c {
    vint size_of_c;
  };
}
size_of_a is written only after structures b and c have been written.
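The write order can be sketched in Python like this. It is a simplified model, not real Matroska muxer code: `encode_vint` and `serialize` are hypothetical helpers, element IDs are omitted as in the struct above, and the size is assumed to count only the bytes that follow the size field.

```python
def encode_vint(value: int) -> bytes:
    """Encode a size as an EBML VINT using the shortest possible length."""
    for length in range(1, 9):
        # The all-ones data pattern is reserved, hence the "- 1".
        if value < (1 << (7 * length)) - 1:
            marker = 1 << (7 * length)  # length marker bit in the first octet
            return (marker | value).to_bytes(length, "big")
    raise ValueError("value too large for an 8-octet VINT")


def serialize(payload: bytes, children: tuple = ()) -> bytes:
    """Serialize a container: children first, then prepend the now-known size."""
    body = payload + b"".join(children)
    return encode_vint(len(body)) + body


# b and c must be serialized before a, because size_of_a depends on their sizes.
b = serialize(b"BB")
c = serialize(b"CCC")
a = serialize(b"", (b, c))
```

A reader has the opposite view: all size fields are already present in the file, so it can descend from a into b and c directly.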

Maël
Site Admin
Posts: 1297
Joined: 12 Mar 2005 14:15

Re: Suggestion for HxD:User templates for data inspector

Post by Maël »

As far as I can tell, this is similar to various file formats that specify an overall file size, which can only be determined after the rest has been written, or a CRC that can only be computed once the main data has been written.

But the structure editor/viewer would not really have a problem with this, as the file would already be written properly/have valid data in the size fields.

You could add declarative functions that dynamically recompute the value of the size fields, or of any integer field, really (similar to how pointers in PE files can be mapped using tables: RVA to file offset).
But that would be an extension.

In the end most formats have some kind of nested structure, or headers that can only be written once the payload is fully clear/given.

Am I missing anything?

P.S.: I added code-tags around your code and reformatted it a bit. Is the nesting of the structs correct?
