Disassembly feature as a 3rd party extendable plug-in?

GregC
Posts: 31
Joined: 08 Oct 2020 04:27

Post by GregC »

The Intel disassembler is a nice feature!

I'm currently working with some Motorola MC6809 code, and can see how awesome it would be to have 6809 disassembly support, especially when doing file compares and analysing the differences.

I noticed someone else had requested MC68000 disassembler support. And of course there would be many others interested in other ISAs. For me right now, 6502 would be great too. :-)

Given the size of the task of integrating and maintaining wider support, my feature request would be for you to instead consider implementing the Disassembly feature as an externally defined plug-in, e.g. using descriptor files of some sort.

Perhaps a CPU descriptor file could contain a table of opcode bytes (or binary sequences) with specific wildcard characters. The actual binary values at the wildcard positions would be hex encoded and substituted into an associated assembly code string that represents the matching byte/binary sequence.
The writer of each descriptor file could ensure that the entries are ordered from most specific to least specific, so that HxD's interpretation would simply be the first binary/wildcard sequence that matches.
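To make the descriptor idea concrete, here is a minimal C sketch (not HxD code) of a first-match opcode table. It assumes a hypothetical in-memory form of a descriptor file: each entry holds the opcode byte(s) with a mask, the number of operand bytes that play the role of the wildcards, and a printf-style template. The four opcodes are real MC6809 encodings, but the struct, table and function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor entry: opcode bytes to match (with a mask of
   significant bits), the number of operand ("wildcard") bytes that
   follow, and a printf-style template for the assembly text. */
typedef struct {
    uint8_t value[2];    /* expected opcode byte(s), already masked */
    uint8_t mask[2];     /* 0xFF = bits must match, 0x00 = don't care */
    size_t opcode_len;   /* opcode bytes to compare (1 or 2) */
    size_t operand_len;  /* operand bytes substituted into the template */
    const char *format;
} OpcodeEntry;

/* Ordered from most to least specific, so the first match wins. */
static const OpcodeEntry MC6809_TABLE[] = {
    {{0x10, 0x8E}, {0xFF, 0xFF}, 2, 2, "LDY #$%04X"},
    {{0x12, 0x00}, {0xFF, 0x00}, 1, 0, "NOP"},
    {{0x86, 0x00}, {0xFF, 0x00}, 1, 1, "LDA #$%02X"},
    {{0x8E, 0x00}, {0xFF, 0x00}, 1, 2, "LDX #$%04X"},
};

/* Writes the assembly text for one instruction into out and returns the
   number of bytes it occupies, or 0 if no table entry matches. */
size_t disassemble_one(const uint8_t *bytes, size_t count,
                       char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof MC6809_TABLE / sizeof MC6809_TABLE[0]; i++) {
        const OpcodeEntry *e = &MC6809_TABLE[i];
        size_t total = e->opcode_len + e->operand_len;
        if (count < total)
            continue; /* not enough bytes left for this instruction */
        size_t j = 0;
        while (j < e->opcode_len && (bytes[j] & e->mask[j]) == e->value[j])
            j++;
        if (j < e->opcode_len)
            continue; /* mismatch: try the next, less specific entry */
        unsigned operand = 0; /* 6809 operands are big endian */
        for (size_t k = 0; k < e->operand_len; k++)
            operand = (operand << 8) | bytes[e->opcode_len + k];
        snprintf(out, out_size, e->format, operand);
        return total;
    }
    return 0;
}
```

For example, the bytes 10 8E 12 34 come out as `LDY #$1234`, because the two-byte entry is tried before the single-byte ones.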

Just my 2c thinking out loud. But I'm sure you get where I'm going with that?

An "externally defined plug-in" Disassembly approach would mean that contributors could independently create and maintain (if required) a growing library of supported CPU opcode disassemblers. I suspect this would be a very nice, powerful, and differentiating feature for HxD.
Maël
Site Admin
Posts: 1454
Joined: 12 Mar 2005 14:15

Post by Maël »

Hi,

The other feature request stopped progressing, because of license incompatibilities of the suggested source code.

But there has been a plugin framework for a while now!
https://github.com/maelh/hxd-plugin-framework

You could easily add another disassembler this way. Let me know what you think.
GregC

Post by GregC »

Maël wrote: 09 Oct 2020 12:34 But there has been a plugin framework for a while now!
https://github.com/maelh/hxd-plugin-framework

You could easily add another disassembler this way. Let me know what you think.
Thanks for pointing this out. Being new around here, I hadn't discovered the plugin framework as yet.
Sounds like fun. So I'll add a task to my projects list, to remind me to look into this further. :-)
GregC

Post by GregC »

Hi Maël. I'm finally taking a look at your hxd-plugin-framework, and examples. :)

I've got a couple of initial questions, to help with my understanding (apologies if I'm overlooking the obvious).

1. What action invokes the StrToBytes function, and what are the returned bytes used for / where are they rendered?
i.e. I understand BytesToStr being called to obtain the translated string value for the Data inspector table, but I can't see where the reverse conversion (StrToBytes) is needed?

2. I'm struggling to understand the logic / interaction between SupportedByteOrders, and when ChangeByteOrder function is invoked / not invoked.
i.e. Given that the host (Windows) environment is little endian, I'd assumed that specifying only boBigEndian (for SupportedByteOrders) would result in an initial call to ChangeByteOrder, irrespective of the "Byte order" selected in the GUI?
However, that doesn't seem to happen, and manually switching the GUI "Byte order" setting still seems to swap the byte order passed to BytesToStr (even if only boBigEndian support is specified).
I think I must be misunderstanding the intent of SupportedByteOrders / ChangeByteOrder? :roll:

ps. I posted these more general hxd-plugin-framework questions here, as the official news topic on hxd-plugin-framework appears to be locked (can't reply).
Maël

Post by Maël »

Good to hear you took a look.
GregC wrote: 27 Oct 2020 23:37 1. What action invokes the StrToBytes function, and what are the returned bytes used for / where are they rendered?
i.e. I understand BytesToStr being called to obtain the translated string value for the Data inspector table, but I can't see where the reverse conversion (StrToBytes) is needed?
The datainspector is not just a viewer but also an editor. That's why you need the reverse operation.

Each line in the datainspector grid has an edit box on the right side, where you can not only view text, but edit it as well. Pressing enter or losing the focus of this edit box commits the changes you made and translates the string back to the corresponding byte representation of the line's datatype.

So, the bytes returned by StrToBytes will be used to modify the file/stream and replace the bytes that were initially converted to a string by BytesToStr for display in the datainspector.
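A rough C sketch of the two directions for a 32-bit integer shows why the editor needs the reverse conversion. The function names are hypothetical, not the framework's API, and the sketch assumes the bytes are already in host order (which, as explained further on in this reply, is ChangeByteOrder's job).

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Bytes -> display string (the BytesToStr direction).
   Returns the number of bytes consumed, or 0 if there are too few. */
size_t int32_bytes_to_str(const uint8_t *bytes, size_t count,
                          char *out, size_t out_size)
{
    int32_t value;
    if (count < sizeof value)
        return 0;
    memcpy(&value, bytes, sizeof value); /* host-order reinterpretation */
    snprintf(out, out_size, "%ld", (long)value);
    return sizeof value;
}

/* Edited string -> bytes to write back (the StrToBytes direction).
   Returns 0 on success, -1 if the text is not a valid Int32. */
int int32_str_to_bytes(const char *text, uint8_t out[4])
{
    char *end;
    errno = 0;
    long long parsed = strtoll(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE ||
        parsed < INT32_MIN || parsed > INT32_MAX)
        return -1;
    int32_t value = (int32_t)parsed;
    memcpy(out, &value, sizeof value);
    return 0;
}
```

Feeding the string produced by the first function into the second gives back the original four bytes, which is the round trip the edit box performs when a change is committed.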
GregC wrote: 27 Oct 2020 23:37 2. I'm struggling to understand the logic / interaction between SupportedByteOrders, and when ChangeByteOrder function is invoked / not invoked.
i.e. Given that the host (Windows) environment is little endian, I'd assumed that specifying only boBigEndian (for SupportedByteOrders) would result in an initial call to ChangeByteOrder, irrespective of the "Byte order" selected in the GUI?
However, that doesn't seem to happen, and manually switching the GUI "Byte order" setting still seems to swap the byte order passed to BytesToStr (even if only boBigEndian support is specified).
I think I must be misunderstanding the intent of SupportedByteOrders / ChangeByteOrder? :roll:
If you look at the Delphi example plugin for Int32 values, you will see this code:

procedure TInt32Converter.ChangeByteOrder(Bytes: PByte; ByteCount: Integer;
  TargetByteOrder: TByteOrder);
begin
  inherited;

  if (TargetByteOrder = boBigEndian) and (ByteCount >= sizeof(Int32)) then
    PUInt32(Bytes)^ := ByteSwap(PUInt32(Bytes)^);
end;
The ChangeByteOrder function is best understood when thinking of it as being called before the BytesToStr function is called.

A very simplified version of BytesToStr does essentially this:

function TInt32Converter.BytesToStr(Bytes: PByte; ByteCount: Integer;
  IntegerDisplayOption: TIntegerDisplayOption; out ConvertedByteCount: Integer;
  var ConvertedStr: string): TBytesToStrError;
begin
  ConvertedStr := IntToStr(PInt32(Bytes)^);
  // lots of other cases omitted
end;
Or in other words, it takes 4 bytes, casts them to a 32-bit integer and then calls an integer-to-string conversion function provided by the programming language/runtime library (in Delphi's case, IntToStr). On a little endian machine, casting 4 bytes to an integer implicitly reorders the byte sequence (or, more precisely, handles it as if the bytes were in reverse order).

In other words, if you expect the data to be in little endian order already, you don't need to do anything on a little endian machine. But if it is in big endian order, you need to reverse it, so the little endian machine will handle it correctly.
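The effect of the cast can be reproduced in C. This sketch uses memcpy in place of a pointer cast (to avoid alignment and aliasing problems) and assumes a little endian host such as x86; the function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reverse a 32-bit value's byte order: ABCD -> DCBA. */
static uint32_t reverse_bytes32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reinterpret 4 bytes the way a cast does on the host CPU. */
uint32_t load_native32(const uint8_t *bytes)
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}

/* Read 4 bytes stored in big endian order on a little endian host:
   the cast alone reads them "backwards", so reverse the result. */
uint32_t load_big_endian32_on_le_host(const uint8_t *bytes)
{
    return reverse_bytes32(load_native32(bytes));
}
```

On x86, the bytes 12 34 56 78 load natively as 0x78563412, and only the reversed read yields 0x12345678.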

ChangeByteOrder is always called before any operation (also some internal to HxD only, which are not exposed to plugins), such that the data is in little endian byte order before doing any further processing. So all you have to ensure is that it does that, independent of anything else.
But it also means that you can rely on having passed data in the right order for BytesToStr, so you can do simple integer casts, for example.
The same holds true for StrToBytes: it can just use the native byte ordering provided by the programming language/CPU and rely on ChangeByteOrder being called when necessary, before writing the data back to the file/stream.

This works because reversing the byte order two times (once before BytesToStr and then again after StrToBytes), will result in the original byte order.
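The double reversal can be simulated end to end. In this sketch, which assumes a little endian host, stored big endian bytes are reordered, read with a plain native cast, replaced by an edited value and reordered again before being written back. The enum and function names only loosely mirror the Delphi example and are not the framework's C API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { BO_LITTLE_ENDIAN, BO_BIG_ENDIAN } ByteOrder;

/* Mirrors the Int32 example: for big endian data, reverse the 4 bytes
   in place. Calling it a second time restores the original order. */
void change_byte_order32(uint8_t *bytes, size_t count, ByteOrder target)
{
    if (target == BO_BIG_ENDIAN && count >= 4) {
        uint8_t t = bytes[0]; bytes[0] = bytes[3]; bytes[3] = t;
        t = bytes[1]; bytes[1] = bytes[2]; bytes[2] = t;
    }
}

/* Simulates one edit: returns the value read from stored and replaces
   it with new_value, using only native casts in between. */
int32_t edit_int32(uint8_t stored[4], ByteOrder order, int32_t new_value)
{
    uint8_t work[4];
    int32_t old_value;
    memcpy(work, stored, 4);
    change_byte_order32(work, 4, order);  /* before BytesToStr */
    memcpy(&old_value, work, 4);          /* plain native cast */
    memcpy(work, &new_value, 4);          /* StrToBytes result, native */
    change_byte_order32(work, 4, order);  /* before writing back */
    memcpy(stored, work, 4);
    return old_value;
}
```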

If the byte order is fixed, like for UTF-8, you can simply return the data unchanged, since it never needs to be reordered. In that case you will return an empty set for SupportedByteOrders, since data is neither little nor big endian, but a fixed byte order that is not up to interpretation.
SupportedByteOrders just says what byte orders the datatype can be in (not what you chose to support), and ChangeByteOrder takes care of reordering the data as necessary.

Why would you not just call a generic function to reverse the byte order when necessary? Because you may have a sort of structure or composed datatypes, some of which might need reordering, while others don't. Such as ASCII-strings mixed with integers.

Another example: if your data type is really a data structure like "DOS time & date", you would have code like this:

procedure TDOSTimeDateConverter.ChangeByteOrder(var Bytes: TBytes;
  TargetByteOrder: TByteOrder);
begin
  inherited;
  if (TargetByteOrder = boBigEndian) and (Length(Bytes) >= MaxTypeSize) then
  begin
    PDosTimeDate(@Bytes[0]).Time := ReverseBytes(PDosTimeDate(@Bytes[0]).Time);
    PDosTimeDate(@Bytes[0]).Date := ReverseBytes(PDosTimeDate(@Bytes[0]).Date);
  end;
end;
Here time and date are reversed individually, since they are two 16-bit integers, not one large 32-bit integer.
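Here is the same per-field treatment in C, as a sketch. It assumes a hypothetical layout with the 16-bit time field first and the date field second; the bit layout in the comments is the standard MS-DOS/FAT encoding. Reversing the whole 4-byte block instead would also exchange the two fields with each other.

```c
#include <stdint.h>

/* Hypothetical layout: two independent 16-bit fields. */
typedef struct {
    uint16_t time; /* bits 15-11 hours, 10-5 minutes, 4-0 seconds/2 */
    uint16_t date; /* bits 15-9 years since 1980, 8-5 month, 4-0 day */
} DosTimeDate;

static uint16_t reverse_bytes16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Reverse each 16-bit field on its own. */
void dos_time_date_change_byte_order(DosTimeDate *td)
{
    td->time = reverse_bytes16(td->time);
    td->date = reverse_bytes16(td->date);
}

int dos_year(const DosTimeDate *td)   { return 1980 + (td->date >> 9); }
int dos_month(const DosTimeDate *td)  { return (td->date >> 5) & 0x0F; }
int dos_day(const DosTimeDate *td)    { return td->date & 0x1F; }
int dos_hour(const DosTimeDate *td)   { return td->time >> 11; }
int dos_minute(const DosTimeDate *td) { return (td->time >> 5) & 0x3F; }
```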

Finally, GUIDs are a good example of data in mixed endianness. Some parts are integers (which need to be reversed to match the selected endianness), others are merely byte sequences (that will never be reordered):
https://blogs.msdn.microsoft.com/opensp ... -ssinguid/

typedef struct {
  unsigned long Data1;
  unsigned short Data2;
  unsigned short Data3;
  unsigned char Data4[8];
} GUID;
Note that each data element is treated individually.

Data1 being 4 bytes long goes from Byte1Byte2Byte3Byte4 to Byte4Byte3Byte2Byte1.

Data2 being 2 bytes long goes from Byte1Byte2 to Byte2Byte1.

Data3 being 2 bytes, is treated in the same way as Data2.

Data4 stays unaltered as it is represented as an array of 8 individual bytes.
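Applied to the 16 raw bytes of a GUID, that rule could look like the following C sketch (function names are illustrative). It converts between the little endian field layout Windows uses in memory and the big endian byte order of RFC 4122; only Data1, Data2 and Data3 are touched, and the conversion is its own inverse.

```c
#include <stdint.h>

/* Reverse len bytes in place. */
static void reverse_range(uint8_t *p, int len)
{
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        uint8_t t = p[i];
        p[i] = p[j];
        p[j] = t;
    }
}

/* Layout: Data1 (4 bytes), Data2 (2), Data3 (2), Data4 (8). */
void guid_change_byte_order(uint8_t guid[16])
{
    reverse_range(guid, 4);     /* Data1 */
    reverse_range(guid + 4, 2); /* Data2 */
    reverse_range(guid + 6, 2); /* Data3 */
    /* guid[8..15] (Data4) is a plain byte array: never reordered */
}
```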


In summary, the datatypes of programming languages already interpret a byte sequence as little endian (on x86 machines), so you don't need to reverse their byte order: the CPU already handles them in reverse order automatically. That's why you need to explicitly reverse byte sequences that are in big endian order, to compensate for this effect.
That may be confusing, since data stored in big endian order already seems to be in the "right" order: the one you read in naturally as human (at least for numbers, always from left to right). But the right order is defined by the target CPU, and only big endian machines agree with our reading order.

Finally, from another angle: if you were writing code for a big endian machine, your code would look like this instead:

procedure TInt32Converter.ChangeByteOrder(Bytes: PByte; ByteCount: Integer;
  TargetByteOrder: TByteOrder);
begin
  inherited;

  if (TargetByteOrder = boLittleEndian) and (ByteCount >= sizeof(Int32)) then
    PUInt32(Bytes)^ := ByteSwap(PUInt32(Bytes)^);
end;
Note: I called the byte-order reversing function ByteSwap in this example. It's a commonly used name, but somewhat confusing. ReverseBytes would be better, since what really happens is reordering a byte sequence ABCD into its reverse order, DCBA (where A, B, C and D each name one byte).
GregC wrote: 27 Oct 2020 23:37 ps. I posted these more general hxd-plugin-framework questions here, as the official news topic on hxd-plugin-framework appears to be locked (can't reply).
Yeah, the news section is not open to comment. If you feel that something is a more general question, you could start another thread, though, if you wish.

I hope this helps.
GregC

Post by GregC »

Hi Maël. Thanks for the detailed response. :)

After digesting this (and some more experimentation), I've arrived at the following...

On the SupportedByteOrders / ChangeByteOrder topic:
As my plan is to write a generic Disassembler plug-in, I think I can just take the approach of managing the byte reversal myself, based on the endianness of the target CPU. Since the platform HxD runs on can be assumed to be little endian, I can just force a byte reversal when disassembling for a big endian target CPU.
As the width will be dtwVariable, I've just written a generic ByteSwap to handle a byte array of whatever length is passed to it.
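A variable-length byte reversal like the one described is short to write. The C sketch below (all names hypothetical) also checks the host byte order at runtime instead of assuming little endian, which makes the rule "reverse only when the target CPU's endianness differs from the host's" explicit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse any number of bytes in place (ABCD -> DCBA). */
void reverse_bytes(uint8_t *bytes, size_t count)
{
    for (size_t i = 0; i < count / 2; i++) {
        uint8_t t = bytes[i];
        bytes[i] = bytes[count - 1 - i];
        bytes[count - 1 - i] = t;
    }
}

/* Returns 1 if the lowest-addressed byte of a 16-bit 1 is set. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Bring data stored in the target CPU's byte order into host order:
   a big endian target on a little endian host (or vice versa) needs
   a reversal, matching orders need nothing. */
void to_host_order(uint8_t *bytes, size_t count, int target_is_big_endian)
{
    if (target_is_big_endian == host_is_little_endian())
        reverse_bytes(bytes, count);
}
```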

On the BytesToStr topic:
I now understand (thank you). As an HxD newbie, I hadn't actually discovered that I could overtype values in the data inspector grid. I'd just assumed it was read-only. Oops!
However, this leads to a new question: as I intend to write a simple instruction Disassembly plug-in, I in fact want it to be read-only!
I note with your embedded x86 Disassembly, any attempt to edit provides a message dialog box: "Assembly editing is not supported."
The best I seem to be able to achieve is to leave the ConvertedBytes var parameter untouched and simply return stbeNone. This seems to just revert the edit box string value, but of course without any user-friendly dialog box to explain why.
Is this the only / best "read-only" approach?

One other *new* question I've just come across:
As DataInspectorPluginServer.RegisterDataTypeConverter requires passing a class reference, it appears the only way I can dynamically create multiple DataInspector types (i.e. at runtime) would be to predefine multiple base class reference types, so that I can call RegisterDataTypeConverter multiple times (with each individually configured class reference). Perhaps there's a better way, although as only a handful of Disassemblers should be needed (perhaps I allow a max of 8 different Disassembler definitions?), this somewhat ugly approach would be manageable. :|
Maël

Post by Maël »

GregC wrote: 28 Oct 2020 03:28 I note with your embedded x86 Disassembly, any attempt to edit provides a message dialog box: "Assembly editing is not supported." The best I seem to be able to achieve is to simply leave the ConvertedBytes var parm untouched, and simply return stbeNone.
Yeah, that technically works, but is not really distinguishable from having entered a string that returns the same bytes, and differed only due to whitespace, or some other weird cases. It's better if I add an explicit option.

I'll extend it so that the plugin constructor can specify it is a readonly plugin, and then show an error message when you try to write stuff.
Then you could still return stbeNone, ConvertedByteCount = 0, and an empty string, but otherwise keep the StrToBytes method empty.

Btw. what language are you using?
As DataInspectorPluginServer.RegisterDataTypeConverter requires passing a class reference, it appears the only way I can dynamically create multiple DataInspector types (i.e. at runtime) would be to predefine multiple base class reference types, so that I can call RegisterDataTypeConverter multiple times (with each individually configured class reference). Perhaps there's a better way, although as only a handful of Disassemblers should be needed (perhaps I allow a max of 8 different Disassembler definitions?), this somewhat ugly approach would be manageable. :|
I'll have to go through the code in more detail to see if I can modify it without restructuring major parts.
I wonder though, if it isn't easier to write specific disassemblers, than trying to write a generic one, since you will essentially write a generic binary parser, have to define a syntax for it and then decode it. Probably easier to just write the parsing code directly in a programming language, or use one of the existing libraries and wrap them into the plugin.
GregC

Post by GregC »

Btw. what language are you using?
I just installed the latest Delphi 10.3 CE. As some background, I've coded in Delphi since V1.0 (and Turbo Pascal before that). However, most of the code I was maintaining/developing in recent years was Delphi 7, plus some more recent stuff in Delphi XE. I stopped in 2016, and have only really been playing in the Swift (and C) spaces since then. So, installing Delphi 10.3 CE gave me a nice IDE update (compared to my Delphi memories). :)
I wonder though, if it isn't easier to write specific disassemblers, than trying to write a generic one, since you will essentially write a generic binary parser, have to define a syntax for it and then decode it. Probably easier to just write the parsing code directly in a programming language, or use one of the existing libraries and wrap them into the plugin.
I took a look at a few existing 6809 disassemblers (my initial ISA driver). But I quickly came to the conclusion that there wasn't really anything substantial that I could (relatively easily) migrate for this purpose.
Plus, with my retro CPU interests, I could also see that I'd potentially want to eventually add 6800 and 6502 code disassembly (maybe even 68000?), and looking at existing disassemblers, I really didn't like the "hard-coded" processor specific parsing approach.

It seemed to me that if I took the time to write a definition file parser that would (potentially) allow additional processors to be added by simply adding another definition file (without the need to write more hard-wired processor specific parsing code), that would be a nice solution.
Potentially also allowing others to contribute their own definition files. I think this approach should be achievable for the simple single-instruction disassembly that's required for this application.
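To give an idea of what one step of such a definition file parser might involve, here is a C sketch for a single line. The line format `<hex opcode>,<operand byte count>,<assembly template>` is purely hypothetical and not the plugin's actual file format; returning -1 lets the caller log a syntax error instead of silently skipping the line.

```c
#include <stdlib.h>
#include <string.h>

/* One parsed line, e.g. "86,1,LDA #$%02X" (hypothetical format). */
typedef struct {
    unsigned opcode;
    unsigned operand_len;
    char format[32];
} DefinitionEntry;

/* Returns 0 on success, or -1 on a syntax error. */
int parse_definition_line(const char *line, DefinitionEntry *out)
{
    char *end;
    unsigned long opcode = strtoul(line, &end, 16);
    if (end == line || *end != ',' || opcode > 0xFFFF)
        return -1; /* missing or invalid hex opcode */
    const char *p = end + 1;
    unsigned long operands = strtoul(p, &end, 10);
    if (end == p || *end != ',' || operands > 8)
        return -1; /* missing or implausible operand count */
    const char *text = end + 1;
    size_t len = strcspn(text, "\r\n"); /* strip the line ending */
    if (len == 0 || len >= sizeof out->format)
        return -1; /* empty or overlong template */
    out->opcode = (unsigned)opcode;
    out->operand_len = (unsigned)operands;
    memcpy(out->format, text, len);
    out->format[len] = '\0';
    return 0;
}
```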

Of course, if I want to "keep it simple", I could just limit to parsing a single definition file and users could just substitute different CPU files as needed (or I could just hard-wire a few class reference types to use). Either way, a bit of reminiscing fun playing in the Delphi space again. :D
Maël

Post by Maël »

Ok, I pushed a change to GitHub, so you can create "readonly" plugins. You will also need an updated version of HxD (which will be in German for now):
https://mh-nexus.de/downloads/HxDPluginFramework.zip

For the multiple datatype support per registered converter, I could add another parameter to CreateConverter. On the other hand this will become confusing conceptually.

In principle the plugin interface does not care what you use to register your datatype, as long as it's a unique identifier. Ideally a pointer, so it's easily unique amongst all plugins, since they are all loaded into the same address space.

DataInspectorPluginServer is really just a convenience wrapper, and it chooses to use class references as unique identifiers.

Maybe I could introduce a TExternalMultiDataTypeConverter that handles this transparently.
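The "any unique identifier, ideally a pointer" idea can be sketched in C: instead of one class reference per CPU, the address of each loaded CPU definition serves as its identifier. `register_data_type` here is a hypothetical stand-in for a host registration call, not a function of the plugin framework.

```c
#include <stddef.h>

/* Hypothetical per-CPU definition loaded at runtime. Its address is
   unique within the process, which all plugins share. */
typedef struct {
    const char *name;
} CpuDefinition;

typedef struct {
    const void *id; /* unique identifier handed to the host */
    const char *display_name;
} RegisteredType;

#define MAX_TYPES 8
static RegisteredType registry[MAX_TYPES];
static size_t registry_count;

/* Stand-in for a host registration call: rejects duplicate ids. */
int register_data_type(const void *id, const char *display_name)
{
    if (registry_count == MAX_TYPES)
        return -1;
    for (size_t i = 0; i < registry_count; i++)
        if (registry[i].id == id)
            return -1;
    registry[registry_count].id = id;
    registry[registry_count].display_name = display_name;
    registry_count++;
    return 0;
}

/* Map an id coming back from the host to the definition it names. */
const CpuDefinition *definition_for(const void *id,
                                    const CpuDefinition *defs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if ((const void *)&defs[i] == id)
            return &defs[i];
    return NULL;
}
```

One converter implementation can then serve any number of definitions, with no fixed set of predefined class references.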
GregC

Post by GregC »

Thanks Maël. I'll report back once I've tried out your SupportsStrToBytes "readonly" addition.

In the meantime, I did a bit of marathon coding last night and I've now got my generic disassembly plug-in code about 95% complete. 8)

I've implemented support for a maximum of 8 different CPU definition files via multiple hard-coded class references, which in the end wasn't too ugly (so will certainly do for now).
An .ini file is used to specify definition file parameters, including an option to enable logging of definition file import / parsing. This allows any syntactical errors in the definition files to be easily identified.

I just need to finish the last required class method, and then I'll need to work on completing one or more CPU definition files.
In the meantime I have the beginnings of 6809 and 6800 definition files, with just a handful of opcodes each for my development testing.

Here's a screenshot showing the 6809 'NOP' instruction (opcode $12) decoded.
NB: The numbers in [ ] (in the Disassembly inspector name) are just a debug feature showing me how many opcode definitions were successfully imported.
[Screenshot: HxD_DisassemblyPlugin.png]
Maël

Post by Maël »

Looks good.

Do you plan to release the plugin publicly, later?
GregC

Post by GregC »

Do you plan to release the plugin publicly, later?
Yes, of course. As soon as I get it to a beta-release stage and I've completed (or at least reasonably progressed) at least one CPU definition file, then I'll make it available.

I'm just dealing with the last coding piece, which is implementing support for argument extraction via operand bit mask, and aligning the extracted argument as 8, 16, 32, or 64 bit values, and as either signed or unsigned. And, of course, also allowing for big endian or little endian operands.
I suspect this last piece of work is where any potential ISA support limitations or implementation issues will be found, as a fully generic instruction Disassembler would require a comprehensive understanding of more complex ISAs (an understanding I don't profess to have).
However, my initial driver for coding this plugin was to provide support for various retro processors (like 6809 / 6800 / 6502 etc.), which I'm pretty confident my implementation will cater for. :)
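The three pieces of operand handling mentioned above could look like this in C: assembling operand bytes in either byte order, gathering the bits selected by a mask, and optional sign extension. This is a sketch with made-up names, not the plugin's code. The mask may select non-contiguous bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble up to 8 operand bytes into one value in the given order. */
uint64_t read_operand(const uint8_t *bytes, size_t count, int big_endian)
{
    uint64_t v = 0;
    for (size_t i = 0; i < count; i++) {
        size_t k = big_endian ? i : count - 1 - i;
        v = (v << 8) | bytes[k];
    }
    return v;
}

/* Gather the bits selected by mask into the low bits of the result,
   keeping their order; *width receives the number of bits gathered. */
uint64_t extract_bits(uint64_t value, uint64_t mask, unsigned *width)
{
    uint64_t result = 0;
    unsigned out_bit = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        if (mask & ((uint64_t)1 << bit)) {
            if (value & ((uint64_t)1 << bit))
                result |= (uint64_t)1 << out_bit;
            out_bit++;
        }
    }
    *width = out_bit;
    return result;
}

/* Interpret the low `width` bits of value as two's complement. */
int64_t sign_extend(uint64_t value, unsigned width)
{
    if (width == 0 || width >= 64)
        return (int64_t)value;
    uint64_t sign = (uint64_t)1 << (width - 1);
    return (int64_t)((value ^ sign) - sign);
}
```

For instance, a 6809 BRA offset byte $FE extracted with mask $FF and sign-extended from 8 bits gives -2, and a 5-bit indexed offset of $1F (mask $1F) gives -1.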

This last piece of coding definitely fits the classic "ninety-ninety rule" for programming. :lol:

I've been temporarily side-tracked with other work, but I'm hopeful of getting this completed in an evening (or two?) next week.
GregC

Post by GregC »

Hi Maël. I've now completed initial developer testing on an alpha release version of the generic Disassembly plugin. :)

This object release includes a completed Motorola MC6800 definition file (as an initially supported CPU). I still need to complete the MC6809 definition file (the started definition is included in the release). I also plan to add a 6502 definition file (which could later be expanded for 65C02 support also). Perhaps then, 65816 support is also a future option!

Note this .dll currently supports your release version of HxD. I've yet to test your unreleased "readonly" feature (let me know if there's any urgency on this).

If you'd like to take a look / have a play with the generic Disassembly plugin, you can find it here:
https://github.com/DigicoolThings/HxD_D ... ctorPlugin

Let me know your comments / feedback.

Once I complete the 6809 definition file, I'll be able to get back to progressing my project that included analysing some 6809 Monitor ROM differences (which is what led to me being side-tracked into creating this Disassembly plugin. :lol: ).
Maël

Post by Maël »

Your GitHub page looks very enthusiastic, I like that :)

Thanks for your compliments there as well, it was very nice to read.

Do you plan to release the source code as well, so I can give feedback, or do you mean just in general about how it works?
GregC wrote: 05 Nov 2020 07:28 Once I complete the 6809 definition file, I'll be able to get back to progressing my project that included analysing some 6809 Monitor ROM differences (which is what led to me being side-tracked into creating this Disassembly plugin. :lol: ).
Isn't that how every bigger project starts? haha
HxD itself is also kind of a long-term side-tracking of wanting to know how machines work to the core, a long side-track...
Since you are interested in low level understanding of machines, I found Ben Eater's video series on YouTube, on how to build a computer from simple electronic components on a breadboard quite fascinating. Or maybe you know all that already, as your site may suggest.
GregC

Post by GregC »

Do you plan to release the source code as well, so I can give feedback, or do you mean just in general about how it works?
I do plan to release the source code, eventually, once I've got a few more CPU definition files finalised and I'm happy to call it a stable v1.0 release (and I've reviewed my source commenting). :)
In the meantime, any functional feedback is certainly welcome.
Since you are interested in low level understanding of machines, I found Ben Eater's video series on YouTube, on how to build a computer from simple electronic components on a breadboard quite fascinating. Or maybe you know all that already, as your site may suggest.
Yes, my teenage years were in the early days of the microprocessor, when I was into digital electronics and (like many others) got started with computers by building my own MC6800 & MC6809 early microprocessor based designs, and programming them with hand-assembled machine code!

If you're also interested in this stuff, I have a "Part 1" video of my 6809 related project (the one I was referring to), which might be of interest. It was the next step, where I started looking at the content of the EPROM (which I'd programmed with the Motorola ASSIST09 Monitor 35 years ago!), that led to me being side-tracked by the HxD embedded Disassembly feature challenge. :roll:

Once I get back to this 6809 project, I'll hopefully get around to videoing Part 2.
In the meantime you're welcome to view Part 1 here: https://youtu.be/XEUi0pk8eCo