Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 28 Feb 2021 20:47
by Maël
It would also make sense to specify how many leading zeros a hexadecimal number should have in Intel assembly. I have yet to figure out what the standards are; personally, I find leading zeros distracting, even if they indicate the operand size.

Need to review NASM, MASM and Borland assembler (TASM) conventions. Edit: NASM/MASM use the shortest representation, NASM adds an operand size specifier to avoid ambiguity. TASM adds leading zeros when necessary to indicate operand size. See also post below. I chose NASM's style due to readability.

What about Motorola 6800 (and related) assembly?

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 01 Mar 2021 19:51
by Maël
NASM prints hexadecimal letters in lower case and always chooses the minimal length (as little as 1 hex digit) for a hexadecimal number: no leading zeros, not even to make the number of digits a multiple of 2 (i.e., hex pair sequences). As it uses the C style prefix 0x, there is also no need for a leading zero if the hex number starts with a letter.

MASM acts almost the same, at least under WinDbg. Hexadecimal numbers have the Intel (leading zero) syntax, with letters in uppercase, but it also does not create hex pairs, i.e., numbers whose length is a multiple of 2. The minimal length is two hex digits; however, values <= 9 are always shown in decimal, and only immediate values >= 10 are shown in hexadecimal (with at least 2 hex digits).

Delphi's debugger / TASM seem to follow somewhat arbitrary rules. Hexadecimal numbers that are immediate values (i.e., encoded in the assembler instruction itself) are represented in the length of that immediate value, not the shortest possible.
But other instructions such as or rcx,$01, where the immediate value $01 is really a 64-bit integer, are still shown in short form, probably because the encoded instruction uses only one byte for the immediate value: 4883C901

Further tests show:
48 81 c9 01 00 00 00 is or rcx,01h in NASM and or rcx,$0000000000000001 in TASM/Delphi debugger.
48 83 c9 01 is or rcx,byte +01h in NASM and or rcx,$01 in TASM/Delphi debugger.
So TASM/Delphi represents the operand length using the number of digits, but NASM uses operand length prefixes if it's not clear from the register operand (or there is no register operand).
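
As a rough illustration of the three conventions described in this post, here is a minimal Python sketch (Python is used only for brevity). The function name format_immediate, the style names and the width_bytes parameter are made up for this example, and the rules are only the ones observed above, not a complete description of any of these tools.

```python
def format_immediate(value: int, style: str, width_bytes: int = 1) -> str:
    """Format a non-negative immediate following the observations in this post."""
    if value < 0:
        raise ValueError("this sketch handles non-negative immediates only")
    if style == "nasm":
        # Shortest form, lowercase letters, C style 0x prefix, no leading zeros.
        return f"0x{value:x}"
    if style == "masm":
        # As seen under WinDbg: values up to 9 are printed in decimal.
        if value <= 9:
            return str(value)
        digits = f"{value:02X}"      # at least two hex digits, uppercase
        if digits[0].isalpha():      # Intel syntax: add a zero before a leading letter
            digits = "0" + digits
        return digits + "h"
    if style == "tasm":
        # The digit count conveys the operand size. width_bytes is the displayed
        # width (an assumption: 1 for the short form, 8 for the 64-bit form above).
        return f"${value:0{width_bytes * 2}X}"
    raise ValueError(f"unknown style: {style}")


print(format_immediate(255, "nasm"))     # 0xff
print(format_immediate(255, "masm"))     # 0FFh
print(format_immediate(1, "tasm", 8))    # $0000000000000001
```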

Pointers are always as long as the pointer size of the CPU (8 bytes/16 chars on 64 bit systems, 4 bytes / 8 chars on 32 bit systems).

Summary:
NASM's method seems the clearest and shortest, so I fixed the current implementation to follow the NASM rules (which are close to the MASM rules). The only deviation is that I chose to show at least two hex digits, as MASM does, rather than NASM's one.

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 06 Mar 2021 11:03
by Maël
I was notified you viewed the topic. Did you see the questions? Maybe they got lost within the amount of text I wrote.

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 11 Mar 2021 01:26
by GregC
Maël wrote: 06 Mar 2021 11:03 I was notified you viewed the topic. Did you see the questions?
Hi Maël. Apologies, I’ve just had some personal distractions lately.
Maël wrote: So roughly you could say processors from the MOS/WDC 65xx and Motorola 6800 family.
As a description for the plug-in itself, this is not really a description that is functionally all encompassing.
Yes, the 6502 / 65816 / 6809 could all trace their design ancestry back to the original Motorola 6800 design. However, the 6500, 6502, 6809 are all unique CPU architectures.

To put it another way, the disassembly plug-in is not limited to: “MC6800, MC6809, 6502 and related CPUs.” Therefore, this description of the plug-in does not encompass its generalised implementation, which allows support for numerous CPUs simply through the addition of appropriate CPU definition text files (which can be contributed by anyone, without requiring any plug-in code changes).

As noted in my earlier post, I believe the disassembly plug-in is more accurately described as supporting: “Retro 8 or 16-bit CPU architectures, that utilise no more than 2 Operands in any instruction.”

Although, I’d also note that in some cases more than 2 Operands are actually supported. For example, Operands that reference a CPU register are defined as multiple individual instruction definitions (as with the 6809’s Indexed Addressing mode instruction definitions).
So the current 2 Operand restriction in reality refers to the Operand types that specify data, addresses, or offsets etc.
Regarding hexadecimal number formatting and naming of the styles:
Yes, there are multiple hex formatting styles. The 4 you listed would appear to cover those that I’m familiar with.
Likewise you will see Assembly (and hex) presented in uppercase and lowercase.

I’m not aware of any case sensitivity in this area (unless a specific application's author has chosen to recognise only upper or lower case).

I think you will find that upper / lower case representation will typically align with the age of the CPU / code being published (or is simply the personal preference of the author).

Older systems, around the time of 6800 etc, were in some cases only interfaced with terminals that supported uppercase only. Or, certainly in the earlier days of bitmapped fonts and non-descender 5x7 (or 7x9) pixel based fonts, lowercase was not very legible, so uppercase Assembly code was the obvious choice!

So perhaps, as a subjective observation, you might find old-school enthusiasts prefer uppercase Assembly, and those that weren't around in the early days might prefer lowercase?

For the disassembly plug-in’s current implementation, the upper / lower case, and the hex representation, is simply defined by the strings in the CPU definition file.
i.e. If lowercase is someone’s preference, they could simply convert the CPU definition file to lowercase! Also, the hex representation style (of the CPU being defined) is represented in the definition string.
Maël wrote: how many leading zeros
Depending on the CPU, the number of leading zeros represented in the disassembly should be aligned with the instruction variant.
For example, an instruction may have alternate Opcodes for different Operand lengths.
e.g. With the CPUs I’ve produced definition files for so far, there can be different Opcodes (for the same instruction), where the Operand may be 5 bits, 8 bits, 16 bits, or even 24 bits.
In each case it is appropriate to Disassemble the Operand to the specific number of bytes that aligns with the specific Opcode's Operand length (in this example: 1, 2, or 3 bytes).
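
A minimal sketch of that rule in Python, assuming the Operand width in bits is known for each Opcode variant. The function name format_operand and the default $ prefix are choices made for this example only and are not taken from the plug-in or its definition files.

```python
def format_operand(value: int, operand_bits: int, prefix: str = "$") -> str:
    """Pad an Operand to the whole number of bytes used by its Opcode variant."""
    if not 0 <= value < (1 << operand_bits):
        raise ValueError("value does not fit the operand width")
    width_bytes = (operand_bits + 7) // 8   # 5 or 8 bits -> 1, 16 -> 2, 24 -> 3
    return f"{prefix}{value:0{width_bytes * 2}X}"


print(format_operand(1, 5))     # $01
print(format_operand(1, 16))    # $0001
print(format_operand(1, 24))    # $000001
```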

Note also that an Assembler would typically be smart enough to assemble an instruction using the Opcode with the smallest sufficient Operand size.
e.g. Suppose I coded a Branch instruction with a specified offset of $0001 in my source. When assembled, if a 5-bit offset Opcode was available, then that Opcode should be assembled (even though a 2 byte $0001 offset was specified), as using the 16-bit offset Operand instruction variant would be wasteful when a shorter and faster 5-bit offset Opcode was available for the instruction!
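
The selection an Assembler makes here can be sketched as follows in Python. This assumes the offsets are signed two's complement values and that the available variants are 5, 8 and 16 bits wide; the function names are hypothetical, and real Assemblers may apply further rules.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed two's complement number of that width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def pick_offset_width(offset: int, available_bits=(5, 8, 16)) -> int:
    """Return the narrowest available Operand width (in bits) that can hold the offset."""
    for bits in sorted(available_bits):
        if fits_signed(offset, bits):
            return bits
    raise ValueError(f"offset {offset} does not fit any available variant")


print(pick_offset_width(0x0001))    # 5
print(pick_offset_width(100))       # 8
print(pick_offset_width(1000))      # 16
```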

I’m not sure if the above observations cover what you intended to raise, but hopefully they are of some assistance?

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 11 Mar 2021 09:26
by Maël
Thanks, I think that cleared it all up.

(Regarding the naming of the CPUs: apparently people use different schemes, on Wikipedia etc. But calling them 8-bit era CPUs seems too generic. Intel also had 8-bit CPUs, and the Z-80 seems to be derived from them. But that's not really that important, I guess.)

So it's really like Intel x86 assembly, where you also have the same mnemonic yet the operand size can vary. NASM infers the size of an immediate operand from the other operand, which is a memory or register operand (of defined size). When that is not enough, it uses a prefix string like "byte", "word", "dword", etc.

So in conclusion, I will add formatting options such that you can control leading zeros, casing, and prefix/postfix style of hexadecimal numbers, and casing of instructions/operands in assembly. Those options will be passed to the plugin, like the integer options are passed now, so that the plugin can react to this.

I'll also provide those functions over the plugin interface, so it's not necessary to implement them again in each plugin.

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 16 Nov 2021 01:50
by exodustx0
Hello! I hope you don't mind me bumping this thread. Greg pointed me here after my contribution to his plugin: an ISA definition for the Sony SPC700, which is primarily known as being the dedicated audio CPU for the Nintendo SNES (I swear I've seen talk of it having been used in another product, but I can't find anything on it now).

I've been working with the SPC700 for about two years now, and it was my introduction to Assembly. I can't say I do that much with processors on that level, but it does totally fascinate me, right down to the microcode level. I saw some talk about Ben Eater videos somewhere in this thread; I've gotten quite far into following along with his breadboard computer series, really love what he does.

At some point I'd likely want to write more ISA definitions. Currently looking at the Sharp SM83, the CPU of the Nintendo Game Boy, or the GSU (aka Super FX) co-processor for SNES games. It's not a priority, but at some point that'd be great.

Writing this ISA definition did make me wonder one thing: how hard would it be to add the go to functionality to branched instructions? From the perspective of the ISA definitions that wouldn't be so hard (would need a boolean column in the CSV that signifies a branch instruction), but it doesn't look like there's any exposed function in the plugin framework that allows for this. I don't suppose it's too much effort to add such a function?

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 16 Nov 2021 05:51
by GregC
exodustx0 wrote: 16 Nov 2021 01:50 Writing this ISA definition did make me wonder one thing: how hard would it be to add the go to functionality to branched instructions? From the perspective of the ISA definitions that wouldn't be so hard (would need a boolean column in the CSV that signifies a branch instruction), but it doesn't look like there's any exposed function in the plugin framework that allows for this. I don't suppose it's too much effort to add such a function?
Hi exodustx0. Thank you for your work on the ISA definition for the Sony SPC700. A great addition to the Disassembly plug-in.

In terms of adding go to functionality for the Disassembly, if I’m understanding correctly, you mean for simple absolute jump addresses and relative branch offsets (calculated to an absolute address), to be able to be flagged as such and the address passed back to HxD to trigger a go to link?

I can see that absolute address operands would be problematic, as HxD wouldn’t know the target address of the code. HxD would presumably only know the relative address into the .bin file.

I can see that relative branch offsets could be calculated, assuming these are calculated consistently across different ISAs.
i.e. As being relative to the address of the next instruction opcode, where the program counter would logically be pointing after loading the operand (Apologies, I have not researched this before, so I’m starting from that logical assumption).
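
Under that assumption the calculation could be sketched like this in Python. The names are hypothetical, the offset is assumed to be a signed two's complement value, and whether the instruction's own offset is available to the plug-in at all is exactly the open question raised further down.

```python
def sign_extend(raw: int, bits: int) -> int:
    """Interpret the low `bits` bits of raw as a signed two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (raw & (sign_bit - 1)) - (raw & sign_bit)


def branch_target(instr_offset: int, instr_length: int,
                  raw_operand: int, operand_bits: int = 8) -> int:
    """Target = address of the next instruction + signed branch offset."""
    next_instr = instr_offset + instr_length
    return next_instr + sign_extend(raw_operand, operand_bits)


# A 2-byte branch at offset 0x100 with operand byte 0xFE (-2) branches to itself.
print(hex(branch_target(0x100, 2, 0xFE)))    # 0x100
```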

As you note, this would need a new CSV column to indicate if the Operand specifies a branch relative offset. In fact, as we allow for two Operands I believe we would logically need this flag per Operand (ie. two new CSV columns). Then, we have the dilemma of how to deal with both Operands being flagged as specifying a branch relative offset. Off the top of my head, I’m not even sure whether there is an ISA that supports a multi-branch instruction, however we’d need to consider what to do with the logical possibility of both Operands being flagged as branch offsets?

However, first up, as you’ve also noted, I’m not sure there is currently a facility in the plug-in framework for triggering a go to link?
Also, whether the current file offset address is available to the plug-in, to allow the calculation?
Indeed, whether the implementation would be best to just return the Operand value (ie. the offset itself), to allow HxD to make the actual absolute go to link calculation? (This probably makes more sense, given HxD would need to handle the potential out of bounds issue anyway).
But, I believe these are questions for Maël to advise on.

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 16 Nov 2021 12:56
by exodustx0
GregC wrote: 16 Nov 2021 05:51 Hi exodustx0. Thank you for your work on the ISA definition for the Sony SPC700. A great addition to the Disassembly plug-in.
My pleasure! Nice work on the plugin, it's very easy to work with due to its sane, sensible design.
GregC wrote: In terms of adding go to functionality for the Disassembly, if I’m understanding correctly, you mean for simple absolute jump addresses and relative branch offsets (calculated to an absolute address), to be able to be flagged as such and the address passed back to HxD to trigger a go to link?
Yes, that's my general idea.
GregC wrote: I can see that absolute address operands would be problematic, as HxD wouldn’t know the target address of the code. HxD would presumably only know the relative address into the .bin file.
The only thing the plugin (or HxD itself) can do is assume that absolute addresses also hold in the context of the binary file. You could maybe add a boolean to the INI configuration which states that there's some kind of memory mapping going on, or something else that makes it impossible for HxD to reliably jump to the right absolute address, but that's about it. HxD's internal go to for unsigned integers does this (no assumptions made, for there are none that can be made), so we could do the same for absolute addresses.
GregC wrote: I can see that relative branch offsets could be calculated, assuming these are calculated consistently across different ISAs.
i.e. As being relative to the address of the next instruction opcode, where the program counter would logically be pointing after loading the operand (Apologies, I have not researched this before, so I’m starting from that logical assumption).
I personally don't know of any ISAs (not that I know many to begin with, for now) that don't branch relative to the address after the last operand. But, to cover all cases, I think that you could add an INI config field which takes one of three values: after, operand or instruction. after would be as I currently know it: branch relative to the address after the last operand. operand would branch relative to the address of the operand that holds the branch offset. instruction would branch relative to the address of the branch instruction. If you know straight off the bat that one of these would never ever occur (I'm mostly running off of naïveté here), feel free to disregard hahaha. Either way, this sounds to me like an option that would cover all possibilities, and one that can easily be extended with other values should the need arise.
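
To make the three values concrete, here is a small Python sketch of how such a setting could select the base address. Only the values after, operand and instruction come from the suggestion above; the function names and parameters are hypothetical, and the sketch assumes the branch operand is the last part of the instruction.

```python
def branch_base(mode: str, instr_offset: int, operand_offset: int,
                instr_length: int) -> int:
    """Return the address that the relative branch offset is added to."""
    if mode == "after":
        # Address following the whole instruction (operand assumed to come last).
        return instr_offset + instr_length
    if mode == "operand":
        return operand_offset
    if mode == "instruction":
        return instr_offset
    raise ValueError(f"unknown branch base mode: {mode!r}")


def resolve_target(mode: str, instr_offset: int, operand_offset: int,
                   instr_length: int, displacement: int) -> int:
    """Add an already sign-extended displacement to the selected base."""
    return branch_base(mode, instr_offset, operand_offset, instr_length) + displacement


# 2-byte branch at 0x100, operand byte at 0x101, displacement -2:
for mode in ("after", "operand", "instruction"):
    print(mode, hex(resolve_target(mode, 0x100, 0x101, 2, -2)))
```

With these inputs the three modes give 0x100, 0xff and 0xfe respectively, which shows how much the choice of base matters for a go to link.
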
GregC wrote: As you note, this would need a new CSV column to indicate if the Operand specifies a branch relative offset. In fact, as we allow for two Operands I believe we would logically need this flag per Operand (ie. two new CSV columns). Then, we have the dilemma of how to deal with both Operands being flagged as specifying a branch relative offset. Off the top of my head, I’m not even sure whether there is an ISA that supports a multi-branch instruction, however we’d need to consider what to do with the logical possibility of both Operands being flagged as branch offsets?
Yeah, didn't think about that; naturally you'd need one of these flags per operand. Though, maybe that's not even necessary; maybe you can put this under the _OperandHexDec flag, change it to _OperandMode and allow for a third value, R, signifying a relative address? Just a thought, not sure if this is worth exploring.

I hadn't considered the case of more than one relative branch operand (or a combination of relative branch and absolute jump operands, for that matter). To me the solution is simple, however: bail. Unless Maël would be willing to modify HxD to allow for two separate go to links, there's nothing we can do in this case. Maybe default to the first operand in case of multiple, but then you'd at the very least need to very clearly document this.

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 16 Nov 2021 18:31
by Maël
Maël wrote: 11 Mar 2021 09:26 So in conclusion, I will add formatting options such that you can control leading zeros, casing, and prefix/postfix style of hexadecimal numbers, and casing of instructions/operands in assembly. Those options will be passed to the plugin, like the integer options are passed now, so that the plugin can react to this.

I'll also provide those functions over the plugin interface, so it's not necessary to implement them again in each plugin.
To recap, also for me and as future reference.
HxD currently has all these formatting options in the development version.
The supported formatting styles are:
  • Pascal/Motorola: $FF (prefix style)
  • Intel: FFh (postfix style)
  • Intel: 0FFh (leading zero, otherwise postfix style)
  • C: 0xFF (prefix style)
Another option controls the casing of the letters (except for the x in 0x which always remains lowercase).
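
For illustration only, the four styles and the casing option can be sketched in Python as below. This is not HxD's implementation or its plugin interface; the function name, the style names and the min_digits parameter are invented for the example, and keeping the h postfix lowercase is an assumption based on the examples in the list.

```python
def format_hex(value: int, style: str, uppercase: bool = True,
               min_digits: int = 2) -> str:
    """Render a non-negative integer in one of the four listed hex styles."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    digits = f"{value:0{min_digits}x}"
    if uppercase:
        digits = digits.upper()
    if style == "pascal":                 # $FF
        return "$" + digits
    if style == "intel":                  # FFh
        return digits + "h"
    if style == "intel_leading_zero":     # 0FFh
        # A zero is added only when the number would start with a letter.
        return ("0" if digits[0].isalpha() else "") + digits + "h"
    if style == "c":                      # 0xFF, the x always stays lowercase
        return "0x" + digits
    raise ValueError(f"unknown style: {style}")


for style in ("pascal", "intel", "intel_leading_zero", "c"):
    print(format_hex(255, style))
```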

The plugin interface exposes them as well. See the TFormattingOptions parameter in BytesToStr.

There is no option to set the casing of the assembly mnemonics in HxD's option dialog yet, but the plugin interface has it (see InstructionCasing: TLetterCase of the TFormattingOptions record/struct).

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 16 Nov 2021 20:02
by Maël
exodustx0 wrote: 16 Nov 2021 01:50 Writing this ISA definition did make me wonder one thing: how hard would it be to add the go to functionality to branched instructions? From the perspective of the ISA definitions that wouldn't be so hard (would need a boolean column in the CSV that signifies a branch instruction), but it doesn't look like there's any exposed function in the plugin framework that allows for this. I don't suppose it's too much effort to add such a function?
As GregC has mentioned, the INI file based assembly plugin may need a small adaptation for this.

HxD currently only exposes the "goto link" for internal integer converters (Edit: and after a change, which is not yet on GitHub, now also for integer converter plugins).

HxD's plugin framework could be extended to also support jump targets in general. Since the "address space" consists of file offsets/positions, it might not always be what is wanted, though.

Applying an address mapping, such as what happens when PE files (Windows executables) are loaded into memory by the Windows loader (and similarly for ROM files when loaded into a console's processor memory space), is not the role of a datatype converter.

General address mapping of file offsets to memory regions could be another plugin type that I could add. It would behave similarly to what the RAM / virtual memory editor does now. There, offsets are virtual memory addresses of a process (running executable).
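
To show what such a mapping amounts to, here is a minimal Python sketch with a table of regions. Nothing like this exists in the plugin framework as far as this thread states; the Region type, the function names and the example numbers are purely hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    file_offset: int      # where the region starts in the file
    memory_address: int   # where it is mapped in the address space
    size: int             # length in bytes


def memory_to_file(regions: Sequence[Region], address: int) -> Optional[int]:
    """Translate a memory address to a file offset, or None if it is unmapped."""
    for r in regions:
        if r.memory_address <= address < r.memory_address + r.size:
            return r.file_offset + (address - r.memory_address)
    return None


def file_to_memory(regions: Sequence[Region], offset: int) -> Optional[int]:
    """Translate a file offset to a memory address, or None if it is unmapped."""
    for r in regions:
        if r.file_offset <= offset < r.file_offset + r.size:
            return r.memory_address + (offset - r.file_offset)
    return None


# Example: 0x4000 bytes stored at file offset 0x200 and mapped at 0x8000.
regions = [Region(file_offset=0x200, memory_address=0x8000, size=0x4000)]
print(hex(memory_to_file(regions, 0x8010)))    # 0x210
```

A jump target given as a memory address could then be turned into a file offset for a go to link, or rejected when it falls outside every region.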

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 16 Nov 2021 20:22
by Maël
exodustx0 wrote: 16 Nov 2021 12:56 I hadn't considered the case of more than one relative branch operand (or a combination of relative branch and absolute jump operands, for that matter). To me the solution is simple, however: bail. Unless Maël would be willing to modify HxD to allow for two separate go to links, there's nothing we can do in this case. Maybe default to the first operand in case of multiple, but then you'd at the very least need to very clearly document this.
I think the simplest option here is to design for possible future extensibility, without actually implementing it. Otherwise you quickly get drawn into very generic designs that end up with features that will never be used.
If the interface in Delphi/whatever language (or the syntax of the INI file) is forward compatible, future extensions will be easy; that's what matters most.

I would do it similarly for the link feature of the datainspector. Currently only one link per datatype converter / datainspector row.

Once the structure editor/viewer is out, there will be support for several links (as most structs have several fields and each could be a pointer). So you could turn it into a kind of structured type that exposes several internal parts (some as links).

Until then, I would just make sure the overall syntax of the INI file will not make "old" disassembly definitions invalid, or if it does, that it does so in a way that creates clear syntax errors instead of unintentional silent behavioral changes.
One simple option for this is to allow each operand to be defined as a link, but reject the case when both are defined as links, with a simple error message "not supported, might be in future, if needed contact me" or similar.
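
A minimal Python sketch of that check, assuming the definition loader already knows, per operand, whether it is flagged as a link. The names DefinitionError and link_operand are hypothetical and not part of any existing plugin code.

```python
from typing import Optional


class DefinitionError(ValueError):
    """Raised when an instruction definition uses an unsupported combination."""


def link_operand(operand1_is_link: bool, operand2_is_link: bool) -> Optional[int]:
    """Return which operand (1 or 2) provides the link, or None if neither does."""
    if operand1_is_link and operand2_is_link:
        # Fail loudly instead of silently picking one of the two operands.
        raise DefinitionError(
            "two link operands are not supported, might be in future, "
            "if needed contact me"
        )
    if operand1_is_link:
        return 1
    if operand2_is_link:
        return 2
    return None


print(link_operand(False, True))    # 2
```

Rejecting the double case up front keeps existing definitions valid and leaves room to accept it later without changing the meaning of any definition that loads today.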

Re: Disassembly feature as a 3rd party extendable plug-in?

Posted: 24 Nov 2021 19:02
by Maël
Without a response, nothing will be done regarding this feature.