Disassembly feature as a 3rd party extendable plug-in?

Wishlists for new functionality and features.
Maël
Site Admin
Posts: 1454
Joined: 12 Mar 2005 14:15

Re: Disassembly feature as a 3rd party extendable plug-in?

Post by Maël »

I found the time now to watch your video. Relaxing and calm presentation, and always nice to see little things like part boxes and how you picked a crystal. A tiny detail, I know, but I like to observe those and watch the decision making, etc.

About 7 years ago I got back into electronics, after a long gap since tinkering with some electronic kits as a child.
After taking electronic devices apart and trying to figure out how they work, I finally wanted to get more serious about it and started to learn more.

It's definitely easier today, with all the tools now available for inspecting circuits, and with easier-to-use microcontrollers, SDRs, and affordable scopes and signal generators. Still, it adds up to a significant sum. In the meantime, though, PCs became more and more locked down, whereas before you could communicate with circuits through parallel ports (which I never got around to doing while they were still common). With USB-based microcontrollers easily available (for a few years now), you can go back closer to the metal, yet still hook it up to modern PCs.

The first technical manuals in which I read about low-level details were from Intel. Not really the best architecture to get started :p

One of my first programs that used assembler was a function plotter that would parse a mathematical expression at run-time, then translate it to Intel FPU assembly, also at run-time, to compute values / plot the graph. I still haven't fully figured out assembly, beyond writing small specialized programs or routines, inspecting compiler-generated code, or trying to debug programs. Figuring out larger programs in assembly, or reverse engineering them, was once a major interest, which I haven't got around to yet (and I'm not sure I am still that keen to do so).
But assembly made me think about representations, abstractions, and translations between models for a long time.
It's quite interesting in many ways; compilers are really not that different from planners in AI.

But back to topic: I'll have a look at your plugin, and reply then. Since I am not familiar with 6809, what I will see is mostly what I gathered from a quick reading of documentation I found on the net. So take my feedback with a grain of salt.

Post by Maël »

From what I can tell, it works without troubles in HxD :) Great!

Quick things I noticed (some of which you may be aware of already; if so, just disregard -- I'll be direct for the sake of clarity, no unfriendliness intended):
  • Numbers are sometimes shown in decimal with no prefix, sometimes in hexadecimal with a $ prefix, sometimes with a #$ prefix (e.g., for 6800: SUBA #$FF).
    I assume the last notation is for immediate values. But what is the logic behind using decimal vs. hexadecimal (e.g., BHI 33)? If it is because of signed numbers: at least in x86, you can also have signed hexadecimals. Edit: One of the PDFs below (page 8) has an example "LDA –$21c0,PCR" that shows signed hexadecimals are used for 6809 assembly, too.
    I thought that maybe in future, you would be able to choose in HxD if you want to have decimal or hexadecimal numbers displayed within assembly, with the checkbox that already exists under the datainspector grid.
  • Every mnemonic is in upper case. I see this is common for 6800/6809 assembly (and was so also for x86 assembly for a long time). Personally, I think lowercase is easier on the eyes, though, and it would fit better with the other datatype converters.
  • DefinitionLogPath=C:\Temp
    While that path may sometimes exist, usually the temp path is a per-user path under Windows. I think the environment variable %TEMP% points to it. Supporting that would require parsing the value for environment variables and replacing them with what the WinAPI GetEnvironmentVariable returns. Or you could just hardcode the user's temp directory: WinAPI GetTempPath
    Probably better to do this explicitly, than to rely on Windows to redirect writes aimed at C:\Temp for non-admin users.
  • OperandArgumentMask in .csv file - I think this column could use some documentation. I figured it out by switching back and forth between the 6800 link below and your list, seeing that essentially, the mask is always set to exactly cover the operand bytes. The opcode seems to be a single byte always as well, no mixing of bits in the opcode and operand bytes, like for x86 machine code.
    In other words: what effect does the mask have, besides ignoring some of the bits of the operand bytes given by OperandBytes? Are they just ignored or used somewhere else?
  • OpcodeBytes and OperandBytes, together with the OperandArgumentMask, are apparently used to identify an instruction. Is that why you set OperandBytes to 00 for instructions whose operands do not help in identifying the instruction, just so that OperandArgumentMask can work?
    Not entirely sure here.
  • 10AE,84,00,,,"LDY ,X" I suppose some entries are not complete, since there is no ? wildcard here.

A random suggestion: As per your ini definition file, instructions are at most 3 bytes long, so there would be only 2^24 possible bit sequences: 16,777,216.
That would lend itself to some automated testing against another, already existing disassembler, to see where yours and theirs disagree and so find potential bugs. The issue would be handling undefined bit patterns.

I see the 6809 has a maximum instruction length of 5 bytes, which may be too much to test exhaustively.
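
The exhaustive comparison suggested above could be sketched roughly as follows, in Python. Everything here is an assumption for illustration: `toy_a` and `toy_b` are stand-in disassemblers (neither the plugin nor any existing tool), each returning the rendered text or `None` for an undefined bit pattern, and sequences that either side leaves undefined are simply skipped.

```python
from itertools import product


def find_disagreements(disasm_a, disasm_b, length=3, alphabet=range(256)):
    """Feed every byte sequence of the given length to both disassemblers.

    Returns a list of (sequence, result_a, result_b) for each disagreement.
    A result of None means "undefined bit pattern"; such sequences are
    skipped when either side reports them as undefined.
    """
    mismatches = []
    for combo in product(alphabet, repeat=length):
        seq = bytes(combo)
        a, b = disasm_a(seq), disasm_b(seq)
        if a is None or b is None:
            continue
        if a != b:
            mismatches.append((seq, a, b))
    return mismatches


# Hypothetical stand-ins, only to show the harness at work.
def toy_a(seq):
    if seq[0] == 0x01:
        return "NOP"
    if seq[0] == 0x80:
        return "SUBA #$%02X" % seq[1]
    return None  # undefined in this toy


def toy_b(seq):
    if seq[0] == 0x01:
        return "NOP"
    if seq[0] == 0x80:
        # Deliberate bug: drops the top bit of the operand.
        return "SUBA #$%02X" % (seq[1] & 0x7F)
    return None
```

With the default `alphabet=range(256)` and `length=3` the loop visits all 16,777,216 sequences; a smaller alphabet is handy for a quick trial run.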

As reference I took:
For 6800:
http://www.8bit-era.cz/6800.html
(the first table in the link above says MSB \ LSB, but they really mean most significant nibble, and least significant nibble; where nibble = half byte = 4 bit)

For 6809:
https://atjs.mbnet.fi/mc6809/Information/6809.htm

http://archive.worldofdragon.org/browse ... 981%29.pdf

https://atjs.mbnet.fi/mc6809/Disassembler/disassembler // C
https://atjs.mbnet.fi/mc6809/Disassembler/dis6809.zip // Pascal
Especially the functions D_Indexed() in C and DisLine() in Pascal seem relevant, because most others simply ignore decoding the exact addressing, and just focus on the opcode.

Found two more that may explain it better, but I have only looked at them superficially so far:
https://atjs.mbnet.fi/mc6809/Information/6809Data.pdf
http://www-vs.informatik.uni-ulm.de/tea ... 6809-1.pdf

But it started to take up too much time. Maybe you have found a clearer reference that succinctly explains the encoding of indexed (and other) addressing modes for 6809?

Post by Maël »

Some other relevant links, with good info (historically interesting as well):

https://colorcomputerarchive.com/repo/D ... arrow).pdf

Post byte encoding of instructions: https://www.maddes.net/m6809pm/sections.htm#sec2_1
Same info from the official datasheet on page 29: http://www.gbgmv.se/dl/doc/md09/MC6809_DataSheet.pdf
Same but more readable, see page 8: https://cdn.hackaday.io/files/460001968 ... clesx3.pdf

Also here http://www.roust-it.dk/coco/The%20MC6809%20CookBook.pdf page 47 and after. Page 52 and later has some examples of encoded assembly instructions.
GregC
Posts: 31
Joined: 08 Oct 2020 04:27

Post by GregC »

Thanks for your detailed response! I'll try to respond to each point. Some I think you've probably already self-determined, based on your follow-up informational links.
Maël wrote: 08 Nov 2020 00:43 From what I can tell, it works without troubles in HxD :) Great!
Well, that's a good start! :D
Maël wrote: 08 Nov 2020 00:43
  • Numbers are sometimes shown in decimal with no prefix, sometimes in hexadecimal with a $ prefix, sometimes with a #$ prefix (e.g., for 6800: SUBA #$FF).
    I assume the last notation is for immediate values. But what is the logic behind using decimal vs. hexadecimal (e.g., BHI 33)? If it is because of signed numbers: at least in x86, you can also have signed hexadecimals. Edit: One of the PDFs below (page 8 ) has an example "LDA –$21c0,PCR" that shows signed hexadecimals are used for 6809 assembly, too.
Yes, as you've noted, the # prefix is used in Motorola 6800 Assembly to indicate Immediate mode addressing.
ie. the following value is the literal value that is being operated on.

Currently, the decimal vs hexadecimal rendering of the argument value is specified (per opcode definition) in the .CSV definition file. This is the ArgumentHexDec column.

So, you can choose your own preference of H|D in your own .CSV file.

The logic I used (for the 6800 definition file) was to use decimal notation for the relative addressing instructions (Branch instructions), where you'd typically be counting in your head to get to the target address.

So, for

Code:

BHI 33
you would "Branch if HIgher" to the address of the next instruction plus 33 (decimal).
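
To make the arithmetic concrete, here is a small Python sketch of how a relative branch target is derived. The function name and the default 2-byte instruction length (one opcode byte plus one offset byte, as for the 6800 short branches) are assumptions for illustration, not code from the plugin.

```python
def branch_target(instr_addr, offset_byte, instr_len=2, addr_bits=16):
    """Return the address a relative branch jumps to.

    offset_byte is the raw 8-bit operand; it is interpreted as a signed
    value and added to the address of the *next* instruction.
    """
    offset = offset_byte - 0x100 if offset_byte & 0x80 else offset_byte
    next_instr = instr_addr + instr_len
    # Wrap around within the address space (16 bits by default).
    return (next_instr + offset) & ((1 << addr_bits) - 1)
```

For a branch at $1000 with operand 33, `branch_target(0x1000, 33)` gives $1023; an operand of $FE (i.e. -2) branches back to the branch instruction itself.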

For all other instructions I just specified the default Hexadecimal rendering. You can set your .CSV file to whichever you prefer.

Rendering signed Hexadecimal values is a good point, which I hadn't implemented (yet). Currently, for Hexadecimal rendering I just render the raw Hex byte value (I guess old-school me just automatically sees -1 when I see $FF).

The ArgumentSignedUnsigned value is (currently) utilised in determining if an argument value needs to be sign-extended into a normalised 1, 2, 4, or 8 byte value.
nb. This sign-extend requirement is not actually needed for 6800, as it only utilises byte-aligned 8-bit or 16-bit arguments. However, the 6809 is an example where there are also 5-bit signed arguments, which must therefore be sign-extended into a full byte for ease of rendering.
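
As a rough illustration of that sign-extension step, the following Python sketch widens an n-bit signed field into the smallest 1, 2, 4, or 8 byte container. The function name and the tuple it returns are assumptions made for this example; the plugin's actual code may look quite different.

```python
def sign_extend_to_container(raw, bits):
    """Sign-extend a `bits`-wide field into a 1, 2, 4 or 8 byte container.

    Returns (signed_value, container_bytes, container_bit_pattern).
    """
    container_bytes = next(n for n in (1, 2, 4, 8) if n * 8 >= bits)
    sign_bit = 1 << (bits - 1)
    raw &= (1 << bits) - 1                 # keep only the field's own bits
    signed = (raw ^ sign_bit) - sign_bit   # two's complement interpretation
    pattern = signed & ((1 << (container_bytes * 8)) - 1)
    return signed, container_bytes, pattern
```

A 5-bit field of %11111 comes out as the value -1 in a 1-byte container with bit pattern $FF, while an 8-bit or 16-bit field already fills its container and is passed through unchanged.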
Maël wrote: 08 Nov 2020 00:43 I thought that maybe in future, you would be able to choose in HxD if you want to have decimal or hexadecimal numbers displayed within assembly, with the checkbox that already exists under the datainspector grid.
Yes, that would indeed be a nicer solution. If the hexadecimal checkbox property could be made available to the Datainspector Plugin, I could then use the checkbox selection to determine Hex or Dec treatment.

I note you'd also want to tweak the checkbox label to be more appropriately descriptive (assuming single checkbox). Currently it specifically refers to: "Show Integers...".
Maël wrote: 08 Nov 2020 00:43
  • Every mnemonic is in upper case. I see this is common for 6800/6809 assembly (and was so also for x86 assembly for a long time). Personally, I think lowercase is easier on the eyes, though, and it would fit better with the other datatype converters.
Understood. This is determined literally, from the .CSV definition file DasmString. You could easily just convert the .CSV file to lowercase and you'd be good to go. :)
My choice in the supplied 6800 definition file was to use Uppercase, based on retro 6800 Assembly code traditionally being viewed / published using Uppercase.

In addition, an uppercase/lowercase preference would also ideally be desired for the rendered Hexadecimal values. I note HxD has an option to specify "Hexadecimal numbers capitalization". Therefore, to consistently support this, the Datainspector Plugin would need to also have visibility of this global format setting.
Maël wrote: 08 Nov 2020 00:43
  • DefinitionLogPath=C:\Temp
    While that path may sometimes exist, usually the temp path is a per-user path under Windows. I think the environment variable %TEMP% points to it. Would require parsing for environment variables, and replace them with what the WinAPI GetEnvironmentVariable returns. Or you could just hardcode to use the user's temp directory always: WinAPI GetTempPath
    Probably better to do this explicitly, than to rely on Windows to redirect writes aimed at C:\Temp for non-admin users.
Yes, I understand GetEnvironmentVariable etc. However, my thinking was that the user would want to point the log path to where they wanted the logs to go. This is also on the basis that logging is intended only for temporary .CSV file debugging use (as it also adds, not insignificant, loading overhead).

I just happened to use C:\Temp, so I'd left that in as an example, noting that Logging itself is disabled by default in the supplied .INI (DefinitionLogEnable=0). Therefore, you would need to edit the .INI if you wanted to turn on the logging output (at which time you'd configure where you wanted it to go).

I don't think it would be appropriate to hard-wire log creation to the system %TEMP% path. Perhaps also not really necessary to support using environment variables embedded in the INI value?
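
For a sense of how much work the environment-variable option would involve, here is a minimal Python sketch of expanding %NAME% placeholders in an INI value. It is only an analogy to the WinAPI route discussed above: `resolve_log_path` is a hypothetical helper, unknown variables are left untouched, and falling back to the temp directory for an empty value is an assumption of this example.

```python
import os
import re
import tempfile


def resolve_log_path(ini_value, environ=os.environ):
    """Expand %NAME% placeholders in an INI path value.

    Unknown variables are left as written. An empty value falls back to
    the user's temp directory (an assumption of this sketch).
    """
    def substitute(match):
        return environ.get(match.group(1), match.group(0))

    expanded = re.sub(r"%([^%]+)%", substitute, ini_value)
    return expanded or tempfile.gettempdir()
```

A fixed path such as C:\Temp passes through unchanged, so the existing INI behaviour would be preserved.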
Maël wrote: 08 Nov 2020 00:43
  • OperandArgumentMask in .csv file - I think this column could use some documentation. I figured it out by switching back and forth between the 6800 link below and your list, seeing that essentially, the mask is always set to exactly cover the operand bytes. The opcode seems to be a single byte always as well, no mixing of bits in the opcode and operand bytes, like for x86 machine code.
    In other words: what effect does the mask have, besides ignoring some of the bits of the operand bytes given by OperandBytes? Are they just ignored or used somewhere else?
I agree that for the simple 6800 case, the OperandArgumentMask use is not that obvious, given that the full Operand is always the full Argument (in the simple 6800 instruction set).

I guess I'd also fallen into the trap of just assuming everyone knows what a binary bit Mask is. I'll blame that on my old-school background! :roll:

Understanding a mask also requires understanding binary logic, specifically the effect of bitwise AND.
So, wherever a Mask bit is set to 1 it preserves the AND'd Operand's bit value, and wherever a Mask bit is set to 0, the Operand's bit value is purged (masked), always returning a 0.

For example, if you have the Byte $65 and you apply a Mask of $1F, you will end up with $05.
In binary: %01100101 ($65) & %00011111 ($1F) -> %00000101 ($05)
(nb. % is Motorola convention for binary number notation)

The OperandArgumentMask value is actually used both for extracting the Argument, and also identifying the rest of the Operand that makes up the balance of the full instruction (ie. as an extension to / in combination with, the Opcode).

So, with the above example, the $1F mask can also be inverted, giving %11100000 ($E0), and used as the Mask to extract the remaining Opcode portion of the Operand bytes (ie. $60) that is required to fully match the Instruction byte sequence.
In binary: %01100101 ($65) & %11100000 ($E0) -> %01100000 ($60)
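
The two mask operations above can be written out as a short Python sketch. `split_operand` is a name made up for this example; it just applies the mask and its 8-bit inverse to one operand byte.

```python
def split_operand(operand, mask):
    """Split one operand byte into (identifying_bits, argument_bits).

    The mask selects the argument; the inverted mask selects the bits
    that belong to the instruction's identity.
    """
    inverted = ~mask & 0xFF          # keep the inversion within 8 bits
    return operand & inverted, operand & mask
```

`split_operand(0x65, 0x1F)` yields `(0x60, 0x05)`, matching the binary working shown above.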

Perhaps the best real-world example is one of the 6809 Indexed Addressing mode instructions, which provides for a smaller 5-bit signed offset argument.

eg. This definition:

Code:

10AE,60,1F,D,S,"LDY ?,S"
This is a 3 byte instruction, with a fixed 2 byte Opcode (10AE), followed by a single Operand byte which in binary form is represented as: 0RRnnnnn.

In this case the most significant bit (0) is zero, the next 2 bits (RR) specify the source addressing Register, in this case 11 for the 'S' Stack register, then the 5 remaining bits (nnnnn) specify the signed 5-bit offset argument.

So using the above example of $65 as the Operand, we would have a HxD Byte sequence: 10AE65
The inverted OperandArgumentMask ($1F -> $E0) would allow us to match the 3 byte Instruction definition of "10AE,60" (Opcode + Operand), and the defined OperandArgumentMask ($1F) would allow us to extract the 5-bit Argument as $05.

So we would have Assembly string:

Code:

LDY $05,S
Note also, as a Signed Argument, this would be sign-extended to a full single Byte (for rendering the value), which in this case gives us +5.
If however the byte stream was: 10AE7F, then $7F (%01111111) would similarly match the Masked "10AE,60" Opcode + Operand, with the extracted 5-bit Argument being $1F.
As a Signed value this would be sign-extended to a full byte as $FF, or in Decimal -1. :)

So in this case we would have Assembly string:

Code:

LDY $FF,S
Or (based on your earlier signed Hexadecimal observation), this should probably be rendered as:

Code:

LDY -$01,S
Hopefully all the above helped clarify:
a. The OperandArgumentMask usage.
b. That the OpcodeBytes + OperandBytes define the length of the instruction.
c. The OperandBytes value just needs to contain the required bits to Identify the full instruction.
d. The bits of the OperandBytes that represent the actual Argument are irrelevant / can just be left as 0's (as the value of the Instruction Argument can obviously vary).
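
Points a. to d. can be put together in a short Python sketch that matches a byte sequence against one definition row and renders the result. This is an illustration, not the plugin's implementation: the column order is taken from the example row, "S" in the fifth column is assumed to mean signed, the mask is assumed to cover contiguous low-order bits, and negative values are rendered in the -$01 style mentioned above.

```python
import csv
import io


def parse_definition(line):
    """Parse one definition row (column order assumed from the example)."""
    opcode, operand, mask, _hexdec, signed, template = next(
        csv.reader(io.StringIO(line)))
    operand_bytes = bytes.fromhex(operand)
    return {
        "opcode": bytes.fromhex(opcode),
        "operand": operand_bytes,
        # An empty mask field is treated like an all-zero mask (no argument).
        "mask": bytes.fromhex(mask) if mask else bytes(len(operand_bytes)),
        "signed": signed == "S",  # assumption: "S" marks a signed argument
        "template": template,
    }


def decode(data, d):
    """Return the assembly text if `data` matches definition `d`, else None."""
    n_op, n_arg = len(d["opcode"]), len(d["operand"])
    if len(data) < n_op + n_arg or data[:n_op] != d["opcode"]:
        return None
    operand = data[n_op:n_op + n_arg]
    # The inverted mask selects the operand bits that identify the instruction.
    for byte, ident, mask in zip(operand, d["operand"], d["mask"]):
        inverted = ~mask & 0xFF
        if byte & inverted != ident & inverted:
            return None
    mask_int = int.from_bytes(d["mask"], "big")
    if mask_int == 0:
        return d["template"]  # no argument to substitute
    value = int.from_bytes(operand, "big") & mask_int
    if d["signed"]:
        # Assumes the mask covers contiguous low-order bits.
        sign_bit = 1 << (mask_int.bit_length() - 1)
        value = (value ^ sign_bit) - sign_bit
    digits = 2 * n_arg
    text = ("-" if value < 0 else "") + f"${abs(value):0{digits}X}"
    return d["template"].replace("?", text, 1)
```

Feeding it the row `10AE,60,1F,D,S,"LDY ?,S"` with the bytes 10AE65 gives `LDY $05,S`, and 10AE7F gives `LDY -$01,S`.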
Maël wrote: 08 Nov 2020 00:43
  • OpcodeBytes,OperandBytes are apparently used to identify an instruction, and the OperandArgumentMask, which is why you set OperandBytes to 00 for those that have operands, but for which they do not help in identifying the instruction, just such that OperandArgumentMask can work?
    Not entirely sure here.
Yes. In the simple 6800 case of a full 8-bit argument, the 00 given as OperandBytes simply serves to establish the full instruction length.

I think the full intention is best described by my above example and observation that the specified OpcodeBytes + OperandBytes define the length of that particular instruction.

The other option would have been to add another CSV field to specifically state each individual Instruction's length, however as this can be implied by the OpcodeBytes + OperandBytes fields, this added field would either be redundant, or, would require the dangerous assumption that a larger Instruction length (than what the specified OpcodeBytes + OperandBytes indicated) would require zero padding.
Maël wrote: 08 Nov 2020 00:43
  • 10AE,84,00,,,"LDY ,X" I suppose some entries are not complete, since there is no ? wildcard here.
No, this entry is complete. 8)

This particular instruction has no Argument, hence there is no OperandArgumentMask needed (so the Mask is just set to 00). Technically, the OperandArgumentMask field could have also been empty (ie. not specified).

eg.

Code:

10AE,84,,,,"LDY ,X"
would also work.

This 6809 instruction effectively loads the Y index register from the memory address held in the X register, with no offset! (+0)

The Assembly for which is simply:

Code:

LDY ,X


Hopefully all the above has helped clarify my intentions. You are correct that I do somehow need to roll this into some expanded documentation. Perhaps as above... that's if all the above actually made any sense? :)

Thanks also for the various links, which I shall be sure to read through. Most of that 6800 / 6809 information I do already have in the form of all my original Motorola Data books, Application books, Reference cards, and Data sheets! :)

In summary, I think we arrived at the following to-do's / wish list items:
  1. I should implement -ve rendering of Signed Hexadecimal values, instead of my current old-school assumption that everyone knows $FF is -1
  2. A future HxD datainspector plugin enhancement to allow 3rd party code visibility of the Hexadecimal / Decimal format checkbox would then allow me to automate (and make dynamic) the current function of the fixed ArgumentHexDec .CSV field.
  3. A future HxD datainspector plugin enhancement to allow 3rd party code visibility of the "Hexadecimal numbers capitalization" format option, would allow me to follow the HxD global setting for Hexadecimal rendering Uppercase/lowercase preference.

Post by Maël »

Thanks for your reply, I'll answer in more detail later.
The example you gave with LDY ,X made sense, I found it was a shorthand for LDY 0,X when reading through the docs I found later.
I assumed you would know most of these documents (or equivalent information), the links were mostly for me, noting down what I found useful or might want to refer to later, as part of my "research".
The 0RRnnnnn example was useful as well, to know how you parse/identify instructions.

So far all is clear.

Post by Maël »

GregC wrote: 09 Nov 2020 05:23 I should implement -ve rendering of Signed Hexadecimal values, instead of my current old-school assumption that everyone knows $FF is -1
I thought about exposing some simple conversion functions for IntToHex and IntToStr, since I implemented them already (see signed hexadecimal numbers in the data inspector for Int32, for example). It would allow for more consistency. Optionally, you could also apply the default casing of hexadecimal numbers.
GregC wrote: A future HxD datainspector plugin enhancement to allow 3rd party code visibility of the Hexadecimal / Decimal format checkbox would then allow me to automate (and make dynamic) the current function of the fixed ArgumentHexDec .CSV field.
Yes.
GregC wrote: A future HxD datainspector plugin enhancement to allow 3rd party code visibility of the "Hexadecimal numbers capitalization" format option, would allow me to follow the HxD global setting for Hexadecimal rendering Uppercase/lowercase preference.
Maybe I could add another option, to specifically set the casing of assembly instructions in the options. So you could have it independently from the hexadecimal casing option. And make "assembly mnemonic casing" and "hexadecimal casing" both available to plugins.
GregC wrote: I note you'd also want to tweak the checkbox label to be more appropriately descriptive (assuming single checkbox). Currently it specifically refers to: "Show Integers...".
What's wrong with "Show integers in hexadecimal base"? The original in German translated directly reads "Hexadecimal base (for integers)", maybe that's less awkward?
Since addresses/pointers/(relative) offsets are all integers, but floating points are not affected (you can have hexfloats), I think integer and base should be part of the wording.
If you mean it implicitly omits what happens when hexadecimal is not checked (i.e., show decimal numbers), then this was to save vertical screen space, since it would need another group box, like for byte order.

Regarding the %TEMP% thing (or really, support for environment variables), that's probably more work than necessary, for a plugin.

Post by GregC »

Maël wrote: 09 Nov 2020 09:53 I thought about exposing some simple conversion functions for IntToHex and IntToStr, since I implemented them already (see signed hexadecimal numbers in the data inspector for Int32, for example). It would allow for more consistency. Optionally, you could also apply the default casing of hexadecimal numbers.
I've now implemented a change to support -ve rendering of Signed Hexadecimal values. I've also updated the CSV files for Hexadecimal relative references (instead of Decimal). So all current 6800 / 6809 instruction definitions now specify Hexadecimal rendering.
These changes have now been committed to github: https://github.com/DigicoolThings/HxD_D ... ctorPlugin

Exposing your conversion functions may indeed be useful to assist 3rd party conversion consistency. You may want different function naming, to clearly differentiate from System.SysUtils.IntToStr/IntToHex functionality?

However, I noticed a couple of formatting niggles with your existing Integer Hexadecimal conversions.
I do understand that this is purely subjective, as we each have our own preferences (so feel free to ignore). 8)

Here's what I noted with Hexadecimal checkbox selected:
  1. The Hexadecimal Integer value rendering does not include a '$' prefix. This is fine where the Hex number includes A..F. But where the Hex number includes only 0..9 this can be confusing to a viewer as to whether the number being viewed is in fact Hex or Decimal.
  2. The Hexadecimal Integer value rendering does not byte align. By this I mean that I would expect Hexadecimal display to always be an even number of Hex characters (ie. complete Bytes).
As an example (covering both of the above), you render the Int8 value $F8 as '-8', in both Hexadecimal and Decimal display modes! In Hexadecimal mode my preference would instead be to render this as '-$08'.
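
The formatting preference described in the two points above (a '$' prefix, whole bytes, and a leading minus sign for negative values) might be sketched in Python like this. The function names and parameters are made up for this example and are not HxD's or the plugin's API.

```python
def int8_from_byte(byte):
    """Interpret a raw byte (0..255) as a signed 8-bit value."""
    return byte - 0x100 if byte & 0x80 else byte


def format_hex(value, bits, prefix="$", uppercase=True):
    """Render a signed value as byte-aligned hex, e.g. -8 as -$08."""
    digits = 2 * ((bits + 7) // 8)        # always whole bytes
    sign = "-" if value < 0 else ""
    body = format(abs(value), f"0{digits}X" if uppercase else f"0{digits}x")
    return f"{sign}{prefix}{body}"
```

With these, `format_hex(int8_from_byte(0xF8), 8)` produces `-$08`, and the `uppercase` flag stands in for a global casing option.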
Maël wrote: 09 Nov 2020 09:53 Maybe I could add another option, to specifically set the casing of assembly instructions in the options. So you could have it independently from the hexadecimal casing option. And make "assembly mnemonic casing" and "hexadecimal casing" both available to plugins.
Yes, I think this is a great idea. On reflection, I do believe that the Uppercase / lowercase treatment of Disassembly reflects a (relatively) permanent personal preference. Meaning, that a user would probably set this only once, to match their personal preference (so it would indeed logically be a global preference Option). Therefore, I don't see that you would need to have a Datainspector checkbox to allow ongoing toggling of this preference (as related to Disassembly).
Maël wrote: 09 Nov 2020 09:53 What's wrong with "Show integers in hexadecimal base"? The original in German translated directly reads "Hexadecimal base (for integers)", maybe that's less awkward?
Since addresses/pointers/(relative) offsets are all integers, but floating points are not affected (you can have hexfloats), I think integer and base should be part of the wording.
If you mean it implicitly omits what happens when hexadecimal is not checked (i.e., show decimal numbers), then this was to save vertical screen space, since it would need another group box, like for byte order.
Perhaps some misunderstanding here. However, if you implement the above Assembly casing Options, then the current checkbox description is just fine, as it remains specific to your Data inspector Integer types.

My comment was when considering the checkbox could also affect Disassembly argument rendering. In this regard I don't believe it would be safe to assume everyone considers Disassembly addresses/pointers/(relative) offsets as "Integers".

I understand that the term Integer can be used to simply imply an ordinal value. However, it can also be read as referring to a specific Data Type (as in your list of Data inspector Integer types).

eg. I've never really thought of a CPU Address as being an Integer. Perhaps a Word. :)

But, my point is of course irrelevant if the checkbox remains specific to your Data inspector Integer types.
Maël wrote: 09 Nov 2020 09:53 Regarding the %TEMP% thing (or really, support for environment variables), that's probably more work than necessary, for a plugin.
Agreed. I will leave this as-is. Users can point to their preferred log file creation folder location in the INI file, if they choose to enable Logging.

Post by Maël »

GregC wrote: Exposing your conversion functions may indeed be useful to assist 3rd party conversion consistency. You may want different function naming, to clearly differentiate from System.SysUtils.IntToStr/IntToHex functionality?
They would be in a different unit/namespace, so there would be no collisions, and they can produce the same results if configured to do so.
GregC wrote: However, I noticed a couple of formatting niggles with your existing Integer Hexadecimal conversions.
I do understand that this is purely subjective, as we each have our own preferences (so feel free to ignore). 8)
Indeed we have different opinions there ;)
I treat them as real numbers, as opposed to a programming token. But as you may also notice, in the x86 assembly datatype converter, it uses a $ prefix.
For the same reason they are not shown as hex pairs for the IntX decoders, but there is an option to set the amount of leading zeros, as you can with formatting functions as well. Again, the leading zeros/amount of places implies the bitness in the x86 disassembly.
All of this is intentional, to differentiate from bytes, raw data, source code.

When using decimal you have no such direct correlation between number representation and "container" bitness, so you could argue that you can drop it there as well :p Seriously, though, it could improve readability. But I have no such plans yet.

However, it really does improve readability in the data inspector. There, Int8 and Int64 would widely differ, because of the amount of leading zeros, which would just make it harder to immediately see if they are identical in value or not.
The bitness is already much clearer (no need to count digits) from the name of the row, i.e., (U)IntX, where x is the bitness.

The hex editor window is also not full of $ or 0x prefixes, although it's an array of bytes shown in hexadecimal. That is because people know it is in hex encoding, and 0x or $ would just add clutter. Everywhere it's settable you have either a prefix/postfix, e.g., Offset (h), or the checkbox in the data inspector.
I think that makes especially negative hex numbers more readable.

A leading-zeros option, next to the casing one in the global options, is a possibility, though, including for (U)IntX.

So in the end your work would look very much like you intend it to, but with changeable options :)
GregC wrote: Yes, I think this is a great idea. On reflection, I do believe that the Uppercase / lowercase treatment of Disassembly reflects a (relatively) permanent personal preference. Meaning, that a user would probably set this only once, to match their personal preference (so it would indeed logically be a global preference Option). Therefore, I don't see that you would need to have a Datainspector checkbox to allow ongoing toggling of this preference (as related to Disassembly).
Yes, that was the plan, to set it in the global options.
GregC wrote: My comment was when considering the checkbox could also affect Disassembly argument rendering. In this regard I don't believe it would be safe to assume everyone considers Disassembly addresses/pointers/(relative) offsets as "Integers".
Do you have an alternative in mind that works for describing both, integers and addresses/pointers/offsets?
GregC wrote: I understand that the term Integer can be used to simply imply an ordinal value. However, it can also be read as referring to a specific Data Type (as in your list of Data inspector Integer types).
True. Currently it is specific. I could also use "whole numbers" as this isn't anything commonly used for datatypes, but conveys the same idea, and does not include floating points.
GregC wrote: eg. I've never really thought of a CPU Address as being an Integer. Perhaps a Word. :)
In a CPU it technically is, since you can increment and do arithmetic with it. The type is really mostly there to differentiate its use (modify the IP or PC register) and allow for not mixing them.

But all of this is just a question of integration, feel free to finish implementing your plugin while having fun. I know how frustrating it can be to align differing concepts/POVs.

My goal is to unify as much as possible, not to establish new norms, that might be uncommon. For a while I thought about implementing separate pointer types, but it would have increased the list of datatypes much further, and "flood" the datainspector. Then I thought about what a pointer would really achieve, and it mostly was to be able to jump to a (relative) address. That's why there are goto links next to (U)IntX lines, IntX makes relative jumps (relative to the current position), UIntX absolute jumps.

Sometime, when the structure view is implemented, you could customize "pointer" behavior more.

Post by GregC »

Maël wrote: 10 Nov 2020 05:25
They would be in a different unit/namespace, so there would be no collisions, and can produce the same results, if configured so.
Indeed. I was just meaning, if their function was different. Although in a different namespace, the same name could confuse someone familiar with the standard SysUtils versions. But if configured effectively as an overloaded function (to also support same results), then that does make sense. :)
When using decimal you have no such direct correlation between number representation and "container" bitness, so you could argue that you can drop it there as well :p Seriously, though, it could improve readability. But I have no such plans yet.
Yes, I think I'm convinced on this. At least in the Disassembly realm, there probably is (in retrospect) no good place for Decimal representation. This is "close to the metal" stuff which really needs Hexadecimal clarity. 8)
However, it really does improve readability in the data inspector. There, Int8 and Int64 would widely differ, because of the amount of leading zeros, which would just make it harder to immediately see if they are identical in value or not.
I think this is perhaps where we might have to agree to disagree. :) Perhaps it's just my hex editor level thinking, but similar to my above acceptance that Decimal perhaps doesn't belong in Disassembly, I actually think it is relevant that Int8 and Int64 do widely differ.
ie. At the hex editor level, I don't see a good reason to make it any less obvious that leading zeros are indeed significant to the Int64 value (for example).
A leading-zeros option, next to the casing one in the global options, is a possibility, though, including for (U)IntX.
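The effect of the two options under discussion (casing and leading zeros) can be sketched in a few lines of Python. This is only an illustration of the trade-off, not HxD's rendering code; the function name and parameters are hypothetical, and showing negative values as two's complement is an assumption.

```python
def format_hex(value: int, bits: int, uppercase: bool = True,
               leading_zeros: bool = True) -> str:
    """Render an integer of the given container width as hex digits."""
    # Assumption: negative values are shown as their two's complement bit pattern.
    raw = value & ((1 << bits) - 1)
    digits = format(raw, "X" if uppercase else "x")
    if leading_zeros:
        # One hex digit per 4 bits, so the container width stays visible.
        digits = digits.zfill(bits // 4)
    return digits


# With leading zeros, Int8 and Int64 renderings of the same value differ widely.
print(format_hex(5, 8), format_hex(5, 64))
# Without them, the two are identical at a glance.
print(format_hex(5, 8, leading_zeros=False), format_hex(5, 64, leading_zeros=False))
```

With leading zeros the width of the container is obvious; without them it is easier to see that two values are equal. That is exactly the disagreement above, which is why a global option seems a reasonable compromise.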

So in the end your work would look very much like you intend to, but with changeable options :)
Yes, I think this is definitely the way to go. Global configuration options that 3rd party plugins can have access to (and abide by) for UI consistency.
In my own coding I have generally always gone for configurability, rather than hard-wiring based on assumptions (that later bite you).
My comment was when considering the checkbox could also affect Disassembly argument rendering. In this regard I don't believe it would be safe to assume everyone considers Disassembly addresses/pointers/(relative) offsets as "Integers".
Do you have an alternative in mind that works for describing both integers and addresses/pointers/offsets?
I understand that the term Integer can be used to simply imply an ordinal value. However, it can also be read as referring to a specific Data Type (as in your list of Data inspector Integer types).
True. Currently it is specific. I could also use "whole numbers" as this isn't anything commonly used for datatypes, but conveys the same idea, and does not include floating points.
Yes, if you want a single generalised checkbox then terminology is a challenge. "Whole numbers" is perhaps one possibility. Although I would probably lean more towards something like: "Ordinal Values" as a generalised term, that is also consistent with the expected understanding of someone who would be using a Hex Editor. 8)
But all of this is just a question of integration, feel free to finish implementing your plugin while having fun. I know how frustrating it can be to align differing concepts/POVs.
Thanks. Plus, don't worry about me, I don't find any of this frustrating. I've spent a good deal of my IT life having robust debates about methodology, usability (and UI in general). Debating stuff is how we edge towards the optimal solution. :D
My goal is to unify as much as possible, not to establish new norms, that might be uncommon. For a while I thought about implementing separate pointer types, but it would have increased the list of datatypes much further, and "flood" the datainspector. Then I thought about what a pointer would really achieve, and it mostly was to be able to jump to a (relative) address. That's why there are goto links next to (U)IntX lines, IntX makes relative jumps (relative to the current position), UIntX absolute jumps.
This is actually quite interesting. As a new user of HxD I had noted the "go to" links, but I hadn't actually taken the step of playing with them, to see what they did. (which is actually unusual for me, as I'm usually known for my attention to detail / analytical mindset).

I would observe that the difference between the relative vs absolute go to's (jumps) is not immediately obvious (from a UI perspective).
Given that you allow Data inspector types to be de-selected and re-ordered, perhaps adding types to the datainspector is not that big an issue.

However, utilising the Integer types as relative and absolute jump pointers is a nice compact way of providing this functionality. :D

A suggested improvement could be to create a couple of small button icon designs that represent relative and absolute jumps (ie. a small button for each). These buttons could then be dynamically positioned alongside any Data inspector type where the returned string includes an "Ordinal Value".
Perhaps for 3rd party plugins the relevant ordinal value could be returned as an additional param, which would also trigger the inclusion of the Relative + Absolute jump buttons.
I could see this as being very useful for the Disassembly Data inspectors! I had already considered whether I should be appending a calculated target address (as an ugly comment), to assist with identifying the absolute target of relative branch references etc.


PS. One small observation while I was playing with the "go to" links. I get a "Range Check Error" dialog in the case of a UInt64 $FFFFFFFFFFFFFFFF.
Ideally this should be trapped and instead the consistent Informational dialog presented "The file does not contain..."
[Attachment: HxD_goto_RangeError.png]
Maël
Site Admin
Posts: 1454
Joined: 12 Mar 2005 14:15

Re: Disassembly feature as a 3rd party extendable plug-in?

Post by Maël »

I think this is perhaps where we might have to agree to disagree. :) Perhaps it's just my hex editor level thinking, but similar to my above acceptance that Decimal perhaps doesn't belong in Disassembly, I actually think it is relevant that Int8 and Int64 do widely differ.
Probably, I think it depends on the use case, that's why I had options in mind.
But for the record, I don't think decimal values have no place in assembly/disassembly. I just wanted to understand the logic behind the choice.
Hexadecimals are definitely standard for addresses, but quite awkward when calculating. And decimals are better when you do something like add eax, 123 or mov eax, 123, if you are adding/storing an integer and not a pointer. It depends on the use case.
"Ordinal Values" as a generalised term, that is also consistent with the expected understanding of someone who would be using a Hex Editor. 8)
Hm, ordinal numbers are usually positive only, idk.
I would observe that the difference between the relative vs absolute go to's (jumps) is not immediately obvious (from a UI perspective).
Yeah, that's true. Maybe in future I'll find a way to clarify this (such as "rel. goto"). I'll leave it like this for now, many other things left to implement.
Given that you allow Data inspector types to be de-selected and re-ordered, perhaps adding types to the datainspector is not that big an issue.
I really like the current implementation, also as a user of HxD myself :)
However, utilising the Integer types as relative and absolute jump pointers is a nice compact way of providing this functionality. :D
Thanks
Perhaps for 3rd party plugins the relevant ordinal value could be returned as an additional param, which would also trigger the inclusion of the Relative + Absolute jump buttons.
For integer datatype converters, this happens already. Like the (U)LEB128 encoding for example (where you would need again another pointer type -- which is why I like the link solution).
Generally might be useful, but in assembler you really need to know the base address. Maybe as part of a structure view some time. But my hunch is you would need a fully fledged disassembler which needs to do runtime analysis.
I could see this as being very useful for the Disassembly Data inspectors! I had already considered whether I should be appending a calculated target address (as an ugly comment), to assist with identifying the absolute target of relative branch references etc.

At least in x86 assembly, the addresses can often only be determined during execution. And judging by the indirect addressing modes of "your" CPU, I assume similar complications may arise.
PS. One small observation while I was playing with the "go to" links. I get a "Range Check Error" dialog in the case of a UInt64 $FFFFFFFFFFFFFFFF.
Ideally this should be trapped and instead the consistent Informational dialog presented "The file does not contain..."
HxD_goto_RangeError.png
Thanks for the report. But I have not been able to reproduce this error, with the 2.4.0.0 English release version.
What is the file offset and file size? Or can you share the file itself? It may be the error is not related to the goto.
GregC
Posts: 31
Joined: 08 Oct 2020 04:27

Re: Disassembly feature as a 3rd party extendable plug-in?

Post by GregC »

PS. One small observation while I was playing with the "go to" links. I get a "Range Check Error" dialog in the case of a UInt64 $FFFFFFFFFFFFFFFF.
Ideally this should be trapped and instead the consistent Informational dialog presented "The file does not contain..."
Thanks for the report. But I have not been able to reproduce this error, with the 2.4.0.0 English release version.
What is the file offset and file size? Or can you share the file itself? It may be the error is not related to the goto.
I am using 2.4.0.0 on Windows10 Pro.

I've attached the file. It is simply a 4KB ROM image where the first half is blank (FF).

I get the error at any point where the Uint64 comprises 8x FF bytes. ie. The maximum Uint64 value.

The first point it does not error is when positioned on the 7th FF prior to the mid point.
ie. FF FF FF FF FF FF FF 30 byte sequence (or $30FFFFFFFFFFFFFF little Endian).

Hopefully you can re-create the error.
[Attachment: ASSIST09_2732A.zip]
Maël
Site Admin
Posts: 1454
Joined: 12 Mar 2005 14:15

Re: Disassembly feature as a 3rd party extendable plug-in?

Post by Maël »

I can reproduce it now. It only happens in the 64 bit version, and it makes the debugger hang and sometimes crashes the IDE.

Edit:
OK, it was merely a question of using an unsigned type instead of a signed one. Interesting that it caused such havoc for the IDE and debugger, but only under 64 bit.
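For readers following along, this Python sketch shows why the signed/unsigned distinction matters for exactly the byte sequences reported above. It illustrates the general problem only; it is not the actual HxD code, and the helper name is hypothetical.

```python
INT64_MAX = 2**63 - 1


def fits_signed_64(data: bytes) -> bool:
    """True if the little-endian unsigned value of 8 bytes also fits a signed 64-bit type."""
    return int.from_bytes(data, "little", signed=False) <= INT64_MAX


all_ff = bytes([0xFF] * 8)
# The same 8 bytes mean very different things depending on signedness.
print(int.from_bytes(all_ff, "little", signed=False))  # maximum UInt64 value
print(int.from_bytes(all_ff, "little", signed=True))   # -1 as Int64

# The first non-failing position from the report: FF FF FF FF FF FF FF 30.
edge = bytes.fromhex("FF FF FF FF FF FF FF 30")
print(hex(int.from_bytes(edge, "little")), fits_signed_64(edge))
print(fits_signed_64(all_ff))
```

The value $30FFFFFFFFFFFFFF has its top bit clear, so it still fits a signed 64-bit type, while eight FF bytes do not. That is consistent with the error appearing only where the UInt64 consists entirely of FF bytes.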

Fixed here:
https://mh-nexus.de/downloads/HxDPluginFramework.zip

Could you please test the one change in the plugin-framework, for making proper readonly plugins?
GregC
Posts: 31
Joined: 08 Oct 2020 04:27

Re: Disassembly feature as a 3rd party extendable plug-in?

Post by GregC »

Maël wrote: 10 Nov 2020 17:34
Could you please test the one change in the plugin-framework, for making proper readonly plugins?
Yes, my apologies for not getting back to you sooner. Tests now successfully completed. :)

1. Read-only plugin option to inform of edit rejection (FSupportsStrToBytes):
Google Translate tells me the message reads, appropriately (in English): "Editing is not supported by the "Disassembly (6800)" type."
[Attachment: HxD_FSupportsStrToBytes.png]

2. Your fix for the Range Error (Uint64 go-to for FFFFFFFFFFFFFFFF):
[Attachment: HxD_goto_ResolvedRangeError.png]
GregC
Posts: 31
Joined: 08 Oct 2020 04:27

Re: Disassembly feature as a 3rd party extendable plug-in?

Post by GregC »

Maël wrote: 10 Nov 2020 07:50
Hm, ordinal numbers are usually positive only, idk.
Yes, indeed, you are correct. Apologies, that was my confusion. Ordinal terminology would not be appropriate in the case of negative signed values.
I see (via Google translate) that in 2.5 you currently have "Hexadecimal base (for whole numbers)", which does avoid the Integer type specific misunderstanding. :)
Generally might be useful, but in assembler you really need to know the base address. Maybe as part of a structure view some time. But my hunch is you would need a fully fledged disassembler which needs to do runtime analysis.
Yes, fair point. My initial assumption, that the base for a relative reference is the address of the byte following the decoded block, may be appropriate. But I'm not really sure, and it certainly would not be appropriate as a generalised "goto" solution for all plugins!

Perhaps a more viable solution is that a "goto" enabling value returned by a plugin would simply be a reference relative to the base position of the Bytes passed to ByteToString.
ie. It would be up to the plugin itself to calculate (when it determines it appropriate) the correct signed "goto" offset to return.
I could see this as being very useful for the Disassembly Data inspectors! I had already considered whether I should be appending a calculated target address (as an ugly comment), to assist with identifying the absolute target of relative branch references etc.

At least in x86 assembly, the addresses can often only be determined during execution. And judging by the indirect addressing modes of "your" CPU, I assume similar complications may arise.
Yes indeed. For single instruction Disassembly I was thinking of calculated target address only in the case of Direct / Extended relative addressing, and Direct / Extended absolute addressing modes. Register Indirect (or likely any indirection) would not be feasible (ie. inherently unknown).
My thought is that this could be passed back from a plugin, noting (as above) that a relative offset reference would only be passed back where the plugin determines it is appropriate.
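As a rough sketch of that idea, here is how a plugin might compute a signed "goto" offset for a simple relative branch, using the 6800 BRA instruction (opcode $20, one signed 8-bit displacement) as the example. This is Python for illustration only; the function names and the idea of returning the offset alongside the string are assumptions, not part of the current plugin framework.

```python
def branch_goto_offset(instruction: bytes) -> int:
    """Signed offset from the first byte of a 2-byte relative branch to its target.

    On the 6800 the displacement is relative to the address of the byte
    following the branch instruction, so the instruction length is added.
    """
    displacement = int.from_bytes(instruction[1:2], "big", signed=True)
    return len(instruction) + displacement


def absolute_target(base_address: int, instruction: bytes) -> int:
    """Absolute target, given the (hypothetical) address the bytes were read from."""
    return base_address + branch_goto_offset(instruction)


# BRA with displacement $FE (-2) branches to itself: offset 0 from the base.
print(branch_goto_offset(bytes([0x20, 0xFE])))
# BRA with displacement $05 lands 7 bytes after the start of the instruction.
print(hex(absolute_target(0x0100, bytes([0x20, 0x05]))))
```

For register-indirect modes there is nothing to compute statically, so a plugin would simply not return an offset in those cases.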
GregC
Posts: 31
Joined: 08 Oct 2020 04:27

Re: Disassembly feature as a 3rd party extendable plug-in?

Post by GregC »

UPDATE: Committed to github an additional definition for 6502 Disassembly. As below:
  • Added original 6502 Disassembly definition file (Dasm6502.csv), and associated .ini file settings (to enable).
  • Note this is the first Little Endian CPU definition, which confirms successful Little Endian Operand operation.
  • The 6502 definition file can also form basis for later addition of extended instruction 65C02 and 65C816 definition files.
Get it here: Disassembly Plugin on GitHub

Screenshot showing correct Little Endian 6502 Absolute addressing STA instruction decoding:
[Attachment: HxD_6502_LittleEndian.png]
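For anyone unfamiliar with the operand byte order, the decoding shown in the screenshot can be approximated by this small Python sketch. It is a stand-alone illustration, not the plugin's code (the plugin is driven by the CSV definition file); the function name is hypothetical.

```python
STA_ABSOLUTE = 0x8D  # 6502 opcode for STA with absolute addressing


def decode_sta_absolute(code: bytes) -> str:
    """Decode a 3-byte 6502 'STA absolute' instruction into assembly text."""
    if len(code) < 3 or code[0] != STA_ABSOLUTE:
        raise ValueError("not a complete STA absolute instruction")
    # The 6502 stores the 16-bit operand low byte first (little endian).
    address = int.from_bytes(code[1:3], "little")
    return f"STA ${address:04X}"


# Bytes 8D 34 12 store the accumulator at address $1234, not $3412.
print(decode_sta_absolute(bytes([0x8D, 0x34, 0x12])))
```

A big-endian CPU such as the 6800 would read the same two operand bytes as $3412, which is why the definition needs to know the operand byte order.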