Datainspector: byte $C0 in Int8 row (in hexadecimal number mode) should not display as ffffffc0

Bug reports concerning HxD.
Maël
Site Admin
Posts: 1455
Joined: 12 Mar 2005 14:15

Datainspector: byte $C0 in Int8 row (in hexadecimal number mode) should not display as ffffffc0

Post by Maël »

Instead it should display as C0, the same way as UInt8.

This is due to converting it to a signed Int8 that is then passed on to a (U)Int32 or an (U)Int64, at which point sign extension happens, which shows up as leading F's.
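
As a minimal C++ illustration of the same mechanism (not HxD's actual Delphi code), widening the byte before formatting produces exactly this output:

Code: Select all

#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main() {
    int8_t  narrow = static_cast<int8_t>(0xC0);               // -64
    int32_t wide   = narrow;                                   // sign extension: 0xFFFFFFC0
    std::printf("%" PRIx32 "\n", static_cast<uint32_t>(wide)); // prints ffffffc0 instead of c0
}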

Also look at how others handle this, as hexadecimal numbers are usually shown as positive, but in this case Int8 and UInt8 just show the same information twice.

But it also affects the goto command (when clicking on a data row's "go to:" link), as this always expects a positive value (negative ones make no sense as a file position).

What about the equivalents of IntToHex in other languages like C or Java? How do they handle negative values?

The same issue happens with Int16, but not with Int32 or Int64, since they don't get sign extended, but are passed on to IntToHex without a type cast (and IntToHex always treats them as unsigned internally).

Make sure this matches with IntToBase/IntToOffsetBase handling of negative values, i.e., both the data inspector hex value display and the former functions should give the same results, for consistency.

Relative jumps (short/near jumps) in x86 assembly have a similar problem. They can have an immediate relative offset, which is usually shown with leading F's for negative offsets in hexadecimal. But it might be nicer to always show it with a - or + sign!

This would disambiguate it from having a hexadecimal value which is shown as unsigned, no matter the signedness of the type.

See also how debuggers handle relative jumps, or actually the relative offset in hexadecimal. See this in Delphi/VS, etc.
Maël
Site Admin
Posts: 1455
Joined: 12 Mar 2005 14:15

Summary of the current status regarding all points mentioned in the previous post

Post by Maël »

Maël wrote: 12 Mar 2019 06:19 Instead it should display as C0, the same way as UInt8.

This is due to converting it to a signed Int8 that is then passed on to a (U)Int32 or an (U)Int64, at which point sign extension happens, which shows up as leading F's.
Fixed in 2.3.
Maël wrote: 12 Mar 2019 06:19 But it also affects the goto command (when clicking on a data row's "go to:" link), as this always expects a positive value (negative ones make no sense as a file position).
For goto error messages (when the relative offset would jump outside of the file), the value should be shown as a relative offset, even when displayed in hexadecimal. That means the byte $C0, if interpreted as signed (which would be -64 in decimal), should also be rendered as signed in hexadecimal: its magnitude is $40 (64 in decimal), which results in -$40 once the sign is added back.
This makes sense as we have a relative offset and want to jump back $40 bytes, so the sign should be part of the display. Since we know it's in hexadecimal, the display will have no base indicator (like $, 0x, or h), and would simply be this: -40
Maël wrote: 12 Mar 2019 06:19 Also look at how others handle this, as hexadecimal numbers are usually shown as positive, but in this case Int8 and UInt8 just show the same information twice.
Currently both Int8 and UInt8 show the same unsigned hexadecimal number interpretation.
But it would be better if Int8 followed the same pattern outlined just above, i.e., showed and processed signed hexadecimal numbers.

The only problem is signed hexadecimal numbers are quite uncommon, besides when used as relative offsets in assembly. This might cause confusion, based on the expectations people have.
But it remains strange when switching from hexadecimal to decimal display and back in the data inspector, and seeing the sign just disappear in the hexadecimal representation. So I think a signed display is the best choice. It would also be more consistent with the relative "goto"-link interpretation (and the possible error messages mentioned above).
Maël wrote: 12 Mar 2019 06:19 What about the equivalents of IntToHex in other languages like C or Java? How do they handle negative values?
From the experience gained through https://github.com/maelh/hxd-plugin-framework, the common conversion functions in C and C++ seem to always treat numbers as unsigned when converting them to hexadecimal. (TODO: this means I would have to fix the plugin framework examples to handle signed hexadecimal numbers as well!)
For example,

Code: Select all

std::cout << std::hex << -64;
and

Code: Select all

printf("%x\n", -64);
produce ffffffc0 as output.
The leading ffffff is due to sign extension to 32 bit. An 8 bit integer would be printed as a character, so that is not an option for removing the ffffff "prefix". A cast to a 16 bit integer produces ffc0 as output, still including an ff "prefix".
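
To make the size dependence concrete, here is a small C++ iostream sketch of the behavior just described (illustration only):

Code: Select all

#include <iostream>

int main() {
    std::cout << std::hex << -64 << '\n';                     // ffffffc0 (int, sign extended to 32 bit)
    std::cout << std::hex << static_cast<short>(-64) << '\n'; // ffc0 (16 bit cast, still an ff "prefix")
    std::cout << static_cast<char>(-64) << '\n';              // printed as a character, not as a number
}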

As shown in the discussion below, signed hexadecimal numbers can be confusing to people:
https://stackoverflow.com/questions/960 ... d-hex-in-c
What do you mean by a "signed" hexadecimal value? Do you want something like "-e0"? If so, what would that mean? The sign is already represented in the hexadecimal display, since it shows all the bits and that's all the information that's in the number.
It's quite common to view hexadecimal numbers as a more compact binary representation, one that literally contains all the necessary information, as a sequence of bits.
As an example of this notion, the Windows calculator treats all numbers (no matter whether byte, word, dword, or qword sized) as signed integers. But in the binary and hexadecimal presentations no sign is ever shown. They are treated as pure low level bit patterns that just show the encoding of numbers using two's complement representation, not some slightly higher level representation that has a notion of signed numbers, no matter how they get encoded.

If we have offsets in hexadecimal in HxD, it makes sense to show signed hexadecimal values for relative offsets. Two's complement encoding is not really helpful in this case, as it is too low level and its meaning is hard to interpret. But jumping back $40 bytes can be estimated from the hex editor display: each line is $10 bytes long, so jumping back $40 bytes means jumping back 4 lines.
=> signed hexadecimal numbers are clearly useful and better when shown in the data inspector (which usually shows a higher level view).


In line with this, x86 assembly uses signed hexadecimal numbers, as in this example:

Code: Select all

03 80 60 FF FF FF        add eax, [eax-$000000A0]
03 80 A0 00 00 00        add eax, [eax+$000000A0]
[eax-$000000A0] is relative addressing, where eax is the base address and -$000000A0 is a relative offset.

This is exactly the same use case as for our goto links in the data inspector, and should therefore be treated the same way.
The only slight difference is that in assembly, it seems offsets are only shown in expressions of the form base+hexadecimal_value or base-hexadecimal_value. So you could argue that hexadecimal_value is always unsigned but can take part in subtractions and additions.
But having signed hexadecimals shown on their own should be fine, since in the context of IntX data types it is clear we have signed values (and not some operation relative to a base), and we really should not assign unsigned values to them, as that would be a type violation (use UIntX instead, then).

For other use cases, i.e., when checking the correctness of a low level encoding of a number, e.g., two's complement encoding, you could directly use the byte array display ("hex view" in HxD's main editor). However the endianness might still play a role, and as such it would be practical if you could enable/disable a two's complement decoded display for hexadecimal values, yet still have the automatic endianness conversion.
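
As a rough C++ sketch of what such a decoded display would do (an assumed example, not HxD code): take the bytes as shown in the hex view, apply the endianness conversion, then decode the two's complement value:

Code: Select all

#include <cstdint>
#include <cstdio>

int main() {
    unsigned char bytes[2] = {0xC0, 0xFF};     // bytes as shown in the hex view (little endian)
    uint16_t raw = bytes[0] | (bytes[1] << 8); // endianness conversion: 0xFFC0
    int16_t value = static_cast<int16_t>(raw); // two's complement decoding: -64
    std::printf("%04X -> %d\n", static_cast<unsigned>(raw), value); // prints FFC0 -> -64
}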

Binary should stay as it is, as it is never treated like a signed number in computers, though you could treat it as one, of course, as with numbers in any base system. But binary is really only meant to show the low level representation of high or low voltage (or on/off states) in the electronics or storage media.

It may be useful for understanding subtraction in binary, but then we should have it as a special interpretation for IntX/UIntX, not as a separate data type like the "Binary (8 bit)" type we have now, which really should only show an array of states, not a number.

Side note: this also shows that computers don't really compute, but are symbol manipulators, which happen to also compute (because it's a useful kind of symbol manipulation, but not the only nor the universal one). The more important feature is symbol manipulation, which is also why Turing machines focus on it, instead of computation.
Maël wrote: 12 Mar 2019 06:19 The same issue happens with Int16, but not with Int32 or Int64, since they don't get sign extended, but are passed on to IntToHex without a type cast (and IntToHex always treats them as unsigned internally).

Make sure this matches with IntToBase/IntToOffsetBase handling of negative values, i.e., both the data inspector hex value display and the former functions should give the same results, for consistency.
I still need to check the status of this, in the current code base.
Maël wrote: 12 Mar 2019 06:19 Relative jumps (short/near jumps) in x86 assembly have a similar problem. They can have an immediate relative offset, which is usually shown with leading F's for negative offsets in hexadecimal. But it might be nicer to always show it with a - or + sign!

This would disambiguate it from having a hexadecimal value which is shown as unsigned, no matter the signedness of the type.
This, in addition to the reasoning above on why it is useful, would make it clearer and free of ambiguity. The conversion functions should have an option to always include a sign (even when the value is positive).
Maël wrote: 12 Mar 2019 06:19 See also how debuggers handle relative jumps, or actually the relative offset in hexadecimal. See this in Delphi/VS, etc.
Relative jumps are decoded to absolute offsets, since the debugger knows the values of registers, memory locations, etc. to which the relative offset is applied. Instead of keeping such an expression, it is evaluated and replaced with the resulting absolute offset. So this was not conclusive in giving any hint on how relative offsets are displayed.

But the assembly example above with the add operation and relative addressing does indeed use +/- in front of hexadecimal values.
Maël
Site Admin
Posts: 1455
Joined: 12 Mar 2005 14:15

Re: Datainspector: byte $C0 in Int8 row (in hexadecimal number mode) should not display as ffffffc0

Post by Maël »

Maël wrote: 12 Mar 2019 06:19 Make sure this matches with IntToBase/IntToOffsetBase handling of negative values, i.e., both the data inspector hex value display and the former functions should give the same results, for consistency.
This is still left to be done.
From the experience gained through https://github.com/maelh/hxd-plugin-framework, the common conversion functions in C and C++ seem to always treat numbers as unsigned when converting them to hexadecimal. (TODO: this means I would have to fix the plugin framework examples to handle signed hexadecimal numbers as well!)
This, too, needs to be fixed.

Everything else is fixed in the latest release (using a set of type conversion functions for (U)Int32 and (U)Int64 in DataTypeConversion.pas that handle all the possible boundary conditions, including correct range checks (Delphi's built-in functions had bugs), and that support signed numbers, including signed hexadecimal numbers -- with a hex prefix of $, 0x, 0X, x, or X -- and numbers with leading + signs).

Explanation of the solution:
The general solution for signed integers was to remove the minus sign from the integer before converting it to a string, then to prepend the minus character to the resulting string. That way decimal and hexadecimal number representations are signed if the integer is signed too, and we never have issues with leading $FF pairs due to sign extension (since signed integers have their minus sign removed during the conversion, as mentioned before, which guarantees we get the shortest hexadecimal string, not "spoiled" by leading 1's from sign extension).

This way, we don't really need to know the size of the integer type, and have no need to cut off excess sign extension bits (or their $FF hexadecimal representations) caused by using intermediary integer types during computing/converting that are larger than the final target type.
For example, when converting a negative Int32 to a hexadecimal string using Int64 for intermediary calculations/conversions, all the sign extension bits will show up in the converted hexadecimal string up to the 64th bit, even though the original had only 32. Those excess leading/sign extension bits (32 of them) would need to be trimmed to give a correct hexadecimal string.

But since we never deal with sign extended numbers (as we remove the sign from the integer before conversion and add it back to the string later), we don't need to trim either.

In theory, this can result in range check errors or overflows, for example for the Int32 value -2147483648. But the sign is removed and type casting is used such that -2147483648 is turned into 2147483648 without any issues (and range checking is disabled), before passing it to the unsigned IntToHex Delphi function.
As mentioned, special functions make sure this is always done correctly, namely Int32ToHex and Int64ToHex.
Every possible boundary condition was tested to make sure it works in every case.
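
For illustration, here is a minimal C++ sketch of the same idea (HxD's actual implementation is the Delphi code in DataTypeConversion.pas; the function name below is made up):

Code: Select all

#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Strip the sign from the integer, convert the magnitude, then prepend the
// minus character to the resulting string. Negating via unsigned arithmetic
// also handles INT32_MIN (-2147483648), whose magnitude does not fit into int32_t.
std::string SignedHex32(int32_t value) {
    uint32_t magnitude = static_cast<uint32_t>(value);
    if (value < 0)
        magnitude = 0u - magnitude;  // two's complement negation, no overflow
    char digits[16];
    std::snprintf(digits, sizeof(digits), "%" PRIX32, magnitude);
    return (value < 0 ? "-" : "") + std::string(digits);
}

// SignedHex32(-64)       -> "-40"
// SignedHex32(INT32_MIN) -> "-80000000"
// SignedHex32(192)       -> "C0"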