DataInspector: provide some Varint decoding for Git

Wishlists for new functionality and features.
PhilipOakley
Posts: 1
Joined: 02 Apr 2019 14:51

Post by PhilipOakley » 02 Apr 2019 18:46

This follows up on an email exchange:

Decoding Git objects and packs is complicated by their use of variable-length integer formats. Providing an option to show Git varint decoding would be helpful for inspecting these big binary files.

These varints use each byte's most significant bit (MSB) to indicate whether more data follows within the integer. Some use a little-endian (LE) approach, others a big-endian (BE) one. The little-endian format may hide a few type bits in the first byte, and the big-endian format may also use the 'no leading zero (+1)' effect for further compression.
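
To make the two byte orders concrete, here is a minimal Python sketch (Python is used only for illustration; the function names are made up and are not Git's). The big-endian '+1' decoder follows the logic in Git's varint.c, minus its overflow check; the little-endian decoder is the plain form with no type bits in the first byte.

```python
def decode_le_varint(data, pos=0):
    """Plain little-endian varint: the first byte holds the lowest 7 bits.

    Returns (value, position of the next unread byte).
    """
    value = 0
    shift = 0
    while True:
        c = data[pos]
        pos += 1
        value |= (c & 0x7F) << shift
        shift += 7
        if not (c & 0x80):  # MSB clear: this was the last byte
            return value, pos


def decode_be_plus1_varint(data, pos=0):
    """Big-endian varint with the '+1' trick.

    Each continuation adds 1 before shifting, so multi-byte encodings
    never overlap with the values that fit in fewer bytes.
    Returns (value, position of the next unread byte).
    """
    c = data[pos]
    pos += 1
    value = c & 0x7F
    while c & 0x80:  # MSB set: more bytes follow
        value += 1
        c = data[pos]
        pos += 1
        value = (value << 7) + (c & 0x7F)
    return value, pos
```

For example, the two bytes 0x80 0x00 decode to 128 in the '+1' format; without the +1 they would only be a redundant encoding of 0.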

At least three Git formats are in use: the object type/length field (LE), the pack OFS_DELTA size (LE), and 'varint' (BE, '+1').

https://github.com/git/git/blob/master/ ... 1047-L1070
https://github.com/git/git/blob/master/ ... 1181-L1190
https://github.com/git/git/blob/master/varint.c#L4-L18
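
The type/length field can be sketched along the same lines. This assumes the layout used in pack object headers: in the first byte, the MSB is the continuation flag, the next 3 bits are the object type, and the low 4 bits are the lowest bits of the length; every following byte contributes 7 more length bits, little-endian. The function name is hypothetical, and this is an illustration rather than a substitute for the linked source.

```python
def decode_object_header(data, pos=0):
    """Decode a Git pack object type/length header (illustrative sketch).

    Returns (type, size, position of the next unread byte).
    """
    c = data[pos]
    pos += 1
    obj_type = (c >> 4) & 0x07  # 3 type bits hidden in the first byte
    size = c & 0x0F             # lowest 4 bits of the length
    shift = 4
    while c & 0x80:             # MSB set: more length bytes follow
        c = data[pos]
        pos += 1
        size |= (c & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

For example, the bytes 0x95 0x0A give type 1 and length 165: 0x95 carries the continuation bit, type bits 001 and low length bits 0101 (5), and 0x0A adds 10 << 4 = 160.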

Philip

Maël
Site Admin
Posts: 1071
Joined: 12 Mar 2005 14:15

Re: DataInspector: provide some Varint decoding for Git

Post by Maël » 05 Apr 2019 12:40

From previous email exchange:
It could be added to the data inspector.

The basic unsigned format seems reasonably simple, the more compact ones less so.

I found this article, which also describes the Git encoding:
https://en.wikipedia.org/wiki/Variable-length_quantity
However, it is slightly confusing, and the linked source code is doing things implicitly (insufficient documentation).
https://github.com/git/git/blob/master/varint.c

Do you have more material on this? (The documentation I found suggests the git varint logic is more complex than your description?)
It looks like a pretty specialized data type, with many variations and a bit hard to implement, given the documentation is code (and it's not clear how stable this code "definition" is).
So I was wondering if providing a plugin interface just for implementing additional datatypes for the data inspector wouldn't be better. Something that allows you to write a DLL in C or any language that supports exporting C-like functions from DLLs.
What do you think?

Edit: if you don't mind I could add your reply to the quoted email, as well.

Maël
Site Admin
Posts: 1071
Joined: 12 Mar 2005 14:15

Re: DataInspector: provide some Varint decoding for Git

Post by Maël » 08 Apr 2019 17:47

Let me know if you are interested in this plugin approach, as mentioned in my last mail.
