HxD: Search with Regex
Hi Mael!
Here is a feature a colleague asked me about:
He wants to search a binary file using a binary regular expression. For example, to search for a pattern like:
34 4? 57
and get all matches, with the second nibble of the middle byte treated as "don't care".
If it's possible to get down to the bit level, that would be fantastic (e.g. ignoring 3 bits rather than a whole nibble).
Thanks
Schtrudel
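For illustration only, here is a minimal Python sketch of the nibble and bit-level "don't care" idea, using one (value, mask) pair per byte. This is not how HxD works; the names parse_masked_pattern and find_masked are made up for this example, and the scan is a plain linear search.

```python
def parse_masked_pattern(text):
    """Turn '34 4? 57' into a list of (value, mask) byte pairs.

    Each token is two hex digits; '?' marks a nibble that may hold anything.
    """
    pairs = []
    for token in text.split():
        if len(token) != 2:
            raise ValueError(f"expected two characters per byte, got {token!r}")
        value = mask = 0
        for ch in token:
            value <<= 4
            mask <<= 4
            if ch != "?":
                value |= int(ch, 16)  # raises ValueError on a non-hex digit
                mask |= 0xF
        pairs.append((value, mask))
    return pairs


def find_masked(data, pairs):
    """Return every offset where all (value, mask) pairs match."""
    hits = []
    for start in range(len(data) - len(pairs) + 1):
        if all((data[start + i] & mask) == value
               for i, (value, mask) in enumerate(pairs)):
            hits.append(start)
    return hits


data = bytes.fromhex("00 34 41 57 34 4F 57 34 51 57")
pattern = parse_masked_pattern("34 4? 57")
print(find_masked(data, pattern))

# Bit level: ignore only the low 3 bits of the middle byte (mask 0xF8).
bit_pattern = [(0x34, 0xFF), (0x40, 0xF8), (0x57, 0xFF)]
print(find_masked(data, bit_pattern))
```

A text syntax for arbitrary bit masks is left open here; the bit-level pattern is written directly as (value, mask) pairs.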
Maël wrote: As really flexible regular expressions (with some sugar) mean that I have to roll my own implementation, expect this feature not before version 1.8 or 1.9.
I've never really known a use for matching 4 bits out of a byte, not that there aren't people out there who might.
May I recommend using the PCRE regex library instead of spending a lot of time implementing your own "regex style" search functionality?
It should be fairly simple to convert a search syntax like this:
81 8F E3 F7 9E . . . . 9F FF 8E 87 6F 47 87
(or maybe double dots such that the spacing remains the same?)
To something like this:
\x81\x8F\xE3\xF7\x9E....\x9F\xFF\x8E\x87\x6F\x47\x87
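As a rough sketch of that conversion, the following uses Python's re module on bytes as a stand-in for PCRE (the escape syntax is the same for this case). The function name hex_to_regex is hypothetical, and both the single-dot and double-dot spellings are accepted as one wildcard byte.

```python
import re


def hex_to_regex(spec):
    """Convert a spaced hex pattern with '.' wildcards into a bytes regex."""
    parts = []
    for token in spec.split():
        if token in (".", ".."):
            parts.append(b".")  # one wildcard byte, single or double dot
        elif re.fullmatch(r"[0-9A-Fa-f]{2}", token):
            parts.append(b"\\x" + token.upper().encode("ascii"))
        else:
            raise ValueError(f"unrecognised token {token!r}")
    return b"".join(parts)


spec = "81 8F E3 F7 9E . . . . 9F FF 8E 87 6F 47 87"
pattern = hex_to_regex(spec)

# DOTALL matters for binary data: without it '.' does not match byte 0x0A.
regex = re.compile(pattern, re.DOTALL)

blob = (b"\x00\x00" + bytes.fromhex("818FE3F79E") + b"\x0a\x01\x02\x03"
        + bytes.fromhex("9FFF8E876F4787") + b"\xff")
match = regex.search(blob)
print(match.start(), match.end() - match.start())
```

One thing to watch with any PCRE-style engine: by default the dot does not match the newline byte, so a "dot matches all" option is needed for the wildcard to really mean "any byte".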
I use regular expressions to search machine code binaries on a regular basis, but none of the tools I use is a hex editor, so it would be very nice to have that functionality integrated into this particular hex editor. (Fast and lightweight = good, i.m.o.)
Other examples of useful syntax that is PCRE-compatible:
.*
(matches any run of bytes up to the next specified match)
.{1,200}
(matches any byte, at least 1 and at most 200 times)
[90-F0] converts to [\x90-\xF0]
(matches bytes from 90 to F0)
(90|FF|55) converts to (\x90|\xFF|\x55)
(matches byte 90, FF, or 55)
(?!55) converts to (?!\x55)
(next byte is not 55)
The function pcre_compile() will tell you if the notation is bad, and pcre_exec() reports the start and end offsets of the match, so it would be easy to update the selection in the editor.
This makes the matching engine extremely flexible, and requires the user only to be familiar with regular expression notation, without the ugly escape syntax (\x00).
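To make the idea concrete, here is a small sketch in Python, with the re module standing in for PCRE (this is not the PCRE C API). It rewrites bare hex bytes into \xHH escapes, leaves {m,n} counts and other operators alone, reports a bad pattern at compile time, and returns the match offset plus length. The names translate and search are invented for this example, and only the constructs listed above are handled.

```python
import re

# A two-digit hex byte, or a {m,n} repetition count that must be left alone.
TOKEN = re.compile(r"\{[^}]*\}|[0-9A-Fa-f]{2}")


def translate(spec):
    """Rewrite bare hex bytes as \\xHH escapes; keep regex operators as-is."""
    def escape(m):
        text = m.group(0)
        return text if text.startswith("{") else "\\x" + text
    return TOKEN.sub(escape, spec.replace(" ", "")).encode("ascii")


def search(spec, data):
    """Return (offset, length) of the first match, or None if there is none."""
    try:
        regex = re.compile(translate(spec), re.DOTALL)
    except re.error as exc:
        # The compile step is where bad notation is detected.
        raise ValueError(f"bad pattern {spec!r}: {exc}") from exc
    m = regex.search(data)
    return None if m is None else (m.start(), m.end() - m.start())


print(translate("(90|FF|55)"))
print(search("E8 .{1,200} C3", b"\x90\xe8\x01\x02\x03\xc3\x90"))
```

The offset and length returned by search are exactly what an editor would need to set the selection. A real translator would need a proper tokenizer, since this one would also rewrite hex-looking letters inside other regex escapes.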
I don't know if there is really a need for full-blown Perl regular expressions. My first thought was more to implement "real" regular expressions (i.e. what can be done with a DFA) with a slightly extended syntax that allows specifying byte patterns. AFAICT all your examples would be achievable this way.
b0ne wrote: The Function pcre_compile() will tell you if the notation is bad, and pcre_exec() returns the address of the match plus the length, so it would be easy to update the selection in the editor.
The issue with this is that I allow searching streams of any size (also GBs of data), which means I have to adapt the searching (as not everything can be held in memory). I will first have to investigate how much I would need to adapt the PCRE lib to achieve this. This also means I can't simply use the code, but will have to analyze and understand it.
b0ne wrote: I use regular expressions to search files on machine code binaries on a regular basis
Could you give me some links to those tools, so I can see how they handle binaries?
Maël wrote: The issue with this is that I allow searching streams of any size (also GBs of data) which means I have to adapt the searching (as not everything can be held in memory).
Perhaps a checkbox to enable "full" regular expressions that scan the full buffer, with the understanding that this cannot be done on giant streams? It doesn't seem wise to use complicated regular expressions on GBs of data to begin with...
Alternatively, if the scan is going to be extremely large, allow only regular expressions that have bounds on their repetitions, so that you can overlap the buffers that are passed to PCRE?
For instance, say you have read in 4 blocks of memory, each 32 bytes in size, and you know your pattern matches a maximum of 16 bytes.
A match then cannot span blocks 1 to 3 or 2 to 4; it always fits within two adjacent blocks. Knowing this, you could free block 1 after you've scanned across 1 and 2, then free block 2 after you've scanned across 2 and 3.
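A possible shape for this, sketched in Python with re standing in for PCRE: read fixed-size blocks, carry over only the last max_len - 1 bytes, and postpone any match that starts inside that carried-over tail until the next block has arrived. The function find_in_stream and its parameters are hypothetical, and the sketch assumes the caller knows the maximum match length and that the pattern uses no anchors, lookbehind, or lookahead past the match.

```python
import io
import re


def find_in_stream(stream, regex, max_len, block_size=32):
    """Yield (absolute offset, length) for each match in a byte stream.

    Assumes no match is longer than max_len bytes.
    """
    keep = max_len - 1   # bytes carried over so a straddling match is seen whole
    window = b""
    base = 0             # absolute offset of window[0]
    pos = 0              # next search position, relative to window
    while True:
        block = stream.read(block_size)
        at_end = not block
        window += block
        # Until the stream ends, a match starting in the last `keep` bytes
        # could still be incomplete, so it is left for the next round.
        limit = len(window) if at_end else len(window) - keep
        while True:
            m = regex.search(window, pos)
            if m is None or m.start() >= limit:
                break
            yield base + m.start(), m.end() - m.start()
            pos = max(m.end(), m.start() + 1)
        if at_end:
            return
        # Free everything before the carried-over tail.
        drop = max(0, limit)
        pos = max(pos, drop) - drop
        window = window[drop:]
        base += drop


# 128 bytes = four 32-byte blocks; the first match straddles blocks 1 and 2.
data = bytearray(128)
data[28], data[40] = 0xAA, 0xBB
data[70], data[75] = 0xAA, 0xBB
data[120], data[127] = 0xAA, 0xBB
data = bytes(data)

regex = re.compile(rb"\xAA.{1,14}\xBB", re.DOTALL)  # at most 16 bytes
print(list(find_in_stream(io.BytesIO(data), regex, max_len=16)))
```

Memory use stays at roughly block_size + max_len - 1 bytes regardless of stream size, which is the point of bounding the repetitions.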
Maël wrote: Could you give me some links to those tools, so I can see how they handle binaries?
Well, some are closed source, like PowerGrep, which handles binary searches (not very well). There are tools I use at my job which utilize PCRE, but those aren't hex editors... I was privy to the implementation of the internal ones; all of them rely on passing the entire buffer to PCRE.
I haven't investigated ClamAV's source, but they have a byte-pattern matcher in their scanner which supports variable-sized patterns on unknown file sizes.
b0ne wrote: Well, some are closed source like PowerGrep, it handles binary searches (not very well) and is closed source. There are tools I use at my job which utilize PCRE, but those aren't hex editors... I was privy to the implementation of the internal ones, all of them rely on passing the entire buffer to PCRE.
I was more interested in regular expression syntax. I can't use most open source code anyway, as HxD isn't open source either. Anyway, source code isn't the issue; I was more interested in the design of regular expressions for binary files (and their special requirements).
Maël wrote: I was more interested in regular expression syntax. I can't use most opensource anyway as HxD isn't opensource either. Anyway, sourcecode isn't the issue, I was more interested in design of regular expressions for binary files (and their special requirements).
Sorry for the extremely delayed response. PCRE is BSD licensed, so all you need to do is throw some copyright info into a text file and call it good. The source is available for study as well.
Re: HxD: Search with Regex
I personally vote with both hands for a simple pattern search, even if it is not regex. I really miss the ability to search for things like "66 ?? 5D ?? ?? 7C". This particular feature would not be that hard to implement; although I understand that it's less flexible than regex, the time-to-market could be considerably lower.
Re: HxD: Search with Regex
I already started implementing something based on PCRE. If I did just a partial implementation, I'd get a lot of complaints about why the pattern search is so limited. But I can tell you that it is very high on the TODO list.
Hex search with gaps
I'd like to search a file where I know the start and the end of a byte sequence but not the bytes in between. This is mostly the case when I do binary patching of executables or DLLs, where fixup values are used for addresses.
A search hex-string like "85 F6 74 0C 83 0D xx xx xx xx 04" could be used to find the position.
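Until regexes are available, a gap search like this can be sketched without any regex engine. The Python example below (an illustration, not HxD's implementation) uses the leading run of fixed bytes as an anchor for a fast substring search and then verifies the rest, skipping wildcard positions. The names parse_gap_pattern and find_with_gaps are made up here, and "??" is accepted as an alternative spelling of "xx".

```python
def parse_gap_pattern(text):
    """Parse '85 F6 xx 04' into [0x85, 0xF6, None, 0x04]; None is a wildcard."""
    return [None if t.lower() in ("xx", "??") else int(t, 16)
            for t in text.split()]


def find_with_gaps(data, pattern):
    """Return offsets where pattern matches, treating None as 'any byte'."""
    # Use the leading run of fixed bytes as an anchor for bytes.find().
    n = 0
    while n < len(pattern) and pattern[n] is not None:
        n += 1
    prefix = bytes(pattern[:n])
    hits = []
    start = data.find(prefix)
    while start != -1 and start + len(pattern) <= len(data):
        if all(p is None or data[start + i] == p
               for i, p in enumerate(pattern)):
            hits.append(start)
        start = data.find(prefix, start + 1)
    return hits


code = bytes.fromhex(
    "90 85 F6 74 0C 83 0D 10 20 30 40 04"   # fixup bytes followed by 04
    " 90 85 F6 74 0C 83 0D AA BB CC DD 05"  # same start, different end byte
)
pattern = parse_gap_pattern("85 F6 74 0C 83 0D xx xx xx xx 04")
print(find_with_gaps(code, pattern))
```

Only the first candidate is reported, because the second one ends in 05 instead of 04.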
Re: Hex search with gaps
There is a similar request to add regular expressions: http://forum.mh-nexus.de/viewtopic.php?f=4&t=80&start=0
It should be possible to do what you want when regexes are implemented.
Re: Hex search with gaps
I'm looking forward to that. Thanks.
Re: HxD: Search with Regex
34 4? 57
I just need a simple pattern search.
Something similar to these:
https://www.unknowncheats.me/forum/c-an ... hmark.html
https://github.com/mrexodia/PatternFinder
Re: HxD: Search with Regex
Related post:
viewtopic.php?f=4&t=176