The Many Shapes and Sizes of Strings on Windows

Earlier this week I was toying around with the idea of implementing a lnk file parser in Go. Yes, I occasionally do things like this for fun. Just to get a feel for things, I grabbed the most easily accessible lnk file I could find, the Chrome launcher in my Windows VM, and moved it to my Xubuntu box where Real Work™ takes place. For some basic sanity checking, I take a gander at all of the strings stored in the file:

#strings -a Google\ Chrome.lnk >> strings.txt

#strings -a -e l Google\ Chrome.lnk >> strings.txt

After looking at the short list of results, I fire up xxd to look around for easily visible structures that would help any docs I can find make more sense. As I float around, something stands out: the clearly visible string ‘@shell32.dll,-21817.’ Why, you might ask, does this string stand out? Because I didn’t remember seeing it in the strings.txt file I generated above. I then verified that it wasn’t in there. This is weird. To the man page!

The strings tool can search for strings in several encodings; the -e l switch looks for 16-bit little endian strings. Other options are -e L for 32-bit little endian, -e b for 16-bit big endian, and -e B for 32-bit big endian (there’s also -e S for single 8-bit characters, but it looked to put out a ton of junk). Back to the lnk file.
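To make those switches concrete, here is a minimal Python sketch that lays out the mystery string under each of the four wide encodings. It assumes Python's utf-16 and utf-32 codecs are a fair stand-in for what strings calls 16-bit and 32-bit characters, which holds for plain ASCII text like this one. The names TEXT, ENCODINGS, and layouts are made up for illustration.

```python
# Illustrative only: byte layouts of one ASCII string in the four wide encodings.
TEXT = "@shell32.dll,-21817"

# Map each `strings -e` letter to the Python codec assumed to match it.
ENCODINGS = {
    "l": "utf-16-le",  # 16-bit little endian
    "b": "utf-16-be",  # 16-bit big endian
    "L": "utf-32-le",  # 32-bit little endian
    "B": "utf-32-be",  # 32-bit big endian
}


def layouts(text):
    """Return {switch letter: encoded bytes} for every encoding above."""
    return {flag: text.encode(codec) for flag, codec in ENCODINGS.items()}


if __name__ == "__main__":
    for flag, raw in layouts(TEXT).items():
        # Show only the first 8 bytes; the pattern repeats for every character.
        print(f"-e {flag}: {raw[:8].hex(' ')}")
```

For ASCII text, every character becomes one printable byte plus one null byte (16-bit) or three null bytes (32-bit); the byte order only decides whether the nulls come before or after the printable byte.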

#strings -a -e L Google\ Chrome.lnk > strings_L.txt

#strings -a -e b Google\ Chrome.lnk > strings_b.txt

#strings -a -e B Google\ Chrome.lnk > strings_B.txt

While the strings_L.txt and the strings_B.txt files are empty, my mystery string (‘@shell32.dll,-21817’) does in fact appear in the strings_b.txt file. Windows apparently uses 16-bit big endian strings too. Where else might these types of strings be? I’ve been doing a lot of work with memory analysis stuff lately and so I have a 12GB memory dump from some machine in my office. I ran strings with each of the encodings (l, L, b, B, and the default ASCII) on my memory image and generated the following files, with the number of strings in each listed as well:

#wc -l *.txt

46589193 strings.txt

9507326 strings_l.txt

51787 strings_L.txt

696550 strings_b.txt

57015 strings_B.txt

That’s a lot of stuff. Even the 32-bit string types are present in the memory dump. Where in the dump are they? And are any of them useful, or are they just false positives and junk? Volatility strings plugin to the rescue. For each of the non-ASCII files above:

#python vol.py --profile=Win7SP1x64 -f mem_image.bin strings -s strings_<>.txt > vol_strings_<>.txt

For each of the vol_strings_<>.txt files created with Volatility, I looked for strings in kernel space; for any mention of EXE, DLL, LNK, PF, and SYS files; and for URLs and IP addresses (using Perl regexes found here).
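As a rough idea of what that kind of filtering involves, here is a minimal Python sketch. The patterns are hypothetical stand-ins written for illustration, not the Perl regexes used for the results in this post, and the name classify is made up. It uses the standard ipaddress module to reject dotted quads with octets above 255.

```python
import ipaddress
import re

# Hypothetical patterns for illustration; not the Perl regexes used in the post.
FILE_RE = re.compile(r"\.(exe|dll|lnk|pf|sys)\b", re.IGNORECASE)
URL_RE = re.compile(r"\b(?:https?|ftp)://[^\s\"'<>]+", re.IGNORECASE)
# Four dot-separated groups of 1-3 digits, not embedded in a longer dotted run.
IPV4_RE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")


def classify(line):
    """Return the set of categories ('file', 'url', 'ip') a line falls into."""
    hits = set()
    if FILE_RE.search(line):
        hits.add("file")
    if URL_RE.search(line):
        hits.add("url")
    for candidate in IPV4_RE.findall(line):
        try:
            ipaddress.IPv4Address(candidate)  # raises for octets above 255
        except ValueError:
            continue
        hits.add("ip")
    return hits


if __name__ == "__main__":
    for sample in (r"C:\Windows\System32\shell32.dll", "10.0.0.1", "999.0.0.1"):
        print(sample, classify(sample))
```

A check like this cannot tell an address from a four-part version number such as 1.2.3.4, so hits in the "ip" category still need a manual look.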

Big endian 32-bit strings gave the worst results. While there were 50k+ strings, almost all of them were junk. There were no file paths of any of the types listed above, no URLs, and no valid looking IP addresses. Maybe not worth the effort in all cases.

Little endian 32-bit strings fared only a bit better. There were a few dozen references to LNK files, mostly referencing Windows built-in tools, and there were some strings in kernel space related to VMware and some firewall tech. Ntoskrnl.exe contained some strings related to toolbars. No URLs were to be found and only a couple of valid looking IP addresses, probably related back to the VMware stuff.

On to big endian 16-bit strings. There was just a ton of stuff. There were about 10k URLs (drops to about 3k if we remove any mention of Microsoft) and 100k IPs (a large number of which were actually software version strings). There were references to 7k files ending in EXE, 16k ending in DLL, 500 ending in LNK, and on and on. You get the picture.

But I know there’s a bit of a catch. They could easily be duplicates of the little endian strings we all know and love, because many of the 16-bit little endian strings will look like ASCII characters separated by nulls – same as the big endian ones.
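That overlap can be seen with a few lines of Python. This is a minimal sketch, assuming Python's utf-16 codecs as a stand-in for 16-bit characters; the name wide_views is made up for illustration, and it does not claim to reproduce how strings actually scans a file. It only shows that one buffer of ASCII characters separated by nulls decodes to the same text under either byte order once the read window is shifted by a single byte.

```python
# Illustrative only: one byte buffer, read under both 16-bit byte orders.
def wide_views(text):
    """Encode text as 16-bit little endian, prepend one null byte, and
    return what each byte order decodes from the resulting buffer."""
    buf = b"\x00" + text.encode("utf-16-le")
    little = buf[1:].decode("utf-16-le")  # window starts at the first character
    big = buf[:-1].decode("utf-16-be")    # window starts one byte earlier
    return little, big


if __name__ == "__main__":
    print(wide_views("shell32.dll"))
```

Both views come back as the same text, which is why a hit in the big endian output is not, by itself, proof that the data was stored big endian.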

As an initial test for duplicates, I took the output of strings -e l and strings -e b for the same memory image. I filtered each of them through sort and uniq and used diff to spot the differences, then used grep to verify that some strings in the 16-bit big endian list were not also in the 16-bit little endian list. If that is unclear, I did the following:

#strings -a -e l mem_image.bin | sort | uniq > sorted_unique_strings_l.txt

#strings -a -e b mem_image.bin | sort | uniq > sorted_unique_strings_b.txt

#diff sorted_unique_strings_l.txt sorted_unique_strings_b.txt > l_versus_b_diff.txt

I then looked through the diff results and used grep to verify that a handful of the items diff said were only in the big endian file were in fact not in the little endian file from (way) above.

#grep "some_16_bit_big_endian_string" strings_l.txt

(returned nothing)
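The same check can be sketched in Python. This is a rough equivalent under stated assumptions, not the exact procedure above: it treats each list as a set, keeps what appears only in the big endian list, and then drops anything that still shows up as a substring of a little endian line (grep matches substrings, so this mirrors the grep step). The function name only_in_big_endian is made up, and the substring pass compares every candidate against every line, so it is only practical on small samples, not on millions of strings.

```python
def only_in_big_endian(little_lines, big_lines):
    """Return, sorted, the big endian strings that appear in the little
    endian list neither as a whole line nor as part of a longer line."""
    little = set(little_lines)
    # Whole-line difference: roughly what sort | uniq plus diff surfaces.
    candidates = sorted(set(big_lines) - little)
    # Substring check: roughly what grepping the little endian file does.
    return [s for s in candidates if not any(s in line for line in little)]


if __name__ == "__main__":
    little = ["abc", "xyz shell32.dll"]
    big = ["abc", "shell32.dll", "unique"]
    print(only_in_big_endian(little, big))
```

In the sample data, "shell32.dll" differs as a whole line but is still contained in a little endian line, so only "unique" survives both passes.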

How many of these strings are unique to the 16-bit big endian set? I have no idea. I’ve only just begun to dig around, and there’s still a chance I’m missing something simple (but I hope not). If anyone else has an explanation, please share.

This might require a part 2. But until then, happy hunting!

 

Vico and the 504ENSICS Team

@vicomarzale