How to read raw binary without definition, and re-write to binary? #736

siebeneicher · 2017-03-30T00:27:01Z

For a project I need to read a binary without having its proto definition. Google's protoc.exe prints something readable, but I also need to change specific content and then write the result back to binary.

Any general advice? Would I need to dive deep in the protocol to understand how to decode manually?

Or would you suggest taking the protoc.exe output, transforming it to, let's say, JSON, and rewriting it (with a somehow reverse-engineered proto)?

I am not necessarily tied to protobuf.js or any particular technology.

Any general advice is super welcome!

dcodeIO · 2017-03-30T00:34:36Z

You could reverse engineer the definition. It's not that hard actually and once this is done, you'd not be limited anymore.

Alternatively, there is the low level API for working with the wire format (example) that could also help you to identify the format.

siebeneicher · 2017-03-30T13:36:14Z

I find your example very interesting and I will continue down that road!

So far I am analyzing this first part of a buffer:

0a df 11 32 9e 05 08 02 12 1c 0a 09 62 72 6f 77 73 65 5f 69 64 12 0f 46 45 77 68 61 74 5f 74 6f 5f 77 61 74 63 68 ...

But I struggled after some parts... hope you bear with me.

From what I understood:

// 0a = 10dec = 0000 1010 = msb: 0, id: 1, wiretype: 2
// df = 223dec = 1101 1111 = msb: 1
// 11 = 17dec = 0001 0001 = msb: 0 concat: 11+df => 001 0001 + 101 1111 => 2271dec

My conclusion is: wire type 2 (length-delimited) with a length of 2271 bytes.

So that's why I do:

	var reader = protobuf.Reader.create(rbuffer);
	while (reader.pos < reader.len) {
	    var tag = reader.uint64();		// get max. 8bytes, does take MSB in consideration, returns full tag

		// 1st bit 		= msb
		// 2-4th bit 	= id
		// 5-8th bit 	= wire msg

	    var id = tag >>> 3;						// shift 3 bits out (id = 4 bits)
	    var wireType = tag & 7;					// decimal of last 3 bits
	    console.log(tag, wireType);

	    switch (wireType) {
	        case 2:
	        	var l = reader.uint64();
	            console.log(reader.string());
	            break;
	        default:
	            //reader.skipType(/*wireType*/ tag & 7);
	            break;
	    }
	}

Here is the console.log output from reader.string():

"��
browse_id��FEwhat_to_watch��
�context��yt_"

which does not look correct.

Parsing the same buffer with protoc.exe --decode_raw < buffer returns:

1 {
  6 {
    1: 2
    2 {
      1: "browse_id"
      2: "FEwhat_to_watch"
    }
    2 {
      1: "context"
      2: "yt_android_w2w"
    }
    2 {
      1: "has_unlimited_entitlement"
      2: "False"
    }
....

So I expect I am missing something in the interpretation.

Is the string perhaps nested, so that I have to apply the same process to the value returned by string()?

How can I determine if it's proto v2 or v3?

Very glad for any feedback from you!

Cheers,

Markus

dcodeIO · 2017-03-30T13:59:47Z

So I expect I am missing something in the interpretation.

Looks like it's not just bytes, but submessages, so ...

Is the string perhaps nested, so that I have to apply the same process to the value returned by string()?

Yep, but it's a buffer rather than a string, so read it with .bytes() instead of .string().

0a	id 1, wireType 2
df	95 (MSB set, so another byte follows)
11	17 (MSB clear, last byte): (17 << 7) | 95 = 2271

That is either 2271 bytes of a string, of a buffer, or of a sub-message. Let's assume a sub-message:

32	id 6, wireType 2
9e	30 (MSB set, so another byte follows)
05	5 (MSB clear, last byte): (5 << 7) | 30 = 670

This looks like a sub-message (it also corresponds to what protoc outputs: note the 6, which is the field id here).

Going by the protoc output, this pattern continues. The message structure is roughly:
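The byte-by-byte arithmetic above can be checked with a few lines of plain JavaScript. This is only a minimal sketch that does not use protobuf.js: `readVarint` and `splitTag` are hypothetical helper names, and plain numbers are used, so it is only safe for values below 2^53.

```javascript
// Decode one base-128 varint starting at `pos`.
// Returns the decoded value and the position of the next unread byte.
function readVarint(bytes, pos) {
  let value = 0;
  let shift = 0;
  for (;;) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift; // low 7 bits carry data, least significant group first
    if ((b & 0x80) === 0) break;      // MSB clear: this was the last byte
    shift += 7;
  }
  return { value, pos };
}

// Split a decoded tag into field id (upper bits) and wire type (lowest 3 bits).
function splitTag(tag) {
  return { id: Math.floor(tag / 8), wireType: tag % 8 };
}

// The first six bytes of the buffer quoted in this thread.
const head = Uint8Array.from([0x0a, 0xdf, 0x11, 0x32, 0x9e, 0x05]);

const outerTag = readVarint(head, 0);
const outerLen = readVarint(head, outerTag.pos);
const innerTag = readVarint(head, outerLen.pos);
const innerLen = readVarint(head, innerTag.pos);

console.log(splitTag(outerTag.value), outerLen.value); // field 1, wire type 2, length 2271
console.log(splitTag(innerTag.value), innerLen.value); // field 6, wire type 2, length 670
```

Note that the field id is not limited to 4 bits: the tag is itself a varint, so everything above the lowest three bits belongs to the id.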

message {
  field 6 (submessage) {
    field 1 (varint or fixed),
    field 2 (submessage) {
      field 1 (string),
      field 2 (string)
    }
    field 2 ... again, hence: repeated
  }
}

etc. As you see, protoc's output is a good indicator of the field ids to expect. It also indirectly shows possible data types (strings, submessages with braces, but numbers could be varints or fixed32/64 bits).
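To see the nesting without a definition, one option is a small recursive walker over the wire format. The following is a rough sketch in plain JavaScript, not protobuf.js code; `decodeRaw` and `guess` are hypothetical names, and the rule for telling strings, sub-messages and raw bytes apart is an assumed heuristic (printable ASCII first, then sub-message, then raw bytes) that can misclassify real data. Also keep in mind that protobuf.js's `reader.string()` and `reader.bytes()` read the length prefix themselves, so reading a length separately before calling them shifts the parse by one field.

```javascript
// Decode one base-128 varint at `pos` (values below 2^53 only).
function readVarint(bytes, pos) {
  let value = 0;
  let shift = 0;
  for (;;) {
    if (pos >= bytes.length) throw new RangeError("truncated varint");
    const b = bytes[pos++];
    value += (b & 0x7f) * 2 ** shift;
    if ((b & 0x80) === 0) break;
    shift += 7;
  }
  return { value, pos };
}

// Walk `bytes` as a message of unknown schema.
// Returns [{ id, wireType, value }, ...] or throws if the bytes are not well-formed.
function decodeRaw(bytes) {
  const fields = [];
  let pos = 0;
  while (pos < bytes.length) {
    const tag = readVarint(bytes, pos);
    pos = tag.pos;
    const id = Math.floor(tag.value / 8);
    const wireType = tag.value % 8;
    if (id === 0) throw new RangeError("field id 0 is invalid");
    let value;
    if (wireType === 0) {                          // varint
      const v = readVarint(bytes, pos);
      value = v.value;
      pos = v.pos;
    } else if (wireType === 1 || wireType === 5) { // fixed64 / fixed32, kept as raw bytes
      const size = wireType === 1 ? 8 : 4;
      if (pos + size > bytes.length) throw new RangeError("truncated fixed field");
      value = bytes.slice(pos, pos + size);
      pos += size;
    } else if (wireType === 2) {                   // length-delimited
      const len = readVarint(bytes, pos);
      const end = len.pos + len.value;
      if (end > bytes.length) throw new RangeError("truncated length-delimited field");
      value = guess(bytes.subarray(len.pos, end));
      pos = end;
    } else {
      throw new RangeError("unsupported wire type " + wireType);
    }
    fields.push({ id, wireType, value });
  }
  return fields;
}

// Assumed heuristic: printable ASCII is a string, otherwise try a sub-message,
// otherwise keep the raw bytes.
function guess(chunk) {
  const printable = chunk.length > 0 && chunk.every((b) => b >= 0x20 && b < 0x7f);
  if (printable) return new TextDecoder().decode(chunk);
  try {
    return decodeRaw(chunk);
  } catch (e) {
    return chunk;
  }
}

// Bytes 7..38 of the buffer quoted in this thread (the start of field 6's content).
const sample = Uint8Array.from([
  0x08, 0x02, 0x12, 0x1c,
  0x0a, 0x09, 0x62, 0x72, 0x6f, 0x77, 0x73, 0x65, 0x5f, 0x69, 0x64,
  0x12, 0x0f, 0x46, 0x45, 0x77, 0x68, 0x61, 0x74, 0x5f, 0x74, 0x6f, 0x5f,
  0x77, 0x61, 0x74, 0x63, 0x68,
]);

const decoded = decodeRaw(sample);
console.log(JSON.stringify(decoded, null, 2));
```

On this fragment the walker yields field 1 as the varint 2 and field 2 as a sub-message holding the two strings, which matches the first entries of the protoc output.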

How can I determine if it's proto v2 or v3?

You cannot. The proto3 wire format does not differ from proto2; the differences are that all field declarations are implicitly optional, plus the introduction of language-level constructs like oneofs. When reverse-engineering, it's better to declare everything optional anyway, so it's safe to use proto3 here.

siebeneicher · 2017-03-30T15:33:36Z

That makes sense. I assume this buffer uses v3, because nested messages in v2 would have wire type 3 or 4, no?

dcodeIO · 2017-04-11T13:29:10Z

I assume this buffer uses v3, because nested messages in v2 would have wire type 3 or 4, no?

No, wire types 3 and 4 are for legacy groups, a feature already long deprecated in proto2. On the wire, proto2 and proto3 do not differ much; the changes are mostly language-level, like all fields being optional and new data types, and those new types use backward-compatible encoding.
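Because the encoding is the same for both versions, the re-writing half of the original question can also be handled at the wire level once the structure is known. Below is a minimal sketch in plain JavaScript, not the protobuf.js Writer API; `encodeVarint`, `varintField` and `bytesField` are hypothetical helpers, and only varint and length-delimited fields with values below 2^53 are covered. It rebuilds the fragment of field 6 quoted earlier in the thread from its decoded values.

```javascript
// Encode a non-negative integer below 2^53 as a base-128 varint (array of byte values).
function encodeVarint(n) {
  const out = [];
  while (n > 0x7f) {
    out.push((n % 0x80) | 0x80); // low 7 bits, MSB set: more bytes follow
    n = Math.floor(n / 0x80);
  }
  out.push(n);                   // last byte, MSB clear
  return out;
}

// Wire type 0: tag followed by the varint value.
function varintField(id, value) {
  return [...encodeVarint(id * 8 + 0), ...encodeVarint(value)];
}

// Wire type 2: tag, payload length, payload (string bytes, raw bytes, or an encoded sub-message).
function bytesField(id, payload) {
  return [...encodeVarint(id * 8 + 2), ...encodeVarint(payload.length), ...payload];
}

const utf8 = (s) => Array.from(new TextEncoder().encode(s));

// Sub-message with the two strings seen in the protoc output.
const pair = [
  ...bytesField(1, utf8("browse_id")),
  ...bytesField(2, utf8("FEwhat_to_watch")),
];

// Field 1 (varint 2) followed by field 2 (the sub-message above).
const message = Uint8Array.from([...varintField(1, 2), ...bytesField(2, pair)]);

const hex = Array.from(message, (b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex);
```

Changing one of the strings and re-encoding updates every enclosing length prefix automatically, which is the part that is easy to get wrong when patching bytes in place.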

dcodeIO added the question label Mar 30, 2017

dcodeIO closed this as completed Jun 9, 2017

konsumer mentioned this issue Sep 14, 2017

Raw reader #910

Open
