Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

missing value checks #78

Open
hannesm opened this issue Jan 11, 2016 · 8 comments
Open

missing value checks #78

hannesm opened this issue Jan 11, 2016 · 8 comments
Labels

Comments

@hannesm
Copy link
Member

hannesm commented Jan 11, 2016

Should cstruct check that the value of a set_XXX is in range? At the moment the following assertion does not hold (set/get_uint8 is interchangable with uint16):

let c = Cstruct.create 1 in
let v = 256 in
Cstruct.set_uint8 c 0 v ;
let v' = Cstruct.get_uint8 c 0 in
assert (v = v')

Opinions?

@hannesm hannesm changed the title value checks missing value checks Jan 11, 2016
@djs55 djs55 added the question label Mar 12, 2016
@djs55
Copy link
Member

djs55 commented Mar 12, 2016

I would like cstruct to check if values are in range -- I'm sure I've made mistakes which have gone undetected for longer because it doesn't check. Since cstruct already raises Invalid_argument if buffer bounds are out-of-range I think it would be consistent to raise Invalid_argument in these cases too.

@dinosaure
Copy link
Member

Should we just add a mask (such as land 0xff) in this case to ensure to store the value without overflow?

@avsm
Copy link
Member

avsm commented Jul 29, 2020

It’s frustrating to have this be an overhead on constant arguments. If there’s no way to convince the optimiser to check overflow at compile time, we could consider a small ppc checker for this. A mask on every single set call for a constant should be avoided if we can, but I agree it does need to be checked at build time.

@dinosaure
Copy link
Member

It’s frustrating to have this be an overhead on constant arguments.

Obviously we can infer a possible overhead but:

  1. it seems that OCaml 4.10.0 deletes this overhead where we manipulate a constant
  2. we can have some surprises on performance costs from the CPU prediction (as the bound-checking)

Let's start with this simple code:

let pp_uint16 ppf x =
  Format.fprintf ppf "%04x" (x land 0xffff)
[@@inline]

let () =
  pp_uint16 Format.std_formatter 42 ;
  pp_uint16 Format.std_formatter (Sys.argv.(1))

OCaml generates this x86_64 ASM:

; pp_uint16 Format.std_formatter 42
        movl    $85, %ebx ; $85 is [(42 lsl 1) + 1]
        movq    %rbx, 8(%rsp)
        movq    camlExample__4@GOTPCREL(%rip), %rbx
        movq    %rbx, (%rsp)
        call    camlStdlib__format__fprintf_1166@PLT

And

; pp_uint16 Format.std_formatter (int_of_string Sys.argv.(1)
; [int_of_string Sys.argv.(1)] is stored into %rax
        andq    $131071, %rax ; it's the inlined part [x land 0xffff] where [131071 = (0xffff lsl 1) + 1]
        movq    %rax, 8(%rsp)
        movq    camlExample__4@GOTPCREL(%rip), %rax
        movq    %rax, (%rsp)
        movq    %rbx, %rax
        call    camlStdlib__format__fprintf_1166@PLT

The [@@inline] tag is not really needed for the first case.

@avsm
Copy link
Member

avsm commented Jul 29, 2020

That looks good to me. Thanks for checking. Let's do the mask then, and prevent overflow.

@hannesm
Copy link
Member Author

hannesm commented Jul 30, 2020

I understand masking helps to avoid the overflow(s), but with the issue from dns linked above, it will only hide the actual issue by silently truncating (a semantics I'm not too fund of, since it is pretty hard to debug).

TL;DR: similar to checking offset and length on get&set, it's a question of design (see bigstringaf as alternative in that space). An alternative proposal (API breaking) would be to define uint8 / uint16 / uint32 / uint64 as abstract types and specify set to use these types -- then there could be constructors that check the value ranges, and "unsafe" ones that are just the identity, and "masking" ones as proposed above.

@dinosaure
Copy link
Member

An alternative proposal (API breaking) would be to define uint8 / uint16 / uint32 / uint64 as abstract types and specify set to use these types -- then there could be constructors that check the value ranges, and "unsafe" ones that are just the identity, and "masking" ones as proposed above.

I think this choice should be lead by optimization done by the compiler. I'm not sure that the compiler can delete a branch if we give to it a constant which respects the range. However, again, the CPU prediction can lead on that. From my perspective, I like the idea of abstract type but we fallback to another issue: #86.

@hannesm
Copy link
Member Author

hannesm commented Jul 30, 2020

since there are some performance notes in this issue and #86, maybe we should first establish some benchmarks (micro and real-world usage), then re-think safety/correctness and then performance again. Re-reading #86, I think abstract types (and conversions) are the way to go in respect to safety. Performance-wise, I struggle whether there should be some sort of Cstruct.Unsafe for non-checked operations (my intuition is when I receive an IPv4 packet, and check that it is at least header length bytes, I should be able to access individual header fields without performance penalty of bounds checks), i.e.:

let do_something_with buf =
  guard (Cstruct.len buf > Ipv4_packet.header_size) "too small" >>= fun () ->
  let typ = Cstruct.get_uint8 buf 3 in (* this should not induce bounds checks since I just checked that in the line above *)
  ...

EDIT: same for serialising:

let buf = Cstruct.create 20 in
Cstruct.set_uint8 buf 0 0x2F; (* should neither bound check nor value bound check :D *)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants