---
title: "FFI Boundary Semantics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{FFI Boundary Semantics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

This article is about what actually crosses the R/C boundary in `Rtinycc`:

- when values are copied
- when they are borrowed
- when they stay as raw addresses
- when wrappers allocate temporary storage

The statements below are based on the implemented wrapper generator and runtime helpers.

## Scalar Inputs Are Converted

Scalar inputs are converted at the boundary. For example:

- `i8`, `i16`, `i32`, `u8`, `u16` use integer coercion plus range checks
- `i64`, `u32`, `u64` use numeric coercion plus integer-value checks
- `bool` rejects `NA`
- `f32` and `f64` are read from R numerics

So scalar arguments are not zero-copy views into R objects. They become C scalars inside the wrapper.

## Vector Inputs Are Usually Borrowed

The array input types:

- `raw`
- `integer_array`
- `numeric_array`
- `logical_array`

are passed as writable direct pointers into the underlying R vector storage. For ordinary already-materialized vectors, no extra buffer is allocated by the wrapper.

That means:

- C sees the existing vector data
- mutation from C writes into the same memory region
- the current array input types intentionally use R's writable pointer access path because their C signatures receive mutable pointers

This is the main zero-copy part of the FFI boundary.

One important R-specific caveat is ALTREP: if an input vector is an ALTREP object, asking R for a writable C data pointer can materialize that vector. Checking `ALTREP(x)` in generated C could choose a different policy, but using `*_GET_REGION()` into a temporary buffer would be a different contract for these mutable types: it would need copy-back behavior and would not automatically preserve pointer aliasing when the same R vector is passed to multiple C arguments. Read-only ALTREP-friendly array types would be a cleaner separate API.
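
The aliasing point can be illustrated in plain C, independent of R. The following is a minimal sketch under stated assumptions, not Rtinycc's generated wrapper code: `write_then_read`, `call_borrowed`, and `call_copy_in_out` are hypothetical names, and the `malloc()` buffers stand in for the temporary storage a `*_GET_REGION()`-style path would need.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callee: writes through dst, then reads through src.
 * If both pointers alias the same storage, the read observes the write. */
int write_then_read(int *dst, const int *src) {
    dst[0] = 42;
    return src[0];
}

/* Borrowed contract: the callee receives the caller's pointers unchanged. */
int call_borrowed(int *dst, const int *src) {
    return write_then_read(dst, src);
}

/* Copy-in/copy-back contract: each argument gets its own temporary buffer,
 * so two arguments that named the same vector no longer alias.
 * Returns -1 if allocation fails (adequate for this demo only). */
int call_copy_in_out(int *dst, const int *src, size_t n) {
    assert(n >= 1);
    int *tmp_dst = malloc(n * sizeof *tmp_dst);
    int *tmp_src = malloc(n * sizeof *tmp_src);
    int result = -1;
    if (tmp_dst != NULL && tmp_src != NULL) {
        memcpy(tmp_dst, dst, n * sizeof *tmp_dst);
        memcpy(tmp_src, src, n * sizeof *tmp_src);
        result = write_then_read(tmp_dst, tmp_src);
        /* Copy back only the mutable argument. */
        memcpy(dst, tmp_dst, n * sizeof *dst);
    }
    free(tmp_dst);
    free(tmp_src);
    return result;
}
```

When the same array is passed for both arguments, `call_borrowed()` lets the callee read back its own write, while `call_copy_in_out()` hands it two independent snapshots. The caller still sees the mutation after copy-back, but the callee's view of aliasing has changed, which is why a temporary-buffer path would be a different contract.
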

## `tcc_call_symbol()` Uses `.C()`-Style Copy-In/Copy-Out

When `tcc_call_symbol()` is called with extra arguments, it follows the argument-type mapping of R's `.C()` interface rather than the zero-copy array contract used by `tcc_ffi()` wrappers. For atomic vectors and character vectors, Rtinycc copies inputs into guarded mutable call storage and returns a list containing the copied-back values. This means C mutations are visible in the returned list, not by mutating the original R objects.

Numeric vectors with `attr(x, "Csingle")` are converted through a temporary `float *` buffer, matching R's legacy `.C()` convention.

Lists and other non-atomic R objects follow the legacy `.C()` read-only paths: lists are exposed as `SEXP *`, while functions, environments, and other R objects are exposed as `SEXP`. These values are borrowed only for the duration of the call. C code must not mutate them through `tcc_call_symbol()`, and if it stores a `SEXP` beyond the call it must preserve and later release it with the R C API. For ordinary lists, Rtinycc can pass the existing `SEXP *` element storage read-only; for ALTREP or otherwise opaque list-like vectors, it rebuilds a temporary call-lifetime `SEXP *` view with `VECTOR_ELT()` so the path remains ALTREP-aware without forcing writable data access.

Unlike R's optional `options(CBoundsCheck = TRUE)`, `tcc_call_symbol()` checks guard bytes around its copied atomic and character buffers by default. This can catch simple underwrites and overwrites, but it is not a sandbox: far out-of-bounds native writes are still bugs in the called C code. For character arguments, C may edit the contents of each copied string buffer in place, but it must not replace the `char *` elements in the `char **` array.

## `cstring_array` Is Rebuilt Per Call

`cstring_array` is different. The wrapper allocates a temporary `const char **` with `R_alloc()` and fills it by translating each R string element.
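
The per-call rebuild can be pictured in plain C. This is a hedged sketch, not the wrapper's actual code: `build_cstring_view` is a hypothetical name, `malloc()` stands in for `R_alloc()` (so the caller frees the array explicitly), and the input strings stand in for already-translated string data.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-call rebuild: allocate a fresh pointer
 * array and point each slot at existing string data.
 * Returns NULL if allocation fails. */
const char **build_cstring_view(const char *const *strings, size_t n) {
    assert(strings != NULL);
    /* Allocate at least one slot so n == 0 still yields a valid pointer. */
    const char **view = malloc((n > 0 ? n : 1) * sizeof *view);
    if (view == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        /* The slot refers to the string data; no bytes are copied here. */
        view[i] = strings[i];
    }
    return view;
}
```

The returned array is new storage on every call, even though each slot refers to string data that lives elsewhere.
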

So:

- the pointer array itself is allocated for the call
- each element points at translated string data
- this is not the same as passing a pre-existing C array through unchanged

## Returned Arrays Are Copied into Fresh R Vectors

Array returns are always copied into a newly allocated R vector.

The wrapper uses the declared `length_arg` to size the R result, then `memcpy()` copies the returned C buffer into that vector. If `free = TRUE`, the wrapper also frees the original returned buffer after the copy.

So array returns are not borrowed views into C memory.

## Returned `cstring` Values Are Copied

For `cstring` returns, the wrapper creates an R string with `mkString()` when the returned pointer is non-NULL.

That means the resulting R value is a copy in R-managed memory, not a retained external pointer to the original C string.

## Returned `ptr` Values Stay as Pointers

For `ptr` returns, the wrapper constructs an external pointer around the raw address.

That means:

- no pointee copy is made
- ownership is not implied
- the pointer may dangle if the underlying C storage goes away

The same distinction matters for globals and struct fields.

## `sexp` Passes Through Directly

`sexp` is the most direct boundary mode:

- input `sexp` arguments are passed through as `SEXP`
- returned `sexp` values are returned directly

This is useful when you want the R C API contract rather than the stricter FFI conversion layer.

## Owned vs Borrowed Helper Pointers

At the helper level:

- `tcc_malloc()` and `tcc_cstring()` create owned external pointers
- `tcc_data_ptr()` and `tcc_read_ptr()` return borrowed external pointers
- struct field address helpers and many raw pointer returns are borrowed views
- named nested struct getters such as `struct_outer_get_child()` return borrowed nested views into the owning struct storage

Use `tcc_ptr_is_owned()` when you need to distinguish these cases in R code.
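
One way to picture the owned/borrowed split is a handle that records whether it is responsible for freeing its address. This is a hedged plain-C sketch: `ptr_handle`, `handle_owned`, `handle_borrowed`, and `handle_release` are hypothetical names chosen for illustration, not Rtinycc's internal representation and not how `tcc_ptr_is_owned()` is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle: an address plus an ownership flag. */
typedef struct {
    void *addr;
    bool  owned;
} ptr_handle;

/* Owned: the handle allocated the storage and is responsible for freeing it. */
ptr_handle handle_owned(size_t size) {
    ptr_handle h = { malloc(size), true };
    if (h.addr == NULL) {
        h.owned = false;  /* nothing to free if allocation failed */
    }
    return h;
}

/* Borrowed: the handle only views storage that something else manages. */
ptr_handle handle_borrowed(void *addr) {
    ptr_handle h = { addr, false };
    return h;
}

/* Release frees only owned storage; a borrowed handle just forgets the
 * address. Returns true if storage was actually freed. */
bool handle_release(ptr_handle *h) {
    assert(h != NULL);
    bool freed = false;
    if (h->owned && h->addr != NULL) {
        free(h->addr);
        freed = true;
    }
    h->addr = NULL;
    h->owned = false;
    return freed;
}
```

Releasing a borrowed handle never frees anything, so a borrowed view must not outlive its owner: once the owning handle is released, any remaining borrowed address is dangling.
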

## Bitfields Are Scalar Helpers, Not Addressable Views

Bitfield helpers behave like scalar getter/setter helpers at the R boundary, but that does **not** make them ordinary addressable fields.

In particular:

- bitfield getters return copied scalar values
- bitfield setters write scalar values back through the compiler-managed bitfield storage
- `tcc_field_addr()` and `tcc_container_of()` reject bitfield members

So bitfields are intentionally excluded from the borrowed-address helper model.

## Serialization Boundary

Compiled `tcc_compiled` objects store enough recipe information to recompile after `serialize()` / `unserialize()` or `readRDS()`.

Raw pointers and raw `tcc_state` objects do not gain that behavior. After serialization they are just dead addresses or invalid states, not auto-reconstructed resources.

The same applies to callback tokens, struct/union external pointers, and helper allocations from `tcc_malloc()` or `tcc_cstring()`: they do not serialize as live native resources.