tccquickr compiles a
deliberately narrow, declared subset of R. This
vignette states that subset precisely: the type language you declare,
which syntactic constructs are accepted (and what “accepted” means),
where the boundaries to the R interpreter are, and which optimizations
the middle-end performs.
Every compiled function begins with declare(type(...)).
A declared type is an element mode at a rank:
| Rank | How you write it | Meaning |
|---|---|---|
| scalar (0) | double(), integer(),
logical() |
a length-1 value |
| vector (1) | double(NA), integer(NA),
double(n) |
a vector; NA = unknown length, a name = a symbolic
extent |
| matrix (2) | double(nrow, ncol), double(r, c) |
a 2-D array; bounds may be symbolic |
Element modes are double, integer, and
logical (plus the internal xlen index type).
Symbolic extents (n, num_states, …) tie
argument shapes together and feed shape/domain reasoning; they are how
the Viterbi example in vignette("getting-started") relates
its matrices and vectors.
Each R construct has one of three dispositions, and the boundary between them is the spec boundary of the compiler:
fallback = "auto") to an explicit
r_eval boundary node that calls back into R. Rejected under
the default fallback = "hard".This table is computed by probing the actual
compiler, so it cannot drift from the implementation. It is
grounded in R’s own grammar, the yacc grammar src/main/gram.y.
| Construct | Probe | Disposition |
|---|---|---|
| add | x + 1 | core |
| comparison != | x != 0 | core |
| comparison < | x < 1 | core |
| comparison <= | x <= 0 | core |
| comparison == | x == 0 | core |
| comparison > | x > 0 | core |
| comparison >= | x >= 0 | core |
| divide | x / 2 | core |
| fold reducer call | sum(x) | core |
| for loop + indexed write | out <- x; for (i in 1:n) { out[i] <- out[i] * 2 }; out | core |
| if / else (scalar) | if (n > 0L) n else -n | core |
| integer constant | x + 1L | core |
| integer divide | x %/% 2 | core |
| local assignment | a <- x + 1; a | core |
| logical and (&) | (x > 0) & (x < 1) | core |
| logical or (|) | (x > 0) | (x < 1) | core |
| multiply | x * 3 | core |
| numeric constant | x * 2 | core |
| parenthesised | (x + 1) * 2 | core |
| pipe (|>) | x |> sin() | core |
| power | x ^ 2 | core |
| range slice ([) | x[1:n] | core |
| single index ([) | x[1L] | core |
| subtract | x - 1 | core |
| switch (integer) | switch(n, 1, 2, 3) | core |
| symbol | x | core |
| unary math call | sin(x) | core |
| unary minus | -x | core |
| unary not | !(x > 0) | core |
| unary plus | +x | core |
| vectorized ifelse | ifelse(x > 0, x, -x) | core |
| double index ([[) | x[[1L]] | boundary |
| logical and (&&) | (n > 0L) && (n < 9L) | boundary |
| logical or (||) | (n > 0L) || (n < 9L) | boundary |
| modulo | x %% 2 | boundary |
| unknown call | qux(x) | boundary |
| anonymous function | (function(z) z)(x) | rejected |
| break | out <- x; for (i in 1:n) { if (i > 0L) break; out[i] <- 0 }; out | rejected |
| colon sequence | 1:n | rejected |
| dollar (\() |x\)a | rejected | |
| namespaced call (::) | base::sin(x) | rejected |
| next | out <- x; for (i in 1:n) { if (i > 0L) next; out[i] <- 0 }; out | rejected |
| NULL | NULL | rejected |
| repeat loop | repeat { break }; x | rejected |
| slot (@) | x@a | rejected |
| string constant | “hello” | rejected |
| while loop | while (n > 0L) { }; x | rejected |
A drop in this coverage, or a core construct silently
becoming a boundary, is caught by
inst/tinytest/test_tccq_grammar.R.
The subset is small on purpose, and the rules are explicit so they live in the IR and middle-end rather than appearing accidentally in the C emitter:
a <- expr is a local binding, not
mutation.x[i] is a checked, 1-based indexed
read; x[idx] is a gather.x[lo:hi] is a contiguous slice/view
(an explicit view1 node), so a reduction over a slice need
not copy first.a[i] <- v and a[lo:hi] <- v are
mutation barriers on local storage.x[1] <- v on an input) is rejected — inputs are
borrowed.direct_formal_mutation <- function(x, v) {
declare(type(x = double(NA)), type(v = double()))
x[1] <- v
x
}
tccq_compile(direct_formal_mutation, mode = "code")
#> Error:
#> ! indexed assignment currently requires a local vector binding. Write y <- x; y[...] <- value; y instead of mutating formal 'x' directly.Writing through a local binding is fine: y <- x
starts as an alias to the input, and the first indexed write forces
materialization into owned storage.
Optimization is organized around a three-region split, and the regions are explicit in the IR rather than implied by codegen:
Rf_* calls. This is
the only region targeted by optimization.TYPEOF,
XLENGTH, PROTECT, …).boundary_call node
(currently backed by r_eval). It is a legality barrier:
optimization never reorders or fuses across it.Unsupported calls do not vanish into target special cases. Under the
default fallback = "hard" they are an error; under
fallback = "auto" they become an inspectable boundary
node:
The frontend lowers declared R into typed IR; the middle-end runs an ordered pass pipeline; the target prints C; a backend compiles it. The default passes (introspected live):
| Pass | What it does |
|---|---|
| validate_ir | Structurally validate the typed IR. |
| effects | Classify expression effects (pure vs boundary-bearing). |
| index_normalize | Normalize index/view access chains into a canonical form. |
| shape_domains | Derive shape/domain facts for vector expressions and locals. |
| kernelize | Rewrite expressions into explicit kernel IR (producer / materialize / fold / scalar_kernel). |
| fusion | Fuse legal producer/fold patterns (e.g. fold over a materialized producer becomes a fold over the producer). |
| boundary_collect | Gather explicit r_eval boundary nodes. |
| boundary_validate | Validate boundary-node legality and the R APIs they use. |
| storage_plan | Decide ownership: borrowed formals, owned locals, aliases, views, and materialization points. |
| allocation_plan | Plan allocations for outputs and temporaries. |
| protect_plan | Plan PROTECT/UNPROTECT for the C-API wrapper region. |
The kernel IR concepts are what make the high-value decisions explicit:
producer — an element computation over
an index domain.materialize(producer) — allocate an
output and write producer elements.fold(op, domain, elem) — reduce an
element expression over a domain.Fusion is therefore an IR rewrite, not an emitter accident. For
sum((sin(x) + y) * y), kernelize first
represents the reduction as a fold over a materialized
producer; fusion rewrites it to a fold over the producer
directly:
mod <- tccquickr:::tccq_lower_module(tccquickr:::tccq_frontend(function(x, y) {
declare(type(x = double(NA), y = double(NA)))
sum((sin(x) + y) * y)
}))
base_passes <- list(
tccquickr:::tccq_pass_validate_ir(), tccquickr:::tccq_pass_effects(),
tccquickr:::tccq_pass_index_normalize(), tccquickr:::tccq_pass_shape_domains(),
tccquickr:::tccq_pass_kernelize()
)
after_kernelize <- tccquickr:::tccq_run_passes(mod, base_passes)
after_fusion <- tccquickr:::tccq_run_passes(mod, c(base_passes, list(tccquickr:::tccq_pass_fusion())))
after_kernelize
#> <tccq_module>
#> entry: tccq_entry
#> formals:
#> - x : double[NA]
#> - y : double[NA]
#> ir: program
#> kernel: fold
#> result type: double
after_fusion
#> <tccq_module>
#> entry: tccq_entry
#> formals:
#> - x : double[NA]
#> - y : double[NA]
#> ir: program
#> kernel: fold
#> result type: doubleThe current target is C, but the pipeline is built so the high-value
decisions (types, ranks, ownership, shape domains, the elementwise
domain, boundaries) are made before any target syntax. A
lowered IR seam sits between the middle-end and the printers so the same
decisions can drive other targets later (Fortran/Rust/devices). The
architecture and verification strategy — including the R-as-oracle
conformance number and the Lean transformation-soundness proofs — are
recorded in the decision records under docs/decisions/ on
GitHub.