Rducks keeps SQL type semantics explicit. DuckDB owns binding and execution; Rducks maps DuckDB values into R values, calls the R function, and writes values back to DuckDB with the declared result type.
Three concepts are intentionally separate:
mode = "scalar" calls R once per row;
mode = "vectorized" calls R once per DuckDB chunk;
the execution plan pairs arrow_r, arrow_c, or arrow_ipc marshalling with an allowed concurrency model.
Changing a connection’s default execution plan affects future scalar-UDF registrations and the matching native runtime backend; it does not rewrite an existing scalar UDF to a different marshalling engine.
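The difference in call granularity between the two modes can be sketched in base R, without Rducks. The helpers `run_scalar` and `run_vectorized`, the `counter` environment, and the two-chunk split are hypothetical illustrations, not package API; they only mimic how often the user function would be entered.

```r
# Hypothetical stand-ins for the two evaluation modes; not Rducks API.
counter <- new.env()
counter$n <- 0L

add_one <- function(x) {
  counter$n <- counter$n + 1L  # count how many times R is entered
  x + 1L
}

# "scalar": one R call per row of every chunk.
run_scalar <- function(fun, chunks) {
  unlist(lapply(chunks, function(chunk) vapply(chunk, fun, integer(1))))
}

# "vectorized": one R call per chunk, receiving the whole vector.
run_vectorized <- function(fun, chunks) {
  unlist(lapply(chunks, fun))
}

chunks <- list(1:3, 4:5)  # two illustrative chunks, five rows in total

counter$n <- 0L
scalar_result <- run_scalar(add_one, chunks)
scalar_calls <- counter$n          # 5: one call per row

counter$n <- 0L
vectorized_result <- run_vectorized(add_one, chunks)
vectorized_calls <- counter$n      # 2: one call per chunk
```

Both runs return the same values; only the number of R calls differs, which is why a function written for one mode must accept the input shape of that mode.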
Rducks descriptors describe DuckDB logical types, including primitive, exact, and composite values.
primitive <- list(INTEGER, DOUBLE, BOOLEAN, VARCHAR)
exact <- list(UUID, HUGEINT, DECIMAL(18, 4), INTERVAL, BIT)
semi_structured <- list(GEOMETRY, VARIANT)
composite <- list(
  INTEGER[],
  ARRAY(DOUBLE, 3),
  STRUCT(id = INTEGER, label = VARCHAR),
  MAP(VARCHAR, DOUBLE),
  UNION(i = INTEGER, s = VARCHAR)
)
Declared scalar-UDF arguments pin the SQL signature:
rducks_register_scalar_udf(
  con,
  name = "r_add_one",
  fun = function(x) x + 1L,
  args = INTEGER,
  returns = INTEGER
)
#> <rducks_scalar_udf_registration>
#> registered: yes
#> name: r_add_one
#> evaluation_mode: scalar
#> plan: arrow_r+serial
#> signature: r_add_one(INTEGER) -> INTEGER
Omitting args registers a dynamic DuckDB varargs function. At bind time, DuckDB supplies the concrete logical types for the SQL call, and Rducks uses those bound types for the same input materialization it would use for explicit args.
rducks_register_scalar_udf(
  con,
  name = "r_payload_label",
  fun = function(payload) paste(payload$label, payload$x, sep = ":"),
  returns = VARCHAR
)
#> <rducks_scalar_udf_registration>
#> registered: yes
#> name: r_payload_label
#> evaluation_mode: scalar
#> plan: arrow_r+serial
#> signature: r_payload_label(...) -> VARCHAR
DBI::dbGetQuery(con, "
  SELECT r_payload_label(struct_pack(x := 3::INTEGER, label := 'a')) AS label
")
#>   label
#> 1   a:3
Use args = NULL for a true zero-argument UDF.
null_handling = "default" follows DuckDB’s default
scalar-UDF contract: if a top-level input is SQL NULL,
DuckDB produces SQL NULL without calling R.
null_handling = "special" passes top-level SQL
NULL inputs through to R as type-specific missing values so
the R function can decide what to return.
rducks_register_scalar_udf(
  con,
  name = "r_null_special",
  fun = function(x) if (is.na(x)) 5L else x,
  args = INTEGER,
  returns = INTEGER,
  null_handling = "special"
)
#> <rducks_scalar_udf_registration>
#> registered: yes
#> name: r_null_special
#> evaluation_mode: scalar
#> plan: arrow_r+serial
#> signature: r_null_special(INTEGER) -> INTEGER
DBI::dbGetQuery(con, "SELECT r_null_special(NULL::INTEGER) AS x")
#>   x
#> 1 5
Nested NULLs are part of the nested value. Scalar children usually become typed NA values, while nested composite NULLs become NULL.
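A function that receives a composite value therefore has to check for both forms of missingness. The sketch below uses plain R lists to stand in for what a STRUCT(id INTEGER, label VARCHAR, tags VARCHAR[]) argument might look like. The field names and the helper `describe_payload` are hypothetical, and the exact shape Rducks delivers for a given type should be confirmed against `rducks_value_semantics()`.

```r
# Hypothetical handler for a struct-like named list; not Rducks API.
describe_payload <- function(payload) {
  # Scalar child: a nested SQL NULL is assumed to arrive as a typed NA.
  id <- if (is.na(payload$id)) "no id" else paste0("id=", payload$id)
  # Composite child: a nested SQL NULL is assumed to arrive as R NULL.
  tags <- if (is.null(payload$tags)) {
    "no tags"
  } else {
    paste(payload$tags, collapse = ",")
  }
  paste(id, tags, sep = "; ")
}

# Plain lists imitating a fully populated struct and one with nested NULLs.
complete <- list(id = 7L, label = "a", tags = c("x", "y"))
partial <- list(id = NA_integer_, label = "b", tags = NULL)

complete_text <- describe_payload(complete)  # "id=7; x,y"
partial_text <- describe_payload(partial)    # "no id; no tags"
```

Testing `is.na()` on a composite child, or `is.null()` on a scalar child, would miss the missing value under these assumptions, so each field is checked with the predicate that matches its kind.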
exception_handling = "rethrow" makes R errors fail the
SQL query. Other error handling modes are explicit choices and should be
tested with the declared return type.
Mark functions with side_effects = TRUE when they depend
on counters, randomness, time, I/O, mutation, sleeps, external state, or
diagnostics. Without that flag, DuckDB may treat a scalar UDF as pure
enough for ordinary SQL optimization.
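To see why the flag matters, consider an engine that evaluates a supposedly pure function once per distinct input and reuses the result. The sketch below imitates that in base R. `eval_as_pure` is a hypothetical stand-in for such an optimization and does not describe how DuckDB actually rewrites queries; `next_id` and `state` are likewise illustrative names.

```r
state <- new.env()
state$n <- 0L

# Depends on external state: every call returns a new value.
next_id <- function(x) {
  state$n <- state$n + 1L
  state$n
}

# Hypothetical "pure" evaluation: call once per distinct input, then reuse.
eval_as_pure <- function(fun, inputs) {
  distinct <- unique(inputs)
  results <- vapply(distinct, fun, integer(1))
  results[match(inputs, distinct)]
}

# Row-by-row evaluation: call the function for every input.
eval_per_row <- function(fun, inputs) vapply(inputs, fun, integer(1))

inputs <- c(10L, 10L, 20L)

state$n <- 0L
pure_ids <- eval_as_pure(next_id, inputs)  # 1 1 2: duplicate input reuses a result

state$n <- 0L
row_ids <- eval_per_row(next_id, inputs)   # 1 2 3: every row gets a fresh value
```

The two strategies agree for a pure function and diverge for `next_id`, which is the kind of function that should be registered with side_effects = TRUE.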
The package exports compact reference tables so tests and documentation can stay aligned with the implemented semantics.
rducks_mode_semantics()[, c("mode", "call_granularity", "input_shape")]
#>         mode            call_granularity
#> 1     scalar          one R call per row
#> 2 vectorized one R call per DuckDB chunk
#>                                                               input_shape
#> 1 one scalar/composite R value per declared or dynamically bound argument
#> 2     one R vector/list-column per declared or dynamically bound argument
rducks_value_semantics()[
  rducks_value_semantics()$duckdb_type %in% c("INTEGER", "VARCHAR", "GEOMETRY", "VARIANT", "STRUCT"),
  c("duckdb_type", "r_value_class", "special_null_argument")
]
#>    duckdb_type  r_value_class special_null_argument
#> 6      INTEGER        integer           NA_integer_
#> 12     VARCHAR      character         NA_character_
#> 14    GEOMETRY            raw                  NULL
#> 15     VARIANT rducks_variant                  NULL
rducks_argument_type_mapping(list(
  INTEGER,
  UUID,
  DECIMAL(10, 2),
  STRUCT(a = INTEGER[])
))
#>           duckdb_type descriptor_kind  r_value_class      r_argument_shape
#> 1             INTEGER          scalar        integer        integer scalar
#> 2                UUID          scalar    rducks_uuid    rducks_uuid scalar
#> 3      DECIMAL(10, 2)         decimal rducks_decimal rducks_decimal scalar
#> 4 STRUCT(a INTEGER[])          struct           list  named list of fields
#>   special_null_argument           copy_semantics integer_uses_r_double
#> 1           NA_integer_             boxed scalar                 FALSE
#> 2                  NULL boxed exact Rducks value                 FALSE
#> 3                  NULL boxed exact Rducks value                 FALSE
#> 4                  NULL   recursive R allocation                 FALSE
#>   float32_widens_to_r_double precision_may_be_lost
#> 1                      FALSE                 FALSE
#> 2                      FALSE                 FALSE
#> 3                      FALSE                 FALSE
#> 4                      FALSE                 FALSE
#>                           notes
#> 1
#> 2      exact Rducks value class
#> 3 exact fixed-point value class
#> 4       recursive field mapping