| Title: | Persistent ALTREP Vectors |
|---|---|
| Description: | Provides experimental file-backed, ALTREP-style vector allocation for R using the 'fmalloc' library. The package supports persistent and scratch runtimes, durable reference-based serialization, explicit vector lifecycle management, and multiple runtime handles for working with several backing files in one R process. |
| Authors: | Sounkou Mahamane Toure [aut, cre], Kenichi Yasukata [cph] (fmalloc), Wolfram Gloger [cph] (ptmalloc3), Free Software Foundation, Inc. [cph] (selected GNU C Library support files) |
| Maintainer: | Sounkou Mahamane Toure <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.1.0 |
| Built: | 2026-06-03 21:52:16 UTC |
| Source: | https://github.com/sounkou-bioinfo/Rfmalloc |
Rfmalloc provides experimental memory-mapped file allocation capabilities for R using a patched copy of the fmalloc library. The current package exposes ALTREP file-backed vector allocation for logical, integer, numeric, raw, complex, character, and list vectors with fmalloc payload storage.

| Function | Description |
|---|---|
| open_fmalloc | Open an explicit fmalloc runtime handle. |
| init_fmalloc | Open and install a default fmalloc runtime. |
| create_fmalloc_vector | Create vectors using fmalloc. |
| create_fmalloc_matrix | Create matrix-shaped fmalloc vectors. |
| create_fmalloc_array | Create array-shaped fmalloc vectors. |
| create_fmalloc_data_frame | Create data.frames from fmalloc-backed columns. |
| as_fmalloc_matrix | Convert fmalloc vectors to matrix-shaped objects. |
| as_fmalloc_array | Convert fmalloc vectors to array-shaped objects. |
| as_fmalloc_data_frame | Convert fmalloc-backed objects to a data.frame. |
| list_fmalloc_allocations | List persistent allocation catalog records. |
| diagnose_fmalloc_runtime | Summarize persistent allocation catalog state and runtime diagnostics. |
| cleanup_fmalloc | Request cleanup of an fmalloc runtime. |

ALTREP file-backed allocation for logical, integer, numeric, raw, complex, character, and list vectors. List elements are restricted to NULL or Rfmalloc-backed vectors from the same runtime.
Large allocations spanning multiple fmalloc chunks.
Multiple runtime handles in one R process.
Persistent and scratch runtime modes.
Reference serialization for persistent fixed-width atomic and character ALTREP vectors.
Fmalloc-backed ALTREP subset copies for vector indexing operations.
An in-file allocation catalog for persistent vectors.
A C-callable API and installed header for other packages.
Native lifetime tracking so runtime mappings outlive reachable vectors allocated from them.
Runtime and catalog diagnostics for planning recovery and operational cleanup.
ALTREP-backed dispatch now covers core Ops, Summary, Math, Math2, and matrix rowSums/colSums/rowMeans/colMeans workflows through S3 methods for common vector/matrix usage.
Explicit base-fallback boundaries are:
rowSums(), colSums(), rowMeans(), and colMeans() when the input is not an exact 2D matrix or dims != 1L; these cases now emit a warning and call the corresponding base:: reducer.
Scalar or zero-length results from Summary, Math, and Math2 generics (for example sum(x) returning a single value) are returned as ordinary R scalars by design.
Coverage of operator and method families does not yet extend to every R generic, and some advanced families may still materialize ordinary R objects in a few edge cases.
Future work includes view-based subset representations, catalog compaction and reset tooling, metadata storage for attributes on persisted elements, robust nested-list reference validation, and compaction of recovery metadata.
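
The dispatch behavior described above can be sketched as follows. This is a minimal illustration, assuming the package is installed and that the documented functions are exported from the Rfmalloc namespace. The variable `pkg_available` is an illustrative name, and the guard turns the block into a no-op when the package is missing. No output is shown because the results were not verified here.

```r
# Skip the example when Rfmalloc is not installed.
pkg_available <- requireNamespace("Rfmalloc", quietly = TRUE)

if (pkg_available) {
  # Scratch mode is used so nothing needs to stay durable after the session.
  rt <- Rfmalloc::open_fmalloc(tempfile(fileext = ".bin"), mode = "scratch")
  x <- Rfmalloc::create_fmalloc_vector("numeric", 5, runtime = rt)
  x[] <- c(1, 2, 3, 4, 5)

  y <- x + 1        # Ops group generic
  roots <- sqrt(x)  # Math group generic
  total <- sum(x)   # Summary group generic; documented to return an ordinary R scalar

  # Drop vector references before requesting cleanup of the runtime.
  rm(x, y, roots)
  gc()
  Rfmalloc::cleanup_fmalloc(rt)
}
```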
Maintainer: Sounkou Mahamane Toure [email protected]
Other contributors:
Kenichi Yasukata (fmalloc) [copyright holder]
Wolfram Gloger (ptmalloc3) [copyright holder]
Free Software Foundation, Inc. (selected GNU C Library support files) [copyright holder]
Useful links:
Report bugs at https://github.com/sounkou-bioinfo/Rfmalloc/issues
Returns an existing vector re-typed as an array by installing array dimensions (and optional dimnames) as metadata.

    as_fmalloc_array(x, dim = NULL, dimnames = NULL, copy = TRUE)

| Argument | Description |
|---|---|
| x | A vector. |
| dim | Target dimension vector. |
| dimnames | Optional |
| copy | If TRUE (default), allocate a new fmalloc-backed array object. If FALSE, install metadata in place on the same fmalloc ALTREP payload without allocation (this also updates any aliases of |

An array object, backed by the same payload when copy = FALSE.
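
The two `copy` settings can be contrasted with a short sketch. It assumes the package is installed and exports the documented functions. `pkg_available`, `a_copy`, and `a_shared` are illustrative names. The comments restate the documented behavior and were not verified by running the code.

```r
# Skip the example when Rfmalloc is not installed.
pkg_available <- requireNamespace("Rfmalloc", quietly = TRUE)

if (pkg_available) {
  rt <- Rfmalloc::open_fmalloc(tempfile(fileext = ".bin"), mode = "scratch")
  v <- Rfmalloc::create_fmalloc_vector("integer", 24, runtime = rt)
  v[] <- 1:24

  # copy = TRUE (default): documented to allocate a new fmalloc-backed array.
  a_copy <- Rfmalloc::as_fmalloc_array(v, dim = c(2L, 3L, 4L))

  # copy = FALSE: documented to install dimensions on the existing payload,
  # so the metadata change is shared with other references to that payload.
  a_shared <- Rfmalloc::as_fmalloc_array(v, dim = c(2L, 3L, 4L), copy = FALSE)

  print(dim(a_copy))
  print(dim(a_shared))

  rm(v, a_copy, a_shared)
  gc()
  Rfmalloc::cleanup_fmalloc(rt)
}
```

Prefer the default when other code still expects the original vector to be dimensionless; `copy = FALSE` avoids a second allocation at the cost of changing the shared object.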
Thin convenience wrapper around data.frame().

    as_fmalloc_data_frame(
      ...,
      row.names = NULL,
      check.names = TRUE,
      stringsAsFactors = FALSE
    )

| Argument | Description |
|---|---|
| ... | Columns or objects to include in the frame. |
| row.names | Optional row names for the frame. |
| check.names | Whether to enforce syntactic column names. |
| stringsAsFactors | Deprecated: retained for compatibility. |

A data.frame containing the supplied columns.
Returns an existing vector re-typed as a matrix by installing matrix dimensions (and optional dimnames) as metadata.

    as_fmalloc_matrix(x, nrow = NULL, ncol = NULL, dimnames = NULL, copy = TRUE)

| Argument | Description |
|---|---|
| x | A vector. |
| nrow | Optional target row count. |
| ncol | Optional target column count. |
| dimnames | Optional |
| copy | If TRUE (default), allocate a new fmalloc-backed matrix object. If FALSE, install metadata in place on the same fmalloc ALTREP payload without allocation (this also updates any aliases of |

A matrix object, backed by the same payload when copy = FALSE.
Requests cleanup of an fmalloc runtime. If vectors allocated from the runtime are still reachable, the native mapping is kept alive until those vectors are garbage-collected.

    cleanup_fmalloc(runtime = NULL)

| Argument | Description |
|---|---|
| runtime | Optional runtime handle returned by |

NULL (invisibly)

    ## Not run:
    init_fmalloc("data.bin")
    v <- create_fmalloc_vector("integer", 100)
    rm(v)
    gc()
    cleanup_fmalloc()
    ## End(Not run)

Creates an fmalloc-backed ALTREP array in a single step by allocating vector storage and installing array dimensions (and optional dimnames).

    create_fmalloc_array(
      type = "integer",
      dim,
      dimnames = NULL,
      runtime = NULL,
      zero_initialize = TRUE
    )

| Argument | Description |
|---|---|
| type | Character string specifying the vector type. Supported values are the same as for |
| dim | Integer dimension vector. |
| dimnames | Optional |
| runtime | Optional runtime handle returned by |
| zero_initialize | Logical scalar passed through to payload allocation. See |

An fmalloc-backed ALTREP array.

    ## Not run:
    rt <- open_fmalloc(tempfile(fileext = ".bin"))
    a <- create_fmalloc_array("numeric", dim = c(2L, 3L), runtime = rt)
    cleanup_fmalloc(rt)
    ## End(Not run)

Thin constructor wrapper around data.frame() that keeps fmalloc vectors as
column payloads.

    create_fmalloc_data_frame(
      ...,
      row.names = NULL,
      check.names = TRUE,
      stringsAsFactors = FALSE
    )

| Argument | Description |
|---|---|
| ... | Columns to include in the frame. |
| row.names | Optional row names for the frame. |
| check.names | Whether to enforce syntactic column names. |
| stringsAsFactors | Deprecated: retained for compatibility. |

A data.frame with the provided columns.

    ## Not run:
    rt <- open_fmalloc(tempfile(fileext = ".bin"))
    x <- create_fmalloc_vector("integer", 3, runtime = rt)
    y <- create_fmalloc_vector("character", 3, runtime = rt)
    x[] <- 1:3
    y[] <- c("a", "b", "c")
    df <- create_fmalloc_data_frame(x = x, y = y)
    cleanup_fmalloc(rt)
    ## End(Not run)

Creates an fmalloc-backed ALTREP matrix in a single step by allocating vector storage and installing matrix dimensions (and optional dimnames).

    create_fmalloc_matrix(
      type = "integer",
      nrow,
      ncol,
      dimnames = NULL,
      runtime = NULL,
      zero_initialize = TRUE
    )

| Argument | Description |
|---|---|
| type | Character string specifying the vector type. Supported values are the same as for |
| nrow | Integer number of rows. |
| ncol | Integer number of columns. |
| dimnames | Optional |
| runtime | Optional runtime handle returned by |
| zero_initialize | Logical scalar passed through to payload allocation. See |

An fmalloc-backed ALTREP matrix object.

    ## Not run:
    rt <- open_fmalloc(tempfile(fileext = ".bin"))
    m <- create_fmalloc_matrix("integer", nrow = 2, ncol = 3, runtime = rt)
    cleanup_fmalloc(rt)
    ## End(Not run)

Creates an ALTREP vector using a file-backed fmalloc runtime. The returned
object is ALTREP from creation time. Fixed-width atomic payload bytes are
allocated directly with fmalloc, and ALTREP duplication and vector subsetting
keep copy-on-write copies fmalloc-backed without using R's non-API
Rf_allocVector3() path.

    create_fmalloc_vector(
      type = "integer",
      length,
      runtime = NULL,
      zero_initialize = TRUE
    )

| Argument | Description |
|---|---|
| type | Character string specifying the vector type. Supported values are |
| length | Integer specifying the non-negative length of the vector to create. |
| runtime | Optional runtime handle returned by |
| zero_initialize | Logical scalar. If TRUE (default), newly allocated payload bytes are zero-initialized. Set FALSE to skip initialization for faster large allocations when you will fully initialize values yourself. |

A vector of the specified type and length, allocated using fmalloc.

    ## Not run:
    rt <- open_fmalloc(tempfile(fileext = ".bin"))
    v <- create_fmalloc_vector("integer", 1000, runtime = rt)
    cleanup_fmalloc(rt)
    ## End(Not run)

Releases runtime bookkeeping for a single fmalloc ALTREP vector immediately. In
scratch mode, payload memory is immediately reclaimed. In persistent mode, the
vector payload is retained by default so existing on-disk state remains
durable; optional unsafe = TRUE reclaims payload memory and marks metadata
as non-recoverable.

    destroy_fmalloc_vector(x, unsafe = FALSE)

| Argument | Description |
|---|---|
| x | Fmalloc ALTREP vector to destroy. |
| unsafe | Whether to physically free persistent payload bytes. Unsafe destroy is intended for short-lived scratch-like cleanup and will mark the catalog entry as non-recoverable. |

Explicit destroy fails when a vector is still referenced by another fmalloc list vector as a child.
Logical value indicating whether a live vector was destroyed.

    ## Not run:
    rt <- open_fmalloc(tempfile(fileext = ".bin"), mode = "persistent")
    v <- create_fmalloc_vector("integer", 10, runtime = rt)
    destroy_fmalloc_vector(v)
    ## End(Not run)

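
The difference between the default and `unsafe = TRUE` in persistent mode can be sketched as follows. This assumes the package is installed and exports the documented functions. `pkg_available`, `kept`, and `freed` are illustrative names. The comments restate the documented behavior, and the catalog contents after each call were not verified here.

```r
# Skip the example when Rfmalloc is not installed.
pkg_available <- requireNamespace("Rfmalloc", quietly = TRUE)

if (pkg_available) {
  rt <- Rfmalloc::open_fmalloc(tempfile(fileext = ".bin"), mode = "persistent")

  # Default destroy: bookkeeping is released, the on-disk payload is retained.
  v <- Rfmalloc::create_fmalloc_vector("integer", 10, runtime = rt)
  kept <- Rfmalloc::destroy_fmalloc_vector(v)

  # Unsafe destroy: payload bytes are reclaimed and the catalog entry is
  # documented to become non-recoverable.
  w <- Rfmalloc::create_fmalloc_vector("integer", 10, runtime = rt)
  freed <- Rfmalloc::destroy_fmalloc_vector(w, unsafe = TRUE)

  # Inspect how both records appear in the allocation catalog.
  print(Rfmalloc::list_fmalloc_allocations(rt))

  rm(v, w)
  gc()
  Rfmalloc::cleanup_fmalloc(rt)
}
```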
Returns diagnostic metadata for an open runtime handle, including lightweight runtime attributes, the current allocation catalog, and a catalog-level summary useful for estimating reclaimable/fragmented payload regions.

    diagnose_fmalloc_runtime(runtime = NULL)

| Argument | Description |
|---|---|
| runtime | Optional runtime handle returned by |

A named list with three components:
runtime: runtime metadata such as file path, UUID, mode, catalog counters, live vectors, and reference state;
catalog: the full allocation catalog returned by list_fmalloc_allocations();
summary: a compact set of computed diagnostics and an explicit compaction status note.

    ## Not run:
    rt <- open_fmalloc(tempfile(fileext = ".bin"), mode = "persistent")
    x <- create_fmalloc_vector("integer", 4, runtime = rt)
    y <- create_fmalloc_vector("logical", 2, runtime = rt)
    diagnose_fmalloc_runtime(rt)
    cleanup_fmalloc(rt)
    ## End(Not run)

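
The three documented components can be inspected one at a time. A minimal sketch, assuming the package is installed and exports the documented functions. `pkg_available` and `diag` are illustrative names, and the exact fields inside each component are not specified in this manual, so the code only prints their structure.

```r
# Skip the example when Rfmalloc is not installed.
pkg_available <- requireNamespace("Rfmalloc", quietly = TRUE)

if (pkg_available) {
  rt <- Rfmalloc::open_fmalloc(tempfile(fileext = ".bin"), mode = "persistent")
  x <- Rfmalloc::create_fmalloc_vector("integer", 4, runtime = rt)

  diag <- Rfmalloc::diagnose_fmalloc_runtime(rt)

  print(names(diag))   # documented components: runtime, catalog, summary
  str(diag$runtime)    # file path, UUID, mode, counters, ...
  print(diag$catalog)  # same records as list_fmalloc_allocations(rt)
  str(diag$summary)    # computed diagnostics and compaction status note

  rm(x)
  gc()
  Rfmalloc::cleanup_fmalloc(rt)
}
```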
These S3 methods preserve current fmalloc behavior for matrix summary/reduction operations while returning ordinary R vectors for small results.

    rowSums(x, na.rm = FALSE, dims = 1L)
    colSums(x, na.rm = FALSE, dims = 1L)
    rowMeans(x, na.rm = FALSE, dims = 1L)
    colMeans(x, na.rm = FALSE, dims = 1L)

| Argument | Description |
|---|---|
| x | A matrix-like object. |
| na.rm | Logical scalar controlling NA removal. |
| dims | Numeric scalar for dimensions. |

These implementations keep managed execution for 2D fmalloc matrices with
dims = 1L. For unsupported shapes or dims values (for example,
non-2D arrays or dims != 1L), the methods warn and delegate to the base R
implementations (base::rowSums, base::colSums, base::rowMeans, and
base::colMeans).
The reduction result, returned as an ordinary R object, or as an fmalloc vector when the result length exceeds getOption("Rfmalloc.reduce_result_length", 1e6).
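
The shape rule described above can be expressed as a small predicate. The helper `uses_managed_path()` is hypothetical and not part of Rfmalloc; it only mirrors the documented condition (exactly two dimensions and `dims = 1L`) and is applied here to ordinary base R objects for illustration.

```r
# Hypothetical helper (not part of Rfmalloc): TRUE when the documented
# conditions for the managed reduction path hold, FALSE when the methods
# are documented to warn and delegate to base R.
uses_managed_path <- function(x, dims = 1L) {
  d <- dim(x)
  length(d) == 2L && identical(as.integer(dims), 1L)
}

m <- matrix(1:6, nrow = 2)
a <- array(1:24, dim = c(2, 3, 4))

uses_managed_path(m)             # TRUE: 2D matrix, dims = 1L
uses_managed_path(a)             # FALSE: three dimensions
uses_managed_path(m, dims = 2L)  # FALSE: dims != 1L

# What the base reducer computes for one of the fallback shapes.
fallback <- base::rowSums(a, dims = 2L)
dim(fallback)                    # 2 3

# Threshold above which results are documented to stay fmalloc-backed.
threshold <- getOption("Rfmalloc.reduce_result_length", 1e6)
```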
Compatibility wrapper that opens an fmalloc runtime and installs it as the
package default runtime used by create_fmalloc_vector() when no explicit
runtime is supplied.

    init_fmalloc(filepath, size_gb = NULL, mode = c("persistent", "scratch"))

| Argument | Description |
|---|---|
| filepath | Character string specifying the file path for fmalloc data. |
| size_gb | Numeric value specifying the size of the backing file in GB (optional). If not specified, uses the package default size for new files or the existing file size. |
| mode | Runtime mode. |

For new code, prefer open_fmalloc() and pass the returned runtime handle to
create_fmalloc_vector().
Logical indicating whether the file was newly initialized.

    ## Not run:
    alloc_file <- tempfile(fileext = ".bin")
    init_fmalloc(alloc_file)
    v <- create_fmalloc_vector("integer", 1000)
    cleanup_fmalloc()
    unlink(alloc_file)
    ## End(Not run)

Returns the in-file allocation catalog for a persistent fmalloc runtime. The catalog is stored in the backing file and records physical allocation metadata used to validate serialized persistent references.

    list_fmalloc_allocations(runtime = NULL)

| Argument | Description |
|---|---|
| runtime | Optional runtime handle returned by |

For successful recovery, look at the state column:
"committed": valid serialized payload exists for that record;
"tombstone": the payload has been destroyed and is non-recoverable unless the runtime remains open and referenced directly by an existing SEXP;
other transient states are internal and are generally not expected.
recoverable indicates whether the record can be reopened via serialized reference metadata. payload_offset == 0 or payload_nbytes == 0 generally indicates a non-payload entry.
A data frame with one row per catalog record and columns describing the catalog record offset, generation, state, vector type, length, payload offset, payload byte size, flags, and whether the record is recoverable by reference serialization.

    ## Not run:
    rt <- open_fmalloc(tempfile(fileext = ".bin"))
    v <- create_fmalloc_vector("integer", 10, runtime = rt)
    list_fmalloc_allocations(rt)
    cleanup_fmalloc(rt)
    ## End(Not run)

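
The recovery rules above translate into a simple filter over the catalog. The sketch below runs on a hand-built stand-in data frame with made-up values, not real catalog output. It assumes the column names `state`, `recoverable`, `payload_offset`, and `payload_nbytes` mentioned above and assumes `recoverable` is a logical column. The helper `recoverable_payloads()` is hypothetical.

```r
# Illustrative stand-in for list_fmalloc_allocations() output.
# All values are invented for the example.
catalog <- data.frame(
  state = c("committed", "tombstone", "committed"),
  recoverable = c(TRUE, FALSE, TRUE),
  payload_offset = c(4096, 8192, 0),
  payload_nbytes = c(40, 40, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep committed, recoverable records that carry a payload.
recoverable_payloads <- function(catalog) {
  keep <- catalog$state == "committed" &
    catalog$recoverable &
    catalog$payload_offset != 0 &
    catalog$payload_nbytes != 0
  catalog[keep, , drop = FALSE]
}

res <- recoverable_payloads(catalog)
nrow(res)  # 1: the tombstone and the non-payload entry are dropped
```

With a real runtime, the same helper could be applied to the result of `list_fmalloc_allocations(rt)`, provided the column names match the assumptions stated above.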
Opens a file-backed fmalloc runtime and returns an external-pointer handle. Multiple handles to the same path share a single in-process runtime while the underlying file-backed runtime remains open. Runtime mode controls whether vector payloads are durable persistent allocations or scratch allocations that can be returned to fmalloc when their ALTREP handles are garbage-collected.

    open_fmalloc(filepath, size_gb = NULL, mode = c("persistent", "scratch"))

| Argument | Description |
|---|---|
| filepath | Character string specifying the file path for fmalloc data. |
| size_gb | Numeric value specifying the size of the backing file in GB (optional). If not specified, uses the package default size for new files or the existing file size. |
| mode | Runtime mode. |

An external pointer of class fmalloc_runtime.
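
Working with two backing files in one R process might look like the following. A minimal sketch, assuming the package is installed and exports the documented functions. `pkg_available` and the other variable names are illustrative, and no output is shown because the code was not run here.

```r
# Skip the example when Rfmalloc is not installed.
pkg_available <- requireNamespace("Rfmalloc", quietly = TRUE)

if (pkg_available) {
  durable_file <- tempfile(fileext = ".bin")
  scratch_file <- tempfile(fileext = ".bin")

  # One persistent runtime for durable payloads, one scratch runtime
  # for temporary intermediates.
  rt_durable <- Rfmalloc::open_fmalloc(durable_file, mode = "persistent")
  rt_scratch <- Rfmalloc::open_fmalloc(scratch_file, mode = "scratch")

  # Documented return class of the handle.
  print(inherits(rt_durable, "fmalloc_runtime"))

  # Each vector is allocated from the runtime passed explicitly.
  kept <- Rfmalloc::create_fmalloc_vector("numeric", 100, runtime = rt_durable)
  tmp <- Rfmalloc::create_fmalloc_vector("numeric", 100, runtime = rt_scratch)

  # Drop vector references first so the native mappings can be released.
  rm(kept, tmp)
  gc()
  Rfmalloc::cleanup_fmalloc(rt_scratch)
  Rfmalloc::cleanup_fmalloc(rt_durable)
  unlink(c(durable_file, scratch_file))
}
```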