---
title: "SIMDe Dispatch Design"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SIMDe Dispatch Design}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

Rminibwa follows the `RsimdDispatch`/`SIMDe` downstream-package pattern for the SIMD-sensitive minibwa KSW kernels:

- `configure` probes compiler support;
- backend objects are staged under `src/rmb-ksw/`;
- `src/Makevars.in` links those staged objects through `PKG_LIBS`;
- ordinary minibwa sources are compiled once with baseline package flags;
- runtime selection switches between compiled and CPU-supported backends.

This avoids putting ISA flags such as `-mavx2` into global `PKG_CPPFLAGS` or `PKG_CFLAGS`.

## Boundary

Dispatch starts at the KSW functions called by minibwa:

- `ksw_extz2_sse()`;
- `ksw_extd2_sse()`;
- `ksw_ll_qinit()` and the `ksw_ll_*` helpers.

The rest of minibwa stays baseline-compiled once:

- index loading;
- BWT and seeding;
- chaining;
- pairing;
- formatting;
- R external-pointer glue.

## Backends

Rminibwa currently exposes:

- `scalar`: portable SIMDe fallback compiled with native intrinsics disabled and no ISA flags;
- `sse4`: native SSE4.1/SIMDe-alias build of the KSW files;
- `avx2`: native AVX2 build, including the widened dual-gap `extd2` kernel.

The `avx2` backend uses the wide `ksw2_extd2_wide.c` patch for the dominant dual-gap gap-filling path and uses AVX2-compiled KSW objects for the remaining KSW entry points.

## R surface

```{r}
library(Rminibwa)
simd_info()
```

Select a backend explicitly for diagnostics or benchmarks:

```{r}
old <- simd_backend()
simd_set_backend("scalar")
simd_backend()
simd_set_backend("auto")
```

`simd_set_backend()` errors if the requested backend was not compiled or is not supported by the current CPU/runtime.

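
The rule that `simd_set_backend()` enforces (a backend must be both compiled and supported by the running CPU) can be sketched in C. This is a minimal illustration under stated assumptions, not Rminibwa's actual code: the `enum`, the bitmask inputs, and the `select_backend()` name are hypothetical, and preferring the widest usable backend for `"auto"` is an assumption rather than documented behaviour.

```c
/* Hypothetical backend ids; the real package may name or number them
 * differently. */
enum backend { BK_SCALAR = 0, BK_SSE4 = 1, BK_AVX2 = 2, BK_AUTO = 3 };

#define BK_BIT(b) (1u << (b))

/* Return the backend to use, or -1 if the request cannot be honoured.
 *   compiled:  bitmask of backends that were built (assumed input)
 *   supported: bitmask of backends the CPU can execute (assumed input) */
int select_backend(int requested, unsigned compiled, unsigned supported)
{
    /* A backend is usable only if it was built AND the CPU can run it. */
    unsigned usable = compiled & supported;

    if (requested == BK_AUTO) {
        /* Assumption: "auto" prefers the widest usable backend. */
        for (int b = BK_AVX2; b >= BK_SCALAR; --b)
            if (usable & BK_BIT(b))
                return b;
        return -1;
    }

    /* Reject ids outside the known range. */
    if (requested < BK_SCALAR || requested > BK_AVX2)
        return -1;

    /* An explicit request is honoured only if that backend is usable. */
    return (usable & BK_BIT(requested)) ? requested : -1;
}
```

In this sketch, a `-1` return corresponds to the case where the R-level call would raise an error; how the real package probes the CPU and reports the failure is not shown here.
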
## Validation

For every available backend, Rminibwa runs the same native mapping fixture in `tinytest` and checks that the native alignment batch and ALTREP column accessors remain valid.

The README `make rdm` path additionally benchmarks:

1. internal Rminibwa-only `scalar` vs `sse4` vs `avx2`; and
2. Rminibwa AVX2 vs locally built Python/Rust bindings compiled with native AVX2 codegen.

Backend timing must use a workload that actually reaches KSW. Very short exact matches can be resolved by seeding/chaining and ungapped fast paths without a single `ksw_ext*` or `ksw_ll*` call, so apparent `scalar`/`sse4`/`avx2` differences on those examples are just end-to-end mapper noise. The developer README benchmark uses an indel-mutated random read and records internal KSW call counters before timing count-only mapping.
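
The idea of checking KSW call counters before trusting a timing can be illustrated with a toy C sketch. Everything here is hypothetical: `rmb_ksw_calls`, `counted_ksw()`, `toy_map()`, and `workload_reaches_ksw()` are stand-ins invented for illustration, and the toy mapper only mimics the "exact match skips KSW" behaviour described above. It is not the package's counter implementation.

```c
#include <string.h>

/* Hypothetical global counter of KSW kernel invocations. */
unsigned long rmb_ksw_calls = 0;

/* Stand-in for a counted KSW entry point (NOT a real ksw_* signature).
 * It only records that the kernel was reached. */
int counted_ksw(const char *query, const char *target)
{
    (void)query;
    (void)target;
    ++rmb_ksw_calls;
    return 0; /* placeholder score */
}

/* Toy mapper: an exact match takes an ungapped fast path and never
 * reaches KSW; anything else falls through to the kernel. */
int toy_map(const char *read, const char *ref)
{
    if (strcmp(read, ref) == 0)
        return (int)strlen(read);   /* fast path, no KSW call */
    return counted_ksw(read, ref);  /* gapped path reaches KSW */
}

/* A workload is suitable for backend timing only if mapping it
 * increments the KSW counter. Returns 1 if it does, 0 otherwise. */
int workload_reaches_ksw(const char *read, const char *ref)
{
    unsigned long before = rmb_ksw_calls;
    (void)toy_map(read, ref);
    return rmb_ksw_calls > before;
}
```

With this guard, an exact-match read would be rejected as a timing workload, while a read carrying an insertion or deletion relative to the reference would be accepted.
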