---
title: "Getting Started with Rsassy"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with Rsassy}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

`Rsassy` provides R bindings to the Rust `sassy` approximate string matcher. It searches short patterns in DNA, IUPAC, or ASCII text.

## Install

```{r, eval=FALSE}
install.packages(
  "Rsassy",
  repos = c("https://sounkou-bioinfo.r-universe.dev", "https://cloud.r-project.org")
)
```

## Search

```{r}
library(Rsassy)
sassy_search(list("ATCGATCG"), list("GGGGATCGATCGTTTT"), k = 1, alphabet = "dna")
```

The result is a `sassy_matches` data frame. Coordinates are 0-based and half-open.

## Reuse a searcher

```{r}
searcher <- sassy_searcher("dna", rc = TRUE)
sassy_searcher_search(searcher, list("ATCGATCG"), list("GGGGATCGATCGTTTT"), k = 1)
```

## Multiple patterns or texts

List inputs search every pattern against every text. Each list element may be a raw vector or a non-missing character scalar, so callers can mix byte strings, regular strings, and ALTREP-backed raw elements. `pattern_idx` and `text_idx` identify the input indices.

```{r}
sassy_search(
  list("ATG", charToRaw("TTT")),
  list("CCCCATGCCCCTTT"),
  k = 1,
  alphabet = "iupac",
  rc = FALSE,
  strategy = "encoded_patterns"
)
```

## FASTA/FASTQ batches

`sassy_fastx_iter()` and `sassy_fastx_next()` parse FASTA/FASTQ files into record-count-bounded batches. Record IDs are exposed as an ALTREP character vector and sequences as a list of raw ALTREP slices over immutable native batch buffers.


```{r}
fq <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "!!!!"), fq, useBytes = TRUE)
it <- sassy_fastx_iter(fq, batch_records = 1)
batch <- sassy_fastx_next(it)
sassy_search(list("ACG"), batch$seq, k = 0, alphabet = "dna", rc = FALSE, text_id = batch$id)
```

See `vignette("fastx-iteration", package = "Rsassy")` for the performance and validation details.
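
## Converting coordinates to R indices

Match coordinates are 0-based and half-open, while R indexing is 1-based and inclusive. The sketch below shows one way to bridge the two conventions in base R. `slice_half_open()` is a hypothetical helper written for this illustration, not an `Rsassy` function, and the `start`/`end` values are supplied by hand rather than read from a `sassy_matches` column, since the column names are not assumed here.

```{r}
# Hypothetical helper (not part of Rsassy): extract the span [start, end)
# from a string or raw vector, given 0-based, half-open coordinates.
slice_half_open <- function(x, start, end) {
  stopifnot(start >= 0, end >= start)
  if (is.raw(x)) {
    # Shift the 0-based offsets to 1-based positions; an empty span
    # yields a zero-length raw vector.
    x[seq_len(end - start) + start]
  } else {
    # substr() is 1-based and inclusive at both ends, so only the
    # start needs shifting.
    substr(x, start + 1, end)
  }
}

text <- "GGGGATCGATCGTTTT"
slice_half_open(text, 4, 12)
rawToChar(slice_half_open(charToRaw(text), 4, 12))
```

The character branch counts characters rather than bytes, so it matches byte offsets only for single-byte (ASCII) text. For anything else, slicing the raw vector is the safer choice.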