---
title: "API and Coordinates"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{API and Coordinates}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(Rsassy)
```

## Functions

- `sassy_search()` searches once.
- `sassy_searcher()` creates a reusable searcher.
- `sassy_searcher_search()` searches with that object.
- `sassy_features()` prints backend/build information.
- `sassy_set_backend()` selects a backend before first use.
- `sassy_fastx_iter()` opens a chunked FASTA/FASTQ iterator.
- `sassy_fastx_next()` returns the next record-count-bounded batch.

## Coordinates

Coordinates are 0-based and half-open.

- `text_start`, `text_end`: interval in the text.
- `pattern_start`, `pattern_end`: interval in the pattern.
- `pattern_idx`, `text_idx`: input index for vector/list inputs.

```{r}
matches <- sassy_search(list("ACGT"), list("TTACGTAA"), k = 0, alphabet = "dna", rc = FALSE)
matches[, c("text_start", "text_end", "pattern_start", "pattern_end", "cigar")]
```

## Inputs

`pattern` and `text` are lists of sequence elements. Each element may be a raw
vector or a non-missing character scalar. This keeps one raw vector as one
sequence of bytes and lets callers mix raw bytes, ordinary strings, and
ALTREP-backed raw elements in the same input list.

```{r}
sassy_search(list(charToRaw("ACGT")), list(charToRaw("TTACGTAA")), k = 0, alphabet = "dna")
```

The FASTA/FASTQ iterator returns batches already shaped for this API:
`batch$seq` is a list of raw ALTREP sequence elements and `batch$id` is an
ALTREP character vector suitable for `text_id`.


```{r}
fq <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "!!!!"), fq, useBytes = TRUE)
batch <- sassy_fastx_next(sassy_fastx_iter(fq, batch_records = 1))
sassy_search(list("ACG"), batch$seq, k = 0, alphabet = "dna", rc = FALSE, text_id = batch$id)
```

## Match regions

`match_region = TRUE` adds the matched text interval. Reverse-strand regions are
reverse-complemented to match the pattern direction.

```{r}
sassy_search(
  list("ATCGATCG"),
  list("GGGGATCGATCGTTTT"),
  k = 1,
  alphabet = "dna",
  match_region = TRUE
)
```

## Search strategies

The default `strategy = "pairwise"` searches each pattern/text pair
independently. This is the general path and works with mixed pattern lengths
and all alphabets.

Other strategies are performance-oriented paths that call different Sassy
kernels:

- `batch_texts`: one pattern, multiple texts per batch.
- `batch_patterns`: multiple equal-length patterns per batch.
- `encoded_patterns` / `v2`: Sassy encoded-pattern path.

`batch_patterns` and `encoded_patterns` use Sassy's multi-pattern encoding. In
`sassy` 0.2.1 that encoding is implemented for the IUPAC profile and equal
byte-length patterns.
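
Because the multi-pattern paths need patterns of equal byte length, it can help to check the pattern list before picking a strategy. The sketch below uses base R only. `byte_length()` and `choose_strategy()` are hypothetical helpers, not part of Rsassy, and the check looks at pattern lengths alone: it does not verify the alphabet requirement, and it assumes character scalars are counted in bytes.

```{r}
# Hypothetical helper (not part of Rsassy): byte length of one sequence element,
# which may be a raw vector or a character scalar.
byte_length <- function(x) {
  if (is.raw(x)) length(x) else nchar(x, type = "bytes")
}

# Hypothetical helper: suggest a strategy name from pattern lengths only.
# Falls back to the general "pairwise" path unless there are several
# patterns that all share one byte length.
choose_strategy <- function(patterns) {
  lens <- vapply(patterns, byte_length, integer(1))
  if (length(patterns) > 1 && length(unique(lens)) == 1L) {
    "batch_patterns"
  } else {
    "pairwise"
  }
}

choose_strategy(list("ACGT", charToRaw("TTGA")))
choose_strategy(list("ACGT", "ACGTN"))
```

The returned string could then be passed as the `strategy` argument of a search call. Defaulting to `"pairwise"` keeps mixed-length inputs on the general path.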