- sassy_search() searches once.
- sassy_searcher() creates a reusable searcher.
- sassy_searcher_search() searches with that object.
- sassy_features() prints backend/build information.
- sassy_set_backend() selects a backend before first use.
- sassy_fastx_iter() opens a chunked FASTA/FASTQ iterator.
- sassy_fastx_next() returns the next record-count-bounded batch.

Coordinates are 0-based and half-open.
- text_start, text_end: interval in the text.
- pattern_start, pattern_end: interval in the pattern.
- pattern_idx, text_idx: input index for vector/list inputs.

pattern and text are lists of sequence
elements. Each element may be a raw vector or a non-missing character
scalar. This keeps one raw vector as one sequence of bytes and lets
callers mix raw bytes, ordinary strings, and ALTREP-backed raw elements
in the same input list.
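
As a rough illustration of that input shape, the base R sketch below views each list element as one byte sequence. The helper `as_sequence_bytes()` is hypothetical and exists only for this example; it is not part of the package and does not describe how the package handles its inputs internally.

```r
# Hypothetical helper for illustration only; not part of the package.
# Views one list element as a single sequence of bytes.
as_sequence_bytes <- function(x) {
  if (is.raw(x)) {
    return(x)  # a raw vector is already one sequence of bytes
  }
  if (is.character(x) && length(x) == 1L && !is.na(x)) {
    return(charToRaw(x))  # a non-missing character scalar becomes its bytes
  }
  stop("each element must be a raw vector or a non-missing character scalar")
}

# A raw vector and an ordinary string mixed in the same list.
inputs <- list(charToRaw("ACGT"), "ACGT")
bytes <- lapply(inputs, as_sequence_bytes)
identical(bytes[[1]], bytes[[2]])
#> [1] TRUE
```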

sassy_search(list(charToRaw("ACGT")), list(charToRaw("TTACGTAA")), k = 0, alphabet = "dna")
#> <sassy_matches> 2 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand cigar
#>           0        0          2        6             0           4    0      +    4=
#>           0        0          2        6             0           4    0      -    4=

The FASTA/FASTQ iterator returns batches already shaped for this API:
batch$seq is a list of raw ALTREP sequence elements and
batch$id is an ALTREP character vector suitable for
text_id.

# Write a one-record FASTQ file, then read it back as a single batch.
fq <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "!!!!"), fq, useBytes = TRUE)
batch <- sassy_fastx_next(sassy_fastx_iter(fq, batch_records = 1))
sassy_search(list("ACG"), batch$seq, k = 0, alphabet = "dna", rc = FALSE, text_id = batch$id)
#> <sassy_matches> 1 match
#> pattern_idx text_idx text_id text_start text_end pattern_start pattern_end cost strand cigar
#>           0        0      r1          0        3             0           3    0      +    3=

match_region = TRUE adds the matched text interval.
Reverse-strand regions are reverse-complemented to match the pattern
direction.
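
The interval arithmetic behind this can be sketched in base R. The helper `extract_region()` is hypothetical and written only for this illustration (it is not a package function, and it handles only the plain A/C/G/T letters). It assumes `start` and `end` follow the 0-based, half-open convention described above, and it reuses the text from the package example in this section.

```r
# Hypothetical helper for illustration only; not part of the package.
# start/end are 0-based and half-open, like text_start/text_end.
extract_region <- function(text, start, end, strand = "+") {
  # 0-based half-open [start, end) is 1-based inclusive [start + 1, end].
  region <- substr(text, start + 1L, end)
  if (strand == "-") {
    # Complement each base, then reverse the string.
    comp <- chartr("ACGT", "TGCA", region)
    region <- paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
  }
  region
}

text <- "GGGGATCGATCGTTTT"
extract_region(text, 4, 12, "+")
#> [1] "ATCGATCG"
extract_region(text, 2, 10, "-")
#> [1] "ATCGATCC"
```

Note that `end - start` gives the interval length directly, which is one practical benefit of half-open coordinates.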

sassy_search(
  list("ATCGATCG"),
  list("GGGGATCGATCGTTTT"),
  k = 1,
  alphabet = "dna",
  match_region = TRUE
)
#> <sassy_matches> 3 matches
#> pattern_idx text_idx text_start text_end pattern_start pattern_end cost strand  cigar match_region
#>           0        0          2       10             0           8    1      -   7=1X     ATCGATCC
#>           0        0          4       12             0           8    0      +     8=     ATCGATCG
#>           0        0          6       14             0           8    1      - 1=1X6=     AACGATCG

The default strategy = "pairwise" searches each
pattern/text pair independently. This is the general path and works with
mixed pattern lengths and all alphabets. Other strategies are
performance-oriented paths that call different Sassy kernels:
- batch_texts: one pattern, multiple texts per batch.
- batch_patterns: multiple equal-length patterns per batch.
- encoded_patterns / v2: Sassy encoded-pattern path.

batch_patterns and encoded_patterns use
Sassy’s multi-pattern encoding. In sassy 0.2.1 that
encoding is implemented for the IUPAC profile and equal byte-length
patterns.
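
One way a caller might guard the equal byte-length requirement before asking for a multi-pattern strategy is sketched below. Both `byte_length()` and `pick_strategy()` are hypothetical helpers written for this illustration, not package functions. The sketch checks only pattern lengths, not the alphabet, and falls back to the general pairwise path otherwise.

```r
# Hypothetical helpers for illustration only; not part of the package.
# Byte length of one sequence element (raw vector or character scalar).
byte_length <- function(x) {
  if (is.raw(x)) length(x) else nchar(x, type = "bytes")
}

# Suggest "batch_patterns" only when several patterns share one byte length;
# otherwise fall back to the general "pairwise" path.
pick_strategy <- function(patterns) {
  lens <- vapply(patterns, byte_length, integer(1))
  if (length(patterns) > 1L && length(unique(lens)) == 1L) {
    "batch_patterns"
  } else {
    "pairwise"
  }
}

pick_strategy(list("ACGT", charToRaw("TTGA")))
#> [1] "batch_patterns"
pick_strategy(list("ACGT", "ACGTAC"))
#> [1] "pairwise"
```

The returned string could then be passed as the strategy argument of a search call; whether a given strategy is actually faster for a workload is something to measure rather than assume.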