Streaming cryptographic hashes for R
  ________
 /\ sec   \
/  \ ret   \
\  / base  /
 \/_______/
At 40KB - smaller than most hex stickers - {secretbase} delivers streaming cryptographic hashes for R. It does one thing, and does it well: hash R objects and files without memory overhead.
R already has excellent packages for cryptographic hashing. The {digest} package by Dirk Eddelbuettel has served the community since 2003. The {openssl} package by Jeroen Ooms provides the full OpenSSL library. {secretbase} complements these with a minimal, dependency-free implementation built around streaming from the ground up.
When you hash an R object using most approaches, the object must first be serialized to a raw vector, which is then hashed. For large objects, this means allocating memory for the entire serialized representation before hashing can begin.
{secretbase} eliminates this intermediate step. Using R’s internal serialization callbacks, data flows directly from the serialization process into the hash computation:
library(secretbase)
# Hash a large object without materializing the serialized form
# (the data are random, so the hash will differ from run to run)
large_data <- data.frame(x = rnorm(1e6), y = rnorm(1e6))
sha3(large_data)
[1] "b742d6fb89cbfa5efde57e4f5fcb25d219502b12f416d3d040693016823a4223"
No memory spike. The full serialized representation of the data frame is never held in memory - serialization feeds directly into the hash context in 64KB chunks.
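For contrast, the conventional two-step route can be sketched in base R alone. This is only an illustration of the intermediate buffer that a serialize-then-hash approach has to allocate; the object size is arbitrary and no hashing package is involved.

```r
# Two-step approach: the whole object is serialized to a raw vector first
obj <- data.frame(x = rnorm(1e5), y = rnorm(1e5))

# connection = NULL makes serialize() return the result as a raw vector
serialized <- serialize(obj, connection = NULL)

# Size of the temporary buffer a hash function would then consume
buffer_bytes <- length(serialized)

# Size of the original object, for comparison
object_bytes <- as.numeric(object.size(obj))
```

The buffer is roughly as large as the object itself, so peak memory for this route is about double the object size. Streaming avoids holding that buffer at all.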
The same applies to files. Hash files larger than available RAM:
sha256(file = "huge_dataset.parquet")
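A small self-contained check of file hashing, using a temporary file in place of a large dataset. It assumes that sha256() hashes a character string as its raw bytes, in which case the file and the string should give the same digest.

```r
library(secretbase)

# Write known bytes to a temporary file (stand-in for a large dataset)
tmp <- tempfile()
writeBin(charToRaw("secret base"), tmp)

# Hash the file contents via the `file` argument
file_hash <- sha256(file = tmp)

# Hash the same bytes supplied directly as a string
string_hash <- sha256("secret base")

unlink(tmp)
identical(file_hash, string_hash)
```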
For reproducibility, hash values are consistent across platforms: serialization always uses R serialization version 3 and big-endian byte order, and the serialization header (which records the R version) is skipped. The same object produces identical hashes on Linux, macOS, and Windows.
{secretbase} serves as infrastructure for the {targets} package, Will Landau’s Make-like pipeline tool for data science workflows.
{targets} determines whether pipeline targets need rebuilding by comparing hash values. On each run, {targets} recomputes the hash of a target's code and data - if it matches the stored value, the target is skipped; if not, the target is rebuilt. This happens constantly during pipeline execution, so hashing performance directly affects how much time is spent checking rather than computing.
Streaming serialization matters here. A pipeline checking dozens of large data frames on each run would otherwise serialize each object to memory before hashing. With {secretbase}, memory footprint stays constant regardless of object size. Cross-platform consistency ensures pipelines produce identical results whether developed on a laptop or executed on a cluster.
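The skip-or-rebuild idea can be sketched in a few lines. This is not how {targets} is implemented; is_up_to_date() is a hypothetical helper, and sha256() is chosen here purely for illustration.

```r
library(secretbase)

# Hypothetical helper (not part of {targets}): TRUE if the object's
# current hash matches the hash stored from a previous run
is_up_to_date <- function(object, stored_hash) {
  identical(sha256(object), stored_hash)
}

# First run: compute and store the hash
data_v1 <- data.frame(x = 1:5, y = letters[1:5])
stored <- sha256(data_v1)

# Later run: one value has changed
data_v2 <- data_v1
data_v2$x[1] <- 99L

unchanged <- is_up_to_date(data_v1, stored)  # hash matches: skip
changed <- is_up_to_date(data_v2, stored)    # hash differs: rebuild
```

Only the short hash string needs to be stored between runs, not the object itself.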
{secretbase} provides SHA-3 (NIST 2015), SHA-256, SHAKE256 (extendable-output), SipHash-1-3 (fast keyed hashing), and Keccak (used by Ethereum):
sha3("secret base")
[1] "a721d57570e7ce366adee2fccbe9770723c6e3622549c31c7cab9dbb4a795520"
sha256("secret base")
[1] "1951c1ca3d50e95e6ede2b1c26fefd0f0e8eba1e51a837f8ccefb583a2b686fe"
# SHAKE256 can produce any output length - useful for deterministic seeds
shake256("my seed", bits = 32L, convert = NA)
[1] -167412652
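As a sketch of the seeding use mentioned in the comment above, the integer can be passed straight to set.seed(). The phrase "my seed" is just the example input from the listing; any string would do.

```r
library(secretbase)

# Derive a 32-bit integer from a human-readable phrase
seed <- shake256("my seed", bits = 32L, convert = NA)

# The same phrase always yields the same seed, and so the same draws
set.seed(seed)
first <- runif(3)
set.seed(seed)
second <- runif(3)

identical(first, second)
```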
Beyond hashing, {secretbase} provides encoding utilities:
Base64 - Standard base64 encoding and decoding:
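A minimal sketch of the string case, assuming base64dec() returns a character string under its default convert setting:

```r
library(secretbase)

# Encode a string, then decode it back
encoded <- base64enc("secret base")
decoded <- base64dec(encoded)

encoded
identical(decoded, "secret base")
```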
Base58 with checksum - Base58 encoding with a 4-byte double-SHA256 checksum for error detection:
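A round-trip sketch, on the assumption that the functions are named base58enc() and base58dec() by analogy with the base64 pair and that decoding returns a character string by default:

```r
library(secretbase)

# Encode a string with the checksum appended, then decode and verify it
# (function names assumed by analogy with base64enc() / base64dec())
encoded58 <- base58enc("secret base")
decoded58 <- base58dec(encoded58)

identical(decoded58, "secret base")
```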
Both base64 and base58 encoding functions support arbitrary R objects through serialization:
# Round-trip any R object through base64
obj <- list(a = 1:10, b = "test")
identical(obj, base64dec(base64enc(obj), convert = NA))
[1] TRUE
CBOR - Concise Binary Object Representation (RFC 8949) is a binary data serialization format designed to be compact and efficient to parse. Think of it as a binary alternative to JSON - it represents a similar data model (maps, arrays, strings, numbers), extended with types such as byte strings, in a more space-efficient binary encoding.
CBOR was developed by the IETF and has become a standard in several domains.
{secretbase} brings CBOR support to R:
# Round-trip any R object through CBOR
obj <- list(a = 1L, b = "hello", c = TRUE)
identical(obj, cbordec(cborenc(obj)))
[1] TRUE
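To get a feel for the compactness, the encoded size can be compared with R's own serialization of the same object. This sketch assumes cborenc() returns a raw vector; the exact byte counts are not shown because they depend on how the package maps R types to CBOR.

```r
library(secretbase)

obj <- list(a = 1L, b = "hello", c = TRUE)

# CBOR encoding of the list
encoded <- cborenc(obj)
cbor_bytes <- length(encoded)

# R's native serialization of the same list, for comparison
native_bytes <- length(serialize(obj, connection = NULL))

c(cbor = cbor_bytes, native = native_bytes)
```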
{secretbase} is implemented in C, with hash algorithm implementations derived from:
The R interface is minimal: thin wrapper functions that pass directly to .Call() with no R-code processing overhead. The package has no dependencies beyond base R, keeping installation simple and avoiding potential conflicts.
40KB. No dependencies. One thing, done well.
For attribution, please cite this work as
shikokuchuo (2026, Feb. 4). shikokuchuo{net}: secretbase: The 40KB Hash Package. Retrieved from https://shikokuchuo.net/posts/28-introducing-secretbase/