Tantivy FFI bindings for full-text search in Inko.
This library provides FFI bindings to Tantivy, a Rust-based full-text search engine. It offers:
- Full-text search with relevance scoring
- Fast indexing and querying
- Field-level indexing (string, integer, boolean, etc.)
- Faceted search and aggregation
- Autocomplete and did-you-mean suggestions
Search result scores use f32 precision in Tantivy for performance. They are represented as Float (f64) in Inko, but this does not add precision: the f64 merely stores the original f32 value.
Key implications:
- Use scores for ranking/sorting only, not precise calculations
- Avoid exact equality comparisons on scores (e.g., `score_a == score_b`); small differences (< 1e-7) between scores are likely due to f32 rounding
- When displaying scores, round to reasonable precision (4-6 decimal places)
- Scores range from 0.0 to 1.0 for BM25 scoring
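The f32 round-trip is easy to demonstrate outside Inko. The following Python sketch (illustrative only; the library itself is Inko over a Rust FFI) shows why exact equality on scores is unreliable while a small tolerance is safe:

```python
import struct

def as_f32(x: float) -> float:
    """Round-trip a Python float (f64) through 32-bit storage,
    mimicking a score computed as f32 and then widened to f64."""
    return struct.unpack('f', struct.pack('f', x))[0]

score = 0.1234567890123   # hypothetical score with full f64 precision
stored = as_f32(score)    # what survives the f32 round-trip

# The widened f64 represents the f32 value exactly, but the original
# f64 precision is gone: compare with a tolerance, never with ==.
print(stored == score)               # False: exact equality is unreliable
print(abs(stored - score) < 1e-7)    # True: difference is f32 rounding noise
```

This is the same reasoning behind the tolerance-based comparison recommended below.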
Recommended practices:
```inko
# Compare scores with a tolerance, not exact equality
fn are_scores_equal(a: Float, b: Float, tolerance: Float = 0.0001) -> Bool {
  (a - b).abs < tolerance
}

# Sort by score for ranking (this is the intended use case)
let mut results = index.search(query, limit: 100, offset: 0).or_panic
let sorted = results.sort(fn (a, b) -> { b.score <=> a.score })
```
Test coverage:
See test/test_score_precision.inko for comprehensive tests covering f32 to f64 conversion, edge cases (zero, negative values), and comparison behavior.
This document describes differences between the Inko API and the underlying Tantivy library, along with current limitations.
The Inko API exposes a subset of Tantivy's capabilities:
✓ Fully supported:
- Full-text search with BM25 scoring
- Custom schema configuration
- Document CRUD operations (create, read, update, delete)
- Faceted search and aggregations
- Autocomplete suggestions
- Did-you-mean suggestions
- Boolean query operators (AND, OR, NOT)
⚠ Partially supported:
- Query builders with basic operators
- Batch document indexing
- Result limits and pagination
✗ Not currently supported:
- Highlighting (TantivyResult has a highlight field, but always returns None)
- Advanced query operators (range queries, proximity search)
- Multi-term phrase search with slop
- Index snapshots and point-in-time queries
- Query cancellation and timeouts
- Index merging and optimization
- Advanced scoring functions (TF-IDF customization)
- Facet drill-down with hierarchical facets
Concurrency:
- Multiple processes can access the same index using separate TantivyIndexManager instances
- Single TantivyIndexManager instance cannot be shared across processes
- No built-in thread-safe concurrent access to the same manager
Memory Management:
- Automatic cleanup via Drop trait (with manual close() recommended)
- No garbage collection (manual memory management)
- Buffer allocation for each FFI call (potential optimization candidate)
Query Features:
- No query caching
- No query optimization suggestions
- No query explain/analysis
- No result highlighting (field exists but not implemented)
Index Management:
- No index statistics or metadata
- No index compaction control
- No index backup/restore operations
- No schema migration support
The following features are under consideration for future releases:
- Query timeouts: Time-based cancellation of long-running queries
- Streaming results: Lazy iteration over large result sets
- Performance monitoring: Built-in metrics for operations
- Advanced query syntax: Support for more Tantivy query operators
- Result highlighting: Proper implementation of search result highlighting
If you need features not listed here:
Consider using the Tantivy Rust library directly, or open an issue to discuss adding the feature to the Inko API.
```shell
inko pkg add github.com/jhult/inko-tantivy <latest-version>
inko pkg sync
```

Use the build script for the easiest development experience:

```shell
./build.sh build    # Build the FFI library (or just ./build.sh)
./build.sh test     # Build and run tests (automatically sets library path)
./build.sh install  # Install to /usr/local/lib (requires sudo, optional)
./build.sh clean    # Clean all build artifacts
./build.sh help     # Show all commands
```

The script follows shell best practices with strict error handling, proper quoting, and helpful colored output.
This library uses cargo-zigbuild for cross-platform builds, creating native libraries that run on Linux, macOS, and Windows.
```shell
./build.sh
```

Builds for your current platform only:
- macOS (Intel/ARM): `libtantivy_c.dylib`
- Linux (x64/ARM): `libtantivy_c.so`
- Windows: `tantivy_c.dll`
Install cargo-zigbuild:

```shell
cargo install cargo-zigbuild
```

Build for all platforms:

```shell
cd native/tantivy-c

# Linux x64
cargo zigbuild --release --target x86_64-unknown-linux-gnu

# Linux ARM64
cargo zigbuild --release --target aarch64-unknown-linux-gnu

# macOS x64 (Intel)
cargo zigbuild --release --target x86_64-apple-darwin

# macOS ARM64 (Apple Silicon)
cargo zigbuild --release --target aarch64-apple-darwin

# Windows x64
cargo zigbuild --release --target x86_64-pc-windows-gnu
```

Pre-built libraries for all platforms are available from GitHub Releases.
Available Platforms:
| Platform | Release File |
|---|---|
| Linux x64 | linux_x64_tantivy_c.so |
| Linux ARM64 | linux_arm64_tantivy_c.so |
| macOS x64 (Intel) | macos_x64_tantivy_c.dylib |
| macOS ARM64 (Apple Silicon) | macos_arm64_tantivy_c.dylib |
| Windows x64 | windows_x64_tantivy_c.dll |
The native library must be available when the Inko program runs:
Option A: Library in project root

```shell
# Build or download library to project root
./build.sh

# OR download GitHub release (for your platform):
wget https://github.com/jhult/inko-tantivy/releases/latest/download/linux_x64_tantivy_c.so -O libtantivy_c.so
# (see [Download from GitHub Releases](#option-3-download-from-github-releases) for all files)

# Run your Inko program
inko run src/main.inko
```

Option B: System library path

```shell
# Install to system library directory
sudo cp libtantivy_c.so /usr/local/lib/

# Run from anywhere
inko run src/main.inko
```

Option C: LD_LIBRARY_PATH (Linux)

```shell
export LD_LIBRARY_PATH=/path/to/library:$LD_LIBRARY_PATH
inko run src/main.inko
```

Option D: DYLD_LIBRARY_PATH (macOS)

```shell
export DYLD_LIBRARY_PATH=/path/to/library:$DYLD_LIBRARY_PATH
inko run src/main.inko
```

Always call close() explicitly to handle cleanup errors:
```inko
# Explicit close with error handling
match index.close {
  case Ok(_) -> {}
  case Error(e) -> std.stdio.Stderr.new.print("Failed to close index: ${e}")
}
```
Why explicit close is recommended:
- The `Drop` trait attempts to close the index automatically if `close()` is not called
- Automatic cleanup only logs errors to stderr, making them harder to detect
- Explicit `close()` allows proper error handling and recovery
- This is especially important in production environments where cleanup failures should be monitored
If you prefer to panic on close errors (for quick scripts), use:
```inko
index.close.or_panic # Panics with the error message
```
For production use, implement rate limiting to prevent abuse and ensure fair resource allocation:
```inko
import rate_limiter (RateLimiter)

# Create a rate limiter: 100 requests per second, burst of 200
let mut search_limiter = RateLimiter.new(200.0, 100.0)

# Use rate limiter with search operations
fn search_with_limit(query: String) -> Result[Array[TantivyResult], String] {
  if search_limiter.acquire_token {
    index.search(query, 10, 0)
  } else {
    Result.Error("Rate limit exceeded. Please try again later.")
  }
}

# For batch operations, acquire multiple tokens at once
fn index_with_limit(doc_id: String, fields: Array[(String, String)]) -> Result[Bool, String] {
  if search_limiter.acquire_tokens(5) {
    index.add_doc(doc_id, fields)
  } else {
    Result.Error("Rate limit exceeded. Please try again later.")
  }
}
```
Key considerations for rate limiting:
- Adjust capacity and refill rate based on your hardware capabilities
- Different operations may require different token costs (e.g., batch index vs. single search)
- Consider per-user or per-IP rate limiters for multi-tenant systems
- Manually refill tokens based on time or other application-specific logic
- The provided `RateLimiter` type is a simplified implementation; extend it with time-based refill for production
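As a sketch of the time-based refill mentioned above (written in Python for illustration; the `TokenBucket` name and API are hypothetical and not part of this library), a production-style token bucket refills proportionally to elapsed time:

```python
import time

class TokenBucket:
    """Time-based token bucket: `capacity` caps bursts, `refill_rate`
    is tokens added per second. Illustrative sketch only."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full
        self.last = time.monotonic()

    def acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, clamped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(capacity=200.0, refill_rate=100.0)
print(bucket.acquire())       # True: the bucket starts full
print(bucket.acquire(500.0))  # False: cost exceeds the 200-token capacity
```

The same shape translates directly to Inko; the only requirement is a monotonic clock for the refill calculation.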
Prevent long-running queries from blocking your application:
```inko
# Set reasonable limits on result size
let result = index.search(query, limit: 100, offset: 0)

# For aggregations, use smaller limits to prevent excessive computation
let facets = index.aggregate_terms("category", query, limit: 50)

# Monitor query duration in production
import std.process (current_time_in_nanos)

fn timed_search(query: String, max_ms: Int = 5000) -> Result[Array[TantivyResult], String] {
  let start = current_time_in_nanos

  match index.search(query, limit: 100, offset: 0) {
    case Ok(results) -> {
      let elapsed_ms = (current_time_in_nanos - start) / 1_000_000

      if elapsed_ms > max_ms {
        Result.Error("Query exceeded timeout limit")
      } else {
        Result.Ok(results)
      }
    }
    case Error(e) -> Result.Error(e)
  }
}
```
Best practices for query timeouts:
- Set timeout at 3-5x your average query duration
- Use smaller limits for complex queries (aggregations, faceted search)
- Log slow queries for debugging and performance optimization
- Implement circuit breakers for frequently slow operations
- Consider per-query-type timeouts based on complexity
The library includes a Metrics type for collecting performance data and monitoring search operations:
```inko
import metrics (Metrics)

# Create metrics collector
let mut metrics = Metrics.new

# Track search operations with metrics
fn search_with_metrics(
  index: TantivyIndexManager,
  metrics: mut Metrics,
  query: String,
) -> Result[Array[TantivyResult], String] {
  let start = std.process.current_time_in_nanos

  match index.search(query, limit: 100, offset: 0) {
    case Ok(results) -> {
      let duration = std.process.current_time_in_nanos - start

      metrics.increment_operation("search")
      metrics.record_latency("search", duration)
      Result.Ok(results)
    }
    case Error(e) -> {
      metrics.increment_error("search_failure")
      Result.Error(e)
    }
  }
}

# Print metrics summary
fn print_metrics_summary(metrics: Metrics) {
  std.stdio.Stdout.new.print(metrics.format_summary)
}
```
Key metrics to track:
- Operation counts: How many searches, indexes, deletions are performed
- Latency: Average duration of each operation type
- Error rates: Frequency and types of errors
- Resource usage: Memory consumption over time (external monitoring)
Production monitoring recommendations:
- Integrate with observability platforms: Export metrics to Prometheus, Datadog, or similar systems
- Set up alerts: Notify on high error rates or slow queries
- Track percentiles: Monitor P95 and P99 latencies, not just averages
- Correlate metrics: Link search performance with system metrics (CPU, memory, I/O)
- Sample efficiently: Don't track every single operation in high-traffic systems
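To make the percentile recommendation concrete, here is a minimal nearest-rank percentile in Python (illustrative only; any metrics backend computes this for you). With a handful of slow outliers, the mean looks alarming while P50 shows typical behavior and P95 shows the tail users actually feel:

```python
import math

def percentile(latencies_ms, p):
    """Nearest-rank percentile: p is in (0, 100]."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Ten hypothetical search latencies (ms), two of them slow outliers.
samples = [12, 15, 11, 240, 14, 13, 16, 18, 12, 900]

print(sum(samples) / len(samples))  # mean: 125.1, dominated by outliers
print(percentile(samples, 50))      # P50: 14, the typical request
print(percentile(samples, 95))      # P95: 900, the tail latency
```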
```inko
import tantivy (TantivyConfig, TantivyIndexManager, TantivyQueryBuilder)

# Create index manager
let config = TantivyConfig.new('/path/to/index')
let mut index = TantivyIndexManager.new(config)

index.open.or_panic

# Add a document
index.add_doc(
  doc_id: 'doc1',
  fields: [
    ('title', 'Hello World'),
    ('body', 'This is a test document'),
  ]
).or_panic

# Search
let results = index.search(
  query: 'test',
  limit: 10,
  offset: 0,
)

match results {
  case Ok(docs) -> {
    for doc in docs {
      # Note: Scores are f32 precision (~6-7 significant digits)
      std.stdio.Stdout.new.print('Found: ${doc.doc_id} (score: ${doc.score})')
    }
  }
  case Error(e) -> std.stdio.Stdout.new.print("Search failed: ${e}")
}

# Commit changes
index.commit.or_panic

# Close index
index.close.or_panic
```
Use custom schemas for indexing non-email data:
```inko
import tantivy (TantivyConfig, TantivyIndexManager)

# Custom schema for documents
let schema_json = '{
  "fields": [
    {"name": "title", "type": "text", "indexed": true, "stored": true},
    {"name": "content", "type": "text", "indexed": true, "stored": true},
    {"name": "timestamp", "type": "u64", "indexed": true, "stored": true},
    {"name": "published", "type": "bool", "indexed": true, "stored": true}
  ],
  "default_search_fields": ["title", "content"]
}'

match TantivyConfig.new('/path/to/index') {
  case Ok(config) -> {
    let config_with_schema = config.with_schema_json(schema_json).or_panic
    let mut index = TantivyIndexManager.new(config_with_schema)

    index.open.or_panic

    # Add documents with custom fields
    index.add_doc(
      doc_id: 'doc-1',
      fields: [
        ('title', 'Introduction to Search'),
        ('content', 'Full-text search is powerful'),
        ('timestamp', '1704067200'),
        ('published', 'true'),
      ],
    ).or_panic
  }
  case Error(e) -> { panic("Failed to create config: ${e}") }
}
```
Supported field types:
- `text` - Full-text searchable with tokenization
- `string` - Exact match without tokenization
- `u64` - Unsigned 64-bit integers
- `i64` - Signed 64-bit integers
- `f64` - Double-precision floats
- `bool` - Boolean values
Schema Design Best Practices:
- Use text fields for full-text search: Text fields are tokenized and support relevance scoring
- Use string fields for exact matches: String fields are faster for exact lookups (IDs, tags)
- Store vs. indexed: Only index fields you search, store fields you retrieve
- Set default search fields: Specify which fields to search by default
- Avoid over-indexing: Don't index fields you never search
Example: E-commerce schema
```inko
let ecommerce_schema = '{
  "fields": [
    {"name": "name", "type": "text", "indexed": true, "stored": true},
    {"name": "description", "type": "text", "indexed": true, "stored": true},
    {"name": "price", "type": "f64", "indexed": true, "stored": true},
    {"name": "category", "type": "string", "indexed": true, "stored": true},
    {"name": "in_stock", "type": "bool", "indexed": true, "stored": true},
    {"name": "sku", "type": "string", "indexed": true, "stored": true}
  ],
  "default_search_fields": ["name", "description"]
}'
```
Example: Log search schema
```inko
let log_schema = '{
  "fields": [
    {"name": "timestamp", "type": "u64", "indexed": true, "stored": true},
    {"name": "level", "type": "string", "indexed": true, "stored": true},
    {"name": "message", "type": "text", "indexed": true, "stored": true},
    {"name": "service", "type": "string", "indexed": true, "stored": true}
  ],
  "default_search_fields": ["message"]
}'
```
Example: Document search schema
```inko
let document_schema = '{
  "fields": [
    {"name": "title", "type": "text", "indexed": true, "stored": true},
    {"name": "body", "type": "text", "indexed": true, "stored": true},
    {"name": "author", "type": "string", "indexed": true, "stored": true},
    {"name": "tags", "type": "string", "indexed": true, "stored": true},
    {"name": "created_at", "type": "u64", "indexed": true, "stored": true},
    {"name": "word_count", "type": "i64", "indexed": true, "stored": true}
  ],
  "default_search_fields": ["title", "body"]
}'
```
Performance considerations:
- Text fields are larger and slower to index than other types
- String fields provide faster exact matches but don't support relevance scoring
- Store only the fields you need to display (reduces index size)
- Indexed fields increase index size and search time
- Use appropriate data types (don't store numbers as strings)
```inko
import tantivy (TantivyQueryBuilder)

let mut builder = TantivyQueryBuilder.new

builder.search_text('title', 'search term')
builder.filter('status', 'published')
builder.range('date', 2020, 2024)

let query = builder.build
```
```inko
let facets = index.facet_counts(
  field_name: 'category',
  query: 'electronics',
  limit: 10,
)

for facet in facets.or_panic {
  std.stdio.Stdout.new.print('${facet.key}: ${facet.count}')
}
```
```inko
let suggestions = index.autocomplete(
  field: 'title',
  prefix: 'elect',
  limit: 5,
)

for suggestion in suggestions.or_panic {
  std.stdio.Stdout.new.print('${suggestion.text} (score: ${suggestion.score})')
}
```
```inko
let suggestions = index.did_you_mean(
  field: 'title',
  term: 'electrnics',
  distance: 2,
  limit: 5,
)

for suggestion in suggestions.or_panic {
  std.stdio.Stdout.new.print('${suggestion.text} (score: ${suggestion.score})')
}
```
The library consists of:
- `src/tantivy.inko` - Main FFI bindings and `TantivyIndexManager`
- `src/pointer_helpers.inko` - Pointer arithmetic helpers for FFI
- `src/marshal.inko` - C marshalling utilities
- `native/tantivy-c/` - Rust cdylib wrapper exposing a C API
All memory allocated for FFI calls is managed automatically using ByteArray. The library handles:
- Proper null-terminated string conversion
- Automatic cleanup of C-allocated memory
- Safe pointer arithmetic with bounds checking
See docs/memory.md for detailed documentation on memory ownership across the FFI boundary.
This library provides basic protections against resource exhaustion and common attacks, but applications must implement additional protections for production use. See docs/security.md for comprehensive guidance on:
- Rate limiting: Limit concurrent operations and requests per user/IP
- Query validation: Validate and sanitize user-provided query strings
- Operational limits: Set appropriate limits for search and aggregation operations
- Path security: Secure handling of index paths and file system access
- Input validation: Validate all user input before passing to library functions
- Error handling: Proper error handling without exposing sensitive information
- Resource cleanup: Ensure indices and resources are properly closed
- Monitoring and alerting: Track resource usage and alert on anomalies
The Rust FFI layer enforces these limits:
| Limit | Value | Purpose |
|---|---|---|
| `MAX_FIELDS_PER_DOCUMENT` | 1,000 | Prevents excessive field count |
| `MAX_FIELD_VALUE_LENGTH` | 10 MB | Prevents massive field values |
| `MAX_QUERY_LENGTH` | 10 KB | Prevents long query strings |
| `MAX_SEARCH_LIMIT` | 10,000 | Maximum results per search |
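Applications can mirror these limits client-side so invalid input fails fast with a clear message instead of an opaque FFI rejection. A sketch in Python (illustrative only; the constants mirror the table above, and the function names are hypothetical):

```python
# Client-side mirror of the Rust FFI layer's enforced limits.
MAX_FIELDS_PER_DOCUMENT = 1_000
MAX_FIELD_VALUE_LENGTH = 10 * 1024 * 1024  # 10 MB
MAX_QUERY_LENGTH = 10 * 1024               # 10 KB
MAX_SEARCH_LIMIT = 10_000

def validate_document(fields):
    """Return an error message, or None if the document is acceptable."""
    if len(fields) > MAX_FIELDS_PER_DOCUMENT:
        return f"too many fields: {len(fields)}"
    for name, value in fields:
        if len(value.encode()) > MAX_FIELD_VALUE_LENGTH:
            return f"field '{name}' value too large"
    return None

def validate_search(query, limit):
    """Return an error message, or None if the search request is acceptable."""
    if len(query.encode()) > MAX_QUERY_LENGTH:
        return "query too long"
    if limit > MAX_SEARCH_LIMIT:
        return f"limit {limit} exceeds maximum {MAX_SEARCH_LIMIT}"
    return None

print(validate_search("hello", 50))       # None: within limits
print(validate_search("x" * 20_000, 50))  # "query too long"
```

Validating byte length (via `encode()`) rather than character count matters, since the FFI boundary sees UTF-8 bytes.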
For production use, implement:
- Rate limiting - Limit concurrent operations per user/IP
- Query validation - Validate user-provided query strings
- Operational limits - Set appropriate result limits (recommend: 100-1000)
- Query timeouts - Implement timeouts for long-running queries
- Path validation - Canonicalize paths and check against allowed directories
- Input validation - Validate all user input (doc_id, fields, etc.)
- Error sanitization - Don't expose detailed error messages to end users
- Monitoring - Track query performance and resource usage
Note on error message sanitization:
- Debug builds: full error messages including paths (easier debugging)
- Release builds: paths sanitized to a `<path>` placeholder (production security)
- Full errors are always logged to stderr for troubleshooting
Path security: The library does not enforce path restrictions. Applications using this library should:
- Canonicalize user-provided paths before use
- Verify resolved path is within allowed directories
- Check file permissions before opening index
- Be aware of symlink attacks and path traversal
Example:
```inko
import std.fs.path (Path)

let user_path = Path.new(user_provided_path)
let canonical = user_path.canonicalize

if !canonical.starts_with?('/allowed/app/data') {
  return Result.Error('Path outside allowed directory')
}
```
The project uses GitHub Actions to:
- Build 5 platform variants using cargo-zigbuild in Docker (see Cross-Platform Builds for target details)
- Run Inko tests and format checking (downloads Linux x64 library artifact)
- Upload build artifacts for each platform
- Upload binaries directly to GitHub releases when tags are pushed
About Docker container:
- Pre-installed with Rust stable and cargo-zigbuild
- Includes macOS SDK for cross-compilation
- Eliminates installation overhead
- Official image from cargo-zigbuild
To trigger a release:
```shell
git tag v0.1.0
git push origin v0.1.0
```

After release, download the appropriate library for your platform from the Releases page.
This is the most common issue. The linker can't find the FFI library during compilation.
Solutions (in order of preference):

1. Use the build script (easiest for development):

   ```shell
   ./build.sh test # Automatically sets the correct library path
   ```

2. Install system-wide:

   ```shell
   ./build.sh install # Installs to /usr/local/lib
   ```

3. Set LIBRARY_PATH before building/testing:

   ```shell
   # macOS
   export LIBRARY_PATH=$PWD/native/tantivy-c/target/release
   inko test

   # Linux
   export LIBRARY_PATH=$PWD/native/tantivy-c/target/release
   inko test
   ```

4. Download a pre-built library from releases:
   - Download the appropriate library for your platform from Releases
   - Copy to `/usr/local/lib/` or set LIBRARY_PATH
The test suite includes an FFI availability check. If the library can't be loaded, integration tests are automatically skipped with a message like:
```
Finished running 19 tests in 1 milliseconds, 0 failures
```
This means only unit tests ran. To run integration tests, ensure the library is accessible using one of the methods above.
If `cargo build --release` succeeds but `inko test` fails with linking errors, the library is built but not in the linker search path. Use `./build.sh test` or set LIBRARY_PATH as shown above.
Mozilla Public License Version 2.0
See LICENSE for the full license text.