Lattice language model family
Lattice is a family of security-focused language models and runtime tooling built to assist vulnerability discovery and produce reproducible SARIF-format findings. Each Lattice release ships with reproducible evaluation artifacts, hardened inference settings for safe production use, and integration hooks for enterprise environments (gRPC/REST/CLI). This document summarizes model provenance, security mitigations, SLOs, audit evidence, and operational runbooks for deploying Lattice in CI/CD and research environments.
Service-level objectives
| Metric | Definition | Threshold |
|---|---|---|
| Model error rate | Inference requests flagged as erroneous (exceptions, timeouts, malformed SARIF) / total requests | < 2% over a 5-minute sliding window |
| SARIF reproduction success | Findings where automated sandbox validation confirms the vulnerability / total model findings | > 60% over a 1-hour window |
| p95 latency | 95th percentile inference duration, measured server-side | < 500 ms (Lattice-1), < 800 ms (Lattice-2), < 1500 ms (Lattice-3) |
- Release train: 2025.10.02. Current production checkpoints, updated twice monthly.
- Context window: 64k–256k tokens, depending on tier. Rotary embeddings + chunk routing.
- Supported runtimes: A100, H100, CPU AVX2/AVX512. Reference deployments maintained for each release train.
Configurations maintained per release
Parameters below are reference values. Custom builds may diverge based on hardware or compliance constraints.
Lattice-1
Entry-point model for CI/CD and dependency diffing
- Context: 64k tokens
- Quantization: Q4_0, Q8_0
- Latency @ Q8_0 (batch=1, 512-token prompt, A100, CUDA 12.1): p50 82ms | p95 180ms | p99 320ms
- CPU AVX2 (same config): p50 220ms | p95 510ms | p99 980ms
- Batch size: 16 tokens/step
Lattice-2
Static analysis layer for semantic diffing and exploit chain reasoning
- Context: 128k tokens
- Mixture-of-experts cross-lingual heads
- Tool calling: SARIF generation, SBOM summarization
- Latency @ BF16 (batch=1, 512-token prompt, A100, CUDA 12.1): p50 145ms | p95 310ms | p99 580ms
- CPU AVX512 (same config): p50 410ms | p95 920ms | p99 1750ms
Lattice-3
Research configuration with causal reasoning and multi-hop exploit planning
- Context: 256k tokens
- Architectural backbone: Hybrid transformer + retrieval adapters
- External tools: symbolic executor, fuzz harness broker
- Latency @ BF16 (batch=1, 512-token prompt, H100, CUDA 12.1): p50 280ms | p95 620ms | p99 1100ms
Latest benchmarking results
| Benchmark | Metric | Lattice-1 | Lattice-2 | Lattice-3 | Baseline |
|---|---|---|---|---|---|
| CWE-V Suite (1.4k vulns, test split) | F1 | 0.71 | 0.81 | 0.89 | 0.58 (Llama-3-70B) |
| OSS-Fuzz triage (2.1k crashes, held-out) | Recall @ top5 | 0.54 | 0.66 | 0.78 | 0.41 (GPT-4-turbo) |
| Internal exploit chain set (380 chains) | Step accuracy | 0.42 | 0.63 | 0.74 | N/A (proprietary) |
- CWE-V Suite (1.4k vulns, test split): 3 runs; seeds 42, 1337, 9001; temp=0.3
- OSS-Fuzz triage (2.1k crashes, held-out): single pass, no few-shot prompting
- Internal exploit chain set (380 chains): average over 3 seeds; eval script: github.com/evalops/eval-harness@v2.1.3
Training data sources & model card
Training data sources
- CVE corpus (2015–2025) with exploit PoCs from NVD, GitHub advisories, OSV database
- Open-source security tooling repositories (static analyzers, fuzzers, symbolic executors)
- Filtered subset of Stack Overflow, security blogs, and vulnerability disclosure reports
- Synthetic exploit chains generated via internal red-team exercises
- Licensed commercial datasets (secure code patterns, proprietary CVE analysis)
Data filtering & compliance
All training data undergoes PII redaction, license verification (MIT/Apache/BSD or commercial agreement), and deduplication. No customer data or classified material is included.
Licensing
Model weights available under commercial license. Deployment requires executed agreement.
Threat model & mitigations
| Failure mode | Detection | Mitigation | Owner |
|---|---|---|---|
| Prompt injection via malicious code comments. Impact: model outputs misleading SARIF or skips vulnerabilities | Ensemble disagreement score > 0.6 | Reject output, return safe fallback, increment incident counter | model-security@evalops.dev |
| Context window overflow on large repos. Impact: incomplete analysis, missing cross-file vulnerabilities | Token count exceeds 90% of window capacity | Automatic chunking with 10% overlap, hierarchical summarization, dependency graph prioritization | platform-team@evalops.dev |
| Training data poisoning (hypothetical). Impact: model trained to ignore specific vulnerability patterns | Adversarial validation failures, unexpected eval drift | Multi-source data provenance, continuous red-team testing, quarterly re-evaluation | research@evalops.dev |
| Output hallucination of exploit steps. Impact: false reproduction instructions waste security team time | Symbolic execution validation failure rate > 40% | Docker sandbox verification, confidence scoring, human triage for low-confidence findings | model-security@evalops.dev |
| API key compromise. Impact: unauthorized inference access, quota abuse | Anomalous request patterns, geographic anomalies | 30-day mandatory rotation, rate limiting, IP allowlisting, immediate key revocation | security-ops@evalops.dev |
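The ensemble disagreement score used for prompt-injection detection can be sketched as set overlap across ensemble members. Everything below is illustrative except the 0.6 threshold, which comes from the table; the Finding shape and function names are assumptions:

// Sketch: ensemble disagreement scoring for prompt-injection detection.
// Hypothetical types; only the 0.6 threshold comes from the table above.
interface Finding {
  ruleId: string;    // e.g. "CWE-787"
  uri: string;       // file the finding points at
  startLine: number;
}

// Stable key so identical findings from different ensemble members match up.
const findingKey = (f: Finding) => `${f.ruleId}:${f.uri}:${f.startLine}`;

// Disagreement = 1 - (findings every member reported / findings any member reported).
// 0 means the ensemble fully agrees; 1 means no overlap at all.
function disagreementScore(runs: Finding[][]): number {
  const sets = runs.map((r) => new Set(r.map(findingKey)));
  const union = new Set(sets.flatMap((s) => [...s]));
  if (union.size === 0) return 0;
  let intersection = 0;
  for (const k of union) {
    if (sets.every((s) => s.has(k))) intersection++;
  }
  return 1 - intersection / union.size;
}

// Reject the output and return the safe fallback, per the mitigation column.
function shouldReject(runs: Finding[][], threshold = 0.6): boolean {
  return disagreementScore(runs) > threshold;
}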
Known limitations
- Over-confident false positives: the model may flag benign patterns as vulnerabilities when code resembles known exploits. Mitigation: temperature clamping at 0.3 for production inference and ensemble voting with static analyzers.
- Context window overflow: large codebases may exceed token limits, causing incomplete analysis. Mitigation: automatic chunking with overlap and hierarchical summarization for cross-file dependencies (see the sketch after this list).
- Novel exploit patterns: zero-day techniques outside the training distribution may be missed. Mitigation: a hybrid approach combining the model with symbolic execution and fuzz testing for comprehensive coverage.
- Environment hallucination: reproduction steps may assume tooling or libraries not present in the target environment. Mitigation: environment manifest validation and Docker container generation for isolated reproduction.
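A minimal sketch of the overlapping-chunking mitigation described above, assuming token IDs from the runtime's tokenizer. The 10% overlap figure comes from the threat-model table; the function name and signature are illustrative:

// Sketch: split an oversized token sequence into windows with 10% overlap,
// per the context-window-overflow mitigation above. Names are illustrative.
function chunkWithOverlap(tokens: number[], windowSize: number, overlapRatio = 0.1): number[][] {
  const overlap = Math.floor(windowSize * overlapRatio);
  const stride = windowSize - overlap; // how far each successive window advances
  const chunks: number[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + windowSize));
    if (start + windowSize >= tokens.length) break; // final window reached the end
  }
  return chunks;
}

// Example: 150k tokens against Lattice-1's 64k window yields three chunks,
// each sharing ~6.4k tokens with its neighbor so cross-chunk context survives.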
Audit-ready operational standards
| Control | Policy |
|---|---|
| Data retention | Inference logs: 90 days. Model artifacts: indefinite with version control. |
| Access control | mTLS authentication, API key rotation every 30 days, role-based permissions (RBAC). |
| Encryption | TLS 1.3 in transit. AES-256-GCM at rest (keys managed via AWS KMS / GCP KMS). |
| Audit logs | All inference requests logged to S3/GCS with 7-year retention. Tamper-proof via append-only buckets. |
| SBOM | Runtime dependencies tracked via SPDX 2.3 manifests, published per release train. |
| Compliance status | SOC 2 Type II certified. FIPS 140-2 validation in progress. No HIPAA/FedRAMP yet. |
Monitoring, alerting, and automated rollback
| Metric | Prometheus metric | SLO threshold | Action on violation |
|---|---|---|---|
| Model error rate | lattice_inference_error_rate | < 2% over 5-minute window | Page on-call, roll back to previous checkpoint |
| SARIF reproduction success | lattice_sarif_repro_success_rate | > 60% for 1-hour window | Alert security team, disable auto-deployment |
| p95 latency | lattice_inference_duration_seconds | < 500ms (Lattice-1), < 800ms (Lattice-2), < 1500ms (Lattice-3) | Scale up GPU instances, enable request queuing |
Prometheus alert rules
# Prometheus alert rules
groups:
  - name: lattice_model_health
    rules:
      - alert: HighInferenceErrorRate
        expr: increase(lattice_inference_error_count[5m]) / increase(lattice_inference_request_count[5m]) > 0.02
        for: 5m
        annotations:
          summary: "Model error rate exceeded 2% over 5m"
          runbook: "/runbooks/lattice/high_error_rate.md"
      - alert: LowSARIFReproRate
        expr: increase(lattice_sarif_repro_success_count[1h]) / increase(lattice_sarif_total_findings[1h]) < 0.60
        for: 1h
        annotations:
          summary: "SARIF reproduction success below 60%"
          runbook: "/runbooks/lattice/low_repro_rate.md"

Automated rollback
#!/usr/bin/env bash
# Automated rollback trigger (deployed as a CronJob)
set -euo pipefail

PROM_API="http://prometheus:9090/api/v1/query"
# -g disables curl globbing so the [5m] range selectors pass through intact
ERROR_RATE=$(curl -sg "${PROM_API}?query=increase(lattice_inference_error_count[5m])/increase(lattice_inference_request_count[5m])" | jq -r '.data.result[0].value[1]')
THRESH=0.02

if (( $(echo "$ERROR_RATE > $THRESH" | bc -l) )); then
  STABLE_REV=$(curl -s https://s3.amazonaws.com/evalops-stable-manifests/latest.json | jq -r .revision)
  kubectl set image deployment/lattice-inference "lattice=lattice:${STABLE_REV}"
  kubectl rollout status deployment/lattice-inference
  curl -X POST "$PAGERDUTY_WEBHOOK" -d '{"event":"trigger","payload":{"summary":"Auto-rollback executed","details":{"error_rate":"'"$ERROR_RATE"'"}}}'
fi

Manual rollback playbook
# Manual rollback playbook
# 1. Identify last stable checkpoint
kubectl rollout history deployment/lattice-inference
# 2. Roll back to previous revision
kubectl rollout undo deployment/lattice-inference --to-revision=N
# 3. Verify health
kubectl rollout status deployment/lattice-inference
curl https://api.evalops.dev/v1/health
# 4. Update model version in Terraform
terraform apply -var="model_version=2025.09.18" -target=aws_ecs_task_definition.lattice

Alerting infrastructure
PagerDuty integration for critical alerts. Slack notifications for warnings. Automated rollback triggers on sustained SLO violations.
Pricing and autoscaling guidance
| Model tier | Cost estimate | Hardware | Autoscaling strategy |
|---|---|---|---|
| Lattice-1 | $0.08 / 1K tokens | A100 (40GB) or 8-core CPU AVX2 | Horizontal pod autoscaling: target 70% GPU utilization |
| Lattice-2 | $0.18 / 1K tokens | A100 (80GB) or 16-core CPU AVX512 | Batch inference recommended for large repos (>100K LOC) |
| Lattice-3 | $0.42 / 1K tokens | H100 (80GB) required | Reserved capacity for research workloads, spot instances for CI/CD |
Sizing examples
| Scenario | Est. tokens | Recommendation | Cost per scan | Note |
|---|---|---|---|---|
| Medium repo (250K LOC, ~120MB tarball) | ~85K tokens | Lattice-2 batch mode | $15.30 per full scan | Use streaming for repos > 500K LOC to avoid timeout |
| Microservice (15K LOC, ~8MB) | ~5K tokens | Lattice-1 streaming | $0.40 per scan | Ideal for CI/CD on every commit |
| Monorepo (1.2M LOC, ~600MB) | ~420K tokens (chunked) | Lattice-3 with hierarchical summarization | $176.40 per scan | Enable cross-file dependency analysis |
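As a sanity check on the sizing table, per-scan cost is estimated tokens divided by 1,000 times the tier rate. A minimal sketch using the rates from the pricing table (the function name and shape are illustrative):

// Sketch: reproduce the "Cost per scan" column from estimated tokens.
// Rates ($ per 1K tokens) are taken from the pricing table above.
const RATE_PER_1K: Record<string, number> = {
  "lattice-1": 0.08,
  "lattice-2": 0.18,
  "lattice-3": 0.42,
};

function costPerScan(model: string, estimatedTokens: number): number {
  const rate = RATE_PER_1K[model];
  if (rate === undefined) throw new Error(`unknown model tier: ${model}`);
  return (estimatedTokens / 1000) * rate;
}

// Matches the sizing examples above:
// costPerScan("lattice-2", 85_000)  -> 15.30
// costPerScan("lattice-1", 5_000)   -> 0.40
// costPerScan("lattice-3", 420_000) -> 176.40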
What the runtime expects and emits
Interfaces
- gRPC endpoint (stream + unary)
- REST inference proxy
- CLI for batch audit runs
Artifacts accepted
- Source trees (Git), SBOM manifests
- Compiled binaries (ELF/PE/Mach-O)
- Container images, IaC templates
Outputs
- SARIF v2.1.0
- Custom JSON (root-cause + reproduction steps)
- Markdown incident briefs
Observability
- OpenTelemetry traces
- Metric export: Prometheus
- Audit logs: S3/GCS
Rate limits, authentication, and error handling
| Tier | Rate limit | Burst limit |
|---|---|---|
| Standard | 100 req/min per API key | 200 req/min (30s burst) |
| Enterprise | 1000 req/min per API key | 2000 req/min (30s burst) |
Request limits
Max payload: 50MB tarball or 100K LOC uncompressed. Streaming: 10MB chunks.
Auth policy
API key rotation: 30 days mandatory. mTLS required for production. Bearer token format: `Authorization: Bearer lat_sk_live_...`
429 Rate limit response
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42
{
"error": "rate_limit_exceeded",
"message": "100 requests per minute exceeded",
"retry_after_seconds": 42
}

401 Auth failure response
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{
"error": "invalid_token",
"message": "API key expired or invalid. Rotate key at https://evalops.dev/settings/keys"
}
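A minimal sketch of client-side handling for the 429 and 401 responses above, assuming Node 18+ (global fetch); the endpoint and payload mirror the REST example in the next section, and the function name is illustrative:

// Sketch: REST call that honors Retry-After on 429, per the responses above.
// Assumes Node 18+ (global fetch) and API_KEY in the environment.
async function analyzeWithRetry(body: unknown, maxAttempts = 3): Promise<unknown> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch("https://api.evalops.dev/v1/analyze", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (res.status === 429) {
      // Prefer the server's Retry-After hint; fall back to exponential backoff.
      const hint = Number(res.headers.get("Retry-After"));
      const delaySec = Number.isFinite(hint) && hint > 0 ? hint : 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delaySec * 1000));
      continue;
    }
    if (res.status === 401) {
      throw new Error("invalid_token: rotate the key at https://evalops.dev/settings/keys");
    }
    if (!res.ok) throw new Error(`analyze failed: HTTP ${res.status}`);
    return res.json();
  }
  throw new Error("rate limit still exceeded after retries");
}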
Integration code and SARIF preview
gRPC streaming
// gRPC streaming inference
const client = new LatticeClient('grpc://api.evalops.dev:443', credentials);
const stream = client.analyzeCode({
repository: 'github.com/org/repo',
branch: 'main',
model: 'lattice-2'
});
stream.on('data', (result) => console.log(result.sarif));

REST API
# REST unary request
curl -X POST https://api.evalops.dev/v1/analyze -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" -d '{
"source": "base64_encoded_tarball",
"model": "lattice-1",
"output_format": "sarif"
}'

CLI batch
# CLI batch audit
lattice audit --repo /path/to/repo --model lattice-3 --output results.sarif --parallel 4

Sample SARIF output
{
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "Lattice-2",
"version": "2025.10.02"
}
},
"results": [
{
"ruleId": "CWE-787",
"message": {
"text": "Out-of-bounds write detected in buffer copy operation"
},
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "src/parser.c"
},
"region": {
"startLine": 142,
"startColumn": 5
}
}
}
],
"codeFlows": [
{
"threadFlows": [
{
"locations": [
{
"location": {
"message": {
"text": "User input received"
},
"physicalLocation": {
"artifactLocation": {
"uri": "src/input.c"
},
"region": {
"startLine": 89
}
}
}
},
{
"location": {
"message": {
"text": "Buffer allocated with fixed size"
},
"physicalLocation": {
"artifactLocation": {
"uri": "src/parser.c"
},
"region": {
"startLine": 138
}
}
}
},
{
"location": {
"message": {
"text": "Unchecked copy operation"
},
"physicalLocation": {
"artifactLocation": {
"uri": "src/parser.c"
},
"region": {
"startLine": 142
}
}
}
}
]
}
]
}
]
}
]
}
]
}

Full benchmark results and sanitized evaluation traces are available under NDA. This preview demonstrates SARIF v2.1.0 compliance with code flow tracking.
Confidence scoring and feature attribution
Each SARIF finding includes confidence scoring and feature attribution to support human triage and prioritization.
Confidence scoring
Confidence scores (0.0–1.0) indicate the model's certainty that a detected pattern represents a genuine vulnerability.
- High confidence: automatic flagging for review; suitable for automated blocking in CI/CD.
- Medium confidence: manual triage recommended; review code context and reproduction steps.
- Low confidence: informational only; useful for exploratory analysis or code quality insights.
Calibration: Confidence scores are calibrated against historical validation data. Scores ≥ 0.85 achieve >92% precision on held-out test sets.
Feature attribution
Top contributing features for each finding, ranked by attention weight.
Example attribution
- Unchecked user input flow (weight: 0.43)
- Buffer allocation with fixed size (weight: 0.31)
- Missing bounds validation (weight: 0.26)
Gradient-based attribution using integrated gradients; surfaced in the SARIF properties.attributions array.
Uncertainty quantification
For findings with ensemble disagreement, we report uncertainty intervals.
Example: confidence 0.72 ± 0.08 (3-model ensemble variance)
High variance (> 0.10) suggests ambiguous patterns; prioritize symbolic execution verification.
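A minimal sketch of how a downstream consumer might route findings on these signals. The property names are assumptions (the SARIF preview above does not show them); the 0.85 and 0.10 cutoffs come from the calibration and variance notes, and the 0.5 boundary is purely illustrative:

// Sketch: route SARIF findings by confidence and ensemble variance.
// Field names under `properties` are assumptions, not a documented schema.
interface LatticeResult {
  ruleId: string;
  properties?: {
    confidence?: number;         // 0.0 - 1.0
    confidenceVariance?: number; // ensemble variance, e.g. 0.08
  };
}

type Route = "auto-block" | "manual-triage" | "informational" | "verify-symbolic";

function routeFinding(r: LatticeResult): Route {
  const conf = r.properties?.confidence ?? 0;
  const variance = r.properties?.confidenceVariance ?? 0;
  // High ensemble disagreement: prioritize symbolic execution verification.
  if (variance > 0.10) return "verify-symbolic";
  if (conf >= 0.85) return "auto-block"; // >92% precision per the calibration note
  if (conf >= 0.5) return "manual-triage"; // 0.5 boundary is illustrative only
  return "informational";
}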
Public evaluation preview bundle
Sanitized evaluation preview bundle for pipeline validation (no NDA required)
Bundle contents
- 1 synthetic vulnerable C program (CWE-787 buffer overflow)
- Expected SARIF output with code flow annotations
- CLI invocation script and Docker compose file
- Validation script to verify SARIF schema compliance
Bundle checksum: sha256:a3f5b8c2d1e9f7a6b4c3d2e1f9a8b7c6d5e4f3a2b1c9d8e7f6a5b4c3d2e1f0
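To verify the downloaded bundle against the checksum above, Node's built-in crypto is enough; the bundle filename here is an assumption:

// Sketch: verify the preview bundle against the published SHA-256.
// The filename "lattice-eval-preview.tar.gz" is an assumption.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

const EXPECTED = "a3f5b8c2d1e9f7a6b4c3d2e1f9a8b7c6d5e4f3a2b1c9d8e7f6a5b4c3d2e1f0";

const digest = createHash("sha256")
  .update(readFileSync("lattice-eval-preview.tar.gz"))
  .digest("hex");

if (digest !== EXPECTED) {
  throw new Error(`checksum mismatch: got sha256:${digest}`);
}
console.log("bundle checksum verified");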
Enterprise compliance pack
NDA-gated compliance pack for enterprise procurement and security audits
Dataset Provenance
- Full dataset manifest with source URLs, licenses, and ingestion dates
- License agreements for commercial datasets (anonymized vendor references)
- PII redaction and filtering pipeline documentation
- Data lineage diagram showing transformation steps
Red-Team Audit Report
- Full audit report from external security firm (2025-09-28)
- Detailed findings with CVSS scores and exploit PoCs
- Patch commit diffs for all identified vulnerabilities
- Post-remediation validation test results
Evaluation Artifacts
- Complete evaluation scripts with exact seeds and hyperparameters
- Test/train/dev split manifests for all benchmarks
- Raw model outputs and ground truth labels for reproducibility
- Statistical analysis notebooks (Jupyter) with variance calculations
Operational Runbooks
- Incident response playbooks for each threat model scenario
- Rollback procedures with tested examples
- On-call escalation matrix and contact protocols
- SLO breach remediation decision trees
Compliance Attestations
- SOC 2 Type II report (current period)
- FIPS 140-2 validation progress documentation
- SBOM (SPDX 2.3) for all runtime dependencies
- Encryption key management policies (AWS KMS / GCP KMS)
Access requirements
Contact research@evalops.dev with company information and use case summary. Requires executed MSA or procurement in progress.
Delivery & security
Secure package delivered via encrypted S3 bucket with 7-day expiring presigned URL. GPG-signed manifest for integrity verification.
Enterprise procurement teams
This pack satisfies typical vendor security questionnaires and SOC 2 attestation requirements for ML/AI systems. Request access early in your procurement cycle.
Request a private evaluation
Tell us about your environment so we can scope a controlled model preview.
We grant access selectively while we remain in stealth. Share the workloads you want to validate, relevant compliance constraints, and an indicative deployment timeline. A researcher will follow up with next steps.
Prefer to email directly? Contact models@evalops.dev.
Engage our research team
Access is gated while we complete external red-team exercises. Share context on your environment and intended use so we can scope deployment, hardware, and disclosure requirements.
Book a technical review