Lattice language model family
Lattice is a family of security-focused language models and runtime tooling built to assist vulnerability discovery and produce reproducible SARIF-format findings. Each Lattice release ships with reproducible evaluation artifacts, hardened inference settings for safe production use, and integration hooks for enterprise environments (gRPC/REST/CLI). This document summarizes model provenance, security mitigations, SLOs, audit evidence, and operational runbooks for deploying Lattice in CI/CD and research environments.
Service-level objectives
| Metric | Definition | Threshold |
|---|---|---|
| Model error rate | Inference requests flagged as erroneous (exceptions, timeouts, malformed SARIF) / total requests | < 2% over a 5-minute sliding window |
| SARIF reproduction success | Findings where automated sandbox validation confirms the vulnerability / total model findings | > 60% over a 1-hour window |
| p95 latency | 95th percentile inference duration, measured server-side | < 500 ms (Lattice-1), < 800 ms (Lattice-2), < 1500 ms (Lattice-3) |
- Release train: 2025.10.02. Current production checkpoints, updated twice monthly.
- Context window: 64k–256k tokens, depending on tier. Rotary embeddings + chunk routing.
- Supported runtimes: A100, H100, CPU AVX2/AVX512. Reference deployments maintained for each release train.
Configurations maintained per release
Parameters below are reference values. Custom builds may diverge based on hardware or compliance constraints.
Lattice-1
Entry-point model for CI/CD and dependency diffing
- Context: 64k tokens
- Quantization: Q4_0, Q8_0
- Latency @ Q8_0 (batch=1, 512-token prompt, A100, CUDA 12.1): p50 82ms | p95 180ms | p99 320ms
- CPU AVX2 (same config): p50 220ms | p95 510ms | p99 980ms
- Batch size: 16 tokens/step
Lattice-2
Static analysis layer for semantic diffing and exploit chain reasoning
- Context: 128k tokens
- Mixture-of-experts cross-lingual heads
- Tool calling: SARIF generation, SBOM summarization
- Latency @ BF16 (batch=1, 512-token prompt, A100, CUDA 12.1): p50 145ms | p95 310ms | p99 580ms
- CPU AVX512 (same config): p50 410ms | p95 920ms | p99 1750ms
Lattice-3
Research configuration with causal reasoning and multi-hop exploit planning
- Context: 256k tokens
- Architectural backbone: Hybrid transformer + retrieval adapters
- External tools: symbolic executor, fuzz harness broker
- Latency @ BF16 (batch=1, 512-token prompt, H100, CUDA 12.1): p50 280ms | p95 620ms | p99 1100ms
Latest benchmarking results
| Benchmark | Metric | Lattice-1 | Lattice-2 | Lattice-3 | Baseline |
|---|---|---|---|---|---|
| CWE-V Suite (1.4k vulns, test split) | F1 | 0.71 | 0.81 | 0.89 | 0.58 (Llama-3-70B) |
| OSS-Fuzz triage (2.1k crashes, held-out) | Recall @ top5 | 0.54 | 0.66 | 0.78 | 0.41 (GPT-4-turbo) |
| Internal exploit chain set (380 chains) | Step accuracy | 0.42 | 0.63 | 0.74 | N/A (proprietary) |
- CWE-V Suite (1.4k vulns, test split): 3 runs; seeds 42, 1337, 9001; temp=0.3
- OSS-Fuzz triage (2.1k crashes, held-out): single pass, no few-shot prompting
- Internal exploit chain set (380 chains): average over 3 seeds; eval script: github.com/evalops/eval-harness@v2.1.3
Training data sources & model card
Training data sources
- CVE corpus (2015–2025) with exploit PoCs from NVD, GitHub advisories, OSV database
- Open-source security tooling repositories (static analyzers, fuzzers, symbolic executors)
- Filtered subset of Stack Overflow, security blogs, and vulnerability disclosure reports
- Synthetic exploit chains generated via internal red-team exercises
- Licensed commercial datasets (secure code patterns, proprietary CVE analysis)
Data filtering & compliance
All training data undergoes PII redaction, license verification (MIT/Apache/BSD or commercial agreement), and deduplication. No customer data or classified material is included.
Licensing
Model weights available under commercial license. Deployment requires executed agreement.
Threat model & mitigations
| Failure mode | Detection | Mitigation | Owner |
|---|---|---|---|
| Prompt injection via malicious code comments. Impact: model outputs misleading SARIF or skips vulnerabilities | Ensemble disagreement score > 0.6 | Reject output, return safe fallback, increment incident counter | model-security@evalops.dev |
| Context window overflow on large repos. Impact: incomplete analysis, missing cross-file vulnerabilities | Token count exceeds 90% of window capacity | Automatic chunking with 10% overlap, hierarchical summarization, dependency graph prioritization | platform-team@evalops.dev |
| Training data poisoning (hypothetical). Impact: model trained to ignore specific vulnerability patterns | Adversarial validation failures, unexpected eval drift | Multi-source data provenance, continuous red-team testing, quarterly re-evaluation | research@evalops.dev |
| Output hallucination of exploit steps. Impact: false reproduction instructions waste security team time | Symbolic execution validation failure rate > 40% | Docker sandbox verification, confidence scoring, human triage for low-confidence findings | model-security@evalops.dev |
| API key compromise. Impact: unauthorized inference access, quota abuse | Anomalous request patterns, geographic anomalies | 30-day mandatory rotation, rate limiting, IP allowlisting, immediate key revocation | security-ops@evalops.dev |
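The ensemble disagreement score used for prompt-injection detection can be sketched as set overlap across ensemble members. Everything below is illustrative except the 0.6 threshold, which comes from the table; the Finding shape and function names are assumptions:

// Sketch: ensemble disagreement scoring for prompt-injection detection.
// Hypothetical types; only the 0.6 threshold comes from the table above.
interface Finding {
  ruleId: string;    // e.g. "CWE-787"
  uri: string;       // file the finding points at
  startLine: number;
}

// Stable key so identical findings from different ensemble members match up.
const findingKey = (f: Finding) => `${f.ruleId}:${f.uri}:${f.startLine}`;

// Disagreement = 1 - (findings every member reported / findings any member reported).
// 0 means the ensemble fully agrees; 1 means no overlap at all.
function disagreementScore(runs: Finding[][]): number {
  const sets = runs.map((r) => new Set(r.map(findingKey)));
  const union = new Set(sets.flatMap((s) => [...s]));
  if (union.size === 0) return 0;
  let intersection = 0;
  for (const k of union) {
    if (sets.every((s) => s.has(k))) intersection++;
  }
  return 1 - intersection / union.size;
}

// Reject the output and return the safe fallback, per the mitigation column.
function shouldReject(runs: Finding[][], threshold = 0.6): boolean {
  return disagreementScore(runs) > threshold;
}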
Known limitations
- Over-confident false positives: the model may flag benign patterns as vulnerabilities when code resembles known exploits. Mitigation: temperature clamping at 0.3 for production inference and ensemble voting with static analyzers.
- Context window overflow: large codebases may exceed token limits, causing incomplete analysis. Mitigation: automatic chunking with overlap and hierarchical summarization for cross-file dependencies (see the sketch after this list).
- Novel exploit patterns: zero-day techniques outside the training distribution may be missed. Mitigation: a hybrid approach combining the model with symbolic execution and fuzz testing for comprehensive coverage.
- Environment hallucination: reproduction steps may assume tooling or libraries not present in the target environment. Mitigation: environment manifest validation and Docker container generation for isolated reproduction.
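A minimal sketch of the overlapping-chunking mitigation described above, assuming token IDs from the runtime's tokenizer. The 10% overlap figure comes from the threat-model table; the function name and signature are illustrative:

// Sketch: split an oversized token sequence into windows with 10% overlap,
// per the context-window-overflow mitigation above. Names are illustrative.
function chunkWithOverlap(tokens: number[], windowSize: number, overlapRatio = 0.1): number[][] {
  const overlap = Math.floor(windowSize * overlapRatio);
  const stride = windowSize - overlap; // how far each successive window advances
  const chunks: number[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + windowSize));
    if (start + windowSize >= tokens.length) break; // final window reached the end
  }
  return chunks;
}

// Example: 150k tokens against Lattice-1's 64k window yields three chunks,
// each sharing ~6.4k tokens with its neighbor so cross-chunk context survives.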
Audit-ready operational standards
| Control | Policy |
|---|---|
| Data retention | Inference logs: 90 days. Model artifacts: indefinite with version control. |
| Access control | mTLS authentication, API key rotation every 30 days, role-based permissions (RBAC). |
| Encryption | TLS 1.3 in transit. AES-256-GCM at rest (keys managed via AWS KMS / GCP KMS). |
| Audit logs | All inference requests logged to S3/GCS with 7-year retention. Tamper-proof via append-only buckets. |
| SBOM | Runtime dependencies tracked via SPDX 2.3 manifests, published per release train. |
| Compliance status | SOC 2 Type II certified. FIPS 140-2 validation in progress. No HIPAA/FedRAMP yet. |
Monitoring, alerting, and automated rollback
| Metric | Prometheus metric | SLO threshold | Action on violation |
|---|---|---|---|
| Model error rate | lattice_inference_error_rate | < 2% over 5-minute window | Page on-call, roll back to previous checkpoint |
| SARIF reproduction success | lattice_sarif_repro_success_rate | > 60% for 1-hour window | Alert security team, disable auto-deployment |
| p95 latency | lattice_inference_duration_seconds | < 500ms (Lattice-1), < 800ms (Lattice-2), < 1500ms (Lattice-3) | Scale up GPU instances, enable request queuing |
Prometheus alert rules
# Prometheus alert rules
groups:
  - name: lattice_model_health
    rules:
      - alert: HighInferenceErrorRate
        expr: increase(lattice_inference_error_count[5m]) / increase(lattice_inference_request_count[5m]) > 0.02
        for: 5m
        annotations:
          summary: "Model error rate exceeded 2% over 5m"
          runbook: "/runbooks/lattice/high_error_rate.md"
      - alert: LowSARIFReproRate
        expr: increase(lattice_sarif_repro_success_count[1h]) / increase(lattice_sarif_total_findings[1h]) < 0.60
        for: 1h
        annotations:
          summary: "SARIF reproduction success below 60%"
          runbook: "/runbooks/lattice/low_repro_rate.md"

Automated rollback
#!/usr/bin/env bash
# Automated rollback trigger (deployed as a CronJob)
set -euo pipefail

PROM_API="http://prometheus:9090/api/v1/query"
# -g disables curl globbing so the [5m] range selectors pass through intact
ERROR_RATE=$(curl -sg "${PROM_API}?query=increase(lattice_inference_error_count[5m])/increase(lattice_inference_request_count[5m])" | jq -r '.data.result[0].value[1]')
THRESH=0.02

if (( $(echo "$ERROR_RATE > $THRESH" | bc -l) )); then
  STABLE_REV=$(curl -s https://s3.amazonaws.com/evalops-stable-manifests/latest.json | jq -r .revision)
  kubectl set image deployment/lattice-inference "lattice=lattice:${STABLE_REV}"
  kubectl rollout status deployment/lattice-inference
  curl -X POST "$PAGERDUTY_WEBHOOK" -d '{"event":"trigger","payload":{"summary":"Auto-rollback executed","details":{"error_rate":"'"$ERROR_RATE"'"}}}'
fi

Manual rollback playbook
# Manual rollback playbook
# 1. Identify last stable checkpoint
kubectl rollout history deployment/lattice-inference
# 2. Roll back to previous revision
kubectl rollout undo deployment/lattice-inference --to-revision=N
# 3. Verify health
kubectl rollout status deployment/lattice-inference
curl https://api.evalops.dev/v1/health
# 4. Update model version in Terraform
terraform apply -var="model_version=2025.09.18" -target=aws_ecs_task_definition.lattice

Alerting infrastructure
PagerDuty integration for critical alerts. Slack notifications for warnings. Automated rollback triggers on sustained SLO violations.
Pricing and autoscaling guidance
| Model tier | Cost estimate | Hardware | Autoscaling strategy |
|---|---|---|---|
| Lattice-1 | $0.08 / 1K tokens | A100 (40GB) or 8-core CPU AVX2 | Horizontal pod autoscaling: target 70% GPU utilization |
| Lattice-2 | $0.18 / 1K tokens | A100 (80GB) or 16-core CPU AVX512 | Batch inference recommended for large repos (>100K LOC) |
| Lattice-3 | $0.42 / 1K tokens | H100 (80GB) required | Reserved capacity for research workloads, spot instances for CI/CD |
Sizing examples
| Scenario | Est. tokens | Recommendation | Cost per scan | Note |
|---|---|---|---|---|
| Medium repo (250K LOC, ~120MB tarball) | ~85K tokens | Lattice-2 batch mode | $15.30 per full scan | Use streaming for repos > 500K LOC to avoid timeout |
| Microservice (15K LOC, ~8MB) | ~5K tokens | Lattice-1 streaming | $0.40 per scan | Ideal for CI/CD on every commit |
| Monorepo (1.2M LOC, ~600MB) | ~420K tokens (chunked) | Lattice-3 with hierarchical summarization | $176.40 per scan | Enable cross-file dependency analysis |
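As a sanity check on the sizing table, per-scan cost is estimated tokens divided by 1,000 times the tier rate. A minimal sketch using the rates from the pricing table (the function name and shape are illustrative):

// Sketch: reproduce the "Cost per scan" column from estimated tokens.
// Rates ($ per 1K tokens) are taken from the pricing table above.
const RATE_PER_1K: Record<string, number> = {
  "lattice-1": 0.08,
  "lattice-2": 0.18,
  "lattice-3": 0.42,
};

function costPerScan(model: string, estimatedTokens: number): number {
  const rate = RATE_PER_1K[model];
  if (rate === undefined) throw new Error(`unknown model tier: ${model}`);
  return (estimatedTokens / 1000) * rate;
}

// Matches the sizing examples above:
// costPerScan("lattice-2", 85_000)  -> 15.30
// costPerScan("lattice-1", 5_000)   -> 0.40
// costPerScan("lattice-3", 420_000) -> 176.40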
What the runtime expects and emits
Interfaces
- gRPC endpoint (stream + unary)
- REST inference proxy
- CLI for batch audit runs
Artifacts accepted
- Source trees (Git), SBOM manifests
- Compiled binaries (ELF/PE/Mach-O)
- Container images, IaC templates
Outputs
- SARIF v2.1.0
- Custom JSON (root-cause + reproduction steps)
- Markdown incident briefs
Observability
- OpenTelemetry traces
- Metric export: Prometheus
- Audit logs: S3/GCS
Rate limits, authentication, and error handling
| Tier | Rate limit | Burst limit |
|---|---|---|
| Standard | 100 req/min per API key | 200 req/min (30s burst) |
| Enterprise | 1000 req/min per API key | 2000 req/min (30s burst) |
Request limits
Max payload: 50MB tarball or 100K LOC uncompressed. Streaming: 10MB chunks.
Auth policy
API key rotation: 30 days mandatory. mTLS required for production. Bearer token format: `Authorization: Bearer lat_sk_live_...`
429 Rate limit response
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42
{
"error": "rate_limit_exceeded",
"message": "100 requests per minute exceeded",
"retry_after_seconds": 42
}

401 Auth failure response
HTTP/1.1 401 Unauthorized
Content-Type: application/json
{
"error": "invalid_token",
"message": "API key expired or invalid. Rotate key at https://evalops.dev/settings/keys"
}
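A minimal sketch of client-side handling for the 429 and 401 responses above, assuming Node 18+ (global fetch); the endpoint and payload mirror the REST example in the next section, and the function name is illustrative:

// Sketch: REST call that honors Retry-After on 429, per the responses above.
// Assumes Node 18+ (global fetch) and API_KEY in the environment.
async function analyzeWithRetry(body: unknown, maxAttempts = 3): Promise<unknown> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetch("https://api.evalops.dev/v1/analyze", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });
    if (res.status === 429) {
      // Prefer the server's Retry-After hint; fall back to exponential backoff.
      const hint = Number(res.headers.get("Retry-After"));
      const delaySec = Number.isFinite(hint) && hint > 0 ? hint : 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delaySec * 1000));
      continue;
    }
    if (res.status === 401) {
      throw new Error("invalid_token: rotate the key at https://evalops.dev/settings/keys");
    }
    if (!res.ok) throw new Error(`analyze failed: HTTP ${res.status}`);
    return res.json();
  }
  throw new Error("rate limit still exceeded after retries");
}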
Integration code and SARIF preview
gRPC streaming
// gRPC streaming inference
const client = new LatticeClient('grpc://api.evalops.dev:443', credentials);
const stream = client.analyzeCode({
repository: 'github.com/org/repo',
branch: 'main',
model: 'lattice-2'
});
stream.on('data', (result) => console.log(result.sarif));

REST API
# REST unary request
curl -X POST https://api.evalops.dev/v1/analyze -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" -d '{
"source": "base64_encoded_tarball",
"model": "lattice-1",
"output_format": "sarif"
}'

CLI batch
# CLI batch audit
lattice audit --repo /path/to/repo --model lattice-3 --output results.sarif --parallel 4

Sample SARIF output
{
"version": "2.1.0",
"runs": [
{
"tool": {
"driver": {
"name": "Lattice-2",
"version": "2025.10.02"
}
},
"results": [
{
"ruleId": "CWE-787",
"message": {
"text": "Out-of-bounds write detected in buffer copy operation"
},
"locations": [
{
"physicalLocation": {
"artifactLocation": {
"uri": "src/parser.c"
},
"region": {
"startLine": 142,
"startColumn": 5
}
}
}
],
"codeFlows": [
{
"threadFlows": [
{
"locations": [
{
"location": {
"message": {
"text": "User input received"
},
"physicalLocation": {
"artifactLocation": {
"uri": "src/input.c"
},
"region": {
"startLine": 89
}
}
}
},
{
"location": {
"message": {
"text": "Buffer allocated with fixed size"
},
"physicalLocation": {
"artifactLocation": {
"uri": "src/parser.c"
},
"region": {
"startLine": 138
}
}
}
},
{
"location": {
"message": {
"text": "Unchecked copy operation"
},
"physicalLocation": {
"artifactLocation": {
"uri": "src/parser.c"
},
"region": {
"startLine": 142
}
}
}
}
]
}
]
}
]
}
]
}
]
}

Full benchmark results and sanitized evaluation traces are available under NDA. This preview demonstrates SARIF v2.1.0 compliance with code flow tracking.
Confidence scoring and feature attribution
Each SARIF finding includes confidence scoring and feature attribution to support human triage and prioritization.
Confidence scoring
Confidence scores (0.0–1.0) indicate the model's certainty that a detected pattern represents a genuine vulnerability.
- High confidence: automatic flagging for review; suitable for automated blocking in CI/CD.
- Medium confidence: manual triage recommended; review code context and reproduction steps.
- Low confidence: informational only; useful for exploratory analysis or code quality insights.
Calibration: Confidence scores are calibrated against historical validation data. Scores ≥ 0.85 achieve >92% precision on held-out test sets.
Feature attribution
Top contributing features for each finding, ranked by attention weight.
Example attribution
- Unchecked user input flow (weight: 0.43)
- Buffer allocation with fixed size (weight: 0.31)
- Missing bounds validation (weight: 0.26)
Gradient-based attribution using integrated gradients; surfaced in the SARIF properties.attributions array.
Uncertainty quantification
For findings with ensemble disagreement, we report uncertainty intervals.
Example: confidence 0.72 ± 0.08 (3-model ensemble variance)
High variance (> 0.10) suggests ambiguous patterns; prioritize symbolic execution verification.
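A minimal sketch of how a downstream consumer might route findings on these signals. The property names are assumptions (the SARIF preview above does not show them); the 0.85 and 0.10 cutoffs come from the calibration and variance notes, and the 0.5 boundary is purely illustrative:

// Sketch: route SARIF findings by confidence and ensemble variance.
// Field names under `properties` are assumptions, not a documented schema.
interface LatticeResult {
  ruleId: string;
  properties?: {
    confidence?: number;         // 0.0 - 1.0
    confidenceVariance?: number; // ensemble variance, e.g. 0.08
  };
}

type Route = "auto-block" | "manual-triage" | "informational" | "verify-symbolic";

function routeFinding(r: LatticeResult): Route {
  const conf = r.properties?.confidence ?? 0;
  const variance = r.properties?.confidenceVariance ?? 0;
  // High ensemble disagreement: prioritize symbolic execution verification.
  if (variance > 0.10) return "verify-symbolic";
  if (conf >= 0.85) return "auto-block"; // >92% precision per the calibration note
  if (conf >= 0.5) return "manual-triage"; // 0.5 boundary is illustrative only
  return "informational";
}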
Public evaluation preview bundle
Sanitized evaluation preview bundle for pipeline validation (no NDA required)
Bundle contents
- 1 synthetic vulnerable C program (CWE-787 buffer overflow)
- Expected SARIF output with code flow annotations
- CLI invocation script and Docker compose file
- Validation script to verify SARIF schema compliance
Bundle checksum: sha256:a3f5b8c2d1e9f7a6b4c3d2e1f9a8b7c6d5e4f3a2b1c9d8e7f6a5b4c3d2e1f0
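To verify the downloaded bundle against the checksum above, Node's built-in crypto is enough; the bundle filename here is an assumption:

// Sketch: verify the preview bundle against the published SHA-256.
// The filename "lattice-eval-preview.tar.gz" is an assumption.
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

const EXPECTED = "a3f5b8c2d1e9f7a6b4c3d2e1f9a8b7c6d5e4f3a2b1c9d8e7f6a5b4c3d2e1f0";

const digest = createHash("sha256")
  .update(readFileSync("lattice-eval-preview.tar.gz"))
  .digest("hex");

if (digest !== EXPECTED) {
  throw new Error(`checksum mismatch: got sha256:${digest}`);
}
console.log("bundle checksum verified");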
Enterprise compliance pack
NDA-gated compliance pack for enterprise procurement and security audits
Dataset Provenance
- Full dataset manifest with source URLs, licenses, and ingestion dates
- License agreements for commercial datasets (anonymized vendor references)
- PII redaction and filtering pipeline documentation
- Data lineage diagram showing transformation steps
Red-Team Audit Report
- Full audit report from external security firm (2025-09-28)
- Detailed findings with CVSS scores and exploit PoCs
- Patch commit diffs for all identified vulnerabilities
- Post-remediation validation test results
Evaluation Artifacts
- Complete evaluation scripts with exact seeds and hyperparameters
- Test/train/dev split manifests for all benchmarks
- Raw model outputs and ground truth labels for reproducibility
- Statistical analysis notebooks (Jupyter) with variance calculations
Operational Runbooks
- Incident response playbooks for each threat model scenario
- Rollback procedures with tested examples
- On-call escalation matrix and contact protocols
- SLO breach remediation decision trees
Compliance Attestations
- SOC 2 Type II report (current period)
- FIPS 140-2 validation progress documentation
- SBOM (SPDX 2.3) for all runtime dependencies
- Encryption key management policies (AWS KMS / GCP KMS)
Access requirements
Contact research@evalops.dev with company information and use case summary. Requires executed MSA or procurement in progress.
Delivery & security
Secure package delivered via encrypted S3 bucket with 7-day expiring presigned URL. GPG-signed manifest for integrity verification.
Enterprise procurement teams
This pack satisfies typical vendor security questionnaires and SOC 2 attestation requirements for ML/AI systems. Request access early in your procurement cycle.
Request a private evaluation
Tell us about your environment so we can scope a controlled model preview.
We grant access selectively while we remain in stealth. Share the workloads you want to validate, relevant compliance constraints, and an indicative deployment timeline. A researcher will follow up with next steps.
Prefer to email directly? Contact models@evalops.dev.
Engage our research team
Access is gated while we complete external red-team exercises. Share context on your environment and intended use so we can scope deployment, hardware, and disclosure requirements.
Book a technical review