Lattice
Model overview

Lattice language model family

Lattice is a family of security-focused language models and runtime tooling built to assist vulnerability discovery and produce reproducible SARIF-format findings. Each Lattice release ships with reproducible evaluation artifacts, hardened inference settings for safe production use, and integration hooks for enterprise environments (gRPC/REST/CLI). This document summarizes model provenance, security mitigations, SLOs, audit evidence, and operational runbooks for deploying Lattice in CI/CD and research environments.

Audit provenance & compliance

Training window: CVE records, advisories, and OSS tooling scraped and ingested between 2015-01-01 and 2025-06-30.
Licenses: NVD (public), OSV (public), GitHub (public repos under MIT/Apache/BSD), plus commercial datasets under MSA. Contact research@evalops.dev for full dataset manifest under NDA.
External audit: Independent red-team performed by external security firm on 2025-09-28. Findings: 3 medium prompt injections (patch commits: a4f2c81, b7e9d43, c1a5f22), 1 low context leak (mitigated via output sanitization). Audit report available under NDA.
Production SLO definitions:
  • Model error rate
    Inference requests flagged as erroneous (exceptions, timeouts, malformed SARIF) / total requests
    Threshold: < 2% over 5-minute sliding window
  • SARIF reproduction success
    Findings where automated sandbox validation confirms vulnerability / total model findings
    Threshold: > 60% over 1-hour window
  • p95 latency
    95th percentile inference duration measured server-side
    Threshold: < 500ms (Lattice-1), < 800ms (Lattice-2), < 1500ms (Lattice-3)
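The SLO definitions above reduce to ratio checks against fixed thresholds; a minimal sketch of the error-rate check (function names are illustrative, not part of the Lattice runtime):

```python
def error_rate(error_count: int, request_count: int) -> float:
    """Fraction of inference requests flagged erroneous in the current window."""
    if request_count == 0:
        return 0.0
    return error_count / request_count

def slo_violated(error_count: int, request_count: int, threshold: float = 0.02) -> bool:
    """True when the 5-minute sliding-window error rate breaches the < 2% SLO."""
    return error_rate(error_count, request_count) > threshold
```

The same shape applies to the SARIF reproduction SLO, with the inequality reversed (success rate must stay above 60%).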

Release train

2025.10.02

Current production checkpoints. Updated twice monthly.

Context window

64k – 256k

Depending on tier. Rotary embeddings + chunk routing.

Supported runtimes

A100 • H100 • CPU AVX2/AVX512

Reference deployments maintained for each release train.

Model suite

Configurations maintained per release

Parameters below are reference values. Custom builds may diverge based on hardware or compliance constraints.

Evaluation snapshot

Latest benchmarking results

Benchmark | Metric | Lattice-1 | Lattice-2 | Lattice-3 | Baseline
CWE-V Suite (1.4k vulns, test split) | F1 | 0.71 | 0.81 | 0.89 | 0.58 (Llama-3-70B)
OSS-Fuzz triage (2.1k crashes, held-out) | Recall @ top5 | 0.54 | 0.66 | 0.78 | 0.41 (GPT-4-turbo)
Internal exploit chain set (380 chains) | Step accuracy | 0.42 | 0.63 | 0.74 | N/A (proprietary)

CWE-V Suite (1.4k vulns, test split): 3 runs, seeds: 42, 1337, 9001; temp=0.3

OSS-Fuzz triage (2.1k crashes, held-out): Single pass, no few-shot prompting

Internal exploit chain set (380 chains): Avg over 3 seeds; eval script: github.com/evalops/eval-harness@v2.1.3

Model provenance

Training data sources & model card

Training data sources

  • CVE corpus (2015–2025) with exploit PoCs from NVD, GitHub advisories, OSV database
  • Open-source security tooling repositories (static analyzers, fuzzers, symbolic executors)
  • Filtered subset of Stack Overflow, security blogs, and vulnerability disclosure reports
  • Synthetic exploit chains generated via internal red-team exercises
  • Licensed commercial datasets (secure code patterns, proprietary CVE analysis)

Data filtering & compliance

All training data undergoes PII redaction, license verification (MIT/Apache/BSD or commercial agreement), and deduplication. No customer data or classified material is included.

Licensing

Model weights available under commercial license. Deployment requires executed agreement.

Risk management

Threat model & mitigations

Failure modes, with detection signals, mitigations, and owners:

Prompt injection via malicious code comments
  Impact: Model outputs misleading SARIF or skips vulnerabilities
  Detection: Ensemble disagreement score > 0.6
  Mitigation: Reject output, return safe fallback, increment incident counter
  Owner: model-security@evalops.dev

Context window overflow on large repos
  Impact: Incomplete analysis, missing cross-file vulnerabilities
  Detection: Token count exceeds 90% of window capacity
  Mitigation: Automatic chunking with 10% overlap, hierarchical summarization, dependency graph prioritization
  Owner: platform-team@evalops.dev

Training data poisoning (hypothetical)
  Impact: Model trained to ignore specific vulnerability patterns
  Detection: Adversarial validation failures, unexpected eval drift
  Mitigation: Multi-source data provenance, continuous red-team testing, quarterly re-evaluation
  Owner: research@evalops.dev

Output hallucination of exploit steps
  Impact: False reproduction instructions waste security team time
  Detection: Symbolic execution validation failure rate > 40%
  Mitigation: Docker sandbox verification, confidence scoring, human triage for low-confidence findings
  Owner: model-security@evalops.dev

API key compromise
  Impact: Unauthorized inference access, quota abuse
  Detection: Anomalous request patterns, geographic anomalies
  Mitigation: 30-day mandatory rotation, rate limiting, IP allowlisting, immediate key revocation
  Owner: security-ops@evalops.dev
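The prompt-injection row relies on an ensemble disagreement score; a sketch of one plausible definition, the fraction of ensemble members that disagree with the majority label (the production scoring may differ):

```python
from collections import Counter

def disagreement_score(labels: list[str]) -> float:
    """Fraction of ensemble members disagreeing with the majority vote.

    Illustrative definition only; the production score may be computed differently.
    """
    _, majority_count = Counter(labels).most_common(1)[0]
    return 1 - majority_count / len(labels)

def accept_output(labels: list[str], threshold: float = 0.6) -> bool:
    """Reject (fall back to safe output) when disagreement exceeds the 0.6 threshold."""
    return disagreement_score(labels) <= threshold
```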

Over-confident false positives

Model may flag benign patterns as vulnerabilities when code resembles known exploits

Mitigation

Temperature clamping at 0.3 for production inference, ensemble voting with static analyzers

Context window overflow

Large codebases may exceed token limits, causing incomplete analysis

Mitigation

Automatic chunking with overlap, hierarchical summarization for cross-file dependencies
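The chunking mitigation can be sketched as a sliding window with ~10% overlap between consecutive chunks (illustrative only; the production chunker also performs hierarchical summarization and dependency-graph prioritization):

```python
def chunk_tokens(tokens: list, window: int, overlap_frac: float = 0.10) -> list[list]:
    """Split a token stream into windows, each overlapping the previous by ~10%."""
    overlap = int(window * overlap_frac)
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # final window already covers the tail
    return chunks
```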

Novel exploit patterns

Zero-day techniques outside training distribution may be missed

Mitigation

Hybrid approach: model + symbolic execution + fuzz testing for comprehensive coverage

Environment hallucination

Reproduction steps may assume tooling or libraries not present in target environment

Mitigation

Environment manifest validation, Docker container generation for isolated reproduction

Security & compliance

Audit-ready operational standards

  • Data retention: Inference logs: 90 days. Model artifacts: indefinite with version control.
  • Access control: mTLS authentication, API key rotation every 30 days, role-based permissions (RBAC).
  • Encryption: TLS 1.3 in transit. AES-256-GCM at rest (keys managed via AWS KMS / GCP KMS).
  • Audit logs: All inference requests logged to S3/GCS with 7-year retention. Tamper-proof via append-only buckets.
  • SBOM: Runtime dependencies tracked via SPDX 2.3 manifests, published per release train.
  • Compliance status: SOC 2 Type II certified. FIPS 140-2 validation in progress. No HIPAA/FedRAMP yet.

Observability & SLOs

Monitoring, alerting, and automated rollback

Metric | Prometheus metric | SLO threshold | Action on violation
Model error rate | lattice_inference_error_rate | < 2% over 5-minute window | Page on-call, roll back to previous checkpoint
SARIF reproduction success | lattice_sarif_repro_success_rate | > 60% over 1-hour window | Alert security team, disable auto-deployment
p95 latency | lattice_inference_duration_seconds | < 500ms (Lattice-1), < 800ms (Lattice-2), < 1500ms (Lattice-3) | Scale up GPU instances, enable request queuing

Prometheus alert rules

# Prometheus alert rules
groups:
  - name: lattice_model_health
    rules:
      - alert: HighInferenceErrorRate
        expr: increase(lattice_inference_error_count[5m]) / increase(lattice_inference_request_count[5m]) > 0.02
        for: 5m
        annotations:
          summary: "Model error rate exceeded 2% over 5m"
          runbook: "/runbooks/lattice/high_error_rate.md"
      
      - alert: LowSARIFReproRate
        expr: increase(lattice_sarif_repro_success_count[1h]) / increase(lattice_sarif_total_findings[1h]) < 0.60
        for: 1h
        annotations:
          summary: "SARIF reproduction success below 60%"
          runbook: "/runbooks/lattice/low_repro_rate.md"

Automated rollback

#!/usr/bin/env bash
# Automated rollback trigger (deployed as CronJob)
set -euo pipefail
PROM_API="http://prometheus:9090/api/v1/query"
ERROR_RATE=$(curl -s "${PROM_API}?query=increase(lattice_inference_error_count[5m])/increase(lattice_inference_request_count[5m])" | jq -r '.data.result[0].value[1] // "0"')
THRESH=0.02
if (( $(echo "$ERROR_RATE > $THRESH" | bc -l) )); then
  STABLE_REV=$(curl -s https://s3.amazonaws.com/evalops-stable-manifests/latest.json | jq -r .revision)
  kubectl set image deployment/lattice-inference lattice=lattice:${STABLE_REV}
  kubectl rollout status deployment/lattice-inference
  curl -X POST "$PAGERDUTY_WEBHOOK" -d '{"event":"trigger","payload":{"summary":"Auto-rollback executed","details":{"error_rate":"'"$ERROR_RATE"'"}}}'
fi

Manual rollback playbook

# Manual rollback playbook
# 1. Identify last stable checkpoint
kubectl rollout history deployment/lattice-inference

# 2. Roll back to previous revision
kubectl rollout undo deployment/lattice-inference --to-revision=N

# 3. Verify health
kubectl rollout status deployment/lattice-inference
curl https://api.evalops.dev/v1/health

# 4. Update model version in Terraform
terraform apply -var="model_version=2025.09.18" -target=aws_ecs_task_definition.lattice

Alerting infrastructure

PagerDuty integration for critical alerts. Slack notifications for warnings. Automated rollback triggers on sustained SLO violations.

Cost & capacity

Pricing and autoscaling guidance

Model tier | Cost estimate | Hardware | Autoscaling strategy
Lattice-1 | $0.08 / 1K tokens | A100 (40GB) or 8-core CPU AVX2 | Horizontal pod autoscaling: target 70% GPU utilization
Lattice-2 | $0.18 / 1K tokens | A100 (80GB) or 16-core CPU AVX512 | Batch inference recommended for large repos (>100K LOC)
Lattice-3 | $0.42 / 1K tokens | H100 (80GB) required | Reserved capacity for research workloads, spot instances for CI/CD

Sizing examples

Scenario | Est. tokens | Recommendation | Cost per scan | Note
Medium repo (250K LOC, ~120MB tarball) | ~85K tokens | Lattice-2 batch mode | $15.30 per full scan | Use streaming for repos > 500K LOC to avoid timeout
Microservice (15K LOC, ~8MB) | ~5K tokens | Lattice-1 streaming | $0.40 per scan | Ideal for CI/CD on every commit
Monorepo (1.2M LOC, ~600MB) | ~420K tokens (chunked) | Lattice-3 with hierarchical summarization | $176.40 per scan | Enable cross-file dependency analysis

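The per-scan figures above follow directly from the per-1K-token rates in the tier table; a quick arithmetic check:

```python
# Per-1K-token rates taken from the model tier table above.
RATES = {"lattice-1": 0.08, "lattice-2": 0.18, "lattice-3": 0.42}

def scan_cost(tokens: int, model: str) -> float:
    """Estimated cost in USD for a single scan at the listed per-1K-token rate."""
    return round(tokens / 1000 * RATES[model], 2)
```

For example, the medium-repo scenario is 85,000 tokens on Lattice-2: 85 × $0.18 = $15.30.
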
Integration surface

What the runtime expects and emits

Interfaces

  • gRPC endpoint (stream + unary)
  • REST inference proxy
  • CLI for batch audit runs

Artifacts accepted

  • Source trees (Git), SBOM manifests
  • Compiled binaries (ELF/PE/Mach-O)
  • Container images, IaC templates

Outputs

  • SARIF v2.1.0
  • Custom JSON (root-cause + reproduction steps)
  • Markdown incident briefs

Observability

  • OpenTelemetry traces
  • Metric export: Prometheus
  • Audit logs: S3/GCS

API & integration

Rate limits, authentication, and error handling

Tier | Rate limit | Burst limit
Standard | 100 req/min per API key | 200 req/min (30s burst)
Enterprise | 1000 req/min per API key | 2000 req/min (30s burst)

Request limits

Max payload: 50MB tarball or 100K LOC uncompressed. Streaming: 10MB chunks.

Auth policy

API key rotation: 30 days mandatory. mTLS required for production. Bearer token format: `Authorization: Bearer lat_sk_live_...`

429 Rate limit response

HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 42

{
  "error": "rate_limit_exceeded",
  "message": "100 requests per minute exceeded",
  "retry_after_seconds": 42
}
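Clients should honor the Retry-After header before retrying; a minimal backoff sketch (`send` stands in for any HTTP client call and is not part of a Lattice SDK):

```python
import time

def request_with_retry(send, max_attempts: int = 5, sleep=time.sleep):
    """Retry on 429 responses, honoring Retry-After; give up after max_attempts.

    `send` is any callable returning an object with .status_code and .headers;
    it is a placeholder for your HTTP client, not an official API.
    """
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429:
            return resp
        # Fall back to exponential backoff if Retry-After is absent.
        sleep(int(resp.headers.get("Retry-After", 2 ** attempt)))
    raise RuntimeError("rate limit: retries exhausted")
```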

401 Auth failure response

HTTP/1.1 401 Unauthorized
Content-Type: application/json

{
  "error": "invalid_token",
  "message": "API key expired or invalid. Rotate key at https://evalops.dev/settings/keys"
}

API examples & sample output

Integration code and SARIF preview

gRPC streaming

// gRPC streaming inference
const client = new LatticeClient('grpc://api.evalops.dev:443', credentials);
const stream = client.analyzeCode({
  repository: 'github.com/org/repo',
  branch: 'main',
  model: 'lattice-2'
});
stream.on('data', (result) => console.log(result.sarif));

REST API

# REST unary request
curl -X POST https://api.evalops.dev/v1/analyze \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "base64_encoded_tarball",
    "model": "lattice-1",
    "output_format": "sarif"
  }'

CLI batch

# CLI batch audit
lattice audit --repo /path/to/repo \
  --model lattice-3 \
  --output results.sarif \
  --parallel 4

Sample SARIF output

{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {
        "driver": {
          "name": "Lattice-2",
          "version": "2025.10.02"
        }
      },
      "results": [
        {
          "ruleId": "CWE-787",
          "message": {
            "text": "Out-of-bounds write detected in buffer copy operation"
          },
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {
                  "uri": "src/parser.c"
                },
                "region": {
                  "startLine": 142,
                  "startColumn": 5
                }
              }
            }
          ],
          "codeFlows": [
            {
              "threadFlows": [
                {
                  "locations": [
                    {
                      "location": {
                        "message": {
                          "text": "User input received"
                        },
                        "physicalLocation": {
                          "artifactLocation": {
                            "uri": "src/input.c"
                          },
                          "region": {
                            "startLine": 89
                          }
                        }
                      }
                    },
                    {
                      "location": {
                        "message": {
                          "text": "Buffer allocated with fixed size"
                        },
                        "physicalLocation": {
                          "artifactLocation": {
                            "uri": "src/parser.c"
                          },
                          "region": {
                            "startLine": 138
                          }
                        }
                      }
                    },
                    {
                      "location": {
                        "message": {
                          "text": "Unchecked copy operation"
                        },
                        "physicalLocation": {
                          "artifactLocation": {
                            "uri": "src/parser.c"
                          },
                          "region": {
                            "startLine": 142
                          }
                        }
                      }
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
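Findings in this structure can be flattened for triage with standard JSON tooling; a minimal sketch (not an official parser):

```python
import json

def summarize_sarif(sarif_text: str) -> list[dict]:
    """Flatten SARIF results into rule/file/line records for triage."""
    doc = json.loads(sarif_text)
    findings = []
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            loc = result["locations"][0]["physicalLocation"]
            findings.append({
                "rule": result["ruleId"],
                "file": loc["artifactLocation"]["uri"],
                "line": loc["region"]["startLine"],
            })
    return findings
```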

Full benchmark results and sanitized evaluation traces available under NDA. This preview demonstrates SARIF v2.1.0 compliance with code flow tracking.

Explainability & interpretability

Confidence scoring and feature attribution

Each SARIF finding includes confidence scoring and feature attribution to support human triage and prioritization.

Confidence scoring

Confidence scores (0.0–1.0) indicate the model's certainty that a detected pattern represents a genuine vulnerability.

  • High confidence (≥ 0.85): Automatic flagging for review; suitable for automated blocking in CI/CD
  • Medium confidence (0.60 – 0.84): Manual triage recommended; review code context and reproduction steps
  • Low confidence (< 0.60): Informational only; useful for exploratory analysis or code quality insights

Calibration: Confidence scores are calibrated against historical validation data. Scores ≥ 0.85 achieve >92% precision on held-out test sets.
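The tier boundaries map directly to a triage policy; a sketch (action labels are illustrative, not part of the Lattice output schema):

```python
def triage_action(confidence: float) -> str:
    """Map a calibrated confidence score to the triage tiers described above."""
    if confidence >= 0.85:
        return "auto-flag"      # eligible for automated CI/CD blocking
    if confidence >= 0.60:
        return "manual-triage"  # human review of context and repro steps
    return "informational"      # exploratory / code-quality signal only
```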

Feature attribution

Top contributing features for each finding, ranked by attention weight.

Example attribution

  • Unchecked user input flow (weight: 0.43)
  • Buffer allocation with fixed size (weight: 0.31)
  • Missing bounds validation (weight: 0.26)

Gradient-based attribution using integrated gradients; visualized in SARIF properties.attributions array.
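Assuming attributions land in a properties.attributions array of {feature, weight} objects as described above (verify against your actual SARIF output), ranking them for display is straightforward:

```python
def top_attributions(result: dict, k: int = 3) -> list[tuple[str, float]]:
    """Return the k highest-weight attributions from a SARIF result.

    The properties.attributions location is an assumption based on this
    document; check it against the SARIF your deployment emits.
    """
    attrs = result.get("properties", {}).get("attributions", [])
    ranked = sorted(attrs, key=lambda a: a["weight"], reverse=True)
    return [(a["feature"], a["weight"]) for a in ranked[:k]]
```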

Uncertainty quantification

For findings with ensemble disagreement, we report uncertainty intervals.

Confidence: 0.72 ± 0.08 (3-model ensemble variance)

High variance (> 0.10) suggests ambiguous patterns; prioritize symbolic execution verification.
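A sketch of how the mean and spread might be computed across ensemble members, reporting spread as a population standard deviation (an assumption; the document's ± notation is not formally specified):

```python
import statistics

def ensemble_confidence(scores: list[float]) -> tuple[float, float]:
    """Mean confidence and spread (population std-dev) across ensemble members."""
    return statistics.mean(scores), statistics.pstdev(scores)

def needs_symbolic_verification(scores: list[float], max_spread: float = 0.10) -> bool:
    """High ensemble spread (> 0.10) marks a finding for symbolic-execution review."""
    return ensemble_confidence(scores)[1] > max_spread
```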

Reproducible artifacts

Public evaluation preview bundle

Sanitized evaluation preview bundle for pipeline validation (no NDA required)

Bundle contents

  • 1 synthetic vulnerable C program (CWE-787 buffer overflow)
  • Expected SARIF output with code flow annotations
  • CLI invocation script and Docker compose file
  • Validation script to verify SARIF schema compliance

Download preview bundle (12MB)

sha256:a3f5b8c2d1e9f7a6b4c3d2e1f9a8b7c6d5e4f3a2b1c9d8e7f6a5b4c3d2e1f0

Compliance & audit materials

Enterprise compliance pack

NDA-gated compliance pack for enterprise procurement and security audits

Dataset Provenance

  • Full dataset manifest with source URLs, licenses, and ingestion dates
  • License agreements for commercial datasets (anonymized vendor references)
  • PII redaction and filtering pipeline documentation
  • Data lineage diagram showing transformation steps

Red-Team Audit Report

  • Full audit report from external security firm (2025-09-28)
  • Detailed findings with CVSS scores and exploit PoCs
  • Patch commit diffs for all identified vulnerabilities
  • Post-remediation validation test results

Evaluation Artifacts

  • Complete evaluation scripts with exact seeds and hyperparameters
  • Test/train/dev split manifests for all benchmarks
  • Raw model outputs and ground truth labels for reproducibility
  • Statistical analysis notebooks (Jupyter) with variance calculations

Operational Runbooks

  • Incident response playbooks for each threat model scenario
  • Rollback procedures with tested examples
  • On-call escalation matrix and contact protocols
  • SLO breach remediation decision trees

Compliance Attestations

  • SOC 2 Type II report (current period)
  • FIPS 140-2 validation progress documentation
  • SBOM (SPDX 2.3) for all runtime dependencies
  • Encryption key management policies (AWS KMS / GCP KMS)

Access requirements

Contact research@evalops.dev with company information and use case summary. Requires executed MSA or procurement in progress.

Delivery & security

Secure package delivered via encrypted S3 bucket with 7-day expiring presigned URL. GPG-signed manifest for integrity verification.

Enterprise procurement teams

This pack satisfies typical vendor security questionnaires and SOC 2 attestation requirements for ML/AI systems. Request access early in your procurement cycle.

Invitation-only access

Request a private evaluation

Tell us about your environment so we can scope a controlled model preview.

We grant access selectively while we remain in stealth. Share the workloads you want to validate, relevant compliance constraints, and an indicative deployment timeline. A researcher will follow up with next steps.

Prefer to email directly? Contact models@evalops.dev.

We review each request manually.

Access

Engage our research team

Access is gated while we complete external red-team exercises. Share context on your environment and intended use so we can scope deployment, hardware, and disclosure requirements.

Book a technical review