
Architecture

Chive is an AT Protocol AppView that indexes and presents scholarly preprints from the decentralized network. This document provides a high-level overview of the system architecture.

System overview

┌────────────────────────────────────────────────────────────────────────┐
│                                Internet                                │
└────────────────────────────────────────────────────────────────────────┘
         │                          │                          │
         ▼                          ▼                          ▼
┌──────────────────┐       ┌──────────────────┐       ┌──────────────────┐
│   Web Clients    │       │   API Clients    │       │   AT Protocol    │
│    (Next.js)     │       │  (Mobile, etc)   │       │   Relay (BGS)    │
└────────┬─────────┘       └────────┬─────────┘       └────────┬─────────┘
         │                          │                          │
         │ HTTPS                    │ HTTPS                    │ WebSocket
         │                          │                          │
         └──────────────────────────┼──────────────────────────┘
                                    │
                                    ▼
┌────────────────────────────────────────────────────────────────────────┐
│                             Load Balancer                              │
└────────────────────────────────────────────────────────────────────────┘
                                    │
         ┌──────────────────────────┼──────────────────────────┐
         │                          │                          │
         ▼                          ▼                          ▼
┌──────────────────┐       ┌──────────────────┐       ┌──────────────────┐
│   API Server 1   │       │   API Server 2   │       │     Firehose     │
│      (Hono)      │       │      (Hono)      │       │     Consumer     │
└────────┬─────────┘       └────────┬─────────┘       └────────┬─────────┘
         │                          │                          │
         └──────────────────────────┼──────────────────────────┘
                                    │
                                    ▼
┌────────────────────────────────────────────────────────────────────────┐
│                             Service Layer                              │
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐ │
│ │  Preprint   │  │   Search    │  │  Discovery  │  │    Knowledge    │ │
│ │   Service   │  │   Service   │  │   Service   │  │  Graph Service  │ │
│ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────────┘ │
└────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌────────────────────────────────────────────────────────────────────────┐
│                             Storage Layer                              │
│ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐ │
│ │ PostgreSQL  │  │Elasticsearch│  │    Neo4j    │  │      Redis      │ │
│ │ (metadata)  │  │  (search)   │  │   (graph)   │  │     (cache)     │ │
│ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────────┘ │
└────────────────────────────────────────────────────────────────────────┘

Core components

API layer

The API layer handles all incoming requests through Hono:

| Component | Purpose |
|---|---|
| XRPC handlers | AT Protocol native endpoints (/xrpc/pub.chive.*) |
| REST handlers | Traditional HTTP endpoints (/api/v1/*) |
| Authentication | OAuth 2.0 + PKCE, JWT sessions |
| Rate limiting | Tiered limits by user type |
| Validation | Request/response schema validation |
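Tiered rate limiting can be pictured as one token bucket per client, with the bucket size and refill rate chosen by tier. The following is a minimal sketch under stated assumptions: the tier names (anonymous, authenticated, trusted), their limits, and the TieredRateLimiter class are hypothetical, and state lives in a per-process Map instead of the Redis instance the storage table lists for rate limiting.

```typescript
// Hypothetical tiers and limits for illustration; Chive's actual values are not documented here.
type Tier = "anonymous" | "authenticated" | "trusted";

interface TierPolicy {
  capacity: number; // maximum burst, in requests
  refillPerSecond: number; // sustained rate, in requests per second
}

const POLICIES: Record<Tier, TierPolicy> = {
  anonymous: { capacity: 10, refillPerSecond: 1 },
  authenticated: { capacity: 60, refillPerSecond: 5 },
  trusted: { capacity: 300, refillPerSecond: 25 },
};

interface Bucket {
  tokens: number;
  updatedAt: number; // epoch milliseconds of the last refill
}

// Token-bucket limiter keyed by tier and client identity. Each process keeps
// its own Map, so limits are per pod; sharing them needs an external store.
class TieredRateLimiter {
  private readonly buckets = new Map<string, Bucket>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  // Returns true if the request may proceed, consuming one token.
  tryConsume(clientKey: string, tier: Tier): boolean {
    const policy = POLICIES[tier];
    const key = `${tier}:${clientKey}`;
    const t = this.now();
    const bucket = this.buckets.get(key) ?? { tokens: policy.capacity, updatedAt: t };

    // Refill in proportion to elapsed time, never above capacity.
    const elapsedSeconds = (t - bucket.updatedAt) / 1000;
    bucket.tokens = Math.min(policy.capacity, bucket.tokens + elapsedSeconds * policy.refillPerSecond);
    bucket.updatedAt = t;
    this.buckets.set(key, bucket);

    if (bucket.tokens < 1) return false;
    bucket.tokens -= 1;
    return true;
  }
}
```

Injecting the clock through the constructor keeps the refill arithmetic testable: under the anonymous policy the eleventh immediate request is refused, and one more is admitted after a simulated second.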

Firehose consumer

Subscribes to the AT Protocol relay to receive real-time events:

Relay WebSocket → Event Parser → Collection Filter → Event Handler → Storage

The consumer filters for pub.chive.* records and processes:

  • Preprint submissions
  • Reviews and comments
  • Endorsements
  • Governance proposals and votes
  • User tags
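The Collection Filter stage can be sketched as a router keyed by collection NSID. Each firehose commit lists repo operations whose path has the form collection/rkey, so filtering reduces to a prefix check on the first path segment. The RepoOp shape below is a simplified, already-decoded view (frame and CAR decoding are omitted), and CollectionRouter is a hypothetical name.

```typescript
// Simplified view of one repo operation after a firehose commit frame and its
// CAR blocks have been decoded; decoding itself is omitted from this sketch.
interface RepoOp {
  repo: string; // DID of the repository that changed
  action: "create" | "update" | "delete";
  path: string; // "<collection NSID>/<record key>"
  record?: unknown; // decoded record body; absent for deletes
}

type OpHandler = (op: RepoOp, rkey: string) => void;

const CHIVE_PREFIX = "pub.chive.";

// Drops every op outside pub.chive.* and routes the rest by collection NSID.
class CollectionRouter {
  private readonly handlers = new Map<string, OpHandler>();
  private skipped = 0;

  register(collection: string, handler: OpHandler): void {
    if (!collection.startsWith(CHIVE_PREFIX)) {
      throw new Error(`not a Chive collection: ${collection}`);
    }
    this.handlers.set(collection, handler);
  }

  // Returns true if a handler ran for this op.
  dispatch(op: RepoOp): boolean {
    const slash = op.path.indexOf("/");
    if (slash <= 0) return false; // malformed path
    const collection = op.path.slice(0, slash);
    if (!collection.startsWith(CHIVE_PREFIX)) {
      this.skipped += 1; // e.g. app.bsky.* posts and likes
      return false;
    }
    const handler = this.handlers.get(collection);
    if (handler === undefined) return false; // Chive collection with no handler yet
    handler(op, op.path.slice(slash + 1));
    return true;
  }

  get skippedCount(): number {
    return this.skipped;
  }
}
```

With a handler registered for a hypothetical collection such as pub.chive.preprint.submission, an app.bsky.feed.post op is counted as skipped and never reaches a handler.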

Service layer

Business logic is encapsulated in services:

| Service | Responsibility |
|---|---|
| PreprintService | Preprint CRUD, version management |
| SearchService | Full-text search, faceted queries |
| DiscoveryService | Recommendations, similar papers |
| KnowledgeGraphService | Field taxonomy, authority control |
| ReviewService | Review threading, endorsements |
| GovernanceService | Proposals, voting |
| MetricsService | View counts, trending |
| ClaimingService | Authorship verification |

Storage layer

Four specialized storage systems:

| System | Use case |
|---|---|
| PostgreSQL | Structured metadata, relationships, transactions |
| Elasticsearch | Full-text search, faceted filtering |
| Neo4j | Knowledge graph, citation networks |
| Redis | Session cache, rate limiting, real-time data |

Data flow

Write path (indexing)

User PDS → Relay → Firehose Consumer → Event Handler
                                             │
                      ┌──────────────────────┼──────────────────────┐
                      ▼                      ▼                      ▼
                 PostgreSQL            Elasticsearch              Neo4j
                 (metadata)              (search)                (graph)

  1. User creates a record in their PDS
  2. PDS syncs to the relay network
  3. Firehose consumer receives the event
  4. Event handler validates and routes to appropriate service
  5. Service writes to relevant storage systems
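Steps 4 and 5 amount to validating an untrusted record and fanning it out to each store. A minimal sketch, assuming a hypothetical, trimmed-down PreprintRecord shape and one narrow interface per storage system; MetadataStore, SearchIndex, GraphStore and PreprintIndexer are illustrative names, not Chive's actual adapters.

```typescript
// Hypothetical, trimmed-down preprint record; the real pub.chive.* lexicon has more fields.
interface PreprintRecord {
  title: string;
  abstract: string;
  fields: string[]; // knowledge-graph field identifiers
}

// One narrow port per storage system; concrete adapters would wrap
// PostgreSQL, Elasticsearch and Neo4j clients.
interface MetadataStore { upsert(uri: string, record: PreprintRecord): void; }
interface SearchIndex { index(uri: string, text: string): void; }
interface GraphStore { linkFields(uri: string, fieldIds: string[]): void; }

// Step 4: structural validation of a record received from the firehose.
function isPreprintRecord(value: unknown): value is PreprintRecord {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" && v.title.length > 0 &&
    typeof v.abstract === "string" &&
    Array.isArray(v.fields) && v.fields.every((f) => typeof f === "string")
  );
}

// Step 5: fan the validated record out to each index.
class PreprintIndexer {
  private readonly metadata: MetadataStore;
  private readonly search: SearchIndex;
  private readonly graph: GraphStore;

  constructor(metadata: MetadataStore, search: SearchIndex, graph: GraphStore) {
    this.metadata = metadata;
    this.search = search;
    this.graph = graph;
  }

  // Returns false, and writes nothing, when the record fails validation.
  handleCreate(uri: string, record: unknown): boolean {
    if (!isPreprintRecord(record)) return false;
    this.metadata.upsert(uri, record);
    this.search.index(uri, `${record.title}\n${record.abstract}`);
    this.graph.linkFields(uri, record.fields);
    return true;
  }
}
```

The three writes are not wrapped in a cross-store transaction. Because every index can be rebuilt from the firehose, a partially applied event can be repaired by replaying it rather than by distributed rollback.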

Read path (queries)

Client Request → API Handler → Service → Storage Adapter → Database
                                                               │
                                                               ▼
                                                           Response

  1. Client sends request to API
  2. Handler authenticates and validates
  3. Service orchestrates data retrieval
  4. Storage adapter queries appropriate database
  5. Response formatted and returned
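The read-path layers can be sketched as a handler that validates input, a service that orchestrates a cache-aside lookup, and a storage adapter behind an interface. Authentication is left out of this sketch, and PreprintView, PreprintAdapter, KeyValueCache, PreprintReadService and handleGetPreprint are hypothetical names; presumably the cache port would be backed by Redis and the adapter by PostgreSQL.

```typescript
// Hypothetical view model returned to clients.
interface PreprintView {
  uri: string;
  title: string;
}

// Storage adapter: the only layer that talks to a concrete database.
interface PreprintAdapter {
  findByUri(uri: string): Promise<PreprintView | null>;
}

// Cache port; a deployment would back this with a shared cache.
interface KeyValueCache {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// Service: orchestrates retrieval with a cache-aside read.
class PreprintReadService {
  private readonly adapter: PreprintAdapter;
  private readonly cache: KeyValueCache;

  constructor(adapter: PreprintAdapter, cache: KeyValueCache) {
    this.adapter = adapter;
    this.cache = cache;
  }

  async getPreprint(uri: string): Promise<PreprintView | null> {
    const key = `preprint:${uri}`;
    const cached = await this.cache.get(key);
    if (cached !== null) return JSON.parse(cached) as PreprintView;
    const view = await this.adapter.findByUri(uri); // cache miss: go to primary storage
    if (view !== null) await this.cache.set(key, JSON.stringify(view));
    return view;
  }
}

// Handler: validate input, call the service, map the result to a status code.
async function handleGetPreprint(
  service: PreprintReadService,
  params: { uri?: string },
): Promise<{ status: number; body: unknown }> {
  if (params.uri === undefined || !params.uri.startsWith("at://")) {
    return { status: 400, body: { error: "InvalidRequest" } };
  }
  const view = await service.getPreprint(params.uri);
  return view !== null ? { status: 200, body: view } : { status: 404, body: { error: "NotFound" } };
}
```

A second request for the same URI is answered from the cache without touching the adapter; only the service knows a cache exists, so handlers and adapters stay unaware of it.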

Key principles

AT Protocol compliance

Chive is a read-only indexer. It never:

  • Writes to user PDSes
  • Stores blob data (only BlobRefs)
  • Acts as source of truth for user content

All indexes can be rebuilt from the firehose.
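Rebuilding works because indexing is a deterministic projection of the event stream. A minimal sketch, assuming the event source can replay from a requested sequence number: writes are keyed by AT URI, creates and updates are both upserts, and a forward-only cursor turns redelivered events into no-ops. The same cursor is what a reconnecting consumer would send to resume. IndexEvent, TitleIndex and rebuild are hypothetical names, and the payload is cut down to a single title field.

```typescript
// Minimal firehose-like event: a monotonically increasing sequence number
// plus a create, update or delete of the record at an AT URI.
interface IndexEvent {
  seq: number;
  action: "create" | "update" | "delete";
  uri: string;
  title?: string; // record payload, trimmed to one field for the sketch
}

// A projection whose writes are keyed by URI and whose cursor only moves
// forward, so replaying events it has already applied changes nothing.
class TitleIndex {
  private readonly rows = new Map<string, string>();
  private cursor = 0; // last applied seq; in practice persisted with the data

  apply(event: IndexEvent): void {
    if (event.seq <= this.cursor) return; // duplicate delivery after a resume
    if (event.action === "delete") {
      this.rows.delete(event.uri);
    } else if (event.title !== undefined) {
      this.rows.set(event.uri, event.title); // create and update are both upserts
    }
    this.cursor = event.seq;
  }

  get lastSeq(): number {
    return this.cursor;
  }

  snapshot(): Array<[string, string]> {
    return Array.from(this.rows.entries()).sort((a, b) => a[0].localeCompare(b[0]));
  }
}

// Full rebuild: start from an empty index and replay every event.
function rebuild(events: readonly IndexEvent[]): TitleIndex {
  const index = new TitleIndex();
  for (const event of events) index.apply(event);
  return index;
}
```

Applying events 1 and 2, then receiving events 1 through 4 again after a simulated reconnect, leaves the index identical to a single clean replay.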

Horizontal scalability

┌──────────────────────────────────────────────────────────────┐
│                      Kubernetes Cluster                      │
│                                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────┐  │
│  │ API Pod  │  │ API Pod  │  │ API Pod  │  │ Firehose Pod │  │
│  │   (1)    │  │   (2)    │  │   (3)    │  │              │  │
│  └──────────┘  └──────────┘  └──────────┘  └──────────────┘  │
│                                                              │
│  ┌────────────────────────────────────────────────────────┐  │
│  │                    Managed Services                    │  │
│  │    PostgreSQL (RDS) │ Elasticsearch │ Neo4j │ Redis    │  │
│  └────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────┘

  • API servers scale horizontally
  • Single firehose consumer (with failover)
  • Managed database services with replication
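A single active consumer with failover is commonly built on a time-limited lease: the pod holding the lease consumes the firehose and keeps renewing it, and a standby takes over once renewals stop. The sketch below assumes that approach; LeaseStore is an in-memory stand-in for whatever shared store with atomic check-and-set a real deployment would use, and all names are hypothetical.

```typescript
// Hypothetical lease record kept in a store every firehose pod can reach
// (an in-memory object stands in for that shared store here).
interface Lease {
  holder: string;
  expiresAt: number; // epoch milliseconds
}

class LeaseStore {
  private lease: Lease | null = null;

  // Acquire a free or expired lease, or renew one we already hold.
  // A real shared store must make this check-and-set atomic.
  tryAcquire(holder: string, now: number, ttlMs: number): boolean {
    const current = this.lease;
    if (current === null || current.expiresAt <= now || current.holder === holder) {
      this.lease = { holder, expiresAt: now + ttlMs };
      return true;
    }
    return false;
  }
}

// Each firehose pod calls tick() on a timer shorter than the TTL; only the
// current holder consumes the firehose, and a crashed holder's lease lapses.
class FirehoseCandidate {
  active = false;
  private readonly id: string;
  private readonly store: LeaseStore;
  private readonly ttlMs: number;

  constructor(id: string, store: LeaseStore, ttlMs: number) {
    this.id = id;
    this.store = store;
    this.ttlMs = ttlMs;
  }

  tick(now: number): boolean {
    this.active = this.store.tryAcquire(this.id, now, this.ttlMs);
    return this.active;
  }
}
```

A lease alone cannot stop a paused former holder from writing briefly after expiry. Resuming from the persisted cursor with idempotent writes keeps that overlap harmless.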

Fault tolerance

| Failure | Recovery |
|---|---|
| API server crash | Load balancer routes to healthy nodes |
| Firehose disconnect | Automatic reconnect with cursor resume |
| Database unavailable | Circuit breaker, graceful degradation |
| Cache miss | Fallback to primary storage |
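The circuit-breaker row can be sketched as the classic three-state breaker: after a run of failures it opens and serves a fallback (for example stale cached data) without calling the database, then lets a trial call through once a cooldown has passed. The thresholds, the fallback, and the CircuitBreaker class are illustrative assumptions, not Chive's configuration.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  get currentState(): BreakerState {
    return this.state;
  }

  // Runs `action`; while open, or when `action` fails, returns `fallback()` instead.
  async call<T>(action: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === "open") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) return fallback();
      this.state = "half-open"; // cooldown over: allow a trial call
    }
    try {
      const result = await action();
      this.state = "closed";
      this.failures = 0;
      return result;
    } catch {
      this.failures += 1;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open";
        this.openedAt = this.now();
      }
      return fallback(); // graceful degradation instead of an error
    }
  }
}
```

The half-open state here admits every call that arrives during the trial; a production breaker would usually let only one through.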

Security architecture

Zero trust model

Client → TLS → Load Balancer → mTLS → API Server → mTLS → Services
                                          │
                                          ▼
                                   Authentication
                                    Authorization
                                    Audit Logging

  • Every request authenticated
  • Mutual TLS between internal services
  • All actions audit logged
  • Secrets managed via HashiCorp Vault
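Per-request authentication, authorization and audit logging compose naturally as middleware. The sketch uses a hypothetical framework-free middleware type rather than Hono's API, and a stand-in verify function in place of real JWT verification; compose, authenticate, authorize and audit are illustrative names. Placing audit first means rejected requests are logged as well.

```typescript
interface ApiRequest {
  method: string;
  path: string;
  headers: Record<string, string | undefined>;
}
interface ApiResponse {
  status: number;
  body: string;
}
interface RequestContext {
  req: ApiRequest;
  did?: string; // set once the caller is authenticated
}
type Middleware = (ctx: RequestContext, next: () => ApiResponse) => ApiResponse;

// Runs middleware in order, ending with the route handler.
function compose(middleware: Middleware[], handler: (ctx: RequestContext) => ApiResponse) {
  return (ctx: RequestContext): ApiResponse => {
    const run = (i: number): ApiResponse =>
      i < middleware.length ? middleware[i](ctx, () => run(i + 1)) : handler(ctx);
    return run(0);
  };
}

// `verify` is a stand-in for JWT verification: token in, DID or null out.
function authenticate(verify: (token: string) => string | null): Middleware {
  return (ctx, next) => {
    const header = ctx.req.headers["authorization"];
    const did = header !== undefined && header.startsWith("Bearer ") ? verify(header.slice(7)) : null;
    if (did === null) return { status: 401, body: "AuthenticationRequired" };
    ctx.did = did;
    return next();
  };
}

function authorize(canAccess: (did: string, req: ApiRequest) => boolean): Middleware {
  return (ctx, next) =>
    ctx.did !== undefined && canAccess(ctx.did, ctx.req) ? next() : { status: 403, body: "Forbidden" };
}

// Wraps the rest of the chain so every outcome, including rejections, is recorded.
function audit(log: (entry: string) => void): Middleware {
  return (ctx, next) => {
    const res = next();
    log(JSON.stringify({ did: ctx.did ?? null, method: ctx.req.method, path: ctx.req.path, status: res.status }));
    return res;
  };
}
```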

Authentication flow

User → OAuth 2.0 + PKCE → User's PDS → Callback → JWT Session
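PKCE binds the authorization code delivered to the callback to the client that started the flow: the client sends a hash of a random verifier when redirecting to the user's PDS, then proves it holds the verifier when exchanging the code. Below is a minimal sketch of the S256 method from RFC 7636 using Node's built-in crypto module; the function names are hypothetical, and the AT Protocol OAuth profile adds further requirements that this sketch does not cover.

```typescript
import { createHash, randomBytes } from "node:crypto";

// RFC 7636 verifier: 43-128 characters from the unreserved set. 32 random
// bytes encoded as unpadded base64url give exactly 43 characters.
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256: code_challenge = BASE64URL(SHA256(ASCII(code_verifier))), no padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}
```

The client keeps the verifier private, sends only the challenge with the authorization request, and includes the verifier in the token request, where the server recomputes and compares the hash.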

Plugin architecture

┌─────────────────────────────────────────────────────────────────┐
│                           Plugin Host                           │
│                                                                 │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────────┐  │
│  │   Plugin    │  │   Plugin    │  │         Plugin          │  │
│  │   (ORCID)   │  │    (DOI)    │  │        (Zenodo)         │  │
│  │             │  │             │  │                         │  │
│  │ isolated-vm │  │ isolated-vm │  │       isolated-vm       │  │
│  └──────┬──────┘  └──────┬──────┘  └────────────┬────────────┘  │
│         │                │                      │               │
│         └────────────────┼──────────────────────┘               │
│                          │                                      │
│                          ▼                                      │
│              Event Bus (EventEmitter2)                          │
│                          │                                      │
│                          ▼                                      │
│                    Core Services                                │
└─────────────────────────────────────────────────────────────────┘

Plugins run in isolated-vm sandboxes with:

  • Declared permissions
  • Resource limits (CPU, memory)
  • Controlled API access
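Declared permissions and controlled API access can be illustrated by the capability object a host might inject into each sandbox. This sketch covers only the permission check: it does not use isolated-vm, which is what would enforce the CPU and memory limits, and the permission names, manifest shape, and createPluginApi function are hypothetical.

```typescript
// Hypothetical permission names and manifest shape, for illustration only.
type PluginPermission = "events:subscribe" | "network:fetch";

interface PluginManifest {
  id: string;
  permissions: PluginPermission[];
  allowedHosts: string[]; // consulted only when network:fetch is granted
}

type Listener = (payload: unknown) => void;

// Builds the capability object the host would expose inside a sandbox.
// Every method re-checks the manifest, so undeclared APIs fail closed.
function createPluginApi(manifest: PluginManifest, subscribe: (event: string, fn: Listener) => void) {
  const granted = new Set<PluginPermission>(manifest.permissions);
  const demand = (permission: PluginPermission): void => {
    if (!granted.has(permission)) {
      throw new Error(`plugin ${manifest.id} did not declare ${permission}`);
    }
  };

  return {
    on(event: string, fn: Listener): void {
      demand("events:subscribe");
      subscribe(event, fn);
    },
    // Returns the target host if an outbound HTTPS request is allowed, else throws.
    authorizeFetch(url: string): string {
      demand("network:fetch");
      const match = /^https:\/\/([^/:?#]+)/i.exec(url);
      const host = match !== null ? match[1].toLowerCase() : "";
      if (!manifest.allowedHosts.includes(host)) {
        throw new Error(`plugin ${manifest.id} may not fetch ${url}`);
      }
      return host;
    },
  };
}
```

An ORCID plugin declaring network:fetch with pub.orcid.org in allowedHosts could reach ORCID's public API, while the same call from a plugin that never declared the permission throws before any request is made.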

Observability

┌─────────────────────────────────────────────────────────────────┐
│                           Application                           │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│  │     Traces      │  │     Metrics     │  │      Logs       │  │
│  │ (OpenTelemetry) │  │  (Prometheus)   │  │     (Pino)      │  │
│  └────────┬────────┘  └────────┬────────┘  └────────┬────────┘  │
└───────────┼────────────────────┼────────────────────┼───────────┘
            │                    │                    │
            ▼                    ▼                    ▼
     ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
     │   Jaeger    │      │   Grafana   │      │    Loki     │
     │  (traces)   │      │  (metrics)  │      │   (logs)    │
     └─────────────┘      └─────────────┘      └─────────────┘
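What the application exposes for Prometheus to scrape is plain text in the Prometheus exposition format. A hand-rolled labelled counter shows the shape; a real service would normally use a client library such as prom-client, and the metric name chive_http_requests_total and the LabelledCounter class are hypothetical.

```typescript
// Prometheus requires backslashes, double quotes and newlines in label values to be escaped.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// A labelled counter rendered in the Prometheus text exposition format.
class LabelledCounter {
  private readonly values = new Map<string, number>();
  private readonly name: string;
  private readonly help: string;
  private readonly labelNames: string[];

  constructor(name: string, help: string, labelNames: string[]) {
    this.name = name;
    this.help = help;
    this.labelNames = labelNames;
  }

  inc(labels: Record<string, string>, by = 1): void {
    // Serialize labels in declared order so equal label sets share one series.
    const key = this.labelNames
      .map((label) => `${label}="${escapeLabelValue(labels[label] ?? "")}"`)
      .join(",");
    this.values.set(key, (this.values.get(key) ?? 0) + by);
  }

  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.values.forEach((value, key) => {
      lines.push(`${this.name}{${key}} ${value}`);
    });
    return lines.join("\n") + "\n";
  }
}
```

Because labels are serialized in declared order, inc({ status, route }) and inc({ route, status }) update the same series.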

Technology choices

| Component | Technology | Rationale |
|---|---|---|
| Runtime | Node.js 22 | V8 performance, AT Protocol SDK |
| Language | TypeScript | Type safety, AT Protocol tooling |
| API framework | Hono | Fast, lightweight, Edge-compatible |
| Frontend | Next.js 15 | React 19, App Router, SSR |
| Primary DB | PostgreSQL | ACID, JSON support, extensions |
| Search | Elasticsearch | Full-text, faceted, scalable |
| Graph DB | Neo4j | Citation networks, traversals |
| Cache | Redis | Sessions, rate limiting, pub/sub |
| Container | Docker | Consistent environments |
| Orchestration | Kubernetes | Scaling, self-healing |
| Observability | OpenTelemetry | Vendor-neutral telemetry |

Next steps