Scaling
This guide covers horizontal and vertical scaling strategies for Chive.
Scaling principles
Stateless services
API, web, and worker services are stateless and can scale horizontally:
| Service | Scalable | Notes |
|---|---|---|
| API | Yes | Load-balanced, any replica handles any request |
| Web | Yes | Static assets via CDN, SSR scales horizontally |
| Worker | Yes | Jobs distributed via BullMQ (see the sketch below) |
| Indexer | Limited | Single consumer for event ordering (see below) |
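Because any worker replica can take any job, scaling out the worker tier is just a matter of attaching more consumers to the same queue. A minimal sketch of what one worker replica looks like with BullMQ; the queue name, processor body, and connection settings are illustrative, not the actual Chive worker code:

```typescript
// Illustrative worker replica: every replica attaches to the same queue and
// BullMQ hands each job to exactly one of them.
import { Worker } from 'bullmq';

const worker = new Worker(
  'indexing',
  async (job) => {
    // Whichever replica picks the job up first processes it
    console.log(`processing ${job.name} (${job.id})`);
  },
  {
    connection: {
      host: process.env.REDIS_HOST,
      port: Number(process.env.REDIS_PORT ?? 6379),
    },
    concurrency: 5, // jobs processed in parallel within one replica
  },
);

worker.on('failed', (job, err) => {
  console.error(`job ${job?.id} failed:`, err.message);
});
```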
Stateful services
Databases require different scaling strategies:
| Database | Scaling approach |
|---|---|
| PostgreSQL | Read replicas (sketch below), connection pooling |
| Elasticsearch | Add nodes, increase shards |
| Neo4j | Single instance or causal cluster |
| Redis | Sentinel for HA, Cluster for scale |
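For PostgreSQL specifically, reads and writes are usually split: writes stay on the primary, while read-only queries can be served by a replica. A sketch with the `pg` driver; the `POSTGRES_REPLICA_HOST` variable and the table and column names are assumptions for illustration, not part of the documented configuration:

```typescript
// Hypothetical read/write split: writes always target the primary,
// reads may target a (slightly lagging) replica.
import { Pool } from 'pg';

const primary = new Pool({ host: process.env.POSTGRES_HOST });
const replica = new Pool({
  host: process.env.POSTGRES_REPLICA_HOST ?? process.env.POSTGRES_HOST,
});

export async function listRecentPreprints(limit = 50) {
  // Read-only query: fine to serve from the replica
  const { rows } = await replica.query(
    'SELECT uri, title FROM preprints ORDER BY indexed_at DESC LIMIT $1',
    [limit],
  );
  return rows;
}

export async function recordView(uri: string): Promise<void> {
  // Write: must go to the primary
  await primary.query('UPDATE preprints SET views = views + 1 WHERE uri = $1', [uri]);
}
```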
Horizontal scaling
API service
```yaml
# Kubernetes HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chive-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chive-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Worker service
Scale workers based on queue depth:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: chive-worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: chive-worker
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: bullmq_queue_depth
          selector:
            matchLabels:
              queue: indexing
        target:
          type: AverageValue
          averageValue: 100
```
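Kubernetes does not know about `bullmq_queue_depth` on its own: the queue depth has to be exported (for example as a Prometheus metric) and surfaced to the HPA through an external-metrics adapter such as prometheus-adapter. A minimal sketch of the exporter side, assuming `prom-client` and BullMQ; the file name and port are arbitrary:

```typescript
// Hypothetical queue-depth exporter for the worker HPA metric.
import http from 'node:http';
import { Queue } from 'bullmq';
import { Gauge, register } from 'prom-client';

const indexingQueue = new Queue('indexing', {
  connection: {
    host: process.env.REDIS_HOST,
    port: Number(process.env.REDIS_PORT ?? 6379),
  },
});

// Sampled on every Prometheus scrape via the collect() hook
new Gauge({
  name: 'bullmq_queue_depth',
  help: 'Jobs waiting in a BullMQ queue',
  labelNames: ['queue'],
  async collect() {
    this.set({ queue: 'indexing' }, await indexingQueue.getWaitingCount());
  },
});

// Expose /metrics for Prometheus to scrape
http
  .createServer(async (_req, res) => {
    res.setHeader('Content-Type', register.contentType);
    res.end(await register.metrics());
  })
  .listen(9464);
```

With an `AverageValue` target, the HPA divides the reported queue depth by the number of worker replicas and scales until that average drops to roughly 100 waiting jobs per replica.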
Indexer scaling
The indexer must remain a single instance to preserve event ordering. Scale by:
- Faster processing: Optimize event handlers
- Parallel non-ordered work: Offload to workers via queue (see the sketch after this list)
- Multiple firehose subscriptions: Partition by collection (advanced)
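A sketch of the offloading pattern: the indexer applies each firehose event in order, then enqueues enrichment work (which has no ordering requirement) for the workers. The queue name, event shape, and `applyToDatabase` helper are illustrative:

```typescript
// Illustrative only: the indexer stays a single ordered consumer and pushes
// non-ordered enrichment work onto the worker queue.
import { Queue } from 'bullmq';

const enrichmentQueue = new Queue('indexing', {
  connection: {
    host: process.env.REDIS_HOST,
    port: Number(process.env.REDIS_PORT ?? 6379),
  },
});

interface FirehoseEvent {
  uri: string;
  collection: string;
}

async function applyToDatabase(event: FirehoseEvent): Promise<void> {
  // Ordered write to PostgreSQL (details omitted)
}

export async function handleEvent(event: FirehoseEvent): Promise<void> {
  // Ordered work happens inline, one event at a time
  await applyToDatabase(event);

  // Non-ordered work (metadata enrichment, search indexing) is offloaded;
  // the workers can run any number of replicas
  await enrichmentQueue.add('enrich-preprint', { uri: event.uri });
}
```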
Vertical scaling
Resource recommendations
| Service | CPU | Memory | Notes |
|---|---|---|---|
| API | 1-2 cores | 1-2 GB | Scale horizontally first |
| Indexer | 0.5-1 core | 512 MB | Memory for event buffering |
| Worker | 0.5-1 core | 512 MB | CPU-bound for enrichment |
| Web | 0.5-1 core | 512 MB | Memory for SSR cache |
Database sizing
PostgreSQL
| Metric | Small | Medium | Large |
|---|---|---|---|
| Preprints | 100K | 1M | 10M |
| Storage | 50 GB | 200 GB | 1 TB |
| RAM | 4 GB | 16 GB | 64 GB |
| CPUs | 2 | 8 | 32 |
Elasticsearch
| Metric | Small | Medium | Large |
|---|---|---|---|
| Documents | 100K | 1M | 10M |
| Storage | 20 GB | 100 GB | 500 GB |
| Nodes | 1 | 3 | 5+ |
| RAM per node | 4 GB | 16 GB | 32 GB |
Neo4j
| Metric | Small | Medium | Large |
|---|---|---|---|
| Nodes | 100K | 1M | 10M |
| Storage | 10 GB | 50 GB | 200 GB |
| RAM | 4 GB | 16 GB | 64 GB |
Connection pooling
PgBouncer
Use PgBouncer for PostgreSQL connection pooling:
```ini
# pgbouncer.ini
[databases]
chive = host=postgres port=5432 dbname=chive

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
; transaction pooling shares server connections per transaction, so session
; state (SET, LISTEN/NOTIFY) does not carry across transactions
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 50
min_pool_size = 10
```
Application-level pooling
```typescript
// src/storage/postgresql/connection.ts
import { Pool } from 'pg';

const pool = new Pool({
  host: process.env.POSTGRES_HOST,
  port: parseInt(process.env.POSTGRES_PORT || '5432', 10),
  database: process.env.POSTGRES_DB,
  user: process.env.POSTGRES_USER,
  password: process.env.POSTGRES_PASSWORD,
  min: 5,
  max: 20, // per replica: at 20 API replicas this allows up to 400 client connections at PgBouncer
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 10000,
});
```
Caching strategies
Multi-tier caching
```
Request → L1 (Memory) → L2 (Redis) → L3 (Database)
               ↓             ↓             ↓
              ~1ms          ~5ms         ~50ms
```
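Reads walk the tiers top down and populate the upper tiers on the way back. A sketch assuming L1 is an in-process map and L2 is Redis via `ioredis`; the helper names and TTLs are illustrative:

```typescript
// Illustrative tiered read path.
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');
const l1 = new Map<string, { value: unknown; expiresAt: number }>();

export async function getCached<T>(
  key: string,
  loadFromDb: () => Promise<T>,
  ttlSeconds = 600,
): Promise<T> {
  // L1: in-process memory (~1ms)
  const hit = l1.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value as T;

  // L2: Redis (~5ms)
  const cached = await redis.get(key);
  if (cached !== null) {
    const value = JSON.parse(cached) as T;
    l1.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
    return value;
  }

  // L3: database (~50ms), then populate the upper tiers on the way back
  const value = await loadFromDb();
  await redis.set(key, JSON.stringify(value), 'EX', ttlSeconds);
  l1.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  return value;
}
```

On invalidation (below), both tiers need clearing: the Redis entry once, and the in-process map in every replica.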
Cache warming
Pre-populate caches on startup:
```typescript
async function warmCache(): Promise<void> {
  // Warm trending preprints
  const trending = await db.getTrendingPreprints(100);
  await Promise.all(
    trending.map((p) => cache.set(`preprint:${p.uri}`, p, 600))
  );

  // Warm field taxonomy
  const fields = await neo4j.getAllFields();
  await cache.set('fields:all', fields, 3600);
}
```
Cache invalidation
Event-driven invalidation via Redis Pub/Sub:
```typescript
// On preprint update
await redis.publish('cache:invalidate', JSON.stringify({
  type: 'preprint',
  uri: preprintUri,
}));

// Subscriber (runs in every replica so each in-process cache is cleared)
subscriber.on('message', (channel, message) => {
  const { type, uri } = JSON.parse(message);
  cache.del(`${type}:${uri}`);
});
```
Load balancing
Kubernetes Ingress
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: chive
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
spec:
  ingressClassName: nginx
  rules:
    - host: api.chive.pub
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: chive-api
                port:
                  number: 3000
```
Session affinity
For WebSocket connections (notifications):
```yaml
annotations:
  nginx.ingress.kubernetes.io/affinity: "cookie"
  nginx.ingress.kubernetes.io/session-cookie-name: "chive-affinity"
  nginx.ingress.kubernetes.io/session-cookie-max-age: "3600"
```
CDN configuration
Static assets
Serve frontend assets via CDN:
```yaml
# Cloudflare page rules
- url: "chive.pub/_next/static/*"
  cache_level: "Cache Everything"
  edge_cache_ttl: 2592000  # 30 days
- url: "chive.pub/api/*"
  cache_level: "Bypass"
```
Blob caching
PDF and image blobs:
- url: "blobs.chive.pub/*"
cache_level: "Cache Everything"
edge_cache_ttl: 86400 # 24 hours
Performance tuning
Node.js options
```bash
# Increase memory for large datasets
NODE_OPTIONS="--max-old-space-size=4096"

# Process clustering is handled by Kubernetes replicas, so PM2 is not needed
# PM2_INSTANCES=4
```
Database tuning
PostgreSQL
```sql
-- Increase shared buffers
ALTER SYSTEM SET shared_buffers = '4GB';

-- Increase work memory for complex queries
ALTER SYSTEM SET work_mem = '256MB';

-- Increase max connections
ALTER SYSTEM SET max_connections = 200;

-- shared_buffers and max_connections take effect only after a server restart;
-- work_mem is applied on configuration reload.
```
Elasticsearch
```yaml
# elasticsearch.yml
indices.memory.index_buffer_size: 30%
thread_pool.write.queue_size: 1000
```
Capacity planning
Request rate estimates
| Scenario | RPS | API replicas |
|---|---|---|
| Low | 100 | 2 |
| Medium | 500 | 5 |
| High | 2000 | 15 |
| Peak | 5000 | 30 |
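These figures translate into replica counts once the sustained throughput of a single API pod is known from load testing (the table implies very roughly 100-170 RPS per replica at the higher tiers). A back-of-the-envelope helper with illustrative parameters:

```typescript
// Back-of-the-envelope sizing; calibrate rpsPerReplica against load tests.
function apiReplicasFor(peakRps: number, rpsPerReplica: number, minReplicas = 3): number {
  return Math.max(minReplicas, Math.ceil(peakRps / rpsPerReplica));
}

apiReplicasFor(500, 100);  // → 5, the "Medium" row
apiReplicasFor(5000, 170); // → 30, the "Peak" row
```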
Storage growth
| Metric | Growth rate |
|---|---|
| Preprints | ~1000/day |
| Reviews | ~500/day |
| Blob cache | ~10 GB/day |
| Logs | ~5 GB/day |
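Assuming these rates hold and ignoring retention policies, the daily figures translate into rough annual growth:

```typescript
// Straight-line projection of the daily rates above; retention not modelled.
const dailyGb = { blobCache: 10, logs: 5 };
const perYearGb = Object.fromEntries(
  Object.entries(dailyGb).map(([name, gb]) => [name, gb * 365]),
);
// → { blobCache: 3650, logs: 1825 }: roughly 3.6 TB of blobs and 1.8 TB of logs per year
```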
Related documentation
- Deployment: Production setup
- Monitoring: Observability
- Troubleshooting: Common issues