Observability and metrics reference

This page documents all Prometheus metrics registered in Chive's centralized prometheus-registry.ts. These metrics are exposed via the /metrics endpoint for Prometheus scraping and are also available as JSON through pub.chive.admin.getPrometheusMetrics.

All metrics use the chive_ prefix. Default Node.js metrics (CPU, memory, event loop, GC) are collected automatically with the same prefix.
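To inspect what a scrape returns, the response can be parsed into individual samples. The following is a minimal sketch, assuming the /metrics endpoint returns the standard Prometheus text exposition format. The parseSamples helper, the sample text, the endpoint label value, and the numbers in it are hypothetical and not part of Chive.

```typescript
// Hypothetical helper, not part of Chive: parses Prometheus text exposition
// lines into { name, labels, value } samples.
interface Sample {
  name: string;
  labels: Record<string, string>;
  value: number;
}

function parseSamples(text: string): Sample[] {
  const samples: Sample[] = [];
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    // Skip blank lines and "# HELP" / "# TYPE" comment lines.
    if (line === "" || line.startsWith("#")) continue;
    const match = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)/.exec(line);
    if (match === null) continue;
    const labels: Record<string, string> = {};
    // Simplification: assumes label values contain no escaped quotes.
    const labelRe = /([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"/g;
    let pair: RegExpExecArray | null;
    while ((pair = labelRe.exec(match[2] ?? "")) !== null) {
      labels[pair[1]] = pair[2];
    }
    // Simplification: special values such as "+Inf" are not handled here.
    samples.push({ name: match[1], labels, value: Number(match[3]) });
  }
  return samples;
}

// Made-up scrape output for illustration only.
const exampleScrape = [
  "# HELP chive_http_requests_total Total HTTP requests",
  "# TYPE chive_http_requests_total counter",
  'chive_http_requests_total{method="GET",endpoint="/example",status="200"} 42',
  "chive_firehose_active_connections 1",
].join("\n");

const exampleSamples = parseSamples(exampleScrape);
```

Each entry in exampleSamples carries the metric name, its label set, and the numeric value, which is enough to spot-check that a metric from the tables below is being exported.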

HTTP metrics

Instrumented in the HTTP middleware layer. These metrics follow the RED method (Rate, Errors, Duration).

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_http_requests_total | Counter | method, endpoint, status | Total HTTP requests |
| chive_http_request_duration_seconds | Histogram | method, endpoint, status | Request duration (buckets: 10ms to 10s) |

Source: src/observability/prometheus-registry.ts (exported as httpMetrics)

Instrumented in: HTTP middleware (wraps all Hono route handlers)
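The general shape of this kind of instrumentation can be sketched without any framework. The example below is an illustration under stated assumptions, not Chive's middleware: RequestMetrics, instrument, the handler type, and the bucket bounds are all hypothetical, and the in-memory maps stand in for a real Prometheus client library.

```typescript
// Hypothetical in-memory stand-ins for a counter and a histogram.
type Labels = { method: string; endpoint: string; status: string };

// Assumed bucket bounds in seconds; the source only states "10ms to 10s".
const BUCKETS = [0.01, 0.05, 0.1, 0.5, 1, 5, 10];

class RequestMetrics {
  readonly requestsTotal = new Map<string, number>();
  readonly durationBuckets = new Map<string, number[]>();

  private key(l: Labels): string {
    return `${l.method} ${l.endpoint} ${l.status}`;
  }

  observe(labels: Labels, seconds: number): void {
    const k = this.key(labels);
    // Rate and Errors: one counter increment per request, keyed by status.
    this.requestsTotal.set(k, (this.requestsTotal.get(k) ?? 0) + 1);
    // Duration: cumulative buckets, so every bucket whose upper bound
    // covers the value is incremented. The last slot is the +Inf bucket.
    const counts =
      this.durationBuckets.get(k) ?? new Array<number>(BUCKETS.length + 1).fill(0);
    BUCKETS.forEach((le, i) => {
      if (seconds <= le) counts[i] += 1;
    });
    counts[BUCKETS.length] += 1;
    this.durationBuckets.set(k, counts);
  }
}

type Handler = (req: { method: string; path: string }) => Promise<{ status: number }>;

// Wraps a handler so that every call records one observation.
// `endpoint` should be the route pattern, not the raw path, to keep
// label cardinality bounded.
function instrument(metrics: RequestMetrics, endpoint: string, handler: Handler): Handler {
  return async (req) => {
    const start = Date.now();
    let status = 500; // recorded if the handler throws
    try {
      const res = await handler(req);
      status = res.status;
      return res;
    } finally {
      const seconds = (Date.now() - start) / 1000;
      metrics.observe({ method: req.method, endpoint, status: String(status) }, seconds);
    }
  };
}
```

Recording in a finally block means failed requests are counted as well, which is what makes the error rate derivable from the status label.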

Eprint indexing metrics

Tracks eprint record indexing from the firehose.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_eprints_indexed_total | Counter | field, status | Total eprints indexed, by knowledge graph field and outcome |
| chive_eprint_indexing_duration_seconds | Histogram | status | Indexing duration per eprint (buckets: 10ms to 10s) |

Source: src/observability/prometheus-registry.ts (exported as eprintMetrics)

Instrumented in: Firehose event handlers, eprint indexing service

Firehose metrics

Tracks the ATProto firehose consumer connection and event processing.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_firehose_events_total | Counter | event_type | Total events processed, by type (commit, identity, account, handle) |
| chive_firehose_cursor_lag_seconds | Gauge | (none) | How far the consumer lags behind the relay, in seconds |
| chive_firehose_active_connections | Gauge | (none) | Number of active WebSocket connections to the relay |
| chive_firehose_parse_errors_total | Counter | error_type | Events that failed to parse (json_parse, validation, unknown) |

Source: src/observability/prometheus-registry.ts (exported as firehoseMetrics)

Instrumented in: src/atproto/ firehose consumer

Database metrics

Tracks connection pool status and query performance for all databases.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_database_connections_active | Gauge | database | Active connections (postgresql, redis, elasticsearch, neo4j) |
| chive_database_query_duration_seconds | Histogram | database, operation | Query duration (buckets: 10ms to 10s) |

Source: src/observability/prometheus-registry.ts (exported as databaseMetrics)

Instrumented in: Database connection wrappers in src/storage/

PDS scanning metrics

Tracks PDS discovery, scanning, and record indexing during backfill and periodic scan operations.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_pds_scans_total | Counter | status | Total PDS scans (success, error, skipped) |
| chive_pds_scan_duration_seconds | Histogram | status | Scan duration per PDS (buckets: 100ms to 60s) |
| chive_pds_records_scanned_total | Counter | collection | Total records scanned from PDSes, by collection |
| chive_pds_records_indexed_total | Counter | collection, status | Records indexed from scans, by collection and outcome |
| chive_pds_record_index_duration_seconds | Histogram | collection, status | Per-record indexing duration (buckets: 10ms to 5s) |
| chive_pdses_discovered_total | Gauge | (none) | Total PDSes known to the system |
| chive_pdses_with_records_total | Gauge | (none) | PDSes that have pub.chive.* records |

Source: src/observability/prometheus-registry.ts (exported as pdsMetrics)

Instrumented in: src/services/pds/ scanner and registry

Citation extraction metrics

Tracks GROBID extraction, Crossref lookups, Semantic Scholar enrichment, and internal matching.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_citation_extractions_total | Counter | source, status | Extraction operations by source (grobid, semantic-scholar, crossref) and outcome |
| chive_citations_extracted_total | Counter | source | Individual citations extracted, by source |
| chive_citations_matched_total | Counter | match_method | Citations matched to Chive eprints (doi, title) |
| chive_citation_extraction_duration_seconds | Histogram | source, status | Extraction duration (buckets: 100ms to 60s) |

Source: src/observability/prometheus-registry.ts (exported as citationMetrics)

Instrumented in: src/services/citation/ extraction service

Background job metrics

Tracks execution of periodic background jobs (PDS scanning, governance sync, etc.).

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_job_executions_total | Counter | job, status | Total job executions by name and outcome |
| chive_job_duration_seconds | Histogram | job, status | Execution duration (buckets: 100ms to 5 minutes) |
| chive_job_last_run_timestamp | Gauge | job | Unix timestamp of last execution |
| chive_job_items_processed_total | Counter | job, status | Items processed by jobs |

Source: src/observability/prometheus-registry.ts (exported as jobMetrics)

Instrumented in: src/jobs/ job runners

Worker metrics

Tracks background worker task processing (thread pool workers, queue consumers).

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_worker_tasks_total | Counter | worker, status | Total tasks processed by worker name and outcome |
| chive_worker_task_duration_seconds | Histogram | worker | Task duration (buckets: 10ms to 30s) |
| chive_worker_queue_depth | Gauge | worker | Current pending items in worker queue |
| chive_worker_active_count | Gauge | worker | Currently active workers |

Source: src/observability/prometheus-registry.ts (exported as workerMetrics)

Instrumented in: src/workers/ worker implementations

Authentication metrics

Tracks authentication attempts, token validation, and role lookups.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_auth_attempts_total | Counter | method, result | Auth attempts by method (service_auth) and result (success, failure, anonymous) |
| chive_auth_duration_seconds | Histogram | method | Auth processing duration (buckets: 10ms to 5s) |
| chive_role_lookups_total | Counter | result | Role lookups by result (cache_hit, cache_miss) |

Source: src/observability/prometheus-registry.ts (exported as authMetrics)

Instrumented in: src/auth/ middleware

Search metrics

Tracks Elasticsearch search queries and results.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_search_queries_total | Counter | type | Total search queries by type |
| chive_search_results_total | Counter | type | Total results returned by type |
| chive_search_duration_seconds | Histogram | phase | Duration by phase (buckets: 10ms to 5s) |

Source: src/observability/prometheus-registry.ts (exported as searchMetrics)

Instrumented in: src/services/search/ search service

Blob proxy metrics

Tracks blob proxy requests (fetching PDFs and other blobs from user PDSes).

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_blob_proxy_requests_total | Counter | status, cache | Proxy requests by HTTP status and cache source (redis, cdn, pds) |
| chive_blob_proxy_bytes_total | Counter | direction | Bytes transferred (in/out) |
| chive_blob_proxy_duration_seconds | Histogram | (none) | Request duration (buckets: 10ms to 10s) |

Source: src/observability/prometheus-registry.ts (exported as blobProxyMetrics)

Instrumented in: src/api/ blob proxy handler

Dead letter queue (DLQ) metrics

Tracks the firehose dead-letter queue size and retry operations.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_dlq_entries_total | Gauge | (none) | Current number of entries in the DLQ |
| chive_dlq_retries_total | Counter | status | Retry operations by outcome (success, failure) |

Source: src/observability/prometheus-registry.ts (exported as dlqMetrics)

Instrumented in: src/api/handlers/xrpc/admin/ DLQ handlers (listDLQEntries, retryDLQEntry, retryAllDLQ, dismissDLQEntry, purgeOldDLQ)

Admin action metrics

Tracks administrative operations performed through the admin dashboard.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_admin_actions_total | Counter | action, target | Admin actions by type and target |

Common label combinations:

| action | target | When |
| --- | --- | --- |
| approve | alpha_application | Alpha application approved |
| reject | alpha_application | Alpha application rejected |
| revoke | alpha_application | Alpha application revoked |
| assign_role | user | Role assigned to user |
| revoke_role | user | Role revoked from user |
| delete | pub.chive.eprint.submission | Eprint soft-deleted |
| delete | pub.chive.review.comment | Review soft-deleted |
| rescan | pds | PDS rescan triggered |

Source: src/observability/prometheus-registry.ts (exported as adminMetrics)

Instrumented in: src/api/handlers/xrpc/admin/ mutation handlers

Backfill operation metrics

Tracks backfill operations triggered through the admin dashboard.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| chive_backfill_operations_total | Counter | type, status | Operations by type and outcome (started, completed, failed, cancelled) |
| chive_backfill_records_processed | Counter | type | Total records processed across all backfills |
| chive_backfill_duration_seconds | Histogram | type | Operation duration (buckets: 1s, 5s, 10s, 30s, 60s, 5m, 10m, 30m, 1h) |

Source: src/observability/prometheus-registry.ts (exported as backfillMetrics)

Instrumented in: src/services/admin/backfill-manager.ts

Grafana dashboard recommendations

Admin operations dashboard

Create a Grafana dashboard with the following panels:

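| Panel | Query | Visualization |
| --- | --- | --- |
| Admin actions per hour | `sum by (action) (increase(chive_admin_actions_total[1h]))` | Time series, split by action |
| Active backfill operations | `sum(chive_backfill_operations_total{status="started"}) - sum(chive_backfill_operations_total{status=~"completed\|failed\|cancelled"})` | Stat |
| Backfill duration P95 | `histogram_quantile(0.95, sum by (le, type) (rate(chive_backfill_duration_seconds_bucket[1h])))` | Time series, split by type |
| DLQ depth | `chive_dlq_entries_total` | Gauge |
| DLQ retry rate | `rate(chive_dlq_retries_total[5m])` | Time series |

The "per hour" panel uses increase() because rate() always returns a per-second value, whatever the range. The active backfill query wraps both sides in sum() because the two sides carry different status labels and would otherwise not match each other in the subtraction.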
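The P95 and P99 panels rely on histogram_quantile, which estimates a quantile by linear interpolation inside the bucket that contains the target rank. The sketch below is a simplified illustration of that estimation in TypeScript, not Prometheus's own code; histogramQuantile, the Bucket type, and the example counts are hypothetical. Prometheus applies the same idea to per-second bucket rates rather than raw counts.

```typescript
// Cumulative histogram bucket: number of observations <= le.
interface Bucket {
  le: number;
  count: number;
}

// Simplified quantile estimate over cumulative buckets.
// Assumes 0 < q <= 1 and that the highest bucket is le = +Inf.
function histogramQuantile(q: number, buckets: Bucket[]): number {
  if (!(q > 0 && q <= 1)) return NaN; // simplification for out-of-range q
  const sorted = buckets.slice().sort((a, b) => a.le - b.le);
  if (sorted.length < 2) return NaN;
  const last = sorted[sorted.length - 1];
  if (last.le !== Infinity || last.count === 0) return NaN;

  // Rank of the requested quantile among all observations.
  const rank = q * last.count;
  const idx = sorted.findIndex((b) => b.count >= rank);

  // If the rank falls into the +Inf bucket, the best available answer is
  // the upper bound of the highest finite bucket.
  if (idx === sorted.length - 1) return sorted[idx - 1].le;

  const upper = sorted[idx].le;
  const lower = idx === 0 ? 0 : sorted[idx - 1].le;
  const below = idx === 0 ? 0 : sorted[idx - 1].count;
  const inBucket = sorted[idx].count - below;

  // Linear interpolation between the bucket's bounds.
  return lower + (upper - lower) * ((rank - below) / inBucket);
}

// Made-up counts over a subset of the backfill duration buckets (seconds).
const exampleBuckets: Bucket[] = [
  { le: 1, count: 0 },
  { le: 5, count: 2 },
  { le: 10, count: 6 },
  { le: 30, count: 9 },
  { le: 60, count: 10 },
  { le: Infinity, count: 10 },
];

const exampleP95 = histogramQuantile(0.95, exampleBuckets);
```

With these counts the target rank is 9.5 of 10 observations, which lands halfway through the 30s to 60s bucket, so the estimate is about 45 seconds. The result can never be more precise than the bucket boundaries allow, which is worth keeping in mind when reading the wide backfill buckets.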

PDS scanning dashboard

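| Panel | Query | Visualization |
| --- | --- | --- |
| PDS scans per hour | `sum by (status) (increase(chive_pds_scans_total[1h]))` | Time series, split by status |
| Records indexed per minute | `sum by (collection) (rate(chive_pds_records_indexed_total[1m])) * 60` | Time series, split by collection |
| PDSes discovered | `chive_pdses_discovered_total` | Stat |
| PDSes with records | `chive_pdses_with_records_total` | Stat |
| Scan duration P95 | `histogram_quantile(0.95, sum by (le) (rate(chive_pds_scan_duration_seconds_bucket[1h])))` | Time series |

Because rate() returns a per-second value, the "per hour" panel uses increase() over one hour and the "per minute" panel multiplies the rate by 60.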

Authentication dashboard

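| Panel | Query | Visualization |
| --- | --- | --- |
| Auth attempts per minute | `sum by (result) (rate(chive_auth_attempts_total[1m])) * 60` | Time series, split by result |
| Auth failure rate | `sum(rate(chive_auth_attempts_total{result="failure"}[5m])) / sum(rate(chive_auth_attempts_total[5m]))` | Gauge |
| Auth duration P99 | `histogram_quantile(0.99, sum by (le) (rate(chive_auth_duration_seconds_bucket[5m])))` | Time series |
| Role lookup cache hit rate | `sum(rate(chive_role_lookups_total{result="cache_hit"}[5m])) / sum(rate(chive_role_lookups_total[5m]))` | Gauge |

Both ratio queries aggregate with sum() before dividing. Without it, PromQL matches series by identical label sets, so the numerator would only be divided by the denominator series with the same result label and the ratio would always be 1.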
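The ratio panels divide the increase of one label value by the increase across all label values over the same window. The arithmetic can be sketched as follows, assuming two scrapes of chive_role_lookups_total keyed by the result label. The hitRatio and counterIncrease helpers and the counter values are hypothetical, and the reset handling is a simplification of what Prometheus does.

```typescript
// Counter values from one scrape, keyed by the `result` label value.
type CounterSnapshot = Record<string, number>;

// Increase of a single series between two scrapes. A drop in value is
// treated as a counter reset (for example after a process restart).
function counterIncrease(prev: number, curr: number): number {
  return curr >= prev ? curr - prev : curr;
}

// Share of lookups served from cache between two scrapes.
function hitRatio(prev: CounterSnapshot, curr: CounterSnapshot): number {
  let hits = 0;
  let total = 0;
  for (const result of Object.keys(curr)) {
    const delta = counterIncrease(prev[result] ?? 0, curr[result]);
    total += delta; // denominator: summed over every result value
    if (result === "cache_hit") hits += delta; // numerator: cache hits only
  }
  return total === 0 ? NaN : hits / total;
}

// Made-up counter values for illustration.
const earlier: CounterSnapshot = { cache_hit: 100, cache_miss: 20 };
const later: CounterSnapshot = { cache_hit: 190, cache_miss: 30 };
const exampleRatio = hitRatio(earlier, later);
```

Here 90 of the 100 lookups in the window were cache hits, giving a ratio of about 0.9. The same calculation with result="failure" on chive_auth_attempts_total gives the auth failure rate.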

Next steps