Skip to main content

Neo4j storage

Neo4j stores Chive's knowledge graph: field taxonomy, authority records, citations, and collaboration networks.

Graph schema

Node types

LabelDescriptionKey Properties
FieldHierarchical field taxonomyid, label, description
AuthorityRecordControlled vocabulary termsid, name, type, aliases
WikidataEntityWikidata Q-IDs for linkingqid, label, description
PreprintPreprint nodes for graph queriesuri, title
AuthorAuthor nodes for collaborationdid, name
FacetDimensionPMEST dimension templatesid, type, name

Relationship types

TypeFromToDescription
PARENT_OFFieldFieldHierarchy
RELATED_TOFieldFieldSemantic similarity
MAPPED_TOAuthorityRecordWikidataEntityExternal linking
TAGGED_WITHPreprintFieldField classification
AUTHOREDAuthorPreprintAuthorship
CITESPreprintPreprintCitations
COLLABORATES_WITHAuthorAuthorCo-authorship

Constraints and indexes

Uniqueness constraints

CREATE CONSTRAINT field_id_unique
FOR (f:Field) REQUIRE f.id IS UNIQUE;

CREATE CONSTRAINT authority_id_unique
FOR (a:AuthorityRecord) REQUIRE a.id IS UNIQUE;

CREATE CONSTRAINT wikidata_qid_unique
FOR (w:WikidataEntity) REQUIRE w.qid IS UNIQUE;

CREATE CONSTRAINT preprint_uri_unique
FOR (p:Preprint) REQUIRE p.uri IS UNIQUE;

CREATE CONSTRAINT author_did_unique
FOR (a:Author) REQUIRE a.did IS UNIQUE;

Performance indexes

-- Field search
CREATE INDEX field_label_idx FOR (f:Field) ON (f.label);
CREATE TEXT INDEX field_label_text FOR (f:Field) ON (f.label);

-- Authority search
CREATE INDEX authority_name_idx FOR (a:AuthorityRecord) ON (a.name);
CREATE INDEX authority_type_idx FOR (a:AuthorityRecord) ON (a.type);

-- Preprint lookup
CREATE INDEX preprint_created_idx FOR (p:Preprint) ON (p.createdAt);

Adapter usage

Connection

import { Neo4jAdapter } from '@/storage/neo4j/adapter.js';
import { createNeo4jConnection } from '@/storage/neo4j/connection.js';

const driver = await createNeo4jConnection({
uri: process.env.NEO4J_URI,
user: process.env.NEO4J_USER,
password: process.env.NEO4J_PASSWORD,
});

const adapter = new Neo4jAdapter(driver, logger);

Field operations

// Get field by ID
const field = await adapter.getField('cs.AI');

// Get children
const children = await adapter.getFieldChildren('cs');

// Get ancestors (path to root)
const ancestors = await adapter.getFieldAncestors('cs.AI.ML');

// Search fields
const matches = await adapter.searchFields('artificial intelligence', {
limit: 10,
includeAliases: true,
});

// Get related fields
const related = await adapter.getRelatedFields('cs.AI', {
limit: 5,
minScore: 0.5,
});

Authority records

import { AuthorityRepository } from '@/storage/neo4j/authority-repository.js';

const repo = new AuthorityRepository(driver, logger);

// Get by ID
const authority = await repo.findById('authority-123');

// Search
const results = await repo.search('machine learning', {
type: 'concept',
limit: 20,
});

// Get with external links
const withLinks = await repo.findWithExternalIds('authority-123');
console.log(withLinks.wikidataQid); // Q2539

Citation graph

import { CitationGraph } from '@/storage/neo4j/citation-graph.js';

const graph = new CitationGraph(driver, logger);

// Add citation
await graph.addCitation(citingUri, citedUri);

// Get citations for a preprint
const citations = await graph.getCitations(preprintUri, {
direction: 'outgoing', // or 'incoming'
limit: 50,
});

// Get citation count
const count = await graph.getCitationCount(preprintUri);

// Find co-citation clusters
const clusters = await graph.findCoCitationClusters(preprintUri, {
minSharedCitations: 3,
});

Graph algorithms

Neo4j Graph Data Science library powers advanced queries:

import { GraphAlgorithms } from '@/storage/neo4j/graph-algorithms.js';

const algorithms = new GraphAlgorithms(driver, logger);

// PageRank for field importance
const ranked = await algorithms.fieldPageRank({
dampingFactor: 0.85,
maxIterations: 20,
});

// Louvain community detection
const communities = await algorithms.detectCommunities('Author', 'COLLABORATES_WITH');

// Shortest path between fields
const path = await algorithms.shortestPath('cs.AI', 'physics.comp-ph');

// Node similarity
const similar = await algorithms.nodeSimilarity('Preprint', 'TAGGED_WITH', {
topK: 10,
});

Wikidata integration

import { WikidataConnector } from '@/storage/neo4j/wikidata-connector.js';

const wikidata = new WikidataConnector(driver, sparqlClient, logger);

// Reconcile authority record with Wikidata
const match = await wikidata.reconcile('machine learning', {
type: 'concept',
threshold: 0.8,
});

if (match) {
console.log(`Matched to ${match.qid}: ${match.label}`);
}

// Sync Wikidata properties
await wikidata.syncProperties('Q2539', ['description', 'aliases', 'sitelinks']);

// Get Wikidata hierarchy
const hierarchy = await wikidata.getHierarchy('Q2539', {
relationshipType: 'P279', // subclass of
depth: 3,
});

SPARQL queries

import { SparqlClient } from '@/storage/neo4j/sparql-client.js';

const sparql = new SparqlClient('https://query.wikidata.org/sparql');

// Find related Wikidata entities
const results = await sparql.query(`
SELECT ?item ?itemLabel WHERE {
?item wdt:P31 wd:Q11862829 . # instance of academic discipline
?item wdt:P361 wd:Q21198 . # part of computer science
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
`);

Facet management

import { FacetManager } from '@/storage/neo4j/facet-manager.js';

const facets = new FacetManager(driver, logger);

// Get PMEST dimensions
const dimensions = await facets.getDimensions();

// Get facet values for a dimension
const values = await facets.getFacetValues('matter', {
limit: 100,
includeCount: true,
});

// Assign facets to preprint
await facets.assignFacets(preprintUri, [
{ dimension: 'matter', value: 'cs.AI' },
{ dimension: 'energy', value: 'empirical' },
]);

Tag management

import { TagManager } from '@/storage/neo4j/tag-manager.js';

const tags = new TagManager(driver, logger);

// Get trending tags
const trending = await tags.getTrending({
period: '7d',
limit: 20,
});

// Get tag suggestions for preprint
const suggestions = await tags.getSuggestions(preprintUri, {
basedOn: 'similar-preprints',
limit: 10,
});

// Check for spam tags
import { TagSpamDetector } from '@/storage/neo4j/tag-spam-detector.js';

const detector = new TagSpamDetector(driver, logger);
const isSpam = await detector.check(tagText, authorDid);

Proposal handling

import { ProposalHandler } from '@/storage/neo4j/proposal-handler.js';

const proposals = new ProposalHandler(driver, storage, logger);

// Create field proposal
const proposal = await proposals.createFieldProposal({
type: 'create',
field: {
id: 'cs.QML',
label: 'Quantum Machine Learning',
parent: 'cs.AI',
},
proposerDid: userDid,
});

// Apply approved proposal
await proposals.applyProposal(proposalId);

// Revert proposal (if needed)
await proposals.revertProposal(proposalId);

Configuration

Environment variables:

VariableDefaultDescription
NEO4J_URIbolt://localhost:7687Bolt connection URI
NEO4J_USERneo4jUsername
NEO4J_PASSWORDRequiredPassword
NEO4J_DATABASEneo4jDatabase name
NEO4J_MAX_POOL_SIZE50Connection pool size

Setup

# Initialize schema and bootstrap data
tsx scripts/db/setup-neo4j.ts

# Or via npm script
pnpm db:setup:neo4j

Bootstrap data

Initial data includes:

  • Root field node (id: 'root')
  • Top-level field categories (cs, physics, math, bio, etc.)
  • 10 PMEST facet dimension templates
  • Initial authority records from LCSH

Testing

# Integration tests
pnpm test tests/integration/storage/neo4j-operations.test.ts

# Citation graph tests
pnpm test tests/unit/storage/neo4j/citation-graph.test.ts

# Algorithm tests
pnpm test tests/integration/storage/neo4j-algorithms.test.ts

Monitoring

Query performance

// Show slow queries
CALL dbms.listQueries() YIELD query, elapsedTimeMillis
WHERE elapsedTimeMillis > 1000
RETURN query, elapsedTimeMillis;

// Profile a query
PROFILE MATCH (f:Field)-[:PARENT_OF*]->(child)
WHERE f.id = 'cs'
RETURN child;

Database statistics

// Node counts by label
MATCH (n)
RETURN labels(n)[0] AS label, count(*) AS count
ORDER BY count DESC;

// Relationship counts
MATCH ()-[r]->()
RETURN type(r) AS type, count(*) AS count
ORDER BY count DESC;