
DiscoveryService

The DiscoveryService provides personalized preprint recommendations. It enriches preprints with data from Semantic Scholar and OpenAlex and degrades gracefully to local data when those external APIs are unavailable.

Features

  • Preprint enrichment with citation data, concepts, and topics
  • Personalized recommendations based on user reading history
  • Similar paper discovery using SPECTER2 embeddings
  • Citation graph traversal (forward and backward citations)
  • Graceful degradation when external services are unavailable

Usage

```typescript
import { DiscoveryService } from '@/services/discovery';

// `container` is the application's DI container; `userDid`, `preprintUri`,
// and `preprint` are assumed to be in scope.
const discovery = container.resolve(DiscoveryService);

// Get personalized recommendations
const recommendations = await discovery.getRecommendationsForUser(userDid, {
  limit: 20,
  sources: ['semantic-scholar', 'openalex', 'local']
});

// Find similar preprints
const similar = await discovery.findRelatedPreprints(preprintUri, {
  method: 'specter2',
  limit: 10
});

// Enrich preprint with external metadata
const enriched = await discovery.enrichPreprint(preprint);
```

Recommendation algorithm

The service combines multiple signals for recommendations:

Signal sources

| Signal | Weight | Source |
|---|---|---|
| SPECTER2 similarity | 0.35 | Semantic Scholar |
| Citation overlap | 0.25 | OpenAlex/Semantic Scholar |
| Field match | 0.20 | Local knowledge graph |
| Author co-citation | 0.10 | Citation analysis |
| Recency | 0.10 | Publication date |

Personalization

User preferences are built from:

```typescript
interface UserProfile {
  readHistory: AtUri[];        // Preprints viewed for more than 30 seconds
  endorsedPreprints: AtUri[];  // Preprints the user endorsed
  taggedPreprints: AtUri[];    // Preprints the user tagged
  followedFields: string[];    // Subscribed fields
  researchInterests: string[]; // Profile keywords
}
```

Scoring

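```typescript
// Per-signal scores for one candidate, derived from the preprint and the
// user's profile. Each is assumed to be normalized to [0, 1].
interface SignalScores {
  specter2Similarity: number;
  citationOverlap: number;
  fieldMatch: number;
  authorCoCitation: number;
  recencyScore: number;
}

// Weighted sum of the signals. The weights add up to 1.0, so the
// result also lies in [0, 1] when every signal does.
function scoreRecommendation(
  preprint: Preprint,
  userProfile: UserProfile,
  signals: SignalScores
): number {
  return (
    signals.specter2Similarity * 0.35 +
    signals.citationOverlap * 0.25 +
    signals.fieldMatch * 0.20 +
    signals.authorCoCitation * 0.10 +
    signals.recencyScore * 0.10
  );
}
```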
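To make the weighting concrete, here is a worked example with made-up signal values. The `WEIGHTS` object and `weightedScore` helper are illustrative names, not part of the documented API; the weights themselves are the ones from the signal table.

```typescript
// Signal values for one candidate preprint, each in [0, 1].
interface SignalScores {
  specter2Similarity: number;
  citationOverlap: number;
  fieldMatch: number;
  authorCoCitation: number;
  recencyScore: number;
}

// Weights from the signal table (they sum to 1.0).
const WEIGHTS: SignalScores = {
  specter2Similarity: 0.35,
  citationOverlap: 0.25,
  fieldMatch: 0.2,
  authorCoCitation: 0.1,
  recencyScore: 0.1,
};

// Weighted sum over every signal key.
function weightedScore(signals: SignalScores): number {
  return (Object.keys(WEIGHTS) as (keyof SignalScores)[]).reduce(
    (sum, key) => sum + signals[key] * WEIGHTS[key],
    0
  );
}

// Hypothetical candidate: very similar embedding, exact field match,
// little author co-citation.
const example: SignalScores = {
  specter2Similarity: 0.8,
  citationOverlap: 0.5,
  fieldMatch: 1.0,
  authorCoCitation: 0.2,
  recencyScore: 0.6,
};

// 0.28 + 0.125 + 0.2 + 0.02 + 0.06
console.log(weightedScore(example).toFixed(3)); // prints 0.685
```

Because the weights sum to 1.0, a candidate that scores 1.0 on every signal gets exactly 1.0, which keeps scores comparable with a fixed `minScore` threshold.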

Enrichment

The service enriches preprints with external metadata:

```typescript
interface EnrichedPreprint extends Preprint {
  enrichment: {
    citationCount: number;
    influentialCitationCount: number;
    concepts: Concept[];
    topics: Topic[];
    relatedPapers: RelatedPaper[];
    externalIds: {
      doi?: string;
      arxivId?: string;
      semanticScholarId?: string;
      openAlexId?: string;
    };
  };
}
```

External API integration

The service calls external APIs through optional plugins, trying Semantic Scholar first and falling back to OpenAlex:

```typescript
async enrichPreprint(preprint: Preprint): Promise<EnrichedPreprint> {
  // Fields are filled in as each source responds
  const enrichment: Enrichment = {};

  // Try Semantic Scholar first (DOI lookup needs a DOI)
  if (preprint.doi && this.pluginManager?.hasPlugin('semantic-scholar')) {
    try {
      const s2Plugin = this.pluginManager.getPlugin('semantic-scholar');
      const paper = await s2Plugin.getPaperByDoi(preprint.doi);
      if (paper) {
        enrichment.citationCount = paper.citationCount;
        enrichment.influentialCitationCount = paper.influentialCitationCount;
      }
    } catch (error) {
      this.logger.warn(`Semantic Scholar enrichment failed: ${String(error)}`);
    }
  }

  // Fall back to OpenAlex; compare with undefined so a count of 0 is kept
  if (
    preprint.doi &&
    enrichment.citationCount === undefined &&
    this.pluginManager?.hasPlugin('openalex')
  ) {
    try {
      const oaPlugin = this.pluginManager.getPlugin('openalex');
      const work = await oaPlugin.getWorkByDoi(preprint.doi);
      if (work) {
        enrichment.citationCount = work.citedByCount;
        enrichment.concepts = work.concepts;
      }
    } catch (error) {
      this.logger.warn(`OpenAlex enrichment failed: ${String(error)}`);
    }
  }

  // Always available: local data
  enrichment.localMetrics = await this.metricsService.getMetrics(preprint.uri);

  return { ...preprint, enrichment };
}
```
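The Semantic Scholar then OpenAlex fallback generalizes to an ordered list of sources. The sketch below uses a hypothetical helper, `firstAvailable`, which is not part of the documented service: each source is an async lookup that may throw or return `undefined`, and the first defined result wins.

```typescript
// A source is an async lookup that may throw or resolve to undefined.
type Source<T> = () => Promise<T | undefined>;

// Try each source in order and return the first defined result.
// Errors are reported through onError and skipped, so one failing
// API does not abort the whole lookup.
async function firstAvailable<T>(
  sources: Source<T>[],
  onError: (err: unknown, index: number) => void = () => {}
): Promise<T | undefined> {
  for (let i = 0; i < sources.length; i++) {
    try {
      const result = await sources[i]();
      // Check for undefined (not falsiness) so 0 is a valid result
      if (result !== undefined) return result;
    } catch (err) {
      onError(err, i);
    }
  }
  return undefined;
}

// Stand-in lookups, hypothetical and for illustration only.
const failingS2: Source<number> = async () => {
  throw new Error('Semantic Scholar timeout');
};
const openAlexCount: Source<number> = async () => 42;

firstAvailable([failingS2, openAlexCount], (err, i) =>
  console.warn(`source ${i} failed:`, (err as Error).message)
).then((count) => console.log(count)); // 42
```

Keeping the order explicit in one array makes it easy to add or reorder providers without nesting more `if` blocks.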

Citation graph

The service provides citation graph traversal:

```typescript
// Forward citations (papers citing this one)
const citing = await discovery.getCitingPapers(preprintUri, {
  limit: 50,
  sort: 'influence'
});

// Backward citations (papers this one cites)
const references = await discovery.getReferences(preprintUri, {
  limit: 100
});

// Citation statistics
const stats = await discovery.getCitationCounts(preprintUri);
// Example shape: { total: 42, influential: 8, recent: 15 }
```
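Multi-hop traversal can be pictured as a depth-limited breadth-first search. This is a minimal sketch over an in-memory adjacency map; `traverseReferences` and the `CitationGraph` type are hypothetical and stand in for whatever the `ICitationGraph` dependency actually exposes.

```typescript
// In-memory citation graph: paper ID -> IDs of papers it cites.
type CitationGraph = Map<string, string[]>;

// Breadth-first traversal of backward citations (references),
// stopping at maxDepth hops from the start paper.
// Returns each reachable paper with its hop distance.
function traverseReferences(
  graph: CitationGraph,
  start: string,
  maxDepth: number
): Map<string, number> {
  const depth = new Map<string, number>([[start, 0]]);
  const queue: string[] = [start];
  while (queue.length > 0) {
    const current = queue.shift()!;
    const d = depth.get(current)!;
    if (d === maxDepth) continue; // do not expand past the depth limit
    for (const ref of graph.get(current) ?? []) {
      if (!depth.has(ref)) {
        // In BFS the first visit gives the shortest hop count
        depth.set(ref, d + 1);
        queue.push(ref);
      }
    }
  }
  depth.delete(start); // report only other papers
  return depth;
}

const graph: CitationGraph = new Map([
  ['A', ['B', 'C']],
  ['B', ['D']],
  ['C', ['D', 'E']],
  ['D', ['F']],
]);

const reached = traverseReferences(graph, 'A', 2);
console.log(reached);
// B and C at depth 1, D and E at depth 2; F (3 hops away) is excluded
```

Forward traversal (papers citing this one) works the same way on the reversed map, where each paper points to the papers that cite it.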

Interaction tracking

User interactions feed back into recommendations:

```typescript
interface Interaction {
  preprintUri: AtUri;
  action: 'view' | 'download' | 'endorse' | 'tag' | 'share';
  duration?: number; // For views, time spent in seconds
  context?: string;  // Where the interaction occurred
}

await discovery.recordInteraction(userDid, {
  preprintUri: 'at://did:plc:abc.../pub.chive.preprint.submission/3k5...',
  action: 'view',
  duration: 120,
  context: 'for-you-feed'
});
```
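One way recorded interactions could feed the `UserProfile` lists is sketched below. It follows the documented rule that views longer than 30 seconds count as reads; `buildInteractionProfile`, `InteractionProfile`, and the example URIs are hypothetical, and ignoring downloads and shares is an assumption of this sketch.

```typescript
type AtUri = string; // simplified for this sketch

interface Interaction {
  preprintUri: AtUri;
  action: 'view' | 'download' | 'endorse' | 'tag' | 'share';
  duration?: number; // seconds, for views
  context?: string;
}

// The part of UserProfile that can be derived from interactions alone.
interface InteractionProfile {
  readHistory: AtUri[];
  endorsedPreprints: AtUri[];
  taggedPreprints: AtUri[];
}

// Views longer than this count as reads (per the readHistory comment).
const READ_THRESHOLD_SECONDS = 30;

function buildInteractionProfile(interactions: Interaction[]): InteractionProfile {
  // Sets deduplicate repeated interactions with the same preprint
  const read = new Set<AtUri>();
  const endorsed = new Set<AtUri>();
  const tagged = new Set<AtUri>();
  for (const i of interactions) {
    if (i.action === 'view' && (i.duration ?? 0) > READ_THRESHOLD_SECONDS) {
      read.add(i.preprintUri);
    } else if (i.action === 'endorse') {
      endorsed.add(i.preprintUri);
    } else if (i.action === 'tag') {
      tagged.add(i.preprintUri);
    }
    // 'download' and 'share' are ignored in this sketch
  }
  return {
    readHistory: Array.from(read),
    endorsedPreprints: Array.from(endorsed),
    taggedPreprints: Array.from(tagged),
  };
}

const profile = buildInteractionProfile([
  { preprintUri: 'at://example/1', action: 'view', duration: 120 },
  { preprintUri: 'at://example/2', action: 'view', duration: 10 },
  { preprintUri: 'at://example/3', action: 'endorse' },
  { preprintUri: 'at://example/1', action: 'tag' },
]);
console.log(profile.readHistory); // only at://example/1 (the 120 s view) counts as a read
```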

Graceful degradation

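The service still returns recommendations when external APIs are missing or failing. External results are optional, and local recommendations are always computed: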

```typescript
async getRecommendationsForUser(
  userDid: string,
  options: RecommendationOptions
): Promise<Recommendation[]> {
  const recommendations: Recommendation[] = [];

  // Try external APIs (optional)
  if (this.pluginManager?.hasPlugin('semantic-scholar')) {
    try {
      const s2Recs = await this.getSemanticScholarRecs(userDid);
      recommendations.push(...s2Recs);
    } catch (error) {
      this.logger.warn(`Semantic Scholar unavailable, using local only: ${String(error)}`);
    }
  }

  // Always available: local recommendations
  const localRecs = await this.getLocalRecommendations(userDid);
  recommendations.push(...localRecs);

  // Merge, dedupe, and rank
  return this.rankRecommendations(recommendations, options.limit);
}
```
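The final "merge, dedupe, and rank" step might look like the following sketch. The `Recommendation` fields used here (`uri`, `score`, `source`) are assumptions, and keeping the highest score per preprint is one possible merge rule; summing or averaging scores across sources are equally plausible choices.

```typescript
// Assumed shape; the documented Recommendation type may differ.
interface Recommendation {
  uri: string;    // preprint AT-URI
  score: number;  // relevance score from whichever source produced it
  source: string; // e.g. 'semantic-scholar' or 'local'
}

// Keep the highest-scoring entry per preprint URI,
// sort by score descending, and truncate to `limit`.
function rankRecommendations(recs: Recommendation[], limit: number): Recommendation[] {
  const best = new Map<string, Recommendation>();
  for (const rec of recs) {
    const existing = best.get(rec.uri);
    if (!existing || rec.score > existing.score) {
      best.set(rec.uri, rec);
    }
  }
  return Array.from(best.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const merged = rankRecommendations(
  [
    { uri: 'at://example/a', score: 0.9, source: 'semantic-scholar' },
    { uri: 'at://example/b', score: 0.4, source: 'local' },
    { uri: 'at://example/a', score: 0.6, source: 'local' },
    { uri: 'at://example/c', score: 0.7, source: 'local' },
  ],
  2
);
console.log(merged.map((r) => r.uri)); // at://example/a (0.9), then at://example/c (0.7)
```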

Local recommendations

Local recommendations are always computed and are the only source when external APIs are unavailable. They draw on these signals:

  • Field-based: Preprints in fields the user follows
  • Co-author network: Preprints by authors of papers the user has read
  • Tag similarity: Preprints whose tags resemble those on the user's tagged papers
  • Trending: Popular preprints in the user's fields
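The tag-similarity signal could be computed in several ways. The sketch below swaps in Jaccard similarity (shared tags divided by all distinct tags); the source does not say which measure the service uses, and `rankByTagSimilarity` and the example data are hypothetical.

```typescript
// Jaccard similarity |A ∩ B| / |A ∪ B|, in [0, 1].
function jaccard(a: Set<string>, b: Set<string>): number {
  const sizeSum = a.size + b.size;
  if (sizeSum === 0) return 0; // two empty tag sets: treat as unrelated
  let intersection = 0;
  a.forEach((tag) => {
    if (b.has(tag)) intersection++;
  });
  // |A ∪ B| = |A| + |B| - |A ∩ B|
  return intersection / (sizeSum - intersection);
}

// Rank candidates by tag overlap with the tags the user has applied,
// dropping candidates that share no tags at all.
function rankByTagSimilarity(
  userTags: string[],
  candidates: { uri: string; tags: string[] }[]
): { uri: string; similarity: number }[] {
  const userSet = new Set(userTags);
  return candidates
    .map((c) => ({ uri: c.uri, similarity: jaccard(userSet, new Set(c.tags)) }))
    .filter((c) => c.similarity > 0)
    .sort((x, y) => y.similarity - x.similarity);
}

const ranked = rankByTagSimilarity(
  ['nlp', 'transformers', 'evaluation'],
  [
    { uri: 'at://example/x', tags: ['nlp', 'evaluation'] },
    { uri: 'at://example/y', tags: ['genomics'] },
    { uri: 'at://example/z', tags: ['nlp', 'vision', 'robotics'] },
  ]
);
console.log(ranked); // x (2/3) ranks ahead of z (1/5); y shares no tags and is dropped
```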

Dependencies

```typescript
interface DiscoveryDependencies {
  logger: ILogger;
  database: IDatabasePool;
  search: ISearchEngine;
  ranking: IRankingService;
  citationGraph: ICitationGraph;
  pluginManager?: IPluginManager; // Optional for external APIs
}
```

Configuration

```typescript
interface DiscoveryConfig {
  maxRecommendations: number; // Max recommendations per request
  minScore: number;           // Minimum relevance score
  recencyWeight: number;      // Weight for recent papers
  diversityFactor: number;    // Reduces clustering of similar papers
  cacheTimeout: number;       // Recommendation cache TTL
}
```
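The source describes `diversityFactor` only as reducing clustering of similar papers. A common technique for that is maximal marginal relevance (MMR), sketched below as a swapped-in illustration: treating `diversityFactor` as the MMR trade-off weight, and the `Candidate` shape with an embedding vector, are assumptions, and `diversify` is a hypothetical name.

```typescript
interface Candidate {
  uri: string;
  relevance: number;   // e.g. the weighted recommendation score
  embedding: number[]; // e.g. a document embedding (same length for all)
}

// Cosine similarity of two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy MMR: repeatedly pick the candidate maximizing
// (1 - diversity) * relevance - diversity * (max similarity to picked items).
// diversity = 0 reduces to plain ranking by relevance.
function diversify(candidates: Candidate[], k: number, diversity: number): Candidate[] {
  const remaining = candidates.slice();
  const picked: Candidate[] = [];
  while (picked.length < k && remaining.length > 0) {
    let bestIdx = 0;
    let bestScore = -Infinity;
    remaining.forEach((c, idx) => {
      const redundancy =
        picked.length === 0
          ? 0
          : Math.max(...picked.map((p) => cosine(c.embedding, p.embedding)));
      const score = (1 - diversity) * c.relevance - diversity * redundancy;
      if (score > bestScore) {
        bestScore = score;
        bestIdx = idx;
      }
    });
    picked.push(remaining.splice(bestIdx, 1)[0]);
  }
  return picked;
}

// b is a near-duplicate of a; c is less relevant but different.
const pool: Candidate[] = [
  { uri: 'at://example/a', relevance: 0.9, embedding: [1, 0] },
  { uri: 'at://example/b', relevance: 0.85, embedding: [1, 0] },
  { uri: 'at://example/c', relevance: 0.6, embedding: [0, 1] },
];
console.log(diversify(pool, 2, 0.5).map((c) => c.uri)); // a, then c (b is penalized as redundant)
```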

Environment variables:

| Variable | Default | Description |
|---|---|---|
| DISCOVERY_MAX_RECS | 100 | Maximum number of recommendations |
| DISCOVERY_MIN_SCORE | 0.1 | Minimum relevance score |
| DISCOVERY_CACHE_TTL | 3600 | Cache TTL in seconds |
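Reading these variables with the documented defaults could look like this minimal sketch. The mapping of each variable to a `DiscoveryConfig` field (for example `DISCOVERY_CACHE_TTL` to `cacheTimeout`) is an assumption, and `loadDiscoveryEnvConfig` is a hypothetical helper; `recencyWeight` and `diversityFactor` are left out because no environment variables are documented for them.

```typescript
// Subset of DiscoveryConfig that has documented environment variables.
interface DiscoveryEnvConfig {
  maxRecommendations: number;
  minScore: number;
  cacheTimeout: number; // seconds
}

// Parse a numeric variable, falling back to the documented default
// when it is unset, empty, or not a finite number.
function numberFromEnv(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === '') return fallback;
  const value = Number(raw);
  return Number.isFinite(value) ? value : fallback;
}

function loadDiscoveryEnvConfig(
  env: Record<string, string | undefined>
): DiscoveryEnvConfig {
  return {
    maxRecommendations: numberFromEnv(env, 'DISCOVERY_MAX_RECS', 100),
    minScore: numberFromEnv(env, 'DISCOVERY_MIN_SCORE', 0.1),
    cacheTimeout: numberFromEnv(env, 'DISCOVERY_CACHE_TTL', 3600),
  };
}

// In Node.js you would pass process.env; a literal object keeps this self-contained.
const config = loadDiscoveryEnvConfig({ DISCOVERY_MAX_RECS: '50' });
console.log(config); // { maxRecommendations: 50, minScore: 0.1, cacheTimeout: 3600 }
```

Falling back on malformed values, rather than throwing, matches the service's general preference for degrading gracefully; a stricter deployment might prefer to fail fast at startup instead.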