Schema compatibility service
The SchemaCompatibilityService detects record format types and generates migration hints for ATProto records. It ensures Chive accepts both legacy and current formats while informing clients about available updates.
Overview
ATProto records evolve over time. As Chive adds features like rich text formatting, older records may use outdated field formats. The schema compatibility service:
- Detects field formats in incoming records
- Identifies fields using legacy formats
- Generates migration hints for clients
- Maintains forward compatibility (accepts both old and new formats)
Format types
AbstractFormat
Detects the format of abstract fields in eprint records.
| Format | Description | Current |
|---|---|---|
string | Plain text string (legacy) | No |
rich-text-array | Array of RichTextItem objects | Yes |
empty | Missing or null value | Yes |
invalid | Unexpected type | No |
TitleFormat
Detects the format of title fields, including whether a titleRich array is needed.
| Format | Description | Current |
|---|---|---|
plain | Plain string with no special formatting | Yes |
plain-needs-rich | Plain string containing LaTeX, subscripts, or superscripts | No |
with-rich | Plain title with accompanying titleRich array | Yes |
empty | Missing or empty title | No |
ReviewBodyFormat
Detects the format of review body fields.
| Format | Description | Current |
|---|---|---|
string | Plain text string (legacy) | No |
rich-text-array | Array of RichTextItem objects | Yes |
empty | Missing or null value | No |
invalid | Unexpected type | No |
Detecting formats
Abstract format detection
import { SchemaCompatibilityService } from '@/services/schema/schema-compatibility.js';
const service = new SchemaCompatibilityService();
// Legacy string format
const legacyResult = service.detectAbstractFormat('Plain text abstract');
// {
// field: 'abstract',
// format: 'string',
// isCurrent: false,
// metadata: { length: 20, detectedVersion: '0.0.0' }
// }
// Current rich text format
const currentResult = service.detectAbstractFormat([
{ type: 'text', content: 'Rich text abstract with ' },
{
type: 'nodeRef',
uri: 'at://did:plc:chive-governance/pub.chive.graph.node/8e31479f-01c0-5c1e-aae4-bd28b7cb0a7b',
},
]);
// {
// field: 'abstract',
// format: 'rich-text-array',
// isCurrent: true,
// metadata: { itemCount: 2, isValid: true, detectedVersion: '0.1.0' }
// }
Title format detection
The service detects whether titles contain special formatting that requires a titleRich array:
// Plain title (no special formatting)
const plainResult = service.detectTitleFormat('Simple Title', undefined);
// { field: 'title', format: 'plain', isCurrent: true }
// Title with LaTeX but no titleRich (needs migration)
const latexResult = service.detectTitleFormat(
'Study of $\\alpha$-decay in heavy nuclei',
undefined
);
// {
// field: 'title',
// format: 'plain-needs-rich',
// isCurrent: false,
// metadata: {
// titleLength: 39,
// hasLatex: true,
// hasLatexCommand: false,
// hasSubscript: false,
// hasSuperscript: false
// }
// }
// Title with rich formatting (current format)
const richResult = service.detectTitleFormat('Study of alpha-decay in heavy nuclei', [
{ type: 'text', content: 'Study of ' },
{ type: 'latex', content: '\\alpha', displayMode: false },
{ type: 'text', content: '-decay in heavy nuclei' },
]);
// { field: 'title', format: 'with-rich', isCurrent: true }
LaTeX pattern detection
The service detects several LaTeX patterns that indicate a title needs rich formatting:
- Inline math:
$...$ - Display math:
$$...$$ - LaTeX commands:
\alpha,\frac{a}{b},\sqrt[3]{x} - Subscripts:
_{}or_x - Superscripts:
^{}or^2
// Examples of titles that trigger plain-needs-rich format
'Properties of $\\beta$-functions'; // inline math
'The formula $$E=mc^2$$ explained'; // display math
'Study of \\textit{Drosophila} genetics'; // LaTeX command
'Analysis of H_2O molecules'; // subscript
'The power x^2 in equations'; // superscript
Review body format detection
// Legacy string format
const legacyBody = service.detectReviewBodyFormat('Plain review text');
// { field: 'body', format: 'string', isCurrent: false }
// Current rich text format
const currentBody = service.detectReviewBodyFormat([
{ type: 'text', content: 'This paper presents...' },
]);
// { field: 'body', format: 'rich-text-array', isCurrent: true }
Analyzing complete records
Eprint records
Use analyzeEprintRecord to check all fields at once:
const record = await fetchRecordFromPds(uri);
const result = service.analyzeEprintRecord(record);
if (!result.isCurrentSchema) {
console.log('Legacy format detected');
console.log('Schema version:', result.compatibility.schemaVersion);
console.log('Deprecated fields:', result.compatibility.deprecatedFields);
if (result.compatibility.migrationAvailable) {
for (const hint of result.compatibility.migrationHints ?? []) {
console.log(`Field: ${hint.field}`);
console.log(`Action: ${hint.action}`);
console.log(`Instructions: ${hint.instructions}`);
}
}
}
Review records
Use analyzeReviewRecord for review comments:
const reviewRecord = await fetchRecordFromPds(reviewUri);
const result = service.analyzeReviewRecord(reviewRecord);
if (!result.isCurrentSchema) {
console.log('Legacy review detected');
}
Migration workflow
Step 1: Detect legacy formats
const result = service.analyzeEprintRecord(record);
if (service.needsMigration(record)) {
// record uses deprecated formats
}
Step 2: Generate API hints
Include schema hints in API responses to inform clients:
const result = service.analyzeEprintRecord(record);
const hints = service.generateApiHints(result);
const response = {
uri: record.uri,
value: record,
...(hints && { _schemaHints: hints }),
};
// Response includes hints for legacy records:
// {
// "uri": "at://did:plc:.../pub.chive.eprint.submission/abc123",
// "value": { ... },
// "_schemaHints": {
// "schemaVersion": "0.0.0",
// "deprecatedFields": ["abstract"],
// "migrationAvailable": true,
// "migrationUrl": "https://docs.chive.pub/schema/migrations/abstract-richtext"
// }
// }
Step 3: Apply migrations
Migration hints provide instructions for each field:
for (const hint of result.compatibility.migrationHints ?? []) {
switch (hint.action) {
case 'convert':
// transform field format (e.g., string to array)
console.log('Convert:', hint.instructions);
break;
case 'add':
// add a new field (e.g., titleRich)
console.log('Add field:', hint.instructions);
break;
case 'restructure':
// fix invalid format
console.log('Fix format:', hint.instructions);
break;
}
if (hint.example) {
console.log('Example:', JSON.stringify(hint.example, null, 2));
}
}
Example: Converting string abstract to rich text
Legacy format:
{
"title": "Example Paper",
"abstract": "This paper studies the effects of..."
}
Current format:
{
"title": "Example Paper",
"abstract": [{ "type": "text", "content": "This paper studies the effects of..." }]
}
Example: Adding titleRich for LaTeX titles
Legacy format:
{
"title": "Study of $\\alpha$-decay",
"abstract": [...]
}
Current format:
{
"title": "Study of alpha-decay",
"titleRich": [
{ "type": "text", "content": "Study of " },
{ "type": "latex", "content": "\\alpha", "displayMode": false },
{ "type": "text", "content": "-decay" }
],
"abstract": [...]
}
Integration with API handlers
XRPC endpoint example
import { schemaCompatibilityService } from '@/services/schema/schema-compatibility.js';
export async function getSubmission(uri: string): Promise<GetSubmissionResponse> {
const record = await repository.getRecord(uri);
const result = schemaCompatibilityService.analyzeEprintRecord(record.value);
const hints = schemaCompatibilityService.generateApiHints(result);
return {
uri: record.uri,
cid: record.cid,
value: record.value,
...(hints && { _schemaHints: hints }),
};
}
Conditional migration hints
Only include hints for records that need them:
const result = service.analyzeEprintRecord(record);
// hints is undefined for current schema records
const hints = service.generateApiHints(result);
if (hints) {
// record uses legacy formats
response._schemaHints = hints;
}
Types reference
SchemaVersion
interface SchemaVersion {
readonly major: number;
readonly minor: number;
readonly patch: number;
}
FieldFormatDetection
interface FieldFormatDetection {
readonly field: string;
readonly format: string;
readonly isCurrent: boolean;
readonly metadata?: Record<string, unknown>;
}
SchemaCompatibilityInfo
interface SchemaCompatibilityInfo {
readonly schemaVersion: SchemaVersion;
readonly detectedFormat: 'current' | 'legacy' | 'unknown';
readonly deprecatedFields: readonly DeprecatedFieldInfo[];
readonly migrationAvailable: boolean;
readonly migrationHints?: readonly SchemaMigrationHint[];
}
SchemaMigrationHint
interface SchemaMigrationHint {
readonly field: string;
readonly action: 'convert' | 'add' | 'restructure' | 'remove';
readonly instructions: string;
readonly documentationUrl?: string;
readonly example?: unknown;
}
ApiSchemaHints
interface ApiSchemaHints {
readonly schemaVersion?: string;
readonly deprecatedFields?: readonly string[];
readonly migrationAvailable?: boolean;
readonly migrationUrl?: string;
}
Record migration service
While the SchemaCompatibilityService detects formats and generates hints for API responses, the RecordMigrator transforms legacy records at index time. When the firehose event processor receives a record, it checks whether migration is needed and applies transformations before storing the indexed data.
How it works
Each lexicon that supports versioning has a schemaRevision integer field. When a record's revision is below the current revision (or absent, implying revision 1), the migrator applies each migration in sequence.
import { RecordMigrator } from '@/services/migration/record-migrator.js';
const migrator = new RecordMigrator();
// Check if a record needs migration
if (migrator.needsMigration('pub.chive.eprint.submission', record)) {
const migrated = migrator.migrate('pub.chive.eprint.submission', record);
// migrated record is at the current revision
}
Current migrations
| ID | Collection | From | To | Description |
|---|---|---|---|---|
| 0001 | pub.chive.eprint.submission | 1 | 2 | Convert abstract to rich text array, generate titleRich for LaTeX titles, map licenseSlug to licenseUri |
| 0002 | pub.chive.eprint.submission | 2 | 3 | Convert flat affiliation strings to pub.chive.defs#affiliation tree objects with institutionUri lookup |
| 0002 | pub.chive.actor.profile | 1 | 2 | Convert flat affiliation strings to pub.chive.defs#affiliation tree objects |
Migration chaining
A submission record at revision 1 passes through both migrations in sequence (1 to 2 to 3). A record at revision 2 only needs the affiliation tree migration (2 to 3).
Integration with firehose indexing
The event processor calls the migrator before indexing:
// In the firehose event processor
const record = event.record;
if (migrator.needsMigration(event.collection, record)) {
const migrated = migrator.migrate(event.collection, record);
await indexRecord(event.collection, migrated);
} else {
await indexRecord(event.collection, record);
}
Migrations are idempotent and run at index time only. The original PDS record is never modified.
Connection to frontend
The _schemaHints field in API responses bridges backend detection with frontend user-initiated PDS updates. When a user opens an eprint that has legacy fields, the frontend's migration system can prompt the user to update their PDS record to the current schema.
ATProto compliance
The schema compatibility service follows ATProto principles:
- Forward compatibility: Accept both legacy and current formats without breaking
- Additive hints: Schema hints are optional fields that existing clients ignore
- No breaking changes: Legacy records continue to work; hints are informational only
- User data sovereignty: Migration is performed by clients updating their PDS records; Chive never writes to user PDSes
Numeric field serialization
ATProto requires floating-point numbers to be serialized as strings in certain contexts to preserve precision across different JSON parsers. The schema compatibility service handles this for:
Bounding rectangle coordinates
PDF annotation bounding rectangles use string-serialized floats:
interface BoundingRect {
x: string; // "0.123456"
y: string; // "0.234567"
width: string; // "0.345678"
height: string; // "0.456789"
pageNumber: number;
}
When reading:
const rect = {
x: parseFloat(record.boundingRect.x),
y: parseFloat(record.boundingRect.y),
width: parseFloat(record.boundingRect.width),
height: parseFloat(record.boundingRect.height),
pageNumber: record.boundingRect.pageNumber,
};
When writing:
const record = {
boundingRect: {
x: coords.x.toString(),
y: coords.y.toString(),
width: coords.width.toString(),
height: coords.height.toString(),
pageNumber: coords.pageNumber,
},
};
This ensures consistent precision when coordinates are round-tripped through different ATProto implementations.
Default instance
A singleton instance is exported for convenience:
import { schemaCompatibilityService } from '@/services/schema/schema-compatibility.js';
// use directly
const result = schemaCompatibilityService.analyzeEprintRecord(record);
Version information
Get the current schema version:
const version = service.getCurrentVersionString();
// "0.1.0"
const versionObj = service.currentVersion;
// { major: 0, minor: 1, patch: 0 }