Source Ingestion and PDF Fidelity

Use this workflow when a research node, manuscript claim, or teaching chapter depends on a math-heavy paper. Zotero is the library of record for metadata, collections, and CRUD operations. Local PDF bundles remain necessary for theorem-fidelity checks because extracted text alone often corrupts equations, theorem numbering, and notation.

Source Priority

  1. Use arXiv HTML or TeX source when available.
  2. Use publisher HTML when it preserves theorem and equation structure.
  3. Use a Zotero-backed local PDF bundle with page text plus page images.
  4. Use raw PDF text only for search, never as the sole source for notation-sensitive claims.
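The priority order above can be sketched as a small selection helper. This is an illustrative sketch only: `SourceKind` and `pick_source` are hypothetical names, not part of the project's scripts, and the caller is assumed to list publisher HTML only when it preserves theorem and equation structure.

```python
from enum import IntEnum
from typing import Iterable, Optional


class SourceKind(IntEnum):
    """Hypothetical labels; a lower value means a higher priority."""
    ARXIV_HTML_OR_TEX = 1
    PUBLISHER_HTML = 2      # list only if theorem/equation structure is preserved
    LOCAL_PDF_BUNDLE = 3    # Zotero-backed bundle: page text plus page images
    RAW_PDF_TEXT = 4        # search only


def pick_source(available: Iterable[SourceKind],
                notation_sensitive: bool) -> Optional[SourceKind]:
    """Return the highest-priority usable source, or None if nothing qualifies."""
    candidates = set(available)
    if notation_sensitive:
        # Raw PDF text must never be the sole source for notation-sensitive claims.
        candidates.discard(SourceKind.RAW_PDF_TEXT)
    return min(candidates, default=None)
```

A `None` result for a notation-sensitive claim signals that a better source (for example, a local PDF bundle) has to be obtained before the claim can be checked.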

Zotero Role

Use Zotero for:

  • collection membership and de-duplication,
  • citation metadata and BibTeX snapshots,
  • attachment discovery,
  • item creation, updates, tags, notes, and deletions when explicitly requested.

The project-wide bibliography snapshot is ../10_conjectures.bib. A snapshot can lag behind the library, so refresh it or check Zotero directly before assuming a paper is absent.

Local PDF Bundle Role

Use local bundles for:

  • theorem and lemma statement verification,
  • page-level citation audits,
  • equation-sensitive reading,
  • manuscript source-fidelity checks.

Durable project bundles live under ../sources/paper-bundles/. A bundle should contain manifest.json, text.txt, per-page text files named pages/page-XXXX.txt, and page images when available.
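A bundle's layout can be checked before relying on it for a fidelity audit. The following is a minimal sketch, not the project's own tooling: `check_bundle` is a hypothetical helper, and it assumes the XXXX in page-XXXX.txt stands for a run of digits. Page images are not checked because they are only expected when available.

```python
import re
from pathlib import Path

# Assumption: page files are named page-<digits>.txt, e.g. page-0001.txt.
PAGE_FILE = re.compile(r"^page-\d+\.txt$")


def check_bundle(bundle_dir):
    """Return a list of layout problems; an empty list means the bundle looks complete."""
    root = Path(bundle_dir)
    problems = []

    # Top-level files every bundle should contain.
    for name in ("manifest.json", "text.txt"):
        if not (root / name).is_file():
            problems.append(f"missing {name}")

    # At least one per-page text file under pages/.
    pages_dir = root / "pages"
    page_files = []
    if pages_dir.is_dir():
        page_files = [p for p in pages_dir.iterdir() if PAGE_FILE.match(p.name)]
    if not page_files:
        problems.append("no pages/page-XXXX.txt files")

    return problems
```

For example, `check_bundle("../sources/paper-bundles/<paper-slug>")` would be called with the slug of a real bundle substituted in.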

Commands

Resolve local Zotero attachments from the bibliography snapshot:

/Users/trainerblade/Documents/02_myDocs/.venv/bin/python \
  /Users/trainerblade/.codex/skills/bounded-graph-literature-research/scripts/resolve_zotero_attachments.py \
  --bib docs/project_QEM-QEC/10_conjectures.bib \
  --storage-root "$HOME/Zotero/storage"

Create or refresh a paper bundle:

/Users/trainerblade/Documents/02_myDocs/.venv/bin/python \
  /Users/trainerblade/.codex/skills/bounded-graph-literature-research/scripts/prepare_paper_bundle.py \
  /path/to/paper.pdf \
  --output-dir docs/project_QEM-QEC/sources/paper-bundles/<paper-slug>

Node-Writing Rule

When adding or revising a research node, classify every important claim as one of:

  • established theorem,
  • source-backed definition,
  • inferred bridge,
  • open gap,
  • boundary evidence.

If a formula or theorem number matters, verify it against page images or source-formatted text before treating it as established.
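The classification and the verification rule can be expressed as a small data model. This is a sketch under stated assumptions: `ClaimClass`, `Claim`, and `needs_verification` are hypothetical names introduced for illustration, and the two boolean fields are one possible way to record whether a formula or theorem number matters and whether it has been checked.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimClass(Enum):
    """The five claim classes from the node-writing rule."""
    ESTABLISHED_THEOREM = "established theorem"
    SOURCE_BACKED_DEFINITION = "source-backed definition"
    INFERRED_BRIDGE = "inferred bridge"
    OPEN_GAP = "open gap"
    BOUNDARY_EVIDENCE = "boundary evidence"


@dataclass
class Claim:
    text: str
    claim_class: ClaimClass
    # True when the claim depends on an exact formula or theorem number.
    cites_formula_or_theorem_number: bool = False
    # True once checked against page images or source-formatted text.
    verified_against_source: bool = False


def needs_verification(claim: Claim) -> bool:
    """Flag claims marked as established that cite a formula or theorem number unverified."""
    return (
        claim.claim_class is ClaimClass.ESTABLISHED_THEOREM
        and claim.cites_formula_or_theorem_number
        and not claim.verified_against_source
    )
```

A node draft could be scanned with `[c for c in claims if needs_verification(c)]` to list the claims that still need a check against page images or source-formatted text.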