Local PDF Research Workflow
Purpose
This workflow keeps Zotero as the library of record while making local PDF attachments usable for Codex research runs without writing back to Zotero.
Step 1: Resolve Zotero attachments from the local .bib
Use the resolver in the bounded-graph skill draft:
This uses:
- the BibTeX snapshot for citation keys and titles,
- the Zotero desktop Local API when available,
- the Zotero Web API as a fallback,
- the local storage root only when path reconstruction is still needed.
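The fallback order above can be sketched as a chain of resolvers tried in turn. This is a minimal sketch, not the skill draft's resolver. resolve_pdf and the stub lookups are hypothetical names. The sketch assumes each source is wrapped as a function that returns a local path or None, and that it raises OSError when its source is unreachable (for example, when Zotero desktop is not running).

```python
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple

# A resolver takes a citation key and returns a local PDF path, or None if it
# cannot find one. It may raise OSError when its source is unreachable.
Resolver = Callable[[str], Optional[Path]]


def resolve_pdf(citekey: str, resolvers: Iterable[Tuple[str, Resolver]]) -> Optional[Tuple[str, Path]]:
    """Try each (name, resolver) pair in order; return (name, path) for the first hit."""
    for name, resolver in resolvers:
        try:
            path = resolver(citekey)
        except OSError:
            # e.g. connection refused because Zotero desktop is closed: try the next source
            continue
        if path is not None:
            return name, path
    return None


if __name__ == "__main__":
    # Hypothetical stubs standing in for the Local API, Web API, and storage lookups.
    def local_api(key: str) -> Optional[Path]:
        raise ConnectionRefusedError("Local API not reachable")

    def web_api(key: str) -> Optional[Path]:
        return None

    def storage(key: str) -> Optional[Path]:
        return Path("/tmp/zotero/storage/ABCD2345/paper.pdf")

    print(resolve_pdf("smith2020", [("local_api", local_api), ("web_api", web_api), ("storage", storage)]))
```

Keeping each source behind the same small interface means the order can change, or a source can be dropped, without touching the loop.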
When the desktop Local API is enabled, attachment items expose a direct local file URL through Zotero metadata, so the resolver can usually skip storage-path reconstruction entirely.
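Turning such a file URL into a filesystem path needs only the standard library. This is a minimal sketch that assumes the URL has already been read from the attachment metadata. The exact field that carries it is not shown here. file_url_to_path is a hypothetical helper name, and it handles only local file:// URLs.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def file_url_to_path(url: str) -> Path:
    """Convert a local file:// URL for an attachment into a filesystem path."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URL: {url!r}")
    # url2pathname decodes percent-escapes such as %20 and applies the
    # platform's path conventions; the host part (if any) is ignored.
    return Path(url2pathname(parsed.path))
```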
For imported attachments without a direct local file URL, the reconstructed path is:
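As an illustration only, the sketch below assumes Zotero's standard layout for imported files, <data directory>/storage/<attachment key>/<filename>. It also assumes that storage_root points at the storage folder, which is ~/Zotero/storage by default. The resolver's own template may differ, and reconstruct_storage_path is a hypothetical name.

```python
from pathlib import Path
from typing import Optional


def reconstruct_storage_path(storage_root: Path, attachment_key: str, filename: str) -> Optional[Path]:
    """Return <storage_root>/<attachment_key>/<filename> if that file exists, else None."""
    # Refuse filenames that would escape the attachment folder (e.g. "../x.pdf").
    if Path(filename).name != filename:
        raise ValueError(f"filename must not contain directories: {filename!r}")
    candidate = Path(storage_root).expanduser() / attachment_key / filename
    return candidate if candidate.is_file() else None
```

For example, reconstruct_storage_path(Path("~/Zotero/storage"), "ABCD2345", "paper.pdf") returns the path only when that file is actually on disk. The caller can then report a missing attachment instead of passing a dangling path downstream.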
Step 2: Build a paper bundle before theorem extraction
For a resolved local PDF, create a bundle with per-page text and page images:
The bundle contains:
- text.txt: full extracted text with page separators,
- pages/page-0001.txt, etc.: per-page extracted text,
- pages/page-0001.png, etc.: page images, when a renderer is available,
- manifest.json: page counts, parser choice, and next-step guidance.
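A bundle with this layout can be produced by a short script. This is a minimal sketch, not the actual bundle builder. The function names, the page-separator line, and the manifest field names are all hypothetical. Only the pypdf text path is shown; page images are left to a renderer.

```python
import json
from pathlib import Path
from typing import List


def extract_page_texts(pdf_path: Path) -> List[str]:
    """Extract one text string per page with pypdf (the minimum dependency)."""
    from pypdf import PdfReader  # imported lazily so write_bundle works without pypdf

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]


def write_bundle(out_dir: Path, page_texts: List[str], parser: str = "pypdf") -> dict:
    """Write text.txt, pages/page-NNNN.txt and manifest.json; return the manifest."""
    out_dir = Path(out_dir)
    pages_dir = out_dir / "pages"
    pages_dir.mkdir(parents=True, exist_ok=True)

    chunks = []
    for number, text in enumerate(page_texts, start=1):
        (pages_dir / f"page-{number:04d}.txt").write_text(text, encoding="utf-8")
        # Hypothetical separator format; the real bundle may mark pages differently.
        chunks.append(f"===== page {number} =====\n{text}")
    (out_dir / "text.txt").write_text("\n\n".join(chunks), encoding="utf-8")

    manifest = {  # illustrative field names, not the real manifest schema
        "page_count": len(page_texts),
        "parser": parser,
        "page_images": False,
        "next_step": "verify equations and theorem statements against page images",
    }
    (out_dir / "manifest.json").write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Typical use would be write_bundle(Path("bundle"), extract_page_texts(Path("paper.pdf"))). Splitting extraction from writing keeps the file layout independent of which parser produced the text.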
Dependencies:
- minimum: pypdf for text extraction,
- recommended: PyMuPDF for page-image rendering,
- fallback renderer: pdftoppm from Poppler,
- current workspace recommendation: use the interpreter at /Users/trainerblade/Documents/02_myDocs/.venv/bin/python
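The renderer preference can be expressed as a small selection step. This sketch assumes PyMuPDF is importable as fitz and that pdftoppm is on PATH. The function names and the 150 dpi default are illustrative choices, not settings taken from the workflow scripts.

```python
import importlib.util
import shutil
from pathlib import Path
from typing import List, Optional


def choose_renderer(has_pymupdf: bool, has_pdftoppm: bool) -> Optional[str]:
    """Prefer PyMuPDF, fall back to Poppler's pdftoppm, else render no images."""
    if has_pymupdf:
        return "pymupdf"
    if has_pdftoppm:
        return "pdftoppm"
    return None


def detect_renderer() -> Optional[str]:
    # PyMuPDF is importable as "fitz" (newer releases also allow "import pymupdf").
    has_pymupdf = importlib.util.find_spec("fitz") is not None
    return choose_renderer(has_pymupdf, shutil.which("pdftoppm") is not None)


def pdftoppm_command(pdf: Path, page: int, out_dir: Path, dpi: int = 150) -> List[str]:
    """Build a pdftoppm call that writes out_dir/page-NNNN.png for a single page."""
    prefix = out_dir / f"page-{page:04d}"
    # -f/-l select the page; -singlefile stops pdftoppm appending its own page number.
    return ["pdftoppm", "-png", "-r", str(dpi), "-f", str(page), "-l", str(page),
            "-singlefile", str(pdf), str(prefix)]


def render_with_pymupdf(pdf: Path, out_dir: Path, dpi: int = 150) -> None:
    """Render every page to out_dir/page-NNNN.png with PyMuPDF."""
    import fitz  # PyMuPDF, imported only when this renderer is chosen

    out_dir.mkdir(parents=True, exist_ok=True)
    with fitz.open(str(pdf)) as doc:
        for index, page in enumerate(doc, start=1):
            page.get_pixmap(dpi=dpi).save(str(out_dir / f"page-{index:04d}.png"))
```

With the Poppler fallback, each command would be passed to subprocess.run(..., check=True), one page at a time. Either path yields the same pages/page-0001.png naming, so downstream steps need not know which renderer ran.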
Best parsing strategy for math-heavy papers
Use this order of preference:
- arXiv HTML or TeX source when available.
- Publisher HTML when math rendering is preserved cleanly.
- Local PDF bundle with page images plus extracted text.
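The preference order can be written as a first-match selection. In this sketch the source labels and pick_source are hypothetical. Whether publisher HTML preserves math cleanly is treated as an input from the caller, since that check is a judgment call.

```python
from typing import Dict, Optional, Sequence

# Preference order from the list above, most preferred first.
PREFERENCE: Sequence[str] = ("arxiv_source", "publisher_html", "local_pdf_bundle")


def pick_source(available: Dict[str, bool], clean_math_html: bool = False) -> Optional[str]:
    """Return the most preferred available source for a math-heavy paper."""
    for source in PREFERENCE:
        if not available.get(source, False):
            continue
        # Publisher HTML only qualifies when its math rendering survived cleanly.
        if source == "publisher_html" and not clean_math_html:
            continue
        return source
    return None
```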
Use extracted text for:
- search,
- rough navigation,
- candidate theorem discovery,
- keyword and notation lookup.
Use page images or source-formatted HTML as the source of truth for:
- displayed equations,
- theorem and lemma statements,
- notation-heavy definitions,
- any passage where a missing superscript, subscript, or symbol would change meaning.
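One way to apply this split is to flag pages whose extracted text should not be trusted without checking the page image. The sketch below uses hypothetical heuristics: theorem-style headers, a count of math symbols, and U+FFFD replacement characters. The keyword list and the threshold are assumptions to tune, not part of the workflow.

```python
import re
from typing import Dict, List

# Hypothetical heuristics: statement headers, and characters that often signal
# math whose superscripts or subscripts may have been flattened by extraction.
STATEMENT_RE = re.compile(r"\b(Theorem|Lemma|Proposition|Corollary|Definition)\s+\d", re.IGNORECASE)
MATH_CHARS = set("∑∏∫√≤≥≠≈∈∉⊂⊆∀∃∞±×→↦αβγδεθλμσφψω")


def pages_to_verify(page_texts: List[str], math_char_threshold: int = 5) -> Dict[int, List[str]]:
    """Map 1-based page numbers to the reasons their page image should be checked."""
    flagged: Dict[int, List[str]] = {}
    for number, text in enumerate(page_texts, start=1):
        reasons = []
        if STATEMENT_RE.search(text):
            reasons.append("theorem-like statement")
        if sum(ch in MATH_CHARS for ch in text) >= math_char_threshold:
            reasons.append("dense math symbols")
        if "\ufffd" in text:  # glyphs the extractor could not map to Unicode
            reasons.append("unmapped glyphs")
        if reasons:
            flagged[number] = reasons
    return flagged
```

Flagged pages would then be read from pages/page-NNNN.png or from source HTML, while unflagged pages can keep relying on text.txt for search and navigation.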
Skill metadata
No update to the installed skills is strictly required for this workflow to function; the scripts above are sufficient on their own.
Skill-description updates are still useful for discoverability:
- bounded-graph-literature-research should mention local .bib snapshots and local PDF attachment resolution.
- pdf should mention theorem-heavy and equation-heavy scientific papers, not only layout review.
The draft bounded-graph skill in .skill_drafts/ has been updated accordingly. Mirroring those changes into the globally installed skill is a separate step.