paper-fetch
DOI in, PDF out. A Claude Code skill that resolves paper PDFs through Unpaywall, Semantic Scholar, arXiv, PubMed Central, and bioRxiv — legally, across every discipline, with zero dependencies.
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch
Why This Skill
A deterministic, legal-only replacement for ad-hoc "can you find this PDF" requests.
5-Source Fallback Chain
Unpaywall → Semantic Scholar → arXiv → PubMed Central → bioRxiv/medRxiv. Stops at the first hit. If no source has the paper, it reports failure along with the paper's metadata.
All Disciplines
Not just life sciences or CS. Unpaywall + Semantic Scholar cover humanities, social sciences, chemistry, physics, economics — any Crossref DOI.
Legal Only, Always
Never touches Sci-Hub or any paywall bypass. If no OA copy exists, the skill fails honestly and returns the title and authors so you can file an interlibrary loan (ILL) request.
Zero Dependencies
Pure Python 3.8+ standard library. No pip install, no virtualenv, no Node.js — just clone and run.
Batch Mode
Pass a file of DOIs with --batch dois.txt. Output is auto-named author_year_title.pdf for consistent library organization.
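As a rough illustration of the author_year_title.pdf naming pattern, here is a minimal sketch. The helper names (`slugify`, `pdf_filename`), the six-word title limit, and the accent-stripping rule are assumptions made for this example, not the skill's documented behavior, and the sample paper is made up.

```python
import re
import unicodedata


def slugify(text: str, max_words: int = 6) -> str:
    """Lowercase ASCII words joined by underscores (hypothetical rule)."""
    # Decompose accented characters, then drop the non-ASCII combining marks.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "_".join(words[:max_words])


def pdf_filename(first_author_surname: str, year: int, title: str) -> str:
    """Build an author_year_title.pdf name, skipping any empty part."""
    parts = [slugify(first_author_surname, 1), str(year), slugify(title)]
    return "_".join(part for part in parts if part) + ".pdf"


# Made-up metadata, used only to show the shape of the result.
print(pdf_filename("Müller", 2020, "A Study of Open-Access Coverage: Methods & Results"))
```

Truncating the title keeps names short enough for most filesystems while staying recognizable in a directory listing.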
Agent-Native Output
Stable JSON envelope on stdout, NDJSON progress events on stderr, typed exit codes (0/1/3/4), and a machine-readable schema subcommand. Partially successful batches return ok: "partial" with next retry hints, and --idempotency-key replays a previous result. Scored 28/28 on the agent-native CLI rubric.
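For a caller that wants to consume this output, the sketch below shows one way to read a JSON envelope from stdout and NDJSON events from stderr. Only the `ok` field and its `"partial"` value come from the description above. The function names, the assumption that `ok` is `true` on full success, and the sample payloads are hypothetical. Run `fetch.py schema` for the real field list.

```python
import json


def parse_envelope(stdout_text: str) -> dict:
    """Parse the single JSON document written to stdout."""
    return json.loads(stdout_text)


def parse_progress(stderr_text: str) -> list:
    """Parse NDJSON: one JSON value per non-empty line, skipping non-JSON lines."""
    events = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate stray plain-text lines on stderr
    return events


def classify(envelope: dict) -> str:
    """Map the envelope's ok field to a coarse outcome (assumed semantics)."""
    ok = envelope.get("ok")
    if ok is True:
        return "success"
    if ok == "partial":
        return "partial"
    return "failure"


# Hypothetical payloads, shaped only for this example.
sample_stdout = '{"ok": "partial"}'
sample_stderr = '{"event": "start"}\nnot json\n{"event": "done"}\n'
print(classify(parse_envelope(sample_stdout)), len(parse_progress(sample_stderr)))
```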
How It Resolves a DOI
The skill tries each source in order and stops at the first one that returns a valid PDF.
Unpaywall
Queries api.unpaywall.org/v2/{doi} and reads best_oa_location.url_for_pdf. Covers every publisher with an OA copy in any institutional repository. Sends UNPAYWALL_EMAIL as the contact email; the variable is optional, and this source is skipped when it is not set.
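The Unpaywall step can be sketched with the standard library alone. This is an illustration, not the code in fetch.py: the function names are hypothetical, and the network call itself is left out so the example runs offline against a hand-written record.

```python
import os
import urllib.parse

UNPAYWALL_API = "https://api.unpaywall.org/v2/"


def unpaywall_request_url(doi: str, env=os.environ):
    """Return the request URL, or None when UNPAYWALL_EMAIL is not set."""
    email = env.get("UNPAYWALL_EMAIL")
    if not email:
        return None  # the source is skipped without a contact email
    query = urllib.parse.urlencode({"email": email})
    return UNPAYWALL_API + urllib.parse.quote(doi, safe="/") + "?" + query


def pdf_url_from_unpaywall(record: dict):
    """Read best_oa_location.url_for_pdf, tolerating null or missing fields."""
    location = record.get("best_oa_location") or {}
    return location.get("url_for_pdf") or None


# Hand-written record that mimics only the fields used here.
record = {"best_oa_location": {"url_for_pdf": "https://example.org/paper.pdf"}}
print(unpaywall_request_url("10.1038/s41586-020-2649-2", {"UNPAYWALL_EMAIL": "you@example.com"}))
print(pdf_url_from_unpaywall(record))
```

Treating a null `best_oa_location` the same as a missing one lets the caller fall through to the next source without special cases.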
Semantic Scholar
Queries api.semanticscholar.org/graph/v1/paper/DOI:{doi} for the openAccessPdf field and externalIds (arXiv, PMC). Cross-disciplinary academic graph.
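A similar sketch for the Semantic Scholar step, again with hypothetical function names and no network call. It assumes the response nests the PDF link under `openAccessPdf.url` and keys the external IDs as `ArXiv` and `PubMedCentral`; check the Graph API documentation before relying on those spellings.

```python
import urllib.parse

S2_API = "https://api.semanticscholar.org/graph/v1/paper/"


def s2_request_url(doi: str) -> str:
    """Build a lookup URL that asks only for the two fields used below."""
    quoted = urllib.parse.quote(doi, safe="/")
    return S2_API + "DOI:" + quoted + "?fields=openAccessPdf,externalIds"


def extract_s2(record: dict) -> dict:
    """Pull the direct PDF link plus any arXiv or PMC identifiers."""
    open_access = record.get("openAccessPdf") or {}
    external = record.get("externalIds") or {}
    return {
        "pdf_url": open_access.get("url") or None,
        "arxiv_id": external.get("ArXiv"),        # assumed key spelling
        "pmcid": external.get("PubMedCentral"),   # assumed key spelling
    }


# Hand-written record that mimics only the fields used here.
record = {"openAccessPdf": None, "externalIds": {"ArXiv": "2101.00001"}}
print(s2_request_url("10.1038/s41586-020-2649-2"))
print(extract_s2(record))
```

Returning the identifiers even when no direct PDF link exists is what lets the later arXiv and PubMed Central steps run.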
arXiv
If the paper has an arXiv ID, downloads from arxiv.org/pdf/{arxiv_id}.pdf. Covers physics, math, CS, stats, quantitative finance, economics, and EE.
PubMed Central OA
If the paper has a PMCID, downloads from ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/. Biomedical OA subset only.
bioRxiv / medRxiv
If the DOI prefix is 10.1101, queries api.biorxiv.org/details/{server}/{doi} for the latest-version PDF URL. Biology and medicine preprints.
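The ordered, stop-at-first-hit behavior described in this section can be sketched as a loop over resolver functions. The names below (`resolve_first`, the stub resolvers) are hypothetical. The sketch stops at the first URL returned and leaves out the download and PDF validation that the real chain performs, and treating a network error as a miss is an assumption.

```python
from typing import Callable, Iterable, Optional, Tuple

Resolver = Callable[[str], Optional[str]]


def resolve_first(doi: str, resolvers: Iterable[Tuple[str, Resolver]]) -> dict:
    """Try each (name, resolver) pair in order and stop at the first URL."""
    tried = []
    for name, resolver in resolvers:
        tried.append(name)
        try:
            url = resolver(doi)
        except OSError:
            url = None  # assumed: a network error counts as a miss
        if url:
            return {"source": name, "pdf_url": url, "tried": tried}
    return {"source": None, "pdf_url": None, "tried": tried}


# Stub resolvers standing in for the real sources.
def miss(doi: str) -> Optional[str]:
    return None


def broken(doi: str) -> Optional[str]:
    raise OSError("simulated network failure")


def hit(doi: str) -> Optional[str]:
    return "https://arxiv.org/pdf/2101.00001.pdf"


chain = [("unpaywall", miss), ("semantic_scholar", broken), ("arxiv", hit), ("pmc", miss)]
print(resolve_first("10.1234/example", chain))
```

Recording the sources that were tried gives a failure report something concrete to show when every resolver misses.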
Discipline Coverage
Works for every field, not just life sciences or CS. Coverage depends on OA availability, not subject area.
| Source | Discipline Scope |
|---|---|
| Unpaywall | All disciplines — every Crossref DOI (humanities, social sciences, physics, chemistry, economics, …) |
| Semantic Scholar | All disciplines — cross-domain academic graph |
| arXiv | Physics, math, CS, statistics, quantitative finance, economics, EE |
| PubMed Central | Biomedical only |
| bioRxiv / medRxiv | Biology / medicine preprints only |
In practice, Unpaywall + Semantic Scholar alone cover OA papers in chemistry, materials, economics, psychology, and humanities via institutional repositories, SSRN, RePEc, and publisher-hosted OA copies. arXiv/PMC/bioRxiv are additional fallbacks for their specific domains. If no legal OA copy exists, the skill reports failure honestly — it will never bypass paywalls regardless of discipline.
vs Native Agent
What you get with the skill vs prompting an LLM to "find this PDF."
| Feature | Native Agent | This Skill |
|---|---|---|
| DOI resolution strategy | Ad-hoc web search | ✓ Deterministic 5-source chain |
| Unpaywall integration | ✗ | ✓ Highest OA hit rate |
| arXiv / PMC / bioRxiv fallback | Manual | ✓ Automatic |
| Batch download | ✗ | ✓ --batch dois.txt or --batch - (stdin) |
| Consistent filenames | ✗ | ✓ author_year_title.pdf |
| Agent-native JSON output | ✗ | ✓ Stable envelope + NDJSON progress |
| Machine-readable schema | ✗ | ✓ fetch.py schema |
| Idempotent retries | ✗ | ✓ --idempotency-key replays original envelope |
| Typed exit codes | ✗ | ✓ 0/1/3/4 route failures deterministically |
| Dry-run preview | ✗ | ✓ --dry-run resolves without downloading |
| Host allowlist safety | ✗ | ✓ Restricted to known OA domains |
| 50 MB size cap | ✗ | ✓ Prevents runaway downloads |
| PDF header validation | ✗ | ✓ Rejects HTML landing pages |
| Legal-only guarantee | None | ✓ Hard refuses paywall bypass |
| Dependencies | Varies | ✓ Python stdlib only |
| Works across all disciplines | Varies | ✓ Any field |
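The three download safeguards in the table (host allowlist, 50 MB cap, PDF header validation) can be sketched as small checks. The function names are hypothetical, the two-host allowlist is a shortened stand-in for the skill's real list, and reading 50 MB as 50 × 1024 × 1024 bytes is an assumption.

```python
from urllib.parse import urlparse

MAX_BYTES = 50 * 1024 * 1024  # assumed interpretation of the 50 MB cap
ALLOWED_HOSTS = {"arxiv.org", "ncbi.nlm.nih.gov"}  # illustrative subset only


def host_allowed(url: str, allowed=ALLOWED_HOSTS) -> bool:
    """Accept a listed host or any of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == name or host.endswith("." + name) for name in allowed)


def within_size_cap(byte_count: int, limit: int = MAX_BYTES) -> bool:
    """Reject empty bodies and anything above the cap."""
    return 0 < byte_count <= limit


def looks_like_pdf(first_bytes: bytes) -> bool:
    """PDF files begin with the marker %PDF-; HTML landing pages do not."""
    return first_bytes.startswith(b"%PDF-")


print(host_allowed("https://arxiv.org/pdf/2101.00001.pdf"))
print(looks_like_pdf(b"<!DOCTYPE html><html>"))
```

Matching on the exact host or a dot-prefixed suffix avoids accepting look-alike domains such as `evilarxiv.org`.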
Install
Pick your platform. Takes 10 seconds. No pip install required.
# Global install (available in all projects)
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch
# Project-level install
git clone https://github.com/Agents365-ai/paper-fetch.git .claude/skills/paper-fetch
# Optional: set Unpaywall contact email for highest hit rate
export UNPAYWALL_EMAIL=you@example.com

# Via ClawHub registry
clawhub install paper-fetch
# Manual install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.openclaw/skills/paper-fetch

# User-level install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.pimo/skills/paper-fetch

# User-level install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.agents/skills/paper-fetch
# Project-level install
git clone https://github.com/Agents365-ai/paper-fetch.git .agents/skills/paper-fetch

# Global install under research category
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.hermes/skills/research/paper-fetch
# Or add to ~/.hermes/config.yaml
skills:
  external_dirs:
    - ~/myskills/paper-fetch
# Via SkillsMP CLI
skills install paper-fetch
Usage
Call directly from the command line, or just ask your agent in natural language.
# Single DOI (auto-detects TTY: JSON when piped, text in a terminal)
python scripts/fetch.py 10.1038/s41586-020-2649-2
# Force human-readable output
python scripts/fetch.py 10.1038/s41586-020-2649-2 --format text
# Dry-run preview — resolve without downloading
python scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run
# Batch mode — one DOI per line
python scripts/fetch.py --batch dois.txt --out ~/papers
# Pipe DOIs from another tool
echo 10.1038/s41586-021-03819-2 | python scripts/fetch.py --batch -
# Safely retriable batch (a retry with the same key replays the original envelope, with no network I/O)
python scripts/fetch.py --batch dois.txt --out ~/papers \
--idempotency-key monday-review-batch
# Agent discovery — machine-readable CLI schema
python scripts/fetch.py schema --pretty
Or just ask your agent: "Download the AlphaFold2 paper PDF to my ~/papers folder."