5-Source Legal OA Fallback Chain

paper-fetch

DOI in, PDF out. A Claude Code skill that resolves paper PDFs through Unpaywall, Semantic Scholar, arXiv, PubMed Central, and bioRxiv — legally, across every discipline, with zero dependencies.

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch


Why This Skill

A deterministic, legal-only replacement for ad-hoc "can you find this PDF" requests.

🔗

5-Source Fallback Chain

Unpaywall → Semantic Scholar → arXiv → PubMed Central → bioRxiv/medRxiv. Stops at the first hit; if no source has an OA copy, reports failure along with the paper's metadata.

🌐

All Disciplines

Not just life sciences or CS. Unpaywall + Semantic Scholar cover humanities, social sciences, chemistry, physics, economics — any Crossref DOI.

🛡️

Legal Only, Always

Never touches Sci-Hub or any paywall bypass. If no OA copy exists, the skill fails honestly and returns the title and authors so you can file an interlibrary loan (ILL) request.

⚡

Zero Dependencies

Pure Python 3.8+ standard library. No pip install, no virtualenv, no Node.js — just clone and run.

📦

Batch Mode

Pass a file of DOIs with --batch dois.txt. Output is auto-named author_year_title.pdf for consistent library organization.
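As a rough illustration of the author_year_title.pdf naming scheme, the sketch below builds such a name from basic metadata. The helper name `make_filename`, the choice of the first author's surname, the hyphen separator inside the title, and the six-word title limit are all assumptions for illustration; the skill's actual rules may differ.

```python
import re
import unicodedata


def make_filename(first_author: str, year: int, title: str,
                  max_title_words: int = 6) -> str:
    """Build an author_year_title.pdf name (hypothetical helper)."""

    def slug(text: str) -> str:
        # Fold accents to ASCII, lowercase, and collapse everything that is
        # not a letter or digit into single hyphens.
        ascii_text = (unicodedata.normalize("NFKD", text)
                      .encode("ascii", "ignore").decode("ascii"))
        return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

    parts = first_author.split()
    # Assumption: the last whitespace-separated token is the surname.
    surname = slug(parts[-1]) if parts else "unknown"
    title_part = "-".join(slug(title).split("-")[:max_title_words])
    return f"{surname}_{year}_{title_part}.pdf"


print(make_filename("John Jumper", 2021,
                    "Highly accurate protein structure prediction with AlphaFold"))
```

Truncating the title keeps names short enough for most filesystems while staying recognizable in a directory listing.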

🤖

Agent-Native Output

Stable JSON envelope on stdout, NDJSON progress events on stderr, typed exit codes (0/1/3/4), a machine-readable schema subcommand, an ok: "partial" status with next retry hints for batches that only partly succeed, and --idempotency-key replay. Scored 28/28 on the agent-native CLI rubric.
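A calling agent might consume the two streams roughly as follows. This is a minimal sketch: only the `ok` key and its `"partial"` value come from the description above. That `ok` is `true` on full success, and every other field name in the sample data (`results`, `event`), are assumptions made for illustration.

```python
import json


def parse_run(stdout_text: str, stderr_text: str):
    """Split one run's output into the final envelope and its progress events."""
    envelope = json.loads(stdout_text)
    events = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError:
            # Tolerate stray non-JSON lines such as interpreter warnings.
            continue
    return envelope, events


def needs_retry(envelope: dict) -> bool:
    """True unless the envelope reports full success (assumed to be ok == true)."""
    return envelope.get("ok") is not True


# Hypothetical sample output; real field names may differ.
sample_stdout = '{"ok": "partial", "results": []}'
sample_stderr = '{"event": "start"}\nnot json\n{"event": "done"}\n'
envelope, events = parse_run(sample_stdout, sample_stderr)
print(envelope["ok"], len(events), needs_retry(envelope))
```

Keeping the final result on stdout and progress on stderr means a caller can parse stdout as a single JSON document without filtering out progress lines.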

How It Resolves a DOI

The skill tries each source in order and stops at the first one that returns a valid PDF.
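The first-hit-wins behavior can be sketched as a loop over resolver functions. This is an illustration under stated assumptions, not the skill's code: `resolve_first`, the resolver signature, and the keys of the returned dict are hypothetical, and the sketch stops at the first returned URL without validating the PDF.

```python
from typing import Callable, Iterable, Optional, Tuple

# A resolver takes a DOI and returns a PDF URL, or None when it has no OA copy.
Resolver = Callable[[str], Optional[str]]


def resolve_first(doi: str, resolvers: Iterable[Tuple[str, Resolver]]) -> dict:
    """Try each (name, resolver) pair in order and return the first hit."""
    tried = []
    for name, resolver in resolvers:
        tried.append(name)
        try:
            url = resolver(doi)
        except Exception:
            # A failing source should not abort the chain; move to the next one.
            url = None
        if url:
            return {"ok": True, "source": name, "pdf_url": url, "tried": tried}
    return {"ok": False, "source": None, "pdf_url": None, "tried": tried}


# Stub resolvers standing in for the real sources.
chain = [
    ("unpaywall", lambda doi: None),
    ("semantic_scholar", lambda doi: "https://example.org/paper.pdf"),
    ("arxiv", lambda doi: "https://example.org/unused.pdf"),
]
print(resolve_first("10.1000/example", chain))
```

Recording which sources were tried makes a failure report more useful, since the caller can see that every source was consulted before giving up.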

Unpaywall

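Queries api.unpaywall.org/v2/{doi} and reads best_oa_location.url_for_pdf. Covers papers from any publisher that have an indexed OA copy, including copies held in institutional repositories. Uses UNPAYWALL_EMAIL as the contact email; the variable is optional, and this source is skipped when it is not set.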
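A minimal sketch of this step using only the standard library. The function names are hypothetical; the endpoint and the `best_oa_location.url_for_pdf` path come from the description above, and the `email` query parameter is how the Unpaywall API receives the contact address.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import quote, urlencode


def unpaywall_url(doi: str, email: str) -> str:
    """Build the lookup URL; the DOI goes in the path, the email in the query."""
    return (f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}"
            f"?{urlencode({'email': email})}")


def pdf_from_unpaywall(payload: dict) -> Optional[str]:
    """Return best_oa_location.url_for_pdf, or None when there is no OA PDF."""
    best = payload.get("best_oa_location") or {}  # the field may be null
    return best.get("url_for_pdf") or None


def fetch_unpaywall_pdf_url(doi: str, email: str, timeout: float = 30.0) -> Optional[str]:
    """Perform the request (network access required; not run in this sketch)."""
    with urllib.request.urlopen(unpaywall_url(doi, email), timeout=timeout) as resp:
        return pdf_from_unpaywall(json.load(resp))


print(unpaywall_url("10.1038/s41586-020-2649-2", "you@example.com"))
```

Separating URL construction and response parsing from the network call keeps both easy to check without hitting the API.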

Semantic Scholar

Queries api.semanticscholar.org/graph/v1/paper/DOI:{doi} for the openAccessPdf field and externalIds (arXiv, PMC). Cross-disciplinary academic graph.
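The lookup and the fields it reads might look like the sketch below. The function names are hypothetical. The `fields` query parameter, the `url` key inside `openAccessPdf`, and the `ArXiv` and `PubMedCentral` key names inside `externalIds` reflect the public Graph API as commonly documented, but should be treated as assumptions and checked against the current API reference.

```python
from typing import Dict, Optional
from urllib.parse import quote


def s2_url(doi: str) -> str:
    """Build the paper lookup URL, requesting only the two fields used here."""
    return ("https://api.semanticscholar.org/graph/v1/paper/"
            f"DOI:{quote(doi, safe='/')}?fields=openAccessPdf,externalIds")


def read_s2(payload: dict) -> Dict[str, Optional[str]]:
    """Extract the direct PDF link plus IDs that later sources can use."""
    oa = payload.get("openAccessPdf") or {}   # may be null when no OA PDF
    ids = payload.get("externalIds") or {}
    return {
        "pdf_url": oa.get("url") or None,
        "arxiv_id": ids.get("ArXiv"),          # assumed key name
        "pmcid": ids.get("PubMedCentral"),     # assumed key name
    }


sample = {"openAccessPdf": None, "externalIds": {"ArXiv": "2006.10256"}}
print(s2_url("10.1038/s41586-020-2649-2"))
print(read_s2(sample))
```

Even when there is no direct PDF link, the external IDs are worth keeping, because they are what the arXiv and PubMed Central steps need.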

arXiv

If the paper has an arXiv ID, downloads from arxiv.org/pdf/{arxiv_id}.pdf. Covers physics, math, CS, stats, quantitative finance, economics, and EE.

PubMed Central OA

If the paper has a PMCID, downloads from ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/. Biomedical OA subset only.
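For the two identifier-based sources above (arXiv and PubMed Central), the download URL follows directly from the ID. The sketch below uses the URL patterns quoted in the text; the function names, the `https://` scheme, and the ID normalization (stripping an `arXiv:` prefix, accepting a PMCID with or without the `PMC` prefix) are assumptions for illustration.

```python
import re


def arxiv_pdf_url(arxiv_id: str) -> str:
    """Build arxiv.org/pdf/{arxiv_id}.pdf, tolerating an 'arXiv:' prefix."""
    cleaned = re.sub(r"^arxiv:", "", arxiv_id.strip(), flags=re.IGNORECASE)
    if not cleaned:
        raise ValueError("empty arXiv ID")
    return f"https://arxiv.org/pdf/{cleaned}.pdf"


def pmc_pdf_url(pmcid: str) -> str:
    """Build ncbi.nlm.nih.gov/pmc/articles/{pmcid}/pdf/ from 'PMC123' or '123'."""
    digits = re.sub(r"^pmc", "", pmcid.strip(), flags=re.IGNORECASE)
    if not digits.isdigit():
        raise ValueError(f"not a PMCID: {pmcid!r}")
    return f"https://ncbi.nlm.nih.gov/pmc/articles/PMC{digits}/pdf/"


print(arxiv_pdf_url("arXiv:2006.10256"))
print(pmc_pdf_url("7505768"))
```

Normalizing the ID first matters because upstream metadata does not always carry the same prefix convention.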

bioRxiv / medRxiv

If the DOI prefix is 10.1101, queries api.biorxiv.org/details/{server}/{doi}, where {server} is biorxiv or medrxiv, for the latest-version PDF URL. Biology and medicine preprints.
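The prefix check and latest-version selection could be sketched like this. The function names are hypothetical, and the response shape (a `collection` list whose records carry a `version` value) is an assumption about the bioRxiv API. How the PDF URL is derived from the chosen record is not specified above, so the sketch stops at picking the record.

```python
from typing import Optional


def is_biorxiv_doi(doi: str) -> bool:
    """True when the DOI carries the 10.1101 prefix."""
    return doi.strip().lower().startswith("10.1101/")


def biorxiv_api_url(server: str, doi: str) -> str:
    """Build the details URL for one of the two preprint servers."""
    if server not in ("biorxiv", "medrxiv"):
        raise ValueError(f"unknown server: {server!r}")
    return f"https://api.biorxiv.org/details/{server}/{doi}"


def latest_version(payload: dict) -> Optional[dict]:
    """Pick the record with the highest version number (assumed response shape)."""
    records = payload.get("collection") or []
    if not records:
        return None
    return max(records, key=lambda rec: int(rec.get("version", 0)))


sample = {"collection": [{"version": "1"}, {"version": "3"}, {"version": "2"}]}
print(biorxiv_api_url("biorxiv", "10.1101/2020.01.01.123456"))
print(latest_version(sample))
```

Because the same DOI prefix serves both servers, a caller would likely try one server and fall back to the other when the collection comes back empty.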

Discipline Coverage

Works for every field, not just life sciences or CS. Coverage depends on OA availability, not subject area.

Source	Discipline Scope
Unpaywall	All disciplines — every Crossref DOI (humanities, social sciences, physics, chemistry, economics, …)
Semantic Scholar	All disciplines — cross-domain academic graph
arXiv	Physics, math, CS, statistics, quantitative finance, economics, EE
PubMed Central	Biomedical only
bioRxiv / medRxiv	Biology / medicine preprints only

In practice, Unpaywall + Semantic Scholar alone cover OA papers in chemistry, materials, economics, psychology, and humanities via institutional repositories, SSRN, RePEc, and publisher-hosted OA copies. arXiv/PMC/bioRxiv are additional fallbacks for their specific domains. If no legal OA copy exists, the skill reports failure honestly — it will never bypass paywalls regardless of discipline.

vs Native Agent

What you get with the skill vs prompting an LLM to "find this PDF."

Feature	Native Agent	This Skill
DOI resolution strategy	Ad-hoc web search	✓ Deterministic 5-source chain
Unpaywall integration	✗	✓ Highest OA hit rate
arXiv / PMC / bioRxiv fallback	Manual	✓ Automatic
Batch download	✗	✓ `--batch dois.txt` or `--batch -` (stdin)
Consistent filenames	✗	✓ `author_year_title.pdf`
Agent-native JSON output	✗	✓ Stable envelope + NDJSON progress
Machine-readable schema	✗	✓ `fetch.py schema`
Idempotent retries	✗	✓ `--idempotency-key` replays original envelope
Typed exit codes	✗	✓ `0`/`1`/`3`/`4` route failures deterministically
Dry-run preview	✗	✓ `--dry-run` resolves without downloading
Host allowlist safety	✗	✓ Restricted to known OA domains
50 MB size cap	✗	✓ Prevents runaway downloads
PDF header validation	✗	✓ Rejects HTML landing pages
Legal-only guarantee	None	✓ Hard refuses paywall bypass
Dependencies	Varies	✓ Python stdlib only
Works across all disciplines	Varies	✓ Any field
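The three download safeguards in the table (host allowlist, 50 MB cap, PDF header validation) can be illustrated with a few small checks. This is a sketch under stated assumptions: the allowlist contents are illustrative rather than the skill's actual list, 50 MB is taken as 50 × 1024 × 1024 bytes, and the function names are hypothetical.

```python
from urllib.parse import urlparse

MAX_BYTES = 50 * 1024 * 1024  # assumed interpretation of "50 MB"

# Illustrative only; the skill's real allowlist is not given in this document.
ALLOWED_HOSTS = frozenset({"arxiv.org", "ncbi.nlm.nih.gov", "biorxiv.org", "medrxiv.org"})


def host_allowed(url: str, allowed=ALLOWED_HOSTS) -> bool:
    """Accept a listed host or any of its subdomains, and nothing else."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed)


def looks_like_pdf(head: bytes) -> bool:
    """A PDF starts with the '%PDF-' magic bytes; an HTML landing page does not."""
    return head.startswith(b"%PDF-")


def within_size_cap(n_bytes: int, cap: int = MAX_BYTES) -> bool:
    """True while the downloaded byte count stays at or under the cap."""
    return n_bytes <= cap


print(host_allowed("https://www.biorxiv.org/content/x.pdf"))
print(host_allowed("https://evil-arxiv.org/x.pdf"))
print(looks_like_pdf(b"<!DOCTYPE html>"))
```

Matching on the exact host or a dot-prefixed suffix avoids accepting look-alike domains that merely end in an allowed name.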

Install

Pick your platform. Takes 10 seconds. No pip install required.

# Global install (available in all projects)
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch

# Project-level install
git clone https://github.com/Agents365-ai/paper-fetch.git .claude/skills/paper-fetch

# Optional: set Unpaywall contact email for highest hit rate
export UNPAYWALL_EMAIL=you@example.com

# Via ClawHub registry
clawhub install paper-fetch

# Manual install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.openclaw/skills/paper-fetch

# User-level install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.pimo/skills/paper-fetch

# User-level install
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.agents/skills/paper-fetch

# Project-level install
git clone https://github.com/Agents365-ai/paper-fetch.git .agents/skills/paper-fetch

# Global install under research category
git clone https://github.com/Agents365-ai/paper-fetch.git ~/.hermes/skills/research/paper-fetch

# Or add to ~/.hermes/config.yaml
skills:
  external_dirs:
    - ~/myskills/paper-fetch

# Via SkillsMP CLI
skills install paper-fetch

Usage

Call directly from the command line, or just ask your agent in natural language.

# Single DOI (auto-detects TTY: JSON when piped, text in a terminal)
python scripts/fetch.py 10.1038/s41586-020-2649-2

# Force human-readable output
python scripts/fetch.py 10.1038/s41586-020-2649-2 --format text

# Dry-run preview — resolve without downloading
python scripts/fetch.py 10.1038/s41586-020-2649-2 --dry-run

# Batch mode — one DOI per line
python scripts/fetch.py --batch dois.txt --out ~/papers

# Pipe DOIs from another tool
echo 10.1038/s41586-021-03819-2 | python scripts/fetch.py --batch -

# Safely retriable batch (a retry with the same key replays the original envelope, with no network I/O)
python scripts/fetch.py --batch dois.txt --out ~/papers \
    --idempotency-key monday-review-batch

# Agent discovery — machine-readable CLI schema
python scripts/fetch.py schema --pretty

Or just ask your agent: "Download the AlphaFold2 paper PDF to my ~/papers folder."