Pp Archive Is — Workflow
Overview
How the pp-archive-is skill works, step by step.
Source Workflow
Codex skill workflow.
Step-by-step Workflow
archive.today — Printing Press CLI
Prerequisites: Install the CLI
This skill drives the archive-is-pp-cli binary. You must verify the CLI is installed before invoking any command from this skill. If it is missing, install it first:
- Install via the Printing Press installer:
npx -y @mvanhorn/printing-press install archive-is --cli-only - Verify:
archive-is-pp-cli --version - Ensure
$GOPATH/bin(or$HOME/go/bin) is on$PATH.
If the npx install fails (no Node, offline, etc.), fall back to a direct Go install (requires Go 1.26.3 or newer):
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
If --version reports "command not found" after install, the install step did not put the binary on $PATH. Do not proceed with skill commands until verification succeeds.
When to Use This CLI
Reach for this whenever a user wants to archive a URL, read a paywalled article, check whether something was previously archived, or batch-capture a list of URLs for research. Specifically good when:
- A user sends a paywalled link and asks "can you read this" →
readfetches text via archive - They want to preserve a URL that might change →
saveforces a fresh capture - They want historical versions →
historylists all known snapshots - They have 20+ URLs to archive →
bulkruns rate-limited batch archival
Don't reach for this if the URL is trivially scrapeable without archive services (no paywall, robots-allowed, direct HTTP works), or if the user wants the original source rather than a cached version.
Unique Capabilities
The whole CLI is unique — archive.today has no official API. But within this CLI, certain commands are the differentiators.
The hero commands
-
read <url>— Find or create an archive for a URL. Looks up existing snapshots first (Memento timegate → CDX fallback); submits a fresh capture only if nothing exists. The "always do the right thing" command.This is how 90% of agent calls should start. It's idempotent — calling it twice on the same URL doesn't double-submit.
-
get <url> [--format text|html]/tldr <url>— Fetch article text, optionally LLM-summarized. Automatic Wayback fallback when archive.today serves a CAPTCHA (which happens daily to cloud IPs).tldrpipes the fetched text through a summarization step — useful for agent chains where you want a short take without shipping 20KB of HTML back.
Durability operations
-
save <url>— Force a fresh capture via/submit/?url=<x>&anyway=1. Use whenreadreturns an existing snapshot that's too old or missing a paywall update. -
history <url>— List all known snapshots via Memento timemap parsing. Shows every capture date across both archive.today and Wayback. -
bulk [file]— Rate-limited batch archiving from a file or stdin. Reads URLs one per line, submits each with backoff, returns a report of successes / failures / pre-existing.grep -oE 'https?://[^ )]+' notes.md | archive-is-pp-cli bulk -archives every URL in a markdown file. -
request <url>— Fire-and-forget submit with optional wait+poll. Useful for long captures where you want to come back later.
Observability
-
snapshots newest <url>— Just the newest snapshot URL for a target, useful in scripts. -
captures— List your local capture index (post-sync). -
feeds— archive.today's global recent-archives feed. -
--backend archive-is,wayback— Every read/get accepts a backend preference. Defaults to archive-is with Wayback fallback; flip the order for Wayback-primary.
Command Reference
Archive + retrieve:
archive-is-pp-cli read <url>— Find or create (hero command)archive-is-pp-cli get <url>— Fetch article text (with Wayback fallback)archive-is-pp-cli tldr <url>— Fetch + summarizearchive-is-pp-cli save <url>— Force fresh capturearchive-is-pp-cli request <url>— Fire-and-forget submitarchive-is-pp-cli check <url>— Does an archive exist?
Listing + history:
archive-is-pp-cli history <url>— All known snapshotsarchive-is-pp-cli newest <url>— Newest snapshot URLarchive-is-pp-cli captures— Local capture indexarchive-is-pp-cli feeds— Global recent feed
Batch:
archive-is-pp-cli bulk [file]— Batch from file or stdin
Local store:
archive-is-pp-cli sync/archive/export/import— Local SQLite ops
Auth + health:
archive-is-pp-cli auth— Config (no API key needed; auth is a no-op)archive-is-pp-cli doctor— Verify backend reachability
Recipes
Read a paywalled article
archive-is-pp-cli read "https://www.wsj.com/articles/..." --agent
# or: return just the text
archive-is-pp-cli get "https://www.wsj.com/articles/..." --format text --agent
read returns the archive URL (finding existing or creating new). get --format text returns the article body, falling back to Wayback if archive.today CAPTCHAs.
Preserve a URL before it changes
archive-is-pp-cli save "https://example.com/important-page" --agent
archive-is-pp-cli history "https://example.com/important-page" --agent # verify
Force capture, then check history to confirm the new snapshot registered.
Bulk archive a research batch
grep -oE 'https?://[^ )]+' research-notes.md | archive-is-pp-cli bulk - --agent
# or from a file:
archive-is-pp-cli bulk urls.txt --agent
Reads URLs one per line, submits each with exponential backoff, returns per-URL status (archived, pre-existing, failed) as JSON.
Wayback-preferred for a reliable-read
archive-is-pp-cli read "https://ft.com/content/xyz" --backend wayback,archive-is --agent
Use when the Wayback Machine snapshot is known to be cleaner or archive.today is rate-limiting.
Auth Setup
No API key required. Archive.today and Wayback Machine are both public. The auth subcommand exists for consistency but is a no-op — doctor reports "Auth: not required" which is the expected state.
Optional env:
ARCHIVE_IS_BASE_URL— override archive.today host (for mirrors)WAYBACK_BASE_URL— override Wayback Machine host
Agent Mode
Add --agent to any command. Expands to --json --compact --no-input --no-color --yes --no-prompt. Every action command also prints structured next_actions hints on stderr when called non-interactively — the calling agent sees "tried X, got Y, consider Z" automatically.
Notable flags:
--submit-timeout <duration>— max wait for a fresh submit (default10m;0= unbounded)--backend archive-is,wayback— backend preference and fallback order--format text|html—get/tldroutput format
Filtering output
--select accepts dotted paths to descend into nested responses; arrays traverse element-wise:
archive-is-pp-cli <command> --agent --select id,name
archive-is-pp-cli <command> --agent --select items.id,items.owner.name
Use this to narrow huge payloads to the fields you actually need — critical for deeply nested API responses.
Response envelope
Data-layer commands wrap output in {"meta": {...}, "results": <data>}. Parse .results for data and .meta.source to know whether it's live or local. The N results (live) summary is printed to stderr only when stdout is a TTY; piped/agent consumers see pure JSON on stdout.
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 2 | Usage error |
| 3 | Not found (no snapshot exists) |
| 5 | API error (archive.today or Wayback down) |
| 7 | Rate limited (too many submits) |
Installation
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-cli@latest
archive-is-pp-cli doctor
MCP Server
go install github.com/mvanhorn/printing-press-library/library/media-and-entertainment/archive-is/cmd/archive-is-pp-mcp@latest
claude mcp add archive-is-pp-mcp -- archive-is-pp-mcp
Argument Parsing
Given $ARGUMENTS:
- Empty,
help, or--help→ runarchive-is-pp-cli --help install→ CLI;install mcp→ MCP- Anything that looks like a URL, or "archive <url>" / "bypass paywall on <url>" →
read <url> --agentis the default — it's idempotent and covers the 90% case. - "bulk archive" / "archive these" →
bulkfrom stdin if URLs are pasted, else ask for the file path.
Agent Workflow Features
This CLI exposes three shared agent-workflow capabilities patched in from cli-printing-press PR #218.
Named profiles
Persist a set of flags under a name and reuse them across invocations.
# Save the current non-default flags as a named profile
archive-is-pp-cli profile save <name>
# Use a profile — overlays its values onto any flag you don't set explicitly
archive-is-pp-cli --profile <name> <command>
# List / inspect / remove
archive-is-pp-cli profile list
archive-is-pp-cli profile show <name>
archive-is-pp-cli profile delete <name> --yes
Flag precedence: explicit flag > env var > profile > default.
--deliver
Route command output to a sink other than stdout. Useful when an agent needs to hand a result to a file, a webhook, or another process without plumbing.
archive-is-pp-cli <command> --deliver file:/path/to/out.json
archive-is-pp-cli <command> --deliver webhook:https://hooks.example/in
File sinks write atomically (tmp + rename). Webhook sinks POST application/json (or application/x-ndjson when --compact is set). Unknown schemes produce a structured refusal listing the supported set.
feedback
Record in-band feedback about this CLI from the agent side of the loop. Local-only by default; safe to call without configuration.
archive-is-pp-cli feedback "what surprised you or tripped you up"
archive-is-pp-cli feedback list # show local entries
archive-is-pp-cli feedback clear --yes # wipe
Entries append to ~/.archive-is-pp-cli/feedback.jsonl as JSON lines. When ARCHIVE_IS_FEEDBACK_ENDPOINT is set and either --send is passed or ARCHIVE_IS_FEEDBACK_AUTO_SEND=true, the entry is also POSTed upstream (non-blocking — local write always succeeds).
Execution Logic
The skill executes when its trigger fires (slash command, natural-language match, or direct invocation). It reads its references, applies its rules, and produces the documented outputs.
Edge Cases
See the source skill's references/ and scripts/ folders for edge-case handling.
Failure Handling
A skill failure surfaces as a tool error or a partial output; never a silent skip. Re-run with --verbose (where applicable) for diagnostics.
Integration Notes
- Claude — invoked via the
Skilltool withskill: "pp-archive-is". - Codex — referenced from
AGENTS.mdif mirrored. - Antigravity — referenced from the workspace agent rules if mirrored.
- HQ Project — listed on the landing page Skills section + post-login sidebar.
- MD Project (md.sgnk.ai) — file rendered from
Skills/Pp Archive Is/workflow.md. - Obsidian — file rendered with frontmatter + tags.
Usage Examples
Invoke via slash command or natural language matching the skill description.
Source: (none)