Bulk import from CSV
Script-driven import of a list of URLs or text snippets into Graniite. For one-shot migrations and scheduled catch-ups.
You have a CSV (or any list) of URLs you want to ingest into Graniite all at once. Common reasons:
- Migrating from another tool, such as Readwise, Pocket, Notion, Raindrop, or Instapaper. Export your saved items as CSV, run this script, and walk away.
- Backfilling: you finally set up Graniite and want to import every podcast episode you ever saved.
- Scheduled catch-up: run a list of YouTube channels through ingest weekly via cron.
The recipe is a small Node.js script (TypeScript optional). The same shape works in Python, Go, or shell with curl and jq.
What you are building
[some-source.csv]
|
v
[parse to a list of URLs]
|
v
[for each: POST /api/v1/ingest, with rate-limit handling]
|
v
[About 3 hours later, given the 60-per-hour cap: 200 items in your Graniite library]
Prerequisites
- A CSV of URLs. The column header does not matter: the script reads only the first column, and any other columns are ignored.
- Node.js 20 or later installed locally, or use the Python version below.
- A Graniite token. Mint at /settings/integrations:
  - Capabilities: read and write
  - Access: pick the folder you want items to land in, or whole library if you will file later.
  - Auto-revoke after: shorter is fine. 30 days is reasonable for a one-shot job.
Save the token in your shell as GRANIITE_TOKEN:
export GRANIITE_TOKEN="lev_…"
Example CSV
Save as urls.csv:
url,title,note
https://www.youtube.com/watch?v=...,Karpathy on training,backfill
https://example.com/article/...,Long-form essay on RLHF,
https://podcast.example.com/ep/42,,
Only the first column matters for the script. The rest is ignored, but you can extend the script to use it.
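If an export quotes its first column or puts commas inside fields, a real CSV parser is safer than splitting each line on commas. The following is a minimal sketch using Python's standard csv module. The helper name first_column_urls is illustrative and not part of Graniite.

```python
import csv
import io


def first_column_urls(csv_text: str) -> list[str]:
    """Return the first-column values that look like URLs.

    The header row and blank lines are dropped because they do not
    start with "http".
    """
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # completely empty line
        cell = row[0].strip()
        if cell.startswith("http"):
            urls.append(cell)
    return urls


sample = (
    "url,title,note\n"
    '"https://example.com/a?x=1,2","Essay, part 1",backfill\n'
    "https://podcast.example.com/ep/42,,\n"
)
print(first_column_urls(sample))
```

The quoted first row survives intact here, whereas a plain split on commas would cut the URL at its embedded comma.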
Node.js script
Save as bulk-import.mjs:
import { readFile } from 'node:fs/promises';

const TOKEN = process.env.GRANIITE_TOKEN;
if (!TOKEN) throw new Error('Set GRANIITE_TOKEN env var');

const SOURCE = process.argv[2] ?? 'urls.csv';
const CONCURRENCY = 2; // ingest writes are 60/hr per token
const PAUSE_BETWEEN_MS = 1000; // pacing only; the 60/hr cap is handled by the 429 retry in ingest()

// Parse the CSV. Naive first-column read: it breaks on quoted fields
// that contain commas or newlines. For complex CSVs use a real parser
// (csv-parse, papaparse).
const rows = (await readFile(SOURCE, 'utf8'))
  .split(/\r?\n/)
  .map((line) => line.split(',')[0].trim().replace(/^"|"$/g, ''))
  .filter((url) => url.startsWith('http')); // also drops the header row and blank lines

console.log(`Importing ${rows.length} URLs from ${SOURCE}`);
let done = 0;
let failed = 0;
const failures = [];

async function ingest(url) {
  const res = await fetch('https://graniite.co/api/v1/ingest', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${TOKEN}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      kind: 'url',
      url,
      in_kb: true,
      auto_transform: false,
    }),
  });
  if (res.status === 429) {
    // Rate-limited. Sleep for the advertised window, then retry.
    const body = await res.json().catch(() => ({}));
    const wait = (body.retry_after_sec ?? 60) * 1000;
    console.log(` rate-limited; sleeping ${wait / 1000}s`);
    await new Promise((r) => setTimeout(r, wait));
    // Recurse. Each retry sleeps the full retry_after window first, so
    // this ends as soon as the server stops answering 429.
    return ingest(url);
  }
  if (!res.ok) {
    const body = await res.json().catch(() => ({}));
    throw new Error(`${res.status}: ${body.error ?? 'unknown'} (${url})`);
  }
  return res.json();
}
// Process N at a time. Each worker pulls the next URL off the shared
// queue, so at most CONCURRENCY requests are in flight.
const queue = rows.slice();

async function worker() {
  while (queue.length > 0) {
    const url = queue.shift();
    try {
      const result = await ingest(url);
      done++;
      // Guard against a missing status so a successful ingest is not
      // logged as a failure.
      const status = String(result.status ?? '');
      console.log(` ok ${done}/${rows.length} ${status.padEnd(13)} ${url.slice(0, 60)}`);
    } catch (e) {
      failed++;
      failures.push({ url, error: e.message });
      console.log(` err ${url}: ${e.message}`);
    }
    await new Promise((r) => setTimeout(r, PAUSE_BETWEEN_MS));
  }
}

await Promise.all(Array.from({ length: CONCURRENCY }, worker));

console.log(`\nDone. Imported ${done}, failed ${failed}.`);
if (failures.length) {
  console.log('Failures:');
  for (const f of failures) console.log(` ${f.url}\n ${f.error}`);
}
Run it:
node bulk-import.mjs urls.csv
You will see progress per URL:
Importing 132 URLs from urls.csv
ok 1/132 ready https://www.youtube.com/watch?v=...
ok 2/132 transcribing https://podcast.example.com/ep/42
ok 3/132 ready https://example.com/article/long-form-essay
...
ready means the item is available immediately. transcribing means the paid pipeline is running and the item will be available within minutes.
Python equivalent
import csv
import os
import sys
import time

import requests

TOKEN = os.environ['GRANIITE_TOKEN']
SOURCE = sys.argv[1] if len(sys.argv) > 1 else 'urls.csv'

# Read the first column of every row. The header row and blank lines are
# dropped because they do not start with "http".
with open(SOURCE, newline='') as f:
    urls = [row[0].strip() for row in csv.reader(f)
            if row and row[0].strip().startswith('http')]

print(f'Importing {len(urls)} URLs')
failures = []

for i, url in enumerate(urls, 1):
    while True:
        r = requests.post(
            'https://graniite.co/api/v1/ingest',
            headers={'Authorization': f'Bearer {TOKEN}'},
            json={'kind': 'url', 'url': url, 'in_kb': True, 'auto_transform': False},
            timeout=30,
        )
        if r.status_code == 429:
            # Rate-limited: sleep for the advertised window, then retry.
            try:
                wait = r.json().get('retry_after_sec', 60)
            except ValueError:
                wait = 60  # body was not JSON
            print(f' rate-limited; sleeping {wait}s')
            time.sleep(wait)
            continue
        break
    if r.ok:
        print(f' ok {i}/{len(urls)} {r.json()["status"]:13} {url[:60]}')
    else:
        failures.append((url, r.text))
        print(f' err {url}: {r.text}')
    time.sleep(1)

print(f'\nDone. {len(urls) - len(failures)} succeeded, {len(failures)} failed.')
Trade-offs and gotchas
Rate limits
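POST /api/v1/ingest is capped at 60 per hour per token. The scripts above pause 1 second between calls, which is far faster than the cap allows (a theoretical maximum of 3,600 per hour for a single worker). The first 60 calls go through; the next one returns 429, and the script sleeps for the retry_after_sec value in the response before retrying.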
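For unattended runs it can help to bound the 429 retry so that a misbehaving response cannot keep a job alive forever. The sketch below is one way to do that, not the documented behavior of either script. post_with_retry, send, and max_retries are hypothetical names; only retry_after_sec and the 60-second fallback come from the scripts above. The request and the sleep are passed in as callables so the logic can be exercised without network access.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, dict]],
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() until it returns a non-429 status or retries run out.

    send is a hypothetical callable that performs one POST and returns
    (status_code, parsed_json_body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # give up; the caller records this URL as failed
        sleep(body.get("retry_after_sec", 60))
    return status, body


# Simulated server: two 429 responses, then success.
responses = iter([
    (429, {"retry_after_sec": 3}),
    (429, {}),
    (200, {"status": "ready"}),
])
slept = []
result = post_with_retry(lambda: next(responses), sleep=slept.append)
print(result, slept)
```

When the retries are exhausted the function returns the last 429 response, so the caller can add the URL to its failures list like any other error.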
For lists of hundreds of URLs, plan for:
- 132 URLs: about 2 hours of wall-clock with rate limiting.
- 1,000 URLs: about 16 hours.
Run overnight for large imports. If urgency matters, mint two tokens and split the list.
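The estimates above can be reproduced with a little arithmetic. This sketch assumes the cap resets on a fixed hourly window, which the text does not state, so treat the result as a rough planning figure. Both function names are illustrative.

```python
import math


def estimated_wait_hours(n_urls: int, per_hour_cap: int = 60, tokens: int = 1) -> int:
    """Whole hours spent waiting on the cap, assuming a fixed hourly window.

    Each token sends per_hour_cap URLs at the start of every window, so
    only the windows after the first one add wall-clock time.
    """
    per_token = math.ceil(n_urls / tokens)
    windows = math.ceil(per_token / per_hour_cap)
    return max(windows - 1, 0)


def split_for_tokens(urls: list[str], tokens: int) -> list[list[str]]:
    """Deal URLs round-robin into one list per token."""
    return [urls[i::tokens] for i in range(tokens)]


print(estimated_wait_hours(132))             # 2
print(estimated_wait_hours(1000))            # 16
print(estimated_wait_hours(1000, tokens=2))  # 8
print(split_for_tokens(["a", "b", "c", "d", "e"], 2))
```

Splitting the list across two tokens roughly halves the wait because the cap applies per token.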
Transcription quota
If your CSV contains many YouTube and podcast URLs, each one counts against your monthly transcription minutes. A 50-URL podcast batch can exhaust your quota.
Check current usage at /settings under Usage. If you are about to exceed quota, either upgrade the tier or pause the import.
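Before starting a large run, a quick count of the URLs likely to need transcription gives a rough idea of quota exposure. The sketch below only recognizes YouTube hostnames. That host list is an assumption, podcast hosts vary too much to enumerate, and the count says nothing about minutes per item.

```python
from urllib.parse import urlparse

# Assumption: these hosts are the ones that trigger transcription.
# Podcast hosts vary, so extend this set for your own sources.
MEDIA_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def count_media_urls(urls: list[str], media_hosts: set[str] = MEDIA_HOSTS) -> int:
    """Count URLs whose hostname is in media_hosts."""
    return sum(1 for u in urls if (urlparse(u).hostname or "") in media_hosts)


urls = [
    "https://www.youtube.com/watch?v=abc",
    "https://youtu.be/abc",
    "https://example.com/article",
]
print(count_media_urls(urls))  # 2
```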
Idempotency
The script does not track "already ingested." Running it twice on the same CSV creates duplicate user_items rows (with the same underlying contents row for YouTube, and separate per-user rows for non-YouTube).
For idempotent imports, query GET /api/v1/items?per_page=100 ahead of time, build a set of existing URLs, and skip those:
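const existing = new Set();
let page = 1;
let seen = 0;
while (true) {
  const r = await fetch(`https://graniite.co/api/v1/items?per_page=100&page=${page}`, {
    headers: { Authorization: `Bearer ${TOKEN}` },
  });
  if (!r.ok) throw new Error(`items request failed: ${r.status}`);
  const { items, total } = await r.json();
  for (const it of items) if (it.url) existing.add(it.url);
  // Count rows fetched, not unique URLs: duplicates or items without a
  // URL would otherwise keep existing.size below total forever.
  seen += items.length;
  if (items.length === 0 || seen >= total) break;
  page++;
}
const remaining = rows.filter((url) => !existing.has(url));
Failed URLs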
YouTube videos that have been deleted, podcasts behind paywalls, and articles requiring login will return 400 from /ingest with a descriptive error. The script captures them in the failures list so you can review at the end.
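To keep the failures for review or a later retry, they can be written to a CSV with the URL in the first column, which is the shape both scripts already read. This is a minimal sketch for the Python version, where failures is a list of (url, error) pairs. The helper name and the failed.csv filename are illustrative.

```python
import csv


def write_failures(failures: list[tuple[str, str]], path: str) -> int:
    """Write (url, error) pairs to a CSV with the URL in the first column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["url", "error"])
        writer.writerows(failures)
    return len(failures)
```

Calling write_failures(failures, "failed.csv") at the end of a run produces a file that can be passed back to the script as its first argument once the underlying problem is fixed. URLs that fail for permanent reasons, such as a deleted video, will fail again.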
Migrating from Readwise, Pocket, or Notion
Each tool's CSV export shape is slightly different. Common columns:
- Readwise: URL column. The full text is exported as separate columns (Highlights). To ingest the highlight text instead of the source URL, modify the script to use kind: "text" with the highlight body.
- Pocket: url column. URL ingest works directly.
- Notion: varies by export type. A CSV with a URL column works the same as Pocket.
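For the Readwise case, the change amounts to building a different request body per row. The sketch below shows the shape only. The column name Highlights follows the description above, but the body field name text is an assumption, as is reusing in_kb and auto_transform for text items. Check your export's header row and the endpoint reference before relying on any of them.

```python
import csv
import io


def readwise_payloads(csv_text: str, text_column: str = "Highlights",
                      body_field: str = "text") -> list[dict]:
    """Build one kind="text" ingest payload per non-empty highlight.

    text_column and body_field are assumptions: check your export's
    header row and the endpoint reference for the real names.
    """
    payloads = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        highlight = (row.get(text_column) or "").strip()
        if highlight:
            payloads.append({"kind": "text", body_field: highlight,
                             "in_kb": True, "auto_transform": False})
    return payloads


sample = (
    "URL,Highlights\n"
    "https://example.com/a,First highlight\n"
    "https://example.com/b,\n"
)
print(readwise_payloads(sample))
```

Each payload would then be sent with the same POST and rate-limit handling as the URL case.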
Scheduled and cron usage
For a weekly catch-up of a list of YouTube channels' new uploads, combine this script with a feed-reading library or YouTube's RSS:
[cron, weekly]
|
v
[fetch RSS feeds for N channels, build list of new video URLs]
|
v
[diff against already-ingested set]
|
v
[run this script on the diff]
Cron and this script give you a hands-off catch-up loop without needing a full automation platform.
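The middle two steps of that loop can be sketched with the standard library alone. The example assumes each channel exposes an Atom feed whose entries carry a rel="alternate" link to the video, which is how YouTube's per-channel RSS is commonly structured. Verify that against a real feed before depending on it. Fetching the feed and persisting the already-ingested set are left out, and both function names are illustrative.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def video_urls_from_atom(xml_text: str) -> list[str]:
    """Return the rel="alternate" link of every entry in an Atom feed."""
    root = ET.fromstring(xml_text)
    urls = []
    for entry in root.iter(f"{ATOM}entry"):
        for link in entry.findall(f"{ATOM}link"):
            if link.get("rel") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
    return urls


def new_urls(feed_urls: list[str], already_ingested: set[str]) -> list[str]:
    """Keep feed URLs not seen before, preserving order and dropping repeats."""
    seen = set(already_ingested)
    fresh = []
    for url in feed_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><link rel="alternate" href="https://www.youtube.com/watch?v=aaa"/></entry>
  <entry><link rel="alternate" href="https://www.youtube.com/watch?v=bbb"/></entry>
</feed>"""

found = video_urls_from_atom(sample_feed)
print(new_urls(found, {"https://www.youtube.com/watch?v=aaa"}))
```

Writing the resulting list to a one-column CSV and passing it to the import script completes the weekly job.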
What this unlocks
- Migration off another knowledge-management tool in an afternoon.
- A "personal corpus" that grows by background script every week without you thinking about it.
- A repeatable seed step for setting up Graniite for someone else. Hand them a CSV of starter content, run this script with their token, and they have a populated library on day 1.
Related
- n8n quickstart: for event-driven ingest instead of batch.
- POST /api/v1/ingest: endpoint reference.
- Errors: rate-limit handling detail.