CitationClaw is a lightweight academic citation analysis tool developed in Python. It leverages web crawlers and large language models to transform raw citation data into readable, insightful impact information. Simply enter a paper title or import publications from a Google Scholar profile, and within minutes, you'll receive a comprehensive citation analysis report. This report can be used for grant applications, thesis defenses, research output summaries, and various other scenarios.
Five-Stage Analysis Pipeline: Progressing through citation fetching, author analysis, structured export, citation description generation, and report creation, this step-by-step pipeline systematically transforms raw citation data into clear, actionable information.
High-Impact Scholar Identification: Automatically identifies highly influential scholars who have cited your work and generates targeted analysis for those specific citations, helping you quickly pinpoint key disseminators of your research.
Three Analysis Modes: Choose from Basic, Advanced, or Complete modes, each offering increasing depth, cost, and processing time. The Basic mode is a good fit for first-time users who want to see results quickly.
Resume Capability and Caching: Supports resuming interrupted citation fetching page by page, and caches author information and citation descriptions. When re-analyzing the same paper, there's no need to re-fetch or recompute, saving both time and API costs.
Shareable HTML Report: The generated visual report is a standalone HTML file. It requires no server; simply share the file directly for others to view.
Modular Skills Runtime: The execution of the five stages is broken down into independent modules, allowing for greater flexibility and easier extension. New features can be added without impacting the existing workflow.
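The caching behavior described in the feature list can be pictured with a small disk-backed cache. This is only an illustrative sketch, not CitationClaw's actual implementation: the class `JsonCache`, the function `fetch_author_info`, and the cache file name are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


class JsonCache:
    """Tiny disk-backed key/value cache (hypothetical, for illustration)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load results from an earlier run if a cache file already exists.
        if self.path.exists():
            self.data = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.data = {}

    def get_or_compute(self, key, compute):
        # Call the expensive function only on a cache miss, then persist.
        if key not in self.data:
            self.data[key] = compute(key)
            self.path.write_text(
                json.dumps(self.data, ensure_ascii=False), encoding="utf-8"
            )
        return self.data[key]


calls = []


def fetch_author_info(name):
    # Stand-in for an expensive web search or LLM call (hypothetical).
    calls.append(name)
    return {"name": name, "note": "placeholder"}


cache_file = Path(tempfile.mkdtemp()) / "author_cache.json"
first = JsonCache(cache_file).get_or_compute("Ada Lovelace", fetch_author_info)
# A fresh cache object simulates a second run of the tool.
second = JsonCache(cache_file).get_or_compute("Ada Lovelace", fetch_author_info)
print(len(calls))  # number of times the expensive call actually ran
```

Because the second lookup is served from the file written by the first, the expensive call runs once, which is the source of the time and API-cost savings mentioned above.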
CitationClaw's architecture clearly separates business logic from execution components:
UI/REST/WebSocket
│
▼
TaskExecutor (Master Scheduler)
│
▼
Skills Runtime
├─ phase1_citation_fetch
├─ phase2_author_intel
├─ phase3_export
├─ phase4_citation_desc
└─ phase5_report_generate
The top layer accepts requests from the UI, the REST interface, or WebSocket connections, all coordinated by the unified TaskExecutor. The lower layer invokes the individual stage modules through the Skills Runtime to complete the analysis.
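The layering in the diagram can be sketched as a scheduler that looks up each stage in a registry and runs them in order. This is a minimal sketch under assumptions: only the phase names and the name `TaskExecutor` come from the diagram, while the `SKILLS` registry, `register`, `make_placeholder`, and the shape of the context dictionary are hypothetical.

```python
from typing import Callable, Dict, List

# Registry mapping a phase name to the function that runs it (hypothetical).
SKILLS: Dict[str, Callable[[dict], dict]] = {}

# Phase names as shown in the architecture diagram.
PHASES: List[str] = [
    "phase1_citation_fetch",
    "phase2_author_intel",
    "phase3_export",
    "phase4_citation_desc",
    "phase5_report_generate",
]


def register(name: str, func: Callable[[dict], dict]) -> None:
    """Add a stage module to the runtime under the given name."""
    SKILLS[name] = func


def make_placeholder(name: str) -> Callable[[dict], dict]:
    """Build a stand-in stage that only records that it ran."""
    def run(ctx: dict) -> dict:
        ctx["log"].append(name)
        return ctx
    return run


for phase in PHASES:
    register(phase, make_placeholder(phase))


class TaskExecutor:
    """Runs the configured phases in order, passing a shared context along."""

    def __init__(self, phases: List[str]) -> None:
        self.phases = phases

    def run(self, paper_title: str) -> dict:
        ctx = {"paper_title": paper_title, "log": []}
        for name in self.phases:
            ctx = SKILLS[name](ctx)
        return ctx


result = TaskExecutor(PHASES).run("An Example Paper Title")
print(result["log"])
```

In a design like this, adding a stage means registering one more function and appending its name to the phase list; the existing stages are not modified.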
Requires Python 3.10 or higher (Python 3.12 recommended).
Install from PyPI and start the service:
pip install citationclaw
citationclaw # Default address: 127.0.0.1:8000
citationclaw --port 8080 # Custom port
Run from source:
git clone https://github.com/VisionXLab/CitationClaw.git
cd CitationClaw
pip install -r requirements.txt
python start.py # Default address: 127.0.0.1:8000
python start.py --port 8080 # Custom port
You need to prepare the following before use:
Additionally, it is recommended to use the gemini-3-flash-preview-search model for stages involving web searches, as it can improve the accuracy of information retrieval.
Service Level Description:
For papers with over 1000 citations, enabling the year traversal mode is recommended. This helps overcome fetching limitations and yields more complete data.
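The text does not describe how year traversal is implemented. One plausible reading is that the tool issues one query per publication year instead of a single capped query, then merges the results. The sketch below illustrates that idea only; `traverse_by_year`, the `fetch_year` callback, and the sample data are hypothetical.

```python
def traverse_by_year(fetch_year, start_year, end_year):
    """Collect citing papers year by year and merge them without duplicates.

    fetch_year is any callable that returns a list of titles for one year.
    """
    seen = set()
    merged = []
    for year in range(start_year, end_year + 1):
        for title in fetch_year(year):
            # Normalize so the same paper found twice is kept only once.
            key = title.strip().lower()
            if key not in seen:
                seen.add(key)
                merged.append(title)
    return merged


# Made-up data standing in for per-year search results.
fake_index = {
    2021: ["Paper A", "Paper B"],
    2022: ["paper b", "Paper C"],
}
results = traverse_by_year(lambda year: fake_index.get(year, []), 2020, 2022)
print(results)
```

Each per-year query stays small, so a cap on the size of any single result set is less likely to truncate the overall list.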
After each analysis run, the following files are generated in the data/result-{timestamp}/ directory:
paper_results.xlsx: Structured Excel file containing all citation data.
paper_results_all_renowned_scholar.xlsx: Analysis related to all identified renowned scholars.
paper_results_top-tier_scholar.xlsx: Analysis related specifically to top-tier scholars.
paper_results_with_citing_desc.xlsx: Citation analysis including generated citation descriptions.
paper_results.json: Full dataset in JSON format.
paper_dashboard.html: A standalone, visual citation profile report (HTML file).
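To check a finished run from a script, one option is to locate the newest result directory and confirm which of the listed files are present. This is a sketch under assumptions: the file names and the `result-` prefix come from the list above, but the helpers `latest_result_dir` and `missing_outputs` are hypothetical, the timestamps are made up, and the code assumes the timestamp in the directory name sorts lexicographically.

```python
import tempfile
from pathlib import Path

# Output file names as listed in this document.
EXPECTED_FILES = [
    "paper_results.xlsx",
    "paper_results_all_renowned_scholar.xlsx",
    "paper_results_top-tier_scholar.xlsx",
    "paper_results_with_citing_desc.xlsx",
    "paper_results.json",
    "paper_dashboard.html",
]


def latest_result_dir(data_dir):
    """Return the newest result-* directory, or None if there is none.

    Assumes directory names sort in chronological order.
    """
    candidates = sorted(p for p in Path(data_dir).glob("result-*") if p.is_dir())
    return candidates[-1] if candidates else None


def missing_outputs(result_dir):
    """List the expected output files that are absent from result_dir."""
    return [name for name in EXPECTED_FILES if not (Path(result_dir) / name).exists()]


# Demo against a temporary directory that mimics the layout.
data_dir = Path(tempfile.mkdtemp()) / "data"
for stamp in ("20250101_120000", "20250102_090000"):  # made-up timestamps
    (data_dir / f"result-{stamp}").mkdir(parents=True)

newest = latest_result_dir(data_dir)
(newest / "paper_dashboard.html").write_text("<html></html>", encoding="utf-8")
missing = missing_outputs(newest)
print(newest.name, missing)
```

Against a real installation, `data_dir` would point at the tool's `data/` directory rather than a temporary one.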