GitHub Park

CitationClaw: A Lightweight Engine for Discovering Scientific Impact through Citations

CitationClaw is a lightweight academic citation analysis tool developed in Python. It leverages web crawlers and large language models to transform raw citation data into readable, insightful impact information. Simply enter a paper title or import publications from a Google Scholar profile, and within minutes, you'll receive a comprehensive citation analysis report. This report can be used for grant applications, thesis defenses, research output summaries, and various other scenarios.

Core Features of CitationClaw

  1. Five-Stage Analysis Pipeline: Progressing through citation fetching, author analysis, structured export, citation description generation, and report creation, this step-by-step pipeline systematically transforms raw citation data into clear, actionable information.

  2. High-Impact Scholar Identification: Automatically identifies highly influential scholars who have cited your work and generates targeted analysis for those specific citations, helping you quickly pinpoint key disseminators of your research.

  3. Three Analysis Modes: Choose from Basic, Advanced, or Complete modes, each offering increasing depth, cost, and processing time. The Basic mode is a good fit for first-time users who want to see results quickly.

  4. Resume Capability and Caching: Supports resuming interrupted citation fetching page by page and caches author information and citation descriptions. When re-analyzing the same paper, there is no need to re-fetch or recompute, which saves both time and API costs.

  5. Shareable HTML Report: The generated visual report is a standalone HTML file. It requires no server, so you can share the file directly for others to view.

  6. Modular Skills Runtime: The execution of the five stages is broken down into independent modules, allowing for greater flexibility and easier extension. New features can be added without impacting the existing workflow.
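
To make the caching idea from feature 4 concrete, here is a minimal sketch of an on-disk cache that avoids recomputing results it has already seen. The class name `JsonCache`, the method `get_or_compute`, and the single-JSON-file layout are all assumptions made for illustration; the project's real cache format is not described here.

```python
import json
from pathlib import Path
from typing import Callable


class JsonCache:
    """Tiny on-disk cache (hypothetical; not CitationClaw's actual implementation)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load earlier entries so a restarted run reuses previous work.
        self.data: dict[str, str] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def get_or_compute(self, key: str, compute: Callable[[str], str]) -> str:
        if key not in self.data:
            # The expensive step, e.g. an LLM request for an author profile.
            self.data[key] = compute(key)
            # Persist after every new entry so progress survives an interruption.
            self.path.write_text(
                json.dumps(self.data, ensure_ascii=False), encoding="utf-8"
            )
        return self.data[key]
```

Writing the file after each new entry trades a little I/O for the ability to stop and restart a run at any point without losing work.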

Overall Architecture

CitationClaw's architecture clearly separates business logic from execution components:

UI/REST/WebSocket
│
▼
TaskExecutor (Master Scheduler)
│
▼
Skills Runtime
├─ phase1_citation_fetch
├─ phase2_author_intel
├─ phase3_export
├─ phase4_citation_desc
└─ phase5_report_generate

The top layer accepts connections from UI, REST interfaces, or WebSockets, all coordinated by the unified TaskExecutor. The lower layer invokes the specific stage modules through the Skills Runtime to complete the entire analysis.
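
The layering described above can be sketched roughly as follows. Only the five phase names come from the diagram; the `SkillsRuntime` and `TaskExecutor` classes shown here, their methods, and the shared context dictionary are hypothetical stand-ins, not the project's actual code.

```python
from typing import Callable

Context = dict[str, list[str]]
Skill = Callable[[Context], Context]

# Phase names are taken from the architecture diagram.
PHASES = [
    "phase1_citation_fetch",
    "phase2_author_intel",
    "phase3_export",
    "phase4_citation_desc",
    "phase5_report_generate",
]


class SkillsRuntime:
    """Registry that maps a phase name to the module implementing it (hypothetical)."""

    def __init__(self) -> None:
        self._skills: dict[str, Skill] = {}

    def register(self, name: str, skill: Skill) -> None:
        self._skills[name] = skill

    def run(self, name: str, ctx: Context) -> Context:
        return self._skills[name](ctx)


class TaskExecutor:
    """Scheduler that runs the phases in order, passing the context along (hypothetical)."""

    def __init__(self, runtime: SkillsRuntime) -> None:
        self.runtime = runtime

    def execute(self, ctx: Context) -> Context:
        for name in PHASES:
            ctx = self.runtime.run(name, ctx)
        return ctx


def make_placeholder(name: str) -> Skill:
    """Build a dummy skill that only records that its phase ran."""

    def skill(ctx: Context) -> Context:
        return {**ctx, "completed": ctx.get("completed", []) + [name]}

    return skill


runtime = SkillsRuntime()
for phase in PHASES:
    runtime.register(phase, make_placeholder(phase))

result = TaskExecutor(runtime).execute({"completed": []})
print(result["completed"])
```

Because the executor only knows phase names, a phase can be swapped or extended by registering a different function under the same name.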

Installation

Requires Python 3.10 or higher (Python 3.12 recommended).

1. Install from PyPI (Recommended)

pip install citationclaw
citationclaw # Default address: 127.0.0.1:8000
citationclaw --port 8080 # Custom port

2. Install from Source (For Developers)

git clone https://github.com/VisionXLab/CitationClaw.git
cd CitationClaw
pip install -r requirements.txt
python start.py # Default address: 127.0.0.1:8000
python start.py --port 8080

Key Configuration

You need to prepare the following before use:

  • ScraperAPI Key: Used for fetching citation data from Google Scholar. Multiple keys can be configured for rotation.
  • OpenAI-compatible API Key: Used for author analysis and citation description generation.

For stages that involve web searches, the gemini-3-flash-preview-search model is recommended, as it can improve the accuracy of information retrieval.
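
As an illustration of what rotating multiple ScraperAPI keys could look like, the sketch below cycles through keys in round-robin order. The text does not say which rotation strategy CitationClaw uses, so the round-robin policy, the `KeyRotator` class, and the placeholder key strings are all assumptions.

```python
from itertools import cycle


class KeyRotator:
    """Round-robin over a list of API keys (hypothetical helper)."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._keys = cycle(keys)

    def next_key(self) -> str:
        # Each call hands out the next key and wraps around at the end.
        return next(self._keys)


rotator = KeyRotator(["key-a", "key-b"])  # placeholder keys, not real credentials
picked = [rotator.next_key() for _ in range(3)]
print(picked)  # ['key-a', 'key-b', 'key-a']
```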

Analysis Modes:

  • Basic Mode: Low cost, fast processing. Suitable for testing the pipeline.
  • Advanced Mode: Generates citation descriptions only for citing papers associated with high-impact scholars, balancing depth and cost.
  • Complete Mode: Generates citation descriptions for all citing papers, providing the most comprehensive results but with the highest time and cost investment.
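
The difference between the modes comes down to which citing papers receive a generated description. A minimal sketch of that selection logic follows; the `Mode` enum, the `CitingPaper` dataclass, and its `has_high_impact_author` flag are hypothetical names. The sketch also assumes that Basic mode generates no descriptions at all, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BASIC = "basic"
    ADVANCED = "advanced"
    COMPLETE = "complete"


@dataclass
class CitingPaper:
    title: str
    has_high_impact_author: bool  # hypothetical flag set during author analysis


def papers_needing_description(papers: list[CitingPaper], mode: Mode) -> list[CitingPaper]:
    """Return the citing papers that should get a generated citation description."""
    if mode is Mode.COMPLETE:
        return list(papers)  # every citing paper
    if mode is Mode.ADVANCED:
        # Only papers associated with high-impact scholars.
        return [p for p in papers if p.has_high_impact_author]
    return []  # assumption: Basic mode skips description generation
```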

For papers with more than 1,000 citations, enabling year traversal mode is recommended. Fetching citations year by year helps work around per-query fetching limits and yields more complete data.
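
The idea behind year traversal can be sketched as splitting one large query into one query per publication year. The function name `year_queries` and the example identifier are hypothetical. The parameter names `cites`, `as_ylo`, and `as_yhi` are used here as assumed Google Scholar URL parameters; whether CitationClaw builds its requests this way is not confirmed by the text.

```python
def year_queries(cites_id: str, first_year: int, last_year: int) -> list[dict[str, str]]:
    """Build one query per year so each stays under the per-query result cap."""
    if first_year > last_year:
        raise ValueError("first_year must not be later than last_year")
    return [
        # Restrict each query to a single year via a lower and upper bound.
        {"cites": cites_id, "as_ylo": str(year), "as_yhi": str(year)}
        for year in range(first_year, last_year + 1)
    ]


queries = year_queries("1234567890", 2021, 2023)  # placeholder ID for illustration
for q in queries:
    print(q)
```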

Analysis Output

After each analysis run, the following files are generated in the data/result-{timestamp}/ directory:

  • paper_results.xlsx: Structured Excel file containing all citation data.
  • paper_results_all_renowned_scholar.xlsx: Analysis related to all identified renowned scholars.
  • paper_results_top-tier_scholar.xlsx: Analysis related specifically to top-tier scholars.
  • paper_results_with_citing_desc.xlsx: Citation analysis including generated citation descriptions.
  • paper_results.json: Full dataset in JSON format.
  • paper_dashboard.html: A standalone, visual citation profile report (HTML file).
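
For downstream processing, the JSON output from the most recent run can be loaded with a few lines of Python. This is a sketch under two assumptions: that the `{timestamp}` part of the directory name sorts chronologically as a string, and that `paper_results.json` is ordinary JSON whose internal structure is not assumed here. The helper names are hypothetical.

```python
import json
from pathlib import Path


def latest_result_dir(data_dir: Path) -> Path:
    """Pick the newest result-* directory, assuming timestamps sort lexicographically."""
    runs = sorted(p for p in data_dir.glob("result-*") if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no result-* directories under {data_dir}")
    return runs[-1]


def load_results(data_dir: Path = Path("data")):
    """Load paper_results.json from the newest run; the schema is left uninterpreted."""
    path = latest_result_dir(data_dir) / "paper_results.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `load_results()` from the directory where CitationClaw was started would then return the parsed contents of the latest `paper_results.json`.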

Visit VisionXLab/CitationClaw (https://github.com/VisionXLab/CitationClaw) to access the source code and obtain more information.