PaperBanana-CN: an academic paper illustration generation tool

PaperBanana-CN is an academic paper illustration generator adapted from the open-source project PaperBanana. You paste a paper's methodology section and figure caption, and the tool uses AI to generate high-quality academic illustrations automatically. It requires no complex configuration, which lowers the barrier to creating academic figures for researchers and students writing papers.

Candidate Illustration Generation

Simply paste the methodology section and figure captions, and PaperBanana-CN automatically generates multiple candidate illustrations for selection. Five AI Agents collaborate to complete the entire process:

Retriever: Searches for similar images from a reference library to provide references for generation.
Planner: Transforms the textual description into a clear chart description.
Stylizer: Optimizes the academic aesthetic style of the image to conform to academic paper standards.
Visualizer: Generates the initial image based on the description.
Critic: Reviews the image quality and proposes improvement suggestions.

The Critic and Visualizer iterate automatically for 3 rounds, progressively improving the figure. Generation can produce 1-20 candidates in parallel, with selectable aspect ratios of 21:9, 16:9, and 3:2. Each candidate has an evolution timeline that shows the intermediate result at each stage. After generation, you can download a single image, download all images as a ZIP archive, or export the complete results in JSON format.
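The Visualizer/Critic loop described above can be sketched roughly as follows. This is an illustrative sketch only, not the project's code: the Candidate class, the refine() function, and the stub agents are hypothetical names, and stopping early when the Critic has no feedback is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """One candidate illustration and its evolution timeline (hypothetical)."""
    description: str
    image: str = ""  # placeholder for real image data
    timeline: List[str] = field(default_factory=list)


def refine(
    candidate: Candidate,
    visualize: Callable[[str], str],
    critique: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Candidate:
    """Alternate Visualizer and Critic for at most max_rounds rounds."""
    for round_no in range(1, max_rounds + 1):
        candidate.image = visualize(candidate.description)
        candidate.timeline.append(f"round {round_no}: {candidate.image}")
        feedback = critique(candidate.image)
        if feedback is None:  # assumption: stop early when the Critic is satisfied
            break
        # Fold the Critic's suggestion into the description for the next round.
        candidate.description = f"{candidate.description}; {feedback}"
    return candidate


# Stub agents so the sketch runs offline; real agents would call the models.
def stub_visualize(description: str) -> str:
    return f"image<{description}>"


def stub_critique(image: str) -> Optional[str]:
    return "enlarge axis labels"


result = refine(Candidate("encoder-decoder overview"), stub_visualize, stub_critique)
print(len(result.timeline))  # 3
```

Recording every round in the timeline is what would make the intermediate results of each stage viewable afterwards.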

Image Refinement Function

Upload a generated illustration or any other image, describe the changes you want, and the tool generates a 2K or 4K high-resolution version. Two modes are supported: image-to-image editing (modifying the original image) and regeneration from a text description alone. This covers adjusting details, restyling, and increasing resolution, and brings the illustration closer to academic publication standards.

Intelligent Retrieval

Reference-image retrieval in the original PaperBanana sends the full text of 200 papers, consuming about 800k tokens per attempt, which is costly. PaperBanana-CN sends only the figure captions by default, reducing consumption to approximately 30k tokens with essentially the same retrieval results.

Retrieval Mode	Token Consumption/Candidate	Description
auto	~30k	LLM intelligently matches reference images, sends only captions (recommended)
auto-full	~800k	Sends full paper text, high accuracy but high cost
random	0	Randomly selects 10 references, no API calls
none	0	No reference images used

The default configuration (5 candidates + auto retrieval) cuts retrieval cost by about 96% compared with the original (roughly 30k instead of 800k tokens per candidate). The interface shows the cost of each mode so that users do not incur unexpected expenses.
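The 96% figure follows directly from the table above. A small worked calculation, assuming retrieval cost scales linearly with the number of candidates (the table lists consumption per candidate); the function names are illustrative:

```python
# Approximate retrieval tokens per candidate, taken from the table above.
TOKENS_PER_CANDIDATE = {
    "auto": 30_000,
    "auto-full": 800_000,
    "random": 0,
    "none": 0,
}


def retrieval_tokens(mode: str, candidates: int) -> int:
    """Estimated retrieval tokens for one run (assumes linear scaling)."""
    return TOKENS_PER_CANDIDATE[mode] * candidates


def saving_vs_full(mode: str) -> float:
    """Fraction of retrieval tokens saved relative to auto-full."""
    return 1 - TOKENS_PER_CANDIDATE[mode] / TOKENS_PER_CANDIDATE["auto-full"]


print(retrieval_tokens("auto", 5))       # 150000
print(retrieval_tokens("auto-full", 5))  # 4000000
print(f"{saving_vs_full('auto'):.2%}")   # 96.25%
```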

API Support and Extension

The tool comes with two built-in API providers, ready to use out-of-the-box:

Mode	Description	Network Requirement
Evolink (default)	API proxy directly accessible from mainland China	No VPN needed
Google Gemini	Google's official API	Requires VPN/proxy

API modes can be switched with one click in the sidebar, and the model names update automatically. The tool has no commercial affiliation with Evolink; users can integrate other API services compatible with the OpenAI interface, such as Zhipu AI, Alibaba Cloud (Tongyi Qianwen), SiliconFlow, and Volcano Engine. To add a provider, implement the generate_text() and generate_image() methods defined by the interface in providers/base.py, using providers/evolink.py as a reference implementation.
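The shape of such an extension might look like the sketch below. Only the two method names come from the description above; the base class, the method signatures, the return types, and the EchoProvider class are assumptions made for illustration, so check providers/base.py for the real interface before writing a provider.

```python
from abc import ABC, abstractmethod


class BaseProvider(ABC):
    """Hypothetical stand-in for the interface in providers/base.py."""

    @abstractmethod
    def generate_text(self, prompt: str) -> str:
        """Return the text model's reply to the prompt."""

    @abstractmethod
    def generate_image(self, prompt: str) -> bytes:
        """Return encoded image bytes for the prompt."""


class EchoProvider(BaseProvider):
    """Offline dummy provider; a real one would call an OpenAI-compatible API."""

    def __init__(self, api_key: str, text_model: str = "example-text-model"):
        self.api_key = api_key
        self.text_model = text_model  # placeholder model name

    def generate_text(self, prompt: str) -> str:
        # A real provider would send the prompt to its chat endpoint here.
        return f"[{self.text_model}] {prompt}"

    def generate_image(self, prompt: str) -> bytes:
        # A real provider would return the generated image; this echoes the prompt.
        return prompt.encode("utf-8")


provider = EchoProvider(api_key="sk-placeholder")
print(provider.generate_text("Describe the pipeline figure"))
```

Keeping both methods behind one abstract base is what would let the sidebar switch providers without the rest of the pipeline changing.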

PaperBanana-CN Installation and Usage

Step 1: Obtain an API Key

Recommended: Choose Evolink. Go to https://evolink.ai/dashboard/keys to register and obtain a key.
Alternatively, choose Google Gemini. Go to https://aistudio.google.com/apikey to obtain a key.

Step 2: Launch the Program

macOS users: Double-click the mac-start.command file in the project.
Windows users: Double-click the win-start.bat file.

Note for Windows users: If Python is not installed locally, it is recommended to open the Microsoft Store, search for "Python 3.12", and install it before running the script to avoid lengthy automatic installation times.

The first launch will automatically complete the following operations (approximately 2-3 minutes):

Detect or automatically install Python (version ≥3.10 required);
Create a virtual environment;
Install all dependencies;
Start the program and automatically open the browser.

Subsequent launches take only a few seconds.

Step 3: Optional – Download the Reference Dataset

The built-in Retriever agent finds similar academic illustrations in a reference library to improve generation quality. To use this feature:

Go to PaperBananaBench to download the dataset.

Place the downloaded contents into the project's data/PaperBananaBench/ directory, ensuring the following structure:

data/
└── PaperBananaBench/
    ├── diagram/
    │   ├── images/              ← Paper illustration images
    │   ├── ref.json
    │   └── test.json
    └── plot/
        ├── images/              ← Paper chart images
        ├── ref.json
        └── test.json
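To confirm that the download landed in the right place, a quick check along these lines can help. It is a standalone sketch, not part of the project; the function name is hypothetical, and it only tests that the paths in the tree above exist, not that the files are valid.

```python
from pathlib import Path

# Paths expected under data/PaperBananaBench/, taken from the tree above.
EXPECTED = [
    "diagram/images",
    "diagram/ref.json",
    "diagram/test.json",
    "plot/images",
    "plot/ref.json",
    "plot/test.json",
]


def missing_dataset_entries(root) -> list:
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]


# Run from the project root.
missing = missing_dataset_entries("data/PaperBananaBench")
print("dataset OK" if not missing else f"missing: {missing}")
```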

You can use the tool normally without downloading the dataset; simply change the "Retrieval Setting" in the sidebar to none. This skips reference image retrieval and does not affect other functionalities.

Step 4: Generate Academic Illustrations

In the left sidebar, select the API provider and fill in the obtained API Key.
Switch to the "Generate Candidates" tab.
Paste the methodology section of the paper and the figure caption.
Click "Generate Candidates" and wait a few minutes.
Choose a satisfactory illustration from the generated candidates and download it.

Sidebar Settings Description

The sidebar exposes the following settings, which can be adjusted as needed:

Setting Item	Description
API Provider	Choose Evolink (direct connection from mainland China) or Gemini (requires VPN)
API Key	Fill in the key for the corresponding API provider
Text Model	Model used for planning/critiquing (default: gemini-2.5-flash)
Image Model	Model used for generating images (default: nano-banana-2-lite)
Pipeline Mode	`demo_planner_critic` (fast) or `demo_full` (includes stylizer, more aesthetic)
Retrieval Setting	auto / auto-full / random / none, corresponding to different retrieval modes
Number of Candidates	1-20; 3-5 recommended to balance efficiency and quality
Aspect Ratio	Options: 21:9 / 16:9 / 3:2
Max Critique Rounds	1-5; default 3 rounds
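The allowed values in the table can be expressed as a small validation routine. This is a sketch for illustration; the dictionary keys and the validate_settings() function are hypothetical and do not reflect how the tool stores its settings.

```python
# Allowed values, taken from the settings table above.
PIPELINE_MODES = ("demo_planner_critic", "demo_full")
RETRIEVAL_MODES = ("auto", "auto-full", "random", "none")
ASPECT_RATIOS = ("21:9", "16:9", "3:2")


def in_range(value, low: int, high: int) -> bool:
    """True if value is an int within [low, high]."""
    return isinstance(value, int) and low <= value <= high


def validate_settings(settings: dict) -> list:
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    if settings.get("pipeline_mode") not in PIPELINE_MODES:
        problems.append("pipeline_mode must be one of " + ", ".join(PIPELINE_MODES))
    if settings.get("retrieval") not in RETRIEVAL_MODES:
        problems.append("retrieval must be one of " + ", ".join(RETRIEVAL_MODES))
    if settings.get("aspect_ratio") not in ASPECT_RATIOS:
        problems.append("aspect_ratio must be one of " + ", ".join(ASPECT_RATIOS))
    if not in_range(settings.get("num_candidates"), 1, 20):
        problems.append("num_candidates must be between 1 and 20")
    if not in_range(settings.get("max_critique_rounds"), 1, 5):
        problems.append("max_critique_rounds must be between 1 and 5")
    return problems


example = {
    "pipeline_mode": "demo_full",
    "retrieval": "auto",
    "aspect_ratio": "16:9",
    "num_candidates": 5,
    "max_critique_rounds": 3,
}
print(validate_settings(example))  # []
```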

Manual Installation Method (Optional)

If the one-click script fails, install manually with the following steps:

Ensure Python 3.10+ is installed (check with python3 --version).
Create a virtual environment: python3 -m venv .venv
Activate the virtual environment:
- macOS/Linux: source .venv/bin/activate
- Windows: .venv\Scripts\activate
Install dependencies: pip install -r requirements.txt
Launch the program: streamlit run demo.py --server.port 8501
Open your browser and navigate to http://localhost:8501.
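The Python version requirement from the first step can also be checked from Python itself. A minimal sketch; the helper name is illustrative:

```python
import sys

MINIMUM = (3, 10)  # minimum version stated in the installation steps


def meets_minimum(version=None, minimum=MINIMUM) -> bool:
    """True if the (major, minor) version is at least the minimum."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= minimum


if meets_minimum():
    print("Python version OK")
else:
    print("Python 3.10 or newer is required")
```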