PaperBanana-CN is an academic paper illustration generation tool adapted from the open-source project PaperBanana. Driven by AI, it supports pasting the methodology section and captions of a paper to automatically generate high-quality academic illustrations. PaperBanana-CN requires no complex configuration, significantly lowering the barrier to creating academic figures, making it an essential assistant for researchers and students writing papers.
Simply paste the methodology section and figure captions, and PaperBanana-CN automatically generates multiple candidate illustrations to choose from. Five AI agents collaborate to complete the entire process.
The Critic and Visualizer iterate automatically for three rounds, progressively refining figure quality. Generation supports producing 1-20 candidates in parallel, with selectable aspect ratios such as 21:9, 16:9, and 3:2. Each candidate provides an evolution timeline showing the intermediate result at every stage. After generation, you can download a single image, download all candidates as a ZIP, or export the complete results as JSON.
Upload a generated illustration or any other image, describe the desired modifications, and generate a 2K/4K high-resolution version. Two modes are supported: image-to-image editing (modifying the original image) and regeneration from a pure text description. Whether adjusting details, refining style, or increasing resolution, changes are quick to make, bringing illustration quality closer to academic publication standards.
The original PaperBanana's reference image retrieval requires sending the full text of 200 papers, consuming about 800k tokens per attempt, which is costly. PaperBanana-CN, after optimization, sends only the figure captions by default, reducing token consumption to approximately 30k, with essentially the same effect.
| Retrieval Mode | Token Consumption/Candidate | Description |
|---|---|---|
| auto | ~30k | LLM intelligently matches reference images, sends only captions (recommended) |
| auto-full | ~800k | Sends full paper text, high accuracy but high cost |
| random | 0 | Randomly selects 10 references, no API calls |
| none | 0 | No reference images used |
The default configuration (5 candidates + auto retrieval) cuts retrieval cost by about 96% compared to the original. The interface explicitly shows the cost of each mode so users are not caught off guard by unexpected expenses.
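The arithmetic behind the 96% figure can be checked directly. A minimal sketch, using the per-candidate token figures from the table above (the dictionary and function names are illustrative, not part of the tool):

```python
# Back-of-envelope retrieval cost estimate; token figures come from the table above.
TOKENS_PER_CANDIDATE = {"auto": 30_000, "auto-full": 800_000, "random": 0, "none": 0}

def retrieval_tokens(mode: str, candidates: int) -> int:
    """Approximate retrieval tokens consumed by one generation run."""
    return TOKENS_PER_CANDIDATE[mode] * candidates

default_run = retrieval_tokens("auto", 5)        # default config: 150,000 tokens
original_run = retrieval_tokens("auto-full", 5)  # original behavior: 4,000,000 tokens
saving = 1 - default_run / original_run          # 0.9625, i.e. the ~96% saving
```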
The tool comes with two built-in API providers, ready to use out-of-the-box:
| Mode | Description | Network Requirement |
|---|---|---|
| Evolink (default) | Domestic API proxy, directly accessible | No VPN needed |
| Google Gemini | Google's official API | Requires VPN/proxy |
API modes can be switched with one click in the sidebar, and the model names update automatically. The tool has no commercial affiliation with Evolink; users can integrate any other API service compatible with the OpenAI interface, such as Zhipu AI, Alibaba Cloud (Tongyi Qianwen), SiliconFlow, and Volcano Engine. Simply implement the generate_text() and generate_image() methods against the interface in providers/base.py, using providers/evolink.py as a reference implementation.
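A rough sketch of what such an extension might look like. Only the method names generate_text() and generate_image() and the file paths come from the project; the base-class shape, constructor arguments, and request-payload fields below are assumptions for illustration (consult providers/base.py for the real signatures):

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Stand-in for the interface declared in providers/base.py (shape assumed)."""

    @abstractmethod
    def generate_text(self, prompt: str) -> str: ...

    @abstractmethod
    def generate_image(self, prompt: str, aspect_ratio: str) -> bytes: ...

class OpenAICompatibleProvider(BaseProvider):
    """Adapter for any OpenAI-compatible endpoint (Zhipu AI, Tongyi Qianwen, etc.)."""

    def __init__(self, base_url: str, api_key: str, text_model: str, image_model: str):
        self.base_url = base_url.rstrip("/")
        self.api_key = api_key
        self.text_model = text_model
        self.image_model = image_model

    def _chat_payload(self, prompt: str) -> dict:
        # Standard OpenAI-style /chat/completions request body.
        return {
            "model": self.text_model,
            "messages": [{"role": "user", "content": prompt}],
        }

    def generate_text(self, prompt: str) -> str:
        # A real implementation would POST self._chat_payload(prompt) to
        # f"{self.base_url}/chat/completions" with an Authorization header
        # and return the first choice's message content.
        raise NotImplementedError

    def generate_image(self, prompt: str, aspect_ratio: str) -> bytes:
        # Likewise POST to the provider's image endpoint and return raw image bytes.
        raise NotImplementedError
```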
- Evolink: visit https://evolink.ai/dashboard/keys to register and obtain a key.
- Google Gemini: visit https://aistudio.google.com/apikey to obtain a key.
- macOS: run the mac-start.command file in the project.
- Windows: run the win-start.bat file.

Note for Windows users: if Python is not installed locally, it is recommended to open the Microsoft Store, search for "Python 3.12", and install it before running the script to avoid a lengthy automatic installation.
The first launch automatically completes setup, creating the environment and installing dependencies (approximately 2-3 minutes).
The built-in "Retrieval Agent" can find similar academic illustrations from a reference library to improve generation quality. To use this feature:
Download the dataset into the data/PaperBananaBench/ directory, ensuring the following structure:

```
data/
└── PaperBananaBench/
    ├── diagram/
    │   ├── images/     ← Paper illustration images
    │   ├── ref.json
    │   └── test.json
    └── plot/
        ├── images/     ← Paper chart images
        ├── ref.json
        └── test.json
```
You can use the tool normally without downloading the dataset; simply change the "Retrieval Setting" in the sidebar to none. This skips reference image retrieval and does not affect other functionalities.
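A small sanity check for the layout above can save a confusing retrieval failure later. This helper is invented for illustration (it is not part of the tool); it only assumes the directory structure shown in the tree:

```python
from pathlib import Path

def missing_dataset_paths(root: str = "data/PaperBananaBench") -> list[str]:
    """Return paths from the expected PaperBananaBench layout that are absent."""
    required = [
        f"{subset}/{entry}"
        for subset in ("diagram", "plot")
        for entry in ("images", "ref.json", "test.json")
    ]
    base = Path(root)
    return [p for p in required if not (base / p).exists()]

# An empty return value means the dataset structure is complete;
# otherwise the missing paths are listed for inspection.
```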
The tool's sidebar offers a wealth of customizable settings; parameters can be adjusted as needed:
| Setting Item | Description |
|---|---|
| API Provider | Choose Evolink (domestic direct connection) or Gemini (requires VPN) |
| API Key | Fill in the key for the corresponding API provider |
| Text Model | Model used for planning/critiquing (default: gemini-2.5-flash) |
| Image Model | Model used for generating images (default: nano-banana-2-lite) |
| Pipeline Mode | demo_planner_critic (fast) or demo_full (includes stylizer, more aesthetic) |
| Retrieval Setting | auto / auto-full / random / none, corresponding to different retrieval modes |
| Number of Candidates | 1-20; 3-5 recommended to balance efficiency and quality |
| Aspect Ratio | Options: 21:9 / 16:9 / 3:2 |
| Max Critique Rounds | 1-5; default 3 rounds |
If the one-click script encounters issues, manual installation can be performed with the following steps:
1. Confirm Python is installed (`python3 --version`).
2. Create a virtual environment: `python3 -m venv .venv`
3. Activate it: `source .venv/bin/activate` (macOS/Linux) or `.venv\Scripts\activate` (Windows)
4. Install dependencies: `pip install -r requirements.txt`
5. Start the app: `streamlit run demo.py --server.port 8501`
6. Open http://localhost:8501 in a browser.