OpenCoder is a family of open and reproducible large language models (LLMs) for code, comprising base and instruct models at 1.5B and 8B scales, with support for both English and Chinese. OpenCoder was pre-trained from scratch on 2.5 trillion tokens, 90% of w