IBM recently released Granite-Docling-258M, a lightweight vision-language model built specifically for document conversion. The model has 258 million parameters and supports multiple languages, including Chinese, Arabic, and Japanese, with the aim of making document processing faster and more accurate. It is also optimized for handling tables in documents.
Compared with traditional OCR software, Granite-Docling-258M recognizes content significantly more accurately. Its output preserves the layout structure of the original document and identifies elements such as tables, mathematical formulas, lists, and code blocks. At the core of the approach is DocTags, a universal markup language for document structure developed by IBM Research, which describes the type, position, and reading order of each page element.
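To make the idea of describing each element by type, position, and reading order concrete, here is a minimal Python sketch. It is an illustration only: the `PageElement` class, its field names, and the sample values are assumptions made for this example and do not reflect the actual DocTags syntax or vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageElement:
    """Hypothetical record for one page element (not the real DocTags schema)."""
    kind: str                        # element type, e.g. "title", "text", "table"
    bbox: tuple[int, int, int, int]  # position as (left, top, right, bottom)
    order: int                       # position in the reading order, starting at 0
    text: str                        # recognized content of the element


def in_reading_order(elements):
    """Return the elements sorted by their reading-order index."""
    return sorted(elements, key=lambda e: e.order)


# Made-up sample page, deliberately listed out of order.
page = [
    PageElement("text", (40, 120, 560, 180), 1, "Results are summarized below."),
    PageElement("title", (40, 40, 560, 90), 0, "Q3 Report"),
    PageElement("table", (40, 200, 560, 400), 2, "Region | Revenue"),
]

ordered = in_reading_order(page)
kinds = [e.kind for e in ordered]
```

Keeping the type, bounding box, and reading order alongside the text is what allows a converter to rebuild the layout later instead of emitting one flat string.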
During conversion, Granite-Docling-258M first identifies the elements in the document and then performs OCR on them, which makes content extraction and output more efficient and accurate. After conversion, users can export the content in several formats, including Markdown, JSON, and HTML, to suit different needs. IBM also plans to incorporate the DocTags vocabulary into the Granite tokenizer and training process to further improve model performance.
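The export step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the model's or Docling's actual export code: the element dictionaries, the `kind` values, and the three helper functions are hypothetical names introduced for this example.

```python
import html
import json

# Hypothetical elements, assumed to be already recognized and in reading order.
elements = [
    {"kind": "heading", "text": "Q3 Report"},
    {"kind": "text", "text": "Margins stayed < 5%"},
]


def to_markdown(elements):
    """Render headings with '# ' and everything else as plain paragraphs."""
    parts = []
    for el in elements:
        prefix = "# " if el["kind"] == "heading" else ""
        parts.append(prefix + el["text"])
    return "\n\n".join(parts)


def to_html(elements):
    """Render headings as <h1> and everything else as <p>, escaping the text."""
    parts = []
    for el in elements:
        tag = "h1" if el["kind"] == "heading" else "p"
        parts.append(f"<{tag}>{html.escape(el['text'])}</{tag}>")
    return "\n".join(parts)


def to_json(elements):
    """Serialize the element list as indented JSON, keeping non-ASCII text."""
    return json.dumps(elements, ensure_ascii=False, indent=2)


markdown_out = to_markdown(elements)
html_out = to_html(elements)
json_out = to_json(elements)
```

Because all three exporters read the same structured element list, adding another output format only requires another small rendering function.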
Granite-Docling-258M is not yet ready for enterprise use, but IBM says it will continue to expand the range of supported languages and improve the model's reliability. IBM also intends to improve the compatibility of DocTags with IBM watsonx.ai models so that the technology can be applied more broadly.
The release gives the document-processing field a new technical option and could help related industries work more efficiently.
Hugging Face: https://huggingface.co/ibm-granite/granite-docling-258M
Key Points:
📄 **Lightweight Model**: IBM released Granite-Docling-258M, specifically designed for document conversion.
🔍 **High Accuracy**: This model has higher recognition accuracy than traditional OCR software and supports various document elements.
🌍 **Multilingual Support**: Granite-Docling-258M currently supports Chinese, Arabic, and Japanese, and will expand to more languages in the future.