  • Python Libraries
  • Open Source
  • Document AI, PDF Parsing, Gen AI, Data Extraction, LLM Integration
  • Added: Oct 03, 2025

Docling

Docling is a document parser that prepares diverse documents for gen AI integration.

Docling simplifies document processing: it parses a wide array of formats, brings advanced layout understanding to PDFs, and integrates with the gen AI ecosystem. Every document is converted into a unified, expressive representation that can be exported to formats such as Markdown and HTML, and the whole pipeline can run locally, which suits sensitive data. This makes it a good fit for developers and researchers who want to use document content in AI applications, especially for complex PDF analysis and agentic AI workflows.
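To make the "one representation, several exports" idea concrete, here is a minimal sketch. It assumes Docling's documented Python entry point (`DocumentConverter().convert(...)`, whose result exposes `.document` and `.input.file`) and the `export_to_markdown()`, `export_to_html()` and `export_to_dict()` methods of the resulting document; check these against the version you install. The helper names `export_all` and `convert_and_export`, and the one-file-per-format output layout, are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path
from typing import Any


def export_all(doc: Any, out_dir: Path, stem: str) -> dict[str, Path]:
    """Hypothetical helper: write Markdown, HTML and JSON renderings of one document.

    `doc` is assumed to expose export_to_markdown(), export_to_html() and
    export_to_dict(), as Docling's DoclingDocument does.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    renderings = {
        "md": doc.export_to_markdown(),
        "html": doc.export_to_html(),
        # export_to_dict() returns the full structured representation,
        # which is serialised here as JSON.
        "json": json.dumps(doc.export_to_dict(), indent=2, ensure_ascii=False),
    }
    paths: dict[str, Path] = {}
    for ext, text in renderings.items():
        path = out_dir / f"{stem}.{ext}"
        path.write_text(text, encoding="utf-8")
        paths[ext] = path
    return paths


def convert_and_export(source: str, out_dir: str = "out") -> dict[str, Path]:
    """Hypothetical helper: convert a local path or URL with Docling, then export it."""
    # Imported lazily so export_all() can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(source)
    stem = result.input.file.stem  # input file name without its suffix
    return export_all(result.document, Path(out_dir), stem)
```

A call such as `convert_and_export("report.pdf")` would then leave `report.md`, `report.html` and `report.json` in `out/`. Keeping the export step separate from the conversion step means the same document object can be rendered repeatedly without re-parsing the source.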

Other key benefits include extensive OCR support, integration with Visual Language Models (VLMs), and audio processing capabilities. Docling can connect to any agent through its MCP server, and it offers plug-and-play integrations with frameworks such as LangChain and LlamaIndex. A simple CLI and Python API cover use cases ranging from converting a single document to building AI agents that process and understand document content.
  • Parse diverse document formats including PDF, DOCX, and audio files.
  • Understand advanced PDF layouts, reading order, and table structures.
  • Export documents to Markdown, HTML, and lossless JSON.
  • Integrate seamlessly with LangChain, LlamaIndex, and other AI frameworks.
  • Execute locally for secure processing of sensitive data.
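The OCR and table-structure features listed above are configurable per input format. The sketch below assumes the `PdfPipelineOptions`, `PdfFormatOption` and `InputFormat` classes as they appear in Docling's published examples (module paths may differ between releases); the function name `build_pdf_converter` and its two flags are hypothetical conveniences, not part of Docling.

```python
from typing import Any


def build_pdf_converter(ocr: bool = True, tables: bool = True) -> Any:
    """Hypothetical helper: return a Docling DocumentConverter with explicit PDF options."""
    # Lazy imports: these module paths follow Docling's documented examples;
    # verify them against the version you install.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.do_ocr = ocr  # run OCR on scanned pages and bitmap regions
    options.do_table_structure = tables  # recover rows and columns of tables

    # Only PDF inputs get these options; other formats keep Docling's defaults.
    return DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
```

For example, `build_pdf_converter().convert("scan.pdf").document.export_to_markdown()` would run the configured pipeline on the local machine and return Markdown. Turning OCR off for born-digital PDFs is a common way to trade recall on images for speed.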
Alternatives
  • Unstructured.io
  • Apache Tika
  • PDFMiner.six