SYSTEM LIMITATIONS

Docs2MD converts documentation graphs into AI-ready Markdown, but it has limits. Understanding them will help you get the best results.

01

Client-Side Rendering (CSR)

Docs2MD requires content to be present in the initial server response. Websites that rely entirely on JavaScript to render content (Single Page Applications without Server-Side Rendering) may produce incomplete or empty output.

ISSUE: Page content missing from output.
FIX: Ensure the target documentation supports Server-Side Rendering (SSR) or Static Site Generation (SSG).
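
One rough way to check a site before processing it is to look at how much readable text the raw HTML contains before any JavaScript runs. The sketch below is an illustration only, not Docs2MD's actual detection logic: the `looks_client_rendered` helper, the `VisibleTextCounter` class, and the 200-character threshold are all assumptions chosen for the example.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts text characters outside <script>, <style>, and <noscript>."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_client_rendered(html, min_chars=200):
    """Return True when the raw HTML carries very little readable text.

    min_chars is an arbitrary threshold picked for this sketch.
    """
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.text_chars < min_chars
```

Feed it the body of a plain HTTP GET (for example via `urllib.request.urlopen`), which does not execute scripts. A page that is little more than an empty `<div id="root">` plus a script tag will be flagged, while a server-rendered article will not.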

02

Crawl Scope & Depth

To ensure relevant results, processing is automatically scoped to the provided documentation section.

  • Links to external domains are excluded to keep the dataset focused.
  • Subdomains are generally treated as separate entities.
  • Processing remains within the context path of the provided URL to avoid crawling unrelated site sections.
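
The three rules above can be sketched as a single URL filter. This is a minimal sketch under stated assumptions, not the tool's real implementation: the `in_scope` function is hypothetical, it treats every subdomain as out of scope (the text only says "generally"), and it treats the path of the provided URL as a directory prefix.

```python
from urllib.parse import urlsplit


def in_scope(seed_url, candidate_url):
    """Return True if candidate_url falls within the scope of seed_url."""
    seed = urlsplit(seed_url)
    cand = urlsplit(candidate_url)

    # Only web links are candidates (this drops mailto:, javascript:, etc.).
    if cand.scheme not in ("http", "https"):
        return False

    # Exact hostname match: external domains and subdomains both fail here.
    if (cand.hostname or "") != (seed.hostname or ""):
        return False

    # Compare paths on a segment boundary so /docs/v2 does not match /docs/v20.
    base = seed.path.rstrip("/") + "/"
    path = cand.path.rstrip("/") + "/"
    return path.startswith(base)
```

With `https://example.com/docs/v2/` as the provided URL, `https://example.com/docs/v2/install` is kept, while `https://api.example.com/docs/v2/install` and `https://example.com/blog/post` are excluded.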

03

Anti-Bot Protection

Documentation hosted behind strict access controls or aggressive bot protection services may restrict access. Docs2MD also strictly adheres to `robots.txt` directives, which may explicitly disallow automated access to certain sections.
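
To see in advance whether a `robots.txt` file would block a given page, you can evaluate it with Python's standard `urllib.robotparser`. The sketch below is illustrative: the `allowed_by_robots` helper is hypothetical, and the `"Docs2MD"` user-agent token is an assumption, since the user agent the tool actually sends is not stated here.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="Docs2MD"):
    """Return True if robots_txt permits user_agent to fetch url.

    The default user_agent is a placeholder, not a documented value.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For a file containing `User-agent: *` followed by `Disallow: /docs/internal/`, a URL under `/docs/internal/` is rejected and other documentation pages are allowed.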

04

Content Extraction

Docs2MD focuses on extracting the main content area while filtering out navigational elements and ads. While effective for most standard documentation layouts, highly custom designs may occasionally result in:

  • False Positives: Essential content being inadvertently filtered if it resembles navigation.
  • False Negatives: Non-content elements remaining if they blend into the main article structure.
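
Both failure modes are easy to reproduce with a simple tag-based filter. The sketch below is a deliberately naive stand-in, not Docs2MD's extraction logic: the `MainTextExtractor` class and its `CHROME_TAGS` set are assumptions made for illustration.

```python
from html.parser import HTMLParser


class MainTextExtractor(HTMLParser):
    """Collects text, skipping elements that usually hold navigation or chrome."""

    # Assumption: these tags mark non-content regions. Real layouts vary.
    CHROME_TAGS = {"nav", "header", "footer", "aside", "script", "style"}

    def __init__(self):
        super().__init__()
        self._chrome_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CHROME_TAGS:
            self._chrome_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CHROME_TAGS and self._chrome_depth > 0:
            self._chrome_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._chrome_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the text chunks found outside the chrome tags, in document order."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Given a page whose article contains `<aside>Note: run as admin.</aside>` and `<div class="breadcrumbs">Docs / Setup</div>`, this filter drops the note (essential content that resembles a sidebar) and keeps the breadcrumb (a non-content element that blends into the article).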

05

Noise Removal Precision

Content cleaning and noise removal rely on heuristics and are not 100% accurate. Some artifacts may occasionally persist in the final Markdown.

  • Page Text: Fragments of navigation bars or menus that structurally resemble body content may be retained.
  • Hidden Content: Text present in the DOM but invisible to the user, such as license headers within code blocks, may be included in the output.
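
If leftover navigation fragments matter for your use case, a downstream cleanup pass can catch some of them. The sketch below is one possible post-processing step that you would run yourself on the output; it is not part of Docs2MD. The `link_density` and `drop_nav_like_lines` helpers and the 0.8 threshold are assumptions, and the heuristic will also remove legitimate lines that consist almost entirely of links.

```python
import re

# Matches inline Markdown links of the form [text](target).
LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")


def link_density(line):
    """Fraction of a line's visible non-space characters that are link text."""
    visible = LINK.sub(r"\1", line)  # the text a reader would see
    visible_len = len(re.sub(r"\s+", "", visible))
    if visible_len == 0:
        return 0.0
    linked = "".join(m.group(1) for m in LINK.finditer(line))
    return len(re.sub(r"\s+", "", linked)) / visible_len


def drop_nav_like_lines(markdown, threshold=0.8):
    """Remove lines made up mostly of links, leaving fenced code untouched."""
    kept = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        if in_fence or link_density(line) < threshold:
            kept.append(line)
    return "\n".join(kept)
```

A line such as `[Home](/) | [Docs](/docs) | [API](/api)` is removed, while a sentence that contains a single inline link is kept.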