When parsing DOCX files that contain tracked changes or comments, Tensorlake preserves this collaboration metadata in the HTML output. This enables workflows that need to process document revisions, review comments, or extract specific change history. Tracked changes and comments are preserved using semantic HTML markup:
Tracked Changes:
- Insertions: `<ins>inserted text</ins>` - Text that was added to the document
- Deletions: `<del>deleted text</del>` - Text that was removed or struck through
Comments:
- Comment ranges: `<span class="comment" data-note="comment text">highlighted text</span>` - Comments anchored to selected text
- Comment references: `<!-- Comment: comment text -->` - Comments at cursor positions without highlighted text
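One way to work with this markup is to resolve a fragment into its "all changes accepted" and "all changes rejected" text. The following is a minimal sketch using only Python's standard `html.parser` module. The `RevisionResolver` class, the `resolve` helper, and the sample fragment are hypothetical names and data made up for illustration; they are not part of the Tensorlake SDK, and the sample is composed from the patterns listed above rather than taken from real parser output.

```python
from html.parser import HTMLParser


class RevisionResolver(HTMLParser):
    """Build two plain-text views of a fragment: changes accepted vs. rejected."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.ins_depth = 0   # how many <ins> elements are currently open
        self.del_depth = 0   # how many <del> elements are currently open
        self.accepted = []   # text that survives if every change is accepted
        self.rejected = []   # text that survives if every change is rejected

    def handle_starttag(self, tag, attrs):
        if tag == "ins":
            self.ins_depth += 1
        elif tag == "del":
            self.del_depth += 1

    def handle_endtag(self, tag):
        if tag == "ins" and self.ins_depth > 0:
            self.ins_depth -= 1
        elif tag == "del" and self.del_depth > 0:
            self.del_depth -= 1

    def handle_data(self, data):
        # Accepting changes drops deleted text; rejecting them drops inserted text.
        if self.del_depth == 0:
            self.accepted.append(data)
        if self.ins_depth == 0:
            self.rejected.append(data)


def resolve(html):
    """Return (accepted_text, rejected_text) for an HTML fragment (hypothetical helper)."""
    parser = RevisionResolver()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.accepted), "".join(parser.rejected)


# Hypothetical fragment built from the patterns above, not real Tensorlake output.
SAMPLE = "The fee is <del>100</del><ins>120</ins> USD<!-- Comment: confirm with finance -->."

accepted, rejected = resolve(SAMPLE)
print(accepted)  # The fee is 120 USD.
print(rejected)  # The fee is 100 USD.
```

HTML comments are ignored by the default `handle_comment` hook, so comment references never leak into either view, while the highlighted text inside a comment range is kept because it is ordinary document text.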
Example Output
Extracting Change Data Programmatically
Use the HTML patterns above to extract specific content types.
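As a rough sketch of such an extraction, the code below collects insertions, deletions, comment ranges, and comment references from an HTML string with Python's standard `html.parser` module. The `ChangeExtractor` class, the `extract_changes` helper, and the sample fragment are assumptions made for illustration, not Tensorlake APIs or real output; the sketch also assumes the HTML has already been obtained from a parse result as a plain string.

```python
from html.parser import HTMLParser


class ChangeExtractor(HTMLParser):
    """Collect tracked changes and comments from an HTML string."""

    TRACKED_TAGS = ("ins", "del", "span")

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.insertions = []
        self.deletions = []
        self.comment_ranges = []      # (comment text, highlighted text) pairs
        self.comment_references = []  # comment text only
        self._open = []               # stack of [tag, kind, note, text parts]

    def handle_starttag(self, tag, attrs):
        if tag not in self.TRACKED_TAGS:
            return
        attr_map = dict(attrs)
        kind, note = tag, None
        if tag == "span":
            classes = (attr_map.get("class") or "").split()
            # Plain spans are still tracked so that closing tags stay balanced.
            kind = "comment" if "comment" in classes else "other"
            note = attr_map.get("data-note") or ""
        self._open.append([tag, kind, note, []])

    def handle_data(self, data):
        # Text belongs to every tracked element that currently encloses it.
        for entry in self._open:
            entry[3].append(data)

    def handle_endtag(self, tag):
        # Close the innermost open element with this tag name, if any.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, kind, note, parts = self._open.pop(i)
                text = "".join(parts)
                if kind == "ins":
                    self.insertions.append(text)
                elif kind == "del":
                    self.deletions.append(text)
                elif kind == "comment":
                    self.comment_ranges.append((note, text))
                break

    def handle_comment(self, data):
        # Comment references look like <!-- Comment: comment text -->
        text = data.strip()
        if text.startswith("Comment:"):
            self.comment_references.append(text[len("Comment:"):].strip())


def extract_changes(html):
    """Return a dict of extracted change data (hypothetical helper)."""
    parser = ChangeExtractor()
    parser.feed(html)
    parser.close()
    return {
        "insertions": parser.insertions,
        "deletions": parser.deletions,
        "comment_ranges": parser.comment_ranges,
        "comment_references": parser.comment_references,
    }


# Hypothetical fragment built from the documented patterns, not real output.
SAMPLE = (
    "<p>Payment is due in <del>30</del><ins>45</ins> days."
    "<!-- Comment: check with legal --> "
    '<span class="comment" data-note="Define this term">Net terms</span> apply.</p>'
)

changes = extract_changes(SAMPLE)
print(changes["insertions"])          # ['45']
print(changes["deletions"])           # ['30']
print(changes["comment_ranges"])      # [('Define this term', 'Net terms')]
print(changes["comment_references"])  # ['check with legal']
```

A parser-based approach is used here instead of regular expressions so that nested markup, such as an insertion inside a commented range, is attributed to every element that encloses it.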
Tracked changes are only preserved when parsing DOCX files that contain Microsoft Word’s revision history. Regular text formatting (bold, italic) is handled separately through standard HTML markup.