The last of the plugins, office_parser.py, parses DOCX, PPTX, and XLSX files and extracts the metadata embedded in their XML files. Because these Office documents are ZIP archives, we use the zipfile module from the standard library to unzip them and access their contents. This script has two functions, office_parser() and get_tags():
import zipfile
import os
from time import gmtime, strftime

from lxml import etree
import processors
...
def office_parser():
...
def get_tags():