Extracting text from a PDF using the Read PDF Text and Read PDF with OCR activities
Data can be extracted from PDF files either as columns of text or as individual elements. First, we will look at how to extract all of the data from a page. The Read PDF Text and Read PDF with OCR activities let you read large amounts of text from a file in one go. A page of a PDF file might contain both computer-generated text and text in image format. When the file is open in Adobe Reader, you can highlight computer-generated text individually, whereas text in image format is highlighted as a whole section. Find a PDF file that contains both kinds of text and use it for this exercise.

Let's build a simple workflow with these two activities by going through the following steps:
- Open the UiPath Studio solution that you created in the previous section.
- Open the Main.xaml workflow file to start editing the workflow.
- Drag and drop the Read PDF Text activity...