How To Extract Data From PDF Programmatically? (Using External API)

We had a client who needed to scrape data from a large batch of PDFs and save it in a database. We tried a couple of PHP libraries but could not get proper text output. We then found an external API that returned near-perfect text output. In this article we show how to use this API to extract text data from a PDF.

Disclaimer: All credit and copyright for the scraper belong to the maker of the API, convertpdftoword.net. This article is for educational purposes only.

API endpoint URL: https://api.convertpdftoword.net/api/Convert

HTML code:

<form id="ajaxUploadForm" enctype="multipart/form-data" action="https://api.convertpdftoword.net/api/Convert" method="post">
 <div id="optionId">
 <input type="file" id="uploadFile" name="uploadFile" onChange="jQuery('#ajaxUploadForm').show();return sendForm(this.form); "/>
 <br />
 </div>
</form>

The API URL takes the PDF file as input, along with option 4 to request text output. It returns a file name in the AJAX response; appending that name to https://api.convertpdftoword.net/api/Convert?fileName= gives a URL that returns the text file. Since we want the text itself, we use jQuery.get() to fetch the file's content, post it to our server-side AJAX handler, and print the handler's response in the console.
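The step of turning the returned file name into a download URL can be sketched as a small helper. This is a minimal sketch and not part of the API: the name buildResultUrl is hypothetical, and it assumes the response may arrive wrapped in whitespace or double quotes (as a JSON string would be), which we have not confirmed.

```javascript
// Endpoint taken from the article.
const CONVERT_ENDPOINT = "https://api.convertpdftoword.net/api/Convert";

// Hypothetical helper: builds the URL of the converted text file from the
// file name returned by the upload request.
function buildResultUrl(fileName) {
  // Assumption: the response may carry surrounding whitespace or double quotes.
  const cleaned = String(fileName).trim().replace(/^"|"$/g, "");
  // Encode the name so spaces and other special characters survive in the query string.
  return CONVERT_ENDPOINT + "?fileName=" + encodeURIComponent(cleaned);
}
```

Encoding the file name is a defensive choice; plain string concatenation also works as long as the returned name contains no special characters.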

JavaScript function:

function sendForm() {
    // Get the upload form; its action attribute holds the API endpoint URL
    var form = document.getElementById("ajaxUploadForm");
    // Build the request body: option 4 (text output) plus the uploaded PDF file
    var formData = new FormData();
    formData.append("option", 4);
    formData.append("file", document.getElementById("uploadFile").files[0]);
    var xhr = new XMLHttpRequest();
    xhr.open('POST', form.action, true);
    xhr.onload = function () {
        if (this.status == 200) {
            // The response is the converted file's name; fetch its text content
            jQuery.get("https://api.convertpdftoword.net/api/Convert?fileName=" + this.response, function (data) {
                // Send the extracted text to the server-side AJAX handler
                jQuery.ajax({
                    type: 'POST',
                    url: fs_object_name.ajax_url,
                    data: {action: 'actionFunction', mydata: data},
                    success: function (data1) {
                        console.log(data1);
                    }
                });
            });
        }
    };
    xhr.send(formData);
    return false;
}
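
The same two-step flow (upload, then fetch the text by file name) can also be written with fetch and async/await instead of XMLHttpRequest and jQuery. This is only a sketch under assumptions: the function name extractPdfText and the injectable fetchImpl parameter are hypothetical, and it assumes the upload response body is the bare name of the converted file.

```javascript
// Endpoint taken from the article.
const API_URL = "https://api.convertpdftoword.net/api/Convert";

// Hypothetical helper: uploads a PDF and resolves with the extracted text.
// fetchImpl is injectable so the flow can be exercised without network access.
async function extractPdfText(pdfFile, fetchImpl = fetch) {
  // Step 1: upload the PDF with option 4 (text output), as in sendForm().
  const formData = new FormData();
  formData.append("option", "4");
  formData.append("file", pdfFile);
  const uploadResponse = await fetchImpl(API_URL, { method: "POST", body: formData });
  if (!uploadResponse.ok) {
    throw new Error("Upload failed with status " + uploadResponse.status);
  }
  // Assumption: the response body is the bare name of the converted file.
  const fileName = (await uploadResponse.text()).trim();

  // Step 2: request the converted text file by name.
  const textResponse = await fetchImpl(API_URL + "?fileName=" + encodeURIComponent(fileName));
  if (!textResponse.ok) {
    throw new Error("Download failed with status " + textResponse.status);
  }
  return textResponse.text();
}
```

In the browser it could be called as extractPdfText(document.getElementById("uploadFile").files[0]).then(console.log).catch(console.error). Unlike the original function, this version reports failed requests instead of silently ignoring any status other than 200.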

Published: 2017-10-27 | Category: APIs