PDF to Text API

Accurately Extract Text from Any PDF with Our Advanced OCR-Powered API.

Code Examples in Popular Languages

Integrate the PDF to Text API into your application quickly. Each example below sends the same multipart/form-data POST request, with the fields from, to, and file, in a different language.

CURL Request

# --form makes curl send multipart/form-data and set the Content-Type header (with boundary) itself.
curl --location 'https://theonlineconverter.com/api/v1/document-converter' \
--header 'x-api-key: enter_your_api_key' \
--form 'from="pdf"' \
--form 'to="txt"' \
--form 'file=@"/D:/data/Document/pdf/other.pdf"'

JavaScript Fetch

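// Do not set Content-Type manually: the browser adds the multipart boundary for a FormData body.
const myHeaders = new Headers();
myHeaders.append("x-api-key", "enter_your_api_key");

// Assumes the page contains an <input type="file"> element holding the PDF to upload.
const fileInput = document.querySelector('input[type="file"]');

const formdata = new FormData();
formdata.append("from", "pdf");
formdata.append("to", "txt");
formdata.append("file", fileInput.files[0], "other.pdf");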

const requestOptions = {
  method: "POST",
  headers: myHeaders,
  body: formdata,
  redirect: "follow"
};

fetch("https://theonlineconverter.com/api/v1/document-converter", requestOptions)
  .then((response) => response.text())
  .then((result) => console.log(result))
  .catch((error) => console.error(error));

Ruby Net::HTTP

require "uri"
require "net/http"

url = URI("https://theonlineconverter.com/api/v1/document-converter")

https = Net::HTTP.new(url.host, url.port)
https.use_ssl = true

request = Net::HTTP::Post.new(url)
request["x-api-key"] = "enter_your_api_key"
# set_form builds the multipart/form-data body and sets the matching Content-Type header.
form_data = [['from', 'pdf'], ['to', 'txt'], ['file', File.open('/D:/data/Document/pdf/other.pdf')]]
request.set_form form_data, 'multipart/form-data'

response = https.request(request)
puts response.read_body

Python Requests

import requests

url = "https://theonlineconverter.com/api/v1/document-converter"

payload = {'from': 'pdf', 'to': 'txt'}
# Do not set Content-Type here: requests generates the multipart/form-data header (with boundary) from `files`.
headers = {
  'x-api-key': 'enter_your_api_key'
}

# Open the PDF in a context manager so the file handle is closed after the upload.
with open('/D:/data/Document/pdf/other.pdf', 'rb') as pdf:
  files = [
    ('file', ('other.pdf', pdf, 'application/pdf'))
  ]
  response = requests.request("POST", url, headers=headers, data=payload, files=files)

print(response.text)
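
The examples in this section leave the Content-Type header to the HTTP library because a multipart/form-data request must announce the boundary string that separates its parts. The following is a minimal sketch of what the library does for you, using only the Python standard library. The helper names build_multipart and build_request are hypothetical and not part of the API; the field names, header name, and URL are taken from the examples above. The request is only constructed here, not sent.

```python
import uuid
import urllib.request

API_URL = "https://theonlineconverter.com/api/v1/document-converter"


def build_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one PDF file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    # One part per plain text field.
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n'
             f"\r\n"
             f"{value}\r\n").encode("utf-8")
        )
    # The file part carries its own per-part Content-Type.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
         f"Content-Type: application/pdf\r\n"
         f"\r\n").encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    # The closing delimiter ends with two extra dashes.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, pdf_bytes, filename="other.pdf"):
    """Build (but do not send) the POST request used throughout this section."""
    body, content_type = build_multipart(
        {"from": "pdf", "to": "txt"}, "file", filename, pdf_bytes
    )
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "x-api-key": api_key},
    )


# Placeholder bytes stand in for a real PDF file.
request = build_request("enter_your_api_key", b"%PDF-1.4 placeholder bytes")
# The header names the boundary that the body uses between its parts.
print(request.get_header("Content-type"))
```

Passing the resulting request to urllib.request.urlopen would send it. A hard-coded Content-Type of application/json would not describe this body and would omit the boundary, which is why the examples here do not set one.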

PHP Guzzle

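<?php
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\Request;
use GuzzleHttp\Psr7\Utils;

$client = new Client();
// Guzzle sets the multipart/form-data Content-Type (with boundary) from the 'multipart' option.
$headers = [
  'x-api-key' => 'enter_your_api_key'
];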
$options = [
  'multipart' => [
    [
      'name' => 'from',
      'contents' => 'pdf'
    ],
    [
      'name' => 'to',
      'contents' => 'txt'
    ],
    [
      'name' => 'file',
      'contents' => Utils::tryFopen('/D:/data/Document/pdf/other.pdf', 'r'),
      'filename' => 'other.pdf',
      'headers'  => [
        'Content-Type' => 'application/pdf'
      ]
    ]
]];
$request = new Request('POST', 'https://theonlineconverter.com/api/v1/document-converter', $headers);
$res = $client->sendAsync($request, $options)->wait();
echo $res->getBody();

Java OkHttp

OkHttpClient client = new OkHttpClient().newBuilder()
  .build();
// MultipartBody supplies the multipart/form-data Content-Type, so no explicit header is needed.
RequestBody body = new MultipartBody.Builder().setType(MultipartBody.FORM)
  .addFormDataPart("from","pdf")
  .addFormDataPart("to","txt")
  .addFormDataPart("file","other.pdf",
    RequestBody.create(MediaType.parse("application/octet-stream"),
    new File("/D:/data/Document/pdf/other.pdf")))
  .build();
Request request = new Request.Builder()
  .url("https://theonlineconverter.com/api/v1/document-converter")
  .method("POST", body)
  .addHeader("x-api-key", "enter_your_api_key")
  .build();
Response response = client.newCall(request).execute();
// Print the response body, as the other examples do.
System.out.println(response.body().string());

Go net/http

package main

import (
  "fmt"
  "bytes"
  "mime/multipart"
  "os"
  "path/filepath"
  "net/http"
  "io"
)

func main() {

  url := "https://theonlineconverter.com/api/v1/document-converter"
  method := "POST"

  payload := &bytes.Buffer{}
  writer := multipart.NewWriter(payload)
  _ = writer.WriteField("from", "pdf")
  _ = writer.WriteField("to", "txt")
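  // Check the error from os.Open before deferring Close on the file.
  file, err := os.Open("/D:/data/Document/pdf/other.pdf")
  if err != nil {
    fmt.Println(err)
    return
  }
  defer file.Close()
  part, err := writer.CreateFormFile("file", filepath.Base("/D:/data/Document/pdf/other.pdf"))
  if err != nil {
    fmt.Println(err)
    return
  }
  if _, err = io.Copy(part, file); err != nil {
    fmt.Println(err)
    return
  }
  // Close the writer to append the terminating multipart boundary.
  if err = writer.Close(); err != nil {
    fmt.Println(err)
    return
  }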


  client := &http.Client {
  }
  req, err := http.NewRequest(method, url, payload)

  if err != nil {
    fmt.Println(err)
    return
  }
  req.Header.Add("x-api-key", "enter_your_api_key")
  // Use the writer's Content-Type so the header carries the multipart boundary.
  req.Header.Set("Content-Type", writer.FormDataContentType())
  res, err := client.Do(req)
  if err != nil {
    fmt.Println(err)
    return
  }
  defer res.Body.Close()

  body, err := io.ReadAll(res.Body)
  if err != nil {
    fmt.Println(err)
    return
  }
  fmt.Println(string(body))
}

C# RestSharp

var options = new RestClientOptions("https://theonlineconverter.com")
{
  MaxTimeout = -1,
};
var client = new RestClient(options);
var request = new RestRequest("/api/v1/document-converter", Method.Post);
// No explicit Content-Type: AlwaysMultipartFormData below makes RestSharp send multipart/form-data.
request.AddHeader("x-api-key", "enter_your_api_key");
request.AlwaysMultipartFormData = true;
request.AddParameter("from", "pdf");
request.AddParameter("to", "txt");
request.AddFile("file", "/D:/data/Document/pdf/other.pdf");
RestResponse response = await client.ExecuteAsync(request);
Console.WriteLine(response.Content);

Key Features & Capabilities

The API combines direct text extraction with OCR, so both native and scanned PDFs can be converted to text.

Advanced OCR Engine

The built-in OCR engine recognizes and extracts text from scanned or image-based PDF documents, so content that exists only as an image is not left behind.

Native & Scanned PDF Support

Seamlessly handle both text-selectable (native) PDFs and image-only (scanned) PDFs, thanks to our intelligent content detection and advanced OCR.

High Extraction Accuracy

Text is extracted accurately across complex layouts, varied fonts, and multiple languages. Accuracy may drop on extremely poor-quality scans.

Table Data Extraction

Intelligently detect and extract tabular data from PDFs, converting it into structured formats that are ready for analysis or database integration.

Multi-Language Recognition

Extract text from PDFs in a multitude of languages, enabling global application development and international document processing.

Secure & Privacy Compliant

All PDF documents are processed over secure connections with robust data encryption and strict privacy policies, ensuring your data remains confidential.

Frequently Asked Questions

Find quick answers to common questions about our PDF to Text API to help you get started and optimize your text extraction workflows.

What's the difference between extracting text from a native PDF versus a scanned PDF?

Native PDFs contain selectable text, allowing direct extraction. Scanned PDFs are essentially images, so our OCR engine must recognize the image of the text and convert it into digital characters before extraction. Our API handles both.

Can I extract text from specific pages of a PDF?

Yes, you can specify individual pages or a range of pages in your API request, allowing you to extract only the relevant content from a multi-page document.

How accurate is the text extraction from complex PDFs?

Our OCR engine is trained to handle complex layouts, various fonts, and mixed content (images, tables, text) within PDFs, resulting in very high accuracy. Performance may vary with extremely poor-quality scans.

Is it possible to extract data from tables within a PDF?

Yes. Our API can detect tables within PDFs and extract their content into a structured format (e.g., JSON) for easy use.

What output formats are available for the extracted text?

You can receive the output as clean plain text, structured JSON (which can include metadata like page numbers), or Markdown.

How long are my PDFs stored on your servers?

For privacy and security, uploaded PDFs are processed and then securely deleted from our servers within a short, specified timeframe, typically immediately after successful extraction.