PDF analysis; extraction and repurposing of text, graphics & metadata in PDF files (PDF data mining): Citation Software

 



PDF data-mining software lets you extract information from PDF files and repurpose it.

Citation Software Inc.  Specialists in variable-data publishing since 1986
 
www.CitationSoftware.com     info@CitationSoftware.com

 

These are the products we offer for PDF analysis and data mining (extraction and repurposing of text, graphics & metadata):
  • XpdfAnalyze SDK
  • XpdfInfo SDK
  • XpdfText SDK

These products are programmer's libraries/toolkits that make it easy to do dynamic PDF text extraction, PDF metadata extraction, and other kinds of PDF analysis and data mining.
 

XpdfAnalyze SDK         

The XpdfAnalyze SDK is a very affordable developer's library/SDK that makes it easy to determine the object types and colors used on one or more pages in a PDF file. The object types are:
  • images
  • text strings
  • strokes (lines)
  • fills (filled polygons)
Object-type information can be used to categorize PDF files as image-only, text-only, or image-and-text.
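
As a rough illustration of that categorization, here is a minimal sketch in C. It is not part of the XpdfAnalyze SDK: the `PdfCategory` enum and `classify_pdf` function are hypothetical names, and the two counts are assumed to come from the SDK's analysis results (for example, the image count from `pdfGetNumImages` and the text-string count from a corresponding call).

```c
/* Hypothetical category labels; these are NOT part of the XpdfAnalyze API. */
typedef enum {
    PDF_EMPTY,          /* neither images nor text strings were found */
    PDF_IMAGE_ONLY,     /* images but no text strings                 */
    PDF_TEXT_ONLY,      /* text strings but no images                 */
    PDF_IMAGE_AND_TEXT  /* both images and text strings               */
} PdfCategory;

/* Classify an analyzed page range from its object counts.
 * The counts are assumed to have been obtained from the SDK beforehand. */
PdfCategory classify_pdf(int num_images, int num_text_strings)
{
    if (num_images > 0 && num_text_strings > 0)
        return PDF_IMAGE_AND_TEXT;
    if (num_images > 0)
        return PDF_IMAGE_ONLY;
    if (num_text_strings > 0)
        return PDF_TEXT_ONLY;
    return PDF_EMPTY;
}
```

A workflow could use the result to decide what to do next with each file; for example, files classified as image-only might be treated as scans that contain no extractable text.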

Color information includes color spaces (DeviceRGB, DeviceCMYK, Separation, etc.), as well as information on which process colors (CMYK) and/or custom colors (spot colors) are used.

The XpdfAnalyze SDK can be used in an automated workflow to determine which pages contain color and which are black & white.
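
The sketch below shows one way such a color/black & white decision could be made once the color information for each page is known. The `PageColorUsage` structure and both functions are hypothetical and are not the SDK's own types or calls; the sketch also assumes that any use of cyan, magenta, yellow, or a spot color makes a page "color," which may not match every workflow's definition.

```c
#include <stdbool.h>

/* Hypothetical per-page summary; assumed to be filled in from the
 * SDK's color analysis results. NOT an XpdfAnalyze type. */
typedef struct {
    bool uses_cyan;
    bool uses_magenta;
    bool uses_yellow;
    bool uses_black;
    int  num_spot_colors;   /* number of custom (spot) colors used */
} PageColorUsage;

/* Assumption: a page counts as color if it uses any process color
 * other than black, or any spot color at all. */
bool page_is_color(const PageColorUsage *u)
{
    return u->uses_cyan || u->uses_magenta || u->uses_yellow
        || u->num_spot_colors > 0;
}

/* Count the color pages in an array of per-page summaries. */
int count_color_pages(const PageColorUsage *pages, int num_pages)
{
    int count = 0;
    for (int i = 0; i < num_pages; i++) {
        if (page_is_color(&pages[i]))
            count++;
    }
    return count;
}
```

In a print workflow, a count like this could be used to estimate cost or to route color pages and black & white pages to different devices.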

The XpdfAnalyze SDK is available as a COM component or a DLL for Windows platforms and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.
The XpdfAnalyze SDK is easy to use!

PDFHandle pdf;
int n;

pdfLoadFile(&pdf, "MyFile.pdf");

// analyze pages 1-4
pdfAnalyzePages(pdf, 1, 4);

// number of images on pages 1-4
n = pdfGetNumImages(pdf);

A free trial version of the XpdfAnalyze SDK for Windows is available for download. If you're not using Windows, call us at 888-260-7316 to get your free trial version.

To purchase, call us at 888-260-7316 or, if you prefer, fill out our order form and fax it to 207-433-1160.

We accept American Express, Discover, MasterCard, and Visa.

Pricing starts at $235.00 USD for a developer's license and $9.00 USD per unit for runtime licenses.

Volume discounts are available. Call us at 888-260-7316 to get a price quote.

Pricing is subject to change without notice.

Pricing shown here might not be available to customers in particular geographic locations.



*Payment of an additional fee for maintenance & support is optional but recommended.

XpdfInfo SDK         

The XpdfInfo SDK is a very affordable developer's library/SDK that reads a PDF file and provides access to the following information:
  • page count
  • page size (per page)
  • standard metadata fields: title, subject, keywords, author, creator, producer, creation date, modification date
  • custom metadata fields (depending on the software used to create the PDF file)
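
Page sizes in PDF files are normally expressed in points, where one point is 1/72 inch. The helpers below are a small sketch of how a reported page size might be converted into more familiar units. They are illustrative only: the function names are hypothetical, and the sketch assumes the SDK reports width and height in points.

```c
/* Convert a length in PDF points (1/72 inch) to inches. */
double points_to_inches(double pt)
{
    return pt / 72.0;
}

/* Convert a length in PDF points to millimeters (1 inch = 25.4 mm). */
double points_to_mm(double pt)
{
    return pt * 25.4 / 72.0;
}

/* Report whether a page is wider than it is tall. */
int is_landscape(double width_pt, double height_pt)
{
    return width_pt > height_pt;
}
```

For example, a page reported as 612 x 792 points is 8.5 x 11 inches (US Letter) in portrait orientation.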
The XpdfInfo SDK is available as a COM component or a DLL for Windows platforms and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.
The XpdfInfo SDK is easy to use!

PDFHandle pdf;
char *title;
int length;

pdfLoadFile(&pdf, "MyFile.pdf");
title = pdfGetTitle(pdf, &length);
printf("%s\n", title);

A free trial version of the XpdfInfo SDK for Windows is available for download. If you're not using Windows, call us at 888-260-7316 to get your free trial version.

To purchase, call us at 888-260-7316 or, if you prefer, fill out our order form and fax it to 207-433-1160.

We accept American Express, Discover, MasterCard, and Visa.

Pricing starts at $235.00 USD for a developer's license and $9.00 USD per unit for runtime licenses.

Volume discounts are available. Call us at 888-260-7316 to get a price quote.

Pricing is subject to change without notice.

Pricing shown here might not be available to customers in particular geographic locations.



*Payment of an additional fee for maintenance & support is optional but recommended.

XpdfText SDK         

The XpdfText SDK is a very affordable developer's library/SDK that extracts plain text from a PDF file. The PDF file can be on disk or in memory; likewise, the text can be extracted to memory or directly to disk.

The XpdfText SDK can be used in different ways:

  • Convert entire PDF files or individual pages to plain text, maintaining layout or converting to "reading order."
  • Extract text from a specified rectangle on a page (useful for extracting text from forms).
  • Convert pages into word lists: for each word, you can retrieve the font name and font size, text color, position on the page, and character offset (for highlight files).
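
To show how word positions relate to rectangle-based extraction, here is a minimal sketch that selects the words lying inside a given rectangle, such as a form field. The `Word` and `Rect` structures and both functions are hypothetical and are not the SDK's own data types; the sketch assumes each word comes with a bounding box in the same coordinate system as the rectangle.

```c
#include <stddef.h>

/* Hypothetical word record; NOT an XpdfText type. The bounding box is
 * assumed to use the same page coordinates as the query rectangle. */
typedef struct {
    const char *text;
    double x_min, y_min, x_max, y_max;
} Word;

/* Hypothetical axis-aligned rectangle on the page. */
typedef struct {
    double x_min, y_min, x_max, y_max;
} Rect;

/* Return nonzero if the word's bounding box lies entirely inside r. */
int word_in_rect(const Word *w, const Rect *r)
{
    return w->x_min >= r->x_min && w->x_max <= r->x_max
        && w->y_min >= r->y_min && w->y_max <= r->y_max;
}

/* Copy pointers to the words inside r into out (which must have room
 * for n entries), preserving their order. Returns the number selected. */
size_t select_words(const Word *words, size_t n, const Rect *r,
                    const Word **out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (word_in_rect(&words[i], r))
            out[count++] = &words[i];
    }
    return count;
}
```

The sketch requires a word to be fully contained in the rectangle; a real application might instead accept words that merely overlap it, depending on how tightly the form fields are drawn.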
The extracted text can be converted to a wide choice of standard encodings:

  • UTF-8 Unicode
  • Latin1 (8-bit ISO-8859-1)
  • 7-bit ASCII
  • ISO-2022-CN (simplified Chinese)
  • EUC-CN (simplified Chinese)
  • Big5 (traditional Chinese)
  • KOI8-R (Cyrillic)
  • ISO-8859-7 (Greek)
  • ISO-2022-JP (Japanese)
  • EUC-JP (Japanese)
  • Shift-JIS (Japanese)
  • KSX1001 (Korean)
  • TIS-620 (Thai)
  • ISO-8859-9 (Turkish)
Other encodings can be supported upon request.
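
The choice of encoding determines how each extracted character is stored as bytes, and whether it can be represented at all. The sketch below illustrates the difference for two of the encodings listed above. It is not the SDK's implementation, and the function names are hypothetical: `utf8_encode` follows the standard UTF-8 rules, and `latin1_encode` handles only the code points that Latin1 can represent.

```c
#include <stddef.h>

/* Encode one Unicode code point as UTF-8.
 * out must have room for 4 bytes. Returns the number of bytes written
 * (1 to 4), or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogates are not valid scalar values */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Encode one code point as Latin1 (ISO-8859-1).
 * Returns 1 on success, or 0 if the character is outside U+0000..U+00FF. */
size_t latin1_encode(unsigned long cp, unsigned char *out)
{
    if (cp > 0xFF)
        return 0;
    out[0] = (unsigned char)cp;
    return 1;
}
```

For instance, the euro sign (U+20AC) takes three bytes in UTF-8 but has no Latin1 representation, so text containing characters beyond Western European ones is generally safer to extract as UTF-8.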

In addition to the features described above, the XpdfText SDK includes all the functionality of the XpdfInfo SDK.

The XpdfText SDK is available as a COM component or a DLL for Windows platforms and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.
The XpdfText SDK is easy to use!

PDFHandle pdf;
char *buf;
int length;

pdfLoadFile(&pdf, "MyFile.pdf");

// convert to a text file on disk...
pdfConvertToTextFile(pdf, 1, 5, "MyFile.txt");

// ... or convert in memory
buf = pdfConvertToTextString(pdf, 1, 5, &length);

A free trial version of the XpdfText SDK for Windows is available for download. If you're not using Windows, call us at 888-260-7316 to get your free trial version.

To purchase, call us at 888-260-7316 or, if you prefer, fill out our order form and fax it to 207-433-1160.

We accept American Express, Discover, MasterCard, and Visa.

Pricing starts at $475.00 USD for a developer's license and $18.00 USD per unit for runtime licenses.

Volume discounts are available. Call us at 888-260-7316 to get a price quote.

Pricing is subject to change without notice.

Pricing shown here might not be available to customers in particular geographic locations.



*Payment of an additional fee for maintenance & support is optional but recommended.
 
Can't find exactly what you need? Not sure exactly what you need? Contact us by phone at 888-260-7316, or send e-mail to info@CitationSoftware.com. We can help you find appropriate software for your requirements.
 
 




    

Copyright © 2010 Citation Software Inc.