PDF analysis; extraction and repurposing of text, graphics & metadata in PDF Files (PDF Data Mining): Citation Software

 


Citation Software Inc.
 Specialists in variable-data publishing since 1986
 
www.CitationSoftware.com     info@CitationSoftware.com
          508-436-2543
       

  
  
 

 

These are the products we offer for PDF analysis and data mining (extraction and repurposing of text, graphics & metadata).

XpdfAnalyze SDK   XpdfText SDK   XpdfInfo SDK

XpdfAnalyze SDK

The XpdfAnalyze SDK is a developer's library/SDK that makes it easy to determine the object types and colors used on one or more pages in a PDF file. Object types are:
  • images
  • text strings
  • strokes (lines)
  • fills (filled polygons)
Object-type information can be used to detect image-only, text-only, and image-and-text PDF files.
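
As a rough illustration of the classification described above, the sketch below turns per-type object counts into an image-only, text-only, or image-and-text verdict. The ObjectCounts struct, the PdfKind enum, and classify_counts are hypothetical names introduced for this example and are not part of the XpdfAnalyze SDK; it also assumes that strokes and fills do not change the verdict when images or text are present.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical container for the four object-type counts (not an SDK type).
typedef struct {
    int numImages;
    int numTextStrings;
    int numStrokes;
    int numFills;
} ObjectCounts;

// Hypothetical result categories for the analyzed page range.
typedef enum {
    PDF_KIND_EMPTY,
    PDF_KIND_IMAGE_ONLY,
    PDF_KIND_TEXT_ONLY,
    PDF_KIND_IMAGE_AND_TEXT,
    PDF_KIND_VECTOR_ONLY
} PdfKind;

// Classify a page range from its object counts.
// Assumption: strokes and fills only matter when there are no images or text.
PdfKind classify_counts(const ObjectCounts *c)
{
    assert(c != NULL);
    int hasImages = c->numImages > 0;
    int hasText = c->numTextStrings > 0;

    if (hasImages && hasText) return PDF_KIND_IMAGE_AND_TEXT;
    if (hasImages) return PDF_KIND_IMAGE_ONLY;
    if (hasText) return PDF_KIND_TEXT_ONLY;
    if (c->numStrokes > 0 || c->numFills > 0) return PDF_KIND_VECTOR_ONLY;
    return PDF_KIND_EMPTY;
}
```

An image-only result is a common sign of a scanned document, which a workflow might route to OCR before attempting text extraction.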

Color information includes color spaces (DeviceRGB, DeviceCMYK, Separation, etc.), as well as information on which process colors (CMYK) and/or custom colors (spot colors) are used.

The XpdfAnalyze SDK can be used in an automated workflow to determine which pages contain color and which are black & white.
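
One way such a color versus black & white decision might be made is sketched below. The PageColorUsage struct and both functions are hypothetical and do not come from the SDK; the sketch assumes that per-page process-color and spot-color usage has already been collected, and that any spot color counts as color.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical per-page summary of colorant usage (not an SDK type).
typedef struct {
    bool usesCyan;
    bool usesMagenta;
    bool usesYellow;
    bool usesBlack;
    int numSpotColors;
} PageColorUsage;

// A page is treated as color if it uses C, M, or Y, or any spot color.
// Assumption: pages that use only black (K), or nothing at all, are black & white.
bool page_is_color(const PageColorUsage *u)
{
    assert(u != NULL);
    return u->usesCyan || u->usesMagenta || u->usesYellow ||
           u->numSpotColors > 0;
}

// Count the color pages in an array of per-page summaries.
size_t count_color_pages(const PageColorUsage *pages, size_t n)
{
    assert(pages != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (page_is_color(&pages[i])) {
            count++;
        }
    }
    return count;
}
```

A print workflow could use the per-page result to send black & white pages to a cheaper device and only the color pages to a color press.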

The XpdfAnalyze SDK is available as a COM component or a DLL for Windows platforms and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.

The XpdfAnalyze SDK is easy to use!

PDFHandle pdf;
int n;

pdfLoadFile(&pdf, "MyFile.pdf");

// analyze pages 1-4
pdfAnalyzePages(pdf, 1, 4);

// number of images on pages 1-4
n = pdfGetNumImages(pdf);

RELATED PRODUCTS:
 • XpdfText SDK
 • XpdfInfo SDK

Pricing starts at $235.00 USD for a developer's license and $9.00 USD per unit for runtime licenses. (Pricing is subject to change without notice.)

Volume discounts are available. Call us at 508-436-2543 to get a price quote.


If you're not using Windows,
call us at 508-436-2543 to get your free trial version.









XpdfText SDK

The XpdfText SDK is a developer's library/SDK that extracts plain text from a PDF file. The PDF file can be on disk or in memory; likewise, the text can be extracted to memory or directly to disk.

The XpdfText SDK can be used in different ways:
  • Convert entire PDF files or individual pages to plain text, maintaining layout or converting to "reading order."
  • Extract text from a specified rectangle on a page (useful for extracting text from forms).
  • Convert pages into word lists: for each word, you can retrieve font name and font size, text color, word position on the page, and character offset (for highlight files).
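
To show how a word list with positions can serve the form-extraction case, here is a minimal sketch that keeps only the words whose bounding boxes lie inside a rectangle. The Word and Rect structs and extract_rect_text are hypothetical names for this example, not SDK types or calls, and the coordinate values used with them are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical word record: text plus bounding box on the page (not an SDK type).
typedef struct {
    const char *text;
    double xMin, yMin, xMax, yMax;
} Word;

// Hypothetical rectangle describing a form field.
typedef struct {
    double xMin, yMin, xMax, yMax;
} Rect;

// True if the word's bounding box lies entirely inside the rectangle.
static int word_in_rect(const Word *w, const Rect *r)
{
    return w->xMin >= r->xMin && w->xMax <= r->xMax &&
           w->yMin >= r->yMin && w->yMax <= r->yMax;
}

// Join the words inside the rectangle with single spaces, in list order.
// Stops when the next word would not fit; the output is always NUL-terminated.
// Returns the number of words copied.
size_t extract_rect_text(const Word *words, size_t n, const Rect *r,
                         char *out, size_t outSize)
{
    assert(words != NULL || n == 0);
    assert(r != NULL && out != NULL && outSize > 0);

    size_t used = 0;
    size_t copied = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (!word_in_rect(&words[i], r)) {
            continue;
        }
        size_t len = strlen(words[i].text);
        size_t sep = copied > 0 ? 1 : 0;
        if (used + sep + len + 1 > outSize) {
            break;
        }
        if (sep) {
            out[used++] = ' ';
        }
        memcpy(out + used, words[i].text, len);
        used += len;
        out[used] = '\0';
        copied++;
    }
    return copied;
}
```

Requiring the whole bounding box to be inside the rectangle is one design choice; a more forgiving variant could accept words that merely overlap the field.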
The extracted text can be converted to a wide choice of standard encodings:
  • UTF-8 Unicode
  • Latin1 (8-bit ISO-8859-1)
  • 7-bit ASCII
  • ISO-2022-CN (simplified Chinese)
  • EUC-CN (simplified Chinese)
  • Big5 (traditional Chinese)
  • KOI8-R (Cyrillic)
  • ISO-8859-7 (Greek)
  • ISO-2022-JP (Japanese)
  • EUC-JP (Japanese)
  • Shift-JIS (Japanese)
  • KSX1001 (Korean)
  • TIS-620 (Thai)
  • ISO-8859-9 (Turkish)
Other encodings can be supported upon request.
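
For readers unfamiliar with what choosing an output encoding involves, the sketch below encodes a single Unicode code point as UTF-8 and as Latin1. It is a generic illustration of the two encodings, not the SDK's conversion code; the function names are hypothetical, and the '?' substitution for characters that Latin1 cannot represent is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Encode one Unicode code point as UTF-8 into out (room for 4 bytes needed).
// Returns the number of bytes written, or 0 for an invalid code point
// (a surrogate, or a value above U+10FFFF).
size_t encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

// Encode one code point as Latin1 (ISO-8859-1).
// Assumption: characters outside Latin1 are replaced with '?'.
unsigned char encode_latin1(uint32_t cp)
{
    return cp <= 0xFF ? (unsigned char)cp : (unsigned char)'?';
}
```

The example makes the trade-off visible: UTF-8 can represent every character at the cost of variable-length output, while an 8-bit encoding such as Latin1 is compact but lossy for text outside its repertoire.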

In addition to the features described above, the XpdfText SDK includes all the functionality of the XpdfInfo SDK.

The XpdfText SDK is easy to use!

PDFHandle pdf;
char *buf;
int length;

pdfLoadFile(&pdf, "MyFile.pdf");

// convert to a text file on disk...
pdfConvertToTextFile(pdf, 1, 5, "MyFile.txt");

// ... or convert in memory
buf = pdfConvertToTextString(pdf, 1, 5, &length);

RELATED PRODUCTS:
 • XpdfAnalyze SDK
 • XpdfInfo SDK
The XpdfText SDK is available as a COM component or a DLL for Windows platforms and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.

Pricing starts at $475.00 USD for a developer's license and $18.00 USD per unit for runtime licenses. (Pricing is subject to change without notice.)

Volume discounts are available. Call us at 508-436-2543 to get a price quote.


If you're not using Windows,
call us at 508-436-2543 to get your free trial version.









XpdfInfo SDK

The XpdfInfo SDK is a developer's library/SDK that reads a PDF file and provides access to the following information:
  • page count
  • page size (per page)
  • standard metadata fields: title, subject, keywords, author, creator, producer, creation date, modification date
  • custom metadata fields (depending on the software used to create the PDF file)
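
Creation and modification dates in PDF metadata are conventionally stored as strings of the form D:YYYYMMDDHHmmSS followed by an optional time-zone offset, with trailing fields allowed to be missing. The sketch below parses such a string into numeric fields. Whether the XpdfInfo SDK returns dates in this raw form is an assumption; PdfDate and parse_pdf_date are hypothetical names, and the time-zone offset is deliberately ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

// Hypothetical parsed-date record (not an SDK type).
typedef struct {
    int year, month, day, hour, minute, second;
} PdfDate;

// Read exactly two decimal digits at *p; advance *p on success.
static int two_digits(const char **p, int *out)
{
    const char *s = *p;
    if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1])) {
        return 0;
    }
    *out = (s[0] - '0') * 10 + (s[1] - '0');
    *p = s + 2;
    return 1;
}

// Parse "D:YYYYMMDDHHmmSS..." into out. Missing trailing fields keep their
// defaults (month 1, day 1, time 00:00:00). Any time-zone suffix is ignored.
// Returns 1 on success, 0 if the year is missing or a field is out of range.
int parse_pdf_date(const char *s, PdfDate *out)
{
    assert(s != NULL && out != NULL);
    if (strncmp(s, "D:", 2) == 0) {
        s += 2;
    }

    PdfDate d = {0, 1, 1, 0, 0, 0};
    int hi, lo;
    if (!two_digits(&s, &hi) || !two_digits(&s, &lo)) {
        return 0;
    }
    d.year = hi * 100 + lo;

    // Optional fields, in order; stop at the first one that is absent.
    (void)(two_digits(&s, &d.month) && two_digits(&s, &d.day) &&
           two_digits(&s, &d.hour) && two_digits(&s, &d.minute) &&
           two_digits(&s, &d.second));

    if (d.month < 1 || d.month > 12 || d.day < 1 || d.day > 31 ||
        d.hour > 23 || d.minute > 59 || d.second > 59) {
        return 0;
    }
    *out = d;
    return 1;
}
```

Parsed fields like these are easier to sort, filter, or reformat than the raw metadata string.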
The XpdfInfo SDK is easy to use!

#include <stdio.h>   // for printf

PDFHandle pdf;
char *title;
int length;

// open the PDF file, then read the title from its metadata
pdfLoadFile(&pdf, "MyFile.pdf");
title = pdfGetTitle(pdf, &length);
printf("%s\n", title);
 
RELATED PRODUCTS:
 • XpdfAnalyze SDK
 • XpdfText SDK

The XpdfInfo SDK is available as a COM component or a DLL for Windows platforms and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.

Pricing starts at $235.00 USD for a developer's license and $9.00 USD per unit for runtime licenses. (Pricing is subject to change without notice.)

Volume discounts are available. Call us at 508-436-2543 to get a price quote.


If you're not using Windows,
call us at 508-436-2543 to get your free trial version.
 
Didn't find exactly what you need? Not sure exactly what you need? Contact us by phone at 508-436-2543, or send e-mail to info@CitationSoftware.com. We can help you find appropriate software for your requirements.
 
 




    

Copyright © 2010 Citation Software Inc.