These are the products we offer for PDF analysis and data mining
(extraction and repurposing of text, graphics & metadata).

XpdfAnalyze SDK

The XpdfAnalyze SDK is a developer's library/SDK that makes it easy to
determine the object types and colors used on one or more pages in
a PDF file. Object types are:
- images
- text strings
- strokes (lines)
- fills (filled polygons)
Object-type information can be used to detect image-only, text-only,
and image-and-text PDF files.
Color information includes color spaces (DeviceRGB, DeviceCMYK,
Separation, etc.), as well as information on which process
colors (CMYK) and/or custom colors (spot colors) are used.
The XpdfAnalyze SDK can be used in an automated workflow
to determine which pages contain color and which are black & white.
The XpdfAnalyze SDK is easy to use!

PDFHandle pdf;
int n;
// open the PDF file
pdfLoadFile(&pdf, "MyFile.pdf");
// analyze pages 1-4
pdfAnalyzePages(pdf, 1, 4);
// number of images on pages 1-4
n = pdfGetNumImages(pdf);

The XpdfAnalyze SDK is available as a COM component or a DLL for Windows platforms
and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.
Pricing starts at $235.00 USD for a developer's license and $9.00 USD
per unit for runtime licenses.
(Pricing is subject to change without notice.)
Volume discounts are available. Call us at
508-436-2543 to get a price quote.

XpdfText SDK

The XpdfText SDK is a developer's library/SDK that
extracts plain text from a PDF file. The PDF file can
be on disk or in memory, and likewise the text
can be extracted to memory or directly to disk.
The XpdfText SDK can be used in different ways:
- Convert entire PDF files or individual pages to plain text,
maintaining layout or converting to "reading order."
- Extract text from a specified rectangle on a page
(useful for extracting text from forms).
- Convert pages into word lists. For each word, you can retrieve
the font name and font size, text color, position on the page,
and character offset (for highlight files).
The extracted text can be converted to a wide
choice of standard encodings:
- UTF-8 Unicode
- Latin1 (8-bit ISO-8859-1)
- 7-bit ASCII
- ISO-2022-CN (simplified Chinese)
- EUC-CN (simplified Chinese)
- Big5 (traditional Chinese)
- KOI8-R (Cyrillic)
- ISO-8859-7 (Greek)
- ISO-2022-JP (Japanese)
- EUC-JP (Japanese)
- Shift-JIS (Japanese)
- KSX1001 (Korean)
- TIS-620 (Thai)
- ISO-8859-9 (Turkish)
Other encodings can be supported upon request.
In addition to the features described above, the
XpdfText SDK includes all the functionality of the
XpdfInfo SDK.
The XpdfText SDK is easy to use!

PDFHandle pdf;
char *buf;
int length;
// open the PDF file
pdfLoadFile(&pdf, "MyFile.pdf");
// convert pages 1-5 to a text file on disk...
pdfConvertToTextFile(pdf, 1, 5, "MyFile.txt");
// ...or convert pages 1-5 to a string in memory
buf = pdfConvertToTextString(pdf, 1, 5, &length);

The XpdfText SDK is available as a COM component or a DLL for Windows platforms
and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.
Pricing starts at $475.00 USD for a developer's license and $18.00 USD
per unit for runtime licenses.
(Pricing is subject to change without notice.)
Volume discounts are available. Call us at
508-436-2543 to get a price quote.

XpdfInfo SDK

The XpdfInfo SDK is a developer's library/SDK that
reads a PDF file and provides access to the following information:
- page count
- page size (per page)
- standard metadata fields: title, subject, keywords, author, creator, producer, creation date, modification date
- custom metadata fields (depending on the software used to create the PDF file)
The XpdfInfo SDK is easy to use!

#include <stdio.h>

PDFHandle pdf;
char *title;
int length;
// open the PDF file
pdfLoadFile(&pdf, "MyFile.pdf");
// get the document title and its length
title = pdfGetTitle(pdf, &length);
// %.*s prints at most length chars; no NUL terminator needed
printf("%.*s\n", length, title);

The XpdfInfo SDK is available as a COM component or a DLL for Windows platforms
and as a shared library for Linux and Solaris platforms. Portable C++ source code is also available.
Pricing starts at $235.00 USD for a developer's license and $9.00 USD
per unit for runtime licenses.
(Pricing is subject to change without notice.)
Volume discounts are available. Call us at
508-436-2543 to get a price quote.

Didn't find exactly what you need? Not sure exactly what you need?
Contact us by phone at 508-436-2543, or send e-mail to
info@CitationSoftware.com.
We can help you find appropriate software for your requirements.

Copyright © 2010 Citation Software Inc.
info@CitationSoftware.com
508-436-2543
www.CitationSoftware.com