Package org.apache.pdfbox.pdmodel
Class PDDocument
java.lang.Object
org.apache.pdfbox.pdmodel.PDDocument
- All Implemented Interfaces:
Pageable, Closeable, AutoCloseable
- Direct Known Subclasses:
ConformingPDDocument
This is the in-memory representation of the PDF document. You must call close() on this object when you are done using it.
This class implements the Pageable interface, but since PDFBox version 1.3.0 you should use the PDPageable adapter instead (see PDFBOX-788).
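Because the document holds resources until close() is called, the following minimal sketch loads a file, reads its page count and relies on try-with-resources to release it (this works because PDDocument implements Closeable). It assumes a PDFBox 1.8.x jar on the classpath and Java 7 or later; PageCounter is a hypothetical helper, not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

final class PageCounter {
    private PageCounter() {}

    /** Loads a PDF file, returns its page count and always closes the document. */
    static int countPages(File pdf) throws IOException {
        // try-with-resources calls close() even if getNumberOfPages() throws
        try (PDDocument doc = PDDocument.load(pdf)) {
            return doc.getNumberOfPages();
        }
    }
}
```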
- Version:
- $Revision: 1.47 $
- Author:
- Ben Litchfield
-
Field Summary
Fields inherited from interface java.awt.print.Pageable
UNKNOWN_NUMBER_OF_PAGES
-
Constructor Summary
Constructor | Description
PDDocument() | Constructor, creates a new PDF Document with no pages.
PDDocument(COSDocument doc) | Constructor that uses an existing document.
PDDocument(COSDocument doc, BaseParser usedParser) | Constructor that uses an existing document.
-
Method Summary
Modifier and Type | Method | Description
void | addPage(PDPage page) | This will add a page to the document.
void | addSignature(PDSignature sigObject, SignatureInterface signatureInterface) | Add a signature.
void | addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options) | This will add a signature to the document.
void | addSignatureField(List<PDSignatureField> sigFields, SignatureInterface signatureInterface, SignatureOptions options) | This will add a signature field to the document.
void | clearWillEncryptWhenSaving() | Deprecated. Do not rely on this method anymore.
void | close() | This will close the underlying COSDocument object.
void | decrypt(String password) | This will decrypt a document.
void | encrypt(String ownerPassword, String userPassword) | This will mark a document to be encrypted.
AccessPermission | getCurrentAccessPermission() | Returns the access permissions granted when the document was decrypted.
COSDocument | getDocument() | This will get the low level document.
PDDocumentCatalog | getDocumentCatalog() | This will get the document catalog.
Long | getDocumentId() |
PDDocumentInformation | getDocumentInformation() | This will get the document info dictionary.
PDEncryptionDictionary | getEncryptionDictionary() | This will get the encryption dictionary for this document.
PDSignature | getLastSignatureDictionary() | This will return the last signature.
int | getNumberOfPages() |
String | getOwnerPasswordForEncryption() | Deprecated. Do not rely on this method anymore.
int | getPageCount() | Deprecated. Use the getNumberOfPages method instead.
PageFormat | getPageFormat(int pageIndex) | Deprecated. Use the PDPageable adapter class.
Map<String,Integer> | getPageMap() | This will return the Map containing the mapping from object IDs to page numbers.
Printable | getPrintable(int pageIndex) |
SecurityHandler | getSecurityHandler() | Get the security handler that is used for document encryption.
List<PDSignature> | getSignatureDictionaries() | Retrieve all signature dictionaries from the document.
PDSignature | getSignatureDictionary() | Deprecated. Use getLastSignatureDictionary() instead.
List<PDSignatureField> | getSignatureFields() | Retrieve all signature fields from the document.
String | getUserPasswordForEncryption() | Deprecated. Do not rely on this method anymore.
PDPage | importPage(PDPage page) | This will import and copy the contents from another location.
boolean | isAllSecurityToBeRemoved() | Indicates if all security is removed or not when writing the PDF.
boolean | isEncrypted() | This will tell if this document is encrypted or not.
boolean | isOwnerPassword(String password) | Deprecated.
boolean | isUserPassword(String password) | Deprecated.
static PDDocument | load(File file) | This will load a document from a file.
static PDDocument | load(File file, RandomAccess scratchFile) | This will load a document from a file.
static PDDocument | load(InputStream input) | This will load a document from an input stream.
static PDDocument | load(InputStream input, boolean force) | This will load a document from an input stream.
static PDDocument | load(InputStream input, RandomAccess scratchFile) | This will load a document from an input stream.
static PDDocument | load(InputStream input, RandomAccess scratchFile, boolean force) | This will load a document from an input stream.
static PDDocument | load(String filename) | This will load a document from a file.
static PDDocument | load(String filename, boolean force) | This will load a document from a file.
static PDDocument | load(String filename, RandomAccess scratchFile) | This will load a document from a file.
static PDDocument | load(URL url) | This will load a document from a URL.
static PDDocument | load(URL url, boolean force) | This will load a document from a URL.
static PDDocument | load(URL url, RandomAccess scratchFile) | This will load a document from a URL.
static PDDocument | loadNonSeq(File file, RandomAccess scratchFile) | Parses PDF with the new non-sequential parser and an empty password.
static PDDocument | loadNonSeq(File file, RandomAccess scratchFile, String password) | Parses PDF with the new non-sequential parser, using the given password for decryption.
static PDDocument | loadNonSeq(InputStream input, RandomAccess scratchFile) | Parses PDF with the new non-sequential parser.
static PDDocument | loadNonSeq(InputStream input, RandomAccess scratchFile, String password) | Parses PDF with the new non-sequential parser.
void | openProtection(DecryptionMaterial pm) | Tries to decrypt the document in memory using the provided decryption material.
void | print() | This will send the PDF document to a printer.
void | print(PrinterJob printJob) |
void | protect(ProtectionPolicy pp) | Protects the document with the protection policy pp.
boolean | removePage(int pageNumber) | Remove the page from the document.
boolean | removePage(PDPage page) | Remove the page from the document.
void | save(File file) | Save the document to a file.
void | save(OutputStream output) | This will save the document to an output stream.
void | save(String fileName) | Save the document to a file.
void | saveIncremental(InputStream input, OutputStream output) | Save the PDF as incremental for signing.
void | saveIncremental(String fileName) | Save the PDF as incremental for signing.
void | setAllSecurityToBeRemoved(boolean removeAllSecurity) | Activates/deactivates the removal of all security when writing the PDF.
void | setDocumentId(Long docId) |
void | setDocumentInformation(PDDocumentInformation info) | This will set the document information for this document.
void | setEncryptionDictionary(PDEncryptionDictionary encDictionary) | This will set the encryption dictionary for this document.
boolean | setSecurityHandler(SecurityHandler secHandler) | Sets the security handler if none is set already.
void | silentPrint() | This will send the PDF to the default printer without prompting the user for any printer settings.
void | silentPrint(PrinterJob printJob) | This will send the PDF to the default printer without prompting the user for any printer settings.
boolean | wasDecryptedWithOwnerPassword() | Deprecated. Use getCurrentAccessPermission instead.
boolean | willEncryptWhenSaving() | Deprecated. Do not rely on this method anymore.
-
Constructor Details
-
PDDocument
public PDDocument()
Constructor: creates a new PDF document with no pages. You need to add at least one page for the document to be valid.
-
PDDocument
Constructor that uses an existing document. The COSDocument that is passed in must be valid.
- Parameters:
doc
- The COSDocument that this document wraps.
-
PDDocument
Constructor that uses an existing document. The COSDocument that is passed in must be valid.
- Parameters:
doc
- The COSDocument that this document wraps.
usedParser
- The parser which is used to read the PDF.
-
-
Method Details
-
getPageMap
This will return the Map containing the mapping from object IDs to page numbers.
- Returns:
- the pageMap
-
addPage
This will add a page to the document. This is a convenience method that adds the page to the root of the page hierarchy and sets the parent of the page to the root.
- Parameters:
page
- The page to add to the document.
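A new document is only valid once it has at least one page. The sketch below combines the no-argument constructor, addPage and save(File) to write a blank multi-page PDF. It assumes PDFBox 1.8.x on the classpath; BlankPdf is a hypothetical helper name.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

final class BlankPdf {
    private BlankPdf() {}

    /** Writes a PDF containing pageCount empty pages (at least one). */
    static void writeBlank(File target, int pageCount) throws IOException, COSVisitorException {
        if (pageCount < 1) {
            // A document needs at least one page to be valid.
            throw new IllegalArgumentException("pageCount must be >= 1");
        }
        try (PDDocument doc = new PDDocument()) {
            for (int i = 0; i < pageCount; i++) {
                // addPage() attaches the page to the root of the page tree.
                doc.addPage(new PDPage());
            }
            doc.save(target);
        }
    }
}
```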
-
addSignature
public void addSignature(PDSignature sigObject, SignatureInterface signatureInterface) throws IOException, SignatureException
Add a signature.
- Parameters:
sigObject
- is the PDSignature model
signatureInterface
- is an interface which provides signing capabilities
- Throws:
IOException
- if there is an error creating required fields
SignatureException
- if something went wrong
-
addSignature
public void addSignature(PDSignature sigObject, SignatureInterface signatureInterface, SignatureOptions options) throws IOException, SignatureException
This will add a signature to the document.
- Parameters:
sigObject
- is the PDSignature model
signatureInterface
- is an interface which provides signing capabilities
options
- signature options
- Throws:
IOException
- if there is an error creating required fields
SignatureException
- if something went wrong
-
addSignatureField
public void addSignatureField(List<PDSignatureField> sigFields, SignatureInterface signatureInterface, SignatureOptions options) throws IOException, SignatureException
This will add a signature field to the document.
- Parameters:
sigFields
- are the PDSignatureFields that should be added to the document
signatureInterface
- is an interface which provides signing capabilities
options
- signature options
- Throws:
IOException
- if there is an error creating required fields
SignatureException
-
removePage
Remove the page from the document.
- Parameters:
page
- The page to remove from the document.
- Returns:
- true if the page was found, false otherwise.
-
removePage
public boolean removePage(int pageNumber)
Remove the page from the document.
- Parameters:
pageNumber
- The 0-based index of the page to remove.
- Returns:
- true if the page was found, false otherwise.
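Since pageNumber is 0-based, the last page sits at getNumberOfPages() - 1. A small sketch assuming PDFBox 1.8.x; PageRemover is a hypothetical helper name.

```java
import org.apache.pdfbox.pdmodel.PDDocument;

final class PageRemover {
    private PageRemover() {}

    /** Removes the last page, returning false when the document has no pages. */
    static boolean removeLastPage(PDDocument doc) {
        int count = doc.getNumberOfPages();
        // removePage(int) expects a 0-based index, so the last page is count - 1.
        return count > 0 && doc.removePage(count - 1);
    }
}
```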
-
importPage
This will import and copy the contents from another location. Currently the content stream is stored in a scratch file, which is associated with the document. If you are adding a page to this document from another document and want to copy the contents to this document's scratch file, use this method; otherwise just use the addPage method. Unlike addPage(org.apache.pdfbox.pdmodel.PDPage), this method does a deep copy. If your page has annotations that link to pages not in the target document, the target document might become huge. To avoid this, delete the page references of such annotations. See here for how to do this.
- Parameters:
page
- The page to import.
- Returns:
- The page that was imported.
- Throws:
IOException
- If there is an error copying the page.
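The sketch below appends every page of one document to another with importPage. It assumes PDFBox 1.8.x, where getDocumentCatalog().getAllPages() returns a raw List (hence the cast); PageAppender is a hypothetical helper. Keeping the source document open until the target has been saved is a cautious assumption here, since the copied page dictionary can still refer to objects owned by the source.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

final class PageAppender {
    private PageAppender() {}

    /**
     * Copies every page of source onto the end of target and returns how many
     * pages were copied. Keep source open until target has been saved.
     */
    static int appendAll(PDDocument target, PDDocument source) throws IOException {
        int copied = 0;
        // getAllPages() builds a fresh list, so modifying target is safe here.
        for (Object o : source.getDocumentCatalog().getAllPages()) {
            target.importPage((PDPage) o);
            copied++;
        }
        return copied;
    }
}
```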
-
getDocument
This will get the low level document.
- Returns:
- The document that this layer sits on top of.
-
getDocumentInformation
This will get the document info dictionary. This is guaranteed not to return null.
- Returns:
- The document's /Info dictionary
-
setDocumentInformation
This will set the document information for this document.
- Parameters:
info
- The updated document information.
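Because getDocumentInformation() never returns null, metadata can be edited without a null check and then stored back with setDocumentInformation. A minimal sketch assuming PDFBox 1.8.x; InfoWriter is a hypothetical helper name.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

final class InfoWriter {
    private InfoWriter() {}

    /** Sets the title and author entries of the document's /Info dictionary. */
    static void setTitleAndAuthor(PDDocument doc, String title, String author) {
        // Never null, per the getDocumentInformation() contract.
        PDDocumentInformation info = doc.getDocumentInformation();
        info.setTitle(title);
        info.setAuthor(author);
        // Store the (possibly newly created) dictionary back explicitly.
        doc.setDocumentInformation(info);
    }
}
```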
-
getDocumentCatalog
This will get the document catalog. This is guaranteed not to return null.
- Returns:
- The document's /Root dictionary
-
isEncrypted
public boolean isEncrypted()
This will tell if this document is encrypted or not.
- Returns:
- true if this document is encrypted.
-
getEncryptionDictionary
This will get the encryption dictionary for this document. This will still return the parameters if the document was decrypted. If the document was never encrypted, this will return null. As the encryption architecture in PDF documents is pluggable, this returns an abstract class, but the only supported subclass at this time is a PDStandardEncryption object.
- Returns:
- The encryption dictionary (most likely a PDStandardEncryption object)
- Throws:
IOException
- If there is an error determining which security handler to use.
-
setEncryptionDictionary
This will set the encryption dictionary for this document.
- Parameters:
encDictionary
- The encryption dictionary (most likely a PDStandardEncryption object)
- Throws:
IOException
- If there is an error determining which security handler to use.
-
getSignatureDictionary
Deprecated. Use getLastSignatureDictionary() instead.
This will return the last signature.
- Returns:
- the last signature as PDSignature.
- Throws:
IOException
- if no document catalog can be found.
-
getLastSignatureDictionary
This will return the last signature.
- Returns:
- the last signature as PDSignature.
- Throws:
IOException
- if no document catalog can be found.
-
getSignatureFields
Retrieve all signature fields from the document.
- Returns:
- a List of PDSignatureFields
- Throws:
IOException
- if no document catalog can be found.
-
getSignatureDictionaries
Retrieve all signature dictionaries from the document.
- Returns:
- a List of PDSignatures
- Throws:
IOException
- if no document catalog can be found.
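A minimal sketch that collects the optional /Name entry of each signature dictionary, assuming PDFBox 1.8.x, where this method returns a List of PDSignature. SignatureLister is a hypothetical helper. As the Throws clause notes, the call can fail with an IOException when no document catalog can be found, so it is best applied to a document that was loaded from a file or stream.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;

final class SignatureLister {
    private SignatureLister() {}

    /** Returns the /Name of every signature dictionary; entries may be null. */
    static List<String> signerNames(PDDocument doc) throws IOException {
        List<String> names = new ArrayList<String>();
        for (PDSignature sig : doc.getSignatureDictionaries()) {
            // /Name is optional in a signature dictionary, so this can be null.
            names.add(sig.getName());
        }
        return names;
    }
}
```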
-
isUserPassword
@Deprecated public boolean isUserPassword(String password) throws IOException, CryptographyException
Deprecated.
This will determine if this is the user password. This only applies when the document is encrypted and uses standard encryption.
- Parameters:
password
- The plain text user password.
- Returns:
- true if the password passed in matches the user password used to encrypt the document.
- Throws:
IOException
- If there is an error determining if it is the user password.
CryptographyException
- If there is an error in the encryption algorithms.
-
isOwnerPassword
@Deprecated public boolean isOwnerPassword(String password) throws IOException, CryptographyException
Deprecated.
This will determine if this is the owner password. This only applies when the document is encrypted and uses standard encryption.
- Parameters:
password
- The plain text owner password.
- Returns:
- true if the password passed in matches the owner password used to encrypt the document.
- Throws:
IOException
- If there is an error determining if it is the owner password.
CryptographyException
- If there is an error in the encryption algorithms.
-
decrypt
This will decrypt a document. This method is provided for compatibility reasons only. Users should use the new security layer instead, in particular the openProtection method. Do not call this method if you have opened your document with one of the loadNonSeq methods.
- Parameters:
password
- Either the user or owner password.
- Throws:
CryptographyException
- If there is an error decrypting the document.
IOException
- If there is an error getting the stream data.
-
wasDecryptedWithOwnerPassword
Deprecated. Use getCurrentAccessPermission instead.
This will tell if the document was decrypted with the master password. This entry is invalid if the PDF was not decrypted.
- Returns:
- true if the PDF was decrypted with the master password.
-
encrypt
public void encrypt(String ownerPassword, String userPassword) throws CryptographyException, IOException
This will mark a document to be encrypted. The actual encryption will occur when the document is saved. This method is provided for compatibility reasons only. Users should use the new security layer instead, in particular the protect method.
- Parameters:
ownerPassword
- The owner password to encrypt the document.
userPassword
- The user password to encrypt the document.
- Throws:
CryptographyException
- If an error occurs during encryption.
IOException
- If there is an error accessing the data.
-
getOwnerPasswordForEncryption
Deprecated. Do not rely on this method anymore.
The owner password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.
- Returns:
- The owner password passed to the encrypt method.
-
getUserPasswordForEncryption
Deprecated. Do not rely on this method anymore.
The user password that was passed into the encrypt method. You should never use this method. This will no longer be valid once encryption has occurred.
- Returns:
- The user password passed to the encrypt method.
-
willEncryptWhenSaving
Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.
Internal method to determine if the document will be encrypted when it is saved.
- Returns:
- true if encrypt has been called and the document has not been saved yet.
-
clearWillEncryptWhenSaving
Deprecated. Do not rely on this method anymore. It is the responsibility of COSWriter to hold this state.
This should only be called by the COSWriter after encryption has completed.
-
load
This will load a document from a URL.
- Parameters:
url
- The URL to load the PDF from.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from a URL. Allows for skipping corrupt PDF objects.
- Parameters:
url
- The URL to load the PDF from.
force
- When true, the parser will skip corrupt PDF objects and will continue parsing at the next object in the file.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from a URL.
- Parameters:
url
- The URL to load the PDF from.
scratchFile
- A location to store temporary PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from a file.
- Parameters:
filename
- The name of the file to load.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from a file. Allows for skipping corrupt PDF objects.
- Parameters:
filename
- The name of the file to load.
force
- When true, the parser will skip corrupt PDF objects and will continue parsing at the next object in the file.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
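Force mode trades strictness for robustness, so one reasonable pattern (a design choice of this sketch, not built-in PDFBox behavior) is to try a strict parse first and fall back to load(filename, true) only when that fails. It assumes PDFBox 1.8.x; LenientLoader is a hypothetical helper.

```java
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

final class LenientLoader {
    private LenientLoader() {}

    /**
     * Parses strictly first and retries in force mode only if that fails.
     * The caller must close the returned document.
     */
    static PDDocument load(String filename) throws IOException {
        try {
            return PDDocument.load(filename);
        } catch (IOException strictFailure) {
            // force = true: skip corrupt objects and continue at the next one.
            return PDDocument.load(filename, true);
        }
    }
}
```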
-
load
This will load a document from a file.
- Parameters:
filename
- The name of the file to load.
scratchFile
- A location to store temporary PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from a file.
- Parameters:
file
- The file to load.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from a file.
- Parameters:
file
- The file to load.
scratchFile
- A location to store temporary PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from an input stream.
- Parameters:
input
- The stream that contains the document.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from an input stream. Allows for skipping corrupt PDF objects.
- Parameters:
input
- The stream that contains the document.
force
- When true, the parser will skip corrupt PDF objects and will continue parsing at the next object in the file.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
This will load a document from an input stream.
- Parameters:
input
- The stream that contains the document.
scratchFile
- A location to store temporary PDFBox data for this document.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
load
public static PDDocument load(InputStream input, RandomAccess scratchFile, boolean force) throws IOException
This will load a document from an input stream. Allows for skipping corrupt PDF objects.
- Parameters:
input
- The stream that contains the document.
scratchFile
- A location to store temporary PDFBox data for this document.
force
- When true, the parser will skip corrupt PDF objects and will continue parsing at the next object in the file.
- Returns:
- The document that was loaded.
- Throws:
IOException
- If there is an error reading from the stream.
-
loadNonSeq
Parses PDF with the new non-sequential parser and an empty password.
- Parameters:
file
- file to be loaded
scratchFile
- location to store temporary PDFBox data for this document
- Returns:
- loaded document
- Throws:
IOException
- in case of a file reading or parsing error
-
loadNonSeq
public static PDDocument loadNonSeq(File file, RandomAccess scratchFile, String password) throws IOException
Parses PDF with the new non-sequential parser, using the given password for decryption.
- Parameters:
file
- file to be loaded
scratchFile
- location to store temporary PDFBox data for this document
password
- password to be used for decryption
- Returns:
- loaded document
- Throws:
IOException
- in case of a file reading or parsing error
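A sketch of opening a password-protected file with the non-sequential parser. It assumes PDFBox 1.8.x plus the Bouncy Castle jars that the 1.x encryption code is generally reported to need. Passing null as scratchFile is an assumption (we expect the parser to fall back to in-memory buffering), and NonSeqLoader is hypothetical. Because the parser handles decryption itself, the sketch does not call openProtection() or decrypt() afterwards, as the documentation for those methods requires.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;

final class NonSeqLoader {
    private NonSeqLoader() {}

    /** Opens a possibly encrypted PDF with the non-sequential parser and counts its pages. */
    static int countPages(File pdf, String password) throws IOException {
        // null scratch file: assumed to mean in-memory buffering.
        // Decryption happens during parsing, so no openProtection() call follows.
        try (PDDocument doc = PDDocument.loadNonSeq(pdf, null, password)) {
            return doc.getNumberOfPages();
        }
    }
}
```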
-
loadNonSeq
Parses PDF with the new non-sequential parser.
- Parameters:
input
- stream that contains the document.
scratchFile
- location to store temporary PDFBox data for this document
- Returns:
- loaded document
- Throws:
IOException
- in case of a file reading or parsing error
-
loadNonSeq
public static PDDocument loadNonSeq(InputStream input, RandomAccess scratchFile, String password) throws IOException
Parses PDF with the new non-sequential parser, using the given password for decryption.
- Parameters:
input
- stream that contains the document.
scratchFile
- location to store temporary PDFBox data for this document
password
- password to be used for decryption
- Returns:
- loaded document
- Throws:
IOException
- in case of a file reading or parsing error
-
save
Save the document to a file.
- Parameters:
fileName
- The file to save as.
- Throws:
IOException
- If there is an error saving the document.
COSVisitorException
- If an error occurs while generating the data.
-
save
Save the document to a file.
- Parameters:
file
- The file to save as.
- Throws:
IOException
- If there is an error saving the document.
COSVisitorException
- If an error occurs while generating the data.
-
save
This will save the document to an output stream.
- Parameters:
output
- The stream to write to.
- Throws:
IOException
- If there is an error writing the document.
COSVisitorException
- If an error occurs while generating the data.
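Writing to an OutputStream is handy when the PDF should end up in memory rather than on disk. A sketch assuming PDFBox 1.8.x, with InMemorySave as a hypothetical helper; save(OutputStream) may close the stream it is given, which is harmless for a ByteArrayOutputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;

final class InMemorySave {
    private InMemorySave() {}

    /** Serializes the document into a byte array instead of a file. */
    static byte[] toBytes(PDDocument doc) throws IOException, COSVisitorException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        doc.save(out);
        // toByteArray() still works even if save() closed the stream.
        return out.toByteArray();
    }
}
```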
-
saveIncremental
Save the PDF as incremental for signing. Use this only for small files, because this method temporarily stores the entire file in memory.
- Parameters:
fileName
- the filename to be used. This should be a copy of the original file.
- Throws:
IOException
- if something went wrong
COSVisitorException
- if something went wrong
-
saveIncremental
public void saveIncremental(InputStream input, OutputStream output) throws IOException, COSVisitorException
Save the PDF as incremental for signing. See the signature example sources for how to use this.
- Parameters:
input
- This must be a FileInputStream or it won't work. It should point to the same file as the output parameter.
output
- This must be a FileOutputStream or it won't work. It must be positioned at the end of the file, i.e. it should just have written the original file. The appending constructor of FileOutputStream has been found not to work, so you need to write the whole file yourself.
- Throws:
IOException
- if something went wrong
COSVisitorException
- if something went wrong
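The output requirements above (a FileOutputStream that has itself written the entire original file, without using the appending constructor) can be met with plain JDK code. This hypothetical helper copies the original into the target and hands back the still-open stream; it does not call PDFBox at all.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;

final class IncrementalTarget {
    private IncrementalTarget() {}

    /**
     * Copies original into target and returns a stream positioned at the end
     * of the copy. The caller must close the returned stream.
     */
    static FileOutputStream copyForIncrementalSave(File original, File target) throws IOException {
        // Deliberately not the appending constructor: the whole original is
        // written through this same stream instead.
        FileOutputStream out = new FileOutputStream(target);
        try {
            Files.copy(original.toPath(), out);
        } catch (IOException e) {
            out.close();
            throw e;
        }
        return out;
    }
}
```

In a signing workflow one would then load the document, add the signature and call doc.saveIncremental(new FileInputStream(target), out) with the returned stream, as the signature example sources mentioned above do.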
-
getPageCount
Deprecated. Use the getNumberOfPages method instead.
This will return the total page count of the PDF document. Note: this method is deprecated in favor of the getNumberOfPages method, which is a required interface method of the Pageable interface. This method will be removed in a future version of PDFBox.
- Returns:
- The total number of pages in the PDF document.
-
getNumberOfPages
public int getNumberOfPages()
- Specified by:
getNumberOfPages in interface Pageable
-
getPageFormat
Deprecated. Use the PDPageable adapter class.
Returns the format of the page at the given index when using a default printer job returned by PrinterJob.getPrinterJob().
- Specified by:
getPageFormat in interface Pageable
- Parameters:
pageIndex
- page index, zero-based
- Returns:
- page format
-
getPrintable
- Specified by:
getPrintable in interface Pageable
-
print
- Parameters:
printJob
- The printer job.
- Throws:
PrinterException
- If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.
- See Also:
-
print
This will send the PDF document to a printer. The printing functionality depends on the org.apache.pdfbox.pdfviewer.PageDrawer functionality. The PageDrawer is a work in progress, so some PDFs will print correctly and some will not. This is a convenience method to create the java.awt.print.PrinterJob. PDDocument implements the java.awt.print.Pageable interface and PDPage implements the java.awt.print.Printable interface, so more advanced printing can be done by using those interfaces instead of this method.
- Throws:
PrinterException
- If there is an error while sending the PDF to the printer, or you do not have permissions to print this document.
-
silentPrint
This will send the PDF to the default printer without prompting the user for any printer settings.
- Throws:
PrinterException
- If there is an error while printing.
- See Also:
-
silentPrint
This will send the PDF to the default printer without prompting the user for any printer settings.
- Parameters:
printJob
- A printer job definition.
- Throws:
PrinterException
- If there is an error while printing.
- See Also:
-
close
This will close the underlying COSDocument object.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
- If there is an error releasing resources.
-
protect
Protects the document with the protection policy pp. The document content is actually encrypted only when the document is saved; this method only marks the document for encryption.
- Parameters:
pp
- The protection policy.
- Throws:
BadSecurityHandlerException
- If there is an error during protection.
- See Also:
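A sketch of the newer security layer: build a StandardProtectionPolicy, pass it to protect(), then save. It assumes PDFBox 1.8.x plus the Bouncy Castle jars that the 1.x encryption code is generally reported to need. Protector is hypothetical, and withholding print permission and choosing a 128-bit key are only illustrative choices.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;
import org.apache.pdfbox.pdmodel.encryption.BadSecurityHandlerException;
import org.apache.pdfbox.pdmodel.encryption.StandardProtectionPolicy;

final class Protector {
    private Protector() {}

    /** Marks doc for password encryption without print permission, then saves it. */
    static void saveProtected(PDDocument doc, File target, String ownerPassword, String userPassword)
            throws IOException, COSVisitorException, BadSecurityHandlerException {
        AccessPermission permissions = new AccessPermission(); // starts with all permissions
        permissions.setCanPrint(false);                        // illustrative restriction
        StandardProtectionPolicy policy =
                new StandardProtectionPolicy(ownerPassword, userPassword, permissions);
        policy.setEncryptionKeyLength(128);
        doc.protect(policy); // only marks the document for encryption
        doc.save(target);    // the actual encryption happens while saving
    }
}
```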
-
openProtection
public void openProtection(DecryptionMaterial pm) throws BadSecurityHandlerException, IOException, CryptographyException
Tries to decrypt the document in memory using the provided decryption material. Do not call this method if you have opened your document with one of the loadNonSeq methods.
- Parameters:
pm
- The decryption material (password or certificate).
- Throws:
BadSecurityHandlerException
- If there is an error during decryption.
IOException
- If there is an error reading cryptographic information.
CryptographyException
- If there is an error during decryption.
- See Also:
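The sketch below decrypts a standard-security file with StandardDecryptionMaterial and writes an unencrypted copy using setAllSecurityToBeRemoved(true). It assumes PDFBox 1.8.x plus the Bouncy Castle jars that its 1.x encryption code is generally reported to need; Unprotector is hypothetical. Refusing to strip security unless getCurrentAccessPermission().isOwnerPermission() is true (every permission granted, as after opening with the owner password) is a policy choice of this sketch, not something PDFBox enforces.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.exceptions.CryptographyException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.BadSecurityHandlerException;
import org.apache.pdfbox.pdmodel.encryption.StandardDecryptionMaterial;

final class Unprotector {
    private Unprotector() {}

    /** Decrypts source with the given password and writes an unencrypted copy. */
    static void saveDecryptedCopy(File source, File target, String ownerPassword)
            throws IOException, COSVisitorException, BadSecurityHandlerException, CryptographyException {
        // Loaded with load(), not loadNonSeq(), so openProtection() is allowed.
        try (PDDocument doc = PDDocument.load(source)) {
            if (doc.isEncrypted()) {
                doc.openProtection(new StandardDecryptionMaterial(ownerPassword));
                // isOwnerPermission() is true only when every permission is granted.
                if (!doc.getCurrentAccessPermission().isOwnerPermission()) {
                    throw new IOException("owner-level access required to remove security");
                }
                doc.setAllSecurityToBeRemoved(true);
            }
            doc.save(target);
        }
    }
}
```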
-
getCurrentAccessPermission
Returns the access permissions granted when the document was decrypted. If the document was not decrypted, this method returns the access permission for a document owner (i.e. can do everything). The returned object is in read-only mode so that permissions cannot be changed. Methods providing access to content should rely on this object to verify whether the current user is allowed to proceed.
- Returns:
- the access permissions for the current user on the document.
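Following the advice above, content-accessing code can consult this object before proceeding. A minimal sketch assuming PDFBox 1.8.x (PermissionCheck is hypothetical); for a document that was never decrypted it sees the owner permission, so extraction is allowed.

```java
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.encryption.AccessPermission;

final class PermissionCheck {
    private PermissionCheck() {}

    /** Returns true if the current user may extract text and images. */
    static boolean mayExtractContent(PDDocument doc) {
        // Read-only object; owner permission if the document was never decrypted.
        AccessPermission ap = doc.getCurrentAccessPermission();
        return ap.canExtractContent();
    }
}
```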
-
getSecurityHandler
Get the security handler that is used for document encryption.
- Returns:
- The handler used to encrypt/decrypt the document.
-
setSecurityHandler
Sets the security handler if none is set already.
- Parameters:
secHandler
- security handler to be assigned to the document
- Returns:
- true if the security handler was set, false otherwise (a security handler was already set)
-
isAllSecurityToBeRemoved
public boolean isAllSecurityToBeRemoved()
Indicates if all security is removed or not when writing the PDF.
- Returns:
- true if all security shall be removed, false otherwise
-
setAllSecurityToBeRemoved
public void setAllSecurityToBeRemoved(boolean removeAllSecurity)
Activates/deactivates the removal of all security when writing the PDF.
- Parameters:
removeAllSecurity
- remove all security if set to true
-
getDocumentId
-
setDocumentId
-