Package org.apache.pdfbox.pdfparser
Class ConformingPDFParser
java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
org.apache.pdfbox.pdfparser.ConformingPDFParser
- Author:
- Adam Nichols
-
Field Summary
Fields
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
DEF, document, ENDOBJ, ENDSTREAM, forceParsing, pdfSource, PROP_PUSHBACK_SIZE
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type / Method / Description
protected byte consumeWhitespace()
This will read all bytes until a non-whitespace character is found.
protected byte consumeWhitespaceBackwards()
This will read all bytes (backwards) until a non-whitespace character is found.
COSDocument getDocument()
This will get the document that was parsed.
COSBase getObject(long objectNumber, long generation)
PDDocument getPDDocument()
This will get the PD document that was parsed.
boolean isRecursivlyRead()
void parse()
This will parse the stream and populate the COSDocument object.
protected COSNumber parseNumber(String number)
protected long parseTrailerInformation()
protected COSBase processCosObject(String string)
protected String readBackwardUntilWhitespace()
protected byte readByte()
protected byte readByteBackwards()
protected COSDictionary readDictionaryBackwards()
protected int readInt()
This will read an integer from the stream.
protected String readLine()
This will read a line starting with the byte at offset and going forward until it finds a newline.
protected String readLineBackwards()
This will read a line starting with the byte at offset and going backwards until it finds a newline.
protected long readLongBackwards()
This will consume any whitespace, read in bytes until whitespace is found again and then parse the characters which have been read as a long.
protected COSName readNameBackwards()
protected COSNumber readNumber()
This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
protected COSBase readObject()
This actually reads the object data.
COSBase readObject(long objectNumber, long generation)
This will read an object from the inputFile at whatever our currentOffset is.
protected COSBase readObjectBackwards()
protected String readString()
This will read the next string from the stream.
protected String readWord()
void setRecursivlyRead(boolean recursivlyRead)
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
clearResources, isClosing, isClosing, isEndOfName, isEOL, isEOL, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSStream, parseCOSString, parseCOSString, parseDirObject, readExpectedString, readGenerationNumber, readLong, readObjectNumber, readString, readStringNumber, readUntilEndStream, setDocument, skipSpaces
-
Field Details
-
inputFile
-
-
Constructor Details
-
ConformingPDFParser
Constructor.
- Parameters:
inputFile
- The input stream that contains the PDF document.
- Throws:
IOException
- If there is an error initializing the stream.
-
-
Method Details
-
parse
This will parse the stream and populate the COSDocument object. This will close the stream when it is done parsing.
- Throws:
IOException
- If there is an error reading from the stream or corrupt data is found.
-
getDocument
This will get the document that was parsed. parse() must be called before this is called. When you are done with this document you must call close() on it to release resources.
- Returns:
- The document that was parsed.
- Throws:
IOException
- If there is an error getting the document.
-
getPDDocument
This will get the PD document that was parsed. When you are done with this document you must call close() on it to release resources.
- Returns:
- The document at the PD layer.
- Throws:
IOException
- If there is an error getting the document.
-
parseTrailerInformation
- Throws:
IOException
NumberFormatException
-
readByteBackwards
- Throws:
IOException
-
readByte
- Throws:
IOException
-
readBackwardUntilWhitespace
- Throws:
IOException
-
consumeWhitespaceBackwards
This will read all bytes (backwards) until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
- Returns:
- the first non-whitespace character found
- Throws:
IOException
- if there is an error reading from the file
-
consumeWhitespace
This will read all bytes until a non-whitespace character is found. To save you an extra read, the non-whitespace character is returned. If the current character is not whitespace, this method will just return the current char.
- Returns:
- the first non-whitespace character found
- Throws:
IOException
- if there is an error reading from the file
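The two whitespace methods above differ only in scan direction. A minimal sketch of that behaviour over an in-memory byte array follows; it is not the PDFBox implementation. The class `WhitespaceScanSketch`, its constructor, and the `offset()` accessor are hypothetical names introduced for illustration, the whitespace set is assumed to be the six white-space bytes of the PDF specification, and an unchecked exception stands in for the documented IOException.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; names are hypothetical and this is not PDFBox code. */
final class WhitespaceScanSketch {
    private final byte[] data;
    private int offset; // index of the current byte

    WhitespaceScanSketch(String asciiText, int startOffset) {
        this.data = asciiText.getBytes(StandardCharsets.US_ASCII);
        this.offset = startOffset;
    }

    /** Assumed whitespace set: NUL, TAB, LF, FF, CR, SPACE. */
    static boolean isWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /** Steps forward until the current byte is not whitespace and returns that byte. */
    byte consumeWhitespace() {
        return scan(+1);
    }

    /** Steps backwards until the current byte is not whitespace and returns that byte. */
    byte consumeWhitespaceBackwards() {
        return scan(-1);
    }

    private byte scan(int step) {
        // If the current byte is already non-whitespace the loop body never runs.
        while (offset >= 0 && offset < data.length && isWhitespace(data[offset])) {
            offset += step;
        }
        if (offset < 0 || offset >= data.length) {
            // The documented methods signal read errors with IOException;
            // the sketch uses an unchecked exception to stay short.
            throw new IllegalStateException("no non-whitespace byte found");
        }
        return data[offset];
    }

    int offset() {
        return offset;
    }

    public static void main(String[] args) {
        WhitespaceScanSketch s = new WhitespaceScanSketch("ab  \n cd", 2);
        System.out.println((char) s.consumeWhitespace() + " at " + s.offset()); // prints "c at 6"
    }
}
```

Returning the byte that ended the scan is what saves the caller the extra read mentioned in the description.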
-
readLongBackwards
This will consume any whitespace, read in bytes until whitespace is found again and then parse the characters which have been read as a long. The current offset will then point at the first whitespace character which precedes the number.
- Returns:
- the parsed number
- Throws:
IOException
- if there is an error reading from the file
NumberFormatException
- if the bytes read cannot be converted to a number
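The steps described for readLongBackwards (skip trailing whitespace, collect bytes until the next whitespace, parse them as a long) can be sketched as follows. This is an illustration under stated assumptions, not the PDFBox source: the class `ReadLongBackwardsSketch` and its `offset()` accessor are hypothetical, the data lives in a byte array rather than a file, and the sample input imitates the number that follows a `startxref` keyword near the end of a PDF.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; names are hypothetical and this is not PDFBox code. */
final class ReadLongBackwardsSketch {
    private final byte[] data;
    private int offset; // index of the current byte

    ReadLongBackwardsSketch(String asciiText) {
        this.data = asciiText.getBytes(StandardCharsets.US_ASCII);
        this.offset = data.length - 1; // start at the last byte
    }

    private static boolean isWhitespace(int b) {
        return b == 0x00 || b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D || b == 0x20;
    }

    /**
     * Reads a number that ends at or before the current offset.
     * Throws NumberFormatException if the collected bytes are not a valid long.
     */
    long readLongBackwards() {
        // 1. Consume whitespace that follows the number.
        while (offset >= 0 && isWhitespace(data[offset])) {
            offset--;
        }
        // 2. Collect bytes until whitespace is found again (they arrive reversed).
        StringBuilder reversed = new StringBuilder();
        while (offset >= 0 && !isWhitespace(data[offset])) {
            reversed.append((char) data[offset]);
            offset--;
        }
        // 3. Restore the original order and parse.
        return Long.parseLong(reversed.reverse().toString());
    }

    int offset() {
        return offset;
    }

    public static void main(String[] args) {
        ReadLongBackwardsSketch s = new ReadLongBackwardsSketch("startxref\n1234\n");
        System.out.println(s.readLongBackwards()); // prints 1234
        System.out.println(s.offset());            // prints 9, the newline before the number
    }
}
```

Leaving the offset on the whitespace before the number lets a following backwards read continue from there without re-reading the digits.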
-
readInt
Description copied from class: BaseParser
This will read an integer from the stream.
- Overrides:
readInt
in class BaseParser
- Returns:
- The integer that was read from the stream.
- Throws:
IOException
- If there is an error reading from the stream.
-
readNumber
This will read in a number and return the COS version of the number (be it a COSInteger or a COSFloat).
- Returns:
- the COSNumber which was read/parsed
- Throws:
IOException
-
parseNumber
- Throws:
IOException
-
processCosObject
- Throws:
IOException
-
readObjectBackwards
- Throws:
IOException
-
readNameBackwards
- Throws:
IOException
-
getObject
- Throws:
IOException
-
readObject
This will read an object from the inputFile at whatever our currentOffset is. If the object and generation are not the expected values and this object is set to throw an exception for non-conforming documents, then an exception will be thrown.
- Parameters:
objectNumber
- the object number you expect to read
generation
- the generation you expect this object to be
- Returns:
- the object being read.
- Throws:
IOException
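The check described above, comparing the object and generation numbers found in the file with the expected ones, might look like the sketch below. Everything here is an assumption made for illustration: the class `ObjectHeaderCheckSketch`, the method `matchesHeader`, and the `strict` parameter are hypothetical, the header is assumed to have the usual `<object> <generation> obj` form, and an UncheckedIOException stands in for the documented IOException. The documentation does not say how the parser is switched into its throwing mode, so the flag is passed explicitly.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative sketch only; names are hypothetical and this is not PDFBox code. */
final class ObjectHeaderCheckSketch {

    /**
     * Returns true if the header names the expected object and generation.
     * On a mismatch it throws when strict is true and returns false otherwise.
     */
    static boolean matchesHeader(String header, long expectedObject,
                                 long expectedGeneration, boolean strict) {
        String[] parts = header.trim().split("\\s+");
        boolean ok = parts.length == 3
                && "obj".equals(parts[2])
                && equalsLong(parts[0], expectedObject)
                && equalsLong(parts[1], expectedGeneration);
        if (!ok && strict) {
            throw new UncheckedIOException(new IOException(
                    "Expected object " + expectedObject + " " + expectedGeneration
                            + " but found header '" + header + "'"));
        }
        return ok;
    }

    private static boolean equalsLong(String token, long expected) {
        try {
            return Long.parseLong(token) == expected;
        } catch (NumberFormatException e) {
            return false; // a non-numeric token can never match
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesHeader("12 0 obj", 12, 0, true));  // prints true
        System.out.println(matchesHeader("12 0 obj", 13, 0, false)); // prints false
    }
}
```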
-
readObject
This actually reads the object data.
- Returns:
- the object which is read
- Throws:
IOException
-
readString
This will read the next string from the stream.
- Overrides:
readString
in class BaseParser
- Returns:
- The string that was read from the stream.
- Throws:
IOException
- If there is an error reading from the stream.
-
readDictionaryBackwards
- Throws:
IOException
-
readLineBackwards
This will read a line starting with the byte at offset and going backwards until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
- Returns:
- the string which was read
- Throws:
IOException
- if there was an error reading data from the file
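Reading lines backwards is useful at the end of a PDF, where the `%%EOF` marker, the cross-reference offset, and the `startxref` keyword appear on consecutive lines. The sketch below illustrates the idea on a byte array. It is not the PDFBox implementation: `ReadLineBackwardsSketch` is a hypothetical class, and it assumes that the end-of-line marker before the line is consumed so the next call starts on the previous line, which the documentation above does not state.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; names are hypothetical and this is not PDFBox code. */
final class ReadLineBackwardsSketch {
    private final byte[] data;
    private int offset; // index of the current byte

    ReadLineBackwardsSketch(String asciiText) {
        this.data = asciiText.getBytes(StandardCharsets.US_ASCII);
        this.offset = data.length - 1; // start at the last byte
    }

    /** Reads from the current offset back to the preceding end of line. */
    String readLineBackwards() {
        StringBuilder reversed = new StringBuilder();
        while (offset >= 0 && data[offset] != '\n' && data[offset] != '\r') {
            reversed.append((char) data[offset]);
            offset--;
        }
        // Assumption: step over one end-of-line marker (LF, CR, or CR LF).
        if (offset >= 0 && data[offset] == '\n') {
            offset--;
        }
        if (offset >= 0 && data[offset] == '\r') {
            offset--;
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        ReadLineBackwardsSketch s = new ReadLineBackwardsSketch("startxref\n1234\n%%EOF");
        System.out.println(s.readLineBackwards()); // prints %%EOF
        System.out.println(s.readLineBackwards()); // prints 1234
        System.out.println(s.readLineBackwards()); // prints startxref
    }
}
```

Casting each byte to a char is only safe for text, which matches the warning above against using the method on binary data.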
-
readLine
This will read a line starting with the byte at offset and going forward until it finds a newline. This should only be used if we are certain that the data will only be text, and not binary data.
- Overrides:
readLine
in class BaseParser
- Returns:
- the string which was read
- Throws:
IOException
- if there was an error reading data from the file
-
readWord
- Throws:
IOException
-
isRecursivlyRead
public boolean isRecursivlyRead()
- Returns:
- the recursivlyRead
-
setRecursivlyRead
public void setRecursivlyRead(boolean recursivlyRead)
- Parameters:
recursivlyRead
- the recursivlyRead to set
-