Class BaseParser

java.lang.Object
org.apache.pdfbox.pdfparser.BaseParser
Direct Known Subclasses:
ConformingPDFParser, PDFObjectStreamParser, PDFParser, PDFStreamParser, PDFXrefStreamParser, VisualSignatureParser

public abstract class BaseParser extends Object
This class is used to contain parsing logic that will be used by both the PDFParser and the COSStreamParser.
Version:
$Revision$
Author:
Ben Litchfield
  • Field Details

    • PROP_PUSHBACK_SIZE

      public static final String PROP_PUSHBACK_SIZE
      System property that sets the size of the push-back buffer.
    • ENDSTREAM

      public static final byte[] ENDSTREAM
      This is a byte array that will be used for comparisons.
    • ENDOBJ

      public static final byte[] ENDOBJ
      This is a byte array that will be used for comparisons.
    • DEF

      public static final String DEF
      This is a string constant that will be used for comparisons.
    • pdfSource

      protected PushBackInputStream pdfSource
      This is the stream that will be read from.
    • document

      protected COSDocument document
      This is the document that will be parsed.
    • forceParsing

      protected final boolean forceParsing
      Flag to skip malformed or otherwise unparseable input where possible.
  • Constructor Details

    • BaseParser

      public BaseParser()
      Default constructor.
    • BaseParser

      public BaseParser(InputStream input, boolean forceParsingValue) throws IOException
      Constructor.
      Parameters:
      input - The input stream to read the data from.
      forceParsingValue - flag to skip malformed or otherwise unparseable input where possible
      Throws:
      IOException - If there is an error reading the input stream.
      Since:
      Apache PDFBox 1.3.0
    • BaseParser

      public BaseParser(InputStream input) throws IOException
      Constructor.
      Parameters:
      input - The input stream to read the data from.
      Throws:
      IOException - If there is an error reading the input stream.
    • BaseParser

      protected BaseParser(byte[] input) throws IOException
      Constructor.
      Parameters:
      input - The array to read the data from.
      Throws:
      IOException - If there is an error reading the byte data.
  • Method Details

    • setDocument

      public void setDocument(COSDocument doc)
      Set the document for this stream.
      Parameters:
      doc - The current document.
    • parseCOSDictionary

      protected COSDictionary parseCOSDictionary() throws IOException
      This will parse a PDF dictionary.
      Returns:
      The parsed dictionary.
      Throws:
      IOException - If there is an error reading the stream.
    • parseCOSStream

      protected COSStream parseCOSStream(COSDictionary dic, RandomAccess file) throws IOException
      This will read a COSStream from the input stream.
      Parameters:
      file - The file to write the stream to when reading.
      dic - The dictionary that goes with this stream.
      Returns:
      The parsed PDF stream.
      Throws:
      IOException - If there is an error reading the stream.
    • readUntilEndStream

      protected void readUntilEndStream(OutputStream out) throws IOException
      This method reads through the current stream object until it finds the keyword "endstream", which marks the end of the stream. Some PDF files omit the "endstream" keyword and close the object with "endobj" alone, so that case is handled as well. The method uses buffered I/O and a reduced number of byte comparisons.
      Parameters:
      out - stream we write out to.
      Throws:
      IOException - If there is an error reading from the stream.
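      The fallback described for readUntilEndStream can be illustrated with a small standalone sketch. It is not the PDFBox implementation: the class and method names are hypothetical, it buffers the whole input instead of streaming it, and it simply stops at whichever of "endstream" or "endobj" appears first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class EndStreamScanSketch {
    static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.ISO_8859_1);
    static final byte[] ENDOBJ = "endobj".getBytes(StandardCharsets.ISO_8859_1);

    /**
     * Copies bytes from in to out up to (not including) the first
     * "endstream" or "endobj" keyword. Returns the keyword that was
     * found, or null if the input ended without either keyword.
     */
    static String copyUntilKeyword(InputStream in, OutputStream out) throws IOException {
        // Simplification: read everything into memory first.
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        byte[] chunk = new byte[2048];
        int n;
        while ((n = in.read(chunk)) != -1) {
            all.write(chunk, 0, n);
        }
        byte[] data = all.toByteArray();
        for (int i = 0; i < data.length; i++) {
            if (matchesAt(data, i, ENDSTREAM)) {
                out.write(data, 0, i);
                return "endstream";
            }
            if (matchesAt(data, i, ENDOBJ)) {
                out.write(data, 0, i);
                return "endobj";
            }
        }
        out.write(data, 0, data.length);
        return null;
    }

    /** True if keyword occurs in data starting at offset. */
    static boolean matchesAt(byte[] data, int offset, byte[] keyword) {
        if (offset + keyword.length > data.length) {
            return false;
        }
        for (int k = 0; k < keyword.length; k++) {
            if (data[offset + k] != keyword[k]) {
                return false;
            }
        }
        return true;
    }
}
```

      A real parser would scan incrementally and would also have to cope with stream data that happens to contain these byte sequences; the sketch ignores both concerns.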
    • parseCOSString

      @Deprecated protected COSString parseCOSString(boolean isDictionary) throws IOException
      Deprecated.
      Not needed anymore. Use parseCOSString() instead. PDFBOX-1437
      This will parse a PDF string.
      Parameters:
      isDictionary - indicates if the stream is a dictionary or not
      Returns:
      The parsed PDF string.
      Throws:
      IOException - If there is an error reading from the stream.
    • parseCOSString

      protected COSString parseCOSString() throws IOException
      This will parse a PDF string.
      Returns:
      The parsed PDF string.
      Throws:
      IOException - If there is an error reading from the stream.
    • parseCOSArray

      protected COSArray parseCOSArray() throws IOException
      This will parse a PDF array object.
      Returns:
      The parsed PDF array.
      Throws:
      IOException - If there is an error parsing the stream.
    • isEndOfName

      protected boolean isEndOfName(char ch)
      Determine if a character terminates a PDF name.
      Parameters:
      ch - The character
      Returns:
      true if the character terminates a PDF name, otherwise false.
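      As a rough illustration of what "terminates a PDF name" can mean, the sketch below treats the white-space and delimiter characters listed in ISO 32000-1 as terminators. This is an assumption made for the example; the exact set that BaseParser checks may differ, and the class name is hypothetical.

```java
// Hypothetical helper, not part of PDFBox.
final class NameTerminatorSketch {
    /** White-space characters as listed in ISO 32000-1. */
    static boolean isWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A
                || c == 0x0C || c == 0x0D || c == 0x20;
    }

    /** Delimiter characters as listed in ISO 32000-1. */
    static boolean isDelimiter(int c) {
        return "()<>[]{}/%".indexOf(c) >= 0;
    }

    /** Assumed rule: a name ends at any white-space or delimiter character. */
    static boolean isEndOfName(char ch) {
        return isWhitespace(ch) || isDelimiter(ch);
    }
}
```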
    • parseCOSName

      protected COSName parseCOSName() throws IOException
      This will parse a PDF name from the stream.
      Returns:
      The parsed PDF name.
      Throws:
      IOException - If there is an error reading from the stream.
    • parseBoolean

      protected COSBoolean parseBoolean() throws IOException
      This will parse a boolean object from the stream.
      Returns:
      The parsed boolean object.
      Throws:
      IOException - If an IO error occurs during parsing.
    • parseDirObject

      protected COSBase parseDirObject() throws IOException
      This will parse a direct object from the stream.
      Returns:
      The parsed object.
      Throws:
      IOException - If there is an error during parsing.
    • readString

      protected String readString() throws IOException
      This will read the next string from the stream.
      Returns:
      The string that was read from the stream.
      Throws:
      IOException - If there is an error reading from the stream.
    • readExpectedString

      protected String readExpectedString(String theString) throws IOException
      This will read the next string from the stream and check that it matches the expected string.
      Parameters:
      theString - The next expected string in the stream.
      Returns:
      The characters between the current position and the end of the line.
      Throws:
      IOException - If there is an error reading from the stream or theString does not match what was read.
    • readString

      protected String readString(int length) throws IOException
      This will read the next string from the stream up to a certain length.
      Parameters:
      length - The length to stop reading at.
      Returns:
      The string that was read from the stream, between 0 and length characters long.
      Throws:
      IOException - If there is an error reading from the stream.
    • isClosing

      protected boolean isClosing() throws IOException
      This will tell if the next character is a closing bracket (the end of a PDF array).
      Returns:
      true if the next byte is ']', false otherwise.
      Throws:
      IOException - If an IO error occurs.
    • isClosing

      protected boolean isClosing(int c)
      This will tell if the given character is a closing bracket (the end of a PDF array).
      Parameters:
      c - The character to check against ']'
      Returns:
      true if the character is ']', false otherwise.
    • readLine

      protected String readLine() throws IOException
      This will read bytes until the first end of line marker occurs. Note: if you later unread the results of this function, you'll need to add a newline character to the end of the string.
      Returns:
      The characters between the current position and the end of the line.
      Throws:
      IOException - If there is an error reading from the stream.
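      The note about unreading can be made concrete with java.io.PushbackInputStream. The sketch below is a standalone approximation under stated assumptions: readLine here drops the end of line marker (treating CR LF as one marker), so the hypothetical unreadLine helper has to append a newline before pushing the text back. Neither method is the PDFBox implementation.

```java
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox.
final class ReadLineUnreadSketch {
    /** Reads up to the next end of line marker and consumes the marker. */
    static String readLine(PushbackInputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != 0x0A && c != 0x0D) {
            sb.append((char) c);
        }
        if (c == 0x0D) {
            // Treat CR LF as a single marker; keep any other byte.
            int next = in.read();
            if (next != 0x0A && next != -1) {
                in.unread(next);
            }
        }
        return sb.toString();
    }

    /** Pushes a line back; the newline must be re-added by the caller. */
    static void unreadLine(PushbackInputStream in, String line) throws IOException {
        in.unread((line + "\n").getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

      The push-back buffer must be large enough for the bytes being unread, otherwise PushbackInputStream.unread throws an IOException.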
    • isEOL

      protected boolean isEOL() throws IOException
      This will tell if the next byte to be read is an end of line byte.
      Returns:
      true if the next byte is 0x0A or 0x0D.
      Throws:
      IOException - If there is an error reading from the stream.
    • isEOL

      protected boolean isEOL(int c)
      This will tell if the given byte is an end of line byte.
      Parameters:
      c - The character to check against end of line
      Returns:
      true if the byte is 0x0A or 0x0D.
    • isWhitespace

      protected boolean isWhitespace() throws IOException
      This will tell if the next byte is whitespace or not.
      Returns:
      true if the next byte in the stream is a whitespace character.
      Throws:
      IOException - If there is an error reading from the stream.
    • isWhitespace

      protected boolean isWhitespace(int c)
      This will tell if the given byte is whitespace or not. The whitespace values are specified in table 1 (page 12) of ISO 32000-1:2008.
      Parameters:
      c - The character to check against whitespace
      Returns:
      true if the byte is a whitespace character.
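      The end of line and whitespace checks can be written as plain byte comparisons. The sketch below uses the six white-space characters from table 1 of ISO 32000-1:2008 (NUL, TAB, LF, FF, CR, SPACE) and the two end of line bytes named above. The class name is hypothetical, and the actual PDFBox methods may be written differently.

```java
// Hypothetical helper, not part of PDFBox.
final class PdfByteClassSketch {
    /** True for LF (0x0A) or CR (0x0D). */
    static boolean isEOL(int c) {
        return c == 0x0A || c == 0x0D;
    }

    /** True for the white-space characters in table 1 of ISO 32000-1:2008. */
    static boolean isWhitespace(int c) {
        return c == 0x00    // NUL
                || c == 0x09 // horizontal tab
                || c == 0x0A // line feed
                || c == 0x0C // form feed
                || c == 0x0D // carriage return
                || c == 0x20; // space
    }
}
```

      Taking an int rather than a byte lets the same check be applied directly to the result of InputStream.read(), where -1 signals end of input and is never classified as whitespace.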
    • skipSpaces

      protected void skipSpaces() throws IOException
      This will skip all spaces and comments that are present.
      Throws:
      IOException - If there is an error reading from the stream.
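      A minimal sketch of skipping spaces and comments over a java.io.PushbackInputStream follows. It assumes that a comment starts with '%' and runs to the end of the line, as in the PDF syntax, and that the first byte that is neither whitespace nor part of a comment is pushed back for the next reader. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of PDFBox.
final class SkipSpacesSketch {
    static boolean isEOL(int c) {
        return c == 0x0A || c == 0x0D;
    }

    static boolean isWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A
                || c == 0x0C || c == 0x0D || c == 0x20;
    }

    /** Consumes whitespace and '%' comments, leaving the next token unread. */
    static void skipSpacesAndComments(PushbackInputStream in) throws IOException {
        int c = in.read();
        while (c != -1) {
            if (c == '%') {
                // Comment: discard everything up to the end of the line.
                c = in.read();
                while (c != -1 && !isEOL(c)) {
                    c = in.read();
                }
            } else if (isWhitespace(c)) {
                c = in.read();
            } else {
                break;
            }
        }
        if (c != -1) {
            // Put back the first byte that belongs to the next token.
            in.unread(c);
        }
    }
}
```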
    • readObjectNumber

      protected long readObjectNumber() throws IOException
      This will read a long from the stream and throw an IllegalArgumentException if the value has more than 10 digits (i.e., it is greater than OBJECT_NUMBER_THRESHOLD).
      Returns:
      the object number being read.
      Throws:
      IOException - if an I/O error occurs
    • readGenerationNumber

      protected int readGenerationNumber() throws IOException
      This will read an integer from the stream and throw an IllegalArgumentException if the value exceeds the maximum object revision (i.e., it is greater than GENERATION_NUMBER_THRESHOLD).
      Returns:
      the generation number being read.
      Throws:
      IOException - if an I/O error occurs
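      The range checks described for readObjectNumber and readGenerationNumber can be sketched as simple predicates. The constant values here are assumptions made for the example: the object number threshold is taken as the first 11-digit value, and the generation number threshold as 65,535, the maximum generation number in ISO 32000-1. The class and method names are hypothetical, and the sketch returns a boolean instead of throwing.

```java
// Hypothetical helper, not part of PDFBox; constant values are assumed.
final class NumberThresholdSketch {
    // Assumed: smallest value with more than 10 digits.
    static final long OBJECT_NUMBER_THRESHOLD = 10_000_000_000L;
    // Assumed: maximum generation number in ISO 32000-1.
    static final int GENERATION_NUMBER_THRESHOLD = 65_535;

    /** True if n is non-negative and has at most 10 digits. */
    static boolean isPlausibleObjectNumber(long n) {
        return n >= 0 && n < OBJECT_NUMBER_THRESHOLD;
    }

    /** True if g is non-negative and does not exceed the assumed maximum. */
    static boolean isPlausibleGenerationNumber(int g) {
        return g >= 0 && g <= GENERATION_NUMBER_THRESHOLD;
    }
}
```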
    • readInt

      protected int readInt() throws IOException
      This will read an integer from the stream.
      Returns:
      The integer that was read from the stream.
      Throws:
      IOException - If there is an error reading from the stream.
    • readLong

      protected long readLong() throws IOException
      This will read a long from the stream.
      Returns:
      The long that was read from the stream.
      Throws:
      IOException - If there is an error reading from the stream.
    • readStringNumber

      protected final StringBuilder readStringNumber() throws IOException
      This method is used to read a token by the readInt() method and the readLong() method.
      Returns:
      the token to parse as integer or long by the calling method.
      Throws:
      IOException - thrown by the pdfSource methods.
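      The split between reading a token and parsing it can be sketched as follows. This is an illustration only: the token here ends at the first whitespace byte (pushed back for the next reader), leading whitespace is not skipped, and a malformed token is reported as an IOException. All names are hypothetical, and the real readStringNumber may use different terminators.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of PDFBox.
final class NumberTokenSketch {
    static boolean isWhitespace(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A
                || c == 0x0C || c == 0x0D || c == 0x20;
    }

    /** Reads bytes up to the next whitespace byte or end of input. */
    static StringBuilder readNumberToken(PushbackInputStream in) throws IOException {
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && !isWhitespace(c)) {
            token.append((char) c);
        }
        if (c != -1) {
            in.unread(c); // leave the terminator for the next reader
        }
        return token;
    }

    /** Parses the next token as an int. */
    static int readInt(PushbackInputStream in) throws IOException {
        String token = readNumberToken(in).toString();
        try {
            return Integer.parseInt(token);
        } catch (NumberFormatException e) {
            throw new IOException("Expected an integer but found '" + token + "'", e);
        }
    }

    /** Parses the next token as a long. */
    static long readLong(PushbackInputStream in) throws IOException {
        String token = readNumberToken(in).toString();
        try {
            return Long.parseLong(token);
        } catch (NumberFormatException e) {
            throw new IOException("Expected a long but found '" + token + "'", e);
        }
    }
}
```

      Sharing one token reader keeps the terminator handling in a single place, so readInt and readLong differ only in the final parse step.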
    • clearResources

      public void clearResources()
      Release all used resources.