Class TextPosition

java.lang.Object
org.apache.pdfbox.util.TextPosition

public class TextPosition extends Object
Represents a string of characters together with its position on the screen.
Version:
$Revision: 1.12 $
Author:
Ben Litchfield
  • Constructor Details

    • TextPosition

      protected TextPosition()
      Constructor.
    • TextPosition

      public TextPosition(PDPage page, Matrix textPositionSt, Matrix textPositionEnd, float maxFontH, float[] individualWidths, float spaceWidth, String string, PDFont currentFont, float fontSizeValue, int fontSizeInPt, float ws)
      Constructor.
      Parameters:
      page - Page that the text is located in
      textPositionSt - TextMatrix for start of text (in display units)
      textPositionEnd - TextMatrix for end of text (in display units)
      maxFontH - Maximum height of text (in display units)
      individualWidths - The width of each individual character. (in ? units)
      spaceWidth - The width of the space character. (in display units)
      string - The character to be displayed.
      currentFont - The current font for this text position.
      fontSizeValue - The new font size.
      fontSizeInPt - The font size in pt units.
      ws - The word spacing parameter (in display units)
    • TextPosition

      public TextPosition(int pageRotation, float pageWidthValue, float pageHeightValue, Matrix textPositionSt, Matrix textPositionEnd, float maxFontH, float individualWidth, float spaceWidth, String string, PDFont currentFont, float fontSizeValue, int fontSizeInPt)
      Constructor.
      Parameters:
      pageRotation - rotation of the page that the text is located in
      pageWidthValue - width of the page that the text is located in
      pageHeightValue - height of the page that the text is located in
      textPositionSt - TextMatrix for start of text (in display units)
      textPositionEnd - TextMatrix for end of text (in display units)
      maxFontH - Maximum height of text (in display units)
      individualWidth - The width of the given character/string. (in ? units)
      spaceWidth - The width of the space character. (in display units)
      string - The character to be displayed.
      currentFont - The current font for this text position.
      fontSizeValue - The new font size.
      fontSizeInPt - The font size in pt units.
    • TextPosition

      public TextPosition(int pageRotation, float pageWidthValue, float pageHeightValue, Matrix textPositionSt, float endXValue, float endYValue, float maxFontH, float individualWidth, float spaceWidth, String string, PDFont currentFont, float fontSizeValue, int fontSizeInPt)
      Constructor.
      Parameters:
      pageRotation - rotation of the page that the text is located in
      pageWidthValue - width of the page that the text is located in
      pageHeightValue - height of the page that the text is located in
      textPositionSt - TextMatrix for start of text (in display units)
      endXValue - x coordinate of the end position
      endYValue - y coordinate of the end position
      maxFontH - Maximum height of text (in display units)
      individualWidth - The width of the given character/string. (in ? units)
      spaceWidth - The width of the space character. (in display units)
      string - The character to be displayed.
      currentFont - The current font for this text position.
      fontSizeValue - The new font size.
      fontSizeInPt - The font size in pt units.
    • TextPosition

      public TextPosition(int pageRotation, float pageWidthValue, float pageHeightValue, Matrix textPositionSt, float endXValue, float endYValue, float maxFontH, float individualWidth, float spaceWidth, String string, int[] codePoints, PDFont currentFont, float fontSizeValue, int fontSizeInPt)
      Constructor.
      Parameters:
      pageRotation - rotation of the page that the text is located in
      pageWidthValue - width of the page that the text is located in
      pageHeightValue - height of the page that the text is located in
      textPositionSt - TextMatrix for start of text (in display units)
      endXValue - x coordinate of the end position
      endYValue - y coordinate of the end position
      maxFontH - Maximum height of text (in display units)
      individualWidth - The width of the given character/string. (in ? units)
      spaceWidth - The width of the space character. (in display units)
      string - The character to be displayed.
      codePoints - An array containing the codepoints of the given string.
      currentFont - The current font for this text position.
      fontSizeValue - The new font size.
      fontSizeInPt - The font size in pt units.
  • Method Details

    • getCharacter

      public String getCharacter()
      Return the string of characters stored in this object.
      Returns:
      The string on the screen.
    • getCodePoints

      public int[] getCodePoints()
      Return the codepoints of the characters stored in this object.
      Returns:
      an array containing all codepoints.
    • getTextPos

      public Matrix getTextPos()
      Return the Matrix textPos stored in this object.
      Returns:
      The Matrix containing all information about the starting text position.
    • getDir

      public float getDir()
      Return the direction/orientation of the string in this object based on its text matrix.
      Returns:
      The direction of the text (0, 90, 180, or 270)
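      As an illustration of how a direction can be read off a text matrix, the sketch below classifies the first row (a, b) of the matrix into one of the four values. It assumes the matrix is a pure rotation plus scale, so that a is proportional to the cosine and b to the sine of the rotation angle. The class and method names are hypothetical, and this is not necessarily the exact test the library applies.

```java
final class TextDirectionSketch {
    /**
     * Classifies the first row (a, b) of a text matrix as 0, 90, 180 or 270 degrees.
     * Assumption: a = s*cos(angle) and b = s*sin(angle) for some positive scale s.
     */
    static int direction(float a, float b) {
        if (Math.abs(a) >= Math.abs(b)) {
            // Mostly horizontal: left-to-right or upside down.
            return a >= 0 ? 0 : 180;
        }
        // Mostly vertical: reading upwards or downwards.
        return b > 0 ? 90 : 270;
    }

    public static void main(String[] args) {
        System.out.println(direction(12f, 0f));   // 0
        System.out.println(direction(0f, 12f));   // 90
        System.out.println(direction(-12f, 0f));  // 180
        System.out.println(direction(0f, -12f));  // 270
    }
}
```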
    • getX

      public float getX()
      This will get the x position of the character, adjusted for page rotation so that the upper left is 0,0.
      Returns:
      The x coordinate of the character.
    • getXDirAdj

      public float getXDirAdj()
      This will get the x position of the character, adjusted for text direction so that the first character in that direction is in the upper left at 0,0.
      Returns:
      The x coordinate of the text.
    • getY

      public float getY()
      This will get the y position of the text, adjusted for page rotation so that 0,0 is the upper left.
      Returns:
      The adjusted y coordinate of the character.
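      PDF coordinates place the origin at the lower left of the page, so an upper-left origin requires flipping the y axis. The sketch below shows that flip for the simplest case only, an unrotated page; the helper name is hypothetical and rotated pages would need additional handling not shown here.

```java
final class UpperLeftSketch {
    /**
     * Converts a y coordinate measured from the bottom of the page into one
     * measured from the top. Assumption: the page rotation is 0.
     */
    static float toUpperLeftY(float pageHeight, float yFromBottom) {
        return pageHeight - yFromBottom;
    }

    public static void main(String[] args) {
        // A US Letter page is 792 units high; a baseline 700 units above the
        // bottom edge lies 92 units below the top edge.
        System.out.println(toUpperLeftY(792f, 700f)); // 92.0
    }
}
```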
    • getYDirAdj

      public float getYDirAdj()
      This will get the y position of the text, adjusted for text direction so that 0,0 is the upper left.
      Returns:
      The adjusted y coordinate of the character.
    • getWidth

      public float getWidth()
      This will get the width of the string when page rotation adjusted coordinates are used.
      Returns:
      The width of the text in display units.
    • getWidthDirAdj

      public float getWidthDirAdj()
      This will get the width of the string when text direction adjusted coordinates are used.
      Returns:
      The width of the text in display units.
    • getHeight

      public float getHeight()
      This will get the maximum height of all characters in this string.
      Returns:
      The maximum height of all characters in this string.
    • getHeightDir

      public float getHeightDir()
      This will get the maximum height of all characters in this string when text direction adjusted coordinates are used.
      Returns:
      The maximum height of all characters in this string.
    • getFontSize

      public float getFontSize()
      This will get the font size that this object is supposed to be drawn at.
      Returns:
      The font size.
    • getFontSizeInPt

      public float getFontSizeInPt()
      This will get the font size in pt. This size is the product of the PDF font size and the scaling from the text matrix.
      Returns:
      The font size in pt.
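      The multiplication described above matters because a PDF may select a font at size 1 and apply the real size through the text matrix. A minimal sketch of the arithmetic, with hypothetical names and a single scale factor standing in for the text matrix scaling:

```java
final class FontSizeSketch {
    /**
     * Effective size = font size set in the content stream times the scale
     * taken from the text matrix. Both parameter names are illustrative.
     */
    static float fontSizeInPt(float pdfFontSize, float textMatrixScale) {
        return pdfFontSize * textMatrixScale;
    }

    public static void main(String[] args) {
        // Font selected at size 1, text matrix scaling by 12.
        System.out.println(fontSizeInPt(1f, 12f));  // 12.0
        // Font selected at size 10, text matrix scaling by 1.5.
        System.out.println(fontSizeInPt(10f, 1.5f)); // 15.0
    }
}
```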
    • getFont

      public PDFont getFont()
      This will get the font for the text being drawn.
      Returns:
      The font.
    • getWordSpacing

      @Deprecated public float getWordSpacing()
      Deprecated.
      This will get the current word spacing.
      Returns:
      The current word spacing.
    • getWidthOfSpace

      public float getWidthOfSpace()
      This will get the width of a space character. This is useful for some algorithms such as the text stripper, that need to know the width of a space character.
      Returns:
      The width of a space character.
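      One way an extraction algorithm might use the space width is to compare it with the horizontal gap between two consecutive pieces of text. The sketch below is a simplified heuristic with a hypothetical tolerance factor; it is not the text stripper's actual logic.

```java
final class WordBreakSketch {
    /**
     * Reports a word break when the gap between the end of one piece of text
     * and the start of the next exceeds a fraction of the space width.
     * The tolerance parameter is an assumption made for this example.
     */
    static boolean isWordBreak(float previousEndX, float nextStartX,
                               float spaceWidth, float tolerance) {
        float gap = nextStartX - previousEndX;
        return gap > spaceWidth * tolerance;
    }

    public static void main(String[] args) {
        // Space width 4, tolerance 0.5: gaps wider than 2 count as a break.
        System.out.println(isWordBreak(10f, 13f, 4f, 0.5f)); // true
        System.out.println(isWordBreak(10f, 11f, 4f, 0.5f)); // false
    }
}
```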
    • getXScale

      public float getXScale()
      Returns:
      Returns the xScale.
    • getYScale

      public float getYScale()
      Returns:
      Returns the yScale.
    • getIndividualWidths

      public float[] getIndividualWidths()
      Get the widths of each individual character.
      Returns:
      An array that is the same length as the length of the string.
    • toString

      public String toString()
      Show the string data for this text position.
      Overrides:
      toString in class Object
      Returns:
      A human readable form of this object.
    • contains

      public boolean contains(TextPosition tp2)
      Determine if this TextPosition logically contains another (i.e. they overlap and should be rendered on top of each other).
      Parameters:
      tp2 - The other TextPosition to compare against
      Returns:
      True if tp2 is contained in the bounding box of this text.
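      The idea of a bounding-box test can be sketched with plain rectangles. The class below is hypothetical and checks only whether two axis-aligned boxes overlap at all, using an upper-left origin with y growing downward; the library's own test may apply additional thresholds.

```java
final class BoxSketch {
    final float x;      // left edge
    final float y;      // top edge (origin at upper left, y grows downward)
    final float width;
    final float height;

    BoxSketch(float x, float y, float width, float height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** True when the two boxes share some area on both axes. */
    boolean overlaps(BoxSketch other) {
        boolean xOverlap = x < other.x + other.width && other.x < x + width;
        boolean yOverlap = y < other.y + other.height && other.y < y + height;
        return xOverlap && yOverlap;
    }

    public static void main(String[] args) {
        BoxSketch base = new BoxSketch(0f, 0f, 10f, 10f);
        System.out.println(base.overlaps(new BoxSketch(5f, 5f, 10f, 10f))); // true
        System.out.println(base.overlaps(new BoxSketch(20f, 0f, 5f, 5f)));  // false
    }
}
```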
    • mergeDiacritic

      public void mergeDiacritic(TextPosition diacritic, TextNormalize normalize)
      Merge a single character TextPosition into the current object. This is to be used only for cases where we have a diacritic that overlaps an existing TextPosition. In a graphical display, we could overlay them, but for text extraction we need to merge them. Use the contains() method to test if two objects overlap.
      Parameters:
      diacritic - TextPosition to merge into the current TextPosition.
      normalize - Instance of TextNormalize class to be used to normalize diacritic
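      To illustrate what merging means for extracted text, the sketch below combines a base letter with a combining mark using the JDK's java.text.Normalizer. This stands in for the TextNormalize class, whose behaviour is not shown here, and it assumes the diacritic is already a Unicode combining character. The class and method names are hypothetical.

```java
import java.text.Normalizer;

final class DiacriticMergeSketch {
    /**
     * Appends a combining mark to a base string and composes the result
     * (NFC), so that "e" plus U+0301 becomes the single character U+00E9.
     */
    static String merge(String base, String combiningMark) {
        return Normalizer.normalize(base + combiningMark, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String merged = merge("e", "\u0301");
        System.out.println(merged.length());           // 1
        System.out.println(merged.equals("\u00E9"));   // true
    }
}
```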
    • isDiacritic

      public boolean isDiacritic()
      Returns:
      True if the current character is a diacritic char.