How to extract bold text from a PDF using PDFBox?

Question

Nov 04, 2013, 04:22 PM


I'm using Apache PDFBox to extract text. I can extract the text from a PDF file, but I don't know how to tell whether a given word is bold or not. (A code suggestion would be welcome!) Here is the code that extracts plain text from the PDF file, which works fine:

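import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Written against the PDFBox 1.8.x API used in the question
public class ExtractText {
    public static void main(String[] args) throws Exception {
        PDDocument document = PDDocument
            .load("/home/lipu/workspace/MRCPTester/test.pdf");
        try {
            // Try to open an encrypted file with an empty password
            if (document.isEncrypted()) {
                try {
                    document.decrypt("");
                } catch (InvalidPasswordException e) {
                    System.err.println("Error: Document is encrypted with a password.");
                    System.exit(1);
                }
            }

            // PDFTextStripperByArea stripper = new PDFTextStripperByArea();
            // stripper.setSortByPosition(true);
            PDFTextStripper stripper = new PDFTextStripper();
            stripper.setStartPage(1);
            stripper.setEndPage(2);
            stripper.setSortByPosition(true); // order text by its position on the page
            String st = stripper.getText(document);
            System.out.println(st);
        } finally {
            document.close(); // release file handles held by PDFBox
        }
    }
}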
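One common approach, offered here as a hedged sketch rather than an official PDFBox feature, is to look at the font behind each glyph. In PDFBox 2.x, a subclass of `PDFTextStripper` can override `writeString(String text, List<TextPosition> textPositions)`. Each `TextPosition` exposes `getUnicode()` and `getFont()`, and the font gives you `PDFont.getName()` and `getFontDescriptor()`, which in turn provides `getFontWeight()` and `isForceBold()`. The heuristic below treats a glyph as bold if its font name contains "bold", its weight is at least 700, or the ForceBold flag is set. `BoldRuns` and `Glyph` are hypothetical names: `Glyph` stands in for `TextPosition` so the sketch compiles without PDFBox. Bold that is simulated through stroke rendering or repeated drawing will not be detected this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Sketch: classify glyphs as bold from font metadata and collect runs of
 * consecutive bold glyphs. Glyph is a hypothetical stand-in for PDFBox's
 * TextPosition so that this file compiles without PDFBox on the classpath.
 */
public class BoldRuns {

    /** Hypothetical holder for the per-glyph facts PDFBox 2.x exposes. */
    public static final class Glyph {
        final String text;       // from TextPosition.getUnicode()
        final String fontName;   // from PDFont.getName(); may carry a subset prefix such as "ABCDEF+"
        final float fontWeight;  // from PDFontDescriptor.getFontWeight(); 0 when unknown
        final boolean forceBold; // from PDFontDescriptor.isForceBold()

        public Glyph(String text, String fontName, float fontWeight, boolean forceBold) {
            this.text = text;
            this.fontName = fontName;
            this.fontWeight = fontWeight;
            this.forceBold = forceBold;
        }
    }

    /** Heuristic: any one of the three signals marks the glyph as bold. */
    public static boolean isBold(Glyph g) {
        String name = g.fontName == null ? "" : g.fontName.toLowerCase(Locale.ROOT);
        return name.contains("bold") || g.fontWeight >= 700f || g.forceBold;
    }

    /** Concatenates consecutive bold glyphs; a non-bold glyph ends the current run. */
    public static List<String> boldRuns(List<Glyph> glyphs) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Glyph g : glyphs) {
            if (isBold(g)) {
                current.append(g.text);
            } else if (current.length() > 0) {
                runs.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = new ArrayList<>();
        for (char c : "Hi".toCharArray()) {
            glyphs.add(new Glyph(String.valueOf(c), "ABCDEF+Arial-BoldMT", 0f, false));
        }
        for (char c : " there".toCharArray()) {
            glyphs.add(new Glyph(String.valueOf(c), "ABCDEF+ArialMT", 400f, false));
        }
        System.out.println(boldRuns(glyphs)); // prints [Hi]
    }
}
```

To use it with PDFBox 2.x, build a `Glyph` from each `TextPosition` inside the overridden `writeString`, guarding against `getFontDescriptor()` returning `null`, and then call `super.writeString(text, textPositions)` if you still want the plain text output. Note that `PDFTextStripper` often inserts word separators itself rather than emitting space glyphs, so adjacent bold words may need a separator added when you build runs.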