College Programs: Reading a Word document with Apache POI

Saturday, September 27, 2014

Reading a Word document with Apache POI

Reading files is easy in Java, and we can read any type of file. But each file type has its own format and features: a Word document and an Excel sheet, for example, each store their content in their own way. We can use BufferedReader or any of the other readers available in Java, but these work on plain text and are not practical for documents that carry a lot of formatting.
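To make the point about plain readers concrete: a legacy .doc file is a binary OLE2 compound file, not text, and such files start with a fixed 8-byte signature. The following is only a minimal sketch using the standard library; the class name Ole2Sniffer and its method are hypothetical helpers written for this post, not part of Apache POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper for illustration; not part of Apache POI.
final class Ole2Sniffer
{
    // The 8-byte signature found at the start of an OLE2 compound file.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the file begins with the OLE2 signature.
    static boolean looksLikeOle2(Path file) throws IOException
    {
        try (InputStream in = Files.newInputStream(file))
        {
            byte[] head = new byte[OLE2_SIGNATURE.length];
            int filled = 0;
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length)
            {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0)
                {
                    break;
                }
                filled += n;
            }
            return filled == head.length && Arrays.equals(head, OLE2_SIGNATURE);
        }
    }

    public static void main(String[] args) throws IOException
    {
        // The default path is the one used in this post; pass another path as an argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "C:/Practice/Reading.doc");
        System.out.println(file + " looks like OLE2: " + looksLikeOle2(file));
    }
}
```

If the check returns true, the content is binary and reading it line by line with BufferedReader would produce mostly unreadable characters, which is why a library that understands the format is needed.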
Apache provides a sophisticated library called POI for processing documents produced by MS Office applications such as Word and Excel. Apache ships it as ready-made jars: we only have to add them to our classpath and we can use the functionality they provide.
The following example shows the most basic case: reading a Word document (.doc) that contains a single paragraph with a single line.

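import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

// The class name is only illustrative.
public class ReadWordDocument
{
    public static void main(String[] args)
    {
        // try-with-resources closes the input stream even if reading fails
        try (FileInputStream fis = new FileInputStream("C:/Practice/Reading.doc"))
        {
            // Parse the binary .doc file into POI's document model
            HWPFDocument doc = new HWPFDocument(fis);
            // Range over the main document text
            Range range = doc.getRange();
            int numOfParas = range.numParagraphs();
            // Print the text of every paragraph in order
            for (int i = 0; i < numOfParas; i++)
            {
                Paragraph para = range.getParagraph(i);
                System.out.println(para.text());
            }
        }
        catch (FileNotFoundException fnfe)
        {
            // The path does not point to an existing, readable file
            fnfe.printStackTrace();
        }
        catch (IOException ioe)
        {
            // The file could not be read or parsed
            ioe.printStackTrace();
        }
    }
}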

HWPFDocument is the wrapper that holds all the data structures of the Word document. The variable doc of type HWPFDocument refers to the in-memory representation of the file we opened. The HWPFDocument constructor used here does not take a File or a path string; it takes an InputStream, which is why we first open the file with FileInputStream. The variable range of type Range covers all the text of the Word document except the header and footer sections, so through range we can read the body of the document. Calling numParagraphs on range gives the total number of paragraphs in the document, and getParagraph returns the paragraph at the given index, counting from 0.
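One practical detail when printing paragraphs: the string returned by para.text() typically still carries Word's end-of-paragraph mark (a carriage return), and text from table cells ends with a cell mark (character 7), so the console output can look slightly off. The helper below is a small sketch for trimming those trailing marks; ParagraphText and stripTrailingMarks are hypothetical names made up for this post, not POI API.

```java
// Hypothetical helper for illustration; not part of Apache POI.
final class ParagraphText
{
    // Removes trailing carriage returns, line feeds and cell marks (char 7).
    // Marks in the middle of the string are left untouched.
    static String stripTrailingMarks(String raw)
    {
        int end = raw.length();
        while (end > 0)
        {
            char c = raw.charAt(end - 1);
            if (c == '\r' || c == '\n' || c == '\u0007')
            {
                end--;
            }
            else
            {
                break;
            }
        }
        return raw.substring(0, end);
    }

    public static void main(String[] args)
    {
        // Simulated paragraph text with a trailing paragraph mark.
        String raw = "Hello from Word\r";
        System.out.println("[" + stripTrailingMarks(raw) + "]");
    }
}
```

In the reading loop this would be used as System.out.println(ParagraphText.stripTrailingMarks(para.text())), assuming the helper class is on the classpath next to the example.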
