Field names in first line of flat files


rasky
Nov 19th, 2007, 03:58 AM
Hi all,
I ran into some problems when trying to implement an import batch (using 1.0-M2). The thing is that the source import file (flat file, pipe separated) has the column names in the first row. It would be a pretty simple task to just remove the first row from the file before leaving it to the batch, but there should be some way of specifying this in the framework.

Ideally, I would like to have the field names read from the first row and passed on to the Tokenizer so that I don't have to specify them in XML, but at the very least it should be possible to specify a number of initial rows to skip, e.g. in DefaultFlatFileInputSource.

Am I missing something or do you guys have any comments on this?
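
To make the request concrete, here is a minimal plain-Java sketch of the behaviour described above: the first line of a pipe-separated file supplies the field names, and every later line is mapped onto them. It does not use any Spring Batch classes; HeaderedFlatFile and its read method are hypothetical names chosen only for illustration.

```java
import java.io.BufferedReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spring Batch: reads pipe-separated
// records and takes the field names from the first line of the input.
class HeaderedFlatFile {

    static List<Map<String, String>> read(BufferedReader in) {
        List<Map<String, String>> records = new ArrayList<>();
        Iterator<String> lines = in.lines().iterator();
        if (!lines.hasNext()) {
            return records; // empty input: no header, no records
        }
        // The pipe is a regex metacharacter, so it must be escaped.
        // The limit -1 keeps trailing empty columns.
        String[] names = lines.next().split("\\|", -1);
        while (lines.hasNext()) {
            String line = lines.next();
            String[] values = line.split("\\|", -1);
            if (values.length != names.length) {
                throw new IllegalArgumentException(
                        "Expected " + names.length + " fields but found "
                                + values.length + ": " + line);
            }
            // LinkedHashMap preserves the column order of the header.
            Map<String, String> record = new LinkedHashMap<>();
            for (int i = 0; i < names.length; i++) {
                record.put(names[i], values[i]);
            }
            records.add(record);
        }
        return records;
    }
}
```

Calling HeaderedFlatFile.read on a reader over "id|name" followed by "1|alice" would give one record with the keys id and name. Skipping a fixed number of initial rows instead would simply mean discarding that many lines before the loop.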

robert.kasanicky
Nov 20th, 2007, 03:10 AM
I think the way to go is to override the protected getReader() with something like:


ResourceLineReader reader = super.getReader();
// use the reader to read the first line and
// setup the tokenizer
return reader;

Dave Syer
Nov 20th, 2007, 03:22 AM
That's probably not a great idea: getReader() is used internally every time the reader is needed, not just at the top of the file, so you would also have to introduce some state to track whether this is the first call. Also, ResourceLineReader is package private, so there might be issues with visibility.

In any case I'd be interested to see someone try this out for real and give us some feedback. It's a very common use case, so it might be worth adding another explicit feature to the input source.
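
For anyone who wants to try it, the "track whether this is the first call" idea might look roughly like the sketch below. It is plain Java rather than a real subclass of the input source: HeaderOnceSource is a hypothetical stand-in, and a BufferedReader replaces ResourceLineReader because the latter is package private.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for an input source whose getReader() is
// invoked on every read, not just once at the top of the file.
class HeaderOnceSource {

    private final BufferedReader reader;
    private boolean headerConsumed = false; // the extra state
    private String[] fieldNames = new String[0];

    HeaderOnceSource(BufferedReader reader) {
        this.reader = reader;
    }

    // Only the first call consumes the header line; later calls
    // return the reader untouched.
    BufferedReader getReader() {
        if (!headerConsumed) {
            headerConsumed = true;
            try {
                String header = reader.readLine();
                if (header != null) {
                    fieldNames = header.split("\\|", -1);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return reader;
    }

    // Returns the next data line, or null at end of input.
    String readLine() {
        try {
            return getReader().readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Empty until the first read has happened.
    String[] getFieldNames() {
        return fieldNames.clone();
    }
}
```

The flag has to be reset whenever the source is closed and reopened, which is one reason a hook that runs once per open is the more natural place for this logic.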

robert.kasanicky
Nov 20th, 2007, 03:31 AM
Oh, right, getReader() is the wrong place, but overriding open() probably makes sense?


public void open() {
    super.open();
    ResourceLineReader reader = getReader();
    // use reader to read the first line and setup tokenizer
}

sotretus
Nov 20th, 2007, 11:30 AM
I also think this deserves a way to configure it in XML. It should be pretty straightforward to do, right?

Regards
AB

Ronald
Nov 21st, 2007, 01:48 AM
I was looking for the same functionality. It would be nice if you could specify the column names in the first row instead of in the XML. I couldn't find a good extension point to build this functionality myself.

Dave Syer
Nov 21st, 2007, 03:36 AM
http://opensource.atlassian.com/projects/spring/browse/BATCH-211

jglynn
Nov 25th, 2007, 10:17 PM
I couldn't wait around for spring batch so I went with OpenCSV (http://opencsv.sourceforge.net/).

It has a constructor parameter that lets you skip any number of initial lines.

Dave Syer
Nov 26th, 2007, 08:08 AM
OpenCSV could be used to create an inputSource for Spring Batch I guess. That issue is fixed, by the way, in case you weren't watching it.

lucasward
Nov 26th, 2007, 12:19 PM
OpenCSV looks interesting, and you could certainly use it in Spring Batch, but keep in mind that it will not be transactional. If you read in 5 lines that result in 5 writes to the database, and the 5th line causes an error on output, you have no way to roll back and start again at line 1 unless you manually register for TransactionSynchronization and make the call on a CSVReader (assuming it supports moving backwards), which is the same functionality already present in the FlatFileInputSource.
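
To illustrate what "moving backwards" means here, the following plain-Java sketch keeps the lines handed out since the last commit and replays them after a rollback. It is only an illustration of the idea, not how FlatFileInputSource is implemented; RewindableLineSource and its methods are hypothetical names, and in a real job something like a transaction synchronization callback would be what calls commit() and rollback().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: a line source that can return to the last commit point.
class RewindableLineSource {

    private final Iterator<String> lines;
    // Lines handed out since the last commit, in reading order.
    private final List<String> sinceCommit = new ArrayList<>();
    // Lines that must be handed out again after a rollback.
    private final Deque<String> replay = new ArrayDeque<>();

    RewindableLineSource(Iterator<String> lines) {
        this.lines = lines;
    }

    // Returns the next line, or null at end of input.
    String read() {
        String line;
        if (!replay.isEmpty()) {
            line = replay.poll();
        } else if (lines.hasNext()) {
            line = lines.next();
        } else {
            line = null;
        }
        if (line != null) {
            sinceCommit.add(line);
        }
        return line;
    }

    // The transaction succeeded: the lines read so far are final.
    void commit() {
        sinceCommit.clear();
    }

    // The transaction failed: hand out the uncommitted lines again,
    // followed by anything still waiting from an earlier rollback.
    void rollback() {
        List<String> pending = new ArrayList<>(sinceCommit);
        pending.addAll(replay);
        replay.clear();
        replay.addAll(pending);
        sinceCommit.clear();
    }
}
```

In the 5-line scenario above, calling rollback() after the failed 5th write makes the next read() return line 1 again. A reader that only moves forward offers nothing equivalent unless you wrap it in something like this yourself.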