Simple S3 ItemReader for Spring Batch Application - Part 1

Spring Batch is one of the most popular open source batch processing frameworks available today. It also supports advanced features such as partitioning and other scaling techniques, which makes it well suited for high-volume, high-performance enterprise applications.

In this article, we will discuss how to use Spring Batch to process files stored in AWS S3 (Simple Storage Service).


The lifecycle of a batch process is: read a large chunk of data, process it, and then write the transformed data back to some storage. So the main components of a batch job are a reader, a processor and a writer.
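As a minimal sketch (assuming Spring Batch 4's StepBuilderFactory; the step name and the placeholder reader, processor and writer below are illustrative only), a chunk-oriented step wires the three components together like this:
@Bean
public Step processFileStep(StepBuilderFactory steps) {
    ItemReader<String> reader = new ListItemReader<>(Arrays.asList("line 1", "line 2")); // placeholder reader
    ItemProcessor<String, String> processor = String::toUpperCase;                       // placeholder processor
    ItemWriter<String> writer = items -> items.forEach(System.out::println);             // placeholder writer
    return steps.get("processFileStep")
            .<String, String>chunk(100) // read and process 100 items, then write them as one chunk
            .reader(reader)
            .processor(processor)
            .writer(writer)
            .build();
}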

Batch Reader

Spring Batch provides various item readers such as:
  • FlatFileItemReader
  • HibernatePagingItemReader
  • IbatisPagingItemReader
  • JdbcPagingItemReader
  • JmsItemReader
  • MongoItemReader
As you might already know, there is no built-in reader available for S3. You can always write your own by implementing the ItemReader interface, but here I will show you how to build an item reader for S3 in a few simple steps!
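For reference, a hand-rolled reader only needs to implement ItemReader's single read() method and return null when the input is exhausted; the class below is purely illustrative:
// Illustrative only: serves items from an in-memory list, one per read() call.
public class InMemoryItemReader implements ItemReader<String> {

    private final Iterator<String> items;

    public InMemoryItemReader(List<String> items) {
        this.items = items.iterator();
    }

    @Override
    public String read() {
        // Returning null tells Spring Batch that the input is exhausted
        return items.hasNext() ? items.next() : null;
    }
}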

The approach

Here I will use FlatFileItemReader as the ItemReader implementation, backed by a custom resource. The resource will be a ByteArrayResource whose input is the bytes read from S3. Simple, isn't it?

The code to read the bytes from S3 looks like this:
public byte[] getBytes() throws IOException {
    // getClient() is assumed to return a configured AmazonS3 client (see the sketch below)
    S3Object object = getClient().getObject(new GetObjectRequest("bucket", "file"));
    // try-with-resources closes the underlying HTTP stream even if the copy fails
    try (InputStream is = object.getObjectContent()) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        IOUtils.copy(is, out); // IOUtils from the AWS SDK or Apache Commons IO
        return out.toByteArray();
    }
}
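The getClient() call above is assumed to return a configured AmazonS3 client; with the AWS SDK for Java (v1) it could be built roughly like this (the region shown is an assumption, use whichever region your bucket lives in):
private AmazonS3 getClient() {
    // Relies on the default credential provider chain (environment, system properties, profile, instance role)
    return AmazonS3ClientBuilder.standard()
            .withRegion(Regions.US_EAST_1) // assumption: replace with your bucket's region
            .build();
}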
And here is the code for building the ItemReader (YourItem is a placeholder for your own domain type, and the tokenizer and field set mapper shown are just common Spring Batch defaults, so plug in your own):
public ItemReader<YourItem> reader() throws IOException {
    FlatFileItemReader<YourItem> reader = new FlatFileItemReader<>();
    // Wrap the bytes fetched from S3 in an in-memory Resource
    reader.setResource(new ByteArrayResource(getBytes(), "s3 bytes"));
    DefaultLineMapper<YourItem> lineMapper = new DefaultLineMapper<>();
    lineMapper.setLineTokenizer(new DelimitedLineTokenizer());       // your tokenizer
    lineMapper.setFieldSetMapper(new BeanWrapperFieldSetMapper<>()); // your field set mapper
    reader.setLineMapper(lineMapper);
    return reader;
}
That's it! We now have an S3 item reader that can be used in your Spring Batch application. There are, however, some issues with this approach; continue to part 2 of this article, where I will show you a better way to implement an S3 file reader.

Also, I will be writing a detailed article on how to build an S3 item writer. Stay tuned!
