Skip to content
This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

CSV is written without column separators if first column is null #33

Closed
Magrath opened this issue Mar 11, 2014 · 7 comments
Closed

CSV is written without column separators if first column is null #33

Magrath opened this issue Mar 11, 2014 · 7 comments
Milestone

Comments

@Magrath
Copy link
Contributor

Magrath commented Mar 11, 2014

There seems to be a bug in CsvWriter in the code dealing with the buffer. When the first column is null, the row is output without column separators. See unit tests below. testWrite_NullFirstColumn fails (all others pass). Expected output is ,42,hello\n, actual output is 42hello\n.

Tested against jackson-dataformat-csv v2.3.2.

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectWriter;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import com.google.common.collect.ImmutableMap;
import org.junit.Test;

import static org.junit.Assert.assertEquals;

public class CsvWriterTest {

    @Test
    public void testWrite_NoNulls() throws JsonProcessingException {
        final CsvSchema.Builder csvSchemaBuilder = new CsvSchema.Builder();
        csvSchemaBuilder.addColumn("timestamp", CsvSchema.ColumnType.STRING);
        csvSchemaBuilder.addColumn("value", CsvSchema.ColumnType.NUMBER);
        csvSchemaBuilder.addColumn("id", CsvSchema.ColumnType.STRING);
        final CsvSchema schema = csvSchemaBuilder.build();
        final ObjectWriter writer = new CsvMapper().writer().withSchema(schema);

        final String string = writer.writeValueAsString(
                ImmutableMap.of("timestamp", "2014-03-10T23:32:47+00:00",
                        "value", 42, "id", "hello"));

        assertEquals("\"2014-03-10T23:32:47+00:00\",42,hello\n", string);
    }

    @Test
    public void testWrite_NullFirstColumn() throws JsonProcessingException {
        final CsvSchema.Builder csvSchemaBuilder = new CsvSchema.Builder();
        csvSchemaBuilder.addColumn("timestamp", CsvSchema.ColumnType.STRING);
        csvSchemaBuilder.addColumn("value", CsvSchema.ColumnType.NUMBER);
        csvSchemaBuilder.addColumn("id", CsvSchema.ColumnType.STRING);
        final CsvSchema schema = csvSchemaBuilder.build();
        final ObjectWriter writer = new CsvMapper().writer().withSchema(schema);

        final String string = writer.writeValueAsString(
                ImmutableMap.of("value", 42, "id", "hello"));

        assertEquals(",42,hello\n", string);
    }

    @Test
    public void testWrite_NullSecondColumn() throws JsonProcessingException {
        final CsvSchema.Builder csvSchemaBuilder = new CsvSchema.Builder();
        csvSchemaBuilder.addColumn("timestamp", CsvSchema.ColumnType.STRING);
        csvSchemaBuilder.addColumn("value", CsvSchema.ColumnType.NUMBER);
        csvSchemaBuilder.addColumn("id", CsvSchema.ColumnType.STRING);
        final CsvSchema schema = csvSchemaBuilder.build();
        final ObjectWriter writer = new CsvMapper().writer().withSchema(schema);

        final String string = writer.writeValueAsString(
                ImmutableMap.of("timestamp", "2014-03-10T23:32:47+00:00",
                        "id", "hello"));

        assertEquals("\"2014-03-10T23:32:47+00:00\",,hello\n", string);
    }

    @Test
    public void testWrite_NullThirdColumn() throws JsonProcessingException {
        final CsvSchema.Builder csvSchemaBuilder = new CsvSchema.Builder();
        csvSchemaBuilder.addColumn("timestamp", CsvSchema.ColumnType.STRING);
        csvSchemaBuilder.addColumn("value", CsvSchema.ColumnType.NUMBER);
        csvSchemaBuilder.addColumn("id", CsvSchema.ColumnType.STRING);
        final CsvSchema schema = csvSchemaBuilder.build();
        final ObjectWriter writer = new CsvMapper().writer().withSchema(schema);

        final String string = writer.writeValueAsString(
                ImmutableMap.of("timestamp", "2014-03-10T23:32:47+00:00",
                        "value", 42));

        assertEquals("\"2014-03-10T23:32:47+00:00\",42\n", string);
    }

}
Magrath pushed a commit to Magrath/jackson-dataformat-csv that referenced this issue Mar 11, 2014
@cowtowncoder
Copy link
Member

Thank you for reporting this, and especially for providing unit tests. It sounds like a bug indeed.

cowtowncoder added a commit that referenced this issue Mar 12, 2014
Add unit test from #33 to failing unit tests folder
@cowtowncoder cowtowncoder added this to the 2.3.0 milestone Mar 21, 2014
cowtowncoder added a commit that referenced this issue Mar 21, 2014
cowtowncoder added a commit that referenced this issue Mar 21, 2014
cowtowncoder added a commit that referenced this issue Mar 21, 2014
@cowtowncoder cowtowncoder modified the milestone: 2.3.0 Mar 21, 2014
@cowtowncoder
Copy link
Member

Thanks again -- there was a bug in code that appends (or not...) column separators, for cases where buffering is needed for out-of-order writes. Fixed it in master (for 2.4), as well as 2.3 branch for 2.3.3.

@Magrath
Copy link
Contributor Author

Magrath commented Mar 24, 2014

Thanks for the quick fix! :)

@cowtowncoder
Copy link
Member

Many thanks for reporting this -- I hope to get CSV module as reliable as possible, and we get there one fixed bug at a time. :)

@hkothari
Copy link

Why is the default behavior to leave out the commas for null columns at the end, ie. in the last assertion we see:

assertEquals("\"2014-03-10T23:32:47+00:00\",42\n", string);

why is it not:

assertEquals("\"2014-03-10T23:32:47+00:00\",42,\n", string);

It seems inconsistent to leave out the last comma if the provided entry has a missing value there because you then end up with rows that have different number of commas.

@Magrath
Copy link
Contributor Author

Magrath commented May 25, 2014

It is quite common in CSV to leave out commas for null columns at the end, it would be nice if the behaviour was configurable though.

@cowtowncoder
Copy link
Member

I think Jackson should not optimize by leaving out nulls here, so I'd consider this a bug. Although a feature to allow omitting such values would be acceptable as well; but default to writing "full" rows.

So @hkothari feel free to file a bug against this behavior.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants