Skip non-CSV head

Some CSV files contain one or more lines of text before the actual CSV data starts. For example, it could look like this:

example.csv
This is an example of a CSV file that contains
three lines before the actual CSV records.

header 1,header 2
value 1,value 2

Strictly speaking, such a file is not a valid CSV file as defined by the CSV specification (RFC 4180).

The main problems with such files are:

  • An exception would be thrown unless the options ignoreDifferentFieldCount() and skipEmptyLines() are set.
  • When working with named fields, the very first line (This is an example of a CSV file that contains) would be interpreted as the actual header line.

FastCSV itself currently does not provide a way to skip non-CSV head lines when reading a CSV file. However, you can skip them by reading the file line by line and handing over to FastCSV only once the actual CSV data starts. This can be done based on a fixed number of lines or by detecting the start of the actual CSV data.
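As a minimal sketch of the fixed-number-of-lines approach, the following uses only JDK classes and no FastCSV API. The class name SkipLeadingLines and the helper skipLines are hypothetical names chosen for illustration. The helper consumes lines with readLine(), so the reader ends up positioned exactly at the first line that was not skipped and could then be handed to a CSV parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class SkipLeadingLines {

    /**
     * Hypothetical helper: consumes up to {@code count} lines from the reader.
     * Returns the number of lines actually skipped, which is smaller than
     * {@code count} if the input ends early.
     */
    public static int skipLines(final BufferedReader reader, final int count) {
        try {
            int skipped = 0;
            while (skipped < count && reader.readLine() != null) {
                skipped++;
            }
            return skipped;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) throws IOException {
        final String data = """
            Free text, line 1
            Free text, line 2

            header1,header2
            foo,bar
            """;

        try (var br = new BufferedReader(new StringReader(data))) {
            // Two text lines plus one empty line precede the CSV header
            skipLines(br, 3);

            // The reader is now positioned at the header line
            System.out.println(br.readLine()); // prints "header1,header2"
        }
    }
}
```

An explicit readLine() loop is used here because the JDK documentation for BufferedReader.lines() gives no guarantee about the reader position after the stream's terminal operation.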

Example

This example demonstrates how to skip non-CSV head lines when reading such a CSV file with FastCSV.

ExampleCsvReaderWithNonCsvAtStart.java
package example;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.List;

import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.NamedCsvRecordHandler;

/**
 * Example for reading CSV data with non-CSV data before the actual CSV header.
 */
public class ExampleCsvReaderWithNonCsvAtStart {

    private static final String DATA = """
        Your CSV file contains some non-CSV data before the actual CSV header?
        And you don't want to misinterpret them as CSV header? No problem!

        header1,header2
        foo,bar
        """;

    public static void main(final String[] args) throws IOException {
        alternative1();
        alternative2();
    }

    private static void alternative1() throws IOException {
        System.out.println("Alternative 1 - ignore specific number of lines");

        final CsvReader.CsvReaderBuilder builder = CsvReader.builder()
            .ignoreDifferentFieldCount(false);

        try (var br = new BufferedReader(new StringReader(DATA))) {
            // ignore the first 3 lines (two text lines and one empty line)
            br.lines().limit(3).forEach(r -> { });

            // the reader now starts at the CSV header line
            builder.ofNamedCsvRecord(br)
                .forEach(System.out::println);
        }
    }

    private static void alternative2() throws IOException {
        System.out.println("Alternative 2 - wait for a specific line");

        final CsvReader.CsvReaderBuilder builder = CsvReader.builder()
            .ignoreDifferentFieldCount(false);

        try (var br = new BufferedReader(new StringReader(DATA))) {
            // Look for the CSV header but read at most 100 lines
            final List<String> header = br.lines()
                .limit(100)
                .filter(l -> l.contains("header1,header2"))
                .findFirst()
                // parse the header line itself as CSV to obtain the field names
                .map(line -> builder.ofCsvRecord(line).stream().findFirst()
                    .orElseThrow(() -> new IllegalStateException("Illegal header: " + line))
                    .getFields())
                .orElseThrow(() -> new IllegalStateException("No CSV header found"));

            // the header line is already consumed, so pass the header explicitly
            builder.build(new NamedCsvRecordHandler(header), br)
                .forEach(System.out::println);
        }
    }

}

You can also find this source code example in the FastCSV GitHub repository.
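
In alternative 2, the search consumes the header line, which is why the header is parsed separately and passed in explicitly. A possible variation, sketched below with JDK classes only, marks the reader before each line and resets it once the header is found. The header line then stays unread, and the reader could be handed on in the same way as in alternative 1. The class name SkipUntilHeader, the helper skipUntil, and the read-ahead limit of 8192 characters are assumptions made for this sketch and are not part of FastCSV.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

public class SkipUntilHeader {

    // Assumed read-ahead limit for mark(); if a line (including its line
    // terminator) is longer than this, reset() may fail with an IOException.
    private static final int READ_AHEAD_LIMIT = 8192;

    /**
     * Hypothetical helper: advances the reader to the START of the first line
     * accepted by {@code isHeader}, reading at most {@code maxLines} lines.
     * Returns true if such a line was found; the line itself remains unread.
     */
    public static boolean skipUntil(final BufferedReader reader,
                                    final Predicate<String> isHeader,
                                    final int maxLines) {
        try {
            for (int i = 0; i < maxLines; i++) {
                // Remember the position before the line is consumed
                reader.mark(READ_AHEAD_LIMIT);
                final String line = reader.readLine();
                if (line == null) {
                    return false; // end of input, no header found
                }
                if (isHeader.test(line)) {
                    // Rewind to the beginning of the header line
                    reader.reset();
                    return true;
                }
            }
            return false;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(final String[] args) throws IOException {
        final String data = """
            Free text, line 1
            Free text, line 2

            header1,header2
            foo,bar
            """;

        try (var br = new BufferedReader(new StringReader(data))) {
            if (!skipUntil(br, l -> l.equals("header1,header2"), 100)) {
                throw new IllegalStateException("No CSV header found");
            }
            // The remaining input starts with the header line
            br.lines().forEach(System.out::println);
        }
    }
}
```

The trade-off is the fixed read-ahead limit: the approach only suits input whose lines before and including the header are known to be reasonably short.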