Skip non-CSV head

Some CSV files contain one or more lines of text before the actual CSV data starts. For example, it could look like this:

example.csv
This is an example of a CSV file that contains
three lines before the actual CSV records.

header 1,header 2
value 1,value 2

Strictly speaking, such a file is not a valid CSV file as defined by the CSV specification (RFC 4180).

The main problems with such files are:

  • An exception would be thrown unless the options ignoreDifferentFieldCount() and skipEmptyLines() are set.
  • When working with named fields, the very first line (This is an example of a CSV file that contains) would be interpreted as the actual header line.

FastCSV comes with two features to handle such files:

  • skipLines(int lineCount): Skip a specific number of lines (lineCount) regardless of their content.
  • skipLines(Predicate<String> predicate, int maxLines): Skip lines until a specific line (e.g., the header) is found. Stop skipping after a specific number of lines (maxLines).
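The difference between the two approaches can be illustrated without FastCSV at all. The following is a minimal sketch in plain Java, not FastCSV's implementation: the class SkipHeadSketch and the method countLinesToSkip are hypothetical names, and the sketch assumes that the matching line itself is not skipped, that at most maxLines lines are inspected, and that failing to find a match is treated as an error.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration only; this is not part of FastCSV.
final class SkipHeadSketch {

    // Counts the lines that precede the first line accepted by the predicate.
    // Assumption: at most maxLines lines are inspected before giving up.
    static int countLinesToSkip(final List<String> lines,
                                final Predicate<String> isHeader,
                                final int maxLines) {
        for (int i = 0; i < lines.size() && i < maxLines; i++) {
            if (isHeader.test(lines.get(i))) {
                // The matching line stays in place, so i lines are skipped.
                return i;
            }
        }
        throw new IllegalStateException(
            "No matching line found within " + maxLines + " lines");
    }

    public static void main(final String[] args) {
        final List<String> lines = List.of(
            "This is an example of a CSV file that contains",
            "three lines before the actual CSV records.",
            "",
            "header 1,header 2",
            "value 1,value 2");

        // Fixed count: the caller must already know the head is 3 lines long.
        final int fixedCount = 3;
        System.out.println(lines.subList(fixedCount, lines.size()));

        // Predicate: the head length is discovered by looking for the header.
        final int skipped =
            countLinesToSkip(lines, line -> line.startsWith("header 1"), 10);
        System.out.println(lines.subList(skipped, lines.size()));
    }
}
```

A fixed count is the simpler choice when the head always has the same length. The predicate variant tolerates a head of varying length as long as the header line is recognizable, and the limit keeps a missing header from consuming the whole file.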

Example

This example demonstrates how to skip non-CSV head lines when reading such a CSV file with FastCSV.

ExampleCsvReaderWithNonCsvAtStart.java
package example;

import java.io.IOException;
import java.util.function.Predicate;

import de.siegmar.fastcsv.reader.CsvReader;

/**
 * Example for reading CSV data with non-CSV data before the actual CSV header.
 */
public class ExampleCsvReaderWithNonCsvAtStart {

    // Two lines of text and one empty line precede the header line.
    private static final String DATA = """
        Your CSV file contains some non-CSV data before the actual CSV header?
        And you don't want to (mis)interpret them as CSV header? No problem!

        header1,header2
        foo,bar
        """;

    public static void main(final String[] args) throws IOException {
        alternative1();
        alternative2();
    }

    private static void alternative1() throws IOException {
        System.out.println("Alternative 1 - ignore specific number of lines");

        try (var csv = CsvReader.builder().ofNamedCsvRecord(DATA)) {
            // Skip the first 3 lines
            System.out.println("Skipping the first 3 lines");
            csv.skipLines(3);

            // Read the CSV data
            csv.forEach(System.out::println);
        }
    }

    private static void alternative2() throws IOException {
        System.out.println("Alternative 2 - wait for a specific line");

        final Predicate<String> isHeader = line ->
            line.contains("header1");

        try (var csv = CsvReader.builder().ofNamedCsvRecord(DATA)) {
            // Skip until the header line is found, but not more than 10 lines
            final int actualSkipped = csv.skipLines(isHeader, 10);
            System.out.println("Found header line after skipping " + actualSkipped + " lines");

            // Read the CSV data
            csv.forEach(System.out::println);
        }
    }

}

You can also find this source code example in the FastCSV GitHub repository.