Understanding CSV files and their handling in ABAPPosted: September 10, 2014 | |
From my blog on SCN
In many ABAP developments, we use CSV files and sometimes there is confusion regarding CSV itself. Is it just a text file with values separated by commas ?
Let’s look at semantics of the various components involved so we have a vocabulary to work with.
Separator: Demarcates between two fields – so it will be a comma ‘,’ for CSVs.
Delimiter: It signifies limits of things, where it begins and ends. e.g. “Test String” has two delimiters, both double quote characters. Many CSVs can have double quotes as delimiters when comma values are to be placed as text.
Terminator : Indicates end of sequence. For CSV, we can think of newline as the terminator.
So if we have confusion about whether commas or double quotes are allowed inside data, looking at the CSV specification:
The de facto standard for CSV is here in case you want to read the full standard.
Definition of the CSV Format
1. Each record is located on a separate line, delimited by a line break.
2. The last record in the file may or may not have an ending line break.
3. There maybe an optional header line appearing as the first line of the file with the same format as normal record lines.
4. Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same
number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the
record must not be followed by a comma.
5. Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes
at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.
6. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes.
7. If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with
another double quote.
In my experience, point 7 is where we get tripped the most. CSV stands as comma separated values leading to the impression that commas are the separator and given that excel doesn’t put commas, it can start to get confusing.
So looking at some examples
10, Vikas , Sydney
Data with separator / delimiter inside them.
“11”, “Vikas”, “Sydney, AU” <– Data containing comma
“12”, “Vikas”, “Sydney, “NSW” AU” <– Data containing comma and quotes in data
Handling in ABAP:
I’m focusing on reading the files as that’s where we face issues. The file can be uploaded from user’s desktop or read from the application server.
1) Write your own code:
This can be easiest to start with but can start to get complicated with time.
Get data as a string, split at comma.
split lv_data at ‘,’ into lw_struct-test1 lw_struct-test2 lw_struct-test3.
a) This won’t work if we have data with separator, terminator or delimiter ( so no commas, double-quotes or newline within data ).
b) The code will need to be updated if the file format changes – say we need to add another field test4. The code then changes to :
split lv_data at ‘,’ into lw_struct-test1 lw_struct-test2 lw_struct-test3 lw_struct-test4.
2) Read the file using KCD_CSV_FILE_TO_INTERN_CONVERT
CALL FUNCTION ‘KCD_CSV_FILE_TO_INTERN_CONVERT’
i_filename = ‘C:\Temp\Upload.csv’
i_separator = ‘,’
e_intern = gt_intern
upload_csv = 1
upload_filetype = 2.
a) The file can be read only from presentation server/ desktop.
b) If a CSV file exists with double quotes, the last field is left with double quotes.
c) In case the file is to be read from application server, we need to read the code inside this FM and write some custom logic.
3) Use RTTI and dynamic programming along with FM RSDS_CONVERT_CSV .
It works but has lots of code . You can have a look at the code in this GIST.
In summary the steps are :
– Get structre of destination table using RTTI
– Create field catalog
– Create a dynamic table for field catalog values
– Create dynamic table lines
– Process Raw CSV data
– Store CSV files into dynamic table
a) Relatively long code leading to results especially if you have to program it from scratch.
a) Code is free from the target table format. If a new field is to be added, just update the structure for table typez_data_tty
4) Use class CL_RSDA_CSV_CONVERTER .
So the call becomes very straight forward – Instantiate the class with the separator and delimiter values. For a normal CSV, leave them as default .
* Instantiate the CSV object
call method cl_rsda_csv_converter=>create
* i_delimiter = C_DEFAULT_DELIMITER
* i_separator = C_DEFAULT_SEPARATOR
r_r_conv = lo_csv
* Process records
loop at lt_upload_data into lv_data.
CALL METHOD lo_csv->csv_to_structure
i_data = lv_data
e_s_data = lw_struct.
That’s It !
a) The code is very small – less chances of us making an error compared to the one in 3) above.
b) The code is decoupled with file structure – we get to keep the benefit from the above point .
c) It can be used for both application server / presentation server files – of course file reading will need to be done before the call.
d) The developer has documented the examples exhaustively in method CSV_TO_STRUCTURE . Big thank to him/her !
e) It’s part of package RSDA which is present in ABAP trial environments as well such as NSP .
If you feel lazy to type the full program, here is the whole source code: