CSV data manipulation
You are encouraged to solve this task according to the task description, using any language you may know.
CSV spreadsheet files are suitable for storing tabular data in a relatively portable way.
The CSV format is flexible but somewhat ill-defined.
For present purposes, authors may assume that the data fields contain no commas, backslashes, or quotation marks.
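The simplifying assumption matters: without it, splitting a line on commas is not enough, because a quoted field may itself contain a comma. As an illustration only (not part of the task statement), a small Python sketch contrasting a naive split with a parser that honors RFC-4180-style quoting:

```python
import csv
import io

line = 'name,"Doe, John",42'   # a quoted field containing a comma

naive = line.split(",")                       # splits inside the quotes
proper = next(csv.reader(io.StringIO(line)))  # respects the quoting

print(naive)   # ['name', '"Doe', ' John"', '42']
print(proper)  # ['name', 'Doe, John', '42']
```

Under the task's assumption (no commas, backslashes, or quotation marks inside fields) the two approaches agree, which is why many of the solutions below split on the delimiter directly.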
- Task
Read a CSV file, change some values and save the changes back to a file.
For this task we will use the following CSV file:
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
Suggestions
- Show how to add a column, headed 'SUM', of the sums of the rows.
If possible, illustrate the use of built-in or standard functions, methods, or libraries, that handle generic CSV files.
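As a minimal sketch of the "standard library" suggestion (illustrative only; file I/O is replaced by in-memory text, and names are placeholders), Python's built-in csv module can read the sample, append a SUM column, and write the result back out:

```python
import csv
import io

def add_sum_column(reader):
    """Yield the header with 'SUM' appended, then each row with its row total."""
    rows = iter(reader)
    yield next(rows) + ["SUM"]
    for row in rows:
        yield row + [str(sum(int(v) for v in row))]

# In-memory text stands in for files; for real files use
# open("data.csv") and open("out.csv", "w", newline="").
source = "C1,C2,C3,C4,C5\n1,5,9,13,17\n2,6,10,14,18\n3,7,11,15,19\n4,8,12,16,20\n"
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(
    add_sum_column(csv.reader(io.StringIO(source))))
print(out.getvalue())
```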
Ada
Ada has no built-in or predefined functions to read or write CSV tables. We therefore define a (very simplistic) package CSV, which allows one to read a row (function Line), to step from column to column (function Next), and to get the item at the current column (function Item):
<lang Ada>package CSV is
   type Row(<>) is tagged private;
   function Line(S: String; Separator: Character := ',') return Row;
   function Next(R: in out Row) return Boolean;
   -- if there is still an item in R, Next advances to it and returns True
   function Item(R: Row) return String;
   -- after calling R.Next i times, this returns the i'th item (if any)
private
   type Row(Length: Natural) is tagged record
      Str: String(1 .. Length);
      Fst: Positive;
      Lst: Natural;
      Nxt: Positive;
      Sep: Character;
   end record;
end CSV;</lang>
The implementation of the package is
<lang Ada>package body CSV is

   function Line(S: String; Separator: Character := ',') return Row is
     (Length => S'Length, Str => S,
      Fst => S'First, Lst => S'Last, Nxt => S'First,
      Sep => Separator);

   function Item(R: Row) return String is (R.Str(R.Fst .. R.Lst));

   function Next(R: in out Row) return Boolean is
      Last: Natural := R.Nxt;
   begin
      R.Fst := R.Nxt;
      while Last <= R.Str'Last and then R.Str(Last) /= R.Sep loop
         -- find Separator
         Last := Last + 1;
      end loop;
      R.Lst := Last - 1;
      R.Nxt := Last + 1;
      return (R.Fst <= R.Str'Last);
   end Next;

end CSV;</lang>
Finally, the main program which uses the package CSV:
<lang Ada>with CSV, Ada.Text_IO; use Ada.Text_IO;

procedure CSV_Data_Manipulation is
   Header: String := Get_Line;
begin
   Put_Line(Header & ", SUM");
   while not End_Of_File loop
      declare
         R: CSV.Row := CSV.Line(Get_Line);
         Sum: Integer := 0;
      begin
         while R.Next loop
            Sum := Sum + Integer'Value(R.Item);
            Put(R.Item & ",");
         end loop;
         Put_Line(Integer'Image(Sum));
      end;
   end loop;
end CSV_Data_Manipulation;</lang>
- Output:
>./csv_data_manipulation < csv_sample.csv
C1,C2,C3,C4,C5, SUM
1,5,9,13,17, 45
2,6,10,14,18, 50
3,7,11,15,19, 55
4,8,12,16,20, 60
ALGOL 68
<lang algol68># count occurrences of a char in string #
PROC char count = (CHAR c, STRING str) INT:
BEGIN
   INT count := 0;
   FOR i TO UPB str DO
      IF c = str[i] THEN count +:= 1 FI
   OD;
   count
END;

# split string on separator #
PROC char split = (STRING str, CHAR sep) FLEX[]STRING :
BEGIN
   INT strlen := UPB str, cnt := 0;
   INT len, p;
   INT start := 1;
   [char count (sep, str) + 1] STRING list;
   WHILE start <= strlen ANDF char in string (sep, p, str[start:]) DO
      p +:= start - 1;
      list[cnt +:= 1] := str[start:p-1];
      start := p + 1
   OD;
   IF cnt = 0 THEN
      list[cnt +:= 1] := str
   ELIF start <= UPB str + 1 THEN
      list[cnt +:= 1] := str[start:]
   FI;
   list
END;

PROC join = ([]STRING words, STRING sep) STRING:
IF UPB words > 0 THEN
   STRING str := words[1];
   FOR i FROM 2 TO UPB words DO
      str +:= sep + words[i]
   OD;
   str
ELSE
   ""
FI;

# read a line from file #
PROC readline = (REF FILE f) STRING:
BEGIN
   STRING line;
   get (f, line);
   new line (f);
   line
END;

# Add one item to tuple #
OP +:= = (REF FLEX[]STRING tuple, STRING item) VOID:
BEGIN
   [UPB tuple+1]STRING new;
   new[:UPB tuple] := tuple;
   new[UPB new] := item;
   tuple := new
END;

# convert signed number TO INT #
OP TOINT = (STRING str) INT:
BEGIN
   INT n := 0, sign := 1;
   FOR i TO UPB str WHILE sign /= 0 DO
      IF is digit (str[i]) THEN
         n := n * 10 + ABS str[i] - ABS "0"
      ELIF i = 1 AND str[i] = "-" THEN
         sign := -1
      ELIF i /= 1 OR str[i] /= "+" THEN
         sign := 0
      FI
   OD;
   n * sign
END;

OP STR = (INT i) STRING: whole (i,0);

# The main program #
FILE foo;
open (foo, "CSV_data_manipulation.data", stand in channel);
FLEX[0]STRING header := char split (readline (foo), ",");
header +:= "SUM";
print ((join (header, ","), new line));
WHILE NOT end of file (foo) DO
   FLEX[0]STRING fields := char split (readline (foo), ",");
   INT sum := 0;
   FOR i TO UPB fields DO sum +:= TOINT fields[i] OD;
   fields +:= STR sum;
   print ((join (fields, ","), new line))
OD;
close (foo)</lang>
- Output:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
AutoHotkey
<lang AutoHotkey>Loop, Read, Data.csv
{
    i := A_Index
    Loop, Parse, A_LoopReadLine, CSV
        Output .= (i=A_Index && i!=1 ? A_LoopField**2 : A_LoopField) (A_Index=5 ? "`n" : ",")
}
FileAppend, %Output%, NewData.csv</lang>
Output:
C1,C2,C3,C4,C5
1,25,9,13,17
2,6,100,14,18
3,7,11,225,19
4,8,12,16,400
AWK
Adds a column of sums to a CSV table.
<lang AWK>#!/usr/bin/awk -f
BEGIN { FS = OFS = "," }
NR==1 {
    print $0, "SUM"
    next
}
{
    sum = 0
    for (i=1; i<=NF; i++) { sum += $i }
    print $0, sum
}</lang>
awk -f csv_data_manipulation.awk data.csv
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
C
<lang c>#define TITLE "CSV data manipulation"
#define URL "http://rosettacode.org/wiki/CSV_data_manipulation"

#define _GNU_SOURCE
#define bool int
#include <stdio.h>
#include <stdlib.h> /* malloc... */
#include <string.h> /* strtok... */
#include <ctype.h>
#include <errno.h>
/**
 * How to read a CSV file?
 */
typedef struct {
char * delim;
unsigned int rows;
unsigned int cols;
char ** table;
} CSV;
/**
 * Utility function to trim whitespace from left & right of a string
 */
int trim(char ** str)
{
    int trimmed = 0;
    int n;
    int len;

    len = strlen(*str);
    n = len - 1;
    /* from right */
    while ((n >= 0) && isspace((*str)[n])) {
        (*str)[n] = '\0';
        trimmed += 1;
        n--;
    }

    /* from left */
    n = 0;
    while ((n < len) && (isspace((*str)[0]))) {
        (*str)[0] = '\0';
        *str = (*str) + 1;
        trimmed += 1;
        n++;
    }
    return trimmed;
}
/**
 * De-allocate a CSV structure
 */
int csv_destroy(CSV * csv)
{
    if (csv == NULL) { return 0; }
    if (csv->table != NULL) { free(csv->table); }
    if (csv->delim != NULL) { free(csv->delim); }
    free(csv);
    return 0;
}
/**
 * Allocate memory for a CSV structure
 */
CSV * csv_create(unsigned int cols, unsigned int rows)
{
    CSV * csv;

    csv = malloc(sizeof(CSV));
    csv->rows = rows;
    csv->cols = cols;
    csv->delim = strdup(",");

    csv->table = malloc(sizeof(char *) * cols * rows);
    if (csv->table == NULL) { goto error; }

    memset(csv->table, 0, sizeof(char *) * cols * rows);

    return csv;

error:
    csv_destroy(csv);
    return NULL;
}
/**
 * Get value in CSV table at COL, ROW
 */
char * csv_get(CSV * csv, unsigned int col, unsigned int row)
{
    unsigned int idx;
    idx = col + (row * csv->cols);
    return csv->table[idx];
}

/**
 * Set value in CSV table at COL, ROW
 */
int csv_set(CSV * csv, unsigned int col, unsigned int row, char * value)
{
    unsigned int idx;
    idx = col + (row * csv->cols);
    csv->table[idx] = value;
    return 0;
}
void csv_display(CSV * csv)
{
    int row, col;
    char * content;
    if ((csv->rows == 0) || (csv->cols == 0)) {
        printf("[Empty table]\n");
        return;
    }

    printf("\n[Table cols=%d rows=%d]\n", csv->cols, csv->rows);
    for (row=0; row<csv->rows; row++) {
        printf("[|");
        for (col=0; col<csv->cols; col++) {
            content = csv_get(csv, col, row);
            printf("%s\t|", content);
        }
        printf("]\n");
    }
    printf("\n");
}
/**
 * Resize a CSV table
 */
int csv_resize(CSV * old_csv, unsigned int new_cols, unsigned int new_rows)
{
    unsigned int cur_col, cur_row, max_cols, max_rows;
    CSV * new_csv;
    char * content;
    bool in_old, in_new;

    /* Build a new (fake) csv */
    new_csv = csv_create(new_cols, new_rows);
    if (new_csv == NULL) { goto error; }

    new_csv->rows = new_rows;
    new_csv->cols = new_cols;
    max_cols = (new_cols > old_csv->cols) ? new_cols : old_csv->cols;
    max_rows = (new_rows > old_csv->rows) ? new_rows : old_csv->rows;

    for (cur_col=0; cur_col<max_cols; cur_col++) {
        for (cur_row=0; cur_row<max_rows; cur_row++) {
            in_old = (cur_col < old_csv->cols) && (cur_row < old_csv->rows);
            in_new = (cur_col < new_csv->cols) && (cur_row < new_csv->rows);
            if (in_old && in_new) {
                /* re-link data */
                content = csv_get(old_csv, cur_col, cur_row);
                csv_set(new_csv, cur_col, cur_row, content);
            } else if (in_old) {
                /* destroy data */
                content = csv_get(old_csv, cur_col, cur_row);
                free(content);
            } else {
                /* skip */
            }
        }
    }
    free(old_csv->table);
    old_csv->rows = new_rows;
    old_csv->cols = new_cols;
    old_csv->table = new_csv->table;
    new_csv->table = NULL;
    csv_destroy(new_csv);
    return 0;

error:
    printf("Unable to resize CSV table: error %d - %s\n", errno, strerror(errno));
    return -1;
}
/**
 * Open a CSV file and load its content into the provided CSV structure
 */
int csv_open(CSV * csv, char * filename)
{
    FILE * fp;
    unsigned int m_rows;
    unsigned int m_cols, cols;
    char line[2048];
    char * lineptr;
    char * token;

    fp = fopen(filename, "r");
    if (fp == NULL) { goto error; }

    m_rows = 0;
    m_cols = 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        m_rows += 1;
        cols = 0;
        lineptr = line;
        while ((token = strtok(lineptr, csv->delim)) != NULL) {
            lineptr = NULL;
            trim(&token);
            cols += 1;
            if (cols > m_cols) { m_cols = cols; }
            csv_resize(csv, m_cols, m_rows);
            csv_set(csv, cols-1, m_rows-1, strdup(token));
        }
    }

    fclose(fp);
    csv->rows = m_rows;
    csv->cols = m_cols;
    return 0;

error: /* fp is NULL here, so there is nothing to close */
    printf("Unable to open %s for reading.", filename);
    return -1;
}
/**
 * Open a CSV file and save the CSV structure content into it
 */
int csv_save(CSV * csv, char * filename)
{
    FILE * fp;
    int row, col;
    char * content;

    fp = fopen(filename, "w");
    for (row=0; row<csv->rows; row++) {
        for (col=0; col<csv->cols; col++) {
            content = csv_get(csv, col, row);
            fprintf(fp, "%s%s", content,
                    ((col == csv->cols-1) ? "" : csv->delim));
        }
        fprintf(fp, "\n");
    }

    fclose(fp);
    return 0;
}
/**
 * Test
 */
int main(int argc, char ** argv)
{
    CSV * csv;

    printf("%s\n%s\n\n", TITLE, URL);

    csv = csv_create(0, 0);
    csv_open(csv, "fixtures/csv-data-manipulation.csv");
    csv_display(csv);

    csv_set(csv, 0, 0, "Column0");
    csv_set(csv, 1, 1, "100");
    csv_set(csv, 2, 2, "200");
    csv_set(csv, 3, 3, "300");
    csv_set(csv, 4, 4, "400");
    csv_display(csv);

    csv_save(csv, "tmp/csv-data-manipulation.result.csv");
    csv_destroy(csv);
    return 0;
}</lang>
- Output (in tmp/csv-data-manipulation.result.csv):
Column0,C2,C3,C4,C5
1,100,9,13,17
2,6,200,14,18
3,7,11,300,19
4,8,12,16,400
C++
<lang cpp>#include <map>
#include <vector>
#include <iostream>
#include <fstream>
#include <utility>
#include <functional>
#include <string>
#include <sstream>
#include <algorithm>
#include <cctype>
class CSV
{
public:
    CSV(void) : m_nCols( 0 ), m_nRows( 0 ) {}

    bool open( const char* filename, char delim = ',' )
    {
        std::ifstream file( filename );
        clear();
        if ( file.is_open() )
        {
            open( file, delim );
            return true;
        }
        return false;
    }

    void open( std::istream& istream, char delim = ',' )
    {
        std::string line;
        clear();
        while ( std::getline( istream, line ) )
        {
            unsigned int nCol = 0;
            std::istringstream lineStream(line);
            std::string cell;
            while ( std::getline( lineStream, cell, delim ) )
            {
                m_oData[std::make_pair( nCol, m_nRows )] = trim( cell );
                nCol++;
            }
            m_nCols = std::max( m_nCols, nCol );
            m_nRows++;
        }
    }

    bool save( const char* pFile, char delim = ',' )
    {
        std::ofstream ofile( pFile );
        if ( ofile.is_open() )
        {
            save( ofile, delim );
            return true;
        }
        return false;
    }

    void save( std::ostream& ostream, char delim = ',' )
    {
        for ( unsigned int nRow = 0; nRow < m_nRows; nRow++ )
        {
            for ( unsigned int nCol = 0; nCol < m_nCols; nCol++ )
            {
                ostream << trim( m_oData[std::make_pair( nCol, nRow )] );
                if ( (nCol+1) < m_nCols )
                    ostream << delim;
                else
                    ostream << std::endl;
            }
        }
    }

    void clear()
    {
        m_oData.clear();
        m_nRows = m_nCols = 0;
    }

    std::string& operator()( unsigned int nCol, unsigned int nRow )
    {
        m_nCols = std::max( m_nCols, nCol+1 );
        m_nRows = std::max( m_nRows, nRow+1 );
        return m_oData[std::make_pair(nCol, nRow)];
    }

    inline unsigned int GetRows() { return m_nRows; }
    inline unsigned int GetCols() { return m_nCols; }

private:
    // trim leading and trailing whitespace from the string
    inline std::string& trim( std::string& s )
    {
        s.erase( s.begin(), std::find_if( s.begin(), s.end(),
                 std::not1( std::ptr_fun<int, int>( std::isspace ) ) ) );
        s.erase( std::find_if( s.rbegin(), s.rend(),
                 std::not1( std::ptr_fun<int, int>( std::isspace ) ) ).base(), s.end() );
        return s;
    }

    std::map<std::pair<unsigned int, unsigned int>, std::string> m_oData;
    unsigned int m_nCols;
    unsigned int m_nRows;
};

int main()
{
    CSV oCSV;

    oCSV.open( "test_in.csv" );
    oCSV( 0, 0 ) = "Column0";
    oCSV( 1, 1 ) = "100";
    oCSV( 2, 2 ) = "200";
    oCSV( 3, 3 ) = "300";
    oCSV( 4, 4 ) = "400";
    oCSV.save( "test_out.csv" );
    return 0;
}</lang>
- Output (in test_out.csv):
Column0,C2,C3,C4,C5
1,100,9,13,17
2,6,200,14,18
3,7,11,300,19
4,8,12,16,400
C#
<lang csharp>using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace CSV
{
    class CSV
    {
        private Dictionary<Tuple<int, int>, string> _data;
        private int _rows;
        private int _cols;

        public int Rows { get { return _rows; } }
        public int Cols { get { return _cols; } }

        public CSV()
        {
            Clear();
        }

        public void Clear()
        {
            _rows = 0;
            _cols = 0;
            _data = new Dictionary<Tuple<int, int>, string>();
        }

        public void Open(StreamReader stream, char delim = ',')
        {
            string line;
            int col = 0;
            int row = 0;

            Clear();
            while ((line = stream.ReadLine()) != null)
            {
                if (line.Length > 0)
                {
                    string[] values = line.Split(delim);
                    col = 0;
                    foreach (var value in values)
                    {
                        this[col, row] = value;
                        col++;
                    }
                    row++;
                }
            }
            stream.Close();
        }

        public void Save(StreamWriter stream, char delim = ',')
        {
            for (int row = 0; row < _rows; row++)
            {
                for (int col = 0; col < _cols; col++)
                {
                    stream.Write(this[col, row]);
                    if (col < _cols - 1)
                    {
                        stream.Write(delim);
                    }
                }
                stream.WriteLine();
            }
            stream.Flush();
            stream.Close();
        }

        public string this[int col, int row]
        {
            get
            {
                try { return _data[new Tuple<int, int>(col, row)]; }
                catch { return ""; }
            }
            set
            {
                _data[new Tuple<int, int>(col, row)] = value.ToString().Trim();
                _rows = Math.Max(_rows, row + 1);
                _cols = Math.Max(_cols, col + 1);
            }
        }

        static void Main(string[] args)
        {
            CSV csv = new CSV();

            csv.Open(new StreamReader(@"test_in.csv"));
            csv[0, 0] = "Column0";
            csv[1, 1] = "100";
            csv[2, 2] = "200";
            csv[3, 3] = "300";
            csv[4, 4] = "400";
            csv.Save(new StreamWriter(@"test_out.csv"));
        }
    }
}</lang>
- Output (in test_out.csv):
Column0,C2,C3,C4,C5
1,100,9,13,17
2,6,200,14,18
3,7,11,300,19
4,8,12,16,400
Clojure
<lang clojure>(require '[clojure.data.csv :as csv]
         '[clojure.java.io :as io])

(defn add-sum-column [coll]
  (let [titles (first coll)
        values (rest coll)]
    (cons (conj titles "SUM")
          (map #(conj % (reduce + (map read-string %))) values))))

(with-open [in-file (io/reader "test_in.csv")]
  (doall
    (let [out-data (add-sum-column (csv/read-csv in-file))]
      (with-open [out-file (io/writer "test_out.csv")]
        (csv/write-csv out-file out-data)))))</lang>
- Output (in test_out.csv):
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
Common Lisp
Only built-in functions from the standard are used. There are widespread libraries for working with CSV (which can be easily loaded via Quicklisp). As another example, no split-string function is used, even though one is available in some implementations and in many compatibility layers and libraries. Instead, each line of the CSV file is formatted into an s-expression that the reader can understand. It also deserves mention that Common Lisp has built-in arrays, but for so little data it is easier to use nested lists.
<lang lisp>(defun csvfile-to-nested-list (filename delim-char)
  "Reads the csv to a nested list, where each sublist represents a line."
  (with-open-file (input filename)
    (loop :for line := (read-line input nil)
          :while line
          :collect (read-from-string
                    (substitute #\SPACE delim-char
                                (format nil "(~a)~%" line))))))

(defun sublist-sum-list (nested-list)
  "Return a list with the sum of each list of numbers in a nested list."
  (mapcar (lambda (l) (if (every #'numberp l) (reduce #'+ l) nil))
          nested-list))

(defun append-each-sublist (nested-list1 nested-list2)
  "Horizontally append the sublists in two nested lists. Used to add columns."
  (mapcar #'append nested-list1 nested-list2))

(defun nested-list-to-csv (nested-list delim-string)
  "Converts the nested list back into a csv-formatted string."
  (format nil
          (concatenate 'string "~{~{~2,'0d" delim-string "~}~%~}")
          nested-list))

(defun main ()
  (let* ((csvfile-path #p"projekte/common-lisp/example_comma_csv.txt")
         (result-path #p"results.txt")
         (data-list (csvfile-to-nested-list csvfile-path #\,))
         (list-of-sums (sublist-sum-list data-list))
         (result-header "C1,C2,C3,C4,C5,SUM"))
    (setf data-list              ; add list of sums as additional column
          (rest                  ; remove old header
           (append-each-sublist data-list (mapcar #'list list-of-sums))))
    ;; write to output-file
    (with-open-file (output result-path :direction :output :if-exists :supersede)
      (format output "~a~%~a" result-header
              (nested-list-to-csv data-list ",")))))

(main)</lang>
- Output (in results.txt):
C1,C2,C3,C4,C5,SUM
01,05,09,13,17,45,
02,06,10,14,18,50,
03,07,11,15,19,55,
04,08,12,16,20,60,
D
<lang d>void main() {
    import std.stdio, std.csv, std.file, std.typecons, std.array,
           std.algorithm, std.conv, std.range;

    auto rows = "csv_data_in.csv".File.byLine;
    auto fout = "csv_data_out.csv".File("w");
    fout.writeln(rows.front);
    fout.writef("%(%(%d,%)\n%)", rows.dropOne
                .map!(r => r.csvReader!int.front.map!(x => x + 1)));
}</lang>
- Output (in csv_data_out.csv):
C1,C2,C3,C4,C5
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
5,9,13,17,21
EchoLisp
<lang scheme>
;; CSV -> LISTS
(define (csv->row line)
  (map (lambda (x) (or (string->number x) x)) (string-split line ",")))
(define (csv->table csv)
  (map csv->row (string-split csv "\n")))

;; LISTS -> CSV
(define (row->csv row)
  (string-join row ","))
(define (table->csv header rows)
  (string-join
   (cons (row->csv header)
         (for/list ((row rows)) (row->csv row)))
   "\n"))

(define (task file)
  (let* ((table (csv->table file))
         (header (first table))
         (rows (rest table)))
    (table->csv
     (append header "SUM") ;; add last column
     (for/list ((row rows)) (append row (apply + row))))))
</lang>
- Output:
<lang scheme>(define file.csv #<<
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
>>#)

(task file.csv)
    → "C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60"
</lang>
ECL
<lang>// Assumes a CSV file exists and has been sprayed to a Thor cluster
MyFileLayout := RECORD
  STRING Field1;
  STRING Field2;
  STRING Field3;
  STRING Field4;
  STRING Field5;
END;

MyDataset := DATASET('~Rosetta::myCSVFile', MyFileLayout, CSV(SEPARATOR(',')));

MyFileLayout Appended(MyFileLayout pInput) := TRANSFORM
  SELF.Field1 := pInput.Field1 + 'x';
  SELF.Field2 := pInput.Field2 + 'y';
  SELF.Field3 := pInput.Field3 + 'z';
  SELF.Field4 := pInput.Field4 + 'a';
  SELF.Field5 := pInput.Field5 + 'b';
END;

MyNewDataset := PROJECT(MyDataset, Appended(LEFT));
OUTPUT(myNewDataset,,'~Rosetta::myNewCSVFile', CSV, OVERWRITE);</lang>
- Output:
(contents of Rosetta::myNewCSVFile)
C1x,C2y,C3z,C4a,C5b
1x,5y,9z,13a,17b
2x,6y,10z,14a,18b
3x,7y,11z,15a,19b
4x,8y,12z,16a,20b
Erlang
<lang Erlang>
-module( csv_data ).

-export( [change/2, from_binary/1, from_file/1, into_file/2, task/0] ).

change( CSV, Changes ) -> lists:foldl( fun change_foldl/2, CSV, Changes ).

from_binary( Binary ) ->
    Lines = binary:split( Binary, <<"\n">>, [global] ),
    [binary:split(X, <<",">>, [global]) || X <- Lines].

from_file( Name ) ->
    {ok, Binary} = file:read_file( Name ),
    from_binary( Binary ).

into_file( Name, CSV ) ->
    Binaries = join_binaries( [join_binaries(X, <<",">>) || X <- CSV], <<"\n">> ),
    file:write_file( Name, Binaries ).

task() ->
    CSV = from_file( "CSV_file.in" ),
    New_CSV = change( CSV, [{2,3,<<"23">>}, {4,4,<<"44">>}] ),
    into_file( "CSV_file.out", New_CSV ).

change_foldl( {Row_number, Column_number, New}, Acc ) ->
    {Row_befores, [Row_columns | Row_afters]} = split( Row_number, Acc ),
    {Column_befores, [_Old | Column_afters]} = split( Column_number, Row_columns ),
    Row_befores ++ [Column_befores ++ [New | Column_afters]] ++ Row_afters.

join_binaries( Binaries, Binary ) ->
    [_Last | Rest] = lists:reverse( lists:flatten([[X, Binary] || X <- Binaries]) ),
    lists:reverse( Rest ).

split( 1, List ) -> {[], List};
split( N, List ) -> lists:split( N - 1, List ).
</lang>
- Output:
Contents of "CSV_file.out":
C1,C2,C3,C4,C5
1,5,23,13,17
2,6,10,14,18
3,7,11,44,19
4,8,12,16,20
Euphoria
<lang euphoria>--- Read CSV file and add columns headed with 'SUM' ---
with trace
-- trace(0)

include get.e
include std/text.e

function split(sequence s, integer c)
    sequence removables = " \t\n\r\x05\u0234\" "
    sequence out
    integer first, delim
    out = {}
    first = 1
    while first <= length(s) do
        delim = find_from(c, s, first)
        if delim = 0 then
            delim = length(s) + 1
        end if
        out = append(out, trim(s[first..delim-1], removables))
        first = delim + 1
    end while
    return out
end function

procedure main()
    integer fn                -- the file number
    integer fn2               -- the output file number
    integer e                 -- the number of lines read
    object line               -- the next line from the file
    sequence data = {}        -- parsed csv data row
    sequence headerNames = {} -- array saving column names
    atom sum = 0.0            -- sum for each row
    sequence var              -- holds numerical data read

    -- First we try to open the file called "data.csv".
    fn = open("data.csv", "r")
    if fn = -1 then
        puts(1, "Can't open data.csv\n")
        -- abort();
    end if

    -- Then we create an output file for processed data.
    fn2 = open("newdata.csv", "w")
    if fn2 = -1 then
        puts(1, "Can't create newdata.csv\n")
    end if

    -- By successfully opening the file we have established that
    -- the file exists, and open() gives us a file number (or "handle")
    -- that we can use to perform operations on the file.
    e = 1
    while 1 do
        line = gets(fn)
        if atom(line) then
            exit
        end if
        data = split(line, ',')
        if (e = 1) then
            -- Save the header labels and write them to output file.
            headerNames = data
            for i = 1 to length(headerNames) do
                printf(fn2, "%s,", {headerNames[i]})
            end for
            printf(fn2, "SUM\n")
        end if
        -- Run a sum for the numerical data.
        if (e >= 2) then
            for i = 1 to length(data) do
                printf(fn2, "%s,", {data[i]})
                var = value(data[i])
                if var[1] = 0 then
                    -- data read is numerical; add to sum
                    sum = sum + var[2]
                end if
            end for
            printf(fn2, "%g\n", {sum})
            sum = 0.0
        end if
        e = e + 1
    end while
    close(fn)
    close(fn2)
end procedure

main()</lang>
- Output:
Contents of "newdata.csv":
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
F#
<lang fsharp>open System
open System.Collections.Generic
open System.IO

type Csv = class
    val private _data : Dictionary<int * int, string>
    val mutable private _rows : int
    val mutable private _cols : int

    new () = { _data = new Dictionary<int * int, string>(); _rows = 0; _cols = 0 }

    member this.Rows with get () = this._rows
    member this.Cols with get () = this._cols

    member this.Item
        with get (row, col) =
            match this._data.TryGetValue((row, col)) with
            | (true, s) -> s
            | _ -> ""
        and set (row, col) value =
            this._data.[(row, col)] <- value
            this._rows <- Math.Max(this._rows, row + 1)
            this._cols <- Math.Max(this._cols, col + 1)

    member this.Open(stream : StreamReader, delim : char) =
        this._data.Clear()
        this._cols <- 0
        this._rows <- 0
        Seq.unfold (fun (s : StreamReader) ->
            if s.EndOfStream then None else Some(s.ReadLine(), s)) stream
        |> Seq.iteri (fun i line ->
            line.Split(delim) |> Array.iteri (fun j item -> this.[i, j] <- item))
        stream.Close()

    member this.Open(stream : StreamReader) = this.Open(stream, ',')

    member this.Save(writer : TextWriter, delim : char) =
        for row = 0 to this._rows - 1 do
            String.Join(delim.ToString(),
                [0 .. this._cols - 1]
                |> List.map (fun col -> this.[row, col])
                |> List.toArray)
            |> writer.WriteLine

    member this.Save(writer : TextWriter) = this.Save(writer, ',')
end

[<EntryPoint>]
let main argv =
    let csv = new Csv()
    csv.Open(new StreamReader(argv.[0]))
    csv.[0, 0] <- "Column0"
    csv.[1, 1] <- "100"
    csv.[2, 2] <- "200"
    csv.[3, 3] <- "300"
    csv.[4, 4] <- "400"
    csv.Save(Console.Out)
    0</lang>
- Output:
>Rosetta.exe test_in.csv
Column0,C2,C3,C4,C5
1,100,9,13,17
2,6,200,14,18
3,7,11,300,19
4,8,12,16,400
Forth
<lang forth>\ csvsum.fs  Add a new column named SUM that contains sums from rows of CommaSeparatedValues
\ USAGE:
\   gforth-fast csvsum.fs -e "stdout stdin csvsum bye" <input.csv >output.csv

CHAR , CONSTANT SEPARATOR
3 CONSTANT DECIMALS
1E1 DECIMALS S>D D>F F** FCONSTANT FSCALE

: colsum ( ca u -- ; F: -- sum ; return SUM from CSV-string )
  0E0 OVER SWAP BOUNDS ?DO ( a )
    I C@ SEPARATOR = IF ( a )
      I TUCK OVER - >FLOAT IF F+ THEN 1+
    THEN
  LOOP DROP ;

: f>string ( -- ca u ; F: x -- )
  FSCALE F* F>D TUCK DABS
  <# DECIMALS 0 DO # LOOP [CHAR] . HOLD #S ROT SIGN #> ;

: rowC!+ ( offs char -- u+1 ; store CHAR at here+OFFS, increment offset )
  OVER HERE + C! 1+ ;

: row$!+ ( offs ca u -- offs+u ; store STRING at here+OFFS, update offset )
  ROT 2DUP + >R HERE + SWAP MOVE R> ;

\ If the program is run with the '-m 4G' option, we have practically 4G to store a row
: csvsum ( fo fi -- ; write into FILEID-OUTPUT processed input from FILEID-INPUT )
  2DUP HERE UNUSED ROT READ-LINE THROW IF ( fo fi fo u )
    HERE SWAP ( fo fi fo ca u )
    SEPARATOR rowC!+ s\" SUM" row$!+ ( fo fi fo ca u' )
    ROT WRITE-LINE THROW
    BEGIN ( fo fi )
      2DUP HERE UNUSED ROT READ-LINE THROW
    WHILE ( fo fi fo u )
      HERE SWAP ( fo fi fo ca u )
      SEPARATOR rowC!+
      HERE OVER colsum f>string ( fo fi fo ca u ca' u' )
      row$!+ ( fo fi fo ca u'+u )
      ROT WRITE-LINE THROW
    REPEAT
  THEN 2DROP 2DROP ;</lang>
- Output:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45.000
2,6,10,14,18,50.000
3,7,11,15,19,55.000
4,8,12,16,20,60.000
Fortran
This task in general soon becomes messy, especially when texts instead of numbers are amongst the input possibilities. Primarily, what is the upper bound on the number of values to a line? Will all lines have the same number of values; if not, is this an error? Do the data have special types (such as calendar dates?) or are all simple numbers? Very soon it becomes better to read a record as text and scan the text: problems can then be dealt with and good complaints made. So, how long is the longest record? Only the B6700 filesystem would make available a file's maximum record length: others offer no help. I have confronted a singularly witless format for supplying electricity data that would write up to an entire year's worth of half-hourly values to one line though it might be used to write just a few day's worth of data also. The header line specified the date and time slot for each column as Country,Island,Node,MEAN Energy,01AUG2010 Daily ENERGY,01AUG2010 01,01AUG2010 02,01AUG2010 03, etc.
so all-in-all it was less trouble to specify CHARACTER*246810 for the input record scratchpad so as not to have to struggle with piecemeal input. More generally, I have a routine for reading this sort of data that, provoked by multiple methods of specifying dates and times, is 1,500 lines long...
Rather than present a monster, what follows is a simple programme that employs the Fortran free-format (or "list-directed") input protocol, applied to the example data. Each READ statement reads an arbitrary number of input records until its input list is satisfied. In this case, notably, every line has five values, and five values are requested in each READ. Instead of numbers the first line presents texts that are column headings and, happily, the free-format input reader will parse the example headings as desired, provided that the read statement's input list is of CHARACTER type, not a number type. Commas are used as delimiters, and the READ statement accepts that, but also spaces; fortunately no spaces appear within the heading texts. This would be the first reason for writing a data-scanning system. It also accepts quoted texts (and also, allows hexadecimal/octal/binary values thereby) but this example does not require them. On the other hand, the appearance of a / in an input line marks the end of reading for that line and if the input list demands more data, an additional input record will be sought. Thus, if the input data includes dates expressed as say 6/12/2015 there will be difficulty, and another cause for devising a data-scanning system. The ISO standard of 2015-12-6 would also cause trouble as that would manifest as three numbers, if accepted at all, there being no delimiters (spaces or commas) in the sequence. Thus the / character marks the start of an in-line comment (as ! does in later-style Fortran source code) and is treated as end-of-line. The free-format input scan also recognises boolean values ("T" or "F", without the quotes) and complex numbers presented as (real,imag)
pairs, and also repetition counts as in 24*6
meaning twenty-four occurrences of the value six, alas not using the @-symbol. Opportunities escalate still further with NAMELIST style I/O (allowing name = value(s)
), so considerable flexibility is available. However, the input statement's list of variables must have the correct type for the input data (or vice-versa) and the counting of the number of values requested and the number supplied must be correct too, lest everything get out of step. These are further reasons for writing a data-scanning system, but for the simple example, the standard features suffice. The idea is that these features suffice for getting data into a programme without much trouble, so that programming time can be devoted to what is to be done with the data, not how they are to be read.
So much for the header. The data lines can be read as numbers without difficulty, as they are numbers, so all that is necessary is that the input list be of type number. Chosen to be integer here, to match the example.
Output presents some worries, the usual one being how long is a piece of string? Unlike a number, which can be read as input with arbitrary size (0007 or 7, or 7.000000, etc.) character sequences are not translated into a number. Until there is provision for reading a string of arbitrary length, which will be stored by the READ statement as a string of an arbitrary length as encountered, an upper bound must be specified, and for this problem, six will do. For output, the special function TRIM removes trailing spaces, but is an intrinsic function only for F90 and later. Alas, it does not work for an array span, only a specific element at a time, so an implicit DO-loop is needed in the WRITE statement. Text literals in a FORMAT statement are rolled forth until such time as there is an edit code for a list item for which the output list is already exhausted. This would normally mean that the last datum written would be followed by a comma (from the text literal ",") however the mysterious colon in the FORMAT statement (instead of the more usual comma) means that text literals following it are not to be rolled forth unless a value from the output list is still waiting for its opportunity. One could instead use 666(",",I0) to omit the trailing comma but now there would be a leading comma. So, ... I0,666(",",I0) would avoid the leading comma, except if there was only one value to send forth, there would still be a trailing comma...
Another F90 feature is the SUM function that adds the elements of an array span. Even though for the example the first column looks rather like a record number, all five columns will be added, but otherwise the statement would be SUM(X(2:N)). Other modifications can be made without much difficulty, if desired. The output format is I0 rather than say I2, as it provides only the needed number of characters to present the integer's value. There is no corresponding F format code, and free-format output would roll out many spaces as padding in case of large numbers, that are not present here. It would be needed for a more general solution, but for this example, I0 will do.
<lang Fortran>C Copies a file with 5 comma-separated values to a line, appending a column holding their sum.
      INTEGER N                 !Instead of littering the source with "5"
      PARAMETER (N = 5)         !Provide some provenance.
      CHARACTER*6 HEAD(N)       !A perfect size?
      INTEGER X(N)              !Integers suffice.
      INTEGER LINPR,IN          !I/O unit numbers.
      LINPR = 6                 !Standard output via this unit number.
      IN = 10                   !Some unit number for the input file.
      OPEN (IN,FILE="CSVtest.csv",STATUS="OLD",ACTION="READ")  !For formatted input.

      READ (IN,*) HEAD          !The first line has texts as column headings.
      WRITE (LINPR,1) (TRIM(HEAD(I)), I = 1,N),"Sum"  !Append a "Sum" column.
    1 FORMAT (666(A:","))       !The : sez "stop if no list element awaits".
    2 READ (IN,*,END = 10) X    !Read a line's worth of numbers, separated by commas or spaces.
      WRITE (LINPR,3) X,SUM(X)  !Write, with a total appended.
    3 FORMAT (666(I0:","))      !I0 editing uses only as many columns as are needed.
      GO TO 2                   !Do it again.

   10 CLOSE (IN)                !All done.
      END                       !That's all.
</lang>
Output could of course be written to a disc file instead of a screen, but here it is:
C1,C2,C3,C4,C5,Sum
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
FreeBASIC
<lang freebasic>' FB 1.05.0 Win64
Open "manip.csv" For Input As #1 ' existing CSV file Open "manip2.csv" For Output As #2 ' new CSV file for writing changed data
Dim header As String Line Input #1, header header += ",SUM" Print #2, header
Dim As Integer c1, c2, c3, c4, c5, sum
While Not Eof(1)
Input #1, c1, c2, c3, c4, c5 sum = c1 + c2 + c3 + c4 + c5 Write #2, c1, c2, c3, c4, c5, sum
Wend
Close #1 Close #2</lang>
- Output:
' contents of manip2.csv C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
FunL
<lang funl>import io.{lines, PrintWriter}
data Table( header, rows )
def read( file ) =
l = lines( file ) def next = vector( l.next().split(',') ) if l.isEmpty() then return Table( vector(), [] ) header = next() rows = seq() while l.hasNext() rows += next() Table( header, rows.toList() )
def write( table, out ) =
w = if out is String then PrintWriter( out ) else out
w.println( table.header.mkString(',') ) for r <- table.rows w.println( r.mkString(',') ) if out is String w.close()
def updateRow( header, row, updates ) =
r = dict( (header(i), row(i)) | i <- 0:header.length() ) updates( r ) vector( r(f) | f <- header )
def update( table, updates ) =
Table( table.header, (updateRow(table.header, r, updates) | r <- table.rows).toList() )
def addColumn( table, column, updates ) =
Table( table.header + [column], (updateRow(table.header + [column], r + [null], updates) | r <- table.rows).toList() )
t = addColumn( read('test.csv'), 'SUM', r -> r('SUM') = sum(int(v) | (_, v) <- r if v != null) ) write( t, 'test_out.csv' ) write( t, System.out )</lang>
- Output:
C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
Go
<lang go>package main
import ( "encoding/csv" "log" "os" "strconv" )
func main() { rows := readSample() appendSum(rows) writeChanges(rows) }
func readSample() [][]string { f, err := os.Open("sample.csv") if err != nil { log.Fatal(err) } rows, err := csv.NewReader(f).ReadAll() f.Close() if err != nil { log.Fatal(err) } return rows }
func appendSum(rows [][]string) { rows[0] = append(rows[0], "SUM") for i := 1; i < len(rows); i++ { rows[i] = append(rows[i], sum(rows[i])) } }
func sum(row []string) string { sum := 0 for _, s := range row { x, err := strconv.Atoi(s) if err != nil { return "NA" } sum += x } return strconv.Itoa(sum) }
func writeChanges(rows [][]string) { f, err := os.Create("output.csv") if err != nil { log.Fatal(err) } err = csv.NewWriter(f).WriteAll(rows) f.Close() if err != nil { log.Fatal(err) } }</lang>
- sample.csv:
C1,C2,C3,C4,C5 1,5,9,13,17 2,six,10,14,18 3,7,11,15,19 4,8,12,16,20
- output.csv:
C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,six,10,14,18,NA 3,7,11,15,19,55 4,8,12,16,20,60
Groovy
<lang groovy>def csv = [] def loadCsv = { source -> source.splitEachLine(/,/) { csv << it.collect { it } } } def saveCsv = { target -> target.withWriter { writer -> csv.each { writer.println it.join(',') } } }
loadCsv new File('csv.txt') csv[0][0] = 'Column0' (1..4).each { i -> csv[i][i] = i * 100 } saveCsv new File('csv_out.txt')</lang>
csv_out.txt:
Column0,C2,C3,C4,C5 1,100,9,13,17 2,6,200,14,18 3,7,11,300,19 4,8,12,16,400
Haskell
Solution 1 Array-based solution:
<lang haskell>import Data.Array import Data.Maybe (isJust) import Data.List (intercalate) import Control.Monad (when)
delimiters = ",;:"
fields [] = [] fields xs = let (item, rest) = break (`elem` delimiters) xs
(_, next) = break (`notElem` delimiters) rest in item : fields next
unfields Nothing = [] unfields (Just a) = every fieldNumber $ elems a
where ((_, _), (_, fieldNumber)) = bounds a every _ [] = [] every n xs = let (y, z) = splitAt n xs in intercalate "," y : every n z
fieldArray [] = Nothing fieldArray xs = Just $ listArray ((1,1), (length xs, length $ head xs))
$ concat xs
fieldsFromFile = fmap (fieldArray . map fields . lines) . readFile
fieldsToFile f = writeFile f . unlines . unfields
someChanges = fmap (// [((1,1), "changed"), ((3,4), "altered"),
((5,2), "modified")])
main = do
a <- fieldsFromFile "example.txt" when (isJust a) $ fieldsToFile "output.txt" $ someChanges a
</lang>
Solution 2 List-based solution, heavily using functors and lenses <lang haskell>{-# LANGUAGE FlexibleContexts,
TypeFamilies, NoMonomorphismRestriction #-}
import Data.List (intercalate) import Data.List.Split (splitOn) import Lens.Micro
(<$$>) :: (Functor f1, Functor f2) =>
(a -> b) -> f1 (f2 a) -> f1 (f2 b)
(<$$>) = fmap . fmap
-- reading and writing
newtype CSV = CSV { values :: [[String]] }
readCSV :: String -> CSV readCSV = CSV . (splitOn "," <$$> lines)
instance Show CSV where
show = unlines . map (intercalate ",") . values
-- construction and combination
mkColumn, mkRow :: [String] -> CSV (<||>), (<==>) :: CSV -> CSV -> CSV
mkColumn lst = CSV $ sequence [lst] mkRow lst = CSV [lst]
CSV t1 <||> CSV t2 = CSV $ zipWith (++) t1 t2 CSV t1 <==> CSV t2 = CSV $ t1 ++ t2
-- access and modification via lenses
table = lens values (\csv t -> csv {values = t}) row i = table . ix i . traverse col i = table . traverse . ix i item i j = table . ix i . ix j
sample = readCSV "C1, C2, C3, C4, C5\n\
\1, 5, 9, 13, 17\n\ \2, 6, 10, 14, 18\n\ \3, 7, 11, 15, 19\n\ \4, 8, 12, 16, 20"</lang>
Examples:
1. Reading from a file
λ> readCSV <$> readFile "example.csv" C1, C2, C3, C4, C5 1, 5, 9, 13, 17 2, 6, 10, 14, 18 3, 7, 11, 15, 19 4, 8, 12, 16, 20
2. Access and modification
λ> sample ^. item 2 3 "14" λ> sample ^.. row 2 ["2","6","10","14","18"] λ> sample ^.. col 2 ["C3","9","10","11","12"] λ> (item 3 2 .~ "Ok") sample C1,C2,C3,C4,C5 1,5,9,13,17 2,6,10,14,18 3,7,Ok,15,19 4,8,12,16,20 λ> (item 3 2 %~ (show.(^2).read)) sample C1,C2,C3,C4,C5 1,5,9,13,17 2,6,10,14,18 3,7,121,15,19 4,8,12,16,20 λ> (row 4 %~ (show.(^2).read)) sample C1,C2,C3,C4,C5 1,5,9,13,17 2,6,10,14,18 3,7,11,15,19 16,64,144,256,400 λ> (col 4 %~ (++"!")) sample C1,C2,C3,C4,C5! 1,5,9,13,17! 2,6,10,14,18! 3,7,11,15,19! 4,8,12,16,20!
3. Construction and combination <lang haskell>sampleSum = sample <||> (mkRow ["SUM"] <==> mkColumn sums)
where sums = map (show . sum) (read <$$> drop 1 (values sample))</lang>
λ> sampleSum C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
Icon and Unicon
This version only works in Unicon, but can be easily adapted to work in Icon.
<lang unicon>import Utils # To get CSV procedures
procedure main(A)
f := open(A[1]) | &input i := 1 write(!f) # header line(?) every csv := parseCSV(!f) do { csv[i+:=1] *:= 100 write(encodeCSV(csv)) }
end</lang>
Sample run:
->csv csv.dat C1,C2,C3,C4,C5 1,500,9,13,17 2,6,1000,14,18 3,7,11,1500,19 4,8,12,16,2000 ->
J
As in other languages, it is not necessary to use the csv utilities to accomplish this task.
<lang j> data=: (','&splitstring);.2 freads 'rc_csv.csv' NB. read and parse data
data=: (<'"spam"') (<2 3)} data NB. amend cell in 3rd row, 4th column (0-indexing) 'rc_outcsv.csv' fwrites~ ;<@(','&joinstring"1) data NB. format and write out amended data</lang>
Using the delimiter-separated-values utilities (of which tables/csv
is a special case) will handle more complex csv constructs:
<lang j> require 'tables/csv'
data=: makenum readcsv 'rc_csv.csv' NB. read data and convert cells to numeric where possible data=: (<'spam') (2 3;3 0)} data NB. amend 2 cells data writecsv 'rc_outcsv.csv' NB. write out amended data. Strings are double-quoted</lang>
Adding a column with the sum of the rows: <lang j> require 'tables/csv'
'hdr data'=: split readcsv 'rc_csv.csv' NB. read data, split the header & data hdr=: hdr , <'SUM' NB. add title for extra column to header data=: <"0 (,. +/"1) makenum data NB. convert to numeric, sum rows & append column (hdr,data) writecsv 'rc_out.csv'</lang>
Tacit version of above: <lang j> sumCSVrows=: writecsv~ (((<'SUM') ,~ {.) , [: (<"0)@(,. +/"1) makenum@}.)@readcsv
'rc_out.csv' sumCSVrows 'rc.csv'</lang>
Java
Roll Your Own
<lang java>import java.io.*; import java.awt.Point; import java.util.HashMap; import java.util.Scanner;
public class CSV {
private HashMap<Point, String> _map = new HashMap<Point, String>(); private int _cols; private int _rows;
public void open(File file) throws FileNotFoundException, IOException { open(file, ','); }
public void open(File file, char delimiter) throws FileNotFoundException, IOException { Scanner scanner = new Scanner(file); scanner.useDelimiter(Character.toString(delimiter));
clear();
while(scanner.hasNextLine()) { String[] values = scanner.nextLine().split(Character.toString(delimiter));
int col = 0; for ( String value: values ) { _map.put(new Point(col, _rows), value); _cols = Math.max(_cols, ++col); } _rows++; } scanner.close(); }
public void save(File file) throws IOException { save(file, ','); }
public void save(File file, char delimiter) throws IOException { FileWriter fw = new FileWriter(file); BufferedWriter bw = new BufferedWriter(fw);
for (int row = 0; row < _rows; row++) { for (int col = 0; col < _cols; col++) { Point key = new Point(col, row); if (_map.containsKey(key)) { bw.write(_map.get(key)); }
if ((col + 1) < _cols) { bw.write(delimiter); } } bw.newLine(); } bw.flush(); bw.close(); }
public String get(int col, int row) { String val = ""; Point key = new Point(col, row); if (_map.containsKey(key)) { val = _map.get(key); } return val; }
public void put(int col, int row, String value) { _map.put(new Point(col, row), value); _cols = Math.max(_cols, col+1); _rows = Math.max(_rows, row+1); }
public void clear() { _map.clear(); _cols = 0; _rows = 0; }
public int rows() { return _rows; }
public int cols() { return _cols; }
public static void main(String[] args) { try { CSV csv = new CSV();
csv.open(new File("test_in.csv")); csv.put(0, 0, "Column0"); csv.put(1, 1, "100"); csv.put(2, 2, "200"); csv.put(3, 3, "300"); csv.put(4, 4, "400"); csv.save(new File("test_out.csv")); } catch (Exception e) { } }
}</lang>
- Output (in test_out.csv):
Column0,C2,C3,C4,C5 1,100,9,13,17 2,6,200,14,18 3,7,11,300,19 4,8,12,16,400
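The hand-rolled class above is fully general; for this task's constrained data (no commas, quotes, or backslashes in fields), the core transformation can also be sketched with the standard library alone. This is an illustrative, in-memory variant (file I/O omitted; class and method names are not from the code above):

```java
import java.util.ArrayList;
import java.util.List;

public class CsvSum {
    // Append a SUM column: header gets "SUM", each data row gets its total.
    static List<String> addSum(List<String> lines) {
        List<String> out = new ArrayList<>();
        out.add(lines.get(0) + ",SUM");           // header row
        for (String line : lines.subList(1, lines.size())) {
            int sum = 0;
            for (String field : line.split(","))  // fields are plain integers
                sum += Integer.parseInt(field.trim());
            out.add(line + "," + sum);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "C1,C2,C3,C4,C5",
            "1,5,9,13,17",
            "2,6,10,14,18");
        addSum(lines).forEach(System.out::println);
    }
}
```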
Apache commons-csv
Using the Apache commons-csv library. <lang java>import java.io.*; import java.util.*;
import org.apache.commons.csv.*;
public class RCsv {
private static final String NL = System.getProperty("line.separator"); private static final String FILENAME_IR = "data/csvtest_in.csv"; private static final String FILENAME_OR = "data/csvtest_sum.csv"; private static final String COL_NAME_SUM = "SUM, \"integers\""; // demonstrate white space, comma & quote handling
public static void main(String[] args) { Reader iCvs = null; Writer oCvs = null; System.out.println(textFileContentsToString(FILENAME_IR)); try { iCvs = new BufferedReader(new FileReader(FILENAME_IR)); oCvs = new BufferedWriter(new FileWriter(FILENAME_OR)); processCsv(iCvs, oCvs); } catch (IOException ex) { ex.printStackTrace(); } finally { try { if (iCvs != null) { iCvs.close(); } if (oCvs != null) { oCvs.close(); } } catch (IOException ex) { ex.printStackTrace(); } } System.out.println(textFileContentsToString(FILENAME_OR)); return; }
public static void processCsv(Reader iCvs, Writer oCvs) throws IOException { CSVPrinter printer = null; try { printer = new CSVPrinter(oCvs, CSVFormat.DEFAULT.withRecordSeparator(NL)); List<String> oCvsHeaders; List<String> oCvsRecord; CSVParser records = CSVFormat.DEFAULT.withHeader().parse(iCvs); Map<String, Integer> irHeader = records.getHeaderMap(); oCvsHeaders = new ArrayList<String>(Arrays.asList((irHeader.keySet()).toArray(new String[0]))); oCvsHeaders.add(COL_NAME_SUM); printer.printRecord(oCvsHeaders); for (CSVRecord record : records) { oCvsRecord = record2list(record, oCvsHeaders); printer.printRecord(oCvsRecord); } } finally { if (printer != null) { printer.close(); } } return; }
private static List<String> record2list(CSVRecord record, List<String> oCvsHeaders) { List<String> cvsRecord; Map<String, String> rMap = record.toMap(); long recNo = record.getRecordNumber(); rMap = alterRecord(rMap, recNo); int sum = 0; sum = summation(rMap); rMap.put(COL_NAME_SUM, String.valueOf(sum)); cvsRecord = new ArrayList<String>(); for (String key : oCvsHeaders) { cvsRecord.add(rMap.get(key)); } return cvsRecord; }
private static Map<String, String> alterRecord(Map<String, String> rMap, long recNo) { int rv; Random rg = new Random(recNo); rv = rg.nextInt(50); String[] ks = rMap.keySet().toArray(new String[0]); int ix = rg.nextInt(ks.length); long yv = 0; String ky = ks[ix]; String xv = rMap.get(ky); if (xv != null && xv.length() > 0) { yv = Long.valueOf(xv) + rv; rMap.put(ks[ix], String.valueOf(yv)); } return rMap; }
private static int summation(Map<String, String> rMap) { int sum = 0; for (String col : rMap.keySet()) { String nv = rMap.get(col); sum += nv != null && nv.length() > 0 ? Integer.valueOf(nv) : 0; } return sum; }
private static String textFileContentsToString(String filename) { StringBuilder lineOut = new StringBuilder(); Scanner fs = null; try { fs = new Scanner(new File(filename)); lineOut.append(filename); lineOut.append(NL); while (fs.hasNextLine()) { String line = fs.nextLine(); lineOut.append(line); lineOut.append(NL); } } catch (FileNotFoundException ex) { // TODO Auto-generated catch block ex.printStackTrace(); } finally { if (fs != null) { fs.close(); } } return lineOut.toString(); }
} </lang>
- Input:
data/csvtest_in.csv
C1,C2,C3,C4,C5 1,5,9,13,17 2,6,10,14,18 3,7,11,15,19 4,8,12,16,20
- Output:
data/csvtest_sum.csv
C1,C2,C3,C4,C5,"SUM, ""integers""" 1,5,9,21,17,53 36,6,10,14,18,84 3,7,11,27,19,67 4,8,12,53,20,97
uniVocity-parsers
Using the uniVocity-parsers library (the classes used below live in com.univocity.parsers.csv and com.univocity.parsers.common.processor). <lang java>import java.io.*; import java.util.*; import com.univocity.parsers.common.processor.*; import com.univocity.parsers.csv.*;
public static void main(String[] args) throws IOException {
// 1st, config the CSV reader with line separator CsvParserSettings settings = new CsvParserSettings(); settings.getFormat().setLineSeparator("\n");
// 2nd, config the CSV reader with row processor attaching the bean definition BeanListProcessor<Employee> rowProcessor = new BeanListProcessor<Employee>(Employee.class); settings.setRowProcessor(rowProcessor);
// 3rd, creates a CSV parser with the configs CsvParser parser = new CsvParser(settings);
// 4th, parse all rows from the CSV file into the list of beans you defined parser.parse(new FileReader("/examples/employees.csv")); List<Employee> resolvedBeans = rowProcessor.getBeans();
// 5th, Store, Delete duplicates, Re-arrange the words in specific order // ......
// 6th, Write the listed of processed employee beans out to a CSV file. CsvWriterSettings writerSettings = new CsvWriterSettings();
// 6.1 Creates a BeanWriterProcessor that handles annotated fields in the Employee class. writerSettings.setRowWriterProcessor(new BeanWriterProcessor<Employee>(Employee.class));
// 6.2 persist the employee beans to a CSV file. CsvWriter writer = new CsvWriter(new FileWriter("/examples/processed_employees.csv"), writerSettings); writer.processRecords(resolvedBeans); writer.writeRows(new ArrayList<List<Object>>()); }
</lang>
jq
The following adds a column with header "SUM" as suggested in the task description.
The writing out of each line is facilitated by jq's @csv builtin. We must "slurp in" the file (using the -s option) so the header line can be handled specially. <lang jq># Omit empty lines def read_csv:
split("\n") | map(if length>0 then split(",") else empty end) ;
# add_column(label) adds a summation column (with the given label) to
# the matrix representation of the CSV table, and assumes that all the
# entries in the body of the CSV file are, or can be converted to,
# numbers:
def add_column(label):
[.[0] + [label], (reduce .[1:][] as $line ([]; ($line|map(tonumber)) as $line | . + [$line + [$line|add]]))[] ] ;
read_csv | add_column("SUM") | map(@csv)[]</lang>
- Output:
$ jq -s -R -r -f CSV_data_manipulation.jq input.csv "C1","C2","C3","C4","C5","SUM" 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
Julia
Julia has rudimentary built-in support for CSV input and output, which is certainly adequate for this task. More sophisticated support is available via the DataFrames package.
<lang Julia>
ifn = "csv_data_manipulation_in.dat"
ofn = "csv_data_manipulation_out.dat"
ifile = open(ifn, "r") (a, h) = readcsv(ifile, Int, header=true) close(ifile)
a = hcat(a, sum(a, 2)) h = hcat(h, "SUM") a = vcat(h, a)
ofile = open(ofn, "w") writecsv(ofile, a) close(ofile) </lang>
- Output:
$ cat csv_data_manipulation_out.dat C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
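A sketch of that more sophisticated route, assuming the third-party CSV.jl and DataFrames packages are installed (their API is an assumption here, and differs from the Base readcsv/writecsv used above):

```julia
# Hedged sketch: same task via CSV.jl + DataFrames (package API assumed).
using CSV, DataFrames

df = CSV.read("csv_data_manipulation_in.dat", DataFrame)
df.SUM = [sum(row) for row in eachrow(df)]   # row sums as a new column
CSV.write("csv_data_manipulation_out.dat", df)
```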
Lingo
<lang lingo>---------------------------------------- -- Simplified CSV parser (without escape character support etc.). -- First line is interpreted as header with column names. -- @param {string} csvStr -- @param {string} [sep=","] - single char as string -- @param {string} [eol=RETURN] -- @return {propList}
on parseSimpleCSVString (csvStr, sep, eol)
if voidP(sep) then sep="," if voidP(eol) then eol = RETURN lines = explode(eol, csvStr) if lines.getLast()="" then lines.deleteAt(lines.count) res = [:] res[#header] = explode(sep, lines[1]) res[#data] = [] cnt = lines.count repeat with i = 2 to cnt res[#data].append(explodeBySingleChar(sep, lines[i])) end repeat return res
end
-- Simplified CSV creator (without escape character support etc.). -- @param {propList} csvData -- @param {string} [sep=","] -- @param {string} [eol=RETURN] -- @return {string}
on createSimpleCSVString (csvData, sep, eol)
if voidP(sep) then sep="," if voidP(eol) then eol = RETURN res = "" put implode(sep, csvData[#header])&eol after res cnt = csvData[#data].count repeat with i = 1 to cnt put implode(sep, csvData[#data][i])&eol after res end repeat return res
end
-- Explodes string into list -- @param {string} delim -- @param {string} str -- @return {list}
on explode (delim, str)
if delim.length=1 then return explodeBySingleChar(delim, str) l = [] if voidP(str) then return l dl = delim.length repeat while true pos = offset(delim, str) if pos=0 then exit repeat l.add(str.char[1..pos-1]) delete char 1 to pos+dl-1 of str end repeat if pos=0 then pos = 1-dl l.add(str.char[pos+dl..str.length]) return l
end
-- Explode string into list based on single char delimiter -- (uses Lingo's built-in 'item' support, therefore faster) -- @param {string} delim -- @param {string} str -- @return {list}
on explodeBySingleChar (delim, str)
l = [] if voidP(str) then return l od = _player.itemDelimiter _player.itemDelimiter = delim cnt = str.item.count repeat with i = 1 to cnt l.add(str.item[i]) end repeat _player.itemDelimiter = od return l
end
-- Implodes list into string -- @param {string} delim -- @param {list} l -- @return {string}
on implode (delim, l)
str = "" cnt = l.count repeat with i = 1 to cnt put l[i]&delim after str end repeat delete char (str.length-delim.length+1) to str.length of str return str
end</lang>
<lang lingo>sep = "," eol = numtochar(10)
-- load CSV string from file fn = _movie.path & "file.csv" fp = xtra("fileIO").new() fp.openFile(fn, 1) csvStr = fp.readFile() fp.closeFile()
-- parse CSV string into propList csvData = parseSimpleCSVString(csvStr, sep, eol)
-- add SUM column csvData[#header].append("SUM") repeat with row in csvData[#data]
sum = 0 repeat with cell in row sum = sum+integer(cell) end repeat row.append(sum)
end repeat
-- create CSV string from updated propList csvString = createSimpleCSVString(csvData, sep, eol)
-- save CSV string to file fn = _movie.path & "file.csv" fp.openFile(fn, 2) if not fp.status() then fp.delete() fp.createFile(fn) fp.openFile(fn, 2) fp.writeString(csvString) fp.closeFile()
-- show the CSV string put csvString
-- "C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60 "</lang>
Lua
Adds a SUM column. <lang lua>local csv={} for line in io.lines('file.csv') do
table.insert(csv, {}) local i=1 for j=1,#line do if line:sub(j,j) == ',' then table.insert(csv[#csv], line:sub(i,j-1)) i=j+1 end end table.insert(csv[#csv], line:sub(i))
end
table.insert(csv[1], 'SUM') for i=2,#csv do
local sum=0 for j=1,#csv[i] do sum=sum + tonumber(csv[i][j]) end table.insert(csv[i], sum)
end
local newFileData = "" for i=1,#csv do
newFileData=newFileData .. table.concat(csv[i], ',') .. '\n'
end
local file=io.open('file.csv', 'w') file:write(newFileData) file:close()</lang>
- Output:
C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
Mathematica / Wolfram Language
Mathematica's Import and Export functions support CSV files. <lang mathematica>iCSV=Import["test.csv"] ->{{"C1","C2","C3","C4","C5"},{1,5,9,13,17},{2,6,10,14,18},{3,7,11,15,19},{4,8,12,16,20}} iCSV[[1,1]]="Column0"; iCSV[[2,2]]=100; iCSV[[3,3]]=200; iCSV[[4,4]]=300; iCSV[[5,5]]=400; iCSV[[2,3]]=60; Export["test.csv",iCSV];</lang>
- Output:
Column0,C2,C3,C4,C5 1,100,60,13,17 2,6,200,14,18 3,7,11,300,19 4,8,12,16,400
Maple
Entire script: <lang Maple>M := ImportMatrix("data.csv",source=csv); M(..,6) := < "Total", seq( add(M[i,j], j=1..5), i=2..5 ) >; ExportMatrix("data_out.csv",M,target=csv); </lang>
Running this script showing interactive results: <lang Maple>> M := ImportMatrix("data.csv",source=csv);
["C1" "C2" "C3" "C4" "C5"] [ ] [ 1 5 9 13 17 ] [ ] M := [ 2 6 10 14 18 ] [ ] [ 3 7 11 15 19 ] [ ] [ 4 8 12 16 20 ]
> M(..,6) := < "Total", seq( add(M[i,j], j=1..5), i=2..5 ) >;
["C1" "C2" "C3" "C4" "C5" "Total"] [ ] [ 1 5 9 13 17 45 ] [ ] M := [ 2 6 10 14 18 50 ] [ ] [ 3 7 11 15 19 55 ] [ ] [ 4 8 12 16 20 60 ]
> ExportMatrix("data_out.csv",M,target=csv);
96
</lang>
MATLAB / Octave
<lang Matlab>filename='data.csv'; fid = fopen(filename); header = fgetl(fid); fclose(fid); X = dlmread(filename,',',1,0);
fid = fopen('data.out.csv','w+'); fprintf(fid,'%s,sum\n',header); for k=1:size(X,1), fprintf(fid,'%i,',X(k,:)); fprintf(fid,'%i\n',sum(X(k,:))); end; fclose(fid);</lang>
NetRexx
Using the Apache commons-csv library. <lang NetRexx>/* NetRexx */ options replace format comments java crossref symbols
import org.apache.commons.csv.
-- ============================================================================= class RCsv public final
properties private constant
NL = String System.getProperty("line.separator") COL_NAME_SUM = String 'SUM, "integers"' CSV_IFILE = 'data/csvtest_in.csv' CSV_OFILE = 'data/csvtest_sumRexx.csv'
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method main(args = String[]) public static
Arg = Rexx(args) iCvs = Reader null oCvs = Writer null parse arg ifile ofile . if ifile = , ifile = '.' then ifile = CSV_IFILE if ofile = , ofile = '.' then ofile = CSV_OFILE say textFileContentsToString(ifile) do iCvs = BufferedReader(FileReader(ifile)) oCvs = BufferedWriter(FileWriter(ofile)) processCsv(iCvs, oCvs); catch ex = IOException ex.printStackTrace(); finally do if iCvs \= null then iCvs.close() if oCvs \= null then oCvs.close() catch ex = IOException ex.printStackTrace() end end say textFileContentsToString(ofile) return
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method processCsv(iCvs = Reader, oCvs = Writer) public static binary signals IOException
printer = CSVPrinter null do printer = CSVPrinter(oCvs, CSVFormat.DEFAULT.withRecordSeparator(NL)) oCvsHeaders = java.util.List oCvsRecord = java.util.List records = CSVFormat.DEFAULT.withHeader(String[0]).parse(iCvs) irHeader = records.getHeaderMap() oCvsHeaders = ArrayList(Arrays.asList((irHeader.keySet()).toArray(String[0]))) oCvsHeaders.add(COL_NAME_SUM) printer.printRecord(oCvsHeaders) recordIterator = records.iterator() record = CSVRecord loop while recordIterator.hasNext() record = CSVRecord recordIterator.next() oCvsRecord = record2list(record, oCvsHeaders) printer.printRecord(oCvsRecord) end finally if printer \= null then printer.close() end return
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method record2list(record = CSVRecord, oCvsHeaders = java.util.List) private static binary returns java.util.List
cvsRecord = java.util.List rMap = record.toMap() recNo = record.getRecordNumber() rMap = alterRecord(rMap, recNo) sum = summation(record.iterator()) rMap.put(COL_NAME_SUM, sum) cvsRecord = ArrayList() loop ci = 0 to oCvsHeaders.size() - 1 key = oCvsHeaders.get(ci) cvsRecord.add(rMap.get(key)) end ci return cvsRecord
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method alterRecord(rMap = Map, recNo = long) private static binary returns Map
rv = int rg = Random(recNo) rv = rg.nextInt(50) ks = rMap.keySet().toArray(String[0]) ix = rg.nextInt(ks.length) yv = long 0 ky = ks[ix]; xv = String rMap.get(ky) if xv \= null & xv.length() > 0 then do yv = Long.valueOf(xv).longValue() + rv rMap.put(ks[ix], String.valueOf(yv)) end return rMap
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method summation(iColumn = Iterator) private static
sum = 0 loop while iColumn.hasNext() nv = Rexx(String iColumn.next()) if nv = null, nv.length() = 0, \nv.datatype('n') then nv = 0 sum = sum + nv end return sum
-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ method textFileContentsToString(filename) private static
lineOut = fs = Scanner null do fs = Scanner(File(filename)) lineOut = lineout || filename || NL loop while fs.hasNextLine() line = fs.nextLine() lineOut = lineout || line || NL end catch ex = FileNotFoundException ex.printStackTrace() finally if fs \= null then fs.close() end return lineOut
</lang>
- Input:
data/csvtest_in.csv
C1,C2,C3,C4,C5 1,5,9,13,17 2,6,10,14,18 3,7,11,15,19 4,8,12,16,20
- Output:
data/csvtest_sumRexx.csv
C1,C2,C3,C4,C5,"SUM, ""integers""" 1,5,9,21,17,45 36,6,10,14,18,50 3,7,11,27,19,55 4,8,12,53,20,60
Nim
Nim's standard library contains a robust CSV parser, but for a CSV document this simple it is not necessary.
<lang nim>import strutils, streams
let
csv = newFileStream("data.csv", fmRead) outf = newFileStream("data-out.csv", fmWrite)
var lineNumber = 1
while true:
if atEnd(csv): break var line = readLine(csv)
if lineNumber == 1: line.add(",SUM") else: var tmp = 0 for n in split(line, ","): tmp += parseInt(n) line.add(",") line.add($tmp) outf.writeLine(line)
inc lineNumber</lang>
- Output:
C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
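The standard parser mentioned above can be sketched as follows; the parsecsv API names (CsvParser, readHeaderRow, readRow, rowEntry) are taken from Nim's standard-library documentation, and this sketch only echoes the fields rather than appending a SUM column:

```nim
# Hedged sketch using the std parsecsv module (API assumed from stdlib docs).
import parsecsv

var p: CsvParser
p.open("data.csv")
p.readHeaderRow()              # first line becomes p.headers
while p.readRow():
  for name in p.headers:
    stdout.write(name, "=", p.rowEntry(name), " ")
  echo ""
p.close()
```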
Perl
For simple files, you can use split: <lang perl>#!/usr/bin/perl use warnings; use strict;
use List::Util 'sum';
my @header = split /,/, <>;
# Remove the newline.
chomp $header[-1];
my %column_number; for my $i (0 .. $#header) {
$column_number{$header[$i]} = $i;
} my @rows = map [ split /,/ ], <>; chomp $_->[-1] for @rows;
# Add 1 to the numbers in the 2nd column:
$_->[1]++ for @rows;
# Add C1 into C4:
$_->[ $column_number{C4} ] += $_->[ $column_number{C1} ] for @rows;
# Add sums to both rows and columns.
push @header, 'Sum'; $column_number{Sum} = $#header;
push @$_, sum(@$_) for @rows; push @rows, [
map { my $col = $_; sum(map $_->[ $column_number{$col} ], @rows); } @header ];
# Print the output.
print join(',' => @header), "\n"; print join(',' => @$_), "\n" for @rows; </lang>
However, if the CSV can contain quoted texts (the type MS Excel produces), you should rather use the Text::CSV. Only reading the data and printing the result is different: <lang perl>#!/usr/bin/perl use warnings; use strict;
use Text::CSV; use List::Util 'sum';
my $csv = 'Text::CSV'->new({eol => "\n"})
or die 'Cannot use CSV: ' . 'Text::CSV'->error_diag;
my $file = shift; my @rows; open my $FH, '<', $file or die "Cannot open $file: $!"; my @header = @{ $csv->getline($FH) }; while (my $row = $csv->getline($FH)) {
push @rows, $row;
} $csv->eof or $csv->error_diag;
# The processing is the same.
# Print the output.
$csv->print(*STDOUT, $_) for \@header, @rows;</lang>
Perl 6
On the face of it this task is pretty simple, especially given the sample CSV file and the total lack of specification of what changes to make to the file. Something like this would suffice. <lang perl6>my $csvfile = './whatever.csv'; my $fh = open($csvfile, :r); my @header = $fh.get.split(','); my @csv = map {[.split(',')]}, $fh.lines; close $fh;
my $out = open($csvfile, :w); $out.say((@header,'SUM').join(',')); $out.say((@$_, [+] @$_).join(',')) for @csv; close $out;</lang> But if your CSV file is at all complex you are better off using a CSV parsing module. (Complex meaning fields that contain commas, quotes, newlines, etc.) <lang perl6>use Text::CSV; my $csvfile = './whatever.csv'; my @csv = Text::CSV.parse-file($csvfile); modify(@csv); # do whatever; csv-write-file( @csv, :file($csvfile) );</lang>
PHP
<lang PHP> <?php
// fputcsv() requires at least PHP 5.1.0 // file "data_in.csv" holds input data // the result is saved in "data_out.csv" // this version has no error-checking
$handle = fopen('data_in.csv','r'); $handle_output = fopen('data_out.csv','w'); $row = 0; $arr = array();
while ($line = fgetcsv($handle)) {
$arr[] = $line;
}
//change some data to zeroes $arr[1][0] = 0; // 1,5,9,13,17 => 0,5,9,13,17 $arr[2][1] = 0; // 2,6,10,14,18 => 2,0,10,14,18
//add sum and write file foreach ($arr as $line) {
if ($row==0) { array_push($line,"SUM"); } else { array_push($line,array_sum($line)); } fputcsv($handle_output, $line); $row++;
} ?></lang>
PicoLisp
<lang PicoLisp>(in "data.csv"
(prinl (line) "," "SUM") (while (split (line) ",") (prinl (glue "," @) "," (sum format @)) ) )</lang>
- Output:
C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
PL/I
<lang pli>*process source xref attributes or(!);
csv: Proc Options(Main); /********************************************************************* * 19.10.2013 Walter Pachl * 'erase d:\csv.out' * 'set dd:in=d:\csv.in,recsize(300)' * 'set dd:out=d:\csv.out,recsize(300)' * Say 'Input:' * 'type csv.in' * 'csv' * Say ' ' * Say 'Output:' * 'type csv.out' *********************************************************************/ Dcl in Record Input; Dcl out Record Output; On Endfile(in) Goto part2; Dcl (INDEX,LEFT,SUBSTR,TRIM) Builtin;
Dcl (i,j,p,m,n) Bin Fixed(31) Init(0); Dcl s Char(100) Var; Dcl iline(10) Char(100) Var; Dcl a(20,20) Char(10) Var; Dcl sum Dec Fixed(3); Dcl oline Char(100) Var;
Do i=1 By 1; Read File(in) Into(s); iline(i)=s; m=i; Call sep((s)); End;
part2: Do i=1 To m; If i=1 Then oline=iline(1)!!','!!'SUM'; Else Do; sum=0; Do j=1 To n; sum=sum+a(i,j); End; oline=iline(i)!!','!!trim(sum); End; Write File(out) From(oline); End;
sep: Procedure(line); Dcl line Char(*) Var; loop: Do j=1 By 1; p=index(line,','); If p>0 Then Do; a(i,j)=left(line,p-1); line=substr(line,p+1); End; Else Do; a(i,j)=line; Leave loop; End; End; n=j; End;
End;</lang>
- Input:
C1,C2,C3,C4,C5 1,5,9,13,17 2,6,10,14,18 3,7,11,15,19 4,8,12,16,20
- Output:
C1,C2,C3,C4,C5,SUM 1,5,9,13,17,45 2,6,10,14,18,50 3,7,11,15,19,55 4,8,12,16,20,60
Prolog
Add a "SUM" column. Output is as for Lua and is not repeated here.
The following uses SWI-Prolog's csv_read_file_row/3 in order to demonstrate that it is not necessary to read more than a line at a time. <lang Prolog>test :- augment('test.csv', 'test.out.csv').
% augment( +InFileName, +OutFileName) augment(InFile, OutFile) :- open(OutFile, write, OutStream), ( ( csv_read_file_row(InFile, Row, [line(Line)]), % Row is of the form row( Item1, Item2, ....). addrow(Row, Out), csv_write_stream(OutStream, [Out], []), fail ) ; close(OutStream) ).
% If the first item in a row is an integer, then append the sum; % otherwise append 'SUM': addrow( Term, NewTerm ) :- Term =.. [F | List], List = [X|_], (integer(X) -> sum_list(List, Sum) ; Sum = 'SUM'), append(List, [Sum], NewList), NewTerm =.. [F | NewList]. </lang>
PowerShell
<lang PowerShell>## Create a CSV file
@"
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
"@ -split "`r`n" | Out-File -FilePath .\Temp.csv -Force

## Import each line of the CSV file into an array of PowerShell objects
$records = Import-Csv -Path .\Temp.csv

## Sum the values of the properties of each object
$sums = $records | ForEach-Object {
    [int]$sum = 0
    foreach ($field in $_.PSObject.Properties.Name) {
        $sum += $_.$field
    }
    $sum
}

## Add a column (Sum) and its value to each object in the array
$records = for ($i = 0; $i -lt $sums.Count; $i++) {
    $records[$i] | Select-Object *,@{Name='Sum';Expression={$sums[$i]}}
}

## Export the array of modified objects to the CSV file
$records | Export-Csv -Path .\Temp.csv -Force

## Display the objects in tabular form
$records | Format-Table -AutoSize</lang>
- Output:
C1 C2 C3 C4 C5 Sum
-- -- -- -- -- ---
1  5  9  13 17 45
2  6  10 14 18 50
3  7  11 15 19 55
4  8  12 16 20 60
PureBasic
<lang PureBasic>EnableExplicit

#Separator$ = ","

Define fInput$ = "input.csv"   ; insert path to input file
Define fOutput$ = "output.csv" ; insert path to output file
Define header$, row$, field$
Define nbColumns, sum, i

If OpenConsole()
  If Not ReadFile(0, fInput$)
    PrintN("Error opening input file")
    Goto Finish
  EndIf
  If Not CreateFile(1, fOutput$)
    PrintN("Error creating output file")
    CloseFile(0)
    Goto Finish
  EndIf
  ; Read header row
  header$ = ReadString(0)
  ; Determine number of columns
  nbColumns = CountString(header$, ",") + 1
  ; Change header row
  header$ + #Separator$ + "SUM"
  ; Write to output file
  WriteStringN(1, header$)
  ; Read remaining rows, process and write to output file
  While Not Eof(0)
    row$ = ReadString(0)
    sum = 0
    For i = 1 To nbColumns
      field$ = StringField(row$, i, #Separator$)
      sum + Val(field$)
    Next
    row$ + #Separator$ + sum
    WriteStringN(1, row$)
  Wend
  CloseFile(0)
  CloseFile(1)
Finish:
  PrintN("")
  PrintN("Press any key to close the console")
  Repeat : Delay(10) : Until Inkey() <> ""
  CloseConsole()
EndIf</lang>
- Output (in output.csv):
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
Python
Note that the csv module is not required for such a simple and regular csv file.
<lang python>import fileinput

changerow, changecolumn, changevalue = 2, 4, '"Spam"'

with fileinput.input('csv_data_manipulation.csv', inplace=True) as f:
    for line in f:
        if fileinput.filelineno() == changerow:
            fields = line.rstrip().split(',')
            fields[changecolumn-1] = changevalue
            line = ','.join(fields) + '\n'
        print(line, end='')</lang>
- Output:
After this the data file csv_data_manipulation.csv
gets changed from that of the task to:
C1,C2,C3,C4,C5
1,5,9,"Spam",17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
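The task's other suggestion, adding a SUM column, can be handled with Python's standard csv module. The following is a minimal sketch; the io.StringIO buffers are stand-ins for real file handles (with files on disk, use open('data.csv', newline='') for reading and open('out.csv', 'w', newline='') for writing, where those file names are only illustrative).

```python
import csv
import io

# In-memory stand-ins for the input and output files.
src = io.StringIO("C1,C2,C3,C4,C5\n"
                  "1,5,9,13,17\n"
                  "2,6,10,14,18\n"
                  "3,7,11,15,19\n"
                  "4,8,12,16,20\n")
out = io.StringIO()

reader = csv.reader(src)
writer = csv.writer(out, lineterminator='\n')

header = next(reader)              # the first row holds the column titles
writer.writerow(header + ['SUM'])  # add the new column heading
for row in reader:
    writer.writerow(row + [sum(map(int, row))])  # append each row's sum
```

csv.writer terminates rows with "\r\n" by default; passing lineterminator='\n' keeps the Unix-style line endings used in the task's expected output.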
R
<lang R> df <- read.csv(textConnection(
"C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20"))
df <- transform(df,SUM = rowSums(df))
write.csv(df,row.names = FALSE) </lang>
- Output:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
This output can also be saved to a file:
<lang R> write.csv(df,row.names = FALSE,file = "foo.csv") </lang>
Racket
<lang racket>#lang racket
(require (planet neil/csv:1:=7) net/url)

(define make-reader
  (make-csv-reader-maker
   '((separator-chars            #\,)
     (strip-leading-whitespace?  . #t)
     (strip-trailing-whitespace? . #t))))

(define (all-rows port)
  (define read-row (make-reader port))
  (define head (append (read-row) '("SUM")))
  (define rows (for/list ([row (in-producer read-row '())])
                 (define xs (map string->number row))
                 (append row (list (~a (apply + xs))))))
  (define (->string row)
    (string-join row "," #:after-last "\n"))
  (string-append* (map ->string (cons head rows))))</lang>
Example:
<lang racket>(define csv-file
  "C1, C2, C3, C4, C5
   1,  5,  9,  13, 17
   2,  6,  10, 14, 18
   3,  7,  11, 15, 19
   4,  8,  12, 16, 20")

(display (all-rows (open-input-string csv-file)))</lang>
- Output:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
REXX
<lang rexx>/* REXX ***************************************************************
* extend in.csv to add a column containing the sum of the lines' elems
* 21.06.2013 Walter Pachl
**********************************************************************/
csv='in.csv'
Do i=1 By 1 While lines(csv)>0
  l=linein(csv)
  If i=1 Then
    l.i=l',SUM'
  Else Do
    ol=l
    sum=0
    Do While l<>''
      Parse Var l e ',' l
      sum=sum+e
      End
    l.i=ol','sum
    End
  End
Call lineout csv
'erase' csv
Do i=1 To i-1
  Call lineout csv,l.i
  End</lang>
- Output:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
Ruby
<lang ruby>require 'csv'

# read:
ar = CSV.table("test.csv").to_a  # table method assumes headers and converts numbers if possible.

# manipulate:
ar.first << "SUM"
ar[1..-1].each{|row| row << row.inject(:+)}

# write:
CSV.open("out.csv", 'w') do |csv|
  ar.each{|line| csv << line}
end</lang>
- Output:
c1,c2,c3,c4,c5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
Run BASIC
<lang runbasic>csv$ = "C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
"
print csv$
dim csvData$(5,5)
for r = 1 to 5
  a$ = word$(csv$,r,chr$(13))
  for c = 1 to 5
    csvData$(r,c) = word$(a$,c,",")
  next c
next r

[loop]
input "Row to change:";r
input "Col to change;";c
if r > 5 or c > 5 then
  print "Row ";r;" or Col ";c;" is greater than 5"
  goto [loop]
end if
input "Change Row ";r;" Col ";c;" from ";csvData$(r,c);" to ";d$
csvData$(r,c) = d$
for r = 1 to 5
  for c = 1 to 5
    print cma$;csvData$(r,c);
    cma$ = ","
  next c
  cma$ = ""
  print
next r</lang>
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
Row to change:?4
Col to change;?4
Change Row 4 Col 4 from 15 to ?99
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,99,19
4,8,12,16,20
Scala
<lang Scala>import scala.io.Source

object parseCSV extends App {
  val rawData = """|C1,C2,C3,C4,C5
                   |1,5,9,13,17
                   |2,6,10,14,18
                   |3,7,11,15,19
                   |20,21,22,23,24""".stripMargin

  val data = Seq((Source.fromString(rawData).getLines()).map(_.split(",")).toSeq: _*)

  val output = ((data.take(1).flatMap(x => x) :+ "SUM").mkString(",") +: // Header line
    data.drop(1).map(_.map(_.toInt)). // Convert per line each array of String to array of integer
      map(cells => (cells, cells.sum)). // Add sum column to assemble a tuple. Part 1 are original numbers, 2 is the sum
      map(part => s"${part._1.mkString(",")},${part._2}")).mkString("\n")

  println(output)
  /* Outputs:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
20,21,22,23,24,110
  */
}</lang>
Seed7
The program below assumes that the input file has the name csvDataManipulation.in and is in the same directory as the program.
<lang seed7>$ include "seed7_05.s7i";
const proc: main is func
  local
    var file: input is STD_NULL;
    var array array string: csvData is 0 times 0 times "";
    var integer: line is 0;
  begin
    input := open(dir(PROGRAM) & "/csvDataManipulation.in", "r");
    while hasNext(input) do
      csvData &:= split(getln(input), ",");
    end while;
    csvData[3][3] := "X";
    for key line range csvData do
      writeln(join(csvData[line], ","));
    end for;
  end func;</lang>
- Output:
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,X,14,18
3,7,11,15,19
4,8,12,16,20
Sidef
For simple files we can use the split method.
<lang ruby># Read
var csvfile = %f'data.csv';
var fh = csvfile.open_r;
var header = fh.line.trim_end.split(',');
var csv = fh.lines.map { .trim_end.split(',').map{.to_num} };
fh.close;

# Write
var out = csvfile.open_w;
out.say([header..., 'SUM'].join(','));
csv.each { |row| out.say([row..., row.sum].join(',')) };
out.close;</lang>
For complex files, the Text::CSV library is recommended.
<lang ruby>var csv = require('Text::CSV').new(
    Hash(eol => "\n")
);

# Open
var csvfile = %f'data.csv';
var fh = csvfile.open_r;

# Read
var rows = [];
var header = csv.getline(fh);
while (var row = csv.getline(fh)) {
    rows.append(row.map{.to_num});
}

# Process
header.append('SUM');
rows.each { |row| row.append(row.sum) };

# Write
var out = csvfile.open_w;
[header, rows...].each { |row|
    csv.print(out, row);
};</lang>
Tcl
<lang tcl>package require struct::matrix
package require csv

proc addSumColumn {filename {title "SUM"}} {
    set m [struct::matrix]

    # Load the CSV in
    set f [open $filename]
    csv::read2matrix $f $m "," auto
    close $f

    # Add the column with the sums
    set sumcol [$m columns]
    $m add column $title
    for {set i 1} {$i < [$m rows]} {incr i} {
        # Fill out a dummy value
        $m set cell $sumcol $i 0
        $m set cell $sumcol $i [tcl::mathop::+ {*}[$m get row $i]]
    }

    # Write the CSV out
    set f [open $filename w]
    csv::writematrix $m $f
    close $f

    $m destroy
}

addSumColumn "example.csv"</lang>
- Output (in example.csv):
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
Although, for this specific small task,
<lang tcl>set f [open example.csv r]
puts "[gets $f],SUM"
while { [gets $f row] > 0 } {
    puts "$row,[expr [string map {, +} $row]]"
}
close $f</lang>
suffices.
TUSCRIPT
<lang tuscript>
$$ MODE DATA
$$ csv=*
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
$$ MODE TUSCRIPT
LOOP/CLEAR n,line=csv
 IF (n==1) THEN
  line=CONCAT (line,",SUM")
 ELSE
  lineadd=EXCHANGE(line,":,:':")
  sum=SUM(lineadd)
  line=JOIN(line,",",sum)
 ENDIF
 csv=APPEND(csv,line)
ENDLOOP
</lang>
- Output:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
TXR
<lang txr>@(coll)@{name /[^,]+/}@(end)
@(collect :vars (value sum))
@  (bind sum 0)
@  (coll)@{value /[^,]+/}@(set sum @(+ sum (int-str value)))@(end)
@(end)
@(output)
@  (rep)@name,@(last)SUM@(end)
@  (repeat)
@    (rep)@value,@(last)@sum@(end)
@  (end)
@(end)
</lang>
UNIX Shell
<lang bash>exec 0<"$1"     # open the input file on stdin
exec 1>"$1.new" # open an output file on stdout
{
    read -r header
    echo "$header,SUM"
    IFS=,
    while read -r -a numbers; do
        sum=0
        for num in "${numbers[@]}"; do
            (( sum += num ))
        done
        # can write the above loop as
        # sum=$(( $(IFS=+; echo "${numbers[*]}") ))
        echo "${numbers[*]},$sum"
    done
} && mv "$1" "$1.bak" && mv "$1.new" "$1"</lang>
To make this work with ksh, change read -a to read -A.
Ursa
<lang ursa>#
# csv data manipulation
#

# declare a string stream to hold lines
decl string<> lines

# open the file specified on the command line, halting
# execution if they didn't enter one. it will be created if
# it doesn't exist yet
decl file f
if (< (size args) 2)
        out "error: please specify a csv file" endl console
        stop
end if
f.create args<1>
f.open args<1>

# read in all lines from the file
set lines (f.readlines)

# append sum column to header
set lines<0> (+ lines<0> ",SUM")

# determine sums and append them
decl int i sum
for (set i 1) (< i (size lines)) (inc i)
        set sum 0
        for (decl int j) (< j (size (split lines<i> ","))) (inc j)
                set sum (int (+ sum (int (split lines<i> ",")<j>)))
        end for
        set lines<i> (+ lines<i> (+ "," sum))
end for

# delete the file, then create it again
f.delete args<1>
f.create args<1>

# output all lines to the file
for (set i 0) (< i (size lines)) (inc i)
        out lines<i> endl f
end for</lang>
VBScript
<lang vb>'Instantiate the FSO.
Set objFSO = CreateObject("Scripting.FileSystemObject")

'Open the CSV file for reading. The file is in the same folder as the script and named csv_sample.csv.
Set objInCSV = objFSO.OpenTextFile(objFSO.GetParentFolderName(WScript.ScriptFullName) & "\csv_sample.csv",1,False)

'Set header status to account for the first line as the column headers.
IsHeader = True

'Initialize the var for the output string.
OutTxt = ""

'Read each line of the file.
Do Until objInCSV.AtEndOfStream
	line = objInCSV.ReadLine
	If IsHeader Then
		OutTxt = OutTxt & line & ",SUM" & vbCrLf
		IsHeader = False
	Else
		OutTxt = OutTxt & line & "," & AddElements(line) & vbCrLf
	End If
Loop

'Close the file.
objInCSV.Close

'Open the same file for writing.
Set objOutCSV = objFSO.OpenTextFile(objFSO.GetParentFolderName(WScript.ScriptFullName) & "\csv_sample.csv",2,True)

'Write the var OutTxt to the file overwriting existing contents.
objOutCSV.Write OutTxt

'Close the file.
objOutCSV.Close

Set objFSO = Nothing

'Routine to add each element in a row.
Function AddElements(s)
	arr = Split(s,",")
	For i = 0 To UBound(arr)
		AddElements = AddElements + CInt(arr(i))
	Next
End Function</lang>
- Input:
C1,C2,C3,C4,C5
1,5,9,13,17
2,6,10,14,18
3,7,11,15,19
4,8,12,16,20
- Output:
C1,C2,C3,C4,C5,SUM
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60
Vedit macro language
This example adds 100 to the values in each cell at row n+1, column n.
<lang vedit>File_Open("input.csv")
for (#1 = 0; #1 < 4; #1++) {
    Goto_Line(#1+2)                    // line (starting from line 2)
    if (#1) {
        Search(",", ADVANCE+COUNT, #1) // column
    }
    #2 = Num_Eval()                    // #2 = old value
    Del_Char(Chars_Matched)            // delete old value
    Num_Ins(#2+100, LEFT+NOCR)         // write new value
}
File_Save_As("output.csv", OK+NOMSG)</lang>
output.csv:
C1,C2,C3,C4,C5
101,5,9,13,17
2,106,10,14,18
3,7,111,15,19
4,8,12,116,20
Visual FoxPro
<lang vfp>
CLOSE DATABASES ALL
SET SAFETY OFF
MODIFY FILE file1.csv NOEDIT
*!* Create a cursor with integer columns
CREATE CURSOR tmp1 (C1 I, C2 I, C3 I, C4 I, C5 I)
APPEND FROM file1.csv TYPE CSV
SELECT C1, C2, C3, C4, C5, C1+C2+C3+C4+C5 As sum ;
FROM tmp1 INTO CURSOR tmp2
COPY TO file2.csv TYPE CSV
MODIFY FILE file2.csv NOEDIT IN SCREEN
SET SAFETY ON
</lang>
zkl
<lang zkl>csvFile:=File("test.csv");
header:=csvFile.readln().strip();  // remove trailing "\n" and leading white space
listOfLines:=csvFile.pump(List,fcn(line){ line.strip().split(",").apply("toInt") });

newFile:=File("test2.csv","w");
newFile.writeln(header + ",sum");
listOfLines.pump(newFile.writeln,
    fcn(ns){ String(ns.concat(","),",",ns.sum()) });
newFile.close();</lang>
- Output:
$ cat test2.csv
C1,C2,C3,C4,C5,sum
1,5,9,13,17,45
2,6,10,14,18,50
3,7,11,15,19,55
4,8,12,16,20,60