larray.read_csv
- larray.read_csv(filepath_or_buffer, nb_axes=None, index_col=None, sep=',', headersep=None, decimal='.', fill_value=nan, na=nan, sort_rows=False, sort_columns=False, wide=True, dialect='larray', **kwargs) -> Array
Reads a CSV file and returns an array with its contents.
- Parameters
- filepath_or_buffer : str or any file-like object
Path of the CSV file to read, or a file handle.
- nb_axes : int or None, optional
Number of axes of the output array. The first nb_axes - 1 columns and the header of the CSV file are used to set the axes of the output array. If not specified, the number of axes is given by the position of the first column header that includes a \ character, plus one. If no column header includes a \ character, the array is assumed to have one axis. Defaults to None.
- index_col : list or None, optional
Positions of the columns for the first n-1 axes (e.g. [0, 1, 2, 3]). Defaults to None (see nb_axes above).
- sep : str, optional
Separator to use. Defaults to ‘,’.
- headersep : str or None, optional
Specific separator to use for headers. Defaults to None (uses sep).
- decimal : str, optional
Character to use as decimal point. Defaults to ‘.’.
- fill_value : scalar or Array, optional
Value used to fill cells corresponding to label combinations which are not present in the input. Defaults to NaN.
- sort_rows : bool, optional
Whether to sort the rows alphabetically (sorting is more efficient than not sorting). Defaults to False.
- sort_columns : bool, optional
Whether to sort the columns alphabetically (sorting is more efficient than not sorting). Defaults to False.
- wide : bool, optional
Whether to assume the array is stored in “wide” format. If False, the array is assumed to be stored in “narrow” format: one column per axis plus one value column. Defaults to True.
- dialect : {‘classic’, ‘larray’, ‘liam2’}, optional
Name of dialect. Defaults to ‘larray’.
- **kwargs
Extra keyword arguments are passed on to pandas.read_csv.
- Returns
- Array
Notes
Unless arguments specify otherwise, CSV files are assumed to be in the following format:
    axis0_name,axis1_name\axis2_name,axis2_label0,axis2_label1
    axis0_label0,axis1_label0,value,value
    axis0_label0,axis1_label1,value,value
    axis0_label1,axis1_label0,value,value
    axis0_label1,axis1_label1,value,value
For example:
    country,gender\time,2013,2014,2015
    Belgium,Male,5472856,5493792,5524068
    Belgium,Female,5665118,5687048,5713206
    France,Male,31772665,32045129,32174258
    France,Female,33827685,34120851,34283895
    Germany,Male,39380976,39556923,39835457
    Germany,Female,41142770,41210540,41362080
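To make the header convention above concrete, here is a minimal sketch that splits a default-format header row into axis names and the labels of the last axis. It uses only the standard library; `split_header` is a hypothetical helper written for illustration and is not part of larray, whose actual parsing may differ.

```python
import csv
import io

# Two data rows are enough to illustrate the header layout.
SAMPLE = """\
country,gender\\time,2013,2014,2015
Belgium,Male,5472856,5493792,5524068
Belgium,Female,5665118,5687048,5713206
"""


def split_header(header):
    """Split a default-format header row into axis names and last-axis labels.

    Hypothetical helper (not larray API). Assumes one header cell contains
    a backslash, as in the layout described in the Notes.
    """
    for position, cell in enumerate(header):
        if "\\" in cell:
            before, last_axis = cell.split("\\", 1)
            # Cells before the backslash cell name the leading axes.
            axis_names = header[:position] + [before, last_axis]
            # Cells after it are the labels of the last axis.
            column_labels = header[position + 1:]
            return axis_names, column_labels
    raise ValueError("no header cell contains a backslash")


header = next(csv.reader(io.StringIO(SAMPLE)))
axis_names, column_labels = split_header(header)
print(axis_names)     # ['country', 'gender', 'time']
print(column_labels)  # ['2013', '2014', '2015']
```

The number of axis names found this way (three here) matches the number of axes of the array shown in the first example.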
Examples
    >>> csv_dir = get_example_filepath('examples')
    >>> fname = csv_dir / 'population.csv'
    >>> # The data below is derived from a subset of the demo_pjan table from Eurostat
    >>> read_csv(fname)
    country  gender\time      2013      2014      2015
    Belgium         Male   5472856   5493792   5524068
    Belgium       Female   5665118   5687048   5713206
     France         Male  31772665  32045129  32174258
     France       Female  33827685  34120851  34283895
    Germany         Male  39380976  39556923  39835457
    Germany       Female  41142770  41210540  41362080
Missing label combinations
    >>> fname = csv_dir / 'population_missing_values.csv'
    >>> # let's take a look inside the CSV file.
    >>> # there are missing label combinations: (France, Male) and (Germany, Female)
    >>> with open(fname) as f:
    ...     print(f.read().strip())
    country,gender\time,2013,2014,2015
    Belgium,Male,5472856,5493792,5524068
    Belgium,Female,5665118,5687048,5713206
    France,Female,33827685,34120851,34283895
    Germany,Male,39380976,39556923,39835457
    >>> # by default, cells associated with missing label combinations are filled with NaN.
    >>> # In that case, an int array is converted to a float array.
    >>> read_csv(fname)
    country  gender\time        2013        2014        2015
    Belgium         Male   5472856.0   5493792.0   5524068.0
    Belgium       Female   5665118.0   5687048.0   5713206.0
     France         Male         nan         nan         nan
     France       Female  33827685.0  34120851.0  34283895.0
    Germany         Male  39380976.0  39556923.0  39835457.0
    Germany       Female         nan         nan         nan
    >>> # using argument 'fill_value', you can choose which value to use to fill missing cells.
    >>> read_csv(fname, fill_value=0)
    country  gender\time      2013      2014      2015
    Belgium         Male   5472856   5493792   5524068
    Belgium       Female   5665118   5687048   5713206
     France         Male         0         0         0
     France       Female  33827685  34120851  34283895
    Germany         Male  39380976  39556923  39835457
    Germany       Female         0         0         0
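The idea behind fill_value can be sketched without larray: the output covers every combination of the labels found on each axis, and combinations absent from the file receive the fill value. The helpers below (`unique_in_order`, `fill_missing`) are hypothetical names used only for this illustration, and plain dictionaries stand in for the array.

```python
from itertools import product

# Rows present in the file, keyed by (country, gender); one value per year.
rows = {
    ("Belgium", "Male"): [5472856, 5493792, 5524068],
    ("Belgium", "Female"): [5665118, 5687048, 5713206],
    ("France", "Female"): [33827685, 34120851, 34283895],
    ("Germany", "Male"): [39380976, 39556923, 39835457],
}


def unique_in_order(values):
    """Return the distinct values, keeping first-appearance order."""
    return list(dict.fromkeys(values))


def fill_missing(rows, nb_columns, fill_value):
    """Return a dict covering every (country, gender) combination.

    Hypothetical helper (not larray API): combinations absent from
    `rows` get a row made of `fill_value`.
    """
    countries = unique_in_order(country for country, _ in rows)
    genders = unique_in_order(gender for _, gender in rows)
    return {
        key: rows.get(key, [fill_value] * nb_columns)
        for key in product(countries, genders)
    }


dense = fill_missing(rows, nb_columns=3, fill_value=0)
missing = [key for key in dense if key not in rows]
print(missing)  # [('France', 'Male'), ('Germany', 'Female')]
```

Four rows in the file become six rows in the result, which is the same shape change visible in the read_csv outputs above.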
Specify the number of axes of the output array (useful when the name of the last axis is implicit)
    >>> fname = csv_dir / 'population_missing_axis_name.csv'
    >>> # let's take a look inside the CSV file.
    >>> # The name of the last axis is missing.
    >>> with open(fname) as f:
    ...     print(f.read().strip())
    country,gender,2013,2014,2015
    Belgium,Male,5472856,5493792,5524068
    Belgium,Female,5665118,5687048,5713206
    France,Male,31772665,32045129,32174258
    France,Female,33827685,34120851,34283895
    Germany,Male,39380976,39556923,39835457
    Germany,Female,41142770,41210540,41362080
    >>> # read the array stored in the CSV file as is
    >>> arr = read_csv(fname)
    >>> # we expected a 3 x 2 x 3 array with data of type int
    >>> # but we got a 6 x 4 array with data of type object
    >>> arr.info
    6 x 4
     country [6]: 'Belgium' 'Belgium' 'France' 'France' 'Germany' 'Germany'
     {1} [4]: 'gender' '2013' '2014' '2015'
    dtype: object
    memory used: 192 bytes
    >>> # using argument 'nb_axes', you can force the number of axes of the output array
    >>> arr = read_csv(fname, nb_axes=3)
    >>> # as expected, we have a 3 x 2 x 3 array with data of type int
    >>> arr.info
    3 x 2 x 3
     country [3]: 'Belgium' 'France' 'Germany'
     gender [2]: 'Male' 'Female'
     {2} [3]: 2013 2014 2015
    dtype: int64
    memory used: 144 bytes
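The effect of nb_axes on a single row can be sketched as follows. Per the parameter description, the first nb_axes - 1 columns are treated as labels and the remaining cells as data. `split_row` is a hypothetical helper for illustration only, not larray's implementation.

```python
import csv
import io

# Same layout as the file above: the last axis name is missing from the header.
SAMPLE = """\
country,gender,2013,2014,2015
Belgium,Male,5472856,5493792,5524068
Belgium,Female,5665118,5687048,5713206
"""


def split_row(row, nb_axes):
    """Split a data row into axis labels and data cells.

    Hypothetical helper (not larray API): the first nb_axes - 1 columns
    hold labels; everything after them is data for the last axis.
    """
    nb_label_columns = nb_axes - 1
    return tuple(row[:nb_label_columns]), row[nb_label_columns:]


rows = list(csv.reader(io.StringIO(SAMPLE)))
first_data_row = rows[1]

# Two axes: 'Male' lands among the data cells, mixing text with numbers.
two_axes = split_row(first_data_row, nb_axes=2)
# Three axes: both label columns are consumed, leaving only numbers.
three_axes = split_row(first_data_row, nb_axes=3)

print(two_axes)
print(three_axes)
```

With two axes the gender label ends up among the values, which is consistent with the object dtype reported for the 6 x 4 array above.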
Read array saved in “narrow” format (wide=False)
    >>> fname = csv_dir / 'population_narrow_format.csv'
    >>> # let's take a look inside the CSV file.
    >>> # Here, data are stored in a 'narrow' format.
    >>> with open(fname) as f:
    ...     print(f.read().strip())
    country,time,value
    Belgium,2013,11137974
    Belgium,2014,11180840
    Belgium,2015,11237274
    France,2013,65600350
    France,2014,66165980
    France,2015,66458153
    >>> # to read arrays stored in 'narrow' format, you must pass wide=False to read_csv
    >>> read_csv(fname, wide=False)
    country\time      2013      2014      2015
         Belgium  11137974  11180840  11237274
          France  65600350  66165980  66458153
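For readers unfamiliar with the two layouts, the relationship between "narrow" and "wide" can be sketched with the standard library alone. The function `narrow_to_wide` below is a hypothetical illustration for a two-axis file; it is not how larray performs the conversion.

```python
import csv
import io

NARROW = """\
country,time,value
Belgium,2013,11137974
Belgium,2014,11180840
Belgium,2015,11237274
France,2013,65600350
France,2014,66165980
France,2015,66458153
"""


def narrow_to_wide(text):
    """Pivot narrow rows (one column per axis plus a value column) into
    a wide table: a header row, then one row per label of the first axis.

    Hypothetical helper (not larray API); handles two axes only.
    """
    reader = csv.reader(io.StringIO(text))
    row_axis, column_axis, _value_name = next(reader)
    table = {}    # row label -> {column label: value}
    columns = []  # column labels in first-appearance order
    for row_label, column_label, value in reader:
        table.setdefault(row_label, {})[column_label] = value
        if column_label not in columns:
            columns.append(column_label)
    # The two axis names share one header cell, separated by a backslash.
    header = [row_axis + "\\" + column_axis] + columns
    body = [[label] + [cells.get(column, "") for column in columns]
            for label, cells in table.items()]
    return [header] + body


wide = narrow_to_wide(NARROW)
for row in wide:
    print(",".join(row))
```

The printed result uses the default wide layout described in the Notes, with `country\time` as the first header cell.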