larray.read_csv

larray.read_csv(filepath_or_buffer, nb_axes=None, index_col=None, sep=',', headersep=None, decimal='.', fill_value=nan, na=nan, sort_rows=False, sort_columns=False, wide=True, dialect='larray', **kwargs) → Array

Reads a CSV file and returns an array with its contents.

Parameters

filepath_or_buffer (str or any file-like object): Path of the CSV file to read, or a file handle.
nb_axes (int or None, optional): Number of axes of the output array. The first nb_axes - 1 columns and the header of the CSV file are used to set the axes of the output array. If not specified, the number of axes is given by the position of the first column header including a \ character plus one. If no column header includes a \ character, the array is assumed to have one axis. Defaults to None.
index_col (list or None, optional): Positions of the columns holding the first n-1 axes (e.g. [0, 1, 2, 3]). Defaults to None (see nb_axes above).
sep (str, optional): Separator to use. Defaults to ‘,’.
headersep (str or None, optional): Specific separator to use for headers. Defaults to None (uses sep).
decimal (str, optional): Character to use as decimal point. Defaults to ‘.’.
fill_value (scalar or Array, optional): Value used to fill cells corresponding to label combinations which are not present in the input. Defaults to NaN.
sort_rows (bool, optional): Whether to sort the rows alphabetically (sorting is more efficient than not sorting). Defaults to False.
sort_columns (bool, optional): Whether to sort the columns alphabetically (sorting is more efficient than not sorting). Defaults to False.
wide (bool, optional): Whether to assume the array is stored in “wide” format. If False, the array is assumed to be stored in “narrow” format: one column per axis plus one value column. Defaults to True.
dialect ({‘classic’, ‘larray’, ‘liam2’}, optional): Name of dialect. Defaults to ‘larray’.
**kwargs: Extra keyword arguments are passed on to pandas.read_csv.

Returns

Array

Notes

Unless arguments specify otherwise, CSV files are assumed to use this format:

axis0_name,axis1_name\axis2_name,axis2_label0,axis2_label1
axis0_label0,axis1_label0,value,value
axis0_label0,axis1_label1,value,value
axis0_label1,axis1_label0,value,value
axis0_label1,axis1_label1,value,value

For example:

country,gender\time,2013,2014,2015
Belgium,Male,5472856,5493792,5524068
Belgium,Female,5665118,5687048,5713206
France,Male,31772665,32045129,32174258
France,Female,33827685,34120851,34283895
Germany,Male,39380976,39556923,39835457
Germany,Female,41142770,41210540,41362080
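
To make the header convention above concrete, here is a minimal sketch in plain Python of how a wide-format header row could be split into axis names and last-axis labels. The helper axes_from_header is hypothetical and written only for illustration; it is not larray's implementation. It counts column positions from 1 when applying the "position plus one" rule, and it deliberately refuses to guess when no header contains a \ character.

```python
import csv
import io


def axes_from_header(header, nb_axes=None):
    """Split a wide-format header row into axis names and last-axis labels.

    Hypothetical helper for illustration only; it is not larray's code.
    """
    if nb_axes is None:
        # 1-based position of the first header containing a backslash, plus one
        positions = [i for i, name in enumerate(header, start=1) if "\\" in name]
        if not positions:
            # simplification of this sketch: do not guess without a backslash
            raise ValueError("no backslash in header: pass nb_axes explicitly")
        nb_axes = positions[0] + 1
    if nb_axes < 2:
        raise ValueError("this sketch only handles arrays with 2 or more axes")
    # the first nb_axes - 1 columns hold the labels of the leading axes
    index_headers = header[:nb_axes - 1]
    # the remaining headers are the labels of the last axis (kept as strings here)
    last_axis_labels = header[nb_axes - 1:]
    # 'gender\time' names two axes at once; a plain header names only one
    *leading, last = index_headers
    before, _, after = last.partition("\\")
    axis_names = leading + [before, after or None]
    return axis_names, last_axis_labels


header = next(csv.reader(io.StringIO("country,gender\\time,2013,2014,2015\n")))
names, labels = axes_from_header(header)
print(names)   # ['country', 'gender', 'time']
print(labels)  # ['2013', '2014', '2015']
```

When the header has no \ character, passing nb_axes explicitly leaves the last axis without a name (None in this sketch).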

Examples

>>> csv_dir = get_example_filepath('examples')
>>> fname = csv_dir / 'population.csv'

>>> # The data below is derived from a subset of the demo_pjan table from Eurostat
>>> read_csv(fname)
country  gender\time      2013      2014      2015
Belgium         Male   5472856   5493792   5524068
Belgium       Female   5665118   5687048   5713206
 France         Male  31772665  32045129  32174258
 France       Female  33827685  34120851  34283895
Germany         Male  39380976  39556923  39835457
Germany       Female  41142770  41210540  41362080

Missing label combinations

>>> fname = csv_dir / 'population_missing_values.csv'
>>> # let's take a look inside the CSV file.
>>> # there are missing label combinations: (France, Male) and (Germany, Female)
>>> with open(fname) as f:
...     print(f.read().strip())
country,gender\time,2013,2014,2015
Belgium,Male,5472856,5493792,5524068
Belgium,Female,5665118,5687048,5713206
France,Female,33827685,34120851,34283895
Germany,Male,39380976,39556923,39835457
>>> # by default, cells associated with missing label combinations are filled with NaN.
>>> # In that case, an int array is converted to a float array.
>>> read_csv(fname)
country  gender\time        2013        2014        2015
Belgium         Male   5472856.0   5493792.0   5524068.0
Belgium       Female   5665118.0   5687048.0   5713206.0
 France         Male         nan         nan         nan
 France       Female  33827685.0  34120851.0  34283895.0
Germany         Male  39380976.0  39556923.0  39835457.0
Germany       Female         nan         nan         nan
>>> # using argument 'fill_value', you can choose which value to use to fill missing cells.
>>> read_csv(fname, fill_value=0)
country  gender\time      2013      2014      2015
Belgium         Male   5472856   5493792   5524068
Belgium       Female   5665118   5687048   5713206
 France         Male         0         0         0
 France       Female  33827685  34120851  34283895
Germany         Male  39380976  39556923  39835457
Germany       Female         0         0         0
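
The filling behaviour shown above can be pictured as taking the full product of the axis labels and substituting fill_value wherever a combination has no row in the file. The sketch below does this with the standard library only. The helper fill_missing_rows is hypothetical and is not how larray implements it; it assumes a three-axis wide file whose first two columns are the leading axes.

```python
import csv
import io
from itertools import product

CSV_TEXT = """\
country,gender\\time,2013,2014,2015
Belgium,Male,5472856,5493792,5524068
Belgium,Female,5665118,5687048,5713206
France,Female,33827685,34120851,34283895
Germany,Male,39380976,39556923,39835457
"""


def fill_missing_rows(text, fill_value):
    """Return values for every (country, gender) pair, filling absent ones.

    Hypothetical helper for illustration only; it is not larray's code.
    """
    header, *rows = csv.reader(io.StringIO(text))
    # number of value columns (everything after the two label columns)
    nb_values = len(header) - 2
    present = {(r[0], r[1]): [int(v) for v in r[2:]] for r in rows}
    # unique labels of each axis, in order of first appearance
    countries = list(dict.fromkeys(r[0] for r in rows))
    genders = list(dict.fromkeys(r[1] for r in rows))
    filled = {}
    for key in product(countries, genders):
        # combinations absent from the file get a row of fill_value
        filled[key] = present.get(key, [fill_value] * nb_values)
    return filled


filled = fill_missing_rows(CSV_TEXT, fill_value=0)
print(filled[("France", "Male")])     # [0, 0, 0]
print(filled[("Germany", "Female")])  # [0, 0, 0]
```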

Specify the number of axes of the output array (useful when the name of the last axis is implicit)

>>> fname = csv_dir / 'population_missing_axis_name.csv'
>>> # let's take a look inside the CSV file.
>>> # The name of the last axis is missing.
>>> with open(fname) as f:
...     print(f.read().strip())
country,gender,2013,2014,2015
Belgium,Male,5472856,5493792,5524068
Belgium,Female,5665118,5687048,5713206
France,Male,31772665,32045129,32174258
France,Female,33827685,34120851,34283895
Germany,Male,39380976,39556923,39835457
Germany,Female,41142770,41210540,41362080
>>> # read the array stored in the CSV file as is
>>> arr = read_csv(fname)
>>> # we expected a 3 x 2 x 3 array with data of type int
>>> # but we got a 6 x 4 array with data of type object
>>> arr.info
6 x 4
 country [6]: 'Belgium' 'Belgium' 'France' 'France' 'Germany' 'Germany'
 {1} [4]: 'gender' '2013' '2014' '2015'
dtype: object
memory used: 192 bytes
>>> # using argument 'nb_axes', you can force the number of axes of the output array
>>> arr = read_csv(fname, nb_axes=3)
>>> # as expected, we have a 3 x 2 x 3 array with data of type int
>>> arr.info
3 x 2 x 3
 country [3]: 'Belgium' 'France' 'Germany'
 gender [2]: 'Male' 'Female'
 {2} [3]: 2013 2014 2015
dtype: int64
memory used: 144 bytes

Read array saved in “narrow” format (wide=False)

>>> fname = csv_dir / 'population_narrow_format.csv'
>>> # let's take a look inside the CSV file.
>>> # Here, data are stored in a 'narrow' format.
>>> with open(fname) as f:
...     print(f.read().strip())
country,time,value
Belgium,2013,11137974
Belgium,2014,11180840
Belgium,2015,11237274
France,2013,65600350
France,2014,66165980
France,2015,66458153
>>> # to read arrays stored in 'narrow' format, you must pass wide=False to read_csv
>>> read_csv(fname, wide=False)
country\time      2013      2014      2015
     Belgium  11137974  11180840  11237274
      France  65600350  66165980  66458153
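
To show how the two layouts relate, the following sketch pivots narrow-format CSV text into wide-format rows using only the standard library. The helper narrow_to_wide is hypothetical and for illustration only; it is not larray's code. It assumes that the last column holds the values, that there are at least two axes, and it leaves every cell as a string.

```python
import csv
import io

NARROW_TEXT = """\
country,time,value
Belgium,2013,11137974
Belgium,2014,11180840
Belgium,2015,11237274
France,2013,65600350
France,2014,66165980
France,2015,66458153
"""


def narrow_to_wide(text):
    """Pivot narrow-format CSV text into a list of wide-format rows.

    Hypothetical helper for illustration only; it is not larray's code.
    """
    header, *rows = csv.reader(io.StringIO(text))
    # one column per axis, then one value column
    *axis_names, _value_name = header
    # labels of the leading axes (row keys) and of the last axis (columns)
    row_keys = list(dict.fromkeys(tuple(r[:-2]) for r in rows))
    col_labels = list(dict.fromkeys(r[-2] for r in rows))
    cells = {(tuple(r[:-2]), r[-2]): r[-1] for r in rows}
    # the last two axis names share one header cell, joined by a backslash
    wide = [axis_names[:-2] + ["\\".join(axis_names[-2:])] + col_labels]
    for key in row_keys:
        # combinations absent from the input are left empty in this sketch
        wide.append(list(key) + [cells.get((key, c), "") for c in col_labels])
    return wide


wide_rows = narrow_to_wide(NARROW_TEXT)
for row in wide_rows:
    print(",".join(row))
```

The printed rows follow the default wide layout described in the Notes, with country\time as the first header cell.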