iris.pandas

Provide conversion to and from Pandas data structures.

See also: http://pandas.pydata.org/

iris.pandas.as_cube(pandas_array, copy=True, calendars=None)

Convert a Pandas Series/DataFrame into a 1D/2D Iris Cube.

Deprecated since version 3.3.0: This function is scheduled for removal in a future release, being replaced by iris.pandas.as_cubes(), which offers richer dimensional intelligence.

Parameters
  • pandas_array (pandas.Series or pandas.DataFrame) – The Pandas object to convert

  • copy (bool, default=True) – Whether to copy pandas_array, or to create array views where possible. Provided in case of memory limit concerns.

  • calendars (dict, optional) – A dict mapping a dimension to a calendar. Required to convert datetime indices/columns.

Notes

This function will copy your data by default.

Example usage:

as_cube(series, calendars={0: cf_units.CALENDAR_360_DAY})
as_cube(data_frame, calendars={1: cf_units.CALENDAR_STANDARD})

iris.pandas.as_cubes(pandas_structure, copy=True, calendars=None, aux_coord_cols=None, cell_measure_cols=None, ancillary_variable_cols=None)

Convert a Pandas Series/DataFrame into n-dimensional Iris Cubes, including dimensional metadata.

The index of pandas_structure will be used for generating the Cube dimension(s) and DimCoords. Other dimensional metadata may span multiple dimensions - based on how the column values vary with the index values.
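As a rough illustration of the index-to-dimension mapping described above, the sketch below uses only pandas and NumPy to summarise the grid that an index implies. The helper name `describe_index_grid` is hypothetical, and the full-grid check is an assumption about what a clean conversion needs, not a description of Iris internals.

```python
import numpy as np
import pandas as pd


def describe_index_grid(df):
    """Summarise the grid implied by a DataFrame's index (hypothetical helper).

    Returns (shape, is_full_grid). Illustration only; this is not how
    Iris itself inspects the index.
    """
    index = df.index
    # One dimension per index level, sized by that level's distinct values.
    shape = tuple(
        index.get_level_values(level).nunique() for level in range(index.nlevels)
    )
    # A full grid has exactly one row per combination of level values.
    is_full_grid = bool(index.is_unique) and len(df) == int(np.prod(shape))
    return shape, is_full_grid


df = pd.DataFrame({
    "air_temperature": np.arange(300, 312),
    "longitude": [0, 10] * 6,
    "latitude": [25, 25, 35, 35] * 3,
    "height": [0] * 4 + [100] * 4 + [200] * 4,
}).set_index(["longitude", "latitude", "height"]).sort_index()

shape, is_full_grid = describe_index_grid(df)
print(shape, is_full_grid)
```

A check like this can flag duplicated or missing index combinations before a conversion is attempted.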

Parameters
  • pandas_structure (pandas.Series or pandas.DataFrame) – The Pandas object to convert

  • copy (bool, default=True) – Whether the Cube data is a copy of the pandas_structure column, or a view of the same array. Arrays other than the data (coords etc.) are always copies. This option is provided to help with memory size concerns.

  • calendars (dict, optional) – Calendar conversions for individual date-time coordinate columns/index-levels e.g. {"my_column": cf_units.CALENDAR_360_DAY}.

  • aux_coord_cols (list of str, optional) – Names of columns to be converted into AuxCoord objects.

  • cell_measure_cols (list of str, optional) – Names of columns to be converted into CellMeasure objects.

  • ancillary_variable_cols (list of str, optional) – Names of columns to be converted into AncillaryVariable objects.
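Since calendars is keyed by column or index-level name, it can help to list the date-time names in a structure first. The following is a minimal pandas-only sketch; the helper `datetime_names` is hypothetical and is not part of iris.pandas.

```python
import pandas as pd


def datetime_names(df):
    """Return names of date-time index levels and columns (hypothetical helper)."""
    names = []
    index = df.index
    # Check each index level; this also works for a plain single-level Index.
    for level in range(index.nlevels):
        values = index.get_level_values(level)
        if pd.api.types.is_datetime64_any_dtype(values):
            names.append(values.name)
    # Then check the ordinary columns.
    for col in df.columns:
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            names.append(col)
    return names


time_index = pd.date_range("2000-01-01", periods=3, name="time")
df = pd.DataFrame({"air_temperature": [300, 301, 302]}, index=time_index)
candidates = datetime_names(df)
print(candidates)
```

Each returned name is a candidate key for the calendars dict.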

Returns

One Cube for each column not referenced in aux_coord_cols/cell_measure_cols/ancillary_variable_cols.

Return type

CubeList
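The rule for which columns yield Cubes can be sketched without Iris. The helper below, `data_column_names`, is a hypothetical illustration of the selection described under Returns, not the library's own code.

```python
import pandas as pd


def data_column_names(df, aux_coord_cols=None, cell_measure_cols=None,
                      ancillary_variable_cols=None):
    """List columns left over once metadata columns are set aside (hypothetical)."""
    metadata_cols = set()
    for group in (aux_coord_cols, cell_measure_cols, ancillary_variable_cols):
        metadata_cols.update(group or [])
    # Preserve the DataFrame's own column order.
    return [col for col in df.columns if col not in metadata_cols]


df = pd.DataFrame({
    "air_temperature": [300, 301],
    "air_pressure": [1000, 1001],
    "in_region": [True, False],
})
cube_columns = data_column_names(df, aux_coord_cols=["in_region"])
print(cube_columns)
```

With "in_region" set aside as an auxiliary coordinate, two columns remain, one for each expected Cube.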

Notes

A DataFrame using columns as a second data dimension will need to be ‘melted’ before conversion. See the Examples for how.

Dask DataFrames are not supported.

Examples

>>> from iris.pandas import as_cubes
>>> import numpy as np
>>> from pandas import DataFrame, Series

Converting a simple Series:

>>> my_series = Series([300, 301, 302], name="air_temperature")
>>> converted_cubes = as_cubes(my_series)
>>> print(converted_cubes)
0: air_temperature / (unknown)         (unknown: 3)
>>> print(converted_cubes[0])
air_temperature / (unknown)         (unknown: 3)
    Dimension coordinates:
        unknown                             x

A DataFrame, with a custom index becoming the DimCoord:

>>> my_df = DataFrame({
...     "air_temperature": [300, 301, 302],
...     "longitude": [30, 40, 50]
...     })
>>> my_df = my_df.set_index("longitude")
>>> converted_cubes = as_cubes(my_df)
>>> print(converted_cubes[0])
air_temperature / (unknown)         (longitude: 3)
    Dimension coordinates:
        longitude                             x

A DataFrame representing two 3-dimensional datasets, including a 2-dimensional AuxCoord:

>>> my_df = DataFrame({
...     "air_temperature": np.arange(300, 312, 1),
...     "air_pressure": np.arange(1000, 1012, 1),
...     "longitude": [0, 10] * 6,
...     "latitude": [25, 25, 35, 35] * 3,
...     "height": ([0] * 4) + ([100] * 4) + ([200] * 4),
...     "in_region": [True, False, False, False] * 3
... })
>>> print(my_df)
    air_temperature  air_pressure  longitude  latitude  height  in_region
0               300          1000          0        25       0       True
1               301          1001         10        25       0      False
2               302          1002          0        35       0      False
3               303          1003         10        35       0      False
4               304          1004          0        25     100       True
5               305          1005         10        25     100      False
6               306          1006          0        35     100      False
7               307          1007         10        35     100      False
8               308          1008          0        25     200       True
9               309          1009         10        25     200      False
10              310          1010          0        35     200      False
11              311          1011         10        35     200      False
>>> my_df = my_df.set_index(["longitude", "latitude", "height"])
>>> my_df = my_df.sort_index()
>>> converted_cubes = as_cubes(my_df, aux_coord_cols=["in_region"])
>>> print(converted_cubes)
0: air_temperature / (unknown)         (longitude: 2; latitude: 2; height: 3)
1: air_pressure / (unknown)            (longitude: 2; latitude: 2; height: 3)
>>> print(converted_cubes[0])
air_temperature / (unknown)         (longitude: 2; latitude: 2; height: 3)
    Dimension coordinates:
        longitude                             x            -          -
        latitude                              -            x          -
        height                                -            -          x
    Auxiliary coordinates:
        in_region                             x            x          -

Pandas uses NaN rather than masking data. Converted Cubes can be masked in downstream user code:

>>> my_series = Series([300, np.nan, 302], name="air_temperature")
>>> converted_cube = as_cubes(my_series)[0]
>>> print(converted_cube.data)
[300.  nan 302.]
>>> converted_cube.data = np.ma.masked_invalid(converted_cube.data)
>>> print(converted_cube.data)
[300.0 -- 302.0]

If the DataFrame uses columns as a second dimension, pandas.melt() should be used to convert the data to the expected n-dimensional format:

>>> my_df = DataFrame({
...     "latitude": [35, 25],
...     0: [300, 301],
...     10: [302, 303],
... })
>>> print(my_df)
   latitude    0   10
0        35  300  302
1        25  301  303
>>> my_df = my_df.melt(
...     id_vars=["latitude"],
...     value_vars=[0, 10],
...     var_name="longitude",
...     value_name="air_temperature"
... )
>>> print(my_df)
   latitude longitude  air_temperature
0        35         0              300
1        25         0              301
2        35        10              302
3        25        10              303
>>> my_df = my_df.set_index(["latitude", "longitude"])
>>> my_df = my_df.sort_index()
>>> converted_cube = as_cubes(my_df)[0]
>>> print(converted_cube)
air_temperature / (unknown)         (latitude: 2; longitude: 2)
    Dimension coordinates:
        latitude                             x             -
        longitude                            -             x

iris.pandas.as_data_frame(cube, copy=True)

Convert a 2D cube to a Pandas DataFrame.

Parameters

  • cube – The cube to convert to a Pandas DataFrame.

  • copy (bool, default=True) – Whether to make a copy of the data. Must be True for masked data and some data types (see notes below).

Note

This function will copy your data by default. If you have a large array that cannot be copied, make sure it is not masked and use copy=False.

Note

Pandas will sometimes make a copy of the array, for example when creating from an int32 array. Iris will detect this and raise an exception if copy=False.
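Whether pandas has copied an array can be checked with NumPy's memory-overlap test. This is a minimal sketch of that idea, assuming a plain 2D NumPy array as input. The helper `frame_shares_memory` is hypothetical, it is not claimed to be the check Iris performs, and the copy=False results depend on the installed pandas version.

```python
import numpy as np
import pandas as pd


def frame_shares_memory(array, copy=False):
    """Report whether a DataFrame built from `array` still uses its buffer."""
    frame = pd.DataFrame(array, copy=copy)
    return bool(np.shares_memory(frame.to_numpy(), array))


float_data = np.zeros((2, 3), dtype=np.float64)
int_data = np.zeros((2, 3), dtype=np.int32)

# An explicit copy never shares memory with the source array.
copied = frame_shares_memory(float_data, copy=True)
# With copy=False the outcome is up to pandas and may differ by dtype/version.
float_shared = frame_shares_memory(float_data)
int_shared = frame_shares_memory(int_data)
print(copied, float_shared, int_shared)
```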

iris.pandas.as_series(cube, copy=True)

Convert a 1D cube to a Pandas Series.

Parameters

  • cube – The cube to convert to a Pandas Series.

  • copy (bool, default=True) – Whether to make a copy of the data. Must be True for masked data.

Note

This function will copy your data by default. If you have a large array that cannot be copied, make sure it is not masked and use copy=False.
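The masked-data restriction suggests a simple pre-check before choosing copy=False. The sketch below uses NumPy only. `copy_flag_for` is a hypothetical helper, and it assumes that any masked-array type, even one with no points actually masked, needs copy=True.

```python
import numpy as np


def copy_flag_for(data):
    """Pick a `copy` value: masked arrays are assumed to need copy=True."""
    return bool(np.ma.isMaskedArray(data))


plain = np.array([300.0, 301.0, 302.0])
masked = np.ma.masked_invalid([300.0, np.nan, 302.0])
print(copy_flag_for(plain), copy_flag_for(masked))
```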