Ibis

the portable Python dataframe library

Why Ibis? Installation Tutorial: getting started Blog GitHub Chat RSS

Install

We recommend starting with the default backend (DuckDB).

pip install 'ibis-framework[duckdb,examples]'

1: Install Ibis with the DuckDB backend along with examples.

Show supported backends

Backends

Need to use Ibis with a backend that isn’t currently supported? Let us know!

You can install Ibis and a supported backend with pip, conda, mamba, or pixi.

Install with the bigquery extra:

pip install 'ibis-framework[bigquery]'

Connect using ibis.bigquery.connect.

Warning

Note that the ibis-framework package is not the same as the ibis package in PyPI. These two libraries cannot coexist in the same Python environment, as they are both imported with the ibis module name.

Install with the clickhouse extra:

pip install 'ibis-framework[clickhouse]'

Connect using ibis.clickhouse.connect.

Warning

Install with the dask extra:

pip install 'ibis-framework[dask]'

Connect using ibis.dask.connect.

Warning

Install with the datafusion extra:

pip install 'ibis-framework[datafusion]'

Connect using ibis.datafusion.connect.

Warning

Install with the druid extra:

pip install 'ibis-framework[druid]'

Connect using ibis.druid.connect.

Warning

Install with the duckdb extra:

pip install 'ibis-framework[duckdb]'

Connect using ibis.duckdb.connect.

Warning

Install with the exasol extra:

pip install 'ibis-framework[exasol]'

Connect using ibis.exasol.connect.

Warning

Install alongside the apache-flink package:

pip install ibis-framework apache-flink

Connect using ibis.flink.connect.

Warning

Install with the impala extra:

pip install 'ibis-framework[impala]'

Connect using ibis.impala.connect.

Warning

Install with the mssql extra:

pip install 'ibis-framework[mssql]'

Connect using ibis.mssql.connect.

Warning

Install with the mysql extra:

pip install 'ibis-framework[mysql]'

Connect using ibis.mysql.connect.

Warning

Install with the oracle extra:

pip install 'ibis-framework[oracle]'

Connect using ibis.oracle.connect.

Warning

Install with the pandas extra:

pip install 'ibis-framework[pandas]'

Connect using ibis.pandas.connect.

Warning

Install with the polars extra:

pip install 'ibis-framework[polars]'

Connect using ibis.polars.connect.

Warning

Install with the postgres extra:

pip install 'ibis-framework[postgres]'

Connect using ibis.postgres.connect.

Warning

Install with the pyspark extra:

pip install 'ibis-framework[pyspark]'

Connect using ibis.pyspark.connect.

Warning

Install with the risingwave extra:

pip install 'ibis-framework[risingwave]'

Connect using ibis.risingwave.connect.

Warning

Install with the snowflake extra:

pip install 'ibis-framework[snowflake]'

Connect using ibis.snowflake.connect.

Warning

Install with the sqlite extra:

pip install 'ibis-framework[sqlite]'

Connect using ibis.sqlite.connect.

Warning

Install with the trino extra:

pip install 'ibis-framework[trino]'

Connect using ibis.trino.connect.

Warning

Install the ibis-bigquery package:

conda install -c conda-forge ibis-bigquery

Connect using ibis.bigquery.connect.

Install the ibis-clickhouse package:

conda install -c conda-forge ibis-clickhouse

Connect using ibis.clickhouse.connect.

Install the ibis-dask package:

conda install -c conda-forge ibis-dask

Connect using ibis.dask.connect.

Install the ibis-datafusion package:

conda install -c conda-forge ibis-datafusion

Connect using ibis.datafusion.connect.

Install the ibis-druid package:

conda install -c conda-forge ibis-druid

Connect using ibis.druid.connect.

Install the ibis-duckdb package:

conda install -c conda-forge ibis-duckdb

Connect using ibis.duckdb.connect.

Install the ibis-exasol package:

conda install -c conda-forge ibis-exasol

Connect using ibis.exasol.connect.

Install the ibis-flink package:

conda install -c conda-forge ibis-flink

Connect using ibis.flink.connect.

Install the ibis-impala package:

conda install -c conda-forge ibis-impala

Connect using ibis.impala.connect.

Install the ibis-mssql package:

conda install -c conda-forge ibis-mssql

Connect using ibis.mssql.connect.

Install the ibis-mysql package:

conda install -c conda-forge ibis-mysql

Connect using ibis.mysql.connect.

Install the ibis-oracle package:

conda install -c conda-forge ibis-oracle

Connect using ibis.oracle.connect.

Install the ibis-pandas package:

conda install -c conda-forge ibis-pandas

Connect using ibis.pandas.connect.

Install the ibis-polars package:

conda install -c conda-forge ibis-polars

Connect using ibis.polars.connect.

Install the ibis-postgres package:

conda install -c conda-forge ibis-postgres

Connect using ibis.postgres.connect.

Install the ibis-pyspark package:

conda install -c conda-forge ibis-pyspark

Connect using ibis.pyspark.connect.

Install the ibis-risingwave package:

conda install -c conda-forge ibis-risingwave

Connect using ibis.risingwave.connect.

Install the ibis-snowflake package:

conda install -c conda-forge ibis-snowflake

Connect using ibis.snowflake.connect.

Install the ibis-sqlite package:

conda install -c conda-forge ibis-sqlite

Connect using ibis.sqlite.connect.

Install the ibis-trino package:

conda install -c conda-forge ibis-trino

Connect using ibis.trino.connect.

Install the ibis-bigquery package:

mamba install -c conda-forge ibis-bigquery

Connect using ibis.bigquery.connect.

Install the ibis-clickhouse package:

mamba install -c conda-forge ibis-clickhouse

Connect using ibis.clickhouse.connect.

Install the ibis-dask package:

mamba install -c conda-forge ibis-dask

Connect using ibis.dask.connect.

Install the ibis-datafusion package:

mamba install -c conda-forge ibis-datafusion

Connect using ibis.datafusion.connect.

Install the ibis-druid package:

mamba install -c conda-forge ibis-druid

Connect using ibis.druid.connect.

Install the ibis-duckdb package:

mamba install -c conda-forge ibis-duckdb

Connect using ibis.duckdb.connect.

Install the ibis-exasol package:

mamba install -c conda-forge ibis-exasol

Connect using ibis.exasol.connect.

Install the ibis-flink package:

mamba install -c conda-forge ibis-flink

Connect using ibis.flink.connect.

Install the ibis-impala package:

mamba install -c conda-forge ibis-impala

Connect using ibis.impala.connect.

Install the ibis-mssql package:

mamba install -c conda-forge ibis-mssql

Connect using ibis.mssql.connect.

Install the ibis-mysql package:

mamba install -c conda-forge ibis-mysql

Connect using ibis.mysql.connect.

Install the ibis-oracle package:

mamba install -c conda-forge ibis-oracle

Connect using ibis.oracle.connect.

Install the ibis-pandas package:

mamba install -c conda-forge ibis-pandas

Connect using ibis.pandas.connect.

Install the ibis-polars package:

mamba install -c conda-forge ibis-polars

Connect using ibis.polars.connect.

Install the ibis-postgres package:

mamba install -c conda-forge ibis-postgres

Connect using ibis.postgres.connect.

Install the ibis-pyspark package:

mamba install -c conda-forge ibis-pyspark

Connect using ibis.pyspark.connect.

Install the ibis-risingwave package:

mamba install -c conda-forge ibis-risingwave

Connect using ibis.risingwave.connect.

Install the ibis-snowflake package:

mamba install -c conda-forge ibis-snowflake

Connect using ibis.snowflake.connect.

Install the ibis-sqlite package:

mamba install -c conda-forge ibis-sqlite

Connect using ibis.sqlite.connect.

Install the ibis-trino package:

mamba install -c conda-forge ibis-trino

Connect using ibis.trino.connect.

Add the ibis-bigquery package:

pixi add ibis-bigquery

Connect using ibis.bigquery.connect.

Add the ibis-clickhouse package:

pixi add ibis-clickhouse

Connect using ibis.clickhouse.connect.

Add the ibis-dask package:

pixi add ibis-dask

Connect using ibis.dask.connect.

Add the ibis-datafusion package:

pixi add ibis-datafusion

Connect using ibis.datafusion.connect.

Add the ibis-druid package:

pixi add ibis-druid

Connect using ibis.druid.connect.

Add the ibis-duckdb package:

pixi add ibis-duckdb

Connect using ibis.duckdb.connect.

Add the ibis-exasol package:

pixi add ibis-exasol

Connect using ibis.exasol.connect.

Add the ibis-flink package:

pixi add ibis-flink

Connect using ibis.flink.connect.

Add the ibis-impala package:

pixi add ibis-impala

Connect using ibis.impala.connect.

Add the ibis-mssql package:

pixi add ibis-mssql

Connect using ibis.mssql.connect.

Add the ibis-mysql package:

pixi add ibis-mysql

Connect using ibis.mysql.connect.

Add the ibis-oracle package:

pixi add ibis-oracle

Connect using ibis.oracle.connect.

Add the ibis-pandas package:

pixi add ibis-pandas

Connect using ibis.pandas.connect.

Add the ibis-polars package:

pixi add ibis-polars

Connect using ibis.polars.connect.

Add the ibis-postgres package:

pixi add ibis-postgres

Connect using ibis.postgres.connect.

Add the ibis-pyspark package:

pixi add ibis-pyspark

Connect using ibis.pyspark.connect.

Add the ibis-risingwave package:

pixi add ibis-risingwave

Connect using ibis.risingwave.connect.

Add the ibis-snowflake package:

pixi add ibis-snowflake

Connect using ibis.snowflake.connect.

Add the ibis-sqlite package:

pixi add ibis-sqlite

Connect using ibis.sqlite.connect.

Add the ibis-trino package:

pixi add ibis-trino

Connect using ibis.trino.connect.

See the backend support matrix for details on operations supported. Open a feature request if you’d like to see support for an operation in a given backend. If the backend supports it, we’ll do our best to add it quickly!

Show quickstart

Quickstart

See the getting started tutorial for a more in-depth introduction to Ibis. Below is a quick overview.

import ibis
import ibis.selectors as s

ibis.options.interactive = True

t = ibis.examples.penguins.fetch()
t.head(3)

1: Ensure you install Ibis first.
2: Use interactive mode for exploratory data analysis (EDA) or demos.
3: Load a dataset from the built-in examples.
4: Display the table.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Ibis is a dataframe library with familiar syntax.

t[10:15]

1: Display a slice of the table.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           37.8 │          17.1 │               186 │        3300 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           37.8 │          17.3 │               180 │        3700 │ NULL   │  2007 │
│ Adelie  │ Torgersen │           41.1 │          17.6 │               182 │        3200 │ female │  2007 │
│ Adelie  │ Torgersen │           38.6 │          21.2 │               191 │        3800 │ male   │  2007 │
│ Adelie  │ Torgersen │           34.6 │          21.1 │               198 │        4400 │ male   │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

Show analytics

Analytics

Ibis is built for easy analytics at scale in Python.

(
    t.filter(ibis._["body_mass_g"] != None)
    .group_by(["species", "island"])
    .aggregate(count=ibis._.count())
    .order_by(ibis.desc("count"))
)

1: Group by species and island, and compute the number of rows in each group.

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Gentoo    │ Biscoe    │   123 │
│ Chinstrap │ Dream     │    68 │
│ Adelie    │ Dream     │    56 │
│ Adelie    │ Torgersen │    51 │
│ Adelie    │ Biscoe    │    44 │
└───────────┴───────────┴───────┘

Show EDA + visualization

Exploratory data analysis (EDA) and visualization

Exploratory data analysis

Ibis has built-in methods for exploration and visualization.

num_species = int(t.select("species").nunique().to_pandas())
t["species"].topk(num_species)

1: Compute the number of species in the dataset.
2: Display the top species by count.

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ species   ┃ CountStar(penguins) ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ string    │ int64               │
├───────────┼─────────────────────┤
│ Adelie    │                 152 │
│ Gentoo    │                 124 │
│ Chinstrap │                  68 │
└───────────┴─────────────────────┘

Visualization

Ibis works with any Python plotting library that supports the dataframe interchange protocol.

grouped = (
    t.group_by("species")
    .aggregate(count=ibis._.count())
    .order_by(ibis.desc("count"))
)
grouped

1: Setup data to plot.
2: Display the table.

┏━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━┩
│ string    │ int64 │
├───────────┼───────┤
│ Adelie    │   152 │
│ Gentoo    │   124 │
│ Chinstrap │    68 │
└───────────┴───────┘

pip install altair

import altair as alt

chart = (
    alt.Chart(grouped.to_pandas())
    .mark_bar()
    .encode(
        x="species",
        y="count",
        tooltip=["species", "count"],
    )
    .properties(width=600, height=400)
    .interactive()
)
chart

pip install matplotlib

import matplotlib.pyplot as plt

chart = grouped.to_pandas().plot.bar(
    x="species",
    y="count",
    figsize=(600 / 100, 400 / 100),
)
plt.show()

pip install plotly

import plotly.express as px

chart = px.bar(
    grouped.to_pandas(),
    x="species",
    y="count",
    width=600,
    height=400,
)
chart

pip install plotnine

from plotnine import ggplot, aes, geom_bar, theme

chart = (
    ggplot(
        grouped,
        aes(x="species", y="count"),
    )
    + geom_bar(stat="identity")
    + theme(figure_size=(600 / 100, 400 / 100))
)
chart

pip install seaborn

import seaborn as sns

chart = sns.barplot(
    data=grouped.to_pandas(),
    x="species",
    y="count",
)
chart.figure.set_size_inches(600 / 100, 400 / 100)

Show data science

Data science

Use Ibis with your favorite data science libraries for concise and efficient workflows.

import ibis.selectors as s


def transform(t):
    t = t.mutate(
        s.across(s.numeric(), {"zscore": lambda x: (x - x.mean()) / x.std()})
    ).dropna()
    return t


f = transform(t.drop("year"))
f.select("species", "island", s.contains("zscore"))

1: Import the selectors module.
2: Define a function to transform the table for code reuse (compute z-scores on numeric columns).
3: Apply the function to the table and assign it to a new variable.
4: Display the transformed table.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm_zscore ┃ bill_depth_mm_zscore ┃ flipper_length_mm_zscore ┃ body_mass_g_zscore ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━┩
│ string  │ string    │ float64               │ float64              │ float64                  │ float64            │
├─────────┼───────────┼───────────────────────┼──────────────────────┼──────────────────────────┼────────────────────┤
│ Adelie  │ Torgersen │             -0.883205 │             0.784300 │                -1.416272 │          -0.563317 │
│ Adelie  │ Torgersen │             -0.809939 │             0.126003 │                -1.060696 │          -0.500969 │
│ Adelie  │ Torgersen │             -0.663408 │             0.429833 │                -0.420660 │          -1.186793 │
│ Adelie  │ Torgersen │             -1.322799 │             1.088129 │                -0.562890 │          -0.937403 │
│ Adelie  │ Torgersen │             -0.846572 │             1.746426 │                -0.776236 │          -0.688012 │
│ Adelie  │ Torgersen │             -0.919837 │             0.328556 │                -1.416272 │          -0.719186 │
│ Adelie  │ Torgersen │             -0.864888 │             1.240044 │                -0.420660 │           0.590115 │
│ Adelie  │ Torgersen │             -0.516876 │             0.227280 │                -1.345156 │          -1.249141 │
│ Adelie  │ Torgersen │             -0.974787 │             2.050255 │                -0.705121 │          -0.500969 │
│ Adelie  │ Torgersen │             -1.707443 │             1.999617 │                -0.207315 │           0.247203 │
│ …       │ …         │                     … │                    … │                        … │                  … │
└─────────┴───────────┴───────────────────────┴──────────────────────┴──────────────────────────┴────────────────────┘

pip install scikit-learn

import plotly.express as px
from sklearn.decomposition import PCA

X = f.select(s.contains("zscore"))

n_components = 3
pca = PCA(n_components=n_components).fit(X)

t_pca = ibis.memtable(pca.transform(X)).rename(
    {"pc1": "col0", "pc2": "col1", "pc3": "col2"}
)

f = f.mutate(row_number=ibis.row_number().over()).join(
    t_pca.mutate(row_number=ibis.row_number().over()),
    "row_number",
)

px.scatter_3d(
    f.to_pandas(),
    x="pc1",
    y="pc2",
    z="pc3",
    color="species",
    symbol="island",
)

1: Import data science libraries
2: Select “features” (numeric columns) as X
3: Compute PCA
4: Create a table from the PCA results
5: Join the PCA results to the original table
6: Plot the results

Show input and output

Input and output

Ibis supports a variety of input and output options.

Data platforms

You can connect Ibis to any supported backend to read and write data in backend-native tables.

Code

con = ibis.duckdb.connect("penguins.ddb")
t = con.create_table("penguins", t.to_pyarrow(), overwrite=True)

con = ibis.duckdb.connect("penguins.ddb")
t = con.table("penguins")
t.head(3)

1: Connect to a backend.
2: Load a table.
3: Display the table.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

grouped = (
    t.group_by(["species", "island"])
    .aggregate(count=ibis._.count())
    .order_by(ibis.desc("count"))
)
con.create_table("penguins_grouped", grouped.to_pyarrow(), overwrite=True)

1: Create a lazily evaluated Ibis expression.
2: Write to a table.

┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species   ┃ island    ┃ count ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string    │ string    │ int64 │
├───────────┼───────────┼───────┤
│ Gentoo    │ Biscoe    │   124 │
│ Chinstrap │ Dream     │    68 │
│ Adelie    │ Dream     │    56 │
│ Adelie    │ Torgersen │    52 │
│ Adelie    │ Biscoe    │    44 │
└───────────┴───────────┴───────┘

File formats

Depending on the backend, you can read and write data in several file formats.

pip install 'ibis-framework[duckdb]'

t.to_csv("penguins.csv")
ibis.read_csv("penguins.csv").head(3)

1: Write the table to a CSV file. Dependent on backend.
2: Read the CSV file into a table. Dependent on backend.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

pip install 'ibis-framework[duckdb,deltalake]'

t.to_delta("penguins.delta", mode="overwrite")
ibis.read_delta("penguins.delta").head(3)

1: Write the table to a Delta Lake table. Dependent on backend.
2: Read the Delta Lake table into a table. Dependent on backend.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

pip install 'ibis-framework[duckdb]'

t.to_parquet("penguins.parquet")
ibis.read_parquet("penguins.parquet").head(3)

1: Write the table to a Parquet file. Dependent on backend.
2: Read the Parquet file into a table. Dependent on backend.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ int64             │ int64       │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │               181 │        3750 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │               186 │        3800 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │               195 │        3250 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

With other Python libraries

Ibis uses Apache Arrow for efficient data transfer to and from other libraries. Ibis tables implement the __dataframe__ and __array__ protocols, so you can pass them to any library that supports these protocols.

You can convert Ibis tables to pandas dataframes.

pip install pandas

df = t.to_pandas()
df.head(3)

1: Returns a pandas dataframe.

	species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
0	Adelie	Torgersen	39.1	18.7	181.0	3750.0	male	2007
1	Adelie	Torgersen	39.5	17.4	186.0	3800.0	female	2007
2	Adelie	Torgersen	40.3	18.0	195.0	3250.0	female	2007

Or you can convert pandas dataframes to Ibis tables.

t = ibis.memtable(df)
t.head(3)

1: Returns an Ibis table.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

You can convert Ibis tables to Polars dataframes.

pip install polars

import polars as pl

df = pl.from_arrow(t.to_pyarrow())
df.head(3)

shape: (3, 8)

species	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	sex	year
str	str	f64	f64	f64	f64	str	i64
"Adelie"	"Torgersen"	39.1	18.7	181.0	3750.0	"male"	2007
"Adelie"	"Torgersen"	39.5	17.4	186.0	3800.0	"female"	2007
"Adelie"	"Torgersen"	40.3	18.0	195.0	3250.0	"female"	2007

Or Polars dataframes to Ibis tables.

t = ibis.memtable(df)
t.head(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

You can convert Ibis tables to PyArrow tables.

pip install pyarrow

t.to_pyarrow()

pyarrow.Table
species: string
island: string
bill_length_mm: double
bill_depth_mm: double
flipper_length_mm: double
body_mass_g: double
sex: string
year: int64
----
species: [["Adelie","Adelie","Adelie","Adelie","Adelie",...,"Chinstrap","Chinstrap","Chinstrap","Chinstrap","Chinstrap"]]
island: [["Torgersen","Torgersen","Torgersen","Torgersen","Torgersen",...,"Dream","Dream","Dream","Dream","Dream"]]
bill_length_mm: [[39.1,39.5,40.3,null,36.7,...,55.8,43.5,49.6,50.8,50.2]]
bill_depth_mm: [[18.7,17.4,18,null,19.3,...,19.8,18.1,18.2,19,18.7]]
flipper_length_mm: [[181,186,195,null,193,...,207,202,193,210,198]]
body_mass_g: [[3750,3800,3250,null,3450,...,4000,3400,3775,4100,3775]]
sex: [["male","female","female",null,"female",...,"male","female","male","male","female"]]
year: [[2007,2007,2007,2007,2007,...,2009,2009,2009,2009,2009]]

Or PyArrow batches:

t.to_pyarrow_batches()

<pyarrow.lib.RecordBatchReader at 0x7fff5015b780>

And you can convert PyArrow tables to Ibis tables.

ibis.memtable(t.to_pyarrow()).head(3)

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ bill_length_mm ┃ bill_depth_mm ┃ flipper_length_mm ┃ body_mass_g ┃ sex    ┃ year  ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ float64        │ float64       │ float64           │ float64     │ string │ int64 │
├─────────┼───────────┼────────────────┼───────────────┼───────────────────┼─────────────┼────────┼───────┤
│ Adelie  │ Torgersen │           39.1 │          18.7 │             181.0 │      3750.0 │ male   │  2007 │
│ Adelie  │ Torgersen │           39.5 │          17.4 │             186.0 │      3800.0 │ female │  2007 │
│ Adelie  │ Torgersen │           40.3 │          18.0 │             195.0 │      3250.0 │ female │  2007 │
└─────────┴───────────┴────────────────┴───────────────┴───────────────────┴─────────────┴────────┴───────┘

You can convert Ibis tables to torch tensors.

pip install torch

t.select(s.numeric()).limit(3).to_torch()

{'col2': tensor([39.1000, 39.5000, 40.3000], dtype=torch.float64),
 'col3': tensor([18.7000, 17.4000, 18.0000], dtype=torch.float64),
 'col4': tensor([181., 186., 195.], dtype=torch.float64),
 'col5': tensor([3750., 3800., 3250.], dtype=torch.float64),
 'col7': tensor([2007, 2007, 2007], dtype=torch.int16)}

You can directly call the __dataframe__ protocol on Ibis tables, though this is typically handled by the library you’re using.

t.__dataframe__()

<ibis.expr.types.dataframe_interchange.IbisDataFrame at 0x7fff6c9f3590>

You can directly call the __array__ protocol on Ibis tables, though this is typically handled by the library you’re using.

t.__array__()

array([['Adelie', 'Torgersen', 39.1, ..., 3750.0, 'male', 2007],
       ['Adelie', 'Torgersen', 39.5, ..., 3800.0, 'female', 2007],
       ['Adelie', 'Torgersen', 40.3, ..., 3250.0, 'female', 2007],
       ...,
       ['Chinstrap', 'Dream', 49.6, ..., 3775.0, 'male', 2009],
       ['Chinstrap', 'Dream', 50.8, ..., 4100.0, 'male', 2009],
       ['Chinstrap', 'Dream', 50.2, ..., 3775.0, 'female', 2009]],
      dtype=object)

Show SQL + Python

SQL + Python

Ibis has the ibis.to_sql to generate SQL strings.

In a Jupyter notebook or IPython shell session, the output of ibis.to_sql will be syntax highlighted.

In a plain Python REPL use print(ibis.to_sql(...)) to pretty print SQL.

Ibis uses SQLGlot under the hood to allow passing a dialect parameter to SQL methods.

dialect = "bigquery"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  `t1`.`species`,
  `t1`.`island`,
  `t1`.`count`
FROM (
  SELECT
    `t0`.`species`,
    `t0`.`island`,
    COUNT(*) AS `count`
  FROM `penguins` AS `t0`
  GROUP BY
    1,
    2
) AS `t1`
ORDER BY
  `t1`.`count` DESC

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘

dialect = "snowflake"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" DESC NULLS LAST

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘

dialect = "oracle"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" "t0"
  GROUP BY
    "t0"."species",
    "t0"."island"
) "t1"
ORDER BY
  "t1"."count" DESC

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘

dialect = "mysql"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  `t1`.`species`,
  `t1`.`island`,
  `t1`.`count`
FROM (
  SELECT
    `t0`.`species`,
    `t0`.`island`,
    COUNT(*) AS `count`
  FROM `penguins` AS `t0`
  GROUP BY
    1,
    2
) AS `t1`
ORDER BY
  `t1`.`count` DESC

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘

dialect = "mssql"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  [t1].[species],
  [t1].[island],
  [t1].[count]
FROM (
  SELECT
    [t0].[species],
    [t0].[island],
    COUNT(*) AS [count]
  FROM [penguins] AS [t0]
  GROUP BY
    [t0].[species],
    [t0].[island]
) AS [t1]
ORDER BY
  [t1].[count] DESC

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘

dialect = "postgres"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" DESC NULLS LAST

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘

dialect = "sqlite"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" DESC

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘

dialect = "trino"
sql = ibis.to_sql(
    grouped,
    dialect=dialect,
)
sql

1: Set the dialect.
2: Convert the table to a SQL string.
3: Display the SQL string.

SELECT
  "t1"."species",
  "t1"."island",
  "t1"."count"
FROM (
  SELECT
    "t0"."species",
    "t0"."island",
    COUNT(*) AS "count"
  FROM "penguins" AS "t0"
  GROUP BY
    1,
    2
) AS "t1"
ORDER BY
  "t1"."count" DESC

You can chain Ibis expressions and .sql together.

con.sql(sql, dialect=dialect).filter(ibis._["species"] == "Adelie")

1: Chain .sql calls and Ibis expressions together.

┏━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━┓
┃ species ┃ island    ┃ count ┃
┡━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━┩
│ string  │ string    │ int64 │
├─────────┼───────────┼───────┤
│ Adelie  │ Dream     │    56 │
│ Adelie  │ Torgersen │    52 │
│ Adelie  │ Biscoe    │    44 │
└─────────┴───────────┴───────┘