Developing and Contributing to Ibis

For a primer on contributing to open source projects in general, see the pandas contribution guide. Ibis will be run much as pandas has been.

Linux Test Environment Setup

Conda Environment Setup

  1. Install the latest version of miniconda:

    # Download the miniconda bash installer
    curl -Ls -o $HOME/miniconda.sh \
        https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    
    # Run the installer
    bash $HOME/miniconda.sh -b -p $HOME/miniconda
    
    # Put the conda command on your PATH
    export PATH="$HOME/miniconda/bin:$PATH"
    
  2. Create the development environment of your choice (Python 3.6 in this example), activate it, and install ibis in development mode:

    # Create a conda environment ready for ibis development
    conda env create --name ibis36 --file=ci/requirements-dev-3.6.yml
    
    # Activate the conda environment
    source activate ibis36
    
    # Install ibis
    python setup.py develop
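
Before moving on, it can help to confirm that the environment is usable. The following is a minimal sketch, not part of the ibis repository: dir_on_path and check_dev_env are hypothetical helper names, and the check assumes the $HOME/miniconda prefix used in the steps above.

```shell
# Hypothetical helpers (not part of ibis) for checking the setup above.

# Succeeds if directory $1 is one of the colon-separated entries in $2.
dir_on_path() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reports whether the conda bin directory is on PATH and whether
# ibis can be imported by the python found on PATH.
check_dev_env() {
    if dir_on_path "$HOME/miniconda/bin" "$PATH"; then
        echo "conda bin directory is on PATH"
    else
        echo "conda bin directory is missing from PATH"
    fi
    if python -c "import ibis" >/dev/null 2>&1; then
        echo "ibis is importable"
    else
        echo "ibis is not importable; is the environment activated?"
    fi
}
```

After sourcing these definitions, running check_dev_env inside the activated environment prints one line per check.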
    

All-in-One Command

The following command performs three steps:

  1. Downloads the test data

  2. Starts each backend via docker-compose

  3. Initializes the backends with the test tables

    cd ci
    bash build.sh
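
For orientation, the three steps can also be run by hand. The sketch below is an assumption about how the commands documented elsewhere in this guide fit together; it is not a transcription of ci/build.sh. The names run, run_in, and manual_setup are hypothetical, and DRY_RUN=1 only prints the commands instead of executing them.

```shell
# Hypothetical sketch: the three steps run by hand, using only commands
# that appear elsewhere in this guide. Not a transcription of ci/build.sh.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Same as run, but from inside directory $1.
run_in() {
    dir=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ (cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# Usage (from the repository root): manual_setup BACKEND...
manual_setup() {
    # 1. Download the test data.
    run ci/datamgr.py download || return 1
    # 2. Start the backends (-d runs the containers detached).
    run_in ci docker-compose up -d || return 1
    # 3. Load the test tables into each requested backend.
    for backend in "$@"; do
        run ci/datamgr.py "$backend" || return 1
    done
}

# Example: DRY_RUN=1 manual_setup postgres sqlite
```

Note that Impala is loaded with ci/impalamgr.py rather than ci/datamgr.py, and that freshly started containers may need some time before they accept connections, so a real run may need a pause between steps 2 and 3.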
    

To set up only specific backends, follow the instructions below.

Download Test Dataset

  1. Install Docker

  2. Download the test data:

    By default, this downloads and extracts the dataset under testing/ibis-testing-data.

    ci/datamgr.py download
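
A quick way to confirm that the download produced something is to check that the target directory exists and is not empty. This is a minimal sketch; dataset_present is a hypothetical helper, not part of ibis, and the path in the example is the default location mentioned above.

```shell
# Hypothetical helper (not part of ibis): succeeds if directory $1
# exists and contains at least one entry.
dataset_present() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Example, using the default location mentioned above:
# dataset_present testing/ibis-testing-data && echo "test data found"
```

The check only looks for a non-empty directory; it does not validate the contents of the dataset.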
    

Setting Up Test Databases

To start all of the backends:

cd ci
docker-compose up
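
Containers usually need a little time before they accept connections, so loading data immediately after startup can fail. The sketch below polls a TCP port until it opens. It assumes bash (for /dev/tcp) and assumes the backend's port is published on the host; wait_for_port is a hypothetical helper, not part of ibis.

```shell
# Hypothetical helper (not part of ibis). Requires bash for /dev/tcp.
# Usage: wait_for_port HOST PORT [TIMEOUT_SECONDS]
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-60} waited=0
    # Try to open a TCP connection in a subshell; retry once per second
    # until it succeeds or the timeout is reached.
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example: 9000 is the ClickHouse port published later in this guide.
# wait_for_port localhost 9000 30 && ci/datamgr.py clickhouse
```

An open port only shows that something is listening; a backend may still need a few more seconds before it is ready to serve queries.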

Impala (with UDFs)

  1. Start the Impala docker image in another terminal:

    # Keep this running for as long as you want to test ibis
    docker run --tty --rm --hostname impala cpcloud86/impala:java8
    
  2. Load data and UDFs into Impala:

    ci/impalamgr.py load --data --data-dir ibis-testing-data
    

ClickHouse

  1. Start the ClickHouse server docker image in another terminal:

    # Keep this running for as long as you want to test ibis
    docker run --rm -p 9000:9000 --tty yandex/clickhouse-server
    
  2. Load data:

    ci/datamgr.py clickhouse
    

PostgreSQL

PostgreSQL can be used from either the installation that resides on the Impala docker image or from your machine directly.

Here’s how to load test data into PostgreSQL:

ci/datamgr.py postgres

SQLite

SQLite comes preinstalled on many systems. If you used the conda setup instructions above, SQLite is also available in the conda environment. To load the test data:

ci/datamgr.py sqlite

MapD

MapD can be used from either a docker image or from your machine directly.

  1. Start the MapD Server docker image in another terminal:

    # Keep this running for as long as you want to test ibis
    docker run -d -v $HOME/mapd-docker-storage:/mapd-storage -p 9090-9092:9090-9092 mapd/mapd-ce-cpu
    

Here’s how to load test data into MapD:

ci/datamgr.py mapd

Running Tests

You are now ready to run the full ibis test suite:

pytest ibis
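
When only some backends are set up, running a narrower selection can be more practical than the full suite. The sketch below uses two standard pytest options: -k to filter tests by a keyword expression and -x to stop at the first failure. run_ibis_tests and the PYTEST override variable are hypothetical, and whether a keyword such as sqlite selects the tests you expect depends on how the tests are named, so treat the keyword as an assumption.

```shell
# Hypothetical wrapper (not part of ibis).
# Usage: run_ibis_tests [KEYWORD]
# With a keyword, run only matching tests and stop at the first failure;
# without one, run the full suite. Set PYTEST to use a different runner.
run_ibis_tests() {
    if [ -n "${1:-}" ]; then
        ${PYTEST:-pytest} ibis -x -k "$1"
    else
        ${PYTEST:-pytest} ibis
    fi
}

# Example: run_ibis_tests sqlite
```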

Contribution Ideas

Here are a few ideas to think about outside of participating in the primary development roadmap:

  • Documentation
  • Use cases and IPython notebooks
  • Other SQL-based backends (Presto, Hive, Spark SQL)
  • S3 filesystem support
  • Integration with MLlib via PySpark