Installation and Getting Started¶
Getting up and running with Ibis involves installing the Python package and connecting to HDFS and Impala. If you don’t have a Hadoop cluster available with Impala, see Using Ibis with the Cloudera Quickstart VM below for instructions to use a VM to get up and running quickly.
Ibis requires a working Python 2.6, 2.7, or 3.4 installation. We recommend Anaconda.
Installing the Python package¶
Install ibis using
conda, whenever it becomes available):
pip install ibis-framework
This installs the
ibis library to your configured Python environment.
Ibis can also be installed with Kerberos support for its HDFS functionality:
pip install ibis-framework[kerberos]
Some platforms will require that you have Kerberos installed to build properly.
- Redhat / CentOS:
yum install krb5-devel
- Ubuntu / Debian:
apt-get install libkrb5-dev
Ibis SQLite Quickstart¶
See http://blog.ibis-project.org/sqlite-crunchbase-quickstart/ for a quickstart using SQLite. Otherwise read on to try out Ibis on Impala.
Creating a client¶
To create an Ibis “client”, you must first connect your services and assemble
the client using
import ibis hdfs = ibis.hdfs_connect(host=webhdfs_host, port=webhdfs_port) con = ibis.impala.connect(host=impala_host, port=impala_port, hdfs_client=hdfs)
Both method calls can take
auth_mechanism='LDAP' to connect to Kerberos clusters. Depending on your
cluster setup, this may also include SSL. See the API reference for more, along with the Impala shell reference, as the
connection semantics are identical.
We are collecting IPython notebooks for learning here: http://github.com/cloudera/ibis-notebooks. Some of these notebooks will be reproduced as part of the documentation.
Using Ibis with the Cloudera Quickstart VM¶
Using Ibis with Impala requires a running Impala cluster, so we have provided a lean VirtualBox image to simplify the process for those looking to try out Ibis (without setting up a cluster) or start contributing code to the project.
What follows are streamlined setup instructions for the VM. If you wish to
download it directly and setup from the
ova file, use this download link.
The VM was built with Oracle VirtualBox 4.3.28.
curl -s https://raw.githubusercontent.com/cloudera/ibis-notebooks/master/setup/bootstrap.sh | bash
To use Ibis with the special Cloudera Quickstart VM follow the below instructions:
- Make sure Anaconda is installed. You can get it from http://continuum.io/downloads. Now prepend the Anaconda Python to your path like this
pip install ibis-framework
git clone https://github.com/cloudera/ibis-notebooks.git
The setup script will download a VirtualBox appliance image and import it in
VirtualBox. In addition, it will create a new host only network adapter with
DHCP. After the VM is started, it will extract the current IP address and add a
new /etc/hosts entry pointing from the IP of the VM to the hostname
quickstart.cloudera. The reason for this entry is that Hadoop and HDFS
require a working reverse name mapping. If you don’t want to run the automated
steps make sure to check the individual steps in the file