Impala/HDFS Intro and Setup

Getting started

First, make sure you can import ibis.

In [1]:
import ibis
import os

If you have WebHDFS available, connect to HDFS according to your WebHDFS configuration. For kerberized or more complex HDFS clusters, see http://hdfscli.readthedocs.org/en/latest/ for details on connecting. You can use a connection from that library instead of calling hdfs_connect.

In [2]:
# Environment variables are strings, so convert the port to an int
hdfs_port = int(os.environ.get('IBIS_WEBHDFS_PORT', 50070))
hdfs = ibis.hdfs_connect(host='quickstart.cloudera', port=hdfs_port)

Finally, create the Ibis client.

In [3]:
con = ibis.impala.connect('quickstart.cloudera', hdfs_client=hdfs)
con
Out[3]:
<ibis.impala.client.ImpalaClient at 0x7fa018d73eb8>

Substitute the parameters that are appropriate for your environment (see the docstring for ibis.impala.connect). ibis.impala.connect uses the same parameters as the DBAPI interface of Impyla (https://pypi.python.org/pypi/impyla).
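
One way to keep environment-specific values out of the notebook is to read them from environment variables, as the WebHDFS port is read above. The following is a minimal sketch, not part of Ibis itself: the helper names connection_settings and connect and the variables IBIS_IMPALA_HOST and IBIS_IMPALA_PORT are hypothetical, and the defaults assume the quickstart host used above, Impala's usual HiveServer2 port 21050, and WebHDFS on 50070.

```python
import os


def connection_settings(env=None):
    """Collect host and port settings (hypothetical helper, not an Ibis API)."""
    env = os.environ if env is None else env
    return {
        # IBIS_IMPALA_HOST and IBIS_IMPALA_PORT are assumed variable names
        "host": env.get("IBIS_IMPALA_HOST", "quickstart.cloudera"),
        # Environment variables are strings, so convert ports to int
        "impala_port": int(env.get("IBIS_IMPALA_PORT", 21050)),
        "webhdfs_port": int(env.get("IBIS_WEBHDFS_PORT", 50070)),
    }


def connect(settings):
    """Build the HDFS and Impala clients from a settings dict."""
    import ibis  # imported lazily so connection_settings works without ibis

    hdfs = ibis.hdfs_connect(host=settings["host"],
                             port=settings["webhdfs_port"])
    return ibis.impala.connect(settings["host"],
                               port=settings["impala_port"],
                               hdfs_client=hdfs)
```

With a reachable cluster, con = connect(connection_settings()) would then stand in for cells 2 and 3 above.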