Impala/HDFS Intro and Setup

Getting started

First, make sure you can import ibis:

import ibis
import os
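If the import fails, it can help to check which of the relevant packages are installed at all. The sketch below uses only the standard library's importlib.util.find_spec. It assumes that the optional dependencies you care about are Impyla (imported as `impala`) and HdfsCLI (imported as `hdfs`). The helper name check_modules is hypothetical and not part of ibis.

```python
import importlib.util


def check_modules(names=("ibis", "impala", "hdfs")):
    """Return {module_name: bool} telling whether each top-level module can be found.

    Hypothetical helper. 'impala' is the import name of Impyla and 'hdfs' is
    the import name of HdfsCLI; both are assumed optional dependencies here.
    """
    # find_spec returns None for a top-level module that is not installed
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

Because find_spec only locates modules and does not execute them, a missing package shows up as False rather than as an ImportError traceback.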

If you have WebHDFS available, connect to HDFS according to your WebHDFS configuration. For Kerberized or otherwise more complex HDFS clusters, see the HdfsCLI documentation at http://hdfscli.readthedocs.org/en/latest/ for details on connecting. You can use a connection from that library instead of calling hdfs_connect.

# Environment variables are strings, so convert the port to an int (default 50070)
hdfs_port = int(os.environ.get('IBIS_WEBHDFS_PORT', 50070))
hdfs = ibis.hdfs_connect(host='quickstart.cloudera', port=hdfs_port)
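To confirm that WebHDFS is actually reachable on that host and port, you can query its REST endpoint directly. The minimal sketch below assumes an unsecured cluster and the standard WebHDFS URL form http://HOST:PORT/webhdfs/v1/PATH?op=OPERATION. It issues a LISTSTATUS on the root directory using only the standard library. The helper names webhdfs_url and webhdfs_available are hypothetical and not part of ibis or HdfsCLI.

```python
import json
import urllib.request


def webhdfs_url(host, port, path="/", op="LISTSTATUS"):
    """Build a WebHDFS REST URL: http://HOST:PORT/webhdfs/v1/PATH?op=OP."""
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{host}:{int(port)}/webhdfs/v1{path}?op={op}"


def webhdfs_available(host, port, timeout=5.0):
    """Return True if LISTSTATUS on '/' answers with JSON, else False.

    Assumes no Kerberos or other authentication is required.
    """
    try:
        with urllib.request.urlopen(webhdfs_url(host, port), timeout=timeout) as resp:
            json.load(resp)  # a healthy endpoint returns a FileStatuses JSON document
        return True
    except (OSError, ValueError):
        # URLError, refused connections and timeouts are OSError subclasses;
        # a non-JSON body raises a ValueError subclass
        return False
```

For example, webhdfs_available('quickstart.cloudera', hdfs_port) should return True on a reachable, unsecured cluster. On a Kerberized cluster this plain request is expected to fail, which is the situation the HdfsCLI documentation covers.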

Finally, create the Ibis client:

con = ibis.impala.connect('quickstart.cloudera', hdfs_client=hdfs)
con
<ibis.impala.client.ImpalaClient at 0x7f04f55f4950>

Substitute the parameters appropriate for your environment (see the docstring for ibis.impala.connect). The connection parameters of ibis.impala.connect are the same as those of Impyla's DB-API interface (https://pypi.python.org/pypi/impyla).
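One way to keep environment-specific values out of notebooks is to collect the connection parameters in a dictionary and unpack it into the call. The sketch below reads them from environment variables. The IBIS_IMPALA_* variable names and the helper impala_connect_kwargs are hypothetical. Port 21050 and auth_mechanism='NOSASL' match Impyla's connect() defaults, and 'default' is Impala's default database.

```python
import os

# Fallback values: 21050 is the Impala daemon's HiveServer2 port (Impyla's default),
# NOSASL is Impyla's default auth_mechanism, 'default' is Impala's default database.
IMPALA_DEFAULTS = {
    "host": "localhost",
    "port": 21050,
    "database": "default",
    "auth_mechanism": "NOSASL",
}

# Hypothetical environment variable names, one per connect() parameter
ENV_NAMES = {
    "host": "IBIS_IMPALA_HOST",
    "port": "IBIS_IMPALA_PORT",
    "database": "IBIS_IMPALA_DATABASE",
    "auth_mechanism": "IBIS_IMPALA_AUTH",
}


def impala_connect_kwargs(env=None):
    """Return keyword arguments for ibis.impala.connect, overridden from env."""
    env = os.environ if env is None else env
    kwargs = dict(IMPALA_DEFAULTS)  # copy so the defaults are never mutated
    for param, var in ENV_NAMES.items():
        if var in env:
            kwargs[param] = env[var]
    kwargs["port"] = int(kwargs["port"])  # env values are strings
    return kwargs


# Usage, assuming `hdfs` was created with ibis.hdfs_connect as above:
#   con = ibis.impala.connect(hdfs_client=hdfs, **impala_connect_kwargs())
```

Passing a plain dictionary keeps the sketch independent of ibis itself, so the same values could also be handed to Impyla's impala.dbapi.connect when debugging a connection outside of Ibis.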