API Reference

Creating connections

These methods live in the ibis module namespace and are your main point of entry to Ibis.

hdfs_connect([host, port, protocol, ...]) Connect to HDFS

Impala client

These methods are available on the Impala client object after connecting to your HDFS cluster (ibis.hdfs_connect) and connecting to Impala with ibis.impala.connect.

connect([host, port, database, timeout, ...]) Create an ImpalaClient for use with Ibis.
ImpalaClient.close() Close Impala connection and drop any temporary objects
ImpalaClient.database([name]) Create a Database object for a given database name that can be used for
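
The two-step connection described above can be sketched as follows. This is a minimal sketch, not a definitive recipe: the helper name connect_to_impala and its parameter names are hypothetical, the port defaults are the conventional WebHDFS and Impala ports, and the hdfs_client keyword is assumed to be among the elided connect arguments in the version of Ibis this page documents. The import is deferred so the function can be defined without a cluster or an Ibis installation at hand.

```python
def connect_to_impala(impala_host, webhdfs_host,
                      impala_port=21050, webhdfs_port=50070):
    """Return an Impala client wired to an HDFS client (needs a live cluster)."""
    # Deferred import: only needed when the function is actually called.
    import ibis

    # Step 1: connect to HDFS.
    hdfs = ibis.hdfs_connect(host=webhdfs_host, port=webhdfs_port)
    # Step 2: connect to Impala, handing over the HDFS client
    # (hdfs_client is an assumed keyword, see the note above).
    return ibis.impala.connect(host=impala_host, port=impala_port,
                               hdfs_client=hdfs)
```

Calling connect_to_impala("impalad.example.com", "namenode.example.com") with real host names should return the client object whose methods are listed in this section.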

Database methods

ImpalaClient.set_database(name) Set the default database scope for client
ImpalaClient.create_database(name[, path, force]) Create a new Impala database
ImpalaClient.drop_database(name[, force]) Drop an Impala database
ImpalaClient.list_databases([like]) List databases in the Impala cluster.
ImpalaClient.exists_database(name) Checks if a given database exists
ImpalaDatabase.create_table(table_name[, obj]) Dispatch to ImpalaClient.create_table.
ImpalaDatabase.drop([force]) Drop the database
ImpalaDatabase.namespace(ns) Creates a derived Database instance for collections of objects having a common prefix.
ImpalaDatabase.table(name) Return a table expression referencing a table in this database

Table methods

The ImpalaClient object itself has many helper utility methods. You’ll find most of the table-specific methods on ImpalaTable.

ImpalaClient.database([name]) Create a Database object for a given database name that can be used for
ImpalaClient.table(name[, database]) Create a table expression that references a particular table in the
ImpalaClient.sql(query) Convert a SQL query to an Ibis table expression
ImpalaClient.raw_sql(query[, results]) Execute a given query string.
ImpalaClient.list_tables([like, database]) List tables in the current (or indicated) database.
ImpalaClient.exists_table(name[, database]) Determine if the indicated table or view exists
ImpalaClient.drop_table(table_name[, ...]) Drop an Impala table
ImpalaClient.create_table(table_name[, obj, ...]) Create a new table in Impala using an Ibis table expression.
ImpalaClient.insert(table_name[, obj, ...]) Insert into existing table.
ImpalaClient.truncate_table(table_name[, ...]) Delete all rows from, but do not drop, an existing table
ImpalaClient.get_schema(table_name[, database]) Return a Schema object for the indicated table and database
ImpalaClient.cache_table(table_name[, ...]) Caches a table in cluster memory in the given pool.
ImpalaClient.load_data(table_name, path[, ...]) Wraps the LOAD DATA DDL statement.
ImpalaClient.get_options() Return current query options for the Impala session
ImpalaClient.set_options(options)
ImpalaClient.set_compression_codec(codec)

The best way to interact with a single table is through the ImpalaTable object you get back from ImpalaClient.table.

ImpalaTable.add_partition(spec[, location]) Add a new table partition, creating any new directories in HDFS if necessary.
ImpalaTable.alter([location, format, ...]) Change settings and parameters of the table.
ImpalaTable.alter_partition(spec[, ...]) Change settings and parameters of an existing partition
ImpalaTable.column_stats() Return results of SHOW COLUMN STATS as a pandas DataFrame
ImpalaTable.compute_stats([incremental, async]) Invoke Impala COMPUTE STATS command to compute column, table, and partition statistics.
ImpalaTable.describe_formatted() Return parsed results of DESCRIBE FORMATTED statement
ImpalaTable.drop() Drop the table from the database
ImpalaTable.drop_partition(spec) Drop an existing table partition
ImpalaTable.files() Return results of SHOW FILES statement
ImpalaTable.insert([obj, overwrite, ...]) Insert into Impala table.
ImpalaTable.invalidate_metadata()
ImpalaTable.load_data(path[, overwrite, ...]) Wraps the LOAD DATA DDL statement.
ImpalaTable.metadata() Return parsed results of DESCRIBE FORMATTED statement
ImpalaTable.partition_schema() For partitioned tables, return the schema (names and types) for the
ImpalaTable.partitions() Return a pandas.DataFrame giving information about this table’s partitions.
ImpalaTable.refresh()
ImpalaTable.rename(new_name[, database]) Rename table inside Impala.
ImpalaTable.stats() Return results of SHOW TABLE STATS as a DataFrame.

Creating views is also possible:

ImpalaClient.create_view(name, expr[, database]) Create an Impala view from a table expression
ImpalaClient.drop_view(name[, database, force]) Drop an Impala view
ImpalaClient.drop_table_or_view(name[, ...]) Attempt to drop a relation that may be a view or table

Accessing data formats in HDFS

ImpalaClient.avro_file(hdfs_dir, avro_schema) Create a (possibly temporary) table to read a collection of Avro data.
ImpalaClient.delimited_file(hdfs_dir, schema) Interpret delimited text files (CSV / TSV / etc.) as an Ibis table.
ImpalaClient.parquet_file(hdfs_dir[, ...]) Make indicated parquet file in HDFS available as an Ibis table.

Executing expressions

ImpalaClient.execute(expr[, params, limit, ...]) Compile and execute Ibis expression using this backend client
ImpalaClient.disable_codegen([disabled]) Turn off or on LLVM codegen in Impala query execution

PostgreSQL client

The PostgreSQL client is accessible through the ibis.postgres namespace.

Use ibis.postgres.connect with a SQLAlchemy-compatible connection string to create a client.

connect([host, user, password, port, ...]) Create an Ibis client connected to a PostgreSQL database.
PostgreSQLClient.database([name]) Create a Database object for a given database name that can be used for
PostgreSQLClient.list_tables([like, database]) List tables in the current (or indicated) database.
PostgreSQLClient.list_databases()
PostgreSQLClient.table(name[, database]) Create a table expression that references a particular table in the

SQLite client

The SQLite client is accessible through the ibis.sqlite namespace.

Use ibis.sqlite.connect to create a SQLite client.

connect([path, create]) Create an Ibis client connected to a SQLite database.
SQLiteClient.attach(name, path[, create]) Connect another SQLite database file
SQLiteClient.database([name]) Create a Database object for a given database name that can be used for
SQLiteClient.list_tables([like, database]) List tables in the current (or indicated) database.
SQLiteClient.table(name[, database]) Create a table expression that references a particular table in the
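
SQLiteClient.attach is described only as connecting another SQLite database file. The underlying SQLite feature is presumably ATTACH DATABASE; the sketch below shows that feature with the standard-library sqlite3 module rather than with Ibis, and the schema name other and table name t are made up for illustration.

```python
import sqlite3

# Main connection; an in-memory database keeps the sketch self-contained.
con = sqlite3.connect(":memory:")

# Attach a second (here also in-memory) database under the name "other".
con.execute("ATTACH DATABASE ':memory:' AS other")

# Tables in the attached database are addressed as <name>.<table>.
con.execute("CREATE TABLE other.t (x INTEGER)")
con.execute("INSERT INTO other.t VALUES (1), (2)")

# PRAGMA database_list yields (seq, name, file) for each attached database.
attached = [row[1] for row in con.execute("PRAGMA database_list")]
total = con.execute("SELECT sum(x) FROM other.t").fetchone()[0]
```

After this runs, attached contains both main and other, which mirrors the name/database pairing used by the table and list_tables methods above.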

HDFS

Client objects have an hdfs attribute you can use to interact directly with HDFS.

HDFS.ls(hdfs_path[, status]) Return contents of directory
HDFS.chmod(hdfs_path, permissions) Change permissions of a file or directory
HDFS.chown(hdfs_path[, owner, group]) Change owner (and/or group) of a file or directory
HDFS.get(hdfs_path[, local_path, overwrite]) Download remote file or directory to the local filesystem
HDFS.head(hdfs_path[, nbytes, offset]) Retrieve the requested number of bytes from a file
HDFS.put(hdfs_path, resource[, overwrite, ...]) Write file or directory to HDFS
HDFS.put_tarfile(hdfs_path, local_path[, ...]) Write contents of tar archive to HDFS directly without having to
HDFS.rm(path) Delete a single file
HDFS.rmdir(path) Delete a directory and all its contents
HDFS.size(hdfs_path) Return total size of file or directory
HDFS.status(path)

Top-level expression APIs

These methods are available directly in the ibis module namespace.

case() Similar to the .case method on array expressions, create a case builder
literal(value) Create a scalar expression from a Python value
schema([pairs, names, types]) Validate and return an Ibis Schema object
table(schema[, name]) Create an unbound Ibis table for creating expressions.
timestamp(value) Returns a timestamp literal if value is likely coercible to a timestamp
where(boolean_expr, true_expr, false_null_expr) Equivalent to the ternary expression: if X then Y else Z
ifelse(arg, true_expr, false_expr) Shorthand for implementing ternary expressions
coalesce(*args) Compute the first non-null value(s) from the passed arguments in left-to-right order.
greatest(*args) Compute the largest value (row-wise, if any arrays are present) among the supplied arguments.
least(*args) Compute the smallest value (row-wise, if any arrays are present) among the supplied arguments.
negate(arg) Negate a numeric expression
desc(expr) Create a sort key (when used in sort_by) by the passed array expression or column name.
now() Compute the current timestamp
NA A scalar value expression representing NULL
null() Create a NULL/NA scalar
expr_list(exprs)
row_number() Analytic function for the current row number, starting at 0
window([preceding, following, group_by, ...]) Create a window clause for use with window (analytic and aggregate) functions.
trailing_window(periods[, group_by, order_by]) Create a trailing window for use with aggregate window functions.
cumulative_window([group_by, order_by]) Create a cumulative window clause for use with aggregate window functions.
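
As an illustration of the row-wise behaviour described for coalesce, here is a plain-Python model. It is not Ibis code: None stands in for NULL, and the rows are invented sample data.

```python
def coalesce(*args):
    """Return the first argument that is not None, scanning left to right."""
    for value in args:
        if value is not None:
            return value
    return None  # every argument was null


# Each tuple plays the role of one row across three columns.
rows = [(None, 5, 1), (None, None, 7), (3, None, None), (None, None, None)]
result = [coalesce(*row) for row in rows]
```

Here result is [5, 7, 3, None]: the last row stays null because no argument supplies a value.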

General expression methods

Expr.compile([limit]) Compile expression to whatever execution target, to verify
Expr.equals(other)
Expr.execute([limit, async]) If this expression is based on physical tables in a database backend, execute it against that backend.
Expr.pipe(f, *args, **kwargs) Generic composition function to enable expression pipelining
Expr.verify() Returns True if expression can be compiled to its attached client

Table methods

TableExpr.add_column(expr[, name]) Add indicated column expression to table, producing a new table.
TableExpr.aggregate(table[, metrics, by, having]) Aggregate a table with a given set of reductions, with grouping expressions, and post-aggregation filters.
TableExpr.count() Returns the computed number of rows in the table expression
TableExpr.distinct() Compute set of unique rows/tuples occurring in this table
TableExpr.info([buf]) Similar to pandas DataFrame.info.
TableExpr.filter(table, predicates) Select rows from table based on boolean expressions
TableExpr.get_column(name) Get a reference to a single column from the table
TableExpr.get_columns(iterable) Get multiple columns from the table
TableExpr.group_by([by]) Create an intermediate grouped table expression, pending some group operation to be applied with it.
TableExpr.groupby([by]) Create an intermediate grouped table expression, pending some group operation to be applied with it.
TableExpr.limit(table, n[, offset]) Select the first n rows at beginning of table (may not be deterministic depending on implementation and presence of a sorting).
TableExpr.mutate(table[, exprs]) Convenience function for table projections involving adding columns
TableExpr.projection(table, exprs) Compute new table expression with the indicated column expressions from this table.
TableExpr.relabel(table, substitutions[, ...]) Change table column names, otherwise leaving table unaltered
TableExpr.schema() Get the schema for this table (if one is known)
TableExpr.set_column(table, name, expr) Replace an existing column with a new expression
TableExpr.sort_by(table, sort_exprs) Sort table by the indicated column expressions and sort orders
TableExpr.union(left, right[, distinct]) Form the table set union of two table expressions having identical schemas.
TableExpr.view() Create a new table expression that is semantically equivalent to the current one, but is considered a distinct relation for evaluation purposes (e.g.
TableExpr.join(left, right[, predicates, how]) Perform a relational join between two tables.
TableExpr.cross_join(*args, **kwargs) Perform a cross join (cartesian product) amongst a list of tables, with
TableExpr.inner_join(other[, predicates]) Perform a relational join between two tables.
TableExpr.left_join(other[, predicates]) Perform a relational join between two tables.
TableExpr.outer_join(other[, predicates]) Perform a relational join between two tables.
TableExpr.semi_join(other[, predicates]) Perform a relational join between two tables.
TableExpr.anti_join(other[, predicates]) Perform a relational join between two tables.
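
The join variants above share one summary line, so the difference between inner, semi and anti joins is easy to miss. The sketch below shows the standard relational meaning of each using the standard-library sqlite3 module and SQL; it does not call Ibis, and the tables lhs and rhs are invented sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE lhs (k INTEGER)")
con.execute("CREATE TABLE rhs (k INTEGER)")
con.executemany("INSERT INTO lhs VALUES (?)", [(1,), (2,), (3,)])
con.executemany("INSERT INTO rhs VALUES (?)", [(2,), (3,), (3,), (4,)])


def keys(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in con.execute(sql)]


# Inner join: one output row per matching pair, so key 3 appears twice.
inner = keys("SELECT lhs.k FROM lhs JOIN rhs ON lhs.k = rhs.k ORDER BY lhs.k")

# Semi join: left rows that have at least one match, never duplicated.
semi = keys("SELECT k FROM lhs WHERE EXISTS "
            "(SELECT 1 FROM rhs WHERE rhs.k = lhs.k) ORDER BY k")

# Anti join: left rows that have no match at all.
anti = keys("SELECT k FROM lhs WHERE NOT EXISTS "
            "(SELECT 1 FROM rhs WHERE rhs.k = lhs.k) ORDER BY k")
```

With this data inner is [2, 3, 3], semi is [2, 3] and anti is [1].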

Grouped table methods

GroupedTableExpr.aggregate([metrics])
GroupedTableExpr.count([metric_name]) Convenience function for computing the group sizes (number of rows per group) given a grouped table.
GroupedTableExpr.having(expr) Add a post-aggregation result filter (like the having argument in
GroupedTableExpr.mutate([exprs]) Returns a table projection with analytic / window functions applied.
GroupedTableExpr.order_by(expr) Expressions to use for ordering data for a window function computation.
GroupedTableExpr.over(window) Add a window clause to be applied to downstream analytic expressions
GroupedTableExpr.projection(exprs) Like mutate, but do not include existing table columns
GroupedTableExpr.size([metric_name]) Convenience function for computing the group sizes (number of rows per group) given a grouped table.
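
Group sizes combined with a post-aggregation filter, as described for size and having, correspond to GROUP BY with HAVING in SQL. The sketch below shows that shape with the standard-library sqlite3 module; it is not Ibis output, and the events table and its columns are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (user_id TEXT, amount REAL)")
con.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", 1.0), ("a", 2.0), ("b", 5.0), ("c", 1.0), ("c", 1.0), ("c", 1.0)],
)

# Number of rows per group, keeping only groups with at least two rows.
sizes = con.execute(
    "SELECT user_id, count(*) AS n FROM events "
    "GROUP BY user_id HAVING count(*) >= 2 ORDER BY user_id"
).fetchall()
```

The filter runs after aggregation, so group b (a single row) is dropped and sizes is [('a', 2), ('c', 3)].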

Generic value methods

Scalar or array methods

ValueExpr.between(arg, lower, upper) Check if the input expr falls between the lower/upper bounds passed.
ValueExpr.cast(arg, target_type) Cast value(s) to indicated data type.
ValueExpr.coalesce(*args) Compute the first non-null value(s) from the passed arguments in left-to-right order.
ValueExpr.fillna(arg, fill_value) Replace any null values with the indicated fill value
ValueExpr.isin(arg, values) Check whether the value expression is contained within the indicated list of values.
ValueExpr.notin(arg, values) Like isin, but checks whether this expression’s value(s) are not contained in the passed values.
ValueExpr.nullif(value, null_if_expr) Set values to null if they match/equal a particular expression (scalar or array-valued).
ValueExpr.hash(arg[, how]) Compute an integer hash value for the indicated value expression.
ValueExpr.isnull(arg) Returns true if values are null
ValueExpr.notnull(arg) Returns true if values are not null
ValueExpr.over(expr, window) Turn an aggregation or full-sample analytic operation into a windowed operation.
ValueExpr.typeof(arg) Return the data type of the argument according to the current backend
ValueExpr.add(other)
ValueExpr.sub(other)
ValueExpr.mul(other)
ValueExpr.div(other)
ValueExpr.pow(other)
ValueExpr.rdiv(other)
ValueExpr.rsub(other)
ValueExpr.case(arg) Create a new SimpleCaseBuilder to chain multiple if-else statements.
ValueExpr.cases(arg, case_result_pairs[, ...]) Create a case expression in one shot.
ValueExpr.substitute(arg, value[, ...]) Substitute (replace) one or more values in a value expression
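
nullif and fillna are roughly inverse operations: one turns a sentinel into NULL, the other turns NULL into a value. A plain-Python model of the element-wise behaviour, with None standing in for NULL and invented sample codes (this is not Ibis code):

```python
def nullif(value, null_if):
    """Return None when value equals null_if, otherwise value unchanged."""
    return None if value == null_if else value


def fillna(value, fill_value):
    """Return fill_value when value is None, otherwise value unchanged."""
    return fill_value if value is None else value


codes = [3, -1, 7, -1]                        # -1 marks "missing" here
cleaned = [nullif(c, -1) for c in codes]      # sentinel becomes null
filled = [fillna(c, 0) for c in cleaned]      # null becomes 0
```

cleaned is [3, None, 7, None] and filled is [3, 0, 7, 0].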

Array methods

ArrayExpr.distinct(arg) Compute set of unique values occurring in this array.
ArrayExpr.count(expr[, where]) Compute cardinality / sequence size of expression.
ArrayExpr.min([where])
ArrayExpr.max([where])
ArrayExpr.approx_median([where])
ArrayExpr.approx_nunique([where])
ArrayExpr.group_concat(arg[, sep]) Concatenate values using the indicated separator (comma by default) to
ArrayExpr.nunique(arg) Shorthand for foo.distinct().count(); computing the number of unique values in an array.
ArrayExpr.summary(arg[, exact_nunique, prefix]) Compute a set of summary metrics from the input value expression
ArrayExpr.value_counts(arg[, metric_name]) Compute a frequency table for this value expression
ArrayExpr.first(arg)
ArrayExpr.last(arg)
ArrayExpr.dense_rank(arg) Compute position of first element within each equal-value group in sorted order, ignoring duplicate values.
ArrayExpr.rank(arg) Compute position of first element within each equal-value group in sorted order.
ArrayExpr.lag(arg[, offset, default])
ArrayExpr.lead(arg[, offset, default])
ArrayExpr.cummin(arg) Cumulative min.
ArrayExpr.cummax(arg) Cumulative max.
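
The summaries for rank and dense_rank differ only in how duplicates are counted. The plain-Python model below follows those two sentences literally; positions are assumed to start at 0, in line with row_number on this page, and the input list is invented. It models the described semantics and is not Ibis code.

```python
def min_rank(values):
    """Position of the first element of each equal-value group in sorted order."""
    first_pos = {}
    for position, value in enumerate(sorted(values)):
        first_pos.setdefault(value, position)  # keep the earliest position
    return [first_pos[v] for v in values]


def dense_rank(values):
    """Same idea, but duplicates do not consume positions."""
    position = {v: i for i, v in enumerate(sorted(set(values)))}
    return [position[v] for v in values]


values = [1, 1, 2, 2, 2, 3]
ranks = min_rank(values)
dense = dense_rank(values)
```

ranks is [0, 0, 2, 2, 2, 5] while dense is [0, 0, 1, 1, 1, 2].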

General numeric methods

Scalar or array methods

NumericValue.abs(arg) Absolute value
NumericValue.ceil(arg) Round up to the nearest integer value greater than or equal to this value
NumericValue.floor(arg) Round down to the nearest integer value less than or equal to this value
NumericValue.sign(arg)
NumericValue.exp(arg)
NumericValue.sqrt(arg)
NumericValue.log(arg[, base]) Perform the logarithm using a specified base
NumericValue.ln(arg) Natural logarithm
NumericValue.log2(arg) Logarithm base 2
NumericValue.log10(arg) Logarithm base 10
NumericValue.round(arg[, digits]) Round values either to integer or indicated number of decimal places.
NumericValue.nullifzero(arg) Set values to NULL if they are equal to zero.
NumericValue.zeroifnull(arg)

Array methods

NumericArray.sum([where])
NumericArray.mean([where])
NumericArray.std(arg[, where, how]) Compute standard deviation of numeric array
NumericArray.var(arg[, where, how]) Compute variance of numeric array
NumericArray.cumsum(arg) Cumulative sum.
NumericArray.cummean(arg) Cumulative mean.
NumericArray.bottomk(arg, k[, by])
NumericArray.topk(arg, k[, by]) Produces
NumericArray.bucket(arg, buckets[, closed, ...]) Compute a discrete binning of a numeric array
NumericArray.histogram(arg[, nbins, ...]) Compute a histogram with fixed width bins
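
std and var both take a how argument, which presumably selects between the sample and population estimators; that reading is an assumption, since the listing does not spell it out. The standard-library statistics module shows the difference on invented data:

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5

# Sample estimators divide by n - 1.
sample_var = statistics.variance(values)
sample_std = statistics.stdev(values)

# Population estimators divide by n.
pop_var = statistics.pvariance(values)
pop_std = statistics.pstdev(values)
```

For this data the population variance is exactly 4 and the population standard deviation 2.0; the sample figures are slightly larger. In both cases the standard deviation is the square root of the variance.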

Integer methods

Scalar or array methods

IntegerValue.convert_base(arg, from_base, ...) Convert number (as integer or string) from one base to another
IntegerValue.to_timestamp(arg[, unit]) Convert integer UNIX timestamp (at some resolution) to a timestamp type
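
A plain-Python model of the two conversions above. It is a sketch of the described behaviour, not Ibis code: the unit strings "s", "ms" and "us", the uppercase digits in the output, and the choice of UTC are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
UNITS_PER_SECOND = {"s": 1, "ms": 1000, "us": 1000000}  # assumed unit names
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def convert_base(value, from_base, to_base):
    """Re-express a number (given as int or string) in another base."""
    number = int(str(value), from_base)
    if number == 0:
        return "0"
    digits, remaining = [], abs(number)
    while remaining:
        remaining, digit = divmod(remaining, to_base)
        digits.append(DIGITS[digit])
    sign = "-" if number < 0 else ""
    return sign + "".join(reversed(digits))


def to_timestamp(value, unit="s"):
    """Interpret an integer UNIX timestamp at the given resolution."""
    micros = value * 1000000 // UNITS_PER_SECOND[unit]
    return EPOCH + timedelta(microseconds=micros)


binary = convert_base("ff", 16, 2)
moment = to_timestamp(1500000000)
```

binary is "11111111", and moment is 2017-07-14 02:40:00 UTC; passing 1500000000000 with unit "ms" gives the same instant.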

String methods

All string operations are valid on either scalar or array values.

StringValue.convert_base(arg, from_base, to_base) Convert number (as integer or string) from one base to another
StringValue.length(arg) Compute length of strings
StringValue.lower(arg) Convert string to all lowercase
StringValue.upper(arg) Convert string to all uppercase
StringValue.reverse(arg)
StringValue.ascii_str(arg)
StringValue.strip(arg) Remove whitespace from left and right sides of string
StringValue.lstrip(arg) Remove whitespace from left side of string
StringValue.rstrip(arg) Remove whitespace from right side of string
StringValue.capitalize(arg)
StringValue.contains(arg, substr) Determine if indicated string is exactly contained in the calling string.
StringValue.like(pattern) Wildcard fuzzy matching function equivalent to the SQL LIKE directive.
StringValue.parse_url(arg, extract[, key]) Returns the portion of a URL corresponding to a part specified
StringValue.substr(start[, length]) Pull substrings out of each string value by position and maximum length.
StringValue.left(nchars) Return left-most up to N characters from each string.
StringValue.right(nchars) Split up to nchars starting from end of each string.
StringValue.repeat(n) Returns the argument string repeated n times
StringValue.find(substr[, start, end]) Returns position (0 indexed) of first occurrence of substring,
StringValue.translate(from_str, to_str) Returns string with set of ‘from’ characters replaced by set of ‘to’ characters.
StringValue.find_in_set(str_list) Returns position (0 indexed) of first occurrence of argument within a list of strings.
StringValue.join(strings) Joins a list of strings together using the calling string as a separator
StringValue.replace(arg, pattern, replacement) Replaces each exact occurrence of pattern with given replacement string.
StringValue.lpad(length[, pad]) Returns string of given length by truncating (on right)
StringValue.rpad(length[, pad]) Returns string of given length by truncating (on right)
StringValue.rlike(arg, pattern) Search string values using a regular expression.
StringValue.re_search(arg, pattern) Search string values using a regular expression.
StringValue.re_extract(arg, pattern, index) Returns specified index, 0 indexed, from string based on regex pattern
StringValue.re_replace(arg, pattern, replacement) Replaces match found by regex with replacement string.
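
Several of the string methods above use 0-indexed positions or fixed output lengths, which is easy to get wrong when coming from SQL. The plain-Python model below illustrates find, find_in_set, lpad and rpad as described; it is not Ibis code. Returning -1 on a miss is an assumption borrowed from Python's str.find, and a single space as the default pad is likewise assumed.

```python
def find(string, substr):
    """0-indexed position of the first occurrence, or -1 if absent (assumed)."""
    return string.find(substr)


def find_in_set(string, str_list):
    """0-indexed position of string within str_list, or -1 if absent (assumed)."""
    return str_list.index(string) if string in str_list else -1


def lpad(string, length, pad=" "):
    """Truncate on the right, or pad on the left, to exactly `length` characters."""
    if len(string) >= length:
        return string[:length]
    return (pad * length)[: length - len(string)] + string


def rpad(string, length, pad=" "):
    """Truncate on the right, or pad on the right, to exactly `length` characters."""
    if len(string) >= length:
        return string[:length]
    return string + (pad * length)[: length - len(string)]


position = find("hello world", "o")
index = find_in_set("b", ["a", "b", "c"])
padded = lpad("abc", 5, "-")
```

position is 4, index is 1 and padded is "--abc"; lpad("abcdef", 3) truncates to "abc".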

Timestamp methods

All timestamp operations are valid on either scalar or array values.

TimestampValue.truncate(arg, unit) Zero out smaller-size units beyond indicated unit.
TimestampValue.year()
TimestampValue.month()
TimestampValue.day()
TimestampValue.hour()
TimestampValue.minute()
TimestampValue.second()
TimestampValue.millisecond()
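
truncate zeroes out every unit smaller than the one requested. A plain-Python model using datetime.replace makes that concrete; it is not Ibis code, and the full-word unit names used here are hypothetical (the unit strings Ibis accepts are not listed on this page).

```python
from datetime import datetime

_FIELDS = ["year", "month", "day", "hour", "minute", "second", "microsecond"]
# Smallest legal value for each field that can be reset.
_FLOOR = {"month": 1, "day": 1, "hour": 0, "minute": 0,
          "second": 0, "microsecond": 0}


def truncate(ts, unit):
    """Reset every field finer than `unit` to its smallest value."""
    keep = _FIELDS.index(unit)
    resets = {field: _FLOOR[field] for field in _FIELDS[keep + 1:]}
    return ts.replace(**resets)


ts = datetime(2015, 9, 17, 14, 48, 5, 359000)
to_hour = truncate(ts, "hour")
to_month = truncate(ts, "month")
```

to_hour is 2015-09-17 14:00:00 and to_month is 2015-09-01 00:00:00.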

Boolean methods

BooleanValue.ifelse(arg, true_expr, false_expr) Shorthand for implementing ternary expressions
BooleanArray.any(arg)
BooleanArray.all(arg)
BooleanArray.cumany(arg) Cumulative any
BooleanArray.cumall(arg) Cumulative all
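
cumany and cumall are running versions of any and all. A plain-Python model with itertools.accumulate, on invented sample flags (this is not Ibis code and ignores NULL handling):

```python
from itertools import accumulate
import operator

flags_any = [False, False, True, False]
flags_all = [True, True, False, True]

# Running OR: becomes True at the first True and stays True.
cumany = list(accumulate(flags_any, operator.or_))

# Running AND: becomes False at the first False and stays False.
cumall = list(accumulate(flags_all, operator.and_))
```

cumany is [False, False, True, True] and cumall is [True, True, False, False].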

Category methods

Category is a logical type with either a known or unknown cardinality. Values are represented semantically as integers starting at 0.

CategoryValue.label(arg, labels[, nulls]) Format a known number of categories as strings

Decimal methods

DecimalValue.precision(arg)
DecimalValue.scale(arg)