Impala is a massively parallel processing (MPP) SQL query engine for Hadoop. In short, it runs queries very quickly, often noticeably faster than Hive.
For a start:
- Take a look at https://www.tableau.com/support/drivers
- Go to https://www.cloudera.com/documentation/other/connectors.html
- Select your OS, driver connection type, and version
- The driver will be downloaded
- Also download the documentation from the same page, or use the link below (note that you can skip this for Linux)
- http://www.cloudera.com/documentation/other/connectors/impala-odbc/latest/Cloudera-ODBC-Driver-for-Impala-Install-Guide.pdf
- Follow the documentation to set up the driver; the main task is to configure the DSN (Data Source Name)
First, open the Windows ODBC manager: go to Windows > Search > 64-bit ODBC Manager.
Then open HUE, the Impala IDE, and check the name of the database you want to connect to.
On macOS, go to the driver's Setup directory:
cd /opt/cloudera/impalaodbc/Setup
- Copy odbc.ini & odbcinst.ini to your home directory
cp odbc.ini ~
cp odbcinst.ini ~
nano ~/odbc.ini
- Change the data source name (DSN) to your liking
- Add HOST=[your_ip]
- Add PORT=21050
- Rename both copies in your home directory so that they are hidden
mv ~/odbc.ini ~/.odbc.ini
mv ~/odbcinst.ini ~/.odbcinst.ini
- Locate the iODBC driver manager libraries. To search, use
sudo find / -name "*iodbc*"
- Then add the following environment variables to the bash profile
nano ~/.bash_profile
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:"/Volumes/Macintosh HD/usr/lib/"
export ODBCINI=~/.odbc.ini
export ODBCINSTINI=~/.odbcinst.ini
export CLOUDERAIMPALAODBCINI=~/.cloudera.impalaodbc.ini
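The HOST and PORT edits described above can also be scripted instead of made by hand in nano. The following is only a minimal sketch: `set_ini_key` is a hypothetical helper (not part of the Cloudera driver), the section name and IP address are placeholders, and it assumes each key appears at most once in the file. The demo runs against a throwaway file so nothing in your home directory is touched.

```shell
#!/bin/sh
# Sketch: set or update a KEY=VALUE line in an odbc.ini-style file.
# set_ini_key is a hypothetical helper, not part of the Cloudera driver.
set_ini_key() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite the line in place (a .bak copy is kept).
        sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key missing: append it (assumes the DSN section is the last one).
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Demo on a temporary file; the section name below is a placeholder.
demo_ini=$(mktemp)
printf '[Sample DSN]\nHOST=\nPORT=\n' > "$demo_ini"
set_ini_key "$demo_ini" HOST 192.0.2.10   # example address, replace with your_ip
set_ini_key "$demo_ini" PORT 21050
cat "$demo_ini"
```

To apply it to the real file, you would call `set_ini_key ~/.odbc.ini HOST your_ip` after checking that the file has a single DSN section.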
Linux configuration is straightforward. After installing the driver, edit the
/etc/odbcinst.ini
& /opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini
files as stated in this link.
From the link, choose the driver and OS (Linux).
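Before editing, it can help to confirm that both Linux configuration files exist and are readable. The sketch below is one way to do that; `check_ini_files` is a hypothetical helper, and the two paths are the ones named above, which may differ if the driver was installed elsewhere.

```shell
#!/bin/sh
# check_ini_files is a hypothetical helper: it reports whether each given
# config file exists and is readable by the current user.
check_ini_files() {
    status=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            echo "ok      $f"
        else
            echo "missing $f"
            status=1
        fi
    done
    return "$status"
}

# Paths taken from the text above; adjust them if your install differs.
check_ini_files /etc/odbcinst.ini \
    /opt/cloudera/impalaodbc/lib/64/cloudera.impalaodbc.ini || true
```

A "missing" line usually means the driver package was not installed or was installed to a different prefix.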
- In Tableau Desktop, go to
Cloudera Hadoop ODBC driver
- Enter Server, Port (21050), Type (Impala)
If the previous ODBC connection does not work,
click on Other Databases (ODBC)
and enter the details for DSN name, Server IP, & Port (21050)
- Other Databases (ODBC)
- Under DSN, select the DSN name you created
- Click Connect
- Enter Server, Port (21050) & database (default)
- Click Sign In
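If Tableau still cannot connect, one thing worth ruling out is that port 21050 is simply unreachable from your machine. The sketch below assumes the `nc` (netcat) utility is installed; `impala_port_open` is a hypothetical helper, not a Tableau or Cloudera tool, and it only tests TCP reachability, not the ODBC configuration itself.

```shell
#!/bin/sh
# impala_port_open is a hypothetical helper, not a Tableau or Cloudera tool.
# It returns 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
impala_port_open() {
    host=$1
    port=${2:-21050}   # default Impala port used in the steps above
    nc -z -w 3 "$host" "$port" >/dev/null 2>&1
}

# Usage (replace your_ip with your Impala server's address):
#   impala_port_open your_ip 21050 && echo "port reachable" || echo "port blocked"
```

If the port is not reachable, the problem is more likely a firewall or network setting than the DSN.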