hive-virtual-schema's Introduction

Hive Virtual Schema

Overview

The Hive Virtual Schema provides an abstraction layer that makes an external Hive database accessible from an Exasol database through regular SQL commands. The contents of the external Hive database are mapped to virtual tables that look like, and can be queried like, any regular Exasol table.

If you want to set up a Virtual Schema for a different database system, please head over to the Virtual Schemas Repository.

Features

  • Access a Hive database in read-only mode from an Exasol database, using a Virtual Schema.

Table of Contents

Information for Users

Find all the documentation in the Virtual Schemas project.

Information for Developers

hive-virtual-schema's People

Contributors

anastasiiasergienko, ckunki, kaklakariada, morazow, pj-spoelders, shmuma, umitbuyuksahin

Forkers

rohankumardubey

hive-virtual-schema's Issues

Update dependencies

Update dependencies to remove references to discontinued repository maven.exasol.com

Currently the build fails with:

Failed to collect dependencies at com.exasol:exasol-testcontainers:jar:6.4.0 -> com.exasol:database-cleaner:jar:1.0.2 -> com.exasol:exasol-jdbc:jar:7.1.11: Failed to read artifact descriptor for com.exasol:exasol-jdbc:jar:7.1.11:

Could not transfer artifact com.exasol:exasol-jdbc:pom:7.1.11 from/to maven.exasol.com (https://maven.exasol.com/artifactory/exasol-releases)

Updating to the latest version of exasol-testcontainers should do the trick.

Please also consider renaming the error codes (requested by #29) in this release.

Ensure all connections are closed on failure

The ticket referenced as "IntRef" reports that when a user cancels an SQL query in the Exasol database, the resulting Hive query is not canceled.

This ticket therefore requests inspecting and, if possible, improving the connection handling in VSHIVE.
See also ticket exasol/virtual-schema-common-jdbc#151 for improved connection handling in VSCJDBC.

If possible, VSHIVE should ensure that it closes all connections, especially when an exception occurs.

Acceptance criteria

  1. Connection handling in VSHIVE is analysed and result of analysis is attached or linked to the current ticket
  2. If possible connection handling is improved

See for example HiveSqlDialect.java#L144

@Override
protected RemoteMetadataReader createRemoteMetadataReader() {
    try {
        return new HiveMetadataReader(this.connectionFactory.getConnection(), this.properties);
    } catch (final SQLException exception) {
        throw new RemoteMetadataReaderException(ExaError.messageBuilder("E-VSHIVE-1")
                .message("Unable to create Hive remote metadata reader. Caused by: {{cause|u}}",
                        exception.getMessage()) //
                .toString(), exception);
    }
}
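
The close-on-failure idea requested above can be sketched as follows. This is a minimal illustration under stated assumptions, not the VSHIVE implementation: FakeConnection and FakeReader are hypothetical stand-ins for java.sql.Connection and HiveMetadataReader, so the example runs without a database. Whether the real connection factory expects its callers to close connections is exactly what the requested analysis would need to establish.

```java
// Hypothetical sketch: none of these types exist in VSHIVE.
final class CloseOnFailureSketch {

    // Stand-in for java.sql.Connection; records whether close() was called.
    static final class FakeConnection implements AutoCloseable {
        boolean closed = false;

        @Override
        public void close() {
            this.closed = true;
        }
    }

    // Stand-in for a metadata reader that keeps using the connection it is given.
    static final class FakeReader {
        final FakeConnection connection;

        FakeReader(final FakeConnection connection, final boolean fail) {
            if (fail) {
                throw new IllegalStateException("simulated failure while creating the reader");
            }
            this.connection = connection;
        }
    }

    // On success the reader owns the open connection.
    // On failure the connection is closed before the exception is rethrown.
    static FakeReader createReader(final FakeConnection connection, final boolean fail) {
        try {
            return new FakeReader(connection, fail);
        } catch (final RuntimeException exception) {
            connection.close();
            throw exception;
        }
    }

    public static void main(final String[] args) {
        final FakeConnection connection = new FakeConnection();
        try {
            createReader(connection, true);
        } catch (final IllegalStateException exception) {
            System.out.println("closed after failure: " + connection.closed);
        }
    }
}
```

A try-with-resources block would not fit here, because it would also close the connection on the success path, where the reader still needs it.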

Update tests to V8 VSHIVE

A Docker image for version 8.18.1 of the Exasol database has been available on Docker Hub since 2023-06-02.

This ticket therefore requests updating the integration tests of VSHIVE to use version 8.18.1 as the latest default version.

Please note sibling-tickets for all JDBC-based virtual schemas.

Update user guide with additional feedback

Situation

We recently received feedback from one of our customers that the documentation, in particular the parts related to Kerberos, could be improved. We should update the docs to incorporate this feedback.

Acceptance Criteria

  • Add short explanation on difference between Apache JDBC driver and Cloudera JDBC driver
  • Add remark on the difference of JDBC properties, for example, SSLTrustStorePwd instead of sslTrustStorePassword
  • Add remark on updating krb5.conf file, for example, moving parameters from included directories
  • Add section on enabling logging

Hide keystore and truststore file password when using SSL enabled Hive connection

Situation

It is possible to connect to a Hive server that has SSL enabled. You can configure the driver to access a specific TrustStore or KeyStore that contains the appropriate certificate. The locations of the TrustStore and KeyStore can be provided in the connection URL.

For example:

'jdbc:hive2://<HOST>:10000;AuthMech=3;SSL=1;AllowSelfSignedCerts=1;SSLKeyStore=/buckets/bfsdefault/bucket1/keystore.jks;SSLKeyStorePwd=<KEYSTORE_PASSWORD>;SSLTrustStore=/buckets/bfsdefault/bucket1/truststore.jks;SSLTrustStorePwd=<TRUSTSTORE_PASSWORD>'

The KeyStore and TrustStore file passwords are visible in the connection URL. Maybe they can be hidden in the IDENTIFIED BY part of the connection object.

Acceptance Criteria

  • The truststore and keystore file passwords are hidden in the connection object
  • The SSL-enabled connection works

Reference to the hadoop-etl-udfs repo

Please note that the user guide in this repo refers to the "create_kerberos_conn.py" file from the now archived repository hadoop-etl-udfs.

Referring to an archived repository looks odd, and the maintainers might even remove the hadoop-etl-udfs repo at some point.

Please consider hosting the "create_kerberos_conn.py" file in an active project.

Link to user Guide to Kerberos on docs.exasol

File KerberosConfigurationCreator.java already links to docs.exasol.

Please note that the official Exasol documentation

  • for the latest version V8 no longer contains any reference to Kerberos,
  • while for V7 it still does.

See IntRef linking to doc-ticket.

Acceptance criteria

The current ticket therefore requests to

  • add a link to the VSHIVE User Guide to make this easier to find and follow for the users of VSHIVE.

Update Apache Thrift

Situation

Dependabot reported the following CVE in the Apache Thrift dependency:

CVE-2020-13949
high severity
Vulnerable versions: >= 0.9.3, <= 0.13.0
Patched version: 0.14.0

In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could send short messages which would result in a large memory allocation, potentially leading to denial of service.

https://github.com/exasol/hive-virtual-schema/security/dependabot/pom.xml/org.apache.thrift:libthrift/open

Acceptance Criteria

  • Apache Thrift dependency updated to 0.14.0 or later

๐Ÿ” CVE-2024-26308: org.apache.commons:commons-compress:jar:1.25.0:test

Summary

Allocation of Resources Without Limits or Throttling vulnerability in Apache Commons Compress. This issue affects Apache Commons Compress: from 1.21 before 1.26.

Users are recommended to upgrade to version 1.26, which fixes the issue.

CVE: CVE-2024-26308
CWE: CWE-770

References

Generated query depends on locale

The Hive VS generates queries that depend on the default locale. For example, when the locale is en_DE, doubles are formatted with a comma as the decimal separator, which can break the query.

To reproduce the problem, run unit tests like this:

_JAVA_OPTIONS="-Duser.country=DE -Duser.language=en" mvn test

This causes unit tests to fail. In contrast, the tests succeed when running with locale en_US:

_JAVA_OPTIONS="-Duser.country=US -Duser.language=en" mvn test

To solve this, we need to upgrade to a new virtual-schema-common-jdbc that fixes exasol/virtual-schema-common-jdbc#119
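
The locale dependence described above can be reproduced in a few lines. This is a general illustration, not the actual fix in virtual-schema-common-jdbc; the class and method names are hypothetical, and Locale.GERMANY stands in for a default locale that uses a decimal comma. Passing an explicit locale such as Locale.ROOT to the formatter makes the output independent of the JVM default.

```java
import java.util.Locale;

// Hypothetical sketch: class and method names are illustrative only.
final class LocaleSafeFormatting {

    // Formats a double the way String.format would under the given locale.
    static String formatWithLocale(final Locale locale, final double value) {
        return String.format(locale, "%.2f", value);
    }

    // Locale.ROOT always uses '.' as the decimal separator,
    // which is what a numeric literal in a SQL query needs.
    static String formatForSql(final double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    public static void main(final String[] args) {
        System.out.println(formatWithLocale(Locale.GERMANY, 1.5)); // prints 1,50
        System.out.println(formatForSql(1.5)); // prints 1.50
    }
}
```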

Review Hive JDBC driver dependencies

Check the exclusions list and remove the org.apache.thrift:libthrift dependency from our pom when a new version of the Hive JDBC driver is released.

Todo list:

  1. org.apache.hive:hive-jdbc (current version 3.1.2)
     If a new version is released:
       • Check if excluded libraries were updated and if they don't contain security risks anymore (comment out exclusions and run audit-dependencies)
       • Check if org.apache.thrift:libthrift version is at least 0.13.0 in the new JDBC driver. If it is, remove the libthrift dependency from our pom file.
  2. org.apache.hbase:hbase-server (current version 2.2.5)
     If a new version is released:
       • Also check if excluded libraries were updated

Check if dialect can support the recently added functions

Situation

We have added a few new functions to the common part recently.
We need to check if some dialects could support them. The list of the new function capabilities to check:

  • FN_BIT_LROTATE -> missing in Hive
  • FN_BIT_RROTATE -> missing in Hive
  • FN_BIT_LSHIFT -> added
  • FN_BIT_RSHIFT -> added
  • FN_FROM_POSIX_TIME -> missing in Hive
  • FN_HOUR -> added
  • FN_INITCAP -> added
  • FN_AGG_EVERY -> missing in Hive
  • FN_AGG_SOME -> missing in Hive
  • FN_AGG_MUL_DISTINCT -> missing in Hive
  • FN_PRED_IS_JSON -> missing in Hive
  • FN_PRED_IS_NOT_JSON -> missing in Hive
  • FN_HASHTYPE_MD5 -> missing in Hive
  • FN_HASHTYPE_SHA1 -> missing in Hive
  • FN_HASHTYPE_SHA256 -> missing in Hive
  • FN_HASHTYPE_SHA512 -> missing in Hive
  • FN_HASHTYPE_TIGER -> missing in Hive
  • FN_AGG_MUL -> missing in Hive
  • FN_JSON_VALUE -> is not fully supported in Hive (no ON EMPTY and ON ERROR clauses), so I've decided not to add it
  • FN_MIN_SCALE -> missing in Hive
  • FN_AGG_LISTAGG -> missing in Hive
  • FN_AGG_LISTAGG_DISTINCT -> missing in Hive
  • FN_AGG_LISTAGG_SEPARATOR -> missing in Hive
  • FN_AGG_LISTAGG_ON_OVERFLOW_ERROR -> missing in Hive
  • FN_AGG_LISTAGG_ON_OVERFLOW_TRUNCATE -> missing in Hive
  • FN_AGG_LISTAGG_ORDER_BY -> missing in Hive
  • FN_AGG_COUNT_TUPLE -> added

๐Ÿ” CVE-2024-25710: org.apache.commons:commons-compress:jar:1.25.0:test

Summary

Loop with Unreachable Exit Condition ('Infinite Loop') vulnerability in Apache Commons Compress. This issue affects Apache Commons Compress: from 1.3 through 1.25.0.

Users are recommended to upgrade to version 1.26.0 which fixes the issue.

CVE: CVE-2024-25710
CWE: CWE-835

References

Update to VSCJDBC 10.0.1

Update dependencies to use the enhanced data type detection for result sets from virtual-schema-common-jdbc.
