apache / drill

Apache Drill is a distributed MPP query layer for self describing data

Home Page: https://drill.apache.org/

License: Apache License 2.0

Java 96.94% CMake 0.07% C++ 1.64% Shell 0.33% C 0.09% Batchfile 0.02% FreeMarker 0.58% JavaScript 0.23% CSS 0.01% Dockerfile 0.03% ANTLR 0.06% HTML 0.01% HCL 0.01%
java big-data drill sql hive hadoop jdbc parquet

drill's Introduction

Apache Drill

Apache Drill is a distributed MPP query layer that supports SQL and alternative query languages against NoSQL and Hadoop data storage systems. It was inspired in part by Google's Dremel.

Developers

Please read Environment.md for setting up and running Apache Drill. For complete developer documentation see DevDocs.md.

More Information

Please see the Apache Drill Website or the Apache Drill Documentation for more information including:

  • Remote Execution Installation Instructions
  • Running Drill on Docker instructions
  • Information about how to submit logical and distributed physical plans
  • More example queries and sample data
  • Find out ways to be involved or discuss Drill

Join the community!

Apache Drill is an Apache Software Foundation project and is seeking all types of users and contributions. Please say hello on the Apache Drill mailing list. You can also join our Google Hangouts or our Slack channel if you need help with using or developing Apache Drill (more information can be found on the Apache Drill website).

Export Control

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country's laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See http://www.wassenaar.org/ for more information.
The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software: Java SE Security packages are used to provide support for authentication, authorization and secure sockets communication. The Jetty Web Server is used to provide communication via HTTPS. The Cyrus SASL libraries, Kerberos Libraries and OpenSSL Libraries are used to provide SASL based authentication and SSL communication.

drill's People

Contributors

adeneche, adityakishore, agozhiy, arina-ielchiieva, cgivre, cwestin, dsbos, hsuanyi, ilooner, jacques-n, jaltekruse, jinfengni, jnturton, kazydubb, kkhatua, laurentgo, luocooong, mehant, parthchandra, paul-rogers, pjfanning, ppadma, pwong-mapr, stevenmphillips, tdunning, vdiravka, vkorukanti, vrozov, vrtx, vvysotskyi

drill's Issues

Propagate SqlState through JDBC

Is your feature request related to a problem? Please describe.
We have to handle certain SQL states by parsing exception text, which is fragile and may break with future changes.
In particular, we need to handle:

  • table does not exist when dropping or selecting a table
  • table already exists when creating a table

Describe the solution you'd like
The SQLState field is populated with deterministic codes.

Describe alternatives you've considered
We have to handle certain SQL states by parsing exception text, which is fragile and may break with future changes.

Additional context
My use case: we create cache tables in Drill/Dremio and register related metadata about those tables in our Redis.
It may happen that a cache table disappears or that Redis is purged.
Then we may try to create a duplicate table or drop a non-existent table.
We need to be able to detect the TABLE DOES NOT EXIST and TABLE ALREADY EXISTS states and react accordingly.
Parsing exception text is definitely not an optimal solution: the message can change and bad things can happen.
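
A minimal sketch of what deterministic handling could look like on the client side, assuming the driver populated SQLException.getSQLState(); the connection URL and the 42S02 "table not found" code used here are illustrative assumptions, not values Drill currently emits:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class SqlStateExample {
  // Illustrative SQLState value; the code Drill would actually emit is an assumption.
  private static final String TABLE_NOT_FOUND = "42S02";

  public static void main(String[] args) throws SQLException {
    try (Connection conn = DriverManager.getConnection("jdbc:drill:drillbit=localhost");
         Statement stmt = conn.createStatement()) {
      try {
        stmt.execute("DROP TABLE dfs.tmp.cache_table");
      } catch (SQLException e) {
        // React to the state code instead of parsing the exception message.
        if (TABLE_NOT_FOUND.equals(e.getSQLState())) {
          System.out.println("Cache table was already gone; nothing to drop.");
        } else {
          throw e;
        }
      }
    }
  }
}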

[Notice] Welcoming ideas and contributions for the next release, 1.20.0

Is your feature request related to a problem? Please describe.
  Hello everyone. Apache Drill is a community-driven project and we welcome contributions of any kind. The Drill team has invested a lot of time in supporting the community since Q4 2020.

Describe the solution you'd like
  We regularly discuss how to attract more developers to contribute, for example by adding friendlier guides to the website, marking tasks for newcomers in JIRA, and keeping the documentation up to date. There are many ways to take part in Drill:

  1. Create issues describing what you want (use Issues on GitHub).
  2. Create a JIRA ticket on issues.apache.org.
  3. Fork and clone the repository and contribute a PR (see the quick-start guide).
  4. Learn about Drill:
     1. Join discussions on the Mailing Lists.
     2. Talk about anything on the Slack channel. (Extremely active!)
     3. Help with testing, feedback, and resolving issues through the channels above.

Describe alternatives you've considered
  Before this, Drill did not have Issues enabled on GitHub. However, we are happy to see that the Drill community has become more active, so it is time to use a new, simple way to talk with our users and developers.
  I recommend creating an issue on GitHub if you want to start a discussion, and then creating a JIRA ticket on the Apache issue tracker. That way we both keep the knowledge in JIRA and can quickly support users and developers on GitHub.

Additional context

  1. Feel free to contact us if you want to share your use case.
  2. Community Over Code. Hope to see you soon...

Add a setting to the HTTP Storage Plugin to access HTTPS without validating the SSL certificate

Many servers deployed locally use self-signed certificates, which cannot be verified and therefore cannot be used with the HTTP Storage Plugin.

This returns an error like the following:

Error message: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

It would be great to add a setting to the HTTP Storage Plugin to access HTTPS sites without validating the SSL certificate or verifying the hostname.

For example, something similar to the Apache Drill SSL JDBC connection parameters disableCertificateVerification and disableHostVerification.
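
A sketch of what such a setting might look like in an HTTP storage plugin configuration; the verifySslCert and verifyHostname field names are purely hypothetical and not part of the current plugin:

{
  "type": "http",
  "connections": {
    "internalApi": {
      "url": "https://self-signed.internal:8443/api/",
      "method": "GET",
      "verifySslCert": false,
      "verifyHostname": false,
      "inputType": "json"
    }
  },
  "proxyType": "direct",
  "enabled": true
}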

Thanks

Trying to use the jdbcldap driver: ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Exception during pool initialization. com.octetstring.jdbcLdap.jndi.SQLNamingException: Operations Error

Hello

I'm trying to connect Apache Drill to an ldap server through the jdbcldap driver (https://github.com/elbosso/openldap-jdbcldap/)

But Apache Drill (1.20) fails with com.octetstring.jdbcLdap.jndi.SQLNamingException: Operations Error

22:13:04.528 [1dbb462f-c8ea-71a8-63fc-3b4a03b7879b:foreman] ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Exception during pool initialization.
com.octetstring.jdbcLdap.jndi.SQLNamingException: Operations Error
	at com.novell.ldap.LDAPResponse.getResultException(LDAPResponse.java:402)
	at com.novell.ldap.LDAPResponse.chkResultCode(LDAPResponse.java:365)
	at com.novell.ldap.LDAPSearchResults.next(LDAPSearchResults.java:289)
	at com.novell.ldap.LDAPConnection.read(LDAPConnection.java:2897)
	at com.novell.ldap.LDAPConnection.read(LDAPConnection.java:2864)
	at com.novell.ldap.LDAPConnection.fetchSchema(LDAPConnection.java:4187)
	at com.octetstring.jdbcLdap.util.TableDef.<init>(TableDef.java:108)
	at com.octetstring.jdbcLdap.jndi.JndiLdapConnection.generateTables(JndiLdapConnection.java:819)
	at com.octetstring.jdbcLdap.jndi.JndiLdapConnection.<init>(JndiLdapConnection.java:404)
	at com.octetstring.jdbcLdap.sql.JdbcLdapDriver.connect(JdbcLdapDriver.java:100)
	at com.zaxxer.hikari.util.DriverDataSource.getConnection(DriverDataSource.java:138)
	at com.zaxxer.hikari.pool.PoolBase.newConnection(PoolBase.java:364)
	at com.zaxxer.hikari.pool.PoolBase.newPoolEntry(PoolBase.java:206)
	at com.zaxxer.hikari.pool.HikariPool.createPoolEntry(HikariPool.java:476)
	at com.zaxxer.hikari.pool.HikariPool.checkFailFast(HikariPool.java:561)
	at com.zaxxer.hikari.pool.HikariPool.<init>(HikariPool.java:115)
	at com.zaxxer.hikari.HikariDataSource.<init>(HikariDataSource.java:81)
	at org.apache.drill.exec.store.jdbc.JdbcStoragePlugin.initDataSource(JdbcStoragePlugin.java:161)
	at org.apache.drill.exec.store.jdbc.JdbcStoragePlugin.<init>(JdbcStoragePlugin.java:56)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
	at org.apache.drill.exec.store.ClassicConnectorLocator.create(ClassicConnectorLocator.java:273)
	at org.apache.drill.exec.store.ConnectorHandle.newInstance(ConnectorHandle.java:102)
	at org.apache.drill.exec.store.PluginHandle.plugin(PluginHandle.java:142)
	at org.apache.drill.exec.store.StoragePluginRegistryImpl.getPlugin(StoragePluginRegistryImpl.java:563)
	at org.apache.calcite.jdbc.DynamicRootSchema.loadSchemaFactory(DynamicRootSchema.java:107)
	at org.apache.calcite.jdbc.DynamicRootSchema.getSchema(DynamicRootSchema.java:87)
	at org.apache.calcite.jdbc.DynamicRootSchema.getImplicitSubSchema(DynamicRootSchema.java:73)
	at org.apache.calcite.jdbc.CalciteSchema.getSubSchema(CalciteSchema.java:265)
	at org.apache.calcite.sql.validate.SqlValidatorUtil.getSchema(SqlValidatorUtil.java:1050)
	at org.apache.drill.exec.planner.sql.conversion.DrillCalciteCatalogReader.isValidSchema(DrillCalciteCatalogReader.java:171)
	at org.apache.drill.exec.planner.sql.conversion.DrillValidator.validateFrom(DrillValidator.java:66)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:3383)
	at org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
	at org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:1009)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:969)
	at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:216)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:944)
	at org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:651)
	at org.apache.drill.exec.planner.sql.conversion.SqlConverter.validate(SqlConverter.java:190)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:647)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:195)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:169)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:128)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:593)
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:274)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
22:13:05.631 [1dbb462f-c8ea-71a8-63fc-3b4a03b7879b:foreman] ERROR com.zaxxer.hikari.pool.HikariPool - HikariPool-2 - Exception during pool initialization.
com.octetstring.jdbcLdap.jndi.SQLNamingException: Operations Error

I opened also a ticket at
elbosso/openldap-jdbcldap#5

FYI, I can connect to LDAP (and jdbcldap) using the sqlline command-line tool after solving some issues with the help of the sqlline developers (@snuyanzin and @julianhyde) and @elbosso (julianhyde/sqlline#450)

Thanks in advance
Matteo

HTTP Storage Plugin: allow JSON in postBody

Is your feature request related to a problem? Please describe.

Hi all,
I just started evaluating Drill for querying an RDBMS and a JSON document store (CouchDB) at the same time. Connecting to the CouchDB REST API gives me a bit of a headache though:

I set up the HTTP storage plugin to connect to my CouchDB web service's {db}/_find endpoint, which allows querying documents. The query parameters need to be supplied as a JSON "selector" object in the POST body (CouchDB selector syntax). My experiments and the docs tell me that it is not possible to pass a JSON object as the postBody parameter of an HTTP storage plugin connection though:

postBody: Contains data, in the form of key value pairs, which are sent during a POST request. The post body should be in the form of a block of text with key/value pairs: (...)

Describe the solution you'd like

Instead of only accepting key=value pairs in plain text as content of the postBody, allow full JSON object support.

Maybe this is already possible somehow? If so, I would greatly appreciate a quick pointer in the right direction.

Best regards,
Markus


My plugin config:

{
  "type": "http",
  "connections": {
    "couch": {
      "url": "http://thisismyhost.url:8080/medic/_find",
      "method": "POST",
      "headers": {
        "Content-Type": "application/json"
      },
      "authType": "basic",
      "userName": "user",
      "password": "password",
      "postBody": "{\"selector\":{\"_id\":{\"$gt\":0}}}",
      "params": null,
      "dataPath": "docs",
      "requireTail": false,
      "inputType": "json",
      "xmlDataLevel": 1
    }
  },
  "proxyType": "direct",
  "enabled": true
}

Slack invite expired

Hello, I would like to join your Slack but unfortunately the link in the readme seems to be expired 🙁

username as dynamic property in storage plugin

Is your feature request related to a problem? Please describe.
I would like to use Phoenix Query Server with active impersonation through the JDBC storage plugin in Drill.

At the moment we have Phoenix Query Server working with Kerberos auth and a dedicated keytab for Drill. Drill is configured with impersonation, so if Bob submits a query to Drill, Drill impersonates Bob against HDFS and HBase.

Because of the fixed keytab and principal in the JDBC string, all queries against Phoenix are submitted as the user specified in the keytab.

To fulfill all security requirements we need active impersonation against Phoenix as well.

Describe the solution you'd like
I would like to specify the original username of the Drill query in the connection string of the JDBC storage plugin, as described here: https://phoenix.apache.org/server.html#Impersonation

{
  "type": "jdbc",
  "driver": "org.apache.phoenix.queryserver.client.Driver",
  "url": "jdbc:phoenix:thin:url=http://localhost:8765?doAs=$user;serialization=PROTOBUF;authentication=SPNEGO;principal=drill/[email protected];keytab=/etc/hadoop/conf/drill.keytab",
  "username": null,
  "password": null,
  "caseInsensitiveTableNames": false,
  "enabled": true
}

If Bob sends a query like SELECT * FROM phoenix.schema.table, Drill should send the query with doAs=Bob (see the sketch below).
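
For illustration only, assuming Drill substituted the submitting user's name for the $user placeholder at query time, the effective connection URL for Bob's query would look like this (hypothetical behavior, not something Drill does today):

jdbc:phoenix:thin:url=http://localhost:8765?doAs=Bob;serialization=PROTOBUF;authentication=SPNEGO;principal=drill/[email protected];keytab=/etc/hadoop/conf/drill.keytab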

Describe alternatives you've considered
none

Additional context
If I try to update the current configuration with the above example, I get the following error:

Please retry: Error while creating / updating storage : Remote driver error: RuntimeException: org.apache.phoenix.exception.PhoenixIOException: 
org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=$user, scope=SYSTEM:CATALOG, params=[table=SYSTEM:CATALOG],action=EXEC) 
 at org.apache.hadoop.hbase.security.access.AccessChecker.requirePermission(AccessChecker.java:127) 
 at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:425) 
 at org.apache.hadoop.hbase.security.access.AccessController.preEndpointInvocation(AccessController.java:2123) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$69.call(RegionCoprocessorHost.java:1667) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$EndpointOperation.call(RegionCoprocessorHost.java:1771) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1827) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1810) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preEndpointInvocation(RegionCoprocessorHost.java:1662) 
 at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8518) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2282) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2264) 
 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36808) 
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2399) 
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) -> PhoenixIOException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=$user, scope=SYSTEM:CATALOG, params=[table=SYSTEM:CATALOG],action=EXEC) 
 at org.apache.hadoop.hbase.security.access.AccessChecker.requirePermission(AccessChecker.java:127) 
 at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:425) 
 at org.apache.hadoop.hbase.security.access.AccessController.preEndpointInvocation(AccessController.java:2123) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$69.call(RegionCoprocessorHost.java:1667) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$EndpointOperation.call(RegionCoprocessorHost.java:1771) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1827) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1810) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preEndpointInvocation(RegionCoprocessorHost.java:1662) 
 at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8518) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2282) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2264) 
 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36808) 
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2399) 
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) -> AccessDeniedException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=$user, scope=SYSTEM:CATALOG, params=[table=SYSTEM:CATALOG],action=EXEC) 
 at org.apache.hadoop.hbase.security.access.AccessChecker.requirePermission(AccessChecker.java:127) 
 at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:425) 
 at org.apache.hadoop.hbase.security.access.AccessController.preEndpointInvocation(AccessController.java:2123) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$69.call(RegionCoprocessorHost.java:1667) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$EndpointOperation.call(RegionCoprocessorHost.java:1771) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1827) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1810) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preEndpointInvocation(RegionCoprocessorHost.java:1662) 
 at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8518) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2282) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2264) 
 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36808) 
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2399) 
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291) -> RemoteWithExtrasException: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions (user=$user, scope=SYSTEM:CATALOG, params=[table=SYSTEM:CATALOG],action=EXEC) 
 at org.apache.hadoop.hbase.security.access.AccessChecker.requirePermission(AccessChecker.java:127) 
 at org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:425) 
 at org.apache.hadoop.hbase.security.access.AccessController.preEndpointInvocation(AccessController.java:2123) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$69.call(RegionCoprocessorHost.java:1667) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost$EndpointOperation.call(RegionCoprocessorHost.java:1771) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperation(RegionCoprocessorHost.java:1827) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.execOperationWithResult(RegionCoprocessorHost.java:1810) 
 at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.preEndpointInvocation(RegionCoprocessorHost.java:1662) 
 at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8518) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2282) 
 at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2264) 
 at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36808) 
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2399) 
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311) 
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)

So we can see that Phoenix Query Server tries to impersonate, but receives the literal string $user.

Elasticsearch and RDBMS (Oracle) JOIN fails with ClassCastException

Describe the bug
I am trying to join data from Elasticsearch with data from an Oracle RDBMS. My query looks like this:

SELECT o.a, o.b, o.c, e.d, e.f FROM oracle.schema.table o JOIN elastic.index e ON o.a=e.a

My problem is that this query produces a ClassCastException:

Caused by: java.lang.ClassCastException: org.apache.calcite.plan.Convention$Impl cannot be cast to org.apache.drill.exec.store.jdbc.DrillJdbcConvention
	at org.apache.drill.exec.store.jdbc.JdbcPrel.<init>(JdbcPrel.java:55)
	at org.apache.drill.exec.store.jdbc.JdbcIntermediatePrel.finalizeRel(JdbcIntermediatePrel.java:65)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:322)
	at org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:272)
	at org.apache.calcite.rel.RelShuttleImpl.visitChild(RelShuttleImpl.java:55)
	at org.apache.calcite.rel.RelShuttleImpl.visitChildren(RelShuttleImpl.java:69)
	at org.apache.calcite.rel.RelShuttleImpl.visit(RelShuttleImpl.java:131)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:324)
	at org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:272)
	at org.apache.calcite.rel.RelShuttleImpl.visitChild(RelShuttleImpl.java:55)
	at org.apache.calcite.rel.RelShuttleImpl.visitChildren(RelShuttleImpl.java:69)
	at org.apache.calcite.rel.RelShuttleImpl.visit(RelShuttleImpl.java:131)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:324)
	at org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:272)
	at org.apache.calcite.rel.RelShuttleImpl.visitChild(RelShuttleImpl.java:55)
	at org.apache.calcite.rel.RelShuttleImpl.visitChildren(RelShuttleImpl.java:69)
	at org.apache.calcite.rel.RelShuttleImpl.visit(RelShuttleImpl.java:131)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:324)
	at org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:272)
	at org.apache.calcite.rel.RelShuttleImpl.visitChild(RelShuttleImpl.java:55)
	at org.apache.calcite.rel.RelShuttleImpl.visitChildren(RelShuttleImpl.java:69)
	at org.apache.calcite.rel.RelShuttleImpl.visit(RelShuttleImpl.java:131)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:324)
	at org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:272)
	at org.apache.calcite.rel.RelShuttleImpl.visitChild(RelShuttleImpl.java:55)
	at org.apache.calcite.rel.RelShuttleImpl.visitChildren(RelShuttleImpl.java:69)
	at org.apache.calcite.rel.RelShuttleImpl.visit(RelShuttleImpl.java:131)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:324)
	at org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:272)
	at org.apache.calcite.rel.RelShuttleImpl.visitChild(RelShuttleImpl.java:55)
	at org.apache.calcite.rel.RelShuttleImpl.visitChildren(RelShuttleImpl.java:69)
	at org.apache.calcite.rel.RelShuttleImpl.visit(RelShuttleImpl.java:131)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler$PrelFinalizer.visit(DefaultSqlHandler.java:324)
	at org.apache.calcite.rel.AbstractRelNode.accept(AbstractRelNode.java:272)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:437)
	at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:174)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:283)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPhysicalPlan(DrillSqlWorker.java:163)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.convertPlan(DrillSqlWorker.java:140)
	at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:93)
	at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:592)
	at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:273)
	... 1 common frames omitted

I've also tried joining through the WHERE clause, but with the same result.

To Reproduce
Steps to reproduce the behavior:

  1. Configure Elasticsearch and Oracle storage
  2. Run a query similar to the one in the description
  3. See error

Expected behavior
A dataset is built consisting of columns retrieved from Elasticsearch and from Oracle joined on the common property

Screenshots
N/A

Desktop (please complete the following information):

  • Docker container built from commit 512ac6c
  • Chrome

Improve Docker image and documentation

It would be great if the Docker image could be improved:

  1. The Drill documentation refers to the $DRILL_HOME environment variable, which is not set in the Docker image. Additionally, the documentation says the configuration can be found in /etc/drill/conf, while in the Docker image Drill is installed under /opt/drill/conf.
  2. Add information on how to mount custom configuration files into the Docker image (see the sketch after this list), or, even better, make sure there is a Docker section in Drill's documentation wherever appropriate.
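
A minimal sketch of the kind of example the documentation could include, assuming the apache/drill image and the /opt/drill/conf path mentioned above:

docker run -it --name drill \
  -p 8047:8047 -p 31010:31010 \
  -v /path/on/host/drill-override.conf:/opt/drill/conf/drill-override.conf \
  apache/drill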

Drill Apache Druid Plugin support for V0.17 and up

The current Drill Apache Druid plugin (drill/contrib/storage-druid) is not compatible with Druid versions later than 0.16.
Later versions of Druid have deprecated the SELECT API in favor of the SCAN API.

Rework the plugin to issue SCAN calls instead of SELECT calls so that it can be used with later versions of Druid.

I'm not aware of any alternatives.
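
For reference, a rough sketch of the native Scan query shape the reworked plugin would need to issue; the data source, interval, and column names below are illustrative:

{
  "queryType": "scan",
  "dataSource": "my_datasource",
  "intervals": ["2020-01-01/2021-01-01"],
  "columns": ["__time", "dim1", "metric1"],
  "resultFormat": "compactedList",
  "limit": 1000
}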

Snowflake - adopt JDBC parameter CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX

Is your feature request related to a problem? Please describe.
I specify JDBC URL like this:
jdbc:snowflake://<account>.snowflakecomputing.com:443?warehouse=<warehouse>&db=<db>&CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX=true

Even though I use CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX=true, INFORMATION_SCHEMA.SCHEMATA contains schemata from all warehouses and databases (the user has access to everything).

In our platform, when I use this parameter and call e.g. dbMetadata.getTables(), it collects only tables from the corresponding warehouse and database.

Querying all warehouses can be very expensive, so a SELECT from Drill's INFORMATION_SCHEMA takes ages.

Describe the solution you'd like
The CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX parameter should be properly propagated to the connection from Drill to Snowflake, so that only objects from the warehouse/database specified in the JDBC URL are populated into Drill's INFORMATION_SCHEMA.

Describe alternatives you've considered
The only workaround is to limit the privileges of the DB user (which warehouses/databases they can see). This is not a valid workaround in our case (the testing user has read access to a huge number of testing warehouses/databases).

XML format plugin concatenates attribute values from multiple sub-elements with the same name

When an XML element has multiple sub-elements with the same name, and those sub-elements have attributes, the attribute values get concatenated in a way that makes them impossible to separate.

For example, start with the documentation's published "list of books" example. Add three sub-elements named "extra" to one of the books, each having two attributes (name and value). The following is excerpted from an XML that I have attached.

<book>
<author>Mark Twain</author>
<title>The Adventures of Tom Sawyer</title>
<category>FICTION</category>
<year>1876</year>
<extra name="width" value="6"/>
<extra name="height" value="10"/>
<extra name="depth" value="2"/>
</book>

The output for this turns into:
+-----------------------------------------------------------------+------------+---------------------------------+-------------+------+-----------------------------------------+
| attributes | author | title | category | year | authors |
+-----------------------------------------------------------------+------------+---------------------------------+-------------+------+-----------------------------------------+
| {"extra_name":"widthheightdepth","extra_value":"6102"} | Mark Twain | The Adventures of Tom Sawyer | FICTION | 1876 | {} |

It shows only one value for the "extra_name" attributes, which is the concatenation of the names "width", "height", and "depth" into "widthheightdepth". Similarly it only shows one value for the "extra_value" attributes, which is the concatenation of the values "6" "10" and "2" into "6102". Unfortunately it's impossible to know how to separate those concatenated strings.

I would have expected to see something like one of the following for the attributes output instead, so that the different attribute values are separable:

{{"extra_name":"width","extra_value":"6"},{"extra_name":"height","extra_value":"10"},{"extra_name":"depth","extra_value":"2"}}
or
{"extra_name":["width","height","depth"],"extra_value":["6","10","2"]}

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser: N/A
  • Version: 1.19.0

books-multiple-extras.xml.txt

Remove/Update contrib/storage-hive copy of log4j Strings.java

Is your feature request related to a problem? Please describe.

Relates to pjfanning/excel-streaming-reader#76.

https://github.com/apache/drill/blob/master/contrib/storage-hive/core/src/main/java/org/apache/logging/log4j/util/Strings.java is a copy of the real Log4j class but causes issues for Apache POI Excel code that relies on the latest Log4j code.

Describe the solution you'd like

If this class is really necessary, could you consider updating it to match the latest Log4j code? As long as Drill keeps its own copy of this Strings class, it is likely to regularly hit issues with being incompatible with latest log4j releases.

https://issues.apache.org/jira/browse/HIVE-23088 is marked as fixed in recent Hive releases, so ideally the Drill team should remove the copy of the Strings class in Drill contrib/storage-hive. I can only find Hive 3.1.2 in Maven Central (which doesn't have the HIVE-23088 fix), so I'm not really sure what the Apache Hive team is doing with the release numbers.

If you really need to keep the copy of the log4j Strings class, the latest log4j release code seems safe in that it is self-contained (other than using core Java code).

https://github.com/apache/logging-log4j2/blob/rel/2.14.1/log4j-api/src/main/java/org/apache/logging/log4j/util/Strings.java

package org.apache.logging.log4j.util;

import java.util.Iterator;
import java.util.Locale;
import java.util.Objects;

/**
 * <em>Consider this class private.</em>
 * 
 * @see <a href="http://commons.apache.org/proper/commons-lang/">Apache Commons Lang</a>
 */
public final class Strings {

    /**
     * The empty string.
     */
    public static final String EMPTY = "";
    
    /**
     * OS-dependent line separator, defaults to {@code "\n"} if the system property {@code ""line.separator"} cannot be
     * read.
     */
    public static final String LINE_SEPARATOR = PropertiesUtil.getProperties().getStringProperty("line.separator",
            "\n");

    /**
     * Returns a double quoted string.
     * 
     * @param str a String
     * @return {@code "str"}
     */
    public static String dquote(final String str) {
        return Chars.DQUOTE + str + Chars.DQUOTE;
    }
    
    /**
     * Checks if a String is blank. A blank string is one that is either
     * {@code null}, empty, or all characters are {@link Character#isWhitespace(char)}.
     *
     * @param s the String to check, may be {@code null}
     * @return {@code true} if the String is {@code null}, empty, or or all characters are {@link Character#isWhitespace(char)}
     */
    public static boolean isBlank(final String s) {
        if (s == null || s.isEmpty()) {
            return true;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isWhitespace(c)) {
                return false;
            }
        }
        return true;
    }

    /**
     * <p>
     * Checks if a CharSequence is empty ("") or null.
     * </p>
     *
     * <pre>
     * Strings.isEmpty(null)      = true
     * Strings.isEmpty("")        = true
     * Strings.isEmpty(" ")       = false
     * Strings.isEmpty("bob")     = false
     * Strings.isEmpty("  bob  ") = false
     * </pre>
     *
     * <p>
     * NOTE: This method changed in Lang version 2.0. It no longer trims the CharSequence. That functionality is
     * available in isBlank().
     * </p>
     *
     * <p>
     * Copied from Apache Commons Lang org.apache.commons.lang3.StringUtils.isEmpty(CharSequence)
     * </p>
     *
     * @param cs the CharSequence to check, may be null
     * @return {@code true} if the CharSequence is empty or null
     */
    public static boolean isEmpty(final CharSequence cs) {
        return cs == null || cs.length() == 0;
    }

    /**
     * Checks if a String is not blank. The opposite of {@link #isBlank(String)}.
     *
     * @param s the String to check, may be {@code null}
     * @return {@code true} if the String is non-{@code null} and has content after being trimmed.
     */
    public static boolean isNotBlank(final String s) {
        return !isBlank(s);
    }

    /**
     * <p>
     * Checks if a CharSequence is not empty ("") and not null.
     * </p>
     *
     * <pre>
     * Strings.isNotEmpty(null)      = false
     * Strings.isNotEmpty("")        = false
     * Strings.isNotEmpty(" ")       = true
     * Strings.isNotEmpty("bob")     = true
     * Strings.isNotEmpty("  bob  ") = true
     * </pre>
     *
     * <p>
     * Copied from Apache Commons Lang org.apache.commons.lang3.StringUtils.isNotEmpty(CharSequence)
     * </p>
     *
     * @param cs the CharSequence to check, may be null
     * @return {@code true} if the CharSequence is not empty and not null
     */
    public static boolean isNotEmpty(final CharSequence cs) {
        return !isEmpty(cs);
    }

    /**
     * <p>Joins the elements of the provided {@code Iterable} into
     * a single String containing the provided elements.</p>
     *
     * <p>No delimiter is added before or after the list. Null objects or empty
     * strings within the iteration are represented by empty strings.</p>
     *
     * @param iterable  the {@code Iterable} providing the values to join together, may be null
     * @param separator  the separator character to use
     * @return the joined String, {@code null} if null iterator input
     */
    public static String join(final Iterable<?> iterable, final char separator) {
        if (iterable == null) {
            return null;
        }
        return join(iterable.iterator(), separator);
    }

    /**
     * <p>Joins the elements of the provided {@code Iterator} into
     * a single String containing the provided elements.</p>
     *
     * <p>No delimiter is added before or after the list. Null objects or empty
     * strings within the iteration are represented by empty strings.</p>
     *
     * @param iterator  the {@code Iterator} of values to join together, may be null
     * @param separator  the separator character to use
     * @return the joined String, {@code null} if null iterator input
     */
    public static String join(final Iterator<?> iterator, final char separator) {

        // handle null, zero and one elements before building a buffer
        if (iterator == null) {
            return null;
        }
        if (!iterator.hasNext()) {
            return EMPTY;
        }
        final Object first = iterator.next();
        if (!iterator.hasNext()) {
            return Objects.toString(first, EMPTY);
        }

        // two or more elements
        final StringBuilder buf = new StringBuilder(256); // Java default is 16, probably too small
        if (first != null) {
            buf.append(first);
        }

        while (iterator.hasNext()) {
            buf.append(separator);
            final Object obj = iterator.next();
            if (obj != null) {
                buf.append(obj);
            }
        }

        return buf.toString();
    }

    /**
     * <p>Gets the leftmost {@code len} characters of a String.</p>
     *
     * <p>If {@code len} characters are not available, or the
     * String is {@code null}, the String will be returned without
     * an exception. An empty String is returned if len is negative.</p>
     *
     * <pre>
     * StringUtils.left(null, *)    = null
     * StringUtils.left(*, -ve)     = ""
     * StringUtils.left("", *)      = ""
     * StringUtils.left("abc", 0)   = ""
     * StringUtils.left("abc", 2)   = "ab"
     * StringUtils.left("abc", 4)   = "abc"
     * </pre>
     *
     * <p>
     * Copied from Apache Commons Lang org.apache.commons.lang3.StringUtils.
     * </p>
     * 
     * @param str  the String to get the leftmost characters from, may be null
     * @param len  the length of the required String
     * @return the leftmost characters, {@code null} if null String input
     */
    public static String left(final String str, final int len) {
        if (str == null) {
            return null;
        }
        if (len < 0) {
            return EMPTY;
        }
        if (str.length() <= len) {
            return str;
        }
        return str.substring(0, len);
    }

    /**
     * Returns a quoted string.
     * 
     * @param str a String
     * @return {@code 'str'}
     */
    public static String quote(final String str) {
        return Chars.QUOTE + str + Chars.QUOTE;
    }
    
    /**
     * <p>
     * Removes control characters (char &lt;= 32) from both ends of this String returning {@code null} if the String is
     * empty ("") after the trim or if it is {@code null}.
     *
     * <p>
     * The String is trimmed using {@link String#trim()}. Trim removes start and end characters &lt;= 32.
     * </p>
     *
     * <pre>
     * Strings.trimToNull(null)          = null
     * Strings.trimToNull("")            = null
     * Strings.trimToNull("     ")       = null
     * Strings.trimToNull("abc")         = "abc"
     * Strings.trimToNull("    abc    ") = "abc"
     * </pre>
     *
     * <p>
     * Copied from Apache Commons Lang org.apache.commons.lang3.StringUtils.trimToNull(String)
     * </p>
     *
     * @param str the String to be trimmed, may be null
     * @return the trimmed String, {@code null} if only chars &lt;= 32, empty or null String input
     */
    public static String trimToNull(final String str) {
        final String ts = str == null ? null : str.trim();
        return isEmpty(ts) ? null : ts;
    }

    private Strings() {
        // empty
    }

    /**
     * Shorthand for {@code str.toUpperCase(Locale.ROOT);}
     * @param str The string to upper case.
     * @return a new string
     * @see String#toLowerCase(Locale)
     */
    public static String toRootUpperCase(final String str) {
        return str.toUpperCase(Locale.ROOT);
    }

    /**
     * Creates a new string repeating given {@code str} {@code count} times.
     * @param str input string
     * @param count the repetition count
     * @return the new string
     * @throws IllegalArgumentException if either {@code str} is null or {@code count} is negative
     */
    public static String repeat(final String str, final int count) {
        Objects.requireNonNull(str, "str");
        if (count < 0) {
            throw new IllegalArgumentException("count");
        }
        StringBuilder sb = new StringBuilder(str.length() * count);
        for (int index = 0; index < count; index++) {
            sb.append(str);
        }
        return sb.toString();
    }

}

Embedded startup with Kerberos error

I have configured drill-override.conf as follows:

drill.exec: {
  cluster-id: "drillbits1",
  zk.connect: "localhost:2181",
  impersonation {
    enabled: true,
    max_chained_user_hops: 3
  },
  security: {
                user.auth.enabled: true,
                auth.mechanisms: ["KERBEROS"],
                auth.principal: "hbase/[email protected]",
                auth.keytab: "/root/hbase.keytab" ,
                user.encryption.sasl.enabled: true
              }
}

but when I run in embedded mode it breaks like this:

java.sql.SQLException: Failure in starting embedded Drillbit: org.apache.drill.exec.exception.DrillbitStartupException: Authentication is enabled for WebServer but none of the security mechanism was configured properly. Please verify the configurations and try again.
	at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:131)
	at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newDrillConnection(DrillJdbc41Factory.java:67)
	at org.apache.drill.jdbc.impl.DrillFactory.newConnection(DrillFactory.java:67)
	at org.apache.calcite.avatica.UnregisteredDriver.connect(UnregisteredDriver.java:138)
	at org.apache.drill.jdbc.Driver.connect(Driver.java:75)
	at sqlline.DatabaseConnection.connect(DatabaseConnection.java:135)
	at sqlline.DatabaseConnection.getConnection(DatabaseConnection.java:192)
	at sqlline.Commands.connect(Commands.java:1364)
	at sqlline.Commands.connect(Commands.java:1244)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sqlline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:38)
	at sqlline.SqlLine.dispatch(SqlLine.java:730)
	at sqlline.SqlLine.initArgs(SqlLine.java:410)
	at sqlline.SqlLine.begin(SqlLine.java:515)
	at sqlline.SqlLine.start(SqlLine.java:267)
	at sqlline.SqlLine.main(SqlLine.java:206)
Caused by: org.apache.drill.exec.exception.DrillbitStartupException: Authentication is enabled for WebServer but none of the security mechanism was configured properly. Please verify the configurations and try again.
	at org.apache.drill.exec.server.rest.auth.DrillHttpSecurityHandlerProvider.<init>(DrillHttpSecurityHandlerProvider.java:108)
	at org.apache.drill.exec.server.rest.WebServer.createServletContextHandler(WebServer.java:227)
	at org.apache.drill.exec.server.rest.WebServer.start(WebServer.java:154)
	at org.apache.drill.exec.server.Drillbit.run(Drillbit.java:234)
	at org.apache.drill.jdbc.impl.DrillConnectionImpl.<init>(DrillConnectionImpl.java:122)
	... 18 more

What other files do I need to configure to solve this?
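
Not a confirmed answer, but the error points at the web console (WebServer) authentication rather than the RPC layer. Based on the Drill SPNEGO documentation, the web console appears to need its own mechanism configured, roughly along the lines of the sketch below; the exact keys, the placeholder principal, and the keytab path are assumptions used to illustrate the idea:

drill.exec.http: {
  auth: {
    # Assumed keys per the SPNEGO docs; verify against your Drill version.
    mechanisms: ["SPNEGO"],
    spnego.principal: "HTTP/<host>@<REALM>",
    spnego.keytab: "/path/to/http.keytab"
  }
}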

Count(<All>) query is discarding filters on Mongo plugin

Describe the bug
When the query below is run against the MongoDB plugin using Drill 1.19, it returns an invalid count:
select count(*) from Table_A where Col_1 <= 3

To Reproduce
Data: Table_A
Col_1 | Col_2

1 | A
2 | B
3 | C
4 | D
5 | E

Expected behavior
Count : 3

Actual Result : 5

For a table created in Kudu through Apache Impala - Not able to run "select * from <some table>" using Apache Drill

Describe the bug
Created a table in Kudu through Apache Impala (table is given below) and inserted several records. Then I connected to Apache Kudu through Apache Drill and was able to list tables fine. But when I try to fetch data from the table "impala::default.customer" below, I get the error below when the primary key is of the BIGINT data type; I don't get an error when the primary key is of any other data type (details are discussed in the Stackoverflow.com link below).

apache drill (kudu)> select * from impala::default.customer;

Error: INTERNAL_ERROR ERROR: org.apache.kudu.client.NonRecoverableException: Invalid scan stop key: Error decoding composite key component 'id': key too short: Fragment: 1:0 [Error Id: 244fbbc4-42ea-40cf-9cc0-ee30caec7eca on ubuntu-VirtualBox:31010] (state=,code=0) ".

Table ==>
CREATE TABLE customer(
id BIGINT,
first_name STRING,
last_name STRING,
age INT,
customer_id STRING,
PRIMARY KEY(id))
PARTITION BY HASH PARTITIONS 4
STORED AS KUDU;

To Reproduce
Steps to reproduce the behavior:
Followed these links to setup Single node cluster on VirtualBox (Ubuntu 20.03)

Apache Kudu QuickStart
First setup Kudu - https://kudu.apache.org/docs/quickstart.html
Then setup Impala - https://github.com/apache/kudu/tree/master/examples/quickstart/impala

  1. Create a table in Kudu through Impala (table definition as above), assuming Kudu and Impala are set up as described in the links above. I used the latest version of Apache Drill and ran the "drill-embedded" script.
  2. Insert some data in this table
  3. Connect Drill to Kudu and try to browse the table created above, and I get error
    apache drill (kudu)> select * from impala::default.customer;

Error: INTERNAL_ERROR ERROR: org.apache.kudu.client.NonRecoverableException: Invalid scan stop key: Error decoding composite key component 'id': key too short: Fragment: 1:0 [Error Id: 244fbbc4-42ea-40cf-9cc0-ee30caec7eca on ubuntu-VirtualBox:31010] (state=,code=0) ".

  1. Please note that I saw this error only when the primary key was of BIGINT type; it was fine for other data types.

Expected behavior
Should be able to browse the data just fine

Desktop (please complete the following information):

  • OS: VirtualBox, Ubuntu 20.03

Additional context
This issue has been discussed with "cgivre" at Stackoverflow.com,
https://stackoverflow.com/questions/69400913/apache-drill-and-apache-kudu-not-able-to-run-select-from-some-table-usin

Thanks,
Vikas Kumar

Expose `org.apache.drill.test` artifact so that end-users can use `ClusterFixtureBuilder` to create embedded Drill applications

Hello, I would like to embed Drill in a JVM application, running as a single in-memory node.
I will feed it Calcite RelNode relational expressions to execute, which my application generates.

Browsing the code to try to find out how best to go about this, I found in ClusterFixtureBuilder.java:

(If this isn't the best/easiest way to embed a single Drill node please let me know and I will delete this issue 😅)

/**
* Build a Drillbit and client with the options provided. The simplest
* builder starts an embedded Drillbit, with the "dfs" name space,
* a max width (parallelization) of 2.
* <p>
* Designed primarily for unit tests: the builders provide control
* over all aspects of the Drillbit or cluster. Can also be used to
* create an embedded Drillbit, use the zero-argument
* constructor which will omit creating set of test-only directories
* and will skip creating the test-only storage plugins and other
* configuration. In this mode, you should configure the builder
* to read from a config file, or specify all the non-default
* config options needed.
*/
public class ClusterFixtureBuilder {

/**
* Create the embedded Drillbit and client, applying the options set
* in the builder. Best to use this in a try-with-resources block:
* <pre><code>
* FixtureBuilder builder = ClientFixture.newBuilder()
* .property(...)
* .sessionOption(...)
* ;
* try (ClusterFixture cluster = builder.build();
* ClientFixture client = cluster.clientFixture()) {
* // Do the test
* }
* </code></pre>
* Note that you use a single cluster fixture to create any number of
* drillbits in your cluster. If you want multiple clients, create the
* first as above, the others (or even the first) using the
* {@link ClusterFixture#clientBuilder()}. Using the client builder
* also lets you set client-side options in the rare cases that you
* need them.
*/
public ClusterFixture build() {
return new ClusterFixture(this);
}

But it looks like there is no Maven artifact or .jar available to download that would let me use this functionality as an end user =/ (see the hypothetical dependency sketch below).
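
What I am asking for is, roughly, a published test-support artifact that could be declared like this; the coordinates are hypothetical, since no such artifact currently exists:

<!-- Hypothetical coordinates; no such artifact is currently published. -->
<dependency>
  <groupId>org.apache.drill</groupId>
  <artifactId>drill-test-framework</artifactId>
  <version>1.20.0</version>
  <scope>test</scope>
</dependency>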

I tried to copy-paste the primary classes, but there is a spiderweb of dependencies throughout the org.apache.drill.test and org.apache.drill.exec.testing packages.

Update the copyright year [for newcomers]

Describe the bug

We need to update the copyright year in NOTICE files (this repository), and some resource files on the drill-site project.

apache/drill :
drill/NOTICE
drill/distribution/src/main/resources/NOTICE

apache/drill-site :
drill-site/_includes/footer.html
and more?

Expected behavior

You can refer to this pull request to submit your contribution, thank you.

Additional context

  1. apache/drill is the codebase and apache/drill-site is the website; we should make separate pull requests for each.
  2. If you would like to know about the contribution process, refer to Help beginners to contribute to open source projects.
  3. Please let us know if you need any help, via the mailing list or Slack.

mongo connector in version 1.19.0 not reaching clusters/replica sets

Describe the bug
Trying drill:1.19.0 with the MongoDB connection string mongodb+srv://[username:password@]host[/[database][?options]] results in a Java UnknownHostException for the host with port 27017 appended.
This works with version 1.18.0 and the previous MongoDB Java driver.
Tested with MongoDB server 3.6 and 4.4.10.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Storage tab
  2. Create a new storage config of type mongo using a mongodb+srv-style connection string
  3. Try to list schemas
  4. See the error in the sqlline.log logs

Expected behavior
Flawless mongodb+srv service connection

Screenshots

Desktop (please complete the following information):

  • OS: docker image drill:1.19.0

Smartphone (please complete the following information):

Additional context
Maybe related to this improvement: https://issues.apache.org/jira/browse/DRILL-7903

What is the tag check function? What do tags 10 and 84 mean in the check?

Hi team, I hit the error below when using the Drill Maven dependency version 1.17.0. I deployed Drill on EKS and it works when I port-forward the service, but after I configured an ALB ingress the error occurred (the same ALB ingress config works for port 8047, so I can see the UI, but it does not work for 31010). The message is not very comprehensive, so could someone please explain what 10 and 84 mean as tags? Thanks.

2022-03-04 20:01:25.194 ERROR 53489 --- [ Client-1] o.o.a.d.exec.rpc.RpcExceptionHandler : Exception in RPC communication. Connection: /10.179.34.218:56013 <--> apache-drill-ui.some-ingress-path/10.225.132.104:80 (user client). Closing connection.

oadd.io.netty.handler.codec.CorruptedFrameException: Expected to read a tag of 10 but actually received a value of 84. Happened after reading 0 message.
at oadd.org.apache.drill.exec.rpc.RpcDecoder.checkTag(RpcDecoder.java:126) ~[drill-jdbc-all-1.17.0.jar!/:1.17.0]
at oadd.org.apache.drill.exec.rpc.RpcDecoder.decode(RpcDecoder.java:61) ~[drill-jdbc-all-1.17.0.jar!/:1.17.0]
at oadd.org.apache.drill.exec.rpc.RpcDecoder.decode(RpcDecoder.java:35) ~[drill-jdbc-all-1.17.0.jar!/:1.17.0]

And the ALB config is as below:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    alb.ingress.kubernetes.io/backend-protocol: HTTP
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
    alb.ingress.kubernetes.io/scheme: internal
    alb.ingress.kubernetes.io/security-groups: EKS-ALB-sg
    alb.ingress.kubernetes.io/success-codes: 200-499
    alb.ingress.kubernetes.io/tags: Environment=dev,Team=EKSAdminTeam
    alb.ingress.kubernetes.io/target-type: 'ip'
    kubernetes.io/ingress.class: alb
  name: ingress-2
spec:
  rules:
    - host: -----
      http:
        paths:
          - backend:
              serviceName: drill-service
              servicePort: 31010
            path: /*

Invalid usage of the option NEXT in the FETCH statement when querying SQL Server and using LIMIT clause

Describe the bug
Querying SQL Server through the Drill JDBC storage plugin fails.
Logs:

Sql: SELECT *
FROM "dbo"."Order"
FETCH NEXT 1000 ROWS ONLY
Plugin: sqlsrv
Fragment: 0:0

[Error Id: 55b002f2-ee01-4412-b979-2f253074fdc9 on drill.internal.cloudapp.net:31010]
	at org.apache.drill.exec.server.rest.RestQueryRunner.submitQuery(RestQueryRunner.java:99)
	at org.apache.drill.exec.server.rest.RestQueryRunner.run(RestQueryRunner.java:54)
	at org.apache.drill.exec.server.rest.QueryResources.submitQuery(QueryResources.java:159)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)

..... trimmed for readability ....

Caused by: java.lang.Exception: Invalid usage of the option NEXT in the FETCH statement.
	at com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:217)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1655)
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.doExecutePreparedStatement(SQLServerPreparedStatement.java:440)
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement$PrepStmtExecCmd.doExecute(SQLServerPreparedStatement.java:385)
	at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7505)
	at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2445)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:191)
	at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:166)
	at com.microsoft.sqlserver.jdbc.SQLServerPreparedStatement.executeQuery(SQLServerPreparedStatement.java:297)
	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
	at org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup(JdbcRecordReader.java:192)
	at org.apache.drill.exec.physical.impl.ScanBatch.getNextReaderIfHas(ScanBatch.java:331)
	at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:227)
	at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:298)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:111)
	at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
	at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:85)
	at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:170)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:103)
	at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:81)
	at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:93)
	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:323)
	at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:310)
	at .......(:0)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:310)
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
	at .......(:0)

Steps to reproduce the behavior:
Install sqljdbc42.jar.
Successfully configure the storage plugin pointing to an Azure SQL Server, e.g.

{
  "type": "jdbc",
  "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
  "url": "jdbc:sqlserver://<hostname>;database=anydb",
  "username": "user",
  "password": "pwd",
  "enabled": true
}

Write a simple query with a specific limit e.g.

SELECT * FROM sqlsrv.dbo.any_table LIMIT 3 --or any number

Expected behavior
A result set limited to the number of rows specified by the LIMIT clause.

Additional context
The SQL Server is deployed in Azure, so the JDBC plugin is querying the latest version of SQL Server.
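
One hedged reading of the error: T-SQL only accepts FETCH NEXT as part of an ORDER BY ... OFFSET ... FETCH clause, so the pushed-down "FETCH NEXT 1000 ROWS ONLY" above, which has neither ORDER BY nor OFFSET, is rejected by SQL Server. The sketch below shows the paging syntax SQL Server does accept, issued directly over JDBC (the hostname, credentials and the ORDER BY (SELECT NULL) trick are illustrative assumptions, not Drill's generated SQL):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchNextCheck {
  public static void main(String[] args) throws Exception {
    String url = "jdbc:sqlserver://<hostname>;database=anydb;user=user;password=pwd";
    try (Connection con = DriverManager.getConnection(url);
         Statement st = con.createStatement();
         // Valid T-SQL paging: ORDER BY and OFFSET must precede FETCH NEXT.
         ResultSet rs = st.executeQuery(
             "SELECT * FROM dbo.[Order] ORDER BY (SELECT NULL) "
             + "OFFSET 0 ROWS FETCH NEXT 1000 ROWS ONLY")) {
      int count = 0;
      while (rs.next()) {
        count++;
      }
      System.out.println("rows: " + count);
    }
  }
}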

Apache Drill vs Apache Calcite

I have been exploring Apache Drill and Apache Calcite. I know Apache Drill uses Calcite internally for SQL parsing and for creating query plans; I wanted to know what the major differences between the two are. I see that Apache Calcite is also able to run SQL queries against multiple data sources.

Error message: Hostname 192.168.30.10 not verified:

Hi,

I am trying to use the http plugin to connect to a local server that has a self-signed certificate, and I get the following message when I run a query:

org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Failed to read the HTTP response body

Error message: Hostname 192.168.30.10 not verified:
certificate: sha256/g/W9segvBmGap2pfdFU2WZ7LdHnsJiMf+9BDTkLc1kc=
DN: CN=mydomain.com, O=Support, L=McLean, ST=VA, C=US
subjectAltNames: []
Connection: actiondetails
Plugin: http1
URL: https://192.168.30.10/library/code.json?id=6110

I tried adding the certificate to the Java trust store, but no luck.

How can I fix this issue, or how can I disable certificate verification in Drill? Thanks in advance.

Update the Slack site [for newcomers]

Describe the bug

As the short link (https://bit.ly/2VM0XS8) has expired, clicking "join" no longer takes you to the Slack channel.

I recommend removing the hyperlink from the word "join"; the "Slack Channel" link already points to the correct site.

or [join](https://bit.ly/2VM0XS8) our [Slack Channel](https://join.slack.com/t/apache-drill/shared_invite/enQtNTQ4MjM1MDA3MzQ2LTJlYmUxMTRkMmUwYmQ2NTllYmFmMjU4MDk0NjYwZjBmYjg0MDZmOTE2ZDg0ZjBlYmI3Yjc4Y2I2NTQyNGVlZTc) if you need help with using or developing Apache Drill (more information can be found on [Apache Drill website](http://drill.apache.org/)).

Expected behavior

Update README.md so that the word "join" no longer carries the expired hyperlink.


Additional context

  1. If you would like to know about the contribution process, refer to "Help beginners to contribute to open source projects".
  2. Please let us know if you need any help, via the mailing list or Slack.

[1.20.0-SNAPSHOT] select * to mongodb results in empty $project INTERNAL_ERROR

Describe the bug
"select *" like queries to mongodb collections result in an INTERNAL_ERROR due to empty $project sent to mongodb server

2021-11-11 12:07:58,589 [1e72f861-8eaa-c18f-2d86-15ac4f0a90f5:frag:0:0] INFO  o.a.d.exec.physical.impl.ScanBatch - User Error Occurred: Command failed with error 51272 (Location51272): 'Invalid $project :: caused by :: projection specification must have at least one field' on server host.docker.internal:27019. The full response is {"ok": 0.0, "errmsg": "Invalid $project :: caused by :: projection specification must have at least one field", "code": 51272, "codeName": "Location51272"} (Command failed with error 51272 (Location51272): 'Invalid $project :: caused by :: projection specification must have at least one field' on server host.docker.internal:27019. The full response is {"ok": 0.0, "errmsg": "Invalid $project :: caused by :: projection specification must have at least one field", "code": 51272, "codeName": "Location51272"})
org.apache.drill.common.exceptions.UserException: INTERNAL_ERROR ERROR: Command failed with error 51272 (Location51272): 'Invalid $project :: caused by :: projection specification must have at least one field' on server host.docker.internal:27019. The full response is {"ok": 0.0, "errmsg": "Invalid $project :: caused by :: projection specification must have at least one field", "code": 51272, "codeName": "Location51272"}

To Reproduce
Steps to reproduce the behavior:

  1. Create a mongodb storage plugin
  2. Run a query with no projection (select *)
  3. The error appears in the GUI

Expected behavior
In versions 1.18 and 1.19 this returned all fields of the collection.

Additional context
Here are full logs:
error_log_select_mongodb.txt
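
To illustrate the mechanism, a minimal sketch assuming mongodb-driver-sync 4.x (database and collection names are placeholders): MongoDB rejects an empty $project stage with error 51272, while omitting the projection entirely returns every field, which is presumably what a "select *" pushdown needs to do.

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class EmptyProjectCheck {
  public static void main(String[] args) {
    try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
      MongoCollection<Document> coll = client.getDatabase("test").getCollection("things");

      // Rejected with error 51272: "projection specification must have at least one field"
      // coll.aggregate(java.util.List.of(new Document("$project", new Document()))).first();

      // Accepted: no $project stage at all returns every field of the document.
      Document first = coll.find().first();
      System.out.println(first);
    }
  }
}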

XML format plugin misses top-level attributes

Consider the example shown on the XML format plugin's documentation page (https://drill.apache.org/docs/xml-format-plugin/), which shows a list of books (simplified below to one book for brevity):

<books>
   <book>
     <author>O.-J. Dahl</author>
     <title binding="hardcover" subcategory="non-fiction">Structured Programming</title>
     <category>PROGRAMMING</category>
     <year>1972</year>
   </book>
 </books>

With dataLevel=2, each book becomes a row in the query output table, and the title attributes will end up correctly in the attributes column as shown on the documentation page.

However, if the top-level item has attributes, these attributes are not captured anywhere within the query output table.
My data is similar to this, and I need to capture the important attributes of the top-level items in my list.
Could they be added to the attributes column?

If the element above had a weight attribute, e.g. <book weight="0.8">,
then could it be added to the attributes column, e.g. {"book_weight":"0.8","title_binding":"hardcover","title_subcategory":"non-fiction"}?

Drill cannot startup if java path contains whitespace

Describe the bug
My JAVA_PATH is set to "/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home".
When I run ./drillbit.sh start, Drill cannot start up and reports the error:
/Users/Chester/Gwssi/ApacheDrill/apache-drill-1.19.0-SNAPSHOT/bin/runbit: line 109: /Library/Internet: No such file or directory
/Users/Chester/Gwssi/ApacheDrill/apache-drill-1.19.0-SNAPSHOT/bin/runbit: line 109: exec: /Library/Internet: cannot execute: No such file or directory

To Reproduce
Steps to reproduce the behavior:

  1. Set JAVA_PATH to a JDK path which contains whitespace:
    export JAVA_PATH="/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home"
  2. Start Drill:
    ./bin/drillbit.sh start
  3. See the error in the log file:
    log/drillbit.out

Expected behavior
Drill should start up normally.


Desktop (please complete the following information):

  • MacOS Big Sur 11.2.3

Additional context
Printing the command at line 109 of bin/runbit:
echo exec $BITCMD

The printed result in log/drillbit.out shows that the whitespace in the java path is not backslash-escaped, so exec cannot find the correct java executable.
After I set the java path to a JDK without whitespace, such as
/Library/Java/JavaVirtualMachines/jdk1.8.0_171.jdk/Contents/Home
the Drill startup process works normally.

Due to CVE-2022-26612 (9.8 critical), upgrade Hadoop from 3.2.2 to 3.2.3

Describe the bug
Could you please upgrade the Hadoop version from 3.2.2 to 3.2.3 so that we avoid CVE-2022-26612?

To Reproduce
Scan Apache Drill 1.19 with Black Duck or another vulnerability scanner.

Expected behavior
Apache Drill using Hadoop 3.2.3


Desktop (please complete the following information):

  • OS: Any
  • Browser Any
  • Version Any



verifySSLCert always has true as value and cannot change it to false

Hi,

After compiling apache-drill-1.20.0-SNAPSHOT, the new parameter verifySSLCert always has the value true. I tried many times to change it, but when I check it again it is back to true.

I tried false and "false", and I changed its position in the config, but it still stays true.

The following is the storage config I created:
{
  "type": "http",
  "connections": {
    "actiondetails": {
      "url": "https://192.168.30.10/?",
      "requireTail": true,
      "method": "GET",
      "headers": {
        "Authorization": "Bearer xqzJZ7pCoz_EWJwQ1GYW"
      },
      "authType": "none",
      "inputType": "json",
      "xmlDataLevel": 1,
      "verifySSLCert": true
    }
  },
  "proxyType": "direct",
  "enabled": true
}

Please help
Thanks

CVE-2020-8908 in Guava v.28.2-jre, should upgrade to v.30.1.1

Describe the bug
CVE-2020-8908 in Guava v.28.2-jre, should upgrade to v.30.1.1

To Reproduce
Please check the vulnerability section in:
google/guava#4011

Expected behavior
Upgrading to v30.1.1 will mitigate this vulnerability.


Desktop (please complete the following information):

  • OS: all
  • Browser all
  • Version all



IllegalArgumentException: databaseName can not be null - Mongo DB

Hi,

I am trying to connect to a mongodb database hosted on AWS DocumentDB. But each time I query it, an exception is thrown: "UserRemoteException : SYSTEM ERROR: IllegalArgumentException: databaseName can not be null"
org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: IllegalArgumentException: databaseName can not be null

The following is my mongodb configuration (original values modified), which contains the database name as well:
{ "type": "mongo", "connection": "mongodb://myusername:[email protected]:27017/DatabaseName?authSource=admin&readPreference=primary&ssl=true&tlsAllowInvalidCertificates=true&tlsAllowInvalidHostnames=true", "enabled": true }

Even though I provide the database name, Drill still throws the null databaseName exception.

I have installed Drill on my Windows machine and started it in embedded mode using drill-embedded.bat.

add time unit WEEK to EXTRACT()

Is your feature request related to a problem? Please describe.
We work with small files in hdfs. Our current process for good performance is to save all data before the current month to parquet (a new table) and then UNION ALL it with the current month's files (and save the result as a view). This process runs once a month.

Now we are faced with high-velocity data, so we have to run the above process every week instead of once a month.
We could do this with other Apache solutions, but Drill is the most convenient way.

Describe the solution you'd like

 SELECT EXTRACT(WEEK from CURRENT_DATE) as weekno,CURRENT_DATE FROM (VALUES(1));
+--------+--------------+
| weekno | CURRENT_DATE |
+--------+--------------+
| 48     | 2021-11-30   |
+--------+--------------+

Describe alternatives you've considered

  1. Go back 7 days with INTERVAL
WHERE `dir0` > DATE_SUB(CURRENT_DATE,INTERVAL '7' DAY)
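
As a possible stopgap while EXTRACT(WEEK ...) is unsupported, a custom simple UDF could compute the ISO week. The sketch below follows the general shape of Drill's simple-function API as I understand it; the function name week_of_year, the choice of holder classes and the annotation values are assumptions that should be checked against the Drill version in use.

import org.apache.drill.exec.expr.DrillSimpleFunc;
import org.apache.drill.exec.expr.annotations.FunctionTemplate;
import org.apache.drill.exec.expr.annotations.Output;
import org.apache.drill.exec.expr.annotations.Param;
import org.apache.drill.exec.expr.holders.BigIntHolder;
import org.apache.drill.exec.expr.holders.DateHolder;

@FunctionTemplate(name = "week_of_year",
    scope = FunctionTemplate.FunctionScope.SIMPLE,
    nulls = FunctionTemplate.NullHandling.NULL_IF_NULL)
public class WeekOfYearFunction implements DrillSimpleFunc {

  @Param DateHolder in;      // DATE value, milliseconds since the epoch (UTC)
  @Output BigIntHolder out;  // ISO week number, 1..53

  public void setup() { }

  public void eval() {
    // Fully qualified names only: Drill copies this body into generated code.
    out.value = java.time.Instant.ofEpochMilli(in.value)
        .atZone(java.time.ZoneOffset.UTC)
        .get(java.time.temporal.WeekFields.ISO.weekOfWeekBasedYear());
  }
}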

SAS format plugin. Select fails if the first row contains null values

A select from dfs.path_to_sas_file fails with the error "Cannot invoke "Object.getClass()" because "this.firstRow[counter]" is null"
if the first row of the sas file contains null values.

This can be reproduced with any sas file that contains null values in the first row.
Apparently, the problem is here:
drill\contrib\format-sas\src\main\java\org\apache\drill\exec\store\sas\SasBatchReader.java

private TupleMetadata buildSchema() {
    SchemaBuilder builder = new SchemaBuilder();
    List<Column> columns = sasFileReader.getColumns();
    int counter = 0;
    for (Column column : columns) {
      String fieldName = column.getName();
      try {
        MinorType type = getType(firstRow[counter].getClass().getSimpleName());
        if (type == MinorType.BIGINT && !column.getFormat().isEmpty()) {
          logger.debug("Found possible time");
          type = MinorType.TIME;
        }
        builder.addNullable(fieldName, type);
      } catch (Exception e) {
        throw UserException.dataReadError()
          .message("Error with column type: " + firstRow[counter].getClass().getSimpleName())
          .addContext(errorContext)
          .build(logger);
      }
      counter++;
    }

    return builder.buildSchema();
  }
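
A possible null-safe variant of the method, sketched only as an illustration of the shape of a fix (falling back to a nullable VARCHAR for columns whose first value is null is an assumption, and the original try/catch is omitted for brevity):

private TupleMetadata buildSchema() {
    SchemaBuilder builder = new SchemaBuilder();
    List<Column> columns = sasFileReader.getColumns();
    int counter = 0;
    for (Column column : columns) {
      String fieldName = column.getName();
      Object firstValue = firstRow[counter];
      if (firstValue == null) {
        // No sample value to infer a type from; fall back to a nullable VARCHAR.
        builder.addNullable(fieldName, MinorType.VARCHAR);
      } else {
        MinorType type = getType(firstValue.getClass().getSimpleName());
        if (type == MinorType.BIGINT && !column.getFormat().isEmpty()) {
          logger.debug("Found possible time");
          type = MinorType.TIME;
        }
        builder.addNullable(fieldName, type);
      }
      counter++;
    }
    return builder.buildSchema();
  }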


Support the UTC formatter in the JSON Reader

Is your feature request related to a problem? Please describe.
Support the UTC formatter in the JSON Reader

Describe the solution you'd like
Option 1:

  • use a few of the most popular formatters with a try-catch
  • minimal changes

Example:

LocalDateTime ldt;
String value = parser.getValueAsString();
try {
  ldt = OffsetDateTime.parse(value, DateUtility.isoFormatTimeStamp).toLocalDateTime();
} catch (DateTimeParseException e1) {
  try {
    ldt = LocalDateTime.parse(value, DateUtility.utcFormatDateTime); // "yyyy-MM-dd'T'HH:mm:ss'Z'"
  } catch (DateTimeParseException e2) {
    ldt = LocalDateTime.parse(value, DateUtility.utcFormatTimeStamp); // "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"
  }
}
OffsetDateTime utcDateTime = OffsetDateTime.of(ldt, ZoneOffset.UTC);

Option 2:

  • allow the user to define the UTC formatter using the ALTER SESSION SET syntax.
  • not sure the framework supports extending this feature.

Example (dummy):

ALTER SESSION SET `store.json.date_formatter` = "yyyy-MM-dd'T'HH:mm:ss'Z'"
LocalDateTime ldt;
if (hasUTCKeyword(parser.getValueAsString())) { // value.indexOf('T') > 0 && value.indexOf('Z') > value.indexOf('T')
  ldt = LocalDateTime.parse(parser.getValueAsString(), session_date_formatter);
} else {
  ldt = OffsetDateTime.parse(parser.getValueAsString(), session_date_formatter).toLocalDateTime();
}

Describe alternatives you've considered
NONE

Additional context

When the date value is stored in mongo as an ISODate without a timezone offset (i.e. the zero-offset 'Z' form) and store.mongo.bson.record.reader is set to false:

{
  "_id" : ObjectId("5da7760149b3f000195cabb"),
  "date" : ISODate("2019-09-24T20:06:56Z")
}

Drill reports the following error:

Caused by: java.lang.Exception: Text '2019-09-30T20:47:43Z' could not be parsed at index 19

This is because OffsetDateTime parses the date string with the fixed formatter yyyy-MM-dd'T'HH:mm:ss.SSSXX, so it cannot accept the zero-offset UTC forms ...'T'...'Z', such as:
example 1:

yyyy-MM-dd'T'HH:mm:ss.SSS'Z'

example 2:

yyyy-MM-dd'T'HH:mm:ss'Z'

Linked resource :

case VALUE_STRING:
  OffsetDateTime originalDateTime = OffsetDateTime.parse(parser.getValueAsString(), DateUtility.isoFormatTimeStamp);
  OffsetDateTime utcDateTime = OffsetDateTime.of(originalDateTime.toLocalDateTime(), ZoneOffset.UTC); // strips the time zone from the original
  ts.writeTimeStamp(utcDateTime.toInstant().toEpochMilli());
  break;

public static final DateTimeFormatter isoFormatTimeStamp= buildFormatter("yyyy-MM-dd'T'HH:mm:ss.SSSXX");
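
A further option, sketched under the assumption that the incoming strings are ISO-8601 with either an explicit offset or a trailing 'Z': java.time's built-in DateTimeFormatter.ISO_OFFSET_DATE_TIME already accepts both the fractional and non-fractional zero-offset forms, which would avoid a hard-coded pattern and the cascaded try-catch entirely.

import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class IsoParseCheck {
  public static void main(String[] args) {
    // Both 'Z' forms from the report parse with the stock ISO formatter.
    for (String s : new String[] {"2019-09-30T20:47:43Z", "2019-09-24T20:06:56.123Z"}) {
      OffsetDateTime odt = OffsetDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME);
      System.out.println(odt.toInstant().toEpochMilli());
    }
  }
}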

Drill Compile error

Since my local zookeeper cluster is version 3.4.7, I downloaded Drill version 1.17.0 and used the official package. When the hive data source is selected, the database can be accessed normally, but when show tables is executed, it reports:

ERROR hive.log-Got exception: org.apache.thrift.TApplicationException Invalid method name:'get_tables_by_type'

Drill version: 1.17.0

Therefore, I tried lowering the hive version to 2.1.1. The screenshot below shows the error I encountered during compilation. I look forward to a reply, thanks!

(screenshot of the compilation error)

Use of catalog inside queries

Is your feature request related to a problem? Please describe.
Hello, I am developing a connector to query Drill from Dremio via JDBC and I noticed that the catalog name ("DRILL") cannot be used to fully qualify a table when querying it. E.g. the query select * from DRILL.mypostgresdb.mytable gives the following error:

java.sql.SqlException: VALIDATION ERROR: Schema [[DRILL, mypostgresdb]] is not valid with respect to either root schema or default schema

while select * from mypostgresdb.mytable is the valid query. By analysing the DatabaseMetaData returned when I connect to Drill, though, I see that tables are associated with a schema, which is associated with a catalog, so I would expect to reference a table as catalog.schema.table. In fact, when Dremio retrieves the metadata from Drill, Drill returns "DRILL" as the root of its schemas, causing Dremio to submit this back to Drill, which causes the queries to fail.
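
For reference, a minimal sketch of how that metadata looks through plain JDBC (the drillbit connection URL is illustrative; the printed catalog value is the one the report describes):

import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class CatalogCheck {
  public static void main(String[] args) throws Exception {
    try (Connection con = DriverManager.getConnection("jdbc:drill:drillbit=localhost:31010")) {
      DatabaseMetaData md = con.getMetaData();
      try (ResultSet rs = md.getCatalogs()) {
        while (rs.next()) {
          System.out.println(rs.getString("TABLE_CAT")); // the single DRILL catalog
        }
      }
      try (ResultSet rs = md.getSchemas()) {
        while (rs.next()) {
          // Each schema is reported under that catalog, e.g. DRILL.mypostgresdb
          System.out.println(rs.getString("TABLE_CATALOG") + "." + rs.getString("TABLE_SCHEM"));
        }
      }
    }
  }
}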

Describe the solution you'd like
Allowing the use of catalog to qualify tables inside queries.

Describe alternatives you've considered
Removing the concept of a catalog, since there is a single catalog anyway; although I am not sure what it is used for, so I don't know the implications.

While trying to connect drillbit nodes using zk style from local its throws no known host.

Describe the bug

When connecting in the direct drillbit style I am able to connect and run queries, but if I use the zk style it does not work; instead it throws a "No such host is known" error.

So the only way I can connect with the zk style from my local machine is to edit my etc/hosts file and add the host IP and name, like below:
34.66.137.60 drill-data-poc-m.us-central1-b.c.stoked-jigsaw-327806.internal

Also, if I ssh into one of the nodes and log in from there, I am able to use the zk style and connect using the external IP.

But using the same external IP I am not able to log in from my local machine; it does not work.
For testing it is fine to add entries to the etc/hosts file and connect, but in production we would have N nodes and we cannot add the IP and hostname of every node to etc/hosts each time.

To Reproduce
Steps to reproduce the behavior:

  1. Install drillbits with a zk quorum on 3 nodes, following the prerequisites in this link: https://drill.apache.org/docs/distributed-mode-prerequisites/
  2. Use the zk style to connect from your local machine: jdbc:drill:zk=:2181/drill/drill-data-poc
  3. You get an error: No such host is known
  4. Add the IP and node hostname to etc/hosts
  5. Run the same command again; now you are able to connect.

Expected behavior

I should be able to connect to the drillbit nodes simply by entering my connection details, but that does not happen until I add the IP and hostname details to the etc/hosts file.

Desktop (please complete the following information):

  • OS: windows 10 , ubuntu 20.04
  • Version Drill 1.19

@cgivre @luocooong @jnturton

CVE-2022-24823: Upgrade to Netty v.4.1.77.Final

Describe the bug
CVE-2022-24823 in Netty 4.1.73.Final.

This will also help us catch netty/netty@185f8b2

To Reproduce
Steps to reproduce the behavior:

  1. Check the Apache Drill pom.xml file: https://github.com/apache/drill/blob/master/pom.xml#L123
    It ships netty v4.1.73.
  2. Read through https://nvd.nist.gov/vuln/detail/CVE-2022-24823 for more details on why v4.1.73 is vulnerable.

Expected behavior
Drill should use Netty 4.1.77.Final.




Drill scan all the parquet file from query root for metadata if there is a "inner join " in query

Describe the bug
Drill scans all parquet files under the query root for metadata if there is an INNER JOIN in the query.

To Reproduce
Steps to reproduce the behavior:

# /data/01/ is the query root in this case
# prepare the directory for drill to use in query
mkdir -p /data/01/2021/11/2021-11-23/
# download a parquet file for drill to query
curl https://raw.githubusercontent.com/apache/drill/master/sample-data/nation.parquet -o /data/01/2021/11/2021-11-23/data.parquet

# prepare the inner join directory
mkdir -p /data/PRO/item
# prepare an invalid parquet file; this file is not supposed to be scanned by the query
mkdir -p /data/01/2010/01/2010-01-01/
echo "abc" > /data/01/2010/01/2010-01-01/data.parquet



# query drill endpoint by curl
json="{\"queryType\":\"SQL\", \"query\": \"SELECT COUNT(*) FROM  dfs.\`/data/01\` as t INNER JOIN dfs.\`/data/PRO/item\` item  ON t.N_REGIONKEY = item.ID WHERE t.dir2 >='2021-11-23' AND t.dir2<='2021-11-30' AND (REPEATED_CONTAINS(item.CATEGORIES,1031) OR REPEATED_CONTAINS(item.CATEGORIES,1047))\", \"autoLimit\":1}"
drill_host="localhost:8047"
curl -XPOST  -H "Content-Type: application/json" "$drill_host/query.json" -d "$json"

Expected behavior
As we only query t.dir2 >= '2021-11-23' AND t.dir2 <= '2021-11-30', and the invalid file is under dir2 = '2010-01-01',
the expected behavior is that drill performs the query without any error. Instead it returns "data.parquet is not a Parquet file", which shows that drill scans all parquet files from the query root directory.

Screenshots

{
  "errorMessage" : "SYSTEM ERROR: RuntimeException: file:/data/01/2010/01/2010-01-01/data.parquet is not a Parquet file (too small length: 4)\n\n\nPlease, refer to logs for more information.\n\n[Error Id: ce4e61af-5df8-440e-81d2-673c89106e5f on drill-0.drill:31010]"
}

Additional context
Drill returns successfully if there is no inner join in the query:

# query drill endpoint by curl
json="{\"queryType\":\"SQL\", \"query\": \"SELECT COUNT(*) FROM  dfs.\`/data/01\` as t WHERE t.dir2 >='2021-11-23' AND t.dir2<='2021-11-30'\", \"autoLimit\":1}"
drill_host="localhost:8047"
curl -XPOST  -H "Content-Type: application/json" "$drill_host/query.json" -d "$json"
 {
  "queryId" : "1e4ce295-3052-a66c-b68f-96cf4a97806d",
  "columns" : [ "EXPR$0" ],
  "rows" : [ {
    "EXPR$0" : "25"
  } ],
  "metadata" : [ "BIGINT" ],
  "queryState" : "COMPLETED",
  "attemptedAutoLimit" : 1
}

docker does not mount host volume into container

Describe the bug
docker does not mount the host volume into the apache drill container

To Reproduce

docker run -i --name drill \
-p 8047:8047 \
-t apache/drill \
-v ${PWD}/data:/data \
/bin/bash

In another terminal run:
docker exec -it drill bash

ls /data
ls: cannot access '/data': No such file or directory

Expected behavior
should mount the /data volume

Desktop (please complete the following information):
tested in both

  • OS: windows 10/wsl2/Ubuntu-20.04.3
  • OS: ubuntu focal -

Tested with a simple command using a different container, and docker mounts the volume:

docker run --rm -v ${PWD}/data:/data alpine ls /data
nation.parquet

(Note that in this working command the -v option comes before the image name, while in the failing command above -v ${PWD}/data:/data and /bin/bash appear after apache/drill, so docker passes them to the container as arguments instead of treating -v as a mount option; that may explain the difference.)

1.20.0-SNAPSHOT Fails on mongodb cluster query

Describe the bug
Drill is not able to query a mongodb cluster; it throws an NPE.

2021-10-28 16:21:52,535 [1e8531df-4e21-f062-707b-b9cbb72c619e:foreman] INFO  o.a.d.e.s.mongo.MongoStoragePlugin - Created srv protocol connection to [address=mongoclusterDNSisOK:27017, user=USERalsoOK].
2021-10-28 16:21:52,535 [1e8531df-4e21-f062-707b-b9cbb72c619e:foreman] INFO  o.a.d.e.s.mongo.MongoStoragePlugin - Number of open connections 1.
2021-10-28 16:21:53,165 [1e8531df-4e21-f062-707b-b9cbb72c619e:foreman] WARN  o.a.d.e.s.m.s.MongoSchemaFactory - Failure while getting collection names from 'config'. Command failed with error 13 (Unauthorized): 'not authorized on config to execute command ...
2021-10-28 16:21:53,208 [1e8531df-4e21-f062-707b-b9cbb72c619e:foreman] WARN  o.a.d.e.s.m.s.MongoSchemaFactory - Failure while getting collection names from 'local'. Command failed with error 13 (Unauthorized): 'not authorized on local to execute command ...
2021-10-28 16:21:58,711 [1e8531df-4e21-f062-707b-b9cbb72c619e:frag:1:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 1e8531df-4e21-f062-707b-b9cbb72c619e:1:0: State change requested AWAITING_ALLOCATION --> FINISHED
2021-10-28 16:21:58,711 [1e8531df-4e21-f062-707b-b9cbb72c619e:frag:1:0] INFO  o.a.d.e.w.f.FragmentStatusReporter - 1e8531df-4e21-f062-707b-b9cbb72c619e:1:0: State to report: FINISHED
2021-10-28 16:21:58,711 [drill-executor-6] ERROR o.a.d.exec.server.BootStrapContext - org.apache.drill.exec.work.WorkManager$WorkerBee$1.run() leaked an exception.
java.lang.NullPointerException: null
	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:347)
	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
	at java.base/java.lang.Thread.run(Thread.java:832)

The unauthorized DBs are cluster-private; these warning lines also appear (as DEBUG) in the 1.18.0 logs, but there the query then runs against the requested DB without problems.

To Reproduce
Steps to reproduce the behavior:

  1. Create a mongodb cluster storage
  2. Try a query
  3. Check the logs

Expected behavior
Running the query

Additional context
[version] apache/drill:master-openjdk-14
[DIGEST] 55e6551a8523 (2021/10/28 5:23pm)
[OS] docker

[DISCUSSION] ValueVectors Replacement

This feature request has been transcribed from messages posted in #2412 and to the mailing list in the first week of January 2021. The topic is what might replace the current memory structures used for data, ValueVectors.

Query Profiles Error

Describe the bug
After launching a drillbit from Eclipse (see the tutorial here: https://github.com/paul-rogers/drill/wiki/Launch-a-Drillbit-From-Eclipse), I logged into localhost:8047 to see if Drill was functioning well, and basically it was. But when I submitted a query and clicked the "Profile" tab to see the profile of that query, an error was reported:

RESOURCE ERROR: Failed to get profiles from persistent or ephemeral store.\n\n\n[Error Id: f4c362c7-124d-452a-a209-82633ca6098e ]

Fortunately, I fixed this problem by chance, and the fix is quite simple: open the debug configuration in Eclipse, then in the Environment tab add DRILL_LOG_DIR set to a path pointing at an existing directory. However, I cannot see the underlying relationship between the two things, so I am treating this as a bug and reporting it here.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04.3 LTS
  • Browser: Firefox
  • Version: Drill 1.20 snapshot

Apache Iceberg Input Format Plugin

Is your feature request related to a problem? Please describe.

As discussed on the users mailing list, it looks like more and more people are using Delta Lake or Iceberg in Spark for transactional work with big tables.

Additionally, I saw that Drill uses Iceberg as the storage engine for its metadata.

I think this kind of storage format is used more and more in cloud architectures because IT departments want to use as few tools as possible to provide a big data product. With Iceberg they can build consistent and scalable big data structures for stream and batch processing on the same storage layer with a single tool, Spark.

The problem is how to provide the data to customers. In my opinion Spark itself is too slow for interactive querying by a lot of people or BI tools. That's where Drill enters the stage.

Describe the solution you'd like

I would like to query Iceberg Tables with Drill like a Folder of Parquet Files in DFS.

SELECT * FROM dfs.'path/to/iceberg/table'

Additionally, it would be great to make use of the time-travel feature via snapshots and timestamp-ms: https://iceberg.apache.org/spec/#snapshots

SELECT snapshots[0].timestamp-ms FROM dfs.'path/to/iceberg/table'

SELECT * FROM dfs.'path/to/iceberg/table' WHERE snapshot-timestamp-ms = '2021-06-07 20:15:46.378'

Describe alternatives you've considered

The alternative is to just switch to another MPP system like Dremio or Presto.
