bigdata_docker's Introduction

Hi, I'm Fábio Jardim 👋

Welcome to my profile! With a consolidated and diversified career in the field of technology and data, I have been driving organizations to reach their maximum potential through data-driven decision-making.

Over more than 20 years, I have had the honor of holding key positions such as Data Director, Engineering Manager, and Head of Big Data, where I created and implemented solutions in analytics, machine learning, data engineering, data architecture, and big data that transformed the corporate culture towards becoming data-driven. I successfully led projects in companies of various sizes and sectors, including retail, banking, and internet.


Speak About

Technology Data Engineering Analytics Career Leadership Education


Technologies

docker kafka spark databricks delta parquet druid dbt airflow trino presto hadoop minio hive nifi Apache Flink python pandas scikit-learn mongo mysql postgres redis elasticsearch MicrosoftSQLServer powerbi metabase Grafana kibana opensource aws Azure Google Cloud linux git github powershell

bigdata_docker's People

Contributors

fabiogjardim

bigdata_docker's Issues

Question about Docker

Hello Fábio,

According to the ecosystem diagram, will each item be placed in its own container? For example, would MongoDB and Mongo Express run in separate containers or in the same container?

Thank you very much,

Daniel Adorno Gomes

Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'

Hi Fábio,

Your distribution fits my needs perfectly, thank you very much.

However, I get an error whenever I try to connect Spark to Hive. The message is
Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog'. It happens both from Jupyter and directly in pyspark inside the Spark VM.

I have already reinstalled everything (deleted all Docker images and started only bigdata_docker), checked for port conflicts, and increased the Docker resources to 4 CPUs, 16 GB of memory, and swap set to 4, but nothing changed. I found nothing relevant when searching the web.

I am running on an iMac (24 GB RAM) with macOS Catalina 10.15.4 and Docker 2.2.0.5.

Everything else works: HUE, Presto, and Metabase all access Hive normally.

I would appreciate any idea of what is wrong. I have not changed any of your configuration or the images.

root@jupyter-spark:/opt/spark/conf# pyspark
Python 3.5.3 (default, Sep 27 2018, 17:25:39)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
20/04/11 17:11:24 WARN spark.SparkConf: Note that spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone/kubernetes and LOCAL_DIRS in YARN).
20/04/11 17:11:25 WARN util.Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.1
      /_/

Using Python version 3.5.3 (default, Sep 27 2018 17:25:39)
SparkSession available as 'spark'.

>>> from pyspark.sql import HiveContext
>>> sqlContext = HiveContext(sc)
>>> sqlContext.sql("show databases").show()
Traceback (most recent call last):
File "/opt/spark/python/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.sql.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:192)
at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:103)
at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:102)
at org.apache.spark.sql.hive.HiveSessionStateBuilder.org$apache$spark$sql$hive$HiveSessionStateBuilder$$externalCatalog(HiveSessionStateBuilder.scala:39)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.hive.HiveSessionStateBuilder$$anonfun$1.apply(HiveSessionStateBuilder.scala:54)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:90)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:247)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand$$anonfun$2.apply(databases.scala:44)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand$$anonfun$2.apply(databases.scala:44)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.command.ShowDatabasesCommand.run(databases.scala:44)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:194)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3364)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3363)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:194)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:79)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:189)
... 36 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:71)
... 41 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 42 more

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/spark/python/pyspark/sql/context.py", line 358, in sql
return self.sparkSession.sql(sqlQuery)
File "/opt/spark/python/pyspark/sql/session.py", line 767, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/opt/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/python/pyspark/sql/utils.py", line 79, in deco
raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: "Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':"
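
In a long Java trace like the one above, the last "Caused by:" line is usually the most informative. Here it names a class that could not be loaded (org.apache.hadoop.hive.ql.metadata.HiveException), which generally means the jar providing that class is not on the classpath of the JVM that raised the error. The following is a minimal Python sketch for pulling that line out of a pasted trace; the function name `root_cause` and the shortened `SAMPLE` text are illustrative assumptions, not part of bigdata_docker.

```python
import re

# Shortened copy of the trace above, kept only to demonstrate the helper.
SAMPLE = """\
py4j.protocol.Py4JJavaError: An error occurred while calling o38.sql.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
Caused by: java.lang.reflect.InvocationTargetException
... 36 more
Caused by: java.lang.NoClassDefFoundError: org/apache/hadoop/hive/ql/metadata/HiveException
... 41 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.ql.metadata.HiveException
... 42 more
"""

# One "Caused by:" line: exception class, then an optional ": message" part.
CAUSED_BY = re.compile(r"^Caused by: ([\w.$]+)(?:: (.*))?$", re.MULTILINE)


def root_cause(trace):
    """Return (exception_class, message) from the last 'Caused by:' line.

    Returns None when the trace contains no 'Caused by:' line. The message
    is an empty string when the line carries only the exception class.
    """
    causes = CAUSED_BY.findall(trace)
    return causes[-1] if causes else None


if __name__ == "__main__":
    print(root_cause(SAMPLE))
```

Knowing the missing class narrows the search to whichever jar should provide it; the sketch does not say which jar that is or why it is absent in this setup.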

fatal: repository 'https://github.com/fabiobjardim/bigdata_docker.git/' not found

Hello Fábio, how are you?

I tried to run this command, following the instructions on your page at https://github.com/fabiogjardim/bigdata_docker, but it did not work:

C:\docker>git clone http://github.com/fabiobjardim/bigdata_docker.git
Cloning into 'bigdata_docker'...
info: please complete authentication in your browser...
remote: Repository not found.
fatal: repository 'https://github.com/fabiobjardim/bigdata_docker.git/' not found
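
The account name in the clone command above is not spelled the same way as the one in the page URL quoted in this issue. A small Python sketch that compares the two makes the difference visible; the helper names `github_owner` and `first_difference` are illustrative, and both URLs are copied from the text above.

```python
from urllib.parse import urlparse


def github_owner(url):
    """Return the first path segment (the account name) of a GitHub URL."""
    return urlparse(url).path.strip("/").split("/")[0]


def first_difference(a, b):
    """Return (index, char_in_a, char_in_b) at the first mismatch, else None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    if len(a) != len(b):
        # One string is a prefix of the other; report where the shorter ends.
        n = min(len(a), len(b))
        return n, a[n:n + 1], b[n:n + 1]
    return None


page_url = "https://github.com/fabiogjardim/bigdata_docker"
clone_url = "http://github.com/fabiobjardim/bigdata_docker.git"

page_owner = github_owner(page_url)
clone_owner = github_owner(clone_url)

if __name__ == "__main__":
    print(page_owner, clone_owner, first_difference(page_owner, clone_owner))
```

The two names differ in a single character, which would be consistent with the "Repository not found" response.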

Windows 10 Enterprise 64-bit

Hello... Unfortunately, my OS (Windows 10 Enterprise 64-bit) does not support virtualization:
image

image

I currently use Docker Desktop for Windows, which depends on Hyper-V, and Hyper-V is incompatible with VirtualBox while it is active...

Unfortunately, because this is a corporate computer, I cannot change the BIOS to enable virtualization.

Given this scenario, do you have any suggestions? I don't know much about Docker, but I think there must be some alternative.

ingest data / demo example

Hi,

Hope you are all well!

Is it possible to provide an example of ingesting a CSV file into this stack?

Thanks in advance for any insights or inputs on that issue.

Cheers,
X

Problem starting the mysql image

Good morning, friends,

I am trying to start the mysql image, but after starting, it restarts. Looking at the log, I see the following error:

`2020-05-29 01:57:46+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-05-29 01:57:48+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2020-05-29 01:57:48+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.29-1debian10 started.
2020-05-29T01:57:48.720344Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2020-05-29T01:57:48.732126Z 0 [Note] mysqld (mysqld 5.7.29) starting as process 1 ...
2020-05-29T01:57:48.749124Z 0 [Note] InnoDB: PUNCH HOLE support available
2020-05-29T01:57:48.749141Z 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2020-05-29T01:57:48.749144Z 0 [Note] InnoDB: Uses event mutexes
2020-05-29T01:57:48.749146Z 0 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2020-05-29T01:57:48.749148Z 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-05-29T01:57:48.749324Z 0 [Note] InnoDB: Number of pools: 1
2020-05-29T01:57:48.749400Z 0 [Note] InnoDB: Using CPU crc32 instructions
2020-05-29T01:57:48.750569Z 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2020-05-29T01:57:48.759106Z 0 [Note] InnoDB: Completed initialization of buffer pool
2020-05-29T01:57:48.761304Z 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-05-29T01:57:48.809724Z 0 [Note] InnoDB: Highest supported file format is Barracuda.
2020-05-29T01:57:48.822850Z 0 [Note] InnoDB: Log scan progressed past the checkpoint lsn 155984619
2020-05-29T01:57:48.822870Z 0 [Note] InnoDB: Doing recovery: scanned up to log sequence number 155984628
2020-05-29T01:57:48.822874Z 0 [Note] InnoDB: Database was not shutdown normally!
2020-05-29T01:57:48.822876Z 0 [Note] InnoDB: Starting crash recovery.
2020-05-29T01:57:49.364512Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.364549Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.364555Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.364559Z 0 [ERROR] InnoDB: File ./ibtmp1: 'delete' returned OS error 101.
2020-05-29T01:57:49.364563Z 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-05-29T01:57:49.365233Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.365244Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.365247Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.365250Z 0 [ERROR] InnoDB: File ./ibtmp1: 'stat' returned OS error 101.
2020-05-29T01:57:49.365275Z 0 [ERROR] InnoDB: os_file_get_status() failed on './ibtmp1'. Can't determine file permissions
2020-05-29T01:57:49.365278Z 0 [ERROR] InnoDB: Could not create the shared innodb_temporary.
2020-05-29T01:57:49.365280Z 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2020-05-29T01:57:49.566478Z 0 [ERROR] InnoDB: Operating system error number 1 in a file operation.
2020-05-29T01:57:49.566527Z 0 [ERROR] InnoDB: Error number 1 means 'Operation not permitted'
2020-05-29T01:57:49.566563Z 0 [Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html
2020-05-29T01:57:49.566568Z 0 [ERROR] InnoDB: File ./ibtmp1: 'delete' returned OS error 101.
2020-05-29T01:57:49.566573Z 0 [ERROR] Plugin 'InnoDB' init function returned error.
2020-05-29T01:57:49.566576Z 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2020-05-29T01:57:49.566581Z 0 [ERROR] Failed to initialize builtin plugins.
2020-05-29T01:57:49.566583Z 0 [ERROR] Aborting

2020-05-29T01:57:49.566587Z 0 [Note] Binlog end
2020-05-29T01:57:49.566638Z 0 [Note] Shutting down plugin 'CSV'
2020-05-29T01:57:49.569736Z 0 [Note] mysqld: Shutdown complete`

This is my configuration for the image in the .yml:

database:
  image: fjardim/mysql
  container_name: database
  hostname: database
  ports:
    - "33061:3306"
  deploy:
    resources:
      limits:
        memory: 500m
  command: mysqld --innodb-flush-method=O_DSYNC --innodb-use-native-aio=OFF --init-file /data/application/init.sql
  volumes:
    - /c/docker/bigdata_docker/data/mysql/data:/var/lib/mysql
    - /c/docker/bigdata_docker/data/init.sql:/data/application/init.sql
  environment:
    MYSQL_ROOT_USER: root
    MYSQL_ROOT_PASSWORD: secret
    MYSQL_DATABASE: hue
    MYSQL_USER: root
    MYSQL_PASSWORD: secret

Any idea what might be causing the error? Note that I am using Windows 10 and Docker Desktop to run everything.

Thank you!

move instead of rm

In one part of the code, you ask the reader to rename the file, but the command given is a move. I believe the correct command would be rename.
