Comments (19)
Focus on your dissertation, I'm busy too.
Let's keep in touch.
Thank you
@francesco1119 Thanks for reaching out.
The error "Server refused connection at: http://localhost:8983/solr/crawldb" says the Solr service is not running.
Please check/debug why Solr is not starting up. If it is running, check why you are getting the exception below (a couple of quick checks are sketched after the stack trace):
Caused by: org.apache.http.conn.HttpHostConnectException: Connect to localhost:8983 [localhost/127.0.0.1] failed: Connection refused
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:156)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:376)
at org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:393)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:236)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:564)
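A couple of quick checks, sketched rather than prescribed (the in-container Solr path and the status -force flag come from the dockler.sh output quoted later in this thread; the container name placeholder is hypothetical):

# from the host: is anything answering on port 8983?
curl -s http://localhost:8983/solr/admin/info/system | head

# inside the container: ask Solr itself for its status
docker exec -it <sparkler-container> /data/solr/bin/solr status -force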
Hi @thammegowda, and thank you for your help.
The documentation does not say that I have to install Solr.
In fact, I thought it came within the Docker image...
I followed your documentation, and it says that installing Solr is optional.
As a new user I can help you rewrite your documentation, but honestly I have no idea why Solr is not starting.
@francesco1119
The dockler.sh script is supposed to start the Solr service.
I just ran it now and I got:
bash dockler.sh
Cant find docker image sparkler-local. Going to Fetch it
Fetching uscdatascience/sparkler:latest and tagging as sparkler-local
[...truncated]
Found image: 7bf3f592ca23
No container is running for 7bf3f592ca23. Starting it...
Starting solr server inside the container
Waiting up to 180 seconds to see Solr running on port 8983 [/]
Started Solr server on port 8983 (pid=61). Happy searching!
In the last part of the output, it starts Solr and waits until Solr is up before going to the next step.
I don't see these messages in your output.
@thammegowda, I tried again.
I ran bash dockler.sh and received:
Cant find docker image sparkler-local. Going to Fetch it
Fetching uscdatascience/sparkler:latest and tagging as sparkler-local
latest: Pulling from uscdatascience/sparkler
Digest: sha256:4395aa8e69a220cd3bf52ada94aa6dc2ed3e84919470a007faf9cf80f89308eb
Status: Image is up to date for uscdatascience/sparkler:latest
docker.io/uscdatascience/sparkler:latest
Found image: 7bf3f592ca23
No container is running for 7bf3f592ca23. Starting it...
Starting solr server inside the container
Waiting up to 180 seconds to see Solr running on port 8983 [-]
Started Solr server on port 8983 (pid=62). Happy searching!
Going to launch the shell inside sparkler's docker container.
You can press CTRL-D to exit.
You can rerun this script to resume.
You can access solr at http://localhost:8983/solr when solr is running
You can spark master UI at http://localhost:4041/ when spark master is running
Some useful queries:
- Get stats on groups, status, depth:
http://localhost:8983/solr/crawldb/query?q=*:*&rows=0&facet=true&&facet.field=crawl_id&facet.field=status&facet.field=group&facet.field=discover_depth
Inside docker, you can do the following:
/data/solr/bin/solr - command line tool for administering solr
start -force -> start solr
stop -force -> stop solr
status -force -> get status of solr
restart -force -> restart solr
/data/sparkler/bin/sparkler.sh - command line interface to sparkler
inject - inject seed urls
crawl - launch a crawl job
And yes, everything is fine; Solr is up and running.
I then run /data/sparkler/bin/sparkler.sh inject -id 1 -su 'http://www.bbc.com/news'
and the command executes correctly. Or at least that is what I believe:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/sparkler/sparkler-app-0.3.1-SNAPSHOT/lib/org.apache.logging.log4j.log4j-slf4j-impl-2.11.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/sparkler/sparkler-app-0.3.1-SNAPSHOT/lib/org.slf4j.slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-12-01 19:57:12 INFO PluginService$:53 - Loading plugins...
2021-12-01 19:57:12 INFO PluginService$:62 - 2 plugin(s) Active: [urlfilter-regex, urlfilter-samehost]
2021-12-01 19:57:13 WARN PluginService$:65 - 4 extra plugin(s) available but not activated: Set(fetcher-chrome, scorer-dd-svn, fetcher-jbrowser, fetcher-htmlunit)
2021-12-01 19:57:13 DEBUG PluginService$:68 - Loading urlfilter-regex
2021-12-01 19:57:13 INFO PluginService$:73 - Extensions found: []
2021-12-01 19:57:13 DEBUG PluginService$:68 - Loading urlfilter-samehost
2021-12-01 19:57:13 INFO PluginService$:73 - Extensions found: []
2021-12-01 19:57:13 INFO PluginService$:82 - Recognised Plugins: Map()
2021-12-01 19:57:13 INFO Injector$:108 - Injecting 1 seeds
>>jobId = 1
2021-12-01 19:57:13 WARN PluginService$:49 - Stopping all plugins... Runtime is about to exit.
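As a sanity check that the seed actually landed in the crawldb, one could query Solr directly; this is a minimal sketch using the core name and status value seen elsewhere in this thread, not an official verification step:

# the injected seed should appear as a document with status UNFETCHED
curl 'http://localhost:8983/solr/crawldb/select?q=*:*&rows=5&wt=json'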
And when I get to the very last step with /data/sparkler/bin/sparkler.sh crawl -id 1 -tn 100 -i 2 # id=1, top 100 URLs, do -i=2 iterations
I get:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/sparkler/sparkler-app-0.3.1-SNAPSHOT/lib/org.apache.logging.log4j.log4j-slf4j-impl-2.11.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/sparkler/sparkler-app-0.3.1-SNAPSHOT/lib/org.slf4j.slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/data/sparkler/sparkler-app-0.3.1-SNAPSHOT/lib/org.apache.spark.spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
2021-12-01 19:58:24 WARN NativeCodeLoader:60 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-12-01 19:58:26 INFO Crawler$:160 - Setting local job: {User-Agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Sparkler/${project.version}, Accept=text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8, Accept-Language=en-US,en}
2021-12-01 19:58:26 INFO Crawler$:174 - Committing crawldb..
2021-12-01 19:58:26 INFO Crawler$:219 - Starting the job:1, task:906e00e1-7369-4a64-9593-17fe85d0566a
2021-12-01 19:58:26 INFO MemexCrawlDbRDD$:54 - selecting 1 out of 1
2021-12-01 19:58:27 DEBUG SolrResultIterator$:63 - Query status:UNFETCHED, Start = 0
2021-12-01 19:58:27 DEBUG SolrResultIterator$:77 - Reached the end of result set
2021-12-01 19:58:27 DEBUG SolrResultIterator$:79 - closing solr client.
2021-12-01 19:58:27 WARN BlockManager:69 - Block rdd_3_0 could not be removed as it was not found on disk or in memory
2021-12-01 19:58:27 ERROR Executor:94 - Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.NoSuchMethodError: 'void net.jpountz.lz4.LZ4BlockInputStream.<init>(java.io.InputStream, net.jpountz.lz4.LZ4FastDecompressor, java.util.zip.Checksum, boolean)'
at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:154) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:165) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:126) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.shuffle.BlockStoreShuffleReader.$anonfun$read$1(BlockStoreShuffleReader.scala:74) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:630) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:70) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486) ~[org.scala-lang.scala-library-2.12.12.jar:?]
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492) ~[org.scala-lang.scala-library-2.12.12.jar:?]
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) ~[org.scala-lang.scala-library-2.12.12.jar:?]
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:155) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:41) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:116) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.scheduler.Task.run(Task.scala:127) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377) ~[org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449) [org.apache.spark.spark-core_2.12-3.0.1.jar:3.0.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
2021-12-01 19:58:27 WARN TaskSetManager:69 - Lost task 0.0 in stage 1.0 (TID 1, 969ed83b7c3d, executor driver): java.lang.NoSuchMethodError: 'void net.jpountz.lz4.LZ4BlockInputStream.<init>(java.io.InputStream, net.jpountz.lz4.LZ4FastDecompressor, java.util.zip.Checksum, boolean)'
at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:154)
at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:165)
at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:126)
at org.apache.spark.shuffle.BlockStoreShuffleReader.$anonfun$read$1(BlockStoreShuffleReader.scala:74)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:630)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:70)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:155)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:41)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:116)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
2021-12-01 19:58:27 ERROR TaskSetManager:73 - Task 0 in stage 1.0 failed 1 times; aborting job
Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:567)
at edu.usc.irds.sparkler.Main$.main(Main.scala:50)
at edu.usc.irds.sparkler.Main.main(Main.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, 969ed83b7c3d, executor driver): java.lang.NoSuchMethodError: 'void net.jpountz.lz4.LZ4BlockInputStream.<init>(java.io.InputStream, net.jpountz.lz4.LZ4FastDecompressor, java.util.zip.Checksum, boolean)'
at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:154)
at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:165)
at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:126)
at org.apache.spark.shuffle.BlockStoreShuffleReader.$anonfun$read$1(BlockStoreShuffleReader.scala:74)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:630)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:70)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:155)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:41)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:116)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2152)
at edu.usc.irds.sparkler.pipeline.Crawler.score(Crawler.scala:254)
at edu.usc.irds.sparkler.pipeline.Crawler.$anonfun$run$1(Crawler.scala:231)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at edu.usc.irds.sparkler.pipeline.Crawler.run(Crawler.scala:179)
at edu.usc.irds.sparkler.base.CliTool.run(CliTool.scala:34)
at edu.usc.irds.sparkler.base.CliTool.run$(CliTool.scala:32)
at edu.usc.irds.sparkler.pipeline.Crawler.run(Crawler.scala:50)
at edu.usc.irds.sparkler.pipeline.Crawler$.main(Crawler.scala:338)
at edu.usc.irds.sparkler.pipeline.Crawler.main(Crawler.scala)
... 6 more
Caused by: java.lang.NoSuchMethodError: 'void net.jpountz.lz4.LZ4BlockInputStream.<init>(java.io.InputStream, net.jpountz.lz4.LZ4FastDecompressor, java.util.zip.Checksum, boolean)'
at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:154)
at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:165)
at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:126)
at org.apache.spark.shuffle.BlockStoreShuffleReader.$anonfun$read$1(BlockStoreShuffleReader.scala:74)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:630)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:70)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:31)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:155)
at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:41)
at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:116)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:362)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:830)
I'm literally following your documentation.
Thank you @lewismc,
I installed the latest version of Docker today; this is the only thing that has changed since yesterday.
So maybe this change at the environment level triggered something that allowed me to go to the next step.
...we will never know what that was. Sorry, I didn't note which Docker version I tested with yesterday; it might have been about six months old, but not more.
I'm watching your repository and I will definitely try your next release as soon as it's out.
Can you please confirm that you have tried this on your end with a fresh new installation and that you experience the same?
Otherwise, if you can't reproduce it, I will keep investigating.
@lewismc, I see where the problem is:
Caused by: java.lang.NoSuchMethodError: 'void net.jpountz.lz4.LZ4BlockInputStream.<init>(java.io.InputStream, net.jpountz.lz4.LZ4FastDecompressor, java.util.zip.Checksum, boolean)'
This is mentioned on the very first page of your GitHub project:
<exclusions>
<exclusion>
<groupId>net.jpountz.lz4</groupId>
<artifactId>lz4</artifactId>
</exclusion>
</exclusions>
The exclusion of that library was hardcoded.
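For context, that fragment normally hangs off a dependency element. A hedged sketch of what it might look like on the kafka-clients dependency in sparkler-app/pom.xml (the artifact and the version property are assumptions based on the discussion below, not copied from the repo):

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>${kafka.version}</version>
  <!-- drop Kafka's lz4 so Spark's own lz4 dependency wins on the classpath -->
  <exclusions>
    <exclusion>
      <groupId>net.jpountz.lz4</groupId>
      <artifactId>lz4</artifactId>
    </exclusion>
  </exclusions>
</dependency>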
I believe this issue is due to Spark and Kafka being incompatible on the lz4 dependency: https://stackoverflow.com/a/51052507/1506477
Excluding lz4 from Kafka is the right thing to do (hence the exclusion is good!)
However, on Docker Hub, https://hub.docker.com/repository/docker/uscdatascience/sparkler
I see the Docker image was last updated 6 months ago, but the exclusion commit is newer.
I think rebuilding the Docker image and releasing it should fix this:
https://github.com/USCDataScience/sparkler/wiki/Build-and-Deploy#docker-build
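Until a rebuilt image is published, a local rebuild along these lines might unblock testing; this is only a sketch, the authoritative steps are in the wiki page linked above, and the Maven build command and Dockerfile location are assumptions:

# get the sources that include the lz4 exclusion commit
git clone https://github.com/USCDataScience/sparkler.git
cd sparkler

# build the jars, then bake them into a locally tagged image
mvn clean package -DskipTests
docker build -t sparkler-local .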
Yes @thammegowda, the link you provided has an update dating back to September that says:
"Update: This appears to be an issue with Kafka 0.11.x.x and earlier version. As of 1.x.x Kafka seems to have moved away from using the problematic net.jpountz.lz4 library. Therefore, using latest Kafka (1.x) with latest Spark (2.3.x) should not have this issue."
Hence the latest Spark with the latest Kafka will probably give no problems.
I look forward to testing your new image.
@lewismc Docs are here: https://github.com/USCDataScience/sparkler/blob/master/Release-Checklist.md
I believe @buggtb has been releasing Docker images since I left IRDS/JPL.
@buggtb any chance of you performing a release of the new convenience binaries? Thanks
I am also facing this issue. Since the code with the fix is already merged, I tried to build the Docker image from it, and it fails here:
Step 8/13 : COPY ./sparkler-ui/sparkler-dashboard/sparkler-ui-*.war /data/solr/server/solr-webapp/sparkler
COPY failed: no source files were specified
Can you please share the steps to build sparkler-ui? I don't see sparkler-dashboard in sparkler-ui.
Hi @thammegowda & @lewismc, let me know when you have a stable Docker release and I will test it on my end.
Thank you
Hi all,
I am having my dissertation defense this month, so I am totally focused on that. I will have more availability for this project in April (after my dissertation).
@buggtb @karanjeets @chrismattmann any help or suggestions here, sir/bro?
Hi @thammegowda, how is it going?
Have you had the time to have a look at Sparkler?
I haven't experienced the same issue since, but I have new problems.
Hello @lewismc, the error seems to have changed since last year.
If I execute:
sudo docker run -v elastic:/elasticsearch-7.17.0/data ghcr.io/uscdatascience/sparkler/sparkler:main inject -id myid -su 'http://www.bbc.com/news'
the error now is:
15:52:08.623 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config'
15:52:08.624 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'null'
15:52:08.625 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'fetcher-chrome'
15:52:08.626 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'urlfilter-regex'
15:52:08.627 [main] DEBUG org.pf4j.AbstractExtensionFinder - Loading class 'edu.usc.irds.sparkler.plugin.RegexURLFilter' using class loader 'org.pf4j.PluginClassLoader@158a8276'
15:52:08.637 [main] DEBUG org.pf4j.AbstractExtensionFinder - Checking extension type 'edu.usc.irds.sparkler.plugin.RegexURLFilter'
15:52:08.639 [main] DEBUG org.pf4j.AbstractExtensionFinder - No extensions found for extension point 'edu.usc.irds.sparkler.Config'
15:52:08.639 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'databricks-api'
15:52:08.640 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'fetcher-htmlunit'
15:52:08.641 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'url-injector'
15:52:08.642 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'urlfilter-samehost'
15:52:08.643 [main] DEBUG org.pf4j.AbstractExtensionFinder - Loading class 'edu.usc.irds.sparkler.plugin.UrlFilterSameHost' using class loader 'org.pf4j.PluginClassLoader@5fbe4146'
15:52:08.644 [main] DEBUG org.pf4j.AbstractExtensionFinder - Checking extension type 'edu.usc.irds.sparkler.plugin.UrlFilterSameHost'
15:52:08.645 [main] DEBUG org.pf4j.AbstractExtensionFinder - No extensions found for extension point 'edu.usc.irds.sparkler.Config'
15:52:08.646 [main] DEBUG org.pf4j.AbstractExtensionFinder - Finding extensions of extension point 'edu.usc.irds.sparkler.Config' for plugin 'scorer-dd-svn'
15:52:08.647 [main] DEBUG org.pf4j.AbstractExtensionFinder - No extensions found for extension point 'edu.usc.irds.sparkler.Config'
15:52:08.822 [main] INFO edu.usc.irds.sparkler.service.Injector$ - Injecting 1 seeds
15:52:12.990 [main] DEBUG org.apache.http.impl.nio.client.MainClientExec - [exchange: 1] start execution
15:52:13.007 [main] DEBUG org.apache.http.client.protocol.RequestAddCookies - CookieSpec selected: default
15:52:13.038 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache - Re-using cached 'basic' auth scheme for http://localhost:9200
15:52:13.040 [main] DEBUG org.apache.http.client.protocol.RequestAuthCache - No credentials for preemptive authentication
15:52:13.041 [main] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 1] Request connection for {}->http://localhost:9200
15:52:13.045 [main] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection request: [route: {}->http://localhost:9200][total kept alive: 0; route allocated: 0 of 10; total allocated: 0 of 30]
15:52:13.088 [pool-2-thread-1] DEBUG org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager - Connection request failed
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.base/java.lang.Thread.run(Thread.java:829)
15:52:13.089 [pool-2-thread-1] DEBUG org.apache.http.impl.nio.client.InternalHttpAsyncClient - [exchange: 1] connection request failed
15:52:13.092 [pool-2-thread-1] DEBUG org.elasticsearch.client.RestClient - request [GET http://localhost:9200/] failed
java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.base/java.lang.Thread.run(Thread.java:829)
15:52:13.095 [pool-2-thread-1] DEBUG org.elasticsearch.client.RestClient - added [[host=http://localhost:9200]] to blacklist
Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at edu.usc.irds.sparkler.Main$.main(Main.scala:71)
at edu.usc.irds.sparkler.Main.main(Main.scala)
Caused by: ElasticsearchException[java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused]; nested: ExecutionException[java.net.ConnectException: Connection refused]; nested: ConnectException[Connection refused];
at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2695)
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:2171)
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:2137)
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:2105)
at org.elasticsearch.client.RestHighLevelClient.index(RestHighLevelClient.java:1241)
at edu.usc.irds.sparkler.storage.elasticsearch.ElasticsearchProxy.$anonfun$commitCrawlDb$1(ElasticsearchProxy.scala:175)
at edu.usc.irds.sparkler.storage.elasticsearch.ElasticsearchProxy.$anonfun$commitCrawlDb$1$adapted(ElasticsearchProxy.scala:172)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at edu.usc.irds.sparkler.storage.elasticsearch.ElasticsearchProxy.commitCrawlDb(ElasticsearchProxy.scala:172)
at edu.usc.irds.sparkler.service.Injector.run(Injector.scala:137)
at edu.usc.irds.sparkler.base.CliTool.run(CliTool.scala:34)
at edu.usc.irds.sparkler.base.CliTool.run$(CliTool.scala:32)
at edu.usc.irds.sparkler.service.Injector.run(Injector.scala:39)
at edu.usc.irds.sparkler.service.Injector$.main(Injector.scala:188)
at edu.usc.irds.sparkler.service.Injector.main(Injector.scala)
... 6 more
Caused by: java.util.concurrent.ExecutionException: java.net.ConnectException: Connection refused
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:257)
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:244)
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:75)
at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2692)
... 22 more
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174)
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148)
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351)
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221)
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64)
at java.base/java.lang.Thread.run(Thread.java:829)
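For what it's worth, the repeated "Connection refused" on http://localhost:9200 says the injector cannot reach any Elasticsearch instance from inside the container. A minimal sketch of one way to provide one, assuming the network and container names are free to choose and that Sparkler's Elasticsearch URI can be pointed at the "elasticsearch" host through its configuration:

# shared network so the containers can resolve each other by name
docker network create sparkler-net

# single-node Elasticsearch 7.17 on that network
docker run -d --name elasticsearch --net sparkler-net \
  -e discovery.type=single-node \
  docker.elastic.co/elasticsearch/elasticsearch:7.17.0

# run the injector on the same network instead of relying on localhost
sudo docker run --net sparkler-net -v elastic:/elasticsearch-7.17.0/data \
  ghcr.io/uscdatascience/sparkler/sparkler:main inject -id myid -su 'http://www.bbc.com/news'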
I also have a doubt: on Docker I find two different repositories:
docker pull uscdatascience/sparkler:latest
docker pull ghcr.io/uscdatascience/sparkler/sparkler:main
(I'm currently using this one.)
Which is which?
It's fixed now