vesoft-inc / nebula-spark-utils Goto Github PK
View Code? Open in Web Editor NEWSpark related libraries and tools
Spark related libraries and tools
add summary report in the end, including:
具体问题描述可以查看:https://discuss.nebula-graph.com.cn/t/topic/3667
报错信息为:org.rocksdb.RocksDBException: Keys must be added in strict ascending order.
具体报错如下:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 8.0 failed 4 times, most recent failure: Lost task 4.3 in stage 8.0 (TID 96, executor 10): org.rocksdb.RocksDBException: Keys must be added in strict ascending order.
at org.rocksdb.SstFileWriter.put(Native Method)
at org.rocksdb.SstFileWriter.put(SstFileWriter.java:132)
at com.vesoft.nebula.exchange.writer.NebulaSSTWriter.write(FileBaseWriter.scala:42)
at com.vesoft.nebula.exchange.processor.EdgeProcessor$$anonfun$process$3$$anonfun$apply$3.apply(EdgeProcessor.scala:245)
at com.vesoft.nebula.exchange.processor.EdgeProcessor$$anonfun$process$3$$anonfun$apply$3.apply(EdgeProcessor.scala:219)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at com.vesoft.nebula.exchange.processor.EdgeProcessor$$anonfun$process$3.apply(EdgeProcessor.scala:219)
at com.vesoft.nebula.exchange.processor.EdgeProcessor$$anonfun$process$3.apply(EdgeProcessor.scala:214)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$28.apply(RDD.scala:980)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
v14 is too old!
can we update guava version like exchange1.0? just need replace 'getHostText' to 'getHost'.
Nebula Graph 2.0.0
Deployment type (Single-Host)
Hardware info
Disk in use (HDD)
CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz * 10
Ram: 64 G
+----+------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+ | ID | Name | Partition Number | Replica Factor | Charset | Collate | Vid Type | Atomic Edge | Group | +----+------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+ | 1 | "CallConnection" | 10 | 1 | "utf8" | "utf8_bin" | "FIXED_STRING(80)" | "false" | "default" | +----+------------------+------------------+----------------+---------+------------+--------------------+-------------+-----------+
I wanna use pagerank algorithm. I use below configuration:
{
# Spark relation config
spark: {
app: {
name: LPA
# spark.app.partitionNum
partitionNum:100
}
master:local
}
data: {
# data source. optional of nebula,csv,json
source: nebula
# data sink, means the algorithm result will be write into this sink. optional of nebula,csv,text
sink: nebula
# if your algorithm needs weight
hasWeight: false
}
# Nebula Graph relation config
nebula: {
# algo's data source from Nebula. If data.source is nebula, then this nebula.read config can be valid.
read: {
# Nebula metad server address, multiple addresses are split by English comma
metaAddress: "127.0.0.1:9559"
# Nebula space
space: CallConnection
# Nebula edge types, multiple labels means that data from multiple edges will union together
labels: ["callTO"]
# Nebula edge property name for each edge type, this property will be as weight col for algorithm.
# Make sure the weightCols are corresponding to labels.
# weightCols: ["start_year"]
}
# algo result sink into Nebula. If data.sink is nebula, then this nebula.write config can be valid.
write:{
# Nebula graphd server address, multiple addresses are split by English comma
graphAddress: "127.0.0.1:9669"
# Nebula metad server address, multiple addresses are split by English comma
metaAddress: "127.0.0.1:9559,127.0.0.1:9560"
user:user
pswd:password
# Nebula space name
space:CallConnection
# Nebula tag name, the algorithm result will be write into this tag
tag:pagerank
}
}
# local: {
# # algo's data source from Nebula. If data.source is csv or json, then this local.read can be valid.
# read:{
# filePath: "hdfs://127.0.0.1:9000/edge/work_for.csv"
# # srcId column
# srcId:"_c0"
# # dstId column
# dstId:"_c1"
# # weight column
# #weight: "col3"
# # if csv file has header
# header: false
# # csv file's delimiter
# delimiter:","
# }
# # algo result sink into local file. If data.sink is csv or text, then this local.write can be valid.
# write:{
# resultPath:/tmp/
# }
# }
algorithm: {
# the algorithm that you are going to execute,pick one from [pagerank, louvain, connectedcomponent,
# labelpropagation, shortestpaths, degreestatic, kcore, stronglyconnectedcomponent, trianglecount,
# betweenness]
executeAlgo: pagerank
# PageRank parameter
pagerank: {
maxIter: 2
resetProb: 0.15 # default 0.15
}
}
}
after run spark-submit as this
spark-submit --master local --class com.vesoft.nebula.algorithm.Main nebula-algorithm-2.0.0.jar -p classes/Algorithm.conf
after about several minutes and use all CPU but not RAM,
here is the spark UI
spark-submit --master local --class com.vesoft.nebula.algorithm.Main nebula-algorithm-2.0.0.jar -p classes/Algorithm.conf
21/05/19 06:19:53 WARN Utils: Your hostname, orientserver resolves to a loopback address: 127.0.1.1; using 10.10.1.121 instead (on interface ens33)
21/05/19 06:19:53 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/05/19 06:20:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (com.vesoft.nebula.algorithm.Main$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/05/19 06:20:00 INFO SparkContext: Running Spark version 2.4.7
21/05/19 06:20:00 INFO SparkContext: Submitted application: LPA
21/05/19 06:20:00 INFO SecurityManager: Changing view acls to: orient
21/05/19 06:20:00 INFO SecurityManager: Changing modify acls to: orient
21/05/19 06:20:00 INFO SecurityManager: Changing view acls groups to:
21/05/19 06:20:00 INFO SecurityManager: Changing modify acls groups to:
21/05/19 06:20:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(orient); groups with view permissions: Set(); users with modify permissions: Set(orient); groups with modify permissions: Set()
21/05/19 06:20:00 INFO Utils: Successfully started service 'sparkDriver' on port 36273.
21/05/19 06:20:00 INFO SparkEnv: Registering MapOutputTracker
21/05/19 06:20:00 INFO SparkEnv: Registering BlockManagerMaster
21/05/19 06:20:00 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/19 06:20:00 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/19 06:20:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e15b4c05-c314-463e-9f06-a1fca904f7e7
21/05/19 06:20:00 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
21/05/19 06:20:01 INFO SparkEnv: Registering OutputCommitCoordinator
21/05/19 06:20:01 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/05/19 06:20:01 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.10.1.121:4040
21/05/19 06:20:01 INFO SparkContext: Added JAR file:/home/orient/NebulaJavaTools/nebula-spark-utils/nebula-algorithm/target/nebula-algorithm-2.0.0.jar at spark://10.10.1.121:36273/jars/nebula-algorithm-2.0.0.jar with timestamp 1621405201288
21/05/19 06:20:01 INFO Executor: Starting executor ID driver on host localhost
21/05/19 06:20:01 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34449.
21/05/19 06:20:01 INFO NettyBlockTransferService: Server created on 10.10.1.121:34449
21/05/19 06:20:01 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/19 06:20:01 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.10.1.121, 34449, None)
21/05/19 06:20:01 INFO BlockManagerMasterEndpoint: Registering block manager 10.10.1.121:34449 with 366.3 MB RAM, BlockManagerId(driver, 10.10.1.121, 34449, None)
21/05/19 06:20:01 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.10.1.121, 34449, None)
21/05/19 06:20:01 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.10.1.121, 34449, None)
21/05/19 06:20:01 INFO ReadNebulaConfig$: NebulaReadConfig={space=CallConnection,label=callTO,returnCols=List(),noColumn=true,partitionNum=10}
21/05/19 06:20:02 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/orient/NebulaJavaTools/nebula-spark-utils/nebula-algorithm/target/spark-warehouse').
21/05/19 06:20:02 INFO SharedState: Warehouse path is 'file:/home/orient/NebulaJavaTools/nebula-spark-utils/nebula-algorithm/target/spark-warehouse'.
21/05/19 06:20:02 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
21/05/19 06:20:02 INFO NebulaDataSource: create reader
21/05/19 06:20:02 INFO NebulaDataSource: options {spacename=CallConnection, nocolumn=true, metaaddress=127.0.0.1:9559, label=callTO, type=edge, connectionretry=2, timeout=6000, executionretry=1, paths=[], limit=1000, returncols=, partitionnumber=10}
21/05/19 06:20:02 INFO NebulaDataSourceEdgeReader: dataset's schema: StructType(StructField(_srcId,StringType,false), StructField(_dstId,StringType,false), StructField(_rank,LongType,false))
21/05/19 06:20:04 INFO NebulaDataSource: create reader
21/05/19 06:20:04 INFO NebulaDataSource: options {spacename=CallConnection, nocolumn=true, metaaddress=127.0.0.1:9559, label=callTO, type=edge, connectionretry=2, timeout=6000, executionretry=1, paths=[], limit=1000, returncols=, partitionnumber=10}
21/05/19 06:20:04 INFO DataSourceV2Strategy:
Pushing operators to class com.vesoft.nebula.connector.NebulaDataSource
Pushed Filters:
Post-Scan Filters:
Output: _srcId#0, _dstId#1, _rank#2L
21/05/19 06:20:05 INFO CodeGenerator: Code generated in 205.90032 ms
21/05/19 06:20:05 INFO SparkContext: Starting job: fold at VertexRDDImpl.scala:90
21/05/19 06:20:05 INFO DAGScheduler: Registering RDD 9 (mapPartitions at VertexRDD.scala:356) as input to shuffle 3
21/05/19 06:20:05 INFO DAGScheduler: Registering RDD 27 (mapPartitions at VertexRDDImpl.scala:247) as input to shuffle 1
21/05/19 06:20:05 INFO DAGScheduler: Registering RDD 23 (mapPartitions at VertexRDDImpl.scala:247) as input to shuffle 0
21/05/19 06:20:05 INFO DAGScheduler: Registering RDD 31 (mapPartitions at GraphImpl.scala:208) as input to shuffle 2
21/05/19 06:20:05 INFO DAGScheduler: Got job 0 (fold at VertexRDDImpl.scala:90) with 10 output partitions
21/05/19 06:20:05 INFO DAGScheduler: Final stage: ResultStage 4 (fold at VertexRDDImpl.scala:90)
21/05/19 06:20:05 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 0, ShuffleMapStage 3)
21/05/19 06:20:05 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 0, ShuffleMapStage 3)
21/05/19 06:20:05 INFO DAGScheduler: Submitting ShuffleMapStage 0 (VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[9] at mapPartitions at VertexRDD.scala:356), which has no missing parents
21/05/19 06:20:05 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 16.4 KB, free 366.3 MB)
21/05/19 06:20:05 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 7.6 KB, free 366.3 MB)
21/05/19 06:20:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.10.1.121:34449 (size: 7.6 KB, free: 366.3 MB)
21/05/19 06:20:05 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
21/05/19 06:20:05 INFO DAGScheduler: Submitting 10 missing tasks from ShuffleMapStage 0 (VertexRDD.createRoutingTables - vid2pid (aggregation) MapPartitionsRDD[9] at mapPartitions at VertexRDD.scala:356) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9))
21/05/19 06:20:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
21/05/19 06:20:06 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 9761 bytes)
21/05/19 06:20:06 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
21/05/19 06:20:06 INFO Executor: Fetching spark://10.10.1.121:36273/jars/nebula-algorithm-2.0.0.jar with timestamp 1621405201288
21/05/19 06:20:06 INFO TransportClientFactory: Successfully created connection to /10.10.1.121:36273 after 44 ms (0 ms spent in bootstraps)
21/05/19 06:20:06 INFO Utils: Fetching spark://10.10.1.121:36273/jars/nebula-algorithm-2.0.0.jar to /tmp/spark-17e5e989-b080-4a41-aac3-f9fd56f03752/userFiles-693448ab-9c10-468b-a2a9-c926b9e62800/fetchFileTemp3358155586060883895.tmp
21/05/19 06:20:06 INFO Executor: Adding file:/tmp/spark-17e5e989-b080-4a41-aac3-f9fd56f03752/userFiles-693448ab-9c10-468b-a2a9-c926b9e62800/nebula-algorithm-2.0.0.jar to class loader
21/05/19 06:20:07 INFO NebulaEdgePartitionReader: partition index: 1, scanParts: List(1)
21/05/19 06:20:07 INFO CodeGenerator: Code generated in 23.996116 ms
after some minutes,
just do nothing and all task gone!
Migrate data from sst files only adds forward edge but doesn't add reverse edge.
this is spark job log
driver.log
space sst_test
tag idno
data example
230421197906123305,0,客户0,0,2021/1/1,女,博士,离异,1,black 230421197906123305,1,客户1,1,2021/1/1,女,博士,离异,1,black 230421197906123305,2,客户2,2,2021/1/1,女,博士,离异,1,black 533101196807196696,3,客户3,3,2021/1/1,女,博士,离异,1,black 533101196807196696,4,客户4,4,2021/1/1,女,博士,离异,1,black 220722198306264943,5,客户5,5,2021/1/1,女,博士,离异,1,black 220722198306264943,6,客户6,6,2021/1/1,女,博士,离异,1,black 220722198306264943,7,客户7,7,2021/1/1,女,博士,离异,1,black 310200197802274261,8,客户8,8,2021/1/1,女,博士,离异,1,black 310200197802274261,9,客户9,9,2021/1/1,女,博士,离异,1,black
edge address_id
data example
`贵州省黔东南苗族侗族自治州榕江县镜湖花园22栋177号,230421197906123305,2019-02-20 08:02:20
贵州省黔东南苗族侗族自治州榕江县镜湖花园22栋177号,230421197906123305,2019-01-30 13:26:01
贵州省黔东南苗族侗族自治州榕江县镜湖花园22栋177号,230421197906123305,2019-03-31 03:19:30
四川省泸州市龙马潭区锋尚名居8栋734号,533101196807196696,2019-05-14 22:31:58
四川省泸州市龙马潭区锋尚名居8栋734号,533101196807196696,2020-02-20 06:09:22
河北省邢台市巨鹿县市聚福园16栋228号,220722198306264943,2020-12-11 12:23:53
河北省邢台市巨鹿县市聚福园16栋228号,220722198306264943,2020-01-08 05:19:52
河北省邢台市巨鹿县市聚福园16栋228号,220722198306264943,2020-04-29 09:23:55
浙江省丽水地区青田县尚书苑8栋271号,310200197802274261,2019-09-03 01:40:32
浙江省丽水地区青田县尚书苑8栋271号,310200197802274261,2019-04-23 18:37:49`
application-sst.conf
# Spark relation config
spark: {
app: {
name: Nebula Exchange 2.5
}
master:local
driver: {
cores: 1
maxResultSize: 1G
memory: 1G
}
executor: {
memory: 4G
instances: 1
cores: 10
}
cores:{
max: 48
}
}
# Nebula Graph relation config
nebula: {
address:{
graph:["xxxx:9669"]
meta:["xxxx:9559"]
}
user: root
pswd: nebula
space: sst_test
# parameters for SST import, not required
path:{
local: "/tmp"
remote: "/user/frms/sst"
hdfs.namenode: "hdfs://xxxx:8020"
}
# nebula client connection parameters
connection {
timeout: 3000
retry: 3
}
# nebula client execution parameters
execution {
retry: 3
}
error: {
# max number of failures, if the number of failures is bigger than max, then exit the application.
max: 32
# failed import job will be recorded in output path
output: /user/frms/error
}
# use google's RateLimiter to limit the requests send to NebulaGraph
rate: {
# the stable throughput of RateLimiter
limit: 1024
# Acquires a permit from RateLimiter, unit: MILLISECONDS
# if it can't be obtained within the specified timeout, then give up the request.
timeout: 1000
}
}
# Processing tags
# There are tag config examples for different dataSources.
tags: [
# csv
{
name: address
type: {
source: csv
sink: sst
}
path: "hdfs://xxxx:8020/user/frms/nebula-data/address.csv"
# if your csv file has no header, then use _c0,_c1,_c2,.. to indicate fields
fields: []
nebula.fields: []
vertex: {
field: _c0
# policy: "hash"
}
separator: ","
header: false
batch: 2560
partition: 32
}
{
name: idno
type: {
source: csv
sink: sst
}
path: "hdfs://xxxx:8020/user/frms/nebula-data/id.csv"
fields: [_c1,_c2,_c3,_c4,_c5,_c6,_c7,_c8,_c9]
nebula.fields: [a1,a2,a3,a4,a5,a6,a7,a8,a9]
vertex: {
field: _c0
# policy: "hash"
}
separator: ","
header: false
batch: 2560
partition: 32
}
]
# Processing edges
# There are edge config examples for different dataSources.
edges: [
# csv
{
name: address_id
type: {
source: csv
sink: sst
}
path: "hdfs://xxxx:8020/user/frms/nebula-data/addressid.csv"
fields: [_c2]
nebula.fields: [create_time]
source: _c0
target: _c1
# ranking: rank
separator: ","
header: false
batch: 2560
partition: 32
}
]
}
run command
./spark-2.4.8-bin-hadoop2.6/bin/spark-submit --master yarn-client --class com.vesoft.nebula.exchange.Exchange nebula-exchange-2.5-SNAPSHOT.jar -c application-sst.conf
when data source vertex is null, exchange have not a good warning but just a exception like 'java.lang.NullPointException'.when data source is not strict,someone do not know why and how to fix it.
go get 2.0 时, 报错.
命令以及报错详情:
$ go get -v github.com/vesoft-inc/[email protected]
go get github.com/vesoft-inc/[email protected]: github.com/vesoft-inc/[email protected]: invalid version: module contains a go.mod file, so major version must be compatible: should be v0 or v1, not v2
查阅文档可知, go 官方推荐在 v2 及更高版本中, 使用 {path}/v2
的方式, 从而管理多个版本. 参考文档: Go Modules: v2 and Beyond
如何解决
module github.com/vesoft-inc/nebula-go/v2
.如何复现
unset GOPROXY
go get -v github.com/vesoft-inc/[email protected]
希望exchange支持clickhouse数据源
I hope that exchange can support datasource ClickHouse
.
版本:Nebula Exchange 2.0, Nebula Graph 2.0.1 spark2.4
os:Ubuntu 18.04
按照文档说明编写了neo4j_application.conf ,执行命令${SPARK_HOME}/bin/spark-submit --master "local" --class com.vesoft.nebula.exchange.Exchange /root/nebula-spark-utils/nebula-exchange/target/nebula-exchange-2.0.0.jar -c /root/nebula-spark-utils/nebula-exchange/target/classes/neo4j_application.conf 报错误:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.neo4j.driver.exceptions.ClientException: Database name parameter for selecting database is not supported in Bolt Protocol Version 3. Database name: 'graph.db'
再次参考文档,文档上说:Exchange使用Neo4j Driver 4.0.1实现对Neo4j数据的读取。查找Neo4j官方未找到Neo4j Driver ,但找到neo4j-spark-connector 不知道是不是这个,请确认下,若不是,希望给出Neo4j Driver 4.0.1的地址,
Hi,
Thanks for this great work!!
I tried some cluster algorithm and generate cluster id for each vertex. Now I need to compute betweenness centrality of on each cluster(because the whole graph is too large). But I do not know how to generate subgraphs according to the cluster ids, and how could I compute betweenness centrality in parallel on each subgraph.
Would you please tell me how could I make this work?
can not find com.vesoft.client.2.0.0-SNAPSHOT jar.
According to the readme instructions, once I've cloned nebula-spark-utils repo I should just enter
mvn clean package -Dmaven.test.skip=true -Dgpg.skip -Dmaven.javadoc.skip=true
Nevertheless there's no way to run it and I always get a BUILD FAILURE ERROR:
Here is the output i get
[INFO] Scanning for projects...
[WARNING]
[WARNING] Some problems were encountered while building the effective model for com.vesoft:nebula-exchange:jar:2.0.0
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-surefire-plugin is missing. @ line 98, column 21
[WARNING] 'build.plugins.plugin.version' for net.alchim31.maven:scala-maven-plugin is missing. @ line 173, column 21
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
[INFO]
[INFO] ---------------------< com.vesoft:nebula-exchange >---------------------
[INFO] Building nebula-exchange 2.0.0
[INFO] --------------------------------[ jar ]---------------------------------
Downloading from SparkPackagesRepo: http://dl.bintray.com/spark-packages/maven/neo4j-contrib/neo4j-spark-connector/2.4.5-M1/neo4j-spark-connector-2.4.5-M1.pom
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 7.472 s
[INFO] Finished at: 2021-05-10T23:16:41+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal on project nebula-exchange: Could not resolve dependencies for project com.vesoft:nebula-exchange:jar:2.0.0: Failed to collect dependencies at neo4j-contrib:neo4j-spark-connector:jar:2.4.5-M1: Failed to read artifact descriptor for neo4j-contrib:neo4j-spark-connector:jar:2.4.5-M1: Could not transfer artifact neo4j-contrib:neo4j-spark-connector:pom:2.4.5-M1 from/to SparkPackagesRepo (http://dl.bintray.com/spark-packages/maven): Authorization failed for http://dl.bintray.com/spark-packages/maven/neo4j-contrib/neo4j-spark-connector/2.4.5-M1/neo4j-spark-connector-2.4.5-M1.pom 403 Forbidden -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
Am I missing something?
Version: v2.0.0
Debugging with breakpoint, it seems that the louvain result only contains the core vertecies in community, which means we cannot figure out all vertecies in one community.
Here is my test data:
src,dst,weight
1,2,1.0
1,3,5.0
2,1,5.0
2,3,5.0
3,4,1.0
4,5,1.0
4,6,1.0
4,7,1.0
4,8,1.0
5,7,1.0
5,8,1.0
6,8,1.0
params:
val louvainConfig = new LouvainConfig(5, 3, 0.005)
val louvainResult = LouvainAlgo.apply(spark, data, louvainConfig, hasWeight = false)
And the result by calling louvainResult.show():
+---+--------+
|_id|_louvain|
+---+--------+
| 4| 4|
| 1| 1|
| 5| 5|
+---+--------+
Apparently, there are more than 3 vertecies.
And I think it's more meaningful to users by changing the code to:
def getCommunities(G: Graph[VertexData, Double]): RDD[Row] = {
val communities = G.vertices
.flatMap(x => {
x._2.innerVertices
.map(y => {
Row(y, x._2.cId)
})
})
communities
}
Nebula Chinese forum related: https://discuss.nebula-graph.com.cn/t/topic/4701/5
Hadoop maintains a cache for fileSystem, so don't close the hdfs fileSystem in the code, or sometimes it occurs a "Filesystem closed" error.
in error data,rank index maybe null,will happend NullPointException , should assert it.
是否有像 elasticsearch 那样的方法来导入到 nebula
options = OrderedDict()
options["es.nodes"] = ['127.0.0.1:9200']
options["es.index.auto.create"] = "true"
options["es.resource"] = "nebula/docs"
df.write.format("org.elasticsearch.spark.sql") \
.options(**options) \
.save(mode='append')
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.