Comments (2)
By the last question I mean the following. For example, for a process that writes several output files, and therefore has several executions, the target (output) file is IL_CLE_2_1_2, with the URL file:/fastdisk3/flight_searches/2_1_2/IL_CLE_2_1_2
This is the picture
Inside the processing node this is the node where it is written
Maybe the partitions are causing the change in the data source name?
Then, in another execution that uses that same output data source as its input source and has the same URL, the data source name is different (it is named after the last partition).
Any reason for this? I want to have the complete view in the lineage diagram of both executions.
Thanks in advance
how do different executions get related? By the data source name?
By the data source URI - this can be different for each data source type, but at the end it's a String in the database.
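As a rough illustration of what matching by URI string implies, here is a minimal sketch. It is not Spline's actual code: the `link_executions` helper, the record layout, the job names, and the partition-style path used by `job_c` are all hypothetical. The point is only that two executions get connected when the output URI of one is exactly equal, as a string, to an input URI of the other.

```python
from collections import defaultdict

def link_executions(executions):
    """Return (writer, reader, uri) triples for executions sharing an identical URI string.

    Hypothetical helper for illustration; not part of Spline.
    """
    # Index every execution by the URI it writes to.
    writers = defaultdict(list)
    for ex in executions:
        writers[ex["output"]].append(ex["name"])

    # A reader is linked to a writer only on an exact string match.
    links = []
    for ex in executions:
        for uri in ex["inputs"]:
            for writer in writers.get(uri, []):
                links.append((writer, ex["name"], uri))
    return links

BASE = "file:/fastdisk3/flight_searches/2_1_2/IL_CLE_2_1_2"

executions = [
    {"name": "job_a", "inputs": [], "output": BASE},
    {"name": "job_b", "inputs": [BASE], "output": "file:/tmp/example_out_b"},
    # Assumed example: the same data reported under a partition-specific URI.
    {"name": "job_c", "inputs": [BASE + "/part=9"], "output": "file:/tmp/example_out_c"},
]

for link in link_executions(executions):
    print(link)
```

In this sketch `job_b` is linked to `job_a`, while `job_c` is not, because its input URI is a different string even though it points into the same location.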
how does a data source node get named?
There are multiple plugins for different data source types in the Spline Spark Agent. Each plugin is responsible for extracting the URIs from its data sources.
Maybe the partitions are causing the change in the data source name?
Yes, this is a known issue.
In general, this is an unsolvable problem. Consider an OS path where one file has multiple aliases, or a server that is accessible from different IP addresses on different networks.
We want to solve this eventually by allowing the Spline admin to define which URIs should be considered the same data source, but work on this hasn't even begun.
See the issue here: #689
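The idea of letting an admin declare which URIs refer to the same data source could look roughly like the sketch below. It is purely hypothetical: as noted above, Spline has no such feature yet, and the `ALIASES` table, the alias path in it, and the `canonical_uri` function are invented here for illustration only.

```python
# Hypothetical admin-defined mapping: alias URI -> URI to treat as the same data source.
ALIASES = {
    "file:/mnt/alias/flight_searches/2_1_2/IL_CLE_2_1_2":
        "file:/fastdisk3/flight_searches/2_1_2/IL_CLE_2_1_2",
}

def canonical_uri(uri, aliases=ALIASES):
    """Follow alias entries until a URI with no further alias is reached."""
    seen = set()
    # The 'seen' set guards against accidental cycles in the alias table.
    while uri in aliases and uri not in seen:
        seen.add(uri)
        uri = aliases[uri]
    return uri

a = canonical_uri("file:/mnt/alias/flight_searches/2_1_2/IL_CLE_2_1_2")
b = canonical_uri("file:/fastdisk3/flight_searches/2_1_2/IL_CLE_2_1_2")
print(a == b)
```

With such a mapping applied before URIs are compared, both spellings would resolve to one data source node, while URIs without an alias entry would be left unchanged.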