Comments (2)
The fact that you can use Nextclade JSON to seed your particular database at all is a little miracle, and I would not recommend relying on it going forward. As mentioned in the docs, JSON output is unstable. There will also be massive breaking changes in the coming weeks in Nextclade v3.
The JSON format is used for internal communication between different parts of Nextclade and, as you've discovered, it is just a serialized internal struct. It naturally changes during routine development.
As a small research lab we are focusing on science. We don't have time to commit to maintaining a stable external JSON format at this point, and we will not have the resources to adjust to the requirements of downstream projects. We experiment and break things a lot, and we reserve the right to change the JSON format at any time without warning.
So while you can submit a PR to change the format now (assuming there is no loss of functionality or correctness, we will likely accept it), I don't see it helping much in the long term.
One thing we considered to make the JSON output easier to use is providing a JSON schema for the format, but this would not help much in your use case.
Perhaps writing a middleware tool to ingest TSV output is a better solution for downstream projects? TSV output is much more stable: it follows semantic versioning. You can then maintain a stable output format of your liking, and open-source the tool for the community who happen to use your particular toolset.
Also, Spark seems like massive overkill to me. Internally our scientists use TSV with pandas/polars and it works decently well. Maybe this could also fit your project?
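As a rough illustration of the middleware idea, here is a minimal sketch that reads Nextclade-style TSV and re-emits rows under a self-maintained set of field names. It uses only the Python standard library so it runs anywhere; `pandas.read_csv(path, sep="\t")` would be the pandas equivalent. The helper `convert_rows`, the `COLUMN_MAP` mapping, the output field names, and the sample data are all hypothetical, and the TSV column names are assumptions to check against the header of your own output.

```python
import csv
import io

# Hypothetical mapping from Nextclade TSV column names to the field names of a
# downstream, self-maintained schema. The TSV column names are assumptions;
# verify them against the header row of your own Nextclade output.
COLUMN_MAP = {
    "seqName": "sequence_name",
    "clade": "clade",
    "qc.overallStatus": "qc_status",
}


def convert_rows(tsv_file, column_map=COLUMN_MAP):
    """Yield one dict per TSV row, keeping only the mapped columns under stable names."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    missing = [col for col in column_map if col not in (reader.fieldnames or [])]
    if missing:
        # Fail loudly if the upstream columns changed, rather than writing bad rows.
        raise ValueError(f"missing expected columns: {missing}")
    for row in reader:
        yield {new: row[old] for old, new in column_map.items()}


# Tiny made-up sample standing in for a real TSV file.
SAMPLE_TSV = (
    "seqName\tclade\tqc.overallStatus\textra\n"
    "sample1\t21J\tgood\tx\n"
    "sample2\t22B\tmediocre\ty\n"
)

records = list(convert_rows(io.StringIO(SAMPLE_TSV)))
```

Keeping the column mapping in one place means that a renamed upstream column only requires a one-line change, and the explicit check turns a silent schema drift into an immediate error.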
If you have other ideas, let us know.
from nextclade.
Thanks for your comments and suggestions.
I discussed this with a few of our team members and we will look into using the TSV output in lieu of JSON.
We do want to thank you for your work and for making this tool available.
This has enabled us to do research and helped us make some contributions in the public health space.