ground-context / ground
An open-source, vendor-neutral data context service.
Home Page: http://www.ground-context.org
License: Apache License 2.0
Looks like we import java.util.Optional a lot but don't use it.
Suggests that we don't have very good linters running on the code. Would be good to put something into place to run automagically.
Hotels.com team identified a bug, but details needed.
The Ground Wiki shows that we can use the client library to generate the data context:
// From the wiki's example: connect and create two nodes.
GroundClient client = new GroundClient(host, port);
int managerId = client.createNode("manager");
int engineerId = client.createNode("engineer");
Is GroundClient packaged somewhere? Also, when I try to use the POST API to generate the data, it always replies with a bad request. Where can I get started?
Thanks.
Wrong link to CIDR17.pdf is confusing new contributors
"docs/CIDR17.pdf" should be changed to "resources/docs/CIDR17.pdf".
General cleanup, deduplication of code, addressing of TODOs, etc. Also, write proper documentation for everything.
Seems like 4d8dc40 broke the Neo4j builds, as NEO4J_VERSION doesn't seem to be populated in the scripts.
I ran this project and got this result:
"Action not found
For request 'GET /'
These routes have been tried, in this order:"
What is the situation?
Add an API that allows for retrieval of all entities that are tagged with something.
As a part of the database setup scripts, automatically index Tags based on their key in order to allow quick lookups for these queries.
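A minimal sketch of what that setup-script addition might look like for the Postgres backend; the table and column names here are assumptions, not the actual Ground schema:

```sql
-- Hypothetical: index the tag table on its key column so that
-- "find everything tagged with key K" queries can use an index scan
-- instead of a sequential scan.
CREATE INDEX tag_key_index ON tag (key);
```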
Build a wrapper library for relational metadata. The wrapper consists of a set of Ground Structures that contain information about relational entities, as well as a set of import and export Python(?) scripts.
This should cover things including (but not limited to):
Ideally, we'd like the Hive Metastore integration in #7 to use this wrapper and augment it with Hive-specific information as necessary.
Set up a pipeline that extracts HDFS metadata from Gobblin, writes it into a Kafka topic, and reads from the Kafka topic to ingest into Ground. This notifies us of new files that are created in HDFS. Eventually, we want to take this metadata and hand these events off to a featurization or parsing Aboveground service that extracts additional metadata from the files detected by Gobblin.
The second half of the pipeline that reads from Gobblin and writes into Ground will also be used by the Git integration pipeline.
Build a Ground implementation of Hive's RawStore interface in order to allow Ground to serve as a drop-in replacement for the Hive Metastore.
For the MVP, we won't be providing versioning semantics for Hive queries, but we do need to figure out how we're going to change the version of metadata that Ground returns.
Hi,
this is really interesting work. I was just playing with the Docker containers and noticed that the example for running the code here https://hub.docker.com/r/groundcontext/ground/ states that the open port should be 8080, but in reality it should be 9090 and 9191.
docker run -d --rm --name ground -p 9090:9090 -p 9191:9191 --link neo:neo groundcontext/ground
Cheers, Jan
We should decide what the reference model is here -- Protobufs is one end of a spectrum, how far do we go?
The pom.xml file was removed in commit 5959981, breaking the maven build instructions in the readme and on ground-context.org.
I see sbt artifacts in the project; if the switch to sbt was intentional, then the documentation needs to be updated; otherwise pom.xml should be restored.
Right now, when creating JSON requests, empty maps (for parameters for RichVersions, and tags for everything) must be specified because they're being set to null. They should be auto-populated to empty Maps if null is passed in.
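A sketch of the fix on the server side, assuming plain Java collections; the `MapDefaults.emptyIfNull` helper is hypothetical, not an existing Ground method:

```java
import java.util.HashMap;
import java.util.Map;

public final class MapDefaults {
    private MapDefaults() {}

    // Hypothetical helper: callers can omit parameters/tags (leaving them
    // null after JSON deserialization) and still get a usable, mutable
    // empty map instead of null propagating downstream.
    public static <K, V> Map<K, V> emptyIfNull(Map<K, V> map) {
        return (map == null) ? new HashMap<>() : map;
    }
}
```

A RichVersion constructor would then do something like `this.tags = MapDefaults.emptyIfNull(tags);`, so that JSON requests can simply leave the field out.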
The getting started wiki page makes it pretty easy to get started. However, there's a slight hiccup with the Docker images and the github plugin image (the one that runs 'python parsegitlog.py'): the config.ini entries for the Kafka and Ground services both point to localhost, but in the docker setup they're linked under the hostnames 'kafka' and 'ground'. Once I exec'ed into the image and fixed the config file, it worked.
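A sketch of the corrected config.ini. The [Kafka] section and its url/port keys match what the script reads in the traceback below; the [Ground] section name, its keys, and the exact port values are assumptions based on the container setup:

```ini
; Point at the docker network hostnames instead of localhost.
[Kafka]
url = kafka
port = 9092

[Ground]
url = ground
port = 9090
```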
Without that fix, you'll get an error like this:
docker logs ea212d2e6ad9
Traceback (most recent call last):
  File "parsegitlog.py", line 121, in <module>
    bootstrap_servers=[config['Kafka']['url'] + ":" + config['Kafka']['port']])
  File "/usr/local/lib/python3.5/site-packages/kafka/consumer/group.py", line 284, in __init__
    self._client = KafkaClient(metrics=self._metrics, **self.config)
  File "/usr/local/lib/python3.5/site-packages/kafka/client_async.py", line 202, in __init__
    self.config['api_version'] = self.check_version(timeout=check_timeout)
  File "/usr/local/lib/python3.5/site-packages/kafka/client_async.py", line 791, in check_version
    raise Errors.NoBrokersAvailable()
kafka.errors.NoBrokersAvailable: NoBrokersAvailable
Please let me know if there is a proposal to support Solr as a backend store besides Elasticsearch. I would like to contribute in that area.
The license is specified here. Should be easy to script.
In order to remove some of the friction in setting up and playing with a Ground Alpha, we should have a Vagrant instance or Dockerfile(s) that set up HDFS, Kafka, Gobblin, and Ground and link them together. Users won't have to install any software to get started with Ground.
Put together a tutorial that briefly explains what Ground does and explains the integrations we've built. Get users started with the Docker / Vagrant instance set up by #13.
Demonstrate the usefulness of Ground by having users load canonical Hive example data, run simple Hive queries, and rewind time using Ground (i.e., the functionality provided in #7). We can then show "time travel" queries using Ground's older versions of Hive metadata. This obviously requires Hive integration from #7 and also requires HDFS integration from #8 and git integration from #9.
In addition, this demo should come with a simple demonstration of lineage (maybe a graph we show them that they can recreate with existing metadata) as well as a simple Aboveground service that does something like duplicate file detection.
Lastly, we refer them to the wrapper layers in #11 and #12 to show them "canonical" examples of building wrapper libraries. We need documentation with best practices for building their own, so they can ingest metadata from their environment, as well as tips for writing an Aboveground service that consumes metadata from Ground.
Document all the external facing APIs and use something like Swagger to automatically generate API docs for us.
The wiki page [https://github.com/ground-context/ground/wiki] links to https://github.com/ground-context/ground/blob/master/CIDR17.pdf and returns Github's 404 page.
This integration has three parts.
This depends on the pipeline that reads from Kafka and writes into Ground specified in #8.
Currently, nulls are serialized as strings instead of Postgres nulls.
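A minimal illustration of the symptom and the fix; the class and method names are hypothetical, and the real change belongs wherever Ground builds its Postgres statements:

```java
public final class NullDemo {
    // Buggy path: String.valueOf on a missing value yields the literal
    // four-character string "null", which then lands in Postgres as text.
    static String buggySerialize(Object value) {
        return String.valueOf(value);
    }

    // Fixed path: preserve null so the JDBC layer can branch on it and
    // call PreparedStatement.setNull(...), storing a real SQL NULL.
    static String fixedSerialize(Object value) {
        return (value == null) ? null : String.valueOf(value);
    }
}
```

On the write side the JDBC code would then check for null and call `stmt.setNull(i, java.sql.Types.VARCHAR)` instead of `stmt.setString(i, "null")`.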
From:
http://www.ground-context.org/wiki/index.html
JIRA link is broken
Links to:
https://ground.atlassian.net/projects/GROUND/issues
Result:
404 Page Not found
I am trying to set up the latest version, v0.1.2, on Ubuntu 16.04, following the install/getting started steps: http://www.ground-context.org/wiki/index
After unzipping v0.1.2.tar.gz, I should be able to start postgresql with ./bin/ground-postgres.*
But it looks like either a step is missing or the package does not include postgresql.
Please clarify.
Stumbled upon this from the Wherehows page; interesting project, and I like the fundamental approach.
Are there any community communication channels? I have questions to help determine whether the project would be useful for our organization / development / evaluation.
https://github.com/ground-context/ground/wiki - the API documentation link on this page appears to be broken.
Build a wrapper library for file system metadata. The wrapper consists of a set of Ground Structures that contain information about file system entities, as well as a set of import and export Python(?) scripts.
This should cover things including (but not limited to): fs.stat information.
Ideally, we'd like the HDFS integration in #8 to use this wrapper and augment it with HDFS-specific information as necessary.
It'd be very helpful to get a sense of an application using Ground. Maybe it'd be possible to post the code for the impact analysis experiment? It looks like it doesn't have terribly complicated external dependencies and should be easy to interpret.