raystack / compass
Compass is an enterprise data catalog that makes it easy to find, understand, and govern data.
Home Page: https://compass-raystack.vercel.app/
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
We need to introduce a new grpc interface. The grpc interface could be used later by the cli when we are building it.
Describe the solution you'd like
Approach should be discussed in this issue.
Is your feature request related to a problem? Please describe.
I have description and ownership fields in every record. I also expect to have some tags populated in every record. I want users to be able to search by description, owner, and tags.
Describe the solution you'd like
API Changes
The existing Search API uses HTTP GET and accepts the search query and other filter params as query parameters, like this:
/search/?text=<text>&filter.environment=integration&filter.landscape=vn&filter.landscape=th
Unlike the existing search, which searches across all available datasets and fields, searching by description, owner, and tags would only consider a specific field of the data. To support that, we could add a new query param called search-by, searchby, or within. Its value would be the field name in the dataset, written in elasticsearch-like dotted field notation.
For example:
Given a dataset with a schema like this:
{
  "urn": "a-urn",
  "name": "a record",
  "service": "bigquery",
  "description": "a description",
  "data": {
    "properties": {
      "attributes": {
        "dataset": "a_dataset"
      },
      "labels": {
        "created_by": "table creator"
      }
    }
  }
}
the search requests would look like this:
/search/?text=<text>&searchby=description
/search/?text=<text>&searchby=data.properties.attributes.dataset
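To make the dotted field notation concrete, here is a minimal Go sketch of how a searchby value could be resolved against a record. The function name lookupField and the in-memory map representation are assumptions for illustration only; the actual search would be executed by elasticsearch rather than by walking the document in Go.

```go
package main

import (
	"fmt"
	"strings"
)

// lookupField walks a nested document using a dot-separated path such as
// "data.properties.attributes.dataset" and returns the value found there.
// The name and signature are hypothetical, not part of Columbus.
func lookupField(doc map[string]interface{}, path string) (interface{}, bool) {
	var current interface{} = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := current.(map[string]interface{})
		if !ok {
			return nil, false // path continues but the current value is not an object
		}
		current, ok = obj[key]
		if !ok {
			return nil, false // key is missing at this level
		}
	}
	return current, true
}

func main() {
	record := map[string]interface{}{
		"urn":         "a-urn",
		"description": "a description",
		"data": map[string]interface{}{
			"properties": map[string]interface{}{
				"attributes": map[string]interface{}{"dataset": "a_dataset"},
			},
		},
	}
	for _, searchBy := range []string{"description", "data.properties.attributes.dataset", "data.missing"} {
		value, found := lookupField(record, searchBy)
		fmt.Printf("%s -> %v (found=%t)\n", searchBy, value, found)
	}
}
```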
Is your feature request related to a problem? Please describe.
Currently we have no defined standard for raising a feature request or bug in this repo. In other odpf repos (meteor for example), we have an ISSUE_TEMPLATE to help us standardize the issue format.
Describe the solution you'd like
Add a GitHub ISSUE_TEMPLATE to the columbus repo like the one we define in meteor.
Is your feature request related to a problem? Please describe.
In #88, we added more filtering features to the Get Assets API. However, this is only for the HTTP API. We need to replicate this feature in grpc since we will deprecate the HTTP API later and use grpc-gateway instead.
Describe the solution you'd like
Is your feature request related to a problem? Please describe.
Pagination is already implemented in the v1beta1 assets API, but filtering by certain fields is still not available.
Describe the solution you'd like
Add filtering by certain fields by handling columbus data in a structured way.
Is your feature request related to a problem? Please describe.
When linting with golangci-lint, I found several linter warnings. We could fix all of them to avoid unwanted bugs or unintended behaviour. After fixing all lint warnings and errors, we could add a new lint step to the GitHub workflow.
Describe the solution you'd like
Fix every line of code that throws a lint warning or error when running golangci-lint.
Is your feature request related to a problem? Please describe.
Columbus always returns all records for a given type when fetching. This can take a long time, since a single type can contain more than 5k records.
Describe the solution you'd like
Add new query string params to fetch only a certain size and offset, with a default size of e.g. 20.
Example request:
?from=10&size=20
Example response:
{
  "data": [], // records
  "total": 100 // all available records (this helps with client-side pagination)
}
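The from/size behaviour described above could be sketched in Go roughly as follows. This is a minimal in-memory illustration under assumptions: the names page and paginate are hypothetical, records are plain strings, and a real implementation would push the offset and limit down to the storage layer instead of slicing a full list.

```go
package main

import "fmt"

// defaultSize mirrors the "default size of e.g. 20" suggested in the issue.
const defaultSize = 20

// page is a hypothetical response shape matching the example response:
// the requested window of records plus the total number available.
type page struct {
	Data  []string `json:"data"`
	Total int      `json:"total"`
}

// paginate returns the window [from, from+size) of records, clamped to the
// bounds of the slice. A non-positive size falls back to defaultSize.
func paginate(records []string, from, size int) page {
	if size <= 0 {
		size = defaultSize
	}
	if from < 0 {
		from = 0
	}
	if from > len(records) {
		from = len(records)
	}
	end := from + size
	if end > len(records) {
		end = len(records)
	}
	return page{Data: records[from:end], Total: len(records)}
}

func main() {
	records := make([]string, 100)
	for i := range records {
		records[i] = fmt.Sprintf("record-%d", i)
	}
	p := paginate(records, 10, 20) // corresponds to ?from=10&size=20
	fmt.Println(len(p.Data), p.Total, p.Data[0])
}
```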
Is your feature request related to a problem? Please describe.
I have two sources updating an asset with different information. I don't want the updates to overwrite each other.
Both sources will update different fields inside the asset.data field.
Describe the solution you'd like
I would like a new API to patch an Asset instead of fully updating it.
Solution:
[PATCH] /v1beta1/assets
{
  "urn": "some-urn", // this is required to identify an asset
  "type": "table", // this is required to identify an asset
  "service": "bigquery", // this is required to identify an asset
  "data": {
    "fieldFromSource1": "some-value"
  }
}
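As a rough illustration of the patch semantics requested above, the following Go sketch merges a patch into an existing data map so that fields written by another source are preserved. The function name patchData is hypothetical, and merging nested objects recursively is an assumption; the issue only states that the two sources update different fields.

```go
package main

import "fmt"

// patchData returns a copy of existing with the fields from patch applied.
// Fields absent from patch are left untouched. Nested objects are merged
// recursively, which is an assumption rather than something the issue specifies.
func patchData(existing, patch map[string]interface{}) map[string]interface{} {
	merged := make(map[string]interface{}, len(existing)+len(patch))
	for key, value := range existing {
		merged[key] = value
	}
	for key, value := range patch {
		patchObj, patchIsObj := value.(map[string]interface{})
		existingObj, existingIsObj := merged[key].(map[string]interface{})
		if patchIsObj && existingIsObj {
			merged[key] = patchData(existingObj, patchObj)
			continue
		}
		merged[key] = value // scalars, arrays, and new keys simply overwrite
	}
	return merged
}

func main() {
	stored := map[string]interface{}{"fieldFromSource2": "other-value"}
	patch := map[string]interface{}{"fieldFromSource1": "some-value"}
	fmt.Println(patchData(stored, patch))
}
```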
Is your feature request related to a problem? Please describe.
We want to introduce user aware information in columbus.
Describe the solution you'd like
As a starting point, we need to store user information. In Columbus, we already have DB storage. We could store user information in the DB and introduce our own internal user id instead of using the user id from an external system. To support integration with external systems, we store a one-to-one mapping between the internal user id and the external user id by creating a new User table as an adapter. We can use email as the external unique identifier. This will make Columbus less coupled to external systems. To make table lookups faster, we could create an index on the email column.
user | data type | sample value |
---|---|---|
id (PK) | UUID | 11234-4214214 |
email (UNIQUE) | STRING | [email protected] |
provider | STRING | shield |
created_at | TIMESTAMP | 12345667 |
updated_at | TIMESTAMP | 12345667 |
Is your feature request related to a problem? Please describe.
I want to star/bookmark a resource to be revisited again later and able to fetch all of my starred resources.
Describe the solution you'd like
Is your feature request related to a problem? Please describe.
I want to star/bookmark a resource to be revisited again later and able to fetch all of my starred resources.
Describe the solution you'd like
GET /v1/assets/{asset_id}/stargazers (List stargazers)
GET /v1/users/{user-id}/starred (List assets starred by some user)
Trying to compile columbus with go, we got the following errors for each proxy configuration we tried.
go version
go version go1.16.5 windows/amd64
go env GOPROXY GONOPROXY
https://proxy.golang.org,direct
cd path\to\columbus
make
go build -ldflags "-X main.Version=" "github.com/odpf/columbus"
go: github.com/PaesslerAG/jsonpath@v0.1.1: Get "https://proxy.golang.org/github.com/%21paessler%21a%21g/jsonpath/@v/v0.1.1.mod": read tcp 10.20.3.143:61982->64.233.177.141:443: wsarecv: An existing connection was forcibly closed by the remote host.
make: *** [Makefile:10: build] Error 1
go env -w GONOPROXY=https://github.com
go env GOPROXY GONOPROXY
https://proxy.golang.org,direct
https://github.com
make
go build -ldflags "-X main.Version=" "github.com/odpf/columbus"
go: github.com/PaesslerAG/jsonpath@v0.1.1: Get "https://proxy.golang.org/github.com/%21paessler%21a%21g/jsonpath/@v/v0.1.1.mod": read tcp 10.20.3.143:62019->64.233.177.141:443: wsarecv: An existing connection was forcibly closed by the remote host.
make: *** [Makefile:10: build] Error 1
go env -w GOPROXY=direct
go env GOPROXY GONOPROXY
direct
https://github.com
make
go build -ldflags "-X main.Version=" "github.com/odpf/columbus"
go: github.com/PaesslerAG/jsonpath@v0.1.1: reading github.com/PaesslerAG/jsonpath/go.mod at revision v0.1.1: unknown revision v0.1.1
make: *** [Makefile:10: build] Error 1
go env -u GONOPROXY
go env -w GOPROXY=http://my.company.proxy:port
make
go build -ldflags "-X main.Version=" "github.com/odpf/columbus"
go: github.com/PaesslerAG/jsonpath@v0.1.1: reading http://my.comany.proxy:port/github.com/%21paessler%21a%21g/jsonpath/@v/v0.1.1.mod: 403 Forbidden
make: *** [Makefile:10: build] Error 1
Is your feature request related to a problem? Please describe.
The existing elasticsearch integration test requires us to spin up docker on our local machine and set the ES_TEST_SERVER_URL config to the elasticsearch host.
If we don't set the ES_TEST_SERVER_URL config, every time we run the elasticsearch integration test it automatically creates a new elasticsearch instance, but once the test is done the container is left behind and never cleaned up.
Describe the solution you'd like
When the ES_TEST_SERVER_URL config is not set, we need to clean up the elasticsearch container after the integration test finishes.
Is your feature request related to a problem? Please describe.
Once we have versioned metadata in #45, we need some APIs to get the versioned metadata/assets.
Describe the solution you'd like
GET /v1/assets/{asset_id}/versions
GET /v1/assets/{asset_id}/versions/{version_num}
Is your feature request related to a problem? Please describe.
I want to see all activities done on a record, such as creation, update, bookmarked, and issue creation.
Describe the solution you'd like
Compass should store all of a record's activities.
Is your feature request related to a problem? Please describe.
There are intermittent timeout errors when running the GitHub Actions lint job with golangci-lint. The default timeout is 1m0s. We need to increase the timeout to give the linter extra time to finish.
Describe the solution you'd like
Add a timeout flag to the golangci-lint script:
golangci-lint run --timeout 5m
Describe the bug
badge.svg not found in the Readme file
To Reproduce
In https://github.com/odpf/columbus, the badge.svg image is broken
Expected behavior
In https://github.com/odpf/columbus, the badge.svg image is shown
Is your feature request related to a problem? Please describe.
Comments are needed in the discussion feature described in #47.
Describe the solution you'd like
{
"id": 1,
"body": "body here",
"owner": {
"id" : "1234-5678",
"Email": "[email protected]"
}, // User
"created_at" : timestamp,
"updated_at": timestamp
}
Column Name | Type | Example |
---|---|---|
id (PK) | serial | 1 |
discussion_id (FK) | serial | 1 |
body | text | This body could be written in markdown format |
owner | uuid | 1234-5678-9123 |
created_at | timestamp | |
updated_at | timestamp | |
Is your feature request related to a problem? Please describe.
Ingesting a new asset is currently possible with 2 APIs: PUT /v1beta1/assets and PATCH /v1beta1/assets. The difference is that the PATCH api supports per-field patching while the PUT api overrides the existing data. Currently we don't really need the PUT behaviour. We could remove it to avoid maintenance overhead.
Describe the solution you'd like
Remove the PUT /v1beta1/assets API
Additional context
We also need to remove the API definition in proton too. Right now it is fine to remove it in proton since we are still in the development phase.
Is your feature request related to a problem? Please describe.
Currently, a Type is a resource in Columbus. This means that types are dynamic: users can create any type they want and put records under it. We want to enforce Columbus's own types (Table, Dashboard, Topic, Job) since most metadata falls into those categories.
Describe the solution you'd like
We need to change type from a resource to an enum/hardcoded type instead.
Detailed Tasks
Is your feature request related to a problem? Please describe.
Record, resource, and asset all refer to the same thing in Columbus, and the mixed terms are sometimes confusing. We need to make this consistent and make sure every piece of metadata in Columbus is called an asset.
Describe the solution you'd like
Update all record/resource terms to asset(s)
Tasks
GET /v1/types/{type_name}/records to GET /v1/assets?type={type_name}
PUT /v1/types/{type_name}/records to PUT /v1/assets with body {"type": type_name, "metadata": records}
Is your feature request related to a problem? Please describe.
We currently have APIs to get resources belonging to the caller (user): /v1beta1/user/starring and /v1beta1/user/discussions.
But in guardian the API is v1beta1/me. I think we can make this consistent across the org by updating the columbus API from /user to /me.
Describe the solution you'd like
Update the columbus API from /user to /me
Is your feature request related to a problem? Please describe.
Columbus does not have its own dedicated register/login flow to manage users. Regardless, we need to store information to develop user awareness features.
Describe the solution you'd like
There is one option for the entry point where we store the external user id (email) and generate the internal user id for the first time. For all APIs, we could accept an identity header (configurable via config yaml, e.g. Columbus-User-Email) whose value is the external user identity (e.g. email), and create the user if it does not exist in our DB.
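The identity-header flow could be sketched as an HTTP middleware in Go, as below. Everything beyond the header idea is an assumption for illustration: the in-memory userStore stands in for the User table, the random hex id stands in for a UUID, and the names withUser, getOrCreate, and userIDKey are hypothetical.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
)

// identityHeader would be read from config yaml in practice.
const identityHeader = "Columbus-User-Email"

type contextKey string

const userIDKey contextKey = "user_id"

// userStore is an in-memory stand-in for the User table (email -> internal id).
type userStore struct {
	mu        sync.Mutex
	idByEmail map[string]string
}

func newUserStore() *userStore {
	return &userStore{idByEmail: make(map[string]string)}
}

// getOrCreate returns the internal id for email, generating one on first sight.
func (s *userStore) getOrCreate(email string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if id, ok := s.idByEmail[email]; ok {
		return id, nil
	}
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id := hex.EncodeToString(buf)
	s.idByEmail[email] = id
	return id, nil
}

// withUser resolves the identity header to an internal user id and stores it
// in the request context for downstream handlers.
func withUser(store *userStore, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		email := r.Header.Get(identityHeader)
		if email == "" {
			http.Error(w, "missing identity header", http.StatusBadRequest)
			return
		}
		id, err := store.getOrCreate(email)
		if err != nil {
			http.Error(w, "cannot resolve user", http.StatusInternalServerError)
			return
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), userIDKey, id)))
	})
}

func main() {
	handler := withUser(newUserStore(), http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "internal user id: %v", r.Context().Value(userIDKey))
	}))
	req := httptest.NewRequest(http.MethodGet, "/v1beta1/assets", nil)
	req.Header.Set(identityHeader, "user@example.com")
	rec := httptest.NewRecorder()
	handler.ServeHTTP(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Keeping the lookup in one middleware means every API gets the same create-on-first-request behaviour without each handler touching the User table.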
Tasks
Is your feature request related to a problem? Please describe.
Integrate coveralls to track and manage code coverage
Describe the solution you'd like
Integrate coveralls to track coverage and add coverage badge in README.md
Is your feature request related to a problem? Please describe.
The existing implementation of the tag-template feature in columbus still uses the old approach (record), with record type and record urn as the main resource identity. But now we call a resource in columbus an asset, and it has a single identity called asset id. This makes the tag-template feature inconsistent with other features in columbus.
Describe the solution you'd like
Migrate the identity used by tag-template from record type and record urn to asset id
Additional context
Some changes that are required are
Description | From | To |
---|---|---|
Create an asset tag | POST /v1beta1/tags/ | POST /v1beta1/tags/assets |
Get, update, delete a tag of an asset | GET/PUT/DELETE /v1beta1/tags/types/{type}/records/{record_urn}/templates/{template_urn} | GET/PUT/DELETE /v1beta1/tags/assets/{asset_id}/templates/{template_urn} |
Get all tags of an asset | GET /v1beta1/tags/types/{type}/records/{record_urn} | GET /v1beta1/tags/assets/{asset_id} |
PR in proton is here
record_urn and record_type
asset_id with text type
tags_idx_record_urn_record_type_field_id to tags_idx_asset_id_field_id
Describe the bug
Elasticsearch silently drops ingested metadata when there is a data type conflict in the properties.attributes field. Columbus returns 200 but the metadata is not ingested.
To Reproduce
Steps to reproduce the behavior:
dashboard
dashboard index with properties.attributes.id field as an integer, e.g. 1234
dashboard index with properties.attributes.id field as a string, e.g. "df431-54abf42-xxxx"
Expected behavior
Ingesting different kinds of metadata that have the same type should always succeed.
Additional context
This happened when we tried to ingest metabase and tableau dashboard data. We ingested the metabase metadata first, then the tableau data. The tableau data was not ingested into elasticsearch because its id is a string while the metabase id is a long.
Is your feature request related to a problem? Please describe.
An API keyed by user name, GET /users/{username}/stargazers, would be the more proper approach. But username is still a work in progress, so for temporary usage we could accept email as the identity in the API.
Describe the solution you'd like
Is your feature request related to a problem? Please describe.
I want to create issues on a resource so that other users can see it.
Describe the solution you'd like
Is your feature request related to a problem? Please describe.
We need to update all Columbus APIs from v1 to v1beta1 for better versioning and consistency with the assets version in proton.
Describe the solution you'd like
Update all columbus APIs from v1 to v1beta1
Is your feature request related to a problem? Please describe.
Currently meteor still uses manually generated mocks, and creating a new mock takes effort that could easily be avoided with auto-generated mocks.
Describe the solution you'd like
Use testify/mockery to auto generate mocks.
Is your feature request related to a problem? Please describe.
We have decided to rename columbus to compass. There are several changes that we need to make.
Describe the solution you'd like
overview.svg image
build_dev.yml
.gitignore
.goreleaser.yml
Dockerfile & Dockerfile.dev
README.md
Makefile
config.yaml.example
docker-compose.yaml
add Service field to RecordV2 to store metadata source e.g. bigquery, kafka, etc
Is your feature request related to a problem? Please describe.
I have an integer field in my resources called total_usage. I want resources with a higher total_usage value to get a higher score so they show up at the top of the page when searching via /v1/search.
Describe the solution you'd like
Allow boosting the score using fields on the resource in the Search API (/v1/search).
Is your feature request related to a problem? Please describe.
I need the ability to tag a resource to give more context to a resource. e.g. tagging a resource with a deprecated/sensitive tag.
Describe the solution you'd like
https://github.com/odpf/dexter has a good tagging feature. It would be nice if Columbus had a similar feature.
Is your feature request related to a problem? Please describe.
I have a record that contains a schema, and I would like to keep track of its changes. This would allow me to see how my schema changes over time.
Describe the solution you'd like
I want Columbus to version records whenever they change, and to allow users to fetch and see all the previous versions.
Is your feature request related to a problem? Please describe.
Right now the discovery context is tightly coupled with the Record model. This could get complex fast if we want to add more features to a Record later. The next one coming is the tagging feature from Dexter.
Describe the solution you'd like
Move discovery to its own package that depends on the Record package. This would keep Record clean and decoupled from the discovery context.
Is your feature request related to a problem? Please describe.
Right now booting takes a lot of memory and time because Columbus fetches all assets from Discovery data, builds the lineage graph from it, then stores the graph in memory.
Describe the solution you'd like
Use a proper lineage storage to avoid building and storing lineage in memory. It is possible to use Neo4j or even postgres for this.
Is your feature request related to a problem? Please describe.
The comments feature in discussions is currently just a simple list of text related to the respective discussion. We can add more features to comments to give better user interaction and experience.
Describe the solution you'd like
Add more features to the comments
q&a type,
Is your feature request related to a problem? Please describe.
Currently we are using GORM for:
It works fine at the moment, but right now we depend on GORM auto migrate for db migration. If we want to change our postgres client in the future, it would be harder to switch, especially if we have lots of tables to migrate.
Describe the solution you'd like
For Database Migration, we can use https://github.com/golang-migrate/migrate
For database access, we can use the simple https://github.com/jmoiron/sqlx (a lightweight extension of database/sql rather than a full ORM) with https://github.com/jackc/pgx as its postgres driver
Is your feature request related to a problem? Please describe.
I want to star/bookmark a resource to be revisited again later and able to fetch all of my starred resources.
Describe the solution you'd like
starring | type | Sample Value |
---|---|---|
id (PK) | SERIAL | 1 |
user_id (FK) | UUID | 11234-4214214 |
asset_id (FK) | UUID | 11234-4214214 |
created_at | TIMESTAMP | 12345667 |
updated_at | TIMESTAMP | 12345667 |
Is your feature request related to a problem? Please describe.
I want to be able to search based on users' past behaviour (table usage). The more a table is used, the more relevant it should be in the search results.
Describe the solution you'd like
Counting table usage toward relevancy only makes sense if we are searching in the table type context. Therefore, this feature only works on the table index/resource and wouldn't apply in the universal/global context (where we search in all indices).
Basically, there are 2 possible implementation details to consider table usage in the relevancy parameter: implicit or explicit.
Implicit
Every time users search within the table index, we always consider table usage as part of relevancy (e.g. always count it as a boosting value)
Explicit
We could give users control over whether table usage counts toward relevancy each time they search within the table index, by adding a query param to the Search API. The new query param could be called sortby or rankby, with the value usage
Proposed Decision
For now, we could go with the explicit option and give users control over whether table usage counts toward relevancy.
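The explicit option could be sketched in Go as below. This is a deliberately simplified illustration under assumptions: the result type and rank function are hypothetical, and ordering by raw usage with the text score as a tiebreaker is a stand-in for whatever boosting the search engine would actually apply.

```go
package main

import (
	"fmt"
	"sort"
)

// result is a hypothetical search hit carrying a text relevancy score and
// a table usage counter.
type result struct {
	URN   string
	Score float64
	Usage int
}

// rank returns a sorted copy of results. With rankBy == "usage", hits are
// ordered by usage first and text score second; otherwise only the text
// score is used, which keeps the default search behaviour unchanged.
func rank(results []result, rankBy string) []result {
	ranked := append([]result(nil), results...)
	sort.SliceStable(ranked, func(i, j int) bool {
		if rankBy == "usage" && ranked[i].Usage != ranked[j].Usage {
			return ranked[i].Usage > ranked[j].Usage
		}
		return ranked[i].Score > ranked[j].Score
	})
	return ranked
}

func main() {
	hits := []result{
		{URN: "table-a", Score: 2.0, Usage: 1},
		{URN: "table-b", Score: 1.0, Usage: 10},
	}
	fmt.Println(rank(hits, ""))      // default relevancy
	fmt.Println(rank(hits, "usage")) // corresponds to ?rankby=usage
}
```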
Is your feature request related to a problem? Please describe.
Columbus v1beta1 assets API should support nesting query params in data and query filters.
Describe the solution you'd like
Columbus should support nested queries in both Data and Query filtering:
For Data Filter : data[entity.properties.landscape]=internal
For Query Filter : q=internal&q_fields=data.entity.properties.landscape
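To illustrate the data filter syntax, here is a minimal Go sketch that extracts data[...] filters from a query string. The function name parseDataFilters is hypothetical, and how the resulting field paths are applied to stored assets is left out.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseDataFilters collects query params of the form data[<field path>]=<value>
// into a map keyed by the dotted field path inside the asset's data field.
// Other params such as q and q_fields are ignored here.
func parseDataFilters(query url.Values) map[string][]string {
	filters := make(map[string][]string)
	for key, values := range query {
		if !strings.HasPrefix(key, "data[") || !strings.HasSuffix(key, "]") {
			continue
		}
		field := strings.TrimSuffix(strings.TrimPrefix(key, "data["), "]")
		if field == "" {
			continue // "data[]" carries no field path
		}
		filters[field] = append(filters[field], values...)
	}
	return filters
}

func main() {
	raw := "data[entity.properties.landscape]=internal&q=internal&q_fields=data.entity.properties.landscape"
	query, err := url.ParseQuery(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(parseDataFilters(query))
}
```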
Out of scope
Filtering array data in the data field, e.g.
"data": {
"key1": ["value1", "value2", "value3"]
}
Describe the bug
odpf.gitbook.io/compass/ does not show the correct compass docs but redirects the request to https://odpf.gitbook.io/raccoon/compass instead.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The link https://odpf.gitbook.io/compass/ should show the correct gitbook compass docs
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Need @ravisuhag help to check this
Is your feature request related to a problem? Please describe.
I want columbus to support auto suggestions (search as you type)
Describe the solution you'd like
Here we separate the problem into two parts: API Changes and Search Logic.
Define a new API
/suggestion/?text=<text>
that returns a list of suggested data with fields: urn, name, type, and service.
Use the existing search API with a new boolean query param called suggestion
/search/?text=<text>&suggestion=true
Columbus will return a list of suggested data with fields urn, name, type, and service if suggestion=true; otherwise it will return the search results as they are.
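The second API variant could be sketched in Go as below: when suggestion=true, full search results are projected down to the four suggestion fields. The types searchResult and suggestion and the helper names are assumptions for illustration, and how the suggestions themselves are computed is not covered.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// searchResult is a hypothetical full search hit.
type searchResult struct {
	URN         string `json:"urn"`
	Name        string `json:"name"`
	Type        string `json:"type"`
	Service     string `json:"service"`
	Description string `json:"description"`
}

// suggestion carries only the fields listed in the issue.
type suggestion struct {
	URN     string `json:"urn"`
	Name    string `json:"name"`
	Type    string `json:"type"`
	Service string `json:"service"`
}

// wantsSuggestions reports whether the boolean suggestion param is set to true.
func wantsSuggestions(query url.Values) bool {
	enabled, err := strconv.ParseBool(query.Get("suggestion"))
	return err == nil && enabled
}

// toSuggestions projects full results down to the suggestion fields.
func toSuggestions(results []searchResult) []suggestion {
	out := make([]suggestion, 0, len(results))
	for _, r := range results {
		out = append(out, suggestion{URN: r.URN, Name: r.Name, Type: r.Type, Service: r.Service})
	}
	return out
}

func main() {
	query, err := url.ParseQuery("text=record&suggestion=true")
	if err != nil {
		panic(err)
	}
	results := []searchResult{{URN: "a-urn", Name: "a record", Type: "table", Service: "bigquery", Description: "a description"}}
	var payload interface{} = results
	if wantsSuggestions(query) {
		payload = toSuggestions(results)
	}
	body, err := json.Marshal(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```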
The search logic for suggestions will affect how relevant our suggestions are. We are flexible about which one to pick for now, and we can optimize and iterate on it later. Reference.
Term suggester. Provides "similar" terms based on edit distance. It suggests terms based on the data in the index, and there are a lot of knobs to tune.
Phrase suggester. Very similar to the term suggester, but it takes a whole phrase into account.
Completion suggester, or search-as-you-type functionality. The first two provide "did you mean" or spellchecking functionality based on the actual terms in the index. This one should show some 5 or 10 relevant docs while the user is typing. It requires manually indexing a field of a suggestion type, on which ES later does a fast lookup.
Context suggester. A continuation of the completion suggester that adds some context about where the user is coming from (geo), or lets the engine boost some company over another, for example because they paid for it. In this case you also need to manually index additional data.
Proposed Approach
Is your feature request related to a problem? Please describe.
Currently we rely on email as the external identifier. The external identifier will be used as an API URL path param. Since email is PII, it is not appropriate to pass it in path params.
Describe the solution you'd like
Instead of an email, we can rely on more generic information like a uuid. We still collect email in columbus, but it is no longer the primary external user identifier.
Changes needed are:
uuid with type text and nullable, and create an index for uuid
uuid
Describe the bug
Whenever user hits endpoint that is not registered, Columbus returns
status = 404
content-type = "text/plain"
body = "404 page not found"
This does not align with Columbus error body payload which is
{
"reason": "some-reason"
}
To Reproduce
Steps to reproduce the behavior:
/v1/nonexisting-api
Expected behavior
Return the payload below when hitting a non-existing api
{
"reason": "Route not found"
}
Is your feature request related to a problem? Please describe.
I want to fetch all assets that I have created. Right now [GET] /v1beta1/assets has a default size value of 20. I can pass a big number for now, but I don't think that is scalable.
Describe the solution you'd like
Make [GET] /v1beta1/assets disregard the size limit if size is not given.
Is your feature request related to a problem? Please describe.
The discussions feature is already implemented in #47. We need to add a users API that lists all discussions related to a specific user (recognized by the User email header).
Describe the solution you'd like
/user/discussions