gitbase's Introduction

gitbase

gitbase is a SQL database interface to Git repositories.

This project is now part of source{d} Community Edition, which provides the simplest way to get started with a single command. Visit https://docs.sourced.tech/community-edition for more information.

It can be used to perform SQL queries about the Git history and about the Universal AST of the code itself. gitbase is being built to work on top of any number of git repositories.

gitbase implements the MySQL wire protocol, so it can be accessed using any MySQL client or library from any language.

src-d/go-mysql-server is the SQL engine implementation used by gitbase.

Status

The project is currently in the alpha stage: it still lacks performance in a number of cases, but we are working hard on a performant system able to process thousands of repositories on a single node. Stay tuned!

Examples

You can see some query examples in the gitbase documentation.

Motivation and scope

gitbase was born to ease the analysis of git repositories and their source code.

By making it MySQL-compatible, we also provide maximum compatibility with languages and existing tools.

It comes as a single self-contained binary and it can be used as a standalone service. The service is able to process local repositories and integrates with existing tools and frameworks to simplify source code analysis on a large scale. The integration with Apache Spark is planned and is currently under active development.

Further reading

From here, you can go directly to getting started.

License

Apache License Version 2.0, see LICENSE

gitbase's People

Contributors

agarciamontoro, ajnavarro, ash-shaun, bake, bzz, campoy, carlosms, dczajkowski, dennybiasiolli, dpordomingo, eiso, erizocosmico, ferhatelmas, geekysrm, jfontan, kenshaw, kuba--, lerentis, lwsanty, mcarmonaa, mcuadros, meyskens, nomeyer, prog1dev, quasilyte, smola, sumbach, tsolakoua, vmarkovtsev

gitbase's Issues

git: tree_entries table crashes

➜  gitql git:(f028104) ✗ gitql query 'SELECT size, name FROM blobs, tree_entries WHERE hash = entry_hash LIMIT 5'
SELECT size, name FROM blobs, tree_entries WHERE hash = entry_hash LIMIT 5
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x4976d0]

goroutine 1 [running]:
panic(0x832420, 0xc420014130)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/gitql/gitql/git.(*treeEntryIter).Next(0xc4202a67e0, 0xac7a80, 0xc4202a6bc0, 0x0, 0x0)
	/home/smola/dev/go/src/github.com/gitql/gitql/git/tree_entries.go:79 +0x50
github.com/gitql/gitql/sql/plan.(*crossJoinIterator).fillRows(0xc420342be0, 0xc42001ee60, 0x18)
	/home/smola/dev/go/src/github.com/gitql/gitql/sql/plan/cross_join.go:106 +0x38
github.com/gitql/gitql/sql/plan.(*crossJoinIterator).Next(0xc420342be0, 0x20, 0x7ffb5f379000, 0xc4202a68a0, 0x4)
	/home/smola/dev/go/src/github.com/gitql/gitql/sql/plan/cross_join.go:75 +0x365
github.com/gitql/gitql/sql/plan.(*filterIter).Next(0xc4202a6800, 0xc4201f24d0, 0x1, 0x1, 0x2)
	/home/smola/dev/go/src/github.com/gitql/gitql/sql/plan/filter.go:55 +0x38
github.com/gitql/gitql/sql/plan.(*limitIter).Next(0xc4202a6820, 0x4, 0x2, 0x1, 0x8b4759)
	/home/smola/dev/go/src/github.com/gitql/gitql/sql/plan/limit.go:61 +0x4c
github.com/gitql/gitql/sql/plan.(*iter).Next(0xc4202a6840, 0xc4202a6880, 0x2, 0x2, 0x2)
	/home/smola/dev/go/src/github.com/gitql/gitql/sql/plan/project.go:77 +0x38
main.(*CmdQuery).printQuery(0xc420011e00, 0xc4202717c0, 0x2, 0x2, 0xac6180, 0xc4202a6840)
	/home/smola/dev/go/src/github.com/gitql/gitql/cmd/gitql/query.go:102 +0x1fc
main.(*CmdQuery).executeQuery(0xc420011e00, 0x0, 0x0)
	/home/smola/dev/go/src/github.com/gitql/gitql/cmd/gitql/query.go:89 +0x33a
main.(*CmdQuery).Execute(0xc420011e00, 0xc42014c380, 0x0, 0x2, 0x1, 0x2)
	/home/smola/dev/go/src/github.com/gitql/gitql/cmd/gitql/query.go:39 +0x69
github.com/jessevdk/go-flags.(*Parser).ParseArgs(0xc420076780, 0xc42000c310, 0x2, 0x2, 0x2, 0x1, 0xc42001e630, 0xc4200fa780, 0xc4200fa6c8)
	/home/smola/dev/go/src/github.com/jessevdk/go-flags/parser.go:316 +0x8e6
github.com/jessevdk/go-flags.(*Parser).Parse(0xc420076780, 0x8b79fd, 0x7, 0x8c3a7d, 0x1d, 0x0)
	/home/smola/dev/go/src/github.com/jessevdk/go-flags/parser.go:186 +0x74
main.main()
	/home/smola/dev/go/src/github.com/gitql/gitql/cmd/gitql/main.go:15 +0x2e8

git: tags table is always empty

➜  gitql git:(f028104) ✗ gitql query 'SELECT * FROM tags'
SELECT * FROM tags
+------+------+--------------+-------------+-------------+---------+--------+
| HASH | NAME | TAGGER EMAIL | TAGGER NAME | TAGGER WHEN | MESSAGE | TARGET |
+------+------+--------------+-------------+-------------+---------+--------+
+------+------+--------------+-------------+-------------+---------+--------+

Make database structure table names readable

Right now the names are 2/3 characters:

type Database struct {
	name string
	cr   sql.Table
	tr   sql.Table
	rr   sql.Table
	ter  sql.Table
	br   sql.Table
	or   sql.Table
	rmr  sql.Table
}

Change the names to something that could be understood.
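
As a rough illustration of the proposal, the struct could spell out each table. The expansions below (cr as commits, tr as tags, and so on) are guesses from the abbreviations, and Table, namedTable and newDatabase are hypothetical stand-ins so the sketch compiles on its own:

```go
package main

import "fmt"

// Table is a stand-in for sql.Table so the sketch compiles on its own.
type Table interface {
	Name() string
}

// namedTable is a hypothetical minimal Table implementation.
type namedTable string

func (t namedTable) Name() string { return string(t) }

// Database spells out each table field. The expansions of the original
// abbreviations are assumptions, not confirmed by the issue.
type Database struct {
	name        string
	commits     Table // was cr
	tags        Table // was tr
	references  Table // was rr
	treeEntries Table // was ter
	blobs       Table // was br
	objects     Table // was or
	remotes     Table // was rmr
}

// Tables returns all tables keyed by their name.
func (d *Database) Tables() map[string]Table {
	all := []Table{
		d.commits, d.tags, d.references, d.treeEntries,
		d.blobs, d.objects, d.remotes,
	}
	byName := make(map[string]Table, len(all))
	for _, t := range all {
		byName[t.Name()] = t
	}
	return byName
}

// newDatabase wires placeholder tables into a Database (hypothetical helper).
func newDatabase(name string) *Database {
	return &Database{
		name:        name,
		commits:     namedTable("commits"),
		tags:        namedTable("tags"),
		references:  namedTable("references"),
		treeEntries: namedTable("tree_entries"),
		blobs:       namedTable("blobs"),
		objects:     namedTable("objects"),
		remotes:     namedTable("remotes"),
	}
}

func main() {
	db := newDatabase("repo")
	fmt.Println(len(db.Tables()), "tables")
}
```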

Panic executing: SELECT author_name from commits order by COUNT(*);

--> Executing query: SELECT author_name from commits order by COUNT(*);

panic: interface conversion: interface is string, not int32

goroutine 1 [running]:
panic(0x9839e0, 0xc420fb87c0)
	/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/gitql/gitql/sql.(*integerType).Compare(0xd5d788, 0x949d20, 0xc4201ef910, 0x949d20, 0xc429cec240, 0xc429cec240)
	<autogenerated>:45 +0x82
github.com/gitql/gitql/sql/plan.(*sorter).Less(0xc42811bf20, 0x0, 0x1792, 0x0)
	/home/antonio/work/src/github.com/gitql/gitql/sql/plan/sort.go:156 +0x1e5
sort.medianOfThree(0xcf59c0, 0xc42811bf20, 0x0, 0x1792, 0x2f24)
	/usr/local/go/src/sort/sort.go:74 +0x49
sort.doPivot(0xcf59c0, 0xc42811bf20, 0x0, 0xbc97, 0x53a887, 0xc42006c400)
	/usr/local/go/src/sort/sort.go:99 +0x601
sort.quickSort(0xcf59c0, 0xc42811bf20, 0x0, 0xbc97, 0x1f)
	/usr/local/go/src/sort/sort.go:188 +0x83
sort.Sort(0xcf59c0, 0xc42811bf20)
	/usr/local/go/src/sort/sort.go:222 +0x80
github.com/gitql/gitql/sql/plan.(*sortIter).computeSortedRows(0xc4219fe540, 0x1, 0xb)
	/home/antonio/work/src/github.com/gitql/gitql/sql/plan/sort.go:126 +0x292
github.com/gitql/gitql/sql/plan.(*sortIter).Next(0xc4219fe540, 0xc425fc24d0, 0xc4201ef610, 0xc42030d9e0, 0x4a19fc, 0xc425fc2480)
	/home/antonio/work/src/github.com/gitql/gitql/sql/plan/sort.go:92 +0xd9
github.com/gitql/gitql/sql/plan.(*iter).Next(0xc42571f940, 0xc4201ef600, 0x1, 0x1, 0x0, 0x0)
	/home/antonio/work/src/github.com/gitql/gitql/sql/plan/project.go:65 +0x38
main.(*cmdQueryBase).printQuery(0xc4201d6b40, 0xc42571f960, 0x1, 0x1, 0xcf2700, 0xc42571f940, 0xa34a17, 0x6, 0x0, 0x0)
	/home/antonio/work/src/github.com/gitql/gitql/cmd/gitql/query_base.go:65 +0x222
main.(*CmdShell).Execute(0xc4201d6b40, 0xc4201a96c0, 0x0, 0x1, 0x1, 0x1)
	/home/antonio/work/src/github.com/gitql/gitql/cmd/gitql/shell.go:75 +0xa5a
github.com/jessevdk/go-flags.(*Parser).ParseArgs(0xc420022a20, 0xc42000c5f0, 0x1, 0x1, 0x4, 0x2, 0xc42001c900, 0xc4200ab080, 0xc4200aaf48)
	/home/antonio/work/src/github.com/jessevdk/go-flags/parser.go:316 +0x8e6
github.com/jessevdk/go-flags.(*Parser).Parse(0xc420022a20, 0xa36250, 0x7, 0xa44624, 0x1d, 0x0)
	/home/antonio/work/src/github.com/jessevdk/go-flags/parser.go:186 +0x74
main.main()
	/home/antonio/work/src/github.com/gitql/gitql/cmd/gitql/main.go:16 +0x38b
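
The trace shows the panic coming from an interface conversion inside integerType.Compare. As a sketch only, and not the project's actual fix, a comparison can use Go's comma-ok type assertion to report a mismatched value as an error instead of panicking. The function name compareInt32 is hypothetical:

```go
package main

import "fmt"

// compareInt32 orders two values that are expected to hold int32.
// The comma-ok form of the type assertion turns a value of the wrong
// dynamic type into an error instead of a runtime panic.
func compareInt32(a, b interface{}) (int, error) {
	av, ok := a.(int32)
	if !ok {
		return 0, fmt.Errorf("expected int32, got %T", a)
	}
	bv, ok := b.(int32)
	if !ok {
		return 0, fmt.Errorf("expected int32, got %T", b)
	}
	switch {
	case av < bv:
		return -1, nil
	case av > bv:
		return 1, nil
	default:
		return 0, nil
	}
}

func main() {
	// A string value, like author_name in the failing query, is rejected
	// with an error rather than crashing the process.
	if _, err := compareInt32("some author", int32(1)); err != nil {
		fmt.Println("error:", err)
	}
}
```

With this shape the sorter could surface the error to the user; whether the query should be rejected earlier, at analysis time, is a separate design question.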

Unexpected result executing: SELECT COUNT(*) as c FROM commits limit 10;

Executing the query without alias, the result is as expected:

!> SELECT COUNT(*) FROM commits;

--> Executing query: SELECT COUNT(*) FROM commits;

+----------+
| COUNT(*) |
+----------+
|    48279 |
+----------+

But if we add an alias:

!> SELECT COUNT(*) as c FROM commits limit 10;

--> Executing query: SELECT COUNT(*) as c FROM commits limit 10;

+------------------------------------------+
|                    C                     |
+------------------------------------------+
| 6fa4b393c01a84c9adf2e2435fba6de13227eabf |
| f6fe463165824f26efe6aaabaa352032f6f93886 |
| 2ba79c4f8ad74b87ef44dd692d46706adbb9e8d0 |
| d07b93103dc7e8bcba010541efa5b0a2394ea6a7 |
| 054bc50dde1a194bbd9a69a72004c2e18b19852f |
| deaab788af9a4f1ed8ed8193b20e3cffb1555b20 |
| 2cfc70f0de7c8902791d3b23a92d6462b9e11d72 |
| fc8af32b8ddaeddf12542cc233631b3dccf4724a |
| 27d5987585e08376915ca02ebb53cfc0a40a39f0 |
| bc01db39ce5bf4d227bc5ef6d9b95bb5f5390f5c |
+------------------------------------------+

commit_has_tree(commit_hash, tree_hash) UDF

We have ways of joining almost all tables in gitquery, except commits and trees.

  • repositories and remotes join by repo id.
  • repositories and refs join by repo id.
  • refs and commits join using history_idx udf.
  • commits and blobs join using commit_contains udf.
  • trees and blobs join using blob.hash and tree_entries.entry_hash.
  • trees and commits don't have anything to join.

We could have another UDF to perform the join between these two tables:

commit_has_tree(commit_hash, tree_hash)

For example, select commit messages with Go files:

SELECT message 
FROM commits 
  INNER JOIN tree_entries 
  ON commit_has_tree(hash, tree_hash) 
WHERE name LIKE '%.go';
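
A minimal sketch of what such a UDF might compute, written against a hypothetical in-memory model rather than real repositories. It assumes that "has tree" means the tree is either the commit's root tree or reachable from it through nested trees; the issue does not settle this, and the repo type and its fields are invented for illustration:

```go
package main

import "fmt"

// repo is a hypothetical in-memory stand-in for a repository.
type repo struct {
	commitTree map[string]string   // commit hash -> root tree hash
	subtrees   map[string][]string // tree hash -> hashes of child trees
}

// commitHasTree reports whether treeHash is the root tree of the commit
// or is reachable from that root through nested trees.
func (r *repo) commitHasTree(commitHash, treeHash string) bool {
	root, ok := r.commitTree[commitHash]
	if !ok {
		return false
	}
	seen := map[string]bool{}
	stack := []string{root}
	for len(stack) > 0 {
		h := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if h == treeHash {
			return true
		}
		if seen[h] {
			continue
		}
		seen[h] = true
		stack = append(stack, r.subtrees[h]...)
	}
	return false
}

// sampleRepo builds a tiny repository: c1 -> t-root -> t-src -> t-pkg.
func sampleRepo() *repo {
	return &repo{
		commitTree: map[string]string{"c1": "t-root"},
		subtrees: map[string][]string{
			"t-root": {"t-src"},
			"t-src":  {"t-pkg"},
		},
	}
}

func main() {
	r := sampleRepo()
	fmt.Println(r.commitHasTree("c1", "t-pkg"))
	fmt.Println(r.commitHasTree("c1", "t-other"))
}
```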

If we add this UDF, I also propose renaming commit_contains to commit_has_blob, because a commit can contain many things and the new name would be consistent with commit_has_tree.

Thoughts? /cc @mcarmonaa @jfontan @ajnavarro

sql: support NULL values

Add support for NULL values:

  • A field in a schema should have a "nullable bool" field.
  • Internal representation of types should use pointers (e.g. *string, not string).
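
The two points above can be sketched as follows. Field, validate and strPtr are hypothetical names, and which columns are nullable is made up for the example:

```go
package main

import "fmt"

// Field describes one column of a schema (hypothetical shape).
type Field struct {
	Name     string
	Nullable bool
}

// validate checks a string column value against its field definition.
// A nil pointer represents SQL NULL.
func validate(f Field, v *string) error {
	if v == nil && !f.Nullable {
		return fmt.Errorf("column %q does not accept NULL", f.Name)
	}
	return nil
}

// strPtr returns a pointer to s, which is how non-NULL values are passed.
func strPtr(s string) *string { return &s }

func main() {
	message := Field{Name: "message", Nullable: true}
	hash := Field{Name: "hash", Nullable: false}

	fmt.Println(validate(message, nil))        // NULL allowed
	fmt.Println(validate(hash, nil))           // NULL rejected
	fmt.Println(validate(hash, strPtr("abc"))) // regular value
}
```

Using pointers keeps the empty string and NULL distinguishable, at the cost of a nil check wherever a value is read.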

sql: sqlparser dependency version problem

hi,

which youtube/vitess sqlparser version should I use to build gitql?

build from source:
gitql tag v0.3.0 with youtube/vitess branch release-2.1

sql/parse/parse.go:140: tn.Name.String undefined (type sqlparser.TableIdent has no field or method String)
sql/parse/parse.go:328: undefined: sqlparser.HexVal
sql/parse/parse.go:343: v.Name.Lowered undefined (type string has no field or method Lowered)

using go get:

ZhengdeMacBook-Pro:gitql hanzheng$ go get github.com/gitql/gitql
# github.com/gitql/gitql/sql/parse
sql/parse/parse.go:320: sqlparser.StrVal (type sqlparser.ValType) is not a type
sql/parse/parse.go:321: cannot convert v (type sqlparser.ValExpr) to type string
sql/parse/parse.go:324: undefined: sqlparser.NumVal
sql/parse/parse.go:326: cannot convert v (type sqlparser.ValExpr) to type string
sql/parse/parse.go:328: sqlparser.HexVal (type sqlparser.ValType) is not a type

thanks.

Experiencing trouble in installation

Hello. I am trying to use this tool but am experiencing some trouble. When I run go get github.com/sqle/gitquery on the command line, the installation fails with the following messages:

# github.com/sqle/gitquery
opt/go/src/github.com/sqle/gitquery/commits.go:52: cannot use cIter (type object.CommitIter) as type *object.CommitIter in field value:
    *object.CommitIter is pointer to interface, not interface
opt/go/src/github.com/sqle/gitquery/commits.go:65: i.i.Next undefined (type *object.CommitIter is pointer to interface, not interface)
opt/go/src/github.com/sqle/gitquery/commits.go:73: i.i.Close undefined (type *object.CommitIter is pointer to interface, not interface)

Since I am a total newbie in golang, I don't even know whether it is a bug or not. However, is this a bug? I am running go with go1.8.3 linux/amd64 in Ubuntu 16.04.

Thanks for reading. Looking forward to your response.

implement filter and column pushdown

We should implement filter and column pushdown once they land in go-mysql-server.

I don't think we have any computationally expensive columns yet, so we shouldn't care much about column pushdown, but we should care about filter pushdown, which can greatly reduce the amount of data sent.

Build fails

go version - go version go1.8.1 linux/amd64

go get github.com/sqle/gitquery/cmd/gitquery

src/github.com/sqle/gitquery/cmd/gitquery/query_base.go:9:2: use of internal package not allowed

This kinda prevented us from showing the tool at the MSR tool session.

sql/parser: integrate SHOW and DESCRIBE in the parser

The Vitess SQL parser does not support SHOW and DESCRIBE, so we currently match them ad hoc before passing the query to the parser. They should be properly supported in the SQL parser, either by contributing the change to Vitess or, if that's not possible, by forking it.
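
For readers unfamiliar with the workaround, ad hoc matching of this kind might look roughly like the sketch below. The regular expressions, the statement type and the preparse function are assumptions made for illustration, not the project's code:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns for the two statements the parser does not handle.
var (
	showTablesRe = regexp.MustCompile(`(?i)^show\s+tables$`)
	describeRe   = regexp.MustCompile(`(?i)^describe\s+(?:table\s+)?(\w+)$`)
)

// statement is a minimal result type for the pre-parsing step.
type statement struct {
	kind  string // "show_tables", "describe", or "other"
	table string // set only for "describe"
}

// preparse recognizes SHOW TABLES and DESCRIBE before the real parser
// runs. Anything else is reported as "other" and would be handed to
// the SQL parser unchanged.
func preparse(query string) statement {
	q := strings.TrimSpace(query)
	q = strings.TrimSpace(strings.TrimSuffix(q, ";"))
	if showTablesRe.MatchString(q) {
		return statement{kind: "show_tables"}
	}
	if m := describeRe.FindStringSubmatch(q); m != nil {
		return statement{kind: "describe", table: m[1]}
	}
	return statement{kind: "other"}
}

func main() {
	for _, q := range []string{"SHOW TABLES;", "describe table commits", "SELECT 1"} {
		fmt.Printf("%q -> %+v\n", q, preparse(q))
	}
}
```

The drawback the issue points at is visible here: every new statement form needs another hand-written pattern, which a grammar-level change would avoid.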

Implement commit_contains(commit_hash, commit_blob)bool UDF

Depends on src-d/go-mysql-server#1

Previous comments:

This is missing the repo_id parameter, right?

After a talk, we decided not to add repo_id. The performance of these UDFs will be improved using indexes. At the beginning they will be really slow.

So, if the repo_id is missing and the only things the UDF has are commit_hash and commit_blob, how are we supposed to retrieve that info?

The Repository Pool does not have all repositories opened, right? So you can't just iterate them all until you find a match. The UDF should receive something with the repo associated with the given row or something along those lines. Otherwise, where is the UDF supposed to look?

A given commit hash either always contains the specified blob or never does. In the future, we will have a bitmap index to be able to answer this kind of question. Right now, the only way we have to do it is to iterate over all the repositories.

Also, if the commit is repeated in several repositories, it will appear n times in the result.

Also, you don't need to have all the repositories opened: you can iterate over them, send the commits for each repository, and filter out the ones that do not match.

So, for each row that uses that UDF we have to iterate all repositories again?

Right now, yes. In the future it will be a simple query to an index. Also the UDF can be improved to be executed at the table iterator level, like another column. Doing this, you don't need to iterate over all the repositories per each column again.
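
The trade-off discussed in this thread can be sketched with a hypothetical in-memory model. containsScan walks every repository on each call, while buildIndex visits them once so that later lookups are a single map access. A plain Go map stands in for the bitmap index mentioned above, and all type and function names are invented for the example:

```go
package main

import "fmt"

// repository is a hypothetical in-memory model: commit hash -> blob hashes.
type repository map[string][]string

// containsScan walks every repository on each call, which is what the
// thread describes for the first version of the UDF.
func containsScan(repos []repository, commit, blob string) bool {
	for _, r := range repos {
		for _, b := range r[commit] {
			if b == blob {
				return true
			}
		}
	}
	return false
}

// pair is the index key: one (commit, blob) combination.
type pair struct{ commit, blob string }

// buildIndex visits every repository once and records each
// (commit, blob) pair it finds.
func buildIndex(repos []repository) map[pair]bool {
	idx := make(map[pair]bool)
	for _, r := range repos {
		for commit, blobs := range r {
			for _, b := range blobs {
				idx[pair{commit, b}] = true
			}
		}
	}
	return idx
}

// sampleRepos returns two tiny made-up repositories.
func sampleRepos() []repository {
	return []repository{
		{"c1": {"b1", "b2"}},
		{"c2": {"b3"}},
	}
}

func main() {
	repos := sampleRepos()
	idx := buildIndex(repos)
	fmt.Println(containsScan(repos, "c1", "b2"), idx[pair{"c1", "b2"}])
	fmt.Println(containsScan(repos, "c1", "b3"), idx[pair{"c1", "b3"}])
}
```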
