sourcegraph / scip

SCIP Code Intelligence Protocol

License: Apache License 2.0

Go 60.07% Rust 8.59% JavaScript 29.49% Python 0.15% Shell 1.70%

scip's Introduction

Sourcegraph makes it easy to read, write, and fix code—even in big, complex codebases.

Code search: Search all of your repositories across all branches and all code hosts.
Code intelligence: Navigate code, find references, see code owners, trace history, and more.
Fix and refactor: Roll out large-scale changes to many repositories at once and track big migrations.

Getting started

Development

Refer to the Developing Sourcegraph guide to get started.

Documentation

The doc directory has additional documentation for developing and understanding Sourcegraph:

Architecture: high-level architecture
Database setup: database best practices
Go style guide
Documentation style guide
GraphQL API: useful tips when modifying the GraphQL API
Contributing

License

This repository contains primarily non-OSS-licensed files. See LICENSE.

scip's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository. View logs.

WARN: Fallback to renovate.json file as a preset is deprecated, please use a default.json file instead.

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

chore(deps): update docker/build-push-action digest to e050dfa
chore(deps): update docker/login-action digest to 0d4c9c5
chore(deps): update ubuntu docker digest to 2e863c4
chore(deps): update dependency prettier to ^3.3.3
chore(deps): update dependency python to v3.12.4
chore(deps): update docker/metadata-action action to v5
chore(deps): update docker/setup-buildx-action action to v3
chore(deps): update docker/setup-qemu-action action to v3
chore(deps): update github artifact actions to v4 (major) (actions/download-artifact, actions/upload-artifact)

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

asdf

.tool-versions

golang 1.19.10

node 16.20.2

shellcheck 0.7.1

rust 1.71.0

python 3.11.9

cargo

bindings/rust/Cargo.toml

protobuf =3.2.0

pretty_assertions 1.2.1

dockerfile

dev/Dockerfile.bindings

ubuntu sha256:19478ce7fc2ffbce89df29fea5725a8d12e57de52eb9ea570890dc5852aac1ac

github-actions

.github/workflows/build-env-docker.yml

actions/checkout v3

docker/metadata-action v4

docker/setup-qemu-action v2

docker/setup-buildx-action v2

docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1

docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4

actions/upload-artifact v3

actions/attest-build-provenance v1

docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1

actions/download-artifact v3

docker/setup-buildx-action v2

docker/metadata-action v4

.github/workflows/formatting.yml

actions/checkout v3

.github/workflows/golang.yml

actions/checkout v3

.github/workflows/haskell.yml

actions/checkout v3

haskell/actions v2

.github/workflows/labeler.yml

github/issue-labeler v3.4

.github/workflows/project-board.yml

.github/workflows/protobuf-reprolang.yml

actions/checkout v3

docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1

.github/workflows/release.yml

actions/checkout v3

actions/checkout v3

wangyoucao577/go-release-action v1.40

.github/workflows/rust.yml

actions/checkout v3

.github/workflows/typescript.yml

actions/checkout v3

gomod

go.mod

npm

bindings/typescript/package.json

google-protobuf ^3.20.1

@types/google-protobuf 3.15.6

protoc-gen-ts 0.8.6

typescript ^4.9.0

package.json

prettier ^3.3.1

Check this box to trigger a request for Renovate to run again on this repository

usage of deprecated `Descriptor_Package` throughout our bindings

We should probably go through and handle Descriptor_Namespace instead, right?

CLI improvements: proper argument parsing + subcommands

We should introduce a subcommand structure

scip lsif --from=dump.lsif-typed --to=dump.lsif
scip lsif --from=dump.lsif --to=dump.lsif-typed
scip snapshot --output-directory=foobar

We could probably use the built-in flag package.

Let's break this up into two pieces:

Adding support for the conversion subcommand. (already implemented, with a slightly different structure)
Adding support for the snapshot subcommand. This is mostly implemented, but it needs to be exposed in a cross-language way. Right now, the implementation has some problems:
- It assumes that the language has prefix-based line comments, which is not always true (e.g. OCaml doesn't have line comments.)
- ~~It assumes that indentation doesn't matter and creates a difference in indentation between the input and output. However, I think it would be better to avoid that.~~ I think this is probably too complicated. Sublime does something similar in its tests, so I guess it is fine that the resulting code is not valid. 🤷
- It assumes that the indexer uses tabwidth=1, which seems unusual.
A "tricky" situation to consider here: Haskell commonly has definitions at the start of the line. Maybe we allow a --comment-syntax=<blah> argument, or [--line-comment-syntax=<blah> | --multiline-comment-syntax=<blah>]. A special string (say <content>) in the syntax will be replaced with the snapshot text. For multiline comments (either marked explicitly, or implicitly interpreted from --comment-syntax), we can coalesce them for readability. For Haskell's case, snapshots would end up looking like:
```
addOne :: Int -> Int
{-
^^^^^^ definition addOne
          ^^^ reference Int
              ^^ reference (->)
                 ^^^ reference Int
-}
```
(Ignore the above, on thinking about it a bit more, I think it would be way more complicated to implement than it is worth.)

Originally posted by @olafurpg in sourcegraph/sourcegraph#34983 (comment)

snapshot: use correct language-specific comment token for snapshot output

Noticed when trying out scip-ruby that the snapshot output uses // regardless of language, which is the wrong comment token for Ruby. This used to be a thing in the old snapshot code in scip-java I believe, so it'd be cool to reintroduce it here.
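
A minimal sketch of the idea, written in Python for brevity rather than in the Go used by the CLI. The lookup table, the function names, and the fallback to `//` are all hypothetical and are not taken from the scip codebase; the sketch only shows a per-language lookup of the line-comment token.

```python
# Hypothetical mapping from a language name to its line-comment token.
# Illustrative only; this table does not come from the scip codebase.
LINE_COMMENT_TOKENS = {
    "ruby": "#",
    "python": "#",
    "haskell": "--",
    "go": "//",
    "typescript": "//",
}


def comment_token(language: str, default: str = "//") -> str:
    """Return the line-comment token for a language, falling back to `default`."""
    return LINE_COMMENT_TOKENS.get(language.lower(), default)


def annotate(language: str, column: int, length: int, text: str) -> str:
    """Render one snapshot annotation line.

    The caret run starts at the 0-based `column` whenever the comment token
    fits in front of it; otherwise it starts right after the token.
    """
    token = comment_token(language)
    padding = " " * max(column - len(token), 0)
    return f"{token}{padding}{'^' * length} {text}"
```

With this sketch, `annotate("ruby", 4, 3, "reference Foo")` yields a line that starts with `#` instead of `//`. Languages without line comments (such as OCaml, mentioned in the CLI issue above) would still need separate handling.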

Missing `property:"definitions"` and `property:"references"` attributes on `"item"` edges

The output of lsif-node includes item edges with a "property" field

{"id":41229,"type":"edge","label":"item","outV":14627,"inVs":[39643],"document":38205,"property":"references"}

The output of scip convert does not include the "property" field.
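
For illustration, here is a small Python sketch of an `item` edge builder that carries the optional "property" field. The function name and argument names are hypothetical; the field names and the example values come from the lsif-node line quoted above, and only the two property values named in this issue's title are accepted.

```python
import json

# Property values named in this issue; LSIF may define others.
ALLOWED_PROPERTIES = ("definitions", "references")


def item_edge(edge_id, out_v, in_vs, document, prop=None):
    """Build an LSIF `item` edge as a dict, optionally with a `property` field."""
    edge = {
        "id": edge_id,
        "type": "edge",
        "label": "item",
        "outV": out_v,
        "inVs": list(in_vs),
        "document": document,
    }
    if prop is not None:
        if prop not in ALLOWED_PROPERTIES:
            raise ValueError(f"unsupported item property: {prop!r}")
        edge["property"] = prop
    return edge


def to_json_line(edge) -> str:
    """Serialize an edge in the compact one-line form used by LSIF dumps."""
    return json.dumps(edge, separators=(",", ":"))
```

Calling `to_json_line(item_edge(41229, 14627, [39643], 38205, "references"))` reproduces the lsif-node line shown above; omitting the last argument gives the shape that `scip convert` currently emits according to this report.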

Add new `SymbolRole.MainProgram` to indicate an executable entrypoint

It would be nice if there was a way for indexers to indicate locations with main entrypoints. This would make it possible to build a UI where users can easily discover main entrypoints in a given codebase.

CI: TypeScript typecheck job is not catching type errors

I was updating scip-typescript to use the latest copy of scip.ts and found a regression where this file no longer typechecks successfully. I was able to fix the problem by downgrading our protoc generator version in #112.

This issue is a follow-up to identify why the typecheck CI job didn't fail when we bumped the version of the protoc generator.

Ignore `Document.symbols` that have no definition inside that document

Currently, the emitted LSIF has weird behavior if you add Document.symbols that have no definition occurrences in that document. We should ignore these symbols, and optionally print a warning to notify that these symbols are ignored. The warning message could be formatted like "did you intend to put this symbol into Index.external_symbols?"
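
The filtering described above could look roughly like the following Python sketch. The dataclasses are simplified stand-ins for the protobuf messages (here `Document.symbols` holds plain symbol strings rather than `SymbolInformation` messages), and `split_symbols` is a hypothetical helper, not an existing function in the Go bindings. The only detail taken from the schema is that `SymbolRole.Definition` is the `0x1` bit of `symbol_roles`.

```python
from dataclasses import dataclass, field

DEFINITION = 0x1  # bit value of SymbolRole.Definition in scip.proto


@dataclass
class Occurrence:
    symbol: str
    symbol_roles: int = 0


@dataclass
class Document:
    relative_path: str
    occurrences: list = field(default_factory=list)
    symbols: list = field(default_factory=list)  # symbol strings, simplified


def split_symbols(doc: Document):
    """Return (kept_symbols, warnings) for one document.

    A symbol is kept only if some occurrence in the same document has the
    Definition role set for it; every other symbol produces a warning.
    """
    defined = {o.symbol for o in doc.occurrences if o.symbol_roles & DEFINITION}
    kept = [s for s in doc.symbols if s in defined]
    warnings = [
        f"{doc.relative_path}: ignoring symbol {s!r} with no definition "
        "occurrence; did you intend to put this symbol into "
        "Index.external_symbols?"
        for s in doc.symbols
        if s not in defined
    ]
    return kept, warnings
```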

Missing `SyntaxKind`s

PunctuationComma?
LinkLiteral? (for markdown links)
CodeLiteral? (for inline markdown literals)

Let's discuss here and I'll update this and then add the corresponding syntax kinds as needed.

Doc comments for generated TypeScript bindings

I noticed this weirdness after updating the doc comments and not noticing any change in the generated TypeScript bindings. 96b8a17

scip/bindings/typescript/scip.ts

Lines 26 to 64 in e3f61b1

export enum SyntaxKind {
  UnspecifiedSyntaxKind = 0,
  Comment = 1,
  PunctuationDelimiter = 2,
  PunctuationBracket = 3,
  IdentifierKeyword = 4,
  IdentifierOperator = 5,
  Identifier = 6,
  IdentifierBuiltin = 7,
  IdentifierNull = 8,
  IdentifierConstant = 9,
  IdentifierMutableGlobal = 10,
  IdentifierParameter = 11,
  IdentifierLocal = 12,
  IdentifierShadowed = 13,
  IdentifierModule = 14,
  IdentifierFunction = 15,
  IdentifierFunctionDefinition = 16,
  IdentifierMacro = 17,
  IdentifierMacroDefinition = 18,
  IdentifierType = 19,
  IdentifierBuiltinType = 20,
  IdentifierAttribute = 21,
  RegexEscape = 22,
  RegexRepeated = 23,
  RegexWildcard = 24,
  RegexDelimiter = 25,
  RegexJoin = 26,
  StringLiteral = 27,
  StringLiteralEscape = 28,
  StringLiteralSpecial = 29,
  StringLiteralKey = 30,
  CharacterLiteral = 31,
  NumericLiteral = 32,
  BooleanLiteral = 33,
  Tag = 34,
  TagAttribute = 35,
  TagDelimiter = 36
}

I wonder if there is a setting we can change (or a different compatible protobuf generator we can use) to make sure that doc comments are preserved. It feels a little weird that this doesn't work out-of-the-box. 😕

Support a build identifier for identifying different builds of the same (code, rev, path, language) tuples

Specifically, if one has an x86_64 Linux build and an arm64 macOS build, it would be useful for code nav data to include the build identifier. Thoughts on how this build identifier would be used:

The build identifier would be a new string field in SCIP, not a part of src-cli. That way any tool inspecting an index has access to the build identifier.
The build identifier would be used to identify when an upload overrides another upload.
The build identifier would be exposed in the UI (maybe only for repos which have non-empty build identifiers?). This would allow a user to select the build in some way, and then code intel requests from the frontend to the backend would also include the build identifier (in addition to the revision etc.).

`scip stats` fails with "no such file or directory"

Repeatable with latest version.

$ scip stats --from - < /data/s/gitlab-development-kit/gdk.scip
2022/10/02 11:51:47 stat /s/gitlab-development-kit: no such file or directory

gdk.zip

Improve error message "empty identifier"

With the attached index, I get an error

scip-ruby gem TODO TODO P#a=().: empty identifier
scip-ruby gem TODO TODO P#a=().
_____________________________^
scip-ruby gem TODO TODO P#w=().: empty identifier
scip-ruby gem TODO TODO P#w=().
_____________________________^

This error is not very helpful; I skimmed the code and it seems like this comes up with empty descriptor names? I put asserts in the scip-ruby code and it doesn't seem to be creating descriptors with empty names, so not sure what's up here.

Automate generation of Haskell bindings

Update dev/proto-generate.sh + Buf files as needed.
Update asdf action and protobuf workflow: This will catch errors if the Haskell bindings are out-of-sync.
Update Development.md to mention the Haskell bindings.
Remove manual instructions from bindings/haskell/README.md

Using index.scip locally?

I want to write a tool that finds all references to a given variable using a local index.scip file. How can I parse the index.scip file?

Bug in SCIP to LSIF conversion with isDefinition translation causes upload failures

Specifically, it seems to be wrong that we are returning the DefinitionResult here: https://sourcegraph.com/github.com/sourcegraph/scip/-/blob/bindings/go/scip/convert.go?L298 -- the return value of this function is accumulated into an array called allReferenceResultIds. I'm not entirely sure what the right fix is, but the failure mode here is that we end up with an error like:

"conversion.Correlate: dump malformed on element 121539: unknown reference to 4703 (expected a range) in element 121539"

So a range is expected, but a definitionResult is being passed in that situation. Just removing that return line doesn't fix the issue; it causes further issues downstream. It seems like we may need some more logic in this function (https://sourcegraph.com/github.com/sourcegraph/scip/-/blob/bindings/go/scip/convert.go?L149), because it assigns a value to the DefinitionResult field without checking whether there is an isDefinition relationship associated with the symbol.

Add process-level testing infrastructure

As I wrote earlier in a PR, we should add some process-level testing so that we can test the CLI more thoroughly.

This would involve:

A small CLI for indexing reprolang code.
Some tests for existing subcommands where we invoke the scip CLI with some arguments and examine the output/exit code and/or certain golden files.

Automate publishing of Rust bindings to crates.io

Emit "textDocument/definition" even when `SymbolInformation` is missing

Symbol parser panics on symbol `"scip-go . . . "`

github.com/sourcegraph/scip/bindings/go/scip.(*symbolParser).current(...)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:96
github.com/sourcegraph/scip/bindings/go/scip.(*symbolParser).acceptEscapedIdentifier(0xc0000b54f8, {0x15d1ee6, 0xf}, 0x20)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:202 +0x176
github.com/sourcegraph/scip/bindings/go/scip.(*symbolParser).acceptSpaceEscapedIdentifier(...)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:189
github.com/sourcegraph/scip/bindings/go/scip.ParsePartialSymbol({0xc002deb5f0, 0xe}, 0x1)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:46 +0x2e5
github.com/sourcegraph/scip/bindings/go/scip.ParseSymbol(...)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:21
github.com/sourcegraph/scip/bindings/go/scip.(*SymbolFormatter).Format(0x40?, {0xc002deb5f0?, 0x0?})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol_formatter.go:35 +0x2e
github.com/sourcegraph/scip/bindings/go/scip/testutil.FormatSnapshot.func2({0xc002deb5f0, 0xe})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/testutil/format.go:68 +0x3d
github.com/sourcegraph/scip/bindings/go/scip/testutil.FormatSnapshot(0xc00303e1c0, 0x37?, {0x15ccadc, 0x2}, {0x15f2c50, 0x15f2c58, 0x15f2c60, 0x15f2c68, 0x15f2c70})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/testutil/format.go:103 +0xa9a
github.com/sourcegraph/scip/bindings/go/scip/testutil.FormatSnapshots(0xc0000b5c08, {0x15ccadc, 0x2}, {0x15f2c50, 0x15f2c58, 0x15f2c60, 0x15f2c68, 0x15f2c70})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/testutil/format.go:28 +0x170

scip CLI: Hang with ill-formed SCIP index

See attached index.

index.scip.gz

This seems to cause a hang when invoked as ../scip/scip snapshot --from index.scip with scip CLI from 37f914d.

Add `SymbolRole.Invisible` for occurrences that are not accessible at the definition site

In scip-java, the LSIF generation currently supports a pattern that I'm not able to encode with SCIP.

case class User(name: String)

User("John").productElement(1)

Goto definition on productElement should go to class User. However, find references on class User should not show usages of productElement. By marking the definition occurrence of productElement as "invisible" it would not impact "find references" results from the occurrence itself, it's only useful for goto definition.

I'm not fully sold on the "invisible" name. One alternative that comes to mind is SymbolRole.ReferencesOnly 🤔 I'm open for suggestions

Optionally diagnose ill-formed-ness related issues with indexes

Rough sketch of what this would look like:

Take a flag describing whether diagnostics should be emitted or not (or have a separate function for this which can be optionally called)
Once the SCIP index has been ingested, perform analyses about well-formedness if the flag was passed
Print diagnostics in scip convert (not suppressed by default) and in src-cli (suppressed by default)
Optional: Add new scip diagnose subcommand. If we add this, then maybe scip convert doesn't need to warn by default.

New subcommand for human readable SCIP output

Describe the request

It'd be great to have a command that lets you quickly view the payload using the scip CLI tool, for example something like scip view. This would make inspecting the internals or troubleshooting a breeze.

I do know that you can run protoc --decode=scip.Index -I /path/to/scip scip.proto < index.scip, but that requires having protoc installed, having a local copy of scip.proto, etc. One command that does all of that for you would be amazing.

Alternatives

If someone is looking for a quick alternative, feel free to throw this in your local bin:

#!/usr/bin/env bash

check_exists() {
  for c in "$@"
  do
    if ! command -v "$c" &> /dev/null
    then
      echo "You need $c installed to use this script"
      exit 1
    fi
  done
}

check_exists "curl" "protoc"

if [ -n "${XDG_CACHE_HOME}" ]
then
  SCIP_DOWNLOAD_PATH="${XDG_CACHE_HOME}/scip/proto"
else
  SCIP_DOWNLOAD_PATH="${HOME}/.cache/scip/proto"
fi

SCIP_FILE="${SCIP_DOWNLOAD_PATH}/scip.proto"

if [[ ! -f "${SCIP_FILE}" ]]
then
  curl -sLo "${SCIP_FILE}" --create-dirs https://raw.githubusercontent.com/sourcegraph/scip/main/scip.proto
fi

protoc --decode=scip.Index -I "${SCIP_DOWNLOAD_PATH}" scip.proto < "${PWD}/index.scip"

Support inlay hints

Inlay hints add additional inline information to source code, like inferred types and parameter names, which language servers can provide. Supporting them in scip (and sourcegraph) would be great.

please add license text to the published rust crate

If I understand correctly, this is the repo for the published rust crate https://crates.io/crates/scip. Could you please add the license text to the published crate?

cli: Print help text on plain `scip` invocation

I couldn't quite figure out how to get the help text, so I've left a FIXME for now.

Proposal: add several fields to `SymbolInformation`

This issue is an umbrella for several proposed additions to SCIP, based on discussions with @donsbot on Mastodon.

`SymbolInformation.display_name`

This would be the name of the symbol, which is both helpful for local variables and avoids parsing the name from the symbol. The field could be name instead of display_name; we use display_name in SemanticDB to emphasize that this name is meant to be displayed (and should therefore not have special encoding for non-ASCII characters like emojis).

`SymbolInformation.owner`

Alternative name parent. The thinking with this field is that it avoids parsing the owner from the symbol, and it allows us to emit an owner for local symbols.

`SymbolInformation.kind`

An enum that specifies what kind of symbol this is (enum/interface/method/...). Currently, Descriptor.Suffix doesn't encode enough fine-grained information (and it's intentionally named "suffix" to emphasize that it's primarily related to the syntax of the symbol).

`SymbolInformation.signature_documentation`

A string-formatted rendering of the signature. Currently, indexers emit this information in the documentation field as markdown-formatted code blocks. Having a separate field makes it cleaner to extract only the signature. I propose we reserve the field SymbolInformation.signature for fully typed/structured signatures (not string-formatted signatures).

BazelBuildTool exits with 0 even if no files are found

It would be nice for CI purposes if the BazelBuildTool (//scip-semanticdb:bazel) failed with a non-zero exit code if no SemanticDB files were found.

FR: Move main package from cmd/ into cmd/scip/

Hello! This is a feature request to move the main package for the scip tool into cmd/scip instead of just cmd. This is a minor thing, but being able to install this tool via go install github.com/sourcegraph/scip/cmd/scip@whatever would be excellent. Doing go install github.com/sourcegraph/scip/cmd@whatever works, but it names the binary cmd which is awkward and requires renaming.

Release needed for `x86_64` target or better install script

I ran the command from the release notes

TAG="v0.2.0" \
RELEASE_URL="https://github.com/sourcegraph/scip/releases/download/$TAG" \
OS="$(uname -s | tr '[:upper:]' '[:lower:]')" \
curl -L "$RELEASE_URL/scip-$OS-$(uname -m).tar.gz" \
  | tar xzf - scip

but on my Mac $(uname -m) returns x86_64, so it tries to download scip-darwin-x86_64.tar.gz, which doesn't exist (changing it to amd64 works).
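
A better install script would normalize the machine name before building the URL. The following Python sketch illustrates that step; the helper name is hypothetical, the `x86_64` to `amd64` mapping comes from the report above, and the `aarch64` to `arm64` entry is an assumption about how other platforms report their architecture.

```python
# Map `uname -m` output to the architecture names used in release assets.
# x86_64 -> amd64 is from the report above; aarch64 -> arm64 is an assumption.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "aarch64": "arm64",
}


def release_asset_url(tag: str, system: str, machine: str) -> str:
    """Build the download URL for a release tarball.

    `system` and `machine` are the values printed by `uname -s` and
    `uname -m` (for example "Darwin" and "x86_64").
    """
    os_name = system.lower()
    arch = ARCH_ALIASES.get(machine, machine)
    base = f"https://github.com/sourcegraph/scip/releases/download/{tag}"
    return f"{base}/scip-{os_name}-{arch}.tar.gz"
```

In a real script the two `uname` values could come from `platform.system()` and `platform.machine()`.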

Set up linting with golangci-lint

Would be nice-to-have.

We can probably crib it from src-cli.

Is there a place to discuss with the devs about SCIP?

I'm consuming scip indexes and have some questions about relationships. Sorry to open an issue but I couldn't find a place like Discussions

New Syntax Kind: `IdentifierString`

Some identifiers (such as Golang imports) should look just like strings (since they are).

However, they should not be treated exactly like strings. We can add IdentifierString as a possible way to solve this problem.

cc @olafurpg

Improve panic message "no inVs"

Reproduce with attached SCIP index. This leads to a panic message which just says no inVs without any more context.

CI: Typescript CI is flaky

Failed job: https://github.com/sourcegraph/scip/actions/runs/4860856491/attempts/1
Re-run fixed it immediately.

Commit itself has nothing to do with code.

`scip --version` reports 0.1.0

There are some problems with scip command line arguments. I've installed v0.2.1.

$ ./scip --version
0.1.0
$ ./scip version
$ ./scip verwhat
$

Add a (forward) declaration role

C++ code can have forward declarations; it'd be useful to track that separately, so that we can be smarter in returning results. For example, in Find References, we can potentially separate out forward declarations, or reduce the ranking score.

I think we should add a new SymbolRole for forward declarations.

☂️ Update documentation for SCIP

There are several sub-parts to this.

Must-have before release

Announcement blog post: Olaf is working on this. sourcegraph/about#5437
scip: (this repo) Add some details to the project README. #29

Nice-to-have before release

Post-release

handbook: Add recommended spelling entry and code intel glossary entry (for brief description + link to docs).
- sourcegraph/handbook#3878
- sourcegraph/handbook#3879

Is it possible to read the indexed data without Sourcegraph?

Hi there!

Is there a way to read the indexed data without using Sourcegraph?

I'm working on a project where the server should support LSP communication from indexed data and I was going to use LSIF until I read this post: SCIP - a better code indexing format than LSIF

The thing is that for LSIF I was going to use a tool like LSIF reader to get the data from the dump, and then implement something to handle LSP communication, but I'm not sure how to do something like that using SCIP.

Is there a tool to do that?
Do I need a tool to do that?
I tried generating a dump of a javascript file using scip-typescript index --infer-tsconfig and the result wasn't human-readable

I hope it's ok to ask here, but let me know if not..

thank you :)

Tests that verify documentation is up-to-date ignore subcommands

The subcommands docs were out of date because the tests only check the output of the tool as a whole, not individual subcommands: https://github.com/sourcegraph/scip/blob/main/cmd/main_test.go#L26

We should ideally check all of it.

Possible approaches:

Extend current one
Alternatively, generate the CLI.md entirely out of the tests and use git diff to see if it deviates from what is checked in. Like we do with snapshots

Replace docopt usage with urfave/cli

It looks like https://cli.urfave.org/v2/ has a fair bit of functionality like completion etc, and Olaf mentioned that commands without = fail with cryptic errors.

$ scip convert --from index.scip --to dump.lsif

panic: interface conversion: interface {} is nil, not string

goroutine 1 [running]:
main.convertMain(0x1552c60?)
	/Users/olafurpg/dev/sourcegraph/scip/cmd/convert.go:15 +0x474

I suppose this error could be fixed by removing the = in the docopt text, but it's not intuitive that a space is not accepted, and the error message for the crash is also not helpful.

Better automation for --version

Idea:

Use a separate cli_version.txt file (or similar).
Embed the file inside the binary and use that for printing the version.
Create a small release shell script which adds both the git tag, modifies the version file and does anything else that is necessary.
Add a check in the release pipeline that, for a release binary, the compiled binary's printed --version matches the tag. This prevents a direct tag push that accidentally triggers the release pipeline from publishing a release with the wrong --version.
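
The last check in the list above could be sketched as follows, in Python for brevity. Both function names are hypothetical, and the sketch assumes that tags carry a leading `v` (as in `v0.2.1`) while `--version` prints the bare number; obtaining the printed version by running the binary is left to the caller.

```python
def version_from_tag(tag: str) -> str:
    """Strip the leading 'v' from a git tag such as 'v0.2.1'."""
    return tag[1:] if tag.startswith("v") else tag


def check_release(tag: str, printed_version: str) -> None:
    """Raise ValueError if the binary's printed version does not match the tag.

    `printed_version` is the raw stdout of `scip --version`, possibly with a
    trailing newline.
    """
    expected = version_from_tag(tag)
    actual = printed_version.strip()
    if actual != expected:
        raise ValueError(
            f"refusing to release {tag}: binary reports {actual!r}, "
            f"expected {expected!r}"
        )
```

A release pipeline would call `check_release` with the pushed tag and the output of the freshly built binary, and abort on the exception.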

Improve scip snapshot default documentation output

We should improve the default scip snapshot output to not have truncated documentation lines. Right now, it seems to generate output with a single line of documentation by default, which just ends up being:

 //  documentation ```ts

for TypeScript. This isn't useful. Instead, perhaps we should default to omitting that line if the doc comment is empty/blank, and print the full doc comment if one is present.

No document vertex found after converting SCIP into LSIF

Hi, I ran scip-java on guava and successfully got a SCIP file.
After that, I used scip convert to convert it into an LSIF file.
However, I did not find any document vertex in this LSIF file.

converted LSIF:

{"id":1,"version":"0.4.3","projectRoot":"file:///workspaces/diffctx/guava/","positionEncoding":"utf-8","toolInfo":{"name":"scip-java","version":"0.8.18"},"type":"vertex","label":"metaData"}

// no document vertex here
{"id":2,"type":"vertex","label":"definitionResult"}
{"id":3,"type":"vertex","label":"resultSet"}
{"id":4,"type":"vertex","label":"referenceResult"}
{"id":5,"type":"vertex","label":"hoverResult","result":{"contents":{"kind":"markdown","value":"```java\npublic class MonitorBenchmark\n```\n\n---\n\n Benchmarks for {@link Monitor}.\n\n @author Justin T. Sampson\n"}}}
{"id":6,"type":"edge","label":"textDocument/definition","inV":2,"outV":3}
{"id":7,"type":"edge","label":"textDocument/references","inV":4,"outV":3}
{"id":8,"type":"edge","label":"textDocument/hover","inV":5,"outV":3}
{"id":9,"type":"vertex","label":"moniker","identifier":"semanticdb maven . . com/google/common/util/concurrent/MonitorBenchmark#","kind":"export","scheme":"semanticdb"}
{"id":10,"type":"edge","label":"moniker","inV":9,"outV":3}
{"id":11,"type":"vertex","label":"definitionResult"}
{"id":12,"type":"vertex","label":"resultSet"}
{"id":13,"type":"vertex","label":"referenceResult"}

LSIF generated directly by lsif-go, for comparison:

{"id":1,"type":"vertex","label":"metaData","version":"0.4.3","projectRoot":"file:///aaaa","positionEncoding":"utf-16","toolInfo":{"name":"lsif-go","version":"1.9.3","args":["-v"]}}

// document vertex
{"id":2,"type":"vertex","label":"project","kind":"go"}
{"id":3,"type":"vertex","label":"document","uri":"file:///a.go","languageId":"go"}
{"id":4,"type":"vertex","label":"document","uri":"file:///b.go","languageId":"go"}
{"id":5,"type":"vertex","label":"document","uri":"file:///c.go","languageId":"go"}

Thanks in advance

How should we version the various packages?

There are several different packages in this repo.

scip CLI
SCIP Go bindings, which include a bunch of "hand-written" functionality in addition to the generated code for dealing with Protobufs. This is consumed as a library by src-cli and by scip CLI.
SCIP bindings for TypeScript, Rust and Haskell. Right now, these are all fully generated.
There is also the original Protobuf schema itself.

Not including reprolang since it is for internal testing.

Based on my understanding of Go modules, modules have version numbers but packages don't have a separate version. Right now, we're using 1 Go module which covers both the CLI package and the SCIP bindings.

Some questions and thoughts:

Should we start off with major version 0? Or do we want to use major version 1 now given that the schema has mostly stabilized?
I propose that the generated bindings should always have synced version numbers, and this should be in sync with the schema. Maybe we should add a variant to ProtocolVersion with a major+minor version number (IDK if a patch number is useful for it). Every time we update the schema and the bindings change, we add a new tag to the repo and have CI publish releases for the different bindings.
I'm not entirely sure if the Go bindings (especially the hand-written parts) should have the same version as the other ones, because we may want to tweak certain parts of the code for src-cli. However, splitting the Go bindings into separate packages (or even modules for versioning) would lead to more cumbersome APIs because Go doesn't have extension methods.
For the CLI, I think we should decouple its version from the protocol version, but include the supported SCIP version ranges in the help or version text somewhere. For example, if there is ever a v2 for the SCIP protocol itself, scip CLI v1.x could support both SCIP v1 and SCIP v2 without breaking changes. But I'm not super convinced about this... maybe it is needless complexity/future-proofing.

Introduce standard sharded SCIP index format

Protobuf has a size limit of 2GB per message.

A single index for Chromium is about 6GB, triggering an error in scip-clang.

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/message_lite.cc:402] scip.Index exceeded maximum protobuf size of 2GB: 6138839817

The indexer needs to shard the data ahead-of-time and emit that instead.

Proposed sharded index format:

Directory containing one or more *.shard.scip files, each of which contains a scip.Index. All shards must have the same metadata.
The default name for the outer directory will be index.scip.

src-cli can be pointed to index.scip with -file (maybe we should rename this flag?). It will be responsible for compressing/tarring and uploading the index.

The behavior should be as-if:

The metadata field of a scip.Index was populated by the metadata field in some shard.
Any other fields of scip.Index (currently documents and external_symbols) in *.shard.scip files in the archive are processed in lexicographic ordering based on shard file names.
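
The as-if behavior above can be sketched in Python. Shards are modeled as plain dictionaries instead of decoded `scip.Index` messages, and `merge_shards` is a hypothetical helper; the sketch only demonstrates the lexicographic processing order and the same-metadata requirement.

```python
def merge_shards(shards: dict) -> dict:
    """Merge shards as if they were a single index.

    `shards` maps a file name (e.g. "0001.shard.scip") to a dict with the
    keys "metadata", "documents" and "external_symbols". Files that do not
    end in ".shard.scip" are ignored.
    """
    names = sorted(name for name in shards if name.endswith(".shard.scip"))
    if not names:
        raise ValueError("no *.shard.scip files found")

    metadata = shards[names[0]]["metadata"]
    merged = {"metadata": metadata, "documents": [], "external_symbols": []}

    # Process shards in lexicographic order of their file names.
    for name in names:
        shard = shards[name]
        if shard["metadata"] != metadata:
            raise ValueError(f"{name}: metadata differs from {names[0]}")
        merged["documents"].extend(shard.get("documents", []))
        merged["external_symbols"].extend(shard.get("external_symbols", []))
    return merged
```

A real consumer would more likely stream each shard rather than build one merged value, since the point of sharding is to stay under the 2GB message limit.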

We should add documentation about this format in the README or in the scip.proto file.

This feature requires changes in:

The backend to accept the new sharded format. sourcegraph/sourcegraph#51132
The lib/codeintel/upload package in the Sourcegraph monorepo, to create the tar archive. sourcegraph/sourcegraph#51134
src-cli, to use the newer lib/codeintel/upload package and pass the right content type header
scip-clang: to emit large indexes in the sharded format
This repo: docs update.

Maybe we should also mention this feature addition in various CHANGELOGs.

Add workflow for releases

Before the release, we should have a script that generates the change log (or write one manually).

After the release tag is pushed to the main branch:

Publish the TypeScript bindings to npm. We can look at the code in scip-typescript for this.
Publish the Rust bindings to crates.io.
Generate + upload Go binaries for the CLI. We can look at the code in src-cli for this.

We will probably mess up while trying to figure this out, so we may want to include links (under Development.md) to npm and crates.io docs on how to yank a package.

Proposal: create a relationship for method return type

Right now, I don't see a sensible way to connect a method's return type to the method definition. This seems like a Relationship that could be added, such as "Returns" or "IsReturnedBy".
This would be useful for features that display a call hierarchy.

Using FlatBuffers

https://google.github.io/flatbuffers/ looks like a simpler alternative to Protocol Buffers. Would it be worth using it instead?
