
scip's Introduction



Sourcegraph makes it easy to read, write, and fix code—even in big, complex codebases.

  • Code search: Search all of your repositories across all branches and all code hosts.
  • Code intelligence: Navigate code, find references, see code owners, trace history, and more.
  • Fix and refactor: Roll out large-scale changes to many repositories at once and track big migrations.

Getting started



Development

Refer to the Developing Sourcegraph guide to get started.

Documentation

The doc directory has additional documentation for developing and understanding Sourcegraph.

License

This repository contains primarily non-OSS-licensed files. See LICENSE.

Copyright (c) 2018-present Sourcegraph Inc.

scip's People

Contributors

abitrolly, asutherland, cesrjimenez, ckipp01, davidbarsky, donsbot, efritz, est31, fannheyward, figsoda, gigaroby, jtibshirani, keynmol, kritzcreek, mrnugget, olafurpg, renovate[bot], tjdevries, varungandhi-src, wfraser, zfy0701

scip's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Repository problems

These problems occurred while renovating this repository. View logs.

  • WARN: Fallback to renovate.json file as a preset is deprecated, please use a default.json file instead.

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • chore(deps): update docker/build-push-action digest to e050dfa
  • chore(deps): update docker/login-action digest to 0d4c9c5
  • chore(deps): update ubuntu docker digest to 2e863c4
  • chore(deps): update dependency prettier to ^3.3.3
  • chore(deps): update dependency python to v3.12.4
  • chore(deps): update docker/metadata-action action to v5
  • chore(deps): update docker/setup-buildx-action action to v3
  • chore(deps): update docker/setup-qemu-action action to v3
  • chore(deps): update github artifact actions to v4 (major) (actions/download-artifact, actions/upload-artifact)

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

asdf
.tool-versions
  • golang 1.19.10
  • node 16.20.2
  • shellcheck 0.7.1
  • rust 1.71.0
  • python 3.11.9
cargo
bindings/rust/Cargo.toml
  • protobuf =3.2.0
  • pretty_assertions 1.2.1
dockerfile
dev/Dockerfile.bindings
  • ubuntu sha256:19478ce7fc2ffbce89df29fea5725a8d12e57de52eb9ea570890dc5852aac1ac
github-actions
.github/workflows/build-env-docker.yml
  • actions/checkout v3
  • docker/metadata-action v4
  • docker/setup-qemu-action v2
  • docker/setup-buildx-action v2
  • docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1
  • docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4
  • actions/upload-artifact v3
  • actions/attest-build-provenance v1
  • docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1
  • actions/download-artifact v3
  • docker/setup-buildx-action v2
  • docker/metadata-action v4
.github/workflows/formatting.yml
  • actions/checkout v3
.github/workflows/golang.yml
  • actions/checkout v3
.github/workflows/haskell.yml
  • actions/checkout v3
  • haskell/actions v2
.github/workflows/labeler.yml
  • github/issue-labeler v3.4
.github/workflows/project-board.yml
.github/workflows/protobuf-reprolang.yml
  • actions/checkout v3
  • docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1
.github/workflows/release.yml
  • actions/checkout v3
  • actions/checkout v3
  • wangyoucao577/go-release-action v1.40
.github/workflows/rust.yml
  • actions/checkout v3
.github/workflows/typescript.yml
  • actions/checkout v3
gomod
go.mod
npm
bindings/typescript/package.json
  • google-protobuf ^3.20.1
  • @types/google-protobuf 3.15.6
  • protoc-gen-ts 0.8.6
  • typescript ^4.9.0
package.json
  • prettier ^3.3.1

  • Check this box to trigger a request for Renovate to run again on this repository

CLI improvements: proper argument parsing + subcommands

We should introduce a subcommand structure

scip lsif --from=dump.lsif-typed --to=dump.lsif
scip lsif --from=dump.lsif --to=dump.lsif-typed
scip snapshot --output-directory=foobar

We could probably use the built-in flag package.

Let's break this up into two pieces:

  • Adding support for the conversion subcommand. (already implemented, with a slightly different structure)

  • Adding support for the snapshot subcommand. This is mostly implemented, but it needs to be exposed in a cross-language way. Right now, the implementation has some problems:

    • It assumes that the language has prefix-based line comments, which is not always true (e.g. OCaml doesn't have line comments.)
    • It assumes that indentation doesn't matter and creates a difference in indentation between the input and output. However, I think it would be better to avoid that. I think this is probably too complicated. Sublime does something similar in its tests, so I guess it is fine that the resulting code is not valid. 🤷
    • It assumes that the indexer uses tabwidth=1, which seems unusual.

    A "tricky" situation to consider here: Haskell commonly has definitions at the start of the line. Maybe we allow a --comment-syntax=<blah> argument, or [--line-comment-syntax=<blah> | --multiline-comment-syntax=<blah>]. A special string (say <content>) in the syntax will be replaced with the snapshot text. For multiline comments (either marked explicitly, or implicitly interpreted from --comment-syntax), we can coalesce them for readability. For Haskell's case, snapshots would end up looking like:

    addOne :: Int -> Int
    {-
    ^^^^^^ definition addOne
              ^^^ reference Int
                  ^^ reference (->)
                     ^^^ reference Int
    -}

    (Ignore the above, on thinking about it a bit more, I think it would be way more complicated to implement than it is worth.)

Originally posted by @olafurpg in sourcegraph/sourcegraph#34983 (comment)

CI: TypeScript typecheck job is not catching type errors

I was updating scip-typescript to use the latest copy of scip.ts and found a regression where this file no longer typechecks successfully. I was able to fix the problem by downgrading our protoc generator version in #112.

This issue is a follow-up to identify why the typecheck CI job didn't fail when we bumped up the version of the protoc generator.

Ignore `Document.symbols` that have no definition inside that document

Currently, the emitted LSIF has weird behavior if you add Document.symbols that have no definition occurrences in that document. We should ignore these symbols, and optionally print a warning to notify that these symbols are ignored. The warning message could be formatted like "did you intend to put this symbol into Index.external_symbols?".
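
A minimal sketch of the proposed filtering, in Python for illustration only. The dataclasses are simplified stand-ins for the generated bindings (here Document.symbols holds plain symbol strings rather than SymbolInformation messages), and the function names are hypothetical. The only value taken from scip.proto is that SymbolRole.Definition is the 0x1 bit.

```python
from dataclasses import dataclass, field

# SymbolRole.Definition is the 0x1 bit in scip.proto.
DEFINITION = 0x1


@dataclass
class Occurrence:
    symbol: str
    symbol_roles: int = 0


@dataclass
class Document:
    relative_path: str
    occurrences: list = field(default_factory=list)
    # Simplification: symbol names only, not full SymbolInformation messages.
    symbols: list = field(default_factory=list)


def split_defined_symbols(doc):
    """Return (kept, ignored): symbols with / without a definition occurrence in doc."""
    defined = {
        occ.symbol for occ in doc.occurrences if occ.symbol_roles & DEFINITION
    }
    kept = [s for s in doc.symbols if s in defined]
    ignored = [s for s in doc.symbols if s not in defined]
    return kept, ignored


def warnings_for(doc):
    """Build one warning per ignored symbol, using the wording suggested above."""
    _, ignored = split_defined_symbols(doc)
    return [
        f"{doc.relative_path}: ignoring symbol {s!r} with no definition occurrence; "
        "did you intend to put this symbol into Index.external_symbols?"
        for s in ignored
    ]
```

Returning the ignored symbols separately keeps the warning optional: a caller can drop them silently or print the messages.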

Missing `SyntaxKind`s

  • PunctuationComma?
  • LinkLiteral? (for markdown links)
  • CodeLiteral? (for inline markdown literals)

Let's discuss here and I'll update this and then add the corresponding syntax kinds as needed.

Doc comments for generated TypeScript bindings

I noticed this weirdness after updating the doc comments and not noticing any change in the generated TypeScript bindings. 96b8a17

export enum SyntaxKind {
UnspecifiedSyntaxKind = 0,
Comment = 1,
PunctuationDelimiter = 2,
PunctuationBracket = 3,
IdentifierKeyword = 4,
IdentifierOperator = 5,
Identifier = 6,
IdentifierBuiltin = 7,
IdentifierNull = 8,
IdentifierConstant = 9,
IdentifierMutableGlobal = 10,
IdentifierParameter = 11,
IdentifierLocal = 12,
IdentifierShadowed = 13,
IdentifierModule = 14,
IdentifierFunction = 15,
IdentifierFunctionDefinition = 16,
IdentifierMacro = 17,
IdentifierMacroDefinition = 18,
IdentifierType = 19,
IdentifierBuiltinType = 20,
IdentifierAttribute = 21,
RegexEscape = 22,
RegexRepeated = 23,
RegexWildcard = 24,
RegexDelimiter = 25,
RegexJoin = 26,
StringLiteral = 27,
StringLiteralEscape = 28,
StringLiteralSpecial = 29,
StringLiteralKey = 30,
CharacterLiteral = 31,
NumericLiteral = 32,
BooleanLiteral = 33,
Tag = 34,
TagAttribute = 35,
TagDelimiter = 36
}

I wonder if there is a setting we can change (or a different compatible protobuf generator we can use) to make sure that doc comments are preserved. It feels a little weird that this doesn't work out-of-the-box. 😕

Support a build identifier for identifying different builds of the same (code, rev, path, language) tuples

Specifically, if one has an x86_64 Linux build and an arm64 macOS build, it would be useful for code nav data to include the build identifier. Some thoughts on how this build identifier would be used:

  1. The build identifier would be a new string field in SCIP, not a part of src-cli. That way any tool inspecting an index has access to the build identifier.
  2. The build identifier would be used to identify when an upload overrides another upload.
  3. The build identifier would be exposed in the UI (maybe only for repos which have non-empty build identifiers?). This would allow a user to select the build in some way, and then code intel requests from the frontend to the backend would also include the build identifier (in addition to the revision etc.).

Improve error message "empty identifier"

With the attached index, I get an error

scip-ruby gem TODO TODO P#a=().: empty identifier
scip-ruby gem TODO TODO P#a=().
_____________________________^
scip-ruby gem TODO TODO P#w=().: empty identifier
scip-ruby gem TODO TODO P#w=().
_____________________________^

This error is not very helpful; I skimmed the code and it seems like this comes up with empty descriptor names? I put asserts in the scip-ruby code and it doesn't seem to be creating descriptors with empty names, so not sure what's up here.

Automate generation of Haskell bindings

  • Update dev/proto-generate.sh + Buf files as needed.
  • Update asdf action and protobuf workflow: This will catch errors if the Haskell bindings are out-of-sync.
  • Update Development.md to mention the Haskell bindings.
  • Remove manual instructions from bindings/haskell/README.md

Using index.scip locally?

I want to write a tool that finds all references to a given variable using a local index.scip file. How can I parse the index.scip file?

Bug in SCIP to LSIF conversion with isDefinition translation causes upload failures

Specifically, it seems to be wrong that we are returning the DefinitionResult here: https://sourcegraph.com/github.com/sourcegraph/scip/-/blob/bindings/go/scip/convert.go?L298 -- the return value of this function is accumulated into an array called allReferenceResultIds. I'm not entirely sure what the right fix is, but the failure mode here is that we end up with an error like:

"conversion.Correlate: dump malformed on element 121539: unknown reference to 4703 (expected a range) in element 121539"

So a range is being expected but a definitionResult is being passed in that situation. Just removing that return line doesn't seem to fix the issue; it causes further issues downstream. It seems like we need some more logic in this function (https://sourcegraph.com/github.com/sourcegraph/scip/-/blob/bindings/go/scip/convert.go?L149), because it is assigning a value for the DefinitionResult field without checking if there is an isDefinition relationship associated with the symbol.

Add process-level testing infrastructure

As I wrote earlier in a PR, we should add some process-level testing so that we can test the CLI more thoroughly.

This would involve:

  • A small CLI for indexing reprolang code.
  • Some tests for existing subcommands where we invoke the scip CLI with some arguments and examine the output/exit code and/or certain golden files.
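
As a rough illustration of the second bullet, a process-level test can be sketched as below (Python, standard library only). The helper names are hypothetical, and the usage example runs the Python interpreter as a stand-in for the scip binary so that the sketch does not depend on scip being installed.

```python
import subprocess
import sys


def run_cli(argv):
    """Run a command and return (exit_code, stdout)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout


def check_golden(argv, expected_stdout, expected_code=0):
    """Fail if the exit code or stdout differs from the golden expectation."""
    code, out = run_cli(argv)
    assert code == expected_code, f"exit code {code}, want {expected_code}"
    assert out == expected_stdout, f"stdout differs from golden output:\n{out}"


if __name__ == "__main__":
    # Stand-in command; a real test would invoke the scip CLI here instead.
    check_golden([sys.executable, "-c", "print('0.2.1')"], "0.2.1\n")
```

In a real suite the expected output would more likely be read from a golden file checked into the repository.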

Symbol parser panics on symbol `"scip-go . . . "`

github.com/sourcegraph/scip/bindings/go/scip.(*symbolParser).current(...)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:96
github.com/sourcegraph/scip/bindings/go/scip.(*symbolParser).acceptEscapedIdentifier(0xc0000b54f8, {0x15d1ee6, 0xf}, 0x20)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:202 +0x176
github.com/sourcegraph/scip/bindings/go/scip.(*symbolParser).acceptSpaceEscapedIdentifier(...)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:189
github.com/sourcegraph/scip/bindings/go/scip.ParsePartialSymbol({0xc002deb5f0, 0xe}, 0x1)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:46 +0x2e5
github.com/sourcegraph/scip/bindings/go/scip.ParseSymbol(...)
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol.go:21
github.com/sourcegraph/scip/bindings/go/scip.(*SymbolFormatter).Format(0x40?, {0xc002deb5f0?, 0x0?})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/symbol_formatter.go:35 +0x2e
github.com/sourcegraph/scip/bindings/go/scip/testutil.FormatSnapshot.func2({0xc002deb5f0, 0xe})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/testutil/format.go:68 +0x3d
github.com/sourcegraph/scip/bindings/go/scip/testutil.FormatSnapshot(0xc00303e1c0, 0x37?, {0x15ccadc, 0x2}, {0x15f2c50, 0x15f2c58, 0x15f2c60, 0x15f2c68, 0x15f2c70})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/testutil/format.go:103 +0xa9a
github.com/sourcegraph/scip/bindings/go/scip/testutil.FormatSnapshots(0xc0000b5c08, {0x15ccadc, 0x2}, {0x15f2c50, 0x15f2c58, 0x15f2c60, 0x15f2c68, 0x15f2c70})
	/Users/olafurpg/gopath/pkg/mod/github.com/sourcegraph/[email protected]/bindings/go/scip/testutil/format.go:28 +0x170

Add `SymbolRole.Invisible` for occurrences that are not accessible at the definition site

In scip-java, the LSIF generation currently supports a pattern that I'm not able to encode with SCIP.

case class User(name: String)

User("John").productElement(1)

Goto definition on productElement should go to class User. However, find references on class User should not show usages of productElement. By marking the definition occurrence of productElement as "invisible" it would not impact "find references" results from the occurrence itself, it's only useful for goto definition.

I'm not fully sold on the "invisible" name. One alternative that comes to mind is SymbolRole.ReferencesOnly 🤔 I'm open for suggestions

Optionally diagnose ill-formed-ness related issues with indexes

Rough sketch of what this would look like:

  • Take a flag describing whether diagnostics should be emitted or not (or have a separate function for this which can be optionally called)
  • Once the SCIP index has been ingested, perform analyses about well-formedness if the flag was passed
  • Print diagnostics in scip convert (not suppressed by default) and in src-cli (suppressed by default)
  • Optional: Add new scip diagnose subcommand. If we add this, then maybe scip convert doesn't need to warn by default.

New subcommand for human readable SCIP output

Describe the request

It'd be great to have a command that would allow you to view the payload quickly using the scip CLI tool. For example something like scip view. This would make quickly getting a view of the internals or troubleshooting something a breeze.

I do know that you can do protoc --decode=scip.Index -I /path/to/scip scip.proto < index.scip, but that often requires making sure you have protoc installed, making sure you have a copy of the scip.proto locally, etc. One command to do all that for you would be amazing.

Alternatives

If someone is looking for a quick alternative feel free to throw this in your local bin

#!/usr/bin/env bash

# Exit with an error if any of the given commands is missing.
check_exists() {
  for c in "$@"
  do
    if ! command -v "$c" &> /dev/null
    then
      echo "You need $c installed to use this script"
      exit 1
    fi
  done
}

check_exists "curl" "protoc"

# Cache scip.proto under XDG_CACHE_HOME, falling back to ~/.cache.
if [ -n "${XDG_CACHE_HOME}" ]
then
  SCIP_DOWNLOAD_PATH="${XDG_CACHE_HOME}/scip/proto"
else
  SCIP_DOWNLOAD_PATH="${HOME}/.cache/scip/proto"
fi

SCIP_FILE="${SCIP_DOWNLOAD_PATH}/scip.proto"

# Download the schema only if it is not cached yet.
if [[ ! -f "${SCIP_FILE}" ]]
then
  curl -sLo "${SCIP_FILE}" --create-dirs https://raw.githubusercontent.com/sourcegraph/scip/main/scip.proto
fi

# Decode index.scip from the current directory as a scip.Index message.
protoc --decode=scip.Index -I "${SCIP_DOWNLOAD_PATH}" scip.proto < "${PWD}/index.scip"

Support inlay hints

Inlay hints add additional inline information to source code, like inferred types and parameter names, which language servers can provide. Supporting them in scip (and sourcegraph) would be great.

Proposal: add several fields to `SymbolInformation`

This issue is an umbrella for several proposed additions to SCIP, based on discussions with @donsbot on Mastodon.

SymbolInformation.display_name

This would be the name of the symbol, which is both helpful for local variables and avoids parsing the name from the symbol. The field could be name instead of display_name; we use display_name in SemanticDB to emphasize that this name is meant to be displayed (and should therefore not have special encoding for non-ASCII characters like emojis).

SymbolInformation.owner

Alternative name parent. The thinking with this field is that it avoids parsing the owner from the symbol, and it allows us to emit an owner for local symbols.

SymbolInformation.kind

An enum that specifies what kind of symbol this is (enum/interface/method/...). Currently, Descriptor.Suffix doesn't encode enough fine-grained information (and it's intentionally named "suffix" to emphasize that it's primarily related to the syntax of the symbol).

SymbolInformation.signature_documentation

A string-formatted rendering of the signature. Currently, indexers emit this information in the documentation field as markdown-formatted code blocks. Having a separate field makes it cleaner to extract only the signature. I propose we reserve the field SymbolInformation.signature for fully typed/structured signatures (not string-formatted signatures).

FR: Move main package from cmd/ into cmd/scip/

Hello! This is a feature request to move the main package for the scip tool into cmd/scip instead of just cmd. This is a minor thing, but being able to install this tool via go install github.com/sourcegraph/scip/cmd/scip@whatever would be excellent. Doing go install github.com/sourcegraph/scip/cmd@whatever works, but it names the binary cmd which is awkward and requires renaming.

Release needed for `x86_64` target or better install script

I ran the note in the release

TAG="v0.2.0" \
RELEASE_URL="https://github.com/sourcegraph/scip/releases/download/$TAG" \
OS="$(uname -s | tr '[:upper:]' '[:lower:]')" \
curl -L "$RELEASE_URL/scip-$OS-$(uname -m).tar.gz" \
  | tar xzf - scip

but on my mac $(uname -m) returns x86_64 and tries to download scip-darwin-x86_64.tar.gz which doesn't exist (changing it to amd64 works).
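
A small sketch of the architecture mapping an install script would need, written in Python for illustration. The asset naming pattern and the x86_64 to amd64 mapping come from the report above; the aarch64 to arm64 entry and the function names are assumptions, and no claim is made about which assets a given release actually ships.

```python
import platform

# Map `uname -m` spellings to Go-style architecture names.
# Only x86_64 -> amd64 is confirmed by the report above; aarch64 -> arm64 is an assumption.
ARCH_ALIASES = {"x86_64": "amd64", "aarch64": "arm64"}


def release_asset_name(system, machine):
    """Build the tarball name from `uname -s` and `uname -m` style values."""
    os_name = system.lower()
    arch = ARCH_ALIASES.get(machine, machine)
    return f"scip-{os_name}-{arch}.tar.gz"


def release_asset_url(tag, system=None, machine=None):
    """Build the download URL, defaulting to the current platform."""
    system = system or platform.system()
    machine = machine or platform.machine()
    base = f"https://github.com/sourcegraph/scip/releases/download/{tag}"
    return f"{base}/{release_asset_name(system, machine)}"
```

Unknown architectures are passed through unchanged, so a missing asset still fails at download time rather than being silently remapped.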

New Syntax Kind: `IdentifierString`

Some identifiers (such as Golang imports) should look just like strings (since they are).

However, they should not be treated exactly like strings. We can add IdentifierString as a possible way to solve this problem.

cc @olafurpg

`scip --version` reports 0.1.0

There are some problems with scip command line arguments. I've installed v0.2.1.

$ ./scip --version
0.1.0
$ ./scip version
$ ./scip verwhat
$

Add a (forward) declaration role

C++ code can have forward declarations; it'd be useful to track that separately, so that we can be smarter in returning results. For example, in Find References, we can potentially separate out forward declarations, or reduce the ranking score.

I think we should add a new SymbolRole for forward declarations.

☂️ Update documentation for SCIP

There are several sub-parts to this.

Must-have before release

  • Announcement blog post: Olaf is working on this. sourcegraph/about#5437
  • scip: (this repo) Add some details to the project README. #29

Nice-to-have before release

Post-release

Is it possible to read the indexed data without Sourcegraph?

Hi there!

Is there a way to read the indexed data without using Sourcegraph?

I'm working on a project where the server should support LSP communication from indexed data and I was going to use LSIF until I read this post: SCIP - a better code indexing format than LSIF

The thing is that for LSIF I was going to use a tool like LSIF reader in order to get the data from the dump and then I was going to implement something to handle LSP communication.. but I'm not sure about how to do something like that using SCIP..

Is there a tool to do that?
Do I need a tool to do that?
I tried generating a dump of a javascript file using scip-typescript index --infer-tsconfig and the result wasn't human-readable

I hope it's ok to ask here, but let me know if not..

thank you :)

Replace docopt usage with urfave/cli

It looks like https://cli.urfave.org/v2/ has a fair bit of functionality like completion etc, and Olaf mentioned that commands without = fail with cryptic errors.

$ scip convert --from index.scip --to dump.lsif

panic: interface conversion: interface {} is nil, not string

goroutine 1 [running]:
main.convertMain(0x1552c60?)
	/Users/olafurpg/dev/sourcegraph/scip/cmd/convert.go:15 +0x474

I suppose this error could be fixed by removing the = in the docopt text, but it's not intuitive that a space is not accepted, and the error message for the crash is also not helpful.
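
To make the desired behavior concrete, here is a sketch using Python's argparse. It is an analogy only, not urfave/cli and not the scip implementation. It shows a convert subcommand that accepts both --from=index.scip and --from index.scip, and that exits with a usage error instead of a panic when a flag is missing. The dest names src and dst are hypothetical.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="scip")
    subcommands = parser.add_subparsers(dest="command", required=True)
    convert = subcommands.add_parser("convert", help="convert a SCIP index to LSIF")
    # `from` is a Python keyword, so store the values under explicit names.
    convert.add_argument("--from", dest="src", required=True)
    convert.add_argument("--to", dest="dst", required=True)
    return parser


def parse(argv):
    """Parse an argument list such as ["convert", "--from", "index.scip", ...]."""
    return build_parser().parse_args(argv)
```

With this parser, a missing --to produces a usage message and exit code 2 rather than a nil interface conversion.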

Better automation for --version

Idea:

  • Use a separate cli_version.txt file (or similar).
  • Embed the file inside the binary and use that for printing the version.
  • Create a small release shell script which adds the git tag, modifies the version file, and does anything else that is necessary.
  • Add a check in the release pipeline that, for a release binary, the compiled binary's printed --version is the same as the tag. This prevents a direct tag push that accidentally triggers the release pipeline from publishing a release with the wrong --version.
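
The last check could look roughly like the sketch below (Python, for illustration). It assumes tags carry a leading "v" while --version prints the bare number, as in the v0.2.1 and 0.1.0 example from the earlier report; the function names are hypothetical.

```python
def normalize_tag(tag):
    """Strip a leading 'v' so that 'v0.2.1' compares equal to '0.2.1'."""
    return tag[1:] if tag.startswith("v") else tag


def check_release_version(tag, printed_version):
    """Raise if the binary's `--version` output disagrees with the release tag."""
    expected = normalize_tag(tag.strip())
    actual = printed_version.strip()
    if actual != expected:
        raise ValueError(
            f"release tag {tag!r} expects version {expected!r}, "
            f"but the binary printed {actual!r}"
        )
```

A release pipeline would call this with the pushed tag and the captured output of the freshly built binary, and abort publishing on error.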

Improve scip snapshot default documentation output

We should improve the default scip snapshot output to not have truncated documentation lines. Right now, it seems to generate output with a single line of documentation by default, which just ends up being:

 //  documentation ```ts

for TypeScript. This isn't useful. Instead, perhaps we should default to omitting that line if the doc comment is empty/blank, and print the full doc comment if one is present.
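
A sketch of the suggested default, in Python for illustration. The "documentation" label and the "// " prefix follow the output shown above; the alignment of continuation lines and the function name are assumptions, not the current scip snapshot behavior.

```python
LABEL = "documentation "


def documentation_lines(doc, comment_prefix="// "):
    """Render a doc comment as snapshot lines; return [] when it is empty or blank."""
    if not doc.strip():
        return []
    lines = doc.strip("\n").split("\n")
    # Assumed layout: continuation lines are aligned under the first line's text.
    indent = " " * len(LABEL)
    out = [comment_prefix + LABEL + lines[0]]
    out += [(comment_prefix + indent + line).rstrip() for line in lines[1:]]
    return out
```

A blank doc comment yields no lines at all, so the truncated single-line output no longer appears.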

No document vertex found after converting SCIP into LSIF

Hi, I have tried to run scip-java on guava and got a SCIP file successfully.
After that we used scip convert to convert it into an LSIF file.
But I did not find any document vertices in this LSIF file.

converted LSIF:

{"id":1,"version":"0.4.3","projectRoot":"file:///workspaces/diffctx/guava/","positionEncoding":"utf-8","toolInfo":{"name":"scip-java","version":"0.8.18"},"type":"vertex","label":"metaData"}

// no any document vertex here
{"id":2,"type":"vertex","label":"definitionResult"}
{"id":3,"type":"vertex","label":"resultSet"}
{"id":4,"type":"vertex","label":"referenceResult"}
{"id":5,"type":"vertex","label":"hoverResult","result":{"contents":{"kind":"markdown","value":"```java\npublic class MonitorBenchmark\n```\n\n---\n\n Benchmarks for {@link Monitor}.\n\n @author Justin T. Sampson\n"}}}
{"id":6,"type":"edge","label":"textDocument/definition","inV":2,"outV":3}
{"id":7,"type":"edge","label":"textDocument/references","inV":4,"outV":3}
{"id":8,"type":"edge","label":"textDocument/hover","inV":5,"outV":3}
{"id":9,"type":"vertex","label":"moniker","identifier":"semanticdb maven . . com/google/common/util/concurrent/MonitorBenchmark#","kind":"export","scheme":"semanticdb"}
{"id":10,"type":"edge","label":"moniker","inV":9,"outV":3}
{"id":11,"type":"vertex","label":"definitionResult"}
{"id":12,"type":"vertex","label":"resultSet"}
{"id":13,"type":"vertex","label":"referenceResult"}

origin LSIF files:

{"id":1,"type":"vertex","label":"metaData","version":"0.4.3","projectRoot":"file:///aaaa","positionEncoding":"utf-16","toolInfo":{"name":"lsif-go","version":"1.9.3","args":["-v"]}}

// document vertex
{"id":2,"type":"vertex","label":"project","kind":"go"}
{"id":3,"type":"vertex","label":"document","uri":"file:///a.go","languageId":"go"}
{"id":4,"type":"vertex","label":"document","uri":"file:///b.go","languageId":"go"}
{"id":5,"type":"vertex","label":"document","uri":"file:///c.go","languageId":"go"}

Thanks in advance

How should we version the various packages?

There are several different packages in this repo.

  1. scip CLI
  2. SCIP Go bindings, which include a bunch of "hand-written" functionality in addition to the generated code for dealing with Protobufs. This is consumed as a library by src-cli and by scip CLI.
  3. SCIP bindings for TypeScript, Rust and Haskell. Right now, these are all fully generated.
  4. There is also the original Protobuf schema itself.

Not including reprolang since it is for internal testing.

Based on my understanding of Go modules, modules have version numbers but packages don't have a separate version. Right now, we're using 1 Go module which covers both the CLI package and the SCIP bindings.

Some questions and thoughts:

  1. Should we start off with major version 0? Or do we want to use major version 1 now given that the schema has mostly stabilized?
  2. I propose that the generated bindings should always have synced version numbers, and this should be in sync with the schema. Maybe we should add a variant to ProtocolVersion with a major+minor version number (IDK if a patch number is useful for it). Every time we update the schema and the bindings change, we add a new tag to the repo and have CI publish releases for the different bindings.
  3. I'm not entirely sure if the Go bindings (especially the hand-written parts) should have the same version as the other ones, because we may want to tweak certain parts of the code for src-cli. However, splitting the Go bindings into separate packages (or even modules for versioning) would lead to more cumbersome APIs because Go doesn't have extension methods.
  4. For the CLI, I think we should decouple its version from the protocol version, but include the supported SCIP version ranges in the help or version text somewhere. For example, if there is ever a v2 for the SCIP protocol itself, scip CLI v1.x could support both SCIP v1 and SCIP v2 without breaking changes. But I'm not super convinced about this... maybe it is needless complexity/future-proofing.

Introduce standard sharded SCIP index format

Protobuf has a size limit of 2GB per message.

A single index for Chromium is about 6GB, triggering an error in scip-clang.

[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/message_lite.cc:402] scip.Index exceeded maximum protobuf size of 2GB: 6138839817

The indexer needs to shard the data ahead-of-time and emit that instead.

Proposed sharded index format:

  • Directory containing one or more *.shard.scip files, each of which contains a scip.Index. All shards must have the same metadata.
  • The default name for the outer directory will be index.scip.

src-cli can be pointed to index.scip with -file (maybe we should rename this flag?). It will be responsible for compressing/tarring and uploading the index.

The behavior should be as-if:

  • The metadata field of a scip.Index was populated by the metadata field in some shard.
  • Any other fields of scip.Index (currently documents and external_symbols) in *.shard.scip files in the archive are processed in lexicographic ordering based on shard file names.
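
The as-if behavior can be sketched as follows (Python, for illustration). Decoding is left out: each shard is modeled as an already-decoded dict with metadata, documents and external_symbols keys, and the function names are hypothetical.

```python
from pathlib import Path


def shard_paths(index_dir):
    """Return the *.shard.scip files in index_dir, sorted lexicographically by file name."""
    return sorted(Path(index_dir).glob("*.shard.scip"), key=lambda p: p.name)


def merge_shards(shards):
    """Merge decoded shards, given as a mapping of shard file name -> dict."""
    if not shards:
        raise ValueError("expected at least one shard")
    names = sorted(shards)  # lexicographic order of shard file names
    metadata = shards[names[0]]["metadata"]
    merged = {"metadata": metadata, "documents": [], "external_symbols": []}
    for name in names:
        shard = shards[name]
        # All shards must have the same metadata.
        if shard["metadata"] != metadata:
            raise ValueError(f"{name}: metadata differs from {names[0]}")
        merged["documents"] += shard["documents"]
        merged["external_symbols"] += shard["external_symbols"]
    return merged
```

A real consumer would stream shards one at a time instead of building a merged structure, since the point of sharding is to stay under the 2GB message limit.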

We should add documentation about this format in the README or in the scip.proto file.

This feature requires changes in:

  • The backend to accept the new sharded format. sourcegraph/sourcegraph#51132
  • The lib/codeintel/upload package in the Sourcegraph monorepo, to create the tar archive. sourcegraph/sourcegraph#51134
  • src-cli, to use the newer lib/codeintel/upload package and pass the right content type header
  • scip-clang: to emit large indexes in the sharded format
  • This repo: docs update.

Maybe we should also mention this feature addition in various CHANGELOGs.

Add workflow for releases

Before the release, we should have a script that generates the change log (or write one manually).

After the release tag is pushed to the main branch:

  • Publish the TypeScript bindings to npm. We can look at the code in scip-typescript for this.
  • Publish the Rust bindings to crates.io.
  • Generate + upload Go binaries for the CLI. We can look at the code in src-cli for this.

We will probably mess up while trying to figure this out, so we may want to include links (under Development.md) to npm and crates.io docs on how to yank a package.

Proposal: create a relationship for method return type

Right now, I don't see a sensible way to connect a method's return type to the method definition. This seems like a Relationship that could be added, like "Returns" or "IsReturnedBy".
This would be useful for features that display a Call Hierarchy.
