lookout-sdk's Introduction

lookout-sdk

Toolkit for writing new analyzers for source{d} Lookout.

What Does the SDK Provide?

For the complete documentation of source{d} Lookout, please take a look at https://docs.sourced.tech/lookout.

For detailed information about the different parts of Lookout and how they interact, see the Lookout architecture guide.

lookout-sdk provides:

  • proto definitions.
  • pre-generated libraries for Golang and Python, offering:
    • easy access to the DataService API through a gRPC service. Lookout will take care of dealing with Git repositories, UAST extraction, programming language detection, etc.
    • low-level helpers to work around some protobuf/gRPC caveats.
  • quickstart examples of an Analyzer that detects language and number of functions (written in Go and in Python).

Caveats

For the gRPC client and server please follow these requirements:

  • set a common maximum gRPC message size in gRPC servers and clients. This is required to avoid hitting different gRPC limits when handling UASTs, which can be huge (see grpc/grpc#7927). To do so, use the helpers included in lookout-sdk:
    • go: using pb.NewServer and pb.DialContext.
    • python: using lookout.sdk.grpc.create_server and lookout.sdk.grpc.create_channel.
  • support RFC 3986 URI scheme; lookout-sdk includes helpers for this:
    • go: using pb.ToGoGrpcAddress and pb.Listen.
    • python: using lookout.sdk.grpc.to_grpc_address.
  • use an insecure connection:
    • go: Lookout currently expects insecure gRPC connections, as provided by pb.DialContext.
    • python: run the server using server.add_insecure_port(address) (example).
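
The message-size and address caveats above can be sketched without the SDK. This is a minimal illustration, not the code behind lookout.sdk.grpc.create_channel or lookout.sdk.grpc.to_grpc_address: the names message_size_options and uri_to_grpc_target, the 100 MiB limit, and the example addresses are all assumptions made for the example. The two option keys are the standard gRPC channel arguments for message size.

```python
from urllib.parse import urlparse

# Assumed limit, for illustration only. What matters is that every
# server and client shares the same value.
MAX_MESSAGE_SIZE = 100 * 1024 * 1024


def message_size_options(max_size=MAX_MESSAGE_SIZE):
    """Build gRPC options that cap message size in both directions."""
    return [
        ("grpc.max_send_message_length", max_size),
        ("grpc.max_receive_message_length", max_size),
    ]


def uri_to_grpc_target(uri):
    """Turn an RFC 3986 style address into a target string for grpcio.

    Hypothetical helper: 'ipv4://host:port' becomes 'host:port' and
    'unix:///path' becomes 'unix:/path'.
    """
    parsed = urlparse(uri)
    if parsed.scheme == "unix":
        return "unix:" + parsed.path
    if not parsed.netloc:
        raise ValueError("address has no host part: %r" % uri)
    return parsed.netloc


print(uri_to_grpc_target("ipv4://localhost:9930"))
print(uri_to_grpc_target("unix:///tmp/analyzer.sock"))
```

With grpcio installed, the list returned by message_size_options() would be passed as the options argument of grpc.insecure_channel or grpc.server. In a real analyzer the SDK helpers listed above should be preferred over hand-rolled code like this.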

DataService

When DataService is being dialed, you should:

  • turn on gRPC Wait for Ready mode if your analyzer creates a connection to DataServer before it has actually started. This way the RPCs are queued until the channel is ready.
  • golang: reset the connection backoff to DataServer on every event. If you keep the connection to DataServer open, you need to reset the backoff when your analyzer receives a new event; use the conn.ResetConnectBackoff method in your event handlers. This avoids broken connections after a lookoutd redeployment: during a long restart of the lookoutd gRPC server, the backoff timeout may grow so much that the analyzer cannot reconnect before it makes its next request to DataServer.
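
To see why the backoff reset matters, the following sketch models an exponential reconnect backoff. It is a simplified illustration, not gRPC's implementation: the Backoff class is hypothetical, jitter is left out, and the parameters (1 s initial delay, multiplier 1.6, 120 s cap) are assumed here to match the defaults described in gRPC's connection backoff document.

```python
class Backoff:
    """Simplified exponential backoff without jitter (illustrative only)."""

    def __init__(self, initial=1.0, multiplier=1.6, max_delay=120.0):
        self.initial = initial
        self.multiplier = multiplier
        self.max_delay = max_delay
        self._current = initial

    def next_delay(self):
        """Return the delay before the next attempt, then grow it."""
        delay = self._current
        self._current = min(self._current * self.multiplier, self.max_delay)
        return delay

    def reset(self):
        """Start over from the initial delay, as a backoff reset would."""
        self._current = self.initial


backoff = Backoff()
# Simulate a long lookoutd restart: many failed reconnect attempts.
waits = [backoff.next_delay() for _ in range(15)]
print("delay after 15 failures: %.1f s" % backoff.next_delay())

# A new event arrives, so the analyzer resets the backoff.
backoff.reset()
print("delay after reset: %.1f s" % backoff.next_delay())
```

Under these assumptions the delay reaches the cap after about a dozen failed attempts, so an analyzer that never resets could wait up to two minutes before its next reconnect attempt even though lookoutd is already back.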

Contributing

Contributions are welcome and very much appreciated 🙌

Please refer to our Contribution Guide for more details.

Community

source{d} has an amazing community of developers and contributors who are interested in Code As Data and/or Machine Learning on Code. Please join us! 👋

Code of Conduct

All activities under source{d} projects are governed by the source{d} code of conduct.

License

Apache License Version 2.0, see LICENSE

lookout-sdk's People

Contributors

agarwalrounak, bzz, carlosms, dpordomingo, epicalyx, se7entyse7en, smacker, vmarkovtsev, zurk

lookout-sdk's Issues

lookout.sdk python package not in the same namespace with lookout.style and lookout.core

The problem is that there is no namespace_packages defined in the setup(...) call in setup.py.
It is easy to fix.

How to reproduce:
install style-analyzer with pip3 install style-analyzer and run:

>>> import lookout.core
>>> import lookout.style
>>> import lookout.sdk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'lookout.sdk'

but lookout-sdk, as well as lookout-sdk-ml, are installed as dependencies.

Add helper method(s) for bblfsh

Related to src-d/lookout#339

The idea is to hide the bblfsh dependencies from users of the sdk.
The current proposal is to create a helper for the client initializer. But maybe we can do better.
One of the pain points is that the user's bblfsh client can be incompatible with lookout.
The same problem exists with filtering.

Feature request: add the latest version of lookout-sdk binary which is guaranteed to work

We need to download the specific lookout-sdk binary which works with our client code for tests.
Our current approach is to take it from Gopkg.lock in src-d/lookout which works beautifully and reliably.
This is what it looks like: https://github.com/src-d/lookout-sdk-ml/blob/master/lookout/core/api/version.py

Since we want to eventually switch from artisanal homebrew to this project, I request the same feature here.

go: Move custom code from pb to sdk package

We have introduced a new higher-level Go sdk package: #88

It makes sense to move custom code we wrote before from pb package (which contains auto generated code) to the new one. But we need to review first what we have there. Maybe not everything should be moved.

We also need to move /pb to /go/pb

Extract gRPC interceptors from lookout

Lookout contains two gRPC interceptors in the grpchelper package, see NewServer and DialContext in helper.go.

The Ctxlog interceptor requires that the analyzers also have these interceptors to pass around the log fields like this: lookoutd client -> analyzer server -> analyzer client -> lookoutd server.

If we move the Ctxlog interceptor to this repo, it probably also makes sense to move the Log interceptor, to have uniform logs for lookoutd and all the analyzers.

I'm not sure about extracting the util/ctxlog package to this repo. Maybe we just want to have methods to set and get map[string]interface{} in the context, using context.WithValue(), ctx.Value().

Ideally the python code should have the same functionality to get the log fields from the gRPC metadata.
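
As a rough idea of what the Python side could look like, the sketch below carries log fields through gRPC metadata, which grpcio represents as a sequence of (key, value) tuples. Everything specific here is an assumption made for illustration: the metadata key "log-fields", the JSON encoding, and the function names are hypothetical and are not taken from lookout's Ctxlog interceptor.

```python
import json

# Hypothetical metadata key; the key lookout actually uses may differ.
LOG_FIELDS_KEY = "log-fields"


def fields_to_metadata(fields):
    """Encode a dict of log fields as a gRPC metadata tuple."""
    return ((LOG_FIELDS_KEY, json.dumps(fields, sort_keys=True)),)


def fields_from_metadata(metadata):
    """Extract log fields from metadata; return {} when none are present."""
    for key, value in metadata or ():
        if key == LOG_FIELDS_KEY:
            return json.loads(value)
    return {}


# Fields received by the analyzer server are decoded, then re-encoded
# for the analyzer's own outgoing call back to lookoutd.
incoming = fields_to_metadata({"event-id": "abc123", "repo": "example"})
fields = fields_from_metadata(incoming)
outgoing = fields_to_metadata(fields)
print(fields)
```

In a server handler the incoming tuples would come from context.invocation_metadata(), and on the client side the outgoing tuples would be passed as the metadata argument of the stub call.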

Godoc badge and import path are inconsistent

Looking at the README, I realised that in one place it points to gopkg.in/src-d/lookout-sdk.v0/pb, but the badge with the godoc reference points to github.com/src-d/lookout-sdk. They're equivalent, but they should be consistent.

Feature request: support UAST cache

It would be great to have an ability to load UASTs from disk instead of parsing files every time. This is critical for ML benchmarks and quality evaluations, where we can analyze 1,000 predefined repositories at the same revisions. Thus we would have more stable timings, more stable experience (some driver may crash, e.g. cpp, or parse differently, e.g. js) and also run faster.

Make protogen could install grpcio_tools outside environment

In my specific case I'm using a python env built using conda and not venv. When I ran make protogen it actually installed grpcio_tools=1.13.0 in my global env as pip3 was pointing to the global pip. Given that inside an environment pip points to the "correct" pip2 or pip3 depending on the python version of the environment I'd suggest to replace pip3 with pip.

If we're concerned about the python version I'd add a check that tests it before generating the python files.
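
One way to sidestep the pip versus pip3 ambiguity is to invoke pip through the interpreter that is already running, so the package always lands in that interpreter's environment. The sketch below assumes this approach; the function name pip_install_command and the minimum version (3, 5) are hypothetical choices for the example, not values taken from the project's Makefile.

```python
import sys


def pip_install_command(package, min_version=(3, 5)):
    """Build a pip command bound to the running interpreter.

    Raises RuntimeError when the interpreter is older than min_version,
    which is the check suggested before generating the Python files.
    """
    if sys.version_info < min_version:
        raise RuntimeError(
            "Python %d.%d or newer is required" % tuple(min_version)
        )
    # "python -m pip" installs into the environment of this interpreter,
    # whether it is a conda env, a venv, or the system Python.
    return [sys.executable, "-m", "pip", "install", package]


print(pip_install_command("grpcio_tools==1.13.0"))
```

The resulting list can be handed to subprocess.check_call to run the installation.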

Make protogen more robust

I stumbled on 2 problems with current scripts:

  1. the sed command is incompatible with the sed installed by default on macOS
  2. pip3 install --user doesn't work in virtualenv:
Can not perform a '--user' install. User site-packages are not visible in this virtualenv.
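
For the second problem, a script can detect whether it runs inside a virtualenv and add --user only when it does not. This is a sketch of that check rather than the project's actual fix, and the helper names in_virtualenv and pip_user_flag are hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv or virtualenv.

    Older virtualenv releases set sys.real_prefix; venv and newer
    virtualenv releases make sys.base_prefix differ from sys.prefix.
    """
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def pip_user_flag():
    """Return ['--user'] outside a virtualenv, and [] inside one."""
    return [] if in_virtualenv() else ["--user"]


print(["pip", "install"] + pip_user_flag() + ["grpcio_tools"])
```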

Integration tests could also test lookout

Since sdk is a lookout dependency, I wonder if it could be tested deeper here.
Currently, we only test the example analyzers with the lookout-sdk binary (lookout-tool), which might not be enough.

We could try, for example, building lookout-sdk binary (lookout-tool) to ensure the sdk didn't break lookout.
