enry's Introduction

WE CONTINUE THE DEVELOPMENT AT go-enry/go-enry. This repository is abandoned: no further updates will be made to the code base, and issues and PRs will not be answered or attended to.

enry

Programming language detector and toolbox to ignore binary or vendored files. enry started as a port to Go of the original linguist Ruby library and has about 2x better performance.

CLI

The recommended way to install the enry command-line tool is to either download a release or run:

(cd "$(mktemp -d)" && go mod init enry && go get github.com/src-d/enry/v2/cmd/enry)

The enry CLI accepts similar flags (--breakdown/--json) and produces output similar to linguist's:

$ enry
97.71%	Go
1.60%	C
0.31%	Shell
0.22%	Java
0.07%	Ruby
0.05%	Makefile
0.04%	Scala
0.01%	Gnuplot

Note that enry's CLI does not need an actual git repository to work, which is an intentional difference from linguist.
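
As a rough illustration of the report above, a per-language percentage can be derived from per-language totals. This is a minimal sketch, not enry's implementation: the breakdown function, the share type, and the byte counts are hypothetical, and it assumes that shares are computed from file sizes.

```go
package main

import (
	"fmt"
	"sort"
)

// share pairs a language with its percentage of the total size.
type share struct {
	Language string
	Percent  float64
}

// breakdown converts per-language byte totals (hypothetical input) into
// percentages, sorted from the largest share to the smallest.
func breakdown(sizes map[string]int) []share {
	total := 0
	for _, n := range sizes {
		total += n
	}
	if total == 0 {
		return nil
	}
	out := make([]share, 0, len(sizes))
	for lang, n := range sizes {
		out = append(out, share{Language: lang, Percent: 100 * float64(n) / float64(total)})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Percent != out[j].Percent {
			return out[i].Percent > out[j].Percent
		}
		return out[i].Language < out[j].Language // stable order for ties
	})
	return out
}

func main() {
	// Illustrative numbers only.
	for _, s := range breakdown(map[string]int{"Go": 9000, "C": 750, "Shell": 250}) {
		fmt.Printf("%.2f%%\t%s\n", s.Percent, s.Language)
	}
}
```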

Library

enry is also available as a native Go library with FFI bindings for multiple programming languages.

Go

In a Go module, import enry to the module by running:

go get github.com/src-d/enry/v2

The rest of the examples will assume you have either done this or fetched the library into your GOPATH.

// The examples here and below assume you have imported the library.
import (
	"fmt"

	"github.com/src-d/enry/v2"
)

lang, safe := enry.GetLanguageByExtension("foo.go")
fmt.Println(lang, safe)
// result: Go true

// lang and safe are already declared above, so plain assignment is used from here on.
lang, safe = enry.GetLanguageByContent("foo.m", []byte("<matlab-code>"))
fmt.Println(lang, safe)
// result: Matlab true

lang, safe = enry.GetLanguageByContent("bar.m", []byte("<objective-c-code>"))
fmt.Println(lang, safe)
// result: Objective-C true

// all strategies together; GetLanguage returns only the language name
lang = enry.GetLanguage("foo.cpp", []byte("<cpp-code>"))
fmt.Println(lang)
// result: C++

Note that the returned boolean value safe is true if there is only one possible language detected.
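
The meaning of safe can be illustrated without the library. The sketch below assumes a detection step has already produced a list of candidate languages; the firstAndSafe helper is hypothetical and only models the rule stated above.

```go
package main

import "fmt"

// firstAndSafe picks the first candidate and reports whether the choice is
// unambiguous, i.e. exactly one candidate language was found.
func firstAndSafe(candidates []string) (lang string, safe bool) {
	if len(candidates) == 0 {
		return "", false
	}
	return candidates[0], len(candidates) == 1
}

func main() {
	fmt.Println(firstAndSafe([]string{"Go"}))
	fmt.Println(firstAndSafe([]string{"Matlab", "Objective-C"}))
}
```

With a single candidate the second value is true; with two or more candidates the first one is still returned, but the second value is false.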

To get a list of all possible languages for a given file, there is a plural version of the same API.

langs := enry.GetLanguages("foo.h", []byte("<cpp-code>"))
// result: []string{"C", "C++", "Objective-C"}

langs = enry.GetLanguagesByExtension("foo.asc", []byte("<content>"), nil)
// result: []string{"AGS Script", "AsciiDoc", "Public Key"}

langs = enry.GetLanguagesByFilename("Gemfile", []byte("<content>"), []string{})
// result: []string{"Ruby"}
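
The third argument of the plural calls above is a list of candidate languages. One way to picture how such functions can be combined is a chain in which each step narrows the candidates left by the previous one. The sketch below is a toy model under that assumption: the strategy type, both lookup tables, and the detect function are hypothetical and use only the standard library.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// strategy narrows down the candidate languages for a file. It mirrors the
// (filename, content, candidates) shape of the plural calls shown above.
type strategy func(filename string, content []byte, candidates []string) []string

// byExtension is a toy strategy backed by a hypothetical two-entry table.
func byExtension(filename string, _ []byte, _ []string) []string {
	table := map[string][]string{
		".go": {"Go"},
		".h":  {"C", "C++", "Objective-C"},
	}
	return table[strings.ToLower(filepath.Ext(filename))]
}

// byContent is a toy strategy that keeps only the candidates whose
// (hypothetical) marker string appears in the content.
func byContent(_ string, content []byte, candidates []string) []string {
	markers := map[string]string{"C++": "std::", "Objective-C": "@interface"}
	var out []string
	for _, lang := range candidates {
		if m, ok := markers[lang]; ok && strings.Contains(string(content), m) {
			out = append(out, lang)
		}
	}
	return out
}

// detect runs the strategies in order, feeding each one the candidates left
// by the previous one, and stops as soon as a single language remains.
func detect(filename string, content []byte, strategies ...strategy) []string {
	var candidates []string
	for _, s := range strategies {
		langs := s(filename, content, candidates)
		if len(langs) == 1 {
			return langs
		}
		if len(langs) > 0 {
			candidates = langs
		}
	}
	return candidates
}

func main() {
	fmt.Println(detect("foo.h", []byte("std::vector<int> v;"), byExtension, byContent))
	fmt.Println(detect("foo.h", []byte("int x;"), byExtension, byContent))
}
```

When no strategy can narrow the list to one entry, the sketch returns every remaining candidate, which is the situation the plural API is meant to expose.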

Java bindings

Generated Java bindings using a C shared library and JNI are available under java.

A library is published on Maven as tech.sourced:enry-java for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Python bindings

Generated Python bindings using a C shared library and cffi are WIP under src-d/enry#154.

A library is going to be published on PyPI as enry for macOS and Linux platforms. Windows support is planned under src-d/enry#150.

Divergences from linguist

The enry library is based on the data from github/linguist version v7.5.1.

As opposed to linguist, the enry CLI tool does not require a full Git repository in the filesystem in order to report languages.

When parsing linguist/samples, the following enry results differ from linguist:

For all the cases above that have an issue number, we plan to update enry to match Linguist behavior.

Benchmarks

Enry's language detection has been compared with Linguist's on linguist/samples.

We got these results:

[Figure: histogram]

The histogram shows the number of files (y-axis) per time interval bucket (x-axis). Most of the files were detected faster by enry.

There are several cases where enry is slower than linguist because Go's regexp engine is slower than Ruby's, which is based on the Oniguruma library, written in C.

See instructions for running enry with oniguruma.

Why Enry?

In the movie My Fair Lady, Professor Henry Higgins is a linguist who at the very beginning of the movie enjoys guessing the origin of people based on their accent.

"Enry Iggins" is how Eliza Doolittle pronounces the name of the Professor.

Development

To build enry's CLI run:

make build

This will generate a binary called enry in the project's root directory.

To run the tests use:

make test

Sync with github/linguist upstream

enry re-uses parts of the original github/linguist to generate internal data structures. In order to update to the latest release of linguist do:

$ git clone https://github.com/github/linguist.git .linguist
$ cd .linguist; git checkout <release-tag>; cd ..

# put the new release's commit sha in the generator_test.go (to re-generate .gold test fixtures)
# https://github.com/src-d/enry/blob/13d3d66d37a87f23a013246a1b0678c9ee3d524b/internal/code-generator/generator/generator_test.go#L18

$ make code-generate

To stay in sync, enry needs to be updated when a new release of linguist includes changes to any of the following files:

There is no automation for detecting changes in the linguist project, so the process above has to be done manually from time to time.

When submitting a pull request syncing up to a new release, please make sure it only contains the changes in the generated files (in data subdirectory).

Separating all the necessary "manual" code changes to a different PR that includes some background description and an update to the documentation on "divergences from linguist" is very much appreciated as it simplifies the maintenance (review/release notes/etc).

Misc

Running a benchmark & faster regexp engine

Benchmark

All benchmark scripts are in the benchmarks directory.

Dependencies

As the benchmarks depend on Ruby and the github-linguist gem, make sure you have:

  • Ruby (e.g. using rbenv) and bundler installed
  • Docker
  • native dependencies installed
  • Build the gem: cd .linguist && bundle install && rake build_gem && cd -
  • Install it: gem install --no-rdoc --no-ri --local .linguist/github-linguist-*.gem

Quick benchmark

To run quicker benchmarks you can either:

make benchmarks

to get average times for the main detection function and strategies for the whole samples set or:

make benchmarks-samples

if you want to see measures per sample file.

Full benchmark

If you want to reproduce the same benchmarks as reported above:

  • Make sure all dependencies are installed
  • Install gnuplot (in order to plot the histogram)
  • Run ENRY_TEST_REPO="$PWD/.linguist" benchmarks/run.sh (takes ~15h)

It will run the benchmarks for enry and linguist, parse the output, create csv files and plot the histogram.

Faster regexp engine (optional)

Oniguruma is CRuby's regular expression engine. It is very fast and performs better than the one built into the Go runtime. enry supports swapping between those two engines thanks to the rubex project. The typical overall speedup from using Oniguruma is 1.5-2x. However, it requires CGo and the external shared library. On macOS with Homebrew, install it with:

brew install oniguruma

On Ubuntu, it is

sudo apt install libonig-dev

To build enry with Oniguruma regexps, use the oniguruma build tag:

go get -v -t --tags oniguruma ./...

and then rebuild the project.

License

Apache License, Version 2.0. See LICENSE

enry's People

Contributors

abeaumont, ajnavarro, bzz, campoy, creachadair, darkowlzz, dennwc, dpaz, dpordomingo, dvrkps, eiso, erizocosmico, juanjux, lafriks, mcarmonaa, mcuadros, pratik97, silvia-odwyer, smola, suhaibmujahid, vmarkovtsev, zjvandeweg, zkry

enry's Issues

Weird SIGSEGV

I ran the same source{d} engine pipeline on the "100 repos" dataset (/data/siva on science-3, @bzz knows) nearly 50 times and it worked without errors. On the 51st run, I got the following stack trace:

unexpected fault address 0x0
fatal error: fault
panic: runtime error: slice bounds out of range

goroutine 17 [running, locked to thread]:
bytes.Count(0x7ff5ec077fe0, 0xa, 0x0, 0x1c42005cc20, 0x1, 0x20, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7ff5ec077fe0, 0xa, 0x0, 0x66, 0x6, 0x7ff5d828aed8)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x7ff5a2524a73, 0xc, 0x1c42001cde0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x1c42005ce48, 0x1c420018500)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x1c420078098, 0x1c4200007e0)
	/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff5ec076ed0, 0x2, 0x7ff5ec077fe0, 0xa, 0x0, 0x0, 0x0)
	command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
panic: runtime error: slice bounds out of range

goroutine 53 [running, locked to thread]:
bytes.Count(0x7ff5e4080020, 0xa, 0x0, 0x1c42005fc20, 0x1, 0x20, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7ff5e4080020, 0xa, 0x0, 0x0, 0x0, 0x0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x7ff5a2524a73, 0xc, 0x1c4200206e0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x1c42005fe48, 0x1c4200c4040)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x1c4200781d8, 0x1c420085680)
	/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff5e4088850, 0x7ff694018800, 0x7ff5e4080020, 0xa, 0x0, 0x0, 0x0)
	command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x7ff5a23fbbb1]

goroutine 51 [running, locked to thread]:
runtime.throw(0x7ff5a24caf58, 0x5)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/panic.go:596 +0x97 fp=0x1c42005dc60 sp=0x1c42005dc40
runtime.sigpanic()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/signal_unix.go:297 +0x290 fp=0x1c42005dcb0 sp=0x1c42005dc60
path/filepath.Base(0x7ff600003dc0, 0x7ff694011000, 0x2, 0x7ff600052360)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/path/filepath/path.go:431 +0x31 fp=0x1c42005dcc0 sp=0x1c42005dcb0
gopkg.in/src-d/enry%2ev1.GetLanguagesByFilename(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x7ff5a2ff8c08, 0x0, 0x0, 0x0, 0x0, ...)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:264 +0x3b fp=0x1c42005dcf8 sp=0x1c42005dcc0
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x7ff5a2524a73, 0xc, 0x1c42001e0e0)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129 fp=0x1c42005ddb0 sp=0x1c42005dcf8
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x1c42005de48, 0x1c4200c6040)
	/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55 fp=0x1c42005de00 sp=0x1c42005ddb0
main.GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x1c420078138, 0x1c420084cc0)
	/home/travis/build/src-d/enry/shared/enry.go:11 +0x55 fp=0x1c42005de48 sp=0x1c42005de00
main._cgoexpwrap_f7db11756761_GetLanguage(0x7ff600003dc0, 0x7ff694011000, 0x7ff600052360, 0x5, 0x34, 0x0, 0x0)
	command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a fp=0x1c42005de90 sp=0x1c42005de48
runtime.call64(0x0, 0x7ff6197ea008, 0x7ff6197ea0a0, 0x38)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:515 +0x4a fp=0x1c42005dee0 sp=0x1c42005de90
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:301 +0x1a1 fp=0x1c42005df58 sp=0x1c42005dee0
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86 fp=0x1c42005dfc0 sp=0x1c42005df58
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71 fp=0x1c42005dfe0 sp=0x1c42005dfc0
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1 fp=0x1c42005dfe8 sp=0x1c42005dfe0

goroutine 17 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 50 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078017, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005bf08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 52 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078117, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005ef08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 53 [running, locked to thread]:
	goroutine running on other thread; stack unavailable

goroutine 54 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078117, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c4207bbf08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 55 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078217, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c420058f08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 56 [syscall, locked to thread]:
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1

goroutine 57 [runnable, locked to thread]:
runtime.gopark(0x7ff5a2cd4388, 0x1c420090058, 0x7ff5a2524a73, 0xc, 0x1c420078217, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:271 +0x140
runtime.goparkunlock(0x1c420090058, 0x7ff5a2524a73, 0xc, 0x17, 0x3)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/proc.go:277 +0x60
runtime.chanrecv(0x7ff5a2ca4080, 0x1c420090000, 0x0, 0x7ff5a2387d01, 0x1c42005af08)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:513 +0x375
runtime.chanrecv1(0x7ff5a2ca4080, 0x1c420090000, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/chan.go:395 +0x35
runtime.cgocallbackg1(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:225 +0x1f4
runtime.cgocallbackg(0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/cgocall.go:184 +0x86
runtime.cgocallback_gofunc(0x0, 0x0, 0x0, 0x0)
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:767 +0x71
runtime.goexit()
	/home/travis/.gimme/versions/go1.8.linux.amd64/src/runtime/asm_amd64.s:2197 +0x1
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 852, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 990, in start
    self.socket.connect((self.address, self.port))
ConnectionRefusedError: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36349)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1062, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

I cannot reproduce this easily; it is very rare. Engine version is 0.1.7.

Go generate commit name

When you run go generate while .linguist is checked out at an old commit, it generates the code expected for that commit, but the commit hash that gets extracted is still the latest commit in .linguist's history.

Print usage on `slinguist --help`

It would be nice to have proper usage information printed in the terminal.

Actual usage as of v1.2.0

go get github.com/src-d/simple-linguist/cli/slinguist
slinguist --help
Usage of slinguist:

Expected: some explanation of the flags and usage patterns (--breakdown, --json, etc.).

Here is an example from the original github/linguist:

linguist --help                                                                                       

  Linguist v5.0.8
  Detect language type for a file, or, given a directory, determine language breakdown.

  Usage: linguist <path>
         linguist <path> [--breakdown] [--json]
         linguist [--breakdown] [--json]
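
A usage message along these lines can be produced with Go's standard flag package by setting a custom Usage function. The sketch below is only an illustration: the helpOutput helper is hypothetical, the flag names are taken from the request above, and the flag descriptions are placeholders.

```go
package main

import (
	"bytes"
	"flag"
	"fmt"
)

// helpOutput parses args with a flag set that has a custom usage message.
// It returns whatever the flag package printed and whether help was requested.
func helpOutput(name string, args []string) (string, bool) {
	var out bytes.Buffer
	fs := flag.NewFlagSet(name, flag.ContinueOnError)
	fs.SetOutput(&out)
	// Flag names follow the issue text; the descriptions are placeholders.
	fs.Bool("breakdown", false, "placeholder description for --breakdown")
	fs.Bool("json", false, "placeholder description for --json")
	fs.Usage = func() {
		fmt.Fprintf(&out, "Usage: %s <path>\n", name)
		fmt.Fprintf(&out, "       %s [--breakdown] [--json] <path>\n", name)
		fs.PrintDefaults() // appends the per-flag help lines
	}
	err := fs.Parse(args)
	return out.String(), err == flag.ErrHelp
}

func main() {
	text, _ := helpOutput("slinguist", []string{"--help"})
	fmt.Print(text)
}
```

With flag.ContinueOnError, Parse returns flag.ErrHelp for -h or --help after calling Usage, so the caller decides how to exit.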

language detection difference between enry and linguist

Using linguist/samples as the test set, the following issues were found:

  • with hello.ms we can't detect the language (Unix Assembly) because we don't have a matcher in contentMatchers (content.go) for Unix Assembly. Linguist uses this regexp, which we can't port.

  • all files for the SQL language fall through to the classifier because we don't correctly parse this disambiguator expression for "*.sql" files. This expression doesn't follow the pattern used in the rest of the heuristics.rb file.

Bug: the return code is non-zero in case of warnings

At the moment, when enry encounters something bad, it poisons the return code.

For example:

enry/bin/enry /home/sourced/Projects/ast2vec
2017/06/16 09:53:51 read /home/sourced/Projects/ast2vec/enry/src/gopkg.in/src-d/enry.v1: is a directory
{
  "Go": [
    "enry/src/github.com/src-d/enry/alias.go",
    "enry/src/github.com/src-d/enry/classifier.go",
    "enry/src/github.com/src-d/enry/cli/enry/main.go",
    "enry/src/github.com/src-d/enry/common.go",
    "enry/src/github.com/src-d/enry/common_test.go",
    "enry/src/github.com/src-d/enry/content.go",
    "enry/src/github.com/src-d/enry/documentation.go",
    "enry/src/github.com/src-d/enry/extension.go",
    "enry/src/github.com/src-d/enry/filename.go",
    "enry/src/github.com/src-d/enry/frequencies.go",
    "enry/src/github.com/src-d/enry/generate.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/aliases.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/documentation.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/extensions.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/filenames.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/generator.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/generator_test.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/heuristics.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/interpreters.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/langinfo.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/samplesfreq.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/types.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/vendor.go",
    "enry/src/github.com/src-d/enry/internal/code-generator/main.go",
    "enry/src/github.com/src-d/enry/internal/tokenizer/tokenize.go",
    "enry/src/github.com/src-d/enry/internal/tokenizer/tokenize_test.go",
    "enry/src/github.com/src-d/enry/interpreter.go",
    "enry/src/github.com/src-d/enry/modeline.go",
    "enry/src/github.com/src-d/enry/shebang.go",
    "enry/src/github.com/src-d/enry/type.go",
    "enry/src/github.com/src-d/enry/utils.go",
    "enry/src/github.com/src-d/enry/utils_test.go",
    "enry/src/github.com/src-d/enry/vendor.go",
    "enry/src/github.com/toqueteos/trie/example_test.go",
    "enry/src/github.com/toqueteos/trie/trie.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/bytes.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/bytes_test.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/lib.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/lib_test.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/string.go",
    "enry/src/gopkg.in/toqueteos/substring.v1/string_test.go"
  ],
  "Makefile": [
    "enry/src/github.com/src-d/enry/Makefile"
  ],
  "Python": [
    "ast2vec/__init__.py",
    "ast2vec/__main__.py",
    "ast2vec/dataset.py",
    "ast2vec/df.py",
    "ast2vec/glove_to_shards.py",
    "ast2vec/id2vec.py",
    "ast2vec/id_embedding.py",
    "ast2vec/prep.py",
    "ast2vec/repo2base.py",
    "ast2vec/repo2coocc.py",
    "ast2vec/repo2nbow.py",
    "ast2vec/swivel.py"
  ],
  "Ruby": [
    "enry/src/github.com/src-d/enry/internal/code-generator/generator/test_files/heuristics.test.rb"
  ],
  "Text": [
    "requirements.txt"
  ]
}

Return code is 2.

This is bad because it breaks any Python subprocess call as it checks the return code and raises an Exception if it is non-zero. Yet we've only got a non-critical warning.

I propose to return 0 in case of non-critical warnings. This is a standard convention of UNIX apps: if you survived, do not break the errcode.
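
The proposal can be sketched as a small mapping from the outcome of a run to an exit status. This is an illustration only, not the CLI's code: the exitCode function and the errDir value are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errDir is a hypothetical non-critical problem, similar to the log line above.
var errDir = errors.New("read some/dir: is a directory")

// exitCode maps the outcome of a run to a process exit status: warnings are
// reported on stderr but keep the status at 0, and only a fatal error makes
// it non-zero.
func exitCode(fatal error, warnings []error) int {
	for _, w := range warnings {
		fmt.Fprintln(os.Stderr, "warning:", w)
	}
	if fatal != nil {
		fmt.Fprintln(os.Stderr, "error:", fatal)
		return 1
	}
	return 0
}

func main() {
	code := exitCode(nil, []error{errDir})
	// A real command-line tool would pass this value to os.Exit.
	fmt.Println("exit status:", code)
}
```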

GetLanguageByContent does nothing

GetLanguageByContent (https://github.com/src-d/enry/blob/master/common.go#L88) passes "" as the filename to GetLanguagesByContent (https://github.com/src-d/enry/blob/master/common.go#L382). GetLanguagesByContent always returns nil if the extension is not matched, and that is always the case here because GetLanguageByContent explicitly passes an empty string.

We should either fix this or remove this exported function, because it does nothing.
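
The failure mode can be reproduced with a toy model. The sketch below does not use enry's code: the contentMatchers table and the languagesByContent function are hypothetical stand-ins for a lookup keyed by file extension.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// contentMatchers is a hypothetical stand-in for a table of content
// heuristics keyed by file extension.
var contentMatchers = map[string]func(content []byte) []string{
	".m": func(content []byte) []string { return []string{"Objective-C"} },
}

// languagesByContent looks up the heuristic by the extension of filename and
// returns nil when no heuristic is registered for that extension.
func languagesByContent(filename string, content []byte) []string {
	matcher, ok := contentMatchers[filepath.Ext(filename)]
	if !ok {
		return nil
	}
	return matcher(content)
}

func main() {
	content := []byte("@interface Foo : NSObject")
	// An empty filename has no extension, so the lookup can never succeed.
	fmt.Println(languagesByContent("", content))
	fmt.Println(languagesByContent("foo.m", content))
}
```

In this model the content is never inspected when the filename is empty, which matches the behavior described above.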

undefined: sort.Slice when using go get

Here is the error:

user@vm:~$ go get gopkg.in/src-d/enry.v1/...
# gopkg.in/src-d/enry.v1/cmd/enry
go/src/gopkg.in/src-d/enry.v1/cmd/enry/main.go:213: undefined: sort.Slice

The cmd is executed on Ubuntu 16.04.4 LTS, a VM with linux kernel 4.4.0-124-generic
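
For context, sort.Slice was added to the standard library in Go 1.8, so this error is what an older toolchain would report; the Go version is not stated above, so treat that as an assumption. A minimal sketch of an equivalent that compiles on older releases uses sort.Sort with a sort.Interface implementation. The byName type and the sample data are hypothetical and not taken from main.go.

```go
package main

import (
	"fmt"
	"sort"
)

// byName implements sort.Interface, which is available in every Go release.
type byName []string

func (s byName) Len() int           { return len(s) }
func (s byName) Less(i, j int) bool { return s[i] < s[j] }
func (s byName) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

// sortedNames returns a sorted copy of names without relying on sort.Slice.
func sortedNames(names []string) []string {
	out := append([]string(nil), names...)
	sort.Sort(byName(out))
	return out
}

func main() {
	fmt.Println(sortedNames([]string{"Ruby", "Go", "C"}))
}
```

For plain string slices sort.Strings would be enough; the explicit interface is shown because it generalizes to the struct slices that sort.Slice is typically used for.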

CLI output for a single file is not JSON encoded

When run with -json on a single file, the output is not JSON:

$ go get gopkg.in/src-d/enry.v1/...
$ enry -json ../../github.com/src-d/lookout/vendor/golang.org/x/crypto/cast5/cast5.go
cast5.go: 526 lines (493 sloc)
  type:      Text
  mime_type: text/x-go
  language:  Go

The same flag works for a directory:

enry -json ../../github.com/src-d/lookout/vendor/golang.org/x/crypto/cast5
{"Go":["cast5.go"]}
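One way the single-file report could honour -json is sketched here. The `fileInfo` struct, its JSON field names and the `render` helper are assumptions made for illustration; only the field values come from the output quoted above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileInfo is a hypothetical structure holding the fields that the
// plain-text single-file report prints.
type fileInfo struct {
	Filename string `json:"filename"`
	Lines    int    `json:"lines"`
	SLOC     int    `json:"sloc"`
	Type     string `json:"type"`
	MimeType string `json:"mime_type"`
	Language string `json:"language"`
}

// render returns JSON when asJSON is set and the text layout otherwise.
func render(info fileInfo, asJSON bool) (string, error) {
	if asJSON {
		out, err := json.Marshal(info)
		return string(out), err
	}
	return fmt.Sprintf("%s: %d lines (%d sloc)\n  type:      %s\n  mime_type: %s\n  language:  %s\n",
		info.Filename, info.Lines, info.SLOC, info.Type, info.MimeType, info.Language), nil
}

func main() {
	info := fileInfo{"cast5.go", 526, 493, "Text", "text/x-go", "Go"}
	out, err := render(info, true)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```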

Feature request: add -version flag

sourced@sourced-MacBookPro:/tmp/enry$ /home/sourced/Projects/ast2vec/enry -version
flag provided but not defined: -version
 /home/sourced/Projects/ast2vec/enry v1.4.0 build: 09-07-2017_08_55_33 commit: 0fe0a97f67, based on linguist commit: 37979b2
 /home/sourced/Projects/ast2vec/enry, A simple (and faster) implementation of github/linguist
 usage: /home/sourced/Projects/ast2vec/enry <path>
        /home/sourced/Projects/ast2vec/enry [-json] [-breakdown] <path>
        /home/sourced/Projects/ast2vec/enry [-json] [-breakdown]

flag provided but not defined: -version

Update: I thought that such a flag was defined (the version was printed), but I was wrong. Thus I propose adding -version to print the version.
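A minimal sketch of such a flag using the standard flag package follows. The `version`, `build` and `commit` variables and the `parseVersionFlag` helper are hypothetical placeholders; the printed format only loosely follows the usage output quoted above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// Placeholder values; a real build would typically inject them at link time.
var (
	version = "undefined"
	build   = "undefined"
	commit  = "undefined"
)

// parseVersionFlag reports whether -version was passed in args.
func parseVersionFlag(args []string) (bool, error) {
	fs := flag.NewFlagSet("enry", flag.ContinueOnError)
	showVersion := fs.Bool("version", false, "print the version and exit")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *showVersion, nil
}

func main() {
	show, err := parseVersionFlag(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	if show {
		fmt.Printf("enry %s build: %s commit: %s\n", version, build, commit)
	}
}
```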

Surprising programming languages

I decided to list all of the languages that appear in Public Git Archive and I got a couple of surprising results.

Some of the results:

  • desktop
  • Regular Expression
  • Raw token data
  • Public Key
  • HTTP
  • NumPy

None of the results above is technically a language: some are protocols, some are libraries, and some are ... something else completely!

Should we add a clarification regarding these? I think it'd be interesting to have it for Public Git Archive. Currently I get over 455 languages, but I suspect some of them are not technically languages.

Python bindings for enry

In the same way as we have Java bindings for enry, which wrap a Go library built with -buildmode=c-shared, it would be nice to have bindings for Python using ctypes, cffi, or something similar.

Particular use case: one wants to use the https://github.com/bblfsh/sonar-checks/ API, which requires knowing the language of the file to be checked in order to choose the right checks.

TODOs

  • Initial PoC: expose 1-2 APIs (e.g. w/o slices) #245
  • Minimal: expose only high-level language detection API #250
    (usable from Jupyter, after a manual build to enable #246)
  • Installable: expose all existing API, with the documentation and setup script
    pip install -e git+https://github.com/src-d/enry.git#egg=python
  • Publish: add new release profile to CI for
    • building .whl (linux, macOS)
    • publishing on pypi

Fix Henry Higgins guessing abilities in the README.md

From the readme:

Why Enry?

In the movie My Fair Lady, Professor Henry Higgins is one of the main characters. Henry is a linguist and at the very beginning of the movie enjoys guessing the nationality of people based on their accent.

Enry Iggins is how Eliza Doolittle, pronounces the name of the Professor during the first half of the movie.

Is he really guessing nationality or more like neighborhood where the person was raised?

Higgins claims that his knowledge of ‘simple phonetics’ (a branch of linguistics concerned with the study of the nature, production, and perception of sounds of speech) allows him to deduce a person’s origins to within six miles. Within London, he says, he can place a man within two miles, ‘sometimes within two streets’. It’s this knowledge that he uses to transform Eliza Doolittle into a socially acceptable semblance of a ‘lady’. The character of Higgins is said to have been inspired by Henry Sweet (1845–1912), a great phonetician whose works, including his History of English

Source: http://blog.oxforddictionaries.com/2013/03/my-fair-lady/

Runtime Error

When I try to process https://github.com/willfarrell/Browsers I get the following error:

panic: runtime error: index out of range

goroutine 1 [running]:
gopkg.in/src-d/enry%2ev1.getInterpreter(0xc42198d080, 0x81, 0x281, 0x12, 0xbec0a0)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:289 +0x396
gopkg.in/src-d/enry%2ev1.GetLanguagesByShebang(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0xbeb730, 0x0, 0x0, 0x0, 0x0, ...)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:270 +0x43
gopkg.in/src-d/enry%2ev1.GetLanguages(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0x0, 0x0, 0xc420e3cd60)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:126 +0x127
gopkg.in/src-d/enry%2ev1.GetLanguage(0xc4200ea28a, 0x12, 0xc42198d080, 0x81, 0x281, 0x0, 0x0)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/common.go:38 +0x53
main.main.func1(0xc4200ea230, 0x6c, 0xbbd740, 0xc421d17a00, 0x0, 0x0, 0x0, 0x0)
        /tmp/enry-dfekhdox/src/gopkg.in/src-d/enry.v1/cli/enry/main.go:80 +0x664
path/filepath.walk(0xc4200ea230, 0x6c, 0xbbd740, 0xc421d17a00, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:351 +0x81
path/filepath.walk(0xc4215483c0, 0x59, 0xbbd740, 0xc421d17930, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc4215482a0, 0x53, 0xbbd740, 0xc421d17860, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc420bd5d60, 0x4a, 0xbbd740, 0xc421d17790, 0xc4209915f0, 0x0, 0x0)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.walk(0xc420173680, 0x33, 0xbbd740, 0xc420954a90, 0xc4209915f0, 0x0, 0x30)
        /usr/lib/go-1.8/src/path/filepath/path.go:376 +0x414
path/filepath.Walk(0xc420173680, 0x33, 0xc4209915f0, 0x0, 0xc4209915c0)
        /usr/lib/go-1.8/src/path/filepath/path.go:398 +0x14c
main.main()

Here is the full log:
stderr.txt
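The trace ends in getInterpreter, reached from GetLanguagesByShebang, but the report does not identify the file that triggers the panic. The following is therefore only a hypothetical sketch of bounds-checked shebang parsing, not enry's code: the `interpreter` function checks lengths before indexing, so an incomplete shebang yields an empty result instead of a panic.

```go
package main

import (
	"bytes"
	"fmt"
	"path"
)

// interpreter extracts the interpreter name from a shebang line. It returns
// "" when the line is missing or incomplete, checking lengths before indexing.
func interpreter(content []byte) string {
	if !bytes.HasPrefix(content, []byte("#!")) {
		return ""
	}
	line := content[2:]
	if i := bytes.IndexByte(line, '\n'); i >= 0 {
		line = line[:i]
	}
	fields := bytes.Fields(line)
	if len(fields) == 0 {
		return "" // "#!" with nothing after it
	}
	name := path.Base(string(fields[0]))
	if name == "env" {
		if len(fields) < 2 {
			return "" // "#!/usr/bin/env" without an argument
		}
		name = path.Base(string(fields[1]))
	}
	return name
}

func main() {
	for _, s := range []string{"#!/usr/bin/env python\nprint(1)\n", "#!/bin/sh\n", "#!", "#!/usr/bin/env"} {
		fmt.Printf("%q -> %q\n", s, interpreter([]byte(s)))
	}
}
```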

enry consumes an extreme amount of memory

I run enry in the root of the following tree:

├── [4.0K]  ast2vec
│   ├── [ 526]  bblfsh_roles.py
│   ├── [2.2K]  bigartm.py
│   ├── [5.1K]  bow.py
│   ├── [8.9K]  cloning.py
│   ├── [1.6K]  coocc.py
│   ├── [2.1K]  df.py
│   ├── [ 314]  dump.py
│   ├── [3.8K]  enry.py
│   ├── [1.9K]  id2vec.py
│   ├── [ 11K]  id_embedding.py
│   ├── [ 773]  __init__.py
│   ├── [ 15K]  __main__.py
│   ├── [4.0K]  model2
│   │   ├── [4.9K]  base.py
│   │   ├── [   0]  __init__.py
│   │   ├── [2.7K]  join_bow.py
│   │   ├── [6.2K]  proxbase.py
│   │   ├── [2.3K]  prox.py
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [5.1K]  base.cpython-35.pyc
│   │   │   ├── [ 140]  __init__.cpython-35.pyc
│   │   │   ├── [2.8K]  join_bow.cpython-35.pyc
│   │   │   ├── [6.2K]  proxbase.cpython-35.pyc
│   │   │   ├── [3.1K]  prox.cpython-35.pyc
│   │   │   ├── [4.4K]  source2bow.cpython-35.pyc
│   │   │   └── [3.4K]  source2df.cpython-35.pyc
│   │   ├── [3.2K]  source2bow.py
│   │   └── [2.4K]  source2df.py
│   ├── [  85]  modelforgecfg.py
│   ├── [ 804]  pickleable_logger.py
│   ├── [4.0K]  __pycache__
│   │   ├── [ 654]  bblfsh_roles.cpython-35.pyc
│   │   ├── [2.5K]  bigartm.cpython-35.pyc
│   │   ├── [6.9K]  bow.cpython-35.pyc
│   │   ├── [8.9K]  cloning.cpython-35.pyc
│   │   ├── [2.3K]  coocc.cpython-35.pyc
│   │   ├── [3.3K]  df.cpython-35.pyc
│   │   ├── [ 461]  dump.cpython-35.pyc
│   │   ├── [3.9K]  enry.cpython-35.pyc
│   │   ├── [2.8K]  id2vec.cpython-35.pyc
│   │   ├── [ 11K]  id_embedding.cpython-35.pyc
│   │   ├── [1.2K]  __init__.cpython-35.pyc
│   │   ├── [ 11K]  __main__.cpython-35.pyc
│   │   ├── [1.8K]  meta.cpython-35.pyc
│   │   ├── [7.7K]  model.cpython-35.pyc
│   │   ├── [ 236]  modelforgecfg.cpython-35.pyc
│   │   ├── [2.4K]  nbow.cpython-35.pyc
│   │   ├── [1.4K]  pickleable_logger.cpython-35.pyc
│   │   ├── [ 819]  progress_bar.cpython-35.pyc
│   │   ├── [4.9K]  publish.cpython-35.pyc
│   │   ├── [1.0K]  resolve_symlink.cpython-35.pyc
│   │   ├── [2.6K]  source.cpython-35.pyc
│   │   ├── [ 15K]  swivel.cpython-35.pyc
│   │   ├── [2.1K]  token_parser.cpython-35.pyc
│   │   ├── [4.6K]  topics.cpython-35.pyc
│   │   ├── [4.4K]  uast.cpython-35.pyc
│   │   ├── [1.9K]  uast_ids_to_bag.cpython-35.pyc
│   │   ├── [1.9K]  voccoocc.cpython-35.pyc
│   │   └── [1.3K]  vw_dataset.cpython-35.pyc
│   ├── [4.0K]  repo2
│   │   ├── [ 21K]  base.py
│   │   ├── [3.3K]  cooccbase.py
│   │   ├── [1.3K]  coocc.py
│   │   ├── [   0]  __init__.py
│   │   ├── [2.1K]  nbow.py
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [ 20K]  base.cpython-35.pyc
│   │   │   ├── [3.8K]  cooccbase.cpython-35.pyc
│   │   │   ├── [2.6K]  coocc.cpython-35.pyc
│   │   │   ├── [ 139]  __init__.cpython-35.pyc
│   │   │   ├── [3.0K]  nbow.cpython-35.pyc
│   │   │   ├── [2.6K]  source.cpython-35.pyc
│   │   │   ├── [2.2K]  uast.cpython-35.pyc
│   │   │   ├── [1.5K]  voccoocc.cpython-35.pyc
│   │   │   └── [1.3K]  xbow.cpython-35.pyc
│   │   ├── [1.9K]  source.py
│   │   ├── [1.6K]  uast.py
│   │   └── [1014]  voccoocc.py
│   ├── [ 870]  resolve_symlink.py
│   ├── [1.9K]  source.py
│   ├── [ 20K]  swivel.py
│   ├── [4.0K]  tests
│   │   ├── [3.8M]  bow_1000.asdf
│   │   ├── [4.0K]  coocc
│   │   │   ├── [3.5M]  astropy_coocc.asdf
│   │   │   ├── [4.8M]  django_coocc.asdf
│   │   │   ├── [1.3K]  empty_coocc.asdf
│   │   │   ├── [  17]  error.asdf -> ../nbow_1000.asdf
│   │   │   ├── [341K]  flask_coocc.asdf
│   │   │   ├── [501K]  jinja2_coocc.asdf
│   │   │   └── [6.6M]  tensorflow_coocc.asdf
│   │   ├── [ 90K]  coocc.asdf
│   │   ├── [6.0K]  docfreq_1000.asdf
│   │   ├── [ 502]  fake_requests.py
│   │   ├── [1.1M]  id2vec_1000.asdf
│   │   ├── [ 458]  __init__.py
│   │   ├── [4.0K]  merge_bows
│   │   │   ├── [3.7K]  nbow_github.com&src-d&ast2vec.asdf
│   │   │   ├── [3.2K]  nbow_github.com&src-d&modelforge.asdf
│   │   │   └── [2.0K]  nbow_github.com&src-d&vecino.asdf
│   │   ├── [ 538]  models.py
│   │   ├── [3.8M]  nbow_1000.asdf
│   │   ├── [4.0K]  postproc
│   │   │   ├── [1.9M]  col_embedding.tsv
│   │   │   ├── [798K]  col_embedding.tsv.gz
│   │   │   ├── [1.9M]  row_embedding.tsv
│   │   │   └── [797K]  row_embedding.tsv.gz
│   │   ├── [4.0K]  __pycache__
│   │   │   ├── [1.4K]  fake_requests.cpython-35.pyc
│   │   │   ├── [ 676]  __init__.cpython-35.pyc
│   │   │   ├── [ 707]  models.cpython-35.pyc
│   │   │   ├── [2.2K]  test_bow2vw.cpython-35.pyc
│   │   │   ├── [3.2K]  test_bow.cpython-35.pyc
│   │   │   ├── [5.3K]  test_cloning.cpython-35.pyc
│   │   │   ├── [1.6K]  test_coocc.cpython-35.pyc
│   │   │   ├── [2.0K]  test_df.cpython-35.pyc
│   │   │   ├── [6.9K]  test_dump.cpython-35.pyc
│   │   │   ├── [1.7K]  test_enry.cpython-35.pyc
│   │   │   ├── [1.5K]  test_id2vec.cpython-35.pyc
│   │   │   ├── [ 10K]  test_id_embedding.cpython-35.pyc
│   │   │   ├── [1.3K]  test_join_bow.cpython-35.pyc
│   │   │   ├── [3.5K]  test_main.cpython-35.pyc
│   │   │   ├── [4.1K]  test_model2.cpython-35.pyc
│   │   │   ├── [1.1K]  test_pickleable_logger.cpython-35.pyc
│   │   │   ├── [2.4K]  test_repo2base.cpython-35.pyc
│   │   │   ├── [3.7K]  test_repo2coocc.cpython-35.pyc
│   │   │   ├── [4.3K]  test_repo2nbow.cpython-35.pyc
│   │   │   ├── [4.6K]  test_repo2source.cpython-35.pyc
│   │   │   ├── [1.6K]  test_repo2voccoocc.cpython-35.pyc
│   │   │   ├── [1.3K]  test_resolve_symlink.cpython-35.pyc
│   │   │   ├── [1.1K]  test_source2df.cpython-35.pyc
│   │   │   ├── [1.9K]  test_source.cpython-35.pyc
│   │   │   ├── [1.9K]  test_token_parser.cpython-35.pyc
│   │   │   └── [1.2K]  test_voccoocc.cpython-35.pyc
│   │   ├── [4.0K]  source
│   │   │   ├── [ 80K]  [email protected]
│   │   │   ├── [ 71K]  [email protected]
│   │   │   ├── [ 79K]  [email protected]
│   │   │   ├── [ 78K]  [email protected]
│   │   │   ├── [1.9K]  test_example.asdf
│   │   │   └── [  33]  test_example.py
│   │   ├── [4.0K]  swivel
│   │   │   ├── [ 15K]  col_sums.txt
│   │   │   ├── [ 27K]  col_vocab.txt
│   │   │   ├── [ 15K]  row_sums.txt
│   │   │   ├── [ 27K]  row_vocab.txt
│   │   │   ├── [ 10M]  shard-000-000.pb
│   │   │   └── [3.2M]  shard-000-000.pb.gz
│   │   ├── [ 959]  test_bigartm.py
│   │   ├── [1.8K]  test_bow2vw.py
│   │   ├── [2.0K]  test_bow.py
│   │   ├── [5.2K]  test_cloning.py
│   │   ├── [ 878]  test_coocc.py
│   │   ├── [1.3K]  test_df.py
│   │   ├── [6.0K]  test_dump.py
│   │   ├── [1.1K]  test_enry.py
│   │   ├── [1.0K]  test_id2vec.py
│   │   ├── [9.1K]  test_id_embedding.py
│   │   ├── [ 926]  test_join_bow.py
│   │   ├── [3.2K]  test_main.py
│   │   ├── [2.6K]  test_model2.py
│   │   ├── [ 541]  test_pickleable_logger.py
│   │   ├── [1.1K]  test_prox.py
│   │   ├── [5.7K]  test_repo2base.py
│   │   ├── [3.9K]  test_repo2coocc.py
│   │   ├── [3.7K]  test_repo2nbow.py
│   │   ├── [7.4K]  test_repo2source.py
│   │   ├── [3.1K]  test_repo2uast.py
│   │   ├── [ 983]  test_repo2voccoocc.py
│   │   ├── [  73]  test_repos_list.txt
│   │   ├── [ 993]  test_resolve_symlink.py
│   │   ├── [2.2K]  test_source2bow.py
│   │   ├── [ 710]  test_source2df.py
│   │   ├── [1.6K]  test_source.py
│   │   ├── [1.4K]  test_token_parser.py
│   │   ├── [2.1K]  test_topics.py
│   │   ├── [ 909]  test_uast.py
│   │   ├── [ 598]  test_voccoocc.py
│   │   ├── [ 36K]  topics.asdf
│   │   ├── [702K]  topics_readable.txt
│   │   ├── [333K]  uast.asdf
│   │   └── [ 88K]  voccoocc.asdf
│   ├── [2.0K]  token_parser.py
│   ├── [3.8K]  topics.py
│   ├── [1.3K]  uast_ids_to_bag.py
│   ├── [3.2K]  uast.py
│   ├── [1.2K]  voccoocc.py
│   └── [1.0K]  vw_dataset.py
├── [4.0K]  ast2vec.egg-info
│   ├── [   1]  dependency_links.txt
│   ├── [  51]  entry_points.txt
│   ├── [ 974]  PKG-INFO
│   ├── [ 229]  requires.txt
│   ├── [1.0K]  SOURCES.txt
│   └── [   8]  top_level.txt
├── [ 24M]  bow_matplotlib.asdf
├── [124M]  decorr_readable.txt.gz
├── [4.0K]  dist
│   ├── [ 21K]  ast2vec-0.1.0a0.tar.gz
│   ├── [ 23K]  ast2vec-0.1.1a0.tar.gz
│   ├── [ 23K]  ast2vec-0.1.2a0.tar.gz
│   ├── [ 36K]  ast2vec-0.2.0a0.tar.gz
│   ├── [ 36K]  ast2vec-0.2.1a0.tar.gz
│   ├── [ 38K]  ast2vec-0.2.2a0.tar.gz
│   ├── [ 38K]  ast2vec-0.2.3a0.tar.gz
│   ├── [ 38K]  ast2vec-0.2.4a0.tar.gz
│   └── [ 38K]  ast2vec-0.2.5a0.tar.gz
├── [4.0K]  doc
│   ├── [1.9K]  ast2vec.rst
│   ├── [2.5K]  ast2vec.tests.rst
│   ├── [4.0K]  _build
│   │   ├── [4.0K]  doctrees
│   │   │   ├── [263K]  ast2vec.doctree
│   │   │   ├── [135K]  ast2vec.tests.doctree
│   │   │   ├── [3.5M]  environment.pickle
│   │   │   ├── [5.1K]  index.doctree
│   │   │   └── [2.5K]  modules.doctree
│   │   └── [4.0K]  html
│   │       ├── [ 87K]  ast2vec.html
│   │       ├── [ 52K]  ast2vec.tests.html
│   │       ├── [ 36K]  genindex.html
│   │       ├── [5.9K]  index.html
│   │       ├── [4.0K]  _modules
│   │       │   ├── [4.0K]  ast2vec
│   │       │   │   ├── [8.0K]  coocc.html
│   │       │   │   ├── [9.6K]  df.html
│   │       │   │   ├── [9.8K]  dump.html
│   │       │   │   ├── [ 14K]  enry.html
│   │       │   │   ├── [8.6K]  id2vec.html
│   │       │   │   ├── [ 57K]  id_embedding.html
│   │       │   │   ├── [8.2K]  meta.html
│   │       │   │   ├── [ 36K]  model.html
│   │       │   │   ├── [ 13K]  nbow.html
│   │       │   │   ├── [ 26K]  publish.html
│   │       │   │   ├── [ 63K]  repo2base.html
│   │       │   │   ├── [ 21K]  repo2coocc.html
│   │       │   │   ├── [ 18K]  repo2nbow.html
│   │       │   │   ├── [ 81K]  swivel.html
│   │       │   │   ├── [4.0K]  tests
│   │       │   │   │   ├── [6.5K]  fake_requests.html
│   │       │   │   │   ├── [8.7K]  test_coocc.html
│   │       │   │   │   ├── [10.0K]  test_df.html
│   │       │   │   │   ├── [ 24K]  test_dump.html
│   │       │   │   │   ├── [ 11K]  test_enry.html
│   │       │   │   │   ├── [8.5K]  test_id2vec.html
│   │       │   │   │   ├── [ 40K]  test_id_embedding.html
│   │       │   │   │   ├── [ 12K]  test_main.html
│   │       │   │   │   ├── [ 32K]  test_model.html
│   │       │   │   │   ├── [9.6K]  test_nbow.html
│   │       │   │   │   ├── [ 21K]  test_publish.html
│   │       │   │   │   ├── [ 15K]  test_repo2coocc.html
│   │       │   │   │   └── [ 13K]  test_repo2nbow.html
│   │       │   │   └── [5.1K]  tests.html
│   │       │   └── [4.6K]  index.html
│   │       ├── [5.0K]  modules.html
│   │       ├── [1.9K]  objects.inv
│   │       ├── [9.2K]  py-modindex.html
│   │       ├── [3.2K]  search.html
│   │       ├── [ 14K]  searchindex.js
│   │       ├── [4.0K]  _sources
│   │       │   ├── [1.9K]  ast2vec.rst.txt
│   │       │   ├── [2.5K]  ast2vec.tests.rst.txt
│   │       │   ├── [ 375]  index.rst.txt
│   │       │   └── [  58]  modules.rst.txt
│   │       └── [4.0K]  _static
│   │           ├── [ 673]  ajax-loader.gif
│   │           ├── [ 10K]  alabaster.css
│   │           ├── [ 10K]  basic.css
│   │           ├── [ 756]  comment-bright.png
│   │           ├── [ 829]  comment-close.png
│   │           ├── [ 641]  comment.png
│   │           ├── [  42]  custom.css
│   │           ├── [8.0K]  doctools.js
│   │           ├── [ 202]  down.png
│   │           ├── [ 222]  down-pressed.png
│   │           ├── [ 286]  file.png
│   │           ├── [258K]  jquery-3.1.0.js
│   │           ├── [ 84K]  jquery.js
│   │           ├── [  90]  minus.png
│   │           ├── [  90]  plus.png
│   │           ├── [4.1K]  pygments.css
│   │           ├── [ 25K]  searchtools.js
│   │           ├── [ 34K]  underscore-1.3.1.js
│   │           ├── [ 12K]  underscore.js
│   │           ├── [ 203]  up.png
│   │           ├── [ 214]  up-pressed.png
│   │           └── [ 25K]  websupport.js
│   ├── [5.0K]  conf.py
│   ├── [4.0K]  Doc
│   │   └── [3.2K]  how_to_use_ast2vec.ipynb
│   ├── [ 375]  index.rst
│   ├── [ 850]  Makefile
│   ├── [  58]  modules.rst
│   └── [4.0K]  _static
├── [ 23M]  docfreq_1MM_serial.asdf
├── [1.2M]  docfreq_matplotlib.asdf
├── [9.5M]  enry
├── [2.3K]  gcs.json
├── [ 18K]  gimme
├── [1.0G]  id2vec_1MM_serial.asdf
├── [  33]  index.json
├── [ 11K]  LICENSE
├── [354M]  nbow_1MM_serial.asdf
├── [4.0K]  README.md
├── [ 214]  requirements.txt
├── [1.9K]  setup.py
├── [4.0K]  source
├── [ 20M]  token_docfreq.tsv.gz
└── [5.4K]  topic_modeling.md

27 directories, 283 files
1,7G	total

/usr/bin/time -v enry reports:

	User time (seconds): 1.08
	System time (seconds): 2.06
	Percent of CPU this job got: 72%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 0:04.31
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 4385928
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 108
	Minor (reclaiming a frame) page faults: 898379
	Voluntary context switches: 3938
	Involuntary context switches: 487
	Swaps: 0
	File system inputs: 3411248
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096
	Exit status: 0

The RSS is over 4 gigs! Wow! And my system freezes a bit.
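One possible explanation, which is an assumption and not confirmed in this report, is that whole files are read into memory before detection; that would be costly for the large .asdf files in this tree. A sketch of reading only a bounded prefix is shown below. `readFileHead` and the 16 KiB `headLimit` are hypothetical choices.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// headLimit is an arbitrary, hypothetical cap on how much of a file is read.
const headLimit = 16 * 1024

// readFileHead returns at most limit bytes from the start of the file, so
// memory use stays bounded no matter how large the file is.
func readFileHead(path string, limit int64) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	return io.ReadAll(io.LimitReader(f, limit))
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: readhead <file>")
		return
	}
	head, err := readFileHead(os.Args[1], headLimit)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("read %d bytes\n", len(head))
}
```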

Support (bare) git repositories

Right now enry works on checked-out repositories. However, linguist, the project it was ported from, does work on git repositories.

Advantages of supporting this include that files matching .gitignore are skipped and that it works on bare repositories. The git driver could be go-git or shelling out to git. The latter option would require a git binary, however.
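The shell-out option could be sketched as follows, assuming a git binary on PATH. The helper names (`lsTreeArgs`, `parseLsTree`, `listFiles`) are hypothetical. Listing the tree of a commit only returns committed files, so untracked files, including ignored ones, do not appear.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// lsTreeArgs builds the git arguments for listing every file of a revision.
// --git-dir lets git operate on a bare repository without a work tree, and
// -z separates paths with NUL bytes so unusual file names are not quoted.
func lsTreeArgs(gitDir, rev string) []string {
	return []string{"--git-dir=" + gitDir, "ls-tree", "-r", "--name-only", "-z", rev}
}

// parseLsTree splits NUL-separated output into file paths.
func parseLsTree(out string) []string {
	var files []string
	for _, name := range strings.Split(out, "\x00") {
		if name != "" {
			files = append(files, name)
		}
	}
	return files
}

// listFiles shells out to the git binary and returns the paths in rev.
func listFiles(gitDir, rev string) ([]string, error) {
	out, err := exec.Command("git", lsTreeArgs(gitDir, rev)...).Output()
	if err != nil {
		return nil, err
	}
	return parseLsTree(string(out)), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: lsfiles <path-to-bare-repository>")
		return
	}
	files, err := listFiles(os.Args[1], "HEAD")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```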

enry core

I use Java bindings (maven version 1.6.2).

Here is the log:

panic: runtime error: slice bounds out of range
goroutine 26 [running, locked to thread]:
bytes.Count(0x7fc5dc00e3c0, 0x47, 0x0, 0x1c4206fbc20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7fc5dc00e3c0, 0x47, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:166 +0xae
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x7fc5861f7c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x7fc585723aae, 0xc, 0x1c420024de0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x1c4206fbe48, 0x1c4209cc040)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x1c42012e1d8, 0x1c420499340)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7fc5dc00e810, 0x7fc6f492d879, 0x7fc5dc00e3c0, 0x47, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
Aborted (core dumped)

Sync with github/linguist

We synchronized with github/linguist in November 2017; an update is long overdue ;)

The latest enry, v1.6.7 from Oct 24, 2018, is based on Linguist v5.2.0 commit 4cd558 from Sep 17, 2017.

This is an ☂️ issue with the goal of making enry use at least Linguist v7.1.3 from Dec 12, 2018:

Results are not sorted

$ cd /path/to/go/git
$ enry
0.69%	Shell
0.34%	Markdown
0.34%	Makefile
98.28%	Go
0.34%	Text

Compare with linguist:

$ linguist
99.74%  Go
0.18%   Shell
0.09%   Makefile

I propose to sort the results by significance.
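Sorting by significance could be as small as the sketch below, which orders by descending percentage and breaks ties by name so the output is deterministic. The `share` type and `sortBySignificance` are hypothetical names; sort.Slice requires Go 1.8 or later.

```go
package main

import (
	"fmt"
	"sort"
)

// share is a hypothetical record pairing a language with its percentage.
type share struct {
	language string
	percent  float64
}

// sortBySignificance orders shares by descending percentage and breaks
// ties alphabetically so that repeated runs print the same order.
func sortBySignificance(shares []share) {
	sort.Slice(shares, func(i, j int) bool {
		if shares[i].percent != shares[j].percent {
			return shares[i].percent > shares[j].percent
		}
		return shares[i].language < shares[j].language
	})
}

func main() {
	shares := []share{
		{"Shell", 0.69}, {"Markdown", 0.34}, {"Makefile", 0.34}, {"Go", 98.28}, {"Text", 0.34},
	}
	sortBySignificance(shares)
	for _, s := range shares {
		fmt.Printf("%.2f%%\t%s\n", s.percent, s.language)
	}
}
```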

CLI documented in README

This might already be planned but I was reviewing the PR for the blog post and realised that when coming to the GH repository there is no documentation on the CLI commands (could just be the help output) in the README. Would be great to add before we announce (similar to pygments, most people will use it initially as a CLI).

fails to build: incorrect usage of internal package

go get gopkg.in/src-d/enry.v1/...
package gopkg.in/src-d/enry.v1/internal/code-generator
	imports gopkg.in/src-d/simple-linguist.v1/internal/code-generator/generator: use of internal package not allowed
go get gopkg.in/src-d/enry.v1/cli/enry
package gopkg.in/src-d/enry.v1/cli/enry: cannot find package "gopkg.in/src-d/enry.v1/cli/enry" in any of:
	/usr/lib/go-1.8/src/gopkg.in/src-d/enry.v1/cli/enry (from $GOROOT)
	/home/sourced/Projects/ast2vec/enry/src/gopkg.in/src-d/enry.v1/cli/enry (from $GOPATH)

This works after symlinking github.com/src-d/enry to gopkg.in/src-d/enry.v1

go get github.com/src-d/enry/cli/enry

Severe performance degradation

Something happened and now enry works even slower than linguist. Seriously, testing on src-d/ast2vec:

time ./enry .
{
  "Python": [
    "ast2vec/__init__.py",
    "ast2vec/__main__.py",
    "ast2vec/df.py",
    "ast2vec/dump.py",
    "ast2vec/enry.py",
    "ast2vec/id2vec.py",
    "ast2vec/id_embedding.py",
    "ast2vec/meta.py",
    "ast2vec/model.py",
    "ast2vec/nbow.py",
    "ast2vec/publish.py",
    "ast2vec/repo2base.py",
    "ast2vec/repo2coocc.py",
    "ast2vec/repo2nbow.py",
    "ast2vec/swivel.py",
    "ast2vec/tests/__init__.py",
    "ast2vec/tests/models.py",
    "ast2vec/tests/test_dump.py",
    "ast2vec/tests/test_enry.py"
  ],
  "Text": [
    "requirements.txt"
  ]
}

real	0m14.339s
user	0m10.740s
sys	0m1.972s
time linguist
100.00% Python

real	0m2.192s
user	0m2.016s
sys	0m0.104s

Question: GetLanguageByContent

Hi,

I'd like to use your library to detect the programming language from user input. So I won't have a file extension. I only have a string with source code. Can I achieve my goal using this project?

Thanks.

Nikolay

The results differ from linguist much

sourced@sourced-MacBookPro:/tmp$ git clone https://github.com/src-d/enry &>/dev/null
sourced@sourced-MacBookPro:/tmp$ cd enry
sourced@sourced-MacBookPro:/tmp/enry$ time linguist
99.25%  Go
0.35%   Shell
0.24%   Java
0.08%   Ruby
0.05%   Makefile
0.01%   Scala
0.01%   Gnuplot

real    0m1.945s
user    0m1.848s
sys    0m0.056s
sourced@sourced-MacBookPro:/tmp/enry$ time /home/sourced/Projects/ast2vec/enry 
3.28%    Makefile
63.93%    Go
9.84%    CSV
6.56%    Shell
1.64%    Gnuplot
1.64%    Text
3.28%    Ruby
3.28%    Scala
6.56%    Java

real    0m0.084s
user    0m0.072s
sys    0m0.008s

Problem with symlinks to folders

Description

If you have unusual symlinks in your directory, enry cannot handle them and prints errors.

Examples

Assume we are in some directory and there is no target subdirectory.

mkdir temp
cd temp
ln -s ../temp/ tmp
ln -s ../target tmp1
ln -s tmp2 tmp2
cd ..

ll temp shows

total 24
lrwxr-xr-x  1 k  staff     8B Jul 27 11:04 tmp -> ../temp/
lrwxr-xr-x  1 k  staff     9B Jul 27 11:04 tmp1 -> ../target
lrwxr-xr-x  1 k  staff     4B Jul 27 11:04 tmp2 -> tmp2

Calling enry temp then shows

2017/07/27 11:05:34 read /Users/k/work/rep/ast2vec/temp/tmp: is a directory
2017/07/27 11:05:34 open /Users/k/work/rep/ast2vec/temp/tmp1: no such file or directory
2017/07/27 11:05:34 open /Users/k/work/rep/ast2vec/temp/tmp2: too many levels of symbolic links

So, there are some problems with symlink handling.

Questions and suggestions:

  • Can you fix it?
  • Sometimes it is better to just ignore symlinks altogether.
  • Maybe you can add a flag for that?
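Ignoring symlinks could be sketched with filepath.Walk, which does not follow symbolic links and reports each link with os.ModeSymlink set. The `regularFiles` and `makeExample` helpers are hypothetical; `makeExample` recreates the three links from the example in a temporary directory and adds one regular file.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// regularFiles walks root and returns regular files only. Symbolic links are
// skipped before any attempt to open them, so links to directories, dangling
// links and self-referencing links produce no errors.
func regularFiles(root string) ([]string, error) {
	var files []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if info.Mode()&os.ModeSymlink != 0 {
			return nil
		}
		if info.Mode().IsRegular() {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

// makeExample builds a temporary directory with the three kinds of links
// from the report plus one regular file, and returns its path.
func makeExample() (string, error) {
	root, err := os.MkdirTemp("", "symlinks")
	if err != nil {
		return "", err
	}
	links := map[string]string{
		"tmp":  "../" + filepath.Base(root) + "/", // link back to the directory
		"tmp1": "../target",                       // dangling link
		"tmp2": "tmp2",                            // link to itself
	}
	for name, target := range links {
		if err := os.Symlink(target, filepath.Join(root, name)); err != nil {
			return "", err
		}
	}
	return root, os.WriteFile(filepath.Join(root, "main.go"), []byte("package main\n"), 0o644)
}

func main() {
	root, err := makeExample()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(root)
	files, err := regularFiles(root)
	fmt.Println(files, err)
}
```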

cli usage message is wrong

Flags must be provided before the path

$ enry -h
enry, A simple (and faster) implementation of github/linguist 
usage: enry <path>
       enry <path> [-json] [-breakdown]
       enry [-json] [-breakdown]
$ enry internal 
100.00%	Go
$ enry internal -json
100.00%	Go
$ enry -json internal
{"Go":["code-generator/generator/aliases.go","code-generator/generator/documentation.go","code-generator/generator/extensions.go","code-generator/generator/filenames.go","code-generator/generator/generator.go","code-generator/generator/generator_test.go","code-generator/generator/heuristics.go","code-generator/generator/interpreters.go","code-generator/generator/langinfo.go","code-generator/generator/linguist-commit.go","code-generator/generator/samplesfreq.go","code-generator/generator/types.go","code-generator/generator/vendor.go","code-generator/main.go","tokenizer/tokenize.go","tokenizer/tokenize_test.go"]}%
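The behaviour is consistent with Go's standard flag package, which stops parsing at the first non-flag argument, assuming the CLI uses that package (the report does not say so). The sketch below reproduces it with a hypothetical `parse` helper and a single -json flag.

```go
package main

import (
	"flag"
	"fmt"
)

// parse mimics a CLI with a boolean -json flag followed by an optional path.
// The flag package stops at the first non-flag argument, so anything after
// the path is returned as a positional argument, not treated as a flag.
func parse(args []string) (jsonOut bool, rest []string, err error) {
	fs := flag.NewFlagSet("enry", flag.ContinueOnError)
	fs.BoolVar(&jsonOut, "json", false, "print JSON output")
	err = fs.Parse(args)
	return jsonOut, fs.Args(), err
}

func main() {
	for _, args := range [][]string{{"-json", "internal"}, {"internal", "-json"}} {
		jsonOut, rest, _ := parse(args)
		fmt.Printf("%v -> json=%v rest=%v\n", args, jsonOut, rest)
	}
}
```

A usage message that matches this behaviour would list the flags before the path.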

Slice out of range error

When calling the GetLanguage method, we sometimes receive this error:

panic: runtime error: slice bounds out of range
goroutine 17 [running, locked to thread]:
bytes.Count(0x7f627818c0d0, 0x4b, 0x0, 0x1c420038c20, 0x1, 0x20, 0x0)
/home/travis/.gimme/versions/go1.8.linux.amd64/src/bytes/bytes.go:62 +0x21d
gopkg.in/src-d/enry%2ev1.getHeaderAndFooter(0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:161 +0x9d
gopkg.in/src-d/enry%2ev1.GetLanguagesByModeline(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x7f62482e6c08, 0x0, 0x0, 0x0, 0x0, ...)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:142 +0x5c
gopkg.in/src-d/enry%2ev1.GetLanguages(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0, 0x0)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:126 +0x129
gopkg.in/src-d/enry%2ev1.GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x1c420038e48, 0x1c42001a500)
/home/travis/gopath/src/gopkg.in/src-d/enry.v1/common.go:38 +0x55
main.GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0)
/home/travis/build/src-d/enry/shared/enry.go:11 +0x55
main._cgoexpwrap_f7db11756761_GetLanguage(0x7f627818f4a0, 0x0, 0x7f627818c0d0, 0x4b, 0x0, 0x0, 0x0)
command-line-arguments/_obj/_cgo_gotypes.go:58 +0x9a
