Comments (4)
Hi @daveymathijssen,
Thank you for your interest in our work!
The C# extractor was contributed by researchers from Microsoft, so I am not sure why they removed numbers.
I might have removed numbers in the JavaExtractor as well; I'm not sure.
But if it improves your metrics when you do not remove the numbers, that's great!
Best,
Uri
from code2vec.
Hi @urialon,
Thanks for your answer!
In the JavaExtractor, the same is happening in the normalizeName method.
Do you have any clue why?
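For readers following along, here is a minimal sketch of the effect being asked about. It is not the actual `normalizeName` code from the JavaExtractor; the function `normalize_name` and its `keep_digits` flag are hypothetical and only assume that normalization lowercases a name and drops every character outside an allowed set.

```python
import re


def normalize_name(name: str, keep_digits: bool = False) -> str:
    """Hypothetical normalizer: lowercase, then drop disallowed characters.

    With keep_digits=False only the letters a-z survive, so digits vanish.
    """
    disallowed = r"[^a-z0-9]" if keep_digits else r"[^a-z]"
    return re.sub(disallowed, "", name.lower())


if __name__ == "__main__":
    # Digits stripped: distinct names can collapse into the same token.
    print(normalize_name("md5Digest"), normalize_name("md4Digest"))
    # Digits kept: the two names stay distinct.
    print(normalize_name("md5Digest", keep_digits=True),
          normalize_name("md4Digest", keep_digits=True))
```

Under these assumptions, stripping digits shrinks the vocabulary but makes names that differ only by a number indistinguishable, which may explain why keeping them can change the metrics.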
I'm guessing that at the time, we did not want to spend embeddings on numbers, as we already had a ~1M-entry embedding vocabulary.
This is solved in newer models by segmenting tokens into subwords.
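As a rough illustration of why subword segmentation helps, the sketch below splits identifiers into lowercase pieces and keeps digit runs as their own pieces. It is a simple rule-based stand-in, not the learned subword tokenizers (such as BPE) that newer models typically use, and the helper name `split_subtokens` is hypothetical.

```python
import re
from typing import List

# Acronym runs, ordinary words, or digit runs; anything else is a separator.
_SUBTOKEN = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_subtokens(identifier: str) -> List[str]:
    """Split an identifier into lowercase subtokens, keeping digits."""
    return [part.lower() for part in _SUBTOKEN.findall(identifier)]


if __name__ == "__main__":
    names = ["sha256Digest", "md5Digest", "parseHTTP2Response"]
    vocab = set()
    for name in names:
        pieces = split_subtokens(name)
        vocab.update(pieces)
        print(name, "->", pieces)
    # Shared pieces such as "digest" are stored once, so numbers add only
    # a few small entries instead of one entry per full token.
    print(sorted(vocab))
```

Because the vocabulary is built from pieces rather than whole tokens, a number costs at most one small entry, so there is less pressure to discard it.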
You can also check out our newer models
- code2seq: https://code2seq.org/
- PolyCoder: https://github.com/VHellendoorn/Code-LMs
- CodeBERT, which we fine-tuned on multiple languages: https://github.com/neulab/code-bert-score#backend-model
Best,
Uri
Thank you for your fast responses and insights!