emilyalsentzer / clinicalbert
Repository for Publicly Available Clinical BERT Embeddings
License: MIT License
I don't see the BERT tokenizers utilized in the code for the MIMIC fine tuning (seems to be that these have a custom tokenizer code), and this doesn't appear to do the WordPiece tokenization used in the rest of BERT. You do appear to use the BERT-based tokenizer for the MedNLI task. Please clarify.
Hi there, you're honestly doing God's work here by sharing this on Hugging Face.
I am, however, very confused about how to use this tool appropriately. I was originally trying to tokenize sentences with the ClinicalBERT model trained on discharge summaries, to see whether it could recognize similar medical terms and group them together, or return high-similarity words. So far, it seems like the base BERT performs better. Would there ever be a world where your work gets extended into an STS-B type of model?
Hello,
I would like to know how feature vectors can be generated from a pandas Series containing notes.
The notes for each subject ID are combined into one note and preprocessed according to my requirements. Now I just want to create embedding vectors for the notes. How can this be done?
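A minimal sketch of one way to do this with the Hugging Face transformers library, assuming the notes are a pandas Series (or list) of strings and a recent transformers release is installed. The helper names batched and embed_notes, the batch size, the mean pooling over tokens, and the truncation to 128 tokens are illustrative choices, not something prescribed by the repository.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def embed_notes(notes, model_name="emilyalsentzer/Bio_ClinicalBERT",
                batch_size=8, max_length=128):
    """Return one mean-pooled vector per note (hypothetical helper).

    `notes` is assumed to be a pandas Series or a list of strings.
    """
    # Heavy dependencies are imported lazily so the module loads without them.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()

    vectors = []
    for batch in batched(list(notes), batch_size):
        enc = tokenizer(batch, padding=True, truncation=True,
                        max_length=max_length, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state  # (batch, tokens, hidden)
        # Average over real tokens only; padding positions have mask 0.
        mask = enc["attention_mask"].unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
        vectors.extend(pooled.tolist())
    return vectors
```

With a Series called notes, embed_notes(notes) would return a list with one vector per note. Anything beyond max_length tokens is truncated, so long combined notes may need to be split into pieces first.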
It looks like the model failed to load from the checkpoint. Any hints? Thanks.
I0731 15:33:06.451731 140067755902720 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /home/ec2-user/robin/clinicalBERT/output/model/model.ckpt.
2022-07-31 15:33:31.584083: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2022-07-31 15:33:31.795626: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_ids. Can't parse serialized Example.
2022-07-31 15:33:31.795633: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_mask. Can't parse serialized Example.
2022-07-31 15:33:31.795777: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_ids. Can't parse serialized Example.
2022-07-31 15:33:31.795937: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_ids. Can't parse serialized Example.
2022-07-31 15:33:31.795937: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_mask. Can't parse serialized Example.
2022-07-31 15:33:31.796025: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: masked_lm_ids. Can't parse serialized Example.
2022-07-31 15:33:31.796329: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: masked_lm_positions. Can't parse serialized Example.
2022-07-31 15:33:31.796348: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_ids. Can't parse serialized Example.
2022-07-31 15:33:31.796660: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_mask. Can't parse serialized Example.
2022-07-31 15:33:31.796796: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: segment_ids. Can't parse serialized Example.
2022-07-31 15:33:31.796889: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_mask. Can't parse serialized Example.
2022-07-31 15:33:31.796975: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: masked_lm_weights. Can't parse serialized Example.
2022-07-31 15:33:31.797047: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: masked_lm_positions. Can't parse serialized Example.
2022-07-31 15:33:31.797134: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_ids. Can't parse serialized Example.
2022-07-31 15:33:31.797213: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: masked_lm_positions. Can't parse serialized Example.
2022-07-31 15:33:31.797291: W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at example_parsing_ops.cc:240 : Invalid argument: Key: input_ids. Can't parse serialized Example.
ERROR:tensorflow:Error recorded from training_loop: 2 root error(s) found.
(0) Invalid argument: Key: input_ids. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseSingleExample}}]]
[[IteratorGetNext]]
(1) Invalid argument: Key: input_ids. Can't parse serialized Example.
[[{{node ParseSingleExample/ParseSingleExample}}]]
[[IteratorGetNext]]
[[IteratorGetNext/_4973]]
In your script at https://github.com/EmilyAlsentzer/clinicalBERT/blob/master/lm_pretraining/create_pretraining_data.py,
do_lower_case is actually set to True.
So I loaded the model. When I checked your vocabulary, I saw that it is a mix of cased and uncased words, since you inherit it from BioBERT. However, when I used your tokenizer to tokenize a sentence, I found that words are lowercased.
Do you mind clarifying this a bit? Thanks a lot.
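One quick way to check whether a vocabulary is cased is to count the entries that contain uppercase letters. The sketch below uses a toy vocabulary standing in for the lines of the real vocab.txt; cased_fraction, is_special and the sample entries are hypothetical. If a noticeable share of entries is cased, lowercasing the input (do_lower_case=True) makes those entries unreachable.

```python
def is_special(token):
    """Special tokens such as [CLS] are handled separately by the tokenizer."""
    return token.startswith("[") and token.endswith("]")


def cased_fraction(vocab):
    """Fraction of non-special entries that contain an uppercase character."""
    words = [token for token in vocab if not is_special(token)]
    if not words:
        return 0.0
    cased = sum(1 for token in words if any(ch.isupper() for ch in token))
    return cased / len(words)


# Toy stand-in for the lines of a vocab.txt file (assumed contents).
toy_vocab = ["[CLS]", "[SEP]", "the", "The", "patient", "Patient", "##ing", "MRI"]

print(f"{cased_fraction(toy_vocab):.2f} of entries contain uppercase letters")

# After lowercasing the input, only all-lowercase entries can still be matched.
reachable = [token for token in toy_vocab if token == token.lower()]
print(reachable)
```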
Hi Emily,
After acquiring access to the MIMIC-III database, I preprocessed the data following your procedure (i.e. format_mimic_for_BERT.py).
However, I am not confident about the results below. Are they correct?
(after format_mimic_for_BERT.py)
Thanks
Young-Jun
Thanks for making Clinical BERT publicly available. The paper says: "We train and publicly release BERT-Base and BioBERT-finetuned models trained on both all clinical notes and only discharge summaries".
But only the BioBERT-finetuned models are available. When will you release the BERT-Base fine-tuned models?
I am getting this error
Weights of BertForMultiLable not initialized from pretrained model: ['classifier.weight', 'classifier.bias']
Weights from pretrained model not used in BertForMultiLable: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias']
while using the pretrained BioBERT provided here in this repo.
Here is the issue for reference.
Is there some issue with the model?
There are two bugs in the sent_tokenize_rules function in heuristic_tokenize.py.
We have not fixed them in this repo because we want to maintain the reproducibility of our code at the time the work was published. However, anyone wanting to extend this work should make the following changes in heuristic_tokenize.py:
1. The . in the regex should be replaced with \. i.e. the line should be while re.search('\n\s*%d\.'%n,segment):
2. Add an else branch (else: new_segments.append(segments[i])) to the if statement at line 287, if (i == N-1) or is_title(segments[i+1]):
This fixes a bug where lists that have a title header will lose their first entry.

Hello @EmilyAlsentzer,
This is a great contribution to the open source community! I have read your paper thoroughly: https://www.aclweb.org/anthology/W19-1909.pdf
I have a few questions:
I would love to try out both clinicalBERT and BioBERT on a few downstream tasks (disease identification), but I do not have much training data (in fact, none). Could you please point me to open-source data repositories that already provide a notes --> disease mapping?
I see you used the typical BERT pretraining approach (MLM), but I would like to explore other pretraining strategies, such as Replaced Token Detection from ELECTRA.
I also see that you used the MIMIC-III dataset for pretraining. I don't have access to this dataset to evaluate on. What datasets would you suggest for pretraining?
I would also love to try new transformer variants (larger ones, low-parameter ones) and do multitask learning, so datasets (de-ID, non-PHI, sufficient) seem to be the bottleneck. How can I overcome this?
I would open-source all my work in PyTorch if I could find a tangible data source. Please let me know. Thanks!
Hi Emily,
I still have one question.
For your model pretrained from BioBERT, which version of BioBERT are you using?
According to https://github.com/naver/biobert-pretrained, there are 4 versions.
Thank you : )
Hi,
run_classifier.py is looking for a JSON Lines file, mli_train_v1.jsonl.
How can I get or construct this file?
I tried downloading with my preferred downloaders, axel and aria2, on Ubuntu, but they download a blank file. Why is that?
I can't seem to reproduce your results on MedNLI with the two released models using the same hyperparameters presented in your paper's Appendix B. You reported 84-85%, but I can only get to 81-82% on the test set. Do you know why? Are the reported results on the dev set or the test set? If relevant, I'm using the pytorch-pretrained-bert repo.
Hello,
this looks like a great piece of work, thank you for making it available. I tried to explore clinicalBERT for some NER tasks using the transformers library. I can obtain a list of token index results from torch.argmax, but I cannot find a suitable set of labels (predictions contains values as large as 687). What am I doing wrong?
from transformers import AutoTokenizer, AutoModel
import torch
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
label_list = ["B-IDNUM", "I-IDNUM", "B-HOSPITAL", "I-HOSPITAL", 'B-PATIENT', 'I-PATIENT', 'B-PHONE', 'I-PHONE',
'B-DATE', 'I-DATE', 'B-DOCTOR', 'I-DOCTOR', 'B-LOCATION-OTHER', 'I-LOCATION-OTHER', 'B-AGE', 'I-AGE', 'B-BIOID', 'I-BIOID',
'B-STATE', 'I-STATE','B-ZIP', 'I-ZIP', 'B-HEALTHPLAN', 'I-HEALTHPLAN', 'B-ORGANIZATION', 'I-ORGANIZATION',
'B-MEDICALRECORD', 'I-MEDICALRECORD', 'B-CITY', 'I-CITY', 'B-STREET', 'I-STREET', 'B-COUNTRY', 'I-COUNTRY',
'B-URL', 'I-URL',
'B-USERNAME', 'I-USERNAME', 'B-PROFESSION', 'I-PROFESSION', 'B-FAX', 'I-FAX', 'B-EMAIL', 'I-EMAIL', 'B-DEVICE', 'I-DEVICE',
'O', "X", "[CLS]", "[SEP]"]
sequence = "Patient had severe headache and took two Aspirine."
# Bit of a hack to get the tokens with the special tokens
tokens = tokenizer.tokenize(tokenizer.decode(tokenizer.encode(sequence)))
inputs = tokenizer.encode(sequence, return_tensors="pt")
outputs = model(inputs)[0]
predictions = torch.argmax(outputs, dim=2)
print([(token, label_list[prediction]) for token, prediction in zip(tokens, predictions[0].tolist())])
Lars
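A likely explanation is that AutoModel loads the bare encoder, whose first output holds one vector of hidden size 768 per token, so argmax over the last dimension returns a hidden-unit index (0 to 767) rather than a label id. The dependency-free sketch below illustrates the shape issue with random numbers; the "head" is a random linear layer standing in for a trained token-classification head, and all names are illustrative.

```python
import random

HIDDEN_SIZE = 768  # hidden size of a BERT-base encoder
NUM_LABELS = 50    # length of the label_list in the question
NUM_TOKENS = 12    # arbitrary sequence length for the illustration

random.seed(0)


def argmax(values):
    """Index of the largest element in a list."""
    return max(range(len(values)), key=values.__getitem__)


# Stand-in for the encoder output: one hidden vector per token.
hidden_states = [[random.gauss(0.0, 1.0) for _ in range(HIDDEN_SIZE)]
                 for _ in range(NUM_TOKENS)]

# argmax over the hidden dimension: indices can be anywhere in 0..767.
hidden_argmax = [argmax(vec) for vec in hidden_states]

# A classification head projects each hidden vector to NUM_LABELS logits.
# Random weights here; a real head must be trained on labeled NER data.
head = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN_SIZE)]
        for _ in range(NUM_LABELS)]
logits = [[sum(w * h for w, h in zip(row, vec)) for row in head]
          for vec in hidden_states]
label_ids = [argmax(row) for row in logits]

print(max(hidden_argmax), max(label_ids))
```

In transformers, AutoModelForTokenClassification adds such a head, but for this checkpoint it would be randomly initialized and needs fine-tuning on labeled data before its predictions mean anything.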
Since ALBERT has SOTA performance + it is much smaller in model size, I was wondering if you planned on retraining based on ALBERT rather than vanilla BERT.
I tried using this model with HuggingFace's transformers.pipeline to establish a baseline for NER on some data that I have, but I ran into index errors because the id2label dictionary in the model's config currently has only 2 labels, {0: 'LABEL_0', 1: 'LABEL_1'}. Do you have a full set of labels, or should I go about getting these predictions in another way?
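The two generic labels are configuration defaults: the released checkpoint is a pretrained language model and does not ship a trained NER head. A sketch of building the label maps one would supply when fine-tuning; the label names are placeholders, and the commented call shows where they would be passed, assuming transformers is installed.

```python
# Placeholder label set in BIO format; replace with the labels of your task.
labels = ["O", "B-PROBLEM", "I-PROBLEM", "B-TREATMENT", "I-TREATMENT"]

id2label = {i: label for i, label in enumerate(labels)}
label2id = {label: i for i, label in id2label.items()}

# With transformers installed, the maps can be passed when creating the model.
# The classification head is then randomly initialized and must be trained:
#
#   from transformers import AutoModelForTokenClassification
#   model = AutoModelForTokenClassification.from_pretrained(
#       "emilyalsentzer/Bio_ClinicalBERT",
#       num_labels=len(labels), id2label=id2label, label2id=label2id)

print(id2label)
```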
Thanks for the great repo. I tested the preprocessing script. It processes about 100 notes per minute, which leads to a total ETA of 15 days. Do you have any ideas for speeding this up, or did you spend a similar amount of time?
I have converted the scripts from TF1 to TF2 and I'm trying to use one of your pretrained models to initialize my own pretraining. It throws a "Key bert/embeddings/layer_normalization/beta not found in checkpoint" error. I understand that this error is caused by one of the functions that changed in TensorFlow v2, where tf.contrib.layers.layer_norm(inputs=input_tensor, begin_norm_axis=-1, begin_params_axis=-1, scope=name) is replaced by tf.keras.layers.LayerNormalization(axis=-1)(input_tensor). By the way, this is in model.py, line 364.
Without init_checkpoint, everything works fine.
Therefore, I would like to check: have you built any model using TensorFlow v2 with the above LayerNormalization change?
I am just trying masked word prediction with the pretrained Bio_ClinicalBERT, but instead of English words I am getting Chinese or Japanese tokens.
Below is my code:
import torch
from transformers import BertTokenizer, BertForMaskedLM

bio_bert_tokenizer = BertTokenizer.from_pretrained('Bio_ClinicalBERT')
bio_bert_model = BertForMaskedLM.from_pretrained('Bio_ClinicalBERT').eval()
# encode, decode, text_sentence, top_k and top_clean are defined elsewhere in the poster's script
input_ids, mask_idx = encode(bio_bert_tokenizer, text_sentence)
with torch.no_grad():
    # was bert_model(input_ids): scores from a different model, decoded with this tokenizer, give unrelated tokens
    predict = bio_bert_model(input_ids)[0]
bio_bert = decode(bio_bert_tokenizer, predict[0, mask_idx, :].topk(top_k).indices.tolist(), top_clean)
print(bio_bert)
And the output:
I have downloaded all the model files from https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT/tree/main
I don't know whether I am making a naive mistake; please excuse me, as I am new to the transformers library.
Would you have an updated requirements.txt? Most of the modules are not found:
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
When I am trying to get started with the model "emilyalsentzer/Bio_ClinicalBERT" using the model card at huggingface and this code
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
I get the following error:
Traceback (most recent call last):
File "test.py", line 4, in <module>
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
File "/Users/Lukas/miniconda3/envs/nlp/lib/python3.7/site-packages/transformers/tokenization_auto.py", line 124, in from_pretrained
"'xlm', 'roberta', 'ctrl'".format(pretrained_model_name_or_path))
ValueError: Unrecognized model identifier in emilyalsentzer/Bio_ClinicalBERT. Should contains one of 'bert', 'openai-gpt', 'gpt2', 'transfo-xl', 'xlnet', 'xlm', 'roberta', 'ctrl'
I would appreciate any help regarding this.
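The wording of the error suggests that this release of transformers picked the tokenizer class by looking for a lowercase architecture name inside the identifier. Assuming that check is a case-sensitive substring test, the sketch below shows why an identifier that spells BERT in capitals would not match; guess_architecture is a hypothetical re-creation, not the library's code. If that is the cause, upgrading transformers to a release that reads the model type from the model's config.json should resolve it.

```python
# Names copied from the error message above.
KNOWN = ["bert", "openai-gpt", "gpt2", "transfo-xl", "xlnet", "xlm", "roberta", "ctrl"]


def guess_architecture(identifier):
    """Hypothetical case-sensitive substring dispatch; None if nothing matches."""
    for name in KNOWN:
        if name in identifier:
            return name
    return None


print(guess_architecture("emilyalsentzer/Bio_ClinicalBERT"))
print(guess_architecture("bert-base-cased"))
```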
For MedNLI, it seems as though you had used tokenizer = BertTokenizer.from_pretrained(args.bert_model, do_lower_case=args.do_lower_case). Is it correct to say that the bert tokenizer you used for MedNLI is bert-base-cased as opposed to scispacy? If so, what is the thinking behind this?
Hi @EmilyAlsentzer,
I tried to extract features as you suggested but ran into a problem. When I run the original BERT example below, everything works fine.
echo 'Who was Jim Henson ? ||| Jim Henson was a puppeteer' > /tmp/input.txt
python extract_features.py \
--input_file=/tmp/input.txt \
--output_file=/tmp/output.jsonl \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--layers=-1,-2,-3,-4 \
--max_seq_length=128 \
--batch_size=8
I changed the bert_config_file and init_checkpoint arguments and ran the command below.
python extract_features.py --input_file=/tmp/input.txt --output_file=/tmp/output.jsonl --vocab_file=bert_pretrain_output_all_notes_150000/vocab.txt --bert_config_file=bert_pretrain_output_all_notes_150000/bert_config.json --init_checkpoint=bert_pretrain_output_all_notes_150000/model.ckpt --layers=-1,-2,-3,-4 --max_seq_length=128
I got the error message below. I think the problem is with the init_checkpoint argument; I tried different names like "model.ckpt" and "model.ckpt-150000", but none of them work.
tensorflow.python.framework.errors_impl.NotFoundError: Unsuccessful TensorSliceReader constructor: Failed to find any matching files for bert_pretrain_output_all_notes_150000/model.ckpt
So could you please help me to run ClinicalBert to extract features from clinical notes?
Also is it possible to use ClinicalBert to extract embeddings of each word in clinical notes ?
Thanks in advance.
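For context, a TensorFlow 1.x checkpoint is a group of files that share a prefix (for example <prefix>.index and <prefix>.data-00000-of-00001), and --init_checkpoint expects that prefix rather than a file name. A small sketch that lists the prefixes actually present in a directory; find_checkpoint_prefixes is a hypothetical helper, and the directory name is taken from the command above.

```python
import os


def find_checkpoint_prefixes(directory):
    """Return checkpoint prefixes in `directory`, derived from *.index files."""
    prefixes = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(".index"):
            prefixes.append(os.path.join(directory, name[:-len(".index")]))
    return prefixes


if __name__ == "__main__":
    model_dir = "bert_pretrain_output_all_notes_150000"
    if os.path.isdir(model_dir):
        for prefix in find_checkpoint_prefixes(model_dir):
            print(prefix)  # pass one of these to --init_checkpoint
    else:
        print(f"{model_dir} not found; check the path to the unpacked model")
```

If the list comes back empty, the checkpoint files are probably in a different directory or were not fully extracted.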
Hi,
First of all, thank you for your great work!
I am trying to fine-tune emilyalsentzer/Bio_Discharge_Summary_BERT on a downstream MLC task. As far as I understand, you initialized from BioBERT (with a maximum sequence length of 512) and trained on data with a maximum sequence length of 128.
Less than 5% of my tokenized data has a length between 128 and 512. Truncation is not an option in my application. My dataset is imbalanced, which is why I don't want to filter out sequences longer than 128.
My question is: could fine-tuning with this data cause any issues down the line in terms of model performance?
Sorry, it is a bit of a basic question, but I can't seem to find a concrete answer on the impact of choosing a larger max_seq_len than what you trained with.
Thank you again! :)
Thanks for making such a comprehensive bert model.
I am worried about the actual words that I find in the model, though.
The author mentions that "The Bio_ClinicalBERT model was trained on all notes from MIMIC III, a database containing electronic health records from ICU patients at the Beth Israel Hospital in Boston, MA. For more details on MIMIC". I assumed this would mean that the vocabulary was also updated.
But when I look at the vocabulary, I don't see medical concepts.
from transformers import TFBertModel, BertConfig, BertTokenizerFast
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizerFast.from_pretrained('emilyalsentzer/Bio_ClinicalBERT')
tokenizer.vocab.keys()
['Cafe', 'locomotive', 'sob', 'Emilio', 'Amazing', '##ired', 'Lai', 'NSA', 'counts', '##nius', 'assumes', 'talked', 'ク', 'rumor', 'Lund', 'Right', 'Pleasant', 'Aquino', 'Synod', 'scroll', '##cope', 'guitarist', 'AB', '##phere', 'resulted', 'relocation', 'ṣ', 'electors', '##tinuum', 'shuddered', 'Josephine', '"', 'nineteenth', 'hydroelectric', '##genic', '68', '1000', 'offensive', 'Activities', '##ito', 'excluded', '************', 'protruding', '1832', 'perpetual', 'cu', '##36', 'outlet', 'elaborate', '##aft', 'yesterday', '##ope', 'rockets', 'Eduard', 'straining', '510', 'passion', 'Too', 'conferred', 'geography', '38', 'Got', 'snail', 'cellular', '##cation', 'blinked', 'transmitted', 'Pasadena', 'escort', 'bombings', 'Philips', '##cky', 'sacks', '##Ñ', 'jumps', 'Advertising', 'Officer', '##ulp', 'potatoes', 'concentration', 'existed', '##rrigan', '##ier', 'Far', 'models', 'strengthen', 'mechanics'...]
Am I missing something here?
Also, is there an uncased version of this model?
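Continued pretraining updates a model's weights, not its vocabulary file, so a model initialized from BioBERT keeps the vocabulary it inherited. Medical terms that are missing as whole words are then represented as sequences of subword pieces. A toy sketch of WordPiece's greedy longest-match-first splitting; the tiny vocabulary is invented for illustration and is not the model's real vocabulary.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Invented vocabulary: no whole medical terms, only pieces of them.
toy_vocab = {"hyper", "##tension", "##tens", "##ion",
             "card", "##io", "##my", "##opathy"}

print(wordpiece("hypertension", toy_vocab))
print(wordpiece("cardiomyopathy", toy_vocab))
```

Calling tokenizer.tokenize on a clinical term with the real tokenizer shows the same effect with the actual vocabulary.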
Hi,
Thanks for your release. It's great work, but I still have a question.
In the paper, you mention that 'Clinical BERT and Clinical BioBERT were applied to four i2b2 NER tasks, all in IOB format'. I want to reproduce this work, but the 'downstream_tasks' directory in this repo does not include the NER code. Can you share it?
Besides, can you tell me how to convert the four i2b2 datasets into BIO format?
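On the second question, a generic sketch of turning span annotations into BIO tags, assuming each annotation has already been mapped to token indices as (start, end, label) with end exclusive. The i2b2 releases use their own annotation files, so parsing them into this form is a separate step that is not shown; spans_to_bio and the example sentence are hypothetical, and this is not the authors' conversion code.

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, label) token spans (end exclusive) to BIO tags."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + label          # remaining tokens of the entity
    return tags


tokens = ["Patient", "had", "severe", "headache", "and", "took", "aspirin", "."]
spans = [(2, 4, "problem"), (6, 7, "treatment")]

for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```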
Thanks for this cool resource. I'm just trying to figure out if it's the best model for my project. In the results section of your paper, it says:
De-ID challenge data presents a different data distribution than MIMIC text. In MIMIC, PHI is identified and replaced with sentinel PHI markers, whereas in the de-ID task, PHI is masked with synthetic, but realistic PHI. This data drift would be problematic for any embedding model, but will be especially damaging to contextual embedding models like BERT because the underlying sentence structure will have changed: in raw MIMIC, sentences with PHI will universally have a sentinel PHI token. In contrast, in the de-ID corpus, all such sentences will have different synthetic masks, meaning that a canonical, nearly constant sentence structure present during BERT’s training will be non-existent at task-time. For these reasons, we think it is sensible that clinical BERT is not successful on the de-ID corpora.
I'm working with EHR for patients with multiple myeloma. The records are not de-identified in any way--they're just the regular doctors' notes, lab reports, etc. with real place names, person names, and dates. So to me, it sounds like my data is more like the de-ID dataset than the MIMIC dataset, since PHI aren't tagged in any way. Would I possibly be better off just using the regular BioBERT model then, since that model performed better on the de-ID dataset?
Hi,
I was wondering if there is ongoing work in publishing pre-trained weights in TF 2.x or TF 1.15 (V1)?
Many thanks,
I am trying to install the dependencies by running
conda create --name <env> --file requirements.txt
However, since I don't have many of the required channels, conda cannot install all of them. Moreover, conda also cannot find the pip packages (pypi_0) in my current channels, even though I have pip. Can you please provide the .yml file of your environment, which contains the full description of the environment, including the conda channels?
Thanks!
Hi,
Can you please add this to the huggingface/transformers community models? It would be very useful to be able to use the transformers library's built-in functions with it.
https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_example_script
Here are the details. It would be really helpful to test it with the existing scripts.
Thanks
Kanwal
Hi Guys,
I need to know how to load the clinicalBERT model and run it. clinicalBERT exactly matches my requirements; I searched online but couldn't find any useful resources. Can you please help me with this model? Thanks in advance.
Hi! I was able to use your clinical BERT models and slightly modified versions of your finetuning code to create a new NER model--worked like a charm :-) However, I'm having trouble loading the saved model and using it to make new predictions. To be more specific, if I set the train and eval flags to false and then run only predict, the model does appear to be able to make predictions on new input, but it is still attempting to split the input data into cross validation folds. I was wondering if you have some code and/or suggestions to avoid this? If not, I can figure it out, but I thought I'd check with you first to make sure I'm not missing anything. Thanks for your time, and thank you for sharing your fantastic work!
404 Client Error: Not Found for url: https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT/resolve/main/tf_model.h5
It seems like this may be related to the upgrade to ktrain v0.26.x?
Thanks for the release! Is this based on BERT base or BERT large? Also, is it the cased model or the uncased one?
Dear @EmilyAlsentzer
Your clinicalBERT is a great work and I want to reimplement it.
The run_ner.py requires python packages, such as 'modeling', 'optimization', and 'tokenization'.
Can you share the packages ?
Thank you
Hi:
First of all, thanks for sharing your pretrained model : )
I'm a WPI Data Science Master's student, and I'm doing an NLP internship at UMass Medical School. Your pretrained model should be very helpful for me 👍
I have a question.
After downloading the pretrained model, I found a set of TensorFlow model files (1.2G) and a PyTorch model file (400M).
Are they the same model?
I'm trying to do MLC using the pretrained weights (trained on all notes in this paper). The data is a little biased, i.e. some classes occur more frequently than others. After applying the ML-ROS oversampling technique, the mean IRLbl was reduced, but the data is still biased, so the model predicts the most frequently occurring labels every time (for any random input). Do you have any suggestions here?
Hello,
The tokenizer has model_max_len=1000000000000000019884624838656:
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
PreTrainedTokenizerFast(name_or_path='emilyalsentzer/Bio_ClinicalBERT', vocab_size=28996, model_max_len=1000000000000000019884624838656, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})
However, https://huggingface.co/emilyalsentzer/Bio_ClinicalBERT mentions that the maximum sequence length is 128. Could you please explain this?
Thanks!
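The large number is what transformers reports when the tokenizer configuration does not set a maximum length: it equals int(1e30), which the library uses as a "no limit" placeholder, so it says nothing about the sequence length used in pretraining. A quick check of the value, followed by a commented example of setting the limit explicitly (the commented part assumes transformers is installed).

```python
reported = 1000000000000000019884624838656

# The reported value is the float 1e30 converted to an integer.
is_placeholder = reported == int(1e30)
print(is_placeholder)

# The limit can be set explicitly when tokenizing, for example:
#
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
#   enc = tokenizer("Patient had severe headache.",
#                   truncation=True, max_length=128)
```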
Hi,
Is this repo from the same team as this or are they completely different?