jlanga / exfi Goto Github PK

View Code? Open in Web Editor NEW

3.0 3.0 1.0 1.37 MB

Get exons from a transcriptome and raw genomic reads using abyss-bloom and bedtools

License: MIT License

Python 95.11% Shell 4.89%

transcriptomics rna-seq assembly kmer bloom-filter abyss bedtools python3 conda bioconda

exfi's Introduction

exfi

Get exons from a transcriptome and raw genomic reads using abyss-bloom and bedtools

Citation

exfi was published on Ecology and Evolution

Langa J, Estonba A, Conklin D. EXFI: Exon and splice graph prediction without a reference genome. Ecol Evol. 2020;00:1–14. https://doi.org/10.1002/ece3.6587

Requirements

Docker or different apt and conda packages (see installation guide).

How to install

To install other dependencies, follow the instructions from the travis files:

Install packages with apt:

sudo apt install \
    autoconf build-essential bzip2 cmake curl gcc git libboost-dev libsdsl3 \
    libz-dev zlib1g

Install conda, then configure channels and install

conda config --add channels conda-forge
conda config --add channels defaults
conda config --add channels r
conda config --add channels bioconda
conda install --yes abyss=2.0.1 bedtools biopython pandas pip

Install biobloomtools

You may need to use sudo in the last command:

# Install biobloomtools
git clone --recursive https://github.com/bcgsc/biobloom.git && \
cd biobloom/ && \
git submodule update --init && \
git checkout 0a42916922d42611a087d4df871e424a8907896e && \
./autogen.sh && \
./configure --prefix=/usr/local/ && \
make -j 4 && \
make install

Copy this repo and install it with pip:

pip install https://github.com/jlanga/exfi/archive/v1.5.6.zip

If you have access to Docker, you can create a Debian container with the following command:

docker build --rm --tag exfi:v1.5.6 github.com/jlanga/exfi-docker

More info

Required data

A transcriptome in fasta format (take it from Ensembl for example, or the result of a de novo transcriptome assembler like trinity, trans-abyss or oases)
A set of genomic reads in fastq format, paired end or not. .gz files are allowed.

Running the pipeline

Make a baited Bloom filter of the genomic reads with build_baited_bloom_filter:

genome.fa.gz is the set of genomic reads and
genome_k25_m100M_l1.bf is the resulting Bloom filter, made of kmers of length 25, a size of 100 MB and the number of times of a kmer must be in the reads is 1 (levels).

# Assuming that you are in the exfi folder:
build_baited_bloom_filter \
    --input-fasta data/transcript.fa \
    --kmer 25 \
    --bloom-size 100M \
    --levels 1 \
    --threads 4 \
    --output-bloom genome_k25_m100M_l1.bf \
    data/genome.fa.gz

Run build_splice_graph to get putative exons in the transcriptome.

data/transcript.fa is the input transcriptome,
genome_k25_m500M_l1.bf is the Bloom filter generated above
kmer length has to be the same
test.gfa is the resulting splice graph in GFA1 format.

build_splicegraph \
    --input-fasta data/transcript.fa \
    --input-bloom genome_k25_m100M_l1.bloom \
    --kmer 25 \
    --max-fp-bases 5 \
    --output-gfa test.gfa

This splice graph can be visualized with Bandage

Example:

H	VN:Z:1.0
S	ENSDART00000033574.5:0-216	GTAAGCCGCGGCGGTGTGTGTGTGTGTGTGTGTTCTCCGTCATCTGTGTTCTGCTGAATGATGAGGACAGACGTGTTTCTCCAGCGGAGGAAGCGTAGAGATGTTCTGCTCTCCATCATCGCTCTTCTTCTGCTCATCTTCGCCATCGTTCATCTCGTCTTCTGCGCTGGACTGAGTTTCCAGGGTTCGAGTTCTGCTCGCGTCCGCCGAGACCTC
S	ENSDART00000033574.5:216-398	GAGAATGCGAGTGAGTGTGTGCAGCCACAGTCGTCTGAGTTTCCTGAAGGATTCTTCACGGTGCAGGAGAGGAAAGATGGAGGAATCCTGATTTACTTCATGATCATCTTCTACATGCTGCTGTCCGTCTCCATCGTGTGTGATGAATATTTTCTGCCATCTCTGGAGGTCATCAGCGAGCG
S	ENSDART00000033574.5:397-482	GTCTTGGTCTCTCGCAGGATGTTGCTGGAGCCACGTTTATGGCTGCGGGGAGTTCGGCTCCAGAGCTCGTCACTGCATTTCTGGG
S	ENSDART00000033574.5:480-586	GGTGTGTTTGTGACGAAGGGCGACATCGGCGTCAGCACCATCATGGGTTCTGCTGTCTATAACCTGCTGTGCATCTGTGCAGCGTGCGGCCTGCTGTCCTCTGCAG
S	ENSDART00000033574.5:585-687	GTTGGTCGTCTGAGCTGCTGGCCGTTGTTCAGAGATTGTGTTGCGTACTCCATCAGTGTCGCCGCCGTCATCGCCATCATCTCAGATAACAGAGTTTACTGG
S	ENSDART00000033574.5:685-969	GGTATGATGGCGCGTGTCTCCTGCTGGTGTACGGTGTGTATGTAGCTGTACTGTGTTTCGATCTGAAGATCAGCGAGTACGTGATGCAGCGCTTCAGTCCATGCTGCTGGTGTCTGAAACCTCGCGATCGTGACTCAGGCGAGCAGCAGCCTCTAGTGGGCTGGAGTGACGACAGCAGCCTGCGGGTCCAGCGCCGTTCCAGAAATGACAGCGGAATATTCCAGGATGATTCTGGATATTCACATCTATCGCTCAGCCTGCACGGACTCAACGAAATCAGCGAC
S	ENSDART00000033574.5:969-1177	GAGCACAAGAGTGTGTTCTCCATGCCGGATCACGATCTGAAGCGAATCCTGTGGGTTTTGTCTCTTCCGGTCAGCACTCTGCTGTTTGTGAGCGTTCCCGACTGCAGGAGACCCTTCTGGAAGAACTTCTACATGCTGACCTTCCTGATGTCCGCCGTCTGGATTTCTGCATTCACTTATGTGCTGGTCTGGATGGTCACAATCGTGG
S	ENSDART00000033574.5:1176-1283	GGGGAGACTCTGGGAATCCCGGACACAGTGATGGGAATGACTCTTCTGGCTGCAGGAACCAGTATCCCCGACACCGTGGCCAGTGTGATGGTGGCACGAGAAGGTAA
S	ENSDART00000033574.5:1277-2002	AGGTAAATCTGATATGGCCATGTCCAACATCGTGGGCTCTAACGTGTTCGATATGCTGTGTCTGGGCCTGCCGTGGTTCATCCAGACGGTGTTTGTTGACGTGGGCTCCCCGGTGGATGTCAACAGCTCGGGGCTGGTCTTCATGTCCTGCACGCTGCTGCTCTCCATCATCTTCCTCTTCCTCGCCGTGCACATCAACGGCTGGAAGCTGGACTGGAAGCTGGGTCTGGTGTGTTTGGCGTGTTACATTCTGTTCGCAACACTCTCCATCCTGTACGAGCTCGGCATCATCGGGAACAATCCCATACGCTCCTGCAGCGACTGAACACTGCTCTACAGCGCCCCCTTATGGACAACACAAGGACGTGACTCTTTATAACCCTCTAAAGTGCACAGGTTCATTACTGAATACAAGAAAATAGAACTGCGAGACGTCAACTCAAAATACAAGAGAAGTCAAAGTGCGAGATGTAAAAAATATATGCACATAAATGAGGATAAACTTTTTATTTAATAAGACAAAACTGCATAAAGTCTGATGTGAACACTGCTCAACAGCGCCCTCTCATGGACAACACATGGATCTGACTCTTATTAACCCTCCAGAGTGCAAATACACTAACACAACGTAATATAACCAAGTTAAAATGGCAAGATGTGAACTCAAAATACAAGAAAGCAGTCAAGATGCCCGACATAACAAATGTGCATTAAAATGTAAGCCC
L	ENSDART00000033574.5:0-216	+	ENSDART00000033574.5:216-398	+	0M
L	ENSDART00000033574.5:216-398	+	ENSDART00000033574.5:397-482	+	1M
L	ENSDART00000033574.5:397-482	+	ENSDART00000033574.5:480-586	+	2M
L	ENSDART00000033574.5:480-586	+	ENSDART00000033574.5:585-687	+	1M
L	ENSDART00000033574.5:585-687	+	ENSDART00000033574.5:685-969	+	2M
L	ENSDART00000033574.5:685-969	+	ENSDART00000033574.5:969-1177	+	0M
L	ENSDART00000033574.5:969-1177	+	ENSDART00000033574.5:1176-1283	+	1M
L	ENSDART00000033574.5:1176-1283	+	ENSDART00000033574.5:1277-2002	+	6M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:0-216	+	0	216M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:216-398	+	216	182M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:397-482	+	397	85M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:480-586	+	480	106M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:585-687	+	585	102M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:685-969	+	685	284M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:969-1177	+	969	208M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:1176-1283	+	1176	107M
C	ENSDART00000033574.5	+	ENSDART00000033574.5:1277-2002	+	1277	725M
P	ENSDART00000033574.5	ENSDART00000033574.5:0-216+,ENSDART00000033574.5:216-398+,ENSDART00000033574.5:397-482+,ENSDART00000033574.5:480-586+,ENSDART00000033574.5:585-687+,ENSDART00000033574.5:685-969+,ENSDART00000033574.5:969-1177+,ENSDART00000033574.5:1176-1283+,ENSDART00000033574.5:1277-2002+	*

Get exonic sequences

To extract meaningful information from the GFA file, we provide two scripts:

gfa_to_exons: which returns the predicted exons in FASTA format. For each record, each sequence comes with a unique identifier (EXON[0-9]+), a description indicating the coordinates of this exon with respect to the different transcripts (TR1:0-200 TR2:105-305), and the sequence of nucleotides. It is possible to hard and soft mask nucleotides that may not be correct. Example (soft masked):

>EXON00000000003 ENSDART00000033574.5:397-482
gTCTTGGTCTCTCGCAGGATGTTGCTGGAGCCACGTTTATGGCTGCGGGGAGTTCGGCTC
CAGAGCTCGTCACTGCATTTCTGgg
>EXON00000000004 ENSDART00000033574.5:480-586
ggTGTGTTTGTGACGAAGGGCGACATCGGCGTCAGCACCATCATGGGTTCTGCTGTCTAT
AACCTGCTGTGCATCTGTGCAGCGTGCGGCCTGCTGTCCTCTGCAg

gfa_to_gapped_transcript: which returns the transcript with interleaved Ns where it is predicted to be an intron. Example (hard masked):

>ENSDART00000033574.5 EXON00000000001,EXON00000000002,EXON00000000003,EXON00000000004,EXON00000000005,EXON00000000006,EXON00000000007,EXON00000000008,EXON00000000009
GTAAGCCGCGGCGGTGTGTGTGTGTGTGTGTGTTCTCCGTCATCTGTGTTCTGCTGAATG
ATGAGGACAGACGTGTTTCTCCAGCGGAGGAAGCGTAGAGATGTTCTGCTCTCCATCATC
GCTCTTCTTCTGCTCATCTTCGCCATCGTTCATCTCGTCTTCTGCGCTGGACTGAGTTTC
CAGGGTTCGAGTTCTGCTCGCGTCCGCCGAGACCTCNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNGAGAATGCGAGTGAGTGTGTGCAGCCACAGTCGTCTGAGTTTCC
TGAAGGATTCTTCACGGTGCAGGAGAGGAAAGATGGAGGAATCCTGATTTACTTCATGAT
CATCTTCTACATGCTGCTGTCCGTCTCCATCGTGTGTGATGAATATTTTCTGCCATCTCT
GGAGGTCATCAGCGAGCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT
CTTGGTCTCTCGCAGGATGTTGCTGGAGCCACGTTTATGGCTGCGGGGAGTTCGGCTCCA
GAGCTCGTCACTGCATTTCTGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNTGTGTTTGTGACGAAGGGCGACATCGGCGTCAGCACCATCATGGGTTCTGCTGTC
TATAACCTGCTGTGCATCTGTGCAGCGTGCGGCCTGCTGTCCTCTGCANNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTTGGTCGTCTGAGCTGCTGGCCGTTGTTCA
GAGATTGTGTTGCGTACTCCATCAGTGTCGCCGCCGTCATCGCCATCATCTCAGATAACA
GAGTTTACTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTATGATG
GCGCGTGTCTCCTGCTGGTGTACGGTGTGTATGTAGCTGTACTGTGTTTCGATCTGAAGA
TCAGCGAGTACGTGATGCAGCGCTTCAGTCCATGCTGCTGGTGTCTGAAACCTCGCGATC
GTGACTCAGGCGAGCAGCAGCCTCTAGTGGGCTGGAGTGACGACAGCAGCCTGCGGGTCC
AGCGCCGTTCCAGAAATGACAGCGGAATATTCCAGGATGATTCTGGATATTCACATCTAT
CGCTCAGCCTGCACGGACTCAACGAAATCAGCGACNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNGAGCACAAGAGTGTGTTCTCCATGCCGGATCACGATCTGAAGCGA
ATCCTGTGGGTTTTGTCTCTTCCGGTCAGCACTCTGCTGTTTGTGAGCGTTCCCGACTGC
AGGAGACCCTTCTGGAAGAACTTCTACATGCTGACCTTCCTGATGTCCGCCGTCTGGATT
TCTGCATTCACTTATGTGCTGGTCTGGATGGTCACAATCGTGNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNGGGAGACTCTGGGAATCCCGGACACAGTGATGGGAA
TGACTCTTCTGGCTGCAGGAACCAGTATCCCCGACACCGTGGCCAGTGTGATGGTGGCAC
GAGANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNATCT
GATATGGCCATGTCCAACATCGTGGGCTCTAACGTGTTCGATATGCTGTGTCTGGGCCTG
CCGTGGTTCATCCAGACGGTGTTTGTTGACGTGGGCTCCCCGGTGGATGTCAACAGCTCG
GGGCTGGTCTTCATGTCCTGCACGCTGCTGCTCTCCATCATCTTCCTCTTCCTCGCCGTG
CACATCAACGGCTGGAAGCTGGACTGGAAGCTGGGTCTGGTGTGTTTGGCGTGTTACATT
CTGTTCGCAACACTCTCCATCCTGTACGAGCTCGGCATCATCGGGAACAATCCCATACGC
TCCTGCAGCGACTGAACACTGCTCTACAGCGCCCCCTTATGGACAACACAAGGACGTGAC
TCTTTATAACCCTCTAAAGTGCACAGGTTCATTACTGAATACAAGAAAATAGAACTGCGA
GACGTCAACTCAAAATACAAGAGAAGTCAAAGTGCGAGATGTAAAAAATATATGCACATA
AATGAGGATAAACTTTTTATTTAATAAGACAAAACTGCATAAAGTCTGATGTGAACACTG
CTCAACAGCGCCCTCTCATGGACAACACATGGATCTGACTCTTATTAACCCTCCAGAGTG
CAAATACACTAACACAACGTAATATAACCAAGTTAAAATGGCAAGATGTGAACTCAAAAT
ACAAGAAAGCAGTCAAGATGCCCGACATAACAAATGTGCATTAAAATGTAAGCCC

Attributions

Written by Jorge Langa

Bibliography

exfi's People

Contributors

Stargazers

Watchers

exfi's Issues

use sealer to fill gaps left by SNPs

Sealer can be used to try to fill small gaps of 1-5 nt.

To do that, I should make a script to

paste exons with Ns,
run sealer and
split again into exons.

Fix messages to stderr in build_splice_graph

Bio.SeqIO.parse -> Bio.SeqIO.FastaIO.SimpleFastaParser

make exfi a single tool

exfi build_baited_bloom_filter : build the baited bloom filter from biobloom|abyss
exfi build_splice_graph :
exfi gfa1_to_exons :
exfi gfa1_to_gapped_transcripts :

memory in polish

polish is taking too much memory per thread

bbt -> rapmap/salmon/kallisto and split-apply-combine

When collapsing, exon ids trid:start-end are not being translated into exon_xxxxxx

Compress biobloommaker files as they come out

biobloommaker returns the files categories_multimatch.fa (empty), categories_noMatch.fa (huge) and categories_transcriptome.fa (not so huge, but big).

These files can be compressed to .gz as they come out by creating a FIFO in their path, and setting gzip in the background.

Or just compress as soon as biobloommaker finishes.

add a functionality to use sealer in between small gaps (<10bp)

Compute overlaps even with nonconsecutive exons

Sometimes exons are contained inside other exons (ISI, A5 and A3 splicing events).

Maybe it is not necessary to do a pairwise comparison between all but inside a connected component

use gfapy instead of my functions

Separate filter building and read classification

Currently, build_baited_bloom_filter does both the filter building and read classification (despite its name). Even though filter building is quite fast, it would be prudent to be able to re-use one filter for multiple read datasets. Perhaps add an option to only build the filter, and allow to specify a pre-existing filter?

build_splice_graph

read transcriptome when necessary, instead of storing the file the entire pipeline

make -o - send to stdout

Compress the resulting Bloom filters

Bloom filters in this context are mostly zeros.

Using pigz -9 I was able to go from 30Gb to 70Mb in a BF with FPR of 0.01%.

I think also usign gzip instead of pigz will make it even smaller.

Also: try bgzip or xz

Set up travis-ci

Integrate into Galaxy?

We see that 2-4 GB are enough to process

split-apply-combine

Make a small filter for the complete transcriptome
split transcriptome into subtranscriptomes (~ 1000)
run build_splice_graph in parallel, without collapse
merge GFA1, collapse optionally

Set up sphinx to create a documentation from the docstrings

find_exons.py: delete the merge2 step

Matches between consecutive exons follow a Poisson distribution. In the merge2 step, we merge exons that overlap 8 or more consecutive exons. This happens, rarely, but happens.

match_distributions.pdf

logging in polish

The module is completely silent

Improve the collapse function

collapse merges exons by exact identity.

A more successful aproach could be to map the exons to every exon, get blocks of mathces and recompose the graph.

Do multiprocessing as described in https://pymotw.com/3/multiprocessing/communication.html

pool_size = multiprocessing.cpu_count() * 2
pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
        maxtasksperchild=2,
)
pool_outputs = pool.map(do_calculation, inputs)
pool.close()  # no more tasks
pool.join()  # wrap up current tasks

if threads==1: do map/starmap

Check stdout or file in output

Follow 10 simple rules for making reasearch software more robust

Use version control.

Put everything that you write or make into version control as soon as it is created.
Use a feature branch workflow.

Document your code and usage.

[] Write a good README file.
[] Print usage information.

Make common operations easy to control.

[] Allow the most commonly changed parameters to be configured from the command line.
[] Check that all input values are in a reasonable range at startup.
[] Choose reasonable defaults where they exist.
[] Set no defaults at all when there aren’t any reasonable ones.

Version your releases.

[] Increment your version number every time you release your software to other people.
[] Make the version of your software easily available by supplying --version or -v on the command line.
[] Include the version number in in all of the program’s output.
[] Ensure that old released versions continue to be available.

Reuse software (within reason).

[] Make sure that you really need the auxiliary program.
[] Ensure the appropriate software and version is available.
[] Ensure that reused software is robust.

Rely on build tools and package managers for installation.

[] Document all dependencies in a machine-readable form.
[] Avoid depending on scripts and tools which are not available as packages.

Do not require root or other special privileges to install or run.

[] Do not require root privileges to set up or use packages.
[] Allow packages to be installed in an arbitrary location.

Ask another person to try and build your software before releasing it.

[] Eliminate hard-coded paths.
[] Set the names and locations of input and output files as command-line parameters.

Do not require users to navigate to a particular directory to do their work.

[] Include a small test set that can be run to ensure the software is actually working.
[] Make the tests easy to find and run.
[] Make the test script’s output easy to interpret.

Produce identical results when given identical inputs.

[] Echo all parameters and software versions to standard out or a log file alongside the results.
[] Produce the same results each time the same version of the program is run with the same inputs.
[] Allow the user to optionally provide the random seed as an input parameter.
[] Make sure acceptable tolerances are known and detailed in documentation and tests.

add options to trim sequences

Bloom filters are known for having false positives. In this context, the reported exons may have one or two additional bases at the start or the end of the exon. Therefore, we could give the option to the script to trim 1 or 2 nucleotides to each exon.

read bloom filters twice in build_splice_graph

Instead of providing once the bloom filter path, provide it twice, one for initial exon prediction, and a second one (if necessary) for sealer. This way, we can pipe it from gzip and save space, and maybe time.

Keep filter info file

The Bloom filter is written/copied to the .bf file, but the accompanying .txt file that biobloommaker also outputs seems to be absent. However, the BioBloomTools documentation notes that the filter file is useless without the info file. It would be prudent to also keep the info file around after build_baited_bloom_filter.

Some file and tool names in README are wrong

In the README under "Running the pipeline":

2.:

build_splicegraph should be build_splice_graph
genome_k25_m100M_l1.bloom should be genome_k25_m500M_l1.bf

3.:
Neither gfa_to_exons nor gfa_to_gapped_transcript exist. Instead, gfa1_to_fasta is there and seems to have options that modify its behaviour according to the other two tools. Amend the instructions to reflect this.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.