CC @thegenemyers, @richarddurbin, @ekg, @rrwick, @sjackman, @jts, @pb-jchin, @skoren, @aphillippy, @MihaiPop, @ggonnella, @pmelsted, @edawson
The current status of GFA1
There were two general-purpose assembly formats: FASTG and GFA1. With David Jaffe et al.'s SuperNova assembler apparently moving away from FASTG, GFA1 is practically the only generic assembly format. I have written converters for the ABySS, SGA, Velvet, SPAdes, SOAPdenovo and fermi short-read assemblers. @jts's SGA and @sjackman's ABySS natively support GFA1 output, I believe. @jts's fork of DALIGNER can emit GFA1. @pb-jchin has written a converter for FALCON. My miniasm and fermi-lite assemblers output GFA. I believe the vast majority of mainstream assemblers are compatible with GFA1, too.
For tools working with variations, @ekg's vg supports GFA output and has an internal data representation conceptually equivalent to GFA1 (vg effectively implements a bidirected graph). The SuperNova graph can be converted to GFA1 (not implemented yet). DISCOVAR outputs FASTG, which can also be converted to GFA in principle (not implemented, either).
As to tools consuming GFA, @rrwick's Bandage can visualize GFA graphs. I have written gfaview to perform graph transformations (e.g. transitive reduction, tip trimming and bubble popping) for long-read graphs. @sjackman has implemented similar transformations for short-read graphs in ABySS. There are already a few libraries in C++ and Ruby to read GFA1.
In conclusion, GFA1 is getting used. It is fairly simple yet general enough for all the tools and applications mentioned above.
About GFA2
The necessity
GFA2 was proposed because GFA1 breaks down when we pick one path at a fork and merge along it: the merge leaves an end-to-internal match, which GFA1 cannot describe. A few other hypothetical use cases (e.g. alignment between two long haplotypes) have also been raised. So far, I am not sure which implementations output and, more importantly, consume end-to-internal or internal-to-internal matches (NB: containment is a special case of end-to-internal alignment, but it can be described with GFA1). I am happy to update this post if there are any.
The GFA2 graph representation
While GFA1 models a directed skew-symmetric graph that is topologically equivalent to bidirected graphs, overlap graphs and string graphs, GFA2 models an undirected multi-graph in which mapping coordinates play a central role. They represent fundamentally different types of graphs. Although we can see GFA1 as a special case of GFA2, the way we interpret the graph is distinct. GFA2 will also have more complex syntax and implementations for what GFA1 is really good at.
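To make the GFA1 side of this concrete, here is a minimal Python sketch of the skew-symmetric view. It assumes the usual tab-delimited L-line layout (`L`, from-segment, from-orientation, to-segment, to-orientation, overlap); the function names `flip` and `link_to_arcs` are made up for illustration and do not come from any existing library. Each link yields two arcs between oriented segments: the one written on the line and its reverse complement.

```python
def flip(orient):
    """Return the opposite orientation ('+' <-> '-')."""
    if orient not in ("+", "-"):
        raise ValueError(f"bad orientation: {orient!r}")
    return "-" if orient == "+" else "+"


def link_to_arcs(line):
    """Turn one GFA1 L-line into the two arcs it implies.

    Assumed layout: L <from> <from_orient> <to> <to_orient> <overlap>.
    A vertex is a (segment_name, orientation) pair.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5 or fields[0] != "L":
        raise ValueError("not an L-line")
    _, v, v_orient, w, w_orient = fields[:5]
    forward = ((v, v_orient), (w, w_orient))
    # Skew symmetry: an arc v -> w implies the arc w' -> v',
    # where x' denotes the reverse complement of x.
    complement = ((w, flip(w_orient)), (v, flip(v_orient)))
    return forward, complement


forward, complement = link_to_arcs("L\tA\t+\tB\t-\t5M")
print(forward)     # the arc as written on the line
print(complement)  # the implied reverse-complement arc
```

Under this reading a tool only needs segment names and orientations to traverse the graph; no coordinates are involved, which is the main contrast with the GFA2 model.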
My take
SAM/BAM is popular not only because it can store alignments, but more because it enables us to do things to the alignments that would otherwise be complex. Similarly, I see GFA as more than a storage format; it should be a format that helps our analysis. Due to the lack of clear downstream use cases and implementations of GFA2 (e.g. what information do we want to extract from GFA2, and how?), I am unable to evaluate the necessity of the added complexity, especially given that vg and SuperNova can already achieve part of the GFA2 goal with the GFA1 representation only. I am reluctant to add unproven features too early.
The future of GFA
As the creator of the initial GFA1, I do not see GFA2 as ready to replace GFA1. I foresee GFA1 and GFA2 coexisting for a period of time. During this period, developers will have to choose between GFA1 and GFA2. We may re-evaluate the necessity of GFA2 yearly. I will be happy to phase out GFA1 if GFA2 proves useful in concrete and practical applications. I understand the split is unfortunate, but it seems an unavoidable cost when we explore the unknowns.
The future of GFA1
The current GFA1 spec was modified from a blog post. I would like to replace it with something a little more formal like the one here, with further improvements of course.
I also want to ask developers on the CC list: how much do CIGAR on L-lines and the lack of segment length on S-lines hurt? Personally, I would really like to change the format, but if you all think you can live with these issues, I am OK with keeping them as they are. Once we reach a consensus on this issue, we will try our best to maintain the compatibility of GFA1 going forward. We may add new line types, but we will not break existing lines.
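For readers who have not hit these two issues, the following Python sketch shows roughly what a GFA1 consumer has to do today. It rests on several assumptions: S-lines are laid out as `S`, name, sequence, then optional tags; the sequence may be `*`; and an optional `LN:i:` tag may carry the length. CIGAR operators are interpreted as in SAM, and which segment plays the "reference" role is left to the caller. The function names are hypothetical.

```python
import re

# One CIGAR operation: a run length followed by a SAM operator.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")    # SAM convention
CONSUMES_QUERY = set("MIS=X")  # SAM convention


def segment_length(s_line):
    """Length of a segment, or None when the S-line does not reveal it."""
    fields = s_line.rstrip("\n").split("\t")
    if len(fields) < 3 or fields[0] != "S":
        raise ValueError("not an S-line")
    if fields[2] != "*":
        return len(fields[2])          # sequence present: count bases
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):    # assumed optional length tag
            return int(tag[5:])
    return None                        # no sequence and no length tag


def overlap_lengths(cigar):
    """Return (ref_len, query_len) spanned by an overlap CIGAR, or None for '*'."""
    if cigar == "*":
        return None
    ops = CIGAR_OP.findall(cigar)
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    ref_len = sum(int(n) for n, op in ops if op in CONSUMES_REF)
    query_len = sum(int(n) for n, op in ops if op in CONSUMES_QUERY)
    return ref_len, query_len


print(segment_length("S\ts1\tACGT"))
print(segment_length("S\ts2\t*"))
print(overlap_lengths("3M2I2M1D"))
```

Both functions can return `None`, so every downstream step that needs coordinates has to handle the "unknown" case; that is the cost the question above asks about.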