olafurpg commented on July 21, 2024

OK, I re-ran the analysis, deduplicating jvm/js cross-built files and counting occurrences of '\n' (the metric could definitely be refined), and the total comes to ~2 million LOC instead of 3 million. Even without jvm/js deduplication I only count 2.3 million LOC, so I was doing something wrong when I first ran the analysis. I'll update the blog post to reflect this.
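
The deduplication step can be sketched roughly as follows. This is a minimal illustration, not the actual scala-experiments code: it assumes a jvm/js cross-built file shows up as byte-identical content in both platform builds, so hashing the content is enough to count it once, and it keeps the crude "count '\n'" metric. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def count_newlines_dedup(sources):
    """Sum newline characters over source contents, counting each distinct content once.

    Assumption: a jvm/js cross-built file appears as the same bytes twice
    (once per platform build), so a content hash is enough to skip the repeat.
    Caveat: unrelated files that happen to be byte-identical are merged too.
    """
    seen = set()
    total = 0
    for data in sources:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # already counted via the other platform's build
        seen.add(digest)
        total += data.count(b"\n")  # crude metric: blank and comment lines count too
    return total


def count_scala_files(paths):
    """Read the .scala files among `paths` and apply count_newlines_dedup."""
    scala = (Path(p) for p in paths if str(p).endswith(".scala"))
    return count_newlines_dedup(p.read_bytes() for p in scala)
```

Hashing content rather than comparing paths also catches copies checked into separate jvm/ and js/ directories, at the cost of merging unrelated files that happen to be identical.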

Two notable additions to my corpus are ornicar/lila (180k) and guardian/frontend (140k), which may help explain the difference from the community build (CB). I think the CB also skips submodules in some projects.

Here is a full breakdown of LOC per project: https://docs.google.com/spreadsheets/d/1btkCiF30Wb9MJti6LDc9og788XqXBKgwEhIdKb9aloc/edit?usp=sharing

Instructions to reproduce the analysis are in the README: https://github.com/olafurpg/scala-experiments

from community-build.

SethTisue commented on July 21, 2024

seems tricky, because we often enable/disable subprojects and/or tests, so I don't think we'd get an accurate count anyway. I don't plan to tackle it.

SethTisue commented on July 21, 2024

I've gotten interested in this again, partly because I would really like to have a number to put in a blog post

the most accurate approach would be to inject some instrumentation into the compiler. I guess this would be a little compiler plugin, added to every project, that counts the lines and prints a line of output with the sum; to get the total, we'd grep for those lines in the overall run log.

(but maybe we could also get a good-enough count by writing a script that greps the run log for lines like [akka-http] The following subprojects will be built in project akka-http: akka-parsing, akka-http-core, akka-http, akka-http-xml, akka-http-spray-json, akka-http-marshallers-scala, akka-http-testkit, akka-http-jackson, akka-http-marshallers-java, akka-http2-support, akka-http-tests, docs, root. and then refine it a bit by including or excluding test code depending on whether extra.run-tests: false is set)
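
A rough version of that grep, assuming the dbuild log lines look exactly like the akka-http example above (the regex and function name are illustrative, not part of dbuild):

```python
import re

# Matches dbuild's "The following subprojects will be built in project X: a, b, c." line.
SUBPROJECTS = re.compile(
    r"The following subprojects will be built in project (?P<project>[^:\s]+): (?P<subs>.*)\.\s*$"
)


def subprojects_from_log(lines):
    """Map each community-build project to the subprojects dbuild reports building."""
    result = {}
    for line in lines:
        m = SUBPROJECTS.search(line)
        if m:
            result[m.group("project")] = [s.strip() for s in m.group("subs").split(",")]
    return result
```

Mapping subproject names back to source directories, and dropping test sources where extra.run-tests: false is set, would still be extra work on top of this.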

I'm thinking I ought to just go the compiler plugin route, I know the small-compiler-plugin drill pretty well. I'll need to check and see what dbuild offers for doing the injection.
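
For the plugin route, the totalling step could be a short script over the run log. The marker text "cloc-plugin: N lines" is a hypothetical format chosen for this sketch; the real plugin's output may look different.

```python
import re

# Hypothetical marker: assumes the plugin prints one summary line per compiled subproject, e.g.
#   [akka-http] cloc-plugin: 52611 lines
MARKER = re.compile(r"^\[(?P<project>[^\]]+)\].*?cloc-plugin: (?P<n>\d+) lines")


def total_from_log(lines):
    """Sum the plugin's per-subproject counts, grouped by the [project] log prefix."""
    per_project = {}
    for line in lines:
        m = MARKER.search(line)
        if m:
            proj = m.group("project")
            per_project[proj] = per_project.get(proj, 0) + int(m.group("n"))
    return per_project, sum(per_project.values())
```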

gkossakowski commented on July 21, 2024

@SethTisue have you considered running cloc from the compiler plugin? From the plugin you could easily access the set of sources the compiler is about to compile, and rely on cloc's implementation for skipping comments and counting code.

SethTisue commented on July 21, 2024

@gkossakowski that's a good idea. the plugin could just print the filenames, then we'd grep for those and pass them to cloc.
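
A sketch of that pipeline, assuming the plugin prints one line per compiled source with a hypothetical "cloc-plugin source: <path>" marker. cloc's --list-file option reads one path per line; the marker, names, and temp-file handling are illustrative.

```python
import os
import re
import subprocess
import tempfile

# Hypothetical marker: assumes the plugin logs one line per compiled source, e.g.
#   [cats] cloc-plugin source: /path/to/Foo.scala
# Paths containing whitespace are not handled by this pattern.
SOURCE_LINE = re.compile(r"cloc-plugin source: (?P<path>\S+\.scala)\s*$")


def sources_from_log(lines):
    """Distinct source paths reported in the log, in first-seen order."""
    seen = {}
    for line in lines:
        m = SOURCE_LINE.search(line)
        if m:
            seen.setdefault(m.group("path"), None)  # dict keeps insertion order
    return list(seen)


def cloc_command(list_file):
    """cloc invocation that reads one path per line from list_file."""
    return ["cloc", "--quiet", f"--list-file={list_file}"]


def run_cloc(paths):
    """Write paths to a temp file and run cloc on it (requires cloc on PATH)."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("\n".join(paths) + "\n")
    try:
        result = subprocess.run(cloc_command(f.name), capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(f.name)
```

Deduplicating by path means a shared source compiled for both jvm and js is passed to cloc only once.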

SethTisue commented on July 21, 2024

I decided I liked Grzegorz's suggestion of running cloc directly from the plugin. I'm trying to get this done in the most expeditious manner possible and doing it this way means we don't need to clutter up the dbuild log with a lot of extra stuff.

here's the compiler plugin: https://github.com/sethtisue/cloc-plugin

working now on integrating it in this repo.

SethTisue commented on July 21, 2024

keeping the ticket open until we actually have a full count in hand. (we only count lines compiled during a particular run, so many runs will only have a partial count, if any cached builds are used.)
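
One way to assemble a full count from partial runs, sketched under the assumption that a project missing from a run's report was served from cache, so its most recent earlier count can stand in for it (merge_runs is a hypothetical helper):

```python
def merge_runs(runs):
    """Combine per-project line counts from several runs, ordered oldest to newest.

    Assumption: a project absent from a run was cached, so its latest earlier
    count is still valid. Caveat: a project removed from the build lingers.
    """
    latest = {}
    for counts in runs:
        latest.update(counts)  # newer counts overwrite older ones
    return latest, sum(latest.values())
```

A project dropped from the build would keep its last count, so the merged table still needs a check against the current project list.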

SethTisue commented on July 21, 2024

a recent run has 1.74 million total lines:

Lines of Scala code recompiled during this run only:
   241654 scala-collections-laws
   186789 akka-more
   145424 scala-js
    76366 scala-debugger
    61218 monix
    58929 scalatest
    52611 akka-http
    42804 breeze
    41786 scala-refactoring
    40499 scalaz
    39930 twitter-util
    39108 spire
    36187 specs2
    35664 scalikejdbc
    32298 play-core
    29479 shapeless
    28017 zinc
    27435 scalameta-2
    27212 sbt
    25517 slick
    22113 cats
    20639 scalameta-1
    18214 akka-actor
    17866 collection-strawman
    16490 scalachess
    14714 unfiltered
    13693 sbt-librarymanagement
    13668 ammonite
    13259 scalariform
    12733 scalapb
    11153 scala-stm
     9708 github4s
     9545 scalaprops
     9485 coursier
     9303 play-json
     8649 fs2
     8489 sjson-new
     8425 circe
     8176 scalafmt
     8045 scala-gopher
     7985 scalastyle
     6905 fastparse
     6713 scalafix
     6258 scala-java8-compat
     5780 scala-swing
     5770 parboiled2
     5561 conductr-lib
     5445 scalameter
     5387 scalacheck
     5259 argonaut
     5187 scala-async
     4888 scallop
     4817 json4s
     4560 jackson-module-scala
     4519 kxbmap-configs
     4376 doodle
     4238 pureconfig
     4222 lift-json
     4030 meta-paradise
     3953 play-ws
     3834 monocle
     3682 blaze
     3600 ssl-config
     3544 scodec-bits
     3516 utest
     3459 scalatags
     3329 nyaya
     3322 sbt-util
     3269 scoverage
     3236 scodec
     2910 macro-paradise
     2798 upickle
     2783 algebra
     2699 spray-json
     2696 better-files
     2516 scala-continuations
     2508 gigahorse
     2393 scalamock
     2375 cachecontrol
     2171 twirl
     2143 jawn-0-11
     2099 sbt-io
     2075 pprint
     2050 scala-parser-combinators
     1988 sbinary
     1946 scala-partest
     1935 scalatex
     1848 scopt
     1833 mima
     1799 scalacheck-shapeless
     1798 jawn-0-10
     1770 dispatch
     1650 cats-effect
     1519 scala-json-ast
     1428 parboiled
     1398 scala-records
     1327 scalaj-http
     1296 case-app
     1246 paiges
     1136 metaconfig-old
     1072 genjavadoc
     1064 lightbend-emoji
     1033 akka-contrib-extra
     1014 scala-logging
      937 metaconfig-new
      925 play-doc
      909 simulacrum
      886 fansi
      884 twotails
      870 atto
      786 play-webgoat
      759 minitest
      731 pcplod
      719 scala-xml-quote
      641 acyclic
      614 log4s
      607 scala-ssh
      571 macro-compat
      538 geny
      530 tut
      497 sourcecode
      492 circe-config
      478 base64
      475 kind-projector
      400 http4s-websocket
      281 scalapb-lenses
      258 discipline
      255 scalalib
      211 machinist
      146 jawn-fs2
      111 semanticdb-sbt
      107 sbt-testng
       46 catalysts
  1743918 TOTAL
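
A report in this "count name" layout is easy to re-aggregate or cross-check. The parser below is a hypothetical helper that assumes exactly two whitespace-separated fields per data row and verifies that TOTAL equals the sum of the rows:

```python
def parse_report(text):
    """Parse '<count> <project>' rows and check them against the TOTAL row."""
    counts = {}
    total = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skip the heading and blank lines
        n, name = int(parts[0]), parts[1]
        if name == "TOTAL":
            total = n
        else:
            counts[name] = n
    if total is not None and total != sum(counts.values()):
        raise ValueError(f"TOTAL {total} != sum of rows {sum(counts.values())}")
    return counts, total
```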

gkossakowski commented on July 21, 2024

The blog post at https://www.scala-lang.org/blog/2017/11/27/macros.html says that the community build is ~3 million LoC. Where does the difference come from?

SethTisue commented on July 21, 2024

the "corpus" that Olafur mentions in that blog post may be derived in part from the community build, but isn't the same and is apparently larger. for example, he lists scanamo and lila as being included, but neither of them is in the community build.

@olafurpg is your corpus on GitHub somewhere?

olafurpg commented on July 21, 2024

I just realized we may be counting js/jvm cross-built files twice, which may explain the difference. I will double check tomorrow! 😅

SethTisue commented on July 21, 2024

> I just realized we may be counting js/jvm cross-built files twice

you're not generating Scaladoc, are you? I was double-counting until I added an if (!global.settings.isScaladoc) check

olafurpg commented on July 21, 2024

I manually added a few more projects. I'm on the phone now but I can send a link to the corpus and steps to reproduce when I'm back at the computer.

olafurpg commented on July 21, 2024

I run compile for all projects that define either 2.11 or 2.12 in their cross Scala versions.
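
That selection rule, as a small filter. How the cross-version lists are pulled out of each build definition is left open; the input dictionary and function name here are hypothetical:

```python
def projects_to_compile(cross_versions):
    """Select projects whose cross Scala versions include a 2.11.x or 2.12.x entry.

    `cross_versions` maps project name -> list of version strings (hypothetical input).
    """
    wanted = ("2.11", "2.12")
    return sorted(
        name
        for name, versions in cross_versions.items()
        if any(v == w or v.startswith(w + ".") for v in versions for w in wanted)
    )
```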

SethTisue commented on July 21, 2024

fwiw, I consider it expected and normal that the community build would be smaller than other corpuses of open source Scala code. getting stuff in the community build is hard, for multiple reasons:

  • because we can't add a project until all of its recursive Scala-based dependencies have already been added (since everything is built from source)
  • because everything must be source compatible with the same library versions — we sometimes allow multiple library versions (e.g. we currently support both jawn 0.10 and 0.11, and both scalameta 1 and scalameta 2), but it's a pain and we don't like to do it
  • because everything must be dbuild friendly (nearly every project needs at least a little adjusting)
  • because we try to actually run the tests

(you guys know these things, just stating them for the record)
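
The first constraint amounts to a transitive-closure check over the dependency graph. A minimal sketch, with a hypothetical dependency map rather than real community-build data:

```python
def can_add(project, deps, in_build):
    """True if every transitive Scala dependency of `project` is already in the build.

    `deps` maps a project to its direct Scala-based dependencies (hypothetical data).
    """
    stack = list(deps.get(project, ()))
    seen = set()
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # also guards against dependency cycles
        seen.add(dep)
        if dep not in in_build:
            return False
        stack.extend(deps.get(dep, ()))
    return True
```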

gkossakowski commented on July 21, 2024

Thanks for checking the numbers!
