Automated two-layer and three-layer bundling into separate files given a list of m

Optimization build functions, about systemjs/builder

Comments (36)

guybedford commented on July 17, 2024

Something like -

var Builder = require('systemjs-builder');
var builder = new Builder(config);

// the second argument (optimizeFunction) is optional
builder.optimize({
  page1: ['page1-module'],
  page2: ['page2-module', 'another-module']
}, optimizeFunction).then(function() {
  // saves page1-bundle.js
  // page2-bundle.js
  // common-bundle.js
  // and injects configuration for loading correct bundles
});

// optimization function is optional; it maps pages to bundles
function optimize(pages) {
  // returns bundles, where:
  // bundles[0] == { routes: ['page1', 'page2'], tree: {...common tree...} };
  // bundles[1] == { routes: ['page1'], tree: {...page 1 tree...} };
  // bundles[2] == { routes: ['page2'], tree: {...page 2 tree...} };
}

Bubblyworld commented on July 17, 2024

See here for a simple optimize function following the above API I wrote for zygo.

raphaelokon commented on July 17, 2024

@guybedford When you say three-layer bundling, do you mean having a third list of modules, like:

  page1: ['page1-module'],
  page2: ['page2-module', 'another-module'],
  page3: ['page3-module', 'another-module', 'yet-another-module']

So that there is an optimisation problem when you try to find the intersections between the trees of the pages? I am still really keen to get into this, and I'm really trying to sort out my terminology problem.

guybedford commented on July 17, 2024

@Bubblyworld perhaps you can help explain this area?

Bubblyworld commented on July 17, 2024

@guybedford @interactionist Sure, sorry about the delay!

The way I understand three layered bundling is as follows. As you wrote, you have a list of pages and their traced dependencies. Some dependencies might be unique to a page, some might be shared between a few of the pages and others might be shared by all of the pages.

The idea is to create three layers of bundles:

1. A single common bundle, containing the dependencies shared by all pages.
2. A layer of bundles containing dependencies unique to each page.
3. A layer of bundles containing dependencies shared by some of the pages.

The sample implementation I posted above does this the naive way, by (efficiently) taking every subset of pages and creating a bundle for dependencies shared within each subset. This could potentially result in a large number of bundles, which kinda defeats the point of doing bundling at all. (X_x)

So the problem is to split up the modules into bundles for the best possible user experience. What this means and whether it's even worth solving I'm not sure. I also might be completely missing the mark, so take this with a pinch of salt!
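
As a rough illustration of the naive approach described above, here is a minimal sketch that groups modules by the exact set of pages that use them. It is not the zygo implementation linked earlier: `groupBySharedPages` and the `modules` field are hypothetical names, and the input is assumed to be a flat object mapping each page to an array of module names.

```javascript
// Hypothetical sketch: one bundle per distinct set of pages sharing modules.
// Assumes `pages` is a flat object: { pageName: [moduleName, ...] }.
function groupBySharedPages(pages) {
  // Map each module to the (sorted) list of pages that depend on it.
  var pagesByModule = Object.create(null);
  Object.keys(pages).sort().forEach(function (page) {
    pages[page].forEach(function (mod) {
      var users = pagesByModule[mod] || (pagesByModule[mod] = []);
      if (users.indexOf(page) === -1) users.push(page);
    });
  });

  // Modules used by exactly the same set of pages end up in the same bundle.
  var bundlesByKey = Object.create(null);
  Object.keys(pagesByModule).forEach(function (mod) {
    var routes = pagesByModule[mod];
    var key = routes.join('\n');
    var bundle = bundlesByKey[key] ||
      (bundlesByKey[key] = { routes: routes.slice(), modules: [] });
    bundle.modules.push(mod);
  });

  return Object.keys(bundlesByKey).map(function (key) {
    return bundlesByKey[key];
  });
}

var exampleBundles = groupBySharedPages({
  page1: ['module1', 'module2', 'module3'],
  page2: ['module1', 'module4'],
  page3: ['module1', 'module2', 'module3']
});
console.log(JSON.stringify(exampleBundles, null, 2));
```

With n pages this can produce up to 2^n - 1 bundles (one per non-empty subset of pages), which is the blow-up mentioned above.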

circlingthesun commented on July 17, 2024

Any progress on this? It's the only thing that is stopping me from moving to jspm ;)

raphaelokon commented on July 17, 2024

@Bubblyworld
Is the dependency object a flat object like:

{
  'page1' : ['module1', 'module2', 'module3'],
  'page2' : ['module1', 'module4'],
  'page3' : ['module2', 'module3']
}

So that given a page key we get a one-dimensional array of deps?

Bubblyworld commented on July 17, 2024

@interactionist Yes, that's what I assume in that example I posted before.

markstickley commented on July 17, 2024

I have been working on a bundling solution using systemjs-builder too, and came up against the same problem identified by @Bubblyworld ... How do you split the middle layer of bundles up so that you don't potentially end up with a zillion bundles each containing one or two modules?

One solution would be to hand-craft the bundles, trying to logically group modules by area and potentially offloading some modules that appear in most entry point trees into the common bundle. However, that would become exponentially harder, and a massive drag to maintain, as the project grows, so it's not really viable.

For our app we have a number of possible entry points but some are more likely than others and regardless of entry point a typical user journey would most likely load most of the entry points anyway. By weighting each entry point and prioritising the most commonly used ones we can bundle with no repetition and still be quite lean in the code loaded initially.

var entryPoints = [
    'src/popularEntryPoint',      // 100% efficiency - no unnecessary code is loaded for this entry point.
    'src/anotherEntryPoint',      // If any of the modules in this dependency tree are also in popularEntryPoint's dependency tree, just load that bundle as well. Since it's the most popular entry point it's quite likely the user will need these files soon anyway.
    'src/yetAnotherEntryPoint',   // Maybe this tree will share modules with popularEntryPoint or perhaps anotherEntryPoint as well.
    'src/leastPopularEntryPoint'  // Etc.
];

This way you may be loading code you don't need for entry points further down the list but the chances are that since those parts of the site / app are more popular the user will need the files soon anyway.

Our optimisation function takes an array like the one above and an integer specifying how many bundles you want. That way you don't necessarily load 187 files when entering via the least popular entry point in a site. Limit it to, say, 8 bundles and it will perform the following steps:

1. If applicable, create a bundle of common modules to all entry points (so long as the number of bundles requested > 2).
2. Create bundles for each entry point in priority order until (number of bundles created) == (bundles requested)-1.
3. Take any remaining entry points and bundle them all into the final bundle.

The function also takes a switch to turn on analysis which spits out some interesting numbers regarding file sizes and entry point code loading efficiency.
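
The three bundling steps listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual implementation being described: `prioritisedBundles` is a hypothetical name, and `trees` is assumed to be a plain object mapping each entry point to a flat array of its traced module names.

```javascript
// Hypothetical sketch of priority-ordered bundling with a bundle limit.
// entryPoints: array of entry point names, most popular first.
// trees: { entryPointName: [moduleName, ...] } (assumed, already traced).
// maxBundles: upper limit on the number of bundles to produce.
function prioritisedBundles(entryPoints, trees, maxBundles) {
  var assigned = Object.create(null);
  var bundles = [];

  // Keep only modules not yet placed in a bundle, and mark them as placed.
  function take(modules) {
    return modules.filter(function (mod) {
      if (assigned[mod]) return false;
      assigned[mod] = true;
      return true;
    });
  }

  // Step 1: a common bundle, only if more than 2 bundles were requested.
  if (maxBundles > 2 && entryPoints.length > 1) {
    var common = trees[entryPoints[0]].filter(function (mod) {
      return entryPoints.every(function (ep) {
        return trees[ep].indexOf(mod) !== -1;
      });
    });
    if (common.length) bundles.push({ name: 'common', modules: take(common) });
  }

  // Step 2: one bundle per entry point, in priority order,
  // until only one slot is left.
  var i = 0;
  while (i < entryPoints.length && bundles.length < maxBundles - 1) {
    bundles.push({ name: entryPoints[i], modules: take(trees[entryPoints[i]]) });
    i++;
  }

  // Step 3: everything still unassigned goes into the final bundle.
  var rest = [];
  for (; i < entryPoints.length; i++) {
    rest = rest.concat(take(trees[entryPoints[i]]));
  }
  if (rest.length) bundles.push({ name: 'remaining', modules: rest });

  return bundles;
}
```

Because each module is assigned to the first bundle that claims it, no module is repeated across bundles; lower-priority entry points simply load the earlier bundles that hold their shared modules.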

What do you think of this method of optimisation? Worth including as an option, even if not the default optimisation function?

I'm happy to take on this feature if no one is already tackling it...

guybedford commented on July 17, 2024

We should probably make the optimization function take the direct load record objects so it can factor in things like code size. Alternatively perhaps we should include the gzipped code size as part of the input vector.

Please definitely take this on if you're interested. The key thing I'd like incredibly well defined is the API of the optimization function like we're trying to do above. Then it would be great to bundle in a default optimization function that was loaded as a separate package, and yours sounds like it is doing some great analysis.

Perhaps let's start by making sure we can properly nail down this bundling API and work from there? Value your ideas.

markstickley commented on July 17, 2024

I agree, there is a lot more data that might be useful for a custom optimization function than what we are considering processing at the moment. Do you think the optimize promise should a) resolve with an object full of trees, or b) just indicate that the bundling process is complete and the files have been written? Option a) would give the user more of an opportunity to further analyse and break down the output, but b) would avoid extra code for what I'd assume to be the common use case.

markstickley commented on July 17, 2024

I've written up some notes in pseudo code form to try and clarify the API around builder.optimize and the custom optimization function.

Would you mind taking a look and letting me know what you think?

// Because you can't rely on the order when enumerating properties on an object,
// the best way to weight them without specifying weight as an attribute somewhere
// is in an array.
builder.optimize([
    ['src/module1', 'src/module2'],
    'src/module3',
    'src/module4'
]).then(...);


// By making the elements of the array objects we can specify a name for the bundle
// which can be used in the file name. It also opens the way for other metadata
// about this bundle that might be useful in custom optimization functions
builder.optimize([
    {
        entryPoint: ['src/module1', 'src/module2'],
        name: "bundle1"
    },
    {
        entryPoint: 'src/module3',
        name: "module3Bundle"
    },
    {
        entryPoint: 'src/module4',
        name: "myBundle"
    }
]).then(...);

// Could give a choice of syntax as the first is easier if you are happy with default bundle names?


builder.optimize(entryPoints, optimizationFunction).then(...);

/**
 * Optional, user-defined function to process the tree data generated from the entry points
 * @param  {[{entryPoint:String|[String], loadRecord:Object, gZippedFilesize:Number, tree:Object[, name:String]}]} data
 *         Array of objects containing the entry points passed into builder.optimize,  any other
 *         data passed into builder.optimize including the bundle name (if provided) and the resulting
 *         tree from a trace operation on the entry point(s).
 * @return {Promise|{bundleName:Tree, ...}} An object of Trees, or a Promise to be resolved
 *         with one, with each tree representing a bundle to be written.
 */
function optimizationFunction(data) {

    // <clever manipulation of trees happens here>

    return {
        bundle1: { /* tree */ },
        module3Bundle: { /* tree */ },
        myBundle: { /* tree */ }
    };

    // Note the bundle names don't have to be used in the returned data structure if the optimization
    // algorithm calls for a different practice. Also, if no names are provided default names will be
    // used here (bundle0, bundle1, bundle2, etc).
}


builder.optimize(entryPoints).then(function(data) {
    // data is the same data as the data returned from the optimizationFunction
    // (this is where I get a bit unsure about things - should we require the developer to take this
    // step or should it be an implicit part of the optimize process? If so that's more data to pass
    // in up front. I'm leaning towards this step since they might like to inspect / fiddle with the
    // output before writing it).
    builder.writeTrees(data, { minify: true, sourceMaps: true, uglify: false });

});
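
To make the "choice of syntax" idea above concrete, here is a minimal sketch of how both input forms might be normalised into one shape before tracing. `normalizeEntryPoints` is a hypothetical helper, not part of systemjs-builder; the default names follow the bundle0, bundle1, ... convention mentioned in the notes.

```javascript
// Hypothetical helper: accepts either plain entries (a string or an array of
// strings) or { entryPoint, name } objects, and returns a uniform list.
function normalizeEntryPoints(entryPoints) {
  return entryPoints.map(function (entry, index) {
    var isPlain = typeof entry === 'string' || Array.isArray(entry);
    var spec = isPlain ? { entryPoint: entry } : entry;
    return {
      // Always store the entry point(s) as an array of module names.
      entryPoint: [].concat(spec.entryPoint),
      // Fall back to a positional default name when none is given.
      name: spec.name || 'bundle' + index
    };
  });
}

console.log(normalizeEntryPoints([
  ['src/module1', 'src/module2'],
  'src/module3',
  { entryPoint: 'src/module4', name: 'myBundle' }
]));
```

Because the array order is preserved, the index can still double as the weighting described in the first comment of the notes.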

guybedford commented on July 17, 2024

@markstickley thanks for this. I think the way we would do weighting and entry point priorities is through a special options object that gets passed through to the optimizer. This way any custom optimization options can be passed through without us needing to characterize everything.

Also it would be nice to have a single optimize call that does all the writing to bundles, with a memory mode just like we have for the current build.

Here is an adjusted API I'd suggest based on the points mentioned:

builder.optimizeBuild({
  optimizationFunction: require('custom-optimizer'),
  entryPoints: {name: modules},
  optimizationOptions: { /* ...custom optimize variables, entry point priorities etc... */ },
  outPath: 'out/folder', // optional, if not set returns source as memory compilation
  sourceMaps: true
  // etc other options
})
.then(function(bundles) {
  //bundles[0] = {
  //  name: 'bundle-1', // name of the bundle in the outpath
  //  entryPoints: [], // entry points this bundle is loaded for
  //  modules: [], // modules in this bundle
  //  source: // provided if in memory mode and no outPath given
  //}
});

The builder then does a full trace of all the modules of all the entry points.
It passes the direct trace object through to the optimize function, so it has direct access to all the metadata on the load records. This way it can do source size checking etc, completely on its own.

function optimize(entryPoints, trace, optimizationOptions) {
  // entryPoints is exactly as above
  // optimizationOptions is exactly as above
  // trace is an object hash containing all traced dependencies of the modules
  // of the entry points and all their dependencies.
  // trace['some/module'] = load
  // where load is the direct trace load record from the loader
  // the optimize function returns the bundles object as above, except without source set
}
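
As an example of the kind of source size checking an optimizer could do with direct access to the trace, here is a small sketch that computes a gzipped size per module using Node's built-in zlib. `gzippedSizes` is a hypothetical helper, and it assumes each load record exposes the module source as a `source` string, which should be checked against the actual trace output.

```javascript
var zlib = require('zlib');

// Hypothetical helper: returns { moduleName: gzippedByteLength }.
// Assumes trace[name].source is a string holding the module source.
function gzippedSizes(trace) {
  var sizes = {};
  Object.keys(trace).forEach(function (name) {
    var source = trace[name].source || '';
    sizes[name] = zlib.gzipSync(Buffer.from(source, 'utf8')).length;
  });
  return sizes;
}

var exampleSizes = gzippedSizes({
  'some/module': { source: 'export var answer = 42;\n' },
  'another/module': { source: new Array(201).join('console.log("hi");\n') }
});
console.log(exampleSizes);
```

Sizes computed per module are only an approximation of the final bundle size, since gzip compresses a concatenated bundle better than the sum of its separately compressed parts.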

Let me know how that looks to you!

guybedford commented on July 17, 2024

Ideally we could just turn bundling into a linear constraint problem and use something like https://github.com/slightlyoff/cassowary.js?

markstickley commented on July 17, 2024

@guybedford Yes that looks very sane. I'll set about implementing that, cheers!

Just one thing, I'm not 100% clear what the return value of the optimization function should look like. Are you able to clarify?

Thanks :)

guybedford commented on July 17, 2024

I guess it would be -

bundles = [{
  name: 'bundle-1', // name of the bundle in the outpath
  entryPoints: [], // entry points this bundle is loaded for
  modules: [], // modules in this bundle
}]
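
For illustration, a simple two-layer optimization function returning that shape might look something like the sketch below. It follows the proposed `optimize(entryPoints, trace, optimizationOptions)` signature, but the details are assumptions: `twoLayerOptimize` and `collectTree` are hypothetical names, `entryPoints` is assumed to map a name to a module name (or array of them), and each load record is assumed to list already-normalized dependency names in `deps`.

```javascript
// Walk the trace from one module, collecting it and its dependencies.
// Assumes trace[name].deps is an array of normalized module names.
function collectTree(name, trace, seen) {
  if (seen[name] || !trace[name]) return seen;
  seen[name] = true;
  (trace[name].deps || []).forEach(function (dep) {
    collectTree(dep, trace, seen);
  });
  return seen;
}

// Hypothetical two-layer optimizer: one common bundle plus one per entry point.
function twoLayerOptimize(entryPoints, trace, optimizationOptions) {
  var names = Object.keys(entryPoints);
  var trees = {};
  names.forEach(function (name) {
    var seen = Object.create(null);
    [].concat(entryPoints[name]).forEach(function (mod) {
      collectTree(mod, trace, seen);
    });
    trees[name] = Object.keys(seen);
  });

  // Modules present in every entry point's tree go into the common bundle.
  var common = names.length > 1 ? trees[names[0]].filter(function (mod) {
    return names.every(function (name) {
      return trees[name].indexOf(mod) !== -1;
    });
  }) : [];

  var bundles = [];
  if (common.length) {
    bundles.push({ name: 'common-bundle', entryPoints: names.slice(), modules: common });
  }
  names.forEach(function (name) {
    bundles.push({
      name: name + '-bundle',
      entryPoints: [name],
      modules: trees[name].filter(function (mod) {
        return common.indexOf(mod) === -1;
      })
    });
  });
  return bundles;
}
```

This ignores `optimizationOptions` entirely; a smarter optimizer would use it for things like entry point priorities.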

guybedford commented on July 17, 2024

Alternatively, we could return an object keyed by bundle name instead?

markstickley commented on July 17, 2024

I think I like the name attribute better, as if the names are omitted it will use a default naming convention. While they will still all have names in the end, keying off those names is less useful if you don't know what they will be in advance.

Since entryPoints is not optional, I think perhaps it should come outside the opts variable in builder.optimizeBuild. Perhaps the same for optimizationFunction? The problem with a default optimization function is if it requires extra information (like entry point weighting) it makes using optimizeBuild much less intuitive. If we require an optimization function to be specified then there can be extra instructions based on which one you choose.

guybedford commented on July 17, 2024

All good points! So do you think something like -

builder.optimizeBuild(entryPoints, require('custom-optimizer'), {
  // not handled by builder - 
  // custom configuration options needed by the optimizer get passed through
  entrypointPriorities: {}, 
  // builder-specific options:
  outPath: 'out/folder', // optional, if not set returns source as memory compilation
  sourceMaps: true
  // etc other options
})

markstickley commented on July 17, 2024

Looks good to me :)

markstickley commented on July 17, 2024

Couple of questions.

Is there a standard way of generating systemjs config or a standard place to write it to? Currently in the implementation I've been working on it writes bundles and depCache config to config.js in the same output folder as the rest of the built files.

Also, the work I've done has the option of outputting metadata about each bundle and the bundling process as a whole. In the current specification for the optimization function's return value, data pertaining to the process as a whole isn't supported. In order to avoid complexity do you think it would be better to return an instance of an object that conforms to an API? That way you could have (for example) returnValue.getNames(), returnValue.getBundles(), returnValue.getBundle(name), returnValue.getEntryPoints(name), returnValue.getModules(name) as well as any custom extensions like returnValue.getBundleEfficiency(name) etc.
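
To show what such a return object could look like, here is a rough sketch. `createResult` is a hypothetical factory; the accessor names are the ones suggested above, and custom extensions are simply copied onto the result.

```javascript
// Hypothetical factory wrapping a bundles array in the accessor API above.
// bundles: [{ name, entryPoints, modules }, ...]
// extras: optional object of custom methods or values added by an optimizer.
function createResult(bundles, extras) {
  var byName = Object.create(null);
  bundles.forEach(function (bundle) {
    byName[bundle.name] = bundle;
  });

  var result = {
    getNames: function () {
      return bundles.map(function (bundle) { return bundle.name; });
    },
    getBundles: function () { return bundles.slice(); },
    getBundle: function (name) { return byName[name]; },
    getEntryPoints: function (name) { return byName[name].entryPoints; },
    getModules: function (name) { return byName[name].modules; }
  };

  // Custom extensions, e.g. getBundleEfficiency, are attached as given.
  Object.keys(extras || {}).forEach(function (key) {
    result[key] = extras[key];
  });
  return result;
}
```

An optimizer could then attach process-wide data without changing the per-bundle structure, for example `createResult(bundles, { totalSize: 12345 })`, where `totalSize` is a made-up field.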

Thanks!

guybedford commented on July 17, 2024

@markstickley yes there is currently only one config which is changed from dev to production. Having a special "production config" being created is an issue being tracked at #67.

In terms of the additional metadata, we could possibly allow the individual bundle entries to be overloaded? Or does that not cover everything you need?

markstickley commented on July 17, 2024

Ah, #67 looks like what we need here to create/update the config file. Since all the config code is in jspm-cli, is it OK for builder to depend on that? I would think that config specific to bundles would most likely be production config. I think that means this task has a dependency on #67, would you agree?

As for the additional metadata, overloading the bundle entries is fine but if there is metadata about the build as a whole (speed, efficiency, size etc) I'm wondering where that should go. Technically you could break down those figures and add them to each bundle to be totted up later but it's more of a convenience thing.

guybedford commented on July 17, 2024

@markstickley I was trying to create an API that didn't have configuration assumptions as a dependency. If we want to see the config as part of the system, then perhaps we should reconsider the API itself. I can go either way on this though: on the one hand it's about specifying how users would want the config to be generated for pages, on the other hand it's about characterizing the minimal optimization build. I was trying to take the shortest definite path first, specifying a comprehensive optimization that can be generalized as part of a great config process later on, to avoid having to do all that spec work upfront, because chances are I'd get it wrong. Note that builder is designed to be wrapped in other tools while it doesn't have the high-level APIs, so there's no problem with that either.

At the same time, perhaps creating this arbitrary API boundary is the wrong framing as well, and we should be looking at more comprehensive solutions. I'm not sure.

Perhaps we just make bundles a property of the output object, with other properties allowed then? Since this is what the optimization function returns it could then choose to add its own extra properties.

markstickley commented on July 17, 2024

@guybedford Hmm it's a difficult balance. Here's what I suggest:

1. By default, write config.js to the same folder as the bundles. If config.js already exists then append to it, otherwise create it.
2. Extend the output object to also include a config property which has depCache and bundles arrays that can be added to an existing config.js.
3. Provide another function in the builder API which can be bound with paths for bundles and config.js and used as the callback in optimizeBuild().then();

builder.optimizeBuild(entryPoints, require('custom-optimizer'))
.then(builder.writeOptimizedBuild.bind(builder, {
    bundlesPath: 'built',
    configPath: '.'
}));

That way the relative path for the bundles can be calculated as well

System.bundles['relative/path/to/bundle'] = ["pages/page1","utils/myUtil"];
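
A minimal sketch of how that relative path calculation might work, using Node's path module. `bundlesConfig` is a hypothetical helper (not an existing builder function), and it assumes each bundle has a `name` and a `modules` array as in the return value discussed earlier.

```javascript
var path = require('path');

// Hypothetical helper: builds the object to assign to System.bundles,
// keyed by each bundle's path relative to the folder holding config.js.
function bundlesConfig(bundles, bundlesPath, configPath) {
  var config = {};
  bundles.forEach(function (bundle) {
    var relative = path.relative(configPath, path.join(bundlesPath, bundle.name));
    // Use forward slashes regardless of platform, since these are module paths.
    config[relative.split(path.sep).join('/')] = bundle.modules.slice();
  });
  return config;
}

console.log(bundlesConfig(
  [{ name: 'page1', modules: ['pages/page1', 'utils/myUtil'] }],
  'built',
  '.'
));
```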

markstickley commented on July 17, 2024

Just for clarity, I wrote out the spec again taking into account the discussions. Please shout if anything looks wrong.

var entryPoints = {
  bundle1Name: 'path/to/bundle1/entryPoint',
  bundle2Name: ['path/to/bundle2/entryPoint', 'path/to/anotherEntryPoint/includedIn/bundle2'],
  bundle3Name: ['path/to/bundle3/entryPoint']
};
// Note: entryPoints can also be an array if you are happy to use default bundle names

builder.optimizeBuild(entryPoints, require('custom-optimizer'), {
  // builder-specific options:
  outPath: 'out/folder', // optional, if not set returns source as memory compilation
  sourceMaps: true,
  uglify: true,
  minify: true,
  // etc other options
  // custom configuration options not handled by builder but needed by the optimizer get passed through:
  entrypointPriorities: {},
  // etc...
}).
then(function(optimizedData) {
  console.dir(optimizedData);
/*
{
  bundles: [
    {
      name: 'bundle1Name',
      entryPoints: 'path/to/bundle1/entryPoint',
      modules: ['path/to/dependency1','path/to/dependency2'], // modules in this bundle
      source: '...' // provided if in memory mode and no outPath given
    },
    ...
  ],
  config: {
    depCache: {
      'path/to/bundle1/entryPoint': ['path/to/dependency1','path/to/dependency2']
    },
    bundles: {
      'out/folder/bundle1Name': ['path/to/bundle1/entryPoint','path/to/dependency1','path/to/dependency2']
    }
  }
}
*/
});


/**
 * Optional, user-defined function to process the tree data generated from the entry points
 * @param {{String|String[]}|[String|String[]]} entryPoints Object or array of entry points (strings or arrays of strings)
 * @param {Object} trace Full trace data mapping to the entry points
 * @param {Object} optimizationOptions Custom variables that can be required or optional for this optimization function
 * @return {Promise|{bundleName:Tree, ...}} An object of Trees, or a Promise to be resolved
 *         with one, with each tree representing a bundle to be written.
 */
function optimizationFunction(entryPoints, trace, optimizationOptions) {

  // <clever manipulation of trees happens here>

  return {
    bundles: [{
      name: 'bundle-1', // name of the bundle in the outpath
      entryPoints: [], // entry points this bundle is loaded for
      modules: [], // modules in this bundle
    },
    ...
    ],
    config: {
      depCache: {
        ...
      },
      bundles: {
        ...
      }
    }
  };

  // Note the bundle names don't have to be used in the returned data structure if the optimization
  // algorithm calls for a different practice. Also, if no names are provided default names should be
  // used here (bundle0, bundle1, bundle2, etc).
}

guybedford commented on July 17, 2024

@markstickley yes that looks correct. Note that the config output is not from the optimization function but added within SystemJS itself. I'm actually looking at adding a similar output for the single-file build so would be good to align with that. See #67 (comment).

guybedford commented on July 17, 2024

Also note the entry points don't refer to bundle names but actual entry point names.

guybedford commented on July 17, 2024

Entry points should never be arrays I don't think - it's against the definition of an entry point.

guybedford commented on July 17, 2024

I guess it could make sense though.

markstickley commented on July 17, 2024

I've created a public gist for the spec so it's easier to edit and keep track of changes! https://gist.github.com/markstickley/c1bc6663cbe36bc0d46e

I agree that entry points shouldn't be arrays but I was trying to incorporate the original suggestion at the top of this thread. I'll remove support for that.

The entry points don't actually refer to bundle names but the name of the entry point can be used when creating the bundle filenames. But they don't have to be so I'll change that to make it clearer.

Thanks!

guybedford commented on July 17, 2024

@markstickley perfect thanks for summarizing, it looks like we're on the same page with that.

markstickley commented on July 17, 2024

Quick update - things are going well! Just writing some tests.

Should I include a default optimization function within systemjs-builder or should all optimization functions live somewhere else externally? If internally, should I make it possible to run without specifying a function (so it uses a default) or should it be specified explicitly?

Finally, I'm still not sure what to do with the SystemJS config. Currently it's being written to the same location as the bundles but I realise this may not be ideal. Any advice on this? Cheers!

guybedford commented on July 17, 2024

@markstickley that is awesome to hear!

What sort of algorithm are you planning to have the default optimization function follow?

For the config, I wouldn't do anything here. baseURL is dist-specific. After that the only map configuration one needs in production is the configuration needed specifically for dynamic loading (I've changed bundles to use normalized IDs only to allow this in the next release). That is, the bundles config object only. Perhaps this could even be inlined into the page-specific bundles given that we know page ordering.

guybedford commented on July 17, 2024

@markstickley thanks so much for your efforts on this to date. I think an optimization may be best handled through a wrapper around this project, and it maintains responsibilities much better anyway.

wclr commented on July 17, 2024

Use http://stealjs.com!
