
justinmeiners / efficient-programming-with-components

Course notes for Alexander Stepanov's teachings on design and usage of C++ STL.

Home Page: https://www.jmeiners.com/efficient-programming-with-components/

HTML 84.15% C++ 14.30% Shell 0.46% Common Lisp 0.42% CSS 0.67%
abstract-algebra cpp cpp-concepts generic-programming history stl

efficient-programming-with-components's People

Contributors

0xadb, aharrison24, divinedominion, justinmeiners, petter-holmberg, rpendleton


efficient-programming-with-components's Issues

Include simple example of binary counter

I also think it would be very valuable to have a slightly extended example of how the counter works, where multiple elements are inserted. You could do an example where the elements are characters and the binary operator is min, for instance. 'zero' could be a space character. You could push in seven letters, so that multiple slots are occupied at the end (the occupied slots simply follow the binary representation of the count, and 7 is 111 in binary). You can use that to show that the counter only needs log2(8) = 3 slots.

You can also show that after all the elements have been inserted, the job isn't yet finished. This justifies the need for the reduce_counter operation, which performs all the remaining comparisons to get to a single 'winner'. I think that's important, because as the text stands, the reduction step is never really justified. It's just presented as something that is necessary.
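For what it's worth, here's a rough self-contained sketch of the kind of worked example I mean. It deliberately does not use the course's actual binary_counter interface; the slot/carry handling below is just my own guess at a minimal version:

    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Toy binary counter: 'zero' marks an empty slot, min plays a "match".
    // (My own sketch of the idea, not the course's binary_counter interface.)
    char add_to_counter(std::vector<char>& counter, char carry, char zero) {
        for (char& slot : counter) {
            if (slot == zero) { slot = carry; return zero; }  // empty slot found
            carry = std::min(slot, carry);  // existing occupant plays the carry
            slot = zero;                    // clear the slot and carry upwards
        }
        return carry;  // would overflow; won't happen with enough slots
    }

    char reduce_counter(const std::vector<char>& counter, char zero) {
        char result = zero;
        for (char slot : counter) {
            if (slot == zero) continue;
            result = (result == zero) ? slot : std::min(slot, result);
        }
        return result;
    }

    int main() {
        const char zero = ' ';
        std::vector<char> counter(3, zero);  // 3 slots cover up to 7 insertions
        for (char c : {'m', 'q', 'c', 'x', 'h', 'z', 'k'})
            add_to_counter(counter, c, zero);
        // After 7 insertions all three slots are still occupied...
        std::cout << "slots: [" << counter[0] << counter[1] << counter[2] << "]\n";
        // ...so reduce_counter still has comparisons left to do.
        std::cout << "min:   " << reduce_counter(counter, zero) << "\n";
    }

With seven insertions all three slots end up occupied, so the final reduce_counter step is visibly doing real work (two more comparisons in this case).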

Some confusing things in Chapter 8

While I was reading chapter 8, I was confused by a couple of things. Others might not find them confusing, but I thought it was worth mentioning them.

Section headings that are lisp keywords

I was initially confused by a section heading that was just the word 'Car'. Because of the capitalization, I didn't associate it with the car operation. Could you break the style guide for these, to keep them lower case, or format them as code? (see below for a potentially better suggestion)

list_type is not defined until very late

list_type gets used 15 times before it is actually defined. That was super confusing to me, because I naturally assumed that it must be some non-trivial class type.

It would have been very helpful to know that list_type is just an integer representing an index into the node vector. Members of node_t are referenced a number of times before they are actually defined, too.

I assume the order you've got things in is the order in which Alex introduces them. But if possible it would be really helpful to see list_type, node_t, pool and free_list defined early on. They seem like they would fit quite naturally into the code block where the list_pool class is introduced:

    #include <vector>

    template<typename T, typename N>
    // T is semi-regular.
    // N is integral
    class list_pool {
        typedef N list_type;

        struct node_t {
          T value;
          N next;
        };

        std::vector<node_t> pool;
        list_type free_list;

        // ...
    };

list_pool methods

On first reading, it wasn't clear to me that value, next, free, etc. were actually member functions of list_pool, partly because I didn't know what list_type was, so they could easily have been free functions.

In my chapter 8 typos PR (#27), I tried to clarify things by replacing this:

Now we are going to implement cons, car, cdr, and free, but we need appropriate names for a younger generation.

with this:

Now we are going to implement cons, car, cdr, and free as member functions of list_pool, but we need appropriate names for a younger generation.

It might help further to adjust the section headings to read like "### Implementing a member function to represent car". That would also serve to solve the heading capitalization issue I mentioned above.

rplaca / set-car confusion

I've not really played with lisp, so I'd not come across rplaca and rplacd before. It was confusing to me that the hyperlinks for both of them went to a single thing called 'set-car'. It was only after Googling that I discovered the names were short for 'replace address part' and 'replace decrement part'.

I think it would be really helpful to have a footnote or something explaining that:

  • rplac is short for 'replace'
  • Lisp and Scheme use different keywords for these operations(???)

Later in the chapter there is a piece of Scheme code that uses set-cdr, but the corresponding Lisp code doesn't use rplacd. I still don't understand that.

Chapter 10 suggestions

I'm about halfway through chapter 10. There are some suggestions I'd like to make about the material up to this point, because I hit a brick wall in my understanding. I've not read the second half, so apologies if my questions are answered later on. If they are, then that probably suggests that a reordering might be appropriate.

Section reordering

I think it would help the narrative flow to reorder the sections a bit. I'm thinking:

  • Start with the naive divide and conquer approach.
  • Then talk about the unbalanced tree shape of min_element.
  • Then talk about the balanced tree approach.

They won't swap totally cleanly -- there would need to be a bit of smoothing, but it should be fairly minimal. As it stands, I found the chapter hard to follow because it wasn't clear to me where the description was headed. It starts off talking about balanced tournament trees (the 'correct' approach), then moves on to 'unoptimal divide and conquer' and then to the even-worse unbalanced tree. That seems backwards to me.

Divide and conquer section

I was initially a bit confused by the reasoning about the number of comparisons required for the divide and conquer approach. The algorithm is first introduced as if the full list of players is recursively subdivided (i.e top-down). But the numbered list counting up the comparisons is in a bottom-up order, where we start with pairs of players in the first 'round'. Essentially the recursion has already bottomed-out and we're making our way back up the call stack. I think it would be useful to make that distinction a bit more explicit if possible. Say something like "The list of players is recursively subdivided until there are only two players in each list. It's trivial to find the best and second-best players for this base case... etc...". Obviously it would need to be Alexified, but I hope you understand what I'm getting at.
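To make the counting concrete (my numbers, not from the lecture): with 8 players, the bottom-up rounds cost 4 + 2 + 1 = 7 comparisons to find the winner. The winner only played ceil(log2(8)) = 3 games, so the second-best player must be one of those 3 opponents, and finding the best of them costs 2 more comparisons: 9 in total, i.e. n + ceil(log2(n)) - 2.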

Element 'power'

Originally there was a sentence saying "We can define the power of each element to be the number of games they have played.". The use of 'element' here was jarring, as the surrounding text is talking about 'players'. In my PR I suggested replacing the word 'element' with 'player'. But I still find this sentence a bit problematic. This notion of 'power' seems to come out of nowhere, and it's not obvious at this point in the text why it's even necessary. I was thinking "Why can't we just talk about the number of games they have played? Why do we need a special term?". Also, is this the number of games they have played so far, or the number of games they played before they were knocked out of the tournament? A bit more detail would be welcome.

Commutativity footnote

The commutativity footnote gives a good example with multiplication, then goes on to describe a proof of something from Dirichlet. The sentence starting "This fact about integers can be proven..." was confusing, as I wasn't sure what fact it was referring to. It wasn't clear to me what that proof is trying to show, or what value it adds to the footnote.

The algorithm that wasn't quite an algorithm

The sentence "The algorithm wasn't quite an algorithm and it was clearly not optimal." is frustratingly imprecise. I'm not really sure what it means. Were the steps described in the paper not complete, not correct, or what? At least can we avoid the overloaded use of the word 'algorithm'?

Singleton bits

The 'Binary counting and reduction' section is where I lose the thread. This part in particular:

We can create a counter and in every bit of this counter we're
going to keep a singleton. In each bit we keep the person who had n victories.

Firstly I'm not sure what is meant by 'singleton'. Obviously it has a very specific meaning in programming, but I suspect that's not it. I assume it means 'a single integer value' here?

We have the notion of a 'counter' which to me means 'an integer that counts up'. But somehow the 'bits' of this integer are actually going to contain an integer ID representing a particular player? How is that even possible?

Looking at the ASCII diagrams that follow I see that the 'counter' is actually more like a list of integers. But the only example contents are 0, x and y, so it's a bit hard to generalise and understand how it is used. Presumably x and y are actually integer IDs?

The later paragraphs about laziness and numerical analysis don't really help to clarify, as they are quite abstract.

I'd really like to see a clear explanation at the start of the section about what we're aiming to do, before we dive in to the details. It would be good to clarify what is meant by the terms 'singleton' and 'counter', and it would be helpful to have a more concrete example that uses actual player IDs rather than variable placeholders (x and y).
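For instance, something like this made-up trace of my own (with 'smaller ID wins' standing in for the real comparison) would have helped me:

    insert 7:  counter = [ 7,  _,  _ ]
    insert 3:  7 vs 3 -> 3 carries;             counter = [ _,  3,  _ ]
    insert 8:                                   counter = [ 8,  3,  _ ]
    insert 5:  8 vs 5 -> 5; then 3 vs 5 -> 3;   counter = [ _,  _,  3 ]
    insert 2:                                   counter = [ 2,  _,  3 ]
    reduce:    2 vs 3 -> 2 is the overall winner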

I realise this probably goes way outside of the way Alex described things in the lectures, but I really struggled with the way things are explained currently.

Suggestions for chapters 8 and 9

Chapters 8 and 9 introduce list_pool and list_pool_iterator. A few things confused me while I read those chapters. I've listed them here in case they are useful:

Examples of using the list_pool class

list_pool has an unusual API compared to what we're used to from the STL. It was not immediately obvious to me how I was supposed to use it. I eventually discovered test_merge_sort.cpp which generates a list via the push methods on the iterator. But there are no examples of how to use the most important member functions of list_pool. And I wasn't expecting to need to read test_merge_sort.cpp when I was only on Chapter 8.

Perhaps it would help to give some brief examples in the chapters themselves? It's difficult to come up with concise examples, as the API seems to be designed to be used mostly via algorithms like generate_list. Here's my best stab at code using the list_pool API directly.

Chapter 8

This uses the queue operations. Without them I guess you have to keep calling allocate, which is probably a bit too low level.

    #include <iostream>

    // list_pool itself comes from the chapter 8 code in the course notes.

    int main() {
        list_pool<char> pool;
        list_pool<char>::pair_type q = pool.empty_queue();

        // push_xxx does mutate the pool, but not the queue. So you must
        // remember to update the queue object yourself.
        q = pool.push_back(q, 'c');
        q = pool.push_back(q, 'd');
        q = pool.push_back(q, 'e');
        q = pool.push_front(q, 'b');
        q = pool.push_front(q, 'a');
        q = pool.push_back(q, 'f');

        // If you want to print the list without using the list_iterator then
        // I think this is the only way? But at least it demonstrates how to
        // use `value` and `next`. Should print: a b c d e f
        list_pool<char>::list_type node = q.first;
        while (node != pool.end()) {
            std::cout << pool.value(node) << " ";
            node = pool.next(node);
        }

        free_list(pool, q.first);
    }

Chapter 9

Using iterators to populate and print a list. Using push_back is more challenging, as you need to increment the insertion point each time, which gets messy. It's much neater to use something like the generate_list function from list_algorithm.h.

    #include <algorithm>
    #include <iostream>
    #include <iterator>

    // list_pool and its iterator come from the chapter 8 and 9 code in the notes.

    int main() {
        list_pool<char> pool;

        list_pool<char>::iterator nil = list_pool<char>::iterator(pool);
        list_pool<char>::iterator list = nil;

        // push_front mutates `list` so that it refers to the new head node.
        push_front(list, 'd');
        push_front(list, 'c');
        push_front(list, 'b');
        push_front(list, 'a');

        // Should print: a b c d
        std::copy(list, nil, std::ostream_iterator<char>(std::cout, " "));
    }

Why there is no pop_back

The list_pool API contains push_front and push_back, but only pop_front and not pop_back. That lack of symmetry seemed odd until I realised that you can't implement pop_back efficiently for a singly-linked list. A note to that effect might be worthwhile somewhere in Chapter 8?
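For illustration, a pop_back would have to look something like this sketch (my own code, guessing at some of the member signatures), which is O(n) per call:

    // Hypothetical pop_back, just to show the cost (not code from the repo).
    // To remove the last node we must find the node *before* it, which means
    // walking the list from the head: O(n), whereas pop_front is O(1).
    template <typename T, typename N>
    void pop_back(list_pool<T, N>& pool,
                  typename list_pool<T, N>::list_type head) {
        if (head == pool.end() || pool.next(head) == pool.end())
            return;  // ignore empty and single-node lists for brevity
        typename list_pool<T, N>::list_type node = head;
        while (pool.next(pool.next(node)) != pool.end())
            node = pool.next(node);     // walk to the second-to-last node
        pool.free(pool.next(node));     // release the last node back to the pool
        pool.next(node) = pool.end();   // unlink it (assumes next() returns a reference)
    }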

pop_front does not free the removed elements

I'm still not sure whether that's a bug or whether it is by design. I suppose it means you end up with two lists sharing a common tail sequence, assuming you kept hold of your original head node. This isn't really something actionable in the write-up, but it is a thing that still bothers me.

Singly Linked List Iterator concept

I find this concept unsettling. I've not come across iterators that mutate the structure of the underlying data structure before (have I?). These iterators do a lot more than act as 'coordinates'!

The fact that the push_front and push_back friend functions inside list_pool_iterator.h return void even though they allocate new nodes is also odd. As it stands, push_front mutates the iterator, but push_back does not. I think it would be more useful for both of them to return an iterator to the newly created node, or something like that. That would make them a little easier to use correctly, and make the generate_list function easier to write.

But who am I to doubt Alex Stepanov? Would be interested to hear your thoughts on why it's designed this way.

No indirect links without explanation

When linking off to a page with more information, a footnote should be provided giving a brief explanation of the link. This does not apply when the link directly matches the text and the connection is obvious. For example, the name of a school or person's wikipedia page can usually be linked to directly.

Chapter 4 `partial_sort` section could be a bit clearer

I totally understand and support the idea of keeping Alex's idiomatic way of speaking, but I wonder if a little clarification would help in the partial_sort section of chapter 4. It says:


  • std::partial_sort

    For example, you give it 100 elements and you sort from 1st to the
    10th to the last. But, for the last 90 elements there is no guarantee.
    Those of you who work on search, know you don't really need to sort everything,
    you just need to sort a little bit.


The video section corresponding to this is at https://youtu.be/VelLby6K2jQ?t=1305

  • Alex does mention that partial_sort takes 3 iterators, which is not reflected in the text.
  • On its own, "you sort from 1st to the 10th to the last" is not very clear. I know it's exactly what he said, but of course the text lacks the context and the hand gestures from the video.
  • It is one of the speakers who says "for the last 90 elements there is no guarantee". Alex corrects him and points out that there is a guarantee: the last 90 elements will appear in some order.
  • Alex also mentions that partial_sort can be used as a complete sort.

I don't know what your policy is for putting words in Alex's mouth, but I wonder if something like the following would be clearer?


  • std::partial_sort

    partial_sort takes 3 iterators as parameters. For example, you give it 100 elements and you pass iterators to the 1st, 11th and last+1 elements. The smallest ten elements will be moved to the front in sorted order. But for the last 90 elements the only guarantee is that they will appear in some order.
    Those of you who work on search, know you don't really need to sort everything,
    you just need to sort a little bit.
    You can use partial_sort as a complete sort by setting the second argument the same as the third, but it won't be as efficient as sort.


I'm not convinced it's reasonable (and I know my suggested text is super clunky as it stands), which is why I've raised an issue instead of a PR. Perhaps a footnote would be better?
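For reference, here's a minimal example of the three-iterator call being described (my own example, not from the lecture or the notes):

    #include <algorithm>
    #include <iostream>
    #include <vector>

    int main() {
        std::vector<int> v(100);
        for (int i = 0; i < 100; ++i) v[i] = 100 - i;   // 100, 99, ..., 1

        // Three iterators: first, middle, last. The smallest 10 elements end up
        // in v[0..9] in sorted order; the other 90 are left in some unspecified order.
        std::partial_sort(v.begin(), v.begin() + 10, v.end());

        for (int i = 0; i < 10; ++i) std::cout << v[i] << ' ';  // 1 2 3 ... 10
        std::cout << '\n';

        // Passing the same iterator as middle and last gives a complete sort,
        // though std::sort is the better tool for that.
        std::partial_sort(v.begin(), v.end(), v.end());
    }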

Warnings on epub build with pandoc

I built an epub version of your notes with the command line below and it succeeded, but it also gave me some warnings.
I opened this issue in case you want to investigate, but I am happy with the result.
I took base.css from https://github.com/onlurking/category-theory-for-programmers.
Note that with the command line below, links to cpp source files are broken, because they are not included in the final epub.

pandoc --toc --toc-depth=2 --epub-metadata=metadata.xml --css=base.css --highlight-style pygments --epub-cover-image=img/cover.png -o build/epub/efficient-programming-with-components.epub title.txt 00_foreword.md 01_data_structures.md 02_regular_types.md 03_singleton.md 04_instrumented.md 05_swap.md 06_min_max.md 07_min_range.md 08_lisp.md 09_iterators.md 10_binary_counter.md 11_min_1_2.md 12_merge_sort.md 13_searching.md 14_binary_search.md 15_merge_inplace.md 16_optimizing_stable_sort.md 17_adaptive_merge_sort.md 18_binary_insertion_sort.md 19_insertion_sort.md
[WARNING] Note with key 'about-bjarne' defined at 02_regular_types.md line 287 column 1 but not used.
[WARNING] Note with key 'design-of-cpp' defined at 18_binary_insertion_sort.md line 138 column 1 but not used.
[WARNING] Note with key 'paul' defined at 14_binary_search.md line 188 column 1 but not used.
[WARNING] Note with key 'shift-vs-divide' defined at 14_binary_search.md line 200 column 1 but not used.
[WARNING] Note with key 'without-goto' defined at 12_merge_sort.md line 439 column 1 but not used.
[WARNING] Duplicate link reference '[knuth]' at 10_binary_counter.md line 68 column 1
[WARNING] Duplicate note reference 'eop' at 13_searching.md line 361 column 1
[WARNING] Duplicate note reference 'pc-info' at 16_optimizing_stable_sort.md line 83 column 1
[WARNING] Duplicate note reference 'pc-info' at 17_adaptive_merge_sort.md line 187 column 1
[WARNING] Duplicate link reference '[moore]' at 19_insertion_sort.md line 42 column 1

Chapter 7 comment about gzip could do with clarification

In the stdin footnote in chapter 7, it says:

This is one of the reasons why gzip is so useful.
It compresses data on the fly, without being able to see the incoming data,
or re-analyzing what came before.

It wasn't totally clear to me what this was trying to convey. Superficially 'It compresses data on the fly, without being able to see the incoming data...' doesn't seem to make sense, because of course it must be able to see the input data at some point! It looks like the intention was perhaps to say that gzip only needs to see the current byte, and that it can do some compression without yet knowing what future bytes are going to be encountered.

If that interpretation is correct then it's probably a bit problematic, because as I understand it gzip internally buffers input data into fixed-size blocks (up to 64kB each) and only performs compression for each block once all the data for that block is available. Fundamentally you cannot compress data until you've seen enough of it to identify patterns and therefore come up with a more efficient encoding. As such, the output from gzip will come in chunks. The size of each chunk will be proportional to the entropy present in the corresponding input block. At that point the InputIterator analogy gets a bit strained!

Perhaps this paragraph could be clarified, or alternatively deleted as I'm not sure how much value it brings.
