Create an initial outline plan and post it here. It will be updated and added


Thesis Structure about thesis (open, 3 comments)

beepy0 commented on September 25, 2024

Thesis Structure

from thesis.

Comments (3)

beepy0 commented on September 25, 2024

Declaration of Authorship (Erklärung)
Abstract
Table of Contents
Introduction (4 pages)

Problem Statement - in a similar fashion to the introduction of the exposé
Motivation - sketches; why use sketches
Outline - overview of the whole work / what the following chapters cover

Background (8 pages)

Stream Processing; Distributed Query Processing
Sketches - history; theoretical details; advantages over other solutions; different types of sketches
Algorithms - AGMS and FastAGMS explained + visualization/pseudocode; runtime and space complexity
Vectorization
Related Work - papers that have tested Count-Min or AGMS/FAGMS in some setting are good examples. If there are not enough direct examples, I could stick to more general information, but should keep it brief.
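A minimal C++ sketch of both algorithms, as a starting point for the pseudocode in the Algorithms section. It assumes that the counters are stored row-major as rows x buckets and that the estimated quantity is the self-join size (the second frequency moment). For the +/-1 and bucket hashes it uses a degree-3 polynomial hash modulo 2^31 - 1, a standard 4-wise independent family that may differ from what the original code uses. `PolyHash`, `AgmsSketch`, `FastAgmsSketch`, `rowSquareSums` and `median` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical stand-in hash: degree-3 polynomial mod the prime 2^31 - 1 (4-wise independent).
class PolyHash {
public:
    explicit PolyHash(std::mt19937_64& rng) {
        std::uniform_int_distribution<std::uint64_t> dist(0, kPrime - 1);
        for (auto& c : coeff_) c = dist(rng);
    }
    std::uint64_t operator()(std::uint64_t key) const {
        const std::uint64_t x = key % kPrime;
        std::uint64_t h = coeff_[0];
        // Horner's rule; h * x + c < 2^63, so nothing overflows.
        for (int i = 1; i < 4; ++i) h = (h * x + coeff_[i]) % kPrime;
        return h;
    }
    // Maps a key to +1 or -1 via the lowest bit of its hash value.
    int sign(std::uint64_t key) const { return ((*this)(key) & 1) ? 1 : -1; }
private:
    static constexpr std::uint64_t kPrime = (1ULL << 31) - 1;
    std::uint64_t coeff_[4];
};

// Sum of squared counters per row of a row-major rows x buckets array.
std::vector<double> rowSquareSums(const std::vector<std::int64_t>& c, std::size_t rows, std::size_t buckets) {
    std::vector<double> sums(rows, 0.0);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t b = 0; b < buckets; ++b) {
            const double v = static_cast<double>(c[r * buckets + b]);
            sums[r] += v * v;
        }
    return sums;
}

double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t n = v.size();
    return n % 2 ? v[n / 2] : 0.5 * (v[n / 2 - 1] + v[n / 2]);
}

// AGMS: every counter has its own +/-1 hash, so one update touches all
// rows * buckets counters.
class AgmsSketch {
public:
    AgmsSketch(std::size_t rows, std::size_t buckets, std::uint64_t seed)
        : rows_(rows), buckets_(buckets), counters_(rows * buckets, 0) {
        std::mt19937_64 rng(seed);
        for (std::size_t i = 0; i < rows * buckets; ++i) xi_.emplace_back(rng);
    }
    void update(std::uint64_t key, std::int64_t weight = 1) {
        for (std::size_t i = 0; i < counters_.size(); ++i) counters_[i] += xi_[i].sign(key) * weight;
    }
    // Estimate: mean of squared counters within a row, median across rows.
    double selfJoin() const {
        std::vector<double> means = rowSquareSums(counters_, rows_, buckets_);
        for (double& m : means) m /= static_cast<double>(buckets_);
        return median(means);
    }
private:
    std::size_t rows_, buckets_;
    std::vector<std::int64_t> counters_;
    std::vector<PolyHash> xi_;
};

// Fast-AGMS: one bucket hash and one sign hash per row, so one update
// touches exactly one counter per row.
class FastAgmsSketch {
public:
    FastAgmsSketch(std::size_t rows, std::size_t buckets, std::uint64_t seed)
        : rows_(rows), buckets_(buckets), counters_(rows * buckets, 0) {
        std::mt19937_64 rng(seed);
        for (std::size_t r = 0; r < rows; ++r) {
            bucketHash_.emplace_back(rng);
            signHash_.emplace_back(rng);
        }
    }
    void update(std::uint64_t key, std::int64_t weight = 1) {
        for (std::size_t r = 0; r < rows_; ++r) {
            const std::size_t b = bucketHash_[r](key) % buckets_;
            counters_[r * buckets_ + b] += signHash_[r].sign(key) * weight;
        }
    }
    // Estimate: sum of squared counters within a row, median across rows.
    double selfJoin() const { return median(rowSquareSums(counters_, rows_, buckets_)); }
private:
    std::size_t rows_, buckets_;
    std::vector<std::int64_t> counters_;
    std::vector<PolyHash> bucketHash_, signHash_;
};
```

The update loops show the complexity difference directly. An AGMS update costs rows x buckets hash evaluations, while a Fast-AGMS update costs only two hash evaluations per row. For a stream containing a single key, both estimates are exact. For example, `FastAgmsSketch s(8, 1024, 42); s.update(7, 3);` followed by `s.selfJoin()` gives 9. With many keys the result is an estimate.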

Approach (11-14 pages)

Optimization Possibilities - traditional scalar (single-instruction) CPU execution vs. SIMD + AVX-512; cache vs. RAM vs. disk
Optimization Implementation - AVX-512 or other SIMD-like approaches for memory-sensitive cases?; provide code snippets of the original and changed versions; link to the open-source repo of the original code
Discussion?
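One way the scalar vs. SIMD comparison could look in code. In an AGMS update every counter is hashed with the same key, so the work vectorizes naturally across counters. This sketch makes several assumptions. The hash coefficients are stored as a structure of arrays (`AgmsSoA`, hypothetical). The cubic hash modulo 2^31 - 1 stands in for the original code's hashing. The `%` is replaced by a shift-and-add reduction (`mod31`, hypothetical), so that eight 64-bit lanes are processed per AVX-512 iteration. The vector path is only compiled when AVX-512F is enabled (e.g. `-mavx512f`, or `-march=native` on a supporting CPU). Otherwise the scalar tail loop processes all counters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

constexpr std::uint64_t kP = (1ULL << 31) - 1;  // Mersenne prime 2^31 - 1

// Reduces y < 2^63 modulo 2^31 - 1 without a division, using 2^31 = 1 (mod p).
inline std::uint64_t mod31(std::uint64_t y) {
    std::uint64_t t = (y & kP) + (y >> 31);  // t < 2^32 + 2^31
    t = (t & kP) + (t >> 31);                // t <= 2^31 + 1
    return t >= kP ? t - kP : t;
}

// Hypothetical structure-of-arrays layout: counter i uses the hash
// ((c0[i]*x + c1[i])*x + c2[i])*x + c3[i] mod p, whose low bit picks the sign.
struct AgmsSoA {
    std::vector<std::uint64_t> c0, c1, c2, c3;  // all coefficients < kP
    std::vector<std::int64_t> counters;
};

// Reference scalar update: one counter per iteration, using % for the modulus.
void updateScalar(AgmsSoA& s, std::uint64_t key, std::int64_t w) {
    const std::uint64_t x = key % kP;
    for (std::size_t i = 0; i < s.counters.size(); ++i) {
        std::uint64_t h = s.c0[i];
        h = (h * x + s.c1[i]) % kP;
        h = (h * x + s.c2[i]) % kP;
        h = (h * x + s.c3[i]) % kP;
        s.counters[i] += (h & 1) ? w : -w;
    }
}

// Same update; with AVX-512F, 8 counters per iteration, remainder done by the scalar loop.
void updateSimd(AgmsSoA& s, std::uint64_t key, std::int64_t w) {
    const std::uint64_t x = key % kP;
    const std::size_t n = s.counters.size();
    std::size_t i = 0;
#if defined(__AVX512F__)
    const __m512i xv = _mm512_set1_epi64(static_cast<long long>(x));
    const __m512i pv = _mm512_set1_epi64(static_cast<long long>(kP));
    const __m512i wv = _mm512_set1_epi64(w);
    const __m512i one = _mm512_set1_epi64(1);
    // Vector form of mod31(): two shift-and-add folds, then a masked subtract.
    auto vmod = [&](__m512i y) {
        __m512i t = _mm512_add_epi64(_mm512_and_si512(y, pv), _mm512_srli_epi64(y, 31));
        t = _mm512_add_epi64(_mm512_and_si512(t, pv), _mm512_srli_epi64(t, 31));
        return _mm512_mask_sub_epi64(t, _mm512_cmpge_epu64_mask(t, pv), t, pv);
    };
    for (; i + 8 <= n; i += 8) {
        // _mm512_mul_epu32 multiplies the low 32 bits of each lane; h and x are < 2^31.
        __m512i h = _mm512_loadu_si512(s.c0.data() + i);
        h = vmod(_mm512_add_epi64(_mm512_mul_epu32(h, xv), _mm512_loadu_si512(s.c1.data() + i)));
        h = vmod(_mm512_add_epi64(_mm512_mul_epu32(h, xv), _mm512_loadu_si512(s.c2.data() + i)));
        h = vmod(_mm512_add_epi64(_mm512_mul_epu32(h, xv), _mm512_loadu_si512(s.c3.data() + i)));
        const __mmask8 odd = _mm512_test_epi64_mask(h, one);
        __m512i c = _mm512_loadu_si512(s.counters.data() + i);
        // Odd hash: counter + w; even hash: counter - w.
        c = _mm512_mask_add_epi64(_mm512_sub_epi64(c, wv), odd, c, wv);
        _mm512_storeu_si512(s.counters.data() + i, c);
    }
#endif
    for (; i < n; ++i) {
        std::uint64_t h = s.c0[i];
        h = mod31(h * x + s.c1[i]);
        h = mod31(h * x + s.c2[i]);
        h = mod31(h * x + s.c3[i]);
        s.counters[i] += (h & 1) ? w : -w;
    }
}
```

Keeping `updateScalar` around as a reference allows a check that both versions produce identical counters before anything is benchmarked. A Fast-AGMS update instead touches one pseudo-randomly chosen counter per row. Vectorizing it would therefore mean hashing several rows at once and then gathering/scattering single counters, rather than using contiguous loads as above.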

Evaluation (15-17 pages; would contain multiple diagrams)

Test Environment - Intel server-grade CPU; Linux; C++; data generator
Analysis Tools - VTune; Bash: report the process and tools, without in-depth explanations of how they work.
Setting the baseline - but which baseline? Algorithm execution times (could run a file with n samples and record the average execution time)? VTune hotspots analysis (+ code snippets); VTune performance analysis
Results - VTune hotspot/performance; algorithm execution speed-up (compared to the baseline); results from different data distributions; results on machines with different cache sizes?
Cache size as a factor?
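A back-of-the-envelope check for the cache-size question. With 8-byte counters (an assumption), the counter array of an 8 rows / 163840 buckets sketch takes 8 x 163840 x 8 B = 10 MiB. An 8 rows / 8 buckets sketch takes 512 B. The cache capacities passed to `smallestFittingLevel` are placeholders, not the server's actual values. The calculation also ignores the per-counter hash seeds that AGMS stores in addition to its counters. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed for the counter array of a rows x buckets sketch.
constexpr std::size_t sketchBytes(std::size_t rows, std::size_t buckets, std::size_t counterBytes) {
    return rows * buckets * counterBytes;
}

// Smallest cache level able to hold `bytes`, given assumed per-level capacities.
std::string smallestFittingLevel(std::size_t bytes, std::size_t l1, std::size_t l2, std::size_t l3) {
    assert(l1 <= l2 && l2 <= l3);
    if (bytes <= l1) return "L1";
    if (bytes <= l2) return "L2";
    if (bytes <= l3) return "L3";
    return "RAM";
}

// 8 rows / 163840 buckets with 8-byte counters: 10 MiB.
static_assert(sketchBytes(8, 163840, 8) == 10u * 1024 * 1024, "10 MiB");
// 8 rows / 8 buckets: 512 B.
static_assert(sketchBytes(8, 8, 8) == 512, "512 B");
```

The working set per update matters as well. An AGMS update walks the whole array, so its cost grows with the sketch size. A Fast-AGMS update touches only one counter per row, which makes the total size mostly a question of cache misses rather than of work.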

Conclusion (2 pages)

Optimization Conclusion AGMS - compute, memory or both
Optimization Conclusion FastAGMS - compute, memory or both
Comparison Conclusion AGMS vs. FAGMS: based on something Martin mentioned; based on data distribution; what else?

List of Tables
List of Figures
Listings
Bibliography


beepy0 commented on September 25, 2024

Read a few papers that use SIMD and similar techniques for optimization, for inspiration on structure and visualization


beepy0 commented on September 25, 2024

Message to convey:

Varying degrees of throughput speed-up can be observed depending on the number of rows / buckets. AGMS tends to get a bigger overall speed-up, but F-AGMS has higher throughput in the scenarios with more rows / buckets (i.e., cases where high accuracy is the first priority).
AGMS can be useful in cases where accuracy can be sacrificed for speed, as it has a significant edge in scenarios with fewer buckets / rows.
F-AGMS sustains a good accuracy/throughput ratio across all scenarios; even with 8 buckets / 8 rows it is considerably more precise than AGMS. Maybe provide an 8 rows / 163840 buckets configuration, as in your example, to show that accuracy can be kept very high (probably high 98% to low 99%) with practically no performance loss and only a small space cost.
Display the overall speed-up over the current implementation across all tested cases for both algorithms and compare the numbers. It should be around 10x for AGMS and 7x for F-AGMS.
There is a small performance hit when using data samples other than Zipf: normal has slightly lower throughput, and uniform is lower still. These differences are tied to the degree of microarchitecture utilization observed via VTune.
Probably show more memory dependence after implementing SIMD (once the compute headroom is used up), but I'll have to see what profiling the optimized files shows.
Discussion of future optimizations: parallelization, memory-bound optimizations (?), a GPU implementation, further CPU SIMD optimization.

1. Experimental Setup / Baseline

1.1 VTune

- Overall runtime vs. microarchitecture utilization per data distribution; discuss the correlation
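If the correlation discussion should be backed by a number, one option is a Pearson correlation coefficient over the test cases. Here x would be the per-case runtime and y a VTune utilization metric read from the reports. This is a suggestion, not part of the plan so far, and `pearson` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two equally long samples, e.g.
// per-case runtime (x) and a utilization metric (y). Range: -1 .. +1.
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        meanX += x[i] / n;
        meanY += y[i] / n;
    }
    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX, dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    return sxy / std::sqrt(sxx * syy);  // NaN if either sample is constant
}
```

With only three distributions the coefficient is not very meaningful on its own. Computing it across all rows / buckets cases per distribution would give more points.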

1.1.1 Hotspot Analysis

  - Per-function runtime ratio and microarchitecture utilization

  - Heatmap of overall microarchitecture utilization (serves as overview)

1.1.2 Micro-architecture Analysis

  - discuss memory-bound results

1.2 AGMS

- raw data points

- averaged data plus trend curve

1.3 Fast-AGMS

- raw data points

- averaged data plus trend curve

1.4 AGMS vs Fast-AGMS

- Comparison line plot (serves as an overview)

- Average throughput across all cases as a single number per algorithm, for an easy comparison of the average speed(-up)
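A sketch of how these single numbers per algorithm could be computed. It assumes that each case yields a throughput in updates per second. Per-case throughputs are averaged arithmetically. Speed-ups (optimized / baseline throughput) are averaged with the geometric mean, the usual choice for ratios, because an arithmetic mean of ratios is pulled up by the largest ones. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Updates per second for one benchmark case.
double throughput(std::size_t updates, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(updates) / seconds;
}

// Arithmetic mean of per-case throughputs: one number per algorithm.
double meanThroughput(const std::vector<double>& perCase) {
    assert(!perCase.empty());
    double sum = 0.0;
    for (double t : perCase) sum += t;
    return sum / static_cast<double>(perCase.size());
}

// Geometric mean of per-case speed-ups (optimized / baseline throughput).
double geomeanSpeedup(const std::vector<double>& baseline, const std::vector<double>& optimized) {
    assert(baseline.size() == optimized.size() && !baseline.empty());
    double logSum = 0.0;
    for (std::size_t i = 0; i < baseline.size(); ++i) logSum += std::log(optimized[i] / baseline[i]);
    return std::exp(logSum / static_cast<double>(baseline.size()));
}
```

For example, per-case speed-ups of 2x and 8x give a geometric mean of 4x. Their arithmetic mean would be 5x.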

2. Optimization Benchmarked

Repeat more or less the same steps (VTune only partially):

One graph showing the new data
One additional graph comparing old and new (when necessary)

Approach

Server hardware specs

VTune command line; VTune version?

Three sample data distributions (Zipf, normal, uniform) of different sizes; code snippets for each distribution.
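A possible shape for the generator snippets. Assumptions:

- Keys are integers in [0, domain).
- Zipf weights are 1/(k+1)^s, sampled with `std::discrete_distribution`. This stores one weight per key, so it suits domains up to a few million.
- The normal distribution is centered in the domain with standard deviation domain/8, rounded and clamped.
- `Dist` and `generateKeys` are hypothetical names, and the thesis' actual generator parameters may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

enum class Dist { Zipf, Normal, Uniform };

// Generates `count` keys in [0, domain) following the chosen distribution.
std::vector<std::uint64_t> generateKeys(Dist dist, std::size_t count, std::uint64_t domain,
                                        double zipfSkew, std::uint64_t seed) {
    assert(domain > 0);
    std::mt19937_64 rng(seed);
    std::vector<std::uint64_t> keys;
    keys.reserve(count);
    switch (dist) {
    case Dist::Zipf: {
        // P(key = k) proportional to 1 / (k + 1)^s
        std::vector<double> w(domain);
        for (std::uint64_t k = 0; k < domain; ++k) w[k] = 1.0 / std::pow(static_cast<double>(k + 1), zipfSkew);
        std::discrete_distribution<std::uint64_t> d(w.begin(), w.end());
        for (std::size_t i = 0; i < count; ++i) keys.push_back(d(rng));
        break;
    }
    case Dist::Normal: {
        // Centered in the domain; values are rounded and clamped to [0, domain - 1].
        std::normal_distribution<double> d(domain / 2.0, domain / 8.0);
        for (std::size_t i = 0; i < count; ++i) {
            const double v = std::clamp(std::round(d(rng)), 0.0, static_cast<double>(domain - 1));
            keys.push_back(static_cast<std::uint64_t>(v));
        }
        break;
    }
    case Dist::Uniform: {
        std::uniform_int_distribution<std::uint64_t> d(0, domain - 1);
        for (std::size_t i = 0; i < count; ++i) keys.push_back(d(rng));
        break;
    }
    }
    return keys;
}
```

Passing the seed explicitly keeps every sample reproducible. The "different sizes" then only change `count`.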

Using SIMD: Sketch implementation in C++ - some snippets

Seaborn for graphing; maybe one code snippet example

Explain the benchmarking of the different cases and the number of runs, with new random variables generated for each run.
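A minimal harness for repeated runs. It reads "new random variables for each run" as "a freshly seeded sketch per run", which is an assumption. The seeds come from a fixed master seed, so the whole benchmark stays reproducible. Only the update loop is timed; sketch construction and data generation are not. `benchmark`, `BenchResult` and the `makeSketch(seed)` / `update(key)` interface are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct BenchResult {
    double meanSeconds;    // average wall-clock time of the update loop
    double updatesPerSec;  // keys.size() / meanSeconds
};

// `makeSketch(seed)` must return a fresh sketch object with an `update(key)`
// member; every run gets a new seed, i.e. newly generated random variables.
template <typename MakeSketch>
BenchResult benchmark(MakeSketch makeSketch, const std::vector<std::uint64_t>& keys, std::size_t runs) {
    assert(runs > 0);
    std::mt19937_64 seeder(12345);  // fixed master seed keeps the runs reproducible
    double totalSeconds = 0.0;
    for (std::size_t r = 0; r < runs; ++r) {
        auto sketch = makeSketch(seeder());  // construction is not timed
        const auto start = std::chrono::steady_clock::now();
        for (std::uint64_t key : keys) sketch.update(key);
        const auto stop = std::chrono::steady_clock::now();
        totalSeconds += std::chrono::duration<double>(stop - start).count();
    }
    const double mean = totalSeconds / static_cast<double>(runs);
    return {mean, mean > 0.0 ? static_cast<double>(keys.size()) / mean : 0.0};
}
```

With optimizations enabled, the sketch's result should stay observable, for example by reading an estimate after the timed loop. Otherwise the compiler may be able to drop updates whose effects are never used.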
