
project1-cuda-flocking's Introduction

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 1 - Flocking

  • Licheng CAO
  • Tested on: Windows 10, i7-10870H @ 2.20GHz 32GB, RTX 3060 6009MB

Result (65536 Boids)

[Image: boid simulation]

Analysis

  • the number of boids (with/without visualization)

    • Figures 1 and 2 show a clear trend: as the number of boids increases, the frames per second (FPS) decreases, mainly because each thread has more boids to examine. The naive method is affected most, since every thread must consider every boid. The scattered method performs far better because each thread only examines boids in nearby grid cells, which reduces the number of boids it must handle. In this method I also search the nearby cells in a fixed order so that cells are accessed contiguously.
    • The coherent method provides a slight performance boost over the scattered method. It rearranges the position and velocity arrays so that the data for boids in the same grid cell is stored contiguously in memory, which speeds up reads when iterating over all boids in a single cell.
    • Figure 1: Average FPS vs. number of boids, with visualization
    • Figure 2: Average FPS vs. number of boids, without visualization
    • The average FPS is higher when visualization is disabled, but the gap between FPS with and without visualization narrows as the number of boids increases. With a large number of boids, the GPU computation is the main bottleneck; with a small number of boids, the drawing speed is.
    • Figure 3: Average FPS with and without visualization
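The rearrangement behind the coherent method can be illustrated with a small CPU-side sketch. This is not the project's CUDA code: `Vec3`, `CellRange`, `makeCoherent`, and `cellOf` are hypothetical names chosen for the example, and `std::stable_sort` stands in for whatever GPU sort the real implementation uses. The idea is to sort boid indices by cell index, gather positions and velocities in that order, and record where each cell's boids start and end.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Vec3 { float x, y, z; };

// Half-open range [start, end) of a cell's boids in the reordered arrays.
// start == -1 means the cell is empty.
struct CellRange { int start = -1; int end = -1; };

// Hypothetical helper: reorders pos and vel in place so that boids sharing
// a grid cell are adjacent in memory. cellOf[i] is the cell index of boid i.
std::vector<CellRange> makeCoherent(const std::vector<int>& cellOf,
                                    int cellCount,
                                    std::vector<Vec3>& pos,
                                    std::vector<Vec3>& vel) {
    const std::size_t n = cellOf.size();

    // Boid indices sorted by the cell each boid falls into.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return cellOf[a] < cellOf[b]; });

    // Gather the data in sorted order and record each cell's range.
    std::vector<Vec3> sortedPos(n), sortedVel(n);
    std::vector<CellRange> ranges(cellCount);
    for (std::size_t i = 0; i < n; ++i) {
        const int src = order[i];
        sortedPos[i] = pos[src];
        sortedVel[i] = vel[src];
        CellRange& r = ranges[cellOf[src]];
        if (r.start < 0) r.start = static_cast<int>(i);
        r.end = static_cast<int>(i) + 1;
    }
    pos.swap(sortedPos);
    vel.swap(sortedVel);
    return ranges;
}
```

After this step, iterating over one cell reads a single contiguous slice of `pos` and `vel` instead of following an index array to scattered locations, which is the access pattern the coherent method relies on.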
  • the block size and cell width

    • Table 1 and Figure 4 show how the average FPS varies with block size. For the naive method, the average FPS is nearly constant for block sizes between 64 and 224, although Table 1 shows it is lower at 32 and 512. For both the scattered and coherent methods, increasing the block size from 32 to 64 raises the average FPS, while increasing it from 224 to 512 lowers it.
    • The increase with the larger block size can be attributed to each block containing more warps, which allows smoother warp switching to hide the latency of global memory accesses. Conversely, the decrease at the largest block size (512) may stem from each Streaming Multiprocessor (SM) lacking sufficient Streaming Processors (SPs) to process all of a block's threads efficiently in a single pass.
    • Changing the cell width and increasing the number of neighboring cells checked from 8 to 27 can have a notable impact on the performance of my implementation. I apply a jitter vector (vec3(-0.5)) so that each boid only needs to search the 8 nearby cells to find all of its neighbors. Checking 27 cells would therefore add unnecessary computational overhead.
    • However, the outcome may differ when the cell width is changed together with the number of cells checked. A smaller cell width shrinks the region searched around each boid, which could reduce the number of boids each thread needs to inspect and, in turn, improve performance.
    • Table 1: Average FPS vs. block size, without visualization (number of boids: 524,288)
    • Block size           32       64       96      128      160      192      224      512
      Naive FPS         0.323    0.454    0.452    0.458    0.449    0.455    0.457   0.1326
      Scattered FPS    129.62  174.728   181.97  181.566  179.985  181.666  177.534  166.941
      Coherent FPS    231.937  270.333  273.499  270.334  269.264  269.608   267.44  251.679
    • Figure 4: Average FPS vs. block size
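One way to read the vec3(-0.5) offset described above is sketched below in C++. It assumes the cell width is twice the search radius, so a boid's search sphere can overlap at most two cells per axis. The names `Int3` and `neighborCells`, and the single `gridMin` value shared by all axes, are assumptions made for this example rather than the project's actual code.

```cpp
#include <array>
#include <cmath>

struct Int3 { int x, y, z; };

// Hypothetical helper: returns the 8 grid cells that can contain neighbors
// of a boid at (px, py, pz), assuming cellWidth == 2 * searchRadius.
// gridMin is the lower corner of the grid (assumed equal on all axes).
// Clamping or wrapping at the grid boundary is left out for brevity.
std::array<Int3, 8> neighborCells(float px, float py, float pz,
                                  float gridMin, float cellWidth) {
    // Convert to cell units and shift by -0.5 so that floor() gives the
    // lower of the two cells the search sphere can touch on each axis.
    const int bx = static_cast<int>(std::floor((px - gridMin) / cellWidth - 0.5f));
    const int by = static_cast<int>(std::floor((py - gridMin) / cellWidth - 0.5f));
    const int bz = static_cast<int>(std::floor((pz - gridMin) / cellWidth - 0.5f));

    std::array<Int3, 8> cells{};
    int n = 0;
    // x varies fastest, so cells are visited in increasing linear index
    // if the grid is laid out as x + y * res + z * res * res (assumed).
    for (int dz = 0; dz <= 1; ++dz)
        for (int dy = 0; dy <= 1; ++dy)
            for (int dx = 0; dx <= 1; ++dx)
                cells[n++] = Int3{bx + dx, by + dy, bz + dz};
    return cells;
}
```

Because the search sphere spans one cell width in total, the cell at `floor(p - 0.5)` and its successor cover it on each axis, giving 2 x 2 x 2 = 8 cells instead of the 27 needed when every adjacent cell is checked.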
