
mms's People

Contributors

merrymercy, pkuflyingpig, ying1123, zhisbug, zhuohan123


mms's Issues

Profiling Results for Simulation

The pickle binary contains the profiling results, which can be loaded with ProfilingDatabase in alpa_serve/profiling.py (a minimal loading sketch is shown after the table below).

Content (dp, op, pp denote the data-parallel, operator-parallel, and pipeline-parallel degrees):

model_name   batch_size       dp   op          pp
bert-1.3b    1, 2, 4, 8, 16   1    1, 2, 4, 8  1, 2, 4, 8
bert-2.6b    1, 2, 4, 8, 16   1    1, 2, 4, 8  1, 2, 4, 8, 16, 32
bert-6.7b    1, 2, 4, 8, 16   1    1, 2, 4, 8  1, 2, 4, 8, 16, 32
moe-1.3b     1, 2, 4, 8, 16   1    1, 2, 4, 8  1, 2, 4, 8, 16
moe-2.4b     1, 2, 4, 8, 16   1    1, 2, 4, 8  1, 2, 4, 8, 16
moe-7.1b     1, 2, 4, 8, 16   1    1, 2, 4, 8  1, 2, 4, 8, 16
moe-10.2b    1, 2, 4, 8, 16   1    1, 2, 4, 8  1, 2, 4, 8, 16
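A minimal loading sketch, assuming the ProfilingDatabase constructor takes the pickle path (the file name below is a placeholder; see alpa_serve/profiling.py for the exact interface):

from alpa_serve.profiling import ProfilingDatabase

# Path of the downloaded pickle; the file name here is illustrative.
database = ProfilingDatabase("profiling_result.pkl")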

Simulator is not accurate for long pipelines + many models

I found that the simulator is not very accurate in some cases of our goodput experiment.

Reproduce

Check out this branch: https://github.com/alpa-projects/mms/tree/inaccurate

Simulator

python3 gen_data_goodput.py --mode simulate
cat *.tsv

results:

selective replication, goodput=0.330
model parallelism, goodput=0.599

Real system

python3 gen_data_goodput.py --mode run
cat *.tsv

results:

selective replication, goodput=0.325
model parallelism, goodput=0.456

The simulator is very accurate for selective replication, but not accurate for pipeline parallelism.

Comparison between Simple Placement and Model Parallel

Hi, I have a question.
According to the paper:
We evaluate the two placements when the requests to each model follow an independent Poisson process with an arrival rate of 1.5 request/s. Fig. 2a shows the cumulative distribution function (CDF) and average of request latency (which includes the GPU execution time and queuing delay). Model parallel placement reduces the average latency of the simple placement from 0.70s to 0.55s, a 1.3× speedup. The speedup comes from the better burst tolerance: when a burst arrives that exceeds the capability of a single GPU, simple placement must begin queuing requests. However, as long as the other model does not receive many requests, the model parallel placement can use both GPUs to serve the requests for the popular model via statistical multiplexing of the GPUs.

If I understand correctly, latency is more meaningful for comparison under a fixed, identical load: given a fixed number of concurrent clients (or threads) that continuously send requests, compare the latency (or equivalently the throughput, which equals num_clients / avg_latency_in_ms) between simple placement and model-parallel placement. Specifically, assuming A == B and A0 == A1, and that A0, A1, B0, and B1 each occupy less than 50% of a GPU's utilization when running, the latency theoretically remains constant (except for the first request), and the two placements should have the same throughput and latency when the concurrency is 2.

In this example, model-parallel placement (A0 => A1, B0 => B1) can support at most 4 clients, and simple placement (A0 A1 => B0 B1) at most 2 without caching. However, the sequential placement A0 -> A1 (card 0) => (card 1) B0 -> B1 can in fact also support at most 4 concurrent clients.
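For what it's worth, here is a minimal queuing sketch of the two placements under independent Poisson arrivals; the 0.25 s single-GPU execution time (split into two 0.125 s pipeline stages under model parallelism) and the FCFS service discipline are assumptions for illustration, not numbers from the paper.

import heapq, random

random.seed(0)
RATE = 1.5       # requests/s per model (as in the quoted paragraph)
EXEC = 0.25      # illustrative single-GPU execution time (s)
T_END = 3600.0   # simulated seconds

def poisson_arrivals(rate, t_end):
    # Generate sorted arrival times of a Poisson process.
    t, out = 0.0, []
    while True:
        t += random.expovariate(rate)
        if t > t_end:
            return out
        out.append(t)

def simple_placement(arrivals_a, arrivals_b):
    # Each model exclusively owns one GPU: an independent FCFS queue per model.
    lat = []
    for arrs in (arrivals_a, arrivals_b):
        free = 0.0
        for t in arrs:
            start = max(t, free)
            free = start + EXEC
            lat.append(free - t)
    return sum(lat) / len(lat)

def model_parallel(arrivals_a, arrivals_b):
    # Both models are split into two pipeline stages over the two GPUs,
    # so a burst of either model can occupy both GPUs (statistical multiplexing).
    stage = EXEC / 2
    reqs = sorted(arrivals_a + arrivals_b)
    free = [0.0, 0.0]   # next-free time of GPU0 and GPU1
    lat = []
    for t in reqs:
        s0 = max(t, free[0]); free[0] = s0 + stage
        s1 = max(free[0], free[1]); free[1] = s1 + stage
        lat.append(free[1] - t)
    return sum(lat) / len(lat)

a = poisson_arrivals(RATE, T_END)
b = poisson_arrivals(RATE, T_END)
print("simple placement  avg latency:", simple_placement(a, b))
print("model parallel    avg latency:", model_parallel(a, b))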

Error Running "illustrative_example.py" in Parallel Mode: "ValueError" in "run_controller"

screenshot

Environment:
Python Version: 3.9
Server Hardware: Two V100 GPUs
Ray Version: 2.8.0
Alpa Version: 1.0.0.dev0

Description:
While attempting to run illustrative_example.py in parallel mode on a server equipped with 2 V100 GPUs, I encountered a ValueError related to the run_controller function. The issue seems to stem from an existing actor name conflict in Ray.
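For context, here is a minimal reproduction of that kind of conflict with named Ray actors; the actor name "controller" and the get_if_exists workaround are illustrative, not necessarily how AlpaServe's run_controller creates its actors.

import ray

ray.init(ignore_reinit_error=True)

@ray.remote
class Controller:
    def ping(self):
        return "ok"

c1 = Controller.options(name="controller").remote()
# Creating a second actor with the same name in the same namespace raises
# ValueError ("name is already taken"):
# c2 = Controller.options(name="controller").remote()
# One workaround is to reuse the existing actor instead of creating a new one:
c2 = Controller.options(name="controller", get_if_exists=True).remote()
print(ray.get(c2.ping.remote()))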

I would appreciate any insights or solutions to resolve this error. Thank you for your support.

Combined with batch

Hello Lianmin Zheng,
I would like to ask how the controller works.
Can the controller be combined with the idea of batching? For example, if the controller sends ten requests to a group (group1), can these requests be batched so that the overall latency is shorter?

Batching

We can port the batching logic in #30 to the new fast simulator.

The fast simulator assumes that all GPUs are FIFO streams, which is compatible with batching.
We can port the group selection, batching, and dropping logic here (a rough sketch of adding batching on top of a FIFO stream follows the snippet):

# Select group id: pick the group whose last pipeline stage frees up earliest.
g_id = -1
min_device_clock = inf
for j in m_id2g_id[m_id]:
    if j < 0:
        break
    tmp = device_clocks[j][num_stages[j] - 1]
    if tmp < min_device_clock:
        min_device_clock = tmp
        g_id = j
if g_id < 0:
    # No group holds this model; the request fails (marked not good).
    finish[i] = tstamp
    good[i] = False
    continue
# Walk the request through the pipeline stages of the chosen group.
t = tstamp
for k in range(num_stages[g_id]):
    t = max(t, device_clocks[g_id][k]) + stage_latency[m_id][g_id][k]
    tmp_time[k] = t
finish_time = t + fixed_overhead
group_num_requests[g_id] += 1
if finish_time - tstamp <= slo:
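As a starting point, here is a rough sketch of how batching could sit on top of a single FIFO stream. The function name, the batch_latency table, and the serve-oldest-model policy are illustrative and not the repo's implementation; batch_latency is assumed to cover every batch size up to max_batch_size.

from collections import defaultdict

def simulate_batched_fifo(requests, batch_latency, max_batch_size=16, slo=1.0):
    """Toy single-stream FIFO simulator with batching (illustrative only).

    requests      -- list of (timestamp, model_id), sorted by timestamp
    batch_latency -- dict mapping (model_id, batch_size) -> latency in seconds
    """
    device_clock = 0.0            # time at which the FIFO stream becomes free
    queue = defaultdict(list)     # model_id -> timestamps of pending requests
    goodput = 0

    i = 0
    while i < len(requests) or any(queue.values()):
        # Admit every request that arrived while the stream was busy.
        while i < len(requests) and requests[i][0] <= device_clock:
            ts, m_id = requests[i]
            queue[m_id].append(ts)
            i += 1
        if not any(queue.values()):
            device_clock = requests[i][0]   # stream idles until the next arrival
            continue
        # Serve the model with the oldest pending request as one batch.
        m_id = min(queue, key=lambda m: queue[m][0] if queue[m] else float("inf"))
        batch, queue[m_id] = queue[m_id][:max_batch_size], queue[m_id][max_batch_size:]
        device_clock += batch_latency[(m_id, len(batch))]
        goodput += sum(device_clock - ts <= slo for ts in batch)
    return goodput / len(requests)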

[Question] How to run alpa-serve?

Hi.

I am interested in your nice work.

I want to get a parallel configuration for my server.

I read the code, but it is hard to find documentation or steps for AlpaServe (as opposed to Alpa).

Can you give some advice on how to run the alpa-serve system on a server?
(How should alpa-serve be used to obtain a parallel configuration?)

I have already installed the prerequisite packages (Ray and the other Python packages).

Randomized Search

Workload

  • rate distribution
    • Uniform
    • Power law (x = 0.2, 0.5, 0.8)

Initial Solution

  • Enumerate group partitions
    • Equal size, equal op and pp
    • Equal size, unequal op or pp
    • Add some unequal group partitions?
      • Partition groups according to the rate
  • Place models on groups
    • greedy: place the model with the minimum burst tolerance(?) on the group with the most available memory (see the sketch after this list)
    • whether to fill all memory
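A rough sketch of the greedy step above; the burst-tolerance proxy (higher request rate means lower tolerance) and the data layout are assumptions for illustration, not the repo's representation.

def greedy_place(models, group_mem):
    """models: list of (name, mem_bytes, rate); group_mem: per-group memory budgets."""
    free_mem = list(group_mem)
    placement = {name: None for name, _, _ in models}
    # Proxy: place higher-rate (less burst-tolerant) models first (assumption).
    for name, mem, rate in sorted(models, key=lambda m: -m[2]):
        g = max(range(len(free_mem)), key=lambda j: free_mem[j])  # most available memory
        if free_mem[g] >= mem:
            free_mem[g] -= mem
            placement[name] = g
    return placement

# Example with two illustrative models and two 16 GB groups:
print(greedy_place([("bert-1.3b", 2.6e9, 8.0), ("bert-6.7b", 13.4e9, 2.0)], [16e9, 16e9]))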

Search

Framework

  • MCTS
    • place models one by one.
  • Genetic algorithm. (#population size, mutation ratio, cross-over ratio, #iter)
    • cross over
    • mutation (mutate one model, swap two models)
  • MCMC (simulated annealing; see the sketch after the Operators list)
    • mutation

Operators

  • Mutation
    • Mutate model placement
      • Mutate one model on a group
      • Swap two models on two groups
    • Mutate group partition
      • Mutate op x pp of a single group
      • How to get unequal-sized groups by mutation?
  • Cross over
    • Cross over two solutions
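A rough sketch of the simulated-annealing variant with the two mutation operators listed above; the list-of-group-indices encoding, the number of groups, and the scoring callback are assumptions, not the repo's representation.

import math, random

NUM_GROUPS = 4   # illustrative number of device groups

def mutate(placement):
    """placement: list mapping model index -> group index (illustrative encoding)."""
    new = list(placement)
    if random.random() < 0.5:
        # Mutate one model: move it to a random group.
        m = random.randrange(len(new))
        new[m] = random.randrange(NUM_GROUPS)
    else:
        # Swap the groups of two models.
        a, b = random.sample(range(len(new)), 2)
        new[a], new[b] = new[b], new[a]
    return new

def anneal(init, score, iters=1000, t0=1.0):
    cur, cur_score = init, score(init)
    best, best_score = cur, cur_score
    for i in range(iters):
        temp = max(t0 * (1 - i / iters), 1e-6)   # linear cooling schedule
        cand = mutate(cur)
        cand_score = score(cand)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if cand_score >= cur_score or random.random() < math.exp((cand_score - cur_score) / temp):
            cur, cur_score = cand, cand_score
            if cur_score > best_score:
                best, best_score = cur, cur_score
    return best, best_score

# Example: a toy score that prefers spreading 8 models evenly across groups.
best, _ = anneal([0] * 8, lambda p: -max(p.count(g) for g in range(NUM_GROUPS)), iters=2000)
print(best)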

Others

  • Parallelize get_scores with ray (see the sketch below)
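A rough sketch of that parallelization; get_scores here and the placeholder scorer are illustrative, not the repo's actual signatures.

import ray

ray.init(ignore_reinit_error=True)

def evaluate_placement(placement):
    # Placeholder scorer; the real version would run the simulator on the placement.
    return -max(placement.count(g) for g in set(placement))

@ray.remote
def score_one(placement):
    return evaluate_placement(placement)

def get_scores(placements):
    # Launch all evaluations in parallel and gather the results.
    return ray.get([score_one.remote(p) for p in placements])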

Uneven device group sizes - experiments log

  • bert-1.3b * 8 + bert-6.7b * 8, mem_budget = 14 GB, total_rate = 70, slo_scales = 4
    • Mixed Greedy: 0.837 [4-4-4-4] (obtained with approximate evaluation during training; precise evaluation gives 0.661)
    • Separated Greedy: 0.852 [2-2-2-2-8]
