Comments (2)

mrocklin commented on August 20, 2024

I've done a bit of this already. Copying from internal chat.

I've dumped about 2 GB of fairly repetitive text into a text file in HDFS. Then I log into one of the data nodes, read all of the blocks (not just the local ones) into memory, and measure the bandwidth of each read:

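from time import time

import hdfs

h = hdfs.HDFileSystem()
f = h.open('/tmp/tmp.dat')
# One dict per block, with 'hosts', 'length' and 'offset' keys
blocks = h.get_block_locations('/tmp/tmp.dat')

for block in blocks:
    f.seek(block['offset'])
    start = time()
    l = len(f.read(block['length']))  # read the whole block into memory
    assert l == block['length']
    end = time()
    # bytes / seconds / 1e6 gives MB/s
    block['bandwidth'] = block['length'] / (end - start) / 1e6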

For some blocks we get bandwidths of around 400 MB/s (around hard drive read speeds), and for others we get around 120 MB/s (around gigabit ethernet speeds). Presumably the fast reads are local blocks and the slow reads are non-local blocks.

[{'bandwidth': 347.52914653409914,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 0},
 {'bandwidth': 396.3504030892114,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 67108864},
 {'bandwidth': 353.78556138839053,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 134217728},
 {'bandwidth': 304.9612634543276,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 201326592},
 {'bandwidth': 395.8287009204799,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 268435456},
 {'bandwidth': 177.41838451756567,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 335544320},
 {'bandwidth': 122.41384895991426,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 402653184},
 {'bandwidth': 427.0453371464706,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 469762048},
 {'bandwidth': 171.54928364076153,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 536870912},
 {'bandwidth': 462.8106807960986,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 603979776},
 {'bandwidth': 166.62452440343176,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 671088640},
 {'bandwidth': 122.31166422991596,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 738197504},
 {'bandwidth': 122.39841190168642,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 805306368},
 {'bandwidth': 122.33127939499784,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 872415232},
 {'bandwidth': 461.6576487164383,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 939524096},
 {'bandwidth': 463.4011842239538,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1006632960},
 {'bandwidth': 260.8133436284131,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 1073741824},
 {'bandwidth': 468.18624775767256,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1140850688},
 {'bandwidth': 165.97057590757183,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 1207959552},
 {'bandwidth': 462.2201623262103,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1275068416},
 {'bandwidth': 392.8833314871377,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1342177280},
 {'bandwidth': 289.8033452463455,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 1409286144},
 {'bandwidth': 122.32771736641125,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 1476395008},
 {'bandwidth': 122.31969027146347,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 1543503872},
 {'bandwidth': 464.0612791929661,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1610612736},
 {'bandwidth': 462.02593272806,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1677721600},
 {'bandwidth': 465.1425655854063,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1744830464},
 {'bandwidth': 346.6491828828209,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 1811939328},
 {'bandwidth': 397.5663340993312,
  'hosts': ['ip-172-31-9-66.ec2.internal'],
  'length': 67108864,
  'offset': 1879048192},
 {'bandwidth': 224.03685571591987,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 1946157056},
 {'bandwidth': 122.37255024922571,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 2013265920},
 {'bandwidth': 122.38212736466593,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 2080374784},
 {'bandwidth': 122.43418923555822,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 2147483648},
 {'bandwidth': 121.66087989627262,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 67108864,
  'offset': 2214592512},
 {'bandwidth': 231.146729234761,
  'hosts': ['ip-172-31-9-64.ec2.internal'],
  'length': 7187514,
  'offset': 2281701376}]
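
To make the two speed regimes easier to see, the per-block results can be grouped by the host reported for each block. This is only a sketch: `summarize_by_host` and `GIGABIT_MB_PER_S` are names made up here, the `blocks` list holds just six entries copied from the output above (trimmed to the two keys used), and the 125 MB/s ceiling assumes a 1 Gbit/s link between nodes.

```python
from collections import defaultdict

# Six entries copied from the output above (bandwidth in MB/s).
blocks = [
    {'bandwidth': 347.52914653409914, 'hosts': ['ip-172-31-9-64.ec2.internal']},
    {'bandwidth': 396.3504030892114, 'hosts': ['ip-172-31-9-66.ec2.internal']},
    {'bandwidth': 122.41384895991426, 'hosts': ['ip-172-31-9-64.ec2.internal']},
    {'bandwidth': 427.0453371464706, 'hosts': ['ip-172-31-9-66.ec2.internal']},
    {'bandwidth': 122.31166422991596, 'hosts': ['ip-172-31-9-64.ec2.internal']},
    {'bandwidth': 462.8106807960986, 'hosts': ['ip-172-31-9-66.ec2.internal']},
]

# Assumed link speed: 1 Gbit/s = 1e9 bits/s = 125 MB/s at best.
GIGABIT_MB_PER_S = 1e9 / 8 / 1e6


def summarize_by_host(blocks):
    """Return {host: {'n', 'min', 'max', 'mean'}} over block bandwidths."""
    by_host = defaultdict(list)
    for block in blocks:
        for host in block['hosts']:
            by_host[host].append(block['bandwidth'])
    return {
        host: {
            'n': len(bws),
            'min': min(bws),
            'max': max(bws),
            'mean': sum(bws) / len(bws),
        }
        for host, bws in by_host.items()
    }


summary = summarize_by_host(blocks)
for host, stats in sorted(summary.items()):
    # A read faster than the link ceiling cannot have come over that link.
    above_link = stats['min'] > GIGABIT_MB_PER_S
    print(host, stats['n'], round(stats['min'], 1), round(stats['max'], 1),
          round(stats['mean'], 1), 'all above gigabit' if above_link else '')
```

In this sample every block listed on `ip-172-31-9-66` reads faster than the assumed gigabit ceiling, while the blocks listed on `ip-172-31-9-64` include both a fast read and reads pinned near 122 MB/s, so the reported host on its own does not fully separate the two groups.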

from hdfs3.

mrocklin commented on August 20, 2024

Marking this as done with the above experiment.
