wangyinfeng / notes

notes share from internet

notes's Introduction

notes

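## Starting from AWS X1: why do public clouds also use four-socket servers?

A problem that scale-up solves well does not need scale-out. A key idea in Google's 2009 "The Datacenter as a Computer" is the "brawny core" argument (editor's note: brawny and wimpy are relative. For example, x86 processors are brawny relative to ARM, which is why there is little news about Google and ARM). Put simply, within a cluster, the better the single-node performance, the smaller the share of total resources the cluster spends on communication, and the better the overall utilization. For example, compare a cluster of higher-performance servers (128 cores per node) with a cluster of ordinary servers (4 cores per node): with the same total core count, the total performance of the former is roughly 10x or more that of the latter, and the latter is also far less efficient at communication. See pages 33-36 of "The Datacenter as a Computer".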
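A toy model of the communication argument, as a rough sketch only. It assumes every task does a fixed amount of compute plus a few accesses to data spread uniformly across all nodes, so the chance of a local hit is 1/num_nodes. The latencies (100 ns local, 100 µs remote), the 1 ms compute time, and the function names are all assumed values for illustration, not figures taken from these notes. The model shows the direction of the effect; it is far too simple to reproduce the "10x" figure above.

```python
def avg_access_time_ns(total_cores, cores_per_node, local_ns=100.0, remote_ns=100_000.0):
    """Average latency of one access when targets are spread uniformly over all nodes."""
    num_nodes = total_cores // cores_per_node
    p_local = 1.0 / num_nodes  # chance the target lives on the caller's own node
    return p_local * local_ns + (1.0 - p_local) * remote_ns


def task_time_us(total_cores, cores_per_node, compute_us=1000.0, accesses=10):
    """Time for one task: fixed compute plus `accesses` cluster-wide accesses."""
    comm_us = accesses * avg_access_time_ns(total_cores, cores_per_node) / 1000.0
    return compute_us + comm_us


# Same total core count, different node sizes (both hypothetical clusters).
brawny = task_time_us(total_cores=512, cores_per_node=128)  # 4 nodes
wimpy = task_time_us(total_cores=512, cores_per_node=4)     # 128 nodes
print(f"brawny: {brawny:.1f} us per task, wimpy: {wimpy:.1f} us per task")
```

Fewer, larger nodes keep more accesses local, so the per-task time is lower for the same number of cores.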

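## Layer 3 switching vs. routing

In hardware terms, a Layer 3 switch forwards packets through its switching chip, which has Layer 3 forwarding capability, i.e. routing. A router forwards through its CPU: all recomputation and forwarding of packets is done by the CPU. A Layer 3 switch uses an NP (network processor) for forwarding: once the CPU has computed the route for a packet, the result is pushed down to the NP, and subsequent packets are routed from the Layer 3 forwarding table (the FIB?). A router's CPU is powerful and can respond quickly to routing changes, but every packet it forwards has to pass through the CPU, and punting a packet to the CPU takes longer than forwarding it in the NP.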
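A minimal sketch of the slow-path/fast-path split described above. The class and its names are hypothetical. The "CPU" does a longest-prefix match over the full routing table and installs the result into a forwarding table standing in for the NP; later packets to the same destination never reach the CPU. Caching one entry per destination address is a simplification made here; real FIBs typically hold prefixes.

```python
import ipaddress


class L3Switch:
    def __init__(self, routes):
        # Control plane (CPU): full routing table, prefix -> next hop.
        self.rib = {ipaddress.ip_network(p): nh for p, nh in routes.items()}
        # Data plane (NP / switching chip): entries installed by the CPU.
        self.fib = {}
        self.cpu_lookups = 0

    def _cpu_route(self, dst):
        """Slow path: longest-prefix match over the full table."""
        self.cpu_lookups += 1
        matches = [net for net in self.rib if dst in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.rib[best]

    def forward(self, dst_str):
        dst = ipaddress.ip_address(dst_str)
        if dst in self.fib:  # fast path: hit in the forwarding table
            return self.fib[dst]
        next_hop = self._cpu_route(dst)
        if next_hop is not None:
            self.fib[dst] = next_hop  # CPU pushes the result down
        return next_hop


sw = L3Switch({"10.0.0.0/8": "port-A", "10.1.0.0/16": "port-B"})
first = sw.forward("10.1.2.3")   # slow path, installs an entry
second = sw.forward("10.1.2.3")  # fast path, no CPU involvement
```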

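## What the 169.254.0.0 address is for

169.254.0.0/16 is the Automatic Private IP Addressing (APIPA) range. When a host cannot obtain an address through DHCP, it automatically assigns itself a random address from this range (what about conflicts?). Whether this is enabled is controlled by the NOZEROCONF option in /etc/sysconfig/network.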
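A small sketch of the address selection, using Python's standard `ipaddress` module. On the conflict question: RFC 3927 has the host probe a candidate address with ARP before using it and pick another one if it is already taken. The `in_use` callback below is a hypothetical stand-in for that probe, and the function names are made up for illustration. RFC 3927 reserves the first and last /24 of the range, so candidates come from 169.254.1.0 to 169.254.254.255.

```python
import ipaddress
import random

LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def is_apipa(addr):
    """True if addr falls inside 169.254.0.0/16."""
    return ipaddress.ip_address(addr) in LINK_LOCAL


def pick_candidate(rng=random):
    """Random address in 169.254.1.0 - 169.254.254.255."""
    return f"169.254.{rng.randint(1, 254)}.{rng.randint(0, 255)}"


def claim_address(in_use, rng=random, max_tries=10):
    """Try candidates until one is free; `in_use` stands in for the ARP probe."""
    for _ in range(max_tries):
        candidate = pick_candidate(rng)
        if not in_use(candidate):
            return candidate
    return None


addr = claim_address(in_use=lambda a: False)
print(addr, is_apipa(addr))
```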

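## send-redirect

When the host is acting as a forwarding node and send-redirect is set (on Linux, the `send_redirects` sysctl under `net.ipv4.conf`), it sends an ICMP Redirect message to the sender whenever it finds that a better route exists, telling the sender to update its route.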

notes's Issues

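white-box switch

The point of white-box switches is not to cut the cost of network equipment. With switches sourced from an OEM, operations cost is not necessarily lower than with commercial devices, and the approach demands a high skill level from its users.
The appeal of white-box switches is the room for imagination they open up: a custom software layer with programming interfaces satisfies the desire to control every key device in the datacenter, normalizes device behavior, and reduces overall (not just network) operations cost.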

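95th-percentile billing (bandwidth billing)

Cloud services offer two basic bandwidth billing models: by traffic volume and by bandwidth. Billing by bandwidth usually takes the month's peak bandwidth as the billing basis. Because application bandwidth is rarely flat and tends to fluctuate, providers offer a somewhat more forgiving scheme, 95th-percentile billing: one sample is taken every 5 minutes, and the value at the 95% position among the month's samples is used as the peak bandwidth. This allows burst bandwidth for 5% of the time.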
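The calculation can be sketched as follows. This uses the nearest-rank method on sorted samples; the exact rounding rule differs between providers, so treat it as one plausible reading rather than any provider's actual formula. The function name and the sample values are made up for illustration.

```python
import math


def billable_bandwidth(samples_mbps, percentile=95):
    """Nearest-rank percentile: the smallest sample that has at least
    `percentile` percent of all samples at or below it."""
    ordered = sorted(samples_mbps)
    rank = math.ceil(len(ordered) * percentile / 100)
    return ordered[max(rank, 1) - 1]


# A 30-day month has 30 * 24 * 12 = 8640 five-minute samples,
# so the top 432 samples (36 hours) are ignored.
quiet_with_bursts = [100] * 8208 + [1000] * 432
one_burst_too_many = [100] * 8207 + [1000] * 433

print(billable_bandwidth(quiet_with_bursts))   # bursts fit inside the free 5%
print(billable_bandwidth(one_burst_too_many))  # bursts spill into the billed 95%
```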

Google's networking inside datacenter and to the outside

https://www.nextplatform.com/2017/07/17/google-wants-rewire-internet/

Google carries about 25 percent of Internet traffic today. In other words, one out of four bytes that are delivered to end users across the Internet originate from Google.

B4 is our private datacenter network interconnect, and B2 is what connects our datacenter to the public Internet.

(The traditional routing) is focused on individual boxes, and essentially what this means is that they have to take a very local view about connectivity. As soon as they can find a path between a source and a destination, two computers that want to talk to each other across the network, then BGP is happy. It doesn’t try to find the best path, and it doesn’t try to do dynamic optimization.
basically what Espresso does is it actually pulls the routing intelligence outside of individual routers into a server pool, where we can do offline analysis of the data.

Espresso is serving 20 percent of the traffic

SmartNIC

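ENA: the smart NIC designed by Annapurna, which AWS acquired
ENA started in 2015, providing 25G networking with X1

Judging from the driver, ENA differs from the 82599EB mainly in the following:

An L3 switch is implemented in the chip
10/25/40G adaptive link speed
Implements a Low Latency Queue, which can cut latency by several ms. Memory is allocated on the card, and the kernel copies the packet descriptor and 128 bytes of the packet directly onto the card
Implements an admin queue and an asynchronous event notification queue (aenq).
The admin queue handles queue creation and destruction, reading and writing NIC features, and reading NIC statistics
The aenq reports link state, fatal errors, state-change notifications, and heartbeats

The ENA driver does not distinguish PF from VF, but the card presumably decides which APIs may be exposed through ethtool. Key points:

The driver reads the supported features from the card
The driver binds interrupts and sets the interrupt rate according to the card's state
The driver supports link speeds of 1G/2.5G/10G/25G/40G/50G/100G/200G/400G
The card stores the OS, kernel, and driver versions
The card's state is checked at 1 Hz, versus 2 Hz for the 82599
Implements suspend and resume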

Hyper-converged architecture

Hyper-Converged

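Hyper-converged architecture is a technical approach, not a particular class of hardware product.
Hyper-convergence is just an architectural design and does not depend on any particular hardware. From a hardware-optimization standpoint, though, we want servers with plenty of local disks so that as much I/O as possible is served locally. https://community.qingcloud.com/topic/345/%E6%B7%B1%E5%BA%A6%E5%89%96%E6%9E%90-%E8%B6%85%E8%9E%8D%E5%90%88%E6%9E%B6%E6%9E%84%E5%BA%94%E7%94%A8%E4%B8%8E%E5%AE%9E%E8%B7%B5%E5%88%86%E4%BA%AB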

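What is hyper-convergence?
– A platform that shares compute, storage, and network resources
– Software-defined storage, software-defined compute, software-defined networking

What drives the hyper-converged market: convenience
– Simple operations, easy management
– Simple deployment, easy changes
– Simple solutions, easy adjustments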

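Basic principles of the HXDP distributed storage software
• The Hypervisor and Controller VM software is installed on the HyperFlex 64G SD card, and it has to be installed on every physical server
• The Hypervisor enables pass-through mode, so all physical disks, both SSDs and HDDs, are controlled by the Controller VM. All Controller VMs form a cluster, and logically the whole cluster is a single local disk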

https://www.cisco.com/c/dam/global/zh_cn/solutions/industry/segment_sol/enterprise/programs/pdf/hyperFlex.pdf

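Sequential vs. random reads and writes

Sequential read/write (throughput, usually in MB/s): the file is stored at contiguous locations on disk.
Use cases: copying large files (e.g. video or music). However high the figure is, it says nothing useful about database performance.

4K random read/write (IOPS, i.e. operations per second): data is read or written at random locations on disk, 4 KB at a time.
Use cases: running the operating system, running software, databases.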
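The two units are linked by the block size: throughput = IOPS × block size. A quick conversion sketch; the function names are made up and the device figures are hypothetical, not measurements. It uses 1 MB = 1024 KB.

```python
def throughput_mb_s(iops, block_size_kb):
    """Throughput implied by an IOPS figure at a given block size."""
    return iops * block_size_kb / 1024


def iops_from_throughput(mb_s, block_size_kb):
    """IOPS needed to sustain a throughput at a given block size."""
    return mb_s * 1024 / block_size_kb


# A hypothetical disk doing 10,000 random 4 KB operations per second
# moves far less data than a typical sequential MB/s rating suggests.
print(throughput_mb_s(10_000, 4))
# Sustaining a hypothetical 500 MB/s with 4 KB requests would take this many IOPS.
print(iops_from_throughput(500, 4))
```

This is why a high sequential MB/s number does not translate into good database performance: small random requests are limited by IOPS long before they reach the sequential bandwidth.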

https://tech.meituan.com/about-desk-io.html

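SSDs read and write in units of pages and garbage-collect in units of blocks. A page is typically 16 KB and a block is typically tens of MB. An SSD writes data as follows:
1) Read out the page that holds the data
2) Modify the data within that page
3) Find a new free block and write the page from 2) into it, then mark the page from 1) as dirty in the block where it resides

Once the write path is understood, it is clear why sequential writes beat random writes. Two words: garbage collection! For the same amount of data written, sequential writes create fewer garbage blocks, so they perform better than random writes.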
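A toy simulation of the garbage-collection argument. It assumes logical page i initially sits in physical block i // pages_per_block, and it counts, for a set of overwritten pages, how many blocks end up holding dirty pages and how many still-live pages garbage collection would have to copy out before those blocks can be erased. The block size of 64 pages and the function name are assumptions for illustration; real flash translation layers are far more involved.

```python
import random

PAGES_PER_BLOCK = 64


def gc_cost(overwritten_pages, pages_per_block=PAGES_PER_BLOCK):
    """Return (dirty_blocks, live_pages_to_copy) for a set of overwritten pages."""
    dirty_per_block = {}
    for page in set(overwritten_pages):
        block = page // pages_per_block
        dirty_per_block[block] = dirty_per_block.get(block, 0) + 1
    dirty_blocks = len(dirty_per_block)
    # Pages in a dirty block that were not overwritten must be relocated.
    live_to_copy = sum(pages_per_block - n for n in dirty_per_block.values())
    return dirty_blocks, live_to_copy


total_pages = 100 * PAGES_PER_BLOCK
sequential = range(0, 256)                                     # 256 adjacent pages
scattered = random.Random(0).sample(range(total_pages), 256)   # 256 random pages

seq_cost = gc_cost(sequential)
rnd_cost = gc_cost(scattered)
print("sequential:", seq_cost)
print("random:    ", rnd_cost)
```

The sequential overwrite dirties whole blocks, so they can be erased without copying anything. The random overwrite leaves many blocks partly dirty, and each one needs its live pages moved first.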

Above the Clouds: A Berkeley View of Cloud Computing

When a Cloud is made available in a pay-as-you-go manner to the general public, we call it a Public Cloud;

Similarly, the advantages of the economy of scale and statistical multiplexing may ultimately lead to a handful of Cloud Computing providers who can amortize the cost of their large datacenters over the products of many “datacenter-less” companies. ---- like fab-less

Therefore, a necessary but not sufficient condition for a company to become a Cloud Computing provider is that it must have existing investments not only in very large datacenters, but also in large-scale software infrastructure and operational expertise required to run them. ---- who can be a cloud provider

James Hamilton estimates [23] that very large datacenters (tens of thousands of computers) can purchase hardware, network bandwidth, and power for 1/5 to 1/7 the prices offered to a medium-sized (hundreds or thousands of computers) datacenter. Further, the fixed costs of software development and deployment can be amortized over many more machines.

Why doesn't Intel provide cloud computing? It's all about *computing*

it’s cheaper to ship data over fiber optic cables than to ship electricity over high-voltage transmission lines.

the cost/benefit analysis must weigh the cost of moving large datasets into the cloud against the benefit of potential speedup in the data analysis.
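A back-of-the-envelope sketch of that trade-off. The dataset size, link speed, and job durations below are illustrative assumptions, and the function names are made up.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_bytes, link_bits_per_s):
    """Days needed to push data_bytes through a link of the given speed."""
    return data_bytes * 8 / link_bits_per_s / SECONDS_PER_DAY


def worth_moving(data_bytes, link_bits_per_s, local_days, cloud_days):
    """True if upload time plus cloud runtime beats running the job locally."""
    return transfer_days(data_bytes, link_bits_per_s) + cloud_days < local_days


# Assumed example: 10 TB over a sustained 20 Mbit/s WAN link.
upload = transfer_days(10e12, 20e6)
print(f"upload takes about {upload:.1f} days")
print(worth_moving(10e12, 20e6, local_days=30, cloud_days=1))   # transfer dominates
print(worth_moving(10e12, 1e9, local_days=30, cloud_days=1))    # faster link changes the answer
```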

Real world estimates of server utilization in datacenters range from 5% to 20%

Elasticity is the key feature of Cloud Computing

If new technologies or pricing plans become available to a cloud vendor, existing applications and customers can potentially benefit from them immediately, without incurring a capital expense.

A second opportunity is to find other reasons to make it attractive to keep data in the cloud, for once data is in the cloud for any reason it may no longer be a bottleneck and may enable new services that could drive the purchase of Cloud Computing cycles.

One of the difficult challenges in Cloud Computing is removing errors in these very large scale distributed systems. A common occurrence is that these bugs cannot be reproduced in smaller configurations, so the debugging must occur at scale in the production datacenters.

A fast and easy-to-use snapshot/restart tool might further encourage conservation of computing resources.

the licensing model for commercial software is not a good match to Utility Computing.

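Cloud backbone network

WAN virtualization: internal WAN resources are packaged as a product for tenants to use

Built on carrier network resources: does it use the carriers' leased lines, or go over the Internet?

Leasing OTN
Only leased lines are dependable enough; buy wholesale, sell retail
Network slicing, QoS, latency
Over the telco, providing more reliable and more flexible services

NetO - Ali's backbone traffic scheduling system

Carrier leased-line products fall into networking lines (branch interconnect) and Internet access lines (Internet service)

After the move to cloud, most business will run end to end in the cloud. Interconnect between enterprise branches will be provided free by the cloud provider, and the demand for leased-line interconnect will go away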

Nitro vs Bare Metal?

However, it was the C5 instance, released just this month, that was first to offload the entire EC2 stack -- from networking and storage to management, security and monitoring -- onto dedicated hardware. Developed using technology AWS acquired in 2015 when it purchased Annapurna Labs, the C5 marked the debut of the "new EC2 hypervisor," DeSantis said.

edge-computing

What's this

“We can get high-performance, highly stable, highly available storage without designing the architecture ourselves.”

Busy to Death

https://barryoreilly.com/2017/05/31/busy-to-death/amp/

Have a list of your top priorities. If what you’re being asked to do isn’t on it, then don’t do it, or alternatively suggest how people can move ahead without you.
Thinking is an activity too—don’t undervalue it. Make time and space for it in your daily work schedule (not evenings and weekends).
Stop booking all your available capacity. It means you’ve no adaptability or optionality in your schedule for unexpected events (which should be expected to happen).
Use an activity account to understand where you are spending your time. It doesn’t have to be overly sophisticated. Look at last week’s calendar and write down notes about each day.
Set relative percentages for where you want to spend your time, then track and monitor it. If it’s not what you want, think of how you can adapt how you spend your time to move toward it.

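Do public cloud datacenters have a satellite-site model?

Large cloud providers are building hyperscale, centralized datacenters. Is there demand for so-called satellite sites that sit close to users and use proximity to cut network latency? How cost-effective are they? Do AWS, Ali, Azure, Google, and FB have anything like this model?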

Why NVMe faster

https://blogs.cisco.com/datacenter/nvme-for-absolute-beginners

Serial Attached SCSI (SAS) ... no matter how many CPU cores you’ve got or how dense your flash, all data must move serially. NVMe, on the other hand, offers thousands of parallel queues, representing exponentially greater communication potential.

NVMe is the standardized interface for PCIe SSDs

The SCSI controller holds back the throughput of flash because it was designed for disks: one command at a time.
NVMe simply releases that potential by using multiple queues.
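One way to see why more commands in flight matter is Little's law: throughput = commands in flight / latency. The sketch below makes the idealized assumption that per-command latency stays flat as depth grows, which real devices do not guarantee. The 100 µs latency, the queue counts, and the function name are assumed values for illustration only.

```python
def max_iops(commands_in_flight, latency_us):
    """Upper bound on IOPS from Little's law, assuming constant latency."""
    return commands_in_flight * 1_000_000 / latency_us


serial = max_iops(commands_in_flight=1, latency_us=100)         # one command at a time
parallel = max_iops(commands_in_flight=8 * 32, latency_us=100)  # 8 queues, 32 deep each
print(f"serial: {serial:,.0f} IOPS, parallel: {parallel:,.0f} IOPS")
```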

linpack C3

Sample data file lininput_xeon64.

Current date/time: Fri Jul 20 12:14:46 2018

CPU frequency: 3.182 GHz
Number of CPUs: 1
Number of cores: 2
Number of threads: 4

Parameters are set to:

Number of tests: 15

Number of equations to solve (problem size) : 1000 2000 5000 10000 15000 18000 20000 22000 25000 26000 27000 30000 35000 40000 45000
Leading dimension of array : 1000 2000 5008 10000 15000 18008 20016 22008 25000 26000 27000 30000 35000 40000 45000
Number of trials to run : 4 2 2 2 2 2 2 2 2 2 1 1 1 1 1
Data alignment value (in Kbytes) : 4 4 4 4 4 4 4 4 4 4 4 1 1 1 1
Maximum memory requested that can be used=12800801024, at the size=40000

=================== Timing linear equation system solver ===================

Size LDA Align. Time(s) GFlops Residual Residual(norm) Check
1000 1000 4 0.034 19.7983 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.006 104.9613 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.006 103.4335 1.165068e-12 3.973181e-02 pass
1000 1000 4 0.006 105.0405 1.165068e-12 3.973181e-02 pass
2000 2000 4 0.040 132.0736 5.001027e-12 4.350281e-02 pass
2000 2000 4 0.039 138.4960 5.001027e-12 4.350281e-02 pass
5000 5008 4 0.552 151.1293 2.715800e-11 3.786964e-02 pass
5000 5008 4 0.543 153.4799 2.715800e-11 3.786964e-02 pass
10000 10000 4 4.182 159.4505 1.026247e-10 3.618650e-02 pass
10000 10000 4 4.135 161.2604 1.026247e-10 3.618650e-02 pass
15000 15000 4 13.823 162.8074 2.383860e-10 3.754620e-02 pass
15000 15000 4 13.778 163.3317 2.383860e-10 3.754620e-02 pass
18000 18008 4 23.738 163.8125 3.012383e-10 3.298929e-02 pass
18000 18008 4 23.607 164.7210 3.012383e-10 3.298929e-02 pass
20000 20016 4 32.493 164.1619 3.488892e-10 3.088433e-02 pass
20000 20016 4 32.425 164.5075 3.488892e-10 3.088433e-02 pass
22000 22008 4 43.142 164.5649 4.480804e-10 3.282012e-02 pass
22000 22008 4 43.114 164.6711 4.480804e-10 3.282012e-02 pass
25000 25000 4 63.053 165.2235 5.548119e-10 3.155017e-02 pass
25000 25000 4 63.000 165.3626 5.548119e-10 3.155017e-02 pass
26000 26000 4 70.946 165.1765 5.690611e-10 2.992294e-02 pass
26000 26000 4 70.916 165.2471 5.690611e-10 2.992294e-02 pass
27000 27000 4 79.248 165.5989 6.225738e-10 3.035986e-02 pass
30000 30000 1 108.143 166.4630 8.671269e-10 3.418223e-02 pass
35000 35000 1 171.685 166.5016 1.019134e-09 2.958393e-02 pass
40000 40000 1 255.331 167.1161 1.394161e-09 3.100662e-02 pass

Performance Summary (GFlops)

Size LDA Align. Average Maximal
1000 1000 4 83.3084 105.0405
2000 2000 4 135.2848 138.4960
5000 5008 4 152.3046 153.4799
10000 10000 4 160.3554 161.2604
15000 15000 4 163.0695 163.3317
18000 18008 4 164.2668 164.7210
20000 20016 4 164.3347 164.5075
22000 22008 4 164.6180 164.6711
25000 25000 4 165.2930 165.3626
26000 26000 4 165.2118 165.2471
27000 27000 4 165.5989 165.5989
30000 30000 1 166.4630 166.4630
35000 35000 1 166.5016 166.5016
40000 40000 1 167.1161 167.1161

Residual checks PASSED

End of tests
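The GFlops column in the log above can be cross-checked with the standard LINPACK operation count, 2/3·n³ + 2·n² floating-point operations for a system of size n. The sketch below assumes that count and 8-byte doubles; small differences from the log come from the timings being printed to three decimals. The function names are made up.

```python
def linpack_gflops(n, seconds):
    """GFlops for solving an n x n system in `seconds`."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9


def matrix_bytes(n, lda):
    """Memory for the n x lda matrix of 8-byte doubles."""
    return 8 * n * lda


print(f"{linpack_gflops(40000, 255.331):.2f}")  # log reports 167.1161
print(f"{linpack_gflops(10000, 4.182):.2f}")    # log reports 159.4505
print(matrix_bytes(40000, 40000))               # close to the 12800801024 bytes in the log
```

The average in the summary table is the plain mean of the trials, for example (19.7983 + 104.9613 + 103.4335 + 105.0405) / 4 = 83.3084 for size 1000.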

2018/07/20 Fri
12:41

The way forward for ISPs

https://techcrunch.com/2018/06/12/netflix-and-alphabet-will-need-to-become-isps-fast/

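AT&T and Comcast are both investing in content services, buying Time Warner and the like, unwilling to be reduced to plumbers. That in turn is pushing the repeal of net neutrality.
Google, Netflix and others are instead investing in building pipes, unwilling to be squeezed by the plumbers, not to mention the unfair competition that follows once net neutrality is repealed.

Internet traffic is concentrating in the hands of a few giants. Google, FB, and AWS all have the potential to become new ISPs, but plumbing is hard manual work and asset-heavy. As the article says, it is more about avoiding a chokehold while gaining bargaining leverage.

The domestic ISPs in China, by contrast, hold every advantage of timing and position. China Unicom is completely finished and can only be a plumber for the giants. China Mobile is learning from AT&T and has invested a lot in content. China Telecom has the best broadband resources but does little more than sell datacenter space. If not for policy restrictions, companies such as A and T would certainly follow their overseas peers and build their own backbones, which would be healthy for the content giants, the equipment vendors, and the ISPs themselves.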

Learn from [How many data centers needed world wide]

http://perspectives.mvdirona.com/2017/04/how-many-data-centers-needed-world-wide/

why do most cloud providers have data centers in the same US areas (NW states and VA area) but the central, northern area of North America don’t have as many data center locations?

being close to population centers and major communications hubs matters to most operators more than cooling costs.