
MQSim is a fast and accurate simulator modeling the performance of modern multi-queue (MQ) SSDs as well as traditional SATA based SSDs. MQSim faithfully models new high-bandwidth protocol implementations, steady-state SSD conditions, and the full end-to-end latency of requests in modern SSDs. It is described in detail in the FAST 2018 paper by Arash Tavakkol et al., "MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices" (https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework_fast18.pdf)

Home Page: https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework_fast18.pdf

License: MIT License


mqsim's Introduction

MQSim: A Simulator for Modern NVMe and SATA SSDs

MQSim is a simulator that accurately captures the behavior of both modern multi-queue SSDs and conventional SATA-based SSDs. MQSim faithfully models a number of critical features absent in existing state-of-the-art simulators, including (1) modern multi-queue-based host–interface protocols (e.g., NVMe), (2) the steady-state behavior of SSDs, and (3) the end-to-end latency of I/O requests. MQSim can be run as a standalone tool, or integrated with a full-system simulator.

The full paper is published in FAST 2018 and is available online at https://people.inf.ethz.ch/omutlu/pub/MQSim-SSD-simulation-framework_fast18.pdf

Citation

Please cite our full FAST 2018 paper if you find this repository useful.

Arash Tavakkol, Juan Gomez-Luna, Mohammad Sadrosadati, Saugata Ghose, and Onur Mutlu, "MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices," in Proceedings of the 16th USENIX Conference on File and Storage Technologies (FAST), Oakland, CA, USA, February 2018.

@inproceedings{tavakkol2018mqsim,
  title={{MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices}},
  author={Tavakkol, Arash and G{\'o}mez-Luna, Juan and Sadrosadati, Mohammad and Ghose, Saugata and Mutlu, Onur},
  booktitle={FAST},
  year={2018}
}

Additional Resources

To learn more about MQSim, please refer to the slides and talk accompanying the FAST 2018 paper.

Usage in Linux

Run the following commands:

$ make
$ ./MQSim -i <SSD Configuration File> -w <Workload Definition File>

Usage in Windows

  1. Open the MQSim.sln solution file in MS Visual Studio 2017 or later.
  2. Set the Solution Configuration to Release (it is set to Debug by default).
  3. Compile the solution.
  4. Run the generated executable file (e.g., MQSim.exe) either from the command line or by clicking the MS Visual Studio Run button. In both cases, specify the paths to the files containing 1) the SSD configuration and 2) the workload definitions.

Example command line execution:

$ MQSim.exe -i <SSD Configuration File> -w <Workload Definition File> 

MQSim Execution Configurations

You can specify your preferred SSD configuration in the XML format. If the SSD configuration file specified in the command line does not exist, MQSim will create a sample XML file in the specified path. Here are the definitions of configuration parameters available in the XML file:

Host

  1. PCIe_Lane_Bandwidth: the PCIe bandwidth per lane in GB/s. Range = {all positive double precision values}.
  2. PCIe_Lane_Count: the number of PCIe lanes. Range = {all positive integer values}.
  3. SATA_Processing_Delay: defines the aggregate hardware and software processing delay to send/receive a SATA message to the SSD device in nanoseconds. Range = {all positive integer values}.
  4. Enable_ResponseTime_Logging: the toggle to enable response time logging. If enabled, response time is calculated for each running I/O flow over simulation epochs and is reported in a log file at the end of each epoch. Range = {true, false}.
  5. ResponseTime_Logging_Period_Length: defines the epoch length for response time logging in nanoseconds. Range = {all positive integer values}.
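
For illustration, here is a minimal sketch of how these Host parameters might be written in the configuration file. Only the parameter names are taken from the list above; the enclosing element name and all of the values are assumptions (the sample file that MQSim generates when the specified configuration file is missing shows the exact layout):

<Host_Parameter_Set>
    <PCIe_Lane_Bandwidth>1</PCIe_Lane_Bandwidth>  <!-- GB/s per lane -->
    <PCIe_Lane_Count>4</PCIe_Lane_Count>
    <SATA_Processing_Delay>400000</SATA_Processing_Delay>  <!-- ns -->
    <Enable_ResponseTime_Logging>false</Enable_ResponseTime_Logging>
    <ResponseTime_Logging_Period_Length>400000</ResponseTime_Logging_Period_Length>  <!-- ns -->
</Host_Parameter_Set>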

SSD Device

  1. Seed: the seed value that is used for random number generation. Range = {all positive integer values}.
  2. Enabled_Preconditioning: the toggle to enable preconditioning. Range = {true, false}.
  3. Memory_Type: the type of the non-volatile memory used for data storage. Range = {FLASH}.
  4. HostInterface_Type: the type of host interface. Range = {NVME, SATA}.
  5. IO_Queue_Depth: the length of the host-side I/O queue. If the host interface is set to NVME, then IO_Queue_Depth defines the capacity of the I/O Submission and I/O Completion Queues. If the host interface is set to SATA, then IO_Queue_Depth defines the capacity of the Native Command Queue (NCQ). Range = {all positive integer values}
  6. Queue_Fetch_Size: the value of the QueueFetchSize parameter as described in the FAST 2018 paper [1]. Range = {all positive integer values}
  7. Caching_Mechanism: the data caching mechanism used on the device. Range = {SIMPLE: implements a simple data destaging buffer, ADVANCED: implements an advanced data caching mechanism with different sharing options among the concurrent flows}.
  8. Data_Cache_Sharing_Mode: the sharing mode of the DRAM data cache (buffer) among the concurrently running I/O flows when an NVMe host interface is used. Range = {SHARED, EQUAL_PARTITIONING}.
  9. Data_Cache_Capacity: the size of the DRAM data cache in bytes. Range = {all positive integers}
  10. Data_Cache_DRAM_Row_Size: the size of the DRAM rows in bytes. Range = {all positive power of two numbers}.
  11. Data_Cache_DRAM_Data_Rate: the DRAM data transfer rate in MT/s. Range = {all positive integer values}.
  12. Data_Cache_DRAM_Data_Burst_Size: the number of bytes that are transferred in one DRAM burst (depends on the number of DRAM chips). Range = {all positive integer values}.
  13. Data_Cache_DRAM_tRCD: the value of the timing parameter tRCD in nanoseconds used to access DRAM in the data cache. Range = {all positive integer values}.
  14. Data_Cache_DRAM_tCL: the value of the timing parameter tCL in nanoseconds used to access DRAM in the data cache. Range = {all positive integer values}.
  15. Data_Cache_DRAM_tRP: the value of the timing parameter tRP in nanoseconds used to access DRAM in the data cache. Range = {all positive integer values}.
  16. Address_Mapping: the logical-to-physical address mapping policy implemented in the Flash Translation Layer (FTL). Range = {PAGE_LEVEL, HYBRID}.
  17. Ideal_Mapping_Table: if enabled, an ideal mapping table is assumed, in which all address translation entries are always present in the CMT (i.e., the CMT is infinite in size), so every address translation request succeeds (i.e., all mapping entries are found in DRAM and no mapping entries need to be read from flash). Range = {true, false}.
  18. CMT_Capacity: the size of the SRAM/DRAM space in bytes used to cache the address mapping table (Cached Mapping Table). Range = {all positive integer values}.
  19. CMT_Sharing_Mode: the mode that determines how the entire CMT (Cached Mapping Table) space is shared among concurrently running flows when an NVMe host interface is used. Range = {SHARED, EQUAL_PARTITIONING}.
  20. Plane_Allocation_Scheme: the scheme for plane allocation as defined in Tavakkol et al. [3]. Range = {CWDP, CWPD, CDWP, CDPW, CPWD, CPDW, WCDP, WCPD, WDCP, WDPC, WPCD, WPDC, DCWP, DCPW, DWCP, DWPC, DPCW, DPWC, PCWD, PCDW, PWCD, PWDC, PDCW, PDWC}
  21. Transaction_Scheduling_Policy: the transaction scheduling policy that is used in the SSD back end. Range = {OUT_OF_ORDER as defined in the Sprinkler paper [2], PRIORITY_OUT_OF_ORDER which implements OUT_OF_ORDER and NVMe priorities}.
  22. Overprovisioning_Ratio: the ratio of reserved storage space with respect to the available flash storage capacity. Range = {all positive double precision values}.
  23. GC_Exect_Threshold: the threshold for starting Garbage Collection (GC). When the ratio of the free physical pages for a plane drops below this threshold, GC execution begins. Range = {all positive double precision values}.
  24. GC_Block_Selection_Policy: the GC block selection policy. Range {GREEDY, RGA (described in [4] and [5]), RANDOM (described in [4]), RANDOM_P (described in [4]), RANDOM_PP (described in [4]), FIFO (described in [6])}.
  25. Use_Copyback_for_GC: determines whether copyback operations are used for the gc_write transactions issued by GC_and_WL_Unit_Page_Level when moving pages that the block manager reports as valid (block_manager->Is_page_valid). Range = {true, false}.
  26. Preemptible_GC_Enabled: the toggle to enable pre-emptible GC (described in [7]). Range = {true, false}.
  27. GC_Hard_Threshold: the threshold to stop pre-emptible GC execution (described in [7]). Range = {all possible positive double precision values less than GC_Exect_Threshold}.
  28. Dynamic_Wearleveling_Enabled: the toggle to enable dynamic wear-leveling (described in [9]). Range = {true, false}.
  29. Static_Wearleveling_Enabled: the toggle to enable static wear-leveling (described in [9]). Range = {true, false}.
  30. Static_Wearleveling_Threshold: the threshold for starting static wear-leveling (described in [9]). When the difference between the minimum and maximum erase counts within a memory unit (e.g., a plane in flash memory) exceeds this threshold, static wear-leveling begins. Range = {all positive integer values}.
  31. Preferred_suspend_erase_time_for_read: the reasonable time to suspend an ongoing flash erase operation in favor of a recently-queued read operation. Range = {all positive integer values}.
  32. Preferred_suspend_erase_time_for_write: the reasonable time to suspend an ongoing flash erase operation in favor of a recently-queued write (program) operation. Range = {all positive integer values}.
  33. Preferred_suspend_write_time_for_read: the reasonable time to suspend an ongoing flash program operation in favor of a recently-queued read operation. Range = {all positive integer values}.
  34. Flash_Channel_Count: the number of flash channels in the SSD back end. Range = {all positive integer values}.
  35. Flash_Channel_Width: the width of each flash channel in bytes. Range = {all positive integer values}.
  36. Channel_Transfer_Rate: the transfer rate of flash channels in the SSD back end in MT/s. Range = {all positive integer values}.
  37. Chip_No_Per_Channel: the number of flash chips attached to each channel in the SSD back end. Range = {all positive integer values}.
  38. Flash_Comm_Protocol: the Open NAND Flash Interface (ONFI) protocol used for data transfer over flash channels in the SSD back end. Range = {NVDDR2}.
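
A partial sketch of the corresponding device section follows, again using only the parameter names defined above; the enclosing element name (taken from the Device_Parameter_Set source files referenced later on this page), the subset of parameters shown, and the values are illustrative assumptions:

<Device_Parameter_Set>
    <Seed>123</Seed>
    <Enabled_Preconditioning>true</Enabled_Preconditioning>
    <Memory_Type>FLASH</Memory_Type>
    <HostInterface_Type>NVME</HostInterface_Type>
    <IO_Queue_Depth>1024</IO_Queue_Depth>
    <Queue_Fetch_Size>512</Queue_Fetch_Size>
    <Caching_Mechanism>ADVANCED</Caching_Mechanism>
    <Data_Cache_Sharing_Mode>SHARED</Data_Cache_Sharing_Mode>
    <Data_Cache_Capacity>536870912</Data_Cache_Capacity>  <!-- 512 MB -->
    <Address_Mapping>PAGE_LEVEL</Address_Mapping>
    <CMT_Capacity>2097152</CMT_Capacity>  <!-- 2 MB -->
    <Plane_Allocation_Scheme>CWDP</Plane_Allocation_Scheme>
    <Transaction_Scheduling_Policy>OUT_OF_ORDER</Transaction_Scheduling_Policy>
    <Overprovisioning_Ratio>0.07</Overprovisioning_Ratio>
    <GC_Exect_Threshold>0.05</GC_Exect_Threshold>
    <GC_Block_Selection_Policy>GREEDY</GC_Block_Selection_Policy>
    <Flash_Channel_Count>8</Flash_Channel_Count>
    <Chip_No_Per_Channel>4</Chip_No_Per_Channel>
    <Flash_Comm_Protocol>NVDDR2</Flash_Comm_Protocol>
    <!-- remaining parameters from the list above omitted for brevity -->
</Device_Parameter_Set>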

NAND Flash

  1. Flash_Technology: Range = {SLC, MLC, TLC}.
  2. CMD_Suspension_Support: the type of suspend command support by flash chips. Range = {NONE, PROGRAM, PROGRAM_ERASE, ERASE}.
  3. Page_Read_Latency_LSB: the latency of reading LSB bits of flash memory cells in nanoseconds. Range = {all positive integer values}.
  4. Page_Read_Latency_CSB: the latency of reading CSB bits of flash memory cells in nanoseconds. Range = {all positive integer values}.
  5. Page_Read_Latency_MSB: the latency of reading MSB bits of flash memory cells in nanoseconds. Range = {all positive integer values}.
  6. Page_Program_Latency_LSB: the latency of programming LSB bits of flash memory cells in nanoseconds. Range = {all positive integer values}.
  7. Page_Program_Latency_CSB: the latency of programming CSB bits of flash memory cells in nanoseconds. Range = {all positive integer values}.
  8. Page_Program_Latency_MSB: the latency of programming MSB bits of flash memory cells in nanoseconds. Range = {all positive integer values}.
  9. Block_Erase_Latency: the latency of erasing a flash block in nanoseconds. Range = {all positive integer values}.
  10. Block_PE_Cycles_Limit: the PE limit of each flash block. Range = {all positive integer values}.
  11. Suspend_Erase_Time: the time taken to suspend an ongoing erase operation in nanoseconds. Range = {all positive integer values}.
  12. Suspend_Program_Time: the time taken to suspend an ongoing program operation in nanoseconds. Range = {all positive integer values}.
  13. Die_No_Per_Chip: the number of dies in each flash chip. Range = {all positive integer values}.
  14. Plane_No_Per_Die: the number of planes in each die. Range = {all positive integer values}.
  15. Block_No_Per_Plane: the number of flash blocks in each plane. Range = {all positive integer values}.
  16. Page_No_Per_Block: the number of physical pages in each flash block. Range = {all positive integer values}.
  17. Page_Capacity: the size of each physical flash page in bytes. Range = {all positive integer values}.
  18. Page_Metadat_Capacity: the size of the metadata area of each physical flash page in bytes. Range = {all positive integer values}.
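
The NAND flash parameters can be sketched in the same way; whether they appear as a nested element inside the device section or at the top level is an assumption here, and the values are illustrative rather than vendor data:

<Flash_Parameter_Set>
    <Flash_Technology>MLC</Flash_Technology>
    <CMD_Suspension_Support>PROGRAM_ERASE</CMD_Suspension_Support>
    <Page_Read_Latency_LSB>50000</Page_Read_Latency_LSB>  <!-- ns -->
    <Page_Read_Latency_MSB>100000</Page_Read_Latency_MSB>
    <Page_Program_Latency_LSB>800000</Page_Program_Latency_LSB>
    <Page_Program_Latency_MSB>2200000</Page_Program_Latency_MSB>
    <Block_Erase_Latency>3800000</Block_Erase_Latency>
    <Block_PE_Cycles_Limit>3000</Block_PE_Cycles_Limit>
    <Die_No_Per_Chip>2</Die_No_Per_Chip>
    <Plane_No_Per_Die>2</Plane_No_Per_Die>
    <Block_No_Per_Plane>2048</Block_No_Per_Plane>
    <Page_No_Per_Block>256</Page_No_Per_Block>
    <Page_Capacity>8192</Page_Capacity>  <!-- bytes -->
    <Page_Metadat_Capacity>448</Page_Metadat_Capacity>  <!-- bytes -->
</Flash_Parameter_Set>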

MQSim Workload Definition

You can define your preferred set of workloads in the XML format. If the specified workload definition file does not exist, MQSim will create a sample workload definition file in XML format for you (i.e., workload.xml). Here is the explanation of the XML attributes and tags for the workload definition file:

  1. The entire workload definition should be embedded within <MQSim_IO_Scenarios></MQSim_IO_Scenarios> tags. You can define different sets of I/O scenarios within these tags. MQSim simulates each I/O scenario separately.

  2. We call a set of workloads that should be executed together an I/O scenario. An I/O scenario is defined within the <IO_Scenario></IO_Scenario> tags. For example, two different I/O scenarios are defined in the workload definition file in the following way:

<MQSim_IO_Scenarios>
	<IO_Scenario>
	.............
	</IO_Scenario>
	<IO_Scenario>
	.............
	</IO_Scenario>
</MQSim_IO_Scenarios>

For each I/O scenario, MQSim 1) rebuilds the Host and SSD Drive model and executes the scenario to completion, and 2) creates an output file and writes the simulation results to it. For the example mentioned above, MQSim builds the Host and SSD Drive models twice, executes the first and second I/O scenarios, and finally writes the execution results into the workload_scenario_1.xml and workload_scenario_2.xml files, respectively.

You can define up to 8 different workloads within each IO_Scenario tag. Each workload could either be a disk trace file that has already been collected on a real system or a synthetic stream of I/O requests that are generated by MQSim's request generator.

Defining a Trace-based Workload

You can define a trace-based workload for MQSim using the <IO_Flow_Parameter_Set_Trace_Based> XML tag. Currently, MQSim can execute ASCII disk traces in the format defined in [8], in which each line of the trace file has the following fields: 1) Request_Arrival_Time, 2) Device_Number, 3) Starting_Logical_Sector_Address, 4) Request_Size_In_Sectors, and 5) Type_of_Request (0 for write, 1 for read).
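
For example, assuming arrival times in nanoseconds and a single device with number 0, a few lines of such a trace could look like the following (the values are made up for illustration):

0       0  2342848  8   1
21500   0  104400   16  0
43000   0  2342856  8   1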

The following parameters are used to define a trace-based workload:

  1. Priority_Class: the priority class of the I/O queue associated with this I/O request. Range = {URGENT, HIGH, MEDIUM, LOW}.
  2. Device_Level_Data_Caching_Mode: the type of on-device data caching for this flow. Range={WRITE_CACHE, READ_CACHE, WRITE_READ_CACHE, TURNED_OFF}. If the caching mechanism mentioned above is set to SIMPLE, then only WRITE_CACHE and TURNED_OFF modes could be used.
  3. Channel_IDs: a comma-separated list of channel IDs that are allocated to this workload. This list is used for resource partitioning. If there are C channels in the SSD (defined in the SSD configuration file), then the channel ID list should include values in the range 0 to C-1. If no resource partitioning is required, then all workloads should have channel IDs 0 to C-1.
  4. Chip_IDs: a comma-separated list of chip IDs that are allocated to this workload. This list is used for resource partitioning. If there are W chips in each channel (defined in the SSD configuration file), then the chip ID list should include values in the range 0 to W-1. If no resource partitioning is required, then all workloads should have chip IDs 0 to W-1.
  5. Die_IDs: a comma-separated list of die IDs that are allocated to this workload. This list is used for resource partitioning. If there are D dies in each flash chip (defined in the SSD configuration file), then the die ID list should include values in the range 0 to D-1. If no resource partitioning is required, then all workloads should have die IDs 0 to D-1.
  6. Plane_IDs: a comma-separated list of plane IDs that are allocated to this workload. This list is used for resource partitioning. If there are P planes in each die (defined in the SSD configuration file), then the plane ID list should include values in the range 0 to P-1. If no resource partitioning is required, then all workloads should have plane IDs 0 to P-1.
  7. Initial_Occupancy_Percentage: the percentage of the storage space (i.e., logical pages) that is filled during preconditioning. Range = {all integer values in the range 1 to 100}.
  8. File_Path: the relative/absolute path to the input trace file.
  9. Percentage_To_Be_Executed: the percentage of requests in the input trace file that should be executed. Range = {all integer values in the range 1 to 100}.
  10. Relay_Count: the number of times that the trace execution should be repeated. Range = {all positive integer values}.
  11. Time_Unit: the unit of arrival times in the input trace file. Range = {PICOSECOND, NANOSECOND, MICROSECOND}
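
Putting these parameters together, a trace-based flow inside an IO_Scenario might look like the following sketch (the trace path and the values are illustrative assumptions):

<IO_Flow_Parameter_Set_Trace_Based>
    <Priority_Class>HIGH</Priority_Class>
    <Device_Level_Data_Caching_Mode>WRITE_CACHE</Device_Level_Data_Caching_Mode>
    <Channel_IDs>0,1,2,3,4,5,6,7</Channel_IDs>
    <Chip_IDs>0,1,2,3</Chip_IDs>
    <Die_IDs>0,1</Die_IDs>
    <Plane_IDs>0,1</Plane_IDs>
    <Initial_Occupancy_Percentage>70</Initial_Occupancy_Percentage>
    <File_Path>traces/tpcc-small.trace</File_Path>
    <Percentage_To_Be_Executed>100</Percentage_To_Be_Executed>
    <Relay_Count>1</Relay_Count>
    <Time_Unit>NANOSECOND</Time_Unit>
</IO_Flow_Parameter_Set_Trace_Based>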

Defining a Synthetic Workload

You can define a synthetic workload for MQSim, using the <IO_Flow_Parameter_Set_Synthetic> XML tag.

The following parameters are used to define a synthetic workload:

  1. Priority_Class: same as trace-based parameters mentioned above.
  2. Device_Level_Data_Caching_Mode: same as trace-based parameters mentioned above.
  3. Channel_IDs: same as trace-based parameters mentioned above.
  4. Chip_IDs: same as trace-based parameters mentioned above.
  5. Die_IDs: same as trace-based parameters mentioned above.
  6. Plane_IDs: same as trace-based parameters mentioned above.
  7. Initial_Occupancy_Percentage: same as trace-based parameters mentioned above.
  8. Working_Set_Percentage: the percentage of available logical storage space that is accessed by generated requests. Range = {all integer values in the range 1 to 100}.
  9. Synthetic_Generator_Type: determines the way that the stream of requests is generated. Currently, there are two modes for generating consecutive requests, 1) based on the average bandwidth of I/O requests, or 2) based on the average depth of the I/O queue. Range = {BANDWIDTH, QUEUE_DEPTH}.
  10. Read_Percentage: the ratio of read requests in the generated flow of I/O requests. Range = {all integer values in the range 1 to 100}.
  11. Address_Distribution: the distribution pattern of addresses in the generated flow of I/O requests. Range = {STREAMING, RANDOM_UNIFORM, RANDOM_HOTCOLD, MIXED_STREAMING_RANDOM}.
  12. Percentage_of_Hot_Region: if RANDOM_HOTCOLD is set for address distribution, then this parameter determines the ratio of the hot region with respect to the entire logical address space. Range = {all integer values in the range 1 to 100}.
  13. Generated_Aligned_Addresses: the toggle to enable aligned address generation. Range = {true, false}.
  14. Address_Alignment_Unit: the unit that all generated addresses must be aligned to in sectors (i.e. 512 bytes). Range = {all positive integer values}.
  15. Request_Size_Distribution: the distribution pattern of request sizes in the generated flow of I/O requests. Range = {FIXED, NORMAL}.
  16. Average_Request_Size: average size of generated I/O requests in sectors (i.e. 512 bytes). Range = {all positive integer values}.
  17. Variance_Request_Size: if the request size distribution is set to NORMAL, then this parameter determines the variance of I/O request sizes in sectors. Range = {all non-negative integer values}.
  18. Seed: the seed value that is used for random number generation. Range = {all positive integer values}.
  19. Average_No_of_Reqs_in_Queue: average number of I/O requests enqueued in the host-side I/O queue (i.e., the intensity of the generated flow). This parameter is used in QUEUE_DEPTH mode of request generation. Range = {all positive integer values}.
  20. Bandwidth: the average bandwidth of I/O requests (i.e., the intensity of the generated flow) in bytes per second. MQSim uses this parameter in the BANDWIDTH mode of request generation.
  21. Stop_Time: defines when to stop generating I/O requests in nanoseconds.
  22. Total_Requests_To_Generate: if Stop_Time is set to zero, then MQSim's request generator considers Total_Requests_To_Generate to decide when to stop generating I/O requests.
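
Similarly, a synthetic flow definition combining these parameters might look like the following sketch. All values are illustrative assumptions; with Synthetic_Generator_Type set to QUEUE_DEPTH, the Average_No_of_Reqs_in_Queue value drives the request intensity and the Bandwidth value is only used in BANDWIDTH mode:

<IO_Flow_Parameter_Set_Synthetic>
    <Priority_Class>HIGH</Priority_Class>
    <Device_Level_Data_Caching_Mode>WRITE_CACHE</Device_Level_Data_Caching_Mode>
    <Channel_IDs>0,1,2,3,4,5,6,7</Channel_IDs>
    <Chip_IDs>0,1,2,3</Chip_IDs>
    <Die_IDs>0,1</Die_IDs>
    <Plane_IDs>0,1</Plane_IDs>
    <Initial_Occupancy_Percentage>70</Initial_Occupancy_Percentage>
    <Working_Set_Percentage>85</Working_Set_Percentage>
    <Synthetic_Generator_Type>QUEUE_DEPTH</Synthetic_Generator_Type>
    <Read_Percentage>70</Read_Percentage>
    <Address_Distribution>RANDOM_UNIFORM</Address_Distribution>
    <Percentage_of_Hot_Region>0</Percentage_of_Hot_Region>
    <Generated_Aligned_Addresses>true</Generated_Aligned_Addresses>
    <Address_Alignment_Unit>16</Address_Alignment_Unit>
    <Request_Size_Distribution>FIXED</Request_Size_Distribution>
    <Average_Request_Size>8</Average_Request_Size>  <!-- sectors -->
    <Variance_Request_Size>0</Variance_Request_Size>
    <Seed>12344</Seed>
    <Average_No_of_Reqs_in_Queue>2</Average_No_of_Reqs_in_Queue>
    <Bandwidth>262144000</Bandwidth>  <!-- bytes/s, used only in BANDWIDTH mode -->
    <Stop_Time>1000000000</Stop_Time>  <!-- ns -->
    <Total_Requests_To_Generate>0</Total_Requests_To_Generate>
</IO_Flow_Parameter_Set_Synthetic>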

Analyze MQSim's XML Output

You can use an XML processor to easily read and analyze an MQSim output file. For example, you can open an MQSim output file in MS Excel. Excel then presents a set of import options; choose "Use the XML Source task pane". The XML file is processed, and a task pane appears on the right listing all output parameters, i.e., the different types of statistics available in MQSim's output file. To read the value of a parameter, you should:

  1. Drag and drop the parameter from the XML Source task pane to the Excel sheet.
  2. Right-click on the cell where you dropped the parameter and select XML > Refresh XML Data from the drop-down menu.

The parameters used to define the output file of the simulator are divided into categories:

Host

For each defined IO_Flow, the following parameters are shown:

  1. Name: The name of the IO flow, e.g. Host.IO_Flow.Synth.No_0
  2. Request_Count: The total number of requests from this IO_flow.
  3. Read_Request_Count: The total number of read requests from this IO_flow.
  4. Write_Request_Count: The total number of write requests from this IO_flow.
  5. IOPS: The number of IO operations per second, i.e. how many requests are served per second.
  6. IOPS_Read: The number of read IO operations per second.
  7. IOPS_Write: The number of write IO operations per second.
  8. Bytes_Transferred: The total number of data bytes transferred across the interface.
  9. Bytes_Transferred_Read: The total number of data bytes read from the SSD Device.
  10. Bytes_Transferred_write: The total number of data bytes written to the SSD Device.
  11. Bandwidth: The total bandwidth delivered by the SSD Device in bytes per second.
  12. Bandwidth_Read: The total read bandwidth delivered by the SSD Device in bytes per second.
  13. Bandwidth_Write: The total write bandwidth delivered by the SSD Device in bytes per second.
  14. Device_Response_Time: The average SSD device response time for a request, in nanoseconds. This is defined as the time between enqueueing the request in the I/O submission queue, and removing it from the I/O completion queue.
  15. Min_Device_Response_Time: The minimum SSD device response time for a request, in nanoseconds.
  16. Max_Device_Response_Time: The maximum SSD device response time for a request, in nanoseconds.
  17. End_to_End_Request_Delay: The average delay between generating an I/O request and receiving a corresponding answer. This is defined as the difference between the request arrival time, and its removal time from the I/O completion queue. Note that the request arrival_time is the same as the request enqueue_time, when using the multi-queue properties of NVMe drives.
  18. Min_End_to_End_Request_Delay: The minimum end-to-end request delay.
  19. Max_End_to_End_Request_Delay: The maximum end-to-end request delay.
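
To give a feel for what these statistics look like when exported, here is a hedged sketch of a per-flow record built only from the parameter names above; all values are illustrative, and the actual element nesting and naming in MQSim's output file may differ:

<Host.IO_Flow>
    <Name>Host.IO_Flow.Synth.No_0</Name>
    <Request_Count>11395</Request_Count>
    <Read_Request_Count>7976</Read_Request_Count>
    <Write_Request_Count>3419</Write_Request_Count>
    <Device_Response_Time>175000</Device_Response_Time>  <!-- ns -->
    <End_to_End_Request_Delay>175000</End_to_End_Request_Delay>  <!-- ns -->
    <!-- IOPS, bandwidth, and min/max statistics from the list above follow the same pattern -->
</Host.IO_Flow>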

SSDDevice

The output parameters in the SSDDevice category contain values for:

  1. Average transaction times at a lower abstraction level (SSDDevice.IO_Stream)
  2. Statistics for the flash translation layer (FTL)
  3. Statistics for each queue in the SSD's internal flash Transaction Scheduling Unit (TSU): for each combination of channel and chip (package), the TSU maintains a User_Read_TR_Queue, a User_Write_TR_Queue, a Mapping_Read_TR_Queue, a Mapping_Write_TR_Queue, a GC_Read_TR_Queue, a GC_Write_TR_Queue, and a GC_Erase_TR_Queue.
  4. For each package: the fraction of time spent in exclusive memory command execution, exclusive data transfer, overlapped memory command execution and data transfer, and idle mode.

References

[1] A. Tavakkol et al., "MQSim: A Framework for Enabling Realistic Studies of Modern Multi-Queue SSD Devices," FAST, pp. 49 - 66, 2018.

[2] M. Jung and M. T. Kandemir, "Sprinkler: Maximizing Resource Utilization in Many-chip Solid State Disks," HPCA, pp. 524-535, 2014.

[3] A. Tavakkol et al., "Performance Evaluation of Dynamic Page Allocation Strategies in SSDs," ACM TOMPECS, pp. 7:1--7:33, 2016.

[4] B. Van Houdt, "A Mean Field Model for a Class of Garbage Collection Algorithms in Flash-based Solid State Drives," SIGMETRICS, pp. 191-202, 2013.

[5] Y. Li et al., "Stochastic Modeling of Large-Scale Solid-State Storage Systems: Analysis, Design Tradeoffs and Optimization," SIGMETRICS, pp. 179-190, 2013.

[6] P. Desnoyers, "Analytic Modeling of SSD Write Performance", SYSTOR, pp. 12:1-12:10, 2012.

[7] J. Lee et al., "Preemptible I/O Scheduling of Garbage Collection for Solid State Drives," IEEE TCAD, Vol. 32, No. 2, pp. 247-260, 2013.

[8] J. S. Bucy et al., "The DiskSim Simulation Environment Version 4.0 Reference Manual", CMU Tech Rep. CMU-PDL-08-101, 2008.

[9] Micron Technology, Inc., "Wear Leveling in NAND Flash Memory", Application Note AN1822, 2010.


mqsim's Issues

Preconditioning doesn't work properly.

I think the preconditioning doesn't work: it gets stuck in an endless while loop.
In FTL.cpp, in the FTL::Perform_precondition function (Step 1-4), there are two nested while loops:

while (lpa_set_for_preconditioning.size() < no_of_logical_pages_in_steadystate)
{
    ... // cout << "loop1" (printout added for debugging)
    {
        ... // cout << "loop2" (printout added for debugging)

The outer loop ("loop1") runs forever, and its printouts show that lpa_set_for_preconditioning is always an empty set.
In detail, [FTL::Perform_precondition] Step 1-4:
while loop 1:

lpa_set_for_preconditioning.size()=0 <  no_of_logical_pages_in_steadystate=43687868.

How can I get the response time of EACH request?

Hi. I am Joonsung Kim.

I want to reproduce the FAST '18 paper's results (especially the response time CDF and the time series plot of read/write response times in Appendices 3 and 4, respectively).
I cannot figure out how to extract the response time of each request. The simulator's output only provides the average/min/max latencies/bandwidth of read/write requests.
I quickly checked the code (i.e., the NVMe/SATA_consume_io_request functions), but the simulator doesn't store the response time (device_response_time) of each request; it just compares the response time with the minimum and maximum values to keep track of min/max.

Is there any way to obtain the latency results of each request?

Great thanks.

Why does a larger CMT lead to a higher end-to-end delay?

Hi, I am facing a problem: the larger the CMT is, the higher the end-to-end delay. When running the wsrch-small trace, I increase the CMT capacity (2M, 4M, 6M, ...), and the end-to-end delays also increase. I would expect a larger CMT to reduce the delay. Can you give me some hints?

SLC caching implementation

In addition to DRAM caching, many storage devices use SLC caching to improve performance.
Could you give me some ideas on how to implement an SLC caching scheme in the original source?

Flash_Block_Manager_Base.cpp

I found a small issue in Flash_Block_Manager_Base.cpp: in the functions Get_min_max_erase_difference() and Get_coldest_block_id(), there are two loops that find the blocks with the maximum/minimum erase counts. Here is the code:
for (unsigned int i = 1; i < block_no_per_plane; i++)
{
    if (plane_record->Blocks[i].Erase_count > plane_record->Blocks[i].Erase_count) // here I think it should compare Blocks[i] > Blocks[i-1]
        max_erased_block = i;
    if (plane_record->Blocks[i].Erase_count < plane_record->Blocks[i].Erase_count)
        min_erased_block = i;
}

for (unsigned int i = 1; i < block_no_per_plane; i++)
{
    if (plane_record->Blocks[i].Erase_count < plane_record->Blocks[i].Erase_count) // get coldest block; here I think it should compare Blocks[i] < Blocks[i+1]
        min_erased_block = i;
}

I also have another problem: running MQSim (Debug mode) on my computer (i5-6200U) takes at least 4 hours, which is too long. How do you test this simulator? Does it really need several hours?
Thanks.

FLIN scheduler

hi

Does MQSim support the FLIN scheduler?
Why are the FLIN .c and .h files commented out in the simulator's source code?

Regards

Bug in SIMPLE Caching Mechanism?

Hi,
When executing a given trace with the SIMPLE caching mechanism and the WRITE_CACHE device-level data caching mode, an error is thrown at:
Line 118 in Data_Cache_Manager_Flash_Simple.cpp

Thanks

gem5 integration

I want to test out the integration with gem5 as described in the fast18 paper. However, I cannot find any hint in the code or documentation as to where I should start. Is this functionality present in the simulator? I guess writing a certain disk trace in gem5 and simulating it in MQSim is possible, but that would defeat the purpose of a complete integration as described.

I am wondering how I can learn more about FLIN scheduling

Thanks for your open-source simulator and publications; they have given me some knowledge about SSD transaction scheduling.
I want to learn more, so I am studying the FLIN code alongside the FLIN publication.
I am trying to understand the FLIN algorithm using the publication and TSU_FLIN.cpp.

I think there are some omissions in TSU_FLIN.cpp, and some of the code does not correspond to the FLIN publication,
for example TSU_FLIN::fairness_based_on_average_slowdown() and TSU_FLIN::reorder_for_fairness().
So I am really wondering: was TSU_FLIN completed to the point where it runs successfully?

build warnings: slowdown_min and slowdown_min_reverse too large for double type

Build warnings:
src/ssd/TSU_FLIN.cpp:351:25: warning: integer constant is so large that it is unsigned
double slowdown_min = 10000000000000000000, slowdown_max = 0;
^~~~~~~~~~~~~~~~~~~~
src/ssd/TSU_FLIN.cpp:378:33: warning: integer constant is so large that it is unsigned
double slowdown_min_reverse = 10000000000000000000, slowdown_max_reverse = 0;

suggestion:
replace line 351:
double slowdown_min = 10000000000000000000, slowdown_max = 0;
with:
unsigned long long int slowdown_min = 10000000000000000000ULL, slowdown_max = 0ULL;

replace line 378:
double slowdown_min_reverse = 10000000000000000000, slowdown_max_reverse = 0;
with:
unsigned long long int slowdown_min_reverse = 10000000000000000000ULL, slowdown_max_reverse = 0ULL;
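
Since the slowdown values are fractional, an alternative sketch (an assumption, not the project's actual fix) is to keep the double type and write the sentinel as a floating-point value, which also removes the warning:

#include <limits>

// Keeps the original double semantics; the sentinel is simply "a very large double".
double slowdown_min = std::numeric_limits<double>::max(), slowdown_max = 0;
double slowdown_min_reverse = std::numeric_limits<double>::max(), slowdown_max_reverse = 0;
// Equivalently for this purpose: double slowdown_min = 1e19, slowdown_max = 0;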

Why does a larger CMT capacity result in a smaller end-to-end delay?

Hello, I have now encountered a problem: after many experiments, I found that the larger the CMT capacity, the smaller the end-to-end delay, which is contrary to my expectation. My workload is wsrch-small, and the only change to the project's source configuration is CMT_Capacity (256KB, 512KB, 1MB, 2MB, ...). May I have some suggestions? I have seen someone ask this question before, but that issue is now closed. I am very anxious now and hope you can help me. Best wishes!

Integrate MQSim with gem5

Hi,
Thanks for open-sourcing MQSim.

I'm new to using MQSim.
In the paper, I saw that an integrated execution mode (one that works with the gem5 simulator) is possible.
How do I run this with the current code?

Best Regards,
Mingeon

can't run the simulator

I tried to install the simulator and run it in my VMware virtual machine (the system in the virtual machine is Ubuntu 16.04 LTS). But when the simulation reaches 10% progress, an error happens: it shows Segmentation fault (core dumped) and crashes. How can I fix this?

Unusual running time

Hi
I ran the source code without any modification, and the execution time was up to ten hours; it would also stall for a long time near 55% of scenario 2. Has anyone else encountered a situation similar to mine? Is there any solution?
thank you

use_default_workloads missing Synthetic_Generator_Type

Running with an on-the-fly generated workload (i.e., default_ssdconfig.xml and default_workload.xml don't exist initially, so they are generated by MQSim) produces: ERROR:Unhandled request type generator in FTL preconditioning function
Command line:
./MQSim -i default_ssdconfig.xml -w default_workload.xml

Compilation failed in Linux (gcc-5)

HostInterface_Type is ambiguously defined. Commit fb08965 still works with gcc-5, and Visual Studio may compile despite this issue.

src/exec/Device_Parameter_Set.h:22:28: error: declaration of ‘HostInterface_Type Device_Parameter_Set::HostInterface_Type’ [-fpermissive]
  static HostInterface_Type HostInterface_Type;
                            ^
In file included from src/exec/Device_Parameter_Set.h:5:0,
                 from src/exec/Execution_Parameter_Set.h:6,
                 from src/main.cpp:7:
src/exec/../ssd/Host_Interface_Defs.h:7:12: error: changes meaning of ‘HostInterface_Type’ from ‘enum class HostInterface_Type’ [-fpermissive]
 enum class HostInterface_Type { SATA, NVME };

Input files to repeat the experiments of Contention at CMT in the FAST 2018 paper.

I cannot find the input files to repeat the experiments of Contention at the Cached Mapping Table (Section 6.1.3) in the FAST 2018 paper. Could you please provide the configuration files in this repository?

I am a little confused about how to set the execution time of the random access pattern and sequential access pattern in one run.

In addition, could you please explain a little about the MIXED_STREAMING_RANDOM option in the IO flow configuration Address_Distribution?

microbenchmark: couldn't find the microbenchmark program that analyzes and estimates a real SSD's internal configuration

Hello
The FAST '18 paper mentions a microbenchmarking program to analyze and estimate a real SSD's internal configuration (e.g., NAND flash page size, address mapping strategy, write cache size). The paper says you have open-sourced that microbenchmark, but I haven't found it in the specified reference (MQSim GitHub Repository, https://github.com/CMU-SAFARI/MQSim).
Could you please tell me where I can get that microbenchmarking program? Thank you very much!

Signed-off-by: Xuqiang Chen [email protected]

How to collect transfer/execution time for each request?

I found that the class User_request has a Transaction_list, which contains all related transactions. I want to sum the transfer/execution times of the transactions of each User_request and return the timing information to the host.
Which class should I modify?

OP ratio

Just to confirm: does it mean (T - U) / T or (T - U) / U in MQSim?
Thank you for your time and help!

problem about running the trace of Financial1

When I run the Financial1 trace with MQSim, I find that after a write request is passed to the device's Waiting_user_requests list and the data of the write request is retrieved from the host, the write request placed in the Waiting_user_requests list is abnormally deleted. And every time we run MQSim, the write requests that are abnormally deleted are different. Do you know what causes this?

TSU_Priority_OutOfOrder

Hi
I'm reading the code and found something that I can't understand.
Why, in TSU_Priority_OutOfOrder, is the MappingWriteTRQueue not handled after mapping transactions are pushed into it?
Is there a hidden trick that I don't know about?

Runtime error on Linux

We tried to run MQSim on Linux with the default configuration (from GitHub).
It was successful when the unmodified configuration was used, but it caused a
segmentation fault when we used a higher memory frequency or a smaller page size.

The experiment environment is Ubuntu 18.04, g++ 7.3.0.

Get response time for each request?

Hi,

I'm defining a trace-based workload to feed MQSim.
Is it possible to get the finish time of each request?
If yes, which file should I modify?

Thanks,
Danlin

Bug in combining 2 uint32_t

In ssd/Host_Interface_SATA.cpp and ssd/Host_Interface_NVMe.cpp, the function Request_Fetch_Unit_NVMe::Process_pcie_read_message gets new_request->Start_LBA (a uint64_t) by combining two uint32_t fields of Command_specific. However, it left-shifts the upper bits by 31 instead of 32.

new_request->Start_LBA = ((LHA_type)sqe->Command_specific[1]) << 31 | (LHA_type)sqe->Command_specific[0];

I believe that it should be:
new_request->Start_LBA = ((LHA_type)sqe->Command_specific[1]) << 32 | (LHA_type)sqe->Command_specific[0];

Maybe I misunderstood it. Has anyone else found the same issue? :)

missing parameters in README.md, XML parse error for Use_Copyback_GC

proposed changes diff
README.md
+17. Ideal_Mapping_Table: if mapping is ideal, table is enabled in which all address translations entries are always in CMT (i.e., CMT is infinite in size) and thus all adddress translation requests are always successful (i.e., all the mapping entries are found in the DRAM and there is no need to read mapping entries from flash)

and
+25. Use_Copyback_for_GC: used in GC_and_WL_Unit_Page_Level to determine block_manager→Is_page_valid gc_write transaction

exec/Device_Parameter_Set.cpp

  • val = (Use_Copyback_for_GC ? "true" : "false");
  • val = (Ideal_Mapping_Table ? "true" : "false");

Retrieving Info on Specific Requests

Hello,

I'm wondering if there's an easy way to modify the code so that it outputs information for each individual read/write request within a stream (e.g., request #500 occurred at time x, completed at time y, and had z requests queued in front of it). If anyone could point me in the right direction it would be much appreciated :)

Cheers,

Gavin

minor xml tag error and some missing parameters

=========================================================
workload-backend-contention-flow-1.xml
workload-backend-contention-flow-1-flow-2.xml
was
<Generated_Aligned_Addresses>true</<Generated_Aligned_Addresses>
is
<Generated_Aligned_Addresses>true</Generated_Aligned_Addresses>

3rd scenario missing parameters
<Generated_Aligned_Addresses>true</Generated_Aligned_Addresses>
<Address_Alignment_Unit>16</Address_Alignment_Unit>

==========================================================
workload-backend-contention-flow-2.xml
workload-queue-fetch-size-flow-1.xml
workload-queue-fetch-size-flow-2.xml
workload-queue-fetch-size-flow-1-flow-2.xml
workload-datacache-contention-flow-1.xml
workload-datacache-contention-flow-2.xml
workload-datacache-contention-flow-1-flow-2.xml

multiple places
was
<Generated_Aligned_Addresses>true</<Generated_Aligned_Addresses>
is
<Generated_Aligned_Addresses>true</Generated_Aligned_Addresses>

=========================================================

Simulation stops early on TPCC: requests serviced are far lower than requests generated

(Total requests generated: 23666510; total requests serviced: 21992.) Is this a simulator limitation? I have set Percentage_To_Be_Executed to 100, but the problem persists. Which parameters should I check if we want it to run all of the requests?

Also, is there a way to run applications directly on top of MQSim (via gem5, as the paper describes), rather than using a trace-driven simulation? Please let me know if you have a version for it.

Questions about repeating trace files

When I run the tpcc-small trace repeatedly with MQSim on Windows, the program behaves abnormally. I did not change the program; I only modified the input files ssdconfig.xml and workload.xml.
I made only the following changes to ssdconfig.xml:
Flash_Channel_Count = 1;
Chip_No_Per_Channel = 1;
Die_No_Per_Chip = 1;
Plane_No_Per_Die = 1;
Block_No_Per_Plane = 64;
Also, I define only one IO_Scenario, with a single trace-based workload for MQSim.
I made the following changes to workload.xml:
Channel_IDs =0 ;
Chip_IDs = 0;
Die_IDs = 0;
Plane_IDs = 0;
File_Path = traces/tpcc-small.trace;
Relay_Count = 10;
The program error message is shown in the attached screenshots (not reproduced here).
Do you know what caused this? I want to study wear-leveling algorithms, which requires blocks to wear out as quickly as possible, so I reduced the capacity of the SSD. However, the simulation then runs into this problem. How can I solve it?

What does Device_Number mean?

Hi,

I notice that in a trace-based workload, the columns of the input trace file have the following names: 1. Request_Arrival_Time, 2. Device_Number, 3. Starting_Logical_Sector_Address, 4. Request_Size_In_Sectors, 5. Type_of_Requests [0 for write, 1 for read].

I wonder what Device_Number refers to. Does it mean we can simulate multiple storage devices, with the host sending requests to different devices?

run syntax

Hi,
Should we use two XML files, workload.xml and ssdconfig.xml, for a run?
When I type the following command:
./MQSIM -i ssdconfig.xml -w workload.xml

this error appears:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
aborted (core dumped)

thanks

NVMe Queue Arbitration (Flow priority)

Hello,

In the workload.xml file there is an option named Priority_Class that can be set for different flows. From the README description, I understand that we can set the relative I/O priority of different flows by setting the corresponding Priority_Class value in the range {URGENT, HIGH, MEDIUM, LOW}, which is very similar to the NVMe queue specification. I tried two different flows with two different Priority_Class values, one URGENT and the other LOW. After the simulation, the statistics show almost no difference in performance at all (the delay, IOPS, etc. of the two flows are nearly identical, which is not supposed to be the case). I set the rest of the required attributes (i.e., Read_Percentage, Average_Request_Size, Average_No_of_Reqs_in_Queue, etc.) of the two flows to be identical.

So, what I want to know is whether the Priority_Class of each flow really has an impact, or whether this is left for future enhancement.

Thanks
Joyanta

Preconditioning error

When preconditioning an SSD device (<Enabled_Preconditioning>true</Enabled_Preconditioning>), the preconditioning code very often results in an error:
"It is not possible to assign PPA to all LPAs in Allocate_address_for_preconditioning! It is not safe to continue preconditioning."

Depending on the workload (Initial_Occupancy_Percentage, but also the address distribution type, such as STREAMING or RANDOM_UNIFORM), this error may or may not show up. This makes some sense, as the workload defines the distribution of addresses used in the preconditioning, but it does not explain why preconditioning fails.

How can this error be avoided?

fast18/data-cache-contention workload parameters missing <Synthetic_Generator_Type>

  1. Each of the workload*.xml files is missing <Synthetic_Generator_Type>.
  2. In the case of workload-datacache-contention-flow-1.xml, using <Synthetic_Generator_Type> with either QUEUE_DEPTH or BANDWIDTH throws a runtime error:
    ERROR:It is not possible to assing PPA to all LPAs in Allocate_address_for_preconditioning! It is not safe to continue preconditioning.
    Note: the spelling error (assing) is in the code ;-)

Suggestion: the other two fast18 directories currently seem to run, so this is probably just an XML file revision that is needed.

Why there are so many memory leaks....

Jesus, when I try to run the program with Valgrind or AddressSanitizer, there are so many memory leaks. And there are so many "new"s without a matching "delete"...

Questions about Hybrid FTL

The hybrid FTL doesn't work properly. I wrote a write-amount monitor to print out the number of bytes written by the host and by the SSD.

PAGE_LEVEL FTL:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Host: WriteAmount=930493 LBAs. (512 bytes/sector)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Host: WriteAmount=58155.8 LPAs. (8192 bytes/page)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Host: WriteAmount=476412416 bytes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SSD:  WriteAmount=7622598656 bytes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SSD:  Page_Capacity=8192 bytes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SSD:  WriteAmount=930493 PPAs. (8192 bytes/page)
Flow Host.IO_Flow.Synth.No_0 - total requests generated: 930493 total requests serviced:930493
The WAF is around 16, which makes sense.

HYBRID FTL MODE

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Host: WriteAmount=2 LBAs. (512 bytes/sector)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Host: WriteAmount=0.125 LPAs. (8192 bytes/page)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Host: WriteAmount=0 bytes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SSD:  WriteAmount=0 bytes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SSD:  Page_Capacity=8192 bytes.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SSD:  WriteAmount=0 PPAs. (8192 bytes/page)
Flow Host.IO_Flow.Synth.No_0 - total requests generated: 2 total requests serviced:0

I even tried MQSim's original sample traces; as long as I set the SSD to hybrid FTL mode, it doesn't write any PPAs at all.

questions about latency of small request size and large request size

Hi, I'm writing because I'm wondering what you think about a problem related to MQSim. Let me get straight to the point.

As shown in Figure 10 of your FAST '18 paper and in the experimental results I obtained from MQSim, the latency (I mean the device response time in MQSim's result XML file) of a small request size (e.g., 8KB) is smaller than that of a large request size (e.g., 512KB).
However, Figure 3 of the NVMOS '20 paper [1] shows that the latency of 4KB requests is much larger than that of 128KB requests on both ZNS and traditional SSDs, and the authors attribute this to the SSD's internal parallelism.
It seems that this paper shows the opposite result from yours.
I took a look at your source code, and it looks like your code also models the internal parallelism of the SSD. But I don't know why the results are different, in fact completely opposite.
So I just wonder what you think about this.
Is there anything I have misunderstood? If so, please let me know.

[1] Exploring Performance Characteristics of ZNS SSDs: Observation and implication.


Bandwidth saturated by SSD and not PCIe - Synthetic Workload Bandwidth cannot exceed value of 2GB

Hello,

I was playing with the simulator. I found it really useful but I had some questions/issues with it.

  1. Saturated bandwidth
    It seems that, for the given baseline SSD configuration, the maximum achievable IOPS corresponds to only around 1 GB/s of bandwidth. Even if you increase the amount of workload, the IOPS do not increase further.

This seems somewhat strange, as the PCIe link used provides up to 4 GB/s, especially given that today we have NVMe cards with 3.5 GB/s read bandwidth. I tried to reconfigure the SSD to achieve better bandwidth, but I am not sure which options would be more realistic. Should I increase the SSD channel frequency, the number of channels, or the number of chips per channel?

Do you guys have a configuration suggestion so that the SSD can fully benefit from the PCIe bandwidth and also correspond to a realistic implementation? That would be really helpful.

  2. The intensity of each synthetic I/O flow is limited to <2 GB/s (2147483648). This problem comes from the definition of the "Bandwidth" parameter as an unsigned int (rather than long long), and from the fact that it is measured in bytes. Using values >= 2 GB causes the simulator to crash as the numbers overflow. I tried modifying the simulator, but other errors propagated. A workaround is to use multiple synthetic workloads under the same scenario.

simulation time does not equal to the 'stop time' defined in the workload.xml

I use MQSim to run a synthetic workload that generates random requests. In workload.xml, the stop time is 10 s.
At first, we set Read_Percentage to 0, and the total simulation time is 12 s. Then we set Read_Percentage to 100, and the total simulation time is 25 s. Is this a normal phenomenon? I expected the simulation time to correspond to the stop time defined in workload.xml.

MQSim only issues synthetic workloads

Hi,

I am trying to run MQSim with a trace-based workload. The workload.xml is shown below:

<?xml version="1.0" encoding="us-ascii"?>
<MQSim_IO_Scenarios>
	<IO_Scenario>
		<IO_Flow_Parameter_Set_Trace_Based>
			<Priority_Class>HIGH</Priority_Class>
			<Device_Level_Data_Caching_Mode>WRITE_CACHE</Device_Level_Data_Caching_Mode>
			<Channel_IDs>0,1,2,3,4,5,6,7</Channel_IDs>
			<Chip_IDs>0,1,2,3</Chip_IDs>
			<Die_IDs>0,1</Die_IDs>
			<Plane_IDs>0,1</Plane_IDs>
			<Initial_Occupancy_Percentage>70</Initial_Occupancy_Percentage>
			<File_Path>traces/test.trace</File_Path>
			<Percentage_To_Be_Executed>100</Percentage_To_Be_Executed>
			<Relay_Count>1</Relay_Count>
			<Time_Unit>NANOSECOND</Time_Unit>
		</IO_Flow_Parameter_Set_Trace_Based>
	</IO_Scenario>
</MQSim_IO_Scenarios>

After running MQSim, it seems that MQSim runs a synthetic workload instead: when I print the flow type in parameters->IO_Flow_Definitions[flow_id]->Type, it is 0.
The following is the output.

MQSim started at Thu Oct 22 18:08:00 2020

******************************
Executing scenario 1 out of 1 .......

[>                   ]  0% progress in Host.IO_Flow.Synth.No_0

[>                   ]  0% progress in Host.IO_Flow.Synth.No_1

[... progress output from 5% to 95% for Host.IO_Flow.Synth.No_0 and Host.IO_Flow.Synth.No_1 omitted ...]

[====================]  100% progress in Host.IO_Flow.Synth.No_1

[====================]  100% progress in Host.IO_Flow.Synth.No_0

MQSim finished at Thu Oct 22 18:08:03 2020

Total simulation time: 0:0:3

Writing results to output file .......
Flow Host.IO_Flow.Synth.No_0 - total requests generated: 11395 total requests serviced:11395
                   - device response time: 175 (us) end-to-end request delay:175 (us)
Flow Host.IO_Flow.Synth.No_1 - total requests generated: 11378 total requests serviced:11378
                   - device response time: 175 (us) end-to-end request delay:175 (us)
Simulation complete; Press any key to exit.

Address Indexing?

In Queue_Probe.h (and maybe in other places?) there is an unordered_map whose keys are addresses (pointers to NVM_Transaction objects passed as parameters).
This might create segmentation faults, as the memory may not be handled carefully.

SSD Device preconditioning started.......

Hello,
I am sorry to bother you, but I have run into a very unpleasant problem: when I run MQSim with a trace-based I/O scenario, it always remains stuck at "SSD Device preconditioning started.......". All of the configuration files used are the originals.
I hope you can give me some advice, thank you.
