Checkpoint Files Are Not Generated
Introduction
Checkpoint files let a workload be resumed on the XiangShan processor from a saved state, which makes them important for running large applications. However, users may find that no checkpoint files are generated. This article describes one such case, using NEMU, and offers guidance for troubleshooting it.
Before You Start
Before diving into the troubleshooting process, ensure that you have:
- Read the XiangShan documentation thoroughly.
- Searched previous issues and discussions related to this problem.
- Reproduced the issue using the latest commit on the master branch.
Describe Your Problem
You are trying to generate checkpoint files with NEMU to run on XiangShan. The profiling and clustering steps appear to complete successfully, but the checkpointing step produces no checkpoint files.
What Did You Do Before
Setup Tools
To begin, you need to set up the necessary tools. Run the following commands:
git clone https://github.com/OpenXiangShan/xs-env.git
cd /xs-env && sudo -s ./setup-tools.sh && ./setup.sh && source env.sh && source update-submodule.sh
Setup NEMU and Simpoint
Next, set up NEMU and Simpoint by running the following commands:
cd $NEMU_HOME
git submodule update --init
cd $NEMU_HOME/resource/simpoint/simpoint_repo
make clean
make
cd $NEMU_HOME
make clean
make riscv64-xs-cpt_defconfig
make -j 8
cd $NEMU_HOME/resource/gcpt_restore
make
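Before moving on, it can help to confirm that the three build outputs the checkpoint script relies on actually exist. The following is a minimal sketch: the function name `check_artifacts` is hypothetical, and the paths are the ones this report's script assigns to `NEMU`, `GCPT`, and `SIMPOINT`, relative to `$NEMU_HOME`.

```shell
# check_artifacts ROOT: report whether each expected build output under ROOT
# exists and is non-empty. Returns the number of missing outputs (0 = all present).
# The function name is illustrative; the paths mirror the script in this report.
check_artifacts() {
    local root=$1 missing=0 f
    for f in \
        "$root/build/riscv64-nemu-interpreter" \
        "$root/resource/gcpt_restore/build/gcpt.bin" \
        "$root/resource/simpoint/simpoint_repo/bin/simpoint"; do
        if [ -s "$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# Usage: check_artifacts "$NEMU_HOME"
```

A non-zero return value means at least one of the builds above did not produce its output, which should be resolved before running the checkpoint steps.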
Choose an Example from Nexus-am/Apps for Checkpoint
Choose an example from nexus-am/apps to checkpoint. In this case, we will use the hello example:
cd /xs-env/nexus-am/apps/hello/
Modify the hello.c file to set the traps:
#define DISABLE_TIME_INTR 0x100
#define NOTIFY_PROFILER 0x101
#define GOOD_TRAP 0x0
void nemu_signal(int a){
    asm volatile ("mv a0, %0\n\t"
                  ".insn r 0x6B, 0, 0, x0, x0, x0\n\t"
                  :
                  : "r"(a)
                  : "a0");
}
#include <klib.h>
int main()
{
    nemu_signal(DISABLE_TIME_INTR);
    nemu_signal(NOTIFY_PROFILER);
    printf("Hello, XiangShan!\n");
    nemu_signal(GOOD_TRAP);
    return 0;
}
Compile the hello program:
make ARCH=riscv64-xs
Run the Checkpoint Steps
Run the following script to perform the checkpoint steps:
#!/bin/bash
# prepare env
export NEMU_HOME=/xs-env/NEMU
export NEMU=$NEMU_HOME/build/riscv64-nemu-interpreter
export GCPT=$NEMU_HOME/resource/gcpt_restore/build/gcpt.bin
export SIMPOINT=$NEMU_HOME/resource/simpoint/simpoint_repo/bin/simpoint
export WORKLOAD_ROOT_PATH=/xs-env/nexus-am/apps/hello/build/
export LOG_PATH=$NEMU_HOME/hello/logs
RESULT=$NEMU_HOME/hello_result
export profiling_result_name=simpoint-profiling
export PROFILING_RES=$RESULT/$profiling_result_name
export interval=$((2))
# Profiling
# using config: riscv64-xs-cpt_defconfig
profiling(){
    set -x
    workload=$1
    log=$LOG_PATH/profiling_logs
    mkdir -p $log
    $NEMU ${WORKLOAD_ROOT_PATH}/${workload}.bin \
        -D $RESULT -w $workload -C $profiling_result_name \
        -b --simpoint-profile --cpt-interval ${interval} > $log/${workload}-out.txt 2>${log}/${workload}-err.txt
}
export -f profiling
# Cluster
cluster(){
    set -x
    workload=$1
    export CLUSTER=$RESULT/cluster/${workload}
    mkdir -p $CLUSTER
    random1=`head -20 /dev/urandom | cksum | cut -c 1-6`
    random2=`head -20 /dev/urandom | cksum | cut -c 1-6`
    log=$LOG_PATH/cluster_logs/cluster
    mkdir -p $log
    $SIMPOINT \
        -loadFVFile $PROFILING_RES/${workload}/simpoint_bbv.gz \
        -saveSimpoints $CLUSTER/simpoints0 -saveSimpointWeights $CLUSTER/weights0 \
        -inputVectorsGzipped -maxK 30 -numInitSeeds 2 -iters 1000 -seedkm ${random1} -seedproj ${random2} \
        > $log/${workload}-out.txt 2> $log/${workload}-err.txt
}
export -f cluster
# Checkpointing
# using config: riscv64-xs-cpt_defconfig
checkpoint(){
    set -x
    workload=$1
    export CLUSTER=$RESULT/cluster
    log=$LOG_PATH/checkpoint_logs
    mkdir -p $log
    $NEMU ${WORKLOAD_ROOT_PATH}/${workload}.bin \
        -D $RESULT -w ${workload} -C spec-cpt \
        -b -S $CLUSTER --cpt-interval $interval \
        > $log/${workload}-out.txt 2>$log/${workload}-err.txt
    #--checkpoint-format zstd > $log/${workload}-out.txt 2>$log/${workload}-err.txt
}
export -f checkpoint
profiling hello-riscv64-xs
cluster hello-riscv64-xs
checkpoint hello-riscv64-xs
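Because the checkpoint step reads the clustering output, one useful intermediate check is whether `simpoints0` and `weights0` were written and agree with each other. The sketch below is illustrative only: `check_cluster_output` is a hypothetical helper, and it assumes that Simpoint writes one line per selected cluster to each of the two files, so their line counts should match.

```shell
# check_cluster_output DIR: succeed only if DIR holds non-empty simpoints0 and
# weights0 files with the same number of lines.
# Assumption: each file has one line per selected cluster.
check_cluster_output() {
    local dir=$1 n_points n_weights
    [ -s "$dir/simpoints0" ] || { echo "missing or empty: $dir/simpoints0"; return 1; }
    [ -s "$dir/weights0" ]   || { echo "missing or empty: $dir/weights0"; return 1; }
    n_points=$(wc -l < "$dir/simpoints0" | tr -d ' ')
    n_weights=$(wc -l < "$dir/weights0" | tr -d ' ')
    if [ "$n_points" -ne "$n_weights" ]; then
        echo "line count mismatch: $n_points simpoints vs $n_weights weights"
        return 1
    fi
    echo "ok: $n_points simulation point(s) in $dir"
}

# Usage, with the variables from the script above:
#   check_cluster_output "$RESULT/cluster/hello-riscv64-xs"
```

If this check fails, the problem lies in profiling or clustering rather than in the checkpoint step itself.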
The Files You See Generated
After running the script, the following files and directories are present. The tree ends at spec-cpt/hello-riscv64-xs/1, with no checkpoint file listed beneath it:
tree NEMU/hello*
NEMU/hello
`-- logs
    |-- checkpoint_logs
    |   |-- hello-riscv64-xs-err.txt
    |   `-- hello-riscv64-xs-out.txt
    |-- cluster_logs
    |   `-- cluster
    |       |-- hello-riscv64-xs-err.txt
    |       `-- hello-riscv64-xs-out.txt
    `-- profiling_logs
        |-- hello-riscv64-xs-err.txt
        `-- hello-riscv64-xs-out.txt
NEMU/hello_result
|-- cluster
|   `-- hello-riscv64-xs
|       |-- simpoints0
|       `-- weights0
|-- simpoint-profiling
|   `-- hello-riscv64-xs
|       `-- simpoint_bbv.gz
`-- spec-cpt
    `-- hello-riscv64-xs
        `-- 1
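To confirm programmatically that the checkpoint directory holds no files, rather than reading the tree by eye, a small helper can count regular files under it. This is a sketch under stated assumptions: `count_files` is a hypothetical name, and no assumption is made about what a checkpoint file would be called, so every regular file is counted.

```shell
# count_files DIR: print the number of regular files anywhere under DIR.
# Directories themselves (such as the "1" entry in the tree above) are not counted.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Usage, with the result directory from this report:
#   count_files "$NEMU_HOME/hello_result/spec-cpt"
```

A count of 0 matches the symptom described in this report: the output directories were created, but nothing was written into them.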
Environment
- XiangShan branch: master
- XiangShan commit id: 4bbdccbb077840af5e1b65c7138d31af3966f625
- NEMU commit id: 4a24b77a61505e34745667b1ad712a817b090cf8
- SPIKE commit id:
- Operating System: Ubuntu 22.04
- gcc version: 11.4.0
- mill version: 0.12.10
- java version: 11.0.26
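The commit ids and tool versions listed above can be gathered with a short script rather than by hand. The following is only a sketch: `commit_of` and `report_env` are hypothetical helper names, and it covers just the NEMU commit and the gcc version as examples.

```shell
# commit_of DIR: print the checked-out commit of the git repository at DIR,
# or "unknown" if DIR is not a repository or git is unavailable.
commit_of() {
    git -C "$1" rev-parse --verify HEAD 2>/dev/null || echo "unknown"
}

# report_env: print two of the environment fields used in this report.
# NEMU_HOME is assumed to be set by env.sh; the current directory is used otherwise.
report_env() {
    echo "NEMU commit id: $(commit_of "${NEMU_HOME:-.}")"
    echo "gcc version: $(gcc -dumpversion 2>/dev/null || echo unknown)"
}

# Usage: report_env
```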
Additional Context
Q: What are checkpoint files and why are they important?
A: A checkpoint file contains the state of the processor at a specific point in time, so that an application can be resumed from that point on the XiangShan processor. Without checkpoint files, the application has to start from the beginning on every run, which is time-consuming for long workloads.
Q: What are the common issues that prevent checkpoint files from being generated?
A: There are several common issues that can prevent checkpoint files from being generated, including:
- Incorrect configuration settings
- Insufficient memory or resources
- Issues with the NEMU or Simpoint tools
- Problems with the workload or application being run
Q: How can I troubleshoot the issue of checkpoint files not being generated?
A: To troubleshoot the issue, follow these steps:
- Check the configuration settings, for example that NEMU was built with the intended defconfig (riscv64-xs-cpt_defconfig in this report).
- Verify that there is sufficient memory and other resources available.
- Check the out and err logs that the NEMU and Simpoint steps write under $LOG_PATH for any issues or errors.
- Review the workload or application being run to ensure it is built correctly and properly configured.
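The log-checking step above can be sketched as a helper that prints only the error logs that actually contain output. `show_err_logs` is a hypothetical name; the `*-err.txt` naming follows the log files that the script in this report creates.

```shell
# show_err_logs DIR: for every non-empty *-err.txt file under DIR, print its
# path followed by its last 20 lines. Empty error logs are skipped.
show_err_logs() {
    find "$1" -type f -name '*-err.txt' -size +0 | sort | while read -r f; do
        echo "== $f"
        tail -n 20 "$f"
    done
}

# Usage, with the log directory from this report:
#   show_err_logs "$NEMU_HOME/hello/logs"
```

If this prints nothing, the tools reported no errors on stderr, and the matching `-out.txt` files are the next place to look.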
Q: What are some common errors that can occur when trying to generate checkpoint files?
A: Failures when generating checkpoint files generally fall into the following categories. These describe kinds of failure and are not literal messages printed by the tools:
- The checkpoint file cannot be created
- Insufficient memory or resources
- Invalid configuration settings
- A problem with the NEMU or Simpoint tools
Q: How can I resolve the issue of checkpoint files not being generated?
A: To resolve the issue, work through the same four checks used for troubleshooting, correct whichever one fails, and rerun the affected step: the configuration settings, the available memory and resources, the NEMU and Simpoint tools, and the workload or application being run.
Q: What are some best practices for generating checkpoint files?
A: Some best practices for generating checkpoint files include:
- Ensuring that the configuration settings are correct.
- Verifying that there is sufficient memory and resources available.
- Checking the NEMU and Simpoint tools for any issues or errors.
- Reviewing the workload or application being run to ensure it is correct and properly configured.
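One further practice that fits the list above is to stop the pipeline as soon as a stage fails, instead of running every stage unconditionally. The sketch below assumes each stage reports failure through its exit status, which may not cover every case (a tool can exit successfully without writing a checkpoint); `run_stage` is a hypothetical helper name.

```shell
# run_stage NAME CMD [ARGS...]: run one pipeline stage, report a failure on
# stderr, and return the stage's exit status so the caller can stop early.
run_stage() {
    local name=$1 status=0
    shift
    "$@" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "stage '$name' failed with exit status $status" >&2
    fi
    return "$status"
}

# Usage, with the functions from the script in this report:
#   run_stage profiling  profiling  hello-riscv64-xs &&
#   run_stage cluster    cluster    hello-riscv64-xs &&
#   run_stage checkpoint checkpoint hello-riscv64-xs
```

Chaining the stages with `&&` means a later stage never runs on the incomplete output of an earlier one.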
Q: Can I use checkpoint files to run applications on other processors?
A: Yes, checkpoint files can be used to run applications on other processors. However, the processor must support the same architecture and instruction set as the processor that generated the checkpoint file.
Q: How can I optimize the generation of checkpoint files?
A: To optimize the generation of checkpoint files, follow these steps:
- Use a high-performance processor to generate the checkpoint file.
- Use a large amount of memory and resources to ensure that the checkpoint file is generated quickly and efficiently.
- Use an optimized configuration setting to ensure that the checkpoint file is generated correctly.
- Review the workload or application being run to ensure it is correct and properly configured.
Q: Can I use checkpoint files to run applications in parallel?
A: Yes, checkpoint files can be used to run applications in parallel. However, the processor must support the same architecture and instruction set as the processor that generated the checkpoint file.
Q: How can I troubleshoot issues with checkpoint files in parallel?
A: To troubleshoot issues with checkpoint files in parallel, apply the same checks to each parallel run: review the configuration settings, verify that sufficient memory and resources are available, check the NEMU and Simpoint tools for issues or errors, and review the workload or application being run.
Q: Can I use checkpoint files to run applications on a cluster?
A: Yes, checkpoint files can be used to run applications on a cluster. However, the cluster must support the same architecture and instruction set as the processor that generated the checkpoint file.
Q: How can I troubleshoot issues with checkpoint files on a cluster?
A: To troubleshoot issues with checkpoint files on a cluster, apply the same checks on each node: review the configuration settings, verify that sufficient memory and resources are available, check the NEMU and Simpoint tools for issues or errors, and review the workload or application being run.