Run a job and check its status
This example shows how to use sfapi_client to run a job on Perlmutter at NERSC, wait for the job to complete, and inspect the resulting output file to confirm that the job ran successfully.
from sfapi_client import Client
from sfapi_client.compute import Machine
user_name = "elvis"
# Build the path to your home directory from your username
user_home = f'/global/homes/{user_name[0]}/{user_name}'
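The path construction above can be wrapped in a small helper. This is only a sketch: `home_dir` is a hypothetical name, and it assumes the `/global/homes/<first letter>/<username>` layout used in this example.

```python
def home_dir(user_name: str) -> str:
    """Return the home directory path for user_name.

    Assumes the /global/homes/<first letter>/<username> layout shown above.
    """
    if not user_name:
        # An empty name has no first letter to index into.
        raise ValueError("user_name must not be empty")
    return f"/global/homes/{user_name[0]}/{user_name}"


print(home_dir("elvis"))
```

Rejecting an empty username up front avoids an `IndexError` from `user_name[0]`.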
First we make a client to connect to the REST API. The client reads a key file from the directory $HOME/.superfacility. The file can be in PEM format, with the client ID on the first line:
CLIENT_ID
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
Or in JSON format:
{
"client_id" : "CLIENT_ID",
"secret" : "{...}"
}
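To make the two layouts concrete, here is a rough sketch of how such a file could be split into a client ID and a secret. It is not how sfapi_client loads keys; `parse_key_file` is a hypothetical helper written only to illustrate the two formats, and the sample values are placeholders.

```python
import json


def parse_key_file(text: str) -> tuple[str, str]:
    """Split key-file text into (client_id, secret). Illustrative only."""
    stripped = text.strip()
    if stripped.startswith("{"):
        # JSON layout: an object with "client_id" and "secret" fields.
        data = json.loads(stripped)
        return data["client_id"], data["secret"]
    # PEM layout: client ID on the first line, key material on the rest.
    first_line, _, rest = stripped.partition("\n")
    return first_line.strip(), rest.strip()


pem_text = (
    "CLIENT_ID\n"
    "-----BEGIN RSA PRIVATE KEY-----\n"
    "placeholder\n"
    "-----END RSA PRIVATE KEY-----\n"
)
client_id, secret = parse_key_file(pem_text)
print(client_id)
```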
You can also pass the full path of a key file to the client to load a different client_id and secret.
client = Client(key_name="/full/path/to/key.pem")
Create the client object
client = Client()
Before we start, let's check that Perlmutter is up
In this example we get the resource by its string name, "perlmutter".
client.compute("perlmutter").status
<StatusValue.active: 'active'>
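In a script, the status check can act as a guard before any job is submitted. The sketch below assumes only what the output above shows: that the status is an enum whose value is the string 'active'. `is_active` and `FakeStatus` are hypothetical names; `FakeStatus` is a stand-in so the snippet runs without a connection.

```python
from enum import Enum


class FakeStatus(Enum):
    """Stand-in for the status enum shown above (hypothetical members)."""
    active = "active"
    unavailable = "unavailable"


def is_active(status) -> bool:
    """Return True when the status value is the string 'active'."""
    # Accept either an enum member or a plain string.
    return getattr(status, "value", status) == "active"


print(is_active(FakeStatus.active))
print(is_active(FakeStatus.unavailable))
```

With a real client, the call would presumably look like `is_active(client.compute("perlmutter").status)`.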
Once the client is configured, we get a compute object for Perlmutter
You can also get the compute object using the names stored in the Machine enum.
perlmutter = client.compute(Machine.perlmutter)
Let's create a submit script
We'll start with a basic "Hello world" script.
import random
random.seed(7)
script = f"""#!/bin/bash
#SBATCH -C cpu
#SBATCH -q shared
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 1
#SBATCH -o {user_home}/sfapi_demo.txt
echo "Completed run {random.randint(1, 100)}"
"""
print(script)
#!/bin/bash
#SBATCH -C cpu
#SBATCH -q shared
#SBATCH -N 1
#SBATCH -c 1
#SBATCH -t 1
#SBATCH -o /global/homes/e/elvis/sfapi_demo.txt
echo "Completed run 42"
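If you need several variants of this script, one option is to build it from parameters. The following is a minimal sketch with the same directives as above; `build_script` and its arguments are hypothetical names, and the message is assumed to contain no double quotes.

```python
def build_script(output_path: str, message: str, minutes: int = 1) -> str:
    """Return a batch script that echoes message into output_path."""
    lines = [
        "#!/bin/bash",
        "#SBATCH -C cpu",
        "#SBATCH -q shared",
        "#SBATCH -N 1",
        "#SBATCH -c 1",
        f"#SBATCH -t {minutes}",       # time limit in minutes
        f"#SBATCH -o {output_path}",   # file that receives the job's stdout
        f'echo "{message}"',           # assumes message has no double quotes
    ]
    return "\n".join(lines) + "\n"


print(build_script("/global/homes/e/elvis/sfapi_demo.txt", "Completed run 42"))
```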
Once we have the script it can be submitted as a job
job = perlmutter.submit_job(script)
The job object contains information about the job on the system, including its current status and job ID.
job.jobid
'8407751'
To get the most recent information about the job you can ask the server to update the job. The PENDING state in this example means that the job is waiting for the requested resources to become available.
job.update()
job.state
<JobState.PENDING: 'PENDING'>
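Calling `update()` repeatedly amounts to polling. If you want to give up after a deadline, a loop along the following lines is one option. It is a sketch that relies only on `job.update()`, `job.state`, and `job.jobid` as used above; `wait_for_job` is a hypothetical helper, and the set of terminal state names is an assumption based on common Slurm states.

```python
import time

# Assumed subset of state names after which a job will not change again.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT"}


def wait_for_job(job, timeout_s=600.0, poll_s=10.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll job until it reaches a terminal state or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        job.update()  # refresh the job's state from the server
        if job.state.name in TERMINAL_STATES:
            return job.state
        if clock() >= deadline:
            raise TimeoutError(
                f"job {job.jobid} still {job.state.name} after {timeout_s} s"
            )
        sleep(poll_s)  # wait before asking the server again
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.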
We can also wait for the job to complete, which can be helpful if the job needs to be finished before another process starts.
%%time
job.complete()
CPU times: user 26.1 ms, sys: 5.37 ms, total: 31.5 ms
Wall time: 43.3 s
<JobState.COMPLETED: 'COMPLETED'>
Once the job is complete, we can make sure it produced the expected output file sfapi_demo.txt by using the ls command on the compute site perlmutter.
output_file = perlmutter.ls(f"{user_home}/sfapi_demo.txt")
output_file = output_file[0]
output_file.is_file()
True
We can also read the contents of small files by opening the file on the remote filesystem.
with output_file.open("r") as f:
    print(f.read())
Completed run 42
When you have finished working with the client, close it.
client.close()
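To make sure `close()` runs even when a step in between raises, the standard library's `contextlib.closing` can wrap any object that has a `close()` method. The sketch below uses hypothetical names (`with_client`, `make_client`, `work`) and takes the client factory as an argument so that it runs without a connection.

```python
from contextlib import closing


def with_client(make_client, work):
    """Create a client, pass it to work, and always close it afterwards."""
    # closing() calls client.close() on exit, whether or not work() raised.
    with closing(make_client()) as client:
        return work(client)


# With the real library this might be used as (not executed here):
#   with_client(Client, lambda c: c.compute("perlmutter").status)
```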