IBM Cloud Docs

Debugging OpenShift Data Foundation

Complete the following steps to debug your OpenShift Data Foundation storage configurations.

Checking whether the pod that mounts your storage instance is successfully deployed

Follow the steps to review any error messages related to pod deployment.

  1. List the pods in your cluster. A pod is successfully deployed if the pod shows a status of Running.

    oc get pods
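
    Example output for a hypothetical app pod named nginx; your pod names and values differ.

    NAME    READY   STATUS    RESTARTS   AGE
    nginx   1/1     Running   0          15m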
    
  2. Get the details of your pod and review any error messages that are displayed in the Events section of your CLI output.

    oc describe pod <pod_name>
    
  3. Retrieve the logs for your pod and review any error messages.

    oc logs <pod_name>
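
    If the pod restarted, you can also retrieve the logs from the previous container instance.

    oc logs <pod_name> --previous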
    
  4. Review the ODF troubleshooting documentation for steps to resolve common errors.

Restarting your app pod

Some issues can be resolved by restarting and redeploying your pods. Follow the steps to redeploy a specific pod.

  1. If your pod is part of a deployment, delete the pod and let the deployment rebuild it. If your pod is not part of a deployment, delete the pod and reapply your pod configuration file.

    1. Delete the pod.
      oc delete pod <pod_name>
      
      Example output
      pod "nginx" deleted
      
    2. Reapply the configuration file to redeploy the pod.
      oc apply -f <app.yaml>
      
      Example output
      pod/nginx created
      
  2. If restarting your pod does not resolve the issue, reload your worker nodes.
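
    For example, on classic clusters you can reload a worker node with the following command; on VPC clusters, use ibmcloud ks worker replace instead. The cluster and worker IDs are placeholders.

    ibmcloud ks worker reload --cluster <cluster_name_or_ID> --worker <worker_ID>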

  3. Verify that you use the latest IBM Cloud CLI and IBM Cloud Kubernetes Service plug-in versions.

    Update the IBM Cloud CLI.

    ibmcloud update
    
    List the plug-ins that are available in the plug-in repositories.

    ibmcloud plugin repo-plugins
    
    Update your installed plug-ins.

    ibmcloud plugin update
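
    To confirm the installed plug-in versions, list your plug-ins.

    ibmcloud plugin list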
    

Verifying that the storage driver and plug-in pods show a status of Running

Follow the steps to check the status of your storage driver and plug-in pods and review any error messages.

  1. List the pods in the kube-system project.

    oc get pods -n kube-system
    
  2. If the storage driver and plug-in pods don't show a Running status, get more details about the pods to find the root cause. Depending on the status of your pod, the following commands might fail.

    1. Describe the driver pod to get the names of the containers that run in it.

      oc describe pod <pod_name> -n kube-system
      
    2. Export the logs from the driver pod to a logs.txt file on your local machine.

      oc logs <pod_name> -n kube-system > logs.txt
      
    3. Review the log file.

      cat logs.txt
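
      To narrow the output to likely failures, you can filter the log file for common error strings.

      grep -iE "error|fail" logs.txt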
      
  3. Check the latest logs for any error messages. Review the ODF troubleshooting documentation for steps to resolve common errors.

Debugging your ODF resources

Describe your ODF resources and review the command outputs for any error messages.

  1. List the name of your ODF cluster.

    oc get ocscluster
    

    Example output:

    NAME             AGE
    ocscluster-vpc   71d
    
  2. Describe the storage cluster and review the Events section of the output for any error messages.

    oc describe ocscluster <ocscluster-name>
    
  3. List the ODF pods in the kube-system namespace and verify that they are Running.

    oc get pods -n kube-system
    

    Example output

    NAME                                                   READY   STATUS    RESTARTS   AGE
    ibm-keepalived-watcher-5g2gs                           1/1     Running   0          7d21h
    ibm-keepalived-watcher-8l4ld                           1/1     Running   0          7d21h
    ibm-keepalived-watcher-mhkh5                           1/1     Running   0          7d21h
    ibm-master-proxy-static-10.240.128.10                  2/2     Running   0          71d
    ibm-master-proxy-static-10.240.128.11                  2/2     Running   0          71d
    ibm-master-proxy-static-10.240.128.12                  2/2     Running   0          71d
    ibm-ocs-operator-controller-manager-55667f4d68-md4zb   1/1     Running   8          15d
    ibm-vpc-block-csi-controller-0                         4/4     Running   0          48d
    ibm-vpc-block-csi-node-6gnwv                           3/3     Running   0          48d
    ibm-vpc-block-csi-node-j2h62                           3/3     Running   0          48d
    ibm-vpc-block-csi-node-xpwpf                           3/3     Running   0          48d
    vpn-5b8694cdb-pll6z 
    
  4. Describe the ibm-ocs-operator-controller-manager pod and review the Events section in the output for any error messages.

    oc describe pod <ibm-ocs-operator-controller-manager-a1a1a1a> -n kube-system
    
  5. Review the logs of the ibm-ocs-operator-controller-manager.

    oc logs <ibm-ocs-operator-controller-manager-a1a1a1a> -n kube-system
    
  6. Describe NooBaa and review the Events section of the output for any error messages.

    oc describe noobaa -n openshift-storage
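
    You can also check the overall NooBaa status at a glance.

    oc get noobaa -n openshift-storage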
    
  7. Describe the ibm-storage-metrics-agent pod and review the Events section in the output for any error messages. First, list the pod to get its name.

    oc get pods -n kube-system -l name=ibm-storage-metrics-agent
    
    Example output
    
    NAME                                         READY   STATUS    RESTARTS   AGE
    ibm-storage-metrics-agent-8685869cc6-79qzq
    
    Then describe the pod.

    oc describe pod <ibm-storage-metrics-agent-a1a1a1a> -n kube-system
    
  8. Review the logs from the ibm-storage-metrics-agent.

    oc logs <ibm-storage-metrics-agent-a1a1a1a> -n kube-system
    
  9. Describe the ocscluster and review the output for error messages.

    oc describe ocscluster <ocscluster-name> -n openshift-storage
    
  10. Gather data about the cluster by using the oc adm must-gather command.

    oc adm must-gather --image=registry.redhat.io/ocs4/ocs-must-gather-rhel8:latest --dest-dir=ocs_mustgather
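
    After the command completes, review the collected logs and resource definitions in the ocs_mustgather directory.

    ls -R ocs_mustgather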
    
  11. For classic clusters or Satellite clusters that use local volumes on the worker node, make sure that the disk-by-id for the volumes that you used for the osd-device-path and mon-device-path parameters exists on the worker nodes. For more information about how to retrieve these volume IDs, see Gathering your device details.
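
    As a quick check, you can list the disk IDs from a debug session on the worker node. The node name is a placeholder.

    oc debug node/<node_name> -- chroot /host ls -l /dev/disk/by-id/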

  12. Review the troubleshooting documentation for steps to solve common errors.

Reviewing the OCS operator and app pod logs

  1. Get the name of the OCS operator pod.

    export POD_OCS_NAME=$(oc get pods -n kube-system | grep ocs | awk '{print $1}')
    
  2. Get the logs of the OCS operator controller manager.

    oc logs -n kube-system $POD_OCS_NAME
    
  3. Get a list of your app pods.

    oc get pods -n <namespace>
    
  4. Get the logs for the pod that you want to troubleshoot.

    oc logs <pod_name> -n <namespace>
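
    To stream the logs as new entries arrive, add the -f flag.

    oc logs -f <pod_name> -n <namespace>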
    

Changing the worker nodes that are assigned to ODF

  1. Get the worker node names.

    oc get nodes
    
  2. Change the worker nodes that are assigned to ODF.

    ibmcloud sat storage config param set --config <your_storage_config_name> --param "worker-nodes=<node-name-1>,<node-name-2>,<node-name-3>" --apply
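
    To verify the change, review your storage configuration details. This assumes that your Satellite plug-in version includes the sat storage config get command.

    ibmcloud sat storage config get --config <your_storage_config_name>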
    
  3. For Satellite clusters that use local volumes on the worker node, make sure that the disk-by-id for the volumes that you used for the osd-device-path and mon-device-path parameters exists on the worker nodes.

  4. Review the troubleshooting documentation for steps to solve common errors.