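I'm using the .yaml file below to create a Katib Experiment in Kubeflow. However, I keep getting this error:
Failed to reconcile: cannot restore struct from: string
Is there a fix for this? Most Katib Experiment examples don't use volumes, but I'm trying to mount a volume that holds the data downloaded from my S3 bucket.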
apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  namespace: apple
  labels:
    controller-tools.k8s.io: "1.0"
  name: transformer-experiment
spec:
  objective:
    type: maximize
    goal: 0.8
    objectiveMetricName: Train-accuracy
    additionalMetricNames:
      - Train-loss
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  metricsCollectorSpec:
    collector:
      kind: StdOut
  parameters:
    - name: --lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
        max: "0.03"
    - name: --dropout_rate
      parameterType: double
      feasibleSpace:
        min: "0.005"
        max: "0.020"
    - name: --layer_count
      parameterType: int
      feasibleSpace:
        min: "2"
        max: "5"
    - name: --d_model_count
      parameterType: categorical
      feasibleSpace:
        list:
          - "64"
          - "128"
          - "256"
  trialTemplate:
    goTemplate:
      rawTemplate: |-
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: {{.Trial}}
          namespace: {{.NameSpace}}
        spec:
          template:
            spec:
              volumes:
                - name: train-data
                  emptyDir: {}
              containers:
                - name: data-download
                  image: amazon/aws-cli
                  command:
                    - "aws s3 sync s3://kubeflow/kubeflowdata.tar.gz /train-data"
                  volumeMounts:
                    - name: train-data
                      mountPath: /train-data
                - name: {{.Trial}}
                  image: <Our Image>
                  command:
                    - "cd /train-data"
                    - "ls"
                    - "python"
                    - "/opt/ml/src/main.py"
                    - "--train_batch=64"
                    - "--test_batch=64"
                    - "--num_workers=4"
                  volumeMounts:
                    - name: train-data
                      mountPath: /train-data
                    {{- with .HyperParameters}}
                    {{- range .}}
                    - "{{.Name}}={{.Value}}"
                    {{- end}}
                    {{- end}}
              restartPolicy: Never
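A possible fix, offered as a sketch rather than a confirmed answer: the message `cannot restore struct from: string` appears to come from the Kubernetes unstructured-object converter when it finds a plain string where it expects an object. In the template above, the hyperparameter lines are rendered after the training container's `volumeMounts`, so they are likely parsed as extra `volumeMounts` entries, and those entries must be objects. The fragment below replaces only `spec.trialTemplate`. It renders the hyperparameters as extra `command` arguments, runs the download as an `initContainer` so it finishes before training starts, and gives each argument its own list element, because Kubernetes executes `command` directly without a shell. Using `aws s3 cp` instead of `aws s3 sync` (which works on prefixes, not single objects), replacing `cd` with `workingDir`, and dropping the `ls` step are assumptions about the intent; AWS credentials are assumed to be available to the pod and are not shown.

```yaml
# Replaces spec.trialTemplate in the Experiment above; all other fields unchanged.
spec:
  trialTemplate:
    goTemplate:
      rawTemplate: |-
        apiVersion: batch/v1
        kind: Job
        metadata:
          name: {{.Trial}}
          namespace: {{.NameSpace}}
        spec:
          template:
            spec:
              volumes:
                - name: train-data
                  emptyDir: {}
              # Init containers run to completion before the main containers start,
              # so the archive is on the shared volume when training begins.
              initContainers:
                - name: data-download
                  image: amazon/aws-cli
                  # No shell is involved: one argument per list element.
                  command:
                    - "aws"
                    - "s3"
                    - "cp"
                    - "s3://kubeflow/kubeflowdata.tar.gz"
                    - "/train-data/"
                  volumeMounts:
                    - name: train-data
                      mountPath: /train-data
              containers:
                - name: {{.Trial}}
                  image: <Our Image>
                  # Replaces "cd /train-data"; cd is a shell builtin, not an executable.
                  workingDir: /train-data
                  command:
                    - "python"
                    - "/opt/ml/src/main.py"
                    - "--train_batch=64"
                    - "--test_batch=64"
                    - "--num_workers=4"
                    # Hyperparameters become extra arguments, e.g. "--lr=0.02".
                    {{- with .HyperParameters}}
                    {{- range .}}
                    - "{{.Name}}={{.Value}}"
                    {{- end}}
                    {{- end}}
                  volumeMounts:
                    - name: train-data
                      mountPath: /train-data
              restartPolicy: Never
```

If `main.py` expects the extracted data rather than the `.tar.gz`, an extraction step would still be needed; the original template does not show one either.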
Best answer
As answered here, the following worked for me:
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: training-container
          image: docker.io/romeokienzler/claimed-train-mobilenet_v2:0.4
          command:
            - "ipython"
            - "/train-mobilenet_v2.ipynb"
            - "optimizer=${trialParameters.optimizer}"
          volumeMounts:
            - mountPath: /data/
              name: data-volume
      restartPolicy: Never
      volumes:
        - name: data-volume
          persistentVolumeClaim:
            claimName: data-pvc
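For context, the Job in this answer uses the `kubeflow.org/v1beta1` format: the trial Job goes under `trialTemplate.trialSpec`, and hyperparameters are substituted through `${trialParameters.<name>}` instead of a Go template. Below is a sketch of how it might sit in a complete Experiment, together with the `data-pvc` claim it mounts. The Experiment name, the `optimizer` search space, the metric name, the PVC size and the `ReadWriteMany` access mode are all assumptions, not part of the answer.

```yaml
# Hypothetical PVC; the name matches claimName in the answer's Job.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
  namespace: apple                  # assumption: same namespace as the Experiment
spec:
  accessModes:
    - ReadWriteMany                 # parallel trials may run on different nodes
  resources:
    requests:
      storage: 10Gi                 # placeholder size
---
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  namespace: apple
  name: mobilenet-experiment        # hypothetical name
spec:
  objective:
    type: maximize
    goal: 0.8
    objectiveMetricName: accuracy   # hypothetical; must match a name=value line the trial prints
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  parameters:
    - name: optimizer
      parameterType: categorical
      feasibleSpace:
        list:
          - sgd
          - adam
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: optimizer             # referenced as ${trialParameters.optimizer}
        description: Optimizer passed to the notebook
        reference: optimizer        # name of the entry in spec.parameters
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          spec:
            containers:
              - name: training-container
                image: docker.io/romeokienzler/claimed-train-mobilenet_v2:0.4
                command:
                  - "ipython"
                  - "/train-mobilenet_v2.ipynb"
                  - "optimizer=${trialParameters.optimizer}"
                volumeMounts:
                  - mountPath: /data/
                    name: data-volume
            restartPolicy: Never
            volumes:
              - name: data-volume
                persistentVolumeClaim:
                  claimName: data-pvc
```

`ReadWriteMany` requires a storage class that supports it; with a single node, `ReadWriteOnce` may be enough.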
https://stackoverflow.com/questions/63844416/