Kubeflow 1.1

www.kubeflow.org/docs/started/k8s/kfctl-k8s-istio/

 

Install Kustomize

kubectl.docs.kubernetes.io/installation/kustomize/binaries/

$ curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash

$ mv kustomize /usr/local/bin

$ kustomize version

{Version:kustomize/v3.8.8 GitCommit:72262c5e7135045ed51b01e417a7e72f558a22b0 BuildDate:2020-12-10T18:05:35Z GoOs:linux GoArch:amd64}

 

Install Dynamic Volume Provisioning

github.com/rancher/local-path-provisioner#deployment

$ kubectl apply -f https://raw.githubusercontent.com/rancher/local-path-provisioner/master/deploy/local-path-storage.yaml

$ kubectl get sc

NAME         PROVISIONER             AGE
local-path   rancher.io/local-path   81m

※ Mark the StorageClass as the default by adding the annotation below.

$ kubectl edit sc local-path 

  annotations:
    storageclass.kubernetes.io/is-default-class: "true"

$ kubectl get sc

NAME                   PROVISIONER             AGE
local-path (default)   rancher.io/local-path   81m
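
If editing the object interactively is inconvenient, the same annotation can be set with `kubectl patch`. The sketch below only builds and prints that command. The helper name `default_sc_patch_command` is hypothetical, and the printed command assumes `kubectl` is on the PATH and configured for this cluster.

```python
import json
import shlex


def default_sc_patch_command(name):
    """Build a `kubectl patch` command that marks StorageClass `name` as default."""
    patch = {
        "metadata": {
            "annotations": {
                "storageclass.kubernetes.io/is-default-class": "true"
            }
        }
    }
    return ["kubectl", "patch", "storageclass", name, "-p", json.dumps(patch)]


# Print the command for the local-path class created above.
# To apply it from Python instead, pass the list to subprocess.run(..., check=True).
print(shlex.join(default_sc_patch_command("local-path")))
```

Running the printed command should have the same effect as the `kubectl edit` step, and `kubectl get sc` can confirm it.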

 

Install Kubeflow

$ yum install wget

$ wget https://github.com/kubeflow/kfctl/releases/download/v1.2.0/kfctl_v1.2.0-0-gbc038f9_linux.tar.gz

$ tar -xvf kfctl_v1.2.0-0-gbc038f9_linux.tar.gz

$ mv kfctl /usr/local/bin

$ export PATH=$PATH:/usr/local/bin

$ export KF_NAME=MyKubeflow

$ export BASE_DIR=$HOME

$ export KF_DIR=${BASE_DIR}/${KF_NAME}

$ export CONFIG_URI="https://raw.githubusercontent.com/kubeflow/manifests/v1.1-branch/kfdef/kfctl_k8s_istio.v1.1.0.yaml"

 

$ mkdir -p ${KF_DIR}

$ cd ${KF_DIR}

$ kfctl apply -V -f ${CONFIG_URI}

 

※ If the installation does not complete properly, apply the troubleshooting steps below, then delete the deployment and reinstall.

$ kfctl delete -f kfctl_k8s_istio.v1.1.0.yaml

$ rm -rf ${KF_DIR}

$ mkdir -p ${KF_DIR}

$ cd ${KF_DIR}

$ kfctl apply -V -f ${CONFIG_URI}

 

※ If the istio-token error below occurs, add the following settings to the API server.

MountVolume.SetUp failed for volume "istio-token"

github.com/kubeflow/manifests/issues/959

 

$ vi /etc/kubernetes/manifests/kube-apiserver.yaml

    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-account-issuer=kubernetes.default.svc
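
Before and after editing, it can help to check which of the two flags the manifest already carries. The following is a minimal sketch that does a plain text scan (it does not parse YAML); the function name `missing_apiserver_flags` and the sample fragment are hypothetical.

```python
REQUIRED_FLAGS = (
    "--service-account-signing-key-file",
    "--service-account-issuer",
)


def missing_apiserver_flags(manifest_text):
    """Return the required flags that do not appear in the manifest text."""
    present = set()
    for line in manifest_text.splitlines():
        item = line.strip()
        # Flags appear as YAML list items such as "- --flag=value".
        if item.startswith("- "):
            item = item[2:].strip()
        present.add(item.split("=", 1)[0])
    return [flag for flag in REQUIRED_FLAGS if flag not in present]


# Hypothetical fragment of the command list, for illustration only.
sample = """\
    - kube-apiserver
    - --service-account-issuer=kubernetes.default.svc
"""
print(missing_apiserver_flags(sample))
```

On the control-plane node, the real file can be checked by passing the contents of /etc/kubernetes/manifests/kube-apiserver.yaml to the function. On a kubeadm cluster the kubelet recreates the API server pod when this manifest changes.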

※ If an "unable to get metrics for resource cpu" error occurs, install metrics-server.

github.com/kubernetes-sigs/metrics-server

$ wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

$ vi components.yaml

Add the following flag to the container args in the Deployment:
--kubelet-insecure-tls

$ kubectl apply -f components.yaml
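
The manual edit can also be scripted. The sketch below inserts the flag under the first `args:` list using simple text handling, assuming components.yaml contains a single container `args:` list. The function name `add_insecure_tls_flag` and the sample fragment are hypothetical, and the real file may be laid out differently.

```python
FLAG = "--kubelet-insecure-tls"


def add_insecure_tls_flag(yaml_text):
    """Insert FLAG as the first item under the first `args:` key, if absent."""
    if FLAG in yaml_text:
        return yaml_text  # already present; leave the text unchanged
    lines = yaml_text.splitlines()
    for i, line in enumerate(lines[:-1]):
        if line.strip() in ("args:", "- args:"):
            first_arg = lines[i + 1]
            # Reuse the indentation of the existing first argument.
            indent = first_arg[: len(first_arg) - len(first_arg.lstrip())]
            lines.insert(i + 1, f"{indent}- {FLAG}")
            return "\n".join(lines) + "\n"
    raise ValueError("no args: list found")


# Simplified, hypothetical fragment of a Deployment spec.
sample = """\
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
"""
print(add_insecure_tls_flag(sample))
```

Note that this flag turns off verification of the kubelet serving certificates, so it is better suited to a test cluster like this one than to production.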

 

 

When the installation completes successfully, the components start up as shown below.

$ kubectl get pod -n kubeflow

NAME                                                     READY   STATUS    RESTARTS   AGE
admission-webhook-bootstrap-stateful-set-0               1/1     Running   1          45m
admission-webhook-deployment-5bc5f97cfd-dvsmc            1/1     Running   0          33m
application-controller-stateful-set-0                    1/1     Running   0          47m
argo-ui-669bcd8bfc-2d4nz                                 1/1     Running   0          45m
cache-deployer-deployment-b75f5c5f6-8hjtf                2/2     Running   1          45m
cache-server-85bccd99bd-4hsgm                            2/2     Running   0          45m
centraldashboard-68965b5d89-8x6b8                        1/1     Running   0          45m
jupyter-web-app-deployment-5dfbb68956-j65jd              1/1     Running   0          45m
katib-controller-76b78f5db-f4pnm                         1/1     Running   0          45m
katib-db-manager-67c9554b6d-h5dr7                        1/1     Running   0          45m
katib-mysql-5754b5dd66-mtwzd                             1/1     Running   0          45m
katib-ui-844b4fc655-mgndk                                1/1     Running   0          44m
kfserving-controller-manager-0                           2/2     Running   0          45m
kubeflow-pipelines-profile-controller-65b65d97bb-nlz29   1/1     Running   0          45m
metacontroller-0                                         1/1     Running   0          45m
metadata-db-695fb6f55-qfttw                              1/1     Running   0          45m
metadata-deployment-7d77b884b6-j77d7                     1/1     Running   4          45m
metadata-envoy-deployment-c5985d64b-kkfjk                1/1     Running   0          45m
metadata-grpc-deployment-9fdb476-w9wvw                   1/1     Running   2          44m
metadata-ui-cf67fdb48-5rbfx                              1/1     Running   0          45m
metadata-writer-59d755696c-k75pb                         2/2     Running   0          45m
minio-6647564c5c-nng67                                   1/1     Running   0          45m
ml-pipeline-6bc56cd86d-ctvnp                             2/2     Running   7          45m
ml-pipeline-persistenceagent-6f99b56974-vnl9q            2/2     Running   0          45m
ml-pipeline-scheduledworkflow-d596b8bd-t768z             2/2     Running   0          45m
ml-pipeline-ui-8695cc6b46-5mj5l                          2/2     Running   0          44m
ml-pipeline-viewer-crd-5998ff7f56-fg6nc                  2/2     Running   1          44m
ml-pipeline-visualizationserver-cbbb5b5b-429j9           2/2     Running   0          44m
mpi-operator-c747f5bf6-tmf47                             1/1     Running   1          44m
mxnet-operator-7cd59d475-4htgf                           1/1     Running   1          44m
mysql-76597cf5b5-7tj8x                                   2/2     Running   0          44m
notebook-controller-deployment-756587d86-4p4xr           1/1     Running   0          44m
profiles-deployment-65fcc9c97-qt2bs                      2/2     Running   0          44m
pytorch-operator-5db58565b-76s8c                         1/1     Running   1          44m
seldon-controller-manager-6ddf664d54-dhgm7               1/1     Running   1          44m
spark-operatorsparkoperator-85bbf89886-gk6dn             1/1     Running   0          45m
spartakus-volunteer-7566cfd658-wv79t                     1/1     Running   0          44m
tf-job-operator-5bf84768bf-z2vjd                         1/1     Running   1          44m
workflow-controller-54dccb7dc4-rl4bm                     1/1     Running   0          44m
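
With this many pods, scanning the listing by eye is error-prone. As a rough sketch, the function below reads the text output of `kubectl get pod` and returns the pods that are not fully ready. The name `pods_not_ready` is hypothetical, and the second sample row is deliberately altered to show a failure.

```python
def pods_not_ready(listing):
    """Return names of pods whose READY count is incomplete or STATUS is not Running."""
    not_ready = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_now, ready_total = ready.split("/")
        if ready_now != ready_total or status != "Running":
            not_ready.append(name)
    return not_ready


# First row is taken from the listing above; the second row is altered for illustration.
sample = """\
NAME                           READY   STATUS             RESTARTS   AGE
minio-6647564c5c-nng67         1/1     Running            0          45m
ml-pipeline-6bc56cd86d-ctvnp   1/2     CrashLoopBackOff   7          45m
"""
print(pods_not_ready(sample))
```

An empty list for the real output means every pod in the namespace is Running with all of its containers ready.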

 

Accessing the Dashboard

 

$ kubectl get svc istio-ingressgateway -n istio-system

NAME                   TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                                                                                                      AGE
istio-ingressgateway   NodePort   10.111.109.51   <none>        15020:31630/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP,15029:31116/TCP,15030:31533/TCP,15031:31478/TCP,15032:31955/TCP,15443:30855/TCP   50m

Open the dashboard in a browser at a node IP and the NodePort mapped to port 80 (31380 above):

http://192.168.19.134:31380
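
The port in this URL comes from the `80:31380/TCP` entry in the PORT(S) column. A small sketch of that lookup follows; the function name `node_port_for` is hypothetical, and the port string is shortened from the output above.

```python
def node_port_for(ports_column, service_port):
    """Return the NodePort mapped to service_port in a PORT(S) column string."""
    for entry in ports_column.split(","):
        mapping = entry.split("/")[0]  # drop the "/TCP" suffix
        port, _, node_port = mapping.partition(":")
        if int(port) == service_port and node_port:
            return int(node_port)
    raise KeyError(service_port)


ports = "15020:31630/TCP,80:31380/TCP,443:31390/TCP,31400:31400/TCP"
node_ip = "192.168.19.134"  # IP of a cluster node in this environment
print(f"http://{node_ip}:{node_port_for(ports, 80)}")
```

The NodePort is assigned per cluster, so on another installation the number will likely differ from 31380.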

Creating a Notebook Server

Running Fashion MNIST

import tensorflow as tf


class MyFashionMnist(object):
    def train(self):
        # Load the Fashion MNIST dataset and scale pixel values to [0, 1].
        (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
        x_train, x_test = x_train / 255.0, x_test / 255.0

        # A small fully connected classifier: 28x28 image -> 10 class probabilities.
        model = tf.keras.models.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(10, activation='softmax')
        ])
        model.summary()
        model.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])

        # Train for 10 epochs, then report loss and accuracy on the test set.
        model.fit(x_train, y_train, epochs=10)
        model.evaluate(x_test, y_test, verbose=2)


if __name__ == '__main__':
    local_train = MyFashionMnist()
    local_train.train()
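
The final Dense layer outputs 10 softmax probabilities, one per Fashion MNIST class. The sketch below shows how one such output row maps to a class name. It works on a plain list so it runs without TensorFlow; the helper `label_for` and the example probabilities are hypothetical and not part of the script above.

```python
# Fashion MNIST class names, indexed by label 0-9.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def label_for(probabilities):
    """Return the class name with the highest predicted probability."""
    if len(probabilities) != len(CLASS_NAMES):
        raise ValueError("expected one probability per class")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return CLASS_NAMES[best]


# Made-up softmax output for a single image, for illustration only.
example = [0.01, 0.01, 0.02, 0.01, 0.05, 0.10, 0.02, 0.08, 0.05, 0.65]
print(label_for(example))
```

To use this with the trained network, `train()` would need to return or store the model so that a row of `model.predict(x_test)` could be passed in.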