Istio Traffic Management

https://istio.io/latest/docs/tasks/traffic-management/

This post walks through the Traffic Management features that become available once the Istio service mesh is applied.

 

1. Request Routing

Apply the default destination rules

$ kubectl apply -f samples/bookinfo/networking/destination-rule-all.yaml
$ kubectl get destinationrules

details       details       17d
productpage   productpage   17d
ratings       ratings       17d
reviews       reviews       17d

The destination rules define version subsets such as v1 and v2 for details, productpage, ratings, and reviews.

 

Apply the virtual services

$ kubectl apply -f samples/bookinfo/networking/virtual-service-all-v1.yaml
$ kubectl get virtualservices

NAME          GATEWAYS             HOSTS           AGE
bookinfo      [bookinfo-gateway]   [*]             24d
details                            [details]       17d
productpage                        [productpage]   17d
ratings                            [ratings]       17d
reviews                            [reviews]       17d

The virtual services route details, productpage, ratings, and reviews to version v1.

 

Apply user-based routing

$ kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1

This modifies the reviews virtual service so that only requests from the user jason are routed to v2; all other requests go to v1.

 

Accessing productpage without logging in
: v1 is shown. (No star ratings appear in the reviews.)

Accessing productpage logged in as jason
: v2 is shown. (Star ratings appear in the reviews.)
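The routing behavior above can be sketched in a few lines of Python. This is a simplified model, not Istio's implementation: it assumes rules are evaluated top to bottom with the first match winning, and it only handles exact header matches. The names RULES and pick_subset are hypothetical.

```python
from typing import Optional

# Simplified copy of the two http rules in the reviews VirtualService.
# Each rule has an optional exact-header match and a destination subset.
RULES = [
    {"match": {"end-user": "jason"}, "subset": "v2"},
    {"match": None, "subset": "v1"},  # no match block: catches everything else
]


def pick_subset(headers: dict) -> Optional[str]:
    """Return the subset of the first rule whose header matches all hold."""
    for rule in RULES:
        match = rule["match"]
        if match is None or all(headers.get(k) == v for k, v in match.items()):
            return rule["subset"]
    return None  # no rule matched


print(pick_subset({"end-user": "jason"}))  # v2
print(pick_subset({}))                     # v1
```

Because the catch-all rule comes last, any user other than jason, logged in or not, falls through to v1.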

 

2. Fault Injection

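Scenario

The Reviews service and the Productpage service are developed by different teams.

The Reviews source code hard-codes the timeout for calls to the Ratings service: 10s when star_color is black (reviews v2), 2.5s otherwise.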

 

https://github.com/istio/istio/blob/master/samples/bookinfo/src/reviews/reviews-application/src/main/java/application/rest/LibertyRestEndpoint.java

... (omitted) ...

    private JsonObject getRatings(String productId, HttpHeaders requestHeaders) {
      ClientBuilder cb = ClientBuilder.newBuilder();
      Integer timeout = star_color.equals("black") ? 10000 : 2500;
      cb.property("com.ibm.ws.jaxrs.client.connection.timeout", timeout);
      cb.property("com.ibm.ws.jaxrs.client.receive.timeout", timeout);
      Client client = cb.build();
      WebTarget ratingsTarget = client.target(ratings_service + "/" + productId);
      Invocation.Builder builder = ratingsTarget.request(MediaType.APPLICATION_JSON);

... (omitted) ...

If a 7s delay is injected into the call from Reviews to Ratings, the expectation is that no timeout occurs, because 7s is shorter than the 10s timeout in Reviews.

 

Fault injection

Fault injection lets us check whether the application keeps working when the Ratings service responds 7 seconds late.

Configure a 7s delay on the Ratings service for the user jason only.

 

$ kubectl apply -f samples/bookinfo/networking/virtual-service-ratings-test-delay.yaml

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percentage:
          value: 100
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1

 

Check the page

Log in as jason and open the Productpage to check the service.

 

http://192.168.19.133:31099/productpage

An error occurs: the page shows "Sorry, product reviews are currently unavailable for this book."

 

Cause

The Productpage team hard-coded the call to the Reviews service with a 3s timeout and 2 attempts (3s * 2 = 6s), so Productpage gives up before the 7s delay has passed.

https://github.com/istio/istio/blob/master/samples/bookinfo/src/productpage/productpage.py

... (omitted) ...

def getProductReviews(product_id, headers):
    # Do not remove. Bug introduced explicitly for illustration in fault injection task
    # TODO: Figure out how to achieve the same effect using Envoy retries/timeouts
    for _ in range(2):
        try:
            url = reviews['name'] + "/" + reviews['endpoint'] + "/" + str(product_id)
            res = requests.get(url, headers=headers, timeout=3.0)
        except BaseException:
            res = None
        if res and res.status_code == 200:
            return 200, res.json()
    status = res.status_code if res is not None and res.status_code else 500
    return status, {'error': 'Sorry, product reviews are currently unavailable for this book.'}

... (omitted) ...

 

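Conclusion

Fault injection confirmed that a 7s delay in the Ratings service causes a timeout in Productpage, even though Reviews itself would have waited up to 10s.

It is advisable to reduce the 10s timeout in the Reviews source code to under 3s so that it fits within the Productpage timeout.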
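The mismatch between the two hard-coded timeouts can be checked with simple arithmetic. The sketch below is a simplified model rather than the Bookinfo code: it assumes the injected 7s delay is the only latency in the call chain, and all constant and function names are hypothetical.

```python
RATINGS_DELAY_S = 7.0        # fixedDelay injected for the user jason
REVIEWS_TIMEOUT_S = 10.0     # reviews v2 -> ratings (star_color == "black")
PRODUCTPAGE_TIMEOUT_S = 3.0  # productpage -> reviews, per attempt
PRODUCTPAGE_ATTEMPTS = 2     # for _ in range(2) in getProductReviews


def call_succeeds(upstream_latency_s: float, timeout_s: float) -> bool:
    """A call succeeds only if the upstream answers within the caller's timeout."""
    return upstream_latency_s < timeout_s


# Reviews waits up to 10s, so the 7s delay alone does not break it.
reviews_ok = call_succeeds(RATINGS_DELAY_S, REVIEWS_TIMEOUT_S)

# Productpage sees Reviews take at least 7s, but gives up after 3s per attempt.
productpage_ok = any(
    call_succeeds(RATINGS_DELAY_S, PRODUCTPAGE_TIMEOUT_S)
    for _ in range(PRODUCTPAGE_ATTEMPTS)
)
time_until_error_s = PRODUCTPAGE_TIMEOUT_S * PRODUCTPAGE_ATTEMPTS

print(reviews_ok, productpage_ok, time_until_error_s)  # True False 6.0
```

In this model the caller closest to the user has the shortest budget, which is why the error appears after about 6s even though the downstream timeout was never reached.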

 

P.S) Request Timeout

$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2
    timeout: 0.5s
EOF

 

3. Traffic Shifting

Scenario

To update the Reviews service from v1 to v3, first send 50% of the traffic to v1 and 50% to v3, then send 100% to v3 to complete the update.

 

50% v1, 50% v3

$ kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-50-v3.yaml

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
...
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 50
    - destination:
        host: reviews
        subset: v3
      weight: 50

[Screenshots: review v1, review v3]

Refreshing the page repeatedly shows review v1 (no stars) and review v3 (red stars) at roughly 50:50.
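The 50:50 split is a per-request weighted choice, so a small number of refreshes will not land on exactly half and half. The following sketch simulates that choice in Python; it is an illustration of weighted selection in general, not of how Envoy picks a destination, and ROUTES and simulate are hypothetical names.

```python
import random
from collections import Counter

# Subsets and weights copied from the reviews VirtualService.
ROUTES = [("v1", 50), ("v3", 50)]


def simulate(requests: int, seed: int = 0) -> Counter:
    """Pick a subset per request in proportion to the route weights."""
    rng = random.Random(seed)
    subsets = [name for name, _ in ROUTES]
    weights = [weight for _, weight in ROUTES]
    return Counter(rng.choices(subsets, weights=weights, k=requests))


counts = simulate(10_000)
for subset, n in sorted(counts.items()):
    print(f"{subset}: {n / 10_000:.1%}")
```

Changing the weights to 0 and 100 models the final step of the rollout, where every request goes to v3.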

 

100% v3

$ kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-v3.yaml

spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v3

P.S) TCP Traffic Shifting

$ kubectl apply -f samples/tcp-echo/tcp-echo-20-v2.yaml -n istio-io-tcp-traffic-shifting

apiVersion: networking.istio.io/v1beta1
kind: VirtualService
  ...
spec:
  ...
  tcp:
  - match:
    - port: 31400
    route:
    - destination:
        host: tcp-echo
        port:
          number: 9000
        subset: v1
      weight: 80
    - destination:
        host: tcp-echo
        port:
          number: 9000
        subset: v2
      weight: 20

P.S) Mirroring

$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
    - httpbin
  http:
  - route:
    - destination:
        host: httpbin
        subset: v1
      weight: 100
    mirror:
      host: httpbin
      subset: v2
    mirrorPercentage:
      value: 100.0
EOF

 

4. Circuit Breaking

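Overview

Deploy the httpbin pod and configure a DestinationRule with connectionPool limits and outlierDetection.

Then use the Fortio load-testing tool to compare the requests that are handled normally with the requests that are rejected with a 503 error when the circuit breaker trips.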

 

Deploy httpbin

$ kubectl apply -f samples/httpbin/httpbin.yaml

 

Configure the DestinationRule

: When there is more than one connection and more than one concurrent request, istio-proxy trips the circuit breaker.

$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
EOF

 

Deploy the Fortio load-testing tool

$ kubectl apply -f samples/httpbin/sample-client/fortio-deploy.yaml

 

Verify that Fortio works

$ export FORTIO_POD=$(kubectl get pods -lapp=fortio -o 'jsonpath={.items[0].metadata.name}')

$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio curl -quiet http://httpbin:8000/get

 

Trip the circuit breaker

Send 20 requests over 2 concurrent connections

$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 -loglevel Warning http://httpbin:8000/get

20:33:46 I logger.go:97> Log level is now 3 Warning (was 2 Info)
Fortio 1.3.1 running at 0 queries per second, 6->6 procs, for 20 calls: http://httpbin:8000/get
Starting at max qps with 2 thread(s) [gomax 6] for exactly 20 calls (10 per thread + 0)
20:33:46 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
20:33:47 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
20:33:47 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
Ended after 59.8524ms : 20 calls. qps=334.16
Aggregated Function Time : count 20 avg 0.0056869 +/- 0.003869 min 0.000499 max 0.0144329 sum 0.113738
# range, mid point, percentile, count
>= 0.000499 <= 0.001 , 0.0007495 , 10.00, 2
> 0.001 <= 0.002 , 0.0015 , 15.00, 1
> 0.003 <= 0.004 , 0.0035 , 45.00, 6
> 0.004 <= 0.005 , 0.0045 , 55.00, 2
> 0.005 <= 0.006 , 0.0055 , 60.00, 1
> 0.006 <= 0.007 , 0.0065 , 70.00, 2
> 0.007 <= 0.008 , 0.0075 , 80.00, 2
> 0.008 <= 0.009 , 0.0085 , 85.00, 1
> 0.011 <= 0.012 , 0.0115 , 90.00, 1
> 0.012 <= 0.014 , 0.013 , 95.00, 1
> 0.014 <= 0.0144329 , 0.0142165 , 100.00, 1
# target 50% 0.0045
# target 75% 0.0075
# target 90% 0.012
# target 99% 0.0143463
# target 99.9% 0.0144242
Sockets used: 4 (for perfect keepalive, would be 2)
Code 200 : 17 (85.0 %)
Code 503 : 3 (15.0 %)
Response Header Sizes : count 20 avg 195.65 +/- 82.19 min 0 max 231 sum 3913
Response Body/Total Sizes : count 20 avg 729.9 +/- 205.4 min 241 max 817 sum 14598
All done 20 calls (plus 0 warmup) 5.687 ms avg, 334.2 qps

: Code 200 (success) 85%, Code 503 (failure) 15%

 

Send 30 requests over 3 concurrent connections

$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 3 -qps 0 -n 30 -loglevel Warning http://httpbin:8000/get

Code 200 : 11 (36.7 %)
Code 503 : 19 (63.3 %)
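When comparing several load runs, it can help to pull the status-code counts out of the Fortio summary instead of reading them by eye. The sketch below assumes the summary lines look like the "Code 200 : 11 (36.7 %)" lines shown in this post; status_counts and failure_ratio are hypothetical helper names, not part of Fortio.

```python
import re

# Matches summary lines such as "Code 503 : 19 (63.3 %)".
CODE_LINE = re.compile(r"^Code (\d{3}) : (\d+) \(")


def status_counts(fortio_output: str) -> dict:
    """Map each HTTP status code in the Fortio summary to its request count."""
    counts = {}
    for line in fortio_output.splitlines():
        m = CODE_LINE.match(line.strip())
        if m:
            counts[int(m.group(1))] = int(m.group(2))
    return counts


def failure_ratio(counts: dict) -> float:
    """Share of requests that did not return 200."""
    total = sum(counts.values())
    failed = sum(n for code, n in counts.items() if code != 200)
    return failed / total if total else 0.0


sample = """Code 200 : 11 (36.7 %)
Code 503 : 19 (63.3 %)"""
counts = status_counts(sample)
print(counts, f"{failure_ratio(counts):.1%}")  # {200: 11, 503: 19} 63.3%
```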

Check the istio-proxy stats

$ kubectl exec "$FORTIO_POD" -c istio-proxy -- pilot-agent request GET stats | grep httpbin | grep pending

cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.high.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_failure_eject: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_overflow: 21
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_total: 29

: upstream_rq_pending_overflow is 21, meaning 21 calls were rejected by circuit breaking.
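The stats output is plain "name: value" text, so the overflow counter can also be extracted programmatically. This is a minimal sketch that assumes the line format shown in this post; parse_stats and PREFIX are hypothetical names, and the sample lines are copied from the output above.

```python
def parse_stats(text: str) -> dict:
    """Parse Envoy 'name: value' stat lines into a dict of integer counters."""
    stats = {}
    for line in text.splitlines():
        # Split on the last ": " because stat names contain '|' and '.'.
        name, sep, value = line.rpartition(": ")
        if sep and value.strip().isdigit():
            stats[name.strip()] = int(value)
    return stats


PREFIX = "cluster.outbound|8000||httpbin.default.svc.cluster.local."

sample = """\
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_overflow: 21
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_total: 29
"""

stats = parse_stats(sample)
print(stats[PREFIX + "upstream_rq_pending_overflow"])  # 21
print(stats[PREFIX + "upstream_rq_pending_total"])     # 29
```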

 

 

 

 

 

 

 
