Istio Traffic Management
https://istio.io/latest/docs/tasks/traffic-management/
This walks through the Traffic Management features that become available once the Istio service mesh is applied.
1. Request Routing
Apply the default destination rules
$ kubectl apply -f samples/bookinfo/networking/destination-rule-all.yaml
$ kubectl get destinationrules
NAME          HOST          AGE
details       details       17d
productpage   productpage   17d
ratings       ratings       17d
reviews       reviews       17d
Subsets keyed on the version label (v1, v2, ...) are defined for details, productpage, ratings, and reviews; a representative excerpt is shown below.
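The reviews rule from destination-rule-all.yaml (assuming the standard bookinfo sample; the other services define fewer subsets):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
  - name: v3
    labels:
      version: v3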
Apply the virtual services
$ kubectl apply -f samples/bookinfo/networking/virtual-service-all-v1.yaml
$ kubectl get virtualservices
NAME          GATEWAYS             HOSTS           AGE
bookinfo      [bookinfo-gateway]   [*]             24d
details                            [details]       17d
productpage                        [productpage]   17d
ratings                            [ratings]       17d
reviews                            [reviews]       17d
These virtual services pin details, productpage, ratings, and reviews to subset v1; the reviews entry is excerpted below.
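The reviews virtual service from virtual-service-all-v1.yaml, which routes all traffic to subset v1 (the other entries follow the same pattern):

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1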
Apply routing that varies by user
kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-test-v2.yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
The reviews virtual service is modified so that only the user jason is routed to v2.
Accessing productpage without logging in
: v1 is shown (no stars in the reviews section).
Accessing productpage after logging in as jason
: v2 is shown (stars appear in the reviews section).
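The header match can also be exercised from inside the mesh without the browser. A minimal sketch, assuming the samples/sleep/sleep.yaml test pod is deployed in the same namespace (reviews v2/v3 responses include a ratings field, v1 does not):

$ kubectl exec deploy/sleep -c sleep -- curl -s http://reviews:9080/reviews/0
$ kubectl exec deploy/sleep -c sleep -- curl -s -H "end-user: jason" http://reviews:9080/reviews/0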
2. Fault Injection
Scenario
The Reviews service and the Productpage service are developed by different teams.
The Reviews source code hardcodes a 10s timeout for the call to the Ratings service.
... (snip) ...
    private JsonObject getRatings(String productId, HttpHeaders requestHeaders) {
      ClientBuilder cb = ClientBuilder.newBuilder();
      // 10s timeout when star_color is black (reviews v2), otherwise 2.5s
      Integer timeout = star_color.equals("black") ? 10000 : 2500;
      cb.property("com.ibm.ws.jaxrs.client.connection.timeout", timeout);
      cb.property("com.ibm.ws.jaxrs.client.receive.timeout", timeout);
      Client client = cb.build();
      WebTarget ratingsTarget = client.target(ratings_service + "/" + productId);
      Invocation.Builder builder = ratingsTarget.request(MediaType.APPLICATION_JSON);
... (snip) ...
When Reviews calls Ratings and a 7s delay is injected, the expectation is that no timeout will occur, since the hardcoded timeout is 10s.
Injecting the fault
Fault injection lets us verify whether the service stays healthy when the Ratings service responds with a 7-second delay.
Configure a 7s delay on the Ratings service for the user jason only.
$ kubectl apply -f samples/bookinfo/networking/virtual-service-ratings-test-delay.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
...
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        fixedDelay: 7s
        percentage:
          value: 100
    match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
Check the page
Log in as jason and open the productpage to check the service.
http://192.168.19.133:31099/productpage
An error occurs!! The message "Sorry, product reviews are currently unavailable for this book." is displayed.
Cause
The root cause is on the Productpage side: the Productpage source code hardcodes a 3s timeout with one retry (3s x 2 attempts = 6s) for calls to the Reviews service, which is shorter than the injected 7s delay.
https://github.com/istio/istio/blob/master/samples/bookinfo/src/productpage/productpage.py
... (snip) ...
def getProductReviews(product_id, headers):
    # Do not remove. Bug introduced explicitly for illustration in fault injection task
    # TODO: Figure out how to achieve the same effect using Envoy retries/timeouts
    for _ in range(2):
        try:
            url = reviews['name'] + "/" + reviews['endpoint'] + "/" + str(product_id)
            res = requests.get(url, headers=headers, timeout=3.0)
        except BaseException:
            res = None
        if res and res.status_code == 200:
            return 200, res.json()
    status = res.status_code if res is not None and res.status_code else 500
    return status, {'error': 'Sorry, product reviews are currently unavailable for this book.'}
... (snip) ...
Conclusion
Fault injection confirmed that a 7s delay on the Ratings service does cause a timeout in productpage.
The 10s timeout in the Reviews source code should be reduced to 3s or less.
P.S) Request Timeout
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v2
    timeout: 0.5s
EOF
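On its own the 0.5s timeout is hard to observe; the Istio request-timeouts task pairs it with an artificial 2s delay on calls to ratings, so that reviews exceeds the timeout and productpage shows the reviews-unavailable message. A sketch following that task:

$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - fault:
      delay:
        percentage:
          value: 100
        fixedDelay: 2s
    route:
    - destination:
        host: ratings
        subset: v1
EOF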
3. Traffic Shifting
Scenario
To update the Reviews service from v1 to v3, first send 50% of the traffic to v1 and 50% to v3, then complete the update by sending 100% to v3.
50% v1, 50% v3
$ kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-50-v3.yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
...
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 50
    - destination:
        host: reviews
        subset: v3
      weight: 50
If you keep refreshing the page, reviews v1 (no stars) and reviews v3 (red stars) appear in roughly a 50:50 ratio.
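A rough scripted check of the split (a sketch, assuming GATEWAY_URL is set as in the bookinfo gateway task; v3 pages render star icons via glyphicon-star markup while v1 pages render none, so roughly half the counts should be non-zero):

$ for i in $(seq 1 20); do \
    curl -s "http://${GATEWAY_URL}/productpage" | grep -c "glyphicon-star"; \
  done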
100% v3
$ kubectl apply -f samples/bookinfo/networking/virtual-service-reviews-v3.yaml
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v3
P.S) TCP Traffic Shifting
$ kubectl apply -f samples/tcp-echo/tcp-echo-20-v2.yaml -n istio-io-tcp-traffic-shifting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
...
spec:
  ...
  tcp:
  - match:
    - port: 31400
    route:
    - destination:
        host: tcp-echo
        port:
          number: 9000
        subset: v1
      weight: 80
    - destination:
        host: tcp-echo
        port:
          number: 9000
        subset: v2
      weight: 20
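To see the 80:20 split, the Istio TCP traffic shifting task repeatedly sends a line through the ingress gateway; tcp-echo v1 prefixes the echoed text with "one" and v2 with "two", so about 20% of the replies should start with "two". A sketch, assuming the sleep test pod and the INGRESS_HOST variable from that task are already set up:

$ export TCP_INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="tcp")].nodePort}')
$ for i in $(seq 1 20); do \
    kubectl exec deploy/sleep -c sleep -n istio-io-tcp-traffic-shifting -- \
      sh -c "(date; sleep 1) | nc $INGRESS_HOST $TCP_INGRESS_PORT"; \
  done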
P.S) Mirroring
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin
  http:
  - route:
    - destination:
        host: httpbin
        subset: v1
      weight: 100
    mirror:
      host: httpbin
      subset: v2
    mirror_percent: 100
EOF
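Mirrored requests are fire-and-forget: only v1's response is returned to the caller, and v2's response is discarded. To confirm the mirroring, the Istio mirroring task sends a request from a sleep test pod and then compares the access logs of the two versions (the deployment names httpbin-v1 / httpbin-v2 follow that task):

$ kubectl exec deploy/sleep -c sleep -- curl -sS http://httpbin:8000/headers
$ kubectl logs deploy/httpbin-v1 -c httpbin
$ kubectl logs deploy/httpbin-v2 -c httpbin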
4. Circuit Breaking
Overview
Deploy the httpbin pod, apply a DestinationRule with outlierDetection, and then use the Fortio load tool to drive traffic, observing how many requests are served normally and how many are rejected with 503 by circuit breaking.
Deploy httpbin
$ kubectl apply -f samples/httpbin/httpbin.yaml
Configure the DestinationRule
: when there is more than 1 connection and more than 1 concurrent request, istio-proxy trips the circuit breaker.
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: httpbin
spec:
  host: httpbin
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1
      http:
        http1MaxPendingRequests: 1
        maxRequestsPerConnection: 1
    outlierDetection:
      consecutiveErrors: 1
      interval: 1s
      baseEjectionTime: 3m
      maxEjectionPercent: 100
EOF
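Verify that the rule was created:

$ kubectl get destinationrule httpbin -o yaml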
Deploy the Fortio load tool
$ kubectl apply -f samples/httpbin/sample-client/fortio-deploy.yaml
Verify that Fortio works
$ export FORTIO_POD=$(kubectl get pods -lapp=fortio -o 'jsonpath={.items[0].metadata.name}')
$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio curl -quiet http://httpbin:8000/get
Trigger circuit breaking
Send 20 requests over 2 concurrent connections.
$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 2 -qps 0 -n 20 -loglevel Warning http://httpbin:8000/get
20:33:46 I logger.go:97> Log level is now 3 Warning (was 2 Info)
Fortio 1.3.1 running at 0 queries per second, 6->6 procs, for 20 calls: http://httpbin:8000/get
Starting at max qps with 2 thread(s) [gomax 6] for exactly 20 calls (10 per thread + 0)
20:33:46 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
20:33:47 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
20:33:47 W http_client.go:679> Parsed non ok code 503 (HTTP/1.1 503)
Ended after 59.8524ms : 20 calls. qps=334.16
Aggregated Function Time : count 20 avg 0.0056869 +/- 0.003869 min 0.000499 max 0.0144329 sum 0.113738
# range, mid point, percentile, count
>= 0.000499 <= 0.001 , 0.0007495 , 10.00, 2
> 0.001 <= 0.002 , 0.0015 , 15.00, 1
> 0.003 <= 0.004 , 0.0035 , 45.00, 6
> 0.004 <= 0.005 , 0.0045 , 55.00, 2
> 0.005 <= 0.006 , 0.0055 , 60.00, 1
> 0.006 <= 0.007 , 0.0065 , 70.00, 2
> 0.007 <= 0.008 , 0.0075 , 80.00, 2
> 0.008 <= 0.009 , 0.0085 , 85.00, 1
> 0.011 <= 0.012 , 0.0115 , 90.00, 1
> 0.012 <= 0.014 , 0.013 , 95.00, 1
> 0.014 <= 0.0144329 , 0.0142165 , 100.00, 1
# target 50% 0.0045
# target 75% 0.0075
# target 90% 0.012
# target 99% 0.0143463
# target 99.9% 0.0144242
Sockets used: 4 (for perfect keepalive, would be 2)
Code 200 : 17 (85.0 %)
Code 503 : 3 (15.0 %)
Response Header Sizes : count 20 avg 195.65 +/- 82.19 min 0 max 231 sum 3913
Response Body/Total Sizes : count 20 avg 729.9 +/- 205.4 min 241 max 817 sum 14598
All done 20 calls (plus 0 warmup) 5.687 ms avg, 334.2 qps
: Code 200 (success) 85%, Code 503 (failure) 15%
Send 30 requests over 3 concurrent connections.
$ kubectl exec "$FORTIO_POD" -c fortio -- /usr/bin/fortio load -c 3 -qps 0 -n 30 -loglevel Warning http://httpbin:8000/get
Code 200 : 11 (36.7 %)
Code 503 : 19 (63.3 %)
Check istio-proxy stats
$ kubectl exec "$FORTIO_POD" -c istio-proxy -- pilot-agent request GET stats | grep httpbin | grep pending
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.default.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.circuit_breakers.high.rq_pending_open: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_active: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_failure_eject: 0
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_overflow: 21
cluster.outbound|8000||httpbin.default.svc.cluster.local.upstream_rq_pending_total: 29
: the upstream_rq_pending_overflow value of 21 means 21 requests were flagged by circuit breaking.
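Cleanup, following the Istio circuit-breaking task:

$ kubectl delete destinationrule httpbin
$ kubectl delete -f samples/httpbin/sample-client/fortio-deploy.yaml
$ kubectl delete -f samples/httpbin/httpbin.yaml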