Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding tests for Liveness probes #12497

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions pkg/testing/v1/service.go
Original file line number Diff line number Diff line change
Expand Up @@ -415,6 +415,14 @@ func WithReadinessProbe(p *corev1.Probe) ServiceOption {
}
}

// WithLivenessProbe sets the provided probe to be the liveness
// probe on the service.
func WithLivenessProbe(p *corev1.Probe) ServiceOption {
return func(s *v1.Service) {
s.Spec.Template.Spec.Containers[0].LivenessProbe = p
}
}

// MarkConfigurationNotReconciled calls the function of the same name on the Service's status.
func MarkConfigurationNotReconciled(service *v1.Service) {
service.Status.MarkConfigurationNotReconciled()
Expand Down
3 changes: 2 additions & 1 deletion test/conformance.go
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ const (
PizzaPlanet1 = "pizzaplanetv1"
PizzaPlanet2 = "pizzaplanetv2"
Protocols = "protocols"
Readiness = "readiness"
HealthProbes = "healthprobes"
Runtime = "runtime"
ServingContainer = "servingcontainer"
SidecarContainer = "sidecarcontainer"
Expand All @@ -59,6 +59,7 @@ const (
PizzaPlanetText1 = "What a spaceport!"
PizzaPlanetText2 = "Re-energize yourself with a slice of pepperoni!"
HelloWorldText = "Hello World! How about some tasty noodles?"
LivenessText = "Friendly ping "
HelloHTTP2Text = "Hello, New World! How about donuts and coffee?"
EmptyDirText = "From file in empty dir!"

Expand Down
227 changes: 227 additions & 0 deletions test/conformance/runtime/liveness_probe_test.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,227 @@
//go:build e2e
// +build e2e

/*
Copyright 2022 The Knative Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package runtime

import (
"context"
"testing"
"time"

corev1 "k8s.io/api/core/v1"
pkgtest "knative.dev/pkg/test"
"knative.dev/pkg/test/spoof"
revisionresourcenames "knative.dev/serving/pkg/reconciler/revision/resources/names"
v1opts "knative.dev/serving/pkg/testing/v1"
"knative.dev/serving/test"
"knative.dev/serving/test/conformance/api/shared"
v1test "knative.dev/serving/test/v1"
)

const livenessPath = "/healthz/liveness"

func TestLiveness(t *testing.T) {
t.Parallel()
if test.ServingFlags.DisableOptionalAPI {
t.Skip("Container.livenessProbe is not required by Knative Serving API Specification")
}
clients := test.Setup(t)

var testCases = []struct {
// name of the test case, which will be inserted in names of routes, configurations, etc.
// Use a short name here to avoid hitting the 63-character limit in names
// (e.g., "service-to-service-call-svc-cluster-local-uagkdshh-frkml-service" is too long.)
name string
handler corev1.Handler
sleep bool
}{{
name: "httpGet",
handler: corev1.Handler{
HTTPGet: &corev1.HTTPGetAction{
Path: livenessPath,
},
},
}, {
name: "httpGetAfterFirstProbe",
handler: corev1.Handler{
HTTPGet: &corev1.HTTPGetAction{
Path: livenessPath,
},
},
sleep: true,
}}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
t.Parallel()
names := test.ResourceNames{
Service: test.ObjectNameForTest(t),
Image: test.HealthProbes,
}

test.EnsureTearDown(t, clients, &names)

t.Log("Creating a new Service")
resources, err := v1test.CreateServiceReady(t, clients, &names,
v1opts.WithLivenessProbe(
&corev1.Probe{
Handler: tc.handler,
PeriodSeconds: 10,
FailureThreshold: 1,
}))
if err != nil {
t.Fatalf("Failed to create initial Service: %v: %v", names.Service, err)
}

// If true sleeping till the first kubelet probe check.
if tc.sleep {
time.Sleep(15 * time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is trying to surface the bug we hit in #12462 by making sure the pod stays up for at least 15 seconds even if a liveness probe is configured, but I wouldn't have guessed that without remembering about that issue. I wonder if there's a way to do this that (a) is a bit clearer about what it's testing -- i.e. that a pod with a liveness probe is accessible while that probe returns true (and the probe actually runs), and then is restarted if it fails (b) ideally avoids a hardcoded sleep.

What about if the test image counted how many times its liveness handler had been invoked and returned that number in the response? Then we could WaitForEndpointState until the liveness probe had been executed at least N times (avoiding the sleep). We could then - maybe as a follow up - have the test POST to an endpoint on the test image that would cause the liveness check to fail, and assert that the container is properly restarted (this is similar to how we test readiness probes).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense - will do these changes, thanks!

Copy link
Member Author

@Shashankft9 Shashankft9 Jan 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julz just for clarity, if the liveness probe fails and the kubelet does see it, even though it restarts the containers, the readiness probe fails, and so the liveness probe check will also hang forever because of that. I am not really sure why the readiness probe is failing though - checking on that. (this is without any user provided readiness probe)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be the case where I am just adding bad livenesProbe or readinessProbe and going into the probe pitfalls, but I have tried to capture more of it here: #12571

}
url := resources.Route.Status.URL.URL()
url.Path = livenessPath
if _, err = pkgtest.CheckEndpointState(
context.Background(),
clients.KubeClient,
t.Logf,
url,
spoof.MatchesAllOf(spoof.IsStatusOK, spoof.MatchesBody(test.LivenessText)),
"livenessIsReady",
test.ServingFlags.ResolvableDomain,
test.AddRootCAtoTransport(context.Background(), t.Logf, clients, test.ServingFlags.HTTPS),
); err != nil {
t.Fatalf("The endpoint for Route %s at %s didn't return success: %v", names.Route, url, err)
}
// Check if scaling down works even if access from liveness probe exists.
if err := shared.WaitForScaleToZero(t, revisionresourcenames.Deployment(resources.Revision), clients); err != nil {
t.Fatal("Could not scale to zero:", err)
}
})

}
}

// commented out because, for this to work, https://github.com/knative/pkg/issues/2407 needs to be
// fixed, with another issue of a ksvc pod never properly restarting after going into a liveness
// probe fail.

/*
func TestLivenessWithFail(t *testing.T) {
t.Parallel()
if test.ServingFlags.DisableOptionalAPI {
t.Skip("Container.livenessProbe is not required by Knative Serving API Specification")
}
clients := test.Setup(t)

var testCases = []struct {
// name of the test case, which will be inserted in names of routes, configurations, etc.
// Use a short name here to avoid hitting the 63-character limit in names
// (e.g., "service-to-service-call-svc-cluster-local-uagkdshh-frkml-service" is too long.)
name string
handler corev1.Handler
sleep bool
}{{
name: "httpGet",
handler: corev1.Handler{
HTTPGet: &corev1.HTTPGetAction{
Path: livenessPath,
},
},
}}

for _, tc := range testCases {
t.Run(tc.name, func(t *testing.T) {
t.Parallel()
names := test.ResourceNames{
Service: test.ObjectNameForTest(t),
Image: test.HealthProbes,
}

test.EnsureTearDown(t, clients, &names)

t.Log("Creating a new Service")
resources, err := v1test.CreateServiceReady(t, clients, &names,
v1opts.WithLivenessProbe(
&corev1.Probe{
Handler: tc.handler,
PeriodSeconds: 13,
FailureThreshold: 1,
}))
if err != nil {
t.Fatalf("Failed to create initial Service: %v: %v", names.Service, err)
}

url := resources.Route.Status.URL.URL()
url.Path = livenessPath
if _, err = pkgtest.WaitForEndpointState(
context.Background(),
clients.KubeClient,
t.Logf,
url,
spoof.MatchesAllOf(spoof.IsStatusOK, spoof.MatchesBody(test.LivenessText + "15")),
"livenessIsReady",
test.ServingFlags.ResolvableDomain,
test.AddRootCAtoTransport(context.Background(), t.Logf, clients, test.ServingFlags.HTTPS),
); err != nil {
t.Fatalf("The endpoint for Route %s at %s didn't return success: %v", names.Route, url, err)
}

t.Log("POST to /start-failing")
client, err := pkgtest.NewSpoofingClient(context.Background(),
clients.KubeClient,
t.Logf,
url.Hostname(),
test.ServingFlags.ResolvableDomain,
test.AddRootCAtoTransport(context.Background(), t.Logf, clients, test.ServingFlags.HTTPS))
if err != nil {
t.Fatalf("Failed to create spoofing client: %v", err)
}

url.Path = "/start-failing"
startFailing, err := http.NewRequest(http.MethodPost, url.String(), nil)
if err != nil {
t.Fatal(err)
}
if _, err := client.Do(startFailing); err != nil {
t.Fatalf("POST to /start-failing failed: %v", err)
}

url.Path = livenessPath
if _, err = pkgtest.WaitForEndpointState(
context.Background(),
clients.KubeClient,
t.Logf,
url,
spoof.MatchesAllOf(spoof.IsStatusOK, spoof.MatchesBody(test.LivenessText + "15")),
"livenessIsReady",
test.ServingFlags.ResolvableDomain,
test.AddRootCAtoTransport(context.Background(), t.Logf, clients, test.ServingFlags.HTTPS),
); err != nil {
t.Fatalf("The endpoint for Route %s at %s didn't return success: %v", names.Route, url, err)
}

// Check if scaling down works even if access from liveness probe exists.
if err := shared.WaitForScaleToZero(t, revisionresourcenames.Deployment(resources.Revision), clients); err != nil {
t.Fatal("Could not scale to zero:", err)
}
})

}
}

*/
30 changes: 17 additions & 13 deletions test/conformance/runtime/readiness_probe_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,12 +40,16 @@ import (
v1test "knative.dev/serving/test/v1"
)

// readinessPropagationTime is how long to poll to allow for readiness probe
// changes to propagate to ingresses/activator.
// This is based on the default scaleToZeroGracePeriod.
const readinessPropagationTime = 30 * time.Second
const (
// readinessPropagationTime is how long to poll to allow for readiness probe
// changes to propagate to ingresses/activator.
// This is based on the default scaleToZeroGracePeriod.
readinessPropagationTime = 30 * time.Second

func TestProbeRuntime(t *testing.T) {
readinessPath = "/healthz/readiness"
)

func TestReadiness(t *testing.T) {
t.Parallel()
if test.ServingFlags.DisableOptionalAPI {
t.Skip("Container.readinessProbe is not required by Knative Serving API Specification")
Expand All @@ -67,7 +71,7 @@ func TestProbeRuntime(t *testing.T) {
}},
handler: corev1.Handler{
HTTPGet: &corev1.HTTPGetAction{
Path: "/healthz",
Path: readinessPath,
},
},
}, {
Expand All @@ -87,7 +91,7 @@ func TestProbeRuntime(t *testing.T) {
}},
handler: corev1.Handler{
Exec: &corev1.ExecAction{
Command: []string{"/ko-app/readiness", "probe"},
Command: []string{"/ko-app/healthprobes", "probe"},
},
},
}}
Expand All @@ -106,7 +110,7 @@ func TestProbeRuntime(t *testing.T) {
t.Parallel()
names := test.ResourceNames{
Service: test.ObjectNameForTest(t),
Image: test.Readiness,
Image: test.HealthProbes,
}

test.EnsureTearDown(t, clients, &names)
Expand All @@ -125,7 +129,7 @@ func TestProbeRuntime(t *testing.T) {

// Once the service reports ready we should immediately be able to curl it.
url := resources.Route.Status.URL.URL()
url.Path = "/healthz"
url.Path = readinessPath
if _, err = pkgtest.CheckEndpointState(
context.Background(),
clients.KubeClient,
Expand Down Expand Up @@ -155,7 +159,7 @@ func TestProbeRuntime(t *testing.T) {
// The goal of this test is largely to describe the current behaviour, so that
// we can confidently change it.
// See https://github.com/knative/serving/issues/10765.
func TestProbeRuntimeAfterStartup(t *testing.T) {
func TestReadinessAfterStartup(t *testing.T) {
t.Parallel()
if test.ServingFlags.DisableOptionalAPI {
t.Skip("Container.readinessProbe behaviour after startup is not defined by Knative Serving API Specification")
Expand All @@ -166,7 +170,7 @@ func TestProbeRuntimeAfterStartup(t *testing.T) {
t.Run(fmt.Sprintf("periodSeconds=%d", period), func(t *testing.T) {
t.Parallel()
clients := test.Setup(t)
names := test.ResourceNames{Service: test.ObjectNameForTest(t), Image: test.Readiness}
names := test.ResourceNames{Service: test.ObjectNameForTest(t), Image: test.HealthProbes}
test.EnsureTearDown(t, clients, &names)

url, client := waitReadyThenStartFailing(t, clients, names, period)
Expand Down Expand Up @@ -216,7 +220,7 @@ func waitReadyThenStartFailing(t *testing.T, clients *test.Clients, names test.R
PeriodSeconds: probePeriod,
Handler: corev1.Handler{
HTTPGet: &corev1.HTTPGetAction{
Path: "/healthz",
Path: readinessPath,
},
},
}))
Expand All @@ -230,7 +234,7 @@ func waitReadyThenStartFailing(t *testing.T, clients *test.Clients, names test.R
// Ref: https://github.com/knative/serving/issues/11404
t.Log("Wait for initial readiness")
url := resources.Route.Status.URL.URL()
url.Path = "/healthz"
url.Path = readinessPath
if _, err = pkgtest.CheckEndpointState(
context.Background(),
clients.KubeClient,
Expand Down
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# Ready test image

This directory contains the test image used to simulate an image that responds
to various types of readiness probe.
to various types of readiness and liveness probes.

The image provides a /healthz endpoint which will reply with a 200 status code
and the Hello World text only after the delay requested by the STARTUP_DELAY
environment variable has elapsed.
The image provides a `/healthz/readiness` and a `/healthz/liveness` endpoint which
will reply with a 200 status code and the Hello World text only after the delay
requested by the STARTUP_DELAY environment variable has elapsed.

The image also contains an exec probe when run with "probe" as its only argument.

Expand Down
Loading