jaeger-operator's Issues

Install Spark Job

Install the Spark job for dependency processing when Elasticsearch or Cassandra is used.

Could agent sidecar injection specify service name

Wondering whether, when an agent sidecar is injected, it could also specify the JAEGER_SERVICE_NAME environment variable on the service?

This would avoid the user having to add it manually, and also means that we could define the convention - e.g. if we wanted to support multitenancy.

As a side issue - should the injection be dependent upon an annotation? Similar to Istio sidecar injection which can be enabled on the namespace (so all deployments) or individually? Just so that users have some control, and not everything automatically gets a jaeger agent sidecar.

How to add volumes and volumeMounts

I want the jaeger-operator to be able to apply this YAML file:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: istio-jaeger
spec:
  strategy: production
  collector:
    image: jaegertracing/jaeger-collector:latest
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/config/tls
      name: certs
      readOnly: true
  query:
    image: jaegertracing/jaeger-query:latest
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/config/tls
      name: certs
      readOnly: true
  volumes:
  - name: certs
    secret:
      defaultMode: 420
      secretName: es-certs
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
        tls: true
        tls.cert: /usr/share/elasticsearch/config/tls/elasticsearch-router.crt
        tls.key: /usr/share/elasticsearch/config/tls/elasticsearch-router.key
        tls.ca: /usr/share/elasticsearch/config/tls/ca.crt

I checked the source code and cannot find any struct defined for volumes and volumeMounts.
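
For illustration, a minimal sketch of what such fields could look like in the spec types, using the upstream corev1 types (the struct and field names here are hypothetical, not the operator's current API):

package v1alpha1

import v1 "k8s.io/api/core/v1"

// JaegerCollectorSpec is a trimmed-down example of a per-component spec
// extended with container-level volume mounts.
type JaegerCollectorSpec struct {
    Image        string           `json:"image"`
    VolumeMounts []v1.VolumeMount `json:"volumeMounts,omitempty"`
}

// JaegerSpec shows where the pod-level volumes could live, so that both the
// collector and query pods can reference the same "certs" secret volume.
type JaegerSpec struct {
    Strategy  string              `json:"strategy"`
    Collector JaegerCollectorSpec `json:"collector"`
    Volumes   []v1.Volume         `json:"volumes,omitempty"`
}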

"Connection refused" on zipkin port

Deployed the all-in-one using the jaeger-kubernetes template and was able to post an example zipkin trace to the zipkin port.

However when trying the same with the operator I got:

$ curl -H "Content-Type: application/json" --request POST --data @zipkin.txt http://localhost:9411/api/v1/spans
curl: (52) Empty reply from server

with the following in the kubectl port-forward:

$ kubectl port-forward $(kubectl get pod -l app=jaeger -o jsonpath='{.items[0].metadata.name}') 9411:9411
Forwarding from 127.0.0.1:9411 -> 9411
Forwarding from [::1]:9411 -> 9411
Handling connection for 9411
E1015 16:48:14.872520   12526 portforward.go:331] an error occurred forwarding 9411 -> 9411: error forwarding port 9411 to pod 8ffdd850d8a13b5147a4c2fe14e1c0db344ab59e3b432077ee46992973e889b7, uid : exit status 1: 2018/10/15 15:48:14 socat[8571] E connect(5, AF=2 127.0.0.1:9411, 16): Connection refused

Example zipkin trace can be found in zipkin.txt

Incorporate rbac, crd and operator yaml into one file

Not sure if it is convention, but wondering if there is a real benefit to having the rbac, crd and operator yaml content in separate files.

It means users have to perform three operations instead of one. The same operations are performed on both Kubernetes and OpenShift - so it may be better to have a single jaeger-operator.yaml file with all the content?

Agent sidecar injection: enable propagation env var to be set

If automatic agent sidecar injection is used within an Istio environment, then it would be useful to be able to set the JAEGER_PROPAGATION environment variable to B3.

To make this more generic, the agent configuration should simply allow the propagation value to be set.

This requirement is similar to #29, which sets the service name env var when auto-injecting the agent sidecar.
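
For illustration, a minimal sketch of how the injector could append the env var to the target deployment's containers (package and helper names are hypothetical):

package inject

import (
    appsv1 "k8s.io/api/apps/v1"
    v1 "k8s.io/api/core/v1"
)

// addPropagationEnv appends JAEGER_PROPAGATION (e.g. "b3") to every container
// of the target deployment, so the instrumented application picks it up, in
// the same way JAEGER_SERVICE_NAME is handled in #29.
func addPropagationEnv(dep *appsv1.Deployment, propagation string) {
    if propagation == "" {
        return
    }
    for i := range dep.Spec.Template.Spec.Containers {
        c := &dep.Spec.Template.Spec.Containers[i]
        c.Env = append(c.Env, v1.EnvVar{Name: "JAEGER_PROPAGATION", Value: propagation})
    }
}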

Support update of the underlying Jaeger version

We should add support for automatic updates of the Jaeger instances that are managed by the operator. Idea:

  • Add a new property to the model: "update: [none|patch|minor|major]"
  • Label the resulting "jaeger" object with the property above
  • From time to time, check for new versions of the Jaeger image on Docker Hub
  • On new versions, check which Jaeger instances need to be updated and do a blue/green deployment. For this to work properly, the following has to happen, in order:
  1. Migrate the data. If this step fails, skip everything else
  2. Add new collectors. If this step fails, skip everything else
  3. Create a dummy pod with the agent, so that image gets pulled and we can check whether the container starts fine. Delete the pod once we confirm it's working. If this step fails, remove the new collectors and skip everything else
  4. For each application with the old sidecar, update the deployment to use a new version. This will cause a new pod to start, replacing the old ones
  5. Add new query/UI. If this fails, don't rollback anything. Just do not continue with the next step
  6. Remove the old collectors/queries/UIs

We need also to find a way to alert the admins in case of failures. Is it enough for us to just generate metrics, hoping an admin will create an alert in case a specific gauge/counter goes off? There's an "admin" page being considered for Jaeger. How can we provide data to that?

Things we need to consider:

  1. During the data update, should we stop the collectors from writing to it, or can we trust the update will be properly managed by the create-schema job?

Run 'make generate' as part of the build

Currently, we require 'make generate' to be executed during the development phase and we don't check whether it's needed during CI. This has caused a couple of PRs to be merged with model changes but without the generated deep-copy functions being updated.

Ideally, we would run make generate and fail the build if it reports changes, similar to what we do with make fmt.

Move hard coded strings to constants

There are quite a few hard coded strings in the code, like jaeger-agent as the container name for the Jaeger Agent sidecar container. They should all move to constants.

Where appropriate, types should be used to represent constant values.
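
For example, something along these lines (the names and values shown are illustrative):

package deployment

// Container names used across the operator, kept in one place so the
// injector, the controllers and the tests all agree on them.
const (
    AgentContainerName     = "jaeger-agent"
    CollectorContainerName = "jaeger-collector"
    QueryContainerName     = "jaeger-query"
)

// StorageType is a typed representation of the supported storage backends,
// instead of comparing free-form strings everywhere.
type StorageType string

const (
    StorageMemory        StorageType = "memory"
    StorageCassandra     StorageType = "cassandra"
    StorageElasticsearch StorageType = "elasticsearch"
)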

Run the Cassandra `create-schema` batch job

Currently, the Operator has no special knowledge about specific backing storages, but perhaps it should. Jaeger on Cassandra requires the schema to be created. This is done by running the image jaegertracing/jaeger-cassandra-schema. In Kubernetes, this is executed as a batch job:

https://github.com/jaegertracing/jaeger-kubernetes/blob/59e7afcae7b2f3bc109695cfda2a5c116cea4391/production/cassandra.yml#L108-L129

The idea here would be to do something similar: install the batch job and wait for it to finish before proceeding with the creation of the other objects.

The first implementation could be something very simple: the controller interface could have a Requirements() []batch.Job, batch being "k8s.io/api/batch/v1". The stub would run all the jobs and wait until they have completed.

This would also allow us to be able to run migration scripts and install pre-requirements, as long as this is wrapped in a "batch" container.
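
A rough sketch of that idea, mirroring the jaeger-kubernetes batch job (the interface shape and the names are only illustrative):

package controller

import (
    batch "k8s.io/api/batch/v1"
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Controller is the strategy-specific controller; the stub would run every
// job returned by Requirements and wait for completion before creating the
// remaining objects.
type Controller interface {
    Requirements() []batch.Job
}

// cassandraSchemaJob builds the create-schema job for a given Jaeger instance.
func cassandraSchemaJob(instance string) batch.Job {
    return batch.Job{
        ObjectMeta: metav1.ObjectMeta{Name: instance + "-cassandra-schema"},
        Spec: batch.JobSpec{
            Template: v1.PodTemplateSpec{
                Spec: v1.PodSpec{
                    RestartPolicy: v1.RestartPolicyOnFailure,
                    Containers: []v1.Container{{
                        Name:  "cassandra-schema",
                        Image: "jaegertracing/jaeger-cassandra-schema",
                    }},
                },
            },
        },
    }
}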

Inject sidecars from different namespaces

It's quite possible that Jaeger instances will live in one namespace and applications in another. We need a way then to inject sidecars into deployments from one namespace pointing to Jaeger instances in another namespace.

Tasks:

  • Figure out whether the JaegerSpec should have a list of namespaces the operator should watch
  • The RBAC implications: what are the failure scenarios and how to recover from that
  • On OpenShift, one namespace cannot see things from another namespace by default. What should we do here?

Support custom annotations/labels

We should provide a way for admins to label the resources that are being created. Something like:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: simple-prod
  labels:
    tenant: acme
spec:
  strategy: all-in-one
  all-in-one:
    annotations:
      prometheus.io/scrape: "false"

Note that some labels/annotations might clash with the ones we supply in the operator itself, like the prometheus ones. Because of that, we should always honor what the user specified in the CR.
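
A minimal sketch of that merge rule (the helper name is illustrative): operator defaults are applied first, and anything from the CR wins on conflict.

package deployment

// mergeAnnotations combines operator-supplied defaults (e.g. the prometheus
// scrape annotations) with the annotations from the CR; the user's values
// take precedence on conflicting keys.
func mergeAnnotations(defaults, fromCR map[string]string) map[string]string {
    merged := map[string]string{}
    for k, v := range defaults {
        merged[k] = v
    }
    for k, v := range fromCR {
        merged[k] = v // always honor what the user specified
    }
    return merged
}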

Change from free form options to a typed model

As discussed in #71 (comment), the current CRD uses a free format options field to allow the user to pass parameters through to the executables.

Although this means that the user could take advantage of new options without requiring a new release of the operator, it also means that incorrect options can be specified and passed to the executables as parameters - resulting in the executable throwing an error.

Having a properly typed model in the CRD means that:

  • we can ensure the user has not inadvertently specified an option that will cause the executable to fail to start
  • options that only support specific values can be validated and an informative message presented to the user
  • the structure of the options can be organised in a more meaningful way - e.g. the Cassandra create-schema settings grouped with the other Cassandra-related options.

Add liveness probes

Add liveness probes and enable the user to optionally specify the initial delay and period durations.
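
A sketch of what a configurable probe could look like, assuming the corev1 types vendored at the time; the defaults and the health endpoint path are only placeholders:

package deployment

import (
    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

// livenessProbe builds an HTTP liveness probe with user-supplied initial
// delay and period (seconds), falling back to modest defaults when unset.
func livenessProbe(path string, port int, initialDelay, period int32) *v1.Probe {
    if initialDelay == 0 {
        initialDelay = 5
    }
    if period == 0 {
        period = 15
    }
    return &v1.Probe{
        Handler: v1.Handler{
            HTTPGet: &v1.HTTPGetAction{Path: path, Port: intstr.FromInt(port)},
        },
        InitialDelaySeconds: initialDelay,
        PeriodSeconds:       period,
    }
}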

Handling UI configuration in a json file

Currently the CR enables options (name/value pairs) to be specified, which are converted into arguments supplied to the executable.

However UI configuration requires the specification of a separate file.

Wondering whether all options should be defined in a configmap per jaeger instance?

Update the Operator SDK

An upcoming release of the Operator SDK will break compatibility with the current master, which is what we currently reference in Gopkg.toml.

This issue is to track:

  • Pin the Operator SDK version in the dependency management file
  • Bump our operator SDK version once their refactoring is done + fix the parts where it breaks us.

https://github.com/kubernetes-sigs/controller-runtime

Inject sidecar properly in annotated StatefulSets

I am testing out Jaeger as a distributed tracing solution; the Jaeger Operator sounded like the easiest way to do it, and here I am (I also checked the Helm chart, but this looks like the preferred way currently).

The operator worked great, but I can't use the very handy annotation below:

annotations:
  inject-jaeger-agent: "true"

Our system is an actor-based system with sharded local state, and we use StatefulSets to keep local state and instances together.

Of course I can add the sidecar in my YAML, and I did that, but this annotation is a lot easier to maintain.

That is it!

Add support for image pull secrets

Enable the CR to specify a list of image pull secrets that could be used with any deployments, or possibly a serviceaccount if we define one.
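
For illustration, a small sketch assuming the CR carries a plain list of secret names (how they are exposed in the spec is up for discussion):

package deployment

import v1 "k8s.io/api/core/v1"

// applyImagePullSecrets adds the named secrets to a pod spec, so every
// deployment created by the operator can pull images from private registries.
func applyImagePullSecrets(spec *v1.PodSpec, secretNames []string) {
    for _, name := range secretNames {
        spec.ImagePullSecrets = append(spec.ImagePullSecrets,
            v1.LocalObjectReference{Name: name})
    }
}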

Add prometheus scrape annotations to agent

Following on from #27, we should add the prometheus scrape/port annotations to the agent.

The issue with the agent as a sidecar is ensuring that we don't overwrite similar annotations provided by the service.

Update the version at build time

I think that the version should be injected into the binary at build time. This way you don't risk ending up with two binaries containing different code but reporting the same version. Unless I'm missing something.
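
For reference, a minimal sketch of the usual Go pattern for this (the package location and the flag value are illustrative):

package version

// Version is overwritten at build time with
//   -ldflags "-X <module path>/version.Version=<git tag or sha>"
// so the binary always reports the exact code it was built from. The default
// makes locally-built, untagged binaries easy to spot.
var Version = "unknown"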

Put more thought on passing refs vs. value

In the early days of the operator, it was easier to just pass references around. Unfortunately, this is not always appropriate and gives too much power to the function being called, as it can change the object in ways the caller isn't expecting.

A first thing to assess is whether we can change the places where we use *v1alpha1.Jaeger to use values instead. It would mean that some functions have their signatures changed, like controller.normalize(*v1alpha1.Jaeger), so that it returns a normalized copy instead of changing the object in place.
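
A small self-contained sketch of the value-based signature; the struct below is a stand-in for v1alpha1.Jaeger with only the relevant fields:

package controller

// jaeger is a simplified stand-in for v1alpha1.Jaeger.
type jaeger struct {
    Strategy string
    Storage  string
}

// normalize receives a copy and returns the normalized copy, so the caller's
// object is never mutated behind its back.
func normalize(j jaeger) jaeger {
    if j.Strategy == "" {
        j.Strategy = "all-in-one"
    }
    if j.Storage == "" {
        j.Storage = "memory" // matches the operator's documented fallback
    }
    return j
}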

Configuring a Kafka based Jaeger architecture

Currently Kafka support is being added to Jaeger in two places, as a storage plugin and an ingester.

The aim of this approach is to have a collector configured with Kafka as storage, to publish spans to Kafka, and then ingesters that can consume those messages and store the spans in a real storage backend (e.g. elasticsearch/cassandra).

We need to consider how such a configuration would be defined in the operator's CR.

Currently Kafka is listed as a storage type - but an operator CR can only support a single storage type - so either:

  1. We need to treat this Kafka-based configuration as something else - i.e. the storage type is specified as the real storage used by the ingester, while the collector (using Kafka) and the ingester are configured from a different spec?

  2. There would be two separate CRs - one defining the collector with Kafka storage, and the other defining the ingester with real storage. The issue with this approach is that only a subset of the components may need to be configured in each CR - so query would only be defined in the second CR (as it will also use the same real storage), and the agent may potentially be defined in the first, as it will use the collector.

Although Kafka is not yet fully supported, we need to consider how its introduction may impact the spec structure.

Update Docker image when a PR is merged

Whenever a PR is merged, it would be good to generate a new Docker image and push it to Docker Hub under the latest version tag (e.g. 1.6.CI_BUILD_NUM, as well as 1.6).

Reconsider the strategy setting

I like that all-in-one is the default value for the strategy setting. However, for new starters with Jaeger it might not be immediately obvious what the difference between all-in-one and production is. I would suggest a different point of view here. Instead of changing the functionality based on strategy, change it based on storage. Consider renaming the strategy option to storage and allowing the following values: memory, elasticsearch, cassandra, etc.

The default storage option might be memory. The operator would choose the all-in-one image for that storage type. Advanced users would still have the option to override the image, but in most cases you can preselect the default Jaeger Docker images - only asking users to select the version (tag) and image pull policy.

The other benefit of this approach is that if you change functionality based on storage type, you can also add functionality for managing retention. For example, you can delete Elasticsearch indices when they fall out of retention. Alternatively, you can have per-service/operation retention, and the operator can delete the traces for some busy services from Cassandra after a week but keep others for a month.

I am not saying that this is the best approach; I just wanted to give you a different point of view.

How to: create Jaeger instance in different namespace to operator

One of the benefits of the operator is supposed to be that it can be installed once, and used to create multiple Jaeger instances in different namespaces.

If this is possible, then we should document (possibly just in the readme) how such a configuration would be achieved.

Zipkin port

The Zipkin port is opened only when the env var COLLECTOR_ZIPKIN_HTTP_PORT is set, so both the collector and the all-in-one need to set it.
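
For illustration, a sketch of how the deployments could enable it (the function name is hypothetical):

package deployment

import v1 "k8s.io/api/core/v1"

// withZipkinPort enables the Zipkin-compatible endpoint on a collector or
// all-in-one container and exposes the corresponding container port.
func withZipkinPort(c v1.Container) v1.Container {
    c.Env = append(c.Env, v1.EnvVar{Name: "COLLECTOR_ZIPKIN_HTTP_PORT", Value: "9411"})
    c.Ports = append(c.Ports, v1.ContainerPort{Name: "zipkin", ContainerPort: 9411})
    return c
}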

Switch to disable the ingress resource

Is it currently possible to disable the creation of the ingress resource?

If not, I think it would be a good feature to update the CRD spec with an ingress configuration section.
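
A sketch of what such a section could look like in the spec (field names are illustrative); a *bool keeps "not set" distinguishable from "disabled", so existing CRs keep their ingress:

package v1alpha1

// JaegerIngressSpec is a hypothetical ingress section for the CRD spec.
type JaegerIngressSpec struct {
    Enabled *bool `json:"enabled,omitempty"`
}

// IngressEnabled treats an unset value as enabled, preserving today's default.
func (s JaegerIngressSpec) IngressEnabled() bool {
    return s.Enabled == nil || *s.Enabled
}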

Document how to install ES/Cassandra

For production setups, in-memory storage isn't suitable, and documentation is needed on how to get ES/Cassandra running and how to configure the setup to use it.

Add a helm chart

So far, awesome job @jpkrohling! However I think that the operator needs a helm chart.
In the README.md you show how to set this up using the kubectl apply ... command. This is helpful, but in a production environment chances are that we wouldn't install things manually using kubectl. This is why I believe there needs to be a helm chart for installing the operator. WDYT?

Avoid duplicate messages

As the operator runs in a loop, we should avoid logging the same message twice. Currently, we can see this:

INFO[0695] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0700] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0705] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0710] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0715] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0720] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0725] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0730] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0735] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 

Error "The resourceVersion for the provided watch is too old"

When the operator is running for an extended period of time, the following message is shown:

ERROR: logging before flag.Parse: W1109 09:42:04.207951   16079 reflector.go:341] github.com/jaegertracing/jaeger-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:91: watch of *unstructured.Unstructured ended with: unexpected object: &{map[code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired]}

This was seen on OpenShift but might happen with plain Kubernetes as well:

$ minishift status
Minishift:  Running
Profile:    istio-tutorial
OpenShift:  Running (openshift v3.10.0+349c70c-73)
DiskUsage:  23% of 19G (Mounted On: /mnt/sda1)
CacheUsage: 2.363 GB (used by oc binary, ISO or cached images)
$ oc version
oc v3.11.0-rc.0+0cbc58b
kubernetes v1.10.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.42.155:8443
openshift v3.10.0+349c70c-73
kubernetes v1.10.0+b81c8f8

For those who might be experiencing this as well: please leave a comment with the output from minikube version + kubectl version (or minishift version + oc version).

How to set env for Jaeger

I want to add a base path for jaeger-query (jaegertracing/jaeger-ui#258), so the YAML file should be:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: istio-tracing
spec:
  strategy: production
  env:
    - name: QUERY_BASE_PATH
      value: /jaeger
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200

Right now, the jaeger-operator does not support setting env in the YAML file.

Enable operator user to specify sampling strategies

As described here, it is possible to specify adaptive sampling strategy details in a json file.

As changes to the sampling strategies may be automatically reloaded, it may be appropriate to create this mounted file by default (albeit empty) - so an update to just this part of the CR does not necessarily need to cause a restart.

We should consider in general what the update approach should be when changes are applied to the CR. Maybe a general rolling update approach would be safer.
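
One possible shape for this, sketched as a per-instance ConfigMap that is created even when no strategies were provided (the names are illustrative):

package config

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// samplingConfigMap holds the sampling strategies JSON for one Jaeger
// instance; it is created empty by default so that later edits to just this
// part of the CR don't have to trigger a restart.
func samplingConfigMap(instance, strategiesJSON string) v1.ConfigMap {
    if strategiesJSON == "" {
        strategiesJSON = "{}"
    }
    return v1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{Name: instance + "-sampling-configuration"},
        Data:       map[string]string{"sampling.json": strategiesJSON},
    }
}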
