jaeger-operator's Issues

Install Spark Job

Install the Spark job for dependency processing when Elasticsearch or Cassandra is used.

Could agent sidecar injection specify service name

Wondering whether, when an agent sidecar is injected, it could also specify the JAEGER_SERVICE_NAME environment variable on the service?

This would avoid the user having to add it manually, and also means that we could define the convention - e.g. if we wanted to support multitenancy.

As a side issue - should the injection be dependent upon an annotation? Similar to Istio sidecar injection which can be enabled on the namespace (so all deployments) or individually? Just so that users have some control, and not everything automatically gets a jaeger agent sidecar.

How to add volumes and volumeMounts

I want the jaeger-operator to be able to apply this YAML file:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: istio-jaeger
spec:
  strategy: production
  collector:
    image: jaegertracing/jaeger-collector:latest
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/config/tls
      name: certs
      readOnly: true
  query:
    image: jaegertracing/jaeger-query:latest
    volumeMounts:
    - mountPath: /usr/share/elasticsearch/config/tls
      name: certs
      readOnly: true
  volumes:
  - name: certs
    secret:
      defaultMode: 420
      secretName: es-certs
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200
        tls: true
        tls.cert: /usr/share/elasticsearch/config/tls/elasticsearch-router.crt
        tls.key: /usr/share/elasticsearch/config/tls/elasticsearch-router.key
        tls.ca: /usr/share/elasticsearch/config/tls/ca.crt

I checked the source code and cannot find any struct defined for volumes and volumeMounts.
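
For illustration, a minimal sketch of what such fields could look like in the spec types, using the upstream corev1 types (the struct and field names here are hypothetical, not the operator's current API):

package v1alpha1

import v1 "k8s.io/api/core/v1"

// JaegerCollectorSpec is a trimmed-down example of a per-component spec
// extended with container-level volume mounts.
type JaegerCollectorSpec struct {
    Image        string           `json:"image"`
    VolumeMounts []v1.VolumeMount `json:"volumeMounts,omitempty"`
}

// JaegerSpec shows where the pod-level volumes could live, so that both the
// collector and query pods can reference the same "certs" secret volume.
type JaegerSpec struct {
    Strategy  string              `json:"strategy"`
    Collector JaegerCollectorSpec `json:"collector"`
    Volumes   []v1.Volume         `json:"volumes,omitempty"`
}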

"Connection refused" on zipkin port

Deployed the all-in-one using the jaeger-kubernetes template and was able to post an example zipkin trace to the zipkin port.

However when trying the same with the operator I got:

$ curl -H "Content-Type: application/json" --request POST --data @zipkin.txt http://localhost:9411/api/v1/spans
curl: (52) Empty reply from server

with the following in the kubectl port-forward:

$ kubectl port-forward $(kubectl get pod -l app=jaeger -o jsonpath='{.items[0].metadata.name}') 9411:9411
Forwarding from 127.0.0.1:9411 -> 9411
Forwarding from [::1]:9411 -> 9411
Handling connection for 9411
E1015 16:48:14.872520   12526 portforward.go:331] an error occurred forwarding 9411 -> 9411: error forwarding port 9411 to pod 8ffdd850d8a13b5147a4c2fe14e1c0db344ab59e3b432077ee46992973e889b7, uid : exit status 1: 2018/10/15 15:48:14 socat[8571] E connect(5, AF=2 127.0.0.1:9411, 16): Connection refused

Example zipkin trace can be found in zipkin.txt

Incorporate rbac, crd and operator yaml into one file

Not sure if it is convention, but wondering if there is a real benefit to having the rbac, crd and operator yaml content in separate files.

It means users have to perform three operations instead of one. The same operations are performed on both Kubernetes and OpenShift - so it may be better to have a single jaeger-operator.yaml file with all the content?

Agent sidecar injection: enable propagation env var to be set

If automatic agent sidecar injection is used within an Istio environment, then it would be useful to be able to set the JAEGER_PROPAGATION environment variable to B3.

To make this more generic, the agent configuration should simply allow the propagation value to be set.

This requirement is similar to #29, which sets the service name env var when auto-injecting the agent sidecar.
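
For illustration, a minimal sketch of how the injector could append the env var to the target deployment's containers (package and helper names are hypothetical):

package inject

import (
    appsv1 "k8s.io/api/apps/v1"
    v1 "k8s.io/api/core/v1"
)

// addPropagationEnv appends JAEGER_PROPAGATION (e.g. "b3") to every container
// of the target deployment, so the instrumented application picks it up, in
// the same way JAEGER_SERVICE_NAME is handled in #29.
func addPropagationEnv(dep *appsv1.Deployment, propagation string) {
    if propagation == "" {
        return
    }
    for i := range dep.Spec.Template.Spec.Containers {
        c := &dep.Spec.Template.Spec.Containers[i]
        c.Env = append(c.Env, v1.EnvVar{Name: "JAEGER_PROPAGATION", Value: propagation})
    }
}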

Support update of the underlying Jaeger version

We should add support for automatic updates of the Jaeger instances that are managed by the operator. Idea:

  • Add a new property to the model: "update: [none|patch|minor|major]"
  • Label the resulting "jaeger" object with the property above
  • From time to time, check for new versions of the Jaeger image on Docker Hub
  • On new versions, check which Jaeger instances need to be updated and do a blue/green deployment. For this to work properly, the following has to happen, in order:
  1. Migrate the data. If this step fails, skip everything else
  2. Add new collectors. If this step fails, skip everything else
  3. Create a dummy pod with the agent, so that image gets pulled and we can check whether the container starts fine. Delete the pod once we confirm it's working. If this step fails, remove the new collectors and skip everything else
  4. For each application with the old sidecar, update the deployment to use a new version. This will cause a new pod to start, replacing the old ones
  5. Add new query/UI. If this fails, don't rollback anything. Just do not continue with the next step
  6. Remove the old collectors/queries/UIs

We need also to find a way to alert the admins in case of failures. Is it enough for us to just generate metrics, hoping an admin will create an alert in case a specific gauge/counter goes off? There's an "admin" page being considered for Jaeger. How can we provide data to that?

Things we need to consider:

  1. During the data update, should we stop the collectors from writing to it, or can we trust the update will be properly managed by the create-schema job?

Run 'make generate' as part of the build

Currently, we require 'make generate' to be executed during the development phase and we don't check whether it's needed during CI. This has caused a couple of PRs to be merged with model changes but without the generated deep-copy functions being updated.

Ideally, we would run make generate and fail the build if it reports changes, similar to what we do with make fmt.

Move hard coded strings to constants

There are quite a few hard coded strings in the code, like jaeger-agent as the container name for the Jaeger Agent sidecar container. They should all move to constants.

Where appropriate, types should be used to represent constant values.
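
For example, something along these lines (the names and values shown are illustrative):

package deployment

// Container names used across the operator, kept in one place so the
// injector, the controllers and the tests all agree on them.
const (
    AgentContainerName     = "jaeger-agent"
    CollectorContainerName = "jaeger-collector"
    QueryContainerName     = "jaeger-query"
)

// StorageType is a typed representation of the supported storage backends,
// instead of comparing free-form strings everywhere.
type StorageType string

const (
    StorageMemory        StorageType = "memory"
    StorageCassandra     StorageType = "cassandra"
    StorageElasticsearch StorageType = "elasticsearch"
)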

Run the Cassandra `create-schema` batch job

Currently, the Operator has no special knowledge about specific backing storages, but perhaps it should. Jaeger on Cassandra requires the schema to be created. This is done by running the image jaegertracing/jaeger-cassandra-schema. In Kubernetes, this is executed as a batch job:

https://github.com/jaegertracing/jaeger-kubernetes/blob/59e7afcae7b2f3bc109695cfda2a5c116cea4391/production/cassandra.yml#L108-L129

The idea here would be to do something similar: install the batch job and wait for it to finish before proceeding with the creation of the other objects.

The first implementation could be something very simple: the controller interface could have a Requirements() []batch.Job, batch being "k8s.io/api/batch/v1". The stub would run all the jobs and wait until they have completed.

This would also allow us to be able to run migration scripts and install pre-requirements, as long as this is wrapped in a "batch" container.
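
A rough sketch of that idea, mirroring the jaeger-kubernetes batch job (the interface shape and the names are only illustrative):

package controller

import (
    batch "k8s.io/api/batch/v1"
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Controller is the strategy-specific controller; the stub would run every
// job returned by Requirements and wait for completion before creating the
// remaining objects.
type Controller interface {
    Requirements() []batch.Job
}

// cassandraSchemaJob builds the create-schema job for a given Jaeger instance.
func cassandraSchemaJob(instance string) batch.Job {
    return batch.Job{
        ObjectMeta: metav1.ObjectMeta{Name: instance + "-cassandra-schema"},
        Spec: batch.JobSpec{
            Template: v1.PodTemplateSpec{
                Spec: v1.PodSpec{
                    RestartPolicy: v1.RestartPolicyOnFailure,
                    Containers: []v1.Container{{
                        Name:  "cassandra-schema",
                        Image: "jaegertracing/jaeger-cassandra-schema",
                    }},
                },
            },
        },
    }
}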

Inject sidecars from different namespaces

It's quite possible that Jaeger instances will live in one namespace and applications in another. We need a way then to inject sidecars into deployments from one namespace pointing to Jaeger instances in another namespace.

Tasks:

  • Figure out whether the JaegerSpec should have a list of namespaces the operator should watch
  • The RBAC implications: what are the failure scenarios and how to recover from that
  • On OpenShift, one namespace cannot see things from another namespace by default. What should we do here?

Support custom annotations/labels

We should provide a way for admins to label the resources that are being created. Something like:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: simple-prod
  labels:
    tenant: acme
spec:
  strategy: all-in-one
  all-in-one:
    annotations:
      prometheus.io/scrape: "false"

Note that some labels/annotations might clash with the ones we supply in the operator itself, like the prometheus ones. Because of that, we should always honor what the user specified in the CR.
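
A minimal sketch of that merge rule (the helper name is illustrative): operator defaults are applied first, and anything from the CR wins on conflict.

package deployment

// mergeAnnotations combines operator-supplied defaults (e.g. the prometheus
// scrape annotations) with the annotations from the CR; the user's values
// take precedence on conflicting keys.
func mergeAnnotations(defaults, fromCR map[string]string) map[string]string {
    merged := map[string]string{}
    for k, v := range defaults {
        merged[k] = v
    }
    for k, v := range fromCR {
        merged[k] = v // always honor what the user specified
    }
    return merged
}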

Change from free form options to a typed model

As discussed in #71 (comment), the current CRD uses a free format options field to allow the user to pass parameters through to the executables.

Although this means that the user could take advantage of new options without requiring a new release of the operator, it also means that incorrect options can be specified and passed to the executables as parameters - resulting in the executable throwing an error.

Having a properly typed model in the CRD means that:

  • we can ensure the user has not inadvertently specified an option that will cause the executable to fail to start
  • options that only support specific values can be validated and an informative message presented to the user
  • the structure of the options can be organised in a more meaningful way - e.g. the Cassandra create-schema settings grouped with the other Cassandra-related options.

Add liveness probes

Add liveness probes and enable the user to optionally specify the initial delay and period durations.
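
A sketch of what a configurable probe could look like, assuming the corev1 types vendored at the time; the defaults and the health endpoint path are only placeholders:

package deployment

import (
    v1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/util/intstr"
)

// livenessProbe builds an HTTP liveness probe with user-supplied initial
// delay and period (seconds), falling back to modest defaults when unset.
func livenessProbe(path string, port int, initialDelay, period int32) *v1.Probe {
    if initialDelay == 0 {
        initialDelay = 5
    }
    if period == 0 {
        period = 15
    }
    return &v1.Probe{
        Handler: v1.Handler{
            HTTPGet: &v1.HTTPGetAction{Path: path, Port: intstr.FromInt(port)},
        },
        InitialDelaySeconds: initialDelay,
        PeriodSeconds:       period,
    }
}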

Handling UI configuration in a json file

Currently the CR enables options (name/value pairs) to be specified, which are converted into arguments supplied to the executable.

However UI configuration requires the specification of a separate file.

Wondering whether all options should be defined in a configmap per jaeger instance?

Update the Operator SDK

An upcoming release of the Operator SDK will break compatibility with the current master, which is what we currently reference in Gopkg.toml.

This issue is to track:

  • Pin the Operator SDK version in the dependency management file
  • Bump our operator SDK version once their refactoring is done + fix the parts where it breaks us.

https://github.com/kubernetes-sigs/controller-runtime

Inject sidecar properly in annotated StatefulSets

I am testing out Jaeger as a distributed tracing solution; the Jaeger Operator sounded like the easiest way to do it, and here I am (I also checked the Helm chart, but this looks like the preferred way currently).

The operator worked great, but I can't use the very handy annotation below:

annotations:
  inject-jaeger-agent: "true"

Our system is an actor-based system with sharded local state, and we use StatefulSets to keep local state and instances together.

Of course I can add the sidecar in my YAML, and I did that, but this annotation is a lot easier to maintain.

That is it!

Add support for image pull secrets

Enable the CR to specify a list of image pull secrets that could be used with any deployments, or possibly a serviceaccount if we define one.
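
For illustration, a small sketch assuming the CR carries a plain list of secret names (how they are exposed in the spec is up for discussion):

package deployment

import v1 "k8s.io/api/core/v1"

// applyImagePullSecrets adds the named secrets to a pod spec, so every
// deployment created by the operator can pull images from private registries.
func applyImagePullSecrets(spec *v1.PodSpec, secretNames []string) {
    for _, name := range secretNames {
        spec.ImagePullSecrets = append(spec.ImagePullSecrets,
            v1.LocalObjectReference{Name: name})
    }
}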

Add prometheus scrape annotations to agent

Following on from #27, we should add the prometheus scrape/port annotations to the agent.

The issue with the agent as a sidecar is ensuring that we don't overwrite similar annotations provided by the service.

Update the version at build time

I think that the version should be injected into the binary at build time. This way you don't risk ending up with two binaries containing different code but reporting the same version. Unless I'm missing something.
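
For reference, a minimal sketch of the usual Go pattern for this (the package location and the flag value are illustrative):

package version

// Version is overwritten at build time with
//   -ldflags "-X <module path>/version.Version=<git tag or sha>"
// so the binary always reports the exact code it was built from. The default
// makes locally-built, untagged binaries easy to spot.
var Version = "unknown"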

Put more thought on passing refs vs. value

In the early days of the operator, it was easier to just pass references around. Unfortunately, this is not always appropriate and gives too much power to the function being called, as it can change the object in ways the caller isn't expecting.

A first thing to assess is whether we can change the places where we use *v1alpha1.Jaeger to use values instead. It would mean that some functions have their signatures changed, like controller.normalize(*v1alpha1.Jaeger), so that it returns a normalized copy instead of changing the object in place.
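
A small self-contained sketch of the value-based signature; the struct below is a stand-in for v1alpha1.Jaeger with only the relevant fields:

package controller

// jaeger is a simplified stand-in for v1alpha1.Jaeger.
type jaeger struct {
    Strategy string
    Storage  string
}

// normalize receives a copy and returns the normalized copy, so the caller's
// object is never mutated behind its back.
func normalize(j jaeger) jaeger {
    if j.Strategy == "" {
        j.Strategy = "all-in-one"
    }
    if j.Storage == "" {
        j.Storage = "memory" // matches the operator's documented fallback
    }
    return j
}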

Configuring a Kafka based Jaeger architecture

Currently Kafka support is being added to Jaeger in two places, as a storage plugin and an ingester.

The aim of this approach is to have a collector configured with Kafka as storage, to publish spans to Kafka, and then ingesters that can consume those messages and store the spans in a real storage backend (e.g. elasticsearch/cassandra).

We need to consider how such a configuration would be defined in the operator's CR.

Currently Kafka is listed as a storage type - but an operator CR can only support a single storage type - so either:

  1. We need to treat this Kafka-based configuration as something else - i.e. the storage type is specified as the real storage used by the ingester, while the collector (using Kafka) and the ingester are configured from a different spec?

  2. There would be two separate CRs - one defining the collector with Kafka storage, and the other defining the ingester with real storage. The issue with this approach is that only a subset of the components may need to be configured in each CR - so query would only be defined in the second CR (as it will also use the same real storage), and the agent may potentially be defined in the first, as it will use the collector.

Although Kafka is not yet fully supported, we need to consider how its introduction may impact the spec structure.

Update Docker image when a PR is merged

Whenever a PR is merged, it would be good to generate a new Docker image and push it to Docker Hub under the latest version tag (e.g. 1.6.CI_BUILD_NUM, as well as 1.6).

Reconsider the strategy setting

I like that all-in-one is the default value for the strategy setting. However, for new starters with Jaeger it might not be immediately obvious what the difference between all-in-one and production is. I would suggest a different point of view here. Instead of changing the functionality based on strategy, change it based on storage. Consider renaming the strategy option to storage and allowing the following values: memory, elasticsearch, cassandra, etc.

The default storage option might be memory. The operator would choose the all-in-one image for that storage type. Advanced users would still have the option to override the image, but in most cases you can preselect the default Jaeger Docker images - only asking users to select the version (tag) and image pull policy.

The other benefit of this approach is that if you change functionality based on storage type, you can also add functionality for managing retention. For example, you can delete Elasticsearch indices when they fall out of retention. Alternatively, you can have per-service/operation retention, and the operator can delete the traces for some busy services from Cassandra after a week but keep others for a month.

I am not saying that this is the best approach; I just wanted to give you a different point of view.

How to: create Jaeger instance in different namespace to operator

One of the benefits of the operator is supposed to be that it can be installed once, and used to create multiple Jaeger instances in different namespaces.

If this is possible, then we should document (possibly just in the readme) how such a configuration would be achieved.

Zipkin port

The Zipkin port is opened only when the env var COLLECTOR_ZIPKIN_HTTP_PORT is set, so both the collector and the all-in-one need to set it.
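
For illustration, a sketch of how the deployments could enable it (the function name is hypothetical):

package deployment

import v1 "k8s.io/api/core/v1"

// withZipkinPort enables the Zipkin-compatible endpoint on a collector or
// all-in-one container and exposes the corresponding container port.
func withZipkinPort(c v1.Container) v1.Container {
    c.Env = append(c.Env, v1.EnvVar{Name: "COLLECTOR_ZIPKIN_HTTP_PORT", Value: "9411"})
    c.Ports = append(c.Ports, v1.ContainerPort{Name: "zipkin", ContainerPort: 9411})
    return c
}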

Switch to disable the ingress resource

Is it currently possible to disable the creation of the ingress resource?

If not, I think it would be a good feature to update the CRD spec with an ingress configuration section.
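
A sketch of what such a section could look like in the spec (field names are illustrative); a *bool keeps "not set" distinguishable from "disabled", so existing CRs keep their ingress:

package v1alpha1

// JaegerIngressSpec is a hypothetical ingress section for the CRD spec.
type JaegerIngressSpec struct {
    Enabled *bool `json:"enabled,omitempty"`
}

// IngressEnabled treats an unset value as enabled, preserving today's default.
func (s JaegerIngressSpec) IngressEnabled() bool {
    return s.Enabled == nil || *s.Enabled
}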

Document how to install ES/Cassandra

For production setups, in-memory storage isn't suitable, and documentation is needed on how to get ES/Cassandra running and how to configure the setup to use it.

Add a helm chart

So far, awesome job @jpkrohling! However I think that the operator needs a helm chart.
In the README.md you show how to set this up using the kubectl apply ... command. This is helpful, but in a production environment chances are that we wouldn't install things manually using kubectl. This is why I believe there needs to be a helm chart for installing the operator. WDYT?

Avoid duplicate messages

As the operator runs in a loop, we should avoid logging the same message twice. Currently, we can see this:

INFO[0695] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0700] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0705] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0710] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0715] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0720] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0725] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0730] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 
INFO[0735] Storage type wasn't provided for the Jaeger instance 'agent-as-daemonset'. Falling back to 'memory' 

Error "The resourceVersion for the provided watch is too old"

When the operator is running for an extended period of time, the following message is shown:

ERROR: logging before flag.Parse: W1109 09:42:04.207951   16079 reflector.go:341] github.com/jaegertracing/jaeger-operator/vendor/github.com/operator-framework/operator-sdk/pkg/sdk/informer.go:91: watch of *unstructured.Unstructured ended with: unexpected object: &{map[code:410 kind:Status apiVersion:v1 metadata:map[] status:Failure message:The resourceVersion for the provided watch is too old. reason:Expired]}

This was seen on OpenShift but might happen with plain Kubernetes as well:

$ minishift status
Minishift:  Running
Profile:    istio-tutorial
OpenShift:  Running (openshift v3.10.0+349c70c-73)
DiskUsage:  23% of 19G (Mounted On: /mnt/sda1)
CacheUsage: 2.363 GB (used by oc binary, ISO or cached images)
$ oc version
oc v3.11.0-rc.0+0cbc58b
kubernetes v1.10.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.42.155:8443
openshift v3.10.0+349c70c-73
kubernetes v1.10.0+b81c8f8

For those who might be experiencing this as well: please leave a comment with the output from minikube version + kubectl version (or minishift version + oc version).

How to set env for Jaeger

I want to add a base path for jaeger-query (jaegertracing/jaeger-ui#258), so the YAML file should be:

apiVersion: io.jaegertracing/v1alpha1
kind: Jaeger
metadata:
  name: istio-tracing
spec:
  strategy: production
  env:
    - name: QUERY_BASE_PATH
      value: /jaeger
  storage:
    type: elasticsearch
    options:
      es:
        server-urls: http://elasticsearch:9200

Right now, the jaeger-operator does not support setting env in the YAML file.

Enable operator user to specify sampling strategies

As described here, it is possible to specify adaptive sampling strategy details in a json file.

As changes to the sampling strategies may be automatically reloaded, it may be appropriate to create this mounted file by default (albeit empty) - so an update to just this part of the CR does not necessarily need to cause a restart.

We should consider in general what the update approach should be when changes are applied to the CR. Maybe a general rolling update approach would be safer.
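
One possible shape for this, sketched as a per-instance ConfigMap that is created even when no strategies were provided (the names are illustrative):

package config

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// samplingConfigMap holds the sampling strategies JSON for one Jaeger
// instance; it is created empty by default so that later edits to just this
// part of the CR don't have to trigger a restart.
func samplingConfigMap(instance, strategiesJSON string) v1.ConfigMap {
    if strategiesJSON == "" {
        strategiesJSON = "{}"
    }
    return v1.ConfigMap{
        ObjectMeta: metav1.ObjectMeta{Name: instance + "-sampling-configuration"},
        Data:       map[string]string{"sampling.json": strategiesJSON},
    }
}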
