opentelemetry-operations-python's People

Contributors

aabmass, andrewaxue, cnnradams, damemi, dashpole, dependabot[bot], douglasheriot, euri10, fengxumr, mabdi3, manojpandey, munagekar, muncus, patil2099, psx95, rangelreale, rghetia, tasty213, the-ericwang35, xrmx, ymotongpoo, ysde

opentelemetry-operations-python's Issues

Cloud Trace exporter adds seconds of latency to HTTP requests

Hello, I work on the Bank of Anthos sample application. In July, we added the OpenTelemetry exporter and trace propagator to 3 Python services (all using flask/gunicorn, all running inside containers on GKE).

We have determined that the OT exporter seems to be causing multiple seconds of latency per request (findings here). We upgraded OT to the latest version (0.13b0) but we're still seeing latency in the app.

My hypothesis is that there is a process in the exporter that is slow on a per-request basis, or there is something going on with our GKE environment/Trace API authentication that is causing the latency.

Wondering if you've seen anything similar, or are doing any performance testing on these libraries. Thank you!

CloudTraceExporter drops all spans if even one is malformed

If even one of the spans in a batch passed to CloudTraceExporter is malformed, the Cloud Trace backend rejects the export request, and as a result none of the spans in the batch are exported. A proposed solution is a divide-and-conquer algorithm: given an array of spans (possibly a subarray), attempt to export the left and right halves of the array separately; if a half succeeds, continue, otherwise recurse on the half that failed.

def export_divide(spans):
    # try_export stands in for a single export RPC to the backend
    mid = len(spans) // 2
    for half in (spans[:mid], spans[mid:]):
        if not half:
            continue
        if not try_export(half):
            if len(half) > 1:
                export_divide(half)
            # a failing single span is the malformed one; drop it

This requires O(k log n) calls to the backend, where k is the number of malformed spans and n is the number of spans. One issue with this approach is that if k is large, say k == n, we would make O(n log n) calls to the backend, which is far too slow. Stack size shouldn't be an issue, since the recursion depth is only O(log n).

Code sharing approach for common code between exporters, tests, etc.

There is a bunch of duplicated code that we need to work out how to share. This is tricky because we publish separate packages for the metrics exporter, trace exporter, and tools (propagator and resource detector). There is also BaseExporterIntegrationTest which could be useful in test packages.

Add Cloud Trace propagator

The OpenTelemetry SDK ships with propagators for Zipkin and Jaeger. OpenCensus also ships with a propagator for Cloud Trace (x-cloud-trace-context): https://github.com/census-instrumentation/opencensus-python/blob/master/opencensus/trace/propagation/google_cloud_format.py

We should add a Cloud Trace propagator to this exporter.

This could offer customers compatibility with existing services that are not yet instrumented with OpenTelemetry, e.g. by being able to extract either header format and inject both header formats.
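As a sketch of the extraction side (assuming the documented `TRACE_ID/SPAN_ID;o=OPTIONS` header format; the function and regex here are illustrative, not the eventual propagator API):

```python
import re

# x-cloud-trace-context: TRACE_ID/SPAN_ID;o=OPTIONS
# TRACE_ID is a 32-char hex string, SPAN_ID a decimal number,
# and o=1 marks the request as sampled.
_HEADER_RE = re.compile(r"^([0-9a-f]{32})/(\d+)(?:;o=(\d))?$")

def parse_cloud_trace_header(value):
    """Parse an x-cloud-trace-context value into (trace_id, span_id, sampled)."""
    match = _HEADER_RE.match(value)
    if match is None:
        return None
    trace_id, span_id, options = match.groups()
    return trace_id, int(span_id), options == "1"
```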

Documentation should mention BatchExportSpanProcessor

The documentation should mention that SimpleExportSpanProcessor is slow and probably only useful for debugging; BatchExportSpanProcessor should be used in production instead. The OTel Python documentation could also be updated to explain this, or at least gain an FAQ entry.
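A minimal sketch of what that documentation might show (a setup fragment using the import paths from the 0.1x releases, which may differ in later versions):

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchExportSpanProcessor
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter

trace.set_tracer_provider(TracerProvider())
# BatchExportSpanProcessor buffers finished spans and exports them on a
# background thread, instead of one blocking export call per span.
trace.get_tracer_provider().add_span_processor(
    BatchExportSpanProcessor(CloudTraceSpanExporter())
)
```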

Related #75

Resource labels dict copied repeatedly

OpenTelemetry's Resource.labels property actually copies the labels dict: https://github.com/open-telemetry/opentelemetry-python/blob/8bc7786b191b4f1b1f161de983db71643bf9f51e/opentelemetry-sdk/src/opentelemetry/sdk/resources/__init__.py#L41-L43

So each access here copies the dict:

if resource.labels.get("cloud.provider") != "gcp":
    return None
resource_type = resource.labels["gcp.resource_type"]
if resource_type not in OT_RESOURCE_LABEL_TO_GCP:
    return None
return MonitoredResource(
    type=resource_type,
    labels={
        gcp_label: str(resource.labels[ot_label])
        for ot_label, gcp_label in OT_RESOURCE_LABEL_TO_GCP[
            resource_type
        ].items()
    },
)

If the goal of Resource doing this is to achieve immutability, maybe it can be updated upstream to use a FrozenDict/ImmutableDict library?
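One exporter-side mitigation (independent of any upstream FrozenDict change) is to read the property once into a local variable so the dict is copied only once per call. A minimal sketch, with a stand-in `Resource` class that mimics the SDK's copying property:

```python
class Resource:
    """Stand-in that mimics the SDK's copying `labels` property."""

    def __init__(self, labels):
        self._labels = labels

    @property
    def labels(self):
        return self._labels.copy()  # a fresh dict on every access

def gcp_resource_type(resource):
    labels = resource.labels  # copy once, then plain dict lookups
    if labels.get("cloud.provider") != "gcp":
        return None
    return labels.get("gcp.resource_type")
```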

Leverage more of the Cloud Trace API

There are some fields of the Cloud Trace API that we could set but currently ignore, for example same_process_as_parent_span and child_span_count. We'll need to look into how difficult it is to extract this information from OpenTelemetry, and then work on sending it over in the exporter.

Remove pinned "opentelemetry_python_commit" and test against releases

See #17 for discussion.

#17 has tox install the opentelemetry-api/sdk packages to test against from a pinned commit. This keeps us developing closely against the opentelemetry-python repo before GA. After GA, when there are fewer breaking changes, we should switch to actually testing against released versions of those packages.

Add upper bounds for google-cloud-monitoring and google-cloud-trace

Hi,

I'm on the Cloud Client Libraries team. Just wanted to give you folks a heads up that we're in the process of making major version bumps to the Python libraries with breaking changes. I'm about to cut a release for google-cloud-trace and monitoring will follow in a week or two.

Please pin the dependencies on these libraries to less than the next major to avoid breaking users.

Current:

install_requires =
    google-cloud-monitoring
    google-cloud-trace >= 0.24.0

Requested:

install_requires =
    google-cloud-monitoring < 2.0.0
    google-cloud-trace >= 0.24.0, < 1.0.0

Thanks!

drop python3.5

Python 3.5 is no longer supported upstream in opentelemetry-python, so it needs to be dropped here as well.

Set up sphinx and publish docs

I am thinking of just publishing them on readthedocs.io, but will explore other options as well. We are planning to leave a reference to these new docs in the official opentelemetry-python docs, but remove everything else.

Handle dataloss from interval < 10s when in stateless mode

Currently, if we try to publish new metric data more than once every 10 seconds, we ignore the publish attempt until more than 10 seconds have elapsed. However, when metrics are stateless, this leads to data loss: the aggregation is thrown out on the exporter side, but it is also reset by the batcher.

The solution to this is to store the checkpoints from the records of the previous publish attempt and merge those checkpoints into the records of the next export attempt.
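For a sum aggregation, that merge could be sketched as follows (the checkpoint shape, a dict keyed by metric identity, is a simplifying assumption, not the SDK's actual record type):

```python
def merge_checkpoints(unsent, fresh):
    """Fold checkpoints held back from a skipped publish into the next
    batch, so no counted data is lost between exports."""
    merged = dict(unsent)
    for key, value in fresh.items():
        merged[key] = merged.get(key, 0) + value  # sum aggregation
    return merged
```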

Explore BatchSpanProcessor

BatchSpanProcessor is (effectively) a requirement in production, to avoid making tons of RPC requests. It spins up a worker thread that buffers spans until a certain threshold is reached, at which point it exports all of them. We need to look into the interaction between the new thread and grpc calls and make sure it works fine.

g.co/agent defaulting to opencensus

When running the OpenTelemetry cloud exporter on GKE, the default agent label is set to opencensus even though opentelemetry is the framework being used.

This may potentially be caused by a default collector set up by GCP.

Opentelemetry Version: v0.9.0
Cloud Exporter Version: 0.10.dev0

http method is always empty

I'm sending traces from an ASGI app that is not hosted on Google. While the trace UI shows the OpenTelemetry label http.method correctly (GET, POST, etc.), the corresponding field on the Google span is not populated.

Cloud Trace Propagator Doesn't Work with gRPC

The Cloud Trace propagator uses the X-Cloud-Trace-Context header key, which contains upper-case letters. When used with the gRPC instrumentation, this key is set as gRPC metadata, which throws a validation error and aborts the request because of the upper-case letters.

Please make the key all lower case. I tested this on my own services on Cloud Run: a Flask app receives HTTP from Google Cloud's load balancer (which sets the x-cloud-trace-context header initially) and passes it on to my backend service over gRPC with the metadata key set to the same lowercase name. The AppServer component on Cloud Run between the two services also picks up the trace ID from the lowercase metadata header, and everything links up correctly as expected, so I can now see my full end-to-end request in Google Trace.

Set up mypy

Run mypy in dev/CI. Would be nice to have a config that works cleanly with vscode.

Add more integration tests for cloud trace

Some things we'd like to test are:

  • Tests for the various features that the Exporter supports. (events, links, status codes, etc.)
  • Test that contains an invalid span and assert failure
  • Export multiple spans

ModuleNotFoundError: No module named 'google.cloud.trace_v2.proto'

It looks like a dependent module, google-cloud-trace, released a new major version yesterday with breaking package changes, breaking our deployment and bringing down our entire service: https://github.com/googleapis/python-trace/blob/master/CHANGELOG.md

Could you update this module to use the latest release of google-cloud-trace, or pin the version that works with the current module (0.24.0)? If you update the code to work with the new release, please add an upper bound so this doesn't happen again (or only allow minor version updates).

I fixed it by adding this line to the end of my requirements.txt file, to override the version automatically installed when adding this module:

google-cloud-trace==0.24.0

Filter out health check request in grpc

As of 0.18b1, opentelemetry-instrumentation-grpc doesn't have a feature to exclude specific endpoints from tracing. Because of this, gRPC servers always send health check request traces to Cloud Trace via the exporter, as in the screenshot below:
Screenshot from 2021-03-15 16-32-51

It would be nice if the exporter supported a filter option to avoid sending unnecessary traces to Cloud Trace. Otherwise, the upstream (opentelemetry-instrumentation-grpc) should offer something similar to OTEL_PYTHON_FLASK_EXCLUDED_URLS in opentelemetry-instrumentation-flask.

Infinite requests issue (Cloud Trace Exporter)

When we instrument requests plus an exporter that uses HTTP requests and grpc (e.g. the Cloud Trace exporter), we get an infinite loop. The flow is:

Span A (manually started by user)
When we export span A, the exporter makes a request, that request is wrapped and instrumented in Span B.
When we export span B, the exporter makes a request, that request is wrapped and instrumented in Span C.
When we export span C, the exporter makes a request, that request is wrapped and instrumented in Span D.
...
forever.

This issue is usually resolved by setting a "suppress_instrumentation" flag in contextvars before exporting, so that the exporter's own requests are no longer wrapped and instrumented. But the Cloud Trace exporter uses grpc, which spins up a new thread where contextvars are not propagated; that thread makes a request which, since the flag was lost, continues the infinite loop.

The current solution is simply to blacklist, i.e. always suppress, the URL that grpc makes its request to. This is obviously not the best way to deal with the issue, for a variety of reasons (e.g. what if the URL changes in the future? What if grpc changes and makes more requests? What if the user legitimately wants to instrument a request to this URL?).

I've talked with the grpc team and they have said there is currently no way of passing contextvars to grpc threads, but it is something they are working on. Another possible solution is changing the oauth threadpool to somehow carry this flag in its contextvars.
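The core of the problem can be reproduced with plain contextvars and threads: a flag set in the calling context is invisible to a freshly spawned worker thread, which is why the suppression trick breaks once grpc hands work to its own thread (the variable name below is illustrative):

```python
import contextvars
import threading

suppress_instrumentation = contextvars.ContextVar(
    "suppress_instrumentation", default=False
)

def seen_by_worker():
    """Read the flag from a brand-new thread, which starts with an
    empty context rather than a copy of the caller's."""
    result = []
    thread = threading.Thread(
        target=lambda: result.append(suppress_instrumentation.get())
    )
    thread.start()
    thread.join()
    return result[0]

suppress_instrumentation.set(True)
# The flag is True in this context, yet the worker thread sees the default.
```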

More examples + troubleshooting

Add more examples of what we can export: links, attributes, and events. Also add a FAQ/troubleshooting section covering common setup problems.
