From f571d81631f0a31188e57e65e8ed16487808e9ce Mon Sep 17 00:00:00 2001 From: Julius Volz Date: Thu, 22 May 2025 10:51:23 +0200 Subject: [PATCH] Fix up headings and "more"-markers in blog posts Signed-off-by: Julius Volz --- .../2016-03-23-interview-with-life360.md | 2 +- ...n-the-cloud-native-computing-foundation.md | 2 ++ .../2016-07-18-prometheus-1-0-released.md | 2 ++ ...16-07-23-pull-does-not-scale-or-does-it.md | 2 ++ .../2016-09-07-interview-with-shuttlecloud.md | 18 +++++----- .../2016-09-14-interview-with-digitalocean.md | 4 ++- .../2016-09-21-interview-with-compose.md | 3 +- .../2016-10-12-interview-with-justwatch.md | 2 ++ .../2016-11-16-interview-with-canonical.md | 2 ++ .../2017-02-20-interview-with-weaveworks.md | 12 ++++--- .../2017-04-06-interview-with-europace.md | 2 ++ .../2017-04-10-promehteus-20-sneak-peak.md | 8 +++-- .../2017-05-17-interview-with-iadvize.md | 6 ++-- ...06-14-interview-with-latelier-animation.md | 1 + ...21-prometheus-20-alpha3-new-rule-format.md | 31 ++++++++-------- .../2017-11-08-announcing-prometheus-2-0.md | 2 -- ...2017-11-30-prometheus-at-cloudnativecon.md | 2 -- .../2018-02-08-interview-with-scalefastr.md | 35 ++++++++++--------- .../2018-03-16-interview-with-datawire.md | 3 +- .../2018-07-05-implementing-custom-sd.md | 4 +-- ...-08-09-prometheus-graduates-within-cncf.md | 2 ++ .../2018-08-23-interview-with-presslabs.md | 8 +++-- blog-posts/2019-01-28-subquery-support.md | 2 ++ .../2019-02-06-interview-with-hostinger.md | 2 ++ .../2019-06-18-interview-with-forgerock.md | 2 ++ .../2019-10-10-remote-read-meets-streaming.md | 2 ++ .../2021-02-17-introducing-feature-flags.md | 4 ++- .../2021-02-18-introducing-the-@-modifier.md | 4 ++- ...roducing-prometheus-conformance-program.md | 2 ++ ...eus-conformance-remote-write-compliance.md | 2 ++ blog-posts/2021-06-10-on-ransomware-naming.md | 2 ++ ...21-10-14-prometheus-conformance-results.md | 2 ++ blog-posts/2021-11-16-agent.md | 4 ++- blog-posts/2023-03-21-stringlabel.md | 14 ++++---- blog-posts/2023-09-01-promcon2023-schedule.md | 2 ++ blog-posts/2024-09-11-prometheus-3-beta.md | 16 ++++----- blog-posts/2024-11-14-prometheus-3-0.md | 24 ++++++------- ...11-19-yace-joining-prometheus-community.md | 4 +-- .../blog/[year]/[month]/[day]/[slug]/page.tsx | 2 +- 39 files changed, 148 insertions(+), 95 deletions(-) diff --git a/blog-posts/2016-03-23-interview-with-life360.md b/blog-posts/2016-03-23-interview-with-life360.md index f4eae4da..5c37a11c 100644 --- a/blog-posts/2016-03-23-interview-with-life360.md +++ b/blog-posts/2016-03-23-interview-with-life360.md @@ -79,7 +79,7 @@ Grafana graphs, up to the point where we had total service coverage. We were also currently looking at InfluxDB for long term storage, but due to [recent developments](https://influxdata.com/blog/update-on-influxdb-clustering-high-availability-and-monetization/), -this may no longer be a viable option. +this may no longer be a viable option. 
We then added exporters for MySQL, Node, Cloudwatch, HAProxy, JMX, NSQ (with a bit of our own code), Redis and Blackbox (with our own contribution to add diff --git a/blog-posts/2016-05-09-prometheus-to-join-the-cloud-native-computing-foundation.md b/blog-posts/2016-05-09-prometheus-to-join-the-cloud-native-computing-foundation.md index 5cc61514..25d9a3b7 100644 --- a/blog-posts/2016-05-09-prometheus-to-join-the-cloud-native-computing-foundation.md +++ b/blog-posts/2016-05-09-prometheus-to-join-the-cloud-native-computing-foundation.md @@ -18,6 +18,8 @@ accept Prometheus as a second hosted project after Kubernetes! You can find more information about these plans in the [official press release by the CNCF](https://cncf.io/news/news/2016/05/cloud-native-computing-foundation-accepts-prometheus-second-hosted-project). + + By joining the CNCF, we hope to establish a clear and sustainable project governance model, as well as benefit from the resources, infrastructure, and advice that the independent foundation provides to its members. diff --git a/blog-posts/2016-07-18-prometheus-1-0-released.md b/blog-posts/2016-07-18-prometheus-1-0-released.md index 18d50437..50ddd5c9 100644 --- a/blog-posts/2016-07-18-prometheus-1-0-released.md +++ b/blog-posts/2016-07-18-prometheus-1-0-released.md @@ -17,6 +17,8 @@ If you have been using Prometheus for a while, you may have noticed that the rat In the same spirit, reaching 1.0 means that subsequent 1.x releases will remain API stable. Upgrades won’t break programs built atop the Prometheus API, and updates won’t require storage re-initialization or deployment changes. Custom dashboards and alerts will remain intact across 1.x version updates as well. We’re confident Prometheus 1.0 is a solid monitoring solution. Now that the Prometheus server has reached a stable API state, other modules will follow it to their own stable version 1.0 releases over time. + + ### Fine print So what does API stability mean? Prometheus has a large surface area and some parts are certainly more mature than others. diff --git a/blog-posts/2016-07-23-pull-does-not-scale-or-does-it.md b/blog-posts/2016-07-23-pull-does-not-scale-or-does-it.md index f954972a..4d136e37 100644 --- a/blog-posts/2016-07-23-pull-does-not-scale-or-does-it.md +++ b/blog-posts/2016-07-23-pull-does-not-scale-or-does-it.md @@ -19,6 +19,8 @@ but it does not focus specifically on scaling aspects. Let's have a closer look at the usual misconceptions around this claim and analyze whether and how they would apply to Prometheus. + + ## Prometheus is not Nagios When people think of a monitoring system that actively pulls, they often think diff --git a/blog-posts/2016-09-07-interview-with-shuttlecloud.md b/blog-posts/2016-09-07-interview-with-shuttlecloud.md index 82a6d710..ca4c5365 100644 --- a/blog-posts/2016-09-07-interview-with-shuttlecloud.md +++ b/blog-posts/2016-09-07-interview-with-shuttlecloud.md @@ -9,17 +9,19 @@ author_name: Brian Brazil ## What does ShuttleCloud do? -ShuttleCloud is the world’s most scalable email and contacts data importing system. We help some of the leading email and address book providers, including Google and Comcast, increase user growth and engagement by automating the switching experience through data import. +ShuttleCloud is the world’s most scalable email and contacts data importing system. We help some of the leading email and address book providers, including Google and Comcast, increase user growth and engagement by automating the switching experience through data import. 
By integrating our API into their offerings, our customers allow their users to easily migrate their email and contacts from one participating provider to another, reducing the friction users face when switching to a new provider. The 24/7 email providers supported include all major US internet service providers: Comcast, Time Warner Cable, AT&T, Verizon, and more. By offering end users a simple path for migrating their emails (while keeping complete control over the import tool’s UI), our customers dramatically improve user activation and onboarding. + + ![ShuttleCloud's integration with Gmail](/assets/blog/2016-09-07/gmail-integration.png) ***ShuttleCloud’s [integration](https://support.google.com/mail/answer/164640?hl=en) with Google’s Gmail Platform.*** *Gmail has imported data for 3 million users with our API.* -ShuttleCloud’s technology encrypts all the data required to process an import, in addition to following the most secure standards (SSL, oAuth) to ensure the confidentiality and integrity of API requests. Our technology allows us to guarantee our platform’s high availability, with up to 99.5% uptime assurances. +ShuttleCloud’s technology encrypts all the data required to process an import, in addition to following the most secure standards (SSL, oAuth) to ensure the confidentiality and integrity of API requests. Our technology allows us to guarantee our platform’s high availability, with up to 99.5% uptime assurances. ![ShuttleCloud by Numbers](/assets/blog/2016-09-07/shuttlecloud-numbers.png) @@ -30,7 +32,7 @@ In the beginning, a proper monitoring system for our infrastructure was not one * We had a set of automatic scripts to monitor most of the operational metrics for the machines. These were cron-based and executed, using Ansible from a centralized machine. The alerts were emails sent directly to the entire development team. * We trusted Pingdom for external blackbox monitoring and checking that all our frontends were up. They provided an easy interface and alerting system in case any of our external services were not reachable. -Fortunately, big customers arrived, and the SLAs started to be more demanding. Therefore, we needed something else to measure how we were performing and to ensure that we were complying with all SLAs. One of the features we required was to have accurate stats about our performance and business metrics (i.e., how many migrations finished correctly), so reporting was more on our minds than monitoring. +Fortunately, big customers arrived, and the SLAs started to be more demanding. Therefore, we needed something else to measure how we were performing and to ensure that we were complying with all SLAs. One of the features we required was to have accurate stats about our performance and business metrics (i.e., how many migrations finished correctly), so reporting was more on our minds than monitoring. We developed the following system: @@ -38,11 +40,11 @@ We developed the following system: * The source of all necessary data is a status database in a CouchDB. There, each document represents one status of an operation. This information is processed by the Status Importer and stored in a relational manner in a MySQL database. - * A component gathers data from that database, with the information aggregated and post-processed into several views. - * One of the views is the email report, which we needed for reporting purposes. This is sent via email. 
+ * A component gathers data from that database, with the information aggregated and post-processed into several views. + * One of the views is the email report, which we needed for reporting purposes. This is sent via email. * The other view pushes data to a dashboard, where it can be easily controlled. The dashboard service we used was external. We trusted Ducksboard, not only because the dashboards were easy to set up and looked beautiful, but also because they provided automatic alerts if a threshold was reached. -With all that in place, it didn’t take us long to realize that we would need a proper metrics, monitoring, and alerting system as the number of projects started to increase. +With all that in place, it didn’t take us long to realize that we would need a proper metrics, monitoring, and alerting system as the number of projects started to increase. Some drawbacks of the systems we had at that time were: @@ -78,7 +80,7 @@ Our internal DNS service is integrated to be used for service discovery, so ever Some of the metrics we used, which were not provided by the node_exporter by default, were exported using the [node_exporter textfile collector](https://github.com/prometheus/node_exporter#textfile-collector) feature. The first alerts we declared on the Prometheus Alertmanager were mainly related to the operational metrics mentioned above. -We later developed an operation exporter that allowed us to know the status of the system almost in real time. It exposed business metrics, namely the statuses of all operations, the number of incoming migrations, the number of finished migrations, and the number of errors. We could aggregate these on the Prometheus side and let it calculate different rates. +We later developed an operation exporter that allowed us to know the status of the system almost in real time. It exposed business metrics, namely the statuses of all operations, the number of incoming migrations, the number of finished migrations, and the number of errors. We could aggregate these on the Prometheus side and let it calculate different rates. We decided to export and monitor the following metrics: @@ -109,6 +111,6 @@ We can't compare Prometheus with our previous solution because we didn’t have ## What do you think the future holds for ShuttleCloud and Prometheus? -We’re very happy with Prometheus, but new exporters are always welcome (Celery or Spark, for example). +We’re very happy with Prometheus, but new exporters are always welcome (Celery or Spark, for example). One question that we face every time we add a new alarm is: how do we test that the alarm works as expected? It would be nice to have a way to inject fake metrics in order to raise an alarm, to test it. diff --git a/blog-posts/2016-09-14-interview-with-digitalocean.md b/blog-posts/2016-09-14-interview-with-digitalocean.md index dd4a8221..5b965ac4 100644 --- a/blog-posts/2016-09-14-interview-with-digitalocean.md +++ b/blog-posts/2016-09-14-interview-with-digitalocean.md @@ -17,6 +17,8 @@ My name is Ian Hansen and I work on the platform metrics team. To date, we’ve created 20 million Droplets (SSD cloud servers) across 13 regions. We also recently released a new Block Storage product. + + ![DigitalOcean logo](/assets/blog/2016-09-14/DO_Logo_Horizontal_Blue-3db19536.png) ## What was your pre-Prometheus monitoring experience? @@ -33,7 +35,7 @@ We do still use Graphite but we no longer run OpenTSDB. 
I was frustrated with OpenTSDB because I was responsible for keeping the cluster online, but found it difficult to guard against metric storms. Sometimes a team would launch a new (very chatty) service that would impact the -total capacity of the cluster and hurt my SLAs. +total capacity of the cluster and hurt my SLAs. We are able to blacklist/whitelist new metrics coming in to OpenTSDB, but didn’t have a great way to guard against chatty services except for diff --git a/blog-posts/2016-09-21-interview-with-compose.md b/blog-posts/2016-09-21-interview-with-compose.md index 86de2b95..d1c83e19 100644 --- a/blog-posts/2016-09-21-interview-with-compose.md +++ b/blog-posts/2016-09-21-interview-with-compose.md @@ -24,6 +24,7 @@ where supported and is home to around 1000 highly-available database deployments in their own private networks. More regions and providers are in the works. + ## What was your pre-Prometheus monitoring experience? @@ -35,7 +36,7 @@ overloaded our systems. While we were aware that Graphite could be scaled horizontally relatively easily, it would have been an expensive cluster. [InfluxDB](https://www.influxdata.com/) looked more promising so we started trying out the early-ish versions of that and it seemed to work well for a good -while. Goodbye Graphite. +while. Goodbye Graphite. The earlier versions of InfluxDB had some issues with data corruption occasionally. We semi-regularly had to purge all of our metrics. It wasn’t a diff --git a/blog-posts/2016-10-12-interview-with-justwatch.md b/blog-posts/2016-10-12-interview-with-justwatch.md index 00835e8c..d9f41e78 100644 --- a/blog-posts/2016-10-12-interview-with-justwatch.md +++ b/blog-posts/2016-10-12-interview-with-justwatch.md @@ -22,6 +22,8 @@ purchase behavior and movie taste of fans worldwide from our consumer apps. We help studios to advertise their content to the right audience and make digital video advertising a lot more efficient in minimizing waste coverage. + + ![JustWatch logo](/assets/blog/2016-10-12/JW_logo_long_black.jpg) Since our launch in 2014 we went from zero to one of the largest 20k websites diff --git a/blog-posts/2016-11-16-interview-with-canonical.md b/blog-posts/2016-11-16-interview-with-canonical.md index adcbc328..9c467839 100644 --- a/blog-posts/2016-11-16-interview-with-canonical.md +++ b/blog-posts/2016-11-16-interview-with-canonical.md @@ -21,6 +21,8 @@ https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf#page=4 My group, BootStack, is our fully managed private cloud service. We build and operate OpenStack clouds for Canonical customers. + + ## What was your pre-Prometheus monitoring experience? We’d used a combination of [Nagios](https://www.nagios.org/), diff --git a/blog-posts/2017-02-20-interview-with-weaveworks.md b/blog-posts/2017-02-20-interview-with-weaveworks.md index e77c0087..5fc05c75 100644 --- a/blog-posts/2017-02-20-interview-with-weaveworks.md +++ b/blog-posts/2017-02-20-interview-with-weaveworks.md @@ -16,16 +16,18 @@ Cloud](https://www.weave.works/solution/cloud/), a service which "operationalizes" microservices through a combination of open source projects and software as a service. 
-Weave Cloud consists of: +Weave Cloud consists of: * Visualisation with [Weave Scope](https://github.com/weaveworks/scope) - * Continuous Deployment with [Weave Flux](https://github.com/weaveworks/flux) - * Networking with [Weave Net](https://github.com/weaveworks/weave), the container SDN + * Continuous Deployment with [Weave Flux](https://github.com/weaveworks/flux) + * Networking with [Weave Net](https://github.com/weaveworks/weave), the container SDN * [Monitoring with Weave Cortex](https://www.weave.works/guides/cloud-guide-part-3-monitor-prometheus-monitoring/), our open source, distributed Prometheus-as-a-Service. You can try Weave Cloud [free for 60 days](https://cloud.weave.works/signup). For the latest on our products check out our [blog](https://www.weave.works/blog/), [Twitter](https://twitter.com/weaveworks), or [Slack](https://weave-community.slack.com/) ([invite](https://weaveworks.github.io/community-slack/)). + + ## What was your pre-Prometheus monitoring experience? Weave Cloud was a clean-slate implementation, and as such there was no previous @@ -50,7 +52,7 @@ When we started with Prometheus the Kubernetes service discovery was still just a PR and as such there were few docs. We ran a custom build for a while and kinda just muddled along, working it out for ourselves. Eventually we gave a talk at the [London Prometheus meetup](https://www.meetup.com/Prometheus-London/) on [our experience](http://www.slideshare.net/weaveworks/kubernetes-and-prometheus) and published a -[series](https://www.weave.works/prometheus-kubernetes-deploying/) of [blog](https://www.weave.works/prometheus-and-kubernetes-monitoring-your-applications/) [posts](https://www.weave.works/monitoring-kubernetes-infrastructure/). +[series](https://www.weave.works/prometheus-kubernetes-deploying/) of [blog](https://www.weave.works/prometheus-and-kubernetes-monitoring-your-applications/) [posts](https://www.weave.works/monitoring-kubernetes-infrastructure/). We've tried pretty much every different option for running Prometheus. We started off building our own container images with embedded config, running @@ -76,7 +78,7 @@ Cortex was born. Ignoring Cortex for a second, we were particularly excited to see the introduction of the HA Alert Manager; although mainly because it was one of the -[first non-Weaveworks projects to use Weave Mesh](https://www.weave.works/weave-mesh-prometheus-alertmanager/), +[first non-Weaveworks projects to use Weave Mesh](https://www.weave.works/weave-mesh-prometheus-alertmanager/), our gossip and coordination layer. I was also particularly keen on the version two Kubernetes service discovery diff --git a/blog-posts/2017-04-06-interview-with-europace.md b/blog-posts/2017-04-06-interview-with-europace.md index 27e6de19..28eb71cf 100644 --- a/blog-posts/2017-04-06-interview-with-europace.md +++ b/blog-posts/2017-04-06-interview-with-europace.md @@ -20,6 +20,8 @@ total of up to €4 billion on EUROPACE every month. Our engineers regularly blog at [http://tech.europace.de/](http://tech.europace.de/) and [@EuropaceTech](https://twitter.com/europacetech). + + ## What was your pre-Prometheus monitoring experience? 
[Nagios](https://www.nagios.org/)/[Icinga](https://www.icinga.com/) are still diff --git a/blog-posts/2017-04-10-promehteus-20-sneak-peak.md b/blog-posts/2017-04-10-promehteus-20-sneak-peak.md index 400c9fa8..c28212f7 100644 --- a/blog-posts/2017-04-10-promehteus-20-sneak-peak.md +++ b/blog-posts/2017-04-10-promehteus-20-sneak-peak.md @@ -10,10 +10,12 @@ We also realized that new developments in the infrastructure space, in particula Over the past few months we have been designing and implementing a new storage concept that addresses those bottlenecks and shows considerable performance improvements overall. It also paves the way to add features such as hot backups. -The changes are so fundamental that it will trigger a new major release: Prometheus 2.0. +The changes are so fundamental that it will trigger a new major release: Prometheus 2.0. Important features and changes beyond the storage are planned before its stable release. However, today we are releasing an early alpha of Prometheus 2.0 to kick off the stabilization process of the new storage. -[Release tarballs](https://github.com/prometheus/prometheus/releases/tag/v2.0.0-alpha.0) and [Docker containers](https://quay.io/repository/prometheus/prometheus?tab=tags) are now available. + + +[Release tarballs](https://github.com/prometheus/prometheus/releases/tag/v2.0.0-alpha.0) and [Docker containers](https://quay.io/repository/prometheus/prometheus?tab=tags) are now available. If you are interested in the new mechanics of the storage, make sure to read [the deep-dive blog post](https://fabxc.org/blog/2017-04-10-writing-a-tsdb/) looking under the hood. This version does not work with old storage data and should not replace existing production deployments. To run it, the data directory must be empty and all existing storage flags except for `-storage.local.retention` have to be removed. @@ -30,6 +32,6 @@ after: ./prometheus -storage.local.retention=200h -config.file=/etc/prometheus.yaml ``` -This is a very early version and crashes, data corruption, and bugs in general should be expected. Help us move towards a stable release by submitting them to [our issue tracker](https://github.com/prometheus/prometheus/issues). +This is a very early version and crashes, data corruption, and bugs in general should be expected. Help us move towards a stable release by submitting them to [our issue tracker](https://github.com/prometheus/prometheus/issues). The experimental remote storage APIs are disabled in this alpha release. Scraping targets exposing timestamps, such as federated Prometheus servers, does not yet work. The storage format is breaking and will break again between subsequent alpha releases. We plan to document an upgrade path from 1.0 to 2.0 once we are approaching a stable release. diff --git a/blog-posts/2017-05-17-interview-with-iadvize.md b/blog-posts/2017-05-17-interview-with-iadvize.md index d4461d06..18e14908 100644 --- a/blog-posts/2017-05-17-interview-with-iadvize.md +++ b/blog-posts/2017-05-17-interview-with-iadvize.md @@ -28,6 +28,8 @@ countries](http://www.iadvize.com/en/customers/). We are an international company of 200 employees with offices in France, UK, Germany, Spain and Italy. We raised $16 Million in 2015. + + ## What was your pre-Prometheus monitoring experience? I joined iAdvize in February 2016. Previously I worked in companies specialized @@ -92,7 +94,7 @@ Pingdom checks but it wasn't enough. We developed a few custom exporters in Go to scrape some business metrics from our databases (MySQL and Redis). 
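As a rough illustration of how a mix of off-the-shelf and self-written exporters like these is typically wired into a Prometheus server — this is not iAdvize's actual configuration; the job names, hostnames, and ports below are hypothetical — a minimal scrape configuration could look like:

```yaml
# Hypothetical scrape configuration; job names, hostnames, and ports are
# illustrative only and not taken from iAdvize's setup.
scrape_configs:
  - job_name: "mysql"
    static_configs:
      - targets: ["mysql-exporter.internal:9104"]
  - job_name: "redis"
    static_configs:
      - targets: ["redis-exporter.internal:9121"]
  # A self-written exporter exposing business metrics scraped from the databases.
  - job_name: "business-metrics"
    static_configs:
      - targets: ["business-exporter.internal:9400"]
```

In a real deployment of this kind, the static target lists would usually be replaced by a service-discovery mechanism (for example `consul_sd_configs`) rather than maintained by hand.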
-Soon enough, we were able to replace all the legacy monitoring by Prometheus. +Soon enough, we were able to replace all the legacy monitoring by Prometheus. ![One of iAdvize's Grafana dashboards](/assets/blog/2017-05-17/iadvize-dashboard-2.png) @@ -146,4 +148,4 @@ We use alertmanager to send some alerts by SMS or in to our time and business metrics. * I used to work with [Netuitive](http://www.netuitive.com/), it had a great anomaly detection feature with automatic correlation. It would be great to - have some in Prometheus. + have some in Prometheus. diff --git a/blog-posts/2017-06-14-interview-with-latelier-animation.md b/blog-posts/2017-06-14-interview-with-latelier-animation.md index 44d5acb5..08c446df 100644 --- a/blog-posts/2017-06-14-interview-with-latelier-animation.md +++ b/blog-posts/2017-06-14-interview-with-latelier-animation.md @@ -24,6 +24,7 @@ Our infrastructure consists of around 300 render blades, 150 workstations and twenty various servers. With the exception of a couple of Macs, everything runs on Linux ([CentOS](https://www.centos.org/)) and not a single Windows machine.   +   ## What was your pre-Prometheus monitoring experience?   diff --git a/blog-posts/2017-06-21-prometheus-20-alpha3-new-rule-format.md b/blog-posts/2017-06-21-prometheus-20-alpha3-new-rule-format.md index 7e0cc6e8..b390d5fd 100644 --- a/blog-posts/2017-06-21-prometheus-20-alpha3-new-rule-format.md +++ b/blog-posts/2017-06-21-prometheus-20-alpha3-new-rule-format.md @@ -6,23 +6,24 @@ --- Today we release the third alpha version of Prometheus 2.0. Aside from a variety of bug fixes in the new storage layer, it contains a few planned breaking changes. - + ## Flag Changes First, we moved to a new flag library, which uses the more common double-dash `--` prefix for flags instead of the single dash Prometheus used so far. Deployments have to be adapted accordingly. Additionally, some flags were removed with this alpha. The full list since Prometheus 1.0.0 is: - + * `web.telemetry-path` * All `storage.remote.*` flags * All `storage.local.*` flags * `query.staleness-delta` * `alertmanager.url` - - + + + ## Recording Rules changes - + Alerting and recording rules are one of the critical features of Prometheus. But they also come with a few design issues and missing features, namely: - + * All rules ran with the same interval. We could have some heavy rules that are better off being run at a 10-minute interval and some rules that could be run at 15-second intervals. * All rules were evaluated concurrently, which is actually Prometheus’ oldest [open bug](https://github.com/prometheus/prometheus/blob/main/rules/manager.go#L267). This has a couple of issues, the obvious one being that the load spikes every eval interval if you have a lot of rules. The other being that rules that depend on each other might be fed outdated data. For example: @@ -39,11 +40,11 @@ ALERT HighNetworkTraffic Here we are alerting over `instance:network_bytes:rate1m`, but `instance:network_bytes:rate1m` is itself being generated by another rule. We can get expected results only if the alert `HighNetworkTraffic` is run after the current value for `instance:network_bytes:rate1m` gets recorded. * Rules and alerts required users to learn yet another DSL. - + To solve the issues above, grouping of rules has been [proposed long back](https://github.com/prometheus/prometheus/issues/1095) but has only recently been implemented [as a part of Prometheus 2.0](https://github.com/prometheus/prometheus/pull/2842). 
As part of this implementation we have also moved the rules to the well-known YAML format, which also makes it easier to generate alerting rules based on common patterns in users’ environments. - + Here’s how the new format looks: - + ```yaml groups: - name: my-group-name @@ -58,19 +59,19 @@ groups: # multiple lines via YAML’s multi-line strings. expr: | sum without(instance) (instance:errors:rate5m) - / + / sum without(instance) (instance:requests:rate5m) for: 5m labels: severity: critical annotations: - description: "stuff's happening with {{ $labels.service }}" + description: "stuff's happening with {{ $labels.service }}" ``` - + The rules in each group are executed sequentially and you can have an evaluation interval per group. - + As this change is breaking, we are going to release it with the 2.0 release and have added a command to promtool for the migration: `promtool update rules ` The converted files have the `.yml` suffix appended and the `rule_files` clause in your Prometheus configuration has to be adapted. - - + + Help us moving towards the Prometheus 2.0 stable release by testing this new alpha version! You can report bugs on our [issue tracker](https://github.com/prometheus/prometheus/issues) and provide general feedback via our [community channels](https://prometheus.io/community/). diff --git a/blog-posts/2017-11-08-announcing-prometheus-2-0.md b/blog-posts/2017-11-08-announcing-prometheus-2-0.md index cbf84140..06058760 100644 --- a/blog-posts/2017-11-08-announcing-prometheus-2-0.md +++ b/blog-posts/2017-11-08-announcing-prometheus-2-0.md @@ -5,8 +5,6 @@ kind: article author_name: Fabian Reinartz on behalf of the Prometheus team --- -# Announcing Prometheus 2.0 - Nearly one and a half years ago, we released Prometheus 1.0 into the wild. The release marked a significant milestone for the project. We had reached a broad set of features that make up Prometheus' simple yet extremely powerful monitoring philosophy. Since then we added and improved on various service discovery integrations, extended PromQL, and experimented with a first iteration on remote APIs to enable pluggable long-term storage solutions. diff --git a/blog-posts/2017-11-30-prometheus-at-cloudnativecon.md b/blog-posts/2017-11-30-prometheus-at-cloudnativecon.md index d33cc785..0b86e46a 100644 --- a/blog-posts/2017-11-30-prometheus-at-cloudnativecon.md +++ b/blog-posts/2017-11-30-prometheus-at-cloudnativecon.md @@ -5,8 +5,6 @@ kind: article author_name: Tom Wilkie on behalf of the Prometheus team --- -## Prometheus at CloudNativeCon 2017 - Wednesday 6th December is Prometheus Day at CloudNativeCon Austin, and we’ve got a fantastic lineup of talks and events for you. Go to the Prometheus Salon for hands on advice on how best to monitor Kubernetes, attend a series of talks on diff --git a/blog-posts/2018-02-08-interview-with-scalefastr.md b/blog-posts/2018-02-08-interview-with-scalefastr.md index 3a751cfb..608ae4f6 100644 --- a/blog-posts/2018-02-08-interview-with-scalefastr.md +++ b/blog-posts/2018-02-08-interview-with-scalefastr.md @@ -13,18 +13,18 @@ from Scalefastr talks about how they are using Prometheus.* My name is Kevin Burton and I’m the CEO of [Scalefastr](https://www.scalefastr.io/). My background is in distributed systems and I’ve previously ran Datastreamer, a company that built a petabyte -scale distributed social media crawler and search engine. +scale distributed social media crawler and search engine. 
At Datastreamer we ran into scalability issues regarding our infrastructure and built out a high performance cluster based on Debian, Elasticsearch, Cassandra, -and Kubernetes. +and Kubernetes. We found that many of our customers were also struggling with their infrastructure and I was amazed at how much they were paying for hosting large amounts of content on AWS and Google Cloud. We continually evaluated what it costs to run in the cloud and for us our -hosting costs would have been about 5-10x what we currently pay. +hosting costs would have been about 5-10x what we currently pay. We made the decision to launch a new cloud platform based on Open Source and cloud native technologies like Kubernetes, Prometheus, Elasticsearch, @@ -33,6 +33,7 @@ Cassandra, Grafana, Etcd, etc. We’re currently hosting a few customers in the petabyte scale and are soft launching our new platform this month. + ## What was your pre-Prometheus monitoring experience? @@ -46,14 +47,14 @@ We built a platform based on KairosDB, Grafana, and our own (simple) visualization engine which worked out really well for quite a long time. They key problem we saw with KairosDB was the rate of adoption and customer -demand for Prometheus. +demand for Prometheus. Additionally, what’s nice about Prometheus is the support for exporters implemented by either the projects themselves or the community. With KairosDB we would often struggle to build out our own exporters. The chance that an exporter for KairosDB already existing was rather low compared -to Prometheus. +to Prometheus. For example, there is CollectD support for KairosDB but it’s not supported very well in Debian and there are practical bugs with CollectD that prevent it from @@ -68,13 +69,13 @@ Prometheus metrics once there are hosted platforms like Scalefastr which integrate it as a standardized and supported product. Having visibility into your application performance is critical and the high -scalability of Prometheus is necessary to make that happen. +scalability of Prometheus is necessary to make that happen. ## Why did you decide to look at Prometheus? We were initially curious how other people were monitoring their Kubernetes and -container applications. +container applications. One of the main challenges of containers is the fact that they can come and go quickly leaving behind both log and metric data that needs to be analyzed. @@ -90,19 +91,19 @@ dashboards. ## How did you transition? The transition was somewhat painless for us since Scalefastr is a greenfield -environment. +environment. The architecture for the most part is new with very few limiting factors. Our main goal is to deploy on bare metal but build cloud features on top of -existing and standardized hardware. +existing and standardized hardware. The idea is to have all analytics in our cluster backed by Prometheus. We provide customers with their own “management” infrastructure which includes Prometheus, Grafana, Elasticsearch, and Kibana as well as a Kubernetes control plane. We orchestrate this system with Ansible which handles initial machine -setup (ssh, core Debian packages, etc.) and baseline configuration. +setup (ssh, core Debian packages, etc.) and baseline configuration. We then deploy Prometheus, all the required exporters for the customer configuration, and additionally dashboards for Grafana. @@ -111,7 +112,7 @@ One thing we found to be somewhat problematic is that a few dashboards on Grafana.com were written for Prometheus 1.x and did not port cleanly to 2.x. 
It turns out that there are only a few functions not present in the 2.x series and many of them just need a small tweak here and there. Additionally, some -of the dashboards were written for an earlier version of Grafana. +of the dashboards were written for an earlier version of Grafana. To help solve that we announced a project this week to [standardize and improve dashboards for @@ -119,12 +120,12 @@ Prometheus](https://www.scalefastr.io/single-post/2018/01/26/Scalefastr-Grafana- for tools like Cassandra, Elasticsearch, the OS, but also Prometheus itself. We open sourced this and [published it to Github](https://github.com/scalefastr/scalefastr-prometheus-grafana-dashboards) -last week. +last week. We’re hoping this makes it easy for other people to migrate to Prometheus. One thing we want to improve is to automatically sync it with our Grafana -backend but also to upload these dashboards to Grafana.com. +backend but also to upload these dashboards to Grafana.com. We also published our Prometheus configuration so that the labels work correctly with our Grafana templates. This allows you to have a pull down menu @@ -145,7 +146,7 @@ parts made it an easy decision. Right now we’re deploying Elasticsearch and Cassandra directly on bare metal. We’re working to run these in containers directly on top of Kubernetes and working toward using the Container Storage Interface (CSI) to make this -possible. +possible. Before we can do this we need to get Prometheus service discovery working and this is something we haven’t played with yet. Currently we deploy and @@ -154,15 +155,15 @@ with Kubernetes since containers can come and go as our workload changes. We’re also working on improving the standard dashboards and alerting. One of the features we would like to add (maybe as a container) is support for -alerting based on holts winters forecasting. +alerting based on holts winters forecasting. This would essentially allow us to predict severe performance issues before they happen. Rather than waiting for something to fail (like running out of -disk space) until we take action to correct it. +disk space) until we take action to correct it. To a certain extent Kubernetes helps with this issue since we can just add nodes to the cluster based on a watermark. Once resource utilization is too -high we can just auto-scale. +high we can just auto-scale. We’re very excited about the future of Prometheus especially now that we’re moving forward on the 2.x series and the fact that CNCF collaboration seems to diff --git a/blog-posts/2018-03-16-interview-with-datawire.md b/blog-posts/2018-03-16-interview-with-datawire.md index 17536448..3a6cfe23 100644 --- a/blog-posts/2018-03-16-interview-with-datawire.md +++ b/blog-posts/2018-03-16-interview-with-datawire.md @@ -28,8 +28,9 @@ We used AWS CloudWatch. This was easy to set up, but we found that as we adopted a more distributed development model (microservices), we wanted more flexibility and control. For example, we wanted each team to be able to customize their monitoring on an as-needed basis, without requiring operational -help. +help. + ## Why did you decide to look at Prometheus? 
diff --git a/blog-posts/2018-07-05-implementing-custom-sd.md b/blog-posts/2018-07-05-implementing-custom-sd.md index b7b32d55..31eeb620 100644 --- a/blog-posts/2018-07-05-implementing-custom-sd.md +++ b/blog-posts/2018-07-05-implementing-custom-sd.md @@ -5,8 +5,6 @@ kind: article author_name: Callum Styan --- -## Implementing Custom Service Discovery - Prometheus contains built in integrations for many service discovery (SD) systems such as Consul, Kubernetes, and public cloud providers such as Azure. However, we can’t provide integration implementations for every service discovery option out there. The Prometheus team is already stretched @@ -34,6 +32,8 @@ Integrations using file_sd, such as those that are implemented with the adapter Let’s take a look at the example code. + + ## Adapter First we have the file [adapter.go](https://github.com/prometheus/prometheus/blob/main/documentation/examples/custom-sd/adapter/adapter.go). diff --git a/blog-posts/2018-08-09-prometheus-graduates-within-cncf.md b/blog-posts/2018-08-09-prometheus-graduates-within-cncf.md index 45b5759b..0568ed84 100644 --- a/blog-posts/2018-08-09-prometheus-graduates-within-cncf.md +++ b/blog-posts/2018-08-09-prometheus-graduates-within-cncf.md @@ -17,6 +17,8 @@ Since reaching incubation level, a lot of things happened; some of which stand o * We had a large push towards stability, especially with 2.3.2 * We started a documentation push with a special focus on making Prometheus adoption and joining the community easier + + Especially the last point is important as we currently enter our fourth phase of adoption. These phases were adoption by 1. Monitoring-centric users actively looking for the very best in monitoring diff --git a/blog-posts/2018-08-23-interview-with-presslabs.md b/blog-posts/2018-08-23-interview-with-presslabs.md index eb2545c7..f986ccdd 100644 --- a/blog-posts/2018-08-23-interview-with-presslabs.md +++ b/blog-posts/2018-08-23-interview-with-presslabs.md @@ -13,7 +13,7 @@ from Presslabs talks about their monitoring journey.* [Presslabs](https://www.presslabs.com/) is a high-performance managed WordPress hosting platform targeted at publishers, Enterprise brands and digital agencies which seek to offer a seamless experience to their website visitors, 100% of -the time. +the time. Recently, we have developed an innovative component to our core product—WordPress Business Intelligence. Users can now get real—time, @@ -22,12 +22,14 @@ issue-to-deployment process and continuous improvement of their sites. We support the seamless delivery of up to 2 billion pageviews per month, on a fleet of 100 machines entirely dedicated to managed WordPress hosting for -demanding customers. +demanding customers. We’re currently on our mission to bring the best experience to WordPress publishers around the world. In this journey, Kubernetes facilitates our route to an upcoming standard in high availability WordPress hosting infrastructure. + + ## What was your pre-Prometheus monitoring experience? We started building our WordPress hosting platform back in 2009. At the time, @@ -40,7 +42,7 @@ on our platform. Graphite was our second choice on the list, which solved the time challenge addressed by Munin. We added collectd in to the mix to expose metrics, and used -Graphite to collect and aggregate it. +Graphite to collect and aggregate it. Then we made Viz, a tool we’ve written in JavaScript & Python for visualisation and alerting. 
However, we stopped actively using this service because diff --git a/blog-posts/2019-01-28-subquery-support.md b/blog-posts/2019-01-28-subquery-support.md index 0fda3241..b8014dc7 100644 --- a/blog-posts/2019-01-28-subquery-support.md +++ b/blog-posts/2019-01-28-subquery-support.md @@ -21,6 +21,8 @@ When you want some quick results on data spanning days or weeks, it can be quite With subquery support, all the waiting and frustration is taken care of. + + ## Subqueries A subquery is similar to a [/api/v1/query_range](https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries) API call, but embedded within an instant query. The result of a subquery is a range vector. diff --git a/blog-posts/2019-02-06-interview-with-hostinger.md b/blog-posts/2019-02-06-interview-with-hostinger.md index bd5f6c9e..9a5b92ca 100644 --- a/blog-posts/2019-02-06-interview-with-hostinger.md +++ b/blog-posts/2019-02-06-interview-with-hostinger.md @@ -39,6 +39,8 @@ Beautiful is better than ugly, right? We use CumulusOS which is in our case mostly x86 thus there is absolutely no problem to run any kind of Linux stuff. + + ## Why did you decide to look at Prometheus? In 2015 when we started automating everything that could be automated, diff --git a/blog-posts/2019-06-18-interview-with-forgerock.md b/blog-posts/2019-06-18-interview-with-forgerock.md index 961b2dd3..f22dc481 100644 --- a/blog-posts/2019-06-18-interview-with-forgerock.md +++ b/blog-posts/2019-06-18-interview-with-forgerock.md @@ -28,6 +28,8 @@ most recent versions. Other products only had REST or JMX. As a result, monitoring the whole platform was complex and required tools that were able to integrate those protocols. + + ## Why did you decide to look at Prometheus? We needed to have a single and common interface for monitoring all our diff --git a/blog-posts/2019-10-10-remote-read-meets-streaming.md b/blog-posts/2019-10-10-remote-read-meets-streaming.md index f769f852..0f8fb605 100644 --- a/blog-posts/2019-10-10-remote-read-meets-streaming.md +++ b/blog-posts/2019-10-10-remote-read-meets-streaming.md @@ -24,6 +24,8 @@ This API allows 3rd party systems to interact with metrics data through two meth Both methods are using HTTP with messages encoded with [protobufs](https://github.com/protocolbuffers/protobuf). The request and response for both methods are compressed using [snappy](https://github.com/google/snappy). + + ### Remote Write This is the most popular way to replicate Prometheus data into 3rd party system. In this mode, Prometheus streams samples, diff --git a/blog-posts/2021-02-17-introducing-feature-flags.md b/blog-posts/2021-02-17-introducing-feature-flags.md index 2ebc5489..a76dd64e 100644 --- a/blog-posts/2021-02-17-introducing-feature-flags.md +++ b/blog-posts/2021-02-17-introducing-feature-flags.md @@ -19,6 +19,8 @@ The features in this list are considered experimental and comes with following c * For example the assumption that a query does not look ahead of the evaluation time for samples, which will be broken by `@` modifier and negative offset. 4. They may be unstable but we will try to keep them stable, of course. + + These considerations allow us to be more bold with experimentation and to innovate more quickly. Once any feature gets widely used and is considered stable with respect to its API, behavior, and implementation, they may be moved from disabled features list and enabled by default . If we find any feature to be not worth it or broken, we may completely remove it. 
If enabling some feature is considered a big breaking change for Prometheus, it would stay disabled until the next major release. -Keep an eye out on this list on every release, and do try them out! \ No newline at end of file +Keep an eye out on this list on every release, and do try them out! diff --git a/blog-posts/2021-02-18-introducing-the-@-modifier.md b/blog-posts/2021-02-18-introducing-the-@-modifier.md index bdb201d8..f6521723 100644 --- a/blog-posts/2021-02-18-introducing-the-@-modifier.md +++ b/blog-posts/2021-02-18-introducing-the-@-modifier.md @@ -17,6 +17,8 @@ In Prometheus v2.25.0, we have introduced a new PromQL modifier `@`. Similar to The `` is a unix timestamp and described with a float literal. + + For example, the query `http_requests_total @ 1609746000` returns the value of `http_requests_total` at `2021-01-04T07:40:00+00:00`. The query `rate(http_requests_total[5m] @ 1609746000)` returns the 5-minute rate of `http_requests_total` at the same time. Additionally, `start()` and `end()` can also be used as values for the `@` modifier as special values. For a range query, they resolve to the start and end of the range query respectively and remain the same for all steps. For an instant query, `start()` and `end()` both resolve to the evaluation time. @@ -29,4 +31,4 @@ Coming back to the `topk()` fix, the following query plots the `1m` rate of `htt Similarly, the `topk()` ranking can be replaced with other functions like `histogram_quantile()` which only makes sense as an instant query right now. `rate()` can be replaced with `_over_time()`, etc. Let us know how you use this new modifier! -`@` modifier is disabled by default and can be enabled using the flag `--enable-feature=promql-at-modifier`. Learn more about feature flags in [this blog post](https://prometheus.io/blog/2021/02/17/introducing-feature-flags/) and find the docs for `@` modifier [here](https://prometheus.io/docs/prometheus/latest/querying/basics/#modifier). \ No newline at end of file +`@` modifier is disabled by default and can be enabled using the flag `--enable-feature=promql-at-modifier`. Learn more about feature flags in [this blog post](https://prometheus.io/blog/2021/02/17/introducing-feature-flags/) and find the docs for `@` modifier [here](https://prometheus.io/docs/prometheus/latest/querying/basics/#modifier). diff --git a/blog-posts/2021-05-03-introducing-prometheus-conformance-program.md b/blog-posts/2021-05-03-introducing-prometheus-conformance-program.md index dd3610a7..71f4ed7e 100644 --- a/blog-posts/2021-05-03-introducing-prometheus-conformance-program.md +++ b/blog-posts/2021-05-03-introducing-prometheus-conformance-program.md @@ -11,6 +11,8 @@ The CNCF Governing Board is expected to formally review and approve the program With the help of our [extensive and expanding test suite](https://github.com/prometheus/compliance), projects and vendors can determine the compliance to our specifications and compatibility within the Prometheus ecosystem. 
+ + At launch, we are offering compliance tests for three components: * PromQL (needs manual interpretation, somewhat complete) diff --git a/blog-posts/2021-05-04-prometheus-conformance-remote-write-compliance.md b/blog-posts/2021-05-04-prometheus-conformance-remote-write-compliance.md index fefbc91d..254ffded 100644 --- a/blog-posts/2021-05-04-prometheus-conformance-remote-write-compliance.md +++ b/blog-posts/2021-05-04-prometheus-conformance-remote-write-compliance.md @@ -12,6 +12,8 @@ To give everyone an overview of where the ecosystem is before running tests offi During Monday's [PromCon](https://promcon.io/2021-online/), [Tom Wilkie](https://twitter.com/tom_wilkie) presented the test results from the time of recording a few weeks ago. In the live section, he already had an [update](https://docs.google.com/presentation/d/1RcN58LlS3V5tYCUsftqUvNuCpCsgGR2P7-GoH1MVL0Q/edit#slide=id.gd1789c7f7c_0_0). Two days later we have two more updates: The addition of the [observability pipeline tool Vector](https://github.com/prometheus/compliance/pull/24), as well as [new versions of existing systems](https://github.com/prometheus/compliance/pull/25). + + So, without further ado, the current results in alphabetical order are: | Sender | Version | Score diff --git a/blog-posts/2021-06-10-on-ransomware-naming.md b/blog-posts/2021-06-10-on-ransomware-naming.md index 61ee68b2..137fe4cb 100644 --- a/blog-posts/2021-06-10-on-ransomware-naming.md +++ b/blog-posts/2021-06-10-on-ransomware-naming.md @@ -11,6 +11,8 @@ The names "Prometheus" and "Thanos" have [recently been taken up by a ransomware While we do *NOT* have reason to believe that this group will try to trick anyone into downloading fake binaries of our projects, we still recommend following common supply chain & security practices. When deploying software, do it through one of those mechanisms: + + * Binary downloads from the official release pages for [Prometheus](https://github.com/prometheus/prometheus/releases) and [Thanos](https://github.com/thanos-io/thanos/releases), with verification of checksums provided. * Docker downloads from official project controlled repositories: * Prometheus: https://quay.io/repository/prometheus/prometheus and https://hub.docker.com/r/prom/prometheus diff --git a/blog-posts/2021-10-14-prometheus-conformance-results.md b/blog-posts/2021-10-14-prometheus-conformance-results.md index b0a155a2..d9f2a4b6 100644 --- a/blog-posts/2021-10-14-prometheus-conformance-results.md +++ b/blog-posts/2021-10-14-prometheus-conformance-results.md @@ -9,6 +9,8 @@ Today, we're launching the [Prometheus Conformance Program](/blog/2021/05/03/int As a quick reminder: The program is called Prometheus **Conformance**, software can be **compliant** to specific tests, which result in a **compatibility** rating. The nomenclature might seem complex, but it allows us to speak about this topic without using endless word snakes. + + # Preamble ## New Categories diff --git a/blog-posts/2021-11-16-agent.md b/blog-posts/2021-11-16-agent.md index 8bfd8d67..aad90c65 100644 --- a/blog-posts/2021-11-16-agent.md +++ b/blog-posts/2021-11-16-agent.md @@ -15,6 +15,8 @@ What I personally love in the Prometheus project, and one of the many reasons wh In the end, we hardly see Kubernetes clusters without Prometheus running there. + + The strong focus of the Prometheus community allowed other open-source projects to grow too to extend the Prometheus deployment model beyond single nodes (e.g. 
[Cortex](https://cortexmetrics.io/), [Thanos](https://thanos.io/) and more). Not mentioning cloud vendors adopting Prometheus' API and data model (e.g. [Amazon Managed Prometheus](https://aws.amazon.com/prometheus/), [Google Cloud Managed Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus), [Grafana Cloud](https://grafana.com/products/cloud/) and more). If you are looking for a single reason why the Prometheus project is so successful, it is this: **Focusing the monitoring community on what matters**. In this (lengthy) blog post, I would love to introduce a new operational mode of running Prometheus called "Agent". It is built directly into the Prometheus binary. The agent mode disables some of Prometheus' usual features and optimizes the binary for scraping and remote writing to remote locations. Introducing a mode that reduces the number of features enables new usage patters. In this blog post I will explain why it is a game-changer for certain deployments in the CNCF ecosystem. I am super excited about this! @@ -37,7 +39,7 @@ What does that mean? That means monitoring data has to be somehow aggregated, pr Naively, we could think about implementing this by either putting Prometheus on that global level and scraping metrics across remote networks or pushing metrics directly from the application to the central location for monitoring purposes. Let me explain why both are generally *very* bad ideas: -🔥 Scraping across network boundaries can be a challenge if it adds new unknowns in a monitoring pipeline. The local pull model allows Prometheus to know why exactly the metric target has problems and when. Maybe it's down, misconfigured, restarted, too slow to give us metrics (e.g. CPU saturated), not discoverable by service discovery, we don't have credentials to access or just DNS, network, or the whole cluster is down. By putting our scraper outside of the network, we risk losing some of this information by introducing unreliability into scrapes that is unrelated to an individual target. On top of that, we risk losing important visibility completely if the network is temporarily down. Please don't do it. It's not worth it. (: +🔥 Scraping across network boundaries can be a challenge if it adds new unknowns in a monitoring pipeline. The local pull model allows Prometheus to know why exactly the metric target has problems and when. Maybe it's down, misconfigured, restarted, too slow to give us metrics (e.g. CPU saturated), not discoverable by service discovery, we don't have credentials to access or just DNS, network, or the whole cluster is down. By putting our scraper outside of the network, we risk losing some of this information by introducing unreliability into scrapes that is unrelated to an individual target. On top of that, we risk losing important visibility completely if the network is temporarily down. Please don't do it. It's not worth it. (: 🔥 Pushing metrics directly from the application to some central location is equally bad. Especially when you monitor a larger fleet, you know literally nothing when you don't see metrics from remote applications. Is the application down? Is my receiver pipeline down? Maybe the application failed to authorize? Maybe it failed to get the IP address of my remote cluster? Maybe it's too slow? Maybe the network is down? Worse, you may not even know that the data from some application targets is missing. And you don't even gain a lot as you need to track the state and status of everything that should be sending data. 
Such a design needs careful analysis as it can be a recipe for a failure too easily. diff --git a/blog-posts/2023-03-21-stringlabel.md b/blog-posts/2023-03-21-stringlabel.md index 2db5846f..329dc174 100644 --- a/blog-posts/2023-03-21-stringlabel.md +++ b/blog-posts/2023-03-21-stringlabel.md @@ -11,7 +11,7 @@ which uses a new data structure for labels. This blog post will answer some frequently asked questions about the 2.43 release and the `stringlabels` optimizations. -### What is the `stringlabels` release? +## What is the `stringlabels` release? The `stringlabels` release is a Prometheus 2.43 version that uses a new data structure for labels. It stores all the label/values in a single string, @@ -19,19 +19,21 @@ resulting in a smaller heap size and some speedups in most cases. These optimizations are not shipped in the default binaries and require compiling Prometheus using the Go tag `stringlabels`. -### Why didn't you go for a feature flag that we can toggle? + + +## Why didn't you go for a feature flag that we can toggle? We considered using a feature flag but it would have a memory overhead that was not worth it. Therefore, we decided to provide a separate release with these optimizations for those who are interested in testing and measuring the gains on their production environment. -### When will these optimizations be generally available? +## When will these optimizations be generally available? These optimizations will be available in the upcoming Prometheus 2.44 release by default. -### How do I get the 2.43 release? +## How do I get the 2.43 release? The [Prometheus 2.43 release](https://github.com/prometheus/prometheus/releases/tag/v2.43.0) is available on the official Prometheus GitHub releases page, and users can download the binary files directly from there. @@ -44,7 +46,7 @@ release](https://github.com/prometheus/prometheus/releases/tag/v2.43.0%2Bstringl binary or the [Docker images tagged v2.43.0-stringlabels](https://quay.io/repository/prometheus/prometheus?tab=tags) specifically. -### Why is the release `v2.43.0+stringlabels` and the Docker tag `v2.43.0-stringlabels`? +## Why is the release `v2.43.0+stringlabels` and the Docker tag `v2.43.0-stringlabels`? In semantic versioning, the plus sign (+) is used to denote build metadata. Therefore, the Prometheus 2.43 release with the `stringlabels` @@ -55,7 +57,7 @@ the plus sign in their names. Hence, the plus sign has been replaced with a dash pass the semantic versioning checks of downstream projects such as the Prometheus Operator. -### What are the other noticeable features in the Prometheus 2.43 release? +## What are the other noticeable features in the Prometheus 2.43 release? Apart from the `stringlabels` optimizations, the Prometheus 2.43 release brings several new features and enhancements. Some of the significant additions diff --git a/blog-posts/2023-09-01-promcon2023-schedule.md b/blog-posts/2023-09-01-promcon2023-schedule.md index b809c16b..7a1cd0cd 100644 --- a/blog-posts/2023-09-01-promcon2023-schedule.md +++ b/blog-posts/2023-09-01-promcon2023-schedule.md @@ -13,6 +13,8 @@ Now in its 8th installment, PromCon brings together Prometheus users and develop "We are super excited for PromCon to be coming home to Berlin. Prometheus was started in Berlin at Soundcloud in 2012. The first PromCon was hosted in Berlin and in between moved to Munich. This year we're hosting around 300 attendees at Radialsystem in Friedrichshain, Berlin. 
Berlin has a vibrant Prometheus community and many of the Prometheus team members live in the neighborhood. It is a great opportunity to network and connect with the Prometheus family who are all passionate about systems and service monitoring," said Matthias Loibl, Senior Software Engineer at Polar Signals and Prometheus team member who leads this year's PromCon program committee. "It will be a great event to learn about the latest developments from the Prometheus team itself and connect to some big-scale users of Prometheus up close." + + The community-curated schedule will feature sessions from open source community members, including: - [Towards making Prometheus OpenTelemetry native](https://promcon.io/2023-berlin/talks/towards-making-prometheus-opentelemetry-native) diff --git a/blog-posts/2024-09-11-prometheus-3-beta.md b/blog-posts/2024-09-11-prometheus-3-beta.md index 7c627836..f4fbc2f9 100644 --- a/blog-posts/2024-09-11-prometheus-3-beta.md +++ b/blog-posts/2024-09-11-prometheus-3-beta.md @@ -14,12 +14,12 @@ In general, the only breaking changes are the removal of deprecated feature flag -# What's New +## What's New With over 7500 commits in the 7 years since Prometheus 2.0 came out, there are too many new individual features and fixes to list, but there are some big shiny and breaking changes we wanted to call out. We need everyone in the community to try them out and report any issues you might find. The more feedback we get, the more stable the final 3.0 release can be. -## New UI +### New UI One of the highlights in Prometheus 3.0 is its brand new UI that is enabled by default: @@ -31,15 +31,15 @@ Learn more about the new UI in general in [Julius' detailed article on the PromL Users can temporarily enable the old UI by using the `old-ui` feature flag. Since the new UI is not battle-tested yet, it is also very possible that there are still bugs. If you find any, please [report them on GitHub](https://github.com/prometheus/prometheus/issues/new?assignees=&labels=&projects=&template=bug_report.yml). -## Remote Write 2.0 +### Remote Write 2.0 Remote-Write 2.0 iterates on the previous protocol version by adding native support for a host of new elements including metadata, exemplars, created timestamp and native histograms. It also uses string interning to reduce payload size and CPU usage when compressing and decompressing. More details can be found [here](https://prometheus.io/docs/specs/remote_write_spec_2_0/). -## OpenTelemetry Support +### OpenTelemetry Support Prometheus intends to be the default choice for storing OpenTelemetry metrics, and 3.0 includes some big new features that make it even better as a storage backend for OpenTelemetry metrics data. -### UTF-8 +#### UTF-8 By default, Prometheus will allow all valid UTF-8 characters to be used in metric and label names, as well as in label values, as has been true in version 2.x. Users will need to make sure their metrics producers are configured to pass UTF-8 names. Not all language bindings have been updated with support for UTF-8 but the primary Go libraries have been. -### OTLP Ingestion +#### OTLP Ingestion Prometheus can be configured as a native receiver for the OTLP Metrics protocol, receiving OTLP metrics on the /api/v1/otlp/v1/metrics endpoint. -## Native Histograms +### Native Histograms Native histograms are a Prometheus metric type that offer a higher efficiency and lower cost alternative to Classic Histograms.
Rather than having to choose (and potentially have to update) bucket boundaries based on the data set, native histograms have pre-set bucket boundaries based on exponential growth. Native Histograms are still experimental and not yet enabled by default, and can be turned on by passing `--enable-feature=native-histograms`. Some aspects of Native Histograms, like the text format and accessor functions / operators, are still under active design. -## Other Breaking Changes +### Other Breaking Changes The following feature flags have been removed, with their behavior enabled by default instead. References to these flags should be removed from configs, and will be ignored in Prometheus starting with version 3.0 diff --git a/blog-posts/2024-11-14-prometheus-3-0.md b/blog-posts/2024-11-14-prometheus-3-0.md index 8a1c2e86..f05fac62 100644 --- a/blog-posts/2024-11-14-prometheus-3-0.md +++ b/blog-posts/2024-11-14-prometheus-3-0.md @@ -16,11 +16,11 @@ The full 3.0 release adds some new features on top of the beta and also introduc -# What's New +## What's New Here is a summary of the exciting changes that have been released as part of the beta version, as well as what has been added since: -## New UI +### New UI One of the highlights in Prometheus 3.0 is its brand-new UI that is enabled by default: @@ -39,14 +39,14 @@ Since the beta, the user interface has been updated to support UTF-8 metric and ![New UTF-8 UI](/assets/blog/2024-11-14/utf8_ui.png) -## Remote Write 2.0 +### Remote Write 2.0 Remote-Write 2.0 iterates on the previous protocol version by adding native support for a host of new elements including metadata, exemplars, created timestamp and native histograms. It also uses string interning to reduce payload size and CPU usage when compressing and decompressing. There is better handling for partial writes to provide more details to clients when this occurs. More details can be found [here](https://prometheus.io/docs/specs/remote_write_spec_2_0/). -## UTF-8 Support +### UTF-8 Support Prometheus now allows all valid UTF-8 characters to be used in metric and label names by default, as well as in label values, as has been true in version 2.x. @@ -57,18 +57,18 @@ in order to retrieve UTF-8 metrics, or users can specify the `__name__` label n Currently only the Go client library has been updated to support UTF-8, but support for other languages will be added soon. -## OTLP Support +### OTLP Support In alignment with [our commitment to OpenTelemetry](https://prometheus.io/blog/2024/03/14/commitment-to-opentelemetry/), Prometheus 3.0 includes several new features to improve interoperability with OpenTelemetry. -### OTLP Ingestion +#### OTLP Ingestion Prometheus can be configured as a native receiver for the OTLP Metrics protocol, receiving OTLP metrics on the `/api/v1/otlp/v1/metrics` endpoint. See our [guide](https://prometheus.io/docs/guides/opentelemetry) on best practices for consuming OTLP metric traffic into Prometheus. -### UTF-8 Normalization +#### UTF-8 Normalization With Prometheus 3.0, thanks to [UTF-8 support](#utf-8-support), users can store and query OpenTelemetry metrics without annoying changes to metric and label names like [changing dots to underscores](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/pkg/translator/prometheus). @@ -78,19 +78,19 @@ To achieve this for OTLP ingestion, Prometheus 3.0 has experimental support for > NOTE: While the “NoUTF8EscapingWithSuffixes” strategy allows special characters, it still adds required suffixes for the best experience.
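To make the UTF-8 support described in these notes more concrete, here is a minimal Go sketch of exposing a dotted, OpenTelemetry-style metric name with client_golang. It is a sketch under assumptions rather than an official recipe: it assumes a recent `prometheus/common` release that exposes the `model.NameValidationScheme` / `model.UTF8Validation` switch and a client_golang version that honors it.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"github.com/prometheus/common/model"
)

func main() {
	// Assumption: recent prometheus/common versions expose this switch; without it,
	// client libraries reject names outside the legacy [a-zA-Z_:][a-zA-Z0-9_:]* charset.
	model.NameValidationScheme = model.UTF8Validation

	// A dotted, OTel-style metric and label name that Prometheus 3.0 can store as-is.
	requests := prometheus.NewCounterVec(prometheus.CounterOpts{
		Name: "http.server.request.count",
		Help: "Requests received, keyed by HTTP method.",
	}, []string{"http.request.method"})
	prometheus.MustRegister(requests)
	requests.WithLabelValues("GET").Inc()

	// Whether a scraper sees the dotted name or an underscore-escaped fallback
	// depends on content negotiation between the client and the server.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

On the query side, such names have to be quoted inside the selector, for example `{"http.server.request.count", "http.request.method"="GET"}`.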
See [the proposal on the future work to enable no suffixes](https://github.com/prometheus/proposals/pull/39) in Prometheus. -## Native Histograms +### Native Histograms Native histograms are a Prometheus metric type that offer a higher efficiency and lower cost alternative to Classic Histograms. Rather than having to choose (and potentially have to update) bucket boundaries based on the data set, native histograms have pre-set bucket boundaries based on exponential growth. Native Histograms are still experimental and not yet enabled by default, and can be turned on by passing `--enable-feature=native-histograms`. Some aspects of Native Histograms, like the text format and accessor functions / operators, are still under active design. -## Breaking Changes +### Breaking Changes The Prometheus community strives to [not break existing features within a major release](https://prometheus.io/docs/prometheus/latest/stability/). With a new major release, we took the opportunity to clean up a few small, long-standing issues. In other words, Prometheus 3.0 contains a few breaking changes. This includes changes to feature flags, configuration files, PromQL, and scrape protocols. Please read the [migration guide](https://prometheus.io/docs/prometheus/3.0/migration/) to find out if your setup is affected and what actions to take. -# Performance +## Performance It’s impressive to see what we have accomplished in the community since Prometheus 2.0. We all love numbers, so let’s celebrate the efficiency improvements we made for both CPU and memory use for the TSDB mode. Below you can see performance numbers across 3 Prometheus versions on a node with 8 CPUs and 49 GB of allocatable memory. @@ -105,7 +105,7 @@ It’s furthermore impressive that those numbers were taken using our [prombench macrobenchmark](https://github.com/prometheus/prometheus/pull/15366) that uses the same PromQL queries, configuration and environment–highlighting backward compatibility and stability for the core features, even with 3.0. -# What's Next +## What's Next There are still tons of exciting features and improvements we can make in Prometheus and the ecosystem. Here is a non-exhaustive list to get you excited and… hopefully motivate you to contribute and join us! @@ -117,7 +117,7 @@ hopefully motivate you to contribute and join us! * More optimizations! * UTF-8 support coverage in more SDKs and tools -# Try It Out! +## Try It Out! You can try out Prometheus 3.0 by downloading it from our [official binaries](https://prometheus.io/download/#prometheus) and [container images](https://quay.io/repository/prometheus/prometheus?tab=tags). diff --git a/blog-posts/2024-11-19-yace-joining-prometheus-community.md b/blog-posts/2024-11-19-yace-joining-prometheus-community.md index 09a3e915..1a17adf1 100644 --- a/blog-posts/2024-11-19-yace-joining-prometheus-community.md +++ b/blog-posts/2024-11-19-yace-joining-prometheus-community.md @@ -7,14 +7,14 @@ author_name: Thomas Peitz (@thomaspeitz) [Yet Another Cloudwatch Exporter](https://github.com/prometheus-community/yet-another-cloudwatch-exporter) (YACE) has officially joined the Prometheus community! This move will make it more accessible to users and open new opportunities for contributors to enhance and maintain the project. There's also a blog post from [Cristian Greco's point of view](https://grafana.com/blog/2024/11/19/yace-moves-to-prometheus-community/).
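As a companion to the native histogram notes in the 3.0 summary above, here is a minimal, hedged Go sketch of instrumenting one with client_golang. The `NativeHistogram*` options exist in recent client_golang releases, but the concrete values below (bucket growth factor, bucket cap, reset interval) are illustrative assumptions, and the scraping Prometheus still needs `--enable-feature=native-histograms`.

```go
package main

import (
	"log"
	"math/rand"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	// No hand-picked bucket list: buckets grow exponentially, and factor 1.1 asks
	// for boundaries that grow by at most ~10% per bucket (illustrative value).
	latency := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:                            "request_duration_seconds",
		Help:                            "Request latency recorded as a native histogram.",
		NativeHistogramBucketFactor:     1.1,
		NativeHistogramMaxBucketNumber:  100,       // cap on live buckets (assumed value)
		NativeHistogramMinResetDuration: time.Hour, // allow a reset if the cap is hit
	})
	prometheus.MustRegister(latency)

	go func() {
		for {
			latency.Observe(rand.ExpFloat64() / 10) // fake observations for the demo
			time.Sleep(100 * time.Millisecond)
		}
	}()

	// Native histograms travel over the protobuf exposition format; a Prometheus
	// server started with --enable-feature=native-histograms negotiates it on scrape.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Because the exponential buckets cover the observed range instead of a fixed list, the same instrumentation handles sub-millisecond and multi-second latencies without redeploying new bucket boundaries, which is the efficiency argument made above.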
- - ## The early days When I first started YACE, I had no idea it would grow to this scale. At the time, I was working with [Invision AG](https://www.ivx.com) (not to be confused with the design app), a company focused on workforce management software. They fully supported me in open-sourcing the tool, and with the help of my teammate [Kai Forsthövel](https://github.com/kforsthoevel), YACE was brought to life. Our first commit was back in 2018, with one of our primary goals being to make CloudWatch metrics easy to scale and automatically detect what to measure, all while keeping the user experience simple and intuitive. InVision AG was scaling their infrastructure up and down due to machine learning workloads, and we needed something that could detect new infrastructure easily. This focus on simplicity has remained a core priority. From that point on, YACE began to find its audience. + + ## Yace Gains Momentum As YACE expanded, so did the support around it. One pivotal moment was when [Cristian Greco](https://github.com/cristiangreco) from Grafana Labs reached out. I was feeling overwhelmed and hardly keeping up when Cristian stepped in, simply asking where he could help. He quickly became the main releaser and led Grafana Labs' contributions to YACE, a turning point that made a huge impact on the project. Along with an incredible community of contributors from all over the world, they elevated YACE beyond what I could have achieved alone, shaping it into a truly global tool. YACE is no longer just my project or Invision's—it belongs to the community. diff --git a/src/app/blog/[year]/[month]/[day]/[slug]/page.tsx b/src/app/blog/[year]/[month]/[day]/[slug]/page.tsx index 97cf306b..45ef3af2 100644 --- a/src/app/blog/[year]/[month]/[day]/[slug]/page.tsx +++ b/src/app/blog/[year]/[month]/[day]/[slug]/page.tsx @@ -31,7 +31,7 @@ export default async function BlogPostPage({ return ( - + <Title order={1} mt={0} mb="xs"> {frontmatter.title}