…And Metrics For All



Slide 0

…And Metrics For All
Paul O’Connor
github.com/pauloconnor
2015-05-19


Slide 1

About Yelp
- Founded: 2004
- Monthly Active Users: ~142 Million
- Non-US Monthly Users: ~31 Million
- Reviews: ~77 Million
- Local Businesses: 2.1 Million
- Territories: Available in 31 countries


Slide 2

What are metrics?
Name             Value


Slide 3

What are metrics?
Name             Value      Timestamp


Slide 4

What are metrics?
Name             Value      Timestamp
server1.load.1m  28.826667  1431950640


Slide 5

What are metrics?
Name             Value      Timestamp
server1.load.1m  28.826667  1431950640
server1.load.1m  29.188333  1431950700
server1.load.1m  29.231667  1431950760
server1.load.1m  29.083333  1431950820
server1.load.1m  29.710000  1431950880




Slide 7

Graphite Components
- Carbon: relay, cache, aggregator
- Whisper
- Web app


Slide 8

Carbon Relay
Deals with 2 things:
- Replication
- Sharding


Slide 9

Relay Methods
- Rules

    [replicate]
    pattern = ^services\.ads\..+
    servers = 10.1.2.3, 10.2.2.3
    continue = true

- Consistent Hashing
  Defines a sharding strategy across multiple backends
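To illustrate what "a sharding strategy across multiple backends" can look like, here is a minimal consistent-hash ring in Python. It is a sketch of the general technique, not carbon's own hashing code: the `HashRing` class, the MD5-based positions, and the 100 virtual points per backend are all assumptions made for illustration. The two backend addresses come from the rule above; the third address used in the note below is hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring mapping metric names to backends.

    Illustrative only: carbon's real ring differs in its details.
    """

    def __init__(self, nodes, replicas=100):
        # Each backend gets `replicas` points on the ring so that
        # metrics spread more evenly across backends.
        ring = []
        for node in nodes:
            for i in range(replicas):
                ring.append((self._position(f"{node}:{i}"), node))
        self._ring = sorted(ring)
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # Map any string to an integer position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def get_node(self, metric):
        # Walk clockwise to the first point at or after the metric's
        # position, wrapping around at the end of the ring.
        idx = bisect.bisect_left(self._positions, self._position(metric))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["10.1.2.3", "10.2.2.3"])
backend = ring.get_node("services.ads.requests")
```

The property that makes this useful for sharding: adding a backend (say a hypothetical `10.3.2.3`) only moves metrics onto the new backend, and every other metric stays where it was.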


Slide 10

Carbon Cache
- Receives metrics and persists them to disk
- Writes based on storage schemas


Slide 11

Storage Schemas
Defines the precision and retention period for stored metrics

    [databases_10sec_1year]
    pattern = ^servers\.db.*$
    retentions = 10s:7d,1m:30d,5m:90d,30m:365d
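As a worked example of what that `retentions` line implies, the Python sketch below turns each `precision:duration` pair into a step (in seconds) and a number of data points, then estimates the size of one metric's file. The helper names are hypothetical, only the `<number><unit>` form of the syntax is handled, and the size estimate assumes Whisper's on-disk layout of a 16-byte header, 12 bytes per archive descriptor, and 12 bytes per data point.

```python
import re

# Seconds per unit; a year is treated as 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400,
                "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a string such as '10s' or '7d' to seconds."""
    match = re.fullmatch(r"(\d+)([smhdwy])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def parse_retentions(spec):
    """Return a list of (step_seconds, points) pairs, one per archive."""
    archives = []
    for part in spec.split(","):
        precision, duration = part.split(":")
        step = parse_duration(precision)
        archives.append((step, parse_duration(duration) // step))
    return archives


def estimated_file_size(archives):
    """Estimate bytes on disk per metric (assumed Whisper layout)."""
    header = 16 + 12 * len(archives)
    return header + 12 * sum(points for _, points in archives)


archives = parse_retentions("10s:7d,1m:30d,5m:90d,30m:365d")
# archives == [(10, 60480), (60, 43200), (300, 25920), (1800, 17520)]
size = estimated_file_size(archives)  # 1765504 bytes, about 1.7 MB
```

Under those assumptions every metric matching `^servers\.db.*$` costs roughly 1.7 MB on disk from the moment it is first written, regardless of how many points it ever receives.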


Slide 12

Storage Aggregation
Rules for aggregating data to lower-precision retentions

    [all_min]
    pattern = \.min$
    xFilesFactor = 0.1
    aggregationMethod = min
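The interaction between `xFilesFactor` and `aggregationMethod` can be sketched in a few lines of Python. This is a simplified model of the roll-up decision, not Whisper's actual code; the `roll_up` function and its signature are hypothetical. The idea is that a lower-precision point is only written when at least `xFilesFactor` of the higher-precision slots it covers hold a value.

```python
def roll_up(points, method="average", x_files_factor=0.5):
    """Collapse higher-precision points into one lower-precision value.

    `points` is a list of floats, with None marking a missing slot.
    Returns None when too few slots are known to justify a roll-up.
    """
    known = [p for p in points if p is not None]
    if not known:
        return None
    if len(known) / len(points) < x_files_factor:
        return None
    if method == "min":
        return min(known)
    if method == "max":
        return max(known)
    if method == "sum":
        return sum(known)
    if method == "last":
        return known[-1]
    if method == "average":
        return sum(known) / len(known)
    raise ValueError(f"unknown aggregation method: {method!r}")


# Six 10-second slots rolled into one 1-minute point; one slot is known.
one_minute = roll_up([3.0, None, None, None, None, None], "min", 0.1)
# 1/6 of the slots are known, which is above 0.1, so one_minute == 3.0
```

With `xFilesFactor = 0.1`, one known value out of six is enough to keep the rolled-up point; one out of twenty would not be.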


Slide 13

Carbon Aggregator
- Buffers metrics before forwarding to carbon cache
- Rolls up metrics based on rules


Slide 14

Aggregation Rules
- Not to be confused with storage aggregation
- Tells the carbon aggregator what to aggregate and how

Rule format:

    output_template (frequency) = method input_pattern

Example rule:

    <env>.applications.<app>.all.requests (60) = sum <env>.applications.<app>.*.requests

Input metrics:

    prod.applications.apache.www01.requests
    prod.applications.apache.www02.requests
    prod.applications.apache.www03.requests
    prod.applications.apache.www04.requests
    prod.applications.apache.www05.requests

Aggregated output:

    prod.applications.apache.all.requests
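The rule above can be modelled roughly as follows. This Python sketch is not the carbon aggregator's implementation: it ignores the 60-second buffering implied by `(60)`, supports only `sum`, and treats `<field>` and `*` as whole path segments. All function names and the sample values are hypothetical.

```python
import re
from collections import defaultdict


def compile_input_pattern(pattern):
    """Turn '<env>.applications.<app>.*.requests' into a regex."""
    parts = []
    for segment in pattern.split("."):
        field = re.fullmatch(r"<(\w+)>", segment)
        if field:
            # Named field: capture exactly one path segment.
            parts.append(f"(?P<{field.group(1)}>[^.]+)")
        elif segment == "*":
            # Wildcard: match one path segment without capturing it.
            parts.append("[^.]+")
        else:
            parts.append(re.escape(segment))
    return re.compile(r"\.".join(parts))


def render_output(template, fields):
    """Fill '<env>' style placeholders in the output template."""
    return re.sub(r"<(\w+)>", lambda m: fields[m.group(1)], template)


def aggregate_sum(input_pattern, output_template, samples):
    """Sum (name, value) samples into their aggregated metric names."""
    regex = compile_input_pattern(input_pattern)
    totals = defaultdict(float)
    for name, value in samples:
        match = regex.fullmatch(name)
        if match:
            totals[render_output(output_template, match.groupdict())] += value
    return dict(totals)


samples = [(f"prod.applications.apache.www0{i}.requests", 10.0 * i)
           for i in range(1, 6)]
totals = aggregate_sum("<env>.applications.<app>.*.requests",
                       "<env>.applications.<app>.all.requests",
                       samples)
# totals == {"prod.applications.apache.all.requests": 150.0}
```

The five per-host request counters collapse into a single `all` series, so dashboards can read one metric instead of expanding a wildcard across every host.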


Slide 15

Whisper
- Fixed-size database
- Allows for roll-ups
- Allows for backfilling data


Slide 16

Web App
Django-based app for rendering graphs


Slide 17

Putting it all together
- Carbon cache listening on port 2003
- Write to disk
- Listen with web
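For the simplest setup, getting a metric in means writing one line of `name value timestamp` to the cache's port. The Python sketch below assumes carbon's plaintext line protocol over TCP on port 2003; the function names and the `localhost` default are placeholders for illustration, not part of the talk.

```python
import socket
import time


def format_metric(name, value, timestamp=None):
    """Build one plaintext line: 'name value timestamp' plus a newline."""
    if timestamp is None:
        timestamp = time.time()
    return f"{name} {value} {int(timestamp)}\n"


def send_metric(name, value, timestamp=None, host="localhost", port=2003):
    """Send a single metric to a carbon listener over TCP.

    `host` is a placeholder; point it at your own cache or relay.
    """
    line = format_metric(name, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


line = format_metric("server1.load.1m", 28.826667, 1431950640)
# line == "server1.load.1m 28.826667 1431950640\n"
```

Calling `send_metric("server1.load.1m", 28.826667)` would then deliver the current reading, provided a carbon listener is reachable at the given host and port.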


Slide 18

Getting more complicated
- Carbon relay using consistent hashing to multiple caches
- Individual caches responsible for specific metrics


Slide 19

More Relays
- Use HAProxy to load balance between relays
- Run more relays to make use of more CPU


Slide 20

Even more relays
Useful for sending metrics to other locations


Slide 21

Replicate the metrics
Duplicate your metrics for backup and redundancy


Slide 22

More caches instead
Consistent hash across multiple nodes


Slide 23

Where does the aggregator fit?
The aggregator uses a lot of CPU. Put it on its own node.


Slide 24

Scaling further
Use nodes for particular functions:
- Use forwarding relay nodes solely to forward
- Have consistent hashing nodes
- Have aggregation nodes


Slide 25



Slide 26


Slide 27

Getting your data back out
- Graphite Dashboard
- Third Party Dashboard
  We use Grafana: http://grafana.org/
- Graphite-api: https://github.com/brutasse/graphite-api


Slide 28



Slide 29

Tips
- Aggregate before ingestion
- Control the metrics that can be sent
- Metrics are a gas - they expand to fill all available room
- Use a C implementation of carbon
- Use the latest webapp


Slide 30

Optimize your dashboard queries

    services.biz_app.*.*.timers.pyramid_uwsgi_metrics_tweens_*.p99

- 2154 results
- 35 seconds to just find these files on disk
- Running functions against these results
- Timeout after a minute
- Dashboard automatically refreshing every 10 seconds


Slide 31


Slide 32

What’s the Future?
- InfluxDB
- Cassandra
- Third party


Slide 33

We’re hiring!
http://www.yelp.com/careers
Hiring SREs in Dublin, London, New York, San Francisco

