2015-04-01


Revision as of 23:48, 31 March 2015



** Creation of a GitHub repository to host compatibility data [https://github.com/webplatform/compatibility-data on GitHub: ''webplatform/compatibility-data'']

** Originally it regenerated the HTML on EVERY page load based on a previous JSON source



* Removed requirement of shared storage across VMs (GlusterFS) and switched to external DreamObjects storage (Swift) at DreamHost

* Set up image storage that pulls files directly from DreamObjects


* Automatically generates a configuration file with credentials based on the servers that are up at that moment (IP addresses, passwords, private keys, etc.)

* Capability to update passwords/private keys across all web applications from one "private" configuration file



* Setup of a "private" configuration system stored in a git repo, see {{OperationsTask|145}}



* We will eventually publish all our deployment scripts to the public, except the "private" data files. Ref {{OperationsTask|48}}

* Set up an NFS mount point so that ElasticSearch instances can do backups. Reviewed the idea of not using inter-instance storage, or at least limiting it to backups... until we can store ElasticSearch snapshots through Swift/DreamObjects too, see {{OperationsTask|120}}
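The "generate a config file from whichever servers are up" step above can be sketched roughly as follows. This is an illustration only: the inventory structure, server names, and key paths are hypothetical, not the actual deployment code.

```python
# Sketch: build an INI-style config file from a list of live server
# records. The inventory dict and its field names are hypothetical.
import configparser
import io

def generate_config(inventory):
    """Render one config section per currently-up server."""
    config = configparser.ConfigParser()
    for server in inventory:
        config[server["name"]] = {
            "ip": server["ip"],
            "password": server["password"],
            "private_key_path": server["private_key_path"],
        }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()

# Illustrative inventory; in practice this would be discovered at
# deploy time from the servers that are actually running.
inventory = [
    {"name": "webapp1", "ip": "10.0.0.11",
     "password": "s3cret", "private_key_path": "/srv/keys/webapp1.pem"},
]
print(generate_config(inventory))
```

Keeping the generator separate from the (private) data is what lets the scripts be published later while the credential file stays in the "private" repo.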


* [[WPD:Infrastructure/procedures/Maintaining_ElasticSearch_cluster|How to maintain our new ElasticSearch cluster]], including notes on how automatic backups are made

* [[WPD:Infrastructure/procedures/Maintaining_email_services|How to '''Maintain email services''']]

== Soon? ==

Some notes that were gathered along the way and haven't been tried yet.

* Improve stats:

** '''Purpose''': Get system health data as graphs, over time



** Some candidates:

*** [http://grafana.org Grafana]?

*** [http://sensuapp.org Sensu]?

*** [http://riemann.io/ Riemann]



*** collectd

*** fluentd

*** [https://github.com/python-diamond/Diamond Diamond]?

** Get page load stats (i.e. what [http://www.soasta.com/ SOASTA] does, but use an open-source version; [http://www.lognormal.com/boomerang/doc/ Boomerang]), see {{OperationsTask|143}}

** Attempt to merge data from ganglia into Grafana so we do not lose previous data

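The page-load-stats idea above implies a server side that receives Boomerang beacons. A minimal parsing sketch follows; the parameter names <code>t_done</code> (total load time), <code>t_resp</code> and <code>t_page</code> come from Boomerang's documentation, but the beacon host and URL here are invented for illustration.

```python
# Sketch: extract timing metrics (milliseconds) from a Boomerang
# beacon URL. Only the documented t_done/t_resp/t_page parameters
# are kept; everything else on the beacon is ignored.
from urllib.parse import urlparse, parse_qs

TIMING_PARAMS = ("t_done", "t_resp", "t_page")

def parse_beacon(url):
    """Return {param: milliseconds} for known timing parameters."""
    query = parse_qs(urlparse(url).query)
    return {key: int(values[0])
            for key, values in query.items()
            if key in TIMING_PARAMS}

# Hypothetical beacon request as it might arrive in an access log.
beacon = ("http://stats.example.org/beacon"
          "?t_done=1250&t_resp=320&t_page=930&u=http%3A%2F%2Fexample.org%2F")
metrics = parse_beacon(beacon)
print(metrics)  # {'t_done': 1250, 't_resp': 320, 't_page': 930}
```

From there the numbers could be forwarded to whichever stats backend (Grafana, collectd, fluentd...) ends up being chosen.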

* Improve toolkit


** Ability to debug a-la-firebug communication between backends w/ [https://blog.twitter.com/2012/distributed-systems-tracing-with-zipkin Zipkin], see [http://research.google.com/pubs/pub36356.html paper about ''distributed systems tracing'']

** Ability to snapshot a server state for future investigation? Ref: http://www.sysdig.org/

** More food for thought... [http://metrics20.org/ Metrics 2.0]

* Get system metrics



** Leverage Monit system, push to [http://hekad.readthedocs.org Heka]? ([https://github.com/renoirb/mmonit-mock-listener see this sandbox]), or [https://github.com/lunich/monitrb use an existing one in ''Ruby'' that stores in MongoDB]

** Use NGINX status, ref [https://www.scalyr.com/community/guides/how-to-monitor-nginx-the-essential-guide How to monitor NGINX, the essential guide]

** Gather ElasticSearch metrics, see [https://github.com/abronner/elasticsearch-monitoring this repo]

** Get inspiration from [http://www.lowlevelmanager.com/2014/07/monitorama-conference.html some notes from ''Monitorama'' conference]
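The "Use NGINX status" item above relies on the <code>stub_status</code> module, whose plain-text output has a small, documented shape. A parsing sketch, assuming the text has already been fetched (e.g. over HTTP from a status location); the fetch itself is left out:

```python
# Sketch: turn NGINX stub_status output into a metrics dict.
# The sample below follows the format documented for
# ngx_http_stub_status_module.
import re

def parse_stub_status(text):
    metrics = {}
    metrics["active_connections"] = int(
        re.search(r"Active connections:\s+(\d+)", text).group(1))
    # The line of three counters under "server accepts handled requests".
    accepts, handled, requests = re.search(
        r"\n\s*(\d+)\s+(\d+)\s+(\d+)", text).groups()
    metrics.update(accepts=int(accepts), handled=int(handled),
                   requests=int(requests))
    for name in ("Reading", "Writing", "Waiting"):
        metrics[name.lower()] = int(
            re.search(name + r":\s+(\d+)", text).group(1))
    return metrics

sample = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""
print(parse_stub_status(sample))
```

A dict like this is straightforward to push into whichever metrics pipeline (Heka, collectd...) gets adopted.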

* Centralized logging:

** '''Purpose''': Aggregate and harmonize all log messages to see what happened (or happens)


** Leverage LogStash as a Monitoring solution? Ref [http://www.nuxeo.com/blog/monitoring-nuxeo/ Monitoring at Nuxeo]

** Document expectations and endpoints along with links to documentation for each service



** Use deployment tests as monitoring? ref [http://riltsken.github.io/devops/infrastructure/monitoring/2014/04/19/making-runbooks-more-useful-by-exposing-them-through-monitoring.html making "run books" more useful by exposing them through monitoring]

** etc...
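The harmonizing half of the centralized-logging goal above amounts to getting every service to emit the same log shape. A minimal sketch using Python's standard logging module; the JSON field names are an arbitrary choice for illustration, not an established WebPlatform convention:

```python
# Sketch: format log records as one JSON shape so messages from
# different services can be aggregated centrally (e.g. by LogStash).
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render every record with the same harmonized fields."""
    def format(self, record):
        return json.dumps({
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("webapp")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits one JSON object per log line.
logger.info("cache warmed in %d ms", 152)
```

One JSON object per line is easy for a collector to parse, regardless of which service produced it.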

* Helpers

* Make server SSH MOTD point out links to maintenance documentation, [http://riltsken.github.io/devops/infrastructure/2014/03/16/how-server-message-of-the-day-improved-our-devops-team.html inspired by this ''blog post'']
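The MOTD idea above could be generated at deploy time from each server's roles. A sketch under stated assumptions: the role-to-page mapping and the docs URL prefix are guesses for illustration (the two page names are the maintenance procedures linked earlier on this page), and writing the result to <code>/etc/motd</code> is left to the deployment tool.

```python
# Sketch: build an SSH MOTD pointing operators at the maintenance
# docs relevant to this server's roles. Mapping and URL prefix are
# illustrative assumptions, not actual deployment configuration.
DOC_PAGES = {
    "elasticsearch":
        "WPD:Infrastructure/procedures/Maintaining_ElasticSearch_cluster",
    "email":
        "WPD:Infrastructure/procedures/Maintaining_email_services",
}
WIKI_PREFIX = "https://docs.webplatform.org/wiki/"  # assumed prefix

def build_motd(hostname, roles):
    lines = ["Welcome to %s" % hostname, "", "Maintenance documentation:"]
    for role in roles:
        if role in DOC_PAGES:
            lines.append("  * " + WIKI_PREFIX + DOC_PAGES[role])
    return "\n".join(lines)

print(build_motd("salt-master", ["elasticsearch", "email"]))
```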
