We’re excited to announce the next Meetup of the Site Reliability Engineering Munich group.
Agenda
- 6:30 pm Get together with food and drinks
- 7:00 pm Welcome, Feedback from last Meetup
- 7:10 pm Talk 1: Intelligent Cloud Operations with AIOps
- 8:05 pm Short break
- 8:15 pm Talk 2: The Road to SRE (Instana)
- 8:45 pm Networking + Drinks
- 9:00 pm Leave happy and inspired :)
Abstracts
Title: Intelligent Cloud Operations with AIOps
Abstract: The field of AIOps, also known as Artificial Intelligence for IT Operations, uses advanced technologies to dramatically improve the monitoring, operation, and troubleshooting of distributed systems. Its main premise is that operations can be automated using monitoring data, reducing the workload of operators (e.g., SREs or production engineers). Our current research explores how AIOps and many related fields, such as deep learning, machine learning, distributed tracing, graph analysis, time-series analysis, sequence analysis, advanced statistics, NLP, and log analysis, can be applied to effectively detect, localize, predict, and remediate failures in large-scale cloud infrastructures (>50 regions and AZs) by analyzing service management data (e.g., distributed traces, logs, events, alerts, metrics). In particular, this talk will describe how one monitoring data structure, the distributed trace, can be analyzed using deep learning to identify anomalies in its spans. This capability empowers operators to quickly identify which components of a distributed system are faulty.
Title: The Road to SRE
Abstract: Building and establishing an SRE team is a complex challenge that involves much more than overcoming technical hurdles. In a fast-growing startup especially, there are quite a few lessons to be learned. The talk walks through the evolution of the operations/SRE team at Instana: from the early days, with just a handful of well-meaning friends-and-family customers and features and deployments being winged left and right, through platform re-architectures and team growth, to the present day, with customers all around the world to whom we want to offer frictionless 24/7 availability and product experience. Along the way, we doubted, changed, and learned a lot of things, some of them obvious, some not so much, around tooling, technology, and architecture as well as process and organizational topics.
Speakers
- Dr. Jorge Cardoso is Chief Architect for Planet-scale AIOps at Huawei’s Ireland and Munich Research Centers. Before, he worked for several major companies such as SAP Research (Germany) on the Internet of Services and the Boeing Company in Seattle (USA) on Enterprise Application Integration. He previously gave lectures at the Karlsruhe Institute of Technology (Germany), University of Georgia (USA), University of Coimbra and University of Madeira (Portugal). His current research involves the development of the next generation of AIOps platforms and Cloud operations tools driven by AI to increase Cloud reliability and resilience. He has a Ph.D. in Computer Science from the University of Georgia (USA).
- Bastian Spanneberg (@spanneberg) is part of the SRE team at Instana and has seen the company grow and evolve from the early days.
Participation
We’re always looking for 20-35 minute (technical) talks related to the very broad field of Site Reliability Engineering. Get in touch with the organizers if you’d like to present!
Legal:
There may be audio and video recordings of the talks, and we may take photographs during the event for the purpose of sharing the learnings and advertising future events. By attending the event you give your consent to be recorded. The “Tales from On-call” sessions are never recorded and the Chatham House Rule applies: https://en.wikipedia.org/wiki/Chatham_House_Rule
Spread the word! Feel free to refer to this Meetup on social media using the #sremuc hashtag!