Title: Big data at CERN

Speaker: Zbigniew Baranowski (CERN)

Venue: Dipartimento di Fisica, Aula R, January 19, 14:30

Abstract: Data generation rates for database workloads at CERN are
growing very fast going into LHC Run 2 and beyond. This growth is
expected in particular for data coming from controls, logging and monitoring
systems. Storing, administering and accessing big data sets in a
relational database system can quickly become a very hard technical
challenge, as the size of the active data set and the number of
concurrent users increase. To cope with this problem, CERN
has adopted modern Big Data solutions based on Apache Hadoop and its
ecosystem. Notably, technologies such as Apache Spark, Impala and Parquet
offer a rapidly developing set of solutions for deploying and
managing very large data warehouses on commodity hardware and with
open source software. Additionally, they enable new, flexible
interfaces for data processing including machine learning.
This presentation will also describe the infrastructure currently
deployed at CERN and the most interesting projects running on top
of it.