---
title: VC3 at CHEP 2018
date: 2018-07-10
tags: [papers]
---

The VC3 project was well represented at the CHEP 2018 conference in Sofia, Bulgaria. Two forthcoming publications include contributions from the VC3 team:

## Deploying and extending CMS Tier 3s using VC3 and the OSG Hosted CE service

Speaker: Kenyi Paolo Hurtado Anampa (University of Notre Dame (US))

Description:

CMS Tier 3 centers, frequently located at universities, play an important role in the physics analysis of CMS data. Although different computing resources are often available at universities, meeting all the requirements to deploy a valid Tier 3 able to run CMS workflows can be challenging in certain scenarios. For instance, providing the right operating system (OS) with access to the CERNVM File System (CVMFS) on the worker nodes, or hosting a Compute Element (CE) on the submit host, is not always allowed or possible due to, e.g., lack of root access to the nodes, TCP port network policies, or the maintenance burden of a CE. The Notre Dame group operates a CMS Tier 3 with ~1K cores. In addition, researchers have access to an opportunistic pool of more than 25K cores that are used for CMS jobs via Lobster, but that cannot be used with other standard CMS grid submission tools such as CRAB, as these resources are not part of the Tier 3 because of their opportunistic nature. This work describes the use of VC3, a service for automating the deployment of virtual cluster infrastructures, to provide the environment (user-space CVMFS access and a customized OS via Singularity containers) needed for CMS workflows to run, along with its integration with the OSG Hosted CE service to add these resources to CMS as a seamless extension of our existing Tier 3.
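To give a rough feel for the environment VC3 provisions, the hypothetical sketch below launches a payload inside a Singularity container with CVMFS bind-mounted, roughly what a worker node needs in order to run CMS software. The image path, bind points, and command are illustrative assumptions, not VC3's actual configuration.

```python
# Hypothetical sketch, not VC3's actual implementation: run a command inside a
# Singularity container that provides the OS CMS expects, with the host's
# CVMFS mount exposed to the containerized payload.
import subprocess

container_image = "/cvmfs/singularity.opensciencegrid.org/cmssw/cms:rhel6"  # assumed image

subprocess.run(
    [
        "singularity", "exec",
        "--bind", "/cvmfs:/cvmfs",   # expose CVMFS inside the container
        container_image,             # customized OS for CMS workflows
        "sh", "-c",
        "source /cvmfs/cms.cern.ch/cmsset_default.sh && echo environment ready",
    ],
    check=True,
)
```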

Primary author:

  • Kenyi Paolo Hurtado Anampa (University of Notre Dame (US))

Co-authors:

  • Benjamin Tovar (University of Notre Dame)
  • Douglas Thain (University of Notre Dame)
  • Kevin Patrick Lannon (University of Notre Dame (US))


## Interactive, scalable, reproducible data analysis with containers, Jupyter, and Parsl

Speaker: Ms Anna Elizabeth Woodard (Computation Institute, University of Chicago)

Description:

In the traditional HEP analysis paradigm, code, documentation, and results are separate entities that require significant effort to keep synchronized, which hinders reproducibility. Jupyter notebooks allow these elements to be combined into a single, repeatable narrative. HEP analyses, however, commonly rely on complex software stacks and the use of distributed computing resources, requirements that have been barriers to notebook adoption. In this presentation we describe how Jupyter can be combined with Parsl (Parallel Scripting Library) and containers to enable intuitive and interactive high performance computing in Python. Parsl is a pure Python library for orchestrating the concurrent execution of multiple tasks. Parsl is remarkable for its simplicity. Its primary construct is an “app” decorator, which the programmer uses to indicate that certain functions (either pure Python or wrappers around shell programs) are to be treated as “apps.” App function calls then result in the creation of a new “task” that runs concurrently with the main program and other tasks, subject to dataflow constraints defined by the availability of app function input data. Data dependencies can be in-memory objects or external files. App decorators can further specify which computing resources to use and the software environment required to run the decorated function. Parsl abstracts hardware details, allowing a single script to be executed efficiently on one or more laptops, clusters, clouds, and/or supercomputers. To manage complex execution environments on various resources, and also to improve reproducibility, Parsl can use containers (lightweight, virtualized constructs for packaging software with its environment) to wrap tasks. In this presentation we 1) show how a complete real-world HEP analysis workflow can be developed with Parsl and 2) demonstrate efficient and reproducible execution of such workflows on heterogeneous resources, including leadership-class computing facilities, using containers to wrap analysis code, Parsl to orchestrate the execution of these containers, and Jupyter as the interface for writing and executing the Parsl script.
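To give a flavor of the programming model described above, here is a minimal sketch using Parsl's app decorator. Import paths and configuration follow recent Parsl releases and may differ by version; the `double` and `add` functions are illustrative, not from the presented analysis.

```python
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)  # local threads here; clusters, clouds, or HPC use other configs

@python_app
def double(x):
    # Each call becomes a concurrent "task" and returns a future immediately.
    return 2 * x

@python_app
def add(a, b):
    # Passing futures as arguments expresses a dataflow dependency:
    # add() runs only once both of its inputs are available.
    return a + b

futures = [double(i) for i in range(4)]
total = add(futures[0], futures[1])
print(total.result())  # blocks until the dependent tasks complete; prints 2
```

Swapping the local-threads configuration for a cluster or cloud configuration changes where the apps execute without changing the script itself, which is the hardware abstraction the abstract describes.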

Primary authors:

  • Mr Yadu Babuji (Computation Institute, University of Chicago and Argonne National Laboratory)
  • Dr Kyle Chard (Computation Institute, University of Chicago and Argonne National Laboratory)
  • Dr Ian Foster (Computation Institute, University of Chicago and Argonne National Laboratory)
  • Dr Daniel S. Katz (National Center for Supercomputing Applications, University of Illinois Urbana-Champaign)
  • Dr Michael Wilde (Computation Institute, University of Chicago and Argonne National Laboratory)
  • Ms Anna Elizabeth Woodard (Computation Institute, University of Chicago)
  • Dr Justin M. Wozniak (Computation Institute, University of Chicago and Argonne National Laboratory)
