Content:
- Latest example tarball *HERE*
- Introduction
- Create and configure the plugin
- Using the plugin
- FULL run mode
- TEST run mode
- OFFLINE run mode
- SUBMIT run mode
- TERMINATE run mode
- EXAMPLE: Pt analysis
- Applying cuts in tag-based analysis via plugin
Introduction.
The purpose of the plugin is to run in AliEn, transparently, the same user analysis that runs on a local PC or in a PROOF cluster. The practical goal is to do this without having to exit the ROOT prompt. This is currently achieved only to a certain extent: the analysis job is submitted to AliEn automatically, and the file merging and the Terminate() phase of the analysis tasks are executed by the plugin; user intervention is required only for inspecting the job status or resubmitting sub-jobs on demand. The plugin provides the following functionality to hide the complexity of the underlying GRID from users:
- Allow using all existing types of input data to be processed (root files, raw AliEn collections or XML collections) detecting their type automatically;
- Generate XML collections corresponding to requested runs and/or data patterns;
- Automatically detect presence of tags in input data and allow using tag-based cuts in a very simple way;
- Automatically connect to AliEn (generate a token if needed); this still requires sourcing the environment file produced by alien-token-init;
- Automatically generate the JDL, the analysis macro to be run in grid, and the execution and validation scripts according to a simple, user-driven configuration;
- Copy all needed files in user's AliEn space and submit the job automatically;
- Start an alien shell to allow inspecting the job status;
- Do automatic output merging and terminate the analysis tasks.
The plugin is implemented as a base class AliAnalysisGrid inside libANALYSIS, which provides the API to configure some custom parameters. It can be plugged into the analysis manager class in the same way as the event handlers. Those not yet familiar with the ALICE analysis framework should first read its documentation. The implementation of the plugin for AliEn is provided by the AliAnalysisAlien class inside the libANALYSISalice library.
One needs to first configure the plugin before using it. The howto for this and an example are provided below.
Create and configure the plugin.
The simple example below illustrates how the plugin can be configured via a macro. You can find the example in the file CreateAlienHandler.C. This is the only part of the plugin that needs to be customized.
AliAnalysisGrid* CreateAlienHandler()
// Overwrite all generated files, datasets and output results from a previous session
// Method 1: Create automatically XML collections using alien 'find' command.
// Method 2: Declare existing data files (raw collections, xml collections, root file)
// Define alien work directory where all files will be copied. Relative to alien $HOME.
Use the plugin in your preferred analysis macro.
The only modifications to your macro needed for including the plugin are described below. You can download the file runGrid.C together with the task files AliAnalysisTaskPt.h and AliAnalysisTaskPt.cxx.
void runGrid()
{
  // Load common libraries
  gSystem->Load("libTree.so");
  gSystem->Load("libGeom.so");
  gSystem->Load("libVMC.so");
  gSystem->Load("libPhysics.so");
  gSystem->Load("libSTEERBase");
  gSystem->Load("libESD");
  gSystem->Load("libAOD");
  gSystem->Load("libANALYSIS");
  gSystem->Load("libANALYSISalice"); // The plugin is here
  // Use AliRoot includes to compile our task
  // Create and configure the alien handler plugin
  // Connect
  gROOT->LoadMacro("AliAnalysisTaskPt.cxx++g");
  AliESDInputHandler* esdH = new AliESDInputHandler();
  // No need to create a chain - this is handled by the plugin
  // Connect input/output
  // Enable debug printouts
  if (!mgr->InitAnalysis()) return;
  mgr->PrintStatus();
}
The execution flow. (plugin->SetRunMode("full"))
After calling mgr->StartAnalysis("grid") the analysis manager executes the task initialization AliAnalysisTask::LocalInit() on the client. Immediately afterwards, the following steps are performed by the AliEn plugin. The description corresponds to the "full" run mode; the other run modes refer to these steps by their position in the list (1 to 10).
- Connect to AliEn using the existing token. Exit if the environment was not sourced.
- Check validity of declared input data directories and files. Check compatibility in case of usage of multiple data sources. Automatically determine if the input data is tag-based or not.
- Create XML collections for the requested runs. These will be inserted in AliEn file catalog in the user working directory.
- Stream the analysis manager with all configured tasks to the file analysis.root . Copy this file to the AliEn working directory.
- Generate the analysis macro to be run in grid and copy this file to the AliEn working directory. This is a standard analysis macro that executes the analysis on worker nodes. It retrieves the analysis manager from the file analysis.root, makes the chain from the collection wn.xml, applies tag-based cuts (if the input data contains tags and a macro doing cuts was provided) and initiates the analysis event loop without calling the LocalInit() methods of the tasks. Unless there are special requirements (like usage of .par files), the generated analysis macro does not need to be modified. A way to modify the macro is provided by combining the "offline" and "submit" run modes (see below). A typical example is shown below:
Automatically generated analysis macro AnalysisPt.C
const char *anatype = "ESD";

void AnalysisPt()
{
  // Analysis using ESD data
  // Automatically generated analysis steering macro executed in grid subjobs

  // load base root libraries
  gSystem->Load("libTree");
  gSystem->Load("libGeom");
  gSystem->Load("libVMC");
  gSystem->Load("libPhysics");

  // load analysis framework libraries
  gSystem->Load("libSTEERBase");
  gSystem->Load("libESD");
  gSystem->Load("libAOD");
  gSystem->Load("libANALYSIS");
  gSystem->Load("libANALYSISalice");

  // add additional AliRoot libraries below

  // include path (remove if using par files)
  gROOT->ProcessLine(".include $ALICE_ROOT/include");

  // analysis source to be compiled at runtime (if any)
  gROOT->ProcessLine(".L AliAnalysisTaskPt.cxx+g");

  // connect to AliEn and make the chain
  if (!TGrid::Connect("alien://")) return;
  TChain *chain = CreateChainFromTags("wn.xml", anatype);

  // read the analysis manager from file
  TFile *file = TFile::Open("analysis.root");
  if (!file) return;
  TIter nextkey(file->GetListOfKeys());
  AliAnalysisManager *mgr = 0;
  TKey *key;
  while ((key=(TKey*)nextkey())) {
    if (!strcmp(key->GetClassName(), "AliAnalysisManager"))
      mgr = (AliAnalysisManager*)file->Get(key->GetName());
  }
  if (!mgr) {
    ::Error("AnalysisPt", "No analysis manager found in file analysis.root");
    return;
  }

  mgr->PrintStatus();
  mgr->StartAnalysis("localfile", chain);
}

TChain* CreateChainFromTags(const char *xmlfile, const char *type="ESD")
{
  // Create a chain using tags from the xml file.
  TAlienCollection* coll = TAlienCollection::Open(xmlfile);
  if (!coll) {
    ::Error("CreateChainFromTags", "Cannot create an AliEn collection from %s", xmlfile);
    return NULL;
  }
  TGridResult* tagResult = coll->GetGridResult("",kFALSE,kFALSE);
  AliTagAnalysis *tagAna = new AliTagAnalysis(type);
  tagAna->ChainGridTags(tagResult);

  AliRunTagCuts *runCuts = new AliRunTagCuts();
  AliLHCTagCuts *lhcCuts = new AliLHCTagCuts();
  AliDetectorTagCuts *detCuts = new AliDetectorTagCuts();
  AliEventTagCuts *evCuts = new AliEventTagCuts();
  // Check if the cuts configuration file was provided
  if (!gSystem->AccessPathName("ConfigureCuts.C")) {
    gROOT->LoadMacro("ConfigureCuts.C");
    ConfigureCuts(runCuts, lhcCuts, detCuts, evCuts);
  }
  TChain *chain = tagAna->QueryTags(runCuts, lhcCuts, detCuts, evCuts);
  if (!chain || !chain->GetNtrees()) return NULL;
  chain->ls();
  return chain;
}
- Generate the executable script that launches the analysis macro in batch jobs and copy it to $HOME/bin in the AliEn space. Generate the validation script and copy it to the AliEn working directory. The validation script checks if all declared outputs were produced for a given sub-job.
- Generate the JDL for the batch analysis and copy it to AliEn. This again does not need to be modified by the user unless there is a need for features not yet supported by the plugin. An example of the JDL produced is below:
Automatically generated JDL
JobTag = "Automatically generated analysis JDL";
# Input xml collections
InputDataCollection = {
   "LF:/alice/cern.ch/user/m/mgheata/work/300000.xml,nodownload",
   "LF:/alice/cern.ch/user/m/mgheata/work/300001.xml,nodownload"
};
# Output directory
OutputDir = "/alice/cern.ch/user/m/mgheata/work/output/#alien_counter_03i#";
# List of output files to be registered
OutputFile = {
   "Pt.ESD.1.root"
};
# Packages to be used
Packages = {
   "VO_ALICE@AliRoot::v4-15-Rev-05",
   "VO_ALICE@ROOT::v5-21-01-alice",
   "VO_ALICE@APISCONFIG::V2.4"
};
# List of input files to be uploaded to wn's
InputFile = {
   "LF:/alice/cern.ch/user/m/mgheata/work/AnalysisPt.C",
   "LF:/alice/cern.ch/user/m/mgheata/work/analysis.root",
   "LF:/alice/cern.ch/user/m/mgheata/work/ConfigureCuts.C",
   "LF:/alice/cern.ch/user/m/mgheata/work/AliAnalysisTaskPt.h",
   "LF:/alice/cern.ch/user/m/mgheata/work/AliAnalysisTaskPt.cxx"
};
# This is the startup script
Executable = "analysis.sh";
# We split per storage element
Split = "se";
# Time to live for the job
TTL = "30000";
# Resubmit failed jobs until DONE rate reaches this percentage
MasterResubmitThreshold = "90%";
# We want each subjob to get maximum this number of input files
SplitMaxInputFileNumber = "100";
# Format of input data
InputDataListFormat = "xml-single";
# Collection to be processed on wn
InputDataList = "wn.xml";
# Files to be archived
OutputArchive = {
   "log_archive.zip:stdout,stderr"
};
# Maximum number of first failing jobs to abort the master job
MaxInitFailed = "5";
# AliEn price for this job
Price = "1";
# Validation script to be run for each subjob
Validationcommand = "/alice/cern.ch/user/m/mgheata/work/validate.sh";
# AliEn user
User = "mgheata";
# JDL variables
JDLVariables =
{
   "Packages",
   "OutputDir"
};
- Automatically submit the AliEn job and start an alien shell. The masterjob ID is printed out. The shell is provided for inspecting the job execution in AliEn, but can be interrupted at any time. Analysis termination can be resumed after all sub-jobs have finished by using the TERMINATE run mode described below.
- After exiting the automatic AliEn shell, the analysis manager tries to merge all output files that were produced and registered in the AliEn output directory. Currently no other check is performed (such as how many jobs failed, automatic resubmission, or the time when the masterjob finished). One can re-run the merging phase by calling SetRunMode("terminate") in the plugin configuration and re-running the local analysis macro. Merging is currently done on the local client.
- The Terminate() method is called if merging succeeded.
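The validation step in the list above (checking that all declared outputs were produced for a given sub-job) can be pictured as a set difference between the declared files and the files actually present. The following is a minimal, self-contained C++ sketch of that idea, not the generated validation script itself; the function names MissingOutputs and SubJobIsValid are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: return the declared output files that a sub-job
// did not produce. An empty result means nothing is missing.
std::vector<std::string> MissingOutputs(const std::vector<std::string>& declared,
                                        const std::vector<std::string>& produced)
{
  const std::set<std::string> have(produced.begin(), produced.end());
  std::vector<std::string> missing;
  for (const std::string& name : declared) {
    if (have.count(name) == 0) missing.push_back(name);
  }
  return missing;
}

// Hypothetical helper: a sub-job validates only if every declared
// output file was produced.
bool SubJobIsValid(const std::vector<std::string>& declared,
                   const std::vector<std::string>& produced)
{
  return MissingOutputs(declared, produced).empty();
}
```

For the example JDL, the declared list would contain Pt.ESD.1.root and log_archive.zip; a sub-job that produced only the log archive would not validate.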
The TEST run mode. ( plugin->SetRunMode("test"))
This mode enables testing the generated analysis macro and validation script as they will be executed in batch mode. For this to work, the input data must be declared as in Method 1 above (base data directory + run number). A sub-sample of 10 files is generated in XML format and copied to the local directory as wn.xml. Steps 1 to 6 are performed and the generated script is invoked in batch mode in a subshell, then the validation script is run. The Terminate() method of the analysis manager is, however, called in graphics mode so that any histograms can be visualized.
Running this mode systematically before the FULL mode is recommended.
The OFFLINE run mode. (plugin->SetRunMode("offline"))
The offline mode does not actually run the generated scripts and macros. These are only generated, so that the user can customize them. A typical use case is running an analysis based on .par packages instead of the AliRoot-based libraries (the default). In this way the user can manually add methods or change the default behavior in the generated analysis macro. If .par files need to be copied to the AliEn space, this does not have to be done manually; it is better to use the plugin->SetAdditionalLibs() method. The OFFLINE mode performs steps 1 to 7 of the FULL run mode (except file copying) and is intended to be used together with the SUBMIT mode.
The SUBMIT run mode. (plugin->SetRunMode("submit"))
In this mode the files to be used in the AliEn batch job (generated by the plugin and customized by the user in OFFLINE mode) are copied to AliEn and the job is automatically submitted to the grid (steps 8 to 10).
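When a job is submitted, the splitting settings in the automatically generated JDL shown earlier (Split = "se" and SplitMaxInputFileNumber = "100") give a rough idea of how many sub-jobs to expect. The sketch below is only an estimate, under the assumption that input files are grouped per storage element and each group is cut into chunks of at most the configured size; the actual AliEn splitting may differ. The function name EstimateSubJobs and the storage element names used with it are hypothetical.

```cpp
#include <map>
#include <string>

// Rough estimate of the number of sub-jobs: files are grouped per storage
// element and every group is cut into chunks of at most maxFilesPerSubJob.
// Assumption for illustration only; this is not the AliEn implementation.
int EstimateSubJobs(const std::map<std::string, int>& filesPerSE, int maxFilesPerSubJob)
{
  if (maxFilesPerSubJob <= 0) return 0;  // invalid limit: no estimate possible
  int subJobs = 0;
  for (std::map<std::string, int>::const_iterator it = filesPerSE.begin();
       it != filesPerSE.end(); ++it) {
    const int nFiles = it->second;
    if (nFiles <= 0) continue;  // nothing stored on this element
    // Ceiling division: a partially filled last chunk still needs a sub-job.
    subJobs += (nFiles + maxFilesPerSubJob - 1) / maxFilesPerSubJob;
  }
  return subJobs;
}
```

With 250 files on one storage element and 40 on another, a limit of 100 files per sub-job gives 3 + 1 = 4 sub-jobs under this assumption.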
The TERMINATE run mode. (plugin->SetRunMode("terminate"))
This mode can be used at any time to re-run steps 9 and 10 described in the FULL mode. Typically used after all sub-jobs have reached the DONE status, it can also be used at any time after some of the sub-jobs have registered their output, in order to inspect partial results. This is very useful if the analysis tasks draw QA histograms in the Terminate() phase.
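The relation between the run modes and the steps of the FULL execution flow, as stated in the sections above, can be summarized in a small lookup. This is only a self-contained illustration in plain C++; the StepRange type and the StepsForRunMode function are hypothetical and are not part of the plugin.

```cpp
#include <string>

// Inclusive range of steps from the FULL execution flow (1 to 10).
struct StepRange {
  int first;
  int last;
};

// Hypothetical lookup: which FULL-mode steps each run mode performs,
// following the descriptions in the text. Returns {0, 0} for an unknown mode.
StepRange StepsForRunMode(const std::string& mode)
{
  StepRange range = {0, 0};
  if (mode == "full")           { range.first = 1; range.last = 10; }
  else if (mode == "test")      { range.first = 1; range.last = 6;  }  // then runs the generated script locally
  else if (mode == "offline")   { range.first = 1; range.last = 7;  }  // files are generated but not copied
  else if (mode == "submit")    { range.first = 8; range.last = 10; }  // files are copied, then the job is submitted
  else if (mode == "terminate") { range.first = 9; range.last = 10; }  // merging and Terminate()
  return range;
}
```

Read this way, "offline" followed by "submit" covers the same ten steps as "full", with a pause in between for customizing the generated files.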
EXAMPLE: Pt analysis
All files needed to run the Pt example analysis using the AliEn plugin can be found in alienplugin.tgz. Unpack it in a folder, modify the plugin configuration and run mode in the file CreateAlienHandler.C, then run the macro runGrid.C.
Applying cuts in tag-based analysis via the plugin
The plugin automatically detects the use of tags in the input data. The automatically generated analysis macro looks for a macro ConfigureCuts.C in the current directory and executes the function:
void ConfigureCuts(AliRunTagCuts *runCuts, AliLHCTagCuts *lhcCuts, AliDetectorTagCuts *detCuts, AliEventTagCuts *evCuts)
{
  // Configure cuts.
  evCuts->SetMultiplicityRange(10,20);
}
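To make the effect of such a multiplicity cut concrete, the following stand-alone C++ sketch mimics a range selection on a list of event tags. The EventTag and MultiplicityRange types are hypothetical stand-ins, not the AliRoot classes, and treating both bounds as inclusive is an assumption.

```cpp
#include <vector>

// Hypothetical stand-in for the per-event tag information.
struct EventTag {
  int multiplicity;
};

// Hypothetical stand-in for a multiplicity range cut.
struct MultiplicityRange {
  int low;
  int high;
  // Assumption: both bounds are inclusive.
  bool Accepts(const EventTag& tag) const {
    return tag.multiplicity >= low && tag.multiplicity <= high;
  }
};

// Keep only the events whose tag passes the cut.
std::vector<EventTag> SelectEvents(const std::vector<EventTag>& tags,
                                   const MultiplicityRange& cut)
{
  std::vector<EventTag> selected;
  for (const EventTag& tag : tags) {
    if (cut.Accepts(tag)) selected.push_back(tag);
  }
  return selected;
}
```

Only events that pass the selection end up in the chain that is analysed, which is why a cut on tags can reduce the amount of data read by each sub-job.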