Usage

The framework revolves around declaring a Selection. This object encapsulates the sample files that comprise a selection and the state at each stage (or Cut) as the selection progresses.

The object has several inputs for the selection:

Input          Collection    Single          Note
Sample         SampleSet     Sample          Represents an NTuple produced by playonverbs/HyperonProduction. Also manages sample type and POT.
Cut            list[Cut]     Cut             Stores data required for selection performance calculations.
ParameterSet   -             ParameterSet    Holds selection-dependent parameters, such as the values that certain cuts depend on.

Alongside these, the Config class holds options that control how the selection operates.

Creating a Selection

Attention

A ParameterSet must be defined before the main body of the Selection object.

This is because, while the arguments of a class instance are being defined, they cannot refer to one another until the entire object has been constructed.

pset = ParameterSet.from_dict({...})

sel = Selection(
    params=pset,
    cuts=[
        Cut(
            "fv",
            lambda arr: fv.in_active_tpc(arr)
        )
    ],
    samples=SampleSet(
        Sample(
            "hyperon",
            "~/mydata/hyperon.root",
            SampleType.Hyperon,
            None
        ),
        Sample(
            "background",
            "~/mydata/background.root",
            SampleType.Background,
            None
        ),
        target_POT=2E20
    )
)
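
The ordering requirement can be illustrated with a minimal sketch. ToyParams, ToySelection and make_length_cut below are hypothetical stand-ins, not the framework's classes, and plain lists stand in for ak.Array. The point is only that constructor arguments are evaluated before the object exists, so a value needed by a cut has to come from an object defined earlier.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class ToyParams:
    """Hypothetical stand-in for ParameterSet."""
    min_length: float


@dataclass
class ToySelection:
    """Hypothetical stand-in for Selection."""
    params: ToyParams
    cuts: List[Callable[[List[float]], List[bool]]]


def make_length_cut(threshold: float) -> Callable[[List[float]], List[bool]]:
    """Build a cut that keeps entries longer than `threshold`."""
    return lambda lengths: [x > threshold for x in lengths]


# Works: the parameters exist before the selection is constructed.
toy_pset = ToyParams(min_length=5.0)
toy_sel = ToySelection(
    params=toy_pset,
    cuts=[make_length_cut(toy_pset.min_length)],
)
mask = toy_sel.cuts[0]([2.0, 7.5])

# Fails: `broken_sel` is not bound until the call returns, so its own
# `params` cannot be read while the arguments are being evaluated.
try:
    broken_sel = ToySelection(
        params=ToyParams(min_length=5.0),
        cuts=[make_length_cut(broken_sel.params.min_length)],
    )
    raised = False
except NameError:
    raised = True
```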

Defining your own cuts

The Cut objects used in the framework support the predefined selection cut functions through lambda-currying. This is needed because many selection functions take parameters via a ParameterSet, whereas the callable given to a Cut must have the signature ak.Array -> ak.Array. Wrapping the function in a lambda bridges the two.
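
As a minimal sketch of this wrapping: in_range and RangeParams are hypothetical names, and plain lists stand in for ak.Array so the example runs without the framework. A two-argument selection function is reduced to the one-argument form either with a lambda or with functools.partial.

```python
from dataclasses import dataclass
from functools import partial
from typing import List


@dataclass(frozen=True)
class RangeParams:
    """Hypothetical stand-in for ParameterSet."""
    low: float
    high: float


def in_range(values: List[float], pset: RangeParams) -> List[bool]:
    """Two-argument selection function: needs the data and the parameters."""
    return [pset.low <= v <= pset.high for v in values]


range_pset = RangeParams(low=1.0, high=3.0)

# Lambda wrapping: fix the parameters, leaving a callable that takes only the data.
cut_via_lambda = lambda values: in_range(values, range_pset)

# functools.partial gives an equivalent one-argument callable.
cut_via_partial = partial(in_range, pset=range_pset)

data = [0.5, 2.0, 3.5]
result_lambda = cut_via_lambda(data)
result_partial = cut_via_partial(data)
```

Either wrapped callable takes a single data argument, which is the shape a Cut expects.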

When defining your own cuts the primary requirement is that the cut function returns a 1-dimensional ak.Array consisting of bool with a length matching the number of events. In the Datashape language, for 100,000 events this is represented as 100000 * bool.

Cut(
    "foo",
    lambda arr: ...  # replace ... with an expression returning a 1-D boolean ak.Array
)

Extending the Available Parameters

Your own cuts may require additional parameters that are not supplied by ParameterSet. To add your own parameters you should create a new frozen dataclass that inherits from the existing one like so:

from dataclasses import dataclass

from sigmazerosearch.selection import ParameterSet

@dataclass(frozen=True)
class MyParameterSet(ParameterSet):
    a_new_param: float
    """Used in <x> cut, units are GeV"""
    flag_param: bool
    """Flag for <cut>"""

Here we can note a few things:

  1. Fields should not be defined with default values.

  2. Fields should be documented, at minimum stating the field’s physical units (if any).
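
One plausible reason for the first point (an assumption; the source does not state it) is how dataclass inheritance orders fields: base-class fields come first, and a field without a default may not follow one with a default. The sketch uses hypothetical stand-in classes rather than the framework's ParameterSet, and also shows that a frozen instance rejects assignment.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class BaseParams:
    """Hypothetical stand-in base class with a defaulted field."""
    threshold: float = 1.0


# A subclass adding a field without a default is rejected at class creation.
try:
    @dataclass(frozen=True)
    class BrokenParams(BaseParams):
        a_new_param: float  # no default, but follows a defaulted field

    definition_failed = False
except TypeError:
    definition_failed = True


@dataclass(frozen=True)
class StrictBase:
    """Stand-in base without defaults: subclasses can add plain fields."""
    threshold: float


@dataclass(frozen=True)
class ExtendedParams(StrictBase):
    a_new_param: float
    flag_param: bool


params = ExtendedParams(threshold=1.0, a_new_param=0.25, flag_param=True)

# Frozen dataclasses raise on attribute assignment.
try:
    params.a_new_param = 0.5
    assignment_failed = False
except FrozenInstanceError:
    assignment_failed = True
```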

Now the extended ParameterSet can be used in your own and pre-existing cut functions like so:

import awkward as ak

def select_foos(arr: ak.Array, my_pset: MyParameterSet):
    if my_pset.flag_param:
        return arr < my_pset.a_new_param

    return arr > my_pset.a_new_param

Using Multi-Variate tools

The framework offloads the work required for BDT-like data processing to the TMVA package of the ROOT framework.

Once you have decided which variables to use for the MVA and at which stage of the selection to extract them, you can write them out in the form required for MVA input files.

import awkward as ak
import uproot as up

sel = Selection(...)

# Set containing which branches/fields you want written to the BDT trees
BDT_BRANCHES = {
    "shr_length",
    "shr_open_angle",
    "pfp_true_pdg",
    "pfp_true_origin"
}

# Run the selection
arr = sel.apply_cut(sel.cuts, accumulate=True)

# Define a ROOT output file (replace if one exists)
bdt_dir = up.recreate(OUTPUT_PATH)

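# Keep only events with at least one PFP, then map each branch name
# to its (still jagged) array
selected = arr[list(BDT_BRANCHES)]
selected = selected[ak.num(selected.pfp_true_pdg) > 0]
d = dict(zip(ak.fields(selected), ak.unzip(selected)))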

for k, v in d.items():
    d[k] = ak.flatten(v)

output = ak.zip(d)

# Write the Signal tree based on a condition
bdt_dir["bdt/SignalTree"] = output[
    output.pfp_true_origin == OriginType.SigmaZero.value
]

# Write the Background tree based on the negated condition
bdt_dir["bdt/BackgroundTree"] = output[
    output.pfp_true_origin != OriginType.SigmaZero.value
]