Usage¶
The framework revolves around declaring a Selection. This object encapsulates
the sample files that comprise a selection and the state at each stage (or Cut)
as the selection progresses.
The object has several inputs for the selection:
| Input | Collection | Single | Note |
|---|---|---|---|
| Sample | | | Represents an NTuple produced by playonverbs/HyperonProduction. Also manages sample type and POT. |
| Cut | | | Stores data required for selection performance calculations. |
| ParameterSet | | | Holds selection-dependent parameters, such as the values that certain cuts depend on. |
Alongside these, the Config class keeps options that concern how the selection
operates.
Creating a Selection¶
Attention
A ParameterSet must be defined before the main body of the Selection object.
This is because the arguments passed to a class instance cannot reference one another while the instance is still being defined; they only become available once the entire object has been resolved.
pset = ParameterSet.from_dict({...})

sel = Selection(
    params=pset,
    cuts=[
        Cut(
            "fv",
            lambda arr: fv.in_active_tpc(arr)
        )
    ],
    samples=SampleSet(
        Sample(
            "hyperon",
            "~/mydata/hyperon.root",
            SampleType.Hyperon,
            None
        ),
        Sample(
            "background",
            "~/mydata/background.root",
            SampleType.Background,
            None
        ),
        target_POT=2E20
    )
)
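The reason for this ordering can be seen in a minimal, framework-free sketch. Here `SimpleNamespace` stands in for both `Selection` and `ParameterSet`, plain lists stand in for arrays, and `min_length` is a hypothetical parameter; none of these names come from the framework itself.

```python
from types import SimpleNamespace

# Works: the parameters are bound to a name before the selection is built,
# so the cut lambda can refer to them.
pset = SimpleNamespace(min_length=5.0)  # hypothetical parameter
good = SimpleNamespace(
    params=pset,
    cuts=[lambda lengths: [x > pset.min_length for x in lengths]],
)

# Fails: `params` is only a keyword argument here, not a variable, so the
# lambda has nothing to look up when it is eventually called.
bad = SimpleNamespace(
    params=SimpleNamespace(min_length=5.0),
    cuts=[lambda lengths: [x > params.min_length for x in lengths]],
)

good_mask = good.cuts[0]([1.0, 7.5])

try:
    bad.cuts[0]([1.0, 7.5])
    bad_error = None
except NameError as err:
    bad_error = err
```

Note that the failing version does not raise when the object is built, only when the cut is first evaluated, which makes the mistake easy to miss.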
Defining your own cuts¶
The Cut objects used in the framework support the predefined selection cut
methods through lambda currying. This is needed because many selection
functions take parameters via ParameterSet, whereas the object provided to the
Cut object must have the function signature ak.Array → ak.Array. Wrapping the
function in a lambda allows this.
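The wrapping pattern can be sketched without the framework. In the example below, `DemoParams` and `within_distance` are hypothetical stand-ins (not part of the framework), and plain Python lists take the place of `ak.Array` so that the sketch runs on its own.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class DemoParams:
    """Hypothetical stand-in for ParameterSet."""

    max_distance: float  # cm


def within_distance(distances, pset):
    """Two-argument cut function: needs both the data and the parameters."""
    return [d < pset.max_distance for d in distances]


pset = DemoParams(max_distance=3.0)

# Wrap the two-argument function so that it only takes the data.
curried = lambda arr: within_distance(arr, pset)

# functools.partial is an equivalent standard-library alternative.
curried_partial = partial(within_distance, pset=pset)

mask = curried([1.0, 4.5, 2.9])
```

Either wrapped form takes a single argument, which is the shape of callable that a Cut expects.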
When defining your own cuts the primary requirement is that the cut function
returns a 1-dimensional ak.Array of bool with a length matching the number of
events. In the Datashape language, for 100,000 events this is represented as
100000 * bool.
Cut(
    "foo",
    lambda arr: ...  # replace ... with an expression returning a boolean ak.Array
)
Extending the Available Parameters¶
Your own cuts may require additional parameters that are not supplied by
ParameterSet. To add your own parameters you should create a new frozen
dataclass that inherits from the existing one like so:
from dataclasses import dataclass

from sigmazerosearch.selection import ParameterSet


@dataclass(frozen=True)
class MyParameterSet(ParameterSet):
    a_new_param: float
    """Used in <x> cut, units are GeV"""
    flag_param: bool
    """Flag for <cut>"""
Here we can note a few things:
Fields should not be defined with default values.
Fields should be documented, stating at minimum the field’s physical units (if any).
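Both points can be tried out in a small self-contained sketch. `BaseParams` and `fv_margin` are hypothetical stand-ins for ParameterSet and one of its fields. The explanation offered for avoiding defaults is an inference from how Python dataclasses behave, not something the framework states.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class BaseParams:
    """Hypothetical stand-in for ParameterSet."""

    fv_margin: float  # cm


@dataclass(frozen=True)
class MyParams(BaseParams):
    a_new_param: float  # GeV
    flag_param: bool


pset = MyParams(fv_margin=10.0, a_new_param=0.5, flag_param=True)

# Frozen dataclasses reject assignment after construction.
try:
    pset.a_new_param = 1.0
    frozen_error = None
except FrozenInstanceError as err:
    frozen_error = err


# A field with a default value breaks any later subclass that adds a
# required field, because required fields may not follow defaulted ones.
@dataclass(frozen=True)
class WithDefault(BaseParams):
    optional: float = 1.0


try:

    @dataclass(frozen=True)
    class Broken(WithDefault):
        required: float

    default_error = None
except TypeError as err:
    default_error = err
```

Keeping every field required means the parameter set can be extended again later without hitting this ordering error.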
Now the extended ParameterSet can be used in your own and pre-existing cut functions like so:
import awkward as ak


def select_foos(arr: ak.Array, my_pset: MyParameterSet):
    if my_pset.flag_param:
        return arr < my_pset.a_new_param
    return arr > my_pset.a_new_param
Using Multi-Variate tools¶
The framework offloads the work required for BDT-like data processing to the TMVA package of the ROOT framework.
Once you have decided which MVA variables to use, and at which stage of your selection they should be extracted, you can write them out in the form needed for MVA input files.
import awkward as ak
import uproot as up

sel = Selection(...)

# Set containing which branches/fields you want written to the BDT trees
BDT_BRANCHES = {
    "shr_length",
    "shr_open_angle",
    "pfp_true_pdg",
    "pfp_true_origin"
}

# Run the selection
arr = sel.apply_cut(sel.cuts, accumulate=True)

# Define a ROOT output file (replace if one exists)
bdt_dir = up.recreate(OUTPUT_PATH)

# Keep only the BDT branches, drop events with no entries in pfp_true_pdg,
# then map each field name to its (still per-event) array
d = dict(
    zip(
        ak.fields(arr[BDT_BRANCHES]),
        ak.unzip(
            arr[BDT_BRANCHES][
                ak.num(
                    arr[BDT_BRANCHES].pfp_true_pdg
                )
                > 0
            ]
        ),
    )
)

# Flatten each field from per-event lists to one entry per particle
for k, v in d.items():
    d[k] = ak.flatten(v)

output = ak.zip(d)

# Write the Signal tree based on a condition
bdt_dir["bdt/SignalTree"] = output[
    output.pfp_true_origin == OriginType.SigmaZero.value
]

# Write the Background tree based on the negated condition
bdt_dir["bdt/BackgroundTree"] = output[
    output.pfp_true_origin != OriginType.SigmaZero.value
]