Go-HEP: Providing robust concurrent software for HEP
Computing Round Table @JLAB, 2018-04-03
Sebastien Binet
CNRS/IN2P3/LPC-Clermont
FORTRAN77 → C++ → Python
Software in HEP is mostly C++/Python (with "pockets" of Java and Fortran.)
Painful to develop:
C++ is a complex language to learn, read, write and maintain.
Painful to use:
End-users tend to prefer Python because of its nicer development cycle, despite its runtime performances (or lack thereof.)
Software is painful and does not perform well:
slow to compile (C++ templates), slow to deploy (.so/.dll), slow to run (Python), resource hungry (CPU, VMem, people)
Parallelism and concurrency need to be exposed and leveraged, but the language (C++14, C++17, ...) is ill-equipped for these tasks.
And C++ is not well adapted for large, distributed development teams (of varying programming skills.)
Time for something new?
package main

import "fmt"

func main() {
	lang := "Go"
	fmt.Printf("Hello from %s\n", lang)
}
$ go run hello.go
Hello from Go
A nice language with a nice mascot.
Available on all major platforms (Linux, Windows, macOS, Android, iOS, ...) and for many architectures (amd64, arm, arm64, i386, s390x, mips64, ...)
Go's concurrency primitives - goroutines and channels - derive from Hoare's Communicating Sequential Processes (CSP.)
Goroutines are like threads: they share memory.
But cheaper:
go f()
go f(x, y, ...)
We need an example to show the interesting properties of the concurrency primitives.
To avoid distraction, we make it a boring example.
package main
import (
"fmt"
"log"
"time"
)
func main() {
boring("boring!")
}
func boring(msg string) {
	for i := 0; ; i++ {
		fmt.Println(msg, i)
		time.Sleep(time.Second)
	}
}
func init() {
// hack to make this program runnable on the playground.
go func() {
time.Sleep(10 * time.Second)
log.Fatalf("time's up")
}()
}
Make the intervals between messages unpredictable (still under a second).
func boring(msg string) {
	for i := 0; ; i++ {
		fmt.Println(msg, i)
		time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
	}
}
The boring function runs on forever, like a boring party guest.
package main
import (
"fmt"
"log"
"math/rand"
"time"
)
func init() {
// hack to make this program runnable on the playground.
go func() {
time.Sleep(10 * time.Second)
log.Fatalf("time's up")
}()
}
func main() {
	boring("boring!")
}

func boring(msg string) {
	for i := 0; ; i++ {
		fmt.Println(msg, i)
		time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
	}
}
The go statement runs the function as usual, but doesn't make the caller wait.
It launches a goroutine.
The functionality is analogous to the & on the end of a shell command.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	go boring("boring!")
}
func boring(msg string) {
for i := 0; ; i++ {
fmt.Println(msg, i)
time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
}
}
When main returns, the program exits and takes the boring function down with it.
We can hang around a little, and on the way show that both main and the launched goroutine are running.
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
	go boring("boring!")
	fmt.Println("I'm listening.")
	time.Sleep(2 * time.Second)
	fmt.Println("You're boring; I'm leaving.")
}
func boring(msg string) {
for i := 0; ; i++ {
fmt.Println(msg, i)
time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
}
}
What is a goroutine? It's an independently executing function, launched by a go statement.
It has its own call stack, which grows and shrinks as required.
It's very cheap. It's practical to have thousands, even hundreds of thousands of goroutines.
It's not a thread.
There might be only one thread in a program with thousands of goroutines.
Instead, goroutines are multiplexed dynamically onto threads as needed to keep all the goroutines running.
But if you think of it as a very cheap thread, you won't be far off.
Our boring examples cheated: the main function couldn't see the output from the other goroutine.
It was just printed to the screen, where we pretended we saw a conversation.
Real conversations require communication.
A channel in Go provides a connection between two goroutines, allowing them to communicate.
// Declaring and initializing.
var c chan int
c = make(chan int)
// or
c := make(chan int)
// Sending on a channel.
c <- 1
// Receiving from a channel.
// The "arrow" indicates the direction of data flow.
value = <-c
A channel connects the main and boring goroutines so they can communicate.
package main
import (
"fmt"
"math/rand"
"time"
)
func main() {
	c := make(chan string)
	go boring("boring!", c)
	for i := 0; i < 5; i++ {
		fmt.Printf("You say: %q\n", <-c) // Receive expression is just a value.
	}
	fmt.Println("You're boring; I'm leaving.")
}
func boring(msg string, c chan string) {
for i := 0; ; i++ {
c <- fmt.Sprintf("%s %d", msg, i) // Expression to be sent can be any suitable value.
time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
}
}
When the main function executes <-c, it will wait for a value to be sent.
Similarly, when the boring function executes c <- value, it waits for a receiver to be ready.
A sender and receiver must both be ready to play their part in the communication. Otherwise we wait until they are.
Thus channels both communicate and synchronize.
package main
import "fmt"
func f(left, right chan int) {
	left <- 1 + <-right
}

func main() {
	const n = 10000
	leftmost := make(chan int)
	right := leftmost
	left := leftmost
	for i := 0; i < n; i++ {
		right = make(chan int)
		go f(left, right)
		left = right
	}
	go func(c chan int) { c <- 1 }(right)
	fmt.Println(<-leftmost)
}
"Don't communicate by sharing memory, share memory by communicating."
Goroutines and channels make it easy to express complex operations dealing with:
And they're fun to use.
golang.org/doc/effective_go.html
About concurrency:
talks.golang.org/2012/waza.slide
talks.golang.org/2013/advconc.slide
About s/w engineering with Go:
talks.golang.org/2012/splash.article
What can Go bring to science and HEP?
$EXPERIMENT is taking data:
Convincing physicists:
Go is useful, pleasant to use and viable.
Physics results can be obtained with Go, faster than by other means.
Need to implement the usual HEP tools, without C++/ROOT.
=> go-hep.org is the beginning of such an endeavour
Two work areas have been identified to demonstrate Go's applicability for HEP use cases:
DAQ, monitoring, control command
DAQ, monitoring and control command need I/O, network libraries and performance.
Go is a great fit for these and has concurrency building blocks to top it all.
These have already been tested at LPC and are gaining traction outside:
Started to gather DAQ-HEP oriented packages under the go-daq organization.
fads is a "FAst Detector Simulation" toolkit.
HEP event loop (initialize | process-events | finalize)
HepMC input/output
Code is on github (BSD-3):
Documentation is served by godoc.org:
godoc.org/go-hep.org/x/hep/fwk
godoc.org/go-hep.org/x/hep/fads
As easy as:
$ export GOPATH=$HOME/dev/gocode
$ export PATH=$GOPATH/bin:$PATH
$ go get go-hep.org/x/hep/fads/...
Yes, with the ellipsis at the end, to also install sub-packages.
go get will recursively download and install all the packages that hep/fads depends on.
No Makefile, no CMakeLists.txt involved in the process.
fwk enables:
fwk relies on Go's runtime to properly schedule goroutines.
For sub-task concurrency, users are by construction required to use Go's constructs (goroutines and channels) so everything is consistent and the runtime has the complete picture.
$ go get go-hep.org/x/hep/fads/cmd/fads-app
$ fads-app -help
Usage: fads-app [options] <hepmc-input-file>

ex:
 $ fads-app -l=INFO -evtmax=-1 ./testdata/hepmc.data

options:
  -cpu-prof=false: enable CPU profiling
  -evtmax=-1: number of events to process
  -l="INFO": log level (DEBUG|INFO|WARN|ERROR)
  -nprocs=0: number of concurrent events to process
HepMC converter.
Caveats: jet clustering is not the original C++-FastJet.
$> time delphes ./input.hepmc
$> time fads-app ./input.hepmc
Delphes (up to calorimetry).
The Rivet toolkit (Robust Independent Validation of Experiment and Theory) is a system for validation of Monte Carlo event generators. It provides a large (and ever growing) set of experimental analyses useful for MC generator development, validation, and tuning, as well as a convenient infrastructure for adding your own analyses.
$> repeat 10 'time rivet --analysis=MC_GENERIC -q ./Z-hadronic-LEP.hepmc > /dev/null'
real=13.32 user=12.97 sys=0.33 CPU=99% MaxRSS=26292
real=13.31 user=12.93 sys=0.37 CPU=99% MaxRSS=26356
real=13.29 user=12.93 sys=0.35 CPU=99% MaxRSS=26440
real=13.31 user=12.95 sys=0.35 CPU=99% MaxRSS=26356
real=13.29 user=13.01 sys=0.27 CPU=99% MaxRSS=26280
real=13.31 user=12.97 sys=0.32 CPU=99% MaxRSS=26328
real=13.35 user=12.93 sys=0.41 CPU=99% MaxRSS=26276
real=13.30 user=12.96 sys=0.33 CPU=99% MaxRSS=26624
real=13.30 user=12.93 sys=0.36 CPU=99% MaxRSS=26440
real=13.35 user=12.98 sys=0.36 CPU=99% MaxRSS=26484
Reimplementation on top of go-hep/fwk+fads of the MC_GENERIC analysis.
Bit-to-bit identical results.
$> go get go-hep.org/x/hep/fads/cmd/fads-rivet-mc-generic
$> repeat 10 'time fads-rivet-mc-generic -nprocs=1 ./Z-hadronic-LEP.hepmc > /dev/null'
real=6.04 user=5.66 sys=0.12 CPU= 95% MaxRSS=23384
real=5.70 user=5.62 sys=0.09 CPU=100% MaxRSS=21128
real=5.71 user=5.58 sys=0.11 CPU= 99% MaxRSS=22208
real=5.68 user=5.60 sys=0.08 CPU=100% MaxRSS=23156
real=5.71 user=5.63 sys=0.08 CPU=100% MaxRSS=20672
real=5.78 user=5.62 sys=0.09 CPU= 98% MaxRSS=22328
real=5.67 user=5.62 sys=0.05 CPU=100% MaxRSS=20968
real=5.68 user=5.57 sys=0.07 CPU= 99% MaxRSS=23748
real=5.70 user=5.60 sys=0.10 CPU=100% MaxRSS=21360
real=5.72 user=5.65 sys=0.07 CPU=100% MaxRSS=22764
Go-HEP provides some amount of interoperability with ROOT-{5,6} via go-hep.org/x/hep/rootio, a pure-Go package (no C++, no ROOT, no PyROOT, just Go) that:
decodes and understands the structure of TFiles, TKeys, TDirectory and TStreamerInfos,
decodes TH1x, TH2x, TLeaf, TBranch and TTrees.
For the moment, only the "I" part of ROOT I/O has been implemented (O is starting), but it's already quite useful:
cmd/root-ls
cmd/root-dump, cmd/root-diff, cmd/root-print, cmd/root-gen-datareader
cmd/root2csv
cmd/root2npy, cmd/root2yoda
cmd/root-srv
f, err := rootio.Open("my-file.root")
obj, err := f.Get("my-tree")
tree := obj.(rootio.Tree)
type Data struct {
I64 int64 `rootio:"Int64"`
F64 float64 `rootio:"Float64"`
Str string `rootio:"Str"`
ArrF64 [10]float64 `rootio:"ArrayFloat64"`
N int32 `rootio:"N"`
SliF64 []float64 `rootio:"SliceFloat64"`
}
var data Data
sc, err := rootio.NewScanner(tree, &data)
for sc.Next() {
err := sc.Scan()
if err != nil {
log.Fatal(err)
}
fmt.Printf("entry[%d]: %+v\n", sc.Entry(), data)
}

Supported types: std::vector<T> (where T is a C/C++ builtin or a std::string / TString), another user defined class, std::string or TString, static/dynamic arrays of C/C++ builtins and, of course, C/C++ builtins.

struct P3 { int32_t Px; double Py; int32_t Pz; };
struct Event {
int16_t I16; int32_t I32; int64_t I64; uint32_t U32;
float F32; double F64;
TString TStr; std::string StdStr;
P3 P3;
int16_t ArrayI16[ARRAYSZ]; int32_t ArrayI32[ARRAYSZ];
double ArrayF64[ARRAYSZ];
int32_t N;
int16_t *SliceI16; //[N]
int32_t *SliceI32; //[N]
double *SliceF64; //[N]
std::vector<int64_t> StlVecI64; std::vector<std::string> StlVecStr;
};

$> ll *root
-rw-r--r-- 1 binet binet 686M Aug 16 11:00 f64s-default-compress.root
-rw-r--r-- 1 binet binet 764M Aug 16 15:39 f64s-no-compress.root

$> root-ls -t f64s-no-compress.root
=== [f64s-no-compress.root] ===
version: 61002
TTree tree tree (entries=1000000)
  Scalar_0  "Scalar_0/D"  TBranch
  Scalar_1  "Scalar_1/D"  TBranch
  Scalar_2  "Scalar_2/D"  TBranch
  [...]
  Scalar_98 "Scalar_98/D" TBranch
  Scalar_99 "Scalar_99/D" TBranch
auto f = TFile::Open(argv[1], "read");
auto t = (TTree*)f->Get("tree");
const Long_t BRANCHES= 100;
Double_t v[BRANCHES] = {0};
for (int i = 0; i < BRANCHES; i++) {
auto n = TString::Format("Scalar_%d", i);
t->SetBranchAddress(n, &v[i]);
}
Long_t entries = t->GetEntries();
Double_t sum = 0;
for ( Long_t i = 0; i < entries; i++ ) {
t->GetEntry(i);
sum += v[0];
}
std::cout << "sum= " << sum << "\n";

f, err := rootio.Open(flag.Arg(0))
obj, err := f.Get("tree")
t := obj.(rootio.Tree)
var vs [100]float64
var svars []rootio.ScanVar
for i := range vs {
svars = append(svars, rootio.ScanVar{
Name: fmt.Sprintf("Scalar_%d", i),
Value: &vs[i],
})
}
sum := 0.0
scan, err := rootio.NewScannerVars(t, svars...)
for scan.Next() {
err = scan.Scan()
if err != nil {
log.Fatal(err)
}
sum += vs[0]
}
fmt.Printf("sum= %v\n", sum)

$> time ./cxx-read-data
$> time ./go-read-data

=== ROOT === (VMem=517Mb)
real=6.70 user=6.18 sys=0.51 CPU= 99% MaxRSS=258296
real=6.84 user=6.32 sys=0.51 CPU= 99% MaxRSS=257748
real=6.82 user=6.29 sys=0.52 CPU= 99% MaxRSS=258348
real=6.66 user=6.13 sys=0.53 CPU=100% MaxRSS=258440

=== go-hep/rootio === (VMem=43Mb)
real=12.94 user=12.39 sys=0.56 CPU=100% MaxRSS=42028
real=12.93 user=12.37 sys=0.56 CPU=100% MaxRSS=42072
real=12.96 user=12.38 sys=0.58 CPU=100% MaxRSS=41984
real=12.94 user=12.36 sys=0.57 CPU=100% MaxRSS=42048
=== ROOT === (VMem=529Mb)
real=20.61 user=11.86 sys=0.63 CPU=60% MaxRSS=292640
real=12.56 user=11.54 sys=0.51 CPU=96% MaxRSS=290124
real=12.04 user=11.50 sys=0.52 CPU=99% MaxRSS=290444
real=12.05 user=11.54 sys=0.50 CPU=99% MaxRSS=290324

=== go-hep/rootio === (VMem=83Mb)
real=36.43 user=35.20 sys=0.69 CPU= 98% MaxRSS=81196
real=35.75 user=35.15 sys=0.63 CPU=100% MaxRSS=81644
real=35.76 user=35.10 sys=0.69 CPU=100% MaxRSS=81856
real=35.70 user=35.18 sys=0.54 CPU=100% MaxRSS=81944
Only ~2 times slower, w/o any optimization wrt baskets buffering, TTreeCache, ...
No concurrency (yet.)
Of course, go-hep/rootio provides less features than ROOT, isn't as battle-tested and is probably full of bugs.
But it's in the same order of magnitude, performance-wise.
Go improves on C/C++/Java/... and addresses C/C++ and Python deficiencies:
$> GOARCH=arm64 GOOS=linux go build -o foo-linux-arm64.exe
$> GOARCH=amd64 GOOS=windows go build -o foo-windows-64b.exe
and:
Go-HEP provides some building blocks that are already competitive with battle-tested C++ programs, in terms of CPU, memory usage and harnessing of multiple cores.
Further improvements are still necessary in the ROOT I/O compatibility part:
Further improvements in the Jupyter area are also warranted (but that's tackled by the Go data science community at large):
Go-HEP has also users outside LPC-Clermont:
58
Go is great at writing small and large (concurrent) programs.
Also true for science-y programs, even if the ecosystem of libraries can still be improved.
Write your next tool/analysis/simulation/software in Go?
rio files to YODA
ROOT TTrees to NumPy .np(y|z) files
ROOT files to YODA
rio
ROOT files' content
ROOT files' TTrees content
ROOT files' TTrees content, event by event
ROOT file
ROOT files' content (plots TH1x, TH2x, TTrees)
PAW in Go

talks.golang.org/2012/splash.slide
talks.golang.org/2012/goforc.slide
talks.golang.org/2012/waza.slide
talks.golang.org/2012/concurrency.slide
talks.golang.org/2013/advconc.slide
talks.golang.org/2014/gocon-tokyo.slide
talks.golang.org/2015/simplicity-is-complicated.slide
talks.golang.org/2016/applicative.slide
agenda.infn.it/getFile.py/access?contribId=24&sessionId=3&resId=0&materialId=slides&confId=11680