Go-HEP: Providing robust concurrent software for HEP

Computing Round Table @JLAB, 2018-04-03

Sebastien Binet

CNRS/IN2P3/LPC-Clermont

Software in HEP

Software in HEP is mostly C++/Python (with "pockets" of Java and Fortran).


Software in HEP is painful

Painful to develop:


Software in HEP is painful

Painful to use:

End-users tend to prefer Python because of its nicer development cycle, despite its runtime performance (or lack thereof).


Software in HEP: optimization and performances

Software is painful and does not perform well:

Parallelism and concurrency need to be exposed and leveraged, but the language (C++14, C++17, ...) is ill-equipped for these tasks.

And C++ is not well adapted for large, distributed development teams (of varying programming skills.)

Time for something new?


Are those our only options?


Enter... Go


What is Go?

package main

import "fmt"

func main() {
    lang := "Go"
    fmt.Printf("Hello from %s\n", lang)
}

$ go run hello.go
Hello from Go

A nice language with a nice mascot.


History


Elements of Go

Available on all major platforms (Linux, Windows, macOS, Android, iOS, ...) and for many architectures (amd64, arm, arm64, i386, s390x, mips64, ...)


Concurrency

Go's concurrency primitives - goroutines and channels - derive from Hoare's Communicating Sequential Processes (CSP.)

Goroutines are like threads: they share memory.

But cheaper:


Goroutines

go f()
go f(x, y, ...)

Concurrency: basic examples


A boring function

We need an example to show the interesting properties of the concurrency primitives.
To avoid distraction, we make it a boring example.

package main

import (
	"fmt"
	"log"
	"time"
)

func main() {
	boring("boring!")
}

func boring(msg string) {
    for i := 0; ; i++ {
        fmt.Println(msg, i)
        time.Sleep(time.Second)
    }
}

func init() {
	// hack to make this program runnable on the playground.
	go func() {
		time.Sleep(10 * time.Second)
		log.Fatalf("time's up")
	}()
}

Slightly less boring

Make the intervals between messages unpredictable (still under a second).

func boring(msg string) {
    for i := 0; ; i++ {
        fmt.Println(msg, i)
        time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
    }
}

Running it

The boring function runs on forever, like a boring party guest.

package main

import (
	"fmt"
	"log"
	"math/rand"
	"time"
)

func init() {
	// hack to make this program runnable on the playground.
	go func() {
		time.Sleep(10 * time.Second)
		log.Fatalf("time's up")
	}()
}

func main() {
    boring("boring!")
}

func boring(msg string) {
    for i := 0; ; i++ {
        fmt.Println(msg, i)
        time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
    }
}

Ignoring it

The go statement runs the function as usual, but doesn't make the caller wait.

It launches a goroutine.

The functionality is analogous to the & on the end of a shell command.

package main

import (
    "fmt"
    "math/rand"
    "time"
)

func main() {
    go boring("boring!")
}

func boring(msg string) {
	for i := 0; ; i++ {
		fmt.Println(msg, i)
		time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
	}
}

Ignoring it a little less

When main returns, the program exits and takes the boring function down with it.

We can hang around a little, and on the way show that both main and the launched goroutine are running.

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
    go boring("boring!")
    fmt.Println("I'm listening.")
    time.Sleep(2 * time.Second)
    fmt.Println("You're boring; I'm leaving.")
}

func boring(msg string) {
	for i := 0; ; i++ {
		fmt.Println(msg, i)
		time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
	}
}

Goroutines

What is a goroutine? It's an independently executing function, launched by a go statement.

It has its own call stack, which grows and shrinks as required.

It's very cheap. It's practical to have thousands, even hundreds of thousands of goroutines.

It's not a thread.

There might be only one thread in a program with thousands of goroutines.

Instead, goroutines are multiplexed dynamically onto threads as needed to keep all the goroutines running.

But if you think of it as a very cheap thread, you won't be far off.
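
To make the "very cheap" claim concrete, here is a minimal sketch that launches a large number of goroutines and waits for all of them. The function name sumWithGoroutines is made up for this illustration, and sync.WaitGroup (standard library, not shown on these slides) is used only to wait for completion.

```go
package main

import (
	"fmt"
	"sync"
)

// sumWithGoroutines launches n goroutines. Goroutine i stores i in its own
// slot of a slice; the function then returns the sum of all the slots.
func sumWithGoroutines(n int) int64 {
	results := make([]int64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = int64(i) // each goroutine writes to its own slot
		}(i)
	}
	wg.Wait() // block until every goroutine has finished

	var sum int64
	for _, v := range results {
		sum += v
	}
	return sum
}

func main() {
	// One hundred thousand goroutines is a perfectly ordinary workload.
	fmt.Println(sumWithGoroutines(100000))
}
```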


Communication

Our boring examples cheated: the main function couldn't see the output from the other goroutine.

It was just printed to the screen, where we pretended we saw a conversation.

Real conversations require communication.


Channels

A channel in Go provides a connection between two goroutines, allowing them to communicate.

    // Declaring and initializing.
    var c chan int
    c = make(chan int)
    // or
    c := make(chan int)
    // Sending on a channel.
    c <- 1
    // Receiving from a channel.
    // The "arrow" indicates the direction of data flow.
    value = <-c
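
The fragments above are not a complete program. A minimal runnable sketch that puts them together could look like this (roundTrip is a name chosen for this illustration only):

```go
package main

import "fmt"

// roundTrip sends v on a fresh unbuffered channel from a new goroutine
// and returns the value received on the other end.
func roundTrip(v int) int {
	c := make(chan int)    // declare and initialize
	go func() { c <- v }() // send on the channel from another goroutine
	value := <-c           // receive from the channel
	return value
}

func main() {
	fmt.Println(roundTrip(1)) // prints 1
}
```

The send has to happen in a separate goroutine: on an unbuffered channel, a send in main with nobody receiving would block forever.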

Using channels

A channel connects the main and boring goroutines so they can communicate.

package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
    c := make(chan string)
    go boring("boring!", c)
    for i := 0; i < 5; i++ {
        fmt.Printf("You say: %q\n", <-c) // Receive expression is just a value.
    }
    fmt.Println("You're boring; I'm leaving.")
}

func boring(msg string, c chan string) {
	for i := 0; ; i++ {
		c <- fmt.Sprintf("%s %d", msg, i) // Expression to be sent can be any suitable value.
		time.Sleep(time.Duration(rand.Intn(1e3)) * time.Millisecond)
	}
}

Synchronization

When the main function executes <-c, it will wait for a value to be sent.

Similarly, when the boring function executes c <- value, it waits for a receiver to be ready.

A sender and receiver must both be ready to play their part in the communication. Otherwise we wait until they are.

Thus channels both communicate and synchronize.
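
A small sketch, under the assumption of an unbuffered channel, that makes this waiting observable: the receiver stays busy for a given delay, and we measure when the send completes. The name sendDuration is hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// sendDuration reports how long after its launch a send on an unbuffered
// channel completes, when the receiver only becomes ready after delay.
func sendDuration(delay time.Duration) time.Duration {
	c := make(chan string)
	done := make(chan time.Duration)
	start := time.Now()
	go func() {
		c <- "ping"               // blocks until a receiver is ready
		done <- time.Since(start) // reached only once the send has completed
	}()
	time.Sleep(delay) // the receiver is busy: nobody is listening yet
	<-c               // the receiver is ready: the send can now complete
	return <-done
}

func main() {
	delay := 200 * time.Millisecond
	elapsed := sendDuration(delay)
	fmt.Println("send completed only after the receiver was ready:", elapsed >= delay)
}
```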


Daisy-chain

package main

import "fmt"

func f(left, right chan int) {
    left <- 1 + <-right
}

func main() {
    const n = 10000
    leftmost := make(chan int)
    right := leftmost
    left := leftmost
    for i := 0; i < n; i++ {
        right = make(chan int)
        go f(left, right)
        left = right
    }
    go func(c chan int) { c <- 1 }(right)
    fmt.Println(<-leftmost)
}

Chinese whispers, gopher style


The Go approach

"Don't communicate by sharing memory, share memory by communicating."

Goroutines and channels make it easy to express complex operations dealing with:

And they're fun to use.
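
As an illustration of the motto, here is a minimal sketch in which goroutines never touch a shared accumulator: each one sends its result over a channel, and a single goroutine owns the total. The function name sumSquares is chosen for this example only.

```go
package main

import "fmt"

// sumSquares computes 1*1 + 2*2 + ... + n*n. Each term is computed in its
// own goroutine and handed back over a channel, instead of having the
// goroutines update a shared variable behind a lock.
func sumSquares(n int) int {
	results := make(chan int)
	for i := 1; i <= n; i++ {
		go func(i int) { results <- i * i }(i)
	}
	total := 0
	for i := 0; i < n; i++ {
		total += <-results // only this goroutine ever touches total
	}
	return total
}

func main() {
	fmt.Println(sumSquares(10)) // prints 385
}
```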


More on Go

About concurrency:

About s/w engineering with Go:


Go for HEP

What can Go bring to science and HEP?


Go for HEP: challenges

$EXPERIMENT is taking data:

Convincing physicists:

Need to implement:

=> go-hep.org is the beginning of such an endeavour


Go-HEP

Two work areas have been identified to demonstrate Go's applicability to HEP use cases:


Go-DAQ

DAQ, monitoring and control-command need I/O, network libraries and performance.
Go is a great fit for these and has concurrency building blocks to top it all.

This has already been tested at LPC and is gaining traction outside:

Started to gather DAQ-HEP oriented packages under the go-daq organization.


Go-DAQ (LSST)


Go-DAQ (AVIRM)


Go-HEP - fast-simulation


fads

fads is a "FAst Detector Simulation" toolkit.

Code is on github (BSD-3):

Documentation is served by godoc.org:


go-hep/fads - Installation

As easy as:

$ export GOPATH=$HOME/dev/gocode
$ export PATH=$GOPATH/bin:$PATH

$ go get go-hep.org/x/hep/fads/...

Yes, with the ellipsis at the end, to also install sub-packages.


go-hep/fwk - Concurrency

fwk enables:

fwk relies on Go's runtime to properly schedule goroutines.

For sub-task concurrency, users are by construction required to use Go's constructs (goroutines and channels) so everything is consistent and the runtime has the complete picture.
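
The following is not fwk code. It is a hedged sketch, using only goroutines, channels and sync.WaitGroup, of what processing several events concurrently with a bounded number of workers can look like. The names event, processEvents and task are hypothetical and do not come from the fwk API.

```go
package main

import (
	"fmt"
	"sync"
)

// event is a hypothetical stand-in for whatever a framework passes around.
type event struct {
	id     int
	energy float64
}

// processEvents runs task on every event with at most nprocs events in
// flight at the same time. Results are returned in the order of evts.
func processEvents(evts []event, nprocs int, task func(event) float64) []float64 {
	if nprocs < 1 {
		nprocs = 1
	}
	out := make([]float64, len(evts))
	queue := make(chan int) // indices of the events still to be processed
	var wg sync.WaitGroup
	for w := 0; w < nprocs; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range queue {
				out[i] = task(evts[i]) // each index is handled by one worker only
			}
		}()
	}
	for i := range evts {
		queue <- i
	}
	close(queue) // no more work: lets the workers leave their loops
	wg.Wait()
	return out
}

func main() {
	evts := []event{{0, 1.5}, {1, 2.5}, {2, 4.0}}
	scaled := processEvents(evts, 2, func(e event) float64 { return 2 * e.energy })
	fmt.Println(scaled) // prints [3 5 8]
}
```

Because the workers are ordinary goroutines, the Go runtime schedules them alongside any goroutines the task itself starts.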


go-hep/fads - real world use case

$ go get go-hep.org/x/hep/fads/cmd/fads-app
$ fads-app -help
Usage: fads-app [options] <hepmc-input-file>

ex:
 $ fads-app -l=INFO -evtmax=-1 ./testdata/hepmc.data

options:
  -cpu-prof=false: enable CPU profiling
  -evtmax=-1: number of events to process
  -l="INFO": log level (DEBUG|INFO|WARN|ERROR)
  -nprocs=0: number of concurrent events to process
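
For readers new to Go, options like the ones listed above can be declared with the standard flag package. The sketch below is an assumption about how such a command line could be wired, not the actual fads-app source; the options struct and parseArgs are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
)

// options mirrors the flags listed in the help message above.
type options struct {
	cpuProf bool
	evtmax  int
	lvl     string
	nprocs  int
}

// parseArgs parses args and returns the options, the remaining positional
// arguments (the input files) and any parsing error.
func parseArgs(args []string) (options, []string, error) {
	var o options
	fs := flag.NewFlagSet("fads-app", flag.ContinueOnError)
	fs.BoolVar(&o.cpuProf, "cpu-prof", false, "enable CPU profiling")
	fs.IntVar(&o.evtmax, "evtmax", -1, "number of events to process")
	fs.StringVar(&o.lvl, "l", "INFO", "log level (DEBUG|INFO|WARN|ERROR)")
	fs.IntVar(&o.nprocs, "nprocs", 0, "number of concurrent events to process")
	err := fs.Parse(args)
	return o, fs.Args(), err
}

func main() {
	o, files, err := parseArgs([]string{"-l=DEBUG", "-evtmax=100", "./testdata/hepmc.data"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v %v\n", o, files)
}
```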

go-hep/fads - components

Caveats:


Results - testbenches

$> time delphes ./input.hepmc
$> time fads-app ./input.hepmc

fads: Results & Conclusions


Rivet & fads


Rivet

The Rivet toolkit (Robust Independent Validation of Experiment and Theory) is a system for validation of Monte Carlo event generators. It provides a large (and ever growing) set of experimental analyses useful for MC generator development, validation, and tuning, as well as a convenient infrastructure for adding your own analyses.

$> repeat 10 'time rivet --analysis=MC_GENERIC -q  ./Z-hadronic-LEP.hepmc > /dev/null'
real=13.32 user=12.97 sys=0.33 CPU=99% MaxRSS=26292
real=13.31 user=12.93 sys=0.37 CPU=99% MaxRSS=26356
real=13.29 user=12.93 sys=0.35 CPU=99% MaxRSS=26440
real=13.31 user=12.95 sys=0.35 CPU=99% MaxRSS=26356
real=13.29 user=13.01 sys=0.27 CPU=99% MaxRSS=26280
real=13.31 user=12.97 sys=0.32 CPU=99% MaxRSS=26328
real=13.35 user=12.93 sys=0.41 CPU=99% MaxRSS=26276
real=13.30 user=12.96 sys=0.33 CPU=99% MaxRSS=26624
real=13.30 user=12.93 sys=0.36 CPU=99% MaxRSS=26440
real=13.35 user=12.98 sys=0.36 CPU=99% MaxRSS=26484

fads-rivet-mc-generic

Reimplementation on top of go-hep/fwk+fads of the MC_GENERIC analysis.

Bit-for-bit identical results.

$> go get go-hep.org/x/hep/fads/cmd/fads-rivet-mc-generic

$> repeat 10 'time fads-rivet-mc-generic -nprocs=1 ./Z-hadronic-LEP.hepmc > /dev/null'
real=6.04 user=5.66 sys=0.12 CPU= 95% MaxRSS=23384
real=5.70 user=5.62 sys=0.09 CPU=100% MaxRSS=21128
real=5.71 user=5.58 sys=0.11 CPU= 99% MaxRSS=22208
real=5.68 user=5.60 sys=0.08 CPU=100% MaxRSS=23156
real=5.71 user=5.63 sys=0.08 CPU=100% MaxRSS=20672
real=5.78 user=5.62 sys=0.09 CPU= 98% MaxRSS=22328
real=5.67 user=5.62 sys=0.05 CPU=100% MaxRSS=20968
real=5.68 user=5.57 sys=0.07 CPU= 99% MaxRSS=23748
real=5.70 user=5.60 sys=0.10 CPU=100% MaxRSS=21360
real=5.72 user=5.65 sys=0.07 CPU=100% MaxRSS=22764
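
To put one number on the two listings, the short program below averages the wall-clock ("real") times of the ten Rivet runs and of the ten fads-rivet-mc-generic runs, then prints their ratio. The values are copied from the listings; the helper name mean is ours.

```go
package main

import "fmt"

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// Wall-clock ("real") times in seconds, copied from the two listings.
var (
	rivetReal = []float64{13.32, 13.31, 13.29, 13.31, 13.29, 13.31, 13.35, 13.30, 13.30, 13.35}
	fadsReal  = []float64{6.04, 5.70, 5.71, 5.68, 5.71, 5.78, 5.67, 5.68, 5.70, 5.72}
)

func main() {
	r, f := mean(rivetReal), mean(fadsReal)
	// prints: rivet=13.31s fads=5.74s ratio=2.3
	fmt.Printf("rivet=%.2fs fads=%.2fs ratio=%.1f\n", r, f, r/f)
}
```

On this input and with a single process (-nprocs=1), the Go reimplementation is thus about 2.3 times faster in wall-clock time, with a slightly smaller MaxRSS.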

ROOT I/O

Go-HEP provides some amount of interoperability with ROOT-{5,6} via go-hep.org/x/hep/rootio, a pure-Go package (no C++, no ROOT, no PyROOT, just Go) that:

For the moment, only the "I" part of ROOT I/O has been implemented (O is starting), but it's already quite useful:


root-srv (served by AppEngine)


ROOT I/O

f, err := rootio.Open("my-file.root")
if err != nil {
    log.Fatal(err)
}
obj, err := f.Get("my-tree")
if err != nil {
    log.Fatal(err)
}
tree := obj.(rootio.Tree)

type Data struct {
    I64    int64       `rootio:"Int64"`
    F64    float64     `rootio:"Float64"`
    Str    string      `rootio:"Str"`
    ArrF64 [10]float64 `rootio:"ArrayFloat64"`
    N      int32       `rootio:"N"`
    SliF64 []float64   `rootio:"SliceFloat64"`
}

var data Data
sc, err := rootio.NewScanner(tree, &data)
if err != nil {
    log.Fatal(err)
}

for sc.Next() {
    err := sc.Scan()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("entry[%d]: %+v\n", sc.Entry(), data)
}

ROOT I/O features

struct P3 { int32_t Px; double  Py; int32_t Pz; };

struct Event {
  int16_t  I16;  int32_t  I32; int64_t  I64; uint32_t U32;
  float    F32;  double   F64;
  TString  TStr; std::string StdStr;
  P3       P3;
  int16_t  ArrayI16[ARRAYSZ]; int32_t  ArrayI32[ARRAYSZ];
  double   ArrayF64[ARRAYSZ];
  int32_t  N;
  int16_t  *SliceI16;  //[N]
  int32_t  *SliceI32;  //[N]
  double   *SliceF64;  //[N]
  std::vector<int64_t> StlVecI64; std::vector<std::string> StlVecStr;
};

ROOT I/O performances

$> ll *root
-rw-r--r-- 1 binet binet 686M Aug 16 11:00 f64s-default-compress.root
-rw-r--r-- 1 binet binet 764M Aug 16 15:39 f64s-no-compress.root

$> root-ls -t f64s-no-compress.root
=== [f64s-no-compress.root] ===
version: 61002
TTree       tree          tree    (entries=1000000)
  Scalar_0  "Scalar_0/D"  TBranch
  Scalar_1  "Scalar_1/D"  TBranch
  Scalar_2  "Scalar_2/D"  TBranch
  [...]
  Scalar_98 "Scalar_98/D" TBranch
  Scalar_99 "Scalar_99/D" TBranch

ROOT - C++

auto f = TFile::Open(argv[1], "read");
auto t = (TTree*)f->Get("tree");

const Long_t BRANCHES = 100;

Double_t v[BRANCHES] = {0};

for (int i = 0; i < BRANCHES; i++) {
    auto n = TString::Format("Scalar_%d", i);
    t->SetBranchAddress(n, &v[i]);
}

Long_t entries = t->GetEntries();
Double_t sum = 0;
for ( Long_t i = 0; i < entries; i++ ) {
    t->GetEntry(i);
    sum += v[0];
}

std::cout << "sum= " << sum << "\n";

Go-HEP/rootio

f, err := rootio.Open(flag.Arg(0))
if err != nil {
    log.Fatal(err)
}
obj, err := f.Get("tree")
if err != nil {
    log.Fatal(err)
}
t := obj.(rootio.Tree)

var vs [100]float64
var svars []rootio.ScanVar
for i := range vs {
    svars = append(svars, rootio.ScanVar{
        Name:  fmt.Sprintf("Scalar_%d", i),
        Value: &vs[i],
    })
}

sum := 0.0
scan, err := rootio.NewScannerVars(t, svars...)
if err != nil {
    log.Fatal(err)
}
for scan.Next() {
    err = scan.Scan()
    if err != nil {
        log.Fatal(err)
    }
    sum += vs[0]
}

fmt.Printf("sum= %v\n", sum)

Results -- No Compression

$> time ./cxx-read-data
$> time ./go-read-data

=== ROOT === (VMem=517Mb)
real=6.70 user=6.18 sys=0.51 CPU= 99% MaxRSS=258296
real=6.84 user=6.32 sys=0.51 CPU= 99% MaxRSS=257748
real=6.82 user=6.29 sys=0.52 CPU= 99% MaxRSS=258348
real=6.66 user=6.13 sys=0.53 CPU=100% MaxRSS=258440

=== go-hep/rootio === (VMem=43Mb)
real=12.94 user=12.39 sys=0.56 CPU=100% MaxRSS=42028
real=12.93 user=12.37 sys=0.56 CPU=100% MaxRSS=42072
real=12.96 user=12.38 sys=0.58 CPU=100% MaxRSS=41984
real=12.94 user=12.36 sys=0.57 CPU=100% MaxRSS=42048

Results -- with default compression

=== ROOT === (VMem=529Mb)
real=20.61 user=11.86 sys=0.63 CPU=60% MaxRSS=292640
real=12.56 user=11.54 sys=0.51 CPU=96% MaxRSS=290124
real=12.04 user=11.50 sys=0.52 CPU=99% MaxRSS=290444
real=12.05 user=11.54 sys=0.50 CPU=99% MaxRSS=290324

=== go-hep/rootio === (VMem=83Mb)
real=36.43 user=35.20 sys=0.69 CPU= 98% MaxRSS=81196
real=35.75 user=35.15 sys=0.63 CPU=100% MaxRSS=81644
real=35.76 user=35.10 sys=0.69 CPU=100% MaxRSS=81856
real=35.70 user=35.18 sys=0.54 CPU=100% MaxRSS=81944

About 2 times slower without compression and 2.5 to 3 times slower with default compression, without any optimization with respect to basket buffering, TTreeCache, ...
No concurrency (yet).

Of course, go-hep/rootio provides fewer features than ROOT, isn't as battle-tested and is probably full of bugs.
But it's within the same order of magnitude, performance-wise.


Conclusions - Go

Go improves on C/C++/Java/... and addresses C/C++ and Python deficiencies:

$> GOARCH=arm64 GOOS=linux   go build -o foo-linux-arm64.exe
$> GOARCH=amd64 GOOS=windows go build -o foo-windows-64b.exe

and:


Conclusions - Go-HEP

Go-HEP provides some building blocks that are already competitive with battle-tested C++ programs in terms of CPU usage, memory usage and harnessing of multiple cores.

Further improvements are still necessary in the ROOT I/O compatibility part:

Further improvements in the Jupyter area are also warranted (but that's tackled by the Go data science community at large):

Go-HEP also has users outside LPC-Clermont:


Conclusions

Go is great for writing small and large (concurrent) programs.
This also holds for science-y programs, even if the number of available libraries can still be improved.

Write your next tool/analysis/simulation/software in Go?


Go-HEP packages


Go-HEP packages (cont'd)


Go-HEP commands


Acknowledgements / resources


Thank you
