# Go SDK

The Apache Beam Go SDK is the Beam Model implemented in the [Go Programming Language](https://go.dev/). It is based on the following initial [design](https://s.apache.org/beam-go-sdk-design-rfc).

## How to run the examples

**Prerequisites**: to use Google Cloud sources and sinks (the default for most examples), follow the setup [here](https://beam.apache.org/documentation/runners/dataflow/). You can verify that it works by running the corresponding Java example.

The examples are normal Go programs and are most easily run directly. They are parameterized by Go flags. For example, to run wordcount on the Go direct runner:

```bash
$ pwd
[...]/sdks/go
$ go run examples/wordcount/wordcount.go --output=/tmp/result.txt
[{6: KV/GW/KV}]
[{10: KV/GW/KV}]
2018/03/21 09:39:03 Pipeline:
2018/03/21 09:39:03 Nodes: {1: []uint8/GW/bytes}
{2: string/GW/bytes}
{3: string/GW/bytes}
{4: string/GW/bytes}
{5: string/GW/bytes}
{6: KV/GW/KV}
{7: CoGBK/GW/CoGBK}
{8: KV/GW/KV}
{9: string/GW/bytes}
{10: KV/GW/KV}
{11: CoGBK/GW/CoGBK}
Edges: 1: Impulse [] -> [Out: []uint8 -> {1: []uint8/GW/bytes}]
2: ParDo [In(Main): []uint8 <- {1: []uint8/GW/bytes}] -> [Out: T -> {2: string/GW/bytes}]
3: ParDo [In(Main): string <- {2: string/GW/bytes}] -> [Out: string -> {3: string/GW/bytes}]
4: ParDo [In(Main): string <- {3: string/GW/bytes}] -> [Out: string -> {4: string/GW/bytes}]
5: ParDo [In(Main): string <- {4: string/GW/bytes}] -> [Out: string -> {5: string/GW/bytes}]
6: ParDo [In(Main): T <- {5: string/GW/bytes}] -> [Out: KV -> {6: KV/GW/KV}]
7: CoGBK [In(Main): KV <- {6: KV/GW/KV}] -> [Out: CoGBK -> {7: CoGBK/GW/CoGBK}]
8: Combine [In(Main): int <- {7: CoGBK/GW/CoGBK}] -> [Out: KV -> {8: KV/GW/KV}]
9: ParDo [In(Main): KV <- {8: KV/GW/KV}] -> [Out: string -> {9: string/GW/bytes}]
10: ParDo [In(Main): T <- {9: string/GW/bytes}] -> [Out: KV -> {10: KV/GW/KV}]
11: CoGBK [In(Main): KV <- {10: KV/GW/KV}] -> [Out: CoGBK -> {11: CoGBK/GW/CoGBK}]
12: ParDo [In(Main): CoGBK <- {11: CoGBK/GW/CoGBK}] -> []
2018/03/21 09:39:03 Reading from gs://apache-beam-samples/shakespeare/kinglear.txt
2018/03/21 09:39:04 Writing to /tmp/result.txt
```

The debugging output is currently quite verbose and likely to change. The output is a local file in this case:

```bash
$ head /tmp/result.txt
while: 2
darkling: 1
rail'd: 1
ford: 1
bleed's: 1
hath: 52
Remain: 1
disclaim: 1
sentence: 1
purse: 6
```

To run wordcount on the Dataflow runner, supply your own values for the flags left empty below (project, region, and container image) and for the GCS location that prefixes `/staging` and `/output`:

```bash
$ go run wordcount.go --runner=dataflow --project= --region= --staging_location=/staging --worker_harness_container_image= --output=/output
```

The output is a GCS file in this case:

```bash
$ gsutil cat /output* | head
Blanket: 1
blot: 1
Kneeling: 3
cautions: 1
appears: 4
Deserved: 1
nettles: 1
OSWALD: 53
sport: 3
Crown'd: 1
```

Note that, when running at Beam HEAD, the Dataflow runner will try to use a non-existent container `gcr.io/cloud-dataflow/v1beta3/beam_go_sdk:.dev`. To address this, push your own SDK harness container image to a repository (for example, Docker Hub or Google Artifact Registry) and pass its location as the `--worker_harness_container_image` value above. For example, running the following from Beam HEAD makes the container available at the location `/beam_go_sdk`.

```bash
$ ./gradlew :sdks:go:container:docker -Pdocker-repository-root=
$ docker push /beam_go_sdk
```

See [BUILD.md](./BUILD.md) for how to build Go code in general. See the [container documentation](https://beam.apache.org/documentation/runtime/environments/#building-container-images) for more details on how to build and push the Go SDK harness container image.

## Issues

Please use the [`sdk-go`](https://github.com/apache/beam/issues?q=is%3Aopen+is%3Aissue+label%3Asdk-go) component for any bugs or feature requests.

## Contributing to the Go SDK

### New to developing Go?

The Go Tour gives you the basics of the language interactively, with no installation required.

is a great start on learning good (optional) development tools for Go.

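
If you are new to Go, it may help to see what "parameterized by Go flags" (from the section on running the examples) looks like in practice. The following is a minimal standard-library sketch, not the wordcount source: the `options` type, the `parseOptions` helper, and the `wordcount-sketch` name are hypothetical, and the flag names and default input are assumptions taken from the command line and log output shown earlier. Check each example's source for the flags it actually defines.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the values a wordcount-style example reads from its flags.
// The type is hypothetical and exists only for this sketch.
type options struct {
	input  string
	output string
}

// parseOptions parses args (without the program name) into options.
// Flag names and the default input are assumptions based on the command
// and log output in this README, not a copy of the real example.
func parseOptions(args []string) (options, error) {
	var opts options
	fs := flag.NewFlagSet("wordcount-sketch", flag.ContinueOnError)
	fs.StringVar(&opts.input, "input",
		"gs://apache-beam-samples/shakespeare/kinglear.txt", "File(s) to read.")
	fs.StringVar(&opts.output, "output", "", "Output file (required).")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	// Treat a missing output location as an error instead of guessing one.
	if opts.output == "" {
		return options{}, errors.New("no output provided: pass --output=<path>")
	}
	return opts, nil
}

func main() {
	// A real program would pass os.Args[1:]; fixed arguments keep the
	// sketch reproducible when run without a command line.
	opts, err := parseOptions([]string{"--output=/tmp/result.txt"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Reading from %s\nWriting to %s\n", opts.input, opts.output)
}
```

Go's `flag` package accepts both `-output` and `--output`, which is why the double-dash form used in the commands above works. Using a `flag.FlagSet` instead of the package-level flags is a design choice for this sketch only: it keeps parsing in a function that can be called with any argument list.
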
### Developing Go Beam SDK on GitHub

The Go SDK uses Go Modules for dependency management, so it's as simple as cloning the repo, making the necessary changes, and running tests.

To execute all unit tests for the SDK, run `go test ./...` from the `sdks/go` directory.

To test your change as Jenkins would execute it from a PR, run the following from the Beam root directory:

* `./gradlew :sdks:go:goTest` executes the unit tests.
* `./gradlew :sdks:go:test:prismValidatesRunner` validates the SDK against the Go Prism runner as a standalone binary, with containers.
* `./gradlew :sdks:go:test:ulrValidatesRunner` validates the SDK against the Portable Python runner.
* `./gradlew :sdks:go:test:flinkValidatesRunner` validates the SDK against the Flink runner.

Follow the [contribution guide](https://beam.apache.org/contribute/contribution-guide/#code) to create branches and submit pull requests as normal.