Learning Go: Building a Universal API Syncer
A while ago I set out to learn Go. I'm not sure exactly what draws me to it, but picking up a paradigm that differs so much from the way I'm used to working entices me.
Then I got married (fortunately!), which put Go on hold. Now that things have settled down I am excited to get back to it.
I always like learning by trying to solve a problem or need I have. And right now, as I am working on building out my data warehouse, I need some scripts to sync in data. That made me think: since most REST APIs follow similar structures, can I build something that will let me sync… everything?
The Idea
I'm not even building a web UI, just an easy-to-configure tool that automatically syncs APIs into my data warehouse. We start off with a config.json that looks like this:
{
  "output_dir": "output",
  "sources": [
    {
      "name": "JSON Placeholder",
      "url": "https://jsonplaceholder.typicode.com/posts",
      "method": "GET",
      "headers": {},
      "returns": "array"
    },
    {
      "name": "Scam Archive Scams",
      "url": "https://scamarchive.org/api/scams",
      "method": "GET",
      "headers": {},
      "returns": "object",
      "list-field": "rows"
    }
  ]
}
Obviously as it grows I will need to specify how to handle rate limits, API keys, etc. But let’s get started with this first.
Two endpoints that return JSON — one returns a direct array of records, and the other returns an object that has a field for rows.
Getting Started with the Fetch
I can write Python and Node/Bun applications for the most part without checking documentation at this point. But even to do a basic thing like fetching data from a URL I need to learn a completely different way of doing it.
What’s critical for me is that I understand what is going on, because once you learn that, the rest is just syntax.
We start off in main.go with just:
package main

import "fmt"

func main() {
    fmt.Println("Hello World")
}
Running go run main.go — if that doesn’t work, then nothing else down the line will.
Now that we have the base, let’s fill in our main. Go comes with a great standard library and we can create a request with:
req, err := http.NewRequest(http.MethodGet, "https://scamarchive.org/api/scams", nil)
There’s a bit here that helps us better understand Go.
First, req, err — Go can return multiple values from a function. Instead of .then or other error handling chains, the function returns an err value that you handle right after:
if err != nil {
    fmt.Printf("Error when creating the request %s\n", err)
}
A lot of people dislike this because it creates a lot of boilerplate, but personally I find it simplifies the code, makes it very readable, and gives a consistent and safe route for error handling.
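To make the pattern concrete, here is a minimal sketch. The function and its values are made up for illustration, but the shape is the standard Go idiom: return a result and an error, then check the error immediately.

```go
package main

import (
    "errors"
    "fmt"
)

// divide returns two values: the result and an error.
// This replaces exceptions and .then chains in Go.
func divide(a, b int) (int, error) {
    if b == 0 {
        return 0, errors.New("cannot divide by zero")
    }
    return a / b, nil
}

func main() {
    result, err := divide(10, 2)
    if err != nil {
        fmt.Printf("Error: %s\n", err)
        return
    }
    fmt.Println(result) // 5

    _, err = divide(1, 0)
    if err != nil {
        fmt.Printf("Error: %s\n", err) // Error: cannot divide by zero
    }
}
```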
After we create the request, we should probably use it:
res, err := http.DefaultClient.Do(req)
if err != nil {
    fmt.Printf("Error making HTTP request %s\n", err)
}
That was the fetch! Now we read the body (remembering that response bodies need to be closed when we're done with them):
defer res.Body.Close()
resBody, err := io.ReadAll(res.Body)
Parsing the Response
Now that we have the response, we need to do something with it.
Normally you would create a struct with the structure of the incoming JSON. But since we want this to handle any data that comes through and treat it more or less the same, I will forsake my ancestors and use any.
any is Go's alias for the empty interface, interface{}. It's not a generic; it's a type that can hold a value of any type. This is very much a cop-out, but when dealing with data that is dynamic in structure it comes in handy.
var data any
// Unmarshal only returns an error. The & passes a pointer to data,
// so the function can write the decoded JSON into our variable.
err := json.Unmarshal(resBody, &data)
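To see what we actually get back, here is a small sketch (with a made-up JSON literal, and parseAny as my own helper name). JSON arrays decode to []any, objects to map[string]any, and numbers to float64:

```go
package main

import (
    "encoding/json"
    "fmt"
)

// parseAny decodes arbitrary JSON into an any value.
// The &data is the important part: Unmarshal needs a pointer so it can
// overwrite our variable with whatever structure it parses.
func parseAny(raw []byte) (any, error) {
    var data any
    err := json.Unmarshal(raw, &data)
    return data, err
}

func main() {
    data, err := parseAny([]byte(`[{"id": 1, "title": "hello"}]`))
    if err != nil {
        fmt.Printf("Error unmarshaling %s\n", err)
        return
    }
    fmt.Printf("%T\n", data) // []interface {}
}
```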
Normalizing the Data
How do we get unstructured data into a database? Well… we need to structure it, unless you want to use a document database or JSONB. Which I don’t want to do — I want this to output into a clean Postgres table.
My thought process is to use DuckDB to turn a .jsonl file containing the data into a Parquet file (structured, with a defined schema), and then copy the Parquet into my PG database. Since I'll be working with Parquet files anyway, I might as well let DuckDB do the heavy lifting.
The two situations I anticipated earlier: the API directly returns an array, or it returns an object with an array in a field. Let’s normalize those into a list of rows:
var rows []any // a slice of 'any' values
if source.Returns == "array" {
    rows = data.([]any) // type assertion: treat data as a []any
} else {
    m := data.(map[string]any) // type assertion: treat data as a map
    rows = m[source.ListField].([]any)
}
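One caveat: a plain type assertion panics if the shape doesn't match, which a misbehaving API will eventually trigger. A slightly more defensive sketch uses the comma-ok form (normalize is a hypothetical helper name, not part of the tool yet):

```go
package main

import (
    "errors"
    "fmt"
)

// normalize pulls a []any of rows out of a decoded JSON value,
// whether the API returned a bare array or an object wrapping one.
func normalize(data any, returns, listField string) ([]any, error) {
    if returns == "array" {
        rows, ok := data.([]any) // comma-ok form: no panic on mismatch
        if !ok {
            return nil, errors.New("expected a JSON array")
        }
        return rows, nil
    }
    m, ok := data.(map[string]any)
    if !ok {
        return nil, errors.New("expected a JSON object")
    }
    rows, ok := m[listField].([]any)
    if !ok {
        return nil, fmt.Errorf("field %q is missing or not an array", listField)
    }
    return rows, nil
}

func main() {
    data := map[string]any{"rows": []any{map[string]any{"id": 1.0}}}
    rows, err := normalize(data, "object", "rows")
    fmt.Println(len(rows), err) // 1 <nil>
}
```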
Writing to JSONL and Converting to Parquet
Since I want DuckDB to do a clean copy to Parquet, let’s make a temporary JSONL file:
tmpFile, err := os.CreateTemp("", "sync-everything-*.jsonl")
if err != nil {
    fmt.Printf("Error creating temp file %s\n", err)
    os.Exit(1)
}
defer os.Remove(tmpFile.Name())

for _, row := range rows {
    line, _ := json.Marshal(row)
    tmpFile.Write(line)
    tmpFile.Write([]byte("\n"))
}
tmpFile.Close()
Then convert it to Parquet using DuckDB:
db, err := sql.Open("duckdb", "")
if err != nil {
    fmt.Printf("Error starting DuckDB %s\n", err)
    os.Exit(1)
}
defer db.Close()

query := fmt.Sprintf(
    "COPY (SELECT * FROM read_json_auto('%s')) TO 'sync-%s-%d.parquet' (FORMAT PARQUET)",
    tmpFile.Name(),
    source.Name,
    time.Now().Unix(),
)
_, err = db.Exec(query)
if err != nil {
    fmt.Printf("Error executing DuckDB script %s\n", err)
    os.Exit(1)
}
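One wrinkle: source names like "JSON Placeholder" contain spaces and capitals, which make for awkward file names inside that COPY statement. A hedged sketch of a helper that could clean them up (slugify is my own name, not part of the tool yet):

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// slugify lowercases a source name and collapses every run of characters
// that is not a letter or digit into a single hyphen, for safe file names.
func slugify(name string) string {
    re := regexp.MustCompile(`[^a-z0-9]+`)
    s := re.ReplaceAllString(strings.ToLower(name), "-")
    return strings.Trim(s, "-")
}

func main() {
    fmt.Println(slugify("JSON Placeholder"))   // json-placeholder
    fmt.Println(slugify("Scam Archive Scams")) // scam-archive-scams
}
```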
Parsing the Config
While it should have been the first step, for me it’s last. Let’s make it actually parse the config.json and operate off that.
First I defined my structs:
type Source struct {
    Name      string `json:"name"`
    Url       string `json:"url"`
    Method    string `json:"method"`
    Returns   string `json:"returns"`
    ListField string `json:"list-field"`
    // Headers we will do later
}

type Config struct {
    OutputDirectory string   `json:"output_dir"`
    Sources         []Source `json:"sources"`
}
Config is the file, and Source is each of the sources. Those backtick annotations at the end (like json:"sources") are struct tags: they tell the encoding/json package which JSON key maps onto which Go field.
Now let’s open the file, read it, and unmarshal it:
jsonFile, err := os.Open("config.json")
if err != nil {
    fmt.Printf("Error opening JSON file %s\n", err)
    return
}
defer jsonFile.Close()
fmt.Println("Opened config file successfully")

byteValue, err := io.ReadAll(jsonFile)
if err != nil {
    fmt.Printf("Error reading file %s\n", err)
    return
}

var config Config
err = json.Unmarshal(byteValue, &config)
if err != nil {
    fmt.Printf("Error unmarshaling configuration %s\n", err)
    return
}
Putting It All Together
package main

import (
    "database/sql"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "os"
    "time"

    _ "github.com/duckdb/duckdb-go/v2"
)

type Source struct {
    Name      string `json:"name"`
    Url       string `json:"url"`
    Method    string `json:"method"`
    Returns   string `json:"returns"`
    ListField string `json:"list-field"`
    // Headers we will do later
}

type Config struct {
    OutputDirectory string   `json:"output_dir"`
    Sources         []Source `json:"sources"`
}

func main() {
    jsonFile, err := os.Open("config.json")
    if err != nil {
        fmt.Printf("Error opening JSON file %s\n", err)
        return
    }
    defer jsonFile.Close()
    fmt.Println("Opened config file successfully")

    byteValue, err := io.ReadAll(jsonFile)
    if err != nil {
        fmt.Printf("Error reading file %s\n", err)
        return
    }

    var config Config
    err = json.Unmarshal(byteValue, &config)
    if err != nil {
        fmt.Printf("Error unmarshaling configuration %s\n", err)
        return
    }

    // Make sure the configured output directory exists before DuckDB writes into it
    if err := os.MkdirAll(config.OutputDirectory, 0o755); err != nil {
        fmt.Printf("Error creating output directory %s\n", err)
        os.Exit(1)
    }

    // One DuckDB connection for the whole run, rather than one per source
    db, err := sql.Open("duckdb", "")
    if err != nil {
        fmt.Printf("Error starting DuckDB %s\n", err)
        os.Exit(1)
    }
    defer db.Close()

    for _, source := range config.Sources {
        fmt.Printf("Now running source %s\n", source.Name)

        req, err := http.NewRequest(source.Method, source.Url, nil)
        if err != nil {
            fmt.Printf("error creating http request: %s\n", err)
            os.Exit(1)
        }

        res, err := http.DefaultClient.Do(req)
        if err != nil {
            fmt.Printf("error making http request: %s\n", err)
            os.Exit(1)
        }

        resBody, err := io.ReadAll(res.Body)
        res.Body.Close() // close explicitly; a defer here would pile up until main returns
        if err != nil {
            fmt.Printf("Could not read response body %s\n", err)
            os.Exit(1)
        }

        var data any
        if err := json.Unmarshal(resBody, &data); err != nil {
            fmt.Printf("Could not unmarshal response body %s\n", err)
            os.Exit(1)
        }

        var rows []any
        if source.Returns == "array" {
            rows = data.([]any)
        } else {
            m := data.(map[string]any)
            rows = m[source.ListField].([]any)
        }

        tmpFile, err := os.CreateTemp("", "sync-everything-*.jsonl")
        if err != nil {
            fmt.Printf("Error creating temp file %s\n", err)
            os.Exit(1)
        }
        for _, row := range rows {
            line, _ := json.Marshal(row)
            tmpFile.Write(line)
            tmpFile.Write([]byte("\n"))
        }
        tmpFile.Close()

        query := fmt.Sprintf(
            "COPY (SELECT * FROM read_json_auto('%s')) TO '%s/sync-%s-%d.parquet' (FORMAT PARQUET)",
            tmpFile.Name(),
            config.OutputDirectory,
            source.Name,
            time.Now().Unix(),
        )
        _, err = db.Exec(query)
        os.Remove(tmpFile.Name()) // clean up per iteration instead of deferring in a loop
        if err != nil {
            fmt.Printf("Error executing DuckDB script %s\n", err)
            os.Exit(1)
        }
    }
}
That’s it for today. I have some other work to jump to but I am happy with this progress. It fetches a URL, unmarshals the JSON, handles both array and object responses, writes line by line to a JSONL file, and then executes a DuckDB command that copies it to a Parquet. Next time: API keys!