[experiment] Auto-generate documentation for jaeger-v2 configuration structs via AST
#6628 opened on Jan 28, 2025
Description
We are still blocked on the main issue #6186 because the schema-first efforts in the OTEL Collector are not progressing. I wonder if we could instead use Go's AST library to navigate the hierarchy of known config structs and extract the comments and other metadata needed for the docs and/or config examples.
There are various blog posts showing examples of using AST.
The tool could simply have a hardcoded list of starting configuration structs, from both the Jaeger and OTEL code bases, e.g. cmd/jaeger/internal/extension/jaegerquery/config.go.
The prototype is available in draft PR #7064.
Rough outline of the milestones:
- add a new subcommand to jaeger-v2 to generate config schema (done in #7064)
- collect config objects from OTEL component factories (done in #7064)
- use reflection on those objects to determine additional structs from field types and embedded structs (partially done in #7064)
- use "golang.org/x/tools/go/packages" to parse the packages containing the structs to get access to other metadata like comments (partially done in #7064)
- transform collected data into JSON Schema output (partially done in #7064)
- run 3rd party tools to convert JSON schema into HTML documentation (done in #7064)
- enhance Jaeger docs to include the output of the previous step in the website as part of the release process
This is another outline of the task from Gemini:
Feature: Generate JSON Schema with Comments and Defaults
Goal: Implement a tool or function that generates JSON schema for a collection of Go objects, incorporating comments as descriptions and using the current field values as defaults.
Implementation Outline:
I. Initialization and Package Loading:
- Input:
  - A slice or map of Go objects to generate schemas for.
  - The package paths where the types of these objects are defined.
- Load Packages:
  - Utilize the `"golang.org/x/tools/go/packages"` library to load the specified Go packages.
  - Configure `packages.Config` to include necessary information for parsing comments and type structures (e.g., `NeedTypes`, `NeedSyntax`, `NeedName`, `NeedImports`, `NeedDeps`, `NeedFiles`, `NeedCompiledGoFiles`, `NeedExportFile`, `NeedModule`).
- Type Information:
  - For each input Go object, obtain its `reflect.Type` using the `reflect` package for runtime inspection.
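The package paths do not have to be supplied by hand; they can be derived from the input objects themselves. The sketch below uses only `reflect` and is a rough illustration, not the code from #7064. `QueryConfig`, `typeIdentity`, and `packagePaths` are hypothetical names. The resulting paths are what one would then pass as patterns to `packages.Load`.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// QueryConfig is a hypothetical stand-in for a real configuration struct.
type QueryConfig struct {
	BasePath string
}

// typeIdentity returns the import path and name of obj's type,
// unwrapping pointers first. Unnamed and built-in types yield an empty path.
func typeIdentity(obj any) (pkgPath, name string) {
	t := reflect.TypeOf(obj)
	for t != nil && t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t == nil {
		return "", ""
	}
	return t.PkgPath(), t.Name()
}

// packagePaths returns the sorted, de-duplicated package paths of the
// given objects, skipping types that have no package (e.g. int).
func packagePaths(objs ...any) []string {
	seen := map[string]bool{}
	var paths []string
	for _, o := range objs {
		p, _ := typeIdentity(o)
		if p != "" && !seen[p] {
			seen[p] = true
			paths = append(paths, p)
		}
	}
	sort.Strings(paths)
	return paths
}

func main() {
	pkg, name := typeIdentity(&QueryConfig{})
	fmt.Println(pkg, name)
	fmt.Println(packagePaths(&QueryConfig{}, QueryConfig{}, 42))
}
```

The same `(pkgPath, name)` pair can later serve as the key for matching a `reflect.Type` against a parsed type declaration.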
II. Reflecting and Parsing Types:
- Iterate Through Objects: Loop through each Go object in the input collection.
- Get `reflect.Type` and `reflect.Value`:
  - Obtain the `reflect.Type` to analyze the structure.
  - Obtain the `reflect.Value` to access the current field values for defaults.
- Find Corresponding `ast.TypeSpec`:
  - For the `reflect.Type`, locate the corresponding `ast.TypeSpec` within the parsed packages (`pkg.Syntax`).
  - This will involve traversing the syntax trees and matching the `ast.TypeSpec.Name.Name` with the Go type's name.
  - Handle potential complexities like embedded types and type aliases.
- Extract Field Information: For each field of the `reflect.Type`:
  - Get the field name (`field.Name`).
  - Get the field type (`field.Type`).
  - Extract struct tags (`field.Tag`), specifically looking for the `json` tag to determine the JSON property name and `omitempty`.
  - Get the current value of the field from the `reflect.Value` (`Value.Field(i)`).
- Extract Comment Information:
  - Locate the corresponding `ast.Field` in the `ast.TypeSpec`.
  - Extract the associated comment from `ast.Field.Doc` or `ast.Field.Comment`.
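The `ast.TypeSpec` lookup and comment extraction could look roughly like the sketch below. To keep it runnable without extra modules, it parses an inlined, hypothetical source file with the standard `go/parser` instead of `golang.org/x/tools/go/packages`; with `go/packages` the same functions would be applied to each `*ast.File` in `pkg.Syntax`. The names `findTypeSpec`, `fieldComments`, and `parseComments` are illustrative assumptions.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strings"
)

// src is a hypothetical config file, inlined so the example runs anywhere.
const src = `package demo

// Config configures the demo extension.
type Config struct {
	// Endpoint is the address to listen on.
	Endpoint string
	Timeout  int // Timeout in seconds.
}
`

// findTypeSpec returns the declaration of the named type in file, or nil.
func findTypeSpec(file *ast.File, name string) *ast.TypeSpec {
	for _, decl := range file.Decls {
		gen, ok := decl.(*ast.GenDecl)
		if !ok || gen.Tok != token.TYPE {
			continue
		}
		for _, spec := range gen.Specs {
			if ts, ok := spec.(*ast.TypeSpec); ok && ts.Name.Name == name {
				return ts
			}
		}
	}
	return nil
}

// fieldComments maps field names of a struct type to their comment text,
// preferring the doc comment above a field over its trailing line comment.
func fieldComments(ts *ast.TypeSpec) map[string]string {
	out := map[string]string{}
	if ts == nil {
		return out
	}
	st, ok := ts.Type.(*ast.StructType)
	if !ok {
		return out
	}
	for _, f := range st.Fields.List {
		text := f.Doc.Text() // Text is safe to call on a nil comment group
		if text == "" {
			text = f.Comment.Text()
		}
		for _, n := range f.Names { // embedded fields have no names
			out[n.Name] = strings.TrimSpace(text)
		}
	}
	return out
}

// parseComments parses src and returns the field comments of Config.
func parseComments() map[string]string {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "config.go", src, parser.ParseComments)
	if err != nil {
		panic(err)
	}
	return fieldComments(findTypeSpec(file, "Config"))
}

func main() {
	for name, comment := range parseComments() {
		fmt.Printf("%s: %s\n", name, comment)
	}
}
```

One detail worth noting: for an ungrouped `type X struct {...}` declaration, the type-level doc comment is attached to the enclosing `ast.GenDecl`, not to the `ast.TypeSpec`, so a real tool would likely need to check both.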
III. Building the JSON Schema:
- Schema Structure:
  - Define a structure for the generated JSON schema, likely using the `"definitions"` section for type schemas and a top-level schema referencing these definitions.
- Type Mapping:
  - Create a mapping between Go types (from `reflect.Type`) and their corresponding JSON schema types (e.g., `string`, `integer`, `boolean`, `array`, `object`).
  - Handle basic types, slices, maps, and nested structs.
- Schema Properties: For each Go field, create a property in the JSON schema:
  - `type`: Mapped from the Go field type.
  - `description`: The extracted Go field comment.
  - `default`: The current value of the Go field (serialized appropriately for JSON schema).
  - Potentially include other keywords like `format`, `nullable`, and constraints based on struct tags.
- Handling Nested Objects:
  - If a field is another Go object, recursively process its type and add a `$ref` to its definition in the `"definitions"` section.
- Handling Slices and Maps:
  - For slice and map types, define the `items` or `additionalProperties` schema, referencing the schema of the element/value type.
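A minimal sketch of the type mapping described above, assuming definitions are keyed by the bare type name (so type names must be unique across packages) and that property names are the Go field names. `TLS`, `Config`, `schemaFor`, and `buildSchema` are hypothetical; descriptions and defaults are left out to keep the mapping itself visible.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

type schema = map[string]any

// Hypothetical config types used only for illustration.
type TLS struct {
	Insecure bool
}

type Config struct {
	Endpoint string
	Retries  int
	Tags     []string
	Headers  map[string]string
	TLS      TLS
}

// schemaFor maps a Go type to a JSON Schema fragment. Struct types are
// registered in defs and referenced through $ref.
func schemaFor(t reflect.Type, defs map[string]schema) schema {
	switch t.Kind() {
	case reflect.Pointer:
		return schemaFor(t.Elem(), defs)
	case reflect.String:
		return schema{"type": "string"}
	case reflect.Bool:
		return schema{"type": "boolean"}
	case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
		reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64:
		return schema{"type": "integer"}
	case reflect.Float32, reflect.Float64:
		return schema{"type": "number"}
	case reflect.Slice, reflect.Array:
		return schema{"type": "array", "items": schemaFor(t.Elem(), defs)}
	case reflect.Map:
		return schema{"type": "object", "additionalProperties": schemaFor(t.Elem(), defs)}
	case reflect.Struct:
		name := t.Name()
		if _, done := defs[name]; !done {
			def := schema{"type": "object"}
			defs[name] = def // register before recursing so cycles terminate
			props := schema{}
			for i := 0; i < t.NumField(); i++ {
				if f := t.Field(i); f.IsExported() {
					props[f.Name] = schemaFor(f.Type, defs)
				}
			}
			def["properties"] = props
		}
		return schema{"$ref": "#/definitions/" + name}
	default:
		return schema{} // the empty schema accepts any value
	}
}

// buildSchema returns the root fragment and all definitions for obj's type.
func buildSchema(obj any) (schema, map[string]schema) {
	defs := map[string]schema{}
	return schemaFor(reflect.TypeOf(obj), defs), defs
}

func main() {
	root, defs := buildSchema(Config{})
	out, err := json.MarshalIndent(schema{"root": root, "definitions": defs}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Registering a definition before descending into its fields is what keeps self-referential types from recursing forever; the same map then doubles as the `ProcessedTypes` set for this simplified case.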
IV. Data Structures:
- `TypeCache` (Map: `reflect.Type` -> `*ast.TypeSpec`): Caches the mapping between `reflect.Type` and its `ast.TypeSpec` to avoid redundant lookups.
- `SchemaDefinitions` (Map: `string` -> `map[string]interface{}`): Stores the generated JSON schema definitions for each Go type, keyed by the type name.
- `ProcessedTypes` (Set: `reflect.Type`): Tracks already processed Go types to prevent infinite recursion with nested or circular dependencies.
- `FieldInfo` (Struct): Holds intermediate information about each field: `type FieldInfo struct { Name string; JSONName string; Type reflect.Type; Value reflect.Value; Comment string; Tags reflect.StructTag }`
- `PackageInfo` (Struct): Stores information about a loaded Go package, including a mapping of type names to their `ast.TypeSpec`: `type PackageInfo struct { Package *packages.Package; TypeSpecs map[string]*ast.TypeSpec }`
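As an illustration of how `FieldInfo` might be populated from reflection alone, here is a small sketch. `collectFields` and the `Config` type are hypothetical. The tag key is a parameter because the outline mentions `json` tags, while OTEL Collector config structs typically carry `mapstructure` tags. `Comment` stays empty here since it would come from the AST.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// FieldInfo mirrors the struct from the outline.
type FieldInfo struct {
	Name     string
	JSONName string
	Type     reflect.Type
	Value    reflect.Value
	Comment  string
	Tags     reflect.StructTag
}

// Config is a hypothetical configuration struct.
type Config struct {
	Endpoint string `json:"endpoint"`
	Retries  int    `json:"retries,omitempty"`
	Internal string `json:"-"`
	Plain    bool
	secret   string
}

// collectFields returns one FieldInfo per exported field of obj, which must
// be a struct or a pointer to one. The property name is read from the tagKey
// tag and falls back to the Go field name; fields tagged "-" are skipped.
func collectFields(obj any, tagKey string) []FieldInfo {
	v := reflect.ValueOf(obj)
	for v.Kind() == reflect.Pointer {
		v = v.Elem()
	}
	if v.Kind() != reflect.Struct {
		return nil
	}
	t := v.Type()
	var out []FieldInfo
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if !f.IsExported() {
			continue
		}
		// Drop options such as ",omitempty" after the first comma.
		name, _, _ := strings.Cut(f.Tag.Get(tagKey), ",")
		if name == "-" {
			continue
		}
		if name == "" {
			name = f.Name
		}
		out = append(out, FieldInfo{
			Name:     f.Name,
			JSONName: name,
			Type:     f.Type,
			Value:    v.Field(i), // current value, usable as the schema default
			Tags:     f.Tag,
		})
	}
	return out
}

func main() {
	for _, f := range collectFields(&Config{Endpoint: "localhost:16686"}, "json") {
		fmt.Printf("%s -> %s (%s) = %v\n", f.Name, f.JSONName, f.Type, f.Value.Interface())
	}
}
```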
V. Output:
- Root Schema: Construct the final JSON schema object, including the `$schema` and the `"definitions"` section. The root schema might also define properties for the top-level object(s).
- Serialization: Serialize the JSON schema structure into a JSON string using `encoding/json`.
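The output step could be sketched as follows. The choice of JSON Schema draft-07 (whose keyword for shared schemas is `definitions`) is an assumption, as are the names `rootSchema`, `render`, and `exampleDefs`; the definition itself is hand-written here rather than generated.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rootSchema wraps the per-type definitions in a root document that
// references rootType. Draft-07 is assumed for the $schema URI.
func rootSchema(rootType string, defs map[string]any) map[string]any {
	return map[string]any{
		"$schema":     "http://json-schema.org/draft-07/schema#",
		"$ref":        "#/definitions/" + rootType,
		"definitions": defs,
	}
}

// render serializes the schema as indented JSON.
func render(s map[string]any) (string, error) {
	b, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// exampleDefs returns a hand-written definition for a hypothetical Config
// type, showing the type, description, and default keywords together.
func exampleDefs() map[string]any {
	return map[string]any{
		"Config": map[string]any{
			"type": "object",
			"properties": map[string]any{
				"endpoint": map[string]any{
					"type":        "string",
					"description": "Endpoint is the address to listen on.",
					"default":     "localhost:16686",
				},
			},
		},
	}
}

func main() {
	out, err := render(rootSchema("Config", exampleDefs()))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because `encoding/json` sorts map keys when marshaling, the output is stable across runs, which helps if the generated schema is checked into the docs repository.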
Key Considerations and Challenges:
- Handling embedded types correctly.
- Managing type aliases.
- Detecting and handling circular dependencies between types.
- Deciding how to handle unexported fields.
- Mapping custom Go types to appropriate JSON schema types.
- Implementing robust error handling.
- Optimizing performance for large and complex type structures.
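Three of these points (embedded types, circular dependencies, unexported fields) can be illustrated with plain reflection. The sketch below is one possible approach under simplifying assumptions: embedded structs are flattened into their parent, embedded pointers are not handled, and unexported fields are skipped. `Base`, `Node`, `reachableStructs`, `flatFields`, and `describe` are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
)

// Base and Node are hypothetical types: Node embeds Base, refers to itself
// through a pointer, and has one unexported field.
type Base struct {
	Endpoint string
}

type Node struct {
	Base
	Name   string
	Next   *Node
	hidden int
}

// reachableStructs returns the names of all struct types reachable from t,
// visiting each type once so that self-references do not recurse forever.
func reachableStructs(t reflect.Type, seen map[reflect.Type]bool) []string {
	for t.Kind() == reflect.Pointer || t.Kind() == reflect.Slice ||
		t.Kind() == reflect.Array || t.Kind() == reflect.Map {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct || seen[t] {
		return nil
	}
	seen[t] = true
	names := []string{t.Name()}
	for i := 0; i < t.NumField(); i++ {
		names = append(names, reachableStructs(t.Field(i).Type, seen)...)
	}
	return names
}

// flatFields lists the exported field names of struct type t, promoting the
// fields of embedded structs into the parent and skipping unexported fields.
func flatFields(t reflect.Type) []string {
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		switch {
		case f.Anonymous && f.Type.Kind() == reflect.Struct:
			names = append(names, flatFields(f.Type)...)
		case f.IsExported():
			names = append(names, f.Name)
		}
	}
	return names
}

// describe unwraps pointers and runs both walks on obj's type.
func describe(obj any) (structs, fields []string) {
	t := reflect.TypeOf(obj)
	for t != nil && t.Kind() == reflect.Pointer {
		t = t.Elem()
	}
	if t == nil || t.Kind() != reflect.Struct {
		return nil, nil
	}
	return reachableStructs(t, map[reflect.Type]bool{}), flatFields(t)
}

func main() {
	structs, fields := describe(&Node{})
	fmt.Println("struct types:", structs)
	fmt.Println("flattened fields:", fields)
}
```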