aws/aws-cdk

(aws-glue-alpha): (struct schema produces unsupported inputStrings)

Open

#26,935 opened on Aug 30, 2023

Labels: @aws-cdk/aws-glue, bug, effort/medium, good first issue, p2

Description

Describe the bug

Regarding this line: https://github.com/aws/aws-cdk/blame/main/packages/%40aws-cdk/aws-glue-alpha/lib/schema.ts#L209

As far as I can tell, this will happily create invalid inputStrings for nested structs:

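import { Schema } from "@aws-cdk/aws-glue-alpha";

// Struct type whose first field carries a comment
const nested = Schema.struct([
  {
    name: "name",
    comment: "The name of the thing",
    type: Schema.STRING
  },
  {
    name: "url",
    type: Schema.STRING
  }
]);

// Table column that uses the struct as its type
const column = {
  name: "some_nested_struct",
  type: nested
};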

Will generate the following inputString for the nested struct:

struct<name:string COMMENT 'The name of the thing',url:string>
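The string above can be reproduced without the CDK. The sketch below is a simplified stand-in for what the linked line appears to do, not the library's actual code; `FieldType`, `Field`, and `buildStructInputString` are hypothetical names used only for illustration.

```typescript
// Hypothetical, simplified model of a struct field; not the CDK's own types.
interface FieldType {
  inputString: string;
}

interface Field {
  name: string;
  type: FieldType;
  comment?: string;
}

// Assumed behavior: each field is rendered as `name:type`, and
// ` COMMENT '...'` is appended whenever a comment is present.
function buildStructInputString(fields: Field[]): string {
  const parts = fields.map((f) =>
    f.comment === undefined
      ? `${f.name}:${f.type.inputString}`
      : `${f.name}:${f.type.inputString} COMMENT '${f.comment}'`
  );
  return `struct<${parts.join(",")}>`;
}

const STRING: FieldType = { inputString: "string" };

const nestedInputString = buildStructInputString([
  { name: "name", comment: "The name of the thing", type: STRING },
  { name: "url", type: STRING },
]);

// prints: struct<name:string COMMENT 'The name of the thing',url:string>
console.log(nestedInputString);
```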

If you create a Glue table with this in the schema, Athena will throw an error whenever you try to query the table:

HIVE_INVALID_METADATA: Glue table 'db.table' column 'some_nested_struct' has invalid data type: struct<name:string COMMENT 'The name of the thing',url:string>
...

From what I can tell, COMMENT is not supported inside nested structs. If I manually create a schema in a fresh Glue table, adding COMMENT to the inputString of a nested string field causes Glue to treat the type as 'unknown'.

For example, before adding the COMMENT, I can inspect the schema and see its type:

{
  "some_nested_struct": {
    "name": "string",
    "url": "string"
  }
}

But if I add the comment and inspect the type of the column I see:

{
  "some_nested_struct": {
    "name": {
      "unknown": "STRUCT <\n  name: STRING COMMENT 'some comment',\n  url: STRING\n>"
    },
    "url": "string"
  }
}

Expected Behavior

Ideally Glue would support nested comments (or at worst ignore them), but the CDK construct should at least not generate input strings that are guaranteed to not work.
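One way the construct could avoid emitting such strings is to leave comments out whenever it renders the fields of a struct. The following is a minimal sketch of that idea under stated assumptions, not a patch against the CDK; `NestedFieldType`, `NestedField`, and `structTypeWithoutComments` are hypothetical names.

```typescript
// Hypothetical minimal types for illustration; not the CDK's own definitions.
interface NestedFieldType {
  inputString: string;
}

interface NestedField {
  name: string;
  type: NestedFieldType;
  comment?: string; // accepted for API compatibility, but never rendered
}

// Renders a struct type as `struct<name:type,...>` and deliberately
// ignores any per-field comment, since the report above shows that
// COMMENT inside a nested struct is rejected by Athena.
function structTypeWithoutComments(fields: NestedField[]): NestedFieldType {
  const parts = fields.map((f) => `${f.name}:${f.type.inputString}`);
  return { inputString: `struct<${parts.join(",")}>` };
}

const STRING_TYPE: NestedFieldType = { inputString: "string" };

const safeNested = structTypeWithoutComments([
  { name: "name", comment: "The name of the thing", type: STRING_TYPE },
  { name: "url", type: STRING_TYPE },
]);

// prints: struct<name:string,url:string>
console.log(safeNested.inputString);
```

Dropping the comment silently keeps existing code synthesizing; throwing an error when a nested field has a comment would be the stricter alternative and would make the lost metadata visible to the user.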

Current Behavior

See description

Reproduction Steps

See description

Possible Solution

See expected behavior

Additional Information/Context

No response

CDK CLI Version

2.87.0 (build 9fca790)

Framework Version

No response

Node.js Version

v18.16.0

OS

AL2

Language

TypeScript

Language Version

5.1.3

Other information

No response
