(aws-glue-alpha): (struct schema produces unsupported inputStrings)
#26,935 opened on August 30, 2023
Description
Describe the bug
Regarding this line: https://github.com/aws/aws-cdk/blame/main/packages/%40aws-cdk/aws-glue-alpha/lib/schema.ts#L209
As far as I can tell, this will happily create invalid inputStrings for nested structs:
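import { Schema } from '@aws-cdk/aws-glue-alpha';

// Struct type whose first field carries a comment
const nested = Schema.struct([
  {
    name: "name",
    comment: "The name of the thing",
    type: Schema.STRING
  },
  {
    name: "url",
    type: Schema.STRING
  }
]);

// Column definition that uses the struct as its type
const column = {
  name: "some_nested_struct",
  type: nested
};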
This generates the following inputString for the nested struct:
struct<name:string COMMENT 'The name of the thing',url:string>
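To make the failure mode concrete, here is a minimal sketch of a struct type builder that appends COMMENT to every field that has one. It is a simplified model written for this report, not the CDK source; `FieldDef` and `structInputString` are hypothetical names.

```typescript
// Hypothetical, simplified model of a struct field (not the CDK types).
interface FieldDef {
  name: string;
  inputString: string; // type string of the field, e.g. "string"
  comment?: string;
}

// Joins the fields into a struct type string, appending COMMENT when present.
function structInputString(fields: FieldDef[]): string {
  const parts = fields.map((f) =>
    f.comment === undefined
      ? `${f.name}:${f.inputString}`
      : `${f.name}:${f.inputString} COMMENT '${f.comment}'`
  );
  return `struct<${parts.join(",")}>`;
}

const generated = structInputString([
  { name: "name", inputString: "string", comment: "The name of the thing" },
  { name: "url", inputString: "string" },
]);
console.log(generated);
```

Under these assumptions the builder emits the same string as shown above, including the COMMENT clause inside the struct.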
If you create a Glue table with this in the schema, Athena will throw an error whenever you try to query the table:
HIVE_INVALID_METADATA: Glue table 'db.table' column 'some_nested_struct' has invalid data type: struct<name:string COMMENT 'The name of the thing',url:string>
...
From what I can tell, 'COMMENT' is not supported in nested structs. If I manually create a schema in a fresh Glue table, adding "COMMENT" to the inputString of a nested string field causes Glue to treat the type as 'unknown'.
For example, before adding the COMMENT I can inspect the schema and see its type:
{
  "some_nested_struct": {
    "name": "string",
    "url": "string"
  }
}
But if I add the comment and inspect the type of the column, I see:
{
  "some_nested_struct": {
    "name": {
      "unknown": "STRUCT <\n name: STRING COMMENT 'some comment',\n url: STRING\n>"
    },
    "url": "string"
  }
}
Expected Behavior
Ideally Glue would support nested comments (or at worst ignore them), but the CDK construct should at least not generate input strings that are guaranteed to not work.
Current Behavior
See description
Reproduction Steps
See description
Possible Solution
See expected behavior
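One possible direction is to omit comments when rendering struct fields. The sketch below assumes that Glue and Athena accept the struct type once the nested COMMENT clause is dropped, which is what the manual experiment in the description suggests. `NestedField` and `structInputStringWithoutComments` are hypothetical names, not CDK API.

```typescript
// Hypothetical field shape for this sketch (not the CDK types).
interface NestedField {
  name: string;
  inputString: string; // type string of the field, e.g. "string"
  comment?: string; // accepted, but intentionally not rendered
}

// Renders a struct type string and ignores any field comments.
function structInputStringWithoutComments(fields: NestedField[]): string {
  const parts = fields.map((f) => `${f.name}:${f.inputString}`);
  return `struct<${parts.join(",")}>`;
}

const withoutComments = structInputStringWithoutComments([
  { name: "name", inputString: "string", comment: "The name of the thing" },
  { name: "url", inputString: "string" },
]);
console.log(withoutComments);
```

The trade-off is that nested comments are silently lost. An alternative would be to throw at synth time when a struct field carries a comment, so the problem surfaces before deployment.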
Additional Information/Context
No response
CDK CLI Version
2.87.0 (build 9fca790)
Framework Version
No response
Node.js Version
v18.16.0
OS
AL2
Language
TypeScript
Language Version
5.1.3
Other information
No response