confluentinc/ksql

Two different behavior using REGEX_EXTRACT scalar function

Open

#7792 created on July 15, 2021

(0 comments) (0 reactions) (0 assignees) Java (5,739 stars) (1,048 forks) batch import
bugfix-it-week, good first issue, query-engine, user-defined-functions

Description

Describe the bug: A statement that uses the REGEXP_EXTRACT scalar function behaves differently when it is submitted through the CLI than when it is submitted through Control Center.

To Reproduce

  1. ksqlDB version: 0.18.0. Sample source data:
{
  "id": "1234",
  "_data": {
    "referral_fields": "<ReferralFields version=\"1.0\"><TestCode>7621</TestCode><TestBarcode>33025016962</TestBarcode><Diagnoses><Diagnosis><Code>Y20185</Code><Description>EXAMINATION POSTPARTUM</Description><Attribute1/><Attribute2/><Comments/></Diagnosis></Diagnoses><TestReason/><Description/><PregnancyAge/><LastPeriodDate/><Menopause/><IUD/><Comments/><MainDiagnoses><MainDiagnosis><Code>Y11746</Code><Description>HYPERLIPIDEMIA</Description><Attribute1>M/P</Attribute1><Attribute2/><Comments>due to contraseptive</Comments></MainDiagnosis></MainDiagnoses><ChronicMedications/><DrugAllergies/><Allergies><Allergy><Name>הרדמה מקומית בעבר תקינה</Name><Expression/></Allergy><Allergy><Name>לא ידועה רגישות לתרופות</Name><Expression/></Allergy><Allergy><Name>לא ידוע</Name><Expression/></Allergy><Allergy><Name>NO</Name><Expression/></Allergy></Allergies></ReferralFields>"
  }
}
  2. Any SQL statements you ran:
SET 'auto.offset.reset'='earliest';

create stream `referral_fields`
    (`id` string key, `_data` struct<`referral_fields` string>)
    with
    (kafka_topic='<some topic>', value_format='json');

create table `barcodes` as 
select regexp_extract('(\<testbarcode\>)([0-9]+)\w',LCASE(`_data`->`referral_fields`), 2) as `barcode`,
       count(*) as `count`
from `referral_fields` 
where regexp_extract('(\<testbarcode\>)([0-9]+)\w',LCASE(`_data`->`referral_fields`), 2) is not null
group by regexp_extract('(\<testbarcode\>)([0-9]+)\w',LCASE(`_data`->`referral_fields`), 2)
emit changes;

Expected behavior: The message is projected into the table regardless of which client submitted the statement.

Actual behavior

  1. The stream and table were created successfully
  2. No errors
  3. N/A

Additional context: The behavior differs because of the additional \w character at the end of the regex. Although the \w turned out to be unnecessary in the regex, I expect the statement to behave the same regardless of the client I use.

If I submit the statement through the CLI, it does not work; if I submit it through Control Center, it works.
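To make the role of the trailing \w concrete, here is a minimal Java sketch. It is not ksqlDB's implementation: the `extract` helper is a hypothetical stand-in that assumes REGEXP_EXTRACT returns the requested capture group of the first java.util.regex match, or null when nothing matches. The input is a shortened, lower-cased fragment of the sample payload. The third pattern is purely hypothetical: it shows what the result would be if a client dropped the backslashes before the statement reached the server, which this issue does not confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractSketch {

    // Hypothetical stand-in for REGEXP_EXTRACT(pattern, input, group):
    // returns the requested capture group of the first match, or null.
    public static String extract(String pattern, String input, int group) {
        Matcher m = Pattern.compile(pattern).matcher(input);
        if (!m.find() || group > m.groupCount()) {
            return null;
        }
        return m.group(group);
    }

    public static void main(String[] args) {
        // Lower-cased fragment of the sample payload, as LCASE would produce.
        String input = "<testbarcode>33025016962</testbarcode><diagnoses>";

        // Pattern from the issue (regex backslashes are doubled in Java source).
        String withW = "(\\<testbarcode\\>)([0-9]+)\\w";
        // Same pattern without the trailing \w.
        String withoutW = "(\\<testbarcode\\>)([0-9]+)";
        // Assumption for illustration only: every backslash was stripped.
        String stripped = "(<testbarcode>)([0-9]+)w";

        // \w must consume one word character, so [0-9]+ gives back its last digit.
        System.out.println(extract(withW, input, 2));    // prints 3302501696
        System.out.println(extract(withoutW, input, 2)); // prints 33025016962
        // A literal 'w' never follows the digits, so there is no match.
        System.out.println(extract(stripped, input, 2)); // prints null
    }
}
```

Under these assumptions, the pattern with \w still matches but captures one digit fewer than the full barcode, and a null result would be dropped by the query's `is not null` filter. Comparing the pattern string each client actually sends to the server would show whether escape handling is involved.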

