[Good First Issue] Support All SQL Functions in Other SQL Systems
#48,203 created on February 22, 2025
UPDATE:
Hey guys, given the current capabilities of LLMs, especially for this specific set of cases, the agents available on GitHub can already generate PRs that meet our quality and requirement standards within minutes. Therefore, we will no longer invest much effort in this issue. If you're interested in implementing these cases, I recommend discussing them in depth with an AI, making sure you fully understand the code you are submitting, and then submitting the PR directly. PRs that add new functions will automatically request my review, and I will check them promptly. I will no longer reply to individual comments under this issue.
Thanks everyone.
Description
We plan to implement all the SQL functions found in other well-known databases, such as MySQL, PG, Trino, CK, Hive, and more, to make it easier for users to migrate to Doris. These tasks are well suited to newcomers as a first Doris PR. Here's the list. Feel free to comment to pick any of them! Once one is picked, I will tick it.
Part I. Hive
- sinh(from Trino), asinh, atanh, acosh (Easy) @ChenMiaoi
- context_ngrams @noixcn
- factorial (Easy) @K-handle-Y
- levenshtein @whisper33z
- encode, decode @hacklu-tu
- soundex
See the latest Hive documentation for explanations of these functions.
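As a reference point for the `levenshtein` task, here is a minimal Python sketch of the classic edit-distance recurrence. It is not Doris code: it compares Python code points, whereas a real implementation has to decide how to treat bytes versus UTF-8 characters and NULL inputs, so treat those choices as assumptions to check against the Hive documentation.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the rolling row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Keeping only two rows uses memory proportional to the shorter string, which matters when the function runs over a whole column.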
Part II. Spark
- map_concat (taken for an interview, @HappenLee)
- regexp_extract_all (support for the third argument)
Part III. Trino&Presto
- regexp_count
- regexp_position @lsy3993
- hamming_distance (better with levenshtein together)
- human_readable_seconds
- timezone_hour, timezone_minute @om2805
- GEO FUNCTIONS
  - ST_GeomFromKML
  - ST_Equals, ST_Relate
  - ST_Intersects, ST_Disjoint, ST_Touches @koi2000
  - ST_Crosses, ST_Overlaps, ST_Relate, ST_Within
  - ST_Buffer, ST_Boundary, ST_Envelope, ST_EnvelopeAsPts, ST_ExteriorRing
  - geometry_nearest_points, geometry_union, ST_Union
  - ST_Difference, ST_Intersection, ST_SymDifference
  - ST_Centroid, ST_ConvexHull
  - ST_CoordDim, ST_Dimension
  - ST_Distance, ST_GeometryType, ST_Length @zxc20041
  - ST_InteriorRingN, ST_InteriorRings, ST_NumInteriorRing
  - ST_GeometryType, ST_IsClosed, ST_IsEmpty, ST_IsSimple, ST_IsRing, ST_IsValid
  - ST_PointN, ST_StartPoint, ST_EndPoint, ST_Points, ST_XMax, ST_XMin, ST_YMax, ST_YMin
  - simplify_geometry
  - ST_NumGeometries, ST_Geometries, ST_NumPoints
- ARRAY FUNCTIONS
  - dot_product @meox3259
  - trim_array @vajaw
  - ngrams @advisedy
  - combinations @daju233
  - reduce (lambda function) @cypppper
  - sort (add the three arguments with lambda functor version) (Hard)
- merge(HLL)
- typeof
- Aggregation Functions
  - bool_or, bool_and
Part IV. DuckDB
- Math Functions
  - even @wumeibanfa
  - gcd, lcm @wumeibanfa
  - gamma @Patinlove
  - signbit @wumeibanfa
- String Functions
  - ord @CAICAIIs
- Vector(Array) Functions
  - cross_product @juruo-c
  - cosine_similarity @Pluto340
- Date Functions
  - century @robll-v1
- Aggregation Functions
  - geomean @0AyanamiRei
  - entropy @wrlcke
  - sem @wumeibanfa
  - skew_pop, kurt_pop @mickaelli
- Map Functions
  - map_contains_entry @DayuanX
  - map_entries @DayuanX
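For anyone checking expected results for the DuckDB math functions, the following Python sketch shows the semantics of `gcd`, `lcm`, and `even` as I understand them from the DuckDB documentation (`even` rounds to the next even number away from zero). The return types and the handling of zero, overflow, and NULL are assumptions here and should be verified against DuckDB itself before writing test cases.

```python
import math


def gcd(x: int, y: int) -> int:
    """Greatest common divisor; the result is never negative."""
    return math.gcd(x, y)


def lcm(x: int, y: int) -> int:
    """Least common multiple; assumed to be 0 when either input is 0."""
    if x == 0 or y == 0:
        return 0
    # Divide first so the intermediate value stays small in fixed-width ports.
    return abs(x // math.gcd(x, y) * y)


def even(x: float) -> float:
    """Round to the next even integer, moving away from zero."""
    magnitude = math.ceil(abs(x))
    if magnitude % 2 == 1:
        magnitude += 1  # odd magnitudes move one step further from zero
    return float(magnitude if x >= 0 else -magnitude)


print(gcd(42, 57), lcm(42, 57), even(2.9))  # 3 798 4.0
```

Python integers never overflow, so a C++ version needs an explicit decision about what `lcm` returns when the product exceeds the column type.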
Part V. MySQL (High Priority)
- POSITION(easy, syntax and alias function) @wumeibanfa
- EXPORT_SET (easy)
- INSERT (take care of utf8) @linrrzqqq
- MAKE_SET @linrrzqqq
- MID @linrrzqqq
- SUBSTR (more syntax) @wumeibanfa
- ATAN2 WITH TWO ARGS @linrrzqqq
- CURTIME WITH MICROSECONDS
- DAYNAME with sysvar LC_TIME_NAMES (code could copy from mysql, add session variable) @linrrzqqq
- GET_FORMAT @linrrzqqq
- PERIOD_ADD, PERIOD_DIFF
- MAKE_TIME @linrrzqqq
- SUB_TIME @dwdwqfwe
- TIMESTAMP WITH TWO ARGS @linrrzqqq
- TIME_FORMAT @linrrzqqq
- TO_SECONDS @dwdwqfwe
- UTC_DATE, UTC_TIME @linrrzqqq
- IS_IPV4, IS_IPV6 @Dog-Du
- INTERVAL @linrrzqqq
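Two of the unclaimed MySQL items, `EXPORT_SET` and `PERIOD_ADD`/`PERIOD_DIFF`, are small enough to sketch in Python as a behavioural reference. The sketch follows the examples in the MySQL manual; NULL arguments, a period of 0, and out-of-range months are deliberately not handled, and the helper names starting with an underscore are hypothetical.

```python
def export_set(bits: int, on: str, off: str,
               separator: str = ",", number_of_bits: int = 64) -> str:
    """Emit `on` or `off` for each bit of `bits`, lowest bit first."""
    if number_of_bits < 0 or number_of_bits > 64:
        number_of_bits = 64  # MySQL clips the width to 64 bits
    bits &= 0xFFFFFFFFFFFFFFFF  # treat the value as an unsigned 64-bit integer
    return separator.join(
        on if (bits >> i) & 1 else off for i in range(number_of_bits)
    )


def _period_to_months(period: int) -> int:
    """Convert YYMM or YYYYMM into a count of months since year 0."""
    year, month = divmod(period, 100)
    if year < 70:
        year += 2000  # two-digit years 00..69 map to 2000..2069
    elif year < 100:
        year += 1900  # two-digit years 70..99 map to 1970..1999
    return year * 12 + month - 1


def _months_to_period(months: int) -> int:
    """Convert a month count back into YYYYMM."""
    year, month_index = divmod(months, 12)
    return year * 100 + month_index + 1


def period_add(period: int, n: int) -> int:
    """Add `n` months to a period and return YYYYMM."""
    return _months_to_period(_period_to_months(period) + n)


def period_diff(p1: int, p2: int) -> int:
    """Number of months between two periods (p1 minus p2)."""
    return _period_to_months(p1) - _period_to_months(p2)


print(export_set(5, "Y", "N", ",", 4))  # Y,N,Y,N
print(period_add(200801, 2))            # 200803
print(period_diff(200802, 200703))      # 11
```

Converting both periods to a plain month count first keeps the year rollover logic in one place, so `period_add` and `period_diff` stay one-liners.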
Others
- json_search with 4th and 5th arguments like MySQL @ChenMiaoi
More tasks are coming...
Solution
All the guidelines for implementing an SQL function are in https://github.com/apache/doris/issues/48201. Please take a careful look at them!