StarRocks export data to csv format error #52004

Open
boolean-dev opened this issue Oct 17, 2024 · 0 comments
Labels
type/bug Something isn't working

Comments

@boolean-dev

Steps to reproduce the behavior (Required)

1. Objective

I want to import data from StarRocks into PostgreSQL, so I have taken the following steps:

  1. Export the data from StarRocks to S3 in CSV format. There are two approaches:

    1. Export data using EXPORT
    2. Unload data using INSERT INTO FILES
  2. Use AWS RDS for PostgreSQL to import the CSV data from S3 into the database, following the AWS guide "Importing data from Amazon S3 to your RDS for PostgreSQL DB instance" (see the sketch after this list).
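
For reference, a minimal sketch of the RDS-side import. It assumes the aws_s3 extension is installed and a test_table with matching columns already exists in PostgreSQL; the bucket, object key, and credentials below are placeholders.

-- Sketch only: import one exported CSV file from S3 into the PostgreSQL table.
SELECT aws_s3.table_import_from_s3(
    'test_table',                       -- target table
    '',                                 -- empty column list = all columns
    '(FORMAT csv, NULL ''\N'')',        -- COPY options; \N marks NULLs in the export
    aws_commons.create_s3_uri('test_bucket', 'path/to/export.csv', 'eu-west-1'),
    aws_commons.create_aws_credentials('ACCESS_KEY', 'SECRET_KEY', '')
);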

2. Basic Data

My data contains several VARCHAR fields whose values include common delimiter characters such as | (pipe), , (comma), and # (hash) inside the strings.

Table structure:

CREATE TABLE test_table
(
    id              INT            NULL,
    lrn             VARCHAR(255)   NULL,
    awb             VARCHAR(255)   NULL,
    tracking_number VARCHAR(255)   NULL,
    recipient       VARCHAR(65533) NULL,
    sender          VARCHAR(65533) NULL,
    currency        VARCHAR(255)   NULL,
    item            VARCHAR(65533) NULL,
    container       VARCHAR(255)   NULL,
    created_at      DATETIME       NULL,
    updated_at      DATETIME       NULL,
    ioss            VARCHAR(255)   NULL,
    parcel_no       VARCHAR(255)   NULL,
    vat_number      VARCHAR(255)   NULL,
    origin_ioss     VARCHAR(255)   NULL,
    update_ioss_at  DATETIME       NULL,
    archived        TINYINT(1)     NULL,
    transport_cost  DECIMAL(9, 2)  NULL
)
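
For context, a rough sketch of the matching target table on the PostgreSQL side (the type mapping to text/timestamp/boolean/numeric is my own assumption, not part of the original setup):

CREATE TABLE test_table
(
    id              integer,
    lrn             varchar(255),
    awb             varchar(255),
    tracking_number varchar(255),
    recipient       text,
    sender          text,
    currency        varchar(255),
    item            text,
    container       varchar(255),
    created_at      timestamp,
    updated_at      timestamp,
    ioss            varchar(255),
    parcel_no       varchar(255),
    vat_number      varchar(255),
    origin_ioss     varchar(255),
    update_ioss_at  timestamp,
    archived        boolean,
    transport_cost  numeric(9, 2)
);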

A sample row, shown one field per line:

49321237,
FTLaaaaaaa256993,
235-111111934,
6G58638256993,
{"city": "| COI****THIER", "name": "an****HER", "address": "798 **** Mo", "country": "FR", "zipcode": "7***0", "address2": "", "cityCode": "73"},
{"city": "BEIJING", "name": "BEIJ ***  LTD", "address": "R *** HENG", "country": "CN", "zipcode": "1***23", "address2": "SHI***R", "cityCode": "10"},
EUR,
{"count": 1, "value": 0.58, "hsCode": "6***9", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.187, "opedest": "O***ONNEL", "currency": "EUR", "describe": "CLOTHES", "nattrans": 11, "packagingType": ""},
FRFR1111111156,
2023-01-01 04:03:11.311006,
2023-01-01 04:03:11.311006,
\N,
\N,
\N,
\N,
\N,
\N,
false,
\N

3. EXPORT Plan

Export SQL

EXPORT TABLE test_table
PARTITION ('p202212') 
TO "s3://test_bucket/**/s1/"
PROPERTIES
(
    "column_separator"=","
) 
WITH BROKER
(
    "aws.s3.access_key" = "AAAAAAAAAAAA",
    "aws.s3.secret_key" = "KKKKKKKKKKKKKKKKKKKKKKKK",
    "aws.s3.region" = "eu-west-1"
);

CSV data

49318060,FTL611111151839,157-1111113,6G58111111119,{"city": "BaaaaST", "name": "ARNaaaaRY", "address": "5 SaaaaN", "country": "FR", "zipcode": "21110", "address2": "", "cityCode": "29"},{"city": "SHaaaaHAI", "name": "SHANaaaaLTD", "address": "Pudoaaaat", "country": "CN", "zipcode": "211115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 24.89, "hsCode": "3111110", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.624, "opedest": "OCaaaaaNEL", "currency": "EUR", "describe": "Diaaaaent", "nattrans": 11, "packagingType": ""},YWFTL11111111103001,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,R8j5uaiaaaaaaaaaHBFw==,\N,\N,\N,\N,true,0.00

The JSON fields in this CSV contain commas, so they should be enclosed in double quotes, for example:

"{""city"": ""BaaaaST"", ""name"": ""ARNaaaaRY""}"

4. INSERT INTO FILES Plan

SQL

INSERT INTO 
FILES(
    "path" = "s3://test_bucket/**/insert",
    "format" = "csv",
    "csv.column_separator"=",",
    "csv.enclose"='"',
    "csv.escape"="\\",
    "compression" = "uncompressed",
    "target_max_file_size" = "1024", -- 1KB
    "aws.s3.access_key" = "AKIAaaaaaaaaSMKS",
    "aws.s3.secret_key" = "TpiojLaaaaaaaaaaaauDDaYZr",
    "aws.s3.region" = "eu-west-1"
)
SELECT * FROM test_table LIMIT 1000;

CSV data

49318068,FTL61111541685,157-1111113,6G511111111185,{"city": "ME11C", "name": "CaaETTE", "address": "10aa rige", "country": "FR", "zipcode": "11110", "address2": "", "cityCode": "19"},{"city": "SH11AI", "name": "SHAN11SS CO.LTD", "address": "P11 ort", "country": "CN", "zipcode": "2115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 14.09, "hsCode": "8111", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.34, "opedest": "OCC111EL", "currency": "EUR", "describe": "Camping light", "nattrans": 11, "packagingType": ""},YW11113058,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,GUub04OA1111E51kG+p2Sg==,\N,\N,\N,\N,true,0.00

The JSON fields in the above CSV are still not enclosed in quotes, even though csv.enclose is set.

Expected behavior (Required)

The exported CSV data should look like this:

49318048,FTL6G58753149446,157-87924513,6G58753149446,"{""city"": ""BaaaaST"", ""name"": ""ARNaaaaRY""}",\N,....
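
For comparison, this is the quoting a standard CSV writer produces: a field containing the separator or the quote character is enclosed in double quotes, and embedded double quotes are doubled. PostgreSQL's own CSV output behaves this way, for example (a sketch using a literal value from the sample data):

-- Sketch: PostgreSQL's COPY ... (FORMAT csv) encloses the JSON field and
-- doubles the embedded double quotes.
COPY (SELECT 49318048 AS id,
             '{"city": "BaaaaST", "name": "ARNaaaaRY"}' AS recipient)
TO STDOUT WITH (FORMAT csv);
-- Output: 49318048,"{""city"": ""BaaaaST"", ""name"": ""ARNaaaaRY""}"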

Real behavior (Required)

But the actual CSV data is:

49318060,FTL611111151839,157-1111113,6G58111111119,{"city": "BaaaaST", "name": "ARNaaaaRY"},\N,....

StarRocks version (Required)

3.3.4

boolean-dev added the type/bug label on Oct 17, 2024