how to load a date field with NULLS via TSLoad

I have a table with a date field... this date field will frequently be NULL. other date fields work fine. but this one, is causing load to fail via TSLoad with the message:  Conversion failed, error=Can not convert "" to date using date format "%Y-%m-%d %H:%M:%S"

How do I set TSLoad to get Null values in a date field to load?

14replies Oldest first
  • Oldest first
  • Newest first
  • Active threads
  • Popular
  • what have you set the --null_value flag to be? The error message indicates that the values have an empty string as null.  

    Reply Like
  • Damian Waldron i don't have --null_value set. what should I set it to for a date field? do you have an example? thx

    Reply Like
  • craig vigor I would use the following. 

    --null_value ""
    Reply Like
  • Damian Waldron Thanks Damian. I have a similar case where i have date as "\N". i gave the flag as --date_format '%Y-%m-%d' --null_value ""

    But it is throwing error - Can not convert "\N" to date using date format "%Y-%m-%d", data column=3, schema column=shpd_dt

    I am loading this from hadoop so i cannot change the source value. the date is "null" in source and is loading as "\N" to Thoughtspot 

    Reply Like 1
  • Can you share 2-3 sample rows of your data - please anonymize or mask any of the sensitive values - but it would be helpful to see the exact input as it's coming into ThoughtSpot.

    Reply Like
  • 13442947^2019-10-08^\N^881^71^8911649295^ZINUS INC^B-DOMESTIC MANAGED                                ^28000682^ID^556339319^14915119^5`` MATTRESS TWIN        ^ONLINEASAP^YB-500T-F           ^-263554342^56^540^57^CHARLESTON-DOTCOM^JAKARTA^40^HIGH CUBE^2020^2^2020-02-16^2020-01-31^\N^WAL-MART.COM^YOUTH5INCH MEMORY FOAM MATTRESS WITH MOISTURE BARRIER TWINFUSCHIA^U
    13442947^2019-10-08^\N^881^71^8911649295^ZINUS INC^B-DOMESTIC MANAGED                                ^28000682^ID^556339551^20532377^8INCH SPA SENSATIONS     ^ONLINEASAP^W-MGM-8C            ^-263552843^166^22^170^CHARLESTON-DOTCOM^JAKARTA^40^HIGH CUBE^2020^2^2020-02-16^2020-01-31^\N^WAL-MART.COM^8INCH SPA SENSATIONS  MYGEL MATTRESS CAL KING^U
    13442947^2019-10-08^\N^881^71^8911649295^ZINUS INC^B-DOMESTIC MANAGED                                ^28000682^ID^556339536^20532381^8INCH SPA SENSATIONS     ^ONLINEASAP^W-MGM-8W            ^-263552881^166^91^170^CHARLESTON-DOTCOM^JAKARTA^40^HIGH CUBE^2020^2^2020-02-16^2020-01-31^\N^WAL-MART.COM^8INCH SPA SENSATIONS  MYGEL MATTRESS KING^U


    Dave Cohen Please find my sample 3 rows. delimited by ^. My data has \N as value for a date column from source

    Reply Like
  • What format is your data saved as in Hadoop? Have you created a Hive table? 

    Hadoop can have some issues with the date datatype 

    Reply Like
  • Marc Price Yes i have a hive table created and its format is date. In the hive table the date is showing "null". I am systematically loading it to thoughtspot through a pipeline from hdfs and it gets converted automatically to "/N" in teh file. so when file is trying to get loaded to thoughtspot it rejects "/N" value to a date column in TS.

    Is there a way to avoid "/N" in that date column and show as "null"?

    Reply Like
  • Gomathi Muthuswamy what is the format of the underlying data? is it CSV, avro, parquet? 

    Reply Like
  • Marc Price it is ORC

    Reply Like
  • Gomathi Muthuswamy - Depending on the size of your input file, you can do a "sed" substitution as part of your tsload process.

    For your data, that would look something like this:

    >>> tsload.sh file >>>

    echo "loading null date sample ..."

    echo ""

    cat sample.csv | sed -e "s/\\\N//g" | tsload --target_database nulldates --date_format "%Y-%m-%d" --empty_target --csv --field_separator "^" --null_value "" --target_table table1

     

    If your input file is very large, then you may want to do it as a 2-step process - (1) first the sed command (which writes out a new output.csv file) and then (2) cat that new output.csv file to tsload.

    1. sed -e "s/\\\N//g" <input.csv >output.csv

    2. cat output.csv | tsload --target_database nulldates --date_format "%Y-%m-%d" --empty_target --csv --field_separator "^" --null_value "" --target_table table1

     

    Your output from both approaches would look something like this:

    Reply Like 1
  • hi Gomathi Muthuswamy I suggest you use COALESCE to handle nulls on your export to CSV Hive SQL.

    You may want to  use: 

    COALESCE(some_date-col,CAST('1900-01-01' AS DATE))  -> for a date
    COALESCE(some_measure_col,0) -> for a measure 

    COALESCE(some_attribute_col,-99) -> for a numeric attribute. -99 is a catch-all example and must mean something to your users.

    Reply Like 1
  • Thanks Ricardo Da Silva Dave Cohen
    Ricardo Da Silva COALESCE worked out and i was able to successfully load to ThoughtSpot. Thanks for your suggestion. Is there a way i can show those dates back to "null" without adding an extra formula column?

    Reply Like 1
  • Yw, great to hear the data has been loaded. You can replace the database column with a formula with the same name and a condition that checks if the database date column value matches the default value you have in the coalesce function. If true returns null, the database column otherwise.

    Something like: 
    if ( report date = to_date ( '1900-01-01' , '%Y-%m-%d')) then null else report date

    Let's connect f2f I'm at David Glass next week. 
     

    Reply Like 1
Like Follow
  • Status Answered
  • 2 wk agoLast active
  • 14Replies
  • 139Views
  • 6 Following