Change date format in ThoughtSpot.

Can't change date format in ThoughtSpot.

Tried changing it via Data > Tables, then selecting the column and changing the date format.

But it starts showing the date as 03/00/2017.

  • What date format did you use?  You could have a bad format string.

  • Here's the date formats/examples I have in my notes:

    • MM/dd/yyyy

    • MMM (for abbreviated month format)

    • DD/mm/yyyy

    • MM/dd/yyyy HH:mm

    • DD/mm/yyyy HH:mm 

  • Hi,

     

    Looks to me like we've got some confusing information in our manuals for the date/time formats:

     

    MM = numeric month

    MMM = 3 character month

    mm = numeric minutes!

     

    So in the examples we provide where we have a date format referencing mm, we should specify MM. I also can't find a place in the documentation where we call this out, so I'll flag it as a problem. In your example, assuming you're using mm, you are referencing the minutes part of your date, which presumably does not exist, hence no result.

    Does that help clarify things?
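    For anyone who wants to see the case-sensitivity trap in action: ThoughtSpot's display masks (MM/mm) aren't strftime, but the strftime tokens %m (month) and %M (minutes) have exactly the same pitfall, which you can check with GNU date (assumes GNU coreutils):

```shell
# %m is the month; %M is the minutes -- case changes the meaning entirely
date -u -d "2017-03-15 08:45:00 UTC" +"%m/%d/%Y"   # month   -> 03/15/2017
date -u -d "2017-03-15 08:45:00 UTC" +"%M/%d/%Y"   # minutes -> 45/15/2017
```

    Same shape of mistake as putting mm where you meant MM in a ThoughtSpot mask.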

  • Stuart Herd - ThoughtSpot_Administration_Guide_4.2.pdf, page 119 - "Set the format to use when showing dates."

  • Steph Fisher Yes - but what I'm saying is that they are inaccurate. Try any of the masks that suggest mm rather than MM for month and you won't get the result you expect. 

  • Apparently, I figured this out accidentally. What you have to do is leave the date format field empty and ThoughtSpot will figure it out automatically. That worked well in my case.

  • My data file has the date in this format: mm/dd/yy. How can I transform it to mm/dd/yyyy using tsload?

  • Lokesh Ceeba I would cat the file to tsload after running it through a sed command with a regex to modify **/**/** to **/**/20** (so long as the data is from the past 18 years; if you have to include potential 19** years, additional logic would be necessary).
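    A sketch of that sed pass (assuming pipe-delimited data where the only dd/dd/dd patterns are the dates, and all years are 20xx):

```shell
# Rewrite mm/dd/yy as mm/dd/20yy before piping the file to tsload
echo 'Total US Drug|200002611714|06/08/13|677.410' \
  | sed -E 's#([0-9]{2}/[0-9]{2}/)([0-9]{2})#\120\2#g'
# -> Total US Drug|200002611714|06/08/2013|677.410
```

    As discussed further down the thread, tsload's %y format handles the century pivot itself, so this preprocessing may not even be needed.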

  • Lokesh Ceeba Zak Konie there are actually two different places that we use date formats and they are different.

    The initial thread here was about format masks once the data is in ThoughtSpot. In that instance we have now updated (and corrected!) our internal documentation.

     

    Your question is about the loading of data. As our manuals state we support the strptime formats. This does support two digit years and treats them as follows:

           %y     The  year within century (0-99).  When a century is not otherwise specified, values in the range 69-99 refer to years in the twentieth century (1969-1999); values in the range 00-68 refer to years in the twenty-first century (2000-2068).

    So you'd specify --date_format "%m/%d/%y" and it will load into ThoughtSpot as a fully formed date.
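    You can confirm that pivot behaviour with the strptime implementation available from Python (the same C-library rules tsload relies on, per the man page excerpt above):

```shell
python3 - <<'EOF'
import time
# 00-68 pivot to the twenty-first century
print(time.strptime("06/29/13", "%m/%d/%y").tm_year)   # 2013
# 69-99 pivot to the twentieth century
print(time.strptime("06/29/70", "%m/%d/%y").tm_year)   # 1970
EOF
```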

  • What is wrong here?

    tsload --target_database "my_db" --target_table "my_table" --date_format '%m/%d/%y' --null_value "" --source_data_format delimited --field_separator '|'

    file format:

    Market_Display_Name|UPC|Period|POS_USD|Units|ACV
    Total FOOD|060933208003|06/29/13|1.000|1.000|0.004
    Total US Drug|200002611714|06/08/13|677.410|79.000|0.435
    ABC Retailer IncI TA|009500800042|06/15/13|5550.160|657.000|11.428
    My Company Inc|004155421983|06/08/13|3894.440|388.000|5.849
    ------

    Error

    E0720 00:51:01.847581 32648 data_importer.cpp:1828] Stopped processing after 69672 lines, Exceeded max allowed errors 0
    E0720 00:51:01.847581 32748 data_importer.cpp:934] Market_Display_Name|UPC|Period|POS_USD|Units|ACV
    , line no=1 ;Reason: Conversion failed, error=Can not convert "Period" to date using date format "%m/%d/%y", data column=3, schema column=week
    Source has 69672 data rows, ignored row count 69671

  • Hi Lokesh Ceeba.  Take a look at this error message:

    line no=1 ;Reason: Conversion failed, error=Can not convert "Period" to date using date format "%m/%d/%y", data column=3, schema column=week

    These messages are full of useful information and worth examining closely.  This particular one tells us:

    1. "line no=1" -- tsload had a problem with line 1
    2. "Can not convert "Period" to date using date format "%m/%d/%y"" -- we are trying to load the word "Period" into a column that expects a date datatype with the format "%m/%d/%y"
    3. "data column=3" -- tsload had a problem with column 3 (so the exact position of your bad data value is line 1, column 3)
    4. "schema column=week" -- tsload tried to load the bad value into a column in your "my_table" table called "week"

    If you'd like to learn more about tsload or get hands-on help with specific issues, I would recommend requesting office hours. Here is the link: https://thoughtspotcs-officehours.youcanbook.me/

  • Hi Lokesh Ceeba - as Tyler says, I've generally found the messages to be helpful as long as you look at the right one - which you are doing here (I tend to ignore the generic message we often see about file encoding at the bottom of the log file - scroll to the top, where the meat is).

     

    When you've got an error on line 1 it typically means one of a couple of things:

    • Some of your formatting is way off for dates - yours looks fine (you've correctly got %y instead of %Y for a 2-digit year).
    • You've got a completely incorrect datatype. Here the message is talking about date conversion, but as we said, your data and format mask look good.
    • You've got something wrong with your header (you can sometimes get control characters at the start of a line). The header looks fine here, and typically this would throw an error on the first field name.
    • You've forgotten to tell tsload you've got a header. I suspect this is the one we should look at. When your data has a header you need to specify --has_header_row. Try adding that and see if it helps.

    I write quite a lot of tsload commands, and I still find it helpful to start with "tsload -h" and work my way down, verifying which options I need - it can help avoid problems and remind you what options there are.

     

    Office hours are a good option if you find you have follow up questions.

  • Yes, I skipped the header because the order of columns was the same.

  • But it doesn't look like you skipped the header. The load is complaining about a word that only appears in the header.

  • Even if I provide the header, the same issue repeats.

     

    The column period is declared as Date column:

    Market_Display_Name|UPC|Period|POS_USD|Units|ACV

    Total US Food|060933208003|06/29/13|1.000|1.000|0.004

    Total US Drug|200002611714|06/08/13|677.410|79.000|0.435

    No source dsn or file provided. Enter rows below.

    Enter Ctrl-D when done.

    ------------------------------------------------------

    Started processing data row

    E0720 17:02:09.168476 35388 data_importer.cpp:1828] Stopped processing after 69741 lines, Exceeded max allowed errors 0

    E0720 17:02:09.168478 37666 data_importer.cpp:934] Market_Display_Name|UPC|Period|POS_USD|Units|ACV

    , line no=1 ;Reason: Conversion failed, error=Can not convert "Period" to date using date format "%m/%d/%y", data column=3, schema column=Period

    Source has 69741 data rows, ignored row count 69740

    E0720 17:02:09.173365 35388 data_importer.cpp:1909] Extract failed, exceeded max errors 0

     

    Source summary

    --------------

    Data source: standard input

    Source data format delimited

    Header row? no

    Tokenizer Options: escape_char: "" field_separator: "|" enclosing_char: "\"" null_value: "" trailing_field_separator: false

    Date format: %m/%d/%y

    Date time format: %Y%m%d %H:%M:%S

    Flexible mode? No

  • Lokesh Ceeba Here is the exact load script that I used to create your sample data and load it into ThoughtSpot. I think the key is to make sure you have this flag in your load script: --has_header_row

     

    sample file:

    Market_Display_Name|UPC|Period|POS_USD|Units|ACV

    Total US Food|060933208003|06/29/13|1.000|1.000|0.004

    Total US Drug|200002611714|06/08/13|677.410|79.000|0.435

     

    To create the schema, I used this TQL:

    CREATE DATABASE "test_import";

    USE "test_import";

     

    CREATE TABLE "fact_market" (

      "Market_Display_Name" VARCHAR(0),

      "UPC" VARCHAR(0),

      "Period" DATE,

      "POS_USD" DOUBLE,

      "Units" DOUBLE,

      "ACV" DOUBLE

    );

     

    And to load the file, this is my load script ( load.sh )

    # import test data

    cat sampleinput.csv | tsload --target_database test_import --has_header_row --field_separator "|" --date_format "%m/%d/%y" --source_data_format delimited --empty_target  --bad_records_file bad_records.txt --target_table fact_market 

     

    That loaded successfully and generated this output:

    Statement executed successfully. 

    $ ./load.sh

    No source dsn or file provided. Enter rows below. 

    Enter Ctrl-D when done.

    ------------------------------------------------------

    Header row read successfully

    Source has 2 data rows, has header row, ignored row count 0

    Waiting for rows to commit...(please wait)

     

    Source summary

    --------------

    Data source:                   standard input

    Source data format             delimited

    Header row?                    yes

    Tokenizer Options:             escape_char: "" field_separator: "|" enclosing_char: "\"" null_value: "(null)" trailing_field_separator: false

    Date format:                   %m/%d/%y

    Date time format:              %Y%m%d %H:%M:%S

    Flexible mode?                 no

     

    Load summary

    ------------

    Target table:                  fact_market

    Should empty target?           yes

    Status:                        Successful

    Rows total:                    2

    Rows successfully loaded:      2

    Rows failed to load:           0

    % of Rows successfully loaded: 100.00 %

    Load Rate (MB/s):              0.00 MB/s

    Load Rate (Rows/s):            1.30 Rows/s

    Start time (Wallclock):        Sat Jul 21 13:34:29

    End time (Wallclock):          Sat Jul 21 13:34:31

    Total load time = 1.53 seconds = 0.03 minutes = 0.00 hours

    Data size = 111 bytes = 0.11 KB = 0.00 MB

  • Dave Cohen ,

         Yes, that is correct. I fixed it by adding the header row and processed the data feeds at our end.

    On the other hand, if we don't mention it, I thought it would skip the first header row as long as the order of columns remains consistent. OK, in that case we have to strip the header with a sed script before processing the data.

     

    --

    Lokesh

  • Lokesh Ceeba Glad you were able to get the data loaded.

    Just to be clear, if you DO HAVE a header row in your data, there is no need to strip it out with a sed script - just include the tsload flag --has_header_row in your load script.

    If your data does NOT have a header row, then you do NOT include the has_header_row flag in your load script.

  • Dave Cohen

               Is there a command line option to skip bad rows and proceed, so that they can be diagnosed later?

     

    Thanks, 

    Lokesh 

  • Lokesh,

     

    Let's look back at the tsload help:

     

    [admin@ip-172-31-5-56 ~]$ tsload -h
    tsload is a command line utility for importing data into the Thoughtspot system.
    For a list of commonly used flags type tsload --helpshort
    For a list of all available flags type tsload --helpfull

    Common flags:
      --source_file              input data file. If not provided input is read from stdin
      --target_database          pre-existing target database on thoughtspot system
      --target_table             pre-existing table in target database to load data into
      --source_data_format       format of input, can be csv or delimited
      --empty_target             if specified, any existing data in the target table will be deleted before the data load. If omitted, data is loaded as insert/update
      --max_ignored_rows         number of ignored rows exceeds this limit, load is aborted
      --has_header_row           if source has a header row
      --trailing_field_separator applies to both csv and delimited, if specified, the field separator appears after every field, including the last field per row
      --bad_records_file         if specified, file is cleared (if existing) and records that had errors are written to this file
      --field_separator          applies to both csv and delimited, character that is used to split record into fields e.g., comma
      --escape_character         applies to delimited only, character that is used to escape special characters e.g., backslash
      --enclosing_character      applies to csv only, character that is used to enclose fields e.g., double quotes
      --null_value               string that represents null values in input e.g., empty
      --date_format              string that describes format of date field (specificied in strptime library) e.g., %Y%m%d to represent 20011230
      --date_time_format         string that describes format of date time field (specificied in strptime library) e.g., %Y%m%d %H:%M:%S to represent 20011230 01:15:12
      --boolean_representation   string that represents boolean values in input separated by _. First value represents true and second represents false, e.g., T_F
      --format_file              filepath which describes formats for different columns getting imported
     

    If we specify --max_ignored_rows and --bad_records_file then we can get it to skip errors and save them for us to look at later. For example:

     

    --max_ignored_rows 100 --bad_records_file /tmp/bad_rows.txt

    This help is also available here:

    https://docs.thoughtspot.com/4.5/reference/data-importer-ref.html
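    Putting that together with the earlier load script, a full invocation might look like this (a sketch only; the row limit and bad-records path are illustrative):

```shell
# Skip up to 100 bad rows and save them for later diagnosis
cat sampleinput.csv | tsload --target_database test_import \
  --target_table fact_market --source_data_format delimited \
  --field_separator "|" --has_header_row --date_format "%m/%d/%y" \
  --max_ignored_rows 100 --bad_records_file /tmp/bad_rows.txt
```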
