Announcement

Collapse
No announcement yet.

Distinct + Count via Excel ODBC driver

Collapse
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Distinct + Count via Excel ODBC driver

    Many years ago, I had created hundreds of SQL statements as stored procedures on a mainframe, which were called by VB code on a PC... a great project! So, although I'm a bit rusty, I have a good grounding in SQL. Now, I am beginning to work with my copy of SQL TOOLS with PBWin10 and PBCC6, and I've hit a snag...

    For testing, I'm using an edited version of the sample SQLT3_DUMP.BAS No problems creating output files containing the results of my queries. The DSN file is fine.

    My data source is an .XLS that contains one sheet, HISTORY. That sheet (table) contains about 25 columns (fields) and 5041 rows (records).

    The records describe 5 different types of transactions (Debt, Order, Service Fee, Shipping Services, and Transfer), as found with this query:
    Code:
       SQL_Stmt    %IMMEDIATE, "SELECT distinct type FROM `HISTORY$` "
    My interest is with records of the type Order -- I want to do such things as count them, aggregate their values, etc. Normal stuff...

    An interesting thing is that for an Order, there can be multiple records, one for each Item/SKU in the order.
    So when an order contains three different SKUs, we see three records having the same order id.

    GOAL: I want a simple count of the number of distinct orders that we have received.

    Query 1:
    Code:
       SQL_Stmt    %IMMEDIATE, "SELECT count(total) FROM `HISTORY$` where type = 'Order' "
    returns 3078, the number of records having the type Order, BUT the count includes each SKU in a multi-item order,
    whereas I want to count each order only once.
    (This does the count on the total field, but for this purpose it could have been any field.)

    Query 2:
    Code:
       SQL_Stmt    %IMMEDIATE, "SELECT distinct(`order id`) FROM `HISTORY$` where type = 'Order' "
    returns a list of the unique (distinct) order numbers.
    This is actually excellent data, except that I don't want the actual order numbers, but just a count of them.

    When I examine the results file using an external text editor, I see we have 2658 lines (unique order ids).
    So that's the number I expect I should be able to obtain using a properly constructed SQL statement...

    Query 3:
    Code:
       SQL_Stmt    %IMMEDIATE, "SELECT count(distinct `order id`) FROM `HISTORY$` where type = 'Order' "
    does not execute; generates a syntax error.

    I've searched and read more than 20 articles on SQL/Excel syntax, and it seems that that syntax should work!

    I've tried variations such as:
    Code:
       SQL_Stmt    %IMMEDIATE, "SELECT distinct(`order id`), count(*)  FROM `HISTORY$` where type = 'Order' "
    but they have produced only errors.

    After so many attempts and failures, I'm wondering if I need to be doing something totally different, like an inner join
    (now I'm going to have to go back and study, as this area is nearly blank in my memory...)
    or a better WHERE clause... but at the moment, if one of these is the answer, I'm not seeing it.
    (Can DISTINCT be used in a WHERE clause? It just hit me, so I'll go try it...)


    Has anyone else been able to combine DISTINCT and COUNT() using the Excel ODBC driver?
    Or, can you see an error in my syntax, logic, or anything else?

    I appreciate any and all constructive thoughts!

    Thanks,
    -John

    Last edited by John Montenigro; 3 Feb 2019, 11:22 AM. Reason: fixed a coupole of tpyos

  • #2
    To my knowledge Count(distinc....) is not supported by the Excel ODBC - What did SQL Tools error return to you?

    Try

    SELECT COUNT([ORDER_ID]) AS DistinctCount
    FROM
    (
    SELECT DISTINCT [ORDER_ID]
    FROM [HISTORY$]
    ) Sq1
    Last edited by Carlo Pagani; 3 Feb 2019, 01:19 PM. Reason: Added some code

    Comment


    • #3
      Some drivers alllow Count(*), others require a specific column such as Count(Order_ID)

      But there are errors/misconceptions in your examples.


      Note:
      You say your sheet is called History. Why are you using "History$" ?
      Why the single quotes around the table name?
      DISTINCT is a qualifier of SELECT, you can't use it inside an aggreate function such as Count(), nor pass it a parameter
      Square brackets to avoid potential problems with use of possible reserved word "Type" and character "$" in names. (Bad idea)


      Carlo's code should work.

      Alterenatively, try GROUP BY

      Comment


      • #4
        @Carlo:
        When I try it with:
        Code:
        "SELECT count( distinct  `order id` ) FROM `HISTORY$` where type = 'Order' "
        I get this error messagebox:
        ---------------------------
        SQL_ErrorText
        ---------------------------
        [Microsoft][ODBC Excel Driver] Syntax error (missing operator) in query expression 'count( distinct `order id` )'.

        lCount = 1
        ---------------------------
        OK
        ---------------------------
        Trying this code as you suggested
        Code:
           SQL_Stmt    %IMMEDIATE, "SELECT COUNT([ORDER_ID]) AS DistinctCount FROM (SELECT DISTINCT [ORDER_ID] FROM [HISTORY$] ) "
        results in this message:
        ---------------------------
        SQL_ErrorText
        ---------------------------
        [Microsoft][ODBC Excel Driver] Too few parameters. Expected 1.

        lCount = 1
        ---------------------------
        OK
        ---------------------------

        However, using this:
        Code:
           SQL_Stmt    %IMMEDIATE, "SELECT COUNT( `order id` ) AS DistinctCount FROM (SELECT DISTINCT `order id` FROM `HISTORY$` ) "
        and the results file contains: "2659"
        (I didn't understand the trailing SQ1 and left it off... seems ok?)
        The only problem with this is that the count seems off by 1.

        FYI, I followed the guidance in SQL Tools Appendix M, regarding the use of delimiters around identifiers and literals. Seems Excel can unpredictably handle square brackets and backwards single quotes differently... So I'm going by SQL Tools' recommendations.


        I really appreciate your lending a helping hand!

        Thanks!
        -John


        Comment


        • #5
          Originally posted by Stuart McLachlan View Post
          Some drivers alllow Count(*), others require a specific column such as Count(Order_ID)

          But there are errors/misconceptions in your examples.
          No doubt! Last SQL I wrote was in 1996!
          I will continue to experiment with the Count() variations.
          I wish there were a Help File for the Excel ODBC syntax! That would make a lot of this less experimental.


          You say your sheet is called History. Why are you using "History$" ?
          The data comes from Amazon. From their perspective, this data is our History with them. I could just rename the sheet, but why?
          Also, the trailing $ is necessary, as that's the way the Excel ODBC driver sees table names.

          Why the single quotes around the table name?
          The single backquotes replace the more common square brackets, as explained in SQL Tools Appendix M.

          DISTINCT is a qualifier of SELECT, you can't use it inside an aggreate function such as Count(), nor pass it a parameter
          Yes, I'm re-learning SQL nuances after 22 years of disuse.


          Square brackets to avoid potential problems with use of possible reserved word "Type" and character "$" in names. (Bad idea)
          While I've seen a lot online for SQL Server that use these conventions, I have to defer to Appendix M for the final say about such things for SQL Tools.


          Carlo's code should work.
          Yes, it was only off by 1, so I suspect a boundary issue. I'll continue to study it...


          Alterenatively, try GROUP BY
          I have to re-learn GROUP BY because, although I did try it, I'm pretty sure my syntax wasn't perfect...

          Thank you for your help! I appreciate your time and questions!

          -John


          Comment


          • #6
            > SELECT count( distinct `order id` ) FROM `HISTORY$` where type = 'Order'
            >"Syntax error (missing operator) in query expression 'count( distinct `order id` )'."

            DISTINCT is a qualifier of SELECT. You can't put it inside Count().


            > "SELECT COUNT([ORDER_ID]) AS DistinctCount FROM (SELECT DISTINCT [ORDER_ID] FROM [HISTORY$] )
            >"Too few parameters. Expected 1"

            The table name is HISTORY, not HISTORY$ it's not finding a known table (parameter)

            >(I didn't understand the trailing SQ1 and left it off

            It wasn't necessary in the example, but if you want to use the result as a recordset, then SQL1 is its alias.

            >The only problem with this is that the count seems off by 1

            Possibly a Null Order_ID aggregated from other "Type" records.

            Try: SELECT COUNT( order id) AS DistinctCount FROM (SELECT DISTINCT order _id FROM History WHERE [type] = 'Order')

            Comment


            • #7
              I understand what you mean that DISTINCT is a modifier of SELECT, but I have seen it in other documentation as a qualifier in COUNT()
              Code:
              COUNT(DISTINCT expr,[expr...])
              I have not been able to find good help info on Excel ODBC syntax. Instead, the internet is abundant in other SQL variations, and I've seen a lot of COUNT() explanations that show DISTINCT inside the parens. I have to test everything I see in order to validate it...


              Table name ends with $:
              From SQL Tools help, Appendix M (Excel):
              When accessed through SQL Tools, Table Names will usually have a $ suffix.
              SQ1 as alias:
              Good to know. I'll have to read the SQL Tools documentation to see if that's supported.

              Off by 1:
              yes, by adding the WHERE clause, the number came back correct. There must be a record with an 'order id' but with a type that's not 'Order'.


              What SQL toolset do you use from within your PB code? Or, is this only about the syntax supported by the Excel ODBC driver?
              -John

              Comment


              • #8
                Originally posted by John Montenigro View Post
                I understand what you mean that DISTINCT is a modifier of SELECT, but I have seen it in other documentation as a qualifier in COUNT()
                Code:
                COUNT(DISTINCT expr,[expr...])
                I stand corrected. I've never used it with an aggregate function. Thanks for that - I'm going to add it to my repetoire


                Comment


                • #9
                  Originally posted by John Montenigro View Post

                  Table name ends with $:
                  From SQL Tools help, Appendix M (Excel):
                  I must admit, I've never really worked with SQLTools and Excel so haven't read Appendix M (I'll put it on my Todo list). I primarily work with Access, SQL Server, SQLite and mySQL databases..

                  Comment


                  • #10
                    Originally posted by John Montenigro View Post

                    Off by 1:
                    yes, by adding the WHERE clause, the number came back correct. There must be a record with an 'order id' but with a type that's not 'Order'.
                    Not necessarily, you were probably picking up and aggregating all the NUL "order_id"s from "non-orders". If there were a record with an order_id but not of type "Order", I'd expect you to be out by 2 (unless it was a duplicate of a real order_id).


                    What SQL toolset do you use from within your PB code? Or, is this only about the syntax supported by the Excel ODBC driver?
                    -John
                    I use SQLTools. Effectively, I just write the same SQL as I do when I work natively with the various RDBMSs, in PHP, or when using tools like SQLite Studio and DBeaver.
                    The whole point of ODBC is that it shouldn't matter what the database is. That's why it's called "Open Database
                    Connectivity".

                    Unfortunately Excel is not a database.(I can't repeat that often enough!) and so the ODBC driver has to use various kludges to appproximate the functionality of one.

                    Comment


                    • #11
                      Originally posted by Stuart McLachlan View Post
                      I stand corrected. I've never used it with an aggregate function. Thanks for that - I'm going to add it to my repetoire
                      Well, it does exist, but I haven't gotten it to work yet...

                      My guess: I'm doing something dumb, or the EXCEL ODBC driver does not support it. Not sure which. Either is quite likely!

                      -John

                      Comment


                      • #12
                        Off by 1:

                        With the addition of the WHERE clause,
                        Code:
                           " SELECT COUNT( `order id` ) AS DistinctCount FROM (SELECT DISTINCT `order id` FROM `HISTORY$` WHERE type = `Order` )  "
                        the count comes back correct.

                        Progress!

                        Never underestimate the determination of a Boomer!
                        -John

                        Comment


                        • #13
                          John - Which is exactly the code I posted in #2 so I am now confused why you went down a rabit hole
                          Edit - Ok, I read the thread from top and see you've deduced a few things. Trial and (t)error is what makes a great developer!

                          Comment


                          • #14
                            Hey Carlo,

                            I do a LOT of "trial and error", I enjoy the learning, and I definitely get a lot of satisfaction from solving business information problems, but I don't think I'll ever be a great developer.
                            I can only hope that someday I'll be a little better than an "OK" developer.
                            I just don't have a broad enough or deep enough understanding of Windows to ever be a "great" developer.

                            And I am in awe of the professionals on the PB forums, and am always grateful for their generosity in sharing and guiding those of us who are struggling.
                            I know full well that my accomplishments only happen because I get to "perch on the shoulders of giants".

                            -John

                            Comment

                            Working...
                            X