Statistical question

  • Statistical question

    I have a set of data: about 100,000 positive nonzero integers. For simplicity let's say 10,000 values from each of the last ten years.

    Each year, around 1/3 of the values are the number 50.

    Around 1/3 of the values are between 1 and 49, with lots of 25s and 10s.

    Around 1/3 of the values are greater than 50: large numbers of 100s, quite a few 250s... all the way up to a small number of values (maybe 50 per year) in the 10,000 to 100,000 range.

    Can anybody think of a good way to characterize this data, in terms of an annual value of some sort?

    An average isn't very meaningful, because a few large values have such a huge impact on it.

    The median value is always 50, so that's meaningless as well.

    -- Eric
    "Not my circus, not my monkeys."

  • #2
    Characterization depends on what the data are, and even more on context.

    Is this order quantity? Payment amount? Students in a classroom? Hospital admissions?

    FWIW, the median value 50 may well be meaningful; but that's application-dependent.

    Then again, in context the RMS value may be meaningful. Or maybe throw out the 'n' high and 'n' low and take an average... or a median... or an RMS. Or maybe it's the standard deviation from the average, mean, or median which means something.
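A minimal sketch of two of the options mentioned here, the RMS value and an average taken after throwing out the 'n' highest and 'n' lowest values. The function names and the sample data are made up for illustration; they are not from the original data set.

```python
import math

def rms(values):
    """Root mean square: square root of the mean of the squared values."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def trimmed_mean(values, n):
    """Mean after discarding the n smallest and n largest values."""
    if 2 * n >= len(values):
        raise ValueError("n is too large for this data set")
    kept = sorted(values)[n:len(values) - n]
    return sum(kept) / len(kept)

# Hypothetical sample, shaped loosely like the data described above.
sample = [10, 25, 25, 50, 50, 50, 100, 250, 20000]
print("RMS:", rms(sample))
print("Trimmed mean (n=1):", trimmed_mean(sample, 1))
```

Note that RMS weights large values even more heavily than a plain average does, so the few huge amounts will dominate it; the trimmed average goes the other way.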

    Just because you have a computer and can gather and present large amounts of data doesn't mean you understand it. Or worse, draw some kind of cause and effect relationship from it.

    Generally da numbers is just da numbers.
    Michael Mattias
    Tal Systems (retired)
    Port Washington WI USA
    [email protected]
    http://www.talsystems.com


    • #3
      You might want to consider looking at a plot of the data on an x-y graph where both the x and y axis are logarithmic.
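One way to get the points for such a plot, sketched under the assumption that x is the value and y is how often that value occurs (the post does not say which quantities go on which axis). Taking log10 of both gives coordinates that can be fed to any plotting tool with linear axes. The function name and sample data are hypothetical.

```python
import math
from collections import Counter

def loglog_points(values):
    """Return (log10(value), log10(count)) pairs, sorted by value."""
    counts = Counter(values)
    return [(math.log10(v), math.log10(c)) for v, c in sorted(counts.items())]

# Hypothetical sample: many small amounts, few large ones.
sample = [10] * 100 + [100] * 10 + [1000]
for x, y in loglog_points(sample):
    print(f"{x:.2f}  {y:.2f}")
```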


      • #4
        Michael --

        They are dollars.

        I should have at least tried RMS before posting, thanks!

        "Just because you have a computer and can gather and present large amounts of data doesn't mean you understand it."
        That's the point: I am trying to understand what the numbers are doing.

        "Or worse, draw some kind of cause and effect relationship from it."
        Well understood.

        John --

        I have been using graphs, and I need to collapse a couple of dimensions.

        Thanks!

        -- Eric
        "Not my circus, not my monkeys."


        • #5
          This was one of the selling points of the very first "Killer App" for PCs: VisiCalc.

          You could present the same data to yourself many different ways until you found one which actually did mean something to you.

          BTW another thought... maybe the dollar amount is not as meaningful as would be a "count per period regardless of amount."

          e.g., in inventory management this number - 'number of occurrences/frequency regardless of quantity' - is called "bin hits" and can be quite useful.
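A rough sketch of the "count per period regardless of amount" idea. It assumes the data can be laid out as (year, amount) pairs; that layout, the function names, and the sample records are all assumptions for illustration.

```python
from collections import Counter

def hits_per_year(records):
    """Count records per year, ignoring the dollar amount entirely."""
    return Counter(year for year, _amount in records)

def hits_per_year_and_amount(records):
    """Count how often each distinct amount occurs within each year."""
    return Counter((year, amount) for year, amount in records)

# Hypothetical (year, amount) records.
records = [(2010, 50), (2010, 50), (2010, 10000), (2011, 50), (2011, 100)]
print(hits_per_year(records))
print(hits_per_year_and_amount(records))
```

If every year holds the same number of values, the per-year count says little on its own, so the per-amount breakdown is probably the more useful of the two.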

          MCM
          Michael Mattias
          Tal Systems (retired)
          Port Washington WI USA
          [email protected]
          http://www.talsystems.com


          • #6
            Take the inverse of the data (1/x) times the most common value (50).
            Plot against the frequency.
            Take the average and standard deviation.
            It may give you a useful metric.
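The transform above could be sketched like this. It assumes "most common value" means the statistical mode and that the sample standard deviation is wanted; the post does not specify either, and the function name is made up.

```python
import statistics

def scaled_inverse_stats(values):
    """Scale each value as mode / x, then return (mean, sample std dev)."""
    most_common = statistics.mode(values)  # 50 for the data described
    scaled = [most_common / v for v in values]
    return statistics.mean(scaled), statistics.stdev(scaled)

# Hypothetical sample; the mode is 50.
sample = [25, 50, 50, 100]
mean, std = scaled_inverse_stats(sample)
print(mean, std)
```

Because every value becomes 50/x, the huge amounts shrink toward zero and the small ones grow, so the handful of very large values no longer dominates the average.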
            Regards,
            Dave.

            You're never too old to learn something stupid.


            • #7
              It almost sounds as if you want Bollinger Bands, which is just a moving average with -/+ standard deviation over some period. A search of the web will give you a better formula than one just off the top of my head.
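As a rough sketch of that idea: a simple moving average with a band of k standard deviations on either side. The window length, the multiplier k, and the use of the population standard deviation are assumptions here (a 20-period window with k = 2 is the usual convention in charting, but nothing in this thread fixes those numbers).

```python
import statistics

def bollinger_bands(series, window=3, k=1.0):
    """Return (lower, middle, upper) for each full window in the series."""
    bands = []
    for i in range(window - 1, len(series)):
        chunk = series[i - window + 1 : i + 1]
        mid = statistics.mean(chunk)     # moving average
        sd = statistics.pstdev(chunk)    # spread within the window
        bands.append((mid - k * sd, mid, mid + k * sd))
    return bands

# Hypothetical yearly figures, one per year.
yearly = [1, 2, 3, 4]
for lower, mid, upper in bollinger_bands(yearly):
    print(lower, mid, upper)
```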
              Rod
              In some future era, dark matter and dark energy will only be found in Astronomy's Dark Ages.


              • #8
                Originally posted by Michael Mattias
                This was one of the selling points of the very first "Killer App" for PCs: VisiCalc.

                You could present the same data to yourself many different ways until you found one which actually did mean something to you.
                I must have gotten a different version of VisiCalc. My recall is that it was the first spreadsheet. And even today a spreadsheet would have a problem handling 100,000 entries, much less presenting it in many different ways.

                That it was the first "Killer App" is without question, but I don't recall any "PC's" existing when it came out either ('78 or '79?). Lots of "micro"computers though.

                =========================================
                "He who loses money, loses much;
                He, who loses a friend, loses much more;
                He, who loses faith, loses all."
                Eleanor Roosevelt
                =========================================
                It's a pretty day. I hope you enjoy it.

                Gösta

                JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
                LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/


                • #9
                  Use a SQL database?

                  Maybe load the list into an SQL database, then play with several SQL queries including group by etc?
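Something along these lines, sketched with Python's built-in sqlite3 module and an in-memory database. The table name, column names, and sample rows are made up for illustration.

```python
import sqlite3

def summarize_by_year(rows):
    """Load (year, amount) rows into SQLite and summarize them per year."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (year INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    summary = conn.execute(
        "SELECT year, COUNT(*), SUM(amount), MIN(amount), MAX(amount) "
        "FROM payments GROUP BY year ORDER BY year"
    ).fetchall()
    conn.close()
    return summary

# Hypothetical (year, amount) rows.
rows = [(2010, 50), (2010, 25), (2011, 100)]
for line in summarize_by_year(rows):
    print(line)
```

Swapping the GROUP BY column (for example, grouping by amount instead of year) is an easy way to play with different views of the same table.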


                  • #10
                    That it was the first "Killer App" is without question, but I don't recall any "PC's" existing when it came out either ('78 or '79?). Lots of "micro"computers though.
                    I used VisiCalc on an HP-85, one of the first PC's (not an IBM clone), in 1980, as I remember it.
                    Dave.

                    You're never too old to learn something stupid.


                    • #11
                      Ok, Ok:
                      " Personal Computer and PC are (registered?) trademarks of International Business Machines Corporation Armonk NY USA. "

                      Happy now?

                      Maybe you should print this up and make a Xerox... oops... well, if you want to cry about it you can always wipe your nose with a Kleenex... oops, well, faggedaboudit and just treat yourself to a bowl of Jello... oops.....
                      Michael Mattias
                      Tal Systems (retired)
                      Port Washington WI USA
                      [email protected]
                      http://www.talsystems.com


                      • #12
                        FWIW, the first Visicalc was written for the CP/M operating system.

                        You know, CP/M, where CHR$(26) (Ctrl+Z) = EOF? (which lives to this day in some applications!)
                        Michael Mattias
                        Tal Systems (retired)
                        Port Washington WI USA
                        [email protected]
                        http://www.talsystems.com


                        • #13
                          Originally posted by Michael Mattias
                          FWIW, the first Visicalc was written for the CP/M operating system.
                          Not really. AFAIK, the first version was for Apple ][, coded with a macro assembler.

                          Bye!
                          -- The universe tends toward maximum irony. Don't push it.

                          File Extension Seeker - Metasearch engine for file extensions / file types
                          Online TrID file identifier | TrIDLib - Identify thousands of file formats


                          • #14
                            Originally posted by Michael Mattias
                            FWIW, the first Visicalc was written for the CP/M operating system.
                            Hmmm .... My recollection is different. I thought it was written by two guys on an Apple (maybe Apple used CPM, I don't know), at least that's where I first saw it. I always thought CPM was only used on the 8086(?) chipset.

                            In any case, VisiCalc was the application that finally gave respectability to "micro" computers. Until then they were generally seen only as toys used by hobbyists and pretty much scoffed at by manly computer types (mainframers). The term "PC" was coined when IBM came out with their version of a micro in '81 or '82, well after VisiCalc first appeared.

                            Again, the above is from a (fastly) (vastly?) fading memory.

                            =========================================
                            A good conversationalist
                            is not one who remembers what was said,
                            but says what someone wants to remember.
                            John Mason Brown
                            =========================================
                            It's a pretty day. I hope you enjoy it.

                            Gösta

                            JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
                            LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/


                            • #15
                              If you are comparing the years, then the standard deviation and skewness might give some indication of what you are trying to determine.
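A minimal sketch of a per-year comparison. It assumes (year, amount) records and uses the population standard deviation together with the moment-based skewness (third central moment divided by the second central moment to the power 1.5); other skewness definitions exist, and the function names are made up.

```python
import statistics
from collections import defaultdict

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0.0 if there is no spread)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0

def yearly_spread(records):
    """Map each year to (population std dev, skewness) of its amounts."""
    by_year = defaultdict(list)
    for year, amount in records:
        by_year[year].append(amount)
    return {y: (statistics.pstdev(v), skewness(v)) for y, v in by_year.items()}

# Hypothetical records: 2010 is symmetric, 2011 has one large value.
records = [(2010, 1), (2010, 2), (2010, 3),
           (2011, 1), (2011, 1), (2011, 1), (2011, 10)]
for year, (sd, skew) in sorted(yearly_spread(records).items()):
    print(year, sd, skew)
```

A strongly positive skewness is what you would expect from data with a long tail of large amounts, so the year-to-year change in it may be more telling than the value itself.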


                              • #16
                                Originally posted by Gösta H. Lovgren-2
                                Hmmm .... My recollection is different. I thought it was written by two guys on an Apple (maybe Apple used CPM, I don't know), at least that's where I first saw it. I always thought CPM was only used on the 8086(?) chipset.
                                CP/M first saw the light on the Intel 8080 family, and then on the Zilog Z80. Those were of course 8-bit CPUs.
                                Since the Apple ][ ran on a 6502 (and the like), using CP/M on it meant installing additional hardware (such as a Z80 on a card).

                                16-bit versions, like CP/M-86 for the x86 / PC, came only later.

                                Bye!
                                -- The universe tends toward maximum irony. Don't push it.

                                File Extension Seeker - Metasearch engine for file extensions / file types
                                Online TrID file identifier | TrIDLib - Identify thousands of file formats


                                • #17
                                  You don't need a memory with the internet....

                                  Visicalc's author's site has lots of stuff. (and is an extremely well-designed site if I do say so myself)

                                  http://www.bricklin.com/visicalc.htm

                                  Appears it was for....
                                  VisiCalc was coded in assembler, first for the MOS Technology 6502 microprocessor used in the Apple ][.
                                  A scanned image of the first advertisement also clearly (well, maybe not all that clearly) says, "A visible calculator for the Apple II"

                                  My bad.
                                  Michael Mattias
                                  Tal Systems (retired)
                                  Port Washington WI USA
                                  [email protected]
                                  http://www.talsystems.com


                                  • #18
                                    No one number will sum everything up, but it sounds like you just need a histogram: the number of values in each range.
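A small sketch of such a histogram, using unequal ranges so that the value 50 gets a bin of its own. The bin edges, the function name, and the sample values are made up for illustration; each bin covers edges[i] up to, but not including, edges[i + 1].

```python
import bisect

# Hypothetical bin edges: 1-9, 10-49, exactly 50, 51-99, 100-999,
# 1,000-9,999, and 10,000-100,000.
EDGES = [1, 10, 50, 51, 100, 1000, 10000, 100001]

def histogram(values, edges):
    """Count how many values fall in each half-open bin [lo, hi)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        if 0 <= i < len(counts):   # values outside the edges are skipped
            counts[i] += 1
    return counts

sample = [1, 9, 10, 50, 50, 75, 100, 20000]
for lo, hi, count in zip(EDGES, EDGES[1:], histogram(sample, EDGES)):
    print(f"{lo:>6} to {hi - 1:>6}: {count}")
```

Running this once per year and laying the rows side by side gives a table of counts per range per year, which can then be compared directly.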


                                    • #19
                                      If you want to do very powerful statistical programming, check the statistical programming language R (www.r-project.org). It has a steep learning curve, but is quite powerful once you get the hang of it. Just a thought.
                                      RTM


                                      • #20
                                        Originally posted by Michael Mattias
                                        " Personal Computer and PC are (registered?) trademarks of International Business Machines Corporation Armonk NY USA. "
                                        I wonder when they registered "Personal Computer". Dr. Dobb's has references to "Personal Computing" in 1976 and "Personal Computer" in 1977.
