Financial time-series data - Advice sought

  • Financial time-series data - Advice sought

    I scratch a living trading stock index futures on my own account. Over the years I have tried just about every piece of serious software/system available to the small professional retail trader. All have issues which render them less than optimally useful for the sort of trading that I undertake. My biggest and perennially ongoing issues concern the collection, cleaning, storage and post-collection processing/manipulation of large 'raw tick-data' files. I currently collect data from 2 separate sources and have gigabytes of the stuff. My problem is getting it into various forms that can be used by several of the more useful power/AI/GA analysis packages that I use.

    Briefly, it is a doddle to get standard timed Open/High/Low/Close/Volume/Trades data; most 'trading software' processes real-time raw data into that form as standard. What is NOT such a doddle is to construct Volume/#Trades/Range bars which include both #trades and volume at Bid and Ask and correlated values for the content of the order book either side of each last traded price during the construction of said bars. I have finally decided that the only way I am going to get what I want from this data is to program it myself.
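
    To make the idea of a volume bar concrete, here is a minimal PowerBASIC sketch of the accumulation step. Everything specific in it is an assumption for illustration: the file names, the four-field record layout (timestamp, price, size, side), the absence of a header row, the 1000-contract bar size and the two-decimal price format. It ignores the order-book values entirely, does not split a trade that overshoots the bar size, and drops any partial bar left at the end of the file.

    ```powerbasic
    #COMPILE EXE
    #DIM ALL

    ' ASSUMED input layout (hypothetical, not an actual feed format):
    '   timestamp,price,size,side    e.g.  2009-03-02 14:30:01,735.25,3,A
    '   side is "B" (traded at the bid) or "A" (traded at the ask)
    %BAR_VOLUME = 1000                  ' assumed bar size, in contracts

    FUNCTION PBMAIN () AS LONG
        LOCAL hIn, hOut AS LONG
        LOCAL sLine, sStamp, sSide, sOut AS STRING
        LOCAL dPrice, dOpen, dHigh, dLow AS DOUBLE
        LOCAL nSize, nVol, nBidVol, nAskVol, nTrades AS LONG

        hIn = FREEFILE
        OPEN "ticks.csv" FOR INPUT AS #hIn          ' hypothetical file names
        hOut = FREEFILE
        OPEN "volbars.csv" FOR OUTPUT AS #hOut

        DO UNTIL EOF(hIn)
            LINE INPUT #hIn, sLine
            IF PARSECOUNT(sLine, ",") >= 4 THEN     ' ignore short or blank lines
                sStamp = PARSE$(sLine, ",", 1)
                dPrice = VAL(PARSE$(sLine, ",", 2))
                nSize  = VAL(PARSE$(sLine, ",", 3))
                sSide  = UCASE$(TRIM$(PARSE$(sLine, ",", 4)))

                IF nTrades = 0 THEN                 ' first tick of a new bar
                    dOpen = dPrice
                    dHigh = dPrice
                    dLow  = dPrice
                END IF
                IF dPrice > dHigh THEN dHigh = dPrice
                IF dPrice < dLow  THEN dLow  = dPrice

                INCR nTrades
                nVol = nVol + nSize
                IF sSide = "B" THEN nBidVol = nBidVol + nSize
                IF sSide = "A" THEN nAskVol = nAskVol + nSize

                IF nVol >= %BAR_VOLUME THEN         ' bar complete: write it, then reset
                    ' Output: close time, open, high, low, close, volume, trades, bid vol, ask vol
                    sOut = sStamp & "," & FORMAT$(dOpen, "0.00") & "," & FORMAT$(dHigh, "0.00")
                    sOut = sOut & "," & FORMAT$(dLow, "0.00") & "," & FORMAT$(dPrice, "0.00")
                    sOut = sOut & "," & FORMAT$(nVol) & "," & FORMAT$(nTrades)
                    sOut = sOut & "," & FORMAT$(nBidVol) & "," & FORMAT$(nAskVol)
                    PRINT #hOut, sOut
                    nVol = 0 : nBidVol = 0 : nAskVol = 0 : nTrades = 0
                END IF
            END IF
        LOOP

        CLOSE #hIn
        CLOSE #hOut
    END FUNCTION
    ```

    A #Trades bar or a Range bar would use the same loop with a different closing test (nTrades reaching a count, or dHigh - dLow reaching a range).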

    So what are we talking about? Well it sounds simple enough (doesn't it always?). And, to the extent that the bulk of what I want to do is simply a matter of parsing flat text files, it probably is to an experienced programmer. Problem is my programming experience is limited to PAL (Paradox Application Language) - for DOS!! which, although extensive, ended over 15 years ago. I have just about settled on PowerBASIC because it is used by the authors of some of my more useful analysis packages to program various add-ons and indicators.

    If any of this rings bells with anyone I'd be grateful for observations, questions and/or advice to a rank beginner before taking the plunge.

    Peter Presland

  • #2
    You describe a quasi-classic case for engaging an outside contractor: you know what data you can get, from where; you know how you want to retrieve and display it; you don't know how to get from A to B.

    There are lots of contractors who can go the other way.... they can manipulate data nine ways from Sunday and display it technicolor wide-screen 3-D (with or without special eyeglasses) and make it sing and dance... but they have no idea what data are useful.

    That is, you have the "what" and can hire the "how."

    Disclaimer: I make a living doing this kind of thing.

    (Not this particular type of thing; I would be neither interested nor the correct choice for this application).
    Michael Mattias
    Tal Systems (retired)
    Port Washington WI USA
    http://www.talsystems.com


    • #3
      Thanks for the reply Michael.

      I guess I could spec out a project (3 urgent ones actually) and farm it out. Problem is experience tells me that producing a watertight, unambiguous spec and monitoring the project through to completion is itself a time-consuming process. Writing this reminds me of a three-picture cartoon series from years ago (probably still around somewhere): What the programmer produced; what the analyst specified; what the customer wanted - or thought he had ordered.

      Thing is my work-a-day routine involves hours of boredom awaiting various alarms and triggers. I generally fill it with historical analysis/backtesting and extensive current affairs reading; the former compromised by the problems already outlined; the latter - well - I do too much of it. So I figured I'd finally have a crack at something more useful. Also, in my experience, projects such as described are never-ending and the ability to tweak things oneself has a lot to commend it.

      I realise there was a lot of gobbledegook in my opening post but if there is anyone into trading on the forum it likely won't be gobbledegook. So I thought I'd post away and see what happens.

      Peter Presland


      • #4
        Parsing Data

        Hi Peter;

        If your primary need is to parse the data you receive so that it can be used by one or more of your existing financial programs, PowerBASIC provides excellent parsing functions. The example below, from the PowerBASIC User's Manual, shows how delimited data can be easily parsed into a data array and then sorted:

        Example
        a$ = "Trevor, Bob, Bruce, Dan, Simon, Jenny"

        DIM b$(1 TO PARSECOUNT(a$))
        PARSE a$, b$()
        ARRAY SORT b$()

        Result
        b$(1) = "Bob"
        b$(2) = "Bruce"
        b$(3) = "Dan"
        b$(4) = "Jenny"
        b$(5) = "Simon"
        b$(6) = "Trevor"

        If your need is for an integrated program that "does it all", then Michael's suggestion to hire a contractor to write the program is probably your best bet.

        FWIW I've found PowerBASIC to be:
        1. Easy to learn
        2. Powerful
        3. 99.44% "Bug" free


        • #5
          Given the starting point "scratching a living from..." and adding "outside contractor paid to do..." could equal "clinging by fingernails to life at edge of abyss."

          The data manipulation/reformatting aspect would definitely seem to be doable in short order--I'm thinking the Console Compiler for it. Graphs and bar charts? Well that too can be done by the industrious person.

          Were problems to arise, a quick post here would likely bring a bunch of useful replies, maybe within minutes.


          • #6
            You didn't give a time line for when you want this up and running.
            If you want it asap, the PBCC is the way to go. It's easier to learn to get started and can do the job.
            Expect to spend a large amount of time learning what to do at first, the learning curve being what it is. At least you don't have to learn the financial terms.
            Having a project of this nature will make learning easier because of your understanding of the markets and what you want to accomplish, and you won't get sidetracked as often.
            PBWin will also do the job for you, slightly longer learning curve though.
            Whichever compiler you buy, don't do what I did: I didn't read the manual first, from cover to cover. Once I read the manual, things seemed to fall into place a lot better.
            Good luck.
            Rod
            I want not 'not', not Knot, not Knott, not Nott, not knot, not naught, not nought, but aught.


            • #7
              Given the starting point "scratching a living from..." and adding "outside contractor paid to do..." could equal "clinging by fingernails to life at edge of abyss."
              Your glass is half-empty, sir.

              One might just as well describe the current situation as "making do whilst laying the foundation for a major entrepreneurial success."

              MCM
              Michael Mattias
              Tal Systems (retired)
              Port Washington WI USA
              http://www.talsystems.com


              • #8
                Staring into the abyss is what inspires great minds

                (although I would not recommend staring at the sun either, but hey it could spark an idea under the right circumstances)
                Engineer's Motto: If it aint broke take it apart and fix it

                "If at 1st you don't succeed... call it version 1.0"

                "Half of Programming is coding"....."The other 90% is DEBUGGING"

                "Document my code????" .... "WHYYY??? do you think they call it CODE? "


                • #9
                  Peter,

                  I played in the day trading S&P futures sandbox for several years (back when it was in the 200's), doing my own programming and charting (not PB though). Gave it up primarily because of significant losses (what else?).

                  Either PB compiler would do an excellent job: PBCC (think DOS screens without any graphics) or PBWin (which will produce charts as sophisticated as you can imagine). If you want charting though, you'll have to go with PBWin, but the learning curve for the graphics required is probably going to be pretty steep initially.

                  If you just want to compare data sets and stuff like that, then you should have little trouble picking it up, as PB has some very fine tools for reading data files, and they are very, very fast too.

                  I would suggest PBWin as it has all the power of PBCC with the potential of getting into charting (graphics) as you become comfortable with handling the data sets (not really difficult at all, piece of cake really with PB).

                  I wouldn't mind getting back playing in the futures water but haven't run across a real time data feed that I could access with PB. If I had had PBWin back when, I just might have pulled ahead a little. {grin}

                  =============================
                  "My people are destroyed
                  for lack of knowledge.....!"
                  Hosea 4:6
                  =============================
                  It's a pretty day. I hope you enjoy it.

                  Gösta

                  JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
                  LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/


                  • #10
                    I have done this sort of thing with an Excel Add-in.

                    The Add-in gets the data, and I use Excel to do the number and chart stuff.

                    I would be interested in doing this in PBWin.

                    Blair,


                    • #11
                      Thanks to all

                      Thanks to all for replies on this - very useful.

                      A few things:

                      I am not interested in charting or any fancy all-singing/dancing system. All I need is a user interface to set up and initiate jobs. Each job to involve specifying, checking and reading an input file and performing per-record accumulations/calculations etc., whilst writing new records to an output file. Each output file is likely to contain between 1-30% of the number of input records. In other words, each output record is some kind of aggregation of multiple input file records. I also want to do things like writing a sub-set of already processed files containing records between specified times which sounds straightforward enough but involves all the hassle of querying and manipulating a single string containing both date and time (ie a simple >= or <= on the entire string is no use because it selects primarily by date).
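
                      The time-window selection can be sketched by comparing only the time-of-day part of the stamp rather than the whole string. This is a minimal sketch under stated assumptions: the first comma-separated field is a fixed-width stamp of the form "YYYY-MM-DD HH:MM:SS" (so the time always sits at characters 12 to 19), and the file names and window times are made up for illustration. A different stamp layout would need different MID$ positions.

                      ```powerbasic
                      #COMPILE EXE
                      #DIM ALL

                      ' ASSUMED layout (hypothetical): field 1 is "YYYY-MM-DD HH:MM:SS",
                      ' so MID$(stamp, 12, 8) is the zero-padded time of day.

                      FUNCTION PBMAIN () AS LONG
                          LOCAL hIn, hOut AS LONG
                          LOCAL sLine, sTime, sFrom, sTo AS STRING

                          sFrom = "14:30:00"          ' window start, inclusive (example value)
                          sTo   = "21:00:00"          ' window end, inclusive (example value)

                          hIn = FREEFILE
                          OPEN "bars.csv" FOR INPUT AS #hIn               ' hypothetical file names
                          hOut = FREEFILE
                          OPEN "bars_session.csv" FOR OUTPUT AS #hOut

                          DO UNTIL EOF(hIn)
                              LINE INPUT #hIn, sLine
                              sTime = MID$(PARSE$(sLine, ",", 1), 12, 8)
                              ' Zero-padded HH:MM:SS strings sort in clock order,
                              ' so a plain string comparison selects by time on every date.
                              IF (sTime >= sFrom) AND (sTime <= sTo) THEN
                                  PRINT #hOut, sLine
                              END IF
                          LOOP

                          CLOSE #hIn
                          CLOSE #hOut
                      END FUNCTION
                      ```

                      The same comparison restricted to LEFT$(stamp, 10) would select by date instead, and the two tests can be combined when both are needed.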

                      Gosta - commiserations on the losses. Believe me, I know EXACTLY how it feels. When I first started it took me 4 years to hit break-even from astronomical first year losses which came close to forcing an exit for me too. With hindsight I now see it as the cost of my apprenticeship. In fact I doubt anyone gets to make a living at it (that's own-account retail trading) without going through a similar, near soul-destroying experience.

                      Peter Presland


                      • #12
                        Originally posted by Peter Presland
                        ...
                        Gosta - commiserations on the losses. Believe me, I know EXACTLY how it feels. When I first started it took me 4 years to hit break-even from astronomical first year losses which came close to forcing an exit for me too. With hindsight I now see it as the cost of my apprenticeship. In fact I doubt anyone gets to make a living at it (that's own-account retail trading) without going through a similar, near soul-destroying experience.
                        The biggest drawback when I was intra day trading futures (15-20 years ago) was human fills. It might take 10 minutes to get confirmation of a fill, during which time the market may have moved significantly. And a very real problem with hf's is the trader is handling multiple clients at the same time and may report back to the office a range of fills for a bulk order.

                        The office then assigned the fills to clients (among whom the office itself may be included). Preferred clients got better fills. (Exactly the mechanism used to bribe (excuse me, "influence") the Clintons when he was governor of Arkansas. Hillary had a trading account with a firm that did business with the state. The account was managed for her and at the end of the day (when orders were sorted out), she got assigned the best fills, so profits were guaranteed.)

                        With intra day trading (where positions may only be held a matter of minutes) rapid fill prices are critical. In today's environment where orders can be submitted and filled electronically, I think small quick profits could be chipped away pretty consistently but access to real time pricing and immediate fills would be critical.
                        It's a pretty day. I hope you enjoy it.

                        Gösta

                        JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
                        LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/


                        • #13
                          Originally posted by Gösta H. Lovgren-2
                          With intra day trading (where positions may only be held a matter of minutes) rapid fill prices are critical. In today's environment where orders can be submitted and filled electronically, I think small quick profits could be chipped away pretty consistently but access to real time pricing and immediate fills would be critical.
                          Gosta

                          That pretty much describes my modus operandi. I had two trades today where my holding time was precisely one second! Average ping time to my executing broker is about 100ms. If I pre-park orders outside the current market and keep them out, I can get an execute-to-fill acknowledgement in about 180ms when I hit the 'fill-at current market' button.

                          The biggest problem with this type of trading is a variation of the Heisenberg uncertainty principle - multiplied by some enormous factor. Why? Because, even if you are trading just one contract, what you do can and most certainly does affect the behaviour of others (butterfly wings in India/storms in the Caribbean and all that). Depending upon overall liquidity (itself a function of traded symbol and time of day), the big exchange-member market makers can see exactly what you are doing if you place target and stop orders following a fill - and they react (i.e. do their level best to screw you) accordingly. Depending upon their current OI/risk exposure, they generally have pockets deep enough to move the market sufficiently to take out close, newly-placed stops and back again, no sweat.

                          My solution is to keep my real stops parked on my execution platform and only send them when the price is hit. Problem is, that substantially more than doubles execution time, such that price can run past a simple limit order and wipe your day out. Also you can lose your connection, which can be disastrous. Together with the spread and execution fees, you just have to recognise that your average trade loss will be maybe double your average win (i.e. you need to have a 70%+ hit rate just to keep your head above water) and do your homework accordingly.
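
                          As a rough check on that hit-rate figure, assume (as a simplification of "maybe double") that every win is exactly $W$ and every loss exactly $2W$. The expected profit per trade at hit rate $p$ is then:

                          ```latex
                          \[
                          E = pW - (1-p)\,2W = W(3p - 2),
                          \qquad
                          E \ge 0 \iff p \ge \tfrac{2}{3} \approx 66.7\%
                          \]
                          ```

                          So under that assumption break-even sits at about 67%, and a 70% hit rate leaves only $E = 0.1W$ per trade.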

                          Anyway, I've bought the windows compiler and will no doubt be posting all kinds of ridiculous requests for help in the near future.

                          Also probably best to move any further discussion about trading per se to the cafe eh?

                          Peter Presland


                          • #14
                            Further to my original post and post 11 in this thread:

                            I've acquired the Windows Compiler and done some homework. My original intention was to read my data in its 'sequential file' .csv format, perform various calcs/manipulations on the resulting 2 dimensional array and write the results using append to a 2nd sequential file; but! - BIG BIG problem:

                            My raw data is in 3 month chunks (the length of a typical 'front month' futures contract). One of the larger of the contracts I am interested in typically contains around 25 million records occupying about 1.2 GB in CSV format. I can import it into MS Access OK (takes nearly 30 minutes) and a file of about 2 GB results. Clearly it is impractical to expect to manipulate such large files entirely in memory when (as I understand it) there is a 2 GB PB limit anyway. It may (or may not?) be practical to write the output as a sequential file because it will typically have only about 2-5% of the number of records and be around 10-25% of the total size of the input file.

                            There are no relational type facilities needed for what I want to do (see post #11) but I am obviously going to have to use some form of flat file database engine so that the raw data can be stepped through from beginning to end without the need to hold the whole lot in memory.
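
                            One point that may be worth checking before choosing a database engine: a sequential read with LINE INPUT # keeps only the current record in a string variable, so stepping through a file from beginning to end does not by itself mean holding the whole file in memory. A minimal sketch, assuming a hypothetical file name and PBWin (for MSGBOX); it only counts records, and says nothing about how fast a 25 million record pass would be.

                            ```powerbasic
                            #COMPILE EXE
                            #DIM ALL

                            FUNCTION PBMAIN () AS LONG
                                LOCAL hIn AS LONG
                                LOCAL nRecords AS LONG
                                LOCAL sLine AS STRING

                                hIn = FREEFILE
                                OPEN "contract.csv" FOR INPUT AS #hIn   ' hypothetical file name

                                DO UNTIL EOF(hIn)
                                    LINE INPUT #hIn, sLine              ' sLine is overwritten on every pass
                                    INCR nRecords                       ' per-record work would go here
                                LOOP

                                CLOSE #hIn
                                MSGBOX FORMAT$(nRecords) & " records read"
                            END FUNCTION
                            ```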

                            Any suggestions as to a suitable piece of flat file database software to use?

                            Peter Presland


                            • #15
                              Originally posted by Peter Presland
                              Any suggestions as to a suitable piece of flat file database software to use?
                              Why flat-file when you could use SQLite for exactly the same cost (£0), and probably less complexity? I'm guessing about complexity because I don't use flat files any more. The result can be migrated to any SQL database. There are very good SQLite resources for PB.

                              If you have PB Windows, I suggest that you add PB Forms too; you will be able to save quite a lot of time if you are prototyping. I think it is worth its price just as a tutorial aid.
