Intelligently determining a date? Suggestions?

  • Intelligently determining a date? Suggestions?

    Hello. Suppose in field ObitTxt$ I have something like:

    John William Doe, of 2004 North 11 Street, died yesterday June 22, 2004. He was a member of the Walnut Baptist Church. July 8th 1967, he was married to Kathy Wilson. (and so on.)

    I see several ways to have dates. They can be in format of:
    MMM Dd, YYYY
    MM-DD-YYYY
    YYYY-MMM-DD
    YYYY-MM-DD
    and so on.

    Does anyone know of a tested method for extracting the date no matter the format? My first thought was to simply look for a 4-digit number and extract that as the date, but that gets foiled when the obit includes a home address. Plus, when more than one date is in the obit, it could be confusing.

    Any ideas? I am hoping someone else has run into this one before.

    Thanks.

    Robert

    ------------------

  • #2
    I don't think something like that is feasible. I'd choose a standard
    date format (July 4, 1776) and stick with it, especially for an obit.

    Or are you trying to read in data from another source?


    ------------------
    There are no atheists in a fox hole or the morning of a math test.
    If my flag offends you, I'll help you pack.



    • #3
      No. This is all for obits emailed to me that *either* I or another history researcher suspects are related to my family. I have 100s now, and thought, well, I will just export them all as text, and let this program auto-extract the dates, names, places, etc., and write them to a CSV so my genealogy program will import it. This may sound easier than it actually is. I will try your suggestion for a while. Thank you.

      Robert

      ------------------



      • #4
        This is a quite common task. The only solution for it is regular expressions, something that PowerBASIC has no ready facilities for.

        Of course there's also the issue of pulling out the dates in a meaningful way (is it a death date, a birth date, or what?) which is the realm of context analysis and artificial intelligence.
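As a rough illustration of the regular-expression approach, here is a minimal sketch in Python (not PowerBASIC) covering the four formats listed in the first post. The names `MONTH`, `DATE_PATTERNS`, and `find_date_strings` are made up for this example, and the patterns only cover English month names and abbreviations.

```python
import re

# Month name or abbreviation, e.g. "Jun", "June", "Sept."
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"

DATE_PATTERNS = [
    # MMM DD, YYYY   e.g. "June 22, 2004" or "July 8th 1967"
    re.compile(rf"\b{MONTH}\s+\d{{1,2}}(?:st|nd|rd|th)?,?\s+\d{{4}}\b"),
    # MM-DD-YYYY     e.g. "06-22-2004"
    re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b"),
    # YYYY-MMM-DD    e.g. "2004-Jun-22"
    re.compile(rf"\b\d{{4}}-{MONTH}-\d{{1,2}}\b"),
    # YYYY-MM-DD     e.g. "2004-06-22"
    re.compile(r"\b\d{4}-\d{1,2}-\d{1,2}\b"),
]

def find_date_strings(text):
    """Return (offset, matched_text) pairs in order of appearance."""
    hits = []
    for pattern in DATE_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), match.group(0)))
    return sorted(hits)
```

Because each pattern requires a month or a second and third number next to the year, the "2004" in the street address is not picked up. Deciding what each date means is still a separate problem.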


        ------------------


        [This message has been edited by Michael Torrie (edited June 11, 2006).]



        • #5
          -Emphasis-
          Of course there's also the issue of pulling out the dates in a meaningful way (is it a death date, a birth date, or what?) which is the realm of context analysis and artificial intelligence.
          -End Emphasis-

          I agree. I would -hope- that given two dates, the earlier one could be counted on to be the date of birth. This does not happen often, but it would 'confuse' the program if the newspaper (or funeral home submitter) made a mistake in the dates and said someone was born June 18, 1836 and died May 25, 2005. I haven't decided how to handle that one. I may have no choice but to require human intervention. I -could- question the dates when the age is > X (say, greater than 106) and ask for human help.

          When there are just two dates (birth & death), it is usually safe to assume the earlier one is the birth. I have on my desk an obit that has 27 dates- and I am still wrestling with how to handle that one! So far, in five tests, my PB routine correctly pulls out the birth date and death date in 7 out of 9 tests. Not bad, I might have to work on improving this all the time.
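A minimal sketch of that rule in Python, assuming the dates have already been extracted. The function name `birth_and_death` and the constant `MAX_PLAUSIBLE_AGE` are invented for this example; the 106-year threshold is the one floated above. Obits with more than two dates are flagged for review rather than guessed at.

```python
from datetime import date

MAX_PLAUSIBLE_AGE = 106  # threshold taken from the post above; adjust as needed

def birth_and_death(dates):
    """Guess (birth, death, needs_review) from the dates found in one obit."""
    if len(dates) < 2:
        return None, None, True
    birth, death = min(dates), max(dates)
    # Age in whole years: subtract one if the birthday had not yet come round.
    age = death.year - birth.year - ((death.month, death.day) < (birth.month, birth.day))
    # More than two dates, or an implausible age, means a human should look.
    needs_review = len(dates) > 2 or age > MAX_PLAUSIBLE_AGE
    return birth, death, needs_review
```

With the mistaken dates mentioned above (born June 18, 1836, died May 25, 2005) the computed age is 168, so the result is flagged for review.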

          Thanks, everyone!

          Robert

          ------------------


          [This message has been edited by Robert E. Carneal (edited June 11, 2006).]



          • #6
            Michael,
            My initial approach would be something like this:

            Search for the months, there are only 12 of them so that should be easy (although you maybe ought to accept abbreviations and common spelling mistakes too).
            Take 2 "words" either side of the month (i.e. space/end-of-line delimited blocks of characters)


            "John William Doe, of 2004 North 11 Street, died yesterday June 22, 2004. He was a member of the Walnut Baptist Church. July 8th 1967, he was married to Kathy Wilson. (and so on.)"

            would give:
            died yesterday June 22, 2004.
            and
            Baptist Church. July 8th 1967,


            Strip non-numbers from the words, to give:
            <blank><blank> July 8 1967
            and
            <blank><blank> June 22 2004

            Check the values of those words.

            Ignore blanks. Values > 1000 will be years; values < 32 will be the day of the month. If neither number is > 1000, then the last will be the last 2 digits of the year in the current century.


            Let's try a couple of other examples:
            23rd April, 1564 (Shakespeare's birthday)

            becomes <blank> 23 April 1564 <blank>
            1564 is > 1000 so it's the year, 23 is the day in April.

            Today: 12th June '06
            becomes <blank> 12 June 06 <blank>
            neither value is >1000 so the last is the year of this century, 2000 + 6 = 2006, and the other is the day of the month.

            Of course this will not do too well if you look at Shakespeare's birth certificate, which will probably say
            "Born, on the twenty third day of April in the year of our Lord, one thousand five hundred and sixty four", but for that sort of format it's probably better to refer to a human.
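The procedure described in this post can be sketched roughly as follows, in Python rather than PowerBASIC. The function names and the `century` parameter are assumptions of this example. Abbreviations are accepted as any prefix of three or more letters, and spelling mistakes are not handled.

```python
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

def month_number(word):
    """Return 1-12 for a month name or a 3+ letter abbreviation, else None."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    if len(letters) < 3:
        return None
    for number, name in enumerate(MONTHS, start=1):
        if name.startswith(letters):
            return number
    return None

def extract_dates(text, century=2000):
    """Find dates by locating month words and inspecting two words either side."""
    words = text.split()
    found = []
    for i, word in enumerate(words):
        month = month_number(word)
        if month is None:
            continue
        # Two "words" either side of the month, stripped down to their digits.
        window = words[max(0, i - 2):i] + words[i + 1:i + 3]
        digits = [re.sub(r"\D", "", w) for w in window]
        numbers = [int(d) for d in digits if d]  # ignore blanks
        year = next((n for n in numbers if n > 1000), None)
        rest = [n for n in numbers if n <= 1000]
        if year is None and len(rest) >= 2:
            year = century + rest.pop()  # last number is a 2-digit year
        day = next((n for n in rest if 0 < n < 32), None)
        if year is None or day is None:
            continue
        try:
            found.append(date(year, month, day))
        except ValueError:
            pass  # impossible date such as June 31: leave it for a human
    return found
```

One known weakness of this sketch: the ordinary word "may" is treated as the month, so "he may 12 ..." style sentences can produce false hits.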

            Paul.

            ------------------



            • #7
              Not really an answer but..

              If you are going to bother writing an obituary at all, maybe a little 'human intervention' is not such a bad thing.

              Michael Mattias
              Tal Systems Inc. (retired)
              Racine WI USA
              http://www.talsystems.com



              • #8
                Michael -

                I'm sorry to be the bearer of bad tidings, but at some level of generality you're going to run into a real problem.

                Americans do dates as MM/DD/YY. Europeans do it as DD/MM/YY.

                Best of luck.
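To make the ambiguity concrete, here is a small Python sketch that returns every valid reading of a slash-separated numeric date. The function name `interpret_numeric_date`, the reading labels, and the default `century` for two-digit years are all assumptions of this example.

```python
from datetime import date

def interpret_numeric_date(text, century=1900):
    """Return each valid reading of an 'a/b/yy' or 'a/b/yyyy' string."""
    a, b, y = (int(part) for part in text.split("/"))
    year = y if y >= 100 else century + y  # assumed century for 2-digit years
    readings = {}
    for label, month, day in (("US (MM/DD)", a, b), ("European (DD/MM)", b, a)):
        try:
            readings[label] = date(year, month, day)
        except ValueError:
            pass  # e.g. there is no month 22, so this reading is ruled out
    return readings
```

A string like 06/22/2004 has only one valid reading, but 03/04/2005 has two, and nothing in the string itself says which one was meant.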

                ------------------



                • #9
                  Originally posted by Jim Martin:
                  Americans do dates as MM/DD/YY. Europeans do it as DD/MM/YY.
                  And that's not all! Relative dates were much used in commerce and in legal documents.
                  At school I was taught the abbreviations inst, ult, and prox for this month, last month, and next month, and title deeds (in UK anyway) will contain dates expressed as "the twelfth day of July in the reign of his late Majesty .." and so on.

                  Welcome to the world of data cleaning.

                  Good luck indeed!



                  ------------------



                  • #10
                    That goes to show I had not thought this completely through on International terms! Well, I could argue that since it is going to be used on American newspaper obituaries, that does not apply. Actually, my 'target' usage is newspapers in this country from 1938 to present.

                    I am finding all the dates in the obit now (at least in my test obits), but where I am messing up are funeral home data dates where the obit says something like:

                    Monday, October 13, 2003 the visiting hours are 5:00pm to 8:00pm. October 14, 2003 the visiting hours are 2:00pm till 8:00pm. October 14, 2003 8:15PM will be prayers. (And so on.)
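One possible heuristic for setting such dates aside, sketched in Python: look at the text just after each date for a time of day or a service-related word. This is only an assumption about what might work, not something tested on real obits. The word list, the 60-character reach, and the function name `looks_like_service_date` are all invented for the example.

```python
import re

# Hypothetical keyword list; extend it as new obits turn up new wording.
SERVICE_WORDS = ("visiting", "visitation", "hours", "prayers",
                 "service", "funeral", "burial")
TIME_OF_DAY = re.compile(r"\b\d{1,2}:\d{2}\s*[ap]m\b", re.IGNORECASE)

def looks_like_service_date(text, end, reach=60):
    """True if the text right after a date (ending at offset `end`)
    mentions a time of day or a service-related word."""
    tail = text[end:end + reach].lower()
    if TIME_OF_DAY.search(tail):
        return True
    return any(word in tail for word in SERVICE_WORDS)
```

Dates flagged this way could be skipped when looking for the birth and death dates, or saved separately as funeral-home details.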

                    I might have to write this for just a few obits, and gradually add more to get the rest of the data as I come to new data. That might be easier.

                    Gosh, when I did the flowchart on pencil & paper, this looked like a simple project!! I am going to scale it way down!

                    Thank you everyone.

                    Robert

                    ------------------


                    [This message has been edited by Robert E. Carneal (edited June 12, 2006).]



                    • #11
                      Bob,

                      Welcome to the wonderful world of user data input. What you're
                      attempting to do is close enough to impossible that whatever you
                      can automate takes more code (and more chances for bugs) than it
                      saves, so just doing it manually is preferred.

                      The only solution is to regulate the incoming data. If you're
                      getting the data from a hand written (typed) form then put some
                      fields on the form and you may get about 90% of the incoming data
                      correctly. But even if you write an input routine that takes the
                      data in 3 separate fields you're still going to get some bad input
                      as there are 12 months in the year and all months have more than
                      12 days and the year is now '06 so unless you force a 4 digit year
                      you'll have 3 fields that can, in a lot of cases, be interchangeable
                      with any of the other 2.

                      Then again, you could require that only intelligent, computer-savvy
                      individuals fill out the data but, even then, you'll only get about
                      90% accuracy.

                      Ain't programming fun??


                      ------------------
                      C'ya
                      Don
                      don at DASoftVSS dot com
                      http://www.DASoftVSS.com

                      http://www.ImagesBy.me



                      • #12
                        Originally posted by Don Schullian:
                        Then again, you could require that only intelligent, computer-savvy
                        individuals fill out the data but, even then, you'll only get about
                        90% accuracy.

                        Ain't programming fun??
                        If it weren't fun, I wouldn't do it. The only person using this program would be me.

                        I think I am going to give up on trying to go the Automation route and write a form that will ask me a series of questions that I can fill in. And somewhat check the validity of my typing, so that it won't accept June 45 1953 as a date. (Yes, I do have that date for a death for an ancestor!) I could write a form to fill in the fields and it would (I hope) remind me to enter everything. Personally, for typing speed, I like to enter a date such as today (June 13, 2006) as 06132006, and let the program properly format it. And also check the spelling of locations in America, so I don't enter Gun Barrel, Tennessee when I should have entered Gun Barrel, Texas. That part is not hard to do, just compare the location to a list of properly spelled locations and if found, accept it.
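The two checks described here might look something like this in Python. The names `parse_typed_date`, `is_known_place`, and `KNOWN_PLACES` are made up for the sketch, and the place list holds a single sample entry rather than a real gazetteer.

```python
from datetime import date

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")

# Sample entry only; a real list would be loaded from a file.
KNOWN_PLACES = {"Gun Barrel, Texas"}

def parse_typed_date(digits):
    """Turn eight typed digits (MMDDYYYY) into 'Month D, YYYY', or None if invalid."""
    if len(digits) != 8 or not digits.isdigit():
        return None
    month, day, year = int(digits[:2]), int(digits[2:4]), int(digits[4:])
    try:
        checked = date(year, month, day)  # raises ValueError for June 45
    except ValueError:
        return None
    return f"{MONTH_NAMES[checked.month - 1]} {checked.day}, {checked.year}"

def is_known_place(place):
    """Case-insensitive lookup against the list of accepted locations."""
    return place.strip().lower() in {p.lower() for p in KNOWN_PLACES}
```

Typing 06132006 gives "June 13, 2006", while 06451953 is rejected. A date the source really does print as June 45 would need a free-text note instead.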

                        Well, thank you everyone. I now plan to give up on this project and attack from a different direction, namely a program that asks me all the questions in an effort to save the data I want to save.

                        Thanks.

                        Robert

                        [This message has been edited by Robert E. Carneal (edited June 13, 2006).]



                        • #13
                          What about date-to-integer conversion? For instance: Gregorian Day Numbers.
                          See: http://www.egbertzijlema.nl/programming.html
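For a feel of what date-to-integer conversion buys you, here is a short Python sketch using the standard library's day ordinals (January 1 of year 1 is day 1). This is Python's own numbering and may not match the scheme on the linked page. The helper name `days_between` is invented for the example.

```python
from datetime import date

def days_between(earlier, later):
    """Difference in days, computed from integer day numbers."""
    return later.toordinal() - earlier.toordinal()

# Once dates are integers, comparing and sorting them is plain arithmetic.
death = date(2004, 6, 22)
marriage = date(1967, 7, 8)
in_order = sorted([death, marriage], key=date.toordinal)

# The conversion is reversible.
round_trip = date.fromordinal(death.toordinal())
```

This helps with comparing and validating dates once they have been parsed. It does not help with finding them in free text.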

                          ------------------
                          Egbert Zijlema, journalist and programmer (egbert at egbertzijlema dot nl)
                          http://www.egbertzijlema.nl/programming.html
                          *** Opinions expressed here are not necessarily untrue ***

                          Egbert Zijlema, journalist and programmer (zijlema at basicguru dot eu)
                          http://zijlema.basicguru.eu
                          *** Opinions expressed here are not necessarily untrue ***
