Announcement

Collapse
No announcement yet.

Names and more Names

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Names and more Names

    With feedback from clients, I have concluded that when it comes to handling persons and organizations in databases, folks like to have a one-line free form column when editing names and they want this to display/print exactly as entered. This complicates searches a bit as it now becomes necessary to split apart those names into discrete fields/columns to be able to use things like SQL "Like" and to distinguish common prefix/suffix, connectors and double word names accurately while stripping out periods, comma's, extra white space, etc.

    Before I get into building some code to do this (which I'll post)...Has anybody any working code snippets?
    ------------------------------------------------------------
    sigpic

    It has come to my attention that certain dubious forces are interpolating their desires in my search for Mom, apple pie and the girl you left behind. Stop it or I'll scream...

  • #2
    It would have to be some snippet to do the job properly.

    Most people are used to being wrongly addressed, which is a comfort. However, those who are hardest to get right are the most vociferous and easily offended.
    I can recommend Debretts as a useful reference when deciphering UK titles and forms of address. Possibly irrelevant - your profile does not reveal geography.

    Comment


    • #3
      Originally posted by Rick Kelly View Post
      With feedback from clients, I have concluded that when it comes to handling persons and organizations in databases, folks like to have a one-line free form column when editing names and they want this to display/print exactly as entered.
      Is it possible to have and additional field/column(s) just for that purpose? ie
      Type Stuff
      Name as string * 25
      Address as string * 25
      ...
      Display_Personal as string * 100 '
      Display_Formal as string * 100 '
      End Type

      ===============================
      "A modest man
      with much to be modest about."
      Winston Churchill
      ===============================
      It's a pretty day. I hope you enjoy it.

      Gösta

      JWAM: (Quit Smoking): http://www.SwedesDock.com/smoking
      LDN - A Miracle Drug: http://www.SwedesDock.com/LDN/

      Comment


      • #4
        You can do that if you want, but it is disregarding the First Normal Form (1NF) in a relational model database: All attribute values are atomic.

        From C.J. Date, "An Introduction to Database Systems":
        Hence, the one-word answer to the question "Why do we insist on normalization?" is simplicity - simplicity, that is, in the basic data objects we have to deal with, which leads in turn to corresponding simplifications everywhere else in the system.
        Better, if possible, to educate the clients and record the data properly.

        One option is to have separate columns for Last Name, First Name, Prefix, Suffix, etc.; then a separate one for a free-flow name. This can be populated manually by the user, or by program action.

        It may not matter in Alaska, but in several Asian countries, the Surname (Last Name) is always written first, not last. Not having an atomic data structure makes it impossible to do anything meaningful with name data if there is a mix of naming structures within it; so there are very practical reasons to ensure the data is normalized.

        Comment


        • #5
          A 'freeform text' field is not really suitable for searching except with a 'like' or 'contains' operator, and that's aside from performance issues.

          Also not suitable for searching are variable-location and/or length substrings of single fields... but that is kind of redundant since that by definition means the data are not normalized.
          Michael Mattias
          Tal Systems (retired)
          Port Washington WI USA
          [email protected]
          http://www.talsystems.com

          Comment


          • #6
            Whether one provides discrete fields on a data entry form and builds a blended result or takes the blended result and breaks out and builds the discrete building blocks the ending results can be identical. The searches/indices, etc do not involve the blended, or in this case, the full display name of an individual or organization but rather the parts of a name that matter. Just as a soundex calculation can be done and stored for later use, so can the portions of a full name. It's the same basic process although in this case it's a lot more complicated.

            It was a simple question originally and I'm surprised at this offshoot of "unsolicited" feedback. I know database theory as well as any poster and am also not adverse to bend or even shatter it when it suits my purposes. Real world and academia are not the same.
            ------------------------------------------------------------
            sigpic

            It has come to my attention that certain dubious forces are interpolating their desires in my search for Mom, apple pie and the girl you left behind. Stop it or I'll scream...

            Comment


            • #7
              >and I'm surprised at this offshoot of "unsolicited" feedback

              You've only been a member for a year. You'll get used to it.
              Michael Mattias
              Tal Systems (retired)
              Port Washington WI USA
              [email protected]
              http://www.talsystems.com

              Comment


              • #8
                Rick,

                I would advise using a distinct field format when storing names belonging to individuals. You will definitely encounter some ambiguous free format names which can be parsed in a number of ways.

                Your proposed 'splitting' processing scheme would either need to identify the ambiguity and require some further input, or make a decision/guess about which is the correct interpretation (and accept that sometimes it will be the wrong decision).

                For example, consider the name 'Mr Van Anson'.

                You could separate out the name elements as

                Title: Mr
                Forename: Van
                Initial: V
                Surname: Anson

                or

                Title: Mr
                Forename:
                Initial:
                Surname: Van Anson

                Which is the correct interpretation?

                Mr Van Anson would definitely know.

                Regards,

                David

                Comment


                • #9
                  Originally posted by Rick Kelly View Post
                  Whether one provides discrete fields on a data entry form and builds a blended result or takes the blended result and breaks out and builds the discrete building blocks the ending results can be identical.
                  Want to bet? David has given you just one simple example of ambiguity. There are many more. It doesn't matter how good you are at database theory. What will happen is you will have to flag some names for manual cleanup. You will also get some complaints from ones which looked clean but weren't. Depending on your budget and location, there may be directories which you can reference for a surname check.

                  Yours unsolicitedly...

                  Comment


                  • #10
                    Originally posted by David Warner View Post
                    Rick,

                    I would advise using a distinct field format when storing names belonging to individuals. You will definitely encounter some ambiguous free format names which can be parsed in a number of ways.

                    Your proposed 'splitting' processing scheme would either need to identify the ambiguity and require some further input, or make a decision/guess about which is the correct interpretation (and accept that sometimes it will be the wrong decision).

                    For example, consider the name 'Mr Van Anson'.

                    You could separate out the name elements as

                    Title: Mr
                    Forename: Van
                    Initial: V
                    Surname: Anson

                    or

                    Title: Mr
                    Forename:
                    Initial:
                    Surname: Van Anson

                    Which is the correct interpretation?

                    Mr Van Anson would definitely know.

                    Regards,

                    David
                    I agree there is ambiguity and your example is what I had in mind in the original post where I mentioned "double names". This may get too complicated as in the USA we have such a diverse representation of naming styles from around the world. A compromise that comes to mind is to apply the algorithm and include the results on the input screen for manual tweaking. It's the data input effort that my clients are most focused on as they prefer to just start typing away without using drop-down lists for suffix/prefix/titles/etc and tabbing to different fields.

                    Using your example, the main ambiguity is whether the first name is even entered and "Van" could be a double-name flag or a first name and after entering "Mr. Van Anson" the screen could show:

                    Last Name: Van Anson
                    First Name: <blank>

                    I believe I could get it right a large percentage of the time and tabbing a few times would get you to the address portion which is another issue altogether with postal codes/country codes and allowing multiple, time-sensitive alternates. (Lots of snowbirds here in Alaska that live in warmer locales during the winter)

                    Rick
                    ------------------------------------------------------------
                    sigpic

                    It has come to my attention that certain dubious forces are interpolating their desires in my search for Mom, apple pie and the girl you left behind. Stop it or I'll scream...

                    Comment


                    • #11
                      I feel their is no excuse for getting a customers name wrong. Personally, I am insulted when a business does not take the proper time to get a name correct. Because i have seen people put in names and not follow the standard conventions you want for a one liner name input, I feel it is imperative to allow for plenty space in databases for naming. I have had to spend days and nights on end to make corrections before moving to a new main program from a old system. I like also having an alternative single line for the whole name. If the single line is blank, then you can build a name from the individual fields. It is also good to have a field for nick names. When it comes to names, i want our people to use the christian name(legal name). By splitting up names, it is easier to write programs to check for misspelled names, titles, and subfixes.
                      It is just a lot more simplier to have names split into fields than to stay on somebody's butt for entering names in a correct format. Multiple fields help workers police themselves.


                      I have also seen in Louisiana where you have to please the dept of public safety (DMV) and have papers with names that are wrongly typed just because the DMV has the name entered wrongly spelled names on their systems and when processing papers, your papers have to match the wrongly spelled names as on the DMV, no matter of the actual real name.
                      Last edited by Paul Purvis; 7 Nov 2009, 12:27 AM.
                      p purvis

                      Comment


                      • #12
                        One successful way to avoid unsolicited responses is to not post.

                        A paper that provides a clear description of the problem with recording names of people in databases is written by IBM at http://www-935.ibm.com/services/uk/c...osnameisit.pdf. While it does not offer any solutions, the conclusions are worth giving some thought. The penultimate paragraph is of relevance to this forum topic.

                        Comment


                        • #13
                          Rick --

                          An interface similar to Google Advanced Search (here) might provide a solution for you. As you type, it parses and displays the final sub-field results in real time. You could allow the user to type a name in a single field, and your program would auto-parse and display the first name, last name, middle initial, etc. Then, if the user edits/corrects one of the sub-fields, the main string would be modified. Only when the user clicks OK (or whatever) would the final data be accepted.

                          > I'm surprised at this offshoot of "unsolicited" feedback. I know database theory as well as any poster

                          Don't take it personally. There's no way for other posters to know your level of experience, so everybody pitches in as best they can, based on their interpretation of your question and their level of experience. That's the nature of most peer-support forums.

                          -- Eric
                          "Not my circus, not my monkeys."

                          Comment


                          • #14
                            Originally posted by David Warner View Post
                            Which is the correct interpretation?

                            Mr Van Anson would definitely know.
                            Yes, Mr. Van Anderson would indeed know.

                            As Paul stated, if you ("you" as in "the one inputing the data/having contact with Mr. Van Anson") haven't taken the time to ask him, but entered your "best guess" into your database, you shouldn't be surprised if things start to get messy. As the old saying goes: garbage in -> garbage out.

                            One the issue itself, why not create a virtual column on the fly, something like
                            Code:
                            SELECT (Salutation + ' ' + FirstName + ' ' + Initial + ' ' + LastName) As DisplayName

                            Comment


                            • #15
                              Knuth, I think the greatest challenge lies in deriving the components from an existing freeform "name" which could be one of several things, each capable of being formatted originally in several different ways. In my experience it is non-trivial.

                              I don't know, and am not likely to know, the details of Rick's requirement, so my comments are of a general nature.

                              Consider, for example, that the name may refer to an individual, a couple, several people, a family, a titled person, a company, a non-commercial organisation, the estate of a deceased former customer, a substitute for the name where it is not known, a government department.

                              Then think about the many different ways in which each of those "names" might have been captured, possibly over a long period of time, possibly with accretions of comments from operators (yes, it happens!) ranging from the helpful to the unmentionable.

                              If the volume of data permits, then it is best to "convert" existing names to structured names manually, preferably not using temporary staff!

                              If an exclusively automated conversion is used, trouble can be expected. It is also wise to ensure that its testing is signed off by the customer-facing part of the organisation!

                              Comment


                              • #16
                                >I feel their is no excuse for getting a customers name wrong

                                Of course there isn't. This is Business 101, The Customer Is King.

                                However, when your underlying database design tolerates ambiguity, seems to me the designer of same forgot the I/T department's job is to SERVE the business requirements.

                                MCM
                                Michael Mattias
                                Tal Systems (retired)
                                Port Washington WI USA
                                [email protected]
                                http://www.talsystems.com

                                Comment


                                • #17
                                  I.T. is not to "serve" the clients... that represents a slave/master relationship That is the I.T. of old.

                                  The new I.T. is here to offer a SPECIFIC SERVICE at an AGREED UPON service level, at a SPECIFIC COST. As an entity, I.T. is to continually improve their internal methodologies so as to lower their internal costs to maximize reinvestment to continue the cycle. I.T. is not supposed to care if clients are "happy", they are supposed to deliver within the agreed upon contract.

                                  Most OU's used to consider I.T. to be a black hole that money is poured into, they themselves forced I.T. into this position. They wanted I.T. to become more transparent and expose their costs. Sadly, the days of I.T. "helping" clients is over, we're are now a business like everyone else. Customer satisfaction rarely comes into play anymore, now it's a matter of a numbers game, did we meet contracted agreement at the lowest cost? Then we win, even if the customer was not "happy".
                                  Last edited by George Bleck; 7 Nov 2009, 09:52 AM.
                                  <b>George W. Bleck</b>
                                  <img src='http://www.blecktech.com/myemail.gif'>

                                  Comment


                                  • #18
                                    Just a fyi
                                    here are some lengths of variables that have severed us well in south louisiana for local use.

                                    TITLE$=5
                                    FIRST$=20
                                    NICK$=20
                                    MIDDLE$=20
                                    LAST$=20
                                    SUBFIX$=5
                                    ADD1$=35
                                    CITY$=19
                                    STATE$=2
                                    ZIP$=10

                                    i would increase the address line to 50 next time and the city could be increased some for other areas.
                                    p purvis

                                    Comment


                                    • #19
                                      I.T. is not supposed to care if clients are "happy", they are supposed to deliver within the agreed upon contract.
                                      And you wonder why the IT department got out-sourced to Bangalore?
                                      Michael Mattias
                                      Tal Systems (retired)
                                      Port Washington WI USA
                                      [email protected]
                                      http://www.talsystems.com

                                      Comment


                                      • #20
                                        Hey not my choice! I love walking away making people happy. There's a sense of accomplishment. I.T. is a business, it all revolves around the dollar, euro, yen, etc. Big management made this decision, not the worker bees like me.
                                        <b>George W. Bleck</b>
                                        <img src='http://www.blecktech.com/myemail.gif'>

                                        Comment

                                        Working...
                                        X