Tuesday, April 28, 2009

Ancestry.com: More relevant search results coming Wednesday

NOTE from DearMYRTLE: The following was just received from our friends at Ancestry.com. This new search algorithm is a welcome change, one Ol' Myrt here has long lobbied for. Please address all inquiries to support@ancestry.com.

More relevant search results are coming Wednesday
From the Ancestry.com Blog
posted by Anne Mitchell

There is a long list of things we want to improve in search (and in new search in particular) – and we’ve started with what you’ve told us is the most important – getting relevant results; and relevance is our top priority this year in search.

And date relevance seems to be the most requested change. If you tell us grandpa died in 1910, you really don’t want to see a 1930 census record.

Making places more relevant and names more relevant are also important, but dates seem to be the one thing we’ve heard the most about. And not to worry, we will get to places and names as well.

So, sometime Wednesday around noon EDT (that’s about 4pm GMT, and about 9am PDT), you will start to see some changes in your results for ranked search. When you’ve got billions of names and records, this stuff takes a while to roll out, so I can’t pinpoint the exact time. But this is reasonably close.

Here are the changes we’ve made:
  • If you are searching for someone and you just know a birth year, we will assume the person lived about 100 years, and we will only return records from birth year - 5 to birth year + 102.
  • If you are searching for someone and you just know a death year, we will again assume the person lived about 100 years, and we will only return records from death year - 105 to death year + 2.
  • If you put in both a birth year and a death year, we will return records from birth year - 5 to death year + 2.

Why did we choose a 5 year “fudge factor” for birth year and a 2 year “fudge factor” for death year? We’ve spent a lot of time with census records, and vital records, and when those dates are wrong, they usually fall into that range.
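For readers who like to see the arithmetic spelled out, here is a minimal sketch of the three rules in Python. It is an illustration only, not Ancestry.com's actual code: the function name `lifespan_window` and the constants are made up for this example, and it assumes the five-year birth allowance also applies when both years are supplied, as the worked example under Questions suggests.

```python
from typing import Optional, Tuple

# Values taken from the post; the names themselves are hypothetical.
BIRTH_FUDGE = 5         # years of slack before the birth year
DEATH_FUDGE = 2         # years of slack after the death year
ASSUMED_LIFESPAN = 100  # used only when one of the two years is missing


def lifespan_window(birth_year: Optional[int] = None,
                    death_year: Optional[int] = None) -> Optional[Tuple[int, int]]:
    """Return (earliest, latest) record years, or None if no dates were given."""
    if birth_year is None and death_year is None:
        # No dates entered, so there is nothing to filter on.
        return None

    if birth_year is not None:
        earliest = birth_year - BIRTH_FUDGE
    else:
        # Death year only: assume a 100-year life, then apply the birth slack.
        earliest = death_year - ASSUMED_LIFESPAN - BIRTH_FUDGE

    if death_year is not None:
        latest = death_year + DEATH_FUDGE
    else:
        # Birth year only: assume a 100-year life, then apply the death slack.
        latest = birth_year + ASSUMED_LIFESPAN + DEATH_FUDGE

    return earliest, latest


print(lifespan_window(birth_year=1850))   # (1845, 1952)
print(lifespan_window(death_year=1910))   # (1805, 1912)
```

With a death year of 1910 the window closes at 1912, which is why a 1930 census record would no longer show up for grandpa.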

I’m going to try and guess at some of your questions. If you have other questions about the changes we made to make dates more relevant, please post them in the comments.

Questions

  • So what if I use a range on the birth or death year? If you have a birth year of 1850 with a range of +-2, and a death year of 1904 with a range of +-10, then we will look for records between (1850-2-5) and (1904+10+2), that is, between 1843 and 1916.
  • Why do we use a fudge factor? Because our ancestors were absolutely horrible with dates and getting them right. Our tests show that a “fudge factor” of five for birth year and two for death year gets better results.
  • What if I don’t want the fudge factor added in? Then add five to the birth year, or subtract two from the death year and you’ve outsmarted the system. I wouldn’t recommend it; you may be outsmarting yourself.
  • Should I mark dates exact? Depends. Death date is usually a very bad date to mark exact, because so few records have a death date. So enter the death date as limiting factor, but don’t mark it exact unless you are specifically looking for records that have that exact date in them. Birth year shows up in lots of records, so that is a better choice for exact, though that does require that a record have a birth year or an age. And remember, you can mark exact and a range, and that will match anything exactly in the range. I recommend this strongly for birth year.
  • What if I see a record that looks like it should be date filtered out of my results set, i.e., I put in death date of 1903, and it’s from 1920? It probably means we haven’t reindexed that data set yet; we’ve covered about 95% of all eligible records for launch. Feel free to leave the name of the data set in a comment on this blog post and we’ll make sure it gets on the list. We are working our way through all of our data sets, but we started with some of the biggest and most commonly surfaced in our search results.
  • What if I don’t want you to date filter for me? If you don’t use dates at all, we can’t and won’t lifespan filter. Or you can type in a broader range of dates to include more records. But this one is a no-brainer, as many of you have pointed out: lifespan filtering is going to give you better results. Now when we launch place filtering (hmm... wonder if that is a hint of things to come soon...) we will make that something you can choose or not choose, because you will need more control over that.
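
The range arithmetic from the first question, and the 1903 versus 1920 case from the fifth, can be checked with a few lines of Python. Again, this is only a sketch under stated assumptions: `window_with_ranges` and `in_window` are hypothetical helper names, and the real search engine may well combine ranges and fudge factors differently behind the scenes.

```python
from typing import Tuple


def window_with_ranges(birth_year: int, birth_range: int,
                       death_year: int, death_range: int,
                       birth_fudge: int = 5,
                       death_fudge: int = 2) -> Tuple[int, int]:
    """Widen the entered years by the user's +- range, then by the fudge factor."""
    earliest = birth_year - birth_range - birth_fudge
    latest = death_year + death_range + death_fudge
    return earliest, latest


def in_window(record_year: int, window: Tuple[int, int]) -> bool:
    """True if a record's year falls inside the (earliest, latest) window."""
    earliest, latest = window
    return earliest <= record_year <= latest


# Birth 1850 +-2 and death 1904 +-10, as in the first question.
window = window_with_ranges(1850, 2, 1904, 10)
print(window)                   # (1843, 1916)
print(in_window(1880, window))  # True
print(in_window(1920, window))  # False

# Death year 1903 only, assuming a 100-year life: 1903 - 105 to 1903 + 2.
death_only = (1903 - 105, 1903 + 2)
print(in_window(1920, death_only))  # False
```

So a 1920 record turning up for someone who died in 1903 falls outside the window, which points to a data set that has not been reindexed yet rather than to the filter itself.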

This is a new addition to our algorithm, so if you have questions, this is the place. I’ll be keeping an eye on this blog post.

This will benefit both old and new search, but we really think you’ll see the difference most in the new search interface. There are many more improvements to come, and in the meantime, I’d encourage you to take a fresh look at new search and see how much this has improved the results you see.

One other thing – we’ve also heard from a number of people that you like to use new search for some types of search, and old search for others – but that switching between them is a pain. To make this easier, we’ve just retired the “introduction page” and introduced a simple link in the yellow bar at the top of the page to enable you to switch easily between the two searches. This will be available tomorrow (Wednesday) as well.

Happy Searching!
