Friday, March 20, 2020

Guild Blog Challenge Post 7: Human errors (mine!), source data accuracy - and some PARRY deaths in 1918

I've fallen behind on the Blogging Challenge, not having posted anything for a few weeks. That doesn't mean nothing has been happening on the study - I do have several posts half written. These were often prompted by sources other researchers have mentioned, either in their blogs or on the mailing lists and Facebook groups. But moving from roughly drafted notes to a completed post always seems to involve much more research than I initially expect.

Like most people at the moment, I imagine, I have been following the developing situation regarding the coronavirus pandemic. Various events I would have attended have already been cancelled, and I wouldn't be surprised if more were to follow. Daily life is changing. Even what used to be a simple decision, like how we manage our shopping, has become more complicated. Do we pop down to the shops regularly for a few bits - potentially contravening the advice about social distancing which emphasises 'stay at home' as much as possible? Or do we wait to do a larger weekly shop, knowing that the time elapsed since the previous one should allow for any potential contagion to have shown itself - but risking difficulties in obtaining specific items, as well as the ire of the "daily shoppers", who might think we're stockpiling? With daily 'essentials', such as milk, still disappearing rapidly from the shelves, and limits in place for how much can be purchased at any one time, we are all having to balance our actions, and be considerate of others.

I wonder to what extent this will permanently change our ways of life?

I suppose it is only natural for comparisons to be made to similar situations in the past, in particular the influenza pandemic of 1918 (which, I only recently learnt, ran from January 1918 to December 1920, rather than just during 1918*). So I thought I'd take a look at the PARRY deaths for that year, to see how far above the 'usual' level they were. This was particularly prompted by the fact that two of my own ancestral line died that year - John PARRY, my great grandfather, and Thomas PARRY, John's father.

Initially I was going to look at the range of years from 1910 - 1925, to provide an indication of how many 'more' deaths there were that year. But, using FreeBMD for the figures, I was surprised to find that the number of registered deaths for the surname in 1918 (455) was actually less than the number registered in 1911 (468). So I expanded the range to 25 years, to see if that gave me a better impression of the 'usual' numbers.

One of my first considerations was how to ensure the figures were as accurate as possible. I had carried out some extractions of the death index from FreeBMD back in 2007, but those extractions didn't cover all the date range I was currently interested in.  Several of the years extracted had also been incomplete at the time and had transcription errors in them, which I have sometimes noticed leads to duplicated entries, and so would inflate the totals.  I therefore began by extracting those years I knew were missing, and checking the totals for those years previously collected, in case of changes.
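
Duplicated entries are easy to miss by eye. As a minimal sketch in Python, assuming each extracted entry is held as a tuple of year, quarter, forenames, district, volume and page (an illustrative layout, not FreeBMD's own column headings), yearly totals can be counted after collapsing exact duplicates. This treats two rows that are identical in every field as one entry, so genuine look-alikes, such as two unnamed infants on the same page, would still need checking by hand.

```python
from collections import Counter

# Illustrative entry layout (an assumption): (year, quarter, forenames, district, volume, page)
Entry = tuple[int, int, str, str, str, str]


def yearly_totals(entries: list[Entry]) -> Counter:
    """Count entries per year after collapsing rows that are identical in every field."""
    unique_entries = set(entries)  # exact duplicates collapse to a single entry
    return Counter(entry[0] for entry in unique_entries)


if __name__ == "__main__":
    # Made-up sample rows, purely to show the behaviour.
    sample = [
        (1918, 1, "John", "Ruthin", "11b", "100"),
        (1918, 1, "John", "Ruthin", "11b", "100"),  # duplicated transcription
        (1918, 2, "Thomas", "Ruthin", "11b", "101"),
    ]
    print(yearly_totals(sample))  # Counter({1918: 2})
```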

I was going to compare the details to the same indexes on other genealogy sites, but then I realised some sites, such as Ancestry, actually cite FreeBMD as their source, at least for the index up to 1915, so I'd clearly be comparing the same data there. The only obvious comparison to make would be with the UK Government's own General Register Office site, which covers the birth and death registrations. Many one-namers probably checked this site when it first went online, and compared it with the data they already held, but I hadn't yet done so.

The GRO transcriptions, I believe, have been made by going back to the original copy certificates that Registrars submitted each quarter, whereas the FreeBMD index is transcribed from the indexes that were produced from those certificates at the time of submission. So I expected there might be some differences between the two indexes, and was prepared for small variations in the numbers.

But I was initially surprised to find a couple of years which seemed considerably out (over 20 entries out of 500).

The cause turned out to be my own error, and the experience will (hopefully!) teach me to read the small print in future!

The GRO site returns a maximum of 250 entries per search, and three of my searches exceeded that number, so were incomplete. I only realised this after deciding to extract all of the entries for those years to compare them with the FreeBMD listing, and then discovering that it was the names at the end of the alphabet that were missing. 🙄
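
A small safeguard against that trap can be sketched in Python, assuming each quarterly count comes from its own search and is noted down by hand: treat any search that returns exactly the cap as possibly truncated, and only build a year's total from quarterly figures when none of them hit it. The 250 limit is the one described above; the function names are hypothetical.

```python
GRO_MAX_RESULTS = 250  # per-search limit on the GRO site, as described above


def possibly_truncated(result_count: int, cap: int = GRO_MAX_RESULTS) -> bool:
    """A search that returns the maximum number of results may have silently dropped some."""
    return result_count >= cap


def year_total(quarter_counts: dict[int, int], cap: int = GRO_MAX_RESULTS) -> int:
    """Sum the quarterly counts for a year, refusing any quarter that hit the cap."""
    for quarter, count in quarter_counts.items():
        if possibly_truncated(count, cap):
            raise ValueError(
                f"Quarter {quarter}: {count} results may be incomplete; split the search further"
            )
    return sum(quarter_counts.values())
```

Raising an error, rather than quietly returning a total, is a deliberate choice here: a missing tail of the alphabet is exactly the kind of gap that otherwise goes unnoticed.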

Having collected the correct figures from the GRO site (by checking the totals per quarter within the relevant years), I then looked at the totals across the two sites, which I plotted on the following graph:

[Graph: total PARRY death registrations per year on FreeBMD and the GRO]

By now I was more interested in the differences between the two sites than in the level of deaths caused by the 1918 influenza pandemic.

So I plotted a separate graph of just the differences between the two sites, to make that easier to see:

[Graph: difference between the FreeBMD and GRO totals per year]

As you can see, the totals are identical for only three years. So I returned to looking at the specific event data for several years, to try to account for the differences.

One would think a comparison of the two indexes shouldn't be difficult. After all, they record the same events, so entries that appear in both indexes should be easy to match up, and I'm sure the more technologically minded could write formulas to do this. However, I only know how to do fairly basic comparisons using formulas, so I find it easier to paste the two indexes side by side into one spreadsheet and then use simple comparison formulas to help me align matching entries.
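
For the more technologically minded, one way to do that matching is to reduce each entry to a 'match key' and compare the two indexes as multisets, so that two identical-looking entries in the same quarter and district are counted twice rather than once. This Python sketch assumes the key is (year, quarter number, forenames, district), which is purely an illustrative choice; as the issues listed below show, forenames and districts would need standardising first, or full names and initials will never match.

```python
from collections import Counter

# Illustrative match key (an assumption): (year, quarter number, forenames, district)
Key = tuple[int, int, str, str]


def compare_indexes(freebmd: list[Key], gro: list[Key]) -> tuple[Counter, Counter]:
    """Return the entries found only on FreeBMD and only on the GRO, as multisets."""
    freebmd_counts, gro_counts = Counter(freebmd), Counter(gro)
    # Counter subtraction keeps only positive counts, i.e. the unmatched surplus on each side.
    return freebmd_counts - gro_counts, gro_counts - freebmd_counts


if __name__ == "__main__":
    freebmd = [(1900, 1, "John", "Ruthin"), (1900, 1, "John", "Ruthin"), (1900, 2, "Mary", "Cardiff")]
    gro = [(1900, 1, "John", "Ruthin"), (1900, 3, "Ann", "Bala")]
    only_freebmd, only_gro = compare_indexes(freebmd, gro)
    print(sum(only_freebmd.values()), sum(only_gro.values()))  # 2 1
```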

I did some preliminary work to get the two indexes into a comparable order to start with. For example, the GRO site search is gender specific, so the two extractions (one per gender) need combining and sorting alphabetically to match FreeBMD. I also added a column for the quarter numbers, rather than sorting on the text versions, so that I could sort both indexes by quarter within the year. And I used two Excel functions, "TRIM" to remove additional spaces in the first names and districts, and "PROPER" to make the capitalisation consistent.
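
The same preparation can be sketched in Python. The two helpers below are only rough stand-ins for Excel's TRIM and PROPER (Python's split/join also removes tabs, and `str.title()` is not guaranteed to capitalise every name exactly as PROPER would), and the quarter labels ('Mar', 'Jun', 'Sep', 'Dec') and dictionary field names are assumptions rather than the sites' actual export formats.

```python
QUARTER_NUMBERS = {"mar": 1, "jun": 2, "sep": 3, "dec": 4}  # assumed quarter labels


def trim(text: str) -> str:
    """Rough equivalent of Excel TRIM: strip the ends and collapse internal runs of spaces."""
    return " ".join(text.split())


def proper(text: str) -> str:
    """Rough equivalent of Excel PROPER: capitalise the first letter of each word."""
    return text.title()


def quarter_number(label: str) -> int:
    """Turn a quarter label such as 'Mar' or 'March' into 1-4 so it sorts correctly."""
    return QUARTER_NUMBERS[label.strip().lower()[:3]]


def clean(row: dict) -> dict:
    """Return a copy of one index row with tidied names/districts and a numeric quarter."""
    return {
        **row,
        "forenames": proper(trim(row["forenames"])),
        "district": proper(trim(row["district"])),
        "qnum": quarter_number(row["quarter"]),
    }


def combine_and_sort(males: list[dict], females: list[dict]) -> list[dict]:
    """Merge the two gender-specific extractions and sort by year, quarter, then forenames."""
    rows = [clean(row) for row in males + females]
    return sorted(rows, key=lambda r: (r["year"], r["qnum"], r["forenames"]))
```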

I did cause myself one problem. Having extracted the data into Word initially, I used "Find and Replace" to bring each entry onto a single row of tabbed data - and since I used the surname “Parry” to identify the start of each row, someone called “Parry PARRY” caused an additional split!
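
The same pitfall can be reproduced in Python, using a hypothetical raw layout where each entry starts on a new line with the surname and its remaining details are indented on the following lines. Splitting before every occurrence of 'Parry' breaks an entry whose forename is also Parry, whereas splitting only where the surname starts a line keeps it whole. A real export that doesn't start each entry on a new line would need a different marker.

```python
import re


def split_records_naive(raw: str) -> list[str]:
    """Start a new record before EVERY 'Parry', so a forename of Parry causes a false split."""
    return [chunk for chunk in re.split(r"(?=Parry)", raw) if chunk]


def split_records_anchored(raw: str) -> list[str]:
    """Start a new record only where 'Parry' begins a line (assumed layout)."""
    return [chunk for chunk in re.split(r"(?m)^(?=Parry\b)", raw) if chunk]


def to_tab_rows(records: list[str]) -> list[str]:
    """Join each record's lines into one tab-separated row."""
    return ["\t".join(line.strip() for line in rec.splitlines() if line.strip()) for rec in records]


if __name__ == "__main__":
    # Hypothetical raw text: surname line first, further details indented on later lines.
    raw = "Parry\n\tJohn\n\tRuthin 11b 100\nParry\n\tParry\n\tCardiff 11a 50\n"
    print(len(split_records_naive(raw)))  # 3 - one record too many
    print(to_tab_rows(split_records_anchored(raw)))
```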

Having got the data into an order I was happy with, I used the "IF" function to check whether particular columns from the two indexes were identical, and then set about checking all the mismatching items - of which there were quite a few, to start with!
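
In Python, the equivalent of a column of IF checks (something like =IF(B2=H2,"","MISMATCH"), with hypothetical cell references) might look like the sketch below. It assumes the two indexes have already been aligned row by row, as in the spreadsheet, with each row held as a dictionary keyed by illustrative column names.

```python
def mismatched_fields(freebmd_row: dict, gro_row: dict, fields: list[str]) -> list[str]:
    """List the fields whose values differ, like one IF(...) check per column."""
    return [field for field in fields if freebmd_row.get(field) != gro_row.get(field)]


def mismatch_report(pairs: list[tuple[dict, dict]], fields: list[str]) -> list[tuple[int, list[str]]]:
    """For aligned (FreeBMD, GRO) row pairs, return (row number, differing fields) per mismatch."""
    report = []
    for row_number, (freebmd_row, gro_row) in enumerate(pairs, start=1):
        differing = mismatched_fields(freebmd_row, gro_row, fields)
        if differing:
            report.append((row_number, differing))
    return report
```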

Most of these are probably the sorts of issues that would make automation of such a comparison more difficult as well, since they relate to inconsistencies in the way the data is recorded:
  • Unnamed deaths - these appear as ‘male’ or ‘female’ on FreeBMD but, since the search is by gender on the GRO, appear with a dash for the name on that site.
  • Not all of the unnamed deaths seem to be in the online GRO index.
  • The online GRO index includes people's full names, rather than using initials for extended middle names. This didn't just make direct comparison more difficult; it also messed up the “equivalent” sorting I’d done. Entries that were in a particular order when only a middle initial was present were sometimes in a different order once the middle name was written in full (eg 2x Mary A, that then turned out to be Mary Ann and Mary Alice).
  • Some districts on the GRO have "Of" before them (Ruthin, Cardiff and Haverfordwest were ones I found)
  • Misspellings in the online GRO, eg Haverfordwest occurred as "Hamfordwest" on several occasions, and Prestwick instead of Prestwich
  • Registration District name variations.  
    • I imagine space restrictions necessitated the shortened forms used in the original GRO indexes, which were then transcribed "as is" by FreeBMD, but the RDs in the new index are (mainly) transcribed in full. There still seemed to be anomalies though, eg on my 2007 FreeBMD extraction, an entry was listed as "St Olave Bermondsey". That's now become just "St Olave" on FreeBMD, but the GRO has "Saint Olave Southwark". That entry was in 1900, but another entry, in 1921, shows the RD as "St. Olave (Bermondsey)" (use of full stops seems to be a bit variable as well!)
    • Bedwelty/Bedwellty, Aberystwith/Aberystwyth, Festiniog/Ffestiniog, Shiffnal/Shifnal
    • There are two Newport RDs - "Newport Salop & Stafford" or "Newport Mon/(Mon.)" but I did find one entry as just "Newport"
    • Portsea (or Portsea Island) on old index – entry now as Portsmouth
    • West Derby on old index – entry now as Bootle.
    • Some entries in the online GRO index are missing their Registration Districts
All of the above reinforced my view that I should probably be creating a "standardised version" column for items such as names and Registration Districts in all my spreadsheets, just to make things easier when I want to compare across datasets.
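
A "standardised version" column for Registration Districts could be generated rather than typed. The Python sketch below is one hypothetical approach: it drops full stops and brackets, removes a leading "Of", turns "Saint" into "St", and then looks the result up in a hand-built table of the variants listed above. Which spelling counts as "standard" is an arbitrary choice here. Portsea/Portsmouth and West Derby/Bootle are deliberately left out, since they look like different district names rather than spellings, and a bare "Newport" would still need checking by hand.

```python
import re

# Hand-built table of the variants mentioned above (hypothetical; extend as new ones turn up).
# Keys are lower-case, punctuation-free forms; values are the chosen "standard" spelling.
DISTRICT_VARIANTS = {
    "bedwelty": "Bedwellty",
    "aberystwith": "Aberystwyth",
    "festiniog": "Ffestiniog",
    "shiffnal": "Shifnal",
    "hamfordwest": "Haverfordwest",
    "prestwick": "Prestwich",
    "st olave bermondsey": "St Olave",
    "st olave southwark": "St Olave",
}


def standard_district(raw: str) -> str:
    """Produce a standardised Registration District name for cross-index comparisons."""
    name = re.sub(r"[.()]", " ", raw)  # drop full stops and brackets
    name = " ".join(name.split())  # collapse runs of spaces, like TRIM
    name = re.sub(r"^of\s+", "", name, flags=re.IGNORECASE)  # "Of Ruthin" -> "Ruthin"
    name = re.sub(r"^saint\s+", "St ", name, flags=re.IGNORECASE)  # "Saint Olave" -> "St Olave"
    return DISTRICT_VARIANTS.get(name.lower(), name.title())
```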

In the columns I checked (which was not all of them), I only found a couple of mistranscriptions on FreeBMD (ie where, on checking their image of the old GRO index, their transcription was incorrect), eg Ames instead of Amos, Catherine instead of Catharine, and a Volume 7b which should have been 8b. That, I think, shows how effective the transcription process on FreeBMD is.

But there were some entries with name differences where I have no way of knowing which index is correct, as the FreeBMD transcription matched their image, but not the new index, eg Anne/Annie, Harriett/Harriet, Iorwerth/Jorwerth, and May/Mary. Perhaps evidence from other sources might eventually help to resolve those.

And then, finally, there were the "missing" entries, which accounted for the differences between the two indexes:
  • Net difference in 1900 - 4, made up of:
    • 5 on FreeBMD but not on GRO
    • 1 on GRO but not on FreeBMD
  • Net difference in 1901 - 9, made up of:
    • 10 (7 named, 3 unnamed) on FreeBMD but not on GRO. 
    • 1 on GRO but not on FreeBMD
  • Net difference in 1921 - 11, made up of:
    • 13 (11 named, 2 unnamed) on FreeBMD but not on GRO
    • 2 on GRO but not on FreeBMD 
It is difficult to identify whether some of the 'missing' entries could have been due to errors in the original index. For example, there are two Elizabeths, aged 44 and 84, on consecutive pages within a district, where only one of them is in the online GRO index. Again, there are two Joseph Walters, aged 58 and 2, on consecutive pages for the same district in the old index, but only one of them appears in the newer index. Hopefully, as with the name variation entries, information from other sources, eg burial records, might eventually resolve these issues.
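
The arithmetic behind those net figures is simple, but writing it down makes a useful check: the net difference for a year should equal the FreeBMD-only entries minus the GRO-only entries and, if every other entry has been matched, also the gap between the two yearly totals. A short Python sketch using the counts listed above; the function names are hypothetical.

```python
# Unmatched entries per year, from the list above: (on FreeBMD only, on GRO only)
UNMATCHED = {1900: (5, 1), 1901: (10, 1), 1921: (13, 2)}


def net_difference(only_freebmd: int, only_gro: int) -> int:
    """Net difference between the indexes, counted in FreeBMD's favour."""
    return only_freebmd - only_gro


def reconciles(total_freebmd: int, total_gro: int, only_freebmd: int, only_gro: int) -> bool:
    """True if the unmatched entries fully account for the gap between the yearly totals."""
    return total_freebmd - total_gro == net_difference(only_freebmd, only_gro)


if __name__ == "__main__":
    for year, (only_freebmd, only_gro) in UNMATCHED.items():
        print(year, net_difference(only_freebmd, only_gro))  # 1900 4, then 1901 9, then 1921 11
```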

As you can probably imagine, by now, what had begun as a simple question about the PARRY deaths potentially caused by the 1918 influenza epidemic had taken up rather a lot of my time, with very little of it specifically related to the original question!

The original plan for this post also included adding some details about the two PARRYs from my own ancestral line who died in 1918.  But it has taken so long to deal with obtaining statistics that I am happy with, that I'm going to leave that for another post. (I guess that's one way to get my Blog Challenge completed! 🙂 )


* The 1918 influenza pandemic - https://en.wikipedia.org/wiki/Spanish_flu
FreeBMD site - https://www.freebmd.org.uk
UK General Register Office Site - https://www.gro.gov.uk/gro/content/certificates/indexes_search.asp (registration is necessary but searching is free)
