All Things Techie With Huge, Unstructured, Intuitive Leaps

Interesting Problem Requires Algorithm

As a data geek, I get a lot of questions from people looking for real-life solutions to data problems. Consider this one, which arrived in my email last night.

A social service runs a drop-in center for people. They provide services free of charge. They cannot take people's names, because if clients thought the service was not confidential, or that they were being identified in any way, they would stop coming.

They run these service sessions once a week. The only thing they are allowed to record is whether each person is a newcomer or has attended before. People try to attend weekly, but they come some weeks and skip others. Some drop out forever, and some just for a little while.

The data held by the social service consists of just the following: 1) the week, 2) the number of people who have attended previously, and 3) the number of newcomers that week. They have five years of data.

Now, the government subsidizes this social service, and it has asked a simple question that the service cannot answer. The question is "How many unique individuals have you seen this year?"

It's a fairly complex problem. They don't know whether the start of the record keeping coincided with the start of the program. If it did, this would be an easy exercise: they would just take the number of attendees in the first week and add the newcomers from every week after that. But this isn't the case.
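
For illustration, the easy case can be written as a few lines of Python. This is a minimal sketch that assumes counting began with the program; the `WeekRecord` class and the `unique_if_start_coincides` function are hypothetical names, and the numbers in the example are made up.

```python
from dataclasses import dataclass


@dataclass
class WeekRecord:
    week: int       # week index
    returning: int  # people who have attended before
    newcomers: int  # first-time attendees


def unique_if_start_coincides(records):
    """Count unique individuals, ASSUMING counting began with the program."""
    if not records:
        return 0
    records = sorted(records, key=lambda r: r.week)
    first = records[0]
    # Everyone in the first week, plus every newcomer from later weeks.
    return first.returning + first.newcomers + sum(r.newcomers for r in records[1:])


example = [WeekRecord(1, 0, 12), WeekRecord(2, 8, 3), WeekRecord(3, 10, 2)]
print(unique_if_start_coincides(example))  # 17
```

The whole difficulty of the real problem is that this assumption does not hold.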

For the starting number they have, they don't know whether the breakdown includes newcomers as well as previous attendees, or, if it does include previous attendees, how many there are. Nor do they know whether the pool of starting individuals is complete, or whether some individuals are missing from it because they had dropped out when the counting started. In other words, there could be any number of individuals who attended at least once before the counting started and then show up at random throughout the year.
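
This is not my solution, but even with the unknown starting pool the weekly counts do bound the answer for a given window such as one year. In any week, returning attendees beyond the newcomers already seen earlier in the window must be people first seen before the window (or before the counting began). That gives a lower bound. The upper bound assumes every returning visit is a different pre-window person. The sketch below is in Python and again uses hypothetical names (`WeekRecord`, `unique_bounds`) and made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class WeekRecord:
    week: int       # week index
    returning: int  # people who have attended before
    newcomers: int  # first-time attendees


def unique_bounds(records):
    """Return (lower, upper) bounds on distinct people seen in one window.

    A sketch only: it brackets the answer and does not estimate it.
    """
    records = sorted(records, key=lambda r: r.week)
    new_total = sum(r.newcomers for r in records)
    new_so_far = 0  # newcomers seen earlier in this window
    min_prior = 0   # people who must have been first seen before the window
    for r in records:
        # Returning visitors not explained by this window's earlier newcomers
        # must come from the pre-window (possibly pre-counting) pool.
        min_prior = max(min_prior, r.returning - new_so_far)
        new_so_far += r.newcomers
    lower = new_total + min_prior
    # Worst case: every returning visit is a different pre-window person.
    upper = new_total + sum(r.returning for r in records)
    return lower, upper


example = [WeekRecord(1, 20, 5), WeekRecord(2, 22, 3), WeekRecord(3, 30, 1)]
print(unique_bounds(example))  # (31, 81)
```

The true figure lies somewhere between the two bounds; narrowing it further needs an assumption about how often people come back.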

So the question is "How many unique individuals have been provided services since the counting started?" Government grant money rides on having an accurate answer.

How would you solve this? Comments are open.

(I'll put up my solution later)
