Like many people, I read Stephen Covey’s The 7 Habits of Highly Effective People when it came out. Although I understood the bestseller’s popularity, I found it a tad simplistic. Strangely, though, one of his suggestions stuck with me over the years:
“Begin with the end in mind.”
No, this isn’t always possible. Exhibit A: Slack started as a video game. Exhibit B: YouTube began as a dating site. Countless other startups have tried to “pivot”—although relatively few are successful. When it comes to data collection and analysis, though, Covey’s rule holds up in spades.
Sure, powerful tools such as OpenRefine are constantly improving their ability to make sense of unstructured data. Natural language processing continues to make strides. Ditto for robust Python libraries. Make no mistake, though: analyzing structured data remains far easier than analyzing its unstructured counterpart, and I don’t see that changing anytime soon.
Cases in Point: Making Unstructured Data More Structured
When I started teaching each of my courses, I noticed that my predecessors by and large used survey tools that asked students to provide unstructured data. This was a problem. For instance, to collect peer feedback on capstone projects, students needed to enter the names of their teammates. Sure, this was easy for the students, but this immediately made my data corralling an unnecessarily complex exercise. In my larger classes, simply determining student averages took two hours. This I would not abide.
Let’s say that a six-person team consisted of Steve, Steven, Ian, Mark, Pete, and Lucy.[1] Allowing them to enter their teammates’ names and scores in free-form text fields resulted in chaos. Problems ran the gamut. Some people referred to Pete as Peter. International students often went by nicknames. The surveys allowed for typos. You get my point.
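To make the problem concrete, here is a minimal sketch of what free-form name entry does to a simple average. The responses and scores are invented for illustration, and `average_by_name` is a hypothetical helper, not part of any survey tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical free-text responses: (name exactly as the rater typed it, score).
free_text_responses = [
    ("Pete", 9),
    ("Peter", 8),
    ("pete ", 7),    # different case plus a trailing space
    ("Steve", 10),
    ("Steven", 6),
    ("Stevn", 9),    # typo: was this meant to be Steve or Steven?
]

def average_by_name(responses):
    """Group scores by the raw name string and average each group."""
    scores = defaultdict(list)
    for name, score in responses:
        scores[name].append(score)
    return {name: mean(values) for name, values in scores.items()}

averages = average_by_name(free_text_responses)
print(len(averages), "distinct 'students' found")
```

Every spelling becomes its own group, so three teammates at most show up as six separate people. Lowercasing and trimming would merge "pete " into "Pete", but no simple rule can say whether "Stevn" means Steve or Steven. That call has to be made by hand.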
I quickly reconfigured the surveys in future semesters to make student-response data structured. Now they need to select from team-specific drop-downs.[2] Typos have gone the way of the dodo. Determining average student peer feedback merely requires creating a pivot table. I then upload their scores to Canvas and voilà! The entire process takes me maybe ten minutes.[3]
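For readers who prefer code to spreadsheets, the sketch below mimics that pivot-table step in plain Python. It assumes each response arrives as a (rater, teammate, score) row whose teammate value came from a fixed drop-down. The roster, the rows, and the `average_peer_scores` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical roster for one team; the drop-down would offer exactly these names.
TEAM_ROSTER = ("Steve", "Steven", "Ian", "Mark", "Pete", "Lucy")

# Hypothetical structured responses: (rater, teammate picked from drop-down, score).
responses = [
    ("Steve", "Pete", 9),
    ("Ian", "Pete", 8),
    ("Lucy", "Pete", 7),
    ("Pete", "Lucy", 10),
    ("Mark", "Lucy", 8),
    ("Lucy", "Steven", 6),
]

def average_peer_scores(rows, roster):
    """Average the scores each teammate received, like a one-column pivot table."""
    scores = defaultdict(list)
    for _rater, teammate, score in rows:
        # With a drop-down this check should never fail; it guards against bad imports.
        if teammate not in roster:
            raise ValueError(f"Unknown teammate: {teammate!r}")
        scores[teammate].append(score)
    return {name: mean(values) for name, values in scores.items()}

peer_averages = average_peer_scores(responses, TEAM_ROSTER)
for name, avg in sorted(peer_averages.items()):
    print(f"{name}: {avg}")
```

Because every name comes from the roster, the grouping needs no cleanup, and Steve and Steven never collide.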
This hardly makes me exceptional. In a similar vein, Nextdoor made its data-collection process far more structured when confronted with a racial-profiling issue. (For more on this, see Analytics: The Agile Way.) Many other examples abound.
In my analytics class, students undertake semester-long individual research projects. To be sure, the scope of these projects varies immensely. During the early stages, some students take my advice to heart. Sadly, others ignore nuggets such as these—usually at their peril.
Regardless, sometimes the data arrives in a messy format—something that many of my more experienced pupils have already discovered over their careers. To the extent that we can control survey design and data collection, though, a little extra thought and time typically pays massive dividends down the road.
What say you?
1. Yes, this is a Marillion reference.
2. I require students to enter a portion of their student ID numbers to prevent fraudulent responses.
3. In the future, I’d like to streamline this further by using the Canvas API.