Part of the third batch of startups in Microsoft Accelerator, DefinedCrowd is filling a niche in the big data and machine learning community: it provides near-real-time feeds of rich language data, checked by actual well-informed humans all over the world.
The need comes from the Catch-22 that often stalls deep data analysis: you have to understand the data to analyze it, but you must analyze it to understand it. Spoken and written language, and the big data that natural language processing draws from it, is an especially vast landscape, which makes it especially troublesome in this way.
“In the artificial intelligence space, to develop virtual assistants like Cortana, or Apple’s Siri and things like that, you need large amounts of voice recordings, you need transcriptions of those voices, you need intents and empathy labeling of those voices,” said Daniela Braga, co-founder and chief scientist, in an interview with TechCrunch. “The crowd input provides the extra refinement of the data that basically no machine can do.”