VB Transform Innovation Showcase winner: Unstructured.io

Enterprises today have vast amounts of unstructured data scattered across numerous environments.

The “dirty secret,” according to Unstructured.io founder and CEO Bryan Raymond, is that data scientists often still process all that data exactly as they did 20 years ago, typically by manually building pre-processing guidelines.

“Data scientists hate pre-processing,” he told the audience at VentureBeat Transform 2023. “It’s like going to the dentist.”

Unstructured.io, which uses natural language processing to transform data from its raw form into a learning-ready form, was selected as Most Likely to Succeed at the Innovation Showcase at VentureBeat Transform 2023.

Connecting data to LLMs

Raymond described his company’s platform as an ETL (extract, transform and load) tool for large language models (LLMs).

“We like to think of ourselves as top of funnel,” he said.

Unstructured.io connects data to LLMs and uses a variety of technologies, including computer vision, natural language processing (NLP) and Python scripts, to abstract away complexity.

The unstructured data is curated, cleaned of artifacts and made LLM-ready, Raymond explained. The approach is simpler and faster, and data scientists don’t have to write hundreds of lines of parsing code.
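The article does not show Unstructured.io's code. As a rough illustration of the extract, clean and chunk steps Raymond described, here is a minimal Python sketch that uses only the standard library. It assumes HTML input and a simple character budget per chunk; `extract_text`, `clean` and `chunk` are hypothetical names, and the cleaning rules are examples rather than the company's actual rules.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def extract_text(html: str) -> str:
    """Extract: pull raw text out of HTML markup (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def clean(text: str) -> str:
    """Transform: strip example artifacts (bullet glyphs, long dashes, extra whitespace)."""
    text = re.sub(r"[\u2022\u25cf\u25aa]", " ", text)  # bullet glyphs -> space
    text = re.sub(r"[\u2013\u2014]", "-", text)         # en/em dash -> hyphen
    return re.sub(r"\s+", " ", text).strip()            # collapse runs of whitespace


def chunk(text: str, max_chars: int = 200) -> list[str]:
    """Split on word boundaries into chunks of about max_chars.

    A single word longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    raw = "<p>\u2022 Q3 revenue\n\n grew \u2014 fast.</p><script>track()</script>"
    print(chunk(clean(extract_text(raw)), max_chars=20))
```

Chunking on word boundaries keeps each piece readable for an LLM prompt; real pipelines typically chunk by document structure (titles, paragraphs, tables) or by token count instead of characters.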

Clean, structured data can be elusive

The tool’s enterprise API enables browser workflows for all types of developers and supports pre-processing of more than 25 file types and thousands of formats in more than 100 languages, said Raymond. It is available as a free API, as a Google Colab notebook and on GitHub, where its library provides open-source components for pre-processing text documents such as PDFs, HTML and Word documents.

Raymond said he came up with the idea for the company after being “stuck in data engineering hell” at a previous employer. Just getting clean, structured data took years, he said.

Unstructured.io was founded in 2022, and the company is now “hard at work” on enterprise-grade data connectors that are resistant to interruptions, can detect new file versions and parallelize easily, said Raymond. The company currently has 15 data connectors and plans to expand to more than 30.
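Raymond did not describe how the connectors are built. As a rough illustration of two properties he mentioned, detecting new file versions and parallelizing work, here is a minimal Python sketch. It assumes a local directory as the data source and uses SHA-256 content hashes as the version signal; `detect_changes` and `sync` are hypothetical names, not part of Unstructured.io's product.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; a changed digest signals a new version."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def detect_changes(root: Path, manifest: dict[str, str]) -> tuple[list[Path], dict[str, str]]:
    """Return new or modified files under root, plus the updated manifest (hypothetical helper)."""
    current = {str(p): file_digest(p) for p in sorted(root.rglob("*")) if p.is_file()}
    changed = [Path(k) for k, digest in current.items() if manifest.get(k) != digest]
    return changed, current


def sync(root: Path, manifest: dict[str, str],
         process: Callable[[Path], object], workers: int = 4) -> tuple[list[object], dict[str, str]]:
    """Process only new or changed files, spreading the work across a thread pool."""
    changed, new_manifest = detect_changes(root, manifest)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process, changed))  # results keep the input order
    return results, new_manifest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "report.txt").write_text("draft")
        _, manifest = sync(root, {}, lambda p: p.name)  # first run: every file is new
        (root / "report.txt").write_text("final")
        changed, manifest = sync(root, manifest, lambda p: p.name)
        print(changed)  # only the edited file is reprocessed
```

Hashing content rather than comparing timestamps avoids reprocessing files that were touched but not changed. The sketch ignores deleted files and does not persist the manifest between runs; a connector that is resistant to interruptions would need to handle both.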

The Innovation Showcase at this year’s VentureBeat Transform highlighted 10 unique companies in the generative AI, machine learning (ML) and analytics spaces. The three winners were Unstructured.io, Arize AI (Best Technology) and Skyflow (Best Presentation Style), along with seven Honorable Mentions.
