Voicemod tools up with $14.5M to ride the generative AI (sonic)boom

woman speaking into a microphone in a recording studio
Image Credits: Nicola Katie / Getty Images

The first thing we ask Voicemod’s CEO and co-founder, Jaime Bosch, when he picks up the phone to talk about a new funding round is not something we’re accustomed to asking — but our question may become the norm in the generative AI future that’s fast-flying at us: Is this your real voice?

Bosch’s startup has been fiddling with audio effects for almost a decade, playing in the field of digital signal processing (DSP) — where its early focus was on creating fun ‘sound emoji’ effects and reactions for gamers to spice up their voice chats. And gamers do remain its main user-base (for now). But the audio field is being charged by developments in AI — which Voicemod’s team is hoping will lead to whole new use-cases and many more users for its tools.

So where DSP technology was about applying effects to a person’s (real) voice, developments in artificial intelligence are enabling startups like Voicemod to offer tools to create entirely synthesized (unreal) voices. And even the ability for users to ‘wear’ these voices in real-time — so they can speak with a voice that isn’t theirs. Think of it as the audio equivalent of a Snapchat lens or TikTok’s viral teenage filter or Reface’s celebrity face-swaps.

AI voice can even enable voice-shifting into another person’s (real) voice. And not just for talking about the weather or shooting the shit. But for what’s known as sing-to-sing voice conversion. Meaning you could get to sing in someone else’s voice — supercharging your karaoke game, say, by singing Bohemian Rhapsody as literally the voice of Freddie Mercury. And even switching between Mercury, May and Taylor, for the full mock opera effect if you have enough trained AI models (and microphones) on hand. Mamma-mia! 

Artificial intelligence makes all this possible — even if legal and ethical questions may create pause for thought about rushing to unleash real-time voice-shifting upon a world that still relies plenty upon fixed identities. (Banks pushing customers to record ‘a unique voiceprint’ to use as a password definitely need to sit up and start listening.)

Voicemod acquired another audio effects startup last year, called Voctro Labs, whose technology Bosch says it’s working to blend with its own to create an amped up hybrid platform. The combo has already allowed it to expand what it offers — launching a text-to-song feature last December which lets you turn your own lyrics into a vocal composition using generative AI. He tells us more is on the way — including the aforementioned sing-to-sing feature.

Voctro’s tech may be familiar as it was involved in the development of a voice clone of musician Holly Herndon which appeared in a viral TED Talk last year — in which her AI voice could be heard duetting with the real voice of another musician (Pher) in real-time. Which, well, if you haven’t already seen it is quite the visual-audio spectacle, as well as being a mouthful to explain. It’s also a taster of what Voicemod has coming to a keyboard near you.

“We’re definitely going to launch more products and more ways for people to express themselves with the generative AI technology,” Bosch tells us. “Not all Voctro Labs’ technologies are related to music — but they have a lot of technology related to singing, from this text-to-song technology to sing-to-sing technology in real time. So we have a lot of new projects and new products upcoming.

“We are going to strengthen our speech-to-speech AI real-time technology, because we are basically merging our technology with their technology. We’re basically creating an hybrid technology that will be better than ours — or there’s a mix of both… [So their sing-to-sing technology will be] combined with our DSP technology — that we could use to do autotune. So we could potentially help artists with their voice and on the tone. And so this is, this is gonna be really, really interesting.”

As well as providing direct-to-consumer/creator audio tools, it offers its technologies via SDK and APIs for third parties to integrate into their own products, from games and apps to hardware. So it’s set up to distribute its tech across the gamer-creator ecosystem and have demand come find it.

Generative AI-powered disruption in audio of course mirrors (in a non-exact fairground ‘crazy mirror’ kind of a way) developments we’re seeing happen elsewhere: Visually, to graphics and illustration, as a result of deep learning and the advent of prompt-based image generation interfaces (such as DALL-E and Stable Diffusion). Also to the written word, through the large language models that underpin generative AI chatbots like ChatGPT that can produce song lyrics or a whole essay on demand. And, indeed, in the case of musical composition — where Google recently showed off a prompt-based generative AI song composer which can apparently produce arrangements that match the musical vibe you describe (although it said it’s not releasing that particular generative AI model — but surely someone else will).

It’s clear that AI is bending the rules of what it’s possible for a single person to create. And, well, as with freedom, the open concept, this is both thrilling and terrifying. Because it’s what you do with it that counts.

The coming years are going to be all about finding out what people do with such powerful AI tools at their fingertips.

Voicemod team photo
Image Credits: Voicemod

Voicemod is positioning itself to ride this wave by building a toolbox for creators to survive and thrive in a reality-bending future and across a range of use-cases — hence it’s talking in terms of sonic identity and voice avatars for the social metaverse (at the future-gaze-y end) but also just helping you sound your sparkling best on a work Zoom call. So a sort of audio make-up as it were. Apply as needed.

“Now suddenly everyone can become a creator,” predicts Bosch of the generative AI boon. “Everyone can come, basically, with no skill set. Or with no learnings on how to really craft those audios. They will be able to actually create those pieces of music. Songs. And this eventually evolves into — probably — even voices. So the ability to create voices.

“This could potentially be something really viral for platforms like TikTok, or YouTube Shorts or Instagram… And this could eventually evolve into things like karaoke, for example. And be, I don’t know, part of game consoles, or things like that, for people to use this to entertain. And, if we go a step further — and it’s the technology getting better and better as we think it will be — this could potentially be a professional tool for people who want to create music. Or for people who want to create voices for movies or voices for games characters.

“We have a strong belief in user-generated content, and we are building tools for our users to start creating sounds and creating voices. And we will be putting technology in the hands of the users to create those [sounds]. And, eventually in the future, hopefully, they will go even to a professional level.”

So while — currently — in order for the startup to synthesize a whole voice it does still involve a team of sound engineers and designers, Bosch suggests generative AI will put that power in the hands of the individual — and it’ll happen soon; “in the near future”.

“I don’t know if we’ll be prompting — now we’re in this wave of everything is done through prompts — I’m not sure if that will be the way or it will be more tools that will have AI technology embedded and we have user experiences that will make things a lot easier,” he adds. “But definitely what I see from generative AI in the audience but also in the management phase is that suddenly everyone can become a creator, which I think is really interesting.”

The birth of AI voice may not sound like amazing news for the employment prospects of sound engineers and designers (albeit, tech advances may simply create new requirements that just shift where their expertise is needed). But Bosch reckons that voice actors, at least, will still have a key role to play — emoting for AI. Since robot voices aren’t good at getting the pitch and intonation, or indeed emotion, right. It’s a voice clone without a soul, basically. (Or as Nick Cave might put it, AI voice lacks ‘its own blood, its own struggle, its own suffering’ — it lacks humanness.)

“I think that you will always need a human factor in your sample with these voices,” suggests Bosch. “You could have the best voice — of even a famous person — but what really comes is the impression. You still need a human to do the cadence on the words. You still need a human to do the rhythm, the tone. So [it’s not just that] I can speak normally and I will sound like a famous person — no, you don’t — you still need to act a little bit. So… I think human factor for expression is key.”

Might generative AI not be able to learn to emote as well, with the right human data-sets — and further dial up its mimicry so as to make us laugh or cry or love or hate on-demand too?

“Yeah. Well, we will see,” responds Bosch. “I’m not sure. I mean, as of today, for me AI is a tool to be used by humans. But yeah, we don’t know where this is going to evolve.”

Voicemod for Desktop. Image Credits: Voicemod

Voicemod is gearing up for whatever phonic craziness lies ahead with a fresh tranche of funding. The 2014-founded startup has been revenue generating for years, via pro versions of its tools — its main product, Voicemod for Desktop, has had more than 40 million downloads to-date, while Bosch says it has 3.3 million monthly active users — but it’s just closed $14.5 million in expansion funding, following an $8M Series A back in summer 2020. Madrid-based Kfund’s growth fund, Leadwind, led the round, with participation from Minifund (Eros Resmini, former CMO at Discord) and Bitkraft Ventures.

“We’re super excited by what generative AI can do to all creative industries and more specifically audio, especially when it comes to enhancing and augmenting the job that creative people already do,” Jaime Novoa, partner at Kfund, tells TechCrunch. “In the past few months there’s been an explosion in generative AI in general and more specifically in audio but we think this is a phenomenon that’s just starting.

“What many of the cool technologies being launched to market lack are concrete and scalable business models attached to them, and Voicemod differentiates itself from the pack by having built a product used by millions of people on a daily basis and with significant revenue traction. We’re super excited about what Jaime and the rest of the Voicemod team have in the pipeline and what’s to come.”

Voicemod says the extra funds will be used to enhance the development of its real-time AI voice identity capabilities — and dial up its proposition for Gen Z, gamers, content creators, and professionals of all skill levels wanting tools to help them express themselves vocally in digital spaces.

Per Bosch, part of the reason it’s taking more funding now relates to the acquisition of Voctro Labs. Beyond that, he says it’s about making the most of the opportunities sparking off the Cambrian explosion in generative AI tools.

“We are in the middle of tremendous revolution in AI,” he says. “We want to be well funding in order to be able to develop technology but also to be able to deliver technology to users. So I think one of our competitive advantages is that we already have the market and the traction and we basically are able to put this in the hands of the users. And I want to make sure to have enough runway, also due to market conditions, to be able to put all of this in place. So it will be mainly focused… on building the next generation AI technology and putting it in the hands of the users and also building these creation tools for the users to create content.”

The first new tool will be landing next month — with a launch of Voicemod’s desktop product on macOS (currently it’s PC only). The goal is to evolve into a multi-platform product spanning all devices. “We’re also working on a creation tool mobile app that hopefully will see the light towards the beginning of next quarter. And, and yeah, some more stuff to come, hopefully,” Bosch adds.

He also tells us the startup is working on a watermarking technology which it hopes to launch in Q2 this year — to give platforms a way to be able to spot AI-generated voices in the wild.

Such a feature is likely to be a vital tool to counter all the possible negative use-cases (scams, fraud, manipulation, abuse, bullying, trolling etc etc) one could imagine humans coming up with for voice-shifting tools that let you sound exactly like someone you’re not.

“It’s an algorithm to watermark the audio,” explains Bosch. “Moderation is complicated because it really changes depending on the space… on which are the platforms where the audio is used — so we believe that the channel is the one that should own that moderation and what we are doing is we will be providing this watermarking system in order for them to be able to know if the audio is created via synthetic voice or is created by a real voice.”

“Every single new technology can be used for the good or for the bad,” he adds. “So we are of course putting some technology, some tools in place to be able to have more control around a misuse of this technology.”

On questions of licensing for training data, IP issues here are currently a grey area — as the law hasn’t caught up with developments in AI (let alone generative AI). That means startups operating in the space have to consider whether to make the most of total legal freedom to do whatever they want (and hope expensive consequences don’t come clanging down on them in short order), or tread more carefully and thoughtfully. (Other startups in the space include the likes of Voice AI, Koe and ElevenLabs.)

Bosch claims Voicemod is taking the latter approach — using (paid) voice actors to build up data-sets to train and hone its AI models. If it wants to make use of some original content he says the team will go to the IP provider and negotiate — and figure out what kind of licensing terms they’d be up for. (The generative AI boom is also a crazy-thrilling time to be an IP lawyer, clearly.)

“We are basically pioneering here,” he adds. “So a lot of things are without laws yet so we were trying to stick to our values, basically, and try to do the right thing. That’s our approach on the data [side]. But yeah, you’re completely right — there’s no ‘legal attachment’ to your voice, as of today… We own our fingerprint. You don’t own, like, whatever the fingerprint of your voice [is]. As of today.

“It sounds a little bit like science fiction but maybe, in the future, we will ‘own’ something related to our voice.”

For the record, Bosch was talking to me with his actual voice. The company’s real-time voice-shifting technology doesn’t yet work over mobile. But he says that’s coming too. So buckle up: The synthesized future is gonna be a screaming wild ride.
