This series of arti­cles on Gen­er­a­tive Arti­fi­cial Intel­li­gence cov­ers best prac­tices, projects, and tools in use at Van­cou­ver Com­mu­ni­ty Col­lege, and how the Cen­tre for Teach­ing, Learn­ing, and Research (CTLR) is sup­port­ing and guid­ing the use of GenAI in elearn­ing devel­op­ment.

Even with my cyn­ic’s hat firm­ly affixed to my skull (see my ear­li­er AI post), I haven’t been able to avoid using GenAI tools to enhance con­tent for our online cours­es.

Here are some exam­ples where I’ve used AI to gen­er­ate or enhance con­tent for elearn­ing projects at VCC.

This Persona Does Not Exist

In the last year alone, the quality of synthetically-generated images of people has continued to improve markedly. Month upon month, it has become more difficult for me to tell the difference between fake and real people in online images. It’s kind of maddening that this artificial realism is starting to grow beyond the “uncanny valley” level.

In the past, I have used a synthetic face generator to grab realistic images of people’s faces when designing Agile Personas for software design. To create Personas, I needed to write one-page descriptions of typical users who would be the beneficiaries of a design project. I wanted faces that were realistic enough to induce empathy in the designers and stakeholders, but I didn’t want to use real people’s faces I found on the Internet.

Reviving Old Videos through Upscaling

I’m not such a big fan of using GenAI for content creation. I much prefer using tools where AI’s fumbling fingerprints may be harder to notice. For example, I’ve used an AI-driven video upscaling tool to scale up old educational demonstration videos. To bring a twenty-year-old SD resolution video (e.g. 480x360 pixels) closer to modern HD resolutions, AI tools basically introduce new pixels to double the video’s width and height. In this case, the end product becomes 960x720 pixels, which is pretty close to the 720p HD standard (it has the same 720-pixel height), much sharper in detail, and less “noisy” or grainy in appearance.

In the upscaled video (on the right), folds of clothing and mechanical edges look crisp; facial features and lettering, not so much. (VCC Auto Collision program)

The results aren’t perfect, but they can be pretty usable. In the upscaled end product, objects with straight edges end up looking fantastically crisp, but human features do not always resemble their original owners (look at mouths and eyes, especially). Lettering in signage or graphical titles also tends to lose its styling.

The AI behind the image upscal­ing does its best to inter­po­late pat­terns of pix­els to increase the spa­tial res­o­lu­tion of the video, but noth­ing in it actu­al­ly “knows” if its new pix­els are accu­rate­ly ren­der­ing the con­tour of a per­son­’s mouth or eyes. To the image upscal­ing AI, an edge is just an edge and it’s all just a bunch of pix­els that need trans­form­ing.
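To make the idea of “introducing new pixels” concrete, here is a minimal sketch of the simplest classical approach, bilinear interpolation, applied to a tiny grayscale frame. This is not how the AI upscaler works internally (it is a plain, non-AI stand-in), and the function name `upscale_2x` and the sample frame are made up for illustration. It does show the same blind spot, though: every new pixel is computed purely from its neighbours’ values, with no notion of what the edge belongs to.

```python
def upscale_2x(image):
    """Double the width and height of a grayscale image.

    `image` is a list of rows, each row a list of brightness values.
    New pixels are blended from the nearest source pixels (bilinear
    interpolation). Hypothetical helper, for illustration only.
    """
    src_h = len(image)
    src_w = len(image[0])
    result = []
    for y in range(src_h * 2):
        # Map the centre of each output pixel back to source coordinates,
        # clamping at the borders so we never read outside the image.
        sy = min(max((y + 0.5) / 2 - 0.5, 0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(src_w * 2):
            sx = min(max((x + 0.5) / 2 - 0.5, 0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend horizontally on the two nearest rows, then vertically.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        result.append(row)
    return result


# A tiny 2x2 "frame" with a hard vertical edge: dark left, bright right.
frame = [[0, 100],
         [0, 100]]
upscaled = upscale_2x(frame)

for row in upscaled:
    print(row)
```

The 2x2 frame becomes 4x4, and the hard edge turns into a ramp of in-between values. Plain interpolation softens edges like this; the AI tools try to guess a crisp edge instead, but whether the edge is a fender or somebody’s lip, the arithmetic has no way to tell.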

I’ve now done this for over one hun­dred Auto Col­li­sion and Refin­ish­ing (ACR) demo videos at VCC. I will like­ly be doing sim­i­lar upscal­ing for at least six­ty more videos to come. In this case, the ACR instruc­tor’s local­ly-made video con­tent is extreme­ly use­ful to their pro­gram. They also deter­mined that shoot­ing new HD ver­sions would take them too long, and it was also not pos­si­ble to find equiv­a­lent videos to license from oth­er ven­dors or pub­lish­ers.

Upscal­ing old videos this way isn’t an ide­al solu­tion, but it can be a use­ful com­pro­mise, giv­ing you bet­ter-qual­i­ty video rel­a­tive­ly quick­ly, and buy­ing your instruc­tors and con­tent devel­op­ers extra time to pre­pare for the inescapable day when they will need to record new demon­stra­tion videos.

Cleaning and Sharpening Audio Narration

AI-dri­ven audio enhance­ment can work along the same lines, remov­ing back­ground noise from an audio track to make human voic­es loud­er and more dis­tinct. AI helps to rec­og­nize pat­terns of sound that belong to a human nar­ra­tor so that they can be made loud­er and sharp­er, and so that back­ground noise can be de-empha­sized and squelched down. (Improv­ing the clar­i­ty of voice nar­ra­tion, whether by record­ing with bet­ter micro­phones or by using AI-dri­ven audio pro­cess­ing, can also result in a video’s closed cap­tions being more accu­rate.)

Adobe’s Podcast audio enhancement tool, Enhance Speech, is a free AI filter for cleaning up spoken audio. It provides very good audio clean-up quality. I’ve used it to create easier-to-hear audio from video recordings taken in loud conference rooms where everyone is talking at the same time, but we really want to hear what the presenter is saying.
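For a rough feel of what “squelching down” background noise means, here is a sketch of a much cruder, non-AI technique called a noise gate. It is not what Enhance Speech does; it simply turns down any short stretch of audio that is quieter than a threshold. The function name `noise_gate`, the parameter values, and the sample numbers are all assumptions chosen for illustration.

```python
import math


def noise_gate(samples, window=4, threshold=0.1, reduction=0.1):
    """Turn down quiet stretches of audio.

    `samples` is a list of amplitudes between -1.0 and 1.0. The audio is
    processed in fixed-size windows; any window whose RMS loudness is below
    `threshold` is multiplied by `reduction`. Hypothetical helper.
    """
    cleaned = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        # Root-mean-square is a simple measure of how loud the chunk is.
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        gain = 1.0 if rms >= threshold else reduction
        cleaned.extend(s * gain for s in chunk)
    return cleaned


# Made-up data: a quiet hiss between words, followed by a louder voice.
quiet_hiss = [0.02, -0.01, 0.02, -0.02]
speech = [0.5, -0.4, 0.6, -0.5]
cleaned = noise_gate(quiet_hiss + speech)
print(cleaned)
```

A gate like this only helps in the pauses, and only when the voice is clearly louder than the noise. In a loud conference room where the noise overlaps the presenter’s voice, it would do very little, which is exactly where a tool that recognizes the pattern of a human voice earns its keep.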

Text to Speech: Using Synthetic Narrators

Videos often benefit from having spoken narration, but circumstances sometimes make it too difficult or time-consuming to record a live voice track.

To educate users about using VCC’s WeBWorK math homework platform, I had previously made ten tutorial videos. I originally created them as “silent movies” without any spoken narration, in order to reduce my production time. However, months later, I decided that those videos really needed spoken narration to help fill in the gaps in the narrative and to “humanize” the presentation. The problem was that I was super busy, my office space was a busy place, and it would be inconvenient to book a quiet room to record audio for all ten videos.

I ended up trying out a synthetic text-to-speech tool inside the video editor. (This tool may be bundled with Windows 10 or 11.) I used a free account in the web edition, and signed in using my email address and a custom password.

I picked a male voice named “Ste­fan”, which had good inflec­tion and sound­ed nat­ur­al enough for the video con­tent. I’ll let Ste­fan speak for him­self:

A sample of the “Stefan” AI narrator voice.

I admit that to my ear, Stefan sounds a little bit like a disaffected nephew of Werner Herzog, but the enunciation and inflection of the voice are very good, and the overall audio quality is absolutely pristine. There is no background noise of any kind between his words because the voice is synthetically generated. Silence is silence. I think it would be almost impossible to get a live microphone recording to have zero sound between the speaker’s words, unless you had a sound-proof booth and a really nice microphone (neither of which we have here).

In anoth­er use of this fea­ture, I cre­at­ed spo­ken nar­ra­tion for a fel­low staff mem­ber who was ner­vous about using her own voice for her video nar­ra­tion.

I asked her to time the dura­tion of each slide of infor­ma­tion in her video, based on the speed at which she could say it out loud. Then I asked her to send me her writ­ten text. Using her text and rough tim­ings, I picked a female voice to recite her words and added the result­ing audio track to her video. She got a pro­fes­sion­al-sound­ing video nar­ra­tion with no stress­ful or embar­rass­ing record­ing ses­sion, and she loved the final results.
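If you want to sanity-check timings like these before generating any audio, a few lines of code can estimate whether each slide’s text will fit in its time slot. This is only a sketch under assumptions: the 150 words-per-minute speaking rate is a ballpark figure (the synthetic voice’s real pace is something you would have to measure), and the function names and sample slides are invented for illustration.

```python
def estimate_seconds(text, words_per_minute=150):
    """Estimate how long `text` takes to speak.

    The default rate of 150 words per minute is an assumed, typical
    narration pace, not a measured value for any particular voice.
    """
    word_count = len(text.split())
    return word_count * 60 / words_per_minute


def check_fit(slides, words_per_minute=150):
    """Compare estimated narration length with each slide's duration.

    `slides` is a list of (text, seconds_on_screen) pairs. Returns a list
    of (slide_number, estimated_seconds, seconds_on_screen, fits) tuples.
    """
    report = []
    for number, (text, seconds_on_screen) in enumerate(slides, start=1):
        estimated = estimate_seconds(text, words_per_minute)
        report.append((number, estimated, seconds_on_screen,
                       estimated <= seconds_on_screen))
    return report


# Made-up example slides: (narration text, seconds the slide is on screen).
slides = [
    ("Welcome to the course.", 3),
    ("Open the homework set, read each question carefully, type your "
     "answer into the box, and then press the submit button to check "
     "your work right away.", 8),
]

for number, estimated, allotted, fits in check_fit(slides):
    status = "fits" if fits else "too long"
    print(f"Slide {number}: about {estimated:.1f}s of speech "
          f"for a {allotted}s slide ({status})")
```

Slides flagged as too long need either a longer on-screen duration or a trimmed script. It is much easier to catch that at the text stage than after the narration track has been generated and laid into the video.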

John Love

