From Messy Survey Data to Ready‑to‑Analyze Datasets: A Step‑by‑Step Guide to Data Cleaning and the Benefits of the K‑Square Platform

Name: Data Cleaning and Manipulation
Uploaded: 2026-02-19T07:46:38.913426+00:00
Channel: Chisquares
Description: Summary and key takeaways on From Messy Survey Data to Ready‑to‑Analyze Datasets: A Step‑by‑Step Guide to Data Cleaning and the Benefits of the K‑Square
YouTube video ID: 8ITJ_bNba7o
Source: YouTube video by Chisquares
Why Raw Survey Data Often Needs a Deep Clean

Multiple‑response items in one cell – Google Forms stores several answers as a comma‑separated string, breaking the rule that each cell holds only one item.
Contradictory answers – Respondents may claim both "used tobacco" and "did not use tobacco" in the same period, creating misclassification bias.
Mixed data types – Numeric values (e.g., "21 years") are stored as strings, causing Stata to treat the whole column as text.
Variable names are full sentences – Columns labelled with long questions cannot be used directly for analysis.
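Missing‑value ambiguity – As described in the video, the form had no skip logic, so every participant was asked every question, including ones that did not apply to them; ineligible respondents then typed entries such as "Not applicable" that have to be recoded as missing.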
Core Principles for a Clean Dataset

One observation per row, one variable per column – Split multi‑response fields into separate binary columns (e.g., cigarettes_yes, hookah_yes).
Consistent, concise variable names –
Start with a letter, no spaces or special characters (underscore is allowed).
Keep names to roughly 15‑20 characters at most.
Use intuitive abbreviations (age, tobacco_hist, cigs_smoked).
Separate quantity from unit – Store the number of cigarettes in one column and the unit (packs, sticks, puffs) in another.
Document every transformation – Create a codebook that records original values, cleaning actions, and rationale. This ensures reproducibility.
Handle missing data deliberately – Replace "NA", "never smoked", etc., with a numeric code (e.g., 0) only when the respondent is truly eligible; otherwise mark as missing (. in Stata).
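The last principle, zero versus missing, can be sketched in a few lines of Python. This is a minimal illustration, not the video's method: the function name `recode_count`, the `eligible` flag, and the list of "zero" phrases are assumptions made for the example, and `None` stands in for Stata's `.`.

```python
from typing import Optional

# Hypothetical free-text entries that mean "zero" for an eligible respondent.
ZERO_WORDS = {"na", "n/a", "nil", "none", "never smoked"}

def recode_count(raw: str, eligible: bool) -> Optional[int]:
    """Return a numeric count, or None (Stata's '.') when the value is missing."""
    if not eligible:
        return None              # question did not apply: missing, not zero
    text = raw.strip().lower()
    if text in ZERO_WORDS:
        return 0                 # eligible respondent who reported nothing
    if text.isdigit():
        return int(text)
    return None                  # unparseable: leave missing and review by hand
```

Keeping eligibility as an explicit input means the same entry ("NA") is coded 0 for an eligible respondent and missing for an ineligible one, which is the distinction the principle asks for.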
Practical Cleaning Workflow in Excel

Download as CSV – CSV avoids hidden Excel formatting that can corrupt data.
Insert a header row and rename columns according to the naming rules.
Use Find‑Replace to strip textual units ("years", "packs") from numeric columns.
Create new columns for each tobacco product using formulas like =IF(ISNUMBER(SEARCH("cigarette",$F2)),1,0) and copy down; the $ keeps the formula pointed at source column F when it is pasted into other columns.
Convert strings to numbers by removing non‑numeric characters and applying VALUE().
Add a "unit" column for variables that need a measurement descriptor.
Save a master copy before any manipulation; keep a change log for reproducibility.
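The multi-response split in the workflow above can also be written outside Excel. The sketch below is a Python analogue, not the formula from the video; the function name and the option labels are illustrative assumptions. It compares whole comma-separated answers rather than substrings, because a substring search for "cigarette" would also match "electronic cigarettes".

```python
def split_multi_response(cell: str, options: list[str]) -> dict[str, int]:
    """Turn one comma-separated answer into a 0/1 indicator per option."""
    # Normalise each reported answer so matching ignores case and spacing.
    chosen = {part.strip().lower() for part in cell.split(",")}
    return {opt: int(opt.lower() in chosen) for opt in options}

# Example with assumed option labels:
row = split_multi_response(
    "Hookah, Electronic cigarettes",
    ["cigarettes", "electronic cigarettes", "hookah"],
)
# row == {"cigarettes": 0, "electronic cigarettes": 1, "hookah": 1}
```

This assumes no option label itself contains a comma; if one does, the split needs a different delimiter rule.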
From Excel to Stata

Copy the cleaned CSV into Stata’s Data Editor.
Tell Stata that the first row contains variable names.
Verify that numeric columns appear in black, strings in red, and factors in blue.
Run descriptive checks (describe, summarize) to confirm that each column now follows the one‑item‑per‑column rule.
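Before importing, it can help to list the entries that would still force a column to be read as string. A minimal Python sketch, assuming the column has already been read into a list of text values (the function name is hypothetical):

```python
def non_numeric_entries(values: list[str]) -> list[str]:
    """Return the entries that would force a column to be read as string."""
    bad = []
    for v in values:
        text = v.strip()
        if text in ("", "."):    # blank or Stata-style missing is acceptable
            continue
        try:
            float(text)
        except ValueError:
            bad.append(v)        # leftover text such as "21 years"
    return bad
```

An empty result suggests the column should import as numeric; anything returned is a cell to clean first.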
Introducing the K‑Square (Kai Quest) Platform

One‑click questionnaire import – Upload a Word‑formatted questionnaire (Q/A tags with @@ delimiters) and the platform auto‑creates all variables.
Built‑in skip logic & validation – Prevent non‑eligible participants from seeing irrelevant questions, eliminating the need for post‑hoc cleaning of impossible answers.
Automatic codebook generation – Every variable receives a metadata entry (label, coding scheme, unit) that updates in real time.
Real‑time data quality scoring – The system flags duplicate entries, out‑of‑range values, and incomplete responses as they occur.
Export clean or raw datasets – Choose the version you need; the clean set already respects numeric/string separation and proper coding.
Scalable and secure – No limits on respondents or collection period; unique IDs preserve anonymity while allowing longitudinal tracking.
Support for consent forms, multilingual surveys, and custom taxonomy – Add a consent page, translate questions, or define bespoke labels (e.g., custom doctor specialties) directly in the questionnaire.
Frequently Asked Questions Highlighted

Can I run the cleaned data in SPSS or R? – Yes; once the dataset follows the standard naming and coding conventions, any statistical package can read it.
What if I need to collect qualitative interview guides? – Use the ? tag for open‑ended questions; the platform treats them as free‑text fields.
How do I handle large surveys with many sections? – Insert section header prompts to group items; the platform keeps the order intact and warns you if you rearrange questions after adding logic.
Is institutional licensing available? – Universities (public or private) can obtain a verification code from the K‑Square team; the license covers all faculty, staff, and students for a year.
How are missing values represented? – The platform distinguishes between skipped (not eligible), partial (started but not finished), and unknown responses, making downstream analysis clearer.
Quick Tips for Future Projects

Design the survey with cleaning in mind – Use single‑choice questions where possible, enforce numeric limits, and avoid free‑text where a coded answer will suffice.
Preview before launch – The platform’s preview mode shows validation messages (e.g., minimum word count) and lets you test skip patterns.
Leverage the question bank – For health, social science, or humanities topics, start from a curated library of validated items and adapt them to your study.
Document everything – Even when using K‑Square, keep a brief log of any manual edits you make after export.
Bottom Line

Cleaning survey data is often the most time‑consuming part of research, but following a disciplined workflow—splitting multi‑responses, standardising variable names, separating quantities from units, and documenting every step—turns a chaotic spreadsheet into a reliable analytical dataset. The K‑Square platform automates many of these chores, from questionnaire creation to codebook generation, allowing researchers to focus on the science rather than the minutiae of data wrangling.
Effective data cleaning transforms raw, error‑prone survey responses into a trustworthy dataset; by applying clear naming conventions, separating values from units, and documenting every change, researchers can avoid misclassification bias and spend more time on analysis—especially when a tool like K‑Square handles the repetitive cleaning tasks automatically.
Full Transcript

... we now have to clean the data so that it aligns in a way we can use. So that's issue number two.

Issue number three: "Which of the following tobacco products have you used in the past 30 days?" Remember, one of our fundamental rules was that you can only have one item in a row; you must only have one item in a row before you can analyze data. But here you see this person has one, two, three. The way the data are collected in Google Forms is that if you report multiple items on a multiple-response question, they are all collected in the same row as comma-separated items. So now you have to start cleaning. This is another reason why you have to clean the data, because now the data violate our basic principle that in a row there can only be one item. You have many items here, and you even have inconsistent items. This person says, "I smoked hookah, electronic cigarettes, some tobacco product," but they also said, "I did not use any form of tobacco product in the past 30 days." So the question is, which is which? Did they use tobacco products or did they not? As part of data cleaning you have to figure out how you want to categorize people like this. Do you want to classify them as tobacco users, or as tobacco non-users, or how exactly do you want to deal with them? So that's the problem we have there: two problems.

Again, we have this individual too. This one says, "I use nasal tobacco," and still they also told us, "I did not use any form of tobacco in the past 30 days." That is a big problem for us because it calls into question the validity of the data and raises issues of misclassification.

Remember, in science we talk about three things: truth, chance, or bias. The whole research world revolves around those three things. When we come to bias, one of the key types we talk about is measurement bias. Measurement bias simply asks: were the data collected in a correct manner? Are the data collected accurate? And one of the major sources of measurement bias is misclassification.
And that's the problem we have here: this person says "I used nasal tobacco" and "I did not use any form of tobacco" at the same time. Both things cannot be true; either you used tobacco or you did not. So the issue is that if we classify them as a tobacco user but in reality they don't use tobacco, we have a problem on our hands, because we have an issue of misclassification. At the same time, if we classify them as a non-tobacco user but they actually use tobacco products, we still have another problem on our hands.
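Contradictory rows like these can at least be flagged automatically, so that the classification decision is made once and documented instead of row by row. A minimal Python sketch; the exact wording of the "none" option is an assumption and would need to match the real questionnaire:

```python
# Assumed start of the "none" option's wording; adjust to the actual survey text.
NONE_OPTION = "i did not use any form of tobacco product"

def is_contradictory(cell: str) -> bool:
    """True when a response names a product AND selects the 'none' option."""
    parts = [p.strip().lower() for p in cell.split(",") if p.strip()]
    says_none = any(p.startswith(NONE_OPTION) for p in parts)
    names_product = any(not p.startswith(NONE_OPTION) for p in parts)
    return says_none and names_product
```

The function only flags the row. Whether such a respondent is counted as a user, a non-user, or excluded is still a judgment the analyst has to make and record in the codebook.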
Coming here, we also have the same issue with the question about campaigns on tobacco prevention that you had watched in the past. Again we have multiple items being captured in the same row, which violates the basic tenet of data analysis: you can only have one item in a row. This person's answers are also contradictory, since they include "I'm not aware of tobacco campaigns." So the bottom line is that you cannot have two items in a row, and in a column you can only have one variable.

Then lastly, in this column we have that same problem of a mix of numeric values and strings. That means that when we import this data into Stata we're going to have problems: the column will be classified as string.

Now let us look at another problem. If you look at the first row, which holds the variable names, you notice that these are entire sentences. We can't analyze data that way; it just will not work. So we have to assign variable names to each of these columns, because right now what we have are not variable names, they are entire sentences. For any data analysis to be done we need meaningful variable names.

So what we have to do now with this data set is clean it. How do we do that? First of all we have to download the data set. Let me move this away, and I'm going to select CSV. If you have the option of using CSV, always, always, always select that. CSV is a very stable format. It looks like an Excel spreadsheet, but it's more commonly used and more stable. Excel tends to come with many formatting elements which might throw some errors, but CSV is like Excel without that extra drama. That's how you can think of it simply. So let's download as a CSV file.
So that is the file here. Let's show it in the folder and open it as an Excel file. This is our data set.

One of the things I would really encourage you to do is to learn how to use Excel. For people who want to do research, Excel is a very handy tool. It can do some analysis, obviously, but even if you're not using it primarily for your analysis, you will need Excel in your life, so figure it out. Excel has some amazing formulas, and that could be a pet project for you.

A question: "Is it possible to be a smoker and not smoke in the past 30 days?" I'm going to answer some of these questions at the end of the day, so let me just park them for now.

To see all the items in Excel in an expanded view, click on that first node at the top here and then double-click on any of those lines. That will expand everything to full view.

What we have to do now is, one, assign variable names. To do that we're going to create an extra row up here: we click on that first row and click Insert, which inserts a row above what we're working with. All of these things we're doing are what we call data cleaning. The name is commonly used, but practically this is what it means. Let's highlight that row so that it's clear what we're working with.
Timestamp: let's just call it timestamp, as the name implies. Now, there are some rules for assigning variable names. You shouldn't make them too long; ideally I would recommend keeping a name to about 15 characters. A variable name cannot start with a number. Stata does not accept that; it has to start with a letter. And in general, try to give variable names that are intuitive, that make sense to you. So if the variable is "How old were you last December?", I'm going to call it, for example, age, or how old. That way it's intuitive: when I see it, I know what it means. The next variable is regarding tobacco use, "Which of the following applies to you?"; I'm going to call this tobacco history. Then "In total, how many cigarettes have you smoked in your entire life?"; I'm going to call it cigs smoked. And "Which of the following tobacco products have you used in the last 30 days?"; I'm going to call it cigs 30 days.

So variable names can have numbers inside them; the numbers just cannot be at the front. You can have a number at the end or in the middle, but you cannot start a variable name with a number. There also cannot be spaces in variable names, and you cannot have special characters either.

Let's summarize the rules for variable names. Based on Stata's rules, they have to start with a letter. They should be relatively short; again, my recommendation is to keep them to 15 to 20 characters, no more than that. You don't want the name to become an encyclopedia. It should be easy to remember and intuitive. You can have numbers in the middle of the variable name or at the end, but numbers cannot start it. The only special character that Stata allows is an underscore, so you cannot use special symbols like the dollar sign or the ampersand in a variable name. Keep it simple, and keep it understandable in a way that makes sense to you.

Then let's call this one TV campaigns, and let's call this one legal age.
age so now this is the first thing you
have to do when you're analyzing data
when you're cleaning your data from and
again if you col data on on Survey
Monkey or on qualtrix or on most other
platforms this is this this is the same
procedure you're going to have to do the
only platform that spares you all this
Spain is the kpr platform and we'll come
to that later now you are going to so so
you don't forget what this means it's I
advise you to immediately copy this and
start creating your code book right
right do when you're working so I'm
going to come here and start creating my
codebook so let me
transpose
and so I have this
here by the way to transpose you just
copy whatever you did I just copy that
and then when you come there you press
alt control
V alt control V will open this panel and
then you can paste as transpose the
reason why I want to transpose it is
that I want it to switch so that the
rows will become columns and the columns
will become rows so it's easier for me
to look at the or the new
variables and then the
old
variables so that when I come 10 years
from now I'm like H what exactly is TV
campaign what what was that measuring I
can look back here and say okay this is
what it was measuring
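The same codebook layout (one row per variable, new name next to the original question) can be produced as a CSV file instead of by transposing in Excel. A minimal Python sketch; the column headers and the function name are assumptions for illustration:

```python
import csv
import io

def build_codebook(renames: dict[str, str]) -> str:
    """Return CSV text with one row per variable: new name, original question."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["variable", "original_question", "cleaning_notes"])
    for question, name in renames.items():
        # Notes start empty and are filled in as each cleaning step is done.
        writer.writerow([name, question, ""])
    return buf.getvalue()
```

The empty `cleaning_notes` column is where each later transformation would be recorded.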
Any quick questions before we proceed? Is anyone confused? Okay, no questions. All right, that's great. So that's the first part of what we're going to do. Next we are going to go ahead and delete this yellow row, because we don't need it anymore. I see some hands are raised. Oh, repeat it? Okay, on transpose: you copy the rows you want. In this case I'm interested in these two rows, so I copy them, Ctrl+C to copy. Then when I go to wherever I want to paste, I press Alt, Ctrl, and V together at the same time. Alt, Ctrl, and V, the letter V, exactly. Then I select Transpose, and that lays it out nicely for me like this. Are we all happy? Any questions on that? We're good.
All right, now let's go back to cleaning our data. Like I said, Stata is awesome for data analysis; Stata is not good for data cleaning. Remember, one of our rules was: use the simplest tool you have to get the job done. That's why we're doing it in Excel even though you have Stata, because the topic for this session today is data cleaning. So I hope we're all on the same page as to why we're doing what we're doing.

Now let's go back to our data. We can go ahead and delete that yellow row, and the first row now holds our variable names. We said that a column can hold only numbers or only letters. That means we have to find a way of removing the "years" here in this column; we have to remove every single thing that is not the number. There are many ways to do that; let's go with the easiest. You select that column, and we find wherever we have "years" and replace it with nothing. Make sure you select only the column you're interested in, not all the other columns, because you might have "years" somewhere else and you don't want to replace those. When you have both "years" and "year", make sure you type the plural first, so that it removes the complete word and you're not left with stray letters that you don't understand. So I'm going to type "years" and replace it with nothing. Then I'm going to change that to uppercase letters and try replacing again; there's nothing there. Now I can change it to lowercase, singular "year", and replace it with nothing.

So let's check. Oh, there are still so many of them here. You see why this is so manual? This is why I told you earlier that if you're doing your research, this is not the kind of tool you want to use, because you're dealing with a lot of manual processes, and once you're doing anything manually, that automatically raises the risk of error. That's the problem with doing anything manually.

One of the things you want to know about Excel is that numbers are aligned to the right and strings are aligned to the left. That's why you have numbers on the right, and whatever is on this side is a string. So I can double-click on that one, remove those three dots, and now everything left here is string.

Also, keep notes of what you're doing. You can say: "Variable age: replaced all occurrences of 'years' and 'year' with nothing to transform the column from string to numeric." This is called documentation. And before you mess with your original data set, make sure you have saved a master copy somewhere safe, so that if you make any mistake you can go back to it. The whole point of science is reproducibility: if you gave me that original data set and I went through your documentation carefully, I should be able to come to the same result. That's why you want to make sure you keep good notes, and this is what eventually forms your codebook. I hope we're all on the same page. All right.
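The "plural first" advice generalizes to "longest word first". A small Python sketch of the same Find and Replace step, with the function name as an assumption; it ignores case, so separate passes for uppercase and lowercase are not needed:

```python
import re

def strip_unit_words(value: str, words: list[str]) -> str:
    """Remove unit words (case-insensitive), longest first, then trim spaces."""
    for word in sorted(words, key=len, reverse=True):
        value = re.sub(re.escape(word), "", value, flags=re.IGNORECASE)
    return value.strip()
```

Sorting by length is what protects against the stray-letter problem: replacing "year" before "years" in "21 years" would leave "21 s", whereas `strip_unit_words("21 years", ["year", "years"])` leaves "21".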
So we have taken care of this. Now we want to make sure that all of these things, like "nil" and "none", are replaced with zero. One easy way to do that is to select this column, come to this Sort & Filter icon, and choose Filter. We're going to filter for the entries that are problematic for us. We have to do this manually, one by one, so let's start with "NA", "never smoked", "nil"; we're going to select all of them and just replace them with zero. So we'll copy that and replace all of them with zero. All right.
So when we come back to our column, we still have string entries. Like I said, remember that in Excel the things on the right are numbers and the things on the left are strings. Remember we talked about string, numeric, and factor. This variable was capturing how many cigarettes; "how many" is a number, and we're interested in the number. That's why we have to convert everything here from string to number. It was supposed to be captured as numeric; it is just the way we collected the data that made it appear as though it was string. So what would be helpful here is to have another column that we'll call unit of measurement, or let's just call it unit. And we'll come to our codebook and say: "cigs smoked, action taken: we created a new variable called unit, which describes the unit of measurement for the tobacco product usage reported." The possible values of unit: it could be puffs, it could be sticks, it could be packs, or it could be cartons.
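Splitting an entry such as "23 packs" into a quantity and a unit can be sketched as follows. This is an illustrative Python version of the step just described, not the procedure used in the video; the function name and the singular/plural unit table are assumptions built from the four units named above.

```python
import re
from typing import Optional

# Map singular and plural spellings to one canonical unit label.
UNITS = {
    "puff": "puffs", "puffs": "puffs",
    "stick": "sticks", "sticks": "sticks",
    "pack": "packs", "packs": "packs",
    "carton": "cartons", "cartons": "cartons",
}

def split_quantity_unit(raw: str) -> tuple[Optional[int], Optional[str]]:
    """Split '23 packs' into (23, 'packs'); unknown parts come back as None."""
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", raw)
    if m is None:
        return None, None                    # e.g. "not applicable"
    unit = UNITS.get(m.group(2).lower())     # None if absent or unrecognised
    return int(m.group(1)), unit
```

When no unit was reported, the function returns `None` for the unit instead of guessing one.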
For individuals who reported zero, it doesn't really matter: zero packs, zero cartons, it doesn't matter, so we can call this zero packs. Here, this individual reported packs, so we're going to change that; I'll call it packs. These are all numbers. 23 packs. Remember, we said one column can only contain one piece of information. Here we have two: the number of cigarettes and a unit of measurement. That cannot work. You can only have one thing: you have the quantity, and then the unit has to be in a separate column. That's one of the rules we established, so we have to comply with it. We have 100 packs.

Let's see where else we have problems. "Not applicable." Now this is another problem we have. The way the data were collected, even people who are not smokers were still asked this question, so people are forced to say, "Okay, I don't even smoke, so what happens to me?" Ideally this individual should not have been there. These are missing values, but they are missing because the person is not even eligible for that question; they should have been skipped. But Google Forms does not have a way for you to skip people who are not eligible. That means everybody is asked every question, even the questions that do not concern them, and that gives rise to problems like this, where we have to start manually assigning missing values. A missing value in Stata is a dot. For people who are not eligible, we can't assign them zero per se, because they are not eligible; they should be missing. So I'm going to put this one as missing. This is one puff. Some missing values. This is one puff again, and this is sticks: five sticks. And for all the other ones here that are missing a unit, we can just put them as sticks, because zero could be zero sticks; it doesn't really matter. So let's just fill those in so that every unit has a value.

Oh, this one did not report a unit: 689. Is it 689 packs or 689 sticks? We don't know; he just said 689, which is also a problem we have.
No, Google Forms does not have skip-pattern logic, so that gives us a problem there. In this case you now have to start figuring out: is it sticks? You are having to insert your own information and ask, "Is it possible that the person could have smoked 689 packs in the past 30 days?" But now you are adding information that was not there; you're making assumptions. And one of the fundamental rules of analysis (I should add a new rule here) is: do not make any assumptions, because you are injecting your own bias into the data. So this value has several issues: one, it is an extreme value that may or may not be plausible, and two, we don't have a unit of measurement. That is one of the problems we have with it. But let's hurry up and go to the next variable.
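One way to honor the "no assumptions" rule is to flag problem values for review instead of filling them in. The Python sketch below is illustrative only: the function name is hypothetical, and the plausibility threshold is a placeholder the analyst would have to justify and document, not a value from the session.

```python
from typing import Optional

def review_flags(quantity: Optional[int], unit: Optional[str],
                 max_plausible: int) -> list[str]:
    """List reasons a value needs manual review; never alters the value."""
    flags = []
    if quantity is None:
        return flags                         # already missing, nothing to flag
    if quantity > 0 and unit is None:
        flags.append("unit not reported")
    if quantity > max_plausible:
        flags.append("extreme value")
    return flags
```

With an assumed threshold of 500, the "689" entry with no unit would carry both flags, and the decision about what to do with it stays visible in the documentation.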
Oops. Another problem is that with things like copy and paste, you can easily paste something into the wrong field. I could easily have pasted this here, and that would have been a problem. Those are some of the things you want to watch out for.

Then here we have the type of tobacco product that the individual used in the past 30 days. Several types of tobacco products were assessed, from nasal tobacco to chewing tobacco and so on. Each of those must be created as a variable of its own, so you must create a new variable for each of those options.

One of the things you also have to learn when doing your research is how to get information by yourself. You have to learn how to learn. In this kind of case, what you need is to go to Google and figure out which formula to use. Google has to be your friend, and ChatGPT has to be your friend. "How to extract specific text from a cell": somebody has already asked that question, so you type that. You might never know everything that you need to use, so you have to figure out how to Google. You might enter a search and it may not work, but you try again, over and over. Okay, I can't seem to see anything I like there. The earlier a result appears in the list, the more likely it is to be what you need, so your best bet is to start with number one. "Get substring": that's not what I'm asking for. And if you can't find it, you have to find a way of refining your search. This is also part of learning how to learn. So: "if a cell contains". Okay, let's see that.
texts aha so this is what we're looking
for check if cell if out of a Cell
matches specific text right so this is
the process I needed to go over when
you're trying to figure out information
for yourself you will never have all the
information you
need on hand you at some point you will
have to go to the internet and figure
out things by yourself so that should
become a habit you you really you can
try chat GPT you can um whatever the
case whatever you're familiar or
comfortable with just do it but you have
to master the art of learning how to
learn and trying to figure out things by
yourself so here we had a lot of things
here but what we need is check if part
of a Cell matches specific
text so here we're saying um if it's
number um this okay okay so let's copy
this let's see if it's it can't be
copied all right let's just
um let's just let's just type the
formula like as it is so you'll have to
do that for every single
product on cigarettes and all of that so
you have to create a new column for each
of those products so insert so let's
start with um cigarettes or let's say
chewing tobacco since I saw chewing
tobacco there so the new variable name
will be chewing
tobacco because remember one of the
rules says you can only have one item in
a column here we have many items in that
same column so we have to extract each
item into its own column so the answers
will be yes it is there or no it's not
there or 0 versus one so chewing tobacco
the formula was equals to his
number
search uh the text was chewing
tobacco in this
field so that's
what so so that's the formula there so
you know we are saying is this is this
there or is it not there right um so
here is false so we apply
that to the whole
column so you have true or false right
um so we have to do that for all the
tobacco products this is true in tobacco
we have to do that for
cigarettes again um now if you doing
this one of the things I would suggest
is that you make the you make the f
column fixed the F column is where all
the information is coming from so to
make it fixed you just put a dollar sign
in front of the F so that that fix so
that all you're doing is copy this and
you're pasting it let's just sorry let
me explain what that's
happening so like I said you you guys
have to learn how to use Excel right
because it's Excel is fun so um so if I
put the dollar sign in front of f the
platform the the Excel knows that every
every time I copy and paste even for
cigarettes it has to go and look at
column F if I did not put dollar
sign if I remove a dollar sign here it's
going to let me remove it so you can see
what happens if I
don't it's it's going to search for the
next column not column F you see that
right so that's why you want to put the
dollar sign if you want to restrict
the
movement so so now if I copy it it's
always going to search for but now what
I'm searching for is
cigarettes and why are we doing this
Because any column can only contain one item. It cannot contain
cigarettes and pipes; it can only contain one item in the column. So
we have to do that for every single item that was assessed in that
survey: chewing tobacco, and pipes, and snus. So you can see how
data cleaning could take you weeks or months, because the
traditional way of doing it is extremely labor intensive and
extremely error prone. With the smallest thing you can make an
error, and you don't even realize you've made an error. And the same
thing: we have to do that for all of these variables. For TV
campaigns, we will now start creating one column for every single
entry, right?
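The Excel step above (one true/false column per product) can also be sketched in code. This is only a minimal Python illustration, not the method shown in the session: the product list and the sample cells are made up, and unlike Excel's SEARCH, which matches substrings, this sketch compares whole comma-separated entries.

```python
def split_multi_response(cell, options):
    """Turn one comma-separated cell into a 0/1 indicator per option."""
    # Normalize each selected entry so "Pipes " and "pipes" compare equal.
    selected = {part.strip().lower() for part in cell.split(",")}
    return {option: int(option.lower() in selected) for option in options}


# Hypothetical option list and raw cells, for illustration only.
PRODUCTS = ["cigarettes", "chewing tobacco", "pipes", "snus"]
raw_cells = ["Cigarettes, Pipes", "Chewing tobacco", ""]

indicators = [split_multi_response(cell, PRODUCTS) for cell in raw_cells]
for cell, row in zip(raw_cells, indicators):
    print(repr(cell), row)
```

Each resulting dictionary corresponds to one respondent, with one key per product, which is the "one item per column" layout the manual Excel formulas are building.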
But essentially, that is what is involved in data cleaning. So
again, let's go and review the rules that we mentioned earlier to
understand why we're doing those things, because we have to
understand the why. The why is because you can only have one item
per row. In our case we had things like "21 years" or "21 packs".
That cannot work; we need to have only one item per row. So we keep
the quantity in the first column, and then we can move the packs, or
the unit of measurement, to another
column. Only one type of variable per column: you can only have
numeric or factor or string, remembering that string is extremely
powerful, and whenever string comes in contact with anything, string
will make that thing string. So if string is in the same column with
numeric, obviously it will become string. You have to create a
codebook, which I've shown you how to do. You have to ensure there
are no inconsistent entries. That creates a problem for us, right,
for people who said, "In the past 30 days I never smoked a tobacco
product," and, "In the past 30 days I also smoked cigarettes."
That's a problem. There is no way to manage that that makes sense.
Either we classify them as a smoker or as a non-smoker, but we
always run the risk of misclassification. If we classify them as a
smoker when they're not a smoker, we have misclassified them, and
the same applies vice versa.
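Two of the rules just reviewed (separate the quantity from its unit, and look for inconsistent entries) can be sketched in a few lines of Python. This is an illustration under assumptions, not the workflow from the session: the regular expression only handles a number followed by a one-word unit, and the column name `never_tobacco` is hypothetical. Note that the second function only flags contradictory rows; deciding how to classify them remains a judgment call, as discussed above.

```python
import re

# Assumed input shape: a number, optional spaces, then an optional one-word unit.
_QUANTITY_UNIT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*$")


def split_quantity_unit(raw):
    """Split '21 years' into (21.0, 'years'); return (None, None) if unparseable."""
    match = _QUANTITY_UNIT.match(raw)
    if match is None:
        return None, None
    number, unit = match.groups()
    return float(number), (unit.lower() or None)


def is_contradictory(row, never_key="never_tobacco"):
    """Flag a row that claims 'never used' and also reports using a product."""
    used_any = any(value == 1 for key, value in row.items() if key != never_key)
    return row.get(never_key, 0) == 1 and used_any


print(split_quantity_unit("21 years"))
print(is_contradictory({"never_tobacco": 1, "cigarettes": 1}))
```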
Watch out for extreme values. Use the simplest tool possible: in
this case we're cleaning the data in Excel because Stata is not well
suited for cleaning data, so that's why we're doing this in Excel.
Do not make any assumptions. And prevention is better than cure.
Now, once you are done with all of your data, you can then move into
Stata, and there are many ways to do that. You can select the whole
data, copy it, and go to Stata (let me see where my Stata is). You
come here, click Data Editor, and paste with Ctrl+V. Stata will ask
you how you want it to treat the first row: should it treat it as
data, or as variable names? You can then select that yes, it's
variable names, and you paste it. So you can see that our data is
clean. This cigs smoked variable was a bunch of numbers and letters,
but now it's black. Remember we said Stata is color coded: black is
numbers, red is string, and blue is factor variables. Our age now is
also all numeric because we have cleaned it that way. Our cigs
smoked is also all numbers because we've cleaned it that way. All
the other ones are string, right? We have to find a way of managing
them. And
remember we said also that for this variable, cigs 30 days, we
needed to create a new variable for every single product. So we have
cigarettes, chewing tobacco, and the rest, and all of them will have
values of either zero or one, because as it stands this is a
completely useless column. We cannot analyze it this way; we have to
separate each of them into their respective columns. We also have to
do the exact same thing for TV campaigns, and then we also have to
clean the legal age. That is what is involved. So this is something
that could easily take you weeks, especially when you're working
with a data set with hundreds of variables. This could easily take
you months just to clean your data and get it into that shape. Now
I'm going to allow you some time to ask your questions before we
look at the easier way of doing this. So, any questions? Let me
answer any questions you might
have okay let's start with uh people
have raised their hands I'll start with
siso I'm allowing you to talk
siso
um
[Music]
um so if you guys want to
speak you can do
so uh SL you also had your hand
up
okay would the video be available yes
the video will be sent to everyone who
registered um just scrolling
up okay there's a lot of AI assistant
messages
here okay so can we have the the the
people with their hands up ask a
question so that we can move on we have
we're going to
have yeah thank I just want to ask
regarding the missing
of I be able to do
that sorry do you have headphones on we
can hear you very
well yes I
Okay, I'm asking: how do we handle the missing observations? Now,
one of the problems with using a platform like Google Forms is that
we don't know why a value is missing, and the kind of missing value
has a huge impact on how you analyze it. Not all missing values are
the same, because there are different types of missing values.
That's why I mentioned earlier that this kind of platform is great
for "I'm going to send you a survey; I want to have a party; let me
know whether you can come to the party or not." But if it's serious
research, you want to use something else, because these are some of
the problems that surround the use of tools like this. It's a great
platform and it has its uses, but when it comes to formal, rigorous
research there are a lot of gaps in the process. In the second part
of this session we're going to look at the easier way of doing all
of this, so that will answer your question. Thank you. Can one say
that creating separate columns for each variable is more of a coding
activity than cleaning? No, it's not coding. Cleaning data means
getting the data into a form in which you can analyze it. You can't
analyze this data the way it is now. So cleaning means all of the
tasks you have to do just to get the data ready in a form in which
it can be analyzed. Inasmuch as we can't analyze this column the way
it is now, this is also part of data cleaning. I hope that makes
sense.
Hello. Go ahead, please. Yes, I'm Mr. S. Kasim from Nigeria. First
and foremost, I want to appreciate you for this wonderful knowledge
sharing, sir. My question: we see that there is a string variable in
Stata; can such qualitative data be run with SPSS? Thank you, sir.
Yeah, thank you very much. The issue is not with Stata or SPSS per
se; the issue is how we collected the data, so we want to separate
those issues. Stata is an awesome tool, it's powerful and all of
that, but Stata is not built for cleaning data. So again, remember
the last rule we had: prevention is better than cure. You have to
figure out a way to collect your data that minimizes all of these
problems. Then you can analyze your data in Stata or SPSS or
whichever tool you want, but first you must collect the data in a
way that makes sense and is less prone to error. So for now that's
what we're focusing on: how do you collect data in a way that is
elegant and not prone to error? After that, the choice of software
doesn't matter. Stata, SPSS, Epi Info, R, Python: it does not
matter, they will all give you the exact same result. All right,
any other
question? Okay, thank you very much. My name is Joy Oba. I'm
actually doing my thesis, and I used Google Docs, and it has taken
me like three months now trying to clean my data, because I had
started before I joined the class for K-Square. I now know that
K-Square is a good tool for collecting data. So I'm asking: in place
of Google Docs and K-Square, is there any other tool that you can
recommend that can do a good job as well? So, all the tools will
collect your data for you accurately. The issue, however, is the
suitability of those tools for research and how much work you have
to do afterwards. With K-Square we just built in a lot of user
convenience and accuracy. Unlike other tools, K-Square is built by
researchers who know the pain of research, so everything we do on
the platform is geared towards research. It's not like other
platforms that were built by some guy who finished college or high
school and said, "Hey, I know how to code, I'm going to write a
platform." K-Square is built by researchers for researchers, so the
way we address any problem is of course quite different from the way
it's addressed on any other platform, just because our focus is
purely on research. So unfortunately, even if you collect your data
with any other survey platform, you will still have to go through
this exact same thing we've described, because they are not
necessarily designed by researchers per se. Again, that doesn't
negate the fact that the data you're collecting is accurate. It's
just that you are going to have to sweat to get the result you
need.
Yeah. All right, let's take one more question, and then let's see
how we can do the same research on the K-Square platform. So, one
last question. Can I ask one that is here? "Looking into individual
rows and columns, is there a better, faster way to do it if the data
is large, or is the method the same and one has to do it anyway?"
It's the same. With data analysis, it doesn't matter whether you
have two rows or two million rows; it's the exact same process.
That's why you also want to make sure that (again, rule number 10,
prevention is better than cure) you're setting yourself up for
success by using tools that are well suited for that. All right,
let's now switch to the second part of our session and see how we
can apply the principle of prevention is better than cure. So
let's see. One of the first problems we had, if you noticed, was
that we created our questionnaire in Microsoft Word, but we then had
to manually type all these questions again in Google Forms. The
K-Square platform allows you to upload your Word document directly.
But before you do that, there are some formatting elements you have
to include in your survey, and this is simply Q and A. Everything in
front of a question must have @@Q, and everything in front of an
answer must have @@A. So it's question and answer, Q or A, that's
all. And everything ends with three hashtags. So if you had your
survey (and this is why you have to know how to use Excel), there
are two ways to do this. You can do this manually, or you can just
copy your entire survey, take it to Excel, and paste it there so you
don't have to do this manually. So again, here I can just put @@A
for everything now, and drag all of this. So again, the marker goes
in front, and at the end it's three hashtags, and I can just combine
all of them together with a basic Excel formula. All right, and then
I just apply it to everything, grab that, and take it back to
Microsoft Word, and then delete the things I don't need.
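The same tagging can be sketched outside Excel. The Python snippet below is only an illustration: it assumes the markers are written exactly as `@@Q`, `@@A` and `###` and that a space separates them from the text, which should be checked against the platform's own import instructions. The function name and sample items are hypothetical.

```python
def tag_questionnaire(items):
    """Prefix each line with an assumed marker and end it with three hashtags."""
    # Assumed marker spelling; confirm against the platform's import guide.
    prefixes = {"question": "@@Q", "answer": "@@A"}
    return [f"{prefixes[kind]} {text.strip()} ###" for kind, text in items]


# Hypothetical questionnaire fragment.
items = [
    ("question", "How old were you last December?"),
    ("question", "Which of the following applies to you?"),
    ("answer", "I have never used tobacco"),
]
tagged = tag_questionnaire(items)
print("\n".join(tagged))
```

The output can then be pasted back into the Word document, which mirrors the copy to Excel, combine, and copy back routine described above.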
So remember: T for title, so you put T; Q for question; and the
answer is @@A. If you don't want to start copying and pasting this
stuff into the platform, you can just format it this way so that the
platform will automatically read it. For a question just put Q, and
A for an answer; Q for question, A for answers, and that's all. Once
you've formatted your questions this way, that completely eliminates
the need for you to manually copy and paste every single question.
So now let's go and import that item. Let me log into my account.
Let me first of all set up the survey.
So I'll set up the survey. The title is "Survey of tobacco among
adults"; let me just say "Nigerian adults", just for flavor. For the
country, let me select Nigeria. For the aims, I can write the aims
by myself or with AI, so let me just quickly generate some AI aims.
All right, let me just select any two or three. And for keywords,
let me generate a few of them. This is a very lazy AI; it only gave
me two keywords. All right, great. New project. I don't want
co-authors for now, so let me skip that. So I have my project set
up. For the questions, I'm just going to come here and say "Import
from Word", since I have my questionnaire already created the way I
designed it. So I upload that, and that's it: the questions are
already uploaded. Since there's a possibility for you to make
mistakes, they are flagged, so that forces you to go back and check.
Also, if you want to change the variable names, you can do that this
way. So I come
here and edit. "How old were you last December?" Let's call this
age, and let's change the settings. Let's allow only numbers, let's
say the minimum age you can be is zero, and let's provide units of
measurement: let's say you could give your age in months or in
years. Let's apply. Once we're done, let's mark that as resolved and
update that question. The second question was regarding tobacco use:
"Which of the following applies to you?" Let's call that tobacco
history, just to keep the names consistent. We mark that as resolved
and update the question. "In total, how many cigarettes have you
smoked in your entire life?" The platform has already separated out
the units of measurement. We call that, I think, cigs 30 days, just
keeping it consistent; I think that's what we called it. We want to
allow only numbers (again, prevention is better than cure). Let's
apply, update the question, and mark it as resolved. And then,
"Which of the following tobacco products have you used in the last
30 days?" Let's edit that
question. These are the different variable names. "Some other
tobacco product" is a special option, that is, "other", so let's
create an other option. Let's just copy this text, because that's
the magic button: "Some other tobacco product not listed here."
Let's call that some other, and we can delete the original one. And
then "I did not use any form of tobacco" is also special; that's an
exclusive button. If you said you did not use tobacco, we do not
expect you to select any of the other options. So let's use this
special exclusive button, and let's copy this and paste it here, and
we can call this never tobacco, so that if you select this, the
platform will not allow you to select any of those other ones.
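The exclusive-option rule can be expressed as a small check. The Python sketch below is not the platform's implementation; it only illustrates the rule, and the option names are hypothetical. The same check can be run on data collected elsewhere to find rows that break the rule.

```python
def violates_exclusive(selected, exclusive_options):
    """True if an exclusive option was chosen together with any other option."""
    chosen = set(selected)
    has_exclusive = bool(chosen & set(exclusive_options))
    return has_exclusive and len(chosen) > 1


# Hypothetical option names for the 30-day tobacco product question.
EXCLUSIVE = {"never_tobacco"}

print(violates_exclusive(["never_tobacco"], EXCLUSIVE))
print(violates_exclusive(["never_tobacco", "cigarettes"], EXCLUSIVE))
```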
Right. And so let's mark that as resolved and update that question.
The next one is a multiple response question; let's edit it. For
"Other campaign", let's select the other button, so that if you
select it, it will allow you to type whatever you want, and let's
call this other campaign. We can delete the original one. The "I'm
not aware of any" option is an exclusive button, so let's just copy
that and put it here. Oops. I'm just being lazy now. Let me delete
that first one. Then "I don't watch TV or any other media" is also
another exclusive button, so let's copy that and put it there, and
we can delete this one too. We can mark it as resolved. The question
was "Please select your top two responses." That means we don't want
people to select more than two answers, so we can come here to
settings and set the maximum number of selected responses to two,
because the question says you can only select two. We can apply
that, so you cannot select more than two.
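For data that was collected without such a limit, the "top two" rule has to be checked after the fact. A minimal Python sketch, assuming each response is already a list of selected option names (the sample values are made up):

```python
def find_over_limit(responses, max_selected=2):
    """Return the positions of responses that selected more options than allowed."""
    return [
        index
        for index, selected in enumerate(responses)
        if len(set(selected)) > max_selected
    ]


# Hypothetical campaign selections, one list per respondent.
responses = [
    ["truth_campaign", "industry_campaign"],
    ["truth_campaign", "industry_campaign", "other_campaign"],
    ["no_tv"],
]
over_limit = find_over_limit(responses)
print(over_limit)
```

Enforcing the limit at collection time, as in the settings described above, means this list should always come back empty.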
Let's update that question. The next question is "What is the legal
age of smoking in your country?" Let's edit that and call it legal
age. Let's allow only numbers, and let's add units of measurement,
because who knows, some countries may allow people who are 12 months
old to smoke. So months or years, or let's even add days; it's
possible. Let's say days, months, or years, so there are three
units. The minimum is zero, so you don't have negative values. Let's
apply, and mark "What's the legal age" as resolved. Let's update
that question. Now,
if you said you have never used any tobacco product, ideally we
shouldn't be asking you how many cigarettes you have smoked or which
tobacco products you used. If you said you never used tobacco here,
we should move you directly from this question to this question. So
let's add logic. We come to skip patterns, and I will set up a skip
pattern. Let's use the tobacco question: if tobacco use is exactly
equal to never tobacco user, then go to this question, this one
here. Submit. That way they don't have to answer unnecessary
questions that don't apply to them. So now we're done with that;
let's close this. Our survey is almost all set. The only thing is
that this light is flashing because we've not set our timelines yet.
So let's set the study to begin today and to end at the end of the
week. Let me move this out of the way. And we can go ahead and
preview. The preview allows you to see how the survey will look when
participants take it. So let's preview the
survey. So let's start. If you made a mistake in setting up your
survey, you can always go back and fix it. So, years. If I say I'm a
never tobacco user, it should not allow me to see the questions that
have to do with tobacco; it should take me directly to this one.
Okay, I didn't set that logic very well, then. Let me go back;
sorry, hold on a minute. Let me restart that. Hold on a minute, did
I set it right at all? That's the whole point of preview: to allow
you to make sure that if you did not set it up very well, you can go
back. You don't want to launch a survey and then realize that you
made a mistake and participants are busy selecting the wrong things.
So you should always preview your survey before you launch it. So,
never tobacco user; let's see. Okay, so I go back to my survey, and
back to my logic, skip patterns, to check. Let me edit it. Oh, all
right: I selected the wrong destination. If tobacco history is
exactly never tobacco user, then where I should actually route them
is here, not the other one. So let's go back and submit.
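A skip pattern is essentially a routing rule: a (question, answer) pair mapped to a destination. The Python sketch below illustrates the idea and why previewing matters; it is not how the platform stores its logic. The question names, the answer value, and the choice of `tv_campaigns` as the destination are all assumptions made for the example.

```python
# Hypothetical question order for the demo survey.
QUESTIONS = [
    "age", "tobacco_hist", "cigs_smoked",
    "products_30d", "tv_campaigns", "legal_age",
]

# One skip rule: never users jump past the tobacco-specific questions.
SKIP_RULES = {("tobacco_hist", "never_tobacco_user"): "tv_campaigns"}


def next_question(current, answer):
    """Return the next question to show, applying a skip rule if one matches."""
    target = SKIP_RULES.get((current, answer))
    if target is not None:
        return target
    index = QUESTIONS.index(current)
    return QUESTIONS[index + 1] if index + 1 < len(QUESTIONS) else None


print(next_question("tobacco_hist", "never_tobacco_user"))
print(next_question("tobacco_hist", "current_user"))
```

A wrong destination in `SKIP_RULES` would not raise any error; it would simply route respondents to the wrong place, which is exactly the kind of mistake a preview run catches.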
This is why you have to preview your survey, so that if you made
silly mistakes you can correct them before your participants get to
take the survey. So let's do that again. And if you need to
translate your survey before we do the preview (I think some of you
already had that session last time), we could add a new language.
The survey translation does not affect logic, so you can have as
many languages as you want; all you need to do is set the logic in
the master language. All right, let's go on with our preview again.
Always, always preview your survey before you launch it, because
that will save you so much heartbreak compared with launching a
survey that has mistakes. So let's say never tobacco user. Aha, so
now it has skipped me correctly to the question that has nothing to
do with smoking. If you have special buttons, you want to make sure
they all work, and this is the whole point of preview. If I select
other, it should allow me to type something below. If I select any
of the exclusive items, I should not be able to select anything
else. So that's the whole point of preview: testing your survey to
make sure everything works.
So here it says, "Please select your top two responses." That means
I cannot select more than two, because that was the setting I used.
Again, this is to save you from data cleaning. Remember, prevention
is better than cure; that's the whole premise. "What's the legal age
of smoking?" 21 years. Submit. So now I have taken my preview and I
know that my survey is ready for launch, so I can go ahead and
launch my survey and share it with my participants. Let's do that
now. I will have you take the survey again, for the second time, so
that we can see the difference. Let's copy the link and drop it for
you here. All right, so take the survey, and let's look at the
differences in the data architecture and how the issue of cleaning
is addressed automatically for you. Let me take the survey myself
too. Oh, I did not catch this in my preview. I should have; anyway,
it's fine. I should have added a unit of measurement, and I did not
add that. So I failed in my own preview. I cannot emphasize enough
that this kind of error, the kind of error I just made, is what you
want to avoid by previewing your survey carefully. You also want to
invite your colleagues or your supervisor to preview your survey,
because once you've launched the survey you've already started
collecting responses, so you don't want to make the kind of mistake
that I just made. So we have 56 responses. Let's go ahead and see
how the platform handles this. On the platform you can download a
raw data set, but you can also download the clean data set. In the
clean data set, all the work we were doing earlier has been done for
you already; you don't have to do it manually. So if you just want
the clean version, all you have to do is click this and open it. You
see the way it's done: all the variables that are numeric are
cleanly numeric, and you can see numbers. Let me expand this. Your
age: everything is clean in one
column. And then you have the type of missingness. For each
variable, you have a variable beside it that tells you about the
values that are missing. So, for example, for history you have
missing history. The missing variable tells us why a particular
value is missing. This person did not give any answer, and the type
of missing is O. O means other: we don't really know why. This other
individual is missing and the value is P. P means partial complete:
it means they never got to that part of the questionnaire and they
just submitted it. Here we see other types of missing, for cigs 30
days, which is how many cigarettes you smoked in the past 30 days. S
means they were skipped: the individuals were not eligible for that
question, so the platform skipped them automatically. And that's
quite different from this person, who also did not answer that
question and also has a missing value here, but whose missing value
is P. P means partial complete: they were eligible for that question
but did not get to that point. So the platform helps you understand
the kind of missing values that you have, so that you can deal with
them effectively. But importantly, all of the information you need
is already captured.
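Once each variable has a companion missingness column, the codes can be tabulated and used to separate "not eligible" from "eligible but unanswered". The Python sketch below assumes the three codes described above (O, P, S) and an empty string for answered rows; the sample values and the function names are made up for illustration.

```python
from collections import Counter

# Codes as described in the session; the wording of the labels is ours.
MISSING_LABELS = {
    "O": "other (reason unknown)",
    "P": "partial complete (never reached the question)",
    "S": "skipped by survey logic (not eligible)",
}


def count_missing(missing_codes):
    """Count each missingness code, ignoring rows that were answered ('')."""
    return Counter(code for code in missing_codes if code)


def eligible_rows(values, missing_codes):
    """Drop rows skipped by logic; keep answered rows and other missing types."""
    return [(v, m) for v, m in zip(values, missing_codes) if m != "S"]


# Hypothetical column for cigarettes smoked in the past 30 days.
values = [10, None, None, 3, None]
missing = ["", "S", "P", "", "O"]

counts = count_missing(missing)
eligible = eligible_rows(values, missing)
print({MISSING_LABELS[code]: n for code, n in counts.items()})
print(len(eligible))
```

Keeping the skipped rows out of the denominator while retaining the P and O rows is one reasonable way to use the distinction; the appropriate handling depends on the analysis.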
Importantly, on the platform you also have the ability to just
download the analysis report. You don't have to take the data into
Stata and start analyzing it manually or start writing the methods
report; you just download the methodology report, for example.
Sorry, I have internet issues now; let me try again. I saw a
question: "Are there SOPs designed for using K-Square?" Let me
address that. If you have questions, you can come here to Quick
Questions and ask whatever question you have. For example, if you
want to ask "How can I set up skip patterns in my survey?", you ask
that, and the AI will automatically answer the question for you.
That's what it has just answered here. So if you have any questions,
you can come to Quick Questions and ask, and that will give you the
answer you need. Any other questions? Okay.
Salo? Yes, my name is Salo, from South Africa. I'd like to ask
several questions. How do I know who has responded to the survey?
Let's say I'm running an audit of some form and the respondents have
responded. I'd like to draw a report; how do I draw a report based
on their responses, by collating their responses, so that I can
develop a report? Okay, that's a good question. I want to clarify
what you're asking. Are you saying that you want to identify who did
the survey? Yes, who did the survey, firstly. I mean, I would have
created the survey, but which particular individual has responded to
it? Okay. So the surveys are anonymous, so we can't tell who
responded or who did not respond. Oh, I stopped sharing my screen;
let me go back to this. In the survey I did (let's go back to the
questions), there's no question here that allows us to know anything
about the individual. We just know their age. If you don't have a
question that identifies individuals, then you won't know. It's a
different thing if you ask them "What is your name?"; then you can
know. The second item was, how do you now analyze the data? Well,
that one's easy. You come to Download Report and click on Download
Analysis Report. Let's say you want to download it for the whole
population; you click there, and the platform will take the data and
analyze it. That's what it's doing right now. It has finished the
analysis, so we can come here, click on View Download History, and
download the analysis report, which is this. So now we have the
analysis report that tells us everything about the analysis we just
did. These are the questions, and each question is answered one
after the other. But in terms of knowing who answered what, we can't
tell, because there are no identifiers in the survey. We don't know
their names or any other identifier, so that is impossible for you
to access. If you want to have that information, you have to ask
them the question. Does that make sense? Let's say, for example, the
respondents have typed multiple strings. How would those multiple
strings be extracted from the report? Okay, so you have to go to the
data. Let's go to the data again. Let's go to the raw data set now.
Let's download the raw data
set. So each row is a person; each row is a unique ID. This is the
data set, and you can now look at it. For this individual, these are
the answers that they provided: they're a never tobacco user. But
even then you can look at all the answers that this individual
selected: they selected the truth campaign and the tobacco industry
campaign, but these are captured in different columns, unlike what
you had in Google Forms. They are in separate columns. So that's how
you can tell the responses of individual participants: you go to the
data set, go row by row, and look at their individual answers. Now,
the data set here already uses the variable names that we created,
so if you want to look at what each of those things means, you come
to the codebook. The codebook is like a data dictionary. Remember,
we had to manually create our data dictionary earlier. On the
K-Square platform you don't have to create it manually, because the
platform will automatically create it for you. So you can see for
each variable that one is months and two is years, and so on and so
forth. Those are the variables, and you can tell what each of them
means, so that spares you the task of having to do that manually.
But in terms of looking at the individual responses, you have to go
to the raw data set and look at it row by row, if that's what you're
interested in. But remember that for you to be able to identify
individuals, you must have asked a question. If, let's say, this
column was the name, and he typed his name as Mr. ABC, then
obviously you can see, "Oh, Mr. ABC, this is what he answered." But
without asking a question like that, it's impossible to know
it. For example, let's say I launch a survey. When I launch a
survey, it gives me various ways that I can distribute my survey to
participants. When I select "by email", it automatically takes me
not to Outlook but to another application. How do I force it to take
me to Outlook? Oh, okay. If you want to send invitations via email
on the K-Square platform, what you do is come here, under Invite
Participants, to "Click here to upload participants' emails". You
click on that button, you come to Add Participants, you click there,
and you can add them in many ways. You can add manually, or you can
copy and paste. If you want to copy and paste, for example, you can
just copy and paste all your emails here. For example, you can put
my email, and you type all the emails you want, separated, and click
Submit. If you have a spreadsheet that has all the emails, you can
click here and say "Upload from file", choose the file that contains
all the emails, and upload it. The platform will automatically send
the emails to the people so that you don't have to send them
yourself. You can also click here to create or edit the message that
will be sent to participants. You can say this is a survey
invitation and type the message yourself, or you can click here to
generate the default content. So this is saying "Hi, Mr. So-and-so";
if the names are in a column in the spreadsheet, the platform will
automatically use their names. "Hi, So-and-so. We hope this message
finds you well. We're writing to invite you to contribute to our
survey. Click here to take the survey." This is the message that
will be sent to participants, but you can edit it yourself. Once you
edit and save, the platform will then send emails to all the people
in that spreadsheet, so that you don't have to send them manually
yourself. Does that answer your question?
That answers my question. The very last question: last week we
covered a way to set a sort of string length. I can't recall what
it's called, but I'll put it in layman's terms: we put rules to say,
like, a minimum of 20 words and a maximum of, let's say, 500 words.
When a person goes to the survey, the survey does not actually guide
the user to say the minimum word count is 20, and the user will type
words and the Next button will not highlight automatically. How do
we ensure that? Would you put maybe some wording underneath to say,
like, the minimum is 20 words? Oh, I see your point. You can type it
in the question itself. Let's pause this survey. Let me just address
your question quickly. Let's say we want to add a question like what
you just said; let's say it's an open-ended, multi-line question:
"What has been your experience traveling around South Africa in the
past 12 months? Please limit your response to between, let's say, 30
and 100 words." So you tell them there. Let's call this travel. But
you also go to the settings and say the minimum word count should be
20 and the maximum word count should be 100. The platform is going
to enforce that, right?
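The word-count rule amounts to a simple validation on the typed text. The Python sketch below illustrates that idea with the limits mentioned above (20 and 100); the function name and the wording of the messages are invented for the example and are not the platform's actual messages.

```python
def check_word_count(text, minimum=20, maximum=100):
    """Return an error message if the word count is out of range, else None."""
    typed = len(text.split())
    if typed < minimum:
        return f"Write at least {minimum} words ({typed} typed so far)."
    if typed > maximum:
        return f"Maximum word count is {maximum} ({typed} typed)."
    return None


print(check_word_count("Too short"))
print(check_word_count(" ".join(["word"] * 50)))
```

A `None` result means the answer can be submitted; anything else is shown to the respondent so they can adjust before moving on.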
So let's say the minimum is two; or let's just keep that at 20, as
we said. If we look at the preview: if the user types one word, the
platform will say "write at least 20 words". Two words: the platform
is just telling them, you know, you have to type more, until, aha,
they've gotten to 20, and then it will stop. Then if they go above,
now they've gotten to more than 100 words, they will not be allowed
to submit their answer until they cut it down to the limit. So
that's how the platform will enforce it. Everything is in your
hands: how much of an answer you want the participant to give. The
whole idea is that you don't have to start cleaning or trimming data
later; everything is done automatically, or enforced automatically,
for
you in the preview mode in the preview
mode it does give you that red text just
to guide you that the minimum WS are
that but once you've launched your
application H then it does not give you
those red uh highlights to say the
minimum moment is that are you aware of
that no it does so for example let's
take it let's take preview right so
let's uh let's start
here
Oh, by the way, let me correct the other problem I had with the survey. This question, "which of the following"... okay, in this question I should have added units of measurement, which I didn't provide. I should have added puffs and sticks. So always be careful whenever you do a review, to make sure you have captured everything you need to capture: packets or packs, and cartons. That is very important; that's the whole point of a review. Did I save it at all? Okay: apply, update question. All righty. So if we do our preview, I just want to move to your question. "Current tobacco use"... I just want to move there. Aha, see? With fewer words than the minimum, the platform flags it, and it's showing you the number of words typed so far. Can you see that? Now, once we get beyond 100, see: "maximum word count must be 100". So this is what the participants are going to see. Whatever you set in your settings, the platform will religiously enforce it for you. Everything is in your own control. I hope that answers your question.
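The word-count rule demonstrated above can be sketched in a few lines of Python. This is only an illustration of the behaviour shown in the preview (a running count, a "write at least" message below the minimum, and a block above the maximum). The function name and exact message wording are assumptions, and counting words by splitting on whitespace is a simplification.

```python
def check_word_count(text, min_words=20, max_words=100):
    """Return (ok, message) for a free-text answer, counting whitespace-separated words."""
    count = len(text.split())
    if count < min_words:
        return False, f"Write at least {min_words} words ({count} typed so far)."
    if count > max_words:
        return False, f"Maximum word count must be {max_words} ({count} typed so far)."
    return True, f"{count} words."

# Example: a three-word answer is rejected with a hint about the minimum.
ok, message = check_word_count("only three words")
```

A survey front end would call a check like this on every keystroke and keep the submit button disabled while `ok` is false.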
Thank you all.
You're welcome. Yes, any other questions? There was a doctor who had his hand up for a long time; he's gone now. Oh, Dr. Abdul, yes, your hand has been up for long. You can speak now.
Thank you very much. Dr. Abdul from abara, Nigeria. My first question: using the Chisquares platform, can we take a prepared questionnaire, for example, and insert it, or is it better to allow the AI in Chisquares to generate the questions? Secondly, does it only relate to health issues, or can it also extend to the humanities and other areas? Lastly, the codebook: I'm interested in how the codebook is generated in Chisquares. Thank you very much.
All right, this is an excellent question.
And I have forgotten all of them promptly. Okay, I remember. No, no, I'm not going to put you through that. I'm just growing old; senile dementia. The first question was whether you should create your own questions or have AI generate them for you. I think the best research questions are driven by the individual. AI is supposed to help you, but AI cannot think for you. So the best thing is for you to create your questions, and you can use AI for assistance. You can also use questions from our question bank; we have thousands of questions there. So if your research is, let's say, something in the humanities, you can come to "Scientific research surveys" and get an idea. Let's say you are doing work on social and behavioral science, or economics, policy and politics: you can look at the surveys that exist. Let me look at politics, since this seems to be someone in the humanities. Let's see, let's say "Voting and election survey". You can look at the questions it has, and you might say, okay, these are the exact questions I want in my survey, I just want to copy this over. So you can copy this entire survey and continue. It takes the whole survey and creates a copy of it for you. Let's sort by created date. So now I have "Voting and election survey", and I can go ahead and edit it. I can say, okay, these parties are not parties in Nigeria, so let me change this to PDP, or to NRC, or whatever. So the answer is: whatever makes sense to you as a professional. You can create your questions from scratch, you can use questions from our survey bank, or you can have AI create questions for you. What matters is that you oversee that process. Even if it is AI that is creating questions for you, what matters is that you, the scientist, are in oversight of that process. That's why, whenever you bring things in from the question bank or from AI, we flag them; that forces you to go back and review them. So I hope that part is clear.
The second part was about the social sciences. As you can see, this is voting and elections. That means the Chisquares platform is not just for, you know... well, social sciences are actually sciences too. It could be any endeavor at all that involves data collection, whether it is epidemiology or political science or any area of human research endeavor. You can collect data on the platform for that.
Now the codebook. The third question was how the platform generates the codebook. Let me go back to the codebook; hold on a minute, if I can pull it up. Not that one. Let me figure out the... So this is the codebook. The codebook has a lot of information that is created in real time. One of the first things you notice in the codebook is that every individual who participates in the survey has a data quality score. As participants are taking the survey, the platform is watching and looking for specific things in real time. As I think I mentioned last week, if people submit duplicates, the platform is noticing and detecting it in real time. If they speed through the survey, the platform is noticing, detecting and reporting it in real time. That is what gives rise to these values. Then there are the question-specific aspects of a data dictionary. This question, for example, says "How old were you on December 3 of last year?", with response options one meaning months and two meaning years. This is metadata that is generated when you are creating the questions. Let me go back to that survey; let me sort by created date. When I was creating that survey and typing in the values, I typed months here, I typed years, I typed centuries. The platform notices the index positions, and it bases the codes on them: months is index position one, years is index position two, centuries is index position three. So when it is cleaning the dataset it assigns months as 1, years as 2 and centuries as 3. That is the metadata it captures in these different fields, keeps in its memory, and uses to generate the codebook for you. As you enter things in different cells or different parts of the platform, the platform meticulously tracks all of that data for you. And it's not just tracking: it keeps a record of it and uses it to write all of those reports for you, so that you don't have to do that manually. Remember that when we were doing our own data cleaning in Excel, we had to create a different cell and start typing these things one by one. It is to save us all of that stress of doing it manually that the platform does it automatically for you.
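The index-position idea described above (the first option typed becomes code 1, the second becomes code 2, and so on) can be illustrated with a short Python sketch. The function name, the dictionary layout and the variable name `age_unit` are hypothetical; only the months, years and centuries ordering comes from the demonstration.

```python
def build_codebook_entry(variable, question, options):
    """Map each response option to its 1-based position in the option list."""
    codes = {position: label for position, label in enumerate(options, start=1)}
    return {"variable": variable, "question": question, "codes": codes}

# Options in the order they were typed when the question was created.
entry = build_codebook_entry(
    "age_unit",                      # hypothetical variable name
    "Unit for the reported age",     # hypothetical question label
    ["Months", "Years", "Centuries"],
)

# Cleaning step: replace each text answer with its numeric code.
label_to_code = {label: code for code, label in entry["codes"].items()}
raw_answers = ["Years", "Months", "Years"]
cleaned = [label_to_code[answer] for answer in raw_answers]
```

Because the codes are derived from the order in which the options were entered, the same metadata can drive both the cleaned dataset and the codebook entry.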
I hope that answers the question for you.
Yeah, thank you very much. Then lastly, what is the essence of the codebook?
The codebook is a data dictionary. Let me open the clean dataset, for example. If I open the clean dataset and I see "data quality score", what does 11 mean? There's no way I'm going to be able to know what 11 means. Or take tobacco history: what does a value of 1 mean? What does 2 mean? I don't know. The only way I can know is to go to the codebook. When I go to the codebook I can see that for tobacco history, 1 means never tobacco user. Oh, okay, I see. And 2 means former tobacco user. So the codebook allows you to interpret what is in the clean dataset. Does that make sense? You use the codebook, the data dictionary, and the clean dataset hand in hand, so that you know what the codes mean. All of those things are called codes: that 7 is a code, and 1 is a code. To explain those codes we need a codebook. The codebook explains what the codes in our clean dataset are, so that we can analyze our data meaningfully.
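Using the codebook and the clean dataset hand in hand amounts to a lookup from code to label. A minimal Python sketch follows, assuming a codebook stored as a nested dictionary. The two `tobacco_hist` labels are the ones read out above, while the data rows and the helper name `decode` are made up for illustration.

```python
# Hypothetical codebook; only the two tobacco_hist labels come from the session.
codebook = {
    "tobacco_hist": {1: "Never tobacco user", 2: "Former tobacco user"},
}

def decode(variable, code, codebook):
    """Return the label for a code, or None when the codebook has no entry for it."""
    return codebook.get(variable, {}).get(code)

# Made-up rows from a clean dataset that stores numeric codes only.
rows = [{"id": "a1", "tobacco_hist": 1}, {"id": "b2", "tobacco_hist": 2}]
labelled = [
    {**row, "tobacco_hist_label": decode("tobacco_hist", row["tobacco_hist"], codebook)}
    for row in rows
]
```

Returning `None` for an unknown code makes undocumented values easy to spot instead of silently mislabelling them.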
Sorry, then, when it comes to analysis, for example regression and correlation, does Chisquares also do that?
Yes, but we've not released that feature yet. We're going to release it shortly, so stay on standby.
Thank you very much.
You're welcome. Any other questions?
Okay, thank you very much. My question is: are there limitations to the number of responses that Chisquares can take? And is there a limited time for data collection, or do you set it yourself? Thank you.
You can set the data collection to last one century; hopefully you are still alive by the end of that time. And there's no limit on the platform in terms of number of responses. You can collect data from millions or billions of people. The platform is scalable, which means it can accommodate any number of responses you have. So there are no limits on either the duration of your survey or the number of responses that can be captured.
All right, thank you.
You're welcome. Any other questions? Yes,
tamy tioso M mcken, your hand was up for some time. I think you gave up, but if you still want to ask a question, please feel free to do so. Okay, yes, there you go.
I'm sorry, it was up earlier on; I forgot to put it down.
Okay, okay.
Susie?
Yes. Hello, good evening everyone.
Hello.
I'm Susie H. from bway State University. I'm also carrying out research. I currently have four variables, which means I'm going to carry out a survey with four different sets of questions. I want to know how to integrate that in Chisquares. Do I take them separately, or can it be done as a single survey per person? Each person will be required to answer the four separate surveys. So I want to know how to upload the separate questionnaires and make them like a single document. Thank you.
Let me get this right: you have four different documents and you want to combine them together?
Yes. Can I do that, or do I fragment it?
Are they different sections, or what?
Yes. I have demographics first, then section A for one variable, section B, section C, up to E. So how do I do that?
What you need to do is add sections. On the platform you can come here; see, it says "add section header". Let's go back, let's add a new survey item. You see here where it says "prompt": let's create a prompt and select the type of prompt as "section header prompt". Let's call this "Demographics", and let's add some description: "This section seeks to understand your demographic background." We push it to the questionnaire. Let's add another section header prompt, so that it addresses at least the plurality of the issue. Let's call this "Tobacco use history" and describe it as "Here we want to know your tobacco use." I will push it to the questionnaire. Then we come to our first question, we edit it, and we click "add section header prompt". You see this? We switch that on, and then we select which header we want. Let's say we want "Demographics" on that one. We update the question. Then for tobacco use history we come here, add the section header prompt here too, select "Tobacco use history", and update. So when we now preview our survey, it will appear in sections like that. You see "Demographics" and the section's background text that has been added; it wasn't there before. Then "Tobacco use history". The user takes all the questions under that section, and so on. So no, you don't have to divide your questionnaire, and you don't have to have separate surveys. That's the whole point of having those prompts: you can add and describe things so that it makes sense to participants when they're taking the survey. I hope that answers your question.
Beautifully well, thank you.
Great. Any other questions?
Hello. So I see a question about organizational plans. Let me find the slide for that. For those of you who are in Nigeria, the National Universities Commission has sponsored a national license for all the universities in Nigeria. But setting it up is a logistical nightmare. The institutional accounts have already been set up; the problem now is getting in touch with people in those institutions. So what you could do to accelerate that process is get in touch with your Vice Chancellor or the central office of research, and let them reach out to us, and we will set up the institutional account for them. If you're in another institution not in Nigeria, or if you are in a polytechnic in Nigeria, you can also have your institution reach out to us, and we can also set up an institutional account. But for universities this has already been sponsored by the National Universities Commission. So those of you in the audience who are in Nigerian universities can arrange this with your Vice Chancellor or the central office of research. There can only be one administrator in each institution. The way it works is that each institution has a special verification code, which we will provide, and the person who enters the verification code administers the account for the whole institution and can invite everybody in that institution. So this is how you can help: you can reach out to the folks in your Vice Chancellor's office or your dean of research, to speed up the process of expanding reach into all 247 universities. If you have any questions on that, let me know, please.
Hello, sir.
Yes, please, you can ask your question.
Yes, I am Paul Lei fromo State College of Technology aqu in Nigeria. I have two questions to ask very briefly. The first one: normally we take the consent of respondents whenever we want to administer our questionnaire. So I want to ask, does Chisquares provide a way of receiving consent from the respondent before the questionnaire is administered? That's number one. Number two is about the link we are sending. I know it's going to be a link generated by Chisquares. By the time we send it to all the respondents we want to respond, is it going to give them their ID numbers separately, or is it going to be a single ID number? Thank you, sir.
Good question.
Yes, your first question was about the consent form. Yes, of course: it's a research platform, so we have consent forms. If you want to add a consent form to this survey, you come here under "Questionnaire", click there, and you see "Consent or assent forms". Click that. Right now we don't have a consent form in this survey, so you can click to generate one. You can type whatever you want here. If you don't know what a consent form looks like, you can click on this button that says "generate default content", and the platform writes default content for you, and you can submit that. So if I'm taking the survey now (let's preview the survey again) and I start the survey, the first thing I see is the consent form. If I accept it, then I can proceed to the survey. I hope that answers your first question.
Yes sir, thank you very much, sir.
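The consent flow shown in the preview (consent form first, questions only after acceptance) can be expressed as a simple gate. The Python sketch below is illustrative only: the function name and screen labels are hypothetical, and what a participant sees after declining is an assumption, since the session only demonstrated the accept path.

```python
def screens_for(participant_consented, questions):
    """Return the ordered screens a participant would see."""
    screens = ["consent_form"]          # always shown first
    if participant_consented:
        screens.extend(questions)       # proceed to the survey proper
    else:
        screens.append("end_of_survey")  # assumed behaviour on decline
    return screens

accepted = screens_for(True, ["age", "tobacco_hist"])
declined = screens_for(False, ["age", "tobacco_hist"])
```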
The second question was about the unique session ID. I think the best way to look at this is... so this is a survey we just took. You can see that each person has a unique ID. You have an ID that is unique to you, so if you paused that survey midway and came back to continue it on that same device, the platform knows. So even though the same link is shared with everybody, everybody has their own unique ID on the platform. And if you're doing a longitudinal study, where you are following people over time, then when the platform administers the survey to the same person the next time, they will still keep that same unique ID. So this unique ID is special to that person and stays with them forever and ever. I hope that answers the question.
Yes sir, thank you very much, sir.
All right, great.
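One way to picture "same link for everybody, separate ID for each person" is a registry that issues an identifier on the first visit and returns the same one on every later visit. The sketch below is a guess at the general idea, not the platform's mechanism: the class name is hypothetical, and keying on a per-device token (for example a browser cookie) is an assumption drawn from the remark about resuming on the same device.

```python
import uuid

class ParticipantRegistry:
    """Issue one stable ID per device token (a hypothetical cookie value)."""

    def __init__(self):
        self._ids = {}

    def id_for(self, device_token):
        # First visit: create an ID. Later visits: return the stored one.
        if device_token not in self._ids:
            self._ids[device_token] = uuid.uuid4().hex
        return self._ids[device_token]

registry = ParticipantRegistry()
first_visit = registry.id_for("device-A")
resumed_visit = registry.id_for("device-A")   # same person resuming later
other_person = registry.id_for("device-B")
```

In a longitudinal design the stored ID is what lets responses from different waves be linked to the same person.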
Another question was whether there can be more than one question in one section. Yes, you can. The idea is that you select the question. Let's say questions one, two and three are demographics: I only have to put the prompt on question one. Questions two and three would then not have a prompt, because they follow under that question, and then I can put the next prompt on question number four. I hope that makes sense. So yes, you can have many questions that fall under the same prompt. Or, if you want, you can have all of the questions display the prompt at the top. That's your own personal preference. Any other questions?
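The section behaviour described above (attach the header prompt to question one, and questions two and three fall under it until the next header on question four) is a forward fill. A small Python sketch of that logic follows. The question texts and the `section_header` key are hypothetical, and the two header names are the ones created during the demonstration.

```python
def assign_sections(questions):
    """Carry each section header forward until the next header appears."""
    current = None
    assigned = []
    for question in questions:
        if question.get("section_header"):
            current = question["section_header"]
        assigned.append((question["text"], current))
    return assigned

questions = [
    {"text": "Q1 age", "section_header": "Demographics"},
    {"text": "Q2 sex"},
    {"text": "Q3 education"},
    {"text": "Q4 ever used tobacco", "section_header": "Tobacco use history"},
]
sections = assign_sections(questions)
```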
I saw another question that says, "What about Nigerians in the diaspora?" This link here exists just because this body reached out to create special access for their members. If your institution also wants this, you can reach out to us and we can provide institutional access for your institution too. There's nothing special about NUC or Nigeria; it was just that this organization wanted access for all of its members, and that's why we have this special arrangement set up for them. I hope that makes sense. So regardless of where you live, whether you are in Ghana or Canada or the US or wherever, if you want your institution to have access, you can have someone in that institution, maybe at the level of the office of research or somebody in that kind of role, reach out to us, and we can set up the institutional access. I hope that answers that question as well. Any other questions?
"Can students have access to the platform?" Of course. Institutional access means everybody in the institution: students, faculty, researchers, everybody.
Sorry, how do I import the Excel version of my questions into Chisquares? We have done the Word version. If I had an Excel version, how would I get it into Chisquares? Thanks.
Right now you have to get it into Word. We accept questionnaires in the form of Microsoft Word documents. If you have the questions in Microsoft Excel, just copy them literally, paste them into Word, and import them that way. Does that make sense?
Makes sense.
Okay. How about independent researchers? If you're an independent researcher, you can... L, do you want to share the special access code we have?
Yes, I shared it. I thought we would talk about it before the... I'm looking for that link again, but I shared it earlier. Okay, so let me look for it and see if I can drop it here too.
The invitation, right? The link that invites people to our special...
Yes, yes, I just shared it now.
Okay. So you can also click on that link, which invites you to join our own special organization at Chisquares. That gives you access for the next year or so.
"Can Chisquares generate biodata pie charts?" Yes, it does. The platform automatically knows the kind of data you have, so the kind of illustration or figure it creates is specific to that. It's not just generating random figures; it knows the kind of data you have and the kind of illustration or figure that is well suited to it.
"Can I use more than one questionnaire for different study?" I don't understand the question. Are you asking whether you can use one questionnaire for different studies? Yes, you
can.
"How does a university contact Chisquares for an institutional license?" You can simply reach out to us at info@chisquares.com. So that's the email: info@chisquares.com. Am I still sharing my screen? No. So that's the email here. Somebody at the university level will reach out to us, and we'll make that arrangement for your institution.
Any other questions? Evans has a question. He asked about logic earlier, but I can't find his question anymore. Evans, do you want to speak? Can you raise your hand so I can allow you to speak? Okay.
All right. Okay, thank you, Doc.
Yes?
Can you hear me?
Yeah, we can hear you clearly.
Okay. So I tried using the platform to create a questionnaire, and any time I add logic and go back to correct my questions, the arrangement changes. How do I correct that?
Okay, let me take that question. Over here, you see this text: "The sequence of questions displayed here may not align with your pre-arranged order, which reflects the final sequence participants will encounter. Click here to view the final arrangement." This is the final arrangement. Now, if you look at the hint beside it, it says: "Before implementing any survey logic, ensure you finalize the question order. Rearranging questions after setting up logic will require deleting the previous logic and setting it up again from scratch." Remember that survey logic is dependent on the order in which questions are arranged. So if you have a question...
Doctor, were you sharing your screen? We can't see anything.
Oh, really? Oh, okay, sorry. Can you see now?
Yes, thank you.
I didn't notice. Okay, so what I was saying was that this hint is telling you to make sure you are satisfied with the final order of the questions before you start implementing logic. You have to think about how sensitive logic is. Logic means you are saying: if such-and-such values are met, then take this action. So it depends on the value of the question, and it also depends on the position. Those are very serious things. That means that if you make any change to a question after you set up logic, we don't know how it impacts validity. So the only safe thing for us to do is to completely cancel that logic and have you set it up again. We value convenience a lot; obviously we try to prioritize user convenience. But when user convenience would negatively impact the quality of data, that's when we say no, you just have to suffer a little. That's why we provide a hint that says, if you are going to set up logic, make sure you have finalized the questionnaire; make sure you like the wording and the arrangement. Because should you set up logic and then go back and change the questionnaire and move things around, obviously the logic is compromised, and the only way to address that is to have you start all over again. I hope that makes sense.
Yeah, I'm done. Thank you.
You're welcome.
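Why reordering wipes out logic can be seen in a toy model where each skip rule refers to a source question and a later target question. The Python sketch below is a simplified illustration under stated assumptions: the `Survey` class, its method names and the rule format are hypothetical, and clearing every rule on reorder mirrors the conservative policy described above rather than any known internal design.

```python
from dataclasses import dataclass, field

@dataclass
class Survey:
    questions: list
    # Each rule: (source question, triggering answer, question to skip to).
    skip_rules: list = field(default_factory=list)

    def add_skip_rule(self, source, answer, target):
        # A skip only makes sense if the target comes after the source.
        if self.questions.index(target) <= self.questions.index(source):
            raise ValueError("a skip rule must jump forward")
        self.skip_rules.append((source, answer, target))

    def reorder(self, new_order):
        """Apply a new question order and discard all existing skip rules."""
        if sorted(new_order) != sorted(self.questions):
            raise ValueError("new_order must contain the same questions")
        self.questions = list(new_order)
        dropped = len(self.skip_rules)
        self.skip_rules = []   # positions changed, so old rules are not trusted
        return dropped

survey = Survey(["age", "tobacco_hist", "cigs_per_day", "travel"])
survey.add_skip_rule("tobacco_hist", "Never tobacco user", "travel")
dropped = survey.reorder(["tobacco_hist", "age", "travel", "cigs_per_day"])
```

After the reorder in this example, the old rule would have skipped over nothing at all or over the wrong questions, which is why it is safer to drop it and have the author rebuild the logic against the final order.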
um
Margaret?
Yeah, thank you. Can you hear me?
Yeah, we can hear you clearly.
Thank you very much. I just want to ask about the institutional arrangement. Are privately owned universities also covered?
Yes, that's correct.
Okay, thank you.
Are you in a Nigerian private university?
Yes, in Nigeria.
Everything already exists: the verification codes, everything has been set up. We just need the responsible person from those institutions to reach out to us. If they reach out to us tomorrow, you can start enjoying institutional access tomorrow, because everything has been set up already. So yes, they can reach us at that email.
Okay, thank you.
You're welcome. What university are you from, if I might ask?
I'm from a CL University in...
Okay, all right. I know we have a whole bunch of private universities that set it up already, yesterday or so, but those were other universities. So yes, you can have folks from your university reach out to us.
All right.
And again, if you are from outside Nigeria, or from another university, you can also have people from your institution reach out to us, and we will set up the institutional arrangement for them. Any other questions? Yes,
there's
Gabriel and
Andy
will hello
Gabriel.
Hello?
Yeah, we can hear you, Gabriel. Yes, you can ask your question.
Yeah, so, hello, this is Gabriel from Liberia. Please, is it possible to access last Friday's webinar, especially for those of us who were unable to log in? And if it is possible, where can we access it, please?
I'll send it together with today's recording, let's say by tomorrow, via email.
Yes, via email. The email addresses you used to register for the webinar are the ones I will use when I send it.
Okay, thank you so
kindly.
All righty. And by the way, just to emphasize: if we set up institutional access, we are also willing to have our staff provide training. Obviously a tool is only as good as the training you have on it. So we also provide trainings to your institution, and these are some of the things we can set up with whoever is the administrator in your institution.
Okay, so there was a comment from Funi again. Funi wanted to do something on Stata today. Is there anything you can do about that, Dr. Agaku? Some homework or something?
Okay. We already gave you the manual, right? Remember, the manual was shared last time. You are supposed to be replicating every single thing in that manual on your own; that is your homework for the whole four weeks. The style of teaching is supposed to be self-directed. So please go to that manual. I don't know if some of you don't have it; L, please can you share it again? The manual follows the 10-step process: step one, step two. Follow those steps one by one. By the end of this webinar series you should... that's how it works. You are the one who will teach yourself; it's learning how to learn. Then, when you have problems, we can engage on those problems in class. So your homework has already been assigned to you, which is following the step-by-step procedures outlined in the manual.
Yeah, I'll send it along with the recording. Those who were here last week should have received it, because I sent it to everyone. So I'll do that again. Andy, I don't know if you want to say something? I've been giving you permission to talk, as well as drumming. So anyway, any other questions? Let's see, Charles.
Charles, where are you from?
Thanks, Jer. Can you hear me?
Hello, Charles. Yes, we can hear you.
Okay, thank you, and good evening. I had actually raised my hand multiple times, but anyway. I just want to ask about qualitative data collection. The code or pattern you told us about, the "at at Q" and all of that: is that the same method we will use if we want to upload a qualitative interview guide? That's one. Secondly, I think Susie asked about including demographics, and when you included it, it automatically came to the top. If I want to reorder the positioning of my variables, will I be able to do that? For instance, assuming demographics is something I want to move to the middle, or maybe I just want to create another section in between, how do I go about that? And then finally, I was asking on behalf of some of us who are currently not in the country, but we had been paying our taxes before we left, and NUC has graciously... I'm sure you know where I'm going with this. How can you help those people? Thank you.
Thank you. On a light note, Charles, that is hilarious, but I feel you.
Okay, let me answer your questions one after the other. Let's go to the first question. Every question starts with "at at Q", but it depends on what follows. First of all, you can type these things manually, directly in the platform. This is for the case where you have already typed the questionnaire in Microsoft Word: how do you avoid copying and pasting? That's the whole context. So let's say you have this question that says "Please describe your pain", to be answered in many words. This is an open-ended question, and on the platform the way we indicate an open-ended question is with a question mark. So everything starts with "at at Q". But if you're not sure what to do, just leave it as "at at Q", and then when you import you can change the question type. If you don't want to memorize all these hashtags and whatever, just leave it as "at at Q" and end it with three hashtags. Remember, everything is flagged when you create a question by import, so that when you edit it you can change the question type. For example, let me go back and sort by created date. When you import it, you can choose to say, okay, this is actually this other question type. The idea is that the tags allow you to import easily. If you want to know more, we have that information on our website. Let me show you. If you come to, I think, Features, then Questionnaire import... yeah. So these are the different symbols, and we've tried to be very smart with them. Plus means you can have more than one answer. This arrow looks like a sliding scale, if you can visualize it. The upward arrow looks like upload, so it is for questions where you can upload a file. The number symbol means questions where the answer is a number. The question mark is for open-ended questions. The slash: think of when you're writing a date, you write this, slash, this, slash, that; that's why we use a slash for date questions. And then you have things like this one for Likert scale questions, which looks like an L, like a funky L. Maybe I'm imagining it, but it's done in such a way that it's kind of easy for you to figure out. The whole idea is that if you typed your questions in a Word document and you don't want to start manually editing the question types, you can simply format your questionnaire like that, so that the platform knows exactly what type each question is, and then you can upload it.
then that was the first question second
question was yeah can you can you move
questions yeah you can move questions of
course you can move questions however
you want so let's say that let me find
where's my survey
now um not
that sorry I lost my
place so let's say this is my survey and
this is the survey we're
doing and I want to move here AG
number one you can come to logic and
come to rearrange questions you can move
anything anywhere you want see that
there are two ways to do that you can
come here and select the number but
that's a very slow way of doing it I
just prefer grabbing it and moving it to
where I want so you can move things
wherever you want and that is where
they'll stay and once you're done you
will say apply but know that once you
apply any logic you set will be
condemned and you have to set it
up again right um and that's why
as I moved it here the
platform automatically throws me a
flag and the flag is telling me that
since you moved the question that
means that the skip patterns you set are
no longer valid because
you've moved the position of a question
so that means that the order in which
people were supposed to move has changed
so you have to go and set that flag
again so when I go there I have to
delete this and then uh um and then you
know and you can if you hover over the
stuff it tells you why you have a
flag so here it says that you have
repositioned your survey questions and
this may have affected your survey logic
that relies on order of questions and so
in this case I have to delete it and
create it all over again so that was a
second question the how to rearrange
questions yeah you can move questions
any way you want and that's all fine the
next question was those of you who are
in foreign institutions yeah like I said
you can have your
institution contact us and if you
don't you know um we also shared a link
if you click on that link that L shared
that gives you access to our
organization so we welcome you into our
own organization and give you access for
a year so that you have two ways of
doing it you can contact your
institution if you really have if you
for example you might be a professor in
the school and you want your students to
have access as well so that way you set
it up for them or you can just use the
link we provided and you can also have
access that way it's
fine um but if there's no logic yes so
if you don't have any logic you have no
issues to worry about yes if there's no logic
um so will the institution pay for the
next 12 months no they don't have to pay
so for the next 12 months it's going to
be you know um it's going to be
free for the institutions
all right any other
questions can you show I don't know if
you can show us again how you downloaded
the format in which the questionnaire
should be in when
um when someone wants
to upload question yeah from there okay
someone so there are two ways to do that
right um if you're already in the
platform if you go to add a question
and you say oh sorry let's go back you
say um you click on here you say you
want to import from Word
document and there's a sample
questionnaire already
there right you can download a sample
and it will show you
examples so this is what I just
downloaded it's saying this is the title
we've
highlighted every single question type
supported is in that document just to
show you how it's done so this is the
title it must start with two at signs
followed by T T for
title if you have some information that
is like a prompt I for
information then sliding scale so you
can go through that and it gives you
this is an example now if all else fails
you know you can always ask quick guys
you should learn how to make use of quick
questions this is supposed to help you
so any question you have you can
come down here and say how can I
import a
survey question
here that is in the form of a Word
document
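To make the tagged Word format described in this session concrete, here is a minimal Python sketch of how one tagged line could be read. It assumes, hypothetically, that a question line is `@@Q`, then an optional type symbol, then the question text, then a closing `###`. The session names the symbols (`?` open-ended, `#` number, `/` date, `+` more than one answer) but not their exact placement, so the layout, the regular expression, and the names `SYMBOL_TYPES` and `parse_question` are illustrative only and are not the platform's actual importer.

```python
import re

# Assumption: only these four symbols are mapped. The session also mentions
# arrows (sliding scale, upload) and a Likert symbol whose characters are unclear.
SYMBOL_TYPES = {
    "?": "open_ended",
    "#": "number",
    "/": "date",
    "+": "multiple_response",
}

# Assumed layout: "@@Q", optional type symbol, question text, closing "###".
QUESTION_RE = re.compile(r"^@@Q\s*([?#/+]?)\s*(.*?)\s*###$")


def parse_question(line):
    """Return the question type and text for one tagged line, or None."""
    match = QUESTION_RE.match(line.strip())
    if match is None:
        return None
    symbol, text = match.groups()
    # A bare @@Q stays "unspecified" so the type can be chosen after import.
    return {"type": SYMBOL_TYPES.get(symbol, "unspecified"), "text": text}


if __name__ == "__main__":
    print(parse_question("@@Q? Please describe your pain ###"))
    print(parse_question("@@Q How long have you been a doctor ###"))
```

A line left as a bare `@@Q` comes back as "unspecified", which mirrors the advice above: tag what you remember and set the remaining question types after import.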
okay now if you don't like the answer
it's AI so the answer might be a bit you
know funky you can try and regenerate a
response um and
then you can reach out to us but
again um like I said if you're already on
the platform you can look at the
templates or if you're on the website
you can go to again our website
chisquares.com
and under
you go to Word questionnaire
import and you can also access the sample
questionnaire from here too with full
formatting so either way you should be
covered so qualitative data so I think
we already answered the question about
qualitative data um we said yes you
put a question mark in front of it to
tell that it's an open-ended question if it's
a question where you're asking them to uh
upload an item then it's the upload
it's up
Arrow you still speaking no I'm done
yeah okay uh Lydia asked can I download
my answered questionnaire from Google
Forms into Chisquares for cleaning and analysis
no you cannot unless it has been
you know um because again remember why
the platform does all it does is because
when you collect data the platform
has special registers so let me show you
again if I can find it
um so when the Chisquares platform collects
data um you can see that in the
responses each variable has a
special code in front of it this is
x_1_26_1_0 and so
on and so forth right this is the
genetic print that allows the platform to
determine what kind of variable this is
and you know what format it's in
what kind of analysis should be done so
these are special instructions that are
built into the code of the data
intrinsically when you import your data
this does not exist so
the platform has no way of knowing
whether this is a categorical variable
or not it has no way of doing that
so that's why it's impossible
for you to just grab data you collected
on Google Forms and bring it to the Chisquares
platform you have to have
set up your questionnaire on the
platform before you can
make your life easy um so can research
institutions have access to this yes of
course if you want your research
institute to have access again reach out
to us at info@chisquares.com and we'll make
the necessary arrangements for you and
once your institution has access we
can also create special sessions just
for your institution so that it's more
tailored to your specific needs
institutionally there's courage still
here I want to be able to draw a map of
oh sorry uh Lydia is
that all you wanted to
ask I see your hand was also
raised
okay um yeah Lydia
okay we can't hear you if you're
speaking but we see you
unmuted
okay I want to be able to draw a map of
respondents at the end of the survey how
do I get the GPS location for my map so
we capture data for GPS locations
but we don't release it in the public
data because that breaches
security and that breaches
anonymity imagine you are collecting
data on very sensitive issues and
the individual says yeah you know I
just killed somebody and you know
I have HIV and I have that
collecting their geolocation
means that we can pinpoint
every single respondent to where they
are that is no longer anonymous data so
for those reasons the platform
in the public release would not
release data on participants' location
because the platform has to comply with
certain security standards and
so that is the reason why we
don't release that data
I hope that that makes sense to
you excuse me Doc yes please go ahead
once more there was a question that was
asked around the codebook formulation
and your response was that the codebook
is formulated from the
metadata let's say for example I
wanted to add my own jargon for example
doc means a
doctor which is not part of the
metadata how would I be able to do that
thank you well you can create when you
are creating a question right you can
add DOC for doctor as the label so for
example right let's go back to our
survey and let's say you are creating a
question that says um which of the
following
um um how
long have you
been a
doctor my my label could be
Doc right and then let's say I want to
ask them in number of years so I put
lower and upper limits and I say okay
it's from zero years to 100 years that's
how long they've been a doctor so
this is a question that asks how long
have you been a doctor they can select
whatever number they
want so that when I analyze the data in
the codebook the variable name will be
doc right so whatever you have
provided is captured and is automatically
now metadata so you can specify whatever
the labels are and automatically those
become metadata that are now used in
construction of the variable names and
in designing the architecture of the
data does that make sense I understand
where you're coming from in
terms of the
association that happens in the
background but now let's say like
semantically I want to say like I have
various doctors a doctor that removes
teeth is whatever jargon that we use in our
organization a doctor that cuts hair
whatever jargon that we use in our
institution how would I like form my
own
taxonomy that's my question so are you
talking about in the report or
what because in the report maybe I put
it as an abbreviations I want to put in
some abbreviations and then say even in
a legal document I want to say like a
exit means coming out of the door rather
than exit means coming out of the
window so um so you can you know you can
find a way of incorporating that those
those elements in the in in whatever
form you want right in the question here
so for example if your question is a
multiple response question and
you have different types of doctors that
you're trying to assess um so let's say
you have you know doctor who removes
teeth right um or you have eye
doctor whatever it is right so
maybe the question is what kind of
doctor are
you and you have different jargons and
you can say dentist here and
you can say for example ophthalmology
here whatever you want you can type it
right but then it now reduces to what
label do we need to capture it with
the label is a variable name eventually
so you would have to assign the variable
names you can ask AI to suggest
it for you AI can say okay based on what
you provided here I think that
these are some nice
variable names you can assign so AI is
suggesting that we should call it for
example tooth taker that is rather
hilarious or dental doc that is more
dignifying let's put that or it says eye
doctor um you can again type whatever
you want right or you can ask AI to
suggest it for you and oops
and so it suggests and then
whatever you're
typing can be here so that when the
participant is looking at the question
it says what kind of doctor are you
doctor who removes teeth eye doctor
this is what the participant is seeing
the metadata the participant doesn't
care about that but that is what
eventually goes into the background to
form the data structure which will be
used in your
analysis does that make sense yeah that
makes sense importing these uh
variables equatable to what you have
typed how would you then
like import it into a report to say like
I want the specific variables only to
be imported in the report oh so when
you are doing analysis you say you want
to restrict your analysis for example
only to eye doctor
or you want
so
okay sorry I'm interrupting please
go ahead yeah what I want to achieve is
when I create a report I want these
taxonomies that I have created here to
form an abbreviation
list on my report okay so the platform
the platform does all the so
when you generate
the analysis report the platform
analyzes all the questions for you right
you can also decide to say I don't
want my analysis to be among everybody I
want my analysis to be only for a subset
of a population let's say I only want my
analysis to be among people who say
you know
they used tobacco or they've been
to an eye doctor right and then I can
say if tobacco or whatever variable is equal
to this then analyze my population only
among that group so when you download
your analysis report it's only among the
population you want so you can then take
that report and do whatever you want
but the platform what it does for you is
to say we're going to do your entire
analysis for you we can either analyze
it for the whole population or we can
analyze it for a subset that you decide
right but whatever you decide the
platform will do that for you what you
take with the results and do is now left
to
you can the subset now be a subset of a
particular like a unique ID in terms of
their responses yeah of course yes it
can be it can be and you can Define that
subset however you want right you can
Define it by many variables you can say
Okay I want my subset to be defined by
tobacco history and by their age last
time and by their legal smoking and you
define whether tobacco history is this
and you can have all ages exactly
whatever you want right you you define
that by whatever criteria you have and
then the platform will then perform the
analysis exactly based on what you have
specified maybe you don't understand my
question let's say like I've got 10
respondents from the 10 respondents
respondents one to 10 with unique
identification numbers then I want
respondent number nine all of how
respondent number nine has responded can
I be able to throw a report on that
respondent number nine only oh if you
want respondent number nine only then
you go download the raw data set and you
look at respondent number nine in the raw
data set you go to number nine and
look at their responses you cannot
analyze data for one person analysis by
definition means an aggregation
it's a summary we cannot summarize
data for one
person so if you want the sorry how
would I import this Excel data just
that one row into some form of a Word
document you can copy and paste
that is a manual process
however you want to do it right if I want
just this person then I can copy and
paste I can do whatever but that's
completely up to you at that point okay
I'm answered thanks okay
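The distinction drawn in this exchange, a summary for a subset versus looking up one respondent's row in the raw data, can be sketched in plain Python. The rows, the column names (`id`, `tobacco_hist`, `age`), and the helper functions are hypothetical; a real raw export from the platform will carry its own variable names.

```python
from statistics import mean

# Hypothetical raw export: one dict per respondent.
ROWS = [
    {"id": 7, "tobacco_hist": "yes", "age": 31},
    {"id": 8, "tobacco_hist": "no", "age": 45},
    {"id": 9, "tobacco_hist": "yes", "age": 27},
    {"id": 10, "tobacco_hist": "yes", "age": 38},
]


def subset(rows, **criteria):
    """Keep rows whose values match every criterion, e.g. tobacco_hist="yes"."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]


def summarize_age(rows):
    """Aggregate a group of respondents into a count and a mean age."""
    ages = [r["age"] for r in rows]
    return {"n": len(ages), "mean_age": mean(ages)}


def find_respondent(rows, respondent_id):
    """Look up a single row by ID; this is inspection, not analysis."""
    return next((r for r in rows if r["id"] == respondent_id), None)


if __name__ == "__main__":
    users = subset(ROWS, tobacco_hist="yes")
    print(summarize_age(users))
    print(find_respondent(ROWS, 9))
```

The subset can be defined by several criteria at once by passing more keyword arguments, while the single-respondent lookup simply returns the stored row, which is why it is treated here as a manual copy-and-paste task and not as a report.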
great okay any any other
question yes can one use yes and no not
now we're working on the feature
which will be released soon and we'll
let you know once it's released but for
now it's data that you have collected in the
survey on the
platform for a study on geospatial
modeling how the researcher can get the
geographic coordinates of surveyed
points at least at Village level if the
platform does not show this information
due to security issues okay that's a
good question uh I mean for special
requests right we can accommodate
special requests on a case by case basis
and release geo data right so
that's a special exception
um but for other cases you can choose to
capture that so if
it's a rural survey you're doing and
you're going house to
house you can also include a special
variable for the longitudes and latitudes and
collect that data although I think the
easiest thing for you is that if you are
especially collecting data that involves
geospatial analysis then you can
write your special request you know um to
release and we'll find a way of making
sure that the responses then are no
longer at individual level but
aggregated so that way we meet your
needs without putting individuals'
responses in jeopardy so that might be
your best bet in that kind of case
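One way to picture responses that are no longer at individual level but aggregated is to collapse each respondent's coordinates into a single point and a count per village. This is a minimal sketch under assumed field names (`village`, `lat`, `lon`) and made-up coordinates; it is not the platform's release procedure.

```python
from collections import defaultdict


def aggregate_by_village(records):
    """Collapse respondent-level points into one centroid and a count per village."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["village"]].append((rec["lat"], rec["lon"]))

    summary = {}
    for village, points in groups.items():
        n = len(points)
        summary[village] = {
            "n": n,
            # Mean latitude and longitude of all respondents in the village.
            "lat": round(sum(p[0] for p in points) / n, 4),
            "lon": round(sum(p[1] for p in points) / n, 4),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical respondent-level records.
    sample = [
        {"village": "A", "lat": 7.0, "lon": 3.0},
        {"village": "A", "lat": 7.5, "lon": 3.5},
        {"village": "B", "lat": 9.0, "lon": 8.0},
    ]
    print(aggregate_by_village(sample))
```

Note that a village with a single respondent still exposes that person's exact point, so a real release would likely need an additional rule for small groups; that rule is outside this sketch.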
I hope that answers your
question yeah doc the one last question
yes in programming we have got what we
call an API a way of doing things by
calling a particular object so that you
can be able to achieve
a lot from a database let's say for
example I go to a bank I can just put my
fingerprint in the bank's teller the
bank
API will interrogate Home Affairs to
define who I am and that's the logic
around programming somebody has got to
program that in data analysis you're
saying you have different pockets of
data that are situated in various
institutions um for one within America
you have got that institution that you
showed us the jmbb data set I mean where
they allow you access to
particular data then you can do a
statistical data analysis so that you
can write a journal let's say for
example in other countries maybe this
question might not be posed directly to
you but to any other person that is
listening from a South African
perspective note how would I be let's
say for example I was interested in just
a hypothetical example in ensuring that
I can deter a crime within South Africa
then I say for
example any other person that is going
to like come into the
estate that I'm living
in I need to have access to a repository
from home Affairs of the people that
live in our
estate how would I go about asking a
particular government Department of
their
data um that is beyond the scope of this
course or platform because that has more
to do with like you know official or
administrative requests of special
access to a government agency but you
would have to just check up with the
whatever government that is um and you
know and see whether that's that's data
that will be released right because
there are also caveats that
govern how data are released even from
government agencies to the public and
one of the concerns is obviously security
that we're not releasing data that has
you know um identifiable or potentially
identifiable information but
that will be beyond the scope of this
session how if I can ask how did you
gain access to like maybe the CDC data
or the GMB data that oh those ones are
publicly available so if you go for
example to the CDC website for example CDC
NHANES right you see the data sets are
all publicly available so you can come
here questionnaires data sets and related
information you can download the data
sets and analyze them however you want
different countries have different um
you know different ways of providing
data but not all countries do so that's
why it's very country specific so you
would have to really look into the
country that you are interested in
specifically okay thanks Doc you're
welcome
okay
um next what any other
question no
okay
uh you can unmute yourself and speak TI
all
right okay can you hear me now yeah we
can hear you yes we right thank you sir
thank you for the lecture it's
quite insightful now I still have a
question concerning this issue of signing up
for the one year
subscription a link was sent
earlier and I wrote my question on
the platform I think it was not really
answered I clicked on that link it took me
back to my login page my dashboard and I
couldn't really I don't understand what
to do again so I need some more
clarity on that
okay um now I'm assuming you used the
link that was for our institution
Chisquares
right
so to give your name and your email
right are you still with me
okay U maybe
we yes okay yeah I'm still with you sir
I didn't hear you the last I'm saying
that I assume that when you clicked on
that link it brought you to a
form where it asked you to enter
your name and email is that
correct yes okay okay I will sign okay
log in again and write my name and
my email address yeah click on that
link
provide your name and email right
and submit it will then generate a
special email to you and you should
check your spam if you did not get that
email once you click on access what that
does is that it automatically adds you
to our own special institutional account
so that you have access but again that
will be for the next 12 months or so
are you a member of any
academic institution in
Nigeria no I'm a student in one of the
universities in Nigeria yeah so you're a
member of a community which
university is
that it's a private university Lead City
University yeah so Lead City yeah so we
have institutional access for them too so your
best bet will be to get somebody from
your university to contact us so that we
can set up an institutional account
for them because the
sponsorship from Nu covers all
universities both private and public so
that would be your more permanent
solution so again get somebody from your
institution to contact us at info@chisquares.com
because the institutional accounts have
already been set up we just need a
member from the institution who will
be the admin thank you sir you're
welcome so I can see your
hand apologies La hand but I managed to
get a hold of the question that I was
asking that I asked you earlier on in
terms of the publicly available data set
within South Africa okay
great okay any other last
questions
um
okay I think we've covered all your
questions
comprehensively all right well thanks
for thanks for joining the session and
um we look forward to seeing you next
week and um Henry you have your hand up
and you want to ask one last quick
question before we
leave okay yeah um can you hear me now
yeah we can hear you clearly thank you
very much doc um it's about the
Stata uh I don't know we didn't talk about
Stata um
today so I came late yeah today we focused
on data cleaning and one of the things
we established at the beginning of the
class was that Stata is not very conducive
for data cleaning so we said that when
you're analyzing data
you have to use the simplest tool that
gets the job done right so even if
you're using Stata for your work Stata
is not designed for data
cleaning so we went through the whole
process of cleaning your data in Excel
so that you can take it to Stata and
analyze Stata is designed for analysis not
for data cleaning so you're going to
struggle a lot if you're trying to clean
your data in Stata does that make
sense yeah yeah so that's why this
session was focused on data
cleaning okay because I learned that um the
validity is for a week I
think that's um that's the license yes
that's correct unfortunately we
don't have control over that
that's an external
agency I wish they could give it to us for one month
but they could only I mean
these things cost money and time so I
understand why they can only issue it
for a
week okay so just for us to use it
ourselves yes exactly yes and then if you
want you have to go and buy but that
has nothing to do with us
and that's just Stata as a company
um all right um was nice chatting with
you guys and I look forward to seeing
you next week enjoy the rest of your
day thanks Dr Agaku you're welcome bye
and thanks
everyone