what are standardized tests and how are they used?
tests that are usually commercially prepared for nationwide use and designed to provide accurate and meaningful information on students’ performance RELATIVE TO THAT OF OTHERS AT THEIR AGE OR GRADE LEVEL. Ex. GRE, SAT, ACT
they are given under the same conditions every time; this includes the same directions, same mode of response, same time limits, same rules about taking breaks, same order in which the test is given, etc.
what are norms?
standards that are derived from the test scores of a sample of people who are similar to those who will take the test –Allows you to compare an individual’s score to the average score of that sample (the norming group)
a test designed to measure general ABILITIES and to PREDICT future performance. Aka ability tests; meant to depend less on specific schooling than achievement tests do –Typically norm referenced –Can be broad or narrow: general intellectual aptitude, mechanical ability, perceptual abilities
standardized tests measuring how much students have learned in a given context. Can be norm referenced (compared with students their age) or criterion referenced (mastery of specific skills to a standard)
how is iq measured?
Alfred Binet developed the first intelligence (IQ) test,
which tested for mental characteristics and skills such as memory, knowledge, problem solving and vocabulary
used to be: (mental age / chronological age) x 100
now: the mean (norm or average score) is 100, with a standard deviation of 15; intelligence is normally distributed
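A minimal sketch of the two scoring schemes above, in Python. The function names `ratio_iq` and `deviation_iq` are made up for illustration, and expressing the modern score through a z-score (number of standard deviations from the mean) is an assumption about how the mean-100, SD-15 scale is applied, not something stated in these notes.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Historical ratio IQ: (mental age / chronological age) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


def deviation_iq(z_score: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Score on a scale with the given mean and SD (assumed 100 and 15)."""
    return mean + sd * z_score


# A child with a mental age of 10 and a chronological age of 8:
print(ratio_iq(10, 8))    # 125.0
# Someone one standard deviation above the mean:
print(deviation_iq(1.0))  # 115.0
```

A mental age equal to the chronological age gives exactly 100 under the ratio formula, which matches the mean of the modern scale.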
what is intelligence?
general aptitude for learning, often measured by ability to deal with abstractions and to solve problems
what is mental age?
the average test scores received by individuals of a given chronological age
what is chronological age?
the age of an individual in years
what are multifactor aptitude batteries?
a test that predicts ability to learn a variety of specific skills and types of knowledge
breakdown into more specific skills; range of content and emphases. Ex. –SAT –Differential Aptitude Test –Cognitive Abilities Test –Test of Cognitive Skills
standardized tests that include several subtests designed to measure knowledge of particular subjects Ex. California Achievement Test, Iowa Test of Basic Skills
tests of specific skills used to identify students’ needs and to guide instruction; available mostly for reading and math
produce much more detailed information than other achievement tests. A typical math achievement test may give scores for math computation, concepts, and applications; a diagnostic test would give scores on much more specific skills, e.g., adding decimals, multiplying fractions, long division, etc. –diagnostic tests are typically designed to show areas of strength and weakness in skills, which is very useful
cut off scores
the score designated as the minimum necessary to demonstrate mastery of a subject
values computed from raw scores (number of items correct) that relate students’ performances to those of a norming group
what is a percentile score and what does it tell you?
indicates the percentage of students in the norming group who scored lower than a particular score
a percentile of 33 indicates that the person scored higher than 33% of others who took the test; a percentile score of 90 says that they scored better than 90% of people. A change in percentile rank is not equivalent to a change in raw score. For example, a 5 point increase in raw score could make a large difference in the middle of the distribution (e.g., 15 percentile points), but a 5 point increase once you are at the 90th percentile might be only a 1 point increase in percentile rank
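The percentile idea can be sketched in Python as below. The norming-group scores, the function name `percentile_rank`, and the raw-score distribution (normal, mean 50, SD 10) are all hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def percentile_rank(score, norming_scores):
    """Percentage of the norming group that scored lower than `score`."""
    below = sum(1 for s in norming_scores if s < score)
    return 100 * below / len(norming_scores)


# Hypothetical raw scores from a norming group of 10 students.
norm_group = [12, 15, 18, 20, 20, 22, 25, 27, 30, 35]
print(percentile_rank(25, norm_group))  # 60.0

# Why equal raw-score gains are not equal percentile gains:
# assume raw scores are roughly normal with mean 50 and SD 10.
raw = NormalDist(mu=50, sigma=10)
middle_gain = (raw.cdf(55) - raw.cdf(50)) * 100  # 5 points near the mean
tail_gain = (raw.cdf(68) - raw.cdf(63)) * 100    # 5 points in the upper tail
print(middle_gain > tail_gain)  # True
```

The same 5 raw points move a student past far more of the group near the mean, where scores are crowded together, than out in the tail.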
what is a grade equivalent score?
indicates at what grade level norm the child performed
If a fourth grade child scored at a seventh grade level, he scored in the same range as a seventh grade student would on the same material; this is only a rough approximation
A bell shaped, symmetrical distribution (bell curve), in which most scores fall near the mean, with progressively fewer occurring as the distance from the mean increases. Ex. weight and height, IQ
68% score within 15 points (one standard deviation) of the mean, aka average score: since IQ is normally distributed with an average score of 100, 68% score between 85-115 -95% score within 30 points: 70-130 -and because intelligence is normally distributed, the number of people with scores lower than 70 (intellectual disability range, formerly MR) or higher than 130 (highly intelligent) is very small
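These proportions can be checked with Python's standard library. This is only a sketch: it assumes IQ follows an exact normal distribution with mean 100 and SD 15, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed model: IQ is exactly normal with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

within_1sd = iq.cdf(115) - iq.cdf(85)   # share scoring 85-115
within_2sd = iq.cdf(130) - iq.cdf(70)   # share scoring 70-130
below_70 = iq.cdf(70)                   # share scoring below 70

print(round(within_1sd * 100, 1))  # 68.3
print(round(within_2sd * 100, 1))  # 95.4
print(round(below_70 * 100, 1))    # 2.3
```

By symmetry, the share above 130 equals the share below 70, so only about 2% of people fall in each tail.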
a statistical measure of the degree of dispersion in a distribution of scores; indicates how spread out the distribution will be
roughly the typical amount by which scores deviate from the average score. -For IQ, the average score is 100 with a standard deviation of 15, so about 68% of scores fall between 85-115.
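As a worked illustration, the standard deviation can be computed step by step in Python. The score list and the function name `population_sd` are hypothetical, and this sketch uses the population formula (dividing by the number of scores).

```python
from math import sqrt


def population_sd(scores):
    """Square root of the average squared deviation from the mean."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return sqrt(variance)


# Hypothetical quiz scores with a mean of 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(scores))  # 2.0
```

Scores bunched tightly around the mean give a small standard deviation, and widely scattered scores give a large one.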
what is a stanine?
(aka standard nine) a type of standardized score ranging from 1-9, having a mean of 5 and a standard deviation of 2
reported as whole numbers; each stanine spans 0.5 standard deviation
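A minimal sketch of converting a score to a stanine, in Python. It assumes the common convention that stanine 5 is centered on the mean, each band is half a standard deviation wide, and the extremes are clamped to 1 and 9. The function name `stanine` and the default IQ-style scale (mean 100, SD 15) are illustrative choices.

```python
from math import floor


def stanine(score: float, mean: float = 100.0, sd: float = 15.0) -> int:
    """Map a score to a stanine (1-9) using half-SD-wide bands."""
    z = (score - mean) / sd
    # Stanine 5 covers z from -0.25 to +0.25; each step of 0.5 in z
    # moves one stanine. floor(2z + 5.5) rounds 2z + 5 to the nearest band.
    band = floor(2 * z + 5.5)
    return max(1, min(9, band))


print(stanine(100))  # 5
print(stanine(115))  # 7
print(stanine(70))   # 1
```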
what is validity?
A measure of the degree to which a test is appropriate for its intended use or purpose: DOES THE TEST test what it's supposed to be testing?
what is content evidence(validity)? ***most important aspect of validity
degree to which the content of a test matches the content of instruction
assessment of the degree of overlap between what is taught and what is tested
degree to which performance on a test is related to scores on another test measuring the same trait
–Do scores on this test predict future performance? –Are scores on this test similar to scores on another test which assesses the same thing?
Ex. Does score on SAT predict GPA?
a type of criterion related evidence that exists when scores on a test are related to scores from a measure of a trait that the test could be used to predict
might help predict future behavior
a type of criterion related validity that exists when scores on a test are related to scores from another measure of the same or very similar traits
similar tests yield comparable results
a type of evidence about validity that exists when scores on a test are unrelated to scores from one or more measures of other traits when educational or psychological theory about these traits predicts that they should be unrelated
(for example, a mechanics test should show that they can put together a machine, but shouldn't necessarily correlate with their verbal aptitude)
The accuracy or CONSISTENCY of a set of scores on a measure; the degree to which you trust the scores on a test are accurate and not due to error.
All measurement has some degree of error: there is no such thing as perfect reliability. —reliability talks about the test, NOT the person
An undesirable characteristic of tests in which item content reliably discriminates against certain students on the basis of race, SES, ethnicity or gender –Items assess knowledge or skills that favor one group over another (e.g., disadvantaging students with limited English proficiency or minority students)
purpose of NCLB legislation?
–instruction by scientifically based research
–supplement state and local efforts to improve education for all children and eliminate the achievement gap between students from different backgrounds
–must have established subject content standards for reading, math and science
intended to move all children to success on their state standard by 2014
NCLB assessment provisions
each year the state must administer tests in reading, language arts, and math (and science) to students in grades 3-8 and in one year of HS (performance on these tests shows a school's proficiency)
–95% of students must take the high stakes tests –disaggregated reporting of scores (report scores for each school by subgroup)
–schools must make adequate yearly progress (AYP)**
–consequences for not meeting AYP (depends on how many years failing)
problems /criticisms of NCLB
–presents real obstacles to helping students and strengthening public schools because it focuses on sanctions rather than assistance, mandates rather than support for effective programs
–severe accountability measures can lead schools to focus on a limited set of skills at the expense of, for example, social studies, art and music
–accountability pressures can lead to cheating, or to the use of practices that will increase scores without increasing learning
1) excessive and narrow testing (losing social studies, art, music)
2) state to state variations in standards
how has NCLB impacted education
71% of districts reported reducing time in at least one subject to provide more time for reading & math
Respondents divided over whether this was good or bad —Some thought it shortchanged students —Others thought it helped them help low-achieving students
–Bad effects are documented, but research doesn’t tell us how widespread they are (e.g., test anxiety; effects on students with disabilities) –More analysts find a positive relationship to achievement than don’t, especially for low-income students –It’s hard to separate the effects of high-stakes testing from the effects of different accountability mechanisms
how are NCLB accountability pressures contaminating assessment procedures?
teaching to the test can be: –Good if it means teaching a curriculum aligned to state standards and emphasizing higher order skills
–Bad if it means memorizing the content on test items and emphasizing test-taking skills
a statement of skills or concepts that students should master after a given period of instruction
breaking tasks down into fundamental subskills 1) identify prerequisite skills 2)identify component skills 3) plan how component skills will be assembled into the final skill
planning instruction by first setting goals, then setting unit objectives, and finally planning daily lessons
objectives that deal with students' attitudes and values (because they affect how a student learns)
important to keep these in mind as well as cognitive objectives
how to write clear, specific objectives
specific – well defined skills or information with a single right answer, with a criterion stated as a number correct out of a total
clear – specific enough to be meaningful; avoid vague verbs (know, understand, appreciate, grasp, enjoy); use action verbs (write, recite, identify, sort, solve, construct)
1) knowledge (recalling information)
2) comprehension (translating, interpreting, or extrapolating) – e.g., interpreting diagrams, graphs, etc.
3) application (using principles to solve real life problems) – e.g., using geometric principles to fill a pool
4) analysis (breaking into simpler parts) – e.g., comparing and contrasting
5) synthesis (creation of something new) – e.g., writing a composition, or a sentence in a foreign language
6) evaluation (judging against a given standard) – e.g., comparing on multiple aspects
Norm referenced evaluations
assessments that focus on the comparisons of a student’s scores with those of other students the same age; level of performance determined by relative position in some known group –gives you a relative percentile or rank
DISADVANTAGES –Poor test-teach overlap –Inconsistent content sampling among different tests (tests aren't always equivalent) –Often used to make important decisions (retention, change of placement) –Poor performance may be due to lack of instruction in areas measured by the test
criterion referenced evaluations
assessments that rate how thoroughly students have mastered specific skills or areas of knowledge; assesses students’ mastery of skills & subskills in a domain –Compares student performance to an established standard
Many statewide tests use a criterion-referenced approach, with levels such as –Advanced –Proficient –Below Basic. It is often unclear how “mastery” criteria are established (80%, 75%, etc.): why is mastery set at the level it is?
ADVANTAGES –Does not rely on normative comparisons –Directly tied to curriculum domain –Easily interpretable DISADVANTAGES –Not useful in eligibility determination –Not useful in continuous progress monitoring –Limited item sampling of a particular content domain
Table of Specifications
a list of instructional objectives and expected levels of understanding that guides test development
selected response items
test items which respondents can select from one or more possible answers, without requiring the scorer to interpret their response –Multiple choice –True/false –Matching
incorrect responses that are offered as alternative answers to a MC question (wording SHOULD NOT give them away as wrong answers; they should be plausible, NOT tricky)
common mistake teachers make in constructing essay questions
failure to specify clearly the approximate detail required in the response and its expected length (the number of possible points doesn't indicate the expected length)
assessment of a collection of the student’s work in an area showing growth, self-reflection and achievement E.g. writing portfolio
Assessment of student’s ability to perform tasks in real-life contexts, not only to show knowledge. AKA authentic assessments E.g. driving test, dissertation
Degree to which any assessment procedure contributes to beneficial or improved outcomes for individuals
Use of test should be linked with students learning and/or behaving better
3 components for treatment validity
Inform instructional decisions –Does this test help me select an appropriate intervention? –Does the test help me target areas of difficulty? –Does the information allow me to monitor the student’s progress and make appropriate changes to my instruction?
Make a difference in student’s outcomes —Do the instructional decisions made on the basis of this assessment lead to better learning outcomes for students?
Lead to better educational programs –Does the assessment information allow you to determine whether this is an effective curriculum for your class?
measurement of student performance in academic and other areas; used to determine appropriate teaching and behavior support strategies
Research finds that students learn more in courses that use tests than in those that do not
ex. tests, graded assignments, quizzes
Student evaluation serves 6 primary purposes:
1. Student feedback – should be SPECIFIC 2. Feedback to teachers – should be frequent 3. Information to parents 4. Information for selection (of an intervention, or qualifying for a job) 5. Information for accountability (how do teachers, schools, and school districts measure up) 6. Incentives to increase student effort
asks how well ARE YOU DOING and how can you BE DOING better —-Conducted on a periodic, ongoing basis —-On-the-go decisions (allows for immediate change of instruction) —-brief and frequent
ex. 4 tests and 4 quizzes: repeated administration of tests; performance feedback given after each test; set goals after each test to enhance performance; reward students for meeting their goals; do it all over again!
asks how well DID YOU DO; tests of student knowledge at the end of instructional units
–Not frequent, but must be reliable and allow for comparisons among students –should be tied to formative evaluation and course objectives
ex. cumulative final
—Individualized decision making for academic skill development —Conducted frequently with brief assessments to monitor progress in response to instruction or intervention (monthly)
Designed to: 1)Estimate rates of improvement 2)Identify students who are not demonstrating adequate progress 3)Compare the efficacy of different forms of instruction —-Thereby design more effective, individualized instructional programs for struggling learners
a set of standardized, brief (1-5 min) tests used by educators to evaluate the effects of their instructional interventions in the basic skills of reading, mathematics, spelling and written expression (Shinn, 2004)
Provides an easy and QUICK, DATA (EMPIRICAL) DRIVEN method of gathering student progress data –Teachers and students get immediate feedback –Teachers can analyze student scores and adjust student goals and instructional programs –Student data can be compared to the teacher’s classroom or school district data
basics of CBM
1) CBM tests (probes) are brief and easy to administer
2) All tests are different, but assess the same skills at the same difficulty level
3) CBM scores are graphed for teachers to use to make decisions about instructional programs and teaching methods for each student
4) CBM monitors student progress throughout the school year
5) Students are given probes at regular intervals (weekly, bi-weekly, or monthly)
6) Teachers use student data to quantify short- and long-term goals that will meet end-of-year goals – e.g., by the end of 1st grade, Steven will read 30 words a minute; Steven currently reads 10 words a minute
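The goal-setting step can be sketched in Python using the Steven example. The 20 weeks remaining, the weekly probe scores, and both function names are hypothetical. Using a least-squares slope as the rate of improvement is one common way to summarize graphed CBM data, not something these notes specify.

```python
def needed_weekly_gain(current, goal, weeks_left):
    """Gain per week required to reach the goal (slope of the goal line)."""
    return (goal - current) / weeks_left


def observed_weekly_gain(scores):
    """Least-squares slope of probe scores against week number 0, 1, 2, ..."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two probes to estimate a rate")
    weeks = range(n)
    mean_w = sum(weeks) / n
    mean_s = sum(scores) / n
    num = sum((w - mean_w) * (s - mean_s) for w, s in zip(weeks, scores))
    den = sum((w - mean_w) ** 2 for w in weeks)
    return num / den


# Steven reads 10 words a minute now; the goal is 30 by the end of the year.
needed = needed_weekly_gain(current=10, goal=30, weeks_left=20)  # assumed 20 weeks
probes = [10, 10, 11, 11, 12]  # hypothetical weekly probe scores
observed = observed_weekly_gain(probes)
on_track = observed >= needed

print(needed, round(observed, 2), on_track)  # 1.0 0.5 False
```

Here the observed rate falls short of the goal line, which is the kind of signal a teacher would use to adjust instruction.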
how often do I give CBMs?
Depends on the purpose
–BENCHMARK: all students are tested, typically 3 times a year (used as universal screening to identify students at risk for failing and therefore in need of more intensive intervention)
–STRATEGIC MONITORING: once a month for struggling students, as you are modifying instruction
–FREQUENT PROGRESS MONITORING: weekly for students needing the most intensive interventions
7 principles of achievement testing
1) Tests should measure clearly defined objectives that were actually taught
2) Tests should measure a representative sample of the concepts or skills taught
3) The types of items on the test should be appropriate for measuring desired learning outcomes
4) Tests should fit the purposes of their results
5) All tests contain some degree of measurement error
6) Tests should be used to improve learning (immediate feedback, instructional decisions made from test results)
7) Tests should be fair
multiple choice items
test items that usually consist of a STEM (question or partial statement) followed by choices or alternatives (one correct choice and 3-4 distractors)
a form of multiple choice test item, most useful when a comparison of two alternatives is called for; use the guidelines for multiple choice items
—Disadvantage: students have a 50% chance of guessing correctly. —Advantage: students can respond to them more quickly and cover a broad range of content efficiently
test items that are presented in two lists, each item in one list matching one or more items in the other list;
–each list should cover related content; –more items should be included in list B than list A (so they can't simply cross out letters to figure out the answer)
fill in the blank test items
Advantage: they can reduce the element of test-wiseness to near zero; especially useful in math; avoid ambiguity where more than one answer fits
problem solving items
test that calls for organizing, selecting, and applying complex procedures that have at least several important steps or components
You should begin by writing either a model response or outline of the essential components or procedures that are involved in problem solving
bias due to carryover of a general attitude about a respondent
Ex. A teacher knows which student wrote which response and her opinion of the student alters the grading