Saturday, August 13, 2011

STUDENT TEST SCORES AND EVALUATION

Associated Administrators of Los Angeles Update | http://bit.ly/pAALes

Week of August 15, 2011 – Superintendent John Deasy is moving rapidly to change the evaluation system of LAUSD teachers and administrators through the use of the Educator Growth and Development Program. His plans grew out of the recommendations of the Teacher Effectiveness Taskforce and the subsequent Board of Education Resolution for Action (September 2, 2010). Based, in part, upon the assumption that improved evaluation will result in improved student achievement, the Superintendent and his team developed a three-year plan for implementation. As mentioned in last week's Update (week of August 8, 2011), Phase I, Research & Development, took place in 2010-2011. Phase II, Initial Implementation (aka the pilot), will take place this year, 2011-2012. Phase III, Scale (i.e., full implementation Districtwide), is scheduled for 2012-2013.

The new evaluation systems are designed to use multiple measures, including reviews of practice, which will incorporate self-reviews and individual growth plans; formal observations with pre- and post-conferences; informal observations; stakeholder feedback through a variety of surveys; the yet-to-be-defined contributions to the school community measure; and student achievement test results in the form of value-added data (which the District calls Academic Growth Over Time, or AGT). Teachers' evaluations will be shaped by the new Teaching and Learning Framework, which will be used to rank teachers' performance levels. Similarly, administrators' evaluations will be based upon the Leadership Framework, which is still in development.

As noted in last week's Update, AALA filed an unfair labor practice complaint with the Public Employment Relations Board (PERB) because the District failed to bargain in good faith on the evaluation of AALA members. Simultaneously, we are in negotiations with the District on the implementation of Phase II of the Educator Growth and Development Program. As further stated last week, AALA advocates the improvement of evaluation. However, we have some concerns with the District's current plans for evaluating AALA members. First, any new system of evaluation must be negotiated. Second, the system must be fair and equitable. Third, adequate resources and support must be provided. Fourth, the work must be doable within the time allocated. Fifth, data used for evaluation purposes must be valid and reliable. This fifth concern calls into question the District's plan to use AGT for the evaluation of administrators and teachers.

Standardized test scores have many beneficial uses, such as helping determine individual student needs, assisting with grade-level and subject-matter planning, and guiding implementation of the school's instructional program. They may be used to assist with the differentiation of instruction, provide support and assistance to teachers, and plan professional development. Certainly, principals already share test data with teachers at the beginning of each school year to inform their teaching, ensure adherence to State standards, improve knowledge of subject matter, align marking practices, and set the stage for interaction with students and communication with parents.

As valuable as students’ standardized test data may be, they were not designed to be used for the evaluation of teachers and administrators. Nevertheless, the District intends to use AGT in the pilot program being implemented in 114 schools this year and in the District as a whole next year. The District justifies this use of the data by insisting that this year’s pilot evaluation is “no stakes” and will serve as a way to fine-tune the new evaluation tools and process.

It is unfortunate that the District's senior leadership apparently has not taken time to review in depth the findings of leading education scholars who challenge the use of AGT for the evaluation of teachers and administrators. Such scholars include, among others:

• Eva L. Baker, professor of education at UCLA and co-director of the National Center for Research on Evaluation, Standards, and Student Testing (CRESST)

• Paul E. Barton, former director of the Policy Information Center of the Educational Testing Service and associate director of the National Assessment of Educational Progress

• Linda Darling-Hammond, professor of education at Stanford University

• Edward Haertel, professor of education at Stanford University

• Helen F. Ladd, professor of Public Policy and Economics at Duke University

• Robert L. Linn, distinguished professor emeritus at the University of Colorado and chair of the National Research Council’s Board on Testing and Assessment

• Diane Ravitch, research professor at New York University and historian of American education

• Richard Rothstein, research associate of the Economic Policy Institute

• Richard J. Shavelson, professor of education (emeritus) at Stanford University and former president of the American Educational Research Association

• Lorrie A. Shepard, dean and professor, School of Education, University of Colorado at Boulder

Collectively, this group of experts authored Economic Policy Institute (EPI) Briefing Paper #278 entitled, PROBLEMS WITH THE USE OF STUDENT TEST SCORES TO EVALUATE TEACHERS.

Some of their findings follow:

1. There is broad agreement among statisticians, psychometricians, and economists that student test scores alone are not sufficiently reliable and valid indicators of teacher effectiveness to be used in high-stakes personnel decisions, even when the most sophisticated statistical applications, such as value-added modeling (VAM), are employed. For a variety of reasons, analyses of VAM results have led researchers to doubt whether the methodology can accurately identify more and less effective teachers. VAM estimates have proven to be unstable across statistical models, years, and the classes that teachers teach.

2. One study found that across five large urban districts, among teachers who were ranked in the top 20% of effectiveness in the first year, fewer than a third were in that top group the next year, and another third moved all the way down to the bottom 40%. Another found that teachers' effectiveness ratings in one year could predict only 4% to 16% of the variation in such ratings in the following year, which corresponds to year-to-year correlations of roughly 0.2 to 0.4. Thus, a teacher who appears to be very ineffective in one year might have a dramatically different result the following year. (A toy simulation following this list illustrates this churn.)

3. A study designed to test the validity of such estimates used VAM methods (the same family of models as AGT) to assign effects to teachers after controlling for other factors, but applied the model backwards to see whether credible results were obtained. Surprisingly, it found that students' fifth-grade teachers were good predictors of their fourth-grade test scores. Inasmuch as a student's later fifth-grade teacher cannot possibly have influenced that student's fourth-grade performance, this curious result can only mean that VAM results reflect factors other than teachers' actual effectiveness, such as the nonrandom way students are assigned to classrooms. (The sketch following this list includes a simplified version of this backwards test.)

4. For these and other reasons, the research community has cautioned against heavy reliance on test scores. Even the Board on Testing and Assessment of the National Research Council of the National Academy of Sciences stated, "…VAM estimates of teacher effectiveness should not be used to make operational decisions because such estimates are far too unstable to be considered fair or reliable."

5. RAND Corporation researchers reported that "the estimates from VAM modeling of achievement will often be too imprecise to support some of the desired inferences" and that "the research base is currently insufficient to support the use of VAM for high-stakes decisions about individual teachers or schools."

6. Teachers’ value-added evaluations in low-income communities can be further distorted by the summer learning loss their students experience between the time they are tested in the spring and the time they return to school in the fall.

7. For these and other reasons, even when methods are used to adjust statistically for student demographic factors and school differences, teachers have been found to receive lower "effectiveness" scores when they teach new English learners, special education students, and low-income students than when they teach more affluent and educationally advantaged students.
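
To make these statistical findings concrete, the toy simulation below sketches both the year-to-year instability described in findings 1 and 2 and the mechanism behind the backwards test in finding 3. It is a minimal illustration, not the District's AGT model, whose exact specification is not described here; the teacher count, class size, and noise levels are invented for the example. As in simple value-added models, a teacher's "effect" is estimated as the average residual of his or her students after regressing current-year scores on prior-year scores.

```python
import numpy as np

rng = np.random.default_rng(0)

N_TEACHERS = 100   # hypothetical district; all parameters are illustrative
CLASS_SIZE = 25

def simulate_year(true_effect, rng):
    """One year of scores: current = prior + true teacher effect
    + classroom-level shock (peer mix, luck) + individual student noise."""
    prior = rng.normal(0.0, 1.0, (N_TEACHERS, CLASS_SIZE))
    class_shock = rng.normal(0.0, 0.3, (N_TEACHERS, 1))
    noise = rng.normal(0.0, 0.8, (N_TEACHERS, CLASS_SIZE))
    return prior, prior + true_effect[:, None] + class_shock + noise

def vam_estimate(prior, current):
    """Toy value-added estimate: mean residual per teacher after a
    pooled OLS regression of current-year on prior-year scores."""
    slope, intercept = np.polyfit(prior.ravel(), current.ravel(), 1)
    return (current - (slope * prior + intercept)).mean(axis=1)

# Findings 1-2: true teacher effects are held FIXED across both years,
# yet the estimated ratings churn from one year to the next.
true_effect = rng.normal(0.0, 0.2, N_TEACHERS)
est_y1 = vam_estimate(*simulate_year(true_effect, rng))
est_y2 = vam_estimate(*simulate_year(true_effect, rng))

r = np.corrcoef(est_y1, est_y2)[0, 1]
print(f"year-to-year correlation: {r:.2f} (R^2 = {r**2:.0%})")

top_y1 = est_y1 >= np.quantile(est_y1, 0.8)
top_y2 = est_y2 >= np.quantile(est_y2, 0.8)
print(f"year-1 top-20% teachers still in top 20% in year 2: {top_y2[top_y1].mean():.0%}")

# Finding 3 (backwards test): when students are tracked into classrooms by
# prior achievement, classrooms differ sharply in scores earned BEFORE the
# students ever met their new teacher, and a naive model credits (or blames)
# that teacher for the difference.
grade4 = rng.normal(0.0, 1.0, N_TEACHERS * CLASS_SIZE)
tracked = np.sort(grade4).reshape(N_TEACHERS, CLASS_SIZE)       # ability grouping
randomly = rng.permutation(grade4).reshape(N_TEACHERS, CLASS_SIZE)
print(f"spread of grade-4 class means, random assignment:  {randomly.mean(axis=1).std():.2f}")
print(f"spread of grade-4 class means, tracked assignment: {tracked.mean(axis=1).std():.2f}")
```

With these invented parameters, the year-to-year correlation of the estimates lands near 0.25 (an R² of roughly 6%, inside the 4% to 16% range cited above), only about a third of the year-1 top-20% teachers repeat in year 2, and tracking spreads classroom prior-year means several times wider than random assignment would, which is exactly the kind of pre-existing difference a naive model attributes to the teacher.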

AALA has further concerns about the evaluation of administrators that are beyond the scope of this piece, but of great importance to our members. We plan to explore these matters in future issues of Update.
