Blog • Dorothy R. Howard

March 23, 2018

Deaccessioning Material from Web Archives
Full edited version of my panel talk for 'The Right to Be Forgotten', excerpted at the live event due to time constraints

The National Forum on Ethics and Archiving the Web at the New Museum, New York, NY

Yesterday, in the 'Web Archiving as Civic Duty' panel, Amelia Acker said something very pertinent to what I want to talk about, which I pray I’m characterizing accurately. Acker asked: what can web archives learn from the practices of traditional archives of material culture? This provocation was further put into practice in conference presentations suggesting that the traditional library and information science (LIS) principles of provenance and respect for original order are useful for working through problems in web archiving.

I’ve long had discussions with information workers about the complex and varied processes of removing collection holdings, including material objects and artworks, archival materials, and items from circulating library collections. One common way that removal is talked about in LIS is through the concept of deaccessioning.

There are similarities between the processes of removing physical books and items from collections and the different processes of getting rid of items, metadata, and data from digital archival collections. Focusing on deaccessioning practices in web archives can reveal relationships between institutions, codes of professional ethics, ascriptions of authorship, labor, the politics of discoverability, and the right to be forgotten. Deaccessioning is also a useful concept for thinking through consent issues particular to the donation and removal of openly licensed material, and the difficulty of identifying, tracing, and retracting the uses of openly licensed items.

An example of a debate about deaccessioning that applies to both traditional print archives and digital collections is the politics of requests for removal. Whether we are talking about print or digital collections, requests for the removal of material happen for a great number of reasons. Requesting removal is a kind of materially manifest maintenance labor that can require specific knowledge to accomplish, including language, knowing that content containing you actually exists, and the ability to navigate intentionally obfuscated licensing rules.

One justification for deaccessioning in traditional archives is that the removal of physical works can free up library resources and space for other uses: for example, selling unused objects to buy objects of greater value, or freeing up limited library stack space. How can we also think about this conversation over freeing up space in the context of digital libraries? Could we get to a point where it is necessary to remove data because of limited resources? Moore’s Law would seem to defy that logic. On the other hand, if libraries did start storing things like ebooks locally instead of buying subscriptions, maybe there would come a point where there was so much data that some of it needed to be weeded.

Practices of deaccessioning involve considerations about value. Perhaps whether something is ‘in use’ should not be the primary concern in deciding whether it should stay. Items in digital archives can certainly have use even if humans never see them, especially given the question Chris Bourg, Director of MIT Libraries, has asked: What happens to libraries and librarians when machines can read all the books? Bourg argues that machines and algorithms are a new kind of library patron that do not replace human patrons but certainly have important and different needs.

There are new kinds of problems and considerations regarding practices of deaccessioning web materials from digital archives. We can learn from the complex set of issues traditional collections have already traversed regarding institutional policy and regulation, cultural questions of value, belonging, and ownership, relationships to publics, and the cost-benefit analysis done in resource management and preservation.

Some aspects of deaccessioning that I think are specific to web archives include: the removal or lack of removal of metadata; data legacy and data maintenance; issues particular to consent and the scale of circulation; the specificity and complexity of overlapping international laws regarding the right to be forgotten; and the technical skill and legal expertise required to demand takedowns, along with the discoverability needed to find images of ourselves in the first place. Also important is the common scenario in which attribution is still not given even when the open license on the material, such as a Creative Commons license, requires it.

Another aspect that I think web archives have a responsibility to consider is how they participate in violations of consent by saving webpages and, as our technologies get better, by also saving networked data and replicating code itself. For example, I have teenage poems on the internet, at a location I am not challenging you to find, that are currently preserved on a Wayback Machine capture from over five years ago. I don't want them there and didn't consent to having them saved. Is it my fault for putting them up? Should I have assumed that because I put them online, they might be there forever?

Some have pushed back against certain reasons people might have for requesting that images be removed, one of which is vanity. Notions of vanity are of course changing as we understand selfies to be a way that people perform and experiment with their gender identity, and as we recognize the patriarchal male gaze embedded in accusations of vanity. Does saying “vanity isn’t enough,” out of a utilitarian preference for the Western historical record, ignore the ways that our presentation of self and impression management might affect our ability to get jobs or to be protected against threats of harm? Consider, for example, the violence directed towards trans people in some regions, where externally saved evidence of your gender expression might put you in danger.

This has to do with the temporal aspect of putting free licenses on works, and also with the temporal and sometimes one-time nature of granting consent for something to enter the public domain. Let’s say someone uploaded an image of another person to Wikimedia Commons, and the formal consent process was done correctly and consensually. Five years later, that person’s gender identity has shifted. Wikimedia Commons does not let you reverse a decision to upload something or to put an open license on it unless there is a significant, policy-abiding reason.

This issue of images uploaded to Wikimedia and the granting of consent has probably been most debated in WikiProject Medicine, in scenarios where photographs of people have been used to represent various medical conditions. We can also see how this becomes a problem if someone grants consent and their image is then used in a Wikipedia article or another space to represent some racial identity, phenotypic characteristic, or stereotype: forms of violence that operate differently from what might be grounds for deletion from the public archive. In sum, how do we weigh cases in which people want things removed for vanity’s sake, to avoid harm, or to cover up important evidence of their wrongdoing?

Another thing I want to talk about is the embodied and situated working practices of individuals maintaining web archives and how the structuring of labor within the workplace influences the capacities of the workers to uphold the principles and protections that institutions are meant to ensure.

In particular, let's consider the relationship between the policies and protections web archives promise regarding affirmative, conscious, and voluntary consent-seeking practices, and how these policies are enacted in mundane actions, like emails and daily decisions, by archivists and maintainers who are often taxed for time and resources. Let's also consider the compromises we make, which are often based on our own positionality, on intuitions drawn from our interpretation and weighing of multiple and sometimes conflicting risks and interests, and on utilitarian evaluations: “well, this risk exists, but it’s worth it for historical memory’s sake.” What are you assuming about its importance as a historical artifact, and who is accountable for that decision?

We can think about how policies are executed based on the training, education, and expertise of the individuals doing archival work and of those mandated to abide by various web archiving codes of ethics. For example, an institution that relies on disempowered, unpaid interns to do important but mundane work like entering metadata is going to run into problems in applying institutional policies, such as identifying works that require consent, or bias in the metadata entered.

How are policies applied in the interstitial time when someone is being trained to handle important material, or has just taken a job and cannot yet carry out the necessary workflows? These might seem like edge cases, but the stakes are high, and we should be the people thinking about how to ensure accountability and consent, including in responding to requests for removal, especially because of the risk of establishing precedent.

There is also the loss of institutional memory as jobs are handed from one person to the next. Institutional knowledge is literally stored in the bodies of workers, accumulated over time and existing in their situated intuitions, their tool use, and their understanding of how to navigate an institution's social politics and technical workarounds.

Thinking about the labor of maintaining archives can be useful in working through some complicated issues of consent. In this context, there are tiers of obligations and responsibilities for volunteer laborers contributing to web archiving projects, and these affect how institutional deaccessioning guidelines are fulfilled. Volunteer labor changes workers' capacity and interest in fulfilling the ethical obligations of certain types of information work, applying policies, and investigating violations. This can have huge ethical consequences. For example, if images of protests are circulated by activists, there is a potential risk for those who want their identities hidden or who might suffer legal consequences for being at a protest, such as violating parole.

Free labor is something that people building community archives and mobilizing against hegemonic collecting strategies and Western forms of historicization are certainly familiar with. When communities take archiving into their own hands, they also take on additional labor, and labor has a lot to do with the ways that policies are enacted. Without mainstream institutional support, small archives often rely on volunteers, young and old, who altruistically shoulder the significant labor involved in self-representation, yet who might not have been trained in certain processes of seeking and granting consent.

Another labor issue is the very tedious tagging and labeling of metadata and other information in web archives, which is important for the discovery of that material by the people it is about. When people are able to search online for themselves, or for things they have been involved in or produced, they are also able to hold institutions more accountable. We can acknowledge that automation is helping with this. On Wikipedia, for example, volunteer labor is being offset by bots able to detect and flag everything from copyrighted images uploaded to Wikimedia Commons to unsourced or plagiarized text in Wikipedia articles. Yet automation also creates different forms of human labor, like checking to make sure the bots did their work OK.

Institutional policies are in one sense aspirational. They project us not as we are, but as the governing body says we should be in its particular context. Western democratic values, as applied to institutional policy-making and law, are entrenched in utilitarianism, which continues to enable institutions to justify their asymmetrical application of rights and their scapegoating of minority populations. This came up in many of yesterday’s panels, in the context of weighing potential harm against the value of, and argued need for, the material existing in the public, historical record.

Policies are also unrealistic in other ways, as it sometimes takes us a long time to understand the ethical aspects of our work; failures become obvious to us only over time. Learning happens as we unfold imaginaries of the potential uses of our technologies, test them, apply statistical models, and so on. Lest we forget, our models will never be good enough to predict every possible outcome of use, given the relationships between institutions, codes of professional ethics, formal and informal agreements, and the press, which conditions and shapes archival policy and institutional frameworks. Policies also leave room for interpretation in how they are applied and in how retribution for negligence is carried out.

Yesterday, Safiya Noble posed a question that was particularly striking to me: Can an algorithm be queer? This is an important provocation. It moves me to think about the ways that feminist research epistemologies, feminist information science, and feminist ethnographic methods take an interest in reflecting on positionality; in the failures and hypocrisies of how researchers do their work, arising from power asymmetries of researchers versus subjects studied, archivists versus the archived, and the living versus the dead; and in the varied moralities we apply as we debate the violence and risks of access, even amid the revelatory, equalizing, and educational aspects of information availability.

I think we should also think about the consent required for people to donate material to the public domain, and about the byzantine, obfuscated terms of service agreements of social media platforms, applications, and online services. Even the most technically and legally savvy would hardly have the information needed to contextualize the legalese, the policy changes that arrive with software updates, or the fact that these licenses are written vaguely by lawyers who draft statements built on loopholes that would likely be upheld were they legally challenged. Pushing back against this, Creative Commons has moved toward human-readable, icon-based, and machine-readable licenses to confront these problems. However, simplicity can sometimes backfire, too.

The ways that we frame the ethical issues presented to digital archivists will shape persistent discourses. I think that those motivated by the open web’s call to action can learn from the complex issues of representing lack of and want for resources that are regularly encountered by other global social justice and advocacy projects coming out of the Western world. Essentially, digital activism is not unique in its vulnerability to falling into schemes that categorize some rights as more important than others based on Western cultural norms.