Data Literacy Series
Avoid the Data Loss Nightmare with CrashPlan
Imagine losing years of research in an instant: the data you collected, analyzed, and relied on are gone. Accidental deletion, hardware failure, fire, or a lost device can strike without warning, and it happens more often than you think. Protect your work before it’s too late with CrashPlan, available for free to UCSB researchers, and don’t let data loss come back to haunt you.
PDF - TAGS: Storage, Data Backup, Loss Prevention, Data Recovery
DATE: 10-2025
From Push to Publish: Preserving GitHub Projects with Zenodo
GitHub is a fantastic tool for version control and collaboration during the active phases of a project, but it is not designed for permanent archiving. For long-term preservation and accessibility, deposit your work in a repository like Zenodo, which assigns a digital object identifier (DOI), allowing your project to be reliably cited in the years to come. Thanks to GitHub’s integration with Zenodo, creating an archived snapshot is quick and easy.
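One practical payoff of the DOI is that citation metadata becomes machine-readable. As a minimal Python sketch (the DOI below is a placeholder to be replaced with the one Zenodo assigns to your release, and the requests library is an assumed dependency), here is how structured citation data can be retrieved through doi.org content negotiation:

```python
import requests

# Placeholder DOI: substitute the one Zenodo assigns to your release.
doi = "10.5281/zenodo.0000000"

# doi.org supports content negotiation: requesting CSL JSON returns
# structured citation metadata (title, authors, issue date, publisher).
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
response.raise_for_status()
metadata = response.json()
print(metadata.get("title"), metadata.get("issued"))
```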
PDF - TAGS: Citation, Code Documentation, Code Sharing, Data Preservation
DATE: 09-2025
Keeping Access to Public Datasets Afloat
Datasets and other digital resources are fragile: they are often lost when content is removed, priorities shift, maintenance lapses, or hosting changes. These losses disrupt access, hinder research and teaching, and undermine scientific reproducibility. Discover how the Data Rescue Project and the Research Data Services Department at the UCSB Library can help preserve public datasets and ensure their ongoing accessibility.
PDF - TAGS: Data Access, Data Archiving, Data Preservation
DATE: 08-2025
Intro to Intercoder Reliability
Intercoder or inter-rater reliability refers to the degree of agreement among independent coders in their categorization or interpretation of data. High reliability reflects not only the consistent application of coding criteria but also a meaningful level of consensus among coders. This suggests that the analysis is not merely subjective, but systematic and replicable. Such consistency and shared understanding are essential for establishing the trustworthiness, rigor, and credibility of research findings.
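As a minimal sketch of how this agreement is commonly quantified, the example below computes Cohen's kappa with scikit-learn; the coder labels are hypothetical:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two coders who independently categorized
# the same ten text excerpts into three codes.
coder_a = ["pos", "neg", "pos", "neu", "pos", "neg", "neu", "pos", "neg", "pos"]
coder_b = ["pos", "neg", "pos", "pos", "pos", "neg", "neu", "neu", "neg", "pos"]

# Cohen's kappa corrects raw percent agreement for the agreement
# expected by chance: 1.0 is perfect agreement, 0 is chance level.
kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

Common rules of thumb treat values above roughly 0.6 as substantial agreement, though acceptable thresholds vary by field and coding task.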
PDF - TAGS: Data Analysis, Statistics, Cohen's Kappa, Reliability, Rater Agreement
DATE: 07-2025
Common Stats Pitfalls
Understanding widespread misconceptions in statistics is essential for anyone working with quantitative data. By recognizing these pitfalls, researchers can more critically evaluate statistical claims, design more robust studies, analyze data more effectively, and report findings with greater accuracy and confidence.
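One classic pitfall is treating every small p-value as a discovery while running many tests. The simulation below is an illustration, not a prescription: both groups are drawn from the same distribution, so every "significant" result it reports is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, alpha = 100, 0.05

false_positives = 0
for _ in range(n_tests):
    # Both samples come from the SAME normal distribution, so there is
    # no real effect to detect.
    group_a = rng.normal(loc=0, scale=1, size=30)
    group_b = rng.normal(loc=0, scale=1, size=30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_positives += 1

# With alpha = 0.05, roughly 5 of 100 tests will be spuriously
# "significant" by chance alone.
print(f"{false_positives} of {n_tests} tests were 'significant' by chance")
```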
PDF - TAGS: Data Analysis, Quantitative Data, Reproducibility, Statistics
DATE: 06-2025
The Basics of Text Preprocessing
Text preprocessing is a crucial first step in transforming unstructured text into machine-readable data. It involves cleaning, organizing, and standardizing language to establish a reliable foundation for analysis and interpretation. By removing noise and inconsistencies, preprocessing enhances algorithm performance, leading to more accurate results in tasks such as sentiment analysis, classification, and information retrieval. While the specific workflow will depend on your research question and analytical goals, here is a breakdown of some common steps, along with an example.
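A minimal sketch of a few of those common steps, using only Python's standard library (the stopword list is illustrative and far shorter than the lists real projects use):

```python
import re

# A tiny illustrative stopword list; real projects typically draw on a
# fuller list (e.g., from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "in", "it"}

def preprocess(text: str) -> list[str]:
    text = text.lower()                    # normalize case
    text = re.sub(r"[^a-z\s]", " ", text)  # strip punctuation and digits
    tokens = text.split()                  # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords

print(preprocess("The sensors logged 42 readings, and most of them were noisy!"))
# ['sensors', 'logged', 'readings', 'most', 'them', 'were', 'noisy']
```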
Permalink
TAGS: Text Analytics, Natural Language Processing, Data Cleaning, Data Preparation
DATE: 05-2025
Cultivating Quality in Tabular Data
Poor data quality can result in unreliable analysis, inaccurate conclusions, and wasted effort. Since 'quality' is broad and often subjective, we break it down into key dimensions—each with guiding questions to help evaluate critical attributes of tabular data.
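Several of these dimensions can be checked programmatically. The pandas sketch below uses a hypothetical survey table; the column names and plausibility thresholds are illustrative:

```python
import pandas as pd

# Hypothetical table with deliberate quality problems.
df = pd.DataFrame({
    "respondent_id": [1, 2, 2, 4],
    "age": [34, None, 29, 210],
    "state": ["CA", "CA", "ca", "OR"],
})

# Completeness: what fraction of each column is missing?
print(df.isna().mean())

# Uniqueness: are supposed identifiers actually unique?
print("duplicate ids:", df["respondent_id"].duplicated().sum())

# Validity: do values fall within plausible ranges?
print("implausible ages:", (~df["age"].dropna().between(0, 120)).sum())

# Consistency: are categorical values standardized?
print("state spellings:", df["state"].unique())
```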
PDF - TAGS: Tabular Data, Quality Control
DATE: 04-2025
Decoding Character Encoding Issues
Character encoding is the system that maps characters, such as letters and symbols, to numeric values so computers can store, process, and share them. Encoding issues have become less common in data handling with the widespread adoption of the UTF-8 standard, but researchers may still run into problems when working with data from legacy systems or old databases. Here, we cover some basics of character encoding standards and tips for avoiding potential problems.
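A minimal Python sketch of the two typical failure modes, silent mojibake and a loud decode error (the file path in the final comment is hypothetical):

```python
# "café" encoded as UTF-8 occupies five bytes; the accented character
# alone takes two (0xC3 0xA9).
raw = "café".encode("utf-8")

# Decoding those same bytes as Latin-1 silently produces mojibake:
print(raw.decode("latin-1"))   # 'cafÃ©'

# Decoding Latin-1 bytes as UTF-8 fails loudly instead:
legacy = "café".encode("latin-1")  # b'caf\xe9'
try:
    legacy.decode("utf-8")
except UnicodeDecodeError as err:
    print("decode failed:", err)

# When the source encoding is uncertain, declare your best guess and an
# explicit error policy rather than relying on platform defaults:
# with open("legacy_export.csv", encoding="latin-1", errors="replace") as f:
#     text = f.read()
```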
PDF - TAGS: Data Management, Data Cleaning, Data Organization
DATE: 03-2025
Eight Tips for Handling Secondary Data
Have you identified any pre-existing data that could be relevant to your project? When reusing someone else’s data, it’s crucial to document its provenance: detail its origin, context, and lineage to maintain transparency and traceability throughout your work.
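One lightweight way to do this is to keep a machine-readable provenance record alongside the data. The sketch below is illustrative only; the fields, values, and file names are hypothetical and not a formal metadata standard:

```python
import json
from datetime import date

# Illustrative provenance record for a reused dataset.
provenance = {
    "title": "County-level air quality readings",
    "source": "https://example.org/air-quality",  # placeholder URL
    "publisher": "Example Environmental Agency",
    "retrieved_on": date.today().isoformat(),
    "version": "2024 release",
    "license": "CC-BY-4.0",
    "modifications": [
        "dropped rows with missing station IDs",
        "converted units from ppb to ppm",
    ],
}

# Store the record next to the data so its lineage travels with it.
with open("air_quality_PROVENANCE.json", "w", encoding="utf-8") as f:
    json.dump(provenance, f, indent=2)
```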
PDF - TAGS: Data Analysis, Data Documentation, Data Reuse
DATE: 02-2025
Separating Research Files from the CHAFF
When opening files across different operating systems, you may come across irrelevant files, often called Concealed, Hidden, And Forgotten Files (CHAFF). Want to learn how to identify and delete them before sharing your project? Keep reading!
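As a starting point, here is a minimal Python sketch that lists common CHAFF files before anything is deleted; the project folder name is hypothetical, and the pattern list is a sample to extend as needed:

```python
from pathlib import Path

# Common system-generated file patterns (macOS, Windows, AppleDouble).
CHAFF_PATTERNS = ["**/.DS_Store", "**/Thumbs.db", "**/desktop.ini", "**/._*"]

project = Path("my_project")  # hypothetical project folder

# Dry run first: list every match before deleting anything.
matches = [p for pattern in CHAFF_PATTERNS for p in project.glob(pattern)]
for path in matches:
    print("found:", path)

# Uncomment only after reviewing the list above.
# for path in matches:
#     path.unlink()
```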
PDF