Data Literacy Series
Avoid the Data Loss Nightmare with CrashPlan
Imagine losing years of research in an instant: the data you collected, analyzed, and relied on are gone. Accidental deletion, hardware failure, fire, or a lost device can strike without warning, and it happens more often than you think. Protect your work before it’s too late with CrashPlan, available for free to UCSB researchers, and don’t let data loss come back to haunt you.
PDF - TAGS: Storage, Data Backup, Loss Prevention, Data Recovery
DATE: 10-2025
From Push to Publish: Preserving GitHub Projects with Zenodo
GitHub is a fantastic tool for version control and collaboration during the active phases of a project, but it is not designed for permanent archiving. For long-term preservation and accessibility, deposit your work in a repository like Zenodo, which assigns a digital object identifier (DOI), allowing your project to be reliably cited in the years to come. Thanks to GitHub’s integration with Zenodo, creating an archived snapshot is quick and easy.
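One practical payoff of the DOI is that citation metadata becomes machine-readable. As a minimal Python sketch (the DOI below is a placeholder to be replaced with the one Zenodo assigns to your release, and the requests library is an assumed dependency), here is how structured citation data can be retrieved through doi.org content negotiation:

```python
import requests

# Placeholder DOI: substitute the one Zenodo assigns to your release.
doi = "10.5281/zenodo.0000000"

# doi.org supports content negotiation: requesting CSL JSON returns
# structured citation metadata (title, authors, issue date, publisher).
response = requests.get(
    f"https://doi.org/{doi}",
    headers={"Accept": "application/vnd.citationstyles.csl+json"},
    timeout=30,
)
response.raise_for_status()
metadata = response.json()
print(metadata.get("title"), metadata.get("issued"))
```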
PDF - TAGS: Citation, Code Documentation, Code Sharing, Data Preservation
DATE: 09-2025
Keeping Access to Public Datasets Afloat
Datasets and other digital resources are fragile: they are often lost when content is removed, priorities shift, maintenance lapses, or hosting changes. These losses disrupt access, hinder research and teaching, and undermine scientific reproducibility. Discover how the Data Rescue Project and the Research Data Services Department at the UCSB Library can help preserve public datasets and ensure their ongoing accessibility.
PDF - TAGS: Data Access, Data Archiving, Data Preservation
DATE: 08-2025
Intro to Intercoder Reliability
Intercoder or inter-rater reliability refers to the degree of agreement among independent coders in their categorization or interpretation of data. High reliability reflects not only the consistent application of coding criteria but also a meaningful level of consensus among coders. This suggests that the analysis is not merely subjective, but systematic and replicable. Such consistency and shared understanding are essential for establishing the trustworthiness, rigor, and credibility of research findings.
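As a minimal sketch of how this agreement is commonly quantified, the example below computes Cohen's kappa with scikit-learn; the coder labels are hypothetical:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two coders who independently categorized
# the same ten text excerpts into three codes.
coder_a = ["pos", "neg", "pos", "neu", "pos", "neg", "neu", "pos", "neg", "pos"]
coder_b = ["pos", "neg", "pos", "pos", "pos", "neg", "neu", "neu", "neg", "pos"]

# Cohen's kappa corrects raw percent agreement for the agreement
# expected by chance: 1.0 is perfect agreement, 0 is chance level.
kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

Common rules of thumb treat values above roughly 0.6 as substantial agreement, though acceptable thresholds vary by field and coding task.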
PDF - TAGS: Data Analysis, Statistics, Cohen's Kappa, Reliability, Rater Agreement
DATE: 07-2025
Common Stats Pitfalls
Understanding widespread misconceptions in statistics is essential for anyone working with quantitative data. By recognizing these pitfalls, researchers can more critically evaluate statistical claims, design more robust studies, analyze data more effectively, and report findings with greater accuracy and confidence.
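One classic pitfall is treating every small p-value as a discovery while running many tests. The simulation below is an illustration, not a prescription: both groups are drawn from the same distribution, so every "significant" result it reports is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_tests, alpha = 100, 0.05

false_positives = 0
for _ in range(n_tests):
    # Both samples come from the SAME normal distribution, so there is
    # no real effect to detect.
    group_a = rng.normal(loc=0, scale=1, size=30)
    group_b = rng.normal(loc=0, scale=1, size=30)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_positives += 1

# With alpha = 0.05, roughly 5 of 100 tests will be spuriously
# "significant" by chance alone.
print(f"{false_positives} of {n_tests} tests were 'significant' by chance")
```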
PDF - TAGS: Data Analysis, Quantitative Data, Reproducibility, Statistics
DATE: 06-2025
The Basics of Text Preprocessing
Text preprocessing is a crucial first step in transforming unstructured text into machine-readable data. It involves cleaning, organizing, and standardizing language to establish a reliable foundation for analysis and interpretation. By removing noise and inconsistencies, preprocessing enhances algorithm performance, leading to more accurate results in tasks such as sentiment analysis, classification, and information retrieval. While the specific workflow will depend on your research question and analytical goals, here is a breakdown of some common steps, along with an example.
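A minimal sketch of a few of those common steps, using only Python's standard library (the stopword list is illustrative and far shorter than the lists real projects use):

```python
import re

# A tiny illustrative stopword list; real projects typically draw on a
# fuller list (e.g., from NLTK or spaCy).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "is", "in", "it"}

def preprocess(text: str) -> list[str]:
    text = text.lower()                    # normalize case
    text = re.sub(r"[^a-z\s]", " ", text)  # strip punctuation and digits
    tokens = text.split()                  # whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords

print(preprocess("The sensors logged 42 readings, and most of them were noisy!"))
# ['sensors', 'logged', 'readings', 'most', 'them', 'were', 'noisy']
```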
Permalink
TAGS: Text Analytics, Natural Language Processing, Data Cleaning, Data Preparation
DATE: 05-2025
Cultivating Quality in Tabular Data
Poor data quality can result in unreliable analysis, inaccurate conclusions, and wasted effort. Since 'quality' is broad and often subjective, we break it down into key dimensions—each with guiding questions to help evaluate critical attributes of tabular data.
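Several of these dimensions can be checked programmatically. The pandas sketch below uses a hypothetical survey table; the column names and plausibility thresholds are illustrative:

```python
import pandas as pd

# Hypothetical table with deliberate quality problems.
df = pd.DataFrame({
    "respondent_id": [1, 2, 2, 4],
    "age": [34, None, 29, 210],
    "state": ["CA", "CA", "ca", "OR"],
})

# Completeness: what fraction of each column is missing?
print(df.isna().mean())

# Uniqueness: are supposed identifiers actually unique?
print("duplicate ids:", df["respondent_id"].duplicated().sum())

# Validity: do values fall within plausible ranges?
print("implausible ages:", (~df["age"].dropna().between(0, 120)).sum())

# Consistency: are categorical values standardized?
print("state spellings:", df["state"].unique())
```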
PDF - TAGS: Tabular Data, Quality Control
DATE: 04-2025
Decoding Character Encoding Issues
Character encoding is the system that maps characters, such as letters and symbols, to numeric values so computers can store, process, and share them. Encoding issues have become less common in data handling with the widespread adoption of the UTF-8 standard, but researchers may still run into problems when working with data from legacy systems or old databases. Here, we cover some basics of character encoding standards and tips for avoiding potential problems.
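A minimal Python sketch of the two typical failure modes, silent mojibake and a loud decode error (the file path in the final comment is hypothetical):

```python
# "café" encoded as UTF-8 occupies five bytes; the accented character
# alone takes two (0xC3 0xA9).
raw = "café".encode("utf-8")

# Decoding those same bytes as Latin-1 silently produces mojibake:
print(raw.decode("latin-1"))   # 'cafÃ©'

# Decoding Latin-1 bytes as UTF-8 fails loudly instead:
legacy = "café".encode("latin-1")  # b'caf\xe9'
try:
    legacy.decode("utf-8")
except UnicodeDecodeError as err:
    print("decode failed:", err)

# When the source encoding is uncertain, declare your best guess and an
# explicit error policy rather than relying on platform defaults:
# with open("legacy_export.csv", encoding="latin-1", errors="replace") as f:
#     text = f.read()
```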
PDF - TAGS: Data Management, Data Cleaning, Data Organization
DATE: 03-2025
Eight Tips for Handling Secondary Data
Have you identified any pre-existing data that could be relevant to your project? When reusing someone else’s data, it’s crucial to document its provenance: detail its origin, context, and lineage to maintain transparency and traceability throughout your work.
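One lightweight way to do this is to keep a machine-readable provenance record alongside the data. The sketch below is illustrative only; the fields, values, and file names are hypothetical and not a formal metadata standard:

```python
import json
from datetime import date

# Illustrative provenance record for a reused dataset.
provenance = {
    "title": "County-level air quality readings",
    "source": "https://example.org/air-quality",  # placeholder URL
    "publisher": "Example Environmental Agency",
    "retrieved_on": date.today().isoformat(),
    "version": "2024 release",
    "license": "CC-BY-4.0",
    "modifications": [
        "dropped rows with missing station IDs",
        "converted units from ppb to ppm",
    ],
}

# Store the record next to the data so its lineage travels with it.
with open("air_quality_PROVENANCE.json", "w", encoding="utf-8") as f:
    json.dump(provenance, f, indent=2)
```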
PDF - TAGS: Data Analysis, Data Documentation, Data Reuse
DATE: 02-2025
Separating Research Files from the CHAFF
When opening files across different operating systems, you may come across irrelevant files, often called Concealed, Hidden, And Forgotten Files (CHAFF). Want to learn how to identify and delete them before sharing your project? Keep reading!
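As a starting point, here is a minimal Python sketch that lists common CHAFF files before anything is deleted; the project folder name is hypothetical, and the pattern list is a sample to extend as needed:

```python
from pathlib import Path

# Common system-generated file patterns (macOS, Windows, AppleDouble).
CHAFF_PATTERNS = ["**/.DS_Store", "**/Thumbs.db", "**/desktop.ini", "**/._*"]

project = Path("my_project")  # hypothetical project folder

# Dry run first: list every match before deleting anything.
matches = [p for pattern in CHAFF_PATTERNS for p in project.glob(pattern)]
for path in matches:
    print("found:", path)

# Uncomment only after reviewing the list above.
# for path in matches:
#     path.unlink()
```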
PDF