Notch8 Ruby on Rails Web Application Developers

Full-throttle Ruby on Rails Development


Custom 404 Handling in a Blacklight App

April 13, 2021 by Alisha Evans


According to Wikipedia, a 404 status code means: “The requested resource could not be found but may be available in the future. Subsequent requests by the client are permissible.”

Blacklight apps have a default “404.html” page, but any time we can provide branded content, we should. 🙂

This can be achieved in two ways:

  • A custom page that the “/404” endpoint would route to
  • A custom page that users see for an invalid search result

“/404” route

app/controllers/errors_controller.rb

  • This is the controller that sets up our status and renders the correct page
# frozen_string_literal: true
class ErrorsController < ApplicationController
  def not_found
    render status: 404
  end
end

app/views/errors/not_found.html.erb

  • No need for the div if additional styling isn’t needed
  • This is the same text that shows on the default “404.html” page, but can be adjusted as desired
<div class='record-not-found'>
  <h1>The page you were looking for doesn't exist.</h1>
  <p>You may have mistyped the address or the page may have moved.</p>
  <p>If you are the application owner check the logs for more information.</p>
</div>

config/routes.rb

  • This will map the “/404” route to our custom page above
# Custom error pages: match with via: :all accepts every HTTP verb (get would only accept GET)
match '/404', to: 'errors#not_found', via: :all

spec/requests/errors_spec.rb

# frozen_string_literal: true

require 'rails_helper'

RSpec.describe 'Errors', type: :request do
  describe 'GET /404' do
    before { get '/404' }

    it 'has an http status of 404' do
      expect(response).to have_http_status(:not_found)
    end

    it 'renders the custom not_found error page' do
      expect(response.body).to include("The page you were looking for doesn't exist.")
    end
  end
end

NOTE: the default “public/404.html” file must be deleted for the above to take effect
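The route handles requests made directly to “/404”. If you also want Rails to show this page when a 404 is raised elsewhere (for example, an unmatched route or an ActiveRecord::RecordNotFound in production), one common option is to send exceptions back through the application's own router. This is a sketch of that optional configuration; YourApp is a placeholder for your application's module name. In development, the detailed debug page still takes precedence while consider_all_requests_local is true.

```ruby
# config/application.rb
module YourApp # placeholder: use your application's module name
  class Application < Rails::Application
    # Dispatch unhandled exceptions to the router, so a 404 is rendered by
    # the '/404' route (errors#not_found) instead of public/404.html
    config.exceptions_app = self.routes
  end
end
```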

Invalid search result

app/controllers/catalog_controller.rb

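  • If a user visits “/catalog/alpha” and no record with the ID “alpha” exists, Blacklight raises Blacklight::Exceptions::RecordNotFound, which triggers the exception handling below
  • This code, which goes inside the CatalogController class, tells the app to render the “record_not_found” template in the “app/views/catalog” folder, along with a 404 status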
rescue_from Blacklight::Exceptions::RecordNotFound do
  render 'record_not_found', status: 404
end

app/views/catalog/record_not_found.html.erb

  • Again, no need for the div if additional styling isn’t needed
<div class='record-not-found'>
  <h1>The page you were looking for doesn't exist.</h1>
  <p>You may have mistyped the address or provided an invalid Object ID (<%= params[:id] %>)</p>
</div>

spec/requests/catalog_controller_request_spec.rb

# frozen_string_literal: true

require 'rails_helper'

RSpec.describe 'Catalog', type: :request do
  describe 'responds to the Blacklight::Exceptions::RecordNotFound exception' do
    it 'renders a custom 404 page' do
      params = { id: 'alpha' }
      get "/catalog/#{params[:id]}"

      expect(response.status).to eq(404)
      expect(response.body).to include("The page you were looking for doesn't exist")
      expect(response.body).to include("You may have mistyped the address or provided an invalid Object ID (#{params[:id]})")
    end
  end
end

NOTE: there’s no need for routing because the user remains at the route they tried to enter

Filed Under: Blog Tagged With: 404, blacklight, custom, samvera

Globus Download in Hyrax

March 1, 2021 by Bess Sadler

Globus is a tool for transferring very large datasets. It has many advantages over older file-transfer systems, and researchers increasingly expect data repositories to offer Globus integration. While there is not yet an official Globus integration offering from the Samvera community, several institutions have integrated Globus into their repository systems. Notch8 was recently asked to write such an integration for the Hyrax-based Rutgers Virtual Data Collaboratory (which has not yet launched). This blog post describes the research and design process for this work, and provides links to some sample code and pointers for future development.

Previous Work

As we undertook this work, we were aided greatly by conducting informational interviews with Nabeela Jaffer at the University of Michigan, and David Chandek-Stark at Duke University. UM and Duke have implemented similar strategies for Globus integration, with a few differences. We are grateful to our colleagues at these institutions for sharing their time and expertise, and this is a wonderful example of how working in an open way helps to advance the state of data repositories in general much faster than teams working in isolation.

High Level Architecture

To enable download via Globus, we are following the same general pattern that both UM and Duke are using: 

  1. Create a shared volume that is writable by the Hyrax application process
  2. Create a Globus endpoint that reads from that same volume
  3. Automate the export of data sets from Hyrax to that shared volume, organized by unique id
  4. Generate a predictable link that includes the institution’s Globus ID and the item’s unique id, which will allow a user access to the files via the Globus web client
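Step 4 can be sketched in plain Ruby. This sketch assumes that the Globus web app's file-manager URL accepts an origin_id parameter (the endpoint's UUID, standing in for the "institution's Globus ID") and an origin_path parameter, and that each dataset sits in a top-level directory named after the work's unique id. The method name, endpoint UUID, and work id are hypothetical placeholders; check the link format against your own Globus endpoint before relying on it.

```ruby
require 'uri'

# Assumed base URL of the Globus web app's file manager
GLOBUS_FILE_MANAGER = 'https://app.globus.org/file-manager'

# Hypothetical helper: builds a predictable Globus link for one exported work,
# assuming the shared volume is organized as /<work_id>/ at the endpoint root.
def globus_download_url(endpoint_id, work_id)
  query = URI.encode_www_form(
    origin_id: endpoint_id,      # the institution's Globus endpoint UUID
    origin_path: "/#{work_id}/"  # the directory holding this work's files
  )
  "#{GLOBUS_FILE_MANAGER}?#{query}"
end

puts globus_download_url('00000000-aaaa-bbbb-cccc-000000000000', 'abc123')
```

Because the link is derived only from the endpoint id and the work id, the view can generate it without asking Globus anything at render time.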

A single work from the Duke Research Data Repository, available for download via the Globus client
The top level directory of the Duke Research Data Repository, visible via the Globus client, showing all of the datasets

Implementation choices

While the UM, Duke, and Rutgers solutions all share the same high-level pattern, there are some key differences. Please note that this document is not a complete analysis of each solution; it is only a report of the analysis done at Notch8 in order to fulfill a specific contract for Rutgers University.

Michigan: On-demand export

The University of Michigan’s Deep Blue Data repository copies files on demand for Globus download, offering the user a button that will copy a dataset to Globus download space in a background job, and then email the user when the item is ready for download. Heavily used datasets remain in the Globus download space, and for those items the user is presented an immediate opportunity to download via Globus, with no waiting. The advantages of this approach include more efficient use of space and thus reduced cost for repository operation. The disadvantages of this approach include increased complexity (e.g., the need for an on-demand job to copy the files and notify the user when their files are ready) and the need for active storage management (the Globus download space must periodically be cleaned out).

Duke: Nightly batch exports

The Duke University solution instead makes all of its public data available for download at all times. Dataset export is tracked via a Rails ApplicationRecord model called Globus::Export, which records whether a work has been exported, whether that export succeeded, and when the last export occurred. A nightly scheduled process scans the repository for newly added works by checking each work against its table of Globus::Export records, kicking off an export for any work that has not yet been exported.
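The nightly scan amounts to a set difference: all public works minus the works that already have a Globus::Export record. This plain-Ruby sketch stands in for the real ActiveRecord and Solr queries; ExportRecord, works_needing_export, and the sample ids are hypothetical. As in the description, it only checks whether a record exists, not whether the export succeeded.

```ruby
# Hypothetical stand-in for a Globus::Export row: which work was exported,
# whether that export succeeded, and when it last ran
ExportRecord = Struct.new(:work_id, :succeeded, :exported_at)

# Returns ids of public works that have no export record yet,
# preserving the order of public_work_ids
def works_needing_export(public_work_ids, export_records)
  already_exported = export_records.map(&:work_id)
  public_work_ids - already_exported
end

records = [ExportRecord.new('w1', true, Time.now)]
works_needing_export(%w[w1 w2 w3], records) # => ["w2", "w3"]
```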

Rutgers: Using the Hyrax Actor Stack

The Rutgers solution is closer to real time than either of the above. We adopted the Globus::Export ApplicationRecord model from the Duke solution, but our version of Globus::Export has two additional fields: expected_file_sets and completed_file_sets. One of the challenges around data import in Hyrax is that file attachment happens via background jobs, and there is no obvious way to know when a work has been fully assembled. However, by the end of the initial run of the Actor Stack, we know the list of FileSet objects that are attached to a work. We record that list of FileSet identifiers on a Globus::Export. Then, into the background job that attaches files, we insert a method that kicks off a Globus export of that particular FileSet and, assuming all goes to plan, records that FileSet id in the corresponding Globus::Export#completed_file_sets. When generating the user-facing view of a work, we check the Globus::Export object for that work, and if its #expected_file_sets match its #completed_file_sets, we display the generated download link.
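The completeness check can be sketched without Rails. ExportProgress below is a hypothetical, in-memory stand-in for a Globus::Export record, not the project's actual model; its methods mirror the steps described: record the expected FileSets, mark each one completed from the background job, and show the link only when the two lists match.

```ruby
require 'set'

# Hypothetical, in-memory stand-in for a Globus::Export record for one work
class ExportProgress
  attr_reader :expected_file_sets, :completed_file_sets

  # expected_file_sets: FileSet ids known once the Actor Stack has run
  def initialize(expected_file_sets)
    @expected_file_sets = expected_file_sets.to_set
    @completed_file_sets = Set.new
  end

  # Called from the file-attachment background job once a FileSet
  # has been exported successfully
  def mark_completed(file_set_id)
    @completed_file_sets << file_set_id
  end

  # The view shows the Globus download link only when the two sets match
  def complete?
    expected_file_sets == completed_file_sets
  end
end

progress = ExportProgress.new(%w[fs1 fs2])
progress.mark_completed('fs2')
progress.complete? # => false
progress.mark_completed('fs1')
progress.complete? # => true
```

Comparing sets rather than arrays is a choice made for this sketch: background jobs may finish FileSets in any order, and a repeated completion for the same FileSet is only counted once.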

Future work

Future work for this integration might include:

  • leveraging the browse-everything gem’s file system integration to also allow for Globus upload to a particular directory, where data would then be available for cataloging and deposit into Hyrax
  • improved error checking, including more robust checksum validation when the files are copied
  • extraction of this functionality into a gem that could be installed and configured into a Hyrax application without the need for much local customization

This has been a rewarding project, and we are so grateful to the team at Rutgers for the opportunity to better understand the needs of research scientists working with large data sets!

Filed Under: Blog

Adding blacklight_advanced_search to Hyku

November 17, 2020 by Bess Sadler

I was recently asked to add Blacklight Advanced Search to a Hyku app for the US Department of Transportation. It was a little tricky, so I’m documenting the process in the hopes of making life easier for the next person who has to do this.

Many thanks to Dean Farrell at UNC Libraries for pointing me at UNC’s hy-c implementation, where blacklight advanced search is installed into Hyrax: https://github.com/UNC-Libraries/hy-c/pull/421/files

David Kinzer’s Blacklight Search Notes are also excellent background reading: https://gist.github.com/dkinzer/4f6dbb4634dbbdc99255dbea6305ccae

Write a feature spec first

As always, start with a test. I like to write a high-level feature test as a way of keeping myself focused on what I’m trying to deliver, and then I fill in unit tests for specific methods as needed. Here is my nearly-empty high-level feature test. Note that I don’t really know what advanced search is going to look like yet, so I’m not spending much time on detail. However, it’s still helpful to have this test first. It gives me a way to quickly iterate and prove to myself that the changes I’m making are moving me in the right direction, and it lets me build the test up over time as I implement the feature.

# frozen_string_literal: true

require 'rails_helper'
include Warden::Test::Helpers

RSpec.feature 'Advanced Search', type: :feature, js: true, clean: true do
  context 'an unauthenticated user' do
    scenario 'advanced search basic sanity check' do
      visit '/advanced'
      fill_in('Title', with: 'ambitious aardvark')
      find('#advanced-search-submit').click
      expect(page).to have_content('ambitious aardvark')
      expect(page).to have_content('No results found for your search')
    end
  end
end

Installing blacklight_advanced_search

I follow the blacklight_advanced_search basic instructions:

Add to your application's Gemfile:

gem "blacklight_advanced_search"

then run 'bundle install'. Then run:

rails generate blacklight_advanced_search

Note: It will offer to generate a new basic search partial for you. You do NOT want it; it will break the basic search in Hyku. If you end up with a new app/views/catalog/_search_form.html.erb, just delete it, and add a link to advanced search by hand wherever you want one.

So, I follow the installation instructions, and then I run my feature spec. The first time I do, I get this error:

undefined method facets_for_advanced_search_form for Hyrax::CatalogSearchBuilder (NoMethodError)

So, clearly, I’m missing some configuration.

Tracing the error

I drop a byebug in at the line indicated by the stack trace. In this case: blacklight-6.23.0/lib/blacklight/search_builder.rb:147

[142, 151] in /usr/local/rvm/gems/ruby-2.5.8/gems/blacklight-6.23.0/lib/blacklight/search_builder.rb
   142:     #
   143:     # @return a params hash for searching solr.
   144:     def processed_parameters
   145:       request.tap do |request_parameters|
   146:     byebug
=> 147:         processor_chain.each do |method_name|
   148:           send(method_name, request_parameters)
   149:         end
   150:       end
   151:     end
(byebug) request_parameters
{"facet.field"=>[], "facet.query"=>[], "facet.pivot"=>[], "fq"=>[], "hl.fl"=>[]}
(byebug) processor_chain
[:default_solr_parameters, :add_query_to_solr, :add_facet_fq_to_solr, :add_facetting_to_solr, :add_solr_fields_to_query, :add_paging_to_solr, :add_sorting_to_solr, :add_group_config_to_solr, :add_facet_paging_to_solr, :add_access_controls_to_solr_params, :filter_models, :only_active_works, :add_access_controls_to_solr_params, :show_works_or_works_that_contain_files, :show_only_active_records, :filter_collection_facet_for_access, :facets_for_advanced_search_form]

That :facets_for_advanced_search_form at the end of the processor chain seems like a likely culprit, so I’m going to skip it and see if that gets my form to render. Initially, I’m just going to add a next statement right in place to see if this fixes the problem, before I go to the trouble of figuring out the right way to do it:

    def processed_parameters
      request.tap do |request_parameters|
        processor_chain.each do |method_name|
          next if method_name == :facets_for_advanced_search_form
          send(method_name, request_parameters)
        end
      end
    end

And now my form renders!

So I’ve proven to myself that the problem is this missing facets_for_advanced_search_form method, but how to fix that in a maintainable way?

Override as little as possible to keep local code maintainable

Let’s go look at the class mentioned in the error message: Hyrax::CatalogSearchBuilder

Here is the source of that class: https://github.com/samvera/hyrax/blob/130c4e600318a9194725b35dd0fd6e19e5108dd9/app/search_builders/hyrax/catalog_search_builder.rb

Notice the excellent guidance provided here:

# If you plan to customize the base catalog search builder behavior (e.g. by
# adding a mixin module provided by a blacklight extension gem), inheriting this
#  class, customizing behavior, and reconfiguring `CatalogController` is the
# preferred mechanism.

Sounds good to me. Let’s try that!

So, I’m going to start by writing a test, of course. I go look at how the search builders are set up in hyrax, and I make this very basic test in spec/search_builders/ntl_search_builder_spec.rb (note that this is adapted from the test setup at https://github.com/samvera/hyrax/blob/130c4e600318a9194725b35dd0fd6e19e5108dd9/spec/search_builders/hyrax/collection_search_builder_spec.rb… I did not attempt to write this off the top of my head.)

# frozen_string_literal: true
RSpec.describe NtlSearchBuilder do
  let(:scope) do
    double(blacklight_config: CatalogController.blacklight_config,
           current_ability: ability)
  end
  let(:user) { create(:user) }
  let(:ability) { ::Ability.new(user) }
  let(:access) { :read }
  let(:builder) { described_class.new(scope).with_access(access) }
  it "can be instantiated" do
    expect(builder).to be_instance_of(described_class)
  end
end

This test doesn’t do anything except verify that the class is set up correctly, but that is better than nothing, and it gives us a place where we can flesh out behavior as we need it. I run it, and of course it fails, for the expected reasons: NameError: uninitialized constant NtlSearchBuilder

I then define the class, by making this file at app/search_builders/ntl_search_builder.rb:


class NtlSearchBuilder < Hyrax::CatalogSearchBuilder

end

And now my test passes!

Now, in the configure_blacklight block of CatalogController, we set it to use this new class:

    # Use locally customized NtlSearchBuilder so we can enable blacklight_advanced_search
    config.search_builder_class = NtlSearchBuilder

And I add the facets_for_advanced_search_form method to my NtlSearchBuilder class, just copying the method (https://github.com/projectblacklight/blacklight_advanced_search/blob/master/lib/blacklight_advanced_search/advanced_search_builder.rb#L77-L96)

I also add the lines that the Blacklight advanced search generator installed into app/models/search_builder.rb. This is the bit that actually combines the Advanced Search query capabilities with the Hyrax provided search capabilities (e.g., search gated by permissions, workflow, group membership, etc.)

Now my class looks like this:

##
# A locally defined search builder, which will allow us to customize the search
# behavior of this application. In particular, this is needed to allow us to
# use blacklight_advanced_search.
class NtlSearchBuilder < Hyrax::CatalogSearchBuilder
  include Blacklight::Solr::SearchBuilderBehavior
  include BlacklightAdvancedSearch::AdvancedSearchBuilder
  self.default_processor_chain += [:add_advanced_parse_q_to_solr, :add_advanced_search_to_solr]

  # A Solr param filter that is NOT included by default in the chain,
  # but is appended by AdvancedController#index, to do a search
  # for facets _ignoring_ the current query, we want the facets
  # as if the current query weren't there.
  #
  # Also adds any solr params set in blacklight_config.advanced_search[:form_solr_parameters]
  def facets_for_advanced_search_form(solr_p)
    # ensure empty query is all records, to fetch available facets on entire corpus
    solr_p["q"]            = '{!lucene}*:*'
    # explicitly use lucene defType since we are passing a lucene query above (and appears to be required for solr 7)
    solr_p["defType"]      = 'lucene'
    # We only care about facets, we don't need any rows.
    solr_p["rows"]         = "0"

    # Anything set in config as a literal
    if blacklight_config.advanced_search[:form_solr_parameters]
      solr_p.merge!(blacklight_config.advanced_search[:form_solr_parameters])
    end
  end
end

And now my advanced search page renders and my feature test for advanced search passes.

One last step: a more robust feature test with real data

I flesh out my advanced search feature spec a bit, including loading some actual client-provided data, to make sure the advanced search feature really does work as expected. Here is the feature spec with some of that added:

# frozen_string_literal: true

require 'rails_helper'
include Warden::Test::Helpers

RSpec.feature 'Advanced Search', type: :feature, js: true, clean: true do
  context 'empty solr index' do
    scenario 'basic search sanity check' do
      visit '/'
      fill_in('q', with: 'ambitious aardvark')
      click_button('Go')
      expect(page).to have_content('ambitious aardvark')
      expect(page).to have_content('No results found for your search')
    end
    scenario 'advanced search basic sanity check' do
      visit '/advanced'
      fill_in('Title', with: 'ambitious aardvark')
      find('#advanced-search-submit').click
      expect(page).to have_content('ambitious aardvark')
      expect(page).to have_content('No results found for your search')
    end
  end

  ##
  # Load solr data from Department of Transportation import testing and ensure advanced
  # search works reasonably well with this data.
  context 'with data' do
    let!(:admin_set_collection_type) { FactoryBot.create(:admin_set_collection_type) }
    let!(:user_collection_type) { FactoryBot.create(:user_collection_type) }

    before do
      solr = Blacklight.default_index.connection
      sample_records = JSON.parse(File.read(File.join(fixture_path, 'solr', 'dot-sample-data.json')))
      docs = sample_records["response"]["docs"]
      docs.each do |doc|
        doc.delete("_version_")
        doc.delete("score")
        solr.add(doc)
      end
      solr.commit
    end
    scenario 'basic search sanity check' do
      visit '/'
      click_button('Go')
      number_of_search_results = find('span.page_entries').find_all('strong').last.text
      expect(number_of_search_results).to eq('17')
    end
    it "searches by title" do
      visit '/advanced'
      fill_in('Title', with: 'habitat')
      find('#advanced-search-submit').click
      number_of_search_results = find('span.page_entries').find_all('strong').last.text
      expect(number_of_search_results).to eq('1')
    end
  end
end

Filed Under: Blog

Samvera Tech 101

November 6, 2020 by Alisha Evans

As part of the Samvera Connect Virtual Conference this year, my coworker Shana Moore and I (Alisha Evans) developed a slideshow presentation titled Samvera Tech 101. Our intention was to give an introductory presentation covering a sampling of common topics and definitions used within the Samvera stack and community.

We are both Software Engineers at Notch8, but we are still relatively new to working with Samvera applications. For that reason, we thought our insights would be valuable in building a beginner-friendly presentation because the learning curve can feel a little steep and overwhelming at times. There are 5 categories we’ll be discussing: Samvera, Data Stores, Background Jobs, User Interfaces and Samvera Applications.

If you’d prefer to watch the presentation, you can do so here. Otherwise, let’s talk Samvera!

Samvera

When we hear the term “Samvera” it may refer to one of two things:

  1. It is the name of the community that maintains a group of software projects
  2. It also often refers to the actual pieces of code and technologies that, when combined, make up the “Samvera stack”

Traditionally, the Samvera stack is made up of a number of Ruby on Rails based components (called “gems”) in conjunction with three other open source software products: Solr, Fedora, and Blacklight. There’s some flexibility here but we will revisit that point soon. The stack was designed so that users could easily interact with Fedora, without needing to be programmers. It provides the “building blocks” for an institution to create and fully customize a flexible and extensible digital repository solution.

At the foundation level, everything is built on top of Ruby on Rails, which is an open-source full stack web application framework built with the Ruby programming language.

But what’s a framework??

A framework is a collection of code, tools & utilities that follow a specific structure for writing software. Frameworks help keep code organized and let developers be more productive instead of reinventing the wheel. We can think of a framework like a stencil. If we were given the task of cutting out 1,000 pumpkin-shaped holiday cards, we could draw and cut out every one of them by hand, or we could use a pumpkin-shaped stencil or tool to punch them out all at once. From there we can color and customize them as we wish, but they are fundamentally the same because they were cut from the same stencil.

Likewise, frameworks like Ruby on Rails emphasize “convention over configuration”, which increases developer efficiency and makes it easier to collaborate with others, because the foundation of every application is the same. From there, each application can be fully customized to meet specific needs.

Data Stores

A data store is a repository for continuously storing and managing collections of data. We can think of it as a physical library, but for digital objects in our context.

Solr –

Solr is an index-based data store used to power the search functionality of our Samvera applications. It’s fast because it indexes records not only by ID but also by other metadata, like ‘author’, which means a record can be reached through many different pointers. It answers the question “Which items have metadata that match?” and returns the IDs used to look up the actual data in a database.

Although the analogy undersells Solr’s flexibility, a library index card works much like Solr. An index card contains the metadata of a book you are trying to find in a library, but it is not the book itself. It has a numerical index that points to the book’s actual location, which is where a data store, like Fedora, comes in.
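To make the index-card idea concrete, here is a toy inverted index in plain Ruby. It only illustrates the concept and is not how Solr is implemented: each (field, term) pair points to the IDs of matching records, and a search returns IDs rather than the records themselves. The class name and sample records are made up.

```ruby
# Toy inverted index: maps [field, term] pairs to the ids of matching records
class ToyIndex
  def initialize
    @postings = Hash.new { |hash, key| hash[key] = [] }
  end

  # Index every lowercased, whitespace-separated term of every field
  def add(id, fields)
    fields.each do |field, value|
      value.to_s.downcase.split.each do |term|
        ids = @postings[[field, term]]
        ids << id unless ids.include?(id)
      end
    end
  end

  # Returns matching ids only; the full records live elsewhere (e.g. Fedora)
  def search(field, term)
    @postings.fetch([field, term.downcase], []).dup
  end
end

index = ToyIndex.new
index.add('book-1', { title: 'Moby Dick', author: 'Herman Melville' })
index.add('book-2', { title: 'Billy Budd', author: 'Herman Melville' })
index.search(:author, 'melville') # => ["book-1", "book-2"]
index.search(:title, 'dick')      # => ["book-1"]
```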

Fedora –

We can think of Fedora as the bookshelves in a library. It is used as the persistence layer and is where the actual content and its associated metadata are stored. Thanks to integration with a gem called Valkyrie though, Fedora is no longer a hard requirement for the Samvera Stack.

Valkyrie –

Valkyrie is a data mapper that is used with data stores. It’s a gem that enables multiple backends for storing files and metadata in Samvera. In other words, it’s how we manage and save objects to a database. You may also hear of a gem called Active Fedora, but there are ongoing efforts to replace it with Valkyrie because Valkyrie offers much more flexibility. Active Fedora is tightly coupled to Fedora, so it can only talk to that one data source, while Valkyrie can talk to various versions of Fedora as well as other storage engines, such as Postgres. So we will likely continue to see Active Fedora phased out in favor of Valkyrie.
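The data mapper idea can be sketched in plain Ruby: the resource is a plain object that knows nothing about storage, and a swappable adapter handles persistence. The class and method names here are hypothetical and deliberately much simpler than Valkyrie’s real API; the JSON adapter merely stands in for “some other backend”.

```ruby
require 'json'

# A plain resource: it holds metadata but knows nothing about storage
Work = Struct.new(:id, :title)

# Hypothetical adapter #1: keeps resources in memory as hashes
class MemoryAdapter
  def initialize
    @store = {}
  end

  def save(resource)
    @store[resource.id] = resource.to_h
    resource
  end

  def find(id)
    Work.new(*@store.fetch(id).values_at(:id, :title))
  end
end

# Hypothetical adapter #2: serializes resources to JSON strings,
# standing in for a different backend (Fedora, Postgres, ...)
class JsonAdapter
  def initialize
    @rows = {}
  end

  def save(resource)
    @rows[resource.id] = JSON.generate(resource.to_h)
    resource
  end

  def find(id)
    attrs = JSON.parse(@rows.fetch(id))
    Work.new(attrs['id'], attrs['title'])
  end
end

# Application code talks to "an adapter" and works with either backend
[MemoryAdapter.new, JsonAdapter.new].each do |adapter|
  adapter.save(Work.new('w1', 'Field notes'))
  puts adapter.find('w1').title
end
```

Swapping backends means swapping the adapter object; the Work class and the code that uses it stay the same.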

Background Jobs

Because some web requests take a long time to process, we use background jobs (an asynchronous process that runs outside of an application’s request cycle).

In a real-world Samvera-related example, if we’re bulk importing and processing 500 records in an application, we’re (probably) impatient! We don’t want to wait for the process to complete before we’re able to do something else. So bulk importing and processing would be the perfect tasks to hand off to a background job. We can continue working while the records process outside of our view, versus waiting around for what would feel like an unresponsive web page to load.

Some popular choices for this are: Delayed::Job and Sidekiq. (Sidekiq requires Redis to store all of its jobs and operational data, as shown in the side diagram).
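Delayed::Job and Sidekiq each have their own APIs, but the core idea can be sketched with Ruby’s standard library: the “request” only enqueues work and returns right away, while a separate worker thread processes the queue. This is a toy, in-process illustration; real job backends persist jobs (in Redis or a database) so they survive restarts, and the record names are made up.

```ruby
# A queue shared between the "web request" and a background worker
jobs = Queue.new
processed = []

# The worker runs outside the request cycle, pulling jobs as they arrive
worker = Thread.new do
  while (record = jobs.pop) != :done
    processed << "imported #{record}" # stand-in for slow import work
  end
end

# The "request" enqueues 500 records and returns immediately
500.times { |i| jobs << "record-#{i}" }
puts 'Import queued; you can keep working.'

jobs << :done # tells the worker to stop once the queue is drained
worker.join
puts processed.size
```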

User Interfaces

How our users interact with our app.

Blacklight – 


Blacklight is a user interface option maintained by Project Blacklight. While it was adopted as one of the core components of the Samvera tech stack, it predates Samvera and is therefore used both inside and outside the Samvera community. The Blacklight gem gives our users the ability to search and/or browse the metadata we’ve indexed in Solr, right out of the box. We can use facets to browse items grouped by predefined metadata fields like format, language and author, as we see in the image above. Or we can do an open-ended search that returns results based on whatever term(s) we’ve searched for.
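Facet counts are computed by Solr and displayed by Blacklight, but the underlying idea is simple to sketch in plain Ruby: for a given field, count how many records carry each value, then narrow the results when a value is selected. The sample records and the facet_counts helper are made up for illustration.

```ruby
# Hypothetical search results, as hashes of metadata fields
docs = [
  { id: '1', format: 'Book',  language: 'English' },
  { id: '2', format: 'Video', language: 'English' },
  { id: '3', format: 'Book',  language: 'French' }
]

# Count how many documents have each value of the given field
def facet_counts(docs, field)
  docs.each_with_object(Hash.new(0)) { |doc, counts| counts[doc[field]] += 1 }
end

facet_counts(docs, :format)   # => {"Book"=>2, "Video"=>1}
facet_counts(docs, :language) # => {"English"=>2, "French"=>1}

# Selecting a facet value narrows the results, similar to a Solr filter query
docs.select { |doc| doc[:format] == 'Book' }.map { |doc| doc[:id] } # => ["1", "3"]
```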

Although the out-of-the-box solution, once configured with our own metadata standards, is fully deployable, it is also highly customizable. One of the most obvious changes is applying our own repository style guidelines so that our app reflects our larger brand identity. Beyond styling, however, we can use several plugins. Some of these plugins let us tailor our apps to the specific type of metadata we have, which we’ll discuss next, while others let us add things like a gallery view or slideshow view.

In maintaining our library analogy, Blacklight is like the catalog cabinet that holds a library’s index cards.

Spotlight – 

The Spotlight plugin is the first of 3 plugins we’ll discuss. It’s an extension of the Blacklight gem, so it too offers full metadata searching, faceted browsing and display capabilities. The difference between the two, as the name suggests, is that Spotlight lets us create attractive, feature-rich websites that spotlight a particular digital collection, much like how museums hold lots of items but showcase them in groups. As a librarian, curator, cultural institution, etc., we can have a large dataset but present a curated exhibit.

One of the most notable things about the gem is that it’s self-service. That means we don’t need a developer to update our interface. Instead of writing code, site administrators have access to a dashboard that lets them update the collection, navigation headers, search result behaviors and more through forms, select boxes and other easy-to-use controls.

GeoBlacklight –

GeoBlacklight was created with the goal of building a better way to discover, share, and access geospatial data, which simply means data that has locational information tied to it, like coordinates, an address, a city, or a ZIP code.

It is also based on Blacklight, but it extends the Blacklight functionality for geographical purposes, like adding map views of our search results.


ArcLight –

By now we’re catching on that each of these plugins is built on top of Blacklight. Just like Spotlight and GeoBlacklight, ArcLight has its own niche: it’s specifically tailored to support the discovery and digital delivery of information in archives.

Each institution can have one or more repositories, and each repository can have one or more collections. For example, Indiana University’s Archive of African American Music and Culture is a repository with 3 collections; this specific one is Reclaiming the Right to Rock: Black Experiences in Rock Music Collection, 2008-2010.

Samvera Applications

Hyrax –

Hyrax is its own full stack application that allows us to better manage the content in a digital repository. If you’re familiar at all with Sufia or CurationConcerns, a lot of how Hyrax operates will feel familiar, because Hyrax “descended”, if you will, from these two systems. It’s primed to be used with Fedora, Solr, Blacklight, a relational database and, of course, Rails. One benefit of starting your application with Hyrax instead of plain Rails is that Hyrax has a rich and growing set of built-in features that are especially useful for repository owners.

For instance, Hyrax provides account administrators with a user-friendly dashboard that gives us the ability to create and edit user profiles, configure workflows, generate work types and work type images, upload multiple files and folders, set user-level control over metadata and more.

Another advantage to using Hyrax is that it’s supported and maintained by the Samvera community. Therefore, as time goes on you receive the benefits of the upgrades, bug fixes and new features that come with Hyrax instead of having to maintain these things separately. However, it may not be the best solution for every project.

Hyku –

Hyku is what’s known as a solution bundle: a repository app that’s been bundled to deliver functionality for a specific set of use cases. The use case for Hyku is multi-tenancy. It’s built on top of Hyrax, so it comes with all of the features of Hyrax, but being multi-tenant means that a single repository owner can create multiple Hyrax instances for that repository. For example, the Pennsylvania Academic Library Consortium, Inc. and the Private Academic Library Network of Indiana came together to form Hyku Commons and share a single Hyku application. Here we see 4, but eventually dozens of libraries that are part of the “super consortia” will have their own fully customizable tenant under Hyku Commons.

In addition to multi-tenancy, we also get IIIF Image & Presentation API support, the Universal Viewer, and bulk import scripts. We also get greater customization options, like adding fonts and custom CSS. While Hyrax alone is typically deployed locally and requires system administration and Rails development knowledge, Hyku is easier to deploy and maintain.

Hyku as a service is also something that vendors in the community provide. As a repository owner we wouldn’t need our own development team, but we would still have a lot of say in the development of the product. This may be especially useful for a library consortium. As of today there are two companies that provide Hyku as a service: Ubiquity Press provides Ubiquity Repositories and Notch8 provides Hyku Up.

Avalon –

Avalon is the second solution bundle provided by the Samvera community. Its use case is managing and providing access to large collections of digital audio and video materials. While it is currently built on the Samvera core components, a future version will be built on Hyrax specifically, with this work resuming in early 2021. Version 7, released in January 2020, brought two new ways to explore and display collections, a new transcoding dashboard that lets administrators manage jobs directly within Avalon, a redesigned homepage, and easier configuration of the featured collections displayed on that homepage.

In conclusion, we wanted to leave you with a visual image of the various technologies we just mentioned. We hope that you now have a better understanding of the pieces in the Samvera Tech Stack and how they all work together.

If you have any questions, feel free to reach out to Notch8 at [email protected]!

Filed Under: Blog Tagged With: application, blacklight, data store, hyku, hyrax, Rails, samvera, user interface

Bug Hunting in Hyrax

October 25, 2020 by Bess Sadler

I recently had to find a bug in a Hyrax application, and I thought it might be helpful if I documented the process.

1. Define the problem

I like to start a bug hunt with a clear description of what exactly I’m trying to solve. A great place to do this is in a ticket on the team’s board. In this case, the problem was that when new works were submitted to Hyrax via the create work form, the visibility was not being persisted. I find it helpful to write out the bug definition as a user story:

As a content contributor, when I submit a new work through the form, I want to be able to make it public so that it will be visible to the world. Instead, the work always ends up being marked as restricted / private. 

2. Write a test. Or several.

My first question with this bug was whether the problem was specific to form submission or lay in the object creation process itself, so I started with a test that creates an object without the form.

I like FactoryBot for writing tests, and in a Hyrax app I always make a Hyrax::UploadedFile factory like this:

FactoryBot.define do
  factory :uploaded_file, class: Hyrax::UploadedFile do
    file { Rack::Test::UploadedFile.new("#{::Rails.root}/spec/fixtures/image.jp2") }
    user_id { 1 }
  end
end

And here is my test to sanity-check the actor stack, making sure that, if all the parameters reach it, it behaves as expected. This test passed, which tells me that the problem isn’t in the actor stack:

require 'rails_helper'
include Warden::Test::Helpers

RSpec.describe 'Sanity check the actor stack', type: :feature, js: false, clean: true do
  context 'a new object created without using the submission form' do
    let(:user) { FactoryBot.create(:user) }
    let(:article) { FactoryBot.build(:article, depositor: user.user_key, visibility: "open") }
    let(:uploaded_file) { FactoryBot.create(:uploaded_file, user_id: user.id) }
    let(:attributes_for_actor) { { uploaded_files: [uploaded_file.id] } }
    it "saves the expected visibility" do
      env = Hyrax::Actors::Environment.new(article, ::Ability.new(user), attributes_for_actor)
      Hyrax::CurationConcern.actor.create(env)
      expect(Article.count).to eq 1
      post_actor_stack_article = Article.last
      expect(post_actor_stack_article.visibility).to eq "open"
    end
  end
end

So, interesting: I haven’t been able to write a failing test for this bug yet. Let’s try again, this time using the form:

# Generated via
#  `rails generate hyrax:work Article`
require 'rails_helper'
include Warden::Test::Helpers

# NOTE: If you generated more than one work, you have to set "js: true"
RSpec.describe 'Create a Article', type: :feature, js: true, clean: true do
  context 'a logged in user' do
    let(:user_attributes) do
      { email: '[email protected]' }
    end
    let(:user) do
      User.new(user_attributes) { |u| u.save(validate: false) }
    end
    let(:admin_set_id) { AdminSet.find_or_create_default_admin_set_id }
    let(:permission_template) { Hyrax::PermissionTemplate.find_or_create_by!(source_id: admin_set_id) }
    let(:workflow) { Sipity::Workflow.create!(active: true, name: 'test-workflow', permission_template: permission_template) }

    before do
      # Create a single action that can be taken
      Sipity::WorkflowAction.create!(name: 'submit', workflow: workflow)

      # Grant the user access to deposit into the admin set.
      Hyrax::PermissionTemplateAccess.create!(
        permission_template_id: permission_template.id,
        agent_type: 'user',
        agent_id: user.user_key,
        access: 'deposit'
      )
      login_as user
    end

    it do
      visit '/dashboard'
      click_link "Works"
      click_link "Add new work"

      # If you generate more than one work uncomment these lines
      choose "payload_concern", option: "Article"
      click_button "Create work"

      expect(page).to have_content "Add New Article"
      click_link "Files" # switch tab
      expect(page).to have_content "Add files"
      expect(page).to have_content "Add folder"
      within('span#addfiles') do
        attach_file("files[]", Rails.root.join('spec/fixtures/image.jp2'), visible: false)
        attach_file("files[]", Rails.root.join('spec/fixtures/jp2_fits.xml'), visible: false)
      end
      click_link "Descriptions" # switch tab
      fill_in('Title', with: 'My Test Work')
      fill_in('Creator', with: 'Doe, Jane')
      fill_in('Description', with: 'A brief description of my article')
      select('In Copyright', from: 'article_rights_statement')

      # With selenium and the chrome driver, focus remains on the
      # select box. Click outside the box so the next line can find
      # its element
      find('body').click
      select('Article', from: 'article_resource_type')

      # With selenium and the chrome driver, focus remains on the
      # select box. Click outside the box so the next line can find
      # its element
      find('body').click
      choose('article_visibility_open')
      expect(page).to have_content('Please note, making something visible to the world (i.e. marking this as Public) may be viewed as publishing which could impact your ability to')
      check('agreement')

      click_on('Save')
      expect(page).to have_content('My Test Work')
      expect(page).to have_content "Your files are being processed by Hyrax in the background."
      expect(Article.count).to eq 1
      article = Article.last
      expect(article.visibility).to eq "open"
    end
  end
end

If you do any Hyrax development, you’ll notice that this test is mostly the Hyrax-generated feature spec, with a few small tweaks. I had already spent time making sure the feature specs ran for this project, so I knew this one was green. All I added were the last two lines:

      article = Article.last
      expect(article.visibility).to eq "open"

And this test FAILS! Hooray, a test that fails for the right reasons. I have isolated my bug and I now have clear criteria for what it means for this bug to be fixed, along with a check to ensure it doesn’t creep back in as a regression. Now, let’s actually find the bug.

3. The bug hunt

My primary tool on a bug hunt is the byebug command. I drop it into various places in the code, run my test, and see if I hit my byebug. Once I’m in my byebug console, I poke around to see what I can see. If everything looks normal, I move on.

First stop: WorksControllerBehavior

On our tour of what happens to the data after it is submitted via the Hyrax form, we pass briefly through our locally generated controller. We’ve just submitted an article, so the controller in question is app/controllers/hyrax/articles_controller.rb. However, upon examination it’s clear there isn’t much there. Our local controller gives us a place to put overrides that might be specific to that Work type, but unless we’ve customized it most of its behavior is going to come via an include:

    include Hyrax::WorksControllerBehavior
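That include means Ruby looks for create on the local controller first and then falls back to the included module. A minimal plain-Ruby sketch of that lookup order, using hypothetical stand-ins named WorksControllerBehavior and ArticlesController (no Rails or Hyrax involved), shows how a local override can run first and hand off with super:

```ruby
# Hypothetical stand-in for Hyrax::WorksControllerBehavior.
module WorksControllerBehavior
  def create
    "created by the shared behavior"
  end
end

# Hypothetical stand-in for app/controllers/hyrax/articles_controller.rb.
class ArticlesController
  include WorksControllerBehavior

  # A local override runs first; `super` continues into the included
  # module. This is one place a breakpoint could go without editing
  # the installed gem.
  def create
    "local override, then #{super}"
  end
end

puts ArticlesController.new.create
# prints: local override, then created by the shared behavior
p ArticlesController.ancestors.first(2)
# prints: [ArticlesController, WorksControllerBehavior]
```

In this post I edited the gem copy directly instead, which keeps the breakpoint in the exact method Hyrax runs.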

I’m looking for a create method into which to drop a byebug, so I go to find the copy of Hyrax::WorksControllerBehavior that my test is executing. In this case, because we use a Docker-based development environment at Notch8, the process looks like this:

$ docker-compose exec web bash
/data# bundle show hyrax
/usr/local/bundle/gems/hyrax-2.9.0
/data# cd /usr/local/bundle/gems/hyrax-2.9.0
/usr/local/bundle/gems/hyrax-2.9.0# vi app/controllers/concerns/hyrax/works_controller_behavior.rb

I find the create method, and drop a byebug in at the beginning. I run my test again and I see:

[51, 60] in /usr/local/bundle/gems/hyrax-2.9.0/app/controllers/concerns/hyrax/works_controller_behavior.rb
   51:       build_form
   52:     end
   53:
   54:     def create
   55:      byebug
=> 56:       if actor.create(actor_environment)
   57:         after_create_response
   58:       else
   59:         respond_to do |wants|
   60:           wants.html do
(byebug) params
<ActionController::Parameters {"utf8"=>"✓", "article"=>{"title"=>["My Test Work"], "creator"=>["Doe, Jane"], "description"=>["A brief description of my article"], "resource_type"=>["", "Article"], "rights_statement"=>"http://rightsstatements.org/vocab/InC/1.0/", "abstract"=>"", "bibliographic_citation"=>[""], "contributor"=>[""], "date_created"=>[""], "date_issued"=>"", "extent"=>[""], "funder"=>[""], "funder_identifier"=>[""], "grant_award"=>[""], "grant_number"=>[""], "grant_uri"=>[""], "identifier"=>[""], "institution_organization"=>[""], "issue"=>"", "keyword"=>[""], "language"=>[""], "license"=>[""], "note"=>[""], "peer_review_status"=>"", "publisher"=>[""], "related_resource"=>[""], "rights_notes"=>[""], "school"=>[""], "source"=>[""], "subject"=>[""], "title_alternative"=>[""], "volume"=>"", "admin_set_id"=>"admin_set/default", "member_of_collection_ids"=>"", "visibility"=>"open", "visibility_during_embargo"=>"restricted", "embargo_release_date"=>"2020-10-24", "visibility_after_embargo"=>"open", "visibility_during_lease"=>"open", "lease_expiration_date"=>"2020-10-24", "visibility_after_lease"=>"restricted"}, "uploaded_files"=>["1", "2"], "new_group_name_skel"=>"Select a group", "new_group_permission_skel"=>"none", "new_user_name_skel"=>"", "new_user_permission_skel"=>"none", "agreement"=>"1", "locale"=>"en", "controller"=>"hyrax/articles", "action"=>"create"} permitted: false>

So, first question answered: The visibility IS being sent from the form, and at the time it enters the actor stack on line 56 visibility does indeed equal open.

Time to dive into the actor stack.

A quick tour of the actor stack

Here is a quick way to find out what actors your object will traverse in the actor stack:

(byebug) Hyrax::CurationConcern.actor
#<Hyrax::Actors::TransactionalRequest:0x0000557a92022148 @next_actor=#<Hyrax::Actors::OptimisticLockValidator:0x0000557a92022170 @next_actor=#<Hyrax::Actors::CreateWithRemoteFilesActor:0x0000557a92022198 @next_actor=#<Hyrax::Actors::CreateWithFilesActor:0x0000557a920221c0 @next_actor=#<Hyrax::Actors::CollectionsMembershipActor:0x0000557a920221e8 @next_actor=#<Hyrax::Actors::AddToWorkActor:0x0000557a92022210 @next_actor=#<Hyrax::Actors::AttachMembersActor:0x0000557a92022238 @next_actor=#<Hyrax::Actors::ApplyOrderActor:0x0000557a92022260 @next_actor=#<Hyrax::Actors::DefaultAdminSetActor:0x0000557a92022288 @next_actor=#<Hyrax::Actors::InterpretVisibilityActor:0x0000557a920222b0 @next_actor=#<Hyrax::Actors::TransferRequestActor:0x0000557a920222d8 @next_actor=#<Hyrax::Actors::ApplyPermissionTemplateActor:0x0000557a92022300 @next_actor=#<Hyrax::Actors::CleanupFileSetsActor:0x0000557a92022328 @next_actor=#<Hyrax::Actors::CleanupTrophiesActor:0x0000557a92022350 @next_actor=#<Hyrax::Actors::FeaturedWorkActor:0x0000557a92022378 @next_actor=#<Hyrax::Actors::ModelActor:0x0000557a920223a0 @next_actor=#<Hyrax::Actors::InitializeWorkflowActor:0x0000557a920223c8 @next_actor=#<Hyrax::Actors::Terminator:0x0000557a92022468>>>>>>>>>>>>>>>>>>

Let’s break that out into something easier to read:

  1. Hyrax::Actors::TransactionalRequest
  2. Hyrax::Actors::OptimisticLockValidator
  3. Hyrax::Actors::CreateWithRemoteFilesActor
  4. Hyrax::Actors::CreateWithFilesActor
  5. Hyrax::Actors::CollectionsMembershipActor
  6. Hyrax::Actors::AddToWorkActor
  7. Hyrax::Actors::AttachMembersActor
  8. Hyrax::Actors::ApplyOrderActor
  9. Hyrax::Actors::DefaultAdminSetActor
  10. Hyrax::Actors::InterpretVisibilityActor
  11. Hyrax::Actors::TransferRequestActor
  12. Hyrax::Actors::ApplyPermissionTemplateActor
  13. Hyrax::Actors::CleanupFileSetsActor
  14. Hyrax::Actors::CleanupTrophiesActor
  15. Hyrax::Actors::FeaturedWorkActor
  16. Hyrax::Actors::ModelActor
  17. Hyrax::Actors::InitializeWorkflowActor
  18. Hyrax::Actors::Terminator
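Splitting that nested inspect output by hand is tedious. Assuming each actor exposes its successor through a next_actor reader (the stand-ins below do; check that the actors in your Hyrax version do too), a small hypothetical helper, actor_chain, can produce the same list. The fake classes only let the sketch run outside a Hyrax app; in a byebug session you would pass Hyrax::CurationConcern.actor instead.

```ruby
# Walk a chain of actors linked through `next_actor` and return the
# class name of each link, outermost first.
def actor_chain(actor)
  names = []
  while actor
    names << actor.class.name
    # The last link (a terminator) has no next_actor, so stop there.
    actor = actor.respond_to?(:next_actor) ? actor.next_actor : nil
  end
  names
end

# Hypothetical stand-ins so the helper can run outside a Hyrax app.
class FakeActor
  attr_reader :next_actor

  def initialize(next_actor)
    @next_actor = next_actor
  end
end

class FirstActor < FakeActor; end
class SecondActor < FakeActor; end
class FakeTerminator; end

chain = FirstActor.new(SecondActor.new(FakeTerminator.new))
actor_chain(chain).each_with_index { |name, i| puts "#{i + 1}. #{name}" }
# prints:
# 1. FirstActor
# 2. SecondActor
# 3. FakeTerminator
```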

I won’t go into all of those, and some of them are clearly not relevant to this use case. I’m going to drop into the first one that I think might be relevant for me, Hyrax::Actors::CreateWithRemoteFilesActor:

[11, 20] in /usr/local/bundle/gems/hyrax-2.9.0/app/actors/hyrax/actors/create_with_remote_files_actor.rb
   11:     class CreateWithRemoteFilesActor < Hyrax::Actors::AbstractActor
   12:       # @param [Hyrax::Actors::Environment] env
   13:       # @return [Boolean] true if create was successful
   14:       def create(env)
   15:        byebug
=> 16:         remote_files = env.attributes.delete(:remote_files)
   17:         next_actor.create(env) && attach_files(env, remote_files)
   18:       end
   19:
   20:       # @param [Hyrax::Actors::Environment] env

When I examine the attributes at this point in the stack, I see something interesting: visibility is gone.

(byebug) env.attributes
{"title"=>["My Test Work"], "abstract"=>nil, "bibliographic_citation"=>[], "contributor"=>[], "creator"=>["Doe, Jane"], "date_created"=>[], "date_issued"=>nil, "description"=>["A brief description of my article"], "extent"=>[], "funder"=>[], "funder_identifier"=>[], "grant_award"=>[], "grant_number"=>[], "grant_uri"=>[], "identifier"=>[], "institution_organization"=>[], "issue"=>nil, "keyword"=>[], "language"=>[], "license"=>[], "note"=>[], "peer_review_status"=>nil, "publisher"=>[], "related_resource"=>[], "resource_type"=>["Article"], "rights_notes"=>[], "rights_statement"=>"http://rightsstatements.org/vocab/InC/1.0/", "school"=>[], "source"=>[], "subject"=>[], "title_alternative"=>[], "volume"=>nil, "remote_files"=>[], "uploaded_files"=>["1", "2"]}

When I examine the nascent object that the actor stack is acting upon, I see that it is mostly empty, and has the default visibility value, which is restricted:

(byebug) env.curation_concern
#<Article id: nil, head: [], tail: [], depositor: nil, title: [], date_uploaded: nil, date_modified: nil, state: nil, proxy_depositor: nil, on_behalf_of: nil, arkivo_checksum: nil, owner: nil, date_migrated: nil, format: [], bibliographic_citation: [], date_issued: nil, extent: [], institution_organization: [], note: [], related_resource: [], rights_notes: [], title_alternative: [], abstract: nil, funder: [], funder_identifier: [], grant_award: [], grant_number: [], grant_uri: [], issue: nil, peer_review_status: nil, school: [], source: [], volume: nil, label: nil, relative_path: nil, import_url: nil, resource_type: [], creator: [], contributor: [], description: [], keyword: [], license: [], rights_statement: [], publisher: [], date_created: [], subject: [], language: [], identifier: [], based_near: [], related_url: [], access_control_id: nil, representative_id: nil, thumbnail_id: nil, rendering_ids: [], admin_set_id: nil, embargo_id: nil, lease_id: nil>
(byebug) env.curation_concern.visibility
"restricted"

So, by the time we reach Hyrax::Actors::CreateWithRemoteFilesActor, our “open” visibility value is missing, so it never gets applied to our skeleton curation_concern, which keeps its default visibility of restricted. Hmm… I need to look earlier in the process. I do the same check in Hyrax::Actors::TransactionalRequest and see that visibility is missing there too. Where is my missing attribute?
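One quick way to see exactly which keys went missing is to diff the submitted keys against env.attributes. Below is a minimal sketch with a hypothetical helper, dropped_keys, run on plain hashes trimmed from the byebug dumps above. In a live session, params[:article] is an ActionController::Parameters object, so you would likely convert it with to_unsafe_h before comparing.

```ruby
# Hypothetical helper: list keys that were submitted but never reached
# the actor. Keys are compared as strings so symbol/string mixes match.
def dropped_keys(submitted, received)
  submitted.keys.map(&:to_s) - received.keys.map(&:to_s)
end

# Trimmed from the dumps: form params vs. env.attributes.
submitted = {
  "title" => ["My Test Work"],
  "visibility" => "open",
  "admin_set_id" => "admin_set/default"
}
received = {
  "title" => ["My Test Work"],
  "remote_files" => [],
  "uploaded_files" => ["1", "2"]
}

p dropped_keys(submitted, received) # prints ["visibility", "admin_set_id"]
```

Run against the full dumps, the same comparison would also flag member_of_collection_ids and the embargo and lease fields, a hint that something upstream is filtering out more than visibility.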

Permitted params

The above got me far enough that I was able to ask a concrete question in the #hyrax channel on the Samvera Slack, where Tamsin Johnson pointed me to this place where params get sanitized:

[265, 274] in /usr/local/bundle/gems/hyrax-2.9.0/app/controllers/concerns/hyrax/works_controller_behavior.rb
   265:
   266:       # Add uploaded_files to the parameters received by the actor.
   267:       def attributes_for_actor
   268:           byebug
   269:         raw_params = params[hash_key_for_curation_concern]
=> 270:         attributes = if raw_params
   271:                        work_form_service.form_class(curation_concern).model_attributes(raw_params)
   272:                      else
   273:                        {}
   274:                      end

And there, on line 271, is where visibility goes missing. But WHY????

What’s happening here is that Hyrax uses Hyrax::ArticleForm to define which parameters are permitted through the form:

(byebug) work_form_service.form_class(curation_concern)
Hyrax::ArticleForm

And if I go look at Hyrax::ArticleForm I see that self.terms has been redefined:

    self.terms = [
      :title,
      :abstract,
      :bibliographic_citation,
      ...

Instead of redefining self.terms here, I prefer to add any customized fields to the existing list, the same pattern used in the RepoCamp curriculum.

When I change Hyrax::ArticleForm to use += instead, keeping all of the fields that come predefined in Hyrax, my feature test passes.

   self.terms += [
     :title,
     :abstract,
     :bibliographic_citation,
     ...
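To see why = and += behave so differently, here is a plain-Ruby sketch with hypothetical stand-in classes. Hyrax’s real forms inherit terms through hydra-editor and filter params through model_attributes with Rails strong parameters; this only mimics the idea that a subclass starts from its parent’s term list, and that any key not in terms is filtered out.

```ruby
# Hypothetical stand-in for a Hyrax work form base class.
class BaseWorkForm
  class << self
    attr_writer :terms

    # A subclass starts from a copy of its parent's terms.
    def terms
      @terms ||= (superclass.respond_to?(:terms) ? superclass.terms.dup : [])
    end

    # Keep only submitted keys that are listed in terms, loosely like the
    # permitted-params filtering behind model_attributes.
    def model_attributes(params)
      params.select { |key, _value| terms.include?(key.to_sym) }
    end
  end

  self.terms = [:title, :creator, :visibility, :admin_set_id]
end

# Redefining terms throws away everything the base class provided.
class RedefinedArticleForm < BaseWorkForm
  self.terms = [:title, :abstract]
end

# Adding to (and subtracting from) terms keeps the rest of the base list.
class ExtendedArticleForm < BaseWorkForm
  self.terms += [:abstract]
  self.terms -= [:creator]
end

params = { "title" => ["My Test Work"], "abstract" => "", "visibility" => "open" }

p RedefinedArticleForm.model_attributes(params).key?("visibility") # false
p ExtendedArticleForm.model_attributes(params).key?("visibility")  # true
p ExtendedArticleForm.terms # [:title, :visibility, :admin_set_id, :abstract]
```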

In summary

  1. Write your tests first
  2. Keep your end-to-end feature tests green when you’re doing development
  3. When you customize forms in Hyrax, don’t redefine the terms. Add to them and subtract from them as you like, but you probably don’t want to lose all the terms Hyrax gives you out of the box.

Filed Under: Blog


Capabilities & Services

Notch8 has an experienced team of web developers distributed along the West Coast of the U.S., available for:

  • Design, Planning, and Architecting
  • iOS and Android Development with React Native
  • Code Audits and Reviews
  • Ruby on Rails and Javascript Development
  • Full Stack Development
  • Samvera / Hyku Data Preservation Solutions
  • Framework Upgrades
  • Monitoring and Support
  • Deployment Optimization and Containerization
  • Team
  • Contact Us
  • © Notch8 2017, some rights reserved

Copyright © 2021 · Parallax Pro Theme on Genesis Framework · WordPress